Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Producing DESI-Y1 Lya mocks with quickquasars. #578

Merged
merged 17 commits into from Mar 5, 2024

Conversation

HiramHerrera
Copy link
Contributor

This PR introduces the necessary tools to produce DESI-Y1 Lya mocks.

Here are the Major changes listed:

  • Introduction of the survey_release.py module and the SurveyRelease class instance with the necessary tools to create mock catalogs to be read by quickquasars.
  • Introduced new flags to quickquasars.
    • --raw-mock [saclay, london, ohio] allows to generate Y5 mocks directly from quickquasars for the specified raw mock team.
    • --dn_dzdm was reworked to boolean flag, used when redshift and magnitude distributions are to be downsampled by quickquasarsdirectly.
    • --year1-throughput adds the Y1 throughput model to mock spectra including the 440 nm dip observed during Y1
    • --from-catalog allows to read from an input pre-processed mock catalog generated by the SurveyRelease instance.
    • --metal-strengths allows to change the default metal strengths used for the --metals flag. This flag should be the same length and in the same order as the --metals flag in order to work. The lya_spectra.py script was modified to make this flag work.

An example quickquasars command to run DESI-Y1 mocks would be:

quickquasars -i <input_path> --outdir <output_path> --zbest  --bbflux  --save-continuum --zmin 1.7 --dla file --balprob 0.16 --metals LYB LY3 LY4 LY5 SiII\(1260\) SiIII\(1207\) SiII\(1193\) SiII\(1190\) --metal-strengths 0.1901 0.0697 0.0335 0.0187 1.3e-03 3.5e-03 0.7e-03 1.4e-03 --from-catalog <input_preprocessed_y1catalog> --seed <seed>

Important notes:

  1. The --year1-throughput flag works if a file called desiY1.yaml file exists within specsim/data/config pointing to the specpsf/psf-quicksim.fits and files thru-[brz]_y1measured.fits which include the dip feature at 440 nm. We can take advantage of Simulate DESI spectra with the same output wavelength binning as the pipeline specsim#125 to include this file. However, if I'm correct, that PR changes the defaults of the desi.yaml file to those of DESI-Y1. I think it is important to keep the desi.yaml pointing to a throughput model without the 440 nm dip (default DESIMODEL) as we might want mocks without this feature.
  2. The SurveyRelease downsampling by NPASS functionality relies on a pixel map currently stored in /global/cfs/cdirs/desi/users/hiramk/desi/quickquasars/sampling_tests/npass_pixmap.fits this file is large enough (1.1GB) to not be included in this PR. The code should work as long as the current path (my local cfdirs) is not modified. However, we should probably copy this file to somewhere else. Is there a recommended place to store it? Maybe DESIMODEL?
  3. Probably we might want to deprecate the --dn_dzdm and --raw-mock flags as the SurveyRelease instance already does this and its probably more efficient to downsample and assign magnitudes in pre-processing than within quickquasars. I kept them since @andreicuceu DESI-Y5 mocks were produced using these flags.

@coveralls
Copy link

coveralls commented Nov 15, 2023

Coverage Status

coverage: 44.724% (-0.8%) from 45.526%
when pulling 599de34 on HiramHerrera:y1mocks_v3
into 50998bf on desihub:main.

@andreufont
Copy link
Contributor

I won't be able to review this until Nov 24, but you can merge without me if others have approved it.

I was a bit surprised to see the --raw-mock option in QQ. What does it do? Is it just to create a picca-readable quasar catalog to be used in the raw analyis? Because we don't want to create spectra files in the raw analysis...

@HiramHerrera
Copy link
Contributor Author

The --raw-mock flag was introduced to somehow tell quickquasars what kind of raw mocks are inputted. Principally for saclay or lyacolore. This was done for two main reason where these mocks are handled differently:

  1. The downsampling fractions to match the observed redshift distribution are different depending if we use saclay or lyacolore while ohio don't require downsampling: Making quickquasars compute these fractions internally was Memory inefficient as this requires opening the complete raw mocks master catalog. The way quickquasars is coded would make this process to happen for every node we use. To avoid this we computed the fractions in pre-processing and saved it in a fits file that is read by the --dn_dzdm flag according to the raw mock team inputted. This is no longer required if we make the catalogs in pre-processing for all kind of mocks we do (including Y5).
  2. The relative metal strengths of the quickquasars method (--metals) is different for saclay and for lyacolore mocks: The previous version of the code had by default the strengths for lyacolore while the information for saclay mocks was on another branch. Right now the --metal-strengths flag changes the defaults, so maybe this is no longer a problem.

These are the reasons of why I say the --dn_dzdm and the --raw-mock flags can be deprecated

@andreufont
Copy link
Contributor

Oh, I understand now. Thanks @HiramHerrera ! I thought these were related to the "raw analysis". I see now that you are just specifying which type of transmission files are being provided.

@alxogm
Copy link
Contributor

alxogm commented Nov 16, 2023

Regarding this question:

  1. The SurveyRelease downsampling by NPASS functionality relies on a pixel map currently stored in /global/cfs/cdirs/desi/users/hiramk/desi/quickquasars/sampling_tests/npass_pixmap.fits this file is large enough (1.1GB) to not be included in this PR. The code should work as long as the current path (my local cfdirs) is not modified. However, we should probably copy this file to somewhere else. Is there a recommended place to store it? Maybe DESIMODEL?

@sbailey Since this is some sort of a template for the Y1 lya mock production, could we store it in /global/cfs/cdirs/desi/spectro/templates/ where other templates used by quickquasars are? maybe in a new directory (name TBD, suggestions accepted)?

bins = ((z - zmin)/(zmax - zmin) * len(zcenters) + 0.5).astype(np.int64)
selection_z = np.random.uniform(size=z.size) < fraction[bins]

np.random.set_state(rnd_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you restore the old random state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this to restore the quickquasars state to what it would be if we don't downsample to match the redshift distribution i.e not passing the --dn_dzdm flag. This is probably not needed since in either case the realizations would differ in number of objects and therefore the all the random calls done by the code would be affected. So I think It can be removed if requested.

mask_footprint=desimodel.footprint.is_point_in_desi(tiles,mock['RA'],mock['DEC'])
mock=mock[mask_footprint]
log.info(f"Keeping {sum(mask_footprint)} mock QSOs in footprint TILES")
if release is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to write if release == "iron": ... else: raise NotImplementedError

log.info(f"Downsampling by NPASSES fraction in {release} release")
# TODO: Implement this in desimodel instead of local path
pixmap=Table.read('/global/cfs/cdirs/desi/users/hiramk/desi/quickquasars/sampling_tests/npass_pixmap.fits')
mock_pixels = hp.ang2pix(1024, np.radians(90-mock['DEC']),np.radians(mock['RA']),nest=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using nside=1024? That is very high resolution

mock_pixels = hp.ang2pix(1024, np.radians(90-mock['DEC']),np.radians(mock['RA']),nest=True)
try:
data_pixels = hp.ang2pix(1024,np.radians(90-self.data['TARGET_DEC']),np.radians(self.data['TARGET_RA']),nest=True)
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a good use of try-except. I recommend if 'TARGET_DEC in self.data.dtype.names` or similar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, never use naked except!

mock_pass_counts = np.bincount(mock_passes,minlength=8)
mock['NPASS'] = mock_passes
downsampling=np.divide(data_pass_counts,mock_pass_counts,out=np.zeros(8),where=mock_pass_counts>0)
rand = np.random.uniform(size=len(mock))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to have a seed here or leave a detailed comment why not.

Comment on lines +188 to +192
filename=os.path.join(os.path.dirname(desisim.__file__),'data/dn_dzdM_EDR.fits')
zcenters=fitsio.FITS(filename)['Z_CENTERS'][:]
dz = zcenters[1]-zcenters[0]
rmagcenters=fitsio.FITS(filename)['RMAG_CENTERS'][:]
dn_dzdm=fitsio.FITS(filename)['dn_dzdm'][:,:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be written as
with fitsio.FITS(filename) as fts:
zcenters = fts['Z_CENTERS'].read()
rmagcenters = fts['RMAG_CENTERS'].read()
dn_dzdm = fts['dn_dzdm'].read()
dz = zcenters[1] - zcenters[0]

if self.data is None:
raise ValueError("No data catalog was provided")
dz = 0.1
zbins = np.arange(0,10,dz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not recommended to use np.arange with a non-integer step. Better use np.linspace

dz = 0.1
zbins = np.arange(0,10,dz)
zcenters=0.5*(zbins[1:]+zbins[:-1])
rmagbins = np.arange(15,25,0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Use np.linspace

for i,z_bin in enumerate(zcenters):
w_z = (self.mockcatalog['Z'] > z_bin-0.5*dz) & (self.mockcatalog['Z'] <= z_bin+0.5*dz)
if np.sum(w_z)==0: continue
rand = np.random.uniform(size=np.sum(w_z))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seed?

mock=self.mockcatalog
try:
data_pixels = hp.ang2pix(1024,np.radians(90-self.data['TARGET_DEC']),np.radians(self.data['TARGET_RA']),nest=True)
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad use of try-except again. Use an if statement

Copy link
Contributor

@p-slash p-slash Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Never use naked except!

Comment on lines +255 to +257
random_variable = rv_discrete(values=(np.arange(1,len(pdf)+1),pdf))
is_pass = self.mockcatalog['NPASS'] == tile_pass
exptime_mock[is_lya_mock&is_pass]=1000*random_variable.rvs(size=np.sum(is_pass&is_lya_mock))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you generating random variables? What is the seed?

Comment on lines +203 to +208
if 'RMAG' in self.data.colnames:
dn_dzdm=np.histogram2d(self.data['Z'],self.data['RMAG'],bins=(zbins,rmagbins))[0]
elif 'FLUX_R' in self.data.colnames:
dn_dzdm=np.histogram2d(self.data['Z'],22.5-2.5*np.log10(self.data['FLUX_R']),bins=(zbins,rmagbins))[0]
else:
raise ValueError("No magnitude information in data catalog")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write
if 'RMAG' in self.data.colnames:
data_rmag = self.data
elif 'FLUX_R' in self.data.colnames:
data_rmag = 22.5-2.5*np.log10(self.data['FLUX_R'])
else:
raise KeyError("No magnitude information in data catalog")
dn_dzdm=np.histogram2d(self.data['Z'], data_rmag, bins=(zbins,rmagbins))[0]

dist = Table.read(os.path.join(os.path.dirname(desisim.__file__),'data/redshift_dist_chaussidon2022.ecsv'))
dz=dist['z'][1]-dist['z'][0]
factor=0.1/dz # Account for redshift bin width difference.
zbins = np.arange(0,10.1,0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use np.linspace instead

w_z = (zcenters-0.5*dz > zmin) & (zcenters+0.5*dz <= zmax)
dndz=dndz[w_z]
zcenters=zcenters[w_z]
zbins = np.arange(zmin,zmax+dz,dz,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.linspace

Comment on lines +99 to +102
dn_dzdr=fitsio.FITS(filename)[3][:,:]
dndz=np.sum(dn_dzdr,axis=1)

zcenters=fitsio.FITS(filename)[1][:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a with statement. Read zcenters first.
with fitsio.FITS(filenames) as fts:
zcenters = fts[1].read()
dn_dzdt = fts[3].read()

Comment on lines +58 to +61
try:
pixels = hp.ang2pix(nside, np.radians(90-catalog['DEC']),np.radians(catalog['RA']),nest=True)
except KeyError:
pixels = hp.ang2pix(nside, np.radians(90-catalog['TARGET_DEC']),np.radians(catalog['TARGET_RA']),nest=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad use of try-except again. Use an if statement

self.mockcatalog['TARGETID']=self.mockcatalog['MOCKID']
self.mockcatalog['Z']=self.mockcatalog['Z_QSO_RSD']
self.invert=invert
np.random.seed(seed)
Copy link
Contributor

@p-slash p-slash Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is the seed. I recommend creating a member random engine rng_engine = np.random.default_rng(seed) and calling it to generate the random numbers you need. default_rng is recommended since numpy 1.17

@@ -433,7 +489,8 @@ def simulate_one_healpix(ifilename,args,model,obsconditions,decam_and_wise_filte
DZ_FOG = DZ_FOG[indices]
nqso = args.nmax

if args.dn_dzdm is not None:
if args.dn_dzdm:
# TODO: Deprecate this option. It will be faster to do this on preprocessing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add warnings.warn("dn_dzdm will be deprecated. It is faster to do this in preprocessing", DeprecationWarning)

Copy link
Contributor

@p-slash p-slash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I went through the code and left comments for best practices and recommended functions. They are minor changes but if unfixed they can introduce critical bugs. For example, a naked except can be really bad.

Also, why is the npass file so large?

@julienguy
Copy link
Contributor

I am merging as is because this is the version used for KP6 mocks. We will then tag. BUT the comments can still be addressed by reopening this PR after we merged and tagged.

@julienguy julienguy merged commit 69dc6e9 into desihub:main Mar 5, 2024
6 of 8 checks passed
@sbailey
Copy link
Contributor

sbailey commented Mar 5, 2024

The post-merge version with a docstring fix has been tagged as desisim 0.38.0. I also installed this at NERSC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants