U/jrbogart/nsides #63

Merged: 5 commits into main from u/jrbogart/nsides, Oct 1, 2023

JoanneBogart
Collaborator

Add options to the catalog creation script, applying to galaxies only, which

  • specify the nside value used for HEALPix partitioning of the output (default: 32)
  • specify the row group size (default: one million)
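As a rough illustration of the two options listed above, here is a minimal sketch using `argparse`. The flag names `--nside` and `--galaxy-row-group-size` and the helper `n_healpix_pixels` are hypothetical and chosen only for this example; the actual spelling in create_sc.py may differ. Only the defaults (32 and one million) come from the description.

```python
import argparse


def build_parser():
    """Build a parser with the two galaxy-only options (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Catalog creation sketch")
    parser.add_argument("--nside", type=int, default=32,
                        help="HEALPix nside used to partition galaxy output")
    parser.add_argument("--galaxy-row-group-size", type=int, default=1_000_000,
                        help="Number of rows per row group in galaxy output files")
    return parser


def n_healpix_pixels(nside):
    """A HEALPix map with a given nside has 12 * nside**2 pixels."""
    return 12 * nside ** 2


# Override nside on the command line; the row group size keeps its default.
args = build_parser().parse_args(["--nside", "64"])
print(args.nside, args.galaxy_row_group_size, n_healpix_pixels(args.nside))
```

A larger nside means more, smaller output files, so the two options together control how the galaxy data is split across files and within each file.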

like sed_val_bulge (an array of lists) for the subpixels.
Try compressing before doing the transform.
A couple of minor fixes in CatalogCreator._create_galaxy_flux_pixel:
   * no while loop is needed; just output a row group the same size as for
     the main file
   * set a minimum timeout value for chunks > 0
skycatalogs/catalog_creator.py
self._out_pixels = out_pixels
skip_count = 0
for p in out_pixels:
    output_path = os.path.join(self._output_dir, f'galaxy_{p}.parquet')
Collaborator

It would be very convenient for scanning the names of these parquet files if the pixel number were formatted with leading zeros, so that file names like galaxy_09556.parquet are listed before galaxy_10000.parquet when sorted. Some code like this would accomplish that:

digits = len(str(healpy.nside2npix(nside)))
f'galaxy_{p:0{digits}d}.parquet'
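To make the effect on sorting concrete, here is a self-contained sketch of the same idea. It assumes a hypothetical helper `galaxy_filename` and computes the pixel count as `12 * nside**2` (the value `healpy.nside2npix` returns) so that the example runs without healpy installed.

```python
def galaxy_filename(pixel, nside):
    """Return a zero-padded file name for a HEALPix pixel (hypothetical helper)."""
    npix = 12 * nside ** 2      # same value as healpy.nside2npix(nside)
    digits = len(str(npix))     # pad width wide enough for every pixel id
    return f"galaxy_{pixel:0{digits}d}.parquet"


padded = sorted(galaxy_filename(p, 32) for p in (10000, 9556))
unpadded = sorted(f"galaxy_{p}.parquet" for p in (10000, 9556))

print(padded)    # pixel 9556 sorts first
print(unpadded)  # pixel 10000 sorts first, because '1' < '9' as characters
```

With padding, lexicographic order and numeric order agree for every pixel at a given nside.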

Collaborator Author

I would also need something comparable in the code that searches for files. For backwards compatibility, that code should accept either form. Better, the templates in the YAML config could be modified to accept either form. However, there are places where the code has to know whether the file for a particular pixel already exists, so that it neither overwrites it nor allows two versions of the same thing.
This is getting complicated enough that I'm inclined to put it in a separate PR.
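One possible shape for the backwards-compatible lookup, as a sketch only: the helpers `files_for_pixel` and `pixel_exists` are hypothetical, and the pattern is hard-coded here rather than taken from the YAML config templates mentioned above. It matches both padded and unpadded names by comparing the pixel number as an integer, and treats two files for the same pixel as an error.

```python
import re

# Matches galaxy_9556.parquet and galaxy_09556.parquet alike.
_GALAXY_FILE = re.compile(r"galaxy_(\d+)\.parquet")


def files_for_pixel(filenames, pixel):
    """Return every file name whose pixel number equals `pixel`, padded or not."""
    matches = []
    for name in filenames:
        m = _GALAXY_FILE.fullmatch(name)
        if m and int(m.group(1)) == pixel:
            matches.append(name)
    return matches


def pixel_exists(filenames, pixel):
    """True if a file for `pixel` exists; raise if both forms are present."""
    matches = files_for_pixel(filenames, pixel)
    if len(matches) > 1:
        raise ValueError(f"multiple files for pixel {pixel}: {matches}")
    return bool(matches)


listing = ["galaxy_09556.parquet", "galaxy_10000.parquet"]
print(pixel_exists(listing, 9556), pixel_exists(listing, 9557))
```

Comparing integers rather than strings keeps the existence check independent of the pad width, so the same check works before and after any change to the naming scheme.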

@jchiang87
Collaborator

Since we'll be changing the galaxy SED model soon, and since so much code has been added/changed here having to do with the top hat SEDs, this seems like a good opportunity to refactor the catalog_creator.py code to remove the assumption that top hat SEDs will be used for galaxies. So, I'd suggest making _make_tophat_columns a method of TopHatSedFactory and integrating the functionality in _get_tophat_info into TopHatSedFactory.__init__, and having things like ._sed_bins be attributes of TopHatSedFactory instead of the CatalogCreator class.

@JoanneBogart
Collaborator Author

Concerning your comment about tophat SEDs - yes, the current assumptions about tophat SEDs will have to go, but there will be other changes involved in supporting multiple galaxy types. I'd rather deal with them together (as they are likely inter-related) in a separate PR.

@JoanneBogart JoanneBogart merged commit 3ab197b into main Oct 1, 2023
@JoanneBogart JoanneBogart deleted the u/jrbogart/nsides branch October 1, 2023 21:03