Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final clean up of DESI_SPECTRO_REDUX, DESI_TARGET #171

Merged
merged 17 commits into from Jun 5, 2023

Conversation

weaverba137
Copy link
Member

This PR cleans up some remaining verification loose ends in DESI_SPECTRO_REDUX/SPECPROD and DESI_TARGET.

  • Most of the changes are changing data types to actually match what is in the files in EDR.
  • For the record: zpix/ztile files have ZCAT_NSPEC as int16, but that changes to int64 in zall files. The data model reflects the zpix/ztile files.
  • Add a data model for uncompressed fiberassign files, which may have extra HDUs compared to the compressed versions.
  • Clean up in DESI_TARGET/TARG_DIR to match files on disk. This mainly involved fixing up the regular expressions that match the files.
  • While investigating the regular expressions in DESI_TARGET/TARG_DIR, I discovered two entirely new classes of random files, plus two complete oddballs:
    • Files with an extra randoms directory, e.g.: $DESI_TARGET/catalogs/dr9/2.4.0/randoms/resolve/randoms-1-15/randoms-1-hp-539.fits
    • Files in the north or south, e.g.: $DESI_TARGET/catalogs/dr9/0.48.0/randoms/noresolve/north/randoms-noresolve-1-19.fits
    • $DESI_TARGET/catalogs/dr9/0.48.0/randoms/resolve/survey-bricks-dr9-randoms-0.48.0.fits
    • $DESI_TARGET/catalogs/dr9/0.47.0/randoms/resolve/randoms-survey-bricks.fits

For the record, the two extra classes of random files are matched by these regular expressions:

  • [0-9.]+/randoms/resolve/randoms-[0-9]+-[0-9]+/randoms-[0-9]+-hp-[0-9]+\.fits
  • [0-9.]+/randoms/noresolve/(north|south)/randoms-noresolve-[0-9]+-[0-9]+\.fits

@geordie666 would you be able to make a recommendation on what to do about documenting these extra classes of randoms? I think we can just ignore the two bricks files.

@weaverba137 weaverba137 self-assigned this Jun 3, 2023
@coveralls
Copy link

coveralls commented Jun 3, 2023

Coverage Status

coverage: 100.0%. remained the same when pulling cbae3d6 on edr-spectro-redux into 88e8805 on main.

@geordie666
Copy link
Contributor

@weaverba137: Thanks for catching all the little inconsistencies in the desitarget files. I've reviewed all of the unit/type/header keyword updates and I'm happy with your changes, so you could merge your work so far, if you'd like.

Regarding the oddball files (again, thanks for noticing these):

Files with an extra randoms directory, e.g.: $DESI_TARGET/catalogs/dr9/2.4.0/randoms/resolve/randoms-1-15/randoms-1-hp-539.fits

  • The columns included in these files are identical to the "original" files without the extra randoms directory. These files are different in that the random points have been split into HEALPixels.
    • So, e.g.: "/global/cfs/cdirs/desi/target/catalogs/dr9/2.4.0/randoms/resolve/randoms-1-0/randoms-1-hp-123.fits corresponds to the random points in /global/cfs/cdirs/desi/target/catalogs/dr9/0.49.0/randoms/resolve/randoms-1-0.fits but only includes the points that occupy HEALPixel 123 in the NSIDE=8, NESTED scheme.
    • The original file that was split-by-HEALPixel is recorded in the INFILE header key.
      • This header key is the only different header card compared to the "original" (not split-by-HEALPixel) random catalogs.
      • e.g., for the example file, INFILE='/global/cfs/cdirs/desi/target/catalogs/dr9/0.49.0/randoms/resolve/randoms-1-0.fits'
    • The other information referenced in the first bullet point (only includes the points that occupy HEALPixel 123 in the NSIDE=8, NESTED scheme) is recorded in other, standard header cards: FILENSID=8, FILENEST=True, FILEHPX=123.
  • We could document these files by copying most of the content for the "original" (not split-by-HEALPixel) randoms files, noting these different versions are "split-by-HEALPixel" and documenting the INFILE keyword.
    • Or, we could provide a mostly empty page for these files that links back to the data model for the "original" randoms files and includes some or all the information I've outlined above.

Files in the north or south, e.g.: $DESI_TARGET/catalogs/dr9/0.48.0/randoms/noresolve/north/randoms-noresolve-1-19.fits

  • The columns included in these files are identical to the "original" files in the resolve directory. These files are different because they have not been resolved by the desitarget code. Resolving in this sense means only retaining northern points in northern bricks in the northern portion of the Legacy Surveys footprint and only retaining southern points in southern bricks in the southern portion of the Legacy Surveys footprint (see Section 4.1.3. of the desitarget paper).
    • These files are also documented on the Legacy Surveys website.
    • The region (north or south) covered by the file is recorded in the REGION header key.
      • This header key is the only different header card compared to the resolved random catalogs.
      • e.g., for the example file, REGION='north'
  • We could document these files by copying most of the content for the "resolved" randoms files, noting these different versions are not resolved as described in Section 4.1.3. of the desitarget paper and documenting the REGION keyword.
    • Or, we could provide a mostly empty page for these files that links back to the data model for the "original" (resolved) randoms files and includes some or all the information I've outlined above.

$DESI_TARGET/catalogs/dr9/0.48.0/randoms/resolve/survey-bricks-dr9-randoms-0.48.0.fits

DESI_TARGET/catalogs/dr9/0.47.0/randoms/resolve/randoms-survey-bricks.fits

  • This file is identical to survey-bricks-dr9-randoms-0.48.0.fits. It was produced by version 0.47.0 of the code and poorly named.
  • We could reference away the definition of this file to the definition of the survey-bricks-dr9-randoms-0.48.0.fits file?

@weaverba137
Copy link
Member Author

@geordie666, thank you this is extremely helpful. We have a way to cross-reference data model files internally, so we can just cross-reference the "original" randoms rather than even needing to copy all the content.

For the bricks files though, I'm going to propose that we simply pretend that they don't exist for the purposes of the data model.

@geordie666
Copy link
Contributor

@weaverba137: I think it's absolutely fine simply to ignore the existence of the bricks files, if that's your preference. If a user asks about their content, it would be straightforward to point them to the Legacy Surveys website (if they don't think to look there first).

Should we be bombarded with questions about the bricks files (!) we can revisit including them in a later Data Release.

@weaverba137
Copy link
Member Author

I will merge as soon as the latest set of tests pass.

@weaverba137 weaverba137 merged commit 521f87e into main Jun 5, 2023
22 checks passed
@weaverba137 weaverba137 deleted the edr-spectro-redux branch June 5, 2023 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants