-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate fibermap units through all steps of pipeline #2068
Comments
To speed up the discussion a little bit, I know that units are dropped internally for performance reasons, which explains why units are not present in the A better-phrased warning might be: "the units described in the data model for |
That is how I have always interpreted the meaning of the datamodel — describing the actual units of the data, not the existence (or not) of TUNIT and BUNIT keywords. I don't think that requires a specific caveat in the text of the data model. We can separately address if there are any unintended side effects from adding TUNIT keywords to the data files in the future. I agree that in general it is preferable for us to propagate those and it was an oversight (not purposeful) that they are not included in the final redshift catalogs. |
But will the public interpret it that way? Especially since some files do have units and some don't. |
Adding units to the files is a post-EDR activity for future specprods / releases. For the EDR Countdown dashboard I'm moving this into the optional category for adding caveats to the datamodel and/or the known issues section of desidatadocs. |
I have removed this ticket from EDR Countdown and opened desihub/desidatamodel#178 to document the specific issue of the presence or absence of units in existing files. |
The units are being dropped far earlier than redrock — even the original fibermap generated by Short term fix for DR1:
Long term fix for Y3 reprocessing:
The lowest-level place to fix this is in |
@sbailey, in terms of closing out the 'DR1 Patching' project, what is the status of this ticket? |
This is done for the purposes of DR1, but not yet for future productions, so I'll remove it from "DR1 Patching" and move it to "Jura". For DR1, units were added to the redshift catalogs at the end as part of running desi_zcatalog; for future productions we want this units propagation to be automatic through all intermediate files. Steps to check/fix:
Contributions to fix any of those steps would be valuable, even if someone else needs to dig deeper to fix another step. Note that anywhere that we use fitsio, it is for a specific purposeful reason, so we shouldn't fix this issue just by replacing fitsio with astropy.table.Table stuff. |
Additional note: I originally thought that we'd have assemble_fibermap output the correct units in the first place, and then have all subsequent read+write steps propagate those units. That might be technically messy due to astropy Tables vs. astropy BinTableHDUs vs. fitsio etc. We additionally need to support pipeline steps adding new columns including their units. An alternative to propagate+add would be to have every writer use desispec.io.fibermap.fibermap_columns to re-assert what the units should be and add those while writing, since that is likely what assemble_fibermap would be using in the first place for the original columns. This would also keep the units for all columns in a single place instead of spread over the N>1 steps that add columns. As belt-and-suspenders, readers could also complain if they read in a fibermap column with a claimed unit different from |
Linking to relevant issue #2171. |
Since we're working on |
The DESI data model claims that columns in
redrock
,zpix
,ztile
andzall
files have units, but in fact they do not. There are noTUNIT
keywords set in any of the files in the fuji zcatalog directory, and a spot check shows that they are not set in individual redrock files either.This also makes it much more difficult to propagate units even further downstream to VACs derived from these files.
For the purposes of EDR, I think the response has to be documentation: "even though the data model claims these columns have units, in fact, the files do not contain unit metadata."
All along we've been strongly encouraging VAC authors to put units in data files, but the pipeline is not actually doing that for some of the most critical outputs, so I think for the future, we should really be more careful about propagating units.
The text was updated successfully, but these errors were encountered: