Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update raw data model based on spectrograph tests #73

Merged
merged 23 commits into from Apr 4, 2019

Conversation

weaverba137
Copy link
Member

This PR still needs some work, and very careful review. Do not merge yet!

In addition to updating the data model for raw data files, this PR:

There are several open questions about the data model that are written directly in the desi-EXPID.rst file.

@weaverba137 weaverba137 added the WIP Work in Progress label Feb 26, 2019
@weaverba137 weaverba137 self-assigned this Feb 26, 2019
Copy link
Contributor

@sbailey sbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments inline. High level issue is that the FV tests are writing a per-spectrograph format which is very similar but slightly different to the final multi-spectrograph format. Let's keep this documenting the final multi-spectrograph format and see if we can get ICS to make the single-spectrograph format a simple subset of that instead of a different format.

Number EXTNAME Type Contents
================= ========= ======== ====================================
HDU0_ SP0 IMAGE Blank except for header keywords
HDU1_ SPECTCON0 BINTABLE Telemetry data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until the FV tests started, I hadn't realized that Klaus has one format for single spectrograph readouts and a different but closely related format for multi-spectrograph readouts. Let's keep this datamodel file focused on what the final multi-spectrograph format should be, and then work with ICS to see if they would be willing to make the single-spectrograph case a simple subset of that (instead of a slightly different format).

Single spectrograph Multi-spectrograph pixsims (so far)
HDU0 = SP0 .. SP9 HDU0 = SPS HDU0 = (no EXTNAME)
HDU1 = SPECTCON0 .. SPECTCON9 last HDU = SPECCONS not in pixsims
filename sp{n}-*.fits.fz filename desi-*.fits.fz filename desi-*.fits.fz

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this data model should be focused on the final multi-spectrograph file. But what I have to work with to get all the details of the header keywords are the single spectrograph files. One does what one must...

doc/DESI_SPECTRO_DATA/NIGHT/EXPID/desi-EXPID.rst Outdated Show resolved Hide resolved
doc/DESI_SPECTRO_DATA/NIGHT/EXPID/desi-EXPID.rst Outdated Show resolved Hide resolved
DATASUM 0 str data unit checksum updated 2019-02-20T21:27:57
TELRA 335.03 float Telescope pointing RA [degrees] **MISSING**
TELDEC 19.88 float Telescope pointing dec [degrees] **MISSING**
AIRMASS 1.0 float Airmass at middle of exposure **MISSING**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a data model structure for documenting optional header keywords? e.g. TELRA, TELDEC should be present for on-sky FLAVOR=SCIENCE exposures, but doesn't apply to dome flats or test-slit exposures of single spectrographs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about that. I think the current verification code treats all non-trivial keywords as mandatory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only does the verification code treat all non-trivial keywords as mandatory, the order of the keywords also matters, because the easiest way to compare the file to the model is one-to-one. We're pushing the limits of the .rst format as it is. Let's punt optional keywords to a separate issue.

doc/DESI_SPECTRO_DATA/NIGHT/EXPID/desi-EXPID.rst Outdated Show resolved Hide resolved
* The compressed HDUs in the "sp0" files contain ``ZSIMPLE`` keyword. This would
be appropriate in a compressed *primary* HDU but not in a compressed extension.
Make sure that the images are actually compressed *as extensions*, not as
individual images that are then shoved into an HDU.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to have notes here, but we'll likely need a different way to converge with Klaus/ICS about issues like this (i.e. don't expect Klaus to be reading the datamodel and responding...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

@weaverba137
Copy link
Member Author

@sbailey, I'd like to get this finished before too much time passes, however, I've made as much progress as I can using the spN files, and I don't see any recent, realistic desi- files in DESI_SPECTRO_DATA. Is it time to have a telecon with Klaus?

@sbailey
Copy link
Contributor

sbailey commented Apr 3, 2019

Thanks for remembering this dangling update.

/project/projectdirs/desi/spectro/data/20190306/00001209/desi-00001209.fits.fz is an example file from spectrograph functional verification. It has 2 spectrographs of data in the merged format output by ICS. That particular case is a dark exposure, and includes features such as

  • not all spectrographs exist (ok)
  • spectrograph HDUs can be in any order (ok)

Other examples NIGHT/EXPID combos listed at https://desi.lbl.gov/trac/wiki/SpectrographsFunctionalVerification/SP2/DataSummary .

Please update the datamodel to match that file, while still keeping the MISSING tags for keywords we were expecting.

For datamodel validation, ok to punt the concept of optional keywords to a future update. We can also punt the order of the keywords issue to a future update (though I really would like to support differently ordered keywords unless there is a compelling data reason why the keywords must be in order, not just validation convenience).

@weaverba137
Copy link
Member Author

OK, thanx. When you say that spectrograph HDUs can be in any order, is that going to be a permanent thing, or just for these tests? I anticipate that validating HDUs in an arbitrary order might be somewhat more difficult than validating keywords in arbitrary order within one HDU.

@sbailey
Copy link
Contributor

sbailey commented Apr 3, 2019

Spectrograph HDUs in any order is a permanent feature, which is one of the reasons why people should access HDUs by name rather than number. For the validator, it means we need to treat these like comparing two dictionaries (with arbitrary order keys, even if they are the same set) instead of two lists (with a guaranteed order).

This is also true for offline pipeline results: we only have one serial writer per file so they happen to always be in the same order, but we don't guarantee the order and reserve the right to change that order or add an extra debugging/optional HDU and it shouldn't break people's code if they are using names instead of numbers to access HDUs.

The raw data is a particularly tricky case for an automated validator, since SPECTCONS should always exist in the raw data (though we don't yet have it in the pixsims), and at least one of the [BRZ][0-9] HDUs should exist, but which set of those is completely arbitrary. Feel free to spin this off to a future ticket.

For the purposes of this PR, let's just get the datamodel to match the files as-written by ICS, while flagging any keywords that we were expecting but don't see. Thanks.

@weaverba137
Copy link
Member Author

OK. IF we can really enforce that every HDU (OK, not necessarily the first) has an EXTNAME, then that would make the problem simpler to solve.

@sbailey
Copy link
Contributor

sbailey commented Apr 3, 2019

See issue #69 for EXTNAME policy. HDU > 1 shall have an EXTNAME, and HDU 0 is a special case depending upon whether it has meaningful data and/or header keywords. Klaus is also a fan of using EXTNAME instead of number, so I think we can rely upon that for both ICS-generated data and pipeline-generated data.

@weaverba137
Copy link
Member Author

Yes, I know about the policy, but what progress have we made throughout the entire DESI code base, not just the raw data, on enforcing the policy?

@sbailey
Copy link
Contributor

sbailey commented Apr 3, 2019

I think we have nearly 100% compliance for the "shall" clauses, and decent compliance with the "should" clauses for HDU 0. The only exceptions to the "shall" clauses that I know about are some desimodel data files with a single binary table HDU 1 (like footprint/desi-tiles.fits). I.e. I'm fine with the datamodel validator complaining about cases that don't match the policy and then us fixing any cases we find (e.g. adding an EXTNAME when we update the tiles file with Eddie's new dither pattern). In fact I thought the validator was already doing that after PR #39 addressing issue #36.

@weaverba137
Copy link
Member Author

Yes, the validator is already doing that, but if we are going to discard the idea that HDUs are in a particular order, then EXTNAME is an absolute necessity. Just like comparing header keywords within an HDU by dictionary keys, we would also need to compare entire HDUs by dictionary keys, and EXTNAME is that key.

@daqiii
Copy link

daqiii commented Apr 3, 2019 via email

@weaverba137
Copy link
Member Author

@daqiii, I think we're good in terms of the raw data. It's the files used and/or produced in other parts of the pipeline that I'm concerned about. And I think we're mostly OK there as well, but this has to apply to all DESI data files, so I have to ask.

@weaverba137
Copy link
Member Author

I have updated the desi-EXPID.rst file as best as I can given 20190306/00001209/desi-00001209.fits.fz. I have added more details to the problems section. Please read the whole file carefully.

@weaverba137
Copy link
Member Author

@sbailey, should I leave this open until all the problems are addressed?

@sbailey
Copy link
Contributor

sbailey commented Apr 4, 2019

Rather than try to describe in English my suggested changes to this branch before merging, I opened PR #75 with a specific suggestion to document all cameras. I recognize that this might cause the validator to generate a bunch of warnings about missing HDUs, but ok. The previous example file was 20190306/00001209/desi-00001209.fits.fz with spectrographs 0 and 2 configured, but other exposures taken near that time had just spectrograph 0 or just spectrograph 2, so this current "spectrograph-tests" branch (documenting B0,R0,Z0,B2,R2,Z2) would only cleanly validate on a subset of recent exposures anyway.

We don't need to resolve full validator flexibility before merging this PR, but I would like to have at least:

  • documenting that all cameras B0..Z9 could exist, in any order and subset combination
  • ok if the validator complains about missing HDUs when run on a recent file, as long as it doesn't outright crash from me getting the doc syntax wrong.
  • ok to wait for future branches/PRs for checking HDUs, keywords, and columns by name instead of a specific order (issue Compare by key, not by list. #74)

sbailey and others added 2 commits April 4, 2019 15:55
@weaverba137 weaverba137 merged commit 72cf1ca into master Apr 4, 2019
@weaverba137 weaverba137 deleted the spectrograph-tests branch April 4, 2019 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in Progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants