DL2 data model and output #1673
…nto refactor/dl1writer_to_datawriter
As we discussed together, at least for stereo I vouch for
/dl2/event/stereo
- shower
- energy
- classification
and then under each of those the output of the algorithm(s).
For mono, I can't say much since I am not working on it, but I would minimize the divergences and use the same approach we have now, just with a single tel_id.
I left a couple of comments to start!
Codecov Report
@@ Coverage Diff @@
## master #1673 +/- ##
==========================================
+ Coverage 90.78% 90.83% +0.05%
==========================================
Files 183 183
Lines 14197 14299 +102
==========================================
+ Hits 12889 12989 +100
- Misses 1308 1310 +2
Note that it would be quite easy to also add the ability to write DL0 data to this, and then we would have a nice testbed for what HDF5 DL0 data could look like.
The more I think about it, the more I think that having the option to split the per-tel mono reconstructions into datasets by tel_id or tel_type would still be useful, even if it adds some overhead from having many more tables. First, it then mirrors how DL1 is organized; second, it makes it easier to generate mono reconstruction before merging files by telescope (e.g. one has the option to do DL1+DL2_mono, merge, and then compute DL2_stereo). Of course that is still possible with flat mono tables, but it would require a …
…nto refactor/dl1writer_to_datawriter
…#1717) * Allow regexp in table name for TableWriter.exclude() * updated docstring
- setup now uses regexps for exclusions and transforms, much simpler
- output is now split like DL1 for telescope/mono output
fix some style warnings and optimize imports
fix bug introduced in last commit (overwrote existing variable)
fixed a bunch of pyflakes warnings
- made HDF5TableWriter._h5file public
- optimized some imports
- changed some log statements to use lazy logging instead of f-strings
009ac3b to 86043d0
ctapipe/io/metadata.py
Outdated
["R0", "R1", "DL0", "DL1", "DL2", "DL3", "DL4", "DL5", "DL6", "Other"], "Other"
data_category = Enum(["Sim", "A", "B", "C", "Other"], "Other")
data_level = List(
Enum(
Better to just use the `DataLevel` enum: `datalevel=List(Enum([level.name for level in DataLevel]))`
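A minimal standard-library sketch of the idea behind this suggestion: the list of allowed level names is derived from the enum instead of being typed out by hand. The `DataLevel` class below is a trimmed stand-in with only a few illustrative members, and `validate_levels` is a hypothetical helper, not part of ctapipe or traitlets. In the PR itself the derived list would be handed to the traitlets `Enum` as shown in the comment.

```python
from enum import Enum, auto


class DataLevel(Enum):
    """Trimmed stand-in for the real enum (assumption: illustrative members only)."""

    R0 = auto()
    R1 = auto()
    DL0 = auto()
    DL1_PARAMETERS = auto()
    DL2 = auto()
    DL3 = auto()


# Derive the allowed names from the enum so the two cannot drift apart.
ALLOWED_LEVELS = [level.name for level in DataLevel]


def validate_levels(levels):
    """Hypothetical helper: reject any name that is not a DataLevel member."""
    unknown = [name for name in levels if name not in ALLOWED_LEVELS]
    if unknown:
        raise ValueError(f"unknown data level(s): {unknown}")
    return list(levels)
```

Adding a member to the enum then automatically widens the set of accepted metadata values, which is the point of the review comment.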
85a7cf7 to 08f56ed
DL1_PARAMETERS = auto()
DL2 = auto()
DL3 = auto()
R0 = auto() # Raw data in camera or simulation format
I think if you put it with a colon above, it actually works for the sphinx docs:
#: Raw data in ...
R0 = auto()
Looks good to me.
This PR is the first of several changes needed to support the "stage 2" output (DL1→DL2) and also training output (files with a subset of DL1 + DL2 information in them). In general the user should see no major change here: ctapipe-stage1 still works as before. Once this PR is merged, it will in the future be superseded (but not replaced) by a version that can optionally also generate DL2 info.
The main changes are:
- `event.dl2` now supports telescope-wise (`event.dl2.tel`) and shower-wise (`event.dl2.stereo`) DL2 parameters.
- `DL1Writer` is renamed to `DataWriter`, which now supports two new options, `write_stereo_shower` and `write_mono_shower`, that generate the `/dl2/event` part of the data model.
- `TelListToMaskTransform` is included (it was somehow missed in the previous refactoring of transformations). This is needed to turn a variable-length tel_id list (e.g. which telescopes were used in reconstruction) into a fixed-length bitmask, just like the trigger pattern.

The DL2 output supports multiple reconstructions done at once. The DL2 data model it currently generates looks like this:
![image](https://user-images.githubusercontent.com/11677812/118020081-022f1280-b35a-11eb-8418-36a413ee2b3e.png)
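The tel_id list to mask conversion mentioned in the list of changes can be sketched in a few lines. This is only an illustration of the technique using plain Python lists; it is not the `TelListToMaskTransform` code from this PR, and the function names and the example array layout are assumptions.

```python
def tel_list_to_mask(tel_ids, all_tel_ids):
    """Turn a variable-length list of tel_ids into a fixed-length boolean mask.

    The mask has one entry per telescope in ``all_tel_ids`` (same order),
    so every event produces a column of the same width.
    """
    used = set(tel_ids)
    return [tel_id in used for tel_id in all_tel_ids]


def mask_to_tel_list(mask, all_tel_ids):
    """Inverse operation: recover the tel_id list from the mask."""
    return [tel_id for tel_id, flag in zip(all_tel_ids, mask) if flag]


# Hypothetical four-telescope array; telescopes 2 and 4 took part in the reconstruction.
ARRAY_TEL_IDS = [1, 2, 3, 4]
mask = tel_list_to_mask([2, 4], ARRAY_TEL_IDS)
```

A fixed-width mask is what makes the column storable in a regular table, since a list whose length changes from event to event does not fit a fixed row layout.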
The Reference metadata are also extended to support multiple data levels, currently as a stringified list in JSON format, though a plain comma-separated list might be better.
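For comparison, the two encodings of a multi-level value look like this. It is a sketch with a made-up pair of levels, not the actual metadata-writing code; both forms round-trip as long as level names contain no commas.

```python
import json

# Hypothetical example: a file that holds both DL1 and DL2 data.
levels = ["DL1", "DL2"]

# Option 1: stringified list in JSON format.
as_json = json.dumps(levels)

# Option 2: plain comma-separated list.
as_csv = ",".join(levels)

# Reading either one back yields the original list.
from_json = json.loads(as_json)
from_csv = as_csv.split(",")
```

The comma-separated form is easier to read in a header dump, while the JSON form needs no special handling if a value ever contains a comma.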
Data are currently split just like for DL1, but with two additional group nodes: quantity being reconstructed (energy, geometry, particle class), and the algorithm name.
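To make the nesting concrete, here is a small sketch that builds table paths with the two extra group nodes. The group names (`stereo`, `mono`), the `tel_NNN` suffix, and the algorithm name `ExampleReconstructor` are assumptions chosen for illustration; the layout actually written by `DataWriter` is the one shown in the image.

```python
def dl2_table_path(kind, quantity, algorithm, tel_id=None):
    """Build a hypothetical DL2 table path.

    kind      : "stereo" or "mono" (assumed group names)
    quantity  : what is reconstructed, e.g. "geometry" or "energy"
    algorithm : name of the algorithm that produced the output
    tel_id    : required for mono output, which is split per telescope like DL1
    """
    path = f"/dl2/event/{kind}/{quantity}/{algorithm}"
    if kind == "mono":
        if tel_id is None:
            raise ValueError("mono output needs a tel_id")
        path += f"/tel_{tel_id:03d}"
    return path


stereo_path = dl2_table_path("stereo", "geometry", "ExampleReconstructor")
mono_path = dl2_table_path("mono", "energy", "ExampleReconstructor", tel_id=4)
```

Because the algorithm name is part of the path, several reconstructions of the same quantity can sit side by side in one file without colliding.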
Open questions:
misc
Future Todos (for another PR):