Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DL2 data model and output #1673

Merged
merged 36 commits into from
May 19, 2021

Conversation

kosack
Copy link
Contributor

@kosack kosack commented Apr 6, 2021

This PR is the first of several changes needed to support the "stage 2" output (DL1→DL2) and also training output (files with a subset of DL1 + DL2 information in them). In general the user should see no main change here, ctapipe-stage1 still works as before; it will in the future be superceded (but not replaced) by a version that can optionally also generate DL2 info once this PR is merged.

The main changes are

  • containers: cleanups to event.dl2 to support telescope-wise (event.dl2.tel )and shower-wise (event.dl2.stereo) dl2 parameters
  • dl1writer: renamed DL1Writer to DataWriter, and now supports two new options write_stereo_shower and write_mono_shower, which generate the /dl2/event part of the data model
  • dl1eventsource: support reading files that happen to also have DL2 info in them (no change otherwise, still only reads DL1 info, since not sure it's needed to read DL2 eventwise)
  • io.tableio: added a TelListToMaskTransform (which somehow was missed in the previous refactoring of transformations). This is needed to turn a variable-length tel_id list (e.g. which telescopes were used in reconstruction) into a fixed-length bitmask, just like the trigger pattern.
  • various minor changes elsewhere to support the renaming of containers and DL1Writer.

DL2 output supports multiple reconstructions done at once. In the current DL2 data model it generates looks like this:
image

the Reference metadata are also extended to support multiple data levels , currently as a stringified list in JSON format, but it could be perhaps better as just a comma-separated list:

image

Data are currently split just like for DL1, but with two additional group nodes: quantity being reconstructed (energy, geometry, particle class), and the algorithm name.

Open questions:

misc

  • should we add DL2_MONO and DL2_STEREO as datalevels? It only makes sense for full reconstruction workflows, not training workflows (where you might want other subsets of DL2, like only the geometry and not the energy)
    • this should be determined later (See note below about data sub-levels)

Future Todos (for another PR):

  • Update ctapipe-merge to also merge DL2 info → deferred to another PR
  • Implement DL0 writing (quite simple to do now), to serve as a testbed for DL0 format

@kosack kosack requested a review from HealthyPear April 6, 2021 19:41
Copy link
Member

@HealthyPear HealthyPear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed together, at least for stereo I vouch for

/dl2/event/stereo

  • shower
  • energy
  • classification

and then under each of those the output of the algorithm(s).

For mono, I can't say much since I am not working on that, but I would say to minimize the divergencies and just use the same approach we have now but with only 1 tel_id

I left a couple of comments to start!

ctapipe/containers.py Show resolved Hide resolved
ctapipe/io/tests/test_datawriter.py Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 7, 2021

Codecov Report

Merging #1673 (08f56ed) into master (a68e357) will increase coverage by 0.05%.
The diff coverage is 95.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1673      +/-   ##
==========================================
+ Coverage   90.78%   90.83%   +0.05%     
==========================================
  Files         183      183              
  Lines       14197    14299     +102     
==========================================
+ Hits        12889    12989     +100     
- Misses       1308     1310       +2     
Impacted Files Coverage Δ
ctapipe/io/tests/test_metadata.py 100.00% <ø> (ø)
ctapipe/reco/impact.py 44.82% <0.00%> (ø)
ctapipe/reco/tests/test_ImPACT.py 64.93% <0.00%> (ø)
ctapipe/reco/reco_algorithms.py 91.66% <50.00%> (ø)
ctapipe/io/tableio.py 91.44% <82.60%> (-1.89%) ⬇️
ctapipe/io/datawriter.py 87.93% <95.94%> (ø)
ctapipe/containers.py 100.00% <100.00%> (ø)
ctapipe/io/__init__.py 100.00% <100.00%> (ø)
ctapipe/io/datalevels.py 100.00% <100.00%> (ø)
ctapipe/io/dl1eventsource.py 91.16% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a68e357...08f56ed. Read the comment docs.

@kosack
Copy link
Contributor Author

kosack commented Apr 7, 2021

Note that it would be quite easy to also add the ability to write DL0 data to this, and then we would have a nice testbed for what HDF5 DL0 data could look like.

@kosack kosack changed the title Refactor dl1writer to general DataWriter (needed for stage2) Refactor DL1Writer to general DataWriter (needed for stage2) Apr 7, 2021
@kosack kosack marked this pull request as ready for review April 7, 2021 17:07
@kosack
Copy link
Contributor Author

kosack commented Apr 9, 2021

the more I think about it, I think having the option to split the per-tel mono reconstructions into datasets by tel_id or tel_type would still be useful, even if it adds some overhead from having many more tables. First because it then mirrors how DL1 is organized, and second because it makes it easier to generate mono reconstruction before merging files by telescope (e.g. one has the option to do DL1+DL2_mono, merge, and then compute DL2_stereo). Of course that still is possible with flat mono tables, but it would require a sort() operation after appending to do the merge by telescope.

kosack and others added 8 commits May 13, 2021 22:37
…#1717)

* Allow regexp in table name for  TableWriter.exclude()

* updated docstring
- setup now uses regexps for exclusions and transforms, much simpler
- output is now split like DL1 for telescope/mono output
fix some style warnings and optimize imports

fix bug introduced in last commit (overwrote existing variable)

fixed a bunch of pyflakes warnings

- made HDF5TableWriter._h5file public
- optimized some imports
- changed some log statements to use lazy logging instead of f-strings
@kosack kosack force-pushed the refactor/dl1writer_to_datawriter branch from 009ac3b to 86043d0 Compare May 13, 2021 20:37
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

ctapipe/io/datawriter.py Outdated Show resolved Hide resolved
["R0", "R1", "DL0", "DL1", "DL2", "DL3", "DL4", "DL5", "DL6", "Other"], "Other"
data_category = Enum(["Sim", "A", "B", "C", "Other"], "Other")
data_level = List(
Enum(
Copy link
Member

@maxnoe maxnoe May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better just use the DataLevel enum: datalevel=List(Enum([level.name for level in DataLevel]))

@kosack kosack force-pushed the refactor/dl1writer_to_datawriter branch from 85a7cf7 to 08f56ed Compare May 16, 2021 13:32
@kosack kosack requested review from HealthyPear and maxnoe May 17, 2021 18:15
DL1_PARAMETERS = auto()
DL2 = auto()
DL3 = auto()
R0 = auto() # Raw data in camera or simulation format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you put it with a colon above, it actually works for the sphinx docs:

#: Raw data in ...
R0 = auto()

Copy link
Member

@maxnoe maxnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@maxnoe maxnoe merged commit 5a874ec into cta-observatory:master May 19, 2021
@kosack kosack deleted the refactor/dl1writer_to_datawriter branch May 19, 2021 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants