DL2 data model and output #1673

kosack · 2021-04-06T19:40:59Z

This PR is the first of several changes needed to support the "stage 2" output (DL1→DL2) and also training output (files with a subset of DL1 + DL2 information in them). In general the user should see no main change here, ctapipe-stage1 still works as before; it will in the future be superceded (but not replaced) by a version that can optionally also generate DL2 info once this PR is merged.

The main changes are

containers: cleanups to event.dl2 to support telescope-wise (event.dl2.tel )and shower-wise (event.dl2.stereo) dl2 parameters
dl1writer: renamed DL1Writer to DataWriter, and now supports two new options write_stereo_shower and write_mono_shower, which generate the /dl2/event part of the data model
dl1eventsource: support reading files that happen to also have DL2 info in them (no change otherwise, still only reads DL1 info, since not sure it's needed to read DL2 eventwise)
io.tableio: added a TelListToMaskTransform (which somehow was missed in the previous refactoring of transformations). This is needed to turn a variable-length tel_id list (e.g. which telescopes were used in reconstruction) into a fixed-length bitmask, just like the trigger pattern.
various minor changes elsewhere to support the renaming of containers and DL1Writer.

DL2 output supports multiple reconstructions done at once. In the current DL2 data model it generates looks like this:

the Reference metadata are also extended to support multiple data levels , currently as a stringified list in JSON format, but it could be perhaps better as just a comma-separated list:

Data are currently split just like for DL1, but with two additional group nodes: quantity being reconstructed (energy, geometry, particle class), and the algorithm name.

Open questions:

misc

should we add DL2_MONO and DL2_STEREO as datalevels? It only makes sense for full reconstruction workflows, not training workflows (where you might want other subsets of DL2, like only the geometry and not the energy)
- → this should be determined later (See note below about data sub-levels)

Future Todos (for another PR):

Update ctapipe-merge to also merge DL2 info → deferred to another PR
Implement DL0 writing (quite simple to do now), to serve as a testbed for DL0 format

…nto refactor/dl1writer_to_datawriter

HealthyPear

As we discussed together, at least for stereo I vouch for

/dl2/event/stereo

shower
energy
classification

and then under each of those the output of the algorithm(s).

For mono, I can't say much since I am not working on that, but I would say to minimize the divergencies and just use the same approach we have now but with only 1 tel_id

I left a couple of comments to start!

ctapipe/containers.py

ctapipe/io/tests/test_datawriter.py

codecov · 2021-04-07T09:35:49Z

Codecov Report

Merging #1673 (08f56ed) into master (a68e357) will increase coverage by 0.05%.
The diff coverage is 95.65%.

@@            Coverage Diff             @@
##           master    #1673      +/-   ##
==========================================
+ Coverage   90.78%   90.83%   +0.05%     
==========================================
  Files         183      183              
  Lines       14197    14299     +102     
==========================================
+ Hits        12889    12989     +100     
- Misses       1308     1310       +2

Impacted Files	Coverage Δ
ctapipe/io/tests/test_metadata.py	`100.00% <ø> (ø)`
ctapipe/reco/impact.py	`44.82% <0.00%> (ø)`
ctapipe/reco/tests/test_ImPACT.py	`64.93% <0.00%> (ø)`
ctapipe/reco/reco_algorithms.py	`91.66% <50.00%> (ø)`
ctapipe/io/tableio.py	`91.44% <82.60%> (-1.89%)`	⬇️
ctapipe/io/datawriter.py	`87.93% <95.94%> (ø)`
ctapipe/containers.py	`100.00% <100.00%> (ø)`
ctapipe/io/__init__.py	`100.00% <100.00%> (ø)`
ctapipe/io/datalevels.py	`100.00% <100.00%> (ø)`
ctapipe/io/dl1eventsource.py	`91.16% <100.00%> (ø)`
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a68e357...08f56ed. Read the comment docs.

kosack · 2021-04-07T15:24:15Z

Note that it would be quite easy to also add the ability to write DL0 data to this, and then we would have a nice testbed for what HDF5 DL0 data could look like.

kosack · 2021-04-09T09:01:09Z

the more I think about it, I think having the option to split the per-tel mono reconstructions into datasets by tel_id or tel_type would still be useful, even if it adds some overhead from having many more tables. First because it then mirrors how DL1 is organized, and second because it makes it easier to generate mono reconstruction before merging files by telescope (e.g. one has the option to do DL1+DL2_mono, merge, and then compute DL2_stereo). Of course that still is possible with flat mono tables, but it would require a sort() operation after appending to do the merge by telescope.

…nto refactor/dl1writer_to_datawriter

…#1717) * Allow regexp in table name for TableWriter.exclude() * updated docstring

- setup now uses regexps for exclusions and transforms, much simpler - output is now split like DL1 for telescope/mono output

fix some style warnings and optimize imports fix bug introduced in last commit (overwrote existing variable) fixed a bunch of pyflakes warnings - made HDF5TableWriter._h5file public - optimized some imports - changed some log statements to use lazy logging instead of f-strings

… (cta-observatory#1720)

review-notebook-app · 2021-05-13T22:32:16Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ctapipe/io/datawriter.py

ctapipe/io/hdf5tableio.py

maxnoe · 2021-05-14T09:54:39Z

ctapipe/io/metadata.py

-        ["R0", "R1", "DL0", "DL1", "DL2", "DL3", "DL4", "DL5", "DL6", "Other"], "Other"
+    data_category = Enum(["Sim", "A", "B", "C", "Other"], "Other")
+    data_level = List(
+        Enum(


Better just use the DataLevel enum: datalevel=List(Enum([level.name for level in DataLevel]))

maxnoe · 2021-05-19T16:00:18Z

ctapipe/io/datalevels.py

-    DL1_PARAMETERS = auto()
-    DL2 = auto()
-    DL3 = auto()
+    R0 = auto()  # Raw data in camera or simulation format


I think if you put it with a colon above, it actually works for the sphinx docs:

#: Raw data in ... R0 = auto()

maxnoe

Looks good to me.

kosack added 2 commits April 6, 2021 21:23

clean up DL2 containers, adding mono reco

164d36e

refactor DL1Writer to DataWriter

3bcf0a6

kosack requested a review from HealthyPear April 6, 2021 19:41

more general loop over reconstruction containers

35ecb7e

HealthyPear mentioned this pull request Apr 7, 2021

Refactoring of the pipeline (TRAINING) cta-observatory/protopipe#110

Closed

9 tasks

kosack added 4 commits April 7, 2021 10:36

allow multiple data levels in reference metadata headers

b7f89d8

Merge branch 'master' of https://github.com/cta-observatory/ctapipe i…

000f6db

…nto refactor/dl1writer_to_datawriter

update docs

e2a0d3f

minor doc correction

3484a06

HealthyPear requested changes Apr 7, 2021

View reviewed changes

ctapipe/containers.py Show resolved Hide resolved

ctapipe/io/tests/test_datawriter.py Show resolved Hide resolved

kosack added 3 commits April 7, 2021 11:23

update __all__

93b9846

fix metadata tests (now data level requires list)

6c23a30

fix metadata

1cb55a0

kosack added 6 commits April 7, 2021 11:46

doc fix

166080e

another doc fix

db79283

renamed output_data_level to datalevels as in EventSource

132448b

fix test

aa58531

change group names to telescope and subarray to match other data models

0567f64

also test classification

a5de494

kosack changed the title ~~Refactor dl1writer to general DataWriter (needed for stage2)~~ Refactor DL1Writer to general DataWriter (needed for stage2) Apr 7, 2021

kosack added 2 commits April 7, 2021 18:55

added Transform for variable list of tel ids to fixed-length mask

8391d5e

properly transform the tel_ids list to a mask

92de75e

kosack marked this pull request as ready for review April 7, 2021 17:07

kosack added 2 commits April 27, 2021 14:18

Merge branch 'master' of https://github.com/cta-observatory/ctapipe i…

455976a

…nto refactor/dl1writer_to_datawriter

Fix test after merge

14bd56e

This was referenced Apr 28, 2021

Implement a shower processor #1675

Merged

implements add_column_transform_regexp feature #1690

Merged

kosack and others added 8 commits May 13, 2021 22:37

Allow regexp in table name for TableWriter.exclude() (cta-observatory…

fe70894

…#1717) * Allow regexp in table name for TableWriter.exclude() * updated docstring

clean up output and setup

d47b8f8

- setup now uses regexps for exclusions and transforms, much simpler - output is now split like DL1 for telescope/mono output

fixed doc reference to old DL1Writer

497415b

fixed broken link

8cb7983

more doc fixes

22f498a

Precompile table regex for column exclusion, fixes cta-observatory#1719…

d45855d

… (cta-observatory#1720)

changed metadata process id to str for merged runs

86043d0

kosack force-pushed the refactor/dl1writer_to_datawriter branch from 009ac3b to 86043d0 Compare May 13, 2021 20:37

fix notebook

4b09f79

optimize imports

b576126