Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ctapipe-process #1726

Merged
merged 44 commits into from
Aug 2, 2021
Merged

Implement ctapipe-process #1726

merged 44 commits into from
Aug 2, 2021

Conversation

kosack
Copy link
Contributor

@kosack kosack commented May 24, 2021

This is intended to be the minimal change to implement DL2 processing in ctapipe. The interface is not perfect (see issues below), but is enough to get started - other issues can be solved in future PRs.

Features Added:

  • Renamed ctapipe-stage1ctapipe-process
  • added a ShowerProcessor to the event loop, so DL2 can optionally be computed and written
  • added example configurations for stage1 (DL1), stage2 (DL2), and training (DL1+DL2)
  • defined two internal properties: should_compute_dl1 and should_compute_dl2 where the logic for when (re-)computation should occur is placed (see note below). Right now these are set to very simplistic defaults: compute DL2 if either mono or stereo parameters are requested to be written, and compute DL1 if the user wants to write parameters or DL2 showers. This doesn't cover all useful cases yet.
ctapipe-process --config stage1.json --input events.simtel.gz --output event.DL1.h5
ctapipe-process --config stage2.json --input events.DL1.h5  --output events.DL2.h5
ctapipe-process --config training.json --input events.simtel.gz --output events.DL1DL2.h5

Minor issue on re-computation logic:

right now the logic for should_compute_dl1 is to always re-compute DL1_PARAMETERS if DL2 is requested and they do not already exist in the input file. It may be useful to add add an option like "force_recompute_parameters" in case the user wants to ensure they are re-computed even if they already exist (this can be a separate PR).

TODOs:

  • clean up recompute logic
  • add unit tests for each "configuration" (stage1, stage2, training) and for starting from both simtel and DL1 files.

to think about in future PRs:

config includes

Right now the config files have a lot of repetition. That would benefit from using a config system that allows "includes": which means either switching from JSON to Python-based config files (which is supported already by traitlets.config, but is slighly ugly in that every parameter needs to have it's full hierarchy specified, like c.ShowerProcessor.ShowerQualityQuery=...), or have a way to specify multiple --config options to a Tool, so that we can do for example:

ctapipe-process --config stage1_config.json --config cuts_config.json --input gammas.simtel.gz

EDIT: this is now in issue #1732

subarray selection

to do a full analysis at the DL2 level, one needs to select the appropriate subarray. This is only possible right now by using the --allowed-tels option, which is not so friendly. Allowing it to be selected by name from a file containing mappings (e.g see #1632) would make this much friendlier. e.g. add options like

--subarray-definition=prod5_subarrays.yaml --subarray="1LST3MST"

@kosack kosack marked this pull request as draft May 24, 2021 15:57
@kosack kosack added this to In progress in Reference Pipeline Tools May 24, 2021
@codecov
Copy link

codecov bot commented May 24, 2021

Codecov Report

Merging #1726 (1b88a42) into master (f92cbf7) will increase coverage by 0.05%.
The diff coverage is 96.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1726      +/-   ##
==========================================
+ Coverage   91.98%   92.03%   +0.05%     
==========================================
  Files         186      187       +1     
  Lines       14640    14683      +43     
==========================================
+ Hits        13466    13514      +48     
+ Misses       1174     1169       -5     
Impacted Files Coverage Δ
ctapipe/tools/dl1_merge.py 82.10% <ø> (ø)
ctapipe/tools/tests/test_process.py 95.12% <95.12%> (ø)
ctapipe/tools/process.py 95.45% <97.14%> (ø)
ctapipe/conftest.py 95.41% <100.00%> (ø)
ctapipe/core/tests/test_tool.py 100.00% <100.00%> (ø)
ctapipe/io/datawriter.py 88.07% <100.00%> (ø)
ctapipe/io/tests/test_dl1eventsource.py 98.68% <100.00%> (-0.05%) ⬇️
ctapipe/reco/shower_processor.py 100.00% <100.00%> (ø)
ctapipe/tools/tests/test_merge.py 100.00% <100.00%> (ø)
ctapipe/tools/tests/test_tools.py 100.00% <100.00%> (+2.58%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f92cbf7...1b88a42. Read the comment docs.

@kosack
Copy link
Contributor Author

kosack commented Jul 22, 2021

It should automatically transform that column into a fixed-length boolean array. @HealthyPear can you say what was the typo in your config that caused it to malfunction? We should give a better error if there is a typo...

@HealthyPear
Copy link
Member

HealthyPear commented Jul 22, 2021

It should automatically transform that column into a fixed-length boolean array. @HealthyPear can you say what was the typo in your config that caused it to malfunction? We should give a better error if there is a typo...

Nothing that you can really fix from ctapipe I think: as you can see I set the ellipticity limit to 0.1 two times instead of > 0.1 & < 0.6

@maxnoe
Copy link
Member

maxnoe commented Jul 22, 2021

It should automatically transform that column into a fixed-length boolean array.

That is not the case

can you say what was the typo in your config that caused it to malfunction? We should give a better error if there is a typo...

Not really something you can catch, the image quality query was defined so that no events could ever survive.

@kosack
Copy link
Contributor Author

kosack commented Jul 22, 2021

It should automatically transform that column into a fixed-length boolean array.

That is not the case

It should be, but I guess only if there is a first event to make it initialize (with no events, you're correct we can't really check for it)

writer.add_column_transform_regexp(
table_regexp="dl2/event/subarray/.*",
col_regexp="tel_ids",
transform=tr_tel_list_to_mask,
)

@maxnoe
Copy link
Member

maxnoe commented Jul 22, 2021

It should be, but I guess only if there is a first event to make it initialize (with no events, you're correct we can't really check for it)

I think it is because it is None, not an empty list. It probably should be an empty list.

@HealthyPear
Copy link
Member

I noticed that /dl1/monitoring/telescope/pointing/tel_XXX doesn't have refernce to either of ["obs_id", "tel_id", "event_id"].

In case of simulated data it's OK I guess, but since we are anyway preparing for real data, shouldn't we add such columns that table anyway?

@maxnoe
Copy link
Member

maxnoe commented Jul 30, 2021

In case of simulated data it's OK I guess, but since we are anyway preparing for real data, shouldn't we add such columns that table anyway?

There are several issues connected to this:

TL:DR the current format is actually ideal for observations (where timestamps are unique and we have pointing information every couple of seconds that needs to interpolated to the timestamp of each event) but is is very suboptimal for the simulations.

@HealthyPear HealthyPear mentioned this pull request Jul 30, 2021
3 tasks
@maxnoe maxnoe merged commit cdd2fe5 into master Aug 2, 2021
@maxnoe maxnoe deleted the feature/processortool branch August 2, 2021 08:10
@HealthyPear HealthyPear moved this from In progress to Done in Reference Pipeline Tools Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants