
Input DL2 format #5

Closed
HealthyPear opened this issue May 20, 2020 · 8 comments · Fixed by #36
Labels
input/output Format and file extensions of the input/output data.
Comments

@HealthyPear
Member

At the start of the project the DL2 input format was that of protopipe.
The final format to be read by pyirf should be the one provided by ctapipe.

For reference, here are some of the relevant issues/PRs present there:

Of course that work is still ongoing, so this is just an initial direction.

@HealthyPear HealthyPear added the input/output Format and file extensions of the input/output data. label May 20, 2020
@HealthyPear HealthyPear added this to To do in Next release May 20, 2020
@HealthyPear HealthyPear added this to the 0.1.0 milestone May 20, 2020
@HealthyPear
Member Author

Temporary input formats to be supported:

@kosack

kosack commented Jun 17, 2020

DL2 is fortunately much simpler than DL1, especially since you only need the DL2/Event/Subarray information (a single table with shower + discrimination parameters). It will essentially look like the DL3 format, only in HDF5 and possibly with more than one shower reconstruction, so we'll need to think of some conventions for that.

I'd suggest designing the software similarly to IACT-Tools (or whatever the FACT one is called), where the columns you use as input are just a dict somewhere, so you can easily adapt to different formats.
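A minimal sketch of that pattern (all column names here are made-up placeholders, not aict-tools' or pyirf's actual ones): the input column names live in a single mapping, so adapting to a different DL2 format only means editing the mapping, not the analysis code.

```python
# Hypothetical column mapping: internal quantity name -> column name in the file.
COLUMN_MAP = {
    "reco_energy": "gamma_energy_prediction",
    "gh_score": "gamma_score",
    "true_energy": "mc_energy",
}


def load_events(table, column_map=COLUMN_MAP):
    """Return a dict of plain arrays keyed by the internal names.

    `table` is anything column-indexable (a pandas DataFrame,
    an astropy Table, or a plain dict of arrays).
    """
    return {internal: table[external] for internal, external in column_map.items()}
```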

One question from me: will this package do all of the parts of Stage 3 of the pipeline, or just the IRF parts? By that I mean: there is a first step to divide events into reconstruction classes and choose which reconstruction to use for each event (if multiple are included at once). It's not clear whether that is part of this or not. If not, then what you really take as input is not DL2/Event but DL3/Event (where the final reconstruction has been chosen, event quality classification has been applied, and gamma/hadron discrimination has been applied).

@HealthyPear
Member Author

One question from me: will this package do all of the parts of Stage 3 of the pipeline, or just the IRF parts? By that I mean: there is a first step to divide events into reconstruction classes and choose which reconstruction to use for each event (if multiple are included at once). It's not clear whether that is part of this or not. If not, then what you really take as input is not DL2/Event but DL3/Event (where the final reconstruction has been chosen, event quality classification has been applied, and gamma/hadron discrimination has been applied).

The package is meant to do what protopipe.perf does (even though in that case there are a lot of simplifications).

Then, if it really has to be an independent tool (even from ctapipe), I guess we can discuss the part in which cuts are applied.
In my (maybe limited) view, the event classes or quality levels should be defined generically (like the definitions of the single IRFs), so it makes sense that this part is also done generically (if the purpose of pyirf is to be used by other IACT facilities outside CTA as well).

@HealthyPear
Member Author

DL2 is fortunately much simpler than DL1, especially since you only need the DL2/Event/Subarray information (a single table with shower + discrimination parameters). It will essentially look like the DL3 format, only in HDF5 and possibly with more than one shower reconstruction, so we'll need to think of some conventions for that.

If the purpose of pyirf is to be used by CTA and other related instruments, it is obvious that there will not be a single data format in the end.
Otherwise, the current plan remains unchanged.

I'd suggest designing the software similar to how IACT-Tools (or whatever the FACT one is called), where the columns you use as input are just a dict somewhere, so you can easily adapt to different formats.

Could you point this to me? I am afraid I am not familiar with it...

@maxnoe
Member

maxnoe commented Jun 17, 2020

If the purpose of pyirf is to be used by CTA and other related instruments, it is obvious that there will not be a single data format in the end.

I don't think pyirf should concern itself too much with input file formats.

Provide library functions that take plain arrays of the needed quantities, plus some example scripts showing how to use these functions with file format X.
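A toy illustration of that design (this is not pyirf's actual API; the function name and formula are placeholders): the library function operates on plain numbers and arrays and knows nothing about files, while a short format-specific script does all the reading.

```python
# Library side: format-agnostic, takes plain values.
def effective_area(n_selected, n_simulated, area_simulated):
    """Surviving fraction of simulated events times the simulated area."""
    return area_simulated * n_selected / n_simulated


# Example-script side (sketch): only this part knows about file format X, e.g.
#   events = pd.read_hdf("dl2.h5", "events")   # hypothetical file and key
#   aeff = effective_area(len(events), n_simulated, area_simulated)
```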

Could you point this to me? I am afraid I am not familiar with it...

https://github.com/fact-project/aict-tools

@HealthyPear
Member Author

I don't think pyirf should concern itself too much with input file formats.

Provide library functions that take plain arrays of the needed quantities, plus some example scripts showing how to use these functions with file format X.

But if the same variable in some other DL2 file (not produced with a ctapipe-based pipeline) has a different name, doesn't this have to be coded into pyirf?

https://github.com/fact-project/aict-tools

Thank you!

@maxnoe
Member

maxnoe commented Jun 17, 2020

But if the same variable in some other DL2 file (not produced with a ctapipe-based pipeline) has a different name, doesn't this have to be coded into pyirf?

No, if you only provide library functions, it is up to the user to decide where each quantity is read from:

```python
calculate_sensitivity(
    e_est=my_df['gamma_energy_prediction'].to_numpy(),
    ...
)
```

If you want to provide command line utilities that take input files, the story is different.
Then we need to think about what to support and how flexible it should be.

But even in that case, we could get away with fits + hdf5 (maybe root with uproot) and some configuration values for column names.
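A hypothetical sketch of that CLI idea (not existing pyirf code; file keys and the dispatch logic are assumptions): dispatch on the file extension for FITS vs. HDF5, and take the column names from configuration.

```python
from pathlib import Path


def read_events(path, column_map):
    """Read a DL2 event table and rename columns via configuration.

    `column_map` maps internal quantity names to the column names
    used in the input file (e.g. loaded from a YAML config).
    """
    name = Path(path).name
    if name.endswith((".fits", ".fits.gz")):
        from astropy.table import Table  # assumed available
        table = Table.read(path)
    elif name.endswith((".h5", ".hdf5")):
        import pandas as pd  # assumed available; "events" key is an assumption
        table = pd.read_hdf(path, key="events")
    else:
        raise ValueError(f"unsupported input format: {name}")
    return {internal: table[external] for internal, external in column_map.items()}
```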

@HealthyPear HealthyPear mentioned this issue Sep 3, 2020
@HealthyPear HealthyPear linked a pull request Sep 26, 2020 that will close this issue
@HealthyPear
Member Author

As per #36, any translation from a specific input format to pyirf's internal data format is now left to the specific pipeline that imports pyirf's functions.

Next release automation moved this from To do to Done Sep 27, 2020