
Add multiplicity cut #15

Closed
HealthyPear opened this issue Jul 30, 2020 · 13 comments
Labels
enhancement New feature or request

@HealthyPear
Member

Description:

This cut should be applied to DL2 data and should target the number of telescopes that were used to successfully reconstruct the event.
Both CTA-MARS and EventDisplay do this, but in different ways (see this document), so this should be a function which, depending on an argument read from the configuration file, applies the chosen method.

Notes:

Since we used protopipe.perf as the basis for the new DL3 generator pyirf, it is best to implement this feature directly there.
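As an illustration only, the cut described above could be sketched roughly as below. This is not the actual pyirf API: the function name `apply_multiplicity_cut`, the `method` values, and the event-table layout (a dict of equally long numpy arrays with a "multiplicity" column) are all hypothetical placeholders, and the real CTA-MARS/EventDisplay definitions would replace the simple global threshold shown here.

```python
import numpy as np

def apply_multiplicity_cut(events, method="global", min_multiplicity=4):
    # Hypothetical sketch: select events by telescope multiplicity.
    # "method" would be read from the configuration file and select
    # between the CTA-MARS-like and EventDisplay-like definitions;
    # only a simple global threshold is implemented here.
    if method == "global":
        mask = events["multiplicity"] >= min_multiplicity
    else:
        raise ValueError(f"unknown multiplicity-cut method: {method!r}")
    # apply the same boolean mask to every column of the event table
    return {name: column[mask] for name, column in events.items()}

# toy DL2-like table (made-up values for illustration)
events = {
    "multiplicity": np.array([2, 3, 4, 5, 6]),
    "energy_tev": np.array([0.1, 0.5, 1.0, 2.0, 5.0]),
}
selected = apply_multiplicity_cut(events, method="global", min_multiplicity=4)
```

With the toy table above, only the three events with multiplicity >= 4 survive the cut.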

@HealthyPear HealthyPear transferred this issue from cta-observatory/protopipe Sep 9, 2020
@HealthyPear HealthyPear added the enhancement New feature or request label Sep 9, 2020
@HealthyPear HealthyPear mentioned this issue Sep 9, 2020
6 tasks
@HealthyPear HealthyPear added this to To do in Next release via automation Sep 9, 2020
@HealthyPear HealthyPear added this to the 0.1.0 milestone Sep 9, 2020
@HealthyPear
Member Author

IMPORTANT

It seems that EventDisplay DL2 data (at least a gamma_onSource.S.3HB9-FD_ID0.eff-0 file) has the multiplicity cut already applied (i.e. no event with multiplicity less than 4 is present).

This means that we cannot effectively test this cut in pyirf with this version of the files, and we would need them to be produced with all multiplicities.

I tag @GernotMaier and @TarekHC here for reference.

@TarekHC

TarekHC commented Sep 18, 2020

@HealthyPear: Quick and very naive question: why do you need multiplicities less than 4 to test pyirf? If those multiplicities were not used to produce the public performance IRFs, then why do you need them?

In any case I guess we could eventually add them. @GernotMaier I understand 2-telescope events may be more problematic, but why are 3-telescope events not present in the files?

@HealthyPear
Member Author

HealthyPear commented Sep 18, 2020

@HealthyPear: Quick and very naive question: why do you need multiplicities less than 4 to test pyirf? If those multiplicities were not used to produce the public performance IRFs, then why do you need them?

This would be a good point if we were only trying to reproduce EventDisplay's current IRFs. Meanwhile, however, the pipelines' algorithms, arrays, etc. are evolving, so in general we need uncut DL2 data on which we can optimize quantities like the telescope multiplicity ourselves, since these (again, in general) depend on the science case or on the custom needs of an observational proposal.

So, to better answer your question: for the near-future steps (#4) we don't need it, but later we should be as general as possible.

@vuillaut
Member

If there are flat cuts applied a priori, it should be sufficient to know them and apply the same cuts for the comparison.

@HealthyPear
Member Author

If there are flat cuts applied a priori, it should be sufficient to know them and apply the same cuts for the comparison.

This is the problem (at least as I see it): we cannot apply it, because it has already been applied before the data enter pyirf.
Thus, we cannot check that we apply it correctly when we apply it to other, uncut data.

@vuillaut
Member

Sorry, I'll try to be more clear.
There are two things to achieve:

  1. get similar (ideally the same) IRFs with pyIRF as with EventDisplay, starting from the same event list.
    For that, you don't care (you don't even need to know) that some pre-selection cuts have been applied, because they have certainly been applied in both pipelines. That will validate pyIRF.
  2. get similar IRFs with (pyIRF + ctapipe) as with EventDisplay, starting from the same RAW data. That would validate the whole pipeline. In this case, we do indeed need to know whether there were pre-selection cuts. But we are not there yet imo.

Or I am missing something?

@HealthyPear
Member Author

Sorry, I'll try to be more clear.
There are two things to achieve:

1. get similar (ideally the same) IRFs with pyIRF as with EventDisplay, starting from the same event list.
   For that, you don't care (you don't even need to know) that some pre-selection cuts have been applied, because they have certainly been applied in both pipelines. That will validate pyIRF.

Yes

2. get similar IRFs with (pyIRF + ctapipe) as with EventDisplay, starting from the same RAW data. That would validate the whole pipeline. In this case, we do indeed need to know whether there were pre-selection cuts. But we are not there yet imo.

Yes, but since we are about to refactor/rewrite, it is better to remember that we need to check for (or even explicitly require) any pre-selection cuts.

@HealthyPear
Member Author

Sorry, it is likely that we are saying the same thing: these specific files are sufficient. I just wanted to report this, since this issue is about adding a cut that doesn't exist yet in the code.

@vuillaut
Member

Yes, but since we are about to refactor/rewrite, it is better to remember that we need to check for (or even explicitly require) any pre-selection cuts.

Ok, but this is not an issue for pyIRF imo; it is rather one for the pipeline producing the DL2 data.
There are other pre-selection cuts (e.g. cleaning) that are applied. If you want a pipeline-to-pipeline comparison, then you need to know and apply the exact same pre-selection cuts in both of them.

@TarekHC

TarekHC commented Sep 20, 2020

My comment was coming mainly from the bold, capitalized "IMPORTANT". I just wanted to understand why it is that important.

To test pyIRF by cross-checking the EventDisplay IRFs (which would be very relevant!), which is what I originally thought was the objective of producing these tables, I don't think adding those multiplicities is that important.

For other uses it may be, but I just wanted to understand it... So no pressure. Let's see what Gernot thinks.

@maxnoe
Member

maxnoe commented Sep 20, 2020

I also don't think this is relevant. If the problem is testing whether a multiplicity cut in pyirf works, we can just test a harder cut than what is present in the current files; all hell would have to break loose if the code broke for multiplicity > 2 but not for multiplicity > 4.
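This point can be illustrated with a tiny sketch (the multiplicity values and the helper `multiplicity_cut` are made up for illustration, not taken from the real files): since the existing files only contain events with multiplicity >= 4, a cut at a harder threshold still exercises the same selection logic.

```python
import numpy as np

# Toy multiplicities mimicking a file where the >= 4 cut was already
# applied upstream (made-up values).
multiplicity = np.array([4, 4, 5, 6, 7, 8])

def multiplicity_cut(mult, threshold):
    # keep only events seen by at least `threshold` telescopes
    return mult[mult >= threshold]

# A cut at the pre-applied threshold removes nothing...
assert len(multiplicity_cut(multiplicity, 4)) == len(multiplicity)
# ...while a harder cut still removes events, so the logic is exercised.
assert len(multiplicity_cut(multiplicity, 6)) == 3
```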

@GernotMaier
Collaborator

Being late here: I apply fixed multiplicity cuts in the analysis. This is non-ideal, as the multiplicity should obviously be one of the optimisation parameters, but I simply cannot change this at this point without putting a bit of work into some code changes.

But this also means that all the training of the g/h separators and the cut-optimisation steps are done after the application of the multiplicity cuts. I will prepare files with the other multiplicities, but the performance estimates will be different (as Tarek said: for smaller multiplicities, the precision is lower).

@HealthyPear
Member Author

Since this cut is as simple as doing e.g.

p["events"] = p["events"][p["events"]["multiplicity"] >= 4]

I will close this issue
