Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use signal_fraction for training particle classifier #2465

Merged
merged 8 commits into from Feb 23, 2024

Conversation

LukasBeiske
Copy link
Contributor

@LukasBeiske LukasBeiske commented Nov 20, 2023

This removes the n_signal and n_background options of ctapipe-train-particle-classifier. Instead the total number of training events n_events and the signal_fraction can be chosen, where
$$\texttt{signal\_fraction}= \frac{n_s}{n_s + n_b}.$$
If n_events is not specified, as many events as possible will be used considering the given signal_fraction.

Copy link

codecov bot commented Nov 20, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9b2088d) 92.53% compared to head (3e2e736) 92.53%.

Files Patch % Lines
src/ctapipe/tools/train_particle_classifier.py 92.85% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2465   +/-   ##
=======================================
  Coverage   92.53%   92.53%           
=======================================
  Files         235      235           
  Lines       20024    20062   +38     
=======================================
+ Hits        18529    18565   +36     
- Misses       1495     1497    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@LukasBeiske
Copy link
Contributor Author

LukasBeiske commented Nov 22, 2023

I'm not sure why this provenance test is failing right now. It works on my machine...

Edit: Fixed by #2469

Tobychev
Tobychev previously approved these changes Nov 22, 2023
Tobychev
Tobychev previously approved these changes Nov 23, 2023
Tobychev
Tobychev previously approved these changes Dec 14, 2023
# - [type, "LST*", 1000]
# - [type, "MST*", 1000]
# - [type, "MST*", 1000] # If not specified, as many events as possible are used.
signal_fraction: 0.5 # signal_fraction = n_signal / n_events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it more intuitive I think to define the ratio?

I.e. 1.0 means use as much signal as background?

Copy link
Contributor

@Tobychev Tobychev Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think signal fraction being how much signal is in the total is pretty intuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are merits to both options:

  • signal fraction in combination with n_events makes it a little bit more intuitive how many signal and how many background events are used, but we write out this information in the logs anyway.
  • signal ratio is maybe a bit closer to actual use cases, e.g. "Let's use twice as many signal events than background events".

I don't have a personal preference.

@maxnoe maxnoe merged commit d739701 into cta-observatory:main Feb 23, 2024
13 of 14 checks passed
@LukasBeiske LukasBeiske deleted the gh_fraction branch February 23, 2024 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants