Add a Selector class for keeping track of quality cuts #1207
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #1207      +/-   ##
==========================================
+ Coverage   86.24%   86.64%    +0.39%
==========================================
  Files         185      187        +2
  Lines       11600    11759      +159
==========================================
+ Hits        10004    10188      +184
+ Misses       1596     1571       -25
==========================================
```

Continue to review the full report at Codecov.
To mitigate the security issue of using `eval` here, see #1163 (comment).
@kosack the cutflow tests fail now.
Ensure that only `np.X` and `u.X` functions and builtins can be used in the `eval()` of each selection function (statements like `import` are already excluded by `eval()`).
Since dicts lose their order when converted to/from JSON, YAML, etc., we have to use a list instead, since order matters.
I've now made the security changes requested, and also switched to using a list of `(criteria_description, selection_function)` tuples for configuration. One further simplification could be to remove the `criteria_description`, so you just have the function string and no more detail. Any preference on whether we keep the descriptive name? The name was just there to make it easy for somebody to understand why the cut was there, but for most cuts I guess the function string would be simple enough.
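A hedged sketch of the resulting configuration shape (the criteria names and expressions below are illustrative, not taken from the PR):

```python
import numpy as np

# Ordered list of (criteria_description, selection_function) tuples;
# a list is used rather than a dict so the order survives serialization.
quality_criteria = [
    ("finite value", "lambda x: np.isfinite(x)"),
    ("positive value", "lambda x: x > 0"),
]

# Compile each expression with restricted globals (no builtins).
selection_functions = [
    (description, eval(expression, {"np": np, "__builtins__": {}}))
    for description, expression in quality_criteria
]

passed = [(desc, bool(func(-1.0))) for desc, func in selection_functions]
# passed == [("finite value", True), ("positive value", False)]
```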
I like the description; I would leave it in.
This reminds me of `pandas.DataFrame.query`. So how about the class being called
That's not a bad idea. Yes, it does not drop anything; that's still up to the user. It just returns an array of which criteria passed and keeps a running total. Originally I think I called it 'Checker'. It's only useful for event-wise analysis, of course, not for anything operating on all events at once. In a standard analysis there would be one of these for each stage where per-event cuts are applied.

If we switch to "chunked" analysis, where we write out DL1 tables and reconstruct them whole, then this is better done another way. So it's just a stop-gap until then and could eventually be dropped.
This replaces the older `utils.CutFlow` with a new Component class that can be configured from a config file. It is a class that can apply a series of quality criteria to an object and keep track of totals (e.g. for cuts that happen per-event, before data are written to an output table). It doesn't reject events; rather, it just keeps track of which criteria pass. The totals can also be written to an output file, as an `astropy.Table`.

It's used like so:
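The original usage snippet did not survive extraction; below is a self-contained sketch of the pattern described. The criteria names and event values are hypothetical, and the cumulative bookkeeping mirrors the "cumulative product" mentioned in the caveats:

```python
import numpy as np

# Hypothetical criteria; evaluated in order for each event.
criteria = [
    ("enough pixels", "lambda n: n >= 3"),
    ("bright enough", "lambda n: n >= 10"),
]

funcs = [eval(expr, {"np": np, "__builtins__": {}}) for _, expr in criteria]
counts = np.zeros(len(criteria), dtype=int)      # per-criterion pass counts
cumulative = np.zeros(len(criteria), dtype=int)  # passed this AND all earlier
n_input = 0

for n_pixels in [2, 5, 12, 50]:  # stand-in event stream
    n_input += 1
    results = np.array([bool(f(n_pixels)) for f in funcs])
    counts += results
    cumulative += np.cumprod(results)

# n_input == 4, counts == [3, 2], cumulative == [3, 2]
```

The per-call boolean array is what the caller uses; the totals accumulate for the summary table.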
Table output (from the unit test)
You can also, for example, store the results in an output table with `HDF5TableWriter`, as a column of flags for each event.
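One way to encode per-event results as a single flags column is bit-packing. This sketch is independent of `HDF5TableWriter` itself (the writer call is omitted), and the encoding is an assumption for illustration, not the PR's implementation:

```python
import numpy as np

# Boolean criteria results for two events (columns = criteria, in order).
results_per_event = np.array([
    [True, True],    # event 0: passed both criteria
    [True, False],   # event 1: failed the second criterion
])

# Pack each row into one integer flag: bit i is set if criterion i passed.
flags = np.packbits(results_per_event, axis=1, bitorder="little")[:, 0]
# flags == [3, 1]
```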
Note that you could, for example, pass it a `Container` if you wanted and have criteria like:
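For instance, criteria could reference attributes of the passed object (a hypothetical sketch; the field names below are illustrative, not from the PR):

```python
from collections import namedtuple

# Lightweight stand-in for a ctapipe Container with parameter fields.
Params = namedtuple("Params", ["width", "length"])

criteria = [
    ("wide enough", "lambda p: p.width > 0.1"),
    ("long enough", "lambda p: p.length > 0.2"),
]

funcs = [eval(expr, {"__builtins__": {}}) for _, expr in criteria]
results = [bool(f(Params(width=0.3, length=0.05))) for f in funcs]
# results == [True, False]
```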
An example can be found in #1163, where it's used to select good images, configured as follows in a JSON file:
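The JSON block itself did not survive extraction; here is a hypothetical example of the shape such a configuration might take (the keys and expressions are illustrative, not reproduced from #1163):

```python
import json

config_text = """
{
    "ImageSelector": {
        "quality_criteria": [
            ["enough pixels", "lambda im: np.count_nonzero(im) > 3"],
            ["bright enough", "lambda im: np.sum(im) > 100"]
        ]
    }
}
"""

config = json.loads(config_text)
criteria = config["ImageSelector"]["quality_criteria"]
# Note: JSON has no tuple type, so the (description, function) tuples
# round-trip as two-element lists.
```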
Caveats:
- The name 'Selector' may be confusing, since there already exists a `GainSelector` that is a different thing altogether (suggestions for a better name, or keep this one? I had originally called it 'Checker'). → Renamed to `QualityQuery`.
- Currently the criteria are stored as a `dict`, but there is a strong assumption on order (for the cumulative product), so perhaps this is not a good idea if the data will be written to e.g. a YAML file in the future, where dicts don't preserve order (in pyyaml at least). Perhaps a list of tuples would be better? Or a list of dicts?