Add a Selector class for keeping track of quality cuts #1207

kosack · 2019-12-16T18:31:14Z

This replaces the older utils.CutFlow with a new Component class that can be configured from a config file. It is a class that can apply a series of quality criteria to an object, and keep track of totals (e.g. for cuts that happen per-event, before data are written to an output table) - it doesn't reject events, rather it just keeps track of which criteria pass. The totals can also be written to an output file, as an astropy.Table.

It's used like so:

criteria = [
    ('criteria_1',  'lambda x: (x>2.0) and (x < 100)') ,
    ('another_check', 'lambda x: x < 3'), 
]
select = QualityQuery(quality_criteria=criteria) # can also be configured from a config file

# assuming the input is always the same type, we just call select(input) for each item:
for ii in range(100):
    result = query(ii) # result is an array of bools for each criterion
    if all(result == True):
         # do stuff

print(select.to_table()) # shows the total results as a table, with both individual and cumulative product sums.

Table output (from the unit test)

In [11]: query.to_table(functions=True)
Out[11]:
<Table length=4>
      criteria       counts cumulative_counts        func
       str20         int64        int64             str17
-------------------- ------ ----------------- -----------------
               TOTAL      4                 4    lambda x: True
         high_enough      3                 3   lambda x: x > 3
a_value_not_too_high      3                 2 lambda x: x < 100
            smallish      2                 1  lambda x: x < 10

You can also for example store the results in an output table with HDF5TableWriter, as a column of flags, for each event.
Note for example, you could pass it a Container or something if you wanted and have criteria like:

[
    ('good_min_width': 'lambda hillas: hillas.width > 0.2'),
    ('good_min_length': 'lambda hillas: hillas.length >1.0)'
]

An example can be found in #1163 , where it's used to select good images, configured as follows in a JSON file:.

  "ImageQualityQuery": {
        "quality_criteria": [
            ["enough_pixels", "lambda im: np.count_nonzero(im) > 2"],
            ["enough_charge", "lambda im: im.sum() > 100"]
        ]
    }

Caveats:

The name 'Selector' may be confusing since there already exists a GainSelector that is a different thing altogether (suggestions for a better name, or keep this one? I had originally called it 'Checker') → Renamed to QualityQuery
Currently the critieria are stored as a dict, but there is a strong assumption on order (for the cumulative product), so perhaps this is not a good idea if the data will be written to e.g. a YAML file in the future, where dicts don't preserve order (in pyyaml at least). Perhaps a list of tuples would be better? or a list of dicts?

issue a FutureWarning

codecov · 2019-12-16T18:46:44Z

Codecov Report

Merging #1207 into master will increase coverage by 0.39%.
The diff coverage is 96.90%.

@@            Coverage Diff             @@
##           master    #1207      +/-   ##
==========================================
+ Coverage   86.24%   86.64%   +0.39%     
==========================================
  Files         185      187       +2     
  Lines       11600    11759     +159     
==========================================
+ Hits        10004    10188     +184     
+ Misses       1596     1571      -25

Impacted Files	Coverage Δ
ctapipe/core/tests/test_qualityquery.py	`95.65% <95.65%> (ø)`
ctapipe/core/qualityquery.py	`97.72% <97.72%> (ø)`
ctapipe/core/__init__.py	`100.00% <100.00%> (ø)`
ctapipe/utils/CutFlow.py	`95.08% <100.00%> (+0.16%)`	⬆️
ctapipe/tools/display_dl1.py	`86.27% <0.00%> (-3.83%)`	⬇️
ctapipe/core/traits.py	`94.52% <0.00%> (-3.14%)`	⬇️
ctapipe/visualization/mpl_array.py	`93.80% <0.00%> (-2.28%)`	⬇️
ctapipe/visualization/mpl_camera.py	`80.00% <0.00%> (-0.45%)`	⬇️
ctapipe/image/muon/muon_diagnostic_plots.py	`11.18% <0.00%> (-0.08%)`	⬇️
ctapipe/io/simteleventsource.py	`97.72% <0.00%> (-0.06%)`	⬇️
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3171e67...52706dc. Read the comment docs.

ctapipe/core/selector.py

…nto feature/selector

kosack · 2020-03-12T14:20:36Z

to mitigate security issue of using eval here, see #1163 (comment)

ctapipe/utils/CutFlow.py

maxnoe · 2020-03-27T12:31:48Z

@kosack the cutflow tests fail now

ctapipe/core/selector.py

ensure only np.X and u.X functions and builtins can be used in the eval() of each selection function (Statements like import are already excluded by eval()).

since dicts lose their order when converted to/from JSON, YAML, etc, we have to use a list instead, since order matters

kosack · 2020-03-27T14:19:28Z

I've now made the security changes requested, and also switched to using a list of tuples of (criteria_description, selection_function) for configuration.

One further simplification could be to remove the criteria_description, so you just have the function string, and no more detail. Any preference whether or not we keep the descriptive name or not? The name was just there to make it easy for somebody to understand why the cut was there, but for most cuts I guess the function string would be simple enough.

maxnoe

I like the description, I would leave it in

watsonjj · 2020-03-27T14:42:25Z

The name 'Selector' may be confusing since there already exists a GainSelector that is a different thing altogether (suggestions for a better name, or keep this one? I had originally called it 'Checker')

This reminds me of pandas.DataFrame.query. So how about the class being called Query? It does not seem like this class is doing any selecting itself...it is instead just querying the data.

kosack · 2020-03-27T15:00:04Z

This reminds me of pandas.DataFrame.query. So how about the class being called Query? It does not seem like this class is doing any selecting itself...it is instead just querying the data.

That's not a bad idea - yes, it does not drop anything, that's still up to the user, it just returns an array of which criteria passed and keeps a running total. Originally I think I called it DataChecker, but didn't really like that term (but maybe it was ok). Query may be too generic a term since there will be other queries, potentially like for databases. Perhaps QualityQuery (so you'd have things like ImageQualityQuery in the config file)? The purpose is to evaluate a set of criteria on the same object (event, images, etc).

It's only useful for event-wise analysis, of course, anything operating on all events at once like a DataFrame would just use frame.query() as you suggest. It's also useful for storing some quality info in the case the images are not recorded (so you can still check why for example there are no hillas parameters for some events, even if you don't have the original image).

In a standard analysis there would be one of these for:

checking if images in an event should be parameterized (just to avoid errors, or to reduce DL1 data size, but perhaps it's not a good idea to make such cuts and store all events?)
checking if telescopes in an event should be used in reconstruction
checking if an event should be reconstructed at all

If we switch to "chunked" analysis, where we write out DL1 tables and reconstruct them whole, then this is better done another way. So it's just a stop-gap until then and could eventually be dropped.

kosack added 5 commits December 16, 2019 18:18

add Selector class

15a34af

warn that utils.CutFlow is deprecated

806331e

issue a FutureWarning

check that order is preserved

6438ac5

reformat

e6c5142

update docstring

4363af6

added more checks on selection function

11c3b29

maxnoe reviewed Dec 17, 2019

View reviewed changes

ctapipe/core/selector.py Outdated Show resolved Hide resolved

kosack added 6 commits December 17, 2019 17:36

add docstring

1db7442

use collections.abc.Callable

2269af2

fix some warnings

34beefb

add rule for glossaries

36f12be

delete accidental commit

6a6df52

Merge branch 'master' of https://github.com/cta-observatory/ctapipe i…

5c9bacd

…nto feature/selector

kosack mentioned this pull request Mar 3, 2020

DL1 writer tool (ctapipe-stage1-process) #1163

Merged

maxnoe reviewed Mar 19, 2020

View reviewed changes

ctapipe/utils/CutFlow.py Show resolved Hide resolved

maxnoe reviewed Mar 27, 2020

View reviewed changes

ctapipe/core/selector.py Outdated Show resolved Hide resolved

kosack added 2 commits March 27, 2020 14:32

add some security

292b08d

ensure only np.X and u.X functions and builtins can be used in the eval() of each selection function (Statements like import are already excluded by eval()).

change to use a list rather than dict

68d5af7

since dicts lose their order when converted to/from JSON, YAML, etc, we have to use a list instead, since order matters

maxnoe previously approved these changes Mar 27, 2020

View reviewed changes

add some docstrings

4d01dba

kosack dismissed maxnoe’s stale review via 4d01dba March 27, 2020 14:35

fixed bug in CutFlow (deprecated)

48247d7

kosack added 2 commits March 27, 2020 16:26

renamed Selector -> QualityQuery

e7b029f

add to __init__ and API docs

52706dc

watsonjj approved these changes Mar 30, 2020

View reviewed changes

maxnoe approved these changes Mar 30, 2020

View reviewed changes

kosack merged commit c7832e3 into cta-observatory:master Mar 30, 2020

kosack deleted the feature/selector branch October 2, 2020 09:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a Selector class for keeping track of quality cuts #1207

Add a Selector class for keeping track of quality cuts #1207

kosack commented Dec 16, 2019 •

edited

Loading

codecov bot commented Dec 16, 2019 •

edited

Loading

kosack commented Mar 12, 2020

maxnoe commented Mar 27, 2020

kosack commented Mar 27, 2020 •

edited

Loading

maxnoe left a comment

watsonjj commented Mar 27, 2020

kosack commented Mar 27, 2020 •

edited

Loading

Add a Selector class for keeping track of quality cuts #1207

Add a Selector class for keeping track of quality cuts #1207

Conversation

kosack commented Dec 16, 2019 • edited Loading

Caveats:

codecov bot commented Dec 16, 2019 • edited Loading

Codecov Report

kosack commented Mar 12, 2020

maxnoe commented Mar 27, 2020

kosack commented Mar 27, 2020 • edited Loading

maxnoe left a comment

Choose a reason for hiding this comment

watsonjj commented Mar 27, 2020

kosack commented Mar 27, 2020 • edited Loading

kosack commented Dec 16, 2019 •

edited

Loading

codecov bot commented Dec 16, 2019 •

edited

Loading

kosack commented Mar 27, 2020 •

edited

Loading

kosack commented Mar 27, 2020 •

edited

Loading