# Hax cut helpers example

Jelle Aalbers, September 2016

In [1]:
import numpy as np

import hax
from hax import cuts

# Put your own minitree paths here:
hax.init(minitree_paths=['/home/aalbers/minitrees', '/home/breur/minitrees'])

# Load some example data
data = hax.minitrees.load(2397)



# Basic selections

If you've worked with pandas before, you're probably used to applying selections like so:

In [2]:
data_cleaned = data[data['cs1'] > 0]

You could use this and forget about the cut helpers in `hax`.

With `hax.cuts`, you can use the `selection` function to do the same thing, with the bonus of a helpful printout:

In [3]:
data_cleaned = cuts.selection(data, data['cs1'] > 0, desc='cs1 positive')

cs1 positive selection: 2335 events removed (87.25% passed)


The returned data is still just an ordinary DataFrame:

In [12]:
data_cleaned.head()

|  | run_number | event_number | cs1 | cs2 | drift_time | largest_coincidence | largest_other_s1 | largest_other_s2 | largest_unknown | largest_veto | ... | s1_area_fraction_top | s1_range_50p_area | s2 | s2_area_fraction_top | s2_range_50p_area | x | y | z | event_duration | event_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2397 | 0 | 1222.30659 | 294135.478484 | 202420.0 | 0.0 | 0.0 | 476.97522 | 15.525992 | 0.0 | ... | 0.231636 | 59.035375 | 137789.578125 | 0.591118 | 860.407498 | 17.8302 | -35.286343 | -29.95816 | 2000000 | 1472143174364587770 |
| 1 | 2397 | 1 | 2070.467415 | 587224.379717 | 10280.0 | 0.0 | 0.0 | 4825.305176 | 15.848387 | 0.0 | ... | 0.311077 | 60.353428 | 565039.5625 | 0.475916 | 503.601973 | -17.580828 | -31.795113 | -1.52144 | 2000000 | 1472143175532110430 |
| 3 | 2397 | 3 | 4779.406797 | 964529.400183 | 57270.0 | 0.0 | 0.0 | 23758.634766 | 15.786177 | 0.0 | ... | 0.292347 | 65.360561 | 778285.5625 | 0.476345 | 689.031898 | -15.336466 | 29.301378 | -8.47596 | 2000000 | 1472143175792061000 |
| 4 | 2397 | 4 | 1410.203809 | 333759.620045 | 9800.0 | 0.0 | 7.264468 | 1478.398315 | 15.08042 | 0.0 | ... | 0.345447 | 55.567166 | 321728.46875 | 0.514536 | 384.873814 | 19.326441 | 34.288849 | -1.4504 | 2000000 | 1472143175827041280 |
| 5 | 2397 | 5 | 510.975157 | 266.900761 | 266360.0 | 0.0 | 0.0 | 80.707474 | 14.407209 | 0.0 | ... | 0.157664 | 54.571612 | 98.398926 | 0.580705 | 372.379972 | 8.852757 | -46.757519 | -39.42128 | 2000000 | 1472143176061278640 |
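
For intuition, the kind of printout that `selection` produces can be mimicked with plain pandas. This is only a minimal sketch under assumptions, not hax's actual implementation: the helper name `selection_with_report` and the toy values are hypothetical.

```python
import pandas as pd


def selection_with_report(df, mask, desc):
    """Keep the rows where mask is True and print passthrough info.

    Hypothetical helper for illustration only; not part of hax.
    """
    n_before = len(df)
    selected = df[mask]
    n_removed = n_before - len(selected)
    # Guard against an empty input DataFrame
    percent_passed = 100 * len(selected) / n_before if n_before else 0.0
    print("%s selection: %d events removed (%.2f%% passed)"
          % (desc, n_removed, percent_passed))
    return selected


# Toy data standing in for a minitree DataFrame
toy = pd.DataFrame({'cs1': [12.0, -1.0, 530.5, 0.0]})
toy_cleaned = selection_with_report(toy, toy['cs1'] > 0, desc='cs1 positive')
```

The result is again an ordinary DataFrame, so it can be passed on to further pandas operations.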


Shortcuts are available for simple cuts: `above`, `below`, `notnan`, `isfinite`:

In [4]:
data = cuts.above(data, 'cs1', 0)

cs1 above 0 selection: 2335 events removed (87.25% passed)


There's also a `cuts.cut` function, which removes the selected events rather than keeping them. Every function below with `selection` in its name has a counterpart with `cut` in its name.
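
In plain pandas terms, the difference between a selection and a cut is just an inverted mask. The sketch below uses a hypothetical toy DataFrame and does not call hax itself.

```python
import pandas as pd

# Toy data standing in for a minitree DataFrame
toy = pd.DataFrame({'cs1': [12.0, -1.0, 530.5, 0.0]})
mask = toy['cs1'] > 0

kept_by_selection = toy[mask]   # a selection keeps events where the mask is True
kept_by_cut = toy[~mask]        # a cut keeps events where the mask is False

# Together the two pieces account for every event exactly once
assert len(kept_by_selection) + len(kept_by_cut) == len(toy)
```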

# Range selections

You'll often want to require a variable to lie in a certain range. That's what `range_selection` is for:

In [5]:
z_slice = cuts.range_selection(data, 'z', (-20, -10))

z in [-20, -10) selection: 14407 events removed (9.82% passed)
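
The printout writes the range as `[-20, -10)`. Assuming this is the usual half-open interval notation (lower bound included, upper bound excluded), the equivalent plain pandas mask would look like the sketch below. The toy values are hypothetical.

```python
import pandas as pd

# Toy z positions, including values exactly on both bounds
toy = pd.DataFrame({'z': [-25.0, -20.0, -15.0, -10.0, -5.0]})
low, high = -20, -10

# Half-open interval: the lower bound is included, the upper bound is not
in_range = (toy['z'] >= low) & (toy['z'] < high)
z_slice_toy = toy[in_range]
```

Here the event at exactly -20 is kept, while the event at exactly -10 is removed.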


If you have several range selections, you can apply them in one go using `range_selections`:

In [6]:
first_quadrant = cuts.range_selections(data, ('x', (0, 60)), ('y', (0, 60)))

x in [0, 60) selection: 8132 events removed (49.10% passed)
y in [0, 60) selection: 3814 events removed (51.38% passed)
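
The percentages in this printout are consistent with the ranges being applied one after another, so the `y` percentage refers to the events that survived the `x` selection. A minimal pandas sketch of that sequential logic, with hypothetical toy data and assuming half-open ranges, could be:

```python
import pandas as pd

# Toy positions standing in for the x and y columns of a minitree DataFrame
toy = pd.DataFrame({'x': [10.0, -5.0, 30.0, 70.0, 20.0],
                    'y': [5.0, 15.0, -8.0, 12.0, 45.0]})
ranges = [('x', (0, 60)), ('y', (0, 60))]

remaining = toy
steps = []  # one (column, n_before, n_after) tuple per range
for column, (low, high) in ranges:
    n_before = len(remaining)
    in_range = (remaining[column] >= low) & (remaining[column] < high)
    remaining = remaining[in_range]
    steps.append((column, n_before, len(remaining)))
```

Each step starts from the events left over by the previous one, which is why `n_before` of the second step equals `n_after` of the first.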


# Cut history

`hax.cuts` keeps track of which cuts you've applied to your DataFrames. You can access this information via `cuts.history`:

In [7]:
cuts.history(first_quadrant)

|  | selection_desc | n_before | n_after | n_removed | fraction_passed | cumulative_fraction_left |
|---|---|---|---|---|---|---|
| 0 | cs1 above 0 | 18311 | 15976 | 2335 | 0.872481 | 0.872481 |
| 1 | x in [0, 60) | 15976 | 7844 | 8132 | 0.490986 | 0.428376 |
| 2 | y in [0, 60) | 7844 | 4030 | 3814 | 0.513768 | 0.220086 |


This dataframe lists the selections you applied, along with some information about each. It is especially useful if you have multiple dataframes with different cuts around:

In [8]:
cuts.history(z_slice)

|  | selection_desc | n_before | n_after | n_removed | fraction_passed | cumulative_fraction_left |
|---|---|---|---|---|---|---|
| 0 | cs1 above 0 | 18311 | 15976 | 2335 | 0.872481 | 0.872481 |
| 1 | z in [-20, -10) | 15976 | 1569 | 14407 | 0.09821 | 0.085686 |


Here's what the columns mean:
 - **selection_desc**: description of the selection, i.e. which events it selected.
 - **n_before**: number of events in the dataframe before the cut.
 - **n_after**: number of events in the dataframe after the cut.
 - **n_removed**: number of events removed by the cut.
 - **fraction_passed**: fraction of the events present before the cut that passed it.
 - **cumulative_fraction_left**: fraction of events from the *original* data (before any cuts) left after this cut.
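
As a worked check of these definitions, the derived columns can be recomputed from the event counts in the `first_quadrant` history shown earlier. This is plain Python arithmetic, not a call into hax.

```python
# Event counts taken from the cut history of first_quadrant
n_original = 18311
counts = [('cs1 above 0', 18311, 15976),
          ('x in [0, 60)', 15976, 7844),
          ('y in [0, 60)', 7844, 4030)]

rows = []
for desc, n_before, n_after in counts:
    rows.append({
        'selection_desc': desc,
        'n_removed': n_before - n_after,
        # Relative to the events present just before this cut
        'fraction_passed': n_after / n_before,
        # Relative to the original data, before any cuts
        'cumulative_fraction_left': n_after / n_original,
    })

for row in rows:
    print(row)
```

Note that `cumulative_fraction_left` is also the product of all `fraction_passed` values up to and including that row.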

There are some limitations to this cut history recording:
  * We can't register cuts you perform outside of `hax.cuts` (e.g. with the `data = data[mask]` syntax).
  * If you copy a dataframe, or save it to and load it from a file, this history is lost.

# Repeating cuts

Sometimes you accidentally repeat cuts (for example by running a cell twice). While this is harmless, it would print confusing passthrough info ("100% passed!") and clutter the cut history. For this reason, `hax.cuts` prevents you from applying a cut with the same description twice:

In [9]:
z_slice = cuts.range_selection(z_slice, 'z', (-20, -10))

z in [-20, -10) selection already performed on this data; cut skipped. Use force_repeat=True to repeat. Showing historical passthrough info.
z in [-20, -10) selection: 14407 events removed (9.82% passed)


In [10]:
cuts.history(z_slice)

|  | selection_desc | n_before | n_after | n_removed | fraction_passed | cumulative_fraction_left |
|---|---|---|---|---|---|---|
| 0 | cs1 above 0 | 18311 | 15976 | 2335 | 0.872481 | 0.872481 |
| 1 | z in [-20, -10) | 15976 | 1569 | 14407 | 0.09821 | 0.085686 |


Again, there are some limitations:
 * `hax.cuts` merely checks the description (`desc` argument) of the cut. If you provide your own descriptions, it will not protect you against repeating the same cut under a different name.
 * We can't warn you against overlapping cuts, e.g. first selecting [-20, -10) in z and then selecting [-15, -5).

If for some arcane reason you want to circumvent this protection altogether, pass the `force_repeat=True` argument to a selection/cut function.
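
To make the description-based check and its first limitation concrete, here is a toy sketch in plain pandas. It is not meant to mirror hax's internals: the helper `apply_once`, the `applied` list, and `in_slice` are hypothetical names introduced only for this illustration.

```python
import pandas as pd


def apply_once(df, mask, desc, applied, force_repeat=False):
    """Apply a selection unless one with this description was already applied.

    Hypothetical helper: `applied` is a plain list of descriptions that the
    caller keeps alongside the DataFrame.
    """
    if desc in applied and not force_repeat:
        print("%s selection already performed; skipped" % desc)
        return df
    applied.append(desc)
    return df[mask]


def in_slice(df):
    # Half-open z range [-20, -10)
    return (df['z'] >= -20) & (df['z'] < -10)


toy = pd.DataFrame({'z': [-25.0, -18.0, -12.0, -7.0]})
applied = []

toy = apply_once(toy, in_slice(toy), 'z in [-20, -10)', applied)
toy = apply_once(toy, in_slice(toy), 'z in [-20, -10)', applied)  # skipped
# The same cut under a different description is not recognized as a repeat
toy = apply_once(toy, in_slice(toy), 'my z slice', applied)
```

Only the description is compared, so the third call is applied and recorded even though it removes nothing.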