Binning and looping over data?
==============================
The pandas functions `cut` and `groupby` can help.
--------------------------------------------------

The trick is to store the bin information along with the other event information. 

In [1]:
import numpy as np
import pandas as pd

In [2]:
# storing the data with named columns means we can access a column in two ways:
# data.E_true
# data['E_true']
# both return the same pandas Series
data = pd.DataFrame({
    'E_true': [4, 4, 4, 4, 25, 25, 25, 25],
    'E_measured': [3.4, 4.2, 5.7, 1.1, 22, 20.9, 23.8, 27.9],
    'Yield': [0.4, 0.9, 1.3, 1.2, 1.4, 1.1, 0.8, 0.6],
})

For each event, we want to know what bin it's in
------------------------------------------------

The code below will sort each event into the user-specified bins based on E_true.

Saving the "which bin" information will turn out to be handy for analysis.

In [3]:
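# add a new column, 'range', holding the bin that each event's E_true falls into
# the bin edges [0, 10, 30] give the intervals (0, 10] and (10, 30], closed on the right
# note: real simulated data would usually not arrive already sorted like these "events"
data['range'] = pd.cut(data['E_true'], [0, 10, 30])
print(data)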

   E_true  E_measured  Yield     range
0       4         3.4    0.4   (0, 10]
1       4         4.2    0.9   (0, 10]
2       4         5.7    1.3   (0, 10]
3       4         1.1    1.2   (0, 10]
4      25        22.0    1.4  (10, 30]
5      25        20.9    1.1  (10, 30]
6      25        23.8    0.8  (10, 30]
7      25        27.9    0.6  (10, 30]
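
As a small sketch of how `cut` behaves on data that is not already sorted, the cell below bins a handful of made-up `E_true` values (the `events` frame and its values are hypothetical, not part of the data above). It assumes the default `right=True` behaviour, where each interval includes its upper edge.

```python
import pandas as pd

# hypothetical, unsorted E_true values (not taken from the data above)
events = pd.DataFrame({'E_true': [25, 4, 10, 4, 30, 25]})

# same bin edges as above; intervals are closed on the right by default,
# so 10 falls in (0, 10] and 30 falls in (10, 30]
events['range'] = pd.cut(events['E_true'], [0, 10, 30])

# count how many events landed in each bin
counts = events['range'].value_counts(sort=False)

print(events)
print(counts)
```

Row order does not matter to `cut`: each value is assigned to its bin independently, so the events can stay in whatever order they were generated.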


We can grab all the data in each bin using `groupby`
----------------------------------------------------

The `groupby` method returns an object we can iterate over. Each step yields one bin together with the rows that fall in it (here the bin information is stored in a column named 'range').

In the example below, `bin_name` is the bin itself and `bin_data` holds all the data for that bin.

In [4]:
# groupby yields (bin_name, bin_data) pairs, one pair per bin
# observed=True skips bins that contain no events (none are empty here)
for bin_name, bin_data in data.groupby('range', observed=True):
    # print the bin boundaries
    print (bin_name)
    
    # print the data in this bin
    print (bin_data)
    
    # grab the variables out of the data
    E = np.array(bin_data.E_measured)
    Er_true = np.array(bin_data.E_true)
    x = np.array(bin_data.Yield)    
    #print (E, Er_true, x)
    
    # now we're ready to do the rest of the stuff in the loop

(0, 10]
   E_true  E_measured  Yield    range
0       4         3.4    0.4  (0, 10]
1       4         4.2    0.9  (0, 10]
2       4         5.7    1.3  (0, 10]
3       4         1.1    1.2  (0, 10]
(10, 30]
   E_true  E_measured  Yield     range
4      25        22.0    1.4  (10, 30]
5      25        20.9    1.1  (10, 30]
6      25        23.8    0.8  (10, 30]
7      25        27.9    0.6  (10, 30]
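
The "rest of the stuff in the loop" depends on the analysis. As one possible sketch, the cell below computes the mean measured energy and mean yield in each bin. The `summary` dictionary and the choice of means are assumptions made for illustration, not something the original analysis specifies.

```python
import numpy as np
import pandas as pd

# rebuild the example data so this cell runs on its own
data = pd.DataFrame({
    'E_true': [4, 4, 4, 4, 25, 25, 25, 25],
    'E_measured': [3.4, 4.2, 5.7, 1.1, 22, 20.9, 23.8, 27.9],
    'Yield': [0.4, 0.9, 1.3, 1.2, 1.4, 1.1, 0.8, 0.6],
})
data['range'] = pd.cut(data['E_true'], [0, 10, 30])

# hypothetical per-bin analysis: mean measured energy and mean yield
summary = {}
for bin_name, bin_data in data.groupby('range', observed=True):
    E = bin_data['E_measured'].to_numpy()
    x = bin_data['Yield'].to_numpy()
    summary[str(bin_name)] = (np.mean(E), np.mean(x))

# print one line per bin: label, mean E_measured, mean Yield
for label, (mean_E, mean_yield) in summary.items():
    print(label, round(mean_E, 3), round(mean_yield, 3))
```

Keying the results by the bin label keeps each number tied to the bin it came from, which is the same idea as storing the bin alongside each event.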


`unique` is useful but should not be used for iteration
-------------------------------------------------------

In [5]:
# get the distinct values in a column
# for a column created by pd.cut, unique() returns a pandas Categorical of intervals
print (data.range.unique(), '\n')

# convert the bin labels to strings, e.g. for use as labels
label_arr = data.range.unique().astype(str)

print (label_arr[0])
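
One possible reason to prefer `groupby` for iteration (an interpretation, not something stated above) is ordering: `unique` lists values in the order they first appear in the data, while `groupby` walks the bins in their sorted order. The sketch below uses a made-up, unsorted `events` frame to show the difference.

```python
import pandas as pd

# hypothetical unsorted events: the high-energy bin appears first
events = pd.DataFrame({'E_true': [25, 4, 25, 4]})
events['range'] = pd.cut(events['E_true'], [0, 10, 30])

# unique() lists bins in order of first appearance in the data
unique_order = [str(b) for b in events['range'].unique()]

# groupby walks the bins in their sorted (category) order
groupby_order = [str(name) for name, _ in events.groupby('range', observed=True)]

print(unique_order)
print(groupby_order)
```

Iterating over `unique` would also mean selecting each bin's rows by hand, whereas `groupby` hands back the bin and its rows together.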