## Coffea Histograms
In addition to handling the nanoAOD input, coffea provides a histogram format with several features that are useful for our needs.  One of the most useful is the ability to aggregate data when using a coffea processor (which you'll use later).

### Pyplot histograms
The histograms you used previously were made with matplotlib.  These are useful for quickly making a histogram from numpy arrays.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
#make histograms of three random normal distributions
p1 = np.random.normal(130,10,400)
p2 = np.random.normal(100,25,5000)
p3 = np.random.normal(80,100,2000)
plt.hist(p1,bins=np.linspace(0,200,41),histtype='step',label='Process 1')
plt.hist(p2,bins=np.linspace(0,200,41),histtype='step',label='Process 2')
plt.hist(p3,bins=np.linspace(0,200,41),histtype='step',label='Process 3')
plt.legend();

These can be thought of as representing some property of three different datasets: Process 1, Process 2, and Process 3.

Each of the histograms has the same x-axis (binned from 0 to 200, with a bin width of 5).

In this case, the histograms are only made when you draw them, and cannot be updated afterwards (e.g. by adding more data).
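The limitation lies in the plotting call rather than in the binned counts themselves: when the bin edges are shared, counts from separate batches of data simply add. The following is a minimal numpy sketch of that idea, not coffea's implementation; the batch sizes, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np

# Shared bin edges: 41 edges from 0 to 200 give 40 bins of width 5
edges = np.linspace(0, 200, 41)

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is reproducible
batch1 = rng.normal(100, 25, 3000)
batch2 = rng.normal(100, 25, 2000)

# np.histogram returns the bin counts without drawing anything
counts1, _ = np.histogram(batch1, bins=edges)
counts2, _ = np.histogram(batch2, bins=edges)

# Because the edges match, adding the counts is the same as
# histogramming both batches together
combined = counts1 + counts2
all_at_once, _ = np.histogram(np.concatenate([batch1, batch2]), bins=edges)

print(np.array_equal(combined, all_at_once))
```

This kind of bin-by-bin addition is what makes it possible to aggregate histograms that were filled from different chunks of data.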

### Coffea histograms

Histograms can be made in coffea in a similar way, but with a few key differences.

First, coffea hist objects need to be defined before they are filled (similar to a histogram in ROOT).  This makes it possible to add more data to an existing histogram after it has been created.

Second, coffea histograms support multiple axes.  So, for example, rather than three histograms as in the example above (one for each process), you can make one histogram with all of the values in it, with one axis for the data points and another axis tracking the process type.
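To make the idea of a category axis concrete, here is a rough numpy sketch in which the category axis is a dictionary keyed by process name and the numerical axis is an array of bin counts. This is only a mental model, not how coffea stores its data; the `fill` helper and the `counts` dictionary are hypothetical names introduced for illustration.

```python
import numpy as np

edges = np.linspace(0, 200, 41)  # numerical axis: 40 bins of width 5
counts = {}                      # category axis: one counts array per process name

def fill(dataset, x):
    """Add the values in x to the counts for one category (hypothetical helper)."""
    if dataset not in counts:
        # a category seen for the first time gets its own empty set of bins
        counts[dataset] = np.zeros(len(edges) - 1, dtype=np.intp)
    new_counts, _ = np.histogram(x, bins=edges)
    counts[dataset] += new_counts

fill("Process 1", [12.0, 101.0, 103.0])
fill("Process 2", [55.0])
fill("Process 1", [104.0])  # filling again adds to the existing counts

print(sorted(counts))
print(counts["Process 1"].sum())
```

The structure is declared once (the bin edges and the empty dictionary) and then filled as many times as needed, which mirrors the define-then-fill pattern described above.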

In [None]:
from coffea import hist

In [None]:
#Need to declare the list of axes first

#Create axis, which we'll call 'x', for the data.
# This will be a 'Bin' axis, storing numerical values that we want binned from 0 to 200 with a bin width of 5
x_axis = hist.Bin("x", r"$x$", 40, 0., 200)

#Declare an axis for the dataset
# This is a 'Cat' axis, storing categories of data
# In this type of axis, if a new value is filled, a new category 'bin' is created on the axis
dataset_axis = hist.Cat("dataset","Dataset")

#declare the histogram, with a label and all of the axes
thisHistogram = hist.Hist("Observed Counts", 
                          dataset_axis, 
                          x_axis)


In [None]:
thisHistogram.fill(dataset='Process 1',
                  x=p1)
thisHistogram.fill(dataset='Process 2',
                  x=p2)
thisHistogram.fill(dataset='Process 3',
                  x=p3)


Histograms can be drawn with the hist.plot1d function.  However, it can only plot one dimension at a time, so we need to tell it how to handle the 'dataset' axis.

In this case, we will use the overlay option, telling it to overlay the different values in the dataset axis on the same plot.


In [None]:
hist.plot1d(thisHistogram,overlay='dataset')
plt.show()

#We can also make a stacked plot, using stack=True
hist.plot1d(thisHistogram,overlay='dataset',stack=True)
plt.show()

#and can change the order, by specifying a list with the order we want to stack the histograms in (bottom to top)
hist.plot1d(thisHistogram,overlay='dataset',stack=True,order=['Process 3','Process 2','Process 1'])
plt.show()

It is also possible to apply selections on a specific axis, for example selecting only one dataset to be drawn, or only a specific range of the graph.

This can be done in one of two ways:
 - Integrating/summing over an axis.  This will select all or a portion of the values in an axis to be integrated over.  Afterwards, the axis no longer exists (similar to a Projection on a TH2 in ROOT)
 - Slicing on an axis.  This can select a range of values within an axis.  Some bins from an axis are removed, but the axis itself remains (similar to SetRange in ROOT)
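The difference between the two operations can be seen on a plain numpy array, treating rows as datasets and columns as bins. This is only an analogy under that assumption, with made-up counts; it does not use the coffea API.

```python
import numpy as np

# Hypothetical counts: 3 datasets (rows) x 4 bins (columns)
counts = np.array([[1, 2, 3, 4],
                   [10, 20, 30, 40],
                   [100, 200, 300, 400]])

# Integrating over the first two datasets: the dataset axis disappears
integrated = counts[[0, 1]].sum(axis=0)

# Slicing the same two datasets: the dataset axis remains, with fewer entries
sliced = counts[[0, 1], :]

print(integrated.shape)  # (4,)
print(sliced.shape)      # (2, 4)
```

After integrating there is nothing left to overlay, while the sliced result still has one row per selected dataset.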

In [None]:
#Summing
print('Histogram before summing:')
print(thisHistogram)
print('Histogram after summing dataset:')
print(thisHistogram.sum('dataset'))

## dataset axis will be removed by summing, so there is no second axis to overlay
hist.plot1d(thisHistogram.sum('dataset'))


In [None]:
#Integrating
print('Histogram after integrating dataset over only Process 1:')
print(thisHistogram.integrate('dataset',['Process 1']))
print('Histogram after integrating dataset over Process 1 and Process 2:')
print(thisHistogram.integrate('dataset',['Process 1','Process 2']))

#Notice that in both cases there is no dataset axis, even though they include different numbers of processes.
#Similar to summing, integrating removes the axis, and selected bins are summed together

hist.plot1d(thisHistogram.integrate('dataset',['Process 1']))
plt.show()

hist.plot1d(thisHistogram.integrate('dataset',['Process 1','Process 2']))
plt.show()

In [None]:
#Slicing
# coffea histograms can support numpy style array slicing over the bins in a histogram.  
# This can be used to select a subset of bins in a histogram, but not sum over them

#example, selecting two processes
hist.plot1d(thisHistogram[['Process 1','Process 2'],:], overlay='dataset')
plt.show()

#example, selecting a range of x values (75 to 175) for all processes
hist.plot1d(thisHistogram[:,75:175], overlay='dataset', stack=True)
plt.show()


In [None]:
## TODO
# make a stacked plot of process 1 and process 3, between 100 and 150






Coffea is capable of handling even more axes.  In the example below, we create a histogram with 3 dimensions: pt, eta, and dataset

In [None]:
#define a histogram with 3 axes, the dataset, and then the pt and eta of a particle
pt_axis = hist.Bin("pt", r"$p_T$", 40, 0., 200)
eta_axis = hist.Bin("eta", r"$\eta$", 10, -3, 3)
dataset_axis = hist.Cat("dataset","Dataset")

pt_eta_Histogram = hist.Hist("Observed Counts", 
                              dataset_axis, 
                              pt_axis,
                              eta_axis)

#Let's fill with randomly generated data for the pt and eta
pt_wgamma = np.random.exponential(10,1000)
eta_wgamma = np.random.normal(0,1,1000)
pt_eta_Histogram.fill(dataset='WGamma',
                     pt=pt_wgamma,
                     eta=eta_wgamma)

pt_tt = np.random.exponential(10,1000)
eta_tt = np.random.normal(0,2,1000)
pt_eta_Histogram.fill(dataset='TTbar',
                     pt=pt_tt,
                     eta=eta_tt)

pt_ttgamma = np.random.exponential(30,2000)
eta_ttgamma = np.random.normal(0,1,2000)
pt_eta_Histogram.fill(dataset='TTGamma',
                     pt=pt_ttgamma,
                     eta=eta_ttgamma)



The plot1d function can handle at most 2 dimensions (one to be plotted, one to be overlaid).

When more than two axes are present, overlaying is not enough.  We need to sum or integrate over some of the axes to reduce the number to 1 or 2.
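As a rough illustration of reducing the number of axes, the numpy sketch below bins pt and eta together and then sums over the eta axis to recover a 1D pt histogram. It assumes the same bin edges as the pt and eta axes defined above and uses randomly generated values; it is not the coffea API.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seeded only so the sketch is reproducible
pt = rng.exponential(30, 2000)
eta = rng.normal(0, 1, 2000)

pt_edges = np.linspace(0, 200, 41)  # 40 bins from 0 to 200
eta_edges = np.linspace(-3, 3, 11)  # 10 bins from -3 to 3

# 2D counts with shape (number of pt bins, number of eta bins)
counts2d, _, _ = np.histogram2d(pt, eta, bins=[pt_edges, eta_edges])

# Summing over the eta axis leaves a 1D histogram of pt
pt_counts = counts2d.sum(axis=1)

print(counts2d.shape)
print(pt_counts.shape)
```

Note that in this numpy version, entries falling outside the eta range are dropped from the 2D counts, so the summed result only contains values with eta between -3 and 3.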

In [None]:
#Sum over eta, drawing pt for a stack of datasets
hist.plot1d(pt_eta_Histogram.sum('eta'),overlay='dataset',stack=True)


In [None]:
# you can also use the plot2d function to plot two numerical axes against each other
# this cannot handle overlays, so the dataset axis must be summed or integrated over (here we integrate over only TTGamma)
hist.plot2d(pt_eta_Histogram.integrate('dataset','TTGamma'),'pt')

In [None]:
##TODO

# Draw a histogram of eta, stacking the datasets


In [None]:
##TODO
# Draw a histogram of pt, for only eta > 0, stacking the datasets


In [None]:
##TODO
# Draw a histogram of pt, for only TTGamma, with the eta axis as the overlay


In [None]:
import awkward as ak
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema

#the NanoAODSchema needs to be adjusted, to remove cross references to FSRPhotons
class SkimmedSchema(NanoAODSchema):
    def __init__(self, base_form):
        base_form["contents"].pop("Muon_fsrPhotonIdx", None)
        super().__init__(base_form)

fileName = '/udrive/staff/dnoonan/Skims/TTGamma_SingleLept_2016_skim.root'
eventsTTGamma = NanoEventsFactory.from_root(fileName, schemaclass=SkimmedSchema,entry_stop=100000).events()

fileName = '/udrive/staff/dnoonan/Skims/TTbarPowheg_Semilept_2016_skim_1of10.root'
eventsTTbar = NanoEventsFactory.from_root(fileName, schemaclass=SkimmedSchema,entry_stop=100000).events()

fileName = '/udrive/staff/dnoonan/Skims/WGamma_2016_skim.root'
eventsWGamma = NanoEventsFactory.from_root(fileName, schemaclass=SkimmedSchema,entry_stop=100000).events()


## To do

Using the same selection as you applied in LoadTopSkims, select events with exactly 1 good muon, 0 good electrons, 

 - Make a coffea histogram for the muon pt, eta, and phi
 - Make a coffea histogram for the leading jet pt, eta, and phi
 - Make a coffea histogram for the leading photon pt, eta, phi, and relIso (filled from pfRelIso03_chg branch)
 - Fill these with data from TTGamma, TTbar, and WGamma
 - Make plots of each of these variables, with datasets stacked