# Reading ROOT files and plotting data

This example notebook reads a ROOT file and plots a selection of its variables.

We start by loading the required packages. In our case we use:
- `uproot` to read the ROOT files,
- `pandas` to process structured data,
- `matplotlib` to create plots.

### Getting started

In [None]:
import uproot as up
import pandas as pd
import matplotlib as mpl

# We want the plots to appear inline in this notebook
%matplotlib inline
mpl.style.use('ggplot')

### Reading the ROOT file

Next we read the ROOT file using the `uproot` package (with alias `up`). We need to refer to the full `/lustre` location of the cached `/mss` file structure because (currently) the `/cache` link points to the theory side of the `/mss` structure (yes, there are two separate `/mss` systems).

In [None]:
tfile = up.open('/lustre/expphy/cache/hallc/qweak/rootfiles/pass5b/QwPass5b_18110.000.trees.root')

We get the tree from the file by name. To list the available names, call the `tfile.keys()` method.

In [None]:
ttree = tfile.get("Hel_Tree")

### Retrieving a branch

From the tree we read the array of data in the branch "asym_qwk_bcm1" and turn it into a pandas data frame. Again, we can get the list of branch names by calling the `ttree.keys()` method.

In [None]:
bcm1 = pd.DataFrame(ttree.array("asym_qwk_bcm1"))

The `describe()` method of a pandas data frame is often a convenient way to get a quick overview of the data. In our case it displays, for example, the mean and standard deviation of every leaf (or column, in pandas parlance).

In [None]:
bcm1.describe()
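To see how leaves turn into columns without access to the ROOT file, here is a minimal sketch on synthetic data. It assumes the branch arrives as a NumPy structured array with one field per leaf; the array `branch`, its random contents, and the name `demo` are hypothetical stand-ins, not values from the run above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a branch: a structured array with one field per leaf.
rng = np.random.default_rng(seed=0)
leaf_names = ["hw_sum", "block0", "block1", "block2", "block3"]
branch = np.zeros(1000, dtype=[(name, "f8") for name in leaf_names])
for name in leaf_names:
    # Synthetic values only; real asymmetries come from the ROOT file.
    branch[name] = rng.normal(loc=0.0, scale=1e-4, size=branch.shape[0])

# Each field of the structured array becomes one column of the data frame.
demo = pd.DataFrame(branch)

# describe() returns one summary column per leaf.
summary = demo.describe()
print(summary.loc[["count", "mean", "std"]])
```

The same pattern applies to `bcm1`: every leaf of the branch shows up as a column, and `describe()` summarizes each of them.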

### Retrieving a set of leaves

We can easily create pandas data frames from subsets of fields by indexing a data frame with a list. Notice how we are passing the list, with its square brackets, as an index, with a second set of square brackets.

Calling the `corr()` method on the resulting data frame gives a quick numerical correlation matrix. Later we will plot the corresponding matrix of scatter plots.

In [None]:
bcm1[["hw_sum", "block0", "block1", "block2", "block3"]].corr()
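The difference between single and double square brackets can be checked on a toy data frame. This is a sketch under stated assumptions: the data are random numbers, and `hw_sum` is defined here as the average of the four blocks purely so that the correlation is non-trivial; nothing is implied about how the real leaves are related.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
blocks = rng.normal(size=(1000, 4))
toy = pd.DataFrame(blocks, columns=["block0", "block1", "block2", "block3"])
# Assumption for this toy example only: hw_sum is the average of the blocks.
toy["hw_sum"] = blocks.mean(axis=1)

one_column = toy["hw_sum"]    # a single label returns a Series
sub_frame = toy[["hw_sum"]]   # a list of labels returns a DataFrame

# Correlation matrix of a subset of columns, selected with a list.
corr = toy[["hw_sum", "block0", "block1"]].corr()
print(corr.round(2))
```

The result of `corr()` is itself a data frame, so individual coefficients can be looked up by name, for example `corr.loc["hw_sum", "block0"]`.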

### Plotting the data

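Just as `describe()` quickly returns the numerical properties of a data frame, you can use `bootstrap_plot()` to perform a quick visual inspection of the data. It repeatedly draws a random subset of the data and plots the mean, median and mid-range of each subset. The `samples` argument sets how many subsets are drawn; the number of data points in each subset is controlled by `size` (50 by default), so only a fraction of the data set enters each subset.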

In [None]:
# bootstrap_plot() is documented to take a Series, so select the column with a single label
pd.plotting.bootstrap_plot(bcm1["hw_sum"], samples=500)
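The numbers behind such a bootstrap plot can be reproduced by hand. The sketch below works on synthetic data and computes the mean, median and mid-range of repeated random subsets. Drawing each subset without replacement is an assumption of this sketch, and the variable names are illustrative, so treat it as an approximation of what the plot shows rather than a copy of the pandas implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)
# Synthetic stand-in for one leaf of the branch.
values = pd.Series(rng.normal(loc=0.0, scale=1.0, size=5000), name="hw_sum")

size, samples = 50, 500   # points per subset, number of subsets
means = np.empty(samples)
medians = np.empty(samples)
midranges = np.empty(samples)
for i in range(samples):
    # Assumption: each subset is drawn without replacement.
    subset = rng.choice(values.to_numpy(), size=size, replace=False)
    means[i] = subset.mean()
    medians[i] = np.median(subset)
    midranges[i] = 0.5 * (subset.min() + subset.max())

# The spread of the subset means estimates the uncertainty on the mean
# of `size` points, which scales as sigma / sqrt(size).
print(f"spread of the subset means: {means.std():.3f}")
```

A wide spread in any of the three statistics signals that a subset of this size does not pin the statistic down well.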

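A lag plot shows each value against the value that follows it in the sequence, which can help us identify whether there is time dependence in the data set. Uncorrelated data form a structureless cloud, while time-dependent data cluster along a pattern such as the diagonal.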

In [None]:
# lag_plot() is documented to take a Series, so select the column with a single label
pd.plotting.lag_plot(bcm1["hw_sum"])
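What a lag plot displays can also be quantified with the lag-1 autocorrelation. The following sketch compares two synthetic series, white noise and a random walk built from it; both are hypothetical examples chosen to show the two extremes, not data from the tree.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)
noise = pd.Series(rng.normal(size=5000))   # no time dependence
walk = noise.cumsum()                      # strong time dependence

# A lag plot is a scatter plot of these two columns: y(t) against y(t + lag).
lag = 1
y = noise.to_numpy()
pairs = pd.DataFrame({"y_t": y[:-lag], "y_t_plus_lag": y[lag:]})

# Series.autocorr() gives the correlation between a series and its shifted copy.
noise_r = noise.autocorr(lag=1)
walk_r = walk.autocorr(lag=1)
print(f"white noise: {noise_r:+.3f}, random walk: {walk_r:+.3f}")
```

A coefficient near zero corresponds to the structureless cloud in the lag plot; a coefficient near one corresponds to points lined up along the diagonal.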

Finally, we can pass a set of leaves and create a matrix of scatter plots.

In [None]:
pd.plotting.scatter_matrix(bcm1[["hw_sum", "block0", "block1", "block2", "block3"]],
                           alpha=0.3, figsize=(14, 8), diagonal='kde')