# Reference
[uproot documentation](https://uproot.readthedocs.io/en/latest/)

# File management and helper classes

You'll find a Python module uproot_io alongside this notebook that provides some functions to process the ROOT files such that you shouldn't need to worry about how things are stored by uproot.

This sets up two classes, Events and View. The Events class holds all of the information loaded from a ROOT file (example usage is given below), as well as a couple of functions to help with filtering information by event number and PDG code. There's also a make_sequential function you don't need to worry about, which handles some technicalities around event number uniqueness. Examples using these filtering functions are given below.

You'll want to have a look at the Events class in particular, as the self.* variables set in its `__init__` function represent all of the information you'll have access to. You won't necessarily need all (or even most) of these variables, but it's worthwhile to become familiar with what's available.

View (a synonym for wire plane) is a helper class that can take an Events object and extract a core subset of information from each event in a given wire plane (the subset most relevant to the BSc project). Each event in a View object is a collection of all hit coordinates in x (common to all wire planes) and z (the 'depth' in the U, V or W view as appropriate), their corresponding ADC values and the true vertex location in the coordinates of the relevant wire plane.

An example event display using View is given below.

In [2]:
from uproot_io import Events, View
import numpy as np

# Interaction type lookup table

interaction_dictionary is a simple lookup table that takes the numerical interaction type from the Events class and converts it to a human-readable string, e.g.

`interaction_dictionary[3]`

returns

`'CCQEL_MU_P_P_P'`

indicating that an interaction type code of 3 happens to represent a charged-current quasi-elastic interaction with a muon and three protons in the final state.

In [3]:
# Interaction type lookup
import csv
interaction_dictionary = {}
with open('interactions.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        key = int(row.pop('Idx'))
        interaction = row.pop('Interaction')
        interaction_dictionary[key] = interaction
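The cell above assumes interactions.csv has a header row with (at least) `Idx` and `Interaction` columns. The sketch below mimics that layout with an in-memory string instead of the real file; the two rows are the codes quoted in this notebook, whereas the real file lists many more.

```python
import csv
import io

# Stand-in for interactions.csv: a header row, then one row per interaction code.
# Only two rows are shown here; the real file lists every code.
sample_csv = io.StringIO(
    "Idx,Interaction\n"
    "3,CCQEL_MU_P_P_P\n"
    "130,NCDIS_P_PIMINUS\n"
)

# Same parsing logic as the cell above, written as a dict comprehension
reader = csv.DictReader(sample_csv)
lookup = {int(row["Idx"]): row["Interaction"] for row in reader}

print(lookup[3])    # CCQEL_MU_P_P_P
```

If you might meet a code that isn't in the table, `lookup.get(code, "UNKNOWN")` avoids a KeyError.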

# Get data from tree

In [4]:
events = Events("CheatedRecoFile_1.root")

In [6]:
events.event_number[131]

3

In [7]:
idx = np.where(events.event_number == 23)

In [8]:
idx

(array([2294, 2295, 2296, 2297, 2298, 2299, 2300, 2301, 2302, 2303, 2304,
        2305, 2306, 2307, 2308, 2309, 2310, 2311, 2312, 2313, 2314, 2315,
        2316, 2317, 2318, 2319, 2320, 2321, 2322, 2323, 2324, 2325, 2326,
        2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
        2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
        2349, 2350, 2351, 2352, 2353, 2354, 2355, 2356, 2357, 2358, 2359,
        2360, 2361, 2362, 2363, 2364, 2365, 2366, 2367, 2368, 2369, 2370,
        2371, 2372, 2373, 2374, 2375, 2376, 2377, 2378], dtype=int64),)

In [9]:
events.interaction_type[idx]

array([200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,
       200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,
       200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,
       200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,
       200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,
       200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200,
       200, 200, 200, 200, 200, 200, 200])

In [10]:
interaction_dictionary[130]

'NCDIS_P_PIMINUS'
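Judging by the output above, `events` stores several entries per event (event 23 spans indices 2294 to 2378), each repeating the event's interaction type. Note that `np.where` returns a tuple of index arrays, hence the `(array([...]),)` output. The sketch below uses small made-up arrays in place of `events.event_number` and `events.interaction_type` to show the same selection with a boolean mask, collapsing the repeats with `np.unique` before looking the code up.

```python
import numpy as np

# Hypothetical stand-ins for events.event_number and events.interaction_type:
# several entries per event, each carrying that event's interaction code
event_number = np.array([22, 22, 23, 23, 23, 24])
interaction_type = np.array([3, 3, 130, 130, 130, 3])
lookup = {3: "CCQEL_MU_P_P_P", 130: "NCDIS_P_PIMINUS"}   # subset of interaction_dictionary

# A boolean mask picks the same entries as np.where(event_number == 23)
mask = event_number == 23
codes = np.unique(interaction_type[mask])   # collapse repeats to distinct codes
print(codes)                    # [130]
print(lookup[int(codes[0])])    # NCDIS_P_PIMINUS
```

If `np.unique` ever returns more than one code for a single event, that is probably worth investigating rather than silently taking the first one.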

# Extract hits in each view

Getting the hits from each view is quite simple: you need only pass in an Events object and specify the view you want. Now, however, if you access a variable in the view for a specific event, e.g.

`view_w.z[1]`

you'll get back one array with all of the information for that event in the view of interest. All of the PFO details are lost, but this approach is very convenient for event displays (see below).
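Conceptually, a view behaves as if the per-PFO hit arrays of an event had been joined end to end. The sketch below illustrates that idea with `np.concatenate` on made-up per-PFO arrays; it is an assumption about the behaviour, not a copy of what `uproot_io` does internally, and `pfo_label` is a hypothetical way to keep the PFO association if you ever need it.

```python
import numpy as np

# Hypothetical per-PFO hit z coordinates (cm) for one event in one view
pfo_z = [
    np.array([10.0, 10.5, 11.0]),   # PFO 0
    np.array([12.0, 12.3]),         # PFO 1
]

# One flat array of hits for the event: the PFO boundaries are gone
event_z = np.concatenate(pfo_z)

# If you ever need the PFO association back, a parallel label array is one option
pfo_label = np.repeat(np.arange(len(pfo_z)), [len(a) for a in pfo_z])
print(event_z.size, pfo_label.tolist())   # 5 [0, 0, 0, 1, 1]
```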

In [None]:
view_u = View(events, "u")
view_v = View(events, "v")
view_w = View(events, "w")

In [None]:
view_w.z[1]

# Display an event

In [None]:
import matplotlib
import matplotlib.pyplot as plt
titlesize = 20
labelsize = 14

In [None]:
%matplotlib inline

Event displays are useful. We encourage you to make extensive use of them to help you understand what your algorithms are doing: both when thinking through what your algorithm should do and where it might encounter problems, and as a debugging tool. If you're trying to reconstruct a vertex location, plot it on the relevant event display, as this will be much easier to interpret than the raw numbers.

The example below plots an event display for event 15 in the W wire plane. It gives a little insight into using different variables in View jointly (the process is similar for Events, though keep in mind that most variables in Events are sets of arrays, not single blocks). It also provides some plotting details that will hopefully be useful for constructing future plots.

The range of values of x and z covered by any given event is highly variable, so by default you can get some quite skewed aspect ratios from matplotlib. To help you understand what the events really look like, it can be helpful to enforce a true aspect ratio. This is what get_fig_ratio supplies: once you pick a figure width (which holds the z axis), you can use the ratio to set a matching figure height (which holds the x axis) via the figsize parameter of plt.subplots. The match is approximate, since the colour bar and axis labels also take up room in the figure. You don't __have__ to do this, and sometimes using the true aspect ratio can be unhelpful if the ratio of the z and x extents becomes very large or small, but if you have to think about angles between tracks this approach can prevent a lot of confusion.

You'll often want to apply standard formatting to axes, and this is provided by format_axis (note this assumes a single subplot: if you are formatting multiple subplots, call it in a loop, passing in each individual subplot axis).

In the event display below you can see the hits from event 15 (not a very exciting event) in various shades of blue, which represent the ADC contribution of each hit. This is achieved by using the view.adc variable to weight the hits at the corresponding coordinates (view.x and view.z). The binning is arbitrarily chosen to be 200 bins in each dimension, and the colour map and associated normalisation are specified. Again, these are arbitrary choices (you can omit them altogether), but I find this combination of colour map and logarithmic normalisation works reasonably well (i.e. it gives a colour gradient that is easy to interpret from low to high values).

The true vertex is also added via the call to axes.scatter, where s is the marker size.
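The ADC weighting can be seen in miniature with NumPy's `np.histogram2d`, which Matplotlib's `hist2d` uses to compute its bins: each bin holds the sum of the weights (here, ADC values) of the hits that land in it, rather than a simple count of hits. The hits below are made up for illustration, with 2 × 2 bins instead of 200 × 200.

```python
import numpy as np

# Made-up hits: z and x positions (cm) and their ADC values
zz = np.array([1.0, 1.2, 3.5, 3.6])
xx = np.array([0.5, 0.6, 2.0, 2.1])
adc = np.array([10.0, 20.0, 5.0, 7.0])

# 2 x 2 bins spanning the range of the hits, each bin summing the ADC of its hits
counts, z_edges, x_edges = np.histogram2d(zz, xx, bins=(2, 2), weights=adc)

# First index runs over z bins, second over x bins
print(counts[0, 0], counts[1, 1])   # 30.0 12.0
```

The two low-z hits share a bin (10 + 20 ADC) and so do the two high-z hits (5 + 7 ADC), so the first bin is drawn darker even though both contain two hits.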

In [None]:
def get_fig_ratio(xx, zz):
    # Ratio of the z extent to the x extent of the hits; fall back to 1 if
    # either extent is zero (e.g. a single hit) to avoid a degenerate figure size
    x_range = np.max(xx) - np.min(xx)
    z_range = np.max(zz) - np.min(zz)
    return z_range / x_range if x_range > 0 and z_range > 0 else 1

def format_axis(ax):
    # Apply the standard tick-label size to both axes of a single subplot
    ax.tick_params(axis='both', labelsize=labelsize)

def plot_event(view, event):
    xx = view.x[event]
    zz = view.z[event]
    ww = view.adc[event]

    # z runs along the horizontal axis and x along the vertical one, so the
    # figure height is the width scaled by x_range / z_range
    ratio_range = get_fig_ratio(xx, zz)
    fig_x = 12
    fig_y = fig_x / ratio_range
    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(fig_x, fig_y))
    format_axis(axes)
    axes.set_ylabel("x / cm", fontsize=titlesize)
    axes.set_xlabel("z / cm", fontsize=titlesize)
    # ADC-weighted 2D histogram of the hits on a logarithmic colour scale
    h = axes.hist2d(zz, xx, weights=ww, bins=(200, 200), norm=matplotlib.colors.LogNorm(), cmap=plt.get_cmap('Blues'))
    fig.colorbar(h[3], ax=axes, label='ADC')
    axes.scatter(view.true_vtx_z[event], view.true_vtx_x[event], s=50, c='red', label='True vertex')
    axes.legend(fontsize=titlesize)
    fig.tight_layout()
    plt.show()

In [None]:
plot_event(view_w, 15)

# Event displays - Lowest z algorithm

Let's say you want to write a simple algorithm that places the reconstructed vertex at the position of the lowest z coordinate. How do you know if your algorithm is doing what you want? Many algorithms will have subtleties you may overlook on a first attempt, and scrolling through raw coordinates is not a fun activity.

To demonstrate, I've written such an algorithm (sorry, solutions.py is not included).

The plotting function replicates the event display above, but instead of plotting a true vertex (which you could also plot for comparison), it uses scatter to plot the reconstructed vertex location returned by the low_z function. low_z returns the index into the underlying array of hits for each event in the given view as its first value (mainly as a check), and the reconstructed vertex position for each event, as an (x, z) pair, as its second value; its third return value, `removals`, isn't used here. There are other entirely reasonable ways to structure this.

The key point here is that it's pretty clear with a quick look at the event display that the vertex is in a sensible place given the algorithm being used and now you have a reusable function that you can call to see the reconstructed vertex for any given event and view.
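Since solutions.py isn't included, here is a hypothetical sketch of a lowest-z vertex finder with a similar return layout: per event, the index of the chosen hit and its (x, z) position, matching how `plot_low_hit` reads `low_hits[event][0]` as x and `low_hits[event][1]` as z. It is not the solutions.py implementation, it takes plain per-event arrays rather than a View, and it does not reproduce the third value (`removals`) that the real `low_z` returns.

```python
import numpy as np

def low_z_sketch(x_per_event, z_per_event):
    """Hypothetical lowest-z vertex finder (not the solutions.py version).

    Returns, per event, the index of the hit with the smallest z and that
    hit's (x, z) position. Events with no hits get index -1 and (nan, nan).
    """
    indices, positions = [], []
    for xx, zz in zip(x_per_event, z_per_event):
        if len(zz) == 0:
            indices.append(-1)
            positions.append((np.nan, np.nan))
            continue
        i = int(np.argmin(zz))                       # first hit with the minimum z
        indices.append(i)
        positions.append((float(xx[i]), float(zz[i])))  # (x, z), as plot_low_hit expects
    return indices, positions

# Made-up example: one event with three hits, one empty event
demo_idx, demo_pos = low_z_sketch(
    [np.array([1.0, 2.0, 3.0]), np.array([])],
    [np.array([5.0, 4.0, 6.0]), np.array([])],
)
print(demo_idx, demo_pos[0])   # [1, -1] (2.0, 4.0)
```

Assuming `view_w.x` and `view_w.z` can be iterated event by event, `low_z_sketch(view_w.x, view_w.z)` would give lists that can be passed to `plot_low_hit` in place of `low_hits`.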

In [None]:
from solutions import low_z

In [None]:
low_idx, low_hits, removals = low_z(view_w)

In [None]:
def plot_low_hit(view, event, low_hits):
    xx = view.x[event]
    zz = view.z[event]
    ww = view.adc[event]

    ratio_range = get_fig_ratio(xx, zz)
    fig_x = 12
    fig_y = fig_x / ratio_range
    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(fig_x, fig_y))
    format_axis(axes)
    axes.set_ylabel("x / cm", fontsize=titlesize)
    axes.set_xlabel("z / cm", fontsize=titlesize)
    h = axes.hist2d(zz, xx, weights=ww, bins=(200, 200), norm=matplotlib.colors.LogNorm(), cmap=plt.get_cmap('Blues'))
    fig.colorbar(h[3], ax=axes, label='ADC')
    axes.scatter(low_hits[event][1], low_hits[event][0], s=50, c='orange', label='Reco vertex')
    axes.legend(fontsize=titlesize)
    fig.tight_layout()
    plt.show()

In [None]:
plot_low_hit(view_w, 1, low_hits)

# Event displays - r-phi

As another example, we have an r-phi algorithm for reconstructing the vertex. I won't explain the idea behind the algorithm in detail here, but the key thing you need to know is that hits get binned according to their angle relative to some point of interest.

Figuring out whether this algorithm is behaving sensibly is a bit more complicated than for the low z algorithm, but event display annotations are __really__ helpful here too, if a bit more complex.

The plotting function follows a similar structure to the one above for plotting the event. After the event has been plotted, it goes on to produce a histogram showing how many hits fall into each of a number of angular bins (there's also some weighting based on distance, but that's not important for this example). Figuring out whether that histogram makes sense can be challenging, but the green lines plotted on the event display show the angular bin edges, where angles are measured relative to the positive z axis. Plotting these lines on the event display lets you correlate where the hits sit in the event display with where the peaks are in the histogram, e.g. that most of the hits end up in the bin immediately above the z axis.

Again, we would encourage you to try to use techniques like this when assessing how your algorithms are performing, as errors early in the project become increasingly difficult to track down later, and these approaches will give you confidence that future results are built from a solid foundation (they can also often help your understanding of the problem).
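To make the angular binning concrete, here is a hypothetical, unweighted version of it (the distance weighting used by the real rphi is left out, and the function names are illustrative only). For a candidate hit it measures the angle of every other hit relative to the +z axis with `np.arctan2`, bins the angles with the same 50 bins used in `plot_rphi`, and computes the sum of squared bin counts that `plot_rphi` prints. The sign convention (positive angles towards +x) is an assumption; because those bin edges are symmetric about zero, the green guide lines look the same either way.

```python
import numpy as np

# Same binning as plot_rphi: 50 equal bins spanning -pi to +pi
BINNING = np.linspace(-np.pi, np.pi, 51)

def angles_from(xx, zz, candidate_idx):
    """Angle (rad) of every other hit, seen from the candidate, relative to +z."""
    dz = np.delete(zz, candidate_idx) - zz[candidate_idx]
    dx = np.delete(xx, candidate_idx) - xx[candidate_idx]
    return np.arctan2(dx, dz)   # 0 along +z, +pi/2 along +x (assumed convention)

def angular_score(phis):
    """Bin the angles; the sum of squared counts is larger when hits cluster in few bins."""
    counts, _ = np.histogram(phis, bins=BINNING)
    return counts, int((counts ** 2).sum())

# Made-up straight track along +z
zz = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
xx = np.zeros_like(zz)

_, end_score = angular_score(angles_from(xx, zz, 0))      # candidate at the track start
_, middle_score = angular_score(angles_from(xx, zz, 2))   # candidate mid-track
print(end_score, middle_score)   # 16 8
```

Seen from the start of the track every other hit lies in one angular bin, while from the middle the hits split between two opposite bins, which is the kind of pattern the histogram next to the event display lets you check by eye.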

In [None]:
from solutions import rphi

In [None]:
phis, weights = rphi(view_w, 1)

In [None]:
def plot_rphi(view, event, phis, candidate_idx):
    xx = view.x[event]
    zz = view.z[event]
    ww = view.adc[event]

    ratio_range = get_fig_ratio(xx, zz)
    fig_x = 12
    fig_y = fig_x / ratio_range
    
    fig = plt.figure(figsize=(1.5 * fig_x, fig_y), tight_layout=True)
    ax = plt.subplot(1,3,(1,2))
    format_axis(ax)
    ax.set_ylabel("x / cm", fontsize=titlesize)
    ax.set_xlabel("z / cm", fontsize=titlesize)
    h = ax.hist2d(zz, xx, weights=ww, bins=(200, 200), norm=matplotlib.colors.LogNorm(), cmap=plt.get_cmap('Blues'))
    
    ax2 = plt.subplot(1,3,3)
    format_axis(ax2)
    ax2.set_ylabel("count", fontsize=titlesize)
    ax2.set_xlabel("phi / rad", fontsize=titlesize)
    binning = np.linspace(-np.pi, +np.pi, 51)
    h = ax2.hist(phis[candidate_idx], bins=binning)
    
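    # Sum of squared bin counts: larger when the hits are concentrated in a few angular bins
    print(np.array(h[0] ** 2).sum())
    
    # Draw each angular bin edge as a long line radiating from the candidate hit
    for phi in binning:
        dz = 1000 * np.cos(phi)
        dx = -1000 * np.sin(phi)
        xb = [ zz[candidate_idx], zz[candidate_idx] + dz ]
        yb = [ xx[candidate_idx], xx[candidate_idx] + dx ]
        ax.plot(xb, yb, c='green')
    
    ax.scatter(zz[candidate_idx], xx[candidate_idx], s=50, c='red', zorder=100, label='Candidate vertex')
    ax.legend(fontsize=titlesize)

    plt.show()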

In [None]:
plot_rphi(view_w, 1, phis, 0)

In [None]:
plot_rphi(view_w, 1, phis, 600)

# Saving plots and numpy arrays

Below is a simple utility function you can call whenever you want to save a plot. It creates the output directory as needed and saves the figure in PNG, JPG and SVG (vector graphics) formats.

In [None]:
import os

def save_plot(fig, directory, filename):
    # Create the directory (and any missing parents); no error if it already exists
    os.makedirs(directory, exist_ok=True)
    base = os.path.join(directory, filename)
    fig.savefig(f'{base}.png', bbox_inches='tight', dpi=200, facecolor='w')
    fig.savefig(f'{base}.jpg', bbox_inches='tight', dpi=200)
    fig.savefig(f'{base}.svg', bbox_inches='tight', dpi=200)

You'll likely produce numpy arrays as either final results or intermediate steps, and it's worth getting into the habit of saving at least some of them. While you will often work with small subsets of the data when testing out new functions, there will be times when you need the larger datasets, and depending on the computational efficiency of your code, some of these steps may take quite a long time.

Python often has a reputation for being slow relative to languages like C++. This isn't always deserved: well-written code that leans on NumPy's vectorised array operations (which run in compiled C) and idioms such as list comprehensions can be fast. Even so, on large data sets, say 100,000 events or more, even very efficient code might take a minute or two, which quickly gets annoying in a debugging cycle, and less Pythonic code with nested loops might take many hours. So for intermediate results in particular, you want to be able to run such steps once and then just load the result when you need to do some additional post-processing.
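As an illustration of the difference, here are two equivalent ways to compute every hit's distance from a point (the function names are just for this example): a plain Python loop, and a single NumPy expression that does the arithmetic over whole arrays in compiled code. Both give the same numbers; on arrays of many thousands of hits the vectorised form is typically much faster, though the exact speed-up depends on your machine and data.

```python
import numpy as np

def distances_loop(xx, zz, x0, z0):
    """Distance of each hit from (x0, z0), one hit at a time in Python."""
    out = []
    for x, z in zip(xx, zz):
        out.append(((x - x0) ** 2 + (z - z0) ** 2) ** 0.5)
    return np.array(out)

def distances_vectorised(xx, zz, x0, z0):
    """The same calculation as one NumPy expression over whole arrays."""
    return np.hypot(xx - x0, zz - z0)

# Made-up hit positions (cm)
rng = np.random.default_rng(0)
xx = rng.uniform(0, 100, size=10_000)
zz = rng.uniform(0, 500, size=10_000)

assert np.allclose(distances_loop(xx, zz, 50.0, 250.0),
                   distances_vectorised(xx, zz, 50.0, 250.0))
```

In the notebook, prefixing each call with `%timeit` will show the timing difference on your own machine.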

For this, numpy.save("your_filename", your_array) and the corresponding your_array = numpy.load("your_filename.npy") will be helpful (numpy.save appends the .npy extension if the filename doesn't already end with it). Test that these are working on small arrays early, and then be sure to save large arrays as necessary, taking care not to overwrite things you meant to keep.
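A small round trip is sketched below, written to a temporary directory so nothing in your working area is touched. It also includes a hypothetical `load_or_compute` helper (not part of NumPy) that only runs a slow step when its cached file is missing; if the code producing a cached result changes, delete the old file so a stale result isn't silently reloaded.

```python
import os
import tempfile
import numpy as np

def load_or_compute(path, compute):
    """Hypothetical cache helper: load path if it exists, else compute and save it."""
    if os.path.exists(path):
        return np.load(path)
    result = compute()
    np.save(path, result)
    return result

calls = []   # records how many times the "slow" step actually runs

def slow_step():
    calls.append(1)
    return np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    arr = np.arange(10) ** 2
    np.save(os.path.join(tmp, "squares"), arr)           # writes squares.npy
    loaded = np.load(os.path.join(tmp, "squares.npy"))   # note the extension

    path = os.path.join(tmp, "cached.npy")
    first = load_or_compute(path, slow_step)    # computed and saved
    second = load_or_compute(path, slow_step)   # loaded from disk this time

print(np.array_equal(arr, loaded), len(calls))   # True 1
```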

# 2D->3D coordinate transformations
The following cell defines key geometric constants for the DUNE far detector wire planes (wire angles in radians: about ±35.7° for U and V, and 0 for W).

In [None]:
theta_u = 0.623257100582
theta_v = -0.623257100582
theta_w = 0.

cos_u = np.cos(theta_u)
cos_v = np.cos(theta_v)
cos_w = np.cos(theta_w)
sin_u = np.sin(theta_u)
sin_v = np.sin(theta_v)
sin_w = np.sin(theta_w)

sin_dvu = np.sin(theta_v - theta_u)
sin_dwv = np.sin(theta_w - theta_v)
sin_duw = np.sin(theta_u - theta_w)

The following cell provides 6 functions that map 2D points from UV/UW/VW space into the YZ plane of 3D world space (X being the same in wire plane and world spaces, subject to small variations in practice), and 3 functions that map from the YZ plane to the corresponding wire coordinates. Hopefully the names are self-explanatory, but please let us know if anything is unclear.
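For reference, the wire-to-3D functions follow from inverting the 3D-to-wire relations (as in yz_to_u and yz_to_v) for a pair of planes. Taking U and V as an example, multiplying the two relations by $\cos\theta_v$ and $\cos\theta_u$ (or by $\sin\theta_v$ and $\sin\theta_u$) and subtracting eliminates one unknown at a time:

$$
\begin{aligned}
u &= z\cos\theta_u - y\sin\theta_u, \qquad v = z\cos\theta_v - y\sin\theta_v \\
u\cos\theta_v - v\cos\theta_u &= y\,(\sin\theta_v\cos\theta_u - \sin\theta_u\cos\theta_v) = y\sin(\theta_v - \theta_u) \\
u\sin\theta_v - v\sin\theta_u &= z\,(\cos\theta_u\sin\theta_v - \cos\theta_v\sin\theta_u) = z\sin(\theta_v - \theta_u)
\end{aligned}
$$

so $y = (u\cos\theta_v - v\cos\theta_u)/\sin(\theta_v - \theta_u)$ and $z = (u\sin\theta_v - v\sin\theta_u)/\sin(\theta_v - \theta_u)$, which is what uv_to_y and uv_to_z compute with sin_dvu $= \sin(\theta_v - \theta_u)$. The UW and VW pairs follow the same pattern, and the division is only valid when the two planes have different wire angles.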

Given this is Python, if you find it more convenient to have 3 functions for wire to 3D rather than 6, you could always define uv_to_yz etc. functions that return the pair of values y, z, and similarly you could write a yz_to_uvw function that returns all three wire coordinates.

In [None]:
# wire to 3D
def uv_to_y(u, v):
    return ((u * cos_v - v * cos_u) / sin_dvu)

def uv_to_z(u, v):
    return ((u * sin_v - v * sin_u) / sin_dvu)

def uw_to_y(u, w):
    return ((w * cos_u - u * cos_w) / sin_duw)

def uw_to_z(u, w):
    return ((w * sin_u - u * sin_w) / sin_duw)

def vw_to_y(v, w):
    return ((v * cos_w - w * cos_v) / sin_dwv)

def vw_to_z(v, w):
    return ((v * sin_w - w * sin_v) / sin_dwv)

# 3D to wire
def yz_to_u(y, z):
    return z * cos_u - y * sin_u

def yz_to_v(y, z):
    return z * cos_v - y * sin_v

def yz_to_w(y, z):
    return z * cos_w - y * sin_w
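One possible way to follow that suggestion is sketched below. The names uv_to_yz, uw_to_yz, vw_to_yz and yz_to_uvw are the illustrative ones suggested in the text, and pair_to_yz is a hypothetical helper capturing the pattern shared by the six wire-to-3D functions; the angles are redefined so the block runs on its own. The round trip at the end projects a made-up (y, z) point onto all three planes and checks that every pair of planes maps back to the original point.

```python
import numpy as np

# Wire-plane angles (rad), as defined earlier in this notebook
THETA = {"u": 0.623257100582, "v": -0.623257100582, "w": 0.0}

def pair_to_yz(p, q, theta_p, theta_q):
    """Invert p = z cos(theta_p) - y sin(theta_p) and q = z cos(theta_q) - y sin(theta_q).

    Generalises uv_to_y/uv_to_z etc. to any two planes with different angles.
    """
    s = np.sin(theta_q - theta_p)
    y = (p * np.cos(theta_q) - q * np.cos(theta_p)) / s
    z = (p * np.sin(theta_q) - q * np.sin(theta_p)) / s
    return y, z

def uv_to_yz(u, v):
    return pair_to_yz(u, v, THETA["u"], THETA["v"])

def uw_to_yz(u, w):
    return pair_to_yz(u, w, THETA["u"], THETA["w"])

def vw_to_yz(v, w):
    return pair_to_yz(v, w, THETA["v"], THETA["w"])

def yz_to_uvw(y, z):
    """Project a (y, z) point onto the U, V and W wire coordinates."""
    return tuple(z * np.cos(t) - y * np.sin(t) for t in (THETA["u"], THETA["v"], THETA["w"]))

# Round trip with a made-up point: (y, z) -> (u, v, w) -> (y, z) from each plane pair
y0, z0 = 120.0, 350.0
u, v, w = yz_to_uvw(y0, z0)
for y, z in (uv_to_yz(u, v), uw_to_yz(u, w), vw_to_yz(v, w)):
    assert np.isclose(y, y0) and np.isclose(z, z0)
```

Because pair_to_yz reduces to the same expressions as the six original functions (for example, vw_to_yz gives $(v\cos\theta_w - w\cos\theta_v)/\sin(\theta_w - \theta_v)$ for y, exactly as vw_to_y does), either style should give identical results.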