# Vertex performance assessment

This notebook processes the output of the <code>VertexMonitoringAlgorithm</code> to allow comparison of the vertexing performance of different vertexing implementations.

The setup for this assessment allows the comparison of either two or three samples (three can be helpful for looking at the performance of individual passes of the vertexing network), and it should be fairly clear how to adapt the functions to present a single case without a reference.
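
As a rough sketch of that adaptation, the per-sample normalisation used by the plotting functions can be written for any number of samples, including just one. The helper name `normalised_histograms` and the toy values are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def normalised_histograms(samples, bins):
    """Map each label to the fraction of that sample's events in each bin.

    `samples` is a dict of label -> 1D array; a single entry gives the
    no-reference case. Fractions are relative to the full sample size, so
    events outside the bin range leave the total below 1, matching the
    weights = 1 / len(sample) convention of the plot functions.
    """
    fractions = {}
    for label, values in samples.items():
        values = np.asarray(values, dtype=float)
        counts, _ = np.histogram(values, bins=bins)
        fractions[label] = counts / len(values)
    return fractions

bins = np.linspace(0, 10, 20)
# Hypothetical dr values in cm, for illustration only
toy = {"Pandora DL": np.array([0.3, 0.7, 1.2, 4.0, 25.0])}
fractions = normalised_histograms(toy, bins)
# One of the five events lies beyond 10 cm, so the fractions sum to 0.8
print(round(float(fractions["Pandora DL"].sum()), 3))  # 0.8
```

Looping over a dict of labelled samples like this avoids maintaining separate two and three sample versions of each plot function.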

The original implementation of this notebook assumed a comparison between the Pandora vertexing BDT and the newer Pandora deep-learning approach, and as such, the plot labels typically refer to 'Pandora' as the reference BDT version, and 'Pandora DL' as the deep-learning version, with pass 1 and pass 2 suffixes as appropriate.

Hopefully the individual function names are fairly self-explanatory given that context. The cells up to the 'Two sample comparisons' section can be run in order, possibly after updating the labels to suit the current use case.

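After this, you can run either the two or three sample sections after updating the input file locations and the output filename prefixes as appropriate. Note that the three sample section reuses the two samples loaded at the start of the two sample section, so that loading cell must be run first.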

The statistics of interest in these sections are the 'drXY' values, which give the distance between the reconstructed and true vertices within which XY% of all events fall, and the '% < Xcm' values, which give the fraction of events whose reconstructed vertex lies within a given distance of the true vertex. Finally, these sections produce plots showing the distributions of dr, dx, dy and dz.
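
A minimal sketch of how these two statistics can be computed, assuming a NumPy array of dr values (in cm) for successfully reconstructed events. The helper names `dr_percentiles` and `fraction_within` and the toy data are illustrative, not part of the notebook.

```python
import numpy as np

def dr_percentiles(drs, percentiles=(68.2, 90.0, 95.45)):
    """Return the dr value below which each requested percentage of events falls."""
    return {p: float(np.percentile(drs, p)) for p in percentiles}

def fraction_within(drs, thresholds=(1, 2, 3, 5, 10)):
    """Return the percentage of events with dr strictly below each threshold (cm)."""
    drs = np.asarray(drs)
    return {t: 100.0 * float(np.mean(drs < t)) for t in thresholds}

# Hypothetical dr values in cm, for illustration only
toy_drs = np.array([0.2, 0.5, 0.8, 1.5, 2.5, 4.0, 7.0, 12.0, 30.0, 100.0])

for p, value in dr_percentiles(toy_drs).items():
    print(f"dr at {p}%: {value:.1f} cm")
for t, pct in fraction_within(toy_drs).items():
    print(f"% < {t}cm: {pct:.1f}")
```

Taking the mean of the boolean mask counts every event below the threshold and returns zero, rather than raising an error, when no event passes.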

The 'DL recovered vertex performance' and 'True neutrino energy plots' sections are rather special cases that were developed in the context of atmospheric neutrinos and so may not be of interest to your use case. For atmospheric neutrino samples, the DL vertexing not only out-performed the BDT in vertex resolution in general, it also identified vertices in a substantial number of cases where the BDT had failed altogether. These sections extract the subset of events for which this is true to understand the performance in this special case.

In [None]:
import uproot, numpy as np

In [None]:
def load_file(filename, treename):
    # Read the success flags, true neutrino energies and vertex residuals from the tree
    with uproot.open(filename) as file:
        tree = file[treename]
        successes = tree['success'].array(library="np")
        true_nu_energy = tree['trueNuEnergy'].array(library="np")
        drs = tree['dr'].array(library="np")
        dxs = tree['dx'].array(library="np")
        dys = tree['dy'].array(library="np")
        dzs = tree['dz'].array(library="np")
    # Indices of the events for which a vertex was successfully reconstructed
    passing_idx = np.where(successes == 1)
    return drs, dxs, dys, dzs, passing_idx, true_nu_energy

In [None]:
import os

def save_plot(fig, filename, subdir=None):
    # Save the figure in several formats under images/<format>/<subdir>/
    if subdir is None:
        subdir = ""
    elif subdir.startswith("/"):
        subdir = subdir[1:]

    for img_type in [ "png", "svg", "eps", "pdf" ]:
        out_dir = os.path.join('images', img_type, subdir)
        # makedirs creates any missing parent directories, including nested subdirs
        os.makedirs(out_dir, exist_ok=True)
        fig.savefig(os.path.join(out_dir, f'{filename}.{img_type}'), dpi=200, facecolor='w')

In [None]:
labelsize=14
titlesize=18

import matplotlib.pyplot as plt
import matplotlib.ticker as tck

def plot_dr(drs1, drs2, file_prefix):
    fig, ax = plt.subplots(figsize=(12,8))
    
    bins = np.logspace(-2, 7, 10, base=3)
    weights1 = np.ones_like(drs1) / len(drs1)
    ax.hist(drs1, bins=bins, weights=weights1, histtype='step', lw=2, label="Pandora")
    weights2 = np.ones_like(drs2) / len(drs2)
    ax.hist(drs2, bins=bins, weights=weights2, histtype='step', lw=2, label="Pandora Refine")
    
    ax.set_title("3D vertex reconstruction", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("reco - true (cm)", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.set_xscale('log')
    ax.set_xticks(bins)
    ax.get_xaxis().set_major_formatter(tck.LogFormatter(base=3))
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')
    
    
def plot_dr_zoom(drs1, drs2, file_prefix):
    fig, ax = plt.subplots(figsize=(12,8))
    
    bins = np.linspace(0, 10, 20)
    weights1 = np.ones_like(drs1) / len(drs1)
    ax.hist(drs1, bins=bins, weights=weights1, histtype='step', lw=2, label="Pandora")
    weights2 = np.ones_like(drs2) / len(drs2)
    ax.hist(drs2, bins=bins, weights=weights2, histtype='step', lw=2, label="Pandora Refine")
    
    ax.set_title("3D vertex reconstruction", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("reco - true (cm)", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')


def plot_dx(dxs1, dxs2, file_prefix, axis='x'):
    fig, ax = plt.subplots(figsize=(12,8))
    
    #bins = np.linspace(-2.625, 2.625, 22)
    bins = np.linspace(-2.55, 2.55, 52)
    #bins = np.linspace(-5.55, 5.55, 65)
    weights1 = np.ones_like(dxs1) / len(dxs1)
    ax.hist(dxs1, bins=bins, weights=weights1, histtype='step', lw=2, label="Pandora")
    weights2 = np.ones_like(dxs2) / len(dxs2)
    ax.hist(dxs2, bins=bins, weights=weights2, histtype='step', lw=2, label="Pandora Refine")
    
    ax.set_title(f"Vertex reconstruction (d{axis})", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("reco - true (cm)", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')


def plot_energy(energy0, energy1, file_prefix):
    fig, ax = plt.subplots(figsize=(12,8))
    
    bins = np.logspace(-3, 7, 11, base=2)
    ax.hist(energy0, bins=bins, histtype='step', lw=2, label='All events')
    ax.hist(energy1, bins=bins, histtype='step', lw=2, label='Reconstructed events')
    
    ax.set_title("True neutrino energy", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("energy (GeV)", fontsize=titlesize)
    ax.set_ylabel("events", fontsize=titlesize)
    ax.legend(fontsize=titlesize)
    ax.set_xscale('log')
    ax.set_xticks(bins)
    ax.get_xaxis().set_major_formatter(tck.LogFormatter(base=2))
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')


def plot_dr_vs_energy(energy, drs0):
    fig, ax = plt.subplots(figsize=(12,8))
    
    bins = np.logspace(-3, 7, 11, base=2)
    print(bins)
    # Bin using the energies passed in, rather than the notebook-level energy_1 and passing_idx_1
    indices_set = [ np.where((energy >= bins[i]) & (energy < bins[i + 1]))
                   for i in range(len(bins) - 1) ]
    drs_set = [ drs0[indices] for indices in indices_set ]
    bins_dr = np.logspace(-3, 10, 14, base=2)
    for i, drs in enumerate(drs_set):
        weights = np.ones_like(drs) / len(drs)
        ax.hist(drs, histtype='step', bins=bins_dr, weights=weights, lw=2, label=f"{bins[i]} - {bins[i+1]} GeV")
    ax.set_title("dr vs true neutrino energy", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("dr", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.set_xscale('log')
    ax.set_xticks(bins_dr)
    ax.get_xaxis().set_major_formatter(tck.LogFormatter(base=2))
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()

In [None]:
labelsize=14
titlesize=18

import matplotlib.pyplot as plt
import matplotlib.ticker as tck

def plot3_dr(drs0, drs1, drs2, file_prefix):
    fig, ax = plt.subplots(figsize=(12,8))
    
    bins = np.logspace(-2, 7, 10, base=3)
    weights0 = np.ones_like(drs0) / len(drs0)
    ax.hist(drs0, bins=bins, weights=weights0, histtype='step', lw=2, label="Pandora BDT")
    weights1 = np.ones_like(drs1) / len(drs1)
    ax.hist(drs1, bins=bins, weights=weights1, histtype='step', lw=2, label="Pandora DL")
    weights2 = np.ones_like(drs2) / len(drs2)
    ax.hist(drs2, bins=bins, weights=weights2, histtype='step', lw=2, label="Pandora Refine")
    
    ax.set_title("3D vertex reconstruction", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("reco - true (cm)", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.set_xscale('log')
    ax.set_xticks(bins)
    ax.get_xaxis().set_major_formatter(tck.LogFormatter(base=3))
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')
    
    
def plot3_dr_zoom(drs0, drs1, drs2, file_prefix):
    fig, ax = plt.subplots(figsize=(12,8))
    
    bins = np.linspace(0, 10, 20)
    weights0 = np.ones_like(drs0) / len(drs0)
    ax.hist(drs0, bins=bins, weights=weights0, histtype='step', lw=2, label="Pandora BDT")
    weights1 = np.ones_like(drs1) / len(drs1)
    ax.hist(drs1, bins=bins, weights=weights1, histtype='step', lw=2, label="Pandora DL")
    weights2 = np.ones_like(drs2) / len(drs2)
    ax.hist(drs2, bins=bins, weights=weights2, histtype='step', lw=2, label="Pandora Refine")
    
    ax.set_title("3D vertex reconstruction", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("reco - true (cm)", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')


def plot3_dx(dxs0, dxs1, dxs2, file_prefix, axis='x'):
    fig, ax = plt.subplots(figsize=(12,8))
    
    #bins = np.linspace(-2.625, 2.625, 22)
    bins = np.linspace(-2.55, 2.55, 52)
    #bins = np.linspace(-5.55, 5.55, 65)
    weights0 = np.ones_like(dxs0) / len(dxs0)
    ax.hist(dxs0, bins=bins, weights=weights0, histtype='step', lw=2, label="Pandora BDT")
    weights1 = np.ones_like(dxs1) / len(dxs1)
    ax.hist(dxs1, bins=bins, weights=weights1, histtype='step', lw=2, label="Pandora DL")
    weights2 = np.ones_like(dxs2) / len(dxs2)
    ax.hist(dxs2, bins=bins, weights=weights2, histtype='step', lw=2, label="Pandora Refine")
    
    
    ax.set_title(f"Vertex reconstruction (d{axis})", fontsize=titlesize)
    ax.tick_params(axis='x', labelsize=labelsize)
    ax.tick_params(axis='y', labelsize=labelsize)
    ax.set_xlabel("reco - true (cm)", fontsize=titlesize)
    ax.set_ylabel("f", fontsize=titlesize)
    ax.legend(fontsize=titlesize)
    
    fig.tight_layout()
    plt.show()
    save_plot(fig, f'{file_prefix}')

# Two sample comparisons

In [None]:
drs_0, dxs_0, dys_0, dzs_0, passing_idx_0, energy_0 = load_file('vertices_orig_pass2.root', 'vertices')
drs_1, dxs_1, dys_1, dzs_1, passing_idx_1, energy_1 = load_file('vertices_pass2.root', 'vertices2')

In [None]:
plot_dx(dxs_0[passing_idx_0], dxs_1[passing_idx_1], "atmos_dxs", axis='x')
plot_dx(dys_0[passing_idx_0], dys_1[passing_idx_1], "atmos_dys", axis='y')
plot_dx(dzs_0[passing_idx_0], dzs_1[passing_idx_1], "atmos_dzs", axis='z')

In [None]:
plot_dr(drs_0[passing_idx_0], drs_1[passing_idx_1], "atmos_deltas")
plot_dr_zoom(drs_0[passing_idx_0], drs_1[passing_idx_1], "atmos_deltas_zoom")

In [None]:
print(f'dr68: {np.percentile(drs_0[passing_idx_0], 68.2):.1f}')
print(f'dr90: {np.percentile(drs_0[passing_idx_0], 90.0):.1f}')
print(f'dr95: {np.percentile(drs_0[passing_idx_0], 95.45):.1f}')

In [None]:
print(f'dr68: {np.percentile(drs_1[passing_idx_1], 68.2):.1f}')
print(f'dr90: {np.percentile(drs_1[passing_idx_1], 90.0):.1f}')
print(f'dr95: {np.percentile(drs_1[passing_idx_1], 95.45):.1f}')

In [None]:
sorted_drs_0 = np.sort(drs_0[passing_idx_0])
# The mean of the boolean mask is the fraction of events below the cut (and is 0 if none pass)
for cut in (1, 2, 3, 5, 10):
    print(f"% < {cut}cm: {100 * np.mean(sorted_drs_0 < cut):.1f}")

In [None]:
sorted_drs_1 = np.sort(drs_1[passing_idx_1])
# The mean of the boolean mask is the fraction of events below the cut (and is 0 if none pass)
for cut in (1, 2, 3, 5, 10):
    print(f"% < {cut}cm: {100 * np.mean(sorted_drs_1 < cut):.1f}")

# Three sample comparisons

In [None]:
drs_i, dxs_i, dys_i, dzs_i, passing_idx_i, energy_i = load_file('vertices_atmos_dl_pass1.root', 'vertices')

In [None]:
plot3_dx(dxs_0[passing_idx_0], dxs_1[passing_idx_1], dxs_i[passing_idx_i], "dxs", axis='x')
plot3_dx(dys_0[passing_idx_0], dys_1[passing_idx_1], dys_i[passing_idx_i], "dys", axis='y')
plot3_dx(dzs_0[passing_idx_0], dzs_1[passing_idx_1], dzs_i[passing_idx_i], "dzs", axis='z')

In [None]:
plot3_dr(drs_0[passing_idx_0], drs_1[passing_idx_1], drs_i[passing_idx_i], "deltas")
plot3_dr_zoom(drs_0[passing_idx_0], drs_1[passing_idx_1], drs_i[passing_idx_i], "deltas_zoom")

In [None]:
print(f'dr68: {np.percentile(drs_0[passing_idx_0], 68.2):.1f}')
print(f'dr90: {np.percentile(drs_0[passing_idx_0], 90.0):.1f}')
print(f'dr95: {np.percentile(drs_0[passing_idx_0], 95.45):.1f}')

In [None]:
print(f'dr68: {np.percentile(drs_1[passing_idx_1], 68.2):.1f}')
print(f'dr90: {np.percentile(drs_1[passing_idx_1], 90.0):.1f}')
print(f'dr95: {np.percentile(drs_1[passing_idx_1], 95.45):.1f}')

In [None]:
print(f'dr68: {np.percentile(drs_i[passing_idx_i], 68.2):.1f}')
print(f'dr90: {np.percentile(drs_i[passing_idx_i], 90.0):.1f}')
print(f'dr95: {np.percentile(drs_i[passing_idx_i], 95.45):.1f}')

In [None]:
sorted_drs_0 = np.sort(drs_0[passing_idx_0])
# The mean of the boolean mask is the fraction of events below the cut (and is 0 if none pass)
for cut in (1, 2, 3, 5, 10):
    print(f"% < {cut}cm: {100 * np.mean(sorted_drs_0 < cut):.1f}")

In [None]:
sorted_drs_1 = np.sort(drs_1[passing_idx_1])
# The mean of the boolean mask is the fraction of events below the cut (and is 0 if none pass)
for cut in (1, 2, 3, 5, 10):
    print(f"% < {cut}cm: {100 * np.mean(sorted_drs_1 < cut):.1f}")

In [None]:
sorted_drs_i = np.sort(drs_i[passing_idx_i])
# The mean of the boolean mask is the fraction of events below the cut (and is 0 if none pass)
for cut in (1, 2, 3, 5, 10):
    print(f"% < {cut}cm: {100 * np.mean(sorted_drs_i < cut):.1f}")

# DL recovered vertex performance
The atmospheric vertexing BDT fails to reconstruct a vertex at all in a significant fraction of events (approx 9,000 events). The DL vertexing is able to recover about half of these cases, and this section looks at the characteristics of that subset. The DL performance on the recovered events is clearly worse than on the subset of events where both vertexing approaches succeed, which indicates that the recovered events represent more challenging environments. Absent this subset, the DL vertex performance would appear stronger still relative to the BDT.
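
The selection of these subsets can be sketched with NumPy set operations on the passing indices. This assumes, as the notebook implicitly does, that both input files contain the same events in the same order, so that an index refers to the same event in each sample. The success flags below are toy values for illustration only.

```python
import numpy as np

# Hypothetical success flags for the same six events in the two samples
success_bdt = np.array([1, 0, 1, 0, 0, 1])
success_dl = np.array([1, 1, 1, 0, 1, 0])

passing_bdt = np.where(success_bdt == 1)[0]
passing_dl = np.where(success_dl == 1)[0]

# Events where the DL vertexing succeeds but the BDT fails
recovered = np.setdiff1d(passing_dl, passing_bdt)
# Events where both approaches succeed
both = np.intersect1d(passing_dl, passing_bdt)

print(recovered)  # [1 4]
print(both)       # [0 2]
```

Both functions return sorted, unique index arrays that can be used directly to index the dr, dx, dy and dz arrays.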

In [None]:
# Events where sample 1 succeeds but sample 0 fails
passing_idx_only_1 = np.setdiff1d(passing_idx_1[0], passing_idx_0[0])
# Events where both samples succeed
passing_idx_both = np.intersect1d(passing_idx_1[0], passing_idx_0[0])

In [None]:
plot_dx(dxs_1[passing_idx_both], dxs_1[passing_idx_only_1], "atmos_iso_dxs", axis='x')
plot_dx(dys_1[passing_idx_both], dys_1[passing_idx_only_1], "atmos_iso_dys", axis='y')
plot_dx(dzs_1[passing_idx_both], dzs_1[passing_idx_only_1], "atmos_iso_dzs", axis='z')

In [None]:
plot_dr(drs_1[passing_idx_both], drs_1[passing_idx_only_1], "atmos_iso_deltas")
plot_dr_zoom(drs_1[passing_idx_both], drs_1[passing_idx_only_1], "atmos_iso_deltas_zoom")

In [None]:
passing_idx_both = list(set(passing_idx_1[0]) - set(passing_idx_only_1))
plot_dx(dxs_1[passing_idx_1], dxs_1[passing_idx_both], "atmos_opt_dxs", axis='x')
plot_dx(dys_1[passing_idx_1], dys_1[passing_idx_both], "atmos_opt_dys", axis='y')
plot_dx(dzs_1[passing_idx_1], dzs_1[passing_idx_both], "atmos_opt_dzs", axis='z')

In [None]:
sorted_drs_1 = np.sort(drs_1[passing_idx_both])
# The mean of the boolean mask is the fraction of events below the cut (and is 0 if none pass)
for cut in (1, 2, 3, 5, 10):
    print(f"% < {cut}cm: {100 * np.mean(sorted_drs_1 < cut):.1f}")

# True neutrino energy plots

In [None]:
plot_energy(energy_0, energy_0[passing_idx_0], "energy_bdt")

In [None]:
plot_energy(energy_1, energy_1[passing_idx_1], "energy_dl")

In [None]:
plot_dr_vs_energy(energy_1[passing_idx_1], drs_1[passing_idx_1])