# Visualization and clustering of B meson decays
This notebook describes a set of tools for visualization and clustering of B meson decays. Clustering is performed with the k-means algorithm.

In [None]:
import decay_tree_tools as dtt
import kmeans_tools     as kmt
import ROOT             as r
import numpy            as np

Open the TTree using the PyROOT bindings of the ROOT framework.

In [None]:
filename = "test_tree.root"
rfile  = r.TFile(filename)
intree = rfile.Get('TEvent')
intree.GetEntries()

The TTree is assumed to contain the branches gen_idhep, gen_daF and gen_daL, each holding an array of integers. The gen_idhep array contains the EvtGen MC codes of the generated particles; the gen_daF and gen_daL arrays contain the indices of the first and last daughter of each particle, respectively. These arrays can be obtained with the gen_table function from decay_tree_tools.

The arguments of gen_table are:
* tree: TTree
* evt: index of the queried event

In [None]:
evtn = 6000
idhep, daF, daL = dtt.gen_table(intree,evtn)
for i in range(10):
    print(" ".join([str(idhep[i]).rjust(7),str(daF[i]).rjust(3),str(daL[i]).rjust(3)]))
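To make the layout of these three arrays concrete, the following sketch walks a small hand-written table. The toy values, the PDG-style particle codes, the helper names `daughters` and `decay_chain`, and the convention that indices are 0-based with -1 marking a particle without daughters are all assumptions made for illustration; the convention actually used by decay_tree_tools may differ.

```python
# Toy generator table (hypothetical values, PDG-style codes).
# index:   0       1     2     3    4    5     6
idhep = [300553, 511, -511, -411, 211, 411, -211]
daF   = [     1,   3,    5,   -1,  -1,  -1,   -1]
daL   = [     2,   4,    6,   -1,  -1,  -1,   -1]

def daughters(i, daF, daL):
    """Indices of the direct daughters of particle i (empty if it has none)."""
    if daF[i] < 0:  # assumed marker for "no daughters"
        return []
    return list(range(daF[i], daL[i] + 1))

def decay_chain(i, idhep, daF, daL):
    """Depth-first list of codes for particle i and all of its descendants."""
    codes = [idhep[i]]
    for d in daughters(i, daF, daL):
        codes.extend(decay_chain(d, idhep, daF, daL))
    return codes

# Collect one chain per B meson found in the table.
b_indices = [i for i, code in enumerate(idhep) if abs(code) == 511]
chains = [decay_chain(i, idhep, daF, daL) for i in b_indices]
print(chains)
```

Because daughters occupy a contiguous index range, a single (first, last) pair per particle is enough to reconstruct the whole tree.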

The full_gv_decay_tree function receives the same arguments and returns the graphviz graph corresponding to the full decay tree of the queried event. The decay tree graph can be displayed inline in this notebook.

In [None]:
decay_tree = dtt.full_gv_decay_tree(intree,evtn)
decay_tree

The graph can be rendered and saved as .gv and .pdf files with the following line (commented out here):

In [None]:
#decay_tree.render('test-output/decaytree.gv', view=True)

Get the graphs for all B meson decays in the queried event.

In [None]:
decay_tree_b = dtt.b_meson_gv_decay_trees(intree,evtn)
len(decay_tree_b)

Two graphs are obtained. Let's take a look at them.

In [None]:
decay_tree_b[0]

In [None]:
decay_tree_b[1]

Let's look at the decay tree representation adopted for the k-means algorithm.

In [None]:
codes = kmt.b_decay_codes(intree,evtn)
len(codes)

In [None]:
len(codes[0])

In [None]:
len(codes[1])

Build a sparse matrix from the TTree. Each row of the matrix corresponds to one B meson decay chain.

In [None]:
matrix = kmt.tree_to_csr_matrix(intree)

The number of rows in the matrix is twice the number of events in the TTree, since each event contributes two B meson decay chains.

In [None]:
matrix.shape
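The encoding used inside kmeans_tools is not shown in this notebook. As a rough illustration of how one row per decay chain can be stored in compressed sparse row (CSR) form, the sketch below counts how often each particle code occurs in a chain. This bag-of-particles encoding is an assumption and not necessarily what tree_to_csr_matrix does; the toy chains and the helper name `chains_to_csr` are hypothetical.

```python
from collections import Counter

def chains_to_csr(chains):
    """Return (data, indices, indptr, shape, vocab) for a list of code lists."""
    vocab = {}                        # particle code -> column index
    data, indices, indptr = [], [], [0]
    for chain in chains:
        counts = Counter(chain)
        for code in sorted(counts):
            # Assign the next free column to a code seen for the first time.
            col = vocab.setdefault(code, len(vocab))
            indices.append(col)
            data.append(counts[code])
        indptr.append(len(indices))   # marks where this row ends
    shape = (len(chains), len(vocab))
    return data, indices, indptr, shape, vocab

# Two toy decay chains (hypothetical PDG-style codes).
chains = [
    [511, -411, 211],
    [-511, 411, -211, 111, 111],
]
data, indices, indptr, shape, vocab = chains_to_csr(chains)
print(shape)
print(indptr)
```

The triple `(data, indices, indptr)` together with `shape` is the form accepted by `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)`.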

Building the sparse matrix is the most computationally intensive part of the procedure. It can be done once, with the result saved to an .npz file and reloaded whenever it is needed later.

In [None]:
kmt.save_sparse_csr("test_sparse_matrix",matrix)

In [None]:
mtx = kmt.load_sparse_csr("test_sparse_matrix.npz")

mtx holds the same matrix as matrix.

In [None]:
mtx.shape
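A CSR matrix is fully described by three arrays plus its shape, so it can be stored with `numpy.savez`. The sketch below shows such a round trip on hand-written arrays. Whether save_sparse_csr and load_sparse_csr work this way internally is an assumption, and the file name is hypothetical.

```python
import os
import tempfile
import numpy as np

# CSR components of the 2 x 4 matrix [[1, 0, 2, 0], [0, 0, 0, 3]].
data    = np.array([1, 2, 3])
indices = np.array([0, 2, 3])
indptr  = np.array([0, 2, 3])
shape   = np.array([2, 4])

# Write all four arrays into a single .npz archive in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_sparse_matrix.npz")
np.savez(path, data=data, indices=indices, indptr=indptr, shape=shape)

# Read them back by name.
names = ("data", "indices", "indptr", "shape")
with np.load(path) as loader:
    restored = {name: loader[name] for name in names}

# Check that every array survived the round trip unchanged.
originals = (data, indices, indptr, shape)
same = all(np.array_equal(restored[n], o) for n, o in zip(names, originals))
print(same)
```

SciPy also provides `scipy.sparse.save_npz` and `scipy.sparse.load_npz`, which handle this bookkeeping directly for sparse matrices.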

Let's run the k-means clustering algorithm.

In [None]:
k = 30
maxiter = 500
heterogeneity = []
centroids = kmt.get_kpp_centroids(mtx,k)
centroids, cluster_assignment = kmt.kmeans(mtx,k,centroids,maxiter,record_heterogeneity=heterogeneity,verbose=True)
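For readers unfamiliar with the two steps called above, here is a minimal sketch of k-means++ seeding followed by Lloyd iterations, with the heterogeneity (the sum of squared distances from each row to its assigned centroid) recorded after every iteration. It uses small dense arrays for brevity, whereas the notebook works on a sparse matrix. The function names `kpp_centroids` and `kmeans_sketch` and the toy data are hypothetical, and this is not the kmeans_tools implementation.

```python
import numpy as np

def kpp_centroids(X, k, rng):
    """k-means++ seeding; assumes X contains at least k distinct rows."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        # Squared distance from every row to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        # Rows far from the existing centers are more likely to be picked.
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

def kmeans_sketch(X, k, centroids, maxiter, record_heterogeneity=None):
    """Lloyd's algorithm; returns the centroids and the cluster of each row."""
    centroids = centroids.astype(float)  # astype returns a copy
    assignment = None
    for _ in range(maxiter):
        # Assignment step: nearest centroid for every row.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assignment = d2.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break  # converged: no row changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = X[assignment == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
        if record_heterogeneity is not None:
            h = ((X - centroids[assignment]) ** 2).sum()
            record_heterogeneity.append(float(h))
    return centroids, assignment

# Two well-separated toy groups of points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
rng = np.random.default_rng(0)
history = []
init = kpp_centroids(X, 2, rng)
final_centroids, assignment = kmeans_sketch(X, 2, init, maxiter=100,
                                            record_heterogeneity=history)
print(assignment, history[-1])
```

Each Lloyd iteration can only lower the heterogeneity or leave it unchanged, which is why the recorded values are useful as a convergence diagnostic.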

The heterogeneity value for each k-means iteration is stored in the heterogeneity list. It can be plotted as follows:

In [None]:
%matplotlib inline
kmt.plot_heterogeneity(heterogeneity, k)

In [None]:
print("k = %d, h = %f" % (k,heterogeneity[-1]))

The obtained clusters can now be explored. Let's print the number of B decay chains in each cluster, sorted in decreasing order.

In [None]:
print(sorted(np.bincount(cluster_assignment))[::-1])

Finally, save the decay tree graph nearest to each centroid as a .pdf file (ntrees_to_save=1). The graphs for the 8 most populated clusters will be opened in the system PDF viewer.

In [None]:
gvts = {}
kmt.visualize_clusters(intree, mtx, centroids, cluster_assignment, k, ntrees_to_save=1, collect_gvt = gvts)
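The idea of picking a representative decay for each cluster can be sketched as follows: for every cluster, find the member rows closest to the centroid, and rank the clusters by population. This is only an illustration on dense toy data; the helper name `nearest_to_centroids` is hypothetical, and how visualize_clusters selects and draws its trees internally is not shown in this notebook.

```python
import numpy as np

def nearest_to_centroids(X, centroids, assignment, ntrees=1):
    """For each cluster, row indices of its `ntrees` members nearest the centroid."""
    nearest = {}
    for j, c in enumerate(centroids):
        members = np.flatnonzero(assignment == j)
        if members.size == 0:
            continue  # empty cluster, nothing to show
        d2 = ((X[members] - c) ** 2).sum(axis=1)
        nearest[j] = members[np.argsort(d2)[:ntrees]].tolist()
    return nearest

# Toy data: four rows in cluster 0 and three rows in cluster 1.
X = np.array([[0.0, 0.0], [0.0, 2.0], [0.4, 0.9], [2.0, 0.0],
              [10.0, 10.0], [10.0, 13.0], [10.0, 11.0]])
assignment = np.array([0, 0, 0, 0, 1, 1, 1])
centroids = np.array([X[assignment == j].mean(axis=0) for j in range(2)])

# Cluster indices ordered by population, largest first.
sizes = np.bincount(assignment)
order = np.argsort(sizes)[::-1]

nearest = nearest_to_centroids(X, centroids, assignment, ntrees=1)
for j in order:
    print(j, sizes[j], nearest[int(j)])
```

The returned row indices can then be mapped back to (event, B meson) pairs to draw the corresponding decay trees.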