# Results of shower reconstruction
This notebook analyzes the results of shower reconstruction with Machine Learning classification.
It makes it possible to share and comment on the quality of the reconstruction.
It assumes that all reconstruction steps have already been performed, so that the reconstructed showers are available.

## Initialization
First, we import the required modules.

In [1]:
import numpy as np
import pandas as pd
import ROOT as r
import root_numpy as rp

Welcome to JupyROOT 6.22/06


Then we read the files containing the dataframes, one per run.

In [2]:
nruns = 4
runnumbers = [5, 6, 3, 7]
showerdataframes = [] 

showerdataframes.append(pd.read_csv("/eos/experiment/ship/data/DESY19TB/DE19_R5/RandomForest/Result_data.csv"))
showerdataframes.append(pd.read_csv("/eos/experiment/ship/data/DESY19TB/DE19_R6/RandomForest/Result_data.csv"))
showerdataframes.append(pd.read_csv("/eos/experiment/ship/data/DESY19TB/DE19_R3/RandomForest/Dati_nuovo.csv"))
showerdataframes.append(pd.read_csv("/eos/experiment/ship/data/DESY19TB/DE19_R7/RandomForest/Result_data.csv"))

energies = [2,4,6,6]

We also create one output file per run to store the histograms.

In [3]:
outputfilenames = ["plots/DESYRUN5histos_data.root","plots/DESYRUN6histos_data.root","plots/DESYRUN3histos_data.root","plots/DESYRUN7histos_data.root"]
outputfile = []
for run in range(nruns):
    outputfile.append(r.TFile(outputfilenames[run],"RECREATE"))

## Selection of segments classified as signal, computing size
We select the segments for which the Random Forest returned a positive output. How many are there in each shower?

In [4]:
def extractsignalshower(datadf):
    # drop segments with undefined DeltaT
    df = datadf[datadf["DeltaT"].notna()]
    # keep only segments classified as signal by the Random Forest
    signaldf = df[df["Y_pred_forest_data"] == 1]

    # number of segments within the same Ishower (ordered by increasing Ishower)
    sizedataset = signaldf.groupby("Ishower").count()
    size = sizedataset["ID"].to_numpy()
    return signaldf, size
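
To see what this selection produces, the following sketch applies the same two cuts to a tiny hand-made table. All values are invented for illustration; only the column names (`ID`, `Ishower`, `DeltaT`, `Y_pred_forest_data`) are taken from the notebook, and the `toy...` variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy table: values are invented, column names follow the notebook
toydf = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Ishower": [0, 0, 0, 1, 1, 2],
    "DeltaT": [0.1, np.nan, 0.3, 0.2, 0.4, 0.5],
    "Y_pred_forest_data": [1, 1, 1, 0, 1, 1],
})

# Same two cuts as extractsignalshower: defined DeltaT, then positive prediction
toysignal = toydf[toydf["DeltaT"].notna() & (toydf["Y_pred_forest_data"] == 1)]

# Segments per shower; the index of the result holds the Ishower values
toysizes = toysignal.groupby("Ishower")["ID"].count()
print(toysizes)
```

Keeping the grouped result as a Series, rather than converting it to a bare array, preserves the link between each size and its `Ishower` value.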

In [5]:
signaldataframes = []
showersizes = []
for run in range(nruns):
 signaldf, size = extractsignalshower(showerdataframes[run])
 signaldataframes.append(signaldf)
 showersizes.append(size)

## Plotting size histograms
Let us check the size distribution of the reconstructed showers.

In [6]:
#make histograms
hsizeML = []
for run in range(nruns):
 hsizeML.append(r.TH1D("hsizeML{}".format(run),"Size of showers reconstructed by Random Forest for RUN{};Nsegments".format(runnumbers[run]), 20,0,200))
 rp.fill_hist(hsizeML[run],showersizes[run])
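
As a ROOT-independent cross-check of the binning, the same 20 bins between 0 and 200 can be filled with `np.histogram`. This is only a sketch on invented sizes, not part of the original analysis. Note that ROOT stores out-of-range entries in underflow/overflow bins, while `np.histogram` drops them when a `range` is given.

```python
import numpy as np

# Invented shower sizes, for illustration only
toysizes = np.array([5, 12, 48, 55, 61, 120, 199, 250])

# Same binning as hsizeML: 20 bins of width 10 between 0 and 200
counts, edges = np.histogram(toysizes, bins=20, range=(0, 200))

# Print only the populated bins
for low, high, n in zip(edges[:-1], edges[1:], counts):
    if n > 0:
        print(f"{low:.0f}-{high:.0f}: {n}")

# The entry at 250 lies outside the range and is not counted
print("entries inside range:", counts.sum())
```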

In [7]:
%jsroot on
c = r.TCanvas()
c.Divide(2,2)
for run in range(nruns):
 c.cd(run+1)
 hsizeML[run].Draw()
c.Draw()

## Making a selection on the shower
We usually accept only showers with a minimum number of associated segments.

In [8]:
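def acceptshower(signaldf, showersize, minimumsize):
    # showersize follows the groupby order (increasing Ishower), so its positions
    # are not necessarily the Ishower values: map positions back to shower IDs
    showerids = signaldf.groupby("Ishower")["ID"].count().index.to_numpy()
    goodshowers = showerids[showersize >= minimumsize]
    gooddf = signaldf[signaldf["Ishower"].isin(goodshowers)]
    return gooddf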
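
The size array produced by the grouping is ordered by increasing `Ishower`, so the position of an entry equals its shower ID only when the IDs run contiguously from 0 and every shower keeps at least one signal segment. A minimal sketch on invented data shows the difference and one way to select by the actual `Ishower` values; the threshold of 3 is a hypothetical value chosen for this toy table.

```python
import numpy as np
import pandas as pd

# Invented signal segments: the surviving shower IDs are 0, 2 and 5
toysignal = pd.DataFrame({
    "ID": range(7),
    "Ishower": [0, 0, 0, 2, 5, 5, 5],
})

sizes = toysignal.groupby("Ishower")["ID"].count()
minimumsize = 3  # hypothetical threshold for this toy example

# Positions inside the size array of the showers passing the cut
positions = np.where(sizes.to_numpy() >= minimumsize)[0]
# Actual Ishower values of the showers passing the cut
showerids = sizes[sizes >= minimumsize].index.to_numpy()

print("positions:", positions)
print("shower IDs:", showerids)

# Selecting with the shower IDs keeps all segments of the accepted showers
gooddf = toysignal[toysignal["Ishower"].isin(showerids)]
```

Here the two arrays differ, so using positions as IDs would accept shower 2 (one segment) and reject shower 5 (three segments).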

In [9]:
gooddataframes = []
minimumsize = [20,50,50,50]
for run in range(nruns):
    gooddataframes.append(acceptshower(signaldataframes[run],showersizes[run],minimumsize[run]))

In [10]:
#make histograms
hIPnorm = []
hthetaprime = [] 
for run in range(nruns):
    hIPnorm.append(r.TH1D("hIPnorm{}".format(run),"Impact parameter over distance along axis for RUN{};IP/#DeltaZ".format(runnumbers[run]),30,0.,0.3))
    hthetaprime.append(r.TH1D("hthetaprime{}".format(run),"Cone angle with respect to shower start for RUN{};#theta'[rad]".format(runnumbers[run]),40,0,0.04))
    
    rp.fill_hist(hIPnorm[run], gooddataframes[run]["Par_impact_nor"].to_numpy())
    rp.fill_hist(hthetaprime[run],gooddataframes[run]["Angolo_cono"].to_numpy())

In [11]:
cIP = r.TCanvas()
cIP.Divide(2,2)
for run in range(nruns):
 cIP.cd(run+1)
 hIPnorm[run].Draw()
cIP.Draw()

In [12]:
ctheta = r.TCanvas()
ctheta.Divide(2,2)
for run in range(nruns):
 ctheta.cd(run+1)
 hthetaprime[run].Draw()
ctheta.Draw()

In [13]:
#saving histograms to output files
for run in range(nruns):
    outputfile[run].cd()
    hsizeML[run].Write()
    hthetaprime[run].Write()
    hIPnorm[run].Write()

## Shower size dependence on energy
A linear increase of the shower size with energy is expected. We use RUN5, RUN6 and RUN3 (2 GeV, 4 GeV and 6 GeV, following the `energies` list defined above).
We do not use RUN7, since it employs a different material.

In [14]:
def combinevectors(arrx,arry):
 arrx = arrx[:, np.newaxis]
 arry = arry[:, np.newaxis]
 #concatenate along columns
 arrxy = np.concatenate([arrx,arry],axis=1)
 return arrxy
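
For reference, the same pairing of two 1D arrays into an (N, 2) array can be obtained with `np.column_stack`. The sketch below copies the notebook function only to compare the two results on invented input values.

```python
import numpy as np

def combinevectors(arrx, arry):
    # copy of the notebook function, kept here for comparison
    arrx = arrx[:, np.newaxis]
    arry = arry[:, np.newaxis]
    return np.concatenate([arrx, arry], axis=1)

# Invented inputs: three shower sizes and a constant energy
x = np.array([10.0, 20.0, 30.0])
y = np.array([2.0, 2.0, 2.0])

# One (x, y) pair per row, shape (3, 2)
xy = np.column_stack([x, y])
print(xy)
```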


In [19]:
hcali = r.TProfile("hcali","Calibration histogram;N segments; Energy[GeV]",10,0,300,0,10)
hinversecali =  r.TProfile("hinversecali","Nsegments at different energies;Energy[GeV]; N segments",7,0,7,0,300)
for run in range(nruns-1):
    #getting expected energy and size for this run
    nshowers = len(showersizes[run])
    showerenergy = np.zeros(nshowers) + energies[run]
    arr2D = combinevectors(showersizes[run],showerenergy)
    rp.fill_profile(hcali, arr2D)
    arr2D = combinevectors(showerenergy,showersizes[run])
    rp.fill_profile(hinversecali, arr2D)
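
Each bin of a profile histogram holds the mean of the second variable for the entries falling in that bin of the first variable. A minimal NumPy sketch of what `hinversecali` summarizes, assuming one group of showers per energy: the sizes below are invented placeholders, not measured values, and the names `toysizes` and `profilemeans` are hypothetical.

```python
import numpy as np

# Invented per-shower sizes, grouped by beam energy in GeV (not measured values)
toysizes = {
    2: np.array([30, 40, 50]),
    4: np.array([70, 80, 90]),
    6: np.array([110, 120, 130]),
}

profilemeans = {}
for energy, sizes in toysizes.items():
    mean = sizes.mean()
    # standard error of the mean, computed from the spread of the sample
    error = sizes.std() / np.sqrt(len(sizes))
    profilemeans[energy] = (mean, error)
    print(f"{energy} GeV: mean size {mean:.1f} +- {error:.1f}")
```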
    

In [20]:
c_cali = r.TCanvas()
hcali.Draw()
c_cali.Draw()

In [17]:
c_inversecali = r.TCanvas()
hinversecali.Draw()
c_inversecali.Draw()
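
The expected linear dependence can be tested by fitting a straight line to the mean size at each energy. The following is only a sketch under stated assumptions: the mean sizes are invented placeholders rather than results of this notebook, and `energy_from_size` is a hypothetical helper that inverts the fitted line.

```python
import numpy as np

beamenergies = np.array([2.0, 4.0, 6.0])      # GeV, the three energies used above
meansizes = np.array([40.0, 80.0, 120.0])     # invented placeholder values

# Least-squares straight line: meansize = slope * energy + intercept
slope, intercept = np.polyfit(beamenergies, meansizes, deg=1)
print(f"slope = {slope:.2f} segments/GeV, intercept = {intercept:.2f} segments")

def energy_from_size(nsegments):
    # hypothetical helper: invert the fitted line to estimate the energy
    return (nsegments - intercept) / slope
```

With real data, the per-energy means and their errors would come from the profile histogram, and the fit residuals would indicate how well the linear assumption holds.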

In [18]:
for run in range(nruns):
    outputfile[run].Close()