# Particle tagger study at DarkQuest
**Author:** Dowling Wong

This is a demo of how to use the DNN-based multi-class tagger for DarkQuest. Throughout the demo we use dwong.py, the integrated Python module for data processing and tagging. Please refer to [dwong.py](dwong.py) for the detailed implementation.

## Clustering
After reconstructing the hit coordinates on the EMCal, we run a seed-searching algorithm followed by K-Means clustering. Because K-Means does not take the energy of each hit into account, the centroids may shift by more than the tolerance between iterations. The clustering returns the coordinates, labels, energies, and seeds (centroids), sorted in decreasing order of energy. Island clustering is still in the R&D stage because of its computational complexity and inefficiency.
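
A minimal sketch of the two steps, assuming the hits arrive as (x, y) coordinates with energies. Seeds are taken as hits that are the local energy maximum within a radius, K-Means is then started from those seeds, and the clusters are returned most energetic first. The function names, the 3 cm radius, and the iteration limits are hypothetical, this is not the code in dwong.py, and "energy" here means the summed energy per cluster.

```python
import numpy as np


def find_seeds(coords, energy, radius=3.0):
    """Indices of hits that are the energy maximum within `radius` (hypothetical value, cm)."""
    seeds = []
    for i in range(len(coords)):
        # Distance from hit i to every hit (including itself)
        dist = np.linalg.norm(coords - coords[i], axis=1)
        if energy[i] >= energy[dist < radius].max():
            seeds.append(i)
    return np.array(seeds, dtype=int)


def cluster_hits(coords, energy, radius=3.0, n_iter=20, tol=1e-3):
    """Seeded K-Means; clusters returned in decreasing order of summed energy."""
    coords = np.asarray(coords, dtype=float)
    energy = np.asarray(energy, dtype=float)
    seed_idx = find_seeds(coords, energy, radius)
    if len(seed_idx) == 0:
        return coords, np.full(len(coords), -1), np.empty(0), np.empty((0, 2))
    centroids = coords[seed_idx].copy()

    def assign(cents):
        # Plain K-Means assignment: nearest centroid, hit energy is ignored
        d = np.linalg.norm(coords[:, None, :] - cents[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        new = np.array([coords[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(len(centroids))])
        shift = np.abs(new - centroids).max()
        centroids = new
        if shift < tol:
            break

    labels = assign(centroids)
    cluster_e = np.bincount(labels, weights=energy, minlength=len(centroids))
    order = np.argsort(cluster_e)[::-1]      # most energetic cluster first
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))      # old label -> new label
    return coords, rank[labels], cluster_e[order], centroids[order]
```

Because the assignment step ignores energy, a low-energy hit can pull a centroid as strongly as a high-energy one, which is the behaviour described above.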

## Link algorithm
Using the tracklet information from stations 2 and 3, we linearly extrapolate each track to the EMCal and to the station 4 hodoscope (H4). Within a tolerance distance of 8 cm (for most particles the distance should be under 5 cm), we assign the closest unique track to every H4 hit.
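
The extrapolation and matching can be sketched as follows, assuming each tracklet is given by its position (trkl_x, trkl_y, trkl_z) and momentum (trkl_px, trkl_py, trkl_pz), and that each H4 hit has already been converted to an (x, y) point at a known plane z. Real hodoscope paddles measure one coordinate, so an actual implementation may compare only that coordinate. Visiting (hit, track) pairs from closest to farthest is one hypothetical way to realise "closest unique track"; the function names are illustrative.

```python
import numpy as np


def extrapolate(track_pos, track_mom, z_target):
    """Straight-line extrapolation of tracklets to the plane z = z_target.

    track_pos: (N, 3) array of (trkl_x, trkl_y, trkl_z)
    track_mom: (N, 3) array of (trkl_px, trkl_py, trkl_pz)
    """
    track_pos = np.asarray(track_pos, dtype=float)
    track_mom = np.asarray(track_mom, dtype=float)
    dz = z_target - track_pos[:, 2]
    x = track_pos[:, 0] + track_mom[:, 0] / track_mom[:, 2] * dz   # x + (px/pz) dz
    y = track_pos[:, 1] + track_mom[:, 1] / track_mom[:, 2] * dz   # y + (py/pz) dz
    return np.column_stack([x, y])


def link_hits(hit_xy, track_xy, tol=8.0):
    """Greedy closest-unique matching of hits to tracks within `tol` (cm).

    Returns the matched track index for every hit, or -1 if none.
    """
    hit_xy = np.asarray(hit_xy, dtype=float)
    track_xy = np.asarray(track_xy, dtype=float)
    d = np.linalg.norm(hit_xy[:, None, :] - track_xy[None, :, :], axis=2)
    match = np.full(len(hit_xy), -1, dtype=int)
    used = set()
    # Visit (hit, track) pairs from the closest to the farthest
    for flat in np.argsort(d, axis=None):
        h, t = (int(i) for i in np.unravel_index(flat, d.shape))
        if d[h, t] > tol:
            break                    # every remaining pair is farther still
        if match[h] == -1 and t not in used:
            match[h] = t
            used.add(t)
    return match
```

For example, `link_hits(hits, extrapolate(pos, mom, z_h4))` returns one track index per hit, where `z_h4` stands for the (hypothetical here) z position of the H4 plane.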

## Check point: CSV
We use CSV files as a checkpoint: the processed data for each particle collected in each ROOT file are saved with the following columns, which are also used as NN inputs or pushed back into the ROOT file:

```python
["evt_num", "gpz", "wid_x", "wid_y", "wew_x", "wew_y", "seed_x", "seed_y",
 "trkl_x", "trkl_y", "trkl_z", "trkl_px", "trkl_py", "trkl_pz", "E/p",
 "h4_41", "h4_42", "h4_43", "h4_44", "h4_45", "h4_46"]
```

For sample code, see [sample code of csv](ref/gen_csv.ipynb).
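
A minimal sketch of writing such a checkpoint with the standard `csv` module, assuming each particle is a dict keyed by the column names above. The sample CSV read later in this demo shows -9999 in every tracklet column when no tracklet was linked, so missing keys are filled with that value here; the helper name `write_particles` is hypothetical.

```python
import csv

# Column order copied from the list above
COLUMNS = ["evt_num", "gpz", "wid_x", "wid_y", "wew_x", "wew_y", "seed_x", "seed_y",
           "trkl_x", "trkl_y", "trkl_z", "trkl_px", "trkl_py", "trkl_pz", "E/p",
           "h4_41", "h4_42", "h4_43", "h4_44", "h4_45", "h4_46"]
SENTINEL = -9999  # placeholder seen in the sample CSV for a missing tracklet


def write_particles(rows, fileobj):
    """Write one CSV row per particle; keys missing from a row are filled with SENTINEL.

    Open real files with newline="" as the csv module recommends.
    Unknown keys raise ValueError, which catches misspelled column names.
    """
    writer = csv.DictWriter(fileobj, fieldnames=COLUMNS, restval=SENTINEL)
    writer.writeheader()
    writer.writerows(rows)
```

Fixing the column order in one place keeps the CSV consistent with the feature order the NNs expect.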

## DNN taggers
Our DNN multi-class tagger consists of 4 binary classifiers:

- [Electron/Positron](NNs/electronID_w_track_95), with AUC 0.95
- [Muon](NNs/muonID_w_track_99), with AUC 0.99
- [Photon](NNs/photonID_w_track_89), with AUC 0.89
- [Pi+/Pi-](NNs/pi+-_ID_w_track_86), with AUC 0.86

The AUC is measured against a background formed from an equal-weight mixture of electron, positron, photon/Pi0, Pi+, Pi-, and Klong single-particle guns. Each particle is then tagged by comparing the outputs of the four NNs and assigning the class whose classifier scores highest.
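
The AUC is the probability that a randomly chosen signal particle receives a higher score than a randomly chosen background particle. When every background species carries the same total weight, this weighted AUC reduces to the mean of the per-species AUCs. The sketch below uses that identity; the species names and scores are hypothetical, and the pairwise comparison needs O(N*M) memory, so large samples would call for a sorting-based method instead.

```python
import numpy as np


def auc_equal_weight_background(sig_scores, bkg_scores_by_species):
    """ROC AUC of signal vs. a background in which every species has equal total weight.

    AUC = P(score_sig > score_bkg), with ties counted as 1/2.
    """
    sig = np.asarray(sig_scores, dtype=float)
    total = 0.0
    for bkg in bkg_scores_by_species.values():
        bkg = np.asarray(bkg, dtype=float)
        greater = (sig[:, None] > bkg[None, :]).mean()   # fraction of pairs ranked correctly
        ties = (sig[:, None] == bkg[None, :]).mean()
        total += greater + 0.5 * ties                    # per-species AUC
    return total / len(bkg_scores_by_species)            # equal weight per species
```

For example, `auc_equal_weight_background(muon_scores, {"electron": e_scores, "pion": pi_scores})` would weight the two background species equally regardless of how many events each gun produced.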


## Py-ROOT binding
We use PyROOT and cppyy for the Python-C++ binding. After the data have been processed by the tagger, we use PyROOT to access the tree and create a buffer array of a data type compatible with ROOT. We write our results into the buffer, create a new branch in the tree, and then fill the branch from the buffer.  
For reference, see the [sample code of pyroot](ref/pyroot.ipynb) or the [tutorial page](https://pep-root6.github.io/docs/analysis/python/pyroot.html).
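
The buffer-and-branch steps can be sketched as follows, assuming exactly one tag per tree entry. Tags are stored as integers because a ROOT branch needs a fixed numeric type; the integer codes, the branch name `dnn_tag`, and the file and tree names in the usage comment are hypothetical. Filling only the new branch (`branch.Fill()` rather than `tree.Fill()`) avoids appending duplicate entries to the existing branches.

```python
from array import array

# Hypothetical integer codes: ROOT branches need a fixed numeric type, not Python strings
TAG_CODES = {"electron": 0, "muon": 1, "pion": 2, "photon": 3}


def encode_tags(predicted_classes):
    """Map tag names to integer codes."""
    return [TAG_CODES[c] for c in predicted_classes]


def add_tag_branch(tree, codes, branch_name="dnn_tag"):
    """Attach an int branch to an existing TTree and fill it entry by entry.

    Assumes exactly one tag per tree entry.
    """
    if len(codes) != tree.GetEntries():
        raise ValueError("need exactly one tag per tree entry")
    buf = array("i", [0])                                       # buffer ROOT reads from
    branch = tree.Branch(branch_name, buf, branch_name + "/I")  # /I = 32-bit int
    for code in codes:
        buf[0] = code
        branch.Fill()   # fill only the new branch, leaving existing branches untouched
    return branch


# Usage with ROOT (file and tree names are hypothetical):
#   import ROOT
#   f = ROOT.TFile.Open("input.root", "UPDATE")
#   tree = f.Get("tree_name")
#   add_tag_branch(tree, encode_tags(predicted_classes))
#   tree.Write("", ROOT.TObject.kOverwrite)
#   f.Close()
```

If a tree entry holds several particles, a vector branch would be needed instead of a single integer.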

## TODOs
- Some functionality may need moderate adjustment because ROOT inputs differ in purpose and structure. Please refer to [this repo](https://github.com/Dowling7/DQ_Dowling) for my full study of the tagger.
- R&D of new features: island clustering, padded-grid seed searching, and polygon convex-hull linking algorithms.
- Please contact me via Slack or [email](mailto:dowlingwong@gmail.edu) with any questions or requests for information. I will be happy to elaborate on the details for future DarkQuest members taking over this work.

```python
import dwong
```

```python
import pandas as pd

# Define the path to your CSV file
file_path = 'NNs/sample_csv/p5_80_muon_1_10000.csv'

# Specify the columns you want to import
columns = [
    "wid_x", "wid_y", "wew_x", "wew_y", "seed_x", "seed_y",
    "trkl_x", "trkl_y", "trkl_z", "trkl_px", "trkl_py", "trkl_pz", "E/p",
    "h4_41", "h4_42", "h4_43", "h4_44", "h4_45", "h4_46"
]

# Read the specified columns from the CSV
data = pd.read_csv(file_path, usecols=columns)

# Show the first few rows of the DataFrame
print(data.head())
```




```text
   wid_x  wid_y         wew_x         wew_y  seed_x  seed_y       trkl_x   
0  2.765    0.0  7.598694e-01  2.520000e-09   -6.45   -0.46 -9999.000000  \
1  0.000    0.0  0.000000e+00  0.000000e+00    4.61    5.07     7.814219   
2  0.000    0.0  8.880000e-16  0.000000e+00   -6.45   -0.46    -1.889574   
3  0.000    0.0  0.000000e+00  0.000000e+00  -11.98   -0.46    -5.121831   
4  2.765    0.0  6.611169e-01  1.020000e-07   -6.45   -5.99    -0.828422   

        trkl_y     trkl_z      trkl_px      trkl_py      trkl_pz          E/p   
0 -9999.000000 -9999.0000 -9999.000000 -9999.000000 -9999.000000 -9999.000000  \
1     7.352450  1896.8289    -0.458327     0.027412    49.525528     0.011990   
2     3.099001  1896.8119    -0.432187    -0.041864    50.131190     0.008760   
3     2.507322  1859.8807    -0.750870    -0.013618    49.482600     0.014594   
4    -3.416130  1859.8684    -0.383605    -0.161098    41.867435     0.017327   

         h4_41        h4_42        h4_43        h4_44   
```
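
In row 0 above, every tracklet column and E/p hold -9999, which appears to be a placeholder for a particle with no linked tracklet. Before interpreting the tags, it can help to count such rows. A minimal sketch, assuming -9999 is the only placeholder value; the helper name `split_by_track` is hypothetical.

```python
import pandas as pd

SENTINEL = -9999  # placeholder value visible in row 0 of the output above
TRACK_COLS = ["trkl_x", "trkl_y", "trkl_z", "trkl_px", "trkl_py", "trkl_pz", "E/p"]


def split_by_track(df: pd.DataFrame):
    """Return (rows with a linked tracklet, rows without one)."""
    no_track = (df[TRACK_COLS] == SENTINEL).all(axis=1)
    return df[~no_track], df[no_track]
```

Usage: `with_trk, without_trk = split_by_track(data)`, then `len(without_trk)` gives the number of rows tagged without any tracklet information.
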

```python
import os

import numpy as np
from tensorflow.keras.models import load_model


class ParticleIdentifier:
    """Combine the four binary DNN classifiers into one multi-class tagger."""

    def __init__(self, model_dir):
        # The dict order defines the column order of the stacked outputs,
        # so the class list is derived from it to keep the two aligned.
        self.models = {
            'electron': load_model(os.path.join(model_dir, "electronID_w_track_95")),
            'muon': load_model(os.path.join(model_dir, "muonID_w_track_99")),
            'pion': load_model(os.path.join(model_dir, "pi+-_ID_w_track_86")),
            'photon': load_model(os.path.join(model_dir, "photon_ID_w_track_89")),
        }
        self.classes = list(self.models)

    def predict_classes(self, samples):
        # One output column per binary classifier
        predictions = [model.predict(samples) for model in self.models.values()]

        # Stack horizontally and pick the classifier with the highest score per row
        combined_probs = np.hstack(predictions)
        class_indices = np.argmax(combined_probs, axis=1)

        # Map indices to class names
        return [self.classes[idx] for idx in class_indices]


# Usage
NN_dir = "NNs/"
identifier = ParticleIdentifier(NN_dir)
samples = data.to_numpy(dtype="float32")  # the 19 feature columns read above
predicted_classes = identifier.predict_classes(samples)
print("done")
```


```text
done
```


```python
predicted_classes[10:30]
```

```text
['muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon',
 'muon']
```

```python
len(data)
```

```text
10004
```
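
The slice above shows only muon tags. Because the file name suggests a muon sample, a quick tally over all rows gives the fraction assigned to each class. A minimal sketch; the helper name `tag_fractions` is hypothetical.

```python
from collections import Counter


def tag_fractions(tags):
    """Fraction of rows assigned to each tag, most frequent first."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()}
```

Usage: `tag_fractions(predicted_classes)` returns a dict such as `{"muon": ..., "pion": ...}` covering all 10004 rows.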