# Track Pattern Recognition using Hough Transform and Tracks Classification

## Introduction

Track pattern recognition is an early step in the reconstruction of data from a particle detector. It groups the hits recorded by the subdetectors into tracks. The reconstructed track parameters make it possible to estimate how much a particle is deflected in a magnetic field, and thus to reconstruct its charge and momentum. This information is then used to reconstruct the decay vertex, to identify the mother particle, and for further particle identification.

There is a wide variety of track pattern recognition methods. They differ in how they process the hits, what kinds of tracks they can recognize, and which requirements these tracks must satisfy. The specifics of an experiment and its detector geometry therefore affect tracking performance, and a track pattern recognition method should be adapted to them.

This notebook considers track pattern recognition for a 2D detector with circular geometry and a uniform magnetic field. The detector layout, with the hits and tracks of one event, is shown in the figure below. The challenge is to recognize the tracks of an event with the highest efficiency. It is assumed that a hit can belong to only one track.

<img src="pic/detector.png" /> <br>

## About this notebook

This notebook demonstrates how the Hough Transform and machine learning can be used for track pattern recognition. It describes the input data, the track pattern recognition method, and the quality metrics, and shows how to use them.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import pandas
import numpy

# Input data

In [2]:
data = pandas.read_csv('hits_1000.csv', index_col=False)
#data = data[data.event.values < 100]

data.head()

Unnamed: 0,event,particle,layer,iphi,x,y
0,0,0,5,39276,55.103343,-401.233874
1,0,6,5,22685,-381.682239,135.438799
2,0,3,3,6082,160.995866,139.460859
3,0,5,2,27787,-35.433651,-150.895515
4,0,5,1,15230,-19.62735,-82.702885


# Split Data into Train/Test Samples

In [3]:
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.
from sklearn.model_selection import train_test_split

event_ids = numpy.unique(data.event.values)

event_ids_train, event_ids_test = train_test_split(event_ids, 
                                                   test_size=0.5, 
                                                   random_state=42)

data_train = data[data.event.isin(event_ids_train)]
data_test = data[data.event.isin(event_ids_test)]

# Hough Transform with Tracks Classification

## Hough Transform

Consider a track pattern recognition method based on the Hough Transform in a polar coordinate system. In this system, a circular track can be parametrized as follows:

$$
r = 2 r_{0} \cos(\phi - \theta)
$$

where:
* $r$ and $\phi$ are the coordinates of a hit in the polar system.
* $r_{0}$ and $\theta$ are the coordinates of the center of the circular track in the polar system.

A linear (straight) track corresponds to $r_{0} = \infty$.
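
The parametrization implies that the track circle passes through the origin, so its radius equals the distance $r_{0}$ from the origin to its center. As a short sketch of where the formula comes from, apply the law of cosines to the triangle formed by the origin, the circle center $(r_{0}, \theta)$ and a hit $(r, \phi)$ with $r > 0$:

$$
r_{0}^{2} = r^{2} + r_{0}^{2} - 2 r r_{0} \cos(\phi - \theta)
\quad \Longrightarrow \quad
r = 2 r_{0} \cos(\phi - \theta)
$$

For finite $r$ and $r_{0} \to \infty$ this requires $\cos(\phi - \theta) \to 0$, that is $\phi = \theta \pm \frac{\pi}{2}$: a straight line through the origin, perpendicular to the direction $\theta$.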

The Cartesian coordinates of a hit are transformed to polar coordinates as follows:

$$
\phi = \arctan\left(\frac{y}{x}\right)
$$
$$
r = \sqrt{x^{2} + y^{2}}
$$
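
As an illustration, the conversion can be sketched with NumPy. The helper name `to_polar` is hypothetical and not part of the notebook's modules. The sketch uses `numpy.arctan2` instead of a plain arctangent of $y/x$, because `arctan2` resolves the quadrant of the hit and also handles $x = 0$.

```python
import numpy

def to_polar(x, y):
    """Convert Cartesian hit coordinates to polar (r, phi). Hypothetical helper."""
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    r = numpy.sqrt(x ** 2 + y ** 2)
    # arctan2 returns phi in (-pi, pi], taking the signs of x and y into account.
    phi = numpy.arctan2(y, x)
    return r, phi

# The first two hits listed in the data.head() output above.
x_hits = numpy.array([55.103343, -381.682239])
y_hits = numpy.array([-401.233874, 135.438799])
r, phi = to_polar(x_hits, y_hits)
```

With the notebook's data frame, the same call would presumably be `to_polar(data.x.values, data.y.values)`.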


The Hough Transform maps a hit in $(r, \phi)$ space to a curve in the $(\frac{1}{r_{0}}, \theta)$ space of track parameters as follows:

$$
\frac{1}{r_{0}} = \frac{2\cos(\phi - \theta)}{r}
$$

A linear track is represented in this space by a point $(0, \theta)$.
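
To make the transform concrete, here is a minimal sketch that evaluates the curve of each hit for a few candidate values of $\theta$. The function name `hough_curve` and the track parameters are made up for illustration. Hits generated on one circle should all give the same $\frac{1}{r_{0}}$ at the true $\theta$, and different values elsewhere.

```python
import numpy

def hough_curve(r, phi, thetas):
    """1/r0 of the circle through the origin and the hit (r, phi), for each candidate theta."""
    return 2.0 * numpy.cos(phi - thetas) / r

# Hypothetical track with its center at (r0, theta) = (200, 0.5).
r0_true, theta_true = 200.0, 0.5
phi_hits = theta_true + numpy.array([-1.2, -0.8, -0.3])
r_hits = 2.0 * r0_true * numpy.cos(phi_hits - theta_true)

# Evaluate every hit's curve at three candidate angles; the middle one is the true angle.
thetas = numpy.array([0.3, theta_true, 0.7])
curves = numpy.array([hough_curve(r, p, thetas) for r, p in zip(r_hits, phi_hits)])

# Column 1 holds the values at theta_true: the curves cross there at 1/r0_true.
crossing = curves[:, 1]
```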

This section demonstrates the track pattern recognition method that combines the Hough Transform described above with a histogramming technique. In this technique, each 'hot' bin represents one recognized track, as shown in the figure:

<img src="pic/hough.png" /> <br>
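
The histogramming step can be sketched as follows. This is a simplified illustration, not the code of the `hough` module used later: the function `hough_histogram`, its parameters (`n_inv_r0_bins`, `max_inv_r0`) and the synthetic track are all assumptions. Each hit votes once per $\theta$ bin, so a bin can collect at most as many votes as there are hits.

```python
import numpy

def hough_histogram(r, phi, n_theta_bins=180, n_inv_r0_bins=100, max_inv_r0=0.02):
    """Fill a (theta, 1/r0) histogram with the Hough curve of every hit."""
    r = numpy.asarray(r, dtype=float)
    phi = numpy.asarray(phi, dtype=float)
    # Evaluate each curve at the center of every theta bin.
    step = 2.0 * numpy.pi / n_theta_bins
    thetas = -numpy.pi + step * (numpy.arange(n_theta_bins) + 0.5)
    # Shape (n_hits, n_theta_bins): one Hough curve per hit.
    inv_r0 = 2.0 * numpy.cos(phi[:, None] - thetas[None, :]) / r[:, None]
    theta_grid = numpy.broadcast_to(thetas, inv_r0.shape)
    # Entries outside the range (including negative 1/r0) are not counted.
    hist, theta_edges, inv_edges = numpy.histogram2d(
        theta_grid.ravel(), inv_r0.ravel(),
        bins=[n_theta_bins, n_inv_r0_bins],
        range=[[-numpy.pi, numpy.pi], [0.0, max_inv_r0]])
    return hist, theta_edges, inv_edges

# Hypothetical noise-free track; parameters sit at bin centers to keep the example clean.
theta_true = -numpy.pi + (2.0 * numpy.pi / 180) * 104.5
inv_r0_true = 0.0051
delta = numpy.array([-1.4, -1.2, -1.0, -0.8, -0.6, -0.4])
phi_hits = theta_true + delta
r_hits = 2.0 * numpy.cos(delta) / inv_r0_true

hist, theta_edges, inv_edges = hough_histogram(r_hits, phi_hits)

# The hottest bin gives the track candidate; report the bin center.
i_theta, i_inv = numpy.unravel_index(numpy.argmax(hist), hist.shape)
theta_found = 0.5 * (theta_edges[i_theta] + theta_edges[i_theta + 1])
inv_r0_found = 0.5 * (inv_edges[i_inv] + inv_edges[i_inv + 1])
```

In a real event many tracks and noise hits share the histogram, so a method of this kind would typically keep every bin above some minimum number of hits rather than only the single hottest bin.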

However, the recognized tracks contain many ghosts. The idea is to classify the recognized tracks in order to reduce the number of ghosts. For this purpose, each recognized track is described by the following features:

* Track parameters
* Number of hits
* RMSE of a track fit

These features are used to train a classifier to separate good tracks from ghosts. The trained classifier is then used to reduce the number of ghosts among the reconstructed tracks:

<img src="pic/clf.png" width="50%" /> <br>
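
A feature vector of this kind could be built as in the sketch below. The helper `track_features` is hypothetical, and the text does not specify how the fit and its RMSE are computed. Here the RMSE is taken, as an assumption, from the residuals of each hit with respect to the candidate's Hough parameters in $\frac{1}{r_{0}}$ space, which also stays well defined for linear tracks with $\frac{1}{r_{0}} = 0$.

```python
import numpy

def track_features(r, phi, inv_r0, theta):
    """Return [1/r0, theta, number of hits, RMSE] for one track candidate."""
    r = numpy.asarray(r, dtype=float)
    phi = numpy.asarray(phi, dtype=float)
    # Assumed residual: distance of each hit's Hough curve from the candidate 1/r0.
    residuals = 2.0 * numpy.cos(phi - theta) / r - inv_r0
    rmse = numpy.sqrt(numpy.mean(residuals ** 2))
    return numpy.array([inv_r0, theta, len(r), rmse])

# Hypothetical hits lying exactly on a track with 1/r0 = 0.005 and theta = 0.5.
inv_r0_true, theta_true = 0.005, 0.5
delta = numpy.array([-1.2, -0.9, -0.6, -0.3])
phi_hits = theta_true + delta
r_hits = 2.0 * numpy.cos(delta) / inv_r0_true

# A candidate with the right parameters, and one with a wrong angle.
good = track_features(r_hits, phi_hits, inv_r0_true, theta_true)
ghost_like = track_features(r_hits, phi_hits, inv_r0_true, theta_true + 0.2)
```

Rows like `good` and `ghost_like`, labelled as good track or ghost, are the kind of input a classifier such as a random forest can be trained on.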

After that, the tracks are processed to generate hit labels. Please see the method script for details.

## Data Preparation

In [4]:
X_train = data_train[[u'event', u'layer', u'iphi', u'x', u'y']].values
y_train = data_train[u'particle'].values

X_test = data_test[[u'event', u'layer', u'iphi', u'x', u'y']].values
y_test = data_test[u'particle'].values

## Selection of a base track pattern recognition method and a classifier

In [5]:
from sklearn.ensemble import RandomForestClassifier
from hough import Hough

clf = RandomForestClassifier(n_estimators=1000)
base = Hough(n_theta_bins=5000, n_radius_bins=1000, min_radius=20., min_hits=4)

## Track Pattern Recognition

In [6]:
from hough_classification import HoughClassification

mh = HoughClassification(base=base,
                         classifier=clf,
                         proba_threshold=0.8)

mh.fit(X_train, y_train)
y_reco = mh.predict(X_test)

## Quality metrics

In [7]:
from metrics import RecognitionQuality

rq = RecognitionQuality(track_eff_threshold=0.8, min_hits_per_track=4)
report_event, report_tracks = rq.calculate(X_test, y_test, y_reco)

In [8]:
report_event.mean(axis=0)

Event                       484.638000
ReconstructionEfficiency      0.940794
GhostRate                     0.012362
CloneRate                     0.000949
AvgTrackEfficiency            0.987292
dtype: float64
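
The `metrics` module is not shown in this notebook, so the exact definitions behind these numbers are not given here. The sketch below implements one common convention, purely as an assumption, to show what such quantities can mean:

* The efficiency of a recognized track is the fraction of its hits that come from its dominant true particle.
* A track whose efficiency reaches the threshold is matched to that particle. Any other track counts as a ghost.
* Additional tracks matched to an already matched particle count as clones.

The function name `event_quality` and the label `-1` for unassigned hits are hypothetical.

```python
import numpy

def event_quality(y_true, y_reco, track_eff_threshold=0.8, min_hits_per_track=4):
    """Per-event quality numbers under the assumed definitions described above."""
    y_true = numpy.asarray(y_true)
    y_reco = numpy.asarray(y_reco)

    # True particles with enough hits to count as reconstructible (assumption).
    particles, counts = numpy.unique(y_true, return_counts=True)
    reconstructible = set(particles[counts >= min_hits_per_track].tolist())

    n_matched = {}       # particle id -> number of recognized tracks matched to it
    n_ghosts = 0
    track_effs = []
    for track in numpy.unique(y_reco):
        if track == -1:  # assumed label for hits not assigned to any track
            continue
        hit_particles = y_true[y_reco == track]
        ids, n = numpy.unique(hit_particles, return_counts=True)
        best = numpy.argmax(n)
        particle = ids[best].item()
        eff = n[best] / float(len(hit_particles))
        if eff >= track_eff_threshold and particle in reconstructible:
            n_matched[particle] = n_matched.get(particle, 0) + 1
            track_effs.append(eff)
        else:
            n_ghosts += 1

    n_particles = float(max(len(reconstructible), 1))
    n_clones = sum(k - 1 for k in n_matched.values())
    return {
        'ReconstructionEfficiency': len(n_matched) / n_particles,
        'GhostRate': n_ghosts / n_particles,
        'CloneRate': n_clones / n_particles,
        'AvgTrackEfficiency': float(numpy.mean(track_effs)) if track_effs else 0.0,
    }

# Toy event: three true particles; track 2 mixes hits of particles 1 and 2.
y_true_toy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2]
y_reco_toy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, -1, -1]
quality = event_quality(y_true_toy, y_reco_toy)
```

In the toy event, tracks 0 and 1 are pure and match particles 0 and 1, while track 2 has an efficiency of 2/3 and is counted as a ghost, so particle 2 is not reconstructed.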