This is a program that will train a model to identify and assign hits to tracks.
Written by Daniel Zurawski & Keshav Kapoor for Fermilab Summer 2017 internship.

In [None]:
import random
import pandas as pd
import numpy as np

Here, we define a LinearTracker class.
This class builds the input and output arrays for training the model from a pandas DataFrame (read from a .csv file), and it also holds the model itself.

In [None]:
class LinearTracker():
    """ An object that classifies particles to tracks after an event. """    
    def __init__(self, dataframe, model=None):
        """ Initialize a LinearTracker.
            @param dataframe - pd.DataFrame - used to pick tracks from.
                The headers should contain: ("id", "z", "r", "phi").
            @param model - keras model - A network model that the tracker will
                use to classify particles.
            @return Nothing
        """
        self.model     = model     # keras model to figure out tracks.
        self.dataframe = dataframe # pandas.DataFrame for picking tracks.
        self.input     = None      # input to train model on.
        self.output    = None      # output to train model on.
    # END function __init__
    
    def load_data(self, num_events,
                  tracks_per_event, track_size, noise_per_event):
        """ Load input and output data from this object's dataframe.
            @param num_events - int - The number of events to generate.
            @param tracks_per_event - int - The number of tracks per event.
            @param track_size - int - The number of hits per track.
            @param noise_per_event - int - The number of hits with no track.
            @return Nothing
                However, self.input and self.output become numpy arrays.
                self.input is a collection of hits, with the features of
                each hit ordered (phi, r, z), of shape:
                    (num_events, hits_per_event, 3)
                self.output is a list of probability matrices of shape:
                    (num_events, hits_per_event, tracks_per_event)
                Each row is one-hot for a track hit and all zeros for noise.
        """
        hits_per_event = (track_size * tracks_per_event) + noise_per_event
        data   = self.dataframe[["id", "r", "phi", "z"]].drop_duplicates()
        groups = data.groupby("id")
        valids = groups.filter(lambda track: len(track) == track_size)
        bads   = groups.filter(lambda track: len(track) != track_size)
        labels = ["phi", "r", "z"]
        
        # Group the valid tracks once, rather than once per event.
        valid_tracks = list(valids.groupby("id"))
        
        # Populate input and output with data.
        self.input  = np.zeros((num_events, hits_per_event, len(labels)))
        self.output = np.zeros((num_events, hits_per_event, tracks_per_event))
        for n in range(num_events):
            # Retrieve the hits within this event.
            sample = random.sample(valid_tracks, tracks_per_event)
            tracks = [track[1] for track in sample] # Make it not a tuple.
            noise  = bads.sample(noise_per_event)
            hits   = pd.concat(tracks + [noise])
            hits.sort_values(labels, inplace=True)
            
            # Populate this event's inputs.
            self.input[n, :] = hits[labels].values
            
            # Define a mapping from track ID to probability matrix column.
            T2I = dict()
            for t, track_ID in enumerate([s[0] for s in sample]):
                T2I[track_ID] = t
            
            # Populate this event's outputs.
            for t, track_ID in enumerate(hits["id"]):
                index = T2I.get(track_ID)
                if index is not None:
                    self.output[n, t, index] = 1
    # END FUNCTION load_data
# END CLASS LinearTracker

Below is how to create a LinearTracker and how to load data into it. It is important to note that after construction, a LinearTracker must call its load_data() function with user specifications for how data should be loaded.

If you get a ValueError saying that the sample is larger than the population, then the data
loaded in from the .csv file does not contain enough tracks with exactly 'track_size' hits, or enough leftover hits
to use as noise. Try to load in a larger population, change 'track_size' to a different positive integer, or lower 'noise_per_event'.
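Before calling load_data(), it can help to check how many tracks of each size the file actually contains. The following is a minimal sketch of such a check, not part of the original notebook: the helper names count_tracks_by_size and count_noise_hits are hypothetical, and the toy DataFrame stands in for the real .csv contents.

```python
import pandas as pd

# Toy stand-in for the real .csv contents (made-up values).
toy = pd.DataFrame({
    "id":  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
    "r":   [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 1.0],
    "phi": [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 1.3],
    "z":   [0.0] * 11,
})

def count_tracks_by_size(dataframe):
    """ Hypothetical helper: map track size -> number of tracks of that size. """
    hits = dataframe[["id", "r", "phi", "z"]].drop_duplicates()
    hits_per_track = hits.groupby("id").size()
    return hits_per_track.value_counts().sort_index()

def count_noise_hits(dataframe, track_size):
    """ Hypothetical helper: number of hits on tracks of any other size. """
    hits = dataframe[["id", "r", "phi", "z"]].drop_duplicates()
    bads = hits.groupby("id").filter(lambda track: len(track) != track_size)
    return len(bads)

size_counts = count_tracks_by_size(toy)
noise_hits  = count_noise_hits(toy, track_size=4)
print(size_counts)
print("hits available as noise:", noise_hits)
```

If the count for the chosen 'track_size' is smaller than 'tracks_per_event', or the number of available noise hits is smaller than 'noise_per_event', load_data() will raise the ValueError described above.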

In [None]:
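# load_data() draws tracks with Python's random module and noise with
# pandas (which uses numpy), so seed both for reproducible events.
random.seed(7)
np.random.seed(7)
filename  = 'file_o_stuff3.csv'
dataframe = pd.read_csv(filename)
tracker   = LinearTracker(dataframe)
tracker.load_data(num_events=10, tracks_per_event=3, track_size=4, noise_per_event=3)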

Let's take a look at the input training data.

In [None]:
tracker.input

Now let's take a look at the output training data.

In [None]:
tracker.output
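Each row of the output matrix belongs to one hit: a track hit has a single 1 in the column of its track, and a noise hit is all zeros. As a rough sketch of how to read such a matrix back into track indices, the example below uses a small hand-written array in place of tracker.output; the value -1 for noise is an assumption made here for illustration.

```python
import numpy as np

# Hand-written stand-in for tracker.output: 1 event, 4 hits, 2 tracks.
output = np.array([[[1, 0],
                    [0, 1],
                    [0, 0],   # a noise hit: no track column is set
                    [1, 0]]], dtype=float)

row_sums    = output.sum(axis=-1)     # 1 for track hits, 0 for noise
is_noise    = row_sums == 0
track_index = output.argmax(axis=-1)  # column of the 1 in each row
track_index[is_noise] = -1            # assumed marker for "no track"
print(track_index)
```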

Let's now build a model and attach it to our tracker.

In [23]:
import keras
from keras.layers import Dense, Activation
from keras.models import Sequential

Using TensorFlow backend.


In [None]:
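input_shape  = tracker.input[0].shape    # (hits_per_event, 3)
output_shape = tracker.output.shape[-1]  # tracks_per_event: one unit per track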

tracker.model = Sequential()
tracker.model.add(Dense(32, input_shape=input_shape, activation='relu'))
tracker.model.add(Dense(output_shape, kernel_initializer='uniform'))
tracker.model.add(Activation('softmax'))
tracker.model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
tracker.model.summary()
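A Dense layer acts on the last axis of its input, so the model above maps each hit's 3 features to one score per track, and the softmax turns those scores into per-hit probabilities. The sketch below mimics that forward pass in plain numpy with random weights, only to show the shapes involved; it is not the Keras implementation, and the sizes assume the load_data() call used earlier (10 events, 15 hits per event, 3 tracks).

```python
import numpy as np

rng = np.random.default_rng(7)
num_events, hits_per_event, num_features, tracks_per_event = 10, 15, 3, 3

# Random stand-ins for tracker.input and the two Dense layers' weights.
x  = rng.normal(size=(num_events, hits_per_event, num_features))
w1 = rng.normal(size=(num_features, 32))
b1 = np.zeros(32)
w2 = rng.normal(size=(32, tracks_per_event))
b2 = np.zeros(tracks_per_event)

hidden = np.maximum(x @ w1 + b1, 0.0)         # Dense(32) followed by relu
logits = hidden @ w2 + b2                     # Dense(tracks_per_event)
logits -= logits.max(axis=-1, keepdims=True)  # for numerical stability
probs  = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

print(probs.shape)  # (10, 15, 3), the same shape as tracker.output
```

This is why the width of the final Dense layer has to equal the last dimension of tracker.output.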