# 5. Training in Physics

In the last notebook, events from the Di-Higgs signal process and some of its background processes were selected if they fulfilled a desired signature consisting of lepton and jet counts and requirements on some physical observables. The selected data were then converted into vectors that a neural network can take as input and saved to files.

In this notebook, a neural network is constructed that takes the selected data from the last notebook as input, and the target values for the training are built.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import mplhep
import os

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import RandomUniform



In [92]:
# load the neural network input vectors

nn_input = {}

processes = ['signal', 'bgr_tt', 'bgr_st', 'bgr_Wj']

for process in processes:
    nn_input[process] = np.loadtxt('nn_input_' + process + '.txt', delimiter = ' ')
    
    # transpose so that each row holds one event (15 input values)
    nn_input[process] = np.transpose(nn_input[process])

print(nn_input['signal'][:2])
print(len(nn_input['signal']))

[[ 3.32476807e+01 -2.20458984e+00 -1.55212402e-01  1.02250000e+02
  -3.79394531e-01  1.07714844e+00  9.66308594e-01  4.77812500e+01
  -9.70214844e-01  2.47705078e+00  9.99023438e-01  1.45375000e+02
  -1.36816406e+00 -3.69140625e-01  3.84033203e-01]
 [ 4.98770027e+01 -1.43994141e+00  2.39062500e+00  2.18250000e+02
  -1.46044922e+00  2.35888672e+00  8.22265625e-01  2.71093750e+01
  -9.82299805e-01  1.75781250e+00  9.98046875e-01  1.30375000e+02
  -1.87817383e+00 -1.58007812e+00  9.94873047e-02]]
4344


Not only do we want to determine whether the event in question is part of the signal or the background, we also want to differentiate between the various background processes. Essentially, we are building a *multiclassifier* neural network. This means that the network, instead of returning a binary "signal" or "background" response, returns an output vector in which each element is the probability that the event stems from one of the processes. In our case, 4 processes have to be distinguished from one another; an output vector of the form (0 0 1 0) would indicate the ideal case of identifying an event as a single top background event.
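
As a minimal sketch of how such an output vector is read, the snippet below applies a softmax to made-up raw scores and picks the most probable process. The values in `logits` are invented for illustration and do not come from the trained network; the helper `softmax` is written out by hand here and is not taken from the notebook.

```python
import numpy as np

processes = ['signal', 'bgr_tt', 'bgr_st', 'bgr_Wj']

def softmax(z):
    # subtract the maximum for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

# hypothetical raw scores of the output layer for one event
logits = np.array([0.5, 1.0, 4.0, -1.0])

probabilities = softmax(logits)
predicted = processes[int(np.argmax(probabilities))]

print(probabilities)  # four non-negative values that sum to 1
print(predicted)
```

The position of the largest entry follows the order of the `processes` list, so the third entry corresponds to `bgr_st`.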

In [83]:
# define structure of the network

# dimensions of input, hidden and output layers
N0 = len(nn_input['signal'][0])
N1 = 30
N2 = len(processes)

# number of hidden layers
layer_count = 3

# layer structure of the network: input layer, hidden layers, output layer
layers = [N0] + [N1] * layer_count + [N2]

print("Layer structure of the network: ", layers)

Layer structure of the network:  [15, 30, 30, 30, 4]
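
To get a feeling for the size of this network, the number of trainable parameters can be counted by hand. The sketch below assumes fully connected layers that each carry a bias term; the variable names are chosen only for this illustration.

```python
# layer widths as printed above
layers = [15, 30, 30, 30, 4]

# a dense layer with n_in inputs and n_out outputs has
# n_in * n_out weights plus n_out biases
params_per_layer = [n_in * n_out + n_out
                    for n_in, n_out in zip(layers[:-1], layers[1:])]

total_params = sum(params_per_layer)

print(params_per_layer)  # [480, 930, 930, 124]
print(total_params)      # 2464
```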


In [84]:
# construct the network in keras

net = Sequential()

# set the weights initializer
initializer = RandomUniform(minval=-10, maxval=10)

# make the first layer with the input shape as an argument
net.add(Dense(layers[1], input_shape = (layers[0],), activation = 'relu', 
              use_bias = True, kernel_initializer = initializer))

# make all the other layers
for i in range(2, len(layers) - 1):
    net.add(Dense(layers[i], activation = 'relu', use_bias = True, kernel_initializer = initializer))
    
net.add(Dense(layers[-1], activation = 'softmax', use_bias = True, kernel_initializer = initializer))

In [85]:
# compile the network

net.compile(loss = 'CategoricalCrossentropy', optimizer = keras.optimizers.SGD(learning_rate = 0.5), metrics = ['accuracy'])
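
For a one-hot target, the categorical cross-entropy reduces to the negative logarithm of the probability that the network assigns to the true class. The following sketch writes this out in plain NumPy; the prediction vectors are invented, and library implementations typically add numerical safeguards that are left out here.

```python
import numpy as np

def categorical_crossentropy(target, prediction):
    # only the term belonging to the true class survives, since the
    # target vector is zero everywhere else
    return -np.sum(target * np.log(prediction))

target = np.array([0., 0., 1., 0.])        # true process: single top
good = np.array([0.05, 0.05, 0.85, 0.05])  # invented network output
bad = np.array([0.70, 0.10, 0.10, 0.10])   # invented network output

loss_good = categorical_crossentropy(target, good)
loss_bad = categorical_crossentropy(target, bad)

print(loss_good)
print(loss_bad)
```

A confident correct prediction gives a loss close to zero, while a prediction that puts little probability on the true class is penalised much more strongly.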

In [87]:
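# build the target output

# dictionary holding the target vectors of each process
nn_target = {}

# total number of training events, summed over all processes
training_duration = 0

# make one-hot target vectors for all events of a particular process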
def make_target(p, proc):
    
    vectors = []
    
    for i in range(len(nn_input[proc])):
    
        v = np.zeros(len(processes))
        v[p] = 1

        vectors.append(v)
    
    return vectors, len(vectors)

In [88]:
# make target data for all processes
for count, process in enumerate(processes):
    nn_target[process], n_events = make_target(count, process)
    training_duration += n_events

print(nn_target['bgr_Wj'][10])

[0. 0. 0. 1.]
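
The same one-hot targets can also be built without an explicit loop. The sketch below is one possible alternative; `make_target_array` is a hypothetical helper that is not part of the notebook, and it returns a single 2D array instead of a list of vectors.

```python
import numpy as np

processes = ['signal', 'bgr_tt', 'bgr_st', 'bgr_Wj']

def make_target_array(p, n_events):
    # row p of the identity matrix is the one-hot vector of process p;
    # np.tile repeats it once per event
    return np.tile(np.eye(len(processes))[p], (n_events, 1))

targets = make_target_array(3, 5)  # 5 is an arbitrary event count

print(targets.shape)  # (5, 4)
print(targets[0])     # [0. 0. 0. 1.]
```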


In [90]:
# array holding the cost after each training step
costs = np.empty(training_duration)

In [99]:
# train network

for i in range(len(nn_target['signal'])):
    costs[i] = net.train_on_batch(nn_input['signal'][i], nn_target['signal'][i])[0]

ValueError: Data cardinality is ambiguous:
  x sizes: 15
  y sizes: 4
Make sure all arrays contain the same number of samples.
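
The error most likely arises because `nn_input['signal'][i]` is a one-dimensional array of 15 values and `nn_target['signal'][i]` a one-dimensional array of 4 values. Keras treats the first axis as the sample axis, so it sees 15 input samples but 4 target samples. A possible fix, sketched here with placeholder arrays instead of the real data, is to keep a leading batch axis of length 1 when selecting a single event.

```python
import numpy as np

# placeholder data with the same layout as nn_input['signal'] and
# nn_target['signal']: 3 events, 15 inputs, 4 classes
x = np.zeros((3, 15))
y = [np.array([1., 0., 0., 0.]) for _ in range(3)]

i = 0

# plain indexing drops the event axis
print(x[i].shape, y[i].shape)

# slicing keeps it, giving a batch that holds exactly one event
x_batch = x[i:i + 1]
y_batch = np.asarray(y[i:i + 1])
print(x_batch.shape, y_batch.shape)
```

With arrays shaped like this, a call of the form `net.train_on_batch(x_batch, y_batch)` should receive one sample on both sides.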

In [81]:
print(len(np.transpose(nn_input['signal'])))

4344


In [93]:
print(nn_input['signal'][:2])
print(nn_target['signal'][:2])

[[ 3.32476807e+01 -2.20458984e+00 -1.55212402e-01  1.02250000e+02
  -3.79394531e-01  1.07714844e+00  9.66308594e-01  4.77812500e+01
  -9.70214844e-01  2.47705078e+00  9.99023438e-01  1.45375000e+02
  -1.36816406e+00 -3.69140625e-01  3.84033203e-01]
 [ 4.98770027e+01 -1.43994141e+00  2.39062500e+00  2.18250000e+02
  -1.46044922e+00  2.35888672e+00  8.22265625e-01  2.71093750e+01
  -9.82299805e-01  1.75781250e+00  9.98046875e-01  1.30375000e+02
  -1.87817383e+00 -1.58007812e+00  9.94873047e-02]]
[array([1., 0., 0., 0.]), array([1., 0., 0., 0.])]


In [97]:
print(np.shape(np.reshape(nn_input['signal'], (4344, 15))))
print(np.shape(nn_target['signal']))

(4344, 15)
(4344, 4)
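
Since `training_duration` counts the events of all processes, the training presumably should run over all of them rather than over the signal alone. One way to prepare this, shown as a sketch with small random placeholder arrays (the event counts, the fixed seed and the names `fake_input` and `fake_target` are assumptions made for this example), is to stack inputs and targets and shuffle both with the same permutation so that the classes are mixed.

```python
import numpy as np

rng = np.random.default_rng(0)

processes = ['signal', 'bgr_tt', 'bgr_st', 'bgr_Wj']

# placeholder inputs: a different, arbitrary number of events per process
fake_input = {proc: rng.normal(size=(n, 15))
              for proc, n in zip(processes, [5, 3, 4, 2])}

# one-hot targets with one row per event
fake_target = {proc: np.tile(np.eye(len(processes))[p],
                             (len(fake_input[proc]), 1))
               for p, proc in enumerate(processes)}

# stack all processes along the event axis
x_all = np.concatenate([fake_input[proc] for proc in processes])
y_all = np.concatenate([fake_target[proc] for proc in processes])

# apply the same random permutation to inputs and targets
order = rng.permutation(len(x_all))
x_all, y_all = x_all[order], y_all[order]

print(x_all.shape, y_all.shape)  # (14, 15) (14, 4)
```

Using one permutation for both arrays keeps every input row paired with its own target row.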


In [100]:
print(nn_input['signal'][0])
print(nn_target['signal'][0])

[ 33.24768066  -2.20458984  -0.1552124  102.25        -0.37939453
   1.07714844   0.96630859  47.78125     -0.97021484   2.47705078
   0.99902344 145.375       -1.36816406  -0.36914062   0.3840332 ]
[1. 0. 0. 0.]
