# NN Training

In this example notebook, we use the `wvz_ml_framework` module to:

1) Load data in the 4l-DF signal region.

2) Train a neural network to pick out signal events.

## Load data

The `data_management` module provides a single function that loads the data and generates the training, test, and validation sets for NN training.

In [1]:
import sys
sys.path.append('../')

from wvz_ml_framework.nn_training import data_management

We must specify the data paths, the training features, and the file used for rescaling. The training features can be specified separately from the features to be rescaled, which is useful when a feature (such as `SR` below) should not be rescaled.

If we don't already have a file to be used for rescaling, we can generate one first:

In [2]:
data_paths = {
    'Signal': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_VVZ.arrow',
    'Other': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_others.arrow',
    'ttZ': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_ttZ.arrow',
    'tWZ': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_tWZ.arrow',
    'tZ': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_tZ.arrow',
    'WZ': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_WZ.arrow',
    'Zgamma': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_Zgamma.arrow',
    'Zjets': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_Zjets.arrow',
    'ZZ': '/home/grabanal/WVZ/gabriel_ML_data/20220301_ELReLMIs54_MUReLMIs31_btag77_ZZ.arrow'
}

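# Read one feature name per line, skipping blank lines.
with open('training_features.txt', 'r') as file:
    training_features = [line.strip() for line in file if line.strip()]

# Rescale every training feature except 'SR'.
rescale_features = [feat for feat in training_features if feat != 'SR']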

data_management.generate_scale_params_file(data_paths, rescale_features, 'rescaling_parameters.json')
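
The contents of the rescaling file are not shown in this notebook. As a rough illustration only, the sketch below assumes min-max scaling and a JSON layout of per-feature `min` and `max` values; the helper names `compute_scale_params` and `apply_scaling`, the layout, and the toy event values are all hypothetical and may differ from what `generate_scale_params_file` actually does.

```python
import json


def compute_scale_params(rows, features):
    """Return {feature: {"min": ..., "max": ...}} for the listed features."""
    params = {}
    for feat in features:
        values = [row[feat] for row in rows]
        params[feat] = {"min": min(values), "max": max(values)}
    return params


def apply_scaling(row, params):
    """Min-max scale the features in params; leave all other entries untouched."""
    scaled = dict(row)
    for feat, p in params.items():
        span = p["max"] - p["min"]
        scaled[feat] = (row[feat] - p["min"]) / span if span else 0.0
    return scaled


# Toy events with made-up values; 'SR' is deliberately left out of the rescaling.
rows = [
    {"MET": 10.0, "HT": 100.0, "SR": 2},
    {"MET": 50.0, "HT": 300.0, "SR": 2},
    {"MET": 30.0, "HT": 200.0, "SR": 1},
]

params = compute_scale_params(rows, ["MET", "HT"])
params_json = json.dumps(params, indent=2)  # what a parameters file could hold
scaled_rows = [apply_scaling(row, json.loads(params_json)) for row in rows]
print(scaled_rows[2])
```

Storing the parameters separately from the data means the same scaling can be reapplied later, for example when evaluating the trained network on new events.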

Now we can load the data for NN training:

In [3]:
x_train, y_train, w_train, x_test, y_test, w_test, x_val, y_val, w_val \
    = data_management.get_train_test_val_data(data_paths=data_paths, 
                                              train_feats=training_features,
                                              sr_to_train='DF',
                                              test_prop=0.2,
                                              val_prop=0.1,
                                              rescale_filepath='rescaling_parameters.json',
                                              rescale_feats=rescale_features
                                             )

Data loaded...
Data scaled...
Data cut down to DF signal region...
Splits generated... Finished.


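We can verify that the training data has been scaled appropriately. In the first rows, the rescaled features fall between 0 and 1, while `SR` keeps its original value: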

In [4]:
x_train.head()

| | HT | MET | METPhi | METSig | Njet | Nlep | SR | Wlep1_ambiguous | Wlep1_dphi | Wlep1_eta | ... | phi_1 | phi_2 | phi_3 | phi_4 | pt_1 | pt_2 | pt_3 | pt_4 | pt_4l | total_HT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68322 | 0.037396 | 6.7e-05 | 0.515023 | 0.022302 | 0.071429 | 0.0 | 2 | 0.0 | 0.252059 | 0.549023 | ... | 0.985345 | 0.794212 | 0.364434 | 0.461831 | 2.2e-05 | 0.03043 | 0.031303 | 0.005583 | 2.9e-05 | 8.9e-05 |
| 37230 | 0.00532 | 5e-06 | 0.256754 | 0.001806 | 0.017857 | 0.0 | 2 | 0.5 | 0.024255 | 0.221941 | ... | 0.816731 | 0.308365 | 0.412169 | 0.937142 | 5.2e-05 | 0.138499 | 0.140681 | 0.097185 | 8e-06 | 0.000147 |
| 98144 | 0.117883 | 0.00012 | 0.335295 | 0.02882 | 0.089286 | 0.0 | 2 | 0.5 | 0.002533 | 0.533782 | ... | 0.277947 | 0.817833 | 0.590887 | 0.669867 | 3.7e-05 | 0.091178 | 0.131994 | 0.095179 | 2.5e-05 | 0.000266 |
| 43627 | 0.020949 | 3e-05 | 0.258334 | 0.017295 | 0.035714 | 0.0 | 2 | 0.5 | 0.757576 | 0.923661 | ... | 0.7005 | 0.479316 | 0.379768 | 0.104574 | 2.1e-05 | 0.025476 | 0.050751 | 0.058359 | 2.7e-05 | 7.5e-05 |
| 64068 | 0.016065 | 2.1e-05 | 0.328759 | 0.008755 | 0.053571 | 0.0 | 2 | 0.5 | 0.51663 | 0.410597 | ... | 0.952049 | 0.780097 | 0.554233 | 0.082905 | 4e-06 | 0.018293 | 0.047815 | 0.032044 | 1.8e-05 | 4.4e-05 |


We can also check that the three splits have the requested proportions:

In [5]:
total_size = len(x_train) + len(x_test) + len(x_val)

print(f'Training proportion: {len(x_train) / total_size:.2f}')
print(f'Test proportion: {len(x_test) / total_size:.2f}')
print(f'Validation proportion: {len(x_val) / total_size:.2f}')

Training proportion: 0.70
Test proportion: 0.20
Validation proportion: 0.10
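
For intuition, a 70/20/10 split like the one above could be produced by shuffling the event indices once and slicing them. This is only a sketch under that assumption: `split_indices` is a hypothetical helper, and the framework's own splitting logic (for example whether it stratifies by sample) is not shown in this notebook.

```python
import random


def split_indices(n_events, test_prop, val_prop, seed=0):
    """Shuffle event indices and slice them into train, test, and validation sets."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split

    n_test = round(n_events * test_prop)
    n_val = round(n_events * val_prop)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]  # everything left over is used for training
    return train, test, val


train_idx, test_idx, val_idx = split_indices(1000, test_prop=0.2, val_prop=0.1)
print(len(train_idx), len(test_idx), len(val_idx))
```

Because the three slices come from a single shuffled list, no event can end up in more than one set.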


## Train neural network

The `nn_training` module provides a utility for training the neural networks that we have been using.

In [6]:
from wvz_ml_framework.nn_training import nn_training

2022-06-01 20:03:34.347430: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-06-01 20:03:35.968600: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-06-01 20:03:35.973230: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-06-01 20:03:36.168289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:65:00.0 name: Quadro RTX 4000 computeCapability: 7.5
coreClock: 1.545GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 387.49GiB/s
2022-06-01 20:03:36.168384: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-06-01 20:03:36.175992: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-0

`nn_training.make_and_train_model` trains a 3-layer model with a specified number of nodes and dropout rate per layer, using the Adam optimizer. It takes the training and validation data together with the hyperparameters, and saves the model to a specified folder. Setting `generate_onnx=True` also produces an ONNX version of the model.
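
One of the hyperparameters passed in the call below is `patience`, which this notebook does not describe. If it follows the usual early-stopping convention (an assumption on our part), training halts once the validation loss has failed to improve for that many consecutive epochs. The sketch below illustrates that rule with a hypothetical helper, `early_stopping_epoch`, and made-up loss values.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs without a new best
    validation loss, or at the last epoch if that never happens.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: the best value is reached at epoch 3.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.63]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)
```

Under this rule, a run whose validation loss keeps improving uses every requested epoch.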

The training and validation data must each be passed as a tuple of the form `(features, labels, weights)`.
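
A plausible reason the weights travel with the features and labels is that they enter the loss as per-event weights, so that each event contributes in proportion to its weight. The sketch below shows a weighted binary cross-entropy under that assumption; `weighted_bce` is a hypothetical helper with toy inputs, not the framework's loss function.

```python
import math


def weighted_bce(labels, predictions, weights, eps=1e-7):
    """Binary cross-entropy with per-event weights, averaged over the number of events."""
    total = 0.0
    for y, p, w in zip(labels, predictions, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Toy example: 1 = signal, 0 = background; all values are made up.
labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.6]

unit_loss = weighted_bce(labels, predictions, [1.0, 1.0, 1.0])
doubled_loss = weighted_bce(labels, predictions, [2.0, 2.0, 2.0])
print(unit_loss, doubled_loss)
```

With a loss of this form, small event weights lead directly to small loss values, and an event with zero weight has no influence on training.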

In [7]:
nn_training.make_and_train_model(
    training_data=(x_train, y_train, w_train),
    validation_data=(x_val, y_val, w_val),
    batch_size=512,
    num_nodes=64,
    dropout=0.1,
    learn_rate=1e-4,
    epochs=15,
    patience=5,
    model_dir='models/',
    model_name='example',
    generate_onnx=True
)

2022-06-01 20:04:27.742470: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Epoch 1/15
 1/76 [..............................] - ETA: 42s - loss: 0.0011 - accuracy: 0.4590

2022-06-01 20:04:31.162758: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10


Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


2022-06-01 20:04:37.052879: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: models/example/assets
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`


2022-06-01 20:04:37.441721: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2022-06-01 20:04:37.442360: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session