## DSC180B Group11 Model Assessment

#### This notebook demonstrates model training and assessment by displaying the learning curve (training & validation) and the resolution plot for the test set.

### CAVEAT: Make sure you run `python3 run.py test` before reviewing this notebook, as some of its contents reflect the results of the initial model-fitting process.

In [None]:
# Imports

import torch
import torch_geometric
import torch.nn as nn
from torch.nn import Sequential as Seq, Linear as Lin, ReLU, BatchNorm1d, Flatten, Module
from torch_scatter import scatter_mean
from torch.utils.data import random_split
from torch_geometric.data import DataListLoader, Batch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
from tqdm.notebook import tqdm
import numpy as np
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import random
import yaml

sys.path.insert(0, '../src/')
from model import Net
from GraphDataset import GraphDataset
from load_data import path_generator, random_test_path_generator

ROOT = "/home/h8lee/DSC180B-A11-Project"
CONFIG ='conf/reg_defs.yml'
batch_size = 32

with open(os.path.join(ROOT, CONFIG)) as file:
    # The FullLoader parameter handles the conversion from YAML
    # scalar values to the Python dictionary format
    definitions = yaml.load(file, Loader=yaml.FullLoader)

features = definitions['features']
spectators = definitions['spectators']
labels = definitions['labels']

nfeatures = definitions['nfeatures']
nlabels = definitions['nlabels']

%matplotlib inline

------

### Model Assessment -- Learning curve on <font color=blue>training</font> & <font color=orange>validation set</font>

Below is the learning curve of our NN jet mass regressor, recorded during training. The model is trained on multiple signal and QCD jet samples, with a separate validation set used to detect overfitting and trigger early stopping. By default, training runs for up to $100$ epochs, and early stopping can only take effect after at least $30$ epochs.

<img src='learning_curve.png'>
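The early-stopping rule described above can be sketched as follows. This is a minimal illustration, not the project's training code: the `EarlyStopping` class is hypothetical, and the `patience` value (how many epochs without validation improvement are tolerated) is an assumption, since the notebook only states the $100$-epoch cap and the $30$-epoch minimum.

```python
class EarlyStopping:
    """Hypothetical helper: stop once the validation loss has not improved
    for `patience` epochs, but never before `min_epochs` epochs."""

    def __init__(self, patience=10, min_epochs=30):
        self.patience = patience      # assumed value; not stated in the notebook
        self.min_epochs = min_epochs  # the 30-epoch minimum described above
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one finished epoch (1-indexed); return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
            # A real loop would save the best weights here,
            # e.g. torch.save(net.state_dict(), "simplenetwork_best.pt").
        else:
            self.epochs_without_improvement += 1
        return (epoch >= self.min_epochs
                and self.epochs_without_improvement >= self.patience)


stopper = EarlyStopping(patience=10, min_epochs=30)
max_epochs = 100
# Toy validation losses: they improve for 5 epochs, then plateau.
val_losses = [1.0 / (e + 1) if e < 5 else 0.2 for e in range(max_epochs)]

stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, val_loss):
        stopped_at = epoch
        break

print(stopped_at)  # → 30
```

Although the toy loss stops improving after epoch 5, training continues until epoch 30, because the minimum-epoch condition has to hold before the patience condition is allowed to end training.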

-----

### Model Assessment -- Resolution plot on <font color='green'>test set</font>

We now evaluate the performance of our NN regressor by making predictions on unseen jet data. After loading the fitted weights, the model predicts the mass of each jet in the test set, and the resolution of each prediction is computed as its relative error with respect to the ground-truth mass, $(m_\text{pred} - m_\text{true}) / m_\text{true}$. Finally, we plot the distribution of these resolutions to measure how well the predictions align with their target values.
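The per-jet resolution and the removal of invalid entries can be illustrated in isolation with NumPy. This is a sketch under stated assumptions: `relative_resolution` is a hypothetical helper and the masses are made-up toy values. It mirrors the test loop below, where a true mass of $0$ produces `inf` or `NaN` that must be dropped before computing the mean and standard deviation.

```python
import numpy as np


def relative_resolution(pred, true):
    """Hypothetical helper: per-jet (pred - true) / true, keeping only finite values."""
    # Flatten both inputs so an (N, 1) prediction lines up with an (N,) target.
    pred = np.asarray(pred, dtype=float).ravel()
    true = np.asarray(true, dtype=float).ravel()
    # A true mass of 0 gives inf (nonzero prediction) or NaN (zero prediction).
    with np.errstate(divide="ignore", invalid="ignore"):
        res = (pred - true) / true
    return res[np.isfinite(res)]


# Toy masses (made-up values): the last two jets have a true mass of 0.
pred = np.array([105.0, 80.0, 50.0, 0.0])
true = np.array([100.0, 100.0, 0.0, 0.0])

res = relative_resolution(pred, true)
print(res)                     # only the two finite resolutions remain
print(res.mean(), res.std())
```

Filtering with `np.isfinite` drops the same entries (`inf` and `NaN`) that `np.ma.masked_invalid` masks in the notebook.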

In [None]:
def collate(items):
    # Each dataset item is a list of graphs: flatten the lists and merge them into one Batch
    return Batch.from_data_list(sum(items, []))

test_files = random_test_path_generator()
test_dir_path = os.path.join(ROOT, 'test_data')
test_graph_dataset = GraphDataset(test_dir_path, features, labels, spectators, n_events=1000, n_events_merge=1,
                                  file_names=test_files)

test_loader = DataListLoader(test_graph_dataset, batch_size=batch_size, 
                             pin_memory=True, shuffle=True)
test_loader.collate_fn = collate
test_samples = len(test_graph_dataset)

test_p = tqdm(enumerate(test_loader), total=len(test_loader))  # total = number of batches
test_lst = []
net = Net().to(device)
modpath = os.path.join(ROOT, 'simplenetwork_best.pt')

# Retrieve the model weights that produced the smallest validation loss;
# map_location lets weights saved on a GPU load on a CPU-only machine
net.load_state_dict(torch.load(modpath, map_location=device));
net.eval();
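with torch.no_grad():
    for k, tdata in test_p:
        tdata = tdata.to(device)  # Move the batch to the GPU/CPU device
        y = tdata.y  # Target variable (true jet mass)
        tpreds = net(tdata.x, tdata.batch)
        # Relative resolution (pred - true) / true; flatten both tensors so an
        # (N, 1) prediction is not broadcast against an (N,) target into (N, N)
        loss_t = (tpreds.float().reshape(-1) - y.float().reshape(-1)) / y.float().reshape(-1)
        test_lst += loss_t.cpu().numpy().tolist()
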
# Drop non-finite resolutions (inf/NaN from jets with a true mass of 0)
test_arr = np.asarray(test_lst, dtype=float)
test_resolution = test_arr[np.isfinite(test_arr)]

avg_resolution = np.average(test_resolution)
std_resolution = np.std(test_resolution)

In [None]:
fig = plt.figure(figsize=(12,8))
ax = fig.gca()

_ = sns.histplot(test_resolution, stat='frequency',
                 color='lightgreen', ax=ax, bins=50,
                 label=f'NN regressor mass resolution mean={avg_resolution:.4}, std={std_resolution:.4}')

_ = ax.legend(frameon=True, prop={'size':15})

The test resolutions form a cluster on the left-hand side, with a few outliers stretching to the right, which makes the distribution right-skewed. To mitigate the skew, we can log-scale the resolution, which compresses the long right tail and brings the distribution closer to a normal shape. We demonstrate this below.
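Why a log transform reduces right skew can be checked on synthetic data. Because the resolution $r$ can be negative, the log has to act on a positive quantity. One common choice, which is our assumption here rather than something the notebook specifies, is $\log(1 + r) = \log(m_\text{pred} / m_\text{true})$, defined whenever the predicted mass is positive. The sketch below uses log-normally distributed mass ratios as a hypothetical stand-in for real test resolutions, and defines a small `sample_skewness` helper of its own.

```python
import numpy as np


def sample_skewness(x):
    """Fisher-Pearson sample skewness: mean((x - mean)^3) / std^3."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.std(x)**3


rng = np.random.default_rng(0)
# Synthetic stand-in for test resolutions: predicted/true mass ratios drawn from
# a log-normal distribution, so r = ratio - 1 is right-skewed and always > -1.
ratio = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)
r = ratio - 1.0

log_r = np.log1p(r)  # log(1 + r) = log(m_pred / m_true)

print(f"skewness of r:          {sample_skewness(r):.2f}")
print(f"skewness of log(1 + r): {sample_skewness(log_r):.2f}")
```

For this synthetic input, $\log(1 + r)$ is exactly normally distributed, so its skewness is close to zero, while $r$ itself keeps the long right tail. Real resolutions will not be exactly log-normal, so the transform should only be expected to reduce the skew, not eliminate it.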

In [None]:
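# Log-scale the resolution via log(1 + r) = log(pred / true). Unlike a plain log
# (or a square root), this is defined for negative resolutions as long as r > -1,
# i.e. the predicted mass is positive; entries with r <= -1 are dropped.
test_resolution_arr = np.asarray(test_resolution, dtype=float)
log_resolution = np.log1p(test_resolution_arr[test_resolution_arr > -1])
avg_log_resolution = np.average(log_resolution)
std_log_resolution = np.std(log_resolution)

fig = plt.figure(figsize=(12, 8))
ax = fig.gca()

_ = sns.histplot(log_resolution, stat='frequency',
                 color='darkgreen', ax=ax, bins=50,
                 label=f'NN regressor mass log-scaled resolution mean={avg_log_resolution:.4}, std={std_log_resolution:.4}')

_ = ax.legend(frameon=True, prop={'size': 15})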