### Thesis notebook 4.4. - NOVA IMS

#### LSTM - Temporal data representation

In this notebook, we start applying temporal representations of the data using LSTMs and bi-directional LSTMs.
The argument for using Deep Learning is that the sequences themselves encode information that can be extracted with Recurrent Neural Networks and, more specifically, Long Short-Term Memory (LSTM) units.

#### First Step: Set up a PyTorch environment that enables the use of the GPU for training.

The following cell confirms that a GPU is available and makes it the default device.

In [1]:
import torch
import pycuda.driver as cuda

cuda.init()
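#get the id of the default CUDA device (0 on a single-GPU machine)
torch.cuda.current_device()
#look up the name of that GPU; '0' is the id of your GPU
cuda.Device(0).name()

#make CUDA float tensors the default, so new tensors are created on the GPU
torch.set_default_tensor_type('torch.cuda.FloatTensor')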

#### Second Step: Import the relevant packages and declare global variables

In [2]:
#import necessary modules/libraries
import numpy as np
import scipy
import pandas as pd
import datetime as dt
import warnings
import time

#tqdm to monitor progress
from tqdm.notebook import tqdm, trange
tqdm.pandas(desc="Progress")

#time related features
from datetime import timedelta
from copy import copy, deepcopy

#visualization
import matplotlib.pyplot as plt
import seaborn as sns

#imblearn, scalers, kfold and metrics
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler, QuantileTransformer,PowerTransformer
from sklearn.model_selection import train_test_split, RepeatedKFold, RepeatedStratifiedKFold, cross_val_score, GridSearchCV
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, roc_curve, classification_report, average_precision_score, precision_recall_curve

#import torch related
import torch.nn as nn
from torch.nn import functional as F
from torch.autograd import Variable 
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.sampler import SubsetRandomSampler


#learning rate scheduler
from torch.optim.lr_scheduler import ReduceLROnPlateau

#silence library warnings
warnings.filterwarnings('ignore')

In [3]:
#global variables that may come in handy
#duration thresholds: fraction of the course duration that will be considered (1 = 100%)
duration_threshold = [0.1, 0.25, 0.33, 0.5, 1]

#colors for visualizations
nova_ims_colors = ['#BFD72F', '#5C666C']

#standard color for student aggregates
student_color = '#474838'

#standard color for course aggregates
course_color = '#1B3D2F'

#standard continuous colormap
standard_cmap = 'viridis_r'

#function designed to flatten MultiIndex columns
def flattenHierarchicalCol(col, sep='_'):
    '''converts multiindex columns into single index columns while retaining the hierarchical components'''
    if not isinstance(col, tuple):
        return col
    new_col = ''
    for leveli, level in enumerate(col):
        if level != '':
            if leveli != 0:
                new_col += sep
            new_col += level
    return new_col
    
#number of replicas - number of repeats of the stratified k fold - set to 1 in this run
replicas = 1

#names to display on result figures
date_names = {
             'Date_threshold_10': '10% of Course Duration',   
             'Date_threshold_25': '25% of Course Duration', 
             'Date_threshold_33': '33% of Course Duration', 
             'Date_threshold_50': '50% of Course Duration', 
             'Date_threshold_100':'100% of Course Duration', 
            }

target_names = {
                'exam_fail' : 'At risk - Exam Grade',
                'final_fail' : 'At risk - Final Grade', 
                'exam_gifted' : 'High performer - Exam Grade', 
                'final_gifted': 'High performer - Final Grade'
                }

#targets
targets = ['exam_fail' , 'final_fail' , 'exam_gifted' , 'final_gifted']
temporal_columns = ['0 to 4%', '4 to 8%', '8 to 12%', '12 to 16%', '16 to 20%', '20 to 24%',
       '24 to 28%', '28 to 32%', '32 to 36%', '36 to 40%', '40 to 44%',
       '44 to 48%', '48 to 52%', '52 to 56%', '56 to 60%', '60 to 64%',
       '64 to 68%', '68 to 72%', '72 to 76%', '76 to 80%', '80 to 84%',
       '84 to 88%', '88 to 92%', '92 to 96%', '96 to 100%']
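
The 25 labels in `temporal_columns` follow a regular pattern: consecutive bins that each cover 4% of the course duration. As a minimal sketch (not part of the original notebook), the same list could be generated rather than typed out. The names `bin_width` and `generated_columns` are introduced here purely for illustration.

```python
# Width of each time bin, in percent of the course duration (100 / 25 bins).
bin_width = 4

# Build '0 to 4%', '4 to 8%', ..., '96 to 100%'.
generated_columns = [
    f"{start} to {start + bin_width}%"
    for start in range(0, 100, bin_width)
]

print(len(generated_columns))   # number of bins
print(generated_columns[:3])    # first three labels
print(generated_columns[-1])    # last label
```

Generating the labels this way makes it harder for the column list to drift out of sync with the number of splits used in the dataset.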

#### Third Step: Import data and take a preliminary look at it

In [4]:
#imports dataframes
course_programs = pd.read_excel("../Data/Modeling Stage/Nova_IMS_Temporal_Datasets_25_splits.xlsx", 
                                dtype = {
                                    'course_encoding' : int,
                                    'userid' : int},
                               sheet_name = None)

#load the table of targets
student_list = pd.read_csv('../Data/Modeling Stage/Nova_IMS_Filtered_targets.csv', 
                         dtype = {
                                   'course_encoding': int,
                                   'userid' : int,
                                   })

#merge each threshold sheet with the targets and drop the columns we do not need
for i in course_programs:
        
    #merge with the precomputed targets
    course_programs[i] = course_programs[i].merge(student_list, on = ['course_encoding', 'userid'], how = 'inner')
    course_programs[i].drop(['Unnamed: 0', 'exam_mark', 'final_mark'], axis = 1, inplace = True)
    
    #convert the identifier columns to object
    course_programs[i]['course_encoding'] = course_programs[i]['course_encoding'].astype(object)
    course_programs[i]['userid'] = course_programs[i]['userid'].astype(object)

In [5]:
course_programs['Date_threshold_100'].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9296 entries, 0 to 9295
Data columns (total 31 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   course_encoding  9296 non-null   object
 1   userid           9296 non-null   object
 2   0 to 4%          9296 non-null   int64 
 3   4 to 8%          9296 non-null   int64 
 4   8 to 12%         9296 non-null   int64 
 5   12 to 16%        9296 non-null   int64 
 6   16 to 20%        9296 non-null   int64 
 7   20 to 24%        9296 non-null   int64 
 8   24 to 28%        9296 non-null   int64 
 9   28 to 32%        9296 non-null   int64 
 10  32 to 36%        9296 non-null   int64 
 11  36 to 40%        9296 non-null   int64 
 12  40 to 44%        9296 non-null   int64 
 13  44 to 48%        9296 non-null   int64 
 14  48 to 52%        9296 non-null   int64 
 15  52 to 56%        9296 non-null   int64 
 16  56 to 60%        9296 non-null   int64 
 17  60 to 64%        9296 non-null   

In [6]:
course_programs['Date_threshold_100'].describe(include = 'all')

,course_encoding,userid,0 to 4%,4 to 8%,8 to 12%,12 to 16%,16 to 20%,20 to 24%,24 to 28%,28 to 32%,...,76 to 80%,80 to 84%,84 to 88%,88 to 92%,92 to 96%,96 to 100%,exam_fail,final_fail,exam_gifted,final_gifted
count,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,...,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0,9296.0
unique,138.0,1590.0,,,,,,,,,...,,,,,,,,,,
top,150.0,3178.0,,,,,,,,,...,,,,,,,,,,
freq,178.0,14.0,,,,,,,,,...,,,,,,,,,,
mean,,,1.081863,8.307874,10.752797,11.193739,10.127797,8.966652,10.545396,11.445245,...,11.718051,13.136403,22.827883,27.341007,12.599613,0.0,0.201377,0.149957,0.276893,0.30809
std,,,3.526351,13.580025,13.626754,16.400023,14.291254,12.180177,13.507892,15.932226,...,28.186874,36.690068,47.158607,54.963959,35.194597,0.0,0.401051,0.357048,0.447487,0.461729
min,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,,,0.0,0.0,1.0,2.0,2.0,1.0,2.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,,,0.0,2.0,7.0,7.0,6.0,5.0,7.0,7.0,...,2.0,2.0,4.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,,,1.0,12.0,15.0,15.0,13.0,13.0,14.0,14.0,...,10.0,10.0,23.0,27.0,5.0,0.0,0.0,0.0,1.0,1.0


In our first attempt, we will use the absolute number of clicks made by each student in each time bin, scaled with a standard scaler.
Since the course encoding/userid pairings only identify a row and are not features, we can start by placing them into the index.

In [7]:
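def normalize(train, test, scaler):
    '''fits the chosen scaler on train only and applies it to both train and test'''
    
    if scaler == 'MinMax':
        pt = MinMaxScaler()
    elif scaler == 'Standard':
        pt = StandardScaler()
    elif scaler == 'Robust':
        pt = RobustScaler()
    elif scaler == 'Quantile':
        pt = QuantileTransformer()
    else:
        #any other value falls back to a Yeo-Johnson power transform
        pt = PowerTransformer(method='yeo-johnson')
    
    #fit on the training data only, to avoid leaking test statistics
    data_train = pt.fit_transform(train)
    data_test = pt.transform(test)
    # convert the arrays back to dataframes, keeping the original columns and index
    normalized_train = pd.DataFrame(data_train,columns=train.columns,index=train.index)
    normalized_test = pd.DataFrame(data_test,columns=test.columns,index=test.index)
        
    return normalized_train, normalized_test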
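
The key property of `normalize` is that the scaler is fitted on the training split only and then reused on the other split. As a rough illustration of the `'Standard'` case, the sketch below does the same thing by hand with NumPy on synthetic click counts. It is not the notebook's implementation (which uses scikit-learn's `StandardScaler`), and the arrays `train` and `val` are made-up data.

```python
import numpy as np

# Synthetic click counts: 100 training rows and 20 validation rows, 3 time bins.
rng = np.random.default_rng(15)
train = rng.poisson(lam=10, size=(100, 3)).astype(float)
val = rng.poisson(lam=10, size=(20, 3)).astype(float)

# Statistics come from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same training statistics are applied to both splits.
train_scaled = (train - mean) / std
val_scaled = (val - mean) / std

print(train_scaled.mean(axis=0).round(6))  # approximately 0 per column
print(val_scaled.mean(axis=0).round(3))    # close to, but not exactly, 0
```

The validation columns do not end up with exactly zero mean and unit variance, which is expected: they are expressed in the training split's units.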

#### Implementing Cross-Validation with Deep Learning Model

**1. Create the Deep Learning Model**

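In this instance, we follow the approach used in Chen & Cui: CrossEntropyLoss applied over a softmax layer. Note that PyTorch's `nn.CrossEntropyLoss` applies the log-softmax internally, so the model below returns raw (pre-softmax) scores.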
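
To make the loss concrete, here is a small worked example in plain Python of cross-entropy computed over a softmax for a single sample with two classes. It is a sketch of the formula only, not of PyTorch's implementation; the function name `cross_entropy_from_logits` and the example scores are assumptions made for illustration.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Raw scores for the two classes of one sample.
logits = [2.0, 0.0]

loss_if_class_0 = cross_entropy_from_logits(logits, 0)  # confident and right
loss_if_class_1 = cross_entropy_from_logits(logits, 1)  # confident and wrong
loss_uninformed = cross_entropy_from_logits([0.0, 0.0], 0)  # equal scores

print(loss_if_class_0, loss_if_class_1, loss_uninformed)
```

With two classes, a model that assigns equal scores to both has a loss of ln 2 (about 0.693), which is a useful reference point when reading training logs.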

In [8]:
class LSTM_Uni(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers, seq_length):
        super(LSTM_Uni, self).__init__()
        self.num_classes = num_classes #number of classes
        self.num_layers = num_layers #number of layers
        self.input_size = input_size #input size
        self.hidden_size = hidden_size #hidden state
        self.seq_length = seq_length #sequence length

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first = True) #lstm
        
        self.dropout = nn.Dropout(p = 0.5)
    
        self.fc = nn.Linear(self.hidden_size, num_classes) #fully connected last layer

    def forward(self,x):
        #initial hidden and cell states, created on the same device as the input
        h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device) #hidden state
        c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device) #internal state
        
        #Xavier init for both h_0 and c_0; this redraws a random initial state on every forward pass
        torch.nn.init.xavier_normal_(h_0)
        torch.nn.init.xavier_normal_(c_0)
        
        # Propagate input through LSTM
        lstm_out, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and internal state
        last_output = hn[-1] #final hidden state of the last layer, shape (batch, hidden_size)
        
        drop_out = self.dropout(last_output)
        pre_softmax = self.fc(drop_out) #final output - raw scores for CrossEntropyLoss
        return pre_softmax

**2. Define the train and validation Functions**

In [9]:
def train_epoch(model,dataloader,loss_fn,optimizer):
    
    train_loss,train_correct=0.0,0 
    model.train()
    for X, labels in dataloader:

        optimizer.zero_grad()
        output = model(X)
        loss = loss_fn(output,labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * X.size(0)
        predictions = torch.argmax(output, dim=1) #predicted class = highest score
        train_correct += (predictions == labels).sum().item()
        
    return train_loss,train_correct
  
def valid_epoch(model,dataloader,loss_fn):
    valid_loss, val_correct = 0.0, 0
    targets = []
    y_pred = []
    probability_1 = []
    
    model.eval()
    #no gradients are needed for validation
    with torch.no_grad():
        for X, labels in dataloader:

            output = model(X)
            loss=loss_fn(output,labels)
            valid_loss+=loss.item()*X.size(0)
            #probability of the positive class (label 1)
            probability_1.append(F.softmax(output, dim=1)[:,1])
            predictions = torch.argmax(output, dim=1)
            val_correct+=(predictions == labels).sum().item()
            targets.append(labels)
            y_pred.append(predictions)
    
    #concat all results
    targets = torch.cat(targets).data.cpu().numpy()
    y_pred = torch.cat(y_pred).data.cpu().numpy()
    probability_1 = torch.cat(probability_1).data.cpu().numpy()
    
    #calculate precision, recall and AUC score
    
    precision = precision_score(targets, y_pred)
    recall = recall_score(targets, y_pred)
    auroc = roc_auc_score(targets, probability_1)
    
    #return all
    return valid_loss,val_correct, precision, recall, auroc
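
`valid_epoch` reports precision and recall for the positive class (label 1) alongside accuracy. The sketch below computes the three metrics by hand on a made-up set of eight predictions, to show how they relate to the confusion counts. The lists `y_true` and `y_hat` are hypothetical and are not taken from the dataset.

```python
# Hypothetical labels and predictions for eight students (1 = positive class).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_hat = [1, 1, 0, 0, 0, 1, 0, 1]

pairs = list(zip(y_true, y_hat))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
accuracy = (tp + tn) / len(pairs)

print(tp, fp, fn, tn)
print(precision, recall, accuracy)
```

Because the targets here are imbalanced, accuracy alone can hide poor performance on the positive class, which is why precision, recall and AUROC are tracked per epoch as well.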

**3. Define main hyperparameters of the model, including splits**

In [12]:
#Model
num_epochs = 200 #number of epochs per fold
learning_rate = 0.01 #initial learning rate (reduced on plateau by the scheduler)
input_size = 1 #number of features
hidden_size = 40 #number of features in hidden state
num_layers = 1 #number of stacked lstm layers

#Shape of Output as required for SoftMax Classifier
num_classes = 2 #output shape

batch_size = 32

k=10
splits= RepeatedStratifiedKFold(n_splits=k, n_repeats=replicas, random_state=15) #stratified 10-fold, repeated `replicas` times
criterion = nn.CrossEntropyLoss()    # cross-entropy for classification
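
These settings determine how much training the notebook performs. As a back-of-the-envelope sketch, the arithmetic below counts the models and epochs implied by the values above, assuming one model per fold, the five duration thresholds in `date_names` and the four entries in `targets`. The variable names are introduced here for illustration.

```python
n_thresholds = 5   # sheets: 10%, 25%, 33%, 50% and 100% of course duration
n_targets = 4      # exam_fail, final_fail, exam_gifted, final_gifted
n_splits = 10      # k in the stratified k-fold
n_repeats = 1      # `replicas`
num_epochs = 200   # epochs per fold

folds_per_target = n_splits * n_repeats
total_models = n_thresholds * n_targets * folds_per_target
total_epochs = total_models * num_epochs

print(folds_per_target, total_models, total_epochs)
```

Raising `replicas` multiplies both totals by the same factor, so the number of repeats is the main lever on total runtime.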

**4. Make the splits and Start Training**
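
Before training, each table of shape (rows, 25) is reshaped to (rows, timesteps, features) with a single feature per timestep, which is the layout an LSTM with `batch_first = True` and `input_size = 1` expects. The sketch below shows the same reshape on a small NumPy array rather than on PyTorch tensors; the array `X_demo` is made-up data.

```python
import numpy as np

n_students, n_timesteps = 4, 25

# One row per student, one column per 4% time bin.
X_demo = np.arange(n_students * n_timesteps, dtype=np.float32)
X_demo = X_demo.reshape(n_students, n_timesteps)

# Add a trailing feature axis: (rows, timesteps) -> (rows, timesteps, 1).
X_seq = X_demo.reshape(n_students, n_timesteps, 1)

print(X_demo.shape, X_seq.shape)
print(np.array_equal(X_seq[2, :, 0], X_demo[2]))  # each sequence is preserved
```

The reshape only adds an axis: the order of the click counts within each student's sequence is unchanged.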

In [None]:
for i in tqdm(course_programs.keys()):
    
    print(i)
    threshold_dict = {} #dict to store information in for each threshold
    data = deepcopy(course_programs[i])
    
    data.set_index(['course_encoding', 'userid'], drop = True, inplace = True)
    data.fillna(0, inplace = True)
    
    #set X and Y columns
    X = data[data.columns[:25]] #different timesteps
    y = data[data.columns[-4:]] #the 4 different putative targets
    
    for k in tqdm(targets):
        print(k)
        
        #Start with train test split
        X_train_val, X_test, y_train_val, y_test, = train_test_split(
                                    X,
                                   y[k], #replace when going for multi-target 
                                   test_size = 0.20,
                                   random_state = 15,
                                   shuffle=True,
                                   stratify = y[k] #replace when going for multi-target
                                    )
        
        #create dict to store fold performance
        foldperf={}
        
        #reset best accuracy for threshold i and target k
        best_accuracy = 0

        #make train_val split
        for fold, (train_idx,val_idx) in tqdm(enumerate(splits.split(X_train_val, y_train_val))):

            print('Split {}'.format(fold + 1))
            
            #make split between train and Val
            X_train, y_train = X_train_val.iloc[train_idx], y_train_val.iloc[train_idx]
            X_val, y_val = X_train_val.iloc[val_idx], y_train_val.iloc[val_idx]
            
            #apply SMOTE to training split
            over = SMOTE()
            X_train, y_train = over.fit_resample(X_train, y_train)
            
            #apply scaling after 
            X_train, X_val = normalize(X_train, X_val, 'Standard')
            
            #second, convert everything to pytorch tensors - these are wrapped in TensorDatasets further down
            X_train_tensors = Variable(torch.Tensor(X_train.values))
            X_val_tensors = Variable(torch.Tensor(X_val.values))

            y_train_tensors = Variable(torch.Tensor(y_train.values))
            y_val_tensors = Variable(torch.Tensor(y_val.values)) 

            #reshaping to rows, timesteps, features
            X_train_tensors = torch.reshape(X_train_tensors,   (X_train_tensors.shape[0], X_train_tensors.shape[1], 1))
            X_val_tensors = torch.reshape(X_val_tensors,  (X_val_tensors.shape[0], X_val_tensors.shape[1], 1))
        
            #convert y tensors to format longtensor
            y_train_tensors = y_train_tensors.type(torch.cuda.LongTensor)
            y_val_tensors = y_val_tensors.type(torch.cuda.LongTensor)
            
            #create Tensor Datasets and dataloaders for both Train and Val
            train_dataset = TensorDataset(X_train_tensors, y_train_tensors)
            val_dataset = TensorDataset(X_val_tensors, y_val_tensors)
            train_loader = DataLoader(train_dataset, batch_size=batch_size)
            val_loader = DataLoader(val_dataset, batch_size=batch_size)
    
            #creates a new model for each fold
            model = LSTM_Uni(num_classes, input_size, hidden_size, num_layers, X_train_tensors.shape[1]).to('cuda') #our lstm class
            optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) 
            scheduler = ReduceLROnPlateau(optimizer, 
                                  'min', 
                                  patience = 10,
                                  cooldown = 20,
                                 verbose = True)
    
            history = {'train_loss': [], 'val_loss': [],'train_acc':[],'val_acc':[], 'precision': [],
                      'recall' : [], 'auroc': []}

            for epoch in tqdm(range(num_epochs)):
                train_loss, train_correct=train_epoch(model,train_loader,criterion,optimizer)
                val_loss, val_correct, precision, recall, auroc = valid_epoch(model,val_loader,criterion)

                train_loss = train_loss / len(train_loader.sampler)
                train_acc = train_correct / len(train_loader.sampler) * 100
                val_loss = val_loss / len(val_loader.sampler)
                val_acc = val_correct / len(val_loader.sampler) * 100
        
        
                if (epoch+1) % 10 == 0: 
                    print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG Validation Loss:{:.3f} AVG Training Acc {:.2f} % AVG Validation Acc {:.2f} %".format(epoch + 1,
                                                                                                             num_epochs,
                                                                                                             train_loss,
                                                                                                             val_loss,
                                                                                                             train_acc,
                                                                                                             val_acc))
                history['train_loss'].append(train_loss)
                history['val_loss'].append(val_loss)
                history['train_acc'].append(train_acc)
                history['val_acc'].append(val_acc)
                history['precision'].append(precision)
                history['recall'].append(recall)
                history['auroc'].append(auroc)
                scheduler.step(val_loss)
    
                if val_acc > best_accuracy:
                    #replace best accuracy and save best model
                    print(f'New Best Accuracy found: {val_acc:.2f}%\nEpoch: {epoch + 1}')
                    best_accuracy = val_acc
                    best = deepcopy(model)
                    curr_epoch = epoch + 1
                    
            #store fold performance
            foldperf['fold{}'.format(fold+1)] = history
        
        #saves fold performance for target 
        threshold_dict[k] = pd.DataFrame.from_dict(foldperf, orient='index') # convert dict to dataframe
        
        #explode to get each epoch as a row
        threshold_dict[k] = threshold_dict[k].explode(list(threshold_dict[k].columns))
        torch.save(best,f"../Models/{i}/SMOTE_Nova_IMS_best_{k}_{curr_epoch}_epochs.h")
        
    # from pandas.io.parsers import ExcelWriter
    with pd.ExcelWriter(f"../Data/Modeling Stage/Results/IMS/Clicks per % duration/SMOTE_25_splits_{i}_{replicas}_replicas.xlsx") as writer:  
        for sheet in targets:
                threshold_dict[sheet].to_excel(writer, sheet_name=str(sheet))

  0%|          | 0/5 [00:00<?, ?it/s]

Date_threshold_10


  0%|          | 0/4 [00:00<?, ?it/s]

exam_fail


0it [00:00, ?it/s]

Split 1


  0%|          | 0/200 [00:00<?, ?it/s]

New Best Accuracy found: 20.16%
Epoch: 1
Epoch:10/200 AVG Training Loss:0.571 AVG Validation Loss:2.342 AVG Training Acc 77.03 % AVG Validation Acc 20.16 %
Epoch:20/200 AVG Training Loss:0.501 AVG Validation Loss:6.305 AVG Training Acc 82.97 % AVG Validation Acc 20.16 %
Epoch    22: reducing learning rate of group 0 to 1.0000e-03.
New Best Accuracy found: 20.43%
Epoch: 26
New Best Accuracy found: 20.56%
Epoch: 28
New Best Accuracy found: 20.70%
Epoch: 29
Epoch:30/200 AVG Training Loss:0.694 AVG Validation Loss:0.785 AVG Training Acc 52.19 % AVG Validation Acc 21.64 %
New Best Accuracy found: 21.64%
Epoch: 30
New Best Accuracy found: 22.45%
Epoch: 31
New Best Accuracy found: 22.72%
Epoch: 32
New Best Accuracy found: 23.52%
Epoch: 33
New Best Accuracy found: 24.46%
Epoch: 34
New Best Accuracy found: 25.27%
Epoch: 35
New Best Accuracy found: 26.48%
Epoch: 36
New Best Accuracy found: 27.28%
Epoch: 37
New Best Accuracy found: 27.42%
Epoch: 38
New Best Accuracy found: 28.76%
Epoch: 39
Epoch:

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.455 AVG Validation Loss:3.933 AVG Training Acc 81.54 % AVG Validation Acc 20.16 %
Epoch:20/200 AVG Training Loss:0.600 AVG Validation Loss:5.746 AVG Training Acc 70.73 % AVG Validation Acc 20.16 %
Epoch    27: reducing learning rate of group 0 to 1.0000e-03.
Epoch:30/200 AVG Training Loss:0.709 AVG Validation Loss:0.954 AVG Training Acc 54.22 % AVG Validation Acc 21.64 %
Epoch:40/200 AVG Training Loss:0.682 AVG Validation Loss:0.884 AVG Training Acc 56.80 % AVG Validation Acc 25.54 %
Epoch:50/200 AVG Training Loss:0.674 AVG Validation Loss:0.860 AVG Training Acc 58.41 % AVG Validation Acc 26.21 %
Epoch:60/200 AVG Training Loss:0.673 AVG Validation Loss:0.834 AVG Training Acc 59.20 % AVG Validation Acc 27.02 %
Epoch:70/200 AVG Training Loss:0.666 AVG Validation Loss:0.855 AVG Training Acc 60.75 % AVG Validation Acc 25.94 %
Epoch:80/200 AVG Training Loss:0.660 AVG Validation Loss:0.826 AVG Training Acc 60.94 % AVG Validation Acc 26.21 %
Epoch    89: reduc

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.560 AVG Validation Loss:3.123 AVG Training Acc 78.64 % AVG Validation Acc 20.16 %
Epoch:20/200 AVG Training Loss:0.626 AVG Validation Loss:1.821 AVG Training Acc 69.40 % AVG Validation Acc 20.16 %
Epoch:30/200 AVG Training Loss:0.677 AVG Validation Loss:11.385 AVG Training Acc 71.30 % AVG Validation Acc 20.16 %
Epoch    37: reducing learning rate of group 0 to 1.0000e-03.
Epoch:40/200 AVG Training Loss:0.754 AVG Validation Loss:0.875 AVG Training Acc 49.77 % AVG Validation Acc 21.10 %
Epoch:50/200 AVG Training Loss:0.685 AVG Validation Loss:0.761 AVG Training Acc 55.65 % AVG Validation Acc 25.67 %
Epoch:60/200 AVG Training Loss:0.676 AVG Validation Loss:0.779 AVG Training Acc 57.69 % AVG Validation Acc 36.96 %
Epoch    68: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.679 AVG Validation Loss:0.751 AVG Training Acc 57.16 % AVG Validation Acc 45.43 %
Epoch:80/200 AVG Training Loss:0.670 AVG Validation Loss:0.705 AVG Tra

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.549 AVG Validation Loss:3.134 AVG Training Acc 75.94 % AVG Validation Acc 20.16 %
Epoch:20/200 AVG Training Loss:0.621 AVG Validation Loss:2.793 AVG Training Acc 67.94 % AVG Validation Acc 20.16 %
Epoch    29: reducing learning rate of group 0 to 1.0000e-03.
Epoch:30/200 AVG Training Loss:1.015 AVG Validation Loss:1.142 AVG Training Acc 50.01 % AVG Validation Acc 20.16 %
Epoch:40/200 AVG Training Loss:0.705 AVG Validation Loss:0.863 AVG Training Acc 53.13 % AVG Validation Acc 20.43 %
Epoch:50/200 AVG Training Loss:0.617 AVG Validation Loss:1.256 AVG Training Acc 68.71 % AVG Validation Acc 20.56 %
Epoch:60/200 AVG Training Loss:0.679 AVG Validation Loss:0.857 AVG Training Acc 58.02 % AVG Validation Acc 22.45 %
Epoch    60: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.674 AVG Validation Loss:0.716 AVG Training Acc 58.52 % AVG Validation Acc 45.70 %
Epoch:80/200 AVG Training Loss:0.670 AVG Validation Loss:0.704 AVG Trai

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.447 AVG Validation Loss:4.172 AVG Training Acc 81.02 % AVG Validation Acc 20.16 %
Epoch:20/200 AVG Training Loss:0.585 AVG Validation Loss:3.062 AVG Training Acc 73.41 % AVG Validation Acc 20.16 %
Epoch:30/200 AVG Training Loss:0.604 AVG Validation Loss:7.684 AVG Training Acc 79.55 % AVG Validation Acc 20.16 %
Epoch:40/200 AVG Training Loss:0.535 AVG Validation Loss:6.110 AVG Training Acc 79.49 % AVG Validation Acc 20.16 %
Epoch    48: reducing learning rate of group 0 to 1.0000e-03.
Epoch:50/200 AVG Training Loss:0.814 AVG Validation Loss:1.014 AVG Training Acc 50.40 % AVG Validation Acc 20.70 %
Epoch:60/200 AVG Training Loss:0.685 AVG Validation Loss:0.775 AVG Training Acc 55.43 % AVG Validation Acc 31.45 %
Epoch:70/200 AVG Training Loss:0.681 AVG Validation Loss:0.773 AVG Training Acc 57.31 % AVG Validation Acc 36.83 %
Epoch:80/200 AVG Training Loss:0.672 AVG Validation Loss:0.780 AVG Training Acc 58.78 % AVG Validation Acc 39.11 %
Epoch    81: reduc

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.532 AVG Validation Loss:8.362 AVG Training Acc 79.83 % AVG Validation Acc 20.16 %
Epoch:20/200 AVG Training Loss:0.560 AVG Validation Loss:2.466 AVG Training Acc 78.53 % AVG Validation Acc 20.16 %
Epoch:30/200 AVG Training Loss:0.642 AVG Validation Loss:8.816 AVG Training Acc 65.58 % AVG Validation Acc 20.16 %
Epoch    37: reducing learning rate of group 0 to 1.0000e-03.
Epoch:40/200 AVG Training Loss:0.705 AVG Validation Loss:0.814 AVG Training Acc 51.02 % AVG Validation Acc 22.04 %
Epoch:50/200 AVG Training Loss:0.680 AVG Validation Loss:0.803 AVG Training Acc 56.45 % AVG Validation Acc 31.99 %
Epoch:60/200 AVG Training Loss:0.676 AVG Validation Loss:0.807 AVG Training Acc 58.49 % AVG Validation Acc 35.62 %
Epoch    68: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.680 AVG Validation Loss:0.781 AVG Training Acc 57.81 % AVG Validation Acc 43.95 %
Epoch:80/200 AVG Training Loss:0.665 AVG Validation Loss:0.706 AVG Trai

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.568 AVG Validation Loss:3.094 AVG Training Acc 76.38 % AVG Validation Acc 20.05 %
Epoch:20/200 AVG Training Loss:0.625 AVG Validation Loss:3.560 AVG Training Acc 70.75 % AVG Validation Acc 20.05 %
Epoch    25: reducing learning rate of group 0 to 1.0000e-03.
Epoch:30/200 AVG Training Loss:0.709 AVG Validation Loss:0.802 AVG Training Acc 50.97 % AVG Validation Acc 21.13 %
Epoch:40/200 AVG Training Loss:0.682 AVG Validation Loss:0.781 AVG Training Acc 56.86 % AVG Validation Acc 34.32 %
Epoch:50/200 AVG Training Loss:0.676 AVG Validation Loss:0.784 AVG Training Acc 58.83 % AVG Validation Acc 40.11 %
Epoch    56: reducing learning rate of group 0 to 1.0000e-04.
Epoch:60/200 AVG Training Loss:0.673 AVG Validation Loss:0.748 AVG Training Acc 58.46 % AVG Validation Acc 48.99 %
Epoch:70/200 AVG Training Loss:0.664 AVG Validation Loss:0.704 AVG Training Acc 60.65 % AVG Validation Acc 53.57 %
Epoch:80/200 AVG Training Loss:0.664 AVG Validation Loss:0.692 AVG Trai

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.495 AVG Validation Loss:7.276 AVG Training Acc 79.33 % AVG Validation Acc 20.05 %
Epoch:20/200 AVG Training Loss:0.618 AVG Validation Loss:2.021 AVG Training Acc 70.84 % AVG Validation Acc 20.05 %
Epoch:30/200 AVG Training Loss:0.650 AVG Validation Loss:1.621 AVG Training Acc 65.58 % AVG Validation Acc 20.05 %
Epoch:40/200 AVG Training Loss:0.618 AVG Validation Loss:7.753 AVG Training Acc 75.14 % AVG Validation Acc 20.05 %
Epoch    47: reducing learning rate of group 0 to 1.0000e-03.
Epoch:50/200 AVG Training Loss:0.763 AVG Validation Loss:0.915 AVG Training Acc 50.06 % AVG Validation Acc 20.46 %
Epoch:60/200 AVG Training Loss:0.686 AVG Validation Loss:0.756 AVG Training Acc 54.32 % AVG Validation Acc 26.92 %
Epoch:70/200 AVG Training Loss:0.679 AVG Validation Loss:0.752 AVG Training Acc 56.80 % AVG Validation Acc 34.05 %
Epoch:80/200 AVG Training Loss:0.672 AVG Validation Loss:0.749 AVG Training Acc 58.39 % AVG Validation Acc 36.74 %
Epoch:90/200 AVG T

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.525 AVG Validation Loss:5.875 AVG Training Acc 80.52 % AVG Validation Acc 20.05 %
Epoch:20/200 AVG Training Loss:0.645 AVG Validation Loss:1.615 AVG Training Acc 66.38 % AVG Validation Acc 20.05 %
Epoch:30/200 AVG Training Loss:0.622 AVG Validation Loss:3.477 AVG Training Acc 66.88 % AVG Validation Acc 20.05 %
Epoch    32: reducing learning rate of group 0 to 1.0000e-03.
Epoch:40/200 AVG Training Loss:0.688 AVG Validation Loss:0.764 AVG Training Acc 54.78 % AVG Validation Acc 25.17 %
Epoch:50/200 AVG Training Loss:0.681 AVG Validation Loss:0.762 AVG Training Acc 56.80 % AVG Validation Acc 30.69 %
Epoch:60/200 AVG Training Loss:0.676 AVG Validation Loss:0.768 AVG Training Acc 57.76 % AVG Validation Acc 32.84 %
Epoch    63: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.670 AVG Validation Loss:0.718 AVG Training Acc 58.57 % AVG Validation Acc 51.14 %
Epoch:80/200 AVG Training Loss:0.665 AVG Validation Loss:0.688 AVG Trai

  0%|          | 0/200 [00:00<?, ?it/s]

Epoch:10/200 AVG Training Loss:0.519 AVG Validation Loss:4.599 AVG Training Acc 81.43 % AVG Validation Acc 20.19 %
Epoch:20/200 AVG Training Loss:0.587 AVG Validation Loss:8.063 AVG Training Acc 77.90 % AVG Validation Acc 20.19 %
Epoch:30/200 AVG Training Loss:0.605 AVG Validation Loss:11.771 AVG Training Acc 65.99 % AVG Validation Acc 20.19 %
Epoch:40/200 AVG Training Loss:0.654 AVG Validation Loss:1.504 AVG Training Acc 64.62 % AVG Validation Acc 20.32 %
Epoch:50/200 AVG Training Loss:0.536 AVG Validation Loss:3.597 AVG Training Acc 73.43 % AVG Validation Acc 20.32 %
Epoch    52: reducing learning rate of group 0 to 1.0000e-03.
Epoch:60/200 AVG Training Loss:0.699 AVG Validation Loss:0.816 AVG Training Acc 53.12 % AVG Validation Acc 23.15 %
Epoch:70/200 AVG Training Loss:0.688 AVG Validation Loss:0.795 AVG Training Acc 55.63 % AVG Validation Acc 24.50 %
Epoch:80/200 AVG Training Loss:0.683 AVG Validation Loss:0.756 AVG Training Acc 57.01 % AVG Validation Acc 33.24 %
Epoch:90/200 AVG 

0it [00:00, ?it/s]

Split 1


  0%|          | 0/200 [00:00<?, ?it/s]

New Best Accuracy found: 14.92%
Epoch: 1
Epoch:10/200 AVG Training Loss:0.542 AVG Validation Loss:8.318 AVG Training Acc 76.12 % AVG Validation Acc 14.92 %
Epoch:20/200 AVG Training Loss:0.630 AVG Validation Loss:1.840 AVG Training Acc 68.83 % AVG Validation Acc 14.92 %
Epoch:30/200 AVG Training Loss:0.520 AVG Validation Loss:9.130 AVG Training Acc 80.49 % AVG Validation Acc 14.92 %
Epoch    34: reducing learning rate of group 0 to 1.0000e-03.
New Best Accuracy found: 17.47%
Epoch: 38
Epoch:40/200 AVG Training Loss:0.592 AVG Validation Loss:1.505 AVG Training Acc 73.44 % AVG Validation Acc 15.99 %
Epoch:50/200 AVG Training Loss:0.525 AVG Validation Loss:1.702 AVG Training Acc 76.57 % AVG Validation Acc 15.59 %
Epoch:60/200 AVG Training Loss:0.556 AVG Validation Loss:1.657 AVG Training Acc 73.29 % AVG Validation Acc 15.32 %
Epoch    65: reducing learning rate of group 0 to 1.0000e-04.
New Best Accuracy found: 18.41%
Epoch: 67
New Best Accuracy found: 23.92%
Epoch: 68
New Best Accuracy f

  0%|          | 0/200 [00:00<?, ?it/s]

New Best Accuracy found: 55.38%
Epoch: 5
Epoch:10/200 AVG Training Loss:0.480 AVG Validation Loss:6.549 AVG Training Acc 81.83 % AVG Validation Acc 15.05 %
Epoch    16: reducing learning rate of group 0 to 1.0000e-03.
Epoch:20/200 AVG Training Loss:0.690 AVG Validation Loss:0.977 AVG Training Acc 56.06 % AVG Validation Acc 16.13 %
Epoch:30/200 AVG Training Loss:0.680 AVG Validation Loss:1.055 AVG Training Acc 58.52 % AVG Validation Acc 16.40 %
Epoch:40/200 AVG Training Loss:0.688 AVG Validation Loss:0.892 AVG Training Acc 56.77 % AVG Validation Acc 24.87 %
Epoch:50/200 AVG Training Loss:0.671 AVG Validation Loss:0.961 AVG Training Acc 59.90 % AVG Validation Acc 24.33 %
Epoch    51: reducing learning rate of group 0 to 1.0000e-04.
Epoch:60/200 AVG Training Loss:0.665 AVG Validation Loss:0.717 AVG Training Acc 59.16 % AVG Validation Acc 50.00 %
Epoch:70/200 AVG Training Loss:0.658 AVG Validation Loss:0.686 AVG Training Acc 60.10 % AVG Validation Acc 55.24 %
Epoch:80/200 AVG Training Loss


Epoch:10/200 AVG Training Loss:0.371 AVG Validation Loss:4.537 AVG Training Acc 86.68 % AVG Validation Acc 15.19 %
Epoch:20/200 AVG Training Loss:0.535 AVG Validation Loss:6.634 AVG Training Acc 78.45 % AVG Validation Acc 15.05 %
Epoch:30/200 AVG Training Loss:0.640 AVG Validation Loss:10.347 AVG Training Acc 82.96 % AVG Validation Acc 15.05 %
Epoch    37: reducing learning rate of group 0 to 1.0000e-03.
Epoch:40/200 AVG Training Loss:0.700 AVG Validation Loss:1.163 AVG Training Acc 58.47 % AVG Validation Acc 15.05 %
Epoch:50/200 AVG Training Loss:0.685 AVG Validation Loss:1.019 AVG Training Acc 57.66 % AVG Validation Acc 15.05 %
Epoch:60/200 AVG Training Loss:0.663 AVG Validation Loss:1.255 AVG Training Acc 59.47 % AVG Validation Acc 15.19 %
Epoch    68: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.720 AVG Validation Loss:0.891 AVG Training Acc 53.52 % AVG Validation Acc 19.89 %
New Best Accuracy found: 60.08%
Epoch: 79
Epoch:80/200 AVG Training Lo


Epoch:10/200 AVG Training Loss:0.498 AVG Validation Loss:3.414 AVG Training Acc 83.03 % AVG Validation Acc 15.05 %
Epoch:20/200 AVG Training Loss:0.550 AVG Validation Loss:9.301 AVG Training Acc 78.63 % AVG Validation Acc 15.05 %
Epoch    26: reducing learning rate of group 0 to 1.0000e-03.
Epoch:30/200 AVG Training Loss:0.724 AVG Validation Loss:0.863 AVG Training Acc 49.76 % AVG Validation Acc 15.32 %
Epoch:40/200 AVG Training Loss:0.668 AVG Validation Loss:0.802 AVG Training Acc 59.62 % AVG Validation Acc 37.50 %
Epoch:50/200 AVG Training Loss:0.663 AVG Validation Loss:0.792 AVG Training Acc 61.32 % AVG Validation Acc 43.55 %
Epoch:60/200 AVG Training Loss:0.660 AVG Validation Loss:0.792 AVG Training Acc 62.01 % AVG Validation Acc 46.37 %
Epoch    65: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.657 AVG Validation Loss:0.734 AVG Training Acc 62.16 % AVG Validation Acc 53.76 %
Epoch:80/200 AVG Training Loss:0.649 AVG Validation Loss:0.683 AVG Trai


Epoch:10/200 AVG Training Loss:0.495 AVG Validation Loss:3.974 AVG Training Acc 83.57 % AVG Validation Acc 15.05 %
Epoch:20/200 AVG Training Loss:0.509 AVG Validation Loss:5.901 AVG Training Acc 82.20 % AVG Validation Acc 15.05 %
Epoch    22: reducing learning rate of group 0 to 1.0000e-03.
Epoch:30/200 AVG Training Loss:0.713 AVG Validation Loss:0.969 AVG Training Acc 53.66 % AVG Validation Acc 15.46 %
Epoch:40/200 AVG Training Loss:0.665 AVG Validation Loss:1.116 AVG Training Acc 61.65 % AVG Validation Acc 16.80 %
Epoch:50/200 AVG Training Loss:0.660 AVG Validation Loss:1.024 AVG Training Acc 61.82 % AVG Validation Acc 22.18 %
Epoch    53: reducing learning rate of group 0 to 1.0000e-04.
Epoch:60/200 AVG Training Loss:0.666 AVG Validation Loss:0.734 AVG Training Acc 59.38 % AVG Validation Acc 50.27 %
Epoch:70/200 AVG Training Loss:0.657 AVG Validation Loss:0.687 AVG Training Acc 61.19 % AVG Validation Acc 57.80 %
Epoch:80/200 AVG Training Loss:0.653 AVG Validation Loss:0.684 AVG Trai


Epoch:10/200 AVG Training Loss:0.535 AVG Validation Loss:3.335 AVG Training Acc 79.80 % AVG Validation Acc 15.05 %
Epoch:20/200 AVG Training Loss:0.537 AVG Validation Loss:2.297 AVG Training Acc 81.77 % AVG Validation Acc 15.05 %
Epoch:30/200 AVG Training Loss:0.573 AVG Validation Loss:5.058 AVG Training Acc 77.55 % AVG Validation Acc 15.05 %
Epoch    31: reducing learning rate of group 0 to 1.0000e-03.
Epoch:40/200 AVG Training Loss:0.693 AVG Validation Loss:0.870 AVG Training Acc 53.98 % AVG Validation Acc 19.09 %
Epoch:50/200 AVG Training Loss:0.684 AVG Validation Loss:0.847 AVG Training Acc 56.06 % AVG Validation Acc 20.83 %
Epoch:60/200 AVG Training Loss:0.669 AVG Validation Loss:0.901 AVG Training Acc 59.26 % AVG Validation Acc 25.54 %
Epoch    65: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.667 AVG Validation Loss:0.756 AVG Training Acc 59.26 % AVG Validation Acc 47.58 %
Epoch:80/200 AVG Training Loss:0.654 AVG Validation Loss:0.690 AVG Trai


Epoch:10/200 AVG Training Loss:0.486 AVG Validation Loss:7.131 AVG Training Acc 80.96 % AVG Validation Acc 14.94 %
Epoch:20/200 AVG Training Loss:0.517 AVG Validation Loss:6.416 AVG Training Acc 80.97 % AVG Validation Acc 14.94 %
Epoch:30/200 AVG Training Loss:0.592 AVG Validation Loss:1.856 AVG Training Acc 72.85 % AVG Validation Acc 14.94 %
Epoch:40/200 AVG Training Loss:0.453 AVG Validation Loss:5.705 AVG Training Acc 86.57 % AVG Validation Acc 14.94 %
Epoch    45: reducing learning rate of group 0 to 1.0000e-03.
Epoch:50/200 AVG Training Loss:0.692 AVG Validation Loss:0.852 AVG Training Acc 54.44 % AVG Validation Acc 15.61 %
Epoch:60/200 AVG Training Loss:0.671 AVG Validation Loss:0.863 AVG Training Acc 59.64 % AVG Validation Acc 19.78 %
Epoch:70/200 AVG Training Loss:0.665 AVG Validation Loss:0.866 AVG Training Acc 60.34 % AVG Validation Acc 21.53 %
Epoch    76: reducing learning rate of group 0 to 1.0000e-04.
Epoch:80/200 AVG Training Loss:0.665 AVG Validation Loss:0.771 AVG Trai


Epoch:10/200 AVG Training Loss:0.455 AVG Validation Loss:8.306 AVG Training Acc 84.92 % AVG Validation Acc 14.94 %
Epoch:20/200 AVG Training Loss:0.620 AVG Validation Loss:2.454 AVG Training Acc 68.30 % AVG Validation Acc 14.94 %
Epoch:30/200 AVG Training Loss:0.504 AVG Validation Loss:2.686 AVG Training Acc 80.01 % AVG Validation Acc 14.94 %
Epoch    32: reducing learning rate of group 0 to 1.0000e-03.
Epoch:40/200 AVG Training Loss:0.699 AVG Validation Loss:0.797 AVG Training Acc 49.86 % AVG Validation Acc 16.55 %
Epoch:50/200 AVG Training Loss:0.684 AVG Validation Loss:0.926 AVG Training Acc 57.00 % AVG Validation Acc 28.94 %
Epoch:60/200 AVG Training Loss:0.669 AVG Validation Loss:0.895 AVG Training Acc 60.41 % AVG Validation Acc 36.07 %
Epoch    63: reducing learning rate of group 0 to 1.0000e-04.
Epoch:70/200 AVG Training Loss:0.678 AVG Validation Loss:0.737 AVG Training Acc 60.95 % AVG Validation Acc 58.01 %
Epoch:80/200 AVG Training Loss:0.660 AVG Validation Loss:0.697 AVG Trai