## Developing a Hierarchical/Straight-Through Classification Approach

**Author:** Shaun Khoo  
**Date:** 1 Oct 2021  
**Context:** Adapting Shopify's approach to classifying products using a hierarchical classifier (see reference below)  
**Objective:** Develop code for training a hierarchical classifier neural network

Some references:

* [this article by Shopify](https://shopify.engineering/introducing-linnet-using-rich-image-text-data-categorize-products)
* [How to do transfer learning on PyTorch / Transformers](https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multiclass_classification.ipynb)

#### A) Importing libraries and data

Changing the working directory to the top-level project folder

In [1]:
import os
os.chdir('..')

Importing the required libraries

In [2]:
import pandas as pd
import numpy as np
import copy

import torch
from torch.utils.data import Dataset, DataLoader
from torch.autograd import Variable
from transformers import DistilBertModel, DistilBertTokenizer, DistilBertForSequenceClassification
import time
from datetime import datetime

# Enable debugging while on GPU
# This doesn't seem to work for me though
# os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

Importing our training functions from our own codebase

In [3]:
from ssoc_autocoder import model_training

Filling in the required parameters for the model training

In [13]:
colnames = {
    'SSOC': 'Predicted_SSOC_2020',
    'job_description': 'description',
    'job_title': 'title'
}

parameters = {
    'architecture': 'hierarchical',
    'version': 'V2pt2',
    'sequence_max_length': 512,
    'max_level': 5,
    'training_batch_size': 32,
    'validation_batch_size': 32,
    'epochs': 19,
    'learning_rate': 0.001,
    'pretrained_tokenizer': 'C:\\Users\\shaun\\PycharmProjects\\ssoc-autocoder\\Models\\distilbert-tokenizer-pretrained-7epoch',
    'pretrained_model': 'C:\\Users\\shaun\\PycharmProjects\\ssoc-autocoder\\Models\\mcf-pretrained-7epoch', #'distilbert-base-uncased',
    'local_files_only': True,
    'num_workers': 4,
    'loss_weights': {
        'SSOC_1D': 20,
        'SSOC_2D': 5,
        'SSOC_3D': 3,
        'SSOC_4D': 2,
        'SSOC_5D': 1
    },
    'device': 'cuda'
}
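
The `loss_weights` entry controls how much each SSOC level contributes to the total loss. Below is a minimal sketch of that weighted sum, using made-up per-level loss values rather than real model output. The helper name `combine_level_losses` is hypothetical and is not part of `ssoc_autocoder`.

```python
# Hypothetical helper, for illustration only: combines per-level losses
# into a single weighted total, mirroring the role of 'loss_weights'.
def combine_level_losses(level_losses, loss_weights):
    total = 0.0
    for ssoc_level, level_loss in level_losses.items():
        total += level_loss * loss_weights[ssoc_level]
    return total

loss_weights = {'SSOC_1D': 20, 'SSOC_2D': 5, 'SSOC_3D': 3, 'SSOC_4D': 2, 'SSOC_5D': 1}

# Made-up cross-entropy values for one batch (not real results)
example_losses = {'SSOC_1D': 0.5, 'SSOC_2D': 1.0, 'SSOC_3D': 1.5, 'SSOC_4D': 2.0, 'SSOC_5D': 3.0}

total_loss = combine_level_losses(example_losses, loss_weights)
print(f'Weighted total loss: {total_loss:.2f}')
```

With these weights, an error at the 1D level costs twenty times as much as the same error at the 5D level, which should push the model to get the broad occupational group right first.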

In [5]:
train = pd.read_csv('Data/Train/Train.csv')
test = pd.read_csv('Data/Train/Test.csv')
SSOC_2020 = pd.read_csv('Data/Reference/SSOC_2020.csv')

In [16]:
test

| | MCF_Job_Ad_ID | Predicted_SSOC_2020 | title | description |
| --- | --- | --- | --- | --- |
| 0 | MCF-2021-0042824 | 42241 | Admin/Receptionist | Handling telephone calls and enquiries. Attend... |
| 1 | MCF-2021-0142643 | 51421 | Beautician Supervisor | Understand customer needs & skin condition, an... |
| 2 | MCF-2021-0182163 | 21494 | Senior / Quantity Surveyor (C&S/Tender/Project) | Responsible for payment/progress claims, varia... |
| 3 | MCF-2021-0090664 | 12133 | Compliance Manager [FinTech / Risk Management ... | Manage compliance risk strategies, policies an... |
| 4 | MCF-2021-0159738 | 21422 | Building and Construction Site Engineer | Supervise and coordinate the activities of sit... |
| ... | ... | ... | ... | ... |
| 2885 | MCF-2021-0103410 | 12222 | Communications and Marketing Manager | Work with the communications and marketing tea... |
| 2886 | MCF-2021-0175627 | 51422 | Manicurist/Beauty Therapist | Providing manicure and pedicure services as we... |
| 2887 | MCF-2021-0072528 | 71220 | Floor / Wall Tiler | Cut tiles and shape them properly to ensure th... |
| 2888 | MCF-2021-0064583 | 51312 | Service Crew | Job Descriptions. Serve customers with quality... |


#### B) Preparing the model and data for training

Encoding the SSOCs into indices for the model

In [6]:
encoding = model_training.generate_encoding(SSOC_2020)
encoded_train = model_training.encode_dataset(train, encoding, colnames)
encoded_test = model_training.encode_dataset(test, encoding, colnames)

In [7]:
encoded_train[encoded_train['SSOC'] == 94102]

| | Title | Text | SSOC | SSOC_1D | SSOC_2D | SSOC_3D | SSOC_4D | SSOC_5D |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 969 | Food/Drink stall assistant | Food/Drink stall assistant assists in serving ... | 94102 | 8 | 40 | 140 | 405 | 969 |
| 1131 | Kitchen Assistant (Coffee Shop) | Kitchen Assistant (Coffee Shop) Full Time / Pa... | 94102 | 8 | 40 | 140 | 405 | 969 |
| 4325 | Hawker Assistant | Jiak Song Mee Hoon Kway is looking for an hour... | 94102 | 8 | 40 | 140 | 405 | 969 |
| 11668 | Hawker Assistant | Assistant to head chef:Cutting of Vegetables.W... | 94102 | 8 | 40 | 140 | 405 | 969 |
| 12339 | NUS New Canteen Fruit Juice Stall Assistant | New canteen, spacious and friendly working env... | 94102 | 8 | 40 | 140 | 405 | 969 |


In [8]:
encoded_test[encoded_test['SSOC'] == 94102]

| | Title | Text | SSOC | SSOC_1D | SSOC_2D | SSOC_3D | SSOC_4D | SSOC_5D |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1691 | Coffee Shop Assistant | Table cleaning and clearing plates to dishwash... | 94102 | 8 | 40 | 140 | 405 | 969 |
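
To make the encoding concrete, here is a minimal sketch of prefix-based indexing on a toy list of codes. The function name `toy_encoding` and the short code list are illustrative assumptions. The project's actual logic lives in `model_training.generate_encoding`, and this sketch is only assumed to mirror it.

```python
# Illustrative sketch only: maps each SSOC prefix to an index per digit level.
def toy_encoding(ssoc_codes, max_level=5):
    encoding = {}
    for level in range(1, max_level + 1):
        # Unique prefixes of the required length, sorted for a stable order
        prefixes = sorted({code[:level] for code in ssoc_codes})
        ssoc_idx = {prefix: i for i, prefix in enumerate(prefixes)}
        encoding[f'SSOC_{level}D'] = {
            'ssoc_idx': ssoc_idx,
            'idx_ssoc': {i: prefix for prefix, i in ssoc_idx.items()}
        }
    return encoding

# Toy reference list (a tiny made-up subset, not the full SSOC 2020 listing)
codes = ['11110', '21494', '42241', '94101', '94102']
toy = toy_encoding(codes)

print(toy['SSOC_1D']['ssoc_idx'])
print(toy['SSOC_5D']['ssoc_idx']['94102'])
```

Because the indices depend on the sorted list of unique prefixes, the same reference listing must be used at training and prediction time, or the indices will no longer point to the same SSOCs.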


Loading the DistilBERT tokenizer

In [9]:
tokenizer = DistilBertTokenizer.from_pretrained(parameters['pretrained_tokenizer'])

Creating the `DataLoader` objects for the train and test sets, and initialising the model

In [10]:
train_loader, test_loader = model_training.prepare_data(encoded_train, encoded_test, tokenizer, colnames, parameters)
model, loss_function, optimizer = model_training.prepare_model(encoding, parameters)

Some weights of the model checkpoint at C:\Users\shaun\PycharmProjects\ssoc-autocoder\Models\mcf-pretrained-7epoch were not used when initializing DistilBertModel: ['vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [14]:
model_training.train_model(model, loss_function, optimizer, train_loader, test_loader, parameters)

Training started on: 09 Jan 2022 - 21:51:28
> Epoch 1 started on: 09 Jan 2022 - 21:51:28
--------------------------------------------------------------------
>> Training Loss per 50 steps: 32.8190 
>> Training Accuracy per 50 steps: 21.44%
>> Batch of 50 took 3.78 mins
>> Training Loss per 50 steps: 32.9418 
>> Training Accuracy per 50 steps: 21.03%
>> Batch of 50 took 3.70 mins
>> Training Loss per 50 steps: 32.3827 
>> Training Accuracy per 50 steps: 22.54%
>> Batch of 50 took 3.70 mins
>> Training Loss per 50 steps: 32.2306 
>> Training Accuracy per 50 steps: 22.86%
>> Batch of 50 took 3.70 mins
>> Training Loss per 50 steps: 31.7108 
>> Training Accuracy per 50 steps: 23.62%
>> Batch of 50 took 3.70 mins
>> Training Loss per 50 steps: 31.4859 
>> Training Accuracy per 50 steps: 24.00%
>> Batch of 50 took 3.70 mins
>> Training Loss per 50 steps: 31.2732 
>> Training Accuracy per 50 steps: 24.35%
>> Batch of 50 took 3.68 mins
----------------------------------------------------------

In [15]:
torch.save(model.state_dict(), 'Models/autocoder-v2pt2-9jan-pretrained7epoch-20epoch.pt')

To do:

* Report accuracy at each digit level (1D to 5D) for the evaluation set
* Begin training with only the 1D loss function
* Error analysis

In [18]:
model1 = DistilBertModel.from_pretrained(, local_files_only = True)

Some weights of the model checkpoint at C:\Users\shaun\PycharmProjects\ssoc-autocoder\Models\mcf-pretrained-1epoch were not used when initializing DistilBertModel: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [201]:
sourceFile = open('demo.txt', 'w')
print('Hello, Python!', file = sourceFile)
print('hello', file = sourceFile)
print('testing', file = sourceFile)
sourceFile.close()

hello
testing


In [12]:
model_test, loss_function, optimizer = model_training.prepare_model(encoding, parameters)

Some weights of the model checkpoint at C:\Users\shaun\PycharmProjects\ssoc-autocoder\Models\mcf-pretrained-3epoch were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [145]:
model_test = torch.load()

In [13]:
model_test.load_state_dict(torch.load('Models/autocoder-30dec-pretrained-60epoch.pt'))


<All keys matched successfully>

In [18]:

# benchmark: 5.88%, 2.59; 6.94%, 2.44

torch.save(model.state_dict(), 'Models/autocoder-30dec-pretrained-60epoch.pt')


In [None]:
stop here

In [19]:
# PATH = 'Data/Processed/Training/train-aws/sm-model-5D-20epoch.pth'
# model.load_state_dict(torch.load(PATH))

In [199]:
from transformers import PreTrainedTokenizer

tokenizer2 = DistilBertTokenizer.from_pretrained("distilbert-tokenizer")

In [166]:
tokenizer.save_pretrained("distilbert-tokenizer")

('distilbert-tokenizer\\tokenizer_config.json',
 'distilbert-tokenizer\\special_tokens_map.json',
 'distilbert-tokenizer\\vocab.txt',
 'distilbert-tokenizer\\added_tokens.json')

In [170]:
tokenizer.pad_token

'[PAD]'

In [172]:
idxx = 26
text = test['description'][idxx]
test_target = str(test['Predicted_SSOC_2020'][idxx])
print(f'SSOC 2020: {test_target}')
print(text)



available_ssoc_codes = SSOC_2020['SSOC 2020']
#pd.read_excel("Data/Processed/Training/train-aws/SSOC2020_Detailed_Definitions.xlsx", skiprows = 4)["SSOC 2020"]
print('-----------------------------------------------')
generate_single_prediction(model_test, tokenizer2, text, test_target, encoding, parameters, ssoc_prediction_parameters)

SSOC 2020: 53203
Manage the day to day work of dental clinic. Chair-side assistance of the dentist. Sterilisation of Equipment. Answering patient enquiries. Making appointments. Clinic management. Requirements: Singaporeans only. Minimum 2 years experience in Dental Clinic. Relevant experience preferred. Check out Abercare sg for more detail. Interested candidate may send resume to career@abercare sg or call +65 6721 9231. Abercare sg | EA License Number 18C9070 | Germaine Er Si Ying | Registration Number R1875721.
-----------------------------------------------


{'SSOC_1D': {'predicted_ssoc': ['5', '4'],
  'predicted_proba': [0.9915245, 0.005478936],
  'accurate_prediction': True},
 'SSOC_2D': {'predicted_ssoc': ['53', '42', '32', '96', '22'],
  'predicted_proba': [0.99386096,
   0.005356934,
   0.00033705527,
   0.00032240854,
   7.1901624e-05],
  'accurate_prediction': True},
 'SSOC_3D': {'predicted_ssoc': ['532', '422', '322', '222', '962'],
  'predicted_proba': [0.97498643,
   0.01901617,
   0.0037244232,
   0.0017405519,
   0.00020084006],
  'accurate_prediction': True},
 'SSOC_4D': {'predicted_ssoc': ['5320', '4224', '3220', '2220', '3240'],
  'predicted_proba': [0.9883781,
   0.008069869,
   0.0017323107,
   0.0016664745,
   7.554526e-05],
  'accurate_prediction': True},
 'SSOC_5D': {'predicted_ssoc': ['22200',
   '83229',
   '34341',
   '36100',
   '93334',
   '83321',
   '51312',
   '51201',
   '83329',
   '14121'],
  'predicted_proba': [0.071881905,
   0.06389979,
   0.049810156,
   0.037628226,
   0.036753647,
   0.035747174,
   0.0

In [None]:
other_data = data[10000:]
test_data = other_data[other_data['SSOC 2020'] == data[0:10000]['SSOC 2020'].sample().values[0]].sample()
test_target = test_data['SSOC 2020'].values[0]
text = test_data['Cleaned_Description'].values[0]
print(test_data['SSOC 2020'].values[0])
print(text)


In [31]:
def convert_others_SSOC(predicted_SSOC_with_proba, threshold, available_ssoc_codes):
    out_prediction = []
    out_probability = []
    
    for prediction, probability in predicted_SSOC_with_proba:
        if probability < threshold and prediction[-1] != '9':
            new_prediction = prediction[:-1] + '9'
            if new_prediction in available_ssoc_codes.values:
                print(f'Converting {prediction} to {new_prediction}')
                out_prediction.append(new_prediction)
            else:
                out_prediction.append(prediction)
        else:
            out_prediction.append(prediction)
        out_probability.append(probability)
            
    return zip(out_prediction, out_probability)
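
As a quick illustration of the failsafe above, the sketch below re-implements the same rule on plain Python data: a low-confidence prediction is swapped for the corresponding "others" code (ending in 9) when that code exists. The function name `to_others_code`, the codes, and the probabilities are made up for illustration.

```python
# Illustrative re-implementation using a plain set instead of a pandas Series
def to_others_code(predictions_with_proba, threshold, available_codes):
    converted = []
    for prediction, probability in predictions_with_proba:
        candidate = prediction[:-1] + '9'
        # Only convert low-confidence predictions whose 'others' code exists
        if probability < threshold and prediction[-1] != '9' and candidate in available_codes:
            converted.append((candidate, probability))
        else:
            converted.append((prediction, probability))
    return converted

available = {'531', '532', '539', '421'}  # hypothetical 3D codes
predictions = [('532', 0.90), ('531', 0.04), ('421', 0.02)]

result = to_others_code(predictions, threshold=0.05, available_codes=available)
print(result)
```

Here `'531'` falls below the threshold and becomes `'539'`, while `'421'` is left unchanged because `'429'` is not in the set of available codes.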

In [80]:
type(test['Predicted_SSOC_2020'].values[0]) == np.int64

True

In [20]:
import torch

ssoc_prediction_parameters = {
    'SSOC_1D': {'top_n': 2, 'min_prob': 0.5},
    'SSOC_2D': {'top_n': 5, 'min_prob': 0.4},
    'SSOC_3D': {'top_n': 5, 'min_prob': 0.3},
    'SSOC_4D': {'top_n': 5, 'min_prob': 0.05},
    'SSOC_5D': {'top_n': 10, 'min_prob': 0.05}
}

def generate_single_prediction(model, 
                               tokenizer, 
                               title,
                               text, 
                               target, 
                               encoding,
                               training_parameters,
                               ssoc_prediction_parameters, 
                               failsafe = True):
        
    """
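    Generates a single prediction from the trained neural network.

    Args:
        model: Trained hierarchical classifier
        tokenizer: DistilBERT tokenizer used during training
        title: Job title (string)
        text: Job description (string)
        target: Correct SSOC (string)
        encoding: Encoding for each SSOC level
        training_parameters: Dictionary of training parameters
        ssoc_prediction_parameters: Dictionary of 'top_n' and 'min_prob'
            for each SSOC level
        failsafe: Currently unused in this function

    Returns:
        Dictionary keyed by SSOC level, containing the top-n predicted
        SSOCs, their probabilities, and whether the target was among them.
    """

    # Check data type
    if type(title) != str:
        raise TypeError("Please enter a string for the 'title' argument.")
    if type(text) != str:
        raise TypeError("Please enter a string for the 'text' argument.")
    if type(target) != str:
        raise TypeError("Please enter a string for the 'target' argument.")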

    # Tokenize the text using the DistilBERT tokenizer
    tokenized_title = tokenizer(
        text = title,
        text_pair = None,
        add_special_tokens = True,
        max_length = training_parameters['sequence_max_length'],
        padding = 'max_length',
        return_token_type_ids = True,
        truncation = True
    )
    tokenized_text = tokenizer(
        text = text,
        text_pair = None,
        add_special_tokens = True,
        max_length = training_parameters['sequence_max_length'],
        padding = 'max_length',
        return_token_type_ids = True,
        truncation = True
    )
    
    # Extract the tensors from the tokenizer
    test_title_ids = torch.tensor([tokenized_title['input_ids']], dtype = torch.long)
    test_title_mask = torch.tensor([tokenized_title['attention_mask']], dtype = torch.long)
    test_text_ids = torch.tensor([tokenized_text['input_ids']], dtype = torch.long)
    test_text_mask = torch.tensor([tokenized_text['attention_mask']], dtype = torch.long)
    
    # Set the model to evaluation mode and generate the predictions
    model.eval()
    with torch.no_grad():
        preds = model(test_title_ids, test_title_mask, test_text_ids, test_text_mask)
        m = torch.nn.Softmax(dim=1)
    
    # Iteratively generate predictions for each SSOC level that is specified
    predictions_with_proba = {}
    for ssoc_level, ssoc_level_params in sorted(ssoc_prediction_parameters.items()):
        
        # Extract the indices of the top n predicted SSOCs for the given SSOC level
        predicted_idx = preds[ssoc_level].detach().numpy().argsort()[0][::-1][:ssoc_level_params["top_n"]]
        
        # Extract the actual predicted probabilities from the softmax layer using the indices
        predicted_proba_all = m(preds[ssoc_level]).detach().numpy()[0]
        predicted_proba = [predicted_proba_all[idx] for idx in predicted_idx]
        
        # Convert the indices to the actual SSOC using the encoding dictionary
        predicted_ssoc = [encoding[ssoc_level]['idx_ssoc'][idx] for idx in predicted_idx]
        
        # Check if the model made an accurate prediction
        # Meaning whether the correct SSOC appeared in the list of predictions
        accurate_prediction = False
        for ssoc in predicted_ssoc:
            if ssoc == target[0:len(ssoc)]:
                accurate_prediction = True
        
        # Append predictions with the predicted probability to the output
        predictions_with_proba[ssoc_level] = {
            'predicted_ssoc': predicted_ssoc,
            'predicted_proba': predicted_proba,
            'accurate_prediction': accurate_prediction
        }
        
    return predictions_with_proba

def generate_predictions(model, 
                         tokenizer, 
                         test_set,
                         encoding,
                         training_parameters,
                         ssoc_prediction_parameters,
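                         ssoc_level = 'SSOC_4D'):
    
    """
    Generates predictions for every row of the test set and prints the
    share of rows where the correct SSOC appears among the top-n
    predictions at the chosen SSOC level.

    Args:
        test_set: Pandas dataframe with 'title', 'description' and
            'Predicted_SSOC_2020' columns
        ssoc_level: SSOC level used to compute the reported accuracy

    Returns:
        List of prediction dictionaries, one per row of the test set
    """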
        
    output = []
    accurate_predictions = []
    for i, row in test_set.iterrows():
        print(f'Generating prediction for {i+1}/{len(test_set)}...', end = '\r')
        predictions_with_proba = generate_single_prediction(model, 
                                                            tokenizer, 
                                                            row['title'],
                                                            row['description'], 
                                                            str(row['Predicted_SSOC_2020']),
                                                            encoding,
                                                            training_parameters,
                                                            ssoc_prediction_parameters)
        output.append(predictions_with_proba)
        accurate_predictions.append(predictions_with_proba[ssoc_level]['accurate_prediction'])
    
    print('')
    accuracy = sum(accurate_predictions)/len(accurate_predictions)
    print(f'Overall {ssoc_level} accuracy: {accuracy:.2%}')
    
    return output
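
The top-n selection and prefix check used in `generate_single_prediction` can be sketched without a model. The logits and the index-to-SSOC mapping below are invented for illustration, and NumPy stands in for the PyTorch softmax layer. The helper names `top_n_with_proba` and `is_accurate` are hypothetical.

```python
import numpy as np

def top_n_with_proba(logits, idx_ssoc, top_n):
    # Softmax over the logits (shifted by the max for numerical stability)
    exp = np.exp(logits - np.max(logits))
    proba = exp / exp.sum()
    # Indices of the top_n largest logits, highest first
    top_idx = np.argsort(logits)[::-1][:top_n]
    return [idx_ssoc[int(i)] for i in top_idx], [float(proba[i]) for i in top_idx]

def is_accurate(predicted_ssoc, target):
    # A prediction counts as correct if it matches the leading digits of the target
    return any(ssoc == target[:len(ssoc)] for ssoc in predicted_ssoc)

# Invented example: four 2D classes and their raw scores
idx_ssoc = {0: '42', 1: '51', 2: '53', 3: '96'}
logits = np.array([1.0, 0.5, 3.0, -1.0])

predicted, proba = top_n_with_proba(logits, idx_ssoc, top_n=2)
print(predicted, is_accurate(predicted, '53203'))
```

Note that the accuracy reported this way is a top-n accuracy: a row counts as correct if the target appears anywhere in the top `top_n` predictions, not only in first place.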


In [21]:
mrsd_val = pd.read_csv('Data/Train/MRSD_Validation.csv')

In [22]:
model.to('cpu')

HierarchicalSSOCClassifier_V2pt2(
  (l1): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1)

In [None]:
all_predictions = generate_predictions(model, tokenizer, mrsd_val, encoding, parameters, ssoc_prediction_parameters)

Generating prediction for 447/492...

In [31]:
import pickle

with open('all_predictions_v2pt2_34ep.pickle', 'wb') as handle:
    pickle.dump(all_predictions, handle, protocol=pickle.HIGHEST_PROTOCOL)


In [27]:
with open('all_predictions.pickle', 'rb') as handle:
    all_ = pickle.load(handle)

In [30]:
selected = []
for prediction in all_predictions:
    selected.append(prediction['SSOC_5D']['accurate_prediction'])
accuracy = sum(selected)/len(selected)
print(f'Overall accuracy: {accuracy:.2%}')   

Overall accuracy: 94.22%
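
The same calculation can be repeated for every SSOC level. Below is a minimal sketch, assuming `all_predictions` has the structure returned by `generate_predictions` (a list of dictionaries keyed by SSOC level). The helper name `accuracy_by_level` and the two-item sample list are hypothetical.

```python
# Hypothetical helper: share of predictions flagged accurate at each level
def accuracy_by_level(predictions):
    levels = predictions[0].keys()
    return {
        level: sum(p[level]['accurate_prediction'] for p in predictions) / len(predictions)
        for level in levels
    }

# Made-up sample with the same shape as the notebook's output
sample_predictions = [
    {'SSOC_1D': {'accurate_prediction': True}, 'SSOC_5D': {'accurate_prediction': True}},
    {'SSOC_1D': {'accurate_prediction': True}, 'SSOC_5D': {'accurate_prediction': False}},
]

level_accuracy = accuracy_by_level(sample_predictions)
for level, accuracy in level_accuracy.items():
    print(f'{level} accuracy: {accuracy:.2%}')
```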


In [29]:
all_predictions == all_

True

In [None]:
    
    # Check validity of inputs
    if (type(texts) != list) or any([type(item) != str for item in texts]):
        raise AssertionError("Please pass in a list of strings into the 'texts' argument.")
    if (type(targets) != list) or any([type(item) != np.int64 for item in targets]):
        raise AssertionError("Please pass in a list of integers (np.int64) into the 'targets' argument.")    
    if len(texts) != len(targets):
        raise AssertionError("Length of text descriptions does not match length of targets.")
        
        
        
        
    predicted_1D_idx = preds["SSOC_1D"].detach().numpy().argsort()[0][::-1][:top_n_threshold["SSOC_1D"]["n_digit"]]
    predicted_1D = [encoding['SSOC_1D']['idx_ssoc'][idx] for idx in predicted_1D_idx]
    predicted_1D_proba_all = m(preds['SSOC_1D']).detach().numpy()[0]
    predicted_1D_proba = [predicted_1D_proba_all[idx] for idx in predicted_1D_idx]
    predicted_1D_with_proba = zip(predicted_1D, predicted_1D_proba)

    predicted_2D_idx = preds["SSOC_2D"].detach().numpy().argsort()[0][::-1][:top_n_threshold["SSOC_2D"]["n_digit"]]
    predicted_2D = [encoding['SSOC_2D']['idx_ssoc'][idx] for idx in predicted_2D_idx]
    predicted_2D_proba_all = m(preds['SSOC_2D']).detach().numpy()[0]
    predicted_2D_proba = [predicted_2D_proba_all[idx] for idx in predicted_2D_idx]
    predicted_2D_with_proba = zip(predicted_2D, predicted_2D_proba)
    
    predicted_3D_idx = preds["SSOC_3D"].detach().numpy().argsort()[0][::-1][:top_n_threshold["SSOC_3D"]["n_digit"]]
    predicted_3D = [encoding['SSOC_3D']['idx_ssoc'][idx] for idx in predicted_3D_idx]
    predicted_3D_proba_all = m(preds['SSOC_3D']).detach().numpy()[0]
    predicted_3D_proba = [predicted_3D_proba_all[idx] for idx in predicted_3D_idx]
    predicted_3D_with_proba = zip(predicted_3D, predicted_3D_proba)
    
    predicted_4D_idx = preds["SSOC_4D"].detach().numpy().argsort()[0][::-1][:top_n_threshold["SSOC_4D"]["n_digit"]]
    predicted_4D = [encoding['SSOC_4D']['idx_ssoc'][idx] for idx in predicted_4D_idx]
    predicted_4D_proba_all = m(preds['SSOC_4D']).detach().numpy()[0]
    predicted_4D_proba = [predicted_4D_proba_all[idx] for idx in predicted_4D_idx]
    predicted_4D_with_proba = zip(predicted_4D, predicted_4D_proba)
    
    predicted_5D_idx = preds["SSOC_5D"].detach().numpy().argsort()[0][::-1][:top_n_threshold["SSOC_5D"]["n_digit"]]
    predicted_5D = [encoding['SSOC_5D']['idx_ssoc'][idx] for idx in predicted_5D_idx]
    predicted_5D_proba_all = m(preds['SSOC_5D']).detach().numpy()[0]
    predicted_5D_proba = [predicted_5D_proba_all[idx] for idx in predicted_5D_idx]
    predicted_5D_with_proba = zip(predicted_5D, predicted_5D_proba)
    
    print(f"Target: {target}")
    print(f'Model top {top_n_threshold["SSOC_1D"]["n_digit"]} predicted 1D:')
    for predicted, prob in predicted_1D_with_proba:
        print(f'{predicted}: {prob*100:.2f}%')
        
    if failsafe:
        predicted_2D_with_proba = convert_others_SSOC(predicted_2D_with_proba, top_n_threshold["SSOC_2D"]["threshold"], available_ssoc_codes)
    
    print(f'Model top {top_n_threshold["SSOC_2D"]["n_digit"]} predicted 2D:')
    for predicted, prob in predicted_2D_with_proba:
        print(f'{predicted}: {prob*100:.2f}%')
        
    if failsafe:
        predicted_3D_with_proba = convert_others_SSOC(predicted_3D_with_proba, top_n_threshold["SSOC_3D"]["threshold"], available_ssoc_codes)
        
    print(f'Model top {top_n_threshold["SSOC_3D"]["n_digit"]} predicted 3D:')
    for predicted, prob in predicted_3D_with_proba:
        print(f'{predicted}: {prob*100:.2f}%')
    
    if failsafe:
        predicted_4D_with_proba = convert_others_SSOC(predicted_4D_with_proba, top_n_threshold["SSOC_4D"]["threshold"], available_ssoc_codes)
       
    print(f'Model top {top_n_threshold["SSOC_4D"]["n_digit"]} predicted 4D:')
    for predicted, prob in predicted_4D_with_proba:
        print(f'{predicted}: {prob*100:.2f}%')
    
    if failsafe:    
        predicted_5D_with_proba = convert_others_SSOC(predicted_5D_with_proba, top_n_threshold["SSOC_5D"]["threshold"], available_ssoc_codes)
        
    print(f'Model top {top_n_threshold["SSOC_5D"]["n_digit"]} predicted 5D:')
    for predicted, prob in predicted_5D_with_proba:
        print(f'{predicted}: {prob*100:.2f}%')

In [None]:
import json
with open("encoding.json", 'w') as outfile:
    json.dump(encoding, outfile)

In [69]:
'111' in available_ssoc_codes.values

True

In [44]:
SSOC_listing = pd.read_excel("Data/Processed/Training/train-aws/SSOC2020_Detailed_Definitions.xlsx", skiprows = 4)
available_ssoc_codes = SSOC_listing["SSOC 2020"]

In [None]:
model.load_state_dict(torch.load('Models/sm-model-10epoch.pth'))

In [None]:
model.eval()

va_n_correct = 0
va_loss = 0
nb_va_steps = 0
nb_va_examples = 0
def calculate_accuracy(big_idx, targets):
    n_correct = (big_idx == targets).sum().item()
    return n_correct
# Disable the calculation of gradients
with torch.no_grad():

    # Iterate over each batch
    for batch, data in enumerate(validation_loader):

        # Extract the data
        ids = data['ids'].to(parameters['device'], dtype = torch.long)
        mask = data['mask'].to(parameters['device'], dtype = torch.long)

        # Run the forward prop
        predictions = model(ids, mask)

        # Iterate through each SSOC level
        for ssoc_level, preds in predictions.items():

            # Extract the correct target for the SSOC level
            targets = data[ssoc_level].to(parameters['device'], dtype = torch.long)

            # Compute the loss function using the predictions and the targets
            level_loss = loss_function(preds, targets)

            # Initialise the loss variable if this is the 1D level
            # Else add to the loss variable
            # Note the weights on each level
            if ssoc_level == 'SSOC_1D':
                loss = level_loss * parameters['loss_weights'][ssoc_level]
            else:
                loss += level_loss * parameters['loss_weights'][ssoc_level]

        # Use the deepest level predictions to calculate accuracy
        # Exploit the fact that the last preds object is the deepest level one
        top_probs, top_probs_idx = torch.max(preds.data, dim = 1)
        va_n_correct += calculate_accuracy(top_probs_idx, targets)

        # Add this batch's loss to the overall validation loss
        va_loss += loss.item()

        # Keep count for the batch steps and number of examples
        nb_va_steps += 1
        nb_va_examples += targets.size(0)


In [None]:
epoch_va_loss = va_loss / nb_va_steps
epoch_va_accu = (va_n_correct * 100) / nb_va_examples
print(f"Validation loss: {epoch_va_loss:.3f}")
print(f"Validation accuracy: {epoch_va_accu:.2f}%")

## Analysis of the underlying data

In [None]:
data = data[data[colnames['SSOC']].notnull()]

In [None]:
encoding = train.generate_encoding(SSOC_2020)
encoded_data = train.encode_dataset(data, encoding, colnames)

In [None]:
encoded_data['SSOC_1D'].value_counts()

In [None]:
encoded_data['SSOC_2D'].value_counts()

Importing our datasets

Use custom functions to encode each SSOC as an integer index, as PyTorch requires for multi-class classification. The mappings for each level are stored in a dictionary.

In [None]:

def generate_encoding(reference_data, ssoc_colname = 'SSOC 2020'):

    '''
    Generates encoding for SSOC to indices, as required by PyTorch
    for multi-class classification, for the training data

    Args:
        reference_data: Pandas dataframe containing all SSOCs
        ssoc_colname: Name of the SSOC column

    Returns:
        Dictionary containing the SSOC to index mapping (for preparing the
        dataset) and index to SSOC mapping (for interpreting the predictions),
        for each SSOC level from 1D to 5D.
    '''

    # Initialise the dictionary object to store the encodings for each level
    encoding = {}

    # Iterate through each level from 1 to 5
    for level in range(1, 6):

        # Initialise a dictionary object to store the SSOC-to-index mapping
        ssoc_idx_mapping = {}

        # Slice the SSOC column by the level required, drop duplicates, and sort
        ssocs = list(np.sort(reference_data[ssoc_colname].astype('str').str.slice(0, level).unique()))

        # Iterate through each unique SSOC (at i-digit level) and add to dict
        for i, ssoc in enumerate(ssocs):
            ssoc_idx_mapping[ssoc] = i

        # Add each level's encodings to the output dictionary
        encoding[f'SSOC_{level}D'] = {

            # Store the SSOC to index encoding
            'ssoc_idx': ssoc_idx_mapping,
            # Store the index to SSOC encoding
            'idx_ssoc': {v: k for k, v in ssoc_idx_mapping.items()}
        }

    return encoding

def encode_dataset(data,
                   encoding,
                   ssoc_colname = 'SSOC 2020'):

    '''
    Uses the generated encoding to encode the SSOCs at each
    digit level.

    Args:
        data: Pandas dataframe of the training data with the correct SSOC
        encoding: Encoding for each SSOC level
        ssoc_colname: Name of the SSOC column

    Returns:
        Pandas dataframe with each digit SSOC encoded correctly
    '''

    # Create a copy of the dataframe
    encoded_data = copy.deepcopy(data)[~data[ssoc_colname].str.contains('X')]

    # For each digit, encode the SSOC correctly
    for ssoc_level, encodings in encoding.items():
        encoded_data[ssoc_level] = encoded_data[ssoc_colname].astype('str').str.slice(0, int(ssoc_level[5])).replace(encodings['ssoc_idx'])

    return encoded_data

# Create a new Python class to handle the additional complexity
class SSOC_Dataset(Dataset):

    # Define the class attributes
    def __init__(self, dataframe, tokenizer, max_len):
        self.len = len(dataframe)
        self.data = dataframe
        self.tokenizer = tokenizer
        self.max_len = max_len

    # Define the iterable over the Dataset object 
    def __getitem__(self, index):

        # Extract the text
        text = self.data[colnames['job_description']][index]

        # Pass in the data into the tokenizer
        inputs = self.tokenizer(
            text = text,
            text_pair = None,
            add_special_tokens = True,
            max_length = self.max_len,
            padding = 'max_length',
            return_token_type_ids = True,
            truncation = True
        )

        # Extract the IDs and attention mask
        ids = inputs['input_ids']
        mask = inputs['attention_mask']

        # Return all the outputs needed for training and evaluation
        return {
            'ids': torch.tensor(ids, dtype = torch.long),
            'mask': torch.tensor(mask, dtype = torch.long),
            'SSOC_1D': torch.tensor(self.data.SSOC_1D[index], dtype = torch.long),
            'SSOC_2D': torch.tensor(self.data.SSOC_2D[index], dtype = torch.long),
            'SSOC_3D': torch.tensor(self.data.SSOC_3D[index], dtype = torch.long),
            'SSOC_4D': torch.tensor(self.data.SSOC_4D[index], dtype = torch.long),
            'SSOC_5D': torch.tensor(self.data.SSOC_5D[index], dtype = torch.long),
        } 

    # Define the length attribute
    def __len__(self):
        return self.len

In [None]:
from sklearn.model_selection import train_test_split

def prepare_data(encoded_data,
                 colnames,
                 parameters):
    
    # Split the dataset into training and validation
    training_data, validation_data = train_test_split(encoded_data,
                                                   test_size = 0.2,
                                                   random_state = 2021)
    training_data.reset_index(drop = True, inplace = True)
    validation_data.reset_index(drop = True, inplace = True)
    
    tokenizer = DistilBertTokenizer.from_pretrained(parameters['pretrained_model'])
    
    # Creating the dataset and dataloader for the neural network
    # persistent_workers requires num_workers > 0, so only enable it in that case
    use_persistent_workers = parameters['num_workers'] > 0
    training_loader = DataLoader(SSOC_Dataset(training_data, tokenizer, parameters['sequence_max_length']),
                                 batch_size = parameters['training_batch_size'],
                                 num_workers = parameters['num_workers'],
                                 shuffle = True,
                                 persistent_workers = use_persistent_workers)
    validation_loader = DataLoader(SSOC_Dataset(validation_data, tokenizer, parameters['sequence_max_length']),
                                   batch_size = parameters['validation_batch_size'],
                                   num_workers = parameters['num_workers'],
                                   shuffle = True,
                                   persistent_workers = use_persistent_workers)
    
    return training_loader, validation_loader, tokenizer

class HierarchicalSSOCClassifier(torch.nn.Module):

    def __init__(self):

        super(HierarchicalSSOCClassifier, self).__init__()

        self.l1 = DistilBertModel.from_pretrained(parameters['pretrained_model'])

        # Generating dimensions
        SSOC_1D_count = len(encoding['SSOC_1D']['ssoc_idx'].keys())
        SSOC_2D_count = len(encoding['SSOC_2D']['ssoc_idx'].keys())
        SSOC_3D_count = len(encoding['SSOC_3D']['ssoc_idx'].keys())
        SSOC_4D_count = len(encoding['SSOC_4D']['ssoc_idx'].keys())
        SSOC_5D_count = len(encoding['SSOC_5D']['ssoc_idx'].keys())

        # Stack 1: Predicting 1D SSOC (9)
        if parameters['max_level'] >= 1:
            self.ssoc_1d_stack = torch.nn.Sequential(
                torch.nn.Linear(768, 768),
                torch.nn.ReLU(),
                torch.nn.Dropout(0.3),
                torch.nn.Linear(768, 128),
                torch.nn.ReLU(),
                torch.nn.Dropout(0.3),
                torch.nn.Linear(128, SSOC_1D_count)
            )

        # Stack 2: Predicting 2D SSOC (42)
        if parameters['max_level'] >= 2:
            n_dims_2d = 768 + SSOC_1D_count
            self.ssoc_2d_stack = torch.nn.Sequential(
                torch.nn.Linear(n_dims_2d, n_dims_2d),
                torch.nn.ReLU(),
                torch.nn.Dropout(0.3),
                torch.nn.Linear(n_dims_2d, 128),
                torch.nn.ReLU(),
                torch.nn.Dropout(0.3),
                torch.nn.Linear(128, SSOC_2D_count)
            )

    def forward(self, input_ids, attention_mask):

        # Obtain the sentence embeddings from the DistilBERT model
        embeddings = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        hidden_state = embeddings[0]
        X = hidden_state[:, 0]

        predictions = {}

        # 1D Prediction
        if parameters['max_level'] >= 1:
            predictions['SSOC_1D'] = self.ssoc_1d_stack(X)

        # 2D Prediction
        if parameters['max_level'] >= 2:
            X = torch.cat((X, predictions['SSOC_1D']), dim = 1)
            predictions['SSOC_2D'] = self.ssoc_2d_stack(X)

        return {f'SSOC_{i}D': predictions[f'SSOC_{i}D'] for i in range(1, parameters['max_level'] + 1)}

def prepare_model(encoding, parameters):
        
    model = HierarchicalSSOCClassifier()
    model.to(parameters['device'])
    loss_function = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(params =  model.parameters(), lr = parameters['learning_rate'])
    
    return model, loss_function, optimizer
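
The key step in the classifier above is that the 2D stack receives the sentence embedding concatenated with the 1D logits, which is why its input width is 768 plus the number of 1D classes. The sketch below illustrates only that shape arithmetic. It is a simplification under stated assumptions: a random tensor stands in for the DistilBERT `[CLS]` embedding, single `Linear` layers stand in for the full stacks, and the class counts (9 and 42) are taken from the comments in the class.

```python
import torch

torch.manual_seed(0)

batch_size, hidden_dim = 4, 768
n_1d, n_2d = 9, 42  # class counts assumed from the comments in the class above

# Stand-in for the [CLS] embedding that DistilBERT would produce
X = torch.randn(batch_size, hidden_dim)

# Simplified heads: one Linear layer each instead of the full stacks
head_1d = torch.nn.Linear(hidden_dim, n_1d)
head_2d = torch.nn.Linear(hidden_dim + n_1d, n_2d)

logits_1d = head_1d(X)
# The 2D head sees the embedding plus the 1D logits
X_2d = torch.cat((X, logits_1d), dim=1)
logits_2d = head_2d(X_2d)

print(X_2d.shape, logits_2d.shape)
```

Because the 1D logits feed into the 2D head, gradients from the 2D loss also flow back through the 1D stack during training.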
    

In [None]:
import time
from datetime import datetime

def calculate_accu(big_idx, targets):
    n_correct = (big_idx==targets).sum().item()
    return n_correct

def train_model(model, loss_function, optimizer, epochs):

    start_time = time.time()
    now = datetime.now()
    current_time = now.strftime("%d %b %Y - %H:%M:%S")
    print("Training started on:", current_time)
    
    for epoch in range(epochs):
        tr_loss = 0
        n_correct = 0
        nb_tr_steps = 0
        nb_tr_examples = 0
        
        epoch_start_time = time.time()
        batch_start_time = time.time()

        # Set the NN to train mode
        model.train()

        # Iterate over each batch
        for batch, data in enumerate(training_loader):

            # Extract the data
            ids = data['ids'].to(parameters['device'], dtype = torch.long)
            mask = data['mask'].to(parameters['device'], dtype = torch.long)

            # Run the forward prop
            predictions = model(ids, mask)

            # Iterate through each SSOC level
            for ssoc_level, preds in predictions.items():

                # Extract the correct target for the SSOC level
                targets = data[ssoc_level].to(parameters['device'], dtype = torch.long)

                # Compute the loss function using the predictions and the targets
                level_loss = loss_function(preds, targets)

                # Initialise the loss variable if this is the 1D level
                # Else add to the loss variable
                # Note the weights on each level
                if ssoc_level == 'SSOC_1D':
                    loss = level_loss * parameters['loss_weights'][ssoc_level]
                else:
                    loss += level_loss * parameters['loss_weights'][ssoc_level]

            # Use the deepest level predictions to calculate accuracy
            top_probs, top_probs_idx = torch.max(preds.data, dim = 1)
            n_correct += calculate_accu(top_probs_idx, targets)

            # Earlier two-level loss calculation, superseded by the loop above
            # targets_1d = data['targets_1d'].to(device, dtype = torch.long)
            # targets_2d = data['targets_2d'].to(device, dtype = torch.long)
            # loss1 = loss_function(preds_1d, targets_1d)
            # loss2 = loss_function(preds_2d, targets_2d)
            # loss = loss1*5 + loss2

            # Add this batch's loss to the overall training loss
            tr_loss += loss.item()

            nb_tr_steps += 1
            nb_tr_examples += targets.size(0)

            # Reset the gradients, backpropagate the loss and update the weights
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            if (batch+1) % 500 == 0:
                loss_step = tr_loss/nb_tr_steps
                accu_step = (n_correct*100)/nb_tr_examples 
                print(f"Training Loss per 500 steps: {loss_step}")
                print(f"Training Accuracy per 500 steps: {accu_step}")
                print(f"Batch of 500 took {(time.time() - batch_start_time)/60:.2f} mins")
                batch_start_time = time.time()

        print(f'The Total Accuracy for Epoch {epoch}: {(n_correct*100)/nb_tr_examples}')
        epoch_loss = tr_loss/nb_tr_steps
        epoch_accu = (n_correct*100)/nb_tr_examples
        print(f"Training Loss Epoch: {epoch_loss}")
        print(f"Training Accuracy Epoch: {epoch_accu}")
        print(f"Epoch training time: {(time.time() - epoch_start_time)/60:.2f} mins")

    print(f"Total training time: {(time.time() - start_time)/60:.2f} mins")
    now = datetime.now()
    current_time = now.strftime("%d %b %Y - %H:%M:%S")
    print("Training ended on:", current_time)
        
    return
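
The training loop combines the per-level losses into one weighted sum, so that mistakes at the coarser SSOC levels cost more. The sketch below works through that arithmetic for a single example in plain Python. The logits and targets are made-up values, and the `cross_entropy` helper is a hand-written stand-in for `torch.nn.CrossEntropyLoss` (which additionally averages over the batch); the weights are the `loss_weights` for the first two levels.

```python
import math

def cross_entropy(logits, target_idx):
    """Cross-entropy for one example: -log(softmax(logits)[target_idx])."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_idx]

# Toy logits and targets for a single example (made-up values)
predictions = {'SSOC_1D': [2.0, 0.5, -1.0], 'SSOC_2D': [0.1, 0.2, 3.0, -0.5]}
targets = {'SSOC_1D': 0, 'SSOC_2D': 2}
loss_weights = {'SSOC_1D': 20, 'SSOC_2D': 5}

# One loss per level, then the weighted sum that is backpropagated
level_losses = {level: cross_entropy(predictions[level], targets[level])
                for level in predictions}
total_loss = sum(loss_weights[level] * level_losses[level] for level in predictions)
print(level_losses, total_loss)
```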

In [None]:
colnames = {
    'SSOC': 'SSOC 2020',
    'job_description': 'Cleaned_Description'
}

parameters = {
    'sequence_max_length': 512,
    'max_level': 2,
    'training_batch_size': 4,
    'validation_batch_size': 2,
    'epochs': 1,
    'learning_rate': 1e-05,
    'pretrained_model': 'distilbert-base-uncased',
    'num_workers': 0,
    'loss_weights': {
        'SSOC_1D': 20,
        'SSOC_2D': 5,
        'SSOC_3D': 3,
        'SSOC_4D': 2,
        'SSOC_5D': 1
    },
    'device': 'cuda'
}

In [None]:
import pandas as pd
data = pd.read_csv('Data/Processed/Training/train_full.csv')
SSOC_2020 = pd.read_csv('Data/Processed/Training/train.csv')
encoding = generate_encoding(SSOC_2020)
encoded_data = encode_dataset(data[0:10000], encoding)
training_loader, validation_loader, tokenizer = prepare_data(encoded_data, colnames, parameters)
model, loss_function, optimizer = prepare_model(encoding, parameters)

In [None]:
data

In [None]:
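# Rough estimate of training time in hours, presumably:
# rows * train share / batch size / 50 batches * 45 secs per 50 batches / 3600 * epochs
10000*.8/4/50*45/3600*1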
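
The one-line calculation above can be unpacked into a small helper. This is a sketch under an assumed reading of the factors (10,000 rows, an 80% training split, a batch size of 4, roughly 45 seconds per 50 batches, and 1 epoch); the function and argument names are hypothetical.

```python
def estimate_training_hours(n_rows, train_fraction, batch_size,
                            secs_per_chunk, batches_per_chunk, epochs):
    """Estimate total training time in hours from a timed chunk of batches."""
    batches_per_epoch = n_rows * train_fraction / batch_size
    seconds = batches_per_epoch / batches_per_chunk * secs_per_chunk * epochs
    return seconds / 3600

hours = estimate_training_hours(n_rows=10000, train_fraction=0.8, batch_size=4,
                                secs_per_chunk=45, batches_per_chunk=50, epochs=1)
print(f"Estimated training time: {hours:.2f} hours")
```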

In [None]:
train_model(model, loss_function, optimizer, parameters['epochs'])

In [None]:
raise RuntimeError('stop here')

In [None]:
model.eval()

In [None]:
other_data[other_data['Cleaned_Description'] == 'Duties and Responsibilities: Implementation of Sage 300 ERP (Financials, Distribution, Project) Providing Pre & Post-Sales Consulting. Perform Business requirement analysis and provide professional advises. Install, Implement, Train and Support users on Sage 300 ERP Software. On-site and Back-end support on ERP Software. Diploma or Degree in Accountancy/Business Admin/Computer Science, Information Systems. Good Knowledge in MSSQL Server, MS Excel, Crystal Report and Visual Basic. Good analytical and problem-solving skills are essential. Good interpersonal and communication skills. Must have Sage 300 ERP Software. At least 4 - 5 years of working experience in relevant field. Must be able to work in Singapore and travel to other country.']

In [None]:
tokenizer = DistilBertTokenizer.from_pretrained(parameters['pretrained_model'])   

In [None]:
#torch.save(model.state_dict(), 'Models/autocoder-v1.pt')

In [None]:
model1 = HierarchicalSSOCClassifier()
model1.load_state_dict(torch.load('Models/autocoder-v1.pt'))
model1.eval()

In [None]:
tokenized = tokenizer(
    text = text,
    text_pair = None,
    add_special_tokens = True,
    max_length = parameters['sequence_max_length'],
    padding = 'max_length',
    return_token_type_ids = True,
    truncation = True
)

In [None]:
test_ids = torch.tensor([tokenized['input_ids']], dtype = torch.long)
test_mask = torch.tensor([tokenized['attention_mask']], dtype = torch.long)

In [None]:
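# Run the inputs on whichever device the model currently lives on
device = next(model.parameters()).device
preds = model(test_ids.to(device), test_mask.to(device))
targets = torch.tensor([encoding['SSOC_1D']['ssoc_idx']['2']], dtype = torch.long, device = device)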

In [None]:
loss_function(preds["SSOC_1D"], targets)

In [None]:
encoding['SSOC_1D']['idx_ssoc'][np.argmax(preds["SSOC_1D"].detach().cpu().numpy())]

In [None]:
encoding['SSOC_2D']['idx_ssoc'][np.argmax(preds["SSOC_2D"].detach().cpu().numpy())]

In [None]:
m = torch.nn.Softmax(dim=1)
m(preds['SSOC_2D'])
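
The softmax turns the raw 2D logits into probabilities, which can then be ranked and mapped back to SSOC codes. A minimal plain-Python sketch of that decoding step follows. The `idx_ssoc` mapping and the logits are hypothetical stand-ins for `encoding['SSOC_2D']['idx_ssoc']` and one row of `preds['SSOC_2D']`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical index-to-code mapping and logits for one job description
idx_ssoc = {0: '21', 1: '24', 2: '25', 3: '35'}
logits = [0.3, 1.2, 2.5, -0.7]

probs = softmax(logits)
# Rank the class indices from most to least probable and keep the top two
ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
top_2 = [(idx_ssoc[i], round(probs[i], 3)) for i in ranked[:2]]
print(top_2)
```

Looking at the top few codes rather than only the argmax shows how confident the model is and which neighbouring codes it confuses.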

In [None]:
'Assist with installation, configuration and set-up of new IT accounts & IT equipment for new users. Liaising with vendors for procurement, logistic and maintenance of IT equipment. Managing & troubleshooting of office IT equipment & systems. Analyze, monitor and resolve application and system failures and provide operational support. Perform, review and enhance business and IT systems & processes for enhanced improvement for the company.'

In [None]:
# Defining some key variables that will be used later on in the training
MAX_LEN = 512
TRAIN_BATCH_SIZE = 2
VALID_BATCH_SIZE = 2
EPOCHS = 1
LEARNING_RATE = 1e-05
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

In [None]:
class Triage(Dataset):
    def __init__(self, dataframe, tokenizer, max_len):
        self.len = len(dataframe)
        self.data = dataframe
        self.tokenizer = tokenizer
        self.max_len = max_len
        
    def __getitem__(self, index):
        
        text = self.data.Description[index]
        inputs = self.tokenizer.encode_plus(
            text,
            None,
            add_special_tokens = True,
            max_length = self.max_len,
            padding = 'max_length',
            return_token_type_ids = True,
            truncation = True
        )
        
        ids = inputs['input_ids']
        mask = inputs['attention_mask']

        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'targets_1d': torch.tensor(self.data.SSOC_1D[index], dtype=torch.long),
            'targets_2d': torch.tensor(self.data.SSOC_2D[index], dtype=torch.long),
        } 
    
    def __len__(self):
        return self.len

In [None]:
# Creating the dataset and dataloader for the neural network
training_set = Triage(train, tokenizer, MAX_LEN)
testing_set = Triage(test, tokenizer, MAX_LEN)

In [None]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)

In [None]:
# Creating the customized model: two stacks of dense layers with dropout on top of DistilBERT, one per SSOC level

class DistillBERTClass(torch.nn.Module):
    def __init__(self):
        super(DistillBERTClass, self).__init__()
        self.l1 = DistilBertModel.from_pretrained("distilbert-base-uncased")
        
        # Stack 1: Predicting 1D SSOC (9)
        self.ssoc_1d_stack = torch.nn.Sequential(
            torch.nn.Linear(768, 768), 
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(768, 128),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(128, 9)
        )
        
        # Stack 2: Predicting 2D SSOC (40 + 2 nec)
        self.ssoc_2d_stack = torch.nn.Sequential(
            torch.nn.Linear(777, 777), 
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(777, 128),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(128, 42)
        )        

    def forward(self, input_ids, attention_mask):
        
        # Obtain the sentence embeddings from the DistilBERT model
        embeddings = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        hidden_state = embeddings[0]
        X = hidden_state[:, 0]
        
        # 1D Prediction
        preds_1d = self.ssoc_1d_stack(X)
        
        # 2D Prediction
        X = torch.cat((X, preds_1d), dim = 1)
        preds_2d = self.ssoc_2d_stack(X)
        
        return preds_1d, preds_2d

In [None]:
model = DistillBERTClass()

In [None]:
# TODO: custom_loss_fn
# Think of how to adjust the CrossEntropyLoss function
# Option: change the targets upfront before passing them in

In [None]:
def compare_ssoc(predicted, actual):
    base_penalty = 10
    penalty = 0
    for i in range(len(predicted)):
        if predicted[i] != actual[i]:
            penalty += base_penalty/(i+1)
    return penalty

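def custom_loss_fn(top_probs_idx, targets, ssoc_level):

    # Select the index-to-SSOC mapping for the requested level
    if ssoc_level == '1d':
        mapping = idx_ssoc1d
    elif ssoc_level == '2d':
        mapping = idx_ssoc2d
    else:
        raise ValueError(f"Unsupported SSOC level: {ssoc_level}")

    loss = 0

    # Sum the digit-wise penalty over every example in the batch
    for i in range(len(top_probs_idx)):
        predicted_ssoc = mapping[top_probs_idx[i].item()]
        actual_ssoc = mapping[targets[i].item()]
        loss += compare_ssoc(predicted_ssoc, actual_ssoc)

    # Note: this tensor is a new leaf built from argmax indices, so calling
    # backward() on it will not send gradients to the model parameters
    return torch.tensor(float(loss), requires_grad = True)

# need to use Torch variable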
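
To see how the digit-wise penalty behaves, the sketch below repeats `compare_ssoc` so that it runs on its own and applies it to a few pairs of 2-digit codes. The codes are arbitrary examples chosen for illustration.

```python
def compare_ssoc(predicted, actual):
    base_penalty = 10
    penalty = 0
    # A mismatch at digit i (0-based) costs base_penalty / (i + 1),
    # so errors in the leading digits are penalised more heavily
    for i in range(len(predicted)):
        if predicted[i] != actual[i]:
            penalty += base_penalty / (i + 1)
    return penalty

same_code = compare_ssoc('25', '25')      # identical codes
same_group = compare_ssoc('24', '25')     # same first digit, different second digit
other_group = compare_ssoc('35', '25')    # different first digit, same second digit
both_wrong = compare_ssoc('31', '24')     # both digits differ
print(same_code, same_group, other_group, both_wrong)
```

A wrong first digit costs 10 while a wrong second digit costs 5, so a prediction in the right major group is penalised less than one in the wrong group.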

In [None]:
testing1 = Variable(torch.tensor([float(5), float(15)]), requires_grad = True)
print(testing1.grad)

In [None]:
Variable(torch.tensor(float(1)), requires_grad = True)

In [None]:
# Creating the loss function and optimizer
loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

In [None]:
# Function to calculate the accuracy of the model

def calcuate_accu(big_idx, targets):
    n_correct = (big_idx==targets).sum().item()
    return n_correct

In [None]:
# Defining the training function on the 80% of the dataset for tuning the distilbert model

def train(epoch):
    tr_loss = 0
    n_correct = 0
    nb_tr_steps = 0
    nb_tr_examples = 0
    
    # Set the NN to train mode
    model.train()
    
    # Iterate over each batch
    for batch, data in enumerate(training_loader):
        
        # Extract the data
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        targets_1d = data['targets_1d'].to(device, dtype = torch.long)
        targets_2d = data['targets_2d'].to(device, dtype = torch.long)
        
        # Run the forward prop
        preds_1d, preds_2d = model(ids, mask)
        
        # Find the indices of the top prediction
        top_probs_1d, top_probs_idx_1d = torch.max(preds_1d.data, dim = 1)
        top_probs_2d, top_probs_idx_2d = torch.max(preds_2d.data, dim = 1)
        
        # Calculate the loss
        
        loss1 = loss_function(preds_1d, targets_1d)
        loss2 = loss_function(preds_2d, targets_2d)
        loss = loss1*5 + loss2
        #print(f'Overall loss: {loss} = {loss1} + {loss2}')

        # Deprecated
        #loss = loss_function(preds_1d, targets_1d) + loss_function(preds_2d, targets_2d)
        
        # Add this batch's loss to the overall training loss
        tr_loss += loss.item()
        
        n_correct += calcuate_accu(top_probs_idx_2d, targets_2d)

        nb_tr_steps += 1
        nb_tr_examples += targets_2d.size(0)
        
        if batch % 50 == 0:
            loss_step = tr_loss/nb_tr_steps
            accu_step = (n_correct*100)/nb_tr_examples 
            print(f"Training Loss per 50 steps: {loss_step}")
            print(f"Training Accuracy per 50 steps: {accu_step}")

        # Reset the gradients, backpropagate the loss and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'The Total Accuracy for Epoch {epoch}: {(n_correct*100)/nb_tr_examples}')
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    print(f"Training Loss Epoch: {epoch_loss}")
    print(f"Training Accuracy Epoch: {epoch_accu}")

    return

In [None]:
device = 'cuda'
model.to(device)

In [None]:
for epoch in range(1):
    train(epoch)

In [None]:
for epoch in range(4):
    train(epoch)

In [None]:
100 % 100