In [None]:
from google.colab import drive
drive.mount('/content/gdrive',force_remount=True)

Mounted at /content/gdrive


Importing Python Libraries and preparing the environment
In this step we import the libraries and modules needed to run the script:

* Pandas
* PyTorch
* PyTorch utilities for Dataset and DataLoader
* Transformers
* DistilBERT model and tokenizer

In [None]:
!pip install transformers


### Importing the libraries needed

In [None]:
import pandas as pd
import torch
import os
import transformers
from torch.utils.data import Dataset, DataLoader
from transformers import DistilBertModel, DistilBertTokenizer

In [None]:
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [None]:
### define maximum sequence length, batch sizes, epochs and learning rate
MAX_LEN = 512
TRAIN_BATCH_SIZE = 8
VALID_BATCH_SIZE = 2
EPOCHS = 5
LEARNING_RATE = 1e-05
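### load DistilBERT tokenizer
### NOTE: this is the cased vocabulary, while the model below loads 'distilbert-base-uncased';
### tokenizer and model checkpoints should normally match
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased')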

In [None]:
import pickle
##load pre-processed data
with open('/content/gdrive/My Drive/upwork/train_content_new.pkl','rb') as f:
  train_content = pickle.load(f)
with open('/content/gdrive/My Drive/upwork/train_rating_new.pkl','rb') as f:
  train_rating = pickle.load(f)
with open('/content/gdrive/My Drive/upwork/test_content_new.pkl','rb') as f:
  test_content = pickle.load(f)
with open('/content/gdrive/My Drive/upwork/test_rating_new.pkl','rb') as f:
  test_rating = pickle.load(f)

**Preparing the Dataset and Dataloader**



We start by defining a few key variables that will be used later during the training/fine-tuning stage, followed by a custom Dataset class (DistillationBertTrain below) that defines how the text is pre-processed before it is sent to the neural network. We also define the DataLoader that feeds the data to the network in batches. Dataset and DataLoader are PyTorch constructs for defining and controlling data pre-processing and its passage to the neural network.
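
As a rough illustration of the pre-processing step, the sketch below pads or truncates a list of token ids to a fixed length and builds the matching attention mask. It is a plain-Python approximation, not the tokenizer's implementation: the function name `pad_or_truncate`, the token ids, and `pad_id=0` are assumptions made for the example, and a real tokenizer also keeps the closing special token when it truncates.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (ids, mask), both exactly max_len long."""
    ids = token_ids[:max_len]            # truncate inputs that are too long
    mask = [1] * len(ids)                # 1 marks a real token
    padding = max_len - len(ids)
    ids = ids + [pad_id] * padding       # pad inputs that are too short
    mask = mask + [0] * padding          # 0 marks padding the model should ignore
    return ids, mask


# Hypothetical token ids for a short sentence
ids, mask = pad_or_truncate([101, 7592, 2088, 102], max_len=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

Because every example ends up with the same length, the examples in a batch can be stacked into a single tensor.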

**Dataloader**


* DataLoader is used to create the training and validation dataloaders, which load data into the neural network in a defined manner. This is needed because the whole dataset cannot be loaded into memory at once, so the amount of data loaded into memory and passed to the network has to be controlled.
* This control is achieved with parameters such as batch_size (number of examples per batch) and max_len (number of tokens per example).
* The training and validation dataloaders are used in the training and validation parts of the flow respectively.
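
To make the batching idea concrete, here is a minimal plain-Python sketch of what a dataloader does conceptually: optionally shuffle the indices, then yield the data in chunks of `batch_size`. The helper `iterate_batches` and its `seed` parameter are hypothetical names for this illustration. The real PyTorch DataLoader additionally collates samples into tensors and can use worker processes.

```python
import math
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples, each with at most batch_size items."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)   # new sample order
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


samples = list(range(20))                      # stand-in for 20 examples
batches = list(iterate_batches(samples, batch_size=8))
print(len(batches))                            # 3, i.e. math.ceil(20 / 8)
print([len(b) for b in batches])               # [8, 8, 4]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.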

In [None]:
class DistillationBertTrain(Dataset):
    """Tokenizes text on the fly and pairs each example with its rating."""
    def __init__(self, split, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.max_len = max_len
        # Select the pre-loaded train or test split
        if split == "train":
            self.data = train_content
            self.rating = train_rating
        else:
            self.data = test_content
            self.rating = test_rating
        # Use the length of the selected split, not always the training set
        self.len = len(self.data)

    def __getitem__(self, index):
        content = self.data[index]
        output = self.rating[index]
        # Pad or truncate every example to exactly max_len tokens
        inputs = self.tokenizer.encode_plus(
            content,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',
            truncation=True
        )
        ids = inputs['input_ids']
        mask = inputs['attention_mask']

        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'targets': torch.tensor(output, dtype=torch.long)
        } 
    
    def __len__(self):
        return self.len

In [None]:
training_set = DistillationBertTrain('train', tokenizer, MAX_LEN)
testing_set = DistillationBertTrain('test', tokenizer, MAX_LEN)

In [None]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)

**Creating the Neural Network for Fine Tuning**


**Neural Network**
* We create a neural network with the DistillBERTClass.
* The network consists of the DistilBERT language model, followed by a pre-classifier linear layer with ReLU, a dropout layer, and a final linear layer that produces the outputs (one score per class).
* The data is fed to the DistilBERT language model as defined in the dataset.
* The final layer's outputs are compared to the rating to determine the accuracy of the model's predictions.
* We create an instance of the network called model. This instance is used for training and is then saved for future inference.
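
The comparison between the final layer's outputs and the ratings can be sketched in plain Python as follows. The logits and targets are made-up numbers for illustration, one row of six scores per example. The predicted class is the index of the largest score, which is what `torch.max(..., dim=1)` returns as indices in the training code.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


# Made-up model outputs for a batch of three examples, six classes each
batch_logits = [
    [0.1, 2.3, -1.0, 0.4, 0.0, -0.5],
    [1.5, 0.2, 0.3, -0.7, 0.9, 0.1],
    [-0.2, 0.0, 0.1, 0.3, 2.2, 0.4],
]
targets = [1, 4, 4]                            # made-up true ratings

predictions = [argmax(row) for row in batch_logits]
n_correct = sum(p == t for p, t in zip(predictions, targets))
accuracy = 100 * n_correct / len(targets)

print(predictions)  # [1, 0, 4]
print(n_correct)    # 2
```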

**Loss Function and Optimizer**


* The loss function and optimizer are defined in the cells below.
* The loss function calculates the difference between the output produced by the model and the actual output.
* The optimizer updates the weights of the neural network to improve its performance.
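
For a single example, cross-entropy loss is the negative log of the softmax probability assigned to the true class. The sketch below computes it in plain Python with the usual log-sum-exp trick. The function name and the logits are assumptions for illustration. `torch.nn.CrossEntropyLoss` applies the same formula to each example and, by default, averages over the batch.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)                            # subtract the max for stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# With six equal scores every class has probability 1/6, so the loss is ln(6)
uniform_loss = cross_entropy([0.0] * 6, target=2)
print(round(uniform_loss, 4))                  # 1.7918

# A confident, correct prediction gives a much smaller loss than a wrong one
confident = [0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
print(cross_entropy(confident, target=1) < cross_entropy(confident, target=0))  # True
```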

In [None]:
class DistillBERTClass(torch.nn.Module):
    def __init__(self):
        super(DistillBERTClass, self).__init__()
        self.l1 = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.pre_classifier = torch.nn.Linear(768, 768)
        self.dropout = torch.nn.Dropout(0.3)
        self.classifier = torch.nn.Linear(768, 6)

    def forward(self, input_ids, attention_mask):
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        hidden_state = output_1[0]
        pooler = hidden_state[:, 0]
        pooler = self.pre_classifier(pooler)
        pooler = torch.nn.ReLU()(pooler)
        pooler = self.dropout(pooler)
        output = self.classifier(pooler)
        return output

In [None]:
### path where the trained model's state dict is stored
output_dir = '/content/gdrive/My Drive/upwork/state_dict_model.pt'
model = DistillBERTClass()

In [None]:
# map_location lets a checkpoint saved on GPU load on a CPU-only runtime
model.load_state_dict(torch.load(output_dir, map_location=device))
model.to(device)

DistillBERTClass(
  (l1): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in_feat

In [None]:
loss_function = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

In [None]:
### load the optimizer state to continue training
optimizer.load_state_dict(torch.load('/content/gdrive/My Drive/upwork/optimizer.pt'))

In [None]:
def calcuate_accu(big_idx, targets):
    n_correct = (big_idx==targets).sum().item()
    return n_correct

In [None]:
def train(epoch):
    tr_loss = 0
    n_correct = 0
    nb_tr_steps = 0
    nb_tr_examples = 0
    model.train()
    for i,data in enumerate(training_loader, 0):
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        targets = data['targets'].to(device, dtype = torch.long)

        outputs = model(ids, mask)
        loss = loss_function(outputs, targets)
        tr_loss += loss.item()
        big_val, big_idx = torch.max(outputs.data, dim=1)
        n_correct += calcuate_accu(big_idx, targets)

        nb_tr_steps += 1
        nb_tr_examples+=targets.size(0)
        
        if i%500==0:
            loss_step = tr_loss/nb_tr_steps
            accu_step = (n_correct*100)/nb_tr_examples 
            print(f"Training Loss per {i} steps: {loss_step}")
            print(f"Training Accuracy per {i} steps: {accu_step}")

        # Reset gradients, backpropagate the loss, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.save(model.state_dict(), output_dir)
    torch.save(optimizer.state_dict(), os.path.join('/content/gdrive/My Drive/upwork', 'optimizer.pt'))
    print(f'The Total Accuracy for Epoch {epoch}: {(n_correct*100)/nb_tr_examples}')
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    print(f"Training Loss {epoch}: {epoch_loss}")
    print(f"Training Accuracy {epoch}: {epoch_accu}")

    return

**Fine Tuning the Model**  



After all the effort of loading and preparing the data and datasets, creating the model, and defining its loss and optimizer, this is probably the easiest step in the process.

Here we define a training function that trains the model on the training dataset created above for a specified number of epochs (EPOCHS). One epoch is one complete pass of the training data through the network.

Following events happen in this function to fine tune the neural network:

* The dataloader passes data to the model in batches of the configured batch size.
* The model's output and the actual rating are compared to calculate the loss.
* The loss is backpropagated, and the optimizer uses the resulting gradients to update the weights of the network.
* Every 500 steps the running loss and accuracy are printed to the console.
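
The same loop structure can be seen in a toy setting. The sketch below is not the notebook's training code: it fits a single weight `w` so that `w * x` matches `y = 2x`, using mean squared error and plain gradient descent in place of cross-entropy and Adam. All names and numbers are assumptions chosen for illustration. The epoch loop, batch loop, loss computation, and weight update mirror the steps listed above.

```python
def train_toy(data, epochs, batch_size, lr):
    """Mini-batch gradient descent on a one-weight model y = w * x."""
    w = 0.0
    for epoch in range(epochs):
        running_loss, steps = 0.0, 0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward pass: mean squared error over the batch
            loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
            # Backward pass: gradient of the loss with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            # Optimizer step: move w against the gradient
            w -= lr * grad
            running_loss += loss
            steps += 1
        print(f"epoch {epoch}: mean loss {running_loss / steps:.6f}")
    return w


data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
w = train_toy(data, epochs=5, batch_size=2, lr=0.05)
print(abs(w - 2.0) < 1e-2)  # True
```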

In [None]:
for epoch in range(EPOCHS):
    train(epoch)



Training Loss per 0 steps: 0.5905399918556213
Training Accuracy per 0 steps: 62.5
Training Loss per 500 steps: 0.6440870208059718
Training Accuracy per 500 steps: 71.8313373253493
Training Loss per 1000 steps: 0.6557453692435742
Training Accuracy per 1000 steps: 71.71578421578421
Training Loss per 1500 steps: 0.6605779838653345
Training Accuracy per 1500 steps: 71.9020652898068
Training Loss per 2000 steps: 0.6621654126821072
Training Accuracy per 2000 steps: 71.82033983008496
Training Loss per 2500 steps: 0.663321487769419
Training Accuracy per 2500 steps: 71.67133146741304
Training Loss per 3000 steps: 0.661705512487662
Training Accuracy per 3000 steps: 71.71359546817727
Training Loss per 3500 steps: 0.6598928182363782
Training Accuracy per 3500 steps: 71.76163953156241
Training Loss per 4000 steps: 0.6602719345336495
Training Accuracy per 4000 steps: 71.8507873031742
Training Loss per 4500 steps: 0.6613999477984296
Training Accuracy per 4500 steps: 71.76738502554988
Training Loss pe