# PyTorch for Natural Language Processing

## Bot Detection using BERTModel

---

**<u>_Objective:_</u>** In this short project, we fine-tune a pretrained BERT model to classify tweets as written by a bot or by a human.

This tutorial is inspired by the following walkthrough:

https://saturncloud.io/blog/pytorch-for-natural-language-processing-building-a-fake-news-classification-model/


### Introduction 

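Bot detection is the task of deciding whether a piece of content, here a tweet, was produced by an automated account or by a human.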

In [1]:
# import dependencies and libraries
import pandas as pd
import numpy as np
import torch
import glob
import re
import math
import seaborn as sns
import warnings
import matplotlib.pyplot as plt

from torch.utils.data import TensorDataset, DataLoader

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import auc
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix

from transformers import EarlyStoppingCallback
from transformers import AutoTokenizer, DataCollatorWithPadding, DataCollatorForLanguageModeling
from transformers import BertModel, BertTokenizer, BertForSequenceClassification
from transformers import TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification

sns.set_style('whitegrid')
sns.set_theme(style = 'whitegrid', 
              rc    = {'figure.dpi'    : 400, 
                       'figure.figsize': (20, 12)}, 
              font_scale = 0.60)

from matplotlib import rcParams
rcParams.update({'figure.autolayout': True})

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 200)
warnings.filterwarnings('ignore', category = UserWarning, module = 'openpyxl')

## Set up Environment for Google Colab

You need a GPU to run the fine-tuning in a reasonable amount of time. Google Colab is a convenient platform for this.

In [2]:
'''
from google.colab import drive
drive.mount('/content/drive')
'''

In [3]:
import os

# Get current and root directory
cur_dir = os.getcwd()
root_dir = cur_dir[:-11]
data_dir = root_dir + "1_Data\\"
model_dir = root_dir + "3_Model\\"

print(f"Current directory: {cur_dir}\nRoot directory : {root_dir}\nData directory : {data_dir}\nModel directory : {model_dir}")

Current directory: i:\My Drive\Data Science and Analytics Portfolio\3 Tutorials\2_Bots_Detection\2_Notebooks
Root directory : i:\My Drive\Data Science and Analytics Portfolio\3 Tutorials\2_Bots_Detection\
Data directory : i:\My Drive\Data Science and Analytics Portfolio\3 Tutorials\2_Bots_Detection\1_Data\
Model directory : i:\My Drive\Data Science and Analytics Portfolio\3 Tutorials\2_Bots_Detection\3_Model\
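The slice `cur_dir[:-11]` above works only because the notebook folder name `2_Notebooks` happens to be 11 characters long. A minimal sketch of a less fragile alternative, assuming the same folder layout, is to take the parent of the current directory with `pathlib`. The path below is a shortened, hypothetical stand-in for the real one, and `PureWindowsPath` is used only so that the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical, shortened version of the notebook location shown above
cur_dir = PureWindowsPath(r"i:\My Drive\2_Bots_Detection\2_Notebooks")

# The original slice relies on the folder name being exactly 11 characters long
print(len("2_Notebooks"))  # 11

# Taking the parent does not depend on the length of the folder name
root_dir = cur_dir.parent
data_dir = root_dir / "1_Data"
model_dir = root_dir / "3_Model"

print(root_dir.name, data_dir.name, model_dir.name)  # 2_Bots_Detection 1_Data 3_Model
```

In a real notebook, `Path.cwd().parent` would play the role of `cur_dir.parent`.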


## Read Datasets

For the purpose of this project, we will only take a sample of a few thousand tweets from each dataframe (5,000 human and 6,000 bot tweets below), so that the training does not take too long.

In [4]:
%%time
df_human_temp = pd.read_csv(f"{data_dir}/cresci-2015/TFP_tweets.csv", encoding = 'latin-1')
df_bot_temp = pd.read_csv(f"{data_dir}/cresci-2015/TWT_tweets.csv", encoding = 'latin-1')

Wall time: 2.06 s


In [5]:
print(f"Length of human dataframe : {len(df_human_temp)}\nLength of bots dataframe : {len(df_bot_temp)}")

Length of human dataframe : 563693
Length of bots dataframe : 114192


The two datasets are very large. We split each of them into two parts: one that goes into the train-test-split function, and another that is held out for model prediction. For the former, we select only a few thousand rows per class so that the model does not take too long to train. Generally speaking, more data tends to improve model performance.

In [6]:
# Sample the training data from each class
df_human_train_sample = df_human_temp.sample(n = 5000, random_state = 37)
df_bot_train_sample = df_bot_temp.sample(n = 6000, random_state = 37)

# Tweets that are not in the training sample are kept for predictions
df_human_eval = df_human_temp[~df_human_temp['id'].isin(df_human_train_sample['id'].values)]
df_bot_eval = df_bot_temp[~df_bot_temp['id'].isin(df_bot_train_sample['id'].values)]

# Select about 1,000 tweets per class for the evaluation dataset
df_human_eval_sample = df_human_eval.sample(n = 1000, random_state = 30)
df_bot_eval_sample = df_bot_eval.sample(n = 900, random_state = 30)

# Encode humans as 0 and bots as 1 (the relevant columns are selected in a later cell)
df_human_train_sample['target'] = 0
df_bot_train_sample['target'] = 1

df_human_eval_sample['target'] = 0
df_bot_eval_sample['target'] = 1

# Stack the dataframes vertically, then randomly shuffle the rows
df_train_sample = pd.concat([df_human_train_sample, df_bot_train_sample], axis = 0, ignore_index = True) 
df_eval_sample = pd.concat([df_human_eval_sample, df_bot_eval_sample], axis = 0, ignore_index = True) 

df_train_sample = df_train_sample.sample(frac = 1.0, random_state = 90)
df_eval_sample = df_eval_sample.sample(frac = 1.0, random_state = 90)


print(f"Length of human sample : {len(df_human_train_sample)}\nLength of bots sample : {len(df_bot_train_sample)}\nLength of train dataframe : {len(df_train_sample)}")
print(f"Length of evaluation dataframe : {len(df_eval_sample)}")

Length of human sample : 5000
Length of bots sample : 6000
Length of train dataframe : 11000
Length of evaluation dataframe : 1900


### Data Cleaning

Usually, we want to perform some rudimentary data cleaning steps on the dataset before we use it for training. Typically, this involves:
- Removing special characters
- Lowercasing all letters

In [7]:
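def clean_text(tweet):
    # Replace every run of characters other than ASCII letters and digits with a single space
    tweet1 = re.sub('[^A-Za-z0-9]+', ' ', tweet)
    # Lowercase the text, then trim leading and trailing whitespace
    tweet2 = tweet1.lower()
    tweet3 = tweet2.strip()

    return tweet3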
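To make the effect of the cleaning step concrete, here is a small sketch that restates `clean_text` in compact form and applies it to two made-up strings (they are not taken from the dataset). The second one contains no ASCII letters or digits, so it collapses to an empty string, which is why blank rows have to be removed afterwards.

```python
import re

def clean_text(tweet):
    """Replace non-alphanumeric runs with a space, then lowercase and trim."""
    return re.sub('[^A-Za-z0-9]+', ' ', tweet).lower().strip()

samples = [
    "@user Hello, World!!! http://t.co/abc",  # made-up tweet with a mention and a link
    "Привет!!!",                               # made-up tweet without ASCII letters or digits
]

cleaned = [clean_text(s) for s in samples]
print(cleaned)  # ['user hello world http t co abc', '']
```

Note that the pattern keeps only ASCII characters, so accented letters and non-Latin scripts are dropped as well.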

In [8]:
df_train = df_train_sample[['id', 'text', 'target']].copy()
df_eval = df_eval_sample[['id', 'text', 'target']].copy()

df_train['text_cleaned'] = df_train['text'].apply(clean_text)
df_eval['text_cleaned'] = df_eval['text'].apply(clean_text)

# Remove rows whose cleaned text is empty
df_train = df_train[df_train['text_cleaned'] != '']
df_eval = df_eval[df_eval['text_cleaned'] != '']

df_train

| | id | text | target | text_cleaned |
|---|---|---|---|---|
| 2217 | 179273485179297793 | #FiatRom Caro Marchionne, il capitale e' sempr... | 0 | fiatrom caro marchionne il capitale e sempre s... |
| 3387 | 290532928440639488 | Roma/ A #Ostia si continua a sparare e la poli... | 0 | roma a ostia si continua a sparare e la polizi... |
| 7656 | 327948536136204288 | http://t.co/ivA8bzYixB ÑÐ°ÑÐ¿Ð¸ÑÐ°Ð½Ð¸Ðµ Ð¿... | 1 | http t co iva8bzyixb |
| 1833 | 292313083421011970 | @Diabolikart Quoto su tutta la linea ;-))) bac... | 0 | diabolikart quoto su tutta la linea bacio e ne... |
| 7497 | 110463456469188608 | @midbrito foi na pria ontem&gt;? | 1 | midbrito foi na pria ontem gt |
| ... | ... | ... | ... | ... |
| 10919 | 301939393281798144 | TODO MEXICO APOYA Y PIDE SE RESPETE LA DECISI... | 1 | todo mexico apoya y pide se respete la decisio... |
| 9539 | 13304808760938497 | @sigatchegarotos queria dar os parabÃ©ns a vcs... | 1 | sigatchegarotos queria dar os parab ns a vcs p... |
| 6815 | 135782553570385920 | @euphonik thank u for the brilliant tracks !!!... | 1 | euphonik thank u for the brilliant tracks help... |
| 2717 | 311160499553333248 | RT @WIPO: Free ePCT webinars: Learn how to man... | 0 | rt wipo free epct webinars learn how to manage... |


In [9]:
df_train['target'].value_counts()

1    5739
0    5000
Name: target, dtype: int64

In [10]:
df_eval['target'].value_counts()

0    999
1    858
Name: target, dtype: int64

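Now we are ready to pass the data into the train-test-split function. Note that the code below uses the raw ```text``` column rather than ```text_cleaned```; the cleaned column has only been used to drop rows that end up empty.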

In [11]:
texts = df_train['text'].values
targets = df_train['target'].values

X_train, X_test, y_train, y_test = train_test_split(texts, targets, test_size = 0.2, random_state = 42)

print(f"Length of X_train : {len(X_train)}, Length of X_test: {len(X_test)}\nLength of y_train : {len(y_train)}, Length of y_test: {len(y_test)}")

Length of X_train : 8591, Length of X_test: 2148
Length of y_train : 8591, Length of y_test: 2148


As we can see, the features (X) and their corresponding labels (y) have the same length in each split.
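The split sizes printed above can be reproduced by hand. The sketch below assumes that scikit-learn rounds the test share up to a whole number of rows and gives the remainder to the training set, which is consistent with the numbers in the output.

```python
import math

n_samples = 8591 + 2148  # rows left in df_train after dropping blank texts
test_size = 0.2

# Assumption: the test share is rounded up, the training set gets the rest
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_samples, n_train, n_test)  # 10739 8591 2148
```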

## Parameters Declarations

In this section, we declare some variables that we will need for model training later.

There are many BERT models you can try. See the list of models hosted by Hugging Face at the link below:

https://huggingface.co/google-bert/

In [12]:
bert_model_name = 'bert-base-uncased'  # Name of the BERT model
batch_size      = 8                    # Number of samples per batch that the DataLoader sends to BERT
learning_rate   = 2e-5                 # Learning rate of the optimizer
best_accuracy   = 0.0                  # Best training accuracy seen so far; starts at 0 so the first epoch can be saved
num_epoch       = 10                   # Number of epochs, i.e. full passes over the training set
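The `best_accuracy` variable is the threshold that the training loop later compares each epoch's accuracy against before saving the model. The following sketch, with made-up accuracy values and a hypothetical helper `epochs_that_save`, shows why the starting value matters: a checkpoint is only written when the accuracy is strictly greater than the best value so far, so a starting value of 1.0 can never be beaten.

```python
def epochs_that_save(accuracies, initial_best):
    """Return the 1-based epochs at which a 'save if better' check would fire."""
    best = initial_best
    saved = []
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best:          # same comparison as in the training loop
            best = acc
            saved.append(epoch)
    return saved

accuracies = [0.81, 0.88, 0.86, 0.91]  # made-up per-epoch training accuracies

print(epochs_that_save(accuracies, 0.0))  # [1, 2, 4]
print(epochs_that_save(accuracies, 1.0))  # []
```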

## Specify GPU

We select a GPU if one is available and fall back to the CPU otherwise. The model and the tensors are later moved to this device.

In [13]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Report the GPU name when CUDA is available, otherwise fall back to 'CPU'
device_name = torch.cuda.get_device_name() if torch.cuda.is_available() else 'CPU'
print(device, '\nName of device:', device_name)


cpu 
Name of device: CPU


## Tokenize Texts

In order for BERT to compute embeddings and classify a text, we first have to split each sentence into smaller units called _tokens_ (whole words or word pieces) and map them to integer ids. BERT comes with a tokenizer that does exactly this. This tokenization step can take some time, depending on the available memory and on how long the sentences are.

In [14]:
tokenizer = BertTokenizer.from_pretrained(bert_model_name, do_lower_case = True)

train_encodings = tokenizer(list(X_train), truncation = True, padding = True, max_length = 128) 
test_encodings = tokenizer(list(X_test), truncation = True, padding = True, max_length = 128) 

We can also check what the tokenizer returns. Let's pick ```train_encodings```. The encodings behave like a dictionary:

In [15]:
print(train_encodings.keys())
for key in train_encodings.keys():
    print(f"Length of {key} : {len(train_encodings[key])}")

dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
Length of input_ids : 8591
Length of token_type_ids : 8591
Length of attention_mask : 8591


Note that this is exactly the length of ```X_train```! We can examine each of the keys and see what they represent. We look only at the first text.

In [16]:
print(f"Length of each input_ids : {len(train_encodings['input_ids'][0])}\n{train_encodings['input_ids'][0]}")

Length of each input_ids : 128
[101, 9594, 3762, 1005, 1055, 2047, 1001, 9121, 1012, 1012, 1012, 8299, 1024, 1013, 1013, 1056, 1012, 2522, 1013, 22851, 8093, 2102, 2487, 2015, 2078, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


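Each of the ```input_ids``` is exactly 128 long. This is because ```max_length = 128``` caps the sequence length, and ```padding = True``` pads every shorter text with zeros up to the longest sequence in the batch, which here reaches that cap.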

In [17]:
print(f"Length of each token_type_ids : {len(train_encodings['token_type_ids'][0])}\n{train_encodings['token_type_ids'][0]}")

Length of each token_type_ids : 128
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


Lastly, we have the attention mask:

In [18]:
print(f"Length of each attention_mask : {len(train_encodings['attention_mask'][0])}\n{train_encodings['attention_mask'][0]}")

Length of each attention_mask : 128
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
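The relationship between `input_ids` and `attention_mask` can be sketched in plain Python. The helper `pad_and_mask` below is hypothetical and much simpler than the real tokenizer (for example, the real one keeps the closing `[SEP]` token when it truncates), but it shows the idea: real tokens get a mask value of 1, padding positions get 0. In `bert-base-uncased`, id 101 is `[CLS]`, 102 is `[SEP]` and 0 is `[PAD]`.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or pad a list of token ids and build the matching attention mask."""
    ids = token_ids[:max_length]          # naive truncation
    n_pad = max_length - len(ids)         # number of padding positions to add
    input_ids = ids + [pad_id] * n_pad
    attention_mask = [1] * len(ids) + [0] * n_pad
    return input_ids, attention_mask

# A short sequence: [CLS], two ordinary tokens, [SEP]
ids, mask = pad_and_mask([101, 9594, 3762, 102], max_length=8)

print(ids)   # [101, 9594, 3762, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The mask tells the model which positions to ignore, so the padding zeros do not influence the prediction.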


Actually, out of the three keys, only ```input_ids``` and ```attention_mask``` are required, together with the target values, for fine-tuning the model. Next, we create a function that converts the lists to PyTorch tensors and sends them to the selected device.

In [56]:
def convert_tensor_device(encodings, target, device):

    inputs = torch.tensor(encodings['input_ids']).to(device)
    masks = torch.tensor(encodings['attention_mask']).to(device)
    labels = torch.tensor(target).to(device)
    return inputs, masks, labels

# Function call
train_inputs, train_masks, train_labels = convert_tensor_device(train_encodings, y_train, device)
test_inputs, test_masks, test_labels = convert_tensor_device(test_encodings, y_test, device)


Next, we have to wrap the tensors in PyTorch ```Dataset``` and ```DataLoader``` objects so that the model can receive them in batches. Let us just run the code first, and we will take a look at the explanations later.

In [57]:
train_dataset = TensorDataset(train_inputs, train_masks, train_labels)
train_loader = DataLoader(train_dataset, batch_size = batch_size)

test_dataset = TensorDataset(test_inputs, test_masks, test_labels)
test_loader = DataLoader(test_dataset, batch_size = batch_size)

Let us again restrict our attention to just the ```train_dataset```

In [58]:
print(len(train_dataset))
for i, batch in enumerate(train_dataset):
    if i < 2:
        print(batch, '\n')
    else:
        break

8591
(tensor([  101,  9594,  3762,  1005,  1055,  2047,  1001,  9121,  1012,  1012,
         1012,  8299,  1024,  1013,  1013,  1056,  1012,  2522,  1013, 22851,
         8093,  2102,  2487,  2015,  2078,   102,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,  

The ```TensorDataset``` class wraps several tensors that share the same first dimension and returns one tuple of rows per index. Consider the following example from DataCamp:

In [22]:
'''
import numpy as np
import torch
from torch.utils.data import TensorDataset

np_features = np.array(np.random.rand(12, 8))
np_target = np.array(np.random.rand(12, 1))


# Convert arrays to PyTorch tensors
torch_features = torch.tensor(np_features)
torch_target = torch.tensor(np_target)

# Create a TensorDataset from two tensors
dataset = TensorDataset(torch_features, torch_target)

# Return the last element of this dataset
dataset[-1]
'''


Next, we look at the ```train_loader```. The ```DataLoader``` is essentially an iterable over the dataset. You can use it to batch and shuffle the data on the fly. To learn more about the ```DataLoader``` object, see the following link:

https://stackoverflow.com/questions/65138643/examples-or-explanations-of-pytorch-dataloaders

In [23]:
'''
print(f"Length of train_loader : {len(train_loader)}")
for i, batch in enumerate(train_loader):
    if i < 2:
        print(batch, '\n')
    else:
        break
'''


We can also test the ```DataLoader``` on fake data like the one in the ```TensorDataset``` example.
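A minimal sketch of such a test, assuming random features and targets of the same shapes as in the DataCamp example (12 rows, 8 feature columns, 1 target column): with a batch size of 8, the loader yields one full batch of 8 rows and one last batch with the remaining 4.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Fake data: 12 samples with 8 features each and one target per sample
features = torch.rand(12, 8)
targets = torch.rand(12, 1)

dataset = TensorDataset(features, targets)
loader = DataLoader(dataset, batch_size=8)

print(len(dataset), len(loader))  # 12 2

batch_sizes = []
for x, y in loader:
    # Each batch is a tuple with one tensor per tensor passed to TensorDataset
    print(x.shape, y.shape)
    batch_sizes.append(x.shape[0])

print(batch_sizes)  # [8, 4]
```

The same arithmetic applies to the real data: 8591 training rows with a batch size of 8 give 1074 batches, the last of which is smaller than the others.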

## Load BERT Model

Here, we specify the BERT model that we want to use. We can use the ```BertForSequenceClassification``` class to load the pretrained model with a classification head on top.

In [24]:
model = BertForSequenceClassification.from_pretrained(bert_model_name).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly i

## Train BERT Model

Here is where the real fun begins! The following section is called the _training loop_.

For more information on different kinds of training, you can check out:

https://huggingface.co/transformers/v4.4.2/custom_datasets.html
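Before looking at the BERT version, it can help to see the same loop structure on a toy problem that runs in a second on a CPU. The sketch below is not the fine-tuning code: a small `torch.nn.Linear` layer stands in for BERT, the data are random, and the learning rate and number of epochs are arbitrary assumptions. The sequence of steps (clear gradients, forward pass, loss, backward pass, optimizer step) is the part that carries over.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Toy binary classification data: the label depends on the sign of the first feature
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

model = torch.nn.Linear(4, 2)                      # stand-in for BERT: 4 features -> 2 logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for inputs, labels in loader:
        optimizer.zero_grad()                      # clear gradients from the previous step
        logits = model(inputs)                     # forward pass
        loss = loss_fn(logits, labels)             # compare logits with the true labels
        loss.backward()                            # backward pass: compute gradients
        optimizer.step()                           # update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(loader))
    print(f"Epoch {epoch + 1}: average loss {epoch_losses[-1]:.4f}")
```

With `BertForSequenceClassification`, the loss does not need a separate `loss_fn`: passing `labels` to the model makes it return the loss in `outputs.loss`.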

In [25]:
'''
# Set the model 'mode' to be in training
model.train()

for epoch in range(num_epoch):
    running_loss = 0.0
    correct = 0
    total = 0
    
    for step, batch in enumerate(train_loader):
        # Move batch tensors to the same device as the model
        input_ids, attention_mask, labels = [b.to(device) for b in batch]
        
        # Clears the gradients
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        
        # Retrieves the loss
        loss = outputs.loss
        
        # Backward pass for gradient calculation
        loss.backward()
        
        # Updates the weights
        optimizer.step()
        
        # Accumulates the running loss
        running_loss += loss.item()
        
        # Predicts labels and calculates the number of correct predictions
        _, predicted = torch.max(outputs.logits, dim = 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # Calculate accuracy and loss on the entire training set
    accuracy = correct / total
    average_loss = running_loss / len(train_loader)

    # If the current epoch's accuracy is best so far, save this model to disk
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        torch.save(model, f'{model_dir}/best_model.pt')

    print(f"Epoch {epoch + 1}/{num_epoch} - Training Loss: {average_loss:.4f} - Training Accuracy: {accuracy:.4f}")
'''



To see what the model actually receives, we can run just one epoch, print the tensors of the first batch, and stop right after the first forward pass.

In [55]:
# Set the model 'mode' to be in training
model.train()

for epoch in range(1):
    running_loss = 0.0
    correct = 0
    total = 0
    
    for step, batch in enumerate(train_loader):

   
        # Move batch tensors to the same device as the model
        input_ids, attention_mask, labels = [b.to(device) for b in batch]

        # Clears the gradients
        optimizer.zero_grad()
        
        # Add a dimension of size 1 with unsqueeze(1) to inspect how the shapes change
        input_ids = input_ids.unsqueeze(1)
        labels = labels.unsqueeze(1)

        print(f"Input ids : \n{input_ids}, size : {input_ids.size()}\n  \
                Attention mask : \n{attention_mask}, size : {attention_mask.size()}\n \
                Labels : \n{labels}, size : {labels.size()}")


        #print(f"Input ids : {input_ids},\nattention mask: {attention_mask},\nlabels:  {labels}")

        outputs = model(input_ids = input_ids, attention_mask = attention_mask, labels = labels)
        
        break  # Stop after the first batch; the steps below are never reached

        # Retrieves the loss
        loss = outputs.loss
        
        # Backward pass for gradient calculation
        loss.backward()
        
        # Updates the weights
        optimizer.step()
        
        # Accumulates the running loss
        running_loss += loss.item()
        
        # Predicts labels and calculates the number of correct predictions
        _, predicted = torch.max(outputs.logits, dim = 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

'''
    # Calculate accuracy and loss on the entire training set
    accuracy = correct / total
    average_loss = running_loss / len(train_loader)

    # If the current epoch's accuracy is best so far, save this model to disk
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        torch.save(model, f'{model_dir}/best_model.pt')

    print(f"Epoch {epoch + 1}/{10} - Training Loss: {average_loss:.4f} - Training Accuracy: {accuracy:.4f}")
    '''


Input ids : 
tensor([[0],
        [1],
        [0],
        [1],
        [1],
        [0],
        [1],
        [1]]), size : torch.Size([8, 1])
                  Attention mask : 
tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]]), size : torch.Size([8, 128])
                 Labels : 
tensor([[0],
        [1],
        [0],
        [1],
        [1],
        [0],
        [1],
        [1]]), size : torch.Size([8, 1])



In [47]:
x = torch.tensor([0, 1, 2, 3])
x.unsqueeze(1)

tensor([[0],
        [1],
        [2],
        [3]])

## Model Evaluation

In [None]:
# Sets the model to evaluation mode
model.eval()

# Variables to gather full output
total_eval_accuracy = 0
total_eval_loss = 0
nb_eval_steps = 0

# Evaluate data for one epoch
for batch in test_loader:
    # Unpack this test batch from our dataloader and move tensors to the GPU if available
    input_ids, attention_mask, labels = [b.to(device) for b in batch]
    
    # Tells PyTorch not to bother with constructing the compute graph during
    # the forward pass, since this is only needed for backprop (training)
    with torch.no_grad():        
        # Forward pass, calculate logit predictions.
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)

    # Get the loss and logits
    loss = outputs.loss
    logits = outputs.logits
    
    # Accumulate the validation loss
    total_eval_loss += loss.item()

    # Calculate the accuracy for this batch of test sentences, and accumulate it over all batches
    _, predictions = torch.max(logits, dim=1)
    total_eval_accuracy += (predictions == labels).sum().item()

# Report the final accuracy for this validation run
avg_val_accuracy = total_eval_accuracy / len(test_loader.dataset)
print("Accuracy on the test set: {0:.2f}".format(avg_val_accuracy))

# Calculate the average loss over all of the batches
avg_val_loss = total_eval_loss / len(test_loader)
print("Test Loss: {0:.2f}".format(avg_val_loss))
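The step from logits to accuracy in the evaluation loop can be checked on a tiny hand-made example. The logits and labels below are made up: each row of `logits` holds one score per class, `torch.max(..., dim=1)` returns the largest score and its index for every row, and the index is the predicted class.

```python
import torch

# Made-up logits for 4 samples and 2 classes (0 = human, 1 = bot)
logits = torch.tensor([[ 2.0, -1.0],
                       [ 0.3,  0.9],
                       [-0.5,  1.5],
                       [ 1.2,  0.1]])
labels = torch.tensor([0, 1, 0, 0])

# torch.max along dim=1 returns (largest value, index of largest value) per row
_, predictions = torch.max(logits, dim=1)

correct = (predictions == labels).sum().item()
accuracy = correct / labels.size(0)

print(predictions.tolist(), correct, accuracy)  # [0, 1, 1, 0] 3 0.75
```

Note the two different denominators in the evaluation cell: the number of correct predictions is divided by the number of samples, while the summed loss is divided by the number of batches, because each `loss.item()` is already an average over one batch.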


## Model Prediction