# Document Classification with PyTorch and BERT

This notebook demonstrates two distinct approaches for document classification:

- A custom PyTorch-based model using pretrained Word2Vec embeddings.
- A fine-tuned transformer model using DistilBERT from Huggingface.

The dataset consists of humanitarian text excerpts, which are to be classified into sectors like health, protection, and education. This notebook covers the full pipeline: preprocessing, modeling, training, evaluation, and experiment tracking.

Each section is self-contained, and comparisons are made between traditional and transformer-based approaches.


## Part 1: Document Classification using PyTorch and Word2Vec

In this section, I build a simple yet effective document classification model using PyTorch. Each document is represented by averaging the pretrained Word2Vec embeddings of its words. This average vector is then passed through a linear layer for classification.

This approach is lightweight and fast, making it suitable for smaller models and quick experimentation.


## Preprocessing and dictionary

In [1]:
# imports

import numpy as np
import re
import os
import csv
import pandas as pd
from tqdm import tqdm
from unidecode import unidecode
from collections import defaultdict

import gensim
import gensim.downloader as api
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.lm import Vocabulary

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter
import wandb

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

In [2]:
def preprocessing(og_data: pd.DataFrame):
    data = og_data.copy()
    
    print('Starting preprocessing... ', end='')
    
    # remove any non-words and non-spaces
    data = data.replace(to_replace=r'[^\w\s]', value='', regex=True)
    # replace numbers with placeholder
    data = data.replace(to_replace=r'\d+', value='num', regex=True)
    # replace dates with placeholder
    data = data.replace(to_replace=r'\d+/\d+/\d+', value='dates', regex=True)
    # remove accents, umlaute, etc
    data['text'] = data['text'].apply(unidecode)
    # tokenization
    data['text'] = data['text'].apply(word_tokenize)
    # remove stop words
    stop_words = set(stopwords.words('english'))
    data['text'] = data['text'].apply(lambda x: [word for word in x if word not in stop_words])
    # stemming
    ps = PorterStemmer()
    data['text'] = data['text'].apply(lambda x: [ps.stem(y) for y in x])
    
    print('Finished')
    
    return data


def create_vocabulary(og_data: pd.DataFrame, unk_cutoff, vocabulary=None):
    data = og_data.copy()
    
    if vocabulary is None:
        print('Creating vocabulary... ', end='')
        flat_list = [item for sublist in data['text'] for item in sublist]
        vocabulary = Vocabulary(flat_list, unk_cutoff=unk_cutoff)
        print('Finished')
    
    print('Applying vocabulary... ', end='')
    data['text'] = data['text'].apply(lambda x: [word if word in vocabulary else '<UNK>' for word in x])
    print('Finished')
    
    return vocabulary, data

In [3]:
# Opening the training, validation and test set accordingly 

pattern = '\d*,(.*)\n'
labels = []
with open('data/thedeep.labels.txt') as f:
    for l in f.readlines():
        result = re.search(pattern, l)
        labels.append(result.group(1))

datasets = []

for current_set in ('train', 'test', 'validation'):

    sets = [[], [], []]
    with open('data/thedeep.subset.' + current_set + '.txt', newline='') as csvf:
        creader = csv.reader(csvf, delimiter=',')
        for l in creader:
            sets[2].append(int(l[0]))
            sets[0].append(l[1].lower())
            sets[1].append(int(l[2]))
        current_dict = {'text': sets[0], 'labels': sets[1]}
        datasets.append(pd.DataFrame(data=current_dict, index=sets[2]))

train_data, test_data, val_data = datasets

In [4]:
# preprocessing
train_prep = preprocessing(train_data)
test_prep = preprocessing(test_data)
val_prep = preprocessing(val_data)

# create dictionary, adapt data accordingly
voc, train_v = create_vocabulary(train_prep, unk_cutoff=2)
_, test_v = create_vocabulary(test_prep, unk_cutoff=2, vocabulary=voc)
_, val_v = create_vocabulary(val_prep, unk_cutoff=2, vocabulary=voc)

voc_dict = {word: i for i, word in enumerate(voc)}  # dictionary to get indices of words; usage: voc_dict[word] = index

Starting preprocessing... Finished
Starting preprocessing... Finished
Starting preprocessing... Finished
Creating vocabulary... Finished
Applying vocabulary... Finished
Applying vocabulary... Finished
Applying vocabulary... Finished


## Data batching

In [5]:
def batch(data_og:pd.DataFrame, vocabulary_dict: dict, batch_size:int, max_doc_len:int, shuffle=True, seed=None):

    data = data_og.copy()
    indices = np.array(data.index)

    if shuffle == True:
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)

    txt_batches = []
    label_batches = []
    
    for i in range(0, len(indices), batch_size):
        if i + batch_size > len(indices):  # cutoff last incomplete batch
            break
        else:
            end = i + batch_size
        
        # get vocabulary ids, keep only first 'max_doc_len' words, pad document with 0 up to 'max_doc_len' if necessary, convert from pd.DataFrame to torch.Tensor
        txt_batch = [np.pad(np.array([vocabulary_dict[word] for word in doc]), (0, max(0, max_doc_len - len(doc))))[:max_doc_len] for doc in data.loc[indices[i:end]]['text']]
        label_batch = np.array(data.loc[indices[i:end]]['labels'])
        
        txt_batches.append(txt_batch)
        label_batches.append(label_batch)

    return torch.tensor(np.array(txt_batches), device=DEVICE), torch.tensor(np.array(label_batches), device=DEVICE) 

In [6]:
# hyperparameters
BATCH_SIZE = 64
MAX_DOC_LEN = 50
SHUFFLE = True

train_features, train_labels = batch(train_v, voc_dict, BATCH_SIZE, MAX_DOC_LEN, SHUFFLE)
test_features, test_labels = batch(test_v, voc_dict, BATCH_SIZE, MAX_DOC_LEN, SHUFFLE)
val_features, val_labels = batch(val_v, voc_dict, BATCH_SIZE, MAX_DOC_LEN, SHUFFLE)

print(f'train set shape:\t{train_features.shape}')
print(f'test set shape:\t\t{test_features.shape}')
print(f'validation set shape:\t{val_features.shape}')

train set shape:	torch.Size([189, 64, 50])
test set shape:		torch.Size([40, 64, 50])
validation set shape:	torch.Size([40, 64, 50])


In [132]:
# add PCA representation of training set to tensorboard

writer = SummaryWriter('runs/projection')  # initialize tensorboard

class_labels = [labels[lab] for lab in train_labels.flatten()]
log_features = torch.flatten(train_features, start_dim=0, end_dim=1)

writer.add_embedding(log_features, metadata=class_labels)
writer.close()



## Word embedding lookup

In [7]:
wv = api.load('word2vec-google-news-300')  # load Word2Vec trained on Google News dataset

In [133]:
dict_size = len(voc)
embed_size = wv.vector_size
wv_mean = np.mean(wv.vectors)
wv_std = np.std(wv.vectors)

In [134]:
weight = torch.empty(size=(dict_size, embed_size), dtype=torch.float, device=DEVICE)

for i, word in enumerate(voc):
    if word in wv:  # current word is present in Word2Vec
        current_rep = np.copy(wv[word])
        weight[i] = torch.FloatTensor(current_rep)
    else:  # current word is not present in Word2Vec
        random_rep = torch.normal(wv_mean, wv_std, size=(1, embed_size))
        weight[i] = random_rep

embedding = nn.Embedding.from_pretrained(weight)

In [135]:
# check that embeddings are correct

test_words = ['support', 'malaria', 'easter']

for word in test_words:
    word_idx = voc_dict[word]
    embed_rep = embedding(torch.tensor(word_idx, device=DEVICE))
    wv_rep = wv[word]
    print(f'## test word: {word} ##')
    if torch.sum(embed_rep.cpu() - wv_rep) == 0:
        print('word correctly embedded')
    else:
        print('false embedding!')

## test word: support ##
word correctly embedded
## test word: malaria ##
word correctly embedded
## test word: easter ##
word correctly embedded


## Model definition

In [14]:
class ClassificationAverageModel(nn.Module):
    def __init__(self, embedding_layer, in_features, n_classes):
        super().__init__()

        self.embedding = embedding_layer
        self.linear = nn.Linear(in_features, n_classes)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = torch.mean(self.embedding(x), axis=1)
        output = self.softmax(self.linear(x))

        return output

## Training

In [182]:
def accuracy(logits, labels):
    
    idx = np.argmax(logits, axis=1)
    return np.mean(labels[np.arange(len(idx)), idx])

def to_one_hot(y, k=None):
    if type(y) == torch.Tensor:
        y = y.cpu()
    y = np.asarray(y, dtype='int')
    if k is None:
        k = np.amax(y) + 1

    out = np.zeros(y.shape + (k, ))
    np.put_along_axis(out, y[..., None], 1, axis=-1)
    return torch.Tensor(out)
    

def train(train_loader, val_loader, model, loss, optimizer, n_classes, epochs, device, n_early_stop, model_name='model'):

    epoch_losses = []
    epoch_accuracies = []
    best_accuracy = 0
    n_not_improved = 0
    lr = optimizer.param_groups[0]['lr']
    dataiter = iter(train_loader)
    batch_size = next(dataiter)[0].shape[1]
    doc_len = next(dataiter)[0].shape[2]

    writer = SummaryWriter('runs/train_nepochs_' + str(epochs) + '_lr_' + str(lr) + '_batchsz_' + str(batch_size) + '_doclen_' + str(doc_len))  # initialize tensorboard
    
    for epoch in range(epochs):
        print(f'Epoch [{epoch + 1}/{epochs}]\n--------------------------------')
        print('########### Training ###########')
        
        model.train()
        current_losses = []
        
        for i, (x, y) in enumerate(train_loader):

            optimizer.zero_grad()
            x = torch.squeeze(x).to(device)
            y = torch.squeeze(to_one_hot(y, 12)).to(device)
            
            logits = model.forward(x)
            batch_loss = loss(logits, y)
            current_losses.append(batch_loss.detach().cpu().numpy())
            writer.add_scalar("Loss/training set", batch_loss, epoch * len(train_loader) + i)
            
            batch_loss.backward()
            optimizer.step()

            if (i + 1) % 1500 == 0 or i+1 == len(train_loader) or i == 0:
                print(f"Step [{str(i + 1).zfill(len(str(len(train_loader))))}/{len(train_loader)}]", end='')
                print("  Loss: {:.4f}".format(np.sum(current_losses) / len(current_losses)))

        epoch_losses.append(current_losses)

        print('########## Validation ##########')
        model.eval()
        current_accuracies = []

        for i, (x, y) in enumerate(val_loader):
            x = torch.squeeze(x).to(device)
            y = torch.squeeze(to_one_hot(y, n_classes)).to(device)

            logits = model.forward(x)
            current_accuracies.append(accuracy(logits.detach().cpu().numpy(), y.detach().cpu().numpy()))

        epoch_accuracies.append(current_accuracies)
        avg_accuracy = np.sum(current_accuracies) / len(current_accuracies)
        writer.add_scalar("Accuracy/validation", avg_accuracy, epoch)

        if avg_accuracy > best_accuracy:  # save model if accuracy is best one yet
            n_not_improved = 0
            best_accuracy = avg_accuracy
            if not os.path.exists('./model_saves'):
                os.mkdir('./model_saves')
            torch.save(model.state_dict(), f'./model_saves/{model_name}.pt')

        else:
            n_not_improved += 1
        
        print('Average accuracy over validation set:  {:.4f}\n'.format((avg_accuracy)))  
        if n_not_improved == n_early_stop:
            print('Early stopping initiated!')
            break

    return epoch_losses, epoch_accuracies

In [183]:
# hyperparameters
LR = 0.01
EPOCHS = 1000
N_EARLY_STOPS = 50

train_set = TensorDataset(train_features, train_labels)
test_set = TensorDataset(test_features, test_labels)
val_set = TensorDataset(val_features, val_labels)

train_loader = DataLoader(train_set)
test_loader = DataLoader(test_set)
val_loader = DataLoader(val_set)

In [184]:
model = ClassificationAverageModel(embedding, embed_size, len(labels))
model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
loss = nn.CrossEntropyLoss()

train_losses, val_accuracies = train(train_loader, val_loader, model, loss, optimizer, len(labels), EPOCHS, DEVICE, N_EARLY_STOPS)

Epoch [1/1000]
--------------------------------
########### Training ###########
Step [001/189]  Loss: 2.4840
Step [189/189]  Loss: 2.2696
########## Validation ##########
Average accuracy over validation set:  0.5023

Epoch [2/1000]
--------------------------------
########### Training ###########
Step [001/189]  Loss: 2.1864
Step [189/189]  Loss: 2.1633
########## Validation ##########
Average accuracy over validation set:  0.5395

Epoch [3/1000]
--------------------------------
########### Training ###########
Step [001/189]  Loss: 2.1254
Step [189/189]  Loss: 2.1306
########## Validation ##########
Average accuracy over validation set:  0.5473

Epoch [4/1000]
--------------------------------
########### Training ###########
Step [001/189]  Loss: 2.0987
Step [189/189]  Loss: 2.1135
########## Validation ##########
Average accuracy over validation set:  0.5504

Epoch [5/1000]
--------------------------------
########### Training ###########
Step [001/189]  Loss: 2.0839
Step [189/189]

In [100]:
# add model graph to tensorboard

writer = SummaryWriter('runs/model_graph')

dataiter = iter(train_loader)
text, _ = next(dataiter)
writer.add_graph(best_model, torch.squeeze(text))

writer.close()

## Testing

In [185]:
def test(dataloader, model, loss, n_classes, device):

    print('########### Testing ###########')
    model.eval()
    current_accuracies = []
    current_losses = []

    for i, (x, y) in enumerate(dataloader):
        x = torch.squeeze(x).to(device)
        y = torch.squeeze(to_one_hot(y, n_classes)).to(device)

        logits = model.forward(x)
        
        batch_accuracy = accuracy(logits.detach().cpu().numpy(), y.detach().cpu().numpy())
        batch_loss = loss(logits, y)
        
        current_losses.append(batch_loss.detach().cpu().numpy())
        current_accuracies.append(batch_accuracy)

    avg_accuracy = np.sum(current_accuracies) / len(current_accuracies)
    avg_loss = np.sum(current_losses) / len(current_losses)

    print('Average accuracy over test set:\t{:.4f}'.format((avg_accuracy)))
    print('Average loss over test set:\t{:.4f}\n'.format((avg_loss)))

    return current_losses, current_accuracies

In [186]:
best_model = ClassificationAverageModel(embedding, embed_size, len(labels))
best_model.to(DEVICE)
best_model.load_state_dict(torch.load('./model_saves/model.pt'))

test_loss_func = nn.CrossEntropyLoss()

test_losses, test_accuracies = test(test_loader, best_model, test_loss_func, len(labels), DEVICE)

########### Testing ###########
Average accuracy over test set:	0.7387
Average loss over test set:	1.8880



## Part 2: Document Classification using DistilBERT (Huggingface)

In this section, I use a transformer-based model—**DistilBERT**—to classify the same documents. DistilBERT is a smaller, faster version of BERT and is well-suited for tasks with limited resources.

Instead of using manually averaged word embeddings, this model learns contextualized embeddings directly from subword tokens. The `[CLS]` token representation is extracted for classification, followed by a dropout and a linear layer.

This section highlights the power of transformers in comparison to traditional embedding-based models.


In [8]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from transformers import DistilBertTokenizer, DistilBertModel
import numpy as np
from tqdm.auto import tqdm
from sklearn.metrics import accuracy_score
import torch.nn.functional as F

## Loading BERT model

In [67]:
#We picked the DistilBert instead of the normal Bert to get faster training. 
#DistilBert resulted in decent performance anyways.

model_name = 'distilbert-base-uncased'

tokenizer = DistilBertTokenizer.from_pretrained(model_name)
distil_bert_model = DistilBertModel.from_pretrained(model_name)

#Just familiarizing ourselved with the BERT's structure.
get_vocab = tokenizer.get_vocab()


## BERT tokenization

In [102]:
#We decided to implement classes for the purpose of tokenization. This seemed to be a very convenient way. 

class TokenizingDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_doc_len=50):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_doc_len = max_doc_len

    def __len__(self):
        return len(self.texts)
    
    #So, basically we use the tokenizer, the encoder from the transformers library
    #it outputs input_ids and an attention mask, both of them being of utmost importance for the BERT model
    #additionally, we also get the labels in correct tensor shape, making the dataset ready for training

    def __getitem__(self, index):
        encodings = self.tokenizer.encode_plus(
            self.texts[index],
            add_special_tokens=True,
            max_length=self.max_doc_len,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        input_ids = encodings['input_ids'].squeeze()
        attention_mask = encodings['attention_mask'].squeeze()
        labels = torch.tensor(self.labels[index], dtype=torch.long)
        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels
        }

## The Model

In [69]:
class ClassificationDistilBERTModel(nn.Module):
    def __init__(self, n_classes):
        
        super(ClassificationDistilBERTModel, self).__init__()
        
        self.distilbert = distil_bert_model
        
        #it's crucial that we use a linear transfromation before the final classifier 
        #the output has the shape of the BERT model 
        #and is mapped via the next linear transformation to the number of classes 
        #the dropout probability is used to make sure that the model is not overfitting on the training data 
        
        self.transform_layer = nn.Linear(self.distilbert.config.dim, self.distilbert.config.dim)
        self.output_classfier = nn.Linear(self.distilbert.config.dim, n_classes)
        self.regularizer = nn.Dropout(p = 0.2)

    def forward(self, input_ids, attention_mask, labels=None,  return_probs=False):
        
        model_output = self.distilbert(input_ids=input_ids, attention_mask=attention_mask)
        
        #we are retaining the output of the model
        #consequently the tokens, the features in other words are extracted 
        
        
        features_extract = model_output[0][:, 0] 
        transformed = self.transform_layer(features_extract)  
        
        #we also decided to make sure there is some non-linearity involved, 
        #since our model performed terrible at the beginning
        
        activated = nn.ReLU()(transformed)  
        regularized_features = self.regularizer(activated)  
        
        logits =  self.output_classfier(regularized_features)  
        
        if return_probs:
            #whenever asked the probabilities can be returned
            #the softmax function easily translates the logits into probabilities 
            softmax_transform = F.softmax(logits, dim=-1)
            return softmax_transform
        
        return logits

## Training the BERT model 

In [75]:
def train_BERT(train_loader, val_loader, model, loss, optimizer, epochs, device, progress_every=10, n_early_stop=50, model_name='bert model'):
    
    best_val_accuracy = 0
    n_not_improved = 0
    
    for epoch in range(epochs):
        
        model.train()
        current_loss = 0.0
        
        correct_pred = 0
        nr_of_pred = 0
        
        tracking_progress = tqdm(enumerate(train_loader), total=len(train_loader), desc="Starting Epoch {}".format(epoch+1))
        
        #So, we go through all batches, most of the training process is 1:1 the same as in Task A 
        #however, we do the batching, the loss, and accuracy computation a bit differently here
        
        for i, batch in tracking_progress:
            
            batch = {key: value.to(device) for key, value in batch.items()}
            
            optimizer.zero_grad()
            
            logits = model(**batch)
            y = batch['labels']
            batch_loss = loss(logits, y)
            
            batch_loss.backward()
            optimizer.step()
            
            current_loss += batch_loss.item()
            _, predicted = torch.max(logits, dim=1)
            
            #for the accuracies 
            
            correct_pred += (predicted == batch['labels']).sum().item()
            nr_of_pred += batch['labels'].size(0)
            
            if (i + 1) % progress_every == 0:
                tracking_progress.set_postfix({
                    'Loss at the moment': current_loss / (i + 1),
                    'Accuracy at the moment': correct_pred / nr_of_pred
                })
        
    
        avg_loss = current_loss / len(train_loader)
        avg_accuracy = correct_pred / nr_of_pred
        
        tqdm.write(f'Epoch {epoch+1} ended, Average Loss during training: {avg_loss}, Average Accuracy during training: {avg_accuracy}')

        val_accuracy = model_eval(model, val_loader)
        tqdm.write(f'Average accuracy over validation set: {val_accuracy}')
        
        #Just keeping the best performing model 
        
        if val_accuracy > best_val_accuracy:
            best_val_accuracy = val_accuracy
            n_not_improved = 0
            torch.save(model.state_dict(), f'./model_saves/{model_name}.pt')
            tqdm.write('Out of all configuration the model was saved as the best model till now.')
            
        else:
            n_not_improved += 1
            if n_not_improved >= n_early_stop:
                tqdm.write('Early stopping initiated!')
                break
        
        #Early stopping is crucial for avoiding overfitting 
        

#this function is mainly used for the validation set 
#so wehenever one epoch is finished it is evaluated via this method 
                

def model_eval(current_model, dataloader):
    current_model.eval()
    
    predicted_labels = []
    true_labels = []
    
    tracking_progress = tqdm(dataloader, desc='Starting the evaluation:')
    
    with torch.no_grad():
        for batch in tracking_progress:
            
            batch = {key: value.to(DEVICE) for key, value in batch.items()}
            
            #here we use the probabilties 
            
            probs = model(**batch, return_probs=True)
            preds = torch.argmax(probs, dim=1)
            
            predicted_labels.extend(preds.cpu().numpy())
            true_labels.extend(batch['labels'].cpu().numpy())
            
    calc_accuracy = accuracy_score(true_labels, predicted_labels)
    
    return calc_accuracy


In [76]:
#Defining the model, the optimizer and the loss function of our interest
#We tried out several learning rates to enhance performance

model = ClassificationDistilBERTModel(n_classes=len(labels))
model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  
loss = nn.CrossEntropyLoss()

In [77]:
train_texts = train_data['text'].tolist()
train_labels = train_data['labels'].tolist()
val_texts = val_data['text'].tolist()
val_labels = val_data['labels'].tolist()
test_texts = test_data['text'].tolist()
test_labels = test_data['labels'].tolist()

train_dataset_bert = TokenizingDataset(train_texts, train_labels, tokenizer)
val_dataset_bert = TokenizingDataset(val_texts, val_labels, tokenizer)
test_dataset_bert = TokenizingDataset(test_texts, test_labels, tokenizer)

TRAIN_LOADER = DataLoader(train_dataset_bert, batch_size=16, shuffle=True)
VAL_LOADER = DataLoader(val_dataset_bert, batch_size=16)
TEST_LOADER = DataLoader(test_dataset_bert, batch_size=16)


## The real training 

In [79]:
train_BERT(TRAIN_LOADER, VAL_LOADER, model, loss, optimizer, epochs=2, device=DEVICE, progress_every=5, n_early_stop=50, model_name='bert_model')

Starting Epoch 1:   0%|          | 0/757 [00:00<?, ?it/s]

Epoch 1 ended, Average Loss during training: 0.6185271241007031, Average Accuracy during training: 0.8336085879438481


Starting the evaluation::   0%|          | 0/163 [00:00<?, ?it/s]

Average accuracy over validation set: 0.8124036979969184
The model performing the best until now was saved.


Starting Epoch 2:   0%|          | 0/757 [00:00<?, ?it/s]

Epoch 2 ended, Average Loss during training: 0.4149468993987086, Average Accuracy during training: 0.8844756399669694


Starting the evaluation::   0%|          | 0/163 [00:00<?, ?it/s]

Average accuracy over validation set: 0.7981510015408321


In [88]:
def getting_accuracies(data_loader, model_saves_path, device):
    # Load the best model
    model =  ClassificationDistilBERTModel(n_classes=len(labels))  
    model.load_state_dict(torch.load(model_saves_path))
    model.to(device)
    
    # Evaluate the model on the test set
    accuracy = model_eval(model, data_loader)
    
    return accuracy

In [89]:
#Getting the accuracy of the best model on the test set

path_to_the_best_model = './model_saves/bert_model.pt'
test_accuracy = getting_accuracies(TEST_LOADER, path_to_the_best_model, DEVICE)

print('Our model achieved the following accuracy on the test set {}'.format(test_accuracy))

Starting the evaluation::   0%|          | 0/163 [00:00<?, ?it/s]

Our model achieved the following accuracy on the test set 0.8196531791907514


In [90]:
#Getting the accuracy of the best model on the validation set

path_to_the_best_model = './model_saves/bert_model.pt'
validation_accuracy = getting_accuracies(VAL_LOADER, path_to_the_best_model, DEVICE)

print('Our  best performing model achieved the following accuracy on the validation set {}'.format(validation_accuracy))

Starting the evaluation::   0%|          | 0/163 [00:00<?, ?it/s]

Our  best performing model achieved the following accuracy on the validation set 0.811633281972265


## Table of accuracies

In [101]:
data = {'Test set accuracy': [test_accuracy],
        'Validation set accuracy': [validation_accuracy]}
table = pd.DataFrame(data)

table = table.to_string(index=False)

print(table)

 Test set accuracy  Validation set accuracy
          0.819653                 0.811633
