# NBAiLab - Finetuning and Evaluating a BERT model for Classification
<img src="https://raw.githubusercontent.com/NBAiLab/notram/master/images/nblogo_2.png">


In this notebook we will finetune the [NB-BERTbase Model](https://github.com/NBAiLab/notram) released by the National Library of Norway. This is a model trained on a large corpus (110GB) of Norwegian texts. 

We will finetune this model on a sentiment classification task based on the [NoReC: The Norwegian Review Corpus](https://github.com/ltgoslo/norec). By simply replacing the dataset, you should be able to use this code to train any classifier.

After training, we will use the model for predictions on the test set and evaluate the results.

This notebook is intended for experimentation with the pre-release NoTram models from the National Library of Norway, and is made for educational purposes. If you just want to use a model, you can instead load one of our finetuned models.

## Before proceeding
Create a copy of this notebook by going to "File - Save a Copy in Drive"

In [None]:
!pip install transformers

import pandas as pd
import numpy as np
import tensorflow as tf
import json
import math
from transformers import BertTokenizer, AutoConfig, TFAutoModelForSequenceClassification, optimization_tf

# Settings
Try running this with the default settings first; they should give you a pretty good result. You can then experiment with the other settings to get even better results. A warmup of around 10% of the total training steps usually gives you more stable results.

In [7]:
#@markdown Set the main model that the training should start from
model_name = 'NbAiLab/nb-bert-base' #@param ["NbAiLab/nb-bert-base", "bert-base-multilingual-cased"]
#@markdown ---
#@markdown Set training parameters
batch_size = 8 #@param {type: "integer"}
init_lr = 3e-5 #@param {type: "number"}
end_lr = 0 #@param {type: "number"}
num_warmup_steps = 300 #@param {type: "number"}
num_epochs = 4 #@param {type: "integer"}
max_seq_length = 512
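
The 10% warmup guideline can be turned into a concrete value for `num_warmup_steps` once the size of the training set is known. The following is a minimal sketch; the helper name `suggest_warmup_steps` and the example size of 10,000 training items are hypothetical and only serve as an illustration.

```python
import math

def suggest_warmup_steps(num_examples, batch_size, num_epochs, warmup_fraction=0.1):
    """Return the number of warmup steps covering `warmup_fraction` of all training steps."""
    # The last, partial batch of an epoch also counts as a step
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    return round(total_steps * warmup_fraction)

# Hypothetical example: 10,000 training items, batch size 8, 4 epochs
print(suggest_warmup_steps(num_examples=10_000, batch_size=8, num_epochs=4))
```

The returned value could be used for `num_warmup_steps` in place of the fixed default whenever you change the dataset, batch size or number of epochs.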

# Load and Prepare the Dataset used for Finetuning
The current dataset is loaded directly from a web resource. It is a csv-file without a header row, coded for positive/negative sentiment (1/0) in the first column, with the text in the second column. You can replace this with any other data source. The texts are tokenized and then converted into the tensor slices that TensorFlow needs.

In [None]:
train_data = pd.read_csv('https://raw.githubusercontent.com/ltgoslo/NorBERT/main/benchmarking/data/sentiment/no/train.csv', header=None)
dev_data = pd.read_csv('https://raw.githubusercontent.com/ltgoslo/NorBERT/main/benchmarking/data/sentiment/no/dev.csv', header=None)
test_data = pd.read_csv('https://raw.githubusercontent.com/ltgoslo/NorBERT/main/benchmarking/data/sentiment/no/test.csv', header=None)


#Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)

#Turn text into tokens
train_encodings = tokenizer(list(train_data[1]), truncation=True, padding=True, max_length=max_seq_length)
dev_encodings = tokenizer(list(dev_data[1]), truncation=True, padding=True, max_length=max_seq_length)
test_encodings = tokenizer(list(test_data[1]), truncation=True, padding=True, max_length=max_seq_length)

#Create a tensorflow dataset
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings),list(train_data[0])))
dev_dataset = tf.data.Dataset.from_tensor_slices((dict(dev_encodings),list(dev_data[0])))
test_dataset = tf.data.Dataset.from_tensor_slices((dict(test_encodings),list(test_data[0])))


print(f'The dataset is imported.\n\nThe training dataset has {len(train_dataset)} items.\nThe development dataset has {len(dev_dataset)} items. \nThe test dataset has {len(test_dataset)} items')
steps = math.ceil(len(train_dataset)/batch_size)
print(f'You are planning to train for a total of {steps} steps * {num_epochs} epochs = {num_epochs*steps} steps. Warmup is {num_warmup_steps}, {int(100*num_warmup_steps/(steps*num_epochs))}%. We recommend at least 10%.')
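
If you replace the dataset, the loading code above reads the label from column 0 and the text from column 1 of a csv-file without a header row. A quick check like the sketch below can catch layout problems before tokenization. The function name `check_rows` and the two sample rows are made up for illustration, and the check assumes the labels are 0 or 1; only Python's standard `csv` module is used.

```python
import csv
import io

def check_rows(csv_text, allowed_labels=("0", "1")):
    """Parse header-less CSV text and return (label, text) pairs, raising on bad rows."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < 2:
            raise ValueError(f"Row {line_no}: expected at least 2 columns, got {len(row)}")
        label, text = row[0], row[1]
        if label not in allowed_labels:
            raise ValueError(f"Row {line_no}: unexpected label {label!r}")
        if not text.strip():
            raise ValueError(f"Row {line_no}: empty text")
        rows.append((int(label), text))
    return rows

# Made-up sample in the expected layout: label first, text second, no header
sample = '1,"En veldig bra film."\n0,"Kjedelig og altfor lang."\n'
print(check_rows(sample))
```

For a classifier with more than two classes, you would pass a longer `allowed_labels` tuple here and adjust `num_labels` when the model is initialised.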


# Start Training
Here we are using the TensorFlow interface provided by Hugging Face. Hugging Face also has a native interface, as well as one for PyTorch. To see an example of how to use the native interface, please take a look at our notebook about NER/POS.

In [None]:
# Calculate the number of training steps (the last, partial batch of an epoch also counts as a step)
train_steps_per_epoch = math.ceil(len(train_dataset) / batch_size)
num_train_steps = train_steps_per_epoch * num_epochs

# Initialise a Model for Sequence Classification with 2 labels
config = AutoConfig.from_pretrained(model_name, num_labels=2)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, config=config)

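# Creating a scheduler gives us a bit more control.
# min_lr_ratio is the final learning rate as a fraction of init_lr, so this applies the end_lr setting.
optimizer, lr_schedule = optimization_tf.create_optimizer(init_lr=init_lr, num_train_steps=num_train_steps, num_warmup_steps=num_warmup_steps, min_lr_ratio=end_lr / init_lr)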

#Compile the model
model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy']) # can also use any keras loss fn

# Start training (the datasets are already batched, so no batch_size argument is passed to fit)
history = model.fit(train_dataset.shuffle(1000).batch(batch_size), validation_data=dev_dataset.shuffle(1000).batch(batch_size), epochs=num_epochs)

print(f'\nThe training has finished after {num_epochs} epochs.')
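
To get a feel for what the warmup setting does, the learning rate can be traced by hand. The sketch below assumes a linear warmup from 0 to `init_lr` followed by a linear decay to `end_lr`, which approximates the kind of schedule `create_optimizer` builds with its default settings. It is an illustration in plain Python, not the library's implementation. The function name `learning_rate_at` and the total of 5,000 training steps are hypothetical.

```python
def learning_rate_at(step, init_lr, end_lr, num_train_steps, num_warmup_steps):
    """Learning rate at `step` for a linear warmup followed by a linear decay."""
    if step < num_warmup_steps:
        # Warmup: grow linearly from 0 to init_lr
        return init_lr * step / num_warmup_steps
    # Decay: shrink linearly from init_lr to end_lr over the remaining steps
    decay_steps = num_train_steps - num_warmup_steps
    progress = min(step - num_warmup_steps, decay_steps) / decay_steps
    return end_lr + (init_lr - end_lr) * (1 - progress)

# Trace the schedule at a few steps (hypothetical total of 5,000 steps)
for step in (0, 150, 300, 2650, 5000):
    lr = learning_rate_at(step, init_lr=3e-5, end_lr=0.0,
                          num_train_steps=5000, num_warmup_steps=300)
    print(step, lr)
```

The learning rate peaks at the end of the warmup and then falls steadily, which is why a very short warmup exposes the freshly initialised classification layer to the full learning rate almost immediately.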



# Run Predictions on the Test Dataset
When you have finished the training and are satisfied with the result, you are ready to see how the model performs on your test set. Typically you would save your model first, and load it again. However, this is explained in our NER/POS notebook, so we are skipping this to make the notebook shorter.

Here we show how you can calculate the F1-score. The macro average (the "macro avg" row in the report) is typically the figure you want to report.


In [None]:
from sklearn.metrics import classification_report
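# Predict on the batched test dataset so the attention mask is passed along with the input ids, as during training
y_pred = model.predict(test_dataset.batch(batch_size))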
y_pred_bool = np.argmax(y_pred['logits'], axis=1)

print(classification_report(test_data[0], y_pred_bool, digits=4))
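
The "macro avg" row in the report is the unweighted mean of the per-class scores, so both classes count equally even if one of them is much more frequent. As a rough illustration of what is being computed, here is a plain-Python sketch of the macro-averaged F1-score. The function name `macro_f1` and the toy labels are invented for the example; in practice `classification_report` (or `sklearn.metrics.f1_score` with `average='macro'`) does this for you.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example with invented labels
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(round(macro_f1(y_true, y_pred), 4))
```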