<a href="https://colab.research.google.com/github/Praveen76/Introduction-to-Huggingface/blob/main/Fine_Tunning_BERT_Model_on_GLUE_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Hugging Face: Fine-Tuning BERT Using Built-in Data**

### Objectives:

By the end of this experiment, you will understand:

* How datasets are represented in Hugging Face
* The `Trainer` and `TrainingArguments` objects
* Computing metrics
* Saving and loading the trained model

In [None]:
!pip install accelerate -U


Accelerate is a library that lets the same PyTorch code run across any distributed configuration by adding just four lines of code, making training and inference at scale simple, efficient, and adaptable.

In [None]:
!pip install transformers

Install the Hugging Face Datasets library:


In [None]:
!pip install datasets

### **Loading the dataset**
Built-in datasets are available through the `datasets` package. With a single line of code, we can load a dataset that is already well formatted and preprocessed for us.

In [None]:
import numpy as np
from datasets import load_dataset
#rd1=load_dataset("amazon_polarity") # dataset for sentiment analysis
rd2=load_dataset("glue","sst2")     # Part of GLUE Benchmark

Calling the `load_dataset` function returns a `DatasetDict` object, which behaves like a dictionary and may contain multiple datasets, such as a train, a validation, and a test dataset.

In [None]:
rd2

Accessing the individual splits and entries:

In [None]:
rd2['train']

In [None]:
print(rd2['train'], '\n')
type(rd2['train'])

In [None]:
rd2['train'].data

In [None]:
rd2['train'][0]

In [None]:
rd2['train'][10:13]

In [None]:
rd2['train'].features

### **Tokenization with DataSets**
Recall that we previously mentioned the need for truncation, padding, and conversion of the tokenized data into PyTorch tensors. Here, these steps are largely handled behind the scenes, so we do not have to do them by hand. All we need to do is apply the tokenizer to the `DatasetDict` object.



In [None]:
from transformers import AutoTokenizer

In [None]:
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [None]:
tokenized_sentence = tokenizer(rd2['train'][0:3]['sentence'])
from pprint import pprint
pprint(tokenized_sentence)

By calling the `map` method and passing in a tokenization function, the `datasets` library automatically applies the same function to every split, i.e. train, validation, and test.

Note that in this example we apply only truncation, not padding or conversion to PyTorch tensors. Those steps are handled behind the scenes by the `Trainer` object we create later.
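Deferring padding means each batch only needs to be padded to the length of its own longest sequence. The sketch below mimics that idea in plain Python; it is not the library's implementation. The helper `pad_batch`, the token ids, and the padding id of 0 are all assumptions made for illustration.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad every sequence to the length of the longest one in this batch."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token ids for two sentences of different lengths.
batch = [[101, 7592, 102], [101, 2023, 2003, 2307, 102]]
padded = pad_batch(batch)
print(padded)
```

Padding per batch rather than to one global maximum length keeps short batches short, which saves computation.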

In [None]:
def tokenize_fn(batch):
  return tokenizer(batch['sentence'],truncation=True)

In [None]:
tokenized_datasets=rd2.map(tokenize_fn,batched=True)

### **Loading Pre-Trained Model**

In [None]:
from transformers import AutoModelForSequenceClassification
# num_labels=2 because SST-2 is a binary sentiment task
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

In [None]:
type(model)

In [None]:
model

In [None]:
!pip install torchinfo

In [None]:
from torchinfo import summary
summary(model)

#### **Adding a custom classifier layer and freezing/unfreezing layers**

In [None]:
for name, param in model.named_parameters():
  print(name)

In [None]:
# Freeze the pretrained DistilBERT body so that only the classification head is trained
for name, param in model.named_parameters():
  if name.startswith("distilbert"):  # choose whatever prefix you like here
    param.requires_grad = False

In [None]:
# Show which parameters will receive gradient updates
for name, param in model.named_parameters():
  print(name, param.requires_grad)
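A quick way to confirm the effect of freezing is to count trainable parameters. The sketch below does this on a tiny stand-in model rather than DistilBERT; the names `toy_model`, `body`, and `head` are hypothetical.

```python
import torch

# Hypothetical stand-in for a pretrained body plus a classification head.
toy_model = torch.nn.ModuleDict({
    "body": torch.nn.Linear(8, 4),
    "head": torch.nn.Linear(4, 2),
})

# Freeze by name prefix, the same pattern used on the real model.
for name, param in toy_model.named_parameters():
    if name.startswith("body"):
        param.requires_grad = False

total = sum(p.numel() for p in toy_model.parameters())
trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable} of {total}")
```

The same two `sum(...)` lines can be run on `model` to see how many parameters remain trainable after freezing.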

#### **Modifying last classifier layers**

In [None]:
import torch

In [None]:
model.pre_classifier

In [None]:
print(model.pre_classifier)
model.pre_classifier = torch.nn.Linear(768,500)
print(model.pre_classifier)

In [None]:
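print(model.classifier)
# Input size must match pre_classifier's output (500); output size must equal num_labels (2 for SST-2)
model.classifier = torch.nn.Linear(500, 2)
print(model.classifier)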

In [None]:
model
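When layers in the head are replaced, their sizes have to chain: the output width of one layer is the input width of the next, and the final width is the number of labels. The sketch below checks this on random data, assuming a head that applies a linear layer, a ReLU, and a second linear layer (dropout is omitted for brevity). The variable names and the batch of 4 are assumptions for illustration.

```python
import torch

hidden_size, mid_size, num_labels = 768, 500, 2  # 2 labels for a binary task

pre_classifier = torch.nn.Linear(hidden_size, mid_size)
classifier = torch.nn.Linear(mid_size, num_labels)

# Stand-in for the encoder's output at the first token: a batch of 4 vectors.
pooled = torch.randn(4, hidden_size)

hidden = torch.relu(pre_classifier(pooled))  # shape (4, 500)
logits = classifier(hidden)                  # shape (4, 2): one score per label
print(logits.shape)
```

If the sizes did not chain, the second call would raise a shape-mismatch error, which makes this a cheap check before starting a training run.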

#### **Adding extra layer on top**

In [None]:
import copy

In [None]:
cp_model = copy.deepcopy(model)

In [None]:
cp_model.classifier = torch.nn.Sequential(
    torch.nn.Linear(500, 192),  # input size must match pre_classifier's output (500)
    torch.nn.Dropout(0.1),
    torch.nn.Linear(192, 2))

In [None]:
cp_model

In [None]:
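params_before = []
for name, p in model.named_parameters():
  # .copy() takes a snapshot; a bare .numpy() on a CPU tensor shares memory with the live parameter
  params_before.append(p.detach().cpu().numpy().copy())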

In [None]:
params_before

### **Metrics**

Hugging Face provides predefined metrics for the built-in datasets. We are going to use one of them, but we can define custom metrics as well.

Specifically, there is a function called `load_metric`, which is also part of the `datasets` library. For this example, its arguments happen to be the same as the arguments to `load_dataset`.

More generally, you can pass in an argument to select a specific metric, such as the BLEU score, or a Python file in which you have defined your own computations.

In [None]:
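# Note: load_metric has been removed from recent releases of datasets;
# there, evaluate.load("glue", "sst2") from the evaluate package is the replacement.
from datasets import load_metric
metric = load_metric("glue", "sst2")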

However, we cannot pass this metric to the trainer directly. The trainer hands our function the model's logits together with the labels, so we need to define another function that first converts the logits into predicted class indices.

First, the metric we get back from `load_metric` is an object with a method called `compute`, which returns a dictionary containing the computed metrics for a given task. The `compute` method takes two arrays: the predictions and the targets, which it calls references.

It gives us back a dictionary whose keys are the metric names and whose values are the computed scores.

In [None]:
metric.compute(predictions=[1,0,1], references=[1,0,0])

In [None]:
def compute_metrics(logits_and_labels):
  logits, labels = logits_and_labels
  predictions =np.argmax(logits,axis=-1)
  return metric.compute(predictions=predictions,references=labels)
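To see what happens inside `compute_metrics`, the sketch below performs the same two steps by hand with NumPy: an argmax over the class axis, then a comparison against the labels. The logits are invented for illustration, and accuracy is computed directly here instead of through the metric object.

```python
import numpy as np

# Invented logits for three examples and two classes.
logits = np.array([[0.2, 1.5],
                   [2.0, -1.0],
                   [0.1, 0.9]])
labels = np.array([1, 0, 0])

# The predicted class is the index of the largest logit in each row.
predictions = np.argmax(logits, axis=-1)

# Accuracy is the fraction of predictions that match the labels.
accuracy = float((predictions == labels).mean())
print(predictions, accuracy)
```

Two of the three predictions match the labels, so the accuracy is 2/3.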

### **Training argument class**


The `TrainingArguments` class acts as a training configuration. It lets us specify things such as where to save the outputs of the training process, how often to compute metrics, how many epochs to train for, the learning rate, and many more. The constructor takes a large number of arguments, so check the documentation.

Note that the Hugging Face trainer uses an optimizer called **`AdamW`** by default, which is a slight variation of Adam. It is built in and good enough for our purposes. If you want to use a different optimizer, you can always write your own custom training loop in plain PyTorch.
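The "slight variation" is that AdamW applies weight decay directly to the weights instead of folding it into the gradient. The following NumPy sketch of a single update step illustrates this under stated assumptions: the function name `adamw_step` is hypothetical, and the hyperparameter values are illustrative rather than the trainer's defaults.

```python
import numpy as np


def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update; returns the new (theta, m, v)."""
    # Decoupled weight decay: shrink the weights directly, outside the
    # gradient-based moment estimates.
    theta = theta * (1.0 - lr * weight_decay)
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta = np.array([1.0, -2.0])
zeros = np.zeros_like(theta)

# With a zero gradient, the only change comes from the weight decay term.
new_theta, m, v = adamw_step(theta, grad=zeros, m=zeros, v=zeros, t=1)
print(new_theta)
```

With a zero gradient the Adam part of the update vanishes, so the weights simply shrink by the factor `1 - lr * weight_decay`, which isolates the decoupled decay.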

In [None]:
from transformers import TrainingArguments

In [None]:
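training_args = TrainingArguments(
    'sample_trainer',             # output directory for checkpoints and the trained model
    evaluation_strategy='epoch',  # evaluate after every epoch (newer transformers releases name this eval_strategy)
    save_strategy='epoch',        # save a checkpoint after every epoch
    num_train_epochs=1
)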

### **Trainer**

The `Trainer` object is what we use to run the training process. The arguments to its constructor are quite simple: we pass in the model, the training arguments, the training dataset, the validation dataset, the tokenizer, and the metrics function.

Now call the `train` method on the `trainer` object to start training.



In [None]:
from transformers import Trainer

In [None]:
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

In [None]:
trainer.train()
# ~2 min when tuning only the last classifier layers
# ~8 min when tuning the whole architecture

#### **Saving the model**

In [None]:
trainer.save_model('saved_model')

In [None]:
!ls

#### **Loading the saved model**

In [None]:
from transformers import pipeline

In [None]:
# device=0 places the pipeline on the first GPU
my_model = pipeline('text-classification', model='saved_model', device=0)  # ignore_mismatched_sizes=True)

In [None]:
my_model('this is not a good book to read')

#### **Checking whether all parameters have updated or not**

In [None]:
params_after=[]
for  name,p in model.named_parameters():
  params_after.append(p.detach().cpu().numpy())

In [None]:
for p1,p2 in zip(params_before, params_after):
  print(np.sum(np.abs(p1-p2)))
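The same before/after comparison can be tried on a tiny model, where the outcome is easy to predict: frozen parameters should show a difference of exactly zero and trainable ones a positive difference. This is a minimal sketch, assuming a hypothetical two-layer `toy_model`, random data, and a single SGD step in place of the full trainer.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in: a frozen body and a trainable head.
toy_model = torch.nn.ModuleDict({
    "body": torch.nn.Linear(8, 4),
    "head": torch.nn.Linear(4, 2),
})
for p in toy_model["body"].parameters():
    p.requires_grad = False

# clone() takes a real snapshot that later updates cannot overwrite.
before = {n: p.detach().clone() for n, p in toy_model.named_parameters()}

trainable = [p for p in toy_model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# One training step on random inputs and random binary labels.
x = torch.randn(16, 8)
y = torch.randint(0, 2, (16,))
logits = toy_model["head"](toy_model["body"](x))
loss = torch.nn.functional.cross_entropy(logits, y)
loss.backward()
optimizer.step()

# Sum of absolute differences per parameter, as in the check above.
diffs = {n: (p.detach() - before[n]).abs().sum().item()
         for n, p in toy_model.named_parameters()}
print(diffs)
```

Keying the differences by parameter name, rather than printing a bare list of numbers, makes it easier to see which layers actually moved.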