<a href="https://colab.research.google.com/github/Rami-RK/HugingFace_Transformers/blob/main/Hf_Fine_Tunning_BERT_using_Built_in_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Hugging Face: Fine-Tuning BERT Using Built-in Data**

### Objectives:

At the end of this experiment, you will understand:

* How datasets are represented in Hugging Face
* The `Trainer` and `TrainingArguments` objects
* Computing metrics
* Saving and loading the trained model

In [None]:
!pip install accelerate -U

In [None]:
!pip install transformers

Hugging Face Datasets Library


In [None]:
!pip install datasets

### **Loading the dataset**
Built-in datasets are available through the `datasets` package. With a single line of code, we can load a dataset that is already well formatted and preprocessed for us to use.

In [None]:
import numpy as np
from datasets import load_dataset
#rd1=load_dataset("amazon_polarity") # dataset for sentiment analysis
rd2=load_dataset("glue","sst2")     # Part of GLUE Benchmark

Calling `load_dataset` returns a `DatasetDict` object, which behaves like a dictionary that may contain several datasets, such as a train dataset, a validation dataset, and a test dataset.

In [None]:
rd2

Accessing individual splits and entries

In [None]:
rd2['train']

Dataset({
    features: ['sentence', 'label', 'idx'],
    num_rows: 67349
})

In [None]:
print(rd2['train'], '\n')
type(rd2['train'])

Dataset({
    features: ['sentence', 'label', 'idx'],
    num_rows: 67349
}) 



datasets.arrow_dataset.Dataset

In [None]:
rd2['train'].data

MemoryMappedTable
sentence: string
label: int64
idx: int32
----
sentence: [["hide new secretions from the parental units ","contains no wit , only labored gags ","that loves its characters and communicates something rather beautiful about human nature ","remains utterly satisfied to remain the same throughout ","on the worst revenge-of-the-nerds clichés the filmmakers could dredge up ",...,"you wish you were at home watching that movie instead of in the theater watching this one ","'s no point in extracting the bare bones of byatt 's plot for purposes of bland hollywood romance ","underdeveloped ","the jokes are flat ","a heartening tale of small victories "],["suspense , intriguing characters and bizarre bank robberies , ","a gritty police thriller with all the dysfunctional family dynamics one could wish for ","with a wonderful ensemble cast of characters that bring the routine day to day struggles of the working class to life ","nonetheless appreciates the art and reveals a music sc

In [None]:
rd2['train'][0]

{'sentence': 'hide new secretions from the parental units ',
 'label': 0,
 'idx': 0}

In [None]:
rd2['train'][10:13]

{'sentence': ['goes to absurd lengths ',
  "for those moviegoers who complain that ` they do n't make movies like they used to anymore ",
  "the part where nothing 's happening , "],
 'label': [0, 0, 0],
 'idx': [10, 11, 12]}

In [None]:
rd2['train'].features

{'sentence': Value(dtype='string', id=None),
 'label': ClassLabel(names=['negative', 'positive'], id=None),
 'idx': Value(dtype='int32', id=None)}

### **Tokenization with Datasets**
Recall that we previously mentioned the need for truncation, padding, and conversion of the tokenized data into PyTorch tensors. Here, padding and tensor conversion are handled behind the scenes, so we do not need to do them ourselves. All we do is apply the tokenizer, with truncation enabled, to the `DatasetDict` object.



In [None]:
from transformers import AutoTokenizer

In [None]:
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [None]:
tokenized_sentence = tokenizer(rd2['train'][0:3]['sentence'])
from pprint import pprint
pprint(tokenized_sentence)

{'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
 'input_ids': [[101, 5342, 2047, 3595, 8496, 2013, 1996, 18643, 3197, 102],
               [101,
                3397,
                2053,
                15966,
                1010,
                2069,
                4450,
                2098,
                18201,
                2015,
                102],
               [101,
                2008,
                7459,
                2049,
                3494,
                1998,
                10639,
                2015,
                2242,
                2738,
                3376,
                2055,
                2529,
                3267,
                102]]}


By calling the `map` method and passing in a tokenization function, the `datasets` library applies the same function to every split, i.e. train, validation, and test.

Note that in this example we apply only truncation, not padding or conversion into PyTorch tensors. Those steps are handled behind the scenes by the `Trainer` object we create later.
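As a rough illustration of what that later padding step does, the sketch below pads a batch of token ID lists to the length of its longest member and builds the matching attention masks. It is plain Python, not the library's implementation: `pad_batch` is a hypothetical helper, and the pad ID of 0 is assumed from the `padding_idx=0` shown in the model printout later in this notebook.

```python
def pad_batch(input_ids, pad_id=0):
    """Pad every sequence to the longest one in the batch (dynamic padding)."""
    max_len = max(len(ids) for ids in input_ids)
    padded, masks = [], []
    for ids in input_ids:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)        # real tokens, then padding
        masks.append([1] * len(ids) + [0] * n_pad)   # 1 = real token, 0 = padding
    return {"input_ids": padded, "attention_mask": masks}


# First two tokenized SST-2 sentences from the output above
batch = pad_batch([
    [101, 5342, 2047, 3595, 8496, 2013, 1996, 18643, 3197, 102],
    [101, 3397, 2053, 15966, 1010, 2069, 4450, 2098, 18201, 2015, 102],
])
print(batch["attention_mask"][0])
```

Padding each batch only to its own longest sequence, rather than to one global length, keeps batches of short sentences small. In `transformers`, this job is done by a data collator such as `DataCollatorWithPadding`.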

In [None]:
def tokenize_fn(batch):
  return tokenizer(batch['sentence'],truncation=True)

In [None]:
tokenized_datasets=rd2.map(tokenize_fn,batched=True)

### **Loading Pre-Trained Model**

In [None]:
from transformers import AutoModelForSequenceClassification
model =AutoModelForSequenceClassification.from_pretrained(checkpoint,num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.bias', 'classifier.weight', 'pre_classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
type(model)

transformers.models.distilbert.modeling_distilbert.DistilBertForSequenceClassification

In [None]:
model

In [None]:
!pip install torchinfo

In [None]:
from torchinfo import summary
summary(model)

#### **Adding a custom classifier layer and freezing/unfreezing different layers**

In [None]:
for name, param in model.named_parameters():
    if name.startswith("distilbert"):  # freeze the DistilBERT body; pick a different prefix to freeze other layers
        param.requires_grad = False

In [None]:
# Show which parameters will receive gradient updates
for name, param in model.named_parameters():
    print(name, param.requires_grad)
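To confirm how much of the network a freezing loop like the one above leaves trainable, one can count parameters by their `requires_grad` flag. The sketch below uses a tiny stand-in module (`model_sketch`, with hypothetical `body` and `head` layers) rather than DistilBERT, so it runs without downloading anything. The same two `sum` expressions should work on `model`.

```python
import torch

# Tiny stand-in model (an assumption for illustration, not DistilBERT itself)
model_sketch = torch.nn.ModuleDict({
    "body": torch.nn.Linear(8, 4),
    "head": torch.nn.Linear(4, 2),
})

# Freeze every parameter whose name starts with "body", mirroring the loop above
for name, param in model_sketch.named_parameters():
    if name.startswith("body"):
        param.requires_grad = False

# Count parameter elements by whether they will receive gradient updates
trainable = sum(p.numel() for p in model_sketch.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model_sketch.parameters() if not p.requires_grad)
print(f"trainable: {trainable}, frozen: {frozen}")
```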

#### **Modifying the last classifier layers**

In [None]:
import torch

In [None]:
model.pre_classifier

Linear(in_features=768, out_features=768, bias=True)

In [None]:
print(model.pre_classifier)
model.pre_classifier = torch.nn.Linear(768,384)
print(model.pre_classifier)

Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=384, bias=True)


In [None]:
print(model.classifier)
model.classifier = torch.nn.Linear(384,2)
print(model.classifier)

Linear(in_features=768, out_features=2, bias=True)
Linear(in_features=384, out_features=2, bias=True)


In [None]:
model

#### **Adding extra layer on top**

In [None]:
import copy

In [None]:
cp_model = copy.deepcopy(model)

In [None]:
cp_model.classifier = torch.nn.Sequential(
    torch.nn.Linear(384, 192),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(192, 2),
)

In [None]:
cp_model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [None]:
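# Snapshot the parameter values before training; .copy() keeps the snapshot
# from sharing memory with the live weights
params_before = []
for name, p in model.named_parameters():
    params_before.append(p.detach().cpu().numpy().copy())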

In [None]:
params_before

### **Metrics**

Hugging Face provides predefined metrics for the built-in datasets, and we are going to use one of them, although we can define custom metrics as well.

Specifically, there is a function called `load_metric`, which is also part of the `datasets` library. For this example, its arguments happen to be the same as the arguments passed to `load_dataset`. Note that recent releases of `datasets` have deprecated `load_metric` in favor of `evaluate.load` from the separate `evaluate` library.

More generally, you can pass an argument that names a specific metric, such as the BLEU score, or the path to a Python file in which you have defined your own computation.

In [None]:
from datasets import load_metric
metric =load_metric("glue","sst2")

However, we cannot pass this metric into the trainer directly. We need to define another function to do some extra work.

First, the metric returned by `load_metric` is an object with a method called `compute`. This method takes two arrays: the predictions and the targets, which it calls references.

It returns a dictionary whose keys are the metric names and whose values are the computed scores.

In [None]:
metric.compute(predictions=[1,0,1], references=[1,0,0])

{'accuracy': 0.6666666666666666}

In [None]:
def compute_metrics(logits_and_labels):
  logits, labels = logits_and_labels
  predictions =np.argmax(logits,axis=-1)
  return metric.compute(predictions=predictions,references=labels)
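To make the `argmax` step concrete, here is a small NumPy-only sketch. The logits are made up for illustration, and the accuracy is computed by hand as a stand-in for `metric.compute`. The resulting predictions and labels match the `[1,0,1]` versus `[1,0,0]` example above.

```python
import numpy as np

# Made-up logits for three examples and two classes (negative, positive)
logits = np.array([[0.2, 1.5],
                   [2.0, -1.0],
                   [0.1, 0.3]])
labels = np.array([1, 0, 0])

predictions = np.argmax(logits, axis=-1)            # index of the largest logit in each row
accuracy = float((predictions == labels).mean())    # fraction of correct predictions
print(predictions, accuracy)
```

Two of the three predictions agree with the labels, so the accuracy is 2/3, the same value `metric.compute` reported earlier.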

### **Training argument class**


The `TrainingArguments` class acts as a training configuration. It lets us specify things like where to save the outputs of the training process, how often to compute metrics, how many epochs to train for, the learning rate, and many more. The constructor accepts a large number of arguments, so check the documentation.

Note that the Hugging Face `Trainer` uses an optimizer called **`AdamW`** by default, which is a slight variation of Adam. It is built in and good enough for our purposes. If you want to use a different optimizer, you can always write your own custom training loop in plain PyTorch.
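To make the difference from Adam concrete, here is a minimal single-parameter sketch of one AdamW update. It follows the usual formulation in which weight decay is applied directly to the parameter instead of being added to the gradient. `adamw_step` is a hypothetical helper, and the hyperparameter values are illustrative, not necessarily those the `Trainer` uses.

```python
import math


def adamw_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (illustrative values)."""
    # Decoupled weight decay: shrink the parameter directly.
    # Adam with L2 regularization would add weight_decay * param to grad instead.
    param = param * (1 - lr * weight_decay)
    # Exponential moving averages of the gradient and of its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(new_param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr`, plus the small shrinkage from weight decay.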

In [None]:
from transformers import TrainingArguments

In [None]:
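training_args = TrainingArguments(
    output_dir='sample_trainer',     # where checkpoints and logs are written
    evaluation_strategy='epoch',     # compute metrics at the end of every epoch
    save_strategy='epoch',           # save a checkpoint at the end of every epoch
    num_train_epochs=1
)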

### **Trainer**

The `Trainer` object is what we use to run the training process. The arguments to its constructor are quite simple: we pass in the model, the training arguments, the training dataset, the validation dataset, the tokenizer, and the metrics function.

Now call the `train` method on the `trainer` object to start training.



In [None]:
from transformers import Trainer

In [None]:
trainer = Trainer(
    cp_model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

In [None]:
trainer.train()
# ~2 minutes when tuning only the last classifier layers
# ~8 minutes when tuning the whole architecture

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.3668 | 0.383758 | 0.830275 |


TrainOutput(global_step=8419, training_loss=0.3795243134222768, metrics={'train_runtime': 138.2525, 'train_samples_per_second': 487.145, 'train_steps_per_second': 60.896, 'total_flos': 515920678674600.0, 'train_loss': 0.3795243134222768, 'epoch': 1.0})
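The `global_step` value can be sanity-checked with a little arithmetic. The sketch assumes a per-device batch size of 8 on a single device, which is not set explicitly in this notebook. That assumption is consistent with the reported ratio of samples per second to steps per second (487.145 / 60.896 ≈ 8).

```python
import math

num_train_examples = 67349   # rows in rd2['train'], from the output earlier
batch_size = 8               # assumption: per-device train batch size on a single device
num_epochs = 1               # matches num_train_epochs above

# The last batch of an epoch may be smaller, hence the ceiling
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 8419
```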

#### **Saving the model**

In [None]:
trainer.save_model('saved_model')

In [None]:
!ls

sample_data  sample_trainer  saved_model


#### **Loading the saved model**

In [None]:
from transformers import pipeline

In [None]:
my_model = pipeline('text-classification',model='saved_model',device=0)

In [None]:
my_model('this is not a good book to read')

[{'label': 'LABEL_0', 'score': 0.8451212644577026}]
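The pipeline reports the generic name `LABEL_0`, which suggests that no class names were stored with the saved model. A minimal post-processing sketch follows, assuming the label order from the `ClassLabel` feature shown earlier (`negative`, `positive`). `readable` is a hypothetical helper, not part of `transformers`.

```python
# Class names in label order, as listed in rd2['train'].features earlier
label_names = ["negative", "positive"]


def readable(prediction):
    """Map a generic 'LABEL_<id>' string to its class name."""
    class_id = int(prediction["label"].split("_")[-1])
    return {"label": label_names[class_id], "score": prediction["score"]}


# Pipeline output copied from the cell above
result = [{"label": "LABEL_0", "score": 0.8451212644577026}]
print([readable(p) for p in result])
```

Here `LABEL_0` maps to `negative`, which is the expected sentiment for the example sentence.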

#### **Checking which parameters were updated**

In [None]:
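# The Trainer updated cp_model (the deep copy), not model, so read the trained weights from cp_model
params_after = []
for name, p in cp_model.named_parameters():
    params_after.append(p.detach().cpu().numpy())

In [None]:
# Total absolute change per parameter; 0.0 means the parameter was not updated (e.g. a frozen layer)
for (name, _), p1, p2 in zip(cp_model.named_parameters(), params_before, params_after):
    if p1.shape != p2.shape:
        continue  # replaced classifier head: no snapshot of matching shape
    print(name, np.sum(np.abs(p1 - p2)))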