<a href="https://colab.research.google.com/github/virginiakm1988/Easy-Adapter/blob/main/example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Parameter-efficient Fine-tuning in NLP

This code demonstrates how to fine-tune a BERT model based on the Hugging Face Transformers library with [Easy-Adapter](https://github.com/virginiakm1988/Easy-Adapter). Adapters are a parameter-efficient way to fine-tune a pre-trained language model for a specific NLP task.
This code demonstrates a practical example of using adapters in fine-tuning a BERT model. The code can be adapted to other pre-trained models and NLP tasks.

For any suggestions or questions, please contact Zih-Ching Chen (virginia.chen2007@gmail.com).

## Setup Instructions

Before running the code, please follow these setup instructions:

1. Install the necessary packages by running the following command: 

   ```
   ! pip install transformers datasets
   ! pip install loralib
   ```

2. Check that your system has a compatible GPU installed by running the following command in your terminal:

   ```
   nvidia-smi
   ```


Once you have completed these setup instructions, you are ready to run the code.

In [4]:
! pip install transformers datasets
! pip install loralib
! nvidia-smi

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Fri Dec 22 22:34:04 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  On   | 00000000:B1:00.0 Off |                    0 |
| N/A   27C    P0    42W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                      

In [5]:
! git clone https://github.com/virginiakm1988/Easy-Adapter.git

fatal: destination path 'Easy-Adapter' already exists and is not an empty directory.


In [6]:
%cd /content/Easy-Adapter/

[Errno 2] No such file or directory: '/content/Easy-Adapter/'
/home/b0990106x/Easy-Adapter


## Define custom adapter modules
In [`adapters.py`](https://github.com/virginiakm1988/Easy-Adapter/blob/main/adapters.py), we implemented `Houlsby`, `ConvAdapters`, `AdapterBias`, and `LoRA`.
1. Houlsby Adapter ([Parameter-Efficient Transfer Learning for NLP](https://http://proceedings.mlr.press/v97/houlsby19a.html))
2. ConvAdapter ([CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models](https://arxiv.org/abs/2212.01282))
3. AdapterBias ([AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks](https://arxiv.org/abs/2205.00305))

4. LoRA ([LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685))

5. BitFit ([BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models](https://arxiv.org/abs/2106.10199)
BitFit can be implemented through the following settings:
```
mark_only_adapter_as_trainable(model_bert,bias="all")
```



## Adding Adapter to a Hugging Face Model


The code uses the `AutoModelForSequenceClassification` class to load a pre-trained BERT model (`bert-base-uncased` in this case). The code then sets the adapter type to "houlsby" and specifies the `lora_r` and `lora_alpha` parameters for the adapter.

The code then modifies each layer of the BERT encoder to include the adapter module. The output layer is modified using the `adapted_bert_output` function from the `adapter_bert` module, and the attention layer is modified using the `AdaptedBertSelfAttention` class from the same module.

Finally, the code freezes all parameters except for the adapter module by calling the `mark_only_adapter_as_trainable` function from the `utils` module.

In [36]:
from transformers import AutoModelForSequenceClassification
from adapter_bert import adapted_bert_output, AdaptedBertSelfAttention
from utils import mark_only_adapter_as_trainable
from torch import nn
import torch

BertLayerNorm = torch.nn.LayerNorm

model_bert = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model_bert.config.adapter = "houlsby" # houlsby, conv_adapter, AdapterBias
model_bert.config.lora_r = 8
model_bert.config.lora_alpha = 8
original_state_dict = model_bert.state_dict()
#add adapter module in a bert model
for idx, layer in enumerate(model_bert.bert.encoder.layer):
  #modify the output layer
  model_bert.bert.encoder.layer[idx].output = adapted_bert_output(model_bert.bert.encoder.layer[idx].output, model_bert.config)
  #modify the attention layer for adding lora
  model_bert.bert.encoder.layer[idx].attention.self = AdaptedBertSelfAttention(model_bert.bert.encoder.layer[idx].attention.self, model_bert.config)
model_bert.load_state_dict(original_state_dict,strict = False)
#freeze parameters
mark_only_adapter_as_trainable(model_bert)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Loading datasets
### Tokenizing an IMDb Dataset with BERT

This code demonstrates a practical example of loading and tokenizing a dataset using the Hugging Face Datasets and Transformers libraries. The resulting tokenized datasets can be used for fine-tuning a pre-trained language model on the IMDb sentiment classification task.

In [37]:
from datasets import load_dataset
raw_datasets = load_dataset("imdb")
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]


## Define evaluation metric

In [38]:
import numpy as np
def compute_metrics(p) :
    preds,labels=p
    preds = np.argmax(preds, axis=-1)
    # print('shape:', preds.shape, '\n')
    # precision, recall, f1, _ = precision_recall_fscore_support(lables.flatten(), preds.flatten(), average='weighted', zero_division=0)
    print((preds == p.label_ids).mean())
    return {
        'accuracy': (preds == p.label_ids).mean(),

    }

## Training a BERT Model on IMDb Dataset

In [39]:
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir = "test-trainer",per_device_train_batch_size = 4)
trainer = Trainer(
    model=model_bert, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset, compute_metrics=compute_metrics
)
trainer.train() 

Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


Step,Training Loss
500,0.7022


TrainOutput(global_step=750, training_loss=0.6970359903971354, metrics={'train_runtime': 78.215, 'train_samples_per_second': 38.356, 'train_steps_per_second': 9.589, 'total_flos': 813893437440000.0, 'train_loss': 0.6970359903971354, 'epoch': 3.0})

## Evaluate

In [40]:
# trainer.evaluate()
evaluation_results = trainer.evaluate()
print(evaluation_results)

0.558
{'eval_loss': 0.6842154264450073, 'eval_accuracy': 0.558, 'eval_runtime': 10.7038, 'eval_samples_per_second': 93.425, 'eval_steps_per_second': 11.678, 'epoch': 3.0}


## Saving Adapter State Dictionary in a BERT Model


In [41]:
from utils import adapter_state_dict
checkpoint_path = "./result"
torch.save(adapter_state_dict(model_bert), checkpoint_path)