# Fine Tuning using ReFT Adapters

In this tutorial, we demonstrate how to fine-tune a language model using ReFT, the method introduced in [Representation Finetuning for Language Models](https://arxiv.org/abs/2404.03592).

We will use a lightweight encoder model and fine-tune it with ReFT adapters rather than with traditional full-model fine-tuning.

For more information on ReFT, see the authors' GitHub [page](https://github.com/stanfordnlp/pyreft).

### Installation

Before we can get started, we need to ensure the proper packages are installed. Here's a breakdown of what we need:

- `adapters` and `accelerate` for efficient fine-tuning and training optimization
- `evaluate` for metric computation and model evaluation

In [1]:
!pip install -qq "evaluate>=0.3.0"
!pip install -qq -U adapters accelerate

In [2]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

### Dataset

In this tutorial, we use the `nyu-mll/glue` dataset from the GLUE benchmark. GLUE is a collection of language datasets used to benchmark and analyze language understanding systems. We use the Multi-Genre Natural Language Inference (MNLI) task and evaluate on its matched validation split. The model is given a premise and a hypothesis and must classify their relationship into one of 3 classes: "entailment", "neutral", or "contradiction".

For comparison purposes, we fine-tune `roberta-base`, as used in the original paper, but you can swap in any model you prefer. We also use hyperparameters similar to those in the paper for both the model training and the ReFT adapter, but train for fewer epochs to reduce compute and time.

In [3]:
from datasets import load_dataset

ds = load_dataset("nyu-mll/glue", "mnli")

In [4]:
print(ds.keys())

dict_keys(['train', 'validation_matched', 'validation_mismatched', 'test_matched', 'test_mismatched'])


We will use the `train` split for training and the `validation_matched` split for evaluation.

In [5]:
train_dataset = ds["train"]
eval_dataset = ds["validation_matched"]

In [6]:
from transformers import AutoTokenizer

# Load the tokenizer that matches the base model
model_name_or_path = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

### Data Preprocessing

We tokenize each premise together with its hypothesis, padding every example to the maximum length, and remove the `idx` column, as it is not needed for training.

In [7]:
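def preprocess_function(examples):
    # Tokenize premise/hypothesis pairs; truncate and pad to the model's maximum length.
    # return_tensors is not needed here: `map` stores the result as plain lists.
    return tokenizer(examples['premise'], examples['hypothesis'], truncation=True, padding='max_length')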

train_dataset = train_dataset.map(preprocess_function, batched=True)
eval_dataset = eval_dataset.map(preprocess_function, batched=True)
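
To make the output of the preprocessing step concrete, here is a toy sketch of what a padded premise/hypothesis pair looks like. It does not call the real tokenizer: `encode_pair` and the token ids are hypothetical, and only the special-token ids (`<s>` = 0, `<pad>` = 1, `</s>` = 2) and the `<s> A </s></s> B </s>` pair layout follow RoBERTa's conventions. Truncation here trims the longer segment first, which is an assumption about the default strategy.

```python
# Toy illustration with hypothetical token ids; this is not the real RoBERTa tokenizer.
BOS, PAD, EOS = 0, 1, 2  # RoBERTa's ids for <s>, <pad>, </s>

def encode_pair(premise_ids, hypothesis_ids, max_length):
    """Build `<s> premise </s></s> hypothesis </s>`, truncated and padded to max_length."""
    if max_length < 4:
        raise ValueError("max_length must leave room for the four special tokens")
    premise, hypothesis = list(premise_ids), list(hypothesis_ids)
    budget = max_length - 4  # four special tokens are always present
    # Longest-first truncation: trim the longer segment one token at a time.
    while len(premise) + len(hypothesis) > budget:
        longer = premise if len(premise) >= len(hypothesis) else hypothesis
        longer.pop()
    input_ids = [BOS] + premise + [EOS, EOS] + hypothesis + [EOS]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    n_pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD] * n_pad,
        "attention_mask": attention_mask + [0] * n_pad,  # 0 marks padding
    }

encoded = encode_pair([101, 102, 103], [201, 202], max_length=12)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

With `padding='max_length'`, every example is padded to the tokenizer's maximum length (512 tokens for `roberta-base`, to our knowledge), which keeps all batches the same shape at the cost of extra computation on padding tokens.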

In [8]:
train_dataset = train_dataset.remove_columns(["idx"])
eval_dataset = eval_dataset.remove_columns(["idx"])

In [9]:
from transformers import default_data_collator

data_collator = default_data_collator

### Model and Adapter initialization

We load the `roberta-base` model and create a `LoReftConfig`. A ReFT config can be initialized in a single line of code and added to the base model with the `add_adapter` function. On top of that, we add a classification head with 3 labels for our adapter.

For more information on the ReFT implementation in `adapters`, including the supported configs and their parameters, see the docs page: https://docs.adapterhub.ml/methods.html#reft
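
As background for the `r` parameter, the ReFT paper defines the LoReFT intervention on a hidden state `h` as `h + R^T (W h + b - R h)`, where `R` is a low-rank projection with orthonormal rows and `W`, `b` are a learned linear map. The sketch below applies that formula for rank 1 in plain Python. It is an illustration of the equation only, not the `adapters` implementation; `loreft_rank1` and the example vectors are hypothetical.

```python
import math

def loreft_rank1(h, r_vec, w, b):
    """Apply h + R^T (W h + b - R h) for rank r = 1, where R and W are single rows."""
    norm = math.sqrt(sum(x * x for x in r_vec))
    r_unit = [x / norm for x in r_vec]                 # R must have orthonormal rows
    current = sum(r * x for r, x in zip(r_unit, h))    # R h: current value in the subspace
    target = sum(wi * x for wi, x in zip(w, h)) + b    # W h + b: learned target value
    # Move h along the direction R so that its projection equals the target.
    return [x + r * (target - current) for x, r in zip(h, r_unit)]

h = [1.0, 2.0, 3.0, 4.0]
r_vec = [1.0, 0.0, 1.0, 0.0]
w = [0.5, 0.0, 0.0, 0.5]
b = 0.1
edited = loreft_rank1(h, r_vec, w, b)
print(edited)
```

Only the component of `h` along `R` changes; everything orthogonal to it is left untouched. In the paper, the intervention is applied only at selected token positions, which is what `prefix_positions` controls in the config.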

Don't forget to activate the adapter that you want to train on.

In [10]:
from adapters import AutoAdapterModel, LoReftConfig
model = AutoAdapterModel.from_pretrained(model_name_or_path)

config = LoReftConfig(r = 1, prefix_positions = 1, dropout = 0.05)
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=3)
model.train_adapter("loreft_adapter")

Some weights of RobertaAdapterModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['heads.default.3.bias', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
print(model.adapter_summary())

Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
loreft_adapter           reft                 36,888       0.030       1       1
--------------------------------------------------------------------------------
Full model                               124,645,632     100.000               0
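
The parameter count in the summary can be reproduced with some quick arithmetic. The decomposition below is an assumption on our part: it supposes that each of the 12 layers holds two rank-1 ReFT units (one for prefix and one for suffix positions), each made of a projection `R` and a linear map `W` with bias. It is not confirmed by this tutorial, but it matches the reported figure exactly.

```python
hidden_size = 768       # roberta-base hidden dimension
num_layers = 12         # roberta-base encoder layers
rank = 1                # r in the LoReftConfig above
units_per_layer = 2     # assumption: separate units for prefix and suffix positions

projection_params = hidden_size * rank         # R (no bias)
source_params = hidden_size * rank + rank      # W and b
per_unit = projection_params + source_params

total = num_layers * units_per_layer * per_unit
full_model = 124_645_632                       # "Full model" row of the summary
percent = 100 * total / full_model

print(f"{total:,} trainable ReFT parameters ({percent:.3f}% of the full model)")
```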


### Evaluation

We'll use accuracy as our main metric to evaluate the performance of the ReFT model on the `mnli` dataset.

In [12]:
import evaluate
import numpy as np
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
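
For intuition, the same computation can be written without any libraries: take the index of the largest logit per example and compare it with the label. This is a plain-Python sketch with made-up logits; `argmax` and `accuracy` are hypothetical helpers, not part of `evaluate`.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

# Three made-up examples with three class logits each
logits = [[2.0, 0.1, -1.0], [0.2, 0.3, 0.1], [-1.0, 0.0, 3.0]]
labels = [0, 2, 2]
result = accuracy(logits, labels)
print(result)
```

Here the predictions are classes 0, 1, and 2, so two of the three examples are correct.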


### Training

Now we are ready to train our model. We use the same hyperparameters that the original paper used for `mnli`, except for the number of epochs. For this tutorial, we train for only 2 epochs for demo purposes; more epochs are usually needed for the best results. You are always welcome to tweak the hyperparameters to your own needs!

In [13]:
from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy='epoch',
    learning_rate=6e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=2,
    weight_decay=0.00,
    optim = "adamw_hf",
    lr_scheduler_type = "linear",
    warmup_ratio= 6e-2,
)
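
To see what these arguments imply, the sketch below works out the number of optimizer steps and the learning rate under a linear schedule with warmup. It assumes a single device, no gradient accumulation, and 392,702 MNLI training examples. That figure is our assumption; it reproduces the `global_step=24544` reported in the training output. `linear_schedule_lr` is a hypothetical helper that mimics the usual warmup-then-linear-decay rule, not a library function.

```python
import math

num_examples = 392_702     # assumption: size of the MNLI train split
batch_size = 32            # per_device_train_batch_size
num_epochs = 2             # num_train_epochs
peak_lr = 6e-4             # learning_rate
warmup_ratio = 6e-2        # warmup_ratio

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_schedule_lr(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(f"steps per epoch: {steps_per_epoch}, total: {total_steps}, warmup: {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step))
```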

In [None]:
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    compute_metrics = compute_metrics
)

trainer.train()



| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.603         | 0.508186        | 0.795619 |
| 2     | 0.5598        | 0.475799        | 0.812124 |


TrainOutput(global_step=24544, training_loss=0.6184384013092658, metrics={'train_runtime': 13249.0278, 'train_samples_per_second': 59.28, 'train_steps_per_second': 1.853, 'total_flos': 2.0971423089834394e+17, 'train_loss': 0.6184384013092658, 'epoch': 2.0})

By training only 0.030% of the model's parameters, we reached an accuracy of 0.81 after just two epochs! Because so few parameters are updated, this approach is also generally cheaper than fully fine-tuning the model with traditional methods.

### Inference

With our tuned model using `reft` adapters, we can now do some inference on some new text!

In [15]:
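import torch

# Label order follows the GLUE MNLI dataset: 0 = entailment, 1 = neutral, 2 = contradiction
mapping = {
    0: "entailment",
    1: "neutral",
    2: "contradiction"
}

def infer_text(text):
    model.eval()  # disable dropout for inference
    # Tokenize and move the inputs to the same device as the model
    inputs = tokenizer(text, truncation=True, padding='max_length', return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    logits = outputs["logits"]
    # Softmax over the class dimension; an explicit dim avoids PyTorch's implicit-dim warning
    p = torch.nn.functional.softmax(logits, dim=-1).cpu().numpy()
    prediction = np.argmax(p, axis=-1)
    print(f"The classification of this text is: {mapping[int(prediction[0])]}")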

In [16]:
text = ["I like Apple Pie. I don't like Apple Pie"]

In [17]:
infer_text(text)

The classification of this text is: contradiction
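
The step from logits to a label can also be shown in isolation. The sketch below uses made-up logits and plain Python; `softmax` and `classify` are hypothetical helpers. The label order is the one we understand the `nyu-mll/glue` MNLI config to use (0 = entailment, 1 = neutral, 2 = contradiction); you can check it with `ds["train"].features["label"].names`.

```python
import math

# Assumed label order of the GLUE MNLI dataset
id2label = ["entailment", "neutral", "contradiction"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

example_logits = [-1.2, 0.3, 2.5]  # made-up values for illustration
label, prob = classify(example_logits, id2label)
print(f"{label} (p = {prob:.3f})")
```

Softmax does not change which class has the highest score, so it is only needed when you want probabilities rather than just the predicted label.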



### Saving the adapter model

If you would like to save your adapter locally or push it to the Hugging Face Hub, you can do so with the code below. Make sure to sign in before pushing.

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
model.save_adapter("./reft_adapter", "loreft_adapter")

In [None]:
model.push_adapter_to_hub(
    "roberta-base-reft-adapter",
    "loreft_adapter",
    datasets_tag="mnli"
)