# Lightweight Fine-Tuning Project

## Introduction

Lightweight fine-tuning is one of the most important techniques for adapting foundation models, because it lets you tailor a model to your needs without substantial computational resources.


- Hugging Face PEFT allows you to fine-tune a model without having to fine-tune all of its parameters.
- Training a model using Hugging Face PEFT requires two additional steps beyond traditional fine-tuning:
    1. Creating a PEFT config
    2. Converting the model into a PEFT model using the PEFT config.
- Inference using a PEFT model is almost identical to inference using a non-PEFT model. The only difference is that it must be loaded as a PEFT model.

In [24]:
!pip install -q -U datasets peft
!pip install -q -U transformers
!pip install -q -U scikit-learn

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

In [25]:
import numpy as np
import torch
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification, \
    TrainingArguments, Trainer, default_data_collator
from sklearn.metrics import accuracy_score, f1_score

## Training without PEFT

In [26]:
model_name = 'gpt2'
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = GPT2ForSequenceClassification.from_pretrained(model_name, num_labels=4)

# Set padding token ID in the model config
model.config.pad_token_id = tokenizer.pad_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [27]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
print("Model device: ", next(model.parameters()).device)

Model device:  cuda:0


In [28]:
dataset = load_dataset('ag_news')
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})

In [29]:
local_pc = True

In [30]:
if local_pc:
    train_subset = dataset['train'].select(range(12000))
    test_subset = dataset['test'].select(range(1000))
    print(f"Train dataset size: {len(train_subset)}")
    print(f"Test dataset size: {len(test_subset)}")
else:
    train_subset = dataset['train']
    test_subset = dataset['test']

Train dataset size: 12000
Test dataset size: 1000


In [31]:
def preprocess_data(examples):
    return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=512)

encoded_train_subset = train_subset.map(preprocess_data, batched=True)
encoded_test_subset = test_subset.map(preprocess_data, batched=True)

encoded_train_subset = encoded_train_subset.rename_column("label", "labels")
encoded_test_subset = encoded_test_subset.rename_column("label", "labels")


In [32]:
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    labels = p.label_ids
    accuracy = accuracy_score(labels, preds)
    f1 = f1_score(labels, preds, average='weighted')
    return {'accuracy': accuracy, 'f1': f1}
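
To make `average='weighted'` concrete, the sketch below recomputes accuracy and the support-weighted F1 score by hand on a made-up set of eight labels and predictions. The toy data and the helper name `weighted_f1` are illustrative assumptions; the notebook itself relies on scikit-learn for these metrics.

```python
from collections import Counter

def weighted_f1(labels, preds):
    """Mean of per-class F1 scores, weighted by each class's true-label count."""
    support = Counter(labels)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(labels)

# Toy data (assumption): four classes, as in AG News
labels = [0, 0, 1, 1, 2, 2, 2, 3]
preds = [0, 1, 1, 1, 2, 2, 0, 3]

accuracy = sum(y == p for y, p in zip(labels, preds)) / len(labels)
f1 = weighted_f1(labels, preds)
print(f"accuracy={accuracy:.3f}, weighted F1={f1:.3f}")
```

Weighting by support means frequent classes contribute more to the final score, which is why the weighted F1 tends to track accuracy closely on a roughly balanced dataset.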

In [33]:
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_train_subset,
    eval_dataset=encoded_test_subset,
    data_collator=default_data_collator,
    compute_metrics=compute_metrics,
)



In [34]:
# Evaluating the original model
results = trainer.evaluate(encoded_test_subset)
print(f"Foundation Model Results: {results}")

Foundation Model Results: {'eval_loss': 3.7251856327056885, 'eval_model_preparation_time': 0.0044, 'eval_accuracy': 0.253, 'eval_f1': 0.10587502542509872, 'eval_runtime': 9.1412, 'eval_samples_per_second': 109.394, 'eval_steps_per_second': 13.674}


## Parameter-Efficient Fine-Tuning

In [35]:
from peft import get_peft_model, LoraConfig, TaskType, AutoPeftModelForSequenceClassification, PeftModel

In [36]:
print(list(TaskType))

[<TaskType.SEQ_CLS: 'SEQ_CLS'>, <TaskType.SEQ_2_SEQ_LM: 'SEQ_2_SEQ_LM'>, <TaskType.CAUSAL_LM: 'CAUSAL_LM'>, <TaskType.TOKEN_CLS: 'TOKEN_CLS'>, <TaskType.QUESTION_ANS: 'QUESTION_ANS'>, <TaskType.FEATURE_EXTRACTION: 'FEATURE_EXTRACTION'>]


**Creating a PEFT Config**
- The PEFT config specifies the adapter configuration for your parameter-efficient fine-tuning process.
- The base class for this is `PeftConfig`, but this example uses `LoraConfig`, the subclass used for low-rank adaptation (LoRA).

In [37]:
# Create a PEFT config
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=2,  # rank
    lora_alpha=16,
    lora_dropout=0.1,
    bias='none',
)
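
As a rough illustration of what `r` and `lora_alpha` control, the sketch below implements a LoRA-style forward pass in plain Python: the frozen weight `W` is left untouched and a low-rank update `B (A x)`, scaled by `lora_alpha / r`, is added to its output. The toy dimensions and the helper names `matvec` and `lora_forward` are assumptions for illustration, and dropout is ignored; this is not the PEFT implementation.

```python
import random

def matvec(M, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, r, lora_alpha):
    """Frozen base output W x plus the scaled low-rank update B (A x)."""
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out = 6, 4        # toy sizes (assumption), far smaller than a real layer
r, lora_alpha = 2, 16     # same values as the LoraConfig above

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero at start
x = [random.gauss(0, 1) for _ in range(d_in)]

y_start = lora_forward(W, A, B, x, r, lora_alpha)
print("scale:", lora_alpha / r)
print("matches frozen layer at start:", y_start == matvec(W, x))

B[0] = [0.5, -0.5]        # pretend training has moved B away from zero
y_trained = lora_forward(W, A, B, x, r, lora_alpha)
print("output changes once B is non-zero:", y_trained != y_start)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one, and only the small matrices `A` and `B` need gradients during training.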

## Converting a Transformer Model into a PEFT Model
Once you have a PEFT config object, you can convert a Hugging Face `transformers` model into a PEFT model: load the pre-trained model as usual (done above), then pass it to `get_peft_model` together with the config.

In [38]:
# Create a PEFT model
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()  # Checking Training Parameters of a PEFT Model

trainable params: 76,800 || all params: 124,519,680 || trainable%: 0.0617
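
The trainable-parameter count can be checked by hand. The sketch below assumes that LoRA is applied only to the `c_attn` projection (768 to 2304 features) in each of GPT-2's 12 blocks, and that the bias-free classification head `score` stays trainable alongside the adapters; both assumptions are consistent with the reported total.

```python
hidden, r, n_layers, n_labels = 768, 2, 12, 4
qkv_out = 3 * hidden                       # c_attn projects 768 -> 2304

lora_per_layer = hidden * r + r * qkv_out  # lora_A (768 x 2) + lora_B (2 x 2304)
lora_total = n_layers * lora_per_layer
score_head = hidden * n_labels             # classification head without bias

trainable = lora_total + score_head
all_params = 124_519_680                   # taken from the output above

print(f"{lora_total:,} LoRA + {score_head:,} head = {trainable:,} trainable")
print(f"trainable%: {100 * trainable / all_params:.4f}")
```

Under these assumptions the adapters account for 73,728 parameters and the head for 3,072, which together give the 76,800 reported by `print_trainable_parameters()`.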




In [39]:
# Train the PEFT model
print("Fine-tuning the PEFT model...")
trainer.model = peft_model
trainer.train()

Fine-tuning the PEFT model...


Epoch,Training Loss,Validation Loss,Model Preparation Time,Accuracy,F1
1,0.4031,0.38671,0.0036,0.871,0.870383
2,0.3581,0.329669,0.0036,0.9,0.899384
3,0.3454,0.314664,0.0036,0.911,0.910567
4,0.3133,0.313104,0.0036,0.913,0.912511
5,0.3172,0.310975,0.0036,0.91,0.909516


TrainOutput(global_step=7500, training_loss=0.42143507080078124, metrics={'train_runtime': 1427.7357, 'train_samples_per_second': 42.025, 'train_steps_per_second': 5.253, 'total_flos': 1.569224392704e+16, 'train_loss': 0.42143507080078124, 'epoch': 5.0})
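
The step count and throughput in `TrainOutput` follow from the training setup. The sketch below reproduces them, assuming a single device and no gradient accumulation (neither is set explicitly in `TrainingArguments` above).

```python
import math

train_examples, batch_size, epochs = 12_000, 8, 5
train_runtime = 1427.7357                  # seconds, from the TrainOutput above

steps_per_epoch = math.ceil(train_examples / batch_size)
global_step = steps_per_epoch * epochs
samples_per_second = train_examples * epochs / train_runtime

print(f"steps per epoch: {steps_per_epoch}")
print(f"global_step: {global_step}")
print(f"train_samples_per_second: {samples_per_second:.3f}")
```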

In [40]:
# Save the trained PEFT model
# Note: This only saves the adapter weights and not the weights of the original
# Transformer model. Thus the size of the files created will be much smaller than
# you might expect.
peft_model.save_pretrained('./peft_model')
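
A back-of-the-envelope estimate shows why the saved adapter is so small. The sketch assumes 32-bit floats (4 bytes per parameter) and ignores file metadata, so the real file sizes will differ slightly.

```python
bytes_per_param = 4                 # assumption: float32 storage
adapter_params = 76_800             # trainable parameters reported earlier
all_params = 124_519_680            # total parameters reported earlier

adapter_mb = adapter_params * bytes_per_param / 1e6
full_mb = all_params * bytes_per_param / 1e6
ratio = all_params / adapter_params

print(f"adapter: ~{adapter_mb:.2f} MB, full model: ~{full_mb:.0f} MB")
print(f"the full model is roughly {ratio:.0f}x larger")
```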


## Inference with PEFT

Because only the adapter weights were saved, not the full model weights, we can't use `from_pretrained()` with a regular Transformers class (e.g., `AutoModelForSequenceClassification`). Instead, we need the PEFT counterpart (here, `AutoPeftModelForSequenceClassification`), which loads the base model and then applies the saved adapter.

In [41]:
# Load the trained PEFT model (base GPT-2 weights plus the saved adapter)
trained_peft_model = AutoPeftModelForSequenceClassification.from_pretrained(
    './peft_model', num_labels=4, pad_token_id=tokenizer.eos_token_id
)
trained_peft_model.to(device)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=2304, nx=768)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=2, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=2, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B):

In [42]:
print("Evaluating the fine-tuned PEFT model...")
trainer.model = trained_peft_model
peft_results = trainer.evaluate()
print(peft_results)

Evaluating the fine-tuned PEFT model...


{'eval_loss': 0.31097498536109924, 'eval_model_preparation_time': 0.0036, 'eval_accuracy': 0.91, 'eval_f1': 0.9095156229827575, 'eval_runtime': 9.6897, 'eval_samples_per_second': 103.203, 'eval_steps_per_second': 12.9, 'epoch': 5.0}


In [47]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
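# Note: the source headline contains "dwindling\band"; inside a Python string "\b" is a
# backspace escape, so a space is used here instead.
text = ("Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, "
        "Wall Street's dwindling band of ultra-cynics, are seeing green again.")
inputs = tokenizer(text, return_tensors="pt").to(device)

trained_peft_model.eval()  # disable dropout for inference
with torch.no_grad():
    outputs = trained_peft_model(**inputs)  # use the reloaded PEFT model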

logits = outputs.logits
predicted_class = torch.argmax(logits, dim=1).item()
print(f"Predicted class: {predicted_class}")

Predicted class: 2


In [50]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Policeman 'saw fatal train crash' An off-duty policeman watched a train plough into a car on a level crossing  in Berkshire, killing six people.", return_tensors="pt").to(device)

with torch.no_grad():
    outputs = trained_peft_model(**inputs)  # use the reloaded PEFT model

logits = outputs.logits
predicted_class = torch.argmax(logits, dim=1).item()
print(f"Predicted class: {predicted_class}")

Predicted class: 0
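
The predicted class is an integer label ID. AG News uses four classes (0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech), so a small lookup makes the predictions readable. The dictionary name `id2label` and the helper `describe` are illustrative, not part of the notebook.

```python
# AG News label IDs and their class names
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

def describe(predicted_class):
    """Return a readable string for a predicted label ID."""
    return f"Predicted class: {predicted_class} ({id2label[predicted_class]})"

# The two predictions shown above
for predicted_class in (2, 0):
    print(describe(predicted_class))
```

With this mapping, the Wall Street headline is classified as Business and the train-crash story as World.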


In [49]:
from datasets import load_dataset
import random

# Load the AG News dataset
dataset = load_dataset("ag_news")

# Get the total number of samples in the training set
num_samples = len(dataset['train'])

# Generate random indices
random_indices = random.sample(range(num_samples), 5)  # Select 5 random indices

# Select random samples
random_samples = dataset['train'].select(random_indices)

# Print the random sample data
for i, item in enumerate(random_samples):
    print(f"Sample {i+1}:")
    print(f"Text: {item['text']}")
    print(f"Label: {item['label']}\n")

Sample 1:
Text: Policeman 'saw fatal train crash' An off-duty policeman watched a train plough into a car on a level crossing  in Berkshire, killing six people.
Label: 0

Sample 2:
Text: Silver finale for USA In the last event of the 2004 Olympic Games, the United States track team produced one last surprise. Meb Keflezighi, a native of Eritrea who moved to the United States as 
Label: 1

Sample 3:
Text: Compuware Blasts IBM #39;s Legal Tactics Two years ago, IBM was ordered to produce the source code for its products, which Compuware identified as containing its pirated intellectual property. The code was missing. But lo and behold -- last week, they called and said they had it, quot; ...
Label: 3

Sample 4:
Text: Polish Hostage Freed in Iraq Already in Warsaw  WARSAW (Reuters) - A Polish woman kidnapped in Iraq last  month has been freed and flown to Poland and said she was  treated well, raising hopes for other foreign hostages.
Label: 0

Sample 5:
Text: Growth forecast revised up t