# Lightweight Fine-Tuning Project

## Project Overview

This project applies Parameter-Efficient Fine-Tuning (PEFT) to a pre-trained language model for a text classification task. The performance of the base model is compared with the PEFT-enhanced model to demonstrate the benefits of this approach.

PEFT techniques fine-tune large pre-trained models with far fewer trainable parameters than traditional full fine-tuning, which reduces computation, memory usage, and storage requirements. LoRA (Low-Rank Adaptation) is one such technique: it adds small trainable "adapter" matrices to the model while keeping most of the original parameters frozen.

### Choices for this project:

* **PEFT technique**: Low-Rank Adaptation (LoRA) - A popular and effective PEFT method that adds trainable rank decomposition matrices to existing weights.
* **Model**: GPT-2 - A versatile language model that can be adapted for classification tasks.
* **Evaluation approach**: Accuracy on a test set using the Hugging Face Trainer.
* **Fine-tuning dataset**: AG News - A collection of news articles categorized into 4 classes: World, Sports, Business, and Sci/Tech.

## 1. Loading Libraries and Dataset

This section imports the necessary libraries and loads the AG News dataset. The dataset contains news articles categorized into four classes: World, Sports, Business, and Sci/Tech. This dataset serves as the basis for the text classification task.

In [1]:
# Import necessary libraries

from   datasets     import load_dataset
from   transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from   transformers import DataCollatorWithPadding
from   peft         import LoraConfig, get_peft_model, TaskType, PeftConfig, PeftModel, PeftModelForSequenceClassification
from   peft         import AutoPeftModelForSequenceClassification
import numpy        as     np
import torch
import os   

In [2]:
# Create directories for saving models
os.makedirs("./data/foundation_model_pretrained", exist_ok=True)
os.makedirs("./data/foundation_model_finetuned", exist_ok=True)
os.makedirs("./data/peft_model_saved", exist_ok=True)

In [3]:
# Load the AG News dataset
dataset = load_dataset("ag_news")


In [4]:
# Print detailed information about the dataset

print("The dataset structure:\n", dataset, '\n')
print("# "+ "=" * 50, '\n')
print("Dataset Info:")
print(dataset['train'].info)

# Alternatively, print features and other relevant information
print("Features:", dataset['train'].features, '\n')
print("Number of training samples:", len(dataset['train']))
print("Number of test samples    :", len(dataset['test']))

The dataset structure:
 DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
}) 


Dataset Info:
DatasetInfo(description='', citation='', homepage='', license='', features={'text': Value(dtype='string', id=None), 'label': ClassLabel(names=['World', 'Sports', 'Business', 'Sci/Tech'], id=None)}, post_processed=None, supervised_keys=None, task_templates=None, builder_name='parquet', dataset_name='ag_news', config_name='default', version=0.0.0, splits={'train': SplitInfo(name='train', num_bytes=29832303, num_examples=120000, shard_lengths=None, dataset_name='ag_news'), 'test': SplitInfo(name='test', num_bytes=1880424, num_examples=7600, shard_lengths=None, dataset_name='ag_news')}, download_checksums={'hf://datasets/ag_news@eb185aade064a813bc0b7f42de02595523103ca4/data/train-00000-of-00001.parquet': {'num_bytes': 18585438, 'checksum': None}, 'hf://datase

## 2. Creating Smaller Dataset for Faster Training

For the purposes of this project, a smaller subset of the AG News dataset is used to make training faster. This smaller dataset demonstrates the effectiveness of PEFT techniques while keeping computational requirements manageable.

The implementation randomly samples 1,250 training examples and 500 test examples from the original dataset, using a fixed seed so the subset is reproducible. This is sufficient to show meaningful results while keeping training time reasonable.

In [5]:
# Use this cell to generate a smaller, shuffled subset of AG News
# =======================================================================================

# Number of examples to keep in each split
train_size = 1250
test_size  = 500
splits  = ["train", "test"]
dataset = {split: df for split, df in zip(splits, load_dataset("ag_news", split=splits))}

# Thin out the dataset to make it run faster for this example
dataset['train'] = dataset['train'].shuffle(seed=42).select(range(train_size))      # Shuffle and select 1,250 for training
dataset['test' ] = dataset['test' ].shuffle(seed=42).select(range(test_size ))      # Shuffle and select 500 for testing

# Show the dataset
dataset

{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 1250
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 })}

## 3. Mapping Class Labels and Setting up Tokenizer

This section creates mappings between class labels and their numeric IDs. This is important for both training and inference, as it allows translation between human-readable categories (like "World" or "Sports") and the numeric representations used by the model.

The code also sets up the tokenizer for GPT-2, which converts text into the numeric format required by the model. Since GPT-2 wasn't designed with a padding token, the end-of-sequence (EOS) token is used as a padding token, which is a common practice.
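
To illustrate why reusing EOS as the padding token is workable, here is a minimal, dependency-free sketch of right-padding. The helper names `pad_batch` and `last_real_token_index` are hypothetical, and the sketch does not reproduce the tokenizer's or the model's internals; it only shows that the attention mask, not the token ID, is what separates padding from content. The ID 50256 is GPT-2's EOS token, which also appears as padding in the tokenized sample printed later in this notebook.

```python
PAD_ID = 50256  # GPT-2's EOS token ID, reused here as the padding ID

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token ID lists to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

def last_real_token_index(mask_row):
    """Index of the last non-padding position (assumes right padding)."""
    return sum(mask_row) - 1

# Toy batch; the second sequence ends with a genuine EOS token.
batch = [[43984, 75, 13410], [1582, 50256]]
ids, mask = pad_batch(batch)
print(ids)
print(mask)
print([last_real_token_index(row) for row in mask])
```

In the second row the genuine EOS and the padding EOS share an ID, but the mask still marks only the first two positions as real. The notebook also sets `config.pad_token_id` on each model, which sequence-classification heads on decoder-only models generally need in order to locate the final real token in a padded batch.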

In [6]:
# Mapping Class Labels to IDs and Vice Versa
# --------------------

labels             = dataset['train'].features["label"].names
label2id, id2label = dict(), dict()

for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i]     = label
    
    
print("label2id:", label2id)
print("id2label:", id2label)

label2id: {'World': 0, 'Sports': 1, 'Business': 2, 'Sci/Tech': 3}
id2label: {0: 'World', 1: 'Sports', 2: 'Business', 3: 'Sci/Tech'}


In [7]:
# Set up the GPT-2 Tokenizer
tokenizer           = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(News):
    """
        Tokenize a batch of AG News examples.

        With padding="max_length" and no explicit max_length, every example is
        padded (or truncated) to the tokenizer's maximum length, 1024 for GPT-2.
    """
    return tokenizer(News['text'], padding="max_length", truncation=True)

tokenized_df = {}
for split in  ["train", "test"]:
    tokenized_df[split] = dataset[split].map(preprocess_function, batched = True)
    
tokenized_df

{'train': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 1250
 }),
 'test': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 500
 })}

In [8]:
# Check a sample of the tokenized data
print(tokenized_df['train'][0]['input_ids'][:128])
print("Input IDs length:", len(tokenized_df['train'][0]['input_ids']))

[43984, 75, 13410, 1582, 47557, 416, 8956, 29560, 7941, 423, 3181, 867, 11684, 290, 4736, 287, 19483, 284, 257, 17369, 11, 262, 1110, 706, 1248, 661, 3724, 287, 23171, 379, 257, 1964, 7903, 13, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256]
Input IDs length: 1024


## 4. Loading and Evaluating the Foundation Model WITHOUT Training

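This section evaluates the raw pre-trained GPT-2 model on the classification task without any fine-tuning. This provides a baseline for the later comparisons; because the classification head starts from random weights, accuracy near the 25% chance level for four classes is expected.

The model is configured for sequence classification with 4 output classes corresponding to the news categories. Since GPT-2 wasn't pre-trained for classification, the classification head (`score.weight`) is newly initialized, as the warning in the cell output notes. The `ignore_mismatched_sizes` parameter tells the loader to tolerate checkpoint weights whose shapes differ from the model's instead of raising an error.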

This evaluation establishes how well the pre-trained model performs on our specific task before any adaptation is applied.
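
The accuracy metric used for every evaluation in this notebook reduces to an argmax over the class scores followed by a comparison with the labels. The following is a minimal pure-Python sketch of that idea on made-up logits; the function name `accuracy_from_logits` and the toy values are assumptions for illustration, not part of the notebook's pipeline.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy logits for 4 examples over the 4 AG News classes (made-up numbers).
toy_logits = [
    [2.0, 0.1, -1.0, 0.3],   # highest score at index 0
    [0.2, 0.1, 0.4, 0.3],    # highest score at index 2
    [-0.5, 1.5, 0.0, 0.2],   # highest score at index 1
    [0.0, 0.1, 0.2, 0.9],    # highest score at index 3
]
toy_labels = [0, 1, 1, 3]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # 0.75: three of the four predictions match
```

With four balanced classes, a model that guesses at random would score about 0.25 on this metric, which is the reference point for reading the untrained result.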

In [9]:
# Define compute_metrics function first
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions         = np.argmax(predictions, axis=1)
    
    return {"accuracy": (predictions == labels).mean()}



# Load the foundation model for evaluation
foundation_model = AutoModelForSequenceClassification.from_pretrained("gpt2", 
                                                                      num_labels              = 4,
                                                                      id2label                = id2label,
                                                                      label2id                = label2id,
                                                                      ignore_mismatched_sizes = True)
foundation_model.config.pad_token_id = tokenizer.eos_token_id

# Setup evaluation trainer for foundation model
foundation_eval_trainer = Trainer(
    model           = foundation_model,
    args            = TrainingArguments(
        output_dir                 = "./data/foundation_model_pretrained",
        per_device_eval_batch_size = 1,
    ),
    eval_dataset    = tokenized_df["test"],
    tokenizer       = tokenizer,
    data_collator   = DataCollatorWithPadding(tokenizer),
    compute_metrics = compute_metrics,
)

# Evaluate foundation model (no training)
print("Evaluating foundation model WITHOUT training:")
foundation_model_results = foundation_eval_trainer.evaluate()
print("Foundation Model Results (before training):", foundation_model_results)

# Save the pretrained foundation model
foundation_model.save_pretrained("./data/foundation_model_pretrained")
print("Foundation model (pretrained) saved to ./data/foundation_model_pretrained", '\n')
foundation_model_results

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Evaluating foundation model WITHOUT training:


Foundation Model Results (before training): {'eval_loss': 4.468132019042969, 'eval_accuracy': 0.214, 'eval_runtime': 41.2338, 'eval_samples_per_second': 12.126, 'eval_steps_per_second': 12.126}
Foundation model (pretrained) saved to ./data/foundation_model_pretrained 



{'eval_loss': 4.468132019042969,
 'eval_accuracy': 0.214,
 'eval_runtime': 41.2338,
 'eval_samples_per_second': 12.126,
 'eval_steps_per_second': 12.126}

## 5. Training and Evaluating the Foundation Model

A fresh copy of GPT-2 is trained on the same subset as a reference point. The transformer backbone is frozen in the first cell, so only the newly initialized classification head is updated during training.

In [10]:
# Load a fresh copy of the model for training
base_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels      = 4,
                                                                id2label                = id2label,
                                                                label2id                = label2id,
                                                                ignore_mismatched_sizes = True) # Needed because GPT-2 wasn't pre-trained for classification


# Freeze the GPT-2 backbone; only the `score` head stays trainable
for param in base_model.base_model.parameters():
    param.requires_grad = False
print(base_model)

# Set the pad token in the model config
base_model.config.pad_token_id = tokenizer.eos_token_id


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=4, bias=False)
)


In [11]:
# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
base_trainer  = Trainer(
                        model          = base_model,  # Load the model with 4 labels
                        args           = TrainingArguments(
                                                        output_dir                  = "./data/sentiment_analysis",
                                                        learning_rate               = 5e-4, # 2e-3,
                                                        per_device_train_batch_size = 1,
                                                        per_device_eval_batch_size  = 1,
                                                        num_train_epochs            = 3,
                                                        weight_decay                = 0.01,
                                                        evaluation_strategy         = "epoch",
                                                        save_strategy               = "epoch",
                                                        load_best_model_at_end      = True,
                                                                ),
                        train_dataset   = tokenized_df["train"],
                        eval_dataset    = tokenized_df["test"],
                        tokenizer       = tokenizer,
                        data_collator   = DataCollatorWithPadding(tokenizer),
                        compute_metrics = compute_metrics,
                        )


# Train the base model
print("Training base model...")
base_training_results = base_trainer.train()

Training base model...


Epoch,Training Loss,Validation Loss,Accuracy
1,0.7302,0.542354,0.81
2,0.5476,0.628192,0.81
3,0.4245,0.566699,0.82


### Evaluate and save the trained base model

In [12]:
# Evaluate trained base model
print("Evaluating base model AFTER training:")
base_model_results = base_trainer.evaluate()
print("Base Model Results (after training):", base_model_results)

# Save the fine-tuned base model
base_model.save_pretrained("./data/foundation_model_finetuned")
print("Foundation model (fine-tuned) saved to ./data/foundation_model_finetuned")


Evaluating base model AFTER training:


Base Model Results (after training): {'eval_loss': 0.5423539876937866, 'eval_accuracy': 0.81, 'eval_runtime': 43.5674, 'eval_samples_per_second': 11.476, 'eval_steps_per_second': 11.476, 'epoch': 3.0}
Foundation model (fine-tuned) saved to ./data/foundation_model_finetuned


In [13]:
base_model_results

{'eval_loss': 0.5423539876937866,
 'eval_accuracy': 0.81,
 'eval_runtime': 43.5674,
 'eval_samples_per_second': 11.476,
 'eval_steps_per_second': 11.476,
 'epoch': 3.0}

## 6. Performing Parameter-Efficient Fine-Tuning

This section creates a PEFT model using LoRA and fine-tunes it on the same dataset. LoRA works by adding low-rank decomposition matrices to existing weights in the model, which allows adaptation of the model's behavior with very few trainable parameters.

The key insight of LoRA is to represent weight updates as a product of two smaller matrices: ΔW = BA, where B and A are low-rank matrices. This dramatically reduces the number of trainable parameters while still allowing the model to adapt to new tasks.
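
The decomposition can be made concrete with a small, dependency-free sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation, in which the update is scaled by `alpha / r` and `B` starts at zero so that training begins from the unmodified model; the helper names and all numeric values are made up for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x) without forming the full update BA."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: 3 outputs, 4 inputs, rank 1.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]          # frozen weight, shape 3 x 4
A = [[0.5, 0.5, 0.0, 0.0]]          # trainable, shape r x 4
B_zero = [[0.0], [0.0], [0.0]]      # trainable, shape 3 x r, zero at start
B_trained = [[1.0], [0.0], [-1.0]]  # hypothetical values after training
x = [2.0, 4.0, 6.0, 8.0]

out_start = lora_forward(W, A, B_zero, x, alpha=2, r=1)
out_trained = lora_forward(W, A, B_trained, x, alpha=2, r=1)
print(out_start)    # [2.0, 4.0, 6.0]: identical to W x while B is zero
print(out_trained)  # [8.0, 4.0, 0.0]: the low-rank update shifts the output
```

Only `A` and `B` would receive gradients; `W` is never modified, which is why the adapter can be stored separately from the base model.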

Key parameters in the LoRA configuration:
- `r=8`: The rank of the low-rank matrices (smaller r means fewer parameters)
- `lora_alpha=32`: The scaling factor for the LoRA layers (controls update magnitude)
- `lora_dropout=0.1`: Dropout probability for regularization
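- `target_modules=["c_attn", "c_proj"]`: Apply LoRA only to the GPT-2 layers with these names (the fused attention query/key/value projection and the output projections)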
- `bias="none"`: Don't train bias parameters
- `task_type="SEQ_CLS"`: Configure for sequence classification

After setting up the LoRA configuration, the model is converted to a PEFT model and the trainer is initialized with the same parameters used for the base model. This ensures a fair comparison between the two approaches.
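
As a rough check on how few parameters this configuration adds, the adapter sizes can be computed by hand. The sketch below assumes the standard GPT-2 (small) shapes, a hidden size of 768 across 12 blocks, and assumes that the name `c_proj` matches both the attention and the MLP output projection in each block. The function name and the `targeted` dictionary are illustrative, not part of PEFT.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters in one adapter: A is r x in_features, B is out_features x r."""
    return r * in_features + out_features * r

HIDDEN, N_LAYERS, R = 768, 12, 8

# Assumed GPT-2 (small) projection shapes as (in_features, out_features).
targeted = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),  # fused query/key/value projection
    "attn.c_proj": (HIDDEN, HIDDEN),      # attention output projection
    "mlp.c_proj": (4 * HIDDEN, HIDDEN),   # MLP output projection
}

per_layer = sum(lora_param_count(i, o, R) for i, o in targeted.values())
total_lora = per_layer * N_LAYERS
full_update = sum(i * o for i, o in targeted.values()) * N_LAYERS

print(per_layer)    # 67584
print(total_lora)   # 811008
print(full_update)  # 56623104
```

Under these assumptions the adapters account for 811,008 of the 817,152 trainable parameters that `print_trainable_parameters()` reports later in this section. The remaining 6,144 is a multiple of the 768 x 4 classification-head size, which is consistent with the head also being trainable, though the notebook does not print that breakdown.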

In [14]:
# Load the base model
peft_base_model = AutoModelForSequenceClassification.from_pretrained("gpt2", 
                                                                     num_labels = 4, 
                                                                     id2label   = id2label,
                                                                     label2id   = label2id,
                                                                     ignore_mismatched_sizes = True
                                                                     )


# Set the pad token in the model config
peft_base_model.config.pad_token_id = tokenizer.eos_token_id

# No manual freezing is needed here: get_peft_model() freezes the base weights
# and leaves the LoRA adapters (and the classification head) trainable.

    

lora_config = LoraConfig(
                        r              = 8,
                        lora_alpha     = 32,
                        lora_dropout   = 0.1,
                        target_modules = ["c_attn", "c_proj"],  # Matches the attention projections and the MLP output projection (also named c_proj) in GPT-2
                        bias           = "none",
                        task_type      = TaskType.SEQ_CLS  # Sequence classification task
                        )


# Convert the model to a PEFT model
peft_model = get_peft_model(peft_base_model, lora_config)

# peft_model = PeftModelForSequenceClassification(model, lora_config)
peft_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 817,152 || all params: 125,256,960 || trainable%: 0.6523805144241086


In [15]:
# Initialize the Trainer
peft_trainer  = Trainer(
                        model = peft_model,  # Use the PEFT model
                        args  = TrainingArguments(
                                            output_dir                  = "./data/lora_sentiment_analysis",
                                            learning_rate               = 5e-4,
                                            per_device_train_batch_size = 1,
                                            per_device_eval_batch_size  = 1,
                                            num_train_epochs            = 3,
                                            weight_decay                = 0.01,
                                            evaluation_strategy         = "epoch",
                                            save_strategy               = "epoch",
                                            load_best_model_at_end      = True,
                                                 ),
                        train_dataset   = tokenized_df["train"],    # Ensure this contains input_ids
                        eval_dataset    = tokenized_df["test"],     # Ensure this contains input_ids
                        tokenizer       = tokenizer,
                        data_collator   = DataCollatorWithPadding(tokenizer=tokenizer),  # , return_tensors="pt"
                        compute_metrics = compute_metrics,
                       )

print("Starting PEFT training...")
peft_training_results = peft_trainer.train()

Starting PEFT training...


Epoch,Training Loss,Validation Loss,Accuracy
1,0.7609,1.084642,0.846
2,0.6759,1.018301,0.864
3,0.2799,0.961758,0.874


## 7. Evaluating the PEFT Model

In this section, the PEFT model is evaluated and its performance is compared to the original model. This comparison helps understand the effectiveness of the LoRA technique for this specific task.

The accuracy is measured on the test set and the improvement gained from using PEFT is calculated. This demonstrates whether the parameter-efficient approach can match or exceed the performance of the base model while updating fewer parameters.
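
Because the test subset has only 500 examples, it helps to put a rough uncertainty on the accuracy gap. The sketch below uses the binomial standard error and treats the two accuracy estimates as independent, which is a simplifying assumption: both models are scored on the same examples, so a paired test would be more appropriate. The function name is hypothetical and the accuracies are the values reported in the training outputs above.

```python
import math

def accuracy_std_error(accuracy, n_examples):
    """Binomial standard error of an accuracy estimate."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_examples)

n_test = 500            # size of the test subset used in this notebook
acc_head_only = 0.81    # trained base model accuracy reported above
acc_lora = 0.874        # LoRA model accuracy after epoch 3, reported above

gain = acc_lora - acc_head_only
se_gain = math.sqrt(accuracy_std_error(acc_head_only, n_test) ** 2
                    + accuracy_std_error(acc_lora, n_test) ** 2)
print(f"gain = {gain:.3f}, approx. standard error = {se_gain:.3f}")
```

Under this approximation the gap of 0.064 is a little under three standard errors (about 0.023), so it is suggestive for a single run rather than conclusive.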

In [16]:
# Evaluate PEFT model
print("Evaluating PEFT model:")
peft_model_results = peft_trainer.evaluate()
print("PEFT Model Results:", "\n")
peft_model_results

Evaluating PEFT model:


PEFT Model Results: 



{'eval_loss': 0.9617578983306885,
 'eval_accuracy': 0.874,
 'eval_runtime': 46.7877,
 'eval_samples_per_second': 10.687,
 'eval_steps_per_second': 10.687,
 'epoch': 3.0}

### Save the PEFT model

In [17]:
# Save the PEFT model
peft_model.save_pretrained("./data/peft_model_saved")
print("PEFT model saved to ./data/peft_model_saved")

# Save the tokenizer alongside the model
tokenizer.save_pretrained("./data/peft_model_saved")
print("Tokenizer saved with the model")

PEFT model saved to ./data/peft_model_saved
Tokenizer saved with the model


### Performance Comparison of All Models

In [18]:
# Performance Comparison of All Models
print("\n----- Performance Comparison of All Models -----")
print(f"Foundation Model (before training) Accuracy: {foundation_model_results['eval_accuracy']:.4f}")
print(f"Foundation Model (after training) Accuracy: {base_model_results['eval_accuracy']:.4f}")
print(f"PEFT Model Accuracy: {peft_model_results['eval_accuracy']:.4f}")
print(f"Improvement (PEFT vs. Foundation before training): {peft_model_results['eval_accuracy'] - foundation_model_results['eval_accuracy']:.4f}")
print(f"Improvement (PEFT vs. Foundation after training): {peft_model_results['eval_accuracy'] - base_model_results['eval_accuracy']:.4f}")


----- Performance Comparison of All Models -----
Foundation Model (before training) Accuracy: 0.2140
Foundation Model (after training) Accuracy: 0.8100
PEFT Model Accuracy: 0.8740
Improvement (PEFT vs. Foundation before training): 0.6600
Improvement (PEFT vs. Foundation after training): 0.0640


## 8. Loading the Saved PEFT Model for Inference

This section demonstrates how to load the saved PEFT model for inference on new text. This is a crucial step, as it shows how the fine-tuned model can be applied to real-world examples.

The sample texts cover different news categories to demonstrate the model's ability to classify various types of news articles. These predictions show that the PEFT-enhanced model can effectively categorize news text into the appropriate classes.
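
The inference function in this section returns only the top class. If a confidence score is also wanted, the logits can be passed through a softmax first. The following dependency-free sketch shows that step on made-up logits; `softmax` and `top_prediction` are hypothetical helpers, and the input list stands in for one row of `outputs.logits`.

```python
import math

id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Made-up logits standing in for one row of the model's output.
label, prob = top_prediction([0.2, -1.0, 3.1, 1.4])
print(label, round(prob, 3))
```

The argmax of the probabilities is the same as the argmax of the logits, so adding the softmax changes what is reported, not which class is chosen.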

In [19]:
# Load the saved PEFT model using the correct class
#loaded_peft_model = AutoPeftModelForSequenceClassification.from_pretrained("./data/peft_model_saved")

In [None]:
# First load the config to get information about the model
peft_config = PeftConfig.from_pretrained("./data/peft_model_saved")

# Load the PEFT model with the correct number of labels
loaded_peft_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "data/peft_model_saved",  # Adapter weights
    num_labels = 4,  # Ensure this matches your number of classes
    id2label   = id2label,
    label2id   = label2id
)

# Create a simple inference function
def predict_class(text):
    # Tokenize the input text
    inputs = tokenizer(text, 
                       padding=True, 
                       truncation=True, 
                       return_tensors="pt"
                      )
    
    # Get the prediction
    with torch.no_grad():
        outputs     = loaded_peft_model(**inputs)
        predictions = outputs.logits.argmax(dim=-1)
    
    # Return the predicted class
    return id2label[predictions.item()]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## 9. Performing Inference with a PEFT Model

This cell demonstrates an alternative loading path: build the base model explicitly, then attach the saved adapter with `PeftModel.from_pretrained`.

In [21]:


# First load the config
peft_config = PeftConfig.from_pretrained("./data/peft_model_saved")

# Load a base model with the correct number of labels
base_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=4,  # Explicitly set to match your saved model
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True
)
base_model.config.pad_token_id = tokenizer.eos_token_id

# Now load the PEFT adapter onto the correctly configured base model
# (PeftModel is already imported at the top of the notebook)
loaded_peft_model = PeftModel.from_pretrained(
    base_model,
    "./data/peft_model_saved",
    is_trainable=False
)

# Create inference function
def predict_class(text):
    # Tokenize the input text
    inputs = tokenizer(text, 
                      padding=True, 
                      truncation=True, 
                      return_tensors="pt"
                     )
    
    # Get the prediction
    with torch.no_grad():
        outputs = loaded_peft_model(**inputs)
        predictions = outputs.logits.argmax(dim=-1)
    
    # Return the predicted class
    return id2label[predictions.item()]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [22]:
# Test the model with sample texts
sample_texts = [
    "Wall Street edges higher as tech stocks gain",
    "Manchester United secure dramatic win in Premier League clash",
    "Scientists discover new exoplanet in habitable zone",
    "Federal Reserve holds interest rates steady amid economic uncertainty"
]

print("Model Predictions:")
for text in sample_texts:
    predicted_class = predict_class(text)
    print(f"Text: {text}\nPredicted Class: {predicted_class}\n")

Model Predictions:
Text: Wall Street edges higher as tech stocks gain
Predicted Class: Business

Text: Manchester United secure dramatic win in Premier League clash
Predicted Class: Sports

Text: Scientists discover new exoplanet in habitable zone
Predicted Class: Sci/Tech

Text: Federal Reserve holds interest rates steady amid economic uncertainty
Predicted Class: Business



## 10. Conclusion

This project successfully implemented Parameter-Efficient Fine-Tuning using LoRA on a GPT-2 model for news classification. The results demonstrate several key findings:

1. **Foundation vs Fine-tuned vs PEFT Performance**:
   - Foundation Model (no training): 21.4% accuracy
   - Foundation Model with only the classification head trained (backbone frozen): 81% accuracy
   - PEFT Model with LoRA: 87.4% accuracy

2. **Parameter Efficiency**: As shown by the `print_trainable_parameters()` output, the PEFT approach trained 817,152 of 125,256,960 parameters, about 0.65%, a small fraction of what full fine-tuning would update.

3. **Practical Applicability**: The inference examples show plausible labels for all four sample headlines: the Wall Street and Federal Reserve headlines were classified as Business, the Manchester United headline as Sports, and the exoplanet headline as Sci/Tech.

4. **Inference Deployment**: The code demonstrates two ways to load a saved PEFT model for inference: `AutoPeftModelForSequenceClassification.from_pretrained`, and `PeftModel.from_pretrained` on top of an explicitly configured base model. The sample predictions use the latter.

These results suggest that PEFT techniques like LoRA offer an efficient way to adapt large pre-trained models to specific tasks while maintaining or improving performance, although they come from a single run on a 500-example test subset. This approach is particularly valuable when working with limited computational resources or when multiple task-specific adaptations of a model are needed.

Future work could explore:
- Different PEFT techniques beyond LoRA
- Combining multiple PEFT adapters for different tasks
- Applying these techniques to larger models where the efficiency gains would be even more significant