# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: We chose the LoRA (Low-Rank Adaptation) technique for Parameter-Efficient Fine-Tuning (PEFT). LoRA reduces the number of trainable parameters by decomposing the weight updates into low-rank matrices. This approach is computationally efficient and allows fine-tuning large language models with limited resources.
* Model: We use GPT-2 with a sequence-classification head. GPT-2 is a pre-trained language model that adapts well to a range of NLP tasks; here we reuse its pre-trained knowledge and adapt it to classify whether a query is well-formed.
* Evaluation approach: The evaluation approach involves using the Hugging Face Trainer's evaluate method. This method provides a comprehensive evaluation framework, including metrics computation. The primary metric for this project is accuracy, which measures the proportion of correctly classified queries.
* Fine-tuning dataset: The dataset used for fine-tuning is the "Google Query Wellformedness" dataset. This dataset consists of queries annotated for well-formedness, providing a binary classification task. Each query is rated on a scale from 0 to 1, and for this project, ratings are converted to binary labels, with 1 indicating a well-formed query (rating > 0.5) and 0 indicating a poorly-formed query (rating ≤ 0.5). This dataset is suitable for adapting GPT-2 to understand and classify the well-formedness of queries accurately.
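
As a rough illustration of the low-rank idea described in the PEFT bullet above, the sketch below applies a frozen weight matrix plus a scaled rank-`r` update, `y = W x + (alpha / r) * B (A x)`. The helper names (`matvec`, `lora_forward`) and the toy dimensions are invented for illustration; this is not how the `peft` library implements LoRA internally.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank update B @ (A @ x)."""
    base = matvec(W, x)                 # frozen pre-trained weights
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 8  # toy sizes, chosen only for illustration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]  # d_out x r, starts at zero

x = [1.0] * d_in
# With B at zero the adapted layer reproduces the frozen layer exactly.
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"full update: {full_params} params, LoRA update: {lora_params} params")
```

At these toy sizes the saving is small, but it grows quickly with layer width because the adapter cost scales with `r * (d_in + d_out)` rather than `d_in * d_out`.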



## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
%pip install -U scikit-learn
%config Completer.use_jedi = False

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Import necessary libraries
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from transformers import DataCollatorWithPadding  # Import data collator for padding
from datasets import load_dataset  # Import function to load dataset
from sklearn.metrics import accuracy_score  # Import accuracy score function from scikit-learn
import numpy as np  # Import NumPy library for numerical operations
import pandas as pd  # Import Pandas library for data manipulation


In [3]:
# Define the dataset splits to load
splits = ["train", "validation", "test"]
# Load the dataset for each split using the load_dataset function from the datasets library
# The dataset used here is "google-research-datasets/google_wellformed_query"
# It contains data for training, validation, and testing
# The splits variable specifies which splits to load
dataset = {split: load_dataset("google-research-datasets/google_wellformed_query", split=split) for split in splits}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


In [4]:
# Iterate through each split in the dataset
for split in dataset:
    # Print a descriptive message indicating the split being processed
    print(f"First example from {split}:")
    # Print the first example from the current split
    print(dataset[split][0])
    # Print an empty line for clarity between examples
    print()

First example from train:
{'rating': 0.20000000298023224, 'content': 'The European Union includes how many ?'}

First example from validation:
{'rating': 1.0, 'content': 'Who discovered x-rays in 1885 ?'}

First example from test:
{'rating': 0.4000000059604645, 'content': 'Interesting facts about Egypt ?'}



In [5]:
def analyze_query_statistics(dataset, subset_name):
    """
    Retrieves and prints statistics for a specified segment of a dataset.
    
    Args:
    dataset (Dataset): The dataset object which contains subsets like 'train', 'validation', or 'test'.
    subset_name (str): The key for the subset to analyze, e.g., 'train'.

    Outputs:
    Prints statistics about the number of queries, the length of queries, and the distribution of ratings.
    """
    # Access the specified subset of the dataset
    subset = dataset[subset_name]

    # Print the total number of queries in the subset
    print(f"Total queries in {subset_name} subset:", subset.num_rows)
    
    # Determine and display the longest and shortest query
    max_query_length = max(len(query['content']) for query in subset)
    min_query_length = min(len(query['content']) for query in subset)
    print(f"Longest query in {subset_name} subset has {max_query_length} characters")
    print(f"Shortest query in {subset_name} subset has {min_query_length} characters")
    
    # Identify and display unique rating values
    unique_ratings = set(subset['rating'])
    print(f"Unique ratings in {subset_name} subset:", unique_ratings)
    
    # Calculate and display the frequency and percentage distribution of ratings
    print("Rating distribution in the subset:")
    rating_frequencies = {rating: sum(1 for item in subset['rating'] if item == rating) for rating in unique_ratings}
    total_queries = subset.num_rows
    rating_percentages = {rating: (count / total_queries * 100) for rating, count in rating_frequencies.items()}
    
    for rating, percentage in rating_percentages.items():
        print(f"- Rating {rating}: {round(percentage, 2)}%")


In [6]:
# Obtain statistics for the train subset
analyze_query_statistics(dataset=dataset, subset_name='train')

Total queries in train subset: 17500
Longest query in train subset has 200 characters
Shortest query in train subset has 10 characters
Unique ratings in train subset: {0.20000000298023224, 0.4000000059604645, 0.6000000238418579, 1.0, 0.800000011920929, 0.0, 0.1666666716337204, 0.8333333134651184, 0.6666666865348816, 0.5, 0.3333333432674408}
Rating distribution in the subset:
- Rating 0.20000000298023224: 15.88%
- Rating 0.4000000059604645: 11.89%
- Rating 0.6000000238418579: 11.44%
- Rating 1.0: 23.94%
- Rating 0.800000011920929: 14.73%
- Rating 0.0: 21.56%
- Rating 0.1666666716337204: 0.11%
- Rating 0.8333333134651184: 0.18%
- Rating 0.6666666865348816: 0.13%
- Rating 0.5: 0.07%
- Rating 0.3333333432674408: 0.07%
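
To see what the rating distribution above implies for the binary task, the following sketch sums the printed percentages on each side of the 0.5 threshold used in this project. The percentages are the rounded values from the output above, so the totals are approximate.

```python
# Rating percentages copied from the train-split output above (rounded values).
rating_pct = {
    0.0: 21.56, 0.1667: 0.11, 0.2: 15.88, 0.3333: 0.07, 0.4: 11.89, 0.5: 0.07,
    0.6: 11.44, 0.6667: 0.13, 0.8: 14.73, 0.8333: 0.18, 1.0: 23.94,
}

THRESHOLD = 0.5  # same cut-off the project uses for binary labels

positive = sum(pct for rating, pct in rating_pct.items() if rating > THRESHOLD)
negative = sum(pct for rating, pct in rating_pct.items() if rating <= THRESHOLD)

print(f"well-formed (rating > {THRESHOLD}): {positive:.2f}%")
print(f"not well-formed (rating <= {THRESHOLD}): {negative:.2f}%")
```

The two classes come out at roughly 50.4% and 49.6% of the training split, so the binary task is close to balanced and accuracy is a reasonable primary metric.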


### Load Tokenizer and tokenize the dataset

In [7]:
# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('gpt2')

In [8]:
# GPT-2 has no pad token by default, so reuse the EOS (end-of-sequence) token for padding
tokenizer.pad_token = tokenizer.eos_token

In [9]:
def preprocess_data(entries):
    """
    Tokenizes text data from a dataset using a specified tokenizer.
    This function adjusts the length of each text entry by truncating longer texts
    and padding shorter ones to a uniform length, ensuring all sequences are of the same length.

    Args:
    entries (dict): A dictionary where the text data is stored under the 'content' key.

    Returns:
    dict: Contains the tokenized text with padding and truncation applied.
    """
    # Tokenize the text data, ensuring all sequences are of the same length
    return tokenizer(entries['content'], padding='max_length', truncation=True, max_length=128)
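
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a dependency-free sketch of what happens to one sequence of token IDs. The function `pad_or_truncate` is a hypothetical helper written for this illustration, not part of the tokenizer API, and it assumes right-side padding as seen in the tokenized output of this notebook.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                   # truncation
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad             # right padding
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask

PAD_ID = 50256  # GPT-2's EOS token id, reused as the pad token in this project
ids, mask = pad_or_truncate(
    [464, 3427, 4479, 3407, 703, 867, 5633], max_length=10, pad_id=PAD_ID
)
print(ids)   # [464, 3427, 4479, 3407, 703, 867, 5633, 50256, 50256, 50256]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

The attention mask is what lets the model ignore the padded positions, which matters here because most queries are far shorter than 128 tokens.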


In [15]:
# Assuming 'dataset' is a dictionary of datasets for each split and 'splits' is a list of these split names
tokenized_datasets = {}
for segment in splits:
    # Tokenize each part of the dataset using the defined preprocessing function.
    # The 'map' function applies this preprocessing in batches for improved performance.
    tokenized_datasets[segment] = dataset[segment].map(preprocess_data, batched=True)

Map:   0%|          | 0/17500 [00:00<?, ? examples/s]

Map:   0%|          | 0/3750 [00:00<?, ? examples/s]

Map:   0%|          | 0/3850 [00:00<?, ? examples/s]

In [16]:
# Check if the tokenized data contains the 'content' key and adapt accordingly
if 'content' in tokenized_datasets['train'][0]:
    print("Original text:")
    print(tokenized_datasets['train'][0]['content'])

# Always available after tokenization as this is what the tokenizer produces
print("Tokenized input IDs:")
print(tokenized_datasets['train'][0]['input_ids'])

Original text:
The European Union includes how many ?
Tokenized input IDs:
[464, 3427, 4479, 3407, 703, 867, 5633, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256]


In [17]:
# Ensure you're using the correct variable and key to access the data
if 'content' in tokenized_datasets['validation'][0]:
    print("Original text from validation set:")
    print(tokenized_datasets['validation'][0]['content'])

print("Tokenized input IDs from validation set:")
print(tokenized_datasets['validation'][0]['input_ids'])

Original text from validation set:
Who discovered x-rays in 1885 ?
Tokenized input IDs from validation set:
[8241, 5071, 2124, 12, 20477, 287, 1248, 5332, 5633, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256]


In [18]:
def add_binary_labels(example):
    """
    Adds binary labels to examples based on a rating threshold.

    Args:
    example (dict): A dictionary containing the example data, including the 'rating'.

    Returns:
    dict: The example with an additional 'labels' field indicating the binary label.
    """
    # Assuming the rating is under 'rating' key
    example['labels'] = 1 if example['rating'] > 0.5 else 0
    return example

# Apply the transformation to each tokenized split so the Trainer receives a 'labels' column
tokenized_datasets = {split: ds.map(add_binary_labels) for split, ds in tokenized_datasets.items()}
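
The threshold rule in `add_binary_labels` can be checked in isolation. The sketch below restates it as a standalone function (`rating_to_label` is a name chosen for this example) and applies it to the first-example ratings printed earlier, plus the boundary value 0.5.

```python
def rating_to_label(rating, threshold=0.5):
    """Map a well-formedness rating in [0, 1] to a binary label."""
    return 1 if rating > threshold else 0

# First-example ratings printed earlier for train, validation and test,
# plus the boundary value 0.5.
for rating in (0.20000000298023224, 1.0, 0.4000000059604645, 0.5):
    print(rating, "->", rating_to_label(rating))
```

Note that a rating of exactly 0.5 falls on the negative side because the comparison is strict.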


Map:   0%|          | 0/17500 [00:00<?, ? examples/s]

Map:   0%|          | 0/3750 [00:00<?, ? examples/s]

Map:   0%|          | 0/3850 [00:00<?, ? examples/s]

In [19]:
from transformers import AutoModelForSequenceClassification

# Load GPT-2 pre-trained model configured for binary classification
model = AutoModelForSequenceClassification.from_pretrained('gpt2', num_labels=2,
                                                           id2label={0: 'NEGATIVE', 1: 'POSITIVE'},
                                                           label2id={'NEGATIVE': 0, 'POSITIVE': 1})


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [20]:
# Set the EOS token as the padding token
tokenizer.pad_token = tokenizer.eos_token

In [21]:
# Set the model's pad token id to match the tokenizer's pad token id
model.config.pad_token_id = tokenizer.pad_token_id
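
The reason the model config needs a `pad_token_id` is that GPT-2's sequence-classification head reads the hidden state of the last non-padding token in each row. The sketch below shows that lookup in simplified form; `last_non_pad_index` is a hypothetical helper for illustration and not the library's actual code.

```python
def last_non_pad_index(input_ids, pad_token_id):
    """Index of the final token that is not padding (assumes right padding)."""
    for position in range(len(input_ids) - 1, -1, -1):
        if input_ids[position] != pad_token_id:
            return position
    return -1  # row consists only of padding

PAD_ID = 50256
row = [464, 3427, 4479, 3407, 703, 867, 5633] + [PAD_ID] * 121  # padded to 128
print(last_non_pad_index(row, PAD_ID))  # 6
```

Because the pad token is the EOS token here, a genuine trailing EOS would also be skipped by this lookup; the queries in this dataset are tokenized without one, so that does not arise.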

In [22]:
# Freeze all the parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False

In [23]:
# check model architecture
print(model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)


In [24]:
# Inspect the classification head (`score`): the newly initialized layer that maps GPT-2 hidden states to the two well-formedness classes.
model.score

Linear(in_features=768, out_features=2, bias=False)

In [25]:
# Import necessary libraries
from transformers import Trainer, TrainingArguments, DataCollatorWithPadding
import numpy as np

def calculate_accuracy(eval_pred):
    """
    Calculate the accuracy of a model's predictions. This function is intended to be used as a metric for the Hugging Face Trainer.

    Args:
    eval_pred (tuple): A tuple containing model predictions and true labels. Predictions are typically provided as logits.

    Returns:
    dict: A dictionary containing the computed mean accuracy of the model.
    """
    # Unpack the tuple into predictions and actual labels
    logits, true_labels = eval_pred

    # Convert logits to predicted class labels
    predicted_labels = np.argmax(logits, axis=1)

    # Compute the accuracy: proportion of correct predictions
    accuracy = np.mean(predicted_labels == true_labels)

    # Return the accuracy as a dictionary
    return {"accuracy": accuracy}
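
As a quick sanity check of the metric, here is a dependency-free restatement of the same argmax-then-compare logic on a handful of made-up logits. The names `argmax` and `accuracy_from_logits` are chosen for this example only.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits for four queries, two classes each.
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.2], [1.2, 0.4]]
labels = [0, 1, 0, 0]
print(accuracy_from_logits(logits, labels))  # 0.75
```

Three of the four predictions match their labels, which gives the 0.75 printed above.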

In [27]:
# Define training arguments
training_args = TrainingArguments(
    output_dir="./model_output",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True
)

In [28]:
# Initialize the Trainer with the specified configuration
pretrain_trainer = Trainer(
    model=model,  # The pre-trained model to be used for training
    args=training_args,  # Training arguments like learning rate, batch size, etc.
    train_dataset=tokenized_datasets['train'],  # Training dataset
    eval_dataset=tokenized_datasets['validation'],  # Evaluation dataset
    tokenizer=tokenizer,  # Tokenizer for tokenizing inputs
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Data collator for padding sequences
    compute_metrics=calculate_accuracy,  # Function to compute evaluation metrics
)




In [29]:
# Evaluate the model on the validation set before fine-tuning
pretrain_results = pretrain_trainer.evaluate()

# Print the evaluation results before fine-tuning
print("Evaluation results before fine-tuning:", pretrain_results)


Evaluation results before fine-tuning: {'eval_loss': 1.6033490896224976, 'eval_accuracy': 0.5058666666666667, 'eval_runtime': 22.4814, 'eval_samples_per_second': 166.804, 'eval_steps_per_second': 10.453}


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [30]:
%pip install peft

Note: you may need to restart the kernel to use updated packages.


In [33]:
# Create a PEFT Config for LoRA
from peft import LoraConfig, get_peft_model, TaskType

In [34]:
config = LoraConfig(
    r=8,  # Rank
    lora_alpha=32,
    target_modules=['c_attn', 'c_proj'],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS
)

In [35]:
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()

trainable params: 812,544 || all params: 125,253,888 || trainable%: 0.6487
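
One way to account for the trainable-parameter figure above is to add up the adapter matrices by hand. The sketch assumes that the `c_proj` entry in `target_modules` matches both the attention and the MLP output projections, and that the `score` head stays trainable for `TaskType.SEQ_CLS`; both are our reading of the setup, and they reproduce the reported total.

```python
HIDDEN = 768     # GPT-2 hidden size (see the printed architecture)
LAYERS = 12      # number of GPT2Block modules
RANK = 8         # r in LoraConfig
NUM_LABELS = 2

def lora_params(in_features, out_features, rank=RANK):
    """Parameters in one adapter: A is rank x in, B is out x rank."""
    return rank * in_features + out_features * rank

per_block = (
    lora_params(HIDDEN, 3 * HIDDEN)    # attn.c_attn: 768 -> 2304
    + lora_params(HIDDEN, HIDDEN)      # attn.c_proj: 768 -> 768
    + lora_params(4 * HIDDEN, HIDDEN)  # mlp.c_proj: 3072 -> 768 (assumed match)
)
score_head = HIDDEN * NUM_LABELS       # classification head, no bias

trainable = LAYERS * per_block + score_head
print(f"{trainable:,}")  # 812,544
```

Each block contributes 67,584 adapter parameters, and the 1,536 weights of the classification head make up the remainder.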




In [37]:
# Define training arguments specifying various parameters
training_args = TrainingArguments(
    output_dir="./lora_model_output",  # Directory to save model outputs
    learning_rate=2e-5,  # Learning rate for optimization
    per_device_train_batch_size=32,  # Batch size for training per device
    per_device_eval_batch_size=32,  # Batch size for evaluation per device
    num_train_epochs=10,  # Number of training epochs
    weight_decay=0.01,  # Weight decay for regularization
    evaluation_strategy="epoch",  # Evaluation is performed at the end of each epoch
    save_strategy="epoch",  # Model is saved at the end of each epoch
    load_best_model_at_end=True,  # Load the best model at the end of training
    logging_dir='./logs',  # Directory for logging metrics and/or losses during training
)


In [38]:
# Initialize the Trainer for training the model
trainer = Trainer(
    model=peft_model,  # The model to be trained
    args=training_args,  # Training arguments
    train_dataset=tokenized_datasets["train"],  # Training dataset
    eval_dataset=tokenized_datasets["validation"],  # Evaluation dataset
    tokenizer=tokenizer,  # Tokenizer for encoding the data
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Data collator for padding the batches
    compute_metrics=calculate_accuracy,  # Function to compute evaluation metrics
)


In [39]:
# Start the training process
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6915,0.661444,0.6464
2,0.5852,0.566611,0.691467
3,0.556,0.565562,0.705867
4,0.5408,0.554272,0.710133
5,0.5238,0.530185,0.7264
6,0.5238,0.53621,0.723467
7,0.5094,0.508826,0.741333
8,0.512,0.534256,0.731733
9,0.5008,0.530395,0.733867
10,0.5013,0.526986,0.734933


TrainOutput(global_step=5470, training_loss=0.540597933084045, metrics={'train_runtime': 8051.4765, 'train_samples_per_second': 21.735, 'train_steps_per_second': 0.679, 'total_flos': 1.1540938752e+16, 'train_loss': 0.540597933084045, 'epoch': 10.0})

### Training Summary

**Global Steps:** 5470  
**Average Training Loss:** 0.540597933084045

**Metrics:**
- **Training Runtime:** 8051.4765 seconds
- **Training Samples Per Second:** 21.735
- **Training Steps Per Second:** 0.679
- **Total FLOPs:** 1.1540938752e+16

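Validation accuracy rose from 0.646 after the first epoch to a peak of 0.741 at epoch 7, then plateaued around 0.73 for the remaining epochs. Training loss fell from 0.69 to 0.50, and validation loss fell from 0.66 to a minimum of 0.509 at epoch 7. Most of the improvement therefore came in the first seven epochs, with no further gain on the validation set afterwards.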
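
The step count and the best checkpoint can be reproduced from the numbers in the training log. The sketch below assumes a single device and no gradient accumulation, which is consistent with the reported `global_step`.

```python
import math

TRAIN_EXAMPLES = 17500
BATCH_SIZE = 32   # per_device_train_batch_size; assumes a single device
EPOCHS = 10

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
print(steps_per_epoch, steps_per_epoch * EPOCHS)  # 547 5470

# (epoch, validation loss, accuracy) copied from the training log above.
log = [
    (1, 0.661444, 0.6464), (2, 0.566611, 0.691467), (3, 0.565562, 0.705867),
    (4, 0.554272, 0.710133), (5, 0.530185, 0.7264), (6, 0.53621, 0.723467),
    (7, 0.508826, 0.741333), (8, 0.534256, 0.731733), (9, 0.530395, 0.733867),
    (10, 0.526986, 0.734933),
]
best_epoch, best_loss, best_acc = min(log, key=lambda row: row[1])
print(best_epoch, best_loss, best_acc)  # 7 0.508826 0.741333
```

With `load_best_model_at_end=True`, the Trainer selects on validation loss by default, so the epoch 7 checkpoint is the one restored at the end of training.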

In [40]:
# Save fine tuned PEFT model
peft_model.save_pretrained("gpt-lora")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [41]:
import torch
from peft import AutoPeftModelForSequenceClassification

NUM_LABELS = 2
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

lora_model = AutoPeftModelForSequenceClassification.from_pretrained("gpt-lora", num_labels=NUM_LABELS, ignore_mismatched_sizes=True).to(device)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [42]:
# Set the model's pad token id to match the tokenizer's pad token id
lora_model.config.pad_token_id = tokenizer.pad_token_id

In [43]:
# Define training arguments for the HuggingFace Trainer
training_args = TrainingArguments(
    output_dir="./data/sentiment_analysis",  # Directory to save model outputs
    learning_rate=2e-5,  # Learning rate for the optimizer
    per_device_train_batch_size=32,  # Batch size for training per device
    per_device_eval_batch_size=32,  # Batch size for evaluation per device
    num_train_epochs=10,  # Number of training epochs
    weight_decay=0.01,  # Weight decay for regularization
    evaluation_strategy="epoch",  # Evaluation is performed at the end of each epoch
    save_strategy="epoch",  # Save model at the end of each epoch
    load_best_model_at_end=True,  # Load the best model at the end of training
)

# Initialize a Trainer to evaluate the reloaded PEFT model
finetuned_trainer = Trainer(
    model=lora_model,  # The fine-tuned PEFT model
    args=training_args,  # Training arguments
    train_dataset=tokenized_datasets["train"],  # Training dataset
    eval_dataset=tokenized_datasets["validation"],  # Evaluation dataset
    tokenizer=tokenizer,  # Tokenizer for encoding the data
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Data collator for padding the batches
    compute_metrics=calculate_accuracy,  # Function to compute evaluation metrics
)




In [44]:
# Evaluate the fine-tuned model on the validation set
finetuned_results = finetuned_trainer.evaluate()

# Print the evaluation results for the fine-tuned model
print("Evaluation results for the fine-tuned model:", finetuned_results)

Evaluation results for the fine-tuned model: {'eval_loss': 0.5088258385658264, 'eval_accuracy': 0.7413333333333333, 'eval_runtime': 31.4576, 'eval_samples_per_second': 119.208, 'eval_steps_per_second': 3.751}
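
To compare these results with the evaluation before fine-tuning, the sketch below puts the two sets of validation metrics side by side. All values are copied from the two `evaluate()` outputs in this notebook, and the validation set size (3750) comes from the tokenization progress output.

```python
# Validation metrics copied from the two evaluate() outputs in this notebook.
before = {"eval_loss": 1.6033490896224976, "eval_accuracy": 0.5058666666666667}
after = {"eval_loss": 0.5088258385658264, "eval_accuracy": 0.7413333333333333}
N_VALIDATION = 3750  # size of the validation split

accuracy_gain = after["eval_accuracy"] - before["eval_accuracy"]
loss_drop = before["eval_loss"] - after["eval_loss"]
extra_correct = round(accuracy_gain * N_VALIDATION)

print(f"accuracy: {before['eval_accuracy']:.4f} -> {after['eval_accuracy']:.4f} "
      f"(+{accuracy_gain * 100:.2f} percentage points)")
print(f"loss:     {before['eval_loss']:.4f} -> {after['eval_loss']:.4f} "
      f"(-{loss_drop:.4f})")
print(f"additional correct predictions: {extra_correct}")
```

The untuned model sits near chance on this roughly balanced task, while the LoRA-tuned model gains about 23.5 percentage points of accuracy, or 883 more correctly classified validation queries.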


In [45]:
# Function to perform inference on a single example
def perform_inference(text):
    # Preprocess the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}  # Move inputs to the same device as the model

    # Perform inference
    with torch.no_grad():
        outputs = lora_model(**inputs)

    # Get the predicted class
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

    # Map the predicted class to the corresponding label
    id2label = {0: "NEGATIVE", 1: "POSITIVE"}
    predicted_label = id2label[predicted_class]

    return predicted_label

In [48]:
# Sample text for inference (first test example; its rating of 0.4 maps to the NEGATIVE label)
sample_text = "Interesting facts about Egypt ?"

In [49]:
# Perform inference on the sample text
predicted_label = perform_inference(sample_text)
print(f"Predicted label: {predicted_label}")

Predicted label: NEGATIVE
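
A predicted label on its own hides how confident the model is. As a small sketch, the logits can be turned into class probabilities with a softmax; the logits below are hypothetical, since the real values would come from `outputs.logits` inside `perform_inference`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

# Hypothetical logits for one query; real values come from outputs.logits.
example_logits = [0.8, -0.3]
probabilities = softmax(example_logits)
best = max(range(len(probabilities)), key=probabilities.__getitem__)
print(ID2LABEL[best], round(probabilities[best], 3))
```

Reporting the probability alongside the label makes borderline queries, such as those rated near 0.5, easier to spot.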


## Advanced Option: Applying Quantization-Aware Training (QAT)

Quantization can help in reducing the model size and speeding up inference. Below is an advanced script that incorporates quantization for a LoRA fine-tuned model.
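
Quantization-aware training works by simulating integer rounding during training, so the weights can adapt to the rounding error before the model is actually converted. The sketch below shows the quantize-then-dequantize ("fake quantization") step for 8-bit affine quantization on a few made-up weights. It illustrates the idea only and is not PyTorch's implementation.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to unsigned integers and map them back to floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0               # avoid a zero scale
    zero_point = round(qmin - lo / scale)

    quantized = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    dequantized = [(q - zero_point) * scale for q in quantized]
    return quantized, dequantized

weights = [-0.42, -0.1, 0.0, 0.27, 0.63]  # made-up weights for illustration
q, dq = fake_quantize(weights)
max_error = max(abs(w - d) for w, d in zip(weights, dq))
print(q)
print(f"max round-trip error: {max_error:.5f}")
```

Each weight is stored as one byte instead of four, at the cost of a small round-trip error bounded by about half the quantization step.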

In [52]:
# Save the fine-tuned model
peft_model.save_pretrained("./lora_model_output")

In [53]:
# Load the fine-tuned model for quantization
model = AutoModelForSequenceClassification.from_pretrained("./lora_model_output")
model.config.pad_token_id = tokenizer.pad_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [55]:
# Set the model to training mode
model.train()

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): lora.Linear(
            (base_layer): Conv1D()
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=768, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=2304, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
          (c_proj): lora.Linear(
            (base_layer): Conv1D()
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1

In [56]:
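# Prepare for quantization.
# prepare_qat only instruments modules that have a `qconfig` set; none is assigned here,
# so set one first (e.g. via torch.quantization.get_default_qat_qconfig) if fake-quantization is intended.
model = torch.quantization.prepare_qat(model)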



In [57]:
# Fine-tune the model with quantization-aware training
training_args.num_train_epochs = 1  # Fine-tune for one more epoch with QAT

In [60]:
# Initialize the Trainer
trainer = Trainer(
    model=peft_model,  # The PEFT model for training.
    args=training_args,  # Training arguments, defined previously.
    train_dataset=tokenized_datasets["train"],  # Tokenized training dataset.
    eval_dataset=tokenized_datasets["validation"],  # Tokenized validation dataset.
    tokenizer=tokenizer,  # The tokenizer used for encoding the data.
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Data collator for padding sequences.
    compute_metrics=calculate_accuracy,  # Function to compute evaluation metrics.
)


In [61]:
# Train the model with QAT
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.5052,0.533796,0.737067


TrainOutput(global_step=547, training_loss=0.5042334492071455, metrics={'train_runtime': 1993.4159, 'train_samples_per_second': 8.779, 'train_steps_per_second': 0.274, 'total_flos': 1154093875200000.0, 'train_loss': 0.5042334492071455, 'epoch': 1.0})

In [62]:
# Convert to quantized model
quantized_model = torch.quantization.convert(model.eval())

In [63]:
# Save the quantized model
torch.save(quantized_model.state_dict(), "./quantized_model.pth")

In [64]:
# Function to perform inference on a single example using the quantized model
def perform_inference(text, quantized_model):
    # Preprocess the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}  # Move inputs to the same device as the model

    # Perform inference
    with torch.no_grad():
        outputs = quantized_model(**inputs)

    # Get the predicted class
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

    # Map the predicted class to the corresponding label
    id2label = {0: "NEGATIVE", 1: "POSITIVE"}
    predicted_label = id2label[predicted_class]

    return predicted_label

In [65]:
# Load the quantized model for inference
quantized_model.load_state_dict(torch.load("./quantized_model.pth"))
quantized_model.to(device)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): lora.Linear(
            (base_layer): Conv1D()
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=768, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=2304, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
          (c_proj): lora.Linear(
            (base_layer): Conv1D()
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.1

In [66]:
# Sample text for inference
sample_text = "The European Union includes how many countries?"

In [67]:
# Perform inference on the sample text
predicted_label = perform_inference(sample_text, quantized_model)
print(f"Predicted label: {predicted_label}")

Predicted label: POSITIVE
