<a href="https://colab.research.google.com/github/bmitch26/Airline-Sentiment-Analysis/blob/main/SentimentAnalysisLoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Parameter-Efficient Fine-Tuning (PEFT) of BERT Model with LoRA for Sentiment Analysis of Airline Reviews

The objective of this project is to fine-tune a pre-trained DistilBERT model using Low-Rank Adaptation (LoRA) techniques on a dataset of airline reviews. The primary goal is to classify the sentiment of the reviews (positive or negative) based on the review headers and overall ratings provided by passengers. By leveraging LoRA, the aim is to efficiently adapt the model with minimal additional parameters, specifically enhancing performance on a domain-specific sentiment analysis task (airline reviews). The project includes data preprocessing, model training, and evaluation to promote increased high accuracy and performance in predicting review sentiments.

Import Necessary Libraries

In [23]:
import pandas as pd
from transformers import (
    AutoTokenizer,
    AutoConfig,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer
)
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
from datasets import Dataset, load_metric
import torch
import numpy as np

Load dataset with pandas in csv format

In [25]:
df = pd.read_csv("BA_AirlineReviews.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,OverallRating,ReviewHeader,Name,Datetime,VerifiedReview,ReviewBody,TypeOfTraveller,SeatType,Route,DateFlown,SeatComfort,CabinStaffService,GroundService,ValueForMoney,Recommended,Aircraft,Food&Beverages,InflightEntertainment,Wifi&Connectivity
0,0,1.0,"""Service level far worse then Ryanair""",L Keele,19th November 2023,True,4 Hours before takeoff we received a Mail stat...,Couple Leisure,Economy Class,London to Stuttgart,November 2023,1.0,1.0,1.0,1.0,no,,,,
1,1,3.0,"""do not upgrade members based on status""",Austin Jones,19th November 2023,True,I recently had a delay on British Airways from...,Business,Economy Class,Brussels to London,November 2023,2.0,3.0,1.0,2.0,no,A320,1.0,2.0,2.0
2,2,8.0,"""Flight was smooth and quick""",M A Collie,16th November 2023,False,"Boarded on time, but it took ages to get to th...",Couple Leisure,Business Class,London Heathrow to Dublin,November 2023,3.0,3.0,4.0,3.0,yes,A320,4.0,,
3,3,1.0,"""Absolutely hopeless airline""",Nigel Dean,16th November 2023,True,"5 days before the flight, we were advised by B...",Couple Leisure,Economy Class,London to Dublin,December 2022,3.0,3.0,1.0,1.0,no,,,,
4,4,1.0,"""Customer Service is non existent""",Gaylynne Simpson,14th November 2023,False,"We traveled to Lisbon for our dream vacation, ...",Couple Leisure,Economy Class,London to Lisbon,November 2023,1.0,1.0,1.0,1.0,no,,1.0,1.0,1.0


In [27]:
#creating a new feature 'label' to classify ReviewHeaders as 0 for negative
# ratings or 1 for positive ratings to feed into the model
df['label'] = df['OverallRating'].apply(lambda x: 1 if x >= 4 else 0)
#extract relevant columns. For simplicity, I am feeding the ReviewHeader
# label columns into the model.
df = df[['ReviewHeader', 'label']]
df.head()

Unnamed: 0,ReviewHeader,label
0,"""Service level far worse then Ryanair""",0
1,"""do not upgrade members based on status""",0
2,"""Flight was smooth and quick""",1
3,"""Absolutely hopeless airline""",0
4,"""Customer Service is non existent""",0


Split dataset into training and evaluation (80% train and 20% evaluation)

This allows the model to learn from (be trained on) the majority of data while reserving the remaining 20% to performance evaluation.


In [None]:
train_df = df.sample(frac=0.8, random_state=42)
eval_df = df.drop(train_df.index)

Initialize the tokenizer

The tokenizer preprocesses text data by converting the review headers into token IDs. By converting the text, it changes to a form (in numbers) that the model can understand.

In [None]:
model_checkpoint = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

#tokenizing and truncating the data to cut off long sequences of text to help
# allow for rectangular tensors from batches of text with differing lengths.
def tokenize_function(example):
    return tokenizer(example['ReviewHeader'], truncation=True)

Convert pandas DataFrame to a Hugging Face Dataset format required by the Trainer API for compatibility

In [None]:
train_dataset = Dataset.from_pandas(train_df)
eval_dataset = Dataset.from_pandas(eval_df)

#creating the tokenized datasets by applying the tokenize_function to the
# training and evaluation datasets. This prepares the data to be inputted
# into the model
train_dataset = train_dataset.map(tokenize_function, batched=True)
eval_dataset = eval_dataset.map(tokenize_function, batched=True)

Define label maps

id2label and label2id mappings are set up to translate between the numerical labels and their corresponding string representations. This facilitates stronger model interpretation and output.

In [None]:
id2label = {0: "Negative", 1: "Positive"}
label2id = {"Negative": 0, "Positive": 1}

Load the model with LoRA configuration

The model is loaded with a pre-trained configuration and adjusted to include the LoRA setup. This optimizes the model for sequence classification by adding a smaller number of parameters to the model and freezing the rest. In turn, training the model is much less computationally expensive!

In [None]:
config = AutoConfig.from_pretrained(model_checkpoint, num_labels=2, id2label=id2label, label2id=label2id)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, config=config)

LoRA configuration

This specifies the dimensions and parameters of LoRA. The aim is improve training efficiency and model adaptability.

In [None]:
peft_config = LoraConfig(
    r=8,  #dimension of the low-rank update matrices
    lora_alpha=32,  # scaling factor for the update. Similar to a learning rate
    lora_dropout=0.1,  # dropout probability for the update
    bias="none",  #which biases to apply (only 3 possible configurations: none, all, lora_only)
)

#integration of the LoRA configuration with the base model
model = get_peft_model(model, peft_config)

Training Arguments

These arguments specify the training configuration for the model. This includes the output directory, evaluation strategy, learning rate, batch sizes, # epochs (iterations), and weight decay.

These arguments can vary widely to achieve the best performance. Here I just use some standard values for simplicity.

In [None]:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=6,
    weight_decay=0.01,
)

Data collator

The data collator dynamically pads the input sequences to the same length. this ensures efficient batching and processing during training and evaluation.

In [None]:
data_collator = DataCollatorWithPadding(tokenizer)

Define evaluation metrics

The are standard evaluation metrics for binary classification/sentiment analysis. It gives a measure of the model's performance on the test (evaluation) dataset.

In [None]:
accuracy_metric = load_metric("accuracy")
precision_metric = load_metric("precision")
recall_metric=load_metric("recall")
f1_metric= load_metric("f1")

#Computation/calculation of performance metrics
def compute_metrics(eval_pred):
  logits, labels = eval_pred
  predictions = np.argmax(logits, axis=-1)
  accuracy = accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"]
  precision = precision_metric.compute(predictions=predictions, references=labels, average="binary")["precision"]
  recall = recall_metric.compute(predictions=predictions, references=labels, average="binary")["recall"]
  f1 = f1_metric.compute(predictions=predictions, references=labels, average="binary")["f1"]
  return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

Define Trainer

Configuring the Trainer with various arguments. This includes the model, training arguments, datasets, tokenizer, data collator, and compute metrics.

Finally, the model is actually trained and the results on the test set of the data is printed.

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
#train the model
trainer.train()

#evaluate the model
eval_results = trainer.evaluate()

In [30]:
# Custom function to print results
def print_results(train_results, eval_results):
    print(f"Training Results:")
    print(f"Accuracy: {eval_results['eval_accuracy']:.2f}")
    print(f"Precision: {eval_results['eval_precision']:.2f}")
    print(f"Recall: {eval_results['eval_recall']:.2f}")
    print(f"F1-score: {eval_results['eval_f1']:.2f}\n")

    print("Evaluation Results:")
    print(f"Accuracy: {eval_results['eval_accuracy']:.2f}")
    print(f"Precision: {eval_results['eval_precision']:.2f}")
    print(f"Recall: {eval_results['eval_recall']:.2f}")
    print(f"F1-score: {eval_results['eval_f1']:.2f}\n")

# Print training evaluation results
print_results(eval_results, "Evaluation")

Training Results:
Accuracy: 0.96
Precision: 0.98
Recall: 0.98
F1-Score: 0.97

Evaluation Results:
Accuracy: 0.71
Precision: 0.73
Recall: 0.75
F1-Score: 0.71


Intrepretation of Current Results

The training results indicate that the model performs exceptionally well on the training dataset - however, the evaluation results show a significant drop in performance. The discrepancy clearly suggests that the model is overfitting, learning the training data patterns too well while failing to generalize to new, unseen data.

To mitigate this overfitting, I could further fine-tune the model's parameter arguments, such as reducing the number of epochs, adjusting the learning rate and weight decay among others. Additionally, I could implement/experiment with alternative approaches, such as prompt engineering, different adaptors, or transfer learning. However, my main goal in undertaking this project was not necessarily to achieve perfect results, but to learn through applied example how the process of fine-tuning LLM's towards a specific task or domain works!