# Fine-Tuning Falcon 1.3B Model for Binary Classification
This notebook walks through fine-tuning a 1.3B-parameter Falcon model (`tiiuae/falcon-rw-1b`) for binary classification on the GLUE MRPC dataset. The goal is to show how to adapt a large pre-trained model to a specific NLP task: deciding whether two sentences are semantically equivalent.

## Learning Objectives:
1. Learn how to load and adapt a large pre-trained language model using PyTorch and the Transformers library.
2. Understand the process of creating a custom classification layer on top of the base model.
3. Explore dataset preprocessing and preparation for model training.
4. Implement the training and evaluation process for the model.
5. Save the fine-tuned model for future inference tasks.


# Importing Required Libraries
This cell imports essential libraries for our task. We include PyTorch for model building and training, the Transformers library for accessing pre-trained models and tokenizers, and the `datasets` library for loading the GLUE MRPC dataset.


In [19]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torch.optim import AdamW
from torch.nn.functional import cross_entropy
from transformers import AutoTokenizer, AutoModel
from datasets import load_dataset

# Loading the Model and Tokenizer
Here, we load the Falcon 1.3B model and its tokenizer. Tokenizers for decoder-only models such as this one typically ship without a padding token, so we reuse the EOS (end-of-sequence) token for padding. Without a padding token, padding inputs to a fixed length would fail.


In [None]:
model_name = "tiiuae/falcon-rw-1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
base_model = AutoModel.from_pretrained(model_name)

# Defining a Custom Classifier
In this cell, we define `CustomClassifier`, a module that wraps the base Falcon model and adds a classification head: dropout followed by a linear layer that maps a hidden state to the two labels of our binary task. The `forward` method defines how inputs flow through the base model and then through the head.


In [None]:
class CustomClassifier(nn.Module):
    def __init__(self, base_model, num_labels):
        super().__init__()
        self.base_model = base_model
        self.classifier = nn.Sequential(
            nn.Dropout(0.1),
            nn.Linear(self.base_model.config.hidden_size, num_labels)
        )

    def forward(self, input_ids, attention_mask):
        # Get the outputs from the base model
        outputs = self.base_model(input_ids, attention_mask=attention_mask)

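        # Extract the last hidden state: [batch_size, seq_len, hidden_size]
        last_hidden_state = outputs.last_hidden_state

        # Falcon is a causal decoder with no CLS token: position 0 attends only to
        # itself, so pool the last non-padding token instead (assumes right padding).
        last_token_idx = attention_mask.sum(dim=1) - 1
        batch_idx = torch.arange(input_ids.size(0), device=input_ids.device)
        pooled_state = last_hidden_state[batch_idx, last_token_idx]  # Shape: [batch_size, hidden_size]

        # Pass through the classifier
        logits = self.classifier(pooled_state)
        return logits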

num_labels = 2  # Adjust as per your task
model = CustomClassifier(base_model, num_labels)
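
Because Falcon is a causal decoder, each position attends only to earlier positions, so the last non-padding token is the one whose hidden state has seen the whole sentence pair. Below is a minimal sketch of selecting that position from an attention mask. It uses dummy tensors rather than real model outputs and assumes right-side padding; all values are illustrative.

```python
import torch

# Dummy batch: 2 sequences, 4 positions, hidden size 3 (illustrative values only).
last_hidden_state = torch.arange(24, dtype=torch.float32).reshape(2, 4, 3)

# 1 marks real tokens, 0 marks right-side padding.
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])

# Index of the last real token in each sequence (assumes right padding).
last_token_idx = attention_mask.sum(dim=1) - 1
batch_idx = torch.arange(last_hidden_state.size(0))

# Advanced indexing picks one position per sequence: [batch_size, hidden_size].
pooled = last_hidden_state[batch_idx, last_token_idx]
print(pooled.shape)  # torch.Size([2, 3])
```

If the tokenizer were configured for left padding, the last real token would always sit at position `-1` and this index arithmetic would not be needed.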

# Dataset Preparation
We load the GLUE MRPC dataset and prepare a custom PyTorch Dataset, `GlueDataset`. This class handles tokenization and prepares the input data and labels for training and evaluation. We create two instances of this dataset for training and validation purposes.


In [None]:
raw_datasets = load_dataset("glue", "mrpc")  # Example dataset

class GlueDataset(Dataset):
    def __init__(self, tokenizer, split, max_length=128):
        # `split` is a single dataset split, e.g. raw_datasets['train']
        self.tokenizer = tokenizer
        self.inputs = []
        self.labels = []

        for sentence1, sentence2, label in zip(split['sentence1'], split['sentence2'], split['label']):
            # Tokenize the sentence pair and pad/truncate it to a fixed length
            tokenized_input = self.tokenizer(sentence1, sentence2, padding='max_length', truncation=True, max_length=max_length, return_tensors="pt")
            self.inputs.append(tokenized_input)
            self.labels.append(label)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        item = {key: val.squeeze(0) for key, val in self.inputs[idx].items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

train_dataset = GlueDataset(tokenizer, raw_datasets['train'])
eval_dataset = GlueDataset(tokenizer, raw_datasets['validation'])
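
The `squeeze(0)` in `__getitem__` is needed because tokenizing a single example with `return_tensors="pt"` yields tensors with a leading dimension of size 1. A small sketch of the resulting item shapes follows. It uses stand-in tensors instead of a real tokenizer call, and `max_length = 8` is an illustrative value, not the notebook's setting.

```python
import torch

max_length = 8  # illustrative; the notebook uses 128

# Stand-in for one tokenizer output with return_tensors="pt":
# every tensor carries a leading dimension of size 1.
tokenized_input = {
    "input_ids": torch.zeros(1, max_length, dtype=torch.long),
    "attention_mask": torch.ones(1, max_length, dtype=torch.long),
}

# Dropping that dimension gives one flat example per item.
item = {key: val.squeeze(0) for key, val in tokenized_input.items()}
item["labels"] = torch.tensor(1)

shapes = {key: tuple(val.shape) for key, val in item.items()}
print(shapes)
```

Keeping each item flat lets the DataLoader stack items into `[batch_size, max_length]` tensors rather than `[batch_size, 1, max_length]`.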


# Setting up DataLoaders
This cell creates DataLoaders for both the training and evaluation datasets. DataLoaders are used to efficiently load data in batches, with shuffling enabled for the training data to improve model generalization.


In [None]:
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
eval_loader = DataLoader(eval_dataset, batch_size=8)
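
To see how a DataLoader groups items into batches, here is a minimal sketch with a toy dataset. `ToyDataset` is a hypothetical stand-in for `GlueDataset`, and the sizes are chosen only so that the final batch comes out smaller than the others.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for GlueDataset with fixed-length dummy inputs."""

    def __init__(self, num_examples=20, max_length=8):
        self.input_ids = torch.zeros(num_examples, max_length, dtype=torch.long)
        self.labels = torch.zeros(num_examples, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {"input_ids": self.input_ids[idx], "labels": self.labels[idx]}


loader = DataLoader(ToyDataset(), batch_size=8)

# The default collate function stacks each dictionary field along a new batch dimension.
batch_shapes = [tuple(batch["input_ids"].shape) for batch in loader]
print(batch_shapes)  # [(8, 8), (8, 8), (4, 8)]
```

With 20 examples and a batch size of 8, the last batch holds the remaining 4 examples, which is why the evaluation code divides by the dataset length rather than by the number of batches.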

# Configuring the Model and Optimizer
We set up the device (CPU or GPU) for training and move the model to the chosen device. Additionally, we initialize the optimizer, `AdamW`, which will update the model weights during training.


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)
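
Passing `model.parameters()` to the optimizer means AdamW updates every weight of the base model as well as the classification head. One quick way to see how many parameters an optimizer would touch is to count those with `requires_grad=True`. The sketch below does this for the head alone; `hidden_size = 2048` is an assumed value used for illustration, and in practice it would be read from `base_model.config.hidden_size`.

```python
import torch.nn as nn

hidden_size = 2048  # assumed for illustration; read base_model.config.hidden_size in practice
num_labels = 2

# Same structure as the classification head in CustomClassifier.
head = nn.Sequential(nn.Dropout(0.1), nn.Linear(hidden_size, num_labels))

# Count trainable parameters: weight matrix plus bias vector.
head_params = sum(p.numel() for p in head.parameters() if p.requires_grad)
print(head_params)  # 4098 = 2048 * 2 weights + 2 biases
```

Running the same count on the full `model` shows how small the head is compared with the base model that is being fine-tuned alongside it.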

# Model Training Loop

This cell contains the training loop for the model:
- Iterates over epochs
- Performs forward and backward passes
- Updates model weights using the optimizer

In [None]:
num_epochs = 3
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        logits = model(input_ids, attention_mask)
        loss = cross_entropy(logits, labels)
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch+1} completed.")
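
For reference, `cross_entropy` takes raw logits of shape `[batch_size, num_labels]` and integer class labels, applies log-softmax internally, and averages the negative log-probability of the correct class over the batch. A small worked example with hand-picked logits (illustrative values, not model outputs):

```python
import torch
from torch.nn.functional import cross_entropy, log_softmax

logits = torch.tensor([[2.0, 0.0],
                       [0.5, 1.5]])
labels = torch.tensor([0, 1])

loss = cross_entropy(logits, labels)

# Manual computation: mean of -log softmax(logits) at the true class.
log_probs = log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(torch.allclose(loss, manual))  # True
```

Because the softmax is applied inside the loss, the classifier should output unnormalized logits, as `CustomClassifier` does.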


# Model Evaluation

Here, we evaluate the fine-tuned model on the validation set. We switch the model to evaluation mode with `model.eval()`, which disables dropout, pass each batch from the evaluation DataLoader through the model, and report accuracy as the fraction of correct predictions.

In [None]:
model.eval()
num_correct = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for batch in eval_loader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        logits = model(input_ids, attention_mask)
        predictions = torch.argmax(logits, dim=-1)
        # Count how many predictions in this batch match the labels
        num_correct += torch.sum(predictions == labels).item()

avg_accuracy = num_correct / len(eval_loader.dataset)
print(f"Accuracy on evaluation data: {avg_accuracy:.4f}")
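
The accuracy computation can be checked by hand on a small batch. The logits and labels below are made up for illustration.

```python
import torch

logits = torch.tensor([[1.2, -0.3],
                       [0.1, 0.9],
                       [2.0, 1.0],
                       [-1.0, 0.5]])
labels = torch.tensor([0, 1, 1, 1])

# argmax over the label dimension gives the predicted class per example.
predictions = torch.argmax(logits, dim=-1)  # tensor([0, 1, 0, 1])

num_correct = torch.sum(predictions == labels).item()
accuracy = num_correct / len(labels)
print(accuracy)  # 0.75
```

Three of the four predictions match their labels, giving an accuracy of 0.75.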

# Saving the Model

Finally, we save the trained model's `state_dict`, which holds the weights of both the base model and the classification head, to a file for later inference or deployment.

In [None]:
torch.save(model.state_dict(), "./my_custom_model_falcon.pt")
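
A saved `state_dict` contains only weights, so loading it later requires rebuilding the same architecture first. The sketch below shows the round trip with a tiny stand-in module. `build_model` and the file name are hypothetical; in this notebook the equivalent step would be to recreate `CustomClassifier(base_model, num_labels)` before calling `load_state_dict`.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical tiny stand-in for CustomClassifier(base_model, num_labels).
    return nn.Sequential(nn.Dropout(0.1), nn.Linear(4, 2))


trained = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pt")  # hypothetical file name
    torch.save(trained.state_dict(), path)

    # Rebuild the same architecture, then load the saved weights into it.
    restored = build_model()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # disable dropout before running inference

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(trained.state_dict().values(), restored.state_dict().values())
)
print(weights_match)  # True
```

The tokenizer is not part of the `state_dict`, so it has to be loaded separately with the same `model_name` and padding setup at inference time.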