
Task: implement transfer learning on a small dataset for a chosen task, using a widely used LLM such as Mistral or Claude. This notebook uses DistilBERT for binary sentiment classification.

In [10]:
import torch # Import the PyTorch library for deep learning operations
import torch.nn as nn # Import neural network modules from PyTorch
from transformers import (
    AutoTokenizer, # Tool to load pre-trained tokenizers for text processing
    AutoModel, # Tool to load pre-trained models for various tasks
    AutoModelForSequenceClassification, # Specific model for sequence classification tasks
    TrainingArguments, # Class to define training configurations
    Trainer, # High-level API for training models
    DataCollatorWithPadding # Helper to pad and batch input data
)

from datasets import Dataset # Import Dataset class from Hugging Face Datasets library for easy data handling
from sklearn.metrics import accuracy_score, precision_recall_fscore_support # Import metrics for evaluating model performance
import numpy as np # Import NumPy for numerical operations, especially for array manipulation

We use the **"distilbert-base-uncased"** model because it is a smaller, faster distilled version of BERT that balances speed, efficiency, and general performance well for transfer learning in text classification.

**num_labels= 2**

The `num_labels` parameter sets the number of output classes for the classification task. In this notebook it is 2 because the sample sentiment analysis problem is binary: each text is either positive (label 1) or negative (label 0).

When `AutoModelForSequenceClassification` is loaded, `num_labels` determines how many output units the final classification layer has. For this sentiment task the layer needs one unit per class, hence 2 labels.
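
To make the role of the two output units concrete, here is a minimal plain-Python sketch of how two raw scores (logits) become a predicted class and a confidence. The logit values are made up for illustration, and the `softmax` helper is a stand-in written for this example, not the model's own code.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence: index 0 = negative, index 1 = positive
logits = [-0.4, 0.9]

probs = softmax(logits)
predicted_class = probs.index(max(probs))  # argmax over the num_labels outputs
confidence = probs[predicted_class]        # probability of the predicted class

label_names = {0: "negative", 1: "positive"}
print(f"probabilities: {[round(p, 3) for p in probs]}")
print(f"prediction: {label_names[predicted_class]} (confidence {confidence:.3f})")
```

With `num_labels=3` the same logic would apply to three logits; only the length of the list changes.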

In [29]:
# Sample data for sentiment analysis
sample_data = {
    'texts': [
        "I love this product, it's amazing!",
        "This is terrible, worst purchase ever.",
        "The weather is nice today.",
        "I hate waiting in long queues.",
        "This movie was fantastic and entertaining.",
        "The service was disappointing and slow.",
        "What a beautiful sunset tonight.",
        "I'm frustrated with this situation.",
        "This book is incredibly well written.",
        "The food was bland and overpriced.",
        "I enjoy spending time with friends.",
        "This software is buggy and unreliable.",
        "The concert was absolutely wonderful.",
        "I'm tired of all these problems.",
        "This vacation was the best ever!",
        "The meeting was boring and pointless.",
        "Absolutely thrilled with the results, truly impressed.",
        "Couldn't be more unhappy, a complete disaster.",
        "The coffee was strong and just what I needed.",
        "A truly dreadful experience from start to finish.",
        "Highly recommend this place, it's a gem!",
        "Such a waste of time and money, totally regret it.",
        "Fantastic performance, everyone should see it.",
        "Utterly useless, I can't believe I bought this.",
        "The customer support was exceptional and very helpful.",
        "Never again! This was the worst decision.",
        "Smiling from ear to ear, feeling so happy.",
        "Feeling incredibly down and disappointed today.",
        "The new features are brilliant and well-implemented.",
        "This app constantly crashes, very annoying.",
        "What a pleasant surprise, exceeded all expectations.",
        "Everything went wrong, a truly awful day."
    ],
    'labels': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
}  # 1: positive, 0: negative

class TransferLearningModel:
    """
    A class to handle transfer learning for text classification
    """

    def __init__(self, model_name="distilbert-base-uncased", num_labels=2):
        """
        Initialize the transfer learning model

        Args:
            model_name: Pre-trained model to use as base
            num_labels: Number of classes for the new task
        """
        self.model_name = model_name
        self.num_labels = num_labels
        self.tokenizer = None
        self.model = None
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Set device to GPU if available, else CPU


    def load_pretrained_model(self):
        """
        Load the pre-trained model and tokenizer
        """
        print(f"Loading pre-trained model: {self.model_name}")

        # Load tokenizer from the specified pre-trained model name
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)

        # Load model for sequence classification from the specified pre-trained model name
        # This automatically adds a classification head on top of the base model
        self.model = AutoModelForSequenceClassification.from_pretrained(
            self.model_name,
            num_labels=self.num_labels
        ).to(self.device) # Move the model to the specified device (CPU/GPU)

        print(f"Model loaded successfully!")
        print(f"Model has {self.model.num_parameters()} parameters")

    def prepare_dataset(self, texts, labels, max_length=128):
        """
        Prepare dataset for training/evaluation by tokenizing texts
        and creating a Hugging Face Dataset object
        """
        def tokenize_function(examples):
            # Tokenize a batch of examples, truncating to max_length.
            # Padding is left to DataCollatorWithPadding, which pads each batch dynamically.
            return self.tokenizer(
                examples['text'],
                truncation=True,
                max_length=max_length
            )

        # Create a Hugging Face Dataset from the provided texts and labels
        dataset = Dataset.from_dict({
            'text': texts,
            'labels': labels
        })

        # Apply the tokenization function to the dataset
        tokenized_dataset = dataset.map(tokenize_function, batched=True) # Process in batches for efficiency

        return tokenized_dataset

    def compute_metrics(self, eval_pred):
        """
        Compute metrics (accuracy, precision, recall, f1) for evaluation
        """
        predictions, labels = eval_pred
        predictions = np.argmax(predictions, axis=1) # Get the predicted class by taking the argmax of logits

        # Calculate precision, recall, f1-score, and accuracy
        precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average='weighted')
        accuracy = accuracy_score(labels, predictions)

        return {
            'accuracy': accuracy,
            'f1': f1,
            'precision': precision,
            'recall': recall
        }

    def train(self, train_texts, train_labels, eval_texts=None, eval_labels=None):
        """
        Train the model using transfer learning with the Hugging Face Trainer API
        """
        print("Preparing training dataset...")
        train_dataset = self.prepare_dataset(train_texts, train_labels)

        eval_dataset = None
        if eval_texts and eval_labels:
            print("Preparing evaluation dataset...")
            eval_dataset = self.prepare_dataset(eval_texts, eval_labels)

        # Data collator for dynamic padding of batches during training
        data_collator = DataCollatorWithPadding(tokenizer=self.tokenizer)

        # Define training arguments
        training_args = TrainingArguments(
            output_dir="./transfer_learning_model", # Directory to save model checkpoints and logs
            num_train_epochs=100, # Total number of training epochs (increased for small dataset)
            per_device_train_batch_size=8, # Batch size per device during training
            per_device_eval_batch_size=8, # Batch size per device during evaluation
            warmup_steps=10, # Number of warmup steps for learning rate scheduler (reduced)
            weight_decay=0.01, # Strength of weight decay
            logging_dir="./logs", # Directory for storing logs
            logging_steps=10, # Log every N updates steps
            eval_strategy="epoch" if eval_dataset else "no", # Evaluate at the end of each epoch if eval_dataset is provided
            save_strategy="epoch", # Save checkpoint at the end of each epoch
            load_best_model_at_end=True if eval_dataset else False, # Load the best model (based on metric_for_best_model) at the end of training
            metric_for_best_model="accuracy" if eval_dataset else None, # Metric to use for early stopping/best model selection
            report_to="none",  # Disable reporting to platforms like wandb (the string "none", not None)
        )

        # Initialize the Hugging Face Trainer
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            tokenizer=self.tokenizer,
            data_collator=data_collator,
            compute_metrics=self.compute_metrics if eval_dataset else None, # Function to compute metrics during evaluation
        )

        # Train the model
        print("Starting transfer learning training...")
        trainer.train()

        # Save the trained model and tokenizer
        trainer.save_model("./transfer_learning_model")
        self.tokenizer.save_pretrained("./transfer_learning_model")

        print("Transfer learning completed!")

        return trainer

    def predict(self, texts):
        """
        Make predictions on new texts using the trained model
        """
        if self.model is None:
            raise ValueError("Model not loaded. Call load_pretrained_model() first.")

        self.model.eval() # Set the model to evaluation mode
        predictions = []

        for text in texts:
            # Tokenize the input text
            inputs = self.tokenizer(
                text,
                return_tensors="pt", # Return PyTorch tensors
                truncation=True,
                padding=True,
                max_length=128
            )

            # Move inputs to the same device as the model (CPU/GPU)
            inputs = {k: v.to(self.device) for k, v in inputs.items()}

            # Make prediction without computing gradients
            with torch.no_grad():
                outputs = self.model(**inputs)
                logits = outputs.logits # Get the raw output scores
                predicted_class = torch.argmax(logits, dim=1).item() # Get the class with the highest score
                confidence = torch.softmax(logits, dim=1).max().item() # Get the confidence for the predicted class

                predictions.append({
                    'text': text,
                    'predicted_class': predicted_class,
                    'confidence': confidence
                })

        return predictions
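
The `TrainingArguments` values above translate into a small number of optimizer steps. The following sketch works through that arithmetic under two assumptions that the notebook does not state explicitly: training runs on a single device, and there is no gradient accumulation.

```python
import math

# Values taken from the notebook
num_samples = 32
train_size = int(0.8 * num_samples)   # 25 training examples
batch_size = 8                        # per_device_train_batch_size
num_epochs = 100                      # num_train_epochs
logging_steps = 10
warmup_steps = 10

# Assumption: one device, no gradient accumulation, last partial batch is kept
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * num_epochs
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)
warmup_fraction = warmup_steps / total_steps

print(f"steps per epoch: {steps_per_epoch}")                 # 4
print(f"total steps: {total_steps}")                         # 400
print(f"first epoch with a logged training loss: {first_logged_epoch}")  # 3
print(f"warmup covers {warmup_fraction:.1%} of training")    # 2.5%
```

Under these assumptions the training loss is first logged during epoch 3, which would explain why the first two epochs of a run show no training loss.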

In [30]:
# def demonstrate_different_approaches():
#     """
#     Demonstrate different transfer learning approaches: full fine-tuning and feature extraction.
#     """
#     print("=== APPROACH 1: Full Fine-tuning ===")
#     # Initialize and load a model for full fine-tuning (all layers are trainable)
#     model1 = TransferLearningModel("distilbert-base-uncased")
#     model1.load_pretrained_model()

#     # Split the sample data into training and testing sets
#     train_size = int(0.8 * len(sample_data['texts']))
#     train_texts = sample_data['texts'][:train_size]
#     train_labels = sample_data['labels'][:train_size]
#     test_texts = sample_data['texts'][train_size:]
#     test_labels = sample_data['labels'][train_size:]

#     # Train the first model (full fine-tuning)
#     trainer1 = model1.train(train_texts, train_labels, test_texts, test_labels)

#     print("\n=== APPROACH 2: Feature Extraction (Frozen Base) ===")
#     # Initialize and load a second model for feature extraction
#     model2 = TransferLearningModel("distilbert-base-uncased")
#     model2.load_pretrained_model()

#     # Freeze the parameters of the base DistilBERT model so only the classification head is trained
#     for param in model2.model.distilbert.parameters():
#         param.requires_grad = False

#     print("Frozen base model parameters. Only training classification head.")
#     # Train the second model (feature extraction)
#     trainer2 = model2.train(train_texts, train_labels, test_texts, test_labels)

#     return model1, model2 # Return both trained models
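
The split above slices the first 80% of the list, so the 7 held-out examples are always the last 7 (3 positive, 4 negative, given the alternating labels). A shuffled, class-balanced split is one possible alternative. The sketch below is a plain-Python illustration; `stratified_split` is a hypothetical helper, not part of the notebook, and the texts are placeholders standing in for `sample_data`.

```python
import random

def stratified_split(texts, labels, train_fraction=0.8, seed=42):
    """Shuffle within each class, then split each class by train_fraction."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    train, test = [], []
    for label, items in sorted(by_class.items()):
        rng.shuffle(items)
        cut = int(train_fraction * len(items))
        train += [(text, label) for text in items[:cut]]
        test += [(text, label) for text in items[cut:]]

    # Shuffle again so the classes are interleaved
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Placeholder data with the same shape as sample_data: 32 items, alternating labels
texts = [f"example {i}" for i in range(32)]
labels = [1, 0] * 16

train, test = stratified_split(texts, labels)
test_positives = sum(label for _, label in test)
print(f"train: {len(train)}, test: {len(test)}")                       # train: 24, test: 8
print(f"positive examples in test: {test_positives} of {len(test)}")  # 4 of 8
```

Because each class is split separately, this yields 24 training and 8 test examples rather than 25 and 7. scikit-learn's `train_test_split` with its `stratify` argument offers the same idea.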

In [39]:
def demonstrate_different_approaches():
    """
    Demonstrate the feature-extraction approach to transfer learning
    (frozen base model, trainable classification head).
    The full fine-tuning variant is left commented out.
    """
    # print("=== APPROACH 1: Full Fine-tuning ===")
    # # Initialize and load a model for full fine-tuning (all layers are trainable)
    # model1 = TransferLearningModel("distilbert-base-uncased")
    # model1.load_pretrained_model()

    # Split the sample data into training and testing sets
    train_size = int(0.8 * len(sample_data['texts']))
    train_texts = sample_data['texts'][:train_size]
    train_labels = sample_data['labels'][:train_size]
    test_texts = sample_data['texts'][train_size:]
    test_labels = sample_data['labels'][train_size:]

    # # Train the first model (full fine-tuning)
    # trainer1 = model1.train(train_texts, train_labels, test_texts, test_labels)

    print("\n=== APPROACH 1: Feature Extraction (Frozen Base) ===")
    # Initialize and load a model for feature extraction
    model1 = TransferLearningModel("distilbert-base-uncased")
    model1.load_pretrained_model()

    # Freeze the parameters of the base DistilBERT model so only the classification head is trained
    for param in model1.model.distilbert.parameters():
        param.requires_grad = False

    print("Frozen base model parameters. Only training classification head.")
    # Train the model (feature extraction)
    trainer1 = model1.train(train_texts, train_labels, test_texts, test_labels)

    return model1 # Return the trained model
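
To see how little of the network is trained once `model.distilbert` is frozen, the arithmetic below counts the parameters that remain trainable. It assumes the DistilBERT base hidden size of 768 and a head made of two linear layers (`pre_classifier` and `classifier`); the total is the figure printed by `num_parameters()` in this notebook's run.

```python
# Assumed architecture figures for distilbert-base-uncased with a 2-class head
hidden_size = 768
num_labels = 2
total_params = 66_955_010  # value reported by model.num_parameters() in the notebook output

def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

pre_classifier = linear_params(hidden_size, hidden_size)  # 768 -> 768
classifier = linear_params(hidden_size, num_labels)       # 768 -> 2

head_params = pre_classifier + classifier
frozen_params = total_params - head_params

print(f"trainable (head):  {head_params:,}")     # 592,130
print(f"frozen (base):     {frozen_params:,}")   # 66,362,880
print(f"trainable share:   {head_params / total_params:.2%}")
```

Under these assumptions, fewer than 1% of the parameters receive gradient updates, which is why feature extraction is cheap but can converge slowly on a small dataset.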

In [40]:
def load_and_test_model(model_path="./transfer_learning_model"):
    """
    Load a previously saved model and tokenizer, then make predictions on new examples.
    """
    print(f"\n=== TESTING SAVED MODEL ===")

    # Load the saved model for sequence classification and its tokenizer
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Create an instance of TransferLearningModel and assign the loaded model and tokenizer
    transfer_model = TransferLearningModel()
    transfer_model.model = model.to(transfer_model.device) # Move the loaded model to the correct device (CPU/GPU)
    transfer_model.tokenizer = tokenizer

    # Define new examples to test the loaded model
    test_examples = [
        "This is absolutely wonderful!",
        "I'm really disappointed with this.",
        "The weather looks great today.",
        "This is the worst experience ever.",
        "I love how this works perfectly."
    ]

    # Get predictions from the loaded model
    predictions = transfer_model.predict(test_examples)

    print("Predictions on new examples:",predictions)
    # Print each prediction with its sentiment and confidence
    for pred in predictions:
        sentiment = "Positive" if pred['predicted_class'] == 1 else "Negative"
        print(f"Text: '{pred['text']}'")
        print(f"Predicted: {sentiment} (Confidence: {pred['confidence']:.3f})")
        print("-" * 50)

In [41]:
def main():
    """
    Main function to orchestrate the demonstration of transfer learning, including training and prediction.
    """
    print("Transfer Learning with Language Models")
    print("=" * 50)

    # Check and print the available device (GPU or CPU)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    try:
        # Train the feature-extraction model (frozen base, trainable classification head)
        model1 = demonstrate_different_approaches()

        # Define new texts for testing predictions with the trained model
        test_texts = [
            "This product is amazing!",
            "I hate this so much.",
            "Pretty decent overall.",
        ]

        print("\n=== MAKING PREDICTIONS ===")
        # Get predictions from the frozen-base model
        predictions = model1.predict(test_texts)
        print("Predictions with feature extraction (frozen base):", predictions)
        # Print the predictions with sentiment and confidence
        for pred in predictions:
            sentiment = "Positive" if pred['predicted_class'] == 1 else "Negative"
            print(f"Text: '{pred['text']}'")
            print(f"Prediction: {sentiment} (Confidence: {pred['confidence']:.3f})")
            print("-" * 30)

        # Test loading a saved model and making predictions with it
        load_and_test_model()

    except Exception as e:
        # Handle potential errors and provide instructions for installing dependencies
        print(f"An error occurred: {e}")
        print("\nMake sure you have the required packages:")
        print("pip install torch transformers datasets scikit-learn")

In [26]:
!pip install torch transformers datasets scikit-learn # Install necessary libraries: PyTorch, Hugging Face Transformers, Hugging Face Datasets, and scikit-learn



In [42]:
if __name__ == "__main__":
    main() # Call the main function to start the demonstration when the script is executed

Transfer Learning with Language Models
Using device: cuda

=== APPROACH 1: Feature Extraction (Frozen Base) ===
Loading pre-trained model: distilbert-base-uncased


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model loaded successfully!
Model has 66955010 parameters
Frozen base model parameters. Only training classification head.
Preparing training dataset...



Preparing evaluation dataset...




Starting transfer learning training...


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | No log | 0.69797 | 0.285714 | 0.190476 | 0.142857 | 0.285714 |
| 2 | No log | 0.695795 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 3 | 0.682300 | 0.692769 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 4 | 0.682300 | 0.689941 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 5 | 0.665100 | 0.688895 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 6 | 0.665100 | 0.685833 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 7 | 0.665100 | 0.681208 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 8 | 0.661900 | 0.677769 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 9 | 0.661900 | 0.672919 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
| 10 | 0.616900 | 0.670441 | 0.428571 | 0.257143 | 0.183673 | 0.428571 |
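
The constant metrics from epoch 2 onward are consistent with a model that predicts "positive" for every evaluation example. The sketch below checks this in plain Python. The evaluation labels follow from the notebook's split; the all-positive predictions are an assumption, since the run does not record them. `weighted_metrics` is a hypothetical helper that mirrors the support-weighted averaging requested with `average='weighted'`, treating an undefined precision as 0.

```python
def weighted_metrics(labels, predictions):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(labels)
    accuracy = sum(l == p for l, p in zip(labels, predictions)) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(labels)):
        tp = sum(l == c and p == c for l, p in zip(labels, predictions))
        predicted = sum(p == c for p in predictions)
        actual = sum(l == c for l in labels)
        p_c = tp / predicted if predicted else 0.0  # undefined precision treated as 0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n                         # class support as weight
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1

# Labels of the 7 held-out examples: the notebook's labels alternate 1, 0, ...
all_labels = [1, 0] * 16
eval_labels = all_labels[int(0.8 * len(all_labels)):]   # [0, 1, 0, 1, 0, 1, 0]

# Assumption: the model outputs class 1 for every evaluation example
predictions = [1] * len(eval_labels)

accuracy, precision, recall, f1 = weighted_metrics(eval_labels, predictions)
print(f"accuracy={accuracy:.6f} f1={f1:.6f} precision={precision:.6f} recall={recall:.6f}")
# accuracy=0.428571 f1=0.257143 precision=0.183673 recall=0.428571
```

An accuracy of 3/7 below the 4/7 majority-class baseline suggests the frozen-base head has not yet learned a useful decision boundary in these epochs, even though the validation loss is slowly decreasing.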



Transfer learning completed!

=== MAKING PREDICTIONS ===
Predictions with full fine-tuning: [{'text': 'This product is amazing!', 'predicted_class': 1, 'confidence': 0.6731581687927246}, {'text': 'I hate this so much.', 'predicted_class': 0, 'confidence': 0.5747971534729004}, {'text': 'Pretty decent overall.', 'predicted_class': 1, 'confidence': 0.5548656582832336}]
Text: 'This product is amazing!'
Prediction: Positive (Confidence: 0.673)
------------------------------
Text: 'I hate this so much.'
Prediction: Negative (Confidence: 0.575)
------------------------------
Text: 'Pretty decent overall.'
Prediction: Positive (Confidence: 0.555)
------------------------------

=== TESTING SAVED MODEL ===
Predictions on new examples: [{'text': 'This is absolutely wonderful!', 'predicted_class': 1, 'confidence': 0.6599050164222717}, {'text': "I'm really disappointed with this.", 'predicted_class': 0, 'confidence': 0.551493227481842}, {'text': 'The weather looks great today.', 'predicted_class':