<a href="https://colab.research.google.com/github/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/Mistral_7B_Finetuning_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# - accelerate: A library to optimize and accelerate deep learning training and inference, especially on distributed systems.
#   It simplifies the use of mixed-precision training, model parallelism, and multi-GPU systems.

# - peft: A library for Parameter-Efficient Fine-Tuning (PEFT) of large language models.
#   It provides tools to fine-tune pre-trained models with fewer parameters, saving memory and computational resources.

# - bitsandbytes: A lightweight library for 8-bit optimizers and memory-efficient GPU computation.
#   This library helps reduce memory usage during training by using low-precision (8-bit) computations, making large models more feasible on limited hardware.

# - trl: The Transformer Reinforcement Learning library for post-training language models.
#   It builds on Hugging Face Transformers and provides trainers for supervised fine-tuning (SFT, used in this notebook) as well as reinforcement learning (RL) methods.

# - py7zr: A library to handle .7z archive files, used for extracting compressed data.
#   It allows extracting and managing `.7z` archives, often used to store large datasets or model weights in a compressed format.

# - auto-gptq: A library implementing GPTQ, a post-training quantization method for large models.
#   Quantization shrinks the model's weights (here to 4 bits), which lowers memory requirements. Transformers relies on it, together with optimum, to load GPTQ-quantized checkpoints.

# - optimum: A library providing performance optimizations for models, especially for Hugging Face models on specific hardware.
#   It offers various optimizations to enhance model inference speed, especially on hardware like GPUs or TPUs.

# - transformers: A popular library by Hugging Face for working with pre-trained transformer models.
#   It provides easy access to a wide range of state-of-the-art transformer models for natural language processing tasks.

# Install all these libraries in one command:
!pip install accelerate peft bitsandbytes trl py7zr auto-gptq optimum transformers

In [None]:
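# Install the latest development version of transformers directly from GitHub.
# This replaces the release version installed by the previous cell.
!pip install git+https://github.com/huggingface/transformers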

In [None]:
# Importing the PyTorch library, which provides tools for tensor operations and deep learning models.
import torch

# Importing the `notebook_login` function from huggingface_hub to facilitate authentication with the Hugging Face Hub.
# This allows you to upload or access models and datasets stored on the Hugging Face platform.
from huggingface_hub import notebook_login

# Importing the `load_dataset` function from the `datasets` library to load datasets from the Hugging Face Hub or local files.
# The `Dataset` class is also imported to work with datasets once they are loaded.
from datasets import load_dataset, Dataset

# Importing the `prepare_model_for_kbit_training` function from the PEFT (Parameter-Efficient Fine-Tuning) library.
# This function prepares models for efficient fine-tuning with lower precision (k-bit) to reduce memory usage and computational cost.
from peft import prepare_model_for_kbit_training

# Importing the `LoraConfig` class from the PEFT library.
# This class is used for configuring the LoRA (Low-Rank Adaptation) technique, which is an efficient method for fine-tuning large models with fewer parameters.
from peft import LoraConfig

# Importing the `get_peft_model` function from the PEFT library.
# This function helps create a parameter-efficient fine-tuning model by applying techniques like LoRA.
from peft import get_peft_model

# Importing the `SFTTrainer` class from the trl (Transformer Reinforcement Learning) library.
# `SFTTrainer` performs supervised fine-tuning (SFT) of language models on text examples; it does not involve reinforcement learning.
from trl import SFTTrainer

# Importing the `TrainingArguments` class from the Hugging Face `transformers` library.
# This class is used to configure the settings for training transformer models, such as batch size, learning rate, and logging options.
from transformers import TrainingArguments

In [None]:
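# Open the Hugging Face login prompt; paste an access token to authenticate this notebook with the Hub.
notebook_login()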

### **Loading the SAMSum Dataset**

In this section, we load the **SAMSum dataset** using the Hugging Face `datasets` library. SAMSum pairs messenger-style conversations with short human-written summaries, which makes it a common choice for dialogue summarization.

In [None]:
# Load the SAMSum dataset using the Hugging Face `datasets` library.
# "samsum" is the name of the dataset, which is designed for dialogue summarization tasks.
# The "split" parameter specifies the portion of the dataset to load.
# In this case, "train" indicates we are loading the training data.
df = load_dataset("samsum", split="train")

In [None]:
# Convert the loaded dataset (Hugging Face Dataset object) to a Pandas DataFrame.
# The `to_pandas()` method transforms the dataset into a DataFrame,
# which is a tabular data structure commonly used for data analysis in Python.
# This allows us to leverage Pandas' powerful data manipulation and analysis capabilities.
data_df = df.to_pandas()

In [None]:
# Check the shape of the DataFrame
# The `shape` attribute returns a tuple representing the dimensions of the DataFrame:
# - The first value is the number of rows (examples in the dataset).
# - The second value is the number of columns (features in the dataset).
data_df.shape

In [None]:
# Randomly sample 7361 rows from the 'data_df' DataFrame and assign it to the 'data' variable.
# This operation selects a random subset of the rows without replacement.
data = data_df.sample(7361)
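
The sample above is drawn without a seed, so a different subset is selected on every run. pandas' `DataFrame.sample` accepts a `random_state` argument to make the draw repeatable. The sketch below illustrates the same idea with the standard library only; `sample_indices` is a hypothetical helper, and the row count of 14732 is an assumed size for the SAMSum training split (check it against `data_df.shape`).

```python
import random


def sample_indices(n_rows, n_samples, seed):
    """Pick n_samples distinct row indices out of n_rows, reproducibly."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return rng.sample(range(n_rows), n_samples)  # sampling without replacement


N_ROWS = 14732  # assumed number of rows in the training split

first = sample_indices(N_ROWS, 7361, seed=42)
second = sample_indices(N_ROWS, 7361, seed=42)

# The same seed yields the same subset, so the experiment can be repeated.
print(first == second)
```

In the notebook itself, the equivalent change would be `data_df.sample(7361, random_state=42)`.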

In [None]:
# Create a new column 'text' in the 'data' DataFrame.
data["text"] = data[["dialogue", "summary"]].apply(
    lambda x:
    # Define a custom string format where:
    # - "###Human:" introduces the dialogue for summarization.
    # - "###Assistant:" provides the summary.
    "###Human: Summarize this following dialogue: " + x["dialogue"] + "\n###Assistant: " + x["summary"],
    axis=1  # Ensure the lambda function operates row-wise across the DataFrame.
)
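
The same template has to be reproduced at inference time, with the assistant part left empty so the model completes it. The sketch below factors the template into two small functions; `format_example` and `format_prompt` are hypothetical helper names, and the sample dialogue is made up for illustration.

```python
def format_example(dialogue, summary):
    """Build one training string using the notebook's prompt template."""
    return (
        "###Human: Summarize this following dialogue: "
        + dialogue
        + "\n###Assistant: "
        + summary
    )


def format_prompt(dialogue):
    """Build an inference prompt: same template, with the summary left empty."""
    return format_example(dialogue, "")


dialogue = "Amy: Lunch at noon?\nBen: Sure, see you then."
training_text = format_example(dialogue, "Amy and Ben will have lunch at noon.")
prompt = format_prompt(dialogue)

print(training_text)
```

Keeping the template in one function avoids a mismatch between the text seen during training and the prompt used for generation.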

In [None]:
# Get the shape (number of rows and columns) of the 'data' DataFrame.
# 'data.shape' returns a tuple: (number_of_rows, number_of_columns).
data.shape

In [None]:
# Convert the 'data' DataFrame (pandas) into a Hugging Face 'Dataset' object.
# This is useful for working with datasets in machine learning workflows,
# as the 'Dataset' object provides additional functionality and compatibility with Hugging Face tools.
data = Dataset.from_pandas(data)

In [None]:
data

### **Loading the Quantized Model (GPTQ) and Tokenizer for Preprocessing**

We load a GPTQ-quantized version of Mistral-7B-Instruct, `TheBloke/Mistral-7B-Instruct-v0.1-GPTQ`, from the Hugging Face Hub, along with its corresponding tokenizer for preprocessing.

In [None]:
# Import the necessary modules from the Hugging Face transformers library:
from transformers import (
    # AutoModelForCausalLM: Automatically loads a pre-trained model for causal language modeling (e.g., GPT models).
    AutoModelForCausalLM,

    # AutoTokenizer: Automatically loads the appropriate tokenizer corresponding to the model architecture.
    AutoTokenizer,

    # GPTQConfig: Loads configuration settings for the GPTQ (quantized) model.
    GPTQConfig,

    # TrainingArguments: Provides arguments for configuring model training, such as learning rate, batch size, and more.
    TrainingArguments
)


In [None]:
# Load the tokenizer for the specified pre-trained model ('TheBloke/Mistral-7B-Instruct-v0.1-GPTQ')
# using the AutoTokenizer class from Hugging Face.
# The tokenizer is used to process and encode input text before feeding it into the model.
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.1-GPTQ")

In [None]:
# Inspect the tokenizer's end-of-sequence (EOS) token.
tokenizer.eos_token

In [None]:
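# Reuse the end-of-sequence token as the padding token so that batches can be padded to a common length.
tokenizer.pad_token = tokenizer.eos_token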

In [None]:
# Creating a configuration object for GPTQ (a method for quantization of models)
quantization_config_loading = GPTQConfig(
    # Set the number of bits for quantization to 4
    bits=4,

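    # Disable the ExLlama kernels. These are fast CUDA kernels for 4-bit GPTQ inference,
    # but they do not support training, so they are turned off for fine-tuning.
    # (Recent transformers releases deprecate this flag in favor of `use_exllama=False`.)
    disable_exllama=True,

    # Pass the tokenizer. It is mainly needed when quantizing a model with a calibration dataset,
    # not when loading an already-quantized checkpoint as done here.
    tokenizer=tokenizer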
)
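
To see why 4-bit weights matter, the following back-of-the-envelope sketch compares the memory needed for the weights alone at 16 and 4 bits. The parameter count of about 7.24 billion is an assumption for Mistral-7B, and the estimate ignores quantization metadata (scales and zero points), activations, and optimizer state.

```python
N_PARAMS = 7.24e9  # assumed approximate parameter count of Mistral-7B


def weight_memory_gib(n_params, bits_per_weight):
    """Memory for the weights alone, in GiB (bits -> bytes -> GiB)."""
    return n_params * bits_per_weight / 8 / 1024**3


fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink by a factor of four, from roughly 13.5 GiB to roughly 3.4 GiB, which is what lets the model fit on a single consumer or Colab GPU.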


In [None]:
# Load a pretrained causal language model using the AutoModelForCausalLM class from Hugging Face's transformers library
model = AutoModelForCausalLM.from_pretrained(
    # Specify the model identifier for the pretrained model (Mistral-7B-Instruct-v0.1-GPTQ)
    "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ",

    # Pass the previously defined quantization configuration (which includes 4-bit quantization and tokenizer)
    quantization_config=quantization_config_loading,

    # Let Accelerate place the model's layers on the available devices automatically (GPU first, spilling to CPU if needed)
    device_map="auto"
)

In [None]:
model

In [None]:
# Disable the key/value cache. The cache only speeds up token-by-token generation;
# it is not useful during training and is incompatible with gradient checkpointing.
model.config.use_cache = False

# Set the pretraining tensor parallelism (pretraining_tp) value to 1.
# This attribute comes from Llama-style configs, where values above 1 emulate the tensor-parallel
# computation used during pretraining. Setting it to 1 selects the standard linear-layer computation.
model.config.pretraining_tp = 1

# Enable gradient checkpointing for the model.
# Gradient checkpointing reduces memory usage by keeping only a subset of intermediate activations
# during the forward pass and recomputing the rest during the backward pass, at the cost of extra compute.
model.gradient_checkpointing_enable()

### **Fine-Tuning Techniques for Pretrained Models**

In [None]:
# Importing necessary functions and classes from the PEFT library (Parameter-Efficient Fine-Tuning)
from peft import prepare_model_for_kbit_training  # Function to prepare a model for training with low-bit precision (e.g., 4-bit or 8-bit training)
from peft import LoraConfig  # Configuration class for LoRA (Low-Rank Adaptation) to define hyperparameters for PEFT methods
from peft import get_peft_model  # Function to apply PEFT methods (e.g., LoRA) to a model for efficient fine-tuning

# Importing the SFTTrainer class from the TRL (Transformer Reinforcement Learning) library
from trl import SFTTrainer  # Trainer class for supervised fine-tuning, which helps fine-tune language models efficiently

# Importing TrainingArguments from Hugging Face's transformers library for setting training configurations
from transformers import TrainingArguments  # Class to define training parameters such as learning rate, batch size, epochs, etc.


In [None]:
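# Prepare the quantized model for training: freeze the base weights and enable input gradients
# so that gradient checkpointing works with the trainable adapter layers added in the next cells.
model = prepare_model_for_kbit_training(model)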

In [None]:
# Initializing the LoraConfig class for Low-Rank Adaptation (LoRA) with specific hyperparameters
peft_config = LoraConfig(
    r=16,                    # The rank of the low-rank matrices. A higher rank allows the model to capture more complex patterns. Typically, values between 8 and 32 work well.
    lora_alpha=16,           # Scaling factor for the LoRA update. The update is multiplied by lora_alpha / r, which is 1.0 here.
    lora_dropout=0.05,       # Dropout rate applied during training to the LoRA layers. Helps prevent overfitting by randomly dropping units from the layer.
    bias="none",             # Specifies which bias parameters are trained. "none" keeps all bias terms frozen.
    task_type="CAUSAL_LM",   # Defines the task type for which the model is fine-tuned. "CAUSAL_LM" refers to causal language modeling (used in autoregressive tasks like text generation).
    target_modules=["q_proj", "v_proj"]  # Specifies which parts of the model will have LoRA applied. Here, it targets the query (`q_proj`) and value (`v_proj`) projection layers in the attention mechanism of a transformer model.
)
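
A LoRA adapter on a linear layer with `in_features` inputs and `out_features` outputs adds two matrices of shapes `(r, in_features)` and `(out_features, r)`, so it contributes `r * (in_features + out_features)` trainable parameters. The sketch below applies this to the configuration above. The layer dimensions (hidden size 4096, 32 layers, value projection output of 1024 due to grouped-query attention) are assumptions taken from the published Mistral-7B architecture; verify them against `model.config`.

```python
HIDDEN_SIZE = 4096  # assumed model width
KV_DIM = 1024       # assumed v_proj output size (8 key/value heads x 128)
N_LAYERS = 32       # assumed number of transformer layers

R = 16
LORA_ALPHA = 16


def lora_param_count(in_features, out_features, r):
    """Trainable parameters added by one LoRA adapter: A is (r, in), B is (out, r)."""
    return r * (in_features + out_features)


q_proj_params = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)
v_proj_params = lora_param_count(HIDDEN_SIZE, KV_DIM, R)
total_trainable = N_LAYERS * (q_proj_params + v_proj_params)

scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions only about 6.8 million parameters are trained, a small fraction of the roughly 7 billion in the base model. After wrapping the model, `model.print_trainable_parameters()` reports the actual count.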

In [None]:
# Apply the PEFT (Parameter-Efficient Fine-Tuning) method to the model using the defined LoRA configuration.
model = get_peft_model(model, peft_config)

In [None]:
# Define the training arguments for the fine-tuning process using Hugging Face's TrainingArguments class.
training_arguments = TrainingArguments(
    output_dir="mistral-finetuned-samsum",  # Directory where the model checkpoints and outputs will be saved.
    per_device_train_batch_size=8,          # Batch size for training on each device (e.g., GPU). In this case, 8 samples per batch.
    gradient_accumulation_steps=1,          # Number of batches whose gradients are accumulated before each optimizer update. 1 means the optimizer steps after every batch (no accumulation).
    optim="paged_adamw_32bit",              # Optimizer to be used for training. "paged_adamw_32bit" is a bitsandbytes AdamW variant that can page optimizer state to CPU memory when GPU memory runs short.
    learning_rate=2e-4,                     # The learning rate for the optimizer. In this case, it is set to 0.0002.
    lr_scheduler_type="cosine",             # The type of learning rate scheduler. "cosine" means the learning rate will follow a cosine decay during training.
    save_strategy="epoch",                  # The strategy to save model checkpoints. "epoch" means the model will be saved after each epoch.
    logging_steps=100,                      # Number of steps between each logging event. Here, it logs every 100 steps during training.
    num_train_epochs=1,                     # The total number of training epochs. This value is overridden here because max_steps is set below.
    max_steps=250,                          # Maximum number of optimizer steps. A positive value overrides num_train_epochs, so training stops after 250 steps.
    fp16=True,                              # Enable mixed-precision training (float16) to speed up training and reduce memory usage, particularly on compatible GPUs.
)
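
It helps to work out what these settings imply. The sketch below computes how much of the 7361-example dataset is seen in 250 steps, and how a cosine schedule without warmup decays the learning rate. A single GPU is assumed, and `steps_per_epoch` and `cosine_lr` are hypothetical helpers written for illustration; they are not the Trainer's own code.

```python
import math


def steps_per_epoch(n_examples, per_device_batch, grad_accum, n_devices=1):
    """Optimizer steps needed to see every example once."""
    effective_batch = per_device_batch * grad_accum * n_devices
    return math.ceil(n_examples / effective_batch)


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr to 0 over total_steps, with no warmup."""
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


N_EXAMPLES = 7361
MAX_STEPS = 250
BASE_LR = 2e-4

full_epoch_steps = steps_per_epoch(N_EXAMPLES, per_device_batch=8, grad_accum=1)
examples_seen = MAX_STEPS * 8 * 1
fraction_of_epoch = examples_seen / N_EXAMPLES

print(f"steps for one full epoch: {full_epoch_steps}")
print(f"examples seen in {MAX_STEPS} steps: {examples_seen} ({fraction_of_epoch:.0%} of the data)")
for step in (0, 125, 250):
    print(f"step {step:3d}: lr = {cosine_lr(step, MAX_STEPS, BASE_LR):.2e}")
```

With these numbers a full epoch would take 921 steps, so stopping at 250 steps covers only about a quarter of the sampled data. Raising `max_steps` or removing it would be needed to train on all of it.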

In [None]:
# Initialize the SFTTrainer (Supervised Fine-Tuning Trainer) for fine-tuning the model with the specified configuration.
trainer = SFTTrainer(
    model=model,                       # The model to be fine-tuned, which has been adapted with PEFT (e.g., LoRA).
    train_dataset=data,                 # The dataset used for training. This dataset contains the data that the model will learn from.
    peft_config=peft_config,            # The LoRA configuration. The model is already wrapped by get_peft_model above, so passing it again is likely redundant.
    dataset_text_field="text",          # Name of the dataset column that holds the raw training text (the "text" column created earlier).
    args=training_arguments,            # The training arguments that define the training process (batch size, learning rate, etc.).
    tokenizer=tokenizer,                # The tokenizer used to convert text into input format for the model and vice versa.
    packing=False,                      # Whether to concatenate several short examples into one fixed-length sequence. False keeps one example per sequence.
    max_seq_length=512                  # The maximum length of the input sequences. Sequences longer than this will be truncated.
)
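
The difference between `packing=False` with truncation and `packing=True` can be illustrated on plain lists of token IDs. This is a simplified toy version, not TRL's implementation: `truncate` and `pack` are hypothetical helpers, the token IDs are made up, and the packing variant shown here simply drops the incomplete final chunk.

```python
def truncate(sequences, max_len):
    """packing=False: one example per sequence, cut off at max_len tokens."""
    return [seq[:max_len] for seq in sequences]


def pack(sequences, max_len, eos_id):
    """packing=True (toy version): join examples with EOS, then cut fixed-size chunks."""
    stream = []
    for seq in sequences:
        stream.extend(seq)
        stream.append(eos_id)  # mark the boundary between examples
    # Keep only full chunks; the incomplete tail is dropped in this sketch.
    return [stream[i:i + max_len] for i in range(0, len(stream) - max_len + 1, max_len)]


EOS_ID = 0
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]

truncated = truncate(examples, max_len=4)
packed = pack(examples, max_len=4, eos_id=EOS_ID)

print(truncated)
print(packed)
```

Truncation wastes no effort on joining examples but discards the tail of long dialogues, while packing uses every position in a sequence at the cost of mixing several examples together.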