## ⚙️ 1. Environment Setup

This first cell installs all the necessary libraries for fine-tuning a large language model using modern, memory-efficient techniques.

- **`accelerate`**: A Hugging Face library that simplifies running PyTorch training on any kind of distributed setup.
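- **`bitsandbytes`**: A key library for model quantization. It lets us load large models like Phi-2 in 8-bit precision, which roughly halves the GPU memory needed for the weights compared with 16-bit precision. It also provides the 8-bit optimizer used during training.
- **`trl`**: The Transformer Reinforcement Learning library, which provides trainers for supervised fine-tuning and RLHF-style training. It is installed here, but the training loop below uses the standard `transformers` `Trainer`.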
- **`peft`**: The Parameter-Efficient Fine-Tuning library. This is crucial for using techniques like LoRA (Low-Rank Adaptation), which we will use here.
- **`transformers` & `datasets`**: The core Hugging Face libraries for models, tokenizers, and data handling.

In [None]:
!pip install -q -U accelerate bitsandbytes trl peft transformers datasets

## 📥 2. Data Loading and Preprocessing

We start by loading a dataset of Amazon product details using `pandas`. The initial data needs to be reshaped for our specific task.

1.  **Load Data**: We load the CSV, keeping only the `category`, `about_product`, and `product_name` columns.
2.  **Clean Category**: The `category` column contains a breadcrumb trail (e.g., 'Electronics|Computers|Laptops'). We clean this to keep only the most specific, final category.
3.  **Restructure for Tasks**: We want to train the model on two related tasks: generating a *product name* and generating a *product description*. We split the original DataFrame into two, rename the columns to a common name (`text`), and add a `task_type` column to each to identify the task. 
4.  **Combine**: We then concatenate the two DataFrames back into one, creating a unified dataset where each row is an example of a specific task.
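The reshaping steps above can be sketched in plain Python, with lists of dicts standing in for the DataFrames. The two sample rows are hypothetical, not taken from the real CSV; the point is that every product yields one "Product Name" row and one "Product Description" row, both keyed by the cleaned, most specific category.

```python
# Hypothetical sample rows standing in for the Amazon CSV
rows = [
    {"category": "Electronics|Computers&Accessories|Cables|USBCables",
     "product_name": "USB-C to USB-A cable, 1 m",
     "about_product": "Fast charging and data sync."},
    {"category": "Home&Kitchen|Kitchen&HomeAppliances|Vacuum",
     "product_name": "Handheld vacuum cleaner",
     "about_product": "Cordless with a washable filter."},
]

def last_category(breadcrumb: str) -> str:
    """Keep only the most specific category from an 'A|B|C' breadcrumb."""
    return breadcrumb.split("|")[-1]

def to_task_rows(records):
    """Emit one row per (product, task): a name row and a description row."""
    names = [{"category": last_category(r["category"]), "text": r["product_name"],
              "task_type": "Product Name"} for r in records]
    descriptions = [{"category": last_category(r["category"]), "text": r["about_product"],
                     "task_type": "Product Description"} for r in records]
    # Same order as pd.concat([products, description]): all names first, then all descriptions
    return names + descriptions

task_rows = to_task_rows(rows)
for row in task_rows:
    print(row["task_type"], "|", row["category"], "|", row["text"])
```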

In [None]:
import pandas as pd
from datasets import load_dataset, Dataset, DatasetDict

df = pd.read_csv('https://github.com/laxmimerit/All-CSV-ML-Data-Files-Download/raw/master/amazon_product_details.csv', usecols=['category', 'about_product', 'product_name'])


In [None]:
df['category'] = df['category'].apply(lambda x: x.split('|')[-1])

In [None]:
products = df[['category', 'product_name']]
description = df[['category', 'about_product']]

products = products.rename(columns={'product_name': 'text'})
description = description.rename(columns={'about_product': 'text'})

products['task_type'] = 'Product Name'
description['task_type'] = 'Product Description'


In [None]:
df = pd.concat([products, description], ignore_index=True)

## 📖 3. Creating the Hugging Face Dataset

With our data preprocessed in pandas, we convert it into a Hugging Face `Dataset` object, the format the `transformers` tooling is built around. We then shuffle the dataset and split it into a training set (90%) and a test set (10%) so we can evaluate the model on unseen data. Because `train_test_split` reshuffles by default, it gets the same seed so that the split is reproducible across runs.

In [None]:
dataset = Dataset.from_pandas(df)
dataset = dataset.shuffle(seed=0)
dataset = dataset.train_test_split(test_size=0.1, seed=0)  # seed makes the 90/10 split reproducible
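A seeded shuffle-then-split can be sketched in plain Python to show why fixing the seed matters: the same seed always yields the same partition. This is an illustration of the idea only; `datasets` uses its own random generator and rounding rules, so its exact split will differ from this sketch.

```python
import random

def shuffle_and_split(examples, test_size=0.1, seed=0):
    """Shuffle a copy of `examples` with a fixed seed, then hold out the last `test_size` fraction."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    n_test = max(1, round(len(shuffled) * test_size))
    return {"train": shuffled[:-n_test], "test": shuffled[-n_test:]}

splits = shuffle_and_split(range(100), test_size=0.1, seed=0)
print(len(splits["train"]), len(splits["test"]))  # 90 10
```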

In [None]:
dataset

In [None]:
dataset['test'][2]

## 📊 4. Exploratory Data Analysis (EDA)

A quick EDA step checks the length of our text fields. We estimate the number of tokens in product names and descriptions with a rough rule of thumb (about 1.5 tokens per whitespace-separated word) and plot histograms. This confirms that product names are short, while description lengths vary much more widely, which helps when choosing a `max_length` for tokenization.
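One way to turn these estimates into a `max_length` is to pick a high percentile of the estimated lengths, so that most examples fit without truncation. The sketch below assumes the same 1.5 tokens-per-word heuristic and uses hypothetical descriptions; a real tokenizer count would be more accurate.

```python
import math

def estimate_tokens(text: str, tokens_per_word: float = 1.5) -> float:
    """Rough token count: whitespace-separated words times an assumed ratio."""
    return len(text.split()) * tokens_per_word

def nearest_rank_percentile(values, pct):
    """Smallest value such that at least `pct` percent of the values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical descriptions of varying length (20 to 300 words)
descriptions = ["word " * n for n in (20, 40, 60, 80, 100, 120, 150, 200, 250, 300)]
estimates = [estimate_tokens(d) for d in descriptions]
p90 = nearest_rank_percentile(estimates, 90)
print(estimates)
print(p90)  # 375.0
```

With these made-up lengths, a `max_length` of 400 would cover about 90% of the descriptions without truncation.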

In [None]:
# Rough token estimate: ~1.5 tokens per whitespace-separated word
products['text_tokens'] = products['text'].apply(lambda x: len(x.split())*1.5)
description['text_tokens'] = description['text'].apply(lambda x: len(x.split())*1.5)


In [None]:
products['text_tokens'].hist()

In [None]:
description['text_tokens'].hist()

## 📝 5. Creating an Instruction-Based Prompt Template

To effectively fine-tune the model, we need to structure our data into a clear prompt format. The `formatting_func` creates an instruction-based prompt for each example. It tells the model exactly what to do (e.g., "generate a 'Product Description'"), provides the necessary input (the `Category`), and then provides the expected output. This process, known as instruction fine-tuning, is a highly effective way to teach a model a new task.
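Training and inference prompts should share the same layout, otherwise the model is asked at inference time to continue text that looks different from what it saw during training. A minimal way to guarantee this is to build both from one template. The `build_prompt` helper below is hypothetical (not part of the original notebook): with the answer left empty it yields the inference prompt, and the training string starts with exactly that prompt.

```python
def build_prompt(category: str, task_type: str, text: str = "") -> str:
    """Hypothetical helper: one template for both training and inference.
    With text="" it returns the prompt the model should complete."""
    prompt = (
        f"Given the product category, you need to generate a '{task_type}'.\n"
        f"### Category: {category}\n"
        f"### {task_type}:"
    )
    return prompt + (f" {text}" if text else "")

train_example = build_prompt("BatteryChargers", "Product Description",
                             "Compact charger for AA and AAA cells.")
inference_prompt = build_prompt("BatteryChargers", "Product Description")
print(train_example)
print(train_example.startswith(inference_prompt))  # True
```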

In [None]:
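def formatting_func(example):
    # Same layout as eval_prompt in section 8 (no stray indentation), with the expected output appended
    text = (
        "\nGiven the product category, you need to generate a "
        f"'{example['task_type']}'.\n"
        f"### Category: {example['category']}\n"
        f"### {example['task_type']}: {example['text']}\n"
    )
    return text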

In [None]:
print(formatting_func(dataset['train'][0]))

## 🤖 6. Loading the Quantized Model and Tokenizer

Here, we load the pre-trained `microsoft/phi-2` model. The key setting is `load_in_8bit=True`, which uses the `bitsandbytes` library to quantize the weights of the model's linear layers to 8-bit integers. This roughly halves the GPU memory needed for those weights compared with float16.
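To make "quantize to 8-bit integers" concrete, the sketch below implements a simplified symmetric absmax scheme: scale the values so the largest magnitude maps to 127, round, and keep the scale to dequantize later. This is an illustration only; bitsandbytes' LLM.int8() uses per-vector scaling and keeps outlier features in 16-bit, so it is more elaborate than this. The 2.7-billion parameter count for phi-2 is approximate, and the memory figures cover the weights only.

```python
def absmax_quantize(values):
    """Simplified symmetric int8 quantization: max |x| maps to 127."""
    scale = max(abs(v) for v in values) / 127
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original values."""
    return [qi * scale for qi in q]

def weight_memory_gb(n_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = absmax_quantize(weights)
print(q)  # [50, -127, 2, 100]

n_params = 2.7e9  # approximate phi-2 parameter count
print(f"fp16: {weight_memory_gb(n_params, 2):.1f} GB, int8: {weight_memory_gb(n_params, 1):.1f} GB")
```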

We also load the tokenizer. For causal (decoder-only) models like Phi-2, we set `padding_side='left'`. `generate()` appends new tokens after the last position of each row, so left padding ensures every prompt ends in a real token rather than in padding. Because padding needs a token id, we also reuse the end-of-sequence (EOS) token as the pad token.
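The effect of the padding side can be shown with a small plain-Python sketch that pads a batch of token-id lists and builds the matching attention mask. The token ids and the pad id `0` are hypothetical.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad token-id lists to equal length; returns (input_ids, attention_mask)."""
    width = max(len(s) for s in sequences)
    ids, mask = [], []
    for s in sequences:
        pad = [pad_id] * (width - len(s))
        if side == "left":
            ids.append(pad + s)
            mask.append([0] * len(pad) + [1] * len(s))
        else:
            ids.append(s + pad)
            mask.append([1] * len(s) + [0] * len(pad))
    return ids, mask

PAD = 0  # hypothetical pad id
batch = [[11, 12, 13], [21]]
left_ids, left_mask = pad_batch(batch, PAD, "left")
right_ids, _ = pad_batch(batch, PAD, "right")
print(left_ids)   # [[11, 12, 13], [0, 0, 21]]
print(right_ids)  # [[11, 12, 13], [21, 0, 0]]
# New tokens are appended after the last column: with left padding that column holds
# each prompt's final real token; with right padding it can be a pad token.
print([row[-1] for row in left_ids])  # [13, 21]
```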

In [None]:
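import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "microsoft/phi-2"

# Recent transformers releases expect quantization options in a BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,      # dtype for the parts of the model that stay unquantized
    quantization_config=bnb_config,
)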


In [None]:
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side='left',
    add_eos_token=True,
    add_bos_token=True,
    use_fast=False
)

tokenizer.pad_token = tokenizer.eos_token


## ✍️ 7. Tokenizing the Dataset for Causal LM Fine-Tuning

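We define a function to tokenize our formatted prompts. A causal language model is trained to predict the next token, so the `labels` (the targets the loss is computed against) are simply a copy of the `input_ids`. The model shifts them by one position internally, so its prediction at position *t* is scored against token *t + 1*. Note that `DataCollatorForLanguageModeling(mlm=False)`, used during training, rebuilds `labels` from `input_ids` and sets every position holding the pad token to -100 so that the loss ignores it. Because the pad token is the EOS token here, EOS positions are masked as well.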

This tokenization function is then applied to the entire dataset using `.map()`.
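The label logic can be sketched in plain Python: copy the ids, mask pad positions with -100 (mimicking what `DataCollatorForLanguageModeling(mlm=False)` does), and pair each prefix with the next token it is trained to predict. The pad id 50256 is assumed to be phi-2's EOS id, reused as the pad token in this notebook; the other token ids are hypothetical.

```python
IGNORE_INDEX = -100  # target value skipped by PyTorch's cross-entropy loss

def collator_labels(input_ids, pad_id):
    """Mimic DataCollatorForLanguageModeling(mlm=False): copy ids, mask pad positions."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]

def next_token_targets(input_ids, labels):
    """(context, target) pairs: position t is scored against labels[t + 1]."""
    return [(input_ids[: t + 1], labels[t + 1])
            for t in range(len(input_ids) - 1)
            if labels[t + 1] != IGNORE_INDEX]

PAD = 50256  # assumed phi-2 EOS id, used as the pad token
ids = [PAD, PAD, 7, 8, 9]          # a left-padded example
labels = collator_labels(ids, PAD)
print(labels)                                             # [-100, -100, 7, 8, 9]
print([t for _, t in next_token_targets(ids, labels)])    # [7, 8, 9]
print(collator_labels([7, 8, 9, PAD], PAD))               # [7, 8, 9, -100]
```

The last line illustrates a known side effect of reusing EOS as the pad token: an EOS appended to a training example is masked along with the padding, so the model can end up with no explicit signal to stop generating.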

In [None]:
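max_length = 400  # token budget, informed by the length histograms in section 4

def tokenize(example):
    # Format the example as an instruction prompt, then tokenize it to a fixed length
    result = tokenizer(
        formatting_func(example),
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )
    # Causal LM: labels are the input ids themselves (the model shifts them internally)
    result['labels'] = result['input_ids'].copy()
    return result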

In [None]:
print(tokenize(dataset['train'][0]))

In [None]:
dataset = dataset.map(tokenize)

## 🧐 8. Baseline Inference (Before Fine-Tuning)

Before we start training, we test the base model's ability to perform our task. We provide it with a prompt asking for a 'Product Description' for the 'BatteryChargers' category and see what it generates. This provides a crucial baseline, allowing us to clearly see the improvements after fine-tuning.

In [None]:
eval_prompt = """
Given the product category, you need to generate a 'Product Description'.
### Category: BatteryChargers
### Product Description:
"""


In [None]:
# tokenize -> generate -> decode
# A single prompt needs no padding; the tensors must be on the same GPU as the 8-bit model
model_input = tokenizer(
    eval_prompt,
    truncation=True,
    max_length=max_length,
    return_tensors='pt'
).to("cuda")


In [None]:
model.eval()

with torch.no_grad():
    output = model.generate(**model_input, max_new_tokens=256,
                            repetition_penalty=1.15)
    result = tokenizer.decode(output[0], skip_special_tokens=True)

print(result)

## 🛠️ 9. QLoRA Configuration

This is the core of our memory-efficient fine-tuning strategy. Instead of training the entire model, we train small Low-Rank Adaptation (LoRA) adapters on top of the frozen, 8-bit quantized base model. This setup is often called QLoRA (Quantized LoRA), although the original QLoRA recipe quantizes the base model to 4-bit NF4 rather than 8-bit.

1.  **`LoraConfig`**: We define a LoRA configuration using the `peft` library. We specify which layers to adapt (`target_modules`) and set the hyperparameters of the small, trainable adapter matrices: `r` is their rank, and their output is scaled by `lora_alpha / r`.
2.  **`get_peft_model`**: This function takes our quantized base model and the LoRA config, and injects the small, trainable adapter layers into the specified modules.
3.  **Parameter Check**: The `print_trainable_parameters` function confirms the savings: the roughly 2.8 billion parameters of the base model stay frozen, and only a small fraction (about 1 to 2%) is trained. This is what makes fine-tuning a model of this size feasible on a single GPU.
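LoRA keeps the pretrained weight W0 frozen and learns a low-rank update B·A, where A has shape r × d_in and B has shape d_out × r, so an adapted layer computes W0 x + (alpha / r) · B A x. The plain-Python sketch below applies this to a toy 2×2 layer with a rank-1 adapter and counts the parameters LoRA adds to one layer. PEFT initializes B to zeros by default, so before training the adapted layer reproduces the base layer exactly. The phi-2 `fc1` size used here (2560 → 10240) is an assumption taken from the model's published config and worth double-checking.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)

def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def lora_forward(w0, a, b, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B are trained."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]

w0 = [[1.0, 2.0], [3.0, 4.0]]  # frozen toy weight, d_out x d_in = 2 x 2
a = [[0.5, -0.5]]              # r x d_in = 1 x 2
b_init = [[0.0], [0.0]]        # d_out x r = 2 x 1, zero-initialized
x = [2.0, 1.0]

# Same alpha and r as the LoraConfig above, so the scaling factor is 64 / 32 = 2
print(lora_forward(w0, a, b_init, x, alpha=64, r=32))  # [4.0, 10.0], identical to W0 x
print(lora_param_count(2560, 10240, 32))               # 409600
```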

In [None]:
from peft import LoraConfig, get_peft_model


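# Layer names in the native transformers Phi implementation (PhiForCausalLM).
# Older remote-code revisions of phi-2 used a fused "Wqkv" attention projection instead.
target_modules = ["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"]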
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules = target_modules,
    bias = "none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)



In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
print_trainable_parameters(model)

In [None]:
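from accelerate import Accelerator

# Optional: Trainer already runs on Accelerate internally, so this explicit step is not strictly required
accelerator = Accelerator(gradient_accumulation_steps=1)

model = accelerator.prepare_model(model)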

## ▶️ 10. Training the Model

We are now ready to train. We set up the `Trainer` with our LoRA-adapted model and `TrainingArguments`.

- **Optimizer**: `paged_adamw_8bit` is an AdamW variant from `bitsandbytes` that stores the optimizer state in 8-bit and uses paged memory, which lets that state move to CPU RAM during GPU memory spikes. Both features reduce GPU memory pressure during training.
- **Data Collator**: `DataCollatorForLanguageModeling` with `mlm=False` turns off masked language modeling and builds next-token-prediction labels instead, the standard setup for causal language model fine-tuning.

Finally, `trainer.train()` starts the fine-tuning process, where only the small LoRA adapter weights are updated.
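A quick back-of-the-envelope check of what these settings imply: with `max_steps=500`, a per-device batch of 2, no gradient accumulation, and one GPU, training sees 1,000 examples, and `save_steps=25` produces 20 checkpoints. Whether 1,000 examples is more or less than one epoch depends on the size of the train split (`len(dataset['train'])`); the `train_size=2000` below is a hypothetical placeholder.

```python
def training_budget(max_steps, per_device_batch, grad_accum, n_devices, train_size, save_steps):
    """Back-of-the-envelope numbers implied by a step-based training schedule."""
    effective_batch = per_device_batch * grad_accum * n_devices
    samples_seen = max_steps * effective_batch
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        "epochs": samples_seen / train_size,
        "checkpoints": max_steps // save_steps,
    }

# Values from the TrainingArguments below; train_size is a hypothetical placeholder
print(training_budget(500, 2, 1, 1, train_size=2000, save_steps=25))
```

Because every checkpoint is kept by default, setting `save_total_limit` in `TrainingArguments` is one way to cap disk usage.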

In [None]:
# Trainer, TrainingArguments, and data collator

from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

run_name = 'train-dir'
output_dir = "./" + run_name

args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    max_steps=500,
    learning_rate=2.5e-5,       # small learning rate for fine-tuning
    optim="paged_adamw_8bit",   # 8-bit, paged AdamW from bitsandbytes
    logging_steps=25,           # log the training loss every 25 steps
    logging_dir="./logs",       # directory for storing logs
    save_strategy="steps",      # save checkpoints on a step schedule...
    save_steps=25,              # ...every 25 steps
    eval_strategy="steps",      # evaluate on a step schedule (`evaluation_strategy` in older transformers)
    eval_steps=25,              # evaluate every 25 steps
    do_eval=True,
)

trainer = Trainer(
    model=model,
    args = args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

## 🚀 11. Inference After Fine-Tuning

To test our fine-tuned model, we load the trained LoRA adapter on top of a fresh copy of the base model. The adapter is applied alongside the frozen base weights rather than merged into them.

1.  **Load Base Model**: We load the original 8-bit quantized `microsoft/phi-2` model again.
2.  **Load PEFT Model**: We use `PeftModel.from_pretrained` to load our saved LoRA adapter from the final checkpoint (`checkpoint-500`) and apply it to the base model.
3.  **Generate Text**: We use the exact same prompt as in the baseline test, so comparing the new output with the baseline shows whether the model has learned to follow the instruction and generate a relevant product description.
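For a decoder-only model, `generate()` returns the prompt tokens followed by the new tokens, so the decoded `result` repeats the prompt. To print only the completion, slice off the prompt length before decoding, for example `eval_tokenizer.decode(output[0][model_input["input_ids"].shape[1]:], skip_special_tokens=True)`. The slicing itself is sketched below with hypothetical token ids.

```python
def completion_only(output_ids, prompt_length):
    """Drop the echoed prompt: decoder-only generate() returns prompt + new tokens."""
    return output_ids[prompt_length:]

prompt_ids = [101, 102, 103]        # hypothetical prompt token ids
generated = prompt_ids + [7, 8, 9]  # shape of what generate() returns for one sequence
print(completion_only(generated, len(prompt_ids)))  # [7, 8, 9]
```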

In [None]:
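import torch
from transformers import BitsAndBytesConfig

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # same 8-bit setup as for training
    torch_dtype=torch.float16
)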

eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    add_bos_token=True,
    trust_remote_code=True,
    use_fast=False
)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

In [None]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, '/content/train-dir/checkpoint-500')

In [None]:
eval_prompt = """
Given the product category, you need to generate a 'Product Description'.
### Category: BatteryChargers
### Product Description:
"""

# The inputs must be on the same GPU as the 8-bit model
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    output = ft_model.generate(**model_input, max_new_tokens=256,
                               repetition_penalty=1.15)
    result = eval_tokenizer.decode(output[0], skip_special_tokens=True)

print(result)

## 💾 12. Saving the Adapter

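Finally, we zip the final checkpoint directory into a single, portable `phi2_qlora_adapter.zip` file that can be shared and loaded later for inference. Besides the LoRA adapter files, a `Trainer` checkpoint also holds optimizer, scheduler, and trainer state, which are only needed to resume training, so the archive is larger than the adapter alone.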

In [None]:
!zip -r phi2_qlora_adapter.zip /content/train-dir/checkpoint-500
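If only the adapter is needed, a smaller archive can be built by keeping just the adapter files. The sketch below assumes PEFT's usual file names (`adapter_config.json` plus `adapter_model.safetensors` in recent versions, `adapter_model.bin` in older ones) and demonstrates the filtering on a fake checkpoint directory, since the real one only exists after training. Alternatively, `ft_model.save_pretrained(...)` on a `PeftModel` writes only the adapter files.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumed PEFT adapter file names; adjust if your peft version writes different ones
ADAPTER_FILES = {"adapter_config.json", "adapter_model.safetensors", "adapter_model.bin"}

def zip_adapter_only(checkpoint_dir, zip_path):
    """Archive only the adapter files, skipping optimizer, scheduler, and trainer state."""
    checkpoint_dir = Path(checkpoint_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(checkpoint_dir.iterdir()):
            if path.name in ADAPTER_FILES:
                zf.write(path, arcname=path.name)  # store without the directory prefix
                added.append(path.name)
    return added

# Demo on a fake checkpoint directory with placeholder files
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint-500"
    ckpt.mkdir()
    for name in ["adapter_config.json", "adapter_model.safetensors",
                 "optimizer.pt", "scheduler.pt", "trainer_state.json"]:
        (ckpt / name).write_text("placeholder")
    added = zip_adapter_only(ckpt, Path(tmp) / "phi2_qlora_adapter.zip")
    with zipfile.ZipFile(Path(tmp) / "phi2_qlora_adapter.zip") as zf:
        archived = sorted(zf.namelist())
    print(added)  # ['adapter_config.json', 'adapter_model.safetensors']
```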
