[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Showmick119/Fine-Tuning-Open-Source-LLM/blob/main/notebooks/1_finetuning_lora.ipynb)

# 🚀 Fine-tuning CodeLlama with QLoRA on CodeAlpaca

This notebook demonstrates how to fine-tune **CodeLlama-7b-Instruct** using **QLoRA (4-bit quantization)** with the PEFT library on the **CodeAlpaca-20k** dataset. Because the base model is loaded in 4-bit precision and only the small LoRA adapter weights are trained, fine-tuning fits on a single GPU, usually with little loss in quality compared with full fine-tuning.

## 🎯 What You'll Learn

- How to load and configure CodeLlama with 4-bit quantization
- How to set up LoRA adapters for efficient fine-tuning
- How to prepare the CodeAlpaca dataset for training
- How to train with QLoRA and save checkpoints

## 🔧 Setup

First, let's install the required packages and set up our environment. We'll be using Google Colab's GPU runtime for this tutorial.


In [None]:
%pip install -q torch transformers datasets peft bitsandbytes accelerate tqdm

### 📁 Clone the Repository

Next, clone the repository to get the training scripts, configurations, and the CodeAlpaca dataset.


In [None]:
# Clone the repository (replace with your actual repo URL)
!git clone https://github.com/your-username/llm-finetuning-lora.git
%cd llm-finetuning-lora

# Check GPU availability
import torch
print(f"🚀 CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

### 📦 Import Dependencies

Now let's import our custom modules and other required libraries.

In [None]:
import sys
sys.path.append('.')

from model.load_base_model import ModelLoader
from data.prepare_dataset import DatasetPreparator
from train.run_lora_finetune import run_training


## 📊 Data Preparation

We'll use the **CodeAlpaca-20k** dataset, which contains roughly 20,000 instruction-following examples for code generation. Each record has an `instruction`, an `input` (which may be empty), and an `output` field, which makes it a natural fit for instruction-tuning CodeLlama.
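
To make the record format concrete, here is a minimal sketch of how one CodeAlpaca record could be turned into a single training string. The `format_example` helper, the `### Instruction` / `### Response` template, and the sample record are assumptions for illustration; the repository's `DatasetPreparator` may format examples differently.

```python
# Hypothetical helper: not part of the repository's DatasetPreparator.
def format_example(record: dict) -> str:
    """Turn one CodeAlpaca-style record into a single training string."""
    instruction = record["instruction"].strip()
    context = record.get("input", "").strip()
    output = record["output"].strip()

    prompt = f"### Instruction:\n{instruction}\n\n"
    if context:  # only add the input section when the record provides one
        prompt += f"### Input:\n{context}\n\n"
    return prompt + f"### Response:\n{output}"


# Made-up record with the same keys as the dataset (instruction, input, output)
sample = {
    "instruction": "Write a function that adds two numbers.",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}
print(format_example(sample))
```

Skipping the input section for empty inputs keeps the prompt free of blank placeholders, so the model does not learn to expect a section that carries no information.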

In [None]:
# Initialize data preparator with CodeLlama tokenizer
print("🔧 Initializing data preparator with CodeLlama tokenizer...")
data_preparator = DatasetPreparator(
    tokenizer="codellama/CodeLlama-7b-Instruct-hf",
    max_length=512,
    data_path="data/code_alpaca_20k.json"
)

# Load and examine the CodeAlpaca dataset
print("📁 Loading CodeAlpaca-20k dataset...")
dataset = data_preparator.prepare_dataset(use_dummy=False)
print(f"✅ Prepared dataset with {len(dataset)} examples")

# Let's look at a sample from the dataset
import json
with open("data/code_alpaca_20k.json", "r") as f:
    raw_data = json.load(f)

print("\n📝 Sample from raw dataset:")
print(f"Instruction: {raw_data[0]['instruction']}")
print(f"Input: {raw_data[0]['input']}")
print(f"Output: {raw_data[0]['output'][:200]}...")


## 🤖 Model Preparation with QLoRA

Now let's load **CodeLlama-7b-Instruct** with **4-bit quantization** (QLoRA) and configure it with LoRA adapters. The frozen base weights are stored in 4 bits instead of 16, which cuts the memory needed for the weights to roughly a quarter, while gradients are computed only for the small adapter matrices.
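
As a rough back-of-the-envelope check on the memory savings, the sketch below estimates the size of the model weights alone at different precisions. The `weight_memory_gb` helper and the nominal 7 billion parameter count are assumptions for illustration; the estimate ignores quantization constants, activations, optimizer state, and the LoRA weights themselves, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal "7B"; the exact parameter count is slightly lower

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{int4_gb:.1f} GB")
```

Comparing these figures with the GPU memory reported earlier in the notebook gives a quick sense of how much headroom is left for activations and training state.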

In [None]:
# 🔧 Load CodeLlama with 4-bit quantization and LoRA adapter
print("🚀 Loading CodeLlama-7b-Instruct with QLoRA configuration...")
model_loader = ModelLoader("configs/lora_config.json")

print("📦 Loading base model with 4-bit quantization...")
model, tokenizer = model_loader.load_base_model()

print("🔗 Adding LoRA adapter for efficient fine-tuning...")
model = model_loader.add_lora_adapter(model)

# Check model memory usage
if torch.cuda.is_available():
    print(f"💾 GPU Memory Used: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"💾 GPU Memory Cached: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
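
To see why LoRA adapters are cheap to train, it can help to count the parameters they add. The sketch below assumes a hypothetical configuration (rank 8, adapters on two attention projections per layer); the values actually used here live in `configs/lora_config.json` and may differ. The hidden size of 4096 and the 32 transformer layers are those of the 7B Llama architecture.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices next to one linear layer: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed values for illustration; check configs/lora_config.json for the real ones.
HIDDEN_SIZE = 4096      # hidden size of the 7B Llama architecture
NUM_LAYERS = 32         # transformer blocks in the 7B model
RANK = 8                # hypothetical LoRA rank
TARGETS_PER_LAYER = 2   # e.g. the query and value projections

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total = per_module * TARGETS_PER_LAYER * NUM_LAYERS

print(f"Parameters per adapted module: {per_module:,}")
print(f"Trainable LoRA parameters:     {total:,}")
```

Under these assumptions the adapters hold about 4.2 million parameters, a small fraction of a percent of the roughly 7 billion frozen base weights. Doubling the rank or adapting more projection layers scales this count linearly.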


## 🏋️ Training with QLoRA

Now we can start the fine-tuning process! We'll use the training configuration optimized for CodeLlama and QLoRA. The training will save checkpoints regularly so you can resume if needed.

In [None]:
# 🚀 Start fine-tuning with QLoRA on CodeAlpaca dataset
print("🏋️ Starting fine-tuning process...")
print("📊 Training on CodeAlpaca-20k dataset")
print("⚡ Using QLoRA (4-bit quantization) for efficient training")

run_training(
    training_config_path="configs/training_args.json",
    lora_config_path="configs/lora_config.json",
    data_path="data/code_alpaca_20k.json",
    use_dummy_data=False
)

print("✅ Training completed successfully!")
print("💾 Model checkpoints saved to: outputs/checkpoints")
print("📋 Training logs saved to: outputs/logs")


## 💾 Model Checkpoints & Next Steps

The training script automatically saves:
- **LoRA adapter weights** in `outputs/checkpoints/`
- **Training logs** in `outputs/logs/`
- **Tokenizer configuration** alongside the model
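
If training is interrupted, the most recent checkpoint is usually the one to resume from. The sketch below assumes checkpoints are written as `checkpoint-<step>` subdirectories, which is the Hugging Face `Trainer` naming convention; whether `run_training` follows it is an assumption, and `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None

    candidates = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare steps as integers so checkpoint-1000 ranks above checkpoint-90
            candidates.append((int(match.group(1)), child))

    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


print(latest_checkpoint("outputs/checkpoints"))
```

Sorting by the numeric step rather than by directory name matters because a plain string sort would place `checkpoint-90` after `checkpoint-1000`.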

## 🎯 What's Next?

Now that you have fine-tuned CodeLlama on CodeAlpaca, you can:

1. **Test your model**: Use `2_test_model.ipynb` to interactively test code generation
2. **Evaluate performance**: Use `3_evaluate_model.ipynb` to benchmark on HumanEval
3. **Deploy your model**: Use the inference scripts for production deployment
4. **Experiment further**: Try different LoRA configurations or datasets

## 🧹 Cleanup (Optional)

If you're using Google Colab, you may want to free up GPU memory:

In [None]:
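# Optional: release the model and free cached GPU memory
import gc
import torch

# Drop the Python references first; memory that is still referenced cannot be freed
del model
del tokenizer
gc.collect()
# Return cached, now-unused blocks to the GPU
torch.cuda.empty_cache()

print("🧹 GPU memory cleared!")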
