# Fine-tuning Mistral with LoRA for Technical Documentation Generation

This notebook demonstrates how to fine-tune the Mistral-7B language model to generate technical documentation using Low-Rank Adaptation (LoRA). It collects documentation from popular open-source repositories, prepares the data, and trains the model on a single GPU by combining 4-bit quantization with LoRA.

## Prerequisites

- Python 3.8+
- CUDA-compatible GPU with sufficient VRAM (at least 8GB recommended)
- Hugging Face account and API token
- Git installed on your system

### Required Libraries
```bash
pip install transformers bitsandbytes datasets gitpython huggingface_hub
pip install --upgrade accelerate
pip install git+https://github.com/huggingface/peft.git
```
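Several cells depend on version-specific behavior (for example, the rotary-embedding patch targets `MistralRotaryEmbedding.forward`, whose signature has changed between transformers releases), so it can help to record what is installed before running them. A minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    required = ["transformers", "bitsandbytes", "datasets", "peft", "accelerate", "torch"]
    for name, ver in installed_versions(required).items():
        print(f"{name:>14}: {ver or 'NOT INSTALLED'}")
```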

## Notebook Structure

1. **Model Loading and Quantization** (Cells 1-3)
   - Sets up necessary libraries
   - Loads Mistral-7B model with 4-bit quantization
   - Configures the tokenizer

2. **LoRA Configuration** (Cells 4-5)
   - Implements LoRA for efficient fine-tuning
   - Patches model components for device compatibility
   - Prepares trainable parameters

3. **Data Collection and Processing** (Cell 6)
   - Clones technical documentation repositories:
     - PyTorch
     - TensorFlow (`tensorflow/docs`)
     - scikit-learn
     - pandas
   - Processes Markdown files
   - Cleans and formats the documentation

4. **Dataset Preparation** (Cell 7)
   - Tokenizes the collected documents
   - Creates train/evaluation splits
   - Prepares data collator for language modeling

5. **Model Fine-tuning** (Cell 8)
   - Configures training arguments:
     - 3 epochs
     - Learning rate: 2e-4
     - Batch size: 4 (per device)
     - Mixed-precision training (fp16)
   - Initializes trainer and starts training

6. **Model Evaluation** (Cell 9)
   - Compares base and fine-tuned models
   - Generates sample technical documentation
   - Analyzes output quality

## Key Features

- Uses 4-bit quantization for reduced memory usage
- Implements LoRA for efficient parameter updates
- Processes real-world technical documentation
- Works around device-placement issues with the 4-bit model (a global `.to()` patch and a rotary-embedding patch)
- Provides comparison between base and fine-tuned outputs

## Usage Notes

1. Set `my_token` (Cell 3) to your Hugging Face access token
2. Adjust batch sizes and model parameters based on your GPU memory
3. Modify the list of GitHub repositories as needed
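4. Consider training for more epochs, but watch the validation loss: in the run below it rose between epochs 2 and 3, which may indicate overfitting on this small dataset (117 documents)
5. Monitor both training and validation loss for convergence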
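Rather than pasting the token into the notebook, one option is to read it from the `HF_TOKEN` environment variable, which `huggingface_hub` also recognizes. A sketch; `resolve_hf_token` is a hypothetical helper, and returning `None` lets `from_pretrained(..., token=None)` fall back to a token cached by `huggingface-cli login`:

```python
import os


def resolve_hf_token(explicit_token=""):
    """Prefer an explicitly set token, then the HF_TOKEN environment variable, else None."""
    if explicit_token:
        return explicit_token
    env_token = os.environ.get("HF_TOKEN", "").strip()
    return env_token or None


# Usage in Cell 3 (assumption: my_token is left as "" when using the environment):
# my_token = resolve_hf_token(my_token)
```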

## Performance Considerations

- 4-bit quantization significantly reduces memory requirements
- LoRA minimizes the number of trainable parameters
- Training time depends on:
  - GPU capabilities
  - Dataset size
  - Number of epochs
  - Batch size
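A back-of-the-envelope estimate shows why 4-bit quantization matters. The sketch below counts weight storage only, assuming roughly 7.24 billion parameters for Mistral-7B. It ignores activations, optimizer state for the LoRA weights, quantization constants, and layers that are not quantized (such as the embeddings and `lm_head`), so treat the 4-bit figure as a lower bound:

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate storage for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


MISTRAL_7B_PARAMS = 7.24e9  # assumption: approximate parameter count

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gb(MISTRAL_7B_PARAMS, bits):.1f} GB")
```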

## Customization

You can customize the training by:
- Adding more documentation sources
- Adjusting LoRA parameters (`r`, `lora_alpha`, `target_modules`)
- Modifying training hyperparameters
- Implementing different data cleaning strategies
- Adding domain-specific preprocessing steps
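For reference when adjusting these parameters: LoRA, as implemented in PEFT, freezes each targeted weight matrix $W_0$ and learns a low-rank update scaled by `lora_alpha / r`. For an input $x$ to an adapted layer:

$$
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\;
B \in \mathbb{R}^{d_{\text{out}} \times r}
$$

With this notebook's settings (`r=16`, `lora_alpha=32`) the scale factor is 2, and each adapted layer adds $r\,(d_{\text{in}} + d_{\text{out}})$ trainable parameters. Raising `r` grows adapter capacity and parameter count linearly; `lora_alpha` rescales the update without adding parameters. PEFT initializes $B$ to zero by default, so training starts from a model identical to the base model.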

## Troubleshooting

Common issues and solutions:
1. Out of Memory (OOM):
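   - Reduce `per_device_train_batch_size` (and raise `gradient_accumulation_steps` to keep the effective batch size)
   - Shorten `max_length` in the tokenization step
   - Use a smaller base model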

2. Training Instability:
   - Adjust learning rate
   - Modify LoRA configuration
   - Check data quality

3. Poor Generation Quality:
   - Increase training data
   - Adjust temperature and sampling parameters
   - Fine-tune for more epochs
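Temperature (0.7 in the evaluation cell) divides the logits before the softmax: values below 1 sharpen the distribution toward the most likely tokens, values above 1 flatten it. A minimal pure-Python illustration with made-up logits; it is not the transformers implementation, which also applies other logits processors during sampling:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```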

In [1]:
# %% [code]
# %% Setup: Install necessary libraries (if not already installed)
!pip install transformers bitsandbytes datasets gitpython huggingface_hub
!pip install --upgrade accelerate
!pip install git+https://github.com/huggingface/peft.git

Collecting git+https://github.com/huggingface/peft.git
  Cloning https://github.com/huggingface/peft.git to /tmp/pip-req-build-6cuklxjn
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-6cuklxjn
  Resolved https://github.com/huggingface/peft.git to commit 2825774d2de1c8bd0604ac685867edd79d608a9e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

In [2]:
# %% Import libraries
import os
import re
import glob
import torch
import pandas as pd

from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import Dataset
from peft import LoraConfig, get_peft_model

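# Monkey-patch the `.to()` method of PreTrainedModel to be a no-op.
# This prevents Accelerate from calling .to() on the 4-bit model.
# Caution: the patch is global, so `.to()` silently does nothing for every model
# in this session, including the base model reloaded in the evaluation cell.
from transformers import PreTrainedModel
PreTrainedModel.to = lambda self, *args, **kwargs: self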

In [3]:
# %% [markdown]
# ## 1. Load and Quantize the Base Mistral Model
#
# We load the Mistral model (here we use the "mistralai/mistral-7b-v0.1" model)
# in 4‑bit mode using BitsAndBytes. The model is loaded with our Hugging Face token.
#
# **Note:** Replace the token and model identifier as needed.

# %% Load the Mistral model and tokenizer with 4-bit quantization
model_name = "mistralai/mistral-7b-v0.1"  # Change if needed
my_token = ""  # Replace with your actual token

# Define the 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

# Load the tokenizer and set its pad token (we use the EOS token as the pad token).
# `token=my_token or None` falls back to a cached `huggingface-cli login` token if my_token is empty.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, token=my_token or None)
tokenizer.pad_token = tokenizer.eos_token

# Load the model in 4-bit mode.
# We do not pass device_map="auto" to avoid Accelerate dispatch (which calls .to()).
# `token` replaces the deprecated `use_auth_token` argument.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    token=my_token or None,
)

`low_cpu_mem_usage` was None, now set to True since model is quantized.



In [4]:
# Patch MistralRotaryEmbedding to fix device mismatches in rotary embeddings.
from transformers.models.mistral.modeling_mistral import MistralRotaryEmbedding

# Save the original forward method
original_forward = MistralRotaryEmbedding.forward

def patched_forward(self, x, position_ids, *args, **kwargs):
    # Get the device from input x.
    device = x.device
    # If the internal inverse frequency tensor exists, move it to the same device.
    if hasattr(self, "inv_freq"):
        self.inv_freq = self.inv_freq.to(device)
    # Also ensure that position_ids are moved to the device.
    position_ids = position_ids.to(device)
    # Call the original forward method, passing through any extra arguments
    # (the signature differs between transformers versions).
    return original_forward(self, x, position_ids, *args, **kwargs)

# Override the forward method with our patched version.
MistralRotaryEmbedding.forward = patched_forward

In [5]:
# %% [markdown]
# ## 2. Prepare the Model for Efficient Fine-Tuning using LoRA
#
# We use PEFT’s LoRA to fine-tune only a small subset of parameters.
# The target modules ("q_proj" and "v_proj") may be adjusted based on the model's implementation.

# %% Prepare the model for LoRA fine-tuning
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Adjust if necessary
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
print("LoRA parameters prepared. Number of trainable parameters:",
      sum(p.numel() for p in model.parameters() if p.requires_grad))

LoRA parameters prepared. Number of trainable parameters: 6815744
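The trainable-parameter count printed above can be reproduced by hand: each adapted linear layer mapping `d_in` to `d_out` features gains `r * (d_in + d_out)` parameters. The sketch assumes Mistral-7B's published configuration (hidden size 4096, 32 decoder layers, 8 key/value heads of dimension 128, so `v_proj` maps 4096 to 1024 features); `lora_param_count` is a hypothetical helper:

```python
def lora_param_count(r, layer_shapes, num_layers):
    """Trainable LoRA parameters: A is (r x d_in) and B is (d_out x r) for every adapted layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_layer * num_layers


# Assumed Mistral-7B dimensions (taken from the model's config, not read at runtime here).
hidden_size = 4096
num_kv_heads, head_dim = 8, 128
shapes = [
    (hidden_size, hidden_size),              # q_proj: 4096 -> 4096
    (hidden_size, num_kv_heads * head_dim),  # v_proj: 4096 -> 1024 (grouped-query attention)
]
print(lora_param_count(r=16, layer_shapes=shapes, num_layers=32))  # 6815744
```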




In [6]:
# %% [markdown]
# ## 3. Data Collection, Cleansing, and Transformation
#
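# We collect technical documentation from four GitHub repositories:
# - PyTorch (`docs/` folder)
# - TensorFlow Docs (Markdown files at the root and in subdirectories)
# - scikit-learn (`doc/` folder)
# - pandas (`doc/` folder)
#
# Only Markdown (`*.md`) files are collected. PyTorch, scikit-learn, and pandas write most
# of their documentation in reStructuredText (`.rst`), so they contribute very few files.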

# %% Define GitHub repositories to clone and process
github_repos = [
    {"name": "pytorch", "url": "https://github.com/pytorch/pytorch.git", "docs_path": "docs"},
    {"name": "tensorflow_docs", "url": "https://github.com/tensorflow/docs.git", "docs_path": ""},
    {"name": "scikit-learn", "url": "https://github.com/scikit-learn/scikit-learn.git", "docs_path": "doc"},
    {"name": "pandas", "url": "https://github.com/pandas-dev/pandas.git", "docs_path": "doc"},
    # Add more repositories as needed...
]

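import subprocess

data_texts = []

for repo in github_repos:
    repo_dir = repo["name"]
    if not os.path.exists(repo_dir):
        print(f"Cloning repository {repo['url']} ...")
        # Shallow clone (--depth 1): only the current docs are needed, not the full history.
        subprocess.run(["git", "clone", "--depth", "1", repo["url"], repo_dir], check=True)
    else:
        print(f"Repository {repo['name']} already cloned.")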

    # Use the docs_path if provided; otherwise search the repo root.
    search_dir = os.path.join(repo_dir, repo["docs_path"]) if repo["docs_path"] else repo_dir

    # Recursively find Markdown files (*.md)
    md_files = glob.glob(os.path.join(search_dir, "**/*.md"), recursive=True)
    print(f"Found {len(md_files)} markdown files in {repo['name']}.")

    for file_path in md_files:
        try:
            with open(file_path, "r", encoding="utf-8") as f:
                text = f.read()
                # Remove code blocks (content between triple backticks)
                text = re.sub(r'```.*?```', '', text, flags=re.DOTALL)
                # Remove extra whitespace
                text = re.sub(r'\s+', ' ', text)
                data_texts.append(text.strip())
        except Exception as e:
            print(f"Error reading {file_path}: {e}")

print(f"\nTotal collected and cleaned markdown documents: {len(data_texts)}")

# %% Create a Dataset from the collected texts
df = pd.DataFrame({"text": data_texts})
dataset = Dataset.from_pandas(df)

Cloning into 'pytorch'...


Cloning repository https://github.com/pytorch/pytorch.git ...


Updating files: 100% (18367/18367), done.


Found 1 markdown files in pytorch.
Repository tensorflow_docs already cloned.
Found 112 markdown files in tensorflow_docs.
Cloning repository https://github.com/scikit-learn/scikit-learn.git ...


Cloning into 'scikit-learn'...


Found 3 markdown files in scikit-learn.
Cloning repository https://github.com/pandas-dev/pandas.git ...


Cloning into 'pandas'...


Found 1 markdown files in pandas.

Total collected and cleaned markdown documents: 117
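The counts above show that a `*.md` glob finds only a handful of files in PyTorch, scikit-learn, and pandas. A minimal sketch of a collector that also picks up reStructuredText files, assuming you want to include them; `find_doc_files` is a hypothetical helper, and reST markup (directives such as `.. code-block::`) would need its own cleaning rules rather than the triple-backtick regex used above:

```python
from pathlib import Path


def find_doc_files(root, extensions=(".md", ".rst")):
    """Recursively list documentation files under `root` whose suffix is in `extensions`."""
    root = Path(root)
    if not root.is_dir():
        return []
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # Example: compare .md and .rst counts in the scikit-learn checkout.
    files = find_doc_files("scikit-learn/doc")
    n_md = sum(f.lower().endswith(".md") for f in files)
    print(n_md, "md,", len(files) - n_md, "rst")
```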


In [7]:
# %% [markdown]
# ## 4. Tokenization and Dataset Preparation
#
# We tokenize the documents, truncating them to 512 tokens.
# (For larger documents, consider using a sliding window approach.)

# %% Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])
print("Tokenization complete.")

# Split the dataset into train (90%) and evaluation (10%) sets.
split_dataset = tokenized_dataset.train_test_split(test_size=0.1, seed=42)  # fixed seed for a reproducible split
print("Train/evaluation split complete.")

# Create a data collator for causal language modeling (no masking).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)


Tokenization complete.
Train/evaluation split complete.
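The note in this cell suggests a sliding window for documents longer than 512 tokens, since truncation discards everything past the limit. A minimal sketch over a plain list of token IDs, assuming windows of `max_length` tokens that overlap by `overlap` tokens; `sliding_windows` is a hypothetical helper. Fast Hugging Face tokenizers can produce similar overlapping chunks directly via `return_overflowing_tokens=True` and `stride=...`.

```python
def sliding_windows(token_ids, max_length=512, overlap=64):
    """Split a token-ID list into chunks of at most `max_length`, each sharing `overlap`
    tokens with the previous chunk, so no part of a long document is dropped."""
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must satisfy 0 <= overlap < max_length")
    step = max_length - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
    return windows


print([len(w) for w in sliding_windows(list(range(1200)), max_length=512, overlap=64)])  # [512, 512, 304]
```

Used inside a `batched=True` `Dataset.map` that removes the original columns, a function like this may return more rows than it receives, which is what allows one long document to become several training examples.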


In [8]:
# %% [markdown]
# ## 5. Fine-Tuning the Model
#
# We define training arguments and initialize the Hugging Face Trainer.

# %% Define training arguments
training_args = TrainingArguments(
    output_dir="./mistral_finetuned",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    evaluation_strategy="epoch",  # named `eval_strategy` in newer transformers releases
    num_train_epochs=3,
    learning_rate=2e-4,
    weight_decay=0.01,
    save_total_limit=2,
    fp16=True,  # Mixed-precision training (CUDA environment)
    logging_steps=50,
    report_to="none",
)

# %% Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split_dataset["train"],
    eval_dataset=split_dataset["test"],
    data_collator=data_collator,
)

# %% Start fine-tuning
print("Starting fine-tuning...")
trainer.train()

Starting fine-tuning...




| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | No log | 1.480753 |
| 2 | 1.486300 | 1.469751 |
| 3 | 1.486300 | 1.488375 |


We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


TrainOutput(global_step=81, training_loss=1.4175228071801456, metrics={'train_runtime': 92.6311, 'train_samples_per_second': 3.401, 'train_steps_per_second': 0.874, 'total_flos': 6852848088268800.0, 'train_loss': 1.4175228071801456, 'epoch': 3.0})
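The `global_step=81` above, and the "No log" entry for epoch 1, follow from the dataset size and training arguments. A sketch of the arithmetic, assuming `train_test_split(test_size=0.1)` puts `ceil(0.1 * n)` examples in the test split and the training dataloader keeps the final partial batch; `steps_per_epoch` is a hypothetical helper that roughly mirrors how `Trainer` rounds when gradient accumulation is used:

```python
import math


def steps_per_epoch(n_train, per_device_batch_size, grad_accum_steps=1, num_devices=1):
    """Optimizer steps per epoch: batches per epoch divided by accumulation steps (at least 1)."""
    batches = math.ceil(n_train / (per_device_batch_size * num_devices))
    return max(batches // grad_accum_steps, 1)


n_docs = 117                        # documents collected in Cell 6
n_test = math.ceil(0.1 * n_docs)    # 12 evaluation examples
n_train = n_docs - n_test           # 105 training examples
per_epoch = steps_per_epoch(n_train, per_device_batch_size=4)
print(n_train, per_epoch, per_epoch * 3)  # 105 27 81
```

With `logging_steps=50`, the first training-loss log happens at step 50, which falls in epoch 2 (steps 28 to 54). Epoch 1 therefore has no logged training loss, and epochs 2 and 3 both show the value logged at step 50.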

In [9]:
# %% [markdown]
# ## 6. Evaluation: Comparing the Fine-Tuned Model with the Base Model
#
# We compare the outputs of the fine-tuned model and the base model on a technical documentation prompt.

# %% Define a helper function to generate text from a prompt
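def generate_text(model, prompt, max_new_tokens=200):
    # Switch to inference mode (disables dropout, including the LoRA dropout left on by training).
    model.eval()
    # Get the device from the model's parameters.
    device = next(model.parameters()).device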
    
    # Tokenize the prompt.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Move each tensor to the device.
    inputs = {key: tensor.to(device) for key, tensor in inputs.items()}
    
    # Generate text using the model.
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode and return the generated tokens.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompt = (
    """Generate detailed technical documentation for a Python function that implements a 2D convolution operation for image processing.
    
The documentation should include:
- **Problem Statement:** A clear description of the purpose of the function.
- **Algorithm Explanation:** A detailed explanation of how the 2D convolution is performed, including discussion on kernel size, padding, stride, and edge handling.
- **Input/Output Specifications:** Information on the expected types and shapes of inputs (e.g., image matrices, kernel arrays) and the output.
- **Code Examples:** Illustrative code examples that include comments explaining each part of the code.
- **Performance Analysis:** A brief discussion on the computational complexity and any optimization considerations.

Provide the answer in markdown format with proper sections and code blocks."""
)
# Load the base model (without fine-tuning adaptations) for comparison.
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    token=my_token or None,
)

print("### Base Model Output ###\n")
print(generate_text(base_model, prompt))

print("\n### Fine-Tuned Model Output ###\n")
print(generate_text(model, prompt))

# %% [markdown]
# ## Conclusion
#
# In this notebook, we:
# - Loaded a Mistral model in 4‑bit mode using BitsAndBytes on a CUDA environment.
# - Prepared the model for efficient fine‑tuning using LoRA.
# - Collected and cleaned technical documentation data from GitHub.
# - Tokenized and prepared a dataset.
# - Fine-tuned the model on the collected data.
# - Compared the outputs from the fine-tuned model and the original base model.
#
# Experiment further with additional repositories, data cleansing methods, and hyperparameter settings to tailor the model to your specific needs.

`low_cpu_mem_usage` was None, now set to True since model is quantized.



### Base Model Output ###

Generate detailed technical documentation for a Python function that implements a 2D convolution operation for image processing.
    
The documentation should include:
- **Problem Statement:** A clear description of the purpose of the function.
- **Algorithm Explanation:** A detailed explanation of how the 2D convolution is performed, including discussion on kernel size, padding, stride, and edge handling.
- **Input/Output Specifications:** Information on the expected types and shapes of inputs (e.g., image matrices, kernel arrays) and the output.
- **Code Examples:** Illustrative code examples that include comments explaining each part of the code.
- **Performance Analysis:** A brief discussion on the computational complexity and any optimization considerations.

Provide the answer in markdown format with proper sections and code blocks.

## The Code:

```python

import numpy as np

def conv2d(input_image, kernel):
    # Preprocessing steps
    padded_input 