# Fine-tuning PLLuM 8B for Function Calling

This notebook fine-tunes the Polish language model [CYFRAGOVPL/Llama-PLLuM-8B-instruct](https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct) for function calling, using a dataset of examples in both Polish and English.

We use the following techniques:
- **QLoRA** (Quantized Low-Rank Adaptation) with 4-bit quantization for memory efficiency
- **Unsloth** framework for optimized training speed
- Mixed dataset with both Polish and English examples

The fine-tuning adapts the model to understand the specific format of function calling requests and to generate proper JSON responses.
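
To make the target format concrete, here is a minimal sketch of one training record and a check that a generated response is well-formed. The field names (`query`, `tools`, `answers`, `name`, `description`, `arguments`) follow the dataset examined later in this notebook. The weather tool, the sample values, and the `is_valid_call` helper are hypothetical, and the check assumes `arguments` is a JSON object.

```python
import json

# Hypothetical record; only the field names mirror the dataset in this notebook.
record = {
    "query": "Jaka jest pogoda w Warszawie?",
    "tools": [
        {"name": "get_weather", "description": "Return the current weather for a city."}
    ],
    "answers": [{"name": "get_weather", "arguments": {"city": "Warszawa"}}],
}


def is_valid_call(raw_output, tools):
    """Return True if raw_output is a JSON list of calls that only use offered tools."""
    try:
        calls = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(calls, list):
        return False
    tool_names = {tool["name"] for tool in tools}
    return all(
        isinstance(call, dict)
        and call.get("name") in tool_names
        and isinstance(call.get("arguments"), dict)
        for call in calls
    )


# Stand-in for a model response: the expected answer serialized as JSON.
model_output = json.dumps(record["answers"], ensure_ascii=False)
print(is_valid_call(model_output, record["tools"]))
print(is_valid_call('[{"name": "unknown_tool", "arguments": {}}]', record["tools"]))
```

A check like this only confirms the shape of the output and the tool name; it says nothing about whether the arguments are correct for the query.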

## Setup and Imports

In [None]:
# Install/update dependencies if needed
!pip install -q -U unsloth bitsandbytes sentencepiece python-dotenv

In [None]:
import os
import json
import torch
import random
import numpy as np
from pathlib import Path
from dotenv import load_dotenv
from datetime import datetime

# Import our fine-tuning utilities
from src.fine_tuning import (
    PLLuMFineTuningConfig,
    setup_model_and_tokenizer,
    prepare_dataset,
    train_model,
    format_function_calling_prompt,
    generate_function_call,
)
from src.auth import login  # For Hugging Face authentication
from src.dataset import parse_json_entry

# Set a random seed for reproducibility
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)

# Load environment variables
load_dotenv()

# Authenticate with Hugging Face
login()

## Check GPU availability

Make sure we have a GPU available for training. This notebook is designed for use with an NVIDIA RTX 4060.
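
The rough arithmetic below illustrates why 4-bit quantization matters on a consumer GPU. It is a back-of-the-envelope sketch, assuming about 8 billion parameters and counting only the weights; activations, LoRA adapters, optimizer state, and quantization constants all add to the real footprint. The `weight_memory_gb` helper is hypothetical, and it uses the same 1 GB = 1e9 bytes convention as the check in the next cell.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, only the 4-bit weights leave meaningful headroom on a card with 8 GB of memory.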

In [None]:
# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    
    # Print CUDA version
    print(f"CUDA Version: {torch.version.cuda}")
else:
    print("WARNING: No GPU detected. Fine-tuning will be very slow without a GPU.")

## Load and Examine the Dataset

We'll load the translated dataset and examine it to understand its structure.
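
Before training, it can help to confirm that every record has the expected keys. The sketch below assumes the dataset is a JSON list of dictionaries with `query`, `tools`, and `answers` keys, as the cells below suggest; the `summarize_dataset` helper and the two sample records are hypothetical.

```python
from collections import Counter

REQUIRED_KEYS = ("query", "tools", "answers")


def summarize_dataset(records):
    """Count records, missing keys, and how many tools each record offers."""
    missing = Counter()
    tools_per_record = Counter()
    for record in records:
        for key in REQUIRED_KEYS:
            if key not in record:
                missing[key] += 1
        tools_per_record[len(record.get("tools", []))] += 1
    return {
        "num_records": len(records),
        "missing_keys": dict(missing),
        "tools_per_record": dict(tools_per_record),
    }


# Hypothetical sample; the second record deliberately lacks "answers".
sample = [
    {"query": "What is 2 + 2?", "tools": [{"name": "add"}], "answers": []},
    {"query": "Ile to 2 + 2?", "tools": [{"name": "add"}, {"name": "sub"}]},
]
print(summarize_dataset(sample))
```

Once the real dataset is loaded, the same function can be called on it directly.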

In [None]:
# Path to the translated dataset
DATASET_PATH = "../data/translated_dataset.json"

# Check if the dataset exists
if not os.path.exists(DATASET_PATH):
    raise FileNotFoundError(f"Dataset not found at {DATASET_PATH}. Please run create_translated_dataset.ipynb first.")

# Load the dataset
with open(DATASET_PATH, 'r', encoding='utf-8') as f:
    dataset = json.load(f)

print(f"Dataset loaded with {len(dataset)} examples.")

In [None]:
# Examine a few examples
def print_example(example, idx=0):
    print(f"Example {idx}:")
    print(f"Query: {example['query']}")
    print(f"Tools: {len(example['tools'])} available")
    
    # Show the first tool
    if len(example['tools']) > 0:
        print(f"First tool: {example['tools'][0]['name']}")
        print(f"Description: {example['tools'][0]['description']}")
        
    print(f"Answers: {len(example['answers'])} answers")
    
    # Show the first answer
    if len(example['answers']) > 0:
        print(f"First answer uses tool: {example['answers'][0]['name']}")
        print(f"With arguments: {example['answers'][0]['arguments']}")
    
    print("\n")

# Print a few examples
for i in range(min(3, len(dataset))):
    print_example(dataset[i], i)

In [None]:
# Let's check how the formatted prompt will look
example_prompt = format_function_calling_prompt(dataset[0])
print(example_prompt)

## Configure Fine-tuning Parameters

Set up the fine-tuning configuration with parameters chosen to fit an RTX 4060 GPU.
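
The batch size, gradient accumulation, epoch, and warmup settings together determine how many optimizer steps the run takes. The sketch below works through that arithmetic with the values used in this notebook's configuration (per-device batch size 4, gradient accumulation 2, 3 epochs, warmup ratio 0.03). It assumes a single GPU and a hypothetical dataset of 10,000 examples, and the rounding may differ slightly from what the trainer actually does.

```python
import math


def training_schedule(num_examples, batch_size, grad_accum, epochs, warmup_ratio):
    """Approximate optimizer-step counts for a single-GPU run."""
    effective_batch = batch_size * grad_accum  # examples per optimizer step
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(
    num_examples=10_000,  # assumption: placeholder dataset size
    batch_size=4,
    grad_accum=2,
    epochs=3,
    warmup_ratio=0.03,
)
print(schedule)
```

Comparing `total_steps` with `save_steps` and `logging_steps` gives a quick sense of how many checkpoints and log lines to expect.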

In [None]:
# Create a timestamped output directory
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
MODEL_OUTPUT_DIR = f"../models/pllum-function-calling-{timestamp}"

# Create the output directory
os.makedirs(MODEL_OUTPUT_DIR, exist_ok=True)

# Configure fine-tuning parameters
config = PLLuMFineTuningConfig(
    model_name_or_path="CYFRAGOVPL/Llama-PLLuM-8B-instruct",
    output_dir=MODEL_OUTPUT_DIR,
    
    # QLoRA settings
    lora_r=16,  # LoRA rank
    lora_alpha=32,  # LoRA alpha
    lora_dropout=0.05,
    use_4bit=True,  # Use 4-bit quantization for memory efficiency
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_quant_type="nf4",  # Normal Float 4-bit quantization
    use_nested_quant=False,
    
    # Training parameters
    num_train_epochs=3,
    per_device_train_batch_size=4,  # Adjust based on GPU memory
    gradient_accumulation_steps=2,  # Increase effective batch size
    learning_rate=2e-4,
    weight_decay=0.01,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    
    # Logging & Saving
    logging_steps=10,
    save_steps=200,
    save_total_limit=3,
    
    # Dataset parameters
    max_seq_length=1024,  # Maximum sequence length
    dataset_path=DATASET_PATH,
)

# Save the configuration to the model directory for future reference
config_dict = {k: str(v) if isinstance(v, Path) else v for k, v in vars(config).items()}
with open(os.path.join(MODEL_OUTPUT_DIR, "config.json"), 'w', encoding='utf-8') as f:
    json.dump(config_dict, f, indent=2)

## Load Model and Tokenizer

Set up the model with QLoRA and Unsloth optimizations.
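
To give a sense of how few weights LoRA actually trains, the sketch below counts the parameters added to a single projection matrix at rank 16 and computes the standard LoRA scaling factor `alpha / r`. The 4096 x 4096 layer shape is an illustrative assumption, not a value read from the model, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


D_IN = D_OUT = 4096  # assumption: illustrative projection size
RANK, ALPHA = 16, 32  # the lora_r and lora_alpha values used in this notebook

full = D_IN * D_OUT
added = lora_param_count(D_IN, D_OUT, RANK)
print(f"Frozen weights: {full:,}")
print(f"LoRA weights:   {added:,} ({added / full:.2%} of the frozen matrix)")
print(f"Scaling factor alpha / r = {ALPHA / RANK}")
```

The frozen matrix stays quantized, and only the small adapter matrices receive gradients, which is what keeps memory use low.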

In [None]:
# Load model and tokenizer
print("Loading model and tokenizer...")
model, tokenizer = setup_model_and_tokenizer(config)
print("Model and tokenizer loaded successfully.")

## Prepare the Dataset for Training

In [None]:
# Prepare the dataset
print("Preparing dataset...")
train_dataset = prepare_dataset(
    dataset_path=config.dataset_path,
    tokenizer=tokenizer,
    max_length=config.max_seq_length
)
print(f"Dataset prepared with {len(train_dataset['input_ids'])} examples.")

## Fine-tune the Model

This is the main training process. It may take several hours, depending on your hardware.

In [None]:
# Run the training
print(f"Starting fine-tuning process. Model will be saved to {config.output_dir}")
print("This may take several hours depending on your hardware.")

# Start training
trained_model = train_model(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    config=config
)

print("Fine-tuning completed successfully!")

## Test the Fine-tuned Model

Let's test our model with a few examples from the dataset.
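
Eyeballing the printed output works for a handful of examples, but a simple exact-match check makes comparisons repeatable. The sketch below is one possible way to compare expected and generated calls while ignoring call order and argument key order. The `normalize_call` and `calls_match` helpers are hypothetical, and the code assumes `arguments` is either a dictionary or a valid JSON string.

```python
import json


def normalize_call(call):
    """Return (name, canonical arguments) so calls compare regardless of key order."""
    arguments = call.get("arguments", {})
    if isinstance(arguments, str):  # assumption: arguments may be stored as a JSON string
        arguments = json.loads(arguments)
    canonical = json.dumps(arguments, sort_keys=True, ensure_ascii=False)
    return str(call.get("name", "")), canonical


def calls_match(expected, generated):
    """Exact match on the collection of calls, ignoring call and key order."""
    if not isinstance(generated, list) or not all(isinstance(c, dict) for c in generated):
        return False
    return sorted(map(normalize_call, expected)) == sorted(map(normalize_call, generated))


# Hypothetical calls for illustration.
expected = [{"name": "get_weather", "arguments": {"city": "Warszawa", "unit": "C"}}]
same_call = [{"name": "get_weather", "arguments": {"unit": "C", "city": "Warszawa"}}]
wrong_city = [{"name": "get_weather", "arguments": {"city": "Kraków", "unit": "C"}}]

print(calls_match(expected, same_call))
print(calls_match(expected, wrong_city))
```

Exact match is a strict metric: a call with a correct tool name but a slightly different argument value counts as a miss, so it is best read alongside the printed examples.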

In [None]:
# Test the model with examples from the dataset
def test_model_with_example(example_idx=0):
    example = dataset[example_idx]
    
    query = example['query']
    tools = example['tools']
    expected_answers = example['answers']
    
    print(f"Query: {query}")
    print("\nAvailable tools:")
    for i, tool in enumerate(tools):
        print(f"{i+1}. {tool['name']}: {tool['description']}")
    
    print("\nExpected answer:")
    print(json.dumps(expected_answers, indent=2, ensure_ascii=False))
    
    print("\nGenerating function call...")
    generated = generate_function_call(
        model=trained_model,
        tokenizer=tokenizer,
        query=query,
        tools=tools,
        temperature=0.1
    )
    
    print("\nGenerated answer:")
    print(json.dumps(generated, indent=2, ensure_ascii=False))
    
    return generated

# Test with a few examples (guard against datasets with fewer than 3 entries)
for i in range(min(3, len(dataset))):
    print(f"\n--- Example {i} ---")
    generated = test_model_with_example(i)
    print("\n" + "-"*50)

## Save a Final Model Summary

Let's create a summary file with information about the fine-tuning process.

In [None]:
# Create a summary file
summary = {
    "model_name": config.model_name_or_path,
    "fine_tuned_model_path": config.output_dir,
    "training_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "dataset": {
        "path": config.dataset_path,
        "num_examples": len(train_dataset['input_ids']),
    },
    "training_parameters": {
        "epochs": config.num_train_epochs,
        "batch_size": config.per_device_train_batch_size,
        "learning_rate": config.learning_rate,
        "lora_r": config.lora_r,
        "lora_alpha": config.lora_alpha,
        "max_seq_length": config.max_seq_length,
    },
    "hardware": {
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None",
        "cuda_version": torch.version.cuda if torch.cuda.is_available() else "None",
    },
}

# Save the summary
with open(os.path.join(config.output_dir, "training_summary.json"), 'w', encoding='utf-8') as f:
    json.dump(summary, f, indent=2, ensure_ascii=False)

print(f"Training summary saved to {os.path.join(config.output_dir, 'training_summary.json')}")

## Conclusion

The PLLuM 8B model has now been fine-tuned for function calling using QLoRA, with the Unsloth framework handling training optimizations. The resulting model is intended to parse queries and generate the corresponding function calls in both Polish and English.

To use the fine-tuned model in your applications, check the `test_model.ipynb` notebook for examples of how to load and integrate the model into your pipeline.