<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-tbvLOFetmNkYmev7Ie55TSe035_Pmdr?usp=sharing)

## Master Generative AI in 4 Weeks
**What You'll Learn:**
- Build with Latest LLMs
- Create Custom AI Apps
- Learn from Industry Experts
- Join Innovation Community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

[Start Your Journey](https://www.buildfastwithai.com/genai-course)

*Empowering the Next Generation of AI Innovators*

# Fine-tune Open-Source LLMs on <a href="https://tokenfactory.nebius.com/"><img src="https://mintcdn.com/nebius-723e8b65/jsgY7B_gdaTjMC6y/logo/Main-logo-TF-Dark.svg?fit=max&auto=format&n=jsgY7B_gdaTjMC6y&q=85&s=92ebc07d32d93f3918de2f7ec4a0754a" width="200"></a>

## Overview

**Nebius Token Factory** is a powerful platform for fine-tuning large language models (LLMs) with LoRA adapters, enabling you to customize models for domain-specific tasks efficiently and cost-effectively.

### Why Fine-Tuning?

While general-purpose LLMs like Llama, Qwen, and DeepSeek are versatile, they often struggle with:
- Domain-specific terminology and context
- Highly structured or formatted outputs
- Specialized task requirements
- Custom function calling patterns

**Fine-tuning solves these challenges by:**
- Adapting models to your specific use case
- Improving accuracy on specialized tasks
- Reducing inference costs through smaller, focused models
- Maintaining model quality while customizing behavior

### What You'll Learn

In this notebook, you'll learn to:
1. Prepare datasets for fine-tuning
2. Configure LoRA hyperparameters
3. Launch fine-tuning jobs on Nebius Token Factory
4. Deploy custom LoRA adapters
5. Evaluate fine-tuned model performance

### About Nebius Token Factory

**Nebius Token Factory** provides:
- **Cost-Efficient Training** - LoRA adapters reduce training costs significantly
- **Fast Deployment** - Deploy serverless LoRA models with per-token billing
- **Multiple Models** - Support for Llama, Qwen, Mistral, and more
- **W&B Integration** - Built-in Weights & Biases tracking
- **OpenAI-Compatible API** - Easy integration with existing code

---

**Built with ❤️ by BuildFastWithAI | Powered by Nebius Token Factory**

## 📋 Table of Contents

1. [Setup & Installation](#setup)
2. [Dataset Preparation](#dataset)
3. [LoRA Fine-Tuning Configuration](#lora-config)
4. [Launch Fine-Tuning Job](#fine-tuning)
5. [Monitor Training Progress](#monitoring)
6. [Deploy Custom Model](#deployment)
7. [Inference & Evaluation](#inference)
8. [Production Best Practices](#best-practices)

<a id="setup"></a>
## 1. Setup & Installation 🔧

First, let's install the required dependencies and set up authentication.

In [1]:
# Install required packages
%pip install openai python-dotenv datasets pandas numpy tqdm -q

### Import Libraries

In [2]:
import os
import json
import time
import pandas as pd
import numpy as np
from typing import List, Dict, Any
from openai import OpenAI
from datasets import load_dataset, Dataset
from tqdm import tqdm
from dotenv import load_dotenv

### Configure API Access

To use Nebius Token Factory, you need an API key:

1. Sign up at [Nebius Token Factory](https://tokenfactory.nebius.com/)
2. Generate an API key from your dashboard
3. Set it as an environment variable

**For Colab:** Use the Colab secrets manager, or set the key directly in the cell below.

In [4]:
# Option 1: Using Colab userdata (recommended for Colab)
try:
    from google.colab import userdata
    NEBIUS_API_KEY = userdata.get('NEBIUS_API_KEY')
except Exception:
    # Not running in Colab, or the secret is missing or not accessible.
    # Option 2: Using environment variables (optionally loaded from a .env file)
    load_dotenv()
    NEBIUS_API_KEY = os.getenv('NEBIUS_API_KEY')

# Option 3: Set directly (not recommended for production)
if not NEBIUS_API_KEY:
    NEBIUS_API_KEY = "your_api_key_here"  # Replace with your actual API key

# Initialize Nebius Token Factory client
client = OpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1/",
    api_key=NEBIUS_API_KEY
)

print("✅ Nebius Token Factory client initialized successfully!")

✅ Nebius Token Factory client initialized successfully!


<a id="dataset"></a>
## 2. Dataset Preparation 📊

### Dataset Format Requirements

Nebius Token Factory expects datasets in **JSONL format** with the following structure:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI stands for Artificial Intelligence..."}
  ]
}
```
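Before uploading, it can help to check each record against the structure above. The following is a minimal sketch, not the platform's own validation: the allowed role names and the rule that the last message must come from the assistant are assumptions drawn from the example record, and `validate_example` and `validate_jsonl_lines` are hypothetical helper names.

```python
import json

# Assumption: only these three roles appear in training data.
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_example(example) -> list:
    """Return a list of problems found in one chat-format record (empty if none)."""
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # Assumption: the final message is the target the model should learn to produce.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems

def validate_jsonl_lines(lines) -> dict:
    """Map 1-based line numbers to their problems; valid lines are omitted."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[lineno] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_example(record)
        if problems:
            report[lineno] = problems
    return report
```

To check a file on disk, pass the open file object directly, for example `validate_jsonl_lines(open("fine_tuning_train.jsonl", encoding="utf-8"))`; an empty dictionary means no problems were found by these particular checks.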

### Best Practices

- **Training Set:** 80-90% of your data (minimum 10 examples)
- **Validation Set:** 10-20% of your data (optional but recommended)
- **Quality over Quantity:** Well-crafted examples are better than many poor ones
- **Diverse Examples:** Cover edge cases and variations in your domain

### Example 1: Function Calling Dataset

Let's create a dataset for function calling - a common use case for fine-tuning.

In [5]:
def create_function_calling_dataset():
    """
    Create a sample dataset for function calling fine-tuning.
    This teaches the model to properly format function calls.
    """

    training_data = [
        {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant with access to weather and calculation functions."},
                {"role": "user", "content": "What's the weather like in New York?"},
                {"role": "assistant", "content": '{"function": "get_weather", "arguments": {"location": "New York", "unit": "celsius"}}'}
            ]
        },
        {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant with access to weather and calculation functions."},
                {"role": "user", "content": "Calculate 25 times 4"},
                {"role": "assistant", "content": '{"function": "calculate", "arguments": {"operation": "multiply", "numbers": [25, 4]}}'}
            ]
        },
        {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant with access to weather and calculation functions."},
                {"role": "user", "content": "Is it going to rain in London tomorrow?"},
                {"role": "assistant", "content": '{"function": "get_weather", "arguments": {"location": "London", "unit": "celsius", "forecast": "tomorrow"}}'}
            ]
        },
        {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant with access to weather and calculation functions."},
                {"role": "user", "content": "What's 100 divided by 5?"},
                {"role": "assistant", "content": '{"function": "calculate", "arguments": {"operation": "divide", "numbers": [100, 5]}}'}
            ]
        },
        {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant with access to weather and calculation functions."},
                {"role": "user", "content": "Tell me the temperature in Tokyo"},
                {"role": "assistant", "content": '{"function": "get_weather", "arguments": {"location": "Tokyo", "unit": "celsius"}}'}
            ]
        }
    ]

    return training_data

# Create training dataset
train_data = create_function_calling_dataset()
print(f"Created {len(train_data)} training examples")
print("\nSample example:")
print(json.dumps(train_data[0], indent=2))

Created 5 training examples

Sample example:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant with access to weather and calculation functions."
    },
    {
      "role": "user",
      "content": "What's the weather like in New York?"
    },
    {
      "role": "assistant",
      "content": "{\"function\": \"get_weather\", \"arguments\": {\"location\": \"New York\", \"unit\": \"celsius\"}}"
    }
  ]
}
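To show how an application might consume the JSON strings the assistant is trained to produce, here is a minimal dispatcher sketch. The `get_weather` and `calculate` implementations are hypothetical stubs written for this illustration; the dataset only defines their names and argument shapes.

```python
import json

def get_weather(location: str, unit: str = "celsius", forecast: str = "now") -> str:
    # Hypothetical stub: a real implementation would query a weather service.
    return f"[stub] weather for {location} ({forecast}) in {unit}"

def calculate(operation: str, numbers: list):
    # Hypothetical stub covering the two operations used in the sample dataset.
    a, b = numbers
    if operation == "multiply":
        return a * b
    if operation == "divide":
        return a / b
    raise ValueError(f"unsupported operation: {operation}")

# Registry mapping the "function" field to a Python callable.
FUNCTIONS = {"get_weather": get_weather, "calculate": calculate}

def dispatch(model_output: str):
    """Parse the assistant's JSON string and call the matching Python function."""
    call = json.loads(model_output)
    func = FUNCTIONS.get(call.get("function"))
    if func is None:
        raise ValueError(f"unknown function: {call.get('function')!r}")
    return func(**call.get("arguments", {}))

result = dispatch('{"function": "calculate", "arguments": {"operation": "multiply", "numbers": [25, 4]}}')
print(result)  # prints 100
```

Keeping the registry explicit means the model can only trigger functions you have listed, which is a reasonable default when the call text comes from a model.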


### Example 2: Domain-Specific Knowledge Dataset

Create a dataset for a specific domain (e.g., medical, legal, or technical).

In [6]:
def create_medical_qa_dataset():
    """
    Create a sample medical Q&A dataset.
    Note: This is for demonstration only. Use verified medical data in production.
    """

    training_data = [
        {
            "messages": [
                {"role": "system", "content": "You are a medical information assistant. Provide accurate, evidence-based information."},
                {"role": "user", "content": "What are the common symptoms of type 2 diabetes?"},
                {"role": "assistant", "content": "Common symptoms of type 2 diabetes include: increased thirst and frequent urination, increased hunger, unintended weight loss, fatigue, blurred vision, slow-healing sores, and frequent infections. However, many people with type 2 diabetes have no symptoms initially."}
            ]
        },
        {
            "messages": [
                {"role": "system", "content": "You are a medical information assistant. Provide accurate, evidence-based information."},
                {"role": "user", "content": "How is hypertension diagnosed?"},
                {"role": "assistant", "content": "Hypertension is diagnosed through blood pressure measurements. A diagnosis typically requires: multiple readings on different occasions, readings consistently at or above 140/90 mmHg, and ruling out white coat hypertension through ambulatory monitoring if necessary."}
            ]
        }
    ]

    return training_data

# You can create domain-specific datasets for your use case
# medical_data = create_medical_qa_dataset()

### Save Dataset to JSONL Format

In [7]:
def save_to_jsonl(data: List[Dict], filename: str):
    """
    Save dataset to JSONL format required by Nebius Token Factory.

    Args:
        data: List of conversation examples
        filename: Output filename
    """
    with open(filename, 'w', encoding='utf-8') as f:
        for example in data:
            f.write(json.dumps(example, ensure_ascii=False) + '\n')
    print(f"✅ Saved {len(data)} examples to {filename}")

# Split data into train and validation (80-20 split)
split_idx = int(len(train_data) * 0.8)
train_set = train_data[:split_idx]
val_set = train_data[split_idx:]

# Save datasets
save_to_jsonl(train_set, "fine_tuning_train.jsonl")
save_to_jsonl(val_set, "fine_tuning_validation.jsonl")

print(f"\n📊 Dataset Split:")
print(f"Training examples: {len(train_set)}")
print(f"Validation examples: {len(val_set)}")

✅ Saved 4 examples to fine_tuning_train.jsonl
✅ Saved 1 examples to fine_tuning_validation.jsonl

📊 Dataset Split:
Training examples: 4
Validation examples: 1
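The slice-based split above preserves the original order, so the validation set is whatever happens to sit at the end of the list. A shuffled split with a fixed seed is a common alternative; the sketch below is one way to do it, with `split_dataset` being a hypothetical helper rather than part of any library.

```python
import random

def split_dataset(examples: list, val_fraction: float = 0.2, seed: int = 42):
    """Shuffle a copy of the data, then split it into (train, validation) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")

    shuffled = list(examples)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # a seeded generator keeps the split reproducible

    # Keep at least one example on each side of the split.
    n_val = min(len(shuffled) - 1, max(1, round(len(shuffled) * val_fraction)))
    return shuffled[n_val:], shuffled[:n_val]
```

With the five sample examples, `split_dataset(train_data)` still yields four training examples and one validation example, but which example lands in validation no longer depends on list order.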


### Upload Datasets to Nebius Token Factory

In [8]:
# Upload training dataset (the context manager closes the file after upload)
print("Uploading training dataset...")
with open("fine_tuning_train.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")
print(f"✅ Training file uploaded: {training_file.id}")

# Upload validation dataset (optional but recommended)
print("\nUploading validation dataset...")
with open("fine_tuning_validation.jsonl", "rb") as f:
    validation_file = client.files.create(file=f, purpose="fine-tune")
print(f"✅ Validation file uploaded: {validation_file.id}")

Uploading training dataset...
✅ Training file uploaded: file-019afdb4-f761-77e6-a357-027ef521dbad

Uploading validation dataset...
✅ Validation file uploaded: file-019afdb4-f87b-7abf-bc60-543cca61e9d3


<a id="lora-config"></a>
## 3. LoRA Fine-Tuning Configuration ⚙️

### Understanding LoRA Hyperparameters

**LoRA (Low-Rank Adaptation)** is an efficient fine-tuning method that:
- Trains only small adapter matrices instead of the entire model
- Reduces training costs by up to 90%
- Maintains model quality with proper configuration

#### Key Hyperparameters:

1. **lora_r (Rank)**: Size of the low-rank matrices
   - Higher = More capacity but higher cost
   - Recommended: 8-32 for most tasks
   - Use 16-64 for complex domain adaptation

2. **lora_alpha**: Scaling factor for LoRA weights
   - Typically set equal to lora_r
   - Controls influence of fine-tuning on base model

3. **lora_dropout**: Regularization to prevent overfitting
   - Range: 0.0 - 0.1
   - Higher for smaller datasets

4. **batch_size**: Number of examples per training step
   - Larger = More stable but higher memory
   - Recommended: 8-32

5. **n_epochs**: Number of training passes
   - Start with 3-5 epochs
   - Monitor validation loss to avoid overfitting

6. **learning_rate**: Controls update step size
   - Default: 1e-5
   - Increase when adapting to a new domain; decrease when making small adjustments to existing behavior
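To make the rank and alpha settings more concrete, here is a small sketch based on the standard LoRA formulation, in which a frozen weight matrix W is replaced by W + (lora_alpha / lora_r) · B·A, with A of shape (r × d_in) and B of shape (d_out × r). The 4096 × 4096 layer size is an illustrative assumption, and how Nebius Token Factory applies these values internally is not covered in this notebook.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two adapter matrices: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Factor applied to the adapter update B @ A in the standard LoRA formulation."""
    return lora_alpha / lora_r

# Illustrative layer size (an assumption, not taken from any specific model config).
d_in = d_out = 4096
full_matrix = d_in * d_out

for r in (8, 16, 32, 64):
    adapter = lora_trainable_params(d_in, d_out, r)
    share = adapter / full_matrix
    print(f"r={r:>2}: {adapter:>9,} adapter params "
          f"({share:.2%} of the full matrix), scale={lora_scaling(r, r)}")
```

Doubling the rank doubles the adapter's parameter count, while setting `lora_alpha` equal to `lora_r` keeps the scaling factor at 1 regardless of rank.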

### Configuration Presets

Here are recommended configurations for different scenarios:

In [9]:
# Configuration presets for different use cases

PRESETS = {
    "quick_test": {
        "description": "Fast training for testing (~$1-2)",
        "hyperparameters": {
            "n_epochs": 1,
            "batch_size": 16,
            "lora": True,
            "lora_r": 8,
            "lora_alpha": 8,
            "lora_dropout": 0.05
        }
    },
    "standard": {
        "description": "Balanced quality and cost (~$5-10)",
        "hyperparameters": {
            "n_epochs": 3,
            "batch_size": 32,
            "lora": True,
            "lora_r": 16,
            "lora_alpha": 16,
            "lora_dropout": 0.05
        }
    },
    "high_quality": {
        "description": "Best quality for production (~$15-25)",
        "hyperparameters": {
            "n_epochs": 5,
            "batch_size": 32,
            "lora": True,
            "lora_r": 32,
            "lora_alpha": 32,
            "lora_dropout": 0.05,
            "learning_rate": 0.00001
        }
    },
    "domain_adaptation": {
        "description": "For completely new domains (~$20-30)",
        "hyperparameters": {
            "n_epochs": 5,
            "batch_size": 32,
            "lora": True,
            "lora_r": 64,
            "lora_alpha": 64,
            "lora_dropout": 0.1,
            "learning_rate": 0.00003
        }
    }
}

# Display presets
print("Available Configuration Presets:\n")
for name, config in PRESETS.items():
    print(f"📋 {name.upper()}")
    print(f"   {config['description']}")
    print(f"   Config: {json.dumps(config['hyperparameters'], indent=2)}\n")

Available Configuration Presets:

📋 QUICK_TEST
   Fast training for testing (~$1-2)
   Config: {
  "n_epochs": 1,
  "batch_size": 16,
  "lora": true,
  "lora_r": 8,
  "lora_alpha": 8,
  "lora_dropout": 0.05
}

📋 STANDARD
   Balanced quality and cost (~$5-10)
   Config: {
  "n_epochs": 3,
  "batch_size": 32,
  "lora": true,
  "lora_r": 16,
  "lora_alpha": 16,
  "lora_dropout": 0.05
}

📋 HIGH_QUALITY
   Best quality for production (~$15-25)
   Config: {
  "n_epochs": 5,
  "batch_size": 32,
  "lora": true,
  "lora_r": 32,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "learning_rate": 1e-05
}

📋 DOMAIN_ADAPTATION
   For completely new domains (~$20-30)
   Config: {
  "n_epochs": 5,
  "batch_size": 32,
  "lora": true,
  "lora_r": 64,
  "lora_alpha": 64,
  "lora_dropout": 0.1,
  "learning_rate": 3e-05
}
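If you want to vary one or two values without editing a preset in place, a small override helper keeps the presets themselves unchanged. This is a sketch only: `with_overrides` is a hypothetical helper, and `BASE_HYPERPARAMETERS` simply repeats the values of the `standard` preset so the block runs on its own.

```python
import copy

# Copy of the "standard" preset values, repeated here so the example is self-contained.
BASE_HYPERPARAMETERS = {
    "n_epochs": 3,
    "batch_size": 32,
    "lora": True,
    "lora_r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
}

# Keys that may be added even if the base preset does not define them.
OPTIONAL_KEYS = {"learning_rate"}

def with_overrides(base: dict, **overrides) -> dict:
    """Return a copy of `base` with selected keys replaced; `base` is not mutated."""
    unknown = set(overrides) - set(base) - OPTIONAL_KEYS
    if unknown:
        # Catch typos such as `lora_rank` before they reach the API.
        raise KeyError(f"unexpected hyperparameter(s): {sorted(unknown)}")
    merged = copy.deepcopy(base)
    merged.update(overrides)
    return merged

custom = with_overrides(BASE_HYPERPARAMETERS, lora_r=32, lora_alpha=32, learning_rate=2e-5)
print(custom)
```

In the notebook you could pass `PRESETS["standard"]["hyperparameters"]` as the base and hand the result to the job-creation call.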



<a id="fine-tuning"></a>
## 4. Launch Fine-Tuning Job 🚀

### Supported Models

Nebius Token Factory supports fine-tuning for:
- **Llama 3.1** (8B, 70B) - Best for instruction following
- **Qwen 2.5** (7B, 14B, 32B, 72B) - Multilingual, strong reasoning
- **Mistral** (7B) - Efficient and fast
- **DeepSeek** - Code and reasoning tasks

Check [Nebius Models](https://docs.tokenfactory.nebius.com/fine-tuning/models) for the latest supported models.

In [10]:
# Choose your configuration preset
SELECTED_PRESET = "standard"  # Change to: quick_test, standard, high_quality, or domain_adaptation

# Select model
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # Options: meta-llama/Llama-3.1-8B-Instruct, Qwen/Qwen2.5-7B-Instruct, etc.

print(f"🔧 Configuration: {SELECTED_PRESET.upper()}")
print(f"🤖 Model: {MODEL_NAME}")
print(f"\n{PRESETS[SELECTED_PRESET]['description']}")

🔧 Configuration: STANDARD
🤖 Model: meta-llama/Llama-3.1-8B-Instruct

Balanced quality and cost (~$5-10)


### Create Fine-Tuning Job

In [11]:
# Launch fine-tuning job
print("🚀 Launching fine-tuning job...\n")

ft_job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model=MODEL_NAME,
    suffix="buildfast-experiment",  # Custom name for your model
    hyperparameters=PRESETS[SELECTED_PRESET]["hyperparameters"],
    seed=42  # For reproducibility
)

print("✅ Fine-tuning job created successfully!\n")
print(f"Job ID: {ft_job.id}")
print(f"Model: {ft_job.model}")
print(f"Status: {ft_job.status}")
print(f"\nHyperparameters:")
print(json.dumps(dict(ft_job.hyperparameters), indent=2))

🚀 Launching fine-tuning job...

✅ Fine-tuning job created successfully!

Job ID: ftjob-4b50e7e54fe84bbcb6a45a3b62b9be8a
Model: meta-llama/Llama-3.1-8B-Instruct
Status: running

Hyperparameters:
{
  "batch_size": 32,
  "learning_rate_multiplier": null,
  "n_epochs": 3,
  "learning_rate": 1e-05,
  "warmup_ratio": 0.0,
  "weight_decay": 0.0,
  "lora": true,
  "lora_r": 16,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "packing": true,
  "max_grad_norm": 1.0,
  "context_length": 8192
}


<a id="monitoring"></a>
## 5. Monitor Training Progress 📊

Training can take 15-60 minutes depending on:
- Dataset size
- Model size
- Number of epochs
- LoRA rank

The cell below will monitor progress and update every 15 seconds.

In [12]:
def monitor_fine_tuning_job(job_id: str, check_interval: int = 15):
    """
    Monitor the fine-tuning job status.

    Args:
        job_id: Fine-tuning job ID
        check_interval: Seconds between status checks
    """
    active_statuses = ["validating_files", "queued", "running"]

    print("📊 Monitoring fine-tuning job...\n")
    print("Status updates:")
    print("-" * 50)

    start_time = time.time()

    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        elapsed = int(time.time() - start_time)

        print(f"[{elapsed}s] Status: {job.status}", end="")

        if job.trained_tokens:
            print(f" | Tokens trained: {job.trained_tokens:,}", end="")

        if job.estimated_finish:
            print(f" | ETA: {job.estimated_finish}", end="")

        print()  # New line

        if job.status not in active_statuses:
            print("-" * 50)
            if job.status == "succeeded":
                print(f"\n✅ Fine-tuning completed successfully!")
                print(f"Fine-tuned model: {job.fine_tuned_model}")
                print(f"Total training time: {elapsed}s ({elapsed/60:.1f} minutes)")
            elif job.status == "failed":
                print(f"\n❌ Fine-tuning failed: {job.error}")
            else:
                print(f"\n⚠️ Fine-tuning ended with status: {job.status}")
            break

        time.sleep(check_interval)

    return job

# Start monitoring
completed_job = monitor_fine_tuning_job(ft_job.id)

📊 Monitoring fine-tuning job...

Status updates:
--------------------------------------------------
[0s] Status: running
[16s] Status: running
[31s] Status: running
[46s] Status: running
[62s] Status: running
[77s] Status: running
[93s] Status: running
[108s] Status: running | Tokens trained: 16,204 | ETA: 1765193281
[124s] Status: running | Tokens trained: 48,612 | ETA: 1765193279
[139s] Status: running | Tokens trained: 48,612 | ETA: 1765193279
[155s] Status: running | Tokens trained: 48,612 | ETA: 1765193279
[170s] Status: running | Tokens trained: 48,612 | ETA: 1765193279
[186s] Status: succeeded | Tokens trained: 48,612 | ETA: 1765193279
--------------------------------------------------

✅ Fine-tuning completed successfully!
Fine-tuned model: None
Total training time: 186s (3.1 minutes)
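The monitoring loop above runs until the job leaves an active state, and it prints the ETA as a raw number. The sketch below shows two optional refinements: a generic polling helper with an overall timeout, and a formatter for the ETA. Both `wait_for_status` and `format_eta` are hypothetical helpers, and reading the ETA as Unix seconds is an assumption based on the values shown in the output.

```python
import time
from datetime import datetime, timezone

def format_eta(unix_seconds: int) -> str:
    """Render a Unix timestamp (assumed to be in seconds) as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()

def wait_for_status(fetch_status, terminal_statuses, interval=15.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status or the timeout elapses.

    `sleep` and `clock` are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal_statuses:
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)

# Demonstration with a scripted sequence of statuses instead of real API calls.
scripted = iter(["queued", "running", "running", "succeeded"])
final = wait_for_status(lambda: next(scripted), {"succeeded", "failed", "cancelled"},
                        interval=0, sleep=lambda seconds: None)
print(final)                   # prints succeeded
print(format_eta(1765193281))  # an ETA value from the log above, rendered in UTC
```

In the notebook, `fetch_status` could be `lambda: client.fine_tuning.jobs.retrieve(ft_job.id).status`, which reuses the same retrieval call as the monitoring cell.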


### View Training Metrics (if available)

In [16]:
# Retrieve final job details
final_job = client.fine_tuning.jobs.retrieve(ft_job.id)

print("📈 Training Summary:\n")
print(f"Job ID: {final_job.id}")
print(f"Base Model: {final_job.model}")
print(f"Fine-tuned Model: {final_job.fine_tuned_model}")
print(f"Status: {final_job.status}")
print(f"Total Tokens Trained: {final_job.trained_tokens:,}")
print(f"\nHyperparameters Used:")
for key, value in dict(final_job.hyperparameters).items():
    print(f"  {key}: {value}")

📈 Training Summary:

Job ID: ftjob-4b50e7e54fe84bbcb6a45a3b62b9be8a
Base Model: meta-llama/Llama-3.1-8B-Instruct
Fine-tuned Model: None
Status: succeeded
Total Tokens Trained: 48,612

Hyperparameters Used:
  batch_size: 32
  learning_rate_multiplier: None
  n_epochs: 3
  learning_rate: 1e-05
  warmup_ratio: 0.0
  weight_decay: 0.0
  lora: True
  lora_r: 16
  lora_alpha: 16
  lora_dropout: 0.05
  packing: True
  max_grad_norm: 1.0
  context_length: 8192


<a id="deployment"></a>
## 6. Deploy Custom Model 🌐

Your fine-tuned LoRA adapter is automatically deployed and ready to use!

### Serverless Deployment Benefits:
- ⚡ **No infrastructure management** - Just call the API
- 💰 **Pay per token** - Only pay for what you use
- 🔄 **Instant scaling** - Handle variable workloads
- 🚀 **Low latency** - Optimized inference

In [27]:
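# Your fine-tuned model ID. The job object above reported `fine_tuned_model`
# as None in this run, so the ID is set by hand; replace it with your own.
fine_tuned_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:buildfast-experiment-IiJa"

print(f"🎯 Your fine-tuned model is deployed!\n")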
print(f"Model ID: {fine_tuned_model_id}")
print(f"\nUse this model ID for inference requests.")

🎯 Your fine-tuned model is deployed!

Model ID: meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:buildfast-experiment-IiJa

Use this model ID for inference requests.


<a id="inference"></a>
## 7. Inference & Evaluation 🧪

Now let's test your fine-tuned model!

### Compare Base vs Fine-Tuned Model

In [28]:
def compare_models(prompt: str, base_model: str, fine_tuned_model: str):
    """
    Compare responses from base and fine-tuned models.

    Args:
        prompt: Test prompt
        base_model: Base model ID as served by the inference endpoint
        fine_tuned_model: Fine-tuned model ID
    """
    print(f"🧪 Testing Prompt: '{prompt}'\n")
    print("=" * 80)

    # Base model response
    print("\n📘 BASE MODEL RESPONSE:\n")
    base_response = client.chat.completions.create(
        model=base_model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=200,
        temperature=0.7
    )
    print(base_response.choices[0].message.content)

    print("\n" + "=" * 80)

    # Fine-tuned model response
    # Note: this call uses the system prompt from the training data, which
    # differs from the generic system prompt given to the base model above.
    print("\n🎯 FINE-TUNED MODEL RESPONSE:\n")
    ft_response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant with access to weather and calculation functions."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=200,
        temperature=0.7
    )
    print(ft_response.choices[0].message.content)

    print("\n" + "=" * 80)

# Base model ID used for inference. It differs from MODEL_NAME, the ID used
# when creating the fine-tuning job.
BASE_INFERENCE_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Test prompts relevant to your fine-tuning task
test_prompts = [
    "What's the weather in Paris?",
    "Calculate 45 plus 67",
    "Is it sunny in Miami?"
]

# Compare models
for prompt in test_prompts:
    compare_models(prompt, BASE_INFERENCE_MODEL, fine_tuned_model_id)
    print("\n" + "#" * 80 + "\n")

🧪 Testing Prompt: 'What's the weather in Paris?'


📘 BASE MODEL RESPONSE:

Unfortunately, I'm a large language model, I don't have real-time access to current weather conditions. However, I can suggest some ways for you to find out the current weather in Paris.

1. **Check online weather websites**: You can check websites like AccuWeather, Weather.com, or BBC Weather for the current weather conditions in Paris.
2. **Use a weather app**: Download a weather app like Dark Sky, Weather Underground, or The Weather Channel on your smartphone to get the current weather conditions in Paris.
3. **Ask a voice assistant**: If you have a smart speaker or virtual assistant like Siri, Google Assistant, or Alexa, you can ask them to tell you the current weather in Paris.

If you want, I can also give you a general idea of the typical weather conditions in Paris at different times of the year. Just let me know!


🎯 FINE-TUNED MODEL RESPONSE:

Let me check the current weather conditions in Par

### Quantitative Evaluation

For production, implement proper evaluation metrics:

In [29]:
def evaluate_model(model_id: str, test_set: List[Dict], verbose: bool = True):
    """
    Evaluate model on a test set.

    Args:
        model_id: Model to evaluate
        test_set: List of test examples
        verbose: Print detailed results

    Returns:
        Dict with evaluation metrics
    """
    results = {
        "total": len(test_set),
        "correct": 0,
        "errors": []
    }

    print(f"\n🧪 Evaluating {model_id}...\n")

    for i, example in enumerate(tqdm(test_set, desc="Evaluating")):
        try:
            # Get model response
            response = client.chat.completions.create(
                model=model_id,
                messages=example["messages"][:-1],  # All except the last (expected) message
                max_tokens=150,
                temperature=0.1  # Low temperature for consistent evaluation
            )

            predicted = response.choices[0].message.content
            expected = example["messages"][-1]["content"]

            # Simple exact match (you can implement more sophisticated metrics)
            if predicted.strip() == expected.strip():
                results["correct"] += 1
            else:
                results["errors"].append({
                    "example": i,
                    "expected": expected,
                    "predicted": predicted
                })

        except Exception as e:
            results["errors"].append({"example": i, "error": str(e)})

    # Calculate metrics
    results["accuracy"] = results["correct"] / results["total"] if results["total"] > 0 else 0

    # Print results
    print(f"\n📊 Evaluation Results:")
    print(f"Total Examples: {results['total']}")
    print(f"Correct: {results['correct']}")
    print(f"Accuracy: {results['accuracy']:.2%}")

    if verbose and results["errors"]:
        print(f"\nErrors: {len(results['errors'])}")
        for error in results["errors"][:3]:  # Show first 3 errors
            print(f"\nExample {error.get('example', 'N/A')}:")
            if 'expected' in error:
                print(f"Expected: {error['expected'][:100]}...")
                print(f"Predicted: {error['predicted'][:100]}...")
            else:
                print(f"Error: {error.get('error', 'Unknown')}")

    return results

# If you have a separate test set, evaluate here
# test_results = evaluate_model(fine_tuned_model_id, test_set)
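Exact string matching is strict for function-calling outputs, because two JSON strings can describe the same call while differing in key order or whitespace. One possible refinement, sketched here with a hypothetical helper named `function_calls_match`, is to compare the parsed JSON and fall back to text comparison when either side is not valid JSON.

```python
import json

def function_calls_match(predicted: str, expected: str) -> bool:
    """Compare two outputs as parsed JSON; fall back to stripped text equality."""
    try:
        return json.loads(predicted) == json.loads(expected)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return predicted.strip() == expected.strip()

expected = '{"function": "get_weather", "arguments": {"location": "Paris", "unit": "celsius"}}'
reordered = '{ "arguments": {"unit": "celsius", "location": "Paris"}, "function": "get_weather" }'
wrong_city = '{"function": "get_weather", "arguments": {"location": "Rome", "unit": "celsius"}}'

print(function_calls_match(reordered, expected))   # prints True
print(function_calls_match(wrong_city, expected))  # prints False
```

Swapping this in for the `predicted.strip() == expected.strip()` check in `evaluate_model` would count semantically identical calls as correct.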

### Production-Ready Inference Function

In [30]:
def generate_with_fine_tuned_model(
    prompt: str,
    model_id: str = None,
    system_prompt: str = "You are a helpful assistant.",
    max_tokens: int = 500,
    temperature: float = 0.7,
    top_p: float = 0.95
) -> str:
    """
    Production-ready inference function.

    Args:
        prompt: User input
        model_id: Fine-tuned model ID
        system_prompt: System message
        max_tokens: Maximum response length
        temperature: Sampling temperature
        top_p: Nucleus sampling parameter

    Returns:
        Model response
    """
    if model_id is None:
        model_id = fine_tuned_model_id

    try:
        response = client.chat.completions.create(
            model=model_id,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ],
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p
        )

        return response.choices[0].message.content

    except Exception as e:
        return f"Error: {str(e)}"

# Test the function
test_prompt = "What's the temperature in London?"
result = generate_with_fine_tuned_model(test_prompt)

print(f"Prompt: {test_prompt}")
print(f"Response: {result}")

Prompt: What's the temperature in London?
Response: However, I'm a large language model, I don't have real-time access to current weather information. But I can suggest some options to find the current temperature in London:

1. Check online weather websites: You can visit websites like AccuWeather, Weather.com, or BBC Weather to get the current temperature in London.
2. Use a weather app: Download a weather app on your smartphone, such as Dark Sky or Weather Underground, to get the current temperature in London.
3. Check social media: Follow the official Twitter accounts of weather services, such as the Met Office (UK's national weather service), to get the current temperature in London.

If you want, I can also provide you with the average temperature in London throughout the year, or information on the best time to visit London based on its climate. Just let me know!


<a id="best-practices"></a>
## 8. Production Best Practices 🎯

### 1. Dataset Quality
- ✅ Use diverse, high-quality examples
- ✅ Cover edge cases and variations
- ✅ Ensure consistency in formatting
- ✅ Minimum 50-100 examples for good results
- ✅ Include validation set for monitoring
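A few of these checks are easy to automate. The sketch below counts examples, exact duplicates, and distinct system prompts; `dataset_report` is a hypothetical helper, and what counts as an acceptable number for each metric is left to you.

```python
import json
from collections import Counter

def dataset_report(examples: list) -> dict:
    """Summarise size, exact duplicates, and system-prompt consistency of a dataset."""
    # Serialise with sorted keys so identical records produce identical fingerprints.
    fingerprints = Counter(json.dumps(ex, sort_keys=True) for ex in examples)
    system_prompts = {
        msg["content"]
        for ex in examples
        for msg in ex["messages"]
        if msg["role"] == "system"
    }
    return {
        "examples": len(examples),
        "duplicates": sum(count - 1 for count in fingerprints.values()),
        "distinct_system_prompts": len(system_prompts),
    }

sample = [
    {"messages": [{"role": "system", "content": "S"},
                  {"role": "user", "content": "q1"},
                  {"role": "assistant", "content": "a1"}]},
    {"messages": [{"role": "system", "content": "S"},
                  {"role": "user", "content": "q1"},
                  {"role": "assistant", "content": "a1"}]},
    {"messages": [{"role": "system", "content": "T"},
                  {"role": "user", "content": "q2"},
                  {"role": "assistant", "content": "a2"}]},
]
print(dataset_report(sample))
```

More than one distinct system prompt is not necessarily a problem, but an unexpected count can point to inconsistent formatting.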

### 2. Hyperparameter Tuning
- Start with standard preset
- Monitor validation loss
- Increase LoRA rank for complex tasks
- Use early stopping to prevent overfitting
- Experiment with learning rate for new domains

### 3. Model Selection
- **Llama 3.1-8B**: Best for instruction following
- **Qwen 2.5**: Great for multilingual tasks
- **Larger models**: Better for complex reasoning
- Balance model size vs. inference cost

### 4. Cost Optimization
- Start with smaller datasets for testing
- Use LoRA instead of full fine-tuning
- Monitor training progress and stop early if converged
- Use serverless deployment for variable workloads
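For a rough sense of training cost before launching a job, you can multiply the expected token volume by a per-token price. The sketch below assumes cost scales linearly with trained tokens and takes the price as an input; the price used in the example is a placeholder, not a Nebius rate, so check the current pricing page for real numbers. In the run recorded earlier, the token counter moved from 16,204 to 48,612, which is consistent with roughly 16,204 tokens per epoch over 3 epochs.

```python
def estimate_training_cost(tokens_per_epoch: int, n_epochs: int,
                           usd_per_million_tokens: float) -> float:
    """Estimate cost assuming billing is proportional to the number of trained tokens."""
    total_tokens = tokens_per_epoch * n_epochs
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Placeholder price for illustration only (hypothetical, not an actual rate).
PLACEHOLDER_PRICE = 1.0

cost = estimate_training_cost(tokens_per_epoch=16_204, n_epochs=3,
                              usd_per_million_tokens=PLACEHOLDER_PRICE)
print(f"Estimated cost at the placeholder price: ${cost:.4f}")
```

Because the estimate is linear, halving the dataset or the epoch count halves the projected cost, which makes it easy to compare presets before committing to one.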

### 5. Evaluation & Monitoring
- Implement proper evaluation metrics
- Compare against base model
- A/B test in production
- Monitor for model drift
- Collect user feedback
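For the A/B testing point, one simple approach is to assign each user to a variant deterministically by hashing a stable identifier, so the same user always sees the same model. This is a sketch under stated assumptions: `assign_variant`, the salt string, and the variant labels are hypothetical names, not part of any Nebius API.

```python
import hashlib

def assign_variant(user_id: str, treatment_share: float = 0.5,
                   salt: str = "ft-experiment") -> str:
    """Deterministically route a user to the fine-tuned or the base model."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "fine_tuned" if bucket < treatment_share else "base"

for user in ("alice", "bob", "carol"):
    print(user, "->", assign_variant(user))
```

Changing the salt reshuffles the assignment for a new experiment, while `treatment_share` controls what fraction of users reach the fine-tuned model.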

### 6. Integration Tips
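```python
import os
from tenacity import retry, stop_after_attempt, wait_exponential  # pip install tenacity

# Read the model ID from an environment variable instead of hard-coding it
FINE_TUNED_MODEL_ID = os.getenv("FINE_TUNED_MODEL", "your-model-id")

# Retry with exponential backoff, giving up after 5 attempts
@retry(wait=wait_exponential(multiplier=1, min=4, max=10), stop=stop_after_attempt(5))
def call_model(prompt):
    return client.chat.completions.create(
        model=FINE_TUNED_MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
    )
```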

### 7. Common Pitfalls to Avoid
- ❌ Overfitting on small datasets
- ❌ Not using validation sets
- ❌ Inconsistent data formatting
- ❌ Skipping evaluation before deployment
- ❌ Not monitoring production performance

## 🎓 Next Steps

Congratulations! You've successfully fine-tuned an LLM with Nebius Token Factory. Here's what to explore next:

1. **Expand Your Dataset**: Collect more domain-specific examples
2. **Experiment with Models**: Try different base models
3. **Optimize Hyperparameters**: Fine-tune LoRA configuration
4. **Deploy to Production**: Integrate with your application
5. **Monitor & Iterate**: Track performance and retrain as needed

### Resources

- 📚 [Nebius Token Factory Docs](https://docs.tokenfactory.nebius.com/)
- 🍳 [Token Factory Cookbook](https://github.com/nebius/token-factory-cookbook)
- 💬 [Nebius Discord Community](https://discord.com/invite/WJ2DUQRz4m)
- 🐦 [Follow @nebiustf](https://x.com/nebiustf)
- 📖 [Fine-Tuning Blog Post](https://nebius.com/blog/posts/fine-tuning-llms-with-nebius-ai-studio)

### Connect With BuildFastWithAI

- 🌐 [Website](https://www.buildfastwithai.com/)
- 🐦 [Twitter](https://x.com/BuildFastWithAI)
- 💼 [LinkedIn](https://www.linkedin.com/company/build-fast-with-ai)
- 📧 [Email](mailto:satvik@buildfastwithai.com)

---

<div align="center">
  <p><strong>Built with ❤️ by BuildFastWithAI</strong></p>
  <p><em>Powered by Nebius Token Factory</em></p>
  <p>⭐ Star the <a href="https://github.com/buildfastwithai/gen-ai-experiments">Gen-AI-Experiments</a> repo if you found this helpful!</p>
</div>