<a href="https://colab.research.google.com/github/Arindam200/awesome-ai-apps/blob/main/fine_tuning/Fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tune Open-Source LLMs on <a href="https://tokenfactory.nebius.com/"><picture><source media="(prefers-color-scheme: dark)" srcset="https://mintcdn.com/nebius-723e8b65/jsgY7B_gdaTjMC6y/logo/Main-logo-TF-Dark.svg?fit=max&auto=format&n=jsgY7B_gdaTjMC6y&q=85&s=92ebc07d32d93f3918de2f7ec4a0754a"><source media="(prefers-color-scheme: light)" srcset="https://mintcdn.com/nebius-723e8b65/jsgY7B_gdaTjMC6y/logo/Main-logo-TF-Light.svg?fit=max&auto=format&n=jsgY7B_gdaTjMC6y&q=85&s=48ceb3cd949e5160c884634bbaf1af59"><img alt="Nebius Token Factory" src="https://mintcdn.com/nebius-723e8b65/jsgY7B_gdaTjMC6y/logo/Main-logo-TF-Light.svg?fit=max&auto=format&n=jsgY7B_gdaTjMC6y&q=85&s=48ceb3cd949e5160c884634bbaf1af59" width="200"></picture></a>

Learn how to fine-tune & deploy open models like Llama 3.1 directly from your dataset using [Nebius Token Factory](https://dub.sh/nebius), an all-in-one platform for working with large language models (LLMs).

Before you begin, get your API key from the [Dashboard](https://tokenfactory.nebius.com/?modals=create-api-key).

Press Runtime → Run all to start fine-tuning on a free Google Colab instance.

## Step 1: Installation & Setup

In [None]:
!pip install -qq openai datasets

Before running, store your key in Colab Secrets as `NEBIUS_API_KEY` or export it as an environment variable.

In [None]:
import os, json, time
from openai import OpenAI
from datasets import load_dataset
import requests

try:
    from google.colab import userdata
    nebius_api_key = userdata.get('NEBIUS_API_KEY')
except Exception:
    nebius_api_key = os.getenv("NEBIUS_API_KEY")

assert nebius_api_key, "⚠️ Please set your NEBIUS_API_KEY via Colab or environment."

client = OpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1/",
    api_key=nebius_api_key,
)


## Step 2: Prepare your dataset

Fine-tuning works best with conversational data (the OpenAI-style format, where each example holds a list of `messages`).
We’ll use a sample [dataset](https://huggingface.co/datasets/olathepavilion/Conversational-datasets-json) from Hugging Face to keep things simple.

You can learn more about preparing datasets [here](https://docs.tokenfactory.nebius.com/fine-tuning/datasets).

In [None]:
dataset = load_dataset("olathepavilion/Conversational-datasets-json", split="train")

formatted_data = [{"messages": entry["messages"]} for entry in dataset]

data_path = "training_data.jsonl"
with open(data_path, "w", encoding="utf-8") as f:
    for ex in formatted_data:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

print(f"Saved {len(formatted_data)} samples to {data_path}")


Saved 1000 samples to training_data.jsonl
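
Before uploading, it can help to sanity-check the JSONL file locally. The sketch below is a minimal validator for the messages format described above. The function names and the set of allowed roles (`system`, `user`, `assistant`) are assumptions made for illustration; check the dataset docs for the exact rules Nebius enforces.

```python
import json

# Assumption: only these roles are expected; the platform may accept others.
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_record(record):
    """Return a list of problems found in one training example (empty if none)."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i} has empty or non-string content")
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems

def validate_jsonl_lines(lines):
    """Validate an iterable of JSONL lines; return {line_number: [problems]}."""
    report = {}
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[n] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_record(record)
        if problems:
            report[n] = problems
    return report

sample = [
    '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}',
    '{"messages": [{"role": "user", "content": ""}]}',
]
print(validate_jsonl_lines(sample))
```

To check the real file, pass an open file handle instead of the sample list, for example `validate_jsonl_lines(open(data_path, encoding="utf-8"))`. An empty report means no problems were found by these particular checks.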


## Step 3: Upload your dataset to Token Factory

Next, we’ll upload the dataset so Nebius can access it for training.

In [None]:
with open(data_path, "rb") as f:
    upload = client.files.create(file=f, purpose="fine-tune")

training_file_id = upload.id
print("Uploaded file ID:", training_file_id)

Uploaded file ID: file-019a635b-2dca-7a23-92b3-74646ebf092a


Keep that `training_file_id` handy; it’s used in the fine-tuning request.

## Step 4: Create and start your fine-tuning job

We’ll fine-tune Llama 3.1 8B Instruct using LoRA, which trains only a small set of adapter weights and is much faster than full fine-tuning.

In [None]:
job = client.fine_tuning.jobs.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    suffix="demo-run",
    training_file=training_file_id,
    hyperparameters={
        "batch_size": 16,
        "learning_rate_multiplier": 2e-4,
        "n_epochs": 1,
        "warmup_ratio": 0.03,
        "weight_decay": 0,
        "lora": True,
        "packing": True,
    },
)

print("Job created:", job.id, "| status:", job.status)

Job created: ftjob-821d852c7f8f4a28909ca5f56c5c84eb | status: running
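
To get a feel for what the hyperparameters above imply, here is a rough back-of-the-envelope estimate of the number of optimizer steps and warmup steps. This is a sketch under stated assumptions, not a description of how Nebius schedules training: it assumes one example per sequence and that warmup steps are rounded up. With `packing` enabled, short conversations are typically combined into fewer, longer sequences, so the real step count is likely lower. The function name `estimate_schedule` is hypothetical.

```python
import math

def estimate_schedule(n_samples, batch_size, n_epochs, warmup_ratio):
    """Estimate step counts, assuming one example per sequence (no packing)."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    total_steps = steps_per_epoch * n_epochs
    # Assumption: warmup covers warmup_ratio of all steps, rounded up.
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# Values from this notebook: 1000 samples, batch size 16, 1 epoch, 3% warmup.
print(estimate_schedule(1000, 16, 1, 0.03))
# → {'steps_per_epoch': 63, 'total_steps': 63, 'warmup_steps': 2}
```

With so few steps, a single epoch on this dataset is a quick demo run; for real workloads you would usually have more data, more epochs, or both.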


## Step 5: Monitor job progress

When you create a fine-tuning job, its initial status is usually `running`.
The script below polls the status every 15 seconds until the job leaves the `validating_files`, `queued`, or `running` states.

If the job fails, Nebius returns an error message explaining what went wrong and how to fix it. If you get a 500 error, just resubmit the job.

Training is complete when `Training completed successfully` appears in the event logs. A `Dataset ... processed successfully` event appears earlier and only confirms that the dataset was accepted.

In [None]:
active = {"validating_files", "queued", "running"}
while job.status in active:
    time.sleep(15)
    job = client.fine_tuning.jobs.retrieve(job.id)
    print("Current Status:", job.status)

print("Final status:", job.status)

Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: running
Current Status: succeeded
Final status: succeeded
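
The loop above runs until the job leaves an active state, which is fine for a demo but can hang indefinitely if a job gets stuck. A possible variation adds a timeout. The helper below is a sketch: `wait_for_terminal_status` and its parameters are hypothetical names, and it takes a `fetch_status` callable so it can be tried without calling the API.

```python
import time

def wait_for_terminal_status(fetch_status, active_states, interval=15.0,
                             timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a status outside active_states.

    Raises TimeoutError if the status is still active once `timeout` seconds
    have elapsed.
    """
    deadline = clock() + timeout
    status = fetch_status()
    while status in active_states:
        if clock() >= deadline:
            raise TimeoutError(f"status is still '{status}' after {timeout} seconds")
        sleep(interval)
        status = fetch_status()
    return status

# Demo with a fake status sequence instead of a real job.
fake_statuses = iter(["queued", "running", "running", "succeeded"])
final = wait_for_terminal_status(
    lambda: next(fake_statuses),
    active_states={"validating_files", "queued", "running"},
    interval=0,
)
print("Final status:", final)
```

In this notebook, the callable could be `lambda: client.fine_tuning.jobs.retrieve(job.id).status`, with `interval=15` to match the loop above.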


Check job events:

In [None]:
events = client.fine_tuning.jobs.list_events(job.id)
for e in events.data:
    print(f"[{e.created_at}] {e.level}: {e.message}")

[1762603610] info: Job is submitted
[1762603639] info: Dataset 'training' processed successfully
[1762603789] info: Training completed successfully


This is the best way to confirm that your fine-tune finished successfully.
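
The `created_at` values in the event log are Unix timestamps in seconds. To see how long each stage took, you can convert them to readable times and compute offsets from the first event. The sketch below hard-codes the three events printed above so it runs on its own; `format_timeline` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# (created_at, message) pairs copied from the event log above.
events = [
    (1762603610, "Job is submitted"),
    (1762603639, "Dataset 'training' processed successfully"),
    (1762603789, "Training completed successfully"),
]

def format_timeline(events):
    """Return one readable line per event, with seconds elapsed since the first."""
    ordered = sorted(events)  # oldest first, regardless of input order
    start = ordered[0][0]
    rows = []
    for created_at, message in ordered:
        stamp = datetime.fromtimestamp(created_at, tz=timezone.utc)
        rows.append(
            f"{stamp:%Y-%m-%d %H:%M:%S} UTC (+{created_at - start}s) {message}"
        )
    return rows

for row in format_timeline(events):
    print(row)
```

For this run, dataset processing finished 29 seconds after submission and training finished 179 seconds after submission. With live data, you could build the list as `[(e.created_at, e.message) for e in events.data]`.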

## Step 6: Download your checkpoints

After every epoch, Nebius saves a checkpoint, a snapshot of the model at that stage. You’ll get all of them. For the final model, just grab the last checkpoint.

The code below creates a folder for each checkpoint and saves all the files there.

In [None]:
if job.status == "succeeded":
    # Check the job events
    events = client.fine_tuning.jobs.list_events(job.id)
    print(events)

    for checkpoint in client.fine_tuning.jobs.checkpoints.list(job.id).data:
        print("Checkpoint ID:", checkpoint.id)

        # Create a directory for every checkpoint
        os.makedirs(checkpoint.id, exist_ok=True)

        for model_file_id in checkpoint.result_files:
            # Get the name of a model file
            filename = client.files.retrieve(model_file_id).filename

            # Retrieve the contents of the file
            file_content = client.files.content(model_file_id)

            # Save the contents into the checkpoint's folder
            file_content.write_to_file(os.path.join(checkpoint.id, os.path.basename(filename)))

SyncCursorPage[FineTuningJobEvent](data=[FineTuningJobEvent(id='3c4a9b92-37a0-4553-820d-90c089ce3066', created_at=1762603610, level='info', message='Job is submitted', object='fine_tuning.job.event', data=None, type=None, source='api', job_uuid='ftjob-821d852c7f8f4a28909ca5f56c5c84eb'), FineTuningJobEvent(id='fa1d272e-c27a-4856-976b-abc399145d65', created_at=1762603639, level='info', message="Dataset 'training' processed successfully", object='fine_tuning.job.event', data=None, type=None, source='datasets', job_uuid='ftjob-821d852c7f8f4a28909ca5f56c5c84eb'), FineTuningJobEvent(id='c4ff1e70-d021-4955-bf17-7b122cf140f9', created_at=1762603789, level='info', message='Training completed successfully', object='fine_tuning.job.event', data=None, type=None, source='training', job_uuid='ftjob-821d852c7f8f4a28909ca5f56c5c84eb')], has_more=False)
Checkpoint ID: ftckpt_17e89a2f-dc43-4341-aae3-73cd235e9542


## Step 7: Deploy your LoRA adapter

Now that your fine-tune is complete, you can deploy the **LoRA adapter** directly on **Nebius Token Factory** for inference.  
This lets you use your fine-tuned model as a hosted endpoint, ready for API calls, experiments, or integration into your own applications.

In [None]:
import requests, time

api_url = "https://api.tokenfactory.nebius.com"
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Create a LoRA model from a fine-tuning job and checkpoint
def create_lora_from_job(name, ft_job, ft_checkpoint, base_model):
    print(f"Creating LoRA model from job {ft_job} and checkpoint {ft_checkpoint}...")
    fine_tuning_result = ft_job + ":" + ft_checkpoint
    lora_creation_request = {
        "source": fine_tuning_result,
        "base_model": base_model,
        "name": name,
        "description": "Example LoRA model deployment"
    }
    response = requests.post(
        f"{api_url}/v0/models",
        json=lora_creation_request,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {nebius_api_key}"
        }
    )
    print(f"LoRA model creation request sent. Response: {response.json()}")
    return response.json()

# Wait for validation of the deployed model
def wait_for_validation(name, delay=5):
    print(f"Waiting for validation of LoRA model '{name}'...")
    while True:
        time.sleep(delay)
        lora_info = requests.get(
            f"{api_url}/v0/models/{name}",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {nebius_api_key}"
            }
        ).json()
        current_status = lora_info.get("status", "unknown")
        print(f"Current status for '{name}': {current_status}")
        if current_status in {"active", "error"}:
            return lora_info

# Send a test completion request
def get_completion(model):
    print(f"Requesting completion from model '{model}'...")
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(f"Completion received for model '{model}'.")
    return completion.choices[0].message.content

# Deploy a LoRA adapter model using the fine-tuning job and checkpoint IDs.
# `checkpoint` is the loop variable left over from Step 6, i.e. the last checkpoint listed.
lora_name = create_lora_from_job("demo-arindam", job.id, checkpoint.id, base_model).get("name")
print(f"Generated LoRA model name: {lora_name}")

# Check model validation status
lora_info = wait_for_validation(lora_name)

# If validation passes, test inference
if lora_info.get("status") == "active":
    print(f"LoRA model '{lora_name}' is active. Getting a sample completion...")
    print(get_completion(lora_name))
elif lora_info.get("status") == "error":
    print(f"An error occurred during validation: {lora_info['status_reason']}")

Creating LoRA model from job ftjob-821d852c7f8f4a28909ca5f56c5c84eb and checkpoint ftckpt_17e89a2f-dc43-4341-aae3-73cd235e9542...
LoRA model creation request sent. Response: {'name': 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:demo-arindam-hnuw', 'base_model': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'source': 'ftjob-821d852c7f8f4a28909ca5f56c5c84eb:ftckpt_17e89a2f-dc43-4341-aae3-73cd235e9542', 'description': 'description', 'created_at': 1764528337, 'status': 'validating'}
Generated LoRA model name: meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:demo-arindam-hnuw
Waiting for validation of LoRA model 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:demo-arindam-hnuw'...
Current status for 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:demo-arindam-hnuw': validating
Current status for 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:demo-arindam-hnuw': active
LoRA model 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:demo-arindam-hnuw' is active. Getting a sample completion...
Requesting completion from mode

Once the model status becomes active, you can send chat completions just like any OpenAI-compatible model.
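
For a slightly broader smoke test than a single "Hello", you could send a handful of prompts and compare the answers. The helper below is a minimal sketch: `ask_model` is a hypothetical name, and it only assumes that the client exposes the OpenAI-style `chat.completions.create` call already used in this notebook.

```python
def ask_model(client, model, prompts, system_prompt=None, **params):
    """Send each prompt as a separate chat request and return the replies in order.

    Extra keyword arguments (for example temperature) are passed through
    unchanged to client.chat.completions.create.
    """
    answers = []
    for prompt in prompts:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        completion = client.chat.completions.create(
            model=model, messages=messages, **params
        )
        answers.append(completion.choices[0].message.content)
    return answers
```

With the objects defined earlier, a call might look like `ask_model(client, lora_name, ["Hello", "What can you help me with?"], temperature=0.2)`. Running the same prompts against `base_model` gives a quick side-by-side view of what the fine-tune changed.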

And that’s it!

You’ve just fine-tuned, deployed, and run inference with your own LoRA model, all using Nebius Token Factory.

If you want to go further, here are a few next steps worth exploring:

- [Track Fine-Tuning Jobs](https://tokenfactory.nebius.com/fine-tuning): Monitor progress, view logs, and check model checkpoints  
- [Deploy Your Custom Model](https://docs.tokenfactory.nebius.com/fine-tuning/deploy-custom-model): Set up inference endpoints and integrate your fine-tuned model into applications  
- [Fine-Tuning Docs](https://docs.tokenfactory.nebius.com/fine-tuning/overview): Learn about hyperparameters, LoRA configurations, and advanced options  
- [Nebius Token Factory Dashboard](https://tokenfactory.nebius.com/): Manage models, datasets, and deployments visually  

**Start tracking and deploying your fine-tuned models today at [Nebius Token Factory](https://tokenfactory.nebius.com/).**
