To train this model, click **Runtime** > **Run all**.

[![GitHub](https://img.shields.io/badge/GitHub-ART-blue?logo=github)](https://github.com/OpenPipe/ART)
[![Discord](https://img.shields.io/badge/Discord-Join-7289da?logo=discord&logoColor=white)](https://discord.gg/zbBHRUpwf4)
[![Docs](https://img.shields.io/badge/Docs-ART-green)](https://docs.art-e.dev/fundamentals/sft-training)

This notebook demonstrates how to fine-tune a model using **supervised fine-tuning (SFT)** with ART. We'll download a text-to-SQL dataset and train from a JSONL file using `train_sft_from_file`.

For distillation (training from a teacher model's outputs), see the [distillation notebook](https://github.com/OpenPipe/ART/blob/main/examples/sft/distillation.ipynb).

Completions and metrics will be logged to [Weights & Biases](https://wandb.ai).

### Installation

```python
%%capture
!uv pip install "openpipe-art @ git+https://github.com/openpipe/art.git@main" datasets --prerelease allow --no-cache-dir
```

### Environment Variables

Set your `WANDB_API_KEY` to use the serverless backend. Get one at [wandb.ai](https://wandb.ai/home).

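```python
import os

# Paste your key here, or leave it empty if WANDB_API_KEY is already set in the environment.
WANDB_API_KEY = ""
if WANDB_API_KEY:
    os.environ["WANDB_API_KEY"] = WANDB_API_KEY

# The serverless backend requires this key, so fail early with a clear message.
if not os.environ.get("WANDB_API_KEY"):
    raise ValueError("Set WANDB_API_KEY before running the training cells.")
```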

### Prepare Dataset

SFT training expects a JSONL file where each line has a `messages` array in the [OpenAI chat format](https://platform.openai.com/docs/api-reference/chat). The last message must be from the `assistant` role.
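
Before training, it can help to check a file against these rules. The sketch below is a stdlib-only checker written for this notebook. The names `sft_record_errors` and `check_sft_file` are hypothetical and not part of ART. It assumes string `content` fields, which is what this notebook writes. The OpenAI format also allows other content shapes, and ART may apply checks this sketch does not.

```python
import json


def sft_record_errors(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means it looks valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["record needs a non-empty 'messages' list"]
    errors = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"message {i} is not an object")
            continue
        if not isinstance(msg.get("role"), str):
            errors.append(f"message {i} has no string 'role'")
        # Assumption: content is a plain string, as in the records this notebook writes.
        if not isinstance(msg.get("content"), str):
            errors.append(f"message {i} has no string 'content'")
    last = messages[-1]
    if not (isinstance(last, dict) and last.get("role") == "assistant"):
        errors.append("last message must come from the 'assistant' role")
    return errors


def check_sft_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; an empty dict means no problems were found."""
    problems = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():  # skip blank lines
                continue
            errors = sft_record_errors(line)
            if errors:
                problems[lineno] = errors
    return problems
```

Once the dataset cell has written `train.jsonl`, `check_sft_file("train.jsonl")` should return an empty dict.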

We'll use [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql), a large synthetic text-to-SQL dataset covering 100 domains and multiple SQL complexity levels. Each example has a schema context, a natural language question, and the corresponding SQL query. We convert the first 2,000 examples into chat format with the schema as the system message.

```python
import json

from datasets import load_dataset

NUM_EXAMPLES = 2000  # lower this (e.g. to 20) for a quick smoke test

ds = load_dataset("gretelai/synthetic_text_to_sql", split="train")

# One chat-format record per line: schema -> system, question -> user, SQL -> assistant.
with open("train.jsonl", "w") as f:
    for row in ds.select(range(NUM_EXAMPLES)):
        messages = [
            {"role": "system", "content": row["sql_context"]},
            {"role": "user", "content": row["sql_prompt"]},
            {"role": "assistant", "content": row["sql"]},
        ]
        f.write(json.dumps({"messages": messages}) + "\n")
```

### Training

Use `train_sft_from_file` to train directly from the JSONL file. It handles batching, learning rate scheduling, and logging automatically.
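
To get a feel for what the scheduler does with the settings below, here is a rough sketch of the step count and a plain cosine decay. It assumes each epoch is split into `ceil(n / batch_size)` batches and that the learning rate falls from `peak_lr` to zero with no warmup. ART's actual schedule (warmup, minimum learning rate, handling of partial batches) may differ. `num_steps` and `cosine_lr` are hypothetical helpers, not ART functions. Under these assumptions, the 2,000 examples described above with `batch_size=2` and `epochs=3` give `ceil(2000 / 2) * 3 = 3000` optimizer steps.

```python
import math


def num_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming each epoch uses ceil(num_examples / batch_size) batches."""
    return math.ceil(num_examples / batch_size) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at the last step.

    Assumption: no warmup and no minimum learning rate; ART's schedule may differ.
    """
    if total_steps <= 1:
        return peak_lr
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))


total = num_steps(2000, batch_size=2, epochs=3)
for step in (0, total // 2, total - 1):
    print(step, cosine_lr(step, total, peak_lr=2e-4))
```

A cosine schedule spends most of its steps at a moderate learning rate and tapers off smoothly at the end, which tends to be gentler on a small SFT run than a constant rate.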

```python
import art
from art.serverless.backend import ServerlessBackend
from art.utils.sft import train_sft_from_file

# Serverless backend (needs WANDB_API_KEY, set above).
backend = ServerlessBackend()
model = art.TrainableModel(
    name="sft-text-to-sql",
    project="sft-example",
    base_model="Qwen/Qwen3-30B-A3B-Instruct-2507",
)
await model.register(backend)

await train_sft_from_file(
    model=model,
    file_path="train.jsonl",
    epochs=3,                # passes over the training file
    batch_size=2,            # examples per batch
    peak_lr=2e-4,            # highest learning rate the schedule reaches
    schedule_type="cosine",
    verbose=True,
)

print("Training complete!")
```

### Using the Model

Try the trained model with a new text-to-SQL question.

```python
# Query the trained model through its OpenAI-compatible client.
client = model.openai_client()
completion = await client.chat.completions.create(
    model=model.get_inference_name(),
    messages=[
        # Same layout as the training data: schema as system message, question as user message.
        {"role": "system", "content": "CREATE TABLE equipment_maintenance (equipment_type VARCHAR(255), maintenance_frequency INT);"},
        {"role": "user", "content": "List all the unique equipment types and their corresponding total maintenance frequency from the equipment_maintenance table."},
    ],
)
print(completion.choices[0].message.content)
```
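
The model was trained with the schema in the system message and the question in the user message, so inference prompts should keep that layout. Below is a small helper that builds the same message list used when writing `train.jsonl`, with an optional assistant turn for training examples. `text_to_sql_messages` is a hypothetical name for illustration, not part of ART.

```python
from typing import Optional


def text_to_sql_messages(schema: str, question: str, sql: Optional[str] = None) -> list[dict[str, str]]:
    """Build chat messages in the train.jsonl layout (schema -> system, question -> user).

    Pass `sql` to append the assistant turn (a training example); omit it for inference.
    """
    messages = [
        {"role": "system", "content": schema},
        {"role": "user", "content": question},
    ]
    if sql is not None:
        messages.append({"role": "assistant", "content": sql})
    return messages
```

For inference, pass `messages=text_to_sql_messages(schema, question)` to `client.chat.completions.create`. For training data, passing `row["sql"]` as the third argument reproduces one record from the dataset cell, so both sides share a single definition of the prompt format.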

---

For more details, see the [SFT Training docs](https://docs.art-e.dev/fundamentals/sft-training). For distillation, see the [distillation notebook](https://github.com/OpenPipe/ART/blob/main/examples/sft/distillation.ipynb). Questions? Join the [Discord](https://discord.gg/zbBHRUpwf4)!