# Confidential Fine-tuning

Fine-tune an LLM on your data inside a trusted execution environment (TEE). Upload your dataset, run the cells in order, then download the fine-tuned model.

## 1. Configure

Upload your training data using the file browser (left sidebar), then set the path below.

Data format: JSONL (one JSON object per line), each with `instruction` and `response` fields:
```json
{"instruction": "What is 2+2?", "response": "4"}
{"instruction": "Explain gravity", "response": "Gravity is..."}
```
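
Because a single malformed line would make the loading cell fail partway through, it can help to check the file before training. The following is a minimal sketch, not part of the original notebook; the helper name `validate_jsonl_lines` and the rule that both fields must be strings are assumptions made for illustration.

```python
import json

REQUIRED_FIELDS = ("instruction", "response")


def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples for malformed records."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored rather than reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            # Assumption: both fields are expected to be plain strings
            if not isinstance(record.get(field), str):
                problems.append((number, f"missing or non-string field '{field}'"))
    return problems


sample = [
    '{"instruction": "What is 2+2?", "response": "4"}',
    '{"instruction": "Explain gravity"}',
    "not json",
]
for number, problem in validate_jsonl_lines(sample):
    print(f"line {number}: {problem}")
```

To check an uploaded file, pass the open file object instead of `sample`, for example `validate_jsonl_lines(open("data.jsonl", encoding="utf-8"))`.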

In [None]:
DATA_PATH = "data.jsonl"  # Path to your uploaded data
MODEL_NAME = "unsloth/Llama-3.2-1B-Instruct"
OUTPUT_DIR = "output"

## 2. Load Model

In [None]:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
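
To get a feel for how small the LoRA adapter is, the number of trainable parameters can be estimated by hand: each adapted linear layer gains two low-rank matrices with `r * (in_features + out_features)` parameters in total. The sketch below assumes the model dimensions shown in the constants (hidden size 2048, key/value projection width 512, 16 decoder layers); these are assumptions about `Llama-3.2-1B`, not values taken from this notebook, so compare the result with the trainable-parameter count reported when the cell above runs.

```python
def lora_param_count(r, in_features, out_features):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# Assumed model dimensions (not read from the checkpoint)
HIDDEN = 2048      # hidden size
KV_DIM = 512       # output width of k_proj and v_proj under grouped-query attention
NUM_LAYERS = 16    # number of decoder layers
R = 16             # LoRA rank, matching r=16 above

# (in_features, out_features) for each target module
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_param_count(R, i, o) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```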

## 3. Load Data

In [None]:
import json
from datasets import Dataset

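# Read one JSON object per line, skipping blank lines
with open(DATA_PATH, encoding="utf-8") as f:
    data = [json.loads(line) for line in f if line.strip()]

# Append the EOS token so the model learns where a response ends
formatted = [
    {"text": f"### Instruction:\n{item['instruction']}\n\n### Response:\n{item['response']}" + tokenizer.eos_token}
    for item in data
]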

dataset = Dataset.from_list(formatted)
print(f"Loaded {len(dataset)} examples")

## 4. Train

In [None]:
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir=OUTPUT_DIR,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size: 2 * 4 = 8
        num_train_epochs=1,
        learning_rate=2e-4,
        # Use bf16 where the GPU supports it, otherwise fall back to fp16
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        save_strategy="epoch",
    ),
    tokenizer=tokenizer,
)

trainer.train()
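
With `logging_steps=1`, the trainer logs once per optimizer step, so it is useful to know roughly how many steps to expect. The sketch below is an approximation on a single GPU: the helper `optimizer_steps` and the dataset size of 1000 are hypothetical, and the exact count can differ slightly between library versions depending on how the last partial batch is handled.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                    num_epochs=1, num_devices=1):
    """Return (effective batch size, approximate number of optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * num_epochs


# Batch size 2 and 4 accumulation steps match the training cell above;
# 1000 examples is a made-up dataset size for illustration.
effective_batch, steps = optimizer_steps(1000, per_device_batch_size=2, grad_accum_steps=4)
print(f"Effective batch size: {effective_batch}")
print(f"Approximate optimizer steps: {steps}")
```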

## 5. Save & Download

In [None]:
# Saves the LoRA adapter weights and tokenizer files, not a merged full model
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"LoRA adapter saved to {OUTPUT_DIR}/")
print("Use the file browser to download the output folder.")
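
When you later query the fine-tuned model, the prompt should follow the same template used to format the training data in section 3, ending right where the response would begin. A minimal sketch, where `build_prompt` is a hypothetical helper and not part of the original notebook:

```python
def build_prompt(instruction):
    """Format an instruction with the training template, leaving the response empty."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


prompt = build_prompt("Explain gravity")
print(prompt)
```

The resulting string would then be tokenized and passed to `model.generate`; the text the model produces after `### Response:` is its answer.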