# NurseSim-RL: Training a Triage Agent with Unsloth (Llama 3.2 Edition)

**OpenEnv Challenge Entry - 2026**

If you see `RuntimeError: Unsloth: No config file found`, the usual cause is that the Hugging Face token is not being detected or the repository name is slightly misspelled.

## Setup
- Google Colab (paid tier with an A100 or L4 GPU recommended)
- **Paste your Hugging Face token** into the `HF_TOKEN` variable in the code cell of Section 2 before running it.

## 1. Install Dependencies

In [None]:
%%capture
# Install/Upgrade Unsloth (2x faster fine-tuning)
!pip install --upgrade "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps trl peft accelerate bitsandbytes xformers

## 2. Load Llama-3.2-3B with Unsloth

In [None]:
from unsloth import FastLanguageModel
import torch
import os

# 1. PASTE YOUR HF TOKEN HERE
HF_TOKEN = "YOUR_HF_TOKEN_HERE"

# Configuration
max_seq_length = 2048
dtype = None  # None for auto detection
load_in_4bit = True

# Try different model names if one fails
# Option A: unsloth/Llama-3.2-3B-Instruct (Recommended)
# Option B: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
# Option C: unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=HF_TOKEN, # Explicitly pass the token to fix 'No config file' error
)

print(f"Model loaded: {model.config._name_or_path}")
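A common way to hit the "No config file" error is to run the cell above with the placeholder string still in place. The sketch below is one hedged way to catch that before calling `from_pretrained`. The helper name `resolve_hf_token` is hypothetical, and the fallback to an `HF_TOKEN` environment variable is an assumption about how you store your token, not something this notebook sets up.

```python
import os

PLACEHOLDER = "YOUR_HF_TOKEN_HERE"  # same placeholder string used in the cell above


def resolve_hf_token(pasted, env=None):
    """Return a usable token, or raise if none was supplied (hypothetical helper)."""
    env = os.environ if env is None else env
    # Prefer a pasted token; fall back to the environment if the placeholder is untouched.
    if pasted and pasted != PLACEHOLDER:
        token = pasted
    else:
        token = env.get("HF_TOKEN", "")  # assumed variable name
    token = token.strip()
    if not token:
        raise ValueError("No Hugging Face token found: paste one or set HF_TOKEN.")
    return token
```

You could then pass `token=resolve_hf_token(HF_TOKEN)` to `FastLanguageModel.from_pretrained`, so a missing token fails with a clear message instead of a config error.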

## 3. Add LoRA Adapters

In [None]:
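model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    # Adapt every attention and MLP projection
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,  # Scaling numerator; with r=16 the LoRA scale alpha/r is 1
    lora_dropout=0,  # No dropout on the adapter layers
    bias="none",  # Leave bias terms frozen
    use_gradient_checkpointing="unsloth",  # Unsloth's checkpointing mode, trades compute for memory
    random_state=42,  # Seed for reproducible adapter initialization
)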

print("LoRA adapters added!")
model.print_trainable_parameters()
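As a sanity check on what `print_trainable_parameters()` reports, the adapter size can be estimated by hand: each adapted linear layer gains two low-rank matrices holding `r * (in_features + out_features)` weights. The sketch below does that arithmetic. The layer shapes are assumptions about Llama-3.2-3B (hidden size 3072, MLP size 8192, key/value width 1024, 28 layers) and do not come from this notebook, so treat the result as an estimate to compare against the printed value.

```python
# Assumed Llama-3.2-3B shapes (not stated in this notebook): (in_features, out_features)
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 3072, 8192, 1024, 28
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, shapes=MODULE_SHAPES, num_layers=NUM_LAYERS):
    # Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights.
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


r, lora_alpha = 16, 16
print(f"Estimated trainable LoRA parameters: {lora_param_count(r):,}")
print(f"LoRA scaling (lora_alpha / r): {lora_alpha / r}")
```

Under these assumed shapes the estimate is 24,313,856 parameters, and halving `r` halves the count.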

## 4. Prepare Training Dataset

Upload `train.jsonl` from your local machine to the Colab environment before running this cell. Each record needs `instruction`, `input`, and `output` fields.
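The formatting code in this section reads the `instruction`, `input`, and `output` fields of every record, so a malformed line only surfaces as an error partway through the notebook. A minimal pre-flight check, sketched here with a hypothetical helper `find_bad_records`, can report problem lines up front. It uses only the standard library and assumes one JSON object per line.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def find_bad_records(lines):
    """Return (line_number, problem) pairs for records the formatter could not use."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((line_number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "record is not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((line_number, f"missing keys: {sorted(missing)}"))
    return problems
```

In Colab you might call it as `with open("train.jsonl", encoding="utf-8") as f: print(find_bad_records(f))`; an empty list means every record has the three fields.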

In [None]:
from datasets import load_dataset
import os

# Fail fast if train.jsonl is missing; the code below needs `dataset` to exist
if not os.path.exists("train.jsonl"):
    raise FileNotFoundError("train.jsonl not found. Upload it via the 'Files' sidebar, then rerun this cell.")

dataset = load_dataset("json", data_files="train.jsonl", split="train")

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

EOS_TOKEN = tokenizer.eos_token

def format_prompts(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction=instruction, input=input_text, output=output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = dataset.map(format_prompts, batched=True)
print(f"Dataset ready with {len(dataset)} examples")

## 5. Training Configuration

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False, 
    args=TrainingArguments(
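        per_device_train_batch_size=8,  # Sized for A100/L4 memory; reduce on smaller GPUs
        gradient_accumulation_steps=4,  # Effective batch size: 8 * 4 = 32
        warmup_steps=10,
        max_steps=100,  # Short run; takes precedence over num_train_epochs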
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=42,
        output_dir="outputs",
    ),
)
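To judge whether `max_steps=100` is enough for your data, it helps to work out how many examples the run actually sees. The sketch below does that arithmetic for the settings above. The function name `training_budget` and the dataset size of 1,000 are illustrative assumptions; substitute `len(dataset)` in the notebook.

```python
def training_budget(num_examples, per_device_batch_size=8, grad_accum_steps=4,
                    max_steps=100, num_devices=1):
    """Rough count of examples consumed by a run (hypothetical helper)."""
    # One optimizer step consumes batch_size * accumulation_steps examples per device.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    examples_seen = effective_batch * max_steps
    return {
        "effective_batch_size": effective_batch,
        "examples_seen": examples_seen,
        "approx_epochs": examples_seen / num_examples,
    }


# 1,000 is a made-up dataset size used only for illustration.
print(training_budget(num_examples=1000))
```

With these defaults the run processes 3,200 examples, so a 1,000-example dataset would be seen about 3.2 times.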

## 6. Train!

In [None]:
trainer_stats = trainer.train()
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f} seconds")

## 7. Save & Test

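This cell saves only the LoRA adapter weights and the tokenizer files to `nursesim_lora_llama3`; the base model weights are not included.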

In [None]:
model.save_pretrained("nursesim_lora_llama3")
tokenizer.save_pretrained("nursesim_lora_llama3")
print("Model saved to 'nursesim_lora_llama3'")
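For a quick test of the fine-tuned model, the prompt should match the training template but stop right after `### Response:` so the model fills in the answer. The sketch below builds such a prompt and trims the decoded output. The helper names and the sample instruction and input are hypothetical, and the commented generation lines assume the `model` and `tokenizer` from this notebook are still in memory on a GPU.

```python
# Same Alpaca-style template used for training, minus the {output} slot.
INFERENCE_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""


def build_inference_prompt(instruction, input_text):
    return INFERENCE_PROMPT.format(instruction=instruction, input=input_text)


def extract_response(decoded, eos_token=""):
    # Keep only the text after the last response marker and drop the EOS token.
    answer = decoded.rsplit("### Response:", 1)[-1]
    if eos_token:
        answer = answer.replace(eos_token, "")
    return answer.strip()


# Hypothetical example; use an instruction and input shaped like your train.jsonl records.
prompt = build_inference_prompt(
    "Assign a triage category to the patient.",
    "Adult with chest pain and shortness of breath.",
)

# With the fine-tuned model in memory, generation might look like this (needs a GPU):
# FastLanguageModel.for_inference(model)
# inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
# outputs = model.generate(**inputs, max_new_tokens=64)
# print(extract_response(tokenizer.batch_decode(outputs)[0], tokenizer.eos_token))
```

Keeping the inference prompt identical to the training template matters because the adapters were trained to continue text in exactly that layout.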