# Toolset Training on Nebius JupyterHub

This notebook walks through supervised fine-tuning (SFT) on Nebius AI Cloud using JupyterHub and shows where optional KTO refinement fits in.

## Prerequisites
- Nebius JupyterHub instance with H100/H200 GPU
- Training repository uploaded to `/workspace/Toolset-Training/`
- Datasets uploaded to `/workspace/Toolset-Training/Datasets/`
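
Because the datasets are JSONL files (one JSON object per line), a quick parse check after upload can catch truncated or corrupted transfers before any GPU time is spent. The following is a minimal sketch; `validate_jsonl` is a hypothetical helper and not part of the training repository.

```python
import json
from pathlib import Path


def validate_jsonl(path):
    """Return (valid_count, bad_line_numbers) for a JSONL file.

    A line counts as valid when it parses as a JSON object.
    Blank lines are skipped. Line numbers are 1-based.
    """
    valid = 0
    bad_lines = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(line_number)
                continue
            if isinstance(record, dict):
                valid += 1
            else:
                bad_lines.append(line_number)  # valid JSON, but not an object
    return valid, bad_lines
```

Once `DATASETS` is defined in section 2, a call such as `validate_jsonl(DATASETS / "syngen_tools_sft_11.18.25.jsonl")` should return an empty list of bad lines for a clean upload.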

## 1. Environment Setup

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install Unsloth and dependencies
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" -q
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes -q
!pip install wandb python-dotenv -q

In [None]:
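# Verify installation
# Import unsloth before transformers/trl so its patches are applied first
import unsloth
import torch
from transformers import __version__ as transformers_version
from trl import __version__ as trl_version

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
# get_device_name raises if no CUDA device is visible, so guard the call
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("GPU: none detected (the training cells below need a CUDA device)")
print(f"Transformers version: {transformers_version}")
print(f"TRL version: {trl_version}")
print("Unsloth installed: ✓")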

## 2. Configure Paths and Settings

In [None]:
import sys
import os
from pathlib import Path

# Add training modules to path
WORKSPACE = Path("/workspace/Toolset-Training")
SFT_TRAINER = WORKSPACE / "Trainers/rtx3090_sft"
KTO_TRAINER = WORKSPACE / "Trainers/rtx3090_kto"
DATASETS = WORKSPACE / "Datasets"

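# Only the SFT trainer goes on sys.path for now. Both trainers have a
# top-level `configs` package, so inserting KTO_TRAINER here as well would
# shadow the SFT modules. Section 6 adds the KTO path when it is needed.
sys.path.insert(0, str(SFT_TRAINER))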

# Verify paths
print(f"SFT Trainer: {SFT_TRAINER.exists()}")
print(f"KTO Trainer: {KTO_TRAINER.exists()}")
print(f"Datasets: {DATASETS.exists()}")

# List available datasets
print("\nAvailable datasets:")
for dataset in DATASETS.glob("*.jsonl"):
    print(f"  - {dataset.name}")

In [None]:
# Optional: Configure W&B for experiment tracking
import wandb

# Set your W&B API key
os.environ["WANDB_API_KEY"] = "your-wandb-key-here"  # Replace with your key

# Or login interactively
# wandb.login()

## 3. SFT Training (Supervised Fine-Tuning)

Train the model to learn tool-calling behavior from positive examples.

In [None]:
# Import SFT training modules
os.chdir(SFT_TRAINER)

from configs.training_config import get_7b_config, ModelConfig, LoRAConfig, SFTTrainingConfig, DatasetConfig
from src.model_loader import load_model_and_tokenizer
from src.data_loader import prepare_dataset
from src.training_callbacks import MetricsTableCallback

In [None]:
# Configure SFT training
sft_config = get_7b_config()

# Update dataset path
sft_config.dataset_config.local_file = str(DATASETS / "syngen_tools_sft_11.18.25.jsonl")

# Optional: Adjust for H100 (you can increase batch size!)
sft_config.training_config.per_device_train_batch_size = 8  # Up from 6
sft_config.training_config.gradient_accumulation_steps = 3  # Effective batch = 24

# Optional: Enable W&B
# sft_config.training_config.report_to = ["wandb"]
# sft_config.training_config.run_name = "nebius-sft-7b"

print("SFT Configuration:")
print(f"  Model: {sft_config.model_config.model_name}")
print(f"  Dataset: {sft_config.dataset_config.local_file}")
print(f"  Batch size: {sft_config.training_config.per_device_train_batch_size}")
print(f"  Gradient accumulation: {sft_config.training_config.gradient_accumulation_steps}")
print(f"  Effective batch size: {sft_config.training_config.per_device_train_batch_size * sft_config.training_config.gradient_accumulation_steps}")
print(f"  Learning rate: {sft_config.training_config.learning_rate}")
print(f"  Epochs: {sft_config.training_config.num_train_epochs}")
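
The effective batch size also determines how many optimizer steps a run takes, which matters when choosing `save_steps` and `warmup_steps`. The sketch below shows the arithmetic. The dataset size and epoch count are hypothetical placeholders, and the exact step count reported by the trainer may differ slightly depending on how the installed `transformers` version rounds partial batches.

```python
import math


def estimate_optimizer_steps(num_examples, per_device_batch, grad_accum,
                             epochs, num_gpus=1):
    """Approximate the effective batch size and total optimizer steps."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Hypothetical dataset size and epoch count, for illustration only.
effective, total_steps = estimate_optimizer_steps(
    num_examples=2400, per_device_batch=8, grad_accum=3, epochs=3
)
print(f"Effective batch size: {effective}")
print(f"Approximate optimizer steps: {total_steps}")
```

With these placeholder numbers the run takes roughly 300 optimizer steps, so saving a checkpoint every 100 steps would produce about three checkpoints.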

In [None]:
# Load model and tokenizer
print("Loading model and tokenizer...")
model, tokenizer = load_model_and_tokenizer(
    sft_config.model_config,
    sft_config.lora_config
)
print("✓ Model loaded")

In [None]:
# Prepare dataset
print("Preparing dataset...")
train_dataset = prepare_dataset(sft_config.dataset_config, tokenizer)
print(f"✓ Dataset loaded: {len(train_dataset)} examples")

In [None]:
# Setup output directory
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_dir = SFT_TRAINER / f"sft_output_nebius/{timestamp}"
output_dir.mkdir(parents=True, exist_ok=True)

sft_config.training_config.output_dir = str(output_dir)
sft_config.training_config.logging_dir = str(output_dir / "logs")

print(f"Output directory: {output_dir}")

In [None]:
# Train!
from trl import SFTTrainer
from transformers import TrainingArguments

print("Starting SFT training...")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    args=TrainingArguments(
        output_dir=sft_config.training_config.output_dir,
        per_device_train_batch_size=sft_config.training_config.per_device_train_batch_size,
        gradient_accumulation_steps=sft_config.training_config.gradient_accumulation_steps,
        learning_rate=sft_config.training_config.learning_rate,
        num_train_epochs=sft_config.training_config.num_train_epochs,
        logging_steps=sft_config.training_config.logging_steps,
        save_strategy="steps",
        save_steps=100,
        max_grad_norm=sft_config.training_config.max_grad_norm,
        warmup_steps=sft_config.training_config.warmup_steps,
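        # H100/H200 support bfloat16; fall back to fp16 only on GPUs without it
        bf16=torch.cuda.is_bf16_supported(),
        fp16=not torch.cuda.is_bf16_supported(),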
        logging_dir=sft_config.training_config.logging_dir,
    ),
    dataset_text_field="text",
    max_seq_length=sft_config.training_config.max_seq_length,
)

# Train
trainer.train()

print("\n✓ Training complete!")

In [None]:
# Save final model
final_model_dir = output_dir / "final_model"
final_model_dir.mkdir(exist_ok=True)

model.save_pretrained(str(final_model_dir))
tokenizer.save_pretrained(str(final_model_dir))

print(f"✓ Model saved to: {final_model_dir}")

## 4. Test the Model

In [None]:
# Quick inference test
from unsloth import FastLanguageModel

# Enable inference mode
FastLanguageModel.for_inference(model)

# Test prompt
test_prompt = """Create a new note titled 'Meeting Notes' with content about the quarterly review."""

inputs = tokenizer(
    test_prompt,
    return_tensors="pt",
    padding=True,
    truncation=True
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

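# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)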
print("\nModel Response:")
print("="*80)
print(response)
print("="*80)

## 5. Upload to HuggingFace (Optional)

In [None]:
# Set HuggingFace token
os.environ["HF_TOKEN"] = "hf_your_token_here"  # Replace with your token

# Or login interactively
# from huggingface_hub import login
# login()

In [None]:
# Upload using the existing upload script
# Note: Adjust paths based on where you saved the model

!python {SFT_TRAINER}/src/upload_to_hf.py \
  {final_model_dir} \
  your-username/toolset-sft-7b-nebius \
  --save-method merged_16bit \
  --create-gguf

## 6. KTO Training (Optional Refinement)

After SFT, you can optionally refine with KTO using preference learning.
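
Unlike SFT, KTO learns from unpaired examples that are each marked as desirable or undesirable. TRL's `KTOTrainer` reads these from `prompt`, `completion`, and `label` columns. The sketch below shows one way to shape such rows. The input triples and the tool-call strings are hypothetical, and the repository's own KTO data loader may expect a different source layout.

```python
def to_kto_records(examples):
    """Map (prompt, response, is_desirable) triples to KTO-style rows."""
    return [
        {"prompt": prompt, "completion": response, "label": bool(is_desirable)}
        for prompt, response, is_desirable in examples
    ]


def label_balance(records):
    """Return (desirable_count, undesirable_count)."""
    desirable = sum(1 for record in records if record["label"])
    return desirable, len(records) - desirable


# Hypothetical examples: one good tool call, one refusal to use the tool.
examples = [
    ("Create a note titled 'Meeting Notes'.",
     '{"tool": "create_note", "title": "Meeting Notes"}', True),
    ("Create a note titled 'Meeting Notes'.",
     "Sorry, I cannot create notes.", False),
]
records = to_kto_records(examples)
print(records[0]["label"], records[1]["label"])
print(label_balance(records))
```

Checking the label balance first is useful because the KTO configuration in TRL exposes separate weights for desirable and undesirable examples, which are typically adjusted when the two classes are uneven.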

In [None]:
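# Switch to KTO trainer directory
os.chdir(KTO_TRAINER)

# The SFT cells already imported `configs` and `src`, and Python caches
# modules by name. Drop the cached SFT copies and put the KTO trainer first
# on sys.path so the import below resolves to the KTO modules.
for name in list(sys.modules):
    if name.split(".")[0] in ("configs", "src"):
        del sys.modules[name]
if str(KTO_TRAINER) in sys.path:
    sys.path.remove(str(KTO_TRAINER))
sys.path.insert(0, str(KTO_TRAINER))

from configs.training_config import get_7b_config as get_kto_7b_config
# ... (similar setup as SFT, but with KTO dataset and trainer)

print("KTO training setup would go here...")
print("See the full KTO training script in Trainers/rtx3090_kto/train_kto.py")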

## 7. Monitor GPU Usage

In [None]:
# Check GPU memory usage
!nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv

In [None]:
# View training logs
!tail -n 50 {output_dir}/logs/training_*.jsonl

## Summary

This notebook demonstrates:
1. ✓ Environment setup on Nebius JupyterHub
2. ✓ SFT training with your existing pipeline
3. ✓ Model testing and inference
4. ✓ Uploading to HuggingFace

**Next Steps:**
- Run KTO training for refinement
- Experiment with different hyperparameters
- Use W&B for experiment tracking
- Try multi-GPU training (if your instance has more than one GPU)

**Estimated Cost (H100 Explorer Tier at $1.50/hour):**
- SFT Training (45 min): ~$1.13
- KTO Training (15 min): ~$0.38
- **Total: ~$1.50 for complete pipeline**
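
The estimates above are simply the hourly rate multiplied by the run time. A small sketch of that arithmetic follows, using the $1.50/hour rate and the durations quoted above; `stage_cost` and `to_cents` are illustrative helper names. `Decimal` is used so that amounts ending in half a cent round up as expected.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("1.50")  # USD per hour, the rate quoted above


def stage_cost(minutes, hourly_rate=HOURLY_RATE):
    """Unrounded cost in USD for a stage lasting `minutes`."""
    return hourly_rate * Decimal(minutes) / Decimal(60)


def to_cents(amount):
    """Round a USD amount to the nearest cent, with halves rounding up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


sft = stage_cost(45)
kto = stage_cost(15)
print(f"SFT:   ${to_cents(sft)}")
print(f"KTO:   ${to_cents(kto)}")
print(f"Total: ${to_cents(sft + kto)}")
```

The total is computed from the unrounded stage costs, which is why it comes to $1.50 rather than the $1.51 obtained by adding the two rounded figures.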