# Fine-Tuning SahabatAI for the Nuzantara Domain

**Goal**: Adapt SahabatAI Gemma2-9B to the Indonesian business advisory domain (KITAS, PT PMA, tax)

**GPU Required**: A100 40GB (available on Colab Pro)

**Estimated time**: 2-4 hours

---

## Setup Instructions

1. **Runtime → Change runtime type → A100 GPU**
2. **Run cells in order** (Shift+Enter)
3. **Upload your dataset** when prompted
4. **Download the fine-tuned model** at the end

## Step 1: Environment Setup

Install dependencies (takes ~5 minutes)

In [None]:
%%capture
# Install Unsloth for fast fine-tuning (2x faster, 63% less VRAM)
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps trl peft accelerate bitsandbytes

In [None]:
# Verify GPU
!nvidia-smi

## Step 2: Load SahabatAI Base Model

Download SahabatAI Gemma2-9B (takes ~10 minutes first time)

In [None]:
from unsloth import FastLanguageModel
import torch

# Model configuration
model_name = "GoToCompany/gemma2-9b-cpt-sahabatai-v1-instruct"
max_seq_length = 2048  # Max context length
dtype = None  # Auto-detect
load_in_4bit = True  # QLoRA for memory efficiency

print("🇮🇩 Loading SahabatAI base model...")
print(f"   Model: {model_name}")
print(f"   Max sequence: {max_seq_length}")
print(f"   4-bit quantization: {load_in_4bit}")
print()

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print("✅ Model loaded successfully!")
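
As a rough back-of-the-envelope check on why 4-bit loading matters here, weight memory scales with the number of bits per parameter. The sketch below assumes a nominal 9 billion parameters and counts weights only; it ignores quantization metadata, activations, LoRA weights, and optimizer state, so real usage will be higher. `weight_memory_gb` is a hypothetical helper, not part of Unsloth.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: a nominal 9e9 parameters for a "9B" model.
NOMINAL_PARAMS = 9e9

for bits in (16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB")
```

The output of `nvidia-smi` after loading remains the authoritative measure of actual memory use.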

## Step 3: Configure LoRA Adapters

LoRA trains only a small fraction of the model's parameters; the cell below prints the exact trainable count and percentage.

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (16 = good balance)
    lora_alpha=32,  # LoRA scaling
    lora_dropout=0.05,  # Dropout for regularization
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",  # Memory optimization
    random_state=42,
)

# Count parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())

print("✅ LoRA adapters configured")
print(f"   Trainable parameters: {trainable:,}")
print(f"   Total parameters: {total:,}")
print(f"   Trainable %: {100 * trainable / total:.3f}%")
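
The trainable count printed above can be sanity-checked by hand: LoRA adds two small matrices to each adapted linear layer, contributing `rank * (in_features + out_features)` parameters. The sketch below applies that formula to the seven target modules. The layer shapes are assumptions about Gemma2-9B and should be verified against `model.config`; `lora_params` is a hypothetical helper.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each adapted linear layer."""
    return rank * (in_features + out_features)


# Assumed Gemma2-9B shapes (verify against model.config before relying on them).
HIDDEN = 3584
NUM_LAYERS = 42
Q_DIM = 16 * 256       # attention heads x head_dim
KV_DIM = 8 * 256       # key/value heads x head_dim
INTERMEDIATE = 14336

# (in_features, out_features) for each target module in one decoder layer
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

RANK = 16
per_layer = sum(lora_params(i, o, RANK) for i, o in MODULE_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
print(f"Per layer: {per_layer:,}  Total: {total_trainable:,}")
```

If the printed trainable count differs from this estimate, the assumed shapes are the first thing to check. The total reported for a 4-bit model may also reflect packed quantized storage, so the percentage need not match a calculation based on the nominal parameter count.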

## Step 4: Upload & Prepare Dataset

**Dataset Format** (JSON):
```json
[
  {
    "instruction": "System prompt: Kamu adalah ZANTARA...",
    "input": "Saya mau buka PT PMA di Bali, prosesnya gimana?",
    "output": "Wah bagus nih! Untuk PT PMA di Bali..."
  },
  ...
]
```
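
Before uploading, it can help to confirm that every record matches this format, since a missing key would only surface later as a `KeyError` during formatting. The following is a minimal sketch using only the standard library; `find_invalid_records` is a hypothetical helper, and the sample records are made-up placeholders.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def find_invalid_records(records):
    """Return (index, reason) pairs for records that do not match the format."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str):
                problems.append((i, f"'{key}' missing or not a string"))
            elif key == "output" and not value.strip():
                problems.append((i, "'output' is empty"))
    return problems


# Placeholder records: the second one lacks an "output" field
sample = json.loads("""
[
  {"instruction": "Kamu adalah ZANTARA...", "input": "Apa itu KITAS?", "output": "KITAS adalah..."},
  {"instruction": "Kamu adalah ZANTARA...", "input": "Apa itu PT PMA?"}
]
""")
print(find_invalid_records(sample))
```

An empty result list means every record has the three expected string fields.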

In [None]:
# Upload your dataset file
from google.colab import files
import json

print("📤 Upload your training dataset (JSON file)")
print("   Expected format: [{'instruction': ..., 'input': ..., 'output': ...}, ...]")
print()

uploaded = files.upload()
dataset_filename = list(uploaded.keys())[0]

print(f"\n✅ Dataset uploaded: {dataset_filename}")

In [None]:
# Load and format dataset
from datasets import Dataset

with open(dataset_filename, 'r', encoding='utf-8') as f:
    raw_data = json.load(f)

print(f"📊 Dataset loaded: {len(raw_data)} examples")
print()

# Format for Gemma2 chat template
def format_example(example):
    """Format single example for Gemma2"""
    prompt = f"""<start_of_turn>user
{example['instruction']}

{example['input']}<end_of_turn>
<start_of_turn>model
{example['output']}<end_of_turn>"""
    return {"text": prompt}

# Format all examples
formatted_data = [format_example(ex) for ex in raw_data]
dataset = Dataset.from_list(formatted_data)

print("✅ Dataset formatted for training")
print(f"   Total examples: {len(dataset)}")
print()

# Show first example
print("📝 First training example:")
print("="*80)
print(dataset[0]['text'][:500] + "...")
print("="*80)

## Step 5: Configure Training Parameters

In [None]:
from transformers import TrainingArguments
from trl import SFTTrainer

# Training configuration
training_args = TrainingArguments(
    output_dir="./nuzantara-sahabatai-lora",
    
    # Training hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    
    # Precision
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    
    # Optimizer
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    
    # Logging & checkpoints
    logging_steps=10,
    save_steps=100,
    save_total_limit=3,
    
    # Other
    seed=42,
    report_to="none",
)

print("✅ Training configuration set")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Batch size: {training_args.per_device_train_batch_size}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   Gradient accumulation: {training_args.gradient_accumulation_steps}")
print(f"   Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
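
To make the `lr_scheduler_type="cosine"` and `warmup_steps=10` settings concrete, the sketch below computes the learning rate at a given step using the usual shape: linear warmup from zero to the base rate, then a half-cosine decay to zero. It is an illustration in plain Python, not the scheduler object the Trainer builds, and the total step count of 200 is a hypothetical value.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 2e-4, warmup_steps: int = 10) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 200  # hypothetical; depends on dataset size
for step in (0, 5, 10, 105, 200):
    print(f"step {step:>3}: lr = {lr_at_step(step, TOTAL_STEPS):.2e}")
```

With very small datasets the total step count can approach the warmup length, in which case most of training happens at a reduced learning rate.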

## Step 6: Initialize Trainer

In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=training_args,
)

print("✅ Trainer initialized")
print()
print("📊 Training stats:")
print(f"   Total examples: {len(dataset)}")
print(f"   Total steps: {len(dataset) // (training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps) * training_args.num_train_epochs}")
print(f"   Estimated time: 2-4 hours on A100")
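
The step count printed above uses integer division, so it drops any partial batch. A slightly more careful estimate rounds partial batches up, as sketched below. This assumes a single GPU, and the Trainer's own rounding may differ by a step per epoch depending on the library version; `estimate_optimizer_steps` is a hypothetical helper and the dataset size of 1,000 is illustrative.

```python
import math


def estimate_optimizer_steps(n_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Estimate optimizer updates, rounding partial batches up."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    updates_per_epoch = max(1, math.ceil(batches_per_epoch / grad_accum))
    return updates_per_epoch * epochs


# Hypothetical dataset of 1,000 examples with the settings from Step 5
print(estimate_optimizer_steps(1000, batch_size=4, grad_accum=4, epochs=3))
```

The progress bar shown once training starts reports the step count the Trainer actually uses.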

## Step 7: Start Training! 🚀

This will take 2-4 hours. Monitor the loss - it should decrease steadily.

In [None]:
print("🚀 Starting training...")
print("="*80)
print()

# Train!
trainer.train()

print()
print("="*80)
print("✅ Training complete!")
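
One way to check that the loss decreased, beyond watching the log, is to compare the average of the first few logged losses with the average of the last few. The sketch below does this for a list of dicts shaped like `trainer.state.log_history`; `loss_trend` is a hypothetical helper and the history values here are synthetic.

```python
from statistics import mean


def loss_trend(log_history, window: int = 5):
    """Compare mean loss over the first and last `window` logged entries."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2 * window:
        return None  # not enough logged points to compare
    early, late = mean(losses[:window]), mean(losses[-window:])
    return {"early": early, "late": late, "improved": late < early}


# Synthetic entries shaped like trainer.state.log_history (values are made up)
fake_history = [{"step": 10 * (i + 1), "loss": 2.0 - 0.1 * i} for i in range(12)]
fake_history.append({"train_runtime": 123.0})  # final summary entry has no "loss"
print(loss_trend(fake_history))
```

After a real run, the same function can be called as `loss_trend(trainer.state.log_history)`.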

## Step 8: Save Fine-Tuned Model

In [None]:
# Save LoRA adapters
output_dir = "./nuzantara-sahabatai-lora-final"

print(f"💾 Saving fine-tuned model to: {output_dir}")

model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print("✅ Model saved!")

## Step 9: Test Fine-Tuned Model

Try some queries to see how it performs!

In [None]:
# Enable inference mode
FastLanguageModel.for_inference(model)

def test_query(query: str):
    """Test model with a query"""
    
    system_prompt = """Kamu adalah ZANTARA, asisten bisnis yang membantu orang asing dengan urusan bisnis, visa, dan legal di Indonesia.

PENTING:
- Gunakan bahasa Indonesia yang natural dan conversational
- Boleh pakai slang umum kalau konteksnya casual
- Fokus: KITAS, PT PMA, tax, visa, business setup"""

    prompt = f"""<start_of_turn>user
{system_prompt}

{query}<end_of_turn>
<start_of_turn>model
"""

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
    
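    # Decode only the newly generated tokens. Decoding the full sequence with
    # skip_special_tokens=True can strip the turn markers, so splitting on
    # "<start_of_turn>model" would return the prompt along with the answer.
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()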
    
    return response

print("✅ Test function ready")
print("\nTry it with: test_query('Your Indonesian question here')")

In [None]:
# Test queries
test_queries = [
    "Saya mau buka usaha kopi di Bali. KBLI apa yang cocok?",
    "Berapa lama proses KITAS investor?",
    "Gimana cara bikin PT PMA? Ribet ga?",
]

for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*80}")
    print(f"Test {i}/{len(test_queries)}")
    print(f"{'='*80}")
    print(f"Query: {query}")
    print()
    
    response = test_query(query)
    print(f"Response:\n{response}")

## Step 10: Download Fine-Tuned Model

Download to your local machine for deployment

In [None]:
# Zip the model directory
!zip -r nuzantara-sahabatai-lora-final.zip ./nuzantara-sahabatai-lora-final

print("📦 Model zipped")
print("\n⬇️ Downloading...")

# Download
files.download('nuzantara-sahabatai-lora-final.zip')

print("\n✅ Download complete!")
print("\nNext steps:")
print("1. Extract zip file")
print("2. Upload to your server")
print("3. Load with: FastLanguageModel.from_pretrained('./nuzantara-sahabatai-lora-final')")
print("4. A/B test vs base SahabatAI")
print("5. If better → deploy to production!")

---

## Summary

**What you did**:
- ✅ Loaded SahabatAI Gemma2-9B (natural Indonesian base)
- ✅ Configured LoRA adapters (parameter-efficient)
- ✅ Trained on your Nuzantara dataset
- ✅ Saved fine-tuned model
- ✅ Tested results
- ✅ Downloaded for deployment

**Result**:
SahabatAI base (natural Indonesian) + your domain expertise (KITAS, PT PMA, tax) = Best AI for Indonesian business advisory! 🏆

**Next**:
1. Test with Indonesian team
2. Compare vs base SahabatAI
3. If better → deploy to production
4. Collect feedback ‚Üí iterate

---

## Troubleshooting

**Out of memory**:
- Reduce `per_device_train_batch_size` to 2
- Increase `gradient_accumulation_steps` to 8
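
The two changes above go together: halving the per-device batch while doubling the accumulation steps keeps the effective batch size at 16, so the optimization behaves much the same while peak memory drops. A small sketch of that arithmetic, with `rebalance_accumulation` as a hypothetical helper:

```python
def rebalance_accumulation(batch_size: int, grad_accum: int, new_batch_size: int) -> int:
    """Return the accumulation steps that keep batch_size * grad_accum unchanged."""
    effective = batch_size * grad_accum
    if effective % new_batch_size != 0:
        raise ValueError("new_batch_size must divide the effective batch size")
    return effective // new_batch_size


# Settings from Step 5 (4 x 4 = 16), reduced to a per-device batch of 2
print(rebalance_accumulation(batch_size=4, grad_accum=4, new_batch_size=2))
```

If memory is still exhausted at a batch size of 2, the same function gives the accumulation value for a batch size of 1.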

**Training too slow**:
- Make sure you selected A100 GPU (not T4)
- Runtime → Change runtime type → A100

**Model not improving**:
- Check dataset quality (naturalness, accuracy)
- Increase epochs to 5
- Adjust learning rate (try 1e-4)

**Questions?**:
- Check guide: `docs/guides/FINE_TUNING_SAHABATAI_NUZANTARA.md`
- Review training script: `apps/backend-rag/backend/llm/train_nuzantara_sahabatai.py`