# LendSafe: Fine-tune Granite Model on Google Colab

This notebook fine-tunes IBM Granite 3.1 3B for loan explanation generation.

**⚡ OPTIMIZED FOR T4 GPU (15GB)**

**Setup:**
1. Runtime → Change runtime type → GPU (T4)
2. Run all cells in order
3. Training takes 20-40 minutes
4. Download the fine-tuned model at the end

**Memory Optimizations:**
- Batch size: 1 (effective: 8 with gradient accumulation)
- Sequence length: 256 tokens
- LoRA rank: 8 (smaller adapters)
- Gradient checkpointing enabled
- FP16 mixed precision training

**If you still get OOM errors:**
- Reduce `MAX_LENGTH` to 128 in the configuration cell (Section 3)
- Or use A100 GPU (Colab Pro: $9.99/mo)

## 1. Install Dependencies

In [None]:
!pip install -q torch transformers accelerate peft datasets

## 2. Upload Training Data

Upload your `training_examples.jsonl` file from the LendSafe project.

In [None]:
from google.colab import files
import os

print("📤 Upload your training_examples.jsonl file")
uploaded = files.upload()

# Verify upload
if 'training_examples.jsonl' in uploaded:
    print("✅ Training data uploaded successfully!")
    print(f"   File size: {len(uploaded['training_examples.jsonl']) / 1024:.1f} KB")
else:
    print("❌ Please upload training_examples.jsonl")
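
Before training, it may help to confirm that every line of the uploaded file is valid JSON and carries the three keys that the prompt formatter in Section 4 reads (`instruction`, `input`, `output`). The helper below is a minimal sketch of such a check; `validate_jsonl` and `REQUIRED_FIELDS` are illustrative names, not part of the LendSafe project.

```python
import json

# Keys read by the prompt formatter in Section 4.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored by this check
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append((line_number, f"invalid JSON: {error.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "not a JSON object"))
                continue
            missing = [field for field in REQUIRED_FIELDS if field not in record]
            if missing:
                problems.append((line_number, f"missing fields: {', '.join(missing)}"))
    return problems
```

Calling `validate_jsonl("training_examples.jsonl")` right after the upload and printing the result gives a quick list of lines to fix before the dataset is loaded.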

## 3. Load Model and Configure LoRA

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

print("🔧 Configuration - Optimized for T4 GPU (15GB)")
MODEL_ID = "ibm-granite/granite-3.1-3b-a800m-instruct"
MAX_LENGTH = 256  # Reduced to save memory
BATCH_SIZE = 1    # Small batch for T4 GPU
GRADIENT_ACCUMULATION = 8  # Effective batch size = 8
LEARNING_RATE = 2e-4
NUM_EPOCHS = 3

# Check GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"✅ Using device: {device}")
if device == "cuda":
    print(f"   GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    
# Clear GPU cache
if device == "cuda":
    torch.cuda.empty_cache()
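
To see what these settings imply for the length of a run, the arithmetic can be sketched as below. The 1,500-example figure comes from the Training Summary and the 10% validation split from Section 4. The rounding is an assumption, so the `Trainer` may report a step count that differs by one or two. `training_schedule` is an illustrative helper, not part of the notebook.

```python
import math


def training_schedule(num_examples, val_fraction, batch_size, grad_accum, epochs):
    """Estimate split sizes and optimizer steps for a run (rounding is approximate)."""
    val_size = round(num_examples * val_fraction)
    train_size = num_examples - val_size
    effective_batch = batch_size * grad_accum  # examples per optimizer update
    steps_per_epoch = math.ceil(train_size / effective_batch)
    return {
        "train_size": train_size,
        "val_size": val_size,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


# Values taken from the configuration above and the Training Summary.
schedule = training_schedule(1500, 0.1, batch_size=1, grad_accum=8, epochs=3)
print(schedule)
```

Under these assumptions the run is roughly 500 optimizer steps, so the 50 warmup steps configured in Section 5 cover about the first tenth of training.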

In [None]:
# Load model and tokenizer with memory optimization
print("📥 Loading IBM Granite 3B model...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load weights in FP16 to save memory (no 8-bit quantization is applied here)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # FP16 for memory efficiency
    device_map="auto",           # Automatic device placement
    trust_remote_code=True,
    low_cpu_mem_usage=True       # Reduce CPU memory during loading
)

print(f"✅ Model loaded: {model.num_parameters():,} parameters")

# Clear cache again
if torch.cuda.is_available():
    torch.cuda.empty_cache()

In [None]:
# Configure LoRA - smaller rank to save memory
print("🔧 Configuring LoRA...")
lora_config = LoraConfig(
    r=8,              # Reduced from 16 to save memory
    lora_alpha=16,    # Reduced proportionally
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Only attention layers
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
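print("✅ LoRA configured:")
print(f"   Trainable params: {trainable:,} ({100*trainable/total:.2f}%)")
print(f"   Total params: {total:,}")

# Enable gradient checkpointing to save memory
model.gradient_checkpointing_enable()
# With a frozen base model, checkpointed blocks need inputs that require grad;
# without this the backward pass can fail because no input carries a gradient.
model.enable_input_require_grads()
print("✅ Gradient checkpointing enabled (lower activation memory, slower steps)")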
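
The trainable-parameter count printed above follows from how LoRA works: each adapted linear layer of shape `d_out x d_in` gains two small matrices of shapes `r x d_in` and `d_out x r`, and the adapter output is scaled by `lora_alpha / r`. The sketch below works through that arithmetic. The hidden size of 2048 and the 32 layers are hypothetical placeholders, not Granite's actual dimensions, and the estimate treats all four projections as square, which overstates `k_proj` and `v_proj` if the model uses grouped-query attention.

```python
def lora_params_per_linear(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 8          # r in the LoraConfig above
LORA_ALPHA = 16   # lora_alpha in the LoraConfig above
HIDDEN = 2048     # hypothetical hidden size, for illustration only
NUM_LAYERS = 32   # hypothetical layer count, for illustration only
TARGETS = 4       # q_proj, k_proj, v_proj, o_proj

per_module = lora_params_per_linear(HIDDEN, HIDDEN, RANK)
estimated_trainable = per_module * TARGETS * NUM_LAYERS
scaling = LORA_ALPHA / RANK  # factor applied to the adapter output

print(f"Per module: {per_module:,}")
print(f"Estimated trainable: {estimated_trainable:,}")
print(f"Scaling factor: {scaling}")
```

Comparing this kind of estimate with the printed count is a quick sanity check that the adapters attached to the intended modules.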

## 4. Prepare Dataset

In [None]:
# Load dataset
print("📊 Loading training data...")
dataset = load_dataset('json', data_files='training_examples.jsonl', split='train')
print(f"✅ Loaded {len(dataset)} examples")

# Format prompts
def format_prompt(example):
    prompt = f"""### Instruction:
{example['instruction']}

### Input:
{example['input']}

### Response:
{example['output']}"""
    return {"text": prompt}

dataset = dataset.map(format_prompt, remove_columns=dataset.column_names)

# Tokenize
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=MAX_LENGTH,
        padding="max_length"
    )

tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"]
)

# Split
split_dataset = tokenized_dataset.train_test_split(test_size=0.1, seed=42)
print(f"✅ Train: {len(split_dataset['train'])}, Val: {len(split_dataset['test'])}")
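
Because the model only ever sees the `### Instruction:` / `### Input:` / `### Response:` layout during training, inference prompts should use exactly the same layout. One way to keep the two in sync is a single template shared by both, sketched below. `PROMPT_TEMPLATE` and `build_prompt` are illustrative names, not part of the notebook.

```python
# Same layout as format_prompt above; the response text is appended for training
# and left off for inference so the model continues from "### Response:".
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(instruction, input_text, output=None):
    """Build a training example (output given) or an inference prompt (output omitted)."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction, input=input_text)
    return prompt if output is None else prompt + output


example = build_prompt(
    "Explain why this loan application was approved.",
    "Credit Score: 720\nDebt-to-Income Ratio: 28%",
)
print(example)
```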

## 5. Train Model

In [None]:
# Training arguments - Optimized for T4 GPU
training_args = TrainingArguments(
    output_dir="./granite-finetuned",
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION,
    learning_rate=LEARNING_RATE,
    fp16=True,                              # Use FP16 to save memory
    logging_steps=20,
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    warmup_steps=50,
    load_best_model_at_end=True,
    report_to="none",
    gradient_checkpointing=True,            # Save memory during backward pass
    optim="adamw_torch",                    # Standard optimizer
    max_grad_norm=1.0,                      # Gradient clipping
)

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split_dataset["train"],
    eval_dataset=split_dataset["test"],
    data_collator=data_collator,
)

print("🚀 Starting training...")
print("⏰ Expected time: 20-40 minutes on T4 GPU")
print("💡 Memory optimizations:")
print("   - Batch size: 1 (effective: 8 with gradient accumulation)")
print("   - Sequence length: 256 tokens")
print("   - LoRA rank: 8 (smaller adapters)")
print("   - Gradient checkpointing: Enabled")
print("   - FP16 training: Enabled")

In [None]:
# Train!
trainer.train()
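
Once training finishes, the validation loss can be turned into perplexity, which is often easier to compare across runs. The sketch below assumes the loss is the mean per-token cross-entropy in nats, as a causal language model reports it. `perplexity_from_loss` is an illustrative helper.

```python
import math


def perplexity_from_loss(eval_loss):
    """Convert a mean per-token cross-entropy loss (in nats) into perplexity."""
    return math.exp(eval_loss)


# In the notebook this could be used as:
#   metrics = trainer.evaluate()
#   print(perplexity_from_loss(metrics["eval_loss"]))
print(perplexity_from_loss(0.0))
```

A loss of 0 corresponds to a perplexity of 1, meaning the model predicts every token with certainty; lower perplexity on the validation split suggests a better fit.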

## 6. Test the Fine-tuned Model

In [None]:
# Test generation
test_prompt = """### Instruction:
Explain why this loan application was approved.

### Input:
Credit Score: 720
Debt-to-Income Ratio: 28%
Loan Amount: $25,000
Annual Income: $85,000
Employment Length: 5 years
Delinquencies (2 yrs): 0
Credit Inquiries (6 mo): 1

### Response:
"""

model.eval()  # switch off LoRA dropout for generation
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)

print("🧪 Testing fine-tuned model...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\n" + "="*60)
print("GENERATED EXPLANATION:")
print("="*60)
print(response)
print("="*60)
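
The decoded text above still contains the prompt, because `generate` returns the input tokens followed by the new ones. A small helper can strip everything up to the response marker; the version below is a sketch that assumes the marker appears once in the prompt, and `extract_response` is an illustrative name.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded_text):
    """Return only the text after the first response marker, or the whole text if it is absent."""
    _, marker, tail = decoded_text.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded_text.strip()


sample = "### Instruction:\nExplain.\n\n### Input:\nCredit Score: 720\n\n### Response:\nApproved due to strong credit."
print(extract_response(sample))
```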

## 7. Save and Download Model

In [None]:
# Save model
print("💾 Saving fine-tuned model...")
trainer.save_model("./granite-finetuned-final")
tokenizer.save_pretrained("./granite-finetuned-final")
print("✅ Model saved!")

# Create zip for download
!zip -r granite-finetuned-final.zip granite-finetuned-final/
print("\n📦 Model packaged for download")

In [None]:
# Download the model
from google.colab import files

print("⬇️ Downloading fine-tuned model...")
files.download('granite-finetuned-final.zip')
print("\n✅ Download started!")
print("\nTo use locally:")
print("1. Extract granite-finetuned-final.zip")
print("2. Move to LendSafe/models/granite-finetuned/")
print("3. Run evaluation script")
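
For local use, the extracted directory most likely holds only the LoRA adapter and tokenizer files rather than full model weights, since the model was wrapped with `get_peft_model` before saving. Under that assumption, loading means fetching the base model first and attaching the adapter, as in the sketch below. `load_finetuned` is an illustrative helper, and the default precision is left to the library; on a GPU, passing a half-precision dtype to `from_pretrained` may be preferable.

```python
def load_finetuned(adapter_dir, base_model_id="ibm-granite/granite-3.1-3b-a800m-instruct"):
    """Load the base model, attach the saved LoRA adapter, and return (model, tokenizer)."""
    # Imports live inside the function so this file can be imported
    # on machines where the heavy dependencies are not installed yet.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_dir)
    model.eval()  # inference mode: disables dropout in the adapter
    return model, tokenizer


# Example call, using the directory named in the steps above:
#   model, tokenizer = load_finetuned("LendSafe/models/granite-finetuned")
```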

## 🎉 Done!

Your Granite model is now fine-tuned for loan explanations!

**Next steps:**
1. Download the model zip file
2. Extract and place in your local LendSafe project
3. Run `python scripts/evaluate_model.py` to get metrics

**Training Summary:**
- Model: IBM Granite 3.1 3B (`ibm-granite/granite-3.1-3b-a800m-instruct`)
- Method: LoRA (0.1% parameters trained)
- Data: 1,500 loan explanation examples
- Training time: ~20-40 minutes on T4 GPU
- Cost: $0 (free Colab)