# 🚀 BLOOMZ-560M LoRA Adapter Training

**Train lightweight LoRA adapters on Google Colab (Free GPU!)**

## Instructions:
1. **Enable GPU**: Runtime → Change runtime type → T4 GPU → Save
2. **Run Cell 1**: Install dependencies
3. **Run Cell 2**: Load model and prepare training
4. **Run Cell 3**: Train the adapter!
5. **Run Cell 4**: Download trained adapter

After Cell 4 finishes, unzip `trained_adapter.zip` on your computer and place its contents in `adapters/gurukul_lite/`.


In [None]:
# CELL 1: Install Dependencies & Check GPU

print("📦 Installing dependencies...")
!pip install -q transformers datasets peft accelerate bitsandbytes scipy

print("\n✅ Dependencies installed!")

# Check GPU
import torch

if torch.cuda.is_available():
    print(f"\n🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
    print("\n✅ GPU is ready!")
else:
    print("\n⚠️ WARNING: No GPU detected!")
    print("   Go to: Runtime → Change runtime type → T4 GPU → Save")
    print("   Then re-run this cell.")


In [None]:
# CELL 2: Load Model & Prepare Training Data

from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset

# ============================================================================
# CONFIGURATION (You can modify these!)
# ============================================================================

CONFIG = {
    'model_name': 'bigscience/bloomz-560m',
    'output_dir': 'trained_adapter',
    'max_samples': 500,      # Number of training samples
    'num_epochs': 3,         # Training epochs
    'batch_size': 4,         # Batch size
    'learning_rate': 2e-4,   # Learning rate
    'max_length': 512,       # Max sequence length
    'lora_r': 8,            # LoRA rank (PROVEN)
    'lora_alpha': 16,       # LoRA alpha (PROVEN)
    'lora_dropout': 0.05,   # LoRA dropout (PROVEN)
}

print("⚙️ Training Configuration:")
for k, v in CONFIG.items():
    print(f"   {k}: {v}")

# ============================================================================
# SAMPLE TRAINING DATA (Replace with your own!)
# ============================================================================

print("\n📊 Creating sample multilingual training data...")

# Sample texts (you can replace this with your own data!)
sample_texts = [
    "Hello, how are you today?",
    "नमस्ते, आप कैसे हैं?",  # Hindi
    "你好，你今天怎么样？",  # Chinese
    "Bonjour, comment allez-vous?",  # French
    "Hola, ¿cómo estás?",  # Spanish
    "こんにちは、お元気ですか？",  # Japanese
    "Привет, как дела?",  # Russian
    "Olá, como você está?",  # Portuguese
    "مرحبا، كيف حالك؟",  # Arabic
    "안녕하세요, 어떻게 지내세요?",  # Korean
]

# Repeat to get desired number of samples
train_texts = (sample_texts * (CONFIG['max_samples'] // len(sample_texts) + 1))[:CONFIG['max_samples']]

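# Split into train/validation. The sample sentences repeat, so every validation
# example also appears in the training set; use distinct data for a meaningful eval loss.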
split_idx = int(len(train_texts) * 0.9)
train_data = train_texts[:split_idx]
val_data = train_texts[split_idx:]

print(f"✅ Training samples: {len(train_data)}")
print(f"   Validation samples: {len(val_data)}")

# ============================================================================
# LOAD MODEL
# ============================================================================

print("\n🤖 Loading BLOOMZ-560M model...")

tokenizer = AutoTokenizer.from_pretrained(CONFIG['model_name'])
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 8-bit quantization for memory efficiency
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)

model = AutoModelForCausalLM.from_pretrained(
    CONFIG['model_name'],
    quantization_config=quantization_config,
    device_map='auto',
    torch_dtype=torch.float16,
)

print("✅ Model loaded successfully!")

# ============================================================================
# APPLY LORA ADAPTERS
# ============================================================================

print("\n🔧 Applying LoRA adapters with PROVEN configuration...")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=CONFIG['lora_r'],
    lora_alpha=CONFIG['lora_alpha'],
    lora_dropout=CONFIG['lora_dropout'],
    # BLOOM-specific target modules (PROVEN to work!)
    target_modules=['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'],
)

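# Recommended by PEFT for quantized base models: freezes the base weights, upcasts the
# remaining half-precision parameters to fp32, and enables gradient checkpointing by default
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)

model = get_peft_model(model, lora_config)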

print("\n📊 Trainable Parameters:")
model.print_trainable_parameters()

# ============================================================================
# TOKENIZE DATA
# ============================================================================

print("\n📝 Tokenizing training data...")

def tokenize_texts(texts):
    tokenized = []
    for text in texts:
        tokens = tokenizer(
            text,
            truncation=True,
            max_length=CONFIG['max_length'],
            padding=False,
        )
        tokenized.append(tokens)
    
    return Dataset.from_dict({
        'input_ids': [t['input_ids'] for t in tokenized],
        'attention_mask': [t['attention_mask'] for t in tokenized]
    })

train_dataset = tokenize_texts(train_data)
val_dataset = tokenize_texts(val_data)

print("✅ Tokenization complete!")
print(f"   Train dataset: {len(train_dataset)} samples")
print(f"   Val dataset: {len(val_dataset)} samples")

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # Causal LM, not masked LM
)

print("\n✅ Setup complete! Ready to train.")
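
For intuition about what the data collator does with `mlm=False`: it pads each batch to its longest sequence and copies `input_ids` into `labels`, replacing every pad-token position with `-100` so those positions are ignored by the loss. The pure-Python sketch below mimics that behavior. The helper name `collate_causal_lm` and the token IDs are made up for illustration; the real collator returns tensors and pads on whichever side the tokenizer is configured for (right-padding here is an assumption).

```python
def collate_causal_lm(batch, pad_token_id, ignore_index=-100):
    """Illustrative stand-in for DataCollatorForLanguageModeling(mlm=False)."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        pad = longest - len(ids)
        padded = ids + [pad_token_id] * pad  # right-pad (assumption)
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * pad)
        # Labels mirror the inputs; any token equal to the pad ID is excluded from the loss
        labels.append([ignore_index if tok == pad_token_id else tok for tok in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Hypothetical token IDs, with 3 standing in for the pad token
example = collate_causal_lm([[11, 12, 13], [21, 22]], pad_token_id=3)
print(example["labels"])
```

One consequence worth knowing: because masking is done by token ID, a tokenizer whose pad token falls back to EOS (the `pad_token is None` branch above) would also have its genuine EOS tokens masked out of the loss.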


In [None]:
# CELL 3: Train the Adapter!

print("="*80)
print("🎯 STARTING TRAINING")
print("="*80)

# Training arguments (PROVEN settings)
training_args = TrainingArguments(
    output_dir=CONFIG['output_dir'],
    num_train_epochs=CONFIG['num_epochs'],
    per_device_train_batch_size=CONFIG['batch_size'],
    per_device_eval_batch_size=CONFIG['batch_size'],
    gradient_accumulation_steps=4,
    learning_rate=CONFIG['learning_rate'],
    warmup_steps=50,
    logging_steps=10,
    save_steps=50,
    eval_steps=50,
    fp16=False,  # Disable FP16 (PROVEN to avoid issues)
    save_total_limit=2,
    eval_strategy='steps',  # older transformers releases call this evaluation_strategy
    load_best_model_at_end=True,
    report_to='none',
    dataloader_num_workers=0,  # Important for stability
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
)

# Train!
print("\n🚀 Training started...")
print("   This will take about 5-10 minutes.\n")

try:
    trainer.train()
    
    print("\n" + "="*80)
    print("🎉 TRAINING COMPLETE!")
    print("="*80)
    
    # Save the adapter
    print(f"\n💾 Saving adapter to {CONFIG['output_dir']}...")
    model.save_pretrained(CONFIG['output_dir'])
    tokenizer.save_pretrained(CONFIG['output_dir'])
    
    print("\n✅ SUCCESS! Adapter saved.")
    print("   Run Cell 4 to download it.")
    
except Exception as e:
    print(f"\n❌ Training failed: {e}")
    print("\nTry reducing batch_size or max_samples in Cell 2 and re-run.")
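
To see how the step-based settings above relate to the size of the run, here is a rough back-of-the-envelope sketch. `estimate_schedule` is a hypothetical helper, not part of `transformers`, and it rounds partial batches and accumulation groups up; some library versions round the last partial accumulation group down, which would give slightly fewer steps.

```python
import math


def estimate_schedule(num_samples, batch_size, grad_accum, epochs, warmup_steps, eval_steps):
    """Approximate optimizer-step counts for a single-GPU Trainer run."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    updates_per_epoch = max(math.ceil(batches_per_epoch / grad_accum), 1)
    total_steps = updates_per_epoch * epochs
    return {
        "total_steps": total_steps,
        "warmup_fraction": min(warmup_steps / total_steps, 1.0),
        "num_evals": total_steps // eval_steps,
    }


# Values from CONFIG and TrainingArguments: 450 training samples after the 90/10 split
plan = estimate_schedule(
    num_samples=450, batch_size=4, grad_accum=4, epochs=3, warmup_steps=50, eval_steps=50
)
print(plan)
```

With the default settings this suggests fewer than 100 optimizer steps in total, so the 50 warmup steps cover more than half of training and evaluation (and checkpointing) fires only once. If you want more frequent feedback, you could lower `warmup_steps`, `eval_steps`, and `save_steps`; with `load_best_model_at_end=True`, `save_steps` needs to stay a multiple of `eval_steps`.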


In [None]:
# CELL 4: Download Trained Adapter

import shutil
from google.colab import files

print("📦 Packaging trained adapter...")

# Zip the output directory (it also holds any Trainer checkpoint-* folders saved during training)
shutil.make_archive('trained_adapter', 'zip', CONFIG['output_dir'])

print("✅ Packaged successfully!")
print("\n⬇️ Downloading trained_adapter.zip...")

# Download
files.download('trained_adapter.zip')

print("\n🎉 DOWNLOAD COMPLETE!")
print("\n" + "="*80)
print("NEXT STEPS:")
print("="*80)
print("\n1. Unzip 'trained_adapter.zip' on your computer")
print("2. Place contents in: adapters/gurukul_lite/ (inside your project folder)")
print("3. Start your API:")
print("   python -m uvicorn adapter_service.standalone_api:app --port 8110")
print("\n4. Test with adapter:")
print('   curl -X POST http://localhost:8110/generate \\')
print('     -H "Content-Type: application/json" \\')
print('     -d \'{"prompt":"Translate to Hindi: Hello", "adapter_path":"adapters/gurukul_lite"}\'')
print("\n" + "="*80)
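
If you'd rather try the adapter directly in Python than through the API, a minimal sketch could look like the following. It assumes `transformers` and `peft` are installed on your computer and that the unzipped files sit in `adapters/gurukul_lite/`. The helper names `load_adapter_model` and `generate_text` are hypothetical, and this is independent of however the `adapter_service` API loads adapters.

```python
def load_adapter_model(adapter_dir="adapters/gurukul_lite", base_model="bigscience/bloomz-560m"):
    """Load the base model and attach the trained LoRA adapter (sketch)."""
    # Imports are deferred so this function can be defined before the libraries are installed
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Cell 3 saved the tokenizer next to the adapter weights
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained(base_model)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_new_tokens=40):
    """Generate a completion for a single prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage would be along the lines of `model, tokenizer = load_adapter_model()` followed by `print(generate_text(model, tokenizer, "Translate to Hindi: Hello"))`. The first call downloads the base model if it is not already cached.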


## 📝 Notes

### To Use Your Own Data:
1. Click the 📁 folder icon on the left
2. Upload your `.txt` files (one training example per line)
3. Modify Cell 2 to load your files:

```python
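# Replace the sample_texts section with:
import os

data_dir = '/content/'
train_texts = []
for filename in sorted(os.listdir(data_dir)):
    if filename.endswith('.txt'):
        # Join with data_dir so the file opens regardless of the working directory
        with open(os.path.join(data_dir, filename), 'r', encoding='utf-8') as f:
            # Keep only non-trivial lines (more than 10 characters)
            lines = [line.strip() for line in f if len(line.strip()) > 10]
            # Cap each file's contribution at max_samples // 10 lines
            train_texts.extend(lines[:CONFIG['max_samples'] // 10])
```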
```
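
Once your own lines are loaded, it may be worth removing duplicates and shuffling before the 90/10 split in Cell 2, so that the validation set does not simply repeat training examples. A minimal sketch, assuming plain Python lists of strings; `dedupe_and_split` is a hypothetical helper, not something the notebook defines:

```python
import random


def dedupe_and_split(texts, val_fraction=0.1, seed=42):
    """Drop blank and duplicate lines, shuffle, and split into (train, val)."""
    # dict.fromkeys keeps the first occurrence of each line and preserves order
    unique = list(dict.fromkeys(t.strip() for t in texts if t.strip()))
    random.Random(seed).shuffle(unique)
    # Hold out at least one example when there is more than one to choose from
    n_val = max(1, int(len(unique) * val_fraction)) if len(unique) > 1 else 0
    return unique[n_val:], unique[:n_val]


# Toy input with one duplicate and one blank line
train_split, val_split = dedupe_and_split(
    ["a b c", "d e f", "a b c", "g h i", "  ", "j k l"]
)
print(len(train_split), len(val_split))
```

You could then assign `train_data, val_data = dedupe_and_split(train_texts)` in place of the index-based split.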

### Troubleshooting:
- **Out of memory**: Reduce `batch_size` or `max_samples` in Cell 2
- **Slow training**: Normal! 500 samples take about 5-10 minutes on a T4
- **No GPU**: Runtime → Change runtime type → T4 GPU → Save, then re-run Cell 1

### What You Get:
- **Trained LoRA adapter** (roughly 12 MB of adapter weights)
- **Ready to use** with the BLOOMZ-560M base model
- **Multilingual** behavior adapted to whatever data you trained on
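
The ~12 MB figure can be sanity-checked with a little arithmetic. The sketch below counts the LoRA parameters added by rank 8 on the four target modules, assuming BLOOMZ-560M has a hidden size of 1024 and 24 transformer layers and that the adapter is stored in 32-bit floats. `lora_param_count` is a hypothetical helper written for this estimate.

```python
def lora_param_count(hidden_size, num_layers, rank):
    """Count LoRA parameters for the four BLOOM target modules used in Cell 2."""
    # (in_features, out_features) of each targeted linear layer in one block
    shapes = {
        "query_key_value": (hidden_size, 3 * hidden_size),
        "dense": (hidden_size, hidden_size),
        "dense_h_to_4h": (hidden_size, 4 * hidden_size),
        "dense_4h_to_h": (4 * hidden_size, hidden_size),
    }
    # Each adapted layer adds two matrices: A (rank x in) and B (out x rank)
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(hidden_size=1024, num_layers=24, rank=8)
size_mb = params * 4 / 1e6  # 4 bytes per float32 parameter (assumption)
print(f"{params:,} trainable parameters, about {size_mb:.1f} MB")
```

The count should line up with what `model.print_trainable_parameters()` reports in Cell 2. The downloaded zip will be larger than the adapter weights alone, since Cell 3 also saves the tokenizer files into the same directory.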
