# 🌐 English-Assamese Translation Model Training (Fixed & Optimized)

This notebook fine-tunes Meta's NLLB model for English-Assamese translation. This version includes the following fixes:
- ✅ **Fixed tokenizer deprecation warning**
- ✅ **Optimized training parameters for small datasets**
- ✅ **Multiple dataset fallback options**
- ✅ **Better error handling and GPU memory management**

## Setup Instructions:
1. **Runtime → Change runtime type → GPU (T4 recommended)**
2. **Run all cells in order**
3. **Monitor training progress**
4. **Download the trained model**

---

## 1. Environment Setup & GPU Check

In [None]:
# Install required packages with minimum versions for stability.
# The specifiers are quoted so the shell does not treat ">=" as an output redirect.
!pip install -q "torch>=2.0.0" "transformers>=4.30.0" "datasets>=2.12.0"
!pip install -q "accelerate>=0.20.0" "sentencepiece>=0.1.99" "protobuf>=3.20.0"
!pip install -q "pandas>=2.0.0" "numpy>=1.24.0" "tqdm>=4.65.0"

# Check GPU availability and memory
import torch
import gc
print(f"🔥 CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 Total Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
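    # mem_get_info returns (free, total) in bytes; memory_reserved only reports PyTorch's own cache
    free_bytes, _total_bytes = torch.cuda.mem_get_info(0)
    print(f"💾 Available Memory: {free_bytes / 1e9:.1f} GB")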
    # Clear any existing GPU memory
    torch.cuda.empty_cache()
    gc.collect()
    print("🧹 GPU memory cleared")
else:
    print("⚠️  No GPU available - training will be slow on CPU")
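
After installation, it can help to confirm that the environment actually meets the minimum versions requested in the install cell. The following is a minimal sketch, not part of the original notebook: the helper names `parse_version`, `meets_minimum`, and `check_requirements` are hypothetical, and the comparison is simplified (it looks only at the first three numeric components and does not handle pre-release tags).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the install cell above.
MINIMUMS = {
    "torch": "2.0.0",
    "transformers": "4.30.0",
    "datasets": "2.12.0",
    "accelerate": "0.20.0",
    "sentencepiece": "0.1.99",
}


def parse_version(text):
    """Return the first three numeric components, ignoring local tags such as '+cu118'."""
    public = text.split("+")[0]
    parts = [int(p) for p in re.findall(r"\d+", public)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum):
    """Simplified comparison: assumes plain numeric release versions."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums):
    """Map each package name to (installed_version_or_None, ok)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements(MINIMUMS).items():
        status = "OK" if ok else "MISSING OR TOO OLD"
        print(f"{name}: {installed} ({status})")
```

If any package is reported as missing or too old, re-running the install cell and restarting the runtime is usually the simplest remedy.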

## 2. Clone Repository and Setup Project Structure

In [None]:
# Clone the repository (replace with your actual repo URL)
import os
repo_url = "https://github.com/your-username/Machine-Translation-.git"  # UPDATE THIS

if not os.path.exists("Machine-Translation-"):
    print(f"📥 Cloning repository from {repo_url}")
    !git clone $repo_url
else:
    print("📁 Repository already exists")

%cd Machine-Translation-

# Verify project structure
print("\n📂 Project structure:")
!ls -la

# Create necessary directories
!mkdir -p data/processed models results
print("\n✅ Project setup completed")
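
The `!mkdir -p` line only works inside a notebook with a Unix shell. If the same setup needs to run from a plain Python script, a rough equivalent can be sketched with `pathlib`. The function name `ensure_dirs` is hypothetical; the directory names are the ones created by the cell above.

```python
from pathlib import Path


def ensure_dirs(root, names=("data/processed", "models", "results")):
    """Create each directory under root (like `mkdir -p`) and return the resulting paths."""
    root = Path(root)
    created = []
    for name in names:
        path = root / name
        # parents=True creates intermediate directories; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_dirs(".")` from the repository root should leave the same `data/processed`, `models`, and `results` directories in place, and running it a second time is harmless.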