# 🇪🇹 Amharic XTTS Fine-Tuning WebUI with GitHub LFS Persistence

This notebook allows you to:
- Fine-tune XTTS v2 models for Amharic TTS
- Use Gradio WebUI for easy interaction
- **Save/Load training progress to/from GitHub LFS** (prevents data loss on Colab disconnect)

---

## 📋 Prerequisites

Before running, you'll need:
1. A GitHub account
2. A Personal Access Token (PAT) with `repo` permissions
   - Go to: https://github.com/settings/tokens
   - Click "Generate new token (classic)"
   - Select `repo` scope
   - Copy the token
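
To confirm that a token actually carries the `repo` scope before relying on it, one option is to call the GitHub REST API and read the `X-OAuth-Scopes` response header, which GitHub returns for classic tokens. The sketch below is illustrative only; the helper names `parse_scopes`, `has_repo_scope`, and `fetch_token_scopes` are hypothetical and not part of this notebook.

```python
import urllib.request


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes header such as 'repo, workflow' into a set."""
    if not header_value:
        return set()
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def has_repo_scope(header_value):
    """Return True if the full `repo` scope is listed in the header value."""
    return "repo" in parse_scopes(header_value)


def fetch_token_scopes(token):
    """Ask api.github.com which scopes a classic token has (needs network access)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}", "User-Agent": "colab-notebook"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_scopes(response.headers.get("X-OAuth-Scopes"))
```

Calling `fetch_token_scopes(GITHUB_TOKEN)` after Step 1 should return a set containing `"repo"` if the token was created as described above.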

---

## 🔧 Step 1: Configure GitHub Credentials

**IMPORTANT:** Keep your token secure! Don't share notebooks with tokens in them.

In [None]:
import getpass
import os

# Configuration
GITHUB_USERNAME = input("Enter your GitHub username: ")
GITHUB_TOKEN = getpass.getpass("Enter your GitHub Personal Access Token: ")
GITHUB_REPO = "Diakonrobel/Amharic_XTTS-V2_TTS"  # Change if using different repo
BRANCH = "main"

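# Share the settings with later cells through environment variables.
# They live only in this runtime's process environment and are not encrypted.
os.environ['GITHUB_USERNAME'] = GITHUB_USERNAME
os.environ['GITHUB_TOKEN'] = GITHUB_TOKEN
os.environ['GITHUB_REPO'] = GITHUB_REPO
os.environ['GITHUB_BRANCH'] = BRANCH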

print("✅ GitHub credentials configured!")

## 📦 Step 2: Install Dependencies

In [None]:
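# Install Git LFS. Output is silenced per command instead of with %%capture,
# which would also hide the confirmation printed at the end of the cell.
!curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash > /dev/null 2>&1
!sudo apt-get install git-lfs -y > /dev/null 2>&1
!git lfs install

print("✅ Git LFS installed!")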

In [None]:
# Install PyTorch with CUDA support (-q keeps pip quiet without hiding the print below)
!pip install -q torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118

print("✅ PyTorch installed!")
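
Because the install commands run quietly, a failed install is easy to miss. A minimal sanity check, assuming only that `git-lfs` should be on the `PATH` and that PyTorch should see a GPU, might look like the following; `tool_available` and `cuda_status` are hypothetical helper names.

```python
import importlib.util
import shutil


def tool_available(name):
    """Return True if an executable called `name` is on the PATH."""
    return shutil.which(name) is not None


def cuda_status():
    """Report whether PyTorch is importable and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also runs where torch is missing
    return "CUDA available" if torch.cuda.is_available() else "CPU only"


print("git-lfs found:", tool_available("git-lfs"))
print("PyTorch:", cuda_status())
```

If the second line reports "CPU only" on Colab, the runtime type is probably not set to GPU.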

## 🔽 Step 3: Clone Repository & Load Existing Training Data (if any)

In [None]:
import os
from pathlib import Path

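# Read the credentials stored in Step 1
GITHUB_USERNAME = os.environ['GITHUB_USERNAME']
GITHUB_TOKEN = os.environ['GITHUB_TOKEN']
GITHUB_REPO = os.environ['GITHUB_REPO']

# Clone over HTTPS with the token embedded in the URL.
# Note: git stores this URL (token included) in .git/config, so never share the cloned folder.
REPO_URL = f"https://{GITHUB_USERNAME}:{GITHUB_TOKEN}@github.com/{GITHUB_REPO}.git"
REPO_DIR = GITHUB_REPO.split("/")[-1]  # e.g. "Amharic_XTTS-V2_TTS"

if not Path(REPO_DIR).exists():
    print("🔽 Cloning repository...")
    !git clone $REPO_URL $REPO_DIR
    print("✅ Repository cloned!")
else:
    print("📂 Repository already exists.")

# Change to repo directory
%cd $REPO_DIR

# git refuses to commit without an identity, which a fresh Colab runtime lacks.
# Any name and email work; the noreply address here is just a placeholder choice.
!git config user.name "$GITHUB_USERNAME"
!git config user.email "$GITHUB_USERNAME@users.noreply.github.com"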

# Pull latest including LFS files
print("\n🔄 Pulling latest changes including LFS files...")
!git lfs pull

print("\n✅ Repository ready!")
print("\n📊 Checking for existing training data...")
!ls -lh finetune_models/ 2>/dev/null || echo "No existing training data found."
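
Since the clone URL in this step embeds the token, any log line or error message that echoes the URL would leak it. As a rough sketch, a small scrubber can mask credentials before text is printed or shared; `redact_credentials` and `build_clone_url` are hypothetical helpers, and the token shown is a made-up placeholder.

```python
import re

# Matches the "user:secret@" part of an HTTP(S) URL
_CREDENTIALS = re.compile(r"(https?://)[^/\s:@]+:[^/\s@]+@")


def redact_credentials(text):
    """Replace any user:token pair embedded in a URL with a placeholder."""
    return _CREDENTIALS.sub(r"\1***@", text)


def build_clone_url(username, token, repo):
    """Build an authenticated HTTPS clone URL in the same shape this step uses."""
    return f"https://{username}:{token}@github.com/{repo}.git"


url = build_clone_url("alice", "ghp_exampletoken", "alice/demo")
print(redact_credentials(f"Cloning from {url}"))
```

URLs without embedded credentials pass through unchanged, so the function is safe to apply to any output before displaying it.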

In [None]:
# Install project dependencies (-q keeps pip quiet without hiding the print below)
!pip install -q -r requirements.txt

print("✅ Dependencies installed!")

## 💾 Step 4: Helper Functions for LFS Save/Load

In [None]:
import os
import subprocess
from datetime import datetime

def save_training_to_lfs(message=None, branch=None):
    """
    Commit all changes and push them to GitHub.
    Files matching an LFS pattern in .gitattributes are uploaded via LFS.
    """
    if message is None:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        message = f"Training checkpoint at {timestamp}"
    if branch is None:
        branch = os.environ.get("GITHUB_BRANCH", "main")

    print("\n💾 Saving training data to GitHub LFS...")

    try:
        # Stage all changes
        subprocess.run(["git", "add", "."], check=True)

        # `git commit` fails when nothing is staged, so commit only if needed
        # (`git diff --cached --quiet` exits with 1 when staged changes exist)
        if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode != 0:
            subprocess.run(["git", "commit", "-m", message], check=True)
        else:
            print("ℹ️  No new changes to commit.")

        # Push commits; LFS objects are uploaded as part of the push
        subprocess.run(["git", "push", "origin", branch], check=True)

        print("✅ Training data saved to GitHub LFS!")
        print(f"📝 Commit message: {message}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ Error saving: {e}")
        return False

def load_training_from_lfs(branch=None):
    """
    Load latest training state from GitHub LFS
    """
    if branch is None:
        branch = os.environ.get("GITHUB_BRANCH", "main")

    print("\n🔽 Loading training data from GitHub LFS...")

    try:
        # Pull latest changes
        subprocess.run(["git", "pull", "origin", branch], check=True)

        # Pull LFS files
        subprocess.run(["git", "lfs", "pull"], check=True)

        print("✅ Training data loaded from GitHub LFS!")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ Error loading: {e}")
        return False

def show_training_status():
    """
    Show current training files and their sizes
    """
    print("\n📊 Current Training Status:")
    print("=" * 50)
    subprocess.run(["bash", "-c", "du -sh finetune_models/* 2>/dev/null || echo 'No training data yet'"])
    print("=" * 50)

print("✅ Helper functions loaded!")
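
The save helper stages everything with `git add .`, but only paths listed with `filter=lfs` in `.gitattributes` go through LFS; GitHub rejects regular git files above 100 MB. The sketch below flags large files that no LFS pattern covers. It is an approximation: `lfs_patterns` and `oversized_untracked_files` are hypothetical names, and matching on the file name with `fnmatch` is simpler than git's real pattern rules.

```python
import fnmatch
from pathlib import Path

GITHUB_FILE_LIMIT = 100 * 1024 * 1024  # GitHub rejects regular git files above 100 MB


def lfs_patterns(gitattributes_text):
    """Return the path patterns that .gitattributes routes through LFS."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if parts and not parts[0].startswith("#") and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def oversized_untracked_files(root, patterns, limit=GITHUB_FILE_LIMIT):
    """List files under `root` larger than `limit` that match no LFS pattern."""
    flagged = []
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        # Matching on the file name only is a simplification of git's rules
        tracked = any(fnmatch.fnmatch(path.name, pattern) for pattern in patterns)
        if not tracked and path.stat().st_size > limit:
            flagged.append(path)
    return sorted(flagged)
```

A possible use before saving: read `.gitattributes`, pass the result of `lfs_patterns(...)` to `oversized_untracked_files(".", ...)`, and run `git lfs track` for any extension that shows up.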

## 🎨 Step 5: Launch Gradio WebUI

This starts the fine-tuning interface and prints a public Gradio URL. The cell keeps running for as long as the WebUI is up.

In [None]:
# Launch the WebUI
print("🚀 Launching Amharic XTTS Fine-Tuning WebUI...")
print("\n⚠️  IMPORTANT: After training, stop this cell and run Step 6 to save to GitHub!\n")

!python xtts_demo.py --share --port 7860
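
Launching with `!python` occupies the cell until the server stops. One alternative, sketched here under the assumption that `xtts_demo.py` behaves the same when started from Python, is to run it as a background process so other cells stay usable. `launch_in_background` is a hypothetical helper, and `webui.log` is an arbitrary file name.

```python
import subprocess
import sys


def launch_in_background(command, log_path):
    """Start `command` without blocking and send its output to `log_path`."""
    log_file = open(log_path, "w")
    process = subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)
    log_file.close()  # the child process keeps its own copy of the file handle
    return process


# Hypothetical usage with the script and flags from the cell above
# (-u makes Python write its output to the log without buffering):
# webui = launch_in_background(
#     [sys.executable, "-u", "xtts_demo.py", "--share", "--port", "7860"],
#     "webui.log",
# )
# print(open("webui.log").read())  # look for the public URL here
# webui.terminate()                # stop the server when finished
```

The trade-off is that the public URL no longer appears in the cell output and has to be read from the log file.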

## 💾 Step 6: Save Training Progress to GitHub LFS

**Run this cell after each training run (stop the WebUI cell from Step 5 first, since Colab executes one cell at a time)!**

This will save your:
- Trained models
- Checkpoints
- Datasets
- Configuration files

In [None]:
# Show current training status
show_training_status()

# Save to GitHub LFS
save_message = input("Enter commit message (or press Enter for default): ").strip()
if not save_message:
    save_message = None

success = save_training_to_lfs(save_message)

if success:
    print("\n🎉 Your training data is now safely stored on GitHub!")
    print("   You can disconnect from Colab without losing your progress.")
else:
    print("\n⚠️  Please check the error above and try again.")
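
Saving by hand is easy to forget during long sessions. As a rough sketch, a daemon thread can call a save function at a fixed interval; `start_autosave` is a hypothetical helper, and the 30-minute interval in the usage comment is an arbitrary choice. One caveat is that a commit made while a checkpoint is still being written may capture a partial file.

```python
import threading


def start_autosave(save_fn, interval_seconds):
    """Call `save_fn` every `interval_seconds` on a daemon thread.

    Returns an Event; call .set() on it to stop the loop.
    """
    stop_event = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop_event is set
        while not stop_event.wait(interval_seconds):
            try:
                save_fn()
            except Exception as error:  # keep the loop alive after a failed save
                print(f"Autosave failed: {error}")

    threading.Thread(target=loop, daemon=True).start()
    return stop_event


# Hypothetical usage with the Step 4 helper:
# stopper = start_autosave(save_training_to_lfs, 30 * 60)
# stopper.set()  # stop autosaving
```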

## 🔄 Step 7: Resume Training (Optional)

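If you disconnected and want to resume, run this cell to load your saved progress. If the runtime was reset, rerun Steps 1-4 first, because a new runtime starts with an empty disk and no variables.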

In [None]:
# Load latest training data
success = load_training_from_lfs()

if success:
    show_training_status()
    print("\n✅ Ready to resume training!")
    print("   Run the 'Launch Gradio WebUI' cell above to continue.")
else:
    print("\n⚠️  Could not load training data. Check the error above.")
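
A pull can succeed while some large files remain as small LFS pointer stubs, for example when the bandwidth quota is exhausted. Pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, so they are easy to detect. The following sketch uses hypothetical helper names and assumes the models live under `finetune_models`, as elsewhere in this notebook.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is a Git LFS pointer stub rather than real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_pointers(root="finetune_models"):
    """List files under `root` that are still LFS pointers after a pull."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and is_lfs_pointer(path)
    )


print("Files still missing their LFS content:", unresolved_pointers())
```

An empty list suggests that every checkpoint was downloaded in full.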

## 📥 Step 8: Download Trained Model (Optional)

Download your trained model to your local machine.

In [None]:
from google.colab import files
import zipfile
from pathlib import Path

# Create zip of trained model
model_dir = Path("finetune_models")

if model_dir.exists():
    print("📦 Creating model archive...")
    
    with zipfile.ZipFile("amharic_xtts_model.zip", "w", zipfile.ZIP_DEFLATED) as zipf:
        for file in model_dir.rglob("*"):
            if file.is_file():
                zipf.write(file, file.relative_to(model_dir.parent))
    
    print("✅ Archive created: amharic_xtts_model.zip")
    print("⬇️  Downloading...")
    
    files.download("amharic_xtts_model.zip")
    
    print("\n✅ Download started! Check your browser's downloads.")
else:
    print("❌ No trained model found. Train a model first!")
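
Large archives can be truncated in transit. A simple safeguard, shown here as an optional sketch, is to print a SHA-256 checksum in Colab and compare it with the checksum of the downloaded file on your machine; `sha256_of` is a hypothetical helper.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage after the archive has been created:
# print(sha256_of("amharic_xtts_model.zip"))
```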

## 🧪 Step 9: Test Trained Model (Optional)

In [None]:
# Quick test of the trained model
print("🧪 Testing Amharic TTS Model...\n")

# Show available models
!ls -lh finetune_models/ready/*.pth 2>/dev/null || echo "No trained models found."

print("\nℹ️  Use the Gradio UI above to test inference with your trained model.")

---

## 📚 Quick Reference

### Workflow:
1. ✅ Configure GitHub credentials (Step 1)
2. ✅ Install dependencies (Step 2)
3. ✅ Clone repo & load existing data (Step 3)
4. ✅ Load the save/load helper functions (Step 4)
5. ✅ Launch WebUI (Step 5)
6. ✅ Train your model via the UI
7. ✅ **SAVE** to GitHub LFS (Step 6) ← **IMPORTANT!**
8. ✅ Resume training later (Step 7) if needed

### Tips:
- 💾 Save frequently during long training sessions
- 🔄 Use Step 7 to resume after disconnection
- 📥 Download final model with Step 8
- 🌐 Share the Gradio public URL with collaborators

### Supported Languages:
English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh), Japanese (ja), Hungarian (hu), Korean (ko), **Amharic (amh)** ✨

---

## 🐛 Troubleshooting

**"Git LFS quota exceeded":**
- GitHub Free has a limited LFS quota (historically 1GB storage and 1GB bandwidth/month; check your account's billing page for current limits)
- Compress models or use Git LFS on a different service
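
To see how close a save would come to the quota, the size of the training folder can be measured before pushing. This is a minimal sketch: `directory_size_bytes` and `format_size` are hypothetical helpers, and `QUOTA_BYTES` simply reuses the 1GB figure quoted above.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Sum the sizes of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def format_size(num_bytes):
    """Render a byte count as a short human-readable string."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024 or unit == "GB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


QUOTA_BYTES = 1 * 1024**3  # assumption: the 1GB storage figure quoted above
used = directory_size_bytes("finetune_models")
print(f"finetune_models: {format_size(used)} of {format_size(QUOTA_BYTES)}")
```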

**"Authentication failed":**
- Regenerate your Personal Access Token
- Make sure `repo` scope is selected

**"Colab disconnected":**
- Your training data is safe if you saved to GitHub!
- Reconnect, rerun Steps 1-4 if the runtime was reset, then run Step 7 to resume

---

## 🎉 Credits

- Based on [xtts-finetune-webui](https://github.com/daswer123/xtts-finetune-webui)
- Amharic TTS implementation by Diakon Robel
- XTTS by [Coqui AI](https://github.com/coqui-ai/TTS)

---

**Star ⭐ the repo:** https://github.com/Diakonrobel/Amharic_XTTS-V2_TTS
