# 🎤 Yiddish TTS Training & Generation on Google Colab

This notebook provides a complete setup for training and using Yiddish Text-to-Speech models on Google Colab with Python 3.12 compatibility.

## Features
- ✅ Python 3.12 compatible
- ✅ GPU acceleration support
- ✅ Automatic dependency installation
- ✅ Data upload/download utilities
- ✅ Multiple training options (Tacotron2, XTTS)
- ✅ Immediate speech generation

**Note**: Enable GPU in Runtime > Change runtime type > Hardware accelerator > GPU (T4 or better)

## 1. Environment Setup & Dependencies

Install all required packages with Python 3.12 compatibility

In [None]:
# Check Python version and GPU availability
import sys
import subprocess
print(f"Python version: {sys.version}")
print(f"Python version info: {sys.version_info}")

# Check for GPU
try:
    import torch
    print(f"\nPyTorch version: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
except ImportError:
    print("PyTorch not yet installed")

### ⚠️ Important: Installation Instructions

If you encounter any errors during installation:
1. Run each installation cell one by one (not all at once)
2. If TTS fails to install, use the alternative installation cell below
3. You may need to restart the runtime after installation (Runtime → Restart runtime)
4. Make sure GPU is enabled (Runtime → Change runtime type → GPU)

In [None]:
# Cell 1A: Core System Dependencies (Run First)
print("=" * 50)
print("Installing Core System Dependencies")
print("=" * 50)

!apt-get update -qq
!apt-get install -y -qq libsndfile1 ffmpeg espeak-ng build-essential
!pip install --upgrade pip setuptools wheel

print("✅ System dependencies installed!")

In [None]:
# Cell 1B: PyTorch and Core Libraries (Run Second)
print("=" * 50)
print("Installing PyTorch and Core Libraries")
print("=" * 50)

# Install PyTorch
!pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install NumPy first. NumPy 1.26 is the first release series with Python 3.12 wheels.
!pip install "numpy>=1.26,<2"

# Install audio libraries
# Quote the specifiers so the shell does not treat ">" as an output redirect
!pip install "librosa>=0.10.0" "soundfile>=0.12.0" "scipy>=1.11.0" "pandas>=2.0.0"

print("✅ PyTorch and core libraries installed!")

In [None]:
# Cell 1C: TTS Installation (Run Third)
print("=" * 50)
print("Installing Coqui TTS Library")
print("=" * 50)

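# First attempt: standard installation.
# Caveat: TTS 0.22.0 is understood to declare support only for Python < 3.12,
# so on a 3.12 runtime pip may reject this attempt.
install_success = False

try:
    !pip install TTS==0.22.0
    import TTS
    from TTS.api import TTS as tts_api
    print("✅ TTS installed successfully via pip!")
    install_success = True
except Exception:
    # A failed `!pip` does not raise by itself; the import above is what fails
    print("⚠️ Standard installation failed, trying alternative...")

if not install_success:
    try:
        !pip uninstall -y TTS
        !pip install git+https://github.com/coqui-ai/TTS.git
        import TTS
        from TTS.api import TTS as tts_api
        print("✅ TTS installed successfully from GitHub!")
        install_success = True
    except Exception:
        print("⚠️ GitHub installation failed, trying fallback...")

if not install_success:
    print("Trying minimal installation with manual dependencies...")
    !pip uninstall -y TTS
    !pip install --no-deps TTS==0.22.0
    !pip install gruut inflect unidecode pypinyin mecab-python3 jamo g2pkk
    # Verify the fallback so the status message below reflects the real outcome
    try:
        from TTS.api import TTS as tts_api
        install_success = True
    except Exception as e:
        print(f"⚠️ Fallback installation could not be imported: {e}")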
    
print("\n" + "=" * 50)
if install_success:
    print("✅ TTS Library Installation Complete!")
else:
    print("⚠️ TTS installation may need manual intervention")
    print("Try: Runtime → Restart runtime, then run this cell again")
print("=" * 50)
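
Because a failing `!pip` command does not raise a Python exception, one option is to call pip through `subprocess` and check its exit status directly. The sketch below is only an illustration: `build_pip_command` and `pip_install` are hypothetical helper names, not part of this notebook or of pip.

```python
import subprocess
import sys


def build_pip_command(*requirements, extra_args=()):
    """Build an argument list for `python -m pip install`.

    Passing a list (not a shell string) means specifiers such as
    "librosa>=0.10.0" are never interpreted as shell redirection.
    """
    return [sys.executable, "-m", "pip", "install", *extra_args, *requirements]


def pip_install(*requirements, extra_args=()):
    """Run pip and return True only if it exited with status 0."""
    cmd = build_pip_command(*requirements, extra_args=extra_args)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Show the tail of pip's error output to aid debugging
        print(result.stderr[-500:])
    return result.returncode == 0
```

With helpers like these, a fallback chain can be written as ordinary boolean logic, for example `pip_install("TTS==0.22.0") or pip_install("coqui-tts")`.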

In [None]:
# Cell 1D: Remaining Dependencies (Run Fourth)
print("=" * 50)
print("Installing Remaining Dependencies")
print("=" * 50)

# Quote the specifiers so the shell does not treat ">" as an output redirect
!pip install "matplotlib>=3.7.0" "scikit-learn>=1.3.0"
!pip install "PyYAML>=6.0" "tqdm>=4.64.0" tensorboard psutil
!pip install -q openai-whisper

print("✅ All dependencies installed!")

In [None]:
# All-in-one alternative to Cells 1A-1D: run either this cell or those cells, not both
# Install system dependencies
print("Installing system dependencies...")
!apt-get update -qq
!apt-get install -y -qq libsndfile1 ffmpeg espeak-ng

# Upgrade pip first
print("\nUpgrading pip...")
!pip install --upgrade pip setuptools wheel

# Install PyTorch with CUDA support
print("\nInstalling PyTorch with CUDA...")
!pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install audio processing libraries first
print("\nInstalling audio processing libraries...")
!pip install "numpy>=1.26,<2"  # NumPy 1.26 is the first series with Python 3.12 wheels
!pip install "librosa>=0.10.0"
!pip install "soundfile>=0.12.0"
!pip install "pandas>=2.0.0"
!pip install "scipy>=1.11.0"
!pip install "matplotlib>=3.7.0"
!pip install "scikit-learn>=1.3.0"

# Install TTS library - try multiple approaches
print("\nInstalling TTS library (this may take a few minutes)...")
!pip install TTS==0.22.0

# If TTS fails, try alternative installation
import subprocess
import sys
try:
    import TTS
    print("✅ TTS installed successfully!")
except ImportError:
    print("⚠️ TTS not found, trying alternative installation...")
    !pip install git+https://github.com/coqui-ai/TTS.git@v0.22.0

# Install additional dependencies
print("\nInstalling additional dependencies...")
!pip install "PyYAML>=6.0"
!pip install "tqdm>=4.64.0"
!pip install tensorboard
!pip install psutil

# Install Whisper (optional)
print("\nInstalling Whisper (optional)...")
!pip install -q openai-whisper

print("\n✅ Installation complete!")

In [None]:
# Verify installations with better error handling
import sys
print(f"Python version: {sys.version}")

successful_imports = []
failed_imports = []

# Test core packages
packages_to_test = [
    ('torch', 'PyTorch'),
    ('torchaudio', 'TorchAudio'),
    ('librosa', 'Librosa'),
    ('soundfile', 'SoundFile'),
    ('numpy', 'NumPy'),
    ('pandas', 'Pandas'),
    ('TTS', 'Coqui TTS')
]

for module_name, display_name in packages_to_test:
    try:
        module = __import__(module_name)
        version = getattr(module, '__version__', 'unknown')
        successful_imports.append(f"✅ {display_name}: {version}")
        
        # Special check for PyTorch CUDA
        if module_name == 'torch':
            import torch
            if torch.cuda.is_available():
                successful_imports.append(f"   └─ CUDA available: {torch.cuda.get_device_name(0)}")
            else:
                successful_imports.append("   └─ CUDA: Not available (CPU mode)")
                
    except ImportError as e:
        failed_imports.append(f"❌ {display_name}: {str(e)}")

# Print results
print("\nInstallation Status:")
print("-" * 40)
for msg in successful_imports:
    print(msg)
    
if failed_imports:
    print("\n⚠️ Failed imports:")
    for msg in failed_imports:
        print(msg)
    print("\n💡 To fix TTS installation, run:")
    print("!pip uninstall -y TTS")
    print("!pip install git+https://github.com/coqui-ai/TTS.git")
else:
    print("\n🎉 All packages installed successfully!")

# Additional TTS check
if 'TTS' not in [pkg[0] for pkg in packages_to_test if pkg[0] in sys.modules]:
    print("\n📦 Installing TTS from GitHub (fallback method)...")
    !pip uninstall -y TTS
    !pip install git+https://github.com/coqui-ai/TTS.git
    print("Please restart the runtime and run this cell again.")

In [None]:
# Alternative TTS Installation (if the above fails)
# Run this cell ONLY if you get "ModuleNotFoundError: No module named 'TTS'"

print("🔧 Alternative TTS installation method...")
print("This will install TTS directly from GitHub\n")

# Uninstall any existing TTS installation
!pip uninstall -y TTS

# Install from GitHub (latest compatible version)
print("Installing TTS from GitHub...")
!pip install git+https://github.com/coqui-ai/TTS.git

# Verify installation
print("\nVerifying TTS installation...")
try:
    from TTS.api import TTS
    print("✅ TTS installed successfully!")
    
    # List available models
    print("\nAvailable TTS models:")
    tts = TTS.list_models()
    print("- Multilingual models available:", len([m for m in tts if 'multilingual' in m]) > 0)
    
except ImportError as e:
    print(f"❌ TTS still not working: {e}")
    print("\nTry these steps:")
    print("1. Runtime → Restart runtime")
    print("2. Run the installation cells again")
    print("3. If still failing, try:")
    print("   !pip install coqui-tts")
    
print("\n⚠️ After running this cell, you may need to restart the runtime (Runtime → Restart runtime).")

## 2. Core Components

Define the essential classes and functions for Yiddish TTS

In [None]:
import unicodedata
import re
import os
import json
from pathlib import Path

class YiddishTextProcessor:
    """Text processor for Yiddish (Hebrew script) for TTS training"""
    
    def __init__(self):
        # Hebrew character ranges (range() excludes its end, hence the +1)
        self.hebrew_chars = set()
        # Main Hebrew block: U+0590 to U+05FF
        for i in range(0x0590, 0x05FF + 1):
            self.hebrew_chars.add(chr(i))
        # Hebrew presentation forms: U+FB1D to U+FB4F
        for i in range(0xFB1D, 0xFB4F + 1):
            self.hebrew_chars.add(chr(i))
        
        # Essential punctuation and symbols
        self.punctuation = ".,!?;:-()[]{}\"\'`"
        self.allowed_chars = set(self.punctuation + " \n\t")
        self.allowed_chars.update("0123456789")
        
    def normalize_yiddish_text(self, text):
        """Normalize Yiddish text for TTS training"""
        # Normalize Unicode
        text = unicodedata.normalize('NFD', text)
        
        # Keep only Hebrew script characters, punctuation, and spaces
        cleaned_chars = []
        for char in text:
            if char in self.hebrew_chars or char in self.allowed_chars:
                cleaned_chars.append(char)
            elif char.isspace():
                cleaned_chars.append(' ')
        
        text = ''.join(cleaned_chars)
        
        # Clean up extra spaces
        text = re.sub(r'\s+', ' ', text.strip())
        
        # Map Hebrew punctuation marks to ASCII quotes
        text = text.replace('״', '"')  # Hebrew gershayim (U+05F4)
        text = text.replace('׳', "'")  # Hebrew geresh (U+05F3)
        
        return text
    
    def get_unique_chars(self, texts):
        """Get unique characters from all texts"""
        unique_chars = set()
        for text in texts:
            normalized = self.normalize_yiddish_text(text)
            unique_chars.update(normalized)
        return sorted(list(unique_chars))

print("✅ YiddishTextProcessor class defined")
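
The NFD step in `normalize_yiddish_text` matters because Yiddish letters with vowel points can be typed either as one precomposed code point or as a base letter plus a combining point. The short sketch below uses only the standard `unicodedata` module to show how three common precomposed letters decompose; the constant and function names are illustrative.

```python
import unicodedata

# Precomposed Yiddish letters from the Hebrew presentation-forms block
PASEKH_ALEF = "\uFB2E"        # alef with patah
KOMETS_ALEF = "\uFB2F"        # alef with qamats
PASEKH_TSVEY_YUDN = "\uFB1F"  # double yod with patah


def decompose(text):
    """Return the NFD form of text as a list of U+XXXX code point labels."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


print(decompose(PASEKH_ALEF))        # ['U+05D0', 'U+05B7']
print(decompose(KOMETS_ALEF))        # ['U+05D0', 'U+05B8']
print(decompose(PASEKH_TSVEY_YUDN))  # ['U+05F2', 'U+05B7']
```

After normalization, both spellings of a word produce the same character sequence, so the character set reported by `get_unique_chars` contains base letters and combining points as separate symbols.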

## 3. Data Management

Upload your data or use sample data

In [None]:
# Mount Google Drive (optional - for saving models and data)
from google.colab import drive
drive.mount('/content/drive')

# Create project directory in Drive
project_dir = '/content/drive/MyDrive/yiddish_tts_project'
os.makedirs(project_dir, exist_ok=True)
print(f"Project directory: {project_dir}")

In [None]:
# Option 1: Upload your data files
from google.colab import files

print("Upload your audio files and transcripts:")
print("Expected structure:")
print("  - Audio files: .wav format")
print("  - Text files: .txt format with matching names")
print("\nClick 'Choose Files' below to upload:")

# Uncomment to enable file upload
# uploaded = files.upload()
# for filename in uploaded.keys():
#     print(f'Uploaded: {filename}')

In [None]:
# Option 2: Create sample training data
import numpy as np
import soundfile as sf

# Create directories
os.makedirs('tts_segments/audio', exist_ok=True)
os.makedirs('tts_segments/text', exist_ok=True)

# Sample Yiddish texts
sample_texts = [
    "שבת שלום און א גוטן טאג",
    "ווי גייט עס מיט אייך היינט",
    "איך וויל רעדן אויף יידיש",
    "דאס איז א טעסט פון מיין סיסטעם",
    "א דאנק פאר אייער צייט"
]

# Create dummy audio files (for demonstration)
print("Creating sample data files...")
for i, text in enumerate(sample_texts, 1):
    # Create dummy audio (1 second of silence)
    audio = np.zeros(16000)  # 1 second at 16kHz
    audio_path = f'tts_segments/audio/segment_{i:04d}.wav'
    sf.write(audio_path, audio, 16000)
    
    # Save text
    text_path = f'tts_segments/text/segment_{i:04d}.txt'
    with open(text_path, 'w', encoding='utf-8') as f:
        f.write(text)
    
    print(f"Created: {audio_path} with text: {text}")

print("\n✅ Sample data created!")
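
Since training pairs are matched by file name, it can help to check for audio files without transcripts (and the reverse) before going further. This is a minimal sketch; `find_unpaired` is a hypothetical helper, and it assumes the `.wav`/`.txt` naming used above.

```python
from pathlib import Path


def find_unpaired(audio_dir, text_dir):
    """Return (audio stems without text, text stems without audio), sorted."""
    audio_stems = {p.stem for p in Path(audio_dir).glob("*.wav")}
    text_stems = {p.stem for p in Path(text_dir).glob("*.txt")}
    return sorted(audio_stems - text_stems), sorted(text_stems - audio_stems)
```

Calling `find_unpaired('tts_segments/audio', 'tts_segments/text')` on a complete dataset should return two empty lists.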

## 4. Immediate Speech Generation (No Training Required)

Generate Yiddish speech immediately using pre-trained multilingual models

In [None]:
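import os
import numpy as np
import soundfile as sf
import torch
from TTS.api import TTS  # the API class, not the top-level TTS module

def generate_yiddish_speech(text, output_file="yiddish_output.wav", use_gpu=True):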
    """Generate Yiddish speech from text using XTTS v2"""
    
    print("🎤 Yiddish Speech Generator")
    print("Using XTTS v2 for zero-shot voice cloning")
    print(f"Text: {text}")
    print()
    
    try:
        # Initialize TTS model
        print("Loading XTTS v2 model (this may take a minute)...")
        device = "cuda" if torch.cuda.is_available() and use_gpu else "cpu"
        tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2").to(device)
        print(f"✓ Model loaded on {device}!")
        
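        # Reference audio for voice cloning. For usable output this should be a
        # few seconds of clean speech; the silent sample files and the noise
        # fallback below only keep the pipeline running.
        reference_audio = "tts_segments/audio/segment_0001.wav"
        if not os.path.exists(reference_audio):
            print("Creating reference audio...")
            os.makedirs(os.path.dirname(reference_audio), exist_ok=True)
            ref_audio = np.random.randn(16000) * 0.1  # 1 second of noise
            sf.write(reference_audio, ref_audio, 16000)
        
        print("Generating speech...")
        # XTTS v2 has no Yiddish model. Hebrew shares the script, but "he" may
        # not be among the language codes XTTS v2 accepts; if this call reports
        # an unsupported language, check the model's language list.
        tts.tts_to_file(
            text=text,
            file_path=output_file,
            speaker_wav=reference_audio,
            language="he"  # Hebrew shares its script with Yiddish
        )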
        
        print(f"✅ Speech generated: {output_file}")
        
        # Play audio in Colab
        from IPython.display import Audio, display
        display(Audio(output_file))
        
        return True
        
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

# Test speech generation
test_phrases = [
    "שבת שלום",  # Shabbat Shalom
    "גוט מארגן",  # Good morning
    "א דאנק",  # Thank you
]

for i, phrase in enumerate(test_phrases, 1):
    print(f"\n--- Phrase {i}: {phrase} ---")
    output_file = f"yiddish_output_{i}.wav"
    generate_yiddish_speech(phrase, output_file)

## 5. Training Your Own Model

Train a custom Yiddish TTS model with your data

In [None]:
def prepare_training_data(audio_dir="tts_segments/audio", text_dir="tts_segments/text"):
    """Prepare training data in LJSpeech format"""
    
    processor = YiddishTextProcessor()
    metadata = []
    
    # Get all audio files
    audio_files = sorted(Path(audio_dir).glob("*.wav"))
    
    print(f"Found {len(audio_files)} audio files")
    
    for audio_file in audio_files:
        # Get corresponding text file
        text_file = Path(text_dir) / audio_file.stem
        text_file = text_file.with_suffix('.txt')
        
        if text_file.exists():
            # Read and normalize text
            with open(text_file, 'r', encoding='utf-8') as f:
                text = f.read().strip()
                normalized_text = processor.normalize_yiddish_text(text)
                
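            # Add to metadata. The absolute audio path is stored as the first field.
            # Coqui's built-in "ljspeech" formatter is expected to treat that field as a
            # file ID under <dataset path>/wavs/, so verify how your TTS version resolves it.
            audio_path = str(audio_file.absolute())
            metadata.append(f"{audio_path}|{normalized_text}|{normalized_text}")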
            print(f"Processed: {audio_file.name} -> {normalized_text[:50]}...")
    
    # Save metadata file
    metadata_file = "yiddish_train_data.txt"
    with open(metadata_file, 'w', encoding='utf-8') as f:
        f.write('\n'.join(metadata))
    
    print(f"\n✅ Training data prepared: {metadata_file}")
    print(f"Total samples: {len(metadata)}")
    
    # Get unique characters
    texts = [line.split('|')[1] for line in metadata]
    unique_chars = processor.get_unique_chars(texts)
    print(f"Unique characters ({len(unique_chars)}): {''.join(unique_chars)}")
    
    return metadata_file, unique_chars

# Prepare data
metadata_file, unique_chars = prepare_training_data()
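
The metadata file written above uses three pipe-separated fields per line: an audio identifier, the text, and the normalized text. The sketch below shows a round trip through that layout and guards against a stray `|` inside the transcript, which would otherwise shift the fields. Both helper names are hypothetical.

```python
def make_metadata_line(file_id, text):
    """Join fields with '|', replacing any '|' in the text so the row stays parseable."""
    safe_text = text.replace("|", " ")
    return f"{file_id}|{safe_text}|{safe_text}"


def parse_metadata_line(line):
    """Split a row back into (file_id, raw_text, normalized_text)."""
    file_id, raw_text, normalized_text = line.rstrip("\n").split("|")
    return file_id, raw_text, normalized_text


row = make_metadata_line("segment_0001", "שבת שלום")
print(parse_metadata_line(row))
```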

In [None]:
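def create_training_config(unique_chars, output_path="./yiddish_tts_training/"):
    """Create a Tacotron2-style configuration for Yiddish TTS.

    The nested "model_params" and "train_config" sections are this notebook's own
    layout; check field names against the config class of your installed TTS
    version before training.
    """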
    
    config = {
        "model": "tacotron2",
        "run_name": "yiddish_tacotron2_colab",
        "output_path": output_path,
        
        "datasets": [{
            "name": "yiddish_dataset",
            "path": "./",
            "meta_file_train": "yiddish_train_data.txt",
            "meta_file_val": "",
            "formatter": "ljspeech"
        }],
        
        "characters": {
            "pad": "_",
            "eos": "~",
            "bos": "^",
            "characters": ''.join(unique_chars),
            "punctuations": "!\"'(),-.:;?[]",
            "phonemes": "",
            "is_unique": True,
            "is_sorted": True
        },
        
        "audio": {
            "sample_rate": 16000,
            "resample": True,
            "do_trim_silence": True,
            "trim_db": 60,
            "mel_fmin": 0,
            "mel_fmax": 8000,
            "n_fft": 1024,
            "hop_length": 256,
            "win_length": 1024,
            "n_mels": 80,
            "preemphasis": 0.97,
            "ref_level_db": 20,
            "spec_gain": 20
        },
        
        "model_params": {
            "n_symbols": len(unique_chars) + 3,  # +3 for pad, eos, bos
            "symbols_embedding_dim": 512,
            "encoder_embedding_dim": 512,
            "encoder_n_convolutions": 3,
            "encoder_kernel_size": 5,
            "attention_dim": 128,
            "attention_location_n_filters": 32,
            "attention_location_kernel_size": 31,
            "decoder_rnn_dim": 1024,
            "prenet_dim": 256,
            "postnet_embedding_dim": 512,
            "postnet_kernel_size": 5,
            "postnet_n_convolutions": 5,
            "gate_threshold": 0.5
        },
        
        "train_config": {
            "batch_size": 16,  # Reduced for Colab
            "eval_batch_size": 8,
            "epochs": 100,
            "lr": 0.001,
            "weight_decay": 1e-6,
            "grad_clip": 1.0,
            "print_step": 10,
            "save_step": 500,
            "log_step": 100,
            "mixed_precision": True,  # Enable for faster training
            "num_loader_workers": 2,
            "num_eval_loader_workers": 2
        },
        
        "test_sentences": [
            "שבת שלום און א גוטן טאג צו אלע",
            "איך וויל רעדן אויף יידיש מיט אייך",
            "דאס איז א טעסט פון מיין נייע סיסטעם"
        ]
    }
    
    # Save config
    config_file = "yiddish_colab_config.json"
    with open(config_file, 'w', encoding='utf-8') as f:
        json.dump(config, f, ensure_ascii=False, indent=2)
    
    print(f"✅ Configuration saved: {config_file}")
    return config_file

# Create config
config_file = create_training_config(unique_chars)
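
The audio settings in the configuration are related by simple arithmetic, which is worth checking when changing the sample rate. The sketch below works through the values used above (16 kHz, hop 256, window 1024); `mel_frames` is an illustrative helper that ignores padding.

```python
sample_rate = 16000
hop_length = 256
win_length = 1024

frame_shift_ms = 1000 * hop_length / sample_rate   # 16.0 ms between frames
window_ms = 1000 * win_length / sample_rate        # 64.0 ms analysis window
frames_per_second = sample_rate / hop_length       # 62.5 frames per second
nyquist_hz = sample_rate / 2                       # 8000.0 Hz, equal to mel_fmax


def mel_frames(duration_seconds):
    """Approximate number of spectrogram frames for a clip (ignores padding)."""
    return int(duration_seconds * sample_rate // hop_length)


print(frame_shift_ms, window_ms, frames_per_second, nyquist_hz)
print(mel_frames(10))  # 625
```

A 10-second clip therefore yields roughly 625 decoder frames, which is one reason shorter clips are easier for attention-based models to align.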

In [None]:
# Training with TTS library
def train_tacotron2_model(config_file):
    """Train Tacotron2 model using the TTS library"""
    
    print("🚀 Starting Tacotron2 training...")
    print("This will take time. Monitor the progress below.")
    print("\nTip: Training can be interrupted and resumed from checkpoints\n")
    
    # Run TTS training command
    !python -m TTS.bin.train_tts \
        --config_path {config_file} \
        --coqpit.output_path "./yiddish_tts_training/" \
        --coqpit.datasets.0.path "./" \
        --coqpit.datasets.0.meta_file_train "yiddish_train_data.txt" \
        --coqpit.train_config.batch_size 8 \
        --coqpit.train_config.epochs 50
    
    print("\n✅ Training completed or interrupted!")
    print("Check ./yiddish_tts_training/ for checkpoints")

# Uncomment to start training
# train_tacotron2_model(config_file)

## 6. Simplified Training Script

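A lighter alternative that loads the pre-trained XTTS v2 model and uses your recordings as reference voices; no model weights are trained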

In [None]:
# Simplified training approach
def simple_train_yiddish_tts():
    """Load the pre-trained multilingual model and check for reference audio (no weights are updated)"""
    
    print("🎓 Preparing multilingual model for Yiddish...")
    
    try:
        # XTTS v2 is a multilingual model that supports voice cloning
        import torch
        from TTS.api import TTS
        
        # Load pre-trained model
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
        
        print(f"Model loaded on {device}")
        
        # Prepare your audio-text pairs
        audio_files = list(Path("tts_segments/audio").glob("*.wav"))
        
        if len(audio_files) > 0:
            print(f"Found {len(audio_files)} audio files for fine-tuning")
            
            # Note: no fine-tuning happens here. Voice cloning adapts the
            # speaker's voice from reference audio; it does not teach the
            # model a new language.
            print("\n✅ Model ready for Yiddish generation!")
            print("XTTS v2 clones the reference speaker's voice; it does not learn Yiddish from these samples.")
            print("Use your audio samples as reference voices.")
        else:
            print("⚠️ No audio files found. Upload your data first.")
            
    except Exception as e:
        print(f"Error: {e}")
        print("\nAlternative: Use the zero-shot generation approach above")

# Run simplified training
simple_train_yiddish_tts()

## 7. Utilities & Helpers

In [None]:
# Download trained models to local machine
def download_models():
    """Download trained models and outputs"""
    from google.colab import files
    import shutil
    
    # Create archive of training outputs
    if os.path.exists("yiddish_tts_training"):
        print("Creating archive of training outputs...")
        shutil.make_archive("yiddish_tts_models", 'zip', "yiddish_tts_training")
        files.download("yiddish_tts_models.zip")
        print("✅ Models downloaded!")
    
    # Download generated audio files
    for wav_file in Path(".").glob("yiddish_output*.wav"):
        files.download(str(wav_file))
        print(f"Downloaded: {wav_file}")

# Uncomment to download
# download_models()

In [None]:
# Interactive text-to-speech interface
def interactive_tts():
    """Interactive interface for testing TTS"""
    
    print("🎤 Interactive Yiddish TTS")
    print("Enter Yiddish text to generate speech:")
    print("(Type 'quit' to exit)\n")
    
    while True:
        text = input("Yiddish text: ").strip()
        
        if text.lower() == 'quit':
            break
            
        if text:
            output_file = f"output_{hash(text)}.wav"
            success = generate_yiddish_speech(text, output_file)
            if success:
                print(f"✅ Generated: {output_file}\n")
        else:
            print("Please enter some text\n")

# Run interactive interface
# interactive_tts()
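
Python's built-in `hash()` for strings is salted per process and can be negative, so file names derived from it change between sessions. If stable names are wanted, one possibility is a short digest from `hashlib`; `output_name_for` below is a hypothetical helper, not something the notebook defines.

```python
import hashlib


def output_name_for(text, prefix="output"):
    """Return a deterministic .wav filename derived from the UTF-8 text."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}.wav"


print(output_name_for("שבת שלום"))
```

The same phrase then always maps to the same file, which also makes it easy to skip regeneration when the file already exists.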

## 📝 Tips for Google Colab

### GPU Usage
- Always enable GPU: Runtime → Change runtime type → Hardware accelerator → T4 GPU
- Monitor GPU usage: `!nvidia-smi`
- Clear GPU memory if needed: `torch.cuda.empty_cache()`

### Session Management
- Colab sessions time out after ~90 minutes of inactivity
- Use Google Drive to persist models and data
- Save checkpoints frequently during training
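
As a sketch of persisting checkpoints, the helper below copies a local training directory into another location such as a mounted Drive folder. `backup_checkpoints` is a hypothetical name, and the destination path in the usage note is an assumption based on the project directory created earlier.

```python
import shutil
from pathlib import Path


def backup_checkpoints(src_dir, dst_dir):
    """Copy src_dir into dst_dir, overwriting files that already exist.

    Returns the number of files present under dst_dir afterwards.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if not src.is_dir():
        return 0
    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+
    return sum(1 for p in dst.rglob("*") if p.is_file())
```

For example, `backup_checkpoints("yiddish_tts_training", f"{project_dir}/checkpoints")` could be run periodically during a long session.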

### Memory Management
- Reduce batch size if you get out-of-memory errors
- Clear variables: `del variable_name`
- Restart runtime if memory issues persist

### Data Tips
- Upload data as ZIP and extract for faster upload
- Use Google Drive for large datasets
- Keep audio files under 10 seconds for better training
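
To find clips longer than the suggested 10 seconds, durations can be read from WAV headers with the standard `wave` module. This sketch assumes PCM-encoded WAV files (the `wave` module does not read floating-point WAVs); both function names are illustrative.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_long_clips(paths, max_seconds=10.0):
    """Return the paths whose duration exceeds max_seconds."""
    return [p for p in paths if wav_duration_seconds(p) > max_seconds]
```

Clips flagged by `find_long_clips` can then be split at pauses before training.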

### Training Tips
- Start with a small number of epochs (10-20) to test
- Monitor loss curves in TensorBoard
- Save checkpoints every 500-1000 steps
- Use mixed precision training for faster speed
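
Checkpoint intervals are counted in steps, not epochs, so it helps to translate between the two. The sketch below uses a hypothetical dataset of 1,000 clips with the batch size of 16 and `save_step` of 500 from the configuration above.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def epochs_until_first_checkpoint(num_samples, batch_size, save_step):
    """Epochs needed before `save_step` steps have run."""
    return math.ceil(save_step / steps_per_epoch(num_samples, batch_size))


print(steps_per_epoch(1000, 16))                     # 63
print(epochs_until_first_checkpoint(1000, 16, 500))  # 8
```

With a small dataset, a 10-epoch test run may therefore produce only one checkpoint unless `save_step` is lowered.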

## 🎯 Quick Start Commands

```python
# 1. Generate speech immediately (no training)
generate_yiddish_speech("שבת שלום", "output.wav")

# 2. Prepare your data
metadata_file, unique_chars = prepare_training_data()

# 3. Create configuration
config_file = create_training_config(unique_chars)

# 4. Train model (optional)
train_tacotron2_model(config_file)
```