# 🎤 Kurdish TTS Training with Coqui TTS

Train a custom Kurdish (Kurmanji) voice model using your own audio samples.

## 📋 Requirements
- Google Colab with GPU (T4 or better)
- 30 minutes to 2 hours of Kurdish audio recordings
- Corresponding text transcriptions
- 2-6 hours for training

## 🎯 What You'll Build
A custom Kurdish text-to-speech model that can:
- Speak any Kurdish text in a natural voice
- Run on your Raspberry Pi or local server
- Generate high-quality audio files

---

## 🚀 Step 1: Check GPU Availability

First, verify that you have GPU access. Go to **Runtime → Change runtime type** and select **GPU (T4)**.

In [None]:
!nvidia-smi

## 📦 Step 2: Install Dependencies

Install Coqui TTS and required libraries.

In [None]:
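# Install Coqui TTS. Quote the version specifier: an unquoted ">" is read by
# the shell as output redirection. The original "TTS" package on PyPI stops at
# 0.22.0; newer releases come from the maintained fork "coqui-tts", which is
# still imported as `TTS`.
!pip install -q "coqui-tts>=0.27.0"
!pip install -q pydub librosa soundfile

print("✅ Dependencies installed successfully!")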

## 📁 Step 3: Prepare Your Training Data

### Data Format
You need:
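1. **Audio files**: MP3 or WAV files (16 kHz mono recommended; Step 4 converts them)
2. **Metadata file**: a UTF-8 text file named `metadata.csv` with one `audio_path|transcription` pair per line, where `audio_path` is relative to `kurdish_data/`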

### Example Directory Structure
```
kurdish_data/
├── wavs/
│   ├── audio_001.wav
│   ├── audio_002.wav
│   └── ...
└── metadata.csv
```

### Example Metadata Format
```
wavs/audio_001.wav|Silav, tu çawa yî?
wavs/audio_002.wav|Ez bi xêr im, spas!
wavs/audio_003.wav|Navê min Ahmed e.
```
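Before training, it helps to confirm that every metadata line parses and points at a real file. The sketch below is a minimal, hypothetical helper (`read_metadata` is not part of Coqui TTS); it assumes the two-column `path|text` format shown above, with paths relative to `kurdish_data/`.

```python
from pathlib import Path


def read_metadata(metadata_path, data_root="kurdish_data"):
    """Parse a pipe-delimited metadata file into (audio_path, text) pairs.

    Returns (entries, problems): entries is a list of (Path, str) tuples,
    problems is a list of readable messages for lines that were skipped.
    """
    root = Path(data_root)
    entries, problems = [], []
    with open(metadata_path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line:
                continue  # blank lines are ignored, not counted as samples
            # Split on the first pipe only, so the transcription stays intact
            parts = line.split("|", 1)
            if len(parts) != 2 or not parts[1].strip():
                problems.append(f"line {lineno}: expected 'path|text', got {line!r}")
                continue
            audio_path = root / parts[0].strip()
            if not audio_path.exists():
                problems.append(f"line {lineno}: missing audio file {audio_path}")
                continue
            entries.append((audio_path, parts[1].strip()))
    return entries, problems
```

For example, `entries, problems = read_metadata("kurdish_data/metadata.csv")` followed by printing `problems` lists every line that needs fixing before you continue.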

### Upload Your Data
Upload your audio files and metadata.csv to Google Colab.

In [None]:
from google.colab import files
import os

# Create directories
os.makedirs('kurdish_data/wavs', exist_ok=True)

print("üì§ Please upload your audio files to kurdish_data/wavs/")
print("üì§ Then upload your metadata.csv to kurdish_data/")
print("\nAlternatively, manually upload using the file browser on the left.")
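If you prefer uploading a single archive, a minimal sketch is to zip the whole `kurdish_data/` folder locally and extract it in Colab. The helper name `extract_dataset_zip` and the archive layout (a top-level `kurdish_data/` folder matching the tree in Step 3) are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def extract_dataset_zip(zip_path, dest_dir="."):
    """Extract a dataset archive and return the path to its metadata.csv.

    Assumes the archive contains a top-level 'kurdish_data/' folder laid out
    like the Step 3 tree (a convention of this notebook, not of Coqui TTS).
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # creates dest and any subfolders as needed
    metadata = dest / "kurdish_data" / "metadata.csv"
    if not metadata.exists():
        raise FileNotFoundError(f"{metadata} not found after extracting {zip_path}")
    return metadata
```

In Colab, `files.upload()` saves the chosen file into the working directory, so after uploading a file named `kurdish_data.zip` you could call `extract_dataset_zip("kurdish_data.zip")`.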

## 🔧 Step 4: Preprocess Audio Files

Convert every audio file to 16 kHz mono WAV and write the results to `kurdish_data/wavs_processed/`.

In [None]:
import librosa
import soundfile as sf
import os
from pathlib import Path

def preprocess_audio(input_path, output_path, target_sr=16000):
    """Convert audio to 16kHz mono WAV format."""
    try:
        # Load audio
        audio, sr = librosa.load(input_path, sr=target_sr, mono=True)
        
        # Save as WAV
        sf.write(output_path, audio, target_sr)
        return True
    except Exception as e:
        print(f"Error processing {input_path}: {e}")
        return False

# Process all audio files
wavs_dir = Path('kurdish_data/wavs')
processed_dir = Path('kurdish_data/wavs_processed')
processed_dir.mkdir(exist_ok=True)

audio_files = list(wavs_dir.glob('*.wav')) + list(wavs_dir.glob('*.mp3'))
print(f"Found {len(audio_files)} audio files")

successful = 0
for audio_file in audio_files:
    output_file = processed_dir / f"{audio_file.stem}.wav"
    if preprocess_audio(audio_file, output_file):
        successful += 1
        if successful % 100 == 0:
            print(f"Processed {successful}/{len(audio_files)} files...")

print(f"✅ Successfully preprocessed {successful}/{len(audio_files)} audio files")
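The loop writes converted files to `wavs_processed/`, but `metadata.csv` still points at `wavs/` (and at `.mp3` names for MP3 sources). A small, hypothetical helper can write a second metadata file whose paths match the converted filenames; the function name and the output name `metadata_processed.csv` are assumptions.

```python
from pathlib import Path


def remap_metadata(src_meta, dst_meta, processed_subdir="wavs_processed"):
    """Write a copy of the metadata whose paths point at the preprocessed WAVs.

    'wavs/audio_001.mp3|text' becomes 'wavs_processed/audio_001.wav|text',
    mirroring the f"{audio_file.stem}.wav" naming used in the loop above.
    Returns the number of lines written.
    """
    written = 0
    with open(src_meta, encoding="utf-8") as src, \
         open(dst_meta, "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.strip()
            if not line or "|" not in line:
                continue  # skip blank or malformed lines
            path, text = line.split("|", 1)
            new_path = f"{processed_subdir}/{Path(path).stem}.wav"
            dst.write(f"{new_path}|{text}\n")
            written += 1
    return written
```

Usage: `remap_metadata("kurdish_data/metadata.csv", "kurdish_data/metadata_processed.csv")`.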

## ✅ Step 5: Verify Dataset

Check the dataset structure and play a sample audio file.

In [None]:
import os
from IPython.display import Audio, display
import librosa

# Load metadata
metadata_path = 'kurdish_data/metadata.csv'

if os.path.exists(metadata_path):
    # Read metadata, skipping blank lines so they are not counted as samples
    with open(metadata_path, 'r', encoding='utf-8') as f:
        lines = [line for line in f if line.strip()]
    
    print("📊 Dataset Statistics:")
    print(f"   Total samples: {len(lines)}")
    
    # Estimate total duration from the first 100 clips (for speed)
    total_duration = 0
    measured = 0
    for line in lines[:100]:
        parts = line.strip().split('|')
        if len(parts) >= 2:
            audio_path = f"kurdish_data/{parts[0]}"
            if os.path.exists(audio_path):
                total_duration += librosa.get_duration(path=audio_path)
                measured += 1
    
    if measured:
        # Average over the clips actually measured, then extrapolate
        avg_duration = total_duration / measured
        estimated_total = avg_duration * len(lines) / 60  # in minutes
        print(f"   Estimated duration: {estimated_total:.1f} minutes")
        print(f"   Average clip length: {avg_duration:.2f} seconds")
    else:
        print("   ❌ None of the checked audio paths exist; check the paths in metadata.csv")
    
    # Show sample
    print("\n📝 Sample entries:")
    for i, line in enumerate(lines[:5]):
        parts = line.strip().split('|')
        if len(parts) >= 2:
            print(f"   {i+1}. {parts[1][:50]}...")
    
    # Play first audio sample
    if lines:
        print("\n🔊 Playing first audio sample:")
        first_audio = f"kurdish_data/{lines[0].strip().split('|')[0]}"
        if os.path.exists(first_audio):
            display(Audio(first_audio))
    
    print("\n✅ Dataset verified successfully!")
else:
    print("❌ metadata.csv not found! Please upload your metadata file.")
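The cell above extrapolates from the first 100 clips. For the preprocessed files you can cheaply read every header instead: soundfile writes 16-bit PCM WAV by default, which the standard-library `wave` module can open. This is a sketch; the helper names are hypothetical, and the 1-15 second bounds are illustrative defaults rather than Coqui requirements.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Duration in seconds of a PCM WAV file, read from its header."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def summarize_durations(wav_dir, min_s=1.0, max_s=15.0):
    """Total duration of all WAVs in wav_dir plus clips outside [min_s, max_s].

    Returns (total_seconds, outliers) where outliers is a list of
    (filename, rounded_duration) tuples, in sorted filename order.
    """
    total = 0.0
    outliers = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        d = wav_duration(path)
        total += d
        if not (min_s <= d <= max_s):
            outliers.append((path.name, round(d, 2)))
    return total, outliers
```

Usage: `total, outliers = summarize_durations("kurdish_data/wavs_processed")`, then `print(f"{total / 60:.1f} minutes")` gives the exact total rather than an estimate.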

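## ⚙️ Step 6: Configure Training Parameters

Record the training parameters in `training_config.json`. This file is a plain summary for your own reference: it is not in the format Coqui TTS's trainer loads, and no later cell in this notebook reads it.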

In [None]:
import json

# Training configuration
config = {
    "model": "tts_models/multilingual/multi-dataset/xtts_v2",
    "dataset_path": "kurdish_data",
    "output_path": "kurdish_tts_model",
    "batch_size": 32,
    "num_epochs": 1000,
    "learning_rate": 0.0001,
    "language": "ku",
    "audio_sample_rate": 16000,
    "text_cleaner": "multilingual_cleaners",
    "use_gpu": True
}

# Save configuration
with open('training_config.json', 'w') as f:
    json.dump(config, f, indent=2)

print("⚙️ Training Configuration:")
print(json.dumps(config, indent=2))
print("\n✅ Configuration saved!")
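Coqui's training recipes load data through a dataset *formatter* rather than a JSON file like the one above. The built-in `ljspeech` formatter expects LJSpeech-style rows (an ID without extension and three columns), so the two-column `wavs/xxx.wav|text` format from Step 3 likely needs a custom one. Coqui's "Formatting your dataset" guide describes formatters that return dicts with `text`, `audio_file`, `speaker_name` and `root_path`, passed in via `load_tts_samples(dataset_config, formatter=...)`. The sketch below follows that shape; verify it against your installed version, and note that the speaker label is a hypothetical placeholder.

```python
import os


def kurdish_formatter(root_path, meta_file, **kwargs):
    """Custom formatter for 'wavs/xxx.wav|text' metadata lines.

    Returns a list of dicts in the shape described by Coqui TTS's dataset
    formatting guide. Single-speaker dataset assumed.
    """
    items = []
    speaker_name = "kurdish_speaker"  # hypothetical single-speaker label
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or "|" not in line:
                continue  # skip blank or malformed lines
            audio_rel, text = line.split("|", 1)
            items.append({
                "text": text.strip(),
                "audio_file": os.path.join(root_path, audio_rel.strip()),
                "speaker_name": speaker_name,
                "root_path": root_path,
            })
    return items
```

Calling `kurdish_formatter("kurdish_data", "metadata.csv")` directly is a quick way to check the parsed samples before wiring the function into a training recipe.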

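## 🎯 Step 7: Initialize TTS Model

Load the pre-trained XTTS v2 model.

**Note:** the XTTS v2 model card lists 17 supported languages (English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Japanese, Hungarian, Korean and Hindi); Kurdish is not one of them. Voice cloning with your recordings can reproduce a speaker's timbre, but correct Kurmanji pronunciation would require fine-tuning or training on your own Kurdish dataset.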

In [None]:
from TTS.api import TTS
import torch

# Check CUDA availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🔧 Using device: {device}")

# Initialize TTS model. The first download asks you to accept the Coqui
# Public Model License (CPML); setting os.environ["COQUI_TOS_AGREED"] = "1"
# before this line skips the prompt if you have already accepted it.
print("📥 Loading XTTS v2 model... (this may take a few minutes)")
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

print("\n✅ Model loaded successfully!")
print("\n💡 XTTS v2 does not list Kurdish among its supported languages.")
print("For custom voices, use voice cloning with a reference speaker.")
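Because the model card's language list is fixed, a small guard can reject an unsupported code such as `ku` before a slow synthesis call fails. This is a hypothetical helper; the set below is transcribed from the XTTS v2 model card, so update it if your model version differs.

```python
# Language codes listed on the XTTS v2 model card (17 languages).
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def check_xtts_language(code):
    """Return the normalized code, or raise early if XTTS v2 does not list it."""
    normalized = code.lower()
    if normalized not in XTTS_V2_LANGUAGES:
        raise ValueError(
            f"XTTS v2 does not list language {code!r}; "
            f"supported codes: {sorted(XTTS_V2_LANGUAGES)}"
        )
    return normalized
```

For instance, `check_xtts_language("ku")` raises a `ValueError` that names the supported codes, instead of failing partway through a batch of sentences.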

## 🧪 Step 8: Test the Model

Generate speech samples using the pre-trained model or your custom voice.

In [None]:
import os
from IPython.display import Audio, display

# XTTS v2 does not list Kurdish ("ku"), so language="ku" is expected to fail.
# As an experiment, this uses Turkish ("tr"): the Kurmanji Latin (Hawar)
# alphabet is closely related to the Turkish one. Pronunciation will not be
# true Kurdish; change this if you find a better-sounding supported code.
LANGUAGE = "tr"

# Test sentences in Kurdish
test_sentences = [
    "Silav, tu çawa yî?",
    "Ez bi xêr im, spas!",
    "Navê min Ahmed e.",
    "Ez ji Kurdistanê me.",
    "Êvara te bi xêr."
]

print("🎤 Generating speech samples...\n")

# Choose between a built-in XTTS speaker and your own reference recording
use_reference = input("Use custom reference voice? (y/n): ")

reference_audio = None
if use_reference.lower() == 'y':
    # Use your uploaded audio as reference
    reference_audio = "kurdish_data/wavs_processed/audio_001.wav"
    print(f"Using reference audio: {reference_audio}\n")

generated = 0
for i, text in enumerate(test_sentences, 1):
    print(f"{i}. {text}")
    
    # Generate audio
    output_path = f"test_output_{i}.wav"
    
    try:
        if reference_audio and os.path.exists(reference_audio):
            # Use voice cloning with reference audio
            tts.tts_to_file(
                text=text,
                file_path=output_path,
                speaker_wav=reference_audio,
                language=LANGUAGE
            )
        else:
            # XTTS needs either speaker_wav or a built-in speaker name
            tts.tts_to_file(
                text=text,
                file_path=output_path,
                speaker=tts.speakers[0],
                language=LANGUAGE
            )
        
        # Play audio
        display(Audio(output_path))
        generated += 1
    except Exception as e:
        print(f"Error: {e}")
    print()

print(f"✅ Generated {generated}/{len(test_sentences)} test samples")

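## 📦 Step 9: Export and Download Model

Package the reference audio, test outputs, and usage instructions for use on a Raspberry Pi or local server. The XTTS v2 weights are not included in the archive; Coqui TTS downloads them on first use on the target machine.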

In [None]:
import shutil
from google.colab import files
from datetime import datetime

# Create export directory
export_dir = "kurdish_tts_export"
os.makedirs(export_dir, exist_ok=True)

# Copy reference audio for voice cloning (if used)
if reference_audio and os.path.exists(reference_audio):
    shutil.copy(reference_audio, f"{export_dir}/reference_speaker.wav")
    print("✅ Reference audio copied")

# Copy test outputs
for i in range(1, 6):
    test_file = f"test_output_{i}.wav"
    if os.path.exists(test_file):
        shutil.copy(test_file, f"{export_dir}/{test_file}")
print("✅ Test outputs copied")

# Create usage instructions
instructions = f"""# Kurdish TTS Model - Usage Instructions

## Installation
```bash
pip install "coqui-tts>=0.27.0"
```

## Usage
```python
from TTS.api import TTS

# Initialize model
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# XTTS v2 does not list Kurdish ("ku"); pass a supported language code.
# "tr" is an experimental stand-in and does not give true Kurdish pronunciation.
LANGUAGE = "tr"

# Generate speech with your custom voice (if reference_speaker.wav is included)
tts.tts_to_file(
    text="Silav, tu çawa yî?",
    file_path="output.wav",
    speaker_wav="reference_speaker.wav",
    language=LANGUAGE
)

# Or use one of the built-in XTTS speakers (no reference file needed)
tts.tts_to_file(
    text="Silav, tu çawa yî?",
    file_path="output.wav",
    speaker=tts.speakers[0],
    language=LANGUAGE
)
```

## Integration with TTS_STT_Kurdifer
1. Install: pip install -r requirements.txt
2. Copy reference_speaker.wav to your project directory (optional)
3. Update tts_stt_service_base44.py to use the reference speaker
4. Test with: python tts_stt_service_base44.py

## Model Details
- Model: XTTS v2 (Multilingual)
- Language: Kurdish text, synthesized through a supported XTTS language code
- Pre-trained: Yes (XTTS v2 base model; Kurdish is not among its supported languages)
- Voice Cloning: Supported (optional)

Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
"""

with open(f"{export_dir}/README.txt", 'w', encoding='utf-8') as f:
    f.write(instructions)

# Create archive
print("\nüì¶ Creating export package...")
shutil.make_archive('kurdish_tts_model', 'zip', export_dir)

print("‚úÖ Model exported successfully!")
print("\nüì• Downloading model package...")

# Download the package
files.download('kurdish_tts_model.zip')

print("\n‚úÖ Download complete!")
print("\nüìù Next steps:")
print("1. Extract the ZIP file")
print("2. Read README.txt for usage instructions")
print("3. Integrate with your TTS_STT_Kurdifer project")
print("4. Test on your Raspberry Pi or local server")
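Before copying the archive to the target machine, you can list what it actually contains. A minimal sketch; `check_export` is a hypothetical helper, and the default required entry is only `README.txt` because the reference audio and test outputs are copied only when they exist.

```python
import zipfile


def check_export(zip_path, required=("README.txt",)):
    """Return the archive's entry names and any required entries it lacks."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    missing = [name for name in required if name not in names]
    return names, missing
```

For example, `check_export("kurdish_tts_model.zip", required=("README.txt", "reference_speaker.wav"))` reports whether the reference voice made it into the package.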

## 💡 Tips for Better Results

### Data Quality
- Use clear, noise-free recordings
- Consistent speaking pace and volume
- Diverse vocabulary and sentence structures
- At least 30 minutes of audio (more is better)

### Voice Cloning Best Practices
- Choose a reference audio with clear pronunciation
- 3-10 seconds is optimal length for reference
- Avoid background noise in reference audio
- Pass several clips of the same speaker as a list to `speaker_wav` to give the model more reference audio
- Test different reference samples to find the best one
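Coqui's XTTS documentation shows `speaker_wav` accepting a list of files from the same speaker. To shortlist candidate clips in the 3-10 second range above, a sketch like this reads WAV headers with the standard-library `wave` module. `pick_reference_clips` is hypothetical, and duration alone says nothing about noise or pronunciation, so listen before choosing.

```python
import wave
from pathlib import Path


def pick_reference_clips(wav_dir, min_s=3.0, max_s=10.0, limit=3):
    """Return up to `limit` WAV paths (sorted by name) lasting min_s..max_s seconds."""
    picks = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            duration = w.getnframes() / w.getframerate()
        if min_s <= duration <= max_s:
            picks.append(str(path))
        if len(picks) == limit:
            break
    return picks
```

The result can be passed straight through, e.g. `speaker_wav=pick_reference_clips("kurdish_data/wavs_processed")`.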

### Performance Tips
- First generation takes longer (model initialization)
- Subsequent generations are faster (model cached)
- GPU recommended for production use
- CPU works but is slower
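On a long-running server, later generations stay fast only if the process keeps the loaded model instead of reloading it per request. One way to do that is to memoize the loader; the sketch below wraps a factory you supply in `functools.lru_cache`. `make_cached_loader` is a hypothetical helper, not a Coqui API.

```python
from functools import lru_cache


def make_cached_loader(factory):
    """Wrap factory(model_name, device) so each pair is built only once.

    Always call the returned function with both arguments positionally:
    lru_cache keys on the exact call signature.
    """
    @lru_cache(maxsize=None)
    def get_model(model_name, device):
        return factory(model_name, device)

    return get_model
```

With Coqui TTS installed, the factory could be `lambda name, device: TTS(name).to(device)`, matching the Step 7 call, so that repeated `get_tts("tts_models/multilingual/multi-dataset/xtts_v2", "cpu")` calls reuse one model instance.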

---

## 📚 Additional Resources

- [Coqui TTS Documentation](https://docs.coqui.ai/en/latest/)
- [XTTS v2 Model Card](https://huggingface.co/coqui/XTTS-v2)
- [Mozilla Common Voice Kurdish](https://commonvoice.mozilla.org/ku)
- [TTS_STT_Kurdifer Repository](https://github.com/T1Agit/TTS_STT_Kurdifer)

---

**Made with ❤️ for the Kurdish community**