# 🎤 Kurdish (Kurmanji) TTS Training with XTTS v2

This notebook fine-tunes XTTS v2 on Kurdish Common Voice data to add proper Kurdish language support.

## Prerequisites
- Google Colab with GPU (Runtime → Change runtime type → GPU)
- Mozilla Common Voice Kurdish corpus uploaded to Google Drive

## Steps
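1. Check GPU availability
2. Mount Google Drive
3. Install dependencies
4. Clone the training script
5. Configure paths
6. Run the training script (loads and prepares the Common Voice data, then fine-tunes XTTS v2)
7. Verify the output
8. Download the trained model (optional)
9. Test the model (optional)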

## 1. Check GPU Availability

In [None]:
!nvidia-smi

## 2. Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 3. Install Dependencies

In [None]:
# Quote the version specifiers: an unquoted ">" is treated by the shell as output redirection
!pip install -q "coqui-tts>=0.27.0"
!pip install -q "librosa>=0.11.0"
!pip install -q soundfile
!pip install -q pandas
print("✅ Dependencies installed!")

## 4. Clone Training Script from Repository

In [None]:
!git clone https://github.com/T1Agit/TTS_STT_Kurdifer.git
%cd TTS_STT_Kurdifer

## 5. Configure Paths

Update these paths to match your Google Drive structure:

In [None]:
# Path to your Common Voice Kurdish corpus in Google Drive
# Example: /content/drive/MyDrive/CommonVoice/cv-corpus-24.0-2025-12-05-kmr/cv-corpus-24.0-2025-12-05/kmr/
CORPUS_PATH = "/content/drive/MyDrive/CommonVoice/cv-corpus-24.0-2025-12-05-kmr/cv-corpus-24.0-2025-12-05/kmr/"

# Output directory for processed audio (local to Colab)
OUTPUT_DIR = "/content/processed_audio"

# Model output directory (will be saved to Drive)
MODEL_DIR = "/content/drive/MyDrive/kurdish_tts_model"

# Maximum samples to process (None for all, or e.g. 1000 for quick test)
MAX_SAMPLES = None  # Set to 1000 for quick test

print(f"Corpus path: {CORPUS_PATH}")
print(f"Output dir: {OUTPUT_DIR}")
print(f"Model dir: {MODEL_DIR}")
print(f"Max samples: {MAX_SAMPLES or 'All'}")
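
Before launching a multi-hour run, it may be worth confirming that the corpus path points at a complete Common Voice language folder. The sketch below assumes the usual Common Voice layout (a `validated.tsv` file next to a `clips/` directory of MP3 files). The helper name `check_corpus_layout` is hypothetical and is not part of the repository's training script.

```python
from pathlib import Path


def check_corpus_layout(corpus_path):
    """Return a list of problems; an empty list means the layout looks usable."""
    root = Path(corpus_path)
    if not root.is_dir():
        return [f"corpus directory not found: {root}"]
    problems = []
    # Assumed layout: <corpus>/validated.tsv and <corpus>/clips/*.mp3
    if not (root / "validated.tsv").is_file():
        problems.append("missing validated.tsv")
    clips_dir = root / "clips"
    if not clips_dir.is_dir():
        problems.append("missing clips/ directory")
    elif next(clips_dir.glob("*.mp3"), None) is None:
        problems.append("clips/ contains no .mp3 files")
    return problems
```

In this notebook it could be called as `check_corpus_layout(CORPUS_PATH)` once the paths are set. A non-empty result usually means the path needs adjusting or Drive is not mounted.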

## 6. Run Training Script

This will:
- Load the Common Voice TSV metadata
- Filter for quality clips (2+ upvotes, 2-15s duration)
- Convert MP3 → WAV (22050 Hz, mono)
- Prepare training data
- Initialize XTTS v2 trainer

**Note:** Full training may take 2-4 hours depending on dataset size.
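
The quality filter listed above can be sketched in a few lines. This is not the code in `train_kurdish_xtts.py`; it is a minimal illustration that assumes the `path` and `up_votes` columns of Common Voice's `validated.tsv`, and that clip durations are available as a mapping from file name to milliseconds (how that mapping is built is left open here). The function names `read_tsv` and `select_clips` are hypothetical.

```python
import csv


def read_tsv(tsv_path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: sentences may contain literal quote characters
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def select_clips(rows, durations_ms, min_upvotes=2, min_seconds=2.0,
                 max_seconds=15.0, max_samples=None):
    """Keep rows with enough upvotes and a duration within [min_seconds, max_seconds]."""
    selected = []
    for row in rows:
        if max_samples is not None and len(selected) >= max_samples:
            break
        duration_ms = durations_ms.get(row["path"])
        if duration_ms is None:
            continue  # Unknown duration: skip rather than guess
        if int(row.get("up_votes") or 0) < min_upvotes:
            continue
        if not (min_seconds <= duration_ms / 1000.0 <= max_seconds):
            continue
        selected.append(row)
    return selected
```

The `max_samples` argument plays the same role as `MAX_SAMPLES` in the configuration cell: `None` keeps every clip that passes the filter, while a small number gives a quick test subset.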

In [None]:
import shlex

# Build the command as an argument list
cmd = [
    "python", "train_kurdish_xtts.py",
    "--corpus_path", CORPUS_PATH,
    "--output_dir", OUTPUT_DIR,
    "--model_dir", MODEL_DIR,
    "--epochs", "10",
    "--batch_size", "4"  # Adjust based on Colab GPU memory
]

if MAX_SAMPLES is not None:
    cmd.extend(["--max_samples", str(MAX_SAMPLES)])

# Run training; shlex.join quotes each argument so paths containing spaces survive the shell
!{shlex.join(cmd)}

## 7. Verify Output

Check the generated files:

In [None]:
import os

print("📁 Model Directory Contents:")
!ls -lh {MODEL_DIR}

print("\n📄 Config File:")
config_path = os.path.join(MODEL_DIR, "config.json")
if os.path.exists(config_path):
    !cat {config_path}
else:
    print("❌ Config file not found")

print("\n📊 Training Manifest:")
manifest_path = os.path.join(MODEL_DIR, "training_manifest.json")
if os.path.exists(manifest_path):
    import json
    with open(manifest_path, 'r') as f:
        manifest = json.load(f)
    print(f"Total training samples: {len(manifest)}")
    print(f"First sample: {manifest[0] if manifest else 'None'}")
else:
    print("❌ Manifest file not found")

## 8. Download Model (Optional)

Download the trained model to your local machine:

In [None]:
from google.colab import files
import shutil

# Create a zip file of the model directory
print("📦 Creating zip file...")
shutil.make_archive("/content/kurdish_tts_model", 'zip', MODEL_DIR)

print("⬇️ Downloading...")
files.download("/content/kurdish_tts_model.zip")

print("✅ Download complete!")
print("\n💡 Next steps:")
print("   1. Extract the zip file on your local machine")
print("   2. Place in: TTS_STT_Kurdifer/models/kurdish/")
print("   3. Run: python tts_stt_service_base44.py")

## 9. Test the Model (Optional)

Test the model directly in Colab:

In [None]:
# This is a placeholder - actual testing would require loading the fine-tuned model
print("Testing placeholder...")
print("For actual testing, use the tts_stt_service_base44.py script on your local machine.")

## 📚 Additional Resources

- [Mozilla Common Voice Kurdish Dataset](https://datacollective.mozillafoundation.org/datasets/cmj8u3pbq00dtnxxbz4yoxc4i)
- [Coqui TTS Documentation](https://docs.coqui.ai/)
- [XTTS v2 Model Card](https://huggingface.co/coqui/XTTS-v2)
- [GitHub Repository](https://github.com/T1Agit/TTS_STT_Kurdifer)