# 🎙️ XTTS Dataset Creator for Google Colab

Create professional voice datasets for XTTS training with a full-featured Gradio interface.

## 🚀 Features:
- **Multiple Input Sources:** YouTube, file upload, microphone
- **Automatic Transcription:** Faster Whisper with GPU support
- **Audio Segmentation:** Advanced VAD and quality filtering
- **Export Formats:** CSV, JSON, LJSpeech, metadata.txt
- **Statistics Dashboard:** Real-time dataset analytics
- **Google Drive Integration:** Auto-save all projects

---

## 📦 Step 1: Install Dependencies

In [None]:
%%capture

# Install PyTorch with CUDA
!pip install torch==2.1.2+cu118 torchaudio==2.1.2+cu118 --index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
# Quote version specifiers so the shell does not treat ">" as output redirection
!pip install "gradio>=4.44.0" "faster-whisper>=1.0.0" librosa soundfile pandas numpy yt-dlp pydub

print("✅ Dependencies installed!")

## 💾 Step 2: Mount Google Drive

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Create workspace
workspace = '/content/drive/MyDrive/XTTS_Datasets'
os.makedirs(workspace, exist_ok=True)

print("\n✅ Google Drive mounted!")
print(f"📍 Workspace: {workspace}")
print("💾 All datasets will be saved here automatically!")

## 🔽 Step 3: Clone Dataset Creator

In [None]:
import os
from pathlib import Path

# Clone repository
if not Path("xtts-finetune-webui-fresh").exists():
    print("🔽 Cloning repository...")
    !git clone https://github.com/Diakonrobel/Amharic_XTTS-V2_TTS.git xtts-finetune-webui-fresh
    print("✅ Repository cloned!")
else:
    print("📂 Repository exists, pulling updates...")
    !cd xtts-finetune-webui-fresh && git pull

%cd xtts-finetune-webui-fresh/dataset_creator

print("\n✅ Dataset creator ready!")
print(f"📍 Current directory: {os.getcwd()}")

## 🚀 Step 4: Launch Dataset Creator UI

In [None]:
import sys
sys.path.append('..')

# Import and launch
from app import create_interface

print("🎨 Launching Dataset Creator...\n")
print("📊 Features Available:")
print("   ✅ YouTube video processing")
print("   ✅ Audio file upload")
print("   ✅ Microphone recording")
print("   ✅ Automatic transcription")
print("   ✅ Quality filtering")
print("   ✅ Multiple export formats")
print("   ✅ Real-time statistics")
print("\n💡 All projects saved to Google Drive automatically!\n")

# Create and launch interface
demo = create_interface()
demo.launch(share=True, debug=True)

---

## 📚 Quick Guide

### 1️⃣ Create a Project
1. Go to **Project Setup** tab
2. Enter project name, language, and speaker name
3. Click **Create New Project**

### 2️⃣ Add Data (Choose One):

**🎬 YouTube:**
- Paste YouTube URL
- Adjust min/max duration sliders
- Click **Process YouTube Video**

**📁 File Upload:**
- Upload audio files (WAV, MP3, FLAC)
- Configure segmentation options
- Click **Process Audio Files**

**🎤 Recording:**
- Click microphone to record
- Optionally add manual transcription
- Click **Add Recording**

### 3️⃣ Review & Export
- Check **Dataset Overview** for statistics
- Select export format (CSV, JSON, LJSpeech, metadata.txt)
- Click **Export Dataset**
- Download the file

---

## ⚙️ Processing Options

### Segment Duration
- **Min Duration (1-5s):** Filter out very short segments
- **Max Duration (5-30s):** Split long segments
- **Recommended:** 1-15 seconds
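
The effect of the two sliders can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `enforce_duration_bounds` and the `(start, end)` tuple layout are hypothetical, and long segments are cut into equal parts for simplicity, whereas a VAD-based splitter would more likely cut at pauses.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), hypothetical layout


def enforce_duration_bounds(
    segments: List[Segment], min_s: float = 1.0, max_s: float = 15.0
) -> List[Segment]:
    """Split segments longer than max_s into equal parts, drop parts shorter than min_s."""
    result: List[Segment] = []
    for start, end in segments:
        length = end - start
        if length <= 0:
            continue  # ignore empty or inverted segments
        pieces = max(1, math.ceil(length / max_s))  # number of parts needed to stay under max_s
        step = length / pieces
        if step < min_s:
            continue  # too short to be useful for training
        for i in range(pieces):
            result.append((start + i * step, start + (i + 1) * step))
    return result


raw = [(0.0, 0.5), (1.0, 4.0), (10.0, 50.0)]
for seg_start, seg_end in enforce_duration_bounds(raw):
    print(f"{seg_start:7.2f} -> {seg_end:7.2f}")
```

With the recommended 1-15 second range, the half-second segment is dropped, the three-second segment is kept as-is, and the forty-second segment is split into three parts.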

### Quality Threshold (0-1)
- **0.3-0.5:** Accept most segments (quantity)
- **0.6-0.7:** Balanced quality/quantity ✅
- **0.8-1.0:** Strict filtering (quality)
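
To get a feel for the trade-off, the sketch below filters a handful of made-up segments at three thresholds. It assumes each segment already carries a score between 0 and 1 under a hypothetical `quality` key; how the tool actually computes that score is not covered here.

```python
from typing import Dict, List


def filter_by_quality(segments: List[Dict], threshold: float = 0.6) -> List[Dict]:
    """Keep segments whose (hypothetical) 'quality' score meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [seg for seg in segments if seg["quality"] >= threshold]


# Made-up scores, for illustration only
scored = [
    {"audio_file": "seg_001.wav", "quality": 0.42},
    {"audio_file": "seg_002.wav", "quality": 0.65},
    {"audio_file": "seg_003.wav", "quality": 0.71},
    {"audio_file": "seg_004.wav", "quality": 0.93},
]

for threshold in (0.4, 0.6, 0.8):
    kept = filter_by_quality(scored, threshold)
    print(f"threshold={threshold}: kept {len(kept)} of {len(scored)} segments")
```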

---

## 💡 Best Practices

### Audio Quality
- ✅ Clear voice, minimal background noise
- ✅ Consistent volume levels
- ✅ Sample rate: 22050 Hz or higher
- ❌ Avoid music, multiple speakers, echoes
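
The sample-rate guideline can be checked before uploading. The sketch below uses only Python's standard `wave` module, so it works for uncompressed PCM WAV files only; MP3 and FLAC would need a library such as `soundfile`. The helper name `check_wav` is hypothetical.

```python
import wave
from typing import Dict


def check_wav(path: str, min_sample_rate: int = 22050) -> Dict:
    """Read basic properties of a PCM WAV file and flag a low sample rate."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / float(sample_rate)
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": duration_s,
        "sample_rate_ok": sample_rate >= min_sample_rate,
    }
```

For example, `check_wav("my_recording.wav")` (a placeholder path) returns a dictionary whose `sample_rate_ok` entry is `False` for a 16000 Hz recording.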

### Dataset Size
- **Testing:** 5-10 minutes
- **Good Quality:** 30-60 minutes
- **Excellent Quality:** 2-4 hours
- **Professional:** 10+ hours
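
As a rough way to place a dataset in these tiers, the sketch below maps a total duration to a label. The tiers above leave gaps (for example between 10 and 30 minutes), so this sketch assumes the lower bound of each range is the cutoff and assigns anything in a gap to the tier below it. `size_tier` is a hypothetical helper, not part of the tool.

```python
def size_tier(total_seconds: float) -> str:
    """Map total audio duration to a tier, using each range's lower bound as the cutoff."""
    minutes = total_seconds / 60.0
    if minutes >= 600:  # 10+ hours
        return "Professional"
    if minutes >= 120:  # 2 hours and up
        return "Excellent Quality"
    if minutes >= 30:
        return "Good Quality"
    if minutes >= 5:
        return "Testing"
    return "Below testing minimum"


for seconds in (60, 8 * 60, 45 * 60, 3 * 3600, 11 * 3600):
    print(f"{seconds / 60:7.1f} min -> {size_tier(seconds)}")
```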

### Language Codes
- English: `en`
- Spanish: `es`
- French: `fr`
- German: `de`
- Amharic: `am` or `amh`
- The full list is available in the UI

---

## 🔧 Troubleshooting

**YouTube Download Fails:**
```bash
# Update yt-dlp (run in a Colab cell; drop the leading "!" in a regular shell)
!pip install -U yt-dlp
```

**Too Many Segments Rejected as Low Quality:**
- Lower the quality threshold to 0.5-0.6
- Check source audio quality
- Adjust min/max duration

**Transcription Errors:**
- Verify correct language selected
- Ensure audio is clear
- Try shorter segments

**Out of Memory:**
- Process fewer files at once
- Use shorter audio segments
- Restart runtime
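
"Process fewer files at once" can also be applied when scripting around the tool. The sketch below shows the general batching pattern only; `batched` is a hypothetical helper, the file names are placeholders, and the `print` call stands in for whatever processing step is being run.

```python
import gc
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield list(items[i:i + batch_size])


files = ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"]  # placeholder names
for batch in batched(files, 2):
    print("processing", batch)  # stand-in for the real processing step
    gc.collect()  # release unreferenced Python objects between batches
```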

---

## 📖 Export Formats

### CSV
- Pandas-compatible format
- Columns: audio_file, text, speaker_name, duration
- Use for data analysis
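
A quick summary of an exported CSV might look like the sketch below. It assumes a comma-delimited file with a header row using the column names listed above and a `duration` column in seconds; none of these details are confirmed here, so adjust them to match the actual export. Only the standard `csv` module is used, though pandas would work equally well.

```python
import csv
from typing import Dict


def summarize_csv(path: str) -> Dict:
    """Count segments, total minutes, and speakers in an exported CSV (assumed layout)."""
    total_seconds = 0.0
    segment_count = 0
    speakers = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            total_seconds += float(row["duration"])  # assumed to be seconds
            segment_count += 1
            speakers.add(row["speaker_name"])
    return {
        "segments": segment_count,
        "total_minutes": total_seconds / 60.0,
        "speakers": sorted(speakers),
    }
```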

### JSON
- Structured data format
- Easy to parse programmatically
- Includes all metadata

### metadata.txt
- LJSpeech-style format
- Format: `filename|text`
- Compatible with TTS trainers
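
Reading the `filename|text` format back is straightforward. In the sketch below, only the first `|` separates the two fields, so a transcript that itself contains a pipe character is preserved; `parse_metadata` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def parse_metadata(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'filename|text' lines into (filename, text) pairs, skipping blank lines."""
    entries: List[Tuple[str, str]] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        filename, separator, text = line.partition("|")  # split on the first pipe only
        if not separator or not filename.strip():
            raise ValueError(f"line {number}: expected 'filename|text'")
        entries.append((filename.strip(), text.strip()))
    return entries


sample = ["seg_001.wav|Hello world\n", "\n", "seg_002.wav|Yes|no\n"]
for name, transcript in parse_metadata(sample):
    print(name, "->", transcript)
```

To parse a file, pass an open file object, for example `parse_metadata(open("metadata.txt", encoding="utf-8"))`.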

### LJSpeech
- Complete LJSpeech dataset
- Includes audio files + metadata
- Ready for training
- Exported as ZIP archive
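
Before starting a training run, it can be worth checking that the downloaded archive is intact. The sketch below uses the standard `zipfile` module to test the archive and count its members. The internal layout of the export is not specified here, so the code assumes only that audio files end in `.wav` and treats everything else as "other".

```python
import zipfile
from typing import Dict


def summarize_archive(path_or_file) -> Dict:
    """Check a ZIP archive for corruption and count audio vs. other files."""
    with zipfile.ZipFile(path_or_file) as archive:
        corrupt_member = archive.testzip()  # None when every member passes its CRC check
        names = [n for n in archive.namelist() if not n.endswith("/")]  # skip directories
    audio, other = [], []
    for name in names:
        (audio if name.lower().endswith(".wav") else other).append(name)
    return {
        "corrupt_member": corrupt_member,
        "audio_files": len(audio),
        "other_files": other,
    }
```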

---

## 🎉 Credits

- **XTTS v2:** [Coqui AI](https://github.com/coqui-ai/TTS)
- **Faster Whisper:** [systran/faster-whisper](https://github.com/systran/faster-whisper)
- **Gradio:** [gradio-app/gradio](https://github.com/gradio-app/gradio)
- **yt-dlp:** [yt-dlp/yt-dlp](https://github.com/yt-dlp/yt-dlp)

---

**⭐ Star the repo:** https://github.com/Diakonrobel/Amharic_XTTS-V2_TTS

**💾 All datasets saved to:** `/content/drive/MyDrive/XTTS_Datasets/`

**Status:** ✅ Ready for Production
