# Turkish XTTS Fine-tuning on Google Colab

This notebook walks you through fine-tuning XTTSv2 for Turkish.

## Cell 1: Check GPU

Verify that an L4 GPU is active. The output should show "NVIDIA L4" with about 24 GB of memory.

In [None]:
# Verify L4 GPU is active
!nvidia-smi
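
The same check can be done programmatically. The sketch below is an illustration rather than part of the original notebook: the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and it assumes `nvidia-smi` is on the `PATH`, as it is on Colab GPU runtimes.

```python
import subprocess


def parse_gpu_csv(csv_text):
    """Parse rows of 'name, memory_mib' into (name, memory_mib) tuples."""
    gpus = []
    for row in csv_text.strip().splitlines():
        # Split on the last comma so the GPU name stays intact.
        name, memory = row.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.strip())))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the name and total memory (MiB) of each GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the Colab runtime returns one tuple per GPU, so a check such as `any("L4" in name for name, _ in query_gpus())` can stop the notebook early if a different GPU was assigned.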

## Cell 2: Install Dependencies

In [None]:
# Install PyTorch with CUDA 12.1
!pip install --upgrade pip
!pip install "torch==2.3.1" "torchaudio==2.3.1" --index-url https://download.pytorch.org/whl/cu121

# Install Coqui TTS and trainer
!pip install "TTS==0.22.0" "coqui-tts-trainer"

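# Register the XTTS config classes as safe globals for torch.load.
# add_safe_globals was added in newer PyTorch releases (2.4+), so guard
# the call: with the pinned torch 2.3.1 it may be missing and is not needed.
import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import XttsAudioConfig, XttsArgs
from TTS.config.shared_configs import BaseDatasetConfig

if hasattr(torch.serialization, "add_safe_globals"):
    torch.serialization.add_safe_globals([
        XttsConfig, XttsAudioConfig, BaseDatasetConfig, XttsArgs
    ])

print("✓ Setup complete!")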

## Cell 3: Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Set your dataset path
DATASET_ROOT = "/content/drive/MyDrive/xtts_tr_dataset"
METADATA_FILE = f"{DATASET_ROOT}/metadata.txt"

# Quick check
import os
print("Wav count:", len([f for f in os.listdir(f"{DATASET_ROOT}/wavs") if f.endswith(".wav")]))
print("Metadata exists:", os.path.isfile(METADATA_FILE))

## 📋 Dataset Structure Required

Before proceeding, make sure your Google Drive has this structure:

```
MyDrive/
  └── xtts_tr_dataset/
      ├── metadata.txt          ← One line per audio file
      └── wavs/
          ├── 0001.wav
          ├── 0002.wav
          ├── 0003.wav
          └── ... (more files)
```

**metadata.txt format** (pipe-separated):
```
FILE_ID|TRANSCRIPTION|TRANSCRIPTION
```

Example:
```
0001|Merhaba, benim adım Ayşe.|Merhaba, benim adım Ayşe.
0002|Bugün hava çok güzel.|Bugün hava çok güzel.
```
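
To catch malformed lines before training, the format above can be checked line by line. This is a minimal sketch, not part of the original notebook; the function names are hypothetical, and it assumes that a transcription never contains the `|` character itself.

```python
def parse_metadata_line(line):
    """Split one metadata line into (file_id, first_text, second_text)."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, found {len(parts)}")
    file_id, first_text, second_text = (part.strip() for part in parts)
    if not file_id or not first_text:
        raise ValueError("file ID and transcription must not be empty")
    return file_id, first_text, second_text


def find_bad_lines(lines):
    """Return (line_number, message) pairs for non-empty lines that fail to parse."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            parse_metadata_line(line)
        except ValueError as error:
            problems.append((number, str(error)))
    return problems
```

Passing the lines of `metadata.txt` (opened with `encoding="utf-8"`) to `find_bad_lines` gives the line numbers that need fixing; an empty list means every line has the three expected fields.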

⚠️ **Important**:
- File IDs in metadata.txt must match the WAV filenames (without the .wav extension)
- All audio should be mono, 22050 Hz, 16-bit PCM
- Each audio clip should be 3-15 seconds long
- Use a clear, consistent speaker voice
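
The audio requirements above can be verified with Python's standard `wave` module. The following is a rough sketch under stated assumptions: `check_wav` is a hypothetical helper, the thresholds simply mirror the list above, and `wave` only reads plain PCM files, so other encodings raise `wave.Error` instead of returning a report.

```python
import wave


def check_wav(path, sample_rate=22050, min_sec=3.0, max_sec=15.0):
    """Return a list of problems for one WAV file (an empty list means it passes)."""
    with wave.open(str(path), "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()  # bytes per sample
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if sample_width != 2:
        problems.append(f"{sample_width * 8}-bit samples, expected 16-bit")
    if rate != sample_rate:
        problems.append(f"{rate} Hz, expected {sample_rate} Hz")
    if not min_sec <= duration <= max_sec:
        problems.append(f"{duration:.2f} s long, expected {min_sec}-{max_sec} s")
    return problems
```

Running `check_wav` over every file in the `wavs/` folder and printing only the non-empty results gives a short list of clips to re-export or trim.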

In [None]:
# Optional: Validate your dataset structure
import os
from pathlib import Path

DATASET_ROOT = "/content/drive/MyDrive/xtts_tr_dataset"
METADATA_FILE = f"{DATASET_ROOT}/metadata.txt"
WAVS_DIR = f"{DATASET_ROOT}/wavs"

print("🔍 Checking dataset structure...\n")

# Check if folders exist
if not os.path.exists(DATASET_ROOT):
    print(f"❌ Dataset folder not found: {DATASET_ROOT}")
    print("   Please create it in your Google Drive")
elif not os.path.exists(WAVS_DIR):
    print(f"❌ Wavs folder not found: {WAVS_DIR}")
    print("   Please create a 'wavs' subfolder")
elif not os.path.exists(METADATA_FILE):
    print(f"❌ metadata.txt not found: {METADATA_FILE}")
    print("   Please create metadata.txt with your transcriptions")
else:
    # Count files
    wav_files = [f for f in os.listdir(WAVS_DIR) if f.endswith(".wav")]
    
    # Read metadata
    with open(METADATA_FILE, 'r', encoding='utf-8') as f:
        metadata_lines = [line.strip() for line in f if line.strip()]
    
    print("✅ Dataset structure looks good!")
    print(f"   📁 Dataset root: {DATASET_ROOT}")
    print(f"   🔊 WAV files found: {len(wav_files)}")
    print(f"   📄 Metadata entries: {len(metadata_lines)}")
    
    # Check for mismatches
    metadata_ids = set()
    for line in metadata_lines:
        parts = line.split('|')
        if len(parts) >= 2:
            metadata_ids.add(parts[0])
    
    wav_ids = set(Path(f).stem for f in wav_files)
    
    missing_wavs = metadata_ids - wav_ids
    missing_metadata = wav_ids - metadata_ids
    
    if missing_wavs:
        print(f"\n⚠️  Warning: {len(missing_wavs)} files in metadata.txt but no WAV file:")
        for fid in list(missing_wavs)[:5]:
            print(f"      - {fid}.wav is missing")
    
    if missing_metadata:
        print(f"\n⚠️  Warning: {len(missing_metadata)} WAV files have no metadata entry:")
        for fid in list(missing_metadata)[:5]:
            print(f"      - {fid}.wav needs a line in metadata.txt")
    
    if not missing_wavs and not missing_metadata:
        print("\n🎉 Perfect! All files match. Ready to train!")
        print("\n📊 Sample metadata entries:")
        for line in metadata_lines[:3]:
            print(f"   {line}")

## Cell 4: Create Training Script

Run the cell below to create `train_gpt_xtts_tr.py` directly in Colab. This uses the `%%writefile` magic command to write the entire training script to a file.

In [None]:

## **1. Runtime preparation**

```python
!nvidia-smi
!python --version
```

---

## **2. Clone aiswhis**

```python
%cd /content
!git clone https://github.com/ElkhanAbbasov/aiswhis
%cd aiswhis
```

---

## **3. Install dependencies**

```python
!pip install -q -r requirements.txt
!pip install -q trainer
!pip install -q torch torchaudio
```

---

## **4. Mount Google Drive**

```python
from google.colab import drive
drive.mount('/content/drive')
```

---

## **5. Verify dataset folder**

```python
!ls /content/drive/MyDrive/xtts_tr_dataset
```

You should see `metadata.txt` and the `wavs` folder that contains your `.wav` files.

---

## **6. Prepare project paths**

```python
import os

os.makedirs("/content/aiswhis/run_tr_tr/training/XTTS_v2_tr_base", exist_ok=True)

!cp /content/aiswhis/pretrained/XTTS_v2/dvae.pth \
    /content/aiswhis/run_tr_tr/training/XTTS_v2_tr_base/
```

---

## **7. PATCH the training script to disable evaluation**

This patch disables the eval split and trainer evaluation in `train_gpt_xtts_tr.py`, which works around the crash.

```python
%cd /content/aiswhis
from pathlib import Path
import re

p = Path("train_gpt_xtts_tr.py")
s = p.read_text()

# Disable eval split
s = s.replace("eval_split=True", "eval_split=False")

# Force eval_samples = None
s = re.sub(
    r"(train_samples\s*,\s*eval_samples\s*=\s*load_tts_samples\([^)]*\))",
    r"\1\n    eval_samples = None",
    s,
    count=1,
)

# Disable trainer evaluation
s = s.replace("run_eval=True", "run_eval=False")

p.write_text(s)
print("train_gpt_xtts_tr.py patched.")
```

---

## **8. Start Training**

```python
%cd /content/aiswhis
!python train_gpt_xtts_tr.py
```



## Cell 5: Start Training

In [None]:
import os

# Set environment variables
os.environ["XTTS_TR_DATASET_ROOT"] = "/content/drive/MyDrive/xtts_tr_dataset"
os.environ["XTTS_TR_METADATA_FILE"] = "/content/drive/MyDrive/xtts_tr_dataset/metadata.txt"

# Start training
!python train_gpt_xtts_tr.py

## Cell 6: Monitor Training (Optional)

Colab executes one cell at a time, so run this cell before starting training in Cell 5. The TensorBoard panel then refreshes as training writes new logs.

In [None]:
# Load TensorBoard
%load_ext tensorboard
%tensorboard --logdir ./run_tr_tr/training/

## Cell 7: Test Inference After Training

In [None]:
import glob, os
from TTS.api import TTS

# Find latest run dir
run_root = "/content/run_tr_tr/training"
runs = sorted(glob.glob(os.path.join(run_root, "XTTSv2_Turkish_FT-*")))
assert runs, "No runs found"
model_dir = runs[-1]

model_path = os.path.join(model_dir, "best_model.pth")
config_path = os.path.join(model_dir, "config.json")

print("Using model:", model_path)

# Initialize TTS
tts = TTS(
    model_path=model_path,
    config_path=config_path,
    gpu=True,
)

# Generate sample
speaker_wav = "/content/drive/MyDrive/xtts_tr_dataset/wavs/0001.wav"

tts.tts_to_file(
    text="Merhaba, ben senin Türkçe konuşan yapay zeka asistanınım.",
    file_path="sample_tr.wav",
    speaker_wav=speaker_wav,
    language="tr",
)

print("✓ Generated: sample_tr.wav")

## Cell 8: Listen to Output

In [None]:
from IPython.display import Audio
Audio("sample_tr.wav")

## Cell 9: Generate More Samples

In [None]:
test_sentences = [
    "Bugün hava çok güzel.",
    "Nasıl yardımcı olabilirim?",
    "Türkçe konuşmak için ince ayarlanmış bir modelim.",
    "Yapay zeka teknolojisi hızla gelişiyor.",
]

for i, text in enumerate(test_sentences):
    output_file = f"test_{i+1}.wav"
    tts.tts_to_file(
        text=text,
        file_path=output_file,
        speaker_wav=speaker_wav,
        language="tr",
    )
    print(f"✓ Generated: {output_file}")
    display(Audio(output_file))

## Cell 10: Download Trained Model

In [None]:
# Zip the trained model
!zip -r trained_model.zip {model_dir}

# Download to your computer
from google.colab import files
files.download('trained_model.zip')

## Tips

1. **Save checkpoints to Drive**: Set `OUT_PATH` in the training script to a folder on Drive so checkpoints survive a runtime disconnect
2. **Resume training**: Set `restore_path` to the checkpoint you want to continue from
3. **Monitor loss**: Loss should decrease to ~1.5-2.5 for good quality
4. **Adjust batch size**: If you hit an out-of-memory (OOM) error, reduce `BATCH_SIZE` in the script
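
For tip 2, the checkpoint to resume from can be located with a small helper. This is only a sketch: `latest_checkpoint` is a hypothetical name, and it assumes the trainer writes files named `checkpoint_<step>.pth` into the run directory, which you should confirm by listing your own run folder.

```python
import re
from pathlib import Path


def latest_checkpoint(run_dir):
    """Return the checkpoint file with the highest step number, or None."""
    best = None
    for path in Path(run_dir).glob("checkpoint_*.pth"):
        match = re.fullmatch(r"checkpoint_(\d+)\.pth", path.name)
        if match is None:
            continue  # skip files that only resemble the assumed pattern
        step = int(match.group(1))
        # Compare steps numerically so that 10000 ranks above 900.
        if best is None or step > best[0]:
            best = (step, path)
    return best[1] if best else None
```

The returned path can then be used as the value of `restore_path`. Steps are compared as integers because a plain alphabetical sort would place `checkpoint_900.pth` after `checkpoint_10000.pth`.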

## Expected Timeline

- Setup: 5-10 minutes
- Dataset upload: 10-30 minutes (depending on size)
- Training: 4-8 hours for a 2-hour dataset
- Testing: 2-5 minutes

## Troubleshooting

### GPU Not Active

In [None]:
# Disconnect and delete the current runtime, then reconnect and
# select the L4 GPU under Runtime > Change runtime type
from google.colab import runtime
runtime.unassign()

### Training Too Slow

In [None]:
# Verify GPU usage
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # get_device_name raises an error when no CUDA device is visible
    print("GPU:", torch.cuda.get_device_name(0))

### OOM Error

Reduce batch size in `train_gpt_xtts_tr.py`:
```python
BATCH_SIZE = 2  # or 1
```
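
A smaller batch also changes how many samples contribute to each optimizer update. If your training script exposes a gradient-accumulation setting (an assumption, since the script is not shown in this notebook), the arithmetic below sketches how to keep the effective batch size roughly constant. The function name and the numbers in the usage note are illustrative only.

```python
import math


def grad_accum_steps(target_effective_batch, batch_size):
    """Accumulation steps needed so batch_size * steps >= target_effective_batch."""
    if batch_size < 1 or target_effective_batch < 1:
        raise ValueError("both values must be positive integers")
    return math.ceil(target_effective_batch / batch_size)
```

For example, if the script previously trained with an effective batch of 32 samples and `BATCH_SIZE` drops to 2, `grad_accum_steps(32, 2)` gives the number of accumulation steps that restores it, at the cost of slower updates.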

---

**Pro tip**: Keep the Colab tab active and check progress every hour. Training can take 4-8 hours.