# Bemba TTS Training Notebook

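This notebook prepares the BembaSpeech dataset for training a text-to-speech (TTS) model for the Bemba language with the Coqui TTS framework. Full training runs outside the notebook; the cells here load a pre-trained English model and use it to generate placeholder audio.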

## 1. Import Required Libraries

Import the libraries needed for TTS training: Coqui TTS, audio processing, and data handling.

In [None]:
# Install Coqui TTS if not already installed
!pip install coqui-tts

import os
import pandas as pd
import numpy as np
from pathlib import Path
import librosa
from TTS.api import TTS
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

## 2. Load and Prepare Data

Load the BembaSpeech dataset and prepare it for TTS training by extracting text and audio paths.

In [None]:
# Load BembaSpeech dataset
dataset_path = 'BembaSpeech/bem'

# Load the train/dev/test splits (tab-separated, despite the .csv extension)
train_df = pd.read_csv(f'{dataset_path}/train.csv', sep='\t')
dev_df = pd.read_csv(f'{dataset_path}/dev.csv', sep='\t')
test_df = pd.read_csv(f'{dataset_path}/test.csv', sep='\t')

print(f"Training samples: {len(train_df)}")
print(f"Dev samples: {len(dev_df)}")
print(f"Test samples: {len(test_df)}")

# Preview the first few rows
print(train_df.head())

# Prepare metadata for TTS training: one "audio_path|text" line per clip
def prepare_metadata(df, audio_dir):
    metadata = []
    for audio_file, text in zip(df['audio_filepath'], df['text']):
        # Collapse whitespace so a stray newline or tab cannot break the line format
        text = ' '.join(str(text).split())
        metadata.append(f"{audio_dir}/{audio_file}|{text}")
    return metadata

train_metadata = prepare_metadata(train_df, f"{dataset_path}/audio")
dev_metadata = prepare_metadata(dev_df, f"{dataset_path}/audio")

# Save metadata files as UTF-8 so non-ASCII characters survive on any platform
with open('train_metadata.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(train_metadata))

with open('dev_metadata.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(dev_metadata))
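
The `audio_path|text` lines written above do not follow the layout that Coqui's built-in `ljspeech` formatter reads, so training on them would likely need a custom formatter. The sketch below assumes the formatter convention described in the Coqui TTS documentation: a function that takes `root_path` and `meta_file` and returns one dict per sample with `text`, `audio_file`, `speaker_name`, and `root_path` keys. The name `bemba_formatter` and the speaker label `"bemba"` are hypothetical.

```python
import os

def bemba_formatter(root_path, meta_file, **kwargs):
    """Parse `audio_path|text` lines into Coqui-style sample dicts (hypothetical helper)."""
    samples = []
    meta_path = os.path.join(root_path, meta_file)
    with open(meta_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            # Split on the first '|' only, so a '|' inside the transcript is kept.
            audio_path, sep, text = line.partition("|")
            if not sep or not text.strip():
                raise ValueError(f"{meta_path}:{line_no}: expected 'audio_path|text'")
            samples.append({
                "text": text.strip(),
                "audio_file": audio_path.strip(),
                "speaker_name": "bemba",  # assumed single-speaker label
                "root_path": root_path,
            })
    return samples
```

A function like this would typically be passed as the `formatter` argument of `TTS.tts.datasets.load_tts_samples`; check the signature in the installed version before relying on it.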

## 3. Define the Model

We start from a pre-trained English Tacotron 2 model with a HiFi-GAN vocoder, which can later be fine-tuned on Bemba data. In this demo the model is only loaded, not fine-tuned.

In [None]:
# For TTS training, we'll use Coqui TTS configuration
# Note: Full training requires significant compute resources
# This is a simplified setup for demonstration

# Define TTS model configuration
model_config = {
    'model': 'tts_models/en/ljspeech/tacotron2-DDC_ph',  # Base English model
    'vocoder': 'vocoder_models/en/ljspeech/hifigan_v2',
    'language': 'en',  # We'll adapt for Bemba
}

# Initialize TTS with pre-trained model
tts = TTS(model_name=model_config['model'], 
          vocoder_name=model_config['vocoder'])

print("TTS Model loaded successfully")
print(f"Model: {model_config['model']}")
print(f"Vocoder: {model_config['vocoder']}")
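
Adapting an English model to Bemba starts with knowing which characters the transcripts actually contain. The sketch below counts characters across a list of transcripts and reports any that fall outside a reference alphabet. `ENGLISH_CHARS` is an illustrative assumption, not the character set of the checkpoint above, which should be read from the model's own config; both function names are hypothetical.

```python
from collections import Counter
import string

# Assumed reference alphabet, for illustration only.
ENGLISH_CHARS = set(string.ascii_lowercase + " '.,!?-")

def character_inventory(texts):
    """Count every character (lowercased) across the transcripts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower())
    return counts

def unsupported_characters(texts, supported=ENGLISH_CHARS):
    """Return the characters in `texts` that are missing from `supported`, sorted."""
    return sorted(set(character_inventory(texts)) - set(supported))

print(unsupported_characters(["Umuntu wawonekera", "Imoto yawonekera"]))  # → []
print(unsupported_characters(["ŋanda"]))  # a non-ASCII letter → ['ŋ']
```

On the real data this could be called as `unsupported_characters(train_df['text'])` to decide whether the character set needs extending before fine-tuning.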

## 4. Compile the Model

Configure the TTS model settings and prepare for inference or fine-tuning.

In [None]:
# TTS models need no compile step, but the loaded config can be inspected and adjusted.
# In current Coqui TTS releases the model config is exposed as tts.synthesizer.tts_config.
audio_config = tts.synthesizer.tts_config.audio
audio_config.sample_rate = 22050
audio_config.do_trim_silence = True

print("TTS Model configured:")
print(f"Sample Rate: {audio_config.sample_rate}")
print(f"Trim Silence: {audio_config.do_trim_silence}")
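
Changing `sample_rate` in the config does not resample the audio on disk, so training clips generally need to match the rate the config declares. A minimal sketch of that check, using only the standard-library `wave` module and assuming the clips are uncompressed PCM WAV files; `wav_info` and `find_rate_mismatches` are hypothetical helper names.

```python
import wave

def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration

def find_rate_mismatches(paths, expected_rate=22050):
    """Return (path, actual_rate) for every file whose rate differs from expected_rate."""
    mismatches = []
    for path in paths:
        rate, _ = wav_info(path)
        if rate != expected_rate:
            mismatches.append((path, rate))
    return mismatches
```

Any file reported by `find_rate_mismatches` would need resampling (for example with `librosa`, already imported above) before it is used for training at 22050 Hz.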

## 5. Train the Model

Note: Full TTS training requires significant computational resources and time, so this demo uses the pre-trained model for inference only. To train on Bemba data, run the Coqui TTS training script from a terminal, passing a training config and, for fine-tuning, the pre-trained checkpoint.

In [None]:
# Training command (run it in a terminal rather than the notebook, since training is long-running)
# python -m TTS.bin.train_tts --config_path bemba_tts_config.json --restore_path <pretrained_model_path>

print("For full training, use:")
print("python -m TTS.bin.train_tts --config_path bemba_tts_config.json --restore_path pretrained_model.pth.tar")
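
When the same command is launched from Python, for example from a job script, a small wrapper can fail early if the config or checkpoint is missing. This is a sketch: `build_train_command` is a hypothetical helper, and it uses only the `--config_path` and `--restore_path` flags shown above.

```python
import sys
from pathlib import Path

def build_train_command(config_path, restore_path=None, check_exists=True):
    """Build the argv list for `python -m TTS.bin.train_tts`."""
    paths = [Path(config_path)] + ([Path(restore_path)] if restore_path else [])
    if check_exists:
        missing = [str(p) for p in paths if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"Missing file(s): {', '.join(missing)}")
    cmd = [sys.executable, "-m", "TTS.bin.train_tts", "--config_path", str(config_path)]
    if restore_path:
        cmd += ["--restore_path", str(restore_path)]
    return cmd

# Preview the command without touching the filesystem.
print(build_train_command("bemba_tts_config.json", "pretrained_model.pth.tar", check_exists=False))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start training.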

## 6. Evaluate the Model

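Test the TTS model by generating audio for sample Bemba phrases. Because the model is still the English checkpoint, this checks that the pipeline runs; it does not measure Bemba speech quality.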

In [None]:
# Test TTS with Bemba phrases
bemba_phrases = [
    "Umuntu wawonekera",  # Person detected
    "Imoto yawonekera",   # Car detected
    "Ibayisikilo yawonekera",  # Bicycle detected
    "Iciti cawonekera",   # Chair detected
    "Itabule yawonekera"  # Table detected
]

for i, phrase in enumerate(bemba_phrases):
    print(f"Generating audio for: {phrase}")
    # Note: the English model applies English phonetics, so the output is not true Bemba.
    # Proper Bemba speech requires a model trained on Bemba data.
    tts.tts_to_file(text=phrase, file_path=f"bemba_sample_{i}.wav")
    print(f"Saved: bemba_sample_{i}.wav")
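
One failing phrase would abort the whole loop above, so it can help to collect failures instead of stopping. The sketch below takes the synthesis function as a parameter (for instance `tts.tts_to_file`), so it assumes nothing about the model beyond the `text` and `file_path` keyword arguments already used in this notebook; `synthesize_all` is a hypothetical helper.

```python
from pathlib import Path

def synthesize_all(phrases, synth_fn, out_dir="."):
    """Call synth_fn(text=..., file_path=...) for each phrase; return (saved, failed)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for i, phrase in enumerate(phrases):
        file_path = out_dir / f"bemba_sample_{i}.wav"
        try:
            synth_fn(text=phrase, file_path=str(file_path))
            saved.append(str(file_path))
        except Exception as exc:  # keep going and report at the end
            failed.append((phrase, repr(exc)))
    return saved, failed
```

A call such as `saved, failed = synthesize_all(bemba_phrases, tts.tts_to_file)` would then leave a list of phrases to investigate in `failed`.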

## 7. Make Predictions

Generate audio files for all object detection phrases that can be used in the Android app.

In [None]:
# Generate audio for all object detection labels
object_phrases = {
    "person": "Umuntu wawonekera",
    "car": "Imoto yawonekera", 
    "bicycle": "Ibayisikilo yawonekera",
    "motorcycle": "Njinga yamamoto yawonekera",
    "bus": "Ibhasi yawonekera",
    "truck": "Iloli yawonekera",
    "chair": "Iciti cawonekera",
    "table": "Itabule yawonekera",
    "obstacle": "Ichocha chawonekera"
}

# Create output directory
os.makedirs('bemba_audio', exist_ok=True)

for obj, phrase in object_phrases.items():
    output_path = f"bemba_audio/bemba_{obj}.wav"
    print(f"Generating: {phrase}")
    tts.tts_to_file(text=phrase, file_path=output_path)
    print(f"Saved: {output_path}")

print("\nAudio files generated in 'bemba_audio/' directory")
print("Convert to MP3 and copy to the Android project: app/src/main/res/raw/ (resource directories cannot contain subfolders such as bemba/)")
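
Android resource file names may contain only lowercase letters, digits, and underscores, and must start with a letter, so it is worth checking the generated names before copying them. A minimal sketch; `android_resource_name` and `is_valid_resource_name` are hypothetical helpers, and the labels in the loop are illustrative rather than taken from `object_phrases`.

```python
import re
from pathlib import Path

_VALID_NAME = re.compile(r"^[a-z][a-z0-9_]*$")

def android_resource_name(filename):
    """Derive a res/raw-compatible name (without extension) from a file name."""
    stem = Path(filename).stem.lower()
    # Replace every run of disallowed characters with a single underscore.
    stem = re.sub(r"[^a-z0-9_]+", "_", stem).strip("_")
    if not stem or not stem[0].isalpha():
        stem = f"r_{stem}"  # names must start with a letter
    return stem

def is_valid_resource_name(name):
    """Check a name against the lowercase/digit/underscore rule."""
    return bool(_VALID_NAME.match(name))

for label in ["person", "traffic light", "Stop-Sign"]:
    print(android_resource_name(f"bemba_{label}.wav"))
# → bemba_person
# → bemba_traffic_light
# → bemba_stop_sign
```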