# Aparsoft TTS - Complete Tutorial: Basic to Advanced

**A comprehensive guide to using Aparsoft TTS for professional voice generation**

---

## Table of Contents

1. [Setup and Installation](#setup-and-installation)
2. [Basic Concepts](#basic-concepts)
3. [Simple Speech Generation](#simple-speech-generation)
4. [Exploring Voices](#exploring-voices)
5. [Audio Enhancement](#audio-enhancement)
6. [Batch Processing](#batch-processing)
7. [Script Processing for Videos](#script-processing-for-videos)
8. [Configuration Management](#configuration-management)
9. [Command-Line Interface](#command-line-interface)
10. [Advanced Audio Processing](#advanced-audio-processing)
11. [MCP Server Integration](#mcp-server-integration)
12. [Production Deployment](#production-deployment)
13. [Testing and Debugging](#testing-and-debugging)
14. [Performance Optimization](#performance-optimization)

---

## Setup and Installation

### System Dependencies

Before starting, ensure you have the required system packages installed:

**Ubuntu/Debian:**


In [None]:
%%bash
sudo apt-get update
sudo apt-get install espeak-ng ffmpeg libsndfile1

**macOS:**


In [None]:
%%bash
brew install espeak ffmpeg

**Windows:**
- Download espeak from http://espeak.sourceforge.net/
- Download ffmpeg from https://ffmpeg.org/download.html
- Add both to system PATH
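
Once the packages are installed, a quick check from Python can confirm that the tools are reachable on the PATH. The sketch below uses only the standard library. The helper name `find_missing_tools` and the tool names passed to it are assumptions for illustration (on some platforms the binary is `espeak` rather than `espeak-ng`).

```python
import shutil


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


# Tool names are assumptions; adjust them to match your platform.
missing = find_missing_tools(["espeak-ng", "ffmpeg"])
if missing:
    print(f"Missing from PATH: {', '.join(missing)}")
else:
    print("All system dependencies found")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use relies on `shutil.which`.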

### Python Package Installation


In [None]:
%%bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Linux/Mac:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install package with all features
pip install -e ".[mcp,cli,dev]"

### Verify Installation


In [1]:
# Test core imports
from aparsoft_tts import TTSEngine, TTSConfig
from aparsoft_tts import ALL_VOICES, MALE_VOICES, FEMALE_VOICES

print("✅ Aparsoft TTS imported successfully!")
print(f"Available male voices: {MALE_VOICES}")
print(f"Available female voices: {FEMALE_VOICES}")



✅ Aparsoft TTS imported successfully!
Available male voices: ['am_adam', 'am_michael', 'bm_george', 'bm_lewis']
Available female voices: ['af_bella', 'af_nicole', 'af_sarah', 'af_sky', 'bf_emma', 'bf_isabella']


---

## Basic Concepts

### Architecture Overview

Aparsoft TTS is built with a modular architecture:

- **Core Engine** (`core/engine.py`): Main TTS functionality using the Kokoro model
- **Configuration** (`config.py`): Pydantic-based settings management
- **Audio Utilities** (`utils/audio.py`): Audio processing functions
- **Logging** (`utils/logging.py`): Structured logging with structlog
- **Exceptions** (`utils/exceptions.py`): Custom exception hierarchy
- **CLI** (`cli.py`): Command-line interface with Typer
- **MCP Server** (`mcp_server.py`): Model Context Protocol server

### Key Classes

**TTSEngine**: Main class for text-to-speech generation
- Methods: `generate()`, `generate_stream()`, `batch_generate()`, `process_script()`
- Handles model loading, audio generation, and enhancement

**TTSConfig**: Configuration management
- Uses Pydantic for validation
- Supports environment variables
- Type-safe settings

---

## Simple Speech Generation

### Creating a TTS Engine Instance


In [2]:
from aparsoft_tts import TTSEngine

# Initialize with default settings
engine = TTSEngine()

# The engine is now ready to generate speech
# Default voice: am_michael (American male)
# Default speed: 1.0
# Default enhancement: enabled



### Generate Your First Audio


In [3]:
# Generate speech and save to file
output_path = engine.generate(
    text="Welcome to Aparsoft TTS tutorial",
    output_path="output/tutorial_intro.wav"
)

print(f"Audio saved to: {output_path}")


### Generate Audio as NumPy Array


In [4]:
import numpy as np

# Generate without saving (returns numpy array)
audio_array = engine.generate(
    text="This returns an audio array instead of saving to file"
)

print(f"Audio type: {type(audio_array)}")
print(f"Audio shape: {audio_array.shape}")
print(f"Audio dtype: {audio_array.dtype}")
print(f"Duration: {len(audio_array) / 24000:.2f} seconds")
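
When `generate()` returns an array, the samples can be written to disk later without extra dependencies. The following is a minimal sketch, assuming the array holds mono floating-point samples in the range -1.0 to 1.0 at 24 kHz (the rate used in the duration calculation above). `write_wav_pcm16` is a hypothetical helper, not part of Aparsoft TTS, and a synthetic sine tone stands in for engine output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24000  # matches the rate used in the duration calculation above


def write_wav_pcm16(samples, target, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    # Clip before scaling so out-of-range values do not wrap around
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for engine output: half a second of a 440 Hz tone
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 2)]

buffer = io.BytesIO()  # a path string works here as well
write_wav_pcm16(tone, buffer)

# Read the header back to confirm the duration
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    duration = wav.getnframes() / wav.getframerate()
print(f"Duration: {duration:.2f} seconds")
```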


### Customize Generation Parameters


In [6]:
# Generate with custom parameters
custom_output = engine.generate(
    text="This speech is customized with different parameters",
    output_path="output/custom.wav",
    voice="bm_george",      # British male voice
    speed=1.2,              # 20% faster
    enhance=True            # Enable audio enhancement
)

print(f"Custom audio generated: {custom_output}")


---

## Exploring Voices

### List All Available Voices


In [7]:
from aparsoft_tts import TTSEngine, ALL_VOICES

# Static method to list voices
voices = TTSEngine.list_voices()

print("=== Available Voices ===\n")

print("Male Voices:")
for voice in voices['male']:
    print(f"  - {voice}")

print("\nFemale Voices:")
for voice in voices['female']:
    print(f"  - {voice}")

print(f"\nTotal voices: {len(ALL_VOICES)}")

=== Available Voices ===

Male Voices:
  - am_adam
  - am_michael
  - bm_george
  - bm_lewis

Female Voices:
  - af_bella
  - af_nicole
  - af_sarah
  - af_sky
  - bf_emma
  - bf_isabella

Total voices: 11
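
The voice IDs follow a visible pattern: in the examples used throughout this tutorial, the first letter marks the accent (`a` for American, `b` for British) and the second marks the gender (`m` or `f`). The sketch below decodes an ID on that basis. `describe_voice` is a hypothetical helper, and the prefix mapping is inferred from the voices listed above rather than taken from the library.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"m": "male", "f": "female"}


def describe_voice(voice_id):
    """Split an ID such as 'bm_george' into (accent, gender, name)."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS or not name:
        raise ValueError(f"Unrecognized voice ID: {voice_id!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name


for voice_id in ["am_michael", "bm_george", "af_bella", "bf_emma"]:
    accent, gender, name = describe_voice(voice_id)
    print(f"{voice_id}: {accent} {gender} ({name})")
```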


### Voice Comparison Test


In [9]:
# Test text for comparison
test_text = "Hello and welcome, this is a voice comparison test for Aparsoft TTS."

# Voices to compare, grouped by gender
male_voices = ["am_michael", "bm_george", "am_adam", "bm_lewis"]
female_voices = ["af_bella", "af_nicole", "af_sarah", "af_sky", "bf_emma", "bf_isabella"]

for voice in male_voices:
    output_file = f"output/male_voice_test_{voice}.wav"
    engine.generate(
        text=test_text,
        output_path=output_file,
        voice=voice
    )
    print(f"✅ Generated: {output_file}")

for voice in female_voices:
    output_file = f"output/female_voice_test_{voice}.wav"
    engine.generate(
        text=test_text,
        output_path=output_file,
        voice=voice
    )
    print(f"✅ Generated: {output_file}")



### Parameter Comparison Demo


In [10]:
from aparsoft_tts import TTSEngine
from pathlib import Path
import time

# Create output directory
output_dir = Path("output/parameter_comparison")
output_dir.mkdir(parents=True, exist_ok=True)

# Test text
test_text = "This is a demonstration of different voice parameters using Kokoro TTS"

print("=" * 70)
print("APARSOFT TTS - PARAMETER COMPARISON DEMO")
print("=" * 70)
print(f"\nTest text: {test_text}\n")

engine = TTSEngine()

# ==========================================
# 1. RAW (No Enhancement)
# ==========================================
print("🎙️  [1/6] Generating RAW audio (no enhancement)...")
start = time.time()

raw_path = engine.generate(
    text=test_text,
    output_path=output_dir / "1_raw_no_enhancement.wav",
    voice="am_michael",
    speed=1.0,
    enhance=False,  # ← NO enhancement
)

print(f"✅ Raw: {time.time() - start:.2f}s | {raw_path.name}")

# ==========================================
# 2. ENHANCED (Basic Enhancement)
# ==========================================
print("\n🎙️  [2/6] Generating ENHANCED audio (full processing)...")
start = time.time()

enhanced_path = engine.generate(
    text=test_text,
    output_path=output_dir / "2_enhanced.wav",
    voice="am_michael",
    speed=1.0,
    enhance=True,  # ← Full enhancement (normalize, trim, fade, noise reduction)
)

print(f"✅ Enhanced: {time.time() - start:.2f}s | {enhanced_path.name}")

# ==========================================
# 3. SPEED VARIATIONS
# ==========================================
print("\n🎙️  [3/6] Generating SPEED variations...")

speeds = [
    (0.8, "slow", "Slower, deliberate pace"),
    (1.0, "normal", "Default natural speed"),
    (1.3, "fast", "Faster, energetic delivery"),
]

for speed_val, speed_name, description in speeds:
    start = time.time()

    speed_path = engine.generate(
        text=test_text,
        output_path=output_dir / f"3_speed_{speed_name}_{speed_val}x.wav",
        voice="am_michael",
        speed=speed_val,
        enhance=True,
    )

    print(f"✅ Speed {speed_val}x ({speed_name}): {time.time() - start:.2f}s | {description}")

# ==========================================
# 4. VOICE VARIATIONS
# ==========================================
print("\n🎙️  [4/6] Generating different VOICES...")

voices = [
    ("am_michael", "American male - professional, deep"),
    ("bm_george", "British male - formal, authoritative"),
    ("af_bella", "American female - warm, friendly"),
    ("bf_emma", "British female - professional, elegant"),
]

for voice_id, description in voices:
    start = time.time()

    voice_path = engine.generate(
        text=test_text,
        output_path=output_dir / f"4_voice_{voice_id}.wav",
        voice=voice_id,
        speed=1.0,
        enhance=True,
    )

    print(f"✅ {voice_id}: {time.time() - start:.2f}s | {description}")

# ==========================================
# 5. COMBINATION: Fast + No Enhancement
# ==========================================
print("\n🎙️  [5/6] Generating FAST + RAW (quick draft)...")
start = time.time()

fast_raw_path = engine.generate(
    text=test_text,
    output_path=output_dir / "5_fast_raw_draft.wav",
    voice="am_michael",
    speed=1.5,  # Fast
    enhance=False,  # No processing - fastest generation
)

print(f"✅ Fast+Raw: {time.time() - start:.2f}s | Quick draft mode")

# ==========================================
# 6. COMBINATION: Slow + British + Enhanced
# ==========================================
print("\n🎙️  [6/6] Generating SLOW + BRITISH + ENHANCED (premium quality)...")
start = time.time()

premium_path = engine.generate(
    text=test_text,
    output_path=output_dir / "6_premium_slow_british.wav",
    voice="bm_george",
    speed=0.9,  # Slower for clarity
    enhance=True,  # Full enhancement
)

print(f"✅ Premium: {time.time() - start:.2f}s | Formal presentation quality")

# ==========================================
# SUMMARY TABLE
# ==========================================
print("\n" + "=" * 70)
print("COMPARISON SUMMARY")
print("=" * 70)

print(f"\n📁 All files saved to: {output_dir}/")

print("\n📊 USE CASES:")
print("\n1️⃣  RAW (enhance=False)")
print("   ├─ Use: Quick drafts, testing, preview")
print("   ├─ Speed: Fastest (~0.5s)")
print("   └─ Quality: Good but unpolished")

print("\n2️⃣  ENHANCED (enhance=True)")
print("   ├─ Use: Final production, YouTube, podcasts")
print("   ├─ Speed: Fast (~0.6-0.7s)")
print("   └─ Quality: Professional, polished")

print("\n3️⃣  SPEED 0.8x (Slow)")
print("   ├─ Use: Educational content, complex topics")
print("   ├─ Speed: ~0.7s")
print("   └─ Quality: Clear, easy to understand")

print("\n4️⃣  SPEED 1.3x (Fast)")
print("   ├─ Use: Quick updates, news, energetic content")
print("   ├─ Speed: ~0.5s")
print("   └─ Quality: Fast-paced, engaging")

print("\n5️⃣  BRITISH VOICE (bm_george)")
print("   ├─ Use: Documentaries, formal presentations")
print("   ├─ Speed: ~0.6s")
print("   └─ Quality: Authoritative, professional")

print("\n6️⃣  AMERICAN FEMALE (af_bella)")
print("   ├─ Use: Storytelling, friendly content")
print("   ├─ Speed: ~0.6s")
print("   └─ Quality: Warm, approachable")

print("\n" + "=" * 70)
print("🎧 RECOMMENDED SETTINGS BY USE CASE")
print("=" * 70)

recommendations = [
    ("YouTube Tutorial", "am_michael", 1.0, True, "Professional + Clear"),
    ("Podcast", "af_bella", 0.95, True, "Warm + Natural"),
    ("Documentary", "bm_george", 0.9, True, "Authoritative + Slow"),
    ("Quick Draft", "am_michael", 1.3, False, "Fast + Raw"),
    ("Audiobook", "af_bella", 0.85, True, "Slow + Warm"),
    ("News/Updates", "am_michael", 1.2, True, "Fast + Professional"),
    ("Educational", "bm_george", 0.9, True, "Slow + Clear"),
    ("Advertisement", "af_sky", 1.1, True, "Energetic + Young"),
]

print(
    "\n{:<20} {:<12} {:<8} {:<8} {:<25}".format(
        "USE CASE", "VOICE", "SPEED", "ENHANCE", "DESCRIPTION"
    )
)
print("-" * 70)

for use_case, voice, speed, enhance, desc in recommendations:
    enhance_str = "Yes" if enhance else "No"
    print(
        "{:<20} {:<12} {:<8} {:<8} {:<25}".format(use_case, voice, f"{speed}x", enhance_str, desc)
    )

print("\n" + "=" * 70)
print("✨ Pro Tips:")
print("=" * 70)
print("• enhance=False for DRAFTS (skips post-processing, test scripts quickly)")
print("• enhance=True for PRODUCTION (professional quality, ready to publish)")
print("• speed=0.8-0.9 for LEARNING (educational, complex topics)")
print("• speed=1.0-1.1 for NORMAL (most content, natural pace)")
print("• speed=1.2-1.5 for ENERGY (news, updates, quick content)")
print("• am_michael = Most popular (YouTube, tutorials, corporate)")
print("• bm_george = Formal/authoritative (documentaries, presentations)")
print("• af_bella = Friendly/warm (storytelling, guides)")
print("=" * 70)

print(f"\n✅ Done! Compare the generated files in: {output_dir}/")

APARSOFT TTS - PARAMETER COMPARISON DEMO

Test text: This is a demonstration of different voice parameters using Kokoro TTS


### Voice Characteristics


In [None]:
# Define voice characteristics
voice_info = {
    "am_michael": {
        "gender": "male",
        "accent": "American",
        "tone": "Professional, clear",
        "use_case": "Tutorials, corporate content"
    },
    "bm_george": {
        "gender": "male",
        "accent": "British",
        "tone": "Formal, authoritative",
        "use_case": "Documentaries, formal presentations"
    },
    "am_adam": {
        "gender": "male",
        "accent": "American",
        "tone": "Younger, casual",
        "use_case": "Entertainment, vlogs"
    },
    "af_bella": {
        "gender": "female",
        "accent": "American",
        "tone": "Warm, friendly",
        "use_case": "Storytelling, guided tours"
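    },
    # NOTE: af_heart is not in the FEMALE_VOICES list printed earlier;
    # confirm it appears in ALL_VOICES before using it
    "af_heart": {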
        "gender": "female",
        "accent": "American",
        "tone": "Expressive, energetic",
        "use_case": "Advertising, promotional content"
    },
    "bf_emma": {
        "gender": "female",
        "accent": "British",
        "tone": "Professional, elegant",
        "use_case": "Educational content, narration"
    }
}

# Display voice information
for voice, info in voice_info.items():
    print(f"\n{voice}:")
    for key, value in info.items():
        print(f"  {key}: {value}")

---

## Audio Enhancement

### Understanding Audio Enhancement

Audio enhancement in Aparsoft TTS includes:
1. **Normalization**: Adjusts volume to consistent level
2. **Silence Trimming**: Removes quiet sections from start/end
3. **Noise Reduction**: Applies spectral gating to reduce background noise
4. **Fade In/Out**: Prevents clicks and pops at audio boundaries
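
To make these steps concrete, here is a dependency-free sketch of three of them (normalization, silence trimming, and fades) applied to a plain list of samples. It illustrates the general techniques only and is not the library's implementation; the function names, the 0.95 peak target, and the thresholds are assumptions. Noise reduction is omitted because spectral gating needs a frequency-domain transform.

```python
import math


def normalize(samples, peak=0.95):
    """Scale samples so the loudest one reaches `peak`."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)
    return [s * peak / loudest for s in samples]


def trim_silence(samples, threshold_db=20.0):
    """Drop leading/trailing samples quieter than `threshold_db` below the peak."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return []
    cutoff = loudest * 10 ** (-threshold_db / 20)
    keep = [i for i, s in enumerate(samples) if abs(s) >= cutoff]
    return list(samples[keep[0]:keep[-1] + 1])


def fade(samples, fade_samples):
    """Apply a linear fade-in and fade-out of `fade_samples` samples each."""
    out = list(samples)
    n = min(fade_samples, len(out) // 2)
    for i in range(n):
        gain = i / n
        out[i] *= gain
        out[-1 - i] *= gain
    return out


# Synthetic input: 100 silent samples, a quiet 440 Hz tone, 100 silent samples
tone = [0.2 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(2400)]
raw = [0.0] * 100 + tone + [0.0] * 100

processed = fade(trim_silence(normalize(raw)), fade_samples=240)
print(f"Samples: {len(raw)} -> {len(processed)}")
print(f"Peak: {max(abs(s) for s in raw):.2f} -> {max(abs(s) for s in processed):.2f}")
```

Because the trim threshold is relative to the peak, normalizing before or after trimming gives the same cut points in this sketch.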

### Enhanced vs Non-Enhanced Comparison


In [None]:
# Generate without enhancement
audio_raw = engine.generate(
    text="This audio has no enhancement applied",
    output_path="output/no_enhancement.wav",
    enhance=False
)

# Generate with enhancement (default)
audio_enhanced = engine.generate(
    text="This audio has full enhancement applied",
    output_path="output/with_enhancement.wav",
    enhance=True
)

print("Generated both versions for comparison")

### Manual Audio Enhancement


In [None]:
from aparsoft_tts.utils.audio import enhance_audio
import numpy as np

# Generate raw audio (enhance=False so the built-in enhancement is not applied first)
raw_audio = engine.generate(
    text="Testing manual audio enhancement",
    enhance=False
)

# Apply custom enhancement
enhanced = enhance_audio(
    raw_audio,
    sample_rate=24000,
    normalize=True,
    trim_silence=True,
    trim_db=25.0,           # Silence threshold in dB
    add_fade=True,
    fade_duration=0.2,      # Longer fade (200ms)
    noise_reduction=True
)

print(f"Raw audio length: {len(raw_audio)} samples")
print(f"Enhanced audio length: {len(enhanced)} samples")
print(f"Reduction: {len(raw_audio) - len(enhanced)} samples")

### Custom Audio Processing


In [None]:
from aparsoft_tts.utils.audio import save_audio, get_audio_duration
import librosa
import numpy as np

# Generate audio
audio = engine.generate("Testing custom audio processing")

# Apply custom processing with librosa
# Example: Pitch shift
audio_shifted = librosa.effects.pitch_shift(
    audio,
    sr=24000,
    n_steps=2  # Shift up by 2 semitones
)

# Save custom processed audio
save_audio(audio_shifted, "output/pitch_shifted.wav", sample_rate=24000)

# Get duration
duration = get_audio_duration(audio_shifted, sample_rate=24000)
print(f"Duration: {duration:.2f} seconds")
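
For reference, `n_steps` is measured in equal-tempered semitones by default, so each step multiplies frequency by the twelfth root of two. The helper below (a hypothetical name, not a librosa function) does that arithmetic.

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of `n_steps` equal-tempered semitones."""
    return 2 ** (n_steps / 12)


for steps in (-12, -2, 2, 12):
    print(f"{steps:+d} semitones -> x{semitones_to_ratio(steps):.4f}")
```

A shift of 2 semitones therefore raises every frequency by roughly 12%, and 12 semitones doubles it (one octave).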

---

## Batch Processing

### Basic Batch Generation


In [None]:
from pathlib import Path

# Create output directory
output_dir = Path("output/batch_example")
output_dir.mkdir(parents=True, exist_ok=True)

# List of texts to generate
texts = [
    "Welcome to our tutorial series",
    "In this episode, we will cover text-to-speech",
    "Let's explore the features of Aparsoft TTS",
    "Don't forget to subscribe for more content"
]

# Batch generate
paths = engine.batch_generate(
    texts=texts,
    output_dir=output_dir,
    voice="am_michael",
    speed=1.0
)

print(f"Generated {len(paths)} files:")
for path in paths:
    print(f"  - {path}")

### Batch with Different Voices


In [None]:
# Define segments with specific voices
segments = [
    ("Welcome to the podcast", "am_michael"),
    ("Today's guest is here", "af_bella"),
    ("Thanks for joining us", "am_michael"),
    ("Happy to be here", "af_bella")
]

# Generate each with its specified voice
for i, (text, voice) in enumerate(segments):
    output = f"output/podcast_segment_{i+1:02d}_{voice}.wav"
    engine.generate(text, output, voice=voice)
    print(f"✅ Generated: {output}")

### Batch with Speed Variations


In [None]:
# Generate the same text at different speeds
base_text = "Testing different speech speeds for comparison"

speeds = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3]

for speed in speeds:
    output = f"output/speed_test_{speed:.1f}x.wav"
    engine.generate(
        text=base_text,
        output_path=output,
        speed=speed
    )
    print(f"Generated at {speed}x speed: {output}")

---

## Script Processing for Videos

### Creating a Video Script File


In [None]:
# Define a complete video script
video_script = """
Hi everyone! Welcome back to my channel.
Today we're going to explore an amazing open-source text-to-speech system.
This system uses state-of-the-art AI models to generate professional voiceovers.
Let me show you how it works with some live examples.
First, we'll look at the basic features and capabilities.
Then we'll dive into advanced usage and customization options.
If you find this useful, please like and subscribe for more content.
Thanks for watching! See you in the next video.
"""

# Save script to file
script_path = Path("output/video_script.txt")
script_path.write_text(video_script.strip())

print(f"Script saved to: {script_path}")

### Process Complete Script


In [None]:
# Process the script with automatic paragraph detection
complete_voiceover = engine.process_script(
    script_path=script_path,
    output_path="output/complete_voiceover.wav",
    gap_duration=1.2,  # 1.2 seconds gap between paragraphs
    voice="am_michael",
    speed=1.0
)

print(f"Complete voiceover generated: {complete_voiceover}")

### Script Processing with Custom Gaps


In [None]:
# Generate versions with different gap durations
gap_durations = [0.3, 0.5, 0.7, 1.0]

for gap in gap_durations:
    output = f"output/voiceover_gap_{gap:.1f}s.wav"
    engine.process_script(
        script_path=script_path,
        output_path=output,
        gap_duration=gap,
        voice="am_michael",
    )
    print(f"Generated with {gap}s gaps: {output}")

### Advanced Script Processing


In [None]:
from aparsoft_tts.utils.audio import combine_audio_segments, save_audio
from pathlib import Path

# Define script sections with metadata
script_sections = [
    {
        "name": "intro",
        "text": "Welcome to this advanced tutorial on text-to-speech.",
        "voice": "bm_george",
        "speed": 1.0
    },
    {
        "name": "main_content",
        "text": "Now let's explore the advanced features of the system.",
        "voice": "bm_george",
        "speed": 1.0
    },
    {
        "name": "guest_quote",
        "text": "This is an amazing system that I highly recommend.",
        "voice": "af_bella",  # Different voice for guest
        "speed": 0.95
    },
    {
        "name": "outro",
        "text": "Thanks for watching this tutorial. See you next time! Bye.",
        "voice": "bm_george",
        "speed": 1.0
    }    
]

# Generate each section
audio_segments = []

for section in script_sections:
    print(f"Generating: {section['name']}")

    audio = engine.generate(
        text=section['text'],
        voice=section['voice'],
        speed=section['speed']
    )

    audio_segments.append(audio)

# Combine with custom gap
combined = combine_audio_segments(
    audio_segments,
    sample_rate=24000,
    gap_duration=0.6
)

# Save combined audio
save_audio(combined, "output/advanced_script.wav", sample_rate=24000)
print("Advanced script processing complete!")
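
To make the effect of `gap_duration` concrete, here is a dependency-free sketch of joining segments with silence. It assumes silence is inserted between segments only, not after the last one; that is an assumption about how `combine_audio_segments` behaves rather than something confirmed from its source. `combine_with_gaps` is a hypothetical name.

```python
def combine_with_gaps(segments, sample_rate=24000, gap_duration=0.5):
    """Concatenate sample lists, inserting silence between consecutive segments."""
    gap = [0.0] * int(round(sample_rate * gap_duration))
    combined = []
    for index, segment in enumerate(segments):
        if index > 0:
            combined.extend(gap)
        combined.extend(segment)
    return combined


# Three one-second placeholder segments at 24 kHz
segments = [[0.1] * 24000 for _ in range(3)]
combined = combine_with_gaps(segments, gap_duration=0.6)

# 3 s of audio plus two 0.6 s gaps
print(f"Total duration: {len(combined) / 24000:.1f} seconds")
```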

---

## Configuration Management

### Using TTSConfig for Custom Settings


In [None]:
from pathlib import Path

from aparsoft_tts import TTSConfig, TTSEngine

# Create custom configuration
config = TTSConfig(
    voice="bm_george",           # British male voice
    lang_code="b",               # British English
    speed=0.95,                  # Slightly slower
    sample_rate=24000,           # Standard sample rate
    enhance_audio=True,          # Enable enhancement
    trim_silence=True,           # Trim silence
    trim_db=22.0,                # Trim threshold
    fade_duration=0.15,          # Fade duration (150ms)
    output_format="wav",         # Output format
    output_dir=Path("output/custom_config")
)

# Create engine with custom config
custom_engine = TTSEngine(config=config)

# All generations now use these settings
custom_engine.generate(
    text="This uses the custom configuration",
    output_path="output/custom_config/test.wav"
)

print("Generated with custom configuration")

### Environment-Based Configuration


In [None]:
# Configuration can also be set via environment variables
# Example .env file content:
"""
TTS_VOICE=am_michael
TTS_SPEED=1.1
TTS_ENHANCE_AUDIO=true
TTS_TRIM_DB=20.0
LOG_LEVEL=INFO
LOG_FORMAT=json
"""

# Load from environment
from aparsoft_tts.config import get_config

config = get_config()
print(f"TTS Voice: {config.tts.voice}")
print(f"TTS Speed: {config.tts.speed}")
print(f"Enhance Audio: {config.tts.enhance_audio}")
print(f"Log Level: {config.logging.level}")
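
The mapping from environment variables to typed settings can be pictured with a small standard-library stand-in. The real `TTSConfig` uses Pydantic; the `SketchTTSConfig` class, its defaults, and the accepted speed range below are assumptions for illustration, while the `TTS_` variable names follow the `.env` example above.

```python
import os
from dataclasses import dataclass


def _parse_bool(value):
    """Interpret common truthy strings from environment variables."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SketchTTSConfig:
    voice: str = "am_michael"
    speed: float = 1.0
    enhance_audio: bool = True

    def __post_init__(self):
        # Hypothetical bounds; the library's real limits may differ
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed out of range: {self.speed}")

    @classmethod
    def from_env(cls, env=None):
        """Build a config from a mapping, defaulting to os.environ."""
        env = os.environ if env is None else env
        return cls(
            voice=env.get("TTS_VOICE", cls.voice),
            speed=float(env.get("TTS_SPEED", cls.speed)),
            enhance_audio=_parse_bool(env.get("TTS_ENHANCE_AUDIO", "true")),
        )


config = SketchTTSConfig.from_env({"TTS_VOICE": "bm_george", "TTS_SPEED": "1.1"})
print(config)
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the example reproducible; values that are missing fall back to the class defaults.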

### Configuration Hierarchy


In [None]:
from pathlib import Path

from aparsoft_tts import TTSConfig, MCPConfig, LoggingConfig, Config

# Create comprehensive configuration
tts_config = TTSConfig(
    voice="am_michael",
    speed=1.0,
    enhance_audio=True
)

mcp_config = MCPConfig(
    server_name="aparsoft-tts-production",
    transport="stdio",
    enable_rate_limiting=True,
    rate_limit_calls=100
)

logging_config = LoggingConfig(
    level="INFO",
    format="json",
    output="file",
    log_file=Path("logs/production.log")
)

# Combine into main config
full_config = Config(
    environment="production",
    debug=False,
    tts=tts_config,
    mcp=mcp_config,
    logging=logging_config
)

print(f"Environment: {full_config.environment}")
print(f"TTS Voice: {full_config.tts.voice}")
print(f"MCP Server: {full_config.mcp.server_name}")
print(f"Log Level: {full_config.logging.level}")

---

## Command-Line Interface

### Basic CLI Usage

The CLI provides quick access to all functionality without writing Python code.


In [None]:
%%bash
# Generate simple audio
aparsoft-tts generate "Hello world" -o output/hello.wav

# Use different voice
aparsoft-tts generate "British voice test" -v bm_george -o output/british.wav

# Adjust speed
aparsoft-tts generate "Fast speech" -s 1.3 -o output/fast.wav

# Disable enhancement
aparsoft-tts generate "No enhancement" --no-enhance -o output/raw.wav

### CLI Batch Processing


In [None]:
%%bash
# Generate multiple files
aparsoft-tts batch "First segment" "Second segment" "Third segment" -d output/cli_batch/

# With custom voice and speed
aparsoft-tts batch "Hello" "World" -v af_bella -s 0.9 -d output/female_voice/

### CLI Script Processing


In [None]:
%%bash
# Process a script file
aparsoft-tts script output/video_script.txt -o output/cli_voiceover.wav

# With custom settings
aparsoft-tts script output/video_script.txt \
    -o output/custom_voiceover.wav \
    -v bm_george \
    -s 0.95 \
    -g 0.7

### List Available Voices


In [None]:
%%bash
# Display voice information table
aparsoft-tts voices

### Programmatic CLI Integration


In [None]:
import subprocess

# Call CLI from Python
def generate_with_cli(text, output_file, voice="am_michael", speed=1.0):
    """Use CLI for generation"""
    cmd = [
        "aparsoft-tts", "generate",
        text,
        "-o", output_file,
        "-v", voice,
        "-s", str(speed)
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)

    if result.returncode == 0:
        print(f"✅ CLI generation successful: {output_file}")
    else:
        print(f"❌ CLI error: {result.stderr}")

    return result.returncode == 0

# Use the function
generate_with_cli(
    "Testing CLI from Python",
    "output/cli_from_python.wav",
    voice="am_adam",
    speed=1.1
)
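
Separating command construction from execution makes such a wrapper easier to test and lets it cover the `--no-enhance` flag shown earlier. The sketch below uses only the flags that appear in this tutorial; `build_generate_cmd` is a hypothetical helper.

```python
import shlex


def build_generate_cmd(text, output_file, voice="am_michael", speed=1.0, enhance=True):
    """Build the argument list for `aparsoft-tts generate` without running it."""
    cmd = ["aparsoft-tts", "generate", text, "-o", str(output_file), "-v", voice, "-s", str(speed)]
    if not enhance:
        cmd.append("--no-enhance")
    return cmd


cmd = build_generate_cmd("Quick draft", "output/draft.wav", speed=1.3, enhance=False)
# shlex.join shows the command as it would be typed in a shell
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` on a non-zero exit code instead of requiring a manual `returncode` check.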

---

## Advanced Audio Processing

### Streaming Generation

Streaming yields audio chunks as they are generated, which is useful for real-time applications.


In [None]:
# Generate audio as a stream
chunks = []

for chunk in engine.generate_stream(
    text="This is a longer text that will be streamed chunk by chunk to demonstrate real-time generation. Also, this shows how to handle streaming audio through aparsoft-tts applications.",
    voice="bm_george",
    speed=1.0
):
    chunks.append(chunk)
    print(f"Received chunk: {len(chunk)} samples")

# Combine chunks
import numpy as np
full_audio = np.concatenate(chunks)

print(f"\nTotal chunks: {len(chunks)}")
print(f"Total audio length: {len(full_audio)} samples")
print(f"Duration: {len(full_audio) / 24000:.2f} seconds")
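
For real-time use, chunks are usually consumed as they arrive instead of being collected first. The following is a minimal sketch of that idea using only the standard library: it writes each chunk to a 16-bit WAV file immediately. `write_stream_to_wav` and `fake_stream` are hypothetical names; `fake_stream` is a stand-in for `engine.generate_stream(...)`, and the sketch assumes chunks are sequences of floats in the range -1.0 to 1.0 at 24 kHz.

```python
import math
import os
import struct
import tempfile
import wave


def write_stream_to_wav(chunks, path, sample_rate=24000):
    """Write float chunks (range -1.0..1.0) to a 16-bit mono WAV as they arrive."""
    total_samples = 0
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Clip to [-1, 1], then scale to the int16 range
            pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in chunk]
            wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            total_samples += len(pcm)
    return total_samples


def fake_stream(n_chunks=3, chunk_len=2400, sample_rate=24000):
    """Stand-in for engine.generate_stream(): yields chunks of a 440 Hz tone."""
    for i in range(n_chunks):
        start = i * chunk_len
        yield [
            0.5 * math.sin(2 * math.pi * 440 * (start + n) / sample_rate)
            for n in range(chunk_len)
        ]


demo_path = os.path.join(tempfile.gettempdir(), "stream_demo.wav")
written = write_stream_to_wav(fake_stream(), demo_path)
print(f"Wrote {written} samples ({written / 24000:.2f} s) to {demo_path}")
```

With the real engine, the call would become `write_stream_to_wav(engine.generate_stream(text=..., voice=...), path)`, so the file grows while generation is still running.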

### Audio Chunking for Processing


In [None]:
from aparsoft_tts.utils.audio import chunk_audio

# Generate long audio
long_text = " ".join(["This is a test sentence."] * 20)
audio = engine.generate(long_text)

# Split into chunks for processing
chunks = chunk_audio(
    audio,
    chunk_size=24000,  # 1 second chunks at 24kHz
    overlap=2400       # 100ms overlap
)

print(f"Original audio: {len(audio)} samples")
print(f"Number of chunks: {len(chunks)}")
print(f"Chunk sizes: {[len(c) for c in chunks]}")

# Process each chunk (example: apply effect)
processed_chunks = []
for i, chunk in enumerate(chunks):
    # Apply some processing (example: simple amplification)
    processed = chunk * 1.2
    processed_chunks.append(processed)
    print(f"Processed chunk {i+1}/{len(chunks)}")
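
To make the `chunk_size` and `overlap` arithmetic concrete, the sketch below computes chunk boundaries in plain Python. It assumes each chunk starts `chunk_size - overlap` samples after the previous one, which is the usual convention; `chunk_audio` itself may handle the final partial chunk differently, so compare against its printed chunk sizes. `chunk_bounds` is a hypothetical helper name.

```python
def chunk_bounds(total_samples, chunk_size, overlap):
    """Return (start, end) index pairs for overlapping chunks (illustrative)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    hop = chunk_size - overlap  # distance between consecutive chunk starts
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + chunk_size, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break
        start += hop
    return bounds


# 2.5 seconds of audio at 24 kHz, 1-second chunks, 100 ms overlap
bounds = chunk_bounds(total_samples=60000, chunk_size=24000, overlap=2400)
for start, end in bounds:
    print(f"{start / 24000:.2f}s -> {end / 24000:.2f}s ({end - start} samples)")
```

Because neighbouring chunks share `overlap` samples, processed chunks cannot simply be concatenated again; the shared region has to be trimmed or cross-faded first.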

### Combining Audio Files


In [None]:
from aparsoft_tts.utils.audio import load_audio, combine_audio_segments, save_audio

# Load multiple audio files
audio_files = [
    "output/voice_test_am_michael.wav",
    "output/voice_test_bm_george.wav",
    "output/voice_test_am_adam.wav"
]

# Load all files
segments = []
for file in audio_files:
    audio, sr = load_audio(file, sample_rate=24000)
    segments.append(audio)
    print(f"Loaded: {file} ({len(audio)} samples)")

# Combine with gaps
combined = combine_audio_segments(
    segments,
    sample_rate=24000,
    gap_duration=1.0  # 1 second gap
)

# Save combined file
save_audio(combined, "output/combined_voices.wav", sample_rate=24000)
print(f"\nCombined audio saved: {len(combined)} samples")

### Advanced Signal Processing


In [None]:
import librosa
import numpy as np
from scipy import signal

# Generate test audio
audio = engine.generate("Testing advanced signal processing")

# 1. Spectral Analysis
stft = librosa.stft(audio)
magnitude = np.abs(stft)
phase = np.angle(stft)

print(f"STFT shape: {stft.shape}")
print(f"Magnitude range: {magnitude.min():.2f} to {magnitude.max():.2f}")

# 2. Apply High-Pass Filter (remove low-frequency rumble)
sos = signal.butter(10, 100, 'hp', fs=24000, output='sos')
filtered = signal.sosfilt(sos, audio)

save_audio(filtered, "output/high_pass_filtered.wav", sample_rate=24000)

# 3. Time Stretching (change duration without changing pitch)
stretched = librosa.effects.time_stretch(audio, rate=1.2)  # 20% faster

save_audio(stretched, "output/time_stretched.wav", sample_rate=24000)

# 4. Pitch Shifting (change pitch without changing duration)
shifted = librosa.effects.pitch_shift(audio, sr=24000, n_steps=3)

save_audio(shifted, "output/pitch_shifted.wav", sample_rate=24000)

print("Advanced processing complete!")
print(f"Filtered audio saved: {len(filtered)} samples")
print(f"Output files generated: high_pass_filtered.wav, time_stretched.wav, pitch_shifted.wav")
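
The two librosa parameters used above have simple numeric meanings: `rate` divides the duration, and `n_steps` is measured in equal-tempered semitones, so each step multiplies frequency by the twelfth root of two. A small sketch of that arithmetic, with hypothetical helper names:

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of n_steps equal-tempered semitones."""
    return 2.0 ** (n_steps / 12.0)


def stretched_duration(duration_s, rate):
    """Approximate duration after time stretching (rate > 1 means faster)."""
    return duration_s / rate


print(f"n_steps=3  -> frequencies scaled by {semitones_to_ratio(3):.4f}")
print(f"n_steps=12 -> frequencies scaled by {semitones_to_ratio(12):.1f} (one octave)")
print(f"rate=1.2   -> a 3.0 s clip lasts about {stretched_duration(3.0, 1.2):.2f} s")
```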

---

## MCP Server Integration

### Understanding MCP (Model Context Protocol)

**What is MCP?**

Model Context Protocol (MCP) is an open standard created by Anthropic that enables AI applications to securely connect to external tools and data sources. Think of it as "USB-C for AI" - a standardized way for LLMs like Claude to interact with external services.

**MCP Architecture:**

```
┌─────────────────┐          ┌──────────────────┐
│   MCP Client    │          │   MCP Server     │
│  (Claude/Cursor)│ ◄──────► │  (Aparsoft TTS)  │
└─────────────────┘          └──────────────────┘
      Host App                   Tool Provider
```

**Key Concepts:**

1. **Tools** (POST-like): Execute actions and produce side effects (e.g., generate audio)
2. **Resources** (GET-like): Load information into LLM context
3. **Prompts**: Reusable templates for LLM interactions
4. **Transport**: Communication protocol (stdio, SSE, HTTP)
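
On the wire, these concepts are carried as JSON-RPC 2.0 messages. The sketch below builds the two requests a client would typically send to this server: one to list tools and one to call a tool. It only constructs the messages and does not talk to a server; `jsonrpc_request` is a hypothetical helper name, and the tool arguments are taken from examples later in this section.

```python
import json
from itertools import count

_request_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict as used by MCP (illustrative helper)."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


list_tools = jsonrpc_request("tools/list")
call_tool = jsonrpc_request(
    "tools/call",
    {
        "name": "generate_speech",
        "arguments": {"text": "Welcome to our channel", "voice": "am_michael"},
    },
)

print(json.dumps(list_tools, indent=2))
print(json.dumps(call_tool, indent=2))
```

With the stdio transport, messages like these are exchanged over the server process's standard input and output, which is why the server's own logs are directed to stderr in the configurations below.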

**Why MCP for TTS?**

- **No Context Switching**: Generate voiceovers without leaving Claude/Cursor
- **Natural Language Interface**: Just describe what you want: "Create voiceover using am_michael"
- **Automated Workflows**: AI agents can generate audio as part of larger tasks
- **Standardized Integration**: Works with any MCP-compatible client

### MCP Server Implementation

**Architecture Overview:**

Our MCP server is built with [FastMCP](https://github.com/jlowin/fastmcp), the Pythonic framework for MCP servers.

In [11]:
# Key Components of aparsoft_tts/mcp_server.py:
from pydantic import BaseModel, Field
from aparsoft_tts import TTSEngine
tts_engine = TTSEngine()

# 1. FastMCP Server Instance
from fastmcp import FastMCP
mcp = FastMCP("aparsoft-tts", version="1.0.0")

# 2. Request Validation Models (Pydantic)
class GenerateSpeechRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=10000)
    voice: str = Field(default="am_michael")
    speed: float = Field(default=1.0, ge=0.5, le=2.0)
    # ... more fields

# 3. Tool Definition (Auto-schema from type hints)
@mcp.tool()
async def generate_speech(request: GenerateSpeechRequest) -> str:
    """Generate speech with automatic schema generation"""
    output = tts_engine.generate(
        text=request.text,
        voice=request.voice,
        speed=request.speed
    )
    return f"✅ Speech generated: {output}"

# 4. Four Main Tools:
# - generate_speech: Single audio generation
# - list_voices: Voice catalog
# - batch_generate: Multiple files
# - process_script: Complete video scripts

2025-10-05T07:09:33.990218Z [info] initializing_tts_engine [TTSEngine] filename=engine.py func_name=__init__ lineno=138 module=engine pathname=/home/ram/projects/youtube-creator/aparsoft_tts/core/engine.py process=58195 process_name=MainProcess service=aparsoft-tts thread=138044826550400 thread_name=MainThread token_limits=100-250 version=1.0.0 voice=am_michael
2025-10-05T07:09:35.492597Z [info] tts_engine_initialized [TTSEngine] default_lang_code=a filename=engine.py func_name=__init__ lineno=146 module=engine pathname=/home/ram/p

**Key Features:**

1. ✅ **Type Safety**: Pydantic models validate all requests
2. ✅ **Auto Schema**: FastMCP generates tool schemas from Python type hints
3. ✅ **Structured Logging**: Correlation IDs track requests end-to-end
4. ✅ **Error Handling**: Custom exceptions with helpful error messages
5. ✅ **stdio Transport**: Standard input/output for local clients
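
To illustrate what the type-safety point means in practice, the sketch below mirrors the `text` and `speed` constraints from `GenerateSpeechRequest` using only the standard library. It is a stand-in for the Pydantic model, not the server's actual code, and `SpeechRequestSketch` and `try_request` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SpeechRequestSketch:
    """Stdlib stand-in mirroring the Field constraints shown above."""
    text: str
    voice: str = "am_michael"
    speed: float = 1.0

    def __post_init__(self):
        if not 1 <= len(self.text) <= 10000:
            raise ValueError("text must be 1 to 10000 characters long")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")


def try_request(**kwargs):
    """Return (ok, detail) instead of raising, so failures are easy to inspect."""
    try:
        return True, SpeechRequestSketch(**kwargs)
    except ValueError as exc:
        return False, str(exc)


print(try_request(text="Hello", speed=1.3))
print(try_request(text="", speed=1.0))
print(try_request(text="Hello", speed=3.0))
```

Rejecting bad input at the boundary means the tool body never sees an empty text or an out-of-range speed, and the client gets a specific message explaining what to change.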

### Installation & Setup

#### Step 1: Verify MCP Server is Installed

In [None]:
%%bash
# Ensure MCP dependencies are installed
pip install -e ".[mcp]"

# Test the server can start
python -m aparsoft_tts.mcp_server --help

#### Step 2: Configure Claude Desktop

**macOS:**

In [None]:
%%bash
# Open config file
open ~/Library/Application\ Support/Claude/claude_desktop_config.json

**Linux:**

In [None]:
%%bash
# Open config file
nano ~/.config/Claude/claude_desktop_config.json

**Windows:**

In [None]:
%%powershell
# Open config file (PowerShell reads environment variables as $env:NAME)
notepad "$env:APPDATA\Claude\claude_desktop_config.json"

##### MCP Server Configuration

**Linux/macOS (Native):**

```json
{
  "mcpServers": {
    "aparsoft-tts": {
      "command": "/absolute/path/to/your/venv/bin/python",
      "args": ["-m", "aparsoft_tts.mcp_server"],
      "env": {
        "LOG_LEVEL": "WARNING",
        "LOG_OUTPUT": "stderr"
      }
    }
  }
}
```

Find your Python path:
```bash
# Activate your virtual environment first
source /path/to/your/venv/bin/activate

# Get the absolute path
which python
# Example output: /home/ram/projects/youtube-creator/venv/bin/python
```

**Important Notes:**
- Use **absolute paths** for the Python interpreter
- Find your venv path: `which python` (Linux/Mac) or `where python` (Windows)
- Example: `/home/user/projects/youtube-creator/venv/bin/python`
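
Since the interpreter path is the part most often mistyped, it can help to generate the entry from the running interpreter. The sketch below is one way to do that, assuming it is executed with the virtual environment's Python; `build_server_entry` and `merge_into_config` are hypothetical helper names, and the `env` values are copied from the configuration above.

```python
import json
import os
import sys


def build_server_entry(python_path=None, log_level="WARNING"):
    """Build the server entry for this package (illustrative helper)."""
    # abspath keeps venv symlinks intact, unlike Path.resolve()
    command = os.path.abspath(python_path or sys.executable)
    return {
        "command": command,
        "args": ["-m", "aparsoft_tts.mcp_server"],
        "env": {"LOG_LEVEL": log_level, "LOG_OUTPUT": "stderr"},
    }


def merge_into_config(config, name="aparsoft-tts", entry=None):
    """Return a copy of config with the server entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry if entry is not None else build_server_entry()
    merged["mcpServers"] = servers
    return merged


# Example: an existing config that already lists another (made-up) server
existing = {"mcpServers": {"other-server": {"command": "/usr/bin/true", "args": []}}}
print(json.dumps(merge_into_config(existing), indent=2))
```

The printed JSON can be pasted into `claude_desktop_config.json`. Merging rather than overwriting keeps any servers that are already configured.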


**WSL:**

```json
{
  "globalShortcut": "",
  "mcpServers": {
    "aparsoft-tts": {
      "command": "wsl",
      "args": [
        "-e",
        "/absolute/path/to/your/venv/bin/python",
        "-W", "ignore",
        "-m",
        "aparsoft_tts.mcp_server"
      ],
      "env": {
        "LOG_LEVEL": "INFO",
        "LOG_OUTPUT": "stderr",
        "LOG_FORMAT": "json",
        "PYTHONWARNINGS": "ignore",
        "NO_COLOR": "1"
      }
    }
  }
}
```

#### Step 3: Configure Cursor

**Location:** `~/.cursor/mcp.json` (create it if it doesn't exist)

```json
{
  "mcpServers": {
    "aparsoft-tts": {
      "command": "/absolute/path/to/venv/bin/python",
      "args": ["-m", "aparsoft_tts.mcp_server"]
    }
  }
}
```

**Cursor-Specific Setup:**
1. Open Cursor Settings → Tools & Integrations → MCP Servers
2. Or manually edit `~/.cursor/mcp.json`
3. Restart Cursor completely for changes to take effect

#### Step 4: Restart Your Client

In [None]:
%%bash
# Claude Desktop: quit the app completely and reopen it (a window reload may not pick up config changes)
# Cursor: Completely close and restart

### Using MCP Server with Claude Desktop

**Example Prompts:**

```
# Basic generation
"Generate speech for 'Welcome to our channel' using am_michael voice and save as intro.wav"

# Voice selection
"List all available TTS voices"

# Batch processing
"Generate three audio files: 'Welcome', 'Main content', and 'Thanks for watching' using af_bella voice"

# Script processing
"Process the video_script.txt file and create a complete voiceover with 0.8 second gaps between paragraphs"

# Custom parameters
"Generate 'This is a test' at 1.3x speed with bm_george voice and enhanced audio"
```

**Claude Desktop Workflow:**

1. Ask Claude to use the TTS tool
2. Claude calls the MCP server tool
3. You approve the tool execution (security feature)
4. Server generates audio and returns result
5. Claude shows you the success message with file details

**Tool Permissions:**

Claude Desktop will ask permission before executing tools. You can:
- ✅ Allow: Execute this specific request
- ✅ Allow Always: Auto-approve all requests to this tool
- ❌ Deny: Skip this request

### Using MCP Server with Cursor

**Cursor-Specific Features:**

1. **Agent Mode**: Required for MCP integration
   ```
   Enable: Cursor Settings → Features → Agent Mode: ON
   ```

2. **Explicit Tool Calls**: Reference tools by name
   ```
   "Use the aparsoft-tts generate_speech tool to create audio for 'Hello world'"
   ```

3. **Workspace Context**: Cursor can see your files
   ```
   "Read script.txt and generate a voiceover using the TTS tool"
   ```

**Example Cursor Workflow:**

In [None]:
# You: "Create a voiceover for this script using TTS"
# Cursor sees your workspace, finds script.txt, and:

# 1. Reads the script
with open('script.txt', 'r') as f:
    script_content = f.read()

# 2. Calls MCP tool
# Cursor uses aparsoft-tts process_script tool
# Result: "✅ Speech generated: complete_voiceover.wav"

### Testing & Debugging

#### Method 1: MCP Inspector (Recommended)

**Run the Inspector (npx downloads it on first use; requires Node.js):**

In [None]:
%%bash
npx @modelcontextprotocol/inspector

**Launch with your server:**

In [None]:
%%bash
# The server command and its arguments follow the inspector package name
npx @modelcontextprotocol/inspector \
  /path/to/venv/bin/python -m aparsoft_tts.mcp_server

**Inspector Features:**
- Interactive UI at `http://localhost:6274`
- Test all tools without Claude/Cursor
- View request/response JSON
- Export config for Claude/Cursor
- Debug connection issues

#### Method 2: CLI Testing

In [None]:
%%bash
# List available tools
npx @modelcontextprotocol/inspector --cli \
  /path/to/venv/bin/python -m aparsoft_tts.mcp_server \
  --method tools/list

# Test generate_speech tool
npx @modelcontextprotocol/inspector --cli \
  /path/to/venv/bin/python -m aparsoft_tts.mcp_server \
  --method tools/call \
  --tool-name generate_speech \
  --tool-arg text="Test audio" \
  --tool-arg voice="am_michael"

#### Method 3: Direct Python Testing

In [None]:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def test_mcp_server():
    """Test MCP server programmatically"""
    
    server_params = StdioServerParameters(
        command="/path/to/venv/bin/python",
        args=["-m", "aparsoft_tts.mcp_server"]
    )
    
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize connection
            await session.initialize()
            
            # List available tools
            tools = await session.list_tools()
            print("Available tools:")
            for tool in tools.tools:
                print(f"  - {tool.name}: {tool.description}")
            
            # Test generate_speech
            result = await session.call_tool(
                "generate_speech",
                arguments={
                    "text": "MCP test successful!",
                    "voice": "am_michael",
                    "output_file": "output/mcp_test.wav"
                }
            )
            
            print(f"\nResult: {result.content[0].text}")

# Run test
asyncio.run(test_mcp_server())

#### Method 4: Check Logs

**Claude Desktop Logs:**

In [None]:
%%bash
# macOS
tail -f ~/Library/Logs/Claude/mcp*.log

# Linux  
tail -f ~/.config/Claude/logs/mcp*.log

# Windows (run in Command Prompt, not bash):
# type "%APPDATA%\Claude\logs\mcp*.log"

**Cursor Logs:**

In [None]:
%%bash
# Check Cursor's output panel
# View → Output → Select "MCP" from dropdown

**Enable Debug Logging:**
```json
{
  "mcpServers": {
    "aparsoft-tts": {
      "command": "/path/to/venv/bin/python",
      "args": ["-m", "aparsoft_tts.mcp_server"],
      "env": {
        "LOG_LEVEL": "DEBUG",
        "DEBUG": "true"
      }
    }
  }
}
```

### Common Issues & Solutions

**Issue 1: "Could not attach to MCP server"**

✅ Solutions:
- Use absolute path to Python: `/full/path/to/venv/bin/python`
- Verify server runs: `python -m aparsoft_tts.mcp_server`
- Check Python version: `python --version` (needs 3.10+)
- Restart client completely (not just reload)
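
The first three checks can be scripted. The sketch below is a small preflight helper, to be run with the same interpreter that the client config points at; `preflight` is a hypothetical name, and the 3.10 minimum is taken from the list above.

```python
import importlib.util
import os
import sys


def preflight(module_name="aparsoft_tts", min_version=(3, 10)):
    """Collect basic facts about the current interpreter (illustrative helper)."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None

    return {
        "python_path": sys.executable,
        "path_is_absolute": os.path.isabs(sys.executable),
        "python_version_ok": sys.version_info >= min_version,
        "module_found": spec is not None,
    }


for check, value in preflight().items():
    print(f"{check}: {value}")
```

If `module_found` is `False`, the interpreter in the config is probably not the one from the virtual environment where the package was installed.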

**Issue 2: "Tool not found" or "No tools available"**

✅ Solutions:

In [None]:
%%bash
# Verify MCP dependencies
pip install -e ".[mcp]"

# Check FastMCP is installed
python -c "from fastmcp import FastMCP; print('OK')"

# Test tool registration (get_tools() is the FastMCP 2.x API; adjust for other versions)
python -c "import asyncio; from aparsoft_tts.mcp_server import mcp; print(list(asyncio.run(mcp.get_tools())))"

**Issue 3: "Permission denied" errors**

✅ Solutions:

In [None]:
%%bash
# Ensure the Python interpreter used by the server is executable
chmod +x /path/to/venv/bin/python

# Check output directory permissions
mkdir -p output && chmod 755 output

**Issue 4: Audio files not generated**

✅ Solutions:

In [None]:
# Check espeak-ng is installed
import subprocess
subprocess.run(["espeak-ng", "--version"])

# Verify output path exists
from pathlib import Path
Path("output").mkdir(exist_ok=True)

# Test TTS engine directly
from aparsoft_tts import TTSEngine
engine = TTSEngine()
engine.generate("test", "output/test.wav")

### Advanced MCP Features

#### Context Injection

In [None]:
# MCP servers can access context about the client
from fastmcp import Context

@mcp.tool()
async def smart_generate(text: str, ctx: Context) -> str:
    """Context-aware generation"""
    
    # Log to client
    await ctx.info(f"Generating speech for: {text[:50]}...")
    
    # Access server metadata
    server_name = ctx.fastmcp.name
    
    # Use session for advanced features
    client_info = ctx.session.client_params
    
    # Generate audio
    result = tts_engine.generate(text)
    
    return f"Generated by {server_name}: {result}"

#### Authentication (Advanced)

In [None]:
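# For remote MCP servers (not needed for local stdio)
# Import path as documented for recent FastMCP 2.x releases; adjust for your version
from fastmcp import FastMCP
from fastmcp.server.auth.providers.google import GoogleProvider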

auth = GoogleProvider(
    client_id="your-client-id",
    client_secret="your-secret",
    base_url="https://your-server.com"
)

mcp = FastMCP("aparsoft-tts", auth=auth)

#### Rate Limiting

In [None]:
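# Configure in MCPConfig
# Assumption: MCPConfig lives in aparsoft_tts.config alongside LoggingConfig
from aparsoft_tts.config import MCPConfig
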
mcp_config = MCPConfig(
    server_name="aparsoft-tts",
    enable_rate_limiting=True,
    rate_limit_calls=100,      # Max 100 calls
    rate_limit_period=60       # Per 60 seconds
)
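
To show what "100 calls per 60 seconds" means in practice, here is a minimal sliding-window limiter. It is an illustration of the general technique only; how `MCPConfig` enforces its limits internally is not shown in this tutorial, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period-second window (illustrative)."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self):
        now = self.clock()
        # Forget calls that have left the window
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


# Demo with a fake clock so no real waiting is needed
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=3, period=60, clock=lambda: fake_now[0])

results = [limiter.allow() for _ in range(4)]  # the 4th call exceeds the limit
fake_now[0] = 61.0                             # window has passed
results.append(limiter.allow())
print(results)  # [True, True, True, False, True]
```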

### Deploying the MCP Server

#### Remote MCP Server (SSE Transport)

In [None]:
# For cloud deployment
from fastmcp import FastMCP

mcp = FastMCP("aparsoft-tts")

if __name__ == "__main__":
    # SSE transport for remote access
    mcp.run(
        transport="sse",
        host="0.0.0.0",
        port=8000
    )

**Client Configuration:**
```json
{
  "mcpServers": {
    "aparsoft-tts": {
      "url": "https://your-server.com/sse",
      "transport": "sse"
    }
  }
}
```

#### Docker Deployment

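```dockerfile
# Dockerfile for MCP server
FROM python:3.11-slim

# System packages listed under "System Dependencies" at the start of this tutorial
RUN apt-get update \
    && apt-get install -y --no-install-recommends espeak-ng ffmpeg libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .

RUN pip install --no-cache-dir -e ".[mcp]"

CMD ["python", "-m", "aparsoft_tts.mcp_server"]
```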

In [None]:
%%bash
# Build and run (-i keeps stdin open, which the stdio transport needs)
docker build -t aparsoft-tts-mcp .
docker run -i --rm aparsoft-tts-mcp

### Best Practices

1. **Path Management**
   - Always use absolute paths in config
   - Use `Path` objects for cross-platform compatibility
   
2. **Error Messages**
   - Return helpful error messages from tools
   - Include suggested fixes in error text
   
3. **Tool Descriptions**
   - Write clear docstrings (used in tool schema)
   - Include examples in descriptions
   
4. **Type Safety**
   - Use Pydantic models for validation
   - Leverage Python type hints
   
5. **Logging**
   - Log all tool calls with correlation IDs
   - Include context in log messages

### Additional Resources

- **FastMCP Docs**: https://github.com/jlowin/fastmcp
- **MCP Specification**: https://modelcontextprotocol.io
- **MCP Inspector**: https://github.com/modelcontextprotocol/inspector
- **Example Servers**: https://github.com/modelcontextprotocol/servers
- **Claude MCP Guide**: https://docs.anthropic.com/claude/docs/mcp

---

## Production Deployment

### Docker Deployment

```dockerfile
# The project includes a comprehensive Dockerfile
# Key features:
# - Multi-stage build for optimization
# - Non-root user (ttsuser)
# - Health checks
# - System dependencies included
# - Both CPU and GPU variants
```

### Building Docker Image


In [None]:
%%bash
# Build the Docker image
docker build -t aparsoft-tts:latest .

# Build GPU variant (optional)
docker build -t aparsoft-tts:gpu --target gpu .

# Verify image
docker images | grep aparsoft-tts

### Running with Docker


In [None]:
%%bash
# Run MCP server in Docker
docker run -d \
  --name aparsoft-tts-server \
  -v $(pwd)/outputs:/app/outputs \
  -v $(pwd)/logs:/app/logs \
  aparsoft-tts:latest

# Check logs
docker logs -f aparsoft-tts-server

# Run CLI commands in Docker
docker run --rm \
  -v $(pwd)/outputs:/app/outputs \
  aparsoft-tts:latest \
  aparsoft-tts generate "Docker test" -o /app/outputs/docker_test.wav

# Interactive Python shell in Docker
docker run -it --rm aparsoft-tts:latest python

### Docker Compose Deployment


In [None]:
%%bash
# Start services with docker-compose
docker-compose up -d

# View logs
docker-compose logs -f

# Scale services
docker-compose up -d --scale aparsoft-tts=3

# Stop services
docker-compose down

### Environment Variables in Production


In [None]:
# Production configuration via environment variables
# Set in docker-compose.yml or .env file

production_env = {
    # TTS Settings
    "TTS_VOICE": "am_michael",
    "TTS_SPEED": "1.0",
    "TTS_ENHANCE_AUDIO": "true",

    # MCP Server Settings
    "MCP_SERVER_NAME": "aparsoft-tts-production",
    "MCP_ENABLE_RATE_LIMITING": "true",
    "MCP_RATE_LIMIT_CALLS": "100",
    "MCP_RATE_LIMIT_PERIOD": "60",

    # Logging Settings
    "LOG_LEVEL": "WARNING",
    "LOG_FORMAT": "json",
    "LOG_OUTPUT": "file",
    "LOG_FILE": "/app/logs/production.log",

    # Environment
    "ENVIRONMENT": "production",
    "DEBUG": "false"
}

for key, value in production_env.items():
    print(f"{key}={value}")
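
Environment variables are always strings, so values such as `"true"` or `"1.0"` need to be converted before use. The package's own configuration classes presumably handle this already; the sketch below only illustrates the conversion with the standard library. `env_bool`, `env_float`, and `env_int` are hypothetical helper names.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, default=False, environ=os.environ):
    """Parse a boolean environment variable, rejecting unknown spellings."""
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean-like string, got {raw!r}")


def env_float(name, default, environ=os.environ):
    """Parse a float environment variable, falling back to default if unset."""
    raw = environ.get(name)
    return default if raw is None else float(raw)


def env_int(name, default, environ=os.environ):
    """Parse an integer environment variable, falling back to default if unset."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


# Demo against a plain dict instead of the real environment
demo_env = {"TTS_SPEED": "1.0", "TTS_ENHANCE_AUDIO": "true", "MCP_RATE_LIMIT_CALLS": "100"}
print(env_float("TTS_SPEED", 1.0, demo_env))
print(env_bool("TTS_ENHANCE_AUDIO", False, demo_env))
print(env_int("MCP_RATE_LIMIT_CALLS", 60, demo_env))
print(env_bool("DEBUG", False, demo_env))  # unset, so the default is used
```

Failing loudly on an unrecognised value such as `DEBUG=maybe` is usually safer in production than silently treating it as false.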

---

## Testing and Debugging

### Running Unit Tests


In [None]:
%%bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run specific test file
pytest tests/unit/test_engine.py

# Run with coverage
pytest --cov=aparsoft_tts --cov-report=html

# Run only fast tests (skip slow ones)
pytest -m "not slow"

### Writing Custom Tests


In [None]:
import pytest
from aparsoft_tts import TTSEngine
from pathlib import Path

def test_basic_generation(tmp_path):
    """Test basic audio generation"""
    engine = TTSEngine()

    output = tmp_path / "test.wav"
    result = engine.generate(
        text="Test audio",
        output_path=output
    )

    assert output.exists()
    assert output.stat().st_size > 0
    assert result == output

def test_invalid_voice():
    """Test that an invalid voice raises InvalidVoiceError"""
    from aparsoft_tts.utils.exceptions import InvalidVoiceError

    engine = TTSEngine()

    with pytest.raises(InvalidVoiceError):
        engine.generate("Test", voice="invalid_voice")

def test_batch_generation(tmp_path):
    """Test batch processing"""
    engine = TTSEngine()

    texts = ["First", "Second", "Third"]
    paths = engine.batch_generate(texts, output_dir=tmp_path)

    assert len(paths) == 3
    for path in paths:
        assert path.exists()

# Run these tests
# pytest test_custom.py -v

### Debug Logging


In [None]:
from aparsoft_tts.utils.logging import setup_logging, get_logger
from aparsoft_tts.config import LoggingConfig

# Configure debug logging
debug_config = LoggingConfig(
    level="DEBUG",
    format="console",
    output="stdout",
    include_timestamp=True,
    include_caller=True
)

setup_logging(debug_config)

# Get logger
log = get_logger(__name__)

# Log debug information
log.debug("starting_generation", text="test", voice="am_michael")

# Now run TTS with debug logs
engine = TTSEngine()
engine.generate("Debug test", "output/debug_test.wav")

### Error Handling


In [None]:
from aparsoft_tts.utils.exceptions import (
    TTSGenerationError,
    InvalidVoiceError,
    AudioProcessingError
)

def safe_generate(text, output_path, **kwargs):
    """Generate with comprehensive error handling"""
    try:
        engine = TTSEngine()
        result = engine.generate(text, output_path, **kwargs)
        print(f"✅ Success: {result}")
        return result

    except InvalidVoiceError as e:
        print(f"❌ Invalid voice: {e}")
        print("Available voices:", TTSEngine.list_voices())

    except AudioProcessingError as e:
        print(f"❌ Audio processing failed: {e}")
        print("Try disabling enhancement: enhance=False")

    except TTSGenerationError as e:
        print(f"❌ Generation failed: {e}")
        print("Check input text and parameters")

    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        raise

    return None

# Test error handling
safe_generate("Test", "output/test.wav", voice="invalid")

---

## Performance Optimization

### Benchmarking Generation Speed


In [None]:
import time
from statistics import mean, stdev

def benchmark_generation(text, iterations=5):
    """Benchmark TTS generation speed"""
    engine = TTSEngine()
    times = []

    # Warm-up (model loading)
    engine.generate(text)

    # Benchmark
    for i in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution timer
        engine.generate(text)
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        print(f"Iteration {i+1}: {elapsed:.3f}s")

    avg_time = mean(times)
    std_time = stdev(times) if len(times) > 1 else 0

    print(f"\nAverage: {avg_time:.3f}s ± {std_time:.3f}s")
    print(f"Characters: {len(text)}")
    print(f"Speed: {len(text) / avg_time:.1f} chars/second")

    return avg_time

# Run benchmark
test_text = "This is a benchmark test for text-to-speech generation speed."
benchmark_generation(test_text, iterations=10)
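
Characters per second depends on the text, so TTS speed is often also reported as a real-time factor (RTF): generation time divided by the duration of the audio produced. Values below 1.0 mean faster than real time. A minimal sketch, assuming 24 kHz output as elsewhere in this tutorial; `real_time_factor` is a hypothetical helper name and the numbers in the example are made up.

```python
def real_time_factor(generation_seconds, num_samples, sample_rate=24000):
    """Generation time divided by audio duration; < 1.0 is faster than real time."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / sample_rate
    return generation_seconds / audio_seconds


# Example: 0.75 s spent generating 72,000 samples (3 s of audio)
rtf = real_time_factor(0.75, 72000)
print(f"RTF: {rtf:.2f} (about {1 / rtf:.1f}x faster than real time)")
```

Inside `benchmark_generation`, `num_samples` would be `len(audio)` for the array returned by `engine.generate(text)`.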

### Optimization Techniques


In [None]:
from aparsoft_tts import TTSEngine, TTSConfig

# 1. Disable enhancement for faster generation
fast_config = TTSConfig(enhance_audio=False)
fast_engine = TTSEngine(config=fast_config)

# 2. Reuse engine instance
engine = TTSEngine()  # Create once

# Generate many times with same instance
for i in range(100):
    engine.generate(f"Text {i}", f"output/batch_{i}.wav")
    # Much faster than creating new engine each time

# 3. Use streaming for long texts
long_text = "Long text here... " * 100

# This starts producing audio sooner
for chunk in engine.generate_stream(long_text):
    # Process chunk immediately
    pass

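# 4. Parallel processing with multiprocessing
# Note: each worker loads its own model. On platforms that spawn workers
# (Windows, macOS), run this from a .py script under
# `if __name__ == "__main__":` rather than from a notebook cell.
from multiprocessing import Pool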

def generate_one(args):
    text, output = args
    engine = TTSEngine()
    return engine.generate(text, output)

texts_and_outputs = [
    ("Text 1", "output/parallel_1.wav"),
    ("Text 2", "output/parallel_2.wav"),
    ("Text 3", "output/parallel_3.wav"),
]

with Pool(processes=3) as pool:
    results = pool.map(generate_one, texts_and_outputs)

print(f"Generated {len(results)} files in parallel")

### Memory Profiling


In [None]:
import tracemalloc
from aparsoft_tts import TTSEngine

def profile_memory():
    """Profile memory usage during generation"""

    # Start tracing
    tracemalloc.start()

    # Get baseline
    snapshot1 = tracemalloc.take_snapshot()

    # Generate audio
    engine = TTSEngine()
    engine.generate("Memory profiling test" * 50, "output/memory_test.wav")

    # Take snapshot after generation
    snapshot2 = tracemalloc.take_snapshot()

    # Compare
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top 10 Memory Usage ]")
    for stat in top_stats[:10]:
        print(stat)

    # Stop tracing
    tracemalloc.stop()

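# Run profiler (tracemalloc is part of the standard library; it tracks
# Python-level allocations, so memory held by native backends may not appear)
# profile_memory()
print("Uncomment profile_memory() above to run the memory profile")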

### Caching Strategy


In [None]:
import hashlib
from pathlib import Path

from aparsoft_tts.utils.audio import load_audio

class CachedTTSEngine:
    """TTS Engine with file-based caching for repeated texts"""

    def __init__(self):
        self.engine = TTSEngine()
        self.cache_dir = Path("output/cache")
        # parents=True so this also works when output/ does not exist yet
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _get_cache_key(self, text, voice, speed):
        """Generate cache key from parameters"""
        key_str = f"{text}_{voice}_{speed}"
        # MD5 is used only as a fast fingerprint here, not for security
        return hashlib.md5(key_str.encode()).hexdigest()

    def generate_cached(self, text, voice="am_michael", speed=1.0):
        """Generate with caching; always returns the audio array"""
        cache_key = self._get_cache_key(text, voice, speed)
        cache_file = self.cache_dir / f"{cache_key}.wav"

        if cache_file.exists():
            print(f"✅ Cache hit: {cache_key}")
        else:
            print("⏳ Cache miss: generating...")
            self.engine.generate(
                text,
                output_path=cache_file,
                voice=voice,
                speed=speed
            )

        # Load from the cache file so hits and misses return the same type
        audio, _ = load_audio(cache_file)
        return audio

# Test caching
cached_engine = CachedTTSEngine()

# First call - cache miss
cached_engine.generate_cached("Repeating text for cache test")

# Second call - cache hit (much faster)
cached_engine.generate_cached("Repeating text for cache test")
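
One caveat with joining parameters by underscores is that different inputs can produce the same key string, because voice names themselves contain underscores. The sketch below shows the collision and one way around it: serialising the parameters as JSON before hashing. `structured_cache_key` is a hypothetical helper, and its `enhance` parameter is an assumption added to show that every setting affecting the audio belongs in the key.

```python
import hashlib
import json


def naive_key(text, voice, speed):
    """Key string built the same way as in the cached engine above."""
    return f"{text}_{voice}_{speed}"


def structured_cache_key(text, voice, speed, enhance=True):
    """Hash a JSON encoding of the parameters so field boundaries are unambiguous."""
    payload = json.dumps(
        {"text": text, "voice": voice, "speed": speed, "enhance": enhance},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two different requests (the voice names here are made up for the demo)
a = ("hello_am", "michael", 1.0)
b = ("hello", "am_michael", 1.0)

print("naive keys collide:     ", naive_key(*a) == naive_key(*b))
print("structured keys collide:", structured_cache_key(*a) == structured_cache_key(*b))
```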

---

## Conclusion

This tutorial covered Aparsoft TTS from basic usage to advanced production deployment:

### Key Takeaways

1. **Setup**: System dependencies, Python package, verification
2. **Basic Usage**: Engine initialization, simple generation, voice selection
3. **Audio Processing**: Enhancement, streaming, custom processing
4. **Batch Operations**: Multiple files, different voices/speeds
5. **Script Processing**: Video workflows, paragraph detection, segment combination
6. **Configuration**: Pydantic models, environment variables, hierarchical config
7. **CLI**: Command-line interface for quick tasks
8. **MCP Integration**: Server setup, Claude/Cursor integration
9. **Production**: Docker deployment, environment management, scaling
10. **Testing**: Unit tests, debugging, error handling
11. **Performance**: Benchmarking, optimization, caching

### Next Steps

1. **Explore Examples**: Check `examples/` directory for more patterns
2. **Read Source Code**: Well-documented code in `aparsoft_tts/`
3. **Contribute**: See `CONTRIBUTING.md` for guidelines
4. **Deploy**: Use Docker for production deployments
5. **Integrate**: Connect with your existing workflows

### Resources

- **GitHub**: https://github.com/aparsoft/youtube-tts
- **Documentation**: README.md, QUICKSTART.md
- **Support**: contact@aparsoft.com
- **Website**: https://aparsoft.com

---

**Happy Voice Generation! 🎙️**