# SSML Generator & Text-to-Speech for AI Oracle Project

This notebook demonstrates how to use Google Cloud Text-to-Speech API with SSML (Speech Synthesis Markup Language) to create high-quality voiceovers for the AI Oracle project.

## Prerequisites

1. **Google Cloud Project Setup:**
   - Go to [Google Cloud Console](https://console.cloud.google.com/)
   - Create a new project or select an existing one
   - Enable the Cloud Text-to-Speech API
   - Navigate to "APIs & Services" > "Credentials"
   - Create an API key or service account credentials

2. **Authentication Options:**
   
   **Option A - API Key (Simpler):**
   - Create an API key in Google Cloud Console
   - Add to your `.env` file:
   ```
   GOOGLE_CLOUD_API_KEY=your_api_key_here
   ```
   
   **Option B - Service Account (Recommended for production):**
   - Create a service account with Text-to-Speech permissions
   - Download the JSON credentials file
   - Save it as `credentials/google_cloud_tts_credentials.json`
   - Add to your `.env` file:
   ```
   GOOGLE_APPLICATION_CREDENTIALS=credentials/google_cloud_tts_credentials.json
   ```

3. **Install Required Packages:**
   ```bash
   pip install google-cloud-texttospeech python-dotenv
   ```

## SSML Features for AI Oracle
- Control speaking rate, pitch, and volume for dramatic effect
- Add pauses and emphasis to create suspense
- Use multiple voices for different characters
- Insert breaks for pacing and dramatic timing
- Adjust pronunciation for technical terms
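
Google's SSML supports `<sub alias="...">`, which speaks the alias in place of the written text, and that covers the pronunciation point above. The sketch below assumes a hypothetical `PRONUNCIATIONS` glossary that you maintain yourself, and the aliases in it are illustrative only. It XML-escapes a plain-text line and wraps whole-word matches of each term in `<sub>` tags.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical glossary: written form -> how it should be spoken (illustrative values)
PRONUNCIATIONS = {
    "ARG": "A R G",
    "SQL": "sequel",
}


def apply_pronunciations(text, glossary=PRONUNCIATIONS):
    """Escape plain text for SSML and wrap glossary terms in <sub alias="..."> tags."""
    ssml = escape(text)  # turn &, <, > into entities before adding any markup
    for term, alias in glossary.items():
        # \b limits matches to whole words, so "ARG" does not match inside "ARGUE"
        pattern = r"\b" + re.escape(escape(term)) + r"\b"
        tag = f"<sub alias={quoteattr(alias)}>{escape(term)}</sub>"
        # A function replacement stops re.sub from interpreting backslashes in the tag
        ssml = re.sub(pattern, lambda _match, tag=tag: tag, ssml)
    return ssml


print(apply_pronunciations("The ARG hides a SQL query & more."))
```

Terms are substituted one after another. If a later term happens to appear inside an earlier alias, it will be wrapped again, so keep glossary entries distinct.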

## 1. Import Required Libraries

In [None]:
import os
from pathlib import Path
from dotenv import load_dotenv
from google.cloud import texttospeech
from IPython.display import Audio, display, Markdown

print("✅ All libraries imported successfully")

## 2. Setup Authentication

This section handles Google Cloud Text-to-Speech API authentication.

In [None]:
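# Load environment variables from .env into os.environ
load_dotenv()

credentials_path = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
api_key = os.getenv('GOOGLE_CLOUD_API_KEY')
use_api_key = False

if credentials_path and os.path.exists(credentials_path):
    # Store an absolute path so the credentials are found even if the working directory changes
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = str(Path(credentials_path).resolve())
    print(f"✅ Using service account credentials from: {credentials_path}")
elif api_key:
    use_api_key = True
    print("✅ Using API key for authentication")
else:
    if credentials_path:
        print(f"⚠️  Credentials file not found: {credentials_path}")
    print("⚠️  Warning: No credentials found. Please set up authentication.")
    print("   Add GOOGLE_APPLICATION_CREDENTIALS or GOOGLE_CLOUD_API_KEY to your .env file")

def initialize_tts_client(api_key=None):
    """
    Initialize a Google Cloud Text-to-Speech client.
    
    Args:
        api_key: Optional API key. When None, Application Default Credentials
            (e.g. the GOOGLE_APPLICATION_CREDENTIALS service account) are used.
    
    Returns:
        texttospeech.TextToSpeechClient, or None if initialization fails
    """
    try:
        if api_key:
            # The client does not read GOOGLE_CLOUD_API_KEY on its own; pass the key explicitly
            client = texttospeech.TextToSpeechClient(client_options={"api_key": api_key})
        else:
            client = texttospeech.TextToSpeechClient()
        print("✅ Text-to-Speech client initialized successfully")
        return client
    except Exception as e:
        print(f"❌ Failed to initialize client: {e}")
        print("   Please check your credentials and API enablement")
        return None

# Initialize the client (the API key is used only when no service account file was found)
tts_client = initialize_tts_client(api_key=api_key if use_api_key else None)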

## 3. Available Voices

List available voices and select appropriate ones for the AI Oracle character.

In [None]:
def list_available_voices(client, language_code='en-US'):
    """
    List available voices for a specific language.
    
    Args:
        client: Text-to-Speech client
        language_code: Language code (e.g., 'en-US', 'en-GB')
    """
    try:
        voices = client.list_voices(language_code=language_code)
        
        print(f"\n🎤 Available {language_code} voices:\n")
        
        for voice in voices.voices:
            # Get voice characteristics
            name = voice.name
            gender = texttospeech.SsmlVoiceGender(voice.ssml_gender).name
            
            # Show premium voices (WaveNet, Neural2, Studio)
            if 'Wavenet' in name or 'Neural2' in name or 'Studio' in name:
                print(f"  • {name:30} ({gender})")
        
        print("\n💡 Recommended voices for AI Oracle (availability changes; confirm against the list above):")
        print("  • en-US-Neural2-A (Male) - Clear and authoritative")
        print("  • en-US-Neural2-D (Male) - Deep and mysterious")
        print("  • en-US-Neural2-J (Male) - Calm and measured")
        print("  • en-GB-Neural2-B (Male) - British, sophisticated")
        print("  • en-US-Studio-M (Male) - Premium quality, dramatic")
        print("  • en-US-Studio-O (Female) - Premium quality, enigmatic")
        
    except Exception as e:
        print(f"❌ Error listing voices: {e}")

if tts_client:
    list_available_voices(tts_client)
else:
    print("⚠️  Skipping voice listing - client not initialized")
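
The listing above works because each voice's tier appears inside its name. To group voices by tier, which maps onto the pricing tiers in section 9, a small helper can parse that component. This is a sketch that assumes names follow the `<language>-<region>-<tier>-<variant>` pattern seen in `en-US-Neural2-D`. Names that don't match, such as newer voice families, fall into `Other`.

```python
from collections import defaultdict

# Assumed mapping from the tier component of a voice name to a pricing tier label
KNOWN_TIERS = {"Standard": "Standard", "Wavenet": "WaveNet", "Neural2": "Neural2", "Studio": "Studio"}


def voice_tier(voice_name):
    """Guess a voice's tier from its name, e.g. 'en-US-Neural2-D' -> 'Neural2'."""
    parts = voice_name.split("-")
    # Assumption: the tier is the third hyphen-separated component of a 4-part name
    tier = parts[2] if len(parts) >= 4 else ""
    return KNOWN_TIERS.get(tier, "Other")


def group_by_tier(voice_names):
    """Group voice names into a dict of tier -> list of names, preserving input order."""
    groups = defaultdict(list)
    for name in voice_names:
        groups[voice_tier(name)].append(name)
    return dict(groups)


print(group_by_tier(["en-US-Neural2-D", "en-US-Studio-O", "en-US-Wavenet-A"]))
```

With a live client, you could call it as `group_by_tier([v.name for v in tts_client.list_voices(language_code='en-US').voices])`.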

## 4. Basic Text-to-Speech

Generate simple speech without SSML markup.

In [None]:
def generate_speech(client, text, voice_name='en-US-Neural2-D', output_file='output.mp3'):
    """
    Generate speech from plain text.
    
    Args:
        client: Text-to-Speech client
        text: Text to convert to speech
        voice_name: Voice to use
        output_file: Output filename
    
    Returns:
        str: Path to generated audio file
    """
    try:
        # Set the text input
        synthesis_input = texttospeech.SynthesisInput(text=text)
        
        # Build the voice request
        voice = texttospeech.VoiceSelectionParams(
            language_code='-'.join(voice_name.split('-')[:2]),  # 'en-US-Neural2-D' -> 'en-US'
            name=voice_name
        )
        
        # Select the audio file type
        audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=1.0,  # Normal speed
            pitch=0.0           # Normal pitch
        )
        
        # Perform the text-to-speech request
        response = client.synthesize_speech(
            input=synthesis_input,
            voice=voice,
            audio_config=audio_config
        )
        
        # Create output directory if it doesn't exist
        output_path = Path('2_Voiceover_Vault') / output_file
        output_path.parent.mkdir(parents=True, exist_ok=True)
        
        # Write the response to the output file
        with open(output_path, 'wb') as out:
            out.write(response.audio_content)
        
        print(f"✅ Audio content written to: {output_path}")
        return str(output_path)
    
    except Exception as e:
        print(f"❌ Error generating speech: {e}")
        return None

# Example: Generate basic speech
if tts_client:
    example_text = "Welcome to the AI Oracle. I am your guide through the digital realm."
    audio_file = generate_speech(
        tts_client,
        example_text,
        voice_name='en-US-Neural2-D',
        output_file='basic_example.mp3'
    )
    
    if audio_file and os.path.exists(audio_file):
        display(Markdown("### 🔊 Listen to the generated audio:"))
        display(Audio(audio_file))
else:
    print("⚠️  Skipping basic TTS - client not initialized")

## 5. SSML Basics

Learn SSML (Speech Synthesis Markup Language) for advanced speech control.

### Common SSML Tags:
- `<speak>` - Root element (required)
- `<break>` - Add pauses
- `<emphasis>` - Emphasize words
- `<prosody>` - Control rate, pitch, volume
- `<say-as>` - Control how numbers, dates, and acronyms are read
- `<voice>` - Switch to a different named voice within the same request
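
Hand-writing these tags gets error-prone in long scripts, especially because characters like `&` and `<` must be escaped inside SSML. One option is a handful of small string helpers. The names below (`plain`, `pause`, `emphasis`, `prosody`, `speak`) are hypothetical helpers for this notebook and are not part of the `google-cloud-texttospeech` library. `plain` escapes text, and the other helpers expect fragments that are already valid SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape plain text so characters like & and < are valid inside SSML."""
    return escape(text)


def pause(ms):
    """A <break> of the given length in milliseconds."""
    return f'<break time="{int(ms)}ms"/>'


def emphasis(fragment, level="moderate"):
    """Wrap an SSML fragment in <emphasis> (levels: strong, moderate, reduced, none)."""
    return f"<emphasis level={quoteattr(level)}>{fragment}</emphasis>"


def prosody(fragment, **attrs):
    """Wrap an SSML fragment in <prosody>, e.g. prosody(x, rate="slow", pitch="-2st")."""
    attr_str = " ".join(f"{name}={quoteattr(str(value))}" for name, value in attrs.items())
    return f"<prosody {attr_str}>{fragment}</prosody>"


def speak(*fragments):
    """Join fragments with spaces and wrap them in the required <speak> root."""
    return "<speak>" + " ".join(fragments) + "</speak>"


ssml = speak(
    plain("Welcome to the AI Oracle."),
    pause(500),
    "I am",
    emphasis(plain("your guide"), level="strong"),
    plain("through the digital realm."),
)
print(ssml)
```

You can pass the resulting string straight to `generate_ssml_speech` below.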

In [None]:
def generate_ssml_speech(client, ssml_text, voice_name='en-US-Neural2-D', 
                        output_file='ssml_output.mp3', speaking_rate=1.0, pitch=0.0):
    """
    Generate speech from SSML markup.
    
    Args:
        client: Text-to-Speech client
        ssml_text: SSML markup text
        voice_name: Voice to use
        output_file: Output filename
        speaking_rate: Speed multiplier (0.25 to 4.0; 1.0 is normal)
        pitch: Pitch shift in semitones (-20.0 to 20.0)
    
    Returns:
        str: Path to generated audio file
    """
    try:
        # Ensure SSML has a <speak> root (also accepts <speak ...> with attributes)
        if not ssml_text.strip().startswith('<speak'):
            ssml_text = f'<speak>{ssml_text}</speak>'
        
        # Set the SSML input
        synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)
        
        # Build the voice request
        voice = texttospeech.VoiceSelectionParams(
            language_code='-'.join(voice_name.split('-')[:2]),  # 'en-US-Neural2-D' -> 'en-US'
            name=voice_name
        )
        
        # Select the audio file type and configuration
        audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=speaking_rate,
            pitch=pitch
        )
        
        # Perform the text-to-speech request
        response = client.synthesize_speech(
            input=synthesis_input,
            voice=voice,
            audio_config=audio_config
        )
        
        # Create output directory
        output_path = Path('2_Voiceover_Vault') / output_file
        output_path.parent.mkdir(parents=True, exist_ok=True)
        
        # Write the response to file
        with open(output_path, 'wb') as out:
            out.write(response.audio_content)
        
        print(f"✅ Audio content written to: {output_path}")
        return str(output_path)
    
    except Exception as e:
        print(f"❌ Error generating SSML speech: {e}")
        return None

# Example: Basic SSML with pauses and emphasis
basic_ssml = """
<speak>
    Welcome to the AI Oracle.
    <break time="500ms"/>
    I am <emphasis level="strong">your guide</emphasis> through the digital realm.
    <break time="800ms"/>
    The truth awaits.
</speak>
"""

if tts_client:
    print("SSML Example:")
    print(basic_ssml)
    audio_file = generate_ssml_speech(
        tts_client,
        basic_ssml,
        output_file='ssml_basic_example.mp3'
    )
    
    if audio_file and os.path.exists(audio_file):
        display(Markdown("### 🔊 Listen to the SSML example:"))
        display(Audio(audio_file))
else:
    print("⚠️  Skipping SSML example - client not initialized")
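
Section 9 recommends validating SSML before generation. A malformed document, such as one with an unescaped `&` or an unclosed tag, makes the request fail, so a local check can catch problems before you spend a request. This sketch uses Python's standard `xml.etree.ElementTree` parser to check well-formedness. It also checks for a `<speak>` root and compares the encoded size against the API's 5,000-byte per-request input limit. It only checks XML structure and does not know which tags or attribute values Google actually supports.

```python
import xml.etree.ElementTree as ET

MAX_INPUT_BYTES = 5000  # per-request input limit for text or SSML


def validate_ssml(ssml_text):
    """Return a list of problems found in an SSML string; an empty list means it passed."""
    try:
        root = ET.fromstring(ssml_text.strip())
    except ET.ParseError as exc:
        return [f"Not well-formed XML: {exc}"]

    problems = []
    # Drop any "{namespace}" prefix so <speak xmlns="..."> is also accepted
    tag = root.tag.split("}")[-1]
    if tag != "speak":
        problems.append(f"Root element is <{tag}>, expected <speak>")

    size = len(ssml_text.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        problems.append(f"Input is {size} bytes; the limit is {MAX_INPUT_BYTES}")
    return problems


print(validate_ssml('<speak>Hello <break time="500ms"/> world.</speak>'))  # []
print(validate_ssml("<speak>R&D</speak>"))
```

For example, `validate_ssml(basic_ssml)` should return an empty list before you call `generate_ssml_speech`.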

## 6. Advanced SSML for AI Oracle

Create dramatic and mysterious voiceovers with advanced SSML techniques.

In [None]:
# Example 1: Mysterious Oracle Message with varied pacing
oracle_message_ssml = """
<speak>
    <prosody rate="slow" pitch="-2st">
        Greetings, seeker of truth.
    </prosody>
    <break time="1s"/>
    
    <prosody rate="medium">
        You have stumbled upon a transmission from
        <emphasis level="strong">beyond the veil</emphasis>.
    </prosody>
    <break time="800ms"/>
    
    <prosody rate="85%" pitch="-3st">
        The code you seek lies hidden in plain sight.
    </prosody>
    <break time="600ms"/>
    
    <prosody rate="fast" volume="soft">
        Look closer at the numbers.
    </prosody>
    <break time="1.2s"/>
    
    <prosody rate="x-slow" pitch="-4st">
        The Oracle has spoken.
    </prosody>
</speak>
"""

# Example 2: Technical explanation with controlled pacing
technical_ssml = """
<speak>
    In this video, we explore the concept of
    <say-as interpret-as="characters">AI</say-as>
    <break time="300ms"/>
    and its implications for reality itself.
    <break time="500ms"/>
    
    The year is
    <say-as interpret-as="date" format="y">2024</say-as>.
    <break time="400ms"/>
    
    <prosody rate="90%">
        We stand at the threshold of a new era,
        where the boundaries between
        <emphasis level="moderate">simulation</emphasis>
        and reality begin to blur.
    </prosody>
</speak>
"""

# Example 3: Dramatic reveal with building tension
dramatic_reveal_ssml = """
<speak>
    <prosody rate="slow" volume="soft">
        For weeks, you've followed the clues.
    </prosody>
    <break time="700ms"/>
    
    <prosody rate="medium" volume="medium">
        The patterns in the data.
        <break time="400ms"/>
        The hidden messages.
        <break time="400ms"/>
        The impossible coincidences.
    </prosody>
    <break time="1s"/>
    
    <prosody rate="95%" volume="loud">
        And now, <emphasis level="strong">finally</emphasis>,
        <break time="600ms"/>
        the truth reveals itself.
    </prosody>
    <break time="1.5s"/>
    
    <prosody rate="x-slow" pitch="-5st" volume="medium">
        We are living in a simulation.
    </prosody>
</speak>
"""

print("🎭 Advanced SSML Examples for AI Oracle")
print("\n1. Mysterious Oracle Message:")
print(oracle_message_ssml)

# Generate the oracle message
if tts_client:
    audio_file = generate_ssml_speech(
        tts_client,
        oracle_message_ssml,
        voice_name='en-US-Neural2-D',
        output_file='oracle_message.mp3',
        pitch=-2.0  # Lower pitch for more gravitas
    )
    
    if audio_file and os.path.exists(audio_file):
        display(Markdown("### 🔊 Oracle Message:"))
        display(Audio(audio_file))
else:
    print("⚠️  Skipping advanced SSML - client not initialized")
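
To fit narration to video timing, you need to know how much silence the explicit pauses add. A rough sketch is to sum the `time` values of every `<break>` tag with a regular expression. It ignores `strength`-only breaks. It also says nothing about how long the spoken words take, which depends on the voice and on `speaking_rate`.

```python
import re

# Matches <break ... time="500ms"/> or <break ... time="1.2s"/> and captures value and unit
BREAK_TIME = re.compile(r'<break\b[^>]*\btime="(\d+(?:\.\d+)?)(ms|s)"')


def total_break_seconds(ssml_text):
    """Sum the explicit <break time="..."/> pauses in an SSML string, in seconds."""
    total = 0.0
    for value, unit in BREAK_TIME.findall(ssml_text):
        total += float(value) / 1000 if unit == "ms" else float(value)
    return total


sample = '<speak>Wait.<break time="500ms"/>Listen.<break time="1.2s"/>Now.<break strength="weak"/></speak>'
print(f"{total_break_seconds(sample):.1f} s of explicit pauses")
```

`total_break_seconds(oracle_message_ssml)` gives about 3.6 seconds for the Oracle message above.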

## 7. Batch Audio Generation

Generate multiple audio files for a complete video script.

In [None]:
def generate_script_audio(client, script_segments, voice_name='en-US-Neural2-D', 
                         output_prefix='segment'):
    """
    Generate audio files for multiple script segments.
    
    Args:
        client: Text-to-Speech client
        script_segments: List of tuples (segment_name, ssml_text)
        voice_name: Voice to use
        output_prefix: Prefix for output files
    
    Returns:
        list: Paths to generated audio files
    """
    audio_files = []
    
    for idx, (segment_name, ssml_text) in enumerate(script_segments, 1):
        print(f"\n🎙️  Generating segment {idx}: {segment_name}")
        
        output_file = f"{output_prefix}_{idx:02d}_{segment_name.replace(' ', '_')}.mp3"
        audio_path = generate_ssml_speech(
            client,
            ssml_text,
            voice_name=voice_name,
            output_file=output_file
        )
        
        if audio_path:
            audio_files.append(audio_path)
    
    print(f"\n✅ Generated {len(audio_files)} audio segments")
    return audio_files

# Example: Full video script with multiple segments
video_script = [
    ("Intro", """
        <speak>
            <prosody rate="slow" pitch="-2st">
                Welcome back, truth seekers.
            </prosody>
        </speak>
    """),
    
    ("Hook", """
        <speak>
            <prosody rate="medium">
                Today, we uncover evidence that will
                <emphasis level="strong">change everything</emphasis>
                you thought you knew.
            </prosody>
        </speak>
    """),
    
    ("Main_Content", """
        <speak>
            The patterns are undeniable.
            <break time="500ms"/>
            In data set after data set,
            <break time="300ms"/>
            we see the same anomalies repeating.
        </speak>
    """),
    
    ("Conclusion", """
        <speak>
            <prosody rate="slow">
                The Oracle will return with more revelations.
                <break time="800ms"/>
                Until then, keep searching.
            </prosody>
        </speak>
    """)
]

print("📝 Example Video Script Segments:")
for name, _ in video_script:
    print(f"  • {name}")

print("\n⚠️  Uncomment below to generate all segments:")
# if tts_client:
#     audio_files = generate_script_audio(
#         tts_client,
#         video_script,
#         voice_name='en-US-Neural2-D',
#         output_prefix='oracle_episode_01'
#     )
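
`generate_script_audio` assumes each segment already fits in a single request. For a long plain-text script, one approach is to split it at sentence boundaries into `<speak>` chunks that each stay under the 5,000-byte input limit, then pass those chunks in as segments. `split_for_tts` is a hypothetical helper. Its sentence splitting is a simple regex on `.`, `!` and `?`, so abbreviations like "Dr." also cause a split.

```python
import re
from xml.sax.saxutils import escape

MAX_INPUT_BYTES = 5000  # per-request input limit for text or SSML


def split_for_tts(text, max_bytes=MAX_INPUT_BYTES):
    """Split plain text at sentence boundaries into <speak> chunks of at most max_bytes each."""
    open_tag, close_tag = "<speak>", "</speak>"
    budget = max_bytes - len(open_tag) - len(close_tag)  # bytes left for the content

    # Split after sentence-ending punctuation, escaping each sentence for SSML
    sentences = [escape(s) for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= budget:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(sentence.encode("utf-8")) > budget:
            raise ValueError("A single sentence exceeds the byte budget; split it manually")
        current = sentence
    if current:
        chunks.append(current)
    return [f"{open_tag}{chunk}{close_tag}" for chunk in chunks]


print(split_for_tts("One. Two! Three?", max_bytes=25))
```

To feed the chunks into the batch function, build segments with `[(f"Part_{i}", chunk) for i, chunk in enumerate(split_for_tts(long_text), 1)]`, where `long_text` stands for your own script.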

## 8. Voice Effects & Customization

Experiment with different voice parameters for various effects.

In [None]:
# Voice effect presets for different moods
VOICE_PRESETS = {
    'mysterious': {
        'speaking_rate': 0.85,
        'pitch': -3.0,
        'description': 'Slow, low-pitched for mystery and suspense'
    },
    'urgent': {
        'speaking_rate': 1.15,
        'pitch': 1.0,
        'description': 'Faster pace for urgent messages'
    },
    'dramatic': {
        'speaking_rate': 0.75,
        'pitch': -4.0,
        'description': 'Very slow and deep for dramatic reveals'
    },
    'neutral': {
        'speaking_rate': 1.0,
        'pitch': 0.0,
        'description': 'Normal speaking voice'
    },
    'whisper': {
        'speaking_rate': 0.9,
        'pitch': -1.0,
        'description': 'Slightly slower and lower; add <prosody volume="soft"> in the SSML to make it quieter'
    }
}

def generate_with_preset(client, text, preset='mysterious', voice_name='en-US-Neural2-D',
                        output_file='preset_output.mp3'):
    """
    Generate speech with a voice preset.
    
    Args:
        client: Text-to-Speech client
        text: Text or SSML to convert
        preset: Preset name from VOICE_PRESETS
        voice_name: Voice to use
        output_file: Output filename
    """
    if preset not in VOICE_PRESETS:
        print(f"❌ Unknown preset: {preset}")
        print(f"Available presets: {', '.join(VOICE_PRESETS.keys())}")
        return None
    
    settings = VOICE_PRESETS[preset]
    print(f"🎚️  Using preset '{preset}': {settings['description']}")
    
    return generate_ssml_speech(
        client,
        text,
        voice_name=voice_name,
        output_file=output_file,
        speaking_rate=settings['speaking_rate'],
        pitch=settings['pitch']
    )

# Display available presets
print("🎚️  Available Voice Presets for AI Oracle:\n")
for preset_name, settings in VOICE_PRESETS.items():
    print(f"  {preset_name:12} - {settings['description']}")
    print(f"               Rate: {settings['speaking_rate']}, Pitch: {settings['pitch']}")
    print()

# Example: Generate same text with different presets
test_text = "<speak>The truth is closer than you think.</speak>"

if tts_client:
    print("\n🎭 Generating same text with 'mysterious' and 'dramatic' presets...\n")
    
    # Mysterious version
    audio1 = generate_with_preset(
        tts_client,
        test_text,
        preset='mysterious',
        output_file='preset_mysterious.mp3'
    )
    
    # Dramatic version
    audio2 = generate_with_preset(
        tts_client,
        test_text,
        preset='dramatic',
        output_file='preset_dramatic.mp3'
    )
    
    if audio1 and os.path.exists(audio1):
        display(Markdown("### 🔊 Mysterious Preset:"))
        display(Audio(audio1))
    
    if audio2 and os.path.exists(audio2):
        display(Markdown("### 🔊 Dramatic Preset:"))
        display(Audio(audio2))
else:
    print("⚠️  Skipping preset examples - client not initialized")
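
The presets only set `speaking_rate` and `pitch` in `AudioConfig`, so a preset such as `whisper` does not by itself make the voice quieter. One way to add the missing volume change is to wrap the SSML body in `<prosody volume="...">`. `AudioConfig` also has a `volume_gain_db` field, but `generate_ssml_speech` does not currently expose it. `with_volume` below is a hypothetical string helper. It assumes the input is either bare SSML content or a single `<speak>...</speak>` document without attributes.

```python
import re


def with_volume(ssml_text, volume="soft"):
    """Wrap the body of a <speak> document (or bare SSML content) in <prosody volume="...">."""
    text = ssml_text.strip()
    # Reuse the existing <speak> root if there is one, so it is not nested twice
    match = re.fullmatch(r"<speak>(.*)</speak>", text, flags=re.DOTALL)
    body = match.group(1) if match else text
    return f'<speak><prosody volume="{volume}">{body}</prosody></speak>'


print(with_volume("<speak>The truth is closer than you think.</speak>"))
```

For example: `generate_with_preset(tts_client, with_volume(test_text), preset='whisper', output_file='preset_whisper.mp3')`.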

## 9. Best Practices & Tips

Guidelines for creating effective voiceovers for AI Oracle content.

In [None]:
display(Markdown("""
### 🎯 Best Practices for AI Oracle Voiceovers

#### Voice Selection:
- **Neural2 voices**: High quality, natural-sounding (recommended)
- **Studio voices**: Premium quality, best for main content
- **Consistency**: Use the same voice for the Oracle character across videos

#### SSML Tips:
1. **Pacing**: Use `<break>` tags for dramatic pauses (500ms - 1500ms)
2. **Emphasis**: Use `<emphasis>` sparingly for impact
3. **Pitch**: Lower pitch (-2 to -5st) for mysterious/authoritative tone
4. **Rate**: Slower rate (0.8 - 0.9) for gravitas, faster (1.1 - 1.2) for urgency

#### Cost Management (list prices at time of writing; check current pricing):
- Standard voices: $4 per 1M characters
- WaveNet voices: $16 per 1M characters
- Neural2 voices: $16 per 1M characters
- Studio voices: $160 per 1M characters
- **Tip**: Use Neural2 for best quality/cost ratio

#### Error Handling:
- Validate SSML (well-formed XML with a `<speak>` root) before generation
- Test audio output before using in production
- Keep backup of working SSML templates
- Monitor API quotas and costs

#### Workflow Integration:
1. Write script in Google Sheets
2. Add SSML markup for emphasis and pacing
3. Generate audio using this notebook
4. Review and adjust as needed
5. Export to `2_Voiceover_Vault/` directory
6. Use in video production pipeline

#### Quality Checks:
- ✅ Listen to full audio before use
- ✅ Check for mispronunciations
- ✅ Verify pacing and pauses
- ✅ Ensure consistent volume levels
- ✅ Test with video timing
"""))

print("\n💡 Pro Tips:")
print("  • Save successful SSML templates for reuse")
print("  • Create a voice style guide for consistency")
print("  • Test on different devices/speakers")
print("  • Keep each request under the API's 5,000-byte input limit (SSML tags count toward it)")
print("  • Use batch generation for efficiency")
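
Before generating a whole episode, you can turn the cost table into a quick estimate by multiplying the character count by the per-million price. This sketch copies the prices from the table above. It assumes every character you send is billed, including SSML tags and whitespace, and it ignores the monthly free tier. Treat the result as a rough upper bound and confirm it against the current pricing page.

```python
# USD per 1 million characters, copied from the table above (prices change over time)
PRICE_PER_MILLION_CHARS = {
    "Standard": 4.0,
    "WaveNet": 16.0,
    "Neural2": 16.0,
    "Studio": 160.0,
}


def estimate_cost(texts, tier="Neural2"):
    """Return (total_characters, estimated_usd) for a list of text or SSML strings."""
    if tier not in PRICE_PER_MILLION_CHARS:
        raise ValueError(f"Unknown tier {tier!r}; choose from {sorted(PRICE_PER_MILLION_CHARS)}")
    total_chars = sum(len(text) for text in texts)
    return total_chars, total_chars / 1_000_000 * PRICE_PER_MILLION_CHARS[tier]


chars, usd = estimate_cost(["<speak>Welcome back, truth seekers.</speak>"] * 1000)
print(f"{chars} characters ≈ ${usd:.2f}")
```

For the batch example, `estimate_cost([ssml for _, ssml in video_script])` covers all four segments. The indentation inside those triple-quoted strings counts too.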

## Summary

This notebook demonstrated:
- ✅ Google Cloud Text-to-Speech API setup and authentication
- ✅ Voice selection and listing available voices
- ✅ Basic text-to-speech generation
- ✅ SSML markup for advanced speech control
- ✅ Voice presets for different moods
- ✅ Batch audio generation for scripts
- ✅ Best practices and cost management

### Next Steps:
1. Set up Google Cloud project and enable Text-to-Speech API
2. Create credentials (API key or service account)
3. Add credentials to your `.env` file
4. Test basic speech generation
5. Experiment with SSML markup
6. Create voice presets for your Oracle character
7. Integrate with your video production workflow

### Resources:
- [Google Cloud Text-to-Speech Documentation](https://cloud.google.com/text-to-speech/docs)
- [SSML Reference](https://cloud.google.com/text-to-speech/docs/ssml)
- [Voice Gallery](https://cloud.google.com/text-to-speech/docs/voices)
- [Pricing](https://cloud.google.com/text-to-speech/pricing)

### Integration with AI Oracle Workflow:
1. **Script Development**: Use `01_Sheets_Integration.ipynb` to manage scripts
2. **Voice Generation**: Use this notebook to create voiceovers
3. **Video Production**: Combine with visuals in `5_Video_Production/`
4. **ARG Elements**: Embed hidden messages in audio metadata
5. **Analytics**: Track performance in Google Sheets

Happy creating! 🎙️✨