[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IvaroEkel/AI-Spielplatz/blob/main/Tutorials/Text_To_Speech_TTS/Text_to_Speech_OpenAI_Tutorial.ipynb)

# Text-to-Speech (TTS) with OpenAI API

## Introduction
Text-to-Speech (TTS) technology converts written text into spoken words. It is used in a variety of applications, including:
- Virtual assistants
- Accessibility tools for visually impaired individuals
- Audiobook creation
- Multilingual voice applications

OpenAI provides a powerful TTS API that enables developers to create high-quality, natural-sounding speech from text.

## How Does TTS Work?

TTS systems typically involve the following components:
1. **Text Analysis**: Breaks text into smaller linguistic units (e.g., sentences, words).
2. **Phoneme Conversion**: Converts text into phonemes, the basic sound units of speech.
3. **Waveform Generation**: Synthesizes audio waveforms for the phonemes using a speech model.

Modern TTS models, like those used by OpenAI, leverage deep learning techniques such as transformer architectures to generate highly natural speech.
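
To make the three stages above concrete, here is a deliberately tiny sketch of a classical pipeline. It is not how OpenAI's models work internally. The lexicon, the pitch table, and the function names (`analyze_text`, `to_phonemes`, `synthesize`) are all invented for illustration, and the "waveform" is just a sine tone per phoneme.

```python
import math
import re

# Hypothetical, hand-written pronunciation table; real systems use large
# lexicons or learned grapheme-to-phoneme models.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Made-up pitch per phoneme, only so that each one sounds different.
TOY_PITCH_HZ = {"HH": 180, "AH": 220, "L": 200, "OW": 240,
                "W": 190, "ER": 210, "D": 170}

SAMPLE_RATE = 8000  # samples per second


def analyze_text(text):
    """Step 1 (text analysis): split raw text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_phonemes(words):
    """Step 2 (phoneme conversion): look up each word; unknown words are skipped."""
    phonemes = []
    for word in words:
        phonemes.extend(TOY_LEXICON.get(word, []))
    return phonemes


def synthesize(phonemes, samples_per_phoneme=800):
    """Step 3 (waveform generation): one sine tone per phoneme, samples in [-1, 1]."""
    samples = []
    for phoneme in phonemes:
        freq = TOY_PITCH_HZ[phoneme]
        samples.extend(
            math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(samples_per_phoneme)
        )
    return samples


words = analyze_text("Hello, world!")
phonemes = to_phonemes(words)
audio = synthesize(phonemes)
print(words)
print(phonemes)
print(len(audio), "samples")
```

Neural TTS models replace the lookup tables and sine tones with learned components, but the overall flow from text to sound units to audio samples is the same idea.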

## Prerequisites

1. **Install the OpenAI Python Library**:
   ```bash
   pip install openai
   ```
2. **API Key**: Obtain an API key from the [OpenAI Platform](https://platform.openai.com/). The workshop team is happy to provide one, of course :)
3. **Audio Output Library** (Optional): Install `pydub` for audio playback. Note that `pydub` relies on `ffmpeg` (or `libav`) to decode MP3 files.
   ```bash
   pip install pydub
   ```

In [None]:
import os
from openai import OpenAI

# Set up the API client.
# Prefer reading the key from the OPENAI_API_KEY environment variable;
# the placeholder string is only a fallback for the shared workshop key.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "our-shared-api-key-here"))


In [None]:
def generate_speech(text, voice="alloy", output_file="output.mp3"):
    """
    Generates speech from text using OpenAI's TTS API.
    :param text: The text to convert to speech
    :param voice: The voice to use (default: "alloy")
    :param output_file: File to save the generated audio
    :return: Path to the generated audio file, or None if the request failed
    """
    try:
        # Send the request to the OpenAI TTS API and stream the audio to disk
        with client.audio.speech.with_streaming_response.create(
            model="tts-1",
            voice=voice,
            input=text
        ) as response:
            response.stream_to_file(output_file)

        return output_file
    except Exception as e:
        # Report the failure and signal it to the caller with None
        print(f"Error generating speech: {e}")
        return None

# Example usage
text_input = "Hello, this is a demonstration of OpenAI's Text-to-Speech capabilities."
output_path = generate_speech(text_input)
if output_path:
    print(f"Audio saved to '{output_path}'")

## Challenges and Best Practices

### Challenges:
- **Latency**: Real-time speech generation can be slow for long texts.
- **Customization**: Limited options for voice customization compared to some dedicated TTS tools.
- **Cost**: Heavy API use can get expensive. OpenAI charges about $15 per 1M tokens of TTS input (check the OpenAI pricing page for the current rate and billing unit), but 1M tokens is a lot of text. As a rule of thumb, a token averages 4 characters and a word about 6 characters, so 1M tokens is roughly 4M characters, ~670,000 words, or ~1,300 book pages. Take-away: be careful anyway :)
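
The cost rule of thumb above can be turned into a small estimator. This is a rough sketch only: the function name `estimate_tts_cost` is made up, the rate and the 4-characters-per-token ratio are taken from the estimate above, and the 500 words per page figure is an assumption chosen to match the page count given there.

```python
def estimate_tts_cost(text, usd_per_million_tokens=15.0, chars_per_token=4):
    """Return (approx_tokens, approx_cost_usd) using the characters-per-token heuristic."""
    approx_tokens = len(text) / chars_per_token
    approx_cost = approx_tokens / 1_000_000 * usd_per_million_tokens
    return approx_tokens, approx_cost


# Reproduce the back-of-the-envelope numbers for 1M tokens.
tokens = 1_000_000
chars = tokens * 4          # ~4 characters per token
words = chars / 6           # ~6 characters per word
pages = words / 500         # assumed ~500 words per book page
print(f"{words:,.0f} words, {pages:,.0f} pages")

# Estimate the cost of a concrete piece of text before sending it.
sample = "a" * 4000
sample_tokens, sample_cost = estimate_tts_cost(sample)
print(f"~{sample_tokens:.0f} tokens, ~${sample_cost:.3f}")
```

Running an estimate like this before a large batch job is a cheap way to avoid surprises on the bill.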

### Best Practices:
- Use short sentences for real-time applications to reduce latency.
- Cache audio files if the same text is used repeatedly.
- Choose the appropriate voice model for your application.
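
The first two practices can be combined: split the text into sentences and cache the audio for each one, so repeated sentences are synthesized only once. The sketch below is one possible approach, not a prescribed one. `split_sentences`, `cached_speech`, and the `synthesize` callable are hypothetical names, and a fake synthesizer stands in for the real API call so the example runs offline.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def cached_speech(sentence, voice, synthesize, cache_dir="tts_cache"):
    """Return the audio path for (voice, sentence); synthesize only on a cache miss."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # The cache key covers both voice and text, so changing either re-synthesizes.
    key = hashlib.sha256(f"{voice}\n{sentence}".encode("utf-8")).hexdigest()
    path = cache / f"{key}.mp3"
    if not path.exists():
        path.write_bytes(synthesize(sentence, voice))
    return path


# Stand-in for the real API call (assumption: it returns raw audio bytes).
calls = []

def fake_synthesize(sentence, voice):
    calls.append(sentence)
    return b"fake-audio:" + sentence.encode("utf-8")


text = "Hello there. How are you? Hello there."
with tempfile.TemporaryDirectory() as tmp:
    paths = [
        cached_speech(s, "alloy", fake_synthesize, cache_dir=tmp)
        for s in split_sentences(text)
    ]
    print(len(paths), "sentences,", len(calls), "synthesis calls")
```

In a real application, `fake_synthesize` would be replaced by a wrapper around the TTS request that returns the audio bytes, and the cache directory would be persistent rather than temporary.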


## Summary

- TTS technology is essential for accessibility and interactive applications.
- OpenAI's TTS API provides a simple yet powerful way to convert text into high-quality speech.
- By following this guide, you can integrate TTS functionality into your Python applications with ease.
