This Python notebook uses OpenAI's APIs to demonstrate **Text-to-Speech (TTS)** and **Speech-to-Text (STT)**: turning text into spoken audio and transcribing audio back into text. The key features include:

**Text-to-Speech (TTS):**

- Converts input text into lifelike speech using OpenAI's TTS API.
- Saves the generated audio as an MP3 file and plays it directly in the notebook.

**Speech-to-Text (STT):**

- Transcribes audio files into text with OpenAI's Whisper model (`whisper-1`) through the v1 Python SDK's `client.audio.transcriptions.create`.
- Lets users upload an MP3 file, process it, and display the transcription in a dynamic text box.

**Interactive Widgets:**

- Uses ipywidgets to build a simple interface for file uploads and for triggering transcription.
- Displays feedback and outputs directly in the notebook.

**API Integration:**

- Uses the OpenAI Python SDK (v1+) in both directions: `client.audio.speech.create` for TTS and `client.audio.transcriptions.create` for Whisper.

This notebook is a hands-on starting point for experimenting with text-to-audio and audio-to-text conversion, for example in accessibility features or voice-driven interactive applications.

In [None]:
# Required installations (%pip installs into this kernel's environment; in a terminal, run `pip install openai python-dotenv` instead):
%pip install openai python-dotenv

In [None]:
from pathlib import Path
from openai import OpenAI
from dotenv import load_dotenv
import os
from getpass import getpass

# Load environment variables from a .env file
load_dotenv()

False
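
`load_dotenv()` returns `True` when it finds a `.env` file that defines at least one variable, so the `False` above means no usable `.env` was found and the API key is requested in the next cell. To show roughly what that step does, here is a stdlib-only sketch. `parse_env_lines` and `load_env_sketch` are hypothetical helpers that handle only simple `KEY=VALUE` lines; they are not python-dotenv's parser, which also supports `export` prefixes, multiline values and variable expansion.

```python
import os

def parse_env_lines(lines):
    """Hypothetical minimal .env parser: one KEY=VALUE per line, '#' starts a comment line."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank, comment and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values

def load_env_sketch(lines, environ=None, override=False):
    """Copy parsed values into `environ` (os.environ by default), keeping existing
    keys unless override=True, which mirrors load_dotenv's default behaviour.
    Returns True if the lines defined at least one variable."""
    environ = os.environ if environ is None else environ
    values = parse_env_lines(lines)
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value
    return bool(values)

# Example with a plain dict instead of the real environment
env = {"OPENAI_API_KEY": "already-set"}
load_env_sketch(["# secrets", "OPENAI_API_KEY='sk-from-file'"], environ=env)
print(env["OPENAI_API_KEY"])  # prints "already-set": existing values win by default
```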

In [None]:
# Check if OPENAI_API_KEY is already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    # Prompt the user for the API key securely
    api_key = getpass("Please enter your OpenAI API key: ")

    # Set the environment variable
    os.environ["OPENAI_API_KEY"] = api_key
    print("OPENAI_API_KEY has been set.")
else:
    print("OPENAI_API_KEY is already set.")

OPENAI_API_KEY has been set.
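
The cell above prompts only when `OPENAI_API_KEY` is absent. Wrapping that logic in a function that takes the environment mapping and the prompt function as arguments makes it testable without a terminal. `resolve_api_key` is a hypothetical refactoring, not part of the OpenAI SDK. One deliberate difference: it treats an empty or whitespace-only value as missing, whereas `"OPENAI_API_KEY" not in os.environ` accepts an empty string.

```python
import os
from getpass import getpass

def resolve_api_key(environ=os.environ, prompt=getpass, var="OPENAI_API_KEY"):
    """Return the API key from `environ`, prompting for it (and storing it) only if missing.

    Hypothetical helper: `environ` and `prompt` are injectable so the logic can be
    exercised with a plain dict and a fake prompt.
    """
    key = environ.get(var, "").strip()
    if key:
        return key  # already configured, no prompt needed
    key = prompt("Please enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError(f"{var} was not provided")
    environ[var] = key  # make it visible to OpenAI(), which reads OPENAI_API_KEY
    return key

# Example with a fake prompt, so no real key is needed
env = {}
resolve_api_key(environ=env, prompt=lambda msg: "  sk-example  ")
print(env["OPENAI_API_KEY"])  # prints "sk-example"
```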


In [None]:
# Initialize the OpenAI client
client = OpenAI()

In [None]:
def text_to_speech_openai(input_text, output_file_path="output_audio.mp3", voice="alloy", model="gpt-4o-mini-tts"):
    """
    Convert text to speech with an OpenAI TTS model and save the result as an audio file.
    :param input_text: The text to be converted into speech.
    :param output_file_path: The path where the audio file will be saved. Default is "output_audio.mp3".
    :param voice: The voice to use for the TTS. Default is "alloy".
    :param model: The TTS model to use, e.g. "gpt-4o-mini-tts", "tts-1" or "tts-1-hd". Default is "gpt-4o-mini-tts".
    :return: The path to the saved audio file.
    """
    try:
        # Request speech (MP3 by default) as a streaming response; calling
        # stream_to_file() on a plain create() response is deprecated
        with client.audio.speech.with_streaming_response.create(
            model=model,
            voice=voice,
            input=input_text,
        ) as response:
            # Write the audio to disk as it arrives
            response.stream_to_file(output_file_path)

        return output_file_path  # Return the path to the saved audio file

    except Exception as e:
        # Keep the original exception attached for debugging
        raise RuntimeError(f"OpenAI TTS Error: {e}") from e
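
The speech endpoint returns MP3 unless told otherwise, so a path such as `speech.wav` would receive MP3 bytes under a `.wav` name. The endpoint accepts a `response_format` argument (`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`); the hypothetical helper below derives it from the output path so the two stay consistent. Passing its result as `response_format=` in the `create(...)` call is an assumed way of wiring it in, not something `text_to_speech_openai` currently does.

```python
from pathlib import Path

# Values accepted by the speech endpoint's `response_format` parameter
SUPPORTED_TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}

def response_format_for(output_file_path):
    """Hypothetical helper: map an output file's extension to a TTS `response_format`."""
    fmt = Path(output_file_path).suffix.lower().lstrip(".")
    if fmt not in SUPPORTED_TTS_FORMATS:
        raise ValueError(
            f"Cannot infer a TTS format from {output_file_path!r}; "
            f"use one of {sorted(SUPPORTED_TTS_FORMATS)} as the file extension"
        )
    return fmt

print(response_format_for("gpt4o_audio.mp3"))  # prints "mp3"
```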


In [None]:
def convert_audio_to_text(audio_file_path, model="whisper-1"):
    """
    Convert audio to text using OpenAI's Whisper API.
    :param audio_file_path: Path to the audio file to be transcribed.
    :param model: The Whisper model to use. Default is "whisper-1".
    :return: Transcription as text.
    """
    try:
        # Open the audio file in binary mode
        with open(audio_file_path, "rb") as audio_file:
            # Call Whisper API to transcribe the audio file into text
            response = client.audio.transcriptions.create(
                model=model,  # use the caller's model instead of a hard-coded one
                file=audio_file
            )
            return response.text  # Extract and return the transcribed text

    except Exception as e:
        raise RuntimeError(f"OpenAI Whisper Error: {e}") from e
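
Transcription requests are rejected server-side for unsupported formats or oversized files. A local pre-flight check gives a clearer error before any network call. The extension set and the 25 MB limit below are assumptions taken from OpenAI's audio API documentation (with "25 MB" read here as 25 MiB) and may change; `check_whisper_input` is a hypothetical helper, and calling it at the top of `convert_audio_to_text` would be one way to use it.

```python
import os
from pathlib import Path

# Assumed limits from OpenAI's audio docs; verify against the current documentation
WHISPER_EXTENSIONS = {"flac", "m4a", "mp3", "mp4", "mpeg", "mpga", "ogg", "wav", "webm"}
WHISPER_MAX_BYTES = 25 * 1024 * 1024  # "25 MB", interpreted as MiB

def check_whisper_input(audio_file_path):
    """Hypothetical pre-flight check before uploading a file for transcription."""
    path = Path(audio_file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")
    ext = path.suffix.lower().lstrip(".")
    if ext not in WHISPER_EXTENSIONS:
        raise ValueError(f"Unsupported audio format {ext!r}; expected one of {sorted(WHISPER_EXTENSIONS)}")
    size = os.path.getsize(path)
    if size > WHISPER_MAX_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; the assumed limit is {WHISPER_MAX_BYTES}")
    return path
```

For example, `check_whisper_input("gpt4o_audio.mp3")` returns the path unchanged once Example 1 has written that file.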


In [None]:
# Example 1: Convert Text to Speech
text = "Today is a great day to use OpenAI's GPT-4o-mini for text-to-speech!"
tts_output_path = "gpt4o_audio.mp3"
try:
    # Generate speech from text
    tts_audio_file = text_to_speech_openai(input_text=text, output_file_path=tts_output_path, voice="alloy", model="tts-1")
    print(f"Text-to-Speech audio saved at: {tts_audio_file}")
except Exception as e:
    print(f"Error in Text-to-Speech: {e}")

Text-to-Speech audio saved at: gpt4o_audio.mp3
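
Example 1 sends a single short sentence. The speech endpoint limits `input` to 4096 characters, so longer passages have to be split and synthesized chunk by chunk. Below is a minimal sketch of sentence-aware chunking; `chunk_text` is hypothetical, and its regex splitter (break after `.`, `!` or `?` followed by whitespace) is a simplifying assumption that will also split after abbreviations ending in a period.

```python
import re

def chunk_text(text, max_chars=4096):
    """Split text into pieces of at most max_chars, preferring sentence boundaries."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks

long_text = "Today is a great day. " * 400  # 8,800 characters
parts = chunk_text(long_text)
print(len(parts), [len(p) for p in parts])  # prints: 3 [4091, 4091, 615]
```

Each chunk could then be passed to `text_to_speech_openai` with its own output path (for example `part_0.mp3`, `part_1.mp3`); joining the resulting files into one track is not covered here.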


In [None]:
# Example 2: Convert Speech to Text
audio_input_path = "gpt4o_audio.mp3"  # Path to the audio file to transcribe
try:
    # Transcribe audio to text
    transcription = convert_audio_to_text(audio_file_path=audio_input_path, model="whisper-1")
    print(f"Transcription: {transcription}")
except Exception as e:
    print(f"Error in Speech-to-Text: {e}")

Transcription: Today is a great day to use OpenAI's GPT-40 Mini for text-to-speech.
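
The transcription above is close to the input but not identical: "GPT-4o-mini" came back as "GPT-40 Mini". Word error rate (WER), the word-level edit distance between reference and transcript divided by the number of reference words, is a standard way to quantify such round-trip differences. A minimal sketch follows; the normalization (lowercasing, treating punctuation and hyphens as separators) is an assumption of this sketch, and scores depend heavily on that choice.

```python
import re

def normalize_words(text):
    """Lowercase and keep word tokens (letters, digits, apostrophes); everything else separates words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = normalize_words(reference), normalize_words(hypothesis)
    if not ref:
        raise ValueError("reference text has no words")
    # prev[j] = edit distance between the reference words processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match when equal
            )
        prev = curr
    return prev[-1] / len(ref)

reference = "Today is a great day to use OpenAI's GPT-4o-mini for text-to-speech!"
hypothesis = "Today is a great day to use OpenAI's GPT-40 Mini for text-to-speech."
print(f"WER: {word_error_rate(reference, hypothesis):.4f}")  # prints "WER: 0.0667" (1 substitution in 15 words)
```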
