# Text-to-Speech Automation with Background Music Overlay

## Overview

This project automates the process of converting text data into speech using Google Cloud's Text-to-Speech API and overlays the generated speech with background music using the `pydub` library. The final output is a collection of MP3 files enriched with background audio, suitable for applications like podcasts, audiobooks, or multimedia presentations.

## Prerequisites

Before running the scripts, ensure you have the following installed:

- **Python 3.x**: The programming language used for scripting.
- **Google Cloud Account**: To access the Text-to-Speech API.
- **FFmpeg**: A multimedia framework required by `pydub` for audio processing.
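- **Python Libraries**:
  - `pandas`: For reading the Excel input file.
  - `pydub`: For audio manipulation.
  - `google-cloud-texttospeech`: For interfacing with Google's Text-to-Speech API.
  - `requests`: For calling the ElevenLabs API (only needed for the ElevenLabs script).
  - `gTTS`: For the free gTTS script (only needed if you choose gTTS).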
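
Before running the scripts, it can help to confirm that the core libraries are importable. The sketch below is only an illustration: the helper name `missing_modules` is hypothetical, and the list covers just `pandas`, `pydub`, and the import name `google.cloud.texttospeech` (the module provided by the `google-cloud-texttospeech` package).

```python
import importlib.util

# Import names (not pip names) for the core libraries this project uses.
# "google.cloud.texttospeech" is provided by the google-cloud-texttospeech package.
REQUIRED_MODULES = ["pandas", "pydub", "google.cloud.texttospeech"]


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises ModuleNotFoundError when a parent package
            # (for example "google") is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

Any name reported as missing can then be installed with `pip` before continuing.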

## Setup Instructions

### 1. Choose One TTS Provider: Google Cloud, gTTS, or ElevenLabs

a.1 **Create a Google Cloud Project**:
   - Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
   - Click on **"Select a Project"** and then **"New Project"**.
   - Provide a name and create the project.

a.2 **Create an ElevenLabs account**:
   - Generate an API key for your account.

a.3 **Use gTTS**:
   - gTTS is free and works without an account.

b. **Enable the Text-to-Speech API** (Google Cloud only):
   - Within your project, go to **"APIs & Services" > "Library"**.
   - Search for **"Text-to-Speech API"** and click **"Enable"**.

c. **Set Up Authentication** (Google Cloud only):
   - Go to **"APIs & Services" > "Credentials"**.
   - Click **"Create Credentials"** and select **"Service Account"**.
   - Fill in the required details and proceed.
   - After creating the service account, navigate to it and create a **JSON key**.
   - Download the JSON key file and save it securely; you'll need its path later.
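
As a rough sketch of how the downloaded key can be wired up, the helper below checks that a file exists and looks like a service-account key, then sets `GOOGLE_APPLICATION_CREDENTIALS` so Google client libraries can find it. The function name `use_service_account_key` is hypothetical; the check relies on service-account key files carrying a `"type": "service_account"` field.

```python
import json
import os


def use_service_account_key(key_path):
    """Point Google client libraries at a service-account JSON key.

    Raises FileNotFoundError if the file is missing and ValueError if it
    does not look like a service-account key. Returns the project ID
    stored in the key file (or None if the field is absent).
    """
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Key file not found: {key_path}")
    with open(key_path, encoding="utf-8") as f:
        info = json.load(f)
    if info.get("type") != "service_account":
        raise ValueError("JSON file is not a service-account key")
    # Client libraries read this variable to locate the credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)
    return info.get("project_id")
```

Calling it once at the top of a script, with the path to your own key file, fails early with a readable error instead of a later authentication failure.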

### 2. Install FFmpeg

**For Windows**:
- Download a Windows build of FFmpeg from the [BtbN FFmpeg-Builds releases page](https://github.com/BtbN/FFmpeg-Builds/releases).
- Select and download the `ffmpeg-master-latest-win64-gpl` file.
- Extract the contents and add the `bin` directory to your system's PATH environment variable.
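
To check that the PATH change took effect, a small test like the following may be useful. It is a sketch only; `executable_on_path` is a hypothetical helper built on the standard library's `shutil.which`. Open a new terminal (or restart the notebook kernel) first so the updated PATH is picked up.

```python
import shutil


def executable_on_path(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if executable_on_path("ffmpeg"):
        print("FFmpeg found at:", shutil.which("ffmpeg"))
    else:
        print("FFmpeg not found; check that its bin directory is on PATH.")
```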


In [None]:
# PLEASE READ THE README FILE BEFORE RUNNING THIS SCRIPT!

# This script converts text to speech using the ElevenLabs API.
# It reads the text from an Excel file and generates an MP3 file for each selected row.
# The MP3 files are saved in a folder named "speech_outputs/elevenlabs".
# ElevenLabs requires a paid monthly subscription, but its quality is better than the Google Text-to-Speech API.

import pandas as pd
import requests
import os

# Load the Excel file
excel_path = "Automation.xlsx"  # Make sure this file is in the same folder as the script
df = pd.read_excel(excel_path)

# ElevenLabs API details
# Read the key from the ELEVENLABS_API_KEY environment variable; never hard-code it in the notebook.
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "Yko7PKHZNXotIFUBG7I9"
BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech/"
HEADERS = {
    "xi-api-key": ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
}

# Create output folder
output_folder = "speech_outputs/elevenlabs"
os.makedirs(output_folder, exist_ok=True)

# Select first two valid rows (non-empty Title & Message)
sample_df = df[['Title', 'Message']].dropna().head(2)

# Iterate over the selected samples
for index, row in sample_df.iterrows():
    title = str(row['Title']).strip()
    message = str(row['Message']).strip()

    if title and message:
        text_to_speak = f"{title}. {message}"
        filename = f"{index + 1}_{title.replace(' ', '_')}.mp3"
        output_path = os.path.join(output_folder, filename)

        # ElevenLabs API request
        response = requests.post(
            f"{BASE_URL}{VOICE_ID}",
            headers=HEADERS,
            json={"text": text_to_speak, "model_id": "eleven_monolingual_v1"}
        )

        if response.status_code == 200:
            with open(output_path, "wb") as f:
                f.write(response.content)
            print(f"Generated: {filename}")
        else:
            print(f"Failed for {title}: {response.json()}")

print("Process completed.")


Failed for Happy Wedding: {'detail': {'status': 'detected_unusual_activity', 'message': 'Unusual activity detected. Free Tier usage disabled. If you are using a proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. \nPlease play fair and purchase any Paid Subscription to continue.'}}
Failed for Best Wishes For A Happy Married Life: {'detail': {'status': 'detected_unusual_activity', 'message': 'Unusual activity detected. Free Tier usage disabled. If you are using a proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. \nPlease play fair and purchase a
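
The failure output above is long because the whole JSON payload is printed. A minimal sketch for shortening it follows; the helper `describe_failure` is hypothetical, and the payload shape (`detail.status` and `detail.message`) is assumed only from the output shown above, so other errors may be structured differently.

```python
def describe_failure(payload):
    """Return a short 'status: first sentence' summary of an error payload."""
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if isinstance(detail, dict):
        status = detail.get("status", "unknown")
        # Keep only the first sentence of the message.
        message = str(detail.get("message", "")).split(". ")[0]
        return f"{status}: {message}" if message else status
    # Fall back to whatever was returned when the shape is unexpected.
    return str(detail if detail is not None else payload)


if __name__ == "__main__":
    sample = {
        "detail": {
            "status": "detected_unusual_activity",
            "message": "Unusual activity detected. Free Tier usage disabled.",
        }
    }
    print(describe_failure(sample))
```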

In [None]:
# PLEASE READ THE README FILE BEFORE RUNNING THIS SCRIPT!
# This is a free alternative to the script above. It uses the gTTS library, which calls Google Translate's text-to-speech service, to convert text to speech.
# It reuses `os` and `df` from the previous cell, so run that cell first.
# The script reads the text from an Excel file and generates an MP3 file for each row.
# .head(2) limits the run to the first two rows; remove it to convert all rows.
# The MP3 files are saved in a folder named "speech_outputs/gtts".

from gtts import gTTS

# Define output folder
output_folder = "speech_outputs/gtts"
os.makedirs(output_folder, exist_ok=True)

# Select the first two rows.
# This generates only 2 outputs; remove .head(2) to convert all rows.
sample_df = df[['Title', 'Message']].dropna().head(2)

# Iterate through the rows and generate speech
for index, row in sample_df.iterrows():
    title = str(row['Title']).strip()
    message = str(row['Message']).strip()

    if title and message:
        text_to_speak = f"{title}. {message}"
        filename = f"{index + 1}_{title.replace(' ', '_')}.mp3"
        output_path = os.path.join(output_folder, filename)

        # Generate speech using gTTS
        tts = gTTS(text_to_speak, lang="en")
        tts.save(output_path)

        print(f"Generated: {filename}")

print("Process completed.")


Generated: 1_Happy_Wedding.mp3
Generated: 2_Best_Wishes_For_A_Happy_Married_Life.mp3
Process completed.


In [1]:
# PLEASE READ THE README FILE BEFORE RUNNING THIS SCRIPT!
# This script converts text to speech using Google Cloud Text-to-Speech API.
# The script reads the text from an Excel file and generates an MP3 file for each row.
# The MP3 files are saved in a folder named "speech_outputs/google_cloud".

import os
import pandas as pd
from google.cloud import texttospeech
import re  # Import regex for cleaning filenames

# Set up Google Cloud authentication
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "peerless-tiger-452712-b1-6e90be709b2d.json"  # Change filename if renamed

# Load the Excel file
excel_path = "Automation.xlsx"  # Ensure this file is in the same folder as the script
df = pd.read_excel(excel_path)

# Create output directory for speech MP3 files
output_folder = "speech_outputs/google_cloud"
os.makedirs(output_folder, exist_ok=True)

# Initialize Google Cloud TTS Client
client = texttospeech.TextToSpeechClient()

# Select all rows with Title, Message, and Gender (excluding NaN, blanks, and '--')
sample_df = df[['Title', 'Message', 'Gender']].dropna()
sample_df = sample_df[
    (sample_df['Message'].str.strip() != '') & 
    (sample_df['Message'].str.strip() != '--')
]

# Function to clean filenames (remove invalid characters)
def clean_filename(filename):
    filename = filename.replace("\n", " ")  # Replace newlines with space
    filename = re.sub(r'[<>:"/\\|?*]', '', filename)  # Remove invalid characters
    return filename.strip()

# Iterate over all valid rows
for index, row in sample_df.iterrows():
    title = str(row['Title']).strip()
    message = str(row['Message']).strip()
    gender = str(row['Gender']).strip().lower()  # Normalize gender input

    if title and message:
        text_to_speak = f"{title}. {message}"
        filename = f"{index + 1}_{clean_filename(title)}.mp3"
        output_path = os.path.join(output_folder, filename)

        # Select the voice based on gender
        if gender == "female":
            voice = texttospeech.VoiceSelectionParams(
                language_code="en-US",
                name="en-US-Chirp-HD-F",  # Female voice
                ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
            )
        else:  # Default to the male voice for "male" or any unrecognized value
            voice = texttospeech.VoiceSelectionParams(
                language_code="en-US",
                name="en-US-Wavenet-D",  # Male voice
                ssml_gender=texttospeech.SsmlVoiceGender.MALE
            )

        # Audio configuration
        audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=0.9,  # Slower for clarity
            pitch=0  # Neutral pitch
        )

        # Create synthesis request
        synthesis_input = texttospeech.SynthesisInput(text=text_to_speak)
        response = client.synthesize_speech(
            input=synthesis_input, voice=voice, audio_config=audio_config
        )

        # Save the generated speech MP3 file
        with open(output_path, "wb") as f:
            f.write(response.audio_content)
        print(f"Generated: {filename} with {voice.name} voice.")

print("Speech generation process completed.")


Generated: 1_Happy Wedding.mp3 with en-US-Wavenet-D voice.
Generated: 2_Best Wishes For A Happy Married Life.mp3 with en-US-Chirp-HD-F voice.
Generated: 3_Congratulations and Best Wishes for your Wedding Day!.mp3 with en-US-Wavenet-D voice.
Generated: 4_Congratulations!.mp3 with en-US-Chirp-HD-F voice.
Generated: 5_Ring of Forever.mp3 with en-US-Wavenet-D voice.
Generated: 6_Pleasured to be invited.mp3 with en-US-Chirp-HD-F voice.
Generated: 7_I'm Happy For You Both.mp3 with en-US-Wavenet-D voice.
Generated: 8_Best Part.mp3 with en-US-Chirp-HD-F voice.
Generated: 9_To the my bestfriend on her wedding day.mp3 with en-US-Wavenet-D voice.
Generated: 10_Today is the day.mp3 with en-US-Chirp-HD-F voice.
Generated: 11_Best Wishes For A Happy Married Life.mp3 with en-US-Wavenet-D voice.
Generated: 12_CHEERS TO A LIFETIME OF HAPPINESS.mp3 with en-US-Chirp-HD-F voice.
Generated: 13_today we celebrate.mp3 with en-US-Wavenet-D voice.
Generated: 14_You Have Added To Our Delight.mp3 with en-US-Chir
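
The gender-to-voice rule in the cell above can be isolated and checked without calling the API. This is a sketch only: `pick_voice_name` is a hypothetical helper, and the two voice names are copied from that cell.

```python
def pick_voice_name(gender):
    """Return the voice name the cell above would use for a Gender value."""
    normalized = str(gender).strip().lower()
    if normalized == "female":
        return "en-US-Chirp-HD-F"
    # "male" and any unrecognized value fall back to the male voice.
    return "en-US-Wavenet-D"


if __name__ == "__main__":
    for value in ["Female", " male ", "--"]:
        print(repr(value), "->", pick_voice_name(value))
```

Keeping the rule in one function makes it easier to add further voices later without touching the synthesis loop.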

In [2]:
# PLEASE READ THE README FILE BEFORE RUNNING THIS SCRIPT!
# This script overlays background music on the generated speech files.
# It reads the MP3 files from the folder "speech_outputs/google_cloud"
# and overlays background music on each file.
# The output files are saved in a folder named "output/google_cloud".
# The background music files are in the "assets" folder.
import os
import random
from pydub import AudioSegment

# Paths
speech_folder = "speech_outputs/google_cloud"
output_folder = "output/google_cloud"
music_folder = "assets"

# Ensure output folder exists
os.makedirs(output_folder, exist_ok=True)

def get_random_background_music():
    """Select a random background music file from the assets folder with 'background' in the filename."""
    music_files = [f for f in os.listdir(music_folder) if "background" in f.lower() and f.lower().endswith(".mp3")]
    
    if not music_files:
        raise FileNotFoundError("No background music files found in 'assets' folder with 'background' in the filename.")

    return os.path.join(music_folder, random.choice(music_files))

def overlay_background_music(speech_file, music_file, output_file, music_volume_dB=-20, pre_delay_ms=2000, post_music_ms=2000):
    """Overlay background music onto speech with an initial delay and allow a smooth continuation at the end."""
    
    # Load the speech and background music
    speech = AudioSegment.from_file(speech_file)
    music = AudioSegment.from_file(music_file)

    # Adjust the background music volume
    music = music + music_volume_dB

    # Calculate total duration required for background music
    total_music_length = len(speech) + pre_delay_ms + post_music_ms

    # Ensure the background music is long enough
    if len(music) < total_music_length:
        music = music * ((total_music_length // len(music)) + 1)

    # Trim the music so it naturally extends beyond the speech
    music = music[:total_music_length]

    # Add silence before the speech starts
    silence = AudioSegment.silent(duration=pre_delay_ms)

    # Combine silence with speech
    speech_with_delay = silence + speech

    # Overlay the speech on the background music
    combined = music.overlay(speech_with_delay)

    # Fade out over the final post_music_ms, which contains only background music
    final_audio = combined.fade_out(post_music_ms)

    # Export final MP3
    final_audio.export(output_file, format='mp3')
    print(f"Exported: {output_file}")

# Process all generated MP3 files
for filename in os.listdir(speech_folder):
    if filename.endswith(".mp3"):
        speech_file = os.path.join(speech_folder, filename)
        output_file = os.path.join(output_folder, filename)

        # Get a random background music file
        random_background_music = get_random_background_music()

        overlay_background_music(speech_file, random_background_music, output_file, music_volume_dB=-20, pre_delay_ms=2000, post_music_ms=2000)

print("Background music overlay process completed.")


Exported: output/google_cloud\10_Today is the day.mp3
Exported: output/google_cloud\11_Best Wishes For A Happy Married Life.mp3
Exported: output/google_cloud\12_CHEERS TO A LIFETIME OF HAPPINESS.mp3
Exported: output/google_cloud\13_today we celebrate.mp3
Exported: output/google_cloud\14_You Have Added To Our Delight.mp3
Exported: output/google_cloud\15_Happy Wedding.mp3
Exported: output/google_cloud\16_To my other half.mp3
Exported: output/google_cloud\17_Thank You, My Love.mp3
Exported: output/google_cloud\19_CHEERS TO FOREVER!.mp3
Exported: output/google_cloud\1_Happy Wedding.mp3
Exported: output/google_cloud\20_Congratulations  on your wedding day!  Cheers to your new adventure!.mp3
Exported: output/google_cloud\23_My Life Without You.mp3
Exported: output/google_cloud\27_Wedding of a Lifetime.mp3
Exported: output/google_cloud\28_Our Marriage will be a Journey Forever.mp3
Exported: output/google_cloud\2_Best Wishes For A Happy Married Life.mp3
Exported: output/google_cloud\32_HAPPY E
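
The timing arithmetic inside `overlay_background_music` can be verified without any audio files. The sketch below mirrors that calculation; `music_plan` is a hypothetical helper, and all durations are in milliseconds, matching `len()` on a pydub `AudioSegment`.

```python
def music_plan(speech_ms, music_ms, pre_delay_ms=2000, post_music_ms=2000):
    """Return (total_ms, repeats) for the background track.

    total_ms is the length the music is trimmed to; repeats is how many
    copies of the track are concatenated before trimming.
    """
    if music_ms <= 0:
        raise ValueError("music_ms must be positive")
    total_ms = speech_ms + pre_delay_ms + post_music_ms
    repeats = 1
    if music_ms < total_ms:
        # Same rule as the overlay function: loop until long enough.
        repeats = total_ms // music_ms + 1
    return total_ms, repeats


if __name__ == "__main__":
    # A 10 s speech over a 6 s track: 14 s total, track looped 3 times.
    print(music_plan(10_000, 6_000))
```

With the defaults, the speech starts 2 seconds into the music and the music continues for 2 seconds after the speech ends, which is the span the fade-out covers.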