**Downloading, Transcribing, and Converting Audio**

This script automates the process of downloading audio from a YouTube video, transcribing the audio into text using the Deepgram API, and then converting the transcribed text into speech using ElevenLabs' Text-to-Speech (TTS) API. It uses yt-dlp (with FFmpeg for the MP3 conversion) for downloading, the Deepgram Python SDK for transcription, and the ElevenLabs Python SDK for generating synthesized audio.
Here's a breakdown of each step and function:

1. **yt-dlp**: Download Audio from YouTube
2. **Deepgram**: Transcribe Audio
3. **ElevenLabs**: Text-to-Speech
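The three steps form a simple chain: the YouTube URL becomes a local audio file, the audio file becomes a transcript, and the transcript becomes a new audio file. The sketch below shows only that data flow. Each stage is passed in as a function, so it can run without network access or API keys. `run_pipeline` and the stub stages are hypothetical names for illustration, not part of the script.

```python
from typing import Callable


def run_pipeline(
    url: str,
    download: Callable[[str], str],
    transcribe: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> dict:
    """Chain download -> transcribe -> synthesize and collect each intermediate result.

    Hypothetical design: the stages are injected so the real yt-dlp, Deepgram
    and ElevenLabs calls can be swapped for stubs.
    """
    audio_path = download(url)            # step 1: URL -> local audio file path
    transcript = transcribe(audio_path)   # step 2: audio file -> text
    output_path = synthesize(transcript)  # step 3: text -> synthesized audio file path
    return {"audio": audio_path, "transcript": transcript, "output": output_path}


# Stub stages standing in for the real API calls
def fake_download(url: str) -> str:
    return "audio.mp3"


def fake_transcribe(path: str) -> str:
    return f"transcript of {path}"


def fake_synthesize(text: str) -> str:
    return "output_audio.mp3"


result = run_pipeline(
    "https://www.youtube.com/watch?v=oubBnzSg2jY",
    fake_download, fake_transcribe, fake_synthesize,
)
print(result)
```

In the notebook itself, the Deepgram step is a coroutine, so the real chain awaits it inside `main()` instead of calling it directly.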


In [2]:
!pip install pafy yt-dlp deepgram-sdk aiohttp


Collecting pafy
  Downloading pafy-0.5.5-py2.py3-none-any.whl.metadata (10 kB)
Collecting yt-dlp
  Downloading yt_dlp-2024.10.7-py3-none-any.whl.metadata (171 kB)
Collecting deepgram-sdk
  Downloading deepgram_sdk-3.7.3-py3-none-any.whl.metadata (13 kB)
Collecting brotli (from yt-dlp)
  Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.5 kB)
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Collecting pycryptodomex (from yt-dlp)
  Downloading pycryptodomex-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting websockets>=13.0 (from yt-dlp)
  Downloading websockets-13.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)

In [4]:
!pip install elevenlabs


Collecting elevenlabs
  Downloading elevenlabs-1.9.0-py3-none-any.whl.metadata (10 kB)
Downloading elevenlabs-1.9.0-py3-none-any.whl (134 kB)
Installing collected packages: elevenlabs
Successfully installed elevenlabs-1.9.0


In [6]:
!pip install deepgram-sdk==2.12.0  # pin v2: the code below uses the v2 Deepgram client API

Collecting deepgram-sdk==2.12.0
  Downloading deepgram_sdk-2.12.0-py3-none-any.whl.metadata (27 kB)
Downloading deepgram_sdk-2.12.0-py3-none-any.whl (25 kB)
Installing collected packages: deepgram-sdk
  Attempting uninstall: deepgram-sdk
    Found existing installation: deepgram-sdk 3.7.3
    Uninstalling deepgram-sdk-3.7.3:
      Successfully uninstalled deepgram-sdk-3.7.3
Successfully installed deepgram-sdk-2.12.0


**First, let's transcribe the downloaded audio into text using the Deepgram API**

**Deepgram: Transcribe Audio**

After downloading the audio, the script transcribes the MP3 file into text using the Deepgram API, which is a deep learning-powered automatic speech recognition (ASR) service.

**Function**: transcribe_audio_deepgram(audio_file, deepgram_api_key)

**Input**:

Path to the audio file.

API key for Deepgram.

**Process**:

Opens the audio file in binary mode and sends it to Deepgram's prerecorded transcription API (deepgram.transcription.prerecorded in the v2 SDK) with punctuation enabled.

Returns the transcribed text as a string.

**Output**: Prints the transcription and returns it.
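transcribe_audio_deepgram reads the transcript from a nested JSON response with `response['results']['channels'][0]['alternatives'][0]['transcript']` and labels the upload with a hard-coded `'audio/mp3'` MIME type. The sketch below shows one way to make both steps more explicit, assuming the response shape used in the notebook. A hypothetical `extract_transcript` helper turns a malformed response into a readable error. A hypothetical `guess_audio_mimetype` helper derives the type from the file extension using Python's standard `mimetypes` module, which maps `.mp3` to the registered type `audio/mpeg`. The sample response is a trimmed, made-up dictionary containing only the keys the lookup touches, not real Deepgram output.

```python
import mimetypes


def guess_audio_mimetype(path: str, default: str = "audio/mpeg") -> str:
    """Hypothetical helper: guess the MIME type from the file extension, else use a default."""
    mimetype, _ = mimetypes.guess_type(path)
    return mimetype or default


def extract_transcript(response: dict) -> str:
    """Hypothetical helper: return the first channel's first alternative transcript.

    Mirrors response['results']['channels'][0]['alternatives'][0]['transcript']
    from transcribe_audio_deepgram, but raises a descriptive ValueError instead
    of a bare KeyError/IndexError when the structure differs.
    """
    try:
        return response["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected Deepgram response shape: {exc!r}") from exc


# Trimmed, made-up response containing only the fields the lookup touches
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Hi. Have a nice day."}]}
        ]
    }
}

print(guess_audio_mimetype("audio.mp3"))
print(extract_transcript(sample_response))
```

With these helpers, the upload would become `{'buffer': audio, 'mimetype': guess_audio_mimetype(audio_file)}` and the lookup `extract_transcript(response)`; the rest of the function stays the same.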

In [None]:
import yt_dlp
from deepgram import Deepgram  # v2 SDK API (deepgram-sdk==2.12.0)
import asyncio
import os
import nest_asyncio

# Allow nested event loops
nest_asyncio.apply()

# Function to download audio from YouTube using yt-dlp
def download_audio_from_youtube(url, output_file='audio.mp3'):
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': output_file,
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])
    print(f'Audio downloaded at: {output_file}')
    return output_file

# Function to transcribe audio using Deepgram API
async def transcribe_audio_deepgram(audio_file, deepgram_api_key):
    # Initialize Deepgram SDK
    deepgram = Deepgram(deepgram_api_key)

    # Open the audio file
    with open(audio_file, 'rb') as audio:
        # Sending the file for transcription
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}  # MIME type hint for the uploaded file

        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        transcript = response['results']['channels'][0]['alternatives'][0]['transcript']
        print(f'Transcript: {transcript}')
        return transcript

# Main execution block
async def main():
    # Your API key for Deepgram
    DEEPGRAM_API_KEY = 'API_KEY'

    # YouTube URL to download audio from
    YOUTUBE_URL = 'https://www.youtube.com/watch?v=oubBnzSg2jY'  # Example video URL

    # Step 1: Download the audio from YouTube using yt-dlp
    audio_file_path = download_audio_from_youtube(YOUTUBE_URL)

    # Step 2: Use Deepgram to transcribe the audio
    await transcribe_audio_deepgram(audio_file_path, DEEPGRAM_API_KEY)

    # Optionally, delete the audio file after transcription
    if os.path.exists(audio_file_path):
        os.remove(audio_file_path)
        print("Audio file deleted.")

# Call the main function
asyncio.run(main())


[youtube] Extracting URL: https://www.youtube.com/watch?v=oubBnzSg2jY
[youtube] oubBnzSg2jY: Downloading webpage
[youtube] oubBnzSg2jY: Downloading ios player API JSON
[youtube] oubBnzSg2jY: Downloading mweb player API JSON
[youtube] oubBnzSg2jY: Downloading m3u8 information
[info] oubBnzSg2jY: Downloading 1 format(s): 251
[download] audio.mp3 has already been downloaded
[download] 100% of  402.23KiB
[ExtractAudio] Not converting audio audio.mp3; file is already in target format mp3
Audio downloaded at: audio.mp3
Transcript: Hi. You know who you call, leave a message. Maybe they'll call you back. Then again, maybe they won't. That's how life is. Point is. You've done what you can. Have a nice day.
Audio file deleted.
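The download step hard-codes `'outtmpl': 'audio.mp3'`, so yt-dlp names the raw download `audio.mp3` whatever container the selected format (251 in the log above) actually uses, before FFmpeg runs. A commonly used alternative is to let yt-dlp fill in the real extension with the `%(ext)s` template field and predict the converted file's name from `preferredcodec`. The sketch below only builds the options and the expected path. `build_ydl_opts` is a hypothetical helper, and the option keys are the ones the notebook already uses.

```python
from typing import Tuple


def build_ydl_opts(basename: str = "audio", codec: str = "mp3",
                   quality: str = "192") -> Tuple[dict, str]:
    """Hypothetical helper: yt-dlp options for best audio plus FFmpeg conversion.

    Returns the options dict and the path the converted file is expected at.
    '%(ext)s' lets yt-dlp name the raw download after its real container;
    FFmpegExtractAudio is then expected to write '<basename>.<codec>'.
    """
    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": f"{basename}.%(ext)s",  # yt-dlp substitutes the real extension
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": codec,
            "preferredquality": quality,
        }],
    }
    return ydl_opts, f"{basename}.{codec}"


opts, expected_path = build_ydl_opts()
print(opts["outtmpl"], expected_path)  # audio.%(ext)s audio.mp3
```

Passing `opts` to `yt_dlp.YoutubeDL(opts)` and returning `expected_path` from `download_audio_from_youtube` should leave the rest of `main()` unchanged, assuming FFmpeg is installed as the postprocessor requires.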


In [5]:
!pip install --upgrade elevenlabs




**Next, we extend the pipeline above by adding ElevenLabs text-to-speech**

**ElevenLabs: Text-to-Speech**

The transcribed text is then converted into speech using ElevenLabs' Text-to-Speech (TTS) API, which generates speech in a chosen voice whose characteristics can be tuned through voice settings.

**Function**: convert_text_to_audio_elevenlabs(transcript, output_audio)

**Input**:

The transcript (text to convert to speech).

(Optional) output filename for the audio (default is 'output_audio.mp3').

**Process**:

Initializes an ElevenLabs client with an API key.

Specifies custom voice settings (like stability and similarity boost) and a voice ID.

Calls the TTS API to generate the speech and saves it as an MP3 file.

**Output**: Saves the generated audio file locally.
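Both code cells hard-code `'API_KEY'` placeholders, once for Deepgram in `main()` and once inside `convert_text_to_audio_elevenlabs`. A common alternative is to read keys from environment variables so they never end up in the notebook. The `require_env` helper and the variable names `DEEPGRAM_API_KEY` / `ELEVENLABS_API_KEY` below are illustrative assumptions. The returned string would be passed to `Deepgram(...)` or `ElevenLabs(api_key=...)` exactly where the placeholder is now.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return environment variable `name` or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the pipeline.")
    return value


# Example usage (illustrative variable names, not required by either SDK):
# DEEPGRAM_API_KEY = require_env("DEEPGRAM_API_KEY")
# client = ElevenLabs(api_key=require_env("ELEVENLABS_API_KEY"))
```

Failing early with a named variable is easier to debug than an authentication error returned by the remote API.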

In [8]:
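import yt_dlp
from deepgram import Deepgram  # v2 SDK API (deepgram-sdk==2.12.0)
import asyncio
import os
import nest_asyncio
import elevenlabs  # provides elevenlabs.Voice, elevenlabs.VoiceSettings and elevenlabs.save used below
from elevenlabs.client import ElevenLabs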

# Allow nested event loops
nest_asyncio.apply()


# Function to download audio from YouTube using yt-dlp
def download_audio_from_youtube(url, output_file='audio.mp3'):
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': output_file,
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'verbose': True,  # Enable verbose logging
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

    # Check if the audio file was created
    if not os.path.exists(output_file):
        raise FileNotFoundError(f"Downloaded audio file '{output_file}' not found.")

    print(f'Audio downloaded at: {output_file}')
    return output_file


# Function to transcribe audio using Deepgram API
async def transcribe_audio_deepgram(audio_file, deepgram_api_key):
    # Initialize Deepgram SDK
    deepgram = Deepgram(deepgram_api_key)

    # Print the path to the audio file for debugging
    print(f"Attempting to open audio file: {audio_file}")

    # Open the audio file
    with open(audio_file, 'rb') as audio:
        # Sending the file for transcription
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}  # MIME type hint for the uploaded file

        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        transcript = response['results']['channels'][0]['alternatives'][0]['transcript']
        print(f'Transcript: {transcript}')
        return transcript


# Function to convert text to audio using ElevenLabs TTS API
def convert_text_to_audio_elevenlabs(transcript, output_audio='output_audio.mp3'):
    # Create the ElevenLabs client (replace "API_KEY" with your ElevenLabs API key)
    client = ElevenLabs(api_key="API_KEY")

    # Define custom voice settings
    voice = elevenlabs.Voice(
        voice_id="ZQe5CZNOzWyzPSCn5a3c",  # You can replace this with a valid voice ID from ElevenLabs
        settings=elevenlabs.VoiceSettings(
            stability=0,
            similarity_boost=0.75
        )
    )

    # Generate audio using ElevenLabs API
    audio = client.generate(
        text=transcript,
        voice=voice
    )

    # Save the generated audio to an mp3 file
    elevenlabs.save(audio, output_audio)
    print(f"Audio saved to {output_audio}")

# Main execution block
async def main():
    # Your Deepgram API key (the ElevenLabs key is set inside convert_text_to_audio_elevenlabs)
    DEEPGRAM_API_KEY = 'API_KEY'

    # YouTube URL to download audio from
    YOUTUBE_URL = 'https://www.youtube.com/watch?v=oubBnzSg2jY'  # Example video URL

    # Step 1: Download the audio from YouTube using yt-dlp
    audio_file_path = download_audio_from_youtube(YOUTUBE_URL)

    # Step 2: Use Deepgram to transcribe the audio
    transcript = await transcribe_audio_deepgram(audio_file_path, DEEPGRAM_API_KEY)

    # Step 3: Convert the transcript to audio using ElevenLabs
    convert_text_to_audio_elevenlabs(transcript)

    # Optionally, delete the audio file after transcription
    if os.path.exists(audio_file_path):
        os.remove(audio_file_path)
        print("Audio file deleted.")

# Call the main function
asyncio.run(main())


[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.10.07 from yt-dlp/yt-dlp [1a176d874] (pip) API
[debug] params: {'format': 'bestaudio/best', 'outtmpl': 'audio.mp3', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.21.0, brotli-1.1.0, certifi-2024.08.30, mutagen-1.47.0, requests-2.

[youtube] Extracting URL: https://www.youtube.com/watch?v=oubBnzSg2jY
[youtube] oubBnzSg2jY: Downloading webpage
[youtube] oubBnzSg2jY: Downloading ios player API JSON
[youtube] oubBnzSg2jY: Downloading mweb player API JSON


[debug] Loading youtube-nsig.2f238d39 from cache
[debug] [youtube] Decrypted nsig hip10pa9Z65CKgF => 3HJXlHuUXzKYeQ
[debug] Loading youtube-nsig.2f238d39 from cache
[debug] [youtube] Decrypted nsig -0RjxRNqxsXMG0K => uckcVZMKPIy5Kg


[youtube] oubBnzSg2jY: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] oubBnzSg2jY: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr5---sn-p5qs7nsr.googlevideo.com/videoplayback?expire=1728896385&ei=IYkMZ7PyOcS9kucPprnygQY&ip=34.86.97.63&id=o-ACWCSgICQIPpM_Yww4gXqtZU7HT3luyfKZe2yazBj4nj&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&met=1728874785%2C&mh=d0&mm=31%2C26&mn=sn-p5qs7nsr%2Csn-ab5l6ndr&ms=au%2Conr&mv=m&mvi=5&pl=20&rms=au%2Cau&initcwndbps=1667500&vprv=1&svpuc=1&mime=audio%2Fwebm&rqh=1&gir=yes&clen=257148&dur=17.101&lmt=1695711452569717&mt=1728874366&fvip=1&keepalive=yes&fexp=51300761&c=IOS&txp=5311224&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cvprv%2Csvpuc%2Cmime%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRQIhAOK_pS_ml0iiIt_kWn61STLd7-5QchHVXJT6qiJhJ9t_AiAEJPxuzTelJLq-28ass15EVM-g4ri4ZITFjPFHuu8PXA%3D%3D&lsparams=met%2Cmh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Crms%2Cinitcwndbps&lsig=ACJ0pHgwRQIgfhVIENk8MZrMgi_RpSojumrV0Ov1k9z7cx3g2m1qKAECIQCi7fEhFvNHo5WbRdKLuvf3tQgYTS459xkVp8iQpEORwA%3D%3D"


[download] audio.mp3 has already been downloaded
[download] 100% of  402.23KiB


[debug] ffmpeg command line: ffprobe -show_streams file:audio.mp3


[ExtractAudio] Not converting audio audio.mp3; file is already in target format mp3
Audio downloaded at: audio.mp3
Attempting to open audio file: audio.mp3
Transcript: Hi. You know who you call, leave a message. Maybe they'll call you back. Then again, maybe they won't. That's how life is. Point is. You've done what you can. Have a nice day.
Audio saved to output_audio.mp3
Audio file deleted.
