# **YouTube Video Transcription with Speaker Diarization**

This notebook uses AssemblyAI to transcribe a YouTube video and assigns speaker labels to each segment of the transcription.

In [1]:
!pip install yt-dlp assemblyai



### Step 1: Download Audio from YouTube
Use `yt-dlp` to extract audio from the YouTube video.

In [2]:
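# Download the audio track from a YouTube video as MP3 (yt-dlp needs ffmpeg for -x)
import os
import subprocess

youtube_url = input("Enter the YouTube video URL: ")
# Pass the arguments as a list so the URL is never interpreted by the shell;
# check=True raises CalledProcessError if yt-dlp exits with an error.
subprocess.run(
    ["yt-dlp", "-x", "--audio-format", "mp3", "-o", "audio.%(ext)s", youtube_url],
    check=True,
)
print("Audio downloaded as 'audio.mp3'")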


Enter the YouTube video URL: https://youtu.be/DQzKw30LeTA?si=hdyugBZVIo-a2q4x
Audio downloaded as 'audio.mp3'
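
The same download can also be driven from Python through the `yt_dlp` module instead of the command line. The following is a minimal sketch, not the notebook's own method: the helper names `build_ydl_options` and `download_audio` are hypothetical, and it assumes `ffmpeg` is installed so the `FFmpegExtractAudio` postprocessor can convert the audio.

```python
def build_ydl_options(output_stem="audio", codec="mp3"):
    """Return a yt-dlp options dict that extracts audio to <output_stem>.<codec>."""
    return {
        # Prefer an audio-only stream, fall back to the best combined stream.
        "format": "bestaudio/best",
        # yt-dlp fills in %(ext)s; after conversion the extension is the codec.
        "outtmpl": f"{output_stem}.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec}
        ],
    }


def download_audio(url, output_stem="audio", codec="mp3"):
    """Download the audio track of `url` and return the expected output path."""
    import yt_dlp  # imported lazily so build_ydl_options works without yt-dlp

    with yt_dlp.YoutubeDL(build_ydl_options(output_stem, codec)) as ydl:
        ydl.download([url])
    return f"{output_stem}.{codec}"
```

Calling `download_audio(youtube_url)` would then be expected to leave `audio.mp3` in the working directory, matching the file name used in the rest of the notebook.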


In [3]:

# Verify that the audio file exists before transcribing
if not os.path.exists("audio.mp3"):
    raise FileNotFoundError("The audio file was not downloaded successfully.")
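
An existence check does not catch a download that left an empty file behind. A slightly stricter check, sketched here with a hypothetical helper `require_audio_file` and an arbitrary default size threshold, also looks at the file size:

```python
import os


def require_audio_file(path, min_bytes=1):
    """Return the file size in bytes, or raise if the file is missing or too small."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"The audio file {path!r} was not downloaded successfully.")
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"The audio file {path!r} is only {size} bytes.")
    return size
```

In this notebook it could be called as `require_audio_file("audio.mp3")` right after the download cell.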


### Step 2: Set Up AssemblyAI
Obtain your API key from [AssemblyAI](https://www.assemblyai.com/) and initialize the client.

In [4]:
import assemblyai as aai



# Set up AssemblyAI
aai.settings.api_key = "YOUR_API_KEY"  # Replace with your API key; never commit a real key
transcriber = aai.Transcriber()
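
To keep the key out of the notebook itself, it can be read from the environment or prompted for at run time. This is a minimal sketch: `load_api_key` is a hypothetical helper, and the variable name `ASSEMBLYAI_API_KEY` is an assumption chosen for this example.

```python
import os
from getpass import getpass


def load_api_key(env_var="ASSEMBLYAI_API_KEY"):
    """Read the API key from an environment variable, prompting if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed key, so it does not end up in the cell output.
        key = getpass(f"Enter your AssemblyAI API key ({env_var} is not set): ")
    return key.strip()
```

The result can then be assigned in place of the literal string, for example `aai.settings.api_key = load_api_key()`.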


### Step 3: Upload Audio and Transcribe with Speaker Diarization

In [5]:

# Configure transcription with speaker diarization
config = aai.TranscriptionConfig(speaker_labels=True)

# Upload the local file and wait for the transcript
transcript = transcriber.transcribe("audio.mp3", config)
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")

# Print the transcription with speaker labels
print("Transcription with Speaker Labels:")
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")


Transcription with Speaker Labels:
Speaker A: Welcome to Open to Debate. I'm John Donvan and we are gathered for something remarkable here. A meeting between two men who disagree fiercely across an issue that has fiercely divided so many of us within families and on college campuses. And for that alone, for their courage in coming to the same stage, let us congratulate them at the outset because we know how rare it is in our currently hyper polarized culture for opposing viewpoints to be heard, spoken aloud and in competition with each other under the same roof. But that is the essence of debate and that is the value that we promote it. Open to debate. And so it is an honor to be here at New York's venerable Adler Hall, a crown jewel of the New York Society for Ethical Culture, where we cannot and we will not forget how the war in Gaza is steeped in pain and outrage and fear and anger. It's why this debate matters and it's why it is natural to take sides, even necessary to take sides. 
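
Long utterances like the one above are easier to navigate with timestamps. The sketch below assumes each utterance exposes `start` and `end` offsets in milliseconds alongside `speaker` and `text`; the helper names `format_timestamp` and `format_utterance` are hypothetical.

```python
def format_timestamp(ms):
    """Convert a millisecond offset to HH:MM:SS, dropping fractional seconds."""
    total_seconds = int(ms) // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def format_utterance(utterance):
    """Render one utterance as '[start - end] Speaker X: text'."""
    start = format_timestamp(utterance.start)  # assumed to be milliseconds
    end = format_timestamp(utterance.end)      # assumed to be milliseconds
    return f"[{start} - {end}] Speaker {utterance.speaker}: {utterance.text}"
```

Under that assumption, `print(format_utterance(utterance))` could replace the `print` call inside the loop.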

### Step 4: Save the Transcript
Save the transcription as a text file for further use.

In [6]:
# Save the transcript to a file
output_path = "transcript_with_speakers.txt"
with open(output_path, "w", encoding="utf-8") as f:
    for utterance in transcript.utterances:
        f.write(f"Speaker {utterance.speaker}: {utterance.text}\n")
print(f"Transcript saved to {output_path}")


Transcript saved to transcript_with_speakers.txt
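
For further use, the saved file can be read back into `(speaker, text)` pairs. This is a minimal sketch that assumes the one-utterance-per-line `Speaker <label>: <text>` layout written to `transcript_with_speakers.txt`; `read_transcript` is a hypothetical helper.

```python
def read_transcript(path):
    """Parse 'Speaker <label>: <text>' lines into a list of (label, text) tuples."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.startswith("Speaker "):
                continue  # skip blank or unrecognized lines
            # Split on the first ': ' only, so colons inside the text survive.
            label, sep, text = line[len("Speaker "):].partition(": ")
            if sep:
                entries.append((label, text))
    return entries
```

For example, `read_transcript("transcript_with_speakers.txt")` returns the utterances in file order, which makes it easy to filter by speaker label.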
