<a href="https://colab.research.google.com/github/Engr-Muhammad-Anees/Dubbing-Podcast-ML/blob/main/Dupping_podast_try_on_another_sample_using_assemby_ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**package install:**

In [1]:
# moviepy is pinned below 2.0 because `moviepy.editor` and `set_audio` are 1.x APIs
!pip install --quiet assemblyai elevenlabs pydub "moviepy<2"

**import libraries:**

In [2]:
import io
import os

import assemblyai as aai
from elevenlabs import ElevenLabs
from moviepy.editor import VideoFileClip, AudioFileClip
from pydub import AudioSegment

**API KEYS:**

In [3]:
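# Replace the placeholders with your own keys; never commit real keys to the repository.
os.environ["ASSEMBLYAI_API_KEY"] = "YOUR_ASSEMBLYAI_API_KEY"
os.environ["ELEVENLABS_API_KEY"] = "YOUR_ELEVENLABS_API_KEY"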

In [4]:
ASSEMBLY_KEY = os.getenv("ASSEMBLYAI_API_KEY")
ELEVEN_KEY = os.getenv("ELEVENLABS_API_KEY")
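# Confirm the keys are loaded without echoing the secrets into the notebook output.
print("ASSEMBLY_KEY set:", bool(ASSEMBLY_KEY))
print("ELEVEN_KEY set:", bool(ELEVEN_KEY))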
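
If you still want to see which key is loaded, a masked preview is safer than printing the full value. The sketch below assumes a hypothetical helper, `mask_key`, which is not part of either SDK; it only shows the first few characters.

```python
import os


def mask_key(key, visible=4):
    """Return a masked copy of a secret for logging (hypothetical helper)."""
    if not key:
        return "<not set>"
    if len(key) <= visible:
        return "*" * len(key)
    # Keep the first `visible` characters and hide the rest.
    return key[:visible] + "*" * (len(key) - visible)


# Read the key from the environment rather than hard-coding it in a cell.
assembly_key = os.getenv("ASSEMBLYAI_API_KEY")
print("ASSEMBLY_KEY:", mask_key(assembly_key))
```

In Colab, the keys could also be entered at run time with `getpass.getpass()` so that they never appear in the saved notebook.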



**INPUT AND OUTPUT PATHS:**

In [37]:
input_video = "/content/drive/MyDrive/video for dubbing podcast/A Daughter’s Serious Question.mp4"
dubbed_audio = "/content/drive/MyDrive/video for dubbing podcast/dubbing_audio12345.mp3"
dubbed_video = "/content/drive/MyDrive/video for dubbing podcast/dubbing_video123455.mp4"
input_audio = "input_audio.wav"

**EXTRACT AUDIO FROM INPUT VIDEO:**

In [6]:
!ffmpeg -y -i "{input_video}" -ac 1 -ar 16000 -vn "{input_audio}" >/dev/null 2>&1
print(" Audio extracted:", input_audio)

 Audio extracted: input_audio.wav
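
Because the shell command above discards ffmpeg's output, the "Audio extracted" message is printed even when the extraction fails. A minimal sketch of the same step with error checking follows; `build_extract_cmd` and `extract_audio` are hypothetical helper names, and the flags are the ones used in the cell above.

```python
import subprocess


def build_extract_cmd(video_path, audio_path, sample_rate=16000):
    """Build the ffmpeg argument list: mono, resampled, video stream dropped."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-ac", "1",                # one audio channel (mono)
        "-ar", str(sample_rate),   # resample to the given rate in Hz
        "-vn",                     # ignore the video stream
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg and raise if it exits with a non-zero status."""
    result = subprocess.run(
        build_extract_cmd(video_path, audio_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show only the tail of stderr, where ffmpeg reports the actual error.
        raise RuntimeError(f"ffmpeg failed:\n{result.stderr[-500:]}")
    return audio_path
```

Calling `extract_audio(input_video, input_audio)` would then stop the notebook early if, for example, the Drive path is wrong.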


In [7]:
aai.settings.api_key = ASSEMBLY_KEY
audio_file = input_audio

**Enable speaker diarization in the transcription config:**

In [8]:
config = aai.TranscriptionConfig(
    speaker_labels=True,
    punctuate=True,
    format_text=True
)
transcriber = aai.Transcriber(config=config)

print("Uploading and transcribing with diarization...")
transcript = transcriber.transcribe(audio_file)

Uploading and transcribing with diarization...


**Extract utterances with speaker labels:**

In [None]:
utterances = transcript.utterances
segments = []

for utt in utterances:
    # Timestamps are in milliseconds; the later cells read "start_ms" and "end_ms".
    segments.append({
        "speaker": utt.speaker,
        "text": utt.text.strip(),
        "start_ms": utt.start,
        "end_ms": utt.end
    })
    print(f"{utt.start}-{utt.end} ms | Speaker {utt.speaker}: {utt.text}")
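
The conversion can also be wrapped in a small function so it is easy to check without calling the API. This is a sketch only: `utterances_to_segments` is a hypothetical helper, and the demo objects are stand-ins that merely mimic the `speaker`, `text`, `start`, and `end` attributes read in the cell above.

```python
from types import SimpleNamespace


def utterances_to_segments(utterances):
    """Convert utterance-like objects into the dicts used by the dubbing loop."""
    segments = []
    for utt in utterances or []:
        text = (utt.text or "").strip()
        if not text:
            # Nothing to synthesize for an empty utterance.
            continue
        segments.append({
            "speaker": utt.speaker,
            "text": text,
            "start_ms": int(utt.start),
            "end_ms": int(utt.end),
        })
    return segments


# Stand-in data for illustration; real utterances come from the transcript.
demo = [
    SimpleNamespace(speaker="A", text=" Hello there. ", start=250, end=1400),
    SimpleNamespace(speaker="B", text="   ", start=1500, end=1600),
]
print(utterances_to_segments(demo))
```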

**Generate Dubbed Speech:**

In [20]:
# Map each diarization label to an ElevenLabs voice ID.
speaker_voices = {
    "A": "pNInz6obpgDQGcFmaJgB",
    "B": "TxGEqnHWrfWFTfGW9XjX",
    "SPEAKER_0": "pNInz6obpgDQGcFmaJgB",
    "SPEAKER_1": "TxGEqnHWrfWFTfGW9XjX",
}
client = ElevenLabs(api_key=ELEVEN_KEY)

In [None]:
segments

**Create a silent base track spanning all segments:**

In [32]:
valid_segments = [s for s in segments if "end_ms" in s]
total_duration = int(max(s["end_ms"] for s in valid_segments))
final_audio = AudioSegment.silent(duration=total_duration)
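
If no segment carries an `end_ms` key, `max()` receives an empty sequence and fails with a message that does not point at the cause. A small sketch of a guard follows; `total_duration_ms` is a hypothetical helper name.

```python
def total_duration_ms(segments):
    """Return the latest end time in milliseconds across all timed segments."""
    ends = [int(s["end_ms"]) for s in segments if "end_ms" in s]
    if not ends:
        raise ValueError(
            "No segment has an 'end_ms' key; check the transcription step."
        )
    return max(ends)


# Illustrative values only.
print(total_duration_ms([{"end_ms": 1400}, {"end_ms": 900.7}]))
```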

In [None]:
for seg in segments:
    speaker = seg["speaker"]
    text = seg["text"].strip()
    start_ms = int(seg["start_ms"])
    end_ms = int(seg["end_ms"])
    duration_ms = max(300, end_ms - start_ms)
    voice_id = speaker_voices.get(speaker, "pNInz6obpgDQGcFmaJgB")

    print(f"🎙 Generating AI voice for Speaker {speaker} "
          f"({start_ms/1000:.2f}-{end_ms/1000:.2f}s): {text}")

    try:
        audio_stream = client.text_to_speech.convert(
            voice_id=voice_id,
            model_id="eleven_multilingual_v2",
            text=text
        )
        audio_bytes = b"".join(audio_stream)
        seg_audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3")

        ai_dur = len(seg_audio)
        if ai_dur < duration_ms:
            seg_audio += AudioSegment.silent(duration=duration_ms - ai_dur)
        elif ai_dur > duration_ms:
            seg_audio = seg_audio[:duration_ms]

        final_audio = final_audio.overlay(seg_audio, position=start_ms)

    except Exception as e:
        # The base track is already silent here, so a failed segment is simply skipped.
        print(f"Could not generate voice for segment at {start_ms} ms: {e}")
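
The loop above trims any generated clip that is longer than its slot, which can cut a sentence off mid-word. One alternative is to speed the clip up slightly before trimming. The sketch below only computes the playback factor; `speed_factor` and its `max_speed` cap are assumptions, not part of the original notebook.

```python
def speed_factor(ai_dur_ms, slot_ms, max_speed=1.5):
    """Return the playback speed needed to fit a clip into its time slot.

    A factor of 1.0 means the clip already fits. The result is capped at
    `max_speed` so the speech stays intelligible; a clip that still does
    not fit at that speed would need trimming.
    """
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")
    if ai_dur_ms <= slot_ms:
        return 1.0
    return min(ai_dur_ms / slot_ms, max_speed)


# Illustrative values only: a 2.4 s clip that must fit a 2.0 s slot.
print(speed_factor(2400, 2000))
```

The factor could then be passed to a time-stretching step, for example pydub's `speedup` effect in `pydub.effects`, before the clip is overlaid on the base track.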


**Export dubbed audio:**

In [None]:
# The target path ends in .mp3, so export in MP3 format to match.
final_audio.export(dubbed_audio, format="mp3")
print(f"\n Dubbed audio (AI voices only) saved at: {dubbed_audio}")

**Merge Dubbed Audio into Original Video:**

In [None]:
video = VideoFileClip(input_video)
# Reuse the path defined earlier instead of overwriting `dubbed_audio` with a clip object.
dubbed_clip = AudioFileClip(dubbed_audio)
final_video = video.set_audio(dubbed_clip)
final_video.write_videofile(dubbed_video, codec="libx264", audio_codec="aac")