# Required libraries

### openai-whisper: Provides access to OpenAI's Whisper model for audio transcription.
### yt-dlp: Fetches audio streams from YouTube videos.
### ffmpeg-python: A Python wrapper for FFmpeg, used to process audio streams.

## 1. Download the audio of the video we want to summarize

In [1]:
import yt_dlp
import ffmpeg
import whisper
import torch
from transformers import pipeline, BartTokenizer
import pyttsx3
from tqdm import tqdm
import time
from pydub import AudioSegment

In [2]:
video_url = 'https://www.youtube.com/watch?v=zuzW7Ipoe6U'  # video URL

In [3]:
# Function to extract audio from a YouTube video
def extract_audio_stream(video_url):
    # Set yt-dlp options
    ydl_opts = {
        'format': 'bestaudio/best',  # Best available audio format
        'quiet': True,  # Suppress verbose output
        'outtmpl': 'audio.%(ext)s',  # Output template for audio file
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',  # Use FFmpeg to extract audio
            'preferredcodec': 'mp3',  # Preferred audio codec
            'preferredquality': '192',  # Preferred audio quality
        }],
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        # Download the audio stream
        ydl.download([video_url])

    # Return the downloaded audio file path
    return 'audio.mp3'

In [4]:
# Example usage
audio_file = extract_audio_stream(video_url)
print(f'Audio File: {audio_file}')

Audio File: audio.mp3                                      
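
The function above returns a hard-coded `'audio.mp3'`, which stays correct only while `outtmpl` and `preferredcodec` keep their current values. A minimal sketch of deriving the name from those two settings instead, assuming the template uses only the `%(ext)s` field (real yt-dlp templates can reference many more fields). `expected_audio_path` is a hypothetical helper, not part of yt-dlp:

```python
def expected_audio_path(outtmpl: str, codec: str) -> str:
    """Fill the %(ext)s placeholder of an output template with the post-processed codec."""
    # Assumption: the template contains no fields other than %(ext)s.
    return outtmpl % {"ext": codec}


print(expected_audio_path("audio.%(ext)s", "mp3"))  # audio.mp3
```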


## 1.1 Test the downloaded audio file

In [5]:
#from IPython.display import Audio

# Play the audio file in the notebook
#Audio("audio.mp3")
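
When playback in the notebook is not an option, a quick file check can stand in for the listening test. A minimal sketch using only the standard library; `audio_file_ready` is a hypothetical helper, and a non-empty file is only a rough signal, not proof that the audio is valid:

```python
from pathlib import Path


def audio_file_ready(path: str) -> bool:
    """Return True if the path points to an existing, non-empty file."""
    file = Path(path)
    return file.is_file() and file.stat().st_size > 0


if not audio_file_ready("audio.mp3"):
    print("audio.mp3 is missing or empty; rerun the download step.")
```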

## 2. Transcribe the Audio Using a Preloaded OpenAI Whisper Model

In [6]:
# Function to initialize and preload the Whisper model with a progress bar
def preload_whisper_model_with_progress():
    print("Loading Whisper large model...")
    
    # Simulate progress for model loading
    progress_bar = tqdm(total=100, desc="Loading Model", bar_format="{l_bar}{bar} [ time left: {remaining} ]")
    
    # Simulated loading time, spread evenly across the progress steps
    simulated_loading_time = 1  # seconds
    loading_steps = 100  # Number of steps to simulate
    
    for _ in range(loading_steps):
        # Simulate some work being done
        time.sleep(simulated_loading_time / loading_steps)
        progress_bar.update(1)
    
    progress_bar.close()
    
    # Load the Whisper model and move it to the appropriate device
    model = whisper.load_model("large").to("cuda" if torch.cuda.is_available() else "cpu")
    print("Model loaded successfully.")
    return model

# Function to transcribe audio using the preloaded Whisper model
def transcribe_audio_with_progress(model, audio_file, language='en'):
    # Load the audio file to get its duration
    audio = AudioSegment.from_file(audio_file)
    audio_duration = len(audio) / 1000  # Duration in seconds

    # Initialize a simulated progress bar (it does not track real transcription progress)
    progress_bar = tqdm(total=100, desc="Transcribing Audio", bar_format="{l_bar}{bar} [ time left: {remaining} ]")
    
    # Estimate transcription time based on audio duration
    estimated_transcription_time = audio_duration * 0.5  # Example: assume transcription takes 50% of audio duration
    transcription_steps = 100  # Number of steps for progress bar

    for _ in range(transcription_steps):
        time.sleep(estimated_transcription_time / transcription_steps)
        progress_bar.update(1)
    
    progress_bar.close()

    print(f"Transcribing {audio_file}...")
    result = model.transcribe(audio_file, language=language)
    transcript = result['text']
    return transcript

# Preload the model with a progress bar
whisper_model = preload_whisper_model_with_progress()

# Path to the audio file downloaded in step 1
audio_file = 'audio.mp3'  # Replace with the actual path to your audio file

# Transcribe the audio using the preloaded model with progress bar
transcript = transcribe_audio_with_progress(whisper_model, audio_file)
print('Transcript:', transcript)

Loading Whisper large model...


Loading Model: 100%|██████████████████████████████████████████████████████████████████████████████ [ time left: 00:00 ]


Model loaded successfully.


Transcribing Audio: 100%|█████████████████████████████████████████████████████████████████████████ [ time left: 00:00 ]


Transcribing audio.mp3...
Transcript:  Today I will tell you an inspiring story from history. In ancient Rome, Seneca was one of the most famous philosophers of his time, who had great wealth. Despite his wealth, Seneca found inner peace and happiness by adopting the teachings of Stoic philosophy. One day, one of Seneca's students asked him, does wealth bring happiness or does happiness come from wealth? In answering this question, Seneca deeply analyzed the nature of wealth and happiness. So why did we start with Seneca's story? Because this video explores the relationship between wealth and happiness within the framework of Stoic philosophy. How do you think happiness and wealth are related? In this video, you will learn the answers to such questions and how to achieve true happiness. Have you ever thought that wealth brings happiness? Or on the contrary, have you known people who are rich but unhappy? To understand this complex relationship between wealth and happiness, we will take
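
Because the progress bar in `transcribe_audio_with_progress` sleeps for its full estimated time before `model.transcribe` runs, the wait it adds grows with the audio length. The arithmetic can be sketched as follows; `simulated_wait` is a hypothetical helper, and the 600-second duration is an illustrative value, not the length of the video used here:

```python
def simulated_wait(audio_seconds: float, ratio: float = 0.5, steps: int = 100):
    """Return (total_sleep, sleep_per_step) in seconds for the simulated progress bar."""
    total = audio_seconds * ratio
    return total, total / steps


total, per_step = simulated_wait(600)
print(f"Sleeps {total:.0f} s in total, {per_step:.1f} s per step")
```

For long recordings, lowering the ratio or dropping the simulated bar would avoid this extra delay.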

## 3. Summarize the Transcript Using Hugging Face Transformers

## 3.1 Initialization

In [7]:
# Load the pre-trained summarization pipeline and its tokenizer
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

In [8]:
# Initialize the text-to-speech engine
tts_engine = pyttsx3.init()

In [9]:
# Set properties for the TTS engine
tts_engine.setProperty('rate', 150)  # Speed of speech
tts_engine.setProperty('volume', 0.9)  # Volume (0.0 to 1.0)

# Change voice (if multiple voices are available)
voices = tts_engine.getProperty('voices')
tts_engine.setProperty('voice', voices[0].id)  # Change index for different voices

## 3.2 Functions to split the transcript when it exceeds BART's 1024-token limit

In [10]:
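# Function to split transcript into chunks
def split_transcript(transcript, max_tokens=1024):
    # Tokenize without special tokens so only the text itself is counted
    tokens = tokenizer(transcript, return_tensors='pt', truncation=False,
                       add_special_tokens=False)['input_ids'][0]
    total_tokens = len(tokens)

    # Leave room for the start and end tokens that are added when each chunk
    # is tokenized again by the summarizer; without this headroom a chunk can
    # exceed the model limit and fail with "index out of range in self"
    chunk_size = max_tokens - 2

    # Split into chunks
    chunks = []
    for i in range(0, total_tokens, chunk_size):
        chunk_tokens = tokens[i:i+chunk_size]
        chunk_text = tokenizer.decode(chunk_tokens, skip_special_tokens=True)
        chunks.append(chunk_text)

    return chunks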
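
Each decoded chunk is tokenized again inside the summarization pipeline, which adds start and end tokens, so a chunk cut at exactly the model limit can end up over it. A minimal sketch of the slicing arithmetic on a plain list of token IDs, assuming two special tokens per sequence (as with BART's `<s>` and `</s>`). `chunk_token_ids` is a hypothetical helper, and the 2986-ID input is an illustrative length:

```python
def chunk_token_ids(token_ids, max_tokens=1024, num_special=2):
    """Slice token IDs so each chunk fits max_tokens once special tokens are re-added."""
    step = max_tokens - num_special
    if step <= 0:
        raise ValueError("max_tokens must exceed the number of special tokens")
    return [token_ids[i:i + step] for i in range(0, len(token_ids), step)]


chunks = chunk_token_ids(list(range(2986)))
# Lengths after re-adding two special tokens per chunk
print([len(c) + 2 for c in chunks])  # [1024, 1024, 944]
```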

In [11]:
# Function to summarize the transcript
def summarize_transcript(transcript):
    # Tokenize the transcript to get the number of tokens
    tokens = tokenizer(transcript, return_tensors='pt', truncation=False)['input_ids'][0]
    total_tokens = len(tokens)

    # Check if the transcript is longer than the model's token limit
    if total_tokens > 1024:
        print(f"Transcript is too long ({total_tokens} tokens). Splitting into chunks.")
        transcript_chunks = split_transcript(transcript, max_tokens=1024)
        summaries = []
        for index, chunk in enumerate(transcript_chunks):
            try:
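                # truncation=True guards against chunks that exceed the model limit once re-tokenized
                summary = summarizer(chunk, max_length=150, min_length=50, do_sample=False, truncation=True)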
                summaries.append(summary[0]['summary_text'])
            except IndexError as e:
                print(f"Error summarizing chunk {index}: {e}")
        final_summary = ' '.join(summaries)
    else:
        try:
            print(f"Transcript is within token limit ({total_tokens} tokens). Summarizing directly.")
            summary = summarizer(transcript, max_length=200, min_length=50, do_sample=False)
            final_summary = summary[0]['summary_text']
        except IndexError as e:
            print(f"Error summarizing the transcript: {e}")
            final_summary = ""

    return final_summary

In [12]:
# Function to read the text aloud
def read_text_aloud(text):
    tts_engine.say(text)
    tts_engine.runAndWait()

## 3.3 Summarize

In [13]:
# Summarize the transcript
final_summary = summarize_transcript(transcript)
print('Final Summary:', final_summary)

Transcript is too long (2988 tokens). Splitting into chunks.
Error summarizing chunk 0: index out of range in self
Error summarizing chunk 1: index out of range in self
Final Summary:  wealth can make our lives more comfortable and secure, but it is not the source of happiness. Helping others and contributing to society can increase an individual's happiness. Gratitude means appreciating what we have and being grateful for it. Remember, wealth is a tool and true happiness comes from inner peace.
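
In the run above, chunks 0 and 1 failed, so the final summary reflects only the last of the three chunks (2988 tokens split every 1024 tokens gives three chunks). A minimal sketch of making that loss visible when joining the results, assuming failed chunks are recorded as `None`. `join_summaries` is a hypothetical helper, and the sample text is a placeholder rather than real model output:

```python
def join_summaries(chunk_summaries):
    """Join successful chunk summaries and report how many chunks were covered."""
    kept = [s.strip() for s in chunk_summaries if s]
    return " ".join(kept), len(kept), len(chunk_summaries)


text, covered, total = join_summaries([None, None, " Wealth is a tool."])
print(f"Summary covers {covered} of {total} chunks: {text}")
```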


In [14]:
# Read the final summary aloud
read_text_aloud(final_summary)