# Project for extra credit in Adpro

In [2]:
#imports
import whisper
from pytube import YouTube
from IPython.display import Markdown, display
import os
import warnings
from langchain_openai import ChatOpenAI

warnings.filterwarnings("ignore")

### 5a
Install Whisper in a virtual environment and write a Python function that receives a file or a YouTube link and generates a text extraction. The function should also take a model name, defaulting to the "small" model, which is also Whisper's default. (The model determines how much memory, GPU, etc. is needed, as the models have different hardware requirements.)

In [3]:

def text_extraction(file, is_link=False, model_name="small", display_text=False):
    """
    Extracts text from an audio file or YouTube video link using the specified Whisper model.

    Arguments
    ---------
    file : str
        The path to the audio file or the YouTube video link.
    is_link : bool, optional
        Whether the input is a YouTube video link. The default is False.
    model_name : str, optional
        The name of the Whisper model to use for transcription. The default is "small".
    display_text : bool, optional
        Whether to display the extracted text. The default is False.
    
    Returns
    -------
    str
        The extracted text.
    """
    try:
        # Load the appropriate Whisper model
        model = whisper.load_model(model_name)
        
        if is_link:
            yt = YouTube(file)
            audio_stream = yt.streams.filter(only_audio=True).first()
            if not audio_stream:
                raise Exception("No audio stream found in the video.")
            
            audio_file = audio_stream.download(filename='audio.mp4')
            file = audio_file  # Update file variable to the downloaded audio file

        # Perform transcription
        result = model.transcribe(file)

        if display_text:
            display(Markdown(result["text"]))

        # Clean up the downloaded audio file if it's a link
        if is_link:
            os.remove(audio_file)

        return result["text"]

    except Exception as e:
        print(f"An error occurred: {e}")
        return ""

# Example usage with a YouTube link
ted_talk_url = "https://www.youtube.com/watch?v=z7e7gtU3PHY"
text = text_extraction(ted_talk_url, is_link=True, model_name="small", display_text=True)

 When I first told my friends that I was doing a talk on a study method that I use, I could see the collective look of disgust that slept across their faces as they processed what I just told them. So bear with me as I firmly believe that the Pomodoro method has the power to change your life. My typical cycle of studying used to start out determined. I would come home, sit down at my desk, and do a couple of worksheets. The only problem was that productiveness only lasted for an hour as I would easily get distracted. I would usually spend a couple hours on my phone and then I would snap back into determination but find myself getting burned out once again as the minutes ticked away. I would work until I physically couldn't anymore. I'd pass out, utterly exhausted. When my rigorous course choice this year, I had made myself promise that I would be productive. I had to. I had to succeed and yet I failed to do that every single day. I struggled to stay afloat, fatigued, stressed, and strained and I snapped as a result. And quite truthfully, I was disappointed, disappointed with myself. Then one day I came across a video. It was a video telling me how to study better and I was intrigued by one specific tip, the Pomodoro method. So what is it exactly? Well you start out by deciding on a task and estimating the amount of time that it will take you. Take for instance this AP World Chapter outline. I estimate that it will take me four hours of work, give or take. When instead of thinking about the outline as four hours of work, I'm going to think about it in terms of 25 minute increments or Pomodoros. So this outline would in theory take me eight Pomodoros. The next step is to work for those 25 minutes with absolutely no distractions or you have to restart the Pomodoro. But after that hyper focused work, you get to reward yourself with a five minute break which serves to recharge and refresh you in preparation for the next Pomodoro. 
Four cycles of this pattern of 25, five minutes and then you get to take a long break, 15 to 30 minutes. For myself, I typically still try to stay off my phone during these breaks and make some coffee, take a short walk or when I want to feel super productive, I'll do chores. I know. Shocker. This method is actually developed in the 90s by Francisco Cirillo who named the system Pomodoro which means tomato in Italian after this 25 minute kitchen timer that he used to track his work. It is important to know that although he developed the system for a 25, five minute pattern, the Pomodoro is a fluid system. It's designed to help you and help you with your work. For myself, I stick to the traditional 25, five minute pattern when I'm doing worksheets or studying for tests but for longer and more time consuming assignments like let's say projects or essays, I choose to work for much longer increments that take shorter breaks. So here I am now. I'm still not the perfect student and I want to iterate that but the Pomodoro has changed me. It's changed the way I think and act about my work. When needed, I can spend a full day simply working as I'm just recharged and kept simulated through the whole day. With a timer constantly ticking, I find myself working quickly in order to achieve and accomplish those goals through each 25 minute increment. And quite truthfully, it just feels so much more rewarding and fulfilling. Being able to check things off after the other, watching your pile of work go down, knowing that you accomplished something that day instead of not to call you out but wasting two hours on Netflix. So now it's my turn to ask you, are you as efficient as you can me? Are you productive or does your time seem to just slip away? Do you complete your work or is it scraped together at the last minute? The Pomodoro is a fluid system designed to help you produce higher quality work in a shorter amount of time. 
But whatever method, I encourage you to think about your time differently. To set goals for yourself and strive to meet them. To set aside the constant distractions and focus on your task at hand. You never know how much time you really have until you start to use it and it looks like my break is over. Thank you. Thank you.
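
In `text_extraction` above, the downloaded `audio.mp4` is removed only when transcription succeeds; if `model.transcribe` raises, the file stays on disk. The following is a minimal sketch of one way to guarantee cleanup, using a temporary directory. The `download` and `transcribe` callables and the function name are hypothetical stand-ins for the pytube and Whisper calls, so the snippet runs without either library.

```python
import os
import tempfile


def transcribe_with_cleanup(download, transcribe):
    """Download audio into a temporary directory, transcribe it, always clean up.

    Both arguments are hypothetical stand-ins (assumptions of this sketch):
      download(target_dir) -> path of the audio file it wrote
      transcribe(path)     -> transcript text
    """
    # The directory and everything in it are deleted when the block exits,
    # whether transcribe() returns normally or raises.
    with tempfile.TemporaryDirectory() as tmp_dir:
        audio_path = download(tmp_dir)
        return transcribe(audio_path)


def fake_download(target_dir):
    """Stand-in for the pytube download: writes a dummy file."""
    path = os.path.join(target_dir, "audio.mp4")
    with open(path, "wb") as handle:
        handle.write(b"not real audio")
    return path


# Stand-in for model.transcribe: just reports the file name it was given.
text = transcribe_with_cleanup(fake_download, lambda path: os.path.basename(path))
```

With real libraries, `download` would wrap `audio_stream.download(output_path=target_dir)` and `transcribe` would wrap `model.transcribe(path)["text"]`; that wiring is left out here because it is untested in this sketch.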

In [4]:
# store text in a .txt file
with open("transcript.txt", "w") as file:
    file.write(text)
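
The function above relies on the caller to pass `is_link=True` for YouTube input. As an optional refinement, the flag could be inferred from the input string. The sketch below is an assumption, not part of the assignment: the helper name `looks_like_youtube_link` and the host list are hypothetical, and the host list is not exhaustive.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube. This set is an assumption for the sketch.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def looks_like_youtube_link(source: str) -> bool:
    """Return True if `source` is an http(s) URL on a known YouTube host."""
    parsed = urlparse(source)
    # `hostname` is lower-cased and excludes any port; it is None for plain paths.
    return parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS


is_link = looks_like_youtube_link("https://www.youtube.com/watch?v=z7e7gtU3PHY")
```

A local path such as `transcript_audio.mp3` has no `http` or `https` scheme, so the helper returns `False` for it.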

### 5b
Make another function that receives a text file and uses gpt-3.5-turbo for summarization, with the model as an option: https://platform.openai.com/docs/models/overview. You can make it support only gpt-3.5-turbo and gpt-4-turbo, but use only 3.5, as 4 is far too expensive!

In [5]:
def GPT_summarize(file_path, model='gpt-3.5-turbo', display_text=False):
    """
    Summarizes the main topic of a text file using a specified OpenAI model.

    Arguments
    ---------
    file_path : str
        Path to the text file to be summarized.
    model : str, optional
        The OpenAI chat model to use for summarization, e.g. 'gpt-3.5-turbo' or 'gpt-4-turbo'.
        The default is 'gpt-3.5-turbo'.
    display_text : bool, optional
        If True, displays the summary using Markdown format. The default is False.

    Returns
    -------
    str
        The summary of the text, in one or two sentences.
    """

    # Read the text from the file
    with open(file_path, 'r') as file:
        text = file.read()

    prompt = f"Summarize the main topic of this text in one or two sentences:\n\n{text}"

    # Access ChatOpenAI to summarize the text
    llm = ChatOpenAI(temperature=0.1, model = model)
    summary = llm.invoke(prompt)

    # Display the summarized text if requested
    if display_text:
        display(Markdown(summary.content))

    return summary.content

# Example usage

# main topic
print("Summary:")
summary = GPT_summarize("transcript.txt", display_text=True)

Summary:


The main topic of this text is the Pomodoro method, a study technique that involves working in focused 25-minute intervals with short breaks in between. The author shares their personal experience with using the Pomodoro method and encourages readers to consider using it to improve their productivity and efficiency.

### 5c
The summary should be one of "main_topic", "abridged", or "descriptive" (or you can find better names). "main_topic": the output should be just one or two sentences showing the main topic of the discussion in the text. "abridged" should be 5-10 bullet points, and "descriptive" should have a limit of something like 3k to 3.5k characters. Don't focus on the quality of the output, just on the code framework that builds the prompt from these options! Otherwise a 30-minute coding exercise will turn into hours of research!

In [6]:

from langchain_openai import ChatOpenAI

def GPT_summarize(file_path, summary_style='main_topic', model='gpt-3.5-turbo', display_text=False):
    """
    Summarizes the contents of a text file using a specified OpenAI model.

    Arguments:
    ---------
    file_path : str
        Path to the text file to be summarized.
    summary_style : str, optional
        Type of summary needed: 'main_topic', 'abridged', or 'descriptive'.
        Default is 'main_topic'.
    model : str, optional
        Model to use for the AI summarization, options are 'gpt-3.5-turbo' or 'gpt-4-turbo'.
        Default is 'gpt-3.5-turbo'.
    display_text : bool, optional
        If True, displays the summarized text using Markdown format. Default is False.

    Returns:
    --------
    str
        The summarized text. If an invalid summary style is provided, a ValueError is raised.

    Notes:
    -----
    The function relies on the `langchain_openai.ChatOpenAI` class to interface with OpenAI's API.
    """

    # Read the text from the file
    with open(file_path, 'r') as file:
        text = file.read()

    # Generate the prompt based on the desired summary style
    if summary_style == 'main_topic':
        prompt = f"Summarize the main topic of this text in one or two sentences:\n\n{text}"
    elif summary_style == 'abridged':
        prompt = f"Provide an abridged summary of this text in 5-10 bullet points:\n\n{text}"
    elif summary_style == 'descriptive':
        prompt = f"Write a detailed summary of this text with a character limit of 3000 to 3500 characters:\n\n{text}"
    else:
        raise ValueError("Invalid summary style specified. Choose 'main_topic', 'abridged', or 'descriptive'.")

    # Access ChatOpenAI to summarize the text
    llm = ChatOpenAI(temperature=0.1, model = model)
    summary = llm.invoke(prompt)

    # Display the summarized text if requested
    if display_text:
        display(Markdown(summary.content))

    return summary.content

# Example usage

# main topic
print("Main Topic Summary:")
summary = GPT_summarize("transcript.txt", summary_style="main_topic", display_text=True)

# bullet points
print("\nAbridged Summary:")
summary = GPT_summarize("transcript.txt", summary_style="abridged", display_text=True)

# descriptive
print("\nDescriptive Summary:")
summary = GPT_summarize("transcript.txt", summary_style="descriptive", display_text=True)

Main Topic Summary:


The main topic of this text is the Pomodoro method, a study technique that involves working in focused 25-minute intervals followed by short breaks. The author shares their personal experience with the method and encourages readers to consider using it to improve productivity and efficiency in their own work.


Abridged Summary:


- The author struggled with staying productive while studying until they discovered the Pomodoro method
- The Pomodoro method involves working for 25 minutes with no distractions, followed by a 5-minute break
- After four cycles of 25 minutes of work and 5-minute breaks, a longer break of 15-30 minutes is taken
- The Pomodoro method was developed in the 90s by Francisco Cirillo and is named after a kitchen timer
- The author finds the Pomodoro method helps them stay focused and productive throughout the day
- The author encourages others to try the Pomodoro method to increase productivity and efficiency
- The Pomodoro method can be adapted for different types of tasks, such as shorter increments for worksheets and longer increments for projects
- The author believes that using time efficiently and setting goals can lead to higher quality work and a sense of accomplishment
- The author challenges readers to think about their time differently and strive to be more productive
- The author concludes by emphasizing the importance of using time effectively and getting back to work after their break


Descriptive Summary:


The author begins by recounting their struggle with productivity and focus while studying. Despite their initial determination, they found themselves easily distracted and burnt out after short periods of time. This led to feelings of fatigue, stress, and disappointment in themselves. However, everything changed when they discovered the Pomodoro method through a video. The Pomodoro method involves breaking down tasks into 25-minute intervals, known as Pomodoros, followed by short breaks. After four Pomodoros, a longer break is taken. This method was developed by Francisco Cirillo in the 90s and is designed to help individuals stay focused and productive.

The author explains how they apply the Pomodoro method to their study routine, estimating the time needed for each task and working in focused intervals with no distractions. They use the short breaks to recharge and refresh themselves, often engaging in activities like making coffee or doing chores. While they typically stick to the traditional 25, five-minute pattern for worksheets and test preparation, they adjust the intervals for longer assignments like projects or essays.

The author reflects on how the Pomodoro method has transformed their approach to work. They now find themselves more efficient and able to work consistently throughout the day. The ticking timer motivates them to work quickly and achieve their goals within each Pomodoro. They find satisfaction in checking off tasks and reducing their workload, rather than wasting time on distractions like Netflix.

In conclusion, the author encourages readers to consider their own productivity and time management. They emphasize the importance of setting goals, eliminating distractions, and focusing on tasks to produce higher quality work in less time. The Pomodoro method serves as a tool to help individuals make the most of their time and achieve their goals. The author leaves the audience with a challenge to rethink their approach to time and work, highlighting the benefits of using methods like the Pomodoro technique to enhance productivity and efficiency.
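
The `if`/`elif` chain in `GPT_summarize` can also be written as a lookup table, which keeps the style names and their instructions in one place. This is a minimal sketch of that alternative, not a replacement for the function above; `PROMPT_TEMPLATES` and `build_prompt` are hypothetical names, and the instruction strings are copied from the prompts already used in this notebook. It only builds the prompt and makes no API call.

```python
# Style name -> instruction text (copied from the prompts used above).
PROMPT_TEMPLATES = {
    "main_topic": "Summarize the main topic of this text in one or two sentences:",
    "abridged": "Provide an abridged summary of this text in 5-10 bullet points:",
    "descriptive": (
        "Write a detailed summary of this text with a character limit "
        "of 3000 to 3500 characters:"
    ),
}


def build_prompt(text, summary_style="main_topic"):
    """Return the full prompt for `text`, or raise ValueError for an unknown style."""
    try:
        instruction = PROMPT_TEMPLATES[summary_style]
    except KeyError:
        valid = ", ".join(sorted(PROMPT_TEMPLATES))
        raise ValueError(
            f"Invalid summary style {summary_style!r}. Choose one of: {valid}."
        ) from None
    return f"{instruction}\n\n{text}"


prompt = build_prompt("Some transcript text.", summary_style="abridged")
```

Adding a new style then means adding one dictionary entry, and the error message lists the valid names automatically.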

### 5d
Make a final function (or class) that puts everything together: receives as inputs the audio and all other options for whisper and the LLM and generates a text summary.

In [7]:
class AudioToSummary:

    def __init__(self, link, is_link, model_name_whisper='small', model_name_llm='gpt-3.5-turbo'):
        """
        Initialize the AudioToSummary instance with links and model specifications.

        Parameters
        ----------
        link : str
            The URL or path to the audio file to be processed. This could be a local path or a URL to a YouTube video.
        is_link : bool
            A flag to indicate whether the 'link' parameter is a URL to a YouTube video or a local file path.
        model_name_whisper : str, optional
            The name of the Whisper model to use for transcription. Defaults to 'small'.
        model_name_llm : str, optional
            The name of the language model to use for generating summaries. Defaults to 'gpt-3.5-turbo'.

        Attributes
        ----------
        text : str
            The text extracted from the audio file or YouTube video.
        _summary : str or None
            Cached value of the main topic summary. Calculated lazily.
        _description : str or None
            Cached value of the descriptive summary. Calculated lazily.
        _abridged_summary : str or None
            Cached value of the abridged summary. Calculated lazily.

        Notes
        -----
        The attributes `_summary`, `_description`, and `_abridged_summary` are lazy-loaded, meaning
        they are only computed when first accessed and if `text` is successfully extracted.
        """
        self.model_name_whisper = model_name_whisper
        self.link = link
        self.model_name_llm = model_name_llm
        self.is_link = is_link
        self.text = self.text_extraction(self.link, is_link)
        self._summary = None
        self._description = None
        self._abridged_summary = None

    @property
    def summary(self):
        """
        Retrieve or compute the main topic summary of the text.

        Returns
        -------
        str
            The main topic summary of the text. If `text` is not loaded, it returns None.
        """
        if self._summary is None and self.text:  # Generate summary only if text is loaded and summary not already generated
            self._summary = self.GPT_summarize(self.text, model=self.model_name_llm)
        return self._summary

    @property
    def description(self):
        """
        Retrieve or compute the descriptive summary of the text.

        Returns
        -------
        str
            The descriptive summary of the text. If `text` is not loaded, it returns None.
        """
        if self._description is None and self.text:  # Generate descriptive summary only if text is loaded and not already generated
            self._description = self.GPT_summarize(self.text, summary_style='descriptive', model=self.model_name_llm)
        return self._description

    @property
    def abridged_summary(self):
        """
        Retrieve or compute the abridged summary of the text.

        Returns
        -------
        str
            The abridged summary of the text. If `text` is not loaded, it returns None.
        """
        if self._abridged_summary is None and self.text:  # Generate abridged summary only if text is loaded and not already generated
            self._abridged_summary = self.GPT_summarize(self.text, summary_style='abridged_summary', model=self.model_name_llm)
        return self._abridged_summary


    def text_extraction(self, file, is_link=False):
        """
        Extracts text from an audio file or YouTube video link using the specified Whisper model.

        Parameters:
        -----------
        file : str
            The path to the audio file or the YouTube video link.
        is_link : bool
            Specifies whether the input is a YouTube video link. Defaults to False.

        Returns:
        --------
        str
            The extracted text if successful, otherwise returns an empty string if an error occurs.

        Notes:
        -----
        Handles the downloading and deletion of the audio file if the source is a YouTube link.
        """

        try:
            model = whisper.load_model(self.model_name_whisper)
            if is_link:
                yt = YouTube(file)
                audio_stream = yt.streams.filter(only_audio=True).first()
                if not audio_stream:
                    raise Exception("No audio stream found in the video.")

                audio_file = audio_stream.download(filename='audio.mp4')
                file = audio_file  # Update file variable to the downloaded audio file

            result = model.transcribe(file)
            text = result["text"]

            if is_link:
                os.remove(audio_file)

            return text

        except Exception as e:
            print(f"An error occurred: {e}")
            return ""

    def GPT_summarize(self, text, summary_style='main_topic', model='gpt-3.5-turbo'):
        """
        Summarizes the text using a specified OpenAI model.

        Parameters:
        -----------
        text : str
            The text to be summarized.
        summary_style : str, optional
            The type of summary required: 'main_topic', 'abridged_summary', or 'descriptive'.
            Defaults to 'main_topic'.
        model : str, optional
            The model to use for the AI summarization, with options including 'gpt-3.5-turbo' or 'gpt-4-turbo'.
            Defaults to 'gpt-3.5-turbo'.

        Returns:
        --------
        str
            The summarized text. Raises a ValueError if an invalid summary style is provided.

        Notes:
        -----
        Leverages the OpenAI's GPT model for generating summaries based on the specified style.
        """

        # Generate the prompt based on the desired summary style
        if summary_style == 'main_topic':
            prompt = f"Summarize the main topic of this text in one or two sentences:\n\n{text}"
        elif summary_style == 'abridged_summary':
            prompt = f"Provide an abridged summary of this text in 5-10 bullet points:\n\n{text}"
        elif summary_style == 'descriptive':
            prompt = f"Write a detailed summary of this text with a character limit of 3000 to 3500 characters:\n\n{text}"
        else:
            raise ValueError("Invalid summary style specified. Choose 'main_topic', 'abridged_summary', or 'descriptive'.")

        # Access ChatOpenAI to summarize the text
        llm = ChatOpenAI(temperature=0.1, model = model)
        summary = llm.invoke(prompt)

        return summary.content
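
The three properties in `AudioToSummary` each hand-roll the same "compute once, then reuse" logic. The standard library's `functools.cached_property` expresses that pattern directly. The sketch below is an illustration under assumptions: `LazySummaries` is a hypothetical class, and `summarize` is a hypothetical callable standing in for the `GPT_summarize` method, so the example runs without an API key. Unlike the class above, it does not skip the call when `text` is empty.

```python
from functools import cached_property


class LazySummaries:
    """Holds a text and computes each summary style at most once."""

    def __init__(self, text, summarize):
        self.text = text
        # Hypothetical callable: summarize(text, summary_style) -> str
        self._summarize = summarize

    @cached_property
    def summary(self):
        # Runs on first access only; the result is then stored on the instance.
        return self._summarize(self.text, "main_topic")

    @cached_property
    def description(self):
        return self._summarize(self.text, "descriptive")

    @cached_property
    def abridged_summary(self):
        return self._summarize(self.text, "abridged_summary")


calls = []


def fake_summarize(text, summary_style):
    """Stand-in for the LLM call: records the request and echoes the style."""
    calls.append(summary_style)
    return f"[{summary_style}] {text[:20]}"


lazy = LazySummaries("The Pomodoro method is a study technique.", fake_summarize)
first = lazy.summary
second = lazy.summary  # served from the cache, no second call
```

After the two accesses, `calls` contains a single `"main_topic"` entry, which is the behavior the `_summary is None` checks implement by hand.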

In [8]:
# Example 1
processor = AudioToSummary(link ="https://www.youtube.com/watch?v=z7e7gtU3PHY",is_link=True)
print('summary: ', processor.summary)
print()
print('description: ', processor.description)
print()
print('Abridged summary: ', processor.abridged_summary)

summary:  The text discusses the Pomodoro method as a study technique that can help improve productivity and efficiency. The author shares their personal experience with using the method and encourages readers to consider their own time management habits in order to achieve their goals.

description:  The author begins by recounting their struggle with productivity and focus while studying. They describe a familiar cycle of starting out determined, only to quickly become distracted and burned out, leading to feelings of fatigue, stress, and disappointment. However, everything changed when they discovered the Pomodoro method, a time management technique developed by Francisco Cirillo in the 90s.

The Pomodoro method involves breaking down tasks into 25-minute intervals, known as Pomodoros, followed by short breaks. After four Pomodoros, a longer break of 15-30 minutes is taken. The author explains how they apply this method to their study routine, focusing intensely for 25 minutes, then

In [9]:
#Example 2
Veritasium = AudioToSummary(link ="https://www.youtube.com/watch?v=A5w-dEgIU1M",is_link=True)
print('summary: ', Veritasium.summary)
print()
print('description: ', Veritasium.description)
print()
print('Abridged summary: ', Veritasium.abridged_summary)

summary:  Physicists and mathematicians have revolutionized the financial industry by using equations to model market dynamics and price derivatives accurately. This has led to the creation of multi-trillion dollar industries and has allowed for the development of strategies to beat the market, as seen with the success of Jim Simons and the Medallion Fund. Their work has also challenged the efficient market hypothesis and provided new insights into risk management and market stability.

description:  The text discusses the impact of a single equation on the financial industry, specifically the Black-Scholes-Merton equation, which revolutionized the pricing of derivatives. The equation, derived by Fisher Black, Myron Scholes, and Robert Merton, provided a way to accurately price options and opened up a new way to hedge against risks in the market. The text explores the history of derivatives, from the earliest known options bought by the Greek philosopher Thales of Miletus to the modern