<a href="https://colab.research.google.com/github/cclljj/Google_colab_ipynb/blob/master/colab_notebook_YouTube_transcription.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summarization of YouTube audio content

The following steps are used to generate a summary of a specified YouTube video: 

1. Pytube is used to extract the audio track from the video. 
2. OpenAI Whisper is employed to transcribe the audio into text. 
3. The text is summarized using the OpenAI ChatGPT API.

We install several Python libraries with the pip package manager. Pytube downloads the audio stream of a YouTube video, and Pydub processes the downloaded audio file. The OpenAI library provides access to the OpenAI API, which this notebook uses to call the ChatGPT model (gpt-3.5-turbo) for summarization. OpenAI-Whisper is OpenAI's open-source speech recognition model; it runs locally and transcribes the audio into text.

In [2]:
# Install the Pytube library for downloading YouTube videos
!pip install -q --upgrade pytube

# Install the Pydub library for working with audio files
!pip install -q --upgrade pydub

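# Install the OpenAI library for accessing the OpenAI API (ChatGPT models).
# Pinned below 1.0 because the code below uses openai.ChatCompletion, which was removed in openai 1.0.
!pip install -q --upgrade "openai<1"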

# Install the OpenAI-Whisper library for local speech-to-text transcription
!pip install -q --upgrade openai-whisper

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m57.2/57.2 KB[0m [31m4.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m70.1/70.1 KB[0m [31m6.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m36.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m264.6/264.6 KB[0m [31m33.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m158.8/158.8 KB[0m [31m21.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m114.2/114.2 KB[0m [31m15.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m792.9/792.9 KB[0m [31m43.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) .

First, we set up the necessary environment for a Python program that uses these libraries and APIs.

The code imports pytube for downloading YouTube videos, pydub for working with audio files, whisper for speech recognition (transcribing audio into text), and re for working with regular expressions.

The code also sets the OPENAI_API_KEY environment variable using %env so that it can be used to authenticate API requests to OpenAI. Then, it sets the OpenAI API key and organization using the openai library.

In [3]:
# Import the necessary libraries
from pytube import YouTube    # library for downloading YouTube videos
from pydub import AudioSegment    # library for working with audio files
import whisper    # OpenAI Whisper library for speech recognition (speech-to-text)
import re    # library for working with regular expressions

# Set the OpenAI API key as an environment variable
import openai    # library for working with the OpenAI API
%env OPENAI_API_KEY=sk-your_key

# Set the OpenAI organization and API key
openai.organization = ""
openai.api_key = "sk-your_key"

env: OPENAI_API_KEY=sk-your_key
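
A key typed directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch, assuming the key has already been exported as OPENAI_API_KEY, the key can be read from the environment instead. The helper name `read_api_key` is hypothetical and is not part of the openai library.

```python
import os

def read_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing.

    `read_api_key` is a hypothetical helper used for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key
```

With this helper, the assignment in the cell above could become `openai.api_key = read_api_key()`.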


This code defines a Python dictionary that maps target file names to YouTube video IDs. Here it holds a single entry: the target file name "GPT4" maps to the video ID "oc6RV5c1yd0". The names are used later to label the downloaded audio files, and the IDs are used to build the video URLs.

In [None]:
# A Python dictionary that maps target file names (keys)
# to their corresponding YouTube video IDs (values)
youtube_list = {
    # Here, the "GPT4" target file name is mapped to the "oc6RV5c1yd0" video ID
    "GPT4": "oc6RV5c1yd0",
}
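
To make the mapping concrete, here is a small sketch that turns such a dictionary into full watch URLs, using the same URL prefix as the download cell in this notebook. The function name `build_watch_urls` is an assumption introduced for illustration.

```python
def build_watch_urls(videos):
    """Map each target file name to its full YouTube watch URL."""
    base = "https://www.youtube.com/watch?v="
    return {name: base + video_id for name, video_id in videos.items()}

youtube_list = {"GPT4": "oc6RV5c1yd0"}
urls = build_watch_urls(youtube_list)
print(urls["GPT4"])  # https://www.youtube.com/watch?v=oc6RV5c1yd0
```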


This Python code downloads the audio track of each YouTube video in youtube_list and transcribes it with a pre-trained speech recognition model. For each entry it prints which video is being downloaded, creates a YouTube object from the video ID, selects the first audio-only stream, and downloads it to /tmp/. The downloaded file is then loaded into a pydub AudioSegment object, which is printed.

Next, the audio is exported as an MP3 file in the same directory. The code then loads Whisper's pre-trained "base" model with whisper.load_model() and transcribes the MP3 file with model.transcribe(). The transcribed text is stored in the msg variable for the summarization step. With more than one entry in youtube_list, msg ends up holding only the transcript of the last video.

In [None]:
# iterate through the youtube_list items
for k, v in youtube_list.items():
    # print statement to indicate which video is being downloaded
    print("Downloading " + k + " (" + v + ")")
    # create a YouTube object for the video using its unique ID
    yt = YouTube("https://www.youtube.com/watch?v=" + v)

    # filter for the audio stream and download it to a temp directory
    audio_stream = yt.streams.filter(only_audio=True).first()
    audio_stream.download(output_path="/tmp/", filename="audio_" + k)

    # load the audio file into an AudioSegment object
    audio_file = AudioSegment.from_file("/tmp/audio_" + k)

    # print information about the audio file
    print(audio_file)

    # convert the audio file to an MP3 format and save it to the temp directory
    mp3_file = audio_file.export("/tmp/audio_" + k + ".mp3", format="mp3")

    # load the base model for speech recognition and transcribe the audio file
    model = whisper.load_model("base")
    result = model.transcribe("/tmp/audio_" + k + ".mp3")

    # store the transcribed text in a variable
    msg = result["text"]

Downloading GPT4 (oc6RV5c1yd0)
<pydub.audio_segment.AudioSegment object at 0x7f9af054e280>
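
Because msg is overwritten on every iteration, a list with several videos keeps only the last transcript. The sketch below shows one possible way to keep a transcript per video. The function `transcribe_all` is hypothetical, and the transcriber is passed in as a parameter so that the sketch does not depend on Whisper being installed.

```python
def transcribe_all(audio_paths, transcribe):
    """Return {name: transcript} for every entry in audio_paths.

    `transcribe` is any callable that takes a file path and returns a dict
    with a "text" key, which is the shape returned by Whisper's
    model.transcribe().
    """
    transcripts = {}
    for name, path in audio_paths.items():
        transcripts[name] = transcribe(path)["text"].strip()
    return transcripts

# Possible usage with Whisper, loading the model once for all files:
#   model = whisper.load_model("base")
#   transcripts = transcribe_all({"GPT4": "/tmp/audio_GPT4.mp3"}, model.transcribe)
```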




This function takes an article (a string) and splits it into pieces of at most max_words words (3000 by default) without breaking sentences. It splits the article into sentences with a regular expression and then loops over them, tracking the word count of the current piece. If adding a sentence would push the current piece past the limit, the piece is appended to the list and a new piece starts with that sentence; otherwise the sentence is added to the current piece. After the loop, the last piece is appended and the list is returned. A single sentence longer than max_words is kept whole, so a piece can exceed the limit in that case.

In [None]:
def split_article(article, max_words=3000):
    # Initialize variables
    word_count = 0   # word count of the current piece
    pieces = []      # list to store article pieces
    current_piece = ""  # current piece of the article being processed
    
    # Split the article into sentences: break on whitespace that follows a period,
    # so each sentence keeps its own period
    lines = re.split(r"(?<=\.)\s+", article.strip())
    
    # Loop through each sentence
    for line in lines:
        line = line + " "   # restore the space that separated the sentences
        words = line.split() # split the sentence into words
        words_length = len(words) # get the number of words in the sentence
        
        # If the sentence would push a non-empty piece past the maximum word count, start a new piece
        if current_piece and (word_count + words_length) > max_words:
            pieces.append(current_piece) # add the current piece to the list of article pieces
            current_piece = line    # start a new piece with the current sentence
            word_count = words_length   # set the word count to the number of words in the sentence
        else:
            current_piece += line   # add the current sentence to the current piece
            word_count += words_length  # add the number of words in the sentence to the word count
        
    pieces.append(current_piece)   # add the last piece to the list of article pieces
    return pieces   # return the list of article pieces
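
To see the greedy packing on a small input, the following self-contained sketch applies the same idea with a limit of five words. `split_into_pieces` is a compact variant written for this illustration, not the function defined above, and it splits sentences with a lookbehind so that each sentence keeps its period.

```python
import re

def split_into_pieces(text, max_words):
    """Greedily pack whole sentences into pieces of at most max_words words."""
    sentences = re.split(r"(?<=\.)\s+", text.strip())
    pieces, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current piece when the next sentence would overflow it
        if current and count + n > max_words:
            pieces.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        pieces.append(" ".join(current))
    return pieces

text = "One two three. Four five. Six seven eight nine. Ten."
pieces = split_into_pieces(text, max_words=5)
print(pieces)  # ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

The first two sentences total exactly five words and share a piece; the third would bring the count to nine, so it starts a new piece.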

This Python function uses OpenAI's gpt-3.5-turbo model to summarize a given text. It builds a prompt string q that asks the model to summarize the text, then sends it as the user message through openai.ChatCompletion.create(), the chat interface of openai package versions before 1.0. The system message is set to "Editor". The function takes the summary from the first choice in the response, strips surrounding whitespace, and returns it.

This function can be useful for automatically summarizing large amounts of text, such as news articles or research papers. However, it's important to note that the quality of the summary will depend on the capabilities of the AI model used and the complexity of the text being summarized.

In [None]:
def summarize_text(text):
    # Create a question for the AI model to summarize the text
    q = f"Please summarize the following text:\n{text}\n\nSummary:"
    # Alternative prompt in Traditional Chinese ("Please summarize the following text"):
    #q = f"請依據下列的文字進行摘要:\n{text}\n\n摘要:"

    # Use OpenAI's GPT-3.5 Turbo model to generate a summary of the text
    rsp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Editor"},
            {"role": "user", "content": q}
        ]
    )
    
    # Get the summary from the AI model's response
    summary = rsp.get("choices")[0]["message"]["content"].strip()
    
    # Return the summary
    return summary
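
For readers on openai 1.0 or later, where openai.ChatCompletion is no longer available, the same request can be sketched against the newer client interface. This is an illustrative variant, not the notebook's own code: the names `build_summary_messages` and `summarize_with_client` are assumptions, and the client is passed in by the caller so the sketch itself needs no import.

```python
def build_summary_messages(text):
    """Build the chat messages used to request a summary of `text`."""
    prompt = f"Please summarize the following text:\n{text}\n\nSummary:"
    return [
        {"role": "system", "content": "Editor"},
        {"role": "user", "content": prompt},
    ]

def summarize_with_client(text, client, model="gpt-3.5-turbo"):
    """Summarize `text` using an openai>=1.0 style client supplied by the caller."""
    rsp = client.chat.completions.create(
        model=model,
        messages=build_summary_messages(text),
    )
    # In openai>=1.0 the response is an object, so fields are read as attributes
    return rsp.choices[0].message.content.strip()

# Possible usage with openai>=1.0 (reads OPENAI_API_KEY from the environment):
#   from openai import OpenAI
#   summary = summarize_with_client(msg, OpenAI())
```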

This code splits the transcript msg into pieces with split_article and then repeats the following steps while msgs contains more than one piece:

1. Summarize each piece with summarize_text and concatenate the results into summary.
2. Split summary again with split_article and assign the result back to msgs.

The loop ends when only one piece remains, and the code prints it with print(msgs[0]). If the transcript already fits in a single piece, the loop body never runs and the transcript is printed as is, without being summarized.

In [None]:
# split the article or text message into smaller messages
msgs = split_article(msg)

# while loop that runs as long as the length of the msgs list is greater than 1
while len(msgs)>1:

  # initialize a variable to an empty string
  summary = ""

  # iterate over each message in the msgs list
  for m in msgs:

    # call the summarize_text function and append the result to the summary variable
    r = summarize_text(m)
    summary += r + " "   # the trailing space keeps a sentence boundary between summaries

  # split the summary variable into smaller messages and assign the resulting list to msgs
  msgs = split_article(summary)

# print out the first message in the msgs list
print(msgs[0])

 GPT-4 is the latest AI system from OpenAI, the lab that created Dolly, and chat GPT. GPT-4 is a breakthrough in problem solving capabilities. For example, you can ask it how you would clean the inside of a tank filled with piranhas, and it'll give you something useful. It can also read, analyze, or generate up to 25,000 words of text. It can write code in all major programming languages, and it understands images as input, and can reason with them in sophisticated ways. Most importantly, after we created GPT-4, we spent months making it safer and more aligned with how you want to use it. The methods we've developed to continuously improve GPT-4 will help us as we work towards AI systems that will empower us all.. 
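
The loop above ends only once the combined summaries fit in one piece. The sketch below shows the same reduction with an explicit cap on the number of rounds, so it cannot run forever if the summaries stop shrinking. All names here (`reduce_to_summary`, `split_words`, `first_two_words`, `max_rounds`) are hypothetical, and the splitter and summarizer are simple stand-ins so the example runs without any API call.

```python
def reduce_to_summary(text, split, summarize, max_rounds=10):
    """Summarize pieces of `text` repeatedly until a single piece remains."""
    pieces = split(text)
    rounds = 0
    while len(pieces) > 1:
        if rounds >= max_rounds:
            raise RuntimeError("summaries did not shrink to one piece")
        # Summarize each piece, then re-split the joined summaries
        joined = " ".join(summarize(p) for p in pieces)
        pieces = split(joined)
        rounds += 1
    return pieces[0]

def split_words(text, size=4):
    """Stand-in splitter: chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)] or [""]

def first_two_words(piece):
    """Stand-in summarizer: keep the first two words of a piece."""
    return " ".join(piece.split()[:2])

result = reduce_to_summary("a b c d e f g h i j k l", split_words, first_two_words)
print(result)  # a b i j
```

In the notebook, `split` and `summarize` would be split_article and summarize_text.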
