<a href="https://colab.research.google.com/github/cclljj/Google_colab_ipynb/blob/master/colab_notebook_YouTube_transcription.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summarization of YouTube audio content

The following steps are used to generate a summary of a specified YouTube video: 

1. Pytube is used to extract the audio track from the video. 
2. OpenAI Whisper is employed to transcribe the audio into text. 
3. The text is summarized using the OpenAI ChatGPT API.

We install four Python libraries with the pip package manager. Pytube downloads the audio stream of a YouTube video, and Pydub converts and processes audio files. The OpenAI library provides access to the OpenAI API, which this notebook uses to call the gpt-3.5-turbo chat model for rephrasing and summarizing text. OpenAI-Whisper is OpenAI's open-source speech recognition model; it runs locally and transcribes the audio into text.

In [None]:
# Install the Pytube library for downloading YouTube videos
!pip install -q --upgrade pytube

# Install the Pydub library for working with audio files
!pip install -q --upgrade pydub

# Install the OpenAI library for calling the OpenAI API.
# The version is pinned below 1.0 because this notebook uses the legacy
# openai.ChatCompletion interface, which was removed in openai 1.0.
!pip install -q --upgrade "openai<1"

# Install the OpenAI-Whisper library for speech-to-text transcription
!pip install -q --upgrade openai-whisper
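
After the installs finish, it can help to confirm which versions were actually picked up, since library interfaces differ between releases. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical and not part of the notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


# Report the four packages installed by the cell above
print(installed_versions(["pytube", "pydub", "openai", "openai-whisper"]))
```

A value of `None` means the package could not be found, which usually indicates that the corresponding install step failed.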

First, we set up the necessary environment for a Python program that uses these libraries and APIs.

The code imports several libraries: pytube for downloading YouTube videos, pydub for working with audio files, whisper for transcribing speech to text, and re for working with regular expressions.

The code also sets the OPENAI_API_KEY environment variable using %env so that it can be used to authenticate API requests to OpenAI. Then, it sets the OpenAI API key and organization using the openai library. The value sk-my_openai_key is a placeholder and must be replaced with your own key.

In [None]:
# Import the necessary libraries
from pytube import YouTube    # library for downloading YouTube videos
from pydub import AudioSegment    # library for working with audio files
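import whisper    # OpenAI's speech recognition model, used for transcription
from whisper.utils import get_writer    # writes a transcription result to a file (used for Chinese transcripts)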
import re    # library for working with regular expressions

# Set the OpenAI API key as an environment variable
import openai    # library for working with the OpenAI API
%env OPENAI_API_KEY=sk-my_openai_key

# Set the OpenAI organization and API key
openai.organization = ""
openai.api_key = "sk-my_openai_key"

env: OPENAI_API_KEY=sk-my_openai_key
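
Hard-coding the key in a cell makes it easy to leak when the notebook is shared. As an alternative, here is a minimal sketch that reads the key from the environment and fails early when it is missing. It assumes the key has already been exported as OPENAI_API_KEY; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key


# Usage with the legacy `openai` module imported above (assumption):
# openai.api_key = load_api_key()
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.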


This code defines a Python dictionary that maps target file names to YouTube video IDs. Here it holds a single entry: the target file name "GPT4" is mapped to the video ID "oc6RV5c1yd0". The key is used to name the downloaded audio and the output summary file, and the value is used to build the video URL. To process more videos, add more entries to the dictionary.

In [None]:
# This Python dictionary maps target file names (keys)
# to YouTube video IDs (values)
youtube_list = {
    # Here, the "GPT4" target file name is mapped to the "oc6RV5c1yd0" video ID
    "GPT4": "oc6RV5c1yd0",
}
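
To make the role of the keys and values concrete, the sketch below derives the video URL and the file paths that the later cells build from each entry. The helper names `watch_url` and `output_paths` are hypothetical; the URL prefix and the `/tmp/audio_` naming follow the download cell further down.

```python
youtube_list = {"GPT4": "oc6RV5c1yd0"}


def watch_url(video_id):
    """Build the YouTube watch URL for a video ID."""
    return "https://www.youtube.com/watch?v=" + video_id


def output_paths(name, tmp_dir="/tmp/"):
    """Return the file paths derived from a target file name."""
    return {
        "audio": tmp_dir + "audio_" + name,         # raw downloaded audio stream
        "mp3": tmp_dir + "audio_" + name + ".mp3",  # converted MP3 file
        "summary": name + ".txt",                   # final summary text file
    }


for name, video_id in youtube_list.items():
    print(name, watch_url(video_id), output_paths(name))
```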


The rephrase_text function takes two parameters: text, the input to be improved, and language, which defaults to "en". It rephrases the input and improves its readability; when language is "zh", it asks for typo correction and punctuation in Chinese instead.

The function uses the OpenAI API to generate the improved text. It builds a prompt string called q and passes it to the openai.ChatCompletion.create() method along with the name of the model to use, gpt-3.5-turbo. The messages parameter is a list of two dictionaries, each specifying the role of the message sender (system or user) and the content of the message.

After the API call, the function extracts the improved text from the response, strips surrounding whitespace, and returns it.

In [None]:
def rephrase_text(text, language="en"):
    # Create a prompt string to be sent to the OpenAI API
    if language == "zh":
        # Chinese prompt: "Please fix typos and add punctuation in Chinese so that
        # the content reads more smoothly", ending with a "Revised text:" label
        q = f"請幫我用中文改錯字、加標點符號，讓內容更通順:\n\n{text}\n\n 修正後文字:"
    else:
        q = f"Please rephrase the following text:\n{text}\n\nRevision:"

    # Call the OpenAI API to generate an improved version of the text
    rsp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Editor"},
            {"role": "user", "content": q}
        ]
    )

    # Extract the improved text from the OpenAI API response
    summary = rsp.get("choices")[0]["message"]["content"].strip()

    # Return the improved text
    return summary
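
The prompt construction can be inspected without spending API calls by separating it from the network request. The sketch below does only that part: it builds the same two-message list that rephrase_text sends. The helper name `build_rephrase_messages` is hypothetical and not part of the notebook.

```python
def build_rephrase_messages(text, language="en"):
    """Build the chat messages for a rephrase request, without calling the API."""
    if language == "zh":
        # Same Chinese prompt as in rephrase_text
        q = f"請幫我用中文改錯字、加標點符號，讓內容更通順:\n\n{text}\n\n 修正後文字:"
    else:
        q = f"Please rephrase the following text:\n{text}\n\nRevision:"
    return [
        {"role": "system", "content": "Editor"},  # sets the assistant's role
        {"role": "user", "content": q},           # carries the actual request
    ]


messages = build_rephrase_messages("whisper output without punctuation")
print(messages[1]["content"])
```

Printing the user message before sending it is a cheap way to check that the text was inserted where expected.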

The split_article function takes an article string, a language code, and a maximum size max_words. It splits the article into pieces whose size does not exceed max_words, except that a single sentence longer than the limit is kept whole. For English text the size is counted in words; for Chinese ("zh") the limit is fixed at 500 and counted in characters.

To do this, it first initializes a word_count variable to 0, a pieces list to store the article pieces, and a current_piece string to store the piece being assembled.

Next, the function uses the re module's split() function to split the article into segments. English text is split at each period followed by a whitespace character (`\.\s`), so each element of the resulting lines list is roughly one sentence. Chinese text is split at each full-width comma (，).

Because re.split() discards the delimiter, the function appends it back to each line: a period and a space for English, or a full-width comma for Chinese. For English, the line is then split into words using the split() method, and the number of words is stored in words_length. For Chinese, words_length is the number of characters in the line.

If the sum of word_count and words_length is greater than max_words, current_piece is passed to the rephrase_text function to fix wording and punctuation, appended to the pieces list, and reset to the current line. The word_count is reset to words_length.

If the sum of word_count and words_length is less than or equal to max_words, the current line is added to the current_piece string, and word_count is incremented by words_length.

Finally, the last current_piece is appended to the pieces list without being rephrased, and the list is returned.

In [None]:
def split_article(article, language="en", max_words=1000):
    word_count = 0      # size of the current piece (words, or characters for Chinese)
    pieces = []         # list to store article pieces
    current_piece = ""  # text of the piece being assembled

    if language == "zh":
        max_words = 500  # for Chinese, the limit is counted in characters
        lines = re.split(r"，", article)  # split at each full-width comma
    else:
        lines = re.split(r"\.\s", article)  # split at each period followed by whitespace

    for line in lines:
        if not line.strip():
            continue  # skip empty segments, e.g. after a trailing delimiter
        if language == "zh":
            line = line + "，"  # restore the comma removed by the split
            words_length = len(line)  # number of characters in the segment
        else:
            line = line + ". "  # restore the period and space removed by the split
            words = line.split()  # split the sentence into words
            words_length = len(words)  # number of words in the sentence

        # only close a piece that has content, so an empty string is never sent to the API
        if current_piece and (word_count + words_length) > max_words:
            current_piece = rephrase_text(current_piece, language)  # clean up the full piece
            pieces.append(current_piece)  # append the rephrased piece to the pieces list
            current_piece = line  # start a new piece with the current line
            word_count = words_length  # reset the size to that of the current line
        else:
            current_piece += line  # add the current line to the current piece
            word_count += words_length  # increase the size by that of the current line

    pieces.append(current_piece)  # append the last piece (it is not rephrased)
    return pieces  # return the list of article pieces
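
The chunking logic can be tried on its own, without any API calls. The sketch below is a simplified English-only variant and not the notebook's function: it skips the rephrase_text step and uses a lookbehind in the regular expression so that each sentence keeps its period. The name `chunk_sentences` is hypothetical.

```python
import re


def chunk_sentences(article, max_words=1000):
    """Group sentences into chunks of at most max_words words each.

    A single sentence longer than max_words becomes its own chunk.
    """
    # Split after each period that is followed by whitespace, keeping the period
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", article) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # close the chunk that is full
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))  # keep the final, partially filled chunk
    return chunks


print(chunk_sentences("One two three. Four five. Six seven eight nine.", max_words=5))
# prints ['One two three. Four five.', 'Six seven eight nine.']
```

The first two sentences total five words and fit in one chunk; adding the third would bring the count to nine, so it starts a new chunk.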

The summarize_text() function takes a piece of text and a language code and returns a summary generated by OpenAI's gpt-3.5-turbo model. It first builds a prompt, in Chinese when language is "zh" and in English otherwise, that asks the model to summarize the given text. The prompt is a string that contains the text followed by a Summary label.

Then, the function uses OpenAI's ChatCompletion API to generate a summary of the text. It passes the question and a few other parameters to the ChatCompletion.create() method, which makes a request to the GPT-3.5 Turbo model to generate the summary.

The AI model's response includes the generated summary, which the function extracts and returns as output.

In [None]:
def summarize_text(text, language="en"):
    # Create a question for the AI model to summarize the text
    if language=="zh":
        # Chinese prompt: "Please summarize the following text", ending with a "Summary:" label
        q = f"請依據下列的文字進行摘要:\n{text}\n\n摘要:"
    else:
        q = f"Please summarize the following text:\n{text}\n\nSummary:"

    # Use OpenAI's GPT-3.5 Turbo model to generate a summary of the text
    rsp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Editor"},
            {"role": "user", "content": q}
        ]
    )
    
    # Get the summary from the AI model's response
    summary = rsp.get("choices")[0]["message"]["content"].strip()
    
    # Return the summary
    return summary
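
The extraction step relies on the response containing a choices list whose first item holds a message with a content field. The sketch below applies the same lookup to a hand-written stand-in dictionary rather than real API output, and adds a check for an empty choices list. The name `extract_content` and the `fake_response` data are hypothetical.

```python
def extract_content(response):
    """Return the stripped text of the first choice in a chat response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"].strip()


# Hand-written stand-in with the same shape the function reads (not real API output)
fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "  A short summary.\n"}}
    ]
}
print(extract_content(fake_response))
# prints A short summary.
```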

The following code downloads the audio track of each YouTube video, converts it to MP3, transcribes the audio to text using a pre-trained speech recognition model, summarizes the resulting text, and saves the summary to a text file.

The code starts by iterating over the "youtube_list" dictionary, which contains the target file names and unique IDs of the YouTube videos to be processed. For each entry, it prints a message indicating that the video is being downloaded, creates a YouTube object using the video's unique ID, and downloads the first audio-only stream for that video to the temporary directory /tmp/.

After downloading the audio file, the code loads it into an AudioSegment object, converts it to MP3 format, and saves the MP3 file to the temporary directory.

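The code then loads Whisper's pre-trained "small" speech recognition model and transcribes the MP3 file. The transcribed text is stored in a variable called "msg", and the detected language is stored in a variable called "lang". If the detected language is Chinese ("zh"), the transcript is written to a text file with one segment per line and then rebuilt by joining those lines with full-width commas, which gives split_article a delimiter to split on.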
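
For Chinese audio, the cell below rebuilds the transcript by writing it to a text file and joining the lines with a full-width comma. The same text could be assembled in memory instead. The sketch below assumes that the transcription result carries a "segments" list whose items have a "text" field; `join_segments` is a hypothetical helper and `mock_result` is hand-written sample data, not Whisper output.

```python
def join_segments(result, separator="，"):
    """Concatenate segment texts, adding `separator` after each one."""
    parts = [seg["text"].strip() for seg in result.get("segments", [])]
    return "".join(part + separator for part in parts if part)


# Hand-written sample: "The weather is nice today" / "let's go for a walk"
mock_result = {
    "language": "zh",
    "text": "今天天氣很好我們去散步",
    "segments": [{"text": "今天天氣很好"}, {"text": "我們去散步"}],
}
print(join_segments(mock_result))
# prints 今天天氣很好，我們去散步，
```

This avoids the intermediate file, at the cost of depending on the structure of the result dictionary.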

The code then splits the transcribed text into smaller messages using the "split_article" function and enters a while loop that runs as long as the length of the resulting "msgs" list is greater than 1. Within the while loop, the code initializes an empty string called "summary" and iterates over each message in the "msgs" list. For each message, it calls the "summarize_text" function to summarize the text and appends the result to the "summary" variable.

Once all messages in the "msgs" list have been summarized, the code splits the resulting "summary" string into smaller messages and assigns the resulting list back to the "msgs" variable. The while loop then repeats until the length of the "msgs" list is reduced to 1.
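
The loop described above can be exercised without Whisper or the OpenAI API by passing in stand-in functions. The sketch below is not the notebook's code: `reduce_to_one`, `split_by_words`, and `first_two_words` are hypothetical names, and the `max_rounds` guard is an added assumption that stops the loop if the summaries never shrink to a single piece.

```python
def reduce_to_one(text, split, summarize, max_rounds=10):
    """Summarize and re-split repeatedly until a single piece remains."""
    pieces = split(text)
    rounds = 0
    while len(pieces) > 1:
        if rounds >= max_rounds:
            raise RuntimeError("summaries are not getting shorter")
        # Summarize every piece, then split the combined summaries again
        combined = " ".join(summarize(p) for p in pieces)
        pieces = split(combined)
        rounds += 1
    return pieces[0] if pieces else ""


# Stand-ins for split_article and summarize_text, for demonstration only
def split_by_words(text, max_words=4):
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def first_two_words(piece):
    return " ".join(piece.split()[:2])


print(reduce_to_one("a b c d e f g h i j k l", split_by_words, first_two_words))
# prints a b i j
```

In this run the twelve words form three pieces, the first round reduces them to two pieces, and the second round leaves one.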

Finally, the code opens a text file named after the dictionary key (for example, GPT4.txt), prints the name of the output file, and writes the only remaining message in the "msgs" list to it.

In [None]:
# iterate over the items in the youtube_list dictionary
for k, v in youtube_list.items():

    # print statement to indicate which video is being downloaded
    print("Downloading " + k + " (" + v + ")")

    # create a YouTube object for the video using its unique ID
    yt = YouTube("https://www.youtube.com/watch?v=" + v)

    # filter for the audio stream and download it to a temp directory
    audio_stream = yt.streams.filter(only_audio=True).first()
    audio_stream.download(output_path="/tmp/", filename="audio_" + k)

    # load the audio file into an AudioSegment object
    audio_file = AudioSegment.from_file("/tmp/audio_" + k)

    # convert the audio file to an MP3 format and save it to the temp directory
    mp3_file = audio_file.export("/tmp/audio_" + k + ".mp3", format="mp3")

    # load the small model for speech recognition and transcribe the audio file
    model = whisper.load_model("small")
    result = model.transcribe("/tmp/audio_" + k + ".mp3")

    # store the transcribed text in a variable
    msg = result["text"]
    lang = result["language"]

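    if lang=="zh":
      # For Chinese, write the transcript to a text file (one segment per line)
      # and rebuild the text with a full-width comma after each line,
      # so that split_article has a delimiter to split on
      msg = ""
      txt_writer = get_writer("txt", ".")
      txt_writer(result, k + "2.txt")
      with open(k + "2.txt", encoding="utf-8") as f:
        lines = f.read().splitlines()
      for line in lines:
        msg += line + "，"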
    # split the article or text message into smaller messages
    msgs = split_article(msg, lang)

    # while loop that runs as long as the length of the msgs list is greater than 1
    while len(msgs)>1:

      # initialize a variable to an empty string
      summary = ""

      # iterate over each message in the msgs list
      for m in msgs:

        # call the summarize_text function and append the result to the summary variable,
        # followed by a space so that consecutive summaries do not run together
        r = summarize_text(m, lang)
        summary += r + " "

      # split the summary variable into smaller messages and assign the resulting list to msgs
      msgs = split_article(summary, lang)

    # write the final summary to a text file named after the dictionary key
    # (the with statement closes the file automatically)
    with open(k + ".txt", "w", encoding="utf-8") as out_file:
      print("Output file: " + k + ".txt")
      out_file.write(msgs[0])

Downloading macgpt (EYhlGV9AZHI)




Output file: macgpt.txt


## Reference

- [林鼎淵 - Setting up the Whisper package locally: converting YouTube videos into transcripts for free with Python](https://medium.com/dean-lin/%E5%9C%A8-local-%E5%B0%8E%E5%85%A5-whisper-%E5%A5%97%E4%BB%B6-%E5%85%8D%E8%B2%BB%E5%B0%87-youtube-%E5%BD%B1%E7%89%87%E8%BD%89%E6%8F%9B%E6%88%90%E9%80%90%E5%AD%97%E7%A8%BF-3227c5c68074)