## 0 Install required packages 

In [None]:
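# Run this cell once if the packages below are not installed.
# Versions are pinned because this notebook is written against the pre-1.0 APIs
# of both libraries (openai.ChatCompletion and YouTubeTranscriptApi.get_transcript).
!pip install "openai<1.0"
!pip install "youtube-transcript-api<1.0"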

In [1]:
# import required Python libraries
import os
import openai
import pandas as pd

In [2]:
from youtube_transcript_api import YouTubeTranscriptApi

## 1 Set API Key & Video ID

In [3]:
# Set your OpenAI API key: either export OPENAI_API_KEY in your environment
# or replace "xxx" with your own OpenAI API key.
openai.api_key = os.getenv("OPENAI_API_KEY", "xxx")

In [4]:
# Enter the ID of the YouTube video you want to summarize.
# For example, in the link https://www.youtube.com/watch?v=CWIOUo5MTZE, the ID is CWIOUo5MTZE.

video_id = "CWIOUo5MTZE" 
transcript = YouTubeTranscriptApi.get_transcript(video_id)
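If you prefer to paste a full link instead of the bare ID, the ID can be pulled out of the URL with the standard library. The helper `extract_video_id` below is hypothetical (it is not part of youtube_transcript_api) and is a minimal sketch that assumes only the two common URL shapes, `youtube.com/watch?v=<ID>` and `youtu.be/<ID>`.

```python
from urllib.parse import urlparse, parse_qs

# Hosts that serve standard watch?v=<ID> links (assumption: the common ones only).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if it cannot be found.

    Hypothetical helper; assumes only watch?v=<ID> and youtu.be/<ID> links.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in YOUTUBE_HOSTS:
        # Standard links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=CWIOUo5MTZE"))  # CWIOUo5MTZE
print(extract_video_id("https://youtu.be/CWIOUo5MTZE"))  # CWIOUo5MTZE
```

With this helper, `video_id = extract_video_id(url)` could be set before calling `get_transcript`.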

## 2 Get transcript of video
Use the youtube_transcript_api package to get the transcript of the video. This notebook uses the `TextFormatter` to format the transcript as plain text; alternatively, you can try the `JSONFormatter`. As pre-processing, I have replaced the newline characters in the text with spaces.

Note that some YouTube videos do not have transcripts enabled; this notebook will not work for those videos.
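In the pre-1.0 releases of youtube_transcript_api, `YouTubeTranscriptApi.get_transcript` returns a list of dictionaries, each with `text`, `start`, and `duration` keys. As a sketch of what the formatting and newline removal amount to, the hypothetical helper below joins the `text` fields directly with spaces. The sample segments are made-up data, not taken from the video.

```python
def transcript_to_text(segments):
    """Join the "text" field of each transcript segment into one space-separated string.

    Hypothetical helper: assumes each segment is a dict with at least a "text" key,
    as in the pre-1.0 get_transcript() output.
    """
    # Newlines inside a segment are also replaced, mirroring the .replace("\n", " ") step.
    return " ".join(segment["text"].replace("\n", " ") for segment in segments)


# Made-up sample segments for illustration only.
sample_segments = [
    {"text": "Hello and welcome", "start": 0.0, "duration": 2.1},
    {"text": "to the show.\nToday we talk", "start": 2.1, "duration": 3.0},
]

sample_text = transcript_to_text(sample_segments)
print(sample_text)               # Hello and welcome to the show. Today we talk
print(len(sample_text.split()))  # 9
```

This should give the same single-line text as formatting with `TextFormatter` (which puts each segment on its own line) and then replacing the newlines, without the intermediate string.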

In [5]:
from youtube_transcript_api.formatters import JSONFormatter  # alternative formatter, not used below
from youtube_transcript_api.formatters import TextFormatter

formatter = TextFormatter()

# .format_transcript(transcript) turns the transcript into a plain-text string,
# with the text of each transcript segment on its own line.
text_formatted = formatter.format_transcript(transcript)

In [6]:
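# Replace newline characters with spaces so the transcript is one continuous string
text_formatted = text_formatted.replace("\n", " ")
len(text_formatted.split())  # number of words in the transcript
#text_formatted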

2270

## 3 Split processed transcript into chunks
The large language model (LLM) we're using, gpt-3.5-turbo, has a fixed context window size.
This means there is a maximum number of tokens that can be processed in a single request. If the input exceeds this limit, the API rejects the request with an error instead of processing it.
As a result, we need to split long transcripts into smaller chunks that fit within the context window.

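Important note: the context window must accommodate both the request and the response.
In other words, the combined token count of the input text (request) and output text (response)
must not exceed the fixed context window size.
Note also that the splitting function below counts words, not tokens. OpenAI's rule of thumb is roughly 100 tokens per 75 English words, so `words_per_chunk` is only a rough proxy for the token limit and should leave headroom for the prompt and the summary.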
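Because the limit is counted in tokens and must cover both request and response, one way to size chunks is to reserve part of the window for the answer and pack words until the remaining token budget is used up. The sketch below is an alternative to word-count chunking, not the notebook's own method; the window size, the response and prompt allowances, and the 4-characters-per-token estimate are all assumptions. For exact counts, the tiktoken package provides `tiktoken.encoding_for_model("gpt-3.5-turbo")`, and `lambda s: len(enc.encode(s))` on the returned encoding can be passed in as `count_tokens`.

```python
# All numbers below are assumptions; check the documented limit for your model version.
CONTEXT_WINDOW = 4096    # total tokens shared by request and response
RESPONSE_BUDGET = 500    # tokens reserved for the summary
PROMPT_OVERHEAD = 100    # rough allowance for the system/user instructions
CHUNK_BUDGET = CONTEXT_WINDOW - RESPONSE_BUDGET - PROMPT_OVERHEAD  # tokens left for transcript text


def rough_token_count(text):
    """Crude estimate (assumption): about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def split_by_token_budget(text, max_tokens, count_tokens=rough_token_count):
    """Greedily pack words into chunks whose summed per-word token estimates stay within max_tokens.

    A single word whose estimate alone exceeds max_tokens still becomes its own chunk.
    """
    chunks, current, current_tokens = [], [], 0
    for word in text.split():
        # Count the word with its leading space, as it would appear inside running text.
        word_tokens = count_tokens(" " + word)
        if current and current_tokens + word_tokens > max_tokens:
            # Adding this word would exceed the budget: close the current chunk.
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(word)
        current_tokens += word_tokens
    if current:
        chunks.append(" ".join(current))
    return chunks


print(CHUNK_BUDGET)  # 3496
print(split_by_token_budget("a b c d e", max_tokens=2, count_tokens=lambda s: 1))  # ['a b', 'c d', 'e']
```

In the notebook, `split_by_token_budget(input_string, CHUNK_BUDGET)` could stand in for the word-based split. Summing per-word counts only approximates the token count of the joined text, so a small safety margin in the budget is sensible.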

In [7]:
def split_into_chunks(string, words_per_chunk=3000):
    """Split `string` into consecutive chunks of at most `words_per_chunk` words."""
    words = string.split()
    # Step through the word list in strides of words_per_chunk; the last chunk may be shorter.
    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]

input_string = text_formatted
#input_string = "This is a sample string containing more than 2500 words. We will split this string into chunks of 2500 words."
chunked_strings = split_into_chunks(input_string, words_per_chunk=2500)

In [8]:
len(chunked_strings)

1
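The chunk count is the transcript's word count divided by `words_per_chunk`, rounded up. With the 2,270 words counted above and 2,500 words per chunk:

$$
n_{\text{chunks}} = \left\lceil \frac{N_{\text{words}}}{w} \right\rceil = \left\lceil \frac{2270}{2500} \right\rceil = 1
$$

A transcript of, say, 6,000 words would give $\lceil 6000/2500 \rceil = 3$ chunks.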

## 4 Prompt GPT for summary

This section sends each transcript chunk to the gpt-3.5-turbo model through OpenAI's Chat Completions API for summarization.
Each chunk is processed individually: when we send a chunk to the API, it returns a summary of that part of the transcript. The prompt uses a basic structure of one 'system' message and one 'user' message, which is sufficient for our current needs.

Customization note: the prompt can be tailored to specific requirements. For example, you may want to ask the model to return its output in JSON format, or to limit the summary to a certain number of words, such as 200.
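As one way to apply such customizations, the hypothetical helper `build_summary_messages` below assembles a `messages` list like the one in the next cell and optionally appends a word limit or a JSON-output instruction. It is a sketch only: the wording of the extra instructions is an assumption, and the model is not guaranteed to follow either constraint exactly.

```python
def build_summary_messages(chunk, max_words=None, as_json=False):
    """Build a Chat Completions `messages` list asking for a bullet-point summary of one chunk.

    Hypothetical helper; the optional instruction wording is an assumption.
    """
    instructions = "Summarize the provided transcript of a youtube video in bullet points."
    if max_words is not None:
        # Ask for a length limit; the model may not follow it exactly.
        instructions += f" Keep the summary under {max_words} words."
    if as_json:
        # Ask for machine-readable output instead of plain bullets.
        instructions += ' Return only a JSON object of the form {"bullets": ["..."]}.'
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": f"{instructions} The text of the transcript starts here:\n\n{chunk}",
        },
    ]


messages = build_summary_messages("Sample transcript text.", max_words=200)
print(messages[1]["content"])
```

The result could be passed as `messages=build_summary_messages(chunk, max_words=200)` in the `openai.ChatCompletion.create` call. If you request JSON, parse the reply with `json.loads` inside a `try` block, since the model may still return malformed JSON.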


In [9]:
MODEL = "gpt-3.5-turbo"
print(f"The transcript will be split into {len(chunked_strings)} parts. Summary of each part will be provided.")
summary_parts = ""
for i, chunk in enumerate(chunked_strings):
    # openai.ChatCompletion is the pre-1.0 interface of the openai package; it requires openai<1.0.
    response = openai.ChatCompletion.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                # Implicit string concatenation avoids the stray indentation a backslash continuation adds.
                "content": "Summarize the provided transcript of a youtube video in bullet points. "
                f"The text of the transcript starts here:\n\n{chunk}",
            },
        ],
        temperature=0,  # minimize sampling randomness so reruns give similar summaries
    )
    summary = response["choices"][0]["message"]["content"]

    print(f"\n\nSummary for part {i+1}:")
    print(summary)
    # Separate parts with a newline so bullets from consecutive chunks do not run together.
    summary_parts += summary + "\n"

The transcript will be split into 1 parts. Summary of each part will be provided.


Summary for part 1:
- The African Union has been given membership of the G20 under the presidency of Prime Minister Modi.
- Sunil Mittal, founder of bti Enterprises and chair of the B20 Africa Wing, has played a significant role in cultivating the India-Africa relationship.
- The inclusion of the African Union in the G20 is seen as a major achievement and will be remembered fondly in Africa.
- India's historic relationship with Africa has been revitalized, and Indian businesses will be more welcome than ever before in the African continent.
- China's approach to Africa has been focused on extractive businesses and has caused indebtedness in African countries.
- India, on the other hand, has provided technology support, diplomatic and political support, and soft loans and grants to Africa.
- Doing business in Africa has its challenges, such as infrastructure and currency depreciation, but the continent o