## **Use LangChain and ChatGPT to Summarize YouTube Videos of Any Length**



This notebook walks through every step needed to summarize a YouTube video with LangChain and OpenAI's chat models (GPT-3.5 or GPT-4).


### **Steps Covered in this Tutorial**

We'll cover the following steps in this tutorial:

1. Install dependencies
2. Define helper functions to extract transcripts from YouTube videos
3. Convert the text into a doc using LangChain
4. Split the document into chunks using LangChain
5. Create a summary using ChatGPT + LangChain

# **Want to Become an AI Expert?**
💻 [Get Started](https://www.augmentedstartups.com/ai-starter-pack) with AI, LLMs, and ChatGPT Development.  <br>
⭐ Download other Projects at the [AI Vision Store](https://store.augmentedstartups.com)<br>
☕ Enjoyed this Tutorial? - Support me by Buying Me a [Chai/Coffee](https://bit.ly/BuymeaCoffeeAS)


# **About**

[Augmented Startups](https://www.augmentedstartups.com) provides tutorials in AI, Computer Vision, and Augmented Reality. With over **100K subscribers** on our channel, we teach state-of-the-art models and build apps and projects that solve real-world problems.


![picture](https://drive.google.com/uc?id=1-yFsJxO72ovg4wxgBIdNl8V8GyvxPHCM)

## **1. Install Dependencies**

In [None]:
!pip install openai


In [None]:
!pip install langchain


In [None]:
!pip install youtube-transcript-api



In [None]:
!pip install tiktoken



## **2. Add Video URL**
Insert the URL of the video you want to summarize.

In [None]:
url = 'https://www.youtube.com/watch?v=nE2skSRWTTs'  # Replace this with the URL of the video you want to summarize

## **3. Import Libraries**
**Note:** Please insert your OpenAI API key in the cell below.

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

from langchain import OpenAI, PromptTemplate
from langchain.chains.summarize import load_summarize_chain

OPENAI_KEY = "YOUR_OPENAI_API_KEY"  # Add your own API key here; never share a notebook that contains a real key
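A key pasted directly into a cell is easy to leak when the notebook is shared. One common alternative, sketched here as an assumption rather than part of the original notebook, is to read the key from an environment variable. The helper name `load_openai_key` and the variable name `OPENAI_API_KEY` are illustrative choices.

```python
import os

def load_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises a clear error instead of silently
    continuing with an empty key.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this notebook.")
    return key
```

With the variable set, `OPENAI_KEY = load_openai_key()` could stand in for the hard-coded assignment above.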


## **4. Helper Functions**

In [None]:
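import re

def extract_youtube_id(url):
    """Return the video ID from a YouTube URL, or None if no ID is found."""
    # Standard watch URLs carry the ID in the `v=` query parameter
    youtube_id_match = re.search(r'(?<=v=)[^&#]+', url)
    # Short youtu.be links carry the ID right after `be/`
    youtube_id_match = youtube_id_match or re.search(r'(?<=be/)[^&#?]+', url)
    return youtube_id_match.group(0) if youtube_id_match else None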
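The regex approach above is compact, but it matches any `v=` substring anywhere in the URL. As a point of comparison, here is a minimal sketch of the same idea built on the standard library's `urllib.parse`. The function name `extract_youtube_id_strict` is hypothetical, and the sketch only handles the two URL shapes discussed here (`youtube.com/watch?v=...` and `youtu.be/...`).

```python
from urllib.parse import urlparse, parse_qs

def extract_youtube_id_strict(url):
    """Hypothetical alternative to extract_youtube_id that parses the URL first."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links: the ID is the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    if host.endswith("youtube.com"):
        # Standard watch links: the ID is the `v` query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None

print(extract_youtube_id_strict("https://www.youtube.com/watch?v=nE2skSRWTTs"))  # nE2skSRWTTs
print(extract_youtube_id_strict("https://youtu.be/nE2skSRWTTs?t=30"))            # nE2skSRWTTs
```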

In [None]:
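video_id = extract_youtube_id(url)
# get_transcript returns a list of dicts; the 'text' field holds each caption line
srt = YouTubeTranscriptApi.get_transcript(video_id)
text_arr = ''

# Concatenate all caption lines into one long string
for ele in srt:
  text_arr = text_arr + ' ' + ele['text']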
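To see the joining step in isolation, without a network call, the sketch below works on a small hand-written list shaped like the transcript entries (dicts with a `text` field plus timing fields). The sample data and the helper name `join_transcript` are made up for illustration. Unlike the loop above, it uses `str.join`, so the result has no leading space.

```python
def join_transcript(segments):
    """Concatenate the 'text' field of each transcript segment with single spaces."""
    return " ".join(seg["text"].strip() for seg in segments if seg.get("text"))

# Hypothetical sample shaped like the list of dicts returned for a transcript
sample_segments = [
    {"text": "today we're going to get started", "start": 0.0, "duration": 2.5},
    {"text": "with a series of videos", "start": 2.5, "duration": 2.0},
]

print(join_transcript(sample_segments))
# today we're going to get started with a series of videos
```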

In [None]:
text_arr ## The Transcript of the video

" today we're going to get started with what will be a series of videos tutorials examples articles on what is called Lang train now line chain is a pretty new NLP framework that has become very popular very quickly at the core of Lang chain you have large language models and the idea behind it is that we can use the framework to build very cool apps using large language models very quickly we can use it for chatbots generative question answering summarization logic Loops that include large language models and web search and all these like crazy different things that we can chain together in some sort of logical fashion in this video what we are going to do is just have a quick introduction to line chain and how we can use it we're going to take a look at the core components of what will make our chains in line chain and we're going to look at some very simple generative language examples using both the hugging face endpoint in Lang chain and the open AI endpoint in line chain so let's

In [None]:
def text_to_doc(text_arr):
  """Wrap the transcript in a Document and split it into chunks of up to 800 characters."""
  text = [text_arr]
  page_docs = [Document(page_content=page) for page in text]

  # Add page numbers as metadata
  for i, doc in enumerate(page_docs):
      doc.metadata["page"] = i + 1

  # Build the splitter once and reuse it for every page
  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=800,
      separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
      chunk_overlap=0,
  )

  # Split pages into chunks
  doc_chunks = []
  for page_doc in page_docs:
      chunks = text_splitter.split_text(page_doc.page_content)
      for i, chunk in enumerate(chunks):
          chunk_doc = Document(
              page_content=chunk, metadata={"page": page_doc.metadata["page"], "chunk": i}
          )
          # Add the source as metadata, formatted as "<page>-<chunk>"
          chunk_doc.metadata["source"] = f"{chunk_doc.metadata['page']}-{chunk_doc.metadata['chunk']}"
          doc_chunks.append(chunk_doc)
  return doc_chunks
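To build intuition for what `chunk_size=800` means, here is a deliberately simplified sketch that uses only the standard library. It is not the algorithm inside `RecursiveCharacterTextSplitter`, which tries the listed separators in order. The hypothetical helper `split_into_chunks` splits on whitespace only and greedily packs words into chunks.

```python
def split_into_chunks(text, chunk_size=800):
    """Greedily pack whitespace-separated words into chunks.

    Each chunk is at most chunk_size characters long, except when a single
    word is longer than chunk_size, in which case it forms its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            # Keep growing the current chunk
            current = candidate
        else:
            # The next word would overflow: close this chunk and start a new one
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

demo_text = "word " * 500           # 500 short words, about 2,500 characters
demo_chunks = split_into_chunks(demo_text)
print(len(demo_chunks), [len(c) for c in demo_chunks])
```

A long transcript becomes a list of pieces small enough to send to the model one at a time, which is what the summarization step needs.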

## **5. Code to generate summary**

In [None]:
prompt_template = """The following is a portion of a transcript from a
youtube video. Your job is to write a concise summary.

{text}

"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
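The `{text}` placeholder in the template is where each transcript chunk gets substituted. The sketch below shows that substitution with plain Python string formatting, as a stand-in for what the prompt template does. `fill_prompt` is a hypothetical helper, and the sample transcript text is made up.

```python
prompt_text = (
    "The following is a portion of a transcript from a\n"
    "youtube video. Your job is to write a concise summary.\n\n"
    "{text}\n\n"
)

def fill_prompt(template, **variables):
    """Substitute named placeholders such as {text} into the template."""
    return template.format(**variables)

filled = fill_prompt(prompt_text, text="today we're going to get started with LangChain")
print(filled)
```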

Select the model you want to use (gpt-4 or gpt-3.5-turbo).

In [None]:
from langchain.chat_models import ChatOpenAI

model_name='gpt-4'

# model_name='gpt-3.5-turbo'

In [None]:
llm = ChatOpenAI(model_name=model_name,temperature=0.3,openai_api_key=OPENAI_KEY)

In [None]:
doc_chunks=text_to_doc(text_arr)

In [None]:
chain = load_summarize_chain(llm, chain_type="map_reduce",map_prompt=PROMPT, combine_prompt=PROMPT)
summary = chain.run(doc_chunks)
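The `map_reduce` chain type summarizes each chunk on its own (the map step) and then summarizes the combined partial summaries (the reduce step). The sketch below shows that pattern in plain Python under simplifying assumptions: `first_sentence` is a hypothetical stand-in for the LLM call, and the real chain also has to deal with details such as token limits, which are ignored here.

```python
def map_reduce_summarize(chunks, summarize):
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    combined = "\n".join(partial_summaries)
    return summarize(combined)                                  # reduce step

def first_sentence(text):
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."

sample_chunks = [
    "LangChain is a framework. It is new.",
    "It chains LLM calls. It is popular.",
]
print(map_reduce_summarize(sample_chunks, first_sentence))
# LangChain is a framework.
```

Because every chunk is summarized independently, the approach works for videos of any length. The trade-off is one model call per chunk plus the final combine call.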

## **6. Summary Output**

In [None]:
summary

'This video introduces Langchain, a popular NLP framework that uses large language models for building applications like chatbots, generative question answering, summarization, and web search. The core components of Langchain include prompt templates, large language models, agents, and memory. The video demonstrates how to use Langchain with Hugging Face and OpenAI endpoints, providing examples of question-answering tasks and discussing the limitations of certain models. The speaker also explains how to obtain API tokens and set up Azure OpenAI resources. The video serves as a quick introduction to Langchain, with plans to cover the library in more detail in future videos.'

## **7. Generate LinkedIn article from the summary**

In [None]:
llm = ChatOpenAI(model_name=model_name,temperature=0.3,openai_api_key=OPENAI_KEY)

prompt_template = """Based on the following summary from a YouTube video, please create a LinkedIn article that I could post.

{context}

"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context"]
)

In [None]:
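from langchain import LLMChain, OpenAI

# Note: this chain uses the OpenAI completion wrapper, not the ChatOpenAI `llm` defined above
chatgpt_chain = LLMChain(
    llm=OpenAI(temperature=0.5, openai_api_key=OPENAI_KEY),
    prompt=PROMPT,
    verbose=True,  # Print the formatted prompt when the chain runs
)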

In [None]:
LinkedIn_article = chatgpt_chain.predict(context=summary)

> Entering new LLMChain chain...
Prompt after formatting:
Based on the following summary from a YouTube video, please create a LinkedIn article that I could post.

The video discusses the potential impact of AI on various aspects of society, including politics, economics, and religion. The speaker emphasizes the importance of understanding the capabilities of AI and regulating it to prevent harm. The video also discusses the potential dangers of AI chatbots forming intimate relationships with humans and manipulating their opinions and well-being. The speaker argues that AI is an alien intelligence that could destroy civilization if not regulated. The video ends with a discussion on the difficulty of regulating AI and the need to understand the trade-offs between regulation and open science and data initiatives.

> Finished chain.


In [None]:
LinkedIn_article

'\n\nAs Artificial Intelligence (AI) continues to rapidly evolve, it is essential to understand the implications it has on our society. AI has the potential to impact politics, economics, and even religion. It is therefore critical to regulate AI to prevent any harm it may cause. \n\nOne of the most concerning implications of AI is the potential for AI chatbots to form intimate relationships with humans and manipulate their opinions and well-being. This could potentially lead to disastrous consequences, as AI is an alien intelligence that could destroy civilization if not regulated. \n\nThe difficulty of regulating AI is evident, as it requires us to understand and make trade-offs between regulation and open science and data initiatives. It is important to recognize the potential dangers of AI and create regulations that protect us from the risks it poses. \n\nWe must be aware of the power of AI and take the necessary steps to ensure it is regulated in a safe and responsible manner. #A

## **Enjoyed this Tutorial?**
☕ Support me by Buying Me a [Chai/Coffee](https://bit.ly/BuymeaCoffeeAS)

## **Want to Learn More About AI?**
💻 Courses in AI [Enroll Now](https://www.augmentedstartups.com/ai-starter-pack).  <br>
⭐ Download other Projects at the [AI Vision Store](https://store.augmentedstartups.com)<br>
▶️ Subscribe to my [YouTube Channel](https://www.youtube.com/channel/UCFJPdVHPZOYhSyxmX_C_Pew)
