<a href="https://colab.research.google.com/github/LianaN/TextLab/blob/main/YouTube_Video_Summarization_Using_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [16]:
#! pip install openai
#! pip install langchain
#! pip install yt_dlp
#! pip install pydub
#! pip install python-dotenv

# 1. Import Libraries

In [62]:
import os
from dotenv import load_dotenv, find_dotenv
import sys
sys.path.append("../..")

import openai

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

from langchain.chains.summarize import load_summarize_chain
from langchain.chains import LLMChain

from langchain.agents import initialize_agent, AgentType

from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit
from langchain.tools.azure_cognitive_services.text2speech import AzureCogsText2SpeechTool

# 2. Set Configuration

In [2]:
#_ = load_dotenv(find_dotenv())

In [35]:
os.environ["OPENAI_API_KEY"] = "..."
os.environ["AZURE_COGS_KEY"] = "..."
os.environ["AZURE_COGS_ENDPOINT"] = "..."
os.environ["AZURE_COGS_REGION"] = "..."
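Hard-coding keys in a cell makes them easy to leak when the notebook is shared. A minimal sketch, assuming the four variables above are already present in the process environment (for example after running the commented-out `load_dotenv` cell): the helper `require_env` is hypothetical, not part of LangChain or `python-dotenv`, and only checks that each name is set and non-empty.

```python
import os

# Names taken from the configuration cell above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AZURE_COGS_KEY",
    "AZURE_COGS_ENDPOINT",
    "AZURE_COGS_REGION",
]


def require_env(names, environ=None):
    """Return {name: value} for every name; raise if any is missing or empty.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling `require_env(REQUIRED_VARS)` before building the loader would fail early with a readable message instead of failing later inside an API call.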

In [4]:
openai.api_key = os.environ["OPENAI_API_KEY"]

# 3. Load YouTube Video

In [5]:
youtube_url = "https://youtu.be/HEfHFsfGXjs"
save_dir = "docs/youtube"
loader = GenericLoader(
    YoutubeAudioLoader([youtube_url], save_dir),
    OpenAIWhisperParser()
)
docs = loader.load()

[youtube] Extracting URL: https://youtu.be/HEfHFsfGXjs
[youtube] HEfHFsfGXjs: Downloading webpage
[youtube] HEfHFsfGXjs: Downloading ios player API JSON
[youtube] HEfHFsfGXjs: Downloading android player API JSON
[youtube] HEfHFsfGXjs: Downloading m3u8 information
[info] HEfHFsfGXjs: Downloading 1 format(s): 140
[download] docs/youtube/The most unexpected answer to a counting puzzle.m4a has already been downloaded
[download] 100% of    4.82MiB
[ExtractAudio] Not converting audio docs/youtube/The most unexpected answer to a counting puzzle.m4a; file is already in target format m4a
Transcribing part 1!
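The `Transcribing part 1!` line suggests the parser works through the audio in numbered parts. Assuming a longer video yields more than one entry in `docs` (this short clip produced a single part), the pieces can be joined into one transcript. The sketch below uses a hypothetical stand-in class `Part` in place of a LangChain document so that it runs without any API calls; only the `page_content` attribute is assumed.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical stand-in for a loaded document (only page_content is used)."""
    page_content: str


def join_transcript(parts):
    """Concatenate the text of all non-empty parts, in order, separated by spaces."""
    texts = (part.page_content.strip() for part in parts)
    return " ".join(text for text in texts if text)


parts = [Part("First chunk of speech. "), Part(" Second chunk.")]
print(join_transcript(parts))  # prints: First chunk of speech. Second chunk.
```

With the real loader, `join_transcript(docs)` would work the same way as long as each item exposes `page_content`.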


In [14]:
docs[0].page_content

"Sometimes, math and physics conspire in ways that just feel too good to be true. Let's play a strange sort of mathematical croquet. We're going to have two sliding blocks and a wall. The first block starts by coming in at some velocity from the right, while the second one starts out stationary. Being overly idealistic physicists, let's assume that there's no friction, and all of the collisions are perfectly elastic, which means no energy is lost. The astute among you might complain that such collisions would make no sound, but your goal here is going to be to count how many collisions take place. So in slight conflict with that assumption, I want to leave a little clack sound to better draw your attention to that count. The simplest case is when both blocks have the same mass. The first block hits the second, transferring all of its momentum, then the second one bounces off the wall, and then transfers all of its momentum back to the first, which then sails off towards infinity. Three

In [54]:
llm = OpenAI(temperature=0, max_tokens=2000)

# 4. Summarize Transcription Text

## 4.1. Summarize using "load_summarize_chain"

In [55]:
chain = load_summarize_chain(llm, chain_type="stuff")
summary = chain.run(docs)
print(summary)

 This video explores the strange mathematical phenomenon of pi appearing in the number of collisions between two blocks of different masses when they are sent sliding on a frictionless surface. The video explains the relevant physics and provides two methods for understanding why pi appears in this situation. It encourages viewers to take a stab at understanding the phenomenon themselves.


## 4.2. Summarize using "load_summarize_chain" and prompt instruction

In [56]:
prompt_template = """Write a concise summary of the following: {text}"""
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
summary = chain.run(docs)
print(summary)



This article discusses a mathematical phenomenon in which the number of collisions between two blocks of different masses is equal to the digits of pi. The article explains that if the first block is 100 times the mass of the second, there will be 31 collisions, and if the first block is 1,000,000 times the mass of the second, there will be 3,141 collisions. The article explains that this phenomenon is due to the conservation of energy, and encourages readers to try to figure out why this is the case.
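Both runs above use `chain_type="stuff"`, which places the whole transcript into a single prompt through the `{text}` variable. The sketch below imitates that step in plain Python to show what the model receives. The function `stuff_prompt` and the `"\n\n"` separator are assumptions for illustration, not LangChain's actual implementation.

```python
def stuff_prompt(template, texts, separator="\n\n"):
    """Join all texts with the separator and substitute the result for {text}.

    Hypothetical illustration of the "stuff" strategy.
    """
    return template.format(text=separator.join(texts))


template = "Write a concise summary of the following: {text}"
prompt_text = stuff_prompt(template, ["part one", "part two"])
print(prompt_text)
print(len(prompt_text))  # rough size check, in characters
```

Because everything goes into one request, a long transcript can exceed the model's context window. In that case `chain_type="map_reduce"` or `chain_type="refine"` in `load_summarize_chain` is worth trying instead.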


# 5. Translate the Summary into Russian

In [57]:
prompt = PromptTemplate.from_template("Translate this summary from English to Russian: {summary}")
chain = LLMChain(llm=llm, prompt=prompt)
summary_ru = chain.run(summary)
print(summary_ru)



В этой статье рассматривается математическое явление, при котором количество столкновений между двумя блоками разной массы равно цифрам пи. Статья объясняет, что если первый блок в 100 раз больше второго, будет 31 столкновение, а если первый блок в 1 000 000 раз больше второго, будет 3 141 столкновение. Статья объясняет, что это явление обусловлено законом сохранения энергии и призывает читателей попробовать выяснить, почему это так.


# 6. Convert the Summary into Speech (Text-to-Speech)

In [41]:
#!pip install --upgrade azure-ai-formrecognizer > /dev/null
#!pip install --upgrade azure-cognitiveservices-speech > /dev/null
#!pip install --upgrade azure-ai-vision > /dev/null

In [58]:
toolkit = AzureCognitiveServicesToolkit()
[tool.name for tool in toolkit.get_tools()]

['azure_cognitive_services_form_recognizer',
 'azure_cognitive_services_speech2text',
 'azure_cognitive_services_text2speech',
 'azure_cognitive_services_image_analysis']

In [70]:
# This tool converts text to speech, so it is named text2speech.
text2speech = AzureCogsText2SpeechTool(speech_language="ru-RU", speech_config={"speech_synthesis_voice_name": "ru-RU-SvetlanaNeural"})

In [71]:
agent = initialize_agent(
    tools=[text2speech],  # alternative: toolkit.get_tools()
    llm=llm,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

In [72]:
agent.run(f"""Convert the summary text to speech using text-to-speech: {summary_ru}""")



> Entering new AgentExecutor chain...
Thought: I need to use the azure_cognitive_services_text2speech tool to convert the text to speech.
Action:
```
{
  "action": "azure_cognitive_services_text2speech",
  "action_input": "В этой статье рассматривается математическое явление, при котором количество столкновений между двумя блоками разной массы равно цифрам пи. Статья объясняет, что если первый блок в 100 раз больше второго, будет 31 столкновение, а если первый блок в 1 000 000 раз больше второго, будет 3 141 столкновение. Статья объясняет, что это явление обусловлено законом сохранения энергии и призывает читателей попробовать выяснить, почему это так."
}
```
Observation: /tmp/tmpkbgxq1zp.wav
Thought: I now have the audio file of the text
Final Answer: /tmp/tmpkbgxq1zp.wav

> Finished chain.


'/tmp/tmpkbgxq1zp.wav'
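The agent returns only a file path, so a quick sanity check on the audio can be useful. A minimal sketch, assuming the output is a plain PCM WAV file that Python's standard `wave` module can read. The helper `wav_duration_seconds` is hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

For example, `wav_duration_seconds("/tmp/tmpkbgxq1zp.wav")` would report how long the generated speech is. A result near zero would hint that synthesis produced no audio. Inside a notebook, `IPython.display.Audio` can play the same file.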