# YouTube

## Package Installation and Environment Setup

### Installation

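Before starting, install the required packages. The two core ones are LangChain and OpenAI: LangChain is typically used to integrate one or more models, vector databases, APIs, and so on, and because this chapter calls the OpenAI API, the OpenAI SDK is needed as well. The remaining packages handle downloading YouTube audio, speech transcription, and subtitle files.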

In [None]:
!pip install openai
!pip install langchain
!pip install pytube  # For downloading audio
!pip install git+https://github.com/openai/whisper.git -q  # OpenAI's Whisper speech-recognition model
!pip install youtube-transcript-api  # Used by LangChain's YoutubeLoader to fetch captions
!pip install pysrt  # Used by LangChain's SRTLoader to parse .srt files



### Environment Variables

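This chapter uses an OpenAI API key, set as the `OPENAI_API_KEY` environment variable so that LangChain's OpenAI wrapper can pick it up.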

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "sk-xxx"  # Replace with your own key, which looks like sk-XXXX
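Hardcoding the key in a notebook risks leaking it when the notebook is shared. A minimal alternative, assuming you run the notebook interactively, is to prompt for the key with the standard-library `getpass` only when it is not already set. `ensure_openai_key` is a hypothetical helper name, not part of LangChain or the OpenAI SDK:

```python
import os
from getpass import getpass


def ensure_openai_key() -> str:
    """Hypothetical helper: return OPENAI_API_KEY, prompting only if it is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # getpass does not echo the input, so the key never appears in the cell source
        key = getpass("OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key


# Usage in a notebook cell:
# ensure_openai_key()
```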

## Loading YouTube Videos

*   YoutubeLoader
*   pytube

### The YoutubeLoader Approach

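LangChain provides a loader, `YoutubeLoader`, that reads a YouTube video's captions (transcript). It only works for videos that have captions available.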

Official documentation: [Youtube_download_loader](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/youtube.html#youtube)

Source code: [langchain.document_loaders.youtube](https://python.langchain.com/en/latest/_modules/langchain/document_loaders/youtube.html)

In [None]:
from langchain.llms import OpenAI
from langchain.document_loaders import YoutubeLoader

# No captions  => https://www.youtube.com/watch?v=bSvTVREwSNw&ab_channel=ByteByteGo
# Has captions => https://www.youtube.com/watch?v=C_78DM8fG6E
# YoutubeLoader reads existing captions, so use the video that has them
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=C_78DM8fG6E")
result = loader.load()
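`from_youtube_url` roughly works in two steps: it extracts the video ID from the URL, then fetches that video's transcript through `youtube-transcript-api` (which is why that package is installed). The first step can be sketched with the standard library alone. `extract_video_id` below is a hypothetical helper that handles only the two common URL shapes, not LangChain's own implementation:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: pull the video ID out of a YouTube URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>&...
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=bSvTVREwSNw&ab_channel=ByteByteGo"))  # bSvTVREwSNw
print(extract_video_id("https://youtu.be/C_78DM8fG6E"))  # C_78DM8fG6E
```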

In [None]:
result

[Document(page_content='We started OpenAI seven years ago because we felt like something\nreally interesting was happening in AI and we wanted to help steer it\nin a positive direction. It\'s honestly just really amazing to see how far this whole field\nhas come since then. And it\'s really gratifying to hear\nfrom people like Raymond who are using the technology\nwe are building, and others, for so many wonderful things. We hear from people who are excited, we hear from people who are concerned, we hear from people who feel\nboth those emotions at once. And honestly, that\'s how we feel. Above all, it feels like we\'re entering\nan historic period right now where we as a world\nare going to define a technology that will be so important\nfor our society going forward. And I believe that we can\nmanage this for good. So today, I want to show you\nthe current state of that technology and some of the underlying\ndesign principles that we hold dear. So the first thing I\'m going to show yo

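### The pytube Approach

When a video has no captions, YoutubeLoader has nothing to load. An alternative is to download the video's audio track with pytube and generate the transcript yourself with Whisper.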

In [None]:
from pytube import YouTube

In [None]:
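url = "https://www.youtube.com/watch?v=bSvTVREwSNw&ab_channel=ByteByteGo"
# use_oauth=True signs in through Google's device flow (see the prompt printed below);
# allow_oauth_cache=False means the token is not cached, so you sign in again on each run
video = YouTube(url, use_oauth=True, allow_oauth_cache=False)
audio = video.streams.filter(only_audio=True).first()  # First audio-only stream
video_title = video.title
video.streams.get_highest_resolution().filesize  # Size in bytes of the highest-resolution progressive stream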

Please open https://www.google.com/device and input code JYXV-CDNB
Press enter when you have completed this step.enter


16386171

In [None]:
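audio = video.streams.get_audio_only()  # Highest-bitrate audio-only stream (an .mp4 audio file)
# output_path is a folder, so the audio is saved as tmp.mp3/sample.mp4
fn = audio.download(output_path="tmp.mp3", filename="sample.mp4")  # Downloads only the audio track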

## No Captions? Whisper Is What You Need

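OpenAI's Whisper is a general-purpose speech recognition model trained on a large, diverse dataset of audio. It performs well at multilingual speech recognition, speech translation, and language identification: it can transcribe speech, translate speech in other languages into English, and identify the language being spoken. That makes it a practical tool for producing transcripts of videos that have no captions.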

Official repository: https://github.com/openai/whisper

### Loading the Whisper Model

In [None]:
import whisper 

In [None]:
model = whisper.load_model("base")  # Sizes range from "tiny" to "large"; larger models are more accurate but slower

In [None]:
audio_file_path = fn  # Path returned by audio.download() above
transcription = model.transcribe(audio_file_path)  # Takes about a minute

In [None]:
res = transcription['text']
print(res)

 In this video, we take a look at how Chet GPD works. We learn a lot from making this video. We hope you will learn something too. Let's dive right in. Chet GPD was released on November 30, 2022. It reached 100 million monthly active users in just 2 months. We took Instagram 2.5 years to reach the same milestone. This is the fastest growing app in history. Now how does Chet GPD work? The heart of Chet GPD is an LLM or a large language model. The default LLM for Chet GPD is GPT 3.5. Chet GPD could also use the latest GPT4 model. But there is not much technical details on GPT4 yet for us to talk about. Now what is a large language model? A large language model is a type of neural network-based model that is trained on massive amounts of text data to understand and generate human languages. The model uses the training data to learn the statistical patterns and relationships between words in the language and then utilizes this knowledge to predict the subsequent words, one word at a time. 

### Writing the Segments to an SRT Subtitle File

In [None]:
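def format_timestamp(seconds):
    # SRT timestamps use the form HH:MM:SS,mmm
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def transcribe_audio(transcription, video_filename="example_video"):
    """Write Whisper's segments to an SRT subtitle file and return its path."""
    srt_filename = f"{video_filename}.srt"
    # Open once in "w" mode so re-running the cell does not append duplicate entries
    with open(srt_filename, "w", encoding="utf-8") as srt_file:
        for segment in transcription["segments"]:
            start_time = format_timestamp(segment["start"])
            end_time = format_timestamp(segment["end"])
            text = segment["text"].strip()  # Whisper segments usually start with a space
            segment_id = segment["id"] + 1  # Whisper IDs start at 0, SRT numbering at 1
            srt_file.write(f"{segment_id}\n{start_time} --> {end_time}\n{text}\n\n")

    return srt_filename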

In [None]:
transcribe_audio(transcription)

'example_video.srt'

### Loading with SRTLoader

In [None]:
from langchain.document_loaders import SRTLoader
loader = SRTLoader("example_video.srt")
data = loader.load()
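SRTLoader turns the whole subtitle file into a single `Document`: it parses the cues (LangChain does this with pysrt) and joins their text. The stdlib-only function below is a hypothetical approximation of that parsing step, useful for seeing what ends up in `page_content`; it is not LangChain's code and ignores edge cases such as malformed cues:

```python
import re


def srt_to_text(srt_content: str) -> str:
    """Hypothetical helper: join the text of every SRT cue with spaces."""
    texts = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_content.strip()):
        lines = block.splitlines()
        # lines[0] is the cue number, lines[1] the "start --> end" timing line
        if len(lines) >= 3 and "-->" in lines[1]:
            texts.append(" ".join(line.strip() for line in lines[2:]))
    return " ".join(texts)


sample = """1
00:00:00,000 --> 00:00:04,000
In this video, we take a look at how Chet GPD works.

2
00:00:04,000 --> 00:00:07,000
We learn a lot from making this video.
"""
print(srt_to_text(sample))
```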

In [None]:
data

[Document(page_content="In this video, we take a look at how Chet GPD works. We learn a lot from making this video. We hope you will learn something too. Let's dive right in. Chet GPD was released on November 30, 2022. It reached 100 million monthly active users in just 2 months. We took Instagram 2.5 years to reach the same milestone. This is the fastest growing app in history. Now how does Chet GPD work? The heart of Chet GPD is an LLM or a large language model. The default LLM for Chet GPD is GPT 3.5. Chet GPD could also use the latest GPT4 model. But there is not much technical details on GPT4 yet for us to talk about. Now what is a large language model? A large language model is a type of neural network-based model that is trained on massive amounts of text data to understand and generate human languages. The model uses the training data to learn the statistical patterns and relationships between words in the language and then utilizes this knowledge to predict the subsequent word

### Splitting the Loaded Documents

In [None]:
from langchain.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter.from_tiktoken_encoder(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(data)
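`TokenTextSplitter` measures length in tiktoken tokens: with `chunk_size=1000` and `chunk_overlap=0`, each chunk holds at most 1000 tokens and consecutive chunks share none. The sketch below shows the same sliding-window idea on a plain list, where single characters stand in for real tokens (an assumption for illustration; `split_tokens` is hypothetical, not LangChain's code):

```python
from typing import List, Sequence


def split_tokens(tokens: Sequence, chunk_size: int, chunk_overlap: int) -> List[list]:
    """Hypothetical sketch of fixed-size chunking with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far the window moves each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # The last window already reached the end
    return chunks


tokens = list("abcdefg")
print(split_tokens(tokens, chunk_size=3, chunk_overlap=0))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
print(split_tokens(tokens, chunk_size=3, chunk_overlap=1))  # [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```

A non-zero overlap repeats a few tokens at each boundary, which helps when a sentence straddles two chunks, at the cost of sending some text to the model twice.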

In [None]:
docs[0]

Document(page_content="In this video, we take a look at how Chet GPD works. We learn a lot from making this video. We hope you will learn something too. Let's dive right in. Chet GPD was released on November 30, 2022. It reached 100 million monthly active users in just 2 months. We took Instagram 2.5 years to reach the same milestone. This is the fastest growing app in history. Now how does Chet GPD work? The heart of Chet GPD is an LLM or a large language model. The default LLM for Chet GPD is GPT 3.5. Chet GPD could also use the latest GPT4 model. But there is not much technical details on GPT4 yet for us to talk about. Now what is a large language model? A large language model is a type of neural network-based model that is trained on massive amounts of text data to understand and generate human languages. The model uses the training data to learn the statistical patterns and relationships between words in the language and then utilizes this knowledge to predict the subsequent words

## Time to Call a Chain: load_summarize_chain

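`load_summarize_chain` builds a chain dedicated to summarization and supports several chain types:


* `stuff`: The simplest method. All the relevant data is placed into a single prompt as context and passed to the language model in one call.
* `map_reduce`: An initial prompt is run on each chunk separately; in a summarization task, each chunk yields its own summary. A combine prompt then merges these chunk summaries into the final summary.
* `refine`: An output is first generated from the first document. For each remaining document, the current output is passed in together with the next document, and the LLM is asked to refine the output based on the new document.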
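To make the difference in LLM calls concrete, here is a hypothetical sketch of the three strategies, where `llm` is any function from a prompt string to a completion. It is not LangChain's implementation (its prompts and bookkeeping differ); it only shows the call pattern: `stuff` makes one call, `map_reduce` makes one call per chunk plus a combine call, and `refine` makes one call per chunk in sequence.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # Hypothetical: any function mapping a prompt to a completion


def stuff(llm: LLM, docs: List[str]) -> str:
    # One call: every document goes into a single prompt
    return llm("Summarize:\n" + "\n".join(docs))


def map_reduce(llm: LLM, docs: List[str]) -> str:
    # Map: summarize each chunk independently
    partial = [llm("Summarize:\n" + d) for d in docs]
    # Reduce: combine the partial summaries into one
    return llm("Combine these summaries:\n" + "\n".join(partial))


def refine(llm: LLM, docs: List[str]) -> str:
    # Summarize the first chunk, then refine that summary with each later chunk
    summary = llm("Summarize:\n" + docs[0])
    for d in docs[1:]:
        summary = llm(f"Existing summary:\n{summary}\nRefine it with:\n{d}")
    return summary


# A fake LLM that records every prompt it receives
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary#{len(calls)}"


chunks = ["chunk one", "chunk two", "chunk three"]
for strategy in (stuff, map_reduce, refine):
    calls.clear()
    strategy(fake_llm, chunks)
    print(strategy.__name__, len(calls))  # stuff 1, map_reduce 4, refine 3
```

`stuff` is cheapest but fails once the text exceeds the model's context window; `map_reduce` parallelizes well; `refine` is sequential but lets each step see the running summary.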



Official documentation: [load_summarize_chain](https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html)

Source code: [summarizing chains](https://github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/__init__.py)


In [None]:
from langchain.chains.summarize import load_summarize_chain
from langchain import OpenAI

In [None]:
llm = OpenAI(temperature=0)  # temperature=0 minimizes sampling randomness
chain = load_summarize_chain(llm, chain_type="refine", verbose=False)
chain.run(docs)

"\n\nChet GPD is a large language model (LLM) released in November 2022 that reached 100 million monthly active users in just two months. It is the fastest growing app in history and uses GPT 3.5 as its default LLM. The model is trained on 500 billion tokens of internet data and is fine tuned with Aula XF by gathering feedback from people, creating a reward model based on their preferences, and then iteratively improving the model's performance using PPO. This allows GPT 3.5 to generate better responses tailored to specific user requests. Additionally, it is context aware, thanks to conversational prompt injection and primary prompt engineering, and includes a moderation API to warn or block certain types of unsafe content."

## Want a Different Kind of Summary? You Need a Prompt

A "prompt template" is a reproducible way to generate prompts. It contains a text string (the "template") that accepts a set of parameters from the end user and produces a prompt.
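By default, LangChain's `PromptTemplate` uses Python f-string style placeholders such as `{text}`. As a rough stdlib illustration (not LangChain's code), `string.Formatter` can list a template's placeholders, which is what `input_variables` declares, and `str.format` fills them in. `template_variables` is a hypothetical helper name:

```python
from string import Formatter


def template_variables(template: str) -> list:
    """Hypothetical helper: list the {placeholder} names in an f-string style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion) tuples;
    # field_name is None for trailing literal text
    return sorted({field for _, field, _, _ in Formatter().parse(template) if field})


template = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
print(template_variables(template))  # ['text']
print(template.format(text="Whisper is a general-purpose speech recognition model."))
```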

Official documentation: [Prompt template](https://python.langchain.com/en/latest/modules/prompts/prompt_templates/getting_started.html#what-is-a-prompt-template)

In [None]:
from langchain.prompts import PromptTemplate

# Map prompt: applied to each chunk separately
map_prompt = """Write a concise summary of the following short transcript from a YouTube video.
Don't add your opinions or interpretations.


{text}


CONCISE SUMMARY:"""

# Combine prompt: merges the per-chunk summaries
combine_prompt = """You have been provided with summaries of chunks of a transcript from a YouTube video.
Your task is to merge these intermediate summaries into a brief but comprehensive summary of the entire video.
The summary should cover all the crucial points of the video.
Ensure that the summary is at least two paragraphs long and effectively captures the essence of the video.
{text}


SUMMARY:"""

map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

# initialize the summarizer chain
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    return_intermediate_steps=True,  # Also return the per-chunk (map) summaries
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
)

summary = chain({"input_documents": docs}, return_only_outputs=True)

In [None]:
summary['output_text']

"\nChet GPD is a revolutionary app that was released in November 2022 and quickly gained 100 million monthly active users, making it the fastest growing app in history. It uses a large language model (LLM) called GPT 3.5, which is a type of neural network-based model trained on massive amounts of text data. To further refine the model, Chet GPD uses feedback from people, creating a reward model based on their preferences, and then iteratively improving the model's performance using proximal policy optimization. Chat GPT is a complex technology that uses context awareness, primary prompt engineering, and a moderation API to generate responses to user prompts, and is constantly evolving and reshaping the way we communicate."