In [14]:
!pip show youtube-transcript-api

Name: youtube-transcript-api
Version: 1.2.4
Summary: This is a python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles and it does not require a headless browser, like other selenium based solutions do!
Home-page: https://github.com/jdepoix/youtube-transcript-api
Author: Jonas Depoix
Author-email: jonas.depoix@web.de
License: MIT
Location: C:\Users\Admin\Desktop\Codes\Gen-AI-RAG-LLMs-HuggingFace-Agents-\genvenv\Lib\site-packages
Requires: defusedxml, requests
Required-by: 


In [33]:
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_text_splitters import RecursiveCharacterTextSplitter
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

In [16]:
llm = ChatOllama(
    model='qwen2:7b'
)

embeddings = OllamaEmbeddings(
    model="nomic-embed-text"
)

In [17]:
video_id = 'wjZofJX0v4M'
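
The ID above is the `v=` part of a YouTube watch URL. If you start from a full URL instead, a small helper can pull the ID out. This is only a sketch: `extract_video_id` is a hypothetical helper, not part of `youtube-transcript-api`, and it assumes only the two common URL shapes (`watch?v=...` and `youtu.be/...`).

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input unchanged if it is not a URL."""
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        # No scheme: assume the caller already passed a bare video ID.
        return url_or_id
    if parsed.netloc.lower().endswith("youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    query_ids = parse_qs(parsed.query).get("v")
    if query_ids:
        # Watch links carry the ID in the query string: ...?v=<id>
        return query_ids[0]
    raise ValueError(f"no video ID found in {url_or_id!r}")


video_id = extract_video_id("https://www.youtube.com/watch?v=wjZofJX0v4M")
print(video_id)
```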

In [18]:
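from youtube_transcript_api import YouTubeTranscriptApi

yt_transcript_api = YouTubeTranscriptApi()

# fetch() returns the English transcript as a sequence of snippets;
# each snippet exposes its caption text via .text
transcript_list = yt_transcript_api.fetch(video_id, languages=["en"])

# Join the snippet texts into one string so it can be split into chunks later
transcript = " ".join(snippet.text for snippet in transcript_list)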

print(transcript)


The initials GPT stand for Generative Pretrained Transformer. So that first word is straightforward enough, these are bots that generate new text. Pretrained refers to how the model went through a process of learning from a massive amount of data, and the prefix insinuates that there's more room to fine-tune it on specific tasks with additional training. But the last word, that's the real key piece. A transformer is a specific kind of neural network, a machine learning model, and it's the core invention underlying the current boom in AI. What I want to do with this video and the following chapters is go through a visually-driven explanation for what actually happens inside a transformer. We're going to follow the data that flows through it and go step by step. There are many different kinds of models that you can build using transformers. Some models take in audio and produce a transcript. This sentence comes from a model going the other way around, producing synthetic speech just from

In [19]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(transcript)
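
To build intuition for `chunk_size` and `chunk_overlap`, here is a plain sliding-window sketch. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and spaces. `sliding_window_chunks` is a hypothetical name used only for this illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    start = 0
    while start < len(text):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # This window already reached the end of the text.
            break
        start += step
    return windows


sample_text = "abcdefghij" * 250  # 2500 characters of dummy text
demo_chunks = sliding_window_chunks(sample_text)
print([len(c) for c in demo_chunks])
# The tail of one window is repeated at the head of the next one.
print(demo_chunks[0][-200:] == demo_chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.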

In [20]:
from langchain_community.vectorstores import FAISS

# Embed every chunk with the Ollama embedding model and index the vectors in FAISS
vector_store = FAISS.from_texts(chunks, embedding=embeddings)


In [21]:
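# _dict is a private attribute of the in-memory docstore; it is read here only to inspect the stored IDs
vector_store.docstore._dict.keys()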


dict_keys(['991817ce-ab30-446a-8d10-dcee4d9b8e72', '5bc421b6-4ff6-47d7-882e-86c37c3f112e', 'f2515934-9bd7-48cc-8100-c72b69dd229c', '3cb26c3d-d070-4d73-af15-bb9fa0c0a02b', 'c1c7c892-df74-4f33-a0af-f023ecf0c8cb', '73d0b64d-8a67-4855-b235-c0f426c387c7', '0d6403b7-895d-4f4b-a326-c440645e326b', '3fc5342b-1be1-49f9-8b56-9cbb9488210c', 'bfd70367-792a-4265-8a67-383c3eac30cc', 'c236c9f4-70c6-4373-87cd-3a01d0d7aa77', '5a5be2a3-e4bd-4a47-88a7-feeb7e9d87e8', '9272e5c6-38f5-400d-8bd8-776b3130178d', '80a22a93-d2ed-47c8-a1e5-e08f4fd1d734', '3982d458-fb5c-4c8c-b8b1-1609449bbc05', '373947e7-33b4-4c28-a479-c9d0c2830911', '2e5a5969-ddb2-45aa-bb21-a9b021c52125', '04c99608-a874-4e06-8abd-a997458cf105', '8166e769-ec17-422b-9350-b168d5aba5a0', '96527363-6ed6-48dd-bf96-4a9c38c61adb', '067721d4-1aae-464e-a685-a6b2ba445a3d', 'ea046fef-a210-41f1-a4d9-fb0ccb798f8f', 'c868965a-da58-49e4-be0f-f6d6c60fa845', '27ebcf96-a841-475d-ba06-7be664520ab6', '4172333f-3eb0-414b-a94c-d32039e3ea5d', '150ce575-6367-4c18-8f8c-8264

In [23]:
vector_store.get_by_ids(['5bc421b6-4ff6-47d7-882e-86c37c3f112e'])

[Document(id='5bc421b6-4ff6-47d7-882e-86c37c3f112e', metadata={}, page_content="of models that you can build using transformers. Some models take in audio and produce a transcript. This sentence comes from a model going the other way around, producing synthetic speech just from text. All those tools that took the world by storm in 2022 like DALL-E and Midjourney that take in a text description and produce an image are based on transformers. Even if I can't quite get it to understand what a pi creature is supposed to be, I'm still blown away that this kind of thing is even remotely possible. And the original transformer introduced in 2017 by Google was invented for the specific use case of translating text from one language into another. But the variant that you and I will focus on, which is the type that underlies tools like ChatGPT, will be a model that's trained to take in a piece of text, maybe even with some surrounding images or sound accompanying it, and produce a prediction for 

In [24]:
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={"k": 4})

In [25]:
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001E461CDF890>, search_kwargs={'k': 4})
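
Conceptually, a `similarity` search with `k=4` embeds the question and returns the four stored vectors closest to it. The sketch below shows that idea on toy 2-D vectors with a brute-force scan. It assumes ranking by Euclidean distance (the notebook does not show which metric the index uses, so treat that as an assumption), and `top_k_by_distance` is a hypothetical helper, not a FAISS or LangChain function.

```python
import math


def top_k_by_distance(query_vec, doc_vecs, k=4):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
nearest = top_k_by_distance([0.0, 0.0], toy_vectors, k=2)
print([index for index, _ in nearest])
```

A real vector index avoids scanning every vector for large collections, but the returned ranking follows the same principle.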

In [27]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables=['context', 'question']
)
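
What the template does can be mimicked with plain `str.format`: the `{context}` and `{question}` placeholders are replaced by the supplied values. This is a sketch of the substitution only, with a flattened copy of the template text. `TEMPLATE` and `fill_prompt` are illustrative names, and `PromptTemplate` itself returns a prompt value object rather than a bare string.

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n\n"
    "{context}\n"
    "Question: {question}\n"
)


def fill_prompt(context: str, question: str) -> str:
    """Substitute the two placeholders and return the finished prompt text."""
    return TEMPLATE.format(context=context, question=question)


example_prompt = fill_prompt(
    context="A transformer is a specific kind of neural network.",
    question="What is a transformer?",
)
print(example_prompt)
```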

In [28]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [29]:
retrieved_docs

[Document(id='c236c9f4-70c6-4373-87cd-3a01d0d7aa77', metadata={}, page_content="but at a high level this is the idea. In this chapter, you and I are going to expand on the details of what happens at the very beginning of the network, at the very end of the network, and I also want to spend a lot of time reviewing some important bits of background knowledge, things that would have been second nature to any machine learning engineer by the time transformers came around. If you're comfortable with that background knowledge and a little impatient, you could probably feel free to skip to the next chapter, which is going to focus on the attention blocks, generally considered the heart of the transformer. After that, I want to talk more about these multi-layer perceptron blocks, how training works, and a number of other details that will have been skipped up to that point. For broader context, these videos are additions to a mini-series about deep learning, and it's okay if you haven't watche

In [30]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"but at a high level this is the idea. In this chapter, you and I are going to expand on the details of what happens at the very beginning of the network, at the very end of the network, and I also want to spend a lot of time reviewing some important bits of background knowledge, things that would have been second nature to any machine learning engineer by the time transformers came around. If you're comfortable with that background knowledge and a little impatient, you could probably feel free to skip to the next chapter, which is going to focus on the attention blocks, generally considered the heart of the transformer. After that, I want to talk more about these multi-layer perceptron blocks, how training works, and a number of other details that will have been skipped up to that point. For broader context, these videos are additions to a mini-series about deep learning, and it's okay if you haven't watched the previous ones, I think you can do it out of order, but before diving into

In [31]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [32]:
answer = llm.invoke(final_prompt)
print(answer.content)

No, the topic of nuclear fusion is not discussed in this video. The video discusses deep learning, transformers, and GPT models among other related topics.


In [34]:
def format_docs(retrieved_docs):
    """Join the page_content of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [35]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [37]:
parser = StrOutputParser()

In [38]:
main_chain = parallel_chain | prompt | llm | parser
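
The chain above is function composition: the question fans out into a `context` branch and a pass-through `question` branch, the resulting dict fills the prompt, the model answers, and the parser extracts the text. The sketch below reproduces that data flow in plain Python with stand-ins. `fake_retriever` and `fake_llm` are hypothetical stubs, so nothing here calls Ollama or FAISS.

```python
def fake_retriever(question: str) -> list[str]:
    """Stand-in for `retriever`: returns canned chunk texts regardless of the question."""
    return [
        "A transformer is a specific kind of neural network.",
        "GPT stands for Generative Pretrained Transformer.",
    ]


def join_chunks(chunk_texts: list[str]) -> str:
    """Same role as format_docs: merge the chunks into one context string."""
    return "\n\n".join(chunk_texts)


def build_prompt(inputs: dict) -> str:
    """Stand-in for `prompt`: fill a minimal template from the inputs dict."""
    return f"{inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt_text: str) -> dict:
    """Stand-in for `llm`: echoes the last prompt line inside a message-like dict."""
    return {"content": "echo: " + prompt_text.splitlines()[-1]}


def parse(message: dict) -> str:
    """Stand-in for StrOutputParser: keep only the text content."""
    return message["content"]


def run_chain(question: str) -> str:
    # Parallel step: both branches receive the same input (the question).
    inputs = {
        "context": join_chunks(fake_retriever(question)),
        "question": question,
    }
    # Sequential steps: prompt, then model, then parser.
    return parse(fake_llm(build_prompt(inputs)))


print(run_chain("What does GPT stand for?"))
```

Writing the steps out like this makes it easier to see why `main_chain.invoke` takes a bare question string: every later input is derived from it.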

In [39]:
main_chain.invoke('Can you summarize the video')

"The video discusses the concept and workings of GPT models, focusing specifically on Generative Pretrained Transformers. It explains that these bots generate new text by using a neural network called a transformer, which was pivotal in the AI boom due to its ability to learn from vast amounts of data through a process of pretrained learning. The video aims to visually explain what happens inside a transformer step-by-step.\n\nThe explanation covers different kinds of models built with transformers like those that convert audio into transcripts or vice versa (producing synthetic speech from text). It also touches upon the foundational knowledge required for understanding transformer mechanisms and offers guidance on skipping parts if viewers are familiar with this background information. The video series is part of a larger one about deep learning, which can be followed in any order but requires prior knowledge for some specific details.\n\nThe video highlights that transformers use da