## Transcribing the YouTube Video
The context we want to send the model comes from a YouTube video.

Let's download the video and transcribe it using [OpenAI's Whisper](https://openai.com/index/whisper/).

In [62]:
from pathlib import Path

YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=cdiD-9MMpb0"
TEMP_DIR = Path("..") / Path("temp")
TRANSCRIPT_FILE = (Path("..") / Path("transcription.txt"))
TEMP_DIR.mkdir(parents=True, exist_ok=True)

if not TRANSCRIPT_FILE.exists():
    import tempfile
    import whisper
    from yt_dlp import YoutubeDL
    # Download into a temporary directory that is removed automatically
    with tempfile.TemporaryDirectory(dir=TEMP_DIR) as temp_dir:
        # Set up yt-dlp options
        ydl_opts = {
            "verbose": True,
            "format": "m4a/bestaudio/best",
            "outtmpl": str(Path(temp_dir) / Path("%(title)s.%(ext)s" )),
        }

        # Download the audio
        with YoutubeDL(ydl_opts) as ydl:
            error_code = ydl.download([YOUTUBE_VIDEO])
            if error_code != 0:
                raise Exception("Error downloading the video")

        # Load whisper model
        whisper_model = whisper.load_model("base")

        # Transcribe each downloaded audio file and save the text
        for audio_file in Path(temp_dir).iterdir():
            if audio_file.suffix == ".m4a":
                # Transcribe the audio
                transcription = whisper_model.transcribe(str(audio_file), fp16=False, verbose=True)["text"].strip()

                # Write the transcription to a file
                with TRANSCRIPT_FILE.open("w") as transcript:
                    transcript.write(transcription)

Let's read the transcription and display the first few characters to ensure everything works as expected.

In [None]:
with TRANSCRIPT_FILE.open("r") as file:
    transcription = file.read()

print(transcription[:100])
print(f"Length: {len(transcription)}")

## Using the entire transcription as a context

If you try to invoke the chain using the whole transcription as context, the model will probably return an **error** indicating that the context is too long.

> *LLMs support limited context size*.

For example, most books are far larger than a typical context window, so we need a way to cut the context down to its essential parts.

In [None]:
from httpx import ConnectError
from langchain_ollama.llms import OllamaLLM
from ollama import ResponseError

model = OllamaLLM(model="llama3.2")

try:
    print(model.invoke("What is 2 + 2?"))
except ConnectError as e:
    print(f"Error connecting to the model: {e}")
    print("Please make sure the model is running and try again.")
    exit(1)
except ResponseError as e:
    print(f"An error occurred: {e}")
    exit(1)
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    exit(1)

> **NOTE:** It looks like this model handles a context this large without raising an error.

In [None]:
from langchain_core.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate(input_variables=["context", "question"], template=template)

chain = prompt | model

try:
    print(chain.invoke({"context": transcription, "question": "What is the video about?"}))
except Exception as e:
    print(f"An error occurred: {e}")
    exit(1)

### How tokens work

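[Tiktokenizer](https://tiktokenizer.vercel.app/) shows how a tokenizer splits text into tokens. For example, try the [`cl100k_base` encoding](https://tiktokenizer.vercel.app/?model=cl100k_base).

As a rough rule of thumb, we can assume that one word is about one token, although English text usually needs somewhat more tokens than words.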
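
To get a feel for whether a text fits into a context window, a rough estimate is often enough. The sketch below is a heuristic, not a real tokenizer: it assumes about four characters per token for English text, and the function names and the 8,192-token window are hypothetical values chosen for illustration.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# A real tokenizer will produce different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int) -> bool:
    """Check whether the estimated token count fits a given context window."""
    return estimate_tokens(text) <= context_window


# Hypothetical sample text and context window size
sample = "Embeddings map text to vectors. " * 1000
print(f"Estimated tokens: {estimate_tokens(sample)}")
print(f"Fits in 8192 tokens: {fits_in_context(sample, context_window=8192)}")
```

For exact counts, paste the text into Tiktokenizer or use the tokenizer that belongs to the model you are running.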

## Splitting the transcription

Since we **shouldn't use** the entire transcription as the context for the model, a potential solution is to split the transcription into smaller chunks.

We can then invoke the model using only the chunks that are relevant to a particular question.

### Load text document

[TextLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.text.TextLoader.html#textloader): the `langchain_community` package provides a simple `TextLoader` that loads a text file into `Document` objects.

In [66]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader(TRANSCRIPT_FILE)
document = loader.load()

### Splitting document
There are many ways to split a document. In this example, we will use a simple splitter that breaks the document into **chunks** of at most a fixed size.



In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_splitter.split_documents(document)[:2]

In [68]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(document)
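
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not the `RecursiveCharacterTextSplitter` algorithm: the real splitter also tries to break at separators such as paragraph breaks and spaces, which this sketch ignores. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


for chunk in split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap means that a sentence cut at a chunk boundary still appears partly in the next chunk, which helps retrieval when the answer sits near a boundary.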

### How embeddings work

[Link to playground](https://dashboard.cohere.com/playground/embed)

An embedding is a **mathematical representation** of the semantic meaning of a word, sentence, or document.

It's a projection of a concept in a high-dimensional space.

Embeddings have a simple characteristic:
* The projection of related concepts will be **close to each other**
* While concepts with different meanings will **lie far away**

![embed](../images/embed_subreddit_titles.png)


To provide the model with the most relevant chunks, we can compute the similarity between the embedding of the question and the embedding of each chunk of the transcription. We can then select the chunks with the highest similarity to the question and use them as the context for the model:

![embed](../images/system3.png)

### Let's generate embeddings for an arbitrary query:

In [None]:
from langchain_ollama.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")

try:
    embedded_query = embeddings.embed_query("My mother has two sisters and one brother.")
except ConnectError as e:
    print(f"Error connecting to the model: {e}")
    print("Please make sure the model is running and try again.")
    exit(1)
except ResponseError as e:
    print(f"An error occurred: {e}")
    exit(1)

print(f"Embedded length {len(embedded_query)}")
print(embedded_query[:10])

> To illustrate how embeddings work, let's generate the embeddings for two more sentences that we can compare with the query:

In [70]:
sentence1 = embeddings.embed_query("How many sisters does Biden have?")
sentence2 = embeddings.embed_query("Joanna has one twin brother and two older sisters.")

We can now compute the similarity between the query and each of the two sentences. The closer the embeddings are, the more similar the sentences will be.

We can use [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to calculate the similarity between the query and each of the sentences:

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

query_sentence1_similarity = cosine_similarity([embedded_query], [sentence1])[0][0]
query_sentence2_similarity = cosine_similarity([embedded_query], [sentence2])[0][0]
query_sentence1_similarity, query_sentence2_similarity
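
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python to show what happens under the hood. The function name `cosine` and the short example vectors are made up for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional vectors
print(cosine([1.0, 2.0], [2.0, 4.0]))   # same direction
print(cosine([1.0, 0.0], [0.0, 1.0]))   # perpendicular
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because the result depends only on the angle between the vectors, two texts of very different lengths can still score as highly similar.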

### Setting up a Vector Store
We need an efficient way to store document chunks together with their embeddings and to perform similarity searches at scale. To do this, we'll use a vector store.

A vector store is a database of embeddings that specializes in fast similarity searches.

![System 4](../images/system4.png)

To understand how a vector store works, let's create one in memory and add a few embeddings to it:

In [72]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vector_store = DocArrayInMemorySearch.from_texts(
    [
        "Mary's sister is Susana",
        "John and Tommy are brothers",
        "Patricia likes white cars",
        "Pedro's mother is a teacher",
        "Lucia drives an Audi",
        "Mary has two siblings",
    ],
    embedding=embeddings,
)

We can now query the vector store to find the most similar embeddings to a given query:

In [None]:
vector_store.similarity_search_with_score(query="Who is Mary's sister?", k=3)
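
Conceptually, a similarity search embeds the query, scores it against every stored embedding, and returns the `k` best matches. The sketch below shows that idea with a brute-force search in plain Python. Everything in it is a hypothetical stand-in: `toy_embed` uses word counts instead of a real embedding model, and `ToyVectorStore` is not how `DocArrayInMemorySearch` is implemented.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def toy_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (text, embedding) pairs in a list and searches them by brute force."""

    def __init__(self, texts: list[str]):
        self.entries = [(text, toy_embed(text)) for text in texts]

    def similarity_search_with_score(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        query_vector = toy_embed(query)
        scored = [(text, toy_cosine(query_vector, vector)) for text, vector in self.entries]
        # Highest similarity first
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore([
    "Mary's sister is Susana",
    "John and Tommy are brothers",
    "Patricia likes white cars",
    "Pedro's mother is a teacher",
    "Lucia drives an Audi",
    "Mary has two siblings",
])
results = store.similarity_search_with_score("Who is Mary's sister?", k=3)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Word counts only reward shared words, so the ranking below the top hit can look odd. A real embedding model captures meaning rather than spelling, and a real vector store uses indexing to avoid comparing the query against every entry.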

## Connecting the vector store to the chain
We can use the vector store to find the most relevant chunks from the transcription to send to the model.

Here is how we can connect the vector store to the chain:

![System](../images/chain4.png)

We need to configure a [Retriever](https://python.langchain.com/docs/how_to/#retrievers). The retriever runs a similarity search in the vector store and passes the most similar documents to the next step in the chain.

We can get a retriever directly from the vector store we created before:

In [None]:
retriever = vector_store.as_retriever()
retriever.invoke("Who is Mary's sister?")

Our prompt expects two parameters, "context" and "question." We can use the retriever to find the chunks we'll use as the context to answer the question.

We can create a map with the two inputs by using the `RunnableParallel` and `RunnablePassthrough` classes. This will allow us to pass the context and question to the prompt as a map with the keys "context" and "question."

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(context=retriever, question=RunnablePassthrough())
setup.invoke("What color is Patricia's car?")
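
To see what this step produces, here is a plain-Python sketch of the idea: every entry in the map receives the same input, and the results are collected under their keys. The names `run_parallel`, `fake_retriever`, and `passthrough` are hypothetical, and the sketch calls the steps one after another without modeling any concurrency.

```python
from typing import Any, Callable


def run_parallel(steps: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Call every step with the same input and collect the results by key."""
    return {key: step(value) for key, step in steps.items()}


def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for retriever.invoke(question)
    return ["Patricia likes white cars"]


def passthrough(value: Any) -> Any:
    # Returns its input unchanged, like RunnablePassthrough in the chain
    return value


result = run_parallel(
    {"context": fake_retriever, "question": passthrough},
    "What color is Patricia's car?",
)
print(result)
```

The resulting dictionary has exactly the two keys the prompt template expects, which is why it can be piped straight into the prompt.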

In [None]:
chain = setup | prompt | model
chain.invoke("What color is Patricia's car?")
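
The `|` operator feeds the output of one step into the next. The toy class below illustrates that composition with Python's `__or__` method. It is only an illustration under stated assumptions, not LangChain's implementation: `Step` and the `toy_` names are hypothetical, and the "model" simply uppercases its input.

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Build a new step that runs self first, then passes the result to other
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for the setup, prompt, and model steps
toy_setup = Step(lambda question: {"context": "Patricia likes white cars", "question": question})
toy_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
toy_model = Step(lambda text: text.upper())

toy_chain = toy_setup | toy_prompt | toy_model
print(toy_chain.invoke("What color is Patricia's car?"))
```

Reading the chain left to right gives the order of execution: build the inputs, fill in the prompt, then call the model.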

Let's invoke the chain using another example:

In [None]:
chain.invoke("What car does Lucia drive?")

### Loading transcription into the vector store
We initialized the vector store with a few random strings. Let's create a new vector store using the chunks from the video transcription.

In [78]:
vector_store2 = DocArrayInMemorySearch.from_documents(documents=documents, embedding=embeddings)

Let's set up a new chain using the new vector store. As before, `RunnableParallel` supplies the retrieved context and passes the question through:

In [None]:
setup2 = RunnableParallel(context=vector_store2.as_retriever(), question=RunnablePassthrough())

print(setup2.invoke("What is synthetic intelligence?"))

In [None]:
chain2 = (
    setup2
    | prompt
    | model
)

print(chain2.invoke("What is synthetic intelligence?"))