## Load environment variables

In [None]:
!pip install langchain-openai

In [None]:
!pip install openai langchain langchain_pinecone langchain[docarray] docarray pydantic==1.10.8 pytube python-dotenv tiktoken pinecone-client scikit-learn ruff git+https://github.com/openai/whisper.git

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
VIDEO = "https://www.youtube.com/watch?v=BrsocJb-fAo&t=6"

## Loading the LLM

In [None]:
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(openai_api_key = OPENAI_API_KEY, model = "gpt-3.5-turbo")

Calling the `invoke` function on the model returns an object of type *AIMessage*. To get a plain string instead, we pass that result through the `StrOutputParser` class. This shows the core idea of LangChain: chaining objects together so that each one's output feeds the next and the final result is nicely formatted.

In [None]:
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()

chained_model = model | parser #chaining via the pipe operator
chained_model.invoke("What is 2 + 2?")
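
To make the pipe-based chaining above more concrete, here is a minimal sketch of how an `|` operator can compose two steps. The `Step` class, `fake_model` and `fake_parser` are hypothetical stand-ins written for illustration; this is not how LangChain implements its runnables, only the general idea.

```python
class Step:
    """A toy chainable component (hypothetical, not LangChain's Runnable)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins: a "model" that returns a message-like dict and a "parser" that extracts its text
fake_model = Step(lambda text: {"content": f"echo: {text}"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
result = toy_chain.invoke("What is 2 + 2?")
print(result)
```

The point of the design is that the chain itself exposes `invoke`, so it can be chained again with further steps.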


## Prompt Template for the LLM

In [None]:
#As gpt-3.5-turbo is a chat-based model, I am using LangChain's ChatPromptTemplate
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you cannot answer the question by any means, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
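#previewing the formatted prompt with example values (context and question are not defined yet at this point)
prompt.format(context="Mary's sister is Susana", question="Who is Mary's sister?")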

In [None]:
#Chaining the prompt as well : prompt | model | parser
chained_model = prompt | chained_model

### Creating a transcript for the YouTube video using pytube and Whisper

We use pytube to download the YouTube video's audio. Of all the audio streams available for the video (different bitrates), we pick the first one for now. The audio file is downloaded into a temporary directory and transcribed with Whisper. The transcription file is created only on the first run; later runs reuse it.

In [None]:
import tempfile
import whisper
from pytube import YouTube

#transcribe only if the transcription file does not exist yet
if not os.path.exists("transcription.txt"):
    youtube = YouTube(VIDEO)
    #choosing audio to transcribe
    audio = youtube.streams.filter(only_audio=True).first()
    #separate name so the whisper module is not shadowed
    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tempdir:
        audio_file = audio.download(output_path = tempdir)
        transcription = whisper_model.transcribe(audio_file)["text"].strip()

        with open("transcription.txt", "w") as file:
            file.write(transcription)
    

### Splitting the transcription into documents
Now we have the transcription, which can be passed as the context. But the whole transcription cannot be passed at once, because models have a maximum context window length (~16000 tokens). So we split the transcription into chunks and retrieve the relevant chunk to pass as context for a question. To do that, we load the file with TextLoader from LangChain and then split the resulting Document instance.

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_docs = TextLoader("transcription.txt").load()

#RecursiveCharacterTextSplitter splits the whole transcript into documents of at most 1500 characters each, with up to 50 characters overlapping between two consecutive documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)
list_of_docs = text_splitter.split_documents(text_docs)
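
As a rough illustration of what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed-size windows. The function `split_with_overlap` is a hypothetical helper; the real RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and spaces, which this sketch does not attempt.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of at most chunk_size characters,
    where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that falls on a chunk boundary still appears, at least partly, in both neighbouring chunks.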

### Retrieving the most relevant document for a question/query (R in RAG)
The documents and the question are converted into embeddings: each piece of text is mapped to a vector in an n-dimensional space, where texts with similar meaning end up close to each other.

In [None]:
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()


Now we perform a similarity search between the question and the documents. The document closest to the question in the embedding space is returned as the relevant context to be passed into the prompt. Similarity metrics like cosine similarity are generally used, but I am going forward with storing the documents in a vector database, which handles all of that retrieval by itself.
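
The similarity search described above can be sketched in a few lines of plain Python. The three-dimensional vectors below are made-up toy values, not real OpenAI embeddings (which have far more dimensions), and `cosine_similarity` is a hand-written helper for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, for illustration only)
chunk_vectors = {
    "chunk about anime": [0.9, 0.1, 0.0],
    "chunk about cooking": [0.0, 0.2, 0.9],
}
question_vector = [0.8, 0.2, 0.1]

# The chunk whose vector points in the most similar direction is the best match
best_chunk = max(
    chunk_vectors,
    key=lambda name: cosine_similarity(question_vector, chunk_vectors[name]),
)
print(best_chunk)
```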

### Vector Databases
- Can store a large number of documents
- Automatically creates and stores embeddings
- Performs similarity search efficiently


As a vector database can help us retrieve the most relevant documents, all we need is a retriever that can access the vector database and return the relevant context (the chunks most similar to the question). This is done using the RunnableParallel and RunnablePassthrough classes.

In [None]:
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

vectorstore = DocArrayInMemorySearch.from_documents(list_of_docs, embeddings)
#context comes from the retriever, while the question is passed through unchanged
retrieval_part = RunnableParallel(context = vectorstore.as_retriever(), question = RunnablePassthrough() )

#chaining the retriever to the pipeline
final_model = retrieval_part | chained_model
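
To show what the retrieval step hands to the prompt, here is a plain-Python sketch of the same shape of data flow. `toy_retriever`, `passthrough` and `run_parallel` are hypothetical helpers (the steps run one after another here, for simplicity); the sketch only mirrors the idea that one input is sent to several named steps and the results are collected in a dict.

```python
def toy_retriever(question):
    # Hypothetical retriever: returns stored chunks that share a word with the question
    stored_chunks = ["Anime is animation from Japan.", "Pasta cooks in salted water."]
    question_words = set(question.lower().replace("?", "").split())
    return [
        chunk for chunk in stored_chunks
        if question_words & set(chunk.lower().replace(".", "").split())
    ]


def passthrough(value):
    # Returns its input unchanged, so the question reaches the prompt as-is
    return value


def run_parallel(steps, value):
    # Apply every named step to the same input and collect the results in a dict
    return {name: step(value) for name, step in steps.items()}


prompt_inputs = run_parallel(
    {"context": toy_retriever, "question": passthrough},
    "What is anime?",
)
print(prompt_inputs)
```

The resulting dict has exactly the two keys, `context` and `question`, that the prompt template expects.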

### Augment and Generate Output
When the retrieval step is chained to the rest of the pipeline with the | operator, the prompt is *augmented* with the retrieved context, and calling the invoke function *generates* the output.


In [None]:
# We can ask any question here
final_model.invoke("What is Anime?") 

### Replacing local vector DB with Pinecone 
Pinecone can handle large amounts of data and perform similarity searches at scale

In [None]:
from langchain_pinecone import PineconeVectorStore

index_name = os.getenv("MY_INDEX") # my custom index name in pinecone from .env file

pinecone = PineconeVectorStore.from_documents(
    list_of_docs, embeddings, index_name=index_name
)

In [None]:
retrieval_from_pc = pinecone.as_retriever()

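#the prompt expects a dict with "context" and "question", so the retriever is wrapped in RunnableParallel as before
final_model_1 = RunnableParallel(context = retrieval_from_pc, question = RunnablePassthrough()) | chained_model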

final_model_1.invoke("What is Hollywood going to start doing?")