## Chain and Retriever using LangChain

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

In [None]:
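# load_dotenv() has already copied .env values into os.environ.
# Enable LangSmith tracing only when the API key is present; assigning
# a missing (None) value to os.environ would raise a TypeError.
if os.getenv("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_TRACING_V2"] = "true"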

In [None]:
## Data Ingestion / Loading
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('file.pdf')
pdf_document = loader.load()
pdf_document

In [None]:
## Data Transformation / Data Splitting
from langchain_text_splitters import RecursiveCharacterTextSplitter

# split into chunks of up to 1000 characters, with 20 characters of overlap between neighbours
text_to_chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
chunks = text_to_chunks.split_documents(pdf_document)

In [None]:
## Vector Embedding and Vector Store
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

vector_db = FAISS.from_documents(chunks, OllamaEmbeddings())

Flow of the pipeline architecture:
1. define retriever as an interface to vector_db using 'as_retriever()'
2. define prompt
3. define llm model
4. define document_chain by combining llm model and prompt using 'create_stuff_documents_chain'
5. define retriever_chain by combining retriever and document_chain using 'create_retrieval_chain'
6. define output by invoking retriever_chain with the user query
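
The steps above can be sketched in plain Python to show how the pieces connect. This is only an illustration under loud assumptions: `toy_retriever`, `toy_llm`, `toy_document_chain` and `toy_retrieval_chain` are hypothetical stand-ins, not LangChain APIs, and the keyword-overlap ranking replaces the embedding similarity search that FAISS performs. The returned dictionary uses the `input`, `context` and `answer` keys that the final cell of this notebook reads from.

```python
# Toy stand-ins only: none of these helpers are LangChain APIs.

def toy_retriever(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_llm(prompt_text):
    """Placeholder for a model call; it only reports the prompt size."""
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def toy_document_chain(inputs):
    """Step 4: fill the prompt with the retrieved context and call the model."""
    context = "\n\n".join(inputs["context"])
    prompt_text = (
        "Answer the question based only on the following context:\n"
        f"{context}\n"
        f"Question: {inputs['input']}"
    )
    return toy_llm(prompt_text)


def toy_retrieval_chain(inputs, documents):
    """Step 5: retrieve first, then pass the results to the document chain."""
    context = toy_retriever(inputs["input"], documents)
    answer = toy_document_chain({"input": inputs["input"], "context": context})
    return {"input": inputs["input"], "context": context, "answer": answer}


documents = [
    "FAISS is a library for vector similarity search",
    "Ollama runs language models locally",
    "A text splitter is used to cut documents into chunks",
]

# Step 6: invoke the chain with the user query.
result = toy_retrieval_chain({"input": "what is faiss used for"}, documents)
print(result["context"])
print(result["answer"])
```

The point of the sketch is the data flow: the query goes to the retriever, the retrieved documents become the context, and only then is the model called.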

In [None]:
## define retriever
retriever = vector_db.as_retriever()
retriever

In [None]:
## define prompt
## -- {context} is filled with the documents the retriever returns from the vector db
## -- {input} is the user query
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:
{context}
Question: {input}
""")

In [None]:
## define LLM
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

LangChain Expression Language (LCEL) is used to construct chains. Many chain constructors are listed at https://python.langchain.com/v0.1/docs/modules/chains/, but this application uses "create_stuff_documents_chain".

LangChain's "create_stuff_documents_chain" takes a list of documents, formats them all into a single prompt, and then passes that prompt to an LLM.
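
To make the "stuffing" idea concrete, here is a minimal plain-Python sketch of what the formatting step amounts to. It is not LangChain's implementation: `Doc` and `stuff_documents` are hypothetical names introduced for illustration, and joining the documents with a blank line is an assumption about the separator.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document; only page_content is modelled."""
    page_content: str


TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n"
    "Question: {input}"
)


def stuff_documents(docs, question, separator="\n\n"):
    """Join every document's text and place it in the {context} slot."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


docs = [Doc("Chunk one text."), Doc("Chunk two text.")]
stuffed_prompt = stuff_documents(docs, "What do the chunks say?")
print(stuffed_prompt)
```

Because every document is placed in one prompt, this approach only works while the combined text fits in the model's context window.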

In [None]:
## define document_chain = prompt + llm
from langchain.chains.combine_documents import create_stuff_documents_chain

# the LLM is the first argument, the prompt the second
document_chain = create_stuff_documents_chain(llm, prompt)

In [None]:
## define retriever_chain = retriever + document_chain
from langchain.chains import create_retrieval_chain

# the retriever is the first argument, the document chain the second
retriever_chain = create_retrieval_chain(retriever, document_chain)

In [None]:
## define output
output = retriever_chain.invoke({"input":"<enter your input>"})
output['answer']