# Simple Gen AI App using LangChain

In [1]:
import os
from dotenv import load_dotenv

# Load key=value pairs from a local .env file into the process environment
load_dotenv()

os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
# LangSmith tracing
os.environ['LANGCHAIN_API_KEY'] = os.getenv('LANGCHAIN_API_KEY')
os.environ['LANGCHAIN_TRACING_V2'] = "true"
os.environ['LANGCHAIN_PROJECT'] = os.getenv('LANGCHAIN_PROJECT')
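
If one of these variables is missing from `.env`, `os.getenv` returns `None`, and assigning `None` to `os.environ` raises a `TypeError`. A small check can report the missing names more clearly. This is only a sketch: `REQUIRED_VARS`, `missing_env_vars` and `require_env_vars` are hypothetical helpers, not part of LangChain or python-dotenv.

```python
import os

# Hypothetical list of the variables this notebook expects to find.
REQUIRED_VARS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def require_env_vars(names, env=None):
    """Raise a readable error if any required variable is missing."""
    missing = missing_env_vars(names, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env_vars(REQUIRED_VARS)` right after `load_dotenv()` would stop the notebook early with the list of missing names.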

# Scrape Web Page Data
- Source page: https://python.langchain.com/v0.1/docs/use_cases/question_answering/

In [3]:
from langchain_community.document_loaders import WebBaseLoader

In [4]:
loader = WebBaseLoader("https://python.langchain.com/v0.1/docs/use_cases/question_answering/")
loader

<langchain_community.document_loaders.web_base.WebBaseLoader at 0x7fe7bf97d690>

In [5]:
docs = loader.load()
docs

[Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/use_cases/question_answering/', 'title': 'Q&A with RAG | 🦜️🔗 LangChain', 'description': 'Overview', 'language': 'en'}, page_content="\n\n\n\n\nQ&A with RAG | 🦜️🔗 LangChain\n\n\n\n\n\n\n\nSkip to main contentA newer LangChain version is out! Check out the latest version.ComponentsIntegrationsGuidesAPI ReferenceMorePeopleVersioningContributingTemplatesCookbooksTutorialsYouTubev0.1Latestv0.2v0.1🦜️🔗LangSmithLangSmith DocsLangServe GitHubTemplates GitHubTemplates HubLangChain HubJS/TS Docs💬SearchGet startedIntroductionQuickstartInstallationUse casesQ&A with RAGQuickstartAdd chat historyStreamingReturning sourcesCitationsMoreExtracting structured outputChatbotsTool use and agentsQuery analysisQ&A over SQL + CSVMoreExpression LanguageGet startedRunnable interfacePrimitivesAdvantages of LCELStreamingAdd message history (memory)MoreEcosystem🦜🛠️ LangSmith🦜🕸️ LangGraph🦜️🏓 LangServeSecurityThis is documentation for LangChain v0.

# RAG Architecture
- A typical RAG application has two main components:
    - Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

    - Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

- The most common full sequence from raw data to answer looks like:

## Indexing
 - Load: First we need to load our data. This is done with DocumentLoaders.
 - Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.
 - Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.

## Retrieval and generation
 - Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
 - Generate: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.
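
The two phases above can be sketched in plain Python. This is a toy illustration under loud assumptions: sentences stand in for chunks, a word-to-chunk lookup stands in for the vector store, and no model is called. All function names here are hypothetical and are not LangChain APIs.

```python
from collections import defaultdict


def normalize(word):
    """Lowercase a word and drop surrounding punctuation."""
    return word.lower().strip(".,:?")


def build_index(raw_text):
    """Indexing (offline): load -> split -> store."""
    chunks = [part.strip() for part in raw_text.split(". ") if part.strip()]
    word_to_chunks = defaultdict(set)
    for chunk_id, chunk in enumerate(chunks):
        for word in chunk.split():
            word_to_chunks[normalize(word)].add(chunk_id)
    return chunks, word_to_chunks


def retrieve(question, chunks, word_to_chunks, k=1):
    """Retrieve (run time): rank chunks by how many question words they share."""
    hits = defaultdict(int)
    for word in question.split():
        for chunk_id in word_to_chunks.get(normalize(word), ()):
            hits[chunk_id] += 1
    best = sorted(hits, key=lambda cid: (-hits[cid], cid))[:k]
    return [chunks[cid] for cid in best]


def build_prompt(question, retrieved):
    """Generate (first half): assemble the text a ChatModel would receive."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


RAW_TEXT = (
    "Indexing happens offline. Retrieval happens at query time. "
    "The model generates the answer."
)
toy_chunks, toy_index = build_index(RAW_TEXT)
toy_question = "When does indexing happen?"
print(build_prompt(toy_question, retrieve(toy_question, toy_chunks, toy_index)))
```

`build_index` runs once, ahead of time. `retrieve` and `build_prompt` run for every user question.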

# Split into chunks (chunk_size should be larger than chunk_overlap)

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(docs)
documents

[Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/use_cases/question_answering/', 'title': 'Q&A with RAG | 🦜️🔗 LangChain', 'description': 'Overview', 'language': 'en'}, page_content='Q&A with RAG | 🦜️🔗 LangChain'),
 Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/use_cases/question_answering/', 'title': 'Q&A with RAG | 🦜️🔗 LangChain', 'description': 'Overview', 'language': 'en'}, page_content='Skip to main contentA newer LangChain version is out! Check out the latest version.ComponentsIntegrationsGuidesAPI ReferenceMorePeopleVersioningContributingTemplatesCookbooksTutorialsYouTubev0.1Latestv0.2v0.1🦜️🔗LangSmithLangSmith DocsLangServe GitHubTemplates GitHubTemplates HubLangChain HubJS/TS Docs💬SearchGet startedIntroductionQuickstartInstallationUse casesQ&A with RAGQuickstartAdd chat historyStreamingReturning sourcesCitationsMoreExtracting structured outputChatbotsTool use and agentsQuery analysisQ&A over SQL + CSVMoreExpression LanguageGet startedR
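
Why the overlap has to stay below the chunk size can be seen with a fixed-window chunker. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (it splits on separators such as newlines first). `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        # The window would never move forward, so chunking could not finish.
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the settings used in the cell above (`chunk_size=1000`, `chunk_overlap=20`), a window like this would advance 980 characters at a time.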

In [10]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [11]:
from langchain_community.vectorstores import FAISS
vectorstoredb = FAISS.from_documents(documents, embeddings)
vectorstoredb

<langchain_community.vectorstores.faiss.FAISS at 0x7fe7c17f7df0>

# Similarity Search
- Query the vector store for the chunks most similar to the question.
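
Conceptually, a similarity search embeds the query and ranks the stored vectors by closeness. The sketch below uses cosine similarity on hand-made 3-dimensional vectors. The vectors, `cosine_similarity` and `top_k` are illustrative assumptions, and the FAISS store used in this notebook may rank by a different metric (likely Euclidean distance by default).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up embeddings; a real embeddings model returns far longer vectors.
toy_vectors = [
    ("RAG augments LLM knowledge", [0.9, 0.1, 0.0]),
    ("Q&A over SQL data", [0.1, 0.9, 0.0]),
    ("Text splitters create chunks", [0.0, 0.2, 0.9]),
]
toy_query_vec = [1.0, 0.0, 0.1]
print(top_k(toy_query_vec, toy_vectors, k=1))
# ['RAG augments LLM knowledge']
```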

In [12]:
query = "What is RAG?"
result = vectorstoredb.similarity_search(query)
result[0].page_content

"(Q&A) chatbots. These are applications that can answer questions about specific source information. These applications use a technique known as Retrieval Augmented Generation, or RAG.What is RAG?\u200bRAG is a technique for augmenting LLM knowledge with additional data.LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally. Note: Here we focus on Q&A for unstructured data. Two RAG use cases which we cover elsewhere are:Q&A over SQL dataQ&

# Instantiate the LLM

In [13]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")


# Retrieval Chain | Document Chain

In [17]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based on the provided context:
    <context>
    {context}
    </context>
    """
)

# "Stuff" every document into the {context} slot of the prompt, then call the LLM
document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\n    Answer the following question based on the provided context:\n    <context>\n    {context}\n    </context>\n    '), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7fe7c69ae950>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7fe7c69f4400>, root_client=<openai.OpenAI object at 0x7fe7c61a47c0>, root_async_client=<openai.AsyncOpenAI object at 0x7fe7c69ae1a0>, model_name='gpt-4o', model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutputParser(), kwargs={}, config={'run_name': 'stuff_documen

In [18]:
from langchain_core.documents import Document
document_chain.invoke({
    "input":"What are two main components of RAG application.",
    "context": [Document(page_content="A typical RAG application has two main components:Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.")]

})

'Sure, please provide the question you would like answered based on the provided context.'
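
The reply above asks for the question because the prompt template contains only a `{context}` placeholder, so the `"input"` value passed to `invoke` never reaches the model. The effect can be reproduced with plain string formatting. The template names and the `Question: {input}` line below are assumptions for illustration, not the notebook's original prompt.

```python
# Same wording as the notebook's template, with only a {context} slot.
TEMPLATE_CONTEXT_ONLY = (
    "Answer the following question based on the provided context:\n"
    "<context>\n{context}\n</context>\n"
)
# Assumed fix: add a slot for the question itself.
TEMPLATE_WITH_QUESTION = TEMPLATE_CONTEXT_ONLY + "Question: {input}\n"

values = {
    "input": "What are the two main components of a RAG application?",
    "context": "A typical RAG application has two main components: "
               "indexing, and retrieval and generation.",
}

# str.format silently ignores keyword arguments the template does not use.
rendered_context_only = TEMPLATE_CONTEXT_ONLY.format(**values)
rendered_with_question = TEMPLATE_WITH_QUESTION.format(**values)

print(values["input"] in rendered_context_only)   # False
print(values["input"] in rendered_with_question)  # True
```

Passing a template that contains both `{context}` and `{input}` to `ChatPromptTemplate.from_template` should let the document chain see the question as well as the documents.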

# Get the documents from a retriever instead
- A retriever dynamically selects the most relevant documents from the vector store for a given question and passes them in as context.
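
The data flow of such a chain can be sketched without LangChain: look up documents for the question, hand question and documents to the document chain, and return everything in one dictionary. `make_retrieval_chain`, `toy_retrieve` and `toy_combine_docs` are hypothetical stand-ins. Only the `input`, `context` and `answer` keys mirror what this notebook reads from the real chain.

```python
def make_retrieval_chain(retrieve, combine_docs):
    """Compose a retriever and a document chain into one callable."""
    def invoke(inputs):
        question = inputs["input"]
        docs = retrieve(question)  # Input => Retriever => relevant docs
        answer = combine_docs({"input": question, "context": docs})
        return {"input": question, "context": docs, "answer": answer}
    return invoke


def toy_retrieve(question):
    """Stand-in retriever: always returns the same single document."""
    return ["A typical RAG application has two main components: "
            "indexing, and retrieval and generation."]


def toy_combine_docs(inputs):
    """Stand-in for prompt + LLM: reports what it was given."""
    return f"answer to {inputs['input']!r} from {len(inputs['context'])} document(s)"


toy_chain = make_retrieval_chain(toy_retrieve, toy_combine_docs)
toy_response = toy_chain({"input": "What are the two main components?"})
print(sorted(toy_response))
# ['answer', 'context', 'input']
```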

In [19]:
from langchain.chains import create_retrieval_chain

# Input => Retriever => Vectorstoredb => Relevant Docs
retriever = vectorstoredb.as_retriever()
# Documents retrieved from the vector store fill {context} in the document chain
retriever_chain = create_retrieval_chain(retriever, document_chain)

In [20]:
retriever_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7fe7c17f7df0>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\n    Answer the following question based on the provided context:\n    <context>\n    {context}\n    </context>\n    '), additional_kwargs={})])
       

In [22]:
response = retriever_chain.invoke({"input":"What are two main components of RAG application?"})
response['answer']

"Based on the provided context, Retrieval Augmented Generation (RAG) is a technique used to enhance the capabilities of language models (LLMs) by supplementing their knowledge with additional data. While LLMs possess broad reasoning abilities, their knowledge is confined to the public data available up to the point of their training. RAG is particularly useful for applications that require reasoning over private data or data introduced after the model's training cutoff date.\n\nThe RAG process involves two main components:\n\n1. **Indexing**:\n   - **Load**: The initial step is loading data through DocumentLoaders.\n   - **Split**: Large documents are broken into smaller chunks using text splitters. This facilitates easier searching and ensures the data fits within the model's context window.\n   - **Store**: These chunks are stored in a searchable format, often using a VectorStore and Embeddings model.\n\n2. **Retrieval and Generation**:\n   - **Retrieve**: When a user inputs a query,

In [23]:
response['context']

[Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/use_cases/question_answering/', 'title': 'Q&A with RAG | 🦜️🔗 LangChain', 'description': 'Overview', 'language': 'en'}, page_content="over SQL dataQ&A over code (e.g., Python)RAG Architecture\u200bA typical RAG application has two main components:Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.The most common full sequence from raw data to answer looks like:Indexing\u200bLoad: First we need to load our data. This is done with DocumentLoaders.Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.Store: We need somewhere to store and