# Build a Retrieval Augmented Generation (RAG) App

https://python.langchain.com/v0.2/docs/tutorials/rag/

A typical RAG application has two main components:

`Indexing`: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
1. **Load**: First we need to load our data. This is done with Document Loaders.
2. **Split**: Text splitters break large Documents into smaller chunks. This helps both when indexing data and when passing it to a model, since large chunks are harder to search over and may not fit in a model's finite context window.
3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.

`Retrieval and generation`: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, and passes it to the model.

4. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a Retriever.
5. **Generate**: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.
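
The five steps above can be sketched end to end in plain Python. This is only a toy illustration, not how LangChain implements them: the sample text, the sentence-based splitter, and the word-overlap scoring are all assumptions made for the example, and the "generate" step just builds the prompt a real chat model would receive.

```python
import re

def load() -> str:
    # Load: stand-in for a Document Loader (made-up sample text).
    return (
        "Task decomposition breaks a complex task into smaller steps. "
        "Chain of thought prompts the model to think step by step. "
        "Vector stores index embeddings for similarity search."
    )

def split(text: str) -> list:
    # Split: one chunk per sentence (real splitters use sizes and separators).
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def store(chunks: list) -> list:
    # Store: keep a searchable representation next to each chunk.
    return [(tokens(c), c) for c in chunks]

def retrieve(index: list, query: str, k: int = 1) -> list:
    # Retrieve: rank chunks by word overlap with the query,
    # a crude stand-in for embedding similarity.
    q = tokens(query)
    ranked = sorted(index, key=lambda item: len(item[0] & q), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(question: str, context: list) -> str:
    # Generate: a real app would send this prompt to a chat model.
    return f"Context: {' '.join(context)}\nQuestion: {question}\nAnswer:"

index = store(split(load()))
top = retrieve(index, "What is task decomposition?")
print(build_prompt("What is task decomposition?", top))
```

Indexing (`load`, `split`, `store`) runs once, while `retrieve` and `build_prompt` run for every query.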

In [5]:
! pip show langchain

Name: langchain
Version: 0.1.4
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /home/jupyter-manuel@datwit.com/.local/lib/python3.9/site-packages
Requires: aiohttp, async-timeout, dataclasses-json, jsonpatch, langchain-community, langchain-core, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


In [6]:
# %pip install langchain-chroma langchain langchain-openai
from dotenv import load_dotenv, find_dotenv
import os
import warnings
from IPython.display import display, Markdown  # render output text as Markdown

warnings.filterwarnings('ignore')
_ = load_dotenv(find_dotenv())  # read local .env file

llm_model = "gpt-3.5-turbo"


Using LangSmith

LangSmith lets you closely trace, monitor, and evaluate your LLM application. It integrates with LangChain and LangGraph, and you can use it to inspect and debug individual steps of your chains and agents as you build.

In [7]:
# For best-in-class automated tracing of your model calls, you can also set
# your LangSmith API key.
# os.environ["LANGSMITH_API_KEY"] is read from the .env file loaded above.
os.environ["LANGSMITH_TRACING"] = "true"

## 1. Indexing: Load

In this case we’ll use the `WebBaseLoader`, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text. We can customize the HTML -> text parsing by passing parameters to the `BeautifulSoup` parser via `bs_kwargs` (see the BeautifulSoup docs). Here only HTML tags with class “post-content”, “post-title”, or “post-header” are relevant, so we’ll discard all others.
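
To make the class-based filtering concrete, here is a minimal sketch that uses only the standard library's `html.parser`. It is a stand-in for the idea, not how BeautifulSoup or `SoupStrainer` work internally. The `ClassFilter` helper and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

KEEP = {"post-title", "post-header", "post-content"}
VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags without an end tag

class ClassFilter(HTMLParser):
    """Collects text found inside elements whose class is in KEEP."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside a kept element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth or classes & KEEP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())

html = """
<html><body>
  <nav class="menu">Home | About</nav>
  <h1 class="post-title">LLM Powered Autonomous Agents</h1>
  <div class="post-content"><p>Agents use <b>planning</b> and memory.</p></div>
  <footer class="site-footer">Copyright</footer>
</body></html>
"""

parser = ClassFilter()
parser.feed(html)
print(" ".join(parser.parts))
```

The navigation and footer text are dropped because their classes are not in `KEEP`, which mirrors what the strainer does for the blog post.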

In [8]:
import bs4  # provided by the beautifulsoup4 package
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

print(f"Amount of documents loaded: {len(docs)}")
len(docs[0].page_content)

USER_AGENT environment variable not set, consider setting it to identify your requests.


Amount of documents loaded: 1


43131

In [9]:
Markdown(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In

## 2. Indexing: Split

In [10]:
# %pip install --quiet langchain_experimental
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

semantic_splitter = SemanticChunker(OpenAIEmbeddings())

docs_split = semantic_splitter.split_documents(docs)
len(docs_split)

22

In [11]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500, chunk_overlap=250, separators=["\n\n", "\n", "."]
)

# Split the semantic chunks again so that no chunk exceeds 1500 characters.
all_splits = text_splitter.split_documents(docs_split)

len(all_splits)

56
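
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on the given separators. The `split_with_overlap` helper is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.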

## 3. Indexing: Store
Now we need to index our 56 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits whose embeddings are most similar to the query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are high-dimensional vectors).
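
As a minimal sketch of that measure, the cosine similarity of two vectors `a` and `b` is their dot product divided by the product of their lengths. The three-dimensional "embeddings" below are made-up values for illustration. Real embeddings come from the embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings of three stored chunks and one query.
stored = {
    "task decomposition": [0.9, 0.1, 0.0],
    "memory": [0.1, 0.9, 0.1],
    "tool use": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Rank stored chunks from most to least similar to the query.
ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(query, stored[name]),
    reverse=True,
)
print(ranked)
# ['task decomposition', 'memory', 'tool use']
```

A vector store performs the same kind of ranking, usually with an index that avoids comparing the query against every stored vector.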

We can embed and store all of our document splits in a single command using the `Chroma` vector store and `OpenAIEmbeddings` model.

In [None]:
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

## Retrieval and Generation

In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

Markdown(retrieved_docs[0].page_content)

Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process. Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs. Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem

In [None]:
from langchain import hub
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()

Markdown(example_messages[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:

We’ll use the LCEL Runnable protocol to define the chain, which allows us to:

* pipe together components and functions in a transparent way,
* automatically trace our chain in `LangSmith`,
* get streaming, async, and batched calling out of the box.
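
The piping idea can be sketched with a toy class that overloads the `|` operator. This is only an illustration of the composition pattern, not LangChain's `Runnable` implementation. The `Step` class and its methods are hypothetical.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        # Run the same step (or chain) over several inputs.
        return [self.invoke(v) for v in values]

    def __or__(self, other):
        # Plain callables are wrapped on the fly, similar in spirit to how
        # LangChain coerces functions into runnables.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

chain = Step(str.strip) | str.lower | (lambda s: s.replace("rag", "RAG"))
print(chain.invoke("  Build A Rag App  "))
# build a RAG app
```

Because every composed object exposes the same `invoke` and `batch` interface, a chain can be used anywhere a single step can.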

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# LangChain automatically casts certain objects to runnables when they meet the
# | operator. Here, `format_docs` is cast to a `RunnableLambda`, and the dict
# with "context" and "question" is cast to a `RunnableParallel`. The details
# matter less than the bigger point: each object is a Runnable.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Alternative without streaming: out = rag_chain.invoke("What is Task Decomposition?")
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task decomposition is the process of breaking down a complex task into smaller, more manageable subtasks. This can be achieved through methods such as prompting a language model (LLM) to think step-by-step, using task-specific instructions, or incorporating human inputs. It allows for more efficient problem-solving and planning by clarifying the steps needed to accomplish a goal.

Let's compare this with the model's output for the same question without RAG.

In [None]:
Markdown(llm.invoke("What is Task Decomposition?").content)

Task decomposition is a process in which a complex task is broken down into smaller, more manageable sub-tasks or components. This approach is commonly used in various fields, including project management, computer science, artificial intelligence, and cognitive psychology, to simplify the execution and understanding of complex activities.

Here are some key points about task decomposition:

1. **Simplification**: By breaking a large task into smaller parts, it becomes easier to understand, plan, and execute. Each sub-task can be tackled individually, reducing cognitive load and making the overall process more manageable.

2. **Organization**: Task decomposition helps in organizing work logically. It allows for better prioritization and allocation of resources, as each sub-task can be assigned to different team members or scheduled at different times.

3. **Clarity and Focus**: Smaller tasks often have clearer objectives and outcomes, which can help maintain focus and motivation. This clarity can lead to increased efficiency and productivity.

4. **Facilitation of Problem Solving**: If a complex task encounters obstacles, decomposition allows teams to identify which specific sub-task is causing issues, making it easier to troubleshoot and resolve problems.

5. **Applications**: In computer science, task decomposition is commonly used in algorithms and programming to break problems into smaller, solvable parts. In project management, it is used to create work breakdown structures (WBS) that outline all the tasks required to complete a project.

Overall, task decomposition is a valuable strategy for enhancing understanding, improving efficiency, and facilitating effective collaboration in various domains.

## Alternative


* The `create_stuff_documents_chain` function specifies how retrieved context is fed into a prompt and LLM. In this case, we "stuff" the contents into the prompt, i.e., we include all retrieved context without any summarization or other processing. It largely implements the `rag_chain` above, with input keys `context` and `input`: it generates an answer from the retrieved context and the query.

* The `create_retrieval_chain` function adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer. It takes the input key `input` and includes `input`, `context`, and `answer` in its output. The prompt template must have `context` as an input variable, and because the query arrives under `input`, the prompt should reference `input` rather than `question`.
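
The shape of the data flowing through these two helpers can be sketched in plain Python. The functions below are hypothetical stand-ins with a fake retriever and a fake model. They only illustrate the "stuff" strategy and the `input` / `context` / `answer` output keys, not the library's code.

```python
def stuff_documents(docs, question):
    # "Stuff": concatenate every retrieved document into the prompt as-is.
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}"

def fake_retriever(query):
    # Hypothetical retriever that always returns the same two documents.
    return ["Doc A about task decomposition.", "Doc B about planning."]

def fake_llm(prompt):
    # Hypothetical model: reports how many documents it saw instead of answering.
    return f"(answer based on {prompt.count('Doc')} documents)"

def retrieval_chain(inputs):
    docs = fake_retriever(inputs["input"])
    answer = fake_llm(stuff_documents(docs, inputs["input"]))
    # The retrieved context is returned alongside the answer.
    return {"input": inputs["input"], "context": docs, "answer": answer}

result = retrieval_chain({"input": "What is Task Decomposition?"})
print(sorted(result))
# ['answer', 'context', 'input']
```

Returning the context with the answer makes it easy to show sources or to check which chunks the answer was based on.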

In [None]:
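from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# create_retrieval_chain passes the query under the key "input", so the prompt
# must use {input} and {context}; the hub prompt above expects {question}.
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the retrieved context below. "
               "If you don't know the answer, say so.\n\n{context}"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

results = rag_chain.invoke({"input": "What is Task Decomposition?"})

results  # dict with keys: 'input', 'context', 'answer'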