# Introduction to RAG (Retrieval-Augmented Generation)

This notebook demonstrates a basic implementation of Retrieval-Augmented Generation (RAG), a technique that pairs a retriever with a generative language model. At query time, RAG fetches relevant passages from an external knowledge source and passes them to the model as context. The answer is then grounded in that source rather than relying only on what the model learned during training.

## Key Components Covered:
1. **Document Loading**: Fetching and processing content from web sources using `WebBaseLoader`.
2. **Text Splitting**: Dividing documents into manageable chunks with `RecursiveCharacterTextSplitter`.
3. **Embeddings**: Generating vector representations of text using OpenAI's embeddings to enable semantic search.
4. **Vector Storage**: Storing and retrieving document chunks efficiently with `Chroma`.
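5. **Retrieval**: Fetching the chunks most similar to the user's query through vector similarity search in `Chroma`. The notebook also computes cosine similarity by hand to illustrate the idea.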
6. **Prompt Engineering**: Crafting effective prompts to guide the LLM's responses.
7. **Generation**: Using OpenAI's `ChatOpenAI` to generate concise and accurate answers based on retrieved context.
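To make the seven stages concrete before any API calls, here is a toy end-to-end sketch in plain Python. It is a stand-in built on stated assumptions, not the notebook's pipeline. The hard-coded `CORPUS` replaces the web loader, sentence splitting replaces `RecursiveCharacterTextSplitter`, lowercase word counts replace `OpenAIEmbeddings`, and a Python list replaces Chroma. The generation step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical corpus standing in for the loaded blog post.
CORPUS = (
    "Task decomposition breaks a complicated task into smaller steps. "
    "Memory lets an agent retain information across steps. "
    "Tool use lets an agent call external APIs."
)

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Split: one chunk per sentence. Store: (vector, chunk) pairs in a plain list.
chunks = [s.strip() for s in CORPUS.split(". ") if s.strip()]
store = [(embed(c), c) for c in chunks]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are most similar to the question's vector."""
    q = embed(question)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(question: str) -> str:
    """Fill the same kind of template the notebook uses with the retrieved context."""
    context = "\n".join(retrieve(question))
    return f"Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n"

# Generation is omitted: this prompt would be sent to a chat model such as ChatOpenAI.
print(build_prompt("What is task decomposition?"))
```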

## Use Case:
The notebook walks through a practical example where the system answers the question, *"What is Task Decomposition?"*, by retrieving relevant passages from Lilian Weng's blog post *LLM Powered Autonomous Agents* and generating a coherent response from them.

By the end of this notebook, you will understand how to build a basic RAG pipeline, customize retrieval and generation components, and apply this technique to your own projects.

In [29]:
import os
from dotenv import load_dotenv
import tiktoken
from langchain_openai import OpenAIEmbeddings
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [2]:
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

In [3]:
load_dotenv(override = True)
openai_api_key = os.getenv('OPENAI_API_KEY')
langchain_api_key = os.getenv('LANGCHAIN_API_KEY')
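`os.getenv` silently returns `None` when a variable is missing. A forgotten key therefore usually surfaces later as an authentication error from OpenAI or LangSmith. A small fail-fast check can run right after `load_dotenv`. `require_env` below is a hypothetical helper and is not part of `python-dotenv`.

```python
import os

def require_env(*names: str) -> dict[str, str]:
    """Hypothetical helper: return the named environment variables, failing fast if any are unset or empty."""
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Usage after load_dotenv(override=True):
# keys = require_env("OPENAI_API_KEY", "LANGCHAIN_API_KEY")
```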

In [4]:
# Toy query and document used to demonstrate tokenization and embeddings
question = "What is the capital of France?"
document = "Paris is the capital and most populous city of France."

In [5]:
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Return the number of tokens in `string` under the given tiktoken encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string(question, "cl100k_base")

7

In [6]:
embed = OpenAIEmbeddings()
query_result = embed.embed_query(question)              # vector for the query
document_result = embed.embed_documents([document])[0]  # embed_documents takes a list of texts
len(query_result)  # embedding dimension

1536

In [7]:
def cosine_similarity_calc(vec1, vec2):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot_prod = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_prod / (norm_vec1 * norm_vec2)
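The function implements cos θ = (v1 · v2) / (|v1| |v2|). The pure-Python sketch below checks the formula on two small vectors and shows a useful shortcut. Once vectors are scaled to unit length, the denominator is 1, so the dot product alone is the cosine. OpenAI's documentation states that its embeddings are already normalized to length 1. For these vectors, `np.dot(query_result, document_result)` should therefore give essentially the same value, but verify that rather than assume it.

```python
import math

def cosine(v1: list[float], v2: list[float]) -> float:
    """cos(theta) = (v1 . v2) / (|v1| * |v2|)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.hypot(*v1) * math.hypot(*v2))

def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length."""
    n = math.hypot(*v)
    return [x / n for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b))  # (12 + 12) / (5 * 5) = 0.96

# For unit-length vectors the denominator is 1, so the plain dot product is the cosine.
ua, ub = normalize(a), normalize(b)
print(sum(x * y for x, y in zip(ua, ub)))
```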

In [8]:
similarity = cosine_similarity_calc(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.9010889149980191


In [9]:
# Alternatively, use scikit-learn's cosine_similarity (it expects 2-D arrays)

In [10]:
cosine_similarity(np.array(query_result).reshape(1, -1), np.array(document_result).reshape(1, -1))[0][0]

0.9010889149980198
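`sklearn.metrics.pairwise.cosine_similarity` works on 2-D inputs of shape (n_samples, n_features) and returns an n_X x n_Y matrix of pairwise scores. That is why each 1536-element list is reshaped to a 1 x 1536 row and the scalar is extracted with `[0][0]`. The two printed values differ only in the last digits. This is ordinary floating-point rounding from summing in a different order, so comparisons should use a tolerance. The pure-Python sketch below mirrors the matrix shape with toy 2-D vectors. `cosine_matrix` is a hypothetical helper.

```python
import math

def cosine_matrix(X: list[list[float]], Y: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity: entry [i][j] compares row X[i] with row Y[j]."""
    def cos(u, v):
        return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return [[cos(x, y) for y in Y] for x in X]

query = [[1.0, 0.0]]                        # one row, like reshape(1, -1)
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = cosine_matrix(query, docs)           # shape 1 x 3
print(sims)
print(math.isclose(sims[0][2], 1 / math.sqrt(2)))  # compare with a tolerance, not ==
```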

In [11]:
loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(
        parse_only = bs4.SoupStrainer(
            class_ = ("post-content", "post-title", "post-header")
        )
    ),
)

blog_docs = loader.load()
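`WebBaseLoader` forwards `bs_kwargs` to BeautifulSoup. `SoupStrainer(class_=...)` makes it parse only elements whose `class` matches one of the listed names, which strips navigation, footers, and other page chrome. The stdlib sketch below imitates that filtering idea with `html.parser.HTMLParser`. It is not how bs4 implements it, and `ClassTextExtractor`, `extract`, and the sample HTML are hypothetical. The sketch also shows why class names must be spelled exactly: a misspelled class silently drops that element.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ClassTextExtractor(HTMLParser):
    """Collect text only from elements whose class attribute contains one of `classes`.

    Assumes well-formed HTML in which every non-void element is closed.
    """

    def __init__(self, classes):
        super().__init__()
        self.classes = set(classes)
        self.depth = 0  # > 0 while inside a matching element (counts nested open tags)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.classes & set((dict(attrs).get("class") or "").split()):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

def extract(html: str, classes) -> str:
    parser = ClassTextExtractor(classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

HTML = ('<html><h1 class="post-title">Agents</h1><nav>Home</nav>'
        '<div class="post-content"><p>Task <b>decomposition</b></p><br></div>'
        '<footer>Footer</footer></html>')

print(extract(HTML, ("post-content", "post-title", "post-header")))   # AgentsTask decomposition
print(extract(HTML, ("post-content", "poset-title", "post-header")))  # Task decomposition (title lost)
```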

In [12]:
blog_docs[0].page_content[:1000]

'\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nS

In [13]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size = 300,
    chunk_overlap = 50
)

splits = text_splitter.split_documents(blog_docs)
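With `from_tiktoken_encoder`, `chunk_size=300` and `chunk_overlap=50` are measured in tiktoken tokens rather than characters. `RecursiveCharacterTextSplitter` first tries to break on larger separators (paragraphs, then lines, then words) and merges pieces up to the size limit, so real chunk boundaries follow the structure of the text. The sketch below isolates only the size and overlap arithmetic, using a fixed-stride window over a token list. `window_chunks` is a hypothetical helper and does not reproduce the splitter's algorithm.

```python
def window_chunks(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Fixed-stride windows: each chunk repeats the last `chunk_overlap` tokens of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

tokens = [f"t{i}" for i in range(10)]
for chunk in window_chunks(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of storing and embedding some tokens twice.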

In [14]:
splits[0:3]

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refi

In [15]:
vectorstore = Chroma.from_documents(documents = splits,
                                   embedding = OpenAIEmbeddings())

retriever = vectorstore.as_retriever(search_kwargs = {"k":1})
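`search_kwargs={"k": 1}` makes the retriever return only the single closest chunk, so the LLM sees exactly one chunk of context. A larger `k` adds context at the cost of more prompt tokens. The brute-force sketch below shows what top-k retrieval means. `InMemoryVectorStore` is hypothetical and ranks by exact cosine similarity. Chroma instead uses an approximate HNSW index and, by default per its documentation, squared L2 distance. For unit-length embeddings the two give the same ranking, because ||a - b||^2 = 2 - 2 cos θ.

```python
import math
from dataclasses import dataclass, field

@dataclass
class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (vector, text) pairs and ranks them by cosine similarity."""
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add(self, vector: list[float], text: str) -> None:
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Exact (linear-scan) top-k search over every stored vector."""
        def cos(u, v):
            return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        ranked = sorted(range(len(self.texts)), key=lambda i: cos(query, self.vectors[i]), reverse=True)
        return [self.texts[i] for i in ranked[:k]]

store = InMemoryVectorStore()
store.add([0.9, 0.1, 0.0], "Task decomposition")
store.add([0.1, 0.9, 0.0], "Memory")
store.add([0.0, 0.2, 0.9], "Tool use")
print(store.search([1.0, 0.0, 0.0], k=1))
print(store.search([1.0, 0.0, 0.0], k=2))
```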

In [16]:
docs = retriever.invoke("What is Task Decomposition?")  # invoke() replaces the deprecated get_relevant_documents()


In [17]:
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Component One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a

In [18]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
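By default, `ChatPromptTemplate.from_template` treats `{context}` and `{question}` as f-string-style placeholders and wraps the template in a single human message. The plain `str.format` sketch below approximates the resulting text but does not build LangChain message objects. It also shows two details worth knowing: braces inside substituted values are left alone, while a literal brace in the template itself must be doubled.

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# str.format fills each {name}; braces that appear inside the substituted values are left as-is.
filled = template.format(
    context="CoT asks the model to {think} step by step.",
    question="What is Task Decomposition?",
)
print(filled)

# A literal brace in the template itself must be doubled, otherwise it is read as a placeholder.
literal = 'Reply as JSON like {{"answer": "..."}} to: {question}'.format(question="Q")
print(literal)  # Reply as JSON like {"answer": "..."} to: Q
```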

In [19]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])

In [20]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [21]:
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000002367CA87E20>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x0000023632351B80>, root_client=<openai.OpenAI object at 0x000002362F5A8D60>, root_async_client=<openai.AsyncOpenAI object at 0x00000236335C4070>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [22]:
chain = prompt | llm

In [23]:
chain

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000002367CA87E20>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x0000023632351B80>, root_client=<openai.OpenAI object at 0x000002362F5A8D60>, root_async_client=<openai.AsyncOpenAI object at 0x00000236335C4070>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [24]:
chain.invoke({"context":docs,"question":"What is Task Decomposition?"})

AIMessage(content='Task Decomposition is a technique used by agents to break down complex tasks into smaller and simpler steps, allowing for better planning and execution. It can be achieved through methods such as Chain of Thought and Tree of Thoughts, which involve breaking down tasks into manageable steps and exploring multiple reasoning possibilities at each step. Task decomposition can also be done using simple prompting, task-specific instructions, or with human inputs.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 79, 'prompt_tokens': 315, 'total_tokens': 394, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-Bf82IDK8xc3c4QF4I3yldEO26EPtU', 'service_tier': 'default', 'finish_reason': 'stop', 'logpro
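Here `docs` is a list of `Document` objects, so `{context}` receives the list's string representation, including the `Document(metadata=...)` wrappers and source URLs. These cost prompt tokens without helping the answer. A common remedy is a small helper that joins only `page_content`. In the sketch below, `Doc` is a hypothetical stand-in that exposes the same `page_content` and `metadata` attributes as LangChain's `Document`, so `format_docs` should work unchanged on `docs`.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document (same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs) -> str:
    """Join retrieved chunks with blank lines, leaving metadata out of the prompt."""
    return "\n\n".join(d.page_content for d in docs)

sample = [Doc("Component One: Planning"), Doc("Task Decomposition", {"source": "blog"})]
print(format_docs(sample))
# In the notebook: chain.invoke({"context": format_docs(docs), "question": "What is Task Decomposition?"})
```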

In [26]:
prompt_hub_rag = hub.pull("rlm/rag-prompt")

In [27]:
prompt_hub_rag

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])
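The hub prompt declares the same `input_variables` (`context` and `question`) as the custom `prompt`, so it could take that prompt's place in a chain. The stdlib sketch below confirms this by listing each template's placeholders with `string.Formatter().parse`. Both template strings are copied from the outputs above, and `placeholders` is a hypothetical helper.

```python
from string import Formatter

def placeholders(template: str) -> set[str]:
    """Names of the {fields} a str.format-style template expects."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}

custom = "Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n"
hub_rag = (
    "You are an assistant for question-answering tasks. Use the following pieces of retrieved "
    "context to answer the question. If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)
print(placeholders(custom) == placeholders(hub_rag))  # True: both expect context and question
```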

In [35]:
rag_chain = (
    {"context":retriever, "question":RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task decomposition?")

'Task decomposition is the process of breaking down a complex task into smaller and simpler steps in order to make it more manageable for an agent to plan and execute. It can be done using techniques such as Chain of Thought and Tree of Thoughts, as well as through simple prompting or task-specific instructions.'
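LCEL coerces the dictionary at the head of `rag_chain` into a `RunnableParallel`. The same input string goes to both branches: the retriever turns it into documents, and `RunnablePassthrough()` forwards it unchanged as `question`. Each `|` then feeds one step's output into the next. The plain-Python sketch below mimics that data flow with hypothetical stub functions and uses no retriever, model, or LangChain objects. Unlike `RunnableParallel`, it runs the branches sequentially.

```python
from typing import Any, Callable

def parallel(branches: dict[str, Callable[[Any], Any]]) -> Callable[[Any], dict]:
    """Mimic a dict step in a chain: send the same input to every branch and collect the results."""
    def run(x):
        return {key: fn(x) for key, fn in branches.items()}
    return run

def pipe(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Mimic `a | b | c`: run steps left to right, feeding each output into the next step."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

def passthrough(x):             # plays the role of RunnablePassthrough()
    return x

def fake_retriever(question):   # hypothetical stand-in for the Chroma retriever (k=1)
    return ["Task decomposition splits a task into smaller steps."]

def fake_prompt(inputs):        # hypothetical stand-in for the ChatPromptTemplate
    return f"Context: {' '.join(inputs['context'])}\nQuestion: {inputs['question']}"

def fake_llm(prompt_text):      # hypothetical stand-in for ChatOpenAI; echoes the first prompt line
    return {"content": "Based on -> " + prompt_text.splitlines()[0]}

def parse(message):             # plays the role of StrOutputParser
    return message["content"]

rag = pipe(parallel({"context": fake_retriever, "question": passthrough}), fake_prompt, fake_llm, parse)
print(rag("What is Task decomposition?"))
```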