# RAG From Scratch: Overview

These notebooks walk through the process of building RAG app(s) from scratch.

They build towards a broader understanding of the RAG landscape, as shown here:

![Screenshot 2024-03-25 at 8.30.33 PM.png](attachment:c566957c-a8ef-41a9-9b78-e089d35cf0b7.png)

## Environment

`(1) Packages`

In [None]:
!python --version

Python 3.10.12


In [None]:
!pip install langchain_community==0.0.33 langchain-openai langchainhub chromadb langchain==0.1.16 tiktoken openai

Collecting langchain_community==0.0.33
  Downloading langchain_community-0.0.33-py3-none-any.whl (1.9 MB)
Collecting langchain-openai
  Downloading langchain_openai-0.1.15-py3-none-any.whl (46 kB)
Collecting langchainhub
  Downloading langchainhub-0.1.20-py3-none-any.whl (5.0 kB)
Collecting chromadb
  Downloading chromadb-0.5.4-py3-none-any.whl (581 kB)
Collecting langchain==0.1.16
  Downloading langchain-0.1.16-py3-none-any.whl (817 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_

## Part 1: Overview

[RAG quickstart](https://python.langchain.com/docs/use_cases/question_answering/quickstart)

In [2]:
from dotenv import load_dotenv
load_dotenv()

True

In [6]:
import os

AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
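
`os.getenv` returns `None` for a variable that is not set, so a missing `.env` entry tends to surface only later, when the first API call fails. A minimal sketch of a fail-fast check follows; `missing_env` and `REQUIRED` are hypothetical names introduced here for illustration, not part of the original notebook or of `python-dotenv`.

```python
import os

# Names the notebook reads from the environment.
REQUIRED = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "AZURE_OPENAI_API_VERSION")

def missing_env(names, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]

# Example with an explicit mapping instead of the real environment
# (the endpoint value is a placeholder, not a real URL).
print(missing_env(REQUIRED, {"AZURE_OPENAI_ENDPOINT": "https://example.invalid"}))
```

Calling `missing_env(REQUIRED)` right after `load_dotenv()` would list any variables that still need to be defined.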

## Part 2: Indexing

![Screenshot 2024-02-12 at 1.36.56 PM.png](attachment:d1c0f19e-1f5f-4fc6-a860-16337c1910fa.png)

In [3]:
# Documents
question = "What kinds of pets do I like?"
document = "My favorite pet is a cat."

[Count tokens](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb) with `tiktoken`. As a rule of thumb, one token is [about 4 characters](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) of English text.

In [4]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string(question, "cl100k_base")

8
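
The ~4 characters per token rule of thumb can be checked against the count above without a tokenizer. This is a rough sketch only: `estimate_tokens` is a hypothetical helper, and the heuristic applies to typical English text, not to code or other languages.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (a heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)

question = "What kinds of pets do I like?"
# 29 characters / 4 = 7.25, rounded up to 8
print(len(question), estimate_tokens(question))
```

For this question the estimate happens to match the 8 tokens reported by `tiktoken`; in general the two will differ by a few tokens.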

[Text embedding models](https://python.langchain.com/docs/integrations/text_embedding/openai)

In [7]:
import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = AZURE_OPENAI_ENDPOINT,
  api_key = AZURE_OPENAI_API_KEY,
  api_version = AZURE_OPENAI_API_VERSION
)

In [8]:
def generate_embeddings(text, model="ada0021_6"):  # on Azure, `model` is the deployment name
    """Return the embedding vector for a single text string."""
    return client.embeddings.create(input=[text], model=model).data[0].embedding


query_result = generate_embeddings(question)
document_result = generate_embeddings(document)

len(query_result)
query_result

[0.02595171146094799,
 -0.021968426182866096,
 0.046833787113428116,
 0.017045360058546066,
 0.02072688192129135,
 0.0022287433966994286,
 0.01525201927870512,
 0.04797187075018883,
 -0.015364103019237518,
 0.01822655089199543,
 -0.02424458973109722,
 0.0023171170614659786,
 0.012906881049275398,
 0.010915238410234451,
 -0.011734312400221825,
 -0.013769064098596573,
 -0.017088469117879868,
 -0.006996616255491972,
 0.0008988259360194206,
 -9.581683116266504e-06,
 -0.009759913198649883,
 0.010484146885573864,
 -0.016295261681079865,
 -0.015217532403767109,
 0.003606081008911133,
 -0.0018084291368722916,
 -0.017967896535992622,
 0.02279612235724926,
 0.003849647706374526,
 0.007500993087887764,
 0.013424191623926163,
 -0.019088733941316605,
 -0.004496285226196051,
 0.010932481847703457,
 -0.01349316630512476,
 0.004444553982466459,
 -0.028176143765449524,
 -0.003955265041440725,
 0.023951446637511253,
 -0.004095369949936867,
 0.016924655064940453,
 0.0001918357447721064,
 0.00579387042671

[Cosine similarity](https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions) is recommended for OpenAI embeddings (a score of 1 means the two vectors point in the same direction).

In [9]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.8575519863832503
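
To build intuition for the score, the sketch below evaluates the same formula on toy 2-D vectors using only the standard library. The vectors are made up for illustration and `cosine_toy` is a hypothetical name; real embeddings have many more dimensions.

```python
import math

def cosine_toy(vec1, vec2):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

same_direction = cosine_toy([1.0, 2.0], [2.0, 4.0])    # scaling does not matter
orthogonal = cosine_toy([1.0, 0.0], [0.0, 3.0])        # nothing in common
opposite = cosine_toy([1.0, 2.0], [-1.0, -2.0])        # pointing the other way
print(same_direction, orthogonal, opposite)
```

Up to floating-point rounding, the three values are 1, 0 and -1. The score depends only on direction, which is why the lengths of the two vectors do not affect it.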


[Document Loaders](https://python.langchain.com/docs/integrations/document_loaders/)

**The actual RAG implementation starts in the cell below.**

In [11]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()
blog_docs

[Document(page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final re

[Splitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter)

> This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
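
The strategy described in the quote can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: `recursive_split` is a hypothetical function, it measures length in characters and not tokens, it assumes `chunk_size >= 1`, and it does not produce overlapping chunks.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (toy version)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    # The empty separator means "split between every character".
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full, emit it
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too large: retry this piece with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

text = "First paragraph here.\n\nSecond paragraph is a bit longer than the first.\n\nThird."
for chunk in recursive_split(text, chunk_size=30):
    print(repr(chunk))
```

The short paragraphs survive intact, while the long middle paragraph falls through to the space separator and is split between words.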

In [12]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

In [13]:
len(splits)
splits[0]

Document(page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\
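
Because the splitter above was built with `from_tiktoken_encoder`, `chunk_size=300` and `chunk_overlap=50` are measured in tokens. The effect of the overlap can be sketched as a sliding window over a token list. This is a simplification under stated assumptions: `sliding_chunks` is a hypothetical helper, and the real splitter merges separator-delimited pieces, so its overlaps are not exact windows like these.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Fixed-size windows where consecutive windows share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

# Integers stand in for token ids.
windows = sliding_chunks(list(range(10)), chunk_size=4, chunk_overlap=1)
print(windows)
```

Each window starts on the last token of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.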

[Vectorstores](https://python.langchain.com/docs/integrations/vectorstores/)

In [14]:
# Index
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=AzureOpenAIEmbeddings(azure_deployment="ada0021_6",
                                                                    openai_api_version=AZURE_OPENAI_API_VERSION,
                                                                    chunk_size=1)
                                   )

retriever = vectorstore.as_retriever()

## Part 3: Retrieval

In [15]:
# Index (same as Part 2), then build a retriever that returns the top k=2 chunks
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=AzureOpenAIEmbeddings(azure_deployment="ada0021_6",
                                                                    openai_api_version=AZURE_OPENAI_API_VERSION,
                                                                    chunk_size=1)
                                   )


retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

In [16]:
docs = retriever.invoke("What is Task Decomposition?")


In [17]:
docs

[Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via 
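
Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The brute-force sketch below illustrates that idea with toy 2-D vectors. It is an assumption-laden simplification: `top_k` and `cosine` are hypothetical helpers, cosine similarity is used only because it was introduced in Part 2, and Chroma's actual index structure and distance metric may differ.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]

# Toy "embeddings": documents 0 and 1 point roughly the same way as the query.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```

With `k=2`, the unrelated third vector is dropped, which mirrors how `search_kwargs={"k": 2}` limits the retriever to two chunks.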

## Part 4: Generation

![Screenshot 2024-02-12 at 1.37.38 PM.png](attachment:f9b0e284-58e4-4d33-9594-2dad351c569a.png)

In [18]:
# Only the prompt template class is needed in this cell
from langchain_core.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'))])

In [19]:
from langchain_openai import AzureChatOpenAI

In [20]:
llm = AzureChatOpenAI(
            openai_api_type="azure",
            openai_api_version=AZURE_OPENAI_API_VERSION,
            openai_api_key=AZURE_OPENAI_API_KEY,
            azure_endpoint=AZURE_OPENAI_ENDPOINT,
            deployment_name="gpt35turbo",
            temperature=0
        )

In [21]:
# Chain
from langchain_core.output_parsers import StrOutputParser

llm_chain = prompt | llm | StrOutputParser()


In [22]:
# Run
llm_chain.invoke({"context":docs,
                  "question":"What is Task Decomposition?"})

'Task Decomposition is a technique for decomposing hard tasks into smaller and simpler steps to enhance model performance on complex tasks. It can be done using Chain of thought (CoT) or Tree of Thoughts, and can be prompted by LLM with simple prompting, task-specific instructions, or human inputs.'
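
In the cell above, `docs` is a list of `Document` objects, so the `{context}` slot appears to receive the list's string representation, metadata included. A common alternative is to join only the `page_content` fields before filling the template. The sketch below shows this with plain `str.format`; `Doc`, `format_docs` and `build_prompt` are hypothetical stand-ins for illustration, not LangChain classes or functions.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Minimal stand-in for a retrieved document."""
    page_content: str

TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""

def format_docs(docs):
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

def build_prompt(docs, question):
    return TEMPLATE.format(context=format_docs(docs), question=question)

toy_docs = [
    Doc("Chain of thought decomposes hard tasks into simpler steps."),
    Doc("Tree of Thoughts explores multiple reasoning paths per step."),
]
prompt_text = build_prompt(toy_docs, "What is Task Decomposition?")
print(prompt_text)
```

The resulting prompt contains only the chunk text, which keeps it shorter and avoids sending object wrappers to the model.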

**We can also use community-written prompts from the LangChain Hub**

In [23]:
# Pull a RAG prompt published on the LangChain Hub and use it as our prompt template
from langchain import hub

prompt_hub_rag = hub.pull("rlm/rag-prompt")

In [24]:
prompt_hub_rag

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

[RAG chains](https://python.langchain.com/docs/expression_language/get_started#rag-search-example)

In [25]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_hub_rag
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

'Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps. It can be done using prompting techniques like Chain of Thought or Tree of Thoughts, or with task-specific instructions or human inputs. The goal is to make the task more manageable and easier to plan for an agent or model.'
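
The data flow of `rag_chain` can be imitated with ordinary functions: the dict step sends the question both to the retriever and straight through, and each `|` feeds one step's output into the next. The sketch below is a toy model of that flow only. `fake_retriever`, `fake_llm`, `fan_out`, `fill_prompt` and `pipe` are all hypothetical stand-ins, and no LangChain code or model call is involved.

```python
def fake_retriever(question):
    """Stand-in for the vector store retriever: always returns one chunk."""
    return ["Task decomposition breaks a task into smaller steps."]

def fan_out(question):
    """Mimics {"context": retriever, "question": RunnablePassthrough()}."""
    return {"context": fake_retriever(question), "question": question}

def fill_prompt(inputs):
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}\nAnswer:"

def fake_llm(prompt):
    """Stand-in for the chat model: echoes the question line of the prompt."""
    return "ECHO: " + prompt.splitlines()[1]

def pipe(*steps):
    """Compose steps left to right, like chaining with `|`."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

toy_chain = pipe(fan_out, fill_prompt, fake_llm, str.strip)
print(toy_chain("What is Task Decomposition?"))
```

Swapping any stand-in for a real component leaves the overall shape unchanged, which is the main appeal of composing the chain from small steps.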