<a href="https://colab.research.google.com/github/Pavun-KumarCH/Agentic-RAG-Systems/blob/main/Introduction_to_RAG_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG From Scratch: Overview
These notebooks walk through the process of building RAG apps from scratch.

They build toward a broader understanding of the RAG landscape, as shown here:

![Local Image](./assets/RAG-intro.png)


In [2]:
#@title requirements
%pip install -q langchain_community tiktoken langchain-openai langchainhub chromadb langchain

In [3]:
import os
from google.colab import userdata
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Part 1: Overview

[RAG](https://python.langchain.com/docs/tutorials/rag/)

In [4]:
# Load Dependencies
import bs4
from langchain import hub
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from IPython.display import Markdown

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough



In [5]:
#### INDEXING ####
## Load
loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(
        parse_only = bs4.SoupStrainer(
            class_ = ("post-content", "post-title","post-header")
        )
    ),
)
docs = loader.load()

## Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
splits = text_splitter.split_documents(docs)

## Embed
vectorstore = Chroma.from_documents(splits, embedding = OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

#### RETRIEVAL and GENERATION ####

# Prompt
prompt = hub.pull("rlm/rag-prompt")

## LLM
llm = ChatOpenAI(model_name = "gpt-3.5-turbo",
                 temperature = 0.2,
                 top_p = 0.7)

## Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

## RAG pipeline chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

## Question
Markdown(rag_chain.invoke("What are the approaches to Task Decomposition?"))

The approaches to Task Decomposition include using LLM with simple prompting, task-specific instructions, and human inputs. Task decomposition involves breaking down a problem into smaller steps to make it more manageable and easier to solve. Different methods can be used to guide the decomposition process, such as providing specific instructions or using language models for prompting.

# Part 2: Indexing

![Local Image](./assets/indexing.png)


In [6]:
# Documents
question = "What are the approaches to Task Decomposition?"
document = "My Favorite pet is a cat."

* Count tokens (a common rule of thumb is ~4 characters per token for English text)

In [7]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
  """Returns the number of tokens in a text string."""
  encoding = tiktoken.get_encoding(encoding_name)
  num_tokens = len(encoding.encode(string))
  return num_tokens

num_tokens_from_string(question, "cl100k_base")


9
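The ~4 characters per token figure is only a rule of thumb, so it can drift noticeably from an exact tokenizer count. A minimal sketch comparing the two for the question above; `estimate_tokens` is a hypothetical helper, and the exact value 9 is simply copied from the `cl100k_base` output of the previous cell.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token rule of thumb (hypothetical helper)."""
    return math.ceil(len(text) / chars_per_token)

sample_question = "What are the approaches to Task Decomposition?"
estimated = estimate_tokens(sample_question)  # 46 characters -> ceil(11.5) = 12
exact = 9                                     # tiktoken "cl100k_base" count from the cell above
print(f"chars={len(sample_question)} estimated={estimated} exact={exact}")
```

Here the heuristic overestimates (12 vs. 9), which is why an exact tokenizer such as `tiktoken` is preferable when a context-window budget actually matters.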

* Text embedding models

In [8]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_result = embeddings.embed_query(question)
document_result = embeddings.embed_query(document)
len(query_result)

1536

* Cosine similarity is recommended for OpenAI embeddings (a score of 1 means the two vectors point in the same direction).



In [9]:
# Semantic search metric: cosine similarity
import numpy as np

def cosine_similarity(vec1, vec2):
  dot_product = np.dot(vec1, vec2)
  norm_vec1 = np.linalg.norm(vec1)
  norm_vec2 = np.linalg.norm(vec2)
  return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.6775632124372357
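OpenAI's embeddings documentation states that its embeddings are normalized to length 1, so the denominator of the cosine formula is 1 and cosine similarity reduces to a plain dot product. A small sketch checking that identity; the random 1536-dimensional vectors are stand-ins for real embeddings (an assumption, no API call is made).

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def unit_normalize(vec):
    """Scale a vector to unit (L2) length."""
    vec = np.asarray(vec, dtype=float)
    return vec / np.linalg.norm(vec)

rng = np.random.default_rng(0)
a = unit_normalize(rng.normal(size=1536))  # stand-ins for two normalized embeddings
b = unit_normalize(rng.normal(size=1536))

# For unit-length vectors, cosine similarity equals the dot product.
print(np.isclose(cosine_similarity(a, b), np.dot(a, b)))  # True

# Same direction scores 1, opposite direction scores -1.
print(float(cosine_similarity(a, a)), float(cosine_similarity(a, -a)))
```

Because of this normalization, a vector store can rank by dot product (or by Euclidean distance, which yields the same ordering for unit vectors) without changing which chunks come back.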


* Document Loaders

In [10]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(
        parse_only = bs4.SoupStrainer(
            class_ = ("post-content", "post-title","post-header")
        )
    ),
)
blog_docs = loader.load()

* Text splitters


> This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
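The recursive strategy described in the quote can be sketched in plain Python. `recursive_split` is a hypothetical, simplified helper written for illustration, not LangChain's source: it ignores `chunk_overlap`, does not strip whitespace, and drops empty pieces.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters, preferring earlier separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard character slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Use the first separator that occurs in the text ("" matches everything).
    idx = next((i for i, s in enumerate(separators) if s == "" or s in text),
               len(separators) - 1)
    sep, finer = separators[idx], separators[idx + 1:]
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate            # keep merging small pieces
            continue
        if current:
            chunks.append(current)         # flush the chunk built so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: split it again with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

sample = "Para one is here.\n\nPara two is a bit longer than the first.\n\nShort."
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

On the sample, the first and last paragraphs fit whole, while the 40-character middle paragraph exceeds `chunk_size=30` and falls back to splitting on spaces.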

In [11]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 200,
)
splits = text_splitter.split_documents(blog_docs)
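`chunk_overlap` makes neighbouring chunks share some text, so context that straddles a chunk boundary is less likely to be lost. The arithmetic is easiest to see with fixed-size character windows, where each window starts `chunk_size - chunk_overlap` = 800 characters after the previous one. The hypothetical `sliding_windows` helper below is a simplification: `RecursiveCharacterTextSplitter` assembles chunks from separator-delimited pieces, so its real chunk lengths vary and the overlap is only roughly up to `chunk_overlap`.

```python
def sliding_windows(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # 1000 - 200 = 800 with the settings above
    # Start a new window only while it would still add characters past the previous one.
    starts = range(0, max(len(text) - chunk_overlap, 1), step)
    return [text[i:i + chunk_size] for i in starts]

demo_text = "".join(chr(ord("a") + i % 26) for i in range(2600))
windows = sliding_windows(demo_text, chunk_size=1000, chunk_overlap=200)
print([len(w) for w in windows])  # [1000, 1000, 1000]
```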

* Vectorstores

In [12]:
# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(splits, embedding = OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

# Part 3: Retrieval

In [13]:
# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents = splits,
                                    embedding = OpenAIEmbeddings())

# k = number of chunks returned per query
retriever = vectorstore.as_retriever(search_kwargs = {"k": 4})
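Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors are most similar to it. The hypothetical `ToyVectorStore` below sketches that idea with a brute-force cosine ranking over hand-made 3-dimensional vectors (assumptions standing in for real embeddings); Chroma's actual distance metric and its approximate nearest-neighbour index are configurable and differ from this loop.

```python
import numpy as np

class ToyVectorStore:
    """Toy store: keeps (text, unit-normalized vector) pairs and ranks them by cosine similarity."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str, vector) -> None:
        v = np.asarray(vector, dtype=float)
        self.texts.append(text)
        self.vectors.append(v / np.linalg.norm(v))

    def similarity_search(self, query_vector, k: int = 4) -> list[tuple[str, float]]:
        if not self.vectors:
            return []
        q = np.asarray(query_vector, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q   # cosine similarity per stored vector
        top = np.argsort(-scores)[:k]         # indices of the k highest scores
        return [(self.texts[i], float(scores[i])) for i in top]

store = ToyVectorStore()
store.add("cats are pets", [1.0, 0.1, 0.0])
store.add("dogs are pets", [0.9, 0.3, 0.0])
store.add("task decomposition", [0.0, 0.2, 1.0])

results = store.similarity_search([0.0, 0.1, 0.9], k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The `k` argument plays the same role as `search_kwargs={"k": 4}` above: it caps how many chunks are handed to the prompt.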

In [14]:
# Relevant documents search (Runnable `invoke` replaces the deprecated `get_relevant_documents`)
docs = retriever.invoke("What is Task Decomposition?")
display(len(docs))

4

# Part 4: Generation

![Local Image](./assets/generation.png)


In [15]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Prompt
template = """
Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nAnswer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
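Filling this template amounts to string substitution of `{context}` and `{question}`. A plain-Python sketch of the text the model would receive; the two retrieved strings are made-up stand-ins, and `join_chunks` is a hypothetical helper that mirrors Part 1's `format_docs` but takes plain strings (it is named differently so it does not shadow `format_docs` if run in this notebook).

```python
demo_template = """
Answer the question based only on the following context:
{context}

Question: {question}
"""

def join_chunks(page_contents: list[str]) -> str:
    """Join retrieved chunk texts with blank lines, like format_docs does for Documents."""
    return "\n\n".join(page_contents)

retrieved = [  # hypothetical stand-ins for retrieved chunk texts
    "Task decomposition breaks a complex task into smaller steps.",
    "Chain of thought prompts the model to think step by step.",
]
filled = demo_template.format(context=join_chunks(retrieved),
                              question="What is Task Decomposition?")
print(filled)
```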

In [16]:
# LLM
llm = ChatOpenAI(model_name = "gpt-3.5-turbo",
                 temperature = 0.2,
                 top_p = 0.7)

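# Chain: prompt -> LLM (retrieval was done manually in Part 3, which produced `docs`)
chain = prompt | llm

# Run
# `docs` is a list of Document objects, so the prompt receives its string representation
# (metadata included); passing format_docs(docs) would send only the page text.
question = "What is Task Decomposition?"
chain.invoke({"context": docs, "question": question})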

AIMessage(content='Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps, making them more manageable for autonomous agents. This process involves transforming big tasks into multiple manageable tasks, allowing agents to plan ahead and execute tasks more effectively.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 48, 'prompt_tokens': 672, 'total_tokens': 720, 'completion_tokens_details': {'audio_tokens': None, 'reasoning_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-65009b76-06fd-422b-a6e2-f08bd33c73e0-0', usage_metadata={'input_tokens': 672, 'output_tokens': 48, 'total_tokens': 720, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 0}})

In [17]:
from langchain import hub

# Prompt
prompt_hub_rag = hub.pull("rlm/rag-prompt")

display(prompt_hub_rag)

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])

* [RAG Chains](https://python.langchain.com/docs/how_to/sequence/)

In [18]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_hub_rag
    | llm
    | StrOutputParser()
)

Markdown(rag_chain.invoke("What are the approaches to Task Decomposition?"))

The approaches to Task Decomposition include using LLM with simple prompting, task-specific instructions, or human inputs. Tree of Thoughts extends CoT by exploring multiple reasoning possibilities at each step, decomposing the problem into multiple thought steps and generating a tree structure. The search process can be BFS or DFS with each state evaluated by a classifier or majority vote.
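The dict at the head of `rag_chain` runs each value on the same input: the question string flows through `retriever | format_docs` to become `context`, and unchanged through `RunnablePassthrough()` to become `question`. The toy `Step` class below is a hypothetical plain-Python sketch of that `|` composition idea, not LangChain's Runnable implementation; the fake retriever, prompt, and LLM are placeholders.

```python
from typing import Any, Callable

class Step:
    """Minimal stand-in for a runnable: wraps a function and composes with `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Step":
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Lets a plain dict or function on the left of `|` start a pipeline.
        return as_step(other) | self

def as_step(obj: Any) -> Step:
    """Coerce a Step, dict of branches, or plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Run every branch on the same input and collect the results by key.
        branches = {key: as_step(val) for key, val in obj.items()}
        return Step(lambda value: {k: s.invoke(value) for k, s in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot compose {obj!r}")

passthrough = Step(lambda value: value)
fake_retriever = Step(lambda q: [f"chunk about {q}", "another chunk"])
fake_llm = Step(lambda p: f"ANSWER based on: {p}")

toy_chain = (
    {"context": fake_retriever | (lambda chunks: "\n\n".join(chunks)), "question": passthrough}
    | Step(lambda d: f"Question: {d['question']}\nContext: {d['context']}")  # fake prompt
    | fake_llm
)
print(toy_chain.invoke("task decomposition"))
```

The parallel dict is what lets a single `invoke("...")` call feed both the retriever and the prompt's `{question}` slot.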