<a href="https://colab.research.google.com/github/Pavun-KumarCH/Agentic-RAG-Systems/blob/main/Introduction_to_RAG_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG From Scratch: Overview
These notebooks walk through the process of building RAG app(s) from scratch.

They will build towards a broader understanding of the RAG landscape, as shown here:

In [None]:
#@title requirements
%pip install -q langchain_community tiktoken langchain-openai langchainhub chromadb langchain

In [None]:
import os
from google.colab import userdata

# Enable LangSmith tracing and read the API keys from Colab secrets
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY")
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# Part 1 : Overview

[RAG](https://python.langchain.com/docs/tutorials/rag/)

In [None]:
# Load Dependencies
import bs4
from langchain import hub
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from IPython.display import Markdown

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [None]:
#### INDEXING ####
loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(
        parse_only = bs4.SoupStrainer(
            class_ = ("post-content", "post-title","post-header")
        )
    ),
)
docs = loader.load()

## Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)
splits = text_splitter.split_documents(docs)

## Embed
vectorstore = Chroma.from_documents(splits, embedding = OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

#### RETRIEVAL and GENERATION ####

# Prompt
prompt = hub.pull("rlm/rag-prompt")

## LLM
llm = ChatOpenAI(model_name = "gpt-3.5-turbo",
                 temperature = 0.2,
                 top_p = 0.7)

## Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

## RAG pipeline Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

## Question
Markdown(rag_chain.invoke("What are the approaches to Task Decomposition?"))

# Part 2 : Indexing

In [None]:
# Example question and document
question = "What are the approaches to Task Decomposition?"
document = "My favorite pet is a cat."

* Count tokens, keeping in mind the rule of thumb of ~4 characters per token

In [None]:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
  """Returns the number of tokens in a text string."""
  encoding = tiktoken.get_encoding(encoding_name)
  num_tokens = len(encoding.encode(string))
  return num_tokens

num_tokens_from_string(question, "cl100k_base")
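
The ~4 characters per token rule of thumb can also be sketched without a tokenizer. This is only a rough estimate: the helper name `estimate_tokens` and its `chars_per_token` parameter are made up for illustration, and the exact count still comes from `tiktoken` as in the cell above.

```python
import math

def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate from the character count (hypothetical helper)."""
    return math.ceil(len(text) / chars_per_token)

question = "What are the approaches to Task Decomposition?"
print("Estimated tokens:", estimate_tokens(question))
```

Comparing this estimate with the `tiktoken` result gives a quick feel for how far the heuristic can drift on a given text.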


* Text embedding models

In [None]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_result = embeddings.embed_query(question)
document_result = embeddings.embed_query(document)
len(query_result)

* Cosine similarity is recommended (1 indicates identical) for OpenAI embeddings.



In [None]:
# Semantic search metric: cosine similarity
import numpy as np

def cosine_similarity(vec1, vec2):
  # Dot product divided by the product of the vector norms
  dot_product = np.dot(vec1, vec2)
  norm_vec1 = np.linalg.norm(vec1)
  norm_vec2 = np.linalg.norm(vec2)
  return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)
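
As a sanity check on the metric, cosine similarity can be tried on small hand-made vectors where the answer is known in advance: vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1. The sketch below uses only the standard library; the function name `cosine` and the toy vectors are illustrative and not part of the original notebook.

```python
import math

def cosine(vec1, vec2):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine([1.0, 0.0], [-1.0, 0.0])        # opposite directions
print(same_direction, orthogonal, opposite)
```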

* Document Loaders

In [None]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(
        parse_only = bs4.SoupStrainer(
            class_ = ("post-content", "post-title","post-header")
        )
    ),
)
blog_docs = loader.load()

* Splitter


> This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.

In [None]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 200,
)
splits = text_splitter.split_documents(blog_docs)
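
The "try each separator in order until the chunks are small enough" idea from the quote above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name `recursive_split` is hypothetical, and the sketch ignores `chunk_overlap` entirely.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Toy recursive splitter: coarse separators first, finer ones on demand."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut into fixed-size character windows
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too large: retry with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa bbbb cccc\n\ndddd eeee"
chunks = recursive_split(sample, chunk_size=10)
print(chunks)
```

The paragraph break is tried first, so `dddd eeee` stays whole, while the oversized first paragraph falls back to splitting on spaces.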

* Vectorstores

In [None]:
# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(splits, embedding = OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

# Part 3 : Retrieval

In [None]:
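# Index
# Note: calling Chroma.from_documents again in the same session may add the
# same chunks to the default collection a second time, so results can repeat.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents = splits,
                                    embedding = OpenAIEmbeddings())

# Return the 4 most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs = {"k": 4})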

In [None]:
# Retrieve the relevant documents
# (retriever.invoke replaces the deprecated get_relevant_documents)
docs = retriever.invoke("What is Task Decomposition?")
display(len(docs))
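
Conceptually, a retriever with `search_kwargs = {"k": 4}` ranks the stored chunk vectors by similarity to the query vector and returns the top `k`. The brute-force sketch below shows that idea on toy 2-D vectors. The names `cosine` and `top_k` are illustrative, and a real vector store such as Chroma relies on its own index and distance settings rather than this loop.

```python
import math

def cosine(vec1, vec2):
    """Cosine similarity of two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k document vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    # Highest similarity first; ties are broken by the lower index
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
best = top_k(toy_query, toy_docs, k=2)
print(best)
```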

# Part 4 : Generation

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Prompt
template = """
Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt
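
To see what the template contributes before any model is involved, the same string can be filled in with plain `str.format`. This is only a sketch of the substitution step: `template_text`, `toy_chunks`, and the placeholder chunk texts are assumptions for illustration, and `ChatPromptTemplate` additionally wraps the result in chat messages.

```python
template_text = """
Answer the question based only on the following context:
{context}

Question: {question}
"""

# Placeholder strings standing in for retrieved document chunks
toy_chunks = ["first retrieved chunk", "second retrieved chunk"]

# Join the chunks into one context block, then fill both placeholders
context = "\n\n".join(toy_chunks)
filled = template_text.format(context=context, question="What is Task Decomposition?")
print(filled)
```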

In [None]:
# LLM
llm = ChatOpenAI(model_name = "gpt-3.5-turbo",
                 temperature = 0.2,
                 top_p = 0.7)

# RAG pipeline Chain
chain = prompt | llm

# Run
question = "What is Task Decomposition?"
chain.invoke({"context": docs, "question": question})

In [None]:
from langchain import hub

# Prompt
prompt_hub_rag = hub.pull("rlm/rag-prompt")

display(prompt_hub_rag)

* [RAG Chains](https://python.langchain.com/docs/how_to/sequence/)

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_hub_rag
    | llm
    | StrOutputParser()
)

Markdown(rag_chain.invoke("What are the approaches to Task Decomposition?"))
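
The `|` operator in the chain above passes the output of one step to the next. A toy version of that composition can be written with a small class that overloads `__or__`. Everything here (`Step`, `toy_retrieve`, `toy_format`, `toy_inputs`, `toy_prompt`) is a hypothetical stand-in meant to show the data flow; it is not how LangChain implements its runnables.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its output to b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Fake retriever: returns fixed placeholder chunks for any question
toy_retrieve = Step(lambda question: ["chunk one", "chunk two"])
toy_format = Step(lambda chunks: "\n\n".join(chunks))

# Mirrors the {"context": ..., "question": ...} mapping at the head of the chain
toy_inputs = Step(lambda question: {
    "context": (toy_retrieve | toy_format).invoke(question),
    "question": question,
})
toy_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")

prompt_text = (toy_inputs | toy_prompt).invoke("What is RAG?")
print(prompt_text)
```

In the real chain, the model and `StrOutputParser()` are two further steps appended with the same operator.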