<a href="https://colab.research.google.com/github/ZainShaikh-12/AIAgents/blob/main/langchain_RAG_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG

One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific source information. These applications use a technique known as Retrieval Augmented Generation, or RAG.

# What is RAG
RAG is a technique for augmenting LLM knowledge with additional data. LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally.

# Environment
Install the packages.

In [None]:
! pip install -q langchain langchain_community langchain-chroma langchain_core langchain_text_splitters


In [None]:
! pip install -qU langchain-google-genai

In [None]:
import textwrap

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text: str) -> Markdown:
    """Render text as a Markdown block quote, turning bullet characters into list items."""
    text = text.replace("•", "  *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))

After installing the packages, create or log in to a LangSmith account to continue.

# LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com/).

After you sign up at the link above, make sure to set your environment variables to start logging traces:

In [None]:
import os

from google.colab import userdata

# LangSmith keys: tracing only starts once these are set as environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
LANGCHAIN_API_KEY = userdata.get('LANGCHAIN_API_KEY')
os.environ["LANGCHAIN_API_KEY"] = LANGCHAIN_API_KEY

# Gemini key
GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')

# Concepts of RAG
* Indexing: a pipeline for ingesting data from a source and indexing it.
* Retrieval and generation: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, then passes that to the model.
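
The two phases can be pictured with a toy sketch in plain Python. It assumes a simple keyword index in place of embeddings and stops at building the prompt instead of calling a model. `build_index` and `retrieve` are hypothetical names used only for illustration, not LangChain APIs.

```python
from collections import defaultdict


def build_index(chunks: list[str]) -> dict[str, set[int]]:
    """Indexing (done once, ahead of time): map each lowercase word to the chunks containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for chunk_id, chunk in enumerate(chunks):
        for word in chunk.lower().split():
            index[word].add(chunk_id)
    return index


def retrieve(index: dict[str, set[int]], chunks: list[str], query: str) -> list[str]:
    """Retrieval (at query time): return chunks sharing at least one word with the query."""
    hits: set[int] = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return [chunks[i] for i in sorted(hits)]


chunks = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Memory lets an agent recall earlier information.",
]
index = build_index(chunks)
question = "what is decomposition"
context = retrieve(index, chunks, question)

# Generation would send this prompt to a model; the sketch only builds it.
prompt_text = f"Context: {context}\nQuestion: {question}"
print(prompt_text)
```

The rest of the notebook follows the same shape, with an embedding model and a vector store doing the work of the keyword index.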

# Indexing: Load
* We first need to load the blog post contents. We can use DocumentLoaders for this, which are objects that load data from a source and return a list of Documents. A Document is an object with some page_content (a str) and metadata (a dict).
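
To make the shape of a Document concrete, here is a stand-in written as a plain dataclass. `SimpleDocument` and `load_from_strings` are hypothetical and only mirror the two fields named above; the real class is LangChain's own Document.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring the two fields of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_from_strings(pages: dict[str, str]) -> list[SimpleDocument]:
    """Toy loader: turn {source: text} pairs into documents, recording the source."""
    return [
        SimpleDocument(page_content=text, metadata={"source": source})
        for source, text in pages.items()
    ]


docs_demo = load_from_strings({"example.html": "LLM-powered agents plan and use tools."})
print(docs_demo[0].page_content[:20])
print(docs_demo[0].metadata)
```

A loader returns a list even for a single page, which is why the real code below indexes into `docs[0]`.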


In [None]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep the post title, headers, and content from the full HTML.
# Example of a matching element:
# <div class="post-title">Title of the Post</div>
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)

# Load the page into a list of Document objects
docs = loader.load()

# print(docs)
# docs[0].page_content
# len(docs[0].page_content)
# docs[0].page_content[:300]



# Indexing: Split
Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even models that could fit the full post in their context window can struggle to find information in very long inputs. To handle this, we'll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In [None]:
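from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of up to 1000 characters, with up to 200 characters of overlap between neighbours
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

splitted_document = text_splitter.split_documents(docs)

len(splitted_document)  # number of chunks (not displayed; only the last expression is shown)
len(splitted_document[0].page_content)  # length of the first chunk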


969
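
How `chunk_size` and `chunk_overlap` interact can be sketched with a fixed-size sliding window. This is a deliberate simplification: `split_with_overlap` is a hypothetical helper, whereas RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line breaks, which is why its chunks vary in length (the first one above is 969 characters, not 1000).

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2600))
pieces = split_with_overlap(sample)
print(len(pieces), [len(p) for p in pieces])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.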

# Indexing: Store

Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store).
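
The idea of embedding chunks and searching over them can be sketched without any external service. The example below assumes a toy bag-of-words count vector in place of a real embedding model and a small in-memory class in place of Chroma. `embed`, `cosine`, and `ToyVectorStore` are hypothetical names for illustration only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a real model returns dense floats)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs and ranks them by similarity."""

    def __init__(self, texts: list[str]) -> None:
        self.entries = [(embed(t), t) for t in texts]

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore([
    "task decomposition splits hard tasks into steps",
    "long term memory uses a vector store",
    "agents can call external tools",
])
top = store.similarity_search("what is task decomposition", k=1)
print(top)
```

A learned embedding model also matches text that shares meaning but no exact words, which word counts cannot do.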

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro", api_key=GEMINI_API_KEY)


In [None]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=splitted_document,
    embedding=GoogleGenerativeAIEmbeddings(
        model="models/text-embedding-004", google_api_key=GEMINI_API_KEY
    ),
)

This completes the Indexing portion of the pipeline. At this point we have a query-able vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question.

# Retrieval and Generation: Retrieve
Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k":5})
retrievered_docs = retriever.invoke("What do you mean by  Task Decomposition?")

# retrievered_docs
retrievered_docs[0].page_content

'Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'

# Retrieval and Generation: Generate

Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.

In [None]:
from langchain import hub

# Pull a ready-made RAG prompt from the LangChain prompt hub
prompt = hub.pull("rlm/rag-prompt", api_key=LANGCHAIN_API_KEY)

example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
)

example_messages

# print(example_messages.to_messages()[0].content)

ChatPromptValue(messages=[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:", additional_kwargs={}, response_metadata={})])
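
What `prompt.invoke` does with the two variables can be imitated with plain string formatting. The template text below is copied from the output above. `fill_rag_prompt` is a hypothetical helper, and the real prompt object returns message objects, not a bare string.

```python
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:"
)


def fill_rag_prompt(context: str, question: str) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return RAG_TEMPLATE.format(context=context, question=question)


filled = fill_rag_prompt("filler context", "filler question")
print(filled)
```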

We'll use the LCEL Runnable protocol to define the chain, allowing us to:

* pipe together components and functions in a transparent way
* automatically trace our chain in LangSmith
* get streaming, async, and batched calling out of the box
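
The piping idea can be illustrated with a few lines of plain Python. This is only a sketch of how a `|` operator can chain steps left to right, assuming a hypothetical `Step` wrapper. It is not LangChain's Runnable implementation, which also handles dict inputs, streaming, async, and batching.

```python
from typing import Any, Callable


class Step:
    """Hypothetical wrapper showing how `|` can chain callables left to right."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for a retriever, a formatter, and a final processing step.
fake_retriever = Step(lambda question: ["chunk about " + question])
join_chunks = Step(lambda chunks: "\n\n".join(chunks))
to_upper = Step(str.upper)

pipeline = fake_retriever | join_chunks | to_upper
result = pipeline.invoke("planning")
print(result)
```

Because each `|` returns another `Step`, a pipeline of any length still exposes the same `invoke` method as a single step.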

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough



def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# format_docs(retrievered_docs)

# RAG chain
rag_chain = (
    # The pipe | operator passes the output of retriever to format_docs as its input.
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# chain = (
#     {"context": retriever | format_docs, "question": RunnablePassthrough()}
#     | prompt
# )

# response = chain.invoke("What are the approaches to Task Decomposition?")

# response


for chunk in rag_chain.stream("How can we use Task Decomposition? and explain me this with example"):
    print(chunk, end="", flush=True)

Task decomposition is a technique used to break down complex tasks into smaller, more manageable steps. 
This can be done using LLM (Large Language Models) with simple prompting, task-specific instructions, or human inputs. 
For example, if the task is to write a novel, the task decomposition might involve breaking it down into steps such as creating an outline, writing a first draft, and editing and revising the draft.

Customizing the prompt:
* As shown above, we can load prompts from the prompt hub. The prompt can also be customized.


In [None]:
from langchain_core.prompts import PromptTemplate

rag_prompt_template = """Use the following pieces of context to answer the question.
If you dont know the answer, just say i dont know the answer. Always say "Thanks for asking" at the end of the answer:

{context}

Question: {question}
"""
custom_rag_prompt = PromptTemplate.from_template(rag_prompt_template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is fotile?"):
    print(chunk, end="", flush=True)

I dont know the answer.
Thanks for asking