<a href="https://colab.research.google.com/github/Ayesha-Imr/Gen-AI-Projects/blob/main/RAG_with_LLM_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setting up

Let's install the necessary libraries first.

In [None]:
!pip install --quiet langchain openai weaviate-client langchain_community pypdf tiktoken langchain_openai

Next, we import Colab's userdata module to read the required API keys (the OpenAI API key and the Serper API key) from Colab secrets and set them as environment variables.

In [None]:
import os
from google.colab import userdata

In [None]:
def _set_env(var: str):
    if os.environ.get(var):
        return
    os.environ[var] = userdata.get(var)


_set_env("OPENAI_API_KEY")

_set_env("SERPER_API_KEY")
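
Outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch of the same helper with a pluggable secret source, assuming an interactive prompt is an acceptable fallback. The name `set_env_from` and its `fetch` parameter are hypothetical and not part of the notebook.

```python
import os
from getpass import getpass
from typing import Callable


def set_env_from(var: str, fetch: Callable[[str], str] = getpass) -> None:
    """Set os.environ[var] from fetch(var) unless it is already set.

    `fetch` is a hypothetical hook: pass `userdata.get` in Colab, or keep the
    default `getpass` to be prompted for the value on a local machine.
    """
    if os.environ.get(var):
        return  # keep an existing value untouched
    value = fetch(var)
    if not value:
        raise ValueError(f"No value provided for {var}")
    os.environ[var] = value
```

In Colab this could be called as `set_env_from("OPENAI_API_KEY", userdata.get)`, which behaves like `_set_env` above but fails early when the secret is empty.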

## Documents preprocessing

Let's load the documents for the RAG tool from Google Drive. I made PDFs from several Data Science Dojo blog posts about RAG and saved them in a folder on my Google Drive.
These are the blog posts I used:

https://datasciencedojo.com/blog/rag-with-llamaindex/

https://datasciencedojo.com/blog/llm-with-rag-approach/

https://datasciencedojo.com/blog/efficient-database-optimization/

https://datasciencedojo.com/blog/rag-llm-and-finetuning-a-guide/

https://datasciencedojo.com/blog/rag-vs-finetuning-llm-debate/

https://datasciencedojo.com/blog/challenges-in-rag-based-llm-applications/

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


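Now we'll use LangChain's PyPDFLoader to extract the text from each PDF and split it into chunks by page (each page becomes a separate chunk). Note that load_and_split also applies LangChain's default text splitter, so an unusually long page may be split into more than one chunk.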

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Path to the folder containing PDFs
folder_path = '/content/drive/My Drive/RAG_blogs'

# List all PDF files in the folder
pdf_files = [f for f in os.listdir(folder_path) if f.endswith('.pdf')]

# Initialize an empty list to store all pages from all PDFs
all_pages = []

# Process each PDF file
for pdf_file in pdf_files:
    pdf_path = os.path.join(folder_path, pdf_file)
    loader = PyPDFLoader(pdf_path)

    # Load and split pages
    pages = loader.load_and_split()

    # Append each page to the all_pages list
    all_pages.extend(pages)

    # Output to confirm each file is processed
    print(f"Processed {pdf_file}, number of pages loaded: {len(pages)}")

# After processing all files, check the total number of pages collected
print(f"Total number of pages collected from all PDFs: {len(all_pages)}")


Processed Optimize RAG efficiency with LlamaIndex.pdf, number of pages loaded: 5
Processed Retrieval augmented generation.pdf, number of pages loaded: 6
Processed Database Optimization.pdf, number of pages loaded: 6
Processed RAG and finetuning.pdf, number of pages loaded: 9
Processed RAG vs finetuning.pdf, number of pages loaded: 7
Processed 12 Challenges in Building Production.pdf, number of pages loaded: 9
Total number of pages collected from all PDFs: 42


Let's check the contents of the first element of all_pages.

In [None]:
print(all_pages[0])

page_content='Optimize RAG efficiency with LlamaIndex: The perfect chunk size   \n \nMuhammad Jan  \nOctober 31  \nRAG integration revolutionized search with LLM, boosting dynamic retrieval.  \nWithin the implementation of a RAG system, a piv otal factor governing its efficiency and \nperformance lies in the determination of the optimal chunk size.  How does one identify the most \neffective chunk size for seamless and efficient retrieval? This is precisely where the comprehensive \nassessment provide d by the LlamaIndex Response Evaluation tool becomes invaluable.  \nIn this article, we will provide a comprehensive walkthrough, enabling you to discern the ideal chunk \nsize through the powerful features of LlamaIndex’s Response Evaluation module.   \n  \nWhy chu nk size matters  in RAG system  \nSelecting the appropriate chunk size is a crucial determination that holds sway over the \neffectiveness and precision of a RAG system in various ways:   \n  \n \n  \n  \nPertinence and detail

## Embedding and Indexing through Weaviate

Now we'll use the Weaviate client to embed the PDF text chunks with OpenAI's embeddings and store them in a Weaviate vector store.

In [None]:
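# Import from the partner/community packages installed above;
# the langchain.embeddings and langchain.vectorstores paths are deprecated
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Weaviate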
import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
  embedded_options = EmbeddedOptions()
)

vectorstore = Weaviate.from_documents(
    client = client,
    documents = all_pages,
    embedding = OpenAIEmbeddings(),
    by_text = False
)

embedded weaviate is already listening on port 8079



Embedded weaviate wasn't listening on ports http:8079 & grpc:50060, so starting embedded weaviate again
Started /root/.cache/weaviate-embedded: process ID 1013


Now we're going to define our retriever.

In [None]:
retriever = vectorstore.as_retriever()
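
To make the retrieval step less of a black box, here is a toy sketch of what a similarity search does conceptually: rank stored chunks by how close their embedding is to the query embedding and return the best few. The 3-dimensional vectors, the chunk texts, and the helper names are all made up for illustration. A real vector store such as Weaviate typically uses an approximate nearest-neighbour index rather than this brute-force scan, and its distance metric may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings"; real OpenAI embeddings have far more dimensions.
toy_chunks = [
    ("chunk about RAG challenges", [0.9, 0.1, 0.0]),
    ("chunk about database optimization", [0.0, 0.2, 0.9]),
    ("chunk about chunk size", [0.7, 0.6, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], toy_chunks, k=2))
```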

## Defining Tools

Next we're going to define our tools. The first is the retrieval tool for RAG, which answers user queries about RAG by fetching relevant information from the vector store.

In [None]:
# ChatOpenAI now lives in langchain_openai; unused imports removed
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.agents import AgentExecutor, Tool, create_openai_tools_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_community.utilities import GoogleSerperAPIWrapper

retrieve_tool = create_retriever_tool(
    retriever,
    name="RAG_Blogs_Search",
    description="""Fetch information relevant to the user query from a vector store of blogs related to RAG (Retrieval Augmented Generation). query: {query}""",
)





Next, we have a web search tool built on the Google Serper API, which will be used to answer queries unrelated to RAG.

In [None]:
search = GoogleSerperAPIWrapper()

search_tool = Tool(
        name="Web_Search",
        func=search.run,
        description="Search the web to answer the user query. query: {query}",
    )

Add both tools to the tools list.

In [None]:
tools = [retrieve_tool, search_tool]
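
The description strings matter because they are what the model reads when deciding which tool to call. As a rough illustration, assuming the OpenAI tools (function-calling) request format, each tool is presented to the model as a JSON entry like the ones built below. `to_openai_tool_spec` is a hypothetical helper; LangChain performs this conversion internally, and its exact output (for example the parameter names) may differ.

```python
import json


def to_openai_tool_spec(name: str, description: str) -> dict:
    """Build a tool entry with a single string `query` parameter (illustrative only)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }


specs = [
    to_openai_tool_spec(
        "RAG_Blogs_Search",
        "Fetch information about RAG from a vector store of blogs.",
    ),
    to_openai_tool_spec(
        "Web_Search",
        "Search the web to answer the user query.",
    ),
]
print(json.dumps(specs[0], indent=2))
```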

## Setting up the Agent

Let's create the prompt template for the agent.

In [None]:
prompt = PromptTemplate(
    template="""Use the tools given to you to answer the user query.
    If it is a question related to RAG (Retrieval Augmented Generation), then use the RAG_Blogs_Search tool to get information about RAG from Data Science Dojo's blogs.
    If the query is about something else, use the Web_Search tool to get the answer, no need to use the RAG_Blogs_Search tool in that case.
    Be concise and keep your response limited to 2-3 sentences. Use simple wording.
    Agent Scratchpad: {agent_scratchpad}
    Query: {query}""",
    # Declare every placeholder used in the template
    input_variables=["query", "agent_scratchpad"])
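
The placeholders in a template can be checked against the declared input variables with the standard library alone. This is a small sketch using Python's `str.format` syntax, on the assumption that it matches LangChain's default template format for simple cases like this one. `template_variables` is a hypothetical helper, and the template is a shortened stand-in for the full prompt.

```python
from string import Formatter


def template_variables(template: str) -> list:
    """Return the distinct {placeholders} of a format string, in order of appearance."""
    seen = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in seen:
            seen.append(field_name)
    return seen


template = "Agent Scratchpad: {agent_scratchpad}\nQuery: {query}"
print(template_variables(template))  # ['agent_scratchpad', 'query']

# Filling the placeholders produces the text that is sent to the model
print(template.format(agent_scratchpad="", query="What are some challenges in RAG?"))
```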

For the LLM, we're using gpt-4 for best results: I found that gpt-3.5 struggles with calling the correct tool and would switch back and forth between the two tools needlessly.

In [None]:
llm = ChatOpenAI(model_name="gpt-4", temperature=0)

Let's create the agent with our defined LLM, prompt and tools.

In [None]:
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
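
Conceptually, the executor lets the model pick a tool, runs it, records the result in the scratchpad, and then lets the model answer. The sketch below mimics a single such step without any API calls. The keyword rule in `choose_tool` is a stand-in for the LLM's decision, the two `fake_*` functions are stand-ins for the real tools, and all names here are hypothetical. The real AgentExecutor can repeat this step several times before answering.

```python
import re
from typing import Callable, Dict, List, Tuple


def fake_search_blogs(query: str) -> str:
    """Stand-in for the retriever tool."""
    return f"[blog excerpts about: {query}]"


def fake_web_search(query: str) -> str:
    """Stand-in for the Serper web search tool."""
    return f"[web results for: {query}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "RAG_Blogs_Search": fake_search_blogs,
    "Web_Search": fake_web_search,
}


def choose_tool(query: str) -> str:
    """Stand-in for the LLM's choice: a keyword rule, not a model call."""
    words = re.findall(r"[a-z]+", query.lower())
    if "rag" in words or "retrieval augmented" in query.lower():
        return "RAG_Blogs_Search"
    return "Web_Search"


def run_agent(query: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Run one tool-selection step and return (answer, scratchpad)."""
    scratchpad: List[Tuple[str, str]] = []
    tool_name = choose_tool(query)
    observation = TOOLS[tool_name](query)
    scratchpad.append((tool_name, observation))  # what the model would see next
    answer = f"Answer based on {tool_name}: {observation}"
    return answer, scratchpad


for question in ["What are some challenges in RAG?", "What is Data Science Dojo?"]:
    _, steps = run_agent(question)
    print(question, "->", steps[0][0])
```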


## Invoking the Agent

Let's start by asking a question related to RAG and observe the response trace.

In [None]:
response = agent_executor.invoke({"query": "What are some challenges in RAG?"})



> Entering new AgentExecutor chain...
Invoking: `RAG_Blogs_Search` with `{'query': 'challenges in RAG'}`


[0m[36;1m[1;3mdiscussed potential solutions, highlighti ng various techniques and tools that developers can leverage 
to optimize RAG system performance and ensure accurate, reliable, and secure responses.  
By addressing these challenges, RAG systems can unlock their full potential and become a powerful 
tool for enhancing the accuracy and effectiveness of LLMs across various applications.

12 Challenges in Building Production -Ready RAG based LLM Applications  
 
Fiza Fatima  
March 29  
  
  
  
  
Large Language Models are growing smarter, transforming how we interact with technology. Yet, 
they stumble over a significant quality i.e.  accuracy . Often, they provide unreliable information or 
guess answers to questions they don’t understand —guesses that  can be completely wrong.  Read 
more  
This issue is a major concern for enterprises looking to 

Now let's see what happens if we ask a question unrelated to RAG.

In [None]:
response = agent_executor.invoke({"query": "What is Data Science Dojo?"})



> Entering new AgentExecutor chain...
Invoking: `Web_Search` with `What is Data Science Dojo?`


[0m[33;1m[1;3mData Science Dojo: Educational institution in Bellevue, Washington. Data Science Dojo Address: 2331 130th Ave NE, Bellevue, WA 98005. Data Science Dojo Hours: Closed ⋅ Opens 8 AM. Data Science Dojo Phone: (877) 360-3442. "Extremely good bootcamp whereby you get multiple flavors of data science from theory to the practical applications. An intensive one-week exercise with ... Data Science Dojo is an e-learning company that is redefining the data science, large language models, and generative AI education landscape with a simpler, ... Data Science Dojo believes that anyone can learn data science, and provides comprehensive, hands-on training that helps students jump into practical data ... Data Science Dojo is redefining the education landscape of data science, large language models, and generative AI education landscape with a ... Text data, with its 