##### Creating a multi-search agent tool to help students revise for exams, using a PDF or Google Search

In [111]:
# Create google search tool

import os
from dotenv import load_dotenv

load_dotenv()
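# load_dotenv() has already copied the .env values into os.environ.
# Fail early with a clear message instead of a TypeError if a key is missing.
for key in ("GOOGLE_API_KEY", "GOOGLE_CSE_ID"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set; add it to the .env file")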

In [112]:
from langchain_core.tools import Tool
from langchain_google_community import GoogleSearchAPIWrapper

search = GoogleSearchAPIWrapper()

search_tool = Tool(
    name='google_search',
    description='Search google for recent results',
    func=search.run,
)

In [113]:
search_tool.run("Kenya's current president")

"His Excellency DR. William Samoei Ruto, C.G.H.. Address. P.O. Box 40530 – 00100. Nairobi. Tel.: 020-2227436. Email: feedback@president.go\xa0... William Kipchirchir Samoei Arap Ruto CGH (born 21 January 1967) is a Kenyan politician who is the fifth and current president of Kenya since 13 September\xa0... Jun 25, 2024 ... After a day of protest, turmoil and bloodshed, Kenyan President William Ruto addressed the nation with a message of sadness and strength. The country's current president is William Ruto since 13 September 2022. President of the Republic of Kenya. Rais wa Jamhuri ya Kenya. Incumbent William Ruto. Sep 14, 2022 ... Kenya's newly elected president William Ruto said that climate change will be key to the government's agenda and made an ambitious pledge to ramp up clean\xa0... Jan 22, 2025 ... ... new constitution in 2010 devolving some federal powers and funding to Kenya's ... Kenyan President Ruto is the current Chairperson of the EAC. Aug 7, 2024 ... Protests against Ken

In [114]:
# Load the PDFs, split them into chunks, and index them in Chroma
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
import chromadb

os.makedirs("./chroma_db/pdf_data", exist_ok=True)
os.makedirs("./chroma_db/web_data", exist_ok=True)

loader = PyPDFDirectoryLoader(path='./rag_data/')
data = loader.load()
splitted_docs = RecursiveCharacterTextSplitter(chunk_size=1000,
                                               chunk_overlap=200).split_documents(data)
# Create ChromaDB client for PDF data
pdf_client = chromadb.PersistentClient(path="./chroma_db/pdf_data")

# Create the collection
try:
    collection = pdf_client.get_or_create_collection("pdf_docs")
    print("Collection 'pdf_docs' ready")
except Exception as e:
    print(f"Error creating collection: {e}")

# Create Chroma vector store for PDF documents
pdf_db = Chroma(
    client=pdf_client,
    collection_name="pdf_docs",
    embedding_function=GoogleGenerativeAIEmbeddings(model='models/text-embedding-004'),
)

# Add documents
pdf_db.add_documents(splitted_docs)


docs_retreiver = pdf_db.as_retriever()
docs_retreiver

Collection 'pdf_docs' ready


VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001F52D40C430>, search_kwargs={})
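
The splitter above is configured with `chunk_size=1000` and `chunk_overlap=200`. The sketch below illustrates what those two numbers mean using a plain fixed-width sliding window. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper written only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.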

In [115]:
pdf_db.similarity_search('What are open source apps')

[Document(id='c8ef49f7-7532-452d-a055-3a21b99547f2', metadata={'author': 'Samsung', 'creationdate': '2025-03-23T01:02:05+03:00', 'creator': 'Microsoft® Word LTSC', 'moddate': '2025-03-23T01:02:05+03:00', 'page': 2, 'page_label': '3', 'producer': 'Microsoft® Word LTSC', 'source': 'rag_data\\week 8.pdf', 'total_pages': 14}, page_content='Open source software can be used, modified and distributed by anyone who has the knowledge \nto work with code. \nBusinesses are constantly searching for digital solutions to help them run more efficiently and \nturn bigger profits faster. \nAnd one common term they may or may not have heard of that can further this agenda is open-\nsource software. \nIn this article, you will find out what open source software is and get familiar with the most \nrequired types. \nWhat’s more, we will also discover the best open source software examples of 2021. \nTable of Contents \n• What Is Open Source Software? \n• Open Source Software vs. Free Source Software  \n• T
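
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that ranking step with toy three-dimensional vectors and cosine similarity. The vectors and document names are made up, and the distance metric Chroma actually applies depends on how the collection is configured, so treat this as an illustration of the idea rather than of the library's internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical values, three dimensions for readability).
toy_vectors = {
    "open_source_intro": [0.9, 0.1, 0.0],
    "licensing": [0.7, 0.6, 0.1],
    "animal_breeding": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_vectors))
# ['open_source_intro', 'licensing']
```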

In [116]:
# Convert this retriever into a tool
from langchain.tools.retriever import create_retriever_tool
pdfs = create_retriever_tool(
    retriever=docs_retreiver,
    name='pdf retreiver',
    description='Get educational content about open source applications'
)

In [117]:
pdfs.name

'pdf retreiver'
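
The tool name contains a space. The text-based ReAct format used later in this notebook tolerates that, but some tool-calling interfaces only accept letters, digits, underscores, and hyphens in tool names (an assumption worth checking against whichever model API is used). A minimal sketch of a normalizer, where `safe_tool_name` is a hypothetical helper and not part of LangChain:

```python
import re


def safe_tool_name(name: str) -> str:
    """Replace runs of characters outside [A-Za-z0-9_-] with a single underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip())
    return cleaned.strip("_")


print(safe_tool_name("pdf retreiver"))
# pdf_retreiver
```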

In [118]:
# Create a web loader
from langchain_community.document_loaders import WebBaseLoader

webLoader = WebBaseLoader(
    web_path='https://e-learning.embuni.ac.ke/course/view.php?id=14314'
)

content = webLoader.load()
content

[Document(metadata={'source': 'https://e-learning.embuni.ac.ke/course/view.php?id=14314', 'title': 'Log in to the site | UoEm Elearning Portal', 'language': 'en'}, page_content='\n\n\nLog in to the site | UoEm Elearning Portal\n\n\n\n\n\n\n\n\n\n\n\n\nSkip to main content\n\n\n\n\n\n\n\nSide panel\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\nCoursesBACHELOR OF SCIENCE IN ANIMAL PRODUCTION AND NUTRITIONAgricultural Extension and Technology Transfer\nAgricultural Marketing\nAgricultural Policy and Law\nANALYTICAL METHODS IN ANIMAL SCIENCE\nAnimal Breeding\nAnimal Climatology\n\nBACHELOR OF ARTS IN MEDIA AND COMMUNICATION STUDIESPOLITICAL ECONOMY OF THE MEDIA SEMESTER 2 2024/2025\nMEDIA SEMIOTICS SEMESTER 2 2024/2025\nRADIO PRODUCTION SEMESTER 2 2024/2025\nADVANCED ADVERTISING TECHNIQUES SEMESTER 2 2025/2026\nPSYCHOLOGY OF COMMUNICATION SEMESTER 2 2024/2025\nREADING MEDIA FORMS SEMESTER 2 2024/2025\n\nBACHELOR OF SCIENCE IN COMPUTER SCIENCE2024/25 SEM
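
The `title` in the metadata above is "Log in to the site | UoEm Elearning Portal", which suggests the loader received the portal's login page and its course menu rather than the course itself. A small check before indexing can catch this. The sketch below works on the metadata dictionary only; `looks_like_login_page` and its marker list are assumptions made for illustration, not a LangChain feature.

```python
def looks_like_login_page(metadata: dict) -> bool:
    """Heuristic: treat a page as a login wall if its title mentions logging in."""
    title = str(metadata.get("title", "")).lower()
    return any(marker in title for marker in ("log in", "login", "sign in"))


# Metadata copied from the loader output above.
page_metadata = {
    "source": "https://e-learning.embuni.ac.ke/course/view.php?id=14314",
    "title": "Log in to the site | UoEm Elearning Portal",
    "language": "en",
}
print(looks_like_login_page(page_metadata))
# True
```

If the check returns `True`, the page content is unlikely to contain the unit description, and a retriever built on it will mostly return navigation text.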

In [119]:
splitted_web_docs = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(content)
# Create ChromaDB client for web data
web_client = chromadb.PersistentClient(path="./chroma_db/web_data")

# Create the collection
try:
    web_collection = web_client.get_or_create_collection("web_docs")
    print("Collection 'web_docs' ready")
except Exception as e:
    print(f"Error creating collection: {e}")

# Create Chroma vector store for web documents
web_db = Chroma(
    client=web_client,
    collection_name="web_docs",
    embedding_function=GoogleGenerativeAIEmbeddings(model='models/text-embedding-004'),
)

# Add documents
web_db.add_documents(splitted_web_docs)


web_retriever = web_db.as_retriever()
web_retriever

Collection 'web_docs' ready


VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001F52D40B340>, search_kwargs={})
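
Both vector stores use a `PersistentClient`, so re-running the cells that call `add_documents` is likely to store the same chunks again. One common remedy is to give every chunk a deterministic ID derived from its content, so repeated runs map to the same IDs. The sketch below shows only the ID derivation with the standard library; `chunk_id` and `unique_chunks` are hypothetical helpers. Passing such IDs through an `ids=` argument when adding documents is an assumption to verify against the installed `langchain_chroma` version.

```python
import hashlib


def chunk_id(source: str, text: str) -> str:
    """Derive a stable ID from a chunk's source and content."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]


def unique_chunks(chunks: list[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Map each stable ID to its (source, text) pair, dropping exact repeats."""
    return {chunk_id(source, text): (source, text) for source, text in chunks}


# Toy chunks: the first two are identical and collapse to one entry.
toy_chunks = [
    ("week 8.pdf", "Open source software can be used, modified and distributed."),
    ("week 8.pdf", "Open source software can be used, modified and distributed."),
    ("week 8.pdf", "Businesses are constantly searching for digital solutions."),
]
print(len(unique_chunks(toy_chunks)))
# 2
```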

In [120]:
web_tool = create_retriever_tool(
    retriever=web_retriever,
    name='web_retriever',
    description='Get Unit purpose and description'
)

web_tool

Tool(name='web_retriever', description='Get Unit purpose and description', args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x000001F51ECB4EE0>, retriever=VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001F52D40B340>, search_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_separator='\n\n', response_format='content'), coroutine=functools.partial(<function _aget_relevant_documents at 0x000001F51EFCB7F0>, retriever=VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001F52D40B340>, search_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_se

In [121]:
tools = [pdfs,
         search_tool,
         web_tool]

tools

[Tool(name='pdf retreiver', description='Get educational content about open source applications', args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x000001F51ECB4EE0>, retriever=VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001F52D40C430>, search_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_separator='\n\n', response_format='content'), coroutine=functools.partial(<function _aget_relevant_documents at 0x000001F51EFCB7F0>, retriever=VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001F52D40C430>, search_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_

### Create multi search query agent

In [128]:
# Define the ReAct prompt, the chat history, and the chat model
from langchain_google_genai import ChatGoogleGenerativeAI

from langchain.memory import ChatMessageHistory
from langchain.prompts import PromptTemplate

# Define a more structured prompt template
prompt = PromptTemplate.from_template("""You are an assistant tasked with finding information. 
You have access to the following tools:

{tools}

You must follow this exact format:

Question: the input question you must answer
Thought: your reasoning about what to do next
Action: the tool name to use (must be one of: {tool_names})
Action Input: the input to pass to the tool
Observation: the result from the tool
... (you can repeat the Thought/Action/Action Input/Observation steps multiple times)
Thought: your final reasoning
Final Answer: your final answer to the question

Begin! Remember to follow the format exactly!

Previous conversation (may be empty):
{chat_history}

Question: {input}
{agent_scratchpad}
""")

memory = ChatMessageHistory(session_id="test-session")

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

In [132]:
# Create the ReAct agent, which picks a sequence of tool calls based on the query
from langchain.agents import create_react_agent

agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=prompt
)
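
The prompt asks the model to answer in a strict `Thought` / `Action` / `Action Input` / `Final Answer` layout because the agent has to parse that text to decide which tool to call. The sketch below shows the idea with a small regular expression. It is a simplified stand-in written for illustration, not LangChain's own output parser, and `parse_react_step` is a hypothetical name.

```python
import re

# Captures the tool name and its input from one ReAct step.
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def parse_react_step(text: str) -> dict:
    """Classify model output as a tool call or a final answer."""
    if "Final Answer:" in text:
        return {"type": "finish", "output": text.split("Final Answer:", 1)[1].strip()}
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError("Output does not follow the Thought/Action/Action Input format")
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip(),
    }


step = parse_react_step(
    "Thought: I should look in the unit description.\n"
    "Action: web_retriever\n"
    "Action Input: objective of the unit"
)
print(step)
# {'type': 'action', 'tool': 'web_retriever', 'tool_input': 'objective of the unit'}
```

When the model drifts from the format, parsing fails, which is the situation `handle_parsing_errors=True` on the executor is meant to recover from.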

In [None]:
# Executor to execute the agent
from langchain.agents import AgentExecutor
from langchain_core.runnables.history import RunnableWithMessageHistory

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True
)

agent_with_chat_history = RunnableWithMessageHistory(
    executor,
    # The factory receives a session ID because real-world apps keep one history per session.
    # It is ignored here: every session shares the single in-memory ChatMessageHistory.
    lambda session_id: memory,
    input_messages_key="input",
    history_messages_key="chat_history",
)
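
Because the factory above returns the same `memory` object for every session ID, all sessions share one conversation. A minimal sketch of a per-session store follows. It uses plain lists of `(role, content)` tuples to stay self-contained; in this notebook the factory would return a `ChatMessageHistory` instead. The names `_session_store` and `get_session_history` and the session IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory store: one message list per session ID.
_session_store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def get_session_history(session_id: str) -> list[tuple[str, str]]:
    """Return the history for this session, creating it on first use."""
    return _session_store[session_id]


get_session_history("student-a").append(("human", "What is the objective of this unit?"))
get_session_history("student-b").append(("human", "What are open source apps"))
print(len(get_session_history("student-a")), len(get_session_history("student-b")))
# 1 1
```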

In [134]:
agent_with_chat_history.invoke({'input': "What is the objective of this unit?"},
                               config={"configurable": {"session_id": "<foo>"}},)



> Entering new AgentExecutor chain...
Thought: I need to find the objective of the unit. I have a tool called web_retriever that is designed to get the unit purpose and description. I will use this tool to find the objective of the unit.
Action: web_retriever
Action Input: objective of the unit
UoeM and Estonia Course
Supporting the transformation and implementation of competence-based teacher education curricula and processes in Kenya 2024

BACHELOR OF LIBRARY AND INFORMATION SCIENCE
PEOJECT MANAGEMENT SEMESTER 2 2024/2025
SCHOLARLY JOURNALS PUBLISHING SEMESTER 2 2024/2025
ACADEMIC LIBRARIES SEMESTER 2 2024/2025
PRINCIPLES OF ARCHIVES ADMINISTRATION SEMESTER 2 2024/2025
PRESERVATION AND CONSERVATION OF INFORMATION SEMESTER 2 2024/2025
DISASTER MANAGEMENT IN INFORMATION SCIENCE SEMESTER 2 2024/2025

DIPLOMA IN HOSPITALITY AND TOURISM MANAGEMENT
MARKETING PRINCIPLES IN HOSPITALITY AND TOURISM INDUSTRY
PROPERTY MANAGEMENT AND HOUSEKEEPING 
HOSPI

{'input': 'What is the objective of this unit?',
 'chat_history': [],
 'output': 'A unit objective is a way to establish and articulate academic expectations for students so they know precisely what is expected of them.'}

In [135]:
agent_with_chat_history.invoke({'input': "How should I work towards learning this?"},
                               config={"configurable": {"session_id": "<foo>"}},)



> Entering new AgentExecutor chain...
Thought: To provide guidance on learning, I need to understand what "this" refers to. Since I've been given access to tools for open-source applications and unit descriptions, I'll assume "this" refers to learning about open-source applications. I will start by using the pdf retriever tool to get educational content about open-source applications.
Action: pdf retreiver
Action Input: learning open source applications
SIT 223: Open Source Applications 
Purpose of the course  
The purpose of the course is to introduce students to the concepts of Open Source software, giving 
a brief history of the movement and examining current issues in development and use of open 
source software.  
Expected Learning Outcomes of the Course 
At the end of this course/unit, the learner should be able to: 
i. Explain how the Open Source movement has arisen, what it is, how it works; 
ii. Describe some of the recent contributions m

{'input': 'How should I work towards learning this?',
 'chat_history': [HumanMessage(content='What is the objective of this unit?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='A unit objective is a way to establish and articulate academic expectations for students so they know precisely what is expected of them.', additional_kwargs={}, response_metadata={})],
 'output': 'To learn about open-source applications, start by understanding the history, concepts, and recent contributions of the open-source movement. You can gain practical skills in using GNU/Linux operating systems, editing tools, libraries, and utilities. Learn to use concurrent version control systems for software project management. Consider contributing to open-source projects by fixing formatting, style, and grammar in documentation. There are also free courses available that introduce the key concepts of developing open-source software.'}
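
In the run above the agent had to guess what "this" referred to, even though the previous turn is present in `chat_history`. For a string prompt, earlier turns are only useful if they are rendered into the text the model sees. The sketch below shows one simple way to format them. `render_history` is a hypothetical helper, and the `(role, content)` tuples stand in for the message objects shown in the output.

```python
def render_history(turns: list[tuple[str, str]], max_turns: int = 6) -> str:
    """Format the most recent (role, content) pairs as 'Role: content' lines."""
    recent = turns[-max_turns:] if max_turns > 0 else []
    return "\n".join(f"{role}: {content}" for role, content in recent)


# The first exchange from this notebook, as plain tuples.
history = [
    ("Human", "What is the objective of this unit?"),
    ("AI", "A unit objective is a way to establish and articulate academic "
           "expectations for students so they know precisely what is expected of them."),
]
print(render_history(history))
```

Capping the number of turns keeps the prompt from growing without bound over a long revision session.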

In [None]:
print(prompt.template)

Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
