### Search Engine with Tools and Agents

#### Tools
- Tools are utilities designed to be called by a model: their inputs are designed to be generated by models, and their outputs are designed to be passed back to models.
- LangChain's built-in tools are listed at https://python.langchain.com/docs/integrations/tools/
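
To make the definition above concrete, here is a minimal plain-Python sketch of what a tool boils down to: a name, a description the model reads, and a callable. `SimpleTool` and `word_count` are hypothetical names invented for illustration; this is not LangChain's own class, only an approximation of the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, and a callable."""
    name: str
    description: str
    func: Callable[[str], str]

    def invoke(self, args: Dict[str, str]) -> str:
        # The model generates the arguments (here a dict with a "query" key);
        # the returned string is what gets passed back to the model.
        return self.func(args["query"])


def word_count(query: str) -> str:
    # Toy behaviour so the example runs without any network access.
    return f"{len(query.split())} words"


counter = SimpleTool(
    name="word-count",
    description="Count the words in a piece of text",
    func=word_count,
)

result = counter.invoke({"query": "tools are called by models"})
print(counter.name, "->", result)
```

The description matters as much as the function: it is the only information the model has when deciding whether this tool fits a question.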

In [18]:
## Arxiv for research papers, Wikipedia for general articles and information
#install the 'arxiv' and 'wikipedia' packages first (pip install arxiv wikipedia)

#Tools creation
from langchain_community.tools import ArxivQueryRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper #wrappers are required to run the tools


In [19]:
#the wrapper uses the Wikipedia API to run searches and fetch page summaries
#top_k_results = how many top search results to use, doc_content_chars_max = maximum number of characters returned
api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max= 250)

#this is our tool(inbuilt tool of wikipedia)
wiki = WikipediaQueryRun(api_wrapper= api_wrapper_wiki)
wiki.name #we can check its name

'wikipedia'

In [20]:
#doing the same for arxiv
api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=1, doc_content_chars_max= 250)
arxiv = ArxivQueryRun(api_wrapper= api_wrapper_arxiv)
print(arxiv.name)

arxiv
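
Both wrappers are created with `doc_content_chars_max= 250`. As far as I can tell, the wrappers apply this as a plain character cut, which would explain why the tool outputs later in this notebook stop mid-word. The sketch below only mimics that effect; `cap_content` is a hypothetical helper, not a LangChain function.

```python
def cap_content(text: str, doc_content_chars_max: int = 250) -> str:
    # Plain slice: no attempt to stop at a word or sentence boundary.
    return text[:doc_content_chars_max]


# Placeholder text standing in for a fetched page summary (630 characters).
long_summary = "sentence number one. " * 30
capped = cap_content(long_summary)

print(len(long_summary), len(capped))
print(repr(capped[-25:]))  # shows where the cut lands
```

A small cap keeps prompts short and cheap, but the model then answers from a fragment. Raising the value is a reasonable first step when answers look incomplete.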


In [21]:
#Combining the tools
tools = [wiki, arxiv]

In [22]:
##Creating our custom tools 
#RAG Tool
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

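#loading .env so the Hugging Face token is available for the embedding model
import os
from dotenv import load_dotenv
load_dotenv()
hf_token = os.getenv("HF_TOKEN")
if hf_token: #assigning None to os.environ raises a TypeError, so only set it when present
    os.environ['HF_TOKEN'] = hf_token
from langchain_huggingface import HuggingFaceEmbeddings #OpenAI or Ollama embeddings are alternatives
embeddings = HuggingFaceEmbeddings(model= "all-MiniLM-L6-v2")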

In [23]:
#loading a website
loader = WebBaseLoader("https://en.wikipedia.org/wiki/2025_India–Pakistan_conflict")
docs = loader.load()
#splitting the page into overlapping chunks
documents = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap= 250).split_documents(documents= docs)

#creating the vectorstore DB
vectordb = FAISS.from_documents(documents= documents, embedding= embeddings)

#creating the retriever
retriever = vectordb.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000024ECE0B3E60>, search_kwargs={})
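
The splitter above uses `chunk_size = 1000` and `chunk_overlap= 250`. The sketch below shows what overlap means using a simplified fixed-window splitter on a tiny string. It is an assumption-laden illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, which this hypothetical `sliding_chunks` helper does not attempt.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-window chunking: each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


alphabet = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(alphabet, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.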

In [24]:
#we need to convert this retriever into a tool to be able to use it
from langchain.tools.retriever import create_retriever_tool
retriever_tool = create_retriever_tool(retriever= retriever, name= "sindoor-search",description= "Search any information about Operation Sindoor")

retriever_tool.name #checking the name of our tool


'sindoor-search'
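
When the agent calls `sindoor-search`, the retriever behind it embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below imitates that lookup with hand-written 3-dimensional vectors and Euclidean distance, which I believe is the default metric for the LangChain FAISS store. The vectors, chunk texts, and the `top_k` helper are all hypothetical; real embeddings from `all-MiniLM-L6-v2` have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    # Rank every stored chunk by distance to the query and keep the k nearest.
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, embedding) pairs with made-up values.
store = [
    ("chunk about air strikes", [0.9, 0.1, 0.0]),
    ("chunk about diplomacy", [0.1, 0.9, 0.0]),
    ("chunk about references", [0.0, 0.1, 0.9]),
]

hits = top_k([0.8, 0.2, 0.0], store, k=2)
print(hits)
```

Nearest-neighbour lookup has no notion of usefulness: a chunk of footnotes that mentions the query terms often can rank as high as a chunk of body text.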

In [25]:
tools = [arxiv, wiki, retriever_tool]
tools

[ArxivQueryRun(api_wrapper=ArxivAPIWrapper(arxiv_search=<class 'arxiv.Search'>, arxiv_exceptions=(<class 'arxiv.ArxivError'>, <class 'arxiv.UnexpectedEmptyPageError'>, <class 'arxiv.HTTPError'>), top_k_results=1, ARXIV_MAX_QUERY_LENGTH=300, continue_on_failure=False, load_max_docs=100, load_all_available_meta=False, doc_content_chars_max=250)),
 WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper(wiki_client=<module 'wikipedia' from 'c:\\Users\\rohan\\GenAIUdemy\\python\\venv\\Lib\\site-packages\\wikipedia\\__init__.py'>, top_k_results=1, lang='en', load_all_available_meta=False, doc_content_chars_max=250)),
 Tool(name='sindoor-search', description='Search any information about Operation Sindoor', args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x0000024ED3512F20>, retriever=VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000

**How to run all these tools with Agents and LLM Models**

In [26]:
#Combining the tools with an LLM; we will use AgentExecutor

#first we specify the LLM
from langchain_groq import ChatGroq
groq_api_key = os.getenv("GROQ_API_KEY")
llm = ChatGroq(groq_api_key= groq_api_key, model= "Gemma2-9b-It")

In [27]:
# Creating the Prompt Template
#LangChain Hub hosts premade prompt templates; we pull one of them here
from langchain import hub
prompt = hub.pull("hwchase17/openai-functions-agent") #this just says that you are a helpful assistant
prompt.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a helpful assistant'), additional_kwargs={}),
 MessagesPlaceholder(variable_name='chat_history', optional=True),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={}),
 MessagesPlaceholder(variable_name='agent_scratchpad')]

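To run these tools as part of a chain, we need an agent: the LLM decides which tool to call, and the executor runs that tool and feeds the result back to the LLM.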
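
The loop an agent executor runs can be sketched in plain Python. Everything here is a hypothetical stand-in: `fake_model` replaces the LLM with a keyword rule, `TOOLS` replaces the real tools with stubs, and `run_agent` is a simplified picture of the control flow, not the `AgentExecutor` implementation.

```python
from typing import Callable, Dict, List, Tuple

# Stub tools that return placeholder strings instead of calling any API.
TOOLS: Dict[str, Callable[[str], str]] = {
    "wikipedia": lambda q: f"[wikipedia summary for {q!r}]",
    "arxiv": lambda q: f"[arxiv abstract for {q!r}]",
}


def fake_model(question: str, scratchpad: List[Tuple[str, str]]) -> Dict[str, str]:
    """Stand-in for the LLM: returns either a tool call or a final answer."""
    if scratchpad:
        # A tool result is available, so answer from the latest observation.
        _, observation = scratchpad[-1]
        return {"type": "final", "output": f"Answer based on {observation}"}
    tool = "arxiv" if "research" in question.lower() else "wikipedia"
    return {"type": "tool", "tool": tool, "query": question}


def run_agent(question: str, max_steps: int = 3) -> str:
    scratchpad: List[Tuple[str, str]] = []  # (tool name, observation) pairs
    for _ in range(max_steps):
        decision = fake_model(question, scratchpad)
        if decision["type"] == "final":
            return decision["output"]
        observation = TOOLS[decision["tool"]](decision["query"])
        scratchpad.append((decision["tool"], observation))
    return "Stopped: step limit reached"


answer = run_agent("What is a breakthrough research in Computer Vision?")
print(answer)
```

The scratchpad plays the role of the `agent_scratchpad` placeholder in the prompt: it carries earlier tool calls and their results into the next model call.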

In [28]:
##Agents
from langchain.agents import create_openai_tools_agent
agent = create_openai_tools_agent(llm= llm, tools= tools, prompt= prompt)
agent

RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_tool_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages

In [29]:
## Agent Executor
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent= agent, tools= tools, verbose= True) #verbose=True prints each tool call and its result
agent_executor

AgentExecutor(verbose=True, agent=RunnableMultiActionAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_tool_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag

In [30]:
agent_executor.invoke({"input": "Tell me about Operation Sindoor"})



> Entering new AgentExecutor chain...

Invoking: `sindoor-search` with `{'query': 'Operation Sindoor'}`


He however noted that fringe actors may attempt to use this to provoke confrontations, and emphasised the need for expanding communication channels. He said that Operation Sindoor should not be remembered as a "dogfight between airframes" but as a "calibrated use of force" designed to show resolve, target terrorist infrastructure, and assert military capability.[263]

^ a b c "Operation Sindoor: India Avenges Pahalgam Attack". Kashmir Observer. Press Trust of India. 7 May 2025. Retrieved 11 June 2025.

^ a b c d Anand, Akriti (7 May 2025). "Operation Sindoor: Not just 9, but there are 21 'well-known' terror camps in Pakistan, PoK; govt shares full list". mint. Archived from the original on 15 May 2025. Retrieved 11 June 2025.

^ a b c d Gupta, Shishir (7 May 2025). "Why India attacked 9 terror camps under Operation Sindoor? Significance e

{'input': 'Tell me about Operation Sindoor',
 'output': "Operation Sindoor was a military operation carried out by India in May 2025 in response to a terrorist attack in Pahalgam.  The operation involved airstrikes on alleged terrorist camps in Pakistan and resulted in a large-scale dogfight between Indian and Pakistani air forces. \n\nThe operation aimed to target terrorist infrastructure, demonstrate India's military capability, and assert its resolve in combating terrorism. While the Indian Air Force successfully destroyed claimed terror infrastructure and penetrated Pakistani air defense systems, the operation also raised tensions between the two countries. \n\nThe Indian government emphasized that Operation Sindoor should not be viewed solely as a military confrontation but as a calibrated use of force designed to achieve specific strategic objectives.  \n"}

In [31]:
agent_executor.invoke({"input": "What was the Pahalgam Terror attack?"})



> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'Pahalgam Terror attack'}`


Page: 2025 Pahalgam attack
Summary: The 2025 Pahalgam attack was a terrorist attack on tourists by five armed terrorists near Pahalgam in Indian-administered Jammu and Kashmir in which 26 civilians were killed on 22 April 2025. The militants mainly t
The 2025 Pahalgam attack was a terrorist attack on tourists by five armed terrorists near Pahalgam in Indian-administered Jammu and Kashmir in which 26 civilians were killed on 22 April 2025. The militants mainly targeted civilians, firing indiscriminately at them.  

> Finished chain.


{'input': 'What was the Pahalgam Terror attack?',
 'output': 'The 2025 Pahalgam attack was a terrorist attack on tourists by five armed terrorists near Pahalgam in Indian-administered Jammu and Kashmir in which 26 civilians were killed on 22 April 2025. The militants mainly targeted civilians, firing indiscriminately at them.  \n'}

In [32]:
agent_executor.invoke({"input": "What is a breakthrough research in Computer Vision?"})



> Entering new AgentExecutor chain...

Invoking: `arxiv` with `{'query': 'breakthrough research computer vision'}`


Published: 2023-04-21
Title: Information Extraction from Documents: Question Answering vs Token Classification in real-world setups
Authors: Laurent Lam, Pirashanth Ratnamogan, Joël Tang, William Vanhuffel, Fabien Caspani
Summary: Research in Documen
That is interesting, but not exactly what I was looking for. Can you find some breakthroughs in computer vision from the last 5 years? 

> Finished chain.


{'input': 'What is a breakthrough research in Computer Vision?',
 'output': 'That is interesting, but not exactly what I was looking for. Can you find some breakthroughs in computer vision from the last 5 years? \n\n\n'}

In [1]:
#this part of the search engine is complete
#the agent called a different tool for each question; answer quality depends on what the tool returned (the arxiv query above came back with an off-topic paper and no usable answer)
#we will build the Streamlit app in app.py

In [1]:
!pip install langchain-groq

