## Demo - Search Engine with Tools and Agents
**Use Case**

The limitation with LLM models is that they are trained with past data until a period. If there is need of latest information, you integrate LLM with tools that connects to external world.

Here in this usecase, we are going to build an application that connects with 3 tools:

1. ARXIV (built-in tool) that pulls data from ARXIV website that stores the research papers.

2. WikiPedia (built-in tool) that pulls information from wikipedia website.

3. RAG (a custom-tool) that pulls the data from a RAG vectorstore that stores web-scrapped contents from "https://docs.smith.langchain.com/"

We would demonstrate the agent connecting to the right tool based on user input to fetch the data.

In [None]:
## Step1: Initialize the LLM Model
from utility.llm_factory import LLMFactory
llm = LLMFactory.get_llm('openai')

from utility.embedding_factory import EmbeddingFactory

embedding_model = EmbeddingFactory.get_llm('openai')
embedding_model

In [None]:
## Step2: Import the packages
from langchain_community.tools import ArxivQueryRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper

In [None]:
## Step3: Arxiv -- Research
### A tool (inbuilt tool) that allows you to search and download papers from arxiv.org
api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=1, doc_dcontent_chars_max=250)
arxiv_tool = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)

arxiv_tool.name 

In [None]:
## Step4: Wikipedia
### A tool (inbuilt tool) that allows you to search and download articles from Wikipedia
api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=1, doc_dcontent_chars_max=250)
wiki_tool = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)

wiki_tool.name 


In [None]:
## Step5: A custom RAG tool that queries a website and returns the content
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# ✅ Use 'parse_only' instead of 'parser_only'
loader = WebBaseLoader(
    web_paths=("https://docs.smith.langchain.com/",)
)

docs = loader.load()

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

# Note: the input here is a list of Document objects, not just strings.
document_chunks = text_splitter.split_documents(docs)

document_chunks[::] 

from langchain.vectorstores import Chroma
vector_store_db = Chroma.from_documents(
    document_chunks,
    embedding_model,
    collection_name="tooltest_collection",
    persist_directory="./_data/chroma_db"
)
# Check the number of documents in the vector store
print(f"Number of documents in the vector store: {vector_store_db._collection.count()}")

retriever = vector_store_db.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 1
    }
)

from langchain_community.tools import Tool
rag_tool = Tool(
    name="rag_tool",
    func=retriever.invoke,
    description="Search any information about langsmith."
)   
rag_tool.name


In [None]:
## Step6: Create a list of tools
tools = [arxiv_tool, wiki_tool, rag_tool]
tools

In [None]:
## Step7: Initialize the LLM that the Agent would use to decide which tool to use
from langchain_groq import ChatGroq

llm_groq = ChatGroq(model="Gemma2-9b-It", max_tokens=512)
llm_groq

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7280c8dcc990>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7280b01b1390>, model_name='Gemma2-9b-It', model_kwargs={}, groq_api_key=SecretStr('**********'), max_tokens=512)

In [None]:
## Step8: Initialize the Agent with the tools and LLM

from langchain.agents import initialize_agent, Tool
# from langchain.chat_models import ChatOpenAI

# llm = ChatOpenAI(model="gpt-4", temperature=0)

# Create the agent
agent = initialize_agent(
    tools=tools,
    llm=llm_groq,
    agent="zero-shot-react-description",  # Let the LLM reason and act
    verbose=True
)

agent.invoke("Who is aurangzeb? What is his contribution to the Mughal Empire?")

In [31]:
## Step9: Test the Agent with a query
response = agent.invoke("What is the latest research on quantum computing?")
print(response) 




[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: To find the latest research on quantum computing, I should use arxiv. 
Action: arxiv
Action Input: quantum computing[0m
Observation: [36;1m[1;3mPublished: 2022-08-01
Title: The Rise of Quantum Internet Computing
Authors: Seng W. Loke
Summary: This article highlights quantum Internet computing as referring to
distributed quantum computing over the quantum Internet, analogous to
(classical) Internet computing involving (classical) distributed computing over
the (classical) Internet. Relevant to quantum Internet computing would be areas
of study such as quantum protocols for distributed nodes using quantum
information for computations, quantum cloud computing, delegated verifiable
blind or private computing, non-local gates, and distributed quantum
applications, over Internet-scale distances.[0m
Thought:[32;1m[1;3mThought: This looks like a good starting point. I could explore other recent papers on arxiv to get a

In [32]:

## Step10: Test the Agent with a query
agent.invoke("Who is aurangzeb? What is his contribution to the Mughal Empire?")
print(response) 



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to look up Aurangzeb on Wikipedia to get information about him and his contributions to the Mughal Empire. 
Action: wikipedia
Action Input: Aurangzeb[0m
Observation: [33;1m[1;3mPage: Aurangzeb
Summary: Alamgir I (Muhi al-Din Muhammad; 3 November 1618 – 3 March 1707), commonly known by the title Aurangzeb, was the sixth Mughal emperor, reigning from 1658 until his death in 1707. Under his reign, the Mughal Empire reached its greatest extent, with territory spanning nearly the entirety of the Indian subcontinent.
Aurangzeb and the Mughals belonged to a branch of the Timurid dynasty.  He held administrative and military posts under his father Shah Jahan (r. 1628–1658) and gained recognition as an accomplished military commander. Aurangzeb served as the viceroy of the Deccan in 1636–1637 and the governor of Gujarat in 1645–1647. He jointly administered the provinces of Multan and Sindh in 1648–1652 and continue

In [33]:

## Step11: Test the Agent with a query
agent.invoke("What is langsmith and what is the use of it?")
print(response)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to search for information about "langsmith". 
Action: rag_tool
Action Input: langsmith[0m
Observation: [38;5;200m[1;3m[Document(metadata={'description': 'LangSmith is a platform for building production-grade LLM applications.', 'language': 'en', 'source': 'https://docs.smith.langchain.com/', 'title': 'Get started with LangSmith | 🦜️🛠️ LangSmith'}, page_content='LangSmith is a platform for building production-grade LLM applications.\nIt allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence.\nObservabilityAnalyze traces in LangSmith and configure metrics, dashboards, alerts based on these.EvalsEvaluate your application over production traffic — score application performance and get human feedback on your data.Prompt EngineeringIterate on prompts, with automatic version control and collaboration features.')][0m
Thought:[32;1m[1;3mThought: I have found the d

## Using Prompt Hub with Agent
Hub is a pre-defined list of prompt templates

In [45]:
## Another way to create the agent

from langchain import hub
prompt = hub.pull("hwchase17/openai-functions-agent")
prompt.messages
# Ensure you have the correct prompt for the agent

from langchain.agents import create_openai_tools_agent
agent = create_openai_tools_agent(
    llm=llm_groq,
    tools=tools,
    prompt=prompt
)
agent

RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_tool_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages

In [46]:
## Agent Executor
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True
)   
agent_executor

AgentExecutor(verbose=True, agent=RunnableMultiActionAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_tool_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag

In [47]:
## Execute the agent with a query
response = agent_executor.invoke({"input": "What is the latest research on quantum computing?"})
print(response)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `arxiv` with `{'query': 'quantum computing'}`


[0m[36;1m[1;3mPublished: 2022-08-01
Title: The Rise of Quantum Internet Computing
Authors: Seng W. Loke
Summary: This article highlights quantum Internet computing as referring to
distributed quantum computing over the quantum Internet, analogous to
(classical) Internet computing involving (classical) distributed computing over
the (classical) Internet. Relevant to quantum Internet computing would be areas
of study such as quantum protocols for distributed nodes using quantum
information for computations, quantum cloud computing, delegated verifiable
blind or private computing, non-local gates, and distributed quantum
applications, over Internet-scale distances.[0m[32;1m[1;3mQuantum Internet computing is a rapidly developing field with many potential applications. It involves distributed quantum computing over the quantum Internet, analogous to classical interne

In [43]:
## Execute the agent with a query
response = agent_executor.invoke({"input": "What is machine learning?"})
print(response)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `wikipedia` with `{'query': 'machine learning'}`


[0m[33;1m[1;3mPage: Machine learning
Summary: Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalise to unseen data, and thus perform tasks without explicit instructions. Within a subdiscipline in machine learning, advances in the field of deep learning have allowed neural networks, a class of statistical algorithms, to surpass many previous machine learning approaches in performance.
ML finds application in many fields, including natural language processing, computer vision, speech recognition, email filtering, agriculture, and medicine. The application of ML to business problems is known as predictive analytics.
Statistics and mathematical optimisation (mathematical programming) methods comprise the foundations of machine learning. Data mini