# Custom Search Tools for Retrieval Agents

## Description

**Goals:**

* Learn how to create tools that LLMs can use to answer questions
* Create a ReAct agent
* Get a satisfactory response

**Learning objectives:**

* Apply the `@tool` decorator to turn a Python function into a tool for an LLM agent
* Evaluate the quality of the agent's responses based on the tools' results
* Create an agent using an [LCEL](https://python.langchain.com/docs/expression_language/why "Why Use LCEL?") chain


In [1]:
%pip install -Uq langchain langchainhub huggingface_hub transformers jinja2 numexpr langchain-openai langchain-anthropic

Note: you may need to restart the kernel to use updated packages.


## [@tool decorator](https://python.langchain.com/docs/modules/agents/tools/custom_tools "Learn more")

The `@tool` decorator is the simplest way to define a custom tool. By default, the decorator uses the function name as the tool name, but you can override this by passing a string as the first argument. The decorator also uses the function's docstring as the tool's description, so a **docstring** ***MUST*** be provided.

For example:

```python
from langchain.tools import tool


@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b
```

We can access the tool's properties directly:
```python
print(multiply.name)
print(multiply.description)
print(multiply.args)
```

```shell
multiply
multiply(a: int, b: int) -> int - Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
```
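To make the decorator's behaviour concrete, here is a minimal, stdlib-only sketch of how a decorator *could* derive a tool's name, description, and argument schema from a plain function. `toy_tool` and `ToyTool` are hypothetical names: this is not LangChain's implementation, and the schema format only imitates the output shown above.

```python
import inspect
from typing import Any, Callable, Dict, Optional

# Simplified mapping from Python annotations to JSON-schema type names (assumption).
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


class ToyTool:
    """Hypothetical stand-in for a tool: the wrapped function plus its metadata."""

    def __init__(self, func: Callable[..., Any], name: Optional[str] = None):
        doc = inspect.cleandoc(func.__doc__) if func.__doc__ else ""
        if not doc:
            # No docstring means no description for the LLM, so refuse early.
            raise ValueError(f"{func.__name__} needs a docstring")
        sig = inspect.signature(func)
        self.func = func
        self.name = name or func.__name__  # an explicit name overrides the function name
        self.description = f"{self.name}{sig} - {doc}"
        self.args: Dict[str, Dict[str, str]] = {
            arg: {
                "title": arg.replace("_", " ").title(),
                "type": _JSON_TYPES.get(param.annotation, "string"),
            }
            for arg, param in sig.parameters.items()
        }

    def run(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


def toy_tool(name_or_func: Any = None) -> Any:
    """Supports both @toy_tool and @toy_tool("custom_name")."""
    if callable(name_or_func):
        return ToyTool(name_or_func)
    return lambda func: ToyTool(func, name=name_or_func)


@toy_tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


@toy_tool("product")
def times(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


print(multiply.name)         # multiply
print(multiply.description)  # multiply(a: int, b: int) -> int - Multiply two numbers.
print(multiply.args)
print(times.name)            # product
```

The point of the sketch is that everything the LLM later "sees" about a tool comes from the function's name, signature, and docstring, which is why a missing or vague docstring directly hurts tool selection.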

Later, we'll compare the output of calling our `@tool`-decorated functions directly (wrapped in `print()`) with invoking them as tools via `.run()`.

In [2]:
from langchain.tools import tool, BaseTool

## [Arxiv Search](https://python.langchain.com/docs/integrations/tools/arxiv "Learn more")

In [3]:
%pip install -Uq arxiv

Note: you may need to restart the kernel to use updated packages.


In [4]:
from langchain.utilities import ArxivAPIWrapper

In [5]:
@tool("arxiv_search")
def arxiv_search(query: str) -> str:
    """
    This function uses the ArxivAPIWrapper to search for scientific papers on Arxiv.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Arxiv.
    """
    # Create an instance of the ArxivAPIWrapper
    arxiv = ArxivAPIWrapper()

    # Perform the search query
    results = arxiv.run(query)

    # Return the search results
    return results

In [6]:
print(arxiv_search("2306.11984"))

Published: 2023-06-21
Title: TauPETGen: Text-Conditional Tau PET Image Synthesis Based on Latent Diffusion Models
Authors: Se-In Jang, Cristina Lois, Emma Thibault, J. Alex Becker, Yafei Dong, Marc D. Normandin, Julie C. Price, Keith A. Johnson, Georges El Fakhri, Kuang Gong
Summary: In this work, we developed a novel text-guided image synthesis technique
which could generate realistic tau PET images from textual descriptions and the
subject's MR image. The generated tau PET images have the potential to be used
in examining relations between different measures and also increasing the
public availability of tau PET datasets. The method was based on latent
diffusion models. Both textual descriptions and the subject's MR prior image
were utilized as conditions during image generation. The subject's MR image can
provide anatomical details, while the text descriptions, such as gender, scan
time, cognitive test scores, and amyloid status, can provide further guidance
regarding where the tau 

In [7]:
print(
    arxiv_search.name,
    "\n\n",
    arxiv_search.description,
    "\n\n",
    arxiv_search.args,
)

arxiv_search 

 arxiv_search(query: str) -> str - This function uses the ArxivAPIWrapper to search for scientific papers on Arxiv.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Arxiv. 

 {'query': {'title': 'Query', 'type': 'string'}}


In [8]:
# Call the function as a tool
arxiv_search.run("2306.11984")

"Published: 2023-06-21\nTitle: TauPETGen: Text-Conditional Tau PET Image Synthesis Based on Latent Diffusion Models\nAuthors: Se-In Jang, Cristina Lois, Emma Thibault, J. Alex Becker, Yafei Dong, Marc D. Normandin, Julie C. Price, Keith A. Johnson, Georges El Fakhri, Kuang Gong\nSummary: In this work, we developed a novel text-guided image synthesis technique\nwhich could generate realistic tau PET images from textual descriptions and the\nsubject's MR image. The generated tau PET images have the potential to be used\nin examining relations between different measures and also increasing the\npublic availability of tau PET datasets. The method was based on latent\ndiffusion models. Both textual descriptions and the subject's MR prior image\nwere utilized as conditions during image generation. The subject's MR image can\nprovide anatomical details, while the text descriptions, such as gender, scan\ntime, cognitive test scores, and amyloid status, can provide further guidance\nregarding w

## [Google Jobs](https://python.langchain.com/docs/integrations/tools/google_jobs "Learn more")

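You'll need a SerpApi API key, stored in the `SERPAPI_API_KEY` environment variable, to use this tool. You can get one by signing up on the SerpApi website.

Below, we'll create a tool that searches Google Jobs for listings based on a job title.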

In [9]:
%pip install -Uq google-search-results

import os

os.environ["SERPAPI_API_KEY"] = ""  # paste your SerpApi key here

Note: you may need to restart the kernel to use updated packages.


In [10]:
from langchain.tools import tool

In [11]:
from langchain.tools.google_jobs import GoogleJobsQueryRun
from langchain.utilities.google_jobs import GoogleJobsAPIWrapper

In [12]:
@tool("google_job_search")
def google_job_search(job_title: str) -> str:
    """
    This function uses the GoogleJobsAPIWrapper to search for jobs on Google Jobs.

    Parameters:
    job_title (str): The search term(s) to use in the query.

    Returns:
    str: The search results from Google Jobs.
    """
    # create an instance of the GoogleJobsQueryRun
    gjs = GoogleJobsQueryRun(api_wrapper=GoogleJobsAPIWrapper())

    # Perform the search query
    results = gjs.run(job_title)

    # return the results
    return results

In [13]:
# Print the results of a search, as a regular function
print(google_job_search("Software Engineer"))


_______________________________________________
Job Title: Software Engineer
Company Name: Availity
Location:  Anywhere 
Description: Availity delivers revenue cycle and related business solutions for health care professionals who want to build healthy, thriving organizations. Availity has the powerful tools, actionable insights and expansive network reach that medical businesses need to get an edge in an industry constantly redefined by change.

Analyzes, designs, programs, debugs and modifies software enhancements and/or new products used in local, networked, cloud-based or Internet-related computer programs. Code may be used in commercial or end-user applications, such as materials management, financial management, HRIS, mobile apps or desktop applications products. Using current programming language and technologies, writes code, completes programming and performs testing and debugging of applications. Completes documentation and procedures for installation and maintenance. May in

In [14]:
# Call the function as a tool
google_job_search.run("Software Engineer")

"\n_______________________________________________\nJob Title: Software Engineer\nCompany Name: Availity\nLocation:  Anywhere \nDescription: Availity delivers revenue cycle and related business solutions for health care professionals who want to build healthy, thriving organizations. Availity has the powerful tools, actionable insights and expansive network reach that medical businesses need to get an edge in an industry constantly redefined by change.\n\nAnalyzes, designs, programs, debugs and modifies software enhancements and/or new products used in local, networked, cloud-based or Internet-related computer programs. Code may be used in commercial or end-user applications, such as materials management, financial management, HRIS, mobile apps or desktop applications products. Using current programming language and technologies, writes code, completes programming and performs testing and debugging of applications. Completes documentation and procedures for installation and maintenance

## [Wikipedia Search](https://python.langchain.com/docs/integrations/tools/wikipedia)

In [15]:
%pip install -Uq wikipedia

Note: you may need to restart the kernel to use updated packages.


In [16]:
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

In [17]:
@tool("wiki_search")
def wiki_search(query: str) -> str:
    """
    This function uses the WikipediaAPIWrapper to search for information on Wikipedia.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Wikipedia.
    """
    # Create a WikipediaQueryRun tool backed by the WikipediaAPIWrapper
    wikisearch = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

    # Perform the search query
    results = wikisearch.run(query)

    # return the results
    return results

In [18]:
# Print the results of a search, as a regular function
print(wiki_search("The Body Keeps the Score"))

Page: The Body Keeps the Score
Summary: The Body Keeps the Score: Brain, Mind, and Body in the Healing of Trauma is a 2014 book by Bessel van der Kolk about the effects of psychological trauma, also known as traumatic stress. The book describes van der Kolk's research and experiences on how individuals are affected by traumatic stress, and its effects on the mind and body. It is based on his 1994 Harvard Review of Psychiatry article "The body keeps the score: memory and the evolving psychobiology of posttraumatic stress".The Body Keeps the Score has been published in 36 languages. As of July 2021 the book had spent more than 141 weeks on the New York Times Bestseller List for nonfiction, with 27 of those weeks spent in the No. 1 position.

Page: Bessel van der Kolk
Summary: Bessel van der Kolk (born 1943) is a Dutch psychiatrist, author, researcher and educator. Since the 1970s his research has been in the area of post-traumatic stress. He is the author of The New York Times best selle

In [19]:
# Call the function as a tool
wiki_search.run("The Body Keeps the Score")

'Page: The Body Keeps the Score\nSummary: The Body Keeps the Score: Brain, Mind, and Body in the Healing of Trauma is a 2014 book by Bessel van der Kolk about the effects of psychological trauma, also known as traumatic stress. The book describes van der Kolk\'s research and experiences on how individuals are affected by traumatic stress, and its effects on the mind and body. It is based on his 1994 Harvard Review of Psychiatry article "The body keeps the score: memory and the evolving psychobiology of posttraumatic stress".The Body Keeps the Score has been published in 36 languages. As of July 2021 the book had spent more than 141 weeks on the New York Times Bestseller List for nonfiction, with 27 of those weeks spent in the No. 1 position.\n\nPage: Bessel van der Kolk\nSummary: Bessel van der Kolk (born 1943) is a Dutch psychiatrist, author, researcher and educator. Since the 1970s his research has been in the area of post-traumatic stress. He is the author of The New York Times best
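The escaped `\n` sequences in the `.run()` outputs above are a display detail rather than a different kind of result: both cells produce the same text, but when a string is a cell's last expression, Jupyter shows its `repr()`, while `print()` renders the newlines. A quick stdlib check, using a shortened stand-in string rather than a real search result:

```python
# Shortened stand-in for the string a search tool returns (not real output).
result = "Published: 2023-06-21\nTitle: TauPETGen"

# What Jupyter displays for a bare expression: the repr, with escaped newlines.
print(repr(result))  # 'Published: 2023-06-21\nTitle: TauPETGen'

# What print() displays: the newline is rendered as a line break.
print(result)
```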

# Create an Agent using GPT-3.5-Turbo

### Set Environment Variables

Let's turn on LangSmith tracing for observability. We'll also set our OpenAI API key.

#### LangSmith
Sign up for LangSmith [here](https://smith.langchain.com/). 

LangSmith is instrumental in understanding LLM applications as they grow more complex, because it traces each step the application takes at runtime. That is especially useful here, since we're building custom tools, creating a ReAct agent, and tying everything together inside a runnable chain.
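In this notebook, setting the environment variables in the next cell is all LangChain needs to send traces to LangSmith. As a rough, hypothetical illustration of what "tracing action steps" means, the sketch below records each call's name, inputs, output, and duration in an in-memory list. `traced`, `TRACE`, and `fake_search` are made-up names and have nothing to do with LangSmith's SDK.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for a trace store (hypothetical; LangSmith stores runs on its server).
TRACE: List[Dict[str, Any]] = []


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record every call to `func` as one step in TRACE."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        output = func(*args, **kwargs)
        TRACE.append(
            {
                "step": func.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": output,
                "seconds": time.perf_counter() - start,
            }
        )
        return output

    return wrapper


@traced
def fake_search(query: str) -> str:
    """Placeholder tool that echoes its query instead of calling an API."""
    return f"results for {query}"


fake_search("OHSU")
fake_search("Portland, Oregon")
for step in TRACE:
    print(step["step"], step["inputs"]["args"], "->", step["output"])
```

A real trace adds much more (nested runs, prompts, token usage), but the idea is the same: every step leaves a record you can inspect after the run.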

In [20]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "build-custom-tools"
os.environ["LANGCHAIN_API_KEY"] = ""
os.environ["OPENAI_API_KEY"] = ""
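The notebook sets each API key to an empty string as a placeholder, both here and in the Google Jobs cell. A small guard like the hypothetical `missing_env_vars` helper below can flag an unfilled key before the first API call fails with a less obvious error; the variable names are the ones this notebook uses.

```python
import os
from typing import Iterable, List

# Keys this notebook relies on.
REQUIRED_VARS = ["SERPAPI_API_KEY", "LANGCHAIN_API_KEY", "OPENAI_API_KEY"]


def missing_env_vars(names: Iterable[str]) -> List[str]:
    """Return the names that are unset or set to an empty/blank string."""
    return [name for name in names if not os.environ.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Set these before running the agent:", ", ".join(missing))
```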

### Why use an Instruct model instead of the latest "0125" model, which has function calling, a 4x larger context window, etc.?

We may opt to use the "Instruct" model because LangChain treats its class as an `LLM` rather than a chat model. This means that [`langchain_openai.llms.base.OpenAI`](https://api.python.langchain.com/en/latest/llms/langchain_openai.llms.base.OpenAI.html#langchain_openai.llms.base.OpenAI "LangChain API Documentation") returns a plain string, whereas [`langchain_openai.chat_models.base.ChatOpenAI`](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html "LangChain API Documentation") takes a list of messages and returns a message object (an `AIMessage`).

In our case, using the "Instruct" model would severely limit our context window to 4,096 tokens, but it would save us the hassle of dealing with conversation history.
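To see why 4,096 tokens is tight for a ReAct agent: the prompt, the rendered tool descriptions, every observation appended to the scratchpad, and the model's reply all share that window. The sketch below uses the common rule of thumb of roughly four characters per English token. That ratio is an assumption, and exact counts require the model's tokenizer (for example, OpenAI's `tiktoken`). The prompt and observation sizes are made up for illustration.

```python
from typing import List

CONTEXT_WINDOW = 4096  # tokens, the Instruct model's limit cited above
CHARS_PER_TOKEN = 4    # rough heuristic for English text (assumption)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; use a real tokenizer for exact counts."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def remaining_budget(prompt: str, observations: List[str], reserve_for_answer: int = 256) -> int:
    """Tokens left after the prompt, all observations, and room for the reply."""
    used = estimate_tokens(prompt) + sum(estimate_tokens(o) for o in observations)
    return CONTEXT_WINDOW - used - reserve_for_answer


# Hypothetical sizes: a 2,000-character prompt and three 4,000-character observations.
prompt = "x" * 2000
observations = ["y" * 4000] * 3
print(remaining_budget(prompt, observations))  # 4096 - 500 - 3000 - 256 = 340
```

Under these assumptions, a fourth observation of the same size would push the estimate well below zero, which is the kind of overflow the larger window of the "0125" model helps avoid.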

---

You may wish to see the [GPT-3.5-Turbo](https://platform.openai.com/docs/models/gpt-3-5-turbo "OpenAI Documentation") section for more detailed information.

In [33]:
# from langchain_openai import OpenAI

# chat_model = OpenAI(
#     model_name="gpt-3.5-turbo-instruct",
#     temperature=0,
#     openai_api_key=os.environ["OPENAI_API_KEY"]
# )

from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(
    model_name="gpt-3.5-turbo-0125",
    temperature=0.3,  # use 0.0 to adhere more closely to the retrieved context
    openai_api_key=os.environ["OPENAI_API_KEY"],
)
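The `temperature` setting above controls how sharply the model favours its top-scoring next token. In the standard formulation, the logits are divided by the temperature before a softmax, so lower values concentrate probability on the most likely token, and the limit as temperature approaches 0 is greedy decoding. OpenAI does not publish its exact sampler, so treat this as an illustration of the general technique, with made-up logits:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Standard temperature-scaled softmax; temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (treat 0 as greedy argmax separately)")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.3):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Dropping the temperature from 1.0 to 0.3 moves most of the probability mass onto the first token, which is why low temperatures make answers more repeatable.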

## Create a Prompt from a List of Messages

In [34]:
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="I can choose to use different tools to accomplish a variety of tasks."),
    HumanMessage(content="I have a question I need help answering:"),
]

In [35]:
# Let's see what the chat messages look like before they are formatted for the LLM call.
messages

[SystemMessage(content='I can choose to use different tools to accomplish a variety of tasks.'),
 HumanMessage(content='I have a question I need help answering:')]
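Chat models ultimately receive these messages as role-tagged entries. As a hedged illustration, the sketch below converts simple typed messages into the `{"role": ..., "content": ...}` dictionaries used by OpenAI's Chat Completions API, where a system message has role `"system"` and a human message has role `"user"`. The `Message` dataclass and `to_openai_format` function are hypothetical stand-ins, not LangChain classes; only the `type` strings mirror LangChain's message types.

```python
from dataclasses import dataclass
from typing import Dict, List

# LangChain message type -> OpenAI Chat Completions role.
ROLE_MAP = {"system": "system", "human": "user", "ai": "assistant"}


@dataclass
class Message:
    type: str      # "system", "human", or "ai"
    content: str


def to_openai_format(messages: List[Message]) -> List[Dict[str, str]]:
    """Map each typed message to a role/content dictionary."""
    return [{"role": ROLE_MAP[m.type], "content": m.content} for m in messages]


demo_messages = [
    Message("system", "I can choose to use different tools to accomplish a variety of tasks."),
    Message("human", "I have a question I need help answering:"),
]
print(to_openai_format(demo_messages))
```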

In [36]:
# Generate a completion of our prompt
res = chat_model.invoke(messages)
print(res)

content='Of course! What is your question?' response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 34, 'total_tokens': 42}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_4f0b692a78', 'finish_reason': 'stop', 'logprobs': None}


## Create an Agent

**What's an agent?** In the context of LLMs and LangChain, an agent is a software component that uses a large language model (LLM) to decide which actions to take, and in what order, to complete a task for a user. It acts as an intermediary between the user and the LLM: it can call external tools, use memory, and apply different reasoning strategies (such as ReAct) to reach its goal, which makes agents well suited to multi-step problems that a single LLM call can't solve.
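The decide-act-observe cycle described above can be sketched without any LLM at all. Here a scripted function stands in for the model: it returns either a tool call or a final answer, and a small loop (playing the role that `AgentExecutor` plays later) runs the tool and feeds the observation back. Everything here, including `scripted_llm`, `lookup_year`, and the `TOOLS` registry, is hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

# A step is either ("tool", tool_name, tool_input) or ("final", answer).
Step = Union[Tuple[str, str, str], Tuple[str, str]]


def lookup_year(name: str) -> str:
    """Hypothetical tool backed by a tiny dictionary instead of a web search."""
    return {"OHSU": "1887"}.get(name, "unknown")


TOOLS: Dict[str, Callable[[str], str]] = {"lookup_year": lookup_year}


def scripted_llm(question: str, history: List[Tuple[str, str, str]]) -> Step:
    """Stand-in for the model: request a tool first, then answer from the observation."""
    if not history:
        return ("tool", "lookup_year", "OHSU")
    _, _, observation = history[-1]
    return ("final", f"It was founded in {observation}.")


def run_agent(question: str, max_steps: int = 5) -> str:
    history: List[Tuple[str, str, str]] = []  # (tool, input, observation)
    for _ in range(max_steps):
        step = scripted_llm(question, history)
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        observation = TOOLS[tool_name](tool_input)  # act
        history.append((tool_name, tool_input, observation))  # observe
    return "Stopped: step limit reached."


print(run_agent("When was OHSU founded?"))  # It was founded in 1887.
```

The `max_steps` guard plays the same role as `AgentExecutor`'s `max_iterations` setting: it stops a model that never settles on a final answer.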


In [37]:
from langchain import hub
from langchain.agents import AgentExecutor, load_tools
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import (
    ReActJsonSingleInputOutputParser,
)
from langchain.tools.render import render_text_description
from langchain.utilities import SerpAPIWrapper

In [38]:
# define the tools
# `load_tools` only loads LangChain's built-in tools, so we start with the built-in `llm-math` calculator
tools = load_tools(["llm-math"], llm=chat_model)

# and now we'll append our custom tools
tools = tools + [arxiv_search, google_job_search, wiki_search]

# Inspect the tools
tools

[Tool(name='Calculator', description='Useful for when you need to answer questions about math.', func=<bound method Chain.run of LLMMathChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['question'], template='Translate a math problem into a expression that can be executed using Python\'s numexpr library. Use the output of running this code to answer the question.\n\nQuestion: ${{Question with math problem.}}\n```text\n${{single line mathematical expression that solves the problem}}\n```\n...numexpr.evaluate(text)...\n```output\n${{Output of running the code}}\n```\nAnswer: ${{Answer}}\n\nBegin.\n\nQuestion: What is 37593 * 67?\n```text\n37593 * 67\n```\n...numexpr.evaluate("37593 * 67")...\n```output\n2518731\n```\nAnswer: 2518731\n\nQuestion: 37593^(1/5)\n```text\n37593**(1/5)\n```\n...numexpr.evaluate("37593**(1/5)")...\n```output\n8.222831614237718\n```\nAnswer: 8.222831614237718\n\nQuestion: {question}\n'), llm=ChatOpenAI(client=<openai.resources.chat.completions.Comp
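In the next cell, `render_text_description` and the `tool_names` partial turn this list into plain text for the prompt. In the LangChain versions this notebook targets, the rendering is roughly one `name: description` line per tool. The stdlib sketch below imitates that with a hypothetical `SimpleTool` record, so treat the exact format as an approximation; the `wiki_search` description is shortened and made up.

```python
from typing import List, NamedTuple


class SimpleTool(NamedTuple):
    """Hypothetical minimal tool record: just what the prompt needs."""
    name: str
    description: str


def render_tools(tools: List[SimpleTool]) -> str:
    """Approximate the one-line-per-tool text rendering used in the ReAct prompt."""
    names = [t.name for t in tools]
    if len(names) != len(set(names)):
        # Duplicate names would make the model's chosen action ambiguous.
        raise ValueError(f"duplicate tool names: {names}")
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


demo_tools = [
    SimpleTool("Calculator", "Useful for when you need to answer questions about math."),
    SimpleTool("wiki_search", "Search Wikipedia for a query."),
]
print(render_tools(demo_tools))
print(", ".join(t.name for t in demo_tools))  # Calculator, wiki_search
```

The duplicate-name check is a design choice of this sketch: the agent dispatches on the name the model writes, so two tools sharing a name could never be told apart.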

In [39]:
# setup ReAct style prompt
prompt = hub.pull("hwchase17/react-json")
prompt = prompt.partial(
    tools=render_text_description(tools),
    tool_names=", ".join([t.name for t in tools]),
)

# define the agent
chat_model_with_stop = chat_model.bind(stop=["\nObservation"])
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | chat_model_with_stop
    | ReActJsonSingleInputOutputParser()
)

# instantiate AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
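Two details of this chain are worth unpacking. Binding `stop=["\nObservation"]` makes the model stop before it writes its own (invented) observation, so the real tool result can be inserted instead. `ReActJsonSingleInputOutputParser` then pulls the `action`/`action_input` JSON out of the model's text, or detects a final answer. The sketch below imitates both steps with the standard library; `cut_at_stop` and `parse_react_json` are hypothetical helpers, and the real parser handles more edge cases.

```python
import json
import re
from typing import Dict, Union

STOP = "\nObservation"
# Matches a triple-backtick block (optionally tagged json) and captures its body.
ACTION_BLOCK = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def cut_at_stop(text: str, stop: str = STOP) -> str:
    """Emulate a stop sequence: drop the stop string and everything after it."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


def parse_react_json(text: str) -> Dict[str, Union[str, Dict[str, str]]]:
    """Return {"tool": ..., "tool_input": ...} or {"final_answer": ...}."""
    match = ACTION_BLOCK.search(text)
    if match:
        payload = json.loads(match.group(1))
        return {"tool": payload["action"], "tool_input": payload.get("action_input", "")}
    if "Final Answer:" in text:
        return {"final_answer": text.split("Final Answer:")[-1].strip()}
    raise ValueError(f"Could not parse LLM output: {text!r}")


FENCE = "`" * 3  # built at runtime to keep this example readable inside Markdown
raw = (
    "Thought: I should look this up on Wikipedia.\n"
    "Action:\n"
    f"{FENCE}\n"
    '{"action": "wiki_search", "action_input": "OHSU"}\n'
    f"{FENCE}\n"
    "Observation: (the model would otherwise invent this part)"
)
print(parse_react_json(cut_at_stop(raw)))  # {'tool': 'wiki_search', 'tool_input': 'OHSU'}
print(parse_react_json("Thought: I now know.\nFinal Answer: OHSU is in Portland, Oregon."))
```

Raising on unparseable text is also why the notebook passes `handle_parsing_errors=True`: with that flag, `AgentExecutor` sends the parsing error back to the model as an observation instead of stopping the run.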

In [40]:
invocation = agent_executor.invoke(
    {
        "input": "What is OHSU? How old is it in years? At what address can I find it?"
    }
)

print("\n\n" + invocation["output"])


> Entering new AgentExecutor chain...
Question: What is OHSU? How old is it in years? At what address can I find it?

Thought: To answer these questions, we can search for information about OHSU on Wikipedia.

Action:
```
{
  "action": "wiki_search",
  "action_input": "OHSU"
}
```
Page: Oregon Health & Science University
Summary: Oregon Health & Science University (OHSU) is a
public research university focusing primarily on health sciences with a main campus, including two hospitals, in Portland, Oregon. The institution was founded in 1887 as the University of Oregon Medical Department and later became the University of Oregon Medical School. In 1974, the campus became an independent, self-governed institution called the University of Oregon Health Sciences Center, combining state dentistry, medicine, nursing, and public health programs into a single center. It was renamed Oregon Health Sciences University in 1981 and took its current name in 20

# The Trace
The agent answered the question correctly on the first run. Although it made some extraneous Wikipedia searches, I'm happy with the generated answer because it provides the requested information.

You can check out the chain's trace [here](https://smith.langchain.com/public/c2ee4b39-493c-4e35-a9bd-c96eae6b0d94/r "AgentExecutor: Full LangSmith trace").