# Custom Search Tools for Retrieval Agents


In [1]:
%pip install -Uq langchain langchainhub huggingface_hub transformers jinja2 numexpr langchain-openai

Note: you may need to restart the kernel to use updated packages.


## [@tool decorator](https://python.langchain.com/docs/modules/agents/tools/custom_tools "Learn more")

The `@tool` decorator is the simplest way to define a custom tool. By default it uses the function name as the tool name; you can override this by passing a string as the first argument. It also uses the function's docstring as the tool's description, so a **docstring** ***MUST*** be provided.

Example:

```python
from langchain.tools import tool


@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b
```

We can access the tool's properties directly:
```python
print(multiply.name)
print(multiply.description)
print(multiply.args)
```

```shell
multiply
multiply(a: int, b: int) -> int - Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
```
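
To make the mechanism concrete, here is a minimal stdlib-only sketch of how a decorator can derive a tool's name and description from a function. `simple_tool` and `SimpleTool` are hypothetical names used for illustration; this is not LangChain's implementation, only the general idea described above.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool object (not LangChain's class)."""
    name: str
    description: str
    func: Callable

    def run(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def simple_tool(arg: Union[str, Callable]):
    """Usable as `@simple_tool` or `@simple_tool("custom_name")`."""

    def build(func: Callable, name: Optional[str] = None) -> SimpleTool:
        doc = inspect.getdoc(func)
        if not doc:
            # Mirror the requirement above: no docstring, no description.
            raise ValueError(f"{func.__name__} needs a docstring")
        return SimpleTool(name=name or func.__name__, description=doc, func=func)

    if callable(arg):
        # Bare decorator: the function name becomes the tool name.
        return build(arg)
    # Called with a string: that string overrides the name.
    return lambda func: build(func, name=arg)


@simple_tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


@simple_tool("adder")
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


print(multiply.name, "|", multiply.description)
print(add.name, "|", add.run(2, 3))
```

The real decorator also builds an argument schema from the type hints (the `args` property shown above), which this sketch leaves out.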

Later, we'll see how the output format differs depending on whether we call a `@tool`-decorated function as a regular function or as a tool (via `.run`).

In [2]:
from langchain.tools import tool, BaseTool

## LangSmith
If you have access to LangSmith, it becomes extremely helpful as projects grow more complex, because it lets you trace each action step taken during execution.

This is especially true for us since we're building custom tools, creating an agent using Hugging Face's Model Hub, and tying everything together inside a runnable chain.

It becomes even more true as you add memory, vectorstore searching and so on.

In [3]:
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = ""
os.environ["LANGCHAIN_PROJECT"] = "build-custom-tools"
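
Hard-coding the key in a notebook makes it easy to leak. One possible alternative, sketched below, is to set the tracing variables only when they are not already defined, so that a key exported in the surrounding environment is left alone. `configure_tracing` is a hypothetical helper, not part of LangChain or LangSmith; the variable names are the ones used in the cell above.

```python
import os


def configure_tracing(project: str, api_key: str = "") -> dict:
    """Set the LangSmith-related variables without overwriting existing ones."""
    os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
    os.environ.setdefault("LANGCHAIN_PROJECT", project)
    if api_key:
        os.environ.setdefault("LANGCHAIN_API_KEY", api_key)
    # Report which variables are set, without echoing the secret itself.
    names = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_PROJECT", "LANGCHAIN_API_KEY")
    return {name: name in os.environ for name in names}


status = configure_tracing("build-custom-tools")
print(status)
```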

## [Arxiv Search](https://python.langchain.com/docs/integrations/tools/arxiv "Learn more")

In [4]:
%pip install -Uq arxiv



In [5]:
from langchain.utilities import ArxivAPIWrapper

In [6]:
@tool("arxiv_search")
def arxiv_search(query: str) -> str:
    """
    This function uses the ArxivAPIWrapper to search for scientific papers on Arxiv.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Arxiv.
    """
    # Create an instance of the ArxivAPIWrapper
    arxiv = ArxivAPIWrapper()

    # Perform the search query
    results = arxiv.run(query)

    # Return the search results
    return results

In [7]:
print(arxiv_search("2306.11984"))

Published: 2023-06-21
Title: TauPETGen: Text-Conditional Tau PET Image Synthesis Based on Latent Diffusion Models
Authors: Se-In Jang, Cristina Lois, Emma Thibault, J. Alex Becker, Yafei Dong, Marc D. Normandin, Julie C. Price, Keith A. Johnson, Georges El Fakhri, Kuang Gong
Summary: In this work, we developed a novel text-guided image synthesis technique
which could generate realistic tau PET images from textual descriptions and the
subject's MR image. The generated tau PET images have the potential to be used
in examining relations between different measures and also increasing the
public availability of tau PET datasets. The method was based on latent
diffusion models. Both textual descriptions and the subject's MR prior image
were utilized as conditions during image generation. The subject's MR image can
provide anatomical details, while the text descriptions, such as gender, scan
time, cognitive test scores, and amyloid status, can provide further guidance
regarding where the tau 

In [8]:
print(
        arxiv_search.name,
        "\n\n",
        arxiv_search.description,
        "\n\n",
        arxiv_search.args
    )

arxiv_search 

 arxiv_search(query: str) -> str - This function uses the ArxivAPIWrapper to search for scientific papers on Arxiv.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Arxiv. 

 {'query': {'title': 'Query', 'type': 'string'}}


In [9]:
# Call the function as a tool
arxiv_search.run("2306.11984")

"Published: 2023-06-21\nTitle: TauPETGen: Text-Conditional Tau PET Image Synthesis Based on Latent Diffusion Models\nAuthors: Se-In Jang, Cristina Lois, Emma Thibault, J. Alex Becker, Yafei Dong, Marc D. Normandin, Julie C. Price, Keith A. Johnson, Georges El Fakhri, Kuang Gong\nSummary: In this work, we developed a novel text-guided image synthesis technique\nwhich could generate realistic tau PET images from textual descriptions and the\nsubject's MR image. The generated tau PET images have the potential to be used\nin examining relations between different measures and also increasing the\npublic availability of tau PET datasets. The method was based on latent\ndiffusion models. Both textual descriptions and the subject's MR prior image\nwere utilized as conditions during image generation. The subject's MR image can\nprovide anatomical details, while the text descriptions, such as gender, scan\ntime, cognitive test scores, and amyloid status, can provide further guidance\nregarding w

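The two outputs above contain the same text; the visible difference comes from how the notebook displays them. `print` renders the newline characters, while a bare expression at the end of a cell is shown as its `repr`, with each newline escaped as `\n`. A small stdlib-only illustration, using a made-up string in place of a real search result:

```python
# Made-up stand-in for a search result; not real Arxiv output.
result = "Published: 2023-06-21\nTitle: An example paper\nSummary: Two lines\nof text."

# What `print(...)` shows: newlines are rendered.
print(result)

# What a notebook shows for a bare expression: the string's repr.
shown_by_notebook = repr(result)
print(shown_by_notebook)
```
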
## [Google Jobs](https://python.langchain.com/docs/integrations/tools/google_jobs "Learn more")

In [10]:
%pip install -Uq google-search-results

os.environ["SERPAPI_API_KEY"] = ""



In [11]:
from langchain.tools import tool

In [12]:
from langchain.tools.google_jobs import GoogleJobsQueryRun
from langchain.utilities.google_jobs import GoogleJobsAPIWrapper

In [13]:
@tool("google_job_search")
def google_job_search(query: str) -> str:
    """
    This function uses the GoogleJobsAPIWrapper to search for jobs on Google Jobs.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Google Jobs.
    """
    # create an instance of the GoogleJobsQueryRun
    gjs = GoogleJobsQueryRun(api_wrapper=GoogleJobsAPIWrapper())

    # Perform the search query
    results = gjs.run(query)

    # return the results
    return results

In [14]:
# Print the results of a search, as a regular function
print(google_job_search("Software Engineer"))


_______________________________________________
Job Title: (USA) Software Engineer II
Company Name: Walmart
Location:   Bentonville, AR   
Description: Position Summary...

What you'll do...

What you'll do...

It’s exciting time to join our Walmart journey. We are seeking a Software Engineer III to support the Finance Technology.  

About the Team

As a part of Walmart Finance Technology – Finance Data Factory (FDF), we build industry defining Data platform/workflows that process large quality of general ledger data and applications that run on data platform for visualization also, Machine Learning and statistical modeling that enable transparency, governance, and automation. The goals include improved Business partner user experience; standardizing the core and utilizing reporting/predictive analytics to uncover insights faster.

You will be in the unique position to be of service to associates as a member of this organization supporting all segments of Walmart Finance. If you are t

In [15]:
# Call the function as a tool
google_job_search.run("Software Engineer")

"\n_______________________________________________\nJob Title: (USA) Software Engineer II\nCompany Name: Walmart\nLocation:   Bentonville, AR   \nDescription: Position Summary...\n\nWhat you'll do...\n\nWhat you'll do...\n\nIt’s exciting time to join our Walmart journey. We are seeking a Software Engineer III to support the Finance Technology. \u202f\n\nAbout the Team\n\nAs a part of Walmart Finance Technology – Finance Data Factory (FDF), we build industry defining Data platform/workflows that process large quality of general ledger data and applications that run on data platform for visualization also, Machine Learning and statistical modeling that enable transparency, governance, and automation. The goals include improved Business partner user experience; standardizing the core and utilizing reporting/predictive analytics to uncover insights faster.\n\nYou will be in the unique position to be of service to associates as a member of this organization supporting all segments of Walmar

## [Wikipedia Search](https://python.langchain.com/docs/integrations/tools/wikipedia)

In [16]:
%pip install -Uq wikipedia



In [17]:
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

In [18]:
@tool("wiki_search")
def wiki_search(query: str) -> str:
    """
    This function uses the WikipediaAPIWrapper to search for information on Wikipedia.

    Parameters:
    query (str): The search term to use in the query.

    Returns:
    str: The search results from Wikipedia.
    """
    # Create a WikipediaQueryRun tool backed by the WikipediaAPIWrapper
    wikisearch = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

    # Perform the search query
    results = wikisearch.run(query)

    # return the results
    return results

In [19]:
# Print the results of a search, as a regular function
print(wiki_search("The Body Keeps the Score"))

Page: The Body Keeps the Score
Summary: The Body Keeps the Score: Brain, Mind, and Body in the Healing of Trauma is a 2014 book by Bessel van der Kolk about the effects of psychological trauma, also known as traumatic stress. The book describes van der Kolk's research and experiences on how individuals are affected by traumatic stress, and its effects on the mind and body. It is based on his 1994 Harvard Review of Psychiatry article "The body keeps the score: memory and the evolving psychobiology of posttraumatic stress".The Body Keeps the Score has been published in 36 languages. As of July 2021 the book had spent more than 141 weeks on the New York Times Bestseller List for nonfiction, with 27 of those weeks spent in the No. 1 position.

Page: Bessel van der Kolk
Summary: Bessel van der Kolk (born 1943) is a psychiatrist, author, researcher and educator based in Boston, United States. Since the 1970s his research has been in the area of post-traumatic stress. He is the author of The 

In [20]:
# Call the function as a tool
wiki_search.run("The Body Keeps the Score")

'Page: The Body Keeps the Score\nSummary: The Body Keeps the Score: Brain, Mind, and Body in the Healing of Trauma is a 2014 book by Bessel van der Kolk about the effects of psychological trauma, also known as traumatic stress. The book describes van der Kolk\'s research and experiences on how individuals are affected by traumatic stress, and its effects on the mind and body. It is based on his 1994 Harvard Review of Psychiatry article "The body keeps the score: memory and the evolving psychobiology of posttraumatic stress".The Body Keeps the Score has been published in 36 languages. As of July 2021 the book had spent more than 141 weeks on the New York Times Bestseller List for nonfiction, with 27 of those weeks spent in the No. 1 position.\n\nPage: Bessel van der Kolk\nSummary: Bessel van der Kolk (born 1943) is a psychiatrist, author, researcher and educator based in Boston, United States. Since the 1970s his research has been in the area of post-traumatic stress. He is the author o

# Create an Agent using [HuggingFaceHub](https://python.langchain.com/docs/integrations/chat/huggingface#huggingfacehub "Learn more")

`ChatHuggingFace` works with *HuggingFaceTextGenInference*, *HuggingFaceEndpoint*, and *HuggingFaceHub* LLMs.

When the class is instantiated, the `model_id` is resolved from the URL provided to the LLM, and the matching tokenizer is loaded from the Hugging Face Hub.

See the [Python API Documentation for LangChain](https://api.python.langchain.com/en/stable/chat_models/langchain_community.chat_models.huggingface.ChatHuggingFace.html?highlight=chathuggingface#langchain-community-chat-models-huggingface-chathuggingface "Learn more") for more detailed information.

In [21]:
from langchain_community.llms import HuggingFaceHub

# Instantiate the HuggingFaceHub LLM
llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 4096,
        "top_k": 35,
        "temperature": 0.1,
        "repetition_penalty": 1.05,
    },
    huggingfacehub_api_token="",
    )

# Instantiate `ChatHuggingFace` to apply chat templates
from langchain_community.chat_models.huggingface import ChatHuggingFace

chat_model = ChatHuggingFace(llm=llm)

                    repo_id was transferred to model_kwargs.
                    Please confirm that repo_id is what you intended.
                    task was transferred to model_kwargs.
                    Please confirm that task is what you intended.
                    huggingfacehub_api_token was transferred to model_kwargs.
                    Please confirm that huggingfacehub_api_token is what you intended.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


## Apply LangChain Chat templates via `ChatHuggingFace`

In [22]:
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="I can choose to use different tools to accomplish a variety of tasks."),
    HumanMessage(content="I have a question I need help answering:"),
]

In [23]:
# Let's see what the chat messages look like once they are formatted for the LLM call.
chat_model._to_chat_prompt(messages)

'<|system|>\nI can choose to use different tools to accomplish a variety of tasks.</s>\n<|user|>\nI have a question I need help answering:</s>\n<|assistant|>\n'

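The formatted string follows the Zephyr chat template: each message is wrapped as `<|role|>`, a newline, the content and `</s>`, and the prompt ends with an open `<|assistant|>` turn. The sketch below reproduces that layout by hand, purely for illustration. `to_zephyr_prompt` is a hypothetical helper; the real formatting comes from the chat template of the tokenizer that `ChatHuggingFace` loads, not from code like this.

```python
from typing import List, Tuple


def to_zephyr_prompt(messages: List[Tuple[str, str]]) -> str:
    """Format (role, content) pairs in the layout shown in the output above."""
    parts = [f"<|{role}|>\n{content}</s>\n" for role, content in messages]
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt_text = to_zephyr_prompt(
    [
        ("system", "I can choose to use different tools to accomplish a variety of tasks."),
        ("user", "I have a question I need help answering:"),
    ]
)
print(repr(prompt_text))
```
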
In [24]:
# Call the model
res = chat_model.invoke(messages)
print(res.content)

<|system|>
I can choose to use different tools to accomplish a variety of tasks.</s>
<|user|>
I have a question I need help answering:</s>
<|assistant|>
Please provide me with the question you need help answering, and I will do my best to provide a clear and concise response. Alternatively, you can specify the topic or context of the question to help me understand it better. Thank you!


#### NOTE:
The HuggingFace LLM setup above broke somewhere around LangChain version 0.1.0. So we'll switch to `ChatOpenAI`, since it seems to be the only chat model that works consistently.

In [25]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(
    model_name="gpt-3.5-turbo-1106",
    temperature=0.3,
    openai_api_key="",
)

## Create an Agent

**What's an agent?** In the context of LLMs and LangChain, an agent is a software entity that uses a large language model (LLM) to perform tasks and interact with users. It acts as an intermediary between the user and the LLM, deciding which actions to take and in what order. Agents can call external tools, use memory, and employ various strategies to achieve their goals.
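
As a rough illustration of that loop, the sketch below wires a scripted stand-in for the LLM to a single toy tool. Everything here (`fake_llm`, `run_agent`, the `word_count` tool, the dictionary format of the decisions) is hypothetical and far simpler than LangChain's `AgentExecutor`; it only shows the decide, act, observe cycle.

```python
from typing import Callable, Dict, List, Tuple


def word_count(text: str) -> str:
    """Toy tool: count the words in a string."""
    return str(len(text.split()))


TOOLS: Dict[str, Callable[[str], str]] = {"word_count": word_count}


def fake_llm(question: str, steps: List[Tuple[str, str, str]]) -> dict:
    """Scripted stand-in for a model: call the tool once, then answer."""
    if not steps:
        return {"action": "word_count", "action_input": question}
    _, _, observation = steps[-1]
    return {"action": "Final Answer", "action_input": f"The question has {observation} words."}


def run_agent(question: str, max_steps: int = 5) -> str:
    steps: List[Tuple[str, str, str]] = []  # (tool name, tool input, observation)
    for _ in range(max_steps):
        decision = fake_llm(question, steps)          # 1. the model picks an action
        if decision["action"] == "Final Answer":
            return decision["action_input"]
        tool = TOOLS[decision["action"]]              # 2. look up the chosen tool
        observation = tool(decision["action_input"])  # 3. run it and record the result
        steps.append((decision["action"], decision["action_input"], observation))
    return "Stopped: step limit reached."


answer = run_agent("What is OHSU?")
print(answer)
```

The step limit plays the same role as a safety cap in a real agent: it stops the loop if the model never produces a final answer.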


In [26]:
from langchain import hub
from langchain.agents import AgentExecutor, load_tools
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import (
    ReActJsonSingleInputOutputParser,
)
from langchain.tools.render import render_text_description
from langchain.utilities import SerpAPIWrapper

In [27]:
# Define the tools.
# `load_tools` only loads built-in tools, so we start with the built-in calculator (`llm-math`),
# which here uses the HuggingFaceHub `llm` defined earlier.
tools = load_tools(["llm-math"], llm=llm)

# and now we'll append our custom tools
tools = tools + [arxiv_search, google_job_search, wiki_search]

# Inspect the tools
tools

[Tool(name='Calculator', description='Useful for when you need to answer questions about math.', func=<bound method Chain.run of LLMMathChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['question'], template='Translate a math problem into a expression that can be executed using Python\'s numexpr library. Use the output of running this code to answer the question.\n\nQuestion: ${{Question with math problem.}}\n```text\n${{single line mathematical expression that solves the problem}}\n```\n...numexpr.evaluate(text)...\n```output\n${{Output of running the code}}\n```\nAnswer: ${{Answer}}\n\nBegin.\n\nQuestion: What is 37593 * 67?\n```text\n37593 * 67\n```\n...numexpr.evaluate("37593 * 67")...\n```output\n2518731\n```\nAnswer: 2518731\n\nQuestion: 37593^(1/5)\n```text\n37593**(1/5)\n```\n...numexpr.evaluate("37593**(1/5)")...\n```output\n8.222831614237718\n```\nAnswer: 8.222831614237718\n\nQuestion: {question}\n'), llm=HuggingFaceHub(client=<InferenceClient(model='HuggingFace

In [28]:
# setup ReAct style prompt
prompt = hub.pull("hwchase17/react-json")
prompt = prompt.partial(
    tools=render_text_description(tools),
    tool_names=", ".join([t.name for t in tools]),
)

# define the agent
chat_model_with_stop = chat_model.bind(stop=["\nObservation"])
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | chat_model_with_stop
    | ReActJsonSingleInputOutputParser()
)

# instantiate AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
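
The last stage of the chain turns the model's text into either a tool call or a final answer. The sketch below shows one way such parsing can work, assuming the model follows the format requested by the prompt (a fenced JSON blob with `action` and `action_input` keys, or a line starting with `Final Answer:`). `parse_react_output` is a hypothetical, simplified stand-in, not the code of `ReActJsonSingleInputOutputParser`.

```python
import json
import re
from typing import Tuple

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

# Optional "json" language tag after the opening fence, then the blob.
ACTION_RE = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def parse_react_output(text: str) -> Tuple[str, str]:
    """Return ("finish", answer) or (tool_name, tool_input)."""
    if "Final Answer:" in text:
        return "finish", text.split("Final Answer:", 1)[1].strip()
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError("no action block found in model output")
    blob = json.loads(match.group(1))
    return blob["action"], str(blob["action_input"])


model_text = (
    "Thought: I should look up OHSU first.\n"
    "Action:\n"
    + FENCE + "\n"
    + '{\n  "action": "wiki_search",\n  "action_input": "OHSU"\n}\n'
    + FENCE
)
parsed = parse_react_output(model_text)
print(parsed)
```

Output that matches neither pattern raises an error in this sketch; in the notebook, `handle_parsing_errors=True` asks the `AgentExecutor` to recover from such cases instead of failing.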

In [29]:
agent_executor.invoke(
    {
        "input": "What is OHSU? How old is it in years? At what address can I find it?"
    }
)



> Entering new AgentExecutor chain...
Question: What is OHSU? How old is it in years? At what address can I find it?

Thought: OHSU could be an organization or institution. I should start by searching for information about OHSU to understand what it is and its age. Once I have that information, I can look for its address.

Action:
```
{
  "action": "wiki_search",
  "action_input": "OHSU"
}
```[0m[36;1m[1;3mPage: Oregon Health & Science University
Summary: Oregon Health & Science University (OHSU) is a
public research university focusing primarily on health sciences with a main campus, including two hospitals, in Portland, Oregon. The institution was founded in 1887 as the University of Oregon Medical Department and later became the University of Oregon Medical School. In 1974, the campus became an independent, self-governed institution called the University of Oregon Health Sciences Center, combining state dentistry, medicine, nursing, and public health progra

{'input': 'What is OHSU? How old is it in years? At what address can I find it?',
 'output': 'Oregon Health & Science University (OHSU) is 134 years old and its main campus is located in Portland, Oregon.'}

# The Trace
The latest trace was lightning fast. It only took a single wiki search to find the answer.

You can check out the chain's trace [here](https://smith.langchain.com/public/3f7cd76d-b279-4e6b-a6ec-0efe916ab4ed/r "AgentExecutor: Full LangSmith trace").