# Building a Metaphor Data Agent

This tutorial walks through using the LLM tools provided by the [Metaphor API](https://platform.metaphor.systems/) to allow LLMs to easily search and retrieve HTML content from the Internet.

To get started, you will need an [OpenAI API key](https://platform.openai.com/account/api-keys) and a [Metaphor API key](https://dashboard.metaphor.systems/overview).

We will import the relevant agents and tools and pass them our keys here:

In [1]:
# Set up OpenAI
import openai
from llama_index.agent import OpenAIAgent


In [None]:
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
llm.complete("hello world")

In [3]:
openai.api_key = "sk-your-key"  # replace with your own OpenAI API key

In [3]:
# Set up Metaphor tool
from llama_hub.tools.metaphor.base import MetaphorToolSpec

metaphor_tool = MetaphorToolSpec(
    api_key="your-key",  # replace with your own Metaphor API key
)

metaphor_tool_list = metaphor_tool.to_tool_list()
for tool in metaphor_tool_list:
    print(tool.metadata.name)

search
retrieve_documents
search_and_retrieve_documents
find_similar
current_date
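
Keys typed directly into a notebook are easy to leak when the notebook is shared. As a minimal sketch, the keys could instead be read from environment variables. The helper `require_key` and the variable names `OPENAI_API_KEY` and `METAPHOR_API_KEY` are assumptions made for illustration; they are not defined by the Metaphor tool.

```python
import os


def require_key(name, env=None):
    """Return the value of variable `name`, or raise if it is unset or empty."""
    # `env` defaults to the real process environment; a dict can be passed for testing
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook")
    return value


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"OPENAI_API_KEY": "sk-your-key"}
openai_key = require_key("OPENAI_API_KEY", demo_env)
print(openai_key[:3])
```

In the cells above, this would replace the literal strings, for example `openai.api_key = require_key("OPENAI_API_KEY")` and `MetaphorToolSpec(api_key=require_key("METAPHOR_API_KEY"))`.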


## Testing the Metaphor tools

We've imported our OpenAI agent, set up the API key, and initialized our tool, checking the methods that it has available. Let's test out the tool before setting up our Agent.

All of the Metaphor search tools make use of the `AutoPrompt` option, in which Metaphor passes the query through an LLM to refine and improve it. The rewritten query is printed in the outputs below on the line starting with `[Metaphor Tool] Autoprompt:`.

In [3]:
metaphor_tool.search("machine learning transformers", num_results=3)

[Metaphor Tool] Autoprompt: Here's a great article about machine learning transformers:


[{'title': 'How Transformers work in deep learning and NLP: an intuitive introduction | AI Summer',
  'url': 'https://theaisummer.com/transformer/',
  'id': 'Ojj7s8CjrKEYuUnM2kkJtQ'},
 {'title': 'On the potential of Transformers in Reinforcement Learning',
  'url': 'https://lorenzopieri.com/rl_transformers/',
  'id': 'ysJlYSgeGW3l4zyOBoSGcg'},
 {'title': 'Transformers: An Overview of the Most Novel AI Architecture',
  'url': 'https://towardsdatascience.com/transformers-an-overview-of-the-most-novel-ai-architecture-cdd7961eef84?gi=61d04932798f',
  'id': 'E0CK1M6eS91nlvh6aIoQ9w'}]
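
Each search hit carries an `id`, and those ids are what `retrieve_documents` takes as input. A minimal sketch of collecting them, with the results from the output above hard-coded (titles omitted for brevity) so that it runs without an API key:

```python
# Search hits copied from the output above; only the url and id fields are kept
results = [
    {"url": "https://theaisummer.com/transformer/", "id": "Ojj7s8CjrKEYuUnM2kkJtQ"},
    {"url": "https://lorenzopieri.com/rl_transformers/", "id": "ysJlYSgeGW3l4zyOBoSGcg"},
    {
        "url": "https://towardsdatascience.com/transformers-an-overview-of-the-most-novel-ai-architecture-cdd7961eef84?gi=61d04932798f",
        "id": "E0CK1M6eS91nlvh6aIoQ9w",
    },
]

# Collect the ids in result order so they can be passed on for retrieval
ids = [hit["id"] for hit in results]
print(ids)
```

With a live tool, `metaphor_tool.retrieve_documents(ids)` would then fetch the content for all three hits in one call.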

In [None]:
metaphor_tool.retrieve_documents(["iEYMai5rS9k0hN5_BH0VZg"])

In [None]:
metaphor_tool.find_similar(
    "https://www.mihaileric.com/posts/transformers-attention-in-disguise/"
)

In [None]:
metaphor_tool.search_and_retrieve_documents(
    "This is the best explanation for machine learning transformers:", num_results=1
)

We can see we have different tools to search for results, retrieve the results, find similar results to a web page, and finally a tool that combines search and document retrieval into a single tool. We will test them out in LLM Agents below:

### Using the Search and Retrieve documents tools in an Agent

We can create an agent with access to the above tools and start testing it out:

In [4]:
# Pass the full, unwrapped Metaphor tool list to the Agent
agent = OpenAIAgent.from_tools(
    metaphor_tool_list,
    verbose=True,
)

In [5]:
agent.chat("Can you summarize the news published in the last month on superconductors")

=== Calling Function ===
Calling function: search with args: {
  "query": "news about superconductors",
  "start_published_date": "2021-10-01",
  "end_published_date": "2021-10-31"
}
[Metaphor Tool] Autoprompt: Here's the latest news about superconductors:
Got output: [{'title': 'High-temperature superconducting joints make an all-superconducting NMR magnet – Physics World', 'url': 'https://physicsworld.com/a/high-temperature-superconducting-joints-make-an-all-superconducting-nmr-magnet/', 'id': 'jABmPnMlMQO1Lrg9BDrhXw'}, {'title': 'Breakthrough or bust? Claim of room-temperature superconductivity draws fire', 'url': 'https://www.science.org/content/article/breakthrough-or-bust-claim-room-temperature-superconductivity-draws-fire', 'id': 'LXl2iWJ-DliVqVteihfZ6g'}, {'title': 'New tricks for finding better superconductive materials', 'url': 'https://phys.org/news/2021-10-superconductive-materials.html', 'id': '54HBOIJvnjg9oQFIjdIJNw'}, {'title': 'NSF Grant Funds New 40T Superconducting Ma

InvalidRequestError: [] is too short - 'messages'

In [6]:
print(agent.chat("What are the best restaurants in Toronto?"))

=== Calling Function ===
Calling function: search with args: {
  "query": "best restaurants in Toronto"
}
[Metaphor Tool] Autoprompt: Here is a great restaurant in Toronto:
Got output: [{'title': 'Via Allegro Ristorante - Toronto Fine Dining Restaurant', 'url': 'https://viaallegroristorante.com/', 'id': 'EVlexzJh-lzkVr4tb2y_qw'}, {'title': 'Location', 'url': 'https://osteriagiulia.ca/', 'id': 'HjP5c54vqb3n3UNa3HevSA'}, {'title': 'PATOIS • TORONTO', 'url': 'https://www.patoistoronto.com/', 'id': 'h3A1DO4glTdK2yw0XV_QKQ'}, {'title': 'Sophisticated Dining - Toronto Restaurant | Scaramouche', 'url': 'https://www.scaramoucherestaurant.com/', 'id': 'QWkwiSzB3JwT7gUgg2mhig'}, {'title': 'ITALIAN, INSPIRED.', 'url': 'https://figotoronto.com/', 'id': 'OvBcTqEo1tCSywr4ATptCg'}, {'title': 'Oretta | Toronto, ON', 'url': 'https://www.oretta.to/', 'id': 'UnYBUVwHS_KvSryHEfrp4w'}, {'title': 'Select A Restaurant', 'url': 'https://www.torontopho.com/', 'id': 'DiQ1hU1gmrIzpKnOaVvZmw'}, {'title': 'La Feni

In [7]:
print(agent.chat("tell me more about Osteria Giulia"))

=== Calling Function ===
Calling function: find_similar with args: {
  "url": "https://osteriagiulia.ca/",
  "num_results": 1
}
Got output: [{'title': 'Location', 'url': 'https://osteriagiulia.ca/', 'id': 'HjP5c54vqb3n3UNa3HevSA'}]
=== Calling Function ===
Calling function: retrieve_documents with args: {
  "ids": ["HjP5c54vqb3n3UNa3HevSA"]
}
Got output: [Document(id_='91990ba3-c091-4830-adc8-2f9b4d785675', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='f5d55bc509de58dc2ac7bf91e54d8054c59df2b20337e73d78859256736f393f', text='<div><div><li><div>\n<div><div><p></p><h3>Location</h3><p></p></div><div><p></p><h3>134 AVENUE ROAD</h3><p></p></div><div><p></p><h3>Toronto, ON</h3><p></p></div><div><p></p><h2><a href="tel:416.964.8686">416.964.8686</a></h2><p></p></div><div><p></p><h2><a href="mailto:info@osteriagiulia.ca">info@osteriagiulia.ca</a></h2><p></p></div><div><p></p><h2>(for general inquires only...<br />\nno reserv
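
The retrieved `text` is an HTML extract, so most of its characters are markup. If plain text is preferable, a minimal sketch using only the standard library's `html.parser` could look like the following. The class `TextExtractor`, the function `html_to_text`, and the shortened sample string are illustrative assumptions, not something the Metaphor tool provides.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text, trimmed of surrounding whitespace
        data = data.strip()
        if data:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text content of `html`, with text nodes joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Shortened fragment in the style of the document text shown above
sample = "<div><p></p><h3>Location</h3><p></p></div><div><p></p><h3>134 AVENUE ROAD</h3><p></p></div>"
text = html_to_text(sample)
print(text)
```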

## Avoiding Context Window Issues

The above example shows the core uses of the Metaphor tool. We can easily retrieve a clean list of links related to a query, and then fetch the content of an article as a cleaned-up HTML extract. Alternatively, the `search_and_retrieve_documents` tool directly returns the documents from our search result.

The article content is long relative to current LLM context windows. To retrieve and summarize many documents, we will use another LlamaIndex tool, `LoadAndSearchToolSpec`, which loads text into a vector store and queries it for retrieval. This is where the `search_and_retrieve_documents` tool becomes particularly useful: the Agent can make a single query to load a large number of documents while using very few tokens, and then make follow-up queries to retrieve specific information from those documents.

In [8]:
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The search_and_retrieve_documents tool is the third in the tool list, as seen above
wrapped_retrieve = LoadAndSearchToolSpec.from_defaults(
    metaphor_tool_list[2],
)

Our wrapped retrieval tools split loading and reading into separate interfaces. We use `load` to load the documents into the vector store, and `read` to query the vector store. Let's try it out again:

In [9]:
wrapped_retrieve.load("This is the best explanation for machine learning transformers:")
print(wrapped_retrieve.read("what is a transformer"))
print(wrapped_retrieve.read("who wrote the first paper on transformers"))

[Metaphor Tool] Autoprompt: Here is a great explanation of machine learning transformers:
A transformer is an architecture that uses attention to significantly improve the performance of deep learning NLP translation models. It is designed to handle text data that is inherently sequential. Transformers take a text sequence as input and produce another text sequence as output, such as translating an input English sentence to Spanish. Transformers have become popular in the field of natural language processing and have been used in various projects, including Google's BERT and OpenAI's GPT series.
The first paper on transformers was written by the authors of "Attention is all you need".
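
To make the load/read split concrete, here is a toy sketch of the pattern. It is not LlamaIndex's implementation: the real tool embeds documents into a vector index, while this stand-in ranks stored documents by simple word overlap. The names `ToyLoadAndRead` and `fake_search` are hypothetical, and the documents are hard-coded.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyLoadAndRead:
    """Hypothetical stand-in for the load/read split; not LlamaIndex code."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable returning a list of document strings
        self._docs = []

    def load(self, query):
        """Fetch documents once and store them; only a short status is returned."""
        self._docs.extend(self._fetch(query))
        return f"Loaded {len(self._docs)} documents"

    def read(self, question, top_k=1):
        """Return the top_k stored documents sharing the most words with the question."""
        q = Counter(tokenize(question))
        ranked = sorted(
            self._docs,
            key=lambda d: sum((Counter(tokenize(d)) & q).values()),
            reverse=True,
        )
        return ranked[:top_k]


def fake_search(query):
    # Hard-coded documents standing in for a real search_and_retrieve_documents call
    return [
        "A transformer is an architecture that uses attention.",
        "Superconductors conduct electric current without resistance.",
    ]


store = ToyLoadAndRead(fake_search)
status = store.load("machine learning transformers")
answer = store.read("what is a transformer")
print(status)
print(answer)
```

The point of the design is that `load` hands the caller only a short status string, so the full documents never enter the LLM's context; only the passages selected by `read` do.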


## Creating the Agent

We are now ready to create an Agent that can use Metaphor's services to their full potential. We will use our wrapped read and load tools, as well as the `current_date` utility, for the following agent and test it out below:

In [14]:
# Just pass the wrapped tools and the current_date utility (fifth in the tool list)
agent = OpenAIAgent.from_tools(
    [*wrapped_retrieve.to_tool_list(), metaphor_tool_list[4]],
    verbose=True,
)

In [15]:
response = agent.chat(
    "Can you summarize everything published in the last month regarding news on superconductors"
)
print(response)

=== Calling Function ===
Calling function: current_date with args: {}
Got output: 2023-10-01
=== Calling Function ===
Calling function: search_and_retrieve_documents with args: {
  "query": "superconductors",
  "start_published_date": "2023-09-01",
  "end_published_date": "2023-09-30"
}
[Metaphor Tool] Autoprompt: "Here is an interesting article about superconductors:
Got output: Content loaded! You can now search the information using read_search_and_retrieve_documents
=== Calling Function ===
Calling function: read_search_and_retrieve_documents with args: {
  "query": "superconductors"
}
Got output: 
Superconductors are materials that can conduct an electric current without any resistance, meaning that no energy is lost through heat. Superconductors have been known about for more than 100 years, but previous ones have worked only at extremely low temperatures or when under very high pressures.
Here is a summary of the news published in the last month regarding superconductors:

1. Su

We asked the agent to retrieve documents related to superconductors from the last month. It used the `current_date` tool to determine the current date, and then applied Metaphor's publication-date filters when calling `search_and_retrieve_documents`, which loaded the documents into the vector store. It then read them using `read_search_and_retrieve_documents`.
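
In the run above the LLM worked out the date range on its own from the `current_date` output. For reference, the same "previous calendar month" range can be computed deterministically; the function `previous_month_range` below is a hypothetical helper written for this illustration, not part of the Metaphor tool.

```python
from datetime import date, timedelta


def previous_month_range(today):
    """Return ISO strings for the first and last day of the month before `today`."""
    first_of_this_month = today.replace(day=1)
    # The day before the first of this month is the last day of the previous month
    last_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev.isoformat(), last_of_prev.isoformat()


# Same current date as in the agent run above
start, end = previous_month_range(date(2023, 10, 1))
print(start, end)
```

The two values correspond to the `start_published_date` and `end_published_date` arguments in the function call shown in the output.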

We can make another query to the vector store to read from it again, now that the articles are loaded: