# Building a Metaphor Data Agent

This tutorial walks through using the LLM tools provided by the [Metaphor API](https://platform.metaphor.systems/) to allow LLMs to easily search and retrieve HTML content from the Internet.

To get started, you will need an [OpenAI API key](https://platform.openai.com/account/api-keys) and a [Metaphor API key](https://dashboard.metaphor.systems/overview).

We will import the relevant agents and tools and pass them our keys here:

In [1]:
# Set up OpenAI
import openai
from llama_index.agent import OpenAIAgent
openai.api_key = 'sk-your-key'

# Set up Metaphor tool
from llama_hub.tools.metaphor.base import MetaphorToolSpec
metaphor_tool = MetaphorToolSpec(
    api_key='your-key',
)

metaphor_tool_list = metaphor_tool.to_tool_list()
for tool in metaphor_tool_list:
    print(tool.metadata.name)

metaphor_search
retrieve_documents
current_date
find_similar


## Testing the Metaphor tools

We've imported our OpenAI agent, set up the API key, initialized our tool, and listed the methods it makes available. Let's test out the tool before setting up our Agent:

In [2]:
metaphor_tool.metaphor_search('This is the best explanation for machine learning transformers:', num_results=3)

[{'title': 'Transformers: Attention in Disguise',
  'url': 'https://www.mihaileric.com/posts/transformers-attention-in-disguise/',
  'id': 'iEYMai5rS9k0hN5_BH0VZg'},
 {'title': 'An Analogy for Understanding Transformers - LessWrong',
  'url': 'https://www.lesswrong.com/posts/euam65XjigaCJQkcN/an-analogy-for-understanding-transformers',
  'id': 'zKjHcPuGa9wuBT_EVDLrJg'},
 {'title': 'A Conceptual Guide to Transformers: Part I',
  'url': 'https://benlevinstein.substack.com/p/a-conceptual-guide-to-transformers',
  'id': 'hyRl0cnGaBk6RgIMpWMMew'}]
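The `id` field of each search result is the value that `retrieve_documents` takes as input. As a minimal sketch, the IDs can be collected from the result list like this. The data is copied from the output above, and the variable names `search_results` and `document_ids` are illustrative only:

```python
# Search results copied from the output above (titles and URLs shortened out).
search_results = [
    {"title": "Transformers: Attention in Disguise", "id": "iEYMai5rS9k0hN5_BH0VZg"},
    {"title": "An Analogy for Understanding Transformers - LessWrong", "id": "zKjHcPuGa9wuBT_EVDLrJg"},
    {"title": "A Conceptual Guide to Transformers: Part I", "id": "hyRl0cnGaBk6RgIMpWMMew"},
]

# Collect the IDs in result order; this list has the shape passed to
# retrieve_documents in the next cell.
document_ids = [result["id"] for result in search_results]
print(document_ids)
```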

In [3]:
metaphor_tool.retrieve_documents(['iEYMai5rS9k0hN5_BH0VZg'])

[Document(id_='551c0613-124b-45d8-985f-02b7c7a9e634', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='bbd2ad7f78cf02100add6077decdf885ffebfe37743bfc13e80e61b5b833e46c', text='<div><div><p>\n <a href="/static/transformers-bumblebee-d4ec62a6a46aae8f83b7419242f59f4e-48854.jpg">\n \n \n \n \n \n \n \n </a>\n </p>\n<p>In this post, we will be describing a class of sequence processing models known as Transformers (…<em>robots in disguise</em>).\nJokes aside, Transformers came out on the scene not too long ago and have rocked the natural language processing community because of\ntheir pitch: state-of-the-art and efficient sequence processing <strong>without recurrent units or convolution</strong>. </p>\n<p><em>“No recurrent units or convolution?! What are these models even made of?!”</em>, you may be exclaiming to unsuspecting strangers on the streets. </p>\n<p>Not much it turns out, other than a bunch of attention and feedf

In [4]:
metaphor_tool.find_similar('https://www.mihaileric.com/posts/transformers-attention-in-disguise/')

[{'title': 'Transformers: Attention in Disguise',
  'url': 'https://www.mihaileric.com/posts/transformers-attention-in-disguise/',
  'id': 'iEYMai5rS9k0hN5_BH0VZg'},
 {'title': 'Transformers: a Primer',
  'url': 'http://www.columbia.edu/~jsl2239/transformers.html',
  'id': 'cXbT9IsR5u8NtLTAIcUWOA'},
 {'title': 'Illustrated Guide to Transformers- Step by Step Explanation',
  'url': 'https://towardsdatascience.com/illustrated-guide-to-transformers-step-by-step-explanation-f74876522bc0?gi=8fe76db5c4d9',
  'id': 'czTCPZ1vqo-f92WQwKvRig'}]

## Avoiding Context Window Problems

The examples above show the main uses of the Metaphor tool: we can retrieve a clean list of links related to a query, fetch the content of an article as a cleaned-up HTML extract, and find pages similar to a given URL.

The content of the articles is long relative to current LLM context windows. To allow retrieval and summarization of many documents, we will use another LlamaIndex tool that loads text into a vector store and queries it for retrieval.
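To get a rough feel for the problem, the sketch below estimates whether a set of documents would fit in a prompt. Everything in it is an assumption made for illustration: the helper names, the 4-characters-per-token rule of thumb, and the 4096-token window with 512 tokens reserved for the answer. A real tokenizer and your model's actual limit will give different numbers.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; assumes about 4 characters per token."""
    return int(len(text) / chars_per_token)


def fits_in_context(texts, context_window=4096, reserved_for_answer=512):
    """Return True if the estimated tokens of all texts fit in the prompt budget.

    context_window and reserved_for_answer are assumed values, not model facts.
    """
    budget = context_window - reserved_for_answer
    return sum(estimate_tokens(text) for text in texts) <= budget


# Three documents of about 4,000 characters each fit the assumed budget,
# five do not.
print(fits_in_context(["a" * 4000] * 3))
print(fits_in_context(["a" * 4000] * 5))
```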

In [5]:
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The retrieve documents tool is the second in the tool list, as seen above
wrapped_retrieve = LoadAndSearchToolSpec.from_defaults(
    metaphor_tool_list[1],
)

Our wrapped retrieval tool separates loading and reading into separate interfaces. We use `load` to load the documents into the vector store, and `read` to query the vector store. Let's try it out again:

In [6]:
wrapped_retrieve.load(['iEYMai5rS9k0hN5_BH0VZg'])
wrapped_retrieve.read('what is a transformer')

'\nA Transformer is a class of sequence processing models that use attention and feedforward operations to achieve state-of-the-art and efficient sequence processing without recurrent units or convolution.'
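The benefit of the load/read split is that the full document text never has to pass through the LLM: `load` returns only a short acknowledgement, and `read` returns only the part relevant to a query. The toy class below illustrates that pattern. It is a hypothetical stand-in, not how `LoadAndSearchToolSpec` is implemented: it keeps texts in a list and ranks them by word overlap instead of using embeddings and a vector store.

```python
from collections import Counter


class ToyLoadAndSearch:
    """Hypothetical illustration of the load/read split (not LlamaIndex code)."""

    def __init__(self):
        self._texts = []

    def load(self, texts):
        # Store the texts and return only a short acknowledgement,
        # so the caller never receives the full content.
        self._texts.extend(texts)
        return f"Loaded {len(texts)} text(s)."

    def read(self, query):
        # Return the stored text sharing the most words with the query.
        query_words = Counter(query.lower().split())

        def overlap(text):
            return sum((Counter(text.lower().split()) & query_words).values())

        return max(self._texts, key=overlap) if self._texts else ""


store = ToyLoadAndSearch()
print(store.load([
    "Transformers use attention layers",
    "Superconductors carry current without resistance",
]))
print(store.read("what is attention"))
```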

## Creating the Agent

We now have a total of five tools that we will make available to the Agent, allowing it to use Metaphor's services to their full potential:

`metaphor_search`: To search for a list of relevant articles

`retrieve_documents`: To load the articles into a vector store

`read_retrieve_documents`: To query the articles loaded into the vector store

`find_similar`: To find articles similar to a given URL

`current_date`: A simple utility method that gives the Agent today's date, for better use of date-based filtering


We will pass all of these functions into our Agent, and test it out below:

In [7]:

# We don't give the Agent our unwrapped retrieve document tools, instead passing the wrapped tools
agent = OpenAIAgent.from_tools(
    [metaphor_tool_list[0], *wrapped_retrieve.to_tool_list(), metaphor_tool_list[2], metaphor_tool_list[3]],
    verbose=True,
)
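Selecting tools by list position depends on the order in which `to_tool_list()` returns them. As a hedged alternative, the tools can be looked up by the `metadata.name` printed earlier. The sketch below uses `SimpleNamespace` stand-ins so it runs without API keys; in the notebook you would build the dictionary from `metaphor_tool_list` instead. The names `tools_by_name` and `selected` are illustrative.

```python
from types import SimpleNamespace

# Stand-ins for the real tool objects; only .metadata.name is modelled here.
tool_names = ["metaphor_search", "retrieve_documents", "current_date", "find_similar"]
stand_in_tools = [
    SimpleNamespace(metadata=SimpleNamespace(name=name)) for name in tool_names
]

# Index the tools by name so the selection no longer depends on list order.
tools_by_name = {tool.metadata.name: tool for tool in stand_in_tools}

# Pick every tool except the unwrapped retrieve_documents.
selected = [
    tools_by_name[name]
    for name in ("metaphor_search", "current_date", "find_similar")
]
print([tool.metadata.name for tool in selected])
```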


In [8]:
print(agent.chat('Can you summarize the news published in the last month on superconductors'))

=== Calling Function ===
Calling function: current_date with args: {}
Got output: 2023-08-16
=== Calling Function ===
Calling function: metaphor_search with args: {
  "query": "news about superconductors",
  "num_results": 5,
  "start_published_date": "2023-07-16",
  "end_published_date": "2023-08-16"
}
Got output: [{'title': 'Why a "room-temperature superconductor" would be a huge deal', 'url': 'https://www.vox.com/future-perfect/23816753/superconductor-room-temperature-lk99-quantum-fusion', 'id': 'cD13waRIIwL6_S4SlIRHjQ'}, {'title': 'Korean team claims to have created the first room-temperature, ambient-pressure superconductor', 'url': 'https://phys.org/news/2023-07-korean-team-room-temperature-ambient-pressure-superconductor.html', 'id': 'jXsm1TpKVy9zg5kH3bRuag'}, {'title': "Room-temperature superconductors: Here's everything you need to know", 'url': 'https://www.newscientist.com/article/2385270-room-temperature-superconductors-heres-everything-you-need-to-know/?utm_campaign=RSS%7C

We asked the agent to summarize news about superconductors published in the last month. It used the `current_date` tool to determine the current date, and then applied Metaphor's publication-date filters when calling `metaphor_search`. It then loaded the documents using `retrieve_documents` and read them using `read_retrieve_documents`.
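The date window in the call above runs from 2023-07-16 to 2023-08-16. The agent derived it on its own, so the following is only one way to reproduce such a window in code: step back one calendar month and clamp the day to the length of the target month. The function name `one_month_window` is hypothetical.

```python
import calendar
from datetime import date


def one_month_window(today):
    """Return (start, end) ISO date strings spanning the month before `today`."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    # Clamp the day so that, for example, March 31 maps to the end of February.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day).isoformat(), today.isoformat()


start, end = one_month_window(date(2023, 8, 16))
print(start, end)
```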

We can make another query to the vector store to read from it again, now that the articles are loaded:

In [9]:
print(agent.chat('whats the process to make the new superconductor?'))

=== Calling Function ===
Calling function: find_similar with args: {
  "url": "https://phys.org/news/2023-07-korean-team-room-temperature-ambient-pressure-superconductor.html",
  "num_results": 1
}
Got output: [{'title': 'Korean team claims to have created the first room-temperature, ambient-pressure superconductor', 'url': 'https://phys.org/news/2023-07-korean-team-room-temperature-ambient-pressure-superconductor.html', 'id': 'jXsm1TpKVy9zg5kH3bRuag'}]
=== Calling Function ===
Calling function: retrieve_documents with args: {
  "ids": ["jXsm1TpKVy9zg5kH3bRuag"]
}
Got output: Content loaded! You can now search the information using read_retrieve_documents
=== Calling Function ===
Calling function: read_retrieve_documents with args: {
  "query": "process to make the new superconductor"
}
Got output: 
The process to make the new superconductor, LK-99, involves mixing powdered compounds containing lead, oxygen, sulphur and phosphorus, then heating them at a high temperature for several ho