# Building a Metaphor Data Agent

This tutorial walks through using the LLM tools provided by the [Metaphor API](https://platform.metaphor.systems/) to allow LLMs to easily search and retrieve HTML content from the Internet.

To get started, you will need an [OpenAI API key](https://platform.openai.com/account/api-keys) and a [Metaphor API key](https://dashboard.metaphor.systems/overview).

We will import the relevant agents and tools and pass them our keys here:

In [11]:
# Set up OpenAI
import openai
from llama_index.agent import OpenAIAgent
openai.api_key = 'sk-your-key'

# Set up Metaphor tool
from llama_hub.tools.metaphor.base import MetaphorToolSpec
metaphor_tool = MetaphorToolSpec(
    api_key='your-key',
)

metaphor_tool_list = metaphor_tool.to_tool_list()
for tool in metaphor_tool_list:
    print(tool.metadata.name)

search
retrieve_documents
current_date
find_similar


## Testing the Metaphor tools

We've imported our OpenAI agent, set up the API key, and initialized our tool, checking the methods that it has available. Let's test out the tool before setting up our Agent:

In [2]:
metaphor_tool.search('This is the best explanation for machine learning transformers:', num_results=3)

[{'title': 'Illustrated Guide to Transformers- Step by Step Explanation',
  'url': 'https://towardsdatascience.com/illustrated-guide-to-transformers-step-by-step-explanation-f74876522bc0?gi=8fe76db5c4d9',
  'id': 'czTCPZ1vqo-f92WQwKvRig'},
 {'title': 'Transformers, explained: Understand the model behind GPT, BERT, and T5',
  'url': 'https://www.youtube.com/watch?v=SZorAJ4I-sA&feature=youtu.be',
  'id': 'VkJuLxzjSgX8-OjN2Huwqg'},
 {'title': 'A Deep Dive Into the Transformer Architecture — The Development of Transformer Models',
  'url': 'https://towardsdatascience.com/a-deep-dive-into-the-transformer-architecture-the-development-of-transformer-models-acbdf7ca34e0?gi=f20b470a0326',
  'id': 'UssKoV3Wje2RyicVGcbwuQ'}]

In [3]:
metaphor_tool.retrieve_documents(['iEYMai5rS9k0hN5_BH0VZg'])

[Document(id_='d22776c9-bf0d-4642-911c-55e03a0127fd', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='bbd2ad7f78cf02100add6077decdf885ffebfe37743bfc13e80e61b5b833e46c', text='<div><div><p>\n <a href="/static/transformers-bumblebee-d4ec62a6a46aae8f83b7419242f59f4e-48854.jpg">\n \n \n \n \n \n \n \n </a>\n </p>\n<p>In this post, we will be describing a class of sequence processing models known as Transformers (…<em>robots in disguise</em>).\nJokes aside, Transformers came out on the scene not too long ago and have rocked the natural language processing community because of\ntheir pitch: state-of-the-art and efficient sequence processing <strong>without recurrent units or convolution</strong>. </p>\n<p><em>“No recurrent units or convolution?! What are these models even made of?!”</em>, you may be exclaiming to unsuspecting strangers on the streets. </p>\n<p>Not much it turns out, other than a bunch of attention and feedf

In [4]:
metaphor_tool.find_similar('https://www.mihaileric.com/posts/transformers-attention-in-disguise/')

[{'title': 'Transformers: Attention in Disguise',
  'url': 'https://www.mihaileric.com/posts/transformers-attention-in-disguise/',
  'id': 'iEYMai5rS9k0hN5_BH0VZg'},
 {'title': 'Transformers: a Primer',
  'url': 'http://www.columbia.edu/~jsl2239/transformers.html',
  'id': 'cXbT9IsR5u8NtLTAIcUWOA'},
 {'title': 'Illustrated Guide to Transformers- Step by Step Explanation',
  'url': 'https://towardsdatascience.com/illustrated-guide-to-transformers-step-by-step-explanation-f74876522bc0?gi=8fe76db5c4d9',
  'id': 'czTCPZ1vqo-f92WQwKvRig'}]

## Avoiding Context Window Problems

The above examples show the main uses of the Metaphor tool. We can easily retrieve a clean list of links related to a query, and then fetch the content of each article as a cleaned-up HTML extract.
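The two steps described above can be chained by passing the `id` values from a result list to `retrieve_documents`. The following is a minimal sketch that uses a hard-coded copy of two results from the outputs above instead of a live API call. The helper `collect_ids` is a hypothetical name introduced here for illustration and is not part of the Metaphor tool.

```python
# Search results have the shape shown in the outputs above:
# a list of dicts with 'title', 'url', and 'id' keys.
search_results = [
    {
        "title": "Transformers: Attention in Disguise",
        "url": "https://www.mihaileric.com/posts/transformers-attention-in-disguise/",
        "id": "iEYMai5rS9k0hN5_BH0VZg",
    },
    {
        "title": "Transformers: a Primer",
        "url": "http://www.columbia.edu/~jsl2239/transformers.html",
        "id": "cXbT9IsR5u8NtLTAIcUWOA",
    },
]


def collect_ids(results):
    """Return the 'id' of each result, in order, skipping entries without one."""
    return [r["id"] for r in results if "id" in r]


document_ids = collect_ids(search_results)
print(document_ids)

# With a configured tool, the ids could then be passed on (not run here):
# documents = metaphor_tool.retrieve_documents(document_ids)
```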

The content of the articles is long relative to current LLM context windows. To allow retrieval and summarization of many documents, we will use another tool from LlamaIndex that lets us load text into a vector store and then query it for retrieval.

In [5]:
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The retrieve documents tool is the second in the tool list, as seen above
wrapped_retrieve = LoadAndSearchToolSpec.from_defaults(
    metaphor_tool_list[1],
)

Our wrapped retrieval tools separate loading and reading into separate interfaces. We use `load` to load the documents into the vector store, and `read` to query the vector store. Let's try it out again:

In [6]:
wrapped_retrieve.load(['iEYMai5rS9k0hN5_BH0VZg'])
wrapped_retrieve.read('what is a transformer')

'\nA Transformer is a class of sequence processing models that use attention and feedforward operations to achieve state-of-the-art and efficient sequence processing without recurrent units or convolution.'
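To make the load/read split more concrete, here is a toy sketch of the same pattern in plain Python. It is not the LlamaIndex implementation: the class `ToyLoadAndSearch` and the function `fake_loader` are hypothetical, and simple word overlap stands in for the vector store query. The point it illustrates is that `load` returns only a short status string, so the full article text never has to pass through the LLM's context window.

```python
import re


class ToyLoadAndSearch:
    """Hypothetical stand-in for the load/read pattern; not the LlamaIndex class."""

    def __init__(self, loader):
        self._loader = loader  # function: list of ids -> list of text strings
        self._chunks = []      # the "store": plain text chunks

    def load(self, ids):
        """Fetch documents and store them; return only a short status string."""
        for text in self._loader(ids):
            # Split on blank lines so each paragraph is stored separately.
            self._chunks.extend(p.strip() for p in text.split("\n\n") if p.strip())
        return f"Loaded {len(self._chunks)} chunks"

    def read(self, query, top_k=1):
        """Return the stored chunks sharing the most words with the query."""
        words = set(re.findall(r"\w+", query.lower()))
        ranked = sorted(
            self._chunks,
            key=lambda c: len(words & set(re.findall(r"\w+", c.lower()))),
            reverse=True,
        )
        return ranked[:top_k]


def fake_loader(ids):
    # Hypothetical loader returning canned text instead of calling the API.
    return ["Transformers use attention.\n\nBumblebee is a yellow robot."]


toy = ToyLoadAndSearch(fake_loader)
status = toy.load(["some-id"])
answer = toy.read("how does attention work")
print(status)
print(answer)
```

The wrapped tool used in this tutorial answers with a synthesized summary rather than a raw chunk, as the output above shows; the toy version only returns the best-matching stored paragraph.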

## Creating the Agent

We now have a total of five tools that we will make available to the Agent, allowing it to use Metaphor's services to their full potential:

`search`: To search for a list of relevant articles

`retrieve_documents`: To load the articles into a vector store

`read_retrieve_documents`: To query the articles loaded into the vector store

`find_similar`: To find articles similar to a given URL

`current_date`: A simple utility method that gives the Agent today's date, for better use of date-based filtering


We will pass all of these functions into our Agent, and test it out below:

In [7]:

# We don't give the Agent our unwrapped retrieve document tools, instead passing the wrapped tools
agent = OpenAIAgent.from_tools(
    [metaphor_tool_list[0], *wrapped_retrieve.to_tool_list(), metaphor_tool_list[2], metaphor_tool_list[3]],
    verbose=True,
)
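The cell above selects tools by their position in `metaphor_tool_list`, which depends on the order printed earlier. As a hedged alternative, the tools could be looked up by name instead. The sketch below uses stand-in objects that only mimic the `tool.metadata.name` attribute seen earlier, so it runs without API keys; `tools_by_name` is a hypothetical helper, not part of LlamaIndex.

```python
from types import SimpleNamespace


def tools_by_name(tools):
    """Map each tool's metadata.name to the tool object."""
    return {tool.metadata.name: tool for tool in tools}


# Stand-in objects mimicking the `tool.metadata.name` attribute used earlier.
fake_tool_list = [
    SimpleNamespace(metadata=SimpleNamespace(name=name))
    for name in ["search", "retrieve_documents", "current_date", "find_similar"]
]

lookup = tools_by_name(fake_tool_list)
selected = [lookup["search"], lookup["current_date"], lookup["find_similar"]]
print([tool.metadata.name for tool in selected])
```

With the real tool list, the same lookup would be built from `metaphor_tool_list`, and the selection would keep working if the order of the tools changed.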


In [8]:
print(agent.chat('Can you summarize all of the news published in the last month on superconductors'))

=== Calling Function ===
Calling function: current_date with args: {}
Got output: 2023-08-16
=== Calling Function ===
Calling function: search with args: {
  "query": "news about superconductors",
  "start_published_date": "2023-07-16",
  "end_published_date": "2023-08-16"
}
Got output: [{'title': 'Get ready for the age of the superconductor (maybe)', 'url': 'https://newatlas.com/science/get-ready-for-the-age-of-the-superconductor-maybe/', 'id': '1-PPJlmzmhdzV8cjvGftLw'}, {'title': 'After LK-99 letdown, scientists still hopeful to find ‘Holy Grail’ superconductor - WTOP News', 'url': 'https://wtop.com/tech/2023/08/after-lk-99-letdown-scientists-still-hopeful-to-find-holy-grail-superconductor/', 'id': '93B_9_V-66be9IAhiejhAQ'}, {'title': 'Team creates simple superconducting device that could dramatically cut energy use in computing', 'url': 'https://phys.org/news/2023-07-team-simple-superconducting-device-energy.html', 'id': 'rdfzvO08RiLEF1O-ihfOmQ'}, {'title': 'Korean team claims to ha

We asked the agent to summarize news about superconductors published in the last month. It used the `current_date` tool to determine today's date, and then applied Metaphor's publication date filters when calling `search`. It then loaded the documents using `retrieve_documents` and read them using `read_retrieve_documents`.
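In the trace above the Agent chose the date range itself. For readers who want to build the same window by hand, here is a small sketch of the arithmetic. The function name `one_month_window` is hypothetical, and clamping the day for shorter months is an assumption of this sketch rather than something the Agent is known to do.

```python
import calendar
from datetime import date


def one_month_window(today):
    """Return (start, end) ISO date strings covering the month before `today`."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    # Clamp the day so that e.g. March 31 maps to the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    start = date(year, month, min(today.day, last_day))
    return start.isoformat(), today.isoformat()


start, end = one_month_window(date(2023, 8, 16))
print(start, end)  # 2023-07-16 2023-08-16
```

These two strings match the `start_published_date` and `end_published_date` arguments that appear in the `search` call in the trace.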

We can make another query to the vector store to read from it again, now that the articles are loaded:

In [9]:
print(agent.chat('whats the process to make the new superconductor?'))

=== Calling Function ===
Calling function: read_retrieve_documents with args: {
  "query": "process to make new superconductor"
}
Got output: 
The process to make the new superconductor involves substituting a fraction of the lead in the material with copper ions, resulting in tiny structural distortions. This deformation leads to the creation of superconducting quantum wells, which are key to achieving superconductivity. The material is then unable to relax and lose its superconductivity. The process may also involve using a laser, a diamond anvil, and a lot of pressure.
The process to make the new superconductor involves substituting a fraction of the lead in the material with copper ions, which leads to tiny structural distortions. These distortions create superconducting quantum wells, which are essential for achieving superconductivity. The material is intentionally prevented from relaxing and losing its superconductivity. Additionally, the process may involve the use of a laser, 