# Tools: Retrieval

In [1]:
from openai import OpenAI
import json
from dotenv import load_dotenv
import time

In [2]:
load_dotenv()

True

In [3]:
client = OpenAI()

In [4]:
def show_json(obj):
    # Pretty-print an API response object; display() is provided by Jupyter/IPython.
    display(json.loads(obj.model_dump_json()))

**Create an assistant with 'retrieval' enabled**

In [5]:
assistant = client.beta.assistants.create(
  instructions="You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
  model="gpt-4-1106-preview",
  tools=[{"type": "retrieval"}]
)

**Upload the 'knowledge' file that will be used for retrieval (set its purpose to 'assistants')**

The file I used for this demo is the paper that started it all for LLMs: "Attention Is All You Need" by Vaswani et al.
You can find it at https://arxiv.org/pdf/1706.03762.pdf

In [6]:
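# Open the PDF in a context manager so the file handle is closed after the upload.
with open("attentionisallyouneed.pdf", "rb") as pdf:
  file = client.files.create(
    file=pdf,
    purpose='assistants'
  )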

**Add the file to the Assistant**

In [7]:
assistant = client.beta.assistants.update(
  assistant.id,
  tools=[{"type": "retrieval"}],
  file_ids=[file.id]
)

**Create a Thread, specify the Message, add it to the Thread, and Run**

In [8]:
thread = client.beta.threads.create()

In [9]:
user_message = "What are the applications of Attention in our Model?"

In [10]:
message = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content=user_message,
  file_ids=[file.id]
)

In [11]:
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

In [12]:
def wait_on_run(run, thread):
    # Poll until the run leaves the "queued" / "in_progress" states.
    while run.status in ("queued", "in_progress"):
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id,
        )
        time.sleep(0.5)  # short pause between polls
    return run
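
The loop above keeps polling for as long as the run reports "queued" or "in_progress", so a run that never leaves those states would block the notebook. Below is a minimal sketch of the same polling pattern with a deadline. The `poll_until_done` helper and its `fetch` argument are hypothetical and not part of the OpenAI SDK. `fetch` stands in for a call such as `client.beta.threads.runs.retrieve(...)`, and a fake is used here so the example runs offline.

```python
import time
from types import SimpleNamespace

PENDING = ("queued", "in_progress")  # the statuses the notebook's loop waits on


def poll_until_done(fetch, interval=0.5, timeout=60.0):
    """Call fetch() until its result leaves the pending statuses.

    fetch is a hypothetical zero-argument callable returning an object
    with a .status attribute. Raises TimeoutError once timeout seconds pass.
    """
    deadline = time.monotonic() + timeout
    run = fetch()
    while run.status in PENDING:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run.status!r} after {timeout} s")
        time.sleep(interval)
        run = fetch()
    return run


# Stand-in for the API: reports "queued", then "in_progress", then "completed".
_statuses = iter(["queued", "in_progress", "completed"])


def fake_fetch():
    return SimpleNamespace(status=next(_statuses))


final = poll_until_done(fake_fetch, interval=0.01, timeout=1.0)
print(final.status)
```

With the real client, `fetch` could be a lambda that wraps the retrieve call for a given thread and run; that wiring is left out here because it needs network access.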

In [13]:
run = wait_on_run(run, thread)
show_json(run)

{'id': 'run_PwwfGdUS1eg2JsDzW0B3BJM0',
 'assistant_id': 'asst_KMgcINhW2I784lTr0JDLINog',
 'cancelled_at': None,
 'completed_at': 1703116088,
 'created_at': 1703116066,
 'expires_at': None,
 'failed_at': None,
 'file_ids': ['file-1wLy8MwPEI56OxhHU63VYvcr'],
 'instructions': 'You are a customer support chatbot. Use your knowledge base to best respond to customer queries.',
 'last_error': None,
 'metadata': {},
 'model': 'gpt-4-1106-preview',
 'object': 'thread.run',
 'required_action': None,
 'started_at': 1703116068,
 'status': 'completed',
 'thread_id': 'thread_jNSq0UHUFblEJnOP8bfttcOX',
 'tools': [{'type': 'retrieval'}]}
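
The `created_at`, `started_at`, and `completed_at` fields in this output are Unix timestamps in seconds, so the run's timing can be read off by subtraction. A small sketch using the values printed above (the variable names are illustrative only):

```python
from datetime import datetime, timezone

# Values copied from the run output above (Unix epoch seconds).
created_at = 1703116066
started_at = 1703116068
completed_at = 1703116088

queue_seconds = started_at - created_at    # time spent waiting to start
run_seconds = completed_at - started_at    # time spent executing
total_seconds = completed_at - created_at  # end-to-end latency
print(queue_seconds, run_seconds, total_seconds)  # 2 20 22

# Convert a timestamp to a timezone-aware datetime for display.
completed_utc = datetime.fromtimestamp(completed_at, tz=timezone.utc)
print(completed_utc.isoformat())
```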

In [14]:
messages = client.beta.threads.messages.list(thread_id=thread.id, order="desc")
show_json(messages)

{'data': [{'id': 'msg_SHEexyUbXPsymbC2LBz9VQKo',
   'assistant_id': 'asst_KMgcINhW2I784lTr0JDLINog',
   'content': [{'text': {'annotations': [],
      'value': 'The model described in the document utilizes multi-head attention in three distinct ways, each facilitating different aspects of sequence-to-sequence information processing:\n\n1. **Encoder-Decoder Attention:**\n   - The queries in this layer come from the previous decoder layer, while the memory keys and values are derived from the output of the encoder.\n   - This configuration allows every position in the decoder to attend over all positions in the input sequence, following the typical encoder-decoder attention mechanisms found in some sequence-to-sequence models.\n\n2. **Encoder Self-Attention Layers:**\n   - In these layers, the keys, values, and queries all originate from the output of the previous encoder layer.\n   - Each position in the encoder has the ability to attend to all positions in the previous layer of the enc
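
In the listing above, the reply text sits under `data` → `content` → `text` → `value`. Below is a minimal sketch of walking that structure on the plain-dict form that `show_json` displays. The `message_texts` helper and the `sample` data are made up for illustration and are not part of the OpenAI SDK.

```python
def message_texts(listing):
    """Return one string per message, joining its text content parts."""
    texts = []
    for msg in listing.get("data", []):
        parts = [
            part["text"]["value"]
            for part in msg.get("content", [])
            if "text" in part  # skip any non-text content parts
        ]
        texts.append("\n".join(parts))
    return texts


# Made-up data that mirrors the nesting of the listing above.
sample = {
    "data": [
        {
            "id": "msg_example",
            "content": [{"text": {"annotations": [], "value": "Hello"}}],
        },
        {"id": "msg_empty", "content": []},
    ]
}

texts = message_texts(sample)
print(texts[0])
```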

In [15]:
print(messages.data[0].content[0].text.value)

The model described in the document utilizes multi-head attention in three distinct ways, each facilitating different aspects of sequence-to-sequence information processing:

1. **Encoder-Decoder Attention:**
   - The queries in this layer come from the previous decoder layer, while the memory keys and values are derived from the output of the encoder.
   - This configuration allows every position in the decoder to attend over all positions in the input sequence, following the typical encoder-decoder attention mechanisms found in some sequence-to-sequence models.

2. **Encoder Self-Attention Layers:**
   - In these layers, the keys, values, and queries all originate from the output of the previous encoder layer.
   - Each position in the encoder has the ability to attend to all positions in the previous layer of the encoder.

3. **Decoder Self-Attention Layers:**
   - Positions within the decoder can attend to all positions in the decoder up to and including the current position.
   - 