## OpenAI Assistants API

The OpenAI Assistants API is a framework for building AI assistants into applications, using OpenAI models such as GPT-3.5 and GPT-4. Assistants can perform a variety of tasks with the help of tools: Code Interpreter, Retrieval, and Function Calling. The API is currently in beta, and OpenAI is still developing it and collecting feedback.

[Docs](https://platform.openai.com/docs/assistants/overview)

#### Components

- **Assistant**: The top-level entity, configured with specific instructions, a chosen model, and tools.
- **Thread**: Each thread is a unique conversation session with the Assistant, containing the dialogue history.
- **Message**: Individual components of a Thread, representing communication from either the user or the Assistant.
- **Run**: An instance where the Assistant processes a Thread to generate responses, guided by its configuration.
- **Run Step**: Detailed breakdown of actions taken during a Run, including message creation and tool invocations.

```text
OpenAI Assistants API
│
├── Assistant
│   ├── Model (e.g., GPT-3.5 or GPT-4)
│   ├── Instructions (Defines behavior and response guidelines)
│   ├── Tools (Code Interpreter, Retrieval, Function Calling)
│   └── Files (Optional, for file-based interactions)
│
├── Thread (Represents a conversation session)
│   ├── Message 1 (User or Assistant generated)
│   │   ├── Text
│   │   └── Files (Optional)
│   ├── Message 2
│   ├── Message 3
│   └── ...(Additional Messages as needed)
│
└── Run (Invocation of an Assistant on a Thread)
    ├── Run Step 1 (Details of each action taken)
    │   ├── Message Creation
    │   └── Tool Calls (if any)
    ├── Run Step 2
    └── ...(Additional Run Steps as needed)
```
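
The hierarchy above can be mirrored in plain Python to make the relationships concrete. This is a minimal sketch: the dataclasses (`AssistantSpec`, `Message`, `ThreadLog`, `RunRecord`) are hypothetical names invented for illustration and are not the SDK's own types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssistantSpec:
    """Hypothetical stand-in for an Assistant's configuration."""
    model: str
    instructions: str
    tools: List[str] = field(default_factory=list)


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class ThreadLog:
    """A conversation session: an ordered list of Messages."""
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, text: str) -> "ThreadLog":
        self.messages.append(Message(role, text))
        return self  # returning self allows chained calls


@dataclass
class RunRecord:
    """One invocation of an Assistant on a Thread, with its steps."""
    assistant: AssistantSpec
    thread: ThreadLog
    steps: List[str] = field(default_factory=list)


assistant = AssistantSpec("gpt-4", "Answer briefly.", ["code_interpreter"])
thread = ThreadLog().add("user", "What is 2 + 2?")
run = RunRecord(assistant, thread, steps=["message_creation"])
```

Note that a Run references both an Assistant and a Thread, so in this sketch the same thread could be processed by different assistants.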


#### Tools

- **Code Interpreter**: Enables Assistants to write and execute Python code, and to process and generate diverse data formats, including images and CSV files.
- **Knowledge Retrieval**: Augments Assistants with external knowledge, such as proprietary information or user-provided documents, using vector search to retrieve content.
- **Function Calling**: As in the Chat Completions API, lets Assistants call functions you define and use their results in ongoing Runs.
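
As a rough illustration, the three tool kinds are usually passed to an Assistant as a list of small dictionaries. The sketch below assumes the type strings used by the beta API at the time of writing (`code_interpreter`, `retrieval`, `function`), which later API versions may rename; `build_tools` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, List

# Assumption: built-in tool type strings from the beta API; later
# versions of the API may use different names.
BUILTIN_TOOL_TYPES = {"code_interpreter", "retrieval"}


def build_tools(builtin: List[str],
                functions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine built-in tool types and function schemas into one tools list."""
    tools: List[Dict[str, Any]] = []
    for tool_type in builtin:
        if tool_type not in BUILTIN_TOOL_TYPES:
            raise ValueError(f"Unknown built-in tool: {tool_type}")
        tools.append({"type": tool_type})
    for schema in functions:
        # Function tools wrap a JSON-schema description of the callable.
        tools.append({"type": "function", "function": schema})
    return tools


tools_example = build_tools(
    ["code_interpreter", "retrieval"],
    [{
        "name": "run_sql",
        "description": "Run a SQL query",
        "parameters": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    }],
)
```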


#### File Handling
- Assistants can work with a range of file formats; each file can be at most 512 MB.
- File citations and paths are included in Messages, allowing for referencing and downloading generated files.
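
A client can check the size limit locally before attempting an upload. The sketch below is only an illustration: `check_uploadable` is a hypothetical helper, and it assumes the 512 MB limit means 512 * 1024 * 1024 bytes.

```python
import os
import tempfile

# Assumption: the 512 MB limit is interpreted as 512 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 512 * 1024 * 1024


def check_uploadable(path: str, limit: int = MAX_FILE_BYTES) -> bool:
    """Return True if the file exists and is within the size limit."""
    return os.path.isfile(path) and os.path.getsize(path) <= limit


# Usage: a tiny temporary file passes; an artificially small limit rejects it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"id,age\n1,42\n")
    sample_path = tmp.name

fits = check_uploadable(sample_path)
too_big = check_uploadable(sample_path, limit=4)
os.remove(sample_path)
```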


#### Limitations
- The API does not yet support image files in user Messages, streaming output, or notifications for object status updates.
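
Without streaming or status notifications, a client typically has to poll a Run until it reaches a status that needs attention. The following is a minimal sketch of such a loop; `wait_for_run` and the `get_status` callable are hypothetical, and the status names are taken from the beta documentation as an assumption. A fake status source stands in for the real API call.

```python
import time
from typing import Callable

# Assumption: statuses at which polling should stop. "requires_action"
# is not final, but the caller must submit tool outputs before the Run
# can continue.
STOP_STATUSES = {"completed", "failed", "cancelled", "expired", "requires_action"}


def wait_for_run(get_status: Callable[[], str],
                 interval: float = 0.5,
                 timeout: float = 60.0) -> str:
    """Poll get_status() until it returns a stop status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in STOP_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout}s")
        time.sleep(interval)


# Usage with a fake status source standing in for an API call.
fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
final_status = wait_for_run(lambda: next(fake_statuses), interval=0.0)
```

Passing the status lookup in as a callable keeps the loop independent of any particular client library.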

In [1]:
import os
import sys
import time
import argparse
from typing import List, Callable

sys.path.append('./')

from modules import llm
from modules import rand
from turbo4 import Turbo4
from modules import embeddings
from data_types import Chat, TurboTool, FactStoringTool
from instruments import PostgresAgentInstruments

from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())

DB_URL = os.environ.get("POSTGRES_CONNECTION_URL")
POSTGRES_TABLE_DEFINITIONS_CAP_REF = "TABLE_DEFINITIONS"


In [2]:
custom_function_tool_config = {
    "type": "function",
    "function": {
        "name": "store_fact",
        "description": "A function that stores a fact.",
        "parameters": {
            "type": "object",
            "properties": {"fact": {"type": "string"}},
        },
    },
}

run_sql_tool_config = {
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a SQL query against the postgres database",
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "The SQL query to run",
                }
            },
            "required": ["sql"],
        },
    },
}
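
When the model decides to call a function, it returns the function name plus a JSON-encoded string of arguments, and the application is responsible for running the matching Python callable. The sketch below shows one way that dispatch could work, including a check against the schema's `required` list. `dispatch_tool_call`, `demo_config`, and `fake_run_sql` are hypothetical names for illustration; this is not how `TurboTool` is necessarily implemented.

```python
import json
from typing import Any, Callable, Dict


def dispatch_tool_call(name: str,
                       arguments_json: str,
                       config: Dict[str, Any],
                       registry: Dict[str, Callable[..., Any]]) -> Any:
    """Validate a model-issued function call against its schema, then run it."""
    args = json.loads(arguments_json)
    params = config["function"]["parameters"]
    missing = [key for key in params.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    unknown = [key for key in args if key not in params["properties"]]
    if unknown:
        raise ValueError(f"Unexpected arguments: {unknown}")
    return registry[name](**args)


# Hypothetical schema and handler, shaped like run_sql_tool_config.
demo_config = {
    "type": "function",
    "function": {
        "name": "run_sql",
        "parameters": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
}


def fake_run_sql(sql: str) -> str:
    """Stand-in for a real database call."""
    return f"ran: {sql}"


result = dispatch_tool_call(
    "run_sql", '{"sql": "SELECT 1"}', demo_config, {"run_sql": fake_run_sql}
)
```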

In [3]:
raw_prompt = "Get the mean age of everyone who had covid."
prompt = f"Fulfill this database query: {raw_prompt}. "
tools = []

assistant_name = "Turbo4"
assistant = Turbo4()
session_id = rand.generate_session_id(assistant_name + raw_prompt)

In [4]:
with PostgresAgentInstruments(DB_URL, session_id) as (agent_instruments, db):
    database_embedder = embeddings.DatabaseEmbedder(db)

    table_definitions = database_embedder.get_similar_table_defs_for_prompt(
        raw_prompt
    )

    prompt = llm.add_cap_ref(
        prompt,
        f"Use these {POSTGRES_TABLE_DEFINITIONS_CAP_REF} to satisfy the database query.",
        POSTGRES_TABLE_DEFINITIONS_CAP_REF,
        table_definitions,
    )

    # CUSTOM TOOLS WITH CLASS
    tools = [
        TurboTool("run_sql", run_sql_tool_config, agent_instruments.run_sql),
    ]

    res = (
        assistant.get_or_create_assistant(assistant_name)
        .set_instructions(
            "You're an elite SQL developer. You generate the most concise and performant SQL queries."
        )
        .equip_tools(tools)
        .make_thread()
        .add_message(prompt)
        .run_thread()
        .add_message(
            "Use the run_sql function to run the SQL you've just generated.",
        )
        .run_thread(toolbox=[tools[0].name])
        .run_validation(agent_instruments.validate_run_sql)
        .spy_on_assistant(agent_instruments.make_agent_chat_file(assistant_name))
        .get_costs_and_tokens(
            agent_instruments.make_agent_cost_file(assistant_name)
        )
    )

    print("✅ Turbo4 Assistant finished.")


get_or_create_assistant(Turbo4, gpt-4-1106-preview)
set_instructions()
equip_tools([TurboTool(name='run_sql', config={'type': 'function', 'function': {'name': 'run_sql', 'description': 'Run a SQL query against the postgres database', 'parameters': {'type': 'object', 'properties': {'sql': {'type': 'string', 'description': 'The SQL query to run'}}, 'required': ['sql']}}}, function=<bound method PostgresAgentInstruments.run_sql of <instruments.PostgresAgentInstruments object at 0x106e46f50>>)], False)
make_thread()
add_message(Fulfill this database query: Get the mean age of everyone who had covid..  Use these TABLE_DEFINITIONS to satisfy the database query.

TABLE_DEFINITIONS

CREATE TABLE patients (
id integer,
patient_id character varying(255),
state character varying(2),
fips integer,
diagnosed_covid boolean,
diagnoses_date date,
current_age integer,
age_at_diagnosis double precision
);

CREATE TABLE patient_medical_history (
id integer,
patient_id character varying(255),
has_diabetes

In [5]:
# Check if there are chat messages
if res.chat_messages:
    # Retrieve the most recent chat message (read from index 0)
    final_message = res.chat_messages[0].message
    print("Final Message from Assistant:", final_message)
else:
    print("No chat messages found.")

Final Message from Assistant: The SQL query to get the mean age of everyone who had COVID has been successfully executed. The result has been delivered to a JSON file.


### Simple Prompt Solution 

- Same task, but with only 2 API calls instead of 8 or more.



In [6]:
session_id = rand.generate_session_id(assistant_name + raw_prompt)

In [7]:
with PostgresAgentInstruments(DB_URL, session_id) as (agent_instruments, db):
    database_embedder = embeddings.DatabaseEmbedder(db)

    table_definitions = database_embedder.get_similar_table_defs_for_prompt(
        raw_prompt
    )

    prompt = f"Fulfill this database query: {raw_prompt}."

    prompt = llm.add_cap_ref(
        prompt,
        f"Use these {POSTGRES_TABLE_DEFINITIONS_CAP_REF} to satisfy the database query.",
        POSTGRES_TABLE_DEFINITIONS_CAP_REF,
        table_definitions,
    )

    # CUSTOM TOOLS WITH CLASS
    tools = [
        TurboTool("run_sql", run_sql_tool_config, agent_instruments.run_sql),
    ]

    sql_response = llm.prompt(
        prompt,
        model="gpt-4-1106-preview",
        instructions="You're an elite SQL developer. You generate the most concise and performant SQL queries.",
    )

    print(sql_response)

    # Chaining
    final_response = llm.prompt_func(
        "Use the run_sql function to run the SQL you've just generated: "
        + sql_response,
        model="gpt-4-1106-preview",
        instructions="You're an elite SQL developer. You generate the most concise and performant SQL queries.",
        turbo_tools=tools,
    )

    print(final_response)
    print(f"sql validation result: {agent_instruments.validate_run_sql()}")


To calculate the mean age of all patients who had been diagnosed with COVID-19, you can use the following SQL query:

```sql
SELECT AVG(current_age) AS mean_age
FROM patients
WHERE diagnosed_covid = TRUE;
```

This SQL query selects the average (mean) of the `current_age` column from the `patients` table but only includes rows where `diagnosed_covid` is set to `TRUE`, meaning only the patients who had been diagnosed with COVID-19 will be considered in the calculation. The result will be labeled as `mean_age`.
['Successfully delivered results to json file']
sql validation result: (True, '')
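
The two-call chaining used in the cell above can be reduced to a small pattern: the first call drafts the SQL, and its text is appended to the prompt of the second call, which runs it. The sketch below is an assumption-laden illustration with fake callables (`fake_generate`, `fake_execute`) in place of `llm.prompt` and `llm.prompt_func`; it only demonstrates the data flow and the call count.

```python
from typing import Callable, List, Tuple


def chain_two_calls(question: str,
                    generate: Callable[[str], str],
                    execute: Callable[[str], str]) -> Tuple[str, str]:
    """First call drafts SQL; second call receives that SQL and runs it."""
    sql = generate(f"Fulfill this database query: {question}")
    result = execute(
        "Use the run_sql function to run the SQL you've just generated: " + sql
    )
    return sql, result


call_log: List[str] = []  # records each simulated API call


def fake_generate(prompt: str) -> str:
    call_log.append("generate")
    return "SELECT AVG(current_age) FROM patients WHERE diagnosed_covid = TRUE;"


def fake_execute(prompt: str) -> str:
    call_log.append("execute")
    # The second prompt must carry the SQL produced by the first call.
    return "ok" if "SELECT AVG" in prompt else "no sql"


sql_text, outcome = chain_two_calls(
    "Get the mean age of everyone who had covid.", fake_generate, fake_execute
)
```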


#### Custom Tool

In [8]:
# Tool
def store_fact(fact: str, folder: str = "./agent_results/facts"):
    # Ensure the folder exists
    os.makedirs(folder, exist_ok=True)

    # Define a file name. You might want to customize this to avoid overwriting files.
    file_name = f"fact_{int(time.time())}.txt"
    file_path = os.path.join(folder, file_name)

    # Store the fact in the file
    with open(file_path, 'w') as file:
        file.write(fact)
        print(f"Fact stored in {file_path}")

    return "Fact stored."
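
Because the file name above depends only on the current second, two facts stored within the same second would overwrite each other. One possible way around this, shown as a sketch rather than the notebook's actual approach, is to add a short random suffix; `unique_fact_path` is a hypothetical helper.

```python
import os
import tempfile
import time
import uuid


def unique_fact_path(folder: str) -> str:
    """Build a file path that is very unlikely to collide, even within one second."""
    os.makedirs(folder, exist_ok=True)
    # Timestamp keeps files roughly sortable; the UUID fragment avoids clashes.
    name = f"fact_{int(time.time())}_{uuid.uuid4().hex[:8]}.txt"
    return os.path.join(folder, name)


# Usage: two paths requested back to back still differ.
demo_folder = tempfile.mkdtemp()
first = unique_fact_path(demo_folder)
second = unique_fact_path(demo_folder)
```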

In [10]:
assistant_name = "Turbo4_facts"
assistant = Turbo4()
session_id = rand.generate_session_id(assistant_name + raw_prompt)
tools.append(FactStoringTool(name="store_fact", config=custom_function_tool_config, function=store_fact))


In [11]:
# ----------- Example use case of Turbo4 and the Assistants API ------------
res = (
    assistant.get_or_create_assistant(assistant_name)
    .make_thread()
    .equip_tools(tools)
    .add_message("Generate 10 random facts about LLM technology.")
    .run_thread()
    .add_message("Use the store_fact function to 1 fact.")
    .run_thread(toolbox=["store_fact"])
)

get_or_create_assistant(Turbo4_facts, gpt-4-1106-preview)
make_thread()
equip_tools([TurboTool(name='run_sql', config={'type': 'function', 'function': {'name': 'run_sql', 'description': 'Run a SQL query against the postgres database', 'parameters': {'type': 'object', 'properties': {'sql': {'type': 'string', 'description': 'The SQL query to run'}}, 'required': ['sql']}}}, function=<bound method PostgresAgentInstruments.run_sql of <instruments.PostgresAgentInstruments object at 0x2c08e3040>>), FactStoringTool(name='store_fact', config={'type': 'function', 'function': {'name': 'store_fact', 'description': 'A function that stores a fact.', 'parameters': {'type': 'object', 'properties': {'fact': {'type': 'string'}}}}}, function=<function store_fact at 0x2920f4700>)], False)
add_message(Generate 10 random facts about LLM technology.)
run_thread(None)
add_message(Use the store_fact function to 1 fact.)
run_thread(['store_fact'])
run_thread() Calling store_fact({'fact': "LLM stands for 'Large 

In [12]:
# Check if there are chat messages
if res.chat_messages:
    # Retrieve the most recent chat message (read from index 0)
    final_message = res.chat_messages[0].message
    print("Final Message from Assistant:", final_message)
else:
    print("No chat messages found.")

Final Message from Assistant: The fact about LLMs has been successfully stored. If you need any more assistance or have more facts to store, feel free to let me know!
