# Introduction to the OpenAI Assistants API: Building a Research Assistant

## Overview
In this notebook, we'll explore the OpenAI Assistants API by building a practical research assistant that can help analyze academic papers and generate research summaries. We'll cover the fundamental concepts of the API and walk through a complete implementation.

The Assistants API allows us to create AI assistants with specific personalities, capabilities, and access to various tools. In this lesson, we'll learn about:
- Creating an Assistant with custom instructions
- Managing conversation Threads
- Sending and receiving Messages
- Executing Runs
- Working with the Code Interpreter tool

## Setup and Dependencies

In [1]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        # Prompt with the variable's name (the original f-string printed the literal text "var").
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [2]:
from openai import OpenAI
import time

# Initialize the OpenAI client
client = OpenAI()

## What is the Assistants API?

The Assistants API is a powerful toolkit for building AI-driven applications. It allows developers to create assistants capable of responding to user queries using **models**, **tools**, and **files**. These assistants are designed to solve problems, perform computations, and provide helpful, context-aware interactions.

### Key Features:
1. **Customizable Behavior**:
   - Define how the assistant behaves using instructions.
   - Tailor the assistant's personality and capabilities.

2. **Tool Integration**:
   - Attach tools such as:
     - **Code Interpreter**: Write and run Python code in a sandboxed environment.
     - **File Search**: Search through uploaded files.
     - **Function Calling**: Call functions that you define to integrate external tools for custom tasks.

3. **Persistent Conversations**:
   - Use **Threads** to store conversations and manage context.
   - Threads automatically truncate long histories to fit model limits.

4. **Object Architecture**:
   - **Assistant**: The core AI entity.
   - **Thread**: A persistent conversation between a user and the assistant.
   - **Message**: A unit of communication (text, files, etc.) in a thread.
   - **Run**: A session where the assistant processes input and generates output.
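
The "Persistent Conversations" item above notes that threads truncate long histories automatically. As a hedged sketch, a run can also be given an explicit `truncation_strategy` that keeps only the most recent messages. The helper name `build_run_kwargs`, the placeholder IDs, and the value of 10 messages are illustrative assumptions and are not part of the original notebook.

```python
from typing import Any, Dict, Optional


def build_run_kwargs(
    thread_id: str,
    assistant_id: str,
    last_messages: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for creating a run (hypothetical helper).

    With last_messages=None no strategy is sent, so the API default applies.
    Otherwise only the N most recent thread messages are used as context.
    """
    kwargs: Dict[str, Any] = {"thread_id": thread_id, "assistant_id": assistant_id}
    if last_messages is not None:
        if last_messages < 1:
            raise ValueError("last_messages must be a positive integer")
        kwargs["truncation_strategy"] = {
            "type": "last_messages",
            "last_messages": last_messages,
        }
    return kwargs


# Placeholder IDs, for illustration only.
print(build_run_kwargs("thread_123", "asst_456", last_messages=10))
```

Once a client, thread, and assistant exist, the resulting dictionary could be unpacked into a run-creation call, for example `client.beta.threads.runs.create_and_poll(**build_run_kwargs(thread.id, assistant.id, 10))`.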

## Understanding the Core Concepts

The Assistants API is built around several key objects:
1. **Assistant**: The AI entity with specific capabilities and instructions
2. **Thread**: A conversation session that maintains message history
3. **Message**: Individual communications between the user and assistant
4. **Run**: An execution of the assistant on a thread
5. **Run Step**: Detailed steps taken by the assistant during a run
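
To make the relationships between these objects concrete, here is a toy in-memory model. It is only a sketch of how the pieces fit together: the `Toy*` classes and `toy_run` are hypothetical stand-ins invented for illustration, not the SDK's classes, and no API call is made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMessage:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class ToyThread:
    # A thread is just an ordered message history.
    messages: List[ToyMessage] = field(default_factory=list)


@dataclass
class ToyAssistant:
    name: str
    instructions: str


def toy_run(assistant: ToyAssistant, thread: ToyThread) -> ToyMessage:
    """Mimic a Run: read the thread, then append one assistant reply to it."""
    last_user = next(m for m in reversed(thread.messages) if m.role == "user")
    reply = ToyMessage("assistant", f"[{assistant.name}] received: {last_user.content}")
    thread.messages.append(reply)
    return reply


toy_thread = ToyThread()
toy_thread.messages.append(ToyMessage("user", "Summarize this paper."))
toy_run(ToyAssistant("Research Analyst", "Be rigorous."), toy_thread)
print([m.role for m in toy_thread.messages])  # ['user', 'assistant']
```

The point of the sketch is that a Run does not carry the conversation itself: it pairs one assistant with one thread, and its result shows up as new messages on that thread.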

## Creating Our Research Assistant

Let's create an assistant specialized in research paper analysis:

In [3]:
assistant = client.beta.assistants.create(
    name="Research Analyst",
    instructions="""You are a research assistant specialized in analyzing academic papers and research data.
    Your tasks include:
    - Summarizing research findings
    - Analyzing statistical data
    - Creating visualizations of research results
    - Providing critical analysis of methodologies
    Always maintain academic rigor and cite specific sections when referring to source materials.""",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4o"
)

print(f"Assistant ID: {assistant.id}")

Assistant ID: asst_R2mnM0fsD4DFpdGfNxE9KFgJ


## Starting a Research Session

When a user wants to begin analyzing a paper, we create a new Thread:

In [4]:
thread = client.beta.threads.create()
print(f"Thread ID: {thread.id}")

Thread ID: thread_7fKBVFYI36vI167Fldwjwwhm


## Adding Research Questions

Let's simulate a user asking questions about a research dataset:

In [None]:
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="""I have a dataset of patient recovery times after two different treatments.
    Can you help me analyze the statistical significance of the results?
    
    Treatment A: [45, 42, 39, 47, 41, 43, 40, 44, 38, 46]
    Treatment B: [52, 49, 51, 47, 50, 53, 48, 51, 49, 50]"""
)

## Creating a Run with Response Streaming

We'll implement an event handler that prints the assistant's analysis in real time as it streams in:

In [6]:
try:
    from typing_extensions import override
except ImportError:
    # Fallback if typing_extensions is not available
    def override(func):
        return func

from openai import AssistantEventHandler

class ResearchEventHandler(AssistantEventHandler):
    """Print assistant text and Code Interpreter activity as events arrive."""

    @override
    def on_text_created(self, text) -> None:
        # A new text message has started.
        print("\nAssistant > ", end="", flush=True)

    @override
    def on_text_delta(self, delta, snapshot) -> None:
        # Print each text fragment as it streams in.
        print(delta.value, end="", flush=True)

    @override
    def on_tool_call_created(self, tool_call) -> None:
        print(f"\nRunning analysis: {tool_call.type}\n", flush=True)

    @override
    def on_tool_call_delta(self, delta, snapshot) -> None:
        if delta.type == "code_interpreter":
            if delta.code_interpreter.input:
                # Code the assistant is writing for the interpreter.
                print(delta.code_interpreter.input, end="", flush=True)
            if delta.code_interpreter.outputs:
                print("\nResults:", flush=True)
                for output in delta.code_interpreter.outputs:
                    if output.type == "logs":
                        print(f"\n{output.logs}", flush=True)

# Execute the analysis
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=ResearchEventHandler(),
) as stream:
    stream.until_done()


Assistant > To analyze the statistical significance of recovery times after two different treatments, we can conduct a two-sample t-test. This test will help determine if there is a significant difference in the means of the two treatments.

### Steps for the Two-Sample t-test:

1. **Assumptions Check:**
   - **Normality:** Each sample should be approximately normally distributed. We can check this by looking at histograms or conducting a normality test.
   - **Homogeneity of Variances:** The variances of the two groups should be equal. We can test this assumption using Levene's test.

2. **Hypotheses:**
   - Null Hypothesis (\(H_0\)): There is no significant difference in mean recovery times between Treatment A and Treatment B (\(\mu_A = \mu_B\)).
   - Alternative Hypothesis (\(H_a\)): There is a significant difference in mean recovery times (\(\mu_A \neq \mu_B\)).

3. **Conduct the Two-Sample t-test:**
   - Calculate the t-statistic and the associated p-value.

Let's perform these s
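
The test the assistant outlines above can be sanity-checked locally. The following is a minimal standard-library sketch that assumes equal variances (the pooled form of the two-sample t-test). The function name `pooled_t_statistic` is illustrative. Turning the statistic into a p-value requires the t-distribution CDF (available, for example, in SciPy), which is left out here.

```python
import math
from statistics import mean, variance

treatment_a = [45, 42, 39, 47, 41, 43, 40, 44, 38, 46]
treatment_b = [52, 49, 51, 47, 50, 53, 48, 51, 49, 50]


def pooled_t_statistic(x, y):
    """Student's two-sample t statistic, assuming equal variances."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    std_err = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / std_err, df


t_stat, df = pooled_t_statistic(treatment_a, treatment_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

With these samples the means are 42.5 and 50, the pooled variance is 6.25, and the statistic comes out at roughly -6.71 with 18 degrees of freedom, so Treatment A's recovery times are lower on average.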

In [7]:
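# Non-streaming alternative: create a run and block until it reaches a terminal state.
# Note: running this after the streamed run above starts a second run on the same thread.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

run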

Run(id='run_Rz5mNLO6JfefY7a0jvQQFZjU', assistant_id='asst_R2mnM0fsD4DFpdGfNxE9KFgJ', cancelled_at=None, completed_at=1752506453, created_at=1752506443, expires_at=None, failed_at=None, incomplete_details=None, instructions='You are a research assistant specialized in analyzing academic papers and research data.\n    Your tasks include:\n    - Summarizing research findings\n    - Analyzing statistical data\n    - Creating visualizations of research results\n    - Providing critical analysis of methodologies\n    Always maintain academic rigor and cite specific sections when referring to source materials.', last_error=None, max_completion_tokens=None, max_prompt_tokens=None, metadata={}, model='gpt-4o', object='thread.run', parallel_tool_calls=True, required_action=None, response_format='auto', started_at=1752506448, status='completed', thread_id='thread_7fKBVFYI36vI167Fldwjwwhm', tool_choice='auto', tools=[CodeInterpreterTool(type='code_interpreter')], truncation_strategy=TruncationStra
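
The `status` field in the Run object above determines what to do next. As a rough sketch, the helper below maps a status string to a suggested next step. The function name `describe_run_status` and the hint texts are illustrative assumptions; the status values are the ones the Assistants API reports for runs.

```python
# Statuses after which a run will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def describe_run_status(status: str) -> str:
    """Map a run status string to a short next-step hint (illustrative)."""
    if status == "completed":
        return "done: list the thread's messages to read the reply"
    if status == "requires_action":
        return "waiting: submit tool outputs for the pending function calls"
    if status in TERMINAL_STATUSES:
        return "stopped: inspect last_error or incomplete_details"
    return "pending: keep polling"


for s in ("queued", "completed", "failed"):
    print(f"{s}: {describe_run_status(s)}")
```

In the notebook this could be called as `describe_run_status(run.status)` before reading the thread's messages.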

## Retrieving Conversation History

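We can review the entire conversation history from the Thread. Messages are listed newest first by default, so the output below reads in reverse chronological order:

In [9]:
messages = client.beta.threads.messages.list(thread_id=thread.id)
for msg in messages:
    try:
        print(f"{msg.role}: {msg.content[0].text.value}\n")
    except AttributeError:
        # The first content block is not text (for example, an image file).
        print(msg.content[0])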

assistant: Is there anything specific you would like to explore further with this data, such as additional visualizations, further statistical analyses, or a discussion around the possible implications of these findings?

assistant: The two-sample t-test yields a p-value of approximately \(2.73 \times 10^{-6}\). This is far below the conventional alpha level of 0.05, indicating a statistically significant difference in the mean recovery times between Treatment A and Treatment B.

### Conclusion:
- **Reject the Null Hypothesis (\(H_0\))**: There is strong evidence to suggest that the mean recovery times for the two treatments are significantly different.
- Treatment A appears to be associated with faster recovery times compared to Treatment B.

If there are further inquiries or if you need assistance with another aspect of this analysis, please feel free to ask!

ImageFileContentBlock(image_file=ImageFile(file_id='file-QmCthetRxPxnmbgbHQbkDN', detail=None), type='image_file')
assistant:
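
The loop above looks only at the first content block of each message, so a message that holds both an image and text is shown incompletely. A possible refinement, sketched here under the assumption that blocks expose `type`, `text.value`, and `image_file.file_id` as in the output above, walks every block. The helper name `render_message` and the stand-in objects are hypothetical; the sample values are made up.

```python
from types import SimpleNamespace


def render_message(msg) -> str:
    """Return a printable string covering every content block of a message."""
    parts = []
    for block in msg.content:
        if block.type == "text":
            parts.append(block.text.value)
        elif block.type == "image_file":
            parts.append(f"[image file: {block.image_file.file_id}]")
        else:
            parts.append(f"[unsupported block: {block.type}]")
    return f"{msg.role}: " + "\n".join(parts)


# Stand-in object mimicking the shape of an API message (made-up values).
fake = SimpleNamespace(
    role="assistant",
    content=[
        SimpleNamespace(type="image_file", image_file=SimpleNamespace(file_id="file-abc")),
        SimpleNamespace(type="text", text=SimpleNamespace(value="Here is the box plot.")),
    ],
)
print(render_message(fake))
```

In the notebook, the same helper could be applied as `print(render_message(msg))` inside the loop over `messages`.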

## Practice Exercise

Now it's your turn! Try creating an assistant for a different use case. The quickstart guides below cover API key setup and the Assistants API basics, and the template that follows gives you a starting point.

API key quickstart:
- https://platform.openai.com/docs/quickstart?api-mode=responses

OpenAI Assistants API quickstart:
- https://platform.openai.com/docs/assistants/quickstart

In [None]:
# Create your custom assistant
custom_assistant = client.beta.assistants.create(
    name="[Your Assistant Name]",
    instructions="[Your detailed instructions]",
    tools=[{"type": "code_interpreter"}],  # Add other tools as needed
    model="gpt-4o"
)

# Create a new thread
custom_thread = client.beta.threads.create()

# Add your first message
custom_message = client.beta.threads.messages.create(
    thread_id=custom_thread.id,
    role="user",
    content="[Your first question or request]"
)

# Run the assistant with the event handler
with client.beta.threads.runs.stream(
    thread_id=custom_thread.id,
    assistant_id=custom_assistant.id,
    event_handler=ResearchEventHandler(),
) as stream:
    stream.until_done()

## Key Takeaways

1. The Assistants API provides a structured way to create specialized AI assistants with specific capabilities and personalities.
2. Threads maintain conversation context and history, making it easy to build complex interactions.
3. Streaming allows real-time response processing and interactive tool use.
4. Tools like Code Interpreter enable assistants to perform complex calculations and generate visualizations.
5. The API's architecture makes it simple to build sophisticated AI applications while maintaining clean conversation management.

## Next Steps

- Experiment with different tool combinations
- Try implementing file handling capabilities
- Explore function calling for custom tool integration
- Build more complex conversation flows using thread management

Remember to handle your API keys securely and implement proper error handling in production environments!