# Building an Analytics Dashboard Assistant with OpenAI

This tutorial will guide you through creating an intelligent analytics assistant using OpenAI's Assistants API. Our assistant will be capable of:
- Analyzing multiple data files using File Search
- Generating visualizations and insights using Code Interpreter
- Creating interactive dashboards based on user queries

## Setup and Dependencies

First, let's install the required packages:

In [None]:
%pip install openai pandas matplotlib seaborn plotly

In [1]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [2]:
import json
import time
from IPython.display import display, HTML

## Initializing the OpenAI Client

Next, we'll create the OpenAI client, which reads the API key from the `OPENAI_API_KEY` environment variable set above:

In [3]:
from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI()

## Creating the Analytics Assistant

We'll create an assistant that combines both Code Interpreter and File Search capabilities:

In [5]:
def create_analytics_assistant():
    assistant = client.beta.assistants.create(
        name="Analytics Dashboard Assistant",
        instructions="""You are an expert data analyst and visualization specialist. 
        Your role is to:
        1. Analyze data files provided by users
        2. Generate insightful visualizations
        3. Create comprehensive analytics dashboards
        4. Explain trends and patterns in the data
        Always provide clear explanations of your analysis process.""",
        model="gpt-4o",
        tools=[
            {"type": "code_interpreter"},
            {"type": "file_search"}
        ]
    )
    return assistant

analytics_assistant = create_analytics_assistant()

## Setting Up the Vector Store for File Search

The File Search capability requires setting up a vector store for our data files:

In [6]:
def create_vector_store(name="Analytics Files"):
    vector_store = client.beta.vector_stores.create(
        name=name,
    )
    return vector_store

def add_files_to_vector_store(vector_store_id, file_ids):
    batch = client.beta.vector_stores.file_batches.create_and_poll(
        vector_store_id=vector_store_id,
        file_ids=file_ids
    )
    return batch

# Create vector store
vector_store = create_vector_store()

# Update assistant with vector store
analytics_assistant = client.beta.assistants.update(
    assistant_id=analytics_assistant.id,
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

## File Upload Helper Functions

Let's create helper functions to handle file uploads:

In [7]:
def upload_file(file_path):
    """Upload a file for the assistant to use"""
    with open(file_path, 'rb') as file:
        response = client.files.create(
            file=file,
            purpose='assistants'
        )
    return response

def attach_files_to_assistant(assistant_id, file_ids):
    """Attach files to the assistant for code interpreter"""
    assistant = client.beta.assistants.update(
        assistant_id=assistant_id,
        tool_resources={
            "code_interpreter": {
                "file_ids": file_ids
            }
        }
    )
    return assistant
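
Before calling `upload_file`, it can help to check the file locally so that obvious problems surface before any network request is made. The sketch below is a hypothetical helper (`validate_upload` and `MAX_FILE_BYTES` are not part of the OpenAI SDK) that assumes the 512 MB per-file limit mentioned in the best practices at the end of this tutorial:

```python
import os
import tempfile

# Assumption: 512 MB per-file limit, as cited in this tutorial's best practices
MAX_FILE_BYTES = 512 * 1024 * 1024

def validate_upload(file_path, max_bytes=MAX_FILE_BYTES):
    """Return the file size in bytes, or raise if the file should not be uploaded."""
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"No such file: {file_path}")
    size = os.path.getsize(file_path)
    if size == 0:
        raise ValueError(f"File is empty: {file_path}")
    if size > max_bytes:
        raise ValueError(f"File is {size} bytes, over the {max_bytes}-byte limit")
    return size

# Demo on a small temporary file standing in for a real PDF
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 sample")
    demo_path = tmp.name

demo_size = validate_upload(demo_path)
os.remove(demo_path)
```

A call such as `validate_upload(file_path)` could be placed at the top of `upload_file` so that invalid paths fail with a clear message.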

## Creating and Managing Threads

Now let's create functions to manage conversation threads:

In [8]:
def create_thread_with_files(files=None):
    """Create a new thread with optional files"""
    if files:
        messages = [{
            "role": "user",
            "content": "I've uploaded some files for analysis.",
            "attachments": [
                {
                    "file_id": file_id,
                    "tools": [{"type": "code_interpreter"}, {"type": "file_search"}]
                } for file_id in files
            ]
        }]
        thread = client.beta.threads.create(messages=messages)
    else:
        thread = client.beta.threads.create()
    return thread

def add_message_to_thread(thread_id, content, files=None):
    """Add a message to an existing thread"""
    if files:
        message = client.beta.threads.messages.create(
            thread_id=thread_id,
            role="user",
            content=content,
            attachments=[
                {
                    "file_id": file_id,
                    "tools": [{"type": "code_interpreter"}, {"type": "file_search"}]
                } for file_id in files
            ]
        )
    else:
        message = client.beta.threads.messages.create(
            thread_id=thread_id,
            role="user",
            content=content
        )
    return message
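
Both functions above build the same `attachments` structure inline. As a sketch, that logic could be factored into one hypothetical helper (`build_attachments` is an illustrative name, not an SDK function) that also drops repeated file IDs, so the same file is not attached twice by accident:

```python
def build_attachments(file_ids, tool_types=("code_interpreter", "file_search")):
    """Build the `attachments` list for a message, skipping repeated file IDs."""
    unique_ids = list(dict.fromkeys(file_ids))  # drops duplicates, keeps order
    return [
        {"file_id": file_id, "tools": [{"type": t} for t in tool_types]}
        for file_id in unique_ids
    ]

# Demo with placeholder IDs; real IDs come from upload_file(...).id
attachments = build_attachments(["file-abc", "file-abc", "file-xyz"])
```

The result has the same shape as the list comprehensions in `create_thread_with_files` and `add_message_to_thread`, so it could be passed directly as `attachments=`.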

## Running the Assistant and Handling Responses

Here's how we'll handle running the assistant and processing its responses:

In [9]:
def run_assistant(thread_id, assistant_id):
    """Create and manage a run of the assistant"""
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )
    
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )
        
        if run.status == 'completed':
            break
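        elif run.status in ('failed', 'cancelled', 'expired', 'incomplete'):
            # These statuses are terminal, so polling further would never finish
            raise RuntimeError(f"Run ended with status '{run.status}': {run.last_error}")
        elif run.status == 'requires_action':
            # This assistant defines no function tools, so this state is unexpected
            raise RuntimeError("Run requires action, but no tool outputs are handled here")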
        
        time.sleep(1)
    
    # Get messages after run completes
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    return messages

import base64

def display_assistant_response(messages):
    """Display the assistant's response including any generated visualizations"""
    for message in messages:
        if message.role == "assistant":
            for content in message.content:
                if content.type == 'text':
                    print(content.text.value)
                elif content.type == 'image_file':
                    # Download the generated image as raw bytes
                    file_id = content.image_file.file_id
                    image_bytes = client.files.content(file_id).read()
                    # Base64-encode the bytes so they can be embedded in an <img> tag
                    encoded = base64.b64encode(image_bytes).decode("ascii")
                    display(HTML(f'<img src="data:image/png;base64,{encoded}" />'))
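
A polling loop like the one in `run_assistant` is usually safer with an upper time bound. The sketch below shows one way to structure that. It is a hypothetical helper rather than SDK code: `wait_for_terminal_status` takes any zero-argument callable that returns the current status string, and the timeout and interval defaults are arbitrary assumptions.

```python
import time

# Run statuses after which no further progress is expected
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}

def wait_for_terminal_status(fetch_status, timeout=300.0, interval=1.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        sleep(interval)

# Demo with a stand-in status source instead of a real API call
fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses),
                                        sleep=lambda seconds: None)
```

With the real client, `fetch_status` could be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id).status`.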

## Example Usage: Creating an Analytics Dashboard

Let's put it all together by uploading a PDF, indexing it for File Search, and asking the assistant to summarize it:

In [10]:
# Upload a sample PDF
pdf_data_new = upload_file('./pdfs/ai-agents-paper.pdf')

In [11]:
pdf_data_new.id

'file-6btRe3kHaTTDvBfww662MG'

In [12]:
batch = add_files_to_vector_store(vector_store.id, [pdf_data_new.id])
batch

VectorStoreFileBatch(id='vsfb_52ce69881fee483a81424da0395be857', created_at=1752510062, file_counts=FileCounts(cancelled=0, completed=1, failed=0, in_progress=0, total=1), object='vector_store.file_batch', status='completed', vector_store_id='vs_68752e4145308191868f1e23c0ab2b63')
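
The batch object printed above carries a `status` and per-state `file_counts`, which makes it straightforward to check that every file was indexed before querying. A minimal sketch, using a hypothetical helper (`summarize_batch`) and a stand-in object in place of a real API response:

```python
from types import SimpleNamespace

def summarize_batch(batch):
    """Return (ok, message) describing a vector store file batch."""
    counts = batch.file_counts
    ok = batch.status == "completed" and counts.failed == 0 and counts.cancelled == 0
    message = (f"{counts.completed}/{counts.total} files indexed, "
               f"{counts.failed} failed, {counts.cancelled} cancelled")
    return ok, message

# Stand-in with the same fields as the batch printed above
demo_batch = SimpleNamespace(
    status="completed",
    file_counts=SimpleNamespace(cancelled=0, completed=1, failed=0,
                                in_progress=0, total=1),
)
ok, message = summarize_batch(demo_batch)
```

In the notebook, `summarize_batch(batch)` could be called right after `add_files_to_vector_store` and the run skipped when `ok` is false.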

In [13]:
assistant = client.beta.assistants.update(
    assistant_id=analytics_assistant.id,
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

In [14]:
# Create a thread with the uploaded file attached
thread = create_thread_with_files([pdf_data_new.id])

# Ask for a bullet-point summary
analysis_request = """
Summarize the pdfs into a simple bullet point structure.
"""

add_message_to_thread(thread.id, analysis_request)

# Run the assistant
messages = run_assistant(thread.id, assistant.id)

In [15]:
from IPython.display import Markdown

Markdown(messages.data[0].content[0].text.value)

Here is a bullet-point summary of the content from the PDF titled "ai-agents-paper.pdf":

- **Purpose and Scope:**
  - Discusses the construction and applications of LLM-based autonomous agents in social sciences, natural sciences, and engineering.
  - Provides strategies for evaluating LLM-based autonomous agents using subjective and objective criteria【11:0†source】.

- **LLM-based Autonomous Agent Construction:**
  - Two main aspects: Architecture design to leverage LLM capabilities and agent capability acquisition for task-specific functionality.
  - Architecture setup akin to network structure; capability acquisition similar to network parameter learning【11:0†source】.
  - Unified framework for agent architecture design and capability acquisition strategies【11:18†source】.

- **Agent Architecture Design:**
  - Focus on creating architectures that enhance LLM functionality through integration with additional modules.
  - Emphasis on bridging capabilities of traditional LLMs with requirements of autonomous agents【11:0†source】.

- **Capability Acquisition Strategies:**
  - Fine-tuning with task-specific datasets as a direct method.
  - Alternative strategies that do not require fine-tuning are suitable for both open and closed-source LLMs, albeit with context window limitations【11:10†source】【11:12†source】.

- **Applications in Various Fields:**
  - **Social Science:** Simulation experiments, mental health support, political science, economy, and social simulations.
  - **Natural Science:** Experiment assistants and education tools.
  - **Engineering:** Civil, computer, and aerospace engineering applications【11:10†source】.

- **Memory and Reflection in Agents:**
  - Importance of managing memory (e.g., dealing with overflow and duplicates).
  - Agents can reflect on past experiences to derive insights and improve decision-making【11:4†source】【11:8†source】.

- **Planning Module:**
  - Decomposition of tasks into subtasks for efficient problem-solving.
  - Use of single-path and multi-path reasoning with feedback mechanisms【11:9†source】【11:16†source】.

- **Evaluation and Challenges:**
  - Subjective evaluations like human annotations and Turing tests.
  - Objective evaluations with specific metrics and benchmarks【11:10†source】.
  - Challenges include improving inference speed and optimizing agent actions using LLMs【11:14†source】【11:15†source】.
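
The markers such as `【11:0†source】` in the output above are File Search citations embedded in the response text. If the summary is meant for display without them, they can be stripped with a regular expression. This is a sketch that assumes the markers always follow the shape seen in this output; `strip_citation_markers` is a hypothetical helper, not part of the SDK.

```python
import re

# Matches markers shaped like 【11:0†source】 as they appear in the output above
CITATION_PATTERN = re.compile(r"【\d+:\d+†[^】]*】")

def strip_citation_markers(text):
    """Remove inline File Search citation markers from a response string."""
    return CITATION_PATTERN.sub("", text)

sample = "Objective evaluations with specific metrics and benchmarks【11:10†source】."
cleaned = strip_citation_markers(sample)
```

For example, `Markdown(strip_citation_markers(messages.data[0].content[0].text.value))` would render the summary without the markers.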

In [16]:
# Upload a second PDF
pdf_data = upload_file('./pdfs/paper.pdf')

# Create a thread with the file attached
thread = create_thread_with_files([pdf_data.id])

# Ask for a summary
analysis_request = """
Summarize the pdf data.
"""

add_message_to_thread(thread.id, analysis_request)

# Run the assistant
messages = run_assistant(thread.id, analytics_assistant.id)

In [17]:
Markdown(messages.data[0].content[0].text.value)

The uploaded PDF file, titled "paper.pdf," provides an in-depth exploration of the Transformer model, particularly in the context of machine translation and English constituency parsing.

1. **Transformer Model Overview**:
   - The document discusses the Transformer, a sequence transduction model which relies entirely on attention mechanisms, specifically self-attention, rather than recurrent layers used in typical encoder-decoder architectures【9:7†source】.

2. **Model Variations and Performance**:
   - The file presents various configurations of the Transformer model, evaluating their performance on English-to-German translations. Different model sizes, attention head variations, and embedding settings are analyzed for their impact on BLEU scores and training efficiency【9:2†source】【9:5†source】.
   - It highlights the more efficient performance of the Transformer (big model) over previous state-of-the-art models in translation tasks【9:5†source】.

3. **Machine Translation**:
   - The paper reports that the Transformer model achieves high BLEU scores with relatively lower training costs compared to other architectures, marking a significant improvement in translation tasks【9:5†source】【9:7†source】.

4. **English Constituency Parsing**:
   - The document also delves into the application of the Transformer model for English constituency parsing, where it outperforms other models, confirming its versatility beyond just translation tasks【9:12†source】.

5. **Technical Details**:
   - Detailed descriptions of model components, such as multi-head attention and position-wise feed-forward networks, are included【9:10†source】【9:15†source】.
   - Discussions on positional encoding and regularization techniques such as dropout and label smoothing are also part of the study【9:3†source】【9:5†source】.

This summary captures the primary focus and findings presented in the PDF, showcasing both the theoretical insights and practical evaluations that the document provides on the Transformer model.

## Best Practices and Tips

1. **File Management**:
   - Keep track of file IDs and clean up unused files
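   - Use file formats that the target tool supports: tabular data (CSV, Excel) suits Code Interpreter, while File Search works on text documents such as the PDFs used above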
   - Consider file size limits (512MB per file)

2. **Vector Store Organization**:
   - Group related files in the same vector store
   - Use descriptive names for vector stores
   - Monitor vector store expiration policies

3. **Error Handling**:
   - Implement proper error handling for API calls
   - Monitor run status and handle failures gracefully
   - Validate file uploads and data formats

4. **Performance Optimization**:
   - Use appropriate chunk sizes for File Search
   - Monitor token usage and context windows
   - Implement request rate limiting
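
As one illustration of the error-handling and rate-limiting points above, transient failures can be retried with exponential backoff. The helper below is a generic sketch, not an OpenAI SDK feature: `call_with_retries`, its defaults, and the jitter scheme are assumptions chosen for illustration.

```python
import random
import time

def call_with_retries(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries

# Demo: a stand-in call that fails twice before succeeding
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = call_with_retries(flaky_call, retry_on=(ConnectionError,),
                           sleep=lambda seconds: None)
```

With the OpenAI client, `retry_on` would likely be narrowed to transient errors such as `openai.RateLimitError`, and `func` could wrap a call like `lambda: upload_file(path)`.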

## Conclusion

This tutorial demonstrated how to create an intelligent analytics assistant that combines the power of OpenAI's Code Interpreter and File Search capabilities. The assistant can analyze multiple data sources, generate visualizations, and create interactive dashboards based on user queries.

You can extend this foundation by:
- Adding more sophisticated visualization capabilities
- Implementing custom dashboard templates
- Adding support for more data formats
- Creating specialized analysis functions
- Implementing caching for frequently accessed data

Remember to handle API keys securely and implement proper error handling in production environments.