# Tools - Code Interpreter

## Building a Smart Data Analysis Assistant with OpenAI's Code Interpreter

Let's use the Code Interpreter tool in OpenAI's Assistants API to build a data analysis assistant. Code Interpreter lets an assistant write and run Python code in a sandboxed environment, which makes it well suited to data processing, analysis, and visualization tasks.

Key capabilities we'll explore:
- Processing various file formats
- Generating visualizations
- Running iterative analysis
- Handling code execution outputs

## Setup

First, let's set up our environment with the necessary imports:

In [None]:
import os
import getpass

def _set_env(var: str):
    # Prompt for the value only if it is not already set in the environment
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

In [None]:
from openai import OpenAI
import time

# Initialize the OpenAI client
client = OpenAI()

In [None]:
# Helpers for handling streamed run events
from typing_extensions import override
from openai import AssistantEventHandler

## Creating a Data Analysis Assistant

Let's create an assistant specifically designed for data analysis:

In [None]:
def create_analysis_assistant():
    assistant = client.beta.assistants.create(
        name="Data Analysis Expert",
        instructions="""You are a skilled data analyst and visualization expert.
        When analyzing data:
        1. Start with exploratory data analysis
        2. Create meaningful visualizations
        3. Explain your findings clearly
        4. Use statistical methods when appropriate
        5. Document your code and explain your process""",
        model="gpt-4o",
        tools=[{"type": "code_interpreter"}]
    )
    return assistant

# Create our assistant
analysis_assistant = create_analysis_assistant()
print(f"Created assistant with ID: {analysis_assistant.id}")
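Each time the cell above runs, it creates a new assistant. One way to avoid accumulating duplicates is to look up an existing assistant by name first. The sketch below assumes assistant names are unique in your account; `get_or_create_assistant` is a hypothetical helper, not an SDK function.

```python
def get_or_create_assistant(client, name, create_fn):
    """Return an existing assistant named `name`, or create one with `create_fn`.

    Hypothetical helper: it matches on the assistant's `name` field only,
    so assistants that share a name are treated as interchangeable.
    """
    # Iterating the returned page walks through all pages of assistants
    for assistant in client.beta.assistants.list(limit=100):
        if assistant.name == name:
            return assistant
    # No match found: fall back to creating a new assistant
    return create_fn()

# Usage (commented out so re-running the cell makes no API calls):
# analysis_assistant = get_or_create_assistant(
#     client, "Data Analysis Expert", create_analysis_assistant
# )
```

Matching by name is a convenience; storing the assistant ID (for example, in an environment variable) is a more robust way to reuse one.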

## Event Handler for Code Execution

Let's create an event handler that prints the assistant's text, the code it writes, and the execution logs as the run streams:

In [None]:
class AnalysisEventHandler(AssistantEventHandler):
    @override
    def on_text_created(self, text) -> None:
        print(f"\nassistant > ", end="", flush=True)
        
    @override
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)
    
    @override
    def on_tool_call_created(self, tool_call):
        print(f"\n[Running Code Interpreter]\n", flush=True)
        
    @override
    def on_tool_call_delta(self, delta, snapshot):
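        # Only Code Interpreter deltas carry code input and execution outputs;
        # the code_interpreter field can be empty on some deltas
        if delta.type == 'code_interpreter' and delta.code_interpreter: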
            if delta.code_interpreter.input:
                print(delta.code_interpreter.input, end="", flush=True)
            if delta.code_interpreter.outputs:
                print(f"\n\nOutput >\n", flush=True)
                for output in delta.code_interpreter.outputs:
                    if output.type == "logs":
                        print(f"{output.logs}", flush=True)
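The handler prints only `logs` outputs, but Code Interpreter can also return `image` outputs. Below is a hypothetical helper that separates the two. It assumes each output object exposes `type`, `logs`, and `image.file_id`, as the openai SDK's Code Interpreter output types do.

```python
def summarize_code_outputs(outputs):
    """Split Code Interpreter outputs into log text and image file IDs.

    Assumes each output has a `type` of "logs" (with a `.logs` string) or
    "image" (with `.image.file_id`), mirroring the openai SDK's output types.
    """
    logs, image_ids = [], []
    for output in outputs or []:
        if output.type == "logs" and output.logs:
            logs.append(output.logs)
        elif output.type == "image" and output.image is not None:
            image_ids.append(output.image.file_id)
    # Join separate log chunks with newlines so each stays on its own line
    return "\n".join(logs), image_ids
```

Inside `on_tool_call_delta`, for example, `text, image_ids = summarize_code_outputs(delta.code_interpreter.outputs)` would let the handler report image IDs alongside the logs.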

## Thread Manager for Analysis Sessions

Let's create a class to manage our analysis threads:

In [None]:
class AnalysisThreadManager:
    def __init__(self, assistant_id):
        self.client = OpenAI()
        self.assistant_id = assistant_id
        self.thread = None
        
    def start_session(self):
        """Start a new analysis session"""
        self.thread = self.client.beta.threads.create()
        return self.thread
    
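    def add_file_for_analysis(self, file_path):
        """Upload a file and attach it to a new message for Code Interpreter"""
        # The context manager closes the local file handle once the upload finishes
        with open(file_path, "rb") as f:
            file = self.client.files.create(
                file=f,
                purpose="assistants"
            )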
        
        message = self.client.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content="Please analyze this data file.",
            attachments=[{
                "file_id": file.id,
                "tools": [{"type": "code_interpreter"}]
            }]
        )
        return file.id, message
    
    def ask_question(self, question):
        """Ask a question about the data"""
        message = self.client.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content=question
        )
        return message
    
    def get_analysis(self, event_handler=None):
        """Run the analysis and get results"""
        with self.client.beta.threads.runs.stream(
            thread_id=self.thread.id,
            assistant_id=self.assistant_id,
            event_handler=event_handler
        ) as stream:
            stream.until_done()
        
        # Fetch only the most recent message (the assistant's final reply);
        # files attached to earlier messages from the same run are not included
        messages = self.client.beta.threads.messages.list(
            thread_id=self.thread.id,
            order="desc",
            limit=1
        )
        return messages.data[0]
    
    def download_file(self, file_id, output_path):
        """Download a file generated by Code Interpreter"""
        file_content = self.client.files.content(file_id)
        with open(output_path, 'wb') as f:
            f.write(file_content.read())
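The session below assumes a `sample_data.csv` file already exists. If you don't have one handy, this sketch writes a small synthetic dataset. The column names, value ranges, and the `write_sample_csv` helper are all illustrative assumptions, not part of the original workflow.

```python
import csv
import random
from datetime import date, timedelta

def write_sample_csv(path="sample_data.csv", n_rows=120, seed=0):
    """Write a small synthetic daily dataset for trying out the assistant.

    Columns (all hypothetical): date, region, units_sold, unit_price.
    A weekly cycle and a few blank prices are built in so the analysis
    prompts (missing values, trends, seasonality) have something to find.
    """
    rng = random.Random(seed)  # seeded so the file is reproducible
    start = date(2024, 1, 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "region", "units_sold", "unit_price"])
        for i in range(n_rows):
            day = start + timedelta(days=i)
            # Weekend bump plus noise gives a weekly seasonal pattern
            weekend_bump = 30 if day.weekday() >= 5 else 0
            units = 100 + weekend_bump + rng.randint(-10, 10)
            price = round(rng.uniform(9.0, 11.0), 2)
            # Leave roughly 5% of prices blank to simulate missing data
            price_cell = "" if rng.random() < 0.05 else price
            region = rng.choice(["north", "south"])
            writer.writerow([day.isoformat(), region, units, price_cell])
    return path
```

Calling `write_sample_csv()` before the next cell creates `sample_data.csv` in the working directory.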

## Example Analysis Session

Let's put everything together with a sample analysis workflow:

In [None]:
# Initialize our analysis manager
analysis_manager = AnalysisThreadManager(analysis_assistant.id)

# Start a new session
session = analysis_manager.start_session()

# Example: Upload a data file for analysis
# Assuming we have a sample_data.csv file
file_id, _ = analysis_manager.add_file_for_analysis("sample_data.csv")

# Request initial analysis
analysis_manager.ask_question("""
Please perform an initial analysis of this dataset:
1. Check for missing values
2. Show basic statistics
3. Create a visualization of the main trends
4. Identify any interesting patterns
""")

# Get the analysis results
response = analysis_manager.get_analysis(AnalysisEventHandler())

# If there are any generated files, download them
for content in response.content:
    if content.type == 'image_file':
        analysis_manager.download_file(
            content.image_file.file_id,
            f"analysis_plot_{content.image_file.file_id}.png"
        )

## Processing Code Interpreter Outputs

Let's create a function to process and display Code Interpreter outputs:

In [None]:
def process_code_outputs(message):
    """Process and display Code Interpreter outputs from a message"""
    for content in message.content:
        if content.type == 'text':
            print("Analysis Results:")
            print(content.text.value)
        elif content.type == 'image_file':
            print(f"\nGenerated image file: {content.image_file.file_id}")
            # You can download and display the image here

## Advanced Analysis Examples

Here are some example analysis tasks. Each question is followed by a call to `get_analysis`, because adding a message to a thread does not start a run on its own. A fresh event handler is passed each time, since a handler instance is tied to a single stream:

In [None]:
# Example 1: Statistical Analysis
analysis_manager.ask_question("""
Perform a statistical analysis of the numeric columns:
1. Calculate correlations
2. Test for normal distribution
3. Identify potential outliers
4. Create box plots for visualization
""")
analysis_manager.get_analysis(AnalysisEventHandler())

# Example 2: Time Series Analysis
analysis_manager.ask_question("""
Analyze the temporal patterns in the data:
1. Plot trends over time
2. Check for seasonality
3. Calculate moving averages
4. Identify any significant changes
""")
analysis_manager.get_analysis(AnalysisEventHandler())

# Example 3: Predictive Analysis
analysis_manager.ask_question("""
Build a simple predictive model:
1. Prepare the data (split features/target)
2. Create a basic regression model
3. Evaluate the model performance
4. Visualize the predictions vs actual values
""")
analysis_manager.get_analysis(AnalysisEventHandler())

## Handling Generated Files

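Code Interpreter returns generated images as `image_file` content and other generated files (CSVs, for example) as `file_path` annotations on text. Here's how to collect their file IDs: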

In [None]:
def handle_generated_files(message):
    """Process files generated by Code Interpreter"""
    generated_files = []
    
    for content in message.content:
        if content.type == 'image_file':
            generated_files.append({
                'type': 'image',
                'file_id': content.image_file.file_id
            })
        elif content.type == 'text':
            # Check for file path annotations
            for annotation in content.text.annotations:
                if hasattr(annotation, 'file_path'):
                    generated_files.append({
                        'type': 'data',
                        'file_id': annotation.file_path.file_id
                    })
    
    return generated_files
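`handle_generated_files` returns only file IDs, but it helps to keep a `file_path` file's original name when downloading it. The minimal sketch below assumes the annotation's `text` holds a sandbox link such as `sandbox:/mnt/data/summary.csv`, the form Code Interpreter typically uses. `local_name_for` is a hypothetical helper.

```python
import posixpath

def local_name_for(annotation_text, file_id):
    """Derive a local filename from a file_path annotation's text.

    Assumes text like "sandbox:/mnt/data/summary.csv"; falls back to the
    file ID when no usable filename can be extracted.
    """
    # Drop the "sandbox:" scheme if present, then keep the last path component
    path = annotation_text.split("sandbox:", 1)[-1]
    name = posixpath.basename(path)
    return name or file_id
```

Inside the `file_path` branch, you could pass `local_name_for(annotation.text, annotation.file_path.file_id)` as the `output_path` argument to `AnalysisThreadManager.download_file`.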

## Best Practices and Tips

1. **File Handling**:
   - Keep files under 512 MB
   - Use supported file formats (CSV, JSON, XLSX, etc.)
   - Delete uploaded and generated files when they are no longer needed

2. **Code Execution**:
   - Monitor execution outputs for errors
   - Handle generated files appropriately
   - Keep track of file IDs for later reference

3. **Analysis Flow**:
   - Start with exploratory analysis
   - Break down complex analyses into steps
   - Save important visualizations and results
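The file-handling tips can be checked before calling `files.create`. This sketch assumes a small, hypothetical extension allowlist (consult OpenAI's documentation for the authoritative list of supported formats) and counts 512 MB as 512 * 1024 * 1024 bytes.

```python
import os

# Hypothetical allowlist; OpenAI's documentation has the full list
ALLOWED_EXTENSIONS = {".csv", ".json", ".xlsx", ".txt"}
MAX_BYTES = 512 * 1024 * 1024  # the 512 MB per-file limit, in bytes

def check_upload(file_path):
    """Return a list of problems that would block uploading `file_path`."""
    problems = []
    ext = os.path.splitext(file_path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(file_path):
        problems.append("file not found")
    elif os.path.getsize(file_path) > MAX_BYTES:
        problems.append("file exceeds 512 MB")
    return problems
```

An empty list means the file passed these local checks; the API may still reject it for other reasons.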

## Cleanup Function

In [None]:
def cleanup_resources(file_ids=None):
    """Clean up uploaded and generated files"""
    if file_ids:
        for file_id in file_ids:
            try:
                client.files.delete(file_id)
                print(f"Deleted file: {file_id}")
            except Exception as e:
                print(f"Error deleting file: {e}")
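`cleanup_resources` removes files but leaves the thread and assistant in place. The sketch below also deletes those, using the SDK's `files.delete`, `beta.threads.delete`, and `beta.assistants.delete` methods, and it collects failures instead of printing them. `cleanup_session` is a hypothetical name.

```python
def cleanup_session(client, file_ids=(), thread_id=None, assistant_id=None):
    """Delete files, a thread, and an assistant; return any failures.

    Each deletion is attempted independently, so one failure does not
    stop the rest. Failures come back as (resource_id, error message).
    """
    deletions = [(client.files.delete, fid) for fid in file_ids]
    if thread_id:
        deletions.append((client.beta.threads.delete, thread_id))
    if assistant_id:
        deletions.append((client.beta.assistants.delete, assistant_id))

    failures = []
    for delete, resource_id in deletions:
        try:
            delete(resource_id)
        except Exception as exc:  # broad on purpose: record and keep going
            failures.append((resource_id, str(exc)))
    return failures
```

For the session above, this would be `cleanup_session(client, [file_id], thread_id=session.id, assistant_id=analysis_assistant.id)`.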

## Conclusion

We've built a comprehensive data analysis system using OpenAI's Assistants API with Code Interpreter. The system can:
- Handle data file uploads and downloads
- Perform complex analyses
- Generate visualizations
- Create detailed reports

Try experimenting with different types of analyses and data files to explore the full capabilities of Code Interpreter. Remember to handle your files and API resources responsibly in production environments.