Summary
Currently, the execution result tools in `jupyter_mcp_server/server.py` (e.g., `append_execute_code_cell`, `execute_cell_with_progress`, etc.) return text output only, showing a `[Image Output (PNG)]` placeholder for image outputs. This proposal adds full multimodal output support so that the visual understanding capabilities of state-of-the-art multimodal large models can be fully leveraged.
Core Objectives
- Unleash the Potential of Multimodal Large Models: Support direct output and analysis of visual content such as images and charts.
- Enhance User Experience: Allow the Agent to directly "see" and understand data visualization results.
- Maintain Backward Compatibility: Ensure that LLMs that do not support multimodality can still work correctly through environment variable control.
- Unify Output Format: Provide consistent multimodal support for all execution result tools.
Analysis of Current Problems
Affected Functions (5)
Functions that currently return execution results but do not support multimodality:
```python
append_execute_code_cell(cell_source: str) -> list[str]
insert_execute_code_cell(cell_index: int, cell_source: str) -> list[str]
execute_cell_with_progress(cell_index: int, timeout_seconds: int = 300) -> list[str]
execute_cell_simple_timeout(cell_index: int, timeout_seconds: int = 300) -> list[str]
execute_cell_streaming(cell_index: int, timeout_seconds: int = 300, progress_interval: int = 5) -> list[str]
```
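Under this proposal, each of those signatures would widen from `list[str]` to `list[Union[str, Image]]`. A sketch of the new shape (the tool body here is purely illustrative, and `Image` is a stand-in for the actual MCP image content type):

```python
from typing import Union


class Image:
    """Stand-in for the MCP image content type (assumption)."""

    def __init__(self, data: bytes, format: str):
        self.data = data
        self.format = format


async def append_execute_code_cell(cell_source: str) -> list[Union[str, Image]]:
    """Illustrative body only: the real tool executes the cell in the kernel
    and collects its outputs; here we just show the widened return shape."""
    return [cell_source, Image(data=b"\x89PNG", format="image/png")]
```

The key point is that callers must now be prepared to receive a heterogeneous list mixing strings and image objects.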
Limitations of Current Output Processing
In `jupyter_mcp_server/utils.py`:

```python
elif "image/png" in data:
    return "[Image Output (PNG)]"  # Only returns placeholder text
```
Proposed implementation:

```python
elif ("image/png" in output['data']) and ALLOW_IMG:
    raw_image_data = base64.b64decode(output['data']['image/png'])
    processed_image_data = self._preprocess_image(raw_image_data)
    return Image(data=processed_image_data, format="image/png")  # Returns the actual image object
```
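A fuller sketch of how this branch could slot into the output-processing helper. The function name `extract_output`, the module-level `ALLOW_IMG` flag, and the `Image` class are all assumptions for illustration; in practice `Image` would come from the MCP SDK and preprocessing is omitted:

```python
import base64

# Assumption: derived once at startup from the ALLOW_IMG_OUTPUT environment variable
ALLOW_IMG = True


class Image:
    """Stand-in for the MCP image content type (assumption)."""

    def __init__(self, data: bytes, format: str):
        self.data = data
        self.format = format


def extract_output(output: dict):
    """Convert one Jupyter output dict into text or an Image, honoring the flag."""
    data = output.get("data", {})
    if "image/png" in data:
        if ALLOW_IMG:
            raw = base64.b64decode(data["image/png"])
            return Image(data=raw, format="image/png")
        return "[Image Output (PNG) - Image display disabled]"
    if "text/plain" in data:
        return data["text/plain"]
    return ""
```

With the flag disabled, the same input yields the descriptive placeholder string, which is what gives text-only LLMs a graceful fallback.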
Environment Variable Configuration
```json
{
  "mcpServers": {
    "jupyter": {
      ...
      "env": {
        ...
        "ALLOW_IMG_OUTPUT": "true"
      }
    }
  }
}
```
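On the server side, the flag could be read once at startup. A minimal sketch, assuming the variable name from the config above and the `true` default described in the compatibility section (the helper name is hypothetical):

```python
import os


def img_output_enabled() -> bool:
    """Read ALLOW_IMG_OUTPUT, defaulting to enabled for out-of-the-box behavior."""
    return os.getenv("ALLOW_IMG_OUTPUT", "true").strip().lower() in ("1", "true", "yes")
```

Accepting a few truthy spellings keeps the flag forgiving of minor config variations while still defaulting to enabled when unset.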
Output Example
Current Output (Text Only)

```python
# Execute code containing a matplotlib chart
result = await execute_cell_with_progress(2)
print(result)
# ['import matplotlib.pyplot as plt\nplt.plot([1,2,3,4])\nplt.show()', '[Image Output (PNG)]']
```

Improved Output (Multimodal)

```python
# Enable image output
result = await execute_cell_with_progress(2)
print(result)
# [
#     'import matplotlib.pyplot as plt\nplt.plot([1,2,3,4])\nplt.show()',
#     Image(data=b'...', format='image/png')  # Actual image object
# ]

# Disable image output (compatibility mode)
result = await execute_cell_with_progress(2)
print(result)
# [
#     'import matplotlib.pyplot as plt\nplt.plot([1,2,3,4])\nplt.show()',
#     '[Image Output (PNG) - Image display disabled]'
# ]
```
Typical Use Cases
Data Visualization Analysis
```python
# The Agent can "see" and analyze charts
await append_execute_code_cell("""
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('sales_data.csv')
df.plot(kind='bar', x='month', y='revenue')
plt.title('Monthly Revenue')
plt.show()
""")
# Returns: ['DataFrame plotted successfully', Image(...)]
# The Agent can now understand the chart content and provide insights based on visuals
```
Machine Learning Model Visualization
```python
# The Agent can analyze model training curves
await execute_cell_with_progress(5)  # Cell containing a loss function chart
# Returns: ['Training completed', Image(...)]
# The Agent can evaluate training progress and suggest optimization strategies
```
Compatibility Considerations
Backward Compatibility Guarantee
- Enabled by Default: `ALLOW_IMG_OUTPUT=true` ensures that the new feature is available out of the box.
- Graceful Degradation: LLMs that do not support multimodality will still receive text descriptions.
- Error Handling: Automatically degrades to text output when image processing fails.
- Type Safety: Uses `Union[str, Image]` so that type checking passes.
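The error-handling bullet could be realized with a small wrapper that falls back to placeholder text whenever decoding fails. A sketch using the same assumed `Image` stand-in (the function name and fallback message are illustrative):

```python
import base64
from typing import Union


class Image:
    """Stand-in for the MCP image content type (assumption)."""

    def __init__(self, data: bytes, format: str):
        self.data = data
        self.format = format


def safe_image_output(b64_png: str) -> Union[str, Image]:
    """Decode a base64 PNG payload, degrading to a text placeholder on any failure."""
    try:
        raw = base64.b64decode(b64_png, validate=True)
        return Image(data=raw, format="image/png")
    except Exception:
        return "[Image Output (PNG) - decoding failed, image omitted]"
```

Because the fallback is an ordinary string, a failed image never breaks the `list[Union[str, Image]]` contract; the Agent simply sees a description instead of the picture.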
Additional Notes
This improvement proposal is inspired by Anthropic's article on writing effective tools for AI Agents. The article emphasizes:
> Tools should return meaningful contextual information to the Agent. Tool implementations should prioritize contextual relevance over flexibility, avoiding the return of low-level technical identifiers.
Multimodal output support aligns with this principle:
- ✅ Provides Rich Context: Images contain more information than text descriptions.
- ✅ Supports Advanced Reasoning: Leverages the visual understanding capabilities of the latest multimodal models.
- ✅ Enhances Interactive Experience: The Agent can perform in-depth analysis of visual content.
- ✅ Maintains Flexibility: Supports LLMs with different capabilities through configuration.
With the widespread adoption of multimodal large models such as Claude 4 and Gemini 2.5 Pro, adding visual output support to Agent tools has become essential for fully realizing their potential. This improvement will significantly enhance the utility of the Jupyter MCP Server in visually intensive scenarios such as data science and machine learning.