Architecture
The MATLAB MCP Server is a layered system that bridges AI agents to MATLAB execution through a multi-component stack designed for scalability, security, and seamless async job handling.
graph TB
Agent["AI Agent<br/>(Claude, Cursor, etc.)"]
Agent -->|MCP Protocol<br/>stdio or SSE| FastMCP["MCP Server Layer<br/>FastMCP + Tool Registry"]
FastMCP -->|Tool Calls| Tools["Tool Implementation Layer<br/>20 Built-in Tools<br/>+ Custom Tools"]
Tools -->|Code Execution| Executor["Job Execution Layer<br/>Hybrid Sync/Async<br/>Timeout Promotion<br/>Progress Tracking"]
Executor -->|Engine Acquire/Release| PoolMgr["Engine Pool Manager<br/>Elastic Scaling<br/>Health Checks<br/>Proactive Warmup"]
PoolMgr -->|Execute Code| Engines["MATLAB Engine Pool<br/>Engine 1..N<br/>R2022b+"]
Executor -->|Job State| Tracker["Job Tracker<br/>In-Memory Registry<br/>Status & Results"]
Tools -->|Session Isolation| Sessions["Session Manager<br/>Per-User Temp Dirs<br/>Workspace Cleanup"]
Tools -->|Pre-Execution Check| Security["Security Validator<br/>Function Blocklist<br/>Filename Sanitization"]
Tools -->|Result Formatting| Formatter["Result Formatter<br/>Text/Variable/Plot<br/>Truncation"]
Formatter -->|Figure Conversion| Plotly["Plotly Converter<br/>MATLAB→Interactive JSON<br/>Static PNG Generation"]
FastMCP -->|Health/Metrics| Monitor["Monitoring System<br/>MetricsCollector<br/>MetricsStore<br/>Dashboard UI"]
MCP Server Layer
Responsibilities:
- FastMCP server setup and tool registration (20 built-in + custom tools)
- Server lifecycle management (startup, graceful shutdown, resource draining)
- Context and session routing for stdio vs. SSE transports
- Background task orchestration (health checks, cleanup, metrics sampling)
- Lifespan management with proper exception handling and cleanup order
Key Design Decisions:
- Uses FastMCP as the MCP protocol handler (abstracts away protocol complexity)
- Separates server state (MatlabMCPServer class) from the actual MCP instance
- Provides context helpers (_get_session_id(), _get_temp_dir()) to abstract transport differences
Tool Implementation Layer
Core Tools (core.py):
- execute_code: Run MATLAB code, with security validation before delegation to the executor
- check_code: Lint code via MATLAB's checkcode, parse JSON output
- get_workspace: Retrieve current workspace variables via the whos command
File Management (files.py):
- upload_data: Decode base64, write to session temp dir with size/filename validation
- delete_file: Remove files (path-traversal protected)
- list_files: Directory enumeration with metadata
- read_script, read_image, read_data: Format-aware file readers
Discovery (discovery.py):
- list_toolboxes: Run ver, filter by whitelist/blacklist config
- list_functions: Run help <toolbox> with injection prevention
- get_help: Retrieve function help text
Job Management (jobs.py):
- get_job_status: Query job tracker, read progress file if running
- get_job_result: Return completed/failed job result
- cancel_job: Cancel pending/running job via future
- list_jobs: Session-scoped job enumeration
Admin & Monitoring (admin.py, monitoring.py):
- get_pool_status: Engine pool utilization snapshot
- get_server_metrics: Aggregated performance metrics
- get_server_health: Overall health classification (healthy/degraded/unhealthy)
- get_error_log: Recent error events with aggregation
Custom Tools (custom.py):
- Load tool definitions from YAML (custom_tools.yaml)
- Generate typed async handlers with proper inspect.Signature for FastMCP introspection
- Marshal parameters and delegate to executor
Key Design Decisions:
- Each tool is a pure async function with standard signature
- Security validation happens at tool level (not deeper)
- Custom tools use Pydantic for parameter validation
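The handler-generation step for custom tools can be sketched as follows. This is a minimal illustration and not the project's code: `make_handler`, `fake_executor`, and the `smooth_signal` tool are hypothetical names, and YAML loading and Pydantic validation are left out. It only shows how attaching an `inspect.Signature` to a generic async function makes the declared parameters visible to introspection.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable


def make_handler(name: str, params: dict[str, type],
                 run: Callable[[str, dict[str, Any]], Awaitable[Any]]):
    """Build an async handler whose visible signature matches `params`."""

    async def handler(**kwargs: Any) -> Any:
        # Bind against the declared signature so missing or unknown
        # parameter names fail before anything is executed.
        bound = handler.__signature__.bind(**kwargs)
        return await run(name, dict(bound.arguments))

    handler.__name__ = name
    handler.__signature__ = inspect.Signature(
        [inspect.Parameter(p, inspect.Parameter.KEYWORD_ONLY, annotation=t)
         for p, t in params.items()]
    )
    return handler


async def fake_executor(tool: str, args: dict[str, Any]) -> str:
    # Hypothetical stand-in for delegating to the real executor.
    return f"{tool}({args})"


smooth = make_handler("smooth_signal", {"window": int, "method": str}, fake_executor)
sig_text = str(inspect.signature(smooth))
result = asyncio.run(smooth(window=5, method="median"))
```

Because `inspect.signature()` honors `__signature__`, a framework that introspects the handler sees `window` and `method` rather than a bare `**kwargs`.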
Job Execution Layer
Responsibilities:
- Orchestrate full job lifecycle: create → acquire engine → inject context → execute → store result
- Hybrid sync/async: complete synchronously within timeout, auto-promote to async if exceeded
- Workspace injection: set __mcp_job_id__ and __mcp_temp_dir__ for agent scripts
- Error handling: graceful capture of stdout/stderr, structured error formatting
- Metrics integration: record execution times, completion/failure events
Execution Flow:
- Security validator checks code for blocked functions
- Job created in tracker (PENDING state)
- Engine acquired from pool
- Job context injected into workspace
- Code executed synchronously
- Timeout Decision:
  - If execution completes within sync_timeout (default 30s): return result immediately
  - If timeout exceeded: promote to async and return job_id (the engine stays busy and is released once the background job finishes)
- Async background task monitors completion, updates job status
- Engine released
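The timeout decision in this flow can be sketched with `concurrent.futures` alone. This is a simplified illustration under stated assumptions: `run_hybrid` and the `on_done` callback are hypothetical names, a thread pool stands in for the MATLAB engine, and job tracking and engine release are reduced to a single callback.

```python
import concurrent.futures
import time

_workers = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def run_hybrid(fn, sync_timeout: float, job_id: str, on_done) -> dict:
    """Wait up to sync_timeout; past that, hand the future to a callback."""
    future = _workers.submit(fn)
    try:
        output = future.result(timeout=sync_timeout)
    except concurrent.futures.TimeoutError:
        # Promotion: the work keeps running in the background, and the
        # caller gets a job_id back instead of blocking any longer.
        future.add_done_callback(lambda f: on_done(job_id, f.result()))
        return {"status": "running", "job_id": job_id}
    return {"status": "completed", "output": output}


finished = {}
fast = run_hybrid(lambda: 42, 1.0, "j-fast", finished.__setitem__)
slow = run_hybrid(lambda: (time.sleep(0.3), "done")[1], 0.05, "j-slow",
                  finished.__setitem__)
_workers.shutdown(wait=True)  # demo only: wait for the promoted job to finish
```

The caller never declares up front whether a job is long-running; the same call returns either an inline result or a job_id, depending only on how long the work takes.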
Key Design Decisions:
- _safe_serialize() converts arbitrary Python objects to JSON-serializable forms (handles numpy arrays, dataclasses, etc.)
- _inject_job_context() sets workspace variables safely, catching exceptions
- Uses concurrent.futures.Future for background task management
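A serializer in the spirit of `_safe_serialize()` might look like the sketch below. It is an assumption-laden illustration, not the server's implementation: the function name `safe_serialize` and the `Point` dataclass are hypothetical, and array-like objects are detected by duck typing on `tolist()` (which numpy arrays and scalars provide) so the example runs without numpy installed.

```python
import dataclasses
import json
from typing import Any


def safe_serialize(obj: Any) -> Any:
    """Convert an arbitrary object into something json.dumps accepts."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: safe_serialize(getattr(obj, f.name))
                for f in dataclasses.fields(obj)}
    if isinstance(obj, dict):
        return {str(k): safe_serialize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [safe_serialize(v) for v in obj]
    if hasattr(obj, "tolist"):  # numpy arrays and scalars expose tolist()
        return safe_serialize(obj.tolist())
    return repr(obj)  # last resort: fall back to a string, never raise


@dataclasses.dataclass
class Point:
    x: int
    y: tuple


payload = safe_serialize({"p": Point(1, (2, 3)), "z": 1 + 2j})
encoded = json.dumps(payload)
```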
Engine Pool Manager
Pool Manager Responsibilities:
- Elastic Scaling: Start with min_engines, grow to max_engines on demand
- Proactive Warmup: When utilization exceeds proactive_warmup_threshold (80%), pre-start a new engine
- Scale-Down: Stop idle engines after scale_down_idle_timeout (15 min), down to minimum
- Health Checks: Periodic 1+1 eval; replace unresponsive engines
- Request Queueing: Async queue for jobs waiting on busy engines
- Deferred Cleanup: Engines marked for stop after current job completes
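The scaling rules above can be expressed as one pure decision function. The sketch below is illustrative only: `PoolSnapshot` and `plan` are hypothetical names, the default limits of 2 and 8 engines are made up for the example, and the 80% threshold and 15-minute idle timeout are the values quoted in the list.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    busy: int                   # engines currently executing a job
    idle_seconds: list[float]   # one entry per idle engine


def plan(snapshot: PoolSnapshot, min_engines: int = 2, max_engines: int = 8,
         warmup_threshold: float = 0.8, idle_timeout: float = 900.0):
    """Return ("start", n), ("stop", n) or ("hold", 0) for the current pool."""
    total = snapshot.busy + len(snapshot.idle_seconds)
    utilization = snapshot.busy / total if total else 1.0
    # Proactive warmup: add capacity before requests start queueing.
    if utilization > warmup_threshold and total < max_engines:
        return ("start", 1)
    # Scale-down: stop long-idle engines, but never drop below the minimum.
    expired = sum(1 for s in snapshot.idle_seconds if s >= idle_timeout)
    stoppable = min(expired, total - min_engines)
    if stoppable > 0:
        return ("stop", stoppable)
    return ("hold", 0)


warm = plan(PoolSnapshot(busy=5, idle_seconds=[10.0]))
shrink = plan(PoolSnapshot(busy=1, idle_seconds=[1000.0, 1200.0, 5.0]))
floor = plan(PoolSnapshot(busy=1, idle_seconds=[1000.0]))
```

Keeping the decision free of side effects makes it easy to unit-test separately from engine startup, which is slow.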
Engine Wrapper Responsibilities:
- Lifecycle: Start/stop engine, track state (STOPPED → STARTING → IDLE ↔ BUSY)
- Execution: Synchronous and background (async) code execution
- Workspace Management: Optional full reset between jobs
- Health Ping: Quick responsiveness check
- Path Management: addpath() support for custom MATLAB paths
Engine States:
- STOPPED: Engine not running
- STARTING: Startup in progress
- IDLE: Ready to accept jobs
- BUSY: Currently executing
- ERROR: Unresponsive or crashed (will be replaced)
Key Design Decisions:
- Lazy loading of matlab.engine module (enables test mocking without MATLAB installed)
- Thread-safe state machine with explicit state transitions
- Health checks use trivial eval (1+1) to avoid overhead
- Scale-down considers engine age and idle time to prevent thrashing
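A thread-safe state machine with explicit transitions could be sketched as below. The state names come from the list above; the class name `EngineStateMachine` and the exact transition table are assumptions, in particular the edges into and out of ERROR and the IDLE to STOPPED edge, which the text does not spell out.

```python
import threading
from enum import Enum


class EngineState(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    IDLE = "idle"
    BUSY = "busy"
    ERROR = "error"


# Assumed transition table; only STOPPED -> STARTING -> IDLE <-> BUSY
# is stated explicitly in the text.
ALLOWED = {
    EngineState.STOPPED: {EngineState.STARTING},
    EngineState.STARTING: {EngineState.IDLE, EngineState.ERROR},
    EngineState.IDLE: {EngineState.BUSY, EngineState.STOPPED, EngineState.ERROR},
    EngineState.BUSY: {EngineState.IDLE, EngineState.ERROR},
    EngineState.ERROR: {EngineState.STOPPED},
}


class EngineStateMachine:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = EngineState.STOPPED

    @property
    def state(self) -> EngineState:
        with self._lock:
            return self._state

    def transition(self, new_state: EngineState) -> None:
        # Check and update under one lock so concurrent callers cannot
        # both observe IDLE and both move the engine to BUSY.
        with self._lock:
            if new_state not in ALLOWED[self._state]:
                raise ValueError(
                    f"illegal transition {self._state.name} -> {new_state.name}")
            self._state = new_state


engine = EngineStateMachine()
for step in (EngineState.STARTING, EngineState.IDLE,
             EngineState.BUSY, EngineState.IDLE):
    engine.transition(step)
```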
Job Tracker
Responsibilities:
- Job Registry: In-memory store for all jobs (active + historical)
- State Management: Enforce transitions (PENDING → RUNNING → COMPLETED/FAILED/CANCELLED)
- Session Filtering: List/prune jobs by session ID
- TTL-Based Cleanup: Remove expired jobs older than job_retention_seconds
- Metadata Storage: Track engine ID, result dict, error dict, timestamps, background future
Job Lifecycle:
- Job(session_id, code): Created PENDING
- mark_running(engine_id): Transitioned to RUNNING, timer starts
- mark_completed(result) or mark_failed(error) or mark_cancelled(): Terminal state, timer stops
- elapsed_seconds: Frozen at completion, immutable
Key Design Decisions:
- Dataclass for simplicity (no ORM overhead)
- Auto-generated job_id with j- prefix
- future field stores concurrent.futures.Future for async monitoring/cancellation
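The lifecycle above maps naturally onto a small dataclass. The following is a reduced sketch, not the actual class: it covers only the PENDING, RUNNING, and COMPLETED states, the private `_started`/`_finished` fields are assumptions, and the 8-character hex suffix after the `j-` prefix is an arbitrary choice for the example.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class Job:
    session_id: str
    code: str
    job_id: str = field(default_factory=lambda: "j-" + uuid.uuid4().hex[:8])
    status: JobStatus = JobStatus.PENDING
    engine_id: Optional[str] = None
    result: Optional[dict] = None
    _started: Optional[float] = field(default=None, repr=False)
    _finished: Optional[float] = field(default=None, repr=False)

    def mark_running(self, engine_id: str) -> None:
        if self.status is not JobStatus.PENDING:
            raise ValueError(f"cannot start a {self.status.value} job")
        self.status, self.engine_id = JobStatus.RUNNING, engine_id
        self._started = time.monotonic()  # timer starts

    def mark_completed(self, result: dict) -> None:
        if self.status is not JobStatus.RUNNING:
            raise ValueError(f"cannot complete a {self.status.value} job")
        self.status, self.result = JobStatus.COMPLETED, result
        self._finished = time.monotonic()  # timer stops

    @property
    def elapsed_seconds(self) -> float:
        if self._started is None:
            return 0.0
        end = self._finished if self._finished is not None else time.monotonic()
        return end - self._started


job = Job("session-1", "x = magic(3)")
job.mark_running("engine-0")
job.mark_completed({"output": "ok"})
frozen = job.elapsed_seconds
```

Using a monotonic clock for the timer keeps `elapsed_seconds` unaffected by wall-clock adjustments.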
Session Manager
Responsibilities:
- Per-User Isolation: Each session gets unique temp directory
- Lifecycle Management: Create, retrieve, destroy sessions with TTL
- Workspace Cleanup: Optional full reset between sessions
- Max Sessions Enforcement: Configurable limit with FIFO eviction
- Activity Tracking: Last-active timestamp for idle detection
Session Isolation Strategy:
- stdio transport: Single "default" session for the agent
- SSE transport: Per-client session identified by session_id
- Temp dir: {temp_dir}/session-{session_id}/ for file isolation
- Workspace: clear all on first execution if workspace_isolation=true
Key Design Decisions:
- Safe under concurrent coroutine access via asyncio locks
- Sessions auto-created on first request
- Idle sessions pruned asynchronously to avoid blocking
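Auto-creation on first request combined with FIFO eviction can be sketched with an `OrderedDict`. This is a simplified, synchronous illustration: `SessionRegistry` is a hypothetical name, no directories are created on disk, and TTL handling and locking are omitted.

```python
from collections import OrderedDict
from pathlib import Path


class SessionRegistry:
    """Maps session IDs to isolated temp directories, oldest evicted first."""

    def __init__(self, temp_root: str, max_sessions: int) -> None:
        self._root = Path(temp_root)
        self._max = max_sessions
        self._sessions: "OrderedDict[str, Path]" = OrderedDict()

    def get_or_create(self, session_id: str) -> Path:
        if session_id not in self._sessions:
            if len(self._sessions) >= self._max:
                # FIFO eviction: drop the session that was created first.
                self._sessions.popitem(last=False)
            self._sessions[session_id] = self._root / f"session-{session_id}"
        return self._sessions[session_id]

    def active(self) -> list:
        return list(self._sessions)


registry = SessionRegistry("/tmp/mcp", max_sessions=2)
dir_a = registry.get_or_create("a")
registry.get_or_create("b")
registry.get_or_create("c")  # evicts "a", the oldest session
```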
Security Validator
Pre-Execution Checks:
- Function Blocklist: Default blocks 11 dangerous functions (system, unix, dos, !, eval, feval, evalc, evalin, assignin, perl, python)
- Smart Scanning: Strips MATLAB string literals ('...', "...") and comments (%..., /*...*/) before matching to prevent false positives
- Filename Sanitization: Restricts to [a-zA-Z0-9._-], prevents ../ path traversal
- Upload Limits: Enforces max_upload_size_mb
BlockedFunctionError: Raised when blocked function detected; recorded as event in metrics
Key Design Decisions:
- Precompiled regex patterns for performance
- Blocklist is user-configurable (whitelist mode available)
- Filename sanitizer is stateless and reusable
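The scan-then-match approach can be sketched as follows. This is a simplified illustration rather than the validator itself: all function names are hypothetical, only `%` line comments are handled, and a single quote is treated as a transpose operator whenever it follows an identifier character or a closing bracket, which errs on the side of false positives rather than missed calls.

```python
import re

BLOCKED = ("system", "unix", "dos", "eval", "feval", "evalc",
           "evalin", "assignin", "perl", "python")
_CALL = re.compile(r"\b(" + "|".join(BLOCKED) + r")\b")
_SHELL = re.compile(r"^\s*!", re.MULTILINE)   # "!" shell escape at line start
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")
_TRANSPOSE_AFTER = set("_)]}.'")


def strip_literals_and_comments(code: str) -> str:
    out, prev, i, n = [], "", 0, len(code)
    while i < n:
        ch = code[i]
        if ch == "%":                      # line comment: skip to end of line
            while i < n and code[i] != "\n":
                i += 1
            continue
        transpose = ch == "'" and (prev.isalnum() or prev in _TRANSPOSE_AFTER)
        if ch in "'\"" and not transpose:  # start of a string literal
            i += 1
            while i < n and code[i] != "\n":
                if code[i] == ch:
                    if i + 1 < n and code[i + 1] == ch:  # doubled quote = escaped
                        i += 2
                        continue
                    i += 1
                    break
                i += 1
            out.append(ch * 2)             # leave an empty literal as placeholder
            prev = ch
            continue
        out.append(ch)
        if not ch.isspace():
            prev = ch
        i += 1
    return "".join(out)


def find_blocked(code: str) -> list:
    cleaned = strip_literals_and_comments(code)
    hits = sorted(set(_CALL.findall(cleaned)))
    if _SHELL.search(cleaned):
        hits.append("!")
    return hits


def sanitize_filename(name: str) -> str:
    # Path separators are outside the allowed set; dot-only names are rejected.
    if not _SAFE_NAME.fullmatch(name) or set(name) == {"."}:
        raise ValueError(f"unsafe filename: {name!r}")
    return name
```

For example, `find_blocked("disp('system call')  % eval later")` returns an empty list, while `find_blocked("y = x' + system('ls')")` still reports `system` because the quote after `x` is read as a transpose, not as the start of a string.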
Result Formatter
Responsibilities:
-
Text Formatting: Truncate output to
max_inline_text_length(default 50KB), optionally save excess to file - Variable Formatting: Detect type/size, elide large values, format as JSON
- Response Building: Construct standard MCP response dicts with status/output/variables/plots/error
- Delegation: Pass plots to Plotly converter, images to thumbnail generator
Output Handling:
- Short results: Inline in response
- Large results: Inline truncated + file URL
- Variables: JSON dict with type hints
- Plots: Plotly JSON + static PNG + optional thumbnail
Key Design Decisions:
- Stateless utility class
- Graceful fallback if Pillow unavailable (skip thumbnails)
- File saving is optional (default: save large results)
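Inline truncation can be as small as the sketch below. The helper name `truncate_text` and the marker text are hypothetical, the limit is measured in characters rather than bytes, and saving the full output to a file is left out.

```python
def truncate_text(text: str, max_len: int) -> tuple:
    """Return (inline_text, was_truncated) for a given inline length limit."""
    if len(text) <= max_len:
        return text, False
    omitted = len(text) - max_len
    return text[:max_len] + f"\n... [truncated {omitted} characters]", True


short_text, short_flag = truncate_text("ans = 4", 50)
long_text, long_flag = truncate_text("x" * 120, 100)
```

Returning the flag alongside the text lets the caller decide whether to attach a file reference for the full output.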
Plotly Converter (output/plotly_convert.py, output/plotly_style_mapper.py, matlab_helpers/mcp_extract_props.m)
Two-Part Conversion:
MATLAB Side (mcp_extract_props.m):
- Extract raw figure properties (line data, markers, colors, axes, legends, grid, ticks)
- Handle FastPlot objects for high-resolution data
- Detect layout type (single axes, subplots, tiled layout)
- Output JSON file with schema version
Python Side (plotly_style_mapper.py):
- Map MATLAB line styles (-, --, :, -.) to Plotly equivalents
- Convert MATLAB color names/RGB to CSS hex
- Handle marker styles (circle, square, diamond, etc.)
- Build Plotly traces per chart type (line, scatter, bar, histogram, surface, image)
- Support WebGL for 10,000+ data points
- Compute subplot domains (multi-axes layout)
Result:
- Interactive Plotly JSON (renderable in web UIs)
- Static PNG for email/chat
- Optional thumbnail (max width 400px)
Key Design Decisions:
- JSON file acts as intermediate format (decouples MATLAB from Python rendering)
- Separate converters per chart type for maintainability
- WebGL threshold avoids unnecessary GPU usage for small datasets
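The Python-side style translation can be illustrated with plain dictionaries. This is a sketch under assumptions: `to_hex` and `line_trace` are hypothetical helpers, only line traces are covered, and the trace is built as a plain dict so the example runs without the plotly package. The `dash` values and the `scatter`/`scattergl` trace types are standard Plotly names.

```python
LINE_STYLES = {"-": "solid", "--": "dash", ":": "dot", "-.": "dashdot"}

# MATLAB short color names as RGB triples in the 0..1 range.
COLOR_NAMES = {
    "r": (1, 0, 0), "g": (0, 1, 0), "b": (0, 0, 1), "c": (0, 1, 1),
    "m": (1, 0, 1), "y": (1, 1, 0), "k": (0, 0, 0), "w": (1, 1, 1),
}


def to_hex(color) -> str:
    """Convert a MATLAB color name or 0..1 RGB triple to a CSS hex string."""
    rgb = COLOR_NAMES[color] if isinstance(color, str) else color
    return "#" + "".join(
        f"{round(max(0.0, min(1.0, c)) * 255):02x}" for c in rgb)


def line_trace(x, y, style: str = "-", color="b",
               webgl_threshold: int = 10_000) -> dict:
    """Build a Plotly-style line trace, using WebGL only for large series."""
    return {
        "type": "scattergl" if len(x) >= webgl_threshold else "scatter",
        "mode": "lines",
        "x": list(x),
        "y": list(y),
        "line": {"dash": LINE_STYLES.get(style, "solid"),
                 "color": to_hex(color)},
    }


small = line_trace([0, 1, 2], [0, 1, 4], style="--", color="r")
large = line_trace(range(10_000), range(10_000), color=(0, 0.4470, 0.7410))
```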
Monitoring System
MetricsCollector (collector.py):
- Accumulate counters: jobs completed/failed/cancelled, sessions created, errors, health failures
- Maintain ring buffer of execution times (compute average and 95th percentile)
- Record events asynchronously (fire-and-forget)
- Sample system state periodically (pool utilization, job counts, memory/CPU)
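A ring buffer of execution times can be sketched with `collections.deque`. The class name `ExecutionTimes` and the default capacity are assumptions, and the percentile uses the nearest-rank method, which may differ from the collector's own choice.

```python
from collections import deque


class ExecutionTimes:
    """Fixed-size window of recent execution times, in seconds."""

    def __init__(self, capacity: int = 1000) -> None:
        self._samples = deque(maxlen=capacity)  # oldest samples fall off

    def record(self, seconds: float) -> None:
        self._samples.append(seconds)

    def average(self) -> float:
        return sum(self._samples) / len(self._samples) if self._samples else 0.0

    def p95(self) -> float:
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
        rank = (95 * len(ordered) + 99) // 100
        return ordered[rank - 1]


window = ExecutionTimes(capacity=5)
for seconds in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0):
    window.record(seconds)   # only the last five samples are kept
```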
MetricsStore (store.py):
- Async SQLite backend with WAL journaling
- Time-series tables for metrics and events
- Index on timestamp for fast historical queries
- Automatic schema creation on init
Dashboard (dashboard.py):
- Starlette sub-app serving /health, /metrics, /dashboard endpoints
- Real-time WebSocket-like updates via polling
- Pre-caches HTML on startup
Health Evaluation (health.py):
- Classify server as "healthy", "degraded", or "unhealthy"
- Based on: engine availability, error rates, health check failures
- Return detailed issue list
Key Design Decisions:
- Metrics are optional (disabled by default to reduce overhead)
- Store is persistent across restarts
- Dashboard uses Plotly.js for interactive charts
- Health status returned as HTTP codes (503 for unhealthy, 200 otherwise)
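The health classification and its HTTP mapping could look like the sketch below. The three inputs mirror the factors listed above, but the function name, the thresholds (10% and 50% error rates), and the rule that zero available engines is critical are assumptions made for illustration.

```python
def evaluate_health(available_engines: int, error_rate: float,
                    failed_health_checks: int,
                    degraded_error_rate: float = 0.1,
                    unhealthy_error_rate: float = 0.5) -> dict:
    """Classify server health and pick the HTTP status to report."""
    critical, warnings = [], []
    if available_engines == 0:
        critical.append("no engines available")
    if error_rate >= unhealthy_error_rate:
        critical.append(f"error rate {error_rate:.0%}")
    elif error_rate >= degraded_error_rate:
        warnings.append(f"error rate {error_rate:.0%}")
    if failed_health_checks > 0:
        warnings.append(f"{failed_health_checks} failed health checks")

    status = "unhealthy" if critical else "degraded" if warnings else "healthy"
    return {
        "status": status,
        "issues": critical + warnings,
        # 503 only for unhealthy; degraded still answers 200.
        "http_status": 503 if status == "unhealthy" else 200,
    }


ok = evaluate_health(available_engines=2, error_rate=0.0, failed_health_checks=0)
degraded = evaluate_health(available_engines=1, error_rate=0.2, failed_health_checks=1)
down = evaluate_health(available_engines=0, error_rate=0.0, failed_health_checks=3)
```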
sequenceDiagram
participant Agent
participant Server
participant Executor
participant Pool
participant Engine
participant Tracker
Agent->>Server: execute_code("x = magic(3)")
Server->>Server: Security check (OK)
Server->>Tracker: Create job (PENDING)
Server->>Executor: execute(session_id, code)
Executor->>Pool: acquire()
Pool->>Engine: Engine acquired (IDLE→BUSY)
Executor->>Engine: inject_job_context(__mcp_job_id__, etc.)
Executor->>Engine: eval(code)
Engine-->>Executor: result in 2.5s
Executor->>Executor: _safe_serialize(result)
Executor->>Tracker: mark_completed(result)
Executor->>Pool: release()
Pool->>Engine: Engine released (BUSY→IDLE)
Executor-->>Server: {status: completed, output: ...}
Server-->>Agent: MCP result (inline)
sequenceDiagram
participant Agent
participant Server
participant Executor
participant Pool
participant Engine
participant BgTask
participant Tracker
Agent->>Server: execute_code("long_simulation()")
Server->>Server: Security check (OK)
Server->>Tracker: Create job (PENDING)
Server->>Executor: execute(session_id, code)
Executor->>Pool: acquire()
Pool->>Engine: Engine acquired (IDLE→BUSY)
Executor->>Engine: eval(code, background=True)
Engine-->>Executor: MockFuture (code running in thread)
Executor->>Executor: Wait sync_timeout (30s)
Executor->>Executor: Timeout exceeded!
Executor->>Tracker: mark_running(engine_id)
Executor->>Tracker: Store future reference
Executor-->>Server: {status: running, job_id: abc123}
Server-->>Agent: MCP result (job_id)
Note over BgTask: Background task monitors future
Engine-->>BgTask: Code completes after 120s
BgTask->>Tracker: mark_completed(result)
BgTask->>Pool: release()
Pool->>Engine: Engine released (BUSY→IDLE)
Agent->>Server: get_job_result("abc123")
Server->>Tracker: Retrieve completed job
Server-->>Agent: Full result
sequenceDiagram
participant Agent
participant Server as Server/Tools
participant Security
participant Session
participant Executor
participant Engine
Agent->>Server: upload_data("data.csv", base64_content)
Server->>Security: sanitize_filename("data.csv")
Server->>Session: Get temp_dir for session
Server->>Session: Write file to temp_dir/data.csv
Server-->>Agent: {status: uploaded}
Agent->>Server: execute_code("T = readtable('data.csv');")
Server->>Security: Validate code (OK)
Server->>Executor: execute(session_id, code, temp_dir)
Executor->>Executor: Inject __mcp_temp_dir__ = temp_dir
Executor->>Engine: eval(code)
Engine-->>Executor: Table loaded, result
Executor-->>Server: {status: completed, output: ...}
Server-->>Agent: Result with table preview
sequenceDiagram
participant Agent
participant Server
participant Executor
participant Engine
participant PropsHelper as mcp_extract_props.m
participant Converter as plotly_style_mapper
participant Formatter
Agent->>Server: execute_code("plot(sin(0:0.1:2*pi))")
Server->>Executor: execute(...)
Executor->>Engine: eval(code)
Engine-->>Executor: Figure created (handle returned)
Executor->>PropsHelper: Call mcp_extract_props(fig_handle)
PropsHelper->>PropsHelper: Extract axes, lines, markers, colors, grid
PropsHelper-->>Executor: JSON file written to temp
Executor->>Formatter: Format result
Formatter->>Converter: Convert figure JSON to Plotly
Converter->>Converter: Map MATLAB styles to Plotly
Converter->>Converter: Build trace objects
Converter-->>Formatter: Plotly JSON + PNG
Formatter-->>Server: {plotly_figure: {...}, static_image_png: ...}
Server-->>Agent: Interactive Plotly + static PNG
Decision: Execute all code synchronously first; auto-promote to async if timeout exceeded.
Rationale:
- Simplifies agent logic (no need to pre-declare async)
- Most code completes quickly; async overhead only when needed
- Timeout-based promotion is transparent to agent
Trade-off:
- Engines are held during timeout window (blocks other requests)
- Mitigation: Configurable sync_timeout (default 30s is often adequate)
Decision: Scale engines on demand (min→max), pre-start when utilization high.
Rationale:
- Cost-effective for variable load
- Proactive warmup prevents request queuing under spike
- Health checks replace broken engines automatically
Trade-off:
- More engines = more memory (MATLAB engines are heavyweight, ~200MB each)
- Idle engines are stopped after timeout to recover memory
Decision: Each session/user gets isolated temp dir with clear all on switch.
Rationale:
- Prevents data leakage between users (multi-user SSE mode)
- Automatic cleanup on session expiry
- No state pollution
Trade-off:
- Startup cost of clear all (rebuild workspace)
- Mitigation: Configurable via workspace_isolation flag
Decision: Track jobs in memory with periodic TTL-based cleanup.
Rationale:
- Fast access (no DB latency)
- Simple implementation
- Sufficient for typical job lifetimes (hours)
Trade-off:
- Job history lost on restart
- Not suitable for extremely high job volumes (10k+ concurrent)
Decision: MATLAB side extracts figure properties to JSON; Python side converts to Plotly.
Rationale:
- Decouples MATLAB rendering from Python logic
- MATLAB side handles complex figure introspection (FastPlot, tiled layouts)
- Python side handles style translation (more maintainable)
Trade-off:
- Extra file I/O (JSON intermediate)
- Requires MATLAB helper script (mcp_extract_props.m)
Decision: Precompiled regex patterns for function/construct detection.
Rationale:
- Fast, stateless checks (no AST parsing)
- Smart literal stripping avoids false positives
- Configurable blocklists
Trade-off:
- Cannot detect obfuscated/indirect invocations (e.g., eval(eval('system')))
- Mitigation: Trusted environment assumed (agents are AI clients, not arbitrary users)
Decision: Metrics collection is disabled by default; opt-in via config.
Rationale:
- Zero overhead for resource-constrained deployments
- Reduces operational complexity
- Enabled in production for debugging
Trade-off:
- Dashboard unavailable if disabled
- Mitigation: Can be enabled at runtime via config reload
Decision: Use FastMCP library instead of custom MCP implementation.
Rationale:
- Handles MCP protocol complexity (SSE, stdio, JSON-RPC)
- Supports dynamic tool registration
- Active community maintenance
Trade-off:
- External dependency (but small and stable)
- Limited customization (but sufficient for use cases)
stdio Transport:
- Single session, single agent
- Communication via stdin/stdout
- No network overhead
- Simplest setup (local machine)
SSE Transport:
- Multiple sessions, multiple concurrent agents
- HTTP-based (remote-capable)
- Requires reverse proxy with authentication in production
- Session isolation enforced at manager level
| Component | Bottleneck | Mitigation |
|---|---|---|
| Engine Pool | MATLAB memory (~200MB/engine) | Set max_engines based on available RAM |
| Job Tracker | In-memory job list | Adjust job_retention_seconds to prune old jobs |
| Session Manager | Temp directory disk usage | Monitor {temp_dir} disk space; auto-cleanup |
| File Uploads | Network bandwidth | Set max_upload_size_mb appropriately |
| Monitoring DB | SQLite write throughput | Increase sample_interval (sample less often) if contention observed |
Agent ←→ [MCP Server] ←→ [Engine Pool] ←→ [MATLAB]
(stdio transport, 2-4 engines typical)
[Load Balancer]
↓
[MCP Server 1] ←→ [Shared Engine Pool] ←→ [MATLAB]
↑
[MCP Server 2] ←→ (or separate engines per server)
↑
[Agent 1, Agent 2, Agent 3]
(requires authentication, monitoring critical)