Architecture

System Overview

The MATLAB MCP Server is a layered system that connects AI agents to MATLAB. Each request passes through a tool layer, a job executor, and an elastic engine pool, with security validation, per-session isolation, and automatic promotion of long-running jobs to async execution.

graph TB
    Agent["AI Agent<br/>(Claude, Cursor, etc.)"]
    
    Agent -->|MCP Protocol<br/>stdio or SSE| FastMCP["MCP Server Layer<br/>FastMCP + Tool Registry"]
    
    FastMCP -->|Tool Calls| Tools["Tool Implementation Layer<br/>20 Built-in Tools<br/>+ Custom Tools"]
    
    Tools -->|Code Execution| Executor["Job Execution Layer<br/>Hybrid Sync/Async<br/>Timeout Promotion<br/>Progress Tracking"]
    
    Executor -->|Engine Acquire/Release| PoolMgr["Engine Pool Manager<br/>Elastic Scaling<br/>Health Checks<br/>Proactive Warmup"]
    
    PoolMgr -->|Execute Code| Engines["MATLAB Engine Pool<br/>Engine 1..N<br/>R2022b+"]
    
    Executor -->|Job State| Tracker["Job Tracker<br/>In-Memory Registry<br/>Status & Results"]
    
    Tools -->|Session Isolation| Sessions["Session Manager<br/>Per-User Temp Dirs<br/>Workspace Cleanup"]
    
    Tools -->|Pre-Execution Check| Security["Security Validator<br/>Function Blocklist<br/>Filename Sanitization"]
    
    Tools -->|Result Formatting| Formatter["Result Formatter<br/>Text/Variable/Plot<br/>Truncation"]
    
    Formatter -->|Figure Conversion| Plotly["Plotly Converter<br/>MATLAB→Interactive JSON<br/>Static PNG Generation"]
    
    FastMCP -->|Health/Metrics| Monitor["Monitoring System<br/>MetricsCollector<br/>MetricsStore<br/>Dashboard UI"]

Component Responsibilities

MCP Server Layer (`server.py`)

Responsibilities:

FastMCP server setup and tool registration (20 built-in + custom tools)
Server lifecycle management (startup, graceful shutdown, resource draining)
Context and session routing for stdio vs. SSE transports
Background task orchestration (health checks, cleanup, metrics sampling)
Lifespan management with proper exception handling and cleanup order

Key Design Decisions:

Uses FastMCP as the MCP protocol handler (abstracts away protocol complexity)
Separates server state (MatlabMCPServer class) from the actual MCP instance
Provides context helpers (_get_session_id(), _get_temp_dir()) to abstract transport differences

Tool Implementation Layer (`tools/`)

Core Tools (core.py):

execute_code — Run MATLAB code with security validation before delegation to executor
check_code — Lint code via MATLAB's checkcode, parse JSON output
get_workspace — Retrieve current workspace variables via whos command

File Management (files.py):

upload_data — Decode base64, write to session temp dir with size/filename validation
delete_file — Remove files (path-traversal protected)
list_files — Directory enumeration with metadata
read_script, read_image, read_data — Format-aware file readers
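
The decode-and-validate step of upload_data can be sketched as follows. This is a minimal illustration, not the server's code: the function name `decode_upload` and the 100 MB default are assumptions made for the example; only the `max_upload_size_mb` setting name comes from this page.

```python
import base64
import binascii


def decode_upload(content_b64: str, max_upload_size_mb: float = 100) -> bytes:
    """Decode a base64 payload and enforce a size cap before it is written.

    The default limit here is a placeholder, not the server's real default.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(content_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("content is not valid base64") from exc
    limit_bytes = int(max_upload_size_mb * 1024 * 1024)
    if len(data) > limit_bytes:
        raise ValueError(f"upload is {len(data)} bytes, limit is {limit_bytes}")
    return data
```

Checking the size after decoding measures the bytes that would land on disk rather than the larger base64 text.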

Discovery (discovery.py):

list_toolboxes — Run ver, filter by whitelist/blacklist config
list_functions — Run help <toolbox> with injection prevention
get_help — Retrieve function help text

Job Management (jobs.py):

get_job_status — Query job tracker, read progress file if running
get_job_result — Return completed/failed job result
cancel_job — Cancel pending/running job via future
list_jobs — Session-scoped job enumeration

Admin & Monitoring (admin.py, monitoring.py):

get_pool_status — Engine pool utilization snapshot
get_server_metrics — Aggregated performance metrics
get_server_health — Overall health classification (healthy/degraded/unhealthy)
get_error_log — Recent error events with aggregation

Custom Tools (custom.py):

Load tool definitions from YAML (custom_tools.yaml)
Generate typed async handlers with proper inspect.Signature for FastMCP introspection
Marshal parameters and delegate to executor
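
One way to give a generated handler an introspectable signature is to assign `__signature__`, which `inspect.signature()` honors. The sketch below assumes a simple parameter schema (`name` and `type` keys) and a hypothetical `run` callback standing in for the executor; the real YAML schema and FastMCP registration call are not shown on this page.

```python
import asyncio
import inspect

# Assumed mapping from YAML type names to Python types (illustrative only)
_TYPES = {"string": str, "number": float, "integer": int, "boolean": bool}


def make_handler(tool_name, params, run):
    """Build an async handler whose signature lists the declared parameters.

    params: list of {"name": ..., "type": ...} dicts, as might be loaded from YAML.
    run:    async callable (tool_name, kwargs) that performs the actual work.
    """
    async def handler(**kwargs):
        return await run(tool_name, kwargs)

    handler.__name__ = tool_name
    handler.__signature__ = inspect.Signature([
        inspect.Parameter(
            p["name"],
            inspect.Parameter.KEYWORD_ONLY,
            annotation=_TYPES[p["type"]],
        )
        for p in params
    ])
    return handler


async def echo_run(tool_name, kwargs):
    """Stand-in for the executor: just reports what it was asked to run."""
    return f"{tool_name}({kwargs})"


magic = make_handler("magic_square", [{"name": "n", "type": "integer"}], echo_run)
print(inspect.signature(magic))
print(asyncio.run(magic(n=3)))
```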

Key Design Decisions:

Each tool is a pure async function with standard signature
Security validation happens at tool level (not deeper)
Custom tools use Pydantic for parameter validation

Job Execution Layer (`jobs/executor.py`)

Responsibilities:

Orchestrate full job lifecycle: create → acquire engine → inject context → execute → store result
Hybrid sync/async: complete synchronously within timeout, auto-promote to async if exceeded
Workspace injection: set __mcp_job_id__ and __mcp_temp_dir__ for agent scripts
Error handling: graceful capture of stdout/stderr, structured error formatting
Metrics integration: record execution times, completion/failure events

Execution Flow:

1. Security validator checks code for blocked functions
2. Job created in tracker (PENDING state)
3. Engine acquired from pool
4. Job context injected into workspace
5. Code execution started on the engine; the executor waits up to sync_timeout
6. Timeout decision:
- If the code completes within sync_timeout (default 30s): return the result immediately
- If the timeout is exceeded: promote to async and return job_id; the engine stays assigned to the job
7. For promoted jobs, an async background task monitors completion and updates job status
8. Engine released once the job has finished
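
The timeout decision can be illustrated with `concurrent.futures`. This is a sketch under assumptions: a thread pool stands in for the engine pool, and `run_with_promotion` and `slow_job` are hypothetical names. Tracker updates and engine release are left out.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_workers = ThreadPoolExecutor(max_workers=4)  # stand-in for the engine pool


def run_with_promotion(fn, sync_timeout=30.0):
    """Run fn; return its result inline, or a job handle if it overruns."""
    future = _workers.submit(fn)
    try:
        result = future.result(timeout=sync_timeout)
    except FutureTimeout:
        # Still running: hand back a job id and keep the future for later polling.
        job_id = f"j-{uuid.uuid4().hex[:8]}"
        return {"status": "running", "job_id": job_id, "future": future}
    return {"status": "completed", "output": result}


def slow_job():
    """Simulates a long MATLAB call."""
    time.sleep(0.3)
    return "done"
```

Calling `run_with_promotion(slow_job, sync_timeout=0.05)` returns a `running` response almost immediately, while the work keeps going in the background and the stored future can be checked later.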

Key Design Decisions:

_safe_serialize() converts arbitrary Python objects to JSON-serializable forms (handles numpy arrays, dataclasses, etc.)
_inject_job_context() sets workspace variables safely, catching exceptions
Uses concurrent.futures.Future for background task management

Engine Pool Manager (`pool/manager.py`, `pool/engine.py`)

Pool Manager Responsibilities:

Elastic Scaling: Start with min_engines, grow to max_engines on demand
Proactive Warmup: When utilization exceeds proactive_warmup_threshold (80%), pre-start a new engine
Scale-Down: Stop idle engines after scale_down_idle_timeout (15 min), down to minimum
Health Checks: Periodic 1+1 eval; replace unresponsive engines
Request Queueing: Async queue for jobs waiting on busy engines
Deferred Cleanup: Engines marked for stop after current job completes
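
The scaling rules above reduce to two small predicates. The sketch below is illustrative only: `PoolSnapshot`, `should_warm_up`, and `can_scale_down` are hypothetical names, and the defaults mirror the 80% threshold and 15-minute idle timeout quoted in the list.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    total: int        # engines currently started
    busy: int         # engines executing a job
    min_engines: int
    max_engines: int


def should_warm_up(pool: PoolSnapshot, threshold: float = 0.8) -> bool:
    """True when utilization exceeds the threshold and the pool can still grow."""
    if pool.total >= pool.max_engines:
        return False
    if pool.total == 0:
        return True  # nothing started yet, so any demand needs an engine
    return pool.busy / pool.total > threshold


def can_scale_down(pool: PoolSnapshot, idle_seconds: float,
                   idle_timeout: float = 900.0) -> bool:
    """True when an engine has idled long enough and the pool is above its minimum."""
    return idle_seconds >= idle_timeout and pool.total > pool.min_engines
```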

Engine Wrapper Responsibilities:

Lifecycle: Start/stop engine, track state (STOPPED → STARTING → IDLE ↔ BUSY)
Execution: Synchronous and background (async) code execution
Workspace Management: Optional full reset between jobs
Health Ping: Quick responsiveness check
Path Management: addpath() support for custom MATLAB paths

Engine States:

STOPPED — Engine not running
STARTING — Startup in progress
IDLE — Ready to accept jobs
BUSY — Currently executing
ERROR — Unresponsive or crashed (will be replaced)
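
A thread-safe state machine over these states might look like the sketch below. The transition table is an assumption: the page only spells out STOPPED → STARTING → IDLE ↔ BUSY, so the edges into and out of ERROR, and IDLE → STOPPED for scale-down, are guesses made for illustration.

```python
import threading
from enum import Enum


class EngineState(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    IDLE = "idle"
    BUSY = "busy"
    ERROR = "error"


# Allowed moves; the ERROR and IDLE -> STOPPED edges are assumptions of this sketch
_ALLOWED = {
    EngineState.STOPPED: {EngineState.STARTING},
    EngineState.STARTING: {EngineState.IDLE, EngineState.ERROR},
    EngineState.IDLE: {EngineState.BUSY, EngineState.STOPPED, EngineState.ERROR},
    EngineState.BUSY: {EngineState.IDLE, EngineState.ERROR},
    EngineState.ERROR: {EngineState.STOPPED},
}


class EngineStateMachine:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = EngineState.STOPPED

    @property
    def state(self) -> EngineState:
        with self._lock:
            return self._state

    def transition(self, new_state: EngineState) -> None:
        """Move to new_state, rejecting any edge not in the table."""
        with self._lock:
            if new_state not in _ALLOWED[self._state]:
                raise ValueError(
                    f"illegal transition {self._state.name} -> {new_state.name}"
                )
            self._state = new_state
```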

Key Design Decisions:

Lazy loading of matlab.engine module (enables test mocking without MATLAB installed)
Thread-safe state machine with explicit state transitions
Health checks use trivial eval (1+1) to avoid overhead
Scale-down considers engine age and idle time to prevent thrashing

Job Tracker (`jobs/tracker.py`, `jobs/models.py`)

Responsibilities:

Job Registry: In-memory store for all jobs (active + historical)
State Management: Enforce transitions (PENDING → RUNNING → COMPLETED/FAILED/CANCELLED)
Session Filtering: List/prune jobs by session ID
TTL-Based Cleanup: Remove expired jobs older than job_retention_seconds
Metadata Storage: Track engine ID, result dict, error dict, timestamps, background future

Job Lifecycle:

Job(session_id, code) — Created PENDING
mark_running(engine_id) — Transitioned to RUNNING, timer starts
mark_completed(result) or mark_failed(error) or mark_cancelled() — Terminal state, timer stops
elapsed_seconds — Frozen at completion, immutable

Key Design Decisions:

Dataclass for simplicity (no ORM overhead)
Auto-generated job_id with j- prefix
future field stores concurrent.futures.Future for async monitoring/cancellation
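
The lifecycle and the frozen `elapsed_seconds` can be sketched with a plain dataclass. Field names beyond those listed above (`started_at`, `finished_at`) and the ID length are assumptions; failure and cancellation paths are omitted for brevity.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class Job:
    session_id: str
    code: str
    job_id: str = field(default_factory=lambda: f"j-{uuid.uuid4().hex[:12]}")
    status: JobStatus = JobStatus.PENDING
    engine_id: Optional[str] = None
    result: Optional[dict] = None
    started_at: Optional[float] = None   # monotonic clock, set by mark_running
    finished_at: Optional[float] = None  # set once, on reaching a terminal state

    def mark_running(self, engine_id: str) -> None:
        if self.status is not JobStatus.PENDING:
            raise ValueError(f"cannot start a job in state {self.status.name}")
        self.status = JobStatus.RUNNING
        self.engine_id = engine_id
        self.started_at = time.monotonic()

    def mark_completed(self, result: dict) -> None:
        if self.status is not JobStatus.RUNNING:
            raise ValueError(f"cannot complete a job in state {self.status.name}")
        self.status = JobStatus.COMPLETED
        self.result = result
        self.finished_at = time.monotonic()

    @property
    def elapsed_seconds(self) -> float:
        """Running time so far; stops changing once the job has finished."""
        if self.started_at is None:
            return 0.0
        end = self.finished_at if self.finished_at is not None else time.monotonic()
        return end - self.started_at
```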

Session Manager (`session/manager.py`)

Responsibilities:

Per-User Isolation: Each session gets unique temp directory
Lifecycle Management: Create, retrieve, destroy sessions with TTL
Workspace Cleanup: Optional full reset between sessions
Max Sessions Enforcement: Configurable limit with FIFO eviction
Activity Tracking: Last-active timestamp for idle detection

Session Isolation Strategy:

stdio transport: Single "default" session for the agent
SSE transport: Per-client session identified by session_id
Temp dir: {temp_dir}/session-{session_id}/ for file isolation
Workspace: clear all on first execution if workspace_isolation=true
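
A minimal sketch of the temp-dir layout and FIFO eviction, assuming an in-process registry keyed by session ID. `SessionRegistry` is a hypothetical name, and TTL handling, locking, and deletion of evicted directories are left out.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


class SessionRegistry:
    """Maps session IDs to isolated temp dirs, evicting the oldest when full."""

    def __init__(self, base_dir=None, max_sessions=10):
        self.base_dir = Path(base_dir or tempfile.gettempdir())
        self.max_sessions = max_sessions
        self._sessions = OrderedDict()  # session_id -> temp dir, oldest first

    def get_or_create(self, session_id: str) -> Path:
        if session_id in self._sessions:
            return self._sessions[session_id]
        if len(self._sessions) >= self.max_sessions:
            # FIFO eviction; a fuller version would also remove the directory
            self._sessions.popitem(last=False)
        temp_dir = self.base_dir / f"session-{session_id}"
        temp_dir.mkdir(parents=True, exist_ok=True)
        self._sessions[session_id] = temp_dir
        return temp_dir
```

With stdio transport the only key would be `"default"`; with SSE each client's session_id gets its own directory.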

Key Design Decisions:

Safe under concurrent requests via asyncio locks (which serialize coroutines on the event loop, not OS threads)
Sessions auto-created on first request
Idle sessions pruned asynchronously to avoid blocking

Security Validator (`security/validator.py`)

Pre-Execution Checks:

Function Blocklist: Default blocks 11 dangerous functions (system, unix, dos, !, eval, feval, evalc, evalin, assignin, perl, python)
Smart Scanning: Strips MATLAB string literals ('...', "...") and comments (%... line comments, %{...%} block comments) before matching to prevent false positives
Filename Sanitization: Restricts to [a-zA-Z0-9._-], prevents ../ path traversal
Upload Limits: Enforces max_upload_size_mb

BlockedFunctionError: Raised when blocked function detected; recorded as event in metrics
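
The scan-after-stripping idea can be sketched with three precompiled patterns. This is a simplified illustration, not the validator's code: it handles line comments only, omits the `!` shell escape, and treats every `'...'` pair as a string, so a MATLAB transpose operator can confuse it. The helper names `scan_code` and `is_safe_filename` are hypothetical.

```python
import re

BLOCKED = ("system", "unix", "dos", "eval", "feval", "evalc",
           "evalin", "assignin", "perl", "python")

# One alternation so that whichever construct starts first wins:
# '%d' inside a string is not a comment, and a quote inside a comment is not a string.
_SINGLE = r"'(?:[^'\n]|'')*'"
_DOUBLE = r'"(?:[^"\n]|"")*"'
_COMMENT = r"%[^\n]*"
_STRIP = re.compile("|".join((_SINGLE, _DOUBLE, _COMMENT)))

_BLOCKED_NAME = re.compile(r"\b(" + "|".join(map(re.escape, BLOCKED)) + r")\b")
_SAFE_FILENAME = re.compile(r"[A-Za-z0-9._-]+")


class BlockedFunctionError(Exception):
    """Raised when code references a function on the blocklist."""


def scan_code(code: str) -> None:
    """Raise BlockedFunctionError if a blocked name appears outside literals/comments."""
    stripped = _STRIP.sub(" ", code)
    match = _BLOCKED_NAME.search(stripped)
    if match:
        raise BlockedFunctionError(match.group(1))


def is_safe_filename(name: str) -> bool:
    """Allow only [A-Za-z0-9._-] and reject any '..' sequence."""
    return bool(_SAFE_FILENAME.fullmatch(name)) and ".." not in name
```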

Key Design Decisions:

Precompiled regex patterns for performance
Blocklist is user-configurable (whitelist mode available)
Filename sanitizer is stateless and reusable

Result Formatter (`output/formatter.py`)

Responsibilities:

Text Formatting: Truncate output to max_inline_text_length (default 50KB), optionally save excess to file
Variable Formatting: Detect type/size, elide large values, format as JSON
Response Building: Construct standard MCP response dicts with status/output/variables/plots/error
Delegation: Pass plots to Plotly converter, images to thumbnail generator

Output Handling:

Short results: Inline in response
Large results: Inline truncated + file URL
Variables: JSON dict with type hints
Plots: Plotly JSON + static PNG + optional thumbnail
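
As a rough sketch of the inline-truncation rule, assuming the limit is counted in characters (the page states 50KB, so the real unit may be bytes) and using a hypothetical `truncate_output` helper; saving the excess to a file is not shown.

```python
def truncate_output(text: str, max_inline_text_length: int = 50_000) -> dict:
    """Return the inline part of an output plus a flag describing what was cut."""
    if len(text) <= max_inline_text_length:
        return {"output": text, "truncated": False}
    return {
        "output": text[:max_inline_text_length],
        "truncated": True,
        "omitted_chars": len(text) - max_inline_text_length,
    }
```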

Key Design Decisions:

Stateless utility class
Graceful fallback if Pillow unavailable (skip thumbnails)
File saving is optional (default: save large results)

Plotly Converter (`output/plotly_convert.py`, `output/plotly_style_mapper.py`, `matlab_helpers/mcp_extract_props.m`)

Two-Part Conversion:

MATLAB Side (mcp_extract_props.m):

Extract raw figure properties (line data, markers, colors, axes, legends, grid, ticks)
Handle FastPlot objects for high-resolution data
Detect layout type (single axes, subplots, tiled layout)
Output JSON file with schema version

Python Side (plotly_style_mapper.py):

Map MATLAB line styles (-, --, :, -.) to Plotly equivalents
Convert MATLAB color names/RGB to CSS hex
Handle marker styles (circle, square, diamond, etc.)
Build Plotly traces per chart type (line, scatter, bar, histogram, surface, image)
Support WebGL for 10,000+ data points
Compute subplot domains (multi-axes layout)
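
The style translation can be sketched as a lookup table plus a color helper. The function names are hypothetical and the output is a plain dict in Plotly's trace schema rather than a `plotly` object; `scattergl` is Plotly's WebGL scatter trace type.

```python
# MATLAB LineStyle -> Plotly line.dash
LINE_STYLES = {"-": "solid", "--": "dash", ":": "dot", "-.": "dashdot"}
WEBGL_THRESHOLD = 10_000  # switch to the WebGL trace type at this many points


def rgb_to_hex(rgb) -> str:
    """Convert a MATLAB RGB triplet (floats in 0..1) to a CSS hex color."""
    r, g, b = (round(max(0.0, min(1.0, c)) * 255) for c in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"


def line_trace(x, y, style="-", color=(0.0, 0.4470, 0.7410)) -> dict:
    """Build a Plotly-style trace dict for one MATLAB line object."""
    x, y = list(x), list(y)
    return {
        "type": "scattergl" if len(x) >= WEBGL_THRESHOLD else "scatter",
        "mode": "lines",
        "x": x,
        "y": y,
        "line": {
            "dash": LINE_STYLES.get(style, "solid"),
            "color": rgb_to_hex(color),
        },
    }
```

The default color is MATLAB's first default line color, which maps to `#0072bd`.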

Result:

Interactive Plotly JSON (renderable in web UIs)
Static PNG for email/chat
Optional thumbnail (max width 400px)

Key Design Decisions:

JSON file acts as intermediate format (decouples MATLAB from Python rendering)
Separate converters per chart type for maintainability
WebGL threshold avoids unnecessary GPU usage for small datasets

Monitoring System (`monitoring/`)

MetricsCollector (collector.py):

Accumulate counters: jobs completed/failed/cancelled, sessions created, errors, health failures
Maintain ring buffer of execution times (compute average and 95th percentile)
Record events asynchronously (fire-and-forget)
Sample system state periodically (pool utilization, job counts, memory/CPU)
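
A bounded `deque` is one simple way to keep such a ring buffer. In the sketch below the class name, the capacity, and the nearest-rank percentile method are all assumptions; the collector may compute p95 differently.

```python
from collections import deque


class ExecutionTimes:
    """Keeps the most recent execution times and summarizes them."""

    def __init__(self, capacity: int = 1000):
        self._samples = deque(maxlen=capacity)  # oldest entries drop off automatically

    def record(self, seconds: float) -> None:
        self._samples.append(seconds)

    def average(self) -> float:
        return sum(self._samples) / len(self._samples) if self._samples else 0.0

    def p95(self) -> float:
        """95th percentile using the nearest-rank method."""
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
        return ordered[rank - 1]
```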

MetricsStore (store.py):

Async SQLite backend with WAL journaling
Time-series tables for metrics and events
Index on timestamp for fast historical queries
Automatic schema creation on init
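
The storage layout can be illustrated with the standard-library `sqlite3` module. Note the swap: the store described above is async, while this sketch is synchronous, and the table and column names are hypothetical.

```python
import sqlite3
import time


def open_store(path: str) -> sqlite3.Connection:
    """Open the database, enable WAL journaling, and create the schema if missing."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "ts REAL NOT NULL, name TEXT NOT NULL, value REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_ts ON metrics (ts)")
    conn.commit()
    return conn


def record_metric(conn, name: str, value: float, ts=None) -> None:
    conn.execute(
        "INSERT INTO metrics (ts, name, value) VALUES (?, ?, ?)",
        (time.time() if ts is None else ts, name, value),
    )
    conn.commit()


def metrics_since(conn, since_ts: float):
    """Return rows at or after since_ts, oldest first (served by the ts index)."""
    return conn.execute(
        "SELECT ts, name, value FROM metrics WHERE ts >= ? ORDER BY ts",
        (since_ts,),
    ).fetchall()
```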

Dashboard (dashboard.py):

Starlette sub-app serving /health, /metrics, /dashboard endpoints
Near-real-time updates via polling rather than a WebSocket connection
Pre-caches HTML on startup

Health Evaluation (health.py):

Classify server as "healthy", "degraded", or "unhealthy"
Based on: engine availability, error rates, health check failures
Return detailed issue list

Key Design Decisions:

Metrics are optional (disabled by default to reduce overhead)
Store is persistent across restarts
Dashboard uses Plotly.js for interactive charts
Health status returned as HTTP codes (503 for unhealthy, 200 otherwise)
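
The classification and its HTTP mapping can be sketched as below. The thresholds and the rule that only "no engines available" counts as unhealthy are assumptions for illustration; the page names the inputs and the three labels but not the exact rules.

```python
def classify_health(engines_available: int, error_rate: float,
                    health_check_failures: int, max_error_rate: float = 0.2):
    """Return (status, issues) from a few pool and error indicators."""
    issues = []
    if engines_available == 0:
        issues.append("no engines available")
    if error_rate > max_error_rate:
        issues.append(f"error rate {error_rate:.0%} above {max_error_rate:.0%}")
    if health_check_failures > 0:
        issues.append(f"{health_check_failures} failed health checks")

    if engines_available == 0:
        status = "unhealthy"
    elif issues:
        status = "degraded"
    else:
        status = "healthy"
    return status, issues


def http_status(status: str) -> int:
    """503 for unhealthy, 200 otherwise, as described above."""
    return 503 if status == "unhealthy" else 200
```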

Data Flow Diagrams

Synchronous Execution (Complete in Timeout)

sequenceDiagram
    participant Agent
    participant Server
    participant Executor
    participant Pool
    participant Engine
    participant Tracker

    Agent->>Server: execute_code("x = magic(3)")
    Server->>Server: Security check (OK)
    Server->>Tracker: Create job (PENDING)
    Server->>Executor: execute(session_id, code)
    Executor->>Pool: acquire()
    Pool->>Engine: Engine acquired (IDLE→BUSY)
    Executor->>Engine: inject_job_context(__mcp_job_id__, etc.)
    Executor->>Engine: eval(code)
    Engine-->>Executor: result in 2.5s
    Executor->>Executor: _safe_serialize(result)
    Executor->>Tracker: mark_completed(result)
    Executor->>Pool: release()
    Pool->>Engine: Engine released (BUSY→IDLE)
    Executor-->>Server: {status: completed, output: ...}
    Server-->>Agent: MCP result (inline)

Async Promotion (Timeout Exceeded)

sequenceDiagram
    participant Agent
    participant Server
    participant Executor
    participant Pool
    participant Engine
    participant BgTask
    participant Tracker

    Agent->>Server: execute_code("long_simulation()")
    Server->>Server: Security check (OK)
    Server->>Tracker: Create job (PENDING)
    Server->>Executor: execute(session_id, code)
    Executor->>Pool: acquire()
    Pool->>Engine: Engine acquired (IDLE→BUSY)
    Executor->>Engine: eval(code, background=True)
    Engine-->>Executor: Future (code running in background)
    Executor->>Executor: Wait sync_timeout (30s)
    Executor->>Executor: Timeout exceeded!
    Executor->>Tracker: mark_running(engine_id)
    Executor->>Tracker: Store future reference
    Executor-->>Server: {status: running, job_id: j-abc123}
    Server-->>Agent: MCP result (job_id)
    Note over BgTask: Background task monitors future
    Engine-->>BgTask: Code completes after 120s
    BgTask->>Tracker: mark_completed(result)
    BgTask->>Pool: release()
    Pool->>Engine: Engine released (BUSY→IDLE)
    Agent->>Server: get_job_result("j-abc123")
    Server->>Tracker: Retrieve completed job
    Server-->>Agent: Full result

File Upload & Execution

sequenceDiagram
    participant Agent
    participant Server as Server/Tools
    participant Security
    participant Session
    participant Executor
    participant Engine

    Agent->>Server: upload_data("data.csv", base64_content)
    Server->>Security: sanitize_filename("data.csv")
    Server->>Session: Get temp_dir for session
    Server->>Session: Write file to temp_dir/data.csv
    Server-->>Agent: {status: uploaded}
    
    Agent->>Server: execute_code("T = readtable('data.csv');")
    Server->>Security: Validate code (OK)
    Server->>Executor: execute(session_id, code, temp_dir)
    Executor->>Executor: Inject __mcp_temp_dir__ = temp_dir
    Executor->>Engine: eval(code)
    Engine-->>Executor: Table loaded, result
    Executor-->>Server: {status: completed, output: ...}
    Server-->>Agent: Result with table preview

Plot Generation & Conversion

sequenceDiagram
    participant Agent
    participant Server
    participant Executor
    participant Engine
    participant PropsHelper as mcp_extract_props.m
    participant Converter as plotly_style_mapper
    participant Formatter

    Agent->>Server: execute_code("plot(sin(0:0.1:2*pi))")
    Server->>Executor: execute(...)
    Executor->>Engine: eval(code)
    Engine-->>Executor: Figure created (handle returned)
    Executor->>PropsHelper: Call mcp_extract_props(fig_handle)
    PropsHelper->>PropsHelper: Extract axes, lines, markers, colors, grid
    PropsHelper-->>Executor: JSON file written to temp
    Executor->>Formatter: Format result
    Formatter->>Converter: Convert figure JSON to Plotly
    Converter->>Converter: Map MATLAB styles to Plotly
    Converter->>Converter: Build trace objects
    Converter-->>Formatter: Plotly JSON + PNG
    Formatter-->>Server: {plotly_figure: {...}, static_image_png: ...}
    Server-->>Agent: Interactive Plotly + static PNG

Key Design Decisions & Trade-Offs

1. Hybrid Sync/Async Execution

Decision: Execute all code synchronously first; auto-promote to async if timeout exceeded.

Rationale:

Simplifies agent logic (no need to pre-declare async)
Most code completes quickly; async overhead only when needed
Timeout-based promotion is transparent to agent

Trade-off:

Engines are held during timeout window (blocks other requests)
Mitigation: Configurable sync_timeout (default 30s is often adequate)

2. Elastic Engine Pool with Proactive Warmup

Decision: Scale engines on demand (min→max), pre-start when utilization high.

Rationale:

Cost-effective for variable load
Proactive warmup prevents request queuing under spike
Health checks replace broken engines automatically

Trade-off:

More engines = more memory (MATLAB engines are heavyweight, ~200MB each)
Idle engines are stopped after timeout to recover memory

3. Per-Session Temp Directories

Decision: Each session/user gets isolated temp dir with clear all on switch.

Rationale:

Prevents data leakage between users (multi-user SSE mode)
Automatic cleanup on session expiry
No state pollution

Trade-off:

Startup cost of clear all (rebuild workspace)
Mitigation: Configurable via workspace_isolation flag

4. Job Tracker as In-Memory Store

Decision: Track jobs in memory with periodic TTL-based cleanup.

Rationale:

Fast access (no DB latency)
Simple implementation
Sufficient for typical job lifetimes (hours)

Trade-off:

Job history lost on restart
Not suitable for extremely high job volumes (10k+ concurrent)

5. Plotly Conversion as Two-Stage Process

Decision: MATLAB side extracts figure properties to JSON; Python side converts to Plotly.

Rationale:

Decouples MATLAB rendering from Python logic
MATLAB side handles complex figure introspection (FastPlot, tiled layouts)
Python side handles style translation (more maintainable)

Trade-off:

Extra file I/O (JSON intermediate)
Requires MATLAB helper script (mcp_extract_props.m)

6. Security Validation via Regex Scan

Decision: Precompiled regex patterns for function/construct detection.

Rationale:

Fast, stateless checks (no AST parsing)
Smart literal stripping avoids false positives
Configurable blocklists

Trade-off:

Cannot detect obfuscated/indirect invocations (e.g., a function name assembled at runtime and called through str2func)
Mitigation: Trusted environment assumed (agents are AI clients, not arbitrary users)

7. Optional Monitoring System

Decision: Metrics collection is disabled by default; opt-in via config.

Rationale:

Zero overhead for resource-constrained deployments
Reduces operational complexity
Can still be enabled in production when debugging is needed

Trade-off:

Dashboard unavailable if disabled
Mitigation: Can be enabled at runtime via config reload

8. FastMCP as Protocol Handler

Decision: Use FastMCP library instead of custom MCP implementation.

Rationale:

Handles MCP protocol complexity (SSE, stdio, JSON-RPC)
Supports dynamic tool registration
Active community maintenance

Trade-off:

External dependency (but small and stable)
Limited customization (but sufficient for use cases)

Transport Modes

stdio (Default)

Single session, single agent
Communication via stdin/stdout
No network overhead
Simplest setup (local machine)

SSE (Server-Sent Events)

Multiple sessions, multiple concurrent agents
HTTP-based (remote-capable)
Requires reverse proxy with authentication in production
Session isolation enforced at manager level

Scalability Considerations

| Component | Bottleneck | Mitigation |
|---|---|---|
| Engine Pool | MATLAB memory (~200MB/engine) | Set `max_engines` based on available RAM |
| Job Tracker | In-memory job list | Lower `job_retention_seconds` to prune old jobs sooner |
| Session Manager | Temp directory disk usage | Monitor `{temp_dir}` disk space; rely on automatic cleanup of expired sessions |
| File Uploads | Network bandwidth | Set `max_upload_size_mb` appropriately |
| Monitoring DB | SQLite write throughput | Increase `sample_interval` (sample less often) if contention is observed |

Deployment Architectures

Single-User (stdio)

Agent ←→ [MCP Server] ←→ [Engine Pool] ←→ [MATLAB]
(stdio transport, 2-4 engines typical)

Multi-User (SSE)

[Load Balancer]
    ↓
[MCP Server 1] ←→ [Shared Engine Pool] ←→ [MATLAB]
    ↑
[MCP Server 2] ←→ (or separate engines per server)
    ↑
[Agent 1, Agent 2, Agent 3]
(requires authentication, monitoring critical)
