
Architecture

github-actions[bot] edited this page Mar 23, 2026 · 20 revisions


System Overview

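The MATLAB MCP Server is a layered system that bridges AI agents to MATLAB execution. Requests pass through a multi-component stack that handles protocol dispatch, pre-execution security checks, elastic engine pooling, and automatic promotion of long-running code to asynchronous jobs.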

graph TB
    Agent["AI Agent<br/>(Claude, Cursor, etc.)"]
    
    Agent -->|MCP Protocol<br/>stdio or SSE| FastMCP["MCP Server Layer<br/>FastMCP + Tool Registry"]
    
    FastMCP -->|Tool Calls| Tools["Tool Implementation Layer<br/>20 Built-in Tools<br/>+ Custom Tools"]
    
    Tools -->|Code Execution| Executor["Job Execution Layer<br/>Hybrid Sync/Async<br/>Timeout Promotion<br/>Progress Tracking"]
    
    Executor -->|Engine Acquire/Release| PoolMgr["Engine Pool Manager<br/>Elastic Scaling<br/>Health Checks<br/>Proactive Warmup"]
    
    PoolMgr -->|Execute Code| Engines["MATLAB Engine Pool<br/>Engine 1..N<br/>R2022b+"]
    
    Executor -->|Job State| Tracker["Job Tracker<br/>In-Memory Registry<br/>Status & Results"]
    
    Tools -->|Session Isolation| Sessions["Session Manager<br/>Per-User Temp Dirs<br/>Workspace Cleanup"]
    
    Tools -->|Pre-Execution Check| Security["Security Validator<br/>Function Blocklist<br/>Filename Sanitization"]
    
    Tools -->|Result Formatting| Formatter["Result Formatter<br/>Text/Variable/Plot<br/>Truncation"]
    
    Formatter -->|Figure Conversion| Plotly["Plotly Converter<br/>MATLAB→Interactive JSON<br/>Static PNG Generation"]
    
    FastMCP -->|Health/Metrics| Monitor["Monitoring System<br/>MetricsCollector<br/>MetricsStore<br/>Dashboard UI"]

Component Responsibilities

MCP Server Layer (server.py)

Responsibilities:

  • FastMCP server setup and tool registration (20 built-in + custom tools)
  • Server lifecycle management (startup, graceful shutdown, resource draining)
  • Context and session routing for stdio vs. SSE transports
  • Background task orchestration (health checks, cleanup, metrics sampling)
  • Lifespan management with proper exception handling and cleanup order

Key Design Decisions:

  • Uses FastMCP as the MCP protocol handler (abstracts away protocol complexity)
  • Separates server state (MatlabMCPServer class) from the actual MCP instance
  • Provides context helpers (_get_session_id(), _get_temp_dir()) to abstract transport differences

Tool Implementation Layer (tools/)

Core Tools (core.py):

  • execute_code — Run MATLAB code with security validation before delegation to executor
  • check_code — Lint code via MATLAB's checkcode, parse JSON output
  • get_workspace — Retrieve current workspace variables via whos command

File Management (files.py):

  • upload_data — Decode base64, write to session temp dir with size/filename validation
  • delete_file — Remove files (path-traversal protected)
  • list_files — Directory enumeration with metadata
  • read_script, read_image, read_data — Format-aware file readers

Discovery (discovery.py):

  • list_toolboxes — Run ver, filter by whitelist/blacklist config
  • list_functions — Run help <toolbox> with injection prevention
  • get_help — Retrieve function help text

Job Management (jobs.py):

  • get_job_status — Query job tracker, read progress file if running
  • get_job_result — Return completed/failed job result
  • cancel_job — Cancel pending/running job via future
  • list_jobs — Session-scoped job enumeration

Admin & Monitoring (admin.py, monitoring.py):

  • get_pool_status — Engine pool utilization snapshot
  • get_server_metrics — Aggregated performance metrics
  • get_server_health — Overall health classification (healthy/degraded/unhealthy)
  • get_error_log — Recent error events with aggregation

Custom Tools (custom.py):

  • Load tool definitions from YAML (custom_tools.yaml)
  • Generate typed async handlers with proper inspect.Signature for FastMCP introspection
  • Marshal parameters and delegate to executor
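The handler-generation step can be sketched as follows. This is a minimal illustration, not the server's custom.py. The `SPEC` dict stands in for one parsed custom_tools.yaml entry. `make_handler`, `_to_matlab`, and the `execute` callback are hypothetical names. Plain `isinstance` checks stand in for the Pydantic validation the server uses.

```python
import inspect
from typing import Any, Awaitable, Callable, Dict

# Hypothetical tool spec, shaped like one custom_tools.yaml entry after parsing.
SPEC: Dict[str, Any] = {
    "name": "run_filter",
    "matlab_function": "my_filter",
    "parameters": [
        {"name": "label", "type": "str"},
        {"name": "order", "type": "int", "default": 4},
    ],
}

_TYPES = {"str": str, "int": int, "float": float, "bool": bool}


def _to_matlab(value: Any) -> str:
    """Render a Python scalar as a MATLAB literal."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # MATLAB escapes ' as ''
    return repr(value)


def make_handler(spec: Dict[str, Any],
                 execute: Callable[[str], Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    params = [
        inspect.Parameter(
            p["name"],
            inspect.Parameter.KEYWORD_ONLY,
            default=p.get("default", inspect.Parameter.empty),
            annotation=_TYPES[p["type"]],
        )
        for p in spec["parameters"]
    ]
    sig = inspect.Signature(params, return_annotation=dict)

    async def handler(**kwargs: Any) -> Any:
        bound = sig.bind(**kwargs)  # raises TypeError on unknown or missing arguments
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        args = ", ".join(_to_matlab(v) for v in bound.arguments.values())
        return await execute(f"{spec['matlab_function']}({args})")

    handler.__name__ = spec["name"]
    handler.__signature__ = sig  # what inspect.signature() reports to introspecting code
    return handler
```

Setting `__signature__` lets the generic `**kwargs` handler show the typed, per-tool parameter list to anything that introspects it.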

Key Design Decisions:

  • Each tool is a standalone async function with a standard signature
  • Security validation happens at tool level (not deeper)
  • Custom tools use Pydantic for parameter validation

Job Execution Layer (jobs/executor.py)

Responsibilities:

  • Orchestrate full job lifecycle: create → acquire engine → inject context → execute → store result
  • Hybrid sync/async: complete synchronously within timeout, auto-promote to async if exceeded
  • Workspace injection: set __mcp_job_id__ and __mcp_temp_dir__ for agent scripts
  • Error handling: graceful capture of stdout/stderr, structured error formatting
  • Metrics integration: record execution times, completion/failure events

Execution Flow:

  1. Security validator checks code for blocked functions
  2. Job created in tracker (PENDING state)
  3. Engine acquired from pool
  4. Job context injected into workspace
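  5. Code execution starts and the executor waits up to sync_timeout for it to finish
  6. Timeout Decision:
    • If it completes within sync_timeout (default 30s): return the result immediately
    • If sync_timeout is exceeded: promote to async and return the job_id; the engine stays BUSY while the code keeps running
  7. Async background task monitors completion, updates job status
  8. Engine released (by the executor in the sync case, by the background task after completion in the async case)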
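Here is a minimal sketch of the timeout-promotion step, assuming an asyncio-based executor. An `asyncio.Semaphore` stands in for the engine pool and a plain dict stands in for the job tracker. `execute` and `_run_on_engine` are hypothetical names. `asyncio.shield` keeps the job running after `asyncio.wait_for` stops waiting, and the engine slot is released only when the work itself finishes.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict


async def _run_on_engine(pool: asyncio.Semaphore,
                         work: Callable[[], Awaitable[Any]]) -> Any:
    # The engine slot is held for the whole job, not just the sync window.
    async with pool:
        return await work()


async def execute(pool: asyncio.Semaphore,
                  work: Callable[[], Awaitable[Any]],
                  jobs: Dict[str, "asyncio.Task[Any]"],
                  sync_timeout: float = 30.0) -> Dict[str, Any]:
    job_id = f"j-{uuid.uuid4().hex[:8]}"
    task = asyncio.ensure_future(_run_on_engine(pool, work))
    jobs[job_id] = task  # stand-in for registering the job with the tracker
    try:
        # shield() prevents wait_for's timeout from cancelling the job itself.
        output = await asyncio.wait_for(asyncio.shield(task), timeout=sync_timeout)
    except asyncio.TimeoutError:
        return {"status": "running", "job_id": job_id}  # promoted to async
    return {"status": "completed", "job_id": job_id, "output": output}
```

Suppose the pool has one engine, sync_timeout is 0.05 s, and the job sleeps for 0.2 s. Then `execute` returns `"running"` while the semaphore stays locked. The engine becomes available again only once the awaited task finishes.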

Key Design Decisions:

  • _safe_serialize() converts arbitrary Python objects to JSON-serializable forms (handles numpy arrays, dataclasses, etc.)
  • _inject_job_context() sets workspace variables safely, catching exceptions
  • Uses concurrent.futures.Future for background task management
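The kind of conversion `_safe_serialize()` performs can be sketched with the standard library alone. `safe_serialize` below is a hypothetical stand-in, not the server's function. It works as follows:

- It recurses into dataclasses and containers.
- It turns NaN and Infinity into strings, since strict JSON cannot represent them.
- It uses `tolist()` when an object provides one, as NumPy arrays do.
- It falls back to `repr()` instead of raising.

```python
import dataclasses
import json
import math
from enum import Enum
from typing import Any


def safe_serialize(obj: Any, _depth: int = 0, max_depth: int = 20) -> Any:
    """Best-effort conversion of arbitrary objects into JSON-compatible values."""
    if _depth > max_depth:
        return repr(obj)  # guard against deeply nested or cyclic structures
    nxt = _depth + 1
    if obj is None or isinstance(obj, (bool, int, str)):
        return obj
    if isinstance(obj, float):
        # Strict JSON has no NaN/Infinity, so emit them as strings.
        return obj if math.isfinite(obj) else str(obj)
    if isinstance(obj, Enum):
        return safe_serialize(obj.value, nxt, max_depth)
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: safe_serialize(getattr(obj, f.name), nxt, max_depth)
                for f in dataclasses.fields(obj)}
    if isinstance(obj, dict):
        return {str(k): safe_serialize(v, nxt, max_depth) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [safe_serialize(v, nxt, max_depth) for v in obj]
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode("utf-8", errors="replace")
    tolist = getattr(obj, "tolist", None)  # NumPy arrays and scalars expose tolist()
    if callable(tolist):
        return safe_serialize(tolist(), nxt, max_depth)
    return repr(obj)  # last resort: never raise


# The result is always accepted by json.dumps.
print(json.dumps(safe_serialize({"x": float("inf"), "tags": {"a"}})))
```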

Engine Pool Manager (pool/manager.py, pool/engine.py)

Pool Manager Responsibilities:

  • Elastic Scaling: Start with min_engines, grow to max_engines on demand
  • Proactive Warmup: When utilization exceeds proactive_warmup_threshold (80%), pre-start a new engine
  • Scale-Down: Stop idle engines after scale_down_idle_timeout (15 min), down to minimum
  • Health Checks: Periodic 1+1 eval; replace unresponsive engines
  • Request Queueing: Async queue for jobs waiting on busy engines
  • Deferred Cleanup: Engines marked for stop after current job completes
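The scaling rules above reduce to two small decisions. The sketch below is illustrative only:

- `PoolConfig`, `should_warm_up`, and `engines_to_stop` are hypothetical names.
- The min_engines and max_engines defaults are arbitrary.
- It considers idle time only, not engine age.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PoolConfig:
    # Field names mirror the options named above; min/max defaults are arbitrary.
    min_engines: int = 2
    max_engines: int = 8
    proactive_warmup_threshold: float = 0.8
    scale_down_idle_timeout: float = 15 * 60  # seconds


def should_warm_up(busy: int, total: int, cfg: PoolConfig) -> bool:
    """Pre-start an engine when utilization exceeds the threshold and there is headroom."""
    if total >= cfg.max_engines:
        return False
    if total == 0:
        return True
    return busy / total > cfg.proactive_warmup_threshold


def engines_to_stop(idle_since: Dict[str, float], total: int,
                    now: float, cfg: PoolConfig) -> List[str]:
    """Idle engines past the timeout, longest-idle first, never dropping below min_engines.

    idle_since maps engine_id -> timestamp at which that engine became idle.
    """
    expired = sorted((since, eid) for eid, since in idle_since.items()
                     if now - since >= cfg.scale_down_idle_timeout)
    surplus = max(total - cfg.min_engines, 0)
    return [eid for _, eid in expired[:surplus]]
```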

Engine Wrapper Responsibilities:

  • Lifecycle: Start/stop engine, track state (STOPPED → STARTING → IDLE ↔ BUSY)
  • Execution: Synchronous and background (async) code execution
  • Workspace Management: Optional full reset between jobs
  • Health Ping: Quick responsiveness check
  • Path Management: addpath() support for custom MATLAB paths

Engine States:

  • STOPPED — Engine not running
  • STARTING — Startup in progress
  • IDLE — Ready to accept jobs
  • BUSY — Currently executing
  • ERROR — Unresponsive or crashed (will be replaced)

Key Design Decisions:

  • Lazy loading of matlab.engine module (enables test mocking without MATLAB installed)
  • Thread-safe state machine with explicit state transitions
  • Health checks use trivial eval (1+1) to avoid overhead
  • Scale-down considers engine age and idle time to prevent thrashing
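A thread-safe state machine of the kind described can be sketched by wrapping every check-and-set in a `threading.Lock`. The transition table is an assumption reconstructed from the state list above, in particular ERROR → STOPPED before replacement. `EngineStateMachine` and `try_acquire` are hypothetical names.

```python
import threading
from enum import Enum, auto


class EngineState(Enum):
    STOPPED = auto()
    STARTING = auto()
    IDLE = auto()
    BUSY = auto()
    ERROR = auto()


# Assumed transition table based on the lifecycle described above.
_ALLOWED = {
    EngineState.STOPPED: {EngineState.STARTING},
    EngineState.STARTING: {EngineState.IDLE, EngineState.ERROR},
    EngineState.IDLE: {EngineState.BUSY, EngineState.STOPPED, EngineState.ERROR},
    EngineState.BUSY: {EngineState.IDLE, EngineState.ERROR},
    EngineState.ERROR: {EngineState.STOPPED},  # crashed engines are stopped, then replaced
}


class EngineStateMachine:
    def __init__(self) -> None:
        self._state = EngineState.STOPPED
        self._lock = threading.Lock()

    @property
    def state(self) -> EngineState:
        with self._lock:
            return self._state

    def transition(self, new: EngineState) -> None:
        # Check and update under one lock so the move is atomic.
        with self._lock:
            if new not in _ALLOWED[self._state]:
                raise RuntimeError(f"illegal transition {self._state.name} -> {new.name}")
            self._state = new

    def try_acquire(self) -> bool:
        """Atomically claim an IDLE engine; two threads cannot both win."""
        with self._lock:
            if self._state is not EngineState.IDLE:
                return False
            self._state = EngineState.BUSY
            return True
```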

Job Tracker (jobs/tracker.py, jobs/models.py)

Responsibilities:

  • Job Registry: In-memory store for all jobs (active + historical)
  • State Management: Enforce transitions (PENDING → RUNNING → COMPLETED/FAILED/CANCELLED; since cancel_job accepts pending jobs, PENDING → CANCELLED is also possible)
  • Session Filtering: List/prune jobs by session ID
  • TTL-Based Cleanup: Remove expired jobs older than job_retention_seconds
  • Metadata Storage: Track engine ID, result dict, error dict, timestamps, background future

Job Lifecycle:

  1. Job(session_id, code) — Created PENDING
  2. mark_running(engine_id) — Transitioned to RUNNING, timer starts
  3. mark_completed(result) or mark_failed(error) or mark_cancelled() — Terminal state, timer stops
  4. elapsed_seconds — Frozen at completion, immutable

Key Design Decisions:

  • Dataclass for simplicity (no ORM overhead)
  • Auto-generated job_id with j- prefix
  • future field stores concurrent.futures.Future for async monitoring/cancellation
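Here is a sketch of the job model, assuming a plain dataclass as described. Where the text names something, the sketch follows it: `mark_running`, `mark_completed`, `mark_failed`, `mark_cancelled`, `elapsed_seconds`, `future`, and the `j-` prefix. The following are assumptions:

- the enum and the transition table;
- the ID length;
- PENDING → CANCELLED, inferred from cancel_job accepting pending jobs.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# PENDING -> CANCELLED is an assumption (cancel_job also accepts pending jobs).
_NEXT = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.CANCELLED},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED},
}


@dataclass
class Job:
    session_id: str
    code: str
    job_id: str = field(default_factory=lambda: f"j-{uuid.uuid4().hex[:12]}")
    status: JobStatus = JobStatus.PENDING
    engine_id: Optional[str] = None
    result: Optional[Dict[str, Any]] = None
    error: Optional[Dict[str, Any]] = None
    started_at: Optional[float] = None
    finished_at: Optional[float] = None
    future: Any = None  # e.g. a future used for monitoring and cancellation

    def _move(self, new: JobStatus) -> None:
        if new not in _NEXT.get(self.status, set()):
            raise ValueError(f"{self.status.name} -> {new.name} is not allowed")
        self.status = new
        if new is not JobStatus.RUNNING:
            self.finished_at = time.monotonic()  # terminal state: freeze the clock

    def mark_running(self, engine_id: str) -> None:
        self._move(JobStatus.RUNNING)
        self.engine_id = engine_id
        self.started_at = time.monotonic()

    def mark_completed(self, result: Dict[str, Any]) -> None:
        self._move(JobStatus.COMPLETED)
        self.result = result

    def mark_failed(self, error: Dict[str, Any]) -> None:
        self._move(JobStatus.FAILED)
        self.error = error

    def mark_cancelled(self) -> None:
        self._move(JobStatus.CANCELLED)

    @property
    def elapsed_seconds(self) -> float:
        if self.started_at is None:
            return 0.0
        end = self.finished_at if self.finished_at is not None else time.monotonic()
        return end - self.started_at
```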

Session Manager (session/manager.py)

Responsibilities:

  • Per-User Isolation: Each session gets unique temp directory
  • Lifecycle Management: Create, retrieve, destroy sessions with TTL
  • Workspace Cleanup: Optional full reset between sessions
  • Max Sessions Enforcement: Configurable limit with FIFO eviction
  • Activity Tracking: Last-active timestamp for idle detection

Session Isolation Strategy:

  • stdio transport: Single "default" session for the agent
  • SSE transport: Per-client session identified by session_id
  • Temp dir: {temp_dir}/session-{session_id}/ for file isolation
  • Workspace: clear all on first execution if workspace_isolation=true

Key Design Decisions:

  • Concurrency-safe via asyncio locks (these serialize coroutines on the event loop; they are not OS-thread locks)
  • Sessions auto-created on first request
  • Idle sessions pruned asynchronously to avoid blocking
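The isolation and eviction rules can be sketched as follows. `SessionManager`, `get_or_create`, and `session_ids` are hypothetical names. The sketch makes two assumptions:

- FIFO means eviction by creation order, so reusing a session does not refresh its position.
- Session IDs are sanitized before they become part of a path.

```python
import asyncio
import os
import shutil
import tempfile
import time
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    session_id: str
    temp_dir: str
    last_active: float = field(default_factory=time.monotonic)


class SessionManager:
    def __init__(self, root: Optional[str] = None, max_sessions: int = 10) -> None:
        self.root = root or tempfile.mkdtemp(prefix="mcp-")
        self._max = max_sessions
        self._sessions: "OrderedDict[str, Session]" = OrderedDict()  # insertion order = FIFO
        self._lock = asyncio.Lock()  # serializes coroutines on one loop, not OS threads

    async def get_or_create(self, session_id: str = "default") -> Session:
        # stdio uses the single "default" session; session_id is assumed pre-sanitized.
        async with self._lock:
            session = self._sessions.get(session_id)
            if session is None:
                if len(self._sessions) >= self._max:
                    _, oldest = self._sessions.popitem(last=False)  # evict first-created
                    shutil.rmtree(oldest.temp_dir, ignore_errors=True)
                path = os.path.join(self.root, f"session-{session_id}")
                os.makedirs(path, exist_ok=True)
                session = Session(session_id, path)
                self._sessions[session_id] = session
            session.last_active = time.monotonic()  # input for idle pruning
            return session

    def session_ids(self) -> List[str]:
        return list(self._sessions)
```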

Security Validator (security/validator.py)

Pre-Execution Checks:

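  • Function Blocklist: The default blocklist has 11 entries: the functions system, unix, dos, eval, feval, evalc, evalin, assignin, perl, and python, plus the ! shell-escape operator
  • Smart Scanning: Strips MATLAB string literals ('...', "...") and comments (% line comments and %{ ... %} block comments) before matching, so blocked names that appear only inside strings or comments do not trigger false positives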
  • Filename Sanitization: Restricts to [a-zA-Z0-9._-], prevents ../ path traversal
  • Upload Limits: Enforces max_upload_size_mb

BlockedFunctionError: Raised when blocked function detected; recorded as event in metrics

Key Design Decisions:

  • Precompiled regex patterns for performance
  • Blocklist is user-configurable (whitelist mode available)
  • Filename sanitizer is stateless and reusable
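Here is a minimal sketch of the strip-then-match approach using precompiled regexes. It is not the server's validator.py, and `find_blocked`, `sanitize_filename`, and all patterns are assumptions. It does two things:

- It scans strings and comments in one left-to-right pass, so a quote inside a comment, or a % inside a string, is classified correctly.
- It treats a quote that follows an identifier, closing bracket, dot, or another quote as MATLAB's transpose operator rather than the start of a char array.

```python
import re
from typing import List

# Hypothetical default list: the 10 function names above; "!" is checked separately.
BLOCKED_FUNCTIONS = ("system", "unix", "dos", "eval", "feval", "evalc",
                     "evalin", "assignin", "perl", "python")

# One combined pattern so literals and comments are recognized left to right.
_TOKENS = re.compile(
    r'(?P<dq>"(?:[^"\n]|"")*")'
    r"|(?P<sq>(?<![\w)\]}.'\"])'(?:[^'\n]|'')*')"
    r"|(?P<block>^[ \t]*%\{[ \t]*$.*?^[ \t]*%\}[ \t]*$)"
    r"|(?P<line>%[^\n]*)",
    re.M | re.S,
)
_FUNCS = re.compile(r"\b(" + "|".join(BLOCKED_FUNCTIONS) + r")\b")
_SHELL_ESCAPE = re.compile(r"(?:^|[;,])[ \t]*!", re.M)
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def strip_literals_and_comments(code: str) -> str:
    def repl(m: "re.Match[str]") -> str:
        if m.lastgroup == "dq":
            return '""'  # keep an empty string so the code shape is preserved
        if m.lastgroup == "sq":
            return "''"
        return ""  # comments are dropped entirely
    return _TOKENS.sub(repl, code)


def find_blocked(code: str) -> List[str]:
    cleaned = strip_literals_and_comments(code)
    hits = sorted(set(_FUNCS.findall(cleaned)))
    if _SHELL_ESCAPE.search(cleaned):
        hits.append("!")
    return hits


def sanitize_filename(name: str) -> str:
    # Only [A-Za-z0-9._-] is allowed, so "/" and "\" (and thus "../") are rejected.
    if name in (".", "..") or not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe filename: {name!r}")
    return name
```

A scan like this only sees names that appear literally in the code. It cannot catch a function name assembled from string fragments at runtime.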

Result Formatter (output/formatter.py)

Responsibilities:

  • Text Formatting: Truncate output to max_inline_text_length (default 50KB), optionally save excess to file
  • Variable Formatting: Detect type/size, elide large values, format as JSON
  • Response Building: Construct standard MCP response dicts with status/output/variables/plots/error
  • Delegation: Pass plots to Plotly converter, images to thumbnail generator

Output Handling:

  • Short results: Inline in response
  • Large results: Inline truncated + file URL
  • Variables: JSON dict with type hints
  • Plots: Plotly JSON + static PNG + optional thumbnail

Key Design Decisions:

  • Stateless utility class
  • Graceful fallback if Pillow unavailable (skip thumbnails)
  • File saving is optional (default: save large results)
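The truncation rule can be sketched as below. `format_text` and its parameters are hypothetical. The sketch counts characters rather than bytes, and it returns a local file path where the server would return a URL.

```python
import os
from typing import Any, Dict, Optional


def format_text(text: str, max_inline: int = 50_000,
                save_dir: Optional[str] = None,
                filename: str = "output.txt") -> Dict[str, Any]:
    """Inline short output; truncate long output and optionally save the full text."""
    if len(text) <= max_inline:
        return {"text": text, "truncated": False}
    response: Dict[str, Any] = {
        "text": text[:max_inline] + "\n... [truncated]",
        "truncated": True,
        "total_length": len(text),
    }
    if save_dir is not None:
        path = os.path.join(save_dir, filename)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)  # full, untruncated output
        response["file"] = path  # the server would expose this as a URL
    return response
```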

Plotly Converter (output/plotly_convert.py, output/plotly_style_mapper.py, matlab_helpers/mcp_extract_props.m)

Two-Part Conversion:

MATLAB Side (mcp_extract_props.m):

  • Extract raw figure properties (line data, markers, colors, axes, legends, grid, ticks)
  • Handle FastPlot objects for high-resolution data
  • Detect layout type (single axes, subplots, tiled layout)
  • Output JSON file with schema version

Python Side (plotly_style_mapper.py):

  • Map MATLAB line styles (-, --, :, -.) to Plotly equivalents
  • Convert MATLAB color names/RGB to CSS hex
  • Handle marker styles (circle, square, diamond, etc.)
  • Build Plotly traces per chart type (line, scatter, bar, histogram, surface, image)
  • Support WebGL for 10,000+ data points
  • Compute subplot domains (multi-axes layout)
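Here is a sketch of the Python-side style mapping for line plots. The lookup tables cover only a subset of MATLAB styles. `line_trace`, `to_hex`, and the input keys (`x`, `y`, `line_style`, `marker`, `color`) are assumptions about the intermediate JSON, not its actual schema. The output uses standard Plotly trace fields (`type`, `mode`, `line.dash`, `marker.symbol`) and switches to `scattergl` at 10,000 points.

```python
from typing import Any, Dict, List, Sequence, Union

# MATLAB LineStyle -> Plotly line.dash
LINE_STYLES = {"-": "solid", "--": "dash", ":": "dot", "-.": "dashdot"}
# MATLAB Marker -> Plotly marker.symbol (partial table)
MARKERS = {"o": "circle", "s": "square", "d": "diamond",
           "^": "triangle-up", "v": "triangle-down", "+": "cross", "x": "x"}
# MATLAB short color names -> RGB triplets in [0, 1]
COLOR_NAMES = {"r": (1, 0, 0), "g": (0, 1, 0), "b": (0, 0, 1), "c": (0, 1, 1),
               "m": (1, 0, 1), "y": (1, 1, 0), "k": (0, 0, 0), "w": (1, 1, 1)}
WEBGL_THRESHOLD = 10_000


def to_hex(color: Union[str, Sequence[float]]) -> str:
    """Convert a MATLAB short color name or [r g b] triplet (0..1) to CSS hex."""
    rgb = COLOR_NAMES[color] if isinstance(color, str) else color
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def line_trace(line: Dict[str, Any]) -> Dict[str, Any]:
    """Build one Plotly trace from extracted MATLAB line properties (assumed keys)."""
    style = line.get("line_style", "-")
    marker = line.get("marker", "none")
    parts: List[str] = []
    if style != "none":
        parts.append("lines")
    if marker in MARKERS:
        parts.append("markers")
    return {
        # WebGL only pays off for large traces.
        "type": "scattergl" if len(line["x"]) >= WEBGL_THRESHOLD else "scatter",
        "mode": "+".join(parts) or "lines",
        "x": list(line["x"]),
        "y": list(line["y"]),
        "line": {"color": to_hex(line.get("color", "b")),
                 "dash": LINE_STYLES.get(style, "solid")},
        "marker": {"symbol": MARKERS.get(marker, "circle")},
    }
```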

Result:

  • Interactive Plotly JSON (renderable in web UIs)
  • Static PNG for email/chat
  • Optional thumbnail (max width 400px)

Key Design Decisions:

  • JSON file acts as intermediate format (decouples MATLAB from Python rendering)
  • Separate converters per chart type for maintainability
  • WebGL threshold avoids unnecessary GPU usage for small datasets

Monitoring System (monitoring/)

MetricsCollector (collector.py):

  • Accumulate counters: jobs completed/failed/cancelled, sessions created, errors, health failures
  • Maintain ring buffer of execution times (compute avg, p95 percentile)
  • Record events asynchronously (fire-and-forget)
  • Sample system state periodically (pool utilization, job counts, memory/CPU)
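The execution-time ring buffer can be sketched with `collections.deque(maxlen=...)`, which discards the oldest entry automatically. `ExecutionTimes` is a hypothetical name and the buffer size is arbitrary. p95 here uses the nearest-rank definition, which may differ from the collector's.

```python
from collections import deque
from typing import Deque, Dict


class ExecutionTimes:
    """Fixed-size ring buffer of recent execution times, in seconds."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._times: Deque[float] = deque(maxlen=maxlen)  # oldest entries drop off

    def record(self, seconds: float) -> None:
        self._times.append(seconds)

    def summary(self) -> Dict[str, float]:
        n = len(self._times)
        if n == 0:
            return {"count": 0, "avg": 0.0, "p95": 0.0}
        ordered = sorted(self._times)
        rank = (95 * n + 99) // 100  # nearest-rank p95 = ceil(0.95 * n), in integer math
        return {"count": n, "avg": sum(ordered) / n, "p95": ordered[rank - 1]}
```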

MetricsStore (store.py):

  • Async SQLite backend with WAL journaling
  • Time-series tables for metrics and events
  • Index on timestamp for fast historical queries
  • Automatic schema creation on init
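Here is a synchronous sketch of the storage layout using the standard `sqlite3` module. The real store is async, so treat this only as an illustration of three things: the WAL pragma, the timestamp indexes, and create-on-open schema. `MetricsStore` here and all table and column names are assumptions.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS metrics (ts REAL NOT NULL, name TEXT NOT NULL, value REAL NOT NULL);
CREATE INDEX IF NOT EXISTS idx_metrics_ts ON metrics (ts);
CREATE TABLE IF NOT EXISTS events (ts REAL NOT NULL, kind TEXT NOT NULL, payload TEXT);
CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts);
"""


class MetricsStore:
    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        # WAL lets readers (e.g. the dashboard) proceed while the sampler writes.
        self.journal_mode = self.conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        self.conn.executescript(SCHEMA)  # schema is created on first open

    def record(self, name: str, value: float, ts: Optional[float] = None) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO metrics (ts, name, value) VALUES (?, ?, ?)",
                              (time.time() if ts is None else ts, name, value))

    def since(self, name: str, start_ts: float) -> List[Tuple[float, float]]:
        rows = self.conn.execute(
            "SELECT ts, value FROM metrics WHERE name = ? AND ts >= ? ORDER BY ts",
            (name, start_ts))
        return rows.fetchall()

    def close(self) -> None:
        self.conn.close()
```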

Dashboard (dashboard.py):

  • Starlette sub-app serving /health, /metrics, /dashboard endpoints
  • Near-real-time updates via periodic polling (no WebSocket connection)
  • Pre-caches HTML on startup

Health Evaluation (health.py):

  • Classify server as "healthy", "degraded", or "unhealthy"
  • Based on: engine availability, error rates, health check failures
  • Return detailed issue list

Key Design Decisions:

  • Metrics are optional (disabled by default to reduce overhead)
  • Store is persistent across restarts
  • Dashboard uses Plotly.js for interactive charts
  • Health status returned as HTTP codes (503 for unhealthy, 200 otherwise)
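One way the classification and the HTTP mapping could fit together is sketched below. The input fields, the thresholds (10% error rate, 3 health-check failures), and all names are hypothetical. Only the three status labels and the 503/200 mapping come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HealthInputs:
    total_engines: int
    available_engines: int        # engines currently IDLE
    error_rate: float             # errors / jobs over a recent window
    recent_health_failures: int


@dataclass
class HealthReport:
    status: str
    issues: List[str] = field(default_factory=list)


def evaluate_health(h: HealthInputs, max_error_rate: float = 0.1,
                    max_health_failures: int = 3) -> HealthReport:
    if h.total_engines == 0:
        return HealthReport("unhealthy", ["no MATLAB engines running"])
    issues: List[str] = []
    if h.available_engines == 0:
        issues.append("all engines busy")
    if h.error_rate > max_error_rate:
        issues.append(f"error rate {h.error_rate:.0%} above {max_error_rate:.0%}")
    if h.recent_health_failures > max_health_failures:
        issues.append(f"{h.recent_health_failures} recent health-check failures")
    return HealthReport("degraded" if issues else "healthy", issues)


def http_status(report: HealthReport) -> int:
    # Only "unhealthy" maps to 503; "degraded" still answers 200.
    return 503 if report.status == "unhealthy" else 200
```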

Data Flow Diagrams

Synchronous Execution (Complete in Timeout)

sequenceDiagram
    participant Agent
    participant Server
    participant Executor
    participant Pool
    participant Engine
    participant Tracker

    Agent->>Server: execute_code("x = magic(3)")
    Server->>Server: Security check (OK)
    Server->>Tracker: Create job (PENDING)
    Server->>Executor: execute(session_id, code)
    Executor->>Pool: acquire()
    Pool->>Engine: Engine acquired (IDLE→BUSY)
    Executor->>Engine: inject_job_context(__mcp_job_id__, etc.)
    Executor->>Engine: eval(code)
    Engine-->>Executor: result in 2.5s
    Executor->>Executor: _safe_serialize(result)
    Executor->>Tracker: mark_completed(result)
    Executor->>Pool: release()
    Pool->>Engine: Engine released (BUSY→IDLE)
    Executor-->>Server: {status: completed, output: ...}
    Server-->>Agent: MCP result (inline)

Async Promotion (Timeout Exceeded)

sequenceDiagram
    participant Agent
    participant Server
    participant Executor
    participant Pool
    participant Engine
    participant BgTask
    participant Tracker

    Agent->>Server: execute_code("long_simulation()")
    Server->>Server: Security check (OK)
    Server->>Tracker: Create job (PENDING)
    Server->>Executor: execute(session_id, code)
    Executor->>Pool: acquire()
    Pool->>Engine: Engine acquired (IDLE→BUSY)
    Executor->>Engine: eval(code, background=True)
    Engine-->>Executor: FutureResult (code running in background)
    Executor->>Executor: Wait sync_timeout (30s)
    Executor->>Executor: Timeout exceeded!
    Executor->>Tracker: mark_running(engine_id)
    Executor->>Tracker: Store future reference
    Executor-->>Server: {status: running, job_id: j-abc123}
    Server-->>Agent: MCP result (job_id)
    Note over BgTask: Background task monitors future
    Engine-->>BgTask: Code completes after 120s
    BgTask->>Tracker: mark_completed(result)
    BgTask->>Pool: release()
    Pool->>Engine: Engine released (BUSY→IDLE)
    Agent->>Server: get_job_result("j-abc123")
    Server->>Tracker: Retrieve completed job
    Server-->>Agent: Full result

File Upload & Execution

sequenceDiagram
    participant Agent
    participant Server as Server/Tools
    participant Security
    participant Session
    participant Executor
    participant Engine

    Agent->>Server: upload_data("data.csv", base64_content)
    Server->>Security: sanitize_filename("data.csv")
    Server->>Session: Get temp_dir for session
    Server->>Session: Write file to temp_dir/data.csv
    Server-->>Agent: {status: uploaded}
    
    Agent->>Server: execute_code("T = readtable('data.csv');")
    Server->>Security: Validate code (OK)
    Server->>Executor: execute(session_id, code, temp_dir)
    Executor->>Executor: Inject __mcp_temp_dir__ = temp_dir
    Executor->>Engine: eval(code)
    Engine-->>Executor: Table loaded, result
    Executor-->>Server: {status: completed, output: ...}
    Server-->>Agent: Result with table preview

Plot Generation & Conversion

sequenceDiagram
    participant Agent
    participant Server
    participant Executor
    participant Engine
    participant PropsHelper as mcp_extract_props.m
    participant Converter as plotly_style_mapper
    participant Formatter

    Agent->>Server: execute_code("plot(sin(0:0.1:2*pi))")
    Server->>Executor: execute(...)
    Executor->>Engine: eval(code)
    Engine-->>Executor: Figure created (handle returned)
    Executor->>PropsHelper: Call mcp_extract_props(fig_handle)
    PropsHelper->>PropsHelper: Extract axes, lines, markers, colors, grid
    PropsHelper-->>Executor: JSON file written to temp
    Executor->>Formatter: Format result
    Formatter->>Converter: Convert figure JSON to Plotly
    Converter->>Converter: Map MATLAB styles to Plotly
    Converter->>Converter: Build trace objects
    Converter-->>Formatter: Plotly JSON + PNG
    Formatter-->>Server: {plotly_figure: {...}, static_image_png: ...}
    Server-->>Agent: Interactive Plotly + static PNG

Key Design Decisions & Trade-Offs

1. Hybrid Sync/Async Execution

Decision: Execute all code synchronously first; auto-promote to async if timeout exceeded.

Rationale:

  • Simplifies agent logic (no need to pre-declare async)
  • Most code completes quickly; async overhead only when needed
  • Timeout-based promotion is transparent to agent

Trade-off:

  • Engines are held during timeout window (blocks other requests)
  • Mitigation: Configurable sync_timeout (default 30s is often adequate)

2. Elastic Engine Pool with Proactive Warmup

Decision: Scale engines on demand (min→max), pre-start when utilization high.

Rationale:

  • Cost-effective for variable load
  • Proactive warmup reduces request queuing during load spikes
  • Health checks replace broken engines automatically

Trade-off:

  • More engines = more memory (MATLAB engines are heavyweight, ~200MB each)
  • Idle engines are stopped after timeout to recover memory

3. Per-Session Temp Directories

Decision: Each session/user gets isolated temp dir with clear all on switch.

Rationale:

  • Prevents data leakage between users (multi-user SSE mode)
  • Automatic cleanup on session expiry
  • No state pollution

Trade-off:

  • Startup cost of clear all (rebuild workspace)
  • Mitigation: Configurable via workspace_isolation flag

4. Job Tracker as In-Memory Store

Decision: Track jobs in memory with periodic TTL-based cleanup.

Rationale:

  • Fast access (no DB latency)
  • Simple implementation
  • Sufficient for typical job lifetimes (hours)

Trade-off:

  • Job history lost on restart
  • Not suitable for extremely high job volumes (10k+ concurrent)

5. Plotly Conversion as Two-Stage Process

Decision: MATLAB side extracts figure properties to JSON; Python side converts to Plotly.

Rationale:

  • Decouples MATLAB rendering from Python logic
  • MATLAB side handles complex figure introspection (FastPlot, tiled layouts)
  • Python side handles style translation (more maintainable)

Trade-off:

  • Extra file I/O (JSON intermediate)
  • Requires MATLAB helper script (mcp_extract_props.m)

6. Security Validation via Regex Scan

Decision: Precompiled regex patterns for function/construct detection.

Rationale:

  • Fast, stateless checks (no AST parsing)
  • Smart literal stripping avoids false positives
  • Configurable blocklists

Trade-off:

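  • Cannot detect obfuscated or indirect invocations (e.g., str2func(['sys' 'tem']), where the blocked name exists only as string fragments that literal stripping removes before matching)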
  • Mitigation: Trusted environment assumed (agents are AI clients, not arbitrary users)

7. Optional Monitoring System

Decision: Metrics collection is disabled by default; opt-in via config.

Rationale:

  • Zero overhead for resource-constrained deployments
  • Reduces operational complexity
  • Can be enabled in production when debugging is needed

Trade-off:

  • Dashboard unavailable if disabled
  • Mitigation: Can be enabled at runtime via config reload

8. FastMCP as Protocol Handler

Decision: Use FastMCP library instead of custom MCP implementation.

Rationale:

  • Handles MCP protocol complexity (SSE, stdio, JSON-RPC)
  • Supports dynamic tool registration
  • Active community maintenance

Trade-off:

  • External dependency (but small and stable)
  • Limited customization (but sufficient for use cases)

Transport Modes

stdio (Default)

  • Single session, single agent
  • Communication via stdin/stdout
  • No network overhead
  • Simplest setup (local machine)

SSE (Server-Sent Events)

  • Multiple sessions, multiple concurrent agents
  • HTTP-based (remote-capable)
  • Requires reverse proxy with authentication in production
  • Session isolation enforced at manager level

Scalability Considerations

| Component | Bottleneck | Mitigation |
|---|---|---|
| Engine Pool | MATLAB memory (~200MB/engine) | Set max_engines based on available RAM |
| Job Tracker | In-memory job list | Lower job_retention_seconds to prune old jobs sooner |
| Session Manager | Temp directory disk usage | Monitor {temp_dir} disk space; rely on session auto-cleanup |
| File Uploads | Network bandwidth | Set max_upload_size_mb appropriately |
| Monitoring DB | SQLite write throughput | Increase sample_interval (sample less often) if contention is observed |

Deployment Architectures

Single-User (stdio)

Agent ←→ [MCP Server] ←→ [Engine Pool] ←→ [MATLAB]
(stdio transport, 2-4 engines typical)

Multi-User (SSE)

[Load Balancer]
    ↓
[MCP Server 1] ←→ [Shared Engine Pool] ←→ [MATLAB]
    ↑
[MCP Server 2] ←→ (or separate engines per server)
    ↑
[Agent 1, Agent 2, Agent 3]
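(requires authentication; because sessions and jobs live in each server's memory, the load balancer must route each client consistently to the same server; monitoring critical)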
