feat: v2.0.0 — MQTT events, agent hardening, and JSON contract by chenliuyun · Pull Request #6 · OpenWonderLabs/switchbot-openapi-cli

chenliuyun · 2026-04-19T09:07:26Z

Summary

This PR bumps 1.3.2 → 2.0.0 and lands the Phase A–I improvements plus a set of contract fixes (Phases 1–5) discovered during review.

Breaking changes

Top-level JSON envelope — every --json response is now {schemaVersion:'1.1', data:...} or {schemaVersion:'1.1', error:...}. Consumers that read parsed.foo must now read parsed.data.foo (or parsed.error on failure).
batch.failed[].error shape — string → ErrorPayload object. Read .message for the old string content; use .transient / .retryAfterMs for retry decisions.
HTTP MCP default bind — 0.0.0.0 → 127.0.0.1. Pass --bind 0.0.0.0 --auth-token <token> to restore external reachability.

New features (Phases A–I, already on branch since v1.3.2)

Phase A — HTTP auth (Bearer token), safe-by-default bind, CORS, rate limiting
Phase B — Idempotency keys end-to-end (CLI + batch + MCP)
Phase C — MQTT client + EventSubscriptionManager infrastructure
Phase D — Richer error payloads (kind, transient, retryAfterMs, errorClass)
Phase F — account_overview MCP tool for agent cold-start
Phase G — Health (/healthz, /ready), metrics (/metrics), structured logging (pino)
Phase H — Docker + systemd deployment artifacts
Phase I — Tool descriptions, schema-versioning docs, agent guide

Contract fixes (this fixup wave)

Phase 1 — Per-request profile routing via AsyncLocalStorage — multi-tenant HTTP now actually routes each request to the correct SwitchBot account.
Phase 2 — Top-level {schemaVersion, data|error} envelope (the breaking change). All ~20 printJson callsites now wrap automatically.
Phase 3 — EventSubscriptionManager properly initialized from env vars (SWITCHBOT_MQTT_*); /ready returns 503 + reason:'mqtt disabled' when MQTT creds absent; /metrics adds switchbot_mqtt_state{state=...} gauge; switchbot://events MCP resource registered.
Phase 4 — Dead scheduleStableEvent timer removed; swallowed JSON parse errors in MQTT shadow handler replaced with log.debug; no-op try/rethrow removed.
Phase 5 — Type safety (NodeJS.ErrnoException, Device shape); new tests for IdempotencyCache, logger, EventSubscriptionManager defaults, /ready + /metrics health endpoints.

Migration guide (consumers of the JSON output)

| Before (≤ 1.3.2) | After (2.0.0) |
| --- | --- |
| `parsed.foo` | `parsed.data.foo` |
| `parsed.error` (on stderr) | `parsed.error.message` etc. |
| `parsed.failed[i].error` (string) | `parsed.data.failed[i].error.message` |
| `mcp serve` binds `0.0.0.0` | `mcp serve` binds `127.0.0.1`; add `--bind 0.0.0.0 --auth-token $T` |
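A minimal consumer-side sketch of the migration, assuming the envelope shapes described above. The `unwrap` helper and the `ErrorPayload` field list are illustrative, not part of the CLI's exports.

```typescript
// Hypothetical consumer-side types for the 2.0.0 --json envelope.
interface ErrorPayload {
  message: string;
  kind?: string;
  transient?: boolean;
  retryAfterMs?: number;
  errorClass?: string;
}

type Envelope<T> =
  | { schemaVersion: string; data: T }
  | { schemaVersion: string; error: ErrorPayload };

// Parse one --json document: return `data` on success, throw with
// error.message on failure.
function unwrap<T>(raw: string): T {
  const parsed = JSON.parse(raw) as Envelope<T>;
  if ("error" in parsed) {
    throw new Error(parsed.error.message);
  }
  return parsed.data;
}
```

With this helper, code that previously read `parsed.foo` becomes `unwrap<{ foo: string }>(stdout).foo`, and failures surface as a thrown `Error` carrying `error.message`.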

Test plan

npm run build — clean TypeScript compile, zero errors
npm test — 685 tests pass (41 test files)
All pre-existing tests updated for envelope (parsed.data.*)
New test file tests/commands/mcp-http-health.test.ts — /ready 503, /metrics state gauge, EventSubscriptionManager defaults
New test files tests/lib/idempotency.test.ts, tests/logger.test.ts

🤖 Generated with Claude Code

…ting
- MCP HTTP now binds 127.0.0.1 by default (not 0.0.0.0)
- Add --bind <host> flag to override (an external bind requires --auth-token)
- Add --auth-token <token> flag for Bearer auth (fallback: SWITCHBOT_MCP_TOKEN env)
- Add --cors-origin <url> flag (repeatable) for CORS preflight
- Add --rate-limit <n> flag (default 60 req/min) per profile
- Constant-time token comparison to prevent timing attacks
- Graceful shutdown on SIGTERM/SIGINT with 30s drain timeout
- Startup log now reports the actual bind address (e.g. 'listening on http://127.0.0.1:3030/mcp')
- All tests pass (659/659)
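The constant-time comparison can be sketched with Node's `crypto.timingSafeEqual`. This is an illustration, not necessarily the PR's implementation; `tokensMatch`, `bearerToken`, and `isAuthorized` are hypothetical names. Hashing both tokens first is one common way to satisfy `timingSafeEqual`'s equal-length requirement without leaking the expected token's length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare a presented token against the configured one without revealing
// where the first mismatching byte is. SHA-256 digests are always 32 bytes,
// so timingSafeEqual (which throws on a length mismatch) is always safe to call.
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}

// Extract the token from an "Authorization: Bearer <token>" header value.
function bearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(.+)$/i.exec(header);
  return match?.[1]?.trim() || null;
}

// Hypothetical request gate: a missing or non-Bearer header is rejected.
function isAuthorized(header: string | undefined, expected: string): boolean {
  const token = bearerToken(header);
  return token !== null && tokensMatch(token, expected);
}
```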

- New src/lib/idempotency.ts with LRU cache (1024 entries, 60s TTL)
- Modify executeCommand() to accept optional { idempotencyKey } param
- Thread cache through idempotencyCache.run() for transparent dedup
- No key = always execute (backward compat)
- Expired/new keys trigger fresh execution and cache update
- All tests pass (659/659)
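A sketch of the dedup semantics described above (LRU, 1024 entries, 60s TTL, no key means always execute). `IdempotencyCacheSketch` is a hypothetical stand-in for the class in src/lib/idempotency.ts, whose internals may differ. Storing the in-flight promise is an assumed design choice that makes concurrent same-key calls share one execution.

```typescript
// Hypothetical LRU + TTL idempotency cache. A Map iterates in insertion
// order, so re-inserting on every hit keeps the least recently used key first.
class IdempotencyCacheSketch {
  private entries = new Map<string, { expiresAt: number; result: Promise<unknown> }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries = 1024, ttlMs = 60_000, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  run<T>(key: string | undefined, fn: () => Promise<T>): Promise<T> {
    if (key === undefined) return fn(); // no key: always execute
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      this.entries.delete(key); // refresh recency
      this.entries.set(key, hit);
      return hit.result as Promise<T>;
    }
    this.entries.delete(key); // drop an expired entry, if any
    const result = fn();
    this.entries.set(key, { expiresAt: this.now() + this.ttlMs, result });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value; // least recently used
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return result;
  }

  clear(): void {
    this.entries.clear();
  }
}
```

Under this design a rejected promise is also cached, so a failed command is replayed until its key expires; evicting the entry on rejection is a reasonable alternative.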

…ency-key-prefix integration

Thread idempotency keys through the CLI interface:
- devices command: add --idempotency-key <key> to safely retry a single command
- devices batch: add --idempotency-key-prefix <prefix> to derive per-device keys

Examples:
switchbot devices command BOT1 turnOn --idempotency-key abc123
switchbot devices batch turnOn --ids A,B,C --idempotency-key-prefix batch-001

All 659 tests passing. Backward compatible: idempotency is opt-in.
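The commit does not spell out how per-device keys are derived from the prefix. A hypothetical scheme, assuming a `prefix:deviceId` format, shows why the derivation should be deterministic: rerunning the same batch with the same prefix reproduces the same keys, so devices that already received the command within the TTL are deduplicated instead of receiving it twice.

```typescript
// Hypothetical derivation; the real separator/format is not documented here.
function perDeviceKey(prefix: string, deviceId: string): string {
  return `${prefix}:${deviceId}`;
}

// Map each device ID in a batch to its derived idempotency key.
function batchKeys(prefix: string, ids: string[]): Map<string, string> {
  return new Map(ids.map((id): [string, string] => [id, perDeviceKey(prefix, id)]));
}
```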

…ructure

Lay foundation for real-time event streaming:
- src/mqtt/client.ts: new MQTT client with reconnect logic, auth refresh callbacks, and state management (connecting/connected/reconnecting/failed)
- src/mcp/events-subscription.ts: event subscription manager with ring buffer (1000 events), overflow detection, per-subscriber filtering, idle cleanup
- src/commands/mcp.ts: integrate shared EventSubscriptionManager into HTTP serve mode, with graceful shutdown

Features:
- Auth refresh callbacks on reconnect failure for cert rotation scenarios
- Synthetic events for overflow notices (events.dropped) and reconnection (events.reconnected)
- Per-subscriber event filtering using existing filter grammar
- Idle subscriber cleanup after 10 minutes
- Exponential backoff for reconnection (1s, 2s, 4s, ..., capped at 30s)

Note: MQTT credential resolution still TBD, awaiting SwitchBot MQTT endpoint documentation.

All 659 tests passing. Foundation ready for event streaming integration.
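The reconnect schedule and the bounded event buffer can be sketched as follows. Both are illustrations under assumptions: `reconnectDelayMs` and `EventRing` are hypothetical names, and counting overwritten events in `dropped` is one plausible way to drive the synthetic `events.dropped` notice.

```typescript
// Reconnect delay for attempt 0, 1, 2, ...: 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Fixed-capacity ring buffer that overwrites the oldest event when full and
// counts how many events were lost that way.
class EventRing<T> {
  private buf: (T | undefined)[];
  private start = 0; // index of the oldest retained event
  private size = 0;
  dropped = 0;

  constructor(capacity = 1000) {
    this.buf = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const cap = this.buf.length;
    if (this.size < cap) {
      this.buf[(this.start + this.size) % cap] = item;
      this.size++;
    } else {
      this.buf[this.start] = item; // overwrite the oldest event
      this.start = (this.start + 1) % cap;
      this.dropped++;
    }
  }

  // The most recent `limit` events, oldest first.
  recent(limit = this.size): T[] {
    const n = Math.min(limit, this.size);
    const out: T[] = [];
    for (let i = this.size - n; i < this.size; i++) {
      out.push(this.buf[(this.start + i) % this.buf.length] as T);
    }
    return out;
  }
}
```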

Add detailed error information to help agents make intelligent retry decisions:
- ErrorPayload: new fields retryAfterMs, transient, errorClass
- ApiError: track Retry-After header value and classify transience
- batch command: failed[] now returns {deviceId, error: ErrorPayload} instead of {deviceId, error: string}
- schemaVersion bumped to "1.1" (backward-compatible additive change)

Error classification:
- transient: true for 429, 5xx, connection timeouts (can retry)
- errorClass: network|api|device-offline|device-busy|guard|usage
- retryAfterMs: parsed from Retry-After header when available

All 659 tests passing. Agents can now examine error.errorClass to branch on error type and use retryAfterMs to determine backoff.
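A hypothetical agent-side retry policy built on these fields. The `nextRetryDelayMs` and `retryableDeviceIds` helpers, the attempt cap, and the fallback backoff constants are assumptions, not CLI behavior.

```typescript
interface ErrorPayload {
  message: string;
  transient?: boolean;
  retryAfterMs?: number;
  errorClass?: "network" | "api" | "device-offline" | "device-busy" | "guard" | "usage";
}

// Delay before the next attempt, or null when the caller should stop.
function nextRetryDelayMs(err: ErrorPayload, attempt: number, maxAttempts = 5): number | null {
  if (!err.transient || attempt >= maxAttempts) return null; // not retryable
  if (err.retryAfterMs !== undefined) return err.retryAfterMs; // honor Retry-After
  return Math.min(30_000, 500 * 2 ** attempt); // assumed fallback backoff
}

// From a batch result's failed[] list, pick only the devices worth retrying.
function retryableDeviceIds(failed: { deviceId: string; error: ErrorPayload }[]): string[] {
  return failed.filter((f) => f.error.transient === true).map((f) => f.deviceId);
}
```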

Add account_overview MCP tool and CLI command for bootstrap initialization:
- Bundles: device list, IR remotes, scenes, quota usage, cache status, MQTT state
- Single call replaces: list_devices + list_scenes + quota status + cache show
- Includes MQTT connection state in HTTP mode (eventManager.getState())
- schemaVersion 1.1, version 1.7.0 in response

Useful for:
- Agent cold-start (one call to understand account state)
- Periodic health checks (cache age, quota, MQTT connection)
- Integration debugging

All 659 tests passing.

Add observability infrastructure for production monitoring:
- src/logger.ts: pino logger factory (LOG_LEVEL, LOG_FORMAT env vars)
- /healthz endpoint: always 200, returns {ok, version, pid, uptimeSec}
- /ready endpoint: 200 when MQTT connected, 503 otherwise
- /metrics endpoint: Prometheus text format (0.0.4) with gauges:
  - switchbot_mqtt_connected
  - switchbot_mqtt_subscribers
  - process_uptime_seconds

No debug logging added yet (deferred to Phase G part 2 when needed). Health endpoints bypass auth/rate limiting for orchestrator liveness probes.

All 659 tests passing.
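The /metrics output can be sketched as plain Prometheus text exposition (served with `Content-Type: text/plain; version=0.0.4`). The gauge names come from the commit; the HELP strings and the `renderGauges`/`healthMetrics` helpers are illustrative.

```typescript
interface GaugeSample {
  name: string;
  help: string;
  value: number;
}

// Render gauges in the text exposition format: HELP and TYPE lines, then the
// sample. Every line, including the last, ends with "\n".
function renderGauges(samples: GaugeSample[]): string {
  const lines: string[] = [];
  for (const s of samples) {
    lines.push(`# HELP ${s.name} ${s.help}`);
    lines.push(`# TYPE ${s.name} gauge`);
    lines.push(`${s.name} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}

// Hypothetical assembly of the three gauges named in the commit message.
function healthMetrics(mqttConnected: boolean, subscribers: number): string {
  return renderGauges([
    { name: "switchbot_mqtt_connected", help: "1 if the MQTT client is connected", value: mqttConnected ? 1 : 0 },
    { name: "switchbot_mqtt_subscribers", help: "Active event subscribers", value: subscribers },
    { name: "process_uptime_seconds", help: "Process uptime in seconds", value: Math.floor(process.uptime()) },
  ]);
}
```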

Add production deployment files:
- Dockerfile: multi-stage build, Node 20-alpine, unprivileged user (10001), healthcheck
- docker-compose.example.yml: example setup with env vars, healthcheck
- contrib/systemd/switchbot-mcp.service: systemd unit with hardening (ProtectSystem, PrivateTmp)

Usage:
docker build -t switchbot:1.7 .
docker-compose --env-file .env up

Or systemd:
sudo cp contrib/systemd/switchbot-mcp.service /etc/systemd/system/
sudo systemctl enable --now switchbot-mcp

All 659 tests passing.

Improve agent developer experience with richer documentation:
- Upgraded tool descriptions for send_command and list_devices (120+ chars with context)
- docs/schema-versioning.md: explains v1→v1.1 backward compatibility and migration path
- Clarified that schemaVersion "1.1" is backward-compatible with "1" parsers

Schema versioning policy:
- Additive changes (new optional fields) → minor bump (1.1, 1.2, ...)
- Breaking changes → major bump (2.0)
- Parsers pinning "1" continue to work on 1.1+ (backward-compatible)
- Migration guide included for v1.6 → v1.7 (batch error payload change)

All 659 tests passing.
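The rule that parsers pinning "1" keep working on 1.1+ amounts to a major-version comparison. A hypothetical parser-side check, assuming dotted numeric version strings:

```typescript
// Accept any schemaVersion whose major component equals the pinned major:
// "1", "1.1", "1.2.3" pass for pinnedMajor 1; "2.0" and malformed strings fail.
function isCompatibleSchema(schemaVersion: string, pinnedMajor: number): boolean {
  const m = /^(\d+)(?:\.\d+)*$/.exec(schemaVersion);
  return m !== null && Number(m[1]) === pinnedMajor;
}
```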

Bumps package.json 1.7.0 → 2.0.0 and refreshes the hard-coded version strings inside the MCP server, /healthz, /ready, and account_overview.

Adds tsconfig.build.json (sourceMap:false, declaration:false) plus a build:prod + clean + prepublishOnly pipeline so the published tarball drops .js.map and .d.ts files. Result against the prior build:
- package size: 140.2 kB → 83.0 kB (−41%)
- unpacked: 622.7 kB → 328.1 kB (−47%)
- files: 144 → 45

A CLI binary has no consumers that import its types or need shipped source maps; local dev still emits both via the default tsc target.

Version 2.0.0 is the first npm release after 1.3.2 and carries three breaking changes that land over the following commits: the JSON envelope with top-level schemaVersion, the batch.failed[].error shape change from string to object, and the HTTP MCP default bind flipped to 127.0.0.1.

Previously, HTTP MCP requests extracted x-switchbot-profile / ?profile but used the value only as a rate-limit bucket key. Every tool call then resolved credentials via the process-global --profile flag in loadConfig(), so multi-tenant HTTP deployments silently collapsed all traffic onto the default account.

This change introduces src/lib/request-context.ts, a tiny AsyncLocalStorage wrapper with withRequestContext() and getActiveProfile(). loadConfig() and configFilePath() now read the active profile via getActiveProfile(), which prefers the ALS context and falls back to the CLI flag when no HTTP context is active. The HTTP handler wraps each request in withRequestContext so tool calls land in the right account.

The handler also rejects unknown profiles with 401 before entering MCP dispatch. This closes off probing for valid profile names and gives agents a clear error instead of a confusing credentials-missing exit.

Stdio mode is unchanged: there is no request context, so getActiveProfile() goes straight to the flag lookup.

Tests: tests/lib/request-context.test.ts covers concurrent isolation, nested contexts, and flag fallback.
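The request-context pattern can be sketched with Node's `AsyncLocalStorage`. The names `withRequestContext` and `getActiveProfile` come from the commit message; their signatures here, and passing the flag value in as a parameter, are assumptions.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  profile?: string;
}

const storage = new AsyncLocalStorage<RequestContext>();

// Run `fn` (and every async continuation it spawns) with `ctx` as the active context.
function withRequestContext<T>(ctx: RequestContext, fn: () => T): T {
  return storage.run(ctx, fn);
}

// Prefer the profile bound to the current HTTP request; otherwise fall back
// to the process-wide --profile flag (stdio mode has no request context).
function getActiveProfile(flagProfile?: string): string | undefined {
  return storage.getStore()?.profile ?? flagProfile;
}
```

Because the store follows each async call chain rather than living in a module-level variable, two overlapping requests each see their own profile even when their awaits interleave.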

…lope

Every --json response now emits {schemaVersion:'1.1', data:...} on success and {schemaVersion:'1.1', error:...} on failure, fulfilling the contract documented in docs/schema-versioning.md.
- src/utils/output.ts: printJson wraps payload in {schemaVersion, data}; handleError JSON branch wraps in {schemaVersion, error}
- src/commands/capabilities.ts: switch raw console.log to printJson
- src/commands/schema.ts: drop non-json-mode raw branch, always use printJson
- docs/schema-versioning.md: add envelope shape examples, migration guide from v1.x, note that batch.summary.schemaVersion is the historical nested location kept for back-compat
- All test files updated to unwrap .data (success) or .error (failure) from the parsed envelope
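On the producer side, the wrapping amounts to the following sketch. `envelopeData`, `envelopeError`, and the `SCHEMA_VERSION` constant are hypothetical; only the output shape is taken from the commit, and the body of `printJson` here is illustrative.

```typescript
const SCHEMA_VERSION = "1.1";

// Success: the command's payload goes under `data`.
function envelopeData(payload: unknown): string {
  return JSON.stringify({ schemaVersion: SCHEMA_VERSION, data: payload });
}

// Failure: the error payload goes under `error`; there is no `data` key.
function envelopeError(error: { message: string } & Record<string, unknown>): string {
  return JSON.stringify({ schemaVersion: SCHEMA_VERSION, error });
}

// Every call site that used to print raw JSON gets the envelope for free.
function printJson(payload: unknown): void {
  console.log(envelopeData(payload));
}
```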

…//events resource
- src/mqtt/client.ts: add 'disabled' to MqttState
- src/mqtt/credential.ts: new file; resolves MQTT config from SWITCHBOT_MQTT_HOST / USERNAME / PASSWORD env vars and returns null when any are absent
- src/mcp/events-subscription.ts: getState() returns 'disabled' (not 'idle') when there is no client; add getRecentEvents(limit) to expose the ring buffer for MCP resource reads
- src/commands/mcp.ts:
  - import getMqttConfig and call eventManager.initialize() on startup if creds are present; log a warning and leave the manager disabled if not
  - remove dead mqttInitialized variable
  - /ready: returns 503 + {ready:false, reason:'mqtt disabled', mqtt:'disabled'} when MQTT is not configured; 503 + reason:'mqtt failed' on failure
  - /metrics: add switchbot_mqtt_state{state=...} gauge (one per state) so dashboards can distinguish disabled/connecting/connected/failed
  - register switchbot://events MCP resource backed by the ring buffer; returns {state, count, events[]} snapshot when read
  - add resources:{} to server capabilities
- tests/commands/mcp-http-health.test.ts: new file covering /ready 503 + reason, /metrics state gauge, and EventSubscriptionManager defaults
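The env-var resolution, the /ready decision, and the per-state gauge can be sketched together. All three functions are hypothetical; the state list combines the states named in this PR, and the `mqtt <state>` reason for transitional states is an assumption (the commit only specifies the disabled and failed reasons).

```typescript
interface MqttConfig {
  host: string;
  username: string;
  password: string;
}

type MqttState = "disabled" | "connecting" | "connected" | "reconnecting" | "failed";

// Null unless all three variables are set, e.g. getMqttConfigFrom(process.env).
function getMqttConfigFrom(env: Record<string, string | undefined>): MqttConfig | null {
  const host = env["SWITCHBOT_MQTT_HOST"];
  const username = env["SWITCHBOT_MQTT_USERNAME"];
  const password = env["SWITCHBOT_MQTT_PASSWORD"];
  if (!host || !username || !password) return null;
  return { host, username, password };
}

// /ready: 200 only when connected; 503 with a reason otherwise.
function readiness(state: MqttState): { status: number; body: Record<string, unknown> } {
  if (state === "connected") return { status: 200, body: { ready: true, mqtt: state } };
  const reason =
    state === "disabled" ? "mqtt disabled" : state === "failed" ? "mqtt failed" : `mqtt ${state}`;
  return { status: 503, body: { ready: false, reason, mqtt: state } };
}

// One sample per state: 1 for the current state, 0 for all others.
function stateGauge(current: MqttState): string {
  const states: MqttState[] = ["disabled", "connecting", "connected", "reconnecting", "failed"];
  return states
    .map((s) => `switchbot_mqtt_state{state="${s}"} ${s === current ? 1 : 0}`)
    .join("\n") + "\n";
}
```

Emitting every state with a 0/1 value, rather than a single sample whose label changes, keeps each time series continuous, which is what lets a dashboard tell "disabled" apart from "failed".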

- src/mqtt/client.ts: delete scheduleStableEvent() and its caller in onConnect(); the timer body only nulled itself and never emitted anything. Also remove the unused stableThresholdMs field.
- src/mcp/events-subscription.ts: replace empty catch {} with log.debug({err, topic}, ...) so JSON parse failures on shadow payloads are visible at debug level instead of silently discarded; simplify the no-op try/rethrow in subscribe() to a direct parseFilter() call.
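The replacement for the empty `catch {}` follows this pattern. `Logger` is a minimal stand-in for a pino logger (pino's level methods accept a merging object followed by a message string); `parseShadowPayload` is a hypothetical name.

```typescript
interface Logger {
  debug(obj: object, msg: string): void;
}

// Parse an MQTT shadow payload. A malformed payload is logged at debug level
// (with the error and topic attached) and skipped instead of vanishing silently.
function parseShadowPayload(raw: string, topic: string, log: Logger): unknown {
  try {
    return JSON.parse(raw);
  } catch (err) {
    log.debug({ err, topic }, "failed to parse MQTT shadow payload");
    return undefined;
  }
}
```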

…qtt and events

Type safety:
- src/mqtt/client.ts: replace (err as any).code with (err as NodeJS.ErrnoException).code
- src/mcp/events-subscription.ts: import Device type and construct a Device-compatible shape instead of casting a partial object as any

New tests:
- tests/lib/idempotency.test.ts: LRU eviction, TTL expiry, concurrent same-key behavior, undefined-key passthrough, clear()
- tests/logger.test.ts: LOG_LEVEL=warn silences debug; LOG_LEVEL=debug enables it; setLogLevel/getLogLevel roundtrip

The v2.4.0 release notes claimed "MCP tools mirror the tier in meta.agentSafetyTier" but only aggregate_device_history (added in 2.5.0 work) actually exposed it. This fix adds _meta: { agentSafetyTier: <tier> } to the 10 other MCP tool registrations, matching their CLI safety tiers from COMMAND_META:
- list_devices, get_device_status, get_device_history, query_device_history, list_scenes, search_catalog, describe_device, account_overview: read
- send_command, run_scene: action

Also adds tests/mcp/tool-meta.test.ts to verify that every tool has _meta and to spot-check that key tiers match expected values. Fixes bug #6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Document every fix landed in this branch beyond the history-aggregate feature: bugs #1, #4, #5, #6, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18 from the OpenClaw v2.4.0 smoke-test report. Call out the deferred items (#2, #7) explicitly so readers don't assume they were overlooked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chenliuyun added 16 commits April 19, 2026 15:53

docs: add GitHub Releases link in README (46e8709)

chenliuyun merged commit f9e4ca3 into main Apr 19, 2026
3 checks passed

chenliuyun deleted the feat/agent-hardening branch April 19, 2026 09:19

chenliuyun mentioned this pull request Apr 20, 2026

feat: 2.5.0 — history aggregate + v2.4.0 report bug fixes #19

Merged
