feat: multi-MCP provider abstraction#8

Merged
WZ merged 9 commits into main from feature/multi-mcp-abstraction
Mar 9, 2026

WZ commented Mar 9, 2026

Summary

  • Replace single Grafana MCP coupling with a multi-provider architecture supporting N MCP servers
  • Add MultiMcpClient that merges tool lists, routes callTool() by ownership, and provides role-based queries
  • Extract Loki pre-fetch logic into pluggable LogProviderAdapter / GrafanaLokiAdapter
  • Make investigation prompts accept dynamic provider-contributed fragments
  • Full backward compatibility: existing grafana: config key still works

Supported provider combinations (after adding provider configs)

  1. Grafana MCP alone — metrics + logs + dashboards (current default)
  2. Grafana MCP + VictoriaLogs MCP — Grafana for metrics/dashboards, VictoriaLogs for logs
  3. Coroot MCP or other platforms — via new adapters
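
To make combination 2 concrete, here is a minimal sketch of what a `providers[]` list could look like once parsed. Only the `providers[]` key and the role names come from this PR; the `name` and `url` fields, the example URLs, and the `providersForRole` helper are assumptions made for illustration.

```typescript
// Hypothetical shape: only `providers[]` and the role names come from this PR.
type Role = "metrics" | "logs" | "dashboards" | "dependencies";

interface ProviderConfig {
  name: string; // assumed field
  url: string; // assumed field
  roles: Role[];
}

// Combination 2: Grafana for metrics/dashboards, VictoriaLogs for logs.
const providers: ProviderConfig[] = [
  { name: "grafana", url: "http://grafana-mcp.example/mcp", roles: ["metrics", "dashboards"] },
  { name: "victorialogs", url: "http://vlogs-mcp.example/mcp", roles: ["logs"] },
];

// Returns the names of the providers that declare the given role.
function providersForRole(list: ProviderConfig[], role: Role): string[] {
  return list.filter((p) => p.roles.includes(role)).map((p) => p.name);
}
```

With this layout, asking for the `logs` role yields only the VictoriaLogs entry, and a role nobody declares (here `dependencies`) yields an empty list.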

Changes

  • Config: providers[] array with roles enum (metrics, logs, dashboards, dependencies), extensible for future roles like tracing
  • MultiMcpClient: tool merging, routing, collision prefixing, parallel connect/disconnect
  • GrafanaLokiAdapter: extracted from InvestigationAgent.getLokiLabelsHint() and getWorkingLogSelector()
  • Investigation agent: role-aware pre-fetch via hasRole() and adapter calls instead of hardcoded Grafana tool names
  • Prompts: buildLogCorrelationPrompt, buildMetricDeepDivePrompt, buildInfraHealthPrompt accept optional provider fragments
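
The merge-and-route behavior listed above can be sketched roughly as follows. This is not the code in `src/mcp/multi-client.ts`: the `Provider` shape, the synchronous `call`, the tool names, and the rule that every colliding name gets a `<provider>__<tool>` prefix are assumptions. Only the method names `callTool` and `hasRole` and the `__` delimiter appear in this PR.

```typescript
type Role = "metrics" | "logs" | "dashboards" | "dependencies";

// Hypothetical stand-in for one connected MCP server (synchronous for brevity).
interface Provider {
  name: string;
  roles: Role[];
  tools: string[];
  call(tool: string, args: Record<string, unknown>): string;
}

class MultiClientSketch {
  private readonly providers: Provider[];
  // Maps each exposed tool name to its owning provider and original tool name.
  private readonly owners = new Map<string, { provider: Provider; tool: string }>();

  constructor(providers: Provider[]) {
    this.providers = providers;
    // Count how many providers expose each tool name.
    const counts = new Map<string, number>();
    for (const p of providers) {
      for (const t of p.tools) counts.set(t, (counts.get(t) ?? 0) + 1);
    }
    for (const p of providers) {
      for (const t of p.tools) {
        // Assumption: every colliding name is exposed as `<provider>__<tool>`.
        const exposed = (counts.get(t) ?? 0) > 1 ? `${p.name}__${t}` : t;
        this.owners.set(exposed, { provider: p, tool: t });
      }
    }
  }

  listTools(): string[] {
    return Array.from(this.owners.keys());
  }

  hasRole(role: Role): boolean {
    return this.providers.some((p) => p.roles.includes(role));
  }

  // Routes the call to whichever provider owns the exposed name.
  callTool(name: string, args: Record<string, unknown> = {}): string {
    const owner = this.owners.get(name);
    if (!owner) throw new Error(`Unknown tool: ${name}`);
    return owner.provider.call(owner.tool, args);
  }
}

// Two hypothetical providers that both expose a tool called `search`.
const grafanaStub: Provider = {
  name: "grafana",
  roles: ["metrics", "dashboards"],
  tools: ["query_metrics", "search"],
  call: (tool) => `grafana:${tool}`,
};
const vlogsStub: Provider = {
  name: "victorialogs",
  roles: ["logs"],
  tools: ["query_logs", "search"],
  call: (tool) => `victorialogs:${tool}`,
};
const multiClient = new MultiClientSketch([grafanaStub, vlogsStub]);
```

Unique tool names pass through unchanged, so existing prompts that mention them keep working; only the shared `search` name is split into two prefixed entries.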

Test plan

  • 532 tests pass (19 new files, 6 new test files)
  • npx tsc --noEmit clean
  • Backward compat: grafana: config key transforms to providers[] automatically
  • Security scan: no secrets in diff
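
The backward-compat item can be pictured as a small normalization step. The sketch below is plain TypeScript rather than the actual transform in `src/config/schema.ts`. The `RawConfig` fields, the `normalizeProviders` name, and the choice of roles for the legacy entry (taken from the "Grafana MCP alone" combination above) are assumptions.

```typescript
type Role = "metrics" | "logs" | "dashboards" | "dependencies";

interface ProviderConfig {
  name: string; // assumed field
  url: string; // assumed field
  roles: Role[];
}

// Hypothetical raw config: either the legacy `grafana` key or the new `providers` array.
interface RawConfig {
  grafana?: { url: string };
  providers?: ProviderConfig[];
}

// Prefers `providers[]`; otherwise lifts the legacy `grafana` key into a one-entry list.
function normalizeProviders(raw: RawConfig): ProviderConfig[] {
  if (raw.providers && raw.providers.length > 0) return raw.providers;
  if (raw.grafana) {
    // Assumption: the legacy single-Grafana setup covers metrics, logs and dashboards.
    return [{ name: "grafana", url: raw.grafana.url, roles: ["metrics", "logs", "dashboards"] }];
  }
  throw new Error("config must define `providers` or the legacy `grafana` key");
}
```

Downstream code then only ever sees `providers[]`, which is what lets the rest of the codebase drop its Grafana-specific branches.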

🤖 Generated with Claude Code

Wilson Li and others added 6 commits March 8, 2026 23:45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract Loki pre-fetch logic (getLabelsHint, getWorkingLogSelector) from
InvestigationAgent into a standalone GrafanaLokiAdapter that implements
the new LogProviderAdapter interface. This prepares for multi-provider
support where the investigation agent delegates log queries through
adapter instances rather than hardcoding Loki-specific logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
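
A rough sketch of the adapter boundary this commit describes, for orientation. The PR names `LogProviderAdapter`, `TimeWindow`, `getLabelsHint`, and `getWorkingLogSelector`, but not their signatures, so the parameter and return types here are assumptions, and `StubLogAdapter` and `prefetchLogContext` are hypothetical helpers.

```typescript
// Assumed shapes: the PR names these types and methods but not their signatures.
interface TimeWindow {
  from: Date;
  to: Date;
}

interface LogProviderAdapter {
  getLabelsHint(window: TimeWindow): Promise<string>;
  getWorkingLogSelector(service: string, window: TimeWindow): Promise<string | null>;
}

// Hypothetical in-memory adapter standing in for GrafanaLokiAdapter.
class StubLogAdapter implements LogProviderAdapter {
  async getLabelsHint(_window: TimeWindow): Promise<string> {
    return "Available labels: app, namespace";
  }
  async getWorkingLogSelector(service: string, _window: TimeWindow): Promise<string | null> {
    return `{app="${service}"}`;
  }
}

// The agent side depends only on the interface, not on Loki tool names.
async function prefetchLogContext(
  adapter: LogProviderAdapter,
  service: string,
  window: TimeWindow,
): Promise<string> {
  const [hint, selector] = await Promise.all([
    adapter.getLabelsHint(window),
    adapter.getWorkingLogSelector(service, window),
  ]);
  return selector ? `${hint}\nWorking selector: ${selector}` : hint;
}
```

Swapping in a different log backend then means writing another class that implements the same two methods, with no change to the agent-side function.
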
Replace McpClient with MultiMcpClient across the codebase:
- Create factory function (src/mcp/factory.ts) to build MultiMcpClient from config
- Update entrypoints (index.ts, cli.tsx, discover.tsx) to use createMultiMcpClient
- Update agent constructors (core.ts, investigation.ts, discovery.ts) to accept MultiMcpClient
- Update test mocks to include role-query methods (getProvidersByRole, getToolsByRole, hasRole)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 9, 2026 07:26

Copilot AI left a comment

Pull request overview

This PR refactors the codebase from a single Grafana MCP dependency into a multi-provider MCP architecture, enabling multiple MCP servers to contribute tools by role (metrics/logs/dashboards/etc.) while keeping backward compatibility with the legacy grafana: config.

Changes:

  • Introduces MultiMcpClient (tool merging, collision handling, role queries) plus a factory to build it from config providers.
  • Extracts Loki prefetch logic into a pluggable LogProviderAdapter with a concrete GrafanaLokiAdapter.
  • Updates agents/entrypoints/prompts/config schema & tests to be provider/role-aware and support provider-contributed prompt fragments.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/mcp/multi-client.ts | New multi-provider MCP router (tool merge, routing, role queries). |
| src/mcp/multi-client.test.ts | Tests for tool merging/routing/collision behavior and role queries. |
| src/mcp/factory.ts | Factory to create MultiMcpClient from config.providers. |
| src/mcp/adapters/types.ts | Introduces LogProviderAdapter interface and TimeWindow type. |
| src/mcp/adapters/grafana-loki.ts | Moves Loki label/selector prefetch logic into an adapter. |
| src/mcp/adapters/grafana-loki.test.ts | Tests adapter behaviors (labels hint, selector probing, prompt fragment). |
| src/config/schema.ts | Adds providers[] + roles with backward-compat transform from grafana. |
| src/config/schema.test.ts | Tests providers array validation + backward compat behavior. |
| src/index.ts | Switches runtime MCP wiring to createMultiMcpClient(config). |
| src/discover.tsx | Switches discovery CLI wiring to createMultiMcpClient(config). |
| src/cli.tsx | Switches interactive CLI MCP wiring to createMultiMcpClient(config). |
| src/agent/investigation.ts | Uses roles/adapters for prefetch + prompt fragments; dynamic excluded tools. |
| src/agent/investigation.test.ts | Updates mocks/types for MultiMcpClient. |
| src/agent/discovery.ts | Updates MCP type to MultiMcpClient. |
| src/agent/discovery.test.ts | Updates mocks/types for MultiMcpClient. |
| src/agent/core.ts | Updates MCP type to MultiMcpClient. |
| src/agent/core.test.ts | Updates mocks/types for MultiMcpClient. |
| src/agent/rca-prompts.ts | Adds optional provider fragments to phase prompt builders. |
| src/agent/rca-prompts.test.ts | Tests provider fragment override + default fallback behavior. |
| docs/plans/2026-03-08-web-gui-plan.md | Adds a web GUI implementation plan document. |
| docs/plans/2026-03-08-multi-mcp-implementation.md | Adds multi-MCP implementation plan document. |
| docs/plans/2026-03-08-multi-mcp-abstraction-design.md | Adds multi-MCP design document. |

Comment thread src/agent/investigation.ts Outdated
Comment thread src/mcp/multi-client.ts
Comment thread src/mcp/multi-client.ts Outdated
Comment thread src/config/schema.ts
Comment thread src/agent/investigation.ts
Comment thread src/agent/rca-prompts.ts Outdated
Comment thread src/mcp/adapters/grafana-loki.ts Outdated
Comment thread src/mcp/adapters/grafana-loki.ts Outdated
Comment thread docs/plans/2026-03-08-web-gui-plan.md Outdated
Wilson Li and others added 2 commits March 9, 2026 00:51
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WZ force-pushed the feature/multi-mcp-abstraction branch from 21aa96e to 030cc30 Compare March 9, 2026 08:38
- Use `__` delimiter for tool collision prefixing (OpenAI charset compliance)
- Add provider name regex validation and uniqueness check
- Use Promise.allSettled with rollback on partial connect failure
- Detect Grafana Loki by tool availability, not provider name
- Fix log prompt header to be generic when custom fragment provided
- Add LogQL value/regex escaping for service names
- Parse JSON for probe success check instead of brittle string matching
- Handle collision-prefixed tool names in excludedTools filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
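
For readers unfamiliar with the "Promise.allSettled with rollback" item above, one way such a connect step might look is sketched here. The `Connectable` interface and `connectAll` function are hypothetical, and the exact rollback policy in the PR (what gets disconnected, what gets thrown) is not shown on this page, so treat the details as assumptions.

```typescript
// Hypothetical minimal surface of a per-provider MCP client.
interface Connectable {
  name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
}

// Connects all providers in parallel. If any connect fails, disconnects the
// ones that succeeded and throws an error naming the failed providers.
async function connectAll(providers: Connectable[]): Promise<void> {
  const results = await Promise.allSettled(providers.map((p) => p.connect()));

  const connected: Connectable[] = [];
  const failed: string[] = [];
  providers.forEach((p, i) => {
    const result = results[i];
    if (result && result.status === "fulfilled") connected.push(p);
    else failed.push(p.name);
  });
  if (failed.length === 0) return;

  // Roll back: best-effort disconnect of providers that did connect.
  await Promise.allSettled(connected.map((p) => p.disconnect()));
  throw new Error(`Failed to connect: ${failed.join(", ")}`);
}
```

Unlike `Promise.all`, `Promise.allSettled` waits for every attempt to finish, so the rollback sees the full picture instead of racing against connects that are still in flight.
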
WZ merged commit 6e268ec into main Mar 9, 2026
1 check passed
WZ added a commit that referenced this pull request Apr 2, 2026
- Use `__` delimiter for tool collision prefixing (OpenAI charset compliance)
- Add provider name regex validation and uniqueness check
- Use Promise.allSettled with rollback on partial connect failure
- Detect Grafana Loki by tool availability, not provider name
- Fix log prompt header to be generic when custom fragment provided
- Add LogQL value/regex escaping for service names
- Parse JSON for probe success check instead of brittle string matching
- Handle collision-prefixed tool names in excludedTools filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
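
The "LogQL value/regex escaping" item refers to the two layers of escaping a service name needs before it is spliced into a selector. A minimal sketch, assuming double-quoted LogQL string literals; the function names and the `app` label used in the notes are illustrative and not taken from the adapter.

```typescript
// Escapes a value for use inside a double-quoted LogQL string literal.
function escapeLogqlString(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Escapes regex metacharacters so the value matches literally in a `=~` matcher.
function escapeRegex(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds an exact-match selector such as {app="checkout"}.
function exactSelector(label: string, service: string): string {
  return `{${label}="${escapeLogqlString(service)}"}`;
}

// Builds a substring-match selector. The regex is escaped first, then the
// whole pattern is escaped again as a string literal.
function containsSelector(label: string, service: string): string {
  const pattern = ".*" + escapeRegex(service) + ".*";
  return `{${label}=~"${escapeLogqlString(pattern)}"}`;
}
```

The order matters in `containsSelector`: regex escaping introduces backslashes, and those backslashes must themselves be doubled to survive the string-literal layer.
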
WZ added a commit that referenced this pull request Apr 2, 2026
feat: multi-MCP provider abstraction
WZ added a commit that referenced this pull request Apr 17, 2026
B-1 (issue #8): Skip ServiceHealthPoller start for stacks with no viable
metrics provider at boot/create time. Stacks with 3 registered providers
but toolCount:0 at runtime used to log-spam "metric query tool not found"
every 10s. ProviderRegistry now emits change events (add/update/remove/
test); StackManager subscribes and auto-starts a previously-skipped poller
once a working metrics provider shows up — no server restart needed.
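
The gate plus event-driven restart in B-1 can be approximated as below. `RegistrySketch` and `StackSketch` are hypothetical stand-ins for ProviderRegistry and StackManager. The `upsert` method, the tool-count bookkeeping, and the boolean poller flag are assumptions made to keep the sketch small.

```typescript
type ChangeEvent = "add" | "update" | "remove" | "test";

// Hypothetical registry: tracks tool counts for metrics providers and notifies listeners.
class RegistrySketch {
  private readonly listeners: Array<(event: ChangeEvent) => void> = [];
  private readonly metricsToolCounts = new Map<string, number>();

  onChange(listener: (event: ChangeEvent) => void): void {
    this.listeners.push(listener);
  }

  upsert(name: string, toolCount: number): void {
    const event: ChangeEvent = this.metricsToolCounts.has(name) ? "update" : "add";
    this.metricsToolCounts.set(name, toolCount);
    this.listeners.forEach((listener) => listener(event));
  }

  // A provider is treated as viable only if it exposes at least one tool.
  hasViableMetricsProvider(): boolean {
    return Array.from(this.metricsToolCounts.values()).some((count) => count > 0);
  }
}

// Hypothetical stack: starts its poller only once a viable metrics provider exists.
class StackSketch {
  pollerRunning = false;

  constructor(registry: RegistrySketch) {
    const maybeStart = (): void => {
      if (!this.pollerRunning && registry.hasViableMetricsProvider()) {
        this.pollerRunning = true; // stands in for starting ServiceHealthPoller
      }
    };
    maybeStart(); // gate at boot/create time
    registry.onChange(maybeStart); // re-check whenever the registry changes
  }
}
```

A stack created while every provider reports zero tools stays idle, and flips to running on the first registry change that makes a metrics provider viable.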

B-2: Add last_active_at / inactive_at / deleted_at columns to the stacks
table via idempotent ALTER (guarded by PRAGMA table_info). Activity bumps
fire on any /api request, successful poll cycle, or stack-scoped webhook.
A background reaper marks stacks inactive after 30 days and soft-deletes
after 60 days (default stack is exempt). Soft-deleted stacks stay in the
DB for audit but are filtered from getStack / getStackBySlug / listStacks;
listStacksIncludingDeleted is the escape hatch. The explicit DELETE route
remains a hard delete, distinct from TTL soft-delete.

Tests: 21 new tests covering migration idempotency, the gate + event-
driven re-start, the TTL transitions, and listing filters. Migration was
dry-run against a copy of the real shared DB to confirm no corruption.
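
The TTL transitions in B-2 reduce to a small pure function. In the sketch below, the `StackRow` fields, the `classifyStack` name, and the `>=` boundaries are hypothetical, and both thresholds are assumed to be measured from the last activity timestamp, which the commit message does not spell out.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical projection of a row in the stacks table.
interface StackRow {
  slug: string;
  isDefault: boolean;
  lastActiveAt: number; // epoch milliseconds
}

type Lifecycle = "active" | "inactive" | "soft-deleted";

// Classifies a stack by idle time. Assumption: both thresholds count from
// the last activity, and the default stack is always exempt.
function classifyStack(stack: StackRow, now: number): Lifecycle {
  if (stack.isDefault) return "active";
  const idleMs = now - stack.lastActiveAt;
  if (idleMs >= 60 * DAY_MS) return "soft-deleted";
  if (idleMs >= 30 * DAY_MS) return "inactive";
  return "active";
}

// Listing helper mirroring the "filtered from listStacks" behavior.
function listVisible(stacks: StackRow[], now: number): StackRow[] {
  return stacks.filter((s) => classifyStack(s, now) !== "soft-deleted");
}
```

Keeping the classification separate from the database write makes the 30-day and 60-day boundaries easy to unit test with a fixed `now`.
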