feat: multi-MCP provider abstraction #8
Merged
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract Loki pre-fetch logic (getLabelsHint, getWorkingLogSelector) from InvestigationAgent into a standalone GrafanaLokiAdapter that implements the new LogProviderAdapter interface. This prepares for multi-provider support where the investigation agent delegates log queries through adapter instances rather than hardcoding Loki-specific logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
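To make the adapter idea in this commit concrete, here is a minimal sketch of what such an interface and a Loki-flavoured implementation might look like. Only the names `LogProviderAdapter`, `TimeWindow`, `getLabelsHint` and `getWorkingLogSelector` come from the PR; the method signatures, the `ToolCaller` type, the tool names, the argument names and the candidate label list are assumptions made for illustration.

```typescript
// Hypothetical shapes: the PR names these types and methods but does not show their signatures.
interface TimeWindow {
  from: Date;
  to: Date;
}

interface LogProviderAdapter {
  getLabelsHint(): Promise<string>;
  getWorkingLogSelector(service: string, window: TimeWindow): Promise<string | null>;
}

// Assumed tool-call signature; the real MCP client API may differ.
type ToolCaller = (tool: string, args: Record<string, unknown>) => Promise<string>;

class GrafanaLokiAdapterSketch implements LogProviderAdapter {
  private readonly callTool: ToolCaller;

  constructor(callTool: ToolCaller) {
    this.callTool = callTool;
  }

  async getLabelsHint(): Promise<string> {
    // "list_loki_label_names" is an assumed tool name returning a JSON array of strings.
    const raw = await this.callTool("list_loki_label_names", {});
    const labels = JSON.parse(raw) as string[];
    return labels.length > 0 ? `Available Loki labels: ${labels.join(", ")}` : "";
  }

  async getWorkingLogSelector(service: string, window: TimeWindow): Promise<string | null> {
    // Probe a few common label names and keep the first selector that returns any rows.
    // The service name is interpolated as-is here; escaping is a separate concern.
    for (const label of ["app", "service", "job"]) {
      const selector = `{${label}="${service}"}`;
      const raw = await this.callTool("query_loki_logs", {
        logql: selector,
        start: window.from.toISOString(),
        end: window.to.toISOString(),
        limit: 1,
      });
      const rows = JSON.parse(raw) as unknown[];
      if (rows.length > 0) return selector;
    }
    return null;
  }
}
```

Injecting the tool caller keeps the adapter testable without a live MCP server, which is one plausible reason to pull this logic out of the agent.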
Replace McpClient with MultiMcpClient across the codebase:
- Create factory function (src/mcp/factory.ts) to build MultiMcpClient from config
- Update entrypoints (index.ts, cli.tsx, discover.tsx) to use createMultiMcpClient
- Update agent constructors (core.ts, investigation.ts, discovery.ts) to accept MultiMcpClient
- Update test mocks to include role-query methods (getProvidersByRole, getToolsByRole, hasRole)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
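A factory that builds a multi-provider client has to connect several servers at once. A later commit in this PR mentions using `Promise.allSettled` with rollback on partial connect failure; the sketch below shows one way that could work. The `ConnectableProvider` interface and the `connectAllOrRollback` name are hypothetical and not taken from the PR.

```typescript
// Hypothetical minimal provider shape; the PR's real client interface is not shown.
interface ConnectableProvider {
  name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
}

async function connectAllOrRollback(providers: ConnectableProvider[]): Promise<void> {
  // allSettled waits for every attempt instead of rejecting on the first failure,
  // so we know exactly which providers are connected before deciding what to do.
  const results = await Promise.allSettled(providers.map((p) => p.connect()));

  const failed = providers.filter((_, i) => results[i]?.status === "rejected");
  if (failed.length === 0) return;

  // Roll back: disconnect the providers that did connect.
  // Cleanup errors are swallowed so the original failure is the one reported.
  const connected = providers.filter((_, i) => results[i]?.status === "fulfilled");
  await Promise.allSettled(connected.map((p) => p.disconnect()));

  throw new Error(`Failed to connect providers: ${failed.map((p) => p.name).join(", ")}`);
}
```

With `Promise.all` the first rejection would surface immediately while other connections were still in flight, leaving no reliable point at which to clean up.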
Pull request overview
This PR refactors the codebase from a single Grafana MCP dependency into a multi-provider MCP architecture, enabling multiple MCP servers to contribute tools by role (metrics/logs/dashboards/etc.) while keeping backward compatibility with the legacy grafana: config.
Changes:
- Introduces `MultiMcpClient` (tool merging, collision handling, role queries) plus a factory to build it from config providers.
- Extracts Loki prefetch logic into a pluggable `LogProviderAdapter` with a concrete `GrafanaLokiAdapter`.
- Updates agents, entrypoints, prompts, config schema, and tests to be provider/role-aware and support provider-contributed prompt fragments.
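The tool merging, collision handling and role queries listed above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `Provider` shape is hypothetical, and the policy of prefixing every colliding tool with `<provider>__` is one possible reading of the `__` delimiter mentioned later in the PR. The method names `callTool`, `hasRole`, `getProvidersByRole` and `getToolsByRole` appear in the PR; their signatures here are guesses.

```typescript
type Role = "metrics" | "logs" | "dashboards" | "dependencies";

// Hypothetical provider shape for illustration only.
interface Provider {
  name: string;
  roles: Role[];
  tools: string[];
  callTool(tool: string, args: Record<string, unknown>): Promise<string>;
}

class MultiMcpClientSketch {
  private readonly providers: Provider[];
  // Exposed tool name -> owning provider and that provider's own tool name.
  private readonly routes = new Map<string, { provider: Provider; tool: string }>();

  constructor(providers: Provider[]) {
    this.providers = providers;
    // Count how many providers offer each tool name so collisions can be detected.
    const counts = new Map<string, number>();
    for (const p of providers) {
      for (const t of p.tools) counts.set(t, (counts.get(t) ?? 0) + 1);
    }
    for (const p of providers) {
      for (const t of p.tools) {
        // Assumed policy: only colliding names get the "<provider>__" prefix.
        const exposed = (counts.get(t) ?? 0) > 1 ? `${p.name}__${t}` : t;
        this.routes.set(exposed, { provider: p, tool: t });
      }
    }
  }

  listTools(): string[] {
    return Array.from(this.routes.keys());
  }

  hasRole(role: Role): boolean {
    return this.providers.some((p) => p.roles.includes(role));
  }

  getProvidersByRole(role: Role): string[] {
    return this.providers.filter((p) => p.roles.includes(role)).map((p) => p.name);
  }

  getToolsByRole(role: Role): string[] {
    return this.listTools().filter((name) => this.routes.get(name)?.provider.roles.includes(role));
  }

  // Route the call to the provider that owns the exposed tool name.
  async callTool(name: string, args: Record<string, unknown>): Promise<string> {
    const route = this.routes.get(name);
    if (!route) throw new Error(`Unknown tool: ${name}`);
    return route.provider.callTool(route.tool, args);
  }
}
```

Keeping the provider's original tool name in the route table means the prefix is purely a presentation detail: the owning server never sees it.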
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/mcp/multi-client.ts | New multi-provider MCP router (tool merge, routing, role queries). |
| src/mcp/multi-client.test.ts | Tests for tool merging/routing/collision behavior and role queries. |
| src/mcp/factory.ts | Factory to create MultiMcpClient from config.providers. |
| src/mcp/adapters/types.ts | Introduces LogProviderAdapter interface and TimeWindow type. |
| src/mcp/adapters/grafana-loki.ts | Moves Loki label/selector prefetch logic into an adapter. |
| src/mcp/adapters/grafana-loki.test.ts | Tests adapter behaviors (labels hint, selector probing, prompt fragment). |
| src/config/schema.ts | Adds providers[] + roles with backward-compat transform from grafana. |
| src/config/schema.test.ts | Tests providers array validation + backward compat behavior. |
| src/index.ts | Switches runtime MCP wiring to createMultiMcpClient(config). |
| src/discover.tsx | Switches discovery CLI wiring to createMultiMcpClient(config). |
| src/cli.tsx | Switches interactive CLI MCP wiring to createMultiMcpClient(config). |
| src/agent/investigation.ts | Uses roles/adapters for prefetch + prompt fragments; dynamic excluded tools. |
| src/agent/investigation.test.ts | Updates mocks/types for MultiMcpClient. |
| src/agent/discovery.ts | Updates MCP type to MultiMcpClient. |
| src/agent/discovery.test.ts | Updates mocks/types for MultiMcpClient. |
| src/agent/core.ts | Updates MCP type to MultiMcpClient. |
| src/agent/core.test.ts | Updates mocks/types for MultiMcpClient. |
| src/agent/rca-prompts.ts | Adds optional provider fragments to phase prompt builders. |
| src/agent/rca-prompts.test.ts | Tests provider fragment override + default fallback behavior. |
| docs/plans/2026-03-08-web-gui-plan.md | Adds a web GUI implementation plan document. |
| docs/plans/2026-03-08-multi-mcp-implementation.md | Adds multi-MCP implementation plan document. |
| docs/plans/2026-03-08-multi-mcp-abstraction-design.md | Adds multi-MCP design document. |
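The rows for `src/agent/rca-prompts.ts` and its test describe optional provider fragments with a default fallback. A minimal sketch of that pattern is shown below; the function name carries a `Sketch` suffix because the real `buildLogCorrelationPrompt` signature is not shown in the PR, and the header strings and default guidance text are invented for illustration.

```typescript
// Assumed default guidance used when no provider contributes a fragment.
const DEFAULT_LOG_GUIDANCE = "Query logs with LogQL through the Grafana Loki tools.";

function buildLogCorrelationPromptSketch(service: string, providerFragment?: string): string {
  const custom = providerFragment?.trim();
  // Keep the header provider-neutral whenever a custom fragment is supplied,
  // so a non-Loki provider is not described as Loki.
  const header = custom ? "## Log correlation" : "## Log correlation (Loki)";
  const guidance = custom || DEFAULT_LOG_GUIDANCE;
  return [header, `Service under investigation: ${service}`, guidance].join("\n");
}
```

An empty or whitespace-only fragment falls back to the default, which keeps callers from accidentally producing a prompt with no log guidance at all.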
21aa96e to 030cc30
- Use `__` delimiter for tool collision prefixing (OpenAI charset compliance)
- Add provider name regex validation and uniqueness check
- Use Promise.allSettled with rollback on partial connect failure
- Detect Grafana Loki by tool availability, not provider name
- Fix log prompt header to be generic when custom fragment provided
- Add LogQL value/regex escaping for service names
- Parse JSON for probe success check instead of brittle string matching
- Handle collision-prefixed tool names in excludedTools filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
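Of the fixes above, the LogQL escaping is the easiest to get subtly wrong, so a small sketch may help. It assumes service names are interpolated into double-quoted LogQL matchers; the helper names are hypothetical and the PR's actual escaping rules are not shown.

```typescript
// Escape a value for use inside a double-quoted LogQL string literal:
// backslashes first, then double quotes and newlines.
function escapeLogQLValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// For regex matchers (=~), neutralise regex metacharacters first,
// then escape the result for the surrounding string literal.
function escapeLogQLRegex(value: string): string {
  const literal = value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return escapeLogQLValue(literal);
}

function exactSelector(label: string, service: string): string {
  return `{${label}="${escapeLogQLValue(service)}"}`;
}

function containsSelector(label: string, service: string): string {
  return `{${label}=~".*${escapeLogQLRegex(service)}.*"}`;
}
```

The regex variant needs two layers because the pattern is first decoded as a string literal and only then compiled as a regular expression, so a literal dot has to survive both steps.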
WZ added a commit that referenced this pull request on Apr 2, 2026
WZ added a commit that referenced this pull request on Apr 2, 2026
feat: multi-MCP provider abstraction
WZ added a commit that referenced this pull request on Apr 17, 2026
B-1 (issue #8): Skip ServiceHealthPoller start for stacks with no viable metrics provider at boot/create time. Stacks with 3 registered providers but toolCount:0 at runtime used to log-spam "metric query tool not found" every 10s. ProviderRegistry now emits change events (add/update/remove/test); StackManager subscribes and auto-starts a previously-skipped poller once a working metrics provider shows up, with no server restart needed.

B-2: Add last_active_at / inactive_at / deleted_at columns to the stacks table via idempotent ALTER (guarded by PRAGMA table_info). Activity bumps fire on any /api request, successful poll cycle, or stack-scoped webhook. A background reaper marks stacks inactive after 30 days and soft-deletes after 60 days (default stack is exempt). Soft-deleted stacks stay in the DB for audit but are filtered from getStack / getStackBySlug / listStacks; listStacksIncludingDeleted is the escape hatch. The explicit DELETE route remains a hard delete, distinct from TTL soft-delete.

Tests: 21 new tests covering migration idempotency, the gate + event-driven re-start, the TTL transitions, and listing filters. Migration was dry-run against a copy of the real shared DB to confirm no corruption.
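The TTL rules in B-2 can be expressed as a small pure function, which is roughly what a reaper would evaluate per stack on each pass. This sketch assumes both thresholds are measured from the last activity timestamp (the commit message does not say whether the 60 days count from last activity or from the inactive mark), and the `StackRow` shape and `classifyStack` name are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

type StackState = "active" | "inactive" | "deleted";

// Hypothetical row shape; lastActiveAt mirrors the last_active_at column as epoch milliseconds.
interface StackRow {
  slug: string;
  isDefault: boolean;
  lastActiveAt: number;
}

function classifyStack(
  stack: StackRow,
  now: number,
  inactiveAfterDays = 30,
  deleteAfterDays = 60,
): StackState {
  // The default stack is exempt from the reaper.
  if (stack.isDefault) return "active";
  const idleDays = (now - stack.lastActiveAt) / DAY_MS;
  // Check the larger threshold first so long-idle stacks are soft-deleted, not just inactive.
  if (idleDays >= deleteAfterDays) return "deleted";
  if (idleDays >= inactiveAfterDays) return "inactive";
  return "active";
}
```

Keeping the decision separate from the database write makes the transitions easy to unit test with fixed timestamps.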
Summary
- `MultiMcpClient` that merges tool lists, routes `callTool()` by ownership, and provides role-based queries
- Loki pre-fetch logic extracted into `LogProviderAdapter` / `GrafanaLokiAdapter`
- Backward compatible: the legacy `grafana:` config key still works

Supported provider combinations (after adding provider configs)
Changes
- Config schema: `providers[]` array with `roles` enum (`metrics`, `logs`, `dashboards`, `dependencies`), extensible for future roles like `tracing`
- Loki pre-fetch logic moved out of `InvestigationAgent.getLokiLabelsHint()` and `getWorkingLogSelector()` into the adapter
- Investigation agent uses `hasRole()` and adapter calls instead of hardcoded Grafana tool names
- `buildLogCorrelationPrompt`, `buildMetricDeepDivePrompt`, `buildInfraHealthPrompt` accept optional provider fragments

Test plan
- `npx tsc --noEmit` clean
- `grafana:` config key transforms to `providers[]` automatically

🤖 Generated with Claude Code
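As an illustration of the backward-compatible transform checked in the test plan, the sketch below normalises a legacy `grafana:` key into a one-element `providers[]` list and applies the name validation and uniqueness check mentioned in the review-fix commit. The field names (`url`), the roles assigned to the legacy provider, the name pattern and the `normalizeProviders` helper are all assumptions; the PR's real schema in `src/config/schema.ts` is not shown here.

```typescript
type Role = "metrics" | "logs" | "dashboards" | "dependencies";

// Hypothetical config shapes for illustration.
interface ProviderConfig {
  name: string;
  url: string;
  roles: Role[];
}

interface RawConfig {
  grafana?: { url: string };
  providers?: ProviderConfig[];
}

// Assumed pattern: lowercase letters, digits and single hyphens, so a name can never contain "__".
const PROVIDER_NAME = /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/;

function normalizeProviders(raw: RawConfig): ProviderConfig[] {
  let providers: ProviderConfig[] = [];
  if (raw.providers && raw.providers.length > 0) {
    providers = raw.providers;
  } else if (raw.grafana) {
    // Legacy config: treat the single Grafana server as a provider covering every role (assumption).
    providers = [
      { name: "grafana", url: raw.grafana.url, roles: ["metrics", "logs", "dashboards", "dependencies"] },
    ];
  }

  const seen = new Set<string>();
  for (const p of providers) {
    if (!PROVIDER_NAME.test(p.name)) throw new Error(`Invalid provider name: ${p.name}`);
    if (seen.has(p.name)) throw new Error(`Duplicate provider name: ${p.name}`);
    seen.add(p.name);
  }
  return providers;
}
```

Doing the transform at parse time means the rest of the code only ever sees `providers[]`, so the legacy key does not leak into agents or the client.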