Skip to content

[Feature][Java] Move MCP resource discovery to runtime#639

Open
yanand0909 wants to merge 1 commit into
apache:mainfrom
yanand0909:runtime_mcp_resource_discovery
Open

[Feature][Java] Move MCP resource discovery to runtime#639
yanand0909 wants to merge 1 commit into
apache:mainfrom
yanand0909:runtime_mcp_resource_discovery

Conversation

@yanand0909
Copy link
Copy Markdown
Collaborator

Linked issue: 608

Purpose of change

MCP tools and prompts were previously discovered at compile time inside AgentPlan.extractJavaMCPServer(): an MCPServer connection was opened, listTools()/listPrompts() were called over the network, results were serialized into the plan, and the connection was immediately closed. This had three problems:

  1. Build-time server dependency — AgentPlan construction failed if the MCP server was unreachable, even though the agent wouldn't run until later.
  2. Static capabilities — Tools and prompts were frozen at compile time; any server-side changes required recompilation.
  3. No connection reuse — The connection opened for discovery was closed immediately; a new one was established later for actual tool calls.

This PR implements Part 1: Runtime Discovery from discussion #543. MCP tool and prompt discovery is moved from AgentPlan construction to ActionExecutionOperator.open():

Tests

  • AgentPlanDeclareMCPServerTest — Updated to assert the new compile-time contract: TOOL and PROMPT providers are absent from AgentPlan providers (deferred to runtime), only MCP_SERVER is registered. Tool/prompt retrieval tests now instantiate the MCPServer directly from the plan's provider, simulating what JavaMCPResourceDiscovery does at operator startup.
  • JavaMCPResourceDiscoveryTest (new) — Six unit tests covering JavaMCPResourceDiscovery with stub objects (no live server required):
    • Happy path: tools and prompts are discovered and placed in ResourceCache
    • supportsPrompts() = false: prompt discovery is skipped
    • Non-JavaResourceProvider entries are ignored

API

No public API changes

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

@wenjin272
Copy link
Copy Markdown
Collaborator

Thanks for your contribution, @yanand0909. Due to the upcoming traditional Chinese holiday, my review may be slightly delayed.

Copy link
Copy Markdown
Collaborator

@weiqingy weiqingy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on — moving discovery out of plan construction is a real cleanup. Two cross-cutting questions before I leave inline notes:

  1. MCPServer lifecycle when discovery yields nothing. discoverJavaMCPResources instantiates the server and calls listTools() / supportsPrompts(), both of which lazily initialize an McpSyncClient over HTTP (MCPServer.getClient() at MCPServer.java:304-309, lazy create at MCPServer.java:316-343). The only things put in cache are the discovered tools and prompts; the server itself isn't cached. Shutdown then relies on MCPTool.close() / MCPPrompt.close() calling back into mcpServer.close() — which works when at least one tool or prompt was discovered. If listTools() returns empty AND prompts are unsupported (or also empty), nothing in the cache references the server, so mcpServer.close() never runs and the initialized McpSyncClient (with its HTTP transport) leaks. An empty-tools server is unusual but possible — a prompts-only server, a misconfigured server, a server going through gradual rollout. Two shapes that would close this gap, in case either helps: (a) put the server itself in the cache (cache.put(serverName, MCP_SERVER, mcpServer) after provide()), or (b) track discovered servers in a list owned by ActionExecutionOperator and close them in the operator's close(). How do you want to handle it?

  2. Shutdown contract isn't pinned by a test. The 6 new unit tests cover discovery happy/sad paths, but none verify that cache.close() releases the server. Combined with #1, that means the lifecycle invariant (whichever form it takes) lives entirely in reviewer memory. The FakeMCPServer stub already has the right shape — adding a closeCalled flag plus a test that closes the cache and asserts it flipped is a few lines. Worth doing?

shortTermMemState = getRuntimeContext().getMapState(shortTermMemStateDescriptor);

resourceCache = new ResourceCache(agentPlan.getResourceProviders());
JavaMCPResourceDiscovery.discoverJavaMCPResources(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaMCPResourceDiscovery.discoverJavaMCPResources runs synchronously here, which means a slow or briefly-unreachable MCP server stalls (or fails) the entire operator's startup. Discussion #543 anticipates this with failFastOnStartup(true) as the default and a graceful-degradation opt-in — this PR ships fail-fast only. Is the intent to defer the opt-in to a follow-up, and does the current shape leave room for it (e.g., a per-server try/catch boundary around the discovery loop)?

continue;
}

Object mcpServer = rp.provide(null);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rp.provide(null) passes null as ResourceContext. PythonMCPResourceDiscovery at PythonMCPResourceDiscovery.java:73 passes cache.getResourceContext() for the same role. Today MCPServer's constructor just stores the field without dereferencing it, so the null is benign — but if a future Java MCP server resolves a dependent resource through getResourceContext(), it would NPE here in a way that doesn't manifest on the Python side. Any reason to keep the null rather than match the Python path?


// Tools and prompts are NOT serialized into the plan (they are runtime-discovered)
assertFalse(
json.contains("\"add\"") && json.contains("java_serializable"),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertFalse(
    json.contains("\"add\"") && json.contains("java_serializable"),
    "JSON should not contain a serialized 'add' tool provider");

This passes when either substring is missing, so a regression that drops "add" from the JSON for an unrelated reason would silently make this test green — the very class of bug the test is meant to catch. Since the design contract you're locking down is "no java_serializable providers for MCP-discovered resources at all," one alternative in case it helps:

assertFalse(json.contains("java_serializable"),
        "JSON should not contain any java_serializable provider entries (MCP discovery is deferred)");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants