
fix(mcp): retry connect calls on transient grpc errs #1062

Merged: sicoyle merged 5 commits into dapr:main from sicoyle:fix/mcp-connect-retries on May 28, 2026

@sicoyle (Contributor) commented May 27, 2026

Description

This PR wraps the MCP connect() calls (sync and async) in a bounded retry that absorbs gRPC CANCELLED or UNAVAILABLE errors within the caller-specified timeout budget. Any other error propagates to the caller.
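A minimal sketch of this retry pattern in Python, assuming the retried step is a zero-argument callable and that errors expose a gRPC-style .code() method, as the errors raised by grpcio calls do. StatusCode and RpcError below are local stand-ins so the sketch runs without grpcio installed, and retry_within_budget and connect are hypothetical names, not the functions merged in this PR. The clock and sleep functions are injectable so the budget logic can be exercised deterministically.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class StatusCode(Enum):
    # Stand-in for grpc.StatusCode so the sketch runs without grpcio.
    CANCELLED = 1
    UNAVAILABLE = 14
    INVALID_ARGUMENT = 3


class RpcError(Exception):
    # Stand-in for grpc.RpcError; errors raised by grpcio calls also expose .code().
    def __init__(self, code: StatusCode) -> None:
        super().__init__(code.name)
        self._code = code

    def code(self) -> StatusCode:
        return self._code


# Only these codes count as transient; every other error propagates.
# Real code would hold grpc.StatusCode.CANCELLED and grpc.StatusCode.UNAVAILABLE.
TRANSIENT_CODES = {StatusCode.CANCELLED, StatusCode.UNAVAILABLE}


def is_transient(err: BaseException) -> bool:
    code = getattr(err, "code", None)
    return callable(code) and code() in TRANSIENT_CODES


def retry_within_budget(
    fn: Callable[[], T],
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    backoff: float = 0.1,
    max_backoff: float = 1.0,
) -> T:
    """Call fn until it succeeds, fails non-transiently, or the deadline passes."""
    while True:
        try:
            return fn()
        except Exception as err:
            remaining = deadline - clock()
            if not is_transient(err) or remaining <= 0:
                raise  # non-transient error, or budget exhausted: re-raise as-is
            sleep(min(backoff, remaining))  # never sleep past the deadline
            backoff = min(backoff * 2, max_backoff)


def connect(
    schedule: Callable[[], str],
    wait_for_completion: Callable[[str, float], str],
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical connect flow: retry scheduling, then wait with what is left."""
    deadline = clock() + timeout
    instance_id = retry_within_budget(schedule, deadline, clock, sleep)
    # The completion wait gets only the remaining budget, not a fresh timeout.
    return wait_for_completion(instance_id, max(0.0, deadline - clock()))
```

Because the completion wait receives only what remains of the original budget, time spent retrying the scheduling call is deducted from the same timeout rather than extending it.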

Issue reference

We strive to have all PRs opened based on an issue, where the problem or feature has been discussed prior to implementation.

Please reference the issue this PR will close: #[issue number]

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests
  • Extended the documentation

sicoyle added 2 commits May 27, 2026 09:58
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Copilot AI review requested due to automatic review settings May 27, 2026 15:00
@sicoyle requested review from a team as code owners May 27, 2026 15:00

Copilot AI left a comment


Pull request overview

This PR adds bounded retry behavior to MCP client connect() calls (sync + async) so that transient gRPC errors during workflow scheduling (CANCELLED, UNAVAILABLE) are retried within the caller-provided timeout budget, and updates the MCP client tests to cover these retry paths.

Changes:

  • Add transient gRPC error classification and bounded retry loop around schedule_new_workflow() in sync DaprMCPClient.connect().
  • Add the equivalent retry logic to async AioDaprMCPClient.connect().
  • Add new unit tests validating retry success, non-transient propagation, and deadline exhaustion.
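The async variant can follow the same shape using asyncio. A minimal sketch, assuming the scheduling step is a zero-argument coroutine function and that errors expose .code() the way grpc.aio.AioRpcError does; StatusCode and AioRpcError are local stand-ins and retry_within_budget_async is a hypothetical name, not the code in aio/mcp.py. The deadline is measured with the event loop's monotonic clock (loop.time()).

```python
import asyncio
import enum
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Compared by name so the check works for grpc.StatusCode members
# and for the local stand-in enum below.
TRANSIENT_CODE_NAMES = {"CANCELLED", "UNAVAILABLE"}


class StatusCode(enum.Enum):
    # Stand-in for grpc.StatusCode so the sketch runs without grpcio.
    CANCELLED = 1
    UNAVAILABLE = 14
    PERMISSION_DENIED = 7


class AioRpcError(Exception):
    # Stand-in for grpc.aio.AioRpcError, which also exposes .code().
    def __init__(self, code: StatusCode) -> None:
        super().__init__(code.name)
        self._code = code

    def code(self) -> StatusCode:
        return self._code


def is_transient(err: BaseException) -> bool:
    code = getattr(err, "code", None)
    return callable(code) and getattr(code(), "name", None) in TRANSIENT_CODE_NAMES


async def retry_within_budget_async(
    fn: Callable[[], Awaitable[T]],
    timeout: float,
    backoff: float = 0.1,
    max_backoff: float = 1.0,
) -> T:
    """Await fn() until it succeeds, fails non-transiently, or timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            return await fn()
        except Exception as err:
            # asyncio.CancelledError derives from BaseException (Python 3.8+),
            # so cancelling this task is not caught here; only ordinary
            # exceptions carrying a CANCELLED/UNAVAILABLE status are retried.
            remaining = deadline - loop.time()
            if not is_transient(err) or remaining <= 0:
                raise
            await asyncio.sleep(min(backoff, remaining))
            backoff = min(backoff * 2, max_backoff)
```

The distinction in the except clause matters for async code: a gRPC CANCELLED status surfaces as an ordinary exception and is retried, while cancellation of the surrounding asyncio task still propagates immediately instead of being absorbed by the retry loop.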

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
ext/dapr-ext-workflow/dapr/ext/workflow/mcp.py | Implements transient-error classification + retry loop for sync MCP connect() scheduling, and adjusts the remaining timeout passed to the completion wait.
ext/dapr-ext-workflow/dapr/ext/workflow/aio/mcp.py | Implements the same retry-and-budget logic for async MCP connect().
ext/dapr-ext-workflow/tests/test_mcp_client.py | Adds tests for retry-on-transient-gRPC-error behavior for both sync and async clients.


Outdated comment threads: 2 on ext/dapr-ext-workflow/dapr/ext/workflow/mcp.py, 2 on ext/dapr-ext-workflow/dapr/ext/workflow/aio/mcp.py
sicoyle added 2 commits May 27, 2026 10:25
Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: Samantha Coyle <sam@diagrid.io>

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 98.40000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.66%. Comparing base (bffb749) to head (58859e6).
⚠️ Report is 135 commits behind head on main.

Files with missing lines | Patch % | Lines
ext/dapr-ext-workflow/dapr/ext/workflow/aio/mcp.py | 94.73% | 1 Missing ⚠️
ext/dapr-ext-workflow/dapr/ext/workflow/mcp.py | 96.55% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1062      +/-   ##
==========================================
- Coverage   86.63%   82.66%   -3.97%     
==========================================
  Files          84      146      +62     
  Lines        4473    14693   +10220     
==========================================
+ Hits         3875    12146    +8271     
- Misses        598     2547    +1949     


@sicoyle added this to the v1.18 milestone May 27, 2026
@sicoyle added this pull request to the merge queue May 28, 2026
Merged via the queue into dapr:main with commit 8571c3e May 28, 2026
18 of 19 checks passed
sicoyle added a commit that referenced this pull request May 28, 2026
* fix(mcp): retry connect calls on transient grpc errs

* style: comment cleanup

* fix: address copilot feedback

* style: appease linter

---------

(cherry picked from commit 8571c3e)

Signed-off-by: Samantha Coyle <sam@diagrid.io>
Signed-off-by: dapr-bot <dapr-bot@users.noreply.github.com>
Co-authored-by: Sam <sam@diagrid.io>

Projects

None yet


3 participants