worker: infinitely wait for start #983
Instead of returning an error on worker start after 10s, wait indefinitely until the worker can start or until shutdown has been signalled. This is important for environments where 100-500 workers are spun up at the same time and the cluster takes some time to settle.

Signed-off-by: joshvanl <me@joshvanl.dev>
Pull request overview
This PR changes the durable task gRPC worker startup behavior to wait indefinitely for the work-item stream to become available, rather than failing after a fixed 10s timeout—intended to better support environments that start hundreds of workers concurrently.
Changes:
- Replace the fixed 10s startup wait/timeout with an indefinite poll loop.
- Abort the startup wait if shutdown is signaled.
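
The indefinite poll loop with a shutdown check could look roughly like the sketch below. This is not the PR's actual code: it assumes stream readiness and shutdown are each tracked with a `threading.Event`, and the function name `wait_for_stream`, both parameter names, and the 1s poll interval are hypothetical.

```python
import threading


def wait_for_stream(stream_ready: threading.Event,
                    shutdown: threading.Event,
                    poll_interval: float = 1.0) -> bool:
    """Return True once the stream is ready, or False if shutdown is signalled first."""
    # Hypothetical helper: there is no overall deadline, only a short poll interval.
    while not shutdown.is_set():
        # Event.wait returns True as soon as the event is set, or False once the
        # timeout elapses, so the shutdown flag is re-checked every interval.
        if stream_ready.wait(timeout=poll_interval):
            return True
    return False
```

Polling with a short timeout, rather than a single blocking `wait()`, is what lets the loop notice shutdown promptly while still imposing no upper bound on the total wait.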
Pull request overview
This PR changes the Durable Task gRPC worker startup behavior to wait indefinitely for the work-item stream to be established (instead of failing after 10 seconds), unless shutdown is signaled.
Changes:
- Replace the fixed 10s `_stream_ready` wait during `TaskHubGrpcWorker.start()` with an indefinite retry loop that also checks for shutdown and run-loop exit.
- Adjust `TaskHubGrpcWorker.stop()` to allow stopping while `start()` is blocked waiting for the stream.
Pull request overview
Updates the Durable Task gRPC worker startup behavior to avoid failing after a fixed 10s window, which better supports environments that start large numbers of workers concurrently and need more time for the cluster/sidecar to become ready.
Changes:
- Change `TaskHubGrpcWorker.start()` to wait indefinitely for the work-item stream to be established, while still aborting if shutdown is signaled or the run-loop thread exits.
- Track the run-loop thread via a new instance attribute and adjust `stop()`’s guard to allow `stop()` to unblock an in-progress `start()`.
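
A rough sketch of how these pieces might fit together follows. `SketchWorker` is a hypothetical stand-in, not the real `TaskHubGrpcWorker`: the attribute names, the simulated connection delay, and the choice to raise `RuntimeError` when the wait is aborted are all assumptions made for illustration.

```python
import threading


class SketchWorker:
    """Hypothetical stand-in for a gRPC worker; not the real TaskHubGrpcWorker."""

    def __init__(self, connect_delay: float = 0.0, fail: bool = False):
        self._stream_ready = threading.Event()
        self._shutdown = threading.Event()
        self._runloop_thread = None  # tracked so start() can detect an early exit
        self._connect_delay = connect_delay  # simulated time for the stream to open
        self._fail = fail  # simulate the run loop dying before the stream opens

    def _run_loop(self) -> None:
        if self._fail:
            return
        # Sleep for the simulated delay, waking early if shutdown is signalled.
        if self._shutdown.wait(self._connect_delay):
            return
        self._stream_ready.set()
        self._shutdown.wait()  # stand-in for processing work items

    def start(self, poll_interval: float = 0.05) -> None:
        self._runloop_thread = threading.Thread(target=self._run_loop, daemon=True)
        self._runloop_thread.start()
        # No deadline: poll until the stream is up, shutdown is set, or the thread dies.
        while not self._stream_ready.wait(timeout=poll_interval):
            if self._shutdown.is_set():
                raise RuntimeError("worker stopped before the stream was established")
            if not self._runloop_thread.is_alive():
                raise RuntimeError("run loop exited before the stream was established")

    def stop(self) -> None:
        # Guard on the tracked thread rather than a "fully started" flag, so
        # stop() also works while start() is still blocked waiting for the stream.
        if self._runloop_thread is None:
            return
        self._shutdown.set()
        self._runloop_thread.join(timeout=5)
```

Checking `is_alive()` inside the loop matters once the timeout is gone: without it, a run loop that crashes before the stream opens would leave `start()` blocked forever.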
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #983 +/- ##
==========================================
- Coverage 86.63% 81.46% -5.17%
==========================================
Files 84 139 +55
Lines 4473 13525 +9052
==========================================
+ Hits 3875 11018 +7143
- Misses 598 2507 +1909