fix(cloud-agent): prevent agent-server EADDRINUSE on start retry #60900
Merged
```python
        return wait_for_health_check(self.execute, self.id, AGENT_SERVER_PORT, max_attempts, poll_interval)

    def _agent_server_is_healthy(self) -> bool:
        return wait_for_health_check(self.execute, self.id, AGENT_SERVER_PORT, max_attempts=1, poll_interval=0.0)
```
Member
`max_attempts=1` and `poll_interval=0.0`? Is that right?
Contributor (Author)
Yes. `max_attempts=1` makes the bash `for` loop run once, which is a single `/health` check, so if nothing is listening we fail fast.
With `poll_interval=0.0`, the trailing sleep in the loop becomes a no-op `sleep 0.0`. With the default of 0.5 it would waste half a second on a probe that never loops.
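To see why `poll_interval=0.0` matters for a one-shot probe, here is a minimal Python model of the polling loop described in the reply. It assumes, as the author describes, that the loop sleeps after every failed attempt, including the last one. `poll_health` and its injectable `sleep` parameter are hypothetical; this is not the actual `wait_for_health_check`, which runs a bash loop in the sandbox.

```python
import time
from typing import Callable


def poll_health(
    probe: Callable[[], bool],
    max_attempts: int,
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as probe() succeeds, False after max_attempts failures.

    Hypothetical model of the bash loop's shape, not the real helper.
    """
    for _ in range(max_attempts):
        if probe():
            return True
        # The sleep runs after every failed attempt, including the last one,
        # so even a one-shot probe pays poll_interval when it fails.
        sleep(poll_interval)
    return False


# One-shot probe, as in _agent_server_is_healthy: one check, no real wait.
slept = []
print(poll_health(lambda: False, max_attempts=1, poll_interval=0.0, sleep=slept.append))  # False
print(slept)  # [0.0]
```

With `poll_interval=0.5` the same failed one-shot probe would record `[0.5]`. That is the half second the author avoids by passing `0.0`.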
Contributor
Query snapshots: Backend query snapshots updated. Changes: 1 snapshot (0 modified, 1 added, 0 deleted).
Contributor
Reviews (1): Last reviewed commit: "test(mcp): update unit test snapshots"
VojtechBartos approved these changes on Jun 1, 2026.
Contributor
Size Change: 0 B. Total Size: 80.9 MB.
Branch updated from 147b5fa to 0f71d80.
The agent-server binds its port before the session is initialized, but the readiness probe only passes once the session exists. A slow session init could make the probe time out, which failed the `start_agent_server` activity and triggered a Temporal retry. The retry relaunched the server while the first process still held the port, so the new process crashed with `EADDRINUSE`. Make the Modal launch idempotent: reuse an already-healthy server; otherwise, free the port before relaunching. Also widen the readiness budget so that slow session inits on resume or cold start don't trip the probe in the first place.
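The idempotent launch described above can be sketched as a single start step that is safe to re-run on retry. This is a minimal sketch under stated assumptions. `ensure_agent_server` and its four hooks (`is_healthy`, `free_port`, `launch`, `wait_until_ready`) are hypothetical names. The health check plausibly corresponds to `_agent_server_is_healthy` from the diff, but how the PR frees the port and launches the server on Modal is not shown here.

```python
from typing import Callable


def ensure_agent_server(
    is_healthy: Callable[[], bool],
    free_port: Callable[[], None],
    launch: Callable[[], None],
    wait_until_ready: Callable[[], bool],
) -> str:
    """Idempotent start step: safe to run again when the activity is retried.

    All four callables are hypothetical hooks, not the PR's actual helpers.
    """
    # A previous attempt may have left a working server behind even though its
    # readiness probe timed out; reuse it instead of binding the port again.
    if is_healthy():
        return "reused"
    # Otherwise a stale process may still hold the port: free it first so the
    # new process does not crash with EADDRINUSE.
    free_port()
    launch()
    # wait_until_ready is where a wider readiness budget would apply.
    if not wait_until_ready():
        raise RuntimeError("agent-server did not become ready in time")
    return "launched"


events = []
result = ensure_agent_server(
    is_healthy=lambda: False,
    free_port=lambda: events.append("free_port"),
    launch=lambda: events.append("launch"),
    wait_until_ready=lambda: True,
)
print(result, events)  # launched ['free_port', 'launch']
```

Passing the steps in as callables keeps the ordering (check, then free, then launch) testable without a real sandbox. A retry that finds a healthy server never touches the port at all.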
Branch updated from 41bb3a9 to 6223577.
Problem
The cloud agent-server occasionally crashed on startup with `Error: listen EADDRINUSE: address already in use :::8080`.
The root cause is a timing interaction.