deploy: wait for network-agent readiness in one-click startup #304
Wait for network-agent readiness before starting Cubelet-dependent services and extend the Cubelet network-agent initialization timeout. This avoids template artifact distribution racing Cubelet startup when network-agent is still coming up.

Assisted-by: Codex:GPT-5.5
Signed-off-by: xiongxz <xiuzhang.xiong@lexmount.com>
fe2e298 to 6c52e93
/cubebot review
Claude finished @kinwin-ustc's task in 1m 9s

Review Checklist
Code Review: PR #304

This PR fixes a race condition where

What Was Changed
Positive Aspects
Potential Concerns

No Deadly error message references

This references

Edge case: network-agent goes down after

Security

No security concerns. The readiness check is against

Nit

The

Summary

This is a clean, focused fix for a genuine race condition. The implementation is consistent across both script variants, uses existing infrastructure, and has appropriate configurability. The matched timeouts between the startup script and Cubelet's internal config are a thoughtful touch.
Fix one-click startup racing network-agent readiness.
up.sh and up-compute.sh previously started network-agent and immediately continued with the rest of the stack. On slower hosts, Cubelet can start while network-agent is still coming up, which can leave Cubelet-dependent services unavailable during template artifact distribution. In that case template creation can fail with:

This waits for the network-agent /readyz endpoint before starting Cubelet-dependent services, and exposes NETWORK_AGENT_READY_TIMEOUT so one-click deployments can tune the wait budget. The default Cubelet config also extends the internal network-agent initialization timeout to match the one-click startup budget.

Validation
Assisted-by: Codex:GPT-5.5
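
For readers who want a concrete picture of the readiness wait described in this PR, here is a minimal sketch of a bounded polling loop. It is not the code from up.sh or up-compute.sh. The function names, the 60-second default, and the URL (including host and port) are assumptions made for illustration. Only NETWORK_AGENT_READY_TIMEOUT and the /readyz path come from the description above.

```sh
#!/bin/sh
# Illustrative sketch only; not copied from up.sh or up-compute.sh.

# Wait budget in seconds. The variable name comes from the PR description;
# the default of 60 is an assumption.
NETWORK_AGENT_READY_TIMEOUT="${NETWORK_AGENT_READY_TIMEOUT:-60}"

# Placeholder address: host and port are hypothetical, only /readyz is from the PR.
NETWORK_AGENT_READY_URL="${NETWORK_AGENT_READY_URL:-http://127.0.0.1:8080/readyz}"

# probe_network_agent (hypothetical helper): succeeds when the endpoint
# answers with a non-error HTTP status within 2 seconds.
probe_network_agent() {
  curl --silent --fail --max-time 2 --output /dev/null "$NETWORK_AGENT_READY_URL"
}

# wait_until_ready TIMEOUT_SECONDS PROBE_CMD [ARGS...]
# Runs the probe once per second until it succeeds or the budget is used up.
# The budget counts the 1-second sleeps, so time spent inside the probe
# itself is not included.
wait_until_ready() {
  timeout_s="$1"
  shift
  waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "network-agent not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example call site, commented out so sourcing this sketch has no side effects:
# wait_until_ready "$NETWORK_AGENT_READY_TIMEOUT" probe_network_agent || exit 1
```

Passing the probe as a command keeps the loop independent of curl, so the same function can be exercised with a stub probe. A startup script would call it after launching network-agent and before starting any Cubelet-dependent service.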