
[Cherry-Pick][APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT(#5865) #5867

Merged
Jiang-Jia-Jun merged 2 commits into release/2.4 from copilot/cherry-pick-pr-5865 on Jan 5, 2026

Copilot AI commented Jan 4, 2026

Motivation

Cherry-pick of #5865 to the release/2.4 branch. Worker processes whose computations ran longer than 30 seconds triggered spurious "worker not healthy" errors, because the health check calls used a hardcoded 30-second timeout.

Modifications

Environment Variable

  • Added FD_WORKER_ALIVE_TIMEOUT to fastdeploy/envs.py (default: 30 seconds, matching the previously hardcoded value for backward compatibility)

Service Layer

  • Modified 4 check_health() calls in serving_chat.py and serving_completion.py to pass the configurable timeout through the time_interval_threashold parameter instead of relying on the hardcoded default
  • Added fastdeploy.envs imports
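The call-site change can be sketched as follows. This is an illustration under stated assumptions: EngineClientSketch, heartbeat(), worker_alive_timeout() and ensure_worker_healthy() are hypothetical names, and the health check is modeled as "time since the last worker heartbeat must not exceed the threshold". Only check_health() and its time_interval_threashold parameter come from the PR description.

```python
import os
import time
from typing import Tuple


class EngineClientSketch:
    """Hypothetical stand-in for the client object that exposes check_health()."""

    def __init__(self) -> None:
        # Assumption: health is tracked as the monotonic time of the last heartbeat.
        self.last_heartbeat = time.monotonic()

    def heartbeat(self) -> None:
        # A worker would call this periodically while it is alive.
        self.last_heartbeat = time.monotonic()

    def check_health(self, time_interval_threashold: float = 30) -> Tuple[bool, str]:
        # Healthy only if the worker reported within the allowed window.
        elapsed = time.monotonic() - self.last_heartbeat
        if elapsed > time_interval_threashold:
            return False, "worker not healthy"
        return True, ""


def worker_alive_timeout() -> int:
    # Stand-in for envs.FD_WORKER_ALIVE_TIMEOUT (default 30 seconds).
    return int(os.getenv("FD_WORKER_ALIVE_TIMEOUT", "30"))


def ensure_worker_healthy(client: EngineClientSketch) -> None:
    # Pass the configured timeout explicitly instead of the hardcoded default.
    ok, msg = client.check_health(time_interval_threashold=worker_alive_timeout())
    if not ok:
        raise RuntimeError(msg)
```

In this model, a worker busy for 45 seconds fails the check at the 30-second default but passes once FD_WORKER_ALIVE_TIMEOUT is raised to 120.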

Documentation

  • Updated docs/usage/environment_variables.md and docs/zh/usage/environment_variables.md

Usage or Command

```bash
# Default 30s timeout
python -m fastdeploy.entrypoints.openai.api_server

# Extended timeout for long-running computations
FD_WORKER_ALIVE_TIMEOUT=120 python -m fastdeploy.entrypoints.openai.api_server
```

Accuracy Tests

N/A - Configuration change only, no model output affected.

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.
Original prompt

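#5865 Please cherry-pick the changes from this PR into the release 2.4 branch. Note that your PR title must follow this rule: "The PR title must follow the required format: add the [Cherry-Pick] tag at the very beginning and the original PR ID at the very end, e.g. [Cherry-Pick][CI] Add check trigger and logic(#5191)"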



@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Jan 4, 2026

Thanks for your contribution!

paddle-bot bot added the contributor (External developers) label Jan 4, 2026
…a FD_WORKER_ALIVE_TIMEOUT

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Jiang-Jia-Jun marked this pull request as ready for review January 4, 2026 13:24
Copilot AI changed the title from "[WIP] [Cherry-Pick] Cherry pick changes from PR #5865" to "[Cherry-Pick][APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT(#5865)" Jan 4, 2026
Copilot AI requested a review from Jiang-Jia-Jun January 4, 2026 13:25
Jiang-Jia-Jun merged commit 9de6ae3 into release/2.4 Jan 5, 2026
12 of 18 checks passed
Jiang-Jia-Jun deleted the copilot/cherry-pick-pr-5865 branch January 5, 2026 01:45

Labels

contributor External developers

3 participants