# perf(server): unblock event loop and grow concurrency budget #903

Merged. Pangjiping merged 10 commits into alibaba:main from Pangjiping:perf/server-uvicorn-tuning on May 18, 2026.

@Pangjiping (Collaborator) commented May 17, 2026

Summary

  • Remove sync Kubernetes/Docker calls from the event loop so concurrent control-plane requests stop serializing.
  • Bump per-process concurrency knobs: uvicorn workers/limits, anyio threadpool size, informer-cached list path.
  • Defaults preserve current behavior (workers=1); operators dial up via the [server] TOML section.

Changes (commit by commit)

1. perf(server): expose uvicorn worker/concurrency knobs (745c1945)

  • pyproject.toml: uvicorn → uvicorn[standard] (pulls uvloop / httptools / watchfiles).
  • ServerConfig: workers (default 1), limit_concurrency (1024), backlog (2048), loop ("auto"), http ("auto").
  • cli.py: thread fields into uvicorn.run; --reload forces workers=1 and prints a notice.
  • main.py dev __main__: pass loop / http.
  • Docs (configuration.md) + unit tests (tests/test_config.py).
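
A minimal sketch of how these fields could be threaded into `uvicorn.run`, assuming a hypothetical `ServerConfig` dataclass and a hypothetical helper `build_uvicorn_kwargs` (neither is copied from this PR). The keyword names `workers`, `limit_concurrency`, `backlog`, `loop`, `http`, and `reload` are existing `uvicorn.run` parameters; the `host` and `port` values are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServerConfig:
    """Hypothetical stand-in for the server config; defaults mirror the list above."""
    host: str = "0.0.0.0"  # placeholder, not stated in the PR text
    port: int = 8080       # placeholder, not stated in the PR text
    workers: int = 1
    limit_concurrency: Optional[int] = 1024
    backlog: int = 2048
    loop: str = "auto"
    http: str = "auto"


def build_uvicorn_kwargs(cfg: ServerConfig, reload: bool = False) -> Dict[str, Any]:
    """Translate the config into keyword arguments for uvicorn.run."""
    workers = cfg.workers
    if reload and workers != 1:
        # Mirrors the described behavior: --reload forces a single worker.
        print("notice: --reload forces workers=1")
        workers = 1
    return {
        "host": cfg.host,
        "port": cfg.port,
        "workers": workers,
        "limit_concurrency": cfg.limit_concurrency,
        "backlog": cfg.backlog,
        "loop": cfg.loop,
        "http": cfg.http,
        "reload": reload,
    }
```

The result could then be passed as `uvicorn.run("opensandbox_server.main:app", **build_uvicorn_kwargs(cfg))`, where the `app` attribute name is an assumption.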

2. perf(server): unblock event loop by running blocking routes in threadpool (77327880)

  • api/lifecycle.py: 12 handlers async def → sync def (list/get/patch/delete/pause/resume/renew sandbox + create/list/get/delete snapshot + get_sandbox_endpoint). FastAPI auto-offloads sync routes to the anyio threadpool.
  • api/pool.py: 5 pool handlers same conversion.
  • create_sandbox stays async (its service is genuinely async).
  • Drop now-unused asyncio import and manual to_thread inside create_snapshot.
  • New regression: 8 × 200 ms concurrent list_sandboxes finishes in ~250 ms (vs 1.6 s serial floor).
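
The effect can be reproduced outside FastAPI with the standard library alone. The sketch below is an illustration, not the PR's regression test: `blocking_list_sandboxes` is a hypothetical stand-in for a synchronous service call, and `asyncio.to_thread` stands in for the offload that FastAPI performs for plain `def` routes.

```python
import asyncio
import time

BLOCK_SECONDS = 0.05  # stand-in for a 50-200 ms Kubernetes/Docker round-trip


def blocking_list_sandboxes() -> str:
    """Hypothetical synchronous service call."""
    time.sleep(BLOCK_SECONDS)
    return "ok"


async def handler_on_loop() -> str:
    # async def that calls blocking code directly: every call stalls the loop.
    return blocking_list_sandboxes()


async def handler_offloaded() -> str:
    # Same call pushed to a worker thread: the loop stays free meanwhile.
    return await asyncio.to_thread(blocking_list_sandboxes)


async def timed(handler, n: int = 8) -> float:
    """Run n concurrent calls of handler and return the wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


def compare() -> tuple:
    serial = asyncio.run(timed(handler_on_loop))
    parallel = asyncio.run(timed(handler_offloaded))
    return serial, parallel
```

With the on-loop handler the eight calls run back to back, so the total is at least eight times the per-call latency; the offloaded variant overlaps them, bounded only by the number of available worker threads.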

3. perf(server): serve list_custom_objects from informer cache (35d9bf47)

  • New services/k8s/label_selector.py: minimal grammar (empty / bare key / key=value / comma-AND); unsupported syntax → parse_selector returns None and the caller falls back to the direct API path.
  • WorkloadInformer.list() snapshot helper.
  • K8sClient.list_custom_objects consults the informer cache when synced; otherwise unchanged.
  • Tests cover grammar plus cache-hit / unsynced-fallback / unsupported-selector-fallback paths.
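
For illustration, a parser for this subset might look like the sketch below. It is not the contents of `label_selector.py`: the regular expressions, the `Requirement` alias, and the `matches` helper are assumptions, and only the "return `None` on unsupported syntax" contract comes from the description above.

```python
import re
from typing import Dict, List, Optional, Tuple

# (key, value) pairs; a value of None means "key must exist".
Requirement = Tuple[str, Optional[str]]

# Simplified label key/value shapes (assumption, looser than the full Kubernetes rules).
_KEY = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._/-]*[A-Za-z0-9])?")
_VALUE = re.compile(r"([A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?)?")


def parse_selector(selector: str) -> Optional[List[Requirement]]:
    """Return the parsed requirements, or None if the syntax is unsupported."""
    selector = selector.strip()
    if not selector:
        return []  # empty selector matches everything
    requirements: List[Requirement] = []
    for term in selector.split(","):
        key, sep, value = term.strip().partition("=")
        if not _KEY.fullmatch(key):
            return None  # e.g. "a!=b", "env in (a,b)", "!a"
        if sep and not _VALUE.fullmatch(value):
            return None  # e.g. "a==b"
        requirements.append((key, value if sep else None))
    return requirements


def matches(labels: Dict[str, str], requirements: List[Requirement]) -> bool:
    """True when every requirement holds (comma means logical AND)."""
    return all(
        key in labels and (value is None or labels[key] == value)
        for key, value in requirements
    )
```

A caller would treat a `None` result as the signal to skip the cache and issue the direct API request with the original selector string.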

4. perf(server): grow anyio threadpool, unblock create path (225e5ece)

  • ServerConfig.thread_pool_size (default 200).
  • lifespan: current_default_thread_limiter().total_tokens = thread_pool_size.
  • kubernetes_service: wrap the four sync Kubernetes calls inside create_sandbox / _wait_for_sandbox_ready (_ensure_pvc_volumes, workload_provider.create_workload, get_workload, delete_workload) with asyncio.to_thread so the event loop stays responsive while the create path runs.
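
To make the role of the token budget concrete, here is a standard-library sketch in which an `asyncio.Semaphore` plays the part of the thread limiter's `total_tokens`. It is an analogy under that assumption, not the anyio API; `InFlightCounter`, `blocking_call`, and `run_burst` are hypothetical names.

```python
import asyncio
import threading
import time


class InFlightCounter:
    """Tracks the peak number of blocking calls running at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self) -> "InFlightCounter":
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc) -> bool:
        with self._lock:
            self.current -= 1
        return False


def blocking_call(counter: InFlightCounter) -> None:
    """Stand-in for a synchronous Kubernetes call."""
    with counter:
        time.sleep(0.02)


async def run_burst(tokens: int, requests: int) -> int:
    """Fire `requests` offloaded calls with at most `tokens` in flight; return the peak."""
    limiter = asyncio.Semaphore(tokens)  # plays the role of total_tokens
    counter = InFlightCounter()

    async def one_request() -> None:
        async with limiter:
            # The blocking work runs in a thread, so the event loop keeps serving.
            await asyncio.to_thread(blocking_call, counter)

    await asyncio.gather(*(one_request() for _ in range(requests)))
    return counter.peak
```

Calling `asyncio.run(run_burst(tokens=2, requests=10))` never reports a peak above 2; the remaining requests wait for a token, which is the queueing that a larger `thread_pool_size` is meant to relieve.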

Behavior changes

  • New [server] keys (workers, limit_concurrency, backlog, thread_pool_size, loop, http). All additive; existing configs keep working.
  • list_custom_objects now serves from the informer cache when synced; same eventual-consistency window as the existing get_custom_object path.
  • No change to API contracts, response shapes, error codes, or HTTP semantics.

Risks

  • thread_pool_size defaults to 200 per process; oversizing it buys parallelism at the cost of file-descriptor and apiserver QPS pressure, so tune it against workers × replicaCount.
  • Informer cache lag is bounded by watch latency (ms) and informer_resync_seconds (default 300 s); same as today's get_custom_object.
  • workers > 1 means N × informer watch streams to the apiserver; default stays at 1 to keep apiserver baseline unchanged.
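
As a rough sizing aid, the multiplication behind the first and third risks can be written out. Both helpers are hypothetical and give upper bounds only, assuming every threadpool token could be held by an apiserver call at the same moment.

```python
def worst_case_in_flight_calls(thread_pool_size: int, workers: int, replicas: int) -> int:
    """Upper bound on simultaneous blocking calls across the whole deployment."""
    return thread_pool_size * workers * replicas


def informer_process_count(workers: int, replicas: int) -> int:
    """Each process runs its own informers, so watch load scales with this count."""
    return workers * replicas
```

For example, `worst_case_in_flight_calls(200, 1, 3)` gives 600 for three single-worker replicas at the default pool size.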

Out of scope (follow-up PRs)

  • HPA Helm template.
  • Observability / Prometheus metrics.
  • pool_service.list_pools cache (uses its own list path).
  • list_pods cache (CoreV1, arbitrary selectors).
  • docker_service.create_sandbox event-loop blocking.

Testing

  • Not run (explain why)
  • Unit tests
  • Integration tests
  • e2e / manual verification

Breaking Changes

  • None
  • Yes (describe impact and migration path)

Checklist

  • Linked Issue or clearly described motivation: closes #887 (server worker block)
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered

Pangjiping and others added 4 commits May 17, 2026 15:07
Add ServerConfig fields to make uvicorn process count, concurrency
limits, socket backlog, and event-loop/HTTP parser implementation
configurable. Defaults preserve current behavior (workers=1) while
enabling operators to scale a single pod across multiple Python
processes when apiserver capacity allows.

- pyproject.toml: switch to uvicorn[standard] for uvloop/httptools/watchfiles
- config.py: ServerConfig.workers, limit_concurrency, backlog, loop, http
- cli.py: thread new fields into uvicorn.run; force workers=1 under --reload
- main.py: pass loop/http to dev __main__ entry
- examples + configuration.md: document tunables and apiserver tradeoff

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pool

Sandbox/snapshot/pool route handlers were async def but called
synchronous service methods that issue blocking Kubernetes/Docker API
requests (50-200 ms each). Each in-flight call stalled the entire
event loop, serializing every concurrent request.

Convert blocking-only handlers to sync def so FastAPI offloads them to
the anyio threadpool, letting concurrent requests run in parallel.
create_sandbox stays async (its service is async with cooperative
polling).

- api/lifecycle.py: 12 handlers async -> sync; drop manual to_thread in
  create_snapshot now that the route itself runs in the threadpool;
  drop unused asyncio import
- api/pool.py: 5 pool handlers async -> sync
- tests/test_routes_list_sandboxes.py: regression locks in threadpool
  parallelism (8 x 200 ms calls finish in ~250 ms, not ~1.6 s)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
list_custom_objects always issued a direct apiserver call, even though
the informer is already watching the same namespace and serves
get_custom_object from cache. Under multi-worker deployments the list
QPS scales with workers x replicas and pressures the apiserver
unnecessarily.

Prefer the informer cache when synced and the label selector falls
within the supported in-memory grammar (empty, bare key existence,
key=value, comma-joined AND). Anything else falls back to the existing
direct API path, preserving today's behavior.

- services/k8s/label_selector.py: minimal parser/matcher for the subset
  of selectors callers in this repo actually emit
- services/k8s/informer.py: WorkloadInformer.list() snapshot helper
- services/k8s/client.py: list_custom_objects consults the cache first
- tests/k8s: cover label_selector grammar + cache-hit/miss/fallback
  behavior on the client

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous round moved blocking list/get/delete handlers onto sync
def routes so FastAPI offloads them to anyio's default threadpool. Two
follow-up bottlenecks remain:

1. anyio's default threadpool is 40 tokens; bursts of concurrent
   sandbox CRUD requests start queueing once that ceiling is hit.
2. lifecycle.create_sandbox is async and the Kubernetes service body
   still issues sync K8s calls (_ensure_pvc_volumes, workload_provider
   create/get/delete) directly on the event loop. Each 50-200 ms
   round-trip stalls every other in-flight request, and the rate
   limiter's time.sleep makes it worse when read/write QPS is set.

Add a configurable thread_pool_size (default 200) applied at lifespan
startup, and wrap the blocking K8s calls inside the create path with
asyncio.to_thread so the event loop stays responsive.

- config.py: ServerConfig.thread_pool_size
- main.py: lifespan sets anyio current_default_thread_limiter total_tokens
- services/k8s/kubernetes_service.py: to_thread wraps the four sync K8s
  calls in create_sandbox / _wait_for_sandbox_ready
- configuration.md, tests/test_config.py: doc and field tests

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3763f4f16

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/opensandbox_server/cli.py Outdated
Comment thread server/opensandbox_server/services/k8s/client.py
Pangjiping and others added 2 commits May 18, 2026 13:40
Importing opensandbox_server.main in the CLI eagerly constructed
sandbox_service, restoring containers and starting expiration Timer
threads in the supervisor process before uvicorn.run was called. With
[server].workers > 1 that left orphan timers in the supervisor and (on
spawn) duplicated them across workers. Read config and logging directly
in the CLI so only worker processes initialize the service graph.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
list_custom_objects returns the informer cache snapshot once synced, but
create/patch/delete previously left the cache untouched, so a list
immediately after a write could include the old or freshly-deleted
object until the watch event arrived. Add delete_from_cache to the
informer and have the K8sClient write paths upsert or evict cache
entries through a non-creating informer lookup.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
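
A minimal sketch of the write-through idea from the commit above, assuming simplified stand-ins: `InformerCache`, `InMemoryApi`, and `WriteThroughClient` are hypothetical classes, and only the `delete_from_cache` name comes from the commit message. Passing `None` as the cache models the "non-creating lookup": a write never creates an informer that does not already exist.

```python
import threading
from typing import Dict, List, Optional


class InformerCache:
    """Hypothetical in-memory cache keyed by object name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._objects: Dict[str, dict] = {}

    def upsert(self, obj: dict) -> None:
        with self._lock:
            self._objects[obj["metadata"]["name"]] = obj

    def delete_from_cache(self, name: str) -> None:
        with self._lock:
            self._objects.pop(name, None)

    def snapshot(self) -> List[dict]:
        with self._lock:
            return list(self._objects.values())


class InMemoryApi:
    """Stand-in for the apiserver so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self.objects: Dict[str, dict] = {}

    def create(self, obj: dict) -> dict:
        self.objects[obj["metadata"]["name"]] = obj
        return obj

    def delete(self, name: str) -> None:
        self.objects.pop(name, None)


class WriteThroughClient:
    def __init__(self, api: InMemoryApi, cache: Optional[InformerCache]) -> None:
        self._api = api
        self._cache = cache  # None means no informer exists; writes never create one

    def create(self, obj: dict) -> dict:
        created = self._api.create(obj)
        if self._cache is not None:
            self._cache.upsert(created)  # visible to list before the watch event
        return created

    def delete(self, name: str) -> None:
        self._api.delete(name)
        if self._cache is not None:
            self._cache.delete_from_cache(name)  # evict immediately
```

With this shape, a list served from the cache right after a create or delete already reflects the write, and the later watch event simply confirms it.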

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 145555c178

Comment thread server/opensandbox_server/cli.py Outdated
Comment thread server/opensandbox_server/config.py Outdated
Comment thread server/opensandbox_server/cli.py Outdated
Docker expiration timers live in process-local state on
DockerSandboxService, so each uvicorn worker schedules its own
threading.Timer per sandbox. A renewal handled by one worker only
updates that process's _sandbox_expirations, leaving other workers
to fire stale timers at the pre-renewal time and remove the sandbox.

Reject the combination at AppConfig validation until the Docker
runtime grows shared expiration state. Kubernetes is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
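
The guard described in this commit could take roughly the following shape. This is a sketch with hypothetical field names (`runtime`, `workers`) on a plain dataclass; the real `AppConfig` and its validation mechanism may differ.

```python
from dataclasses import dataclass


@dataclass
class AppConfig:
    """Hypothetical, flattened view of the config fields involved."""
    runtime: str = "docker"
    workers: int = 1

    def __post_init__(self) -> None:
        # Docker expiration timers are process-local, so several workers would
        # each keep their own view and could fire stale timers after a renewal.
        if self.runtime == "docker" and self.workers > 1:
            raise ValueError(
                "workers > 1 is not supported with the Docker runtime: "
                "expiration timers are per-process"
            )
```

Constructing `AppConfig(runtime="docker", workers=2)` raises at load time, while a Kubernetes configuration with the same worker count passes.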
Generalwin previously approved these changes May 18, 2026
…ble from TOML

Remove the [server].workers field. Multi-worker mode exposed too many
foot-guns (per-process Docker expiration timers racing on renew, k8s
informer cache divergence, import-time side effects in the supervisor)
and the supported way to scale on Kubernetes is replica count, not
in-process worker fan-out. uvicorn now runs single-process; the
deferred-import comment in cli.py is kept for the reload supervisor.

Fix [server].limit_concurrency so the documented disable path actually
works from TOML. TOML has no null literal, so Optional[int] could not
be set to None: the field now accepts 0 as a sentinel and a
field_validator collapses it to None before uvicorn sees it. Default
1024 is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
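
The sentinel handling can be sketched without pydantic. The helpers below are hypothetical and operate on the dictionary a TOML parser would return; the PR itself uses a `field_validator` on the config model.

```python
from typing import Any, Dict, Optional

DEFAULT_LIMIT_CONCURRENCY = 1024


def normalize_limit_concurrency(value: Optional[int]) -> Optional[int]:
    """Collapse the 0 sentinel to None, which uvicorn treats as 'no limit'."""
    if value is None or value == 0:
        return None
    return value


def load_server_section(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return the [server] table with limit_concurrency normalized."""
    section = dict(raw.get("server", {}))
    section["limit_concurrency"] = normalize_limit_concurrency(
        section.get("limit_concurrency", DEFAULT_LIMIT_CONCURRENCY)
    )
    return section
```

A config containing `limit_concurrency = 0` under `[server]` therefore reaches uvicorn as `None`, while an omitted key keeps the default of 1024.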

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae74253a66

Comment thread server/opensandbox_server/services/k8s/client.py
Comment thread server/opensandbox_server/services/k8s/client.py
@ninan-nn (Collaborator) left a comment

LGTM

@Pangjiping Pangjiping merged commit a25dcb3 into alibaba:main May 18, 2026
20 of 21 checks passed
@Pangjiping Pangjiping deleted the perf/server-uvicorn-tuning branch May 18, 2026 10:54
Labels

bug (Something isn't working), component/server

Development

Successfully merging this pull request may close these issues.

server worker block

3 participants