Add bounded generation batch scheduling #2

Merged. fcogidi merged 5 commits into main from feat/bounded_runner on Apr 17, 2026.

fcogidi (Collaborator) commented on Apr 17, 2026

Summary

  • Add a bounded in-flight scheduler for generate_batch and agenerate_batch when max_parallel_requests is set.
  • Keep the uncapped path (no max_parallel_requests) unchanged, so existing callers keep the same broad fan-out behavior.
  • Preserve ordered batch results, callback semantics, cancellation behavior, and queue-wait timing.
  • Move the generation logic into a private helper module (src/infermesh/_generation.py) so LMClient's public API surface stays explicit.
  • Update the docs to explain that large or memory-sensitive Python batch runs require an explicit max_parallel_requests.
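The bounded path summarized above behaves like a sliding window over the batch. The following is a minimal standalone sketch, not the code in src/infermesh/_generation.py: run_bounded, worker, max_in_flight, and on_result are hypothetical names chosen for illustration, and asyncio.sleep stands in for a model request. It assumes each batch item maps to one async call, creates tasks only as window slots free up (instead of one task per item up front), fires a callback as each item completes, returns results in input order, and cancels whatever is still in flight if a request fails.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_in_flight: int,
    on_result: Optional[Callable[[int, R], None]] = None,
) -> list[R]:
    """Await worker(item) for every item with at most max_in_flight tasks alive."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    source = iter(enumerate(items))  # pull items lazily, one slot at a time
    pending: dict[asyncio.Task, int] = {}  # live task -> input index
    results: dict[int, R] = {}

    def refill() -> None:
        # Create new tasks only while the window has a free slot.
        while len(pending) < max_in_flight:
            try:
                index, item = next(source)
            except StopIteration:
                return
            pending[asyncio.ensure_future(worker(item))] = index

    try:
        refill()
        while pending:
            done, _ = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                index = pending.pop(task)
                results[index] = task.result()  # re-raises a worker failure
                if on_result is not None:
                    on_result(index, results[index])  # fires as each task completes
            refill()
    finally:
        # On a failure (or outer cancellation), cancel what is still running
        # and wait for those tasks so no exception goes unretrieved.
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
    return [results[i] for i in range(len(results))]


async def demo() -> tuple[list[int], int]:
    active = peak = 0

    async def fake_request(x: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.001 * (x % 3))  # uneven latency: out-of-order finishes
        active -= 1
        return x * x

    squares = await run_bounded(range(10), fake_request, max_in_flight=3)
    return squares, peak


print(asyncio.run(demo()))  # ([0, 1, 4, 9, 16, 25, 36, 49, 64, 81], 3)
```

Mapping each live task to its input index is what lets completion order and result order differ, and pulling from an iterator means at most max_in_flight tasks exist at once, however large the batch is.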

Testing

  • uv run pytest tests/test_client_batch.py tests/test_client_limits.py tests/test_cli_bench.py -q
  • uv run pre-commit run -a

fcogidi requested a review from Copilot on April 17, 2026 at 18:41.
fcogidi marked this pull request as ready for review on April 17, 2026 at 18:42.

Copilot AI left a comment


Pull request overview

Adds bounded in-flight scheduling for generation batches when LMClient(max_parallel_requests=...) is set, to avoid creating one task per batch item up front while preserving ordering, callbacks, and cancellation semantics.
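For contrast with a bounded window, a common way to cap concurrency is asyncio.gather over semaphore-guarded coroutines. That limits how many requests execute at once, but gather wraps every coroutine in a Task immediately, which is the per-item, up-front task creation the overview says this PR avoids. The sketch below is illustrative only and is not taken from infermesh: fan_out_with_semaphore and guarded are hypothetical names, and asyncio.sleep stands in for a request. It counts live tasks right after the fan-out starts.

```python
import asyncio


async def fan_out_with_semaphore(n_items: int, limit: int) -> int:
    """Return how many tasks are alive just after a semaphore-guarded gather starts."""
    sem = asyncio.Semaphore(limit)

    async def guarded(i: int) -> int:
        async with sem:  # caps concurrent *execution* at `limit`
            await asyncio.sleep(0.001)  # stand-in for a request
            return i

    # gather() wraps every coroutine in a Task right away: n_items tasks exist now.
    gathered = asyncio.gather(*(guarded(i) for i in range(n_items)))
    await asyncio.sleep(0)  # let the event loop start those tasks
    alive = len(asyncio.all_tasks()) - 1  # exclude the task running this function
    await gathered
    return alive


print(asyncio.run(fan_out_with_semaphore(n_items=1000, limit=8)))  # prints 1000
```

Even with limit=8, all 1,000 tasks exist before any request finishes, so memory grows with the batch size rather than with the cap.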

Changes:

  • Refactors async generation logic into src/infermesh/_generation.py and routes LMClient.agenerate* through it.
  • Implements bounded-window scheduling for agenerate_batch (and therefore generate_batch, via the sync runner) and plumbs queue-admission timing into request metrics.
  • Updates docs and adds tests covering bounded ordering, strict-failure cancellation, and queue-wait metrics that include scheduler delay.
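Queue-wait accounting with an optional queue_started_at can be sketched as follows. This is a hypothetical illustration, not the signature in src/infermesh/_client_runtime.py: send_request and RequestTiming are invented names, an asyncio.Semaphore stands in for the client's admission control, and asyncio.sleep stands in for both the scheduler delay and the network call. When the caller passes the time an item entered the batch, the measured queue wait includes the time spent waiting for a window slot; without it, the clock starts only when the request is finally issued.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTiming:
    """Hypothetical per-request metrics record."""

    queue_wait_s: float


async def send_request(
    limiter: asyncio.Semaphore,
    queue_started_at: Optional[float] = None,
) -> RequestTiming:
    # Measure from when the item entered the batch if the caller knows it;
    # otherwise the queue-wait clock starts now.
    start = time.monotonic() if queue_started_at is None else queue_started_at
    async with limiter:  # stand-in for concurrency / rate-limit admission
        dispatched_at = time.monotonic()
        await asyncio.sleep(0.01)  # stand-in for the network call
    return RequestTiming(queue_wait_s=dispatched_at - start)


async def demo() -> tuple[float, float]:
    limiter = asyncio.Semaphore(1)
    enqueued_at = time.monotonic()  # item admitted to the batch
    await asyncio.sleep(0.1)  # stand-in for waiting on a free window slot
    with_origin = await send_request(limiter, queue_started_at=enqueued_at)
    without_origin = await send_request(limiter)
    return with_origin.queue_wait_s, without_origin.queue_wait_s


with_origin, without_origin = asyncio.run(demo())
print(f"with queue_started_at: {with_origin:.3f}s, without: {without_origin:.3f}s")
```

Only the first measurement reflects the roughly 100 ms the item spent waiting on the scheduler; the second reports a near-zero wait because the limiter was free when that request started.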

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/infermesh/client.py | Routes agenerate/agenerate_batch through new private helpers; updates docstrings to document bounded batching behavior. |
| src/infermesh/_generation.py | New helper module implementing bounded-window scheduling and shared generation request logic. |
| src/infermesh/_client_runtime.py | Adds optional queue_started_at to include scheduler delay in queue-wait metrics. |
| tests/test_client_batch.py | Adds bounded-window tests for ordering, concurrency cap, and strict failure behavior. |
| tests/test_client_limits.py | Adds test asserting queue-wait includes bounded-scheduler delay. |
| docs/guide.md | Documents using max_parallel_requests for large batches; updates examples. |
| README.md | Documents bounded batch behavior; updates examples to set max_parallel_requests. |


Comment thread on src/infermesh/_generation.py
fcogidi merged commit 2354908 into main on Apr 17, 2026.
8 checks passed
fcogidi deleted the feat/bounded_runner branch on April 17, 2026 at 18:51.
