docs: update architecture-and-performance.md to reflect AIMD changes#467
Open
docs: update architecture-and-performance.md to reflect AIMD changes#467
Conversation
…ncy control (#466) Account for the AIMD throttle manager throughout the architecture doc: concurrency formula, max_parallel_requests guidance, new ThrottleConfig section, common problems table, and tuning workflow. Add sync-engine caveats noting AIMD is fully active on the async path. Made-with: Cursor
Contributor
Greptile SummaryThis PR updates Key changes reviewed:
|
| Filename | Overview |
|---|---|
| docs/concepts/architecture-and-performance.md | Documentation updated to reflect AIMD concurrency control and DatasetBuilder rename. All ThrottleConfig defaults match the actual implementation; sync/async engine caveat is correct based on the strip_rate_limit_codes flag difference between the two paths. One minor log-pattern inaccuracy in the tip section. |
Sequence Diagram
sequenceDiagram
participant DD as DataDesigner
participant TM as ThrottleManager (AIMD)
participant HTTP as HTTP Transport
participant LLM as Inference Server
Note over DD,LLM: Async engine path — AIMD active
DD->>TM: acquire_async() [slot request]
TM-->>DD: slot granted (in_flight < current_limit)
DD->>HTTP: POST /chat (via _apost)
HTTP->>LLM: HTTP request
LLM-->>HTTP: HTTP 429 (rate limited)
Note over HTTP: strip_rate_limit_codes=True → 429 not retried by transport
HTTP-->>DD: RateLimitError raised
DD->>TM: release_rate_limited(retry_after)
Note over TM: current_limit *= reduce_factor (0.75)<br/>blocked_until = now + cooldown_seconds
TM-->>DD: limit reduced + cooldown applied
loop After success_window consecutive successes
DD->>TM: release_success()
TM-->>DD: current_limit += additive_increase (up to ceiling)
end
Note over DD,LLM: Sync engine path — AIMD inactive
DD->>TM: acquire_sync() [slot request]
TM-->>DD: slot granted (fixed at max_parallel_requests)
DD->>HTTP: POST /chat (via _post_sync)
HTTP->>LLM: HTTP request
LLM-->>HTTP: HTTP 429
Note over HTTP: strip_rate_limit_codes=False → transport retries 429 internally
HTTP-->>DD: success response (429 hidden)
DD->>TM: release_success()
Note over TM: limit unchanged — AIMD never triggered
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/concepts/architecture-and-performance.md
Line: 236
Comment:
**Log pattern misses "reduced to" variant and recovery messages**
The suggested log pattern `concurrency reduced from X → Y` will not match the second reduction log emitted in `throttle_manager.py` when the rate-limit ceiling is first recorded:
```
🪫📉 '<model>' [<domain>] server rate-limited at <N> (server limit ~<M>) — concurrency reduced to <K> (retrying in Zs)
```
That message uses "reduced **to**", not "reduced **from**". Similarly, the recovery path emits `concurrency recovered to N` and `concurrency fully recovered (N)` — both of which look like increases but won't match `concurrency increased from X → Y`.
Broadening the recommended search patterns slightly will help users catch all throttle activity:
```suggestion
You can observe this in the logs — look for messages containing `concurrency reduced` (decrease events) and `concurrency increased` or `concurrency recovered` (ramp-up events).
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "updates" | Re-trigger Greptile
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📋 Summary
Update the Architecture & Performance doc to account for AIMD adaptive concurrency control and the
ColumnWiseDatasetBuilder→DatasetBuilderrename.Closes #466
🔄 Changes
🔧 Changed
max_parallel_requestswith AIMD-managedcurrent_throttle_limit; explained decrease/increase behaviormax_parallel_requestssection — reframed as a ceiling; updated guidance; added AIMD-aware tuning tipDatasetBuilder(renamed in chore: rename ColumnWiseDatasetBuilder to DatasetBuilder #437)✨ Added
ThrottleConfigparameters🤖 Generated with AI
Made with Cursor