docs: update architecture-and-performance.md to reflect AIMD changes by nabinchha · Pull Request #467 · NVIDIA-NeMo/DataDesigner

nabinchha · 2026-03-25T19:36:26Z

📋 Summary

Update the Architecture & Performance doc to account for AIMD adaptive concurrency control and the ColumnWiseDatasetBuilder → DatasetBuilder rename.

Closes #466

🔄 Changes

🔧 Changed

"What Data Designer Does" — added bullet for automatic rate-limit adaptation
"What Data Designer Does NOT Do" — reworded "Rate limit" to "Impose rate limits" with clarification
Concurrency Formula — replaced static max_parallel_requests with AIMD-managed current_throttle_limit; explained decrease/increase behavior
max_parallel_requests section — reframed as a ceiling; updated guidance; added AIMD-aware tuning tip
Common Problems table — updated "Low throughput" row; added "Frequent 429 → recovery cycles" row
Tuning Workflow — added AIMD context and new step for checking throttle logs
Execution Model admonition — updated from "column-wise dataset generator" to DatasetBuilder (renamed in chore: rename ColumnWiseDatasetBuilder to DatasetBuilder #437)

✨ Added

New "Adaptive Throttling (RunConfig)" section documenting ThrottleConfig parameters
Sync engine caveats noting AIMD is fully active on the async engine path only

🤖 Generated with AI

Made with Cursor

…ncy control (#466) Account for the AIMD throttle manager throughout the architecture doc: concurrency formula, max_parallel_requests guidance, new ThrottleConfig section, common problems table, and tuning workflow. Add sync-engine caveats noting AIMD is fully active on the async path. Made-with: Cursor

greptile-apps · 2026-03-25T19:42:08Z

Greptile Summary

This PR updates docs/concepts/architecture-and-performance.md to document the new AIMD adaptive concurrency controller and the ColumnWiseDatasetBuilder → DatasetBuilder rename introduced in #437. The additions are technically accurate and well-structured.

Key changes reviewed:

AIMD concurrency control — new ThrottleConfig section with all five parameters (reduce_factor, additive_increase, success_window, cooldown_seconds, ceiling_overshoot). Verified against packages/data-designer-config/src/data_designer/config/run_config.py; all default values match exactly.
Sync vs async engine caveat — verified against http_model_client.py (strip_rate_limit_codes=False on sync path keeps 429 in the transport retry list; True on async path lets 429 propagate to the AIMD controller). The docs' claim that "AIMD is inactive on the sync path" is correct.
Concurrency Formula — replaced the static max_parallel_requests variable with current_throttle_limit (representing DomainThrottleState.current_limit), accurately reflecting runtime AIMD management.
DatasetBuilder rename — execution model admonition updated consistently.
One minor log-pattern suggestion: the tip section recommends watching for concurrency reduced from X → Y, but the ThrottleManager also emits concurrency reduced to X and concurrency recovered to N variants that users should be aware of.

Confidence Score: 5/5

Documentation-only PR; all technical claims verified against the implementation. Safe to merge.
All ThrottleConfig defaults and AIMD behavior descriptions are accurate based on code review of throttle_manager.py, run_config.py, http_model_client.py, and retry.py. The sync/async engine caveat is correctly justified by the strip_rate_limit_codes flag. The single P2 comment is a minor wording improvement that doesn't block merge.
No files require special attention.

Important Files Changed

Filename	Overview
docs/concepts/architecture-and-performance.md	Documentation updated to reflect AIMD concurrency control and DatasetBuilder rename. All ThrottleConfig defaults match the actual implementation; sync/async engine caveat is correct based on the strip_rate_limit_codes flag difference between the two paths. One minor log-pattern inaccuracy in the tip section.

Sequence Diagram

sequenceDiagram
    participant DD as DataDesigner
    participant TM as ThrottleManager (AIMD)
    participant HTTP as HTTP Transport
    participant LLM as Inference Server

    Note over DD,LLM: Async engine path — AIMD active

    DD->>TM: acquire_async() [slot request]
    TM-->>DD: slot granted (in_flight < current_limit)
    DD->>HTTP: POST /chat (via _apost)
    HTTP->>LLM: HTTP request
    LLM-->>HTTP: HTTP 429 (rate limited)
    Note over HTTP: strip_rate_limit_codes=True → 429 not retried by transport
    HTTP-->>DD: RateLimitError raised
    DD->>TM: release_rate_limited(retry_after)
    Note over TM: current_limit *= reduce_factor (0.75)<br/>blocked_until = now + cooldown_seconds
    TM-->>DD: limit reduced + cooldown applied

    loop After success_window consecutive successes
        DD->>TM: release_success()
        TM-->>DD: current_limit += additive_increase (up to ceiling)
    end

    Note over DD,LLM: Sync engine path — AIMD inactive

    DD->>TM: acquire_sync() [slot request]
    TM-->>DD: slot granted (fixed at max_parallel_requests)
    DD->>HTTP: POST /chat (via _post_sync)
    HTTP->>LLM: HTTP request
    LLM-->>HTTP: HTTP 429
    Note over HTTP: strip_rate_limit_codes=False → transport retries 429 internally
    HTTP-->>DD: success response (429 hidden)
    DD->>TM: release_success()
    Note over TM: limit unchanged — AIMD never triggered

Prompt To Fix All With AI

This is a comment left during a code review.
Path: docs/concepts/architecture-and-performance.md
Line: 236

Comment:
**Log pattern misses "reduced to" variant and recovery messages**

The suggested log pattern `concurrency reduced from X → Y` will not match the second reduction log emitted in `throttle_manager.py` when the rate-limit ceiling is first recorded:

```
🪫📉 '<model>' [<domain>] server rate-limited at <N> (server limit ~<M>) — concurrency reduced to <K> (retrying in Zs)
```

That message uses "reduced **to**", not "reduced **from**". Similarly, the recovery path emits `concurrency recovered to N` and `concurrency fully recovered (N)` — both of which look like increases but won't match `concurrency increased from X → Y`.

Broadening the recommended search patterns slightly will help users catch all throttle activity:

```suggestion
    You can observe this in the logs — look for messages containing `concurrency reduced` (decrease events) and `concurrency increased` or `concurrency recovered` (ramp-up events).
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "updates" | Re-trigger Greptile}

docs/concepts/architecture-and-performance.md

nabinchha added 2 commits March 25, 2026 13:27

updates

c165c4f

nabinchha requested a review from a team as a code owner March 25, 2026 19:36

greptile-apps bot reviewed Mar 25, 2026

View reviewed changes

docs/concepts/architecture-and-performance.md Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: update architecture-and-performance.md to reflect AIMD changes#467

docs: update architecture-and-performance.md to reflect AIMD changes#467
nabinchha wants to merge 2 commits intomainfrom
nmulepati/docs/466-architecture-and-performance-aimd

nabinchha commented Mar 25, 2026

Uh oh!

greptile-apps bot commented Mar 25, 2026

Confidence Score: 5/5

Sequence Diagram

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nabinchha commented Mar 25, 2026

📋 Summary

🔄 Changes

🔧 Changed

✨ Added

Uh oh!

greptile-apps bot commented Mar 25, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant