
[benchmarking] Adds support for fine-grained config overrides and global ray config #1840

Merged: praateekmahajan merged 5 commits into main from 26.06-improve_benchmark_config_overrides on Apr 23, 2026


rlratzel commented Apr 21, 2026

  • Adds support for fine-grained config overrides, which are needed for use cases such as per-machine config files that only override specific values unique to the machine.
  • Also adds a global ray configuration which is inherited by each entry. Entries can override the global ray config by providing their own ray section.

These changes make it practical to write small override files that change only specific entries or requirements without duplicating the full configuration. For example:

Base config (nightly-benchmark.yaml) defines many entries including:

entries:
  - name: domain_classification_xenna
    timeout_s: 1400
    requirements:
      - metric: throughput_docs_per_sec
        min_value: 3000

Override file (my_overrides.yaml) changes only that entry's timeout and requirement minimum:

entries:
  - name: domain_classification_xenna
    timeout_s: 2000
    requirements:
      - metric: throughput_docs_per_sec
        min_value: 2000

Running with both files:

python benchmarking/run.py \
  --config nightly-benchmark.yaml \
  --config my_overrides.yaml

Results in domain_classification_xenna using timeout_s: 2000 and min_value: 2000, while all other entries remain unchanged.

The previous implementation performed a shallow merge (dict.update), so the override file above would replace the entire entries list, and every entry defined in previously read YAML files (e.g. nightly-benchmark.yaml) would be missing from the final config.
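The clobbering behavior of a shallow merge can be reproduced in a few lines. This is a minimal sketch that uses plain dicts in place of the parsed YAML documents; the second entry name (`other_entry`) is hypothetical and stands in for the rest of nightly-benchmark.yaml.

```python
# Stand-ins for the parsed YAML documents (plain dicts, no YAML parsing here).
base = {
    "entries": [
        {"name": "domain_classification_xenna", "timeout_s": 1400},
        {"name": "other_entry", "timeout_s": 600},  # hypothetical second entry
    ]
}
override = {
    "entries": [
        {"name": "domain_classification_xenna", "timeout_s": 2000},
    ]
}

# Shallow merge: dict.update replaces the whole value stored under "entries",
# so entries that only exist in the base config disappear.
shallow = dict(base)
shallow.update(override)

remaining_names = [entry["name"] for entry in shallow["entries"]]
print(remaining_names)
```

Only the entry named in the override survives, which is why a per-machine override file previously had to duplicate the full entries list.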

The global ray section eliminates a significant amount of repeated YAML from nightly-benchmark.yaml.
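The inheritance rule for the global ray section can be sketched with two small dataclasses. This is an illustration only: the `Session` and `Entry` classes below are simplified stand-ins for the real ones in benchmarking/runner/session.py (which have more fields), and the entry names are hypothetical. The merge expression itself is the one described in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Simplified stand-in; the real Entry has more fields.
    name: str
    ray: dict = field(default_factory=dict)


@dataclass
class Session:
    # Simplified stand-in; the real Session has more fields.
    entries: list
    ray: dict = field(default_factory=dict)

    def __post_init__(self):
        # Global defaults are unpacked first, per-entry keys last,
        # so per-entry values take precedence.
        for entry in self.entries:
            entry.ray = {**self.ray, **entry.ray}


session = Session(
    ray={"num_cpus": 64, "num_gpus": 4, "enable_object_spilling": False},
    entries=[
        Entry("inherits_everything"),            # hypothetical entry name
        Entry("cpu_only", ray={"num_gpus": 0}),  # hypothetical entry name
    ],
)

for entry in session.entries:
    print(entry.name, entry.ray)
```

An entry without its own ray section ends up with the global values unchanged, while an entry that sets only `num_gpus` keeps the global `num_cpus` and `enable_object_spilling`.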


copy-pr-bot Bot commented Apr 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


rlratzel self-assigned this Apr 21, 2026
rlratzel marked this pull request as ready for review April 21, 2026 05:06

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR replaces the shallow dict.update config merge with a deep recursive merge (update_config/merge_config_files), enabling fine-grained per-file overrides of nested values and individual list items. It also adds a top-level ray section to Session that is inherited by all entries, eliminating ~120 lines of repeated YAML from nightly-benchmark.yaml.

Confidence Score: 5/5

Safe to merge; all prior P1 concerns have been addressed and the only remaining finding is a P2 documentation gap.

Prior thread issues (None-document crash, key-order match robustness, non-dict append semantics) are either fixed or previously flagged. The one new finding is a P2 usability note about scalar-list append-only semantics — it doesn't cause incorrect benchmark execution or data loss, so it does not reduce the score below 5.

No files require special attention; the scalar-list documentation gap in benchmarking/run.py and benchmarking/README.md is optional to address before merge.

Important Files Changed

| Filename | Overview |
|---|---|
| benchmarking/run.py | Introduces update_config (deep recursive merge) and merge_config_files; replaces the former shallow dict.update loop. The None-document guard and for…else append-on-no-match logic are correct. The scalar-list append-only semantics are undocumented and could surprise users trying to override notification lists. |
| benchmarking/runner/session.py | Adds a global ray: dict field with a correct {**self.ray, **entry.ray} merge in __post_init__, giving per-entry values precedence over global defaults. Logic is straightforward and correctly ordered after Entry objects are instantiated. |
| benchmarking/nightly-benchmark.yaml | Removes repeated per-entry ray blocks and adds a single global ray section (num_cpus: 64, num_gpus: 4, enable_object_spilling: false). Spot-checked: entries with partial overrides (num_cpus: 16 / 8, num_gpus: 0) still produce the same final merged values as before. |
| benchmarking/README.md | Documents the new deep-merge semantics, global ray section, and per-entry override examples. Accurate for dict and list-of-dicts cases; scalar-list append behaviour is not yet described. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["merge_config_files(config_files)"] --> B["for each YAML file"]
    B --> C["yaml.full_load_all(f)"]
    C --> D{document is None?}
    D -- yes --> B
    D -- no --> E["update_config(config_dict, new_dict)"]
    E --> F{key in config_dict?}
    F -- no --> G["config_dict[key] = value"]
    F -- yes --> H{both dicts?}
    H -- yes --> I["recurse: update_config(config_dict[key], value)"]
    H -- no --> J{both lists?}
    J -- yes --> K["for each item in override list"]
    K --> L{item is non-empty dict?}
    L -- yes --> M["find base item with same first-key name & value"]
    M -- found --> N["recurse: update_config(base_item, item)"]
    M -- not found --> O["append item to base list"]
    L -- no --> P["append scalar/empty to base list"]
    J -- no --> Q["config_dict[key] = value (replace)"]
    E --> R["return config_dict"]
    R --> S["Session.from_dict(config_dict)"]
    S --> T["Entry.from_dict per entry"]
    T --> U["Session.__post_init__"]
    U --> V["entry.ray = {**global_ray, **entry.ray}"]
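The merge path in the flowchart can be approximated in Python as follows. This is a sketch reconstructed from the flowchart alone, not the code merged in benchmarking/run.py: the helper name `_merge_list`, the `other_entry` entry, and the `notify` key are hypothetical, and YAML loading is replaced by plain dicts.

```python
def update_config(config_dict, new_dict):
    """Deep-merge new_dict into config_dict in place and return config_dict."""
    for key, value in new_dict.items():
        if key not in config_dict:
            config_dict[key] = value
        elif isinstance(config_dict[key], dict) and isinstance(value, dict):
            update_config(config_dict[key], value)
        elif isinstance(config_dict[key], list) and isinstance(value, list):
            _merge_list(config_dict[key], value)
        else:
            # Mismatched or scalar types: the override replaces the base value.
            config_dict[key] = value
    return config_dict


def _merge_list(base_list, override_list):
    """Hypothetical helper: merge dict items by first key, append everything else."""
    for item in override_list:
        if isinstance(item, dict) and item:
            first_key = next(iter(item))
            for base_item in base_list:
                if (
                    isinstance(base_item, dict)
                    and base_item
                    and next(iter(base_item)) == first_key
                    and base_item[first_key] == item[first_key]
                ):
                    update_config(base_item, item)
                    break
            else:
                # No base item matched: treat the override item as new.
                base_list.append(item)
        else:
            # Scalars and empty dicts are always appended, never replaced.
            base_list.append(item)


base = {
    "entries": [
        {
            "name": "domain_classification_xenna",
            "timeout_s": 1400,
            "requirements": [{"metric": "throughput_docs_per_sec", "min_value": 3000}],
        },
        {"name": "other_entry", "timeout_s": 600},  # hypothetical entry
    ],
    "notify": ["a"],  # hypothetical scalar list
}
override = {
    "entries": [
        {
            "name": "domain_classification_xenna",
            "timeout_s": 2000,
            "requirements": [{"metric": "throughput_docs_per_sec", "min_value": 2000}],
        }
    ],
    "notify": ["b"],
}

merged = update_config(base, override)
print(merged)
```

In this sketch the matched entry picks up the new timeout and requirement minimum, the unmatched base entry is left alone, and the scalar list grows to two items instead of being replaced, which is the append-only behaviour flagged in the summary.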

Reviews (7): Last reviewed commit: "handles empty YAML files."

Comment thread benchmarking/run.py
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.


P2 File-level ERA001 suppression too broad

# ruff: noqa: ERA001 disables "found commented-out code" for the entire file. The trigger appears to be the YAML example block in lines 76–87 of update_config. Suppressing the rule file-wide will silently hide any genuinely commented-out code added anywhere in run.py in the future. Prefer a targeted inline suppression on just those lines, or wrap the example in a docstring.

Or, better, move the inline YAML example into the function's docstring and remove this directive entirely.
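The docstring approach might look like the sketch below. The YAML shown is borrowed from the PR description as a placeholder, since the actual example on the flagged lines is not visible in this thread, and the function body is deliberately elided. It assumes that ERA001 inspects comments only, so example text inside a docstring would not trigger it.

```python
def update_config(config_dict, new_dict):
    """Merge new_dict into config_dict (body elided in this sketch).

    Example override file, kept in the docstring instead of in comments:

        entries:
          - name: domain_classification_xenna
            timeout_s: 2000
    """
    raise NotImplementedError("illustration only; see benchmarking/run.py")


print(update_config.__doc__)
```

With the example in the docstring, the file-level `# ruff: noqa: ERA001` directive would no longer be needed for this function.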

Comment thread benchmarking/run.py
Comment thread benchmarking/run.py
Comment on lines +92 to +98
first_key = next(iter(sub_val.keys()))
for config_sub_val in config_dict[key]:
    if (
        isinstance(config_sub_val, dict)
        and config_sub_val
        and next(iter(config_sub_val.keys())) == first_key
        and config_sub_val[first_key] == sub_val[first_key]

P1 Key-order dependency can silently drop overrides

List items are matched by comparing the first key name of the override dict against the first key name of each base dict (next(iter(config_sub_val.keys())) == first_key). If a base entry has a different first key than the override — e.g., enabled: comes before name: in the base file but name: is first in the override — the condition is always False, so the override item is silently appended rather than merged.

This creates a duplicate entry that is only caught later in Session.__post_init__ as a confusing ValueError with no hint about the root cause.

A more robust match uses config_sub_val.get(first_key) == sub_val[first_key] (removing the next(iter(...)) == first_key check), so matching works regardless of key ordering in the base dict.
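The difference between the two match conditions can be shown side by side. This is a minimal sketch: the function names `matches_strict` and `matches_relaxed` are hypothetical, and the sample dicts mirror the `enabled:` before `name:` scenario described in this comment.

```python
def matches_strict(base_item, override_item):
    """Condition from the reviewed snippet: first key name and its value must agree."""
    first_key = next(iter(override_item))
    return (
        isinstance(base_item, dict)
        and bool(base_item)
        and next(iter(base_item)) == first_key
        and base_item[first_key] == override_item[first_key]
    )


def matches_relaxed(base_item, override_item):
    """Condition suggested in the review: look up the override's first key by name."""
    first_key = next(iter(override_item))
    return (
        isinstance(base_item, dict)
        and bool(base_item)
        and base_item.get(first_key) == override_item[first_key]
    )


# Base entry lists "enabled" before "name"; the override lists "name" first.
base_item = {"enabled": True, "name": "domain_classification_xenna"}
override_item = {"name": "domain_classification_xenna", "timeout_s": 2000}

strict_result = matches_strict(base_item, override_item)
relaxed_result = matches_relaxed(base_item, override_item)
print(strict_result, relaxed_result)
```

The strict condition rejects the pair because the base dict's first key is `enabled`, so the override would be appended as a duplicate; the relaxed condition matches on `name` regardless of key order. One caveat of the `.get` form is that a missing key compares equal to an override value of `None`.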

Comment thread benchmarking/run.py Outdated
rlratzel and others added 5 commits April 23, 2026 15:41
…ched

   by first key), and merge_config_files() to apply it across all config
   files. Replaces the previous shallow dict.update() in main() which would
   clobber entire top-level keys (e.g. all entries) when overriding config.

Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
Adds a top-level `ray` section to the YAML config that defines global
ray defaults (num_cpus, num_gpus, enable_object_spilling) inherited by
all entries. Per-entry `ray` sections now only need to specify keys that
differ from the global defaults.

Session.__post_init__ applies the merge: `{**self.ray, **entry.ray}`,
so per-entry values always take precedence. Updates nightly-benchmark.yaml
to use a global `ray` section and removes redundant per-entry ray blocks
(18 entries now specify only their overrides; 18 had no overrides and had
their ray sections removed entirely). Updates README to document the behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>

Labels

r1.2.0 Pick this label for auto cherry-picking into r1.2.0


3 participants