refactor(discover): scope-based model tag selection (unscoped vs scoped) by coketaste · Pull Request #109 · ROCm/madengine

coketaste · 2026-04-22T22:56:19Z

Summary

Refines how --tags select models in discovery so that scoping is driven by the tag shape (unscoped vs Scope/...), instead of a strict flag or short-name suffix matching across scopes.

Behavior

Unscoped tags (e.g. inference, pyt_foo): match models via the tag list, or by exact name among root-level (unscoped) models only. They do not follow into a scoped tree via name.split("/")[-1].
Scoped tags (e.g. MAD/inference, MAD/pyt_foo): restrict candidates to the given scope, then match by tag list or full name as before.

This restores loose madengine run --tags <tag> selection (tag-list / root-level naming) while stopping accidental selection of MAD/... or MAD-private/... models from a bare unscoped tag.

Implementation notes

Removes the strict parameter and related plumbing from BuildOrchestrator / RunOrchestrator (no strict_discovery).
Updates unit tests for unscoped tag selection and scope boundaries.

Test plan

madengine discover / run with unscoped and scoped tags cover expected models only.
Unit: tests/unit/test_discover_models.py

… selection Introduce a `strict` parameter on `DiscoverModels` (default `False`). When `strict=True`, tag resolution only matches on the full model name, preventing a bare tag like `pyt_huggingface_gpt2` from inadvertently selecting namespaced models (`MAD/pyt_huggingface_gpt2`, `MAD-private/pyt_huggingface_gpt2`). `BuildOrchestrator` (used by `madengine run`) now passes `strict=True`, so `run` does exact-name matching while `discover` retains the existing loose behaviour. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a strict mode to model discovery to enforce exact full-name matching (intended for madengine run) while keeping the existing loose/tag-based matching for discover.

Changes:

Added strict: bool = False parameter to DiscoverModels and used it to control selection matching logic.
Updated model selection to use exact-name matching when strict=True (unscoped tags path).
Updated BuildOrchestrator to instantiate DiscoverModels(..., strict=True).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
src/madengine/utils/discover_models.py	Introduces `strict` flag and updates model selection matching behavior.
src/madengine/orchestration/build_orchestrator.py	Enables strict discovery when building via `BuildOrchestrator`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Remove the strict flag and short-name backward-compat matching from DiscoverModels. Scoping is now determined solely by the presence of a scope/ prefix in the tag: - Unscoped tags (e.g. inference, pyt_foo) match by tag-list field or exact root-level name; they never cross into a scoped namespace via short-name split. - Scoped tags (e.g. MAD/inference, MAD/pyt_foo) restrict candidates to the named scope, then match by tag-list or exact full name as before. This restores madengine run --tags <tag> loose selection while preventing unintended cross-scope name leakage. Remove strict_discovery from BuildOrchestrator.execute() and RunOrchestrator._build_phase() as they are no longer needed. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Copilot

Pull request overview

This PR aims to introduce strict exact-name selection for model discovery (primarily for madengine run) to avoid unscoped tags accidentally selecting scoped/namespaced models, while keeping discover behavior loose by default.

Changes:

Updated DiscoverModels.select_models() to stop matching models by short-name suffix (name.split('/')[-1]).
Refactored/rewrote unit tests to reflect the new unscoped tag-selection semantics.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/madengine/utils/discover_models.py`	Removes short-name fallback matching for both standard and custom models during tag selection.
`tests/unit/test_discover_models.py`	Updates test cases and expectations around unscoped tag selection and scope boundaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Refactors DiscoverModels tag selection so unscoped tags no longer “short-name” match scoped models via name.split('/')[-1], aligning discovery behavior with tag shape (unscoped vs scope/...) and updating unit tests accordingly.

Changes:

Removed short-name suffix matching (split('/')[-1]) for both models.json entries and CustomModels during selection.
Updated unit tests to validate unscoped name matching stays root-only while tag-field matching remains scope-agnostic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/madengine/utils/discover_models.py`	Removes suffix-based model-name matching in `select_models()` to prevent unscoped tags from crossing scope boundaries by short name.
`tests/unit/test_discover_models.py`	Replaces short-name backward-compat tests with coverage for unscoped vs scoped selection behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… in unscoped branch When extra args are appended with colon syntax (e.g. inference:batch-size=32), the raw tag string was passed to _model_entry_has_tag and the 'all' check instead of the already-extracted model_name (the portion before ':'). This caused tag-list matching to silently fail for any --tags value that included extra args, because it searched for 'inference:batch-size=32' in the model's tags list instead of 'inference'. Fix both the models and custom_models loops in the unscoped branch to use model_name consistently. Add a regression test for this case. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Clarify that unscoped tag name-based matching is root-only, but tag-field matching is scope-agnostic and can select models in any scope. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

- Add get_gpu_arch() utility to tests/fixtures/utils.py for runtime GPU architecture detection (AMD/NVIDIA), used by test fixtures - Add dynamic_skip_gpu_arch_model_dir fixture that injects the current GPU arch into the skip list, so the test works on any machine (not just gfx942 nodes) - Fix gen_sys_env_details context override in container_runner.py: the old 'or' condition made the context key a no-op since the parameter defaults to True; now context can explicitly disable it - Disable sys env collection in test_docker_gpus via additional_context so the pre-script doesn't OOM-kill on MI350X (gfx950) with 6 GPUs Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Copilot

Pull request overview

Refactors model discovery tag selection so scoping is inferred from tag shape (scope/tag vs unscoped), eliminating cross-scope short-name leakage and fixing tag matching when :extra-args are present.

Changes:

Updated DiscoverModels.select_models() to remove short-name (split('/')[-1]) matching and to match tags using the pre-colon token (model_name) so inference:... works.
Added/updated unit tests to validate unscoped vs scoped selection semantics and scope boundaries.
Made sys-env collection in ContainerRunner explicitly overridable via additional_context.gen_sys_env_details, and adjusted e2e tests/fixtures accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`src/madengine/utils/discover_models.py`	Implements scope-shaped tag selection and fixes tag matching when tags include `:extra-args`.
`tests/unit/test_discover_models.py`	Updates coverage for unscoped tags (root-only name match, scope-agnostic tag match) and scoped tag behavior.
`src/madengine/execution/container_runner.py`	Allows additional_context to explicitly disable sys-env detail collection.
`tests/e2e/test_run_workflows.py`	Disables sys-env detail collection for a GPU-binding workflow via additional_context.
`tests/fixtures/utils.py`	Introduces `get_gpu_arch()` helper with caching for e2e fixture manipulation.
`tests/e2e/test_build_workflows.py`	Adds fixture that mutates `models.json` to include the detected GPU arch in `skip_gpu_arch` to reduce hardware coupling.
`CHANGELOG.md`	Documents the discovery behavior changes and related fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

- get_gpu_arch() now returns a short model token (e.g. A100, H100) on NVIDIA instead of the full product string (NVIDIA A100-SXM4-40GB), matching the format expected by skip_gpu_arch model definitions - AMD path now resolves ROCm via get_rocm_path() (consistent with has_gpu()) - filter_images_by_skip_gpu_arch() applies the same two-step normalization (strip NVIDIA prefix, drop SKU suffix) so runtime arch matches skip list - Added @requires_gpu to test_commandline_argument_skip_gpu_arch and test_commandline_argument_disable_skip_gpu_arch_fail to skip on CPU-only CI Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

…o coketaste/v2-discover

Copilot

Pull request overview

Refines model discovery tag selection so scoping is determined by tag shape (unscoped vs scope/...), reducing accidental cross-scope model selection while keeping tag-based selection flexible. Also updates GPU-arch skip behavior and E2E stability by improving GPU architecture detection and making system-env collection controllable via runtime context.

Changes:

Adjusted DiscoverModels.select_models() matching to prevent unscoped name tags from crossing scope boundaries and to correctly handle :<extra-args> tags.
Added GPU architecture detection helper for tests and made E2E skip-gpu-arch tests hardware-agnostic.
Made system environment collection optionally disable-able via context and updated E2E workflow to avoid OOM in system-env pre-scripts.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`src/madengine/utils/discover_models.py`	Updates tag parsing/matching rules for unscoped vs scoped tags and fixes `:<extra-args>` handling.
`tests/unit/test_discover_models.py`	Replaces/updates unit tests to validate new unscoped/scoped tag-selection semantics.
`src/madengine/orchestration/image_filtering.py`	Aligns NVIDIA GPU-arch normalization for `skip_gpu_arch` filtering.
`tests/fixtures/utils.py`	Adds cached `get_gpu_arch()` helper used by E2E tests for dynamic skip list injection.
`tests/e2e/test_build_workflows.py`	Introduces dynamic fixture for `skip_gpu_arch` tests and guards them with `requires_gpu`.
`src/madengine/execution/container_runner.py`	Fixes precedence so context can disable system environment collection scripts.
`tests/e2e/test_run_workflows.py`	Disables sys-env collection for a GPU-binding test to avoid OOM in pre-scripts.
`CHANGELOG.md`	Documents the discovery behavior changes and test/runtime adjustments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coketaste · 2026-04-24T01:26:42Z

        skip_list = [arch.strip() for arch in skip_gpu_arch_str.split(",")]
        sys_gpu_arch = runtime_gpu_arch
        if sys_gpu_arch and "NVIDIA" in sys_gpu_arch:
-            sys_gpu_arch = sys_gpu_arch.split()[1]
+            # Normalize "NVIDIA A100-SXM4-40GB" -> "A100", "NVIDIA H100 PCIe" -> "H100"
+            # (mirrors get_gpu_arch() in tests/fixtures/utils.py)
+            after_prefix = sys_gpu_arch[sys_gpu_arch.index("NVIDIA") + len("NVIDIA ") :]
+            sys_gpu_arch = after_prefix.split("-")[0].split()[0]

        if sys_gpu_arch in skip_list:


the Copilot comment as a false positive.

coketaste self-assigned this Apr 22, 2026

coketaste requested review from Cemberk, Rohan138, gargrahul and leconcio as code owners April 22, 2026 22:56

Copilot AI review requested due to automatic review settings April 22, 2026 22:56

Copilot started reviewing on behalf of coketaste April 22, 2026 22:58 View session

Copilot AI reviewed Apr 22, 2026

View reviewed changes

Comment thread src/madengine/utils/discover_models.py Outdated

Comment thread src/madengine/orchestration/build_orchestrator.py Outdated

Comment thread src/madengine/utils/discover_models.py Outdated

coketaste and others added 2 commits April 23, 2026 10:17

Merge branch 'develop' into coketaste/v2-discover

ce6890e

Copilot AI review requested due to automatic review settings April 23, 2026 15:26

Copilot started reviewing on behalf of coketaste April 23, 2026 15:29 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread src/madengine/utils/discover_models.py

Comment thread src/madengine/utils/discover_models.py

Comment thread tests/unit/test_discover_models.py Outdated

coketaste changed the title ~~feat(discover): add strict mode for exact-match run selection~~ refactor(discover): scope-based model tag selection (unscoped vs scoped) Apr 23, 2026

coketaste requested a review from Copilot April 23, 2026 15:55

Copilot started reviewing on behalf of coketaste April 23, 2026 15:56 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread src/madengine/utils/discover_models.py

Comment thread src/madengine/utils/discover_models.py Outdated

Comment thread tests/unit/test_discover_models.py Outdated

coketaste and others added 4 commits April 23, 2026 14:04

docs(test): fix misleading docstring in TestUnscopedTagSelection

9ce5378

Clarify that unscoped tag name-based matching is root-only, but tag-field matching is scope-agnostic and can select models in any scope. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

docs(changelog): update [2.0.1] with discover tag selection changes

b5fdf70

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 23, 2026 22:26

Copilot started reviewing on behalf of coketaste April 23, 2026 22:28 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread tests/fixtures/utils.py Outdated

Comment thread tests/fixtures/utils.py Outdated

Comment thread tests/e2e/test_build_workflows.py

Comment thread tests/e2e/test_build_workflows.py

coketaste and others added 3 commits April 23, 2026 22:35

docs(changelog): add test fix entries to [2.0.1]

1dc6c59

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Merge branch 'coketaste/v2-discover' of github.com:ROCm/madengine int…

82037bf

…o coketaste/v2-discover

Copilot AI review requested due to automatic review settings April 23, 2026 22:54

Copilot started reviewing on behalf of coketaste April 23, 2026 22:55 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

coketaste merged commit 572a6a2 into develop Apr 24, 2026
4 checks passed

Conversation

coketaste commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Behavior

Implementation notes

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coketaste Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coketaste commented Apr 22, 2026 •

edited

Loading