Skip to content

refactor(discover): scope-based model tag selection (unscoped vs scoped)#109

Merged
coketaste merged 10 commits intodevelopfrom
coketaste/v2-discover
Apr 24, 2026
Merged

refactor(discover): scope-based model tag selection (unscoped vs scoped)#109
coketaste merged 10 commits intodevelopfrom
coketaste/v2-discover

Conversation

@coketaste
Copy link
Copy Markdown
Collaborator

@coketaste coketaste commented Apr 22, 2026

Summary

Refines how --tags select models in discovery so that scoping is driven by the tag shape (unscoped vs Scope/...), instead of a strict flag or short-name suffix matching across scopes.

Behavior

  • Unscoped tags (e.g. inference, pyt_foo): match models via the tag list, or by exact name among root-level (unscoped) models only. They do not follow into a scoped tree via name.split("/")[-1].
  • Scoped tags (e.g. MAD/inference, MAD/pyt_foo): restrict candidates to the given scope, then match by tag list or full name as before.

This restores loose madengine run --tags <tag> selection (tag-list / root-level naming) while stopping accidental selection of MAD/... or MAD-private/... models from a bare unscoped tag.

Implementation notes

  • Removes the strict parameter and related plumbing from BuildOrchestrator / RunOrchestrator (no strict_discovery).
  • Updates unit tests for unscoped tag selection and scope boundaries.

Test plan

  • madengine discover / run with unscoped and scoped tags cover expected models only.
  • Unit: tests/unit/test_discover_models.py

… selection

Introduce a `strict` parameter on `DiscoverModels` (default `False`).
When `strict=True`, tag resolution only matches on the full model name,
preventing a bare tag like `pyt_huggingface_gpt2` from inadvertently
selecting namespaced models (`MAD/pyt_huggingface_gpt2`,
`MAD-private/pyt_huggingface_gpt2`).

`BuildOrchestrator` (used by `madengine run`) now passes `strict=True`,
so `run` does exact-name matching while `discover` retains the existing
loose behaviour.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@coketaste coketaste self-assigned this Apr 22, 2026
Copilot AI review requested due to automatic review settings April 22, 2026 22:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a strict mode to model discovery to enforce exact full-name matching (intended for madengine run) while keeping the existing loose/tag-based matching for discover.

Changes:

  • Added strict: bool = False parameter to DiscoverModels and used it to control selection matching logic.
  • Updated model selection to use exact-name matching when strict=True (unscoped tags path).
  • Updated BuildOrchestrator to instantiate DiscoverModels(..., strict=True).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/madengine/utils/discover_models.py Introduces strict flag and updates model selection matching behavior.
src/madengine/orchestration/build_orchestrator.py Enables strict discovery when building via BuildOrchestrator.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/madengine/utils/discover_models.py Outdated
Comment thread src/madengine/orchestration/build_orchestrator.py Outdated
Comment thread src/madengine/utils/discover_models.py Outdated
coketaste and others added 2 commits April 23, 2026 10:17
Remove the strict flag and short-name backward-compat matching from
DiscoverModels. Scoping is now determined solely by the presence of a
scope/ prefix in the tag:

- Unscoped tags (e.g. inference, pyt_foo) match by tag-list field or
  exact root-level name; they never cross into a scoped namespace via
  short-name split.
- Scoped tags (e.g. MAD/inference, MAD/pyt_foo) restrict candidates to
  the named scope, then match by tag-list or exact full name as before.

This restores madengine run --tags <tag> loose selection while
preventing unintended cross-scope name leakage. Remove strict_discovery
from BuildOrchestrator.execute() and RunOrchestrator._build_phase() as
they are no longer needed.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 23, 2026 15:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to introduce strict exact-name selection for model discovery (primarily for madengine run) to avoid unscoped tags accidentally selecting scoped/namespaced models, while keeping discover behavior loose by default.

Changes:

  • Updated DiscoverModels.select_models() to stop matching models by short-name suffix (name.split('/')[-1]).
  • Refactored/rewrote unit tests to reflect the new unscoped tag-selection semantics.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/madengine/utils/discover_models.py Removes short-name fallback matching for both standard and custom models during tag selection.
tests/unit/test_discover_models.py Updates test cases and expectations around unscoped tag selection and scope boundaries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/madengine/utils/discover_models.py
Comment thread src/madengine/utils/discover_models.py
Comment thread tests/unit/test_discover_models.py Outdated
@coketaste coketaste changed the title feat(discover): add strict mode for exact-match run selection refactor(discover): scope-based model tag selection (unscoped vs scoped) Apr 23, 2026
@coketaste coketaste requested a review from Copilot April 23, 2026 15:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors DiscoverModels tag selection so unscoped tags no longer “short-name” match scoped models via name.split('/')[-1], aligning discovery behavior with tag shape (unscoped vs scope/...) and updating unit tests accordingly.

Changes:

  • Removed short-name suffix matching (split('/')[-1]) for both models.json entries and CustomModels during selection.
  • Updated unit tests to validate unscoped name matching stays root-only while tag-field matching remains scope-agnostic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/madengine/utils/discover_models.py Removes suffix-based model-name matching in select_models() to prevent unscoped tags from crossing scope boundaries by short name.
tests/unit/test_discover_models.py Replaces short-name backward-compat tests with coverage for unscoped vs scoped selection behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/madengine/utils/discover_models.py
Comment thread src/madengine/utils/discover_models.py Outdated
Comment thread tests/unit/test_discover_models.py Outdated
coketaste and others added 4 commits April 23, 2026 14:04
… in unscoped branch

When extra args are appended with colon syntax (e.g. inference:batch-size=32),
the raw tag string was passed to _model_entry_has_tag and the 'all' check
instead of the already-extracted model_name (the portion before ':').

This caused tag-list matching to silently fail for any --tags value that
included extra args, because it searched for 'inference:batch-size=32'
in the model's tags list instead of 'inference'.

Fix both the models and custom_models loops in the unscoped branch to use
model_name consistently. Add a regression test for this case.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Clarify that unscoped tag name-based matching is root-only, but
tag-field matching is scope-agnostic and can select models in any scope.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Add get_gpu_arch() utility to tests/fixtures/utils.py for runtime GPU
  architecture detection (AMD/NVIDIA), used by test fixtures
- Add dynamic_skip_gpu_arch_model_dir fixture that injects the current
  GPU arch into the skip list, so the test works on any machine (not
  just gfx942 nodes)
- Fix gen_sys_env_details context override in container_runner.py: the
  old 'or' condition made the context key a no-op since the parameter
  defaults to True; now context can explicitly disable it
- Disable sys env collection in test_docker_gpus via additional_context
  so the pre-script doesn't OOM-kill on MI350X (gfx950) with 6 GPUs

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 23, 2026 22:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors model discovery tag selection so scoping is inferred from tag shape (scope/tag vs unscoped), eliminating cross-scope short-name leakage and fixing tag matching when :extra-args are present.

Changes:

  • Updated DiscoverModels.select_models() to remove short-name (split('/')[-1]) matching and to match tags using the pre-colon token (model_name) so inference:... works.
  • Added/updated unit tests to validate unscoped vs scoped selection semantics and scope boundaries.
  • Made sys-env collection in ContainerRunner explicitly overridable via additional_context.gen_sys_env_details, and adjusted e2e tests/fixtures accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/madengine/utils/discover_models.py Implements scope-shaped tag selection and fixes tag matching when tags include :extra-args.
tests/unit/test_discover_models.py Updates coverage for unscoped tags (root-only name match, scope-agnostic tag match) and scoped tag behavior.
src/madengine/execution/container_runner.py Allows additional_context to explicitly disable sys-env detail collection.
tests/e2e/test_run_workflows.py Disables sys-env detail collection for a GPU-binding workflow via additional_context.
tests/fixtures/utils.py Introduces get_gpu_arch() helper with caching for e2e fixture manipulation.
tests/e2e/test_build_workflows.py Adds fixture that mutates models.json to include the detected GPU arch in skip_gpu_arch to reduce hardware coupling.
CHANGELOG.md Documents the discovery behavior changes and related fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/fixtures/utils.py Outdated
Comment thread tests/fixtures/utils.py Outdated
Comment thread tests/e2e/test_build_workflows.py
Comment thread tests/e2e/test_build_workflows.py
coketaste and others added 3 commits April 23, 2026 22:35
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- get_gpu_arch() now returns a short model token (e.g. A100, H100) on
  NVIDIA instead of the full product string (NVIDIA A100-SXM4-40GB),
  matching the format expected by skip_gpu_arch model definitions
- AMD path now resolves ROCm via get_rocm_path() (consistent with has_gpu())
- filter_images_by_skip_gpu_arch() applies the same two-step normalization
  (strip NVIDIA prefix, drop SKU suffix) so runtime arch matches skip list
- Added @requires_gpu to test_commandline_argument_skip_gpu_arch and
  test_commandline_argument_disable_skip_gpu_arch_fail to skip on CPU-only CI

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 23, 2026 22:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refines model discovery tag selection so scoping is determined by tag shape (unscoped vs scope/...), reducing accidental cross-scope model selection while keeping tag-based selection flexible. Also updates GPU-arch skip behavior and E2E stability by improving GPU architecture detection and making system-env collection controllable via runtime context.

Changes:

  • Adjusted DiscoverModels.select_models() matching to prevent unscoped name tags from crossing scope boundaries and to correctly handle :<extra-args> tags.
  • Added GPU architecture detection helper for tests and made E2E skip-gpu-arch tests hardware-agnostic.
  • Made system environment collection optionally disable-able via context and updated E2E workflow to avoid OOM in system-env pre-scripts.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/madengine/utils/discover_models.py Updates tag parsing/matching rules for unscoped vs scoped tags and fixes :<extra-args> handling.
tests/unit/test_discover_models.py Replaces/updates unit tests to validate new unscoped/scoped tag-selection semantics.
src/madengine/orchestration/image_filtering.py Aligns NVIDIA GPU-arch normalization for skip_gpu_arch filtering.
tests/fixtures/utils.py Adds cached get_gpu_arch() helper used by E2E tests for dynamic skip list injection.
tests/e2e/test_build_workflows.py Introduces dynamic fixture for skip_gpu_arch tests and guards them with requires_gpu.
src/madengine/execution/container_runner.py Fixes precedence so context can disable system environment collection scripts.
tests/e2e/test_run_workflows.py Disables sys-env collection for a GPU-binding test to avoid OOM in pre-scripts.
CHANGELOG.md Documents the discovery behavior changes and test/runtime adjustments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 94 to 102
skip_list = [arch.strip() for arch in skip_gpu_arch_str.split(",")]
sys_gpu_arch = runtime_gpu_arch
if sys_gpu_arch and "NVIDIA" in sys_gpu_arch:
sys_gpu_arch = sys_gpu_arch.split()[1]
# Normalize "NVIDIA A100-SXM4-40GB" -> "A100", "NVIDIA H100 PCIe" -> "H100"
# (mirrors get_gpu_arch() in tests/fixtures/utils.py)
after_prefix = sys_gpu_arch[sys_gpu_arch.index("NVIDIA") + len("NVIDIA ") :]
sys_gpu_arch = after_prefix.split("-")[0].split()[0]

if sys_gpu_arch in skip_list:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Copilot comment as a false positive.

@coketaste coketaste merged commit 572a6a2 into develop Apr 24, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants