Skip to content

refactor(auth): centralize credential loading and Docker registry login#106

Merged
coketaste merged 3 commits intomainfrom
coketaste/v2-review
Apr 20, 2026
Merged

refactor(auth): centralize credential loading and Docker registry login#106
coketaste merged 3 commits intomainfrom
coketaste/v2-review

Conversation

@coketaste
Copy link
Copy Markdown
Collaborator

Summary

This change introduces a dedicated madengine.core.auth module so credential loading and Docker registry login behave consistently everywhere, with a single place to fix or extend behavior. DockerBuilder and ContainerRunner now delegate registry login to a shared login_to_registry implementation; BuildOrchestrator and RunOrchestrator use load_credentials from the same module. Unused or duplicated paths are removed as part of the cleanup.

Motivation

  • Avoid duplicated logic between build/run flows and image pull/build paths.
  • Make credential precedence and registry key handling (e.g. docker.iodockerhub) consistent.
  • Reduce maintenance cost when updating authentication or error handling.

What changed

  • New src/madengine/core/auth.py:
    • load_credentials() — loads credential.json with optional override/supplement from MAD_DOCKERHUB_* env vars (see docstring for precedence).
    • login_to_registry(...) — shared Docker login used by both DockerBuilder and ContainerRunner (thin instance methods delegate to this function).
  • Orchestrators import load_credentials from madengine.core.auth instead of duplicating loading logic.
  • Tests: unit coverage in tests/unit/test_auth.py; integration expectations updated where login_to_registry is exercised (e.g. tests/integration/test_container_execution.py).

How to test

pip install -e ".[dev]"
pytest tests/unit/test_auth.py tests/integration/test_container_execution.py -v
pytest  # full suite if you prefer

coketaste and others added 2 commits April 17, 2026 23:25
… and dead code

- Extract `_load_credentials` from both orchestrators into `core/auth.py` (`load_credentials`)
- Remove dead `_filter_images_by_dockerfile_context` method from RunOrchestrator
- Raise `ConfigurationError` instead of `SystemExit` in BuildOrchestrator config loading
- Add CLAUDE.md with codebase guidance and architecture docs
- Update tests to cover auth module and refactored error handling

Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
… code

- Add login_to_registry() to core/auth.py; DockerBuilder and ContainerRunner
  delegate to it (raise_on_failure=True/False respectively), eliminating ~120
  lines of duplicated logic
- Remove unused functions: find_and_replace_pattern, substring_found (ops.py),
  highlight_log_section (log_formatting.py), SessionTracker.get_session_start
  and SessionTracker.load_marker (session_tracker.py)
- Clean up unused imports across core, deployment, execution, orchestration,
  and utils modules

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 18, 2026 21:07
@coketaste coketaste self-assigned this Apr 18, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a centralized auth module (madengine.core.auth) to unify credential loading and Docker registry login behavior across build/run flows, while also cleaning up duplicated logic and tightening command/IO handling in several utilities.

Changes:

  • Added madengine.core.auth with shared load_credentials() and login_to_registry(); updated orchestrators/builders/runners to delegate to it.
  • Updated error taxonomy and tests (e.g., ConnectionErrorNetworkError, TimeoutErrorDeploymentTimeoutError, removed RuntimeError alias).
  • Hardened several shell/IO paths (more quoting, cache clearing for lru-cached availability checks, CSV/report robustness tweaks).

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/unit/test_error_handling.py Updates unit tests to reflect renamed error classes and alias removal.
tests/unit/test_deployment.py Clears lru_cache for deterministic is_rocprofv3_available tests.
tests/unit/test_auth.py Adds unit tests for load_credentials() behavior and precedence.
tests/integration/test_errors.py Updates integration tests for renamed error types.
tests/integration/test_docker_integration.py Updates build-arg quoting expectations using shlex.quote.
src/madengine/utils/session_tracker.py Improves CSV row counting by ignoring blank lines; removes unused marker/session helpers.
src/madengine/utils/rocm_tool_manager.py Preserves primary stderr across fallback execution attempts.
src/madengine/utils/ops.py Removes unused dictionary substring helper functions and unused import.
src/madengine/utils/log_formatting.py Aligns truncation message with “tail” behavior; removes unused imports and helper.
src/madengine/utils/gpu_validator.py Removes unused typing imports.
src/madengine/utils/gpu_config.py Replaces print() with structured logging for GPU config messages.
src/madengine/utils/discover_models.py Quotes cp arguments and replaces assert with TypeError for runtime validation.
src/madengine/utils/config_parser.py Fixes directory-walk boundary condition (startswith correctness).
src/madengine/scripts/common/tools/rocm_smi_utils.py Replaces bare except with except Exception.
src/madengine/scripts/common/tools/amd_smi_utils.py Replaces bare except with except Exception.
src/madengine/reporting/update_perf_csv.py Drops non-scalar configs before DataFrame creation to avoid pandas errors.
src/madengine/reporting/csv_to_email.py Corrects return typing (can return None) and updates imports.
src/madengine/orchestration/run_orchestrator.py Switches to centralized load_credentials(); removes legacy filtering/credential helpers.
src/madengine/orchestration/build_orchestrator.py Switches to centralized load_credentials(); improves config error propagation.
src/madengine/execution/docker_builder.py Uses shared login_to_registry(), improves build-arg quoting, and avoids mutable default args.
src/madengine/execution/container_runner.py Uses shared login_to_registry(), removes debug print, and tightens exception handling.
src/madengine/deployment/slurm.py Changes behavior when job status cannot be verified (now treated as failure).
src/madengine/deployment/kubernetes.py Cleans up unused imports.
src/madengine/deployment/factory.py Warns when Kubernetes deps are missing instead of silently skipping registration.
src/madengine/deployment/config_loader.py Removes unused import.
src/madengine/deployment/common.py Adds lru_cache to is_rocprofv3_available().
src/madengine/deployment/base.py Adjusts perf.csv writing to preserve header order for existing files.
src/madengine/core/timeout.py Removes unused typing import.
src/madengine/core/errors.py Renames error classes, removes alias, and cleans up imports/formatting.
src/madengine/core/docker.py Avoids default Console() in signature; adds quoting for env values and docker exec payload.
src/madengine/core/dataprovider.py Removes debug prints.
src/madengine/core/context.py Removes debug prints and corrects get_numa_balancing() return type annotation.
src/madengine/core/auth.py New shared auth module for credential loading and Docker registry login.
src/madengine/cli/validators.py Ensures dockerfile_matched is always initialized for error paths.
src/madengine/cli/constants.py Converts ExitCode to an IntEnum.
src/madengine/cli/commands/run.py Maps timeout=0 to no timeout (None) per help text.
src/madengine/cli/app.py Prints real installed package version (fallbacks to unknown).
pyproject.toml Moves pytest out of core dependencies (remains in dev/all extras).
CLAUDE.md Adds repository dev workflow and architecture guidance.
Comments suppressed due to low confidence (1)

src/madengine/core/docker.py:102

  • container_name is safely quoted for the grep/cleanup checks, but the actual docker run --name ... still uses the unquoted container_name. This can fail for names with spaces/special chars and reintroduces shell-injection risk. Please reuse container_name_quoted consistently when building the docker run command.
                command += "-e " + evar + "=" + shlex.quote(str(envVars[evar])) + " "

        command += "--workdir /myworkspace/ "
        command += "--name " + container_name + " "
        command += image + " "

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/madengine/deployment/factory.py Outdated
Comment thread src/madengine/deployment/slurm.py Outdated
Comment thread tests/unit/test_auth.py Outdated
Comment thread src/madengine/core/auth.py
Comment thread src/madengine/core/auth.py Outdated
Comment thread src/madengine/execution/container_runner.py Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/madengine/core/errors.py:95

  • The PR description focuses on centralizing credential loading / registry login, but this PR also includes broader changes (e.g., renaming core error types like ConnectionErrorNetworkError and TimeoutErrorDeploymentTimeoutError). Please either update the PR description to cover these API-affecting changes, or split them into a separate PR to keep scope and review risk manageable.

class NetworkError(MADEngineError):
    """Connection and network errors."""

    def __init__(self, message: str, context: Optional[ErrorContext] = None, **kwargs):
        super().__init__(
            message,
            ErrorCategory.CONNECTION,
            context,
            recoverable=True,
            **kwargs
        )


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/madengine/core/auth.py
Comment thread src/madengine/execution/docker_builder.py
Comment thread src/madengine/core/docker.py
Comment thread src/madengine/deployment/factory.py
@coketaste coketaste merged commit f067dac into main Apr 20, 2026
4 checks passed
srinivamd added a commit to srinivamd/madengine that referenced this pull request Apr 23, 2026
Extracts duplicated credential loading and Docker registry login logic
from BuildOrchestrator and RunOrchestrator into a shared core/auth.py
module (~120 lines of duplicated login logic removed).

Security improvements over PR ROCm#106:
- Registry password passed via MAD_REGISTRY_PASSWORD env var instead
  of argv (avoids password leaking in process list / shell history)
- docker ps uses --filter name=^/$  with re.escape() to avoid regex
  metachar false positives (see docker.py fix)
- build-arg values wrapped with str() before shlex.quote()

Copilot review comment ROCm#2 fix: login_to_registry typed as
Optional[str] (was str) since callers pass None for DockerHub
and tests call with None.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants