refactor(auth): centralize credential loading and Docker registry login#106
refactor(auth): centralize credential loading and Docker registry login#106
Conversation
… and dead code - Extract `_load_credentials` from both orchestrators into `core/auth.py` (`load_credentials`) - Remove dead `_filter_images_by_dockerfile_context` method from RunOrchestrator - Raise `ConfigurationError` instead of `SystemExit` in BuildOrchestrator config loading - Add CLAUDE.md with codebase guidance and architecture docs - Update tests to cover auth module and refactored error handling Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
… code - Add login_to_registry() to core/auth.py; DockerBuilder and ContainerRunner delegate to it (raise_on_failure=True/False respectively), eliminating ~120 lines of duplicated logic - Remove unused functions: find_and_replace_pattern, substring_found (ops.py), highlight_log_section (log_formatting.py), SessionTracker.get_session_start and SessionTracker.load_marker (session_tracker.py) - Clean up unused imports across core, deployment, execution, orchestration, and utils modules Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a centralized auth module (madengine.core.auth) to unify credential loading and Docker registry login behavior across build/run flows, while also cleaning up duplicated logic and tightening command/IO handling in several utilities.
Changes:
- Added
madengine.core.authwith sharedload_credentials()andlogin_to_registry(); updated orchestrators/builders/runners to delegate to it. - Updated error taxonomy and tests (e.g.,
ConnectionError→NetworkError,TimeoutError→DeploymentTimeoutError, removedRuntimeErroralias). - Hardened several shell/IO paths (more quoting, cache clearing for lru-cached availability checks, CSV/report robustness tweaks).
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_error_handling.py | Updates unit tests to reflect renamed error classes and alias removal. |
| tests/unit/test_deployment.py | Clears lru_cache for deterministic is_rocprofv3_available tests. |
| tests/unit/test_auth.py | Adds unit tests for load_credentials() behavior and precedence. |
| tests/integration/test_errors.py | Updates integration tests for renamed error types. |
| tests/integration/test_docker_integration.py | Updates build-arg quoting expectations using shlex.quote. |
| src/madengine/utils/session_tracker.py | Improves CSV row counting by ignoring blank lines; removes unused marker/session helpers. |
| src/madengine/utils/rocm_tool_manager.py | Preserves primary stderr across fallback execution attempts. |
| src/madengine/utils/ops.py | Removes unused dictionary substring helper functions and unused import. |
| src/madengine/utils/log_formatting.py | Aligns truncation message with “tail” behavior; removes unused imports and helper. |
| src/madengine/utils/gpu_validator.py | Removes unused typing imports. |
| src/madengine/utils/gpu_config.py | Replaces print() with structured logging for GPU config messages. |
| src/madengine/utils/discover_models.py | Quotes cp arguments and replaces assert with TypeError for runtime validation. |
| src/madengine/utils/config_parser.py | Fixes directory-walk boundary condition (startswith correctness). |
| src/madengine/scripts/common/tools/rocm_smi_utils.py | Replaces bare except with except Exception. |
| src/madengine/scripts/common/tools/amd_smi_utils.py | Replaces bare except with except Exception. |
| src/madengine/reporting/update_perf_csv.py | Drops non-scalar configs before DataFrame creation to avoid pandas errors. |
| src/madengine/reporting/csv_to_email.py | Corrects return typing (can return None) and updates imports. |
| src/madengine/orchestration/run_orchestrator.py | Switches to centralized load_credentials(); removes legacy filtering/credential helpers. |
| src/madengine/orchestration/build_orchestrator.py | Switches to centralized load_credentials(); improves config error propagation. |
| src/madengine/execution/docker_builder.py | Uses shared login_to_registry(), improves build-arg quoting, and avoids mutable default args. |
| src/madengine/execution/container_runner.py | Uses shared login_to_registry(), removes debug print, and tightens exception handling. |
| src/madengine/deployment/slurm.py | Changes behavior when job status cannot be verified (now treated as failure). |
| src/madengine/deployment/kubernetes.py | Cleans up unused imports. |
| src/madengine/deployment/factory.py | Warns when Kubernetes deps are missing instead of silently skipping registration. |
| src/madengine/deployment/config_loader.py | Removes unused import. |
| src/madengine/deployment/common.py | Adds lru_cache to is_rocprofv3_available(). |
| src/madengine/deployment/base.py | Adjusts perf.csv writing to preserve header order for existing files. |
| src/madengine/core/timeout.py | Removes unused typing import. |
| src/madengine/core/errors.py | Renames error classes, removes alias, and cleans up imports/formatting. |
| src/madengine/core/docker.py | Avoids default Console() in signature; adds quoting for env values and docker exec payload. |
| src/madengine/core/dataprovider.py | Removes debug prints. |
| src/madengine/core/context.py | Removes debug prints and corrects get_numa_balancing() return type annotation. |
| src/madengine/core/auth.py | New shared auth module for credential loading and Docker registry login. |
| src/madengine/cli/validators.py | Ensures dockerfile_matched is always initialized for error paths. |
| src/madengine/cli/constants.py | Converts ExitCode to an IntEnum. |
| src/madengine/cli/commands/run.py | Maps timeout=0 to no timeout (None) per help text. |
| src/madengine/cli/app.py | Prints real installed package version (fallbacks to unknown). |
| pyproject.toml | Moves pytest out of core dependencies (remains in dev/all extras). |
| CLAUDE.md | Adds repository dev workflow and architecture guidance. |
Comments suppressed due to low confidence (1)
src/madengine/core/docker.py:102
container_nameis safely quoted for the grep/cleanup checks, but the actualdocker run --name ...still uses the unquotedcontainer_name. This can fail for names with spaces/special chars and reintroduces shell-injection risk. Please reusecontainer_name_quotedconsistently when building thedocker runcommand.
command += "-e " + evar + "=" + shlex.quote(str(envVars[evar])) + " "
command += "--workdir /myworkspace/ "
command += "--name " + container_name + " "
command += image + " "
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/madengine/core/errors.py:95
- The PR description focuses on centralizing credential loading / registry login, but this PR also includes broader changes (e.g., renaming core error types like
ConnectionError→NetworkErrorandTimeoutError→DeploymentTimeoutError). Please either update the PR description to cover these API-affecting changes, or split them into a separate PR to keep scope and review risk manageable.
class NetworkError(MADEngineError):
"""Connection and network errors."""
def __init__(self, message: str, context: Optional[ErrorContext] = None, **kwargs):
super().__init__(
message,
ErrorCategory.CONNECTION,
context,
recoverable=True,
**kwargs
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Extracts duplicated credential loading and Docker registry login logic from BuildOrchestrator and RunOrchestrator into a shared core/auth.py module (~120 lines of duplicated login logic removed). Security improvements over PR ROCm#106: - Registry password passed via MAD_REGISTRY_PASSWORD env var instead of argv (avoids password leaking in process list / shell history) - docker ps uses --filter name=^/$ with re.escape() to avoid regex metachar false positives (see docker.py fix) - build-arg values wrapped with str() before shlex.quote() Copilot review comment ROCm#2 fix: login_to_registry typed as Optional[str] (was str) since callers pass None for DockerHub and tests call with None.
Summary
This change introduces a dedicated
madengine.core.authmodule so credential loading and Docker registry login behave consistently everywhere, with a single place to fix or extend behavior.DockerBuilderandContainerRunnernow delegate registry login to a sharedlogin_to_registryimplementation;BuildOrchestratorandRunOrchestratoruseload_credentialsfrom the same module. Unused or duplicated paths are removed as part of the cleanup.Motivation
docker.io→dockerhub) consistent.What changed
src/madengine/core/auth.py:load_credentials()— loadscredential.jsonwith optional override/supplement fromMAD_DOCKERHUB_*env vars (see docstring for precedence).login_to_registry(...)— shared Docker login used by bothDockerBuilderandContainerRunner(thin instance methods delegate to this function).load_credentialsfrommadengine.core.authinstead of duplicating loading logic.tests/unit/test_auth.py; integration expectations updated wherelogin_to_registryis exercised (e.g.tests/integration/test_container_execution.py).How to test