refactor(v2-review): shell injection hardening, bug fixes, and test cleanup #122
Merged
…onfig cleanup

- Harden shell commands with shlex.quote() in docker.py, container_runner.py, docker_builder.py, and run_orchestrator.py to prevent injection via user-controlled values (image names, paths, container names)
- Fix TypeError when kfd_renderDs is None on restricted ROCm < 6.4.1 systems
- Add CANCELLED to deployment monitor terminal states to prevent infinite loop
- Consolidate pytest config into pyproject.toml and delete redundant pytest.ini
- Remove sys.path hack and duplicate marker registration from conftest.py
- Fix global error handler state leak in test_error_handling.py
- Add test_shell_quoting.py with 11 tests validating quoting behavior

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
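The `kfd_renderDs` fix amounts to checking for `None` before calling `len()`, so a restricted system gets an actionable error instead of `TypeError: object of type 'NoneType' has no len()`. A minimal sketch, assuming a hypothetical helper `count_render_devices()` and a hypothetical `KfdTopologyError` class; the actual code in `core/context.py` and its exception type may differ:

```python
from typing import Optional, Sequence


class KfdTopologyError(RuntimeError):
    """Hypothetical error type; madengine's real exception class may differ."""


def count_render_devices(kfd_renderDs: Optional[Sequence[int]]) -> int:
    # On ROCm < 6.4.1 the render-device list comes from KFD topology, which
    # may be unreadable on restricted systems and leave the value as None.
    if kfd_renderDs is None:
        raise KfdTopologyError(
            "KFD topology is unavailable (kfd_renderDs is None); on ROCm < 6.4.1 "
            "check read permissions on the KFD topology under /sys/class/kfd"
        )
    return len(kfd_renderDs)
```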
Pull request overview
This PR hardens several Docker-related shell commands against injection, fixes a couple of runtime bugs in GPU mapping and deployment monitoring, and consolidates pytest configuration into pyproject.toml while cleaning up test infrastructure.
Changes:
- Add `shlex.quote()` escaping for user-controlled values interpolated into `docker run/build/pull/tag/rmi/inspect` commands across core execution paths.
- Fix ROCm < 6.4.1 crash when KFD topology is inaccessible, and prevent infinite monitoring loops on cancelled deployments.
- Consolidate pytest config into `pyproject.toml`, remove redundant marker registration / `sys.path` hacks, and add new unit tests for shell quoting.
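To illustrate the hardening pattern, here is a minimal sketch of quoting user-controlled values before interpolating them into a shell command string. The helper names (`docker_pull_cmd`, `docker_tag_cmd`, `mount_arg`) are hypothetical and are not madengine's actual functions; only `shlex.quote()` and `shlex.split()` are real standard-library calls.

```python
import shlex


def docker_pull_cmd(image: str) -> str:
    # Quote the user-controlled image name so the shell sees one literal argument.
    return f"docker pull {shlex.quote(image)}"


def docker_tag_cmd(src: str, dst: str) -> str:
    # Quote each operand separately; "docker tag" itself stays unquoted.
    return f"docker tag {shlex.quote(src)} {shlex.quote(dst)}"


def mount_arg(host_path: str, container_path: str) -> str:
    # Hypothetical helper: quote the whole "host:container" value passed to -v.
    return f"-v {shlex.quote(f'{host_path}:{container_path}')}"


if __name__ == "__main__":
    malicious = "ubuntu:22.04; rm -rf /"
    print(docker_pull_cmd(malicious))
    # docker pull 'ubuntu:22.04; rm -rf /'
    print(shlex.split(docker_pull_cmd(malicious)))
    # ['docker', 'pull', 'ubuntu:22.04; rm -rf /']
```

Quoting each operand individually, rather than the whole command, keeps the command structure intact while the shell treats a value like `; rm -rf /` as literal text inside a single argument. Values made only of safe characters (e.g. `rocm/pytorch:latest`) are returned unchanged.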
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/madengine/core/docker.py` | Quote container name, image, mounts, and CWD in docker run command construction. |
| `src/madengine/execution/docker_builder.py` | Quote image/dockerfile/context in docker build command (but other shell interpolations remain). |
| `src/madengine/execution/container_runner.py` | Quote image names in docker pull/tag/rmi and quote mount paths in mount args. |
| `src/madengine/orchestration/run_orchestrator.py` | Quote image name in docker image inspect and docker pull. |
| `src/madengine/core/context.py` | Raise a clear error when KFD topology is unavailable on ROCm < 6.4.1 instead of crashing. |
| `src/madengine/deployment/base.py` | Treat CANCELLED as terminal to stop monitoring loops. |
| `tests/unit/test_shell_quoting.py` | Add unit tests validating quoting behavior across hardened command paths. |
| `tests/unit/test_error_handling.py` | Reset global error handler between tests to prevent state leakage. |
| `tests/conftest.py` | Remove sys.path insertion and redundant marker registration. |
| `pyproject.toml` | Make pytest configuration the source of truth, enable --strict-markers, and define markers/warnings. |
| `pytest.ini` | Remove redundant config file that could override pyproject.toml. |
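The `deployment/base.py` change in the table matters because a polling loop only exits on statuses it recognizes as terminal. A minimal sketch, assuming a hypothetical `DeploymentStatus` enum (only `CANCELLED` is confirmed by this PR; the other members are illustrative) and a hypothetical `monitor_until_complete()` with a `max_polls` safety cap that the real `_monitor_until_complete()` may not have:

```python
from enum import Enum
from typing import Callable, FrozenSet


class DeploymentStatus(Enum):
    # Hypothetical subset of states; the real enum may define more.
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL_STATES: FrozenSet[DeploymentStatus] = frozenset(
    {DeploymentStatus.SUCCESS, DeploymentStatus.FAILED, DeploymentStatus.CANCELLED}
)


def monitor_until_complete(
    poll: Callable[[], DeploymentStatus],
    terminal: FrozenSet[DeploymentStatus] = TERMINAL_STATES,
    max_polls: int = 1000,
) -> DeploymentStatus:
    # Poll until a terminal status appears. A real monitor would sleep between
    # polls; that is omitted here to keep the sketch deterministic.
    for _ in range(max_polls):
        status = poll()
        if status in terminal:
            return status
    raise TimeoutError("deployment did not reach a terminal state")
```

Passing a `terminal` set without `CANCELLED` reproduces the original bug: a cancelled job never matches, so the loop only stops at the (hypothetical) cap instead of returning.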
…n up test imports

Add shlex.quote() to the remaining unquoted shell interpolations in docker_builder.py (grep, docker manifest inspect, docker tag, docker push, and head commands). Remove unused pytest and tempfile imports and the dead temp-file block from test_shell_quoting.py.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
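In a pipeline such as the `grep ... | head` commands mentioned above, only the operands should be quoted; the `|` and any redirections must stay unquoted or the pipeline stops working. A minimal sketch with hypothetical helper names, assuming the resulting strings are later run through a shell (e.g. `subprocess.run(cmd, shell=True)`):

```python
import shlex


def manifest_exists_cmd(image: str) -> str:
    # Hypothetical: check for a remote manifest and discard its output.
    # The image is quoted; the redirections are shell syntax and stay bare.
    return f"docker manifest inspect {shlex.quote(image)} > /dev/null 2>&1"


def grep_first_match_cmd(pattern: str, path: str) -> str:
    # Quote each user-controlled operand separately so the "|" between
    # grep and head is still interpreted by the shell as a pipe.
    return f"grep -E {shlex.quote(pattern)} {shlex.quote(path)} | head -n 1"
```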
[tool.pytest.ini_options].minversion is the minimum pytest version, not the Python version. The previous value "3.8" was a Python version. Use 7.0 to match the floor the config actually requires: pythonpath was added in pytest 7.0.0.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
CHANGELOG.md [2.0.3]:
- Added: K8s `storage_class` field + preset default change from `local_path_storage_class: "local-path"` to `storage_class: "nfs-banff"`
- Changed: pytest config consolidation into pyproject.toml; minversion corrected to 7.0; conftest.py cleanup
- Fixed: `kfd_renderDs is None` TypeError on restricted ROCm < 6.4.1; deployment monitor infinite loop on CANCELLED jobs
- Security: extended shell-injection hardening (shlex.quote) across docker.py, container_runner.py, docker_builder.py, run_orchestrator.py
- Tests: new test_shell_quoting.py (11 tests); test_error_handling.py global-state-leak fix

examples/k8s-configs/README.md:
- Update storage-class table to show new `storage_class` fallback chain
- Document 2.0.3 preset default change (`local-path` -> `nfs-banff` for single-node results PVC) with override snippet for clusters that still want local-path
- Mark `local_path_storage_class` as removed-from-preset-but-still-honoured

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
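The `test_error_handling.py` state-leak fix relies on pytest's `setup_method`/`teardown_method` hooks, which run around every test method in a class. One plausible shape is sketched below; the module-level `_error_handler`, `set_error_handler()`, and `get_error_handler()` are hypothetical stand-ins for madengine's global error handler, and whether the real tests save/restore or simply reset to `None` is an assumption:

```python
# Hypothetical module-level state standing in for the global error handler.
_error_handler = None


def set_error_handler(handler):
    global _error_handler
    _error_handler = handler


def get_error_handler():
    return _error_handler


class TestErrorHandling:
    def setup_method(self, method):
        # Remember any pre-existing handler, then start from a clean state.
        self._saved = get_error_handler()
        set_error_handler(None)

    def teardown_method(self, method):
        # Restore the previous handler so later tests are unaffected.
        set_error_handler(self._saved)

    def test_installs_handler(self):
        set_error_handler(object())
        assert get_error_handler() is not None

    def test_starts_clean(self):
        # Passes regardless of test order, because setup_method reset the state.
        assert get_error_handler() is None
```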
coketaste added a commit that referenced this pull request on May 28, 2026
Resolves the CHANGELOG conflict by adopting upstream's v2.0.3 release date (2026-05-26, finalized upstream via #122) and graduating this branch's Unreleased entries into a new v2.1.0 section dated 2026-05-28, since slurm_multi / --use-image / --build-on-compute are feature work.

Auto-merged from upstream:
- fix: generate MAD_MULTI_NODE_RUNNER for Docker local deployment (#126)
- docs/wiki/index.html (wiki path rename, #129)
Summary
- Harden shell commands with `shlex.quote()` to prevent injection via image names, container names, mount paths, and context paths
- Fix `TypeError` crash when KFD topology is inaccessible on ROCm < 6.4.1 systems (`kfd_renderDs` is `None`)
- Fix deployment monitor infinite loop on cancelled jobs (`CANCELLED` missing from terminal status set)
- Consolidate pytest config into `pyproject.toml` as single source of truth (delete redundant `pytest.ini` that was silently overriding it)
- Remove `sys.path` hack, deduplicate marker registration, fix global error handler state leak

Changes
Security hardening (`shlex.quote()`)

| File | Values quoted |
|---|---|
| `core/docker.py` | `container_name`, `image`, mount paths, `cwd` in `docker run` |
| `execution/docker_builder.py` | `docker_image`, `dockerfile`, `docker_context` in `docker build` |
| `execution/container_runner.py` | `registry_image`, `local_name` in `docker pull`/`tag`/`rmi`; mount paths in `get_mount_arg()` |
| `orchestration/run_orchestrator.py` | `image_name` in `docker image inspect` and `docker pull` |

Bug fixes
- `core/context.py`: Guard `kfd_renderDs is None` before the `len()` call, with a clear error message pointing at KFD topology permissions
- `deployment/base.py`: Add `DeploymentStatus.CANCELLED` to the terminal set in `_monitor_until_complete()` so cancelled jobs don't spin forever

Test & config cleanup
- `pytest.ini`: deleted; it was silently overriding all `[tool.pytest.ini_options]` settings in `pyproject.toml`
- `pyproject.toml`: Fix `python_paths` typo → `pythonpath`, add `--strict-markers`, `filterwarnings`, `minversion`, and the full marker list (10 markers)
- `tests/conftest.py`: Remove `sys.path` insertion (now handled by `pythonpath`) and `pytest_configure()` marker duplication
- `tests/unit/test_error_handling.py`: Add `setup_method`/`teardown_method` to reset the global error handler between tests
- `tests/unit/test_shell_quoting.py`: 11 new tests validating quoting across all hardened code paths

Test plan
- `pytest tests/unit/ -v`: all 437 tests pass (verified locally)
- … (`; rm -rf /`, `$(whoami)`) and safe inputs
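A quoting test of the kind listed in the test plan can assert two properties: hostile values survive a round trip through `shlex.split()` as single, unchanged arguments, and safe values are left untouched. A minimal sketch in plain pytest style; `build_run_cmd()` is a hypothetical stand-in for the hardened `docker run` builder, and the input lists are illustrative rather than copied from `test_shell_quoting.py`:

```python
import shlex

# Illustrative inputs; the real test file's cases may differ.
MALICIOUS = ["; rm -rf /", "$(whoami)", "`id`", "img && echo pwned", "name'with'quotes"]
SAFE = ["rocm/pytorch:latest", "my_container", "/workspace/data"]


def build_run_cmd(container_name: str, image: str) -> str:
    # Hypothetical stand-in for the hardened docker-run command builder.
    return f"docker run --name {shlex.quote(container_name)} {shlex.quote(image)}"


def test_malicious_values_stay_single_arguments():
    for value in MALICIOUS:
        argv = shlex.split(build_run_cmd(value, value))
        # Each hostile value comes back as exactly one literal argument.
        assert argv == ["docker", "run", "--name", value, value]


def test_safe_values_are_left_unchanged():
    for value in SAFE:
        assert shlex.quote(value) == value
```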