Skip to content

fix: remap container paths to host paths in job_metadata.json#218

Merged
k-chrispens merged 1 commit intomainfrom
fix-metadata-host-paths
Apr 14, 2026
Merged

fix: remap container paths to host paths in job_metadata.json#218
k-chrispens merged 1 commit intomainfrom
fix-metadata-host-paths

Conversation

@Abdelsalam-Abbas
Copy link
Copy Markdown
Contributor

@Abdelsalam-Abbas Abdelsalam-Abbas commented Apr 11, 2026

Closes #210

Summary

  • When guidance runs inside Docker, job_metadata.json recorded container-internal paths (/data/inputs/..., /data/results/...) instead of host paths, making it impossible to reproduce experiments from metadata alone
  • Add _remap_container_path() in GuidanceConfig.as_dict() that substitutes container path prefixes with host paths using env vars
  • Three env vars, checked in specificity order: SAMPLEWORKS_HOST_INPUT_DIR (/data/inputs), SAMPLEWORKS_HOST_RESULTS_DIR (/data/results), SAMPLEWORKS_HOST_DIR (/data catch-all for single mounts)
  • Remapping only affects metadata serialization; runtime file I/O is unchanged
  • When no env vars are set, behavior is identical to before

Test plan

  • Unit tests for _remap_container_path: no-op without env vars, split mount remapping, catch-all remapping, specificity priority, checkpoint passthrough, trailing slash normalization, exact prefix match, empty env var handling
  • Integration tests for as_dict(): split mounts, catch-all, and no-env-var backward compat
  • pixi run -e boltz-dev tests -- tests/utils/test_guidance_script_arguments.py -v (39 passed)

Summary by CodeRabbit

  • New Features

    • Added environment variables to record host paths in job metadata: SAMPLEWORKS_HOST_INPUT_DIR, SAMPLEWORKS_HOST_RESULTS_DIR, or single SAMPLEWORKS_HOST_DIR.
  • Chores

    • Docker run commands updated to pass these host-path environment variables into containers.
  • Documentation

    • Help/usage output gained a METADATA HOST PATH REMAPPING section with examples.
  • Tests

    • Added tests covering remapping behavior, precedence, normalization, and unaffected paths.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 11, 2026

Warning

Rate limit exceeded

@Abdelsalam-Abbas has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 50 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 6 minutes and 50 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f678bc8-36ff-4985-b692-576c553f44e2

📥 Commits

Reviewing files that changed from the base of the PR and between 03aa41e and a5848eb.

📒 Files selected for processing (4)
  • docker-entrypoint.sh
  • run_all_models.sh
  • src/sampleworks/utils/guidance_script_arguments.py
  • tests/utils/test_guidance_script_arguments.py
📝 Walkthrough

Walkthrough

Added host-path remapping for guidance job metadata: documented usage in the docker entrypoint, passed host-path env vars into container runs, implemented container→host path remapping in GuidanceConfig.as_dict(), and extended tests to cover remapping behavior and precedence.

Changes

Cohort / File(s) Summary
Entrypoint / Usage
docker-entrypoint.sh
Expanded help/usage output with a new "METADATA HOST PATH REMAPPING" section describing environment variables that instruct job_metadata.json to record host paths instead of container-internal paths (split-mount and single-mount options) and an example docker run.
Container invocations
run_all_models.sh
Added environment variables to docker run calls: SAMPLEWORKS_HOST_INPUT_DIR and SAMPLEWORKS_HOST_RESULTS_DIR (sourced from host DATA_DIR/RESULTS_DIR) so containers receive host-path mappings at runtime.
Core remapping logic
src/sampleworks/utils/guidance_script_arguments.py
Introduced _ENV_VAR_MAPPINGS and _remap_container_path() to map container prefixes (/data/inputs, /data/results, /data) to host directories based on env vars (including catch-all SAMPLEWORKS_HOST_DIR and precedence rules). GuidanceConfig.as_dict() now applies remapping to serialized path fields (density, structure, output_dir, log_path) and docstring updated to describe metadata behavior.
Tests
tests/utils/test_guidance_script_arguments.py
Added tests for _remap_container_path() covering no-env, per-prefix envs, catch-all env, precedence of per-prefix over catch-all, ignoring empty values, whitespace/slash normalization, and non-matching paths. Added GuidanceConfig.as_dict() tests verifying remapping of structure, density, output_dir, and log_path when env vars are set and no changes otherwise.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Host
participant Docker as "Docker Container"
participant Serializer as "Guidance Serializer"
participant FS as "Filesystem / job_metadata.json"
Host->>Docker: docker run ... -e SAMPLEWORKS_HOST_INPUT_DIR -e SAMPLEWORKS_HOST_RESULTS_DIR
Docker->>Serializer: run guidance job; call GuidanceConfig.as_dict()
Serializer->>Serializer: _remap_container_path() reads env vars (INPUT/RESULTS/HOST) and remaps prefixes
Serializer->>FS: write job_metadata.json with remapped host paths
FS-->>Host: metadata contains host-real paths

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop from /data to fields outside,
I swap container trails for the host's wide stride,
Env-vars point to carrots in the ground,
job_metadata.json now shows where they're found.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.81% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: remapping container paths to host paths in job_metadata.json, which is the core objective across all file modifications.
Linked Issues check ✅ Passed The PR fully addresses issue #210 by implementing container-to-host path remapping in job_metadata.json via environment variables while maintaining backward compatibility.
Out of Scope Changes check ✅ Passed All changes directly support the path remapping objective: documentation updates, Docker environment variable configuration, remapping logic implementation, and comprehensive test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-metadata-host-paths

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/utils/test_guidance_script_arguments.py (1)

247-320: Drop _resolve_checkpoint mocks from as_dict remap tests.

These tests don’t exercise checkpoint resolution, so the patching adds avoidable coupling.

🧪 Simplify tests by removing unnecessary mocks
-@patch(
-    "sampleworks.utils.guidance_script_arguments._resolve_checkpoint",
-    return_value="/checkpoints/mock.ckpt",
-)
-def test_as_dict_remaps_all_four_path_fields(_mock_resolve, monkeypatch):
+def test_as_dict_remaps_all_four_path_fields(monkeypatch):
@@
-@patch(
-    "sampleworks.utils.guidance_script_arguments._resolve_checkpoint",
-    return_value="/checkpoints/mock.ckpt",
-)
-def test_as_dict_unchanged_when_no_env_vars(_mock_resolve, monkeypatch):
+def test_as_dict_unchanged_when_no_env_vars(monkeypatch):
@@
-@patch(
-    "sampleworks.utils.guidance_script_arguments._resolve_checkpoint",
-    return_value="/checkpoints/mock.ckpt",
-)
-def test_as_dict_remaps_with_catchall_host_dir(_mock_resolve, monkeypatch):
+def test_as_dict_remaps_with_catchall_host_dir(monkeypatch):
As per coding guidelines " `**/test_*.py`: Write black-box tests that verify behavior, not implementation. Use realistic inputs and avoid mocks. Test public interfaces with expected behaviors."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_guidance_script_arguments.py` around lines 247 - 320, The
tests test_as_dict_remaps_all_four_path_fields,
test_as_dict_unchanged_when_no_env_vars, and
test_as_dict_remaps_with_catchall_host_dir are unnecessarily patching
sampleworks.utils.guidance_script_arguments._resolve_checkpoint; remove the
`@patch`(...) decorator from each test and delete the unused _mock_resolve
parameter from their function signatures so the tests exercise
GuidanceConfig.as_dict as a black-box without coupling to internal checkpoint
resolution logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sampleworks/utils/guidance_script_arguments.py`:
- Around line 84-85: The code appends a normalized host path with
host_dir.rstrip("/") which turns "/" into "" and also doesn't remove surrounding
whitespace before use; change the normalization so you first trim whitespace
from host_dir (host_dir = host_dir.strip()), then compute normalized_host = "/"
if host_dir == "/" else host_dir.rstrip("/"), and use that in the
env_pairs.append call (replace the existing
env_pairs.append((container_prefix.rstrip("/"), host_dir.rstrip("/")))). Ensure
you still validate host_dir is non-empty after stripping and keep
container_prefix normalization as before.

---

Nitpick comments:
In `@tests/utils/test_guidance_script_arguments.py`:
- Around line 247-320: The tests test_as_dict_remaps_all_four_path_fields,
test_as_dict_unchanged_when_no_env_vars, and
test_as_dict_remaps_with_catchall_host_dir are unnecessarily patching
sampleworks.utils.guidance_script_arguments._resolve_checkpoint; remove the
`@patch`(...) decorator from each test and delete the unused _mock_resolve
parameter from their function signatures so the tests exercise
GuidanceConfig.as_dict as a black-box without coupling to internal checkpoint
resolution logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af2d41c9-1668-4891-9689-4d0a5dd7f802

📥 Commits

Reviewing files that changed from the base of the PR and between f57044e and 606a2bc.

📒 Files selected for processing (4)
  • docker-entrypoint.sh
  • run_all_models.sh
  • src/sampleworks/utils/guidance_script_arguments.py
  • tests/utils/test_guidance_script_arguments.py

Comment thread src/sampleworks/utils/guidance_script_arguments.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/utils/test_guidance_script_arguments.py (2)

202-208: Clear SAMPLEWORKS_HOST_RESULTS_DIR here to avoid env-dependent flakiness.

This test asserts catch-all behavior for /data/results, but it never unsets SAMPLEWORKS_HOST_RESULTS_DIR. If that var exists in the parent environment, this assertion can fail nondeterministically.

Proposed fix
 def test_remap_specific_env_takes_priority_over_catchall(monkeypatch):
-
+    monkeypatch.delenv("SAMPLEWORKS_HOST_RESULTS_DIR", raising=False)
     monkeypatch.setenv("SAMPLEWORKS_HOST_INPUT_DIR", "/specific/data")
     monkeypatch.setenv("SAMPLEWORKS_HOST_DIR", "/general")
     assert _remap_container_path("/data/inputs/foo.cif") == "/specific/data/foo.cif"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_guidance_script_arguments.py` around lines 202 - 208, The
test can be flaky if SAMPLEWORKS_HOST_RESULTS_DIR exists in the environment;
before asserting catch-all behavior for /data/results, remove that env var so
_remap_container_path cannot pick a specific results mapping. In the
test_remap_specific_env_takes_priority_over_catchall function use
monkeypatch.delenv("SAMPLEWORKS_HOST_RESULTS_DIR", raising=False) (or
equivalent) prior to the assertions so only SAMPLEWORKS_HOST_INPUT_DIR and
SAMPLEWORKS_HOST_DIR control the outcome.

165-320: New test functions are missing docstrings required by repo rules.

Most newly added tests in this block have no function docstrings. Please add concise NumPy-style docstrings to keep this file consistent with the project standard.

As per coding guidelines: "**/*.py: Always include NumPy-style docstrings for every function and class."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_guidance_script_arguments.py` around lines 165 - 320, Add
concise NumPy-style docstrings to each new test function (e.g.
test_remap_noop_when_no_env_vars, test_remap_data_dir_env,
test_remap_results_dir_env, test_remap_catchall_dir_env,
test_remap_specific_env_takes_priority_over_catchall,
test_remap_leaves_checkpoint_unchanged, test_remap_trailing_slash_normalization,
test_remap_exact_prefix_match, test_remap_ignores_empty_env_var,
test_as_dict_remaps_all_four_path_fields,
test_as_dict_unchanged_when_no_env_vars,
test_as_dict_remaps_with_catchall_host_dir) by adding a one-line summary and a
short description in NumPy docstring format directly under each def; keep them
minimal (purpose of the test, key conditions/assertions) to satisfy the repo
rule requiring NumPy-style docstrings for every function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/utils/test_guidance_script_arguments.py`:
- Around line 219-239: Add two regression tests in
tests/utils/test_guidance_script_arguments.py around the existing
_remap_container_path tests: (1) a whitespace-padded env var case that sets
SAMPLEWORKS_HOST_INPUT_DIR to " /mnt/data " and asserts
_remap_container_path("/data/inputs/foo.cif") returns "/mnt/data/foo.cif"
(ensuring trimming), and (2) a root env var case that sets
SAMPLEWORKS_HOST_INPUT_DIR to "/" and asserts
_remap_container_path("/data/inputs/foo.cif") returns "/foo.cif" and
_remap_container_path("/data/inputs") returns "/" (ensuring "/" normalizes
correctly); use monkeypatch to set/delenv like the other tests and reference
_remap_container_path to locate behavior to lock in.

---

Nitpick comments:
In `@tests/utils/test_guidance_script_arguments.py`:
- Around line 202-208: The test can be flaky if SAMPLEWORKS_HOST_RESULTS_DIR
exists in the environment; before asserting catch-all behavior for
/data/results, remove that env var so _remap_container_path cannot pick a
specific results mapping. In the
test_remap_specific_env_takes_priority_over_catchall function use
monkeypatch.delenv("SAMPLEWORKS_HOST_RESULTS_DIR", raising=False) (or
equivalent) prior to the assertions so only SAMPLEWORKS_HOST_INPUT_DIR and
SAMPLEWORKS_HOST_DIR control the outcome.
- Around line 165-320: Add concise NumPy-style docstrings to each new test
function (e.g. test_remap_noop_when_no_env_vars, test_remap_data_dir_env,
test_remap_results_dir_env, test_remap_catchall_dir_env,
test_remap_specific_env_takes_priority_over_catchall,
test_remap_leaves_checkpoint_unchanged, test_remap_trailing_slash_normalization,
test_remap_exact_prefix_match, test_remap_ignores_empty_env_var,
test_as_dict_remaps_all_four_path_fields,
test_as_dict_unchanged_when_no_env_vars,
test_as_dict_remaps_with_catchall_host_dir) by adding a one-line summary and a
short description in NumPy docstring format directly under each def; keep them
minimal (purpose of the test, key conditions/assertions) to satisfy the repo
rule requiring NumPy-style docstrings for every function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 64f757db-610b-46d5-a628-f0ae41e0a331

📥 Commits

Reviewing files that changed from the base of the PR and between 606a2bc and 98eba05.

📒 Files selected for processing (4)
  • docker-entrypoint.sh
  • run_all_models.sh
  • src/sampleworks/utils/guidance_script_arguments.py
  • tests/utils/test_guidance_script_arguments.py
✅ Files skipped from review due to trivial changes (2)
  • docker-entrypoint.sh
  • run_all_models.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/sampleworks/utils/guidance_script_arguments.py

Comment thread tests/utils/test_guidance_script_arguments.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/utils/test_guidance_script_arguments.py (1)

165-323: Add NumPy-style docstrings to newly added test functions.

The new test functions are missing the required NumPy-style docstrings for functions/classes in Python files.

As per coding guidelines, "**/*.py: Always include NumPy-style docstrings for every function and class."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_guidance_script_arguments.py` around lines 165 - 323, Add
NumPy-style docstrings to each newly added test function so they comply with the
repository guideline requiring docstrings for every function/class; update the
test functions test_remap_noop_when_no_env_vars, test_remap_data_dir_env,
test_remap_results_dir_env, test_remap_catchall_dir_env,
test_remap_specific_env_takes_priority_over_catchall,
test_remap_leaves_checkpoint_unchanged, test_remap_trailing_slash_normalization,
test_remap_exact_prefix_match, test_remap_ignores_empty_env_var,
test_remap_whitespace_padded_env_var, test_remap_root_env_var,
test_as_dict_remaps_all_four_path_fields,
test_as_dict_unchanged_when_no_env_vars, and
test_as_dict_remaps_with_catchall_host_dir to include concise NumPy-style
docstrings (one-line summary plus short description/notes where appropriate)
describing the test purpose and expected behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/utils/test_guidance_script_arguments.py`:
- Around line 12-13: Replace direct tests of the private helper
_remap_container_path by exercising the public GuidanceConfig.as_dict()
interface: remove imports/usages of _remap_container_path from tests and extend
the three existing as_dict() test functions to include the edge cases covered
previously (trailing slashes, leading/trailing whitespace, and root container
paths) by constructing GuidanceConfig instances with those host/container path
variants and asserting the expected remapped values appear in the as_dict()
output; locate references to _remap_container_path and the
GuidanceConfig.as_dict() calls to update assertions accordingly.

---

Nitpick comments:
In `@tests/utils/test_guidance_script_arguments.py`:
- Around line 165-323: Add NumPy-style docstrings to each newly added test
function so they comply with the repository guideline requiring docstrings for
every function/class; update the test functions
test_remap_noop_when_no_env_vars, test_remap_data_dir_env,
test_remap_results_dir_env, test_remap_catchall_dir_env,
test_remap_specific_env_takes_priority_over_catchall,
test_remap_leaves_checkpoint_unchanged, test_remap_trailing_slash_normalization,
test_remap_exact_prefix_match, test_remap_ignores_empty_env_var,
test_remap_whitespace_padded_env_var, test_remap_root_env_var,
test_as_dict_remaps_all_four_path_fields,
test_as_dict_unchanged_when_no_env_vars, and
test_as_dict_remaps_with_catchall_host_dir to include concise NumPy-style
docstrings (one-line summary plus short description/notes where appropriate)
describing the test purpose and expected behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56da6970-0858-46a2-bf2e-c2f45b3ef3fb

📥 Commits

Reviewing files that changed from the base of the PR and between 98eba05 and 03aa41e.

📒 Files selected for processing (2)
  • src/sampleworks/utils/guidance_script_arguments.py
  • tests/utils/test_guidance_script_arguments.py
✅ Files skipped from review due to trivial changes (1)
  • src/sampleworks/utils/guidance_script_arguments.py

Comment thread tests/utils/test_guidance_script_arguments.py
When guidance runs inside Docker, job_metadata.json recorded
container-internal paths (/data/inputs/..., /data/results/...) instead
of host paths, making it impossible to reproduce experiments from
metadata alone.

Add container-to-host path remapping in GuidanceConfig.as_dict() via
env vars:
- SAMPLEWORKS_HOST_INPUT_DIR replaces /data/inputs
- SAMPLEWORKS_HOST_RESULTS_DIR replaces /data/results
- SAMPLEWORKS_HOST_DIR replaces /data (fallback for single-mount setups)

Remapping only affects metadata serialization; runtime file I/O is
unchanged. When no env vars are set, behavior is identical to before.
Copy link
Copy Markdown
Collaborator

@k-chrispens k-chrispens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@k-chrispens k-chrispens merged commit e965808 into main Apr 14, 2026
8 of 11 checks passed
@k-chrispens k-chrispens deleted the fix-metadata-host-paths branch April 14, 2026 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Output metadata from running guidance should include the correct/real paths to objects.

2 participants