feat(.builders): store hashes of all dependency resolution inputs in metadata.json #22952
Conversation
- Add WORKFLOW_FILE constant, hash_directory(), and compute_input_hashes()
- Write inputs dict to metadata.json alongside existing sha256 key
- Add tests for hash_directory, compute_input_hashes, and metadata contents

Rationale: enables a future check to compare stored hashes against current inputs and skip resolution PRs when nothing has changed.
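The future check mentioned in the rationale could look roughly like the following sketch; `resolution_needed` and its call shape are assumptions for illustration, not code from this PR:

```python
import json
from pathlib import Path


def resolution_needed(metadata_path: Path, current_inputs: dict) -> bool:
    """Return True when stored input hashes are missing or differ from current ones."""
    try:
        stored = json.loads(metadata_path.read_text()).get('inputs')
    except FileNotFoundError:
        # No metadata yet: resolution has never run, so it is needed.
        return True
    return stored != current_inputs
```

A workflow step would recompute the three input hashes, call `resolution_needed`, and skip opening a resolution PR when it returns `False`.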
This PR does not modify any files shipped with the agent. To help streamline the release process, please consider adding the
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 142f5cb1ce
.builders/upload.py
```python
for file_path in sorted(path.rglob('*')):
    if file_path.is_file():
        h.update(file_path.read_bytes())
```
Hash file paths as part of the `.builders` digest
This digest only feeds each file's bytes into the SHA256, so distinct .builders trees can produce the same value. A simple rename like .builders/foo.sh -> .builders/bar.sh with identical contents, or repartitioning bytes across files (ab+c → a+bc), leaves this hash unchanged even though .github/workflows/scripts/resolve_deps_check_should_run.sh treats any .builders/ path change as a dependency-resolution input. That means a later comparison against metadata.json can incorrectly skip a rebuild/PR after a real builder change.
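To make the failure mode concrete, here is a small self-contained repro of the collision described above; `bytes_only_digest` and the temporary layout are hypothetical stand-ins, not the PR's code:

```python
# Sketch reproducing the collision: a digest that feeds only file bytes into
# SHA256 cannot tell a renamed file from the original.
from hashlib import sha256
from pathlib import Path
from tempfile import TemporaryDirectory


def bytes_only_digest(path: Path) -> str:
    h = sha256()
    for file_path in sorted(path.rglob('*')):
        if file_path.is_file():
            h.update(file_path.read_bytes())
    return h.hexdigest()


with TemporaryDirectory() as tmp:
    a = Path(tmp, 'a')
    b = Path(tmp, 'b')
    a.mkdir()
    b.mkdir()
    (a / 'foo.sh').write_bytes(b'echo hi\n')
    (b / 'bar.sh').write_bytes(b'echo hi\n')  # same bytes, different name
    collided = bytes_only_digest(a) == bytes_only_digest(b)
```

Here `collided` ends up `True`: the rename is invisible to the digest, which is exactly why a later hash comparison could wrongly skip a rebuild.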
- Include relative file path in hash_directory to detect renames
- Sort by relative path (was sorting by absolute, contradicting docstring)
- Use hash_file() in compute_input_hashes instead of inlining sha256
- Wrap FileNotFoundError with descriptive RuntimeError in compute_input_hashes
- Reuse inputs dict in generate_lockfiles to avoid double read of DIRECT_DEP_FILE
- Extract patched_input_files fixture to eliminate test setup duplication
- Add rename test to test_hash_directory
- Assert hash values (not just keys) in test_generate_lockfiles_metadata_contains_inputs
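A minimal sketch of the rename-sensitive `hash_directory` described in the first two bullets; the PR's actual implementation may differ in details:

```python
from hashlib import sha256
from pathlib import Path


def hash_directory(path: Path) -> str:
    """Combined SHA256 over relative file paths and contents, sorted by relative path."""
    h = sha256()
    for file_path in sorted(path.rglob('*'), key=lambda p: p.relative_to(path)):
        if file_path.is_file():
            # Feeding the relative path makes renames change the digest.
            h.update(file_path.relative_to(path).as_posix().encode())
            h.update(file_path.read_bytes())
    return h.hexdigest()
```

With the path mixed in, `foo.sh` renamed to `bar.sh` with identical contents now produces a different directory hash.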
rdesgroppes left a comment
The PR looks backward-compatible, with a few actionable concerns:
- potential pollution by Python cache file(s), worth a filter,
- hard-coded key names decoupled from constants, which could cause silent future bugs.

The rest is minor.
.builders/upload.py
```python
return {
    'agent_requirements.in': hash_file(DIRECT_DEP_FILE),
    '.github/workflows/resolve-build-deps.yaml': hash_file(WORKFLOW_FILE),
    '.builders': hash_directory(BUILDER_DIR),
```
hash_directory includes itself in its own hash => potential instability:
when upload.py changes, hash_directory(BUILDER_DIR) will change (BUILDER_DIR = Path(__file__).parent). That might be the intended behavior, but it also means the hash of .builders includes any .pyc cache files, __pycache__/, .pytest_cache/, etc. generated during the current or earlier runs.
(see comment on path.rglob('*') for a suggested fix)
This is intended behavior but you're right that we can ignore some files.
```python
def hash_directory(path: Path) -> str:
    """Compute a combined SHA256 hash of all files in a directory."""
    h = sha256()
    for file_path in sorted(path.rglob('*'), key=lambda p: p.relative_to(path)):
```
path.rglob('*') descends into everything. In CI the result might be stable depending on the local clone state, but generated Python cache files can make it non-deterministic => consider filtering.
.builders/upload.py
```python
    """Compute a combined SHA256 hash of all files in a directory."""
    h = sha256()
    for file_path in sorted(path.rglob('*'), key=lambda p: p.relative_to(path)):
        if file_path.is_file():
```
Example:

```python
if file_path.is_file() and file_path.suffix != '.pyc' and not any(
    p.startswith('.') or p == '__pycache__' for p in file_path.parts
):
```

(dirty+heavy+not tested, there's probably a much better way to achieve the same)
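One lighter alternative, as a hedged sketch (the predicate name and the ignored-set are illustrative, not the PR's final code): filter on the path *relative to the hashed root*, so that the root directory's own dotted name (`.builders`) cannot accidentally exclude everything:

```python
from pathlib import Path

# Hypothetical set of generated-cache directory names to skip.
IGNORED_DIR_NAMES = {'__pycache__', '.pytest_cache'}


def is_hashable(file_path: Path, root: Path) -> bool:
    """Skip byte-code and cache artifacts; judge only directory parts below the root."""
    if file_path.suffix == '.pyc':
        return False
    # Exclude the file name itself so dotfiles at the top level stay hashable.
    rel_dir_parts = file_path.relative_to(root).parts[:-1]
    return not any(part in IGNORED_DIR_NAMES for part in rel_dir_parts)
```

Judging relative parts also keeps the predicate independent of where the repository is checked out.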
.builders/upload.py
```python
'agent_requirements.in': hash_file(DIRECT_DEP_FILE),
'.github/workflows/resolve-build-deps.yaml': hash_file(WORKFLOW_FILE),
'.builders': hash_directory(BUILDER_DIR),
```
The keys 'agent_requirements.in', '.github/workflows/resolve-build-deps.yaml', '.builders' are string literals unrelated to the actual constants DIRECT_DEP_FILE, WORKFLOW_FILE, BUILDER_DIR.
If those constants are ever changed (e.g., the workflow is renamed), the metadata keys won't update and the future check comparing hashes will silently mismatch.
Consider deriving the keys from the constants:

```python
DIRECT_DEP_FILE.relative_to(REPO_DIR).as_posix(),  # 'agent_requirements.in'
WORKFLOW_FILE.relative_to(REPO_DIR).as_posix(),    # '.github/workflows/resolve-build-deps.yaml'
BUILDER_DIR.relative_to(REPO_DIR).as_posix(),      # '.builders'
```

This would also validate that the constants are actually rooted under REPO_DIR.
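A runnable illustration of that key derivation; the constants below are hypothetical stand-ins for the ones defined in upload.py:

```python
from pathlib import Path

REPO_DIR = Path('/repo')  # hypothetical stand-in for upload.py's repo root
DIRECT_DEP_FILE = REPO_DIR / 'agent_requirements.in'
WORKFLOW_FILE = REPO_DIR / '.github' / 'workflows' / 'resolve-build-deps.yaml'
BUILDER_DIR = REPO_DIR / '.builders'

# Keys derived from the constants stay in sync if a constant is ever changed,
# and relative_to() raises ValueError if a constant is not under REPO_DIR.
keys = [
    p.relative_to(REPO_DIR).as_posix()
    for p in (DIRECT_DEP_FILE, WORKFLOW_FILE, BUILDER_DIR)
]
```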
```python
assert result['.builders'] == upload.hash_directory(builder_dir)


def test_generate_lockfiles_metadata_contains_inputs(tmp_path, patched_input_files, monkeypatch):
```
patched_input_files is not used in the function.
- Filter __pycache__, .pytest_cache, and .pyc files in hash_directory to prevent generated files from making the hash non-deterministic in CI
- Derive dict keys in compute_input_hashes from Path constants via .relative_to(REPO_DIR).as_posix() to keep keys in sync with constants
- Update patched_input_files fixture to return (dep_content, workflow_content, builder_dir) so both tests use the return value; fixture now also patches REPO_DIR and creates the workflow at the correct subpath for key derivation
Review from rdesgroppes is dismissed. Related teams and files:
- agent-build
- .builders/tests/test_upload.py
- .builders/upload.py
Summary
- Add `WORKFLOW_FILE` constant, `hash_directory()`, and `compute_input_hashes()` to `.builders/upload.py`
- Write an `inputs` dict into `.deps/metadata.json` alongside the existing `sha256` key (backward compatible)
- Add tests for `hash_directory`, `compute_input_hashes`, and the new `metadata.json` structure

Motivation
`.deps/metadata.json` previously only stored the SHA256 of `agent_requirements.in`. The `resolve-build-deps.yaml` workflow is also triggered by changes to the workflow file itself and to the entire `.builders/` directory. Storing hashes for all three inputs enables a future check to determine whether a new dependency resolution PR is actually needed.

Resulting `metadata.json` structure:

```json
{
  "inputs": {
    ".builders": "<sha256>",
    ".github/workflows/resolve-build-deps.yaml": "<sha256>",
    "agent_requirements.in": "<sha256>"
  },
  "sha256": "<sha256 of agent_requirements.in>"
}
```

Test plan
- `cd .builders && pytest -vvv`
- `cd .builders && python -m mypy --config-file pyproject.toml .`

🤖 Generated with Claude Code