
feat(.builders): store hashes of all dependency resolution inputs in metadata.json#22952

Open
dkirov-dd wants to merge 3 commits into master from dk/AI-6466/store-hashes


@dkirov-dd (Contributor) commented Mar 18, 2026

Summary

  • Add WORKFLOW_FILE constant, hash_directory(), and compute_input_hashes() to .builders/upload.py
  • Write an inputs dict into .deps/metadata.json alongside the existing sha256 key (backward compatible)
  • Add unit tests for hash_directory, compute_input_hashes, and the new metadata.json structure

Motivation

.deps/metadata.json previously stored only the SHA256 of agent_requirements.in. However, the resolve-build-deps.yaml workflow is also triggered by changes to the workflow file itself and to anything under the .builders/ directory. Storing hashes for all three inputs enables a future check to determine whether a new dependency-resolution PR is actually needed.

Resulting metadata.json structure:

{
  "inputs": {
    ".builders": "<sha256>",
    ".github/workflows/resolve-build-deps.yaml": "<sha256>",
    "agent_requirements.in": "<sha256>"
  },
  "sha256": "<sha256 of agent_requirements.in>"
}
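A minimal sketch of how the structure above could be assembled, assuming the three input hashes are already computed. `build_metadata` and `DIRECT_DEP_KEY` are hypothetical names, not code from this PR; serializing with `sort_keys=True` is an assumption that happens to reproduce the key order shown.

```python
import json
from hashlib import sha256

# Hypothetical: the key under which the direct-dependency file's hash is stored.
DIRECT_DEP_KEY = 'agent_requirements.in'


def build_metadata(inputs: dict[str, str]) -> dict:
    """Return the metadata.json payload: the new 'inputs' dict plus the legacy 'sha256' key."""
    return {
        'inputs': dict(inputs),
        # Legacy key, kept so existing readers of metadata.json keep working.
        'sha256': inputs[DIRECT_DEP_KEY],
    }


if __name__ == '__main__':
    # Illustrative placeholder contents, not real repository data.
    inputs = {
        'agent_requirements.in': sha256(b'requests\n').hexdigest(),
        '.github/workflows/resolve-build-deps.yaml': sha256(b'name: resolve\n').hexdigest(),
        '.builders': sha256(b'placeholder').hexdigest(),
    }
    print(json.dumps(build_metadata(inputs), indent=2, sort_keys=True))
```

Keeping `sha256` as a copy of one entry in `inputs` is what makes the change backward compatible: older consumers never look at the new key.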

Test plan

  • Unit tests pass: cd .builders && pytest -vvv
  • Type check passes: cd .builders && python -m mypy --config-file pyproject.toml .

🤖 Generated with Claude Code

- Add WORKFLOW_FILE constant, hash_directory(), and compute_input_hashes()
- Write inputs dict to metadata.json alongside existing sha256 key
- Add tests for hash_directory, compute_input_hashes, and metadata contents

Rationale: enables future check to compare stored hashes against current
inputs and skip resolution PRs when nothing has changed
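The future check mentioned in the rationale is not part of this PR. A minimal sketch of what it could look like, assuming an exact match on the `inputs` dict is the skip criterion; `load_metadata` and `inputs_unchanged` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional


def load_metadata(path: Path) -> Optional[dict]:
    """Read metadata.json, returning None if it has not been written yet."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return None


def inputs_unchanged(stored: Optional[dict], current: dict) -> bool:
    """True only if the stored 'inputs' match the freshly computed hashes exactly."""
    if stored is None:
        return False  # nothing recorded yet, so resolution is needed
    # metadata.json files written before this change carry only the legacy 'sha256' key;
    # .get() returns None for them, which never equals a real hash dict.
    return stored.get('inputs') == current
```

A caller would skip opening a resolution PR only when `inputs_unchanged(...)` returns True; treating the old format as "changed" errs on the side of resolving again.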
@dkirov-dd requested a review from a team as a code owner March 18, 2026 14:18
@github-actions (Contributor)

⚠️ Recommendation: Add qa/skip-qa label

This PR does not modify any files shipped with the agent.

To help streamline the release process, please consider adding the qa/skip-qa label if these changes do not require QA testing.

@dkirov-dd added the qa/skip-qa label (Automatically skip this PR for the next QA) Mar 18, 2026

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 142f5cb1ce


Comment on lines +83 to +85
for file_path in sorted(path.rglob('*')):
    if file_path.is_file():
        h.update(file_path.read_bytes())


P2: Hash file paths as part of the .builders digest

This digest feeds only each file's bytes into the SHA256, so distinct .builders trees can produce the same value. A simple rename like .builders/foo.sh -> .builders/bar.sh with identical contents, or repartitioning bytes across files (ab + c vs. a + bc), leaves this hash unchanged, even though .github/workflows/scripts/resolve_deps_check_should_run.sh treats any change under .builders/ as a dependency-resolution input. That means a later comparison against metadata.json can incorrectly skip a rebuild/PR after a real builder change.
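To make the collision concrete, a small in-memory model: `content_only_digest` mimics the loop under review, and `path_aware_digest` is one possible fix that length-prefixes each relative path and its contents. Both names are hypothetical, and they operate on a `{relative_path: bytes}` dict instead of a real directory.

```python
from hashlib import sha256


def content_only_digest(files: dict[str, bytes]) -> str:
    """Mimics the reviewed loop: file bytes in path order, the paths themselves ignored."""
    h = sha256()
    for name in sorted(files):
        h.update(files[name])
    return h.hexdigest()


def path_aware_digest(files: dict[str, bytes]) -> str:
    """Feeds length-prefixed (path, bytes) pairs, so renames and re-splits change the digest."""
    h = sha256()
    for name in sorted(files):
        for chunk in (name.encode(), files[name]):
            h.update(len(chunk).to_bytes(8, 'big'))
            h.update(chunk)
    return h.hexdigest()


renamed = ({'foo.sh': b'echo hi\n'}, {'bar.sh': b'echo hi\n'})
resplit = ({'x': b'ab', 'y': b'c'}, {'x': b'a', 'y': b'bc'})

for before, after in (renamed, resplit):
    print(content_only_digest(before) == content_only_digest(after),  # True: collision
          path_aware_digest(before) == path_aware_digest(after))      # False: change detected
```

Hashing the path alone is not quite enough for the re-split case: without length prefixes (or an equivalent unambiguous framing), the byte stream `x ab y c` could still be re-cut in other ways.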


- Include relative file path in hash_directory to detect renames
- Sort by relative path (was sorting by absolute, contradicting docstring)
- Use hash_file() in compute_input_hashes instead of inlining sha256
- Wrap FileNotFoundError with descriptive RuntimeError in compute_input_hashes
- Reuse inputs dict in generate_lockfiles to avoid double read of DIRECT_DEP_FILE
- Extract patched_input_files fixture to eliminate test setup duplication
- Add rename test to test_hash_directory
- Assert hash values (not just keys) in test_generate_lockfiles_metadata_contains_inputs

@rdesgroppes previously approved these changes Mar 18, 2026

@rdesgroppes (Contributor) left a comment


The PR looks backward-compatible, with a few actionable concerns:

  1. potential pollution by Python cache files is worth a filter,
  2. hard-coded key names are decoupled from the constants, which could cause silent bugs in the future.

The rest is minor.

return {
    'agent_requirements.in': hash_file(DIRECT_DEP_FILE),
    '.github/workflows/resolve-build-deps.yaml': hash_file(WORKFLOW_FILE),
    '.builders': hash_directory(BUILDER_DIR),

@rdesgroppes (Contributor) commented Mar 18, 2026


hash_directory(BUILDER_DIR) also covers upload.py, the file that defines hash_directory, so the hash includes itself => potential instability:
when upload.py changes, hash_directory(BUILDER_DIR) will change (BUILDER_DIR = Path(__file__).parent). That might be the intended behavior, but it also means the .builders hash includes any .pyc cache files, __pycache__/, .pytest_cache/, etc. generated during current or earlier runs.

(see comment on path.rglob('*') for a suggested fix)

@dkirov-dd (Contributor, Author)


This is intended behavior, but you're right that we can ignore some files.

def hash_directory(path: Path) -> str:
    """Compute a combined SHA256 hash of all files in a directory."""
    h = sha256()
    for file_path in sorted(path.rglob('*'), key=lambda p: p.relative_to(path)):
Contributor


path.rglob('*') descends into everything. In CI the result might be stable, depending on the state of the local clone, but possibly not once Python cache files have been generated => consider filtering.

    """Compute a combined SHA256 hash of all files in a directory."""
    h = sha256()
    for file_path in sorted(path.rglob('*'), key=lambda p: p.relative_to(path)):
        if file_path.is_file():
Contributor


Example:

Suggested change
        if file_path.is_file():
        if file_path.is_file() and file_path.suffix != ".pyc" and not any(p.startswith('.') or p == '__pycache__' for p in file_path.relative_to(path).parts):

(dirty+heavy+not tested, there's probably a much better way to achieve the same)
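One less ad-hoc variant is to decide from the path relative to the hashed root: filtering on the absolute `file_path.parts` would also match the `.builders` component itself and exclude every file. This is a minimal sketch, not the PR's implementation; the exclusion sets are assumptions limited to the artifacts named in this thread, and the length-prefixed (path, contents) framing is one possible way to include relative paths.

```python
import tempfile
from hashlib import sha256
from pathlib import Path

# Assumption: only the generated artifacts named in this thread are excluded.
EXCLUDED_DIRS = {'__pycache__', '.pytest_cache'}
EXCLUDED_SUFFIXES = {'.pyc'}


def is_hashed(rel: Path) -> bool:
    """Decide from a path relative to the hashed root whether the file counts."""
    if rel.suffix in EXCLUDED_SUFFIXES:
        return False
    # Only the directory components below the root are checked, never the root itself.
    return not any(part in EXCLUDED_DIRS for part in rel.parts[:-1])


def hash_directory(path: Path) -> str:
    """Combined SHA256 over (relative path, contents) of every non-excluded file."""
    h = sha256()
    for file_path in sorted(path.rglob('*'), key=lambda p: p.relative_to(path).as_posix()):
        rel = file_path.relative_to(path)
        if not file_path.is_file() or not is_hashed(rel):
            continue
        # Length-prefix the path and the contents so their boundaries are unambiguous.
        for chunk in (rel.as_posix().encode(), file_path.read_bytes()):
            h.update(len(chunk).to_bytes(8, 'big'))
            h.update(chunk)
    return h.hexdigest()


def demo() -> None:
    """Cache files leave the digest unchanged; a rename changes it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / 'foo.sh').write_text('echo hi\n')
        before = hash_directory(root)
        (root / '__pycache__').mkdir()
        (root / '__pycache__' / 'upload.cpython-312.pyc').write_bytes(b'\x00')
        assert hash_directory(root) == before
        (root / 'foo.sh').rename(root / 'bar.sh')
        assert hash_directory(root) != before


if __name__ == '__main__':
    demo()
```

Sorting on the POSIX form of the relative path keeps the order independent of where the repository is checked out and of the platform's path separator.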

Comment on lines +94 to +96
'agent_requirements.in': hash_file(DIRECT_DEP_FILE),
'.github/workflows/resolve-build-deps.yaml': hash_file(WORKFLOW_FILE),
'.builders': hash_directory(BUILDER_DIR),
Contributor


The keys 'agent_requirements.in', '.github/workflows/resolve-build-deps.yaml', '.builders' are string literals unrelated to the actual constants DIRECT_DEP_FILE, WORKFLOW_FILE, BUILDER_DIR.
If those constants are ever changed (e.g., the workflow is renamed), the metadata keys won't update and the future check comparing hashes will silently mismatch.

Consider deriving the keys from the constants:

DIRECT_DEP_FILE.relative_to(REPO_DIR).as_posix(),  # 'agent_requirements.in'
WORKFLOW_FILE.relative_to(REPO_DIR).as_posix(),  # '.github/workflows/resolve-build-deps.yaml'
BUILDER_DIR.relative_to(REPO_DIR).as_posix(),  # '.builders'

This would also validate that the constants are actually rooted under REPO_DIR.
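The suggestion in runnable form, under a hypothetical layout: `REPO_DIR = Path('/repo')` stands in for the real repository root (in `upload.py`, `BUILDER_DIR` is `Path(__file__).parent`), and `metadata_key` is a hypothetical helper. `Path.relative_to` raises `ValueError` when the path is not under `REPO_DIR`, which provides the validation described above.

```python
from pathlib import Path

# Hypothetical layout mirroring the constants named in the review.
REPO_DIR = Path('/repo')
BUILDER_DIR = REPO_DIR / '.builders'
DIRECT_DEP_FILE = REPO_DIR / 'agent_requirements.in'
WORKFLOW_FILE = REPO_DIR / '.github' / 'workflows' / 'resolve-build-deps.yaml'


def metadata_key(path: Path) -> str:
    """Repo-relative POSIX key; raises ValueError if `path` is not under REPO_DIR."""
    return path.relative_to(REPO_DIR).as_posix()


keys = [metadata_key(p) for p in (DIRECT_DEP_FILE, WORKFLOW_FILE, BUILDER_DIR)]
print(keys)
# ['agent_requirements.in', '.github/workflows/resolve-build-deps.yaml', '.builders']

try:
    metadata_key(Path('/elsewhere/resolve-build-deps.yaml'))
except ValueError as exc:
    print('not rooted under REPO_DIR:', exc)
```

`as_posix()` keeps the keys identical on Windows runners, where `str()` of a relative path would use backslashes.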

@dkirov-dd (Contributor, Author)


Done

assert result['.builders'] == upload.hash_directory(builder_dir)


def test_generate_lockfiles_metadata_contains_inputs(tmp_path, patched_input_files, monkeypatch):
Contributor


patched_input_files is not used in the function.

- Filter __pycache__, .pytest_cache, and .pyc files in hash_directory to
  prevent generated files from making the hash non-deterministic in CI
- Derive dict keys in compute_input_hashes from Path constants via
  .relative_to(REPO_DIR).as_posix() to keep keys in sync with constants
- Update patched_input_files fixture to return (dep_content, workflow_content,
  builder_dir) so both tests use the return value; fixture now also patches
  REPO_DIR and creates the workflow at the correct subpath for key derivation
@temporal-github-worker-1 bot dismissed rdesgroppes’s stale review March 19, 2026 10:29

Review from rdesgroppes is dismissed. Related teams and files:

  • agent-build
    • .builders/tests/test_upload.py
    • .builders/upload.py

Labels

qa/skip-qa Automatically skip this PR for the next QA


2 participants