Skip to content

Cloud SLayer mode end-to-end (mirror local) + consolidate the OTF artifact lifecycle (DEV-1468)#2

Merged
ZmeiGorynych merged 4 commits into
mainfrom
egor/dev-1468-cloud-slayer-mode-end-to-end-mirror-local-consolidate-the
May 26, 2026
Merged

Cloud SLayer mode end-to-end (mirror local) + consolidate the OTF artifact lifecycle (DEV-1468)#2
ZmeiGorynych merged 4 commits into
mainfrom
egor/dev-1468-cloud-slayer-mode-end-to-end-mirror-local-consolidate-the

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented May 24, 2026

Closes DEV-1468. Extends/supersedes DEV-1467 with the design agreed in a /spec session, plus a consolidation of the OTF artifact lifecycle. Two Codex review rounds (plan + tests) folded in.

What & why

Make all three SLayer combos run in the cloud runner (bird-interact-cloud) by mirroring the local path, and consolidate the over-engineered OTF artifact lifecycle to presence-gated reuse + explicit force-wipe. The unifying lever is removing fingerprint gating (the fingerprint stays as provenance). Then local == cloud by construction: upload an artifact → it's present → it's reused, with no cloud-specific OTF code.

The three combos, all working:

  1. pre-encoded + slayer + {claude_sdk | pydantic_ai | pydantic_ai_recursive | mcp_agent | agno | smolagents} → committed slayer_models/<db>/.
  2. on-the-fly + slayer + a-interact + pydantic_ai_recursive → phase-1-3 cache slayer_otf_cache/<db>/.
  3. on-the-fly + slayer + a-interact + pydantic_ai_otf_encode → durable reference slayer_models_otf/<db>/.

OTF lifecycle consolidation (slayer_otf/)

  • Cache: single authoritative dir per DB (drop the <fp> path level); reuse gated on the _cache_fp.txt marker. ensure_db_cache gains force; a markerless/old-layout target is rmtree'd then atomic-renamed (handles migration from the old <db>/<fp>/ layout with no crash).
  • Reference: the reuse/marker check moves to the top of ensure_db_reference, before ensure_db_cache — load-bearing for cloud combo 3, which downloads only the reference (never the cache). _load_reference_entry is self-contained (loads kb_rows from disk; raises on missing/corrupt, accepts []); reuse emits a one-line provenance WARNING.
  • --otf-rebuild (renamed from --otf-rebuild-reference, kept as a hidden alias) force-wipes both layers for both on-the-fly frameworks via run._maybe_force_wipe_otf.

paths + agents

  • slayer_models_root / new slayer_otf_cache_root / slayer_models_otf_root each gain a BIRD_*_ROOT env override (harmless locally; the cloud actor uses them to point at /data/...). Both agents drop their duplicated _otf_cache_root.

Cloud (cloud/* + Dockerfile.cloud + cluster.yaml.j2)

  • gcs: upload_dir_prefix + download_prefix (concurrent_download_prefix is now a thin wrapper) + slayer_artifact_name.
  • cli: accept slayer; add --slayer-setup / --slayer-storage-root; validate the (framework, query_mode, mode, slayer_setup) tuple via run._validate_slayer_setup.
  • driver: upload the combo's dir per selected DB (pre-encoded from the submitted worktree; OTF from the main-checkout paths.*), a fail-fast presence check before build/push/cluster, OPENAI_API_KEY delivery for slayer, manifest + resubmit plumbing, and no re-upload on resubmit.
  • ray_app: delete the ephemeral-server stub; download_slayer_setup once per worker process (atomic, root-level .download_complete marker); thread slayer_setup through to run_one_task.
  • prereqs: require OPENAI_API_KEY in slayer mode.
  • image/Dockerfile: stop baking slayer_models/ (uploaded at submit); drop it + the slayer-ingest version from data_hash and the dirty-input set. mini-interact + audited_gold stay baked.
  • cluster template: export the three BIRD_*_ROOT env vars.

Decisions (confirmed in the /spec session)

  • Cache migration: rmtree-stale-then-rename (no first-run crash).
  • Submit presence: marker for OTF, non-empty dir for pre-encoded.
  • Upload scope: per selected DB (pre-encoded too).
  • Download marker: root-level.
  • Drop the slayer-ingest component from data_hash.
  • Upload roots: pre-encoded from worktree, OTF from main checkout (mirrors how each combo reads locally).

Testing

  • Tests written first (TDD), then Codex-reviewed against the plan; gaps folded in.
  • Full non-integration suite: 1026 passed, 92 skipped.
  • Integration tier (-m integration): 19 passed.
  • Ruff clean on all changed files.
  • Merged origin/main (DEV-1466) — fast-forward + clean stash-pop, no hand-resolved conflicts.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • New CLI flags: --slayer-setup (pre-encoded|on-the-fly) and --slayer-storage-root; run submission uploads selected SLayer artifacts and workers download them in-cluster.
    • Added --otf-rebuild to force on-the-fly cache/reference refresh before runs.
    • Cluster/job env vars pin in-container artifact/cache roots; run IDs truncated to fit GCE naming limits.
  • Bug Fixes

    • API-key validation now requires OPENAI_API_KEY for slayer mode; submit fails fast on missing artifacts.
  • Tests

    • Extensive new/updated tests covering slayer upload/download, cache/reference reuse, CLI parsing, and image/data-hash behavior.

Review Change Stack

…lifecycle (DEV-1468)

Make all three SLayer combos run in the cloud runner by mirroring the local
path, and consolidate the on-the-fly artifact lifecycle to presence-gated
reuse + explicit force-wipe (remove fingerprint *gating*; keep the fingerprint
as provenance). "Local == cloud by construction": upload an artifact → it's
present → it's reused, with no cloud-specific OTF code.

OTF lifecycle (cache.py / reference_build.py):
- Cache is a single authoritative dir per DB (drop the <fp> level); reuse is
  gated on the _cache_fp.txt completeness marker. ensure_db_cache gains force;
  a markerless/old-layout target is rmtree'd then atomic-renamed (migration).
- ensure_db_reference checks the marker BEFORE ensure_db_cache (load-bearing
  for cloud combo 3, which downloads only the reference). _load_reference_entry
  is self-contained (kb_rows from disk; raise on missing/corrupt, accept []);
  reuse emits a provenance WARNING.
- --otf-rebuild (renamed from --otf-rebuild-reference, kept as a hidden alias)
  force-wipes BOTH layers for both on-the-fly frameworks via _maybe_force_wipe_otf.

paths.py: slayer_models_root / slayer_otf_cache_root (new) / slayer_models_otf_root
gain BIRD_*_ROOT env overrides; both agents drop their duplicated _otf_cache_root.

Cloud (gcs/cli/driver/ray_app/prereqs/image + Dockerfile.cloud + cluster.yaml.j2):
- gcs: upload_dir_prefix + download_prefix (concurrent_download_prefix is now a
  thin wrapper) + slayer_artifact_name.
- cli: accept slayer; --slayer-setup / --slayer-storage-root; validate the tuple
  via run._validate_slayer_setup.
- driver: per-selected-DB upload (pre-encoded from the worktree, OTF from the
  main-checkout paths), fail-fast presence check before build/cluster, OPENAI
  key delivery for slayer, manifest + resubmit plumbing, no re-upload on resubmit.
- ray_app: delete the ephemeral-server stub; download_slayer_setup once per
  worker (atomic, root-level marker); thread slayer_setup through run_one_task.
- prereqs: require OPENAI_API_KEY in slayer mode.
- image/Dockerfile: stop baking slayer_models (uploaded at submit); drop it +
  the slayer-ingest version from data_hash and the dirty-input set.
- cluster template: export the three BIRD_*_ROOT env vars.

Full non-integration suite + integration tier green; merged origin/main (DEV-1466).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

Removes baked SLayer artifacts from images and implements run-time per-run GCS upload/download of SLayer setups, adds env-overridable slayer path helpers, refactors OTF cache/reference reuse to marker-based layouts, wires CLI/driver slayer submit options and validation, and updates Ray actor initialization and tests.

Changes

SLayer Setup Delivery: Docker-less Runtime Models with Marker-Based OTF Cache

Layer / File(s) Summary
Path infrastructure for environment overrides
src/bird_interact_agents/paths.py, tests/test_paths.py
slayer_models_root(), new slayer_otf_cache_root(), and slayer_models_otf_root() now honor environment overrides (BIRD_SLAYER_MODELS_ROOT, BIRD_OTF_CACHE_ROOT, BIRD_SLAYER_MODELS_OTF_ROOT) for in-cluster artifact root resolution.
Dockerfile and image data-hash update
Dockerfile.cloud, src/bird_interact_agents/cloud/image.py, tests/cloud/test_image.py
Dockerfile no longer bakes slayer_models/; data_hash and dirty-path detection exclude slayer inputs and ingest-version; tests validate unchanged data-hash on slayer edits and guard Dockerfile content.
GCS upload/download infrastructure
src/bird_interact_agents/cloud/gcs.py, tests/cloud/test_gcs.py
Adds download_prefix and upload_dir_prefix for arbitrary GCS prefix operations, concurrent_download_prefix becomes a thin wrapper, and SLayer artifact naming helpers (slayer_artifact_name, slayer_setup_prefix) added with round-trip tests.
Agent cache root migration
src/bird_interact_agents/agents/pydantic_ai_otf_encode/agent.py, src/bird_interact_agents/agents/pydantic_ai_recursive/agent.py
Removes local _otf_cache_root() helpers and switches agents to use paths.slayer_otf_cache_root() for centralized OTF cache resolution.
OTF cache materialization & marker-based reuse
src/bird_interact_agents/slayer_otf/cache.py, tests/test_slayer_otf_cache.py
Refactors OTF cache to single authoritative <cache_root>/<db>/ directories, adds _cache_fp.txt marker and _load_cache_entry, ensure_db_cache(..., force=False) implements marker-gated reuse and atomic tmp-dir rename with race handling; tests cover happy path, reuse, migration, atomicity, and concurrency.
Reference build reuse & purge changes
src/bird_interact_agents/slayer_otf/reference_build.py, tests/test_slayer_otf_reference_build.py
Reference reuse relies on _reference_fp.txt plus _kb_rows.json loader, _reuse_reference warns about non-checked fingerprints, ensure_db_reference checks marker early, and purge_caches added; tests cover reuse, integrity, concurrency, and purge behavior.
Cloud CLI, prereqs, and driver submit wiring
src/bird_interact_agents/cloud/cli.py, src/bird_interact_agents/cloud/prereqs.py, src/bird_interact_agents/cloud/driver.py, tests/cloud/test_cli.py, tests/cloud/test_prereqs.py, tests/cloud/test_driver.py
Adds --slayer-setup and --slayer-storage-root flags, replaces rejection with validation, requires OPENAI_API_KEY in slayer query_mode, driver performs fail-fast local SLayer artifact checks, uploads per-DB setup to runs/<run_id>/slayer_setup/<artifact>/<db>/, includes slayer fields in manifests, and forwards flags to worker jobs and resubmit; tests added for parsing, validation, API keys, uploads, dedup, and resubmit behavior.
In-cluster Ray runtime once-per-worker download
src/bird_interact_agents/cloud/ray_app.py, src/bird_interact_agents/cloud/cluster.yaml.j2, tests/cloud/test_ray_app.py, tests/cloud/test_cluster.py
Removes per-task ephemeral SLayer server lifecycle; adds download_slayer_setup to download uploaded artifacts into env-override roots with .download_complete marker and atomic rename; _LocalActor and WorkerActor call it once during init when query_mode == "slayer"; per-task wiring passes slayer combo into task runner; tests validate idempotency, rename races, marker behavior, and cluster env vars.
Cloud test fixtures and cleanup
tests/cloud/conftest.py
Removes baked_slayer_dbs fixture and updates fake repo Dockerfile fixture to stop baking slayer_models/ and slayer ingest, keeping mini-interact/ and audited_gold/ in the image.
OTF rebuild flag consolidation & purge helper
src/bird_interact_agents/run.py, tests/test_slayer_setup_flag.py, tests/test_otf_rebuild_wiring.py
Renames --otf-rebuild-reference to --otf-rebuild, adds _maybe_force_wipe_otf to purge both cache and reference layers for OTF-capable frameworks when enabled, and wires the flag through run_evaluation; tests validate plumbing and behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops and whispers through the GCS cloud,
Models now download, no longer baked in shroud,
Markers bloom where fingerprints once dwelled,
Workers meet their setup as bells are knelled.
Path roots bend to env's gentle command,
One per process, across the distributed land! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.87% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing cloud SLayer mode end-to-end while mirroring local behavior and consolidating OTF artifact lifecycle per DEV-1468.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1468-cloud-slayer-mode-end-to-end-mirror-local-consolidate-the

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/bird_interact_agents/cloud/image.py (1)

218-226: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Re-include slayer_models/ in dirty-worktree checks

The dirty-path filter no longer checks slayer_models/, so cloud submits can bypass DirtyWorktreeError even with uncommitted SLayer changes.

Suggested fix
     input_prefixes = (
         "src/",
+        "slayer_models/",
         "audited_gold/",
         "uv.lock",
         "pyproject.toml",
         "Dockerfile.cloud",
     )

As per coding guidelines, "Ensure worktree is clean (only src/, slayer_models/, audited_gold/, uv.lock, pyproject.toml, Dockerfile.cloud are checked) before submitting cloud jobs to avoid DirtyWorktreeError".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/image.py` around lines 218 - 226, The
dirty-path filter tuple input_prefixes (in
src/bird_interact_agents/cloud/image.py) was changed and no longer includes
"slayer_models/", allowing uncommitted SLayer changes to slip past
DirtyWorktreeError; restore "slayer_models/" into the input_prefixes tuple
(alongside "src/", "audited_gold/", "uv.lock", "pyproject.toml",
"Dockerfile.cloud") so the worktree cleanliness check includes that directory
again.
src/bird_interact_agents/cloud/gcs.py (1)

190-205: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce exact-prefix matching and block path traversal in download_prefix.

list_blobs(prefix=prefix) + rel = blob.name[len(prefix):] can incorrectly include sibling objects when callers pass a prefix without trailing / (e.g., .../db also matches .../db2). Also, rel is not sanitized for .., so a crafted blob key can write outside dest.

Suggested fix
-from pathlib import Path
+from pathlib import Path, PurePosixPath
@@
 def download_prefix(
@@
-    def _one(blob: Any) -> None:
-        # Strip the prefix AND any leading slash, so a prefix passed without a
-        # trailing slash (e.g. ".../db") doesn't yield an absolute "/x.yaml"
-        # that would escape `dest`.
-        rel = blob.name[len(prefix):].lstrip("/")
+    normalized_prefix = prefix.rstrip("/")
+    list_prefix = f"{normalized_prefix}/"
+    dest_root = dest.resolve()
+
+    def _one(blob: Any) -> None:
+        rel = blob.name[len(list_prefix):]
         if not rel:
             return
-        target = dest / rel
+        rel_path = PurePosixPath(rel)
+        if any(part == ".." for part in rel_path.parts):
+            raise ValueError(f"unsafe blob name outside prefix: {blob.name}")
+        target = (dest_root / Path(*rel_path.parts)).resolve()
+        target.relative_to(dest_root)
         target.parent.mkdir(parents=True, exist_ok=True)
         data = blob.download_as_bytes()
         target.write_bytes(data)
 
-    blobs = list(bucket.list_blobs(prefix=prefix))
+    blobs = list(bucket.list_blobs(prefix=list_prefix))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/gcs.py` around lines 190 - 205, The
download_prefix logic must enforce exact-directory matches and block path
traversal: when iterating bucket.list_blobs(prefix=prefix) only accept blobs
whose blob.name equals prefix or startswith prefix + "/" (so a prefix "x/db"
doesn't match "x/db2"), compute rel as the slice after the validated separator,
then sanitize rel by rejecting or normalizing any path components that contain
".." or absolute paths (e.g., use pathlib.Path(rel).parts and ensure no ".." and
no leading "/" before joining to dest). Update the _one handler and the
blob-filtering loop to apply these checks (references: prefix, blob.name, rel,
dest, _one, bucket.list_blobs) and return or raise on invalid/suspicious blob
names.
src/bird_interact_agents/cloud/ray_app.py (1)

313-324: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure per-task temp dirs are cleaned even when GCS writes fail.

Right now cleanup runs only after write_row/write_log succeed. If either call raises, cloud_log_* leaks. Wrap GCS writes in a try/finally with unconditional rmtree.

Proposed fix
-    _gcs.write_row(run_id, iid, attempt, row, client=gcs_client)
-    try:
-        log_bytes = log_tmp.read_bytes() if log_tmp.exists() else b""
-    except OSError:
-        log_bytes = b""
-    if log_bytes:
-        _gcs.write_log(run_id, iid, attempt, log_bytes, client=gcs_client)
-    # Now safe to drop the log tmp dir.
-    shutil.rmtree(log_dir, ignore_errors=True)
+    try:
+        _gcs.write_row(run_id, iid, attempt, row, client=gcs_client)
+        try:
+            log_bytes = log_tmp.read_bytes() if log_tmp.exists() else b""
+        except OSError:
+            log_bytes = b""
+        if log_bytes:
+            _gcs.write_log(run_id, iid, attempt, log_bytes, client=gcs_client)
+    finally:
+        # Always drop the per-task tmp dir, even on write failures.
+        shutil.rmtree(log_dir, ignore_errors=True)
     return iid
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/ray_app.py` around lines 313 - 324, The temp
log directory (log_dir / log_tmp) is only removed after successful
_gcs.write_row/_gcs.write_log, causing leaks on GCS failures; wrap the GCS write
logic in a try/finally: perform _gcs.write_row(...) and the conditional
read/_gcs.write_log(...) inside the try, and in the finally unconditionally call
shutil.rmtree(log_dir, ignore_errors=True) so the per-task temp dirs are always
cleaned even if write_row or write_log raises; reference the existing symbols
_gcs.write_row, _gcs.write_log, log_tmp, log_dir, gcs_client, run_id, iid,
attempt, and row when locating the change.
🧹 Nitpick comments (2)
tests/cloud/test_driver.py (2)

673-673: ⚡ Quick win

Tighten fail-fast assertions to avoid false positives.

Using pytest.raises(Exception) here can pass on unrelated failures. Please assert a specific exception type (or at least a stable message fragment) to keep these tests pinned to the intended fail-fast contract.

Also applies to: 692-692, 710-710, 727-727

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cloud/test_driver.py` at line 673, Replace the broad
pytest.raises(Exception) usages in tests/cloud/test_driver.py (the raises call
at the shown locations) with a tighter assertion: either specify the concrete
exception class that the submit-time failure should raise (e.g., SubmitError or
the actual exception class thrown by the submit code) or keep
pytest.raises(Exception) but add a stable match="..." argument that asserts a
distinctive substring of the expected error message; update all occurrences (the
raises at lines indicated: 673, 692, 710, 727) so each uses the specific
exception type or pytest.raises(..., match="expected fragment") to avoid false
positives.

840-845: ⚡ Quick win

Assert the --slayer-storage-root value on resubmit, not just presence.

This test currently validates flag presence but not the propagated value, leaving a gap in manifest→job-args verification.

Suggested assertion addition
     job_args = mocks["cluster"].submit_job.call_args.kwargs["args"]
     assert "--slayer-setup" in job_args
     assert job_args[job_args.index("--slayer-setup") + 1] == "on-the-fly"
     assert "--slayer-storage-root" in job_args
+    assert job_args[job_args.index("--slayer-storage-root") + 1] == "/data/slayer_models"
     # The crux: resubmit must NOT re-upload the setup.
     mocks["gcs"].upload_dir_prefix.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cloud/test_driver.py` around lines 840 - 845, Assert the actual value
passed for "--slayer-storage-root" on resubmit instead of only checking
presence: find its index in job_args (job_args =
mocks["cluster"].submit_job.call_args.kwargs["args"]) and assert
job_args[job_args.index("--slayer-storage-root") + 1] == expected_storage_root,
where expected_storage_root is the storage-root value set earlier in the
test/manifest (or the variable used to construct the job args); keep the
existing check that mocks["gcs"].upload_dir_prefix.assert_not_called() to ensure
no re-upload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bird_interact_agents/cloud/driver.py`:
- Around line 82-96: The function read_api_keys_from_local_env builds a set of
required keys via prereqs._required_api_keys and then silently returns only
those present in os.environ; instead, compute the set of missing keys (needed
minus keys in os.environ), and if any are missing raise an explicit error (e.g.,
RuntimeError or ValueError) listing the missing key names so callers fail fast;
keep the existing logic for adding "OPENAI_API_KEY" when query_mode == "slayer",
but after that validate presence and only then return the dict of env values for
the required keys.

In `@src/bird_interact_agents/slayer_otf/cache.py`:
- Around line 291-305: When os.rename(tmp_dir, target) fails but
marker.is_file() indicates a peer won the race, don't return the local fp and
kb_rows; instead read the on-disk metadata from the committed target and return
that. In the except OSError branch (around tmp_dir/target/marker handling) after
confirming marker.is_file(), load the fingerprint and kb_rows from the existing
target directory and return CacheEntry(cache_dir=target, fingerprint=<disk_fp>,
kb_rows=<disk_kb_rows>) rather than returning the local fp/kb_rows; keep the
existing cleanup of tmp_dir (shutil.rmtree(tmp_dir, ignore_errors=True)).

In `@src/bird_interact_agents/slayer_otf/reference_build.py`:
- Around line 476-484: The code in ensure_db_reference currently holds
_get_lock(db) while calling ensure_db_cache (which itself acquires the same
lock), causing a self-deadlock; fix by not calling ensure_db_cache while inside
async with _get_lock(db): instead, under the lock only check marker.is_file()
and return _reuse_reference if set, otherwise exit the lock, call
ensure_db_cache (outside the lock), then re-acquire _get_lock(db) and re-check
marker.is_file() before proceeding with the build; update ensure_db_reference to
follow this release-call-reacquire pattern so ensure_db_cache never attempts to
obtain the same lock recursively.

---

Outside diff comments:
In `@src/bird_interact_agents/cloud/gcs.py`:
- Around line 190-205: The download_prefix logic must enforce exact-directory
matches and block path traversal: when iterating
bucket.list_blobs(prefix=prefix) only accept blobs whose blob.name equals prefix
or startswith prefix + "/" (so a prefix "x/db" doesn't match "x/db2"), compute
rel as the slice after the validated separator, then sanitize rel by rejecting
or normalizing any path components that contain ".." or absolute paths (e.g.,
use pathlib.Path(rel).parts and ensure no ".." and no leading "/" before joining
to dest). Update the _one handler and the blob-filtering loop to apply these
checks (references: prefix, blob.name, rel, dest, _one, bucket.list_blobs) and
return or raise on invalid/suspicious blob names.

In `@src/bird_interact_agents/cloud/image.py`:
- Around line 218-226: The dirty-path filter tuple input_prefixes (in
src/bird_interact_agents/cloud/image.py) was changed and no longer includes
"slayer_models/", allowing uncommitted SLayer changes to slip past
DirtyWorktreeError; restore "slayer_models/" into the input_prefixes tuple
(alongside "src/", "audited_gold/", "uv.lock", "pyproject.toml",
"Dockerfile.cloud") so the worktree cleanliness check includes that directory
again.

In `@src/bird_interact_agents/cloud/ray_app.py`:
- Around line 313-324: The temp log directory (log_dir / log_tmp) is only
removed after successful _gcs.write_row/_gcs.write_log, causing leaks on GCS
failures; wrap the GCS write logic in a try/finally: perform _gcs.write_row(...)
and the conditional read/_gcs.write_log(...) inside the try, and in the finally
unconditionally call shutil.rmtree(log_dir, ignore_errors=True) so the per-task
temp dirs are always cleaned even if write_row or write_log raises; reference
the existing symbols _gcs.write_row, _gcs.write_log, log_tmp, log_dir,
gcs_client, run_id, iid, attempt, and row when locating the change.

---

Nitpick comments:
In `@tests/cloud/test_driver.py`:
- Line 673: Replace the broad pytest.raises(Exception) usages in
tests/cloud/test_driver.py (the raises call at the shown locations) with a
tighter assertion: either specify the concrete exception class that the
submit-time failure should raise (e.g., SubmitError or the actual exception
class thrown by the submit code) or keep pytest.raises(Exception) but add a
stable match="..." argument that asserts a distinctive substring of the expected
error message; update all occurrences (the raises at lines indicated: 673, 692,
710, 727) so each uses the specific exception type or pytest.raises(...,
match="expected fragment") to avoid false positives.
- Around line 840-845: Assert the actual value passed for
"--slayer-storage-root" on resubmit instead of only checking presence: find its
index in job_args (job_args =
mocks["cluster"].submit_job.call_args.kwargs["args"]) and assert
job_args[job_args.index("--slayer-storage-root") + 1] == expected_storage_root,
where expected_storage_root is the storage-root value set earlier in the
test/manifest (or the variable used to construct the job args); keep the
existing check that mocks["gcs"].upload_dir_prefix.assert_not_called() to ensure
no re-upload.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0a34dea4-460e-4b62-ae6b-f8695c34f1a1

📥 Commits

Reviewing files that changed from the base of the PR and between 64c4c8d and 176dea3.

📒 Files selected for processing (27)
  • Dockerfile.cloud
  • src/bird_interact_agents/agents/pydantic_ai_otf_encode/agent.py
  • src/bird_interact_agents/agents/pydantic_ai_recursive/agent.py
  • src/bird_interact_agents/cloud/cli.py
  • src/bird_interact_agents/cloud/cluster.yaml.j2
  • src/bird_interact_agents/cloud/driver.py
  • src/bird_interact_agents/cloud/gcs.py
  • src/bird_interact_agents/cloud/image.py
  • src/bird_interact_agents/cloud/prereqs.py
  • src/bird_interact_agents/cloud/ray_app.py
  • src/bird_interact_agents/paths.py
  • src/bird_interact_agents/run.py
  • src/bird_interact_agents/slayer_otf/cache.py
  • src/bird_interact_agents/slayer_otf/reference_build.py
  • tests/cloud/conftest.py
  • tests/cloud/test_cli.py
  • tests/cloud/test_cluster.py
  • tests/cloud/test_driver.py
  • tests/cloud/test_gcs.py
  • tests/cloud/test_image.py
  • tests/cloud/test_prereqs.py
  • tests/cloud/test_ray_app.py
  • tests/test_otf_rebuild_wiring.py
  • tests/test_paths.py
  • tests/test_slayer_otf_cache.py
  • tests/test_slayer_otf_reference_build.py
  • tests/test_slayer_setup_flag.py

Comment thread src/bird_interact_agents/cloud/driver.py
Comment thread src/bird_interact_agents/slayer_otf/cache.py
Comment thread src/bird_interact_agents/slayer_otf/reference_build.py Outdated
…ss) + GCE run-id cap (DEV-1468)

Process-reviews (CodeRabbit + Codex) on PR #2, plus a GC-smoke-discovered bug:

- CRITICAL self-deadlock (CodeRabbit): ensure_db_reference held _get_lock(db)
  then called ensure_db_cache, which re-acquires the SAME non-reentrant lock —
  hanging on any fresh/--otf-rebuild reference build. Moved ensure_db_cache
  OUTSIDE the lock (kept the top reuse check so cloud combo-3 still never calls
  it). Added a regression test that wraps the fake cache to take the real lock
  (would hang under the old code; the unit tests masked it by mocking the cache).
- cache.py rename-race (CodeRabbit): the OSError race-success branch now returns
  _load_cache_entry(target) — the winner's on-disk fp/kb_rows — so CacheEntry
  matches cache_dir instead of our local values.
- ray_app.download_slayer_setup (Codex): a missing/empty GCS prefix is no longer
  cached as .download_complete — raise instead, so a failed/partial upload
  surfaces and a retry can still run.
- driver.read_api_keys_from_local_env (CodeRabbit): fail fast (PrereqError) on a
  missing required key instead of silently dropping it — resubmit skips prereq
  checks, so this otherwise surfaced as opaque per-actor auth failures.
- driver.mint_run_id: cap the framework slug so Ray's GCE node name
  ray-<run_id>-worker-<uuid> stays under the 63-char instance-name limit. The
  long pydantic_ai_otf_encode / pydantic_ai_recursive slugs (19 chars) tripped
  Ray's assertion at `ray up` once DEV-1468 made them cloud-submittable in
  slayer mode (a real GC-smoke failure). + regression test.

Codex CX[A] (root-level .download_complete not run-scoped) accepted as-is:
fresh-VM-per-run + per-combo-fixed roots mean it never persists across runs.

Full non-integration suite (1030) + integration tier (19) green; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/bird_interact_agents/cloud/driver.py (1)

20-22: ⚡ Quick win

Bind _required_api_keys outside the monkeypatched driver.prereqs module.

Only PrereqError is pinned to the real module right now. read_api_keys_from_local_env() still resolves _required_api_keys through driver.prereqs, so the submit/resubmit tests that replace driver.prereqs with a mock stop exercising provider-key selection and can miss silently dropped env vars.

♻️ Suggested change
-from bird_interact_agents.cloud.prereqs import PrereqError
+from bird_interact_agents.cloud.prereqs import PrereqError, _required_api_keys
@@
-        needed.update(prereqs._required_api_keys(model))
+        needed.update(_required_api_keys(model))

Also applies to: 97-119

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/driver.py` around lines 20 - 22, Bind the
required keys list from the real prereqs module instead of resolving it through
driver.prereqs at call time: import or read _required_api_keys from
bird_interact_agents.cloud.prereqs into the top-level of driver.py (alongside
the already pinned PrereqError) and update read_api_keys_from_local_env to
reference that top-level _required_api_keys rather than
driver.prereqs._required_api_keys so tests that monkeypatch driver.prereqs still
exercise provider-key selection and env-var validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/bird_interact_agents/cloud/driver.py`:
- Around line 20-22: Bind the required keys list from the real prereqs module
instead of resolving it through driver.prereqs at call time: import or read
_required_api_keys from bird_interact_agents.cloud.prereqs into the top-level of
driver.py (alongside the already pinned PrereqError) and update
read_api_keys_from_local_env to reference that top-level _required_api_keys
rather than driver.prereqs._required_api_keys so tests that monkeypatch
driver.prereqs still exercise provider-key selection and env-var validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5ba38dba-8ef7-4022-9301-1a2d5d8c1b9a

📥 Commits

Reviewing files that changed from the base of the PR and between 176dea3 and 40d4fec.

📒 Files selected for processing (8)
  • src/bird_interact_agents/cloud/driver.py
  • src/bird_interact_agents/cloud/ray_app.py
  • src/bird_interact_agents/slayer_otf/cache.py
  • src/bird_interact_agents/slayer_otf/reference_build.py
  • tests/cloud/test_driver.py
  • tests/cloud/test_ray_app.py
  • tests/test_slayer_otf_cache.py
  • tests/test_slayer_otf_reference_build.py

…not Ray's 55)

The previous cap was computed from Ray's internal `name_label <= 55`
assertion, but Ray composes the actual GCE name as
`ray-<run_id>-worker-<uuid8>-compute` which adds 17 more chars — and
GCP's compute-instance regex caps the FULL name at 63
(`[a-z]([-a-z0-9]{0,61}[a-z0-9])?`). Ray's 55 passed for a 50-char label
while GCP rejected the resulting 66-char name with HTTP 400, blocking
`ray up`. Compute the slug cap from GCP's real 63, and test the FULL
worst-case name (worker + compute suffix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bird_interact_agents/cloud/driver.py (1)

170-188: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on unmapped instance_ids before deriving DB uploads.

_dbs_for_instances() silently drops requested IDs that are missing from mini_interact.jsonl. In slayer mode that means _check_slayer_setup_present() and _upload_slayer_setup() both run against an incomplete DB set, so a submit can proceed without uploading setup for some requested instances and only fail later in-cluster.

💡 Suggested fix
 def _dbs_for_instances(instance_ids) -> list[str]:
@@
-    wanted = set(instance_ids)
+    wanted = set(instance_ids)
+    matched: set[str] = set()
     dbs: set[str] = set()
     with paths.mini_interact_data_file().open() as f:
         for line in f:
@@
-            td = _json.loads(line)
-            if td.get("instance_id") in wanted:
+            td = _json.loads(line)
+            iid = td.get("instance_id")
+            if iid in wanted:
+                matched.add(iid)
                 db = td.get("selected_database")
                 if db:
                     dbs.add(db)
+    missing = sorted(wanted - matched)
+    if missing:
+        raise ValueError(
+            f"cloud slayer: instance_ids not found in mini_interact.jsonl: {missing}"
+        )
     return sorted(dbs)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/driver.py` around lines 170 - 188,
_dbs_for_instances currently drops requested instance_ids that aren't present in
mini_interact.jsonl; modify _dbs_for_instances to track found instance_ids while
reading paths.mini_interact_data_file(), compute missing = wanted - found_ids
after the loop, and if any missing raise a clear exception (e.g., ValueError)
listing the missing instance IDs and referring to the mini_interact.jsonl
source; keep returning the sorted, deduplicated db list when no IDs are missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/bird_interact_agents/cloud/driver.py`:
- Around line 170-188: _dbs_for_instances currently drops requested instance_ids
that aren't present in mini_interact.jsonl; modify _dbs_for_instances to track
found instance_ids while reading paths.mini_interact_data_file(), compute
missing = wanted - found_ids after the loop, and if any missing raise a clear
exception (e.g., ValueError) listing the missing instance IDs and referring to
the mini_interact.jsonl source; keep returning the sorted, deduplicated db list
when no IDs are missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d012e4fb-83f8-466c-a46d-9c93867273a2

📥 Commits

Reviewing files that changed from the base of the PR and between 40d4fec and 2a141f9.

📒 Files selected for processing (2)
  • src/bird_interact_agents/cloud/driver.py
  • tests/cloud/test_driver.py

- ensure_db_reference now forwards `force` to ensure_db_cache (Codex): a
  programmatic ensure_db_reference(force=True) caller would otherwise reuse a
  stale phase-1-3 cache and encode it into the "fresh" reference. The CLI
  --otf-rebuild was already safe (it purges both layers separately via
  _maybe_force_wipe_otf), but the direct-API contract now matches. Regression
  test asserts ensure_db_cache is called with force=True.
- driver imports `_required_api_keys` by name (alongside PrereqError) so the
  submit/resubmit tests that monkeypatch `driver.prereqs` still exercise the
  real provider→key mapping (CodeRabbit). Otherwise the mock returns an empty
  iterable and key-selection silently no-ops in tests.

CodeRabbit's stale broad-pytest.raises nitpick is from its 2026-05-24 review
of the pre-fix code (already addressed in 40d4fec — all 4 fail-fast tests use
pytest.raises(FileNotFoundError)).

Full suite 1031 + integration 19 green; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/bird_interact_agents/cloud/driver.py (1)

180-192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't silently ignore instance_ids that never resolve to a DB.

If the dataset lookup misses any requested instance_id, this helper returns a partial DB list and the new fail-fast SLayer presence/upload path proceeds as if everything is valid. That turns a typo or stale manifest entry into a late in-cluster missing-artifact failure after the image build / cluster bring-up has already started.

Suggested fix
 def _dbs_for_instances(instance_ids) -> list[str]:
@@
-    wanted = set(instance_ids)
+    wanted = set(instance_ids)
+    found: set[str] = set()
     dbs: set[str] = set()
     with paths.mini_interact_data_file().open() as f:
         for line in f:
@@
             td = _json.loads(line)
             if td.get("instance_id") in wanted:
+                found.add(td["instance_id"])
                 db = td.get("selected_database")
                 if db:
                     dbs.add(db)
+    missing = sorted(wanted - found)
+    if missing:
+        raise ValueError(
+            f"instance_ids not found in {paths.mini_interact_data_file()}: {missing}"
+        )
     return sorted(dbs)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/cloud/driver.py` around lines 180 - 192, The helper
that reads paths.mini_interact_data_file() currently returns a partial list when
some requested instance_ids are not found; modify the code to detect missing ids
by tracking found_ids (e.g., build a set while iterating using the existing
wanted = set(instance_ids) and when td.get("instance_id") matches add to
found_ids), then compare wanted - found_ids after the loop and raise an explicit
exception (ValueError or a custom error) listing the missing instance_ids so the
caller (e.g., the SLayer presence/upload path) fails fast instead of proceeding
with a partial db list; ensure you still return sorted(dbs) only when no ids are
missing.
src/bird_interact_agents/slayer_otf/reference_build.py (1)

472-497: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Serialize the fast-path reuse check with force=True rebuilds.

The new lock-free reuse path lets a non-force caller return reference_dir while a concurrent force=True caller is still on the rebuild path. Once that force caller reaches _commit_reference(..., force=True), it can rmtree(target) underneath the reuser, so in-process callers can see missing or partially replaced reference files.

Suggested fix
-    # Reuse path FIRST — never builds/touches the cache (load-bearing for cloud
-    # combo 3, which downloads only the reference, never the cache).
-    if not force and marker.is_file():
-        return _reuse_reference(target, db)
+    # Serialize reuse with force rebuilds, but release the lock before touching
+    # the cache because ensure_db_cache() takes the same per-db lock.
+    async with _get_lock(db):
+        if not force and marker.is_file():
+            return _reuse_reference(target, db)
 
     # Materialise (or reuse) the phases-1-3 cache OUTSIDE the reference lock.
@@
     async with _get_lock(db):
         # Double-check under the lock — a peer may have built the reference
         # while we waited (or while ensure_db_cache ran).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/bird_interact_agents/slayer_otf/reference_build.py` around lines 472 -
497, The early lock-free fast-path (the "if not force and marker.is_file():
return _reuse_reference(...)" check) must be removed so reuse is serialized with
concurrent force=True rebuilds; delete that pre-ensure_db_cache return and let
ensure_db_cache run and the later async with _get_lock(db) double-check (which
calls _reuse_reference) decide reuse, so use marker, _reuse_reference,
ensure_db_cache and _get_lock(db) as the referenced symbols to locate and remove
the initial early-return branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/bird_interact_agents/cloud/driver.py`:
- Around line 180-192: The helper that reads paths.mini_interact_data_file()
currently returns a partial list when some requested instance_ids are not found;
modify the code to detect missing ids by tracking found_ids (e.g., build a set
while iterating using the existing wanted = set(instance_ids) and when
td.get("instance_id") matches add to found_ids), then compare wanted - found_ids
after the loop and raise an explicit exception (ValueError or a custom error)
listing the missing instance_ids so the caller (e.g., the SLayer presence/upload
path) fails fast instead of proceeding with a partial db list; ensure you still
return sorted(dbs) only when no ids are missing.

In `@src/bird_interact_agents/slayer_otf/reference_build.py`:
- Around line 472-497: The early lock-free fast-path (the "if not force and
marker.is_file(): return _reuse_reference(...)" check) must be removed so reuse
is serialized with concurrent force=True rebuilds; delete that
pre-ensure_db_cache return and let ensure_db_cache run and the later async with
_get_lock(db) double-check (which calls _reuse_reference) decide reuse, so use
marker, _reuse_reference, ensure_db_cache and _get_lock(db) as the referenced
symbols to locate and remove the initial early-return branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 235f56bb-cf8b-421c-a1ea-6132556b446d

📥 Commits

Reviewing files that changed from the base of the PR and between 2a141f9 and 46187b3.

📒 Files selected for processing (3)
  • src/bird_interact_agents/cloud/driver.py
  • src/bird_interact_agents/slayer_otf/reference_build.py
  • tests/test_slayer_otf_reference_build.py

@ZmeiGorynych ZmeiGorynych merged commit 7349e96 into main May 26, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant