
fix: thread analysis-level Inputs into manifest input_versions (#90) #91

Merged
EiffL merged 1 commit into main from fix/external-input-manifest-fingerprint on Apr 30, 2026

@EiffL (Member) commented Apr 30, 2026

Summary

Fixes #90. Outputs that declared an analysis-level Input (one with source:) failed lc verify with broken_chain. The Snakefile generator never threaded those inputs into the rule's run_rule(inputs=...) dict, so write_manifest had nothing to fingerprint and the manifest's input_versions came back empty for them.

The two pre-existing input paths (sibling outputs vs. analysis-level Inputs) had grown apart unnecessarily. Collapse them: every declared input becomes a Snakemake input: slot — universe-templated for sibling outputs, literal for source: paths. write_manifest already falls through to fingerprint_external for plain files, so no change is needed in the manifest layer.
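A minimal sketch of that fall-through, assuming input_versions maps each input name to a version string. record_input_versions, known_outputs, and this fingerprint_external body are hypothetical stand-ins, not lightcone's code. It shows why the bug left input_versions empty: an input that never reaches the inputs dict is never fingerprinted.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint_external(path: Path) -> str:
    """Content hash of a plain file (hypothetical stand-in for lightcone's helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def record_input_versions(inputs: dict, known_outputs: dict) -> dict:
    """Map each input name to a version string.

    Paths produced by a sibling Output reuse that Output's recorded version;
    any other path falls through to a direct content fingerprint.
    """
    versions = {}
    for name, path in inputs.items():
        if path in known_outputs:
            versions[name] = known_outputs[path]
        else:
            versions[name] = fingerprint_external(path)
    return versions


# Usage: the external table only gets a version if it is in the inputs dict.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp, "table.txt")
    table.write_bytes(b"a b c\n")
    sibling = Path(tmp, "catalog.parquet")  # hypothetical sibling output path
    before_fix = record_input_versions({}, known_outputs={})
    after_fix = record_input_versions({"union21_table": table}, known_outputs={})
    mixed = record_input_versions(
        {"catalog": sibling, "union21_table": table},
        known_outputs={sibling: "v1"},
    )
```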

Side benefit: Snakemake now enforces the existence of external sources at DAG time and reruns rules when their mtime drifts (previously, such staleness went unnoticed).
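The mtime half of that side benefit can be sketched with a simplified rule. needs_rerun is a hypothetical helper, not Snakemake's implementation; Snakemake's own rerun logic also considers other triggers (see its --rerun-triggers option).

```python
import os
import tempfile
from pathlib import Path


def needs_rerun(inputs: list, outputs: list) -> bool:
    """Simplified mtime rule: rerun when an output is missing or older than
    the newest input. A missing input raises, loosely mirroring how a declared
    input: slot makes Snakemake refuse to build the DAG."""
    missing = [str(p) for p in inputs if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing input(s): {missing}")
    if not outputs or any(not p.exists() for p in outputs):
        return True
    newest_input = max((p.stat().st_mtime for p in inputs), default=0.0)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return newest_input > oldest_output


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "table.txt")
    out = Path(tmp, "result.txt")
    src.write_text("input")
    out.write_text("output")
    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh = needs_rerun([src], [out])  # output is newer than its input
    os.utime(src, (3000, 3000))
    stale = needs_rerun([src], [out])  # the external source drifted
    try:
        needs_rerun([Path(tmp, "absent.txt")], [out])
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```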

  • src/lightcone/engine/snakefile.py: drop the external_inputs dict and resolved_inputs rebuild loop; one unified rule_inputs list.
  • tests/test_snakefile.py: regression test that the external source: path appears as a Snakemake input slot AND in the run_rule(inputs=...) dict literal.
  • tests/test_verify.py: end-to-end regression matching the issue scenario; lc verify now passes for an Output backed by an analysis-level Input.
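A sketch of the unified input list, under assumed names: DeclaredInput, build_rule_inputs, and the catalog path are hypothetical, while union21_table and data/table.txt are taken from the review below. The point is that sibling outputs and source: Inputs go through one loop, so no declared input can be dropped from the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeclaredInput:
    """Hypothetical model of a declared input: exactly one of the two paths is set."""
    name: str
    sibling_pattern: Optional[str] = None  # another Output's path, with {universe}
    source: Optional[str] = None           # an analysis-level Input's literal path


def build_rule_inputs(declared: list) -> list:
    """Single code path: every declared input becomes a (slot, path) pair."""
    rule_inputs = []
    for d in declared:
        if d.source is not None:
            rule_inputs.append((d.name, d.source))
        elif d.sibling_pattern is not None:
            rule_inputs.append((d.name, d.sibling_pattern))
        else:
            raise ValueError(f"input {d.name!r} declares neither a source nor a sibling output")
    return rule_inputs


rule_inputs = build_rule_inputs([
    DeclaredInput("catalog", sibling_pattern="results/{universe}/catalog.parquet"),
    DeclaredInput("union21_table", source="data/table.txt"),
])
```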

Test plan

  • uv run pytest: 287 passed (285 existing + 2 new regression tests)
  • uv run ruff check: clean
  • Manual: reproduced the issue's scenario; verify now reports ok instead of broken_chain

🤖 Generated with Claude Code

The Snakefile generator split declared inputs into two paths: sibling
outputs reached run_rule's inputs dict (and so were fingerprinted by
write_manifest), while analysis-level Inputs (with source:) went into a
separate external_inputs dict used only for recipe-template substitution.
Result: lc verify reported broken_chain for every Output declaring a
source-backed Input, since the manifest's input_versions never recorded
them.

Collapse the two paths: every declared input becomes a Snakemake input:
slot — universe-templated for sibling outputs, literal for source paths.
write_manifest falls through to fingerprint_external for plain files, so
the manifest layer needs no change. As a side benefit, Snakemake now
enforces external-source existence at DAG time and reruns on mtime
drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@EiffL (Member, Author) commented Apr 30, 2026

@claude can you review this PR?

@claude (Bot) commented Apr 30, 2026

Claude finished @EiffL's task in 2m 36s.


PR Review

  • Read changed files
  • Analyze fix correctness
  • Check test coverage
  • Post review feedback

Verdict: LGTM. Clean fix, well-tested, ready to merge.

Root Cause Diagnosis

The diagnosis is correct. external_inputs was built separately and passed only into render_recipe for template substitution. It was never handed to run_rule(inputs=...), so write_manifest had an empty dict for external sources and input_versions came back empty for them. verify_outputs at verify.py:97 then found recorded = None and raised broken_chain. The fix is the right one.

Fix Correctness

snakefile.py:341-351: Collapsing to a single rule_inputs list is clean. Both sibling outputs (with {universe}) and external sources (static strings) now flow through the same Snakemake input: slot, which is exactly what the integrity layer expects.

snakefile.py:374-376: Applying pat.replace("{universe}", u) to static external paths is a safe no-op. External source: strings don't contain {universe}, so the substitution leaves them unchanged. Elegant.
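The no-op can be checked directly. This sketch uses hypothetical slot names and a hypothetical expand_for_universe helper. Note that str.replace does literal substitution, so other braces in a path are untouched, and the no-op holds only as long as no source: path contains the literal text {universe}.

```python
def expand_for_universe(rule_inputs: list, universe: str) -> dict:
    """Substitute the universe into each input pattern. Patterns without a
    {universe} placeholder (external source: paths) come back unchanged."""
    return {name: pat.replace("{universe}", universe) for name, pat in rule_inputs}


rule_inputs = [
    ("catalog", "results/{universe}/catalog.parquet"),  # hypothetical sibling output
    ("union21_table", "data/table.txt"),                # external source: path
]
expanded = expand_for_universe(rule_inputs, "u1")
```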

_render_snakefile (lines 262-285): The generated Snakefile now emits:

union21_table="data/table.txt",

...and threads it into the run: block as "union21_table": Path(input.union21_table). This gives write_manifest a Path("data/table.txt"), which fingerprint_external resolves relative to the Snakemake working directory (the project root). This is consistent with how sibling-output paths were already handled.
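A hypothetical renderer that emits both the input: section and the run_rule(inputs=...) dict from the same list. The rule name make_fit, the render_rule helper, and the exact layout of the run: block are assumptions; only the union21_table lines mirror the review. Generating both pieces from one list is what keeps them from drifting apart again.

```python
import json


def render_rule(rule_name: str, rule_inputs: list) -> str:
    """Emit a Snakemake rule whose input: slots and run_rule inputs dict are
    generated from the same list, so neither can omit an input."""
    lines = [f"rule {rule_name}:", "    input:"]
    for name, path in rule_inputs:
        if not name.isidentifier():
            raise ValueError(f"slot name {name!r} is not a valid identifier")
        # json.dumps yields a double-quoted, escaped string literal.
        lines.append(f"        {name}={json.dumps(path)},")
    lines.append("    run:")
    lines.append("        run_rule(inputs={")
    for name, _ in rule_inputs:
        lines.append(f"            {json.dumps(name)}: Path(input.{name}),")
    lines.append("        })")
    return "\n".join(lines) + "\n"


text = render_rule("make_fit", [
    ("catalog", "results/u1/catalog.parquet"),
    ("union21_table", "data/table.txt"),
])
```

A regression test can then assert that both union21_table lines appear in the rendered text, much as test_external_input_flows_to_manifest does.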

verify.py:107-109: The verify chain walk is already correct for this case. find_upstream_output returns None for external inputs, so the loop continues and skips the chain walk. Only the earlier recorded = None guard at line 97 was failing, and the fix addresses it by ensuring input_versions is populated.
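A sketch of the verify loop as described, with hypothetical names (verify_inputs, producers, current_versions) and a made-up stale status. It reproduces both behaviors: a missing recorded version yields broken_chain, and an input with no upstream Output is skipped once its version is recorded. This sketch does not re-check an external file's current fingerprint.

```python
from typing import Optional


def find_upstream_output(path: str, producers: dict) -> Optional[str]:
    """Hypothetical lookup: the Output that writes `path`, or None if external."""
    return producers.get(path)


def verify_inputs(inputs: dict, input_versions: dict, producers: dict,
                  current_versions: dict) -> list:
    """Return (input_name, status) problems; an empty list means ok."""
    problems = []
    for name, path in inputs.items():
        recorded = input_versions.get(name)
        if recorded is None:
            problems.append((name, "broken_chain"))  # never fingerprinted
            continue
        upstream = find_upstream_output(path, producers)
        if upstream is None:
            continue  # external source: nothing upstream to walk
        if current_versions.get(upstream) != recorded:
            problems.append((name, "stale"))
    return problems


external = {"union21_table": "data/table.txt"}
before_fix = verify_inputs(external, input_versions={}, producers={}, current_versions={})
after_fix = verify_inputs(external, input_versions={"union21_table": "sha256:abc"},
                          producers={}, current_versions={})
sibling = {"catalog": "results/u1/catalog.parquet"}
drifted = verify_inputs(sibling, input_versions={"catalog": "v1"},
                        producers={"results/u1/catalog.parquet": "catalog"},
                        current_versions={"catalog": "v2"})
```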

from: aliases and sub-analysis scope: resolve_external_input in tree.py (lines 338-360) handles both, so the fix propagates to those cases without any additional changes.

Test Coverage

test_external_input_flows_to_manifest: Directly verifies that generate() produces the right Snakefile text, with the external source appearing both as a Snakemake input: slot AND in run_rule's inputs dict literal. Good unit-level regression.

test_verify_passes_with_external_input: End-to-end. Calls _materialize with the inputs dict, then runs verify_outputs, proving that the manifest records the fingerprint and verify passes. One minor note: _materialize is called with an absolute path (src), whereas at runtime the Snakemake-generated code would pass a relative path. This nuance doesn't affect the verify logic under test, and fingerprint_external works with both.
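Why the absolute/relative difference is harmless, assuming fingerprint_external hashes file contents only. The function below is a hypothetical stand-in; if the real one also recorded the path string, the spelling would matter.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def fingerprint_external(path: Path) -> str:
    """Hypothetical content hash: depends only on bytes, not on how the path is spelled."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data").mkdir()
    Path(tmp, "data", "table.txt").write_bytes(b"z 0.1\n")
    os.chdir(tmp)  # stand-in for the project root Snakemake runs from
    try:
        relative = fingerprint_external(Path("data/table.txt"))
        absolute = fingerprint_external(Path(tmp) / "data" / "table.txt")
    finally:
        os.chdir(old_cwd)  # leave the temp dir before it is deleted
```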

Together the two tests give good coverage: one exercises generation, the other exercises the verify chain.

Side Benefit

Having external files as Snakemake input: slots means Snakemake now enforces existence at DAG construction time and can detect mtime drift for reruns. Previously, a stale or missing external file would be silently invisible until lc verify was run. This is a genuine improvement in operational safety.

No Issues Found

Nothing blocking. The change is minimal, targeted, and doesn't introduce any new abstractions beyond what's needed to fix the bug.

@github-actions

✅ Eval Results

Metric          Value
Score           1.00
Build           complete
Cost            $1.73
Turns           63
Duration        526s
lightcone-cli   0.2.1.dev8+gbcc052ec6 (bcc052ec)
Results         Download

Graders

✅ spec_valid (1.00)
✅ all_materialized (1.00)

Full output
12:22:24 httpx HTTP Request: POST https://proxy.app.daytona.io/toolbox/95b2ec73-0845-4f9a-9156-d0bfa79d9435/files/bulk-upload "HTTP/1.1 200 OK"
12:31:16 lightcone.eval.sandbox Deleted sandbox for trial build-snae-0
  snae trial 0: score=1.00 complete

lightcone-cli: 0.2.1.dev8+gbcc052ec6 (HEAD bcc052ec)

  Eval Results: Scores  
┏━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Task ┃     Score     ┃
┡━━━━━━╇━━━━━━━━━━━━━━━┩
│ snae │ 1.00 +/- 0.00 │
│      │ pass@k: 100%  │
└──────┴───────────────┘

   Eval Results: Cost &   
         Duration         
┏━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Task ┃ Cost / Duration ┃
┡━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ snae │      $1.73      │
│      │      526s       │
└──────┴─────────────────┘

Total: 1 trials, $1.73, 526s

Results saved to: eval-results/build-bcc052ec/results.json

@EiffL EiffL merged commit ae13980 into main Apr 30, 2026
5 of 6 checks passed
@aboucaud aboucaud deleted the fix/external-input-manifest-fingerprint branch May 8, 2026 10:47
Linked issue: lc verify reports broken_chain for analysis-level Inputs missing from manifest input_versions