
feat(eval+server): foundations P0b — comparable hash + path layout + daemon port discovery#190

Merged
7xuanlu merged 6 commits into main from worktree-feature+eval-foundations-p0b on May 25, 2026


7xuanlu commented May 25, 2026

Summary

Phase 0b of the eval-foundations refactor. Additive only; no behavior change to the HTTP API.

Builds on P0a (PR #178, merged as 46a9703).

What changed

  • comparable_env_hash(env: &ReportEnv) -> String in eval/report.rs: sha256[..8] over the comparable subset of ReportEnv fields (fixture_revision, embedder_revision, llm_provider_class, llm_model, mcp_schema_hash (or empty), skill_prompt_hash (or empty), schema_version, schema_db_version (or empty), similarity_fn_name). Excludes layer, variant, task, n_runs, run_id, timestamp, and cost/latency fields. The hash contract is documented inline: any change to the field set must bump default_schema_version().
  • encode_baseline_path(root, env) -> PathBuf in eval/report.rs: layered path <root>/<layer_dir>/<task>/<variant>__<hash>.json. Panics with a message naming the missing field if env.layer, env.task, or env.variant is None; populating all three is a precondition for save_full_report callers in P0c.
  • Daemon port advertisement in origin-server/src/main.rs:
    • println!("ORIGIN_LISTENING_ON={}", local_addr) after bind, before serve.
    • Optional ORIGIN_PORT_FILE env var: when set, the resolved port is written to that file; a write error hard-fails instead of being silently swallowed.
    • Existing tracing::info!("Listening on http://{}", local_addr) updated to use resolved local_addr instead of bind addr (otherwise it prints 127.0.0.1:0 under ephemeral-port mode).
  • Integration tests:
    • crates/origin-core/tests/eval_report_paths.rs: full (layer × task × variant) collision grid (18 distinct paths from 3×2×3) plus cross-layer hash parity.
    • crates/origin-server/tests/port_discovery.rs: spawns origin-server via env!("CARGO_BIN_EXE_origin-server") and verifies that both the stdout line and the ORIGIN_PORT_FILE channel work, with a 30 s timeout and cleanup.
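A minimal sketch of the hash-input side of the comparable_env_hash contract, in Rust to match the crate. Everything here is hypothetical: ReportEnv is reduced to the fields named above (typed as String or Option<String>), and the `name=value` line encoding is an assumption, not the encoding used in eval/report.rs. The real function runs sha256 over its input and keeps `[..8]`; the digest step is left out so the sketch stays std-only.

```rust
/// Hypothetical, reduced mirror of ReportEnv (field names from the PR text;
/// types are assumptions). Timestamp and cost/latency fields are omitted.
#[derive(Clone, Debug, Default)]
pub struct ReportEnv {
    // Comparable subset: these feed the hash.
    pub fixture_revision: String,
    pub embedder_revision: String,
    pub llm_provider_class: String,
    pub llm_model: String,
    pub mcp_schema_hash: Option<String>,
    pub skill_prompt_hash: Option<String>,
    pub schema_version: String,
    pub schema_db_version: Option<String>,
    pub similarity_fn_name: String,
    // Excluded: varying these must not change the hash.
    pub layer: Option<String>,
    pub task: Option<String>,
    pub variant: Option<String>,
    pub n_runs: u32,
    pub run_id: String,
}

/// Builds the string a digest would be computed over.
/// Assumed encoding: one `name=value` line per field in a fixed order,
/// with `None` rendered as the empty string.
pub fn comparable_input(env: &ReportEnv) -> String {
    let opt = |o: &Option<String>| o.as_deref().unwrap_or("").to_string();
    let fields = [
        ("fixture_revision", env.fixture_revision.clone()),
        ("embedder_revision", env.embedder_revision.clone()),
        ("llm_provider_class", env.llm_provider_class.clone()),
        ("llm_model", env.llm_model.clone()),
        ("mcp_schema_hash", opt(&env.mcp_schema_hash)),
        ("skill_prompt_hash", opt(&env.skill_prompt_hash)),
        ("schema_version", env.schema_version.clone()),
        ("schema_db_version", opt(&env.schema_db_version)),
        ("similarity_fn_name", env.similarity_fn_name.clone()),
    ];
    // Adding, removing, reordering, or re-encoding entries changes every
    // hash, which is why the PR ties such changes to default_schema_version().
    fields
        .iter()
        .map(|(name, value)| format!("{name}={value}\n"))
        .collect()
}
```

Under this "(or empty)" rule, `None` and `Some("")` for mcp_schema_hash, skill_prompt_hash, and schema_db_version produce identical input and therefore identical hashes.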

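The baseline path layout and its named-field panics can be sketched as follows. Assumptions, all hypothetical: a reduced `PathEnv` carries only layer/task/variant, the layer string is used verbatim as `<layer_dir>` (the real mapping is not described here), and the hash is passed in by the caller, unlike the real encode_baseline_path(root, env), which derives it from env.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical reduction of ReportEnv to the three path-forming fields.
pub struct PathEnv {
    pub layer: Option<String>,
    pub task: Option<String>,
    pub variant: Option<String>,
}

/// Sketch of `<root>/<layer_dir>/<task>/<variant>__<hash>.json`.
/// Assumptions: `layer` is used verbatim as <layer_dir>, and the comparable
/// hash is supplied by the caller instead of being computed here.
pub fn encode_baseline_path(root: &Path, env: &PathEnv, hash: &str) -> PathBuf {
    // Each precondition names the missing field, so a bad caller is easy to spot.
    let layer = env.layer.as_deref().expect("encode_baseline_path: env.layer is None");
    let task = env.task.as_deref().expect("encode_baseline_path: env.task is None");
    let variant = env
        .variant
        .as_deref()
        .expect("encode_baseline_path: env.variant is None");
    root.join(layer).join(task).join(format!("{variant}__{hash}.json"))
}
```

Because layer, task, and variant each occupy their own path component, a 3×2×3 grid cannot collide, which is the property the eval_report_paths.rs grid test checks.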
Adversarial review (5 findings, 2 fixed pre-merge)

  • IMPORTANT: the tracing log line printed :0 under ephemeral-port mode while the new stdout line showed the resolved port. Fixed in 845f7d5 (the single Listening log line now uses local_addr).
  • IMPORTANT: the comparable_env_hash input contract was implicit. Fixed in 845f7d5 (the doc-comment now states explicitly that "any modification MUST bump default_schema_version()").
  • NIT: std::fs::write(ORIGIN_PORT_FILE) is not atomic. Acceptable for a port number of at most 5 bytes on local disk; the harness retries via read_to_string().ok().
  • NIT: the function-local use std::io::Write is a stylistic mismatch with the top-level imports used in the rest of main.rs.
  • P2 RISK: cross-layer comparable-hash parity depends on the L2 runner constructing ReportEnv.embedder_revision byte-identical to L1's. The P2 plan will add a canonical_embedder_revision() helper.
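The startup sequence the review items refer to (bind, resolve the real address, advertise it, then serve), including the plain port-file write the NIT accepts, can be sketched with std's blocking `TcpListener` standing in for the server's async listener. The function name and the `port_file` parameter (which the real server reads from ORIGIN_PORT_FILE) are hypothetical.

```rust
use std::io::{self, Write};
use std::net::{SocketAddr, TcpListener};
use std::path::Path;

/// Hypothetical stand-in for the main.rs startup sequence: bind, resolve the
/// real address, advertise it, and only then hand the listener to the server.
pub fn bind_and_advertise(
    bind_addr: &str,
    stdout: &mut impl Write,
    port_file: Option<&Path>,
) -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind(bind_addr)?;
    // With port 0 the OS picks an ephemeral port; local_addr() reports it.
    let local_addr = listener.local_addr()?;
    // Machine-readable line, flushed so a reader on a pipe sees it at once.
    writeln!(stdout, "ORIGIN_LISTENING_ON={local_addr}")?;
    stdout.flush()?;
    if let Some(path) = port_file {
        // Hard-fail: surface the error to the caller instead of ignoring it.
        // Plain (non-atomic) write, as accepted in the review NIT.
        std::fs::write(path, local_addr.port().to_string())?;
    }
    Ok((listener, local_addr))
}
```

If atomicity ever mattered, the usual fix is to write a sibling temporary file and `std::fs::rename` it into place, which is atomic on POSIX when both paths are on the same filesystem; the PR deliberately keeps the plain write.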

Test plan

  • cargo clippy --workspace --all-targets --features origin-core/eval-harness -- -D warnings → clean
  • cargo test --workspace --features origin-core/eval-harness → 1160 lib tests pass; 4 new integration tests pass (2 path-layout, 2 port-discovery); the pre-existing network-dependent failure in eval::retrieval::tests::test_multi_turn_eval is unchanged
  • Manual smoke: ORIGIN_BIND_ADDR=127.0.0.1:0 ./target/debug/origin-server shows a single "Listening on http://127.0.0.1:<resolved-port>" line plus a matching ORIGIN_LISTENING_ON= line on stdout
  • The file named by ORIGIN_PORT_FILE contains the resolved port
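Both the manual smoke check and port_discovery.rs depend on recognizing the ORIGIN_LISTENING_ON= line in daemon stdout. A hypothetical pair of helpers for that (the names are not from the PR) could look like this; they skip any other lines, such as log output, that may share stdout.

```rust
use std::io::BufRead;
use std::net::SocketAddr;

/// Hypothetical helper: returns the advertised address if `line` is the
/// ORIGIN_LISTENING_ON= line, or None for anything else (e.g. log output).
pub fn parse_listening_line(line: &str) -> Option<SocketAddr> {
    line.trim_end()
        .strip_prefix("ORIGIN_LISTENING_ON=")?
        .parse()
        .ok()
}

/// Scans stdout line by line until the advertisement appears. This blocks on
/// the reader; a real harness would pair it with a timeout.
pub fn first_listening_addr(reader: impl BufRead) -> Option<SocketAddr> {
    reader
        .lines()
        .map_while(Result::ok)
        .find_map(|line| parse_listening_line(&line))
}
```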

Follow-ups for P0c+

  • comparable_env_hash will be consumed by save_full_report (P0c) for filename construction
  • encode_baseline_path will be the canonical save path used by P1 baseline regeneration
  • ORIGIN_LISTENING_ON + ORIGIN_PORT_FILE will be consumed by P2 http_harness.rs::DaemonHandle::spawn
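On the consumer side, the port-file channel pairs with the read_to_string().ok() retry mentioned in the review notes. A minimal polling sketch with a deadline follows; wait_for_port_file is a hypothetical name, and DaemonHandle::spawn may well do this differently.

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical harness helper: poll the port file until it holds a nonzero
/// port or the deadline passes. A missing or empty file fails to parse and
/// is simply retried on the next tick.
pub fn wait_for_port_file(path: &Path, timeout: Duration) -> Option<u16> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(port) = std::fs::read_to_string(path)
            .ok()
            .and_then(|s| s.trim().parse::<u16>().ok())
            .filter(|&p| p != 0)
        {
            return Some(port);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(Duration::from_millis(50));
    }
}
```

port_discovery.rs already uses a 30 s budget, which would be a natural `timeout` here too.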

🤖 Generated with Claude Code

7xuanlu and others added 6 commits May 25, 2026 01:31
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Emit `ORIGIN_LISTENING_ON=<addr>` to stdout (with flush) immediately after
TcpListener::bind and before axum::serve, so the eval harness can discover
the real ephemeral port when ORIGIN_BIND_ADDR=127.0.0.1:0. Also write the
port number to a file when ORIGIN_PORT_FILE env var is set; hard-fails on
write error so the harness contract is reliable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…act doc

Move the "Listening on http://..." tracing line to after local_addr resolves
so operators see the real ephemeral port instead of 127.0.0.1:0 when
ORIGIN_BIND_ADDR uses port 0.

Add contract doc-comment to comparable_env_hash warning that any change to
its input set (add/remove/reorder/re-encode a field) must bump
default_schema_version() to prevent old and new baselines appearing comparable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7xuanlu merged commit 032ce63 into main May 25, 2026
9 checks passed
7xuanlu deleted the worktree-feature+eval-foundations-p0b branch May 25, 2026 08:54