feat(eval+server): foundations P0b — comparable hash + path layout + daemon port discovery #190
Merged
Emit `ORIGIN_LISTENING_ON=<addr>` to stdout (with flush) immediately after TcpListener::bind and before axum::serve, so the eval harness can discover the real ephemeral port when ORIGIN_BIND_ADDR=127.0.0.1:0. Also write the port number to a file when ORIGIN_PORT_FILE env var is set; hard-fails on write error so the harness contract is reliable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
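The port-discovery contract in the commit above can be sketched with the standard library alone. This is a minimal illustration, not the code from `origin-server/src/main.rs`: it uses `std::net::TcpListener` as a stand-in for the async listener handed to `axum::serve`, the function names `bind_and_announce` and `parse_listening_line` are hypothetical, and the `port_file` parameter stands in for reading the `ORIGIN_PORT_FILE` environment variable.

```rust
use std::fs;
use std::io::{self, Write};
use std::net::{SocketAddr, TcpListener};
use std::path::Path;

/// Server side (hypothetical helper): bind, resolve the real address, and
/// announce it on both channels before any request is served.
/// `port_file` is an assumption standing in for the ORIGIN_PORT_FILE lookup.
fn bind_and_announce(
    bind_addr: &str,
    port_file: Option<&Path>,
) -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind(bind_addr)?;
    // With port 0 the OS picks a free port; local_addr() reports the one chosen.
    let local_addr = listener.local_addr()?;

    // Channel 1: a single stdout line, flushed so a piped reader sees it at once.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    writeln!(out, "ORIGIN_LISTENING_ON={}", local_addr)?;
    out.flush()?;

    // Channel 2: the bare port number in a file. The error is propagated
    // rather than swallowed, mirroring the hard-fail behaviour described above.
    if let Some(path) = port_file {
        fs::write(path, local_addr.port().to_string())?;
    }
    Ok((listener, local_addr))
}

/// Harness side (hypothetical helper): extract the address from one stdout line.
/// Returns None for any line that is not the announcement.
fn parse_listening_line(line: &str) -> Option<SocketAddr> {
    line.trim()
        .strip_prefix("ORIGIN_LISTENING_ON=")?
        .parse()
        .ok()
}
```

A harness would typically spawn the binary with `ORIGIN_BIND_ADDR=127.0.0.1:0`, read stdout line by line, and stop at the first line for which `parse_listening_line` returns `Some`. Announcing after bind but before serve means the port is already accepting connections by the time the harness learns it.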
…act doc Move the "Listening on http://..." tracing line to after local_addr resolves so operators see the real ephemeral port instead of 127.0.0.1:0 when ORIGIN_BIND_ADDR uses port 0. Add contract doc-comment to comparable_env_hash warning that any change to its input set (add/remove/reorder/re-encode a field) must bump default_schema_version() to prevent old and new baselines appearing comparable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This was referenced May 25, 2026
Summary
Phase 0b of the eval-foundations refactor. Additive only; no behavior change to the HTTP API.
Builds on P0a (PR #178, merged as 46a9703).
What changed
- `comparable_env_hash(env: &ReportEnv) -> String` in `eval/report.rs`: sha256[..8] over the comparable subset of `ReportEnv` fields: fixture_revision, embedder_revision, llm_provider_class, llm_model, mcp_schema_hash (or empty), skill_prompt_hash (or empty), schema_version, schema_db_version (or empty), similarity_fn_name. Excludes layer, variant, task, n_runs, run_id, timestamp, cost/latency fields. Hash contract documented inline: any field-set change must bump `default_schema_version()`.
- `encode_baseline_path(root, env) -> PathBuf` in `eval/report.rs`: layered path `<root>/<layer_dir>/<task>/<variant>__<hash>.json`. Panics with named-field message if `env.layer`/`task`/`variant` is None (precondition for `save_full_report` callers in P0c).
- `origin-server/src/main.rs`: `println!("ORIGIN_LISTENING_ON={}", local_addr)` after bind, before serve. `ORIGIN_PORT_FILE` env writes resolved port; hard-fails on write error (no silent swallow). `tracing::info!("Listening on http://{}", local_addr)` updated to use resolved `local_addr` instead of bind `addr` (otherwise it prints `127.0.0.1:0` under ephemeral-port mode).
- `crates/origin-core/tests/eval_report_paths.rs`: full (layer × task × variant) collision grid (18 distinct paths from 3×2×3) + cross-layer hash parity.
- `crates/origin-server/tests/port_discovery.rs`: spawn `origin-server` via `env!("CARGO_BIN_EXE_origin-server")`, verify both stdout printline and `ORIGIN_PORT_FILE` channels work with 30s timeout + cleanup.

Adversarial review (5 findings, 2 fixed pre-merge)
- … `:0` under ephemeral mode while new stdout printline showed resolved port: fixed in 845f7d5 (single Listening log line now uses `local_addr`).
- `comparable_env_hash` hash-input contract was implicit: fixed in 845f7d5 (doc-comment explicitly states "any modification MUST bump `default_schema_version()`").
- `std::fs::write(ORIGIN_PORT_FILE)` not atomic: acceptable for 5-byte port number on local disk; harness retries via `read_to_string().ok()`.
- `use std::io::Write`: stylistic mismatch with rest of `main.rs` top-level imports.
- `ReportEnv.embedder_revision` literal-identical to L1. P2 plan will add a `canonical_embedder_revision()` helper.

Test plan
- `cargo clippy --workspace --all-targets --features origin-core/eval-harness -- -D warnings` → clean
- `cargo test --workspace --features origin-core/eval-harness` → 1160 lib tests pass; 4 new integration tests pass (2 path-layout, 2 port-discovery); pre-existing `eval::retrieval::tests::test_multi_turn_eval` network failure unchanged
- `ORIGIN_BIND_ADDR=127.0.0.1:0 ./target/debug/origin-server` shows single `Listening on http://127.0.0.1:<resolved-port>` line + matching `ORIGIN_LISTENING_ON=` stdout
- `ORIGIN_PORT_FILE` write contains resolved port

Follow-ups for P0c+
- `comparable_env_hash` will be consumed by `save_full_report` (P0c) for filename construction
- `encode_baseline_path` will be the canonical save path used by P1 baseline regeneration
- `ORIGIN_LISTENING_ON` + `ORIGIN_PORT_FILE` will be consumed by P2 `http_harness.rs::DaemonHandle::spawn`

🤖 Generated with Claude Code
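The layered path layout listed under "What changed" can be illustrated with a small standard-library sketch. It is an assumption-laden illustration rather than the code in `eval/report.rs`: `PathEnv` is a hypothetical stand-in for the three `ReportEnv` fields the path needs, the layer value is used directly as the directory name (the real `<layer_dir>` mapping is not shown in this PR description), and the hash is passed in pre-computed instead of being derived by `comparable_env_hash`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the `ReportEnv` fields that determine the path.
struct PathEnv {
    layer: Option<String>,
    task: Option<String>,
    variant: Option<String>,
}

/// Builds `<root>/<layer>/<task>/<variant>__<hash>.json`.
/// Panics with a message naming the missing field, as the PR describes for
/// the real function. `hash` is assumed to be computed by the caller.
fn encode_baseline_path_sketch(root: &Path, env: &PathEnv, hash: &str) -> PathBuf {
    let layer = env
        .layer
        .as_deref()
        .expect("encode_baseline_path: env.layer must be set");
    let task = env
        .task
        .as_deref()
        .expect("encode_baseline_path: env.task must be set");
    let variant = env
        .variant
        .as_deref()
        .expect("encode_baseline_path: env.variant must be set");

    // Layer and task become directories; variant and hash share the file name,
    // so runs with the same comparable environment map to the same hash suffix.
    root.join(layer)
        .join(task)
        .join(format!("{variant}__{hash}.json"))
}
```

Keeping layer, task, and variant out of the hash and in the path is what lets two baselines from different layers share a hash (the "cross-layer hash parity" check) while still landing in distinct files.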