perf(evolution): atomic-worker architecture for parallel proposals (full GEPA parity) by KE7 · Pull Request #34 · KE7/helix

KE7 · 2026-05-13T22:59:53Z

Summary

Replaces helix's three-stage parallel pipeline (Step 1b parent-eval pool → Step 2 LLM pool → Step 3 sequential child-eval) with a single ThreadPoolExecutor whose workers each execute the full GEPA execute_proposal shape atomically — achieving full structural parity with GEPA reflective_mutation.py:268,308,369,420.

Design

Each atomic worker runs:

1. parent_eval: _cached_evaluate_batch(parent, subsample_ids, None, ...) (bypasses cache, mirrors GEPA RM:268)
2. skip-perfect check: if all parent scores ≥ threshold → return SkippedResult (mirrors GEPA RM:308-327)
3. LLM mutation: mutate(parent, eval_for_mutate, new_id, ...) (mirrors GEPA RM:369)
4. tamper check: _detect_evaluator_tamper(child, manifest, config, project_root) (thread-safe read-only)
5. child_eval: _cached_evaluate_batch(child, subsample_ids, minibatch_cache, ...) (mirrors GEPA RM:420)

Budget charges and frontier updates remain sequential in the acceptance loop (apply_proposal_output parity, GEPA RM:472).

Source: /Users/ke/helix-gepa-parity-investigation.md §7 D1.
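
The five worker steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/helix/evolution.py: WorkerOutput and run_atomic_worker are hypothetical names, and plain callables stand in for _cached_evaluate_batch, mutate, and _detect_evaluator_tamper.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WorkerOutput:  # hypothetical stand-in for the PR's result type
    kind: str  # "success" | "skipped" | "tampered" | "llm_failed"
    parent_scores: list[float]
    child: Optional[str] = None
    child_scores: Optional[list[float]] = None


def run_atomic_worker(
    parent: str,
    evaluate: Callable[[str], list[float]],
    mutate: Callable[[str, list[float]], Optional[str]],
    is_tampered: Callable[[str], bool],
    perfect_score: float = 1.0,
) -> WorkerOutput:
    """Run all five steps for one proposal on a single worker thread."""
    parent_scores = evaluate(parent)  # 1. parent_eval
    if parent_scores and all(s >= perfect_score for s in parent_scores):
        return WorkerOutput("skipped", parent_scores)  # 2. skip-perfect
    try:
        child = mutate(parent, parent_scores)  # 3. LLM mutation
    except Exception:
        child = None  # a failed mutation must not take down the pool
    if child is None:
        return WorkerOutput("llm_failed", parent_scores)
    if is_tampered(child):  # 4. tamper check (read-only)
        return WorkerOutput("tampered", parent_scores, child)
    return WorkerOutput("success", parent_scores, child, evaluate(child))  # 5. child_eval


def demo() -> list[str]:
    evaluate = lambda c: [1.0 if "fixed" in c else 0.0]
    mutate = lambda parent, scores: parent + "-fixed"
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(run_atomic_worker, p, evaluate, mutate, lambda c: False)
            for p in ["a", "b-fixed"]
        ]
        return [f.result().kind for f in futures]
```

In this toy run the first parent scores 0.0 and goes through mutation and child evaluation, while the second is already perfect and returns early without an LLM call. Nothing inside the worker touches budget or frontier state.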

Result type

from dataclasses import dataclass
from typing import Literal

# Candidate, EvalResult, and UsageStats are defined elsewhere in helix.
@dataclass
class _ProposalResult:
    kind: Literal["success", "skipped", "tampered", "llm_failed"]  # outcome discriminator
    presample_ctx: tuple[...]
    parent_eval_result: EvalResult | None
    child: Candidate | None = None
    child_eval_result: EvalResult | None = None
    tampered_paths: list[str] | None = None
    parent_n_uncached: int = 0
    child_n_uncached: int = 0
    child_usage: UsageStats | None = None

Diff stat

File	Insertions	Deletions
`src/helix/evolution.py`	458	444
`tests/unit/test_evolution_minibatch.py`	354	0

Test results

866 passed, 0 failed (860 original + 6 new in TestAtomicProposalWorker).
The new test class covers: atomicity (parent and child evals on the same thread), skip-perfect isolation, worker exception isolation, parallelism with n=3, sequential budget charges, and tamper-check rejection.
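
One way the atomicity and parallelism properties could be checked is sketched below. It is not the PR's actual test code: traced_worker and run_traced are hypothetical names, and the thread ids merely stand in for where parent_eval and child_eval would run. The barrier forces all workers to be in flight at the same time, so the check cannot pass on a pool that silently serializes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def traced_worker(proposal_id: int, barrier: threading.Barrier) -> tuple[int, int, int]:
    parent_thread = threading.get_ident()  # where parent_eval would run
    barrier.wait(timeout=5)  # blocks until every worker has arrived
    child_thread = threading.get_ident()  # where child_eval would run
    return proposal_id, parent_thread, child_thread


def run_traced(n: int = 3) -> list[tuple[int, int, int]]:
    barrier = threading.Barrier(n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(traced_worker, i, barrier) for i in range(n)]
        return [f.result() for f in futures]


traces = run_traced(3)
# Atomicity: both stages of one proposal ran on the same thread.
assert all(parent == child for _, parent, child in traces)
# Parallelism: three proposals occupied three distinct threads.
assert len({parent for _, parent, _ in traces}) == 3
```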

E2E validation

Backend	Exit	Cost	Mutations	Accepted	Wall-clock	Errors
Claude	0	$0.11	1 (g1-s1)	1/1 (1.0)	~10s	none
OpenCode	0	$0.00	2 (g1-s1, g1-s2)	2/2 (1.0)	~14s	none

Both runs used num_parallel_proposals=2. The OpenCode run demonstrates the parallelism directly: both worker threads printed "⟳ Running train evaluation…" simultaneously, and both mutations were accepted (both fixed the add_one off-by-one bug independently).

Design invariants preserved

✅ Acceptance order = pre-sampling order (not worker completion order)
✅ Budget charges sequential (acceptance loop only)
✅ Frontier/lineage writes sequential (acceptance loop only)
✅ PromptArtifactCollisionError re-raised from worker (fatal, not swallowed)
✅ max_workers cap applied to single pool (min(n, config.evolution.max_workers))
✅ No force-push, no --no-verify
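
The ordering and budget invariants can be illustrated with a small sketch. It assumes a hypothetical run_generation helper and a dict-based budget, neither of which is helix's API: workers only report a cost, and the acceptance loop walks the futures in submission order, so a slow first proposal is still accepted first.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_generation(proposal_ids, delays, cost_per_proposal, budget, max_workers_cap):
    """Run proposals in parallel, then accept them in pre-sampling order."""

    def worker(pid):
        time.sleep(delays[pid])  # simulate uneven worker latency
        return cost_per_proposal  # the worker reports cost but never charges it

    # Single pool, capped as min(n, max_workers)
    pool_size = max(1, min(len(proposal_ids), max_workers_cap))
    accepted = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(worker, pid) for pid in proposal_ids]
        # Iterate in submission order, not completion order.
        for pid, future in zip(proposal_ids, futures):
            cost = future.result()  # re-raises any exception from the worker
            budget["spent"] += cost  # only this loop mutates the budget
            accepted.append(pid)
    return accepted


budget = {"spent": 0}
order = run_generation(
    ["g1-s1", "g1-s2", "g1-s3"],
    delays={"g1-s1": 0.05, "g1-s2": 0.0, "g1-s3": 0.02},
    cost_per_proposal=2,
    budget=budget,
    max_workers_cap=2,
)
```

Because Future.result() re-raises whatever the worker raised, a fatal error such as PromptArtifactCollisionError would surface in the acceptance loop rather than being swallowed inside the pool.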

🤖 Generated with Claude Code

…EPA execute_proposal parity)

Replace three-stage pipeline (parent-eval pool → LLM pool → sequential child-eval) with ONE ThreadPoolExecutor whose workers each execute the full GEPA execute_proposal shape atomically: parent_eval (reflective_mutation.py:268) → skip-perfect (reflective_mutation.py:308) → LLM mutate (reflective_mutation.py:369) → tamper check → child_eval (reflective_mutation.py:420).

Source: /Users/ke/helix-gepa-parity-investigation.md §7 D1.

Budget charges and frontier updates remain sequential (apply_proposal_output parity).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… atomic worker

Five new tests verifying GEPA execute_proposal parity:

1. Parent and child evals on same worker thread (atomicity)
2. Skip-perfect inside worker prevents LLM call
3. Worker LLM exception isolation (one failure doesn't crash pool)
4. n=3 proposals run in parallel (distinct worker threads)
5. Budget charges sequential in acceptance loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ch paths

Remove the erroneous `_sub_ids is not None` guard added in the Architecture D worker's skip-perfect check (step W3). The guard was based on a misread of the GEPA spec: helix fires skip-perfect on both the minibatch path (where parent_eval comes from _cached_evaluate_batch) and the no-minibatch path (where it comes from _cached_eval on the full train split).

Tests:
test_perfect_score_skips_mutation_continues_loop ✓
test_perfect_score_does_not_terminate_run ✓

Also add the no-minibatch budget charge in the "skipped" acceptance branch (worker ran _cached_eval; charge must be applied sequentially in the loop).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nion

Replace the single `_ProposalResult` dataclass (all fields typed `object`, mypy-opaque) with a proper four-class sealed hierarchy (Option A):

• _SkippedResult – skip-perfect fired; parent_eval_result: EvalResult
• _LLMFailedResult – mutate() raised/returned None; parent_eval_result: EvalResult | None
• _TamperedResult – child touched protected files; child: Candidate, tampered_paths: list[str]
• _SuccessResult – all steps completed; child: Candidate, child_eval_result: EvalResult | None

Fixes 87 mypy --strict errors concentrated at:
• Line 2284 (bare `tuple` annotation) → _ProposalCtx type alias
• Lines 2394–2397 (Exception not narrowed to HelixError) → direct isinstance guard
• Lines 2540–2918 (object has no attribute id/instance_scores, etc.) → isinstance checks in acceptance loop replace `wr.kind == "..."` string discriminators, giving mypy the narrowing it needs on all downstream field accesses

Cleanup:
• Remove stray `tmp/e2e-opencode` submodule reference (160000-mode tree entry without a .gitmodules entry) and 13 other tracked tmp/ scratch files
• Add `tmp/` to .gitignore to prevent recurrence

Tests:
• Add TestArchitectureDAtomicWorker::test_worker_tampered_result_rejects_child_without_crash to cover the previously-untested _TamperedResult path

mypy result: 87 errors → 0 (Success: no issues found in 29 source files)
pytest result: 865 → 866 passed (1 new tamper test, no behavior change)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
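
The narrowing benefit of replacing string discriminators with isinstance checks can be shown in a simplified sketch. The class names and fields below are hypothetical and deliberately smaller than the four helix result classes; the point is only that each branch gives a type checker a concrete class, so attribute access needs no casts.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Skipped:
    parent_score: float


@dataclass(frozen=True)
class LLMFailed:
    reason: str


@dataclass(frozen=True)
class Tampered:
    child_id: str
    tampered_paths: list[str]


@dataclass(frozen=True)
class Success:
    child_id: str
    child_score: float


# Closed union: every worker outcome is exactly one of these classes.
WorkerResult = Union[Skipped, LLMFailed, Tampered, Success]


def describe(result: WorkerResult) -> str:
    # Each isinstance check narrows `result`, so the fields used in that
    # branch are known to exist without a string `kind` comparison.
    if isinstance(result, Skipped):
        return f"skipped (parent score {result.parent_score})"
    if isinstance(result, LLMFailed):
        return f"llm failed: {result.reason}"
    if isinstance(result, Tampered):
        return f"rejected {result.child_id}: touched {', '.join(result.tampered_paths)}"
    return f"accepted {result.child_id} with score {result.child_score}"
```

With a single dataclass discriminated by a string field, every optional field stays optional in every branch, which is where errors of the "object has no attribute" kind tend to come from.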

Pure rename/wording cleanup — zero behavior change. Replaces the internal session-only label "Architecture D" with descriptive public-facing language throughout comments, docstrings, test class name, and assertion messages. Verified: mypy --strict 0 errors, pytest 866/866 passed.

KE7 and others added 5 commits May 13, 2026 15:50

KE7 mentioned this pull request May 14, 2026

perf(opencode): isolate per-proposal subprocess state to prevent SQLite WAL contention #35

Merged

5 tasks

KE7 merged commit 4fa82fa into main May 16, 2026
2 checks passed

KE7 deleted the perf/architecture-d-atomic-worker branch May 16, 2026 01:57

KE7 merged 5 commits into main from perf/architecture-d-atomic-worker
