perf(evolution): atomic-worker architecture for parallel proposals (full GEPA parity) #34

Merged
KE7 merged 5 commits into main from perf/architecture-d-atomic-worker on May 16, 2026

KE7 commented on May 13, 2026

Summary

Replaces helix's three-stage parallel pipeline (Step 1b parent-eval pool → Step 2 LLM pool → Step 3 sequential child-eval) with a single ThreadPoolExecutor whose workers each execute the full GEPA execute_proposal shape atomically — achieving full structural parity with GEPA reflective_mutation.py:268,308,369,420.

Design

Each atomic worker runs:

  1. parent_eval: _cached_evaluate_batch(parent, subsample_ids, None, ...) (bypasses cache, mirrors GEPA RM:268)
  2. skip-perfect check: if all parent scores ≥ threshold, return SkippedResult (mirrors GEPA RM:308-327)
  3. LLM mutation: mutate(parent, eval_for_mutate, new_id, ...) (mirrors GEPA RM:369)
  4. tamper check: _detect_evaluator_tamper(child, manifest, config, project_root) (thread-safe, read-only)
  5. child_eval: _cached_evaluate_batch(child, subsample_ids, minibatch_cache, ...) (mirrors GEPA RM:420)

Budget charges and frontier updates remain sequential in the acceptance loop (apply_proposal_output parity, GEPA RM:472).
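The five worker steps above can be sketched as a single function that one thread runs from start to finish. This is a minimal illustration, not the code in src/helix/evolution.py: `WorkerOutcome`, `run_atomic_proposal`, and the `evaluate`, `mutate`, and `is_tampered` callables are hypothetical stand-ins, and candidates are reduced to plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class WorkerOutcome:
    """Hypothetical stand-in for the PR's per-proposal result."""
    kind: str  # "success" | "skipped" | "tampered" | "llm_failed"
    parent_scores: Sequence[float]
    child: Optional[str] = None
    child_scores: Optional[Sequence[float]] = None


def run_atomic_proposal(
    parent: str,
    evaluate: Callable[[str], Sequence[float]],
    mutate: Callable[[str, Sequence[float]], Optional[str]],
    is_tampered: Callable[[str], bool],
    perfect_score: float = 1.0,
) -> WorkerOutcome:
    # Step 1: evaluate the parent on the sampled minibatch.
    parent_scores = evaluate(parent)

    # Step 2: skip-perfect. Nothing to improve, so no LLM call is made.
    if len(parent_scores) > 0 and all(s >= perfect_score for s in parent_scores):
        return WorkerOutcome("skipped", parent_scores)

    # Step 3: LLM mutation. A failure is captured as a result value so that
    # one bad proposal does not take down the other workers.
    try:
        child = mutate(parent, parent_scores)
    except Exception:
        child = None
    if child is None:
        return WorkerOutcome("llm_failed", parent_scores)

    # Step 4: read-only tamper check on the new candidate.
    if is_tampered(child):
        return WorkerOutcome("tampered", parent_scores, child=child)

    # Step 5: evaluate the child on the same minibatch.
    return WorkerOutcome(
        "success", parent_scores, child=child, child_scores=evaluate(child)
    )
```

The function only returns data. Budget and frontier state are never touched inside it, which is what lets the caller keep those updates sequential.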

Source: /Users/ke/helix-gepa-parity-investigation.md §7 D1.

Result type

@dataclass
class _ProposalResult:
    kind: Literal["success", "skipped", "tampered", "llm_failed"]
    presample_ctx: tuple[...]
    parent_eval_result: EvalResult | None
    child: Candidate | None = None
    child_eval_result: EvalResult | None = None
    tampered_paths: list[str] | None = None
    parent_n_uncached: int = 0
    child_n_uncached: int = 0
    child_usage: UsageStats | None = None

Diff stat

| File | Insertions | Deletions |
| --- | --- | --- |
| src/helix/evolution.py | 458 | 444 |
| tests/unit/test_evolution_minibatch.py | 354 | 0 |

Test results

  • 866 passed, 0 failed (860 original + 6 new TestAtomicProposalWorker)
  • New test class covers: atomicity (same thread), skip-perfect isolation, worker exception isolation, n=3 parallelism, sequential budget charges, tamper-check rejection

E2E validation

| Backend | Exit | Cost | Mutations | Accepted | Wall-clock | Errors |
| --- | --- | --- | --- | --- | --- | --- |
| Claude | 0 | $0.11 | 1 (g1-s1) | 1/1 (1.0) | ~10s | none |
| OpenCode | 0 | $0.00 | 2 (g1-s1, g1-s2) | 2/2 (1.0) | ~14s | none |

Both runs used num_parallel_proposals=2. The OpenCode run demonstrates the parallelism directly: both worker threads printed "⟳ Running train evaluation…" simultaneously, and both mutations were accepted (both fixed the add_one off-by-one bug independently).

Design invariants preserved

  • ✅ Acceptance order = pre-sampling order (not worker completion order)
  • ✅ Budget charges sequential (acceptance loop only)
  • ✅ Frontier/lineage writes sequential (acceptance loop only)
  • ✅ PromptArtifactCollisionError re-raised from worker (fatal, not swallowed)
  • ✅ max_workers cap applied to single pool (min(n, config.evolution.max_workers))
  • ✅ No force-push, no --no-verify
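Three of these invariants (pre-sampling order, the single-pool cap, and fatal errors propagating) follow from how the pool's futures are consumed. The sketch below is one way to get that behavior with the standard library; `run_in_presample_order` and `FatalWorkerError` are hypothetical names, not helix identifiers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class FatalWorkerError(Exception):
    """Hypothetical stand-in for an error that must abort the whole run."""


def run_in_presample_order(
    items: Sequence[T],
    worker: Callable[[T], R],
    max_workers: int,
) -> List[R]:
    if not items:
        return []
    # One pool, capped at min(n, max_workers); never fewer than one thread.
    pool_size = max(1, min(len(items), max_workers))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(worker, item) for item in items]
        # Walk the futures in submission order rather than completion order,
        # so results line up with the pre-sampled proposals. Future.result()
        # re-raises any exception the worker raised, so a fatal error
        # surfaces in the caller instead of being swallowed.
        return [f.result() for f in futures]
```

Using `concurrent.futures.as_completed` here would yield results in finish order, which would make acceptance order depend on thread timing.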

🤖 Generated with Claude Code

KE7 and others added 5 commits May 13, 2026 15:50
…EPA execute_proposal parity)

Replace three-stage pipeline (parent-eval pool → LLM pool → sequential child-eval)
with ONE ThreadPoolExecutor whose workers each execute the full GEPA execute_proposal
shape atomically: parent_eval (reflective_mutation.py:268) → skip-perfect
(reflective_mutation.py:308) → LLM mutate (reflective_mutation.py:369) → tamper
check → child_eval (reflective_mutation.py:420).

Source: /Users/ke/helix-gepa-parity-investigation.md §7 D1.

Budget charges and frontier updates remain sequential (apply_proposal_output parity).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… atomic worker

Five new tests verifying GEPA execute_proposal parity:
1. Parent and child evals on same worker thread (atomicity)
2. Skip-perfect inside worker prevents LLM call
3. Worker LLM exception isolation (one failure doesn't crash pool)
4. n=3 proposals run in parallel (distinct worker threads)
5. Budget charges sequential in acceptance loop
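A parallelism check like test 4 can be made deterministic with a barrier instead of timing assertions. The helper below is a sketch of that idea, not the test in tests/unit/test_evolution_minibatch.py; `ran_in_parallel` and its parameters are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def ran_in_parallel(
    n: int, pool_size: Optional[int] = None, timeout: float = 5.0
) -> bool:
    """Return True only if n tasks were in flight at the same time."""
    barrier = threading.Barrier(n, timeout=timeout)
    thread_ids: List[int] = []
    lock = threading.Lock()

    def task() -> None:
        # Blocks until n tasks have arrived; raises BrokenBarrierError if
        # they do not all arrive within the timeout.
        barrier.wait()
        with lock:
            thread_ids.append(threading.get_ident())

    with ThreadPoolExecutor(max_workers=pool_size or n) as pool:
        futures = [pool.submit(task) for _ in range(n)]
        try:
            for f in futures:
                f.result()
        except threading.BrokenBarrierError:
            return False
    # All n tasks were alive together, so their thread ids must be distinct.
    return len(set(thread_ids)) == n
```

With `pool_size=1` the first task waits alone at the barrier until the timeout breaks it, so the helper returns False instead of hanging.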

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ch paths

Remove the erroneous `_sub_ids is not None` guard added in the Architecture D
worker's skip-perfect check (step W3). The guard was based on a misread of the
GEPA spec — helix fires skip-perfect on both the minibatch path (where
parent_eval comes from _cached_evaluate_batch) and the no-minibatch path (where
it comes from _cached_eval on the full train split). Tests:
  test_perfect_score_skips_mutation_continues_loop ✓
  test_perfect_score_does_not_terminate_run ✓

Also add the no-minibatch budget charge in the "skipped" acceptance branch
(worker ran _cached_eval; charge must be applied sequentially in the loop).
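The split between parallel work and sequential accounting can be illustrated as follows. Workers only report how many uncached evaluations they ran; the acceptance loop applies the charges one at a time. `WorkerReport` and `charge_sequentially` are hypothetical, and the rule that only a "success" result pays for a child evaluation is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WorkerReport:
    """Hypothetical summary a worker hands back to the acceptance loop."""
    kind: str  # "success" | "skipped" | "tampered" | "llm_failed"
    parent_n_uncached: int = 0
    child_n_uncached: int = 0


def charge_sequentially(reports: Iterable[WorkerReport], used: int = 0) -> int:
    """Apply budget charges one report at a time, in pre-sampling order."""
    for report in reports:
        # The parent evaluation ran inside the worker for every outcome,
        # including "skipped", so it is charged in every branch.
        used += report.parent_n_uncached
        # Assumption: only a completed proposal reached the child evaluation.
        if report.kind == "success":
            used += report.child_n_uncached
    return used
```

Because the counter is only mutated in this loop, no lock is needed around it.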

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nion

Replace the single `_ProposalResult` dataclass (all fields typed `object`,
mypy-opaque) with a proper four-class sealed hierarchy (Option A):

  _SkippedResult      – skip-perfect fired; parent_eval_result: EvalResult
  _LLMFailedResult    – mutate() raised/returned None; parent_eval_result: EvalResult | None
  _TamperedResult     – child touched protected files; child: Candidate, tampered_paths: list[str]
  _SuccessResult      – all steps completed; child: Candidate, child_eval_result: EvalResult | None

Fixes 87 mypy --strict errors concentrated at:
  • Line 2284 (bare `tuple` annotation) → _ProposalCtx type alias
  • Lines 2394–2397 (Exception not narrowed to HelixError) → direct isinstance guard
  • Lines 2540–2918 (object has no attribute id/instance_scores, etc.) → isinstance
    checks in acceptance loop replace `wr.kind == "..."` string discriminators,
    giving mypy the narrowing it needs on all downstream field accesses
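The narrowing benefit described above can be seen in a reduced example. The classes below are simplified, hypothetical versions of the four result types (fields are plain strings and floats rather than Candidate and EvalResult); the point is that each `isinstance` branch lets a type checker see exactly which fields exist.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Skipped:
    parent_score: float


@dataclass(frozen=True)
class LLMFailed:
    reason: str


@dataclass(frozen=True)
class Tampered:
    child_id: str
    tampered_paths: List[str]


@dataclass(frozen=True)
class Success:
    child_id: str
    child_score: float


ProposalResult = Union[Skipped, LLMFailed, Tampered, Success]


def describe(result: ProposalResult) -> str:
    # Each isinstance check narrows the union, so the attribute accesses in
    # that branch type-check without casts or string discriminators.
    if isinstance(result, Skipped):
        return f"skipped (parent score {result.parent_score})"
    if isinstance(result, LLMFailed):
        return f"llm failed: {result.reason}"
    if isinstance(result, Tampered):
        return f"rejected {result.child_id}: touched {', '.join(result.tampered_paths)}"
    # Only Success remains after the three checks above.
    return f"evaluated {result.child_id} (score {result.child_score})"
```

With a single dataclass and a `kind` string, the same accesses would need every field to be optional, and the checker could not tell which ones are populated in a given branch.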

Cleanup:
  • Remove stray `tmp/e2e-opencode` submodule reference (160000-mode tree entry
    without a .gitmodules entry) and 13 other tracked tmp/ scratch files
  • Add `tmp/` to .gitignore to prevent recurrence

Tests:
  • Add TestArchitectureDAtomicWorker::test_worker_tampered_result_rejects_child_without_crash
    to cover the previously-untested _TamperedResult path

mypy result:  87 errors → 0 (Success: no issues found in 29 source files)
pytest result: 865 → 866 passed (1 new tamper test, no behavior change)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pure rename/wording cleanup — zero behavior change.  Replaces the
internal session-only label "Architecture D" with descriptive
public-facing language throughout comments, docstrings, test class
name, and assertion messages.  Verified: mypy --strict 0 errors,
pytest 866/866 passed.
KE7 merged commit 4fa82fa into main on May 16, 2026
2 checks passed
KE7 deleted the perf/architecture-d-atomic-worker branch on May 16, 2026 at 01:57