feat(s12_evaluate): register binary_classify in default stage (v0.25.0) by CocoRoF · Pull Request #31 · CocoRoF/geny-executor

CocoRoF · 2026-04-20T07:24:21Z

Summary

Two additive changes:

binary_classify is now registered in the default Stage 12 (EvaluateStage) strategy slot alongside signal_based / criteria_based / agent_evaluation. A manifest with strategies={"strategy": "binary_classify"} now resolves to a real BinaryClassifyEvaluation instance instead of silently falling back to SignalBasedEvaluation.
BinaryClassifyEvaluation.configure(config: dict) is given a working body, so strategy_configs={"strategy": {"easy_max_turns": N, "not_easy_max_turns": M}} flows through at manifest-restore time.
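The two changes above can be sketched with stand-in classes. This is a minimal illustration, not the executor's code: SignalBasedEvaluationStub, BinaryClassifyEvaluationStub, resolve_slot and the BEFORE/AFTER registries are hypothetical names, and the sketch assumes that an unregistered strategy name falls back to the slot's default, as the summary describes.

```python
from typing import Any, Callable, Dict


class SignalBasedEvaluationStub:
    """Hypothetical stand-in for the slot's default strategy (not the real class)."""

    def configure(self, config: Dict[str, Any]) -> None:
        pass  # accepts and ignores config in this sketch


class BinaryClassifyEvaluationStub:
    """Hypothetical stand-in for BinaryClassifyEvaluation (not the real class)."""

    KNOWN_KEYS = ("easy_max_turns", "not_easy_max_turns")

    def __init__(self) -> None:
        self.config: Dict[str, int] = {}

    def configure(self, config: Dict[str, Any]) -> None:
        # A "working body": copy the recognised keys onto the internal config.
        for key in self.KNOWN_KEYS:
            if key in config:
                self.config[key] = int(config[key])


Registry = Dict[str, Callable[[], Any]]


def resolve_slot(registry: Registry, name: str, config: Dict[str, Any],
                 default: str = "signal_based") -> Any:
    # Assumption: an unregistered name silently falls back to the default strategy.
    strategy = registry.get(name, registry[default])()
    strategy.configure(config)
    return strategy


BEFORE: Registry = {"signal_based": SignalBasedEvaluationStub}
AFTER: Registry = {**BEFORE, "binary_classify": BinaryClassifyEvaluationStub}

cfg = {"easy_max_turns": 2, "not_easy_max_turns": 8}  # illustrative values
print(type(resolve_slot(BEFORE, "binary_classify", cfg)).__name__)  # SignalBasedEvaluationStub
print(resolve_slot(AFTER, "binary_classify", cfg).config)  # {'easy_max_turns': 2, 'not_easy_max_turns': 8}
```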

Why

Before v0.25.0, binary_classify lived only in the adaptive artifact module and was reachable only via the builder's .with_evaluate(strategy=BinaryClassifyEvaluation(...)) kwarg. That meant serializing the worker_adaptive preset through an EnvironmentManifest silently degraded it to signal-based evaluation — its adaptive identity was lost the moment it passed through a manifest round-trip.
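The degradation described above is a round-trip property: serialize a strategy to declarative data, restore it through the registry, and check that the same strategy comes back with the same config. A minimal sketch, assuming the manifest stores only the slot's strategy name and its config dict; Strategy, to_manifest, from_manifest and round_trips are hypothetical and do not mirror EnvironmentManifest's real format.

```python
import json
from typing import Any, Dict, Type


class Strategy:
    """Hypothetical base: a named strategy holding a dict config."""
    name = "base"

    def __init__(self) -> None:
        self.config: Dict[str, Any] = {}

    def configure(self, config: Dict[str, Any]) -> None:
        self.config.update(config)


class SignalBased(Strategy):
    name = "signal_based"


class BinaryClassify(Strategy):
    name = "binary_classify"


def to_manifest(strategy: Strategy) -> str:
    # Only declarative data survives: the slot's strategy name and its config.
    return json.dumps({
        "strategies": {"strategy": strategy.name},
        "strategy_configs": {"strategy": strategy.config},
    })


def from_manifest(text: str, registry: Dict[str, Type[Strategy]]) -> Strategy:
    data = json.loads(text)
    name = data["strategies"]["strategy"]
    # Assumed behaviour: unknown names fall back to the default strategy.
    restored = registry.get(name, registry["signal_based"])()
    restored.configure(data["strategy_configs"].get("strategy", {}))
    return restored


def round_trips(strategy: Strategy, registry: Dict[str, Type[Strategy]]) -> bool:
    restored = from_manifest(to_manifest(strategy), registry)
    return type(restored) is type(strategy) and restored.config == strategy.config


adaptive = BinaryClassify()
adaptive.configure({"easy_max_turns": 2, "not_easy_max_turns": 8})  # illustrative values
print(round_trips(adaptive, {"signal_based": SignalBased}))  # False: restored as SignalBased
print(round_trips(adaptive, {"signal_based": SignalBased, "binary_classify": BinaryClassify}))  # True
```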

Geny's manifest-first cutover (dev_docs/20260420_3/plan/02_default_env_per_role.md) requires build_default_manifest.stages to emit the full preset layout declaratively. This PR makes that declarative layout faithful for worker_adaptive: its Stage 12 evaluator now survives a manifest round-trip as binary_classify.

Backwards compatibility

No breaking changes. The adaptive artifact remains strategy-only; its Python import path (from geny_executor.stages.s12_evaluate.artifact.adaptive.strategy import BinaryClassifyEvaluation) is unchanged. The default registry keeps every pre-existing strategy — this PR is purely additive inside EvaluateStage.__init__.
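"Purely additive" can be stated as a check on the registry: every pre-existing name still maps to the same class, and only new names appear. A sketch with hypothetical empty stand-in classes; the three pre-existing names come from this PR, the classes and check_additive do not.

```python
from typing import Dict, List


# Hypothetical empty stand-ins; only their identity matters for this check.
class SignalBased:
    pass


class CriteriaBased:
    pass


class AgentEvaluation:
    pass


class BinaryClassify:
    pass


def check_additive(old: Dict[str, type], new: Dict[str, type]) -> List[str]:
    """Raise if `new` drops or rebinds any name from `old`; return the names it adds."""
    for name, cls in old.items():
        if new.get(name) is not cls:
            raise AssertionError(f"registry entry {name!r} was removed or rebound")
    return sorted(new.keys() - old.keys())


OLD = {"signal_based": SignalBased, "criteria_based": CriteriaBased, "agent_evaluation": AgentEvaluation}
NEW = {**OLD, "binary_classify": BinaryClassify}
print(check_additive(OLD, NEW))  # ['binary_classify']
```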

Test plan

tests/unit/test_binary_classify_manifest.py — 6 new tests
Full suite: 1029 passed, 18 skipped (up from 1023)
ruff check + ruff format --check clean
CHANGELOG.md [0.25.0] entry
Version bumped to 0.25.0 in both pyproject.toml and __init__.py

🤖 Generated with Claude Code

Adds 'binary_classify' to the default EvaluateStage's strategy slot registry so manifests with strategies={"strategy": "binary_classify"} resolve to a real BinaryClassifyEvaluation instance instead of silently falling back to SignalBasedEvaluation. Also gives BinaryClassifyEvaluation.configure(...) a working body so strategy_configs flows through — easy_max_turns and not_easy_max_turns land on the strategy's internal config at manifest-restore time.

This unblocks Geny's manifest-first cutover (PR10 of the 20260420_3 cycle) where build_default_manifest.stages emits a StageManifestEntry for worker_adaptive's Stage 12 with strategies.strategy = "binary_classify". Without this change, serializing worker_adaptive through an EnvironmentManifest silently degraded it to signal-based evaluation.

No breaking changes. The adaptive artifact remains strategy-only and its Python import path is unchanged. The default registry keeps all three pre-existing strategies; binary_classify is additive.

Tests: tests/unit/test_binary_classify_manifest.py (6 new tests). Full suite: 1029 passed, 18 skipped. Ruff + format clean.

Refs: Geny/dev_docs/20260420_3/plan/02_default_env_per_role.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Fills in build_default_manifest.stages (previously an empty list with a "filled in by a later PR" comment) with StageManifestEntry objects that mirror the worker_adaptive and vtuber GenyPresets stage chains. Also bumps the executor pin to >=0.25.0,<0.26.0 so manifest-restore can resolve the binary_classify evaluator strategy that worker_adaptive depends on.

### What the manifest carries

- Full active stage list per preset (worker_adaptive: 1–9 + 12, 13, 15, 16; vtuber: same minus Stage 8/Think).
- Per-stage artifact name ("default" throughout) and strategy slot selections that exactly match the preset builder's output (e.g. cache strategy "aggressive_cache" for worker vs "system_cache" for vtuber, evaluator "binary_classify" vs "signal_based").
- Static configs: loop.max_turns (30 for worker, 10 for vtuber), binary_classify.{easy_max_turns, not_easy_max_turns}.
- tools.built_in (Read/Write/Edit/Bash/Glob/Grep, shared constant), tools.external (plumbed through from the caller's whitelist).

### What the manifest does NOT carry (and why)

Three slots are declared with *default* strategies and are meant to be overwritten by Pipeline.attach_runtime(...) at session start:

- context.retriever → GenyMemoryRetriever (needs per-session memory_manager)
- memory.strategy → GenyMemoryStrategy (needs memory_manager + llm_reflect + curated_km)
- memory.persistence → GenyPersistence (needs memory_manager)

This matches the declarative-only principle called out in dev_docs/20260420_3/plan/02_default_env_per_role.md: the manifest expresses stage shape and static params, runtime-scoped Python objects attach post-construction.

Stage 10 (tool) is not in the declarative list — the preset registers it conditionally on `tools=` being passed, and the tool registry is built from tools.external + adhoc_providers at session-build time via Pipeline.from_manifest_async.
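The split described above, with declarative stage shape and static params in the manifest and live objects attached after construction, can be sketched as follows. EntrySketch, build, attach_runtime and the placeholder strategy name "declared_default" are hypothetical; the real StageManifestEntry and Pipeline.attach_runtime(...) signatures are not shown here, and the easy/not-easy turn limits are illustrative values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class EntrySketch:
    """Hypothetical, simplified analogue of a stage manifest entry (not StageManifestEntry)."""
    name: str
    artifact: str = "default"
    strategies: Dict[str, str] = field(default_factory=dict)
    config: Dict[str, Any] = field(default_factory=dict)


# Slots declared with placeholder defaults; live objects replace them at session start.
RUNTIME_SLOTS = {("context", "retriever"), ("memory", "strategy"), ("memory", "persistence")}


def build(entries: List[EntrySketch]) -> Dict[str, Dict[str, Any]]:
    """Materialise purely declarative data: stage shape, slot selections, static params."""
    return {
        e.name: {"artifact": e.artifact, "slots": dict(e.strategies), "config": dict(e.config)}
        for e in entries
    }


def attach_runtime(pipeline: Dict[str, Dict[str, Any]],
                   overrides: Dict[Tuple[str, str], Any]) -> None:
    """Hypothetical analogue of Pipeline.attach_runtime(...): swap live objects into declared slots."""
    for (stage, slot), obj in overrides.items():
        if (stage, slot) not in RUNTIME_SLOTS:
            raise KeyError(f"{stage}.{slot} is not a runtime-attached slot")
        pipeline[stage]["slots"][slot] = obj


entries = [
    EntrySketch("context", strategies={"retriever": "declared_default"}),
    EntrySketch("memory", strategies={"strategy": "declared_default", "persistence": "declared_default"}),
    EntrySketch("loop", config={"max_turns": 30}),
    EntrySketch("evaluate", strategies={"strategy": "binary_classify"},
                config={"easy_max_turns": 2, "not_easy_max_turns": 8}),
]

pipeline = build(entries)
live_persistence = object()  # stands in for a persistence object built around a per-session memory_manager
attach_runtime(pipeline, {("memory", "persistence"): live_persistence})
print(pipeline["evaluate"]["slots"])  # {'strategy': 'binary_classify'}
print(pipeline["memory"]["slots"]["persistence"] is live_persistence)  # True
```

Rejecting overrides outside the declared runtime slots keeps static selections such as the evaluator from being silently replaced at session start.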
Stage 3 (system) declares builder="composable" to match the preset, but the ComposablePromptBuilder's block list (PersonaBlock + DateTimeBlock + MemoryContextBlock) is runtime state and isn't encoded here. A later PR will expand attach_runtime to accept a system builder; until then, session-build code will either set the blocks itself or continue to fall back to the preset path.

### Parity verification

Ad-hoc smoke test (/tmp/test_manifest_parity.py) compares the manifest-built pipeline to GenyPresets.{worker_adaptive, vtuber} on stage orders, per-stage artifact, per-slot strategy name (minus the three runtime-swapped slots), loop.max_turns, and the binary_classify strategy's live config. All 17 assertions pass.

### Why bump to v0.25.0

v0.25.0 registers binary_classify in the default EvaluateStage's strategy slot registry. Without it, worker_adaptive's manifest silently restored to signal_based evaluation. See CocoRoF/geny-executor#31 for the executor-side change.

Refs: dev_docs/20260420_3/plan/02_default_env_per_role.md (PR 3)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
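The parity verification in the Geny#150 commit message above compares a manifest-built pipeline to the preset-built one while ignoring the three runtime-swapped slots. A minimal sketch of that comparison over hypothetical plain-dict descriptions (stage order via dict insertion order, artifact name, per-slot strategy name); parity_diffs is illustrative, the actual /tmp/test_manifest_parity.py is not shown, and this sketch omits its loop.max_turns and live-config checks.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical description: stage name -> {"artifact": str, "slots": {slot: strategy_name}}.
# Dict insertion order stands in for stage order.
Description = Dict[str, Dict[str, Any]]

RUNTIME_SWAPPED: Set[Tuple[str, str]] = {
    ("context", "retriever"),
    ("memory", "strategy"),
    ("memory", "persistence"),
}


def parity_diffs(built: Description, preset: Description) -> List[str]:
    """List mismatches between two descriptions, skipping the runtime-swapped slots."""
    diffs: List[str] = []
    if list(built) != list(preset):
        diffs.append(f"stage order {list(built)} != {list(preset)}")
    for stage in sorted(built.keys() & preset.keys()):
        a, b = built[stage], preset[stage]
        if a["artifact"] != b["artifact"]:
            diffs.append(f"{stage}: artifact {a['artifact']!r} != {b['artifact']!r}")
        for slot in sorted(a["slots"].keys() | b["slots"].keys()):
            if (stage, slot) in RUNTIME_SWAPPED:
                continue  # overwritten at session start, so not expected to match
            if a["slots"].get(slot) != b["slots"].get(slot):
                diffs.append(f"{stage}.{slot}: {a['slots'].get(slot)!r} != {b['slots'].get(slot)!r}")
    return diffs


preset = {
    "memory": {"artifact": "default", "slots": {"strategy": "live-object"}},
    "evaluate": {"artifact": "default", "slots": {"strategy": "binary_classify"}},
}
restored = {
    "memory": {"artifact": "default", "slots": {"strategy": "declared_default"}},
    "evaluate": {"artifact": "default", "slots": {"strategy": "signal_based"}},
}
print(parity_diffs(restored, preset))  # ["evaluate.strategy: 'signal_based' != 'binary_classify'"]
```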

CocoRoF merged commit ece7b15 into main Apr 20, 2026
6 checks passed

CocoRoF deleted the feat/register-binary-classify branch April 20, 2026 07:24

CocoRoF mentioned this pull request Apr 20, 2026

feat(default_manifest): populate stages for worker_adaptive + vtuber · CocoRoF/Geny#150 (merged)

CocoRoF merged 1 commit into main from feat/register-binary-classify
