Execute Tactus local dispatch and optimizer fixes by endymion · Pull Request #262 · AnthusAI/Plexus

endymion · 2026-05-01T21:17:04Z

Summary

- fixes execute_tactus optimizer/local procedure dispatch behavior
- adds dashboard handling/tests for local procedure dispatch state
- preserves async child-budget accounting at the runtime handle boundary
- commits Kanbus records for the associated work

Verification

- `pytest MCP/tools/tactus_runtime/execute_test.py`
- `pytest plexus/cli/shared/command_dispatch_test.py plexus/cli/shared/test_command_dispatch.py plexus/cli/shared/test_experiment_runner.py plexus/lambda/test_task_dispatcher.py`
- `cd dashboard && npm run ci:typecheck`
- `cd dashboard && npm test -- --runTestsByPath components/tests/ProcedureTask.optimizer-auth.test.tsx --runInBand`
- GitHub Actions run 25233097621 passed for prior commit; run 25233535492 is queued for latest commit.

- Replace the entire legacy MCP tool catalog (scorecard, score, evaluation, feedback, item, prediction, dataset, report, rubric_memory, etc.) with a single `execute_tactus` tool that exposes all Plexus functionality via the `plexus.*` Tactus runtime API
- Add `plexus.score.contradictions` for rubric vs. code consistency checks
- Add `score_rubric_consistency_check` option to `plexus.evaluation.run`
- Add `plexus.procedure.optimize` shortcut for launching the feedback alignment optimizer with standard parameters
- Add `plexus execute` CLI command for local Tactus snippet testing
- Rename `run_experiment` → `run_procedure` and `run_experiment_with_task_tracking` → `run_procedure_with_task_tracking` throughout the codebase to match domain terminology
- Delete `procedure_sop_agent`, `sop_agent_base`, `demo_ai_mcp_integration`, `model_config_examples`, and all associated tests (legacy LangGraph-based optimizer prototype; superseded by Tactus procedures)
- Remove SOPAgent routing from `procedure_executor.py`; only `class: Tactus` procedures are supported going forward
- Reorganise and expand Plexus documentation under `plexus/docs/` with topic-based subdirectories and new guides for the Tactus runtime API

Made-with: Cursor

- Add score.pull/update/test, feedback.latest_update, rubric_memory.* namespaces to PlexusRuntimeModule DIRECT_HANDLERS with full _default_* implementations
- Add _default_report_runner_sync for synchronous report execution needed by optimizer; route plexus.report.run(sync=true) through it
- Add --emit-id-file CLI option to plexus evaluate accuracy/feedback so _default_evaluation_runner can capture evaluation_id from background subprocess for handle tracking
- Construct and register PlexusRuntimeModule in procedure_executor.py so Tactus/Lua procedure code can call plexus.* directly
- Create rubric_memory_toolset.py: in-process MCP tools for plexus_rubric_memory_* sub-agent tools
- Replace legacy MCP tool calls in ScoreEditorToolset with direct _default_score_pull / _default_score_update calls
- Rewrite feedback_alignment_optimizer.yaml call_plexus_tool to use plexus.* APIs directly; batch evaluations via handle protocol; synchronous reports via sync=true; score pull via temp files
- Update execute_test.py and test_score_editor_toolset.py to mock new direct-call interfaces

Made-with: Cursor

…code storage

Closes plx-62b442, plx-51488a, plx-f804a6. Updates plx-61c332, plx-07dc0d. Adds plx-71ad53 (remaining L4 integration tests).

## execute_tactus contract hardening (plx-f804a6)

- Add `_truncate_envelope` helper: caps execute_tactus JSON responses at 40 K chars to prevent LLM context-window overflow from large evaluation / scorecard payloads.
- `BudgetGate.carve_child`: when the parent gate is effectively infinite (usd=inf, wallclock=inf, as in the embedded chat MCP context), auto-supply a generous default child budget instead of raising ChildBudgetRequired. Callers inside chat no longer need explicit `budget = { ... }` for async evaluation / procedure calls.
- `_default_score_update`: set `isFeatured: "false"` on new ScoreVersion records so optimizer-created versions are not featured by default.
- `_default_score_test`: remove erroneous lambda wrapper around coroutine, fixing asyncio awaitable error.
- `_default_score_pull`: write YAML and guidelines to temp files and return their paths so sandboxed Lua code can read them via File.read() without needing the io library.
- `_default_procedure_optimize`: dispatch optimizer via background daemon thread so the chat agent receives procedure_id immediately (~49 s) instead of blocking for hours.

## Console chat fixed end-to-end (plx-61c332, plx-62b442)

- `chat_agent.tac` `extract_text`: handle Lupa userdata (Lua receives Plexus Python objects as `userdata`, not `table`) using pcall attribute access; checks response/content/message/text keys and indexed first element.
- Remove `MessageHistory.get()` auto-load: history now comes exclusively from `console_session_history` passed by the caller, preventing cross-turn context bleed that caused 300 K–667 K token overflows.
- Add `assistant.output` fallback with garbage filter: filters out Python model reprs like "UsageStats" and "output=None" that appeared when the LLM returned without tool use.
- `mcp_transport.py`: pass a permissive BudgetGate (usd=inf, wallclock=inf, depth=20, tool_calls=500) to execute_tactus in the embedded procedure MCP context.
- `builtin_procedures.py`: increase chat agent max_tokens 220→1024, reasoning_effort low→medium; add explicit usage examples for evaluation.run (with budget), procedure.optimize (with budget), and evaluation.find_recent (with evaluation_type).

## S3-backed procedure code storage (plx-07dc0d)

- `service.py`: on procedure creation, upload YAML as `code.tac` to S3 and store the key in `procedure.metadata["code_s3_key"]`. On load, check S3 before falling back to template. Prevents DynamoDB 400 KB item limit from blocking large optimizer YAMLs.
- `s3_utils.py`: add `upload_procedure_file` and `download_procedure_code` helpers.
- `procedure.py` (model): add `metadata` field to Procedure GraphQL model.
- `resource.ts`: add `metadata` field to Procedure Amplify schema.

## procedure_executor.py (plx-07dc0d)

- Remove special `_create_console_plexus_dispatch_tool` branch for console chat; all procedures now use `PydanticAIMCPAdapter` uniformly to expose execute_tactus.
- Fix MCP dir path (one extra `..` removed).
- Inject `plexus` Lua global shim at the top of every procedure source so procedures that use `plexus.*` as a global (not via require) still work.
- Register an effectively unlimited BudgetGate for procedure-internal plexus.* calls so long-running procedures are not killed by the 60 s default budget.

## tactus_adapters/storage.py (plx-07dc0d)

- Wrap `OptimizerResultsService.index_optimizer_run` in try/except RuntimeError so missing `AMPLIFY_STORAGE_TASKATTACHMENTS_BUCKET_NAME` degrades gracefully with a warning instead of crashing the optimizer.

## feedback_alignment_optimizer.yaml (plx-07dc0d)

- Add nil guard before `rubric_memory_context.machine_context` access that caused "attempt to index a nil value" crashes during early optimizer turns.

## plexus execute CLI (plx-07dc0d)

- Fix sys.path construction so `plexus execute` finds the MCP module in all working-directory contexts.

Made-with: Cursor
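
The envelope cap described under "execute_tactus contract hardening" could look roughly like the sketch below. This is not the PR's `_truncate_envelope`; the function name `truncate_envelope`, the `truncated` / `original_chars` / `preview` keys, and the 200-character reserve for the wrapper are all assumptions made for illustration. Only the 40 K character limit comes from the commit message.

```python
import json

MAX_ENVELOPE_CHARS = 40_000  # limit quoted in the commit message


def truncate_envelope(envelope: dict, max_chars: int = MAX_ENVELOPE_CHARS) -> str:
    """Serialize an envelope; swap an oversized payload for a short preview."""
    text = json.dumps(envelope, default=str)
    if len(text) <= max_chars:
        return text

    # Hypothetical fallback shape: a flag plus a preview, so the result is
    # still valid JSON. Reserve some room for the wrapper keys.
    preview_budget = max(0, max_chars - 200)
    while True:
        out = json.dumps(
            {
                "truncated": True,
                "original_chars": len(text),
                "preview": text[:preview_budget],
            }
        )
        overshoot = len(out) - max_chars
        # Escaping quotes inside the preview can push the output over the
        # cap, so shrink the preview by the overshoot and try again.
        if overshoot <= 0 or preview_budget == 0:
            return out
        preview_budget = max(0, preview_budget - overshoot)
```

Returning a small, well-formed JSON object rather than slicing the serialized string keeps the response parseable by the caller, which matters when the consumer is an LLM tool loop.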

…tion, S3 graceful degradation

- Fix plexus.score.predict system prompt (was plexus.predict, shorthand only in MCP wrapper)
- Add metadata-only score.update path (external_id, name, key, description) without creating a new version
- Fix CostEvent JSON serialization crash in procedure output with _json_safe() helper
- Graceful degradation when S3 bucket not configured in persist_task_output_artifact
- Fix console_chat_smoke.py: PLEXUS_CMD support, responseStatus/responseTarget schema fields

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
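
A helper in the spirit of `_json_safe()` might recursively convert values that `json.dumps` rejects. The sketch below is an assumption about the approach, not the code in this PR: the `CostEvent` dataclass is a hypothetical stand-in (the real class is not shown here), and falling back to `str()` for unknown objects is one possible policy.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class CostEvent:
    """Hypothetical stand-in for the real CostEvent, which this PR does not show."""
    usd: float
    model: str


def json_safe(value: Any) -> Any:
    """Recursively convert a value into something json.dumps accepts."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if is_dataclass(value) and not isinstance(value, type):
        return json_safe(asdict(value))
    if isinstance(value, dict):
        return {str(key): json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [json_safe(item) for item in value]
    # Assumed fallback: stringify anything else rather than raise.
    return str(value)
```

Calling `json.dumps(json_safe(output))` then succeeds even when `output` nests dataclass instances or tuples.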

The existing coercion only handled camelCase externalId; snake_case external_id: 47833 passed through unmodified and failed schema validation ("not of type 'string'"), blocking every hypothesis submission. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
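
The fix above amounts to treating both spellings of the key the same way. A minimal sketch, assuming the payload is a plain dict and that the function name `normalize_external_id` is illustrative only:

```python
from typing import Any


def normalize_external_id(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of payload with externalId / external_id coerced to str."""
    normalized = dict(payload)
    for key in ("externalId", "external_id"):
        value = normalized.get(key)
        if value is not None and not isinstance(value, str):
            # e.g. 47833 -> "47833", so schema validation sees a string
            normalized[key] = str(value)
    return normalized
```

For example, `normalize_external_id({"external_id": 47833})` yields `{"external_id": "47833"}`, while string values and absent keys pass through unchanged.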

Remove budget enforcement from all four async dispatch paths (evaluation.run, report.run, procedure.run, procedure.optimize). These are fire-and-forget calls that return a handle immediately — the subprocess runs independently, so carving a child budget from the MCP session cap ($0.25 / 60s) before dispatch was wrong and blocked all async calls. Also add configuration_id support to _default_report_runner so full report configurations can be dispatched async via MCP, and update the tool description to reflect that no budget table is needed for async dispatches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Report blocks and configuration-based reports now automatically use local thread execution in development (default) and remote task dispatcher in cloud deployments (Lambda). Set PLEXUS_REPORT_DISPATCH=remote to enqueue reports through the remote dispatcher — required in Lambda where long-running threads would time out. Omit it (or set to "local") for direct local execution, matching CLI behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
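
The switch described above can be read as a small environment lookup. The sketch below assumes that any value other than `remote` falls back to local execution and that matching is case-insensitive; neither detail is stated in the commit message, and `report_dispatch_mode` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def report_dispatch_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "remote" or "local" based on PLEXUS_REPORT_DISPATCH."""
    env = os.environ if env is None else env
    raw = env.get("PLEXUS_REPORT_DISPATCH", "local").strip().lower()
    # Assumption: unknown values are treated as "local", the stated default.
    return "remote" if raw == "remote" else "local"
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.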

- Report subtitle now shows "Last N days" (or date range) instead of sentinel text, derived from _format_date_window_for_display() in _persist_block_result()
- Procedure name now correctly extracted from YAML by passing code= to Procedure.create()
- Feedback block description and block_title now include date range and scorecard name respectively
- Local async report dispatch spawns subprocess instead of thread (survives MCP server restarts)
- .mcp.json updated to point to py311 env and Plexus-4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Procedure.create now accepts explicit name param and skips storing code in DynamoDB when it exceeds 350KB (large optimizer YAML was hitting DynamoDB's 400KB item limit); code still goes to S3
- procedure.optimize and procedure.run now dispatch as independent subprocesses instead of daemon threads (survive MCP server restarts)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
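
The size guard in the first bullet can be sketched as follows. This is an illustration under assumptions: the record shape, the function name `build_procedure_record`, and reading "350KB" as 350 × 1024 bytes of UTF-8 are not taken from the PR; only the threshold and the `code_s3_key` metadata key appear in the commit messages.

```python
INLINE_CODE_LIMIT_BYTES = 350 * 1024  # assumed reading of the 350KB threshold


def build_procedure_record(name: str, code: str, s3_key: str) -> dict:
    """Build a record that inlines code only when it is small enough."""
    record = {"name": name, "metadata": {"code_s3_key": s3_key}}
    size = len(code.encode("utf-8"))
    if size <= INLINE_CODE_LIMIT_BYTES:
        # Small enough to sit well under the 400KB item limit.
        record["code"] = code
    # Larger code is omitted here; the S3 copy is the source of truth.
    return record
```

Measuring the encoded byte length rather than `len(code)` matters because multi-byte characters count against the item size.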

AccuracyEvaluation.run() was missing the background metrics task drain that the base Evaluation.run() performs in its finally block. This left background tasks racing the outer code's final confusionMatrix write, allowing stale intermediate values to persist after the evaluation completed.

Two fixes:

1. AccuracyEvaluation.run() now awaits pending metrics tasks (10s timeout) before returning, matching the base class pattern.
2. The dataset-backed accuracy final write now unconditionally writes confusionMatrix/predictedClassDistribution/datasetClassDistribution from final_metrics, so this write always wins over any earlier background write.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
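
The drain-then-write ordering can be illustrated with plain asyncio. The sketch below is not the Plexus implementation: `drain_metrics_tasks`, `run_evaluation`, and the choice to cancel tasks still pending at the timeout are assumptions; only the 10 s timeout and the "final write wins" ordering come from the commit message.

```python
import asyncio


async def drain_metrics_tasks(tasks: list[asyncio.Task], timeout: float = 10.0) -> int:
    """Wait for background tasks; return how many were still pending at timeout."""
    unfinished = [task for task in tasks if not task.done()]
    if not unfinished:
        return 0
    _done, pending = await asyncio.wait(unfinished, timeout=timeout)
    for task in pending:
        task.cancel()  # assumption: stragglers are cancelled, not left running
    return len(pending)


async def run_evaluation() -> dict:
    record: dict = {}

    async def background_write() -> None:
        await asyncio.sleep(0.01)
        record["confusionMatrix"] = "intermediate"

    tasks = [asyncio.create_task(background_write())]
    try:
        pass  # scoring work would happen here
    finally:
        await drain_metrics_tasks(tasks)
    # The final write happens after the drain, so it cannot be overwritten.
    record["confusionMatrix"] = "final"
    return record
```

Running `asyncio.run(run_evaluation())` leaves the record holding the final value, because no background task is still alive to overwrite it.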

…ault The accuracy evaluation dispatcher defaulted yaml=True, always appending --yaml to the CLI command. The --yaml flag causes the evaluate accuracy command to suppress scoreVersionId on the evaluation record, so the accuracy baseline appeared to have no version while the feedback baseline had one. Change the default to False so --yaml is only added when explicitly requested. The optimizer always passes --version explicitly, so --yaml is not needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add `acceptance_rate` and `report_acceptance_rate` HELPER_BINDINGS aliases
- Add `("report", "acceptance_rate")` DIRECT_HANDLER → `_call_report_run`
- `_call_report_run` pre-populates `block_class = "AcceptanceRate"` and promotes top-level params (scorecard, score, days, include_item_acceptance_rate, max_items) into block_config when called as `plexus.report.acceptance_rate`
- Fix subprocess dispatch to pass `--include-item-acceptance-rate` and `--max-items` for AcceptanceRate blocks
- Update tool description to list `acceptance_rate` as a high-frequency alias
- Add `plexus/docs/evaluation-and-feedback/acceptance-rate.md` with full param reference, synonym list, and usage examples

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When acceptance_rate{ sync = true } is called, post-process the result:

- Parse the comment-header+JSON string output into a proper dict
- Drop the verbose shard-fetch log (hundreds of lines, not useful to LLMs)
- Strip the items array by default (can be thousands of rows); callers that need per-item rows pass include_items = true
- Drop raw_counts (internal diagnostic, not useful to consumers)

Also add include_items parameter to the doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
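
The post-processing steps above might be sketched as below. Several details are guesses made for illustration: that header lines start with `#`, that the shard-fetch log lives under a `log` key, and the function names. The `items` and `raw_counts` keys and the `include_items` parameter are named in the commit message.

```python
import json


def parse_block_output(raw: str) -> dict:
    """Skip leading comment/blank lines, then parse the rest as JSON."""
    lines = raw.splitlines()
    start = 0
    # Assumption: header lines begin with "#".
    while start < len(lines) and (
        not lines[start].strip() or lines[start].lstrip().startswith("#")
    ):
        start += 1
    return json.loads("\n".join(lines[start:]))


def compact_acceptance_rate(raw: str, include_items: bool = False) -> dict:
    """Return a trimmed acceptance-rate result for LLM consumption."""
    result = parse_block_output(raw)
    result.pop("log", None)         # hypothetical key for the shard-fetch log
    result.pop("raw_counts", None)  # internal diagnostic
    if not include_items:
        result.pop("items", None)   # can be thousands of rows
    return result
```

Dropping keys with `pop(key, None)` keeps the function tolerant of outputs that never contained them.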

The optimizer was tracking repeat-offender items across cycles but telling the agent to AVOID them as "likely label noise" or "approaching the ceiling." For small eval sets (e.g. 51 samples) where 100% accuracy is the goal, that framing was self-defeating: it made the agent give up on exactly the items it needed to fix.

Changes:

1. Inject deterministic item_recurrence summary into BOTH synthesis contexts (Strategy A, Strategy B); it was previously only shown during hypothesis generation, leaving the code-editing phases blind to cross-cycle recurrence.
2. Flip the IMPLICATION text at every injection site to mark PERSISTENT, OSCILLATING, and FLIP_FLOP items as HIGHEST-PRIORITY targets rather than things to avoid. Encourage literal rules, example snippets, item-specific carve-outs, and explicit overfitting.
3. Rewrite the feedback landscape diagnostic's analysis task to produce per-item targeted fix recommendations and an "Aggressive Fix Strategy" section instead of an "Optimization Ceiling Assessment" and "Suspected Low-Quality Feedback Labels" list.
4. Flip the early-stop advisor context so repeat offenders prompt escalation to ultra_creative mode rather than acceptance of a ceiling.
5. Flip the accumulated_lessons item-recurrence instructions to treat recurring items as fixable targets and record which specific hypotheses failed (so future cycles try something different).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Procedure cards now show "Optimizer Procedure" (or any procedure_type set in YAML) in the badge below the ⋯ button, matching EvaluationTask
- procedure_type, score_name, scorecard_name seeded into procedure metadata at creation so subtitle and badge appear immediately without waiting for first Lua State checkpoint
- Procedure.create() now sets status='RUNNING' so Amplify Gen2 realtime subscriptions recognise new records (byStatus index populated)
- Procedure.update() now accepts status and name parameters
- onCreateProcedure subscription re-fetches full record to resolve @belongsTo relations (scorecard/score) that AppSync omits from payloads
- onUpdateProcedure subscription preserves existing scorecard/score/metadata instead of clobbering with nulls from bare subscription payload
- Optimizer procedure name set to "Optimizer: {scorecard}" (title line); score name appears as subtitle via linked score relation
- feedback_alignment_optimizer.yaml declares procedure_type: Optimizer Procedure
- Old runs named "Feedback Alignment Optimizer" or "Optimizer: ..." inferred as Optimizer Procedure for backward compatibility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ew consistency

Add a persistent dispatch-mode indicator to procedure grid and detail cards showing how the procedure is running (Local / Claimed... / Announced...), positioned directly below the score subtitle with no layout jiggle. Move timestamp and elapsed time into the card content below the indicator. Set dispatch_mode in task metadata at creation so the Local label is available immediately without waiting for CommandDispatch.

- Grid card: indicator in header left column (no gap from title/subtitle)
- Detail view: same ordered block: indicator → timestamp → elapsed → notes → segmented bar
- hideTaskStatus=true for both variants; explicit TaskStatus with hideElapsedTime
- workerNodeId + celeryTaskId added to TASK_CARD_FIELDS and wired through transformProcedure
- service.py seeds dispatch_mode into task metadata at creation time (defaults to "local")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

endymion and others added 30 commits April 29, 2026 01:32

Add execute_tactus runtime prototype

e6523ae

Introduce the single-tool Tactus runtime path with tracing, budget gating, long-running operation guards, and direct feedback lookup so Plexus can be exercised as a programmable MCP runtime. Made-with: Cursor

Wire evaluation info into Tactus runtime

35553cf

Made-with: Cursor

Add async handles for runtime run APIs

dd55749

Made-with: Cursor

Propagate cancellation for runtime handles

b0ef8d2

Made-with: Cursor

Honor cancellation in runtime workers

d091057

Add cooperative Task cancellation checkpoints so report and procedure workers stop cleanly after execute_tactus handle cancellation marks dashboard work cancelled. Made-with: Cursor

Make report cancellation polling tolerant

56f2d66

Keep cooperative report cancellation checks from turning unavailable Task refreshes into report failures, and relax the legacy mock expectation to allow intentional status polling. Made-with: Cursor

Record cancellation CI follow-up

d427e30

Capture the cooperative cancellation CI fix and local verification in Kanbus for the execute_tactus handle task. Made-with: Cursor

Stream execute_tactus progress over MCP

5830e35

Bridge Tactus runtime events and Plexus API call progress to FastMCP Context notifications while preserving the final execute_tactus response envelope. Made-with: Cursor

Carve child budgets for runtime handles

6d98b87

Require explicit child budgets for async execute_tactus work so dispatched evaluations, reports, and procedures remain attached to the parent runtime budget. Made-with: Cursor

Enforce child budgets in runtime workers

34e4127

Apply propagated execute_tactus child budgets inside evaluation, report, and procedure workers so long-running child executions fail early when wallclock, depth, or known spend exceeds their allocation. Made-with: Cursor

Expand execute_tactus validation helpers

9d94259

Align the runtime validation contract with explicit async budgets and broaden helper aliases so generated Tactus can use the advertised Plexus API surface directly. Made-with: Cursor

Merge remote-tracking branch 'origin/develop' into feature/plx-07dc0d…

eb0cc37

…-execute-tactus-mcp-tool

Merge branch 'develop' into feature/plx-07dc0d-execute-tactus-mcp-tool

01cd92a

Fix procedure run subprocess: ID is positional not --id flag

4b38a0d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix Procedure.create: accept explicit name param, skip oversized code…

722a51b

… in DynamoDB Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

endymion and others added 11 commits May 1, 2026 14:09

Merge remote-tracking branch 'origin/develop' into feature/plx-07dc0d…

9f7697f

…-execute-tactus-mcp-tool

Document report blocks with live examples

6edb864

Fix CI failures on report docs branch

b000a54

Revert "Fix CI failures on report docs branch"

9a1126a

This reverts commit b000a54.

Add grid-mode regression test for evaluation procedure link

0cd1a5a

Fix procedure optimize result serialization

bd198e5

Fix optimizer score editor external id normalization

b2ddf4b

Fix CI type checks for optimizer branch

002c714

Fix execute tactus CI test drift

1bf18b9

Support local procedure dispatch UI

ebdf70b

endymion requested a review from a team as a code owner May 1, 2026 21:17

endymion requested review from dereknorrbom and removed request for a team May 1, 2026 21:17

endymion merged commit 99d4682 into develop May 1, 2026
15 checks passed

endymion mentioned this pull request May 1, 2026

Promote execute_tactus optimizer and dashboard integration to main #267

Merged

endymion merged 41 commits into develop from feature/plx-07dc0d-execute-tactus-mcp-tool
