Promote execute_tactus optimizer and dashboard integration to main by endymion · Pull Request #267 · AnthusAI/Plexus

endymion · 2026-05-01T23:02:48Z

Summary

This promotes the current develop branch to main after the optimizer/runtime integration work, score-version dashboard association fixes, procedure dispatch status fixes, and CloudWatch procedure log streaming work were merged into develop and verified there.

Major Changes

Execute Tactus runtime and optimizer migration

Migrates the MCP surface to the single execute_tactus runtime path, replacing the older many-tool MCP layout.
Wires plexus.* runtime namespaces into procedures and console/optimizer flows.
Adds runtime handles, child-budget enforcement, progress streaming, cancellation propagation, and evaluation info support.
Updates the feedback alignment optimizer YAML to use the new Tactus/Plexus module flow and preserves the recent optimizer briefing/context work.
Adds validation harnesses and runtime documentation for discovery, read APIs, handles/budgets, reports, procedures, evaluation/feedback, and score/dataset authoring.
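Child-budget enforcement, as described here, means a parent runtime budget carves a bounded allocation for each async child. The sketch below is minimal and assumes a simplified `BudgetGate` with USD, wallclock, and depth dimensions. The dataclass shapes, the `DEFAULT_CHILD_BUDGET` values, and the reservation policy are hypothetical, not the Plexus implementation. The sketch mirrors the behavior the commits describe: an explicit child budget is required unless the parent is effectively unbounded.

```python
import math
from dataclasses import dataclass
from typing import Optional


class ChildBudgetRequired(Exception):
    """Raised when async child work is dispatched without an explicit budget."""


@dataclass(frozen=True)
class Budget:
    usd: float
    wallclock_s: float
    depth: int


# Hypothetical default, used only when the parent gate is unbounded.
DEFAULT_CHILD_BUDGET = Budget(usd=5.0, wallclock_s=3600.0, depth=3)


@dataclass
class BudgetGate:
    remaining: Budget

    def carve_child(self, requested: Optional[Budget]) -> Budget:
        parent = self.remaining
        unbounded = math.isinf(parent.usd) and math.isinf(parent.wallclock_s)
        if requested is None:
            if not unbounded:
                raise ChildBudgetRequired("async child work needs an explicit budget")
            requested = DEFAULT_CHILD_BUDGET
        # A child never receives more than the parent still has, and it
        # sits one nesting level deeper than its parent.
        child = Budget(
            usd=min(requested.usd, parent.usd),
            wallclock_s=min(requested.wallclock_s, parent.wallclock_s),
            depth=min(requested.depth, parent.depth - 1),
        )
        if child.depth < 0:
            raise RuntimeError("maximum nesting depth exceeded")
        # Reserve the child's spend up front so siblings cannot reuse it;
        # wallclock is shared because children run concurrently.
        self.remaining = Budget(
            usd=parent.usd - child.usd,
            wallclock_s=parent.wallclock_s,
            depth=parent.depth,
        )
        return child
```

Reserving spend at carve time keeps sibling children from jointly exceeding the parent's allocation. Wallclock is only capped, not subtracted, because concurrent children consume the same elapsed time.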

Optimizer, evaluation, and reporting reliability

Fixes optimizer dispatch and serialization problems around score version IDs, external IDs, procedure naming, oversized YAML/code storage, and local async dispatch.
Adds acceptance-rate report support to the Tactus runtime.
Fixes confusion-matrix overwrite behavior in concurrent accuracy evaluations.
Disables the optional FeedbackAnalysis memory fanout by default to prevent empty-deque failures.
Adds score rubric consistency preflight tooling and related tests.
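One of the reliability fixes above concerns oversized YAML/code storage. Elsewhere in this PR, the approach is described as follows: upload the procedure YAML to S3 as `code.tac`, keep only the key in `procedure.metadata["code_s3_key"]`, and fall back to the template on load. This keeps large optimizer YAMLs out of the 400 KB DynamoDB item. The sketch below shows that save-key/load-with-fallback pattern. It assumes an in-memory stand-in for the bucket. `InMemoryObjectStore`, `ProcedureRecord`, the helper names, and the `procedures/{id}/code.tac` key layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class InMemoryObjectStore:
    """Stand-in for an S3 bucket; only put/get are modeled."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


@dataclass
class ProcedureRecord:
    id: str
    metadata: Dict[str, str] = field(default_factory=dict)


def save_procedure_code(store: InMemoryObjectStore, procedure: ProcedureRecord, yaml_text: str) -> str:
    # Hypothetical key layout; the PR only says the object is named code.tac.
    key = f"procedures/{procedure.id}/code.tac"
    store.put(key, yaml_text.encode("utf-8"))
    # Only the small key lives on the database item, not the YAML itself.
    procedure.metadata["code_s3_key"] = key
    return key


def load_procedure_code(store: InMemoryObjectStore, procedure: ProcedureRecord, template: str) -> str:
    key = procedure.metadata.get("code_s3_key")
    if key:
        body = store.get(key)
        if body is not None:
            return body.decode("utf-8")
    # Fall back to the template when no stored code is available.
    return template
```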

Dashboard score-version associations and result navigation

Adds direct score-version associations/indexing for score-version procedures and evaluations.
Fixes score-version procedure/evaluation tab data loading so evaluations load through direct associations and procedures have the required indexed access path once deployed.
Improves related resource cards, score evaluation/procedure lists, optimizer result cards, and evaluation score-result item filtering.
Canonicalizes evaluation identifier flow from producers into dashboard/share views.

Procedure dispatch status and dashboard UX

Stabilizes procedure grid cards, local procedure dispatch indicators, task status rendering, and realtime procedure metadata merging.
Preserves local-run status and prevents empty pending rows for local/direct procedure dispatch.
Adds regression coverage for task status, procedure dashboard loading, optimizer auth, and score-version related lists.

CloudWatch procedure run logging

Streams procedure run logs to CloudWatch log groups under /plexus/procedures/{account_key}.
Adds per-invocation run and LLM-context streams and records the CloudWatch metadata on procedure runs.
Adds IAM permissions for console run workers to write logs and authenticated dashboard clients to read logs.
Adds the dashboard CloudWatch Logs client utility and locks the new AWS SDK dependency.

Documentation and Kanbus

Updates agent/MCP instructions to reflect execute_tactus as the standard Plexus access path.
Adds and updates Kanbus issue/event artifacts for the completed optimizer, dashboard, dispatch, and CloudWatch integration work.

Included PRs

Stabilize optimizer scoring and rubric consistency checks (#257)
Add RCA category filter feedback (#258)
fix: canonical evaluation identifier flow (producer + dashboard/share views) (#259)
fix(reports): disable optional memory fanout for FeedbackAnalysis by default (#204)
Fix procedure grid dispatch stability (#261)
Execute Tactus local dispatch and optimizer fixes (#262)
Merge evaluation/procedure linkage and dashboard refinements into develop (#263)
Add score-version procedure association index (#264)
Stream procedure run logs to CloudWatch (#265)
Close CloudWatch procedure log Kanbus task (#266)

Verification

develop CI passed for merge commit b2315bdf of PR #265 (Stream procedure run logs to CloudWatch) in run 25236213294.
develop CI passed for the current tip 4413d0c in run 25236542873.
Earlier local verification during integration included dashboard typecheck, targeted dashboard unit tests, selected Python CLI/shared tests, and Python compile checks for the CloudWatch logger/executor files.

Deployment Notes

The score-version procedure association GSI/resource change must be deployed before the score-version Procedures tab can query procedure records through the intended direct index in deployed environments.
CloudWatch log groups/streams are created by procedure invocations; dashboard display wiring for those logs is intentionally left to separate follow-up work.

- Replace the entire legacy MCP tool catalog (scorecard, score, evaluation, feedback, item, prediction, dataset, report, rubric_memory, etc.) with a single `execute_tactus` tool that exposes all Plexus functionality via the `plexus.*` Tactus runtime API
- Add `plexus.score.contradictions` for rubric vs. code consistency checks
- Add `score_rubric_consistency_check` option to `plexus.evaluation.run`
- Add `plexus.procedure.optimize` shortcut for launching the feedback alignment optimizer with standard parameters
- Add `plexus execute` CLI command for local Tactus snippet testing
- Rename `run_experiment` → `run_procedure` and `run_experiment_with_task_tracking` → `run_procedure_with_task_tracking` throughout the codebase to match domain terminology
- Delete `procedure_sop_agent`, `sop_agent_base`, `demo_ai_mcp_integration`, `model_config_examples`, and all associated tests (legacy LangGraph-based optimizer prototype; superseded by Tactus procedures)
- Remove SOPAgent routing from `procedure_executor.py`; only `class: Tactus` procedures are supported going forward
- Reorganise and expand Plexus documentation under `plexus/docs/` with topic-based subdirectories and new guides for the Tactus runtime API

Made-with: Cursor

- Add score.pull/update/test, feedback.latest_update, rubric_memory.* namespaces to PlexusRuntimeModule DIRECT_HANDLERS with full _default_* implementations
- Add _default_report_runner_sync for synchronous report execution needed by optimizer; route plexus.report.run(sync=true) through it
- Add --emit-id-file CLI option to plexus evaluate accuracy/feedback so _default_evaluation_runner can capture evaluation_id from background subprocess for handle tracking
- Construct and register PlexusRuntimeModule in procedure_executor.py so Tactus/Lua procedure code can call plexus.* directly
- Create rubric_memory_toolset.py: in-process MCP tools for plexus_rubric_memory_* sub-agent tools
- Replace legacy MCP tool calls in ScoreEditorToolset with direct _default_score_pull / _default_score_update calls
- Rewrite feedback_alignment_optimizer.yaml call_plexus_tool to use plexus.* APIs directly; batch evaluations via handle protocol; synchronous reports via sync=true; score pull via temp files
- Update execute_test.py and test_score_editor_toolset.py to mock new direct-call interfaces

Made-with: Cursor
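The `--emit-id-file` option lets a parent process learn an ID from a background subprocess without waiting for it to finish. The sketch below covers the parent side only. It assumes the child writes the ID as plain text to the path that follows the flag. `launch_with_id_capture` and its polling parameters are hypothetical, and this is not `_default_evaluation_runner` itself.

```python
import os
import shutil
import subprocess
import tempfile
import time
from typing import List, Optional, Tuple


def launch_with_id_capture(
    cmd: List[str], timeout_s: float = 30.0, poll_s: float = 0.1
) -> Tuple[subprocess.Popen, Optional[str]]:
    """Start a background command and wait until it reports its ID via a file.

    The child is expected to write the ID to the path that follows
    --emit-id-file and may keep running afterwards.
    """
    tmp_dir = tempfile.mkdtemp(prefix="emit-id-")
    id_path = os.path.join(tmp_dir, "evaluation.id")
    proc = subprocess.Popen([*cmd, "--emit-id-file", id_path])
    deadline = time.monotonic() + timeout_s
    evaluation_id: Optional[str] = None
    try:
        while time.monotonic() < deadline:
            # Check exit status first: if the child has already exited,
            # whatever is in the file now is final.
            exited = proc.poll() is not None
            if os.path.exists(id_path):
                with open(id_path, encoding="utf-8") as fh:
                    text = fh.read().strip()
                if text:
                    evaluation_id = text
                    break
            if exited:
                break  # child finished without emitting a usable ID
            time.sleep(poll_s)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return proc, evaluation_id
```

A child that writes the file atomically (write to a temporary name, then rename) avoids partial reads. The empty-content check above is only a fallback for that case.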

…code storage

Closes plx-62b442, plx-51488a, plx-f804a6. Updates plx-61c332, plx-07dc0d. Adds plx-71ad53 (remaining L4 integration tests).

## execute_tactus contract hardening (plx-f804a6)

- Add `_truncate_envelope` helper: caps execute_tactus JSON responses at 40 K chars to prevent LLM context-window overflow from large evaluation / scorecard payloads.
- `BudgetGate.carve_child`: when the parent gate is effectively infinite (usd=inf, wallclock=inf, as in the embedded chat MCP context), auto-supply a generous default child budget instead of raising ChildBudgetRequired. Callers inside chat no longer need explicit `budget = { ... }` for async evaluation / procedure calls.
- `_default_score_update`: set `isFeatured: "false"` on new ScoreVersion records so optimizer-created versions are not featured by default.
- `_default_score_test`: remove erroneous lambda wrapper around coroutine, fixing asyncio awaitable error.
- `_default_score_pull`: write YAML and guidelines to temp files and return their paths so sandboxed Lua code can read them via File.read() without needing the io library.
- `_default_procedure_optimize`: dispatch optimizer via background daemon thread so the chat agent receives procedure_id immediately (~49 s) instead of blocking for hours.

## Console chat fixed end-to-end (plx-61c332, plx-62b442)

- `chat_agent.tac` `extract_text`: handle Lupa userdata (Lua receives Plexus Python objects as `userdata`, not `table`) using pcall attribute access; checks response/content/message/text keys and indexed first element.
- Remove `MessageHistory.get()` auto-load: history now comes exclusively from `console_session_history` passed by the caller, preventing cross-turn context bleed that caused 300 K–667 K token overflows.
- Add `assistant.output` fallback with garbage filter: filters out Python model reprs like "UsageStats" and "output=None" that appeared when the LLM returned without tool use.
- `mcp_transport.py`: pass a permissive BudgetGate (usd=inf, wallclock=inf, depth=20, tool_calls=500) to execute_tactus in the embedded procedure MCP context.
- `builtin_procedures.py`: increase chat agent max_tokens 220→1024, reasoning_effort low→medium; add explicit usage examples for evaluation.run (with budget), procedure.optimize (with budget), and evaluation.find_recent (with evaluation_type).

## S3-backed procedure code storage (plx-07dc0d)

- `service.py`: on procedure creation, upload YAML as `code.tac` to S3 and store the key in `procedure.metadata["code_s3_key"]`. On load, check S3 before falling back to template. Prevents DynamoDB 400 KB item limit from blocking large optimizer YAMLs.
- `s3_utils.py`: add `upload_procedure_file` and `download_procedure_code` helpers.
- `procedure.py` (model): add `metadata` field to Procedure GraphQL model.
- `resource.ts`: add `metadata` field to Procedure Amplify schema.

## procedure_executor.py (plx-07dc0d)

- Remove special `_create_console_plexus_dispatch_tool` branch for console chat; all procedures now use `PydanticAIMCPAdapter` uniformly to expose execute_tactus.
- Fix MCP dir path (one extra `..` removed).
- Inject `plexus` Lua global shim at the top of every procedure source so procedures that use `plexus.*` as a global (not via require) still work.
- Register an effectively unlimited BudgetGate for procedure-internal plexus.* calls so long-running procedures are not killed by the 60 s default budget.

## tactus_adapters/storage.py (plx-07dc0d)

- Wrap `OptimizerResultsService.index_optimizer_run` in try/except RuntimeError so missing `AMPLIFY_STORAGE_TASKATTACHMENTS_BUCKET_NAME` degrades gracefully with a warning instead of crashing the optimizer.

## feedback_alignment_optimizer.yaml (plx-07dc0d)

- Add nil guard before `rubric_memory_context.machine_context` access that caused "attempt to index a nil value" crashes during early optimizer turns.

## plexus execute CLI (plx-07dc0d)

- Fix sys.path construction so `plexus execute` finds the MCP module in all working-directory contexts.

Made-with: Cursor
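The 40 K-character response cap can be sketched as a serializer that never cuts a JSON document mid-string. This minimal sketch assumes the envelope is a JSON-serializable dict and that an oversized payload is replaced by a string `preview`. The field names `truncated`, `original_chars`, and `preview`, and the helper name `truncate_envelope`, are hypothetical and not necessarily what `_truncate_envelope` returns.

```python
import json
from typing import Any, Dict

MAX_ENVELOPE_CHARS = 40_000  # cap stated in the commit message


def truncate_envelope(envelope: Dict[str, Any], limit: int = MAX_ENVELOPE_CHARS) -> str:
    """Serialize a response, replacing oversized payloads with a JSON-safe preview."""
    text = json.dumps(envelope, default=str)
    if len(text) <= limit:
        return text
    marker = {"truncated": True, "original_chars": len(text)}
    # Length of the wrapper with an empty preview tells us how much room is left.
    overhead = len(json.dumps({**marker, "preview": ""}))
    preview = text[: max(0, limit - overhead)]
    out = json.dumps({**marker, "preview": preview})
    # Escaping (quotes, backslashes) can push the result over the limit;
    # dropping that many raw characters shrinks the output by at least as much.
    while len(out) > limit and preview:
        preview = preview[: max(0, len(preview) - (len(out) - limit))]
        out = json.dumps({**marker, "preview": preview})
    return out
```

Returning a valid, smaller JSON object instead of a sliced string keeps the response parseable by the calling agent while still signaling how much was dropped.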

Each procedure invocation opens two CloudWatch log streams under /plexus/procedures/{account_key}:

- {procedure_id}/run/{invocation_run_id}: lifecycle/tool/cost events
- {procedure_id}/llm-context/{invocation_run_id}: full LLM prompt_context JSON

Events are written directly on each call (no buffering) so logs appear live during execution. The log group and stream prefix are stored in procedure.metadata so the dashboard can locate them by convention.

IAM write permissions added to the consoleRunWorker Lambda role. IAM read permissions added to the Amplify authenticated Cognito role. Frontend utility added at dashboard/utils/cloudwatch-logs-client.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
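The group and stream naming convention in this commit can be sketched with a boto3-style CloudWatch Logs client passed in by the caller. `procedure_log_location`, `ensure_streams`, and `write_event` are hypothetical helpers, not the Plexus logger. The client methods used (`create_log_group`, `create_log_stream`, `put_log_events`) and `exceptions.ResourceAlreadyExistsException` are standard on a boto3 `logs` client.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcedureLogLocation:
    log_group: str
    run_stream: str
    llm_context_stream: str


def procedure_log_location(account_key: str, procedure_id: str, invocation_run_id: str) -> ProcedureLogLocation:
    # Group and stream names follow the convention in the commit message.
    return ProcedureLogLocation(
        log_group=f"/plexus/procedures/{account_key}",
        run_stream=f"{procedure_id}/run/{invocation_run_id}",
        llm_context_stream=f"{procedure_id}/llm-context/{invocation_run_id}",
    )


def ensure_streams(logs_client, loc: ProcedureLogLocation) -> None:
    """Create the group and both streams, ignoring ones that already exist."""
    already_exists = logs_client.exceptions.ResourceAlreadyExistsException
    try:
        logs_client.create_log_group(logGroupName=loc.log_group)
    except already_exists:
        pass
    for stream in (loc.run_stream, loc.llm_context_stream):
        try:
            logs_client.create_log_stream(logGroupName=loc.log_group, logStreamName=stream)
        except already_exists:
            pass


def write_event(logs_client, log_group: str, stream: str, event: dict) -> None:
    """Send one event immediately (no buffering) so it shows up live."""
    logs_client.put_log_events(
        logGroupName=log_group,
        logStreamName=stream,
        logEvents=[{"timestamp": int(time.time() * 1000), "message": json.dumps(event)}],
    )
```

With boto3, `logs_client` would be `boto3.client("logs")`. Deriving both stream names from `procedure_id` and `invocation_run_id` means a reader that knows the stored prefix can find a run's logs without a separate lookup table.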

dereknorrbom and others added 30 commits April 23, 2026 20:38

fix(reports): default FeedbackAnalysis memory analysis to off

d37a447

Add execute_tactus runtime prototype

e6523ae

Introduce the single-tool Tactus runtime path with tracing, budget gating, long-running operation guards, and direct feedback lookup so Plexus can be exercised as a programmable MCP runtime. Made-with: Cursor

Wire evaluation info into Tactus runtime

35553cf

Made-with: Cursor

Add async handles for runtime run APIs

dd55749

Made-with: Cursor

Propagate cancellation for runtime handles

b0ef8d2

Made-with: Cursor

Honor cancellation in runtime workers

d091057

Add cooperative Task cancellation checkpoints so report and procedure workers stop cleanly after execute_tactus handle cancellation marks dashboard work cancelled. Made-with: Cursor

Make report cancellation polling tolerant

56f2d66

Keep cooperative report cancellation checks from turning unavailable Task refreshes into report failures, and relax the legacy mock expectation to allow intentional status polling. Made-with: Cursor
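The tolerant cancellation polling described in these two commits can be sketched as a checkpoint that workers call between units of work. This minimal sketch assumes a `fetch_status` callable that returns the Task status string. `cancellation_checkpoint`, `run_report_blocks`, `TaskCancelled`, and the `"CANCELLED"` status value are hypothetical names, not the Plexus worker code.

```python
import logging
from typing import Callable, List, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class TaskCancelled(Exception):
    """Raised at a checkpoint once the dashboard Task is marked cancelled."""


def cancellation_checkpoint(fetch_status: Callable[[], Optional[str]]) -> None:
    """Raise TaskCancelled if the Task was cancelled; otherwise return.

    A failed status refresh is treated as "not cancelled", so a transient
    API problem does not turn into a worker failure.
    """
    try:
        status = fetch_status()
    except Exception as exc:  # refresh unavailable: keep working
        logger.warning("Task status refresh failed; continuing: %s", exc)
        return
    if status is not None and status.upper() == "CANCELLED":
        raise TaskCancelled("task was cancelled from the dashboard")


def run_report_blocks(
    blocks: List[Callable[[], T]], fetch_status: Callable[[], Optional[str]]
) -> List[T]:
    """Run blocks in order, checking for cancellation before each one."""
    results: List[T] = []
    for block in blocks:
        cancellation_checkpoint(fetch_status)
        results.append(block())
    return results
```

The trade-off is that a worker may run one extra unit of work while the status API is unreachable, in exchange for never failing a report just because a refresh call errored.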

Record cancellation CI follow-up

d427e30

Capture the cooperative cancellation CI fix and local verification in Kanbus for the execute_tactus handle task. Made-with: Cursor

Stream execute_tactus progress over MCP

5830e35

Bridge Tactus runtime events and Plexus API call progress to FastMCP Context notifications while preserving the final execute_tactus response envelope. Made-with: Cursor

Carve child budgets for runtime handles

6d98b87

Require explicit child budgets for async execute_tactus work so dispatched evaluations, reports, and procedures remain attached to the parent runtime budget. Made-with: Cursor

Enforce child budgets in runtime workers

34e4127

Apply propagated execute_tactus child budgets inside evaluation, report, and procedure workers so long-running child executions fail early when wallclock, depth, or known spend exceeds their allocation. Made-with: Cursor

Expand execute_tactus validation helpers

9d94259

Align the runtime validation contract with explicit async budgets and broaden helper aliases so generated Tactus can use the advertised Plexus API surface directly. Made-with: Cursor

Fix optimizer score versions default featured flag

e5a3fdb

Add score rubric consistency preflight

eaa2da9

Move rubric consistency command to score CLI

f501e80

Fix evaluation RCA item filtering UI

34ac856

Expand evaluation category filter linkage

3e22ce0

Merge remote-tracking branch 'origin/develop' into bugfix/plx-e4f63d-…

22b8ae2

…optimizer-candidates-unfeatured

Fix evaluation category View items filtering

15dfe3e

Merge pull request #257 from AnthusAI/bugfix/plx-e4f63d-optimizer-can…

63184db

…didates-unfeatured Stabilize optimizer scoring and rubric consistency checks

fix(dashboard): unify evaluation score-result transformation

35b5035

chore(kanbus): sync issue artifacts

b64ab99

Add explicit feedback for RCA category filtering

0672bce

Merge remote-tracking branch 'origin/develop' into feature/plx-07dc0d…

eb0cc37

…-execute-tactus-mcp-tool

fix(evaluations): canonicalize identifier flow from producer to views

c472811

chore(kanbus): sync issue and event artifacts

32d6751

Merge pull request #259 from AnthusAI/bugfix/evaluations-shared-ident…

50b2cdf

…ifier-display fix: canonical evaluation identifier flow (producer + dashboard/share views)

endymion and others added 25 commits May 1, 2026 16:01

Fix procedure optimize result serialization

bd198e5

Fix optimizer score editor external id normalization

b2ddf4b

Fix CI type checks for optimizer branch

002c714

Fix execute tactus CI test drift

1bf18b9

Support local procedure dispatch UI

ebdf70b

Merge pull request #262 from AnthusAI/feature/plx-07dc0d-execute-tact…

99d4682

…us-mcp-tool Execute Tactus local dispatch and optimizer fixes

Refine evaluation related-link cards and spacing

3ef9abd

Commit remaining local evaluation/procedure changes

fa89ec8

Merge pull request #263 from AnthusAI/feature/plx-07dc0d-execute-tact…

7c4741d

…us-mcp-tool Merge evaluation/procedure linkage and dashboard refinements into develop

Merge main into develop

8b4c0b6

Fix PublicEvaluation test data-operations mock

d12eeed

Add score-version procedure association index

697a7e1

Merge pull request #264 from AnthusAI/feature/plx-07dc0d-execute-tact…

d343a96

…us-mcp-tool Add score-version procedure association index

Merge pull request #204 from AnthusAI/bugfix/feedback-analysis-empty-…

3f46d8c

…deque-from-days-mode fix(reports): disable optional memory fanout for FeedbackAnalysis by default

Merge develop into procedure dispatch branch

813c652

Record PR 261 conflict resolution task

e8ba3f9

Merge pull request #261 from AnthusAI/bugfix/plx-77865f-procedure-dis…

cfa14d9

…patch-indicator Fix procedure grid dispatch stability

Merge remote-tracking branch 'origin/develop' into feature/plx-07dc0d…

f9c1b50

…-execute-tactus-mcp-tool

Record CloudWatch procedure log integration task

a0375eb

Fix CloudWatch log client region resolution

7af6d0c

Record CloudWatch log integration verification

7d8f2dd

Merge pull request #265 from AnthusAI/feature/plx-07dc0d-execute-tact…

b2315bd

…us-mcp-tool Stream procedure run logs to CloudWatch

Close CloudWatch procedure log Kanbus task

120d7b5

Merge pull request #266 from AnthusAI/chore/plx-5604df-close-cloudwat…

4413d0c

…ch-kanbus Close CloudWatch procedure log Kanbus task

endymion requested a review from a team as a code owner May 1, 2026 23:02

endymion requested review from dereknorrbom and removed request for a team May 1, 2026 23:02

dereknorrbom approved these changes May 1, 2026

dereknorrbom merged commit 1786810 into main May 1, 2026
21 checks passed

dereknorrbom merged 86 commits into main from develop

2 participants
