Promote execute_tactus optimizer and dashboard integration to main #267
Merged
dereknorrbom merged 86 commits into main on May 1, 2026
Introduce the single-tool Tactus runtime path with tracing, budget gating, long-running operation guards, and direct feedback lookup so Plexus can be exercised as a programmable MCP runtime. Made-with: Cursor
Add cooperative Task cancellation checkpoints so report and procedure workers stop cleanly after execute_tactus handle cancellation marks dashboard work cancelled. Made-with: Cursor
Keep cooperative report cancellation checks from turning unavailable Task refreshes into report failures, and relax the legacy mock expectation to allow intentional status polling. Made-with: Cursor
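The cooperative cancellation behaviour described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the Plexus implementation: `CancellationCheckpoint`, `TaskCancelled`, `fetch_status`, and `run_steps` are hypothetical names, and the `"CANCELLED"` status string is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class TaskCancelled(Exception):
    """Raised at a checkpoint when the tracked Task has been marked cancelled."""


@dataclass
class CancellationCheckpoint:
    # Hypothetical callable: returns the current Task status string, or None
    # when the Task record cannot be refreshed right now.
    fetch_status: Callable[[], Optional[str]]

    def check(self) -> None:
        status = self.fetch_status()
        if status is None:
            # An unavailable refresh is not a failure; keep working.
            return
        if status.upper() == "CANCELLED":
            raise TaskCancelled("task was marked cancelled")


def run_steps(steps: Iterable[Callable[[], None]],
              checkpoint: CancellationCheckpoint) -> str:
    """Run steps in order, polling for cancellation before each one."""
    try:
        for step in steps:
            checkpoint.check()
            step()
    except TaskCancelled:
        return "cancelled"
    return "completed"
```

The worker only stops at a checkpoint, so a step that is already running finishes before the cancellation takes effect.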
Capture the cooperative cancellation CI fix and local verification in Kanbus for the execute_tactus handle task. Made-with: Cursor
Bridge Tactus runtime events and Plexus API call progress to FastMCP Context notifications while preserving the final execute_tactus response envelope. Made-with: Cursor
Require explicit child budgets for async execute_tactus work so dispatched evaluations, reports, and procedures remain attached to the parent runtime budget. Made-with: Cursor
Apply propagated execute_tactus child budgets inside evaluation, report, and procedure workers so long-running child executions fail early when wallclock, depth, or known spend exceeds their allocation. Made-with: Cursor
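As an illustration of the early-failure check this commit describes, here is a small sketch of a worker-side budget guard. The class and field names (`ChildBudget`, `BudgetExceeded`, `max_depth`) are assumptions for the example and are not taken from the Plexus codebase.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a child execution goes past its allocation."""


@dataclass
class ChildBudget:
    usd: float                 # maximum known spend allowed
    wallclock_seconds: float   # maximum elapsed time allowed
    max_depth: int             # maximum nesting depth allowed
    started_at: float = field(default_factory=time.monotonic)

    def check(self, spent_usd: float, depth: int,
              now: Optional[float] = None) -> None:
        """Fail early if wallclock, depth, or known spend exceeds the budget."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.started_at
        if elapsed > self.wallclock_seconds:
            raise BudgetExceeded(
                f"wallclock {elapsed:.1f}s exceeds {self.wallclock_seconds:.1f}s")
        if depth > self.max_depth:
            raise BudgetExceeded(f"depth {depth} exceeds {self.max_depth}")
        if spent_usd > self.usd:
            raise BudgetExceeded(
                f"spend ${spent_usd:.2f} exceeds ${self.usd:.2f}")
```

A worker would call `check` between units of work, so an over-budget child stops at the next unit boundary instead of running to completion.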
Align the runtime validation contract with explicit async budgets and broaden helper aliases so generated Tactus can use the advertised Plexus API surface directly. Made-with: Cursor
…optimizer-candidates-unfeatured
…didates-unfeatured Stabilize optimizer scoring and rubric consistency checks
…-execute-tactus-mcp-tool
- Replace the entire legacy MCP tool catalog (scorecard, score, evaluation, feedback, item, prediction, dataset, report, rubric_memory, etc.) with a single `execute_tactus` tool that exposes all Plexus functionality via the `plexus.*` Tactus runtime API
- Add `plexus.score.contradictions` for rubric vs. code consistency checks
- Add `score_rubric_consistency_check` option to `plexus.evaluation.run`
- Add `plexus.procedure.optimize` shortcut for launching the feedback alignment optimizer with standard parameters
- Add `plexus execute` CLI command for local Tactus snippet testing
- Rename `run_experiment` → `run_procedure` and `run_experiment_with_task_tracking` → `run_procedure_with_task_tracking` throughout the codebase to match domain terminology
- Delete `procedure_sop_agent`, `sop_agent_base`, `demo_ai_mcp_integration`, `model_config_examples`, and all associated tests (legacy LangGraph-based optimizer prototype; superseded by Tactus procedures)
- Remove SOPAgent routing from `procedure_executor.py`; only `class: Tactus` procedures are supported going forward
- Reorganise and expand Plexus documentation under `plexus/docs/` with topic-based subdirectories and new guides for the Tactus runtime API
Made-with: Cursor
…ifier-display fix: canonical evaluation identifier flow (producer + dashboard/share views)
- Add score.pull/update/test, feedback.latest_update, rubric_memory.* namespaces to PlexusRuntimeModule DIRECT_HANDLERS with full _default_* implementations
- Add _default_report_runner_sync for synchronous report execution needed by optimizer; route plexus.report.run(sync=true) through it
- Add --emit-id-file CLI option to plexus evaluate accuracy/feedback so _default_evaluation_runner can capture evaluation_id from background subprocess for handle tracking
- Construct and register PlexusRuntimeModule in procedure_executor.py so Tactus/Lua procedure code can call plexus.* directly
- Create rubric_memory_toolset.py: in-process MCP tools for plexus_rubric_memory_* sub-agent tools
- Replace legacy MCP tool calls in ScoreEditorToolset with direct _default_score_pull / _default_score_update calls
- Rewrite feedback_alignment_optimizer.yaml call_plexus_tool to use plexus.* APIs directly; batch evaluations via handle protocol; synchronous reports via sync=true; score pull via temp files
- Update execute_test.py and test_score_editor_toolset.py to mock new direct-call interfaces
Made-with: Cursor
…code storage
Closes plx-62b442, plx-51488a, plx-f804a6.
Updates plx-61c332, plx-07dc0d.
Adds plx-71ad53 (remaining L4 integration tests).
## execute_tactus contract hardening (plx-f804a6)
- Add `_truncate_envelope` helper: caps execute_tactus JSON responses at 40 K chars
to prevent LLM context-window overflow from large evaluation / scorecard payloads.
- `BudgetGate.carve_child`: when the parent gate is effectively infinite (usd=inf,
wallclock=inf — as in the embedded chat MCP context), auto-supply a generous default
child budget instead of raising ChildBudgetRequired. Callers inside chat no longer
need explicit `budget = { ... }` for async evaluation / procedure calls.
- `_default_score_update`: set `isFeatured: "false"` on new ScoreVersion records so
optimizer-created versions are not featured by default.
- `_default_score_test`: remove erroneous lambda wrapper around coroutine, fixing
asyncio awaitable error.
- `_default_score_pull`: write YAML and guidelines to temp files and return their paths
so sandboxed Lua code can read them via File.read() without needing the io library.
- `_default_procedure_optimize`: dispatch optimizer via background daemon thread so
the chat agent receives procedure_id immediately (~49 s) instead of blocking for hours.
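The `BudgetGate.carve_child` change in the list above can be pictured with the following sketch. It is an assumption-laden illustration: the `Budget` dataclass, the `DEFAULT_CHILD` values, and the clamping of explicit requests to the parent are all hypothetical; the source only states that an effectively infinite parent now yields a generous default instead of raising `ChildBudgetRequired`.

```python
import math
from dataclasses import dataclass
from typing import Optional


class ChildBudgetRequired(Exception):
    """Raised when a finite parent is asked for a child without an explicit budget."""


@dataclass(frozen=True)
class Budget:
    usd: float
    wallclock_seconds: float


# Hypothetical default; the change notes only call it "generous".
DEFAULT_CHILD = Budget(usd=5.0, wallclock_seconds=3600.0)


def carve_child(parent: Budget, requested: Optional[Budget] = None) -> Budget:
    """Return the budget a child execution should run under."""
    if requested is not None:
        # Assumption: an explicit request is clamped to what the parent has.
        return Budget(
            usd=min(requested.usd, parent.usd),
            wallclock_seconds=min(requested.wallclock_seconds,
                                  parent.wallclock_seconds),
        )
    if math.isinf(parent.usd) and math.isinf(parent.wallclock_seconds):
        # Effectively infinite parent (e.g. embedded chat): supply a default.
        return DEFAULT_CHILD
    raise ChildBudgetRequired("async child work needs an explicit budget")
```

With a finite parent the explicit-budget requirement still applies, so only callers running under an unbounded gate get the default.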
## Console chat fixed end-to-end (plx-61c332, plx-62b442)
- `chat_agent.tac` `extract_text`: handle Lupa userdata (Lua receives Plexus Python
objects as `userdata`, not `table`) using pcall attribute access; checks
response/content/message/text keys and indexed first element.
- Remove `MessageHistory.get()` auto-load: history now comes exclusively from
`console_session_history` passed by the caller, preventing cross-turn context bleed
that caused 300 K–667 K token overflows.
- Add `assistant.output` fallback with garbage filter: filters out Python model reprs
like "UsageStats" and "output=None" that appeared when the LLM returned without
tool use.
- `mcp_transport.py`: pass a permissive BudgetGate (usd=inf, wallclock=inf, depth=20,
tool_calls=500) to execute_tactus in the embedded procedure MCP context.
- `builtin_procedures.py`: increase chat agent max_tokens 220→1024, reasoning_effort
low→medium; add explicit usage examples for evaluation.run (with budget),
procedure.optimize (with budget), and evaluation.find_recent (with evaluation_type).
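The `assistant.output` garbage filter mentioned in the list above might look something like this sketch. The function name and the exact matching rule are assumptions; only the two markers, "UsageStats" and "output=None", come from the change notes.

```python
from typing import Optional

# Markers named in the change notes; the real filter may check more.
GARBAGE_MARKERS = ("UsageStats", "output=None")


def clean_assistant_output(text: Optional[str]) -> Optional[str]:
    """Return usable assistant text, or None if it looks like a Python model repr."""
    if text is None:
        return None
    stripped = text.strip()
    if not stripped:
        return None
    if any(marker in stripped for marker in GARBAGE_MARKERS):
        # Looks like a stringified result object rather than a real reply.
        return None
    return stripped
```

Returning `None` lets the caller fall through to another text source rather than showing a repr to the user.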
## S3-backed procedure code storage (plx-07dc0d)
- `service.py`: on procedure creation, upload YAML as `code.tac` to S3 and store the
key in `procedure.metadata["code_s3_key"]`. On load, check S3 before falling back
to template. Prevents DynamoDB 400 KB item limit from blocking large optimizer YAMLs.
- `s3_utils.py`: add `upload_procedure_file` and `download_procedure_code` helpers.
- `procedure.py` (model): add `metadata` field to Procedure GraphQL model.
- `resource.ts`: add `metadata` field to Procedure Amplify schema.
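The store-then-load flow for procedure code described above can be sketched as below. To keep the example self-contained, a plain dictionary stands in for the S3 bucket, and the key layout `procedures/{id}/code.tac` is a hypothetical choice; the actual helpers are `upload_procedure_file` and `download_procedure_code`, whose signatures are not shown in the source.

```python
from typing import Dict, MutableMapping


def store_procedure_code(bucket: MutableMapping[str, str],
                         procedure_id: str,
                         yaml_text: str,
                         metadata: Dict[str, str]) -> str:
    """Upload the YAML as code.tac and record its key in the metadata."""
    key = f"procedures/{procedure_id}/code.tac"  # hypothetical key layout
    bucket[key] = yaml_text
    metadata["code_s3_key"] = key
    return key


def load_procedure_code(bucket: MutableMapping[str, str],
                        metadata: Dict[str, str],
                        template_text: str) -> str:
    """Prefer the stored object; fall back to the template if it is absent."""
    key = metadata.get("code_s3_key")
    if key is not None and key in bucket:
        return bucket[key]
    return template_text
```

Only the short key lives in the database record, which is what keeps large optimizer YAMLs clear of the item size limit.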
## procedure_executor.py (plx-07dc0d)
- Remove special `_create_console_plexus_dispatch_tool` branch for console chat;
all procedures now use `PydanticAIMCPAdapter` uniformly to expose execute_tactus.
- Fix MCP dir path (one extra `..` removed).
- Inject `plexus` Lua global shim at the top of every procedure source so procedures
that use `plexus.*` as a global (not via require) still work.
- Register an effectively unlimited BudgetGate for procedure-internal plexus.* calls
so long-running procedures are not killed by the 60 s default budget.
## tactus_adapters/storage.py (plx-07dc0d)
- Wrap `OptimizerResultsService.index_optimizer_run` in try/except RuntimeError so
missing `AMPLIFY_STORAGE_TASKATTACHMENTS_BUCKET_NAME` degrades gracefully with a
warning instead of crashing the optimizer.
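A minimal sketch of the graceful-degradation pattern described above, assuming a generic `index_fn` callable in place of `OptimizerResultsService.index_optimizer_run` (whose real signature is not shown in the source):

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("optimizer.storage")


def index_run_safely(index_fn: Callable[[str], str],
                     run_id: str) -> Optional[str]:
    """Index an optimizer run, downgrading RuntimeError to a warning."""
    try:
        return index_fn(run_id)
    except RuntimeError as exc:
        # For example, a missing bucket environment variable: log and continue.
        logger.warning("Skipping optimizer run indexing for %s: %s", run_id, exc)
        return None
```

Catching only `RuntimeError` keeps other exception types visible, so unrelated bugs still surface.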
## feedback_alignment_optimizer.yaml (plx-07dc0d)
- Add nil guard before `rubric_memory_context.machine_context` access that caused
"attempt to index a nil value" crashes during early optimizer turns.
## plexus execute CLI (plx-07dc0d)
- Fix sys.path construction so `plexus execute` finds the MCP module in all
working-directory contexts.
Made-with: Cursor
…us-mcp-tool Execute Tactus local dispatch and optimizer fixes
…us-mcp-tool Merge evaluation/procedure linkage and dashboard refinements into develop
…us-mcp-tool Add score-version procedure association index
…deque-from-days-mode fix(reports): disable optional memory fanout for FeedbackAnalysis by default
Each procedure invocation opens two CloudWatch log streams under
/plexus/procedures/{account_key}:
- {procedure_id}/run/{invocation_run_id}: lifecycle/tool/cost events
- {procedure_id}/llm-context/{invocation_run_id}: full LLM prompt_context JSON
Events are written directly on each call (no buffering) so logs appear
live during execution. The log group and stream prefix are stored in
procedure.metadata so the dashboard can locate them by convention.
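The naming convention above is simple enough to express as a small helper. The function and dataclass names here are hypothetical; only the log group and stream patterns come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcedureLogStreams:
    log_group: str
    run_stream: str
    llm_context_stream: str


def procedure_log_streams(account_key: str,
                          procedure_id: str,
                          invocation_run_id: str) -> ProcedureLogStreams:
    """Build the log group and the two stream names for one invocation."""
    return ProcedureLogStreams(
        log_group=f"/plexus/procedures/{account_key}",
        run_stream=f"{procedure_id}/run/{invocation_run_id}",
        llm_context_stream=f"{procedure_id}/llm-context/{invocation_run_id}",
    )
```

Because the names are derived purely from identifiers, a reader such as the dashboard could rebuild them from the values stored in `procedure.metadata`.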
IAM write permissions added to the consoleRunWorker Lambda role.
IAM read permissions added to the Amplify authenticated Cognito role.
Frontend utility added at dashboard/utils/cloudwatch-logs-client.ts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…patch-indicator Fix procedure grid dispatch stability
…-execute-tactus-mcp-tool
…us-mcp-tool Stream procedure run logs to CloudWatch
…ch-kanbus Close CloudWatch procedure log Kanbus task
dereknorrbom approved these changes on May 1, 2026
Summary
This promotes the current `develop` branch to `main` after the optimizer/runtime integration work, score-version dashboard association fixes, procedure dispatch status fixes, and CloudWatch procedure log streaming work were merged into `develop` and verified there.
Major Changes
Execute Tactus runtime and optimizer migration
- … `execute_tactus` runtime path, replacing the older many-tool MCP layout.
- … `plexus.*` runtime namespaces into procedures and console/optimizer flows.
Optimizer, evaluation, and reporting reliability
Dashboard score-version associations and result navigation
Procedure dispatch status and dashboard UX
CloudWatch procedure run logging
- … `/plexus/procedures/{account_key}`.
Documentation and Kanbus
- … `execute_tactus` as the standard Plexus access path.
Included PRs
Verification
- `develop` CI passed for PR "Stream procedure run logs to CloudWatch" #265 merge commit `b2315bdf` in run `25236213294`.
- `develop` CI passed for the current tip `4413d0c` in run `25236542873`.
Deployment Notes