
Promote execute_tactus optimizer and dashboard integration to main #267

Merged
dereknorrbom merged 86 commits into main from develop
May 1, 2026

endymion (Contributor) commented May 1, 2026

Summary

This promotes the current develop branch to main after the optimizer/runtime integration work, score-version dashboard association fixes, procedure dispatch status fixes, and CloudWatch procedure log streaming work were merged into develop and verified there.

Major Changes

Execute Tactus runtime and optimizer migration

  • Migrates the MCP surface to the single execute_tactus runtime path, replacing the older many-tool MCP layout.
  • Wires plexus.* runtime namespaces into procedures and console/optimizer flows.
  • Adds runtime handles, child-budget enforcement, progress streaming, cancellation propagation, and evaluation info support.
  • Updates the feedback alignment optimizer YAML to use the new Tactus/Plexus module flow and preserves the recent optimizer briefing/context work.
  • Adds validation harnesses and runtime documentation for discovery, read APIs, handles/budgets, reports, procedures, evaluation/feedback, and score/dataset authoring.

Optimizer, evaluation, and reporting reliability

  • Fixes optimizer dispatch and serialization problems around score version IDs, external IDs, procedure naming, oversized YAML/code storage, and local async dispatch.
  • Adds acceptance-rate report support to the Tactus runtime.
  • Fixes confusion-matrix overwrite behavior in concurrent accuracy evaluations.
  • Disables the optional FeedbackAnalysis memory fanout by default to prevent empty-deque failures.
  • Adds score rubric consistency preflight tooling and related tests.

Dashboard score-version associations and result navigation

  • Adds direct score-version associations/indexing for score-version procedures and evaluations.
  • Fixes score-version procedure/evaluation tab data loading so evaluations load through direct associations and procedures have the required indexed access path once deployed.
  • Improves related resource cards, score evaluation/procedure lists, optimizer result cards, and evaluation score-result item filtering.
  • Canonicalizes evaluation identifier flow from producers into dashboard/share views.

Procedure dispatch status and dashboard UX

  • Stabilizes procedure grid cards, local procedure dispatch indicators, task status rendering, and realtime procedure metadata merging.
  • Preserves local-run status and prevents empty pending rows for local/direct procedure dispatch.
  • Adds regression coverage for task status, procedure dashboard loading, optimizer auth, and score-version related lists.

CloudWatch procedure run logging

  • Streams procedure run logs to CloudWatch log groups under /plexus/procedures/{account_key}.
  • Adds per-invocation run and LLM-context streams and records the CloudWatch metadata on procedure runs.
  • Adds IAM permissions for console run workers to write logs and authenticated dashboard clients to read logs.
  • Adds the dashboard CloudWatch Logs client utility and locks the new AWS SDK dependency.

Documentation and Kanbus

  • Updates agent/MCP instructions to reflect execute_tactus as the standard Plexus access path.
  • Adds and updates Kanbus issue/event artifacts for the completed optimizer, dashboard, dispatch, and CloudWatch integration work.

Included PRs

Verification

  • develop CI passed for PR #265 (Stream procedure run logs to CloudWatch), merge commit b2315bdf, in run 25236213294.
  • develop CI passed for the current tip 4413d0c in run 25236542873.
  • Earlier local verification during integration included dashboard typecheck, targeted dashboard unit tests, selected Python CLI/shared tests, and Python compile checks for the CloudWatch logger/executor files.

Deployment Notes

  • The score-version procedure association GSI/resource change must be deployed before the score-version Procedures tab can query procedure records through the intended direct index in deployed environments.
  • CloudWatch log groups/streams are created by procedure invocations; dashboard display wiring for those logs is intentionally separate follow-up work.

dereknorrbom and others added 30 commits April 23, 2026 20:38
Introduce the single-tool Tactus runtime path with tracing, budget gating, long-running operation guards, and direct feedback lookup so Plexus can be exercised as a programmable MCP runtime.

Made-with: Cursor
Add cooperative Task cancellation checkpoints so report and procedure workers stop cleanly after execute_tactus handle cancellation marks dashboard work cancelled.

Made-with: Cursor
Keep cooperative report cancellation checks from turning unavailable Task refreshes into report failures, and relax the legacy mock expectation to allow intentional status polling.

Made-with: Cursor
Capture the cooperative cancellation CI fix and local verification in Kanbus for the execute_tactus handle task.

Made-with: Cursor
Bridge Tactus runtime events and Plexus API call progress to FastMCP Context notifications while preserving the final execute_tactus response envelope.

Made-with: Cursor
Require explicit child budgets for async execute_tactus work so dispatched evaluations, reports, and procedures remain attached to the parent runtime budget.

Made-with: Cursor
Apply propagated execute_tactus child budgets inside evaluation, report, and procedure workers so long-running child executions fail early when wallclock, depth, or known spend exceeds their allocation.

Made-with: Cursor
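
The worker-side enforcement described in the commits above could be sketched roughly as follows. This is an illustration only: `ChildBudget`, `BudgetExceeded`, and `check` are hypothetical names, not the actual Plexus classes. Only the three limits (wallclock, depth, known spend) come from the commit message.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a child execution outruns its allocation (hypothetical name)."""


@dataclass
class ChildBudget:
    # The three limits are named in the commit message; field names are assumptions.
    usd: float
    wallclock_seconds: float
    max_depth: int
    started_at: float = field(default_factory=time.monotonic)

    def check(self, spent_usd: float, depth: int, now: Optional[float] = None) -> None:
        """Fail early if any known dimension exceeds the allocation."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.started_at
        if elapsed > self.wallclock_seconds:
            raise BudgetExceeded(f"wallclock {elapsed:.1f}s > {self.wallclock_seconds}s")
        if depth > self.max_depth:
            raise BudgetExceeded(f"depth {depth} > {self.max_depth}")
        if spent_usd > self.usd:
            raise BudgetExceeded(f"spend ${spent_usd:.2f} > ${self.usd:.2f}")
```

A worker would call `check` at its own checkpoints (for example, between evaluation batches), so an over-budget child stops at the next checkpoint rather than running to completion.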
Align the runtime validation contract with explicit async budgets and broaden helper aliases so generated Tactus can use the advertised Plexus API surface directly.

Made-with: Cursor
…didates-unfeatured

Stabilize optimizer scoring and rubric consistency checks
- Replace the entire legacy MCP tool catalog (scorecard, score,
  evaluation, feedback, item, prediction, dataset, report, rubric_memory,
  etc.) with a single `execute_tactus` tool that exposes all Plexus
  functionality via the `plexus.*` Tactus runtime API
- Add `plexus.score.contradictions` for rubric vs. code consistency checks
- Add `score_rubric_consistency_check` option to `plexus.evaluation.run`
- Add `plexus.procedure.optimize` shortcut for launching the feedback
  alignment optimizer with standard parameters
- Add `plexus execute` CLI command for local Tactus snippet testing
- Rename `run_experiment` → `run_procedure` and
  `run_experiment_with_task_tracking` → `run_procedure_with_task_tracking`
  throughout the codebase to match domain terminology
- Delete `procedure_sop_agent`, `sop_agent_base`, `demo_ai_mcp_integration`,
  `model_config_examples`, and all associated tests (legacy LangGraph-based
  optimizer prototype; superseded by Tactus procedures)
- Remove SOPAgent routing from `procedure_executor.py`; only `class: Tactus`
  procedures are supported going forward
- Reorganise and expand Plexus documentation under `plexus/docs/` with
  topic-based subdirectories and new guides for the Tactus runtime API

Made-with: Cursor
…ifier-display

fix: canonical evaluation identifier flow (producer + dashboard/share views)
- Add score.pull/update/test, feedback.latest_update, rubric_memory.*
  namespaces to PlexusRuntimeModule DIRECT_HANDLERS with full
  _default_* implementations
- Add _default_report_runner_sync for synchronous report execution
  needed by optimizer; route plexus.report.run(sync=true) through it
- Add --emit-id-file CLI option to plexus evaluate accuracy/feedback
  so _default_evaluation_runner can capture evaluation_id from
  background subprocess for handle tracking
- Construct and register PlexusRuntimeModule in procedure_executor.py
  so Tactus/Lua procedure code can call plexus.* directly
- Create rubric_memory_toolset.py: in-process MCP tools for
  plexus_rubric_memory_* sub-agent tools
- Replace legacy MCP tool calls in ScoreEditorToolset with direct
  _default_score_pull / _default_score_update calls
- Rewrite feedback_alignment_optimizer.yaml call_plexus_tool to use
  plexus.* APIs directly; batch evaluations via handle protocol;
  synchronous reports via sync=true; score pull via temp files
- Update execute_test.py and test_score_editor_toolset.py to mock
  new direct-call interfaces

Made-with: Cursor
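
The `--emit-id-file` hand-off mentioned in this commit can be pictured from the parent's side as below. This is a sketch under assumptions: the file is taken to hold the evaluation ID as plain text, and `wait_for_emitted_id` is a hypothetical helper, not the real `_default_evaluation_runner` logic.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def wait_for_emitted_id(id_file: Path, timeout: float = 30.0, poll: float = 0.1) -> Optional[str]:
    """Poll until the background subprocess has written a non-empty ID, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if id_file.exists():
            text = id_file.read_text().strip()
            if text:  # ignore a file that exists but has not been filled in yet
                return text
        time.sleep(poll)
    return None


# Usage: the parent chooses the path, passes it to the subprocess, then waits.
with tempfile.TemporaryDirectory() as tmp:
    id_path = Path(tmp) / "evaluation_id.txt"
    id_path.write_text("eval-123\n")  # stands in for the subprocess writing its ID
    captured_id = wait_for_emitted_id(id_path, timeout=1.0)
```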
…code storage

Closes plx-62b442, plx-51488a, plx-f804a6.
Updates plx-61c332, plx-07dc0d.
Adds plx-71ad53 (remaining L4 integration tests).

## execute_tactus contract hardening (plx-f804a6)

- Add `_truncate_envelope` helper: caps execute_tactus JSON responses at 40 K chars
  to prevent LLM context-window overflow from large evaluation / scorecard payloads.
- `BudgetGate.carve_child`: when the parent gate is effectively infinite (usd=inf,
  wallclock=inf — as in the embedded chat MCP context), auto-supply a generous default
  child budget instead of raising ChildBudgetRequired. Callers inside chat no longer
  need explicit `budget = { ... }` for async evaluation / procedure calls.
- `_default_score_update`: set `isFeatured: "false"` on new ScoreVersion records so
  optimizer-created versions are not featured by default.
- `_default_score_test`: remove erroneous lambda wrapper around coroutine, fixing
  asyncio awaitable error.
- `_default_score_pull`: write YAML and guidelines to temp files and return their paths
  so sandboxed Lua code can read them via File.read() without needing the io library.
- `_default_procedure_optimize`: dispatch optimizer via background daemon thread so
  the chat agent receives procedure_id immediately (~49 s) instead of blocking for hours.
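
As an illustration of the response cap described for `_truncate_envelope`, a minimal sketch might look like the following. The 40 K limit comes from the commit message; the stub shape (`truncated`, `original_chars`, `preview`) and the choice to return valid JSON are assumptions, and the real helper may truncate differently.

```python
import json

MAX_ENVELOPE_CHARS = 40_000  # cap stated in the commit message


def truncate_envelope(envelope: dict, max_chars: int = MAX_ENVELOPE_CHARS) -> str:
    """Serialize an envelope; if it is too large, return a small JSON stub with a preview."""
    text = json.dumps(envelope)
    if len(text) <= max_chars:
        return text
    # Assumption: keep the result parseable instead of cutting the JSON mid-token.
    stub = {"truncated": True, "original_chars": len(text), "preview": ""}
    room = max_chars - len(json.dumps(stub))
    preview = text[: max(room, 0)]
    while True:
        stub["preview"] = preview
        out = json.dumps(stub)
        if len(out) <= max_chars or not preview:
            return out
        # JSON escaping can expand the preview, so trim by the overshoot and retry.
        preview = preview[: -(len(out) - max_chars)]
```

Returning a parseable stub, rather than a raw slice, lets the calling agent detect truncation programmatically and ask for a narrower query.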

## Console chat fixed end-to-end (plx-61c332, plx-62b442)

- `chat_agent.tac` `extract_text`: handle Lupa userdata (Lua receives Plexus Python
  objects as `userdata`, not `table`) using pcall attribute access; checks
  response/content/message/text keys and indexed first element.
- Remove `MessageHistory.get()` auto-load: history now comes exclusively from
  `console_session_history` passed by the caller, preventing cross-turn context bleed
  that caused 300 K–667 K token overflows.
- Add `assistant.output` fallback with garbage filter: filters out Python model reprs
  like "UsageStats" and "output=None" that appeared when the LLM returned without
  tool use.
- `mcp_transport.py`: pass a permissive BudgetGate (usd=inf, wallclock=inf, depth=20,
  tool_calls=500) to execute_tactus in the embedded procedure MCP context.
- `builtin_procedures.py`: increase chat agent max_tokens 220→1024, reasoning_effort
  low→medium; add explicit usage examples for evaluation.run (with budget),
  procedure.optimize (with budget), and evaluation.find_recent (with evaluation_type).
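
The interaction between the permissive gate in `mcp_transport.py` and the `BudgetGate.carve_child` change might be sketched as below. `BudgetGate`, `carve_child`, and `ChildBudgetRequired` are names from the commit message, but the bodies, the `Budget` type, and the default values are assumptions (the source says only "generous default").

```python
import math
from dataclasses import dataclass
from typing import Optional


class ChildBudgetRequired(ValueError):
    """Raised when a finite parent dispatches async work without a budget."""


@dataclass(frozen=True)
class Budget:
    usd: float
    wallclock_seconds: float


# Assumption: placeholder numbers; the real defaults are not given in the source.
DEFAULT_CHILD = Budget(usd=5.0, wallclock_seconds=3600.0)


@dataclass
class BudgetGate:
    remaining: Budget

    def is_unbounded(self) -> bool:
        return math.isinf(self.remaining.usd) and math.isinf(self.remaining.wallclock_seconds)

    def carve_child(self, requested: Optional[Budget] = None) -> Budget:
        if requested is None:
            if self.is_unbounded():
                # Embedded chat context: supply a default instead of raising.
                return DEFAULT_CHILD
            raise ChildBudgetRequired("async work needs an explicit child budget")
        if (requested.usd > self.remaining.usd
                or requested.wallclock_seconds > self.remaining.wallclock_seconds):
            raise ValueError("child budget exceeds what the parent has left")
        # Assumption: spend is deducted from the parent; wallclock is shared, not deducted.
        self.remaining = Budget(self.remaining.usd - requested.usd,
                                self.remaining.wallclock_seconds)
        return requested
```

Under this sketch, a chat-context gate built with `Budget(math.inf, math.inf)` returns the default child budget, while a finite gate still insists on an explicit one.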

## S3-backed procedure code storage (plx-07dc0d)

- `service.py`: on procedure creation, upload YAML as `code.tac` to S3 and store the
  key in `procedure.metadata["code_s3_key"]`. On load, check S3 before falling back
  to template. Prevents DynamoDB 400 KB item limit from blocking large optimizer YAMLs.
- `s3_utils.py`: add `upload_procedure_file` and `download_procedure_code` helpers.
- `procedure.py` (model): add `metadata` field to Procedure GraphQL model.
- `resource.ts`: add `metadata` field to Procedure Amplify schema.
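
The store-then-fallback flow described for `service.py` can be sketched without AWS at all. Everything here is illustrative: `InMemoryStore` stands in for S3, the key layout is an assumption, and `save_procedure_code` / `load_procedure_code` are hypothetical names rather than the `upload_procedure_file` / `download_procedure_code` helpers themselves. Only the `code.tac` file name and the `code_s3_key` metadata entry come from the commit message.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Stand-in for an S3 bucket so the flow can run without AWS credentials."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        self._objects[key] = body

    def get(self, key: str) -> Optional[str]:
        return self._objects.get(key)


def save_procedure_code(store: InMemoryStore, procedure_id: str,
                        yaml_text: str, metadata: dict) -> dict:
    """Upload the code and return metadata that records where it lives."""
    key = f"procedures/{procedure_id}/code.tac"  # key layout is an assumption
    store.put(key, yaml_text)
    return {**metadata, "code_s3_key": key}


def load_procedure_code(store: InMemoryStore, metadata: dict, template_text: str) -> str:
    """Prefer the stored object; fall back to the template if no key or object exists."""
    key = metadata.get("code_s3_key")
    if key:
        body = store.get(key)
        if body is not None:
            return body
    return template_text
```

Keeping only the key in the database record is what sidesteps the item-size limit: the record stays small no matter how large the optimizer YAML grows.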

## procedure_executor.py (plx-07dc0d)

- Remove special `_create_console_plexus_dispatch_tool` branch for console chat;
  all procedures now use `PydanticAIMCPAdapter` uniformly to expose execute_tactus.
- Fix MCP dir path (one extra `..` removed).
- Inject `plexus` Lua global shim at the top of every procedure source so procedures
  that use `plexus.*` as a global (not via require) still work.
- Register an effectively unlimited BudgetGate for procedure-internal plexus.* calls
  so long-running procedures are not killed by the 60 s default budget.

## tactus_adapters/storage.py (plx-07dc0d)

- Wrap `OptimizerResultsService.index_optimizer_run` in try/except RuntimeError so
  missing `AMPLIFY_STORAGE_TASKATTACHMENTS_BUCKET_NAME` degrades gracefully with a
  warning instead of crashing the optimizer.

## feedback_alignment_optimizer.yaml (plx-07dc0d)

- Add nil guard before `rubric_memory_context.machine_context` access that caused
  "attempt to index a nil value" crashes during early optimizer turns.

## plexus execute CLI (plx-07dc0d)

- Fix sys.path construction so `plexus execute` finds the MCP module in all
  working-directory contexts.

Made-with: Cursor
endymion and others added 25 commits May 1, 2026 16:01
…us-mcp-tool

Execute Tactus local dispatch and optimizer fixes
…us-mcp-tool

Merge evaluation/procedure linkage and dashboard refinements into develop
…us-mcp-tool

Add score-version procedure association index
…deque-from-days-mode

fix(reports): disable optional memory fanout for FeedbackAnalysis by default
Each procedure invocation opens two CloudWatch log streams under
/plexus/procedures/{account_key}:
  - {procedure_id}/run/{invocation_run_id}         lifecycle/tool/cost events
  - {procedure_id}/llm-context/{invocation_run_id} full LLM prompt_context JSON

Events are written directly on each call (no buffering) so logs appear
live during execution. The log group and stream prefix are stored in
procedure.metadata so the dashboard can locate them by convention.

IAM write permissions added to the consoleRunWorker Lambda role.
IAM read permissions added to the Amplify authenticated Cognito role.
Frontend utility added at dashboard/utils/cloudwatch-logs-client.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
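
The naming convention in the commit message above can be expressed as a small helper. This is a sketch only: `procedure_log_location` and the dictionary keys are hypothetical, while the group and stream patterns are copied from the commit message.

```python
from typing import Dict

LOG_GROUP_PREFIX = "/plexus/procedures"


def procedure_log_location(account_key: str, procedure_id: str,
                           invocation_run_id: str) -> Dict[str, str]:
    """Build the log group and the two per-invocation stream names by convention."""
    return {
        "log_group": f"{LOG_GROUP_PREFIX}/{account_key}",
        # lifecycle/tool/cost events
        "run_stream": f"{procedure_id}/run/{invocation_run_id}",
        # full LLM prompt_context JSON
        "llm_context_stream": f"{procedure_id}/llm-context/{invocation_run_id}",
    }
```

Because both streams share the `{procedure_id}/` prefix, a reader can list all invocations of one procedure with a single prefix query against the account's log group.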
…patch-indicator

Fix procedure grid dispatch stability
…us-mcp-tool

Stream procedure run logs to CloudWatch
…ch-kanbus

Close CloudWatch procedure log Kanbus task
@endymion endymion requested a review from a team as a code owner May 1, 2026 23:02
@endymion endymion requested review from dereknorrbom and removed request for a team May 1, 2026 23:02
@dereknorrbom dereknorrbom merged commit 1786810 into main May 1, 2026
21 checks passed