
merge: resolve conflicts with main#5

Merged
PolyphonyRequiem merged 16 commits into feat/agent-dialog-mode-clean from merge/main-into-dialog
May 4, 2026

Conversation

@PolyphonyRequiem
Owner

Resolves merge conflicts between feat/agent-dialog-mode-clean and upstream main (after microsoft#113 merged).

Conflicts resolved (keep both sides):

| File | HEAD (dialog) | main (breadcrumb/subworkflow) |
| --- | --- | --- |
| schema.py | dialog validation | max_depth validation |
| claude.py | execute_dialog_turn method | _max_input_cache = None in close() |
| DetailPanel.tsx | Dialog imports | SubworkflowDetail import |
| workflow-store.ts | activeDialog, dialogEngaged reset | wfDepth, subworkflowContexts, etc. reset |
| events.ts | Dialog* interfaces | Subworkflow* interfaces |
| test_claude.py | TestClaudeExecuteDialogTurn | TestClaudeGetMaxPromptTokens |
| test_copilot.py | TestCopilotExecuteDialogTurn | TestGetMaxPromptTokens |
| static assets | Rebuilt from merged source | |

All 480 tests for affected modules pass. Vite build succeeds.

lesandiz and others added 16 commits May 4, 2026 05:33
…rosoft#100)

* fix(copilot): infer nested prompt schema from output definitions

Build the Copilot prompt schema recursively from agent output definitions so nested object properties and array item schemas are surfaced to the model and parse-recovery flow.

Add regression tests for recursive schema generation and the actual prompt sent to Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): add recursion depth guard to prompt schema builders

Prevent RecursionError on pathologically deep output schemas by adding
a depth parameter to _build_prompt_schema, _build_prompt_field_schema,
and _build_prompt_item_schema, matching the existing pattern in claude.py.
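
A minimal sketch of the depth-guard idea, not the code from copilot.py. The function names, the `MAX_SCHEMA_DEPTH` value, and the dict-based output definitions are all assumptions made for illustration; the real builders are `_build_prompt_schema` and friends, whose signatures are not shown here.

```python
from __future__ import annotations

from typing import Any

# Hypothetical limit; the source does not state the real value.
MAX_SCHEMA_DEPTH = 10


def build_field_schema(spec: dict[str, Any], depth: int = 0) -> dict[str, Any]:
    """Describe one output field, descending into objects and arrays."""
    field_type = spec.get("type", "string")
    if depth >= MAX_SCHEMA_DEPTH:
        # Stop descending: report the type only, without nested detail.
        return {"type": field_type}
    if field_type == "object" and "properties" in spec:
        return {
            "type": "object",
            "properties": {
                name: build_field_schema(sub, depth + 1)
                for name, sub in spec["properties"].items()
            },
        }
    if field_type == "array" and "items" in spec:
        return {"type": "array", "items": build_field_schema(spec["items"], depth + 1)}
    return {"type": field_type}


def build_prompt_schema(outputs: dict[str, dict[str, Any]], depth: int = 0) -> dict[str, Any]:
    """Build the schema for every declared output field."""
    return {name: build_field_schema(spec, depth) for name, spec in outputs.items()}
```

Because every recursive call passes `depth + 1`, a pathologically deep definition is truncated after `MAX_SCHEMA_DEPTH` levels instead of exhausting the interpreter stack.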

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests): line-too-long (E501) linting errors

* style: ruff format copilot.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Lester Sanchez <lester@dotnetting.net>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jason Robert <jasont.robert@gmail.com>
) (microsoft#110)

* feat(composition): allow sub-workflows in for_each groups (microsoft#102)

Remove validator restriction blocking type='workflow' in for_each groups.
Wire execute_single_item() to call _execute_subworkflow_with_inputs()
for workflow agents, rendering input_mapping with loop variables in scope.

- Validator: Remove workflow rejection in for_each validation
- Engine: Add workflow branch in execute_single_item(), new helper
  _execute_subworkflow_with_inputs() for pre-built inputs
- Tests: Update test_workflow_in_for_each to validate (not reject)
- Experimental workflows: test-for-each-workflow parent/child pair

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: ruff format workflow.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Emit parent_path/slot_key/iteration on for_each-of-workflow lifecycle

For each iteration of a for_each whose member is a sub-workflow,
emit subworkflow_started / _completed / _failed with three new fields:

- parent_path: snapshot of the parent engine's dashboard context path
  (empty list when the dashboard infra from microsoft#113 is absent)
- slot_key: a unique "<group>[<key>]" identity per iteration so
  concurrent iterations don't stack under one shared dashboard context
- iteration: 1-based ordinal (matches the existing item_key)

Conditionally thread `_dashboard_context_path` into the child engine
via getattr so this code is forward-compatible with PR microsoft#113 landing
in either order. When microsoft#113 has not landed, the kwarg is omitted and
behavior is unchanged. When microsoft#113 has landed, the child engine receives
[*parent_path, slot_key] and auto-stamps subworkflow_path on every
event it emits.

Wraps `_execute_subworkflow_with_inputs` in try/except so a failing
iteration emits subworkflow_failed before for_each_item_failed; the
original exception propagates to keep existing error handling intact.

Adds a regression test that runs three for_each iterations and asserts
each gets a distinct slot_key (batch[0], batch[1], batch[2]) on both
started and completed events.
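
The field construction described above could look roughly like this. It is a sketch under assumptions: `subworkflow_event_fields` and `child_context_path` are hypothetical helper names, and only the `_dashboard_context_path` attribute, the `"<group>[<key>]"` format, and the `[*parent_path, slot_key]` child path come from this commit message.

```python
from __future__ import annotations

from typing import Any


def subworkflow_event_fields(engine: Any, group: str, key: str | int, iteration: int) -> dict[str, Any]:
    """Build the three extra lifecycle fields for one for_each iteration."""
    # getattr keeps this working when the dashboard attribute does not exist yet.
    parent_path = list(getattr(engine, "_dashboard_context_path", None) or [])
    return {
        "parent_path": parent_path,
        "slot_key": f"{group}[{key}]",
        "iteration": iteration,
    }


def child_context_path(fields: dict[str, Any]) -> list[str]:
    """Path handed to the child engine: the parent path plus this slot."""
    return [*fields["parent_path"], fields["slot_key"]]
```

Snapshotting `parent_path` with `list(...)` means later changes to the parent's path do not alter events that were already built.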

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bumps [postcss](https://github.com/postcss/postcss) from 8.5.6 to 8.5.12.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.5.6...8.5.12)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…output (microsoft#139)

* fix(engine): coerce Python literal "True"/"False"/"None" in workflow output

Workflow output templates pass through `_maybe_parse_json` to convert
JSON-shaped strings back into native types. Previously this only
recognized lowercase JSON literals (`true`/`false`/`null`). Jinja
expressions like `{{ a == b }}` render Python bool via `str()`, producing
`"True"` / `"False"`, which then survived as truthy non-empty strings
downstream. Route `when:` clauses comparing such values against `true` /
`false` silently misbehaved.

Add explicit handling for the three Python literal forms before the
existing JSON parse path. Lowercase JSON literals continue to work
(regression covered).

Tests: 3 new cases under `TestWorkflowEngineOutputTemplates` covering
`True`/`False` from `==` / `!=` expressions, `None` from `{{ none }}`,
and a regression check for the lowercase forms.
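
As an illustration of the ordering described above (Python literals first, then the JSON path), here is a small sketch. It is not the engine's `_maybe_parse_json`; the function name and the fall-back-to-original-string behavior are assumptions.

```python
import json
from typing import Any

# The three Python literal forms that str() produces for bool and None.
_PYTHON_LITERALS = {"True": True, "False": False, "None": None}


def maybe_parse_json(value: str) -> Any:
    """Convert a rendered template string back into a native value when possible."""
    stripped = value.strip()
    if stripped in _PYTHON_LITERALS:
        return _PYTHON_LITERALS[stripped]
    try:
        # Lowercase JSON literals (true/false/null) are handled here, as before.
        return json.loads(stripped)
    except json.JSONDecodeError:
        # Not JSON-shaped: keep the original string untouched.
        return value
```

With this in place, a rendered `"False"` becomes the falsy `False` rather than a truthy non-empty string.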

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: assert native bool from workflow output, not stringified

Two existing integration tests asserted the broken behavior they were
exercising:
- test_examples.py:214: `result["syntax_passed"] == "True"`
- test_parallel_workflows.py:410: `result["success"] == "True"`

Both had inline comments acknowledging the workaround
(`# Templates return strings`, `# Boolean rendered as string`). With
this PR's fix, those values now coerce to native bool. Update the
assertions accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…on PATH (microsoft#142)

uv tool install only modifies the current process PATH. New terminals,
sub-processes, CI agents, and IDE extensions inherit PATH from the user
registry (Windows) or shell rc files (Unix) and would not find conductor.

Run uv tool update-shell after a successful install to add the bin dir
to the persistent PATH. The call is idempotent and wrapped so a failure
does not abort the install — it falls back to a hint telling the user
to run the command manually.

Closes microsoft#115

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acking (microsoft#103) (microsoft#111)

* feat(composition): allow self-referential sub-workflows with depth tracking (microsoft#103)

Remove the circular reference path check that blocked workflows from
referencing themselves. The existing MAX_SUBWORKFLOW_DEPTH=10 already
prevents infinite recursion. Add optional per-agent max_depth field for
tighter author-controlled bounds.

- Engine: Remove self-reference path equality check in both
  _execute_subworkflow() and _execute_subworkflow_with_inputs()
- Engine: Add per-agent max_depth enforcement alongside global limit
- Schema: Add max_depth field to AgentDef with validation
- Tests: Replace circular reference test with depth-limit tests
- Experimental: test-recursive.yaml self-referential countdown
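
One way to picture the combined limit is the sketch below. The exception type, the function name, and the exact off-by-one convention (the child runs at `parent_depth + 1`) are assumptions; only `MAX_SUBWORKFLOW_DEPTH=10` and the optional per-agent `max_depth` come from this commit.

```python
from __future__ import annotations

MAX_SUBWORKFLOW_DEPTH = 10


class SubworkflowDepthError(RuntimeError):
    """Hypothetical error raised when a sub-workflow would nest too deeply."""


def check_subworkflow_depth(parent_depth: int, agent_max_depth: int | None = None) -> int:
    """Return the child's depth, or raise if it exceeds the effective limit."""
    limit = MAX_SUBWORKFLOW_DEPTH
    if agent_max_depth is not None:
        # A per-agent bound can only tighten the global limit, never loosen it.
        limit = min(limit, agent_max_depth)
    child_depth = parent_depth + 1
    if child_depth > limit:
        raise SubworkflowDepthError(f"sub-workflow depth {child_depth} exceeds limit {limit}")
    return child_depth
```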

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: unify sub-workflow input handling and report real usage stats

Address PR microsoft#111 review feedback:
- Extract _build_subworkflow_inputs helper with JSON-parse-with-fallback
  used by both _execute_subworkflow and for_each workflow paths
- Change _execute_subworkflow_with_inputs to return (output, WorkflowUsage)
- Emit real token/cost data in for_each_item_completed events instead of
  hardcoded zeros
- Update test expectation to match unified JSON-parse behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crosoft#143)

* fix(pricing): warn once when get_pricing falls back to fuzzy match

When get_pricing() resolves a model name via the longest-prefix or
suffix-strip fallback paths, it silently returned a sibling model's
ModelPricing — including its context_window — with no log line. Names
like "claude-opus-4-1m-internal" inherited claude-opus-4's 200K window
even though the suffix suggests 1M, and the dashboard / cost calc
treated those numbers as authoritative.

This change emits a one-time logging.warning per requested model name
when get_pricing() returns a non-exact entry, naming both the
requested model and the matched key. Exact matches, overrides, and
unknown models (None) do not warn. De-duped via a module-level set so
hot-loop callers don't spam logs.

Behavior is otherwise unchanged — this is the smallest viable change
suggested in microsoft#137 and is fully backward-compatible.

Closes microsoft#137

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(pricing): require '-' delimiter on prefix match; remove dead suffix-strip code

Tightens the longest-prefix fallback in get_pricing() to require a '-'
delimiter after the matched key (`model.startswith(known_model + "-")`).
Without the delimiter, names that share a textual prefix with a known
key but belong to a different model family — e.g. claude-opus-4.7-high
matching claude-opus-4 — silently inherited the wrong context_window
and pricing. The four repro names from microsoft#137 now correctly return None
and degrade gracefully (dashboard hides the bar; cost is null) rather
than reporting confidently wrong data.

Real versioned names like claude-sonnet-4-20250514 still match
claude-sonnet-4 because the date suffix is preceded by '-'.
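
The delimiter rule can be illustrated with a small matcher. This is a sketch, not the body of `get_pricing()`: the function name and the idea of passing the known keys as a plain collection are assumptions.

```python
from __future__ import annotations

from typing import Iterable


def match_pricing_key(model: str, known_models: Iterable[str]) -> str | None:
    """Return the pricing key for a model: exact match, else longest '-'-delimited prefix."""
    known = list(known_models)
    if model in known:
        return model
    # Require the '-' delimiter so "claude-opus-4" does not capture "claude-opus-4.7-...".
    candidates = [key for key in known if model.startswith(key + "-")]
    return max(candidates, key=len) if candidates else None
```

For example, with known keys `claude-opus-4` and `claude-sonnet-4`, the name `claude-sonnet-4-20250514` resolves to `claude-sonnet-4`, while `claude-opus-4.7-high` resolves to `None`.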

Also removes the suffix-strip and suffix-strip+longest-prefix branches:
they were unreachable because longest-prefix runs first and catches
every name they would have simplified.

Updates the strategy label in fuzzy-match warnings from "longest-prefix"
to "versioned-suffix" to reflect the new, narrower matching semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…microsoft#144)

* feat(engine): source context_window_max from provider SDKs at runtime

The dashboard's "context window remaining" bar (and any future
enforcement) reads context_window_max emitted in agent_started,
agent_completed, and parallel_agent_completed events. Until now this
value came from a hand-maintained context_window field on
ModelPricing entries — but that field reported the *theoretical model
max*, not the SDK's actual max_prompt_tokens.

Concrete impact (from the GitHub community discussion #186340 raw API
response on 2026-02-04):

  Model                context_window  max_prompt   Bar wrong by
  gpt-5.x variants     400K            128K         3.1x
  claude-opus-4.5      200K            128K         1.6x
  claude-opus-4.6      1M              200K         5.0x

This change replaces the static lookup with a per-provider runtime
query so the bar always reflects the SDK's actual cap.

  - AgentProvider.get_max_prompt_tokens(model) -> int | None
    Concrete default returns None; Copilot and Claude providers
    override to query their SDKs (cached).
  - WorkflowEngine resolves the provider via the same path as
    execution (single-provider or registry) and prefers output.model
    over agent.model when an output is available.
  - Failures (SDK error, missing client, misconfigured mock) are
    swallowed and return None; metadata is best-effort and must never
    block workflow execution.

Cleanup of dead code per scope:

  - Removed context_window field from ModelPricing and from all
    ~30 entries in DEFAULT_PRICING.
  - Updated fuzzy-match warning text to drop "context_window".
  - Deleted tests/test_providers/test_context_window.py (the entire
    file tested the static-table lookup that no longer exists).
  - New TestGetMaxPromptTokens classes in test_copilot.py and
    test_claude.py covering known model, unknown model, SDK failure,
    cache-after-first-call, and mock-mode fallback.
  - Updated test_context_window_events.py to inject expected values
    via a fake get_max_prompt_tokens (mock-handler mode no longer has
    a static table to fall back to). Added two new resolution-order
    tests covering default-model fallback and output.model preference.

Pricing data ($/Mtok) is still hand-maintained — neither SDK exposes
per-token dollar amounts, so DEFAULT_PRICING still needs entries for
new models for cost math. Only the context-window field becomes
self-correcting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(providers): address PR review — alias matching, retry chain, narrow exceptions

Code review and rubber-duck reviews surfaced six issues on the SDK-driven
context window refactor. This commit addresses all of them and applies
the simplifier's cleanups.

Issue 1: lost alias resolution (Important)
  Names like "claude-3-5-sonnet-latest" and short aliases used in our
  own examples (claude-haiku-4.5, claude-sonnet-4.5) silently lost
  context_window_max because the SDK's models.list() returns dated IDs
  (claude-3-5-sonnet-20241022), not the alias names users configure.

  Added match_model_id() in providers/base.py: a small alias-aware
  matcher (exact -> boundary-prefix in either direction -> suffix-strip
  retry). Both Copilot and Claude providers route SDK lookups through
  it. Covered by 11 unit tests in test_base.py.

Issue 2: output.model precedence was a string-choice, not a retry chain
  If output.model was an SDK-unknown variant (e.g. a reasoning-effort
  tier the provider doesn't list), the resolver returned None instead
  of falling back to agent.model.

  WorkflowEngine._get_context_window_for_agent now tries each candidate
  in order (output.model -> agent.model -> default) and returns the
  first non-None lookup. Covered by a new test in
  test_context_window_events.py.
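
  A sketch of the retry chain, assuming a hypothetical `lookup` callable that
  returns an int or None for a model name (the real method is
  WorkflowEngine._get_context_window_for_agent, whose body is not shown here):

```python
from __future__ import annotations

from typing import Callable, Iterable


def first_known_context_window(
    candidates: Iterable[str | None],
    lookup: Callable[[str], int | None],
) -> int | None:
    """Try each candidate model in order and return the first non-None result."""
    for model in candidates:
        if not model:
            continue  # skip unset candidates such as a missing output.model
        value = lookup(model)
        if value is not None:
            return value
    return None
```

  Called with candidates in the order (output.model, agent.model, default), an
  unknown output.model no longer short-circuits the lookup to None.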

Issue 3: lock held across the SDK round-trip (Minor)
  ClaudeProvider.get_max_prompt_tokens held _max_input_cache_lock
  while awaiting client.models.list(); parallel first-callers all
  blocked behind the same SDK call.

  Refactored: fetch outside the lock, lock only around the dict
  install. Also seed the cache from validate_connection() so callers
  who go through normal connection setup never pay for the round-trip.

Issue 4: transient SDK failure cached forever (Minor)
  Previous implementation installed an empty cache on exception, so
  every subsequent call returned None forever.

  Now: don't cache failures. The cache stays None and the next call
  retries. Covered by test_sdk_failure_returns_none_and_does_not_cache.
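
  Issues 3 and 4 together suggest a shape like the following sketch. The class
  name and the injected `fetch` coroutine are hypothetical stand-ins for the
  provider and its models.list() call, and the caught exceptions are limited to
  standard-library types here (the real Claude provider also catches
  AnthropicError).

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class MaxInputCache:
    """Caches a model -> max prompt tokens mapping fetched from an SDK."""

    def __init__(self, fetch: Callable[[], Awaitable[dict[str, int]]]) -> None:
        self._fetch = fetch
        self._cache: dict[str, int] | None = None
        self._lock = asyncio.Lock()

    async def get(self, model: str) -> int | None:
        cache = self._cache
        if cache is None:
            try:
                # Fetch outside the lock so parallel first-callers are not serialized.
                fetched = await self._fetch()
            except (OSError, TimeoutError):
                # Do not cache the failure: the next call will retry.
                return None
            async with self._lock:
                # Lock only around installing the dict; the first writer wins.
                if self._cache is None:
                    self._cache = fetched
                cache = self._cache
        return cache.get(model)
```

  The trade-off is that several concurrent first-callers may each issue their
  own fetch; that duplicates a little work but none of them waits on another.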

Issue 5: race in ClaudeProvider.close() (Critical)
  close() didn't acquire the cache lock and could close the client
  while a concurrent get_max_prompt_tokens() was awaiting models.list().

  Now: drop the _client reference *before* awaiting close(), and
  invalidate _max_input_cache. In-flight requests will error and be
  swallowed by the metadata path's narrow except.

Issue 6: error swallowing too broad (Minor)
  Three layers of `except Exception: return None` (provider x 2 +
  engine) hid genuine bugs.

  Provider methods now catch only SDK/transport errors narrowly
  (AnthropicError | OSError | TimeoutError for Claude; ProviderError |
  OSError | RuntimeError | TimeoutError for Copilot). The engine keeps
  its broad outer catch as the safety net so unexpected provider bugs
  still don't break workflows. Covered by test_unexpected_exception_propagates.

Simplifier cleanups applied:
  - Dropped redundant `if limits else None` ternary in copilot.py
    (getattr with default never raises).
  - Dropped `or {}` and `if matched_id else None` redundancies in
    claude.py get_max_prompt_tokens.
  - Inlined the one-line _model_known() helper.
  - Consolidated 5 Copilot test methods through shared _make_model and
    _provider_with_list_models helpers (~60 lines of duplication
    removed).

Validation:
  - 2029 passed, 9 skipped (1977 + 52 new/updated).
  - ruff check + format pass.
  - ty type check passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion (microsoft#129)

* fix(copilot): pass streaming=True to SDK to prevent tool-call truncation

The Copilot SDK's create_session accepts a 'streaming' parameter that
defaults to false. In non-streaming mode the model must emit its entire
turn (text + tool_use blocks + arguments) under a single per-turn output
budget. For agents that issue large tool-call arguments — most commonly
'create' with multi-KB 'file_text' — that budget is exhausted mid-JSON
and the CLI silently executes the partial tool call (path only, no
file_text). The model sees the tool succeed with no content, retries the
same broken call, and loops indefinitely until the wall-clock session
limit fires (default 1800s). The interactive 'copilot' CLI defaults to
streaming, which is why the same model + tool combination works there
but not via the SDK without this flag.

Empirically verified red→green on the same workflow + model
(claude-opus-4.7-1m-internal, single ~50 KB create tool call):
- Without streaming=True: 9m08s wall-clock failure, 0 bytes written
  (ProviderError: tool 'create' was executing).
- With streaming=True: 4m57s success, 62 KB written in a single
  create call.

Tests:
- tests/test_providers/test_copilot_streaming.py — unit test that
  verifies create_session is called with streaming=True (and that the
  existing required kwargs are preserved).
- tests/test_integration/test_copilot_large_write.py — opt-in
  (real_api marker) regression test that builds a workflow inline,
  asks the writer agent to produce a single large create call, and
  asserts the file is at least 30 KB. Skips automatically when no
  copilot CLI is available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add changelog entry for streaming fix (microsoft#129)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add microsoft#107 and microsoft#109 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add microsoft#100, microsoft#110, microsoft#111, microsoft#139, microsoft#142, microsoft#143, microsoft#144 to unreleased changelog

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…templates in explicit mode (microsoft#119)

* fix(engine): ensure workflow.input is available for script and sub-workflow template rendering in explicit mode

In explicit context mode, build_for_agent() starts with workflow.input: {}
and only populates entries declared in the agent's input: list. Script agents
and sub-workflow input_mapping templates that reference workflow.input.X
without declaring it in their input: list get an empty dict, causing
TemplateError at runtime.

Script args and input_mapping are rendered locally (no LLM cost), so
workflow inputs must always be available for template resolution regardless
of context mode. This fix injects the full workflow_inputs into the template
context after build_for_agent() for script and workflow agent types.

Fixes: script agents in explicit mode failing with
  TemplateError: 'dict object' has no attribute '<field>'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(engine): centralize local-render workflow.input carve-out in build_for_agent

Addresses three review threads on PR microsoft#119:

1. Centralize the explicit-mode injection.
   The previous fix duplicated 'workflow.input' injection at two
   build_for_agent call sites in _execute_loop. There are three other
   call sites (LLM dispatch, parallel-group worker, for-each item) that
   would have drifted as soon as a script/workflow agent ever reached
   them. Move the carve-out into build_for_agent itself, gated by a new
   agent_type parameter and a _LOCAL_RENDER_AGENT_TYPES constant. All
   five call sites now pass agent_type and the behavior is defined in
   one place.

2. Re-justify the carve-out.
   The original 'no LLM cost' rationale didn't carve out workflow.input
   from agent outputs — by that argument, '{{ planner.output.X }}' in a
   script's args would also need to always render. Reframe the rule as:
   workflow.input is the workflow's external interface (set once at
   startup, present for the lifetime of the run); per-step agent
   outputs remain explicitly declared in input: for traceability, even
   for local renders. Keeps the scope conservative and additive — we
   can broaden to outputs later with zero compatibility risk if
   input_mapping users hit the friction.

3. Test portability.
   Replace 'pwsh' (not installed on standard macOS / many Linux CI
   runners) with sys.executable + 'python -c', matching the convention
   established in tests/test_engine/test_script_workflow.py.

Also adds direct unit tests on build_for_agent for the new agent_type
parameter: script and workflow types get full workflow.input; agent
outputs are still filtered for them; LLM agents and human_gate types
are unchanged.
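
A simplified sketch of the carve-out for explicit mode only. The function name and the use of bare input names in `declared` are assumptions (the real `input:` list uses references, and the real logic lives inside build_for_agent); `_LOCAL_RENDER_AGENT_TYPES` is the constant named above, with its membership inferred from this commit message.

```python
from __future__ import annotations

from typing import Any

# Agent types whose templates are rendered locally rather than sent to an LLM.
_LOCAL_RENDER_AGENT_TYPES = frozenset({"script", "workflow"})


def visible_workflow_inputs(
    workflow_inputs: dict[str, Any],
    declared: set[str],
    agent_type: str,
) -> dict[str, Any]:
    """Return the workflow.input entries an agent sees in explicit mode."""
    if agent_type in _LOCAL_RENDER_AGENT_TYPES:
        # Local renders always get the workflow's full external interface.
        return dict(workflow_inputs)
    # Other agent types see only what they declared in their input: list.
    return {name: value for name, value in workflow_inputs.items() if name in declared}
```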

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icrosoft#122)

* feat(engine): auto-parse script agent JSON stdout into output fields

When a script agent's stdout is valid JSON, the parsed fields are now
merged into the output dict alongside stdout/stderr/exit_code. This
makes parsed fields accessible in route conditions and downstream
templates as output.field_name.
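
A minimal sketch of the merge, assuming a hypothetical helper name and assuming that only JSON objects are merged while arrays and scalars leave the output unchanged (the source does not spell out that case here):

```python
import json
from typing import Any, Dict


def build_script_output(stdout: str, stderr: str, exit_code: int) -> Dict[str, Any]:
    """Combine the built-in script fields with any JSON object printed on stdout."""
    output: Dict[str, Any] = {"stdout": stdout, "stderr": stderr, "exit_code": exit_code}
    try:
        parsed = json.loads(stdout)
    except json.JSONDecodeError:
        return output  # non-JSON stdout: built-in fields only
    if isinstance(parsed, dict):
        # Parsed keys are merged last, so they can shadow the built-in fields.
        output.update(parsed)
    return output
```

A script that prints `{"state": "ready"}` would then expose `output.state` next to `output.stdout`.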

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(engine): log when script JSON output shadows built-in fields

Add debug-level logging when a script's parsed JSON output contains keys
that shadow the built-in stdout/stderr/exit_code fields. Makes the
intentional shadowing behavior observable for debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(engine): expand script JSON stdout coverage; portability + cleanup

Address PR microsoft#122 review feedback from @jrob5756:

- Narrow exception to json.JSONDecodeError (drop the broader ValueError;
  JSONDecodeError is a subclass of ValueError, so catching both was redundant).
- Move test from TestWorkflowEngineContextModes to dedicated
  TestScriptJsonStdout class in test_script_workflow.py.
- Swap pwsh for sys.executable to match repo convention and unbreak
  local dev on macOS / minimal Linux.
- Add coverage for documented behaviors so they don't silently regress:
  non-JSON stdout, JSON arrays/scalars (parametrized), shadowing, empty.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ults (microsoft#123)

* fix(engine): use type-appropriate zero values for optional input defaults

Optional workflow inputs without an explicit `default:` previously
defaulted to Python None, which renders as "None" in templates and
isn't caught by Jinja's `| default()` filter without the boolean flag.

Now uses type-appropriate zero values: "" for string, 0 for number,
false for boolean, [] for array, {} for object. This ensures templates
render cleanly without requiring `| default()` guards or `if X else Y`
workarounds.

Explicit `default:` values in the schema are still honored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(engine): avoid mutable shared defaults in _TYPE_ZERO_VALUES

Replace shared [] and {} instances in the class-level _TYPE_ZERO_VALUES
dict with a _zero_value_for_type() method that returns fresh copies for
mutable types (array, object). Prevents potential shared-state bugs if
a caller ever mutates the returned default.
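
The fresh-copy behavior can be sketched as a plain function. This is an illustration rather than the engine's `_zero_value_for_type()` method; returning None for an unrecognized type is an assumption.

```python
from typing import Any


def zero_value_for_type(input_type: str) -> Any:
    """Return the type-appropriate zero value for an optional input."""
    if input_type == "array":
        return []  # new list on every call, never a shared instance
    if input_type == "object":
        return {}  # new dict on every call, never a shared instance
    # Immutable zero values are safe to share.
    return {"string": "", "number": 0, "boolean": False}.get(input_type)
```

Building the list and dict inside the function body is what guarantees that a caller mutating one default cannot affect another input's default.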

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(engine): replace pwsh e2e test with parametrized unit tests on _apply_input_defaults

Addresses review feedback on PR microsoft#123:
- Drops subprocess + pwsh dependency (other script tests use sys.executable)
- Covers all 5 InputDef.type values (string, number, boolean, array, object),
  not just string and number
- Adds invariant test on _zero_value_for_type that mutable types (array,
  object) return fresh instances — guards against a future 'optimization'
  caching the instances and reintroducing the shared-mutable-default bug
- Adds explicit-default-honored, provided-value-passthrough, and
  required-input-untouched cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… variables (microsoft#121)

* feat(engine): add workflow.dir, workflow.file, workflow.name template variables

Adds three new template variables available in all agent contexts:
- {{ workflow.dir }}  - absolute path to the workflow YAML's directory
- {{ workflow.file }} - absolute path to the workflow YAML file
- {{ workflow.name }} - workflow name from the YAML config

These enable script agents to resolve co-located scripts relative to
the workflow file rather than CWD, which is critical for registry-based
workflows where scripts live alongside the YAML in the registry directory.

Example:
  args:
    - -File
    - "{{ workflow.dir }}/scripts/detect-state.ps1"

Available in all context modes (accumulate, last_only, explicit).
Empty strings are omitted from context to avoid polluting templates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(engine): repopulate workflow metadata on context restore

Resume drops workflow_dir/file/name because WorkflowContext.from_dict()
omits absolute path metadata (intentionally, to keep checkpoints portable).
Without this, set_context() during the resume path silently wipes the
workflow.dir/file/name template variables that the rest of this PR exists
to provide.

set_context() now repopulates from self.workflow_path and self.config —
the engine knows the current path and is the source of truth.

Tests:
- test_set_context_repopulates_workflow_metadata: round-trip + assert
  metadata survives, including end-to-end via build_for_agent
- test_set_context_without_workflow_path_still_sets_name: path-less case
- test_engine_populates_workflow_metadata: wiring guard for __init__
- test_engine_workflow_metadata_empty_without_path: no-pollution guard

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Daniel Green <dangreen@github.com>
Co-authored-by: Jason Roberts <jasont.robert@gmail.com>
…soft#125)

* feat(validate): catch template reference errors before runtime

Enhanced `conductor validate` to scan Jinja2 templates in agent prompts,
script args, input_mapping, and workflow output for reference errors:

Level 1 — Template reference resolution:
- {{ X.output.Y }} where X is not a valid agent name → error
- {{ workflow.input.X }} where X is not a declared input → error
- In explicit mode, LLM agents referencing agent outputs not in their
  input: list → warning

Also wires semantic validation (validate_workflow_config) into the CLI
`conductor validate` command, which previously only ran Pydantic schema
validation.

Catches errors like: stale agent name references after renames, missing
workflow input declarations, and undeclared dependencies in explicit mode
— all before runtime, at zero cost.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(validate): use Jinja2 AST instead of regex; remove dead code; add tests

Address review feedback on PR microsoft#125 from @jrob5756.

Replace regex-based template scanning with Jinja2's parser + AST walk.
The previous `_TEMPLATE_REF_PATTERN` matched any `ident.attr.attr`
chain in the raw template source, producing two classes of false
positives the reviewer flagged:

  - Loop variables: `{% for r in pg.outputs %}{{ r.output.text }}{% endfor %}`
    triggered "stale agent reference: r" because the regex doesn't
    track Jinja2 scopes.
  - String literals: `{{ "agent.output.text" }}` triggered the same
    error because the regex doesn't distinguish strings from code.

The new `_extract_template_refs` parses the template, calls
`meta.find_undeclared_variables` to get only the truly free names,
then walks `Getattr` chains rooted at those names. Loop variables,
`{% set %}` bindings, and macro params are excluded automatically
because they're declared inside the template.

Also remove dead code:
  - The `input_mapping` collection block in `_collect_template_strings`
    referenced an `AgentDef` field that doesn't exist (issue microsoft#101 hasn't
    landed). `getattr` always returned `None`.
  - `_resolve_prompt_file` had no callers.
  - The vestigial `has_full_workflow_input` flag — bare `workflow.input`
    is no longer a valid input declaration after the INPUT_REF_PATTERN
    tightening.

Add comprehensive tests:
  - `TestExtractTemplateRefs`: 12 unit tests covering loop vars, string
    literals, `{% set %}`, macros, builtins, malformed templates, and
    unknown filters/tests (the AST walk uses a tolerant filter map).
  - `TestInputRefPatternExtensions`: parametrized tests for `.errors`
    and bare `.outputs` that the PR added support for.
  - `TestTemplateReferenceValidation`: end-to-end stale-ref detection
    plus regression tests for both false-positive classes.
  - `TestExplicitModeWarnings`: warning emission for missing inputs
    in `context.mode: explicit`.
  - `TestOutputTemplateValidation`: workflow `output:` template checks.
  - `TestExamplesRegression`: loops over `examples/*.yaml` to prove
    no example breaks.
  - `TestSemanticValidationIntegration` in test_validate.py: 5 CLI-level
    tests covering exit codes, warning rendering, and false-positive
    non-blocking.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(validate): restore input_mapping template collection

PR microsoft#109 (closing microsoft#101) merged input_mapping onto AgentDef on main. The
previous commit dropped input_mapping handling on the assumption that
the field was nonexistent — true at the time of the PR's original
review, but no longer.

Restore the collection block (using getattr for forward-compat with
branches that haven't merged microsoft#109 yet), and add tests via a duck-typed
agent so coverage works regardless of whether the schema field is
present on this branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: format test_validator.py to pass ruff format check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ity (microsoft#113)

* feat: breadcrumb navigation and depth-isolated subworkflow rendering

Adds full subworkflow awareness to the per-run web dashboard:

State pollution fix:
- Added wf_depth counter to workflow store — only depth-0
  workflow_started initializes root context. Inner workflow events
  are routed to isolated SubworkflowContext objects.
- Each subworkflow invocation gets its own nodes/routes/agents maps,
  keyed by (parentAgent, iteration). Repeated runs of the same
  subworkflow no longer share state.

Subworkflow event handling:
- Added TypeScript types for subworkflow_started, subworkflow_completed,
  subworkflow_failed events (mirrors engine emit).
- Event handlers create/update child contexts and track the active
  context path for routing subsequent events.

Breadcrumb navigation:
- New BreadcrumbBar component shows the context stack above the graph
  (e.g., Root > twig-sdlc-planning > plan-issue).
- Click any breadcrumb to navigate to that context level.
- Double-click a workflow agent node in the graph to dive into its
  subworkflow context.
- Graph rebuilds automatically when context changes.

Context stack architecture:
- SubworkflowContext[] tree structure mirrors workflow nesting.
- activeContextPath tracks where live events are routed.
- viewContextPath tracks what the user is viewing (independent).
- getViewedContext() returns the correct nodes/routes for rendering.
- All event handlers use activeTarget() helper to route to the
  correct context's nodes/groupProgress.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: subworkflow node visual, detail panel, and context-aware rendering

Phase 4-5 of breadcrumb navigation feature:

WorkflowNode component:
- New node type for type:'workflow' agents with dashed border and
  Layers icon (visually distinct from regular agent nodes).
- Shows child workflow name, elapsed time, and a chevron indicator
  when a SubworkflowContext exists.
- Double-click to navigate into the subworkflow graph.

SubworkflowDetail panel:
- New detail component shown when a workflow agent is selected.
- Lists all subworkflow runs for that agent with status, agent
  count, and cost summary.
- Click any run to navigate into its context.

Context-aware rendering:
- All graph node components (AgentNode, ScriptNode, GateNode,
  GroupNode, AnimatedEdge) now read from getViewedContext().nodes
  instead of root state.nodes — ensures correct status display
  when viewing child contexts.
- DetailPanel reads from viewed context for node lookup.
- GroupDetail reads groupProgress from viewed context.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace getViewedContext() selectors with stable hooks

getViewedContext() creates a new object on every call, causing infinite
re-render loops (React error #185) when used inside Zustand selectors.

New hooks in use-viewed-context.ts use useMemo with stable state
references:
- useViewedNodes() — nodes map for current context
- useViewedGroupProgress() — group progress for current context
- useViewedHighlightedEdges() — edge highlights
- useViewedSubworkflowContexts() — child contexts
- useViewedGraphData() — full graph data for WorkflowGraph

All graph components and detail panels updated to use these hooks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Stop button now reliably stops subworkflows

When a subworkflow agent is running, the shared interrupt_event was
being silently consumed by _check_interrupt() in web mode (line 857:
'if self._web_dashboard is not None: return None'). This meant Stop
would pause the current agent but then resume and continue — the
interrupt never propagated to the parent engine.

Two fixes:

1. _check_interrupt(): when _subworkflow_depth > 0, raise
   InterruptError instead of silently consuming the interrupt.
   This unwinds the child engine back to the parent's
   _execute_subworkflow try/except, stopping the workflow.

2. _handle_web_pause(): in subworkflows, also watch interrupt_event
   alongside resume/kill/disconnect events. A second Stop click
   while an agent is paused now raises InterruptError immediately,
   without requiring Resume first.

Root-level (depth 0) behavior is unchanged — Stop still pauses the
current agent with Resume/Kill options in the dashboard.
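
The depth check in fix 1 can be sketched roughly as follows. This is an illustration under assumptions, not the engine's code: `EngineSketch` is a hypothetical stand-in, and whether the real method clears the event before raising is a guess.

```python
import asyncio


class InterruptError(Exception):
    """Stand-in for the engine's interrupt exception."""


class EngineSketch:
    def __init__(self, subworkflow_depth, web_dashboard=None):
        self._subworkflow_depth = subworkflow_depth
        self._web_dashboard = web_dashboard
        self.interrupt_event = asyncio.Event()

    def _check_interrupt(self):
        if not self.interrupt_event.is_set():
            return None
        self.interrupt_event.clear()  # assumption: the signal is consumed here
        if self._subworkflow_depth > 0:
            # Child engines propagate Stop so the parent can unwind them.
            raise InterruptError("stop requested inside subworkflow")
        if self._web_dashboard is not None:
            # Root in web mode: consume silently; pausing happens elsewhere.
            return None
        return "interrupted"
```

The ordering matters: the depth check has to come before the web-mode early return, otherwise a child engine in web mode would still swallow the interrupt.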

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: guard against proactor accept-loop race on Windows (Python 3.14+)

On Windows with Python 3.14+, the proactor event loop's accept callback
can fire after Server.close() sets _sockets = None during shutdown,
causing an AssertionError in base_events.py:_attach that crashes the
workflow process.

Fix:
- Add _guarded_serve() wrapper that catches AssertionError when the
  uvicorn server is in shutdown state (should_exit = True)
- Install a custom event-loop exception handler during server lifetime
  that suppresses the same race when it surfaces through callbacks
- _is_proactor_shutdown_race() validates: AssertionError type, server
  shutdown state, and asyncio-originating traceback frames
- Restore original exception handler in stop()

The guard is narrowly scoped: only AssertionError during server shutdown
is suppressed. All other exceptions delegate to the original handler.

Tests: 9 new tests covering the race detection, exception handler
delegation, guarded serve behavior, and edge cases.
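
A rough sketch of the three-part check and the delegating handler described above. The function names and the directory-name heuristic for "asyncio-originating" frames are assumptions made for illustration; the actual implementation may inspect the traceback differently.

```python
import traceback
from pathlib import PurePath


def originates_in_asyncio(filenames):
    """True if any traceback frame lives in a directory named 'asyncio'."""
    return any(PurePath(name).parent.name == "asyncio" for name in filenames)


def is_proactor_shutdown_race(exc, server_should_exit):
    """Check all three conditions before treating the error as the known race."""
    if not isinstance(exc, AssertionError):
        return False
    if not server_should_exit:
        return False
    frames = traceback.extract_tb(exc.__traceback__)
    return originates_in_asyncio(frame.filename for frame in frames)


def make_exception_handler(original_handler, should_exit):
    """Build a loop exception handler; should_exit is a zero-argument callable."""
    def handler(loop, context):
        exc = context.get("exception")
        if exc is not None and is_proactor_shutdown_race(exc, should_exit()):
            return  # suppress only the shutdown race
        if original_handler is not None:
            original_handler(loop, context)
        else:
            loop.default_exception_handler(context)
    return handler
```

Such a handler could be installed with `loop.set_exception_handler(...)` after saving `loop.get_exception_handler()`, so that the original can be restored on stop.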

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): clear stale graph edges when navigating between workflow layers

When navigating between subworkflow layers via breadcrumbs or double-click,
old React Flow edges from the previous layer persisted as floating links
disconnected from any visible nodes.

Two fixes:
- WorkflowGraph: explicitly clear nodes and edges when switching to an
  empty context (subworkflow data not yet populated)
- graph-layout: filter edges against the actual node ID set to prevent
  orphan edges from routes referencing non-existent nodes
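
The edge filter in the second fix amounts to a set-membership check. The dashboard code is TypeScript; the Python below only illustrates the logic, and the `source`/`target` dict shape and the function name are assumptions.

```python
def filter_orphan_edges(edges, node_ids):
    """Keep only edges whose endpoints both exist in the current node set."""
    known = set(node_ids)
    return [e for e in edges if e["source"] in known and e["target"] in known]


edges = [
    {"source": "plan", "target": "review"},
    {"source": "review", "target": "ghost"},  # 'ghost' is not a rendered node
]
visible = filter_orphan_edges(edges, ["plan", "review"])
```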

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(web): add URL query param deep-linking for agent and subworkflow nodes

Parse ?agent={name} and ?subworkflow={name} query params on initial load
to auto-select and center the matching node in the workflow graph. This
enables the meta-dashboard (conductor-dashboard) to generate clickable
breadcrumb links that open the conductor UI focused on a specific node.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): support nested subworkflow paths and combined agent deep-links

Update useDeepLink hook to:
- Parse slash-separated subworkflow paths (e.g., ?subworkflow=planning/design)
  for navigating multiple levels deep into nested subworkflows
- Support combined ?subworkflow=X&agent=Y to select an agent within
  a subworkflow context
- Remove dependency on subworkflowContexts selector (array mutation
  doesn't trigger re-renders); rely on late-joiner replay instead

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: add dashboard deep-link specification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): rewrite deep-link hook for reliability + error feedback

Rewrote useDeepLink to fix timing issues that prevented navigation:

- Use zustand.subscribe() instead of useEffect + selector reactivity.
  The old approach relied on subworkflowContexts selector changes,
  but the store mutates the array in-place during processEvent,
  so the selector never detected changes.
- Resolve the full subworkflow path in one shot via index walking
  instead of calling navigateIntoSubworkflow() in a loop.
- Set viewContextPath directly via setState instead of relying on
  action functions that might see stale state.
- Add error banner when deep-link target is invalid: shows the error
  message and a link back to the root dashboard.
- Validate agent exists in the target context's agent list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(web): add ingress/egress nodes for sub-workflow views

When viewing a sub-workflow graph, the start and end nodes are now
replaced with distinct ingress/egress nodes that:
- Use a dashed border with rounded-xl style to visually distinguish
  them from regular start/end nodes
- Display 'From <parent agent>' on the ingress node
- Display 'Return to <parent agent>' on the egress node
- Navigate back to the parent workflow on double-click

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* build: rebuild frontend static assets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): route nested sub-workflow events by engine-supplied path

The breadcrumb navigation feature could not reach gates inside nested
sub-workflows: the dashboard's view path did not follow the active
context as a parent agent spawned a child workflow, so a human_gate
inside that child was unreachable from the breadcrumb tree.

The root cause: the store inferred sub-workflow parentage from the
shared activeContextPath, then mutated it on every subworkflow_started.
That conflated the user's view cursor with the engine's execution
cursor and made events impossible to route correctly under any
concurrency.

Engine: each WorkflowEngine now carries _dashboard_context_path, the
slot-key path identifying its position in the recursive sub-workflow
tree (root = []). _execute_subworkflow accepts a slot_key and threads
[*parent_path, slot_key] into the spawned child engine. _emit auto-
stamps subworkflow_path on every event a sub-engine produces.
Sequential subworkflow_started/_completed/_failed include parent_path
and slot_key.

Frontend: subworkflow_started/completed/failed and workflow_completed/
failed handlers resolve the owning context strictly from engine-
supplied parent_path / subworkflow_path via a new resolveSlotPath
helper, instead of mutating a single shared activeContextPath.
activeTarget consults subworkflow_path on event data when present so
per-iteration agent_message/tool/turn events land in the right per-
iteration context. viewContextPath sticks to the live edge when the
user has not navigated away, so newly-spawned gates inside sub-
workflows are reachable without manual breadcrumb clicks. Handlers
fall back to legacy behavior when these new fields are absent.

SubworkflowDetail navigates by ctx.slotKey (stable across iteration
reorders) instead of (agent_name, iteration). Breadcrumbs render slot
keys so concurrent iterations are distinguishable.

Adds TestSubWorkflowDashboardPath covering parent_path/slot_key
emission for sequential sub-workflows and the auto-stamped
subworkflow_path on the child workflow_completed event.

This is a self-contained slice of a broader fix that also addresses
concurrent for_each-of-workflow stacking; the matching for_each-side
emit changes are staged in microsoft#110 and become functional once both PRs
land. Order of merge does not matter — handlers degrade gracefully
when events lack the new fields.
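
The path threading on the engine side can be sketched as below. This is a simplified model under assumptions: `EngineSketch` and its list-based `sink` are hypothetical, and the real `_execute_subworkflow` runs an actual child workflow rather than emitting a single event.

```python
class EngineSketch:
    def __init__(self, sink, dashboard_context_path=()):
        self._sink = sink  # shared list standing in for the event emitter
        self._dashboard_context_path = list(dashboard_context_path)  # root = []

    def _emit(self, event_type, data=None):
        payload = dict(data or {})
        if self._dashboard_context_path:
            # Only sub-engines stamp their position in the tree.
            payload.setdefault("subworkflow_path", list(self._dashboard_context_path))
        self._sink.append((event_type, payload))

    def _execute_subworkflow(self, slot_key):
        location = {
            "parent_path": list(self._dashboard_context_path),
            "slot_key": slot_key,
        }
        self._emit("subworkflow_started", location)
        child = EngineSketch(self._sink, [*self._dashboard_context_path, slot_key])
        child._emit("workflow_completed")
        self._emit("subworkflow_completed", location)
        return child
```

Since every event carries its own path, a consumer can route it without consulting any shared cursor, which is the property the frontend change relies on.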

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(web): support for_each iteration deep-link notation

Add flexible segment matching for subworkflow deep-links:
  1. Exact slotKey match (e.g. plan_child[item-0])
  2. Positional index (e.g. plan_child#0, 0-based)
  3. Bare agent name (when unambiguous)

Ambiguous bare names (multiple for_each iterations) now produce
an actionable error listing valid alternatives instead of silently
failing.
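
The matching order above could look roughly like this. The hook itself is TypeScript; this Python sketch only illustrates the precedence, and the context shape (`slotKey`, `agent`), the function name, and the error wording are assumptions.

```python
import re


def match_segment(segment, contexts):
    """Resolve one deep-link path segment against sibling contexts."""
    # 1. Exact slotKey match.
    for ctx in contexts:
        if ctx["slotKey"] == segment:
            return ctx
    # 2. Positional index, e.g. plan_child#0 (0-based).
    positional = re.fullmatch(r"(.+)#(\d+)", segment)
    if positional:
        name, index = positional.group(1), int(positional.group(2))
        siblings = [c for c in contexts if c["agent"] == name]
        if index < len(siblings):
            return siblings[index]
        raise LookupError(f"'{name}' has no iteration at index {index}")
    # 3. Bare agent name, accepted only when unambiguous.
    candidates = [c for c in contexts if c["agent"] == segment]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise LookupError(f"'{segment}' not found")
    options = ", ".join(c["slotKey"] for c in candidates)
    raise LookupError(f"'{segment}' is ambiguous; try one of: {options}")


contexts = [
    {"slotKey": "plan_child[item-0]", "agent": "plan_child"},
    {"slotKey": "plan_child[item-1]", "agent": "plan_child"},
    {"slotKey": "review", "agent": "review"},
]
```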

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): remap group-child edges to parent node to prevent floating arrows

Edges whose source or target was a child node inside a parallel group
rendered at incorrect absolute positions (React Flow positions children
relative to the parent group). This caused orphaned arrowheads in the
top-left corner of the graph.

Fix: remap any edge endpoint that references a group child to the
parent group node instead, so dagre can properly route the edge and
React Flow draws it at the correct position.
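
A sketch of the endpoint remap, written in Python purely to illustrate the idea (the graph-layout code is TypeScript). The `parent_of` mapping and the choice to drop edges that collapse onto a single group are assumptions.

```python
def remap_group_children(edges, parent_of):
    """Point edge endpoints at the parent group when they reference a child."""
    remapped = []
    for edge in edges:
        source = parent_of.get(edge["source"], edge["source"])
        target = parent_of.get(edge["target"], edge["target"])
        if source != target:  # assumption: skip edges internal to one group
            remapped.append({**edge, "source": source, "target": target})
    return remapped


parent_of = {"lint": "checks", "test": "checks"}
edges = [
    {"source": "plan", "target": "lint"},
    {"source": "test", "target": "deploy"},
    {"source": "lint", "target": "test"},
]
result = remap_group_children(edges, parent_of)
```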

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): auto-fit viewport on context switch

Add FitViewOnContextSwitch component that triggers fitView when
navigating between sub-workflow contexts, preventing stale viewport
positioning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): reset viewContextPath for agent-only deep-links

When ?agent= is provided without ?subworkflow=, explicitly pin
viewContextPath to root ([]) before the agent lookup runs. This
prevents the sticky-follow mechanism from advancing the view into
a stale subworkflow/for_each iteration during WS replay, which
caused 'Agent not found' errors for root-level agents.

Also handles ?subworkflow=&agent=foo (empty subworkflow string)
correctly — empty string is falsy so falls through to the
agent-only reset path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): route child workflow_started by engine slot path

Under concurrent for_each iterations, the global activeContextPath is
advanced by each subworkflow_started event. When workflow_started for
an earlier iteration's child engine arrived, it was resolved against
the (now-stale) activeContextPath, so its agents/routes were written
to the wrong sibling ctx — or got overwritten by a later sibling.
That manifested as phantom routes (and missing routes) when the user
navigated into a deeply-nested for_each iteration: routes from one
iteration would surface in another, leaving edges drawn between
agents that don't appear in the iteration's view.

workflow_completed and workflow_failed already used the engine-
supplied subworkflow_path slot key (commit f77b20b); workflow_started
was the one handler missed. This change makes it consistent with
its sibling handlers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): bail on slot-resolution miss in subworkflow_completed/failed

Both handlers used `resolved?.indexPath ?? []` as a fallback when
resolveSlotPath returned null. That silently routed the event to the
root context, with two distinct corruption modes:

  1. `targetNodes` defaulted to state.nodes (root), so a same-named
     root agent had its status overwritten to 'completed' or 'failed'
     by an unrelated subworkflow event.
  2. state.activeContextPath = parentIndexPath then unconditionally
     reset the active path to [], breaking subsequent sibling routing.

Mirror the guard that subworkflow_started already uses (line 1365):
when resolveSlotPath returns null, return early. A null result means
the event arrived before its sibling subworkflow_started or the path
is inconsistent — neither warrants polluting root state.

Reported-by: jrob5756 (PR microsoft#113 review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(engine): cover subworkflow stop/interrupt and dashboard event payloads

Adds coverage for code paths that previously had none, all introduced
by the breadcrumb-navigation PR:

  TestCheckInterruptSubworkflow
    - test_check_interrupt_raises_in_subworkflow
        Sub-engine in web mode propagates the interrupt as
        InterruptError so the child unwinds back to the parent.
    - test_check_interrupt_consumed_silently_at_root
        Regression guard for the root-depth web-mode silent-consume
        branch (the partial-output handler does the real pausing).

  TestHandleWebPauseSubworkflow
    - test_handle_web_pause_stop_event_in_subworkflow
        Pause-then-Stop in a sub-workflow exits via interrupt_event
        without requiring Resume first.
    - test_handle_web_pause_root_ignores_interrupt_event
        Documents the intentional root-vs-subworkflow asymmetry: at
        root, only Resume or Kill exit a pause.

  TestSubWorkflowDashboardPath (additions)
    - test_subworkflow_failed_event_carries_parent_path_and_slot_key
        The exception branch of _execute_subworkflow emits
        subworkflow_failed with parent_path and slot_key. Previously
        only the success-path emit was asserted.
    - test_nested_subworkflow_path_accumulates
        At depth >= 2 (parent -> mid -> leaf), each engine emits its
        own workflow_completed and the auto-stamped subworkflow_path
        chains correctly across nesting levels.
    - test_concurrent_for_each_subworkflow_emits_distinct_slot_keys
        Existing for_each test ran with max_concurrent=1; this variant
        uses max_concurrent=3 so iterations actually overlap, proving
        slot_key uniqueness is not an artifact of serial execution.

Reported-by: jrob5756 (PR microsoft#113 review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(engine): document root-vs-subworkflow pause/stop asymmetry

Adds an inline comment in _handle_web_pause explaining:

  1. Why root depth deliberately omits the interrupt_event subscription
     (pause exits only on Resume or Kill).
  2. Why subworkflow depth subscribes to it (so Stop unwinds the child
     engine without requiring Resume first).
  3. The tiny clear()/create_task() race window where a Stop click can
     be silently discarded, and the trade-off vs. carrying a stale Stop
     signal across pause cycles.

No behavior change. Documents an intentional design choice flagged in
PR microsoft#113 review.

Reported-by: jrob5756 (PR microsoft#113 review)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* build: rebuild frontend static assets after fix-chain restoration

Rebuilds the production bundle so it incorporates the
subworkflow_completed/failed slot-resolution fix (f374a19) that was
cherry-picked into this branch. Without this rebuild the committed
`static/` bundle would not reflect the source change and the dashboard
would still exhibit the bug at runtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(web): transitive agent search for agent-only deep-links

The previous fix (593f4e1) reset viewContextPath to root for agent-only
deep-links to stop sticky-follow from stranding the user inside a stale
for_each iteration during replay. That solved the false-negative case
(agent at root) but introduced a false-negative for nested agents: if
the agent only exists inside a sub-workflow / for_each iteration, the
resolver would now error with 'Agent X not found in root workflow.'
That broke external integrations (e.g. notification feeds) that surface
?agent=X without knowing the parent slot chain.

Resolution algorithm now:
  - subworkflow=foo&agent=bar: navigate to foo, look for bar there;
    if bar isn't at foo but exists elsewhere, list discovered locations
    in the error so the next click is obvious.
  - agent=bar (no subworkflow): try root first; otherwise walk every
    sub-workflow context. On exactly one match: navigate there. On
    many matches (e.g. bar ran in every for_each iteration): pick
    running > deepest > newest, mirroring the engine's live-event
    routing precedence.
  - Zero matches anywhere: deterministic error, view pinned to root
    so sticky-follow still doesn't strand.
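
The running > deepest > newest tie-break can be expressed as a single sort key. This Python sketch illustrates the ordering only; the match fields (`running`, `path`, `started_at`) and the function name are assumptions, not the resolver's actual data model.

```python
def pick_best_match(matches):
    """Prefer a running context, then the deepest path, then the newest start."""
    return max(
        matches,
        key=lambda m: (m["running"], len(m["path"]), m["started_at"]),
    )


matches = [
    {"running": False, "path": ["plan", "design"], "started_at": 30},
    {"running": True, "path": ["plan"], "started_at": 10},
    {"running": True, "path": ["plan", "design"], "started_at": 20},
]
best = pick_best_match(matches)
```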

Also reworks the wait condition: instead of firing on the first
state change after agents.length > 0 (which races against WS replay
of nested subworkflow_started events), the resolver debounces 200ms
of state quiescence and only applies once the target is resolvable
or the workflow has reached a terminal state. A 5s hard cap keeps
live-workflow deep-links from hanging when the target never appears.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Green <dangreen@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep both dialog features (from PR microsoft#130) and breadcrumb/subworkflow
features (from main via microsoft#113). Conflict resolution:

- schema.py: keep both dialog + max_depth validations for human_gate/script
- claude.py: keep _max_input_cache reset in close() AND execute_dialog_turn method
- DetailPanel.tsx: import both DialogDetail and SubworkflowDetail
- workflow-store.ts: include both dialog state + subworkflow state in all reset blocks
- events.ts: include both Dialog* and Subworkflow* event interfaces
- test_claude.py / test_copilot.py: keep both dialog turn + max prompt tokens test classes
- static assets: rebuilt from merged source

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PolyphonyRequiem PolyphonyRequiem merged commit 664a0c4 into feat/agent-dialog-mode-clean May 4, 2026