merge: resolve conflicts with main #5
Merged
PolyphonyRequiem merged 16 commits into feat/agent-dialog-mode-clean from May 4, 2026
…rosoft#100) * fix(copilot): infer nested prompt schema from output definitions Build the Copilot prompt schema recursively from agent output definitions so nested object properties and array item schemas are surfaced to the model and parse-recovery flow. Add regression tests for recursive schema generation and the actual prompt sent to Copilot. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(copilot): add recursion depth guard to prompt schema builders Prevent RecursionError on pathologically deep output schemas by adding a depth parameter to _build_prompt_schema, _build_prompt_field_schema, and _build_prompt_item_schema, matching the existing pattern in claude.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(tests): line-too-long (E501) linting errors * style: ruff format copilot.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Lester Sanchez <lester@dotnetting.net> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Jason Robert <jasont.robert@gmail.com>
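The recursion-with-depth-guard idea in these two commits can be sketched as follows. This is a minimal illustration, not the code in copilot.py: the function names, the dict-based field layout (`type`, `properties`, `items`), and the `MAX_SCHEMA_DEPTH` value are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Any

MAX_SCHEMA_DEPTH = 10  # hypothetical limit; the commit does not state the real value


def build_prompt_schema(outputs: dict[str, dict[str, Any]], depth: int = 0) -> dict[str, Any]:
    """Describe each output field, recursing into nested objects and arrays."""
    if depth >= MAX_SCHEMA_DEPTH:
        # Stop descending rather than risk RecursionError on very deep schemas.
        return {}
    return {name: build_field_schema(field, depth + 1) for name, field in outputs.items()}


def build_field_schema(field: dict[str, Any], depth: int) -> dict[str, Any]:
    """Describe one field; only the type is kept once the depth limit is hit."""
    schema: dict[str, Any] = {"type": field.get("type", "string")}
    if depth >= MAX_SCHEMA_DEPTH:
        return schema
    if field.get("type") == "object" and "properties" in field:
        schema["properties"] = build_prompt_schema(field["properties"], depth)
    elif field.get("type") == "array" and "items" in field:
        schema["items"] = build_field_schema(field["items"], depth + 1)
    return schema
```

Passing `depth` explicitly keeps the guard local to the builders, so a pathological schema is truncated instead of raising.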
) (microsoft#110) * feat(composition): allow sub-workflows in for_each groups (microsoft#102) Remove validator restriction blocking type='workflow' in for_each groups. Wire execute_single_item() to call _execute_subworkflow_with_inputs() for workflow agents, rendering input_mapping with loop variables in scope. - Validator: Remove workflow rejection in for_each validation - Engine: Add workflow branch in execute_single_item(), new helper _execute_subworkflow_with_inputs() for pre-built inputs - Tests: Update test_workflow_in_for_each to validate (not reject) - Experimental workflows: test-for-each-workflow parent/child pair Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * style: ruff format workflow.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Emit parent_path/slot_key/iteration on for_each-of-workflow lifecycle For each iteration of a for_each whose member is a sub-workflow, emit subworkflow_started / _completed / _failed with three new fields: - parent_path: snapshot of the parent engine's dashboard context path (empty list when the dashboard infra from microsoft#113 is absent) - slot_key: a unique "<group>[<key>]" identity per iteration so concurrent iterations don't stack under one shared dashboard context - iteration: 1-based ordinal (matches the existing item_key) Conditionally thread `_dashboard_context_path` into the child engine via getattr so this code is forward-compatible with PR microsoft#113 landing in either order. When microsoft#113 has not landed, the kwarg is omitted and behavior is unchanged. When microsoft#113 has landed, the child engine receives [*parent_path, slot_key] and auto-stamps subworkflow_path on every event it emits. Wraps `_execute_subworkflow_with_inputs` in try/except so a failing iteration emits subworkflow_failed before for_each_item_failed; the original exception propagates to keep existing error handling intact. 
Adds a regression test that runs three for_each iterations and asserts each gets a distinct slot_key (batch[0], batch[1], batch[2]) on both started and completed events. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
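A rough sketch of the three new event fields and the conditional path threading described above. The helper names are hypothetical; only `_dashboard_context_path`, `parent_path`, `slot_key`, and `iteration` come from the commit message.

```python
from __future__ import annotations

from typing import Any


def parent_path_of(engine: Any) -> list[str]:
    """Snapshot the parent's dashboard path; empty when the attribute is absent."""
    return list(getattr(engine, "_dashboard_context_path", None) or [])


def subworkflow_event_fields(engine: Any, group: str, key: Any, iteration: int) -> dict[str, Any]:
    """Build the extra fields attached to subworkflow_started/_completed/_failed."""
    return {
        "parent_path": parent_path_of(engine),
        "slot_key": f"{group}[{key}]",  # unique identity per iteration
        "iteration": iteration,  # 1-based ordinal
    }


def child_context_path(engine: Any, group: str, key: Any) -> list[str]:
    """Path a child engine would receive: [*parent_path, slot_key]."""
    return [*parent_path_of(engine), f"{group}[{key}]"]
```

Using `getattr` with a default is what lets the same code run whether or not the dashboard attribute exists on the engine.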
Bumps [postcss](https://github.com/postcss/postcss) from 8.5.6 to 8.5.12.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.5.6...8.5.12)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…output (microsoft#139) * fix(engine): coerce Python literal "True"/"False"/"None" in workflow output Workflow output templates pass through `_maybe_parse_json` to convert JSON-shaped strings back into native types. Previously this only recognized lowercase JSON literals (`true`/`false`/`null`). Jinja expressions like `{{ a == b }}` render Python bool via `str()`, producing `"True"` / `"False"`, which then survived as truthy non-empty strings downstream. Route `when:` clauses comparing such values against `true` / `false` silently misbehaved. Add explicit handling for the three Python literal forms before the existing JSON parse path. Lowercase JSON literals continue to work (regression covered). Tests: 3 new cases under `TestWorkflowEngineOutputTemplates` covering `True`/`False` from `==` / `!=` expressions, `None` from `{{ none }}`, and a regression check for the lowercase forms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test: assert native bool from workflow output, not stringified Two existing integration tests asserted the broken behavior they were exercising: - test_examples.py:214: `result["syntax_passed"] == "True"` - test_parallel_workflows.py:410: `result["success"] == "True"` Both had inline comments acknowledging the workaround (`# Templates return strings`, `# Boolean rendered as string`). With this PR's fix, those values now coerce to native bool. Update the assertions accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
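The literal coercion can be illustrated with a small stand-in for `_maybe_parse_json`. The function below is a sketch under assumptions: the real helper's signature and its rules for what counts as a JSON-shaped string are not shown in the commit, so this version simply tries `json.loads` on everything.

```python
import json
from typing import Any

# Python's str() forms of bool/None, as rendered by Jinja expressions like {{ a == b }}.
_PYTHON_LITERALS = {"True": True, "False": False, "None": None}


def maybe_parse_json(value: str) -> Any:
    """Return a native value for Python or JSON literals, else the original string."""
    if value in _PYTHON_LITERALS:
        # Handle the Python forms first; json.loads would reject them.
        return _PYTHON_LITERALS[value]
    try:
        return json.loads(value)  # lowercase true/false/null still work here
    except json.JSONDecodeError:
        return value
```

With this ordering, a `when:` clause comparing the output against `true` sees a real boolean rather than the truthy string "False".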
…on PATH (microsoft#142)

uv tool install only modifies the current process PATH. New terminals, sub-processes, CI agents, and IDE extensions inherit PATH from the user registry (Windows) or shell rc files (Unix) and would not find conductor.

Run uv tool update-shell after a successful install to add the bin dir to the persistent PATH. The call is idempotent and wrapped so a failure does not abort the install; it falls back to a hint telling the user to run the command manually.

Closes microsoft#115

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acking (microsoft#103) (microsoft#111) * feat(composition): allow self-referential sub-workflows with depth tracking (microsoft#103) Remove the circular reference path check that blocked workflows from referencing themselves. The existing MAX_SUBWORKFLOW_DEPTH=10 already prevents infinite recursion. Add optional per-agent max_depth field for tighter author-controlled bounds. - Engine: Remove self-reference path equality check in both _execute_subworkflow() and _execute_subworkflow_with_inputs() - Engine: Add per-agent max_depth enforcement alongside global limit - Schema: Add max_depth field to AgentDef with validation - Tests: Replace circular reference test with depth-limit tests - Experimental: test-recursive.yaml self-referential countdown Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: unify sub-workflow input handling and report real usage stats Address PR microsoft#111 review feedback: - Extract _build_subworkflow_inputs helper with JSON-parse-with-fallback used by both _execute_subworkflow and for_each workflow paths - Change _execute_subworkflow_with_inputs to return (output, WorkflowUsage) - Emit real token/cost data in for_each_item_completed events instead of hardcoded zeros - Update test expectation to match unified JSON-parse behavior Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
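How a global limit and an optional per-agent `max_depth` might combine is sketched below. Only `MAX_SUBWORKFLOW_DEPTH = 10` and the `max_depth` field come from the commit; the error class, the function names, the use of `min`, and the exact boundary (whether depth equal to the limit is allowed) are assumptions.

```python
from __future__ import annotations

MAX_SUBWORKFLOW_DEPTH = 10  # global limit named in the commit


class SubworkflowDepthError(RuntimeError):
    """Hypothetical error raised when nesting goes too deep."""


def check_subworkflow_depth(current_depth: int, agent_max_depth: int | None = None) -> None:
    """Raise if entering another sub-workflow would exceed the tighter limit."""
    limit = MAX_SUBWORKFLOW_DEPTH
    if agent_max_depth is not None:
        limit = min(limit, agent_max_depth)  # the author can only tighten the bound
    if current_depth >= limit:
        raise SubworkflowDepthError(f"sub-workflow depth {current_depth} reached limit {limit}")


def run_countdown(n: int, depth: int = 0, max_depth: int | None = None) -> int:
    """Toy self-referential workflow: recurse until n hits zero, return final depth."""
    check_subworkflow_depth(depth, max_depth)
    if n <= 0:
        return depth
    return run_countdown(n - 1, depth + 1, max_depth)
```

Because the depth check bounds every chain of self-references, a separate path-equality check is not needed to stop infinite recursion.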
…crosoft#143) * fix(pricing): warn once when get_pricing falls back to fuzzy match When get_pricing() resolves a model name via the longest-prefix or suffix-strip fallback paths, it silently returned a sibling model's ModelPricing — including its context_window — with no log line. Names like "claude-opus-4-1m-internal" inherited claude-opus-4's 200K window even though the suffix suggests 1M, and the dashboard / cost calc treated those numbers as authoritative. This change emits a one-time logging.warning per requested model name when get_pricing() returns a non-exact entry, naming both the requested model and the matched key. Exact matches, overrides, and unknown models (None) do not warn. De-duped via a module-level set so hot-loop callers don't spam logs. Behavior is otherwise unchanged — this is the smallest viable change suggested in microsoft#137 and is fully backward-compatible. Closes microsoft#137 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(pricing): require '-' delimiter on prefix match; remove dead suffix-strip code Tightens the longest-prefix fallback in get_pricing() to require a '-' delimiter after the matched key (`model.startswith(known_model + "-")`). Without the delimiter, names that share a textual prefix with a known key but belong to a different model family — e.g. claude-opus-4.7-high matching claude-opus-4 — silently inherited the wrong context_window and pricing. The four repro names from microsoft#137 now correctly return None and degrade gracefully (dashboard hides the bar; cost is null) rather than reporting confidently wrong data. Real versioned names like claude-sonnet-4-20250514 still match claude-sonnet-4 because the date suffix is preceded by '-'. Also removes the suffix-strip and suffix-strip+longest-prefix branches: they were unreachable because longest-prefix runs first and catches every name they would have simplified. 
Updates the strategy label in fuzzy-match warnings from "longest-prefix" to "versioned-suffix" to reflect the new, narrower matching semantics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…microsoft#144)

* feat(engine): source context_window_max from provider SDKs at runtime

The dashboard's "context window remaining" bar (and any future enforcement) reads context_window_max emitted in agent_started, agent_completed, and parallel_agent_completed events. Until now this value came from a hand-maintained context_window field on ModelPricing entries, but that field reported the *theoretical model max*, not the SDK's actual max_prompt_tokens.

Concrete impact (from the GitHub community discussion #186340 raw API response on 2026-02-04):

Model              context_window  max_prompt  Bar wrong by
gpt-5.x variants   400K            128K        3.1x
claude-opus-4.5    200K            128K        1.6x
claude-opus-4.6    1M              200K        5.0x

This change replaces the static lookup with a per-provider runtime query so the bar always reflects the SDK's actual cap.

- AgentProvider.get_max_prompt_tokens(model) -> int | None
  Concrete default returns None; Copilot and Claude providers override to query their SDKs (cached).
- WorkflowEngine resolves the provider via the same path as execution (single-provider or registry) and prefers output.model over agent.model when an output is available.
- Failures (SDK error, missing client, misconfigured mock) are swallowed and return None; metadata is best-effort and must never block workflow execution.

Cleanup of dead code per scope:
- Removed context_window field from ModelPricing and from all ~30 entries in DEFAULT_PRICING.
- Updated fuzzy-match warning text to drop "context_window".
- Deleted tests/test_providers/test_context_window.py (the entire file tested the static-table lookup that no longer exists).
- New TestGetMaxPromptTokens classes in test_copilot.py and test_claude.py covering known model, unknown model, SDK failure, cache-after-first-call, and mock-mode fallback.
- Updated test_context_window_events.py to inject expected values via a fake get_max_prompt_tokens (mock-handler mode no longer has a static table to fall back to).
Added two new resolution-order tests covering default-model fallback and output.model preference. Pricing data ($/Mtok) is still hand-maintained — neither SDK exposes per-token dollar amounts, so DEFAULT_PRICING still needs entries for new models for cost math. Only the context-window field becomes self-correcting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(providers): address PR review — alias matching, retry chain, narrow exceptions Code review and rubber-duck reviews surfaced six issues on the SDK-driven context window refactor. This commit addresses all of them and applies the simplifier's cleanups. Issue 1: lost alias resolution (Important) Names like "claude-3-5-sonnet-latest" and short aliases used in our own examples (claude-haiku-4.5, claude-sonnet-4.5) silently lost context_window_max because the SDK's models.list() returns dated IDs (claude-3-5-sonnet-20241022), not the alias names users configure. Added match_model_id() in providers/base.py: a small alias-aware matcher (exact -> boundary-prefix in either direction -> suffix-strip retry). Both Copilot and Claude providers route SDK lookups through it. Covered by 11 unit tests in test_base.py. Issue 2: output.model precedence was a string-choice, not a retry chain If output.model was an SDK-unknown variant (e.g. a reasoning-effort tier the provider doesn't list), the resolver returned None instead of falling back to agent.model. WorkflowEngine._get_context_window_for_agent now tries each candidate in order (output.model -> agent.model -> default) and returns the first non-None lookup. Covered by a new test in test_context_window_events.py. Issue 3: lock held across the SDK round-trip (Minor) ClaudeProvider.get_max_prompt_tokens held _max_input_cache_lock while awaiting client.models.list(); parallel first-callers all blocked behind the same SDK call. Refactored: fetch outside the lock, lock only around the dict install. 
Also seed the cache from validate_connection() so callers who go through normal connection setup never pay for the round-trip. Issue 4: transient SDK failure cached forever (Minor) Previous implementation installed an empty cache on exception, so every subsequent call returned None forever. Now: don't cache failures. The cache stays None and the next call retries. Covered by test_sdk_failure_returns_none_and_does_not_cache. Issue 5: race in ClaudeProvider.close() (Critical) close() didn't acquire the cache lock and could close the client while a concurrent get_max_prompt_tokens() was awaiting models.list(). Now: drop the _client reference *before* awaiting close(), and invalidate _max_input_cache. In-flight requests will error and be swallowed by the metadata path's narrow except. Issue 6: error swallowing too broad (Minor) Three layers of `except Exception: return None` (provider x 2 + engine) hid genuine bugs. Provider methods now catch only SDK/transport errors narrowly (AnthropicError | OSError | TimeoutError for Claude; ProviderError | OSError | RuntimeError | TimeoutError for Copilot). The engine keeps its broad outer catch as the safety net so unexpected provider bugs still don't break workflows. Covered by test_unexpected_exception_propagates. Simplifier cleanups applied: - Dropped redundant `if limits else None` ternary in copilot.py (getattr with default never raises). - Dropped `or {}` and `if matched_id else None` redundancies in claude.py get_max_prompt_tokens. - Inlined the one-line _model_known() helper. - Consolidated 5 Copilot test methods through shared _make_model and _provider_with_list_models helpers (~60 lines of duplication removed). Validation: - 2029 passed, 9 skipped (1977 + 52 new/updated). - ruff check + format pass. - ty type check passes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
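Issues 3, 4, and 6 share one caching shape: fetch outside the lock, install under the lock, never cache a failure, and catch only transport-style errors. The class below is a sketch of that shape with hypothetical names; `OSError` and `TimeoutError` stand in for the SDK-specific exceptions listed above, and the model name and token count in the usage are placeholders.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class MaxPromptTokenCache:
    """Caches a model -> max_prompt_tokens map fetched from an SDK-like callable."""

    def __init__(self, fetch: Callable[[], Awaitable[dict[str, int]]]) -> None:
        self._fetch = fetch
        self._cache: dict[str, int] | None = None
        self._lock = asyncio.Lock()

    async def get(self, model: str) -> int | None:
        cache = self._cache
        if cache is None:
            try:
                # The round-trip happens outside the lock so callers don't serialize on it.
                fetched = await self._fetch()
            except (OSError, TimeoutError):
                # Narrow catch; the failure is not cached, so the next call retries.
                return None
            async with self._lock:
                # Lock only around the install; the first writer wins.
                if self._cache is None:
                    self._cache = fetched
                cache = self._cache
        return cache.get(model)
```

Any exception outside the narrow tuple propagates to the caller, which is where a broad outer safety net like the engine's would catch it.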
…ion (microsoft#129) * fix(copilot): pass streaming=True to SDK to prevent tool-call truncation The Copilot SDK's create_session accepts a 'streaming' parameter that defaults to false. In non-streaming mode the model must emit its entire turn (text + tool_use blocks + arguments) under a single per-turn output budget. For agents that issue large tool-call arguments — most commonly 'create' with multi-KB 'file_text' — that budget is exhausted mid-JSON and the CLI silently executes the partial tool call (path only, no file_text). The model sees the tool succeed with no content, retries the same broken call, and loops indefinitely until the wall-clock session limit fires (default 1800s). The interactive 'copilot' CLI defaults to streaming, which is why the same model + tool combination works there but not via the SDK without this flag. Empirically verified red→green on the same workflow + model (claude-opus-4.7-1m-internal, single ~50 KB create tool call): - Without streaming=True: 9m08s wall-clock failure, 0 bytes written (ProviderError: tool 'create' was executing). - With streaming=True: 4m57s success, 62 KB written in a single create call. Tests: - tests/test_providers/test_copilot_streaming.py — unit test that verifies create_session is called with streaming=True (and that the existing required kwargs are preserved). - tests/test_integration/test_copilot_large_write.py — opt-in (real_api marker) regression test that builds a workflow inline, asks the writer agent to produce a single large create call, and asserts the file is at least 30 KB. Skips automatically when no copilot CLI is available. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add changelog entry for streaming fix (microsoft#129) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add microsoft#107 and microsoft#109 to unreleased changelog Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add microsoft#100, microsoft#110, microsoft#111, microsoft#139, microsoft#142, microsoft#143, microsoft#144 to unreleased changelog Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…templates in explicit mode (microsoft#119) * fix(engine): ensure workflow.input is available for script and sub-workflow template rendering in explicit mode In explicit context mode, build_for_agent() starts with workflow.input: {} and only populates entries declared in the agent's input: list. Script agents and sub-workflow input_mapping templates that reference workflow.input.X without declaring it in their input: list get an empty dict, causing TemplateError at runtime. Script args and input_mapping are rendered locally (no LLM cost), so workflow inputs must always be available for template resolution regardless of context mode. This fix injects the full workflow_inputs into the template context after build_for_agent() for script and workflow agent types. Fixes: script agents in explicit mode failing with TemplateError: 'dict object' has no attribute '<field>' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(engine): centralize local-render workflow.input carve-out in build_for_agent Addresses three review threads on PR microsoft#119: 1. Centralize the explicit-mode injection. The previous fix duplicated 'workflow.input' injection at two build_for_agent call sites in _execute_loop. There are three other call sites (LLM dispatch, parallel-group worker, for-each item) that would have drifted as soon as a script/workflow agent ever reached them. Move the carve-out into build_for_agent itself, gated by a new agent_type parameter and a _LOCAL_RENDER_AGENT_TYPES constant. All five call sites now pass agent_type and the behavior is defined in one place. 2. Re-justify the carve-out. The original 'no LLM cost' rationale didn't carve out workflow.input from agent outputs — by that argument, '{{ planner.output.X }}' in a script's args would also need to always render. 
Reframe the rule as: workflow.input is the workflow's external interface (set once at startup, present for the lifetime of the run); per-step agent outputs remain explicitly declared in input: for traceability, even for local renders. Keeps the scope conservative and additive — we can broaden to outputs later with zero compatibility risk if input_mapping users hit the friction. 3. Test portability. Replace 'pwsh' (not installed on standard macOS / many Linux CI runners) with sys.executable + 'python -c', matching the convention established in tests/test_engine/test_script_workflow.py. Also adds direct unit tests on build_for_agent for the new agent_type parameter: script and workflow types get full workflow.input; agent outputs are still filtered for them; LLM agents and human_gate types are unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
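The centralized carve-out might look roughly like this in explicit mode. Only `build_for_agent`, the `agent_type` parameter, and `_LOCAL_RENDER_AGENT_TYPES` are named in the commit; the other parameters, the dotted reference format for declared inputs, and the context layout are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any

_LOCAL_RENDER_AGENT_TYPES = frozenset({"script", "workflow"})


def build_for_agent(
    workflow_inputs: dict[str, Any],
    agent_outputs: dict[str, dict[str, Any]],
    declared_inputs: list[str],
    agent_type: str | None = None,
) -> dict[str, Any]:
    """Explicit-mode context: only declared references, plus the local-render carve-out."""
    context: dict[str, Any] = {"workflow": {"input": {}}}
    for ref in declared_inputs:
        parts = ref.split(".")
        if len(parts) == 3 and parts[:2] == ["workflow", "input"]:
            if parts[2] in workflow_inputs:
                context["workflow"]["input"][parts[2]] = workflow_inputs[parts[2]]
        elif len(parts) >= 2 and parts[1] == "output" and parts[0] in agent_outputs:
            context[parts[0]] = {"output": agent_outputs[parts[0]]}
    if agent_type in _LOCAL_RENDER_AGENT_TYPES:
        # workflow.input is the run's external interface, so local renders always see it.
        context["workflow"]["input"] = dict(workflow_inputs)
    return context
```

Agent outputs stay filtered for every agent type, which matches the rule that per-step outputs must still be declared in `input:` for traceability.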
…icrosoft#122) * feat(engine): auto-parse script agent JSON stdout into output fields When a script agent's stdout is valid JSON, the parsed fields are now merged into the output dict alongside stdout/stderr/exit_code. This makes parsed fields accessible in route conditions and downstream templates as output.field_name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(engine): log when script JSON output shadows built-in fields Add debug-level logging when a script's parsed JSON output contains keys that shadow the built-in stdout/stderr/exit_code fields. Makes the intentional shadowing behavior observable for debugging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(engine): expand script JSON stdout coverage; portability + cleanup Address PR microsoft#122 review feedback from @jrob5756: - Narrow exception to json.JSONDecodeError (drop redundant ValueError; JSONDecodeError is a subclass of ValueError). - Move test from TestWorkflowEngineContextModes to dedicated TestScriptJsonStdout class in test_script_workflow.py. - Swap pwsh for sys.executable to match repo convention and unbreak local dev on macOS / minimal Linux. - Add coverage for documented behaviors so they don't silently regress: non-JSON stdout, JSON arrays/scalars (parametrized), shadowing, empty. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
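A minimal sketch of the merge behavior follows. The function name is hypothetical, and the handling of JSON arrays and scalars (left unmerged here) is an assumption, since the commit only says those cases are covered by tests.

```python
from __future__ import annotations

import json
import logging
from typing import Any

logger = logging.getLogger(__name__)

_BUILTIN_FIELDS = ("stdout", "stderr", "exit_code")


def build_script_output(stdout: str, stderr: str, exit_code: int) -> dict[str, Any]:
    """Start from the built-in fields, then merge a JSON object parsed from stdout."""
    output: dict[str, Any] = {"stdout": stdout, "stderr": stderr, "exit_code": exit_code}
    try:
        parsed = json.loads(stdout)
    except json.JSONDecodeError:  # non-JSON and empty stdout both land here
        return output
    if isinstance(parsed, dict):
        shadowed = [name for name in _BUILTIN_FIELDS if name in parsed]
        if shadowed:
            # Shadowing is intentional; the log line just makes it observable.
            logger.debug("script JSON output shadows built-in fields: %s", shadowed)
        output.update(parsed)
    return output
```

A script printing `{"state": "ready"}` would then let a route condition read `output.state` directly.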
…ults (microsoft#123) * fix(engine): use type-appropriate zero values for optional input defaults Optional workflow inputs without an explicit `default:` previously defaulted to Python None, which renders as "None" in templates and isn't caught by Jinja's `| default()` filter without the boolean flag. Now uses type-appropriate zero values: "" for string, 0 for number, false for boolean, [] for array, {} for object. This ensures templates render cleanly without requiring `| default()` guards or `if X else Y` workarounds. Explicit `default:` values in the schema are still honored. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(engine): avoid mutable shared defaults in _TYPE_ZERO_VALUES Replace shared [] and {} instances in the class-level _TYPE_ZERO_VALUES dict with a _zero_value_for_type() method that returns fresh copies for mutable types (array, object). Prevents potential shared-state bugs if a caller ever mutates the returned default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(engine): replace pwsh e2e test with parametrized unit tests on _apply_input_defaults Addresses review feedback on PR microsoft#123: - Drops subprocess + pwsh dependency (other script tests use sys.executable) - Covers all 5 InputDef.type values (string, number, boolean, array, object), not just string and number - Adds invariant test on _zero_value_for_type that mutable types (array, object) return fresh instances — guards against a future 'optimization' caching the instances and reintroducing the shared-mutable-default bug - Adds explicit-default-honored, provided-value-passthrough, and required-input-untouched cases Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
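The zero-value rule and the fresh-copy invariant can be sketched as below. The function names echo `_zero_value_for_type` and `_apply_input_defaults` from the commits, but the dict-based input schema with `type`, `default`, and `required` keys is an assumption made for the example.

```python
from __future__ import annotations

from typing import Any


def zero_value_for_type(type_name: str) -> Any:
    """Return the type-appropriate zero value, building mutable ones fresh each call."""
    if type_name == "array":
        return []  # new list every time, never a shared instance
    if type_name == "object":
        return {}  # new dict every time
    return {"string": "", "number": 0, "boolean": False}.get(type_name)


def apply_input_defaults(schema: dict[str, dict[str, Any]], provided: dict[str, Any]) -> dict[str, Any]:
    """Fill optional inputs that were not provided; leave required inputs untouched."""
    resolved = dict(provided)
    for name, spec in schema.items():
        if name in resolved or spec.get("required", False):
            continue
        if "default" in spec:
            resolved[name] = spec["default"]  # an explicit default: is still honored
        else:
            resolved[name] = zero_value_for_type(spec.get("type", "string"))
    return resolved
```

Returning fresh lists and dicts means a caller that mutates one default cannot affect another input or a later run.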
… variables (microsoft#121) * feat(engine): add workflow.dir, workflow.file, workflow.name template variables Adds three new template variables available in all agent contexts: - {{ workflow.dir }} - absolute path to the workflow YAML's directory - {{ workflow.file }} - absolute path to the workflow YAML file - {{ workflow.name }} - workflow name from the YAML config These enable script agents to resolve co-located scripts relative to the workflow file rather than CWD, which is critical for registry-based workflows where scripts live alongside the YAML in the registry directory. Example: args: - -File - {{ workflow.dir }}/scripts/detect-state.ps1 Available in all context modes (accumulate, last_only, explicit). Empty strings are omitted from context to avoid polluting templates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(engine): repopulate workflow metadata on context restore Resume drops workflow_dir/file/name because WorkflowContext.from_dict() omits absolute path metadata (intentionally, to keep checkpoints portable). Without this, set_context() during the resume path silently wipes the workflow.dir/file/name template variables that the rest of this PR exists to provide. set_context() now repopulates from self.workflow_path and self.config — the engine knows the current path and is the source of truth. 
Tests: - test_set_context_repopulates_workflow_metadata: round-trip + assert metadata survives, including end-to-end via build_for_agent - test_set_context_without_workflow_path_still_sets_name: path-less case - test_engine_populates_workflow_metadata: wiring guard for __init__ - test_engine_workflow_metadata_empty_without_path: no-pollution guard Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Daniel Green <dangreen@github.com> Co-authored-by: Jason Roberts <jasont.robert@gmail.com>
…soft#125) * feat(validate): catch template reference errors before runtime Enhanced `conductor validate` to scan Jinja2 templates in agent prompts, script args, input_mapping, and workflow output for reference errors: Level 1 — Template reference resolution: - {{ X.output.Y }} where X is not a valid agent name → error - {{ workflow.input.X }} where X is not a declared input → error - In explicit mode, LLM agents referencing agent outputs not in their input: list → warning Also wires semantic validation (validate_workflow_config) into the CLI `conductor validate` command, which previously only ran Pydantic schema validation. Catches errors like: stale agent name references after renames, missing workflow input declarations, and undeclared dependencies in explicit mode — all before runtime, at zero cost. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(validate): use Jinja2 AST instead of regex; remove dead code; add tests Address review feedback on PR microsoft#125 from @jrob5756. Replace regex-based template scanning with Jinja2's parser + AST walk. The previous `_TEMPLATE_REF_PATTERN` matched any `ident.attr.attr` chain in the raw template source, producing two classes of false positives the reviewer flagged: - Loop variables: `{% for r in pg.outputs %}{{ r.output.text }}{% endfor %}` triggered "stale agent reference: r" because the regex doesn't track Jinja2 scopes. - String literals: `{{ "agent.output.text" }}` triggered the same error because the regex doesn't distinguish strings from code. The new `_extract_template_refs` parses the template, calls `meta.find_undeclared_variables` to get only the truly free names, then walks `Getattr` chains rooted at those names. Loop variables, `{% set %}` bindings, and macro params are excluded automatically because they're declared inside the template. 
Also remove dead code: - The `input_mapping` collection block in `_collect_template_strings` referenced an `AgentDef` field that doesn't exist (issue microsoft#101 hasn't landed). `getattr` always returned `None`. - `_resolve_prompt_file` had no callers. - The vestigial `has_full_workflow_input` flag — bare `workflow.input` is no longer a valid input declaration after the INPUT_REF_PATTERN tightening. Add comprehensive tests: - `TestExtractTemplateRefs`: 12 unit tests covering loop vars, string literals, `{% set %}`, macros, builtins, malformed templates, and unknown filters/tests (the AST walk uses a tolerant filter map). - `TestInputRefPatternExtensions`: parametrized tests for `.errors` and bare `.outputs` that the PR added support for. - `TestTemplateReferenceValidation`: end-to-end stale-ref detection plus regression tests for both false-positive classes. - `TestExplicitModeWarnings`: warning emission for missing inputs in `context.mode: explicit`. - `TestOutputTemplateValidation`: workflow `output:` template checks. - `TestExamplesRegression`: loops over `examples/*.yaml` to prove no example breaks. - `TestSemanticValidationIntegration` in test_validate.py: 5 CLI-level tests covering exit codes, warning rendering, and false-positive non-blocking. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(validate): restore input_mapping template collection PR microsoft#109 (closing microsoft#101) merged input_mapping onto AgentDef on main. The previous commit dropped input_mapping handling on the assumption that the field was nonexistent — true at the time of the PR's original review, but no longer. Restore the collection block (using getattr for forward-compat with branches that haven't merged microsoft#109 yet), and add tests via a duck-typed agent so coverage works regardless of whether the schema field is present on this branch. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * style: format test_validator.py to pass ruff format check Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ity (microsoft#113) * feat: breadcrumb navigation and depth-isolated subworkflow rendering Adds full subworkflow awareness to the per-run web dashboard: State pollution fix: - Added wf_depth counter to workflow store — only depth-0 workflow_started initializes root context. Inner workflow events are routed to isolated SubworkflowContext objects. - Each subworkflow invocation gets its own nodes/routes/agents maps, keyed by (parentAgent, iteration). Repeated runs of the same subworkflow no longer share state. Subworkflow event handling: - Added TypeScript types for subworkflow_started, subworkflow_completed, subworkflow_failed events (mirrors engine emit). - Event handlers create/update child contexts and track the active context path for routing subsequent events. Breadcrumb navigation: - New BreadcrumbBar component shows the context stack above the graph (e.g., Root > twig-sdlc-planning > plan-issue). - Click any breadcrumb to navigate to that context level. - Double-click a workflow agent node in the graph to dive into its subworkflow context. - Graph rebuilds automatically when context changes. Context stack architecture: - SubworkflowContext[] tree structure mirrors workflow nesting. - activeContextPath tracks where live events are routed. - viewContextPath tracks what the user is viewing (independent). - getViewedContext() returns the correct nodes/routes for rendering. - All event handlers use activeTarget() helper to route to the correct context's nodes/groupProgress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat: subworkflow node visual, detail panel, and context-aware rendering Phase 4-5 of breadcrumb navigation feature: WorkflowNode component: - New node type for type:'workflow' agents with dashed border and Layers icon (visually distinct from regular agent nodes). - Shows child workflow name, elapsed time, and a chevron indicator when a SubworkflowContext exists. - Double-click to navigate into the subworkflow graph. 
SubworkflowDetail panel: - New detail component shown when a workflow agent is selected. - Lists all subworkflow runs for that agent with status, agent count, and cost summary. - Click any run to navigate into its context. Context-aware rendering: - All graph node components (AgentNode, ScriptNode, GateNode, GroupNode, AnimatedEdge) now read from getViewedContext().nodes instead of root state.nodes — ensures correct status display when viewing child contexts. - DetailPanel reads from viewed context for node lookup. - GroupDetail reads groupProgress from viewed context. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: replace getViewedContext() selectors with stable hooks getViewedContext() creates a new object on every call, causing infinite re-render loops (React error #185) when used inside Zustand selectors. New hooks in use-viewed-context.ts use useMemo with stable state references: - useViewedNodes() — nodes map for current context - useViewedGroupProgress() — group progress for current context - useViewedHighlightedEdges() — edge highlights - useViewedSubworkflowContexts() — child contexts - useViewedGraphData() — full graph data for WorkflowGraph All graph components and detail panels updated to use these hooks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: Stop button now reliably stops subworkflows When a subworkflow agent is running, the shared interrupt_event was being silently consumed by _check_interrupt() in web mode (line 857: 'if self._web_dashboard is not None: return None'). This meant Stop would pause the current agent but then resume and continue — the interrupt never propagated to the parent engine. Two fixes: 1. _check_interrupt(): when _subworkflow_depth > 0, raise InterruptError instead of silently consuming the interrupt. This unwinds the child engine back to the parent's _execute_subworkflow try/except, stopping the workflow. 2. 
_handle_web_pause(): in subworkflows, also watch interrupt_event alongside resume/kill/disconnect events. A second Stop click while an agent is paused now raises InterruptError immediately, without requiring Resume first. Root-level (depth 0) behavior is unchanged — Stop still pauses the current agent with Resume/Kill options in the dashboard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: guard against proactor accept-loop race on Windows (Python 3.14+) On Windows with Python 3.14+, the proactor event loop's accept callback can fire after Server.close() sets _sockets = None during shutdown, causing an AssertionError in base_events.py:_attach that crashes the workflow process. Fix: - Add _guarded_serve() wrapper that catches AssertionError when the uvicorn server is in shutdown state (should_exit = True) - Install a custom event-loop exception handler during server lifetime that suppresses the same race when it surfaces through callbacks - _is_proactor_shutdown_race() validates: AssertionError type, server shutdown state, and asyncio-originating traceback frames - Restore original exception handler in stop() The guard is narrowly scoped: only AssertionError during server shutdown is suppressed. All other exceptions delegate to the original handler. Tests: 9 new tests covering the race detection, exception handler delegation, guarded serve behavior, and edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): clear stale graph edges when navigating between workflow layers When navigating between subworkflow layers via breadcrumbs or double-click, old React Flow edges from the previous layer persisted as floating links disconnected from any visible nodes. 
Two fixes: - WorkflowGraph: explicitly clear nodes and edges when switching to an empty context (subworkflow data not yet populated) - graph-layout: filter edges against the actual node ID set to prevent orphan edges from routes referencing non-existent nodes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(web): add URL query param deep-linking for agent and subworkflow nodes Parse ?agent={name} and ?subworkflow={name} query params on initial load to auto-select and center the matching node in the workflow graph. This enables the meta-dashboard (conductor-dashboard) to generate clickable breadcrumb links that open the conductor UI focused on a specific node. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): support nested subworkflow paths and combined agent deep-links Update useDeepLink hook to: - Parse slash-separated subworkflow paths (e.g., ?subworkflow=planning/design) for navigating multiple levels deep into nested subworkflows - Support combined ?subworkflow=X&agent=Y to select an agent within a subworkflow context - Remove dependency on subworkflowContexts selector (array mutation doesn't trigger re-renders); rely on late-joiner replay instead Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: add dashboard deep-link specification Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): rewrite deep-link hook for reliability + error feedback Rewrote useDeepLink to fix timing issues that prevented navigation: - Use zustand.subscribe() instead of useEffect + selector reactivity. The old approach relied on subworkflowContexts selector changes, but the store mutates the array in-place during processEvent, so the selector never detected changes. - Resolve the full subworkflow path in one shot via index walking instead of calling navigateIntoSubworkflow() in a loop. 
- Set viewContextPath directly via setState instead of relying on action functions that might see stale state. - Add error banner when deep-link target is invalid: shows the error message and a link back to the root dashboard. - Validate agent exists in the target context's agent list. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(web): add ingress/egress nodes for sub-workflow views When viewing a sub-workflow graph, the start and end nodes are now replaced with distinct ingress/egress nodes that: - Use a dashed border with rounded-xl style to visually distinguish them from regular start/end nodes - Display 'From <parent agent>' on the ingress node - Display 'Return to <parent agent>' on the egress node - Navigate back to the parent workflow on double-click Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * build: rebuild frontend static assets Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): route nested sub-workflow events by engine-supplied path The breadcrumb navigation feature could not reach gates inside nested sub-workflows: the dashboard's view path did not follow the active context as a parent agent spawned a child workflow, so a human_gate inside that child was unreachable from the breadcrumb tree. The root cause: the store inferred sub-workflow parentage from the shared activeContextPath, then mutated it on every subworkflow_started. That conflated the user's view cursor with the engine's execution cursor and made events impossible to route correctly under any concurrency. Engine: each WorkflowEngine now carries _dashboard_context_path, the slot-key path identifying its position in the recursive sub-workflow tree (root = []). _execute_subworkflow accepts a slot_key and threads [*parent_path, slot_key] into the spawned child engine. _emit auto- stamps subworkflow_path on every event a sub-engine produces. 
Sequential subworkflow_started/_completed/_failed include parent_path and slot_key. Frontend: subworkflow_started/completed/failed and workflow_completed/ failed handlers resolve the owning context strictly from engine- supplied parent_path / subworkflow_path via a new resolveSlotPath helper, instead of mutating a single shared activeContextPath. activeTarget consults subworkflow_path on event data when present so per-iteration agent_message/tool/turn events land in the right per- iteration context. viewContextPath sticks to the live edge when the user has not navigated away, so newly-spawned gates inside sub- workflows are reachable without manual breadcrumb clicks. Handlers fall back to legacy behavior when these new fields are absent. SubworkflowDetail navigates by ctx.slotKey (stable across iteration reorders) instead of (agent_name, iteration). Breadcrumbs render slot keys so concurrent iterations are distinguishable. Adds TestSubWorkflowDashboardPath covering parent_path/slot_key emission for sequential sub-workflows and the auto-stamped subworkflow_path on the child workflow_completed event. This is a self-contained slice of a broader fix that also addresses concurrent for_each-of-workflow stacking; the matching for_each-side emit changes are staged in microsoft#110 and become functional once both PRs land. Order of merge does not matter — handlers degrade gracefully when events lack the new fields. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(web): support for_each iteration deep-link notation Add flexible segment matching for subworkflow deep-links: 1. Exact slotKey match (e.g. plan_child[item-0]) 2. Positional index (e.g. plan_child#0, 0-based) 3. Bare agent name (when unambiguous) Ambiguous bare names (multiple for_each iterations) now produce an actionable error listing valid alternatives instead of silently failing. 
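The three-step segment matching above can be sketched as follows. This is a minimal Python illustration, not the dashboard's actual hook (which is TypeScript); the `Ctx` class, the `match_segment` name, and the error wording are all hypothetical, and only the matching order (exact slotKey, then 0-based `name#index`, then unambiguous bare name) comes from the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ctx:
    """Hypothetical stand-in for one sub-workflow context in the dashboard."""
    slot_key: str    # e.g. "plan_child[item-0]"
    agent_name: str  # e.g. "plan_child"


def match_segment(segment: str, siblings: List[Ctx]) -> Ctx:
    """Resolve one deep-link path segment against sibling contexts."""
    # 1. An exact slot-key match wins outright.
    for ctx in siblings:
        if ctx.slot_key == segment:
            return ctx

    # 2. Positional form "<agent>#<index>", 0-based among that agent's runs.
    name, sep, index = segment.rpartition("#")
    if sep and index.isdigit():
        runs = [c for c in siblings if c.agent_name == name]
        if int(index) < len(runs):
            return runs[int(index)]
        raise LookupError(
            f"{name!r} has {len(runs)} run(s); index {index} is out of range"
        )

    # 3. Bare agent name, accepted only when it is unambiguous.
    runs = [c for c in siblings if c.agent_name == segment]
    if len(runs) == 1:
        return runs[0]
    if not runs:
        raise LookupError(f"no sub-workflow matches {segment!r}")
    options = ", ".join(c.slot_key for c in runs)
    raise LookupError(f"{segment!r} is ambiguous; try one of: {options}")
```

Listing the valid slot keys in the ambiguity error is what makes the failure actionable: the caller can retry with one of the printed alternatives.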
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): remap group-child edges to parent node to prevent floating arrows Edges whose source or target was a child node inside a parallel group rendered at incorrect absolute positions (React Flow positions children relative to the parent group). This caused orphaned arrowheads in the top-left corner of the graph. Fix: remap any edge endpoint that references a group child to the parent group node instead, so dagre can properly route the edge and React Flow draws it at the correct position. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): auto-fit viewport on context switch Add FitViewOnContextSwitch component that triggers fitView when navigating between sub-workflow contexts, preventing stale viewport positioning. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): reset viewContextPath for agent-only deep-links When ?agent= is provided without ?subworkflow=, explicitly pin viewContextPath to root ([]) before the agent lookup runs. This prevents the sticky-follow mechanism from advancing the view into a stale subworkflow/for_each iteration during WS replay, which caused 'Agent not found' errors for root-level agents. Also handles ?subworkflow=&agent=foo (empty subworkflow string) correctly — empty string is falsy so falls through to the agent-only reset path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): route child workflow_started by engine slot path Under concurrent for_each iterations, the global activeContextPath is advanced by each subworkflow_started event. When workflow_started for an earlier iteration's child engine arrived, it was resolved against the (now-stale) activeContextPath, so its agents/routes were written to the wrong sibling ctx — or got overwritten by a later sibling. 
That manifested as phantom routes (and missing routes) when the user navigated into a deeply-nested for_each iteration: routes from one iteration would surface in another, leaving edges drawn between agents that don't appear in the iteration's view. workflow_completed and workflow_failed already used the engine-supplied subworkflow_path slot key (commit f77b20b); workflow_started was the one handler missed. This change makes it consistent with its sibling handlers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): bail on slot-resolution miss in subworkflow_completed/failed Both handlers used resolved?.indexPath ?? [] as a fallback when resolveSlotPath returned null. That silently routed the event to the root context, with two distinct corruption modes: 1. targetNodes defaulted to state.nodes (root), so a same-named root agent had its status overwritten to 'completed' or 'failed' by an unrelated subworkflow event. 2. state.activeContextPath = parentIndexPath then unconditionally reset the active path to [], breaking subsequent sibling routing. Mirror the guard that subworkflow_started already uses (line 1365): when resolveSlotPath returns null, return early. A null result means the event arrived before its sibling subworkflow_started or the path is inconsistent; neither warrants polluting root state. Reported-by: jrob5756 (PR microsoft#113 review) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(engine): cover subworkflow stop/interrupt and dashboard event payloads Adds coverage for code paths that previously had none, all introduced by the breadcrumb-navigation PR: TestCheckInterruptSubworkflow - test_check_interrupt_raises_in_subworkflow Sub-engine in web mode propagates the interrupt as InterruptError so the child unwinds back to the parent.
- test_check_interrupt_consumed_silently_at_root Regression guard for the root-depth web-mode silent-consume branch (the partial-output handler does the real pausing). TestHandleWebPauseSubworkflow - test_handle_web_pause_stop_event_in_subworkflow Pause-then-Stop in a sub-workflow exits via interrupt_event without requiring Resume first. - test_handle_web_pause_root_ignores_interrupt_event Documents the intentional root-vs-subworkflow asymmetry: at root, only Resume or Kill exit a pause. TestSubWorkflowDashboardPath (additions) - test_subworkflow_failed_event_carries_parent_path_and_slot_key The exception branch of _execute_subworkflow emits subworkflow_failed with parent_path and slot_key. Previously only the success-path emit was asserted. - test_nested_subworkflow_path_accumulates At depth >= 2 (parent -> mid -> leaf), each engine emits its own workflow_completed and the auto-stamped subworkflow_path chains correctly across nesting levels. - test_concurrent_for_each_subworkflow_emits_distinct_slot_keys Existing for_each test ran with max_concurrent=1; this variant uses max_concurrent=3 so iterations actually overlap, proving slot_key uniqueness is not an artifact of serial execution. Reported-by: jrob5756 (PR microsoft#113 review) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(engine): document root-vs-subworkflow pause/stop asymmetry Adds an inline comment in _handle_web_pause explaining: 1. Why root depth deliberately omits the interrupt_event subscription (pause exits only on Resume or Kill). 2. Why subworkflow depth subscribes to it (so Stop unwinds the child engine without requiring Resume first). 3. The tiny clear()/create_task() race window where a Stop click can be silently discarded, and the trade-off vs. carrying a stale Stop signal across pause cycles. No behavior change. Documents an intentional design choice flagged in PR microsoft#113 review. 
Reported-by: jrob5756 (PR microsoft#113 review) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * build: rebuild frontend static assets after fix-chain restoration Rebuilds the production bundle so it incorporates the subworkflow_completed/failed slot-resolution fix (f374a19) that was cherry-picked into this branch. Without this rebuild the committed `static/` bundle would not reflect the source change and the dashboard would still exhibit the bug at runtime. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(web): transitive agent search for agent-only deep-links The previous fix (593f4e1) reset viewContextPath to root for agent-only deep-links to stop sticky-follow from stranding the user inside a stale for_each iteration during replay. That solved the false-negative case (agent at root) but introduced a false-negative for nested agents: if the agent only exists inside a sub-workflow / for_each iteration, the resolver would now error with 'Agent X not found in root workflow.' That broke external integrations (e.g. notification feeds) that surface ?agent=X without knowing the parent slot chain. Resolution algorithm now: - subworkflow=foo&agent=bar: navigate to foo, look for bar there; if bar isn't at foo but exists elsewhere, list discovered locations in the error so the next click is obvious. - agent=bar (no subworkflow): try root first; otherwise walk every sub-workflow context. On exactly one match: navigate there. On many matches (e.g. bar ran in every for_each iteration): pick running > deepest > newest, mirroring the engine's live-event routing precedence. - Zero matches anywhere: deterministic error, view pinned to root so sticky-follow still doesn't strand. 
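The agent-only resolution order above (root first, then every sub-workflow context, with running > deepest > newest as the tie-break) can be sketched roughly as below. This is a Python illustration under stated assumptions: the real resolver lives in the TypeScript useDeepLink hook, and the `Context` shape, the `started_at` field used for "newest", and the `find_agent` name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Context:
    """Hypothetical stand-in for a root or sub-workflow dashboard context."""
    agents: Dict[str, str]                        # agent name -> status
    children: List["Context"] = field(default_factory=list)
    started_at: float = 0.0                       # assumed ordering key for "newest"


def find_agent(root: Context, agent: str) -> Optional[List[int]]:
    """Return the index path of the context to view for `agent`, or None."""
    if agent in root.agents:
        return []  # root is tried first

    matches = []  # (index path, context) for every nested context holding the agent

    def walk(ctx: Context, path: List[int]) -> None:
        for i, child in enumerate(ctx.children):
            child_path = path + [i]
            if agent in child.agents:
                matches.append((child_path, child))
            walk(child, child_path)

    walk(root, [])
    if not matches:
        return None  # caller reports an error and keeps the view at root

    # Tuple ordering encodes the precedence: running > deepest > newest.
    best_path, _ = max(
        matches,
        key=lambda m: (m[1].agents[agent] == "running", len(m[0]), m[1].started_at),
    )
    return best_path
```

Returning `None` rather than raising keeps the zero-match case deterministic and leaves the decision to pin the view at root with the caller.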
Also reworks the wait condition: instead of firing on the first state change after agents.length > 0 (which races against WS replay of nested subworkflow_started events), the resolver debounces 200ms of state quiescence and only applies once the target is resolvable or the workflow has reached a terminal state. A 5s hard cap keeps live-workflow deep-links from hanging when the target never appears. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Green <dangreen@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
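One possible reading of the reworked wait condition is sketched below as a pure decision function. It is written in Python for brevity (the hook itself is TypeScript), and the function name, parameter names, and the choice to model time as millisecond integers are assumptions; only the 200ms quiescence window, the resolvable-or-terminal requirement, and the 5s hard cap come from the commit message.

```python
QUIET_MS = 200      # quiescence window named in the commit message
HARD_CAP_MS = 5000  # hard cap named in the commit message


def should_apply(
    now_ms: int,
    started_ms: int,
    last_change_ms: int,
    resolvable: bool,
    terminal: bool,
) -> bool:
    """Decide whether the deep-link resolver should act at `now_ms`."""
    # Past the hard cap, stop waiting; the caller reports an error if the
    # target still cannot be resolved.
    if now_ms - started_ms >= HARD_CAP_MS:
        return True

    # Otherwise require a quiet period since the last store change, plus
    # either a resolvable target or a workflow that has already finished.
    quiet = now_ms - last_change_ms >= QUIET_MS
    return quiet and (resolvable or terminal)
```

Keeping the decision separate from the timer wiring makes the timing rules checkable without a running store or WebSocket replay.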
Keep both dialog features (from PR microsoft#130) and breadcrumb/subworkflow features (from main via microsoft#113).

Conflict resolution:
- schema.py: keep both dialog + max_depth validations for human_gate/script
- claude.py: keep _max_input_cache reset in close() AND execute_dialog_turn method
- DetailPanel.tsx: import both DialogDetail and SubworkflowDetail
- workflow-store.ts: include both dialog state + subworkflow state in all reset blocks
- events.ts: include both Dialog* and Subworkflow* event interfaces
- test_claude.py / test_copilot.py: keep both dialog turn + max prompt tokens test classes
- static assets: rebuilt from merged source

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves merge conflicts between feat/agent-dialog-mode-clean and upstream main (after microsoft#113 merged).

Conflicts resolved (keep both sides):
- dialog validation and max_depth validation
- execute_dialog_turn method and _max_input_cache = None in close()
- activeDialog, dialogEngaged reset and wfDepth, subworkflowContexts, etc. reset

All 480 tests for affected modules pass. Vite build succeeds.