fix(case-management): updated sdd phase 0 interview to ask for task/connector inputs #1166

Merged: abhiram-vad merged 1 commit into main from fix/case-sdd-interview on Jun 2, 2026

@abhiram-vad

Contributor

No description provided.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

Claude finished @abhiram-vad's task in 2m 59s


Code Review: fix(case-management): updated sdd phase 0 interview to ask for task/connector inputs

Summary

Refactors two Phase 0 → case test tasks (candidate-interview, loan-origination) from full e2e (Phase 0 + build + validate) to Phase 0-only integration tests that stop at sdd.md approval. Adds a new sdd_check.py mechanical checker for SDD markdown, updates the skill's phase-0-interview.md with "Buildability musts" and "Schema discovery" sections, and expands sdd-generation-rules.md with ExceptionStage entry-condition distinctness and Interrupting-flag rules. Also hardens case_check.py to tolerate flat data.tasks arrays while rejecting them via a new assert_tasks_nested guard.

Change-by-Change Review

1. skills/uipath-maestro-case/references/phase-0-interview.md

Severity: Medium

Good additions. The "Buildability musts" section (lines 137–145) and "Schema discovery" section (lines 308–329) add critical interview-time guidance that prevents unbuildable SDDs. The resource-resolution table (lines 260–268) and schema-discovery command table (lines 313–319) are well-structured for agent consumption.

Issue — Duplication with sdd-generation-rules.md: The "Buildability musts" item 1 (line 141) repeats nearly the full text of sdd-generation-rules.md §Logical integrity rule 5 (line 692 in the rules file). Both describe ExceptionStage entry-condition distinctness with the same examples (selected-stage-completed/selected-stage-exited, user-selected-stage, wait-for-connector), Interrupting semantics, and terminal vs. return exit modes. The interview file's item 1 should be a bridge sentence + link — e.g., "per lane, capture how it fires (see sdd-generation-rules.md § Logical integrity rule 5)" — not a parallel copy that will drift.

2. skills/uipath-maestro-case/references/sdd-generation-rules.md

Severity: OK

Solid expansion. Rule 5 now requires distinct entry conditions per ExceptionStage with the specific CASE_MGMT_SECONDARY_STAGE_ENTRY_RULES_DUPLICATE error code — actionable for the agent. New rule 6 classifies Interrupting semantics with the hard constraint that return-to-origin requires Interrupting: Yes. Finalization step 12 is updated to include the Interrupting-flag check. All internally consistent.
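
To make rules 5 and 6 concrete, here is a minimal sketch of how the two constraints could be checked mechanically. The `ExceptionLane` shape and the `lane_problems` helper are hypothetical and exist only for illustration; they are not the skill's schema or the platform's validator, and `entry_rule` is a stand-in key for whatever combination of rule type and target the real entry condition uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionLane:
    # Hypothetical shape for illustration; not the skill's real schema.
    name: str
    entry_rule: str     # stand-in key, e.g. "user-selected-stage"
    exit_mode: str      # "terminal" or "return"
    interrupting: bool


def lane_problems(lanes):
    """Return human-readable problems for rules 5 and 6 as summarised above."""
    problems = []

    # Rule 5: every ExceptionStage needs its own entry condition.
    by_rule = defaultdict(list)
    for lane in lanes:
        by_rule[lane.entry_rule].append(lane.name)
    for rule, names in by_rule.items():
        if len(names) > 1:
            problems.append(f"duplicate entry rule {rule!r}: {', '.join(names)}")

    # Rule 6: return-to-origin is only legal on an interrupting lane.
    for lane in lanes:
        if lane.exit_mode == "return" and not lane.interrupting:
            problems.append(f"{lane.name}: return exit requires Interrupting: Yes")

    return problems
```

Two lanes sharing one entry rule would surface as a single duplicate problem, which is the situation the CASE_MGMT_SECONDARY_STAGE_ENTRY_RULES_DUPLICATE error code is described as covering.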

3. tests/tasks/uipath-maestro-case/_shared/sdd_check.py (new file)

Severity: Low

Well-structured mechanical checker covering 6 checks (mapping integrity, lineage closure, task-type enum, per-gate rule legality, conditions presence, interrupting semantics). The rule-type sets correctly mirror the schema contract. A few minor observations:

  • Lines 96–98: The lineage-closure producer regex r"\b([A-Za-z]\w*)\s*=\s*(?!=)" is broad: it will match any assignment-like pattern in markdown prose (e.g., Marks Stage Complete = Yes, where Complete is captured as the name). A variable can therefore be marked as "produced" when the match is just markdown text, so a genuine lineage gap would go unreported (a false negative for the checker). Acceptable for a v1 heuristic, but worth monitoring for false passes.

  • Line 160: set(cells[0]) <= set("-: ") — clever header-row filter, but would also match cells like :-- (markdown alignment). Fine in practice since those are header separators anyway.

  • Line 215: The < 3 stage-section floor is reasonable but fragile — a 2-stage SDD is technically valid in the platform even if unusual. Low-priority.
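
The first two observations can be reproduced in isolation. The sketch below uses the regex and the set comparison exactly as quoted above; the wrapper functions `produced_names` and `is_separator_cell` are hypothetical names introduced here, not functions from sdd_check.py.

```python
import re

# Pattern quoted in the review for lineage-closure "producers".
PRODUCER_RE = re.compile(r"\b([A-Za-z]\w*)\s*=\s*(?!=)")


def produced_names(text):
    """Names the pattern would treat as assigned somewhere in `text`."""
    return PRODUCER_RE.findall(text)


def is_separator_cell(cell):
    """True when a table cell contains only '-', ':' or spaces."""
    return set(cell) <= set("-: ")


# Plain prose is picked up as if it were an assignment.
prose_hits = produced_names("Marks Stage Complete = Yes")   # ['Complete']

# The negative lookahead keeps equality comparisons out.
comparison_hits = produced_names("approved == true")        # []

# Alignment markers pass the separator filter; real header text does not.
alignment_is_separator = is_separator_cell(":--")           # True
header_is_separator = is_separator_cell("Status")           # False
```

Note that an empty cell also passes the filter, because the empty set is a subset of any set.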

4. tests/tasks/uipath-maestro-case/_shared/case_check.py

Severity: OK

iter_tasks now tolerates flat (mis-nested) Task[] arrays by yielding dict entries directly, while the new assert_tasks_nested function rejects that shape explicitly when called. Good defensive separation — callers that just need to iterate tolerate the shape; callers that need correctness assert it. Error message is clear and actionable.
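
The tolerate-versus-assert split can be sketched as follows. This is an illustrative reconstruction, not the code in case_check.py: it assumes that the correct shape is a list of lists of task dicts and that a "flat" array holds task dicts directly at the top level, and the signatures and error text are placeholders.

```python
def iter_tasks(tasks):
    """Yield task dicts from a nested list, tolerating flat entries."""
    for entry in tasks or []:
        if isinstance(entry, dict):
            # Flat (mis-nested) entry: yield it so iteration-only callers still work.
            yield entry
        elif isinstance(entry, list):
            for task in entry:
                if isinstance(task, dict):
                    yield task


def assert_tasks_nested(tasks):
    """Raise if any top-level entry is a bare task dict instead of a list."""
    flat = [i for i, entry in enumerate(tasks or []) if isinstance(entry, dict)]
    if flat:
        raise ValueError(f"tasks must be nested lists; flat task dicts at index {flat}")
```

A checker that only reads tasks can call `iter_tasks` on either shape, while one that must vouch for structural correctness calls `assert_tasks_nested` first and fails fast on the flat shape.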

5. tests/tasks/uipath-maestro-case/_shared/rule_variable_check.py

Severity: OK

Clean addition — imports assert_tasks_nested and calls it at the top of main(). Correct integration.

6. tests/tasks/uipath-maestro-case/phase_0_to_case/candidate_interview/candidate_interview.yaml

Severity: Medium

  • Tag changed from e2e → integration: appropriate since the test now stops at sdd.md and doesn't build/validate a caseplan.
  • max_iterations: 1 and reduced max_turns/turn_timeout: sensible for a Phase 0-only run.
  • Success criteria replaced: validate and rule_variable_check.py criteria replaced with sdd_check.py + llm_judge + grep checks. Good split of mechanical vs. domain coherence.
  • The llm_judge prompt is detailed and well-structured with specific scoring criteria.

Issue — Orphaned check_candidate_interview.py: The file tests/tasks/uipath-maestro-case/phase_0_to_case/candidate_interview/check_candidate_interview.py still exists on disk but is no longer referenced by any success criterion. It should be deleted in this PR to avoid confusion.

Issue — sdd.md path expects cwd: The file_exists path is sdd.md (cwd-relative), while the old tests used CandidateInterview/CandidateInterview/sdd.md. The initial_prompt still says Name the solution and case project "CandidateInterview" — if the skill creates a solution directory structure, the sdd.md would land inside that directory, not cwd. However, the prompt now also says "no solution, no caseplan" so this is likely intentional. Just worth confirming the Phase 0 skill writes sdd.md to cwd when no solution exists.

7. tests/tasks/uipath-maestro-case/phase_0_to_case/loan_origination/check_loan_origination.py

Severity: OK

The terminal-exception case-exit check (lines 199–222) is corrected: it no longer requires marksCaseComplete=true for terminal exceptions, correctly recognizing that selected-stage-* case-exits use marksCaseComplete=false per platform semantics. The assert_tasks_nested guard is added at the top of main(). The error message on line 219–222 is improved with clearer guidance.

8. tests/tasks/uipath-maestro-case/phase_0_to_case/loan_origination/loan_origination.yaml

Severity: Medium

Same structural changes as candidate_interview.yaml: e2e → integration, Phase 0-only scope, sdd_check.py + llm_judge criteria. Same observations apply (path consistency, good mechanical/domain split).

Note: check_loan_origination.py is still present and modified in this PR, so it is not treated as orphaned here; presumably it is used by other tests or kept for future Phase 0 → build tests. However, it's no longer referenced from loan_origination.yaml, so if no other task YAML references it, it may become orphaned too.

What's Missing

  • Orphaned check_candidate_interview.py: Should be deleted — no YAML references it after this PR.
  • Passing-run claim: Per .claude/rules/test-writing.md §Workflow step 5, the PR should include a passing-run claim for the modified tasks. Not present in the PR description.
  • PR description: The PR body says "No description provided." Given the scope of changes (skill reference updates + test strategy rewrite + new shared checker), a description explaining the motivation would help reviewers.

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | No skill SKILL.md changes; YAML frontmatter in tasks is valid |
| E2E Tests | OK | e2e tests exist elsewhere; downgrade to integration is appropriate for Phase 0-only |
| Skill Body | OK | Reference updates are accurate and well-structured |
| References & Assets | Medium | Content duplication between phase-0-interview.md buildability-musts item 1 and sdd-generation-rules.md rule 5 |
| Repo Hygiene | Medium | Orphaned check_candidate_interview.py should be removed |

Issues for Manual Review

  • Confirm that sdd_check.py's lineage-closure regex (line 96–98) doesn't produce false negatives on real SDD outputs — the \b([A-Za-z]\w*)\s*=\s*(?!=) pattern may match markdown prose assignments.
  • Confirm that Phase 0 writes sdd.md to cwd (not inside a solution subdirectory) when the prompt says "no solution, no caseplan" — the success criteria paths depend on this.
  • The llm_judge criterion with pass_threshold: 1.0 requires a perfect 1.0 score — verify this is intentional vs. allowing 0.5 for "mostly coherent with minor gaps."

Conclusion

Good refactoring that isolates Phase 0 testing from the full build pipeline and adds a solid mechanical SDD checker. Two actionable items: (1) delete the orphaned check_candidate_interview.py, and (2) consider bridging the duplicated ExceptionStage entry-condition content in phase-0-interview.md to avoid drift from the authoritative copy in sdd-generation-rules.md. The check_loan_origination.py fix for terminal-exception case-exit semantics (marksCaseComplete=false) is a genuine bug fix that corrects a false-negative in the integrity checker.


@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

Claude finished @abhiram-vad's task in 2m 53s


Coder-eval task lint (advisory)

2 task YAMLs changed (both modified); verdict counts: 0 Critical, 0 High, 0 Medium, 2 Low, 0 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

High — PR body does not claim the changed tasks have been run and passed. Please edit the PR description to add a line like:

Ran skill-case-phase-0-candidate-interview and skill-case-phase-0-loan-origination locally and they passed.

Per-task lint

tests/tasks/uipath-maestro-case/phase_0_to_case/candidate_interview/candidate_interview.yaml — verdict: Low

Issues:

  • [Low] Could pass for the wrong reason (lines 79–149): no skill_triggered or command_executed criterion verifies the agent actually loaded the uipath-maestro-case skill. The sdd_check.py mechanical check is robust enough to make gaming hard, but a sufficiently capable LLM could theoretically produce a valid SDD without ever invoking the skill. (See Theme 1.)

Suggested fixes:

  • Add a skill_triggered criterion: skill_triggered: uipath-maestro-case, weight ~1.0. This confirms the agent activated the skill, not just produced a plausible artifact independently.

tests/tasks/uipath-maestro-case/phase_0_to_case/loan_origination/loan_origination.yaml — verdict: Low

Issues:

  • [Low] Could pass for the wrong reason (lines 73–143): same as candidate_interview — no criterion verifies skill invocation. (See Theme 1.)

Suggested fixes:

  • Same fix: add skill_triggered: uipath-maestro-case.

Within-PR duplicates

No duplicate clusters detected. Both tasks share a YAML scaffold but exercise materially different domains (hiring pipeline vs. commercial loan origination) with distinct stages, exception lanes, conditional gates, and domain-specific llm_judge prompts. This is good scaffold reuse.

Themes

  • Theme 1 (Low): Both tasks lack a skill_triggered criterion. The sdd_check.py mechanical check (variable mapping, lineage, rule legality, task-type enum, interrupting semantics) raises the bar high enough that gaming is unlikely, but confirming skill activation is cheap insurance.

Conclusion

⚠ 2 task(s) have issues, max severity Low (theme-captured). Evidence of passing run is missing (High). Advisory only — not blocking merge.


Notes on the changes themselves: The refactoring from e2e (full Phase 0 → build → validate) to integration (Phase 0 only: interview → sdd.md → approve) is well-scoped. The new sdd_check.py shared helper is a strong deterministic validator. The llm_judge at weight 3.0 (31.6% of total) is well-balanced against the equally-weighted mechanical check. Run limits are properly placed under run_limits:, not agent:. No sandbox or CLI verb issues.

@song-zhao-25 left a comment

Contributor

Stamp it, and please merge after verifying the sdd output.

@abhiram-vad

Contributor Author
Screen.Recording.2026-06-01.at.12.40.05.PM.mp4

@abhiram-vad merged commit 41751cc into main on Jun 2, 2026
16 checks passed
@abhiram-vad deleted the fix/case-sdd-interview branch on June 2, 2026 19:19