feat(eval): add runtime simulation via --simulation flag #1624
- Add `--simulation` flag to `uipath run` accepting JSON (same schema as `simulation.json`)
- Wrap runtime with `UiPathMockRuntime` when simulation config is provided
- Export `build_mocking_context_from_dict` from `eval.mocks` for use by CLI
- Add `runtime-simulations-agent` sample demonstrating `@mockable` tools
- Add `simulation-testcase` with `run.sh` and `assert.py` verifying LLM simulation
- Fix dict type annotation in `cli_run` (mypy)
- Bump version to 2.10.63

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed 5549c35 to 4528fb2 (compare)
- Add `UiPathMockRuntime` tests: execute/stream with context, `get_schema`, mocker creation failure handler
- Add `TestRunSimulation` to `test_run`: invalid JSON, wrapping, disabled cases
- Fix `run.sh` to use `LOG_LEVEL=DEBUG` so simulation log lines appear in `run.log`
- Restore `assert.py` log-based checks that verify simulation was triggered

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
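A log-based check like the one this commit restores can be sketched as follows. This is a hedged illustration, not the PR's actual `assert.py`: the file path default and the marker strings are assumptions standing in for whatever `UiPathMockRuntime` really logs at DEBUG level.

```python
import pathlib
import sys


def assert_simulation_triggered(log_path: str = "run.log") -> None:
    """Fail the testcase if the run log shows no evidence the mock runtime ran.

    The marker phrases below are hypothetical; a real assert.py would grep
    for the exact lines UiPathMockRuntime emits when LOG_LEVEL=DEBUG is set.
    """
    log_text = pathlib.Path(log_path).read_text(encoding="utf-8").lower()
    markers = ["simulation", "mock"]  # assumed log phrases
    missing = [m for m in markers if m not in log_text]
    if missing:
        # non-zero exit makes run.sh report the testcase as failed
        sys.exit(f"simulation markers not found in {log_path}: {missing}")


if __name__ == "__main__":
    assert_simulation_triggered()
```

Checking the log rather than only the output is what catches the "simulation silently disabled" regression class: the agent can still produce plausible output even when the mock runtime never ran.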
Chibionos left a comment
/deep-review — overall
Verdict: Comment. Two P1s (one Pydantic validation gap, one runtime-lifecycle leak that Codex caught and I verified), two P2s, one P3. All findings are inline below — each can be resolved / replied to individually.
Slop meter
🟢 clean — slop 4/100 · tunnel-vision 0/100. The 4 points are version literals in pyproject.toml files, expected for sample/testcase scaffolding.
Codex recheck
Verdict: amend — Codex agreed with my four draft findings and surfaced one P1 I'd missed (runtime-lifecycle leak), now F2.
Findings index (all inline)
| # | Sev | File:Line | Headline |
|---|---|---|---|
| F1 | P1 | `_mock_runtime.py:38` | Loose `dict[str, Any]` input — no Pydantic validation |
| F2 | P1 | `cli_run.py:249` | Simulated run leaks the delegate runtime (Codex catch) |
| F3 | P2 | `cli_run.py:107` | `--simulation` accepts inline JSON only — awkward UX |
| F4 | P2 | `testcases/simulation-testcase/main.py` | Sample + testcase duplicate the same fixture |
| F5 | P3 | `_mock_runtime.py:77` | `logger.info` likely too chatty — consider `logger.debug` |
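The lifecycle leak in F2 is the classic wrap-and-rebind pattern: the variable holding the base runtime gets reassigned to the wrapper, and cleanup then runs against the wrong object (or never runs). A hedged sketch of the shape of the fix, with invented class and method names standing in for the real runtime API:

```python
class MockRuntimeWrapper:
    """Illustrative stand-in for UiPathMockRuntime; API names are assumptions."""

    def __init__(self, delegate):
        self._delegate = delegate

    def execute(self, payload):
        # a real mock runtime would intercept @mockable tool calls here
        return {"simulated": True, "inner": self._delegate.execute(payload)}

    def dispose(self):
        self._delegate.dispose()


def run_with_optional_simulation(make_runtime, simulation_config, payload):
    base_runtime = make_runtime()  # keep this reference for cleanup
    runtime = base_runtime
    try:
        if simulation_config is not None:
            # rebinding `runtime` is safe: `base_runtime` still names the
            # delegate, so the finally block disposes it on every path
            runtime = MockRuntimeWrapper(base_runtime)
        return runtime.execute(payload)
    finally:
        base_runtime.dispose()
```

The bug variant is disposing `runtime` instead of `base_runtime`, or only disposing inside the unwrapped branch; keeping a dedicated reference to the delegate before wrapping makes the cleanup path unconditional.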
Pillar 15 — Python Service Discipline
The deep-review skill grew a Python pillar today, distilled from 104 review comments by @cristipufu and @radu-mocanu on this repo over the last 120 days. F1 and F4 trace directly back to patterns those two flag repeatedly. The full pillar lives at skills-internal#299 (plugins/deep-review/skills/deep-review/references/default-pillars.md).
…mocking_context
- Add `SimulationConfig` to `_types.py` with `toolsToSimulate` alias and strict validation
- Add `build_mocking_context(config: SimulationConfig)` as the typed entry point
- Keep `build_mocking_context_from_dict` as a thin wrapper for backward compat
- Update `cli_run.py` to validate `--simulation` via `SimulationConfig.model_validate_json`
- Export `SimulationConfig` and `build_mocking_context` from mocks `__init__`
- Fix delegate runtime leak: keep `base_runtime` ref so `finally` always disposes it
- Remove duplicate agent files from `simulation-testcase`; `run.sh` points to sample

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
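A minimal sketch of the typed entry point this commit describes, assuming pydantic v2. Only the `toolsToSimulate` alias is named in the PR; the field types, the strictness settings, and the shape of the returned context are guesses for illustration.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class SimulationConfig(BaseModel):
    """Hypothetical shape; only the toolsToSimulate alias appears in the PR."""

    # strict validation: reject unknown keys, allow both alias and field name
    model_config = ConfigDict(populate_by_name=True, extra="forbid")

    tools_to_simulate: list[str] = Field(alias="toolsToSimulate")


def build_mocking_context(config: SimulationConfig) -> dict[str, Any]:
    # typed entry point: callers with a validated config skip re-parsing
    return {"tools": set(config.tools_to_simulate)}


def build_mocking_context_from_dict(raw: dict[str, Any]) -> dict[str, Any]:
    # thin backward-compat wrapper: validate the loose dict, then delegate
    return build_mocking_context(SimulationConfig.model_validate(raw))
```

This is the F1 fix in miniature: the CLI can call `SimulationConfig.model_validate_json` on the raw `--simulation` string and get a schema error at the boundary, instead of passing an unchecked `dict[str, Any]` into the mock runtime.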
…h explicit output path
…ing --output-file
Summary
- Add `--simulation` flag to `uipath run` that accepts a JSON config (same schema as `simulation.json`) to wrap the runtime with `UiPathMockRuntime`
- Export `build_mocking_context_from_dict` from `uipath.eval.mocks` for use by the CLI
- Add `runtime-simulations-agent` sample demonstrating `@mockable`-decorated tools with a `simulation.json`
- Add `simulation-testcase` integration testcase with `run.sh` + `src/assert.py` that verifies simulation was triggered (log lines + non-default LLM output)
- Bump version to 2.10.63

Test plan
- `uv run pytest`
- `ruff check`, `ruff format --check`, `lint_httpx_client.py`
- `cd packages/uipath/samples/runtime-simulations-agent && uv run uipath run main -f input.json --simulation "$(cat simulation.json)"`
- `integration_tests.yml` runs when the `uipath` package changes

Context notes
🤖 Generated with Claude Code