feat(uipath-platform): add traces_fetch smoke test + traces_e2e full round-trip test by saksharthakkar · Pull Request #480 · UiPath/skills

saksharthakkar · 2026-04-29T22:44:12Z

Motivation

Adds two tests for `uip traces spans get` — the CLI command for fetching LLM trace spans from agent jobs.

Summary

`traces_fetch.yaml` — smoke (`skill-platform-traces-fetch`)

Tags: `[uipath-platform, smoke, lifecycle:discover]`
Goal-only prompt: agent fetches spans for a placeholder GUID; 404 is acceptable — tests command knowledge, not data
2 success criteria: correct command + `--job-key` flag, `--output json`
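
The two criteria above come down to checking which tokens appear in the command the agent ran. A minimal sketch of that logic, assuming the harness exposes the executed command as a plain string; `matches_traces_fetch` is a hypothetical helper for illustration, not the repo's `command_executed` implementation:

```python
import shlex


def matches_traces_fetch(command: str) -> dict:
    """Evaluate the two smoke criteria against one executed command line."""
    tokens = shlex.split(command)
    # Criterion 1: the `uip traces spans get` subcommand plus a --job-key flag.
    is_spans_get = tokens[:4] == ["uip", "traces", "spans", "get"]
    has_job_key = "--job-key" in tokens
    # Criterion 2: JSON output requested, i.e. `--output` directly followed by `json`.
    wants_json = any(
        flag == "--output" and value == "json"
        for flag, value in zip(tokens, tokens[1:])
    )
    return {
        "command_and_job_key": is_spans_get and has_job_key,
        "output_json": wants_json,
    }
```

Neither check looks at the command's result, which is why a 404 for the placeholder GUID still passes.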

`traces_e2e.yaml` + `check_traces_e2e.py` — full round-trip (`skill-platform-traces-e2e`)

Tags: `[uipath-platform, e2e, lifecycle:discover]`
Goal-only prompt: "Verify that the published agent produces LLM trace spans" — skill teaches start→wait→fetch
First test in the repo that proves the CLI returns real spans from a real job
Process key supplied via `TRACES_SMOKE_PROCESS_KEY` GitHub secret (no hardcoded GUIDs)
5 success criteria including `run_command` via `check_traces_e2e.py` (weight 5.0 — the real gate)

Workflow change: `smoke-skills.yml` now injects `TRACES_SMOKE_PROCESS_KEY` into `$GITHUB_ENV` so the agent sandbox picks it up at runtime.
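
For context, GitHub Actions makes a variable visible to later steps when a step appends a `NAME=value` line to the file named by `$GITHUB_ENV`. The sketch below shows that mechanism in Python. It is illustrative only: the actual step in `smoke-skills.yml` is not shown in this PR description and is probably a shell one-liner, and `export_to_github_env` is a hypothetical name.

```python
import os
from typing import Optional


def export_to_github_env(name: str, value: str, env_file: Optional[str] = None) -> None:
    """Append NAME=value to the GITHUB_ENV file so later workflow steps see it."""
    # In a real workflow run, GitHub sets GITHUB_ENV to a per-job file path.
    path = env_file or os.environ["GITHUB_ENV"]
    if "\n" in value:
        # Multi-line values need the delimiter syntax, which this sketch skips.
        raise ValueError("multi-line values are not supported by NAME=value")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```

A later step, or the agent sandbox, would then read the value with `os.environ["TRACES_SMOKE_PROCESS_KEY"]`.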

Prompt style: Both prompts follow goal-statement style matching the repo pattern — no CLI commands, no procedural steps. Skill teaches the workflow.

Test Results

traces_e2e full round-trip — score 1.000 ✅

Verified 2026-05-05 locally against `codereval/DefaultTenant`:

status=SUCCESS  score=1.000  duration=73.1s  iterations=1  5/5 criteria passed

Job: traces-smoke-agent → State: Successful (8s)
Job key: 216e601d-59cd-42d3-8fce-f93262b1d307
Spans: 1 — "LLM call" (gpt-4.1-mini-2025-04-14, 35 tokens)
check_traces_e2e.py → OK: 1 span(s) returned

Agent discovered the process via `uip or processes list`, started the job, fetched spans — all from the skill, no procedure in the prompt.

traces_fetch smoke — score 1.000 ✅

Verified locally — 2/2 criteria passed, ~20s.

Test plan

`traces_fetch` smoke passes locally (score 1.000)
`traces_e2e` full round-trip passes locally (score 1.000, 5/5 criteria)
`check_traces_e2e.py` exits 0 with real data
No hardcoded GUIDs — process key via `TRACES_SMOKE_PROCESS_KEY` secret
Goal-only prompts — no CLI commands, no procedural steps (per review feedback)
Login criterion removed — CI always authenticates via env vars before tests run
Both pass in CI on merge

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove JSON schema from initial_prompt; agent writes freely, criteria validate
- Add uip login status step so test runs against active live tenant (e2e pattern)
- Add json_check criteria for command + outcome fields in report.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-30T17:38:35Z

Reviewing PR #480

Gather context (review criteria, project conventions, existing comments)
Read the full changed file
Analyze diff against origin/main
Check test task conventions and structure
Post review findings

Adds skill-platform-traces-e2e, the first test in the repo proving that uip traces spans get returns real spans from a real job (span_count >= 1). Uses a pre-published traces-smoke-agent on alpha codereval/DefaultTenant. Process key supplied via TRACES_SMOKE_PROCESS_KEY secret (not hardcoded).

The test starts the job, waits for completion, fetches spans, and asserts span_count >= 1 via check_traces_e2e.py, the gate the smoke test lacks.

Also injects TRACES_SMOKE_PROCESS_KEY into GITHUB_ENV in smoke-skills.yml so the secret is available to the agent sandbox at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Agent no longer writes its own assessment to report.json. traces_fetch relies purely on command_executed criteria. traces_e2e has the agent pipe raw CLI output to spans.json; check_traces_e2e.py reads that instead, preserving the span_count >= 1 gate without self-reporting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
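
The gate described in this commit can be sketched as follows. This is a guess at the shape of `check_traces_e2e.py`, not its actual contents: the layout of `spans.json` is an assumption (either a bare JSON array of spans, or an object whose first list-valued field holds them), and `count_spans` and `main` are hypothetical names. Only the `OK: N span(s) returned` message is taken from the test output quoted in the PR description.

```python
import json
from pathlib import Path


def count_spans(payload) -> int:
    """Return the number of spans in a parsed CLI payload."""
    if isinstance(payload, list):
        return len(payload)
    if isinstance(payload, dict):
        # Assumption: the spans array is the first list-valued field.
        for value in payload.values():
            if isinstance(value, list):
                return len(value)
    return 0


def main(path: str = "spans.json") -> int:
    """Return exit code 0 when the file holds at least one span, else 1."""
    try:
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        print(f"FAIL: could not read {path}: {exc}")
        return 1
    n = count_spans(payload)
    if n < 1:
        print("FAIL: no spans returned")
        return 1
    print(f"OK: {n} span(s) returned")
    return 0
```

Because the script reads the raw CLI output rather than a report written by the agent, the pass condition depends only on what the CLI actually returned.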

uipreliga

I approved, but please can you fix the issue below? The prompt is way too detailed.

… in traces tests

Replace verbatim CLI command sequences with goal-statement style prompts (per PR review: prompts were telling the agent what to do rather than testing the skill). Remove uip-login-status success criterion: CI always authenticates via UIPATH_CLI_ENABLE_ENV_AUTH before tests run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_prompt

Replace start→wait→fetch procedure with goal-only prompt per lint feedback. Skill now teaches the full workflow; test verified locally score=1.000 5/5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-05T23:18:39Z

Claude finished @saksharthakkar's task in 2m 39s

Coder-eval task lint (advisory)

2 task YAMLs changed; verdict counts: 0 Critical, 0 High, 0 Medium, 1 Low, 1 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

✅ Author confirms passing run: "traces_e2e full round-trip — score 1.000 ✅" / "Verified 2026-05-05 locally against codereval/DefaultTenant" and "traces_fetch smoke — score 1.000 ✅" / "Verified via previous run — 2/2 criteria passed, ~20s."

Per-task lint

`tests/tasks/uipath-platform/traces_fetch.yaml` — verdict: Low

Issues:

[Low] Meaningful coverage (lines 20–35): both criteria are command_executed only — no output validation. Acceptable for a smoke test with a placeholder GUID (404 expected), but thinner than sibling integration-service tests which pair command_executed with run_command for structural validation. The e2e companion provides the real correctness gate.

Suggested fixes:

Consider adding a lightweight run_command criterion that asserts the CLI produced JSON output (even a 404 response is parseable JSON with Result and Message keys), e.g. a one-liner python3 -c "import json,sys; ..." that checks the command's stderr/stdout was valid JSON. This avoids self-reporting while adding a minimal correctness signal.
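
A rough sketch of what such a check might look like, assuming the CLI output has been captured as a string. The `Result` and `Message` keys are the ones named in the suggestion above for the 404 case; whether successful responses carry the same keys is not stated here, and `is_cli_json` is a hypothetical name.

```python
import json


def is_cli_json(raw: str) -> bool:
    """True when raw CLI output parses as a JSON object with Result and Message keys."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    # A bare array or scalar is valid JSON but not the expected response envelope.
    return isinstance(doc, dict) and doc.keys() >= {"Result", "Message"}
```

In a `run_command` criterion this would run against the captured output and exit non-zero when the function returns `False`.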

`tests/tasks/uipath-platform/traces_e2e.yaml` — verdict: OK

Prompt is now goal-only (good — procedural steps removed in 0be95c5). Five criteria with a strong mix: command_executed for CLI usage verification + file_exists + run_command via check_traces_e2e.py (weight 5.0) that validates real JSON structure and span count. Not gameable without actually running the commands against a real tenant.

Within-PR duplicates

No duplicate clusters detected. traces_fetch (smoke — command knowledge with placeholder GUID) and traces_e2e (e2e — full round-trip with real job and span validation) are a complementary smoke/e2e pair testing materially different operations.

Conclusion

⚠ 1 task has issues, max severity Low. Advisory only — not blocking merge. The Low finding on traces_fetch.yaml (meaningful coverage) is expected for a smoke test and mitigated by the e2e companion.

feat(uipath-platform): add traces_fetch smoke test

ce80750

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

uipreliga reviewed Apr 30, 2026

Comment thread tests/tasks/uipath-platform/traces_fetch.yaml Outdated

uipreliga reviewed Apr 30, 2026

Comment thread tests/tasks/uipath-platform/traces_fetch.yaml Outdated

saksharthakkar changed the title ~~(wip) feat(uipath-platform): add traces_fetch smoke test~~ feat(uipath-platform): add traces_fetch smoke test Apr 30, 2026

saksharthakkar marked this pull request as ready for review April 30, 2026 17:38

saksharthakkar changed the title ~~feat(uipath-platform): add traces_fetch smoke test~~ feat(uipath-platform): add traces_fetch smoke test + traces_e2e full round-trip test Apr 30, 2026

saksharthakkar force-pushed the feat/traces-tests branch from 9963a9f to 2742505 Compare April 30, 2026 22:27

bai-uipath approved these changes May 1, 2026

Comment thread tests/tasks/uipath-platform/traces_fetch.yaml Outdated

uipreliga approved these changes May 5, 2026

Comment thread tests/tasks/uipath-platform/traces_e2e.yaml

saksharthakkar merged commit d2361d6 into main May 5, 2026
6 checks passed

saksharthakkar deleted the feat/traces-tests branch May 5, 2026 23:23

saksharthakkar merged 6 commits into main from feat/traces-tests

3 participants
