
feat(uipath-platform): add traces_fetch smoke test + traces_e2e full round-trip test #480

Merged
saksharthakkar merged 6 commits into main from feat/traces-tests on May 5, 2026

Conversation

saksharthakkar (Contributor) commented Apr 29, 2026

Motivation

Adds two tests for `uip traces spans get` — the CLI command for fetching LLM trace spans from agent jobs.

Summary

`traces_fetch.yaml` — smoke (`skill-platform-traces-fetch`)

  • Tags: `[uipath-platform, smoke, lifecycle:discover]`
  • Goal-only prompt: the agent fetches spans for a placeholder GUID, so a 404 is acceptable; the test checks command knowledge, not data
  • 2 success criteria: the correct command with the `--job-key` flag, and `--output json`
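A minimal sketch of how the two smoke criteria could be evaluated against one recorded shell command. The harness's real `command_executed` matching rules are not shown in this PR, so `matches_fetch_criteria` and its token-based logic are hypothetical; only the `uip traces spans get` subcommand and the `--job-key` / `--output json` flags come from the criteria above.

```python
import shlex

# Subcommand under test, taken from the PR description.
FETCH_SUBCOMMAND = ["uip", "traces", "spans", "get"]


def _contains_subcommand(tokens):
    """True if the fetch subcommand appears anywhere in the token list
    (so `cd x && uip traces spans get ...` still matches)."""
    n = len(FETCH_SUBCOMMAND)
    return any(tokens[i:i + n] == FETCH_SUBCOMMAND for i in range(len(tokens) - n + 1))


def matches_fetch_criteria(command):
    """Hypothetical evaluator for the two smoke criteria on one shell command."""
    tokens = shlex.split(command)
    # Criterion 1: correct subcommand plus a --job-key flag.
    command_and_job_key = _contains_subcommand(tokens) and "--job-key" in tokens
    # Criterion 2: JSON output requested as the token pair `--output json`.
    output_json = any(a == "--output" and b == "json" for a, b in zip(tokens, tokens[1:]))
    return {"command_and_job_key": command_and_job_key, "output_json": output_json}
```

Matching on tokens rather than raw substrings keeps the check indifferent to extra whitespace or a leading `cd ... &&`, at the cost of not understanding every shell construct.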

`traces_e2e.yaml` + `check_traces_e2e.py` — full round-trip (`skill-platform-traces-e2e`)

  • Tags: `[uipath-platform, e2e, lifecycle:discover]`
  • Goal-only prompt: "Verify that the published agent produces LLM trace spans" — skill teaches start→wait→fetch
  • First test in the repo that proves the CLI returns real spans from a real job
  • Process key supplied via `TRACES_SMOKE_PROCESS_KEY` GitHub secret (no hardcoded GUIDs)
  • 5 success criteria including `run_command` via `check_traces_e2e.py` (weight 5.0 — the real gate)
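For illustration, a hypothetical stand-in for the `check_traces_e2e.py` gate (not the file merged in this PR). It assumes the agent saved the raw `uip traces spans get --output json` output to `spans.json`, as described in a later commit on this PR, and that spans arrive either as a bare JSON list or wrapped under a key such as `Data`; that JSON shape and the wrapper keys are assumptions, not the CLI's documented format. It mirrors the `OK: 1 span(s) returned` message from the test results and exits non-zero when fewer than one span is found.

```python
import json
import sys
from pathlib import Path

# Hypothetical wrapper keys; the real CLI JSON shape is not documented in this PR.
WRAPPER_KEYS = ("Data", "data", "spans", "value")


def count_spans(payload):
    """Return how many spans the parsed CLI output contains (0 if unrecognised)."""
    if isinstance(payload, list):
        return len(payload)
    if isinstance(payload, dict):
        for key in WRAPPER_KEYS:
            if isinstance(payload.get(key), list):
                return len(payload[key])
    return 0


def main(path="spans.json"):
    """Return exit code 0 only if the file holds JSON with at least one span."""
    try:
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        print(f"FAIL: could not read JSON from {path}: {exc}")
        return 1
    n = count_spans(payload)
    if n < 1:
        print(f"FAIL: expected at least 1 span, got {n}")
        return 1
    print(f"OK: {n} span(s) returned")
    return 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:2]))
```

Reading a file the agent wrote from raw CLI output, rather than a self-written report, is what keeps the gate from relying on the agent's own assessment.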

Workflow change: `smoke-skills.yml` now injects `TRACES_SMOKE_PROCESS_KEY` into `$GITHUB_ENV` so the agent sandbox picks it up at runtime.
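GitHub Actions exposes a variable to later steps when a step appends a `NAME=value` line to the file named by `$GITHUB_ENV`; in a workflow this is usually a shell step such as `echo "NAME=value" >> "$GITHUB_ENV"`. The actual `smoke-skills.yml` change is not shown here, so the Python below only sketches the same file mechanism plus a runtime guard; `export_to_github_env` and `require_process_key` are hypothetical names.

```python
import os


def export_to_github_env(name, value, env_file=None):
    """Append NAME=value to the $GITHUB_ENV file so later steps see the variable.

    Only the single-line form is handled; GitHub Actions requires a
    delimiter-based syntax for multiline values, so those are rejected.
    """
    if "\n" in value or "\r" in value:
        raise ValueError(f"{name} must be a single-line value")
    path = env_file or os.environ["GITHUB_ENV"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def require_process_key():
    """Read the process key at runtime, failing loudly if it was never injected."""
    key = os.environ.get("TRACES_SMOKE_PROCESS_KEY", "").strip()
    if not key:
        raise RuntimeError("TRACES_SMOKE_PROCESS_KEY is not set; was the secret injected?")
    return key
```

Failing fast on a missing key gives a clearer error than letting the agent search for a process that was never configured.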

Prompt style: both prompts follow the repo's goal-statement style, with no CLI commands and no procedural steps. The skill teaches the workflow.

Test Results

traces_e2e full round-trip — score 1.000 ✅

Verified 2026-05-05 locally against `codereval/DefaultTenant`:

status=SUCCESS  score=1.000  duration=73.1s  iterations=1  5/5 criteria passed

Job: traces-smoke-agent → State: Successful (8s)
Job key: 216e601d-59cd-42d3-8fce-f93262b1d307
Spans: 1 — "LLM call" (gpt-4.1-mini-2025-04-14, 35 tokens)
check_traces_e2e.py → OK: 1 span(s) returned

The agent discovered the process via `uip or processes list`, started the job, and fetched spans, all guided by the skill with no procedure in the prompt.
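The start→wait→fetch sequence depends on waiting until the job reaches a final state before fetching spans. A minimal polling sketch, assuming a caller-supplied `get_state` callable (hypothetical; this PR gives no CLI flags for querying job state) and treating `Successful`, `Faulted`, and `Stopped` as terminal Orchestrator job states:

```python
import time

# Job states treated as final for this sketch (assumption).
TERMINAL_STATES = {"Successful", "Faulted", "Stopped"}


def wait_for_job(get_state, timeout_s=300.0, poll_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until the job reaches a terminal state or the timeout expires.

    get_state is hypothetical, e.g. a function that shells out to the CLI
    and returns the job's State string. sleep/clock are injectable for tests.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s}s")
        sleep(poll_s)
```

Fetching spans only after a terminal state avoids racing the trace pipeline while the job is still running; the timeout keeps a stuck job from hanging the test.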

traces_fetch smoke — score 1.000 ✅

Verified locally — 2/2 criteria passed, ~20s.

Test plan

  • `traces_fetch` smoke passes locally (score 1.000)
  • `traces_e2e` full round-trip passes locally (score 1.000, 5/5 criteria)
  • `check_traces_e2e.py` exits 0 with real data
  • No hardcoded GUIDs — process key via `TRACES_SMOKE_PROCESS_KEY` secret
  • Goal-only prompts — no CLI commands, no procedural steps (per review feedback)
  • Login criterion removed — CI always authenticates via env vars before tests run
  • Both pass in CI on merge

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread tests/tasks/uipath-platform/traces_fetch.yaml Outdated
Comment thread tests/tasks/uipath-platform/traces_fetch.yaml Outdated
- Remove JSON schema from initial_prompt; agent writes freely, criteria validate
- Add uip login status step so test runs against active live tenant (e2e pattern)
- Add json_check criteria for command + outcome fields in report.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
saksharthakkar changed the title from "(wip) feat(uipath-platform): add traces_fetch smoke test" to "feat(uipath-platform): add traces_fetch smoke test" on Apr 30, 2026
saksharthakkar marked this pull request as ready for review on April 30, 2026 at 17:38
github-actions (Bot, Contributor) commented Apr 30, 2026

Reviewing PR #480

  • Gather context (review criteria, project conventions, existing comments)
  • Read the full changed file
  • Analyze diff against origin/main
  • Check test task conventions and structure
  • Post review findings


@github-actions

This comment was marked as outdated.

saksharthakkar changed the title from "feat(uipath-platform): add traces_fetch smoke test" to "feat(uipath-platform): add traces_fetch smoke test + traces_e2e full round-trip test" on Apr 30, 2026
@github-actions

This comment was marked as outdated.

Adds skill-platform-traces-e2e — first test in repo proving uip traces
spans get returns real spans from a real job (span_count >= 1).

Uses a pre-published traces-smoke-agent on alpha codereval/DefaultTenant.
Process key supplied via TRACES_SMOKE_PROCESS_KEY secret (not hardcoded).
Test starts the job, waits for completion, fetches spans, asserts
span_count >= 1 via check_traces_e2e.py — the gate the smoke test lacks.

Also injects TRACES_SMOKE_PROCESS_KEY into GITHUB_ENV in smoke-skills.yml
so the secret is available to the agent sandbox at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread tests/tasks/uipath-platform/traces_fetch.yaml Outdated
Agent no longer writes its own assessment to report.json. traces_fetch
relies purely on command_executed criteria. traces_e2e has the agent
pipe raw CLI output to spans.json; check_traces_e2e.py reads that
instead, preserving the span_count >= 1 gate without self-reporting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
uipreliga (Collaborator) left a comment


I approved, but please can you fix the issue below? The prompt is way too detailed.

Comment thread tests/tasks/uipath-platform/traces_e2e.yaml
… in traces tests

Replace verbatim CLI command sequences with goal-statement style prompts
(per PR review: prompts were telling the agent what to do rather than
testing the skill). Remove uip-login-status success criterion — CI always
authenticates via UIPATH_CLI_ENABLE_ENV_AUTH before tests run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment was marked as resolved.

…_prompt

Replace start→wait→fetch procedure with goal-only prompt per lint feedback.
Skill now teaches the full workflow; test verified locally score=1.000 5/5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
github-actions (Bot, Contributor) commented May 5, 2026

Claude finished @saksharthakkar's task in 2m 39s.


Coder-eval task lint (advisory)

2 task YAMLs changed; verdict counts: 0 Critical, 0 High, 0 Medium, 1 Low, 1 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

  • ✅ Author confirms passing run: "traces_e2e full round-trip — score 1.000 ✅" / "Verified 2026-05-05 locally against codereval/DefaultTenant" and "traces_fetch smoke — score 1.000 ✅" / "Verified via previous run — 2/2 criteria passed, ~20s."

Per-task lint

tests/tasks/uipath-platform/traces_fetch.yaml — verdict: Low

Issues:

  • [Low] Meaningful coverage (lines 20–35): both criteria are command_executed only, with no output validation. Acceptable for a smoke test with a placeholder GUID (404 expected), but thinner than sibling integration-service tests, which pair command_executed with run_command for structural validation. The e2e companion provides the real correctness gate.

Suggested fixes:

  • Consider adding a lightweight run_command criterion that asserts the CLI produced JSON output (even a 404 response is parseable JSON with Result and Message keys), e.g. a one-liner python3 -c "import json,sys; ..." that checks that the command's stdout/stderr is valid JSON. This avoids self-reporting while adding a minimal correctness signal.
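One way the suggested JSON check could look as a small script rather than a one-liner. `looks_like_cli_json` is hypothetical, and the `Result`/`Message` keys come from the lint comment's description of the 404 response rather than from verified CLI output, so the key check is opt-in.

```python
import json
import sys


def looks_like_cli_json(text, required_keys=()):
    """True if text parses as a JSON object or array.

    For objects, every key in required_keys must be present; arrays only
    pass when no keys are required. Scalars and non-JSON text fail.
    """
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return False
    if isinstance(payload, dict):
        return all(key in payload for key in required_keys)
    return isinstance(payload, list) and not required_keys


if __name__ == "__main__":
    # Exit 0 when stdin holds a JSON object or array, 1 otherwise.
    sys.exit(0 if looks_like_cli_json(sys.stdin.read()) else 1)
```

Usage would be along the lines of `uip traces spans get --job-key <placeholder> --output json | python3 check_json.py`, where `check_json.py` is a hypothetical filename for this sketch; if the CLI writes its error JSON to stderr instead of stdout (not verified here), that stream would need redirecting too.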

tests/tasks/uipath-platform/traces_e2e.yaml — verdict: OK

Prompt is now goal-only (good — procedural steps removed in 0be95c5). Five criteria with a strong mix: command_executed for CLI usage verification + file_exists + run_command via check_traces_e2e.py (weight 5.0) that validates real JSON structure and span count. Not gameable without actually running the commands against a real tenant.

Within-PR duplicates

  • No duplicate clusters detected. traces_fetch (smoke — command knowledge with placeholder GUID) and traces_e2e (e2e — full round-trip with real job and span validation) are a complementary smoke/e2e pair testing materially different operations.

Conclusion

  • ⚠ 1 task has issues, max severity Low. Advisory only — not blocking merge. The Low finding on traces_fetch.yaml (meaningful coverage) is expected for a smoke test and mitigated by the e2e companion.

saksharthakkar merged commit d2361d6 into main on May 5, 2026
6 checks passed
saksharthakkar deleted the feat/traces-tests branch on May 5, 2026 at 23:23

3 participants