ci: raise agent audit turn limit and preserve logs by andreatgretel · Pull Request #571 · NVIDIA-NeMo/DataDesigner

andreatgretel · 2026-04-24T14:49:10Z

📋 Summary

The Friday test-health suite hit the 30-turn cap on its first-ever run (failure run 24880704245) and left no retrievable trace — /tmp/claude-audit-log.txt lives only on the self-hosted runner and the step summary came back empty. This bumps the turn budget so heavier recipes have room to finish and preserves the agent log as a workflow artifact on failures so future regressions are diagnosable.

Validated end-to-end via dispatch on this branch (run 24895909677): suite completed naturally in 34 turns (confirming the old 30-turn cap was genuinely too tight, not a loop) in ~5m27s.

🔗 Related Issue

N/A — follow-up on the agentic-ci-daily workflow introduced in #543.

🔄 Changes

Raise --max-turns from 30 to 50 in the Run audit recipe step
Switch --output-format from text to stream-json so agent events are emitted during the run instead of only at process exit; prefix the invocation with stdbuf -oL -eL to line-buffer the pipe
New Upload agent log step: uploads /tmp/claude-audit-log.txt and /tmp/audit-<suite>.md as a workflow artifact on failure (if: failure(), 14-day retention), with a name keyed by run_id and run_attempt so re-runs do not collide. Reuses the actions/upload-artifact@v7.0.1 SHA already pinned in build-notebooks.yml
Drop the raw-log <details> block from the Write job summary step — the tail would now emit unreadable NDJSON, and the audit report itself (already in the summary) carries the human-readable payload

🧪 Testing

Pre-commit hooks pass locally (yaml / trailing-whitespace / EOL)
YAML parses cleanly
End-to-end dispatch on this branch succeeded (run 24895909677)
make test — N/A, workflow-only change
Unit tests added/updated — N/A, workflow-only change
E2E tests added/updated — N/A

✅ Checklist

Follows commit message conventions
Commits are signed off (DCO)
Architecture docs updated — N/A

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

.github/workflows/agentic-ci-daily.yml — the recipe is unchanged; the turn bump alone is what unblocks test-health. The artifact upload is gated on if: failure() (the log only materializes when we actually need to debug), and the name includes run_attempt so re-runs do not hit the upload-artifact unique-name constraint.

The Friday test-health audit hit the 30-turn cap on its first-ever run (2026-04-24) and the agent log was discarded with the self-hosted runner. Heavier recipes need more room, and the next failure should be diagnosable. - Raise --max-turns from 30 to 50 - Switch --output-format from text to stream-json so events are emitted during the run instead of only at process exit; prefix with stdbuf -oL -eL to line-buffer the pipe - Upload /tmp/claude-audit-log.txt and /tmp/audit-<suite>.md as an artifact (if: always(), 14-day retention) using the upload-artifact SHA already pinned in build-notebooks.yml Signed-off-by: Andre Manoel <amanoel@nvidia.com>

actions/upload-artifact@v4+ rejects duplicate names within a workflow, and re-running a failed run reuses the same github.run_id. Append github.run_attempt so re-runs upload successfully instead of failing at the exact moment the artifact is most useful. Found by Codex review of #571. Signed-off-by: Andre Manoel <amanoel@nvidia.com>

Raise the bar for persisting the full verbose stream-json event log: we only need it when we're actually debugging a failure, and the audit report itself still lands in the step summary on success. Shrinks the window where tool inputs, read file contents, or other verbose-stream detail could end up in a 14-day artifact. Addresses the minor privacy finding from Codex review of #571. Signed-off-by: Andre Manoel <amanoel@nvidia.com>

With --output-format stream-json the previous tail -100 of the agent log emitted raw NDJSON into the GH Actions UI summary, which is unreadable. The audit report itself (/tmp/audit-<suite>.md) already carries the human-readable payload, and the full event stream is available as an on-failure artifact, so the raw tail was redundant and worse than nothing for the summary surface. Also rewords the fallback message to point at the artifact when no report lands (typically a failure). Signed-off-by: Andre Manoel <amanoel@nvidia.com>

greptile-apps · 2026-04-24T18:43:00Z

Greptile Summary

Bumps the --max-turns cap from 30 to 50, switches to stream-json output with stdbuf line-buffering for real-time log capture, and adds a failure-gated artifact upload step so future regressions are diagnosable. All three changes are well-motivated by the described test-health cap failure and are mechanically sound.

Confidence Score: 5/5

Safe to merge — changes are narrowly scoped to CI observability and turn budget; no logic, security, or correctness issues introduced.

No P0 or P1 findings. The three changes (turn limit, output format, artifact upload) are each independently correct: set -o pipefail preserves exit-code semantics through the pipe, if-no-files-found: ignore prevents spurious failures, and run_attempt in the artifact name prevents re-run collisions.

No files require special attention.

Important Files Changed

Filename	Overview
.github/workflows/agentic-ci-daily.yml	Turn limit raised to 50, output switched to stream-json with stdbuf line-buffering, and a failure-gated artifact upload step added — all changes are mechanically correct and well-scoped

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Claude as claude CLI (stream-json)
    participant Log as /tmp/claude-audit-log.txt
    participant Report as /tmp/audit-{suite}.md
    participant Artifact as Artifact Store

    GHA->>Claude: stdbuf -oL -eL claude --max-turns 50 --output-format stream-json
    Claude-->>Log: NDJSON events (via tee, line-buffered)
    Claude-->>Report: audit-{suite}.md (written by agent tool calls)
    Claude-->>GHA: exit code (0 = success, non-zero = failure)

    alt Step succeeds
        GHA->>GHA: Update runner memory (last_run stamped)
        GHA->>GHA: Skip Upload agent log (if: failure() not met)
        GHA->>GHA: Write job summary (report from Report)
    else Step fails
        GHA->>GHA: Update runner memory (last_run not stamped)
        GHA->>Artifact: Upload claude-audit-log.txt + audit-{suite}.md (14-day retention)
        GHA->>GHA: Write job summary (fallback message)
    end

_{Reviews (2): Last reviewed commit: "Merge branch 'main' into andreatgretel/f..." | Re-trigger Greptile}

github-actions · 2026-04-24T18:43:05Z

Review: PR #571 — ci: raise agent audit turn limit and preserve logs

Summary

Workflow-only change to .github/workflows/agentic-ci-daily.yml addressing a real regression on the Friday test-health recipe, which hit the 30-turn cap with no retrievable log. Four coordinated edits:

--max-turns 30 → 50 (test-health needed 34; headroom on top).
--output-format text → stream-json, with stdbuf -oL -eL prepended so stream events flush through the tee pipe instead of buffering until exit.
New Upload agent log step (artifact gated on if: failure(), 14-day retention, name keyed by suite + run_id + run_attempt).
Removes the tail -100 /tmp/claude-audit-log.txt block from Write job summary — NDJSON would be noise in the summary; replaces it with a one-liner pointing at the artifact.

The diff is small (15/-14) and the changes are internally consistent: each edit is load-bearing for the failure-diagnosis goal stated in the PR.

Findings

Correctness

failure() fires correctly after if: always() steps. Upload agent log sits after Update runner memory (which is if: always()). GitHub's failure() returns true if any prior step failed regardless of intervening always() steps, so this behaves as intended when the audit step fails.
set -o pipefail preserved (line 160) — the audit step still exits non-zero when claude fails upstream of tee, which is what drives the failure() gate. Good.
stdbuf availability: part of GNU coreutils, present on all typical self-hosted Linux runners. No portability concern for this repo's [self-hosted, agentic-ci] runners.
Artifact name uniqueness: ${suite}-${run_id}-${run_attempt} is sufficient — re-runs bump run_attempt, and the matrix fan-out is disambiguated by suite. Matches the stated intent.
if-no-files-found: ignore: correct choice — /tmp/audit-${suite}.md may not exist if the agent died before writing it, and we still want to grab claude-audit-log.txt.

Potential issues / risks

Success-path log is discarded. Because the artifact is gated on if: failure(), a run that completes but produces a surprising report (e.g., new finding you'd want to reproduce from the event stream) leaves no trace — the markdown report goes to the step summary, but the stream-json transcript is gone. The PR's framing ("preserve on failure") is deliberate, but worth flagging: if future debugging needs the successful-run transcript, the gate will have to widen to if: always() or if: !cancelled(). Not a blocker.
50 turns is still a guess. test-health hit 34 this run. Recipe cost will drift as the codebase grows; there's no monitoring on turn consumption. Consider a follow-up to surface turns_used in the job summary so you can see the headroom trend before the next cap-hit.
NDJSON tee output is no longer tail-friendly. If anyone SSHes into the runner to debug, tail /tmp/claude-audit-log.txt now shows raw event JSON. Minor DX regression, acceptable given artifact availability. A comment near the tee noting the format change would help the next person.
Summary pointer wording ("See the claude-audit-log-* artifact on failures") only appears when /tmp/audit-${SUITE}.md is empty. On a successful run with no report (which shouldn't happen but could), the summary still guides the reader to a failure-only artifact. Edge case only.

Conventions

actions/upload-artifact is pinned by SHA (043fb46d…) with version comment, matching the repo pattern seen elsewhere. ✓
YAML style (2-space indent, quoting, multi-line path: block) matches the surrounding file. ✓
No workflow-level permission or concurrency changes — scope-appropriate. ✓

Test coverage

Workflow-only; make test / unit tests correctly marked N/A.
End-to-end dispatch run 24895909677 referenced in the PR body demonstrates the 50-turn path completes and the pipe flushes live. Good validation for a change that can only be exercised in CI.

Security

No new secrets, no expanded permissions, no changes to GH_TOKEN scope or the ANTHROPIC_* handling. Artifact path list is explicit (no globs that could pick up unintended files under /tmp). ✓

Verdict

Approve-equivalent (no blocking issues). The change is minimal, well-scoped, and directly addresses the observed failure mode with evidence from a real dispatch run. Recommend merging; the notes above (success-path log retention, turn-headroom monitoring) are worth a follow-up issue but don't need to block this PR.

greptile-apps · 2026-04-24T18:43:04Z

-            tail -100 /tmp/claude-audit-log.txt >> "$GITHUB_STEP_SUMMARY"
-            echo '```' >> "$GITHUB_STEP_SUMMARY"
-            echo "</details>" >> "$GITHUB_STEP_SUMMARY"
+            echo "No report generated. See the \`claude-audit-log-*\` artifact on failures for the full event stream." >> "$GITHUB_STEP_SUMMARY"


Misleading fallback message on success-with-no-report

The new fallback message tells users to check the claude-audit-log-* artifact, but the artifact upload is gated on if: failure(). If the audit step exits successfully yet no /tmp/audit-${SUITE}.md is produced (e.g. the agent ran to completion without writing the report), the summary points to an artifact that was never uploaded.

Suggested change

echo "No report generated. See the \`claude-audit-log-*\` artifact on failures for the full event stream." >> "$GITHUB_STEP_SUMMARY"

echo "No report generated." >> "$GITHUB_STEP_SUMMARY"

Alternatively, the message could be conditioned on the step outcome, but reverting to the neutral original text avoids the false reference entirely.

Prompt To Fix With AI

This is a comment left during a code review. Path: .github/workflows/agentic-ci-daily.yml Line: 230 Comment: **Misleading fallback message on success-with-no-report** The new fallback message tells users to check the `claude-audit-log-*` artifact, but the artifact upload is gated on `if: failure()`. If the audit step exits successfully yet no `/tmp/audit-${SUITE}.md` is produced (e.g. the agent ran to completion without writing the report), the summary points to an artifact that was never uploaded. ```suggestion echo "No report generated." >> "$GITHUB_STEP_SUMMARY" ``` Alternatively, the message could be conditioned on the step outcome, but reverting to the neutral original text avoids the false reference entirely. How can I resolve this? If you propose a fix, please make it concise.

andreatgretel added 4 commits April 24, 2026 11:45

andreatgretel marked this pull request as ready for review April 24, 2026 18:41

andreatgretel requested a review from a team as a code owner April 24, 2026 18:41

andreatgretel temporarily deployed to agentic-ci April 24, 2026 18:41 — with GitHub Actions Inactive

greptile-apps Bot reviewed Apr 24, 2026

View reviewed changes

github-actions Bot mentioned this pull request Apr 27, 2026

Agentic CI: Issue & PR Triage Tracker #562

Open

nabinchha approved these changes Apr 27, 2026

View reviewed changes

Merge branch 'main' into andreatgretel/fix/agentic-ci-test-health

eaf9604

andreatgretel merged commit 482ab5a into main Apr 28, 2026
49 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: raise agent audit turn limit and preserve logs#571

ci: raise agent audit turn limit and preserve logs#571
andreatgretel merged 5 commits intomainfrom
andreatgretel/fix/agentic-ci-test-health

andreatgretel commented Apr 24, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 24, 2026 •

edited

Loading

Confidence Score: 5/5

Sequence Diagram

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

greptile-apps Bot Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	echo "No report generated. See the \`claude-audit-log-*\` artifact on failures for the full event stream." >> "$GITHUB_STEP_SUMMARY"
	echo "No report generated." >> "$GITHUB_STEP_SUMMARY"

Conversation

andreatgretel commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📋 Summary

🔗 Related Issue

🔄 Changes

🧪 Testing

✅ Checklist

🔍 Attention Areas

Uh oh!

greptile-apps Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

github-actions Bot commented Apr 24, 2026

Review: PR #571 — ci: raise agent audit turn limit and preserve logs

Summary

Findings

Correctness

Potential issues / risks

Conventions

Test coverage

Security

Verdict

Uh oh!

greptile-apps Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andreatgretel commented Apr 24, 2026 •

edited

Loading

greptile-apps Bot commented Apr 24, 2026 •

edited

Loading