Skip to content

ci: raise agent audit turn limit and preserve logs#571

Merged
andreatgretel merged 5 commits intomainfrom
andreatgretel/fix/agentic-ci-test-health
Apr 28, 2026
Merged

ci: raise agent audit turn limit and preserve logs#571
andreatgretel merged 5 commits intomainfrom
andreatgretel/fix/agentic-ci-test-health

Conversation

@andreatgretel
Copy link
Copy Markdown
Contributor

@andreatgretel andreatgretel commented Apr 24, 2026

📋 Summary

The Friday test-health suite hit the 30-turn cap on its first-ever run (failure run 24880704245) and left no retrievable trace — /tmp/claude-audit-log.txt lives only on the self-hosted runner and the step summary came back empty. This bumps the turn budget so heavier recipes have room to finish and preserves the agent log as a workflow artifact on failures so future regressions are diagnosable.

Validated end-to-end via dispatch on this branch (run 24895909677): suite completed naturally in 34 turns (confirming the old 30-turn cap was genuinely too tight, not a loop) in ~5m27s.

🔗 Related Issue

N/A — follow-up on the agentic-ci-daily workflow introduced in #543.

🔄 Changes

  • Raise --max-turns from 30 to 50 in the Run audit recipe step
  • Switch --output-format from text to stream-json so agent events are emitted during the run instead of only at process exit; prefix the invocation with stdbuf -oL -eL to line-buffer the pipe
  • New Upload agent log step: uploads /tmp/claude-audit-log.txt and /tmp/audit-<suite>.md as a workflow artifact on failure (if: failure(), 14-day retention), with a name keyed by run_id and run_attempt so re-runs do not collide. Reuses the actions/upload-artifact@v7.0.1 SHA already pinned in build-notebooks.yml
  • Drop the raw-log <details> block from the Write job summary step — the tail would now emit unreadable NDJSON, and the audit report itself (already in the summary) carries the human-readable payload

🧪 Testing

  • Pre-commit hooks pass locally (yaml / trailing-whitespace / EOL)
  • YAML parses cleanly
  • End-to-end dispatch on this branch succeeded (run 24895909677)
  • make test — N/A, workflow-only change
  • Unit tests added/updated — N/A, workflow-only change
  • E2E tests added/updated — N/A

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • .github/workflows/agentic-ci-daily.yml — the recipe is unchanged; the turn bump alone is what unblocks test-health. The artifact upload is gated on if: failure() (the log only materializes when we actually need to debug), and the name includes run_attempt so re-runs do not hit the upload-artifact unique-name constraint.

The Friday test-health audit hit the 30-turn cap on its first-ever run
(2026-04-24) and the agent log was discarded with the self-hosted
runner. Heavier recipes need more room, and the next failure should be
diagnosable.

- Raise --max-turns from 30 to 50
- Switch --output-format from text to stream-json so events are emitted
  during the run instead of only at process exit; prefix with
  stdbuf -oL -eL to line-buffer the pipe
- Upload /tmp/claude-audit-log.txt and /tmp/audit-<suite>.md as an
  artifact (if: always(), 14-day retention) using the upload-artifact
  SHA already pinned in build-notebooks.yml

Signed-off-by: Andre Manoel <amanoel@nvidia.com>
actions/upload-artifact@v4+ rejects duplicate names within a workflow,
and re-running a failed run reuses the same github.run_id. Append
github.run_attempt so re-runs upload successfully instead of failing at
the exact moment the artifact is most useful.

Found by Codex review of #571.

Signed-off-by: Andre Manoel <amanoel@nvidia.com>
Raise the bar for persisting the full verbose stream-json event log:
we only need it when we're actually debugging a failure, and the audit
report itself still lands in the step summary on success. Shrinks the
window where tool inputs, read file contents, or other verbose-stream
detail could end up in a 14-day artifact.

Addresses the minor privacy finding from Codex review of #571.

Signed-off-by: Andre Manoel <amanoel@nvidia.com>
With --output-format stream-json the previous tail -100 of the agent
log emitted raw NDJSON into the GH Actions UI summary, which is
unreadable. The audit report itself (/tmp/audit-<suite>.md) already
carries the human-readable payload, and the full event stream is
available as an on-failure artifact, so the raw tail was redundant and
worse than nothing for the summary surface.

Also rewords the fallback message to point at the artifact when no
report lands (typically a failure).

Signed-off-by: Andre Manoel <amanoel@nvidia.com>
@andreatgretel andreatgretel marked this pull request as ready for review April 24, 2026 18:41
@andreatgretel andreatgretel requested a review from a team as a code owner April 24, 2026 18:41
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

Bumps the --max-turns cap from 30 to 50, switches to stream-json output with stdbuf line-buffering for real-time log capture, and adds a failure-gated artifact upload step so future regressions are diagnosable. All three changes are well-motivated by the described test-health cap failure and are mechanically sound.

Confidence Score: 5/5

Safe to merge — changes are narrowly scoped to CI observability and turn budget; no logic, security, or correctness issues introduced.

No P0 or P1 findings. The three changes (turn limit, output format, artifact upload) are each independently correct: set -o pipefail preserves exit-code semantics through the pipe, if-no-files-found: ignore prevents spurious failures, and run_attempt in the artifact name prevents re-run collisions.

No files require special attention.

Important Files Changed

Filename Overview
.github/workflows/agentic-ci-daily.yml Turn limit raised to 50, output switched to stream-json with stdbuf line-buffering, and a failure-gated artifact upload step added — all changes are mechanically correct and well-scoped

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Claude as claude CLI (stream-json)
    participant Log as /tmp/claude-audit-log.txt
    participant Report as /tmp/audit-{suite}.md
    participant Artifact as Artifact Store

    GHA->>Claude: stdbuf -oL -eL claude --max-turns 50 --output-format stream-json
    Claude-->>Log: NDJSON events (via tee, line-buffered)
    Claude-->>Report: audit-{suite}.md (written by agent tool calls)
    Claude-->>GHA: exit code (0 = success, non-zero = failure)

    alt Step succeeds
        GHA->>GHA: Update runner memory (last_run stamped)
        GHA->>GHA: Skip Upload agent log (if: failure() not met)
        GHA->>GHA: Write job summary (report from Report)
    else Step fails
        GHA->>GHA: Update runner memory (last_run not stamped)
        GHA->>Artifact: Upload claude-audit-log.txt + audit-{suite}.md (14-day retention)
        GHA->>GHA: Write job summary (fallback message)
    end
Loading

Reviews (2): Last reviewed commit: "Merge branch 'main' into andreatgretel/f..." | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

Review: PR #571 — ci: raise agent audit turn limit and preserve logs

Summary

Workflow-only change to .github/workflows/agentic-ci-daily.yml addressing a real regression on the Friday test-health recipe, which hit the 30-turn cap with no retrievable log. Four coordinated edits:

  1. --max-turns 30 → 50 (test-health needed 34; headroom on top).
  2. --output-format textstream-json, with stdbuf -oL -eL prepended so stream events flush through the tee pipe instead of buffering until exit.
  3. New Upload agent log step (artifact gated on if: failure(), 14-day retention, name keyed by suite + run_id + run_attempt).
  4. Removes the tail -100 /tmp/claude-audit-log.txt block from Write job summary — NDJSON would be noise in the summary; replaces it with a one-liner pointing at the artifact.

The diff is small (15/-14) and the changes are internally consistent: each edit is load-bearing for the failure-diagnosis goal stated in the PR.

Findings

Correctness

  • failure() fires correctly after if: always() steps. Upload agent log sits after Update runner memory (which is if: always()). GitHub's failure() returns true if any prior step failed regardless of intervening always() steps, so this behaves as intended when the audit step fails.
  • set -o pipefail preserved (line 160) — the audit step still exits non-zero when claude fails upstream of tee, which is what drives the failure() gate. Good.
  • stdbuf availability: part of GNU coreutils, present on all typical self-hosted Linux runners. No portability concern for this repo's [self-hosted, agentic-ci] runners.
  • Artifact name uniqueness: ${suite}-${run_id}-${run_attempt} is sufficient — re-runs bump run_attempt, and the matrix fan-out is disambiguated by suite. Matches the stated intent.
  • if-no-files-found: ignore: correct choice — /tmp/audit-${suite}.md may not exist if the agent died before writing it, and we still want to grab claude-audit-log.txt.

Potential issues / risks

  • Success-path log is discarded. Because the artifact is gated on if: failure(), a run that completes but produces a surprising report (e.g., new finding you'd want to reproduce from the event stream) leaves no trace — the markdown report goes to the step summary, but the stream-json transcript is gone. The PR's framing ("preserve on failure") is deliberate, but worth flagging: if future debugging needs the successful-run transcript, the gate will have to widen to if: always() or if: !cancelled(). Not a blocker.
  • 50 turns is still a guess. test-health hit 34 this run. Recipe cost will drift as the codebase grows; there's no monitoring on turn consumption. Consider a follow-up to surface turns_used in the job summary so you can see the headroom trend before the next cap-hit.
  • NDJSON tee output is no longer tail-friendly. If anyone SSHes into the runner to debug, tail /tmp/claude-audit-log.txt now shows raw event JSON. Minor DX regression, acceptable given artifact availability. A comment near the tee noting the format change would help the next person.
  • Summary pointer wording ("See the claude-audit-log-* artifact on failures") only appears when /tmp/audit-${SUITE}.md is empty. On a successful run with no report (which shouldn't happen but could), the summary still guides the reader to a failure-only artifact. Edge case only.

Conventions

  • actions/upload-artifact is pinned by SHA (043fb46d…) with version comment, matching the repo pattern seen elsewhere. ✓
  • YAML style (2-space indent, quoting, multi-line path: block) matches the surrounding file. ✓
  • No workflow-level permission or concurrency changes — scope-appropriate. ✓

Test coverage

  • Workflow-only; make test / unit tests correctly marked N/A.
  • End-to-end dispatch run 24895909677 referenced in the PR body demonstrates the 50-turn path completes and the pipe flushes live. Good validation for a change that can only be exercised in CI.

Security

  • No new secrets, no expanded permissions, no changes to GH_TOKEN scope or the ANTHROPIC_* handling. Artifact path list is explicit (no globs that could pick up unintended files under /tmp). ✓

Verdict

Approve-equivalent (no blocking issues). The change is minimal, well-scoped, and directly addresses the observed failure mode with evidence from a real dispatch run. Recommend merging; the notes above (success-path log retention, turn-headroom monitoring) are worth a follow-up issue but don't need to block this PR.

tail -100 /tmp/claude-audit-log.txt >> "$GITHUB_STEP_SUMMARY"
echo '```' >> "$GITHUB_STEP_SUMMARY"
echo "</details>" >> "$GITHUB_STEP_SUMMARY"
echo "No report generated. See the \`claude-audit-log-*\` artifact on failures for the full event stream." >> "$GITHUB_STEP_SUMMARY"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Misleading fallback message on success-with-no-report

The new fallback message tells users to check the claude-audit-log-* artifact, but the artifact upload is gated on if: failure(). If the audit step exits successfully yet no /tmp/audit-${SUITE}.md is produced (e.g. the agent ran to completion without writing the report), the summary points to an artifact that was never uploaded.

Suggested change
echo "No report generated. See the \`claude-audit-log-*\` artifact on failures for the full event stream." >> "$GITHUB_STEP_SUMMARY"
echo "No report generated." >> "$GITHUB_STEP_SUMMARY"

Alternatively, the message could be conditioned on the step outcome, but reverting to the neutral original text avoids the false reference entirely.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/agentic-ci-daily.yml
Line: 230

Comment:
**Misleading fallback message on success-with-no-report**

The new fallback message tells users to check the `claude-audit-log-*` artifact, but the artifact upload is gated on `if: failure()`. If the audit step exits successfully yet no `/tmp/audit-${SUITE}.md` is produced (e.g. the agent ran to completion without writing the report), the summary points to an artifact that was never uploaded.

```suggestion
            echo "No report generated." >> "$GITHUB_STEP_SUMMARY"
```

Alternatively, the message could be conditioned on the step outcome, but reverting to the neutral original text avoids the false reference entirely.

How can I resolve this? If you propose a fix, please make it concise.

@andreatgretel andreatgretel merged commit 482ab5a into main Apr 28, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants