fix(ci): close stale-bot gaps for re-marking and active-PR auto-close #60402

Merged: webjunkie merged 1 commit into master from claude/trusting-pasteur-26DW6 on May 28, 2026

@webjunkie
Contributor

Problem

The stale-PR workflow has two gaps that have been silently causing misbehavior:

1. PRs unstaled by their author were never re-marked. actions/stale keeps a per-cycle "already processed this run" set: every issue/PR the action visits — even one it decides isn't stale and takes no action on — is added to the set and skipped for the rest of the cycle. The set only resets when the action gets all the way through the open issue + PR list. With operations-per-run: 250 and PostHog's volume (~7k open issues + ~200 open PRs), a single cycle was taking weeks to complete. So if an author removed the stale label and went idle, the bot would visit the PR once while it was still fresh, stamp it as processed, and not look again until the cycle finally rolled over — by which time the author had usually force-pushed and bumped updated_at again. The bot looked broken; it was actually just rate-limited by its own budget. Example: #54280 was unstaled on Apr 22 and sat completely idle for 36 days without ever being re-marked.

2. Pushing commits to a stale PR did not clear the label. remove-pr-stale-when-updated: false meant an author actively addressing review feedback would still get auto-closed seven days later. Not the behavior most contributors expect.
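
The first gap can be illustrated with a toy model. This is a minimal sketch based only on the description above, not the `actions/stale` source: `Item`, `ToyStaleBot`, and `runs_until_marked` are hypothetical names, and it assumes every visited item costs exactly one operation and that one run happens per day, both of which are simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    number: int
    idle_days: int = 0  # days since the item was last updated


@dataclass
class ToyStaleBot:
    ops_per_run: int
    days_before_stale: int
    processed: set = field(default_factory=set)  # survives between runs
    stale: set = field(default_factory=set)

    def run(self, items: list) -> bool:
        """Do one scheduled run; return True if the cycle finished and the set was reset."""
        budget = self.ops_per_run
        for item in items:
            if item.number in self.processed:
                continue  # already visited this cycle, even if it was fresh then
            if budget == 0:
                return False  # budget spent; the cycle resumes on the next run
            budget -= 1
            self.processed.add(item.number)
            if item.idle_days >= self.days_before_stale:
                self.stale.add(item.number)
        self.processed.clear()  # whole list walked: start a new cycle
        return True


def runs_until_marked(ops_per_run: int, n_items: int = 12, days_before_stale: int = 2) -> int:
    """Count daily runs until item 1, idle from day one, gets marked stale."""
    items = [Item(number) for number in range(1, n_items + 1)]
    bot = ToyStaleBot(ops_per_run, days_before_stale)
    for run_number in range(1, 1000):
        bot.run(items)
        if 1 in bot.stale:
            return run_number
        items[0].idle_days += 1  # only the tracked PR ages in this toy
    raise RuntimeError("item 1 was never marked stale")


if __name__ == "__main__":
    print(runs_until_marked(ops_per_run=3))   # small budget: cycle spans several runs
    print(runs_until_marked(ops_per_run=12))  # budget covers the whole list
```

With these toy numbers the tracked PR is old enough to be marked by run 3 in both cases. The larger budget marks it on run 3, while the small budget only marks it on run 5, after the four-run cycle has rolled over and the processed set has been reset.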

Changes

operations-per-run: 250 → 5000        # fits a full cycle in one run
remove-pr-stale-when-updated: false → true   # pushes / comments un-stale a PR
repo-token: ${{ steps.app-token.outputs.token }}   # 15k req/hr vs 1k via GITHUB_TOKEN

Piggybacks on the existing POSTHOG_SCHEDULED_ACTIONS GitHub App that already powers update-ai-costs, browserslist, and update-bot-ips — same actions/create-github-app-token SHA-pinned step. With the app token providing auth via repo-token, the workflow-level issues: write / pull-requests: write permissions on GITHUB_TOKEN are no longer needed and have been dropped (matching the auto-assign-reviewers pattern).

The holiday-skip behavior, message text, thresholds (7-day stale / 7-day close on PRs, 730 / 14 on issues), and waiting exempt label are all unchanged.

How did you test this code?

I'm Claude (Opus 4.7) — agent-authored, no manual run of the workflow.

  • Verified the actions/stale state model and per-cycle skip behavior by reading src/classes/issues-processor.ts in actions/stale@v10.2.0:
    • state.addIssueToProcessed(issue) is called on every visited issue regardless of whether the action mutated it
    • state.reset() only fires when the listing call returns an empty page (issues.length <= 0), i.e. once all issues have been exhausted
    • Staleness decision is purely !_updatedSince(issue.updated_at, daysBeforeStale) — there's no exemption based on prior manual label removal, so the only reason re-marking wouldn't happen is the per-cycle skip set
  • Verified empirically against #54280 (feat(ci): auto-size Django test shard counts from .test_durations): the stale label was added Apr 22 08:06 UTC by github-actions[bot] and manually removed the same day, followed by zero activity (no commits, no comments) for 36 days and no re-mark, which fits the cycle-too-long hypothesis exactly
  • Cross-checked app choice: POSTHOG_SCHEDULED_ACTIONS is already used by 3 other scheduled workflows (update-ai-costs.yml, browserslist.yml, update-bot-ips.yml) — semantic fit for a daily cron, and avoids the naming mismatch of having stale comments appear under "Assign Reviewers"
  • Did not exercise the workflow live; it'll fire on the next 07:30 UTC Mon–Fri tick. The enable-statistics: true output should show whether a full cycle now completes in one run (look for "Statistics" block in the action log)

If 5000 turns out to still be too low, the budget can be bumped further without other changes.

Publish to changelog?

no

🤖 Agent context

Authored via Claude Code (Opus 4.7) in a session with @webjunkie investigating why the stale bot never re-marked #54280 after a manual stale label removal.

Decisions worth flagging:

  • Cycle-too-long was the actual gap, not operations-per-run exhaustion per se. Initial hypothesis was that the action gives up mid-list when budget runs out and loses its place — debunked by reading the source: state persists in the actions cache across runs. The real issue is that an issue visited as not stale still consumes its slot in the per-cycle "processed" set, so it isn't reconsidered until the whole list has been walked. With 250 ops/run that walk was taking many weeks. 5000 should comfortably fit one full pass in a single run for current repo size.
  • App-token over GITHUB_TOKEN: primary motivation is the 15× higher API rate limit, which the action will actually use when it's no longer artificially capped at 250 ops. The identity change (comments now come from the scheduled-actions app rather than github-actions[bot]) is incidental but matches what other scheduled workflows already do.
  • remove-pr-stale-when-updated flip is a separate bug from the cycle issue and could ship on its own, but the two changes are scoped narrowly enough together that bundling keeps the PR focused on "make the stale workflow actually behave the way contributors expect".
  • Considered but rejected: switching to a non-stock implementation with a since-filtered listing pattern. Out of scope and not warranted while the stock action's knobs cover the gap.

Reviewer ask: app-token piggybacking is fine to validate by inspection — the secret refs and SHA-pinned action match the existing pattern verbatim. The enable-statistics output on the first run will tell us whether 5000 is the right number.


Generated by Claude Code

The stale workflow had two gaps. First, `actions/stale` tracks visited
issues/PRs in a per-cycle state set, so a PR visited while still fresh
is skipped for the rest of the cycle even if it ages past the
threshold mid-cycle. With ~7k open issues and `operations-per-run: 250`,
one cycle was taking long enough that PRs manually unstaled by an
author were never re-evaluated before the author force-pushed weeks
later. Bumping the budget to 5000 plus the higher app-token API rate
limit lets a full cycle complete in a single run, so re-marking happens
on the next scheduled tick.

Second, `remove-pr-stale-when-updated: false` meant pushing commits to
a stale PR did NOT clear the label, so authors actively addressing
review feedback still got auto-closed seven days later. Flipping it to
true matches the behavior most contributors expect.

Piggybacks on the existing `POSTHOG_SCHEDULED_ACTIONS` GitHub App,
already used by update-ai-costs, browserslist, and update-bot-ips. The
app token also replaces the workflow-scoped `issues:`/`pull-requests:`
write permissions — the action now authenticates via `repo-token`
instead of `GITHUB_TOKEN`.
Copilot AI review requested due to automatic review settings May 28, 2026 09:54
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team May 28, 2026 09:55
@greptile-apps
Contributor

greptile-apps Bot commented May 28, 2026

Comments Outside Diff (1)

  1. .github/workflows/stale.yaml, line 12

    P2 Timeout may be tight on a large first run

    timeout-minutes: 10 was set when operations-per-run was 250. With the budget now at 5000 and the app token no longer artificially throttling the action, the first run after this fix could trigger a large batch of re-stale mutations (every PR unstaled mid-cycle that has since gone idle). At the app token's 15k req/hr rate ceiling that's ~4 ops/sec — 5000 operations would theoretically take ~20 min in the degenerate case. In practice most operations are cheap listing pages and the action won't hit the full budget on a normal day, but a post-holiday or post-fix burst could cause the job to time out and leave the action mid-cycle. Bumping to 30 min would eliminate this concern at no real cost.
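
The ~20 min figure in the comment above can be reproduced with a short calculation. This is only a sketch of the stated worst case: `worst_case_minutes` is a hypothetical helper, and it assumes every operation maps to exactly one API request issued at the rate ceiling.

```python
def worst_case_minutes(operations: int, requests_per_hour: int) -> float:
    """Minutes needed if each operation is one request sent at the rate ceiling."""
    return operations * 60 / requests_per_hour


if __name__ == "__main__":
    print(worst_case_minutes(5000, 15000))  # new budget at the app-token ceiling
    print(worst_case_minutes(250, 15000))   # old budget at the same ceiling
```

Under that assumption the new budget needs 20 minutes against the 10-minute timeout, while the old budget of 250 needed about 1 minute at the same ceiling.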

Reviews (1): Last reviewed commit: "fix(ci): close stale-bot gaps for re-mar..."

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the scheduled stale workflow to make stale PR handling more reliable by using a higher-rate-limit GitHub App token and allowing updated PRs to clear the stale label.

Changes:

  • Adds a POSTHOG_SCHEDULED_ACTIONS app-token step and passes it to actions/stale.
  • Drops write permissions from the workflow GITHUB_TOKEN.
  • Enables PR unstaling on update and raises operations-per-run.

  exempt-pr-labels: 'waiting'
  enable-statistics: true
- operations-per-run: 250
+ operations-per-run: 5000
@webjunkie webjunkie enabled auto-merge (squash) May 28, 2026 12:54
@webjunkie webjunkie merged commit d9f6faa into master May 28, 2026
141 checks passed
@webjunkie webjunkie deleted the claude/trusting-pasteur-26DW6 branch May 28, 2026 15:43
@deployment-status-posthog

deployment-status-posthog Bot commented May 28, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-05-28 16:19 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-28 16:44 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-28 16:47 UTC | Run |
