fix(ci): close stale-bot gaps for re-marking and active-PR auto-close by webjunkie · Pull Request #60402 · PostHog/posthog

webjunkie · 2026-05-28T09:54:41Z

Problem

Two gaps in the stale-PR workflow that have been silently misbehaving:

1. PRs unstaled by their author were never re-marked. `actions/stale` keeps a per-cycle "already processed" set. Every issue/PR the action visits is added to the set and skipped for the rest of the cycle, even one it decides isn't stale and takes no action on. The set only resets once the action has walked the entire open issue + PR list. With `operations-per-run: 250` and PostHog's volume (~7k open issues + ~200 open PRs), a single cycle was taking weeks to complete. So if an author removed the stale label and went idle, the bot would visit the PR once while it was still fresh, mark it as processed, and not look again until the cycle finally rolled over. By then the author had usually force-pushed and bumped `updated_at` again. The bot looked broken, but it was really just rate-limited by its own budget. Example: #54280 was unstaled on Apr 22 and sat completely idle for 36 days without ever being re-marked.

2. Pushing commits to a stale PR did not clear the label. With `remove-pr-stale-when-updated: false`, an author actively addressing review feedback would still be auto-closed seven days after the PR was marked stale. That is not the behavior most contributors expect.
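The dependence on the budget in item 1 can be made explicit. Let C be the number of operations one full pass over the open issue + PR list consumes and B the `operations-per-run` budget. Both symbols are introduced here for illustration, and C was not measured in this PR. Because the processed set only resets at the end of a pass, a PR visited while fresh waits on the order of one full cycle of R runs before it is looked at again:

```math
\begin{aligned}
R &= \left\lceil C / B \right\rceil \quad \text{scheduled runs per full cycle} \\
B \ge C &\;\Longrightarrow\; R = 1 \\
R \ge 10,\ B = 250 &\;\Longrightarrow\; C > 9 \times 250 = 2250
\end{aligned}
```

With B ≥ C the set resets on every run, so an unstaled PR that has been idle past the stale threshold gets re-marked on the next scheduled tick. The last line is a rough inference, not a measurement: if a cycle spanned at least two weeks of Mon–Fri runs (about 10 runs), C was above roughly 2250 operations, and the expectation that 5000 covers a full pass puts it at or below 5000. The `enable-statistics` output is where either number would be confirmed.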

Changes

```
operations-per-run: 250 → 5000        # fits a full cycle in one run
remove-pr-stale-when-updated: false → true   # pushes / comments un-stale a PR
repo-token: ${{ steps.app-token.outputs.token }}   # 15k req/hr vs 1k via GITHUB_TOKEN
```

Piggybacks on the existing `POSTHOG_SCHEDULED_ACTIONS` GitHub App that already powers update-ai-costs, browserslist, and update-bot-ips, using the same SHA-pinned `actions/create-github-app-token` step. With the app token providing auth via `repo-token`, the workflow-level `issues: write` / `pull-requests: write` permissions on `GITHUB_TOKEN` are no longer needed and have been dropped (matching the auto-assign-reviewers pattern).

The holiday-skip behavior, message text, thresholds (7-day stale / 7-day close on PRs, 730 / 14 on issues), and waiting exempt label are all unchanged.
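For orientation, here is a sketch of how the changed pieces could fit together in `.github/workflows/stale.yaml`. Only the three changed inputs, the `app-token` step id, the thresholds, `exempt-pr-labels: 'waiting'` and `enable-statistics: true` come from this PR. The secret names, `runs-on`, the step name and the version tags are assumptions for illustration; the real workflow pins SHAs and its remaining permissions block is not shown here.

```yaml
# Hypothetical sketch of the changed parts of stale.yaml; not copied from this PR.
# GITHUB_TOKEN's issues/pull-requests write scopes were dropped (not shown);
# the app token below performs the writes instead.
jobs:
  stale:
    runs-on: ubuntu-latest # assumption
    steps:
      - name: Generate app token
        id: app-token
        uses: actions/create-github-app-token@v1 # real workflow pins a SHA
        with:
          app-id: ${{ secrets.SCHEDULED_ACTIONS_APP_ID }} # hypothetical secret name
          private-key: ${{ secrets.SCHEDULED_ACTIONS_PRIVATE_KEY }} # hypothetical secret name

      - uses: actions/stale@v10 # PR references v10.2.0; real workflow pins a SHA
        with:
          repo-token: ${{ steps.app-token.outputs.token }} # app token instead of GITHUB_TOKEN
          days-before-pr-stale: 7
          days-before-pr-close: 7
          days-before-issue-stale: 730
          days-before-issue-close: 14
          exempt-pr-labels: 'waiting'
          remove-pr-stale-when-updated: true # pushes/comments clear the stale label
          enable-statistics: true
          operations-per-run: 5000 # sized so one run covers a full cycle
```

Passing the installation token through `repo-token` is what moves both the API quota and the comment author from `GITHUB_TOKEN` / github-actions[bot] to the app.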

How did you test this code?

I'm Claude (Opus 4.7) — agent-authored, no manual run of the workflow.

- Verified the `actions/stale` state model and per-cycle skip behavior by reading `src/classes/issues-processor.ts` in `actions/stale@v10.2.0`:
  - `state.addIssueToProcessed(issue)` is called on every visited issue, regardless of whether the action mutated it
  - `state.reset()` only fires when the listing call returns no issues (`issues.length <= 0`), i.e. once all issues are exhausted
  - The staleness decision is purely `!_updatedSince(issue.updated_at, daysBeforeStale)`; there's no exemption based on a prior manual label removal, so the only reason re-marking wouldn't happen is the per-cycle skip set
- Verified empirically against feat(ci): auto-size Django test shard counts from .test_durations #54280: stale label added Apr 22 08:06 UTC by github-actions[bot], manually removed the same day, zero activity (no commits, no comments) for 36 days, no re-mark. This fits the cycle-too-long hypothesis exactly.
- Cross-checked the app choice: `POSTHOG_SCHEDULED_ACTIONS` is already used by 3 other scheduled workflows (`update-ai-costs.yml`, `browserslist.yml`, `update-bot-ips.yml`). It's a semantic fit for a daily cron and avoids the naming mismatch of having stale comments appear under "Assign Reviewers".
- Did not exercise the workflow live; it'll fire on the next 07:30 UTC Mon–Fri tick. The `enable-statistics: true` output should show whether a full cycle now completes in one run (look for the "Statistics" block in the action log).

If 5000 turns out to still be too low, the budget can be bumped further without other changes.

Publish to changelog?

no

🤖 Agent context

Authored via Claude Code (Opus 4.7) in a session with @webjunkie investigating why the stale bot never re-marked #54280 after a manual stale label removal.

Decisions worth flagging:

- Cycle-too-long was the actual gap, not `operations-per-run` exhaustion per se. The initial hypothesis was that the action gives up mid-list when the budget runs out and loses its place; reading the source debunked that, since state persists in the actions cache across runs. The real issue is that an issue visited as not-stale still occupies a slot in the per-cycle "processed" set, so it isn't reconsidered until the whole list has been walked. With 250 ops/run that walk was taking many weeks. 5000 should comfortably fit one full pass in a single run at the current repo size.
- App token over `GITHUB_TOKEN`: the primary motivation is the 15× higher API rate limit, which the action will actually use once it's no longer artificially capped at 250 ops. The identity change (comments now come from the scheduled-actions app rather than github-actions[bot]) is incidental but matches what other scheduled workflows already do.
- The `remove-pr-stale-when-updated` flip fixes a separate bug from the cycle issue and could ship on its own, but the two changes are narrowly scoped enough that bundling keeps the PR focused on "make the stale workflow actually behave the way contributors expect".
- Considered but rejected: switching to a non-stock implementation with a `since`-filtered listing pattern. Out of scope, and not warranted while the stock action's knobs cover the gap.

Reviewer ask: app-token piggybacking is fine to validate by inspection — the secret refs and SHA-pinned action match the existing pattern verbatim. The enable-statistics output on the first run will tell us whether 5000 is the right number.

Generated by Claude Code

The stale workflow had two gaps. First, `actions/stale` tracks visited issues/PRs in a per-cycle state set, so a PR visited while still fresh is skipped for the rest of the cycle even if it ages past the threshold mid-cycle. With ~7k open issues and `operations-per-run: 250`, one cycle was taking long enough that PRs manually unstaled by an author were never re-evaluated before the author force-pushed weeks later. Bumping the budget to 5000, backed by the higher app-token API rate limit, lets a full cycle complete in a single run, so re-marking happens on the next scheduled tick.

Second, `remove-pr-stale-when-updated: false` meant pushing commits to a stale PR did NOT clear the label, so authors actively addressing review feedback still got auto-closed seven days later. Flipping it to true matches the behavior most contributors expect.

The change piggybacks on the existing `POSTHOG_SCHEDULED_ACTIONS` GitHub App, already used by update-ai-costs, browserslist, and update-bot-ips. The app token also replaces the workflow-scoped `issues:`/`pull-requests:` write permissions: the action now authenticates via `repo-token` instead of `GITHUB_TOKEN`.

greptile-apps · 2026-05-28T09:57:23Z

Comments Outside Diff (1)

.github/workflows/stale.yaml, line 12

Timeout may be tight on a large first run

timeout-minutes: 10 was set when operations-per-run was 250. With the budget now at 5000 and the app token no longer artificially throttling the action, the first run after this fix could trigger a large batch of re-stale mutations (every PR unstaled mid-cycle that has since gone idle). At the app token's 15k req/hr rate ceiling that's ~4 ops/sec — 5000 operations would theoretically take ~20 min in the degenerate case. In practice most operations are cheap listing pages and the action won't hit the full budget on a normal day, but a post-holiday or post-fix burst could cause the job to time out and leave the action mid-cycle. Bumping to 30 min would eliminate this concern at no real cost.
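The ~20 min figure above, worked through. It assumes every one of the 5000 operations is an API request and that requests are spread evenly at the hourly ceiling:

```math
\frac{15\,000\ \text{req/h}}{3600\ \text{s/h}} \approx 4.17\ \text{req/s},
\qquad
\frac{5000\ \text{req}}{15\,000 / 3600\ \text{req/s}} = 1200\ \text{s} = 20\ \text{min}
```

GitHub's primary rate limit is an hourly quota rather than a pacing rule, so 5000 requests fit under 15k/h without being slowed to that even rate; wall-clock time is really set by API latency and by any secondary rate limits on content-creating requests such as comments. The 20 minutes is therefore a back-of-envelope estimate, not a bound in either direction, which is why extra timeout headroom is cheap insurance.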

Copilot

Pull request overview

This PR updates the scheduled stale workflow to make stale PR handling more reliable by using a higher-rate-limit GitHub App token and allowing updated PRs to clear the stale label.

Changes:

- Adds a `POSTHOG_SCHEDULED_ACTIONS` app-token step and passes it to `actions/stale`.
- Drops write permissions from the workflow `GITHUB_TOKEN`.
- Enables PR unstaling on update and raises `operations-per-run`.


```diff
                  exempt-pr-labels: 'waiting'
                  enable-statistics: true
-                  operations-per-run: 250
+                  operations-per-run: 5000
```


deployment-status-posthog · 2026-05-28T16:19:56Z

Deploy status

| Environment | Status | Deployed At | Workflow |
|---|---|---|---|
| dev | ✅ Deployed | 2026-05-28 16:19 UTC | Run |
| prod-us | ✅ Deployed | 2026-05-28 16:44 UTC | Run |
| prod-eu | ✅ Deployed | 2026-05-28 16:47 UTC | Run |

Copilot AI review requested due to automatic review settings May 28, 2026 09:54

Copilot started reviewing on behalf of webjunkie May 28, 2026 09:54

assign-reviewers-posthog Bot requested a review from a team May 28, 2026 09:55

Copilot AI reviewed May 28, 2026


webjunkie enabled auto-merge (squash) May 28, 2026 12:54

gantoine approved these changes May 28, 2026


webjunkie merged commit d9f6faa into master May 28, 2026
141 checks passed

webjunkie deleted the claude/trusting-pasteur-26DW6 branch May 28, 2026 15:43

webjunkie mentioned this pull request May 29, 2026

chore(ci): allow manual dispatch and raise timeout for stale workflow #60607

Merged
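A minimal fragment of the kind of change #60607 describes, assuming Greptile's suggested 30 minutes; the timeout value #60607 actually chose is not stated on this page, and the job name and `runs-on` are hypothetical. The cron line restates the 07:30 UTC Mon–Fri schedule from the description.

```yaml
# Hypothetical fragment; not copied from #60607. Steps omitted.
on:
  schedule:
    - cron: '30 7 * * 1-5' # 07:30 UTC Mon–Fri, per this PR's description
  workflow_dispatch: # allows a manual run from the Actions tab

jobs:
  stale: # hypothetical job name
    runs-on: ubuntu-latest # assumption
    timeout-minutes: 30 # Greptile's suggested value; #60607's value not stated here
```

Manual dispatch would also let someone check the `enable-statistics` output right away instead of waiting for the next cron tick.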
