Optimize GitHub API call volume to prevent secondary rate limits #992

## Problem

During pipeline runs with heavy GitHub interaction (posting inline review-pr comments, responding via resolve-review, and PR operations), we've hit GitHub's secondary rate limits (403/429 responses). Issue #892 addressed token fallback and retry resilience, but the root cause — **excessive API call volume** — was never addressed.

## Context: GitHub Secondary Rate Limits

Per [GitHub docs](https://docs.github.com/en/rest/using-the-rest-api/best-practices-for-using-the-rest-api):

- **80 content-generating requests per minute** (broader than just POST/PATCH/PUT/DELETE — includes any content-generating action)
- **900 points/minute** for REST, **2000 points/minute** for GraphQL
- Mutating requests cost **5 points each** (REST POST/PATCH/PUT/DELETE and GraphQL mutations)
- GET/HEAD/OPTIONS cost **1 point each**; GraphQL queries (no mutations) cost **1 point**
- **Recommended**: wait at least 1 second between POST/PATCH/PUT/DELETE operations (not a hard rule, but ignoring it increases the risk of hitting secondary rate limits)
- Requests should be **serialized, not concurrent**
- Use **conditional requests** (ETags/If-Modified-Since) for repeated GETs — 304 responses don't count against primary limits (note: ETags only work on REST GET requests, NOT on GraphQL POST endpoints)
- No more than 100 concurrent requests allowed
- **GraphQL aliases count as 1 request** — a single GraphQL request with N mutation aliases costs 5 points total, not N×5. This makes alias-based batching extremely high-leverage.
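
To make the point arithmetic concrete, here is a minimal sketch of a cost estimator that uses only the per-request figures listed above. The names `request_points` and `total_points` are hypothetical and do not exist in the codebase.

```python
# Hypothetical helper: estimates secondary-rate-limit points for planned calls,
# using only the per-request costs quoted in the list above.
MUTATING_METHODS = {"POST", "PATCH", "PUT", "DELETE"}


def request_points(method: str, graphql_mutation: bool = False) -> int:
    """Return the point cost of one request.

    For REST calls, `method` is the HTTP method. For GraphQL calls, pass
    method="GRAPHQL" and set graphql_mutation=True when the document contains
    at least one mutation (per the list above, aliases do not multiply cost).
    """
    method = method.upper()
    if method == "GRAPHQL":
        return 5 if graphql_mutation else 1
    return 5 if method in MUTATING_METHODS else 1


def total_points(calls: list[tuple[str, bool]]) -> int:
    """Sum the cost of (method, graphql_mutation) pairs."""
    return sum(request_points(method, is_mutation) for method, is_mutation in calls)


# 20 individual comment POSTs versus one batched review POST.
individual = total_points([("POST", False)] * 20)
batched = total_points([("POST", False)])
print(individual, batched)  # 100 5
```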

## Audit Results: Skills by API Call Volume

A full audit of all skills' GitHub API usage was performed and **independently validated against the codebase**. Here are the findings:

### Zero API Call Skills (no concern)
These skills read only local files (sessions.jsonl, Channel B JSONL, plan files, source code):
- `audit-bugs`, `audit-friction`, `audit-arch`, `audit-cohesion`
- `audit-gaps-gap-plan`, `audit-gaps-gap-review`, `audit-gaps-synthesize`

### Low API Call Skills (minimal concern)
These skills make a small number of GitHub API calls (issue list/view) but are not rate-limit-sensitive:
- `dry-walkthrough` — `gh issue list` for context gathering
- `implement-worktree`, `implement-worktree-no-merge` — occasional `gh` for issue references
- `validate-audit` — `gh issue list --state all --search`
- `diagnose-ci` — `gh api .../actions/runs/.../jobs` + `gh api .../actions/jobs/.../logs`

### High-Volume Skills (primary concern)

#### review-pr — ~10-50 API calls per PR
| Call | Count | Points |
|------|-------|--------|
| `gh api user -q .login` | 1 | 1 |
| `gh pr list --head` | 1 | 1 |
| GraphQL `reviewThreads(first:100)` | 1 | 1 |
| `gh pr diff` | 1 | 1 |
| `gh repo view` | 1 | 1 |
| `gh pr view` (headRefName/baseRefName) | 1 | 1 |
| `gh api compare/{base}...{head}` | 1 | 1 |
| `gh pr view --json files` | 1 | 1 |
| **POST** `/pulls/{N}/reviews` (batch inline comments) | 1 | 5 |
| **POST** `/pulls/{N}/comments` (Tier 1 fallback, per-finding) | 0-N | 5 each |
| `gh pr review --approve\|--request-changes` | 1 | 5 |

**Current state**: The primary path (Step 6) already uses the batch endpoint `POST /pulls/{N}/reviews` with a `comments[]` array. This is correct.

**Problem**: The Tier 1 fallback (triggered when the batch POST fails) posts each finding individually via `POST /pulls/{N}/comments` — each costing 5 points. With 20 findings, that's 100 points in rapid succession with zero delay between calls.

#### Fallback trigger analysis (validated against 57 session logs)

The Tier 1 fallback fires in **~9% of runs** (5/57 sessions). Two distinct causes were observed:

**Cause A — Own-PR `REQUEST_CHANGES` 422 (3/5 sessions):** The bot always reviews its own PRs (standard deployment model), so GitHub rejects `event: REQUEST_CHANGES` with HTTP 422 `"Review Cannot request changes on your own pull request"`. The SKILL.md instructs retry with `event: COMMENT`, but in some cases the model falls through to Tier 1 anyway. Since own-PR review is the **expected default**, this trigger is not an edge case — the `REQUEST_CHANGES` → `COMMENT` retry path must be hardened so it reliably succeeds without falling through to Tier 1. (Future deployments may review other users' PRs, where `REQUEST_CHANGES` would be valid, so the event logic should remain flexible.)

**Cause B — Response-parsing false alarm (2/5 sessions, BUG):** The batch POST *succeeds* (HTTP 200, review created), but the model checks `len(response.get("comments", []))` and gets 0. GitHub's `POST /pulls/{N}/reviews` response does NOT echo back the `comments` array — it returns the review object without inline comments. The model misreads "0 comments in response" as "0 comments posted" and unnecessarily fires Tier 1, **creating duplicate comments on GitHub**.
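
The false alarm in Cause B can be reproduced with a short sketch. The response dict below is a trimmed, illustrative stand-in for a review object (real responses carry more fields), and `review_posted` is a hypothetical helper showing one way to judge success without looking at `comments`.

```python
# Illustrative only: a trimmed stand-in for the review object returned by
# POST /pulls/{N}/reviews. Per the analysis above, it has no "comments" key.
response_status = 200
response_body = {"id": 1, "state": "COMMENTED", "body": "Review summary"}

# Buggy check: the key is absent, so this is 0 even though the review exists.
buggy_count = len(response_body.get("comments", []))


def review_posted(status: int, body: dict) -> bool:
    """Treat a 2xx status plus a review id as success (hypothetical helper)."""
    return 200 <= status < 300 and "id" in body


print(buggy_count)                                    # 0
print(review_posted(response_status, response_body))  # True
```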

**Conclusion**: The fallback cannot be removed — Cause A is the standard operating mode (own-PR reviews). But once both fixes land (Cause B bug fix + hardened `COMMENT` retry for Cause A), the Tier 1 fallback should almost never fire. The `VALID_LINE_RANGES` filter in Step 4 is effective — no 422 from invalid line numbers was observed in any of the 57 sessions. The remaining latent risk is race conditions (PR head changes between diff fetch and review POST), which is not currently guarded.

**Fixes**:
1. **Fix the response-parsing bug**: remove the `len(d.get("comments", []))` check. The batch endpoint's response does not include a `comments` field, so this expression always evaluates to 0. If the presence of inline comments must be verified, do it separately (e.g., via a follow-up GET); otherwise trust the 200 response.
2. If batch fails for legitimate reasons, retry with filtered payload (remove offending comment) before falling back to individual POSTs
3. If individual POSTs are still needed, add 1-second delays between them
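
The degradation order described in this section could be sketched roughly as follows. This is an illustration under assumptions, not the skill's actual logic: `post_review` and the injected `post` transport are hypothetical, paths omit the `/repos/{owner}/{repo}` prefix as elsewhere in this issue, and fix 2 (filtering out an offending comment) is left out because it depends on error details not covered here.

```python
import time
from typing import Callable

# Hypothetical transport: takes a path and a JSON payload and returns
# (status_code, parsed_body). In the skill this would wrap `gh api`.
Post = Callable[[str, dict], tuple[int, dict]]


def post_review(post: Post, pr: int, event: str, body: str, comments: list[dict],
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Post a batch review, degrading step by step. Returns the path taken."""
    path = f"/pulls/{pr}/reviews"
    payload = {"event": event, "body": body, "comments": comments}

    # Success is judged by status alone, never by a "comments" field.
    status, _ = post(path, payload)
    if 200 <= status < 300:
        return "batch"

    # Own-PR case: GitHub rejects REQUEST_CHANGES with 422, so retry the same
    # batch as a plain COMMENT review before considering any fallback.
    if status == 422 and event == "REQUEST_CHANGES":
        status, _ = post(path, {**payload, "event": "COMMENT"})
        if 200 <= status < 300:
            return "batch-as-comment"

    # Last resort (Tier 1): individual POSTs, spaced one second apart.
    for index, comment in enumerate(comments):
        if index:
            sleep(1.0)
        post(f"/pulls/{pr}/comments", comment)
    return "tier1"
```

Injecting `sleep` keeps the spacing testable without real waits.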

#### resolve-review — ~15-80 API calls per PR (CONFIRMED)
| Call | Count | Points |
|------|-------|--------|
| `gh pr list --head` | 1 | 1 |
| `gh repo view` | 1 | 1 |
| `GET /pulls/{N}/comments --paginate` | 1-3 | 1 each |
| `GET /pulls/{N}/reviews --paginate` | 1-3 | 1 each |
| GraphQL `reviewThreads(first:100)` | 1 | 1 |
| **POST** `/pulls/{N}/comments/{id}/replies` | N per finding | 5 each |
| GraphQL `resolveReviewThread` mutation | N per thread | 5 each |

**Confirmed**: Replies are posted one at a time (Step 6.5). Thread resolutions are individual GraphQL mutations (Step 6). No delays between any mutating calls. With 15 threads and 15 replies, that's ~150 points in rapid succession.

**Fixes**:
1. Batch thread resolution via GraphQL aliases: `mutation { t1: resolveReviewThread(...) t2: resolveReviewThread(...) }` — one request = 5 points total (confirmed: aliases don't multiply point cost)
2. Add 1-second delays between reply POSTs (no batch API exists for comment replies)
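
As a sketch of fix 1, the aliased mutation can be generated from a list of thread IDs. The helper name `build_resolve_mutation` and the placeholder IDs are hypothetical; thread IDs are passed as GraphQL variables rather than interpolated into the query text.

```python
def build_resolve_mutation(thread_ids: list[str]) -> tuple[str, dict[str, str]]:
    """Build one GraphQL document that resolves every thread via aliases.

    Returns (query, variables), one alias and one variable per thread.
    """
    if not thread_ids:
        raise ValueError("thread_ids must not be empty")
    count = len(thread_ids)
    declarations = ", ".join(f"$t{i}: ID!" for i in range(count))
    fields = "\n".join(
        f"  t{i}: resolveReviewThread(input: {{threadId: $t{i}}}) "
        f"{{ thread {{ id isResolved }} }}"
        for i in range(count)
    )
    query = f"mutation({declarations}) {{\n{fields}\n}}"
    variables = {f"t{i}": thread_id for i, thread_id in enumerate(thread_ids)}
    return query, variables


# Placeholder IDs; real ones come from the reviewThreads query.
query, variables = build_resolve_mutation(["THREAD_ID_A", "THREAD_ID_B"])
print(query)
```

The result can be sent with `gh api graphql`, passing the query as `-f query=...` and each variable as its own `-f` field. With 15 threads, this replaces 15 mutating requests (75 points) with one (5 points).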

#### analyze-prs — ~5 calls per PR × N PRs (CONFIRMED)
| Call | Count | Points |
|------|-------|--------|
| `gh pr list --base` | 1 | 1 |
| `gh pr diff {N}` | N per PR | 1 each |
| `gh pr view {N} --json files` | N per PR | 1 each |
| `gh pr view {N} --json body` | N per PR | 1 each |
| `gh pr checks {N}` | N per PR | 1 each |
| `gh pr view {N} --json reviews` | N per PR | 1 each |

**Confirmed**: Step 1 parallelizes 3 reads/PR in batches of 8. Step 1.5 adds 2 more reads per PR in a sequential shell loop. `pr_gates.py:partition_prs` already accepts pre-fetched `checks_by_number` and `reviews_by_number` dicts — the Python layer is ready for batched input.

**Fix**: Use GraphQL to batch multiple PR queries: `query { pr1: pullRequest(number:1){...} pr2: pullRequest(number:2){...} }` — one call for all PRs. This works directly with `pr_gates.py`'s existing interface.
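
A rough sketch of what such a batched query and its unpacking might look like is shown below. The helper names are hypothetical, the selected fields are only one plausible choice, and the value shapes that `partition_prs` expects are not shown in this issue, so the dict values here are assumptions.

```python
def build_pr_batch_query(pr_numbers: list[int]) -> str:
    """Build one GraphQL query fetching body, reviews and check state per PR."""
    if not pr_numbers:
        raise ValueError("pr_numbers must not be empty")
    fields = "\n".join(
        f"    pr{int(n)}: pullRequest(number: {int(n)}) {{\n"
        "      number\n"
        "      body\n"
        "      reviews(last: 50) { nodes { state author { login } } }\n"
        "      commits(last: 1) { nodes { commit { statusCheckRollup { state } } } }\n"
        "    }"
        for n in pr_numbers
    )
    return (
        "query($owner: String!, $name: String!) {\n"
        "  repository(owner: $owner, name: $name) {\n"
        f"{fields}\n"
        "  }\n"
        "}"
    )


def split_by_number(data: dict) -> tuple[dict, dict]:
    """Group the aliased response into (reviews_by_number, checks_by_number)."""
    reviews_by_number, checks_by_number = {}, {}
    for pr in data["repository"].values():
        if pr is None:  # alias for a PR number that could not be resolved
            continue
        number = pr["number"]
        reviews_by_number[number] = pr["reviews"]["nodes"]
        nodes = pr["commits"]["nodes"]
        rollup = nodes[0]["commit"]["statusCheckRollup"] if nodes else None
        checks_by_number[number] = rollup["state"] if rollup else None
    return reviews_by_number, checks_by_number
```

The `data` argument is the `data` object of the GraphQL response. Any mapping from these values to the exact structures used by `pr_gates.py` would need to be checked against that module.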

#### open-integration-pr — ~5-10 calls per collapsed PR
| Call | Count | Points |
|------|-------|--------|
| `gh pr view {N} --json body` (Step 3) | N per PR | 1 each |
| `gh pr close {N} --comment` (Step 10) | N per PR | 5 each |

**Fix**: Batch PR body fetches via GraphQL aliases. Add 1-second delay between `gh pr close` calls.

### Medium-Volume Skills

#### triage-issues — N+2 mutating calls
- `gh issue list` (1 bulk read)
- `gh label create --force` (2 POSTs, idempotent)
- `gh issue edit {N} --add-label` — **1 PATCH per issue, no delay** (5 pts each)

#### enrich-issues — 2 calls per issue (1 mutating)
- `gh issue view N --json body` (read) + `gh issue edit N --body-file` (mutating) per enriched issue

#### collapse-issues — 3 mutating calls per original
- `gh issue create` + `gh issue comment {orig}` + `gh issue close {orig}` per collapsed original

#### issue-splitter — 3-5 mutating calls per split
- `gh issue create` per sub-issue + `gh issue edit --add-label "split"` + `gh issue comment`

#### prepare-issue — 2-3 mutating calls
- `gh issue create` or `gh issue edit` + 2-3 `gh issue edit --add-label` calls

### Server Tools (Python — mutating call sites)
- `tools_pr_ops.py:bulk_close_issues` — async `_close_issues_sequentially` fires sequential `gh issue close` with **no delay** between awaits
- `execution/github.py:DefaultGitHubFetcher` — all mutating methods (`create_issue`, `add_comment`, `add_labels`, `remove_label`, `ensure_label`) are **async** and fire immediately with no rate-limit awareness
- `tools_issue_lifecycle.py:claim_issue` — 3 async API calls per claim (fetch + ensure_label + add_labels). The `ensure_label` call could be **session-cached** since label existence is stable within a run (reduces to 2 calls after first invocation)

### Cross-Cutting Finding: Zero Rate-Limit Awareness

**No code in the entire codebase** reads or acts on `X-RateLimit-Remaining`, `X-RateLimit-Reset`, or `Retry-After` headers. The only backoff mechanism is `_jittered_sleep` in `ci.py`, which is a CI-polling delay — not a rate-limit response. No ETags or conditional requests are used anywhere.
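
For reference, acting on these headers could be as small as the following sketch, meant to be called after a 403/429 response. `seconds_to_wait` is a hypothetical name, and the code assumes `Retry-After` carries a number of seconds and `X-RateLimit-Reset` carries UTC epoch seconds.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before retrying a rate-limited request."""
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}

    # 1. An explicit Retry-After (assumed to be in seconds) takes precedence.
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        return max(0.0, float(retry_after))

    # 2. Quota exhausted: wait until the reset time (UTC epoch seconds).
    if lowered.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in lowered:
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)

    # 3. No guidance in the headers: fall back to a conservative one minute.
    return 60.0


print(seconds_to_wait({"Retry-After": "30"}))  # 30.0
```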

## Implementation Safety Assessment

All proposed changes have been validated for safety:

- **Timeout risk: NONE.** Hard session timeout is 7200s (2 hours). Stale/idle thresholds (1200s/600s) only fire on output silence. Adding 20-60 seconds of API delays is negligible. No recipe defines per-skill budgets.
- **Async compatibility: CONFIRMED.** All server tool functions (`bulk_close_issues`, `DefaultGitHubFetcher` methods, `claim_issue`) are already `async`. Adding `await asyncio.sleep(1)` requires no signature changes.
- **Test impact: MINIMAL.** Existing tests mock `_run_subprocess` with `AsyncMock` and don't assert timing. Tests will need `asyncio.sleep` patched to avoid slowdown, but won't break.
- **ETag limitation: GraphQL is POST-only.** ETags only work on REST GET requests. `merge_queue.py`'s GraphQL polling cannot benefit from conditional requests. Only `ci.py` REST GET polling is a candidate for ETags.

## Optimization Priorities

### P0 — Fix response-parsing bug + harden own-PR retry (review-pr)
Three changes:
1. **Fix the response-parsing bug**: Remove the `len(d.get("comments", []))` check that causes false-alarm Tier 1 triggers and duplicate comments. Trust the HTTP 200 response from the batch endpoint.
2. **Harden the `REQUEST_CHANGES` → `COMMENT` retry**: Since own-PR is the standard deployment, this retry path fires on every `changes_requested` verdict. Ensure it reliably completes without falling through to Tier 1.
3. **Throttle the fallback**: If Tier 1 individual POSTs are still needed (future edge cases), add 1-second delays between them.
- **Impact**: Eliminates the ~40% of observed fallback triggers that were false alarms (Cause B, 2/5 sessions), prevents the other ~60% via the hardened retry (Cause A, 3/5 sessions), and throttles any fallback that still occurs
- **Files**: `src/autoskillit/skills_extended/review-pr/SKILL.md`

### P1 — Batch thread resolution (resolve-review)
Use GraphQL mutation aliases to resolve multiple threads in one request.
- **Current**: N individual `resolveReviewThread` mutations (5 pts each, no delay)
- **Target**: 1 GraphQL request with N aliases (5 pts total)
- **Impact**: Reduces N×5 pts to 5 pts per resolve cycle
- **Files**: `src/autoskillit/skills_extended/resolve-review/SKILL.md`

### P2 — Add delays between mutating calls (all skills + server tools)
Add 1-second delays between POST/PATCH/PUT/DELETE calls. All affected functions are already async — `await asyncio.sleep(1)` requires no interface changes.
- **Files**:
  - `src/autoskillit/server/tools_pr_ops.py` — `bulk_close_issues` / `_close_issues_sequentially`
  - `src/autoskillit/execution/github.py` — callers of mutating methods in `DefaultGitHubFetcher`
  - `src/autoskillit/skills_extended/resolve-review/SKILL.md` — reply POSTs (Step 6.5)
  - `src/autoskillit/skills_extended/triage-issues/SKILL.md` — label application
  - `src/autoskillit/skills_extended/collapse-issues/SKILL.md` — close+comment loops
  - `src/autoskillit/skills_extended/issue-splitter/SKILL.md` — create loops
  - `src/autoskillit/skills_extended/enrich-issues/SKILL.md` — edit loops
  - `src/autoskillit/skills_extended/open-integration-pr/SKILL.md` — close loops
- **Test note**: Tests using `AsyncMock` for `_run_subprocess` should also mock `asyncio.sleep` to avoid slowdown
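
A minimal sketch of the delay and the matching test pattern follows. `close_sequentially` is a hypothetical stand-in for `_close_issues_sequentially`, whose real signature is not shown here; the injected `close_issue` coroutine takes the place of the actual `gh issue close` call.

```python
import asyncio
from typing import Awaitable, Callable
from unittest.mock import AsyncMock, patch

MUTATION_DELAY_SECONDS = 1.0


async def close_sequentially(issue_numbers: list[int],
                             close_issue: Callable[[int], Awaitable[None]]) -> None:
    """Close issues one at a time, pausing between (not after) mutating calls."""
    for index, number in enumerate(issue_numbers):
        if index:
            await asyncio.sleep(MUTATION_DELAY_SECONDS)
        await close_issue(number)


def test_delay_between_closes() -> None:
    close_issue = AsyncMock()
    # Patch asyncio.sleep so the test checks the spacing without waiting.
    with patch("asyncio.sleep", new=AsyncMock()) as fake_sleep:
        asyncio.run(close_sequentially([1, 2, 3], close_issue))
    assert close_issue.await_count == 3
    assert fake_sleep.await_count == 2  # one pause between each pair of calls


test_delay_between_closes()
```

Sleeping between calls rather than after each one avoids a wasted second at the end of every loop.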

### P3 — GraphQL multi-entity batching (analyze-prs, open-integration-pr)
Replace per-PR `gh pr view` loops with single GraphQL query using aliases.
- **Current**: N sequential `gh pr view` calls per PR
- **Target**: 1 GraphQL query with N aliases per batch
- **Impact**: Reduces N calls to 1 per batch (up to ~50 entities per query)
- **Files**:
  - `src/autoskillit/skills_extended/analyze-prs/SKILL.md` — Step 1.5 sequential loop
  - `src/autoskillit/skills_extended/open-integration-pr/SKILL.md` — Step 3 body fetches
- **Note**: `pipeline/pr_gates.py:partition_prs` already accepts pre-fetched dicts — Python layer is ready

### P4 — Pre-fetch entity lists + cache labels (triage/process/enrich + claim_issue)
Move `gh issue list` / `gh pr list` bulk fetches to pre-scan steps. Pass results via manifest files.
- Already applied in `audit-gaps` recipe (pre-scan fetches all issues once)
- Apply pattern to: `triage-issues`, `process-issues`, `enrich-issues`
- Cache `ensure_label` results in `DefaultGitHubFetcher` (session-scoped set of `(owner, repo, label)` tuples) — reduces `claim_issue` from 3 API calls to 2 on all invocations after the first

### P5 — Conditional requests for REST GET polling (CI only)
Use ETags / If-Modified-Since for repeated REST GET calls when polling CI status.
- 304 responses don't count against primary rate limits
- **Only viable for REST GET endpoints** — GitHub does not support ETags on GraphQL POST requests
- **Files**:
  - `src/autoskillit/execution/ci.py` — CI status polling (unconditional GETs today; `_jittered_sleep` pattern is reusable for the delay utility)
- ~~`src/autoskillit/execution/merge_queue.py`~~ — NOT a candidate (GraphQL POST, ETags not supported)
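
One possible shape for conditional polling is sketched below. `ConditionalGetCache` and the injected `fetch` transport are hypothetical; how `ci.py` actually issues its GETs is not shown in this issue, so the transport is left abstract.

```python
from typing import Callable, Optional

# Hypothetical transport: (url, request_headers) -> (status, response_headers, body).
Fetch = Callable[[str, dict], tuple[int, dict, Optional[dict]]]


class ConditionalGetCache:
    """Remembers the last ETag and body per URL and replays the body on 304."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: dict[str, tuple[str, dict]] = {}

    def get(self, url: str) -> Optional[dict]:
        headers = {}
        cached = self._entries.get(url)
        if cached is not None:
            headers["If-None-Match"] = cached[0]  # send the stored ETag

        status, response_headers, body = self._fetch(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged: reuse the stored body
        etag = response_headers.get("ETag") or response_headers.get("etag")
        if status == 200 and etag and body is not None:
            self._entries[url] = (etag, body)
        return body
```

A poll that returns 304 hands back the previously stored body, so callers see the same value either way.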

## Complete Affected Files List

| File | Priority | Change Type |
|------|----------|-------------|
| `src/autoskillit/skills_extended/review-pr/SKILL.md` | P0 | Fix response-parsing bug, harden `COMMENT` retry, throttle fallback |
| `src/autoskillit/skills_extended/resolve-review/SKILL.md` | P1/P2 | Batch mutations via GraphQL aliases + reply delays |
| `src/autoskillit/server/tools_pr_ops.py` | P2 | Add `asyncio.sleep(1)` in `_close_issues_sequentially` |
| `src/autoskillit/execution/github.py` | P2 | Add delay utility; callers add delay between mutating calls |
| `src/autoskillit/skills_extended/triage-issues/SKILL.md` | P2/P4 | Label delays + pre-fetch pattern |
| `src/autoskillit/skills_extended/collapse-issues/SKILL.md` | P2 | Close delays |
| `src/autoskillit/skills_extended/issue-splitter/SKILL.md` | P2 | Create delays |
| `src/autoskillit/skills_extended/enrich-issues/SKILL.md` | P2/P4 | Edit delays + pre-fetch |
| `src/autoskillit/skills_extended/open-integration-pr/SKILL.md` | P2/P3 | GraphQL batch body fetches + close delays |
| `src/autoskillit/skills_extended/analyze-prs/SKILL.md` | P3 | GraphQL batch Step 1.5 |
| `src/autoskillit/pipeline/pr_gates.py` | P3 | Already ready (accepts pre-fetched dicts) |
| `src/autoskillit/server/tools_issue_lifecycle.py` | P4 | Cache `ensure_label` per session |
| `src/autoskillit/execution/ci.py` | P5 | Add ETag/conditional GET for REST polling |

## Post-Implementation: CLAUDE.md Rules

Once implemented, add the following to CLAUDE.md as §3.5 "GitHub API Call Discipline":

- **Batch inline review comments**: Always use `POST /pulls/{N}/reviews` with a `comments[]` array. One batch POST costs 5 rate-limit points regardless of comment count, whereas each individual `POST /pulls/{N}/comments` costs 5 points on its own. Reserve individual POSTs for the throttled Tier 1 fallback.
- **Batch GraphQL mutations**: Use GraphQL aliases to resolve multiple threads or fetch multiple entities in a single request. Never loop individual mutations.
- **Delay between mutating calls**: Space POST/PATCH/PUT/DELETE requests at least 1 second apart to avoid secondary rate limits.
- **Pre-fetch entity lists**: Use `gh issue list`/`gh pr list` with broad filters to get bulk data upfront. Pass results via manifest files — never have each subagent re-fetch the same list.
- **Use `--json` field selection**: Always specify only the fields needed.
- **Prefer GraphQL for multi-entity reads**: Replace per-entity `gh pr view` loops with a single `gh api graphql` query using aliases.

## Related

- #892 (closed) — GitHub API resilience: token fallback, rate limit handling
- #166 — Handle GitHub billing limit errors in CI check steps

