Summary
The issues metadata backup feature (added in v0.3.0) works, but it has performance and rate-limit concerns that need to be addressed before it is enabled in production.
Current Implementation
- Uses PyGithub REST API
- 1 API call per page of issues (30 issues/page)
- 1 API call per issue to fetch comments
- For a repo with 100 issues → ~100+ API calls (see the sketch below)
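For reference, the per-issue comment fetch is what drives the call count. A minimal sketch of the pattern with PyGithub (function and repo names are illustrative, not the actual implementation):

from github import Github

def backup_repo_issues(token: str, full_name: str) -> list[dict]:
    """Collect issue metadata and comments for one repo via the REST API."""
    gh = Github(token)
    repo = gh.get_repo(full_name)                      # 1 call
    records = []
    for issue in repo.get_issues(state="all"):         # 1 call per page of 30 issues
        records.append({
            "number": issue.number,
            "title": issue.title,
            "body": issue.body,
            "comments": [                              # 1 extra call per issue
                {"author": c.user.login, "body": c.body,
                 "created_at": c.created_at.isoformat()}
                for c in issue.get_comments()
            ],
        })
    return records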
Test Results
Backing up QuantEcon.manual (29 issues):
- Time: ~38 seconds (~1.3s per issue)
- API calls: ~30
- Output: 43 KB JSON file
Concerns for Full Org Backup
- QuantEcon has ~100 active repos
- If the average is 50 issues per repo, that's 5,000+ API calls (rough estimate below)
- GitHub Actions GITHUB_TOKEN limit: 1,000 requests/hour
- Could easily hit rate limits
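A rough back-of-the-envelope check of that figure under the current REST pattern (the repo and issue counts are assumptions, not measurements):

REPOS = 100            # approximate number of active QuantEcon repos
ISSUES_PER_REPO = 50   # assumed average
PAGE_SIZE = 30         # issues returned per list call

pages = -(-ISSUES_PER_REPO // PAGE_SIZE)        # ceil division -> 2 list calls per repo
calls_per_repo = pages + ISSUES_PER_REPO        # plus 1 comments call per issue
print(REPOS * calls_per_repo)                   # ~5,200 calls vs a 1,000/hour budget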
Proposed Solutions
Option 1: GraphQL API (Recommended)
Use GitHub GraphQL API to fetch issues + comments in a single query per repo.
query {
  repository(owner: "QuantEcon", name: "quantecon-py") {
    issues(first: 100, states: [OPEN, CLOSED]) {
      nodes {
        number
        title
        body
        comments(first: 100) {
          nodes { author { login } body createdAt }
        }
      }
    }
  }
}

Benefits:
- Single request per repo (with pagination)
- Dramatically fewer API calls
- Faster execution
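A minimal sketch of running such a query from Python, assuming a plain requests POST to the GraphQL endpoint rather than PyGithub; the function and variable names are illustrative:

import requests

GRAPHQL_URL = "https://api.github.com/graphql"

ISSUES_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, states: [OPEN, CLOSED]) {
      nodes {
        number
        title
        body
        comments(first: 100) { nodes { author { login } body createdAt } }
      }
    }
  }
}
"""

def fetch_issues_graphql(token: str, owner: str, name: str) -> list[dict]:
    """Fetch up to 100 issues, each with up to 100 comments, in one request."""
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": ISSUES_QUERY, "variables": {"owner": owner, "name": name}},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["repository"]["issues"]["nodes"]

Repos with more than 100 issues would still need cursor pagination via pageInfo { hasNextPage endCursor }, but that is one request per 100 issues rather than one per issue.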
Option 2: Add include_comments config option
backup_metadata:
  issues: true
  include_comments: false  # Skip comments, much faster
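Honoring the flag could look roughly like this in the per-issue serialization step (a sketch only; the option name mirrors the config above, everything else is hypothetical):

def serialize_issue(issue, include_comments: bool) -> dict:
    """Build the JSON record for one issue, skipping the costly comments call if disabled."""
    record = {"number": issue.number, "title": issue.title, "body": issue.body}
    if include_comments:
        # The per-issue comments request is the expensive part of the REST approach
        record["comments"] = [
            {"author": c.user.login, "body": c.body}
            for c in issue.get_comments()
        ]
    return record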
Option 3: Rate limit handling
Add retry logic with exponential backoff when rate limited.
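One possible shape for that logic, a generic wrapper that retries a rate-limited PyGithub call with doubling delays (a sketch under assumed names, not the project's code):

import time
from github import RateLimitExceededException

def with_backoff(call, max_retries: int = 5, base_delay: float = 2.0):
    """Retry `call` with exponential backoff when the rate limit is hit."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitExceededException:
            time.sleep(base_delay * (2 ** attempt))   # 2s, 4s, 8s, ...
    return call()  # final attempt; let the exception propagate if still rate limited

Usage would be something like issues = with_backoff(lambda: list(repo.get_issues(state="all"))).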
Current Status
- Feature implemented and tested ✅
- Default disabled (issues: false) until optimized
- Config includes warning comment about API usage
Related
- Issues backup JSON schema is finalized and working
- Markdown recovery utility planned for future