ENH: Optimize GitHub API usage for issues metadata backup #3

@mmcky

Summary

The issues metadata backup feature (added in v0.3.0) works, but its API usage raises performance and rate-limit concerns that should be addressed before it is enabled in production.

Current Implementation

  • Uses PyGithub REST API
  • 1 API call per page of issues (30 issues/page)
  • 1 API call per issue to fetch comments
  • For a repo with 100 issues → ~100+ API calls (4 for issue pages, 100 for comments)

Test Results

Backing up QuantEcon.manual (29 issues):

  • Time: ~38 seconds (~1.3s per issue)
  • API calls: ~30
  • Output: 43 KB JSON file

Concerns for Full Org Backup

  • QuantEcon has ~100 active repos
  • At an average of 50 issues per repo, that is 5,000+ API calls
  • GitHub Actions GITHUB_TOKEN limit: 1,000 requests/hour per repository
  • A full org backup could easily hit this limit
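
The arithmetic behind these numbers can be checked with a rough estimator. This is a back-of-envelope sketch, not the project's code: `estimate_rest_calls` is a hypothetical helper, and it assumes exactly one comments call per issue (no comment pagination) on top of one call per page of 30 issues.

```python
import math

PER_PAGE = 30  # issues per page, as listed under Current Implementation


def estimate_rest_calls(issues: int, per_page: int = PER_PAGE) -> int:
    """Estimate REST calls: one per page of issues plus one comments call per issue."""
    page_calls = math.ceil(issues / per_page)
    comment_calls = issues  # assumption: a single comments call per issue
    return page_calls + comment_calls


single_repo = estimate_rest_calls(29)       # the QuantEcon.manual test case
hundred_issues = estimate_rest_calls(100)   # the 100-issue example
org_total = 100 * estimate_rest_calls(50)   # ~100 repos at 50 issues each

print(single_repo, hundred_issues, org_total)
```

Under these assumptions the 29-issue test comes out at 30 calls, matching the observed ~30, and the org-wide estimate lands several times above the 1,000 requests/hour figure.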

Proposed Solutions

Option 1: GraphQL API (Recommended)

Use the GitHub GraphQL API to fetch issues and their comments together, one query per page of up to 100 issues per repo.

query {
  repository(owner: "QuantEcon", name: "quantecon-py") {
    issues(first: 100, states: [OPEN, CLOSED]) {
      nodes {
        number
        title
        body
        comments(first: 100) {
          nodes { author { login } body createdAt }
        }
      }
    }
  }
}

Benefits:

  • Single request per repo (with pagination)
  • Dramatically fewer API calls
  • Faster execution
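
A possible shape for the paginated fetch is sketched below. It extends the query above with `pageInfo` and an `after` cursor, which is how GraphQL connections are paged. The `run_query` callable is an assumption made for illustration: it stands in for whatever sends the query and variables to the GraphQL endpoint with a token and returns the `data` object. Issues with more than 100 comments would need a nested cursor, which this sketch does not handle.

```python
from typing import Any, Callable, Dict, List, Optional

ISSUES_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor, states: [OPEN, CLOSED]) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        title
        body
        comments(first: 100) {
          nodes { author { login } body createdAt }
        }
      }
    }
  }
}
"""

# Hypothetical transport: takes (query, variables), returns the response's "data" dict.
RunQuery = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def fetch_all_issues(run_query: RunQuery, owner: str, name: str) -> List[Dict[str, Any]]:
    """Collect every issue node for one repo, following the issues cursor."""
    issues: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    while True:
        variables = {"owner": owner, "name": name, "cursor": cursor}
        page = run_query(ISSUES_QUERY, variables)["repository"]["issues"]
        issues.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return issues
        cursor = page["pageInfo"]["endCursor"]
```

With this shape, a repo with 100 issues or fewer costs one request, compared with roughly one request per issue on the REST path.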

Option 2: Add include_comments config option

backup_metadata:
  issues: true
  include_comments: false  # Skip comments, much faster
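
How the option might gate the per-issue comment calls is sketched below. Everything here is illustrative: `backup_issues` and `fetch_comments` are hypothetical names, the config is assumed to be already parsed into a dict, and defaulting `include_comments` to true (to preserve current behaviour) is an assumption.

```python
from typing import Any, Callable, Dict, List


def backup_issues(
    config: Dict[str, Any],
    issues: List[Dict[str, Any]],
    fetch_comments: Callable[[int], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Build backup records, fetching comments only when the config asks for them."""
    meta = config.get("backup_metadata", {})
    if not meta.get("issues", False):
        return []  # issues backup is disabled
    include_comments = meta.get("include_comments", True)  # assumed default

    records = []
    for issue in issues:
        record = dict(issue)
        if include_comments:
            # This is the one-call-per-issue step that the option skips.
            record["comments"] = fetch_comments(issue["number"])
        records.append(record)
    return records
```

With `include_comments: false`, the only remaining calls are the ones that list the issues themselves.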

Option 3: Rate limit handling

Add retry logic with exponential backoff when rate limited.
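
A minimal sketch of such a retry wrapper follows. `RateLimited` is a hypothetical placeholder for whatever exception the API client raises when rate limited, and the retry count and delays are illustrative defaults, not tuned values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the client's rate-limit error."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with delays of base_delay * 2**attempt."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt >= max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the wait on each retry, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
            attempt += 1
```

Backoff alone only spreads requests over time. It does not reduce the total number of calls, so it complements Options 1 and 2 and does not replace them.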

Current Status

  • Feature implemented and tested ✅
  • Disabled by default (issues: false) until optimized
  • Config includes warning comment about API usage

Related

  • Issues backup JSON schema is finalized and working
  • Markdown recovery utility planned for future
