Implement caching for GitHub API responses #11

@BekahHW

Description

RepoReady currently makes fresh GitHub API calls on every run, even when the same repository is evaluated multiple times. Implementing intelligent caching would improve performance, reduce API rate limit consumption, and provide faster feedback for users.

Current State

  • ❌ No caching mechanism for API responses
  • ❌ Repeated API calls for same repository
  • ❌ No cache invalidation strategy
  • ❌ High API rate limit consumption during testing/development
  • ✅ GitHub API responses are relatively stable
  • ✅ Repository data doesn't change frequently

Acceptance Criteria

Cache Implementation

  • Implement file-based cache for API responses
  • Cache repository metadata, file existence checks, and issues
  • Use cache keys based on repository and API endpoint
  • Implement cache expiration (TTL) mechanism
  • Provide cache statistics and management

Cache Management

  • CLI commands to clear cache (rr cache clear)
  • CLI commands to view cache status (rr cache status)
  • Automatic cache cleanup for expired entries
  • Configurable cache location and size limits
  • Respect cache headers from GitHub API (see the ETag revalidation sketch under Implementation Suggestions)

Performance & Reliability

  • Fallback to API if cache is corrupted or unavailable
  • Compress cached data to save disk space (a compression sketch follows this list)
  • Thread-safe cache operations
  • Cache miss/hit metrics for debugging
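
For the compression item, one possible approach is to gzip the serialized entry before writing it and gunzip on read. This is only a sketch using Node's built-in zlib; the writeCompressed/readCompressed names are illustrative and not part of the current codebase.

// Possible compression helpers (sketch) - gzip JSON payloads before they hit disk
import { promises as fs } from 'fs';
import { gzipSync, gunzipSync } from 'zlib';

async function writeCompressed(filePath: string, value: unknown): Promise<void> {
  // Serialize to JSON, then gzip; typical GitHub API payloads compress well.
  const compressed = gzipSync(Buffer.from(JSON.stringify(value), 'utf8'));
  await fs.writeFile(filePath, compressed);
}

async function readCompressed<T>(filePath: string): Promise<T> {
  // Reverse the process: read bytes, gunzip, parse JSON.
  const buffer = await fs.readFile(filePath);
  return JSON.parse(gunzipSync(buffer).toString('utf8')) as T;
}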

Implementation Suggestions

Cache Service Architecture

// src/utils/cache.ts
import fs from 'fs/promises';
import path from 'path';
import os from 'os';
import crypto from 'crypto';

export interface CacheEntry<T> {
  data: T;
  timestamp: number;
  ttl: number; // Time to live in milliseconds
  etag?: string; // GitHub API ETag for validation
}

export class GitHubApiCache {
  private cacheDir: string;
  private defaultTtl: number;
  private maxCacheSize: number;

  constructor(options: {
    cacheDir?: string;
    defaultTtl?: number;
    maxCacheSize?: number;
  } = {}) {
    this.cacheDir = options.cacheDir ?? path.join(os.homedir(), '.repoready-cache');
    this.defaultTtl = options.defaultTtl ?? 15 * 60 * 1000; // 15 minutes
    this.maxCacheSize = options.maxCacheSize ?? 100 * 1024 * 1024; // 100MB
    
    this.ensureCacheDir();
  }

  private async ensureCacheDir(): Promise<void> {
    try {
      await fs.mkdir(this.cacheDir, { recursive: true });
    } catch (error) {
      console.warn('Failed to create cache directory:', error);
    }
  }

  private generateCacheKey(endpoint: string, params: Record<string, any>): string {
    const key = `${endpoint}-${JSON.stringify(params, Object.keys(params).sort())}`;
    return crypto.createHash('sha256').update(key).digest('hex');
  }

  private getCacheFilePath(key: string): string {
    return path.join(this.cacheDir, `${key}.json`);
  }

  async get<T>(endpoint: string, params: Record<string, any>): Promise<CacheEntry<T> | null> {
    try {
      const cacheKey = this.generateCacheKey(endpoint, params);
      const filePath = this.getCacheFilePath(cacheKey);
      
      const data = await fs.readFile(filePath, 'utf8');
      const entry: CacheEntry<T> = JSON.parse(data);
      
      // Check if entry has expired
      if (Date.now() - entry.timestamp > entry.ttl) {
        await this.delete(endpoint, params);
        return null;
      }
      
      return entry;
    } catch (error) {
      // Cache miss or error reading cache
      return null;
    }
  }

  async set<T>(
    endpoint: string,
    params: Record<string, any>,
    data: T,
    ttl?: number,
    etag?: string
  ): Promise<void> {
    try {
      const cacheKey = this.generateCacheKey(endpoint, params);
      const filePath = this.getCacheFilePath(cacheKey);
      
      const entry: CacheEntry<T> = {
        data,
        timestamp: Date.now(),
        ttl: ttl ?? this.defaultTtl,
        etag
      };
      
      await fs.writeFile(filePath, JSON.stringify(entry));
      
      // Clean up old entries if cache is getting too large
      await this.cleanupIfNeeded();
    } catch (error) {
      console.warn('Failed to write to cache:', error);
    }
  }

  async delete(endpoint: string, params: Record<string, any>): Promise<void> {
    try {
      const cacheKey = this.generateCacheKey(endpoint, params);
      const filePath = this.getCacheFilePath(cacheKey);
      await fs.unlink(filePath);
    } catch (error) {
      // File doesn't exist, ignore
    }
  }

  async clear(): Promise<void> {
    try {
      const files = await fs.readdir(this.cacheDir);
      await Promise.all(
        files.filter(f => f.endsWith('.json')).map(f => 
          fs.unlink(path.join(this.cacheDir, f))
        )
      );
    } catch (error) {
      console.warn('Failed to clear cache:', error);
    }
  }

  async getStats(): Promise<{
    totalEntries: number;
    totalSize: number;
    oldestEntry: number;
    newestEntry: number;
  }> {
    try {
      const files = await fs.readdir(this.cacheDir);
      const jsonFiles = files.filter(f => f.endsWith('.json'));
      
      let totalSize = 0;
      let oldestEntry = Date.now();
      let newestEntry = 0;
      
      for (const file of jsonFiles) {
        const filePath = path.join(this.cacheDir, file);
        const stats = await fs.stat(filePath);
        totalSize += stats.size;
        
        const content = await fs.readFile(filePath, 'utf8');
        const entry = JSON.parse(content);
        
        oldestEntry = Math.min(oldestEntry, entry.timestamp);
        newestEntry = Math.max(newestEntry, entry.timestamp);
      }
      
      return {
        totalEntries: jsonFiles.length,
        totalSize,
        oldestEntry,
        newestEntry
      };
    } catch (error) {
      return { totalEntries: 0, totalSize: 0, oldestEntry: 0, newestEntry: 0 };
    }
  }

  private async cleanupIfNeeded(): Promise<void> {
    const stats = await this.getStats();
    
    if (stats.totalSize > this.maxCacheSize) {
      // Remove oldest entries until under limit
      const files = await fs.readdir(this.cacheDir);
      const fileStats = await Promise.all(
        files.filter(f => f.endsWith('.json')).map(async f => ({
          file: f,
          path: path.join(this.cacheDir, f),
          stats: await fs.stat(path.join(this.cacheDir, f))
        }))
      );
      
      // Sort by modification time (oldest first)
      fileStats.sort((a, b) => a.stats.mtime.getTime() - b.stats.mtime.getTime());
      
      let currentSize = stats.totalSize;
      for (const entry of fileStats) {
        if (currentSize <= this.maxCacheSize * 0.8) break; // Keep 20% buffer
        
        await fs.unlink(entry.path);
        currentSize -= entry.stats.size;
      }
    }
  }
}

Enhanced GitHubService with Caching

// src/utils/github.ts - Add caching support
import { Octokit } from '@octokit/rest';
import { GitHubApiCache } from './cache';

export class GitHubService {
  private octokit: Octokit;
  private cache: GitHubApiCache;
  private cacheEnabled: boolean;

  constructor(token?: string, options: { enableCache?: boolean } = {}) {
    this.octokit = new Octokit({ auth: token || process.env.GITHUB_TOKEN });
    this.cacheEnabled = options.enableCache ?? process.env.REPOREADY_CACHE !== 'false';
    this.cache = new GitHubApiCache();
  }

  private async cachedApiCall<T>(
    endpoint: string,
    params: Record<string, any>,
    apiCall: () => Promise<{ data: T; headers?: any }>
  ): Promise<T> {
    if (!this.cacheEnabled) {
      const result = await apiCall();
      return result.data;
    }

    // Check cache first
    const cached = await this.cache.get<T>(endpoint, params);
    if (cached) {
      console.debug(`📦 Cache hit for ${endpoint}`);
      return cached.data;
    }

    // Cache miss - make API call
    console.debug(`🌐 Cache miss for ${endpoint} - fetching from API`);
    const result = await apiCall();
    
    // Cache the result
    const ttl = this.getTtlForEndpoint(endpoint);
    await this.cache.set(endpoint, params, result.data, ttl, result.headers?.etag);
    
    return result.data;
  }

  private getTtlForEndpoint(endpoint: string): number {
    // Different TTL for different types of data
    if (endpoint.includes('repos/get')) return 10 * 60 * 1000; // 10 minutes for repo metadata
    if (endpoint.includes('contents')) return 30 * 60 * 1000; // 30 minutes for file checks
    if (endpoint.includes('issues')) return 5 * 60 * 1000; // 5 minutes for issues (more dynamic)
    return 15 * 60 * 1000; // 15 minutes default
  }

  async getRepositoryInfo(owner: string, repo: string): Promise<RepositoryInfo> {
    // Use cached API calls
    const repoData = await this.cachedApiCall(
      'repos/get',
      { owner, repo },
      () => this.octokit.rest.repos.get({ owner, repo })
    );

    const hasReadme = await this.checkFileExists(owner, repo, 'README.md');
    const hasContributing = await this.checkFileExists(owner, repo, 'CONTRIBUTING.md');
    // ... other checks using cached calls

    return {
      owner,
      repo,
      name: repoData.name,
      // ... rest of the implementation
    };
  }

  private async checkFileExists(owner: string, repo: string, path: string): Promise<boolean> {
    try {
      await this.cachedApiCall(
        'repos/contents',
        { owner, repo, path },
        () => this.octokit.rest.repos.getContent({ owner, repo, path })
      );
      return true;
    } catch (error) {
      return false;
    }
  }
}
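
The etag field stored by the cache above is not used yet. One way to meet the "respect cache headers" criterion is conditional requests: send If-None-Match with the cached ETag and reuse the cached body on a 304 Not Modified (304 responses do not count against GitHub's rate limit). The helper below is only a sketch; it assumes the cache types above and that Octokit surfaces 304 responses as thrown errors carrying a status property, which should be verified against the Octokit version in use.

// src/utils/github.ts (sketch) - ETag revalidation for expired cache entries
import { GitHubApiCache, CacheEntry } from './cache';

async function revalidateWithEtag<T>(
  cache: GitHubApiCache,
  endpoint: string,
  params: Record<string, any>,
  cached: CacheEntry<T>,
  ttl: number,
  apiCall: (headers: Record<string, string>) => Promise<{ data: T; headers?: any }>
): Promise<T> {
  try {
    // Ask GitHub whether the resource changed since it was cached.
    const result = await apiCall({ 'if-none-match': cached.etag ?? '' });
    await cache.set(endpoint, params, result.data, ttl, result.headers?.etag);
    return result.data;
  } catch (error: any) {
    if (error?.status === 304) {
      // Not modified: the cached body is still valid, so refresh its timestamp and reuse it.
      await cache.set(endpoint, params, cached.data, ttl, cached.etag);
      return cached.data;
    }
    throw error;
  }
}

A call site would pass the conditional header through to Octokit, for example: revalidateWithEtag(this.cache, 'repos/get', { owner, repo }, cached, ttl, headers => this.octokit.rest.repos.get({ owner, repo, headers })).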

Cache Management CLI Commands

// src/commands/cache.ts
import { Command } from 'commander';
import { GitHubApiCache } from '../utils/cache';
import chalk from 'chalk';

export function createCacheCommand(): Command {
  const command = new Command('cache');
  command.description('Manage RepoReady cache');

  command
    .command('status')
    .description('Show cache statistics')
    .action(async () => {
      const cache = new GitHubApiCache();
      const stats = await cache.getStats();
      
      console.log('\n📦 RepoReady Cache Status');
      console.log('─'.repeat(30));
      console.log(`Total entries: ${stats.totalEntries}`);
      console.log(`Total size: ${(stats.totalSize / 1024 / 1024).toFixed(2)} MB`);
      
      if (stats.totalEntries > 0) {
        console.log(`Oldest entry: ${new Date(stats.oldestEntry).toLocaleString()}`);
        console.log(`Newest entry: ${new Date(stats.newestEntry).toLocaleString()}`);
      }
    });

  command
    .command('clear')
    .description('Clear all cached data')
    .action(async () => {
      const cache = new GitHubApiCache();
      await cache.clear();
      console.log(chalk.green('✅ Cache cleared successfully'));
    });

  return command;
}

Files to Create/Modify

  • src/utils/cache.ts (new) - Cache implementation
  • src/utils/github.ts (modify) - Add caching support
  • src/commands/cache.ts (new) - Cache management commands
  • src/index.ts (modify) - Add cache command (see the wiring sketch after this list)
  • README.md (modify) - Document caching behavior
  • package.json (modify) - Add cache-related dependencies if needed
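
Wiring the new subcommand into src/index.ts could look roughly like this, assuming the CLI already builds a commander program (the program setup shown here is a placeholder for whatever src/index.ts currently does):

// src/index.ts (sketch) - register the cache subcommand on the existing CLI
import { Command } from 'commander';
import { createCacheCommand } from './commands/cache';

const program = new Command('rr');
// ... existing commands (e.g. evaluate) are registered here ...
program.addCommand(createCacheCommand());
program.parse(process.argv);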

Configuration Options

Environment Variables

REPOREADY_CACHE=true              # Enable/disable caching (default: true)
REPOREADY_CACHE_DIR=~/.rr-cache   # Custom cache directory
REPOREADY_CACHE_TTL=900000        # Default TTL in milliseconds (15 min)
REPOREADY_CACHE_SIZE=104857600    # Max cache size in bytes (100MB)
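
A small helper could map these variables onto the GitHubApiCache options; the sketch below is illustrative (cacheFromEnv is not an existing function) and falls back to the constructor defaults when a variable is unset or unparsable. REPOREADY_CACHE itself is already read in the GitHubService constructor above.

// Hypothetical helper translating environment variables into cache options
import { GitHubApiCache } from './utils/cache';

function cacheFromEnv(): GitHubApiCache {
  return new GitHubApiCache({
    cacheDir: process.env.REPOREADY_CACHE_DIR,                            // custom directory
    defaultTtl: Number(process.env.REPOREADY_CACHE_TTL) || undefined,     // TTL in ms
    maxCacheSize: Number(process.env.REPOREADY_CACHE_SIZE) || undefined,  // size in bytes
  });
}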

CLI Usage

# Check cache status
rr cache status

# Clear cache
rr cache clear

# Disable cache for single evaluation
REPOREADY_CACHE=false rr evaluate owner/repo

# Evaluate with fresh data (bypass cache)
rr evaluate owner/repo --no-cache
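
The --no-cache flag would need to be wired into whichever command constructs GitHubService. A rough sketch, assuming the evaluate command is also built with commander (the file path, option handling, and command body are assumptions):

// src/commands/evaluate.ts (sketch) - let --no-cache bypass the response cache
import { Command } from 'commander';
import { GitHubService } from '../utils/github';

export function createEvaluateCommand(): Command {
  return new Command('evaluate')
    .argument('<repo>', 'repository as owner/repo')
    .option('--no-cache', 'bypass the response cache for this run')
    .action(async (repoArg: string, options: { cache: boolean }) => {
      // commander exposes --no-cache as options.cache === false
      const [owner, repo] = repoArg.split('/');
      const github = new GitHubService(process.env.GITHUB_TOKEN, { enableCache: options.cache });
      const info = await github.getRepositoryInfo(owner, repo);
      // ... score and print the evaluation using `info` ...
    });
}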

Benefits

  • ⚡ Faster repeated evaluations
  • 📉 Reduced API rate limit consumption
  • 🔄 Better development/testing experience
  • 💾 Intelligent storage management
  • 🎯 Configurable caching strategies
  • 📊 Cache performance insights

Implementation Phases

  1. Phase 1: Basic file-based cache with TTL
  2. Phase 2: Cache management commands
  3. Phase 3: Advanced features (compression, ETag validation)
  4. Phase 4: Performance metrics and optimization

Testing Considerations

  • Test cache hit/miss scenarios (a minimal test sketch follows this list)
  • Test cache expiration
  • Test cache cleanup
  • Test error handling when cache is corrupted
  • Test concurrent cache access
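
A minimal hit/expiry test might look like the following, using Node's built-in test runner against a temporary directory; the import path and choice of runner are assumptions about the project setup.

// test/cache.test.ts (sketch) - cache hit before TTL, miss after expiry
import { test } from 'node:test';
import assert from 'node:assert/strict';
import { mkdtemp } from 'fs/promises';
import { tmpdir } from 'os';
import path from 'path';
import { GitHubApiCache } from '../src/utils/cache';

test('returns cached data before TTL and misses after expiry', async () => {
  const dir = await mkdtemp(path.join(tmpdir(), 'rr-cache-'));
  const cache = new GitHubApiCache({ cacheDir: dir, defaultTtl: 50 }); // 50ms TTL for the test

  await cache.set('repos/get', { owner: 'o', repo: 'r' }, { name: 'r' });
  const hit = await cache.get<{ name: string }>('repos/get', { owner: 'o', repo: 'r' });
  assert.equal(hit?.data.name, 'r');

  // Wait past the TTL, then the same key should be a miss (the expired file is deleted).
  await new Promise(resolve => setTimeout(resolve, 60));
  const miss = await cache.get('repos/get', { owner: 'o', repo: 'r' });
  assert.equal(miss, null);
});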

Estimated Effort

Medium to Hard - Requires careful design of cache invalidation and error handling.


Perfect for contributors interested in performance optimization and system design! ⚡
