anjy7/cf_ai_codebase-tutorial-generator

GitHub Codebase Tutorial Generator

A Cloudflare Worker that automatically generates comprehensive tutorials from GitHub repositories using AI. It crawls a repository, identifies key abstractions, analyzes the relationships between them, and creates structured Markdown tutorials with chapters, code examples, and diagrams.

Note: A demo generated tutorial can be found in ./demo-tutorial

Quick Start

Prerequisites

  • For deployment only: Cloudflare account with Workers, R2, and D1
  • For local development: Can run entirely locally (uses Miniflare)
  • Gemini API key, or Cloudflare AI as an alternative (one of the two is required)
  • GitHub token (optional, for private repos or to raise the GitHub API rate limit)

Setup

  1. Clone and install:

    git clone <repo-url>
    cd cf-worker-101
    bun install
  2. Configure environment:

    # Create .dev.vars file
    echo "GEMINI_API_KEY=your_gemini_api_key" > .dev.vars
    echo "GITHUB_DEFAULT_TOKEN=your_github_token" >> .dev.vars  # optional

    Alternative: Use Cloudflare AI instead of Gemini

    • Uncomment AI binding in wrangler.jsonc
    • Uncomment AI: Ai; in src/worker/utils/types.ts
    • Uncomment Cloudflare AI code in src/worker/utils/utils.ts (lines 4-12)
  3. Setup Cloudflare resources:

    # Create R2 bucket
    wrangler r2 bucket create gh-crawl
    
    # Create D1 database
    wrangler d1 create ghcrawl
    
    # Create required tables
    wrangler d1 execute ghcrawl --command="
    CREATE TABLE sessions (
      id TEXT PRIMARY KEY,
      created_at TEXT NOT NULL,
      url TEXT NOT NULL,
      config_json TEXT NOT NULL,
      status TEXT NOT NULL
    );
    
    CREATE TABLE files (
      session_id TEXT NOT NULL,
      relpath TEXT NOT NULL,
      bytes INTEGER NOT NULL,
      mime TEXT,
      sha256 TEXT,
      included INTEGER NOT NULL,
      reason TEXT,
      r2_key_raw TEXT,
      r2_key_text TEXT,
      ref TEXT,
      source_url TEXT,
      PRIMARY KEY (session_id, relpath)
    );
    
    CREATE TABLE steps (
      session_id TEXT NOT NULL,
      step TEXT NOT NULL,
      status TEXT NOT NULL,
      attempt INTEGER NOT NULL,
      payload_ref TEXT,
      metrics_json TEXT,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      PRIMARY KEY (session_id, step, attempt)
    );
    "
  4. Deploy, or run locally:

    # Deploy worker
    wrangler deploy
    
    # Start local development
    bun run dev
  5. Run frontend:

    bun run dev:web

How It Works

1. Repository Crawling

  • Parses GitHub URLs (supports /tree and /blob paths)
  • Recursively crawls directories with configurable include/exclude patterns
  • Downloads files to R2 storage, deduplicating them by SHA-256 hash
  • Stores file metadata in the D1 database
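
As an illustration of the URL-parsing step, here is a minimal sketch of how a repository, /tree, or /blob URL could be split into its parts. The `RepoRef` shape and the `parseGitHubUrl` name are hypothetical and are not taken from this repository's source; the worker's actual parser may behave differently.

```typescript
// Hypothetical result shape; the project's real types may differ.
interface RepoRef {
  owner: string;
  repo: string;
  kind: "repo" | "tree" | "blob";
  ref?: string;  // branch, tag, or commit for /tree and /blob URLs
  path?: string; // path inside the repository ("" for the root)
}

function parseGitHubUrl(url: string): RepoRef | null {
  const prefix = "https://github.com/";
  if (url.indexOf(prefix) !== 0) return null;
  // Drop the query string and fragment, then split into non-empty segments.
  const clean = url.slice(prefix.length).split(/[?#]/)[0] ?? "";
  const parts = clean.split("/").filter((s) => s.length > 0);
  const owner = parts[0];
  const rawRepo = parts[1];
  if (owner === undefined || rawRepo === undefined) return null;
  const repo = rawRepo.replace(/\.git$/, "");
  const kind = parts[2];
  const ref = parts[3];
  // Plain repository URL: https://github.com/<owner>/<repo>
  if (kind === undefined) return { owner, repo, kind: "repo" };
  // Subdirectory or file URL: .../tree/<ref>/<path> or .../blob/<ref>/<path>
  if ((kind === "tree" || kind === "blob") && ref !== undefined) {
    return { owner, repo, kind, ref, path: parts.slice(4).join("/") };
  }
  return null;
}
```

One known limitation of this sketch: a branch name that contains a slash (for example `feature/x`) cannot be told apart from a path from the URL alone, so the first segment after /tree or /blob is treated as the ref.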

2. AI-Powered Analysis

  • Identify Abstractions: AI analyzes code to identify key concepts, components, and patterns
  • Analyze Relationships: Maps dependencies and interactions between abstractions
  • Order Chapters: Determines optimal learning sequence based on dependencies
  • Write Chapters: Generates detailed tutorial content with code examples
  • Combine Tutorial: Creates final markdown with table of contents and Mermaid diagrams
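
The "Order Chapters" step is described as dependency-based. The project may simply ask the model for an order, but one deterministic way to picture the idea is a topological sort, sketched below. The `Abstraction` shape and `orderChapters` name are hypothetical, and the cycle fallback is an assumption rather than documented behavior.

```typescript
// Hypothetical shape: each abstraction lists the abstractions it builds on.
interface Abstraction {
  name: string;
  dependsOn: string[];
}

// Layered topological sort: a chapter is emitted only after all of its
// known dependencies have been emitted.
function orderChapters(items: Abstraction[]): string[] {
  const names = items.map((item) => item.name);
  const ordered: string[] = [];
  let pending = items.slice();
  while (pending.length > 0) {
    // Ready = every dependency is already ordered, or is not a known name.
    const ready = pending.filter((item) =>
      item.dependsOn.every(
        (dep) => ordered.indexOf(dep) !== -1 || names.indexOf(dep) === -1
      )
    );
    if (ready.length === 0) {
      // Dependency cycle: fall back to input order for whatever is left.
      for (const item of pending) ordered.push(item.name);
      break;
    }
    for (const item of ready) ordered.push(item.name);
    pending = pending.filter((item) => ready.indexOf(item) === -1);
  }
  return ordered;
}
```

Falling back to input order on a cycle keeps generation going when the relationship graph is not a clean DAG, which seems likely for model-derived relationships.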

3. Workflow Architecture

  • Single Worker: All steps run in one Cloudflare Worker for simplicity
  • Durable Execution: Uses Cloudflare Workflows for reliable multi-step processing
  • Caching: LLM responses cached in R2 to reduce API costs
  • Async Processing: Frontend gets immediate response, generation continues in background

4. Output Structure

tutorial/{sessionId}/
├── index.md          # Main tutorial with TOC and overview
├── 01_concept.md     # Chapter files
├── 02_implementation.md
└── ...
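
To make the naming pattern above concrete, the sketch below builds a chapter filename from a chapter number and title and joins it with the session prefix. The helper names and the slug rules (lowercase, underscores, two-digit prefix) are assumptions inferred from the example filenames, not code from this repository.

```typescript
// Build "01_concept.md"-style names: two-digit index, lowercase slug.
function chapterFilename(index: number, title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse anything non-alphanumeric
    .replace(/^_+|_+$/g, "");    // trim leading/trailing underscores
  const num = index < 10 ? "0" + index : String(index);
  return num + "_" + slug + ".md";
}

// Join the session prefix and a filename, e.g. "tutorial/abc123/index.md".
function tutorialPath(sessionId: string, filename: string): string {
  return "tutorial/" + sessionId + "/" + filename;
}
```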

API Endpoints

  • GET /api/generate?url=<github-url> - Start tutorial generation
  • GET /api/tutorials - List completed tutorials
  • GET /api/tutorial/{sessionId}/{filename} - Access tutorial content
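
A client calling these endpoints has to URL-encode the repository URL before putting it in the query string. The sketch below shows one way to build the request URLs. The helper names are hypothetical, and the base URL used in the comments (`http://localhost:8787`, the usual `wrangler dev` address) is an assumption about the local setup.

```typescript
// Remove trailing slashes so paths can be appended safely.
function trimBase(base: string): string {
  return base.replace(/\/+$/, "");
}

// GET /api/generate?url=<github-url>, with the repository URL encoded.
function buildGenerateUrl(base: string, repoUrl: string): string {
  return trimBase(base) + "/api/generate?url=" + encodeURIComponent(repoUrl);
}

// GET /api/tutorial/{sessionId}/{filename}
function buildTutorialUrl(base: string, sessionId: string, filename: string): string {
  return (
    trimBase(base) +
    "/api/tutorial/" +
    encodeURIComponent(sessionId) +
    "/" +
    encodeURIComponent(filename)
  );
}

// Example (assumed local address):
// buildGenerateUrl("http://localhost:8787", "https://github.com/owner/repo")
```

The response format of each endpoint is not documented here, so the sketch stops at building the URL.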

Configuration

  • Include patterns: comma-separated globs, e.g. *.py,*.js,*.ts,*.md (default: *.*)
  • Exclude patterns: **/tests/**,**/node_modules/**
  • Max file size: 1 MB (configurable)
  • Language: English, Spanish, French, German, Chinese
  • Model: Gemini 2.5 Flash (configurable)
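
The sketch below shows how include patterns, exclude patterns, and the size limit could combine into a single per-file decision. It is illustrative only: the `CrawlFilter` shape, the glob semantics (patterns without a slash match the file name, `**/` spans directories), and reading 1 MB as 1024 * 1024 bytes are all assumptions, and the worker's real matcher may differ.

```typescript
// Hypothetical config shape; field names are illustrative.
interface CrawlFilter {
  include: string[];
  exclude: string[];
  maxBytes: number;
}

// Convert a glob to a RegExp: "**/" spans directories, "*" stays in one segment.
function globToRegExp(glob: string): RegExp {
  let re = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === "**/") {
      re += "(?:.*/)?";
      i += 3;
    } else if (glob.slice(i, i + 2) === "**") {
      re += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      re += "[^/]*";
      i += 1;
    } else if (glob.charAt(i) === "?") {
      re += "[^/]";
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      re += glob.charAt(i).replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + re + "$");
}

// Patterns containing "/" match the full path; others match the file name.
function matchesGlob(path: string, glob: string): boolean {
  const target =
    glob.indexOf("/") !== -1 ? path : path.slice(path.lastIndexOf("/") + 1);
  return globToRegExp(glob).test(target);
}

// Size limit first, then excludes, then at least one include must match.
function shouldInclude(path: string, bytes: number, f: CrawlFilter): boolean {
  if (bytes > f.maxBytes) return false;
  if (f.exclude.some((g) => matchesGlob(path, g))) return false;
  return f.include.some((g) => matchesGlob(path, g));
}
```

Under these assumed semantics, the default include pattern `*.*` skips files whose names contain no dot, such as `Makefile`.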

Idea inspiration: @zachary62
