A Cloudflare Worker that automatically generates comprehensive tutorials from GitHub repositories using AI. It crawls repositories, identifies key abstractions, analyzes relationships, and creates structured markdown tutorials with chapters, code examples, and diagrams.
Note: A demo of a generated tutorial can be found in ./demo-tutorial
- For deployment only: Cloudflare account with Workers, R2, and D1
- For local development: Can run entirely locally (uses Miniflare)
- Gemini API key (required unless you switch to Cloudflare AI as an alternative)
- GitHub token (optional, for private repos or to raise the GitHub API rate limit)
- Clone and install:

      git clone <repo-url>
      cd cf-worker-101
      bun install
- Configure environment:

      # Create .dev.vars file
      echo "GEMINI_API_KEY=your_gemini_api_key" > .dev.vars
      echo "GITHUB_DEFAULT_TOKEN=your_github_token" >> .dev.vars  # optional
  Alternative: Use Cloudflare AI instead of Gemini
  - Uncomment the AI binding in `wrangler.jsonc`
  - Uncomment `AI: Ai;` in `src/worker/utils/types.ts`
  - Uncomment the Cloudflare AI code in `src/worker/utils/utils.ts` (lines 4-12)
- Set up Cloudflare resources:

      # Create R2 bucket
      wrangler r2 bucket create gh-crawl

      # Create D1 database
      wrangler d1 create ghcrawl

      # Create required tables
      wrangler d1 execute ghcrawl --command="
      CREATE TABLE sessions (
        id TEXT PRIMARY KEY,
        created_at TEXT NOT NULL,
        url TEXT NOT NULL,
        config_json TEXT NOT NULL,
        status TEXT NOT NULL
      );
      CREATE TABLE files (
        session_id TEXT NOT NULL,
        relpath TEXT NOT NULL,
        bytes INTEGER NOT NULL,
        mime TEXT,
        sha256 TEXT,
        included INTEGER NOT NULL,
        reason TEXT,
        r2_key_raw TEXT,
        r2_key_text TEXT,
        ref TEXT,
        source_url TEXT,
        PRIMARY KEY (session_id, relpath)
      );
      CREATE TABLE steps (
        session_id TEXT NOT NULL,
        step TEXT NOT NULL,
        status TEXT NOT NULL,
        attempt INTEGER NOT NULL,
        payload_ref TEXT,
        metrics_json TEXT,
        created_at TEXT NOT NULL DEFAULT (datetime('now')),
        PRIMARY KEY (session_id, step, attempt)
      );
      "
- Deploy:

      # Deploy worker
      wrangler deploy

      # Start local development
      bun run dev
- Run frontend:

      bun run dev:web
- Parses GitHub URLs (supports `/tree` and `/blob` paths)
- Recursively crawls directories with configurable include/exclude patterns
- Downloads files to R2 storage with deduplication via SHA256
- Stores metadata in D1 database
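To illustrate the URL parsing step, here is a minimal sketch of how a GitHub URL with a `/tree` or `/blob` path might be split into its parts. The `RepoTarget` type and `parseGitHubUrl` function are hypothetical names, not taken from this project, and the sketch assumes the ref (branch, tag, or commit) is a single path segment, which does not hold for branch names containing slashes.

```typescript
// Hypothetical result shape; the project's own types may differ.
interface RepoTarget {
  owner: string;
  repo: string;
  kind: "repo" | "tree" | "blob";
  ref?: string; // branch, tag, or commit (assumed to be one path segment)
  path: string; // path inside the repository, "" for the root
}

function parseGitHubUrl(input: string): RepoTarget {
  const url = new URL(input);
  if (url.hostname !== "github.com") {
    throw new Error("Not a github.com URL: " + input);
  }

  // Drop empty segments produced by leading or trailing slashes.
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  const owner = segments[0];
  const rawRepo = segments[1];
  if (owner === undefined || rawRepo === undefined) {
    throw new Error("Expected /<owner>/<repo> in: " + input);
  }
  const repo = rawRepo.replace(/\.git$/, "");

  const kind = segments[2];
  const ref = segments[3];
  if (kind === undefined) {
    // Plain repository URL: crawl from the root of the default branch.
    return { owner, repo, kind: "repo", path: "" };
  }
  if ((kind === "tree" || kind === "blob") && ref !== undefined) {
    const path = segments.slice(4).map(decodeURIComponent).join("/");
    return { owner, repo, kind, ref, path };
  }
  throw new Error("Unsupported GitHub URL shape: " + input);
}
```

A `/tree` URL would then typically be treated as a directory to crawl recursively and a `/blob` URL as a single file, though how the worker distinguishes them is not specified here.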
- Identify Abstractions: AI analyzes code to identify key concepts, components, and patterns
- Analyze Relationships: Maps dependencies and interactions between abstractions
- Order Chapters: Determines optimal learning sequence based on dependencies
- Write Chapters: Generates detailed tutorial content with code examples
- Combine Tutorial: Creates final markdown with table of contents and Mermaid diagrams
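The "Order Chapters" step sequences abstractions so that each one appears after the concepts it depends on. The project may well delegate this decision to the LLM; as a deterministic illustration of the same idea, the sketch below uses a topological sort (Kahn's algorithm). The `DependencyMap` type, the `orderChapters` function, and the abstraction names in the usage example are all hypothetical.

```typescript
// Maps each abstraction to the abstractions it depends on (hypothetical shape).
type DependencyMap = Record<string, string[]>;

function orderChapters(deps: DependencyMap): string[] {
  const names = Object.keys(deps);
  const remaining = new Map<string, number>(); // unmet dependency count
  const dependents = new Map<string, string[]>(); // reverse edges

  for (const name of names) {
    remaining.set(name, 0);
    dependents.set(name, []);
  }
  for (const name of names) {
    for (const dep of deps[name] ?? []) {
      if (!remaining.has(dep)) {
        throw new Error('Unknown dependency "' + dep + '" of "' + name + '"');
      }
      remaining.set(name, (remaining.get(name) ?? 0) + 1);
      dependents.get(dep)?.push(name);
    }
  }

  // Start with abstractions that depend on nothing.
  const ready = names.filter((n) => remaining.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const d of dependents.get(next) ?? []) {
      const left = (remaining.get(d) ?? 0) - 1;
      remaining.set(d, left);
      if (left === 0) ready.push(d);
    }
  }

  if (order.length !== names.length) {
    throw new Error("Dependency cycle detected");
  }
  return order;
}

// Usage with made-up abstraction names:
const chapterOrder = orderChapters({
  Workflow: ["Crawler", "LLM Cache"],
  Crawler: ["Storage"],
  "LLM Cache": ["Storage"],
  Storage: [],
});
```

A cycle between abstractions makes a strict ordering impossible, so the sketch throws; a real pipeline might instead break the cycle at the weakest relationship.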
- Single Worker: All steps run in one Cloudflare Worker for simplicity
- Durable Execution: Uses Cloudflare Workflows for reliable multi-step processing
- Caching: LLM responses cached in R2 to reduce API costs
- Async Processing: Frontend gets immediate response, generation continues in background
tutorial/{sessionId}/
├── index.md # Main tutorial with TOC and overview
├── 01_concept.md # Chapter files
├── 02_implementation.md
└── ...
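As a rough sketch of how keys in this layout could be derived, the helpers below build a zero-padded chapter filename and the full storage key. The function names and the slug rule (lowercase, non-alphanumeric runs collapsed to underscores) are assumptions inferred from the example filenames above, not the project's actual code.

```typescript
// Builds names such as "01_concept.md"; the slug rule is an assumption.
function chapterFilename(index: number, title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse anything non-alphanumeric
    .replace(/^_+|_+$/g, ""); // trim leading and trailing underscores
  const prefix = index < 10 ? "0" + index : String(index);
  return prefix + "_" + (slug.length > 0 ? slug : "chapter") + ".md";
}

// Full key under the tutorial/{sessionId}/ prefix shown above.
function chapterKey(sessionId: string, index: number, title: string): string {
  return "tutorial/" + sessionId + "/" + chapterFilename(index, title);
}
```

Zero-padding keeps chapters in reading order when the files are listed alphabetically.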
- `GET /api/generate?url=<github-url>` - Start tutorial generation
- `GET /api/tutorials` - List completed tutorials
- `GET /api/tutorial/{sessionId}/{filename}` - Access tutorial content
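A client needs to URL-encode the repository address before passing it as the `url` query parameter. The sketch below builds request URLs for the three endpoints; the helper names and the base URL used in the example are placeholders, and the response formats are not documented here, so the sketch stops at URL construction.

```typescript
// Start tutorial generation for a repository.
function generateUrl(base: string, repoUrl: string): string {
  return base + "/api/generate?url=" + encodeURIComponent(repoUrl);
}

// List completed tutorials.
function listUrl(base: string): string {
  return base + "/api/tutorials";
}

// Fetch one file of a generated tutorial, e.g. "index.md".
function tutorialFileUrl(base: string, sessionId: string, filename: string): string {
  return (
    base +
    "/api/tutorial/" +
    encodeURIComponent(sessionId) +
    "/" +
    encodeURIComponent(filename)
  );
}

// Placeholder base URL; substitute your own worker's address.
const exampleBase = "https://your-worker.example.workers.dev";
const startUrl = generateUrl(exampleBase, "https://github.com/octocat/hello");
```

Each resulting string can be passed to `fetch` directly. Because generation continues in the background, a client would call the generate endpoint once and later check the tutorial list for the finished result.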
- Include patterns: `*.py`, `*.js`, `*.ts`, `*.md` (default: `*.*`)
- Exclude patterns: `**/tests/**`, `**/node_modules/**`
- Max file size: 1MB (configurable)
- Language: English, Spanish, French, German, Chinese
- Model: Gemini 2.5 Flash (configurable)
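The include, exclude, and size options combine into a single per-file decision during crawling. The sketch below shows one way that decision could work. It assumes that `*` matches within one path segment, `**` spans segments, patterns without a slash are tested against the file's basename, and 1MB means 1024 * 1024 bytes; the project's actual matching rules may differ, and all names here are hypothetical.

```typescript
interface CrawlFilter {
  include: string[];
  exclude: string[];
  maxBytes: number;
}

// Converts a small glob subset (*, **, ?) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === "**/") {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.slice(i, i + 2) === "**") {
      source += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*"; // anything within one segment
      i += 1;
    } else if (glob.charAt(i) === "?") {
      source += "[^/]";
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      source += glob.charAt(i).replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

function matchesAny(path: string, patterns: string[]): boolean {
  const base = path.slice(path.lastIndexOf("/") + 1);
  return patterns.some((p) =>
    globToRegExp(p).test(p.indexOf("/") >= 0 ? path : base)
  );
}

function shouldInclude(path: string, bytes: number, filter: CrawlFilter): boolean {
  return (
    bytes <= filter.maxBytes &&
    matchesAny(path, filter.include) &&
    !matchesAny(path, filter.exclude)
  );
}

const exampleFilter: CrawlFilter = {
  include: ["*.py", "*.js", "*.ts", "*.md"],
  exclude: ["**/tests/**", "**/node_modules/**"],
  maxBytes: 1024 * 1024, // assumed interpretation of "1MB"
};
```

With this filter, `src/worker/index.ts` would be kept, while `src/tests/app.test.ts`, anything under `node_modules/`, and any file over the size limit would be skipped.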