Transform any GitHub repository into an interactive, gamified learning experience with AI-powered analysis and intelligent code assistance.
Features • Quick Start • RAG Chat • Architecture • Contributing
Repo Reader is an innovative tool that makes understanding complex codebases fun and engaging through gamified learning walkthroughs. It analyzes any GitHub repository and creates structured learning experiences with achievements, progress tracking, and interactive quizzes. Enhanced with TiDB vector search for intelligent code assistance:
- 🎯 Gamified Learning: Transform repositories into interactive learning quests with achievements, levels, and progress tracking
- 🤖 AI-Powered Analysis: Deep code understanding using advanced language models
- 📚 Learning Path Generation: Creates progressive, difficulty-appropriate walkthroughs tailored to your experience level
- 🏆 Achievement System: Unlock badges and earn points as you master different aspects of the codebase
- 🎮 Interactive Quizzes: Test your understanding with dynamically generated questions
- 💬 RAG Chat Interface: Ask questions about any codebase using TiDB vector search (NEW!)
- ⚡ Smart Code Assistant: Get instant answers with relevant source code citations
- Repository Analysis: Clones and analyzes GitHub repositories automatically
- Gamified Walkthroughs: Structured learning modules with difficulty progression
- Achievement System: Unlock badges for completing learning milestones
- Progress Tracking: Visual progress bars and completion statistics
- Interactive Quizzes: Codebase-specific questions that test real understanding
- Multi-Level Learning: Beginner, Intermediate, and Advanced learning paths
- Code Structure Detection: Identifies frameworks, languages, and architectural patterns
- Learning Path Generation: Creates progressive, difficulty-appropriate walkthroughs
- Hybrid LLM Support: Local models with OpenAI fallback for analysis
- TiDB Vector Indexing: Automatically indexes code chunks for semantic search
- Intelligent Q&A: Chat interface for asking questions about the codebase
- Source Citations: Answers include relevant code snippets and file references
- Hybrid Search: Combines vector similarity and full-text search
- Hybrid LLM Support: Local models with OpenAI fallback
- Context-Aware Analysis: Understanding of project structure and dependencies
- Multi-Step RAG Workflow:
- Ingest & Index: Code chunks embedded into TiDB vector store
- Semantic Search: Vector + full-text hybrid search
- LLM Generation: Context-enhanced responses
- Chat Interface: Interactive code assistance
- Achievement System: Unlock badges for learning milestones
- Progress Visualization: Interactive progress bars and completion tracking
- Level System: Advance through difficulty levels as you learn
- Social Features: Share progress and achievements
- Node.js 18+ and npm/pnpm
- Python 3.13+
- Git
- TiDB Serverless account (free tier available)
- OpenAI API key (optional: for cloud AI features)
git clone <[repository-url](https://github.com/Dprof-in-tech/repo-reader)>
cd repo-reader
npm install
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys and TiDB credentials
Required Environment Variables:
# TiDB Serverless Configuration
TIDB_HOST=gateway01.your-region.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USER=your_username
TIDB_PASSWORD=your_password
TIDB_DATABASE=your_database_name
# OpenAI API Key
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Local Model Support
CHATGPT_OSS_MODEL_PATH=openai/gpt-oss-20b
LOCAL_MODEL_PREFER=true
npm run dev
After you run npm run dev, it would take a couple of seconds to actually initialize as this project is using a local embedding model so there would be a first time initial download of this model (it is very lightweight) and after that it would be cached. Dont start using the frontend until you see this message

This starts:
- Frontend: http://localhost:3000 (Next.js)
- API: http://localhost:5328 (Flask)
- Open http://localhost:3000
- Enter a GitHub repository URL
- Select your experience level (Beginner/Intermediate/Advanced)
- Watch the AI analyze the repository and create your personalized learning path
- Follow the gamified walkthrough with achievements and quizzes
- NEW: Use the chat widget to ask questions about the code using RAG!
The latest enhancement adds intelligent code assistance using TiDB Serverless vector search:
- Repository Indexing: Code is automatically chunked and embedded into TiDB vector store
- Smart Search: When you ask questions, the system finds relevant code using hybrid search
- Context-Aware Answers: AI generates responses using the most relevant code snippets
- Source Citations: Every answer includes references to specific files and line numbers
- VECTOR(384) columns for semantic embeddings using all-MiniLM-L6-v2
- Hybrid search combining vector similarity and full-text search
- Connection pooling and batch processing for optimal performance
- Intelligent caching for faster response times
- Interactive chat widget appears after repository analysis
- Real-time responses with streaming support
- Source code citations with file paths and line numbers
- Multi-turn conversations that remember context
- Mobile-responsive design with proper text wrapping
- Frontend: Next.js 14, React, TypeScript, Tailwind CSS
- Backend: Flask, Python 3.13
- AI/ML: LangChain, LangGraph, OpenAI, HuggingFace Transformers
- Database: TiDB Serverless with vector search
- Deployment: Vercel (Frontend), Vercel Functions (Backend)
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Next.js App │───▶│ Flask API │───▶│ LangGraph │
│ (Frontend) │ │ (Backend) │ │ Agent │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌─────────────────┐
│ TiDB Serverless │ │ AI Models │
│ (Vector Store) │ │ (Analysis) │
└──────────────────┘ └─────────────────┘
- Orchestrates the entire analysis workflow
- Manages state between analysis steps
- Handles streaming progress updates
- NEW: Includes automatic TiDB indexing step
- RepoReaderTool: Clones and extracts repository data
- CodeIndexerTool: Embeds and indexes code into TiDB
- CodeAnalyzerTool: Performs static and AI-powered code analysis
- RAGQueryTool: Handles question answering with context retrieval
- WalkthroughGeneratorTool: Creates gamified learning experiences
- SQLAlchemy ORM with VectorType columns
- Automatic embedding generation with caching
- Hybrid search (vector + full-text)
- Connection pooling for performance
- Interactive chat widget with proper text wrapping
- Real-time question answering
- Source code citations
- Mobile-responsive design
import { analyzeRepository } from '@/lib/api'
const result = await analyzeRepository(
'https://github.com/user/repo',
'intermediate'
)
console.log(result.walkthrough_data.learning_modules)
import { askCodeQuestion } from '@/lib/api'
const answer = await askCodeQuestion(
'How does the authentication system work?',
'user/repo',
'intermediate'
)
console.log(answer.answer)
console.log(answer.sources) // Source code citations
import { searchCode } from '@/lib/api'
const results = await searchCode(
'database connection',
'user/repo',
'hybrid', // vector + fulltext
5
)
- Connect your GitHub repository
- Set environment variables in Vercel dashboard (including TiDB credentials)
- Deploy automatically
TIDB_HOST=your-tidb-host
TIDB_USER=your-tidb-user
TIDB_PASSWORD=your-tidb-password
TIDB_DATABASE=your-database
OPENAI_API_KEY=your-openai-key
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Set up TiDB Serverless account
- Configure environment variables
- Make your changes
- Add tests if applicable
- Submit a pull request
- Use TypeScript for frontend code
- Follow Python PEP 8 for backend code
- Add JSDoc/docstring comments for public functions
- Use descriptive commit messages
# Run frontend tests
npm test
# Run backend tests
cd api && python -m pytest
# Type checking
npm run type-check
TiDB connection fails:
- Check TiDB Serverless credentials in
.env
- Ensure database exists and is accessible
- Verify network connectivity
Chat widget not working:
- Ensure repository is analyzed and indexed first
- Check browser console for API errors
- Verify OpenAI API key is working
Performance issues:
- Check TiDB connection pool settings
- Monitor embedding cache performance
- Consider upgrading to TiDB Cloud Dedicated
- Use SSD storage for better I/O performance
- Enable connection pooling (already configured)
- Monitor TiDB query performance
- Use embedding caching for repeated queries
Check system status:
# TiDB vector search status
curl http://localhost:5328/api/llm-status
# Agent workflow status
curl http://localhost:5328/api/status
# Health check
curl http://localhost:5328/api/health
MIT License - see LICENSE file for details.
Primary Goal: Transform complex codebases into engaging, gamified learning experiences that help developers understand and master new technologies through interactive walkthroughs, achievements, and personalized learning paths.
Latest Enhancement: Added TiDB Serverless integration for intelligent code assistance, enabling users to ask questions and get contextual answers while exploring repositories.
- TiDB Cloud: For serverless vector database capabilities
- LangChain & LangGraph: For the AI orchestration framework
- OpenAI: For GPT models and embeddings API
- HuggingFace: For open-source model hosting
- Vercel: For hosting and deployment platform
Made with ❤️ for developers who want to understand code better