Ask natural language questions about any GitHub repository and get intelligent, context-aware answers
- Overview
- Key Features
- Demo
- Tech Stack
- Architecture
- Prerequisites
- Installation
- Configuration
- Database Setup
- Usage Guide
- API Documentation
- Project Structure
- Troubleshooting
- Roadmap
- Contributing
- License
- Acknowledgments
CodeChat is an intelligent code repository analysis system that enables developers to interact with their GitHub repositories using natural language. Instead of manually searching through files and documentation, simply ask questions and get AI-powered answers with precise file references and code citations.
CodeChat combines pgvector similarity search with Google's Gemini models: it indexes the code files in your repository, summarizes each one, and answers questions with citations to the actual source code.
- 🔍 Semantic Code Search - Find relevant code using natural language, not just keywords
- 🤖 AI-Powered Answers - Get intelligent responses with file citations and code snippets
- ⚡ Fast & Efficient - Precomputed vector embeddings and a pgvector index keep similarity search fast
- 📚 Complete Context - AI understands relationships between files and functions
- 🔒 Secure & Private - Your code stays secure with token-based authentication
- Secure user registration and login with JWT tokens
- Password hashing with bcrypt
- Session management and protected routes
- User-specific project isolation
- One-Click Indexing - Simply paste your GitHub repository URL
- Automatic File Discovery - Intelligently fetches and filters code files
- Language Detection - Supports multiple programming languages
- Public & Private Repos - Works with both public and private repositories (with token)
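File discovery and language detection can be pictured as a simple filter over the paths returned by GitHub. The sketch below is illustrative only: the extension map, the ignored directories, and the function names `detectLanguage` and `filterCodeFiles` are assumptions, not the project's actual implementation.

```javascript
// Hypothetical extension-to-language map; the real list of supported
// languages is not documented in this README.
const LANGUAGE_BY_EXTENSION = {
  '.js': 'JavaScript',
  '.jsx': 'JavaScript',
  '.ts': 'TypeScript',
  '.py': 'Python',
  '.java': 'Java',
};

// Directories that usually hold dependencies or build output (assumed).
const IGNORED_DIRECTORIES = ['node_modules', 'dist', 'build', '.git'];

// Returns the lowercased extension including the dot, or '' if there is none.
// Dotfiles such as ".env" are treated as having no extension.
function getExtension(filePath) {
  const fileName = filePath.split('/').pop();
  const dotIndex = fileName.lastIndexOf('.');
  return dotIndex > 0 ? fileName.slice(dotIndex).toLowerCase() : '';
}

// Maps a file path to a language name, or null when the type is unknown.
function detectLanguage(filePath) {
  return LANGUAGE_BY_EXTENSION[getExtension(filePath)] ?? null;
}

// Keeps only recognized code files that are outside ignored directories.
function filterCodeFiles(filePaths) {
  return filePaths.filter((filePath) => {
    const segments = filePath.split('/');
    const ignored = segments.some((s) => IGNORED_DIRECTORIES.includes(s));
    return !ignored && detectLanguage(filePath) !== null;
  });
}
```

Filtering before summarization keeps Gemini calls limited to files that are likely to matter for Q&A.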
- AI Summarization - Each file gets an AI-generated summary using Gemini 2.5 Flash
- Vector Embeddings - Code converted to 768-dimensional vectors for semantic search
- pgvector Integration - PostgreSQL extension for efficient similarity search
- Contextual Understanding - AI comprehends code structure and relationships
- Ask Anything - Query your codebase in plain English
- Smart Answers - AI generates context-aware responses with explanations
- File Citations - Every answer includes relevant file references
- Similarity Scores - See how relevant each file is to your question
- Code Snippets - View actual code excerpts that answer your question
- Multiple Projects - Index and manage multiple repositories
- Status Tracking - Real-time indexing progress monitoring
- Project Dashboard - View file counts, languages, and indexing status
- Easy Deletion - Remove projects and associated data with one click
- Conversation Memory - Access all past questions and answers
- Export Capabilities - Save important Q&A sessions
- Search History - Find previous questions quickly
- Delete Questions - Remove unwanted history items
| Technology | Purpose |
|---|---|
| Node.js | JavaScript runtime environment |
| Express.js | Web application framework |
| PostgreSQL | Primary database with advanced features |
| pgvector | Vector similarity search extension |
| Supabase | Backend-as-a-Service for database and auth |
| Google Gemini API | AI for summarization, embeddings, and Q&A |
| JWT | Secure token-based authentication |
| Bcrypt | Password hashing and security |
| Axios | HTTP client for API requests |
| Express Rate Limit | API rate limiting and abuse prevention |
| Technology | Purpose |
|---|---|
| React.js | UI library for building interactive interfaces |
| React Router DOM | Client-side routing and navigation |
| Tailwind CSS | Utility-first CSS framework |
| Context API | Global state management |
| Axios | HTTP client for backend communication |
- Google Gemini 2.5 Flash - Code understanding and question answering
- text-embedding-004 - High-quality text embeddings (768 dimensions)
- Vector Similarity Search - Cosine similarity for semantic matching
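For reference, cosine similarity between two embeddings is the dot product divided by the product of their lengths. In CodeChat this comparison is delegated to pgvector; the function below is only a plain JavaScript sketch of the same formula, with `cosineSimilarity` as a hypothetical name.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and the magnitude of either vector does not affect the result.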
- User Authentication - JWT tokens secure all API requests
- Repository Indexing:
- Fetch repository structure from GitHub API
- Filter relevant code files (.js, .py, .java, etc.)
- Generate AI summaries for each file using Gemini
- Create vector embeddings using text-embedding-004
- Store in PostgreSQL with pgvector
- Question Answering:
- User asks question in natural language
- Question converted to vector embedding
- pgvector performs similarity search to find relevant files
- Top matching files sent to Gemini with question
- AI generates contextual answer with file citations
- History Management - All Q&As stored for future reference
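The similarity search step in the flow above could be expressed as a single pgvector query. This is a sketch under assumptions: the table and column names come from the schema in Database Setup, while `toVectorLiteral`, `buildSimilarityQuery`, and the default limit of 5 are hypothetical. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives the similarity score.

```javascript
// Formats a JS number array as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

// Builds a parameterized query object in the { text, values } shape that
// node-postgres accepts. Nothing is executed here.
function buildSimilarityQuery(projectId, questionEmbedding, limit = 5) {
  return {
    text: `
      SELECT file_path, summary, source_code,
             1 - (embedding <=> $1::vector) AS similarity
      FROM source_code_embeddings
      WHERE project_id = $2
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    values: [toVectorLiteral(questionEmbedding), projectId, limit],
  };
}
```

Ordering by the distance expression itself, rather than by the computed `similarity` alias, is what allows pgvector to use the ivfflat index.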
Before you begin, ensure you have the following installed and configured:
- Node.js 16.x or higher
- PostgreSQL 14.x or higher with pgvector extension
- Git for version control
- npm or yarn package manager
- Supabase account (free sign-up)
- Google Gemini API key
- GitHub Personal Access Token (optional, for private repos)
git clone https://github.com/yourusername/codechat.git
cd codechat
cd backend
npm install
Install Dependencies:
npm install express pg @supabase/supabase-js @google/generative-ai bcryptjs jsonwebtoken axios dotenv express-rate-limit cors
cd ../frontend
npm install
Install Dependencies:
npm install react react-dom react-router-dom axios tailwindcss
Create a .env file in the backend directory:
# Server Configuration
PORT=5000
NODE_ENV=development
# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-anon-key
SUPABASE_SERVICE_KEY=your-supabase-service-role-key
# Database Configuration (from Supabase)
DATABASE_URL=postgresql://postgres:[password]@db.[project-ref].supabase.co:5432/postgres
# JWT Configuration
JWT_SECRET=your-super-secret-jwt-key-min-32-characters
JWT_EXPIRE=7d
# Google Gemini API
GEMINI_API_KEY=your-gemini-api-key-here
# GitHub Configuration (Optional - for private repos)
GITHUB_TOKEN=ghp_your-personal-access-token
# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000
Create a .env file in the frontend directory:
# API Configuration
REACT_APP_API_URL=http://localhost:5000/api
# App Configuration
REACT_APP_NAME=CodeChat
REACT_APP_VERSION=1.0.0
- Create a Supabase Project
- Go to Supabase Dashboard
- Click "New Project"
- Note your project URL and API keys
- Enable pgvector Extension
In the Supabase SQL Editor, run:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
- Create Database Schema
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Projects table
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
repo_owner VARCHAR(255) NOT NULL,
repo_name VARCHAR(255) NOT NULL,
github_url TEXT NOT NULL,
status VARCHAR(50) DEFAULT 'pending',
file_count INTEGER DEFAULT 0,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Source code embeddings table
CREATE TABLE source_code_embeddings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
file_path TEXT NOT NULL,
source_code TEXT NOT NULL,
summary TEXT,
embedding vector(768),
language VARCHAR(50),
file_size INTEGER,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Questions table
CREATE TABLE questions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
question TEXT NOT NULL,
answer TEXT NOT NULL,
file_references JSONB,
query_embedding vector(768),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- Create indexes for better performance
CREATE INDEX idx_projects_user_id ON projects(user_id);
CREATE INDEX idx_embeddings_project_id ON source_code_embeddings(project_id);
CREATE INDEX idx_questions_project_id ON questions(project_id);
CREATE INDEX idx_questions_user_id ON questions(user_id);
-- Create vector similarity search index
-- (pgvector recommends building ivfflat indexes after the table contains data)
CREATE INDEX idx_embeddings_vector ON source_code_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
CREATE INDEX idx_questions_vector ON questions
USING ivfflat (query_embedding vector_cosine_ops)
WITH (lists = 100);
- Set Row Level Security (Optional)
-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE source_code_embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE questions ENABLE ROW LEVEL SECURITY;
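-- Create policies (examples)
-- Note: auth.uid() returns the Supabase Auth user ID, so these policies only
-- match rows when users.id is the same ID that Supabase Auth issues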
CREATE POLICY "Users can view own data" ON users
FOR SELECT USING (auth.uid() = id);
CREATE POLICY "Users can view own projects" ON projects
FOR SELECT USING (auth.uid() = user_id);
- Install PostgreSQL and pgvector
# macOS
brew install postgresql pgvector
# Ubuntu/Debian
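sudo apt-get install postgresql postgresql-contrib
# pgvector is a separate package (postgresql-<version>-pgvector on the
# PostgreSQL APT repository) or can be built from source
- Enable pgvector
CREATE EXTENSION vector;
- Run the same schema SQL from above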
1. Start Backend Server
cd backend
npm run dev
Server will start at http://localhost:5000
2. Start Frontend Development Server
cd frontend
npm start
Application will open at http://localhost:3000
- Navigate to the registration page
- Create an account with email and password
- Login with your credentials
- Click "New Project" on the dashboard
- Enter a project name
- Paste the GitHub repository URL (e.g., https://github.com/facebook/react)
- Click "Create Project"
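The projects table stores `repo_owner` and `repo_name` alongside the pasted URL, so the URL has to be split at some point. A minimal sketch of that step follows; `parseGitHubUrl` and its return shape are hypothetical and may differ from what the backend actually does.

```javascript
// Splits a GitHub repository URL into owner and repository name.
// Throws on anything that is not a github.com/<owner>/<repo> URL.
function parseGitHubUrl(githubUrl) {
  const url = new URL(githubUrl); // throws TypeError on malformed input
  if (url.hostname !== 'github.com') {
    throw new Error('Only github.com URLs are supported in this sketch');
  }
  const [owner, repo] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repo) {
    throw new Error('Expected a URL of the form https://github.com/<owner>/<repo>');
  }
  // Tolerate clone-style URLs that end in ".git".
  return { repoOwner: owner, repoName: repo.replace(/\.git$/, '') };
}
```

With the example URL above, this yields `facebook` as the owner and `react` as the repository name.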
- Click "Start Indexing" on your project
- Wait for the indexing process to complete
- Files are fetched from GitHub
- AI generates summaries
- Vector embeddings are created
- Monitor the status indicator
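The status indicator presumably reflects the `progress` object returned by the status endpoint described in the API documentation (`processed`, `total`, `percentage`). One way such an object might be computed is sketched below; `buildProgress` is a hypothetical helper, and rounding to two decimals is an assumption based on the sample response.

```javascript
// Builds a progress object with the percentage rounded to two decimals.
// A total of 0 is reported as 0% to avoid dividing by zero.
function buildProgress(processed, total) {
  const percentage =
    total === 0 ? 0 : Math.round((processed / total) * 10000) / 100;
  return { processed, total, percentage };
}
```

For 120 of 245 files this gives 48.98, matching the sample status response.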
- Open the indexed project
- Type your question in natural language:
- "How does authentication work in this project?"
- "Where is the database connection established?"
- "Explain the routing logic"
- "What libraries are used for state management?"
- Review the AI-generated answer with file citations
- Explore the referenced files and code snippets
- Navigate to "Question History"
- Browse all past questions and answers
- Delete questions you no longer need
✅ "How is user authentication implemented?"
✅ "What API endpoints are available?"
✅ "Where is the database schema defined?"
✅ "Explain how the vector search works"
✅ "What dependencies does this project use?"
✅ "How are errors handled in the API?"
✅ "Where is the configuration loaded?"
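Any of these questions can also be sent straight to the backend. The sketch below only builds the request for the ask endpoint documented in the API section; `buildAskRequest` is a hypothetical helper, and the project ID and token are placeholders.

```javascript
// Builds the URL and fetch options for POST /api/questions/:projectId/ask.
// Nothing is sent here; pass the result to fetch() or an equivalent client.
function buildAskRequest(apiBaseUrl, projectId, token, question) {
  return {
    url: `${apiBaseUrl}/questions/${encodeURIComponent(projectId)}/ask`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ question }),
    },
  };
}

// Usage (not executed here; global fetch requires Node.js 18 or newer):
//   const { url, options } = buildAskRequest(
//     'http://localhost:5000/api', '<project-id>', '<jwt>',
//     'Where is the database schema defined?');
//   const response = await fetch(url, options);
//   const { answer, fileReferences } = await response.json();
```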
http://localhost:5000/api
POST /api/auth/register
Content-Type: application/json
{
"email": "user@example.com",
"password": "SecurePassword123"
}
Response:
{
"success": true,
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"user": {
"id": "uuid",
"email": "user@example.com"
}
}
POST /api/auth/login
Content-Type: application/json
{
"email": "user@example.com",
"password": "SecurePassword123"
}
GET /api/projects
Authorization: Bearer <token>
Response:
{
"success": true,
"projects": [
{
"id": "uuid",
"name": "React Project",
"repo_owner": "facebook",
"repo_name": "react",
"github_url": "https://github.com/facebook/react",
"status": "completed",
"file_count": 245,
"created_at": "2024-01-15T10:30:00Z"
}
]
}
POST /api/projects
Authorization: Bearer <token>
Content-Type: application/json
{
"name": "My Project",
"githubUrl": "https://github.com/username/repo"
}
GET /api/projects/:id
Authorization: Bearer <token>
POST /api/projects/:id/process
Authorization: Bearer <token>
GET /api/projects/:id/status
Authorization: Bearer <token>
Response:
{
"status": "processing",
"progress": {
"processed": 120,
"total": 245,
"percentage": 48.98
}
}
DELETE /api/projects/:id
Authorization: Bearer <token>
POST /api/questions/:projectId/ask
Authorization: Bearer <token>
Content-Type: application/json
{
"question": "How does authentication work?"
}
Response:
{
"success": true,
"answer": "Authentication is implemented using JWT tokens...",
"fileReferences": [
{
"filePath": "src/auth/middleware.js",
"summary": "JWT authentication middleware",
"similarity": 0.87,
"codeSnippet": "const verifyToken = (req, res, next) => {...}"
}
]
}
GET /api/questions/:projectId/history
Authorization: Bearer <token>
DELETE /api/questions/:questionId
Authorization: Bearer <token>
- Authentication endpoints: 5 requests per 15 minutes
- Project endpoints: 50 requests per 15 minutes
- Question endpoints: 20 requests per 15 minutes
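These limits behave like a fixed-window counter per client. The backend lists express-rate-limit in its stack; the code below is not that library's implementation, just a dependency-free sketch of the idea, with `createRateLimiter`, `LIMITS`, and the per-key bookkeeping as assumptions.

```javascript
const FIFTEEN_MINUTES_MS = 15 * 60 * 1000; // same as RATE_LIMIT_WINDOW_MS=900000

// Per-window request limits taken from the list above.
const LIMITS = { auth: 5, projects: 50, questions: 20 };

// Returns a function that reports whether a request from `key`
// (for example an IP address or user ID) is allowed at time `now`.
function createRateLimiter(maxRequests, windowMs = FIFTEEN_MINUTES_MS) {
  const windows = new Map(); // key -> { start, count }
  return function isAllowed(key, now = Date.now()) {
    const current = windows.get(key);
    // Start a fresh window on first contact or once the old one has elapsed.
    if (!current || now - current.start >= windowMs) {
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= maxRequests) return false;
    current.count += 1;
    return true;
  };
}
```

A request rejected by such a limiter would typically be answered with HTTP 429.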
codechat/
├── backend/
│ ├── config/
│ │ ├── database.js # Database connection
│ │ └── gemini.js # Gemini AI configuration
│ ├── middleware/
│ │ ├── auth.js # JWT authentication
│ │ └── rateLimiter.js # Rate limiting
│ ├── routes/
│ │ ├── auth.js # Authentication routes
│ │ ├── projects.js # Project management
│ │ └── questions.js # Q&A routes
│ ├── services/
│ │ ├── githubService.js # GitHub API integration
│ │ ├── embeddingService.js # Vector embedding generation
│ │ ├── indexingService.js # Repository indexing
│ │ └── qaService.js # Question answering
│ ├── utils/
│ │ ├── vectorSearch.js # pgvector similarity search
│ │ └── fileParser.js # Code file parsing
│ ├── .env # Environment variables
│ ├── server.js # Express server entry
│ └── package.json
│
├── frontend/
│ ├── public/
│ │ └── index.html
│ ├── src/
│ │ ├── components/
│ │ │ ├── Auth/
│ │ │ │ ├── Login.jsx
│ │ │ │ └── Register.jsx
│ │ │ ├── Dashboard/
│ │ │ │ ├── ProjectList.jsx
│ │ │ │ └── ProjectCard.jsx
│ │ │ ├── Project/
│ │ │ │ ├── ProjectDetails.jsx
│ │ │ │ ├── IndexingStatus.jsx
│ │ │ │ └── QAInterface.jsx
│ │ │ └── History/
│ │ │ └── QuestionHistory.jsx
│ │ ├── context/
│ │ │ └── AuthContext.jsx # Authentication context
│ │ ├── services/
│ │ │ └── api.js # Axios API client
│ │ ├── App.jsx # Main app component
│ │ ├── index.js # Entry point
│ │ └── index.css # Tailwind styles
│ ├── .env
│ ├── tailwind.config.js
│ └── package.json
│
└── README.md
Error: Connection refused to PostgreSQL
Solution:
- Verify DATABASE_URL in .env
- Check Supabase project is active
- Ensure network connectivity
Error: extension "vector" does not exist
Solution:
-- Run in Supabase SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;
Error: 429 Too Many Requests
Solution:
- Reduce concurrent indexing operations
- Implement request queuing
- Consider upgrading Gemini API tier
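A common complement to queuing is retrying rate-limited calls with exponential backoff. The sketch below assumes the failing call rejects with an error carrying `status === 429`; that error shape, the helper names, and the default delays are all assumptions rather than documented Gemini client behavior.

```javascript
// Delay schedule: baseDelayMs, 2x, 4x, ... capped at maxDelayMs.
function backoffDelays(retries, baseDelayMs = 1000, maxDelayMs = 30000) {
  return Array.from({ length: retries }, (_, attempt) =>
    Math.min(baseDelayMs * 2 ** attempt, maxDelayMs));
}

// Runs `operation` and retries it only when it fails with status 429.
async function withRetry(operation, { retries = 4, baseDelayMs = 1000 } = {}) {
  const delays = backoffDelays(retries, baseDelayMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const rateLimited = error && error.status === 429; // assumed error shape
      if (!rateLimited || attempt >= delays.length) throw error;
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}
```

Wrapping each summarization or embedding call in `withRetry` spaces out retries instead of hitting the API again immediately.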
Error: Request timeout during indexing
Solution:
- Process files in smaller batches
- Increase timeout limits in axios config
- Filter out non-essential files
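Processing in smaller batches can be as simple as slicing the file list before handing it to the indexer. `chunk` below is a hypothetical helper, and a suitable batch size depends on your API limits.

```javascript
// Splits `items` into consecutive batches of at most `batchSize` elements.
function chunk(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch can then be indexed and awaited before the next one starts, which bounds both request duration and concurrent API calls.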
Error: Access-Control-Allow-Origin blocked
Solution:
- Verify FRONTEND_URL in backend .env
- Check CORS middleware configuration
- Ensure correct API URL in frontend .env
Error: Token expired or invalid
Solution:
- User needs to login again
- Implement token refresh mechanism
- Check JWT_EXPIRE setting
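To tell an expired token from an otherwise invalid one, the `exp` claim can be read from the token payload. This sketch only decodes the payload and does not verify the signature, so it suits client-side checks such as deciding when to redirect to login; the server must still verify every token. The function names are hypothetical.

```javascript
// Decodes the (unverified) payload of a JWT: header.payload.signature.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('Malformed JWT');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// True when the token carries an `exp` claim (seconds since the epoch)
// that is at or before `nowSeconds`.
function isTokenExpired(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { exp } = decodeJwtPayload(token);
  return typeof exp === 'number' && exp <= nowSeconds;
}
```

With JWT_EXPIRE=7d, `exp` would normally sit seven days after the token was issued.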
Enable detailed logging:
# Add to backend .env
DEBUG=true
LOG_LEVEL=verbose
- Support for more programming languages
- Batch question asking (multiple questions at once)
- Export Q&A sessions to PDF/Markdown
- Code snippet highlighting in answers
- Project sharing with team members
- Real-time collaboration on projects
- Integration with GitLab and Bitbucket
- Custom AI model fine-tuning
- Advanced filtering and search options
- Mobile application (React Native)
- Code generation based on Q&A context
- Automatic documentation generation
- Integration with IDE plugins (VS Code, IntelliJ)
- Multi-repository project support
- Advanced analytics and insights dashboard
- Voice-to-text question input
- Diagram generation from code explanations
- Integration with Slack/Discord bots
- Webhook support for CI/CD pipelines
We welcome contributions from the community! CodeChat is open-source and thrives on collaboration.
- Fork the Repository
git clone https://github.com/yourusername/codechat.git
- Create a Feature Branch
git checkout -b feature/amazing-feature
- Make Your Changes
  - Write clean, documented code
  - Follow existing code style
  - Add tests if applicable
- Commit Your Changes
git commit -m "Add amazing feature"
- Push to Your Fork
git push origin feature/amazing-feature
- Open a Pull Request
  - Describe your changes clearly
  - Reference any related issues
  - Wait for review and feedback
- Code Style: Follow ESLint and Prettier configurations
- Commits: Use conventional commit messages
- Testing: Add tests for new features
- Documentation: Update README and inline comments
- Issues: Check existing issues before creating new ones
# Install development dependencies
npm install --include=dev
# Run tests
npm test
# Run linting
npm run lint
# Format code
npm run format
- 🐛 Bug fixes and error handling
- ✨ New features and enhancements
- 📝 Documentation improvements
- 🎨 UI/UX enhancements
- 🧪 Test coverage expansion
- 🌐 Internationalization (i18n)
This project is licensed under the MIT License.
MIT License
Copyright (c) 2024 CodeChat
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
See LICENSE file for details.
CodeChat is built on the shoulders of giants. We'd like to thank:
- Google Gemini - Powerful AI models for code understanding
- Supabase - Backend infrastructure and database
- pgvector - Vector similarity search for PostgreSQL
- React - UI library for building the frontend
- Tailwind CSS - Utility-first CSS framework
- GitHub Copilot - AI-powered code assistance
- Phind - AI search for developers
- Sourcegraph - Code intelligence platform
Thank you to all our contributors who help make CodeChat better!
Built with ❤️ by developers, for developers



