💬 CodeChat - AI-Powered Code Repository Q&A

Ask natural language questions about any GitHub repository and get intelligent, context-aware answers

📖 Table of Contents

Overview
Key Features
Demo
Tech Stack
Architecture
Prerequisites
Installation
Configuration
Database Setup
Usage Guide
API Documentation
Project Structure
Troubleshooting
Roadmap
Contributing
License
Acknowledgments

🌟 Overview

CodeChat is an intelligent code repository analysis system that enables developers to interact with their GitHub repositories using natural language. Instead of manually searching through files and documentation, simply ask questions and get AI-powered answers with precise file references and code citations.

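CodeChat indexes a repository by generating a summary and a 768-dimensional vector embedding for each code file with Google's Gemini models, then storing them in PostgreSQL with pgvector. When you ask a question, it finds the most similar files by vector search and passes them to Gemini, so answers are grounded in the actual source code.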

Why CodeChat?

🔍 Semantic Code Search - Find relevant code using natural language, not just keywords
🤖 AI-Powered Answers - Get intelligent responses with file citations and code snippets
⚡ Fast & Efficient - Similarity search runs inside PostgreSQL over indexed vector embeddings
📚 Complete Context - AI understands relationships between files and functions
🔒 Secure & Private - JWT authentication keeps each user's projects isolated

✨ Key Features

🔐 Authentication & User Management

Secure user registration and login with JWT tokens
Password hashing with bcrypt
Session management and protected routes
User-specific project isolation

📦 GitHub Repository Integration

One-Click Indexing - Simply paste your GitHub repository URL
Automatic File Discovery - Intelligently fetches and filters code files
Language Detection - Supports multiple programming languages
Public & Private Repos - Works with both public and private repositories (with token)
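
As a rough illustration of file discovery and language detection, the sketch below maps file extensions to language names and skips paths that are usually not worth indexing. The extension table, the ignored directories, and the function names `detectLanguage` and `filterCodeFiles` are assumptions made for this example, not CodeChat's actual rules.

```javascript
// Hypothetical extension-to-language table; extend it to suit your repositories.
const LANGUAGE_BY_EXTENSION = {
  '.js': 'JavaScript',
  '.jsx': 'JavaScript',
  '.ts': 'TypeScript',
  '.py': 'Python',
  '.java': 'Java',
  '.go': 'Go',
};

// Directories assumed to hold dependencies or build output.
const IGNORED_SEGMENTS = ['node_modules', 'dist', 'build', '.git'];

// Return the language for a path, or null when the extension is unknown.
function detectLanguage(filePath) {
  const fileName = filePath.split('/').pop();
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return null; // no extension, or a dotfile such as ".gitignore"
  const ext = fileName.slice(dot).toLowerCase();
  return LANGUAGE_BY_EXTENSION[ext] || null;
}

// Keep only recognised code files outside ignored directories.
function filterCodeFiles(paths) {
  return paths
    .filter((p) => !p.split('/').some((seg) => IGNORED_SEGMENTS.includes(seg)))
    .map((p) => ({ path: p, language: detectLanguage(p) }))
    .filter((f) => f.language !== null);
}
```

The detected language would fit the `language` column of the embeddings table, and the filtered list is what an indexer would go on to summarise and embed.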

🧠 Intelligent Code Analysis

AI Summarization - Each file gets an AI-generated summary using Gemini 2.5 Flash
Vector Embeddings - Code converted to 768-dimensional vectors for semantic search
pgvector Integration - PostgreSQL extension for efficient similarity search
Contextual Understanding - AI comprehends code structure and relationships

💬 Natural Language Q&A

Ask Anything - Query your codebase in plain English
Smart Answers - AI generates context-aware responses with explanations
File Citations - Every answer includes relevant file references
Similarity Scores - See how relevant each file is to your question
Code Snippets - View actual code excerpts that answer your question

📊 Project Management

Multiple Projects - Index and manage multiple repositories
Status Tracking - Real-time indexing progress monitoring
Project Dashboard - View file counts, languages, and indexing status
Easy Deletion - Remove projects and associated data with one click

📜 Question History

Conversation Memory - Access all past questions and answers
Export Capabilities - Save important Q&A sessions
Search History - Find previous questions quickly
Delete Questions - Remove unwanted history items

🎬 Demo

Chat Box

Repository Indexing

🛠️ Tech Stack

Backend

Technology	Purpose
Node.js	JavaScript runtime environment
Express.js	Web application framework
PostgreSQL	Primary database with advanced features
pgvector	Vector similarity search extension
Supabase	Backend-as-a-Service for database and auth
Google Gemini API	AI for summarization, embeddings, and Q&A
JWT	Secure token-based authentication
Bcrypt	Password hashing and security
Axios	HTTP client for API requests
Express Rate Limit	API rate limiting and abuse prevention

Frontend

Technology	Purpose
React.js	UI library for building interactive interfaces
React Router DOM	Client-side routing and navigation
Tailwind CSS	Utility-first CSS framework
Context API	Global state management
Axios	HTTP client for backend communication

AI & Machine Learning

Google Gemini 2.5 Flash - Code understanding and question answering
text-embedding-004 - High-quality text embeddings (768 dimensions)
Vector Similarity Search - Cosine similarity for semantic matching
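
For readers unfamiliar with cosine similarity, here is a minimal, dependency-free sketch of the measure. In CodeChat the comparison is done by pgvector inside the database; the function name `cosineSimilarity` and the handling of zero vectors are choices made for this example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // The measure is undefined for zero vectors; this sketch returns 0 for them.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Values near 1 mean the two embeddings point in almost the same direction, which is what the similarity scores shown next to file references express.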

🏗️ Architecture

Data Flow

User Authentication - JWT tokens secure all API requests
Repository Indexing:
- Fetch repository structure from GitHub API
- Filter relevant code files (.js, .py, .java, etc.)
- Generate AI summaries for each file using Gemini
- Create vector embeddings using text-embedding-004
- Store in PostgreSQL with pgvector
Question Answering:
- User asks question in natural language
- Question converted to vector embedding
- pgvector performs similarity search to find relevant files
- Top matching files sent to Gemini with question
- AI generates contextual answer with file citations
History Management - All Q&As stored for future reference
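
The question-answering steps above can be sketched as a small orchestration function. This is an illustration under stated assumptions: `embed`, `search`, and `generate` are hypothetical injected functions standing in for the embedding call, the pgvector query, and the Gemini call, and the prompt wording is invented for the example.

```javascript
// Assemble the text sent to the model from the question and the top-matching files.
// The field names (filePath, summary, sourceCode) are illustrative.
function buildPrompt(question, files) {
  const context = files
    .map((f, i) => `[${i + 1}] ${f.filePath}\nSummary: ${f.summary}\n${f.sourceCode}`)
    .join('\n\n');
  return [
    'Answer the question using only the files below, and cite file paths.',
    context,
    `Question: ${question}`,
  ].join('\n\n');
}

// Orchestrate one question. The three dependencies are injected so the sketch
// stays independent of any particular SDK or database client.
async function answerQuestion(question, { embed, search, generate }, topK = 5) {
  const queryVector = await embed(question); // question -> embedding
  const files = await search(queryVector, topK); // most similar files
  const answer = await generate(buildPrompt(question, files));
  const fileReferences = files.map(({ filePath, summary, similarity }) => ({
    filePath,
    summary,
    similarity,
  }));
  return { answer, fileReferences };
}
```

Injecting the dependencies keeps the flow testable with stubs and makes it easy to swap the model or the search backend later.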

📋 Prerequisites

Before you begin, ensure you have the following installed and configured:

Node.js 16.x or higher
PostgreSQL 14.x or higher with pgvector extension
Git for version control
npm or yarn package manager

Required Accounts & API Keys

Supabase Account
Google Gemini API Key
GitHub Personal Access Token (optional, for private repos)

🚀 Installation

1. Clone the Repository

git clone https://github.com/yourusername/codechat.git
cd codechat

2. Backend Setup

cd backend
npm install

Install Dependencies:

npm install express pg @supabase/supabase-js @google/generative-ai bcryptjs jsonwebtoken axios dotenv express-rate-limit cors

3. Frontend Setup

cd ../frontend
npm install

Install Dependencies:

npm install react react-dom react-router-dom axios tailwindcss

⚙️ Configuration

Backend Environment Variables

Create a .env file in the backend directory:

# Server Configuration
PORT=5000
NODE_ENV=development

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-anon-key
SUPABASE_SERVICE_KEY=your-supabase-service-role-key

# Database Configuration (from Supabase)
DATABASE_URL=postgresql://postgres:[password]@db.[project-ref].supabase.co:5432/postgres

# JWT Configuration
JWT_SECRET=your-super-secret-jwt-key-min-32-characters
JWT_EXPIRE=7d

# Google Gemini API
GEMINI_API_KEY=your-gemini-api-key-here

# GitHub Configuration (Optional - for private repos)
GITHUB_TOKEN=ghp_your-personal-access-token

# Rate Limiting
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100

# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000
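
It can help to fail fast at startup when a variable is missing. The following is a minimal sketch of such a check, not the project's actual configuration loader: the function name `loadConfig`, the choice of which variables count as required, and the fallback defaults (copied from the example values above) are assumptions.

```javascript
// Variables assumed to be mandatory; GITHUB_TOKEN is optional per the example file.
const REQUIRED_VARS = [
  'SUPABASE_URL',
  'SUPABASE_KEY',
  'DATABASE_URL',
  'JWT_SECRET',
  'GEMINI_API_KEY',
];

// Validate the environment and return a plain config object.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  if (env.JWT_SECRET.length < 32) {
    throw new Error('JWT_SECRET should be at least 32 characters');
  }
  return {
    port: Number(env.PORT) || 5000,
    jwtSecret: env.JWT_SECRET,
    jwtExpire: env.JWT_EXPIRE || '7d',
    rateLimitWindowMs: Number(env.RATE_LIMIT_WINDOW_MS) || 900000,
    rateLimitMax: Number(env.RATE_LIMIT_MAX_REQUESTS) || 100,
    frontendUrl: env.FRONTEND_URL || 'http://localhost:3000',
    githubToken: env.GITHUB_TOKEN || null,
  };
}
```

Called once at startup after `dotenv` has loaded the file, a check like this turns a missing key into an immediate, readable error instead of a failure on the first request.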

Frontend Environment Variables

Create a .env file in the frontend directory:

# API Configuration
REACT_APP_API_URL=http://localhost:5000/api

# App Configuration
REACT_APP_NAME=CodeChat
REACT_APP_VERSION=1.0.0

🗄️ Database Setup

Option 1: Supabase (Recommended)

Create a Supabase Project
- Go to Supabase Dashboard
- Click "New Project"
- Note your project URL and API keys
Enable pgvector Extension

In the Supabase SQL Editor, run:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

Create Database Schema

-- Users table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Projects table
CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    repo_owner VARCHAR(255) NOT NULL,
    repo_name VARCHAR(255) NOT NULL,
    github_url TEXT NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    file_count INTEGER DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Source code embeddings table
CREATE TABLE source_code_embeddings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    file_path TEXT NOT NULL,
    source_code TEXT NOT NULL,
    summary TEXT,
    embedding vector(768),
    language VARCHAR(50),
    file_size INTEGER,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Questions table
CREATE TABLE questions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    file_references JSONB,
    query_embedding vector(768),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes for better performance
CREATE INDEX idx_projects_user_id ON projects(user_id);
CREATE INDEX idx_embeddings_project_id ON source_code_embeddings(project_id);
CREATE INDEX idx_questions_project_id ON questions(project_id);
CREATE INDEX idx_questions_user_id ON questions(user_id);

-- Create vector similarity search index
-- (pgvector recommends creating ivfflat indexes after the table has some data)
CREATE INDEX idx_embeddings_vector ON source_code_embeddings 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

CREATE INDEX idx_questions_vector ON questions 
USING ivfflat (query_embedding vector_cosine_ops)
WITH (lists = 100);
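
To show how this schema might be queried, the sketch below builds a parameterised similarity query for the `source_code_embeddings` table. The helper names `toVectorLiteral` and `buildSimilarityQuery` are hypothetical; `<=>` is pgvector's cosine distance operator, so similarity is computed as 1 minus the distance.

```javascript
// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(values) {
  if (!values.every((v) => Number.isFinite(v))) {
    throw new Error('Embedding must contain only finite numbers');
  }
  return `[${values.join(',')}]`;
}

// Build the query text and its parameters for one project.
// Table and column names follow the schema in this section.
function buildSimilarityQuery(projectId, embedding, limit = 5) {
  const text = `
    SELECT file_path, summary, source_code,
           1 - (embedding <=> $1::vector) AS similarity
    FROM source_code_embeddings
    WHERE project_id = $2
    ORDER BY embedding <=> $1::vector
    LIMIT $3`;
  return { text, values: [toVectorLiteral(embedding), projectId, limit] };
}
```

The returned `{ text, values }` object has the query-config shape that `pg` (node-postgres) accepts in `pool.query`. Ordering by the distance expression lets the cosine index be used.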

Set Row Level Security (Optional)

-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE source_code_embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE questions ENABLE ROW LEVEL SECURITY;

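-- Create policies (examples)
-- Note: auth.uid() returns the ID of the user signed in through Supabase Auth,
-- so these policies only match rows whose IDs come from Supabase Auth.
-- A table with RLS enabled and no policy rejects all access from roles that
-- do not bypass RLS (the service role key bypasses it).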
CREATE POLICY "Users can view own data" ON users
    FOR SELECT USING (auth.uid() = id);

CREATE POLICY "Users can view own projects" ON projects
    FOR SELECT USING (auth.uid() = user_id);

Option 2: Local PostgreSQL

Install PostgreSQL and pgvector

# macOS
brew install postgresql pgvector

# Ubuntu/Debian (replace 16 with your PostgreSQL major version)
sudo apt-get install postgresql postgresql-contrib postgresql-16-pgvector

Enable pgvector

CREATE EXTENSION vector;

Run the same schema SQL from above

📖 Usage Guide

Starting the Application

1. Start Backend Server

cd backend
npm run dev

Server will start at http://localhost:5000

2. Start Frontend Development Server

cd frontend
npm start

Application will open at http://localhost:3000

Using CodeChat

Step 1: Register/Login

Navigate to the registration page
Create an account with email and password
Log in with your credentials

Step 2: Create a Project

Click "New Project" on the dashboard
Enter a project name
Paste the GitHub repository URL (e.g., https://github.com/facebook/react)
Click "Create Project"
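
Since the projects table stores `repo_owner` and `repo_name` separately, the pasted URL has to be split at some point. A minimal sketch of that parsing, assuming a hypothetical helper named `parseGitHubUrl` (the real backend may validate differently):

```javascript
// Extract owner and repository name from a GitHub URL.
// Returns null for anything that is not a github.com repository URL.
function parseGitHubUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid URL at all
  }
  if (url.hostname !== 'github.com' && url.hostname !== 'www.github.com') {
    return null;
  }
  const [owner, repo] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repo) return null;
  return { repoOwner: owner, repoName: repo.replace(/\.git$/, '') };
}

console.log(parseGitHubUrl('https://github.com/facebook/react'));
// { repoOwner: 'facebook', repoName: 'react' }
```

Returning `null` rather than throwing lets the caller answer with a validation error for malformed input.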

Step 3: Index the Repository

Click "Start Indexing" on your project
Wait for the indexing process to complete
- Files are fetched from GitHub
- AI generates summaries
- Vector embeddings are created
Monitor the status indicator

Step 4: Ask Questions

Open the indexed project
Type your question in natural language:
- "How does authentication work in this project?"
- "Where is the database connection established?"
- "Explain the routing logic"
- "What libraries are used for state management?"
Review the AI-generated answer with file citations
Explore the referenced files and code snippets

Step 5: View History

Navigate to "Question History"
Browse all past questions and answers
Delete questions you no longer need

Example Questions

✅ "How is user authentication implemented?"
✅ "What API endpoints are available?"
✅ "Where is the database schema defined?"
✅ "Explain how the vector search works"
✅ "What dependencies does this project use?"
✅ "How are errors handled in the API?"
✅ "Where is the configuration loaded?"

📚 API Documentation

Base URL

http://localhost:5000/api

Authentication Endpoints

Register User

POST /api/auth/register
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "SecurePassword123"
}

Response:

{
  "success": true,
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "uuid",
    "email": "user@example.com"
  }
}

Login User

POST /api/auth/login
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "SecurePassword123"
}

Project Endpoints

List Projects

GET /api/projects
Authorization: Bearer <token>

Response:

{
  "success": true,
  "projects": [
    {
      "id": "uuid",
      "name": "React Project",
      "repo_owner": "facebook",
      "repo_name": "react",
      "github_url": "https://github.com/facebook/react",
      "status": "completed",
      "file_count": 245,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}

Create Project

POST /api/projects
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "My Project",
  "githubUrl": "https://github.com/username/repo"
}

Get Project Details

GET /api/projects/:id
Authorization: Bearer <token>

Start Indexing

POST /api/projects/:id/process
Authorization: Bearer <token>

Check Indexing Status

GET /api/projects/:id/status
Authorization: Bearer <token>

Response:

{
  "status": "processing",
  "progress": {
    "processed": 120,
    "total": 245,
    "percentage": 48.98
  }
}
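
The `percentage` field in this response is consistent with `processed / total` rounded to two decimals. A small sketch of how such a payload could be derived from two counters; the function name `buildStatus` and the rule for switching to `completed` are assumptions.

```javascript
// Build a status payload from the number of processed files and the total.
function buildStatus(processed, total) {
  // Round to two decimals; guard against division by zero for empty repositories.
  const percentage = total === 0 ? 0 : Math.round((processed / total) * 10000) / 100;
  const status = total > 0 && processed >= total ? 'completed' : 'processing';
  return { status, progress: { processed, total, percentage } };
}

console.log(buildStatus(120, 245).progress.percentage); // 48.98
```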

Delete Project

DELETE /api/projects/:id
Authorization: Bearer <token>

Question Endpoints

Ask Question

POST /api/questions/:projectId/ask
Authorization: Bearer <token>
Content-Type: application/json

{
  "question": "How does authentication work?"
}

Response:

{
  "success": true,
  "answer": "Authentication is implemented using JWT tokens...",
  "fileReferences": [
    {
      "filePath": "src/auth/middleware.js",
      "summary": "JWT authentication middleware",
      "similarity": 0.87,
      "codeSnippet": "const verifyToken = (req, res, next) => {...}"
    }
  ]
}
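
For clients that do not use the bundled frontend, the request above can be assembled as follows. This is a sketch: `buildAskRequest` is a hypothetical helper, and the base URL is the development value from this document.

```javascript
// Build the URL and fetch-style options for POST /api/questions/:projectId/ask.
function buildAskRequest(baseUrl, projectId, token, question) {
  return {
    url: `${baseUrl}/questions/${encodeURIComponent(projectId)}/ask`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ question }),
    },
  };
}

// Usage with the global fetch available in Node.js 18+ and in browsers:
// const { url, options } = buildAskRequest(
//   'http://localhost:5000/api', projectId, token, 'How does authentication work?');
// const { answer, fileReferences } = await (await fetch(url, options)).json();
```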

Get Question History

GET /api/questions/:projectId/history
Authorization: Bearer <token>

Delete Question

DELETE /api/questions/:questionId
Authorization: Bearer <token>

Rate Limits

Authentication endpoints: 5 requests per 15 minutes
Project endpoints: 50 requests per 15 minutes
Question endpoints: 20 requests per 15 minutes
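
To make the behaviour of these limits concrete, here is a minimal in-memory fixed-window counter using the numbers above. It is an illustration only and not the `express-rate-limit` API that the backend lists as a dependency; `createLimiter` and the `LIMITS` table are names invented for this sketch.

```javascript
// Maximum requests per 15-minute window for each endpoint group.
const WINDOW_MS = 15 * 60 * 1000;
const LIMITS = { auth: 5, projects: 50, questions: 20 };

// Returns a function that reports whether a request from `key` is allowed.
// `now` is injectable so the window can be simulated in tests.
function createLimiter(group, now = () => Date.now()) {
  const max = LIMITS[group];
  const windows = new Map(); // key -> { start, count }
  return function allow(key) {
    const t = now();
    const w = windows.get(key);
    if (!w || t - w.start >= WINDOW_MS) {
      windows.set(key, { start: t, count: 1 }); // first request of a new window
      return true;
    }
    w.count += 1;
    return w.count <= max;
  };
}
```

A client that exceeds a limit should expect an HTTP 429 response and wait for the window to reset before retrying.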

📁 Project Structure

codechat/
├── backend/
│   ├── config/
│   │   ├── database.js          # Database connection
│   │   └── gemini.js            # Gemini AI configuration
│   ├── middleware/
│   │   ├── auth.js              # JWT authentication
│   │   └── rateLimiter.js       # Rate limiting
│   ├── routes/
│   │   ├── auth.js              # Authentication routes
│   │   ├── projects.js          # Project management
│   │   └── questions.js         # Q&A routes
│   ├── services/
│   │   ├── githubService.js     # GitHub API integration
│   │   ├── embeddingService.js  # Vector embedding generation
│   │   ├── indexingService.js   # Repository indexing
│   │   └── qaService.js         # Question answering
│   ├── utils/
│   │   ├── vectorSearch.js      # pgvector similarity search
│   │   └── fileParser.js        # Code file parsing
│   ├── .env                     # Environment variables
│   ├── server.js                # Express server entry
│   └── package.json
│
├── frontend/
│   ├── public/
│   │   └── index.html
│   ├── src/
│   │   ├── components/
│   │   │   ├── Auth/
│   │   │   │   ├── Login.jsx
│   │   │   │   └── Register.jsx
│   │   │   ├── Dashboard/
│   │   │   │   ├── ProjectList.jsx
│   │   │   │   └── ProjectCard.jsx
│   │   │   ├── Project/
│   │   │   │   ├── ProjectDetails.jsx
│   │   │   │   ├── IndexingStatus.jsx
│   │   │   │   └── QAInterface.jsx
│   │   │   └── History/
│   │   │       └── QuestionHistory.jsx
│   │   ├── context/
│   │   │   └── AuthContext.jsx   # Authentication context
│   │   ├── services/
│   │   │   └── api.js            # Axios API client
│   │   ├── App.jsx               # Main app component
│   │   ├── index.js              # Entry point
│   │   └── index.css             # Tailwind styles
│   ├── .env
│   ├── tailwind.config.js
│   └── package.json
│
└── README.md

🐛 Troubleshooting

Common Issues

1. Database Connection Errors

Error: Connection refused to PostgreSQL

Solution:

Verify DATABASE_URL in the backend .env
Check that the Supabase project is active
Ensure network connectivity

2. pgvector Extension Not Found

Error: extension "vector" does not exist

Solution:

-- Run in Supabase SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;

3. Gemini API Rate Limits

Error: 429 Too Many Requests

Solution:

Reduce concurrent indexing operations
Implement request queuing
Consider upgrading Gemini API tier
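
One common way to cope with 429 responses is to retry with exponential backoff. The sketch below is a generic pattern rather than CodeChat's implementation: `withRetry` and `backoffDelay` are hypothetical names, and reading the status from `err.status` or `err.response.status` is an assumption about the error shape (the latter is how axios exposes it).

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` when it fails with HTTP 429; rethrow any other error immediately.
async function withRetry(fn, { retries = 5, baseMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = err.status || (err.response && err.response.status);
      if (status !== 429 || attempt >= retries) throw err;
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
}
```

Wrapping each summarisation or embedding call in `withRetry` spreads requests out when the quota is hit, at the cost of slower indexing.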

4. Large Repositories Timeout

Error: Request timeout during indexing

Solution:

Process files in smaller batches
Increase timeout limits in axios config
Filter out non-essential files
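
Processing files in smaller batches could look like the sketch below. The names `chunk` and `indexInBatches` and the default batch size are assumptions; `processFile` stands for whatever async work is done per file (fetch, summarise, embed, store).

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches one after another; files inside a batch run concurrently.
async function indexInBatches(files, processFile, batchSize = 10) {
  let processed = 0;
  for (const batch of chunk(files, batchSize)) {
    await Promise.all(batch.map((file) => processFile(file)));
    processed += batch.length; // a natural point to persist progress
  }
  return processed;
}
```

A smaller batch size lowers the number of concurrent API calls, which also helps with the rate-limit issue described earlier.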

5. CORS Errors

Error: Access-Control-Allow-Origin blocked

Solution:

Verify FRONTEND_URL in backend .env
Check CORS middleware configuration
Ensure correct API URL in frontend .env

6. JWT Token Expired

Error: Token expired or invalid

Solution:

Log in again to obtain a new token
Implement token refresh mechanism
Check JWT_EXPIRE setting
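
A client can detect an expired token before sending a request by reading the `exp` claim. The sketch below decodes the payload without verifying the signature, so it is only suitable for decisions such as redirecting to the login page; the server must still verify every token. `isTokenExpired` is a hypothetical helper, and it uses Node's `Buffer`, so a browser would need a different base64url decoder.

```javascript
// Return true when the token is malformed or its `exp` claim
// (seconds since the epoch) is at or before `nowSeconds`.
function isTokenExpired(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return true; // not header.payload.signature
  try {
    const json = Buffer.from(parts[1], 'base64url').toString('utf8');
    const payload = JSON.parse(json);
    return typeof payload.exp !== 'number' || payload.exp <= nowSeconds;
  } catch {
    return true; // payload is not valid JSON
  }
}
```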

Debug Mode

Enable detailed logging:

# Add to backend .env
DEBUG=true
LOG_LEVEL=verbose

🗺️ Roadmap

Version 1.1 (Q2 2024)

Support for more programming languages
Batch question asking (multiple questions at once)
Export Q&A sessions to PDF/Markdown
Code snippet highlighting in answers
Project sharing with team members

Version 1.2 (Q3 2024)

Real-time collaboration on projects
Integration with GitLab and Bitbucket
Custom AI model fine-tuning
Advanced filtering and search options
Mobile application (React Native)

Version 2.0 (Q4 2024)

Code generation based on Q&A context
Automatic documentation generation
Integration with IDE plugins (VS Code, IntelliJ)
Multi-repository project support
Advanced analytics and insights dashboard

Community Requests

Voice-to-text question input
Diagram generation from code explanations
Integration with Slack/Discord bots
Webhook support for CI/CD pipelines

🤝 Contributing

We welcome contributions from the community! CodeChat is open-source and thrives on collaboration.

How to Contribute

Fork the Repository and Clone Your Fork

git clone https://github.com/yourusername/codechat.git

Create a Feature Branch
```
git checkout -b feature/amazing-feature
```
Make Your Changes
- Write clean, documented code
- Follow existing code style
- Add tests if applicable
Commit Your Changes
```
git commit -m "Add amazing feature"
```
Push to Your Fork
```
git push origin feature/amazing-feature
```
Open a Pull Request
- Describe your changes clearly
- Reference any related issues
- Wait for review and feedback

Contribution Guidelines

Code Style: Follow ESLint and Prettier configurations
Commits: Use conventional commit messages
Testing: Add tests for new features
Documentation: Update README and inline comments
Issues: Check existing issues before creating new ones

Development Setup

# Install development dependencies
npm install --include=dev

# Run tests
npm test

# Run linting
npm run lint

# Format code
npm run format

Areas for Contribution

🐛 Bug fixes and error handling
✨ New features and enhancements
📝 Documentation improvements
🎨 UI/UX enhancements
🧪 Test coverage expansion
🌐 Internationalization (i18n)

📄 License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2024 CodeChat

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

See LICENSE file for details.

🙏 Acknowledgments

CodeChat is built on the shoulders of giants. We'd like to thank:

Technologies

Google Gemini - Powerful AI models for code understanding
Supabase - Backend infrastructure and database
pgvector - Vector similarity search for PostgreSQL
React - UI library for building the frontend
Tailwind CSS - Utility-first CSS framework

Inspiration

GitHub Copilot - AI-powered code assistance
Phind - AI search for developers
Sourcegraph - Code intelligence platform

Contributors

Thank you to all our contributors who help make CodeChat better!

⭐ Star this repository if you find it helpful!

Built with ❤️ by developers, for developers

