
🧠 MCP Code Intelligence Server

TypeScript Node.js Qdrant MCP

License: MIT npm version Build Status Coverage Status

A powerful Model Context Protocol (MCP) server that provides intelligent code analysis, semantic search, and structural understanding of codebases through advanced AI-powered indexing and vector embeddings.

🚀 Quick Start • 📖 Documentation • 🛠️ API Reference • 🗺️ Roadmap • 🤝 Contributing • 💬 Community


💖 Support This Project

GitHub Sponsors Buy Me A Coffee

If this project helps you build better software, consider supporting its development!


📖 The Story Behind This Project

I've been using Kilocode with custom API endpoints for months, and one of my favorite features was the automatic code indexing with vector embeddings. It made navigating large codebases incredibly intuitive: just ask questions in natural language and find exactly what you need. Plus, it saved a ton of AI tokens by only loading relevant code into context instead of dumping entire files.

Then yesterday, Kilocode updated and removed this feature entirely. 😢

I searched everywhere for an alternative: existing MCP servers, standalone tools, anything that could provide the same semantic code search experience. Nothing came close to what I needed: real-time indexing, multi-language support, privacy-first options, and deep code understanding.

So I spent an evening building this. MCP Code Intelligence is my answer to that gap: a powerful, flexible, and open-source solution that brings intelligent code search to any MCP-compatible client.

If you find this useful, please consider giving it a ⭐ on GitHub!


✨ What Makes This Special?

Ever spent hours grepping through thousands of files trying to find "that authentication function"? Or wished you could ask your codebase questions in plain English?

MCP Code Intelligence transforms how you interact with code by understanding what it does, not just what it says. Instead of searching for exact keywords, you describe what you're looking for, and it finds the right code even when it uses completely different terminology.

Real-World Examples

Traditional Search (grep/find):

grep -r "authenticate" .  # Misses: login(), verifyUser(), checkCredentials()

Semantic Search (MCP Code Intelligence):

Query: "user authentication logic"
✓ Finds: login(), authenticate(), verifyCredentials(), checkUserAccess()
✓ Understands context: password hashing, JWT validation, session management

Your Intelligent Coding Companion

Whether you're:

  • ๐Ÿ” Exploring unfamiliar codebases - "Show me how payments are processed"
  • ๐Ÿ› Debugging complex issues - "Find error handling for database connections"
  • ๐Ÿ”ง Refactoring legacy code - "Locate all API endpoint definitions"
  • ๐Ÿ“š Onboarding new developers - Natural language queries instead of tribal knowledge

MCP Code Intelligence becomes your AI-powered guide through any codebase.

🎯 Key Highlights

  • 🔍 Natural Language Code Search: Ask "find user authentication logic" instead of grepping for keywords
  • 🧠 AI-Powered Analysis: Understand code relationships, complexity, and patterns automatically
  • ⚡ Real-time Intelligence: Auto-index changes as you code with zero configuration
  • 🏠 Privacy-First: Run completely local with Ollama - your code never leaves your machine
  • 🔗 Cross-Language Support: Works across JavaScript, Python, PHP, Java, Go, Rust, and more
  • 📊 Rich Insights: Get complexity scores, call graphs, and improvement suggestions

🚀 Features

🧠 Advanced Code Intelligence

  • Multi-language Support: JavaScript, TypeScript, Python, PHP, Java, Go, Rust, and more
  • AST-based Parsing: Deep structural analysis of code components
  • Semantic Search: Find code by meaning, not just keywords
  • Symbol Tracking: Track function calls, method invocations, and class instantiations
  • Call Graph Analysis: Build and traverse dependency relationships between code components

๐Ÿ” Smart Code Analysis

  • Function Chunking: Automatically split large functions into logical sub-chunks (>2000 chars)
  • Smart Truncation: Truncate content at semantic boundaries (end of statements/blocks)
  • Complexity Calculation: Automatic complexity scoring based on control structures
  • Cross-reference Resolution: Link function calls to their definitions across files
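To make the complexity-scoring idea concrete, here is a deliberately simplified sketch. It is not the parser's actual algorithm (which works on the AST), but it captures the intuition: a score starts at 1 and grows with each branching construct.

```javascript
// Illustrative sketch only: approximate a cyclomatic-style complexity
// score by counting branching constructs in raw source text. The real
// CodeParser works on the AST, but the intuition is the same:
// 1 + number of decision points.
function approximateComplexity(source) {
  const branchPatterns = [
    /\bif\b/g, /\bfor\b/g, /\bwhile\b/g, /\bcase\b/g,
    /\bcatch\b/g, /&&/g, /\|\|/g, /\?/g,
  ];
  return branchPatterns.reduce((score, pattern) => {
    const matches = source.match(pattern);
    return score + (matches ? matches.length : 0);
  }, 1);
}

const snippet = `
function demo(a, b) {
  if (a > 0 && b > 0) { return a; }
  for (let i = 0; i < b; i++) { a += i; }
  return a;
}`;
console.log(approximateComplexity(snippet)); // → 4 (if, for, && each add one)
```

Straight-line code scores 1; every extra branch point pushes the score up, which is what the "complexity scoring based on control structures" bullet refers to.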

🎯 Flexible Embedding Providers

  • OpenAI: High-quality semantic embeddings via the text-embedding-3-* models
  • Google Vertex AI: Enterprise-grade embeddings with Google's models
  • Ollama: Local embeddings with models like nomic-embed-text for privacy

⚡ Real-time Indexing

  • Auto-indexing: Automatically index code changes as you work
  • File Watching: Real-time monitoring of file system changes
  • Batch Processing: Efficient processing of large codebases
  • Incremental Updates: Only re-index changed content

🔧 Production Ready

  • Configurable: Extensive configuration options via environment variables
  • Scalable: Handles large codebases with thousands of files
  • Resilient: Robust error handling and recovery mechanisms
  • Observable: Comprehensive logging and status reporting

🚀 Quick Start

Get up and running in under 5 minutes!

📋 Prerequisites

  • Node.js 18+
  • Qdrant vector database (we'll help you set this up)
  • Embedding Provider: Choose from OpenAI, Google Cloud, or local Ollama

⚡ One-Command Setup

# Clone and set up everything
git clone https://github.com/duonglabs/mcp-codeintel.git
cd mcp-codeintel

# Run the automated setup script
chmod +x scripts/setup.sh
./scripts/setup.sh

# The script will:
# - Install dependencies and build the project
# - Set up Qdrant vector database
# - Install and configure Ollama
# - Create your configuration file

🔧 Configure Your MCP Client

Quick Setup: Use our pre-made configuration files in the config/ folder.

Manual Configuration Example (Ollama):

{
  "mcpServers": {
    "code-intelligence": {
      "command": "node",
      "args": ["path/to/mcp-codeintel/dist/server.js", "/path/to/your/workspace"],
      "env": {
        "EMBEDDING_DRIVER": "ollama",
        "OLLAMA_MODEL": "nomic-embed-text",
        "OLLAMA_ENDPOINT": "http://localhost:11434",
        "QDRANT_URL": "http://localhost:6333",
        "AUTO_INDEX_ENABLED": "true"
      }
    }
  }
}

🎉 Start Coding!

That's it! Your code will be automatically indexed, and you can start using natural language to search and analyze your codebase.


โš™๏ธ Configuration

Environment Variables

| Variable | Description | Default | Required |
|---|---|---|---|
| EMBEDDING_DRIVER | Embedding driver (ollama, openai, google-vertex, google-ai-studio) | ollama | ✅ |
| QDRANT_URL | Qdrant database URL | http://localhost:6333 | ✅ |
| QDRANT_COLLECTION | Qdrant collection name (important for multi-project setups) | code_intelligence | ✅ |

Ollama Driver Configuration

| Variable | Description | Default | Required |
|---|---|---|---|
| OLLAMA_MODEL | Ollama model name | nomic-embed-text | ✅ |
| OLLAMA_ENDPOINT | Ollama endpoint URL | http://localhost:11434 | ✅ |
| OLLAMA_DIMENSIONS | Vector dimensions | 768 | ❌ |
| OLLAMA_BATCH_SIZE | Batch size for processing | 50 | ❌ |

OpenAI Driver Configuration

| Variable | Description | Default | Required |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key | - | ✅ |
| OPENAI_MODEL | OpenAI model name | text-embedding-3-small | ❌ |
| OPENAI_DIMENSIONS | Vector dimensions | 1536 | ❌ |
| OPENAI_BATCH_SIZE | Batch size for processing | 100 | ❌ |
| OPENAI_ORGANIZATION | OpenAI organization ID | - | ❌ |

Google Vertex AI Driver Configuration

| Variable | Description | Default | Required |
|---|---|---|---|
| GOOGLE_CLOUD_PROJECT | Google Cloud project ID | - | ✅ |
| VERTEX_MODEL | Vertex AI model name | textembedding-gecko@003 | ❌ |
| VERTEX_LOCATION | Vertex AI location | us-central1 | ❌ |
| VERTEX_DIMENSIONS | Vector dimensions | 768 | ❌ |
| VERTEX_BATCH_SIZE | Batch size for processing | 50 | ❌ |

Google AI Studio Driver Configuration

| Variable | Description | Default | Required |
|---|---|---|---|
| GOOGLE_AI_STUDIO_API_KEY | Google AI Studio API key | - | ✅ |
| GOOGLE_AI_STUDIO_MODEL | AI Studio model name | text-embedding-004 | ❌ |
| GOOGLE_AI_STUDIO_DIMENSIONS | Vector dimensions | 768 | ❌ |
| GOOGLE_AI_STUDIO_BATCH_SIZE | Batch size for processing | 100 | ❌ |

General Configuration

| Variable | Description | Default | Required |
|---|---|---|---|
| AUTO_INDEX_ENABLED | Enable automatic indexing | true | ❌ |
| AUTO_START_INDEXING | Start indexing on server startup | true | ❌ |
| INDEX_EXISTING_ON_START | Index existing files on startup | true | ❌ |
| WATCH_PATTERNS | File patterns to watch (comma-separated) | **/*.{js,ts,py,php,java,go,rs} | ❌ |
| IGNORE_PATTERNS | Patterns to ignore (comma-separated) | node_modules/**,vendor/**,.git/** | ❌ |
| DEBOUNCE_MS | File change debounce time | 1000 | ❌ |
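For illustration, comma-separated variables such as WATCH_PATTERNS and IGNORE_PATTERNS can be parsed like this (a sketch, not the server's actual config loader):

```javascript
// Illustrative sketch: parse a comma-separated pattern variable into an
// array, trimming whitespace and falling back to a default when unset.
function parsePatterns(value, fallback) {
  if (!value || !value.trim()) return fallback;
  return value.split(",").map((p) => p.trim()).filter(Boolean);
}

const watch = parsePatterns(process.env.WATCH_PATTERNS,
  ["**/*.{js,ts,py,php,java,go,rs}"]);
const ignore = parsePatterns("node_modules/**, vendor/**,.git/**", []);
console.log(watch, ignore);
```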

Embedding Driver Setup

Ollama (Local - Recommended)

# Install Ollama first: https://ollama.ai/
ollama pull nomic-embed-text

export EMBEDDING_DRIVER=ollama
export OLLAMA_MODEL=nomic-embed-text
export OLLAMA_ENDPOINT=http://localhost:11434

OpenAI

export EMBEDDING_DRIVER=openai
export OPENAI_MODEL=text-embedding-3-small
export OPENAI_API_KEY=your_api_key_here

Google Vertex AI

export EMBEDDING_DRIVER=google-vertex
export VERTEX_MODEL=textembedding-gecko@003
export GOOGLE_CLOUD_PROJECT=your_project_id
# Ensure you have Google Cloud credentials configured

Google AI Studio

export EMBEDDING_DRIVER=google-ai-studio
export GOOGLE_AI_STUDIO_MODEL=text-embedding-004
export GOOGLE_AI_STUDIO_API_KEY=your_api_key_here

๐Ÿ› ๏ธ Available MCP Tools

The server provides a comprehensive set of tools for intelligent code analysis:

๐Ÿ” Search & Discovery Tools

search_code

Semantic search across your entire codebase using natural language queries.

Parameters:

  • query (string): Natural language description of what you're looking for
  • limit (number, optional): Maximum results to return (default: 10)
  • fileTypes (array, optional): Filter by file extensions (e.g., ["js", "ts", "py"])
  • chunkTypes (array, optional): Filter by code element types (function, class, method, etc.)
  • languages (array, optional): Filter by programming languages

Example:

await mcp.call("search_code", {
  query: "database connection with error handling",
  limit: 5,
  fileTypes: ["js", "ts"],
  chunkTypes: ["function"]
});

find_similar_code

Find code patterns similar to a given snippet with configurable similarity thresholds.

Parameters:

  • codeSnippet (string): Code snippet to find matches for
  • threshold (number, optional): Similarity threshold 0.0-1.0 (default: 0.7)
  • limit (number, optional): Maximum results (default: 10)

analyze_code

Comprehensive analysis of code structure, complexity, and patterns.

Parameters:

  • filePath (string): Path to file for analysis
  • analysisType (string, optional): "structure", "complexity", "dependencies", "patterns", or "all"
  • codeSnippet (string, optional): Analyze specific code snippet instead of entire file

suggest_improvements

AI-powered code improvement suggestions based on best practices and patterns in your codebase.

Parameters:

  • filePath (string): File to analyze for improvements
  • focusArea (string, optional): "performance", "readability", "maintainability", "security", or "all"
  • codeSnippet (string, optional): Focus on specific code snippet

📊 Indexing & Management Tools

start_indexing

Start or restart the code indexing process with customizable options.

Parameters:

  • force (boolean, optional): Force re-indexing of all files (default: false)
  • patterns (array, optional): Specific file patterns to index

stop_indexing

Gracefully stop the indexing process and cleanup resources.

index_file

Manually index a specific file or directory.

Parameters:

  • filePath (string): Path to file or directory to index
  • force (boolean, optional): Force re-indexing even if already indexed

get_indexing_status

Get detailed status of the indexing process including queue size, progress, and performance metrics.

โš™๏ธ Configuration & Status Tools

get_status

Get overall system status including database connection, indexed files count, and performance metrics.

get_config

Retrieve current configuration settings and runtime parameters.

configure_auto_indexing

Update auto-indexing settings at runtime without restarting the server.

Parameters:

  • enabled (boolean, optional): Enable/disable auto-indexing
  • watchPatterns (array, optional): File patterns to watch
  • ignorePatterns (array, optional): Patterns to ignore
  • debounceMs (number, optional): Debounce time for file changes
  • batchSize (number, optional): Batch size for processing
  • autoStart (boolean, optional): Auto-start indexing on server startup

📚 Usage Examples

🔍 Semantic Code Search

// Find authentication-related code across your entire codebase
const results = await mcp.call("search_code", {
  query: "user login authentication with password validation",
  limit: 10,
  fileTypes: ["js", "ts", "py"],
  chunkTypes: ["function", "class"]
});

// Results will include relevant functions even if they don't contain exact keywords

🧠 Intelligent Code Analysis

// Get comprehensive analysis of any file
const analysis = await mcp.call("analyze_code", {
  filePath: "src/auth/UserService.ts",
  analysisType: "all"  // structure, complexity, dependencies, patterns
});

// Returns: complexity scores, function signatures, dependencies, suggestions

🔗 Find Similar Code Patterns

// Discover similar implementations across your codebase
const similar = await mcp.call("find_similar_code", {
  codeSnippet: `
    async function validateUser(email: string, password: string) {
      const user = await User.findByEmail(email);
      return bcrypt.compare(password, user.hashedPassword);
    }
  `,
  threshold: 0.75,
  limit: 5
});

// Find all similar validation patterns, even in different languages

🚀 AI-Powered Improvements

// Get intelligent suggestions for code improvements
const suggestions = await mcp.call("suggest_improvements", {
  filePath: "src/utils/helpers.js",
  focusArea: "performance"  // performance, readability, maintainability, security
});

// Returns specific, actionable improvement recommendations

โš™๏ธ Auto-Indexing Management

// Configure real-time indexing for your workflow
await mcp.call("configure_auto_indexing", {
  enabled: true,
  watchPatterns: ["**/*.{ts,js,py}", "src/**/*.php"],
  ignorePatterns: ["node_modules/**", "*.test.js"],
  debounceMs: 500  // Index changes after 500ms of inactivity
});

// Your code intelligence stays up-to-date automatically

๐Ÿ—๏ธ Architecture & Design

๐Ÿง  Intelligent Processing Pipeline

graph TD
    A[IDE Startup] --> B[Auto-Index Enabled?]
    B -->|Yes| C[Scan Existing Files]
    B -->|No| D[Wait for Manual Trigger]
    
    C --> E[File Change Detected]
    D --> E
    E --> F[Check File Hash]
    F -->|Hash Changed| G[Language Detection]
    F -->|Hash Same| H[Skip Processing]
    
    G --> I[AST Parsing]
    I --> J[Symbol Extraction]
    J --> K[Smart Chunking]
    K --> L[Embedding Generation]
    L --> M[Vector Storage with Hash]
    M --> N[Call Graph Building]
    
    O[Search Query] --> P[Query Embedding]
    P --> Q[Vector Similarity Search]
    Q --> R[Cosine Similarity Ranking]
    R --> S[Response with Context]
    
    style A fill:#e1f5fe
    style C fill:#f3e5f5
    style F fill:#fff3e0
    style H fill:#ffebee
    style R fill:#e8f5e8
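The similarity-ranking step at the end of the pipeline can be made concrete with a naive in-memory version. Qdrant performs this server-side at scale; the sketch just shows the math:

```javascript
// Sketch of the ranking step: score stored vectors against the query
// embedding by cosine similarity and return the top matches.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rank(queryVector, chunks, limit = 10) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const hits = rank([1, 0], [
  { id: "auth.login", vector: [0.9, 0.1] },
  { id: "db.connect", vector: [0.1, 0.9] },
], 1);
console.log(hits[0].id); // → "auth.login" (closest to the query direction)
```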

🔧 Core Components

📁 CodeIntelligenceEngine - The orchestration layer
  • Coordinates all processing components
  • Manages indexing lifecycle and state
  • Handles concurrent operations and queuing
  • Provides unified API for all operations
  • Hash-based duplicate detection to avoid re-processing unchanged files

🔍 CodeParser & AST Analysis - Deep code understanding
  • Multi-language AST parsing using Babel and language-specific parsers
  • Symbol extraction for functions, classes, variables, imports
  • Complexity calculation based on cyclomatic complexity algorithms
  • Smart chunking that respects code boundaries and semantic meaning
  • Content hashing for change detection and incremental updates

🎯 Embedding Providers - Flexible AI backends
  • OpenAI: High-quality embeddings with text-embedding-3-* models
  • Google Vertex AI: Enterprise-grade embeddings with textembedding-gecko
  • Ollama: Privacy-first local embeddings with nomic-embed-text
  • Extensible: Easy to add new providers via the EmbeddingProvider interface

💾 Vector Database Integration - Scalable storage
  • Qdrant integration with automatic collection management
  • Rich metadata storage for filtering and context
  • Cosine similarity search with configurable distance metrics
  • Hash-based tracking to prevent duplicate embeddings
  • Batch operations for optimal performance

👁️ File System Monitoring - Real-time intelligence
  • Chokidar-based file watching with efficient change detection
  • Auto-indexing on startup for existing files (configurable)
  • Debounced processing to handle rapid file changes
  • Pattern-based filtering with glob support
  • Queue management for handling concurrent changes
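The debounced-processing idea above can be sketched as follows (illustrative names; the server wires this to chokidar events internally):

```javascript
// Sketch of debounced file-change handling: rapid saves collapse into a
// single re-index call once the debounce window of quiet has passed.
function debounce(fn, delayMs) {
  let timer = null;
  let pendingArgs = null;
  const debounced = (...args) => {
    pendingArgs = args;
    clearTimeout(timer);
    timer = setTimeout(() => { timer = null; fn(...pendingArgs); }, delayMs);
  };
  // flush() fires any pending call immediately (useful on shutdown)
  debounced.flush = () => {
    if (timer !== null) { clearTimeout(timer); timer = null; fn(...pendingArgs); }
  };
  return debounced;
}

let indexed = [];
const onChange = debounce((file) => indexed.push(file), 1000);
onChange("a.ts"); onChange("a.ts"); onChange("a.ts"); // three rapid saves
onChange.flush();
console.log(indexed); // a single "a.ts" entry: one index call survives
```

This is why raising DEBOUNCE_MS helps on large codebases: more rapid-fire changes collapse into one indexing pass.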

📊 Code Chunk Types & Metadata

Each code element is processed into specialized chunks with rich metadata:

| Chunk Type | Description | Metadata Included |
|---|---|---|
| file | Complete file content (smartly truncated) | Language, complexity, imports, exports |
| class | Individual class definitions | Methods, properties, inheritance, complexity |
| function | Standalone functions and methods | Parameters, return type, calls made, complexity |
| function-part | Sub-chunks of large functions (>2000 chars) | Parent function, logical boundaries, context |
| method | Class methods with context | Class context, access modifiers, overrides |
| variable | Variable declarations and assignments | Type, scope, usage patterns |
| import | Import/require statements | Dependencies, module relationships |
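The function-part splitting can be illustrated with a simplified chunker that cuts at line boundaries (the real chunker also respects statement and block boundaries):

```javascript
// Illustrative sketch of function-part chunking: a function body longer
// than the threshold is split at line boundaries so each sub-chunk stays
// under the limit, and every part records its parent function.
function splitFunction(name, body, maxChars = 2000) {
  if (body.length <= maxChars) return [{ type: "function", name, body }];
  const parts = [];
  let current = "";
  for (const line of body.split("\n")) {
    if (current.length + line.length + 1 > maxChars && current) {
      parts.push(current);
      current = "";
    }
    current += line + "\n";
  }
  if (current) parts.push(current);
  return parts.map((p, i) => ({
    type: "function-part",
    name: `${name}#part${i + 1}`,
    parent: name,
    body: p,
  }));
}

const longBody = Array.from({ length: 50 }, (_, i) => `doStep(${i});`).join("\n");
console.log(splitFunction("bigFn", longBody, 200).map((c) => c.name));
```

Each part carries its parent's name, which is how search results on a sub-chunk can still point back to the enclosing function.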

🔗 Call Graph & Dependency Analysis

The system builds comprehensive dependency graphs by:

  1. Static Analysis: Parse function calls, method invocations, class instantiations
  2. Cross-file Resolution: Link calls to definitions across the codebase
  3. Relationship Mapping: Build bidirectional "calls" and "called-by" relationships
  4. Depth-limited Traversal: Find related code with configurable depth limits
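The relationship-mapping step can be sketched as a small data structure: given resolved caller/callee pairs, build bidirectional edges (field names here are illustrative):

```javascript
// Sketch of bidirectional call-graph construction: each symbol gets a
// "calls" set (outgoing edges) and a "calledBy" set (incoming edges)
// that related-code queries can then traverse with a depth limit.
function buildCallGraph(resolvedCalls) {
  const graph = new Map(); // symbol -> { calls: Set, calledBy: Set }
  const node = (name) => {
    if (!graph.has(name)) graph.set(name, { calls: new Set(), calledBy: new Set() });
    return graph.get(name);
  };
  for (const { caller, callee } of resolvedCalls) {
    node(caller).calls.add(callee);
    node(callee).calledBy.add(caller);
  }
  return graph;
}

const graph = buildCallGraph([
  { caller: "login", callee: "verifyCredentials" },
  { caller: "login", callee: "createSession" },
  { caller: "verifyCredentials", callee: "hashPassword" },
]);
console.log([...graph.get("login").calls]);           // what login calls
console.log([...graph.get("hashPassword").calledBy]); // who calls hashPassword
```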

🚀 Advanced Setup & Deployment

๐Ÿ“ Setup Scripts

All setup scripts are organized in the scripts/ folder for easy access:

scripts/
├── setup.sh              # Complete automated setup (Linux/Mac)
├── setup-ollama.sh       # Install and configure Ollama
├── setup-ollama.bat      # Install Ollama (Windows)
├── setup-qdrant.sh       # Set up Qdrant vector database
├── setup-qdrant.bat      # Set up Qdrant (Windows)
└── start-server.bat      # Start server (Windows)

๐Ÿณ Docker Deployment

For production environments, use our Docker setup:

# Start everything with Docker Compose
docker-compose up -d

# The server will be available with Qdrant pre-configured

โ˜๏ธ Cloud Deployment Options

Google Cloud Platform
# Set up Vertex AI embeddings
export EMBEDDING_PROVIDER=google-vertex
export EMBEDDING_MODEL=textembedding-gecko@003
export GOOGLE_CLOUD_PROJECT=your-project-id

# Deploy to Cloud Run
gcloud run deploy mcp-codeintel \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
AWS Deployment
# Use OpenAI embeddings
export EMBEDDING_PROVIDER=openai
export EMBEDDING_MODEL=text-embedding-3-small
export OPENAI_API_KEY=your-api-key

# Deploy using AWS Lambda or ECS
# See deployment guides in /docs

🔧 Performance Tuning

For large codebases (10,000+ files), optimize these settings:

# High-performance configuration
export BATCH_SIZE=50
export DEBOUNCE_MS=2000
export MAX_CHUNK_SIZE=4000
export FUNCTION_CHUNK_THRESHOLD=3000

# Enable performance features
export CALCULATE_COMPLEXITY=true
export ENABLE_CALL_GRAPH=true
export MAX_RELATED_DEPTH=3

📊 Monitoring & Observability

// Get detailed system metrics
const status = await mcp.call("get_status");
console.log(`Indexed files: ${status.indexedFiles}`);
console.log(`Vector count: ${status.vectorCount}`);
console.log(`Memory usage: ${status.memoryUsage}`);

// Monitor indexing performance
const indexStatus = await mcp.call("get_indexing_status");
console.log(`Queue size: ${indexStatus.queueSize}`);
console.log(`Processing rate: ${indexStatus.filesPerMinute} files/min`);

🔧 Development & Customization

🏗️ Project Structure

mcp-codeintel/
โ”œโ”€โ”€ ๐Ÿ“ src/
โ”‚   โ”œโ”€โ”€ ๐Ÿ”— analysis/          # Call graph and dependency analysis
โ”‚   โ”œโ”€โ”€ โš™๏ธ config/           # Configuration management  
โ”‚   โ”œโ”€โ”€ ๐ŸŽฏ core/             # Core engine and orchestration
โ”‚   โ”œโ”€โ”€ ๐Ÿค– embeddings/       # Embedding provider implementations
โ”‚   โ”œโ”€โ”€ ๐Ÿ“ parsers/          # Code parsing and AST analysis
โ”‚   โ”œโ”€โ”€ ๐Ÿ› ๏ธ tools/            # MCP tool implementations
โ”‚   โ”œโ”€โ”€ ๐Ÿ“‹ types/            # TypeScript type definitions
โ”‚   โ”œโ”€โ”€ ๐Ÿ’พ vector/           # Vector database integration
โ”‚   โ””โ”€โ”€ ๐Ÿ‘๏ธ watchers/         # File system monitoring
โ”œโ”€โ”€ ๐Ÿ“ฆ dist/                 # Compiled JavaScript output
โ”œโ”€โ”€ ๐Ÿณ docker/               # Docker configuration
โ”œโ”€โ”€ ๐Ÿ“œ scripts/              # Setup and utility scripts
โ””โ”€โ”€ ๐Ÿ“š docs/                 # Additional documentation

🚀 Development Workflow

# Development setup
git clone https://github.com/duonglabs/mcp-codeintel.git
cd mcp-codeintel
npm install

# Development with hot reload
npm run dev

# Build for production
npm run build

# Run comprehensive tests
npm test

# Code quality checks
npm run lint
npm run format

🔌 Adding New Language Support

Extend the system to support additional programming languages:

1. Language Detection
// src/parsers/LanguageDetector.ts
private static extensionMap: Record<string, string> = {
  '.kt': 'kotlin',
  '.swift': 'swift',
  '.dart': 'dart',
  // Add your language extension
};
2. AST Parser Implementation
// src/parsers/CodeParser.ts
private async parseKotlin(filePath: string, content: string): Promise<CodeChunk[]> {
  const chunks: CodeChunk[] = [];
  
  // Implement language-specific parsing logic
  // Extract classes, functions, variables, etc.
  
  return chunks;
}
3. Symbol Tracking
// src/parsers/SymbolTracker.ts
private static extractKotlinCalls(lines: string[]): FunctionCall[] {
  const calls: FunctionCall[] = [];
  
  // Extract function calls, method invocations
  // Pattern matching for language-specific syntax
  
  return calls;
}
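For illustration, a naive regex-based extractor for the step above might look like this (deliberately oversimplified: real Kotlin parsing must also skip strings, comments, and declarations):

```javascript
// Naive illustration of regex-based call extraction for a new language.
// It matches identifiers followed by "(" and filters out keywords; a
// production extractor needs far more care, but the shape is the same.
function extractCalls(lines) {
  const calls = [];
  const callPattern = /([A-Za-z_][A-Za-z0-9_]*)\s*\(/g;
  const keywords = new Set(["if", "for", "while", "when", "fun", "catch", "return"]);
  lines.forEach((line, i) => {
    for (const match of line.matchAll(callPattern)) {
      if (!keywords.has(match[1])) {
        calls.push({ name: match[1], line: i + 1 });
      }
    }
  });
  return calls;
}

const kotlin = [
  "fun main() {",
  "    val user = findUser(id)",
  "    if (user != null) greet(user)",
  "}",
];
console.log(extractCalls(kotlin));
```

Note that this naive version also reports the `main` declaration itself as a "call"; a real SymbolTracker implementation must distinguish declarations from invocations.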

🎯 Adding New Embedding Providers

Integrate additional AI services for embeddings:

Provider Implementation
// src/embeddings/HuggingFaceProvider.ts
export class HuggingFaceEmbeddingProvider implements EmbeddingProvider {
  constructor(private config: HuggingFaceConfig) {}

  async embed(text: string): Promise<EmbeddingResult> {
    // Implement API calls to HuggingFace
    // Handle rate limiting, errors, etc.
    return { embedding: vector, model: this.config.model };
  }

  async embedBatch(texts: string[]): Promise<EmbeddingResult[]> {
    // Naive default: embed each text individually; swap in a true
    // batch endpoint for better throughput.
    return Promise.all(texts.map((text) => this.embed(text)));
  }
}
Factory Registration
// src/embeddings/EmbeddingFactory.ts
export class EmbeddingFactory {
  static create(config: EmbeddingConfig): EmbeddingProvider {
    switch (config.provider) {
      case 'huggingface':
        return new HuggingFaceEmbeddingProvider(config);
      // ... other providers
    }
  }
}

🧪 Testing Strategy

# Unit tests for individual components
npm run test:unit

# Integration tests with real databases
npm run test:integration

# End-to-end tests with sample codebases
npm run test:e2e

# Performance benchmarks
npm run test:performance

📊 Custom Analytics & Metrics

Extend the system with custom metrics collection:

// src/analytics/MetricsCollector.ts
export class MetricsCollector {
  trackSearchQuery(query: string, resultCount: number, latency: number) {
    // Custom analytics implementation
  }
  
  trackIndexingPerformance(fileCount: number, duration: number) {
    // Performance monitoring
  }
}

๐Ÿค Contributing

We welcome contributions from developers of all skill levels! Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your contributions make this project better.

🌟 Ways to Contribute

  • 🐛 Bug Reports: Found an issue? Open a bug report
  • ✨ Feature Requests: Have an idea? Suggest a feature
  • 📝 Documentation: Improve our docs, add examples, or write tutorials
  • 🔧 Code: Fix bugs, add features, or optimize performance
  • 🧪 Testing: Add tests, improve coverage, or test on different platforms
  • 🌍 Language Support: Add parsers for new programming languages

🚀 Quick Contribution Guide

  1. Fork & Clone

    git clone https://github.com/duonglabs/mcp-codeintel.git
    cd mcp-codeintel
    npm install
  2. Create Feature Branch

    git checkout -b feature/amazing-new-feature
  3. Make Changes & Test

    npm run build
    npm test
    npm run lint
  4. Commit & Push

    git commit -m "feat: add amazing new feature"
    git push origin feature/amazing-new-feature
  5. Open Pull Request

    • Use our PR template
    • Link related issues
    • Describe your changes clearly

📋 Development Guidelines

  • Code Style: We use Prettier and ESLint - run npm run format and npm run lint
  • Commit Messages: Follow Conventional Commits
  • Testing: Add tests for new features, ensure existing tests pass
  • Documentation: Update README and docs for user-facing changes
  • TypeScript: Use proper types, avoid any when possible

🎯 High-Impact Contribution Areas

🔥 High Priority
  • Language Parsers: Add support for Kotlin, Swift, Dart, Scala
  • Performance: Optimize indexing speed and memory usage
  • Error Handling: Improve resilience and error recovery
  • Testing: Increase test coverage, add integration tests

🚀 Medium Priority
  • Embedding Providers: Add HuggingFace, Cohere, local transformers
  • Vector Databases: Support for Pinecone, Weaviate, Chroma
  • Monitoring: Add metrics, health checks, observability
  • Documentation: API docs, tutorials, deployment guides

💡 Ideas Welcome
  • IDE Integrations: VS Code extension, IntelliJ plugin
  • UI Dashboard: Web interface for code exploration
  • Advanced Analytics: Code quality metrics, technical debt analysis
  • Collaboration: Multi-user features, team insights

๐Ÿ† Recognition

Contributors are recognized in:

  • ๐Ÿ“œ CONTRIBUTORS.md - Hall of fame for all contributors
  • ๐ŸŽ‰ Release notes for significant contributions
  • ๐Ÿท๏ธ GitHub contributor badges and stats
  • ๐Ÿ’ฌ Shoutouts in community discussions

📞 Get Help

Read our full Contributing Guide for detailed information.


📊 Performance & Benchmarks

⚡ Real-World Performance

Tested on a variety of codebases to ensure production-ready performance:

| Codebase Size | Files | Indexing Time | Memory Usage | Search Latency |
|---|---|---|---|---|
| Small (React app) | ~500 files | 2-3 minutes | ~50MB | <50ms |
| Medium (Express API) | ~2,000 files | 8-12 minutes | ~150MB | <75ms |
| Large (Enterprise) | ~10,000 files | 30-45 minutes | ~500MB | <100ms |
| Huge (Monorepo) | ~50,000 files | 2-3 hours | ~2GB | <150ms |

🎯 Optimization Tips

For Large Codebases (10,000+ files)

# Optimize batch processing
export BATCH_SIZE=50
export DEBOUNCE_MS=2000

# Use selective indexing
export WATCH_PATTERNS="src/**/*.{js,ts},lib/**/*.py"
export IGNORE_PATTERNS="node_modules/**,vendor/**,*.min.js,dist/**"

# Trim optional analysis work
export CALCULATE_COMPLEXITY=false  # Disable if not needed
export MAX_CHUNK_SIZE=2000         # Smaller chunks for faster processing

For Memory-Constrained Environments
# Reduce memory footprint
export BATCH_SIZE=10
export MAX_CHUNK_SIZE=1500
export FUNCTION_CHUNK_THRESHOLD=1000

# Disable resource-intensive features
export ENABLE_CALL_GRAPH=false
export DEBUG_PARSING=false

📈 Scaling Strategies

  • Horizontal Scaling: Deploy multiple instances with load balancing
  • Selective Indexing: Index only critical directories and file types
  • Incremental Updates: Leverage auto-indexing for minimal re-processing
  • Caching: Use Redis for embedding cache in distributed setups

🔒 Security & Privacy

๐Ÿ›ก๏ธ Data Protection

  • Local Processing: With Ollama, your code never leaves your machine
  • No Code Storage: Only vector embeddings are stored, not source code
  • Secure Communication: All API calls use HTTPS/TLS encryption
  • Access Control: Configurable file patterns and access restrictions

๐Ÿ” API Security

  • Rate Limiting: Built-in protection against API abuse
  • Input Validation: Comprehensive sanitization of all inputs
  • Error Sanitization: No sensitive information in error messages
  • Key Management: Secure handling of API keys and credentials

๐Ÿ  Privacy-First Setup

For maximum privacy, use the local-only configuration:

# Complete local setup - no external API calls
export EMBEDDING_DRIVER=ollama
export OLLAMA_MODEL=nomic-embed-text
export QDRANT_URL=http://localhost:6333

# Your code intelligence runs entirely on your machine

🌟 Showcase & Examples

🎯 Real-World Use Cases

🔍 Code Exploration & Onboarding

Scenario: New developer joining a large codebase

// Find all authentication-related code
await mcp.call("search_code", {
  query: "user authentication login session management",
  limit: 20
});

// Understand the payment processing flow
await mcp.call("search_code", {
  query: "payment processing stripe checkout billing",
  chunkTypes: ["function", "class"]
});

// Find similar error handling patterns
await mcp.call("find_similar_code", {
  codeSnippet: "try { ... } catch (error) { logger.error(...) }",
  threshold: 0.7
});
๐Ÿ› Bug Investigation & Debugging

Scenario: Investigating a production issue

// Find all code that handles user sessions
await mcp.call("search_code", {
  query: "session timeout expiration cleanup",
  fileTypes: ["js", "ts"]
});

// Analyze complexity of suspicious functions
await mcp.call("analyze_code", {
  filePath: "src/auth/SessionManager.js",
  analysisType: "complexity"
});

// Find similar timeout handling across the codebase
await mcp.call("find_similar_code", {
  codeSnippet: "setTimeout(() => { session.destroy() }, timeout)",
  threshold: 0.6
});
โ™ป๏ธ Refactoring & Code Quality

Scenario: Improving code quality and removing duplication

// Find all database connection patterns
await mcp.call("search_code", {
  query: "database connection pool mysql postgres",
  chunkTypes: ["function"]
});

// Get improvement suggestions for a complex file
await mcp.call("suggest_improvements", {
  filePath: "src/services/UserService.js",
  focusArea: "maintainability"
});

// Find duplicate validation logic
await mcp.call("find_similar_code", {
  codeSnippet: "function validateEmail(email) { return /^[^@]+@[^@]+$/.test(email); }",
  threshold: 0.8
});

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿค What This Means

  • โœ… Commercial Use: Use in commercial projects and products
  • โœ… Modification: Modify and adapt the code for your needs
  • โœ… Distribution: Share and distribute the software
  • โœ… Private Use: Use privately without restrictions
  • โ— Attribution: Include the original license and copyright notice

๐Ÿ™ Acknowledgments & Credits

This project stands on the shoulders of giants. Special thanks to:

๐Ÿ—๏ธ Core Technologies

  • Model Context Protocol - For the excellent MCP specification and ecosystem
  • Qdrant - For the powerful and efficient vector database
  • Ollama - For making local AI accessible and privacy-friendly
  • Babel - For robust JavaScript/TypeScript AST parsing

🤖 AI & Embedding Providers

  • OpenAI - For high-quality embedding models
  • Google Cloud AI - For enterprise-grade Vertex AI embeddings
  • Nomic AI - For the excellent nomic-embed-text model

๐Ÿ› ๏ธ Development Tools

  • TypeScript - For type safety and developer experience
  • Chokidar - For efficient file system watching
  • Jest - For comprehensive testing framework

🌟 Community Contributors

A huge thank you to all our contributors who have helped make this project better:

  • JohnnyDao - Project creator and maintainer. A passionate developer who loves building AI agents and applications that make life easier and more productive.

Want to see your name here? Contribute to the project!

💡 Inspiration & Research

This project was inspired by:

  • Code search tools like GitHub's semantic search and Sourcegraph
  • Academic research on code embeddings and program analysis
  • Developer pain points in understanding large codebases
  • The vision of AI-assisted software development

📞 Support & Community

💬 Get Help

  • 📖 Documentation: Wiki - Comprehensive guides and tutorials
  • 🐛 Bug Reports: Issues - Report bugs and request features
  • 💭 Discussions: GitHub Discussions - Ask questions and share ideas
  • 📧 Direct Contact: hi.duonglabs@gmail.com - For sensitive or private inquiries

๐ŸŒ Community

🚀 Quick Links

  • 📋 Roadmap - See our future plans and upcoming features
  • 🏷️ Releases - Download the latest version
  • 📊 Changelog - See what's new in each version
  • 🤝 Contributing - Join our development community

โ“ FAQ

How does this compare to GitHub Copilot or other AI coding tools?

MCP Code Intelligence focuses on understanding and searching existing code, while Copilot generates new code. We're complementary tools - use MCP to understand your codebase, then use Copilot to write new features based on that understanding.

Can I use this with private/proprietary code?

Absolutely! With Ollama, everything runs locally - your code never leaves your machine. Even with cloud providers, only vector embeddings (not source code) are sent to external services.

What's the difference between this and traditional code search?

Traditional search matches keywords exactly. MCP Code Intelligence understands meaning - search for "user authentication" and find relevant code even if it uses terms like "login", "credentials", or "auth tokens".

How accurate is the code analysis?

Our AST-based parsing is highly accurate for supported languages. Symbol tracking and call graph analysis use heuristics and may miss some dynamic calls, but work well for typical codebases (90%+ accuracy in our testing).


🌟 Star History

Star History Chart


โญ If this project helps you, please consider giving it a star!

Made with โค๏ธ by JohnnyDao and the open source community

Empowering developers with AI-powered code intelligence since 2026

๐Ÿ’– Support the Project

If this project has helped you build better software, consider supporting its continued development:

Your support helps maintain this project and develop new features that benefit the entire developer community!
