A powerful Model Context Protocol (MCP) server that provides intelligent code analysis, semantic search, and structural understanding of codebases through advanced AI-powered indexing and vector embeddings.
🚀 Quick Start • 📖 Documentation • 🛠️ API Reference • 🗺️ Roadmap • 🤝 Contributing • 💬 Community
If this project helps you build better software, consider supporting its development!
I've been using Kilocode with custom API endpoints for months, and one of my favorite features was the automatic code indexing with vector embeddings. It made navigating large codebases incredibly intuitive: just ask questions in natural language and find exactly what you need. Plus, it saved a ton of AI tokens by only loading relevant code into context instead of dumping entire files.
Then yesterday, Kilocode updated and removed this feature entirely. 😢
I searched everywhere for an alternative: existing MCP servers, standalone tools, anything that could provide the same semantic code search experience. Nothing came close to what I needed: real-time indexing, multi-language support, privacy-first options, and deep code understanding.
So I spent an evening building this. MCP Code Intelligence is my answer to that gap: a powerful, flexible, and open-source solution that brings intelligent code search to any MCP-compatible client.
If you find this useful, please consider giving it a ⭐ on GitHub!
Ever spent hours grepping through thousands of files trying to find "that authentication function"? Or wished you could ask your codebase questions in plain English?
MCP Code Intelligence transforms how you interact with code by understanding what it does, not just what it says. Instead of searching for exact keywords, you describe what you're looking for, and it finds the right code, even if it uses completely different terminology.
Traditional Search (grep/find):
grep -r "authenticate" .  # Misses: login(), verifyUser(), checkCredentials()

Semantic Search (MCP Code Intelligence):

Query: "user authentication logic"
✅ Finds: login(), authenticate(), verifyCredentials(), checkUserAccess()
✅ Understands context: password hashing, JWT validation, session management
Whether you're:
- 🔍 Exploring unfamiliar codebases - "Show me how payments are processed"
- 🐛 Debugging complex issues - "Find error handling for database connections"
- 🔧 Refactoring legacy code - "Locate all API endpoint definitions"
- 📚 Onboarding new developers - Natural language queries instead of tribal knowledge
MCP Code Intelligence becomes your AI-powered guide through any codebase.
- 🔍 Natural Language Code Search: Ask "find user authentication logic" instead of grepping for keywords
- 🧠 AI-Powered Analysis: Understand code relationships, complexity, and patterns automatically
- ⚡ Real-time Intelligence: Auto-index changes as you code with zero configuration
- 🔒 Privacy-First: Run completely local with Ollama - your code never leaves your machine
- 🌐 Cross-Language Support: Works across JavaScript, Python, PHP, Java, Go, Rust, and more
- 📊 Rich Insights: Get complexity scores, call graphs, and improvement suggestions
- Multi-language Support: JavaScript, TypeScript, Python, PHP, Java, Go, Rust, and more
- AST-based Parsing: Deep structural analysis of code components
- Semantic Search: Find code by meaning, not just keywords
- Symbol Tracking: Track function calls, method invocations, and class instantiations
- Call Graph Analysis: Build and traverse dependency relationships between code components
- Function Chunking: Automatically split large functions into logical sub-chunks (>2000 chars)
- Smart Truncation: Truncate content at semantic boundaries (end of statements/blocks)
- Complexity Calculation: Automatic complexity scoring based on control structures
- Cross-reference Resolution: Link function calls to their definitions across files
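As a rough illustration of the complexity scoring idea above, the sketch below counts branching constructs and boolean operators, starting from a base of 1 (the single linear path). This is a hypothetical simplification with an invented function name; the actual engine computes complexity from the AST rather than raw text:

```typescript
// Simplified cyclomatic-complexity estimate: +1 per branching construct.
// Regex counting over raw source is only an approximation of AST-based scoring.
function estimateComplexity(source: string): number {
  const branchPatterns = [
    /\bif\b/g, /\bfor\b/g, /\bwhile\b/g,
    /\bcase\b/g, /\bcatch\b/g, /&&/g, /\|\|/g, /\?/g,
  ];
  let score = 1; // one linear path through the code
  for (const pattern of branchPatterns) {
    score += (source.match(pattern) ?? []).length;
  }
  return score;
}
```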
- OpenAI: GPT-based embeddings for high-quality semantic understanding
- Google Vertex AI: Enterprise-grade embeddings with Google's models
- Ollama: Local embeddings with models like nomic-embed-text for privacy
- Auto-indexing: Automatically index code changes as you work
- File Watching: Real-time monitoring of file system changes
- Batch Processing: Efficient processing of large codebases
- Incremental Updates: Only re-index changed content
- Configurable: Extensive configuration options via environment variables
- Scalable: Handles large codebases with thousands of files
- Resilient: Robust error handling and recovery mechanisms
- Observable: Comprehensive logging and status reporting
Get up and running in under 5 minutes!
- Node.js 18+
- Qdrant vector database (we'll help you set this up)
- Embedding Provider: Choose from OpenAI, Google Cloud, or local Ollama
# Clone and setup everything
git clone https://github.com/duonglabs/mcp-codeintel.git
cd mcp-codeintel
# Run the automated setup script
chmod +x scripts/setup.sh
./scripts/setup.sh
# The script will:
# - Install dependencies and build the project
# - Set up Qdrant vector database
# - Install and configure Ollama
# - Create your configuration file

Quick Setup: Use our pre-made configuration files in the config/ folder:
- Ollama (Local): config/mcp-ollama.json
- OpenAI: config/mcp-openai.json
- Google Vertex AI: config/mcp-google-vertex.json
- Google AI Studio: config/mcp-google-ai-studio.json
Manual Configuration Example (Ollama):
{
"mcpServers": {
"code-intelligence": {
"command": "node",
"args": ["path/to/mcp-codeintel/dist/server.js", "/path/to/your/workspace"],
"env": {
"EMBEDDING_DRIVER": "ollama",
"OLLAMA_MODEL": "nomic-embed-text",
"OLLAMA_ENDPOINT": "http://localhost:11434",
"QDRANT_URL": "http://localhost:6333",
"AUTO_INDEX_ENABLED": "true"
}
}
}
}

That's it! Your code will be automatically indexed, and you can start using natural language to search and analyze your codebase.
| Variable | Description | Default | Required |
|---|---|---|---|
| EMBEDDING_DRIVER | Embedding driver (ollama, openai, google-vertex, google-ai-studio) | ollama | ❌ |
| QDRANT_URL | Qdrant database URL | http://localhost:6333 | ❌ |
| QDRANT_COLLECTION | Qdrant collection name (important for multi-project setups) | code_intelligence | ❌ |
| Variable | Description | Default | Required |
|---|---|---|---|
| OLLAMA_MODEL | Ollama model name | nomic-embed-text | ❌ |
| OLLAMA_ENDPOINT | Ollama endpoint URL | http://localhost:11434 | ❌ |
| OLLAMA_DIMENSIONS | Vector dimensions | 768 | ❌ |
| OLLAMA_BATCH_SIZE | Batch size for processing | 50 | ❌ |
| Variable | Description | Default | Required |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key | - | ✅ |
| OPENAI_MODEL | OpenAI model name | text-embedding-3-small | ❌ |
| OPENAI_DIMENSIONS | Vector dimensions | 1536 | ❌ |
| OPENAI_BATCH_SIZE | Batch size for processing | 100 | ❌ |
| OPENAI_ORGANIZATION | OpenAI organization ID | - | ❌ |
| Variable | Description | Default | Required |
|---|---|---|---|
| GOOGLE_CLOUD_PROJECT | Google Cloud project ID | - | ✅ |
| VERTEX_MODEL | Vertex AI model name | textembedding-gecko@003 | ❌ |
| VERTEX_LOCATION | Vertex AI location | us-central1 | ❌ |
| VERTEX_DIMENSIONS | Vector dimensions | 768 | ❌ |
| VERTEX_BATCH_SIZE | Batch size for processing | 50 | ❌ |
| Variable | Description | Default | Required |
|---|---|---|---|
| GOOGLE_AI_STUDIO_API_KEY | Google AI Studio API key | - | ✅ |
| GOOGLE_AI_STUDIO_MODEL | AI Studio model name | text-embedding-004 | ❌ |
| GOOGLE_AI_STUDIO_DIMENSIONS | Vector dimensions | 768 | ❌ |
| GOOGLE_AI_STUDIO_BATCH_SIZE | Batch size for processing | 100 | ❌ |
| Variable | Description | Default | Required |
|---|---|---|---|
| AUTO_INDEX_ENABLED | Enable automatic indexing | true | ❌ |
| AUTO_START_INDEXING | Start indexing on server startup | true | ❌ |
| INDEX_EXISTING_ON_START | Index existing files on startup | true | ❌ |
| WATCH_PATTERNS | File patterns to watch (comma-separated) | **/*.{js,ts,py,php,java,go,rs} | ❌ |
| IGNORE_PATTERNS | Patterns to ignore (comma-separated) | node_modules/**,vendor/**,.git/** | ❌ |
| DEBOUNCE_MS | File change debounce time (ms) | 1000 | ❌ |
# Install Ollama first: https://ollama.ai/
ollama pull nomic-embed-text
export EMBEDDING_DRIVER=ollama
export OLLAMA_MODEL=nomic-embed-text
export OLLAMA_ENDPOINT=http://localhost:11434

export EMBEDDING_DRIVER=openai
export OPENAI_MODEL=text-embedding-3-small
export OPENAI_API_KEY=your_api_key_here

export EMBEDDING_DRIVER=google-vertex
export VERTEX_MODEL=textembedding-gecko@003
export GOOGLE_CLOUD_PROJECT=your_project_id
# Ensure you have Google Cloud credentials configured

export EMBEDDING_DRIVER=google-ai-studio
export GOOGLE_AI_STUDIO_MODEL=text-embedding-004
export GOOGLE_AI_STUDIO_API_KEY=your_api_key_here

The server provides a comprehensive set of tools for intelligent code analysis:
🔍 Search & Discovery Tools
Semantic search across your entire codebase using natural language queries.
Parameters:
- query (string): Natural language description of what you're looking for
- limit (number, optional): Maximum results to return (default: 10)
- fileTypes (array, optional): Filter by file extensions (e.g., ["js", "ts", "py"])
- chunkTypes (array, optional): Filter by code element types (function, class, method, etc.)
- languages (array, optional): Filter by programming languages
Example:
await mcp.call("search_code", {
query: "database connection with error handling",
limit: 5,
fileTypes: ["js", "ts"],
chunkTypes: ["function"]
});

Find code patterns similar to a given snippet with configurable similarity thresholds.
Parameters:
- codeSnippet (string): Code snippet to find matches for
- threshold (number, optional): Similarity threshold 0.0-1.0 (default: 0.7)
- limit (number, optional): Maximum results (default: 10)
Comprehensive analysis of code structure, complexity, and patterns.
Parameters:
- filePath (string): Path to file for analysis
- analysisType (string, optional): "structure", "complexity", "dependencies", "patterns", or "all"
- codeSnippet (string, optional): Analyze specific code snippet instead of entire file
AI-powered code improvement suggestions based on best practices and patterns in your codebase.
Parameters:
- filePath (string): File to analyze for improvements
- focusArea (string, optional): "performance", "readability", "maintainability", "security", or "all"
- codeSnippet (string, optional): Focus on specific code snippet
📊 Indexing & Management Tools
Start or restart the code indexing process with customizable options.
Parameters:
- force (boolean, optional): Force re-indexing of all files (default: false)
- patterns (array, optional): Specific file patterns to index
Gracefully stop the indexing process and cleanup resources.
Manually index a specific file or directory.
Parameters:
- filePath (string): Path to file or directory to index
- force (boolean, optional): Force re-indexing even if already indexed
Get detailed status of the indexing process including queue size, progress, and performance metrics.
⚙️ Configuration & Status Tools
Get overall system status including database connection, indexed files count, and performance metrics.
Retrieve current configuration settings and runtime parameters.
Update auto-indexing settings at runtime without restarting the server.
Parameters:
- enabled (boolean, optional): Enable/disable auto-indexing
- watchPatterns (array, optional): File patterns to watch
- ignorePatterns (array, optional): Patterns to ignore
- debounceMs (number, optional): Debounce time for file changes
- batchSize (number, optional): Batch size for processing
- autoStart (boolean, optional): Auto-start indexing on server startup
// Find authentication-related code across your entire codebase
const results = await mcp.call("search_code", {
query: "user login authentication with password validation",
limit: 10,
fileTypes: ["js", "ts", "py"],
chunkTypes: ["function", "class"]
});
// Results will include relevant functions even if they don't contain exact keywords

// Get comprehensive analysis of any file
const analysis = await mcp.call("analyze_code", {
filePath: "src/auth/UserService.ts",
analysisType: "all" // structure, complexity, dependencies, patterns
});
// Returns: complexity scores, function signatures, dependencies, suggestions

// Discover similar implementations across your codebase
const similar = await mcp.call("find_similar_code", {
codeSnippet: `
async function validateUser(email: string, password: string) {
const user = await User.findByEmail(email);
return bcrypt.compare(password, user.hashedPassword);
}
`,
threshold: 0.75,
limit: 5
});
// Find all similar validation patterns, even in different languages

// Get intelligent suggestions for code improvements
const suggestions = await mcp.call("suggest_improvements", {
filePath: "src/utils/helpers.js",
focusArea: "performance" // performance, readability, maintainability, security
});
// Returns specific, actionable improvement recommendations

// Configure real-time indexing for your workflow
await mcp.call("configure_auto_indexing", {
enabled: true,
watchPatterns: ["**/*.{ts,js,py}", "src/**/*.php"],
ignorePatterns: ["node_modules/**", "*.test.js"],
debounceMs: 500 // Index changes after 500ms of inactivity
});
// Your code intelligence stays up-to-date automatically

graph TD
A[IDE Startup] --> B[Auto-Index Enabled?]
B -->|Yes| C[Scan Existing Files]
B -->|No| D[Wait for Manual Trigger]
C --> E[File Change Detected]
D --> E
E --> F[Check File Hash]
F -->|Hash Changed| G[Language Detection]
F -->|Hash Same| H[Skip Processing]
G --> I[AST Parsing]
I --> J[Symbol Extraction]
J --> K[Smart Chunking]
K --> L[Embedding Generation]
L --> M[Vector Storage with Hash]
M --> N[Call Graph Building]
O[Search Query] --> P[Query Embedding]
P --> Q[Vector Similarity Search]
Q --> R[Cosine Similarity Ranking]
R --> S[Response with Context]
style A fill:#e1f5fe
style C fill:#f3e5f5
style F fill:#fff3e0
style H fill:#ffebee
style R fill:#e8f5e8
🧠 CodeIntelligenceEngine - The orchestration layer
- Coordinates all processing components
- Manages indexing lifecycle and state
- Handles concurrent operations and queuing
- Provides unified API for all operations
- Hash-based duplicate detection to avoid re-processing unchanged files
🔍 CodeParser & AST Analysis - Deep code understanding
- Multi-language AST parsing using Babel and language-specific parsers
- Symbol extraction for functions, classes, variables, imports
- Complexity calculation based on cyclomatic complexity algorithms
- Smart chunking that respects code boundaries and semantic meaning
- Content hashing for change detection and incremental updates
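The content-hashing step above can be sketched as follows. This is an illustrative simplification with invented names (the real engine persists hashes alongside vectors in Qdrant): a file is re-parsed and re-embedded only when its current content hash differs from the one recorded at last indexing time.

```typescript
import { createHash } from "node:crypto";

// In-memory stand-in for the persisted hash store.
const indexedHashes = new Map<string, string>();

function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns true when the file changed since it was last indexed.
function needsReindex(filePath: string, content: string): boolean {
  const hash = contentHash(content);
  if (indexedHashes.get(filePath) === hash) return false; // unchanged: skip
  indexedHashes.set(filePath, hash); // record the new hash before re-indexing
  return true;
}
```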
🎯 Embedding Providers - Flexible AI backends
- OpenAI: High-quality embeddings with text-embedding-3-* models
- Google Vertex AI: Enterprise-grade embeddings with textembedding-gecko
- Ollama: Privacy-first local embeddings with nomic-embed-text
- Extensible: Easy to add new providers via the EmbeddingProvider interface
💾 Vector Database Integration - Scalable storage
- Qdrant integration with automatic collection management
- Rich metadata storage for filtering and context
- Cosine similarity search with configurable distance metrics
- Hash-based tracking to prevent duplicate embeddings
- Batch operations for optimal performance
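The cosine-similarity ranking mentioned above can be illustrated in a few lines. In production this computation runs inside Qdrant; the standalone sketch below (all names hypothetical) just shows what "score each stored vector against the query embedding and return the best matches" means:

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored points by similarity to the query embedding.
function rank(query: number[], points: { id: string; vector: number[] }[], limit = 10) {
  return points
    .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```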
👁️ File System Monitoring - Real-time intelligence
- Chokidar-based file watching with efficient change detection
- Auto-indexing on startup for existing files (configurable)
- Debounced processing to handle rapid file changes
- Pattern-based filtering with glob support
- Queue management for handling concurrent changes
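The debounced-processing idea above can be sketched with a small change queue: rapid saves to the same file collapse into one indexing job once DEBOUNCE_MS of quiet time has passed. This is a hypothetical sketch (class and method names invented); the clock is injected so the logic is deterministic, whereas the real watcher combines chokidar events with timers:

```typescript
// Debounced change queue: a path becomes "ready" to index only after
// debounceMs have elapsed since its most recent change.
class ChangeQueue {
  private pending = new Map<string, number>(); // path -> last-change timestamp
  constructor(private debounceMs: number, private now: () => number) {}

  record(path: string): void {
    this.pending.set(path, this.now()); // a new save resets the window
  }

  // Return (and remove) paths whose last change is older than the window.
  drainReady(): string[] {
    const ready: string[] = [];
    const cutoff = this.now() - this.debounceMs;
    for (const [path, t] of this.pending) {
      if (t <= cutoff) {
        ready.push(path);
        this.pending.delete(path);
      }
    }
    return ready;
  }
}
```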
Each code element is processed into specialized chunks with rich metadata:
| Chunk Type | Description | Metadata Included |
|---|---|---|
| file | Complete file content (smartly truncated) | Language, complexity, imports, exports |
| class | Individual class definitions | Methods, properties, inheritance, complexity |
| function | Standalone functions and methods | Parameters, return type, calls made, complexity |
| function-part | Sub-chunks of large functions (>2000 chars) | Parent function, logical boundaries, context |
| method | Class methods with context | Class context, access modifiers, overrides |
| variable | Variable declarations and assignments | Type, scope, usage patterns |
| import | Import/require statements | Dependencies, module relationships |
The system builds comprehensive dependency graphs by:
- Static Analysis: Parse function calls, method invocations, class instantiations
- Cross-file Resolution: Link calls to definitions across the codebase
- Relationship Mapping: Build bidirectional "calls" and "called-by" relationships
- Depth-limited Traversal: Find related code with configurable depth limits
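The depth-limited traversal step can be sketched as a breadth-first search over the "calls" relationships, stopping after a configurable number of hops (compare MAX_RELATED_DEPTH). Names below are invented for illustration:

```typescript
// Depth-limited BFS over a "calls" graph: everything reachable from
// `start` within maxDepth hops, excluding the start node itself.
function relatedFunctions(
  calls: Map<string, string[]>, // function -> functions it calls
  start: string,
  maxDepth: number,
): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const fn of frontier) {
      for (const callee of calls.get(fn) ?? []) {
        if (!seen.has(callee)) {
          seen.add(callee);
          next.push(callee);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start); // report only related functions, not the root
  return seen;
}
```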
All setup scripts are organized in the scripts/ folder for easy access:
scripts/
├── setup.sh          # Complete automated setup (Linux/Mac)
├── setup-ollama.sh   # Install and configure Ollama
├── setup-ollama.bat  # Install Ollama (Windows)
├── setup-qdrant.sh   # Set up Qdrant vector database
├── setup-qdrant.bat  # Set up Qdrant (Windows)
└── start-server.bat  # Start server (Windows)

For production environments, use our Docker setup:
# Start everything with Docker Compose
docker-compose up -d
# The server will be available with Qdrant pre-configured

Google Cloud Platform
# Set up Vertex AI embeddings
export EMBEDDING_DRIVER=google-vertex
export VERTEX_MODEL=textembedding-gecko@003
export GOOGLE_CLOUD_PROJECT=your-project-id
# Deploy to Cloud Run
gcloud run deploy mcp-codeintel \
--source . \
--platform managed \
--region us-central1 \
  --allow-unauthenticated

AWS Deployment
# Use OpenAI embeddings
export EMBEDDING_DRIVER=openai
export OPENAI_MODEL=text-embedding-3-small
export OPENAI_API_KEY=your-api-key
# Deploy using AWS Lambda or ECS
# See deployment guides in /docs

For large codebases (10,000+ files), optimize these settings:
# High-performance configuration
export BATCH_SIZE=50
export DEBOUNCE_MS=2000
export MAX_CHUNK_SIZE=4000
export FUNCTION_CHUNK_THRESHOLD=3000
# Enable performance features
export CALCULATE_COMPLEXITY=true
export ENABLE_CALL_GRAPH=true
export MAX_RELATED_DEPTH=3

// Get detailed system metrics
const status = await mcp.call("get_status");
console.log(`Indexed files: ${status.indexedFiles}`);
console.log(`Vector count: ${status.vectorCount}`);
console.log(`Memory usage: ${status.memoryUsage}`);
// Monitor indexing performance
const indexStatus = await mcp.call("get_indexing_status");
console.log(`Queue size: ${indexStatus.queueSize}`);
console.log(`Processing rate: ${indexStatus.filesPerMinute} files/min`);

mcp-codeintel/
├── 📁 src/
│   ├── 📊 analysis/     # Call graph and dependency analysis
│   ├── ⚙️ config/       # Configuration management
│   ├── 🎯 core/         # Core engine and orchestration
│   ├── 🤖 embeddings/   # Embedding provider implementations
│   ├── 📝 parsers/      # Code parsing and AST analysis
│   ├── 🛠️ tools/        # MCP tool implementations
│   ├── 📋 types/        # TypeScript type definitions
│   ├── 💾 vector/       # Vector database integration
│   └── 👁️ watchers/     # File system monitoring
├── 📦 dist/             # Compiled JavaScript output
├── 🐳 docker/           # Docker configuration
├── 📜 scripts/          # Setup and utility scripts
└── 📚 docs/             # Additional documentation
# Development setup
git clone https://github.com/duonglabs/mcp-codeintel.git
cd mcp-codeintel
npm install
# Development with hot reload
npm run dev
# Build for production
npm run build
# Run comprehensive tests
npm test
# Code quality checks
npm run lint
npm run format

Extend the system to support additional programming languages:
1. Language Detection
// src/parsers/LanguageDetector.ts
private static extensionMap: Record<string, string> = {
'.kt': 'kotlin',
'.swift': 'swift',
'.dart': 'dart',
// Add your language extension
};

2. AST Parser Implementation
// src/parsers/CodeParser.ts
private async parseKotlin(filePath: string, content: string): Promise<CodeChunk[]> {
const chunks: CodeChunk[] = [];
// Implement language-specific parsing logic
// Extract classes, functions, variables, etc.
return chunks;
}

3. Symbol Tracking
// src/parsers/SymbolTracker.ts
private static extractKotlinCalls(lines: string[]): FunctionCall[] {
const calls: FunctionCall[] = [];
// Extract function calls, method invocations
// Pattern matching for language-specific syntax
return calls;
}

Integrate additional AI services for embeddings:
Provider Implementation
// src/embeddings/HuggingFaceProvider.ts
export class HuggingFaceEmbeddingProvider implements EmbeddingProvider {
constructor(private config: HuggingFaceConfig) {}
async embed(text: string): Promise<EmbeddingResult> {
// Implement API calls to HuggingFace
// Handle rate limiting, errors, etc.
return { embedding: vector, model: this.config.model };
}
  async embedBatch(texts: string[]): Promise<EmbeddingResult[]> {
    // Batch processing implementation (simplest form: embed each text in parallel)
    return Promise.all(texts.map((text) => this.embed(text)));
  }
}

Factory Registration
// src/embeddings/EmbeddingFactory.ts
export class EmbeddingFactory {
static create(config: EmbeddingConfig): EmbeddingProvider {
switch (config.provider) {
case 'huggingface':
return new HuggingFaceEmbeddingProvider(config);
// ... other providers
}
}
}

# Unit tests for individual components
npm run test:unit
# Integration tests with real databases
npm run test:integration
# End-to-end tests with sample codebases
npm run test:e2e
# Performance benchmarks
npm run test:performance

Extend the system with custom metrics collection:
// src/analytics/MetricsCollector.ts
export class MetricsCollector {
trackSearchQuery(query: string, resultCount: number, latency: number) {
// Custom analytics implementation
}
trackIndexingPerformance(fileCount: number, duration: number) {
// Performance monitoring
}
}

We welcome contributions from developers of all skill levels! Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your contributions make this project better.
- 🐛 Bug Reports: Found an issue? Open a bug report
- ✨ Feature Requests: Have an idea? Suggest a feature
- 📝 Documentation: Improve our docs, add examples, or write tutorials
- 🔧 Code: Fix bugs, add features, or optimize performance
- 🧪 Testing: Add tests, improve coverage, or test on different platforms
- 🌍 Language Support: Add parsers for new programming languages
1. Fork & Clone

   git clone https://github.com/duonglabs/mcp-codeintel.git
   cd mcp-codeintel
   npm install

2. Create Feature Branch

   git checkout -b feature/amazing-new-feature

3. Make Changes & Test

   npm run build
   npm test
   npm run lint

4. Commit & Push

   git commit -m "feat: add amazing new feature"
   git push origin feature/amazing-new-feature

5. Open Pull Request
- Use our PR template
- Link related issues
- Describe your changes clearly
- Code Style: We use Prettier and ESLint - run npm run format and npm run lint
- Commit Messages: Follow Conventional Commits
- Testing: Add tests for new features, ensure existing tests pass
- Documentation: Update README and docs for user-facing changes
- TypeScript: Use proper types, avoid any when possible
🔥 High Priority
- Language Parsers: Add support for Kotlin, Swift, Dart, Scala
- Performance: Optimize indexing speed and memory usage
- Error Handling: Improve resilience and error recovery
- Testing: Increase test coverage, add integration tests
🚀 Medium Priority
- Embedding Providers: Add HuggingFace, Cohere, local transformers
- Vector Databases: Support for Pinecone, Weaviate, Chroma
- Monitoring: Add metrics, health checks, observability
- Documentation: API docs, tutorials, deployment guides
💡 Ideas Welcome
- IDE Integrations: VS Code extension, IntelliJ plugin
- UI Dashboard: Web interface for code exploration
- Advanced Analytics: Code quality metrics, technical debt analysis
- Collaboration: Multi-user features, team insights
Contributors are recognized in:
- 📜 CONTRIBUTORS.md - Hall of fame for all contributors
- 🎉 Release notes for significant contributions
- 🏷️ GitHub contributor badges and stats
- 💬 Shoutouts in community discussions
- 💬 Discussions: GitHub Discussions for questions and ideas
- 🐛 Issues: GitHub Issues for bugs and feature requests
- 📧 Direct Contact: hi.duonglabs@gmail.com for sensitive topics
Read our full Contributing Guide for detailed information.
Tested on a variety of codebases to ensure production-ready performance:
| Codebase Size | Files | Indexing Time | Memory Usage | Search Latency |
|---|---|---|---|---|
| Small (React app) | ~500 files | 2-3 minutes | ~50MB | <50ms |
| Medium (Express API) | ~2,000 files | 8-12 minutes | ~150MB | <75ms |
| Large (Enterprise) | ~10,000 files | 30-45 minutes | ~500MB | <100ms |
| Huge (Monorepo) | ~50,000 files | 2-3 hours | ~2GB | <150ms |
For Large Codebases (10,000+ files)
# Optimize batch processing
export BATCH_SIZE=50
export DEBOUNCE_MS=2000
# Use selective indexing
export WATCH_PATTERNS="src/**/*.{js,ts},lib/**/*.py"
export IGNORE_PATTERNS="node_modules/**,vendor/**,*.min.js,dist/**"
# Enable performance features
export CALCULATE_COMPLEXITY=false # Disable if not needed
export MAX_CHUNK_SIZE=2000        # Smaller chunks for faster processing

For Memory-Constrained Environments
# Reduce memory footprint
export BATCH_SIZE=10
export MAX_CHUNK_SIZE=1500
export FUNCTION_CHUNK_THRESHOLD=1000
# Disable resource-intensive features
export ENABLE_CALL_GRAPH=false
export DEBUG_PARSING=false

- Horizontal Scaling: Deploy multiple instances with load balancing
- Selective Indexing: Index only critical directories and file types
- Incremental Updates: Leverage auto-indexing for minimal re-processing
- Caching: Use Redis for embedding cache in distributed setups
- Local Processing: With Ollama, your code never leaves your machine
- No Code Storage: Only vector embeddings are stored, not source code
- Secure Communication: All API calls use HTTPS/TLS encryption
- Access Control: Configurable file patterns and access restrictions
- Rate Limiting: Built-in protection against API abuse
- Input Validation: Comprehensive sanitization of all inputs
- Error Sanitization: No sensitive information in error messages
- Key Management: Secure handling of API keys and credentials
For maximum privacy, use the local-only configuration:
# Complete local setup - no external API calls
export EMBEDDING_DRIVER=ollama
export OLLAMA_MODEL=nomic-embed-text
export QDRANT_URL=http://localhost:6333
# Your code intelligence runs entirely on your machine

🔍 Code Exploration & Onboarding
Scenario: New developer joining a large codebase
// Find all authentication-related code
await mcp.call("search_code", {
query: "user authentication login session management",
limit: 20
});
// Understand the payment processing flow
await mcp.call("search_code", {
query: "payment processing stripe checkout billing",
chunkTypes: ["function", "class"]
});
// Find similar error handling patterns
await mcp.call("find_similar_code", {
codeSnippet: "try { ... } catch (error) { logger.error(...) }",
threshold: 0.7
});

🐛 Bug Investigation & Debugging
Scenario: Investigating a production issue
// Find all code that handles user sessions
await mcp.call("search_code", {
query: "session timeout expiration cleanup",
fileTypes: ["js", "ts"]
});
// Analyze complexity of suspicious functions
await mcp.call("analyze_code", {
filePath: "src/auth/SessionManager.js",
analysisType: "complexity"
});
// Find similar timeout handling across the codebase
await mcp.call("find_similar_code", {
codeSnippet: "setTimeout(() => { session.destroy() }, timeout)",
threshold: 0.6
});

♻️ Refactoring & Code Quality
Scenario: Improving code quality and removing duplication
// Find all database connection patterns
await mcp.call("search_code", {
query: "database connection pool mysql postgres",
chunkTypes: ["function"]
});
// Get improvement suggestions for a complex file
await mcp.call("suggest_improvements", {
filePath: "src/services/UserService.js",
focusArea: "maintainability"
});
// Find duplicate validation logic
await mcp.call("find_similar_code", {
codeSnippet: "function validateEmail(email) { return /^[^@]+@[^@]+$/.test(email); }",
threshold: 0.8
});

This project is licensed under the MIT License - see the LICENSE file for details.
- ✅ Commercial Use: Use in commercial projects and products
- ✅ Modification: Modify and adapt the code for your needs
- ✅ Distribution: Share and distribute the software
- ✅ Private Use: Use privately without restrictions
- ℹ️ Attribution: Include the original license and copyright notice
This project stands on the shoulders of giants. Special thanks to:
- Model Context Protocol - For the excellent MCP specification and ecosystem
- Qdrant - For the powerful and efficient vector database
- Ollama - For making local AI accessible and privacy-friendly
- Babel - For robust JavaScript/TypeScript AST parsing
- OpenAI - For high-quality embedding models
- Google Cloud AI - For enterprise-grade Vertex AI embeddings
- Nomic AI - For the excellent nomic-embed-text model
- TypeScript - For type safety and developer experience
- Chokidar - For efficient file system watching
- Jest - For comprehensive testing framework
A huge thank you to all our contributors who have helped make this project better:
- JohnnyDao - Project creator and maintainer. A passionate developer who loves building AI agents and applications that make life easier and more productive.
Want to see your name here? Contribute to the project!
This project was inspired by:
- Code search tools like GitHub's semantic search and Sourcegraph
- Academic research on code embeddings and program analysis
- Developer pain points in understanding large codebases
- The vision of AI-assisted software development
- 📖 Documentation: Wiki - Comprehensive guides and tutorials
- 🐛 Bug Reports: Issues - Report bugs and request features
- 💭 Discussions: GitHub Discussions - Ask questions and share ideas
- 📧 Direct Contact: hi.duonglabs@gmail.com - For sensitive or private inquiries
- 🐦 Twitter: @JohnnyDao_Dev - Updates and announcements
- 💼 LinkedIn: JohnnyDao - Professional updates
- 🎮 Discord: Join our server - Real-time chat and support
- 📺 YouTube: JohnnyDao AI - Tutorials and demos
- 🗺️ Roadmap - See our exciting future plans and intelligent features
- 🏷️ Releases - Download the latest version
- 📝 Changelog - See what's new in each version
- 🤝 Contributing - Join our development community
How does this compare to GitHub Copilot or other AI coding tools?
MCP Code Intelligence focuses on understanding and searching existing code, while Copilot generates new code. We're complementary tools - use MCP to understand your codebase, then use Copilot to write new features based on that understanding.
Can I use this with private/proprietary code?
Absolutely! With Ollama, everything runs locally - your code never leaves your machine. Even with cloud providers, only vector embeddings (not source code) are sent to external services.
What's the difference between this and traditional code search?
Traditional search matches keywords exactly. MCP Code Intelligence understands meaning - search for "user authentication" and find relevant code even if it uses terms like "login", "credentials", or "auth tokens".
How accurate is the code analysis?
Our AST-based parsing is highly accurate for supported languages. Symbol tracking and call graph analysis use heuristics and may miss some dynamic calls, but work well for typical codebases (90%+ accuracy in our testing).
Made with ❤️ by JohnnyDao and the open source community
Empowering developers with AI-powered code intelligence since 2026
If this project has helped you build better software, consider supporting its continued development:
Your support helps maintain this project and develop new features that benefit the entire developer community!