Hyperion is an AI-powered code indexing and analysis platform with MCP integration for Claude Code.
This is the fastest verified path from source code to a running Hyperion instance.
- Go 1.21+
- Node.js 18+
- Docker + Docker Compose
```
git clone https://github.com/HyperionWave-AI/dev-squad.git
cd dev-squad
make install
cd hyper
go build -tags dev -o ../bin/hyper ./cmd/coordinator
cd ..
export HYPER_BIN="$(pwd)/bin/hyper"
$HYPER_BIN --help
```
Use `hyper init` in the folder you want Hyperion to manage.
```
mkdir -p ~/hyperion-demo
cd ~/hyperion-demo
$HYPER_BIN init -provider ollama
```
`hyper init` creates:
- `docker-compose.yml`
- `.env.hyper`
- `litellm.config.yaml`
- `HYPER_README.md`

```
docker compose up -d
docker compose logs -f ollama-pull
$HYPER_BIN --mode=http
```
- Web UI: http://localhost:7095
- Health endpoint: http://localhost:7095/api/v1/health

```
curl http://localhost:7095/api/v1/health
```
For provider-specific setup, see the `HYPER_README.md` generated by `hyper init`, or `docs/setup/HYPER_INIT_WITH_PROVIDER.md`.
For a focused local setup with Ollama + Qwen Coder, see docs/setup/QUICK_START.md.
Hyper also includes a native desktop shell built with Tauri in desktop-app/.
The desktop shell starts the local hyper backend as a sidecar process and opens the existing UI automatically.
- Rust toolchain (`rustup`, `cargo`)
- Tauri CLI (`cargo install tauri-cli`)
- Platform dependencies for Tauri (WebKitGTK on Linux, Xcode CLT on macOS, WebView2 on Windows)

```
make desktop
make desktop-build
```
Cross-platform example:
```
make desktop-build PLATFORMS="macos-arm64 windows-amd64 linux-amd64"
```
Bundle outputs are under:
- macOS: `desktop-app/src-tauri/target/<target>/release/bundle/macos/`
- Windows: `desktop-app/src-tauri/target/<target>/release/bundle/msi/`
- Linux: `desktop-app/src-tauri/target/<target>/release/bundle/appimage/`
Hyperion (codebase: hyper) is a unified AI-powered code analysis and coordination platform that integrates with Claude Code via the Model Context Protocol (MCP). It provides intelligent code indexing, semantic search, and AI-assisted development workflows through a single Go binary with multiple runtime modes.
- Single unified binary (`hyper`) with three runtime modes
- AI-powered code understanding via embeddings and vector search
- Claude Code integration through MCP stdio protocol
- REST API + Web UI for standalone use
- Real-time file watching and automatic code indexing
- Multi-embedding support (Ollama, OpenAI, Voyage, TEI)
┌─────────────────────────────────────────────────────────────┐
│ Hyperion (hyper binary) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ HTTP Mode │ │ MCP Mode │ │ Both Mode │ │
│ │ (REST + UI) │ │ (stdio) │ │ (default) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ Port 7095 Claude Code Both Active │
│ Web Browser Integration │
│ │
├─────────────────────────────────────────────────────────────┤
│ Core Services │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Code Indexing & Analysis │ │
│ │ • File watcher (fsnotify) │ │
│ │ • Code parser & tokenizer │ │
│ │ • Semantic indexing │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Embedding & Vector Search │ │
│ │ • Multiple embedding providers │ │
│ │ • Qdrant vector database │ │
│ │ • Semantic similarity search │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ AI Integration │ │
│ │ • LangChain integration │ │
│ │ • Tool definitions (JSON Schema) │ │
│ │ • MCP protocol handlers │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ • MongoDB (metadata, tasks, history) │ │
│ │ • Qdrant (vector embeddings) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
hyper/
├── cmd/
│ └── coordinator/ # Unified binary entry point
│ └── main.go # --mode flag: http|mcp|both
│
├── internal/
│ ├── server/ # HTTP server (Gin framework)
│ │ ├── routes.go # REST API endpoints
│ │ ├── handlers/ # HTTP request handlers
│ │ └── middleware/ # CORS, auth, logging
│ │
│ ├── mcp/ # Model Context Protocol
│ │ ├── handlers/ # MCP tool implementations
│ │ ├── storage/ # MongoDB + Qdrant clients
│ │ ├── embeddings/ # Embedding providers
│ │ ├── indexer/ # Code indexing logic
│ │ ├── watcher/ # File watching
│ │ └── protocol.go # MCP protocol handling
│ │
│ ├── ai-service/
│ │ ├── tools/ # Tool definitions
│ │ └── llm/ # LLM integrations
│ │
│ └── middleware/ # Shared middleware
│
├── embed/ # Embedded UI (auto-generated)
│ └── ui/ # Built React UI
│
├── go.mod # Go dependencies
├── Makefile # Build targets
└── .archived/ # Archived redundant binaries
├── cmd/bridge/
├── cmd/mcp-server/
├── cmd/indexer/
└── cmd/hyper/
coordinator/
└── ui/ # React UI source
├── src/
│ ├── components/ # React components
│ ├── pages/ # Page components
│ ├── services/ # API clients
│ └── App.tsx # Main app
├── dist/ # Built UI (auto-generated)
└── package.json
| Component | Technology | Purpose |
|---|---|---|
| Framework | Gin Web Framework | HTTP server & routing |
| Protocol | MCP Go SDK | Claude Code integration |
| Database | MongoDB | Metadata, tasks, history |
| Vector DB | Qdrant | Semantic search |
| File Watching | fsnotify | Real-time file monitoring |
| Embeddings | Multiple providers | Vector generation |
| Logging | Uber Zap | Structured logging |
| LLM Chain | LangChain Go | AI orchestration |
| JWT | golang-jwt | Authentication |
| WebSocket | Gorilla WebSocket | Real-time updates |
| Component | Technology | Purpose |
|---|---|---|
| Framework | React 18+ | UI library |
| Build Tool | Vite | Fast bundling |
| Styling | TBD | UI styling |
| API Client | Fetch/Axios | REST API communication |
| State | TBD | State management |
| Provider | Model | Use Case |
|---|---|---|
| Ollama (Recommended) | nomic-embed-text | Local, GPU-accelerated, privacy-first (default) |
| OpenAI | text-embedding-3-small | Cloud-based, high quality |
| Voyage AI | voyage-3 | Specialized embeddings |
| TEI | Custom models | Self-hosted embeddings |
Recommendation: use Ollama for embeddings because of:
- Privacy: All code stays on your machine
- Cost: No API fees or rate limits
- Performance: GPU-accelerated local processing
- Offline: Works without internet connection
- Quality: Nomic-embed-text provides excellent code embeddings
| Component | Purpose |
|---|---|
| MongoDB Atlas | Cloud database |
| Qdrant Cloud | Managed vector database |
| Docker | Containerization |
| Docker Compose | Local development |
- Real-time file watching using fsnotify
- Automatic code parsing and tokenization
- Semantic indexing with embeddings
- Incremental updates for performance
- Multi-language support (Go, Python, JavaScript, etc.)
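As a rough illustration of the multi-language support above, language detection can be as simple as mapping file extensions. This is a hypothetical sketch, not the actual logic in `internal/mcp/indexer/`, which may use richer heuristics:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectLanguage maps a file extension to a language name.
// Illustrative only; the real indexer may inspect file contents too.
func detectLanguage(path string) string {
	langs := map[string]string{
		".go": "Go", ".py": "Python", ".js": "JavaScript",
		".ts": "TypeScript", ".tsx": "TypeScript",
		".rs": "Rust", ".java": "Java",
	}
	ext := strings.ToLower(filepath.Ext(path))
	if lang, ok := langs[ext]; ok {
		return lang
	}
	return "unknown"
}

func main() {
	fmt.Println(detectLanguage("internal/server/routes.go")) // Go
	fmt.Println(detectLanguage("ui/src/App.tsx"))            // TypeScript
}
```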
- Vector-based similarity search via Qdrant
- Code snippet retrieval by semantic meaning
- Context-aware search using embeddings
- Filtering and ranking capabilities
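The ranking metric behind vector-based similarity search is typically cosine similarity between embedding vectors. Qdrant computes this server-side; the sketch below only illustrates the math:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two
// embedding vectors: 1.0 means identical direction, 0 means
// orthogonal (unrelated).
func cosineSimilarity(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	query := []float64{0.1, 0.9, 0.2} // embedding of the search query
	doc := []float64{0.1, 0.8, 0.3}   // embedding of a code chunk
	fmt.Printf("%.3f\n", cosineSimilarity(query, doc))
}
```

Results are ranked by this score in descending order; filtering narrows the candidate set before ranking.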
- stdio protocol for direct Claude integration
- Tool definitions in JSON Schema format
- Real-time code analysis from Claude
- Bi-directional communication with Claude Code
- RESTful endpoints for all operations
- React-based web interface on port 7095
- Real-time updates via WebSocket
- Authentication via JWT tokens
- CORS support for cross-origin requests
- Recursive directory monitoring
- Automatic re-indexing on file changes
- Batch processing for efficiency
- Configurable watch patterns
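The "configurable watch patterns" above amount to glob filtering of changed paths before re-indexing. A minimal sketch using the standard library (`shouldIndex` and the patterns are assumptions, not the watcher's actual API):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shouldIndex checks a changed file against include/exclude glob
// patterns, the kind of filter a watcher applies before queuing
// a file for re-indexing. Patterns match the base name only.
func shouldIndex(path string, includes, excludes []string) bool {
	base := filepath.Base(path)
	for _, p := range excludes {
		if ok, _ := filepath.Match(p, base); ok {
			return false
		}
	}
	for _, p := range includes {
		if ok, _ := filepath.Match(p, base); ok {
			return true
		}
	}
	return false
}

func main() {
	includes := []string{"*.go", "*.ts"}
	excludes := []string{"*_test.go"}
	fmt.Println(shouldIndex("internal/server/routes.go", includes, excludes))      // true
	fmt.Println(shouldIndex("internal/server/routes_test.go", includes, excludes)) // false
}
```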
- LangChain integration for AI workflows
- Tool calling for structured AI interactions
- Prompt templates for consistent outputs
- Token counting and cost estimation
```
./bin/hyper --mode=http
```
- REST API on port 7095
- Web UI embedded in binary
- Standalone operation without Claude
- Use case: standalone code analysis tool

```
./bin/hyper --mode=mcp
```
- stdio protocol for Claude Code
- No HTTP server running
- Direct Claude integration
- Use case: Claude Code plugin

```
./bin/hyper --mode=both
./bin/hyper   # Default
```
- HTTP server on port 7095
- MCP stdio for Claude Code
- Both interfaces active simultaneously
- Use case: full-featured development environment
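The three modes can be selected and validated with a single flag. This is a minimal sketch of how such a `--mode` flag might be handled; the real `cmd/coordinator/main.go` may differ in details:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validateMode accepts only the three documented runtime modes.
func validateMode(mode string) bool {
	switch mode {
	case "http", "mcp", "both":
		return true
	}
	return false
}

func main() {
	// "both" is the documented default when no flag is given.
	mode := flag.String("mode", "both", "runtime mode: http|mcp|both")
	flag.Parse()
	if !validateMode(*mode) {
		fmt.Fprintf(os.Stderr, "invalid --mode %q (want http|mcp|both)\n", *mode)
		os.Exit(2)
	}
	fmt.Println("starting in mode:", *mode)
}
```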
```
# MongoDB
MONGODB_URI="mongodb+srv://user:pass@cluster.mongodb.net"
MONGODB_DATABASE="coordinator_db1"

# Qdrant Vector Database
QDRANT_URL="https://qdrant-instance.com"
QDRANT_KNOWLEDGE_COLLECTION="dev_squad_knowledge"

# Embedding Provider (ollama|openai|voyage|tei)
EMBEDDING="ollama"

# Ollama Configuration
OLLAMA_URL="http://localhost:11434"
OLLAMA_MODEL="nomic-embed-text"

# OpenAI Configuration
OPENAI_API_KEY="sk-..."
OPENAI_MODEL="text-embedding-3-small"

# Voyage AI Configuration
VOYAGE_API_KEY="pa-..."

# Server Configuration
PORT="7095"
LOG_LEVEL="info"

# Code Indexing
CODE_INDEX_AUTO_RECREATE="false"
```
- Location: `.env.hyper` (in the executable directory or current directory)
- Priority: custom config path > executable dir > current dir
- Format: standard `.env` format
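For intuition on the `.env` format, a simplified sketch of how such a file can be parsed (this is illustrative; the actual loader in the codebase may handle more cases):

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnv parses .env-style "KEY=value" lines, skipping blanks
// and # comments and stripping surrounding quotes.
func parseEnv(src string) map[string]string {
	out := map[string]string{}
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=value line
		}
		val = strings.Trim(strings.TrimSpace(val), `"'`)
		out[strings.TrimSpace(key)] = val
	}
	return out
}

func main() {
	cfg := parseEnv("# Embedding Provider\nEMBEDDING=\"ollama\"\nPORT=7095\n")
	fmt.Println(cfg["EMBEDDING"], cfg["PORT"]) // ollama 7095
}
```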
- `POST /api/index/scan` - Scan directory for code
- `GET /api/index/status` - Get indexing status
- `DELETE /api/index/clear` - Clear all indexed code
- `POST /api/search/semantic` - Semantic code search
- `GET /api/search/results/:id` - Get search results
- `GET /api/code/:fileId` - Get code file
- `POST /api/analyze` - Analyze code snippet
- `GET /api/dependencies/:fileId` - Get file dependencies
- `GET /api/tasks` - List tasks
- `POST /api/tasks` - Create task
- `GET /api/history` - Get operation history
- `POST /api/mcp/tools` - List available tools
- `POST /api/mcp/execute` - Execute MCP tool
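Calling the semantic search endpoint from Go looks roughly like this. The body fields (`query`, `limit`) are taken from the curl examples in this README and are an assumption about the contract, not a verified API schema:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// searchRequest mirrors the JSON body shown in the curl examples
// for POST /api/search/semantic.
type searchRequest struct {
	Query string `json:"query"`
	Limit int    `json:"limit"`
}

// newSearchRequest builds (but does not send) a semantic search request.
func newSearchRequest(baseURL, query string, limit int) (*http.Request, error) {
	body, err := json.Marshal(searchRequest{Query: query, Limit: limit})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/search/semantic", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSearchRequest("http://localhost:7095", "jwt auth middleware", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// To actually send it against a running instance:
	//   resp, err := http.DefaultClient.Do(req)
}
```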
```
# Build unified binary with embedded UI
make native

# Development with hot reload
make dev-hot

# Run tests
make test
```
- Binary: `bin/hyper` (~16MB with embedded UI)
- Platforms: Linux, macOS, Windows
- Embedded: React UI included in binary
```
# Build Docker image (release Dockerfile)
docker build -f Dockerfile.release -t hyperion:latest .

# Run with Docker Compose
docker-compose up

# Run container
docker run -p 7095:7095 \
  -e MONGODB_URI="..." \
  -e QDRANT_URL="..." \
  hyperion:latest
```
GitHub releases and multi-platform Docker images are automated via:
- `.github/workflows/release.yml` (triggered by pushing tags like `v1.2.3`)
- Published image: `ghcr.io/<owner>/<repo>:<tag>`
- Analyze code changes with AI
- Get semantic understanding of code
- Identify patterns and issues
- Use as Claude Code plugin
- Real-time code analysis in Claude
- Semantic search from Claude
- Find similar code patterns
- Discover related files
- Navigate large codebases
- Auto-generate docs from code
- Create API documentation
- Generate architecture diagrams
- Detect code smells
- Identify refactoring opportunities
- Enforce coding standards
- Index project knowledge
- Store architectural decisions
- Maintain code documentation
```
# Install dependencies
make install

# Install Air for hot reload
make install-air

# Configure environment
cp .env.example .env.hyper

# Start with hot reload (Go + UI)
make dev-hot

# Or just Go hot reload
make dev

# Run tests
make test

# Build for distribution
make native
```
```
# Run all tests
make test

# Run specific test
go test ./internal/mcp/handlers -v

# Test with coverage
go test -cover ./...
```
- Location: `internal/mcp/indexer/`
- Purpose: Parse and index code files
- Features:
- Language detection
- Token extraction
- Function/class identification
- Dependency analysis
- Location: `internal/mcp/embeddings/`
- Purpose: Generate vector embeddings
- Providers:
- Ollama (local, GPU)
- OpenAI (cloud)
- Voyage AI (specialized)
- TEI (self-hosted)
- Location: `internal/mcp/storage/`
- Components:
- MongoDB client (metadata)
- Qdrant client (vectors)
- Collection management
- Query builders
- Location: `internal/mcp/handlers/`
- Purpose: Implement MCP tools
- Tools:
- Code analysis
- Search
- Indexing
- File operations
- Location: `internal/server/`
- Framework: Gin Web Framework
- Features:
- RESTful routing
- Middleware (CORS, auth)
- Error handling
- Request validation
- Speed: ~1000 files/second (depends on file size)
- Memory: ~100MB for 10K files
- Storage: ~1MB per 1000 files (metadata)
- Latency: <100ms for semantic search
- Throughput: 100+ queries/second
- Accuracy: High (vector-based similarity)
- Response Time: <50ms for most endpoints
- Throughput: 1000+ requests/second
- Concurrency: Fully concurrent
- JWT tokens for API access
- Token expiration and refresh
- Role-based access control (RBAC)
- Encryption in transit (HTTPS)
- Encryption at rest (MongoDB)
- API key management for external services
- Local indexing option (Ollama)
- No code sent to external services (unless configured)
- Configurable data retention
Problem: Switching embedding models causes a dimension mismatch.

Solution:
```
# Auto-recreate collection
export CODE_INDEX_AUTO_RECREATE=true
./bin/hyper --mode=http

# Or manually confirm when prompted
```
Problem: Cannot connect to MongoDB.

Solution:
```
# Verify connection string
echo $MONGODB_URI

# Test connection
mongosh "$MONGODB_URI"
```
Problem: Cannot connect to Qdrant.

Solution:
```
# Check Qdrant health
curl https://your-qdrant-url/health

# Verify URL in config
echo $QDRANT_URL
```
Problem: Embedding generation fails.

Solution:
```
# For Ollama (recommended): ensure the service is running
brew services start ollama     # macOS
# or
systemctl start ollama         # Linux with systemd

# Pull the model if not already available
ollama pull nomic-embed-text

# Test embedding generation
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "test"
}'

# For OpenAI: verify API key
echo $OPENAI_API_KEY

# For Voyage AI: verify API key
echo $VOYAGE_API_KEY
```
See the Using Ollama for Embeddings section for detailed setup and troubleshooting.
- Unified binary architecture
- HTTP + MCP modes
- Code indexing
- Vector search
- REST API
- Web UI
- MongoDB integration
- Qdrant integration
- Multiple embedding providers
- File watching
- MCP protocol support
- Advanced code analysis
- Refactoring suggestions
- Architecture visualization
- Performance optimization
- Desktop application
- IDE plugins (VS Code, JetBrains)
- Git integration
- CI/CD integration
- Team collaboration features
- Follow Go conventions
- Use `gofmt` for formatting
- Add tests for new features
- Document public APIs

```
# Run all tests
make test

# Run specific package
go test ./internal/mcp/handlers -v

# With coverage
go test -cover ./...
```
```
# Clean build
make clean && make native

# Verify binary
./bin/hyper --version
```
This project is licensed under the MIT License. See LICENSE for details.
- README: This file (comprehensive overview)
- OLLAMA_SETUP_GUIDE.md: Ollama installation and embedding model selection
- CLEAN_INSTALL_GUIDE.md: Clean installation guide
- CLEAN_INSTALL_COMPLETE.md: Clean install implementation details
- CLEANUP_COMPLETE.md: Build system details
- MAKEFILE_CLEANUP_SUMMARY.md: Makefile reference
- Check troubleshooting section above
- Review environment variables
- Check logs for errors
Use the Install and Try (First) section at the top of this README.
If Hyperion is already installed and initialized, run:
```
docker compose up -d
hyper --mode=http
# or: /path/to/dev-squad/bin/hyper --mode=http
```
Then open:
- Web UI: http://localhost:7095
- Health: http://localhost:7095/api/v1/health
Ollama is our recommended embedding provider for Hyperion because it offers:
- 🔒 Privacy-First: Your code never leaves your machine
- 💰 Zero Cost: No API fees or rate limits
- ⚡ Fast: GPU-accelerated local processing
- 📴 Offline: Works without internet connection
- 🎯 High Quality: Nomic-embed-text model is optimized for code embeddings
- 🚀 Easy Setup: Simple installation and configuration
```
# Install via Homebrew
brew install ollama

# Start Ollama service
brew services start ollama

# Verify installation
ollama --version
```
```
# Install via curl
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve &

# Verify installation
ollama --version
```
```
# Download installer from https://ollama.com/download
# Run the installer and follow instructions
# Ollama will start automatically
```
```
# Pull the recommended nomic-embed-text model
ollama pull nomic-embed-text

# Verify model is available
ollama list
```
Expected output:
```
NAME                      ID            SIZE   MODIFIED
nomic-embed-text:latest   a80c4f17acd5  274MB  2 minutes ago
```
Edit your `.env.hyper` file:
```
# Embedding Provider Configuration
EMBEDDING=ollama

# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text

# Embedding Dimensions (must match model)
EMBEDDING_DIMENSION=768
```
```
# Test Ollama API
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "test code snippet"
}'
```
```
# Build and run
make native
./bin/hyper --mode=http

# Or with hot reload
make dev-hot
```
```
# Start Hyperion with Ollama
./bin/hyper --mode=http

# Open browser to http://localhost:7095
# Navigate to Code Search
# Click "Add Folder" and select your project directory
# Ollama will generate embeddings locally
```
```
# Search for authentication code
curl -X POST http://localhost:7095/api/search/semantic \
  -H "Content-Type: application/json" \
  -d '{
    "query": "user authentication with JWT tokens",
    "limit": 10
  }'
```
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	// Generate embeddings for code
	req := &api.EmbeddingRequest{
		Model:  "nomic-embed-text",
		Prompt: "function calculateTotal(items) { return items.reduce((a,b) => a+b, 0); }",
	}
	resp, err := client.Embeddings(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	// resp.Embedding contains a 768-dimensional vector
	fmt.Println(len(resp.Embedding))
}
```
```
# Try other embedding models
ollama pull mxbai-embed-large   # 335M params, 1024 dimensions
ollama pull all-minilm          # 22M params, 384 dimensions

# Update .env.hyper
OLLAMA_MODEL=mxbai-embed-large
EMBEDDING_DIMENSION=1024
```
Ollama automatically uses GPU if available:
```
# Check GPU usage
nvidia-smi    # NVIDIA GPUs
# or
metal-smi     # Apple Silicon
```
```
# Adjust Ollama settings for performance
# In ~/.ollama/config.json
{
  "num_gpu": 1,        # Number of GPUs to use
  "num_thread": 8,     # CPU threads
  "num_parallel": 4    # Parallel requests
}
```
```
# macOS
brew services restart ollama

# Linux
killall ollama && ollama serve &

# Check status
curl http://localhost:11434/api/tags
```
```
# Re-pull the model
ollama pull nomic-embed-text

# Verify it's available
ollama list
```
```
# Hyperion will detect a dimension mismatch and offer to recreate the collection
# Or manually set auto-recreate:
export CODE_INDEX_AUTO_RECREATE=true
./bin/hyper --mode=http
```
```
# Ensure GPU is being used
ollama ps    # Should show GPU memory usage

# If CPU-only, check GPU drivers
nvidia-smi                           # NVIDIA
system_profiler SPDisplaysDataType   # macOS
```
```
# 1. Install and configure Ollama (see above)

# 2. Update .env.hyper
EMBEDDING=ollama
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
EMBEDDING_DIMENSION=768

# 3. Recreate code index (dimensions changed from 1536 to 768)
export CODE_INDEX_AUTO_RECREATE=true
./bin/hyper --mode=http

# 4. Re-index your code
# The system will automatically use Ollama for new embeddings
```
```
# Similar process, just update the EMBEDDING variable
EMBEDDING=ollama
# Rest of the steps are the same
```
| Provider | Cost per 1M tokens | Notes |
|---|---|---|
| Ollama | Free | Unlimited local usage |
| OpenAI | ~$0.13 | text-embedding-3-small |
| Voyage AI | ~$0.12 | voyage-3 |
| TEI | Infrastructure costs | Self-hosted |
For a typical medium-sized codebase (10K files), you might generate 50M tokens of embeddings, which would cost ~$6.50 with cloud providers but is completely free with Ollama.
- One executable with all features
- No separate services needed
- Easy deployment and distribution
- Reduced complexity and maintenance
- Clear separation of concerns
- Pluggable components (embeddings, storage)
- Easy to extend and customize
- Testable architecture
- MongoDB Atlas for scalability
- Qdrant Cloud for vector search
- Docker support for containerization
- Environment-based configuration
Last Updated: February 22, 2026

Project: Hyperion (hyper)