Version: 1.0.0
Last Updated: January 10, 2026
Target Hardware: NVIDIA RTX 2060 (6GB VRAM) with WSL2
Performance Priority: Maximum Performance
This project provides comprehensive GPU acceleration for the OpenCode CLI, transforming it from a CPU-bound AI assistant into a high-performance, GPU-accelerated development environment. By leveraging NVIDIA CUDA capabilities through Docker containers, we achieve dramatic performance improvements across all computationally intensive operations.
| Metric | Before (CPU) | After (GPU) | Improvement |
|---|---|---|---|
| LLM Inference | 10-30 tokens/s | 200-500 tokens/s | 15-50x faster |
| Vector Search | 100-500 queries/s | 5000-20000 queries/s | 10-100x faster |
| Embeddings | 10-20 docs/s | 100-500 docs/s | 10-25x faster |
| Code Search | 10K-50K lines/s | 50K-200K lines/s | 2-5x faster |
| Token Counting | 5K-10K tokens/s | 20K-50K tokens/s | 3-5x faster |
| File Processing | 10-50 MB/s | 20-150 MB/s | 2-3x faster |
```
┌─────────────────────────────────────────────────────────────────────────┐
│                              OpenCode CLI                                │
│                      (Bun Runtime - 144MB Binary)                        │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │  MCP Gateway  │ │ Tool Registry │ │ LLM Provider  │ │    Session    │ │
│ │  (Port 8090)  │ │  (15+ Tools)  │ │ (40+ Models)  │ │    Manager    │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼─────────────────┼─────────┘
          │                 │                 │                 │
          ▼                 ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          Docker Compose Layer                            │
├─────────────────────────────────────────────────────────────────────────┤
│    ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐    │
│    │    Ollama GPU     │ │  MCP Agents GPU   │ │  vLLM Optimized   │    │
│    │  (LLM Inference)  │ │  (PyTorch/FAISS)  │ │  (Batch Serving)  │    │
│    │ • phi3:3.8b       │ │ • Embeddings      │ │ • High Throughput │    │
│    │ • gemma2:2b       │ │ • Vector Search   │ │ • Low Latency     │    │
│    │ • deepseek-coder  │ │ • Tokenization    │ │ • Continuous      │    │
│    └───────────────────┘ └───────────────────┘ └───────────────────┘    │
├─────────────────────────────────────────────────────────────────────────┤
│                       NVIDIA Container Toolkit                           │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │                        NVIDIA RTX 2060 (6GB)                        │ │
│ │      ┌─────────────┬─────────────┬─────────────┬─────────────┐      │ │
│ │      │   Ollama    │   PyTorch   │    FAISS    │  CUDA Grep  │      │ │
│ │      │    3-4GB    │   1-1.5GB   │   0.5-1GB   │    0.5GB    │      │ │
│ │      └─────────────┴─────────────┴─────────────┴─────────────┘      │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
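With every service resident, the VRAM budget in the bottom row is the scarce resource on a 6GB card. Per-process usage can be checked against it at any time with nvidia-smi's standard query flags:

```bash
# Per-process VRAM usage, to compare against the budget in the diagram above
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```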
Before beginning, ensure your system meets these requirements:
```bash
# 1. Verify NVIDIA Drivers (Windows)
nvidia-smi
# 2. Verify WSL2 GPU Access
wsl.exe -l -v
wsl.exe nvidia-smi
# 3. Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
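# 3b. (Native WSL2 Docker engine only) Register the NVIDIA runtime with Docker.
#     Docker Desktop users can skip this step; Desktop wires up the GPU itself.
sudo nvidia-ctk runtime configure --runtime=docker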
# 4. Restart Docker Desktop
# (Restart from Docker Desktop UI or Windows Services)
```

```bash
# 1. Clone this repository
git clone https://github.com/your-org/gpu-acceleration-dockerized.git
cd gpu-acceleration-dockerized
# 2. Run setup script
chmod +x SCRIPTS/setup-gpu.sh
./SCRIPTS/setup-gpu.sh
# 3. Verify GPU access
./SCRIPTS/test-gpu.sh
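# (Optional) Manual smoke test if the script reports problems; the CUDA image
# tag below is illustrative -- any recent nvidia/cuda base tag works
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi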
# 4. Start GPU-accelerated services
docker-compose -f CONFIG/docker-compose.ollama.gpu.yml up -d
docker-compose -f CONFIG/docker-compose.agent.gpu.yml up -d
# 5. Update OpenCode configuration
cp CONFIG/opencode.gpu.config.jsonc ~/.config/opencode/opencode.jsonc
```

```
gpu-acceleration-dockerized/
├── README.md                          # This file
├── ARCHITECTURE.md                    # Detailed architecture documentation
├── PREREQUISITES.md                   # Setup requirements and verification
├── IMPLEMENTATION/
│   ├── PHASE1_Ollama_GPU.md           # Ollama GPU configuration
│   ├── PHASE2_MCP_Agents_GPU.md       # MCP agent GPU integration
│   ├── PHASE3_Tool_Acceleration.md    # Tool GPU acceleration
│   └── PHASE4_Advanced_Opt.md         # Advanced optimizations
├── CONFIG/
│   ├── docker-compose.ollama.gpu.yml  # Ollama with GPU support
│   ├── docker-compose.agent.gpu.yml   # MCP agents with GPU
│   ├── docker-compose.vllm.yml        # vLLM batch serving
│   ├── opencode.gpu.config.jsonc      # OpenCode GPU config
│   └── nvidia.json                    # NVIDIA device configuration
├── SCRIPTS/
│   ├── setup-gpu.sh                   # Main setup script
│   ├── benchmark.sh                   # Performance benchmarking
│   ├── monitor-gpu.sh                 # GPU monitoring
│   └── test-gpu.sh                    # GPU verification tests
├── MODELS/
│   └── RECOMMENDATIONS.md             # Model recommendations for RTX 2060
├── MONITORING/
│   ├── METRICS.md                     # Performance metrics guide
│   └── TROUBLESHOOTING.md             # Common issues and solutions
└── BENCHMARKS/
    ├── RESULTS.md                     # Benchmark results
    └── COMPARISON.md                  # CPU vs GPU comparison
```
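The GPU compose files under CONFIG/ hand the card to their containers through Docker Compose's standard device-reservation syntax. A minimal sketch of the pattern (illustrative only, not the shipped file; the service name, image tag, and volume are placeholders):

```yaml
# Sketch of GPU passthrough in Compose -- see CONFIG/ for the real files
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"            # Ollama's default API port
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  ollama-models:
```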
Phase 1: Ollama GPU
- Day 1-2: Verify NVIDIA drivers and CUDA in WSL2
- Day 3-4: Configure Ollama with GPU acceleration
- Day 5: Test Ollama models (phi3:3.8b, gemma2:2b)
- Day 6-7: Benchmark LLM inference speed

Phase 2: MCP Agents GPU
- Day 1-3: Update RayAI agent Dockerfile with PyTorch GPU
- Day 4-5: Implement FAISS-GPU vector search
- Day 6-7: Test embeddings and similarity search

Phase 3: Tool Acceleration
- Day 1-2: Deploy CUDA grep for code search
- Day 3-4: Implement GPU tokenizer service
- Day 5-6: Update OpenCode configuration
- Day 7: Full integration testing

Phase 4: Advanced Optimizations
- Day 1-3: Deploy vLLM alongside Ollama
- Day 4-5: Implement CuPy for batch processing
- Day 6: Performance benchmarking
- Day 7: Documentation and tuning
| Model | Size | VRAM | Speed | Use Case |
|---|---|---|---|---|
| gemma2:2b | 2GB | 2GB | Ultra-Fast | Quick tasks, documentation |
| phi3:3.8b | 4GB | 4GB | Very Fast | Code completion, general |
| llama3.2:3b | 3GB | 3GB | Fast | Balanced performance |
| deepseek-coder:6.7b-q4 | 4GB | 4GB | Fast | Code understanding |
| codellama:7b-q4 | 4GB | 4GB | Fast | Code completion |
See MODELS/RECOMMENDATIONS.md for detailed model comparisons.
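Models can be pulled into the running Ollama container ahead of first use; for example (the container name ollama is an assumption -- use whatever name the compose file assigns):

```bash
# Pull the two lightest models first; each fits comfortably in 6GB VRAM
docker exec -it ollama ollama pull gemma2:2b
docker exec -it ollama ollama pull phi3:3.8b
```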
```bash
# Real-time monitoring
./SCRIPTS/monitor-gpu.sh
# Manual monitoring
watch -n 1 nvidia-smi
```

- OpenCode stats: `opencode stats`
- Docker stats: `docker stats`
- Service logs: `docker logs <container_name>`
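Beyond watch, nvidia-smi's built-in device monitor gives a compact rolling view that is handy during benchmarks (-s u selects utilization counters, m memory, -d the sampling interval in seconds):

```bash
# One sample per second: GPU/memory utilization and VRAM counters
nvidia-smi dmon -s um -d 1
```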
See MONITORING/METRICS.md for detailed metrics.
Common issues and solutions are documented in:
- MONITORING/TROUBLESHOOTING.md - Common problems
- PREREQUISITES.md - Setup verification
```bash
# GPU not detected
nvidia-smi
# Docker GPU access failed
sudo systemctl restart docker
# OOM errors
# Reduce model size or GPU memory utilization
```
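For OOM errors specifically, the usual levers on a 6GB card are keeping fewer models resident and capping each service's VRAM share. A sketch under stated assumptions: OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL are Ollama's documented environment variables, and --gpu-memory-utilization is vLLM's standard flag; the right values depend on which services share the card.

```bash
# Ollama: one resident model, one request at a time (set on the container)
docker run -d --gpus all -p 11434:11434 \
  -e OLLAMA_MAX_LOADED_MODELS=1 -e OLLAMA_NUM_PARALLEL=1 \
  ollama/ollama
# vLLM: cap its VRAM share so it can coexist with Ollama on the same card
python -m vllm.entrypoints.openai.api_server \
  --model <model> --gpu-memory-utilization 0.5
```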
Detailed benchmark results are available in:
- BENCHMARKS/RESULTS.md - Raw benchmark data
- BENCHMARKS/COMPARISON.md - CPU vs GPU analysis
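For a quick spot check of token throughput without running the full suite, the Ollama API reports eval counts directly (11434 is Ollama's default port; jq is assumed to be installed; eval_duration is reported in nanoseconds):

```bash
curl -s http://localhost:11434/api/generate \
  -d '{"model": "phi3:3.8b", "prompt": "Explain CUDA streams briefly.", "stream": false}' \
  | jq '{tokens: .eval_count, tok_per_s: (.eval_count / (.eval_duration / 1e9))}'
```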
- Fork the repository
- Create a feature branch (`git checkout -b feature/gpu-optimization`)
- Commit changes (`git commit -m 'Add GPU batch processing'`)
- Push to branch (`git push origin feature/gpu-optimization`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check TROUBLESHOOTING.md
- Review PREREQUISITES.md
- Open an issue on GitHub
GPU-Powered Development at Maximum Performance