Cost-Optimized AI Assistant with Hybrid Architecture
Self-hosted • Skill-Based • $3-5/month vs $97/month alternatives
SecureBot is a cost-optimized, self-hosted AI assistant that combines the best of both worlds:
- Local inference with Ollama (ANY model you want - runs on YOUR hardware)
- Cloud power with Claude API (only for complex tasks and skill creation)
- 97% cost savings - $3-5/month instead of $97/month
- Skill-based architecture - Create once with Claude, reuse forever FREE with local models
Created by Roland (Rojman1984) • Built with AI assistance
- Hardware Flexibility - Works on ANY machine from budget laptops to GPU servers
- Extreme Cost Efficiency - Skills created with Claude API run FREE locally forever
- Security First - Secrets isolated in vault, never exposed to AI models
- Multi-Provider Search - Google Custom Search, Tavily, DuckDuckGo with auto-fallback
- Reusable Skills - Create AI capabilities once, use infinitely at zero marginal cost
- Zero-Shot Routing - GLiClass (144M params, <50ms) routes by intent: search, task, knowledge, chat, or action; no heuristics, no scoring
- Memory & Continuity - Persistent context across sessions with system-native automation
- System-Native Heartbeat - systemd timers (not Python loops) for reliability
- Automation Skills - Teach cron, systemd, bash, and ansible best practices
- Docker-Native - Simple deployment with docker-compose
- Multi-Channel Ready - API endpoints for Telegram, Discord, CLI, or custom integrations
The faster your hardware, the faster your responses - but SecureBot works on ANY machine!
SecureBot uses Ollama for local inference, which means YOU choose the model based on YOUR hardware:
| Hardware | Recommended Model | Response Speed | Monthly Cost |
|---|---|---|---|
| Budget (8GB RAM) | phi4-mini:3.8b | ~50 seconds | $0 |
| Mid (16GB RAM) | llama3:8b | ~30 seconds | $0 |
| AMD Ryzen AI Max | llama3:70b | ~5 seconds | $0 |
| Mac Mini M4 | llama3:70b | ~3 seconds | $0 |
| Mac Studio M4 Max | llama3:405b | ~5 seconds | $0 |
| GPU Server | Any model | <1 second | $0 |
Recommended Sweet Spots:
- Mac Mini M4 ($599) - Best price/performance for Apple Silicon
- AMD Ryzen AI Max - Best for Windows/Linux with integrated NPU + large iGPU
- Budget Start - Begin with phi4-mini on ANY machine, upgrade hardware later
- Enterprise - Add GPU server for sub-second responses
Key Point: Claude API handles complex tasks (skill creation, architecture decisions) regardless of your local hardware. Your local model only handles simple execution and search summarization.
See docs/HARDWARE.md for detailed setup guides and benchmarks.
┌─────────────────────────────────────────────────────────────────┐
│                          USER REQUEST                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       GATEWAY (Port 8080)                       │
│  • Multi-channel message routing (Telegram/Discord/API)         │
│  • Search detection and orchestration                           │
│  • Memory context loading (soul/user/session)                   │
│  • Request/response formatting                                  │
└─────────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            ▼                                   ▼
┌──────────────────────────┐      ┌──────────────────────────┐
│  MEMORY SERVICE (8300)   │      │   VAULT (Port 8200)      │
│  • soul.md (identity)    │      │  • Secrets isolation     │
│  • user.md (profile)     │      │  • API key injection     │
│  • session.md (context)  │      │  • Search providers:     │
│  • tasks.json (todos)    │      │    - Google (100/day)    │
│  • REST API for files    │      │    - Tavily (1000/mo)    │
└──────────────────────────┘      │    - DuckDuckGo (free)   │
            │                     └──────────────────────────┘
            └─────────────────┬─────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            ▼                                   ▼
┌──────────────────────────┐      ┌──────────────────────────┐
│   VAULT (Port 8200)      │      │   OLLAMA (Port 11434)    │
│  • Secrets isolation     │      │  • Local inference       │
│  • API key injection     │      │  • ANY model YOU choose  │
│  • Search providers:     │      │  • Zero marginal cost    │
│    - Google (100/day)    │      │  • Speed = YOUR hardware │
│    - Tavily (1000/mo)    │      │  • phi4-mini (default)   │
│    - DuckDuckGo (free)   │      │  • llama3:8b             │
└──────────────────────────┘      │  • llama3:70b            │
            │                     │  • llama3:405b           │
            ▼                     │  • Custom models         │
┌──────────────────────────┐      └──────────────────────────┘
│  CLAUDE API (On-Demand)  │                    │
│  • Skill creation ($$$)  │                    │
│  • Complex reasoning     │                    │
│  • Architecture design   │                    │
│  • ~$0.006 per query     │                    │
└──────────────────────────┘                    │
            │                                   │
            └─────────────────┬─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                ORCHESTRATOR (Zero-Shot Routing)                 │
│                                                                 │
│  [1] GLiClass Classification (144M params · <50ms)              │
│      │                                                          │
│      ├── search    → Vault Web Search → Ollama summary (FREE)   │
│      ├── task      → Memory tasks.json → Ollama summary (FREE)  │
│      ├── knowledge → ChromaDB RAG context → Ollama (FREE)       │
│      ├── chat      → ChromaDB RAG context → Ollama (FREE)       │
│      └── action    → [2] SkillRegistry (deterministic match)    │
│                      ├── Match    → Execute locally (FREE)      │
│                      └── No match → [3] CodeBot (:8500)         │
│                                     Pi + Haiku → Save →         │
│                                     Execute (~$0.01 once)       │
│                                     (fallback: Haiku direct     │
│                                     if CodeBot unavailable)     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        SKILLS DIRECTORY                         │
│  • SKILL.md format (Claude Code compatible)                     │
│  • Reusable AI capabilities                                     │
│  • Created once ($$$), execute forever (FREE)                   │
│  • Categories: search, code, stt, tts, general                  │
└─────────────────────────────────────────────────────────────────┘
See docs/ARCHITECTURE.md for technical deep dive.
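
The orchestrator's routing steps can be illustrated with a minimal Python sketch. It is not SecureBot's actual orchestrator code: `route`, `LOCAL_ROUTES`, and the exact-match skill lookup are hypothetical names and simplifications, and the classifier is passed in as a plain callable so that the GLiClass model can be swapped for a stub.

```python
from typing import Callable, Dict

# Intents handled entirely by the local model, with the path each one takes.
LOCAL_ROUTES: Dict[str, str] = {
    "search": "vault web search -> Ollama summary",
    "task": "memory tasks.json -> Ollama summary",
    "knowledge": "ChromaDB RAG context -> Ollama",
    "chat": "ChromaDB RAG context -> Ollama",
}


def route(text: str, classify: Callable[[str], str], skills: Dict[str, str]) -> str:
    """Return a description of the path a request would take."""
    intent = classify(text)  # step [1]: intent classification
    if intent in LOCAL_ROUTES:
        return LOCAL_ROUTES[intent]
    if intent == "action":
        # Step [2]: deterministic lookup. Exact match on normalized text is an
        # assumption; the real SkillRegistry matching rule is not shown here.
        skill = skills.get(text.strip().lower())
        if skill is not None:
            return f"execute skill '{skill}' locally"
        # Step [3]: no match, so a new skill is generated once and saved.
        return "CodeBot creates skill -> save -> execute"
    raise ValueError(f"unknown intent: {intent!r}")


if __name__ == "__main__":
    registry = {"audit this python file": "python-security-audit"}
    # The lambda stands in for the classifier and always answers "action".
    print(route("Audit this Python file", lambda _: "action", registry))
```

Only the final branch (no matching skill) would reach a paid API in this sketch; every other path stays on local services.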
| Service | Monthly Cost | Features |
|---|---|---|
| SecureBot | $3-5 | Self-hosted, unlimited local use |
| Claude AI Pro | $97 | Web interface, limited features |
| ChatGPT Plus | $20 | Web interface, rate limited |
| Anthropic API | ~$50-200 | Pay-per-token, no optimization |
How SecureBot Achieves 97% Savings:
- Local Inference - Ollama runs on YOUR hardware (zero marginal cost)
- Skill Reuse - Create skill once with Claude ($0.10), execute unlimited times FREE
- Zero-Shot Routing - GLiClass intent classification routes all queries to the optimal free local path; cloud API used only for new skill creation
- Free Search Tiers - Google (100/day), Tavily (1000/mo), DuckDuckGo (unlimited)
- Secrets Management - No accidental API calls leaking credentials
Example Month:
- 300 simple queries → Ollama → $0
- 20 search queries → Free tiers → $0
- 5 new skills created → Claude API → $0.50
- 10 complex queries → Claude API → $0.06
- Total: ~$0.56 (vs $97 for Claude Pro)
See docs/COST_ANALYSIS.md for detailed breakdown.
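
The Example Month total can be reproduced with a few lines of Python. The unit prices ($0.10 per skill, $0.006 per complex query) are the figures quoted in this README; `USAGE` and `monthly_total` are illustrative names, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

# (description, count per month, cost per unit in USD), as quoted in this README
USAGE = [
    ("simple queries via Ollama", 300, Decimal("0")),
    ("search queries via free tiers", 20, Decimal("0")),
    ("new skills via Claude API", 5, Decimal("0.10")),
    ("complex queries via Claude API", 10, Decimal("0.006")),
]


def monthly_total(usage):
    """Sum count * unit price over every usage row."""
    return sum((count * unit for _, count, unit in usage), Decimal("0"))


if __name__ == "__main__":
    for label, count, unit in USAGE:
        print(f"{count:>4} x {label}: ${count * unit:.2f}")
    print(f"Total: ${monthly_total(USAGE):.2f}")
```

Changing the counts in `USAGE` gives a quick estimate for a different usage pattern.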
- Docker & Docker Compose
- Ollama installed and running
- 8GB+ RAM minimum (16GB+ recommended)
- Anthropic API key (for skill creation)
# 1. Install Docker (if not already installed)
# Linux: https://docs.docker.com/engine/install/
# Mac: https://docs.docker.com/desktop/install/mac-install/
# Windows: https://docs.docker.com/desktop/install/windows-install/
# 2. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 3. Pull a model (choose based on your hardware)
# Budget (8GB RAM):
ollama pull phi4-mini:3.8b
# Mid-range (16GB RAM):
ollama pull llama3:8b
# Mac Mini M4 or AMD Ryzen AI Max (32GB+ RAM):
ollama pull llama3:70b
# High-end (64GB+ RAM); the 405B model is published under the llama3.1 tag:
ollama pull llama3.1:405b
# 4. Clone SecureBot
git clone https://github.com/Rojman1984/securebot.git
cd securebot
# 5. Configure secrets
mkdir -p vault/secrets
cat > vault/secrets/secrets.json << 'EOF'
{
"anthropic_api_key": "your-anthropic-api-key-here",
"search": {
"google_api_key": "your-google-api-key-optional",
"google_cx": "your-google-cx-optional",
"tavily_api_key": "your-tavily-api-key-optional"
}
}
EOF
# 6. (OPTIONAL) Update model in docker-compose.yml
# Edit line 31: OLLAMA_MODEL=phi4-mini:3.8b
# Change to your preferred model (llama3:8b, llama3:70b, etc.)
# 7. Start services
docker-compose up -d
# 8. Install system automation (optional but recommended)
sudo bash services/scripts/install_systemd.sh
sudo bash services/config/install_logrotate.sh
# 9. Verify installation
curl http://localhost:8080/health
curl http://localhost:8200/health
curl http://localhost:8300/health # Memory service
# 10. Send your first message!
curl -X POST http://localhost:8080/message \
  -H "Content-Type: application/json" \
  -d '{
    "channel": "api",
    "user_id": "test-user",
    "text": "What is the capital of France?"
  }'

Memory & Automation Setup:
- Memory system provides persistent context across sessions
- Heartbeat keeps Ollama warm and checks service health (every 5 min)
- Hourly summaries track system stats
- Daily reports archive sessions and task status
- See docs/MEMORY.md for details
Response times:
- Budget hardware (phi4-mini): 30-50 seconds
- Mid-range (llama3:8b): 15-30 seconds
- Mac Mini M4 (llama3:70b): 3-5 seconds
- GPU server: <1 second
See docs/INSTALL.md for detailed installation guide.
{
  "anthropic_api_key": "sk-ant-...",
  "search": {
    "google_api_key": "AIza...",
    "google_cx": "custom-search-engine-id",
    "tavily_api_key": "tvly-..."
  }
}

skills:
  enabled:
    - search-google
    - search-tavily
    - search-duckduckgo
  priorities:
    search-google: 1        # Try Google first
    search-tavily: 2        # Then Tavily
    search-duckduckgo: 3    # DuckDuckGo as fallback
  rate_limits:
    google:
      daily: 100
      monthly: 3000
    tavily:
      monthly: 1000

gateway:
  search_detection: normal  # strict, normal, relaxed

See docs/CONFIGURATION.md for complete reference.
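
The priority and rate-limit settings imply a simple fallback loop: try providers in priority order, skip any that have used up their quota, and move on when a call fails. The sketch below is one way to express that idea, not the vault's actual implementation; `search_with_fallback`, the `usage` counter, and the single per-window limit per provider are simplifying assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Values mirror the configuration shown above; lower number = tried first.
PRIORITIES: Dict[str, int] = {
    "search-google": 1,
    "search-tavily": 2,
    "search-duckduckgo": 3,
}
# Requests allowed in the current window (assumption: one window per provider).
# Providers without an entry are treated as unlimited.
LIMITS: Dict[str, int] = {"search-google": 100, "search-tavily": 1000}

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str, providers: Dict[str, SearchFn], usage: Dict[str, int]
) -> Tuple[str, List[str]]:
    """Return (provider name, results) from the first provider that succeeds."""
    for name in sorted(providers, key=lambda n: PRIORITIES.get(n, 99)):
        limit = LIMITS.get(name)
        if limit is not None and usage.get(name, 0) >= limit:
            continue  # quota exhausted, fall through to the next provider
        try:
            results = providers[name](query)
        except Exception:
            continue  # provider error, fall through to the next provider
        usage[name] = usage.get(name, 0) + 1
        return name, results
    raise RuntimeError("all search providers failed or are rate limited")


if __name__ == "__main__":
    fake = {name: (lambda q, n=name: [f"{n}: {q}"]) for name in PRIORITIES}
    counters = {"search-google": 100}  # pretend the Google quota is used up
    print(search_with_fallback("ollama models", fake, counters))
```

Keeping the usage counters outside the function makes it easy to persist them or reset them when a quota window rolls over.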
curl -X POST http://localhost:8080/message \
  -H "Content-Type: application/json" \
  -d '{
    "channel": "api",
    "user_id": "user123",
    "text": "Explain Python list comprehensions"
  }'

curl -X POST http://localhost:8080/message \
  -H "Content-Type: application/json" \
  -d '{
    "channel": "api",
    "user_id": "user123",
    "text": "What are the latest AI developments in 2026?"
  }'

curl -X POST http://localhost:8080/message \
  -H "Content-Type: application/json" \
  -d '{
    "channel": "api",
    "user_id": "user123",
    "text": "Design a scalable microservices architecture for an e-commerce platform with high availability requirements. Consider trade-offs between consistency and availability."
  }'

curl -X POST http://localhost:8080/message \
  -H "Content-Type: application/json" \
  -d '{
    "channel": "api",
    "user_id": "user123",
    "text": "Create a skill to analyze Python code for security vulnerabilities"
  }'

After creation, the skill runs FREE on Ollama forever!
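
The same `/message` requests can be sent from Python using only the standard library. This is a sketch under two assumptions: the gateway replies with a JSON body (the response format is not documented here), and the helper names `build_message_request` and `send_message` are illustrative rather than part of SecureBot.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/message"


def build_message_request(
    text: str, user_id: str = "user123", channel: str = "api", url: str = GATEWAY_URL
) -> urllib.request.Request:
    """Build the same POST request as the curl examples, without sending it."""
    payload = {"channel": channel, "user_id": user_id, "text": text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_message(text: str, timeout: float = 120.0, **kwargs) -> dict:
    """Send the message and decode the reply (assumed to be JSON)."""
    request = build_message_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the gateway to be running on localhost:8080.
    print(send_message("Explain Python list comprehensions"))
```

The generous default timeout reflects the response times quoted for budget hardware, where a local model may take tens of seconds to answer.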
Skills are reusable AI capabilities - the secret sauce of SecureBot's cost efficiency.
- One-Time Creation - Claude API analyzes your request and creates a SKILL.md file (~$0.10)
- Infinite Reuse - Skill executes with local Ollama model (FREE forever)
- Zero Marginal Cost - Each execution costs $0 after initial skill creation
---
name: python-security-audit
description: Analyze Python code for common security vulnerabilities including SQL injection, XSS, command injection, and insecure deserialization
category: code
priority: 1
---
# Python Security Audit
Perform comprehensive security analysis on Python code.
## Steps
1. Analyze code for SQL injection vulnerabilities
2. Check for XSS attack vectors
3. Identify command injection risks
4. Review deserialization security
5. Flag hardcoded credentials
6. Assess input validation
## Output Format
- Severity: HIGH/MEDIUM/LOW
- Vulnerability type
- Location (file:line)
- Recommendation

Search Skills:
- search-google - Google Custom Search (100 queries/day free)
- search-tavily - Tavily AI Search (1000 queries/month free)
- search-duckduckgo - DuckDuckGo Search (no API key needed)
Automation Skills:
- cron-manager - Schedule recurring tasks with cron
- systemd-service - Create background services
- systemd-timer - Modern alternative to cron
- bash-automation - System automation scripts
- ansible-playbook - Multi-machine automation
See docs/SKILLS.md for creating your own skills. See docs/MEMORY.md for memory system and automation philosophy.
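
A SKILL.md file such as the python-security-audit example consists of a front matter block between two `---` lines followed by a Markdown body. The sketch below shows one way to split the two with the standard library only. It handles flat `key: value` pairs like those in the example and is not a full YAML parser; `parse_skill` is a hypothetical helper, not SecureBot's loader.

```python
from typing import Dict, Tuple


def parse_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split SKILL.md text into (front matter dict, Markdown body)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    try:
        # Index of the line that closes the front matter block.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("front matter is not closed with '---'") from None

    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


if __name__ == "__main__":
    sample = """---
name: python-security-audit
category: code
priority: 1
---
# Python Security Audit
Perform comprehensive security analysis on Python code.
"""
    meta, body = parse_skill(sample)
    print(meta["name"], meta["category"], meta["priority"])
    print(body.splitlines()[0])
```

All values come back as strings, so a field like `priority` would need an explicit `int(...)` conversion before being compared numerically.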
securebot/
├── gateway/                  # API gateway and message routing
│   ├── gateway_service.py    # FastAPI service
│   └── orchestrator.py       # Smart routing logic
├── vault/                    # Secrets management
│   ├── vault_service.py      # Secure API key injection
│   └── secrets/              # secrets.json (gitignored)
├── codebot/                  # Skill generation specialist agent (:8500)
│   ├── codebot_service.py    # FastAPI service
│   ├── skill_router.py       # GLiClass coding intent classifier
│   ├── pi_config.json        # Pi coding agent configuration
│   └── tools/                # Pi CLI tools (lint, test, validate, commit)
├── skills/                   # Reusable AI skills
│   ├── search-google/
│   ├── search-tavily/
│   └── search-duckduckgo/
├── common/                   # Shared utilities
│   └── config.py             # Configuration management
├── docker-compose.yml        # Service orchestration
└── docs/                     # Documentation
- Python 3.10+ - Core services
- FastAPI - REST API endpoints
- Ollama - Local LLM inference
- Docker - Containerization
- Claude API - Complex reasoning (on-demand)
- Hardware: Ryzen 5 8600G + GTX 1050 Ti · 16GB RAM (SecureBot-P2, McAllen TX)
- Holy Trinity of Models:
  - GLiClass `knowledgator/gliclass-small-v1.0` (144M params): CPU in gateway container, intent routing <50ms
  - `nomic-embed-text` (137M params): via Ollama on host GPU, RAG embeddings
  - `llama3.2:3b-instruct-q4_K_M`: via Ollama on host GPU, response generation
- Performance: <50ms routing + 2-5 seconds generation on host GPU
- Assistance: Built with Claude Code and Windsurf IDE
We welcome contributions! SecureBot is built by the community, for the community.
- Skills - Create and share reusable skills
- Providers - Add new search providers (Brave, Perplexity, etc.)
- Integrations - Build Telegram, Discord, Slack bots
- Documentation - Improve guides and examples
- Bug Fixes - Report and fix issues
See CONTRIBUTING.md for guidelines.
# Gateway health
curl http://localhost:8080/health
# Vault health (shows configured providers)
curl http://localhost:8200/health
# Search usage statistics
curl http://localhost:8200/search/usage
# Ollama health
curl http://localhost:11434/api/tags

# View all logs
docker-compose logs -f
# Gateway logs only
docker-compose logs -f gateway
# Vault logs only
docker-compose logs -f vault

SecureBot implements defense-in-depth security with multiple layers:
- HMAC-SHA256 Signed Requests - All service-to-service communication is cryptographically signed
- Fully Implemented & Verified - `Depends(verify_service_request)` wired to all protected endpoints in vault (:8200), memory (:8300), and rag (:8400) via APIRouter pattern. All three services return 401 on unsigned requests.
- Replay Attack Prevention - 30-second timestamp window + nonce tracking
- Service Trust Matrix - Each service explicitly defines who can call it
- Zero External Access - External requests to internal services are rejected (401 Unauthorized)
- Health Endpoints Public - `/health` on all services remains unauthenticated for Docker healthchecks
- Secrets Isolation - API keys never exposed to AI models
- Vault Pattern - Secrets injected at execution time only
- No Prompt Injection - AI cannot access credentials via clever prompts
- Environment Variables - Secrets stored in `.env` (gitignored, never committed)
- Docker Network Isolation - Services communicate on private `securebot` bridge network
- Port Restrictions - Only gateway (8080) exposed externally
- Health Endpoints Public - `/health` endpoints remain accessible for Docker healthchecks
- Local First - Your data stays on your hardware
- No Telemetry - No analytics, tracking, or data collection
- Your Models - Use ANY Ollama model, hosted on YOUR machine
- Full Security Model: See docs/SECURITY.md
- Setup Guide: Run `bash services/scripts/setup_auth.sh`
- Trust Matrix: Details which services can communicate
- Troubleshooting: Common auth issues and solutions
Protects against:
- Unauthorized access to internal services
- Replay attacks (duplicate/old requests)
- Man-in-the-middle tampering
- Service impersonation
- Prompt injection credential theft
- External API abuse
Note: For production deployments requiring maximum security, consider implementing mTLS (mutual TLS) with client certificates. Contact the maintainer for implementation guidance.
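
The signed-request scheme described in this section (HMAC-SHA256 signature, 30-second timestamp window, nonce tracking) can be sketched with the Python standard library. The header names, the layout of the signed message, and the in-memory nonce set are assumptions made for illustration; SecureBot's `verify_service_request` may differ in all three.

```python
import hashlib
import hmac
import secrets
import time

MAX_SKEW_SECONDS = 30  # timestamp window quoted in this README


def sign_request(secret: bytes, service: str, method: str, path: str,
                 body: bytes, timestamp=None, nonce=None) -> dict:
    """Return headers carrying an HMAC-SHA256 signature (hypothetical layout)."""
    ts = str(int(time.time() if timestamp is None else timestamp))
    nonce = secrets.token_hex(16) if nonce is None else nonce
    message = "\n".join([service, method.upper(), path, ts, nonce]).encode("utf-8")
    signature = hmac.new(secret, message + b"\n" + body, hashlib.sha256).hexdigest()
    return {"X-Service": service, "X-Timestamp": ts,
            "X-Nonce": nonce, "X-Signature": signature}


def verify_request(secret: bytes, headers: dict, method: str, path: str,
                   body: bytes, seen_nonces: set, now=None) -> bool:
    """Check freshness, nonce uniqueness, and signature, in that order."""
    now = time.time() if now is None else now
    try:
        service = headers["X-Service"]
        ts = int(headers["X-Timestamp"])
        nonce = headers["X-Nonce"]
        provided = headers["X-Signature"]
    except (KeyError, ValueError):
        return False
    if abs(now - ts) > MAX_SKEW_SECONDS:
        return False  # too old or too far in the future
    if nonce in seen_nonces:
        return False  # replayed request
    expected = sign_request(secret, service, method, path, body,
                            timestamp=ts, nonce=nonce)["X-Signature"]
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8")):
        return False
    seen_nonces.add(nonce)  # a real service would also expire old nonces
    return True


if __name__ == "__main__":
    key = b"shared-secret-for-demo-only"
    hdrs = sign_request(key, "gateway", "POST", "/search", b'{"q": "test"}')
    seen = set()
    print(verify_request(key, hdrs, "POST", "/search", b'{"q": "test"}', seen))
    print(verify_request(key, hdrs, "POST", "/search", b'{"q": "test"}', seen))
```

Because nonces only need to be remembered for the length of the timestamp window, a production version could prune the set every 30 seconds instead of letting it grow.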
MIT License - see LICENSE for details.
Free to use, modify, and distribute. Commercial use allowed.
- Creator: Roland (Rojman1984)
- Built with: Claude Code, Windsurf IDE
- Inspired by: The need for affordable, powerful AI assistants
- Community: Thank you to all contributors!
- GitHub: https://github.com/Rojman1984/securebot
- Issues: https://github.com/Rojman1984/securebot/issues
- Discussions: https://github.com/Rojman1984/securebot/discussions
- Star this repo if you find it useful!
- Report bugs via GitHub Issues
- Request features via GitHub Discussions
- Improve docs via Pull Requests
Built with ❤️ by the open-source community
Self-hosted • Cost-Optimized • Privacy-Focused • Community-Driven