Unified Open-Source Super Agent Β· Voice Β· Vision Β· OS Control Β· Deep Research Β· Multi-Platform Messaging
OCTO-Pro is a unified Super Agent that bridges the gap between high-level human intent and low-level system execution. It synthesizes real-time sensory perception, stateful orchestration, autonomous learning, and intelligent model routing into a single, cohesive ecosystem.
Unlike disparate AI toolkits, OCTO-Pro functions as one platform where sensory inputs from the edge feed into a central orchestration harness, backed by a persistent memory layer and a high-performance model proxy.
OCTO-Pro is built from the ground up as a 100% local-first application. Your personal information, files, database configurations, trading suggestions, chat histories, and API credentials NEVER leave your local device (desktop or laptop), except for direct encrypted HTTPS requests made directly to official generative model providers you configure.
- Zero Third-Party Telemetry: We do not collect, intercept, or upload any user analytics, system data, model inputs/outputs, or usage telemetry to third parties.
- Strictly Local Storage: All API credentials and configuration options are saved locally inside
config/api_keys.json,config/gateway.json, and~/.fcc/.env. They are never stored in a cloud database or transmitted to any middleman. - Local Sandbox Execution: The DeerFlow sub-agent sandbox is mapped to local loopback directories or isolated local Docker containers to keep your code execution secure and private.
- Independent MT5 Suggested Workflows: Reconciliations between your technical candles and the Google TimesFM 2.5 predictions run completely on your machine.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β OCTO-Pro Super Model β
ββββββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββ€
β Sensory & OS β Orchestration β Learning Loop β Model Proxy β
β Mark-XXXIX β DeerFlow 2.0 β Hermes Agent β Free-Claude β
β (Body) β (Brain) β (Memory) β (Nervous Sys)β
ββββββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββββ΄βββββββββββββββ
| Layer | Tool | Role |
|---|---|---|
| π₯οΈ Sensory & OS Control | Mark-XXXIX | Real-time voice/vision perception and native OS manipulation β the "body" |
| π§ Orchestration & Sandbox | DeerFlow 2.0 | Lead agent logic, sub-agent decomposition, isolated execution β the "brain" |
| πΎ Learning & Persistence | Hermes Agent | Autonomous skill creation and persistent user/context modeling β the "memory" |
| β‘ Model Routing Proxy | Free-Claude-Code | API interception, protocol normalization, and backend routing β the "nervous system" |
| Feature | Description |
|---|---|
| ποΈ Real-time Voice | Ultra-low latency Gemini Live conversation with seamless voice β keyboard switching |
| ποΈ Visual Awareness | Real-time screen processing and webcam vision β the agent sees your workspace |
| π Hardware & System Metrics | Real-time HUD tracking for CPU, GPU, Memory, Network, and thermals (_SysMetrics) |
| π 3D WebGL HUD UI | Adaptive PyQt6 interface featuring an animated, reactive 3D WebGL Avatar |
| π Quant Trading Bridge | Native MT5 (mt5_mcp) and TradingView (tradingview_mcp) control and telemetry reading |
| π₯οΈ OS Control | App orchestration, file I/O, terminal execution, volume, brightness, WiFi |
| π€ Sub-Agent Orchestration | DeerFlow decomposes complex goals into parallel workstreams |
| π§ Persistent Memory | FTS5 session search, Honcho user modeling, and memory nudges |
| π Autonomous Skill Creation | Hermes creates and improves skills from successful experiences |
| π MCP Tools | Connect filesystem, GitHub, Postgres, Brave Search, Puppeteer, and more |
| β‘ Model Proxy | Route requests to NVIDIA NIM, DeepSeek, Kimi, Ollama β transparently |
| π‘ Multi-Channel Gateway | Telegram Β· Discord Β· Slack Β· WhatsApp Β· Signal Β· DingTalk |
| π Security Hardened | Loopback-only Admin UI, Nginx pre-auth, VLAN isolation for high-privilege agents |
| π Hibernate-on-Idle | Modal/Daytona backends β near-zero cost when idle, instant resume |
# 1. Clone the repository
git clone https://github.com/Boyapati13/octo.git
cd octo
# 2. Run the installer (installs all deps, checks Ollama, Node.js, ffmpeg, etc.)
Set-ExecutionPolicy -Scope Process Bypass
.\install_octo.ps1The installer will:
- Install all Python packages from
requirements.txt - Install Playwright browser binaries
- Check for Node.js (MCP servers), ffmpeg (audio), ripgrep (file search)
- Check for Ollama + show model pull commands (Gemma 3, Llama 3.2, Mistral, etc.)
- Create default config files
- Offer to launch OCTO immediately
# 1. Clone & Enter Repository
git clone https://github.com/Boyapati13/octo.git
cd octo/octo
# 2. Install Dependencies
pip install -r requirements.txt
playwright install
# 3. Start the Monolith (Voice loop + Model Proxy + DeerFlow Gateway + Hermes engine)
python server.pyYou can control which parts of the monolith start using command-line arguments:
# Headless Mode: Run model proxy + DeerFlow gateway only (no PyQt/Voice loop)
python server.py --no-voice
# Skip Model Proxy (if running your own proxy elsewhere)
python server.py --no-proxy
# Skip DeerFlow Gateway
python server.py --no-gateway
# Specify Custom Ports
python server.py --proxy-port 8082 --gateway-port 2026All channel settings are centrally managed in config.yaml in the project root:
- Open
config.yaml - Locate the
channelssection:channels: telegram: enabled: true bot_token: "YOUR_TELEGRAM_BOT_TOKEN" discord: enabled: false bot_token: ""
- Alternatively, use the OCTO Desktop β Gateway page to configure channels with a GUI form.
OCTO supports any model available through Ollama as a Haiku-tier backup through the built-in proxy. This means if all cloud API keys are offline, OCTO falls back to your local model automatically.
# Install Ollama (Windows/macOS/Linux)
# β https://ollama.ai/download
# Pull your preferred backup model (pick one):
ollama pull gemma3:4b # Google Gemma 3 4B β fast, low VRAM
ollama pull gemma3:12b # Google Gemma 3 12B β higher quality
ollama pull llama3.2:latest # Meta Llama 3.2
ollama pull mistral:latest # Mistral 7B
ollama pull deepseek-r1:8b # DeepSeek R1 8B (reasoning)Then in the OCTO Desktop Proxy page β OLLAMA β LOCAL MODEL BACKUP:
- Set the URL to
http://localhost:11434 - Click Detect Models β your installed models appear in the dropdown
- Select your preferred backup model and click Save
octo/
βββ main.py # Entry point β Gemini Live voice loop + tool dispatch (mt5_mcp, tradingview_mcp)
βββ ui.py # PyQt6 adaptive UI (3D WebGL HUD, Hardware Metrics _SysMetrics, FileDropZone)
βββ ui_pages/ # Settings, MCP, Gateway, Skills, Memory, Scheduler
β
βββ agent/ # π§ Orchestration layer
β βββ planner.py # LLM-driven task decomposition
β βββ executor.py # Step execution + code generation
β βββ error_handler.py # Strict Tool-Call Recovery + retry logic
β βββ task_queue.py # Async task queue
β βββ context_compressor.py # Context window compression (Hermes-inspired)
β βββ hermes_bridge.py # Hermes Agent integration bridge
β βββ mcp_bridge.py # MCP server client
β
βββ channels/ # π‘ Multi-platform messaging gateway
β βββ manager.py # Channel orchestrator
β βββ telegram_channel.py # Typewriter-style streaming
β βββ discord_channel.py # Typewriter-style streaming
β βββ slack_channel.py # Slack Bot + Socket Mode
β βββ whatsapp_channel.py # WhatsApp via Twilio
β βββ dingtalk.py # DingTalk group robot webhook
β βββ feishu.py # Feishu / Lark open platform
β
βββ actions/ # βοΈ Atomic OS Actions (Mark-XXXIX layer)
β βββ browser_control.py # Vision-based browser automation
β βββ computer_control.py # Mouse, keyboard, window management
β βββ computer_settings.py # Volume, brightness, WiFi, power
β βββ screen_processor.py # Real-time screen capture + analysis
β βββ file_controller.py # File I/O operations
β βββ file_processor.py # Deep PDF and source code analysis
β βββ dev_agent.py # Terminal + git + docker execution
β βββ deep_research.py # Long-horizon web crawling + synthesis
β βββ deerflow_task.py # DeerFlow sub-agent dispatch
β βββ mcp_connect.py # Universal MCP client handler
β βββ ... # 15+ additional action modules
β
βββ memory/ # πΎ Hermes learning loop
β βββ memory_manager.py # FTS5 session search + persistent JSON store
β
βββ skills/ # π Autonomous skill management
β βββ skill_manager.py # agentskills.io standard discovery
β
βββ deerflow_bridge.py # DeerFlow 2.0 integration
βββ core/
β βββ prompt.txt # OCTO-Pro v2.0 system prompt
β βββ text_llm.py # LLM client (Gemini / OpenAI-compatible)
βββ config/
βββ api_keys.json # API key store
βββ mcp_servers.json # MCP server definitions
Free-Claude-Code intercepts Anthropic Messages API traffic and routes to the optimal backend:
| Tier | Recommended Backends |
|---|---|
| Opus (Pro/Ultra) | NVIDIA NIM Β· Kimi 2.5 Β· Doubao-Seed-2.0-Code |
| Sonnet (Standard) | DeepSeek v3.2 Β· Wafer Β· OpenRouter |
| Haiku (Flash) | Local Ollama Β· llama.cpp Β· LM Studio |
The proxy handles protocol normalization β translating OpenAI-style chat streaming into Anthropic SSE format, including thinking blocks and tool-call mapping, so clients never need to change.
CLAUDE_CODE_AUTO_COMPACT_WINDOWis set to 190,000 tokens- DeerFlow uses Strict Tool-Call Recovery to fix malformed history by injecting placeholders for dangling calls
| Mode | Description |
|---|---|
flash |
Single-agent reply β fastest |
standard |
Balanced depth β default |
pro |
Enables thinking and planning |
ultra |
Full sub-agent orchestration β most thorough |
Lead Agent β decompose goal
β
Sub-agents (parallel workstreams)
βββ Initialization: scoped context + tool-set
βββ Isolation: separate context (prevents token bloat)
βββ Filesystem offload: intermediate results β disk
βββ Synthesis: results β Lead Agent β final output
| Mode | Provider | Isolation Strategy |
|---|---|---|
| Local | LocalSandboxProvider |
Host-mapped directories; Bash disabled by default |
| Docker | AioSandboxProvider |
Isolated container via shell-service |
| K8s | Provisioner Service | Scalable pods with PVC data scoped by user |
| Feature | Implementation |
|---|---|
| Session Search | FTS5 full-text search with LLM-based summarization |
| User Modeling | Honcho dialectic profile β preferences, tech stack |
| Memory Nudges | Internal prompts that proactively store relevant context |
| Skill Creation | Auto-creates skills from successful experiences (agentskills.io) |
| Hibernate-on-Idle | Modal + Daytona: near-zero cost when inactive, instant resume |
Configure in Settings β Gateway:
| Platform | Credentials |
|---|---|
| Telegram | Bot token from @BotFather |
| Discord | Bot token + Message Content Intent |
| Slack | xoxb- bot token + xapp- Socket Mode token |
| Meta Cloud API token + Phone Number ID | |
| Signal | Signal CLI instance |
| DingTalk | App key + secret |
Voice interactions use FFmpeg for audio processing and either local Whisper or NVIDIA NIM (Riva gRPC) for transcription. Discord and Telegram support typewriter-style progress streaming.
Edit config/mcp_servers.json or use Settings β MCP:
{
"servers": [
{
"name": "filesystem",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "~/Desktop"]
},
{
"name": "github",
"url": "https://mcp.github.com/sse",
"headers": { "Authorization": "Bearer ghp_..." }
},
{
"name": "postgres",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
}
]
}OCTO is natively integrated with institutional-grade trading setups, functioning as a fully autonomous, high-expectancy AI Quant Risk Manager:
- MetaTrader 5 Bridge:
mt5_mcpstreams tick-level OHLCV data, JSON telemetry (like SMC liquidity and Volume Profile data from Whale Suite indicators), and handles live execution of BUY/SELL orders. - TradingView Bridge:
tradingview_mcpreads chart states, active PineScript indicators, and sets alerts directly via Chrome DevTools Protocol. - Backtesting & Optimization: Deep integration with Python-based quantitative tools (like
investing-algorithm-framework) allows vector-backtesting of the AI's decision logic against thousands of parameters. - Hybrid Execution: MT5 handles the lightning-fast tick-level execution and wick-rejection math, while the OCTO Python Brain processes the JSON telemetry to determine macro trend, sentiment, and trade permissions.
- Periodically scrapes Google News RSS feeds without API dependencies using high-context query parameters.
- Processes headlines through a rules-based quantitative linguistic lexicon targeting:
- Geopolitical Risks (military escalation, global tensions, sanctions, conflicts).
- Central Bank Policy Bias (hawkish/dovish indicators, interest rate changes, inflation sticky pressures).
- Energy Supply Shocks (crude oil disruptions, energy shortages).
- Assigns unified geopolitical risk indices (
CRITICAL,HIGH,MEDIUM,LOW) and directional biases to local JSON telemetry and MT5 Common files, enabling live hot-reloads on active charts.
- Reads real-time TimesFM forecasts and geopolitical risk scores before allowing order placement.
- Implements four runtime-switchable filtering modes (hot-reloaded from
live_bot_config.json):BLOCKβ Actively blocks trades if TimesFM has high-confidence conflict with EA signals.SOFTβ Halves position lot sizing if the AI disagrees with the direction.WARNβ Allows full trade size but fires high-priority warning alerts via Telegram.OFFβ Bypasses the G4 gate completely.
- Macro-Gating Overlay: If global geopolitical risk is classified as
CRITICALorHIGH, it dynamically triggers aMACRO_SOFTlot-halving for contrary trades even if G4 is disabled.
- Tracks scheduled red-folder macroeconomic calendar releases.
- Integrates calendar events with the unstructured news alerts generated by
macro_sentiment_analyst.pyto create a unified timeline of market risk windows.
- Engine Split: Integrates a multi-asset trading harness:
- Forex majors (EURUSD+, GBPUSD+) running H1 Robust RSI & EMA Plateau logic.
- Metal & Indices (NAS100, XAUUSD+) running M15 Pure Volume breakout wick-absorption profiles.
- Resilient 24/7 Reconnect State Machine: Catches MT5 terminal or network dropouts, entering persistent reconnect retry loops every 10 seconds to protect system execution from crashing.
- Dynamic 24-Hour Self-Optimizing Sweep: Checks if 24 hours have elapsed since the last sweep and triggers a native walk-forward optimization run over 5,000 candles to re-tune Markov windows and hedging thresholds. Hot-reloads fresh parameters to MetaQuotes Common Files on the fly.
- Indicator Computations: Handles localized, high-fidelity calculation of technical markers:
- ADX: Trend strength gating.
- MACD: Trend crossovers for Forex.
- VWAP: Premium/discount verification (restricting BUYs below VWAP and SELLs above).
- Swing Levels: Defines precise TP boundaries dynamically.
- Zero-Shot Engine: Natively runs Google's advanced
timesfm-2.5-200m-pytorchmodel on a 5-minute background cycle, forecasting 8β12 bars into the future with dynamic 80% prediction intervals. - Unified IPC: Automatically persists per-symbol cached directional biases (
BULL/BEAR/NEUTRAL) and confidence metrics to both JSON and MQL5-compatibletimesfm_signal.jsonfiles for instant read-out.
OCTO is natively integrated with a high-fidelity WhatsApp communication loop (C:\Users\Tenders\octo\octo\scripts\run_live_bot.py), converting your private chat or Broadcast Channel into a secure, mobile-operable system interface:
- Dynamic Broadcast Channel Parsing: Startup sequences parse channel URLs (e.g.
https://whatsapp.com/channel/0029Vb8a3Zs9mrGeyYMTNx42), querying the Express bridgeβs native/resolve-newsletter/:codemetadata resolver to obtain the correct newsletter JID (120363427287192115@newsletter). - Interactive Trading Console: Supports market control commands sent via chat:
buy <symbol> [lots]/sell <symbol> [lots]β Executes market orders on MT5 via G4 risk checks.close <symbol>/close allβ Closes active positions.status/balance/positionsβ Checks live MT5 balance, equity, and open tickets with floating PnL.sentiment/newsβ Returns Central Bank warning events and geopolitical risk threat indexes.help/menuβ Displays the mobile operations manual.
- LangGraph Assistant Routing: Any non-trading command or general query is automatically forwarded in real-time to the monolithic personal assistant agent (
deerflow_bridge.chat()). - Persistent Conversation Memory: The bot automatically maps the sender's WhatsApp JID as the LangGraph
session_id, enabling persistent multi-turn conversations and user profiles directly over mobile chat.
OCTO-Pro integrates recursively with Graphifyβan advanced static-analysis AST parser that turns the entire repository workspace into an interactive, navigable knowledge graph (consisting of 176,143 nodes, 265,284 edges, and 11,511 communities).
The generated graph and markdown report are located in graphify-out/ at the workspace root.
- Query Graph:
graphify query "How does G4 risk gating evaluate trade lot sizing?"(resolves structural & design questions using a compact local subgraph). - Trace Path:
graphify path "run_live_bot" "macro_sentiment_analyst"(traces dependencies and interaction flows). - Explain Concept:
graphify explain "TimesFM Zero-Shot Forecaster"(gets a high-level conceptual explanation). - Update Index:
graphify update .(synchronizes the index with new changes).
OCTO-Pro registers premium tools inside the FastMCP server, allowing sub-agents to interface with active systems:
octo_timesfm_forecastβ Pulls cached or fresh AI price predictions.octo_risk_manager_statusβ Reports current active G4 configuration and watchlist details.octo_risk_manager_set_configβ Programmatically overrides gate modes and thresholds.
| Target | Resources | Use Case |
|---|---|---|
| Local Evaluation | 8 vCPU Β· 16 GB RAM Β· 20 GB SSD | Single developer; hosted APIs |
| Docker Development | 8 vCPU Β· 16 GB RAM Β· 25 GB SSD | Container testing; sandbox builds |
| Production Server | 16 vCPU Β· 32 GB RAM Β· 40 GB SSD | Multi-agent runs; heavy sandbox workloads |
Production deployment recommended via Docker Compose. For serverless persistence, Modal/Daytona backends enable hibernate-on-idle on low-cost VPS tiers.
- Loopback enforcement: All Admin UIs bound to
127.0.0.1by default - Authentication gateway: Nginx reverse proxy with strong pre-authentication for any external access
- XSS mitigation: Gateway serves active web content (HTML/SVG) as download attachments, never inline
- Network isolation: High-privilege agents executing system commands placed in a dedicated VLAN, isolated from the public internet
- Key management: All API keys stored locally in
config/api_keys.jsonβ never transmitted externally
Python 3.11β3.14
Windows 10/11 Β· macOS Β· Linux
Gemini API key (free tier: gemini-2.5-flash)
Node.js (for MCP servers)
FFmpeg (for audio)
Playwright (for vision)
ripgrep (for file search)
Optional for full stack:
- Docker / Docker Compose (sandbox execution)
- Kubernetes (production scale)
- Modal or Daytona account (hibernate-on-idle)
- NVIDIA NIM API key (high-performance transcription + routing)
| Provider | Models |
|---|---|
| Gemini (default) | 2.5 Flash, 2.5 Pro, Ultra (free tier works) |
| NVIDIA NIM | Opus-tier β high-performance routing |
| DeepSeek | v3.2 β Sonnet-tier standard |
| Kimi / Doubao | Opus-tier alternatives |
| Ollama (local) | gemma4, llama3, mistral, qwen β Haiku-tier |
| OpenAI-compatible | Any endpoint via Free-Claude-Code proxy |
OCTO-Pro stands on the shoulders of these open-source projects:
| Project | Contribution |
|---|---|
| FatihMakes/Mark-XXXIX | Core voice assistant Β· OS sensory foundation |
| bytedance/deer-flow | LangGraph orchestration Β· sub-agent decomposition Β· sandbox execution |
| NousResearch/hermes-agent | Context engine Β· MCP tools Β· skill creation Β· persistent memory |
| Alishahryar1/free-claude-code | Model routing proxy Β· protocol normalization Β· multi-backend support |
MIT β see LICENSE