A distributed AI orchestrator that manages heterogeneous hardware - NVIDIA GPUs, Apple Silicon, DGX systems - loading optimal models on each device and coordinating them to complete autonomous "missions".
License: MIT | Python: 3.10+ | nCore port: 1903 | OAPI port: 1919
ClusterFlock unifies mixed GPU hardware into a single AI backend. One nCore orchestrator manages many agents - each running on a different machine with different hardware.
```
                ┌─────────────────────────────────┐
                │ nCore (port 1903)               │
                │ orchestrator · registry · OAPI  │
                │ missions · access · catalog     │
                └────────────────┬────────────────┘
                                 │
        ┌────────────────┬───────┴────────┬────────────────┐
        │                │                │                │
   agent_spark      agent_linux       agent_mac        agent_lms
   (DGX/GB10)       (amd64+CUDA)   (Apple Silicon)    (LM Studio)
 llama.cpp+CUDA      llama.cpp    llama.cpp+Metal   LM Studio CLI
```
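The topology above can be thought of as a registry mapping agent names to hardware and backend. A toy sketch in Python (purely illustrative of the diagram; the field names are hypothetical, not nCore's actual schema):

```python
# Toy registry mirroring the diagram above.
# Field names here are illustrative, not nCore's internal schema.
agents = {
    "agent_spark": {"hardware": "DGX/GB10",      "backend": "llama.cpp+CUDA"},
    "agent_linux": {"hardware": "amd64+CUDA",    "backend": "llama.cpp"},
    "agent_mac":   {"hardware": "Apple Silicon", "backend": "llama.cpp+Metal"},
    "agent_lms":   {"hardware": "LM Studio",     "backend": "LM Studio CLI"},
}

def agents_with(backend_substr: str) -> list[str]:
    """Names of agents whose backend contains the given substring."""
    return [name for name, info in agents.items()
            if backend_substr in info["backend"]]

print(agents_with("llama.cpp"))  # the three llama.cpp-based agents
```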
Key features:
- Zero-touch model provisioning - auto-profiles hardware, bin-packs models to VRAM
- Autonomous missions - a showrunner LLM coordinates a flock of worker LLMs in Docker containers
- OpenAI-compatible API - drop-in replacement on port 1919 (fanout, speed, manual routing)
- Zero third-party deps for nCore - the orchestrator is pure Python stdlib
- 4 agent types - DGX Spark, generic Linux+CUDA, Apple Silicon, LM Studio
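The VRAM bin-packing mentioned in the first bullet can be illustrated with a first-fit-decreasing sketch. This shows the general technique only, not nCore's actual placement code; the model sizes and device names below are made up:

```python
# First-fit-decreasing bin packing: place each model on the first
# device with enough free VRAM, largest models first.
# Illustrative only -- not nCore's actual algorithm or data.
def pack_models(models: dict[str, float],
                devices: dict[str, float]) -> dict[str, str]:
    free = dict(devices)  # device -> free VRAM (GB)
    placement: dict[str, str] = {}
    for name, size in sorted(models.items(), key=lambda kv: -kv[1]):
        for dev, avail in free.items():
            if size <= avail:
                placement[name] = dev
                free[dev] -= size
                break
    return placement

# Hypothetical sizes in GB:
print(pack_models({"llama-70b": 40.0, "llama-8b": 6.0, "phi-3": 3.0},
                  {"dgx": 48.0, "mac": 16.0}))
# → {'llama-70b': 'dgx', 'llama-8b': 'dgx', 'phi-3': 'mac'}
```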
```bash
git clone --recurse-submodules https://github.com/notum-robotics/ClusterFlock.git
cd ClusterFlock
```

The `--recurse-submodules` flag pulls the llama.cpp source used by agent_spark. If you forgot it, run `git submodule update --init` afterwards.
nCore itself needs nothing beyond Python 3.10+ - it's pure stdlib.
Agents that download models from HuggingFace need one package:
```bash
pip install huggingface_hub
```

Build prerequisites (for agents that compile llama.cpp locally):
| Agent | You need |
|---|---|
| agent_mac | Xcode Command Line Tools (xcode-select --install), CMake |
| agent_linux | build-essential, CMake, NVIDIA drivers (CUDA toolkit only for building) |
| agent_spark | CMake, CUDA toolkit |
| agent_lms | LM Studio installed |
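For agents that download models from HuggingFace, a fetch through `huggingface_hub` looks roughly like this (a sketch; the repo and filename below are placeholders, not models ClusterFlock specifically uses):

```python
from huggingface_hub import hf_hub_download

def fetch_gguf(repo_id: str, filename: str) -> str:
    """Download one GGUF file into the local HF cache; returns the local path."""
    return hf_hub_download(repo_id=repo_id, filename=filename)

# Placeholder repo/filename -- substitute a real model:
# path = fetch_gguf("TheBloke/Llama-2-7B-GGUF", "llama-2-7b.Q4_K_M.gguf")
```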
```bash
python3 nCore/run.py
```

That's it. nCore starts on port 1903 (API) and 1919 (OpenAI-compatible endpoint). Open http://localhost:1903 in a browser to see the dashboard.
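A quick way to confirm nCore is up is to probe the dashboard port from Python. This sketch uses only the standard library and just checks that the HTTP server answers; it assumes nothing about nCore's endpoint schema:

```python
import urllib.request

def ncore_is_up(host: str = "localhost", port: int = 1903,
                timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers on the given host:port."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/",
                                    timeout=timeout) as resp:
            return 200 <= resp.status < 500
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

print(ncore_is_up())  # False unless nCore is actually running locally
```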
Pick the agent that matches your hardware and run the interactive setup. It will build llama.cpp, profile your GPU, pick a model, and connect to nCore.
Mac (Apple Silicon):
```bash
cd agents/agent_mac
python3 run.py setup
```

Linux with NVIDIA GPU:

```bash
cd agents/agent_linux
./build.sh            # one-time: compiles llama.cpp
python3 run.py setup
```

DGX Spark:

```bash
cd agents/agent_spark
python3 run.py setup
```

LM Studio (any platform):

```bash
cd agents/agent_lms
python3 run.py setup
```

Setup asks for the nCore address (default http://localhost:1903) and registers the agent with the cluster. Once connected, the agent shows up in the dashboard and nCore auto-loads the best model for your hardware.
For long-running deployments, use the watchdog (auto-restarts on crash):
```bash
python3 watchdog.py
```

```bash
# Chat via the OpenAI-compatible API
curl http://localhost:1919/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "clusterflock", "messages": [{"role": "user", "content": "Hello!"}]}'

# Or launch an autonomous mission
curl -X POST http://localhost:1903/api/v1/missions \
  -H "Content-Type: application/json" \
  -d '{"mission_id": "my-task", "mission_text": "Build a calculator web app"}'
```

Missions require Docker on the nCore host. The first mission auto-builds a lightweight container image.
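The curl chat call can also be made from Python with only the standard library. This sketch mirrors the request above and assumes the standard OpenAI chat-completion response shape (`choices[0].message.content`), which is what an OpenAI-compatible endpoint returns:

```python
import json
import urllib.request

def chat(prompt: str, host: str = "localhost", port: int = 1919) -> str:
    """Send one chat turn to ClusterFlock's OpenAI-compatible endpoint."""
    body = json.dumps({
        "model": "clusterflock",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return reply["choices"][0]["message"]["content"]

# chat("Hello!")  # requires a running cluster
```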
- ClusterFlock.md - full architecture, API reference, deployment guide
- ClusterFlockOAPI.md - OpenAI-compatible API documentation
nCore includes an optional web dashboard at http://<host>:1903/. The API is the primary interface; the web UI is just one consumer of it.
MIT - see LICENSE.
See CONTRIBUTING.md.