Professional-grade process orchestrator for robotics systems, built in Rust.
Krill provides DAG-based service orchestration, health monitoring, and safety interception for critical robotics applications. It manages complex dependency graphs of services (pixi tasks, ROS2 launch files, shell commands) with automatic restart policies, fault cascading, and emergency stop capabilities.
Key Features:
- DAG-based orchestration - Services start/stop in correct dependency order
- Automatic restarts - Configurable policies: always, on-failure, never
- Health monitoring - Heartbeat, TCP, HTTP, and script-based checks
- Safety interception - Critical service failures trigger emergency stop
- Cascading failures - Dependent services stop when dependencies fail
- Terminal UI - Monitoring interface
- IPC protocol - JSON-based client-server communication
- Session logging - Per-service logs with timeline aggregation
- GPU validation - Checks GPU availability before starting services
- Shell safety - Validates and rejects dangerous shell patterns
just install

Here's a complete example orchestrating a ROS2 robot navigation stack:
version: "1"
name: autonomous-robot
log_dir: ~/.krill/logs
env:
  ROS_DOMAIN_ID: "42"
  ROS_LOCALHOST_ONLY: "0"
services:
  # Hardware drivers start first
  lidar:
    execute:
      type: ros2
      package: ldlidar_ros2
      launch_file: ldlidar.launch.py
    health_check:
      type: tcp
      port: 4048
    policy:
      restart: on-failure
      max_restarts: 3
  camera:
    execute:
      type: ros2
      package: realsense2_camera
      launch_file: rs_launch.py
      launch_args:
        enable_depth: "true"
        enable_color: "true"
    dependencies:
      - lidar
    health_check:
      type: tcp
      port: 8554
  # SLAM for mapping and localization
  slam:
    execute:
      type: ros2
      package: slam_toolbox
      launch_file: online_async_launch.py
    dependencies:
      - lidar: healthy
      - camera: healthy
    health_check:
      type: heartbeat
      timeout: 5s
  # Navigation stack
  navigation:
    execute:
      type: ros2
      package: nav2_bringup
      launch_file: navigation_launch.py
    dependencies:
      - slam: healthy
    critical: true  # If navigation fails, stop everything
    health_check:
      type: http
      port: 8080
      path: /health
    policy:
      restart: always
      restart_delay: 2s
  # Web dashboard
  dashboard:
    execute:
      type: docker
      image: ghcr.io/robotics/web-ui:latest
      ports:
        - "3000:3000"
      volumes:
        - "./config:/app/config:ro"
      network: host
    dependencies:
      - navigation: started

See Configuration Guide for all available options.
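
To make the ordering concrete, here is a minimal Python sketch of how a start order could be derived from the dependencies in the recipe above. It uses the standard-library `graphlib` module (Python 3.9+) and is only an illustration, not Krill's actual scheduler; the `start_order` helper is a hypothetical name, and the sketch ignores the `healthy`/`started` conditions.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on,
# copied from the example recipe above.
dependencies = {
    "lidar": set(),
    "camera": {"lidar"},
    "slam": {"lidar", "camera"},
    "navigation": {"slam"},
    "dashboard": {"navigation"},
}


def start_order(deps):
    """Return one valid start order: dependencies come before dependents."""
    return list(TopologicalSorter(deps).static_order())


order = start_order(dependencies)
print(order)
# Assumption: shutdown walks the same order in reverse.
print(list(reversed(order)))
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, which is the kind of configuration mistake a DAG-based orchestrator has to reject.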
Start the daemon and open the TUI:
krill up krill.yaml

You can skip opening the TUI with the -d/--daemon option.
If a daemon is already running, just connect to the TUI:
krill

Stop Krill with the command:
krill down
After working on various robotics projects, we realised the need for a robust process orchestrator that could handle complex dependencies and provide a user-friendly interface for monitoring and managing services. Krill was born out of this need, with a focus on:
- Predictability: Know exactly why a service failed and which dependent nodes were brought down as a result.
- Safety-First: If a critical "Guardian" node fails, Krill can trigger an immediate system-wide shutdown or emergency state.
- Tool Agnostic: Stop fighting environment variables. Seamlessly mix Rust, Python, C++, and Dockerized workloads in a single graph.
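
As a rough illustration of the predictability point, the set of services brought down by a failure can be computed by walking the dependency graph in reverse. The Python sketch below is an assumption about how such a lookup might work, not Krill's implementation; `affected_by` and the `deps` mapping are hypothetical names, and the service names are borrowed from the example recipe.

```python
from collections import deque


def affected_by(failed, deps):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph: service -> services that depend on it.
    dependents = {}
    for name, needs in deps.items():
        for need in needs:
            dependents.setdefault(need, set()).add(name)

    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


deps = {
    "lidar": set(),
    "camera": {"lidar"},
    "slam": {"lidar", "camera"},
    "navigation": {"slam"},
    "dashboard": {"navigation"},
}
print(sorted(affected_by("slam", deps)))
```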
Backends:
- Pixi - Python package manager tasks (Highly recommended).
- ROS2 - Launch files with argument support.
- Docker - Containerized execution.
- Shell - Validated safe shell commands.
Health Checks
- Heartbeat - Services check in via the SDK (Rust/Python/C++).
- TCP/HTTP - Port and endpoint validation.
- Script - Run a custom command to verify health.
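
For intuition, a TCP check boils down to asking whether a port accepts connections. The Python sketch below shows that idea with the standard `socket` module; it is not Krill's implementation, and the `tcp_check` name, the default timeout, and the throwaway listener used for the demo are all assumptions made for illustration.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Demo against a throwaway listener on an OS-assigned local port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

up = tcp_check("127.0.0.1", port)    # listener is accepting connections
listener.close()
down = tcp_check("127.0.0.1", port)  # nothing is listening any more
print(up, down)
```

An HTTP check would follow the same pattern, with a request to the configured path in place of the bare connection attempt.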
Krill provides SDKs for Rust, Python, and C++ to facilitate easy integration with your services.
use krill_sdk_rust::KrillClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = KrillClient::new("my-service").await?;
    loop {
        // Do work...
        // Send heartbeat
        client.heartbeat().await?;
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
    }
}

import time

from krill import KrillClient

with KrillClient("my-service") as client:
    while True:
        # Do work...
        # Send heartbeat
        client.heartbeat()
        time.sleep(1)

Async version:
import asyncio

from krill import AsyncKrillClient


async def main():
    client = await AsyncKrillClient.connect("my-service")
    while True:
        # Do work...
        # Send heartbeat
        await client.heartbeat()
        await asyncio.sleep(1)


asyncio.run(main())

#include <chrono>
#include <iostream>
#include <thread>

#include "krill.hpp"

int main() {
    try {
        krill::Client client("my-service");
        while (true) {
            // Do work...
            // Send heartbeat
            client.heartbeat();
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    } catch (const krill::KrillError& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
}

Krill follows an open-core model. The community edition you see here is fully open-source under the Apache-2.0 license and covers everything needed to orchestrate robotics services in production:
- DAG-based orchestration, health monitoring, restart policies, cascading failures, and safety interception
- Terminal UI, CLI, and client SDKs (Rust, Python, C++)
- Pixi, ROS2, Docker, and shell execution backends
Krill Pro (coming soon) extends the core with enterprise features for larger teams and fleet deployments:
- Advanced scheduling policies
- Fleet-wide orchestration and remote management
- Metrics export and observability integrations
- Priority support
The boundary is simple: if you're running services on a single robot or dev machine, the open-source edition has you covered. Pro targets multi-robot fleets and enterprise operational needs.
We believe the core orchestrator should always be free and community-driven. Revenue from Pro funds continued development of both editions.
| Key | Action |
|---|---|
| ↑/k | Previous service |
| ↓/j | Next service |
| Enter | View logs |
| d | Detail view |
| r | Restart service |
| s | Stop service |
| S | Stop daemon (with confirmation) |
| q | Quit TUI |
| h | Help |
Comprehensive guides and references:
- Quick Reference - Fast lookup for common configurations
- Configuration Guide - Complete recipe file reference
- SDK Installation - Install and use SDKs (Python, Rust, C++)
- Health Checks - Service monitoring patterns
- Dependencies - DAG orchestration strategies
- Documentation Index - All documentation
Building from source:
just check
just build

Safety Design
- Shell command validation - Rejects pipes, redirections, command substitution
- PGID isolation - Each service in its own process group
- GPU validation - Checks GPU availability before starting
- Dependency validation - Ensures all dependencies exist
- Config validation - Validates YAML against schema
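
To illustrate the shell validation item, the Python sketch below rejects commands that contain pipes, redirections, or command substitution before splitting them into an argument list. The deny-list and the `validate_command` name are hypothetical, and Krill's actual rules may differ. The plain substring check is deliberately conservative, so it also rejects these characters when they appear inside quotes.

```python
import shlex

# Hypothetical deny-list for illustration; Krill's actual rules may differ.
# Covers pipes, redirections, and both forms of command substitution.
FORBIDDEN = ("|", ">", "<", "`", "$(")


def validate_command(command):
    """Return the argv list for a simple command, or raise ValueError."""
    for token in FORBIDDEN:
        if token in command:
            raise ValueError(f"forbidden shell pattern {token!r} in {command!r}")
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty command")
    return argv


print(validate_command("python3 node.py --rate 10"))
```

Returning an argument list rather than a string means the command can be run without a shell, so nothing is left for a shell to reinterpret.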
Apache-2.0
Copyright 2026 Tommaso Pardi
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
