Aetheris is a durable, replayable execution runtime: the "Temporal for Agents" that your production AI systems desperately need.
Quick Start • Documentation • Examples • Blog • Discord
Your AI agent worked in testing. But production is different.
- ❌ Worker crashed → Restart from beginning
- ❌ Tool called twice → Duplicate payments
- ❌ Need to audit AI decisions → No trace
- ❌ Agent waiting for approval → Wastes resources
- ❌ Need to replay failed run → Impossible
Go agent frameworks (LangChainGo, LangGraphGo, ADK) build agents. Aetheris runs them in production.
Kubernetes for AI Agents
Aetheris manages agents, providing durability, reliability, and observability for production systems.
It's not:
- ❌ Chatbot framework
- ❌ Prompt library
- ❌ RAG system
- ❌ Another way to write agents
It is:
- ✅ Agent execution runtime: host LangChainGo/LangGraphGo/ADK agents
- ✅ Durable execution: survive crashes, resume from checkpoints
- ✅ Reliable orchestrator: at-most-once tool execution
- ✅ Auditable system: full decision history
| Feature | What It Means |
|---|---|
| At-Most-Once | Tool calls never repeat, even after crashes |
| Crash Recovery | Resume from checkpoints, not from scratch |
| Deterministic Replay | Reproduce any run for debugging |
| Human-in-the-Loop | Pause for approval, resume without waste |
| Full Audit Trail | Every decision traced |
| Multi-Framework | Plug in LangChainGo/LangGraphGo/ADK |
| Use Case | Description |
|---|---|
| Human-in-the-Loop | Agents pause for approval, resume with full context |
| Compliance & Audit | Event-sourced traceability, replayable evidence |
| Local-First | Private cloud, air-gapped environments |

    # Install
    go install github.com/Colin4k1024/Aetheris/cmd/cli@latest

    # Or use Docker
    ./scripts/local-2.0-stack.sh start

    # Initialize
    aetheris init my-agent
    cd my-agent
    aetheris run

    # Monitor
    aetheris jobs list
    aetheris trace <job_id>

See the Getting Started Guide for details.
Build agents in Eino, run them on Aetheris for durability, replay, and audit.

    flowchart LR
        subgraph authoring["Authoring Layer (Eino-first)"]
            einoBuild["Eino Agent Construction"]
            otherFrameworks["Other Frameworks (Optional Legacy)"]
        end
        subgraph control["Aetheris Control Plane"]
            api["API / CLI / SDK Facade"]
            auth["Auth / RBAC / Audit Policy"]
        end
        subgraph data["Aetheris Data Plane (Runtime Core)"]
            scheduler["Lease Scheduler / Worker Coordinator"]
            runner["Durable Runner / Step Executor"]
            toolPlane["Tool Plane (Native + MCP Host)"]
            replay["Replay / Verify / Trace"]
        end
        subgraph storage["Durable Stores"]
            eventStore["Event Store (Append-only)"]
            checkpointStore["Checkpoint Store"]
            effectStore["Effect + Invocation Store"]
            jobStore["Job Metadata Store"]
        end
        authoring --> api
        api --> scheduler
        scheduler --> runner
        runner --> toolPlane
        runner --> eventStore
        runner --> checkpointStore
        runner --> effectStore
        scheduler --> jobStore
        replay --> eventStore
        auth --> api

The flow: Eino authoring → Aetheris runtime submission → scheduler/runner execution → durable events/checkpoints/effects → replay/verify/audit.
| Component | Path | Responsibility |
|---|---|---|
| API Server | `cmd/api/` | HTTP server (Hertz), creates and interacts with agents |
| Worker | `cmd/worker/` | Background execution worker, schedules and executes jobs |
| CLI | `cmd/cli/` | Command-line tool (init, chat, jobs, trace, replay, etc.) |
| AgentFactory | `internal/runtime/eino/agent_factory.go` | Config-driven Eino ADK agent creation (recommended entry point) |
| Tool Bridge | `internal/runtime/eino/tool_bridge.go` | Converts Aetheris RuntimeTool → Eino InvokableTool |
| Eino Engine | `internal/runtime/eino/engine.go` | Workflow compilation, runner management |
| Agent Runtime | `internal/agent/runtime/` | Core execution engine (DAG compiler + runner) |
| Job Store | `internal/agent/runtime/job/` | Event-sourced durable execution history (PostgreSQL) |
| Scheduler | `internal/agent/runtime/job/scheduler.go` | Leases and retries tasks with lease fencing |
| Runner | `internal/agent/runtime/runner/` | Step-level execution with checkpointing |
| Planner | `internal/agent/planner/` | Produces TaskGraph from goals |
| Executor | `internal/agent/runtime/executor/` | Executes DAG nodes using the Eino framework |
| Effects | `internal/agent/effects/` | At-most-once tool execution guarantee via Ledger |
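
The Scheduler row mentions lease fencing. As a rough illustration of that idea (not Aetheris's actual implementation), the sketch below assumes a hypothetical in-memory `LeaseTable` that issues a monotonically increasing token per job and rejects commits from any worker holding an older token. All type and function names here are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStaleLease is returned when a writer presents an outdated fencing token.
var ErrStaleLease = errors.New("stale lease token")

// LeaseTable is a hypothetical in-memory stand-in for the job store.
type LeaseTable struct {
	tokens map[string]uint64 // jobID -> newest fencing token issued
	cursor map[string]int    // jobID -> last committed step index
}

func NewLeaseTable() *LeaseTable {
	return &LeaseTable{tokens: map[string]uint64{}, cursor: map[string]int{}}
}

// Acquire issues a new lease for jobID and returns its fencing token.
// Each call invalidates every token issued earlier for the same job.
func (t *LeaseTable) Acquire(jobID string) uint64 {
	t.tokens[jobID]++
	return t.tokens[jobID]
}

// Commit records progress only if token is the newest one for jobID.
func (t *LeaseTable) Commit(jobID string, token uint64, step int) error {
	if token != t.tokens[jobID] {
		return ErrStaleLease
	}
	t.cursor[jobID] = step
	return nil
}

func main() {
	table := NewLeaseTable()
	old := table.Acquire("job-1")     // worker A takes the job
	current := table.Acquire("job-1") // lease expires, worker B takes over

	fmt.Println(table.Commit("job-1", old, 1))     // stale lease token
	fmt.Println(table.Commit("job-1", current, 1)) // <nil>
}
```

The point of the token is that a worker which paused (for example during a long GC or network partition) cannot overwrite progress made by the worker that replaced it.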

    User Message → API creates Job (dual-write: event stream + stateful Job)
      → Scheduler picks up pending Job
      → Runner.RunForJob: if Job.Cursor exists, restore from Checkpoint;
        otherwise PlanGoal → TaskGraph → Compiler → DAG
      → Steppable executes nodes one by one
      → Each node writes Checkpoint, updates Session.LastCheckpoint and Job.Cursor
      → Recovery resumes from next node

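
The resume-from-cursor behavior in the flow above can be sketched in a few lines of Go. This is a simplified model under stated assumptions: the DAG is flattened to a sequence, `Job`, `Node`, and this `RunForJob` are hypothetical stand-ins rather than the project's real types, and the checkpoint is just an in-memory slice.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a hypothetical, simplified job record.
// Cursor is the index of the next node to run.
type Job struct {
	Cursor int
	State  []string // stands in for checkpointed state
}

// Node is one step of the compiled DAG, flattened to a sequence for this sketch.
type Node func(state []string) ([]string, error)

// RunForJob runs nodes starting at job.Cursor and checkpoints after each one.
func RunForJob(job *Job, nodes []Node) error {
	for job.Cursor < len(nodes) {
		next, err := nodes[job.Cursor](job.State)
		if err != nil {
			return err // Cursor still points at the node that failed
		}
		job.State = next // "write checkpoint"
		job.Cursor++     // "update cursor"
	}
	return nil
}

func main() {
	calls := map[string]int{}
	crashOnce := true
	step := func(name string) Node {
		return func(s []string) ([]string, error) {
			calls[name]++
			return append(s, name), nil
		}
	}
	flaky := func(s []string) ([]string, error) {
		if crashOnce {
			crashOnce = false
			return nil, errors.New("worker crashed")
		}
		calls["charge"]++
		return append(s, "charge"), nil
	}
	nodes := []Node{step("plan"), flaky, step("notify")}

	job := &Job{}
	fmt.Println(RunForJob(job, nodes), job.Cursor) // worker crashed 1
	fmt.Println(RunForJob(job, nodes), job.Cursor) // <nil> 3
	fmt.Println(calls["plan"])                     // 1
}
```

The second call skips `plan` because the cursor already points past it. The sketch does not cover a crash that lands between a tool call and its checkpoint write; the component table assigns that case to the Effects ledger.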
| Concept | Description |
|---|---|
| Job | Durable task unit, survives worker crashes |
| Step | Single execution unit within a Job |
| Checkpoint | State snapshot after step completion, enables resume |
| Effect | External side effect record (API calls, DB writes) |
| Ledger | Tool invocation authorization ledger (guarantees at-most-once) |
| TaskGraph | Directed acyclic graph of step dependencies |
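
To make the Ledger concept more concrete, here is a minimal sketch of at-most-once invocation, assuming a hypothetical in-memory `Ledger` keyed by an idempotency key. The key format, the `Invoke` method, and `ErrOutcomeUnknown` are illustrative assumptions, not the project's API. The intent is recorded before the tool runs, so a retry after a crash either replays the recorded result or refuses to run again.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutcomeUnknown signals that an invocation was started earlier but never
// committed, so re-running it could repeat a side effect.
var ErrOutcomeUnknown = errors.New("invocation outcome unknown; not re-executing")

type entry struct {
	committed bool
	result    string
}

// Ledger is a hypothetical in-memory invocation ledger keyed by an idempotency key.
type Ledger struct {
	entries map[string]*entry
}

func NewLedger() *Ledger { return &Ledger{entries: map[string]*entry{}} }

// Invoke runs tool at most once per key.
func (l *Ledger) Invoke(key string, tool func() (string, error)) (string, error) {
	if e, ok := l.entries[key]; ok {
		if e.committed {
			return e.result, nil // replay the recorded result
		}
		return "", ErrOutcomeUnknown
	}
	l.entries[key] = &entry{} // record intent before touching the outside world
	res, err := tool()
	if err != nil {
		// The entry stays uncommitted, so later calls will not re-run the tool.
		// A real runtime would need to classify the failure before allowing a retry.
		return "", err
	}
	l.entries[key].committed = true
	l.entries[key].result = res
	return res, nil
}

func main() {
	ledger := NewLedger()
	charges := 0
	pay := func() (string, error) {
		charges++
		return "receipt-1", nil
	}
	key := "job-1/step-2/pay" // hypothetical key format

	first, _ := ledger.Invoke(key, pay)
	second, _ := ledger.Invoke(key, pay) // e.g. the step is retried after a crash
	fmt.Println(first, second, charges)  // receipt-1 receipt-1 1
}
```

Choosing at-most-once over at-least-once means an interrupted call surfaces as an explicit unknown outcome instead of being silently repeated, which is the safer default for payments and similar effects.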
Each step produces exactly one outcome:
| Outcome | Meaning |
|---|---|
| Pure | No side effects; safe to replay |
| SideEffectCommitted | World changed; must not re-execute |
| Retryable | Failure, world unchanged; retry allowed |
| PermanentFailure | Failure; job cannot continue |
| Compensated | Rollback applied; terminal state |
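
One way to read the outcome table is as a decision rule for recovery. The sketch below encodes the five outcome names from the table as a Go enum and maps each to what a runner might do when it encounters that recorded outcome. The `Action` type and `OnRecovery` function are hypothetical and only illustrate the table's semantics.

```go
package main

import "fmt"

// Outcome mirrors the five step outcomes listed in the table above.
type Outcome int

const (
	Pure Outcome = iota
	SideEffectCommitted
	Retryable
	PermanentFailure
	Compensated
)

// Action is a hypothetical description of what a runner does on recovery.
type Action string

const (
	ReExecute   Action = "re-execute"
	UseRecorded Action = "use recorded result"
	Retry       Action = "retry"
	Stop        Action = "stop job"
)

// OnRecovery maps a recorded outcome to a recovery action.
func OnRecovery(o Outcome) Action {
	switch o {
	case Pure:
		return ReExecute // no side effects, safe to replay
	case SideEffectCommitted:
		return UseRecorded // world changed, must not run again
	case Retryable:
		return Retry // failed with the world unchanged
	default: // PermanentFailure, Compensated
		return Stop
	}
}

func main() {
	for _, o := range []Outcome{Pure, SideEffectCommitted, Retryable, PermanentFailure, Compensated} {
		fmt.Println(OnRecovery(o))
	}
}
```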
| Guarantee | Description |
|---|---|
| At-Most-Once | Tool calls never repeat, even after crashes |
| Crash Recovery | Agents resume from checkpoints, not from scratch |
| Deterministic Replay | Reproduce any run for debugging or auditing |
| Event Sourcing | Full execution history as append-only event stream |
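
Event sourcing and deterministic replay fit together: if every step result is appended to a log, a run can be reconstructed from the log alone. The following sketch assumes a hypothetical `Event` shape and event type names (`JobCreated`, `StepCompleted`); the real Aetheris schema may differ.

```go
package main

import "fmt"

// Event is a hypothetical record in the append-only stream.
type Event struct {
	Seq    int
	Type   string // e.g. "StepCompleted"
	StepID string
	Output string
}

// EventLog only supports appending and reading; events are never updated.
type EventLog struct{ events []Event }

// Append adds an event with the next sequence number and returns it.
func (l *EventLog) Append(typ, stepID, output string) Event {
	e := Event{Seq: len(l.events) + 1, Type: typ, StepID: stepID, Output: output}
	l.events = append(l.events, e)
	return e
}

// Replay rebuilds step outputs purely from recorded events, without calling
// any model or tool, so the result is the same on every run.
func Replay(events []Event) map[string]string {
	outputs := map[string]string{}
	for _, e := range events {
		if e.Type == "StepCompleted" {
			outputs[e.StepID] = e.Output
		}
	}
	return outputs
}

func main() {
	stream := &EventLog{}
	stream.Append("JobCreated", "", "")
	stream.Append("StepCompleted", "plan", "3 tasks")
	stream.Append("StepCompleted", "search", "12 documents")

	state := Replay(stream.events)
	fmt.Println(len(stream.events), state["plan"], state["search"]) // 3 3 tasks 12 documents
}
```

Because `Replay` reads only recorded outputs, nondeterministic sources such as LLM responses are fixed at the time they were first logged.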
LLMs made agents possible.
Aetheris makes agents production-ready.
| Problem | Without Aetheris | With Aetheris |
|---|---|---|
| Worker crash | Restart from beginning | Resume from checkpoint |
| Duplicate calls | Possible ($$$ loss) | Guaranteed at-most-once |
| Debug | Guess what happened | Deterministic replay |
| Audit | Impossible | Full evidence chain |
| Human approval | Wastes resources | Job parked (`StatusParked`) until approved |
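
The human-approval row can be illustrated with a small state sketch. Only the name `StatusParked` comes from the table; the other statuses, the `Job` fields, and the `Park`, `Approve`, and `Runnable` helpers are assumptions made for this example. The idea is that a parked job is not handed to any worker, and it keeps its cursor so it resumes where it paused.

```go
package main

import "fmt"

type Status string

const (
	StatusPending Status = "pending" // hypothetical value
	StatusRunning Status = "running" // hypothetical value
	StatusParked  Status = "parked"  // name taken from the table above
)

// Job is a simplified, hypothetical job record.
type Job struct {
	ID     string
	Status Status
	Cursor int
}

// Park marks a running job as waiting for approval.
func Park(j *Job) bool {
	if j.Status != StatusRunning {
		return false
	}
	j.Status = StatusParked
	return true
}

// Approve makes a parked job schedulable again; Cursor is left untouched.
func Approve(j *Job) bool {
	if j.Status != StatusParked {
		return false
	}
	j.Status = StatusPending
	return true
}

// Runnable returns the jobs a scheduler would hand to workers.
func Runnable(jobs []*Job) []*Job {
	var out []*Job
	for _, j := range jobs {
		if j.Status == StatusPending {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	job := &Job{ID: "job-1", Status: StatusRunning, Cursor: 2}
	Park(job)
	fmt.Println(job.Status, len(Runnable([]*Job{job}))) // parked 0
	Approve(job)
	fmt.Println(job.Status, job.Cursor, len(Runnable([]*Job{job}))) // pending 2 1
}
```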
Discord • Discussions • Docs
⭐ Star us on GitHub!
Apache License 2.0, free for commercial use.
⭐ Star us. Build production agents. Ship with confidence.
This section helps improve searchability for specific use cases and related queries.
- durable AI agent execution runtime
- AI agent crash recovery and checkpoint
- production AI agent orchestration
- at-most-once tool execution AI
- event-sourced AI agent audit trail
- deterministic AI agent replay
- Go AI agent framework production
- LangChainGo production deployment
- LangGraphGo durability
- AI agent human-in-the-loop approval
- AI agent state management
- AI agent workflow checkpointing
- AI agent idempotency guarantee
- AI agent observability and tracing
- enterprise AI agent compliance
- AI agent local-first deployment
- AI agent private cloud
- AI agent air-gapped environment
- AI agent regulatory audit
- AI agent financial services compliance
- AI agent healthcare data handling
- AI agent checkpoint resume
- AI agent decision replay
- AI agent full stacktrace
- AI agent failure recovery
- AI agent retry with fencing
- AI agent lease management
- AI agent side effect ledger
- AI agent effect store
- Eino framework integration
- MCP server agent hosting
- AI agent MCP tool bridge
- ADK agent hosting runtime
- AI agent API gateway
- AI agent webhook integration