Proposal: Fold `atelet` into `ateom` #128

## Background

Today the actor lifecycle is split across two tightly coupled components:

- `atelet`: a node-level DaemonSet that the control plane talks to
- `ateom`: the worker component inside the worker pod that actually runs, checkpoints, and restores actors

To create a single worker, the control plane today coordinates **two RPCs**:
one to `atelet` and one to `ateom-gvisor`. The two processes also share
state through a host bind mount at `/run/ateom-gvisor` so they can hand
off snapshot files.

## Problem

This split presents three structural issues:

1. **Two-component coordination.** Every worker-lifecycle operation is a
    distributed transaction across `atelet` and `ateom`. Failures and
    partial states have to be reconciled by callers, and upgrades have to
    keep the two binaries version-compatible. Debugging means reading two
    sets of logs and reasoning about the handoff between them.

2. **Backend lock-in.** The split assumes the gVisor model (a node agent
    plus an in-sandbox helper). Adding a different worker backend
    (Firecracker, for example) is harder because support for it has to
    be built in both components.

3. **The shared host `/run` mount widens the blast radius.** The
    `atelet` ↔ `ateom` handoff requires a host bind mount at
    `/run/ateom-gvisor`. A
    misbehaving sandbox that fills that directory can exhaust `/run` on
    the node and take down every other pod on it. With per-pod state
    (no host mount), one bad sandbox only takes itself down.
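To make issue 1 concrete, here is a minimal Go sketch of what a caller has to do when a worker create spans two RPCs. The `NodeAgent` and `PodAgent` interfaces, their method names, and the in-memory fakes are hypothetical stand-ins invented for illustration; they are not the real `atelet` or `ateom` APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// NodeAgent and PodAgent are hypothetical stand-ins for the RPC surfaces of
// atelet and ateom. The method names are illustrative, not the real API.
type NodeAgent interface {
	PrepareWorker(ctx context.Context, id string) error
	CleanupWorker(ctx context.Context, id string) error
}

type PodAgent interface {
	StartWorker(ctx context.Context, id string) error
}

// CreateWorker runs the two-step create. If the second RPC fails, the caller
// has to undo the first one itself, which is the reconciliation burden
// described in issue 1.
func CreateWorker(ctx context.Context, node NodeAgent, pod PodAgent, id string) error {
	if err := node.PrepareWorker(ctx, id); err != nil {
		return fmt.Errorf("node agent prepare: %w", err)
	}
	if err := pod.StartWorker(ctx, id); err != nil {
		startErr := fmt.Errorf("pod agent start: %w", err)
		// Partial state: the node side is prepared, the pod side is not running.
		if cerr := node.CleanupWorker(ctx, id); cerr != nil {
			return errors.Join(startErr, fmt.Errorf("node agent cleanup: %w", cerr))
		}
		return startErr
	}
	return nil
}

// In-memory fakes used only to exercise the failure path.
type fakeNode struct{ prepared map[string]bool }

func (f *fakeNode) PrepareWorker(_ context.Context, id string) error {
	f.prepared[id] = true
	return nil
}

func (f *fakeNode) CleanupWorker(_ context.Context, id string) error {
	delete(f.prepared, id)
	return nil
}

type fakePod struct{ err error }

func (f *fakePod) StartWorker(context.Context, string) error { return f.err }

// simulateFailedStart reports whether node-side state is left behind after
// the pod-side RPC fails, along with the error the caller sees.
func simulateFailedStart() (leftover bool, err error) {
	node := &fakeNode{prepared: map[string]bool{}}
	pod := &fakePod{err: errors.New("sandbox failed to start")}
	err = CreateWorker(context.Background(), node, pod, "worker-1")
	return node.prepared["worker-1"], err
}
```

Calling `simulateFailedStart()` exercises the partial-failure path: the caller gets an error and the node-side state is cleaned up only because `CreateWorker` rolls it back explicitly. With a single per-pod agent, that rollback logic would not need to live in the caller.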

## Proposal

Remove `atelet` and consolidate its responsibilities into `ateom`
(running per-worker-pod), exposing a single control-plane-facing
interface. Concretely:

- Worker lifecycle RPCs (create / start / suspend / restore / destroy)
  become a single call to the per-pod agent.
- The backend (gVisor today, others later) lives behind an interface
  inside `ateom`; new backends plug in there.
- Snapshot/restore state stays inside the worker pod's own filesystem;
  no host mount is needed.
- In the future, we could standardize the API that `ateom` exposes,
  allowing out-of-tree `ateom` implementations.
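One way the bullets above could fit together is sketched below in Go. This is only an illustration under stated assumptions: the `Backend` interface, the `Agent` type, the method names, the backend registry, and the `/var/lib/ateom` state directory are all hypothetical and do not reflect the actual `ateom` code or API.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
)

// Backend is a hypothetical interface that a sandbox runtime (gVisor today,
// others later) would implement inside ateom.
type Backend interface {
	Create(ctx context.Context, workerID string) error
	Start(ctx context.Context, workerID string) error
	Checkpoint(ctx context.Context, workerID, dir string) error
	Restore(ctx context.Context, workerID, dir string) error
	Destroy(ctx context.Context, workerID string) error
}

// Agent is the single control-plane-facing entry point in this sketch.
// stateDir is assumed to live on the pod's own filesystem, not a host mount.
type Agent struct {
	backend  Backend
	stateDir string
}

// NewAgent selects a backend by name from a registry of implementations.
func NewAgent(backends map[string]Backend, name, stateDir string) (*Agent, error) {
	b, ok := backends[name]
	if !ok {
		return nil, fmt.Errorf("unknown backend %q", name)
	}
	return &Agent{backend: b, stateDir: stateDir}, nil
}

// snapshotDir returns the pod-local directory for a worker's snapshots.
func (a *Agent) snapshotDir(id string) string {
	return filepath.Join(a.stateDir, "snapshots", id)
}

// Each lifecycle operation is one call on the agent, delegated to the backend.
func (a *Agent) Create(ctx context.Context, id string) error  { return a.backend.Create(ctx, id) }
func (a *Agent) Start(ctx context.Context, id string) error   { return a.backend.Start(ctx, id) }
func (a *Agent) Destroy(ctx context.Context, id string) error { return a.backend.Destroy(ctx, id) }

func (a *Agent) Suspend(ctx context.Context, id string) error {
	return a.backend.Checkpoint(ctx, id, a.snapshotDir(id))
}

func (a *Agent) Restore(ctx context.Context, id string) error {
	return a.backend.Restore(ctx, id, a.snapshotDir(id))
}

// recordingBackend is a test double that records the calls it receives.
type recordingBackend struct{ calls []string }

func (r *recordingBackend) log(s string) error { r.calls = append(r.calls, s); return nil }

func (r *recordingBackend) Create(_ context.Context, id string) error  { return r.log("create " + id) }
func (r *recordingBackend) Start(_ context.Context, id string) error   { return r.log("start " + id) }
func (r *recordingBackend) Destroy(_ context.Context, id string) error { return r.log("destroy " + id) }

func (r *recordingBackend) Checkpoint(_ context.Context, id, dir string) error {
	return r.log("checkpoint " + id + " " + dir)
}

func (r *recordingBackend) Restore(_ context.Context, id, dir string) error {
	return r.log("restore " + id + " " + dir)
}

// demoLifecycle drives a full lifecycle through the agent and returns the
// calls the backend saw.
func demoLifecycle() ([]string, error) {
	rec := &recordingBackend{}
	agent, err := NewAgent(map[string]Backend{"recording": rec}, "recording", "/var/lib/ateom")
	if err != nil {
		return nil, err
	}
	ctx := context.Background()
	steps := []func(context.Context, string) error{
		agent.Create, agent.Start, agent.Suspend, agent.Restore, agent.Destroy,
	}
	for _, step := range steps {
		if err := step(ctx, "worker-1"); err != nil {
			return nil, err
		}
	}
	return rec.calls, nil
}
```

In this shape, adding a new backend means adding one implementation of `Backend` and registering it by name, while the control plane keeps calling the same `Agent` methods. Snapshot paths are derived from the agent's own state directory, so nothing in the sketch depends on a shared host path.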

This is not possible today because of how `ateom` does networking, but once https://github.com/agent-substrate/substrate/pull/110 lands, we can implement this proposal.
