# Ray Actors in Detail

© 2025, Anyscale. All Rights Reserved

This document provides an introduction to Ray Actors, which extend the Ray API from functions (tasks) to classes.


<div class="alert alert-block alert-info">

<b> Here is the roadmap for this notebook </b>

<ol>
  <li>Overview and setup</li>
  <li>Simple actor submission (creating, executing, and getting results)</li>
  <li>Actor resource fulfillment and scheduling</li>
  <li>Fault tolerance with Actors</li>
  <li>Multi-threading with Actors</li>
  <li>Asyncio with Actors</li>
  <li>Placement groups</li>
  <li>Actor pool abstraction</li>
</ol>
</div>

**Imports**

In [None]:
import asyncio
import os
import time
import threading


import ray
from ray.util import ActorPool
from ray.util.placement_group import placement_group, remove_placement_group
from ray.util.scheduling_strategies import (
    NodeAffinitySchedulingStrategy,
    PlacementGroupSchedulingStrategy,
)

## Simple actor submission (creating, executing, and getting results)

Actors extend the Ray API from functions (tasks) to classes.

An actor is a stateful worker. When a new actor is instantiated, Ray creates a new worker process; the actor's methods are scheduled on that specific worker and can read and mutate its state. Like Ray tasks, actors support CPU and GPU resource requests, including fractional ones.

Let's look at an example of an actor which maintains a running balance.

In [None]:
@ray.remote
class Accounting:
    def __init__(self):
        # Stored as `balance` so the attribute does not shadow the `total` method.
        self.balance = 0

    def add(self, amount):
        self.balance += amount

    def remove(self, amount):
        self.balance -= amount

    def total(self):
        return self.balance

<div class="alert alert-info">
  <strong><a href="https://docs.ray.io/en/latest/ray-core/key-concepts.html#actors" target="_blank">Actor</a></strong> is a remote, stateful Python class.
</div>

<div class="alert alert-info">

The most common use case for actors is state that is never mutated but is expensive to load, such as a large AI model: we load it once and then route calls to it over time.

</div>

Define an actor with the `@ray.remote` decorator, then call `<class_name>.remote()` to ask Ray to construct an instance of this actor somewhere in the cluster.

We get an actor handle which we can use to communicate with that actor, pass to other code, tasks, or actors, etc.

In [None]:
acc = Accounting.remote()

We can send a message to an actor -- with RPC semantics -- by using `<handle>.<method_name>.remote()`

In [None]:
acc.total.remote()

As with tasks, the call returns an object ref immediately; use `ray.get` to retrieve the value.

In [None]:
ray.get(acc.total.remote())

We can mutate the state inside this actor instance

In [None]:
acc.add.remote(100)

In [None]:
acc.remove.remote(10)

In [None]:
ray.get(acc.total.remote())

### Activity: Linear Model Inference

<div class="alert alert-block alert-info">

__Activity: linear model inference__

* Create an actor which applies a model to convert Celsius temperatures to Fahrenheit
* The constructor should take model weights (w1 and w0) and store them as instance state
* A convert method should take a scalar, multiply it by w1 then add w0 (weights retrieved from instance state) and then return the result


In [None]:
# Hint: define the below as a remote actor
class LinearModel:
    def __init__(self, w0, w1):
        """Hint: store the weights"""

    def convert(self, celsius):
        """Hint: convert the celsius temperature to Fahrenheit."""

# Hint: create an instance of the LinearModel actor

# Hint: convert 100 Celsius to Fahrenheit

</div>

In [None]:
# Write your solution here

<div class="alert alert-block alert-info">

<details>

<summary> Click to see solution </summary>

```python
@ray.remote
class LinearModel:
    def __init__(self, w0, w1):
        self.w0 = w0
        self.w1 = w1

    def convert(self, celsius):
        return self.w1 * celsius + self.w0

model = LinearModel.remote(w1=9/5, w0=32)
ray.get(model.convert.remote(100))
``` 

</details>

</div>


## Actor resource fulfillment and scheduling

Actors reserve resources for their entire lifetime. Method calls (actor tasks) execute on the same worker process that hosts the actor.

- An actor's resource shape is specified on the class via `@ray.remote(...)` or with `.options(...)` at construction time.
- Actor methods run on that dedicated worker; they do not request additional resources beyond what the actor already holds.
- Calls to the same actor are queued and executed according to its concurrency settings (default: one at a time).

In [None]:
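# Assumes the cluster advertises GPUs and a custom "db" resource.
# Without them, actors of this class stay pending and calls to them block.
@ray.remote(num_cpus=2, num_gpus=0.5, resources={"db": 1})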
class ModelServer:
    def __init__(self):
        self.ready = True

    def infer(self, x):
        return x * 2

# Placement: let Ray decide (default)
srv = ModelServer.remote()

# Placement: spread actors across nodes
srv_spread = ModelServer.options(scheduling_strategy="SPREAD").remote()

# Placement: node affinity
node_id = ray.get_runtime_context().get_node_id()
affinity = NodeAffinitySchedulingStrategy(node_id=node_id, soft=True)
srv_aff = ModelServer.options(scheduling_strategy=affinity).remote()

ray.get(srv.infer.remote(21))

### How actor placement is chosen

| Rule | When | Behavior |
| --- | --- | --- |
| Data locality | Actor args include large `ObjectRef`s | Prefer node with most bytes local |
| Node affinity | `scheduling_strategy=NodeAffinitySchedulingStrategy(...)` | Try preferred node; fallback if `soft=True` |
| Default | No preferences | Use caller's local raylet if resources fit |

### Execution model
- Actor creation is a placement decision; resources are leased to the actor's worker for its full lifetime.
- Actor method calls reuse that worker (no per-call placement), honoring FIFO and concurrency limits.
- To scale throughput, create multiple actors or increase `max_concurrency` (see Multithreaded/Async actors).

<div class="alert alert-info">
  <b>Tip:</b> Use fractional resources (e.g., <code>num_cpus=0.5</code>) for I/O-heavy actors to pack more per node.
  Inspect <code>ray.available_resources()</code> and <code>ray.cluster_resources()</code> to reason about placement capacity.
  Consider <code>SPREAD</code> to avoid hotspots when launching many actors.
</div>


## Fault tolerance with Actors

Actors can automatically restart on failure. Configure restart behavior on the class or at construction.

In [None]:
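# max_task_retries is left at its default (0) here: retrying crash() after
# each restart would crash the actor again and exhaust max_restarts.
@ray.remote(max_restarts=2)
class Unstable:
    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1
        return self.n

    def crash(self):
        os._exit(1)  # simulate hard failure

a = Unstable.remote()
try:
    ray.get(a.crash.remote())  # kills the worker process; Ray restarts the actor
except ray.exceptions.RayActorError:
    pass  # the in-flight call fails because it is not retried
ray.get(a.bump.remote())  # runs on the restarted actor; __init__ reset its state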

### Key behaviors
- `max_restarts`: How many times to recreate the actor after process or node failures (default 0: never restart).
- `max_task_retries`: How many times to retry an actor method that fails because of a system error such as an actor crash (default 0: no retries).
- Application exceptions from methods propagate as `RayTaskError` to the caller.

### Preserving state across restarts
Actor memory is process-local. After a restart, you must restore state explicitly.

In [None]:
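@ray.remote(max_restarts=3)
class Checkpointed:
    def __init__(self, ckpt=None):
        # Ray resolves a top-level ObjectRef argument to its value before
        # __init__ runs, so `ckpt` arrives here as a plain dict (or None).
        self.state = dict(ckpt) if ckpt is not None else {"sum": 0}

    def add(self, x):
        self.state["sum"] += x
        return self.state["sum"]

    def checkpoint(self):
        # Return a copy of the state; the caller decides where to store it.
        return dict(self.state)

ckpt_actor = Checkpointed.remote()
ray.get([ckpt_actor.add.remote(i) for i in range(10)])
# Store the checkpoint from the driver so that the driver, not the actor,
# owns the object and it outlives the actor process.
ckpt_ref = ray.put(ray.get(ckpt_actor.checkpoint.remote()))

# Recreate using checkpoint (e.g., after failure)
ckpt_actor2 = Checkpointed.remote(ckpt_ref)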

### Detached actors
Make long-lived, globally named services resilient to driver exits.

In [None]:
svc = ModelServer.options(lifetime="detached", name="global_model").remote()
# Later (or from another driver connected to the same namespace):
svc = ray.get_actor("global_model")

### Killing actors

In [None]:
# Forcefully terminate the actor; no_restart=True (the default) also prevents an automatic restart
ray.kill(a, no_restart=True)

<div class="alert alert-warning">
<b>Note:</b> Restarted actors run <code>__init__</code> again. Implement idempotent initialization and explicit restore paths.
</div>


## Multithreaded actors

By default, an actor runs one method at a time. Increase parallelism with `max_concurrency` and ensure thread-safety.

In [None]:
@ray.remote(max_concurrency=8)
class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, x):
        time.sleep(0.1)  # simulate work
        with self._lock:
            self.value += x
            return self.value

c = Counter.remote()
refs = [c.add.remote(1) for _ in range(32)]
ray.get(refs)  # up to 8 run concurrently

Guidelines:
- Protect shared mutable state with locks or use immutable updates.
- Use higher `max_concurrency` for I/O-bound actors; keep modest for CPU-bound to avoid oversubscription.
- For CPU-heavy parallelism, prefer multiple actors to scale across cores/nodes.


## Async actors

Async actors run an asyncio event loop; methods declared with `async def` can interleave via `await` points. Concurrency is bounded by `max_concurrency`.

In [None]:
@ray.remote(max_concurrency=16)
class AsyncWorker:
    async def work(self, i):
        await asyncio.sleep(0.2)
        return i * i

aw = AsyncWorker.remote()
results = ray.get([aw.work.remote(i) for i in range(20)])

Patterns:
- Prefer async actors for network-bound or timer-based workflows; use `await` to yield.
- Combine async with backpressure at the caller (e.g., submit N calls, `ray.wait`, then submit more).
- You can define sync and async methods in the same actor, but sync methods run on the event loop and block other calls until they return.

<div class="alert alert-info">
  <b>Tip:</b> Async actors avoid Python thread contention and can scale high-concurrency I/O. Set <code>max_concurrency</code> to the target in-flight operations.
</div>


## Placement groups (bundle-aware placement)

Placement groups atomically reserve groups of resources, called bundles, across one or more nodes: either every bundle is reserved or none is. They are useful when you need:
- Multiple actors/tasks to be co-located on the same node (e.g., pipeline stages sharing data)
- Gang scheduling for tightly coupled components (e.g., parameter server + workers)
- Reserved capacity before launching a topology of actors

In [None]:
# Create a placement group with two bundles; PACK tries to put them on the same node (best effort)
pg = placement_group(
    [
        {"CPU": 2},  # bundle 0
        {"CPU": 2},  # bundle 1
    ],
    strategy="PACK",
    name="actors_pg",
)
ray.get(pg.ready())


# Stage that will be scheduled into bundle 0
@ray.remote(num_cpus=2)
class StageA:
    def run(self, x):
        return x + 1


# Stage that will be scheduled into bundle 1
@ray.remote(num_cpus=2)
class StageB:
    def run(self, x):
        return x * 2


bundle0 = PlacementGroupSchedulingStrategy(
    placement_group=pg, placement_group_bundle_index=0
)
bundle1 = PlacementGroupSchedulingStrategy(
    placement_group=pg, placement_group_bundle_index=1
)

a = StageA.options(scheduling_strategy=bundle0).remote()
b = StageB.options(scheduling_strategy=bundle1).remote()

res = ray.get(b.run.remote(ray.get(a.run.remote(10))))

# Cleanup when done
remove_placement_group(pg)

### Strategies
- PACK: Prefer to place all bundles on as few nodes as possible (good for locality).
- SPREAD: Spread bundles across nodes (fault isolation, bandwidth).
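- STRICT_PACK / STRICT_SPREAD: Hard constraints (all bundles on one node / each bundle on a different node); the group is not created, and stays pending, if the constraint cannot be met.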

### Best practices
- Create the placement group first, wait for `pg.ready()` before launching actors to avoid queuing delays.
- Use `PlacementGroupSchedulingStrategy` on each actor/task that must reserve from the group.
- Right-size bundles to the actors/tasks that will occupy them (avoid internal fragmentation).
- For elastic topologies, prefer multiple smaller placement groups over a single monolith.

<div class="alert alert-warning">
<b>Note:</b> Placement groups reserve capacity; they can increase pending time if the cluster is busy. Use them when co-placement matters.
</div>


## ActorPool (simple worker pool over actors)

`ray.util.ActorPool` provides a lightweight way to manage a pool of homogeneous actors and submit many small jobs with automatic load balancing.

In [None]:
@ray.remote
class Worker:
    def process(self, x):
        return x * x

# Create N actors
workers = [Worker.remote() for _ in range(4)]
pool = ActorPool(workers)

# Map over inputs (results are yielded in input order; use map_unordered for completion order)
inputs = range(10)
results = list(pool.map(lambda a, x: a.process.remote(x), inputs))

# Or submit tasks incrementally; get_next() returns results in submission order
# (use get_next_unordered() to consume results as they finish)
for x in range(10, 20):
    pool.submit(lambda a, v: a.process.remote(v), x)

ready = [pool.get_next() for _ in range(10)]

When to use:
- Many short, similar actor method calls; you want automatic fair scheduling across a fixed set of actors.
- Simple replacement for manual round-robin over actor handles.

Prefer alternatives when:
- You need heterogeneous actors or topology (use multiple actor types or placement groups).
- You need backpressure/windowed submission (combine with `ray.wait` or use async actors with queues).