# Ray Actors Revisited

© 2019-2022, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademyLogo.png)

The [Ray Crash Course](../ray-crash-course/00-Ray-Crash-Course-Overview.ipynb) introduced the core concepts of Ray's API and how they parallelize work. Specifically, we learned how to define Ray _tasks_ and _actors_, run them, and retrieve the results. 

This lesson explores Ray actors in greater depth, including the following:

* Detached actors
* Specifying limits on the number of invocations and retries on failure
* Profiling actors

In [None]:
import ray, time, sys, os 
import numpy as np 
sys.path.append("..")
from util.printing import pd  # convenience methods for printing results.

In [None]:
ray.init(ignore_reinit_error=True)

The Ray Dashboard URL is printed above. It is also available as the `webui_url` item of the dictionary returned by `ray.init()` (the key name varies between Ray versions).

(When using the Anyscale platform, use the URL provided by your instructor to access the Ray Dashboard.)

## Detached Actors

[Detached actors](https://docs.ray.io/en/latest/advanced.html#detached-actors) are designed to be long-lived actors that can be referenced by name and must be explicitly cleaned up. Unlike regular actors, they are not deleted automatically when all references to them go out of scope.

Detached actors are useful for "services", where different tasks and actors in the application want to look up an actor and use it.

> **Note:** This is an evolving feature. Check the [documentation](https://docs.ray.io/en/latest/advanced.html#detached-actors) for the latest details.

Here is an example of a "normal" actor definition:

In [None]:
@ray.remote
class Counter:
    def __init__(self):
        self.label = 'Counter'
        self.count = 0
    def next(self):
        self.count += 1
        return self.count

Now create a detached instance of it.

In [None]:
counter = Counter.options(name="Counter1", lifetime="detached").remote()

Then we can use it "somewhere else":

In [None]:
c = ray.get_actor("Counter1")  # look up the detached actor by name
print(ray.get([c.next.remote() for _ in range(100)]))

See also the notes on detached actors and actor lifecycles in the lesson [03: Ray Internals](03-Ray-Internals.ipynb), as well as the [detached actors](https://docs.ray.io/en/latest/advanced.html#detached-actors) documentation.

To kill a detached actor, use `ray.kill()`:

In [None]:
ray.kill(c)

### Limitations

This is a new feature with two limitations, both of which will be fixed in a forthcoming release of Ray.

Currently, `ray.kill()` kills the actor but does not remove its name from the registration table. Hence, it isn't possible to register a new instance with the same name.

If the actor was created with a `max_restarts` value other than zero (discussed in the next section), the actor will be restarted up to `max_restarts` times, or indefinitely if the value was set to -1.

A `no_restart=True|False` keyword argument is being added to `ray.kill()` for this situation:

```python
c = ray.get_actor("Counter1")
ray.kill(c, no_restart=True)  # new optional keyword argument
```

Passing `no_restart=True` will be necessary for these actors.

## Limiting Actor Invocations and Retries on Failure

> **Note:** This feature may change in a future version of Ray. See the latest details in the [Ray documentation](https://docs.ray.io/en/latest/package-ref.html#ray.remote). 

Two options you can pass to `ray.remote` when defining an actor control how often it is restarted and how its tasks are retried on failure:

* `max_restarts`: This specifies the maximum number of times that the actor should be restarted when it dies unexpectedly. The minimum valid value is 0 (default), which indicates that the actor doesn't need to be restarted. A value of -1 indicates that an actor should be restarted indefinitely.
* `max_task_retries`: How many times to retry an actor task if the task fails due to a system error, e.g., the actor has died. If set to -1, the system will retry the failed task until the task succeeds or the actor has reached its `max_restarts` limit. If set to a value `n` greater than 0, the system will retry the failed task up to `n` times, after which the task will raise a `RayActorError` exception when `ray.get` attempts to retrieve a result. Note that Python exceptions are not considered system errors and will not trigger retries.

Example:

```python
@ray.remote(max_restarts=-1, max_task_retries=-1)
class Foo():
    pass
```

See the [ray.remote()](https://docs.ray.io/en/latest/package-ref.html#ray.remote) documentation for all the keyword arguments supported.

### Overriding with options()

Remote task and actor objects returned by `@ray.remote` can also be modified dynamically by calling `options()` with the same arguments that `ray.remote()` supports, as in the following example:

```python
@ray.remote(num_cpus=2, resources={"CustomResource": 1})
class Foo:
    def method(self):
        return 1
Bar = Foo.options(num_cpus=1, resources=None)
```

## Actor Performance Profiling with the Ray Dashboard

In lesson [01: Ray Tasks Revisited](01-Ray-Tasks-Revisited.ipynb), we learned how to profile task performance using [ray.timeline(file)](https://ray.readthedocs.io/en/latest/package-ref.html#ray.timeline) and a Chrome web browser to view the data.

Now we'll investigate how to profile performance of Ray actors using the Ray Dashboard ([documentation](https://ray.readthedocs.io/en/latest/ray-dashboard.html#ray-dashboard)). 

First, let's redefine the _Conway's Game of Life_ code we used in [02: Ray Actors](../ray-crash-course/02-Ray-Actors.ipynb) in the [Ray Crash Course](../ray-crash-course/00-Ray-Crash-Course-Overview.ipynb) tutorial. We've simplified a few details and pulled the definitions of `RayConwaysRules` and `State` into `RayGame` for easier distribution of everything over a cluster.

This same code will be used in the exercise below. You can also find it in the file [game_of_life_2.py](game_of_life_2.py).

In [None]:
@ray.remote
class RayGame:
    # TODO: Game memory grows unbounded; trim older states?
    def __init__(self, grid_size, rules_ref):
        self.states = [RayGame.State(size = grid_size)]
        self.rules_ref = rules_ref

    def get_states(self):
        return self.states

    def step(self, num_steps = 1):
        """Take 1 or more steps, returning a list of new states."""
        start_index = len(self.states)
        for _ in range(num_steps):
            new_state_ref = self.rules_ref.step.remote(self.states[-1])
            self.states.append(ray.get(new_state_ref))
        return self.states[start_index:]  # return the new states only!

    @ray.remote
    class RayConwaysRules:
        """
        Apply the rules to a state and return a new state.
        """
        def step(self, state):
            """
            Determine the next values for all the cells, based on the current
            state. Creates a new State with the changes.
            """
            new_grid = state.grid.copy()
            for i in range(state.size):
                for j in range(state.size):
                    lns = self.live_neighbors(i, j, state)
                    new_grid[i][j] = self.apply_rules(i, j, lns, state)
            new_state = RayGame.State(grid = new_grid)
            return new_state

        def apply_rules(self, i, j, live_neighbors, state):
            """
            Determine next value for a cell, which could be the same.
            The rules for Conway's Game of Life:
                Any live cell with fewer than two live neighbours dies, as if by underpopulation.
                Any live cell with two or three live neighbours lives on to the next generation.
                Any live cell with more than three live neighbours dies, as if by overpopulation.
                Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
            """
            cell = state.grid[i][j]  # default value is no change in state
            if cell == 1:
                if live_neighbors < 2 or live_neighbors > 3:
                    cell = 0
            elif live_neighbors == 3:
                cell = 1
            return cell

        def live_neighbors(self, i, j, state):
            """
            Wrap at boundaries (i.e., treat the grid as a 2-dim "toroid")
            To wrap at boundaries, when k-1=-1, that wraps itself;
            for k+1=state.size, we mod it (which works for -1, too)
            For simplicity, we count the cell itself, then subtract it
            """
            s = state.size
            g = state.grid
            return sum([g[i2%s][j2%s] for i2 in [i-1,i,i+1] for j2 in [j-1,j,j+1]]) - g[i][j]

    class State:
        """
        Represents a grid of game cells.
        For simplicity, require square grids.
        Each instance is considered immutable.
        """
        def __init__(self, grid = None, size = 10):
            """
            Create a State. Specify either a grid of cells or a size, for
            which a size x size grid will be computed with random values.
            (For simplicity, only use square grids.)
            """
            if grid is not None:  # `if grid:` would raise ValueError for NumPy arrays
                assert grid.shape[0] == grid.shape[1]
                self.size = grid.shape[0]
                self.grid = grid.copy()
            else:
                self.size = size
                # Seed: random initialization
                self.grid = np.random.randint(2, size = size*size).reshape((size, size))


        def living_cells(self):
            """
            Returns ([x1, x2, ...], [y1, y2, ...]) for all living cells.
            Simplifies graphing.
            """
            cells = [(i,j) for i in range(self.size) for j in range(self.size) if self.grid[i][j] == 1]
            return zip(*cells)

        def __str__(self):
            s = ' |\n| '.join([' '.join(map(lambda x: '*' if x else ' ', self.grid[i])) for i in range(self.size)])
            return '| ' + s + ' |'

Finally, here is a timing function similar to the one used in the earlier lessons.

In [None]:
def time_ray_games(num_games = 1, max_steps = 100, batch_size = 1, grid_size = 100):
    rules_refs = []
    game_refs = []
    for i in range(num_games):
        rules_ref = RayGame.RayConwaysRules.remote()
        game_ref  = RayGame.remote(grid_size, rules_ref)
        game_refs.append(game_ref)
        rules_refs.append(rules_ref)
    print(f'rules_refs:\n{rules_refs}')  # these will produce more interesting flame graphs!
    print(f'game_refs:\n{game_refs}')
    start = time.time()
    state_refs = []
    for game_ref in game_refs:
        for _ in range(max_steps // batch_size):  # max_steps game steps in total, i.e., max_steps/batch_size calls
            state_refs.append(game_ref.step.remote(batch_size))
    ray.get(state_refs)  # wait for everything to finish! We are ignoring what ray.get() returns, but what will it be??
    pd(time.time() - start, prefix = f'Total time for {num_games} games (max_steps = {max_steps}, batch_size = {batch_size})')

### Ray Dashboard Profiling

The Ray Dashboard provides an easy way to profile execution of Ray actors, producing [flame graphs](http://www.brendangregg.com/flamegraphs.html), which show performance characteristics of the application. This feature uses [py-spy](https://github.com/benfred/py-spy) to instrument and profile the application.

> **WARNING:** Py-spy requires `sudo` access. When you follow the instructions we are about to give, you may see a message that `sudo` is required, but there's no way to enter the password in the Dashboard. 

You won't see this issue if you are using the Anyscale platform for this tutorial; `sudo` access is already set up as needed.

If you are working on your laptop, we discuss fixes and workarounds for the `sudo` issue in [Troubleshooting, Tips, and Tricks](reference/Troubleshooting-Tips-Tricks.ipynb#Profiling-Actors). For example, one workaround is to run the application you want to profile outside a notebook, using a command line. Then if the Dashboard needs a password, you will be prompted at the terminal. 

In any case, we'll describe the profiling process here and demonstrate it in a video and at live tutorial events.

To profile with the Dashboard, click the _Logical View_ tab. It shows a list of actors that have been executed or are running. Find the running actor that appears to be the one you want to profile. You'll see a line like this:

> Actor <hex_number> (Profile for 10s 30s 60s) Kill Actor

The _10s, 30s, 60s_ are links. We'll use one of them to profile our `RayGame` performance.

More specifically, use this procedure, which helps you find the correct actor:

1. Run the next cell.
2. Copy the hex number shown in its output, which looks something like `[Actor(RayGame, a139e2970100)]`.
3. Immediately go to the Dashboard's _Logical View_.
4. Search for the actor: press CTRL-F (Windows/Linux) or CMD-F (macOS) and enter the hex code.
5. For the two actors found, one for `RayGame` and one for `RayGame.RayConwaysRules`, click _10s_ to profile them.
6. When the profile run finishes, click _Profile results_ for each one to see the _flame graphs_ in other tabs.

In [None]:
%time time_ray_games(num_games = 1, max_steps = 400, batch_size = 50, grid_size = 100)

At the top of each graph click the small left or right arrow next to _py-spy_ to see different pages of output. 

Cycle between _Time Order_, _Left Heavy_, and _Sandwich_ in the upper-left hand corner to see how they change the displayed information. Often _Left Heavy_ is most immediately useful.

For the `RayGame` graph, there is little interesting information, as it mostly waits for `RayConwaysRules` to crunch through grids of numbers. The function calls you see are primarily background networking and actor messaging:
![Conway's GoL Flame Graph](../images/RayGame-FlameGraph.png)

The `RayConwaysRules` display is more interesting. Here is a screenshot of the _Time Order_ view of the _flame graph_.
![Conway's GoL Flame Graph](../images/RayConwaysRules-FlameGraph.png)

We are looking at the call stack, with the innermost (top-of-stack) frames at the bottom. It shows that the _list comprehension_ in the `RayConwaysRules.live_neighbors` method, along with the rest of that method's work, takes the most time. That's where you would want to optimize performance, if possible!
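One common way to attack this kind of bottleneck, a technique different from the one used in the solutions notebook, is to replace the per-cell Python loop with NumPy array operations. The sketch below assumes the same toroidal wrapping as `live_neighbors()`: it counts neighbors for all cells at once by summing eight shifted copies of the grid with `np.roll`, and checks the result against a loop equivalent to the original method. The function names are hypothetical.

```python
import numpy as np

def live_neighbors_loop(grid):
    """Per-cell counts, mirroring RayConwaysRules.live_neighbors (toroidal wrap)."""
    s = grid.shape[0]
    counts = np.zeros_like(grid)
    for i in range(s):
        for j in range(s):
            counts[i, j] = sum(grid[i2 % s][j2 % s]
                               for i2 in (i - 1, i, i + 1)
                               for j2 in (j - 1, j, j + 1)) - grid[i, j]
    return counts

def live_neighbors_vectorized(grid):
    """All counts at once: sum the 8 shifted copies; np.roll wraps at the edges."""
    return sum(np.roll(np.roll(grid, di, axis=0), dj, axis=1)
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0))

def step_vectorized(grid):
    """Conway's rules on the whole grid: birth with 3 neighbors, survival with 2 or 3."""
    n = live_neighbors_vectorized(grid)
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(grid.dtype)

grid = np.random.default_rng(0).integers(0, 2, size=(50, 50))
print(np.array_equal(live_neighbors_loop(grid), live_neighbors_vectorized(grid)))  # True
```

The trade-off is readability: the array expressions hide the per-cell rules, which is exactly the kind of obscured logic the exercise below warns about.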

> **Tip:** 
>
> 1. This view is called the _speedscope_ view of the data, which shows the flame graph. You can learn more about navigating and using this tool at the [speedscope GitHub site](https://github.com/jlfwong/speedscope).
> 2. The [Ray Dashboard documentation](https://ray.readthedocs.io/en/latest/ray-dashboard.html#debugging-a-blocked-actor) offers tips for using the _Logical View_ to debug actor issues.

## Exercise 4 - "Homework"

This exercise is more involved and will mostly appeal to those of you who like the challenges of optimizing low-level performance. Also, for a live tutorial event, this exercise is too much to take on in the limited time available. That's why it's labelled _Homework_. I encourage you to read the solution though.

We already showed that running whole games as actors gives us a performance boost when we need to run many of them at once, simply because the games run concurrently across the available CPU cores.

This exercise explores whether or not we can improve the performance of a single game. This task isn't easy. We'll try a few ideas, but find that many don't provide much improvement. Only low-level optimizations provide significant improvement at this point. That could be important in a massive system where every bit of efficiency matters, but _premature optimization is the root of all evil_. Why? Because optimizations often obscure the logic of the code, making it harder to maintain and more prone to bugs. This code does basic math, but a lot of it, and sometimes it's better to crank through it in a single thread.

Try running the same code again. This time watch the _Machine View_ of the Ray Dashboard. How is the load distributed over the workers?

In [None]:
%time time_ray_games(num_games = 1, max_steps = 400, batch_size = 50, grid_size = 100)

You'll notice that **one** core was pegged running `RayConwaysRules.step()`. Profiling the run and looking at the flame graph provides more information. As we showed previously in this lesson, most of the time is taken up in `live_neighbors()`, and in that method, more than half the time is spent in the _list comprehension_.

For the exercise, there are a few things you could try.

Since profiling shows that `live_neighbors` is the bottleneck, what could be done to reduce its execution time? 

The solution discussed in the [solutions notebook](solutions/Advanced-Ray-Solutions.ipynb) shows that you can in fact reduce its overhead by about 40%. Not bad. The trick is to process the grid updates in parallel, in blocks of rows at a time, rather than synchronously iterating through the grid cells (i.e., a "block" the size of the whole grid). For larger game sizes, the improvement should be more noticeable.
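The row-block idea can be sketched with NumPy alone. Each block of rows, padded with one wrapped "halo" row above and below, has everything it needs to compute its next state, so blocks are independent of one another. This sketch runs the blocks sequentially and is not the solutions notebook's code; in a Ray version, each `step_block` call could become a remote task. All names are hypothetical.

```python
import numpy as np

def neighbor_counts(padded):
    """Neighbor counts for the interior rows of a block padded with one halo row
    above and below. Rows come from the halo; columns wrap with np.roll."""
    b = padded.shape[0] - 2              # number of interior rows
    total = np.zeros_like(padded[1:-1])
    for di in (0, 1, 2):                 # row offsets into the padded block
        rows = padded[di:di + b]
        for dj in (-1, 0, 1):
            if (di, dj) != (1, 0):       # skip the cell itself
                total += np.roll(rows, dj, axis=1)
    return total

def step_block(grid, start, stop):
    """Next state for rows [start, stop) of a toroidal grid, using only that block plus halos."""
    s = grid.shape[0]
    rows = np.arange(start - 1, stop + 1) % s   # wrapped halo rows on both sides
    n = neighbor_counts(grid[rows])
    alive = grid[start:stop] == 1
    return ((n == 3) | (alive & (n == 2))).astype(grid.dtype)

def step_in_blocks(grid, block_rows):
    """Compute each row block independently, then stitch the results together."""
    s = grid.shape[0]
    blocks = [step_block(grid, start, min(start + block_rows, s))
              for start in range(0, s, block_rows)]
    return np.vstack(blocks)

grid = np.random.default_rng(0).integers(0, 2, size=(12, 12))
whole = step_block(grid, 0, grid.shape[0])     # one block covering the whole grid
blocked = step_in_blocks(grid, block_rows=5)   # blocks of 5, 5, and 2 rows
print(np.array_equal(whole, blocked))  # True
```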

But let's step back for a moment; this is the sort of optimization you do when you _really_ have a compelling reason to squeeze optimal performance out of the code. Hence, for this exercise, it's probably overkill, unless you're interested in low-level performance optimizations like this. If you are, see the discussion in the _Solutions_ notebook.

Other, easier experiments may not produce much improvement, based on the flame graph results above, but consider trying them for "practice". Look at `RayGame.step()` and `RayConwaysRules.step()`. There are a lot of remote calls in there. What refactoring might improve performance? For example, what about extending `RayConwaysRules.step()` to accept a `num_steps` argument, as `RayGame.step()` does, then modifying the call to it from `RayGame.step()`? Does this actually improve performance or not? Don't forget to watch the Dashboard.

In [None]:
ray.shutdown()  # "Undo ray.init()".

The next lesson, [Ray Internals](03-Ray-Internals.ipynb), explores the architecture of Ray, task scheduling, the Object Store, etc.