# Solutions to exercises in the `ray-core` Lessons

First, import everything we'll need and start Ray:

In [1]:
import ray, time, sys
import numpy as np

In [2]:
def pnd(n, duration, prefix=''):
    """Print an integer and a time duration, with an optional prefix."""
    prefix2 = prefix if len(prefix) == 0 else prefix+' '
    print('{:s}n: {:2d}, duration: {:6.3f} seconds'.format(prefix2, n, duration))

def pd(duration, prefix=''):
    """Print a time duration, with an optional prefix."""
    prefix2 = prefix if len(prefix) == 0 else prefix+' '
    print('{:s}duration: {:6.3f} seconds'.format(prefix2, duration))

In [3]:
ray.init(ignore_reinit_error=True)

2020-04-14 07:19:40,828	INFO resource_spec.py:212 -- Starting Ray with 4.35 GiB memory available for workers and up to 2.2 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-04-14 07:19:41,181	INFO services.py:1148 -- View the Ray dashboard at localhost:8265


{'node_ip_address': '192.168.1.149',
 'redis_address': '192.168.1.149:33125',
 'object_store_address': '/tmp/ray/session_2020-04-14_07-19-40_819186_79092/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-04-14_07-19-40_819186_79092/sockets/raylet',
 'webui_url': 'localhost:8265',
 'session_dir': '/tmp/ray/session_2020-04-14_07-19-40_819186_79092'}

## Exercise 1 in 02-TaskParallelism-Part1

You were asked to convert the regular Python code to Ray code. Here are the three cells appropriately modified.

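First, we need the appropriate imports and `ray.init()`, which we already ran at the top of this notebook. Then we add `@ray.remote` to the function and invoke it with `.remote()`.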

In [1]:
@ray.remote
def slow_square(n):
    time.sleep(n)
    return n*n

In [5]:
start = time.time()
ids = [slow_square.remote(n) for n in range(4)]
squares = ray.get(ids)
duration = time.time() - start

In [6]:
assert squares == [0, 1, 4, 9]
# passes only with the Ray version; the serial version needs about 6 seconds:
assert duration < 4.1, f'duration = {duration}' 
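
The `4.1`-second threshold follows from the sleep times alone. As a rough back-of-the-envelope check (ignoring Ray's scheduling overhead and assuming at least four workers are free, both of which are assumptions), the serial version takes about the sum of the sleeps, while the parallel version takes about the longest one:

```python
# Sleep times used by slow_square(n) for n in range(4).
sleeps = list(range(4))            # [0, 1, 2, 3] seconds

serial_estimate = sum(sleeps)      # tasks run one after another
parallel_estimate = max(sleeps)    # tasks run at the same time

print(f'serial: ~{serial_estimate} s, parallel: ~{parallel_estimate} s')
```

So a threshold just above 4 seconds separates the two cases with room to spare.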

## Exercise 2 in 03-TaskParallelism-Part2

You were asked to use `ray.wait()` with a shorter timeout, `2.5` seconds. First we need to redefine in this notebook the remote functions we used in that lesson:

In [7]:
@ray.remote
def make_array(n):
    time.sleep(n/10.0)
    return np.random.standard_normal(n)

@ray.remote
def add_arrays(a1, a2):
    time.sleep(a1.size/10.0)
    return np.add(a1, a2)

In [8]:
start = time.time()
array_ids = [make_array.remote(n*10) for n in range(5)]
added_array_ids = [add_arrays.remote(id, id) for id in array_ids]

arrays = []
waiting_ids = list(added_array_ids)        # Assign a working list to the full list of ids
while len(waiting_ids) > 0:                # Loop until all tasks have completed
    # Call ray.wait with:
    #   1. the list of ids we're still waiting to complete,
    #   2. tell it to return as soon as TWO of them complete (or ONE, if only one is left),
    #   3. tell it to wait up to 2.5 seconds before timing out.
    return_n = 2 if len(waiting_ids) > 1 else 1
    ready_ids, remaining_ids = ray.wait(waiting_ids, num_returns=return_n, timeout=2.5)
    print('Returned {:3d} completed tasks. (elapsed time: {:6.3f})'.format(len(ready_ids), time.time() - start))
    new_arrays = ray.get(ready_ids)
    arrays.extend(new_arrays)
    for array in new_arrays:
        print(f'{array.size}: {array}')
    waiting_ids = remaining_ids  # Reset this list; don't include the completed ids in the list again!
    
print(f"\nall arrays: {arrays}")
pd(time.time() - start, prefix="Total time:")

Returned   2 completed tasks. (elapsed time:  2.017)
0: []
10: [-0.44553469 -1.93479913 -3.5977004   1.9238182   4.39475763 -2.29234861
  0.15558072 -0.9253621  -0.76277374  2.75928495]
Returned   1 completed tasks. (elapsed time:  5.532)
20: [-1.34267102 -2.06120681  0.05251457  0.67592777  0.64237916 -1.77380716
 -0.17278449  1.8836336   1.03791764  2.07584008  0.41470915 -0.31120551
 -0.48290277  2.03349318 -3.58036838  2.7793085   2.22316961  3.2815568
 -3.00447352 -1.39071692]
Returned   2 completed tasks. (elapsed time:  8.024)
30: [ 0.50489809 -1.26025978  1.90875912 -1.47413317  3.56160123 -0.02548098
  1.82681312  1.22907694 -0.93719539 -4.25904422  1.84724104  1.67586983
  2.79733789 -1.77137426 -0.34103046  0.56282364  0.48166511  5.00366035
 -0.07997564 -2.71198896 -1.989703    0.97500589  1.70713808 -0.84229131
  1.13273471 -0.80687159  2.57089412  5.7337013   2.40813672  0.89270571]
40: [ 1.31413776  1.07380793  0.19614594  2.76340287  6.17523772 -1.0867485
  1.63531305 -

For a timeout of `2.5` seconds, the second call to `ray.wait()` times out before two tasks finish, so it returns only one completed task. Why did the first and the third (last) iterations not time out? (That is, they both successfully returned two items.) It's because all the tasks were running in parallel, so two of them had time to finish before the timeout expired. If you use a shorter timeout, you'll see more timeouts, where zero or one items are returned.

Try `1.5` seconds, where all but one iteration times out and returns one item. The first iteration returns two items.
Try `0.5` seconds, where you'll get several iterations that time out and return zero items, while all the other iterations time out and return one item.
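
To experiment with the timeout behavior without a Ray cluster, here is a minimal sketch that uses the standard library's `concurrent.futures` as a stand-in. It is an analogy, not Ray's API: `concurrent.futures.wait` has no `num_returns` argument, so each round simply returns whatever finished before the timeout. The helper `collect_with_timeout` and the scaled-down `unit` sleep are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def slow_square(n, unit=0.05):
    """Sleep for n * unit seconds, then return n squared."""
    time.sleep(n * unit)
    return n * n

def collect_with_timeout(values, timeout):
    """Gather results in rounds; each round waits at most `timeout` seconds."""
    results, returned_per_round = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(values))) as pool:
        waiting = {pool.submit(slow_square, n) for n in values}
        while waiting:
            # Returns whatever finished before the timeout (possibly nothing).
            done, waiting = wait(waiting, timeout=timeout, return_when=ALL_COMPLETED)
            returned_per_round.append(len(done))
            results.extend(f.result() for f in done)
    return sorted(results), returned_per_round

squares, rounds = collect_with_timeout(range(4), timeout=0.02)
print(squares, rounds)
```

As in the loop above, the set of futures still being waited on shrinks every time a round returns something. A shorter timeout produces more rounds, some of which return nothing.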

## Exercise 3 in 03-TaskParallelism-Part2

You were asked to convert the code to use Ray, especially `ray.wait()`.

In [9]:
@ray.remote
def slow_square(n):
    time.sleep(n)
    return n*n

start = time.time()
ids = [slow_square.remote(n) for n in range(4)]
squares = []
waiting_ids = ids
while len(waiting_ids) > 0:
    finished_ids, waiting_ids = ray.wait(waiting_ids)  # We just assign the second list to waiting_ids...
    squares.extend(ray.get(finished_ids))
duration = time.time() - start

In [10]:
assert squares == [0, 1, 4, 9]
assert duration < 4.1, f'duration = {duration}' 
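
Note that `squares == [0, 1, 4, 9]` holds here because task `n` sleeps for `n` seconds, so the tasks finish in the order they were submitted. With different sleep times, results collected in completion order could come out in another order. The sketch below shows one way to restore submission order by remembering each task's position. It uses the standard library's `concurrent.futures` as a stand-in for Ray, and `slow_square_reversed` is a hypothetical variant written only for this illustration. The same bookkeeping idea should carry over to Ray ids, though that is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def slow_square_reversed(n, unit=0.05):
    """Hypothetical variant: larger n sleeps less, so completion order is reversed."""
    time.sleep((3 - n) * unit)
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # Remember which position each future belongs to.
    index_of = {pool.submit(slow_square_reversed, n): n for n in range(4)}
    squares = [None] * len(index_of)
    waiting = set(index_of)
    while waiting:
        # Return as soon as at least one future has finished.
        finished, waiting = wait(waiting, return_when=FIRST_COMPLETED)
        for future in finished:
            squares[index_of[future]] = future.result()

print(squares)
```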

## Exercise 4 in 04-DistributedStateWithActors

Let's see if we can achieve better performance results than our last run. For your convenience, here are new versions of `RayGame` (called `RayGame2`) and `ConwaysRules` (called `RayConwaysRules`), both declared as actors. What about `State`? It actually _doesn't_ make sense to make it an actor, because it is really just an _immutable_ holder of data, so making it an actor is not going to bring any benefit.

> **Note:** This solution is only partially done. For a work in progress, see `../../util/Ex4-GameOfLife.py`.

First, let's redefine a few things we need from that notebook, including the exercise code.

In [11]:
grid_size = 100
max_steps = 200

def cleanup(ids):
    for id in ids: 
        id.__ray_terminate__.remote()

In [12]:
print(f'http://{ray.get_webui_url()}')

http://localhost:8265


For comparison, my runs with the exercise code before improvements were about 12 to 12.5 seconds.

If you look at `RayGame2.step`, it calls `RayConwaysRules.step` one step at a time, using remote calls. This seems like a good place for improvement. Let's extend `RayConwaysRules.step` to do more than one step, just like `RayGame2.step` already supports.

Changes are indicated with comments.

In [13]:
class State:
    """
    Represents a grid of game cells.
    For simplicity, require square grids.
    Each instance is considered immutable.
    """
    def __init__(self, grid = None, size = 10):
        """
        Create a State. Specify either a grid of cells or a size, for
        which a size x size grid will be computed with random values.
        (For simplicity, only use square grids.)
        """
        if grid is not None:  # `is not None` works for NumPy arrays, too
            assert grid.shape[0] == grid.shape[1]
            self.size = grid.shape[0]
            self.grid = grid.copy()
        else:
            self.size = size
            # Seed: random initialization
            self.grid = np.random.random(size*size).reshape((size, size)).round()

    def living_cells(self):
        """
        Returns ([x1, x2, ...], [y1, y2, ...]) for all living cells.
        Simplifies graphing.
        """
        cells = [(i,j) for i in range(self.size) for j in range(self.size) if self.grid[i][j] == 1]
        return zip(*cells)

    def __str__(self):
        s = ' |\n| '.join([' '.join(map(lambda x: '*' if x else ' ', self.grid[i])) for i in range(self.size)])
        return '| ' + s + ' |'

In [14]:
@ray.remote
class RayConwaysRules:
    """
    Apply the rules to a state and return a new state.
    """
    def step(self, state, num_steps = 1):
        """
        Determine the next values for all the cells, based on the current
        state. Creates a new State for each step and returns a list of the
        new states, which has more than one element when num_steps > 1.
        """
        new_states = []
        for n in range(num_steps):
            new_grid = state.grid.copy()
            for i in range(state.size):
                for j in range(state.size):
                    lns = self.live_neighbors(i, j, state)
                    new_grid[i][j] = self.apply_rules(i, j, lns, state)
            state = State(grid = new_grid)  # advance, so the next step starts from this state
            new_states.append(state)
        return new_states

    def apply_rules(self, i, j, live_neighbors, state):
        """
        Determine next value for a cell, which could be the same.
        The rules for Conway's Game of Life:
            Any live cell with fewer than two live neighbours dies, as if by underpopulation.
            Any live cell with two or three live neighbours lives on to the next generation.
            Any live cell with more than three live neighbours dies, as if by overpopulation.
            Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
        """
        cell = state.grid[i][j]  # default value is no change in state
        if cell == 1:
            if live_neighbors < 2 or live_neighbors > 3:
                cell = 0
        elif live_neighbors == 3:
            cell = 1
        return cell

    def live_neighbors(self, i, j, state):
        """
        Wrap at boundaries (i.e., treat the grid as a 2-dim "toroid").
        When k-1 = -1, negative indexing wraps by itself;
        for k+1 = state.size, we mod it (which works for -1, too).
        For simplicity, we count the cell itself, then subtract it.
        """
        s = state.size
        g = state.grid
        return sum([g[i2%s][j2%s] for i2 in [i-1,i,i+1] for j2 in [j-1,j,j+1]]) - g[i][j]
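
The wrap-around in `live_neighbors` relies on how Python's `%` operator treats negative numbers: the result takes the sign of the divisor, so `-1 % s` is `s - 1`. A quick check, using a hypothetical grid size of `5` chosen only for illustration:

```python
s = 5  # hypothetical grid size, for illustration only

# Indices one step outside the grid wrap to the opposite edge.
print(-1 % s)   # the index left of column 0 wraps to the last column
print(s % s)    # the index right of the last column wraps to column 0

# The three neighbor columns for a cell in column 0 and in column s - 1.
left_edge = [j2 % s for j2 in [0 - 1, 0, 0 + 1]]
right_edge = [j2 % s for j2 in [s - 2, s - 1, s]]
print(left_edge, right_edge)
```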

In [15]:
@ray.remote
class RayGame2:
    # TODO: Game memory grows unbounded; trim older states?
    def __init__(self, initial_state, rules_id):
        self.states = [initial_state]
        self.rules_id = rules_id

    def step(self, num_steps = 1):
        """Take 1 or more steps, returning a list of new states."""
        start_index = len(self.states)
        new_state_ids = self.rules_id.step.remote(self.states[-1], num_steps)
        self.states.extend(ray.get(new_state_ids))
        return self.states[start_index:]  # return the new states only!

In [16]:
def time_ray_games2(num_games = 10, max_steps = max_steps, batch_size = 1, grid_size = grid_size):
    game_ids = [RayGame2.remote(State(size = grid_size), RayConwaysRules.remote()) for i in range(num_games)]
    start = time.time()
    state_ids = []
    for game_id in game_ids:
        for i in range(int(max_steps/batch_size)):  # Do a total of max_steps game steps, in max_steps/batch_size batches
            state_ids.append(game_id.step.remote(batch_size))
    ray.get(state_ids)  # wait for everything to finish! We are ignoring what ray.get() returns, but what will it be??
    pd(time.time() - start, prefix = f'Total time for {num_games} games (max_steps = {max_steps}, batch_size = {batch_size})')
    return game_ids  # for cleanup afterwards

In [17]:
ids1 = time_ray_games2(num_games = 1, max_steps = max_steps, batch_size=1, grid_size=grid_size)
ids2 = time_ray_games2(num_games = 1, max_steps = max_steps, batch_size=50, grid_size=grid_size)

Total time for 1 games (max_steps = 200, batch_size = 1) duration: 11.669 seconds
Total time for 1 games (max_steps = 200, batch_size = 50) duration: 11.923 seconds


In [18]:
cleanup(ids1)
cleanup(ids2)

Sanity check; does the order of the nested looping in `time_ray_games2` make a difference? Let's see...

In [21]:
def time_ray_games3(num_games = 10, max_steps = max_steps, batch_size = 1, grid_size = grid_size):
    game_ids = [RayGame2.remote(State(size = grid_size), RayConwaysRules.remote()) for i in range(num_games)]
    start = time.time()
    state_ids = []
    for i in range(int(max_steps/batch_size)):  # Do a total of max_steps game steps, in max_steps/batch_size batches
        for game_id in game_ids:
            state_ids.append(game_id.step.remote(batch_size))
    ray.get(state_ids)  # wait for everything to finish! We are ignoring what ray.get() returns, but what will it be??
    pd(time.time() - start, prefix = f'Total time for {num_games} games (max_steps = {max_steps}, batch_size = {batch_size})')
    return game_ids  # for cleanup afterwards

In [22]:
ids1 = time_ray_games3(num_games = 1, max_steps = max_steps, batch_size=1, grid_size=grid_size)
ids2 = time_ray_games3(num_games = 1, max_steps = max_steps, batch_size=50, grid_size=grid_size)

Total time for 1 games (max_steps = 200, batch_size = 1) duration: 11.158 seconds
Total time for 1 games (max_steps = 200, batch_size = 50) duration: 10.963 seconds


In [25]:
cleanup(ids1)
cleanup(ids2)

No, it doesn't matter.
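
Neither the batch size nor the loop order changes how much work is done. Batching reduces the number of remote calls made to the game actor, but the number of cell updates stays the same. The sketch below only counts work for the values used above; it measures nothing, and the idea that per-call overhead is small next to the compute time is an inference from the timings, not something this snippet demonstrates.

```python
grid_size = 100
max_steps = 200

for batch_size in (1, 50):
    remote_calls = max_steps // batch_size             # calls to RayGame2.step
    cell_updates = max_steps * grid_size * grid_size   # identical for every batch size
    print(f'batch_size={batch_size:2d}: {remote_calls:3d} remote calls, {cell_updates:,} cell updates')
```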

So, this didn't help. In both runs, one core was pegged, as discussed in the lesson; we are bottlenecked in `RayConwaysRules.step()`. Now let's try parallelizing that.

We already saw that we must compute the evolution of the game serially, because each state depends on the previous state, so we can't parallelize across steps. However, each _cell_ only depends on its eight neighbors and we have a grid of cells. So, we can break up the grid into smaller grids and compute their changes in parallel.

This is a nontrivial refactoring. This notebook will be updated in GitHub soon with the solution.
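
In the meantime, here is a minimal sketch of the decomposition idea only. It is not the eventual solution: it uses plain Python lists instead of `State` and NumPy, the helper names (`next_cell`, `step_rows`, `step_in_bands`) are hypothetical, and the bands are evaluated one after another in a loop. The point is that each band reads only the old grid (its own rows plus one wrapped row on each side), so the per-band calls are independent and could, in principle, be dispatched as separate Ray tasks.

```python
import random

def next_cell(grid, i, j):
    """Apply Conway's rules to cell (i, j) of a square grid that wraps at the edges."""
    s = len(grid)
    live = sum(grid[i2 % s][j2 % s]
               for i2 in (i - 1, i, i + 1)
               for j2 in (j - 1, j, j + 1)) - grid[i][j]
    if grid[i][j] == 1:
        return 1 if live in (2, 3) else 0
    return 1 if live == 3 else 0

def step_rows(grid, row_start, row_end):
    """Compute the next state of rows [row_start, row_end) only."""
    s = len(grid)
    return [[next_cell(grid, i, j) for j in range(s)] for i in range(row_start, row_end)]

def step_in_bands(grid, num_bands):
    """Split the rows into bands, step each band independently, and stitch the results."""
    s = len(grid)
    bounds = [s * b // num_bands for b in range(num_bands + 1)]
    new_grid = []
    for start, end in zip(bounds, bounds[1:]):
        # Each call reads only the old grid, so the calls do not depend on each other.
        new_grid.extend(step_rows(grid, start, end))
    return new_grid

random.seed(0)
size = 12  # hypothetical small grid, for illustration only
grid = [[random.randint(0, 1) for _ in range(size)] for _ in range(size)]

whole = step_rows(grid, 0, size)            # one step over the whole grid
banded = step_in_bands(grid, num_bands=4)   # the same step, computed band by band
print(whole == banded)
```

The game steps themselves still have to run in sequence; only the work within a single step is split up.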