# Efficient Parallel Task Execution with *ray*
Since *ray.Pool*s appear to process tasks in batches, each batch waits for its longest task to finish. We need a more efficient way.

We have seen that policy evaluation doesn't vary much in duration, so round robin is a good strategy for that use case. Trajectory evaluation, however, varies greatly in duration.

Fortunately, the evaluation need not be executed in a functional style: workers don't have to return the result to the caller. They can store it anywhere instead, and the caller can poll the overall task termination status in a classical check-and-wait loop.

The approach illustrated in this little tutorial takes little more than the total task time divided by the number of workers, i.e. it is almost perfectly efficient. Hence we'll implement that approach for the self-play actors.
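To make the difference between batched and pull-based execution concrete, here is a small idealized simulation. It does not use Ray; the function names and the task durations are made up for illustration, and all scheduling overhead is ignored.

```python
import heapq


def batched_makespan(durations, n_workers):
    """Total time when tasks run in batches of n_workers and every batch
    waits for its slowest task."""
    return sum(
        max(durations[i:i + n_workers])
        for i in range(0, len(durations), n_workers)
    )


def pull_makespan(durations, n_workers):
    """Total time when every idle worker immediately pulls the next task."""
    free_at = [0.0] * n_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest idle worker takes the task
        heapq.heappush(free_at, start + duration)
    return max(free_at)


durations = [4, 1, 1, 1, 1]  # hypothetical task durations in seconds
print(batched_makespan(durations, 2))  # 6
print(pull_makespan(durations, 2))     # 4.0
```

With two workers, the batched variant idles one worker for three seconds during the first batch, whereas the pull-based variant keeps both workers busy the whole time.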

In [1]:
import ray
from alphazero.ray.generic import TaskMonitor, RayFilePickler, SimpleCountingDispatcher

### Some dummy implementations
Workers pick up tasks from a common dispatcher and hand their results to the collector. The dispatcher acts as a common counter, but it could just as well enumerate input resources.

In [2]:
import uuid
import time

@ray.remote
class Worker:
    def __init__(self, wid, dispatcher, collector):
        self.wid = wid
        self.collector = collector
        self.dispatcher = dispatcher

    def init(self, *args, **kwargs):
        pass

    def work(self):
        """
        fetch task, report result - until all jobs are done
        """
        while True:
            task = ray.get(self.dispatcher.get_task.remote())
            if task is None:
                break
            seqno, effort = task
            time.sleep(effort)
            the_result = str(uuid.uuid4())
            self.collector.collect.remote(self.wid, seqno, the_result)

@ray.remote
class Collector:

    def __init__(self, monitor):
        self.result = []
        self.monitor = monitor

    def collect(self, wid, seqno, load):
        """
        collect all results
        """
        self.result.append((wid, seqno, load))
        self.monitor.report.remote()

    def get_result(self):
        return self.result


@ray.remote
class Dispatcher:

    def __init__(self, num_tasks):
        self.seqno = 0
        self.num_tasks = num_tasks

    def get_task(self):
        """
        Provide tasks of different 'effort' until exhausted
        """
        self.seqno = self.seqno + 1
        if self.seqno <= self.num_tasks:
            return self.seqno, self.seqno % 4 + 1
        else:
            return None
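
As noted above, a dispatcher could also enumerate input resources instead of merely counting. A minimal sketch of that variant follows. The class name `ListDispatcher` and the resource names are hypothetical, and the class is shown without the `@ray.remote` decorator so that it runs without a Ray cluster.

```python
class ListDispatcher:
    """Hands out (seqno, item) pairs from a fixed list, then None.

    Hypothetical variant of the Dispatcher above; decorate it with
    @ray.remote to use it as an actor.
    """

    def __init__(self, items):
        self.items = list(items)
        self.seqno = 0

    def get_task(self):
        if self.seqno >= len(self.items):
            return None  # tells workers to stop, as in the Dispatcher above
        item = self.items[self.seqno]
        self.seqno += 1
        return self.seqno, item


dispatcher = ListDispatcher(["a.sgf", "b.sgf"])  # made-up resource names
print(dispatcher.get_task())  # (1, 'a.sgf')
print(dispatcher.get_task())  # (2, 'b.sgf')
print(dispatcher.get_task())  # None
```

Because an actor processes its method calls one at a time by default, no item should be handed out twice once the class runs as a Ray actor.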


In [3]:
rctx = ray.init(ignore_reinit_error=True)

2022-08-05 06:04:16,572	INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8265


In [4]:
N_WORKERS = 8
N_TASKS = 32

In [5]:
the_monitor = TaskMonitor.remote()

In [6]:
the_collector = Collector.remote(the_monitor)

With $N_t$ tasks of duration 1 to 4 s (average $t = 2.5$ s) and $N_w$ workers, we expect a total duration of about $N_t / N_w \cdot t$ seconds. For $N_t = 32$ and $N_w = 8$ that is 10 s.
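
The estimate can be checked with a few lines of plain Python. The snippet reuses the effort rule from `Dispatcher.get_task` and the constants defined above. It is only a back-of-the-envelope calculation, not a measurement.

```python
N_TASKS = 32
N_WORKERS = 8

# Same effort rule as Dispatcher.get_task: seqno % 4 + 1 for seqno = 1..N_TASKS
efforts = [seqno % 4 + 1 for seqno in range(1, N_TASKS + 1)]

average_effort = sum(efforts) / len(efforts)
ideal_duration = N_TASKS / N_WORKERS * average_effort

print(average_effort)  # 2.5
print(ideal_duration)  # 10.0
```

This value is a lower bound. Scheduling gaps at the end of the run and communication overhead add to it.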

In [7]:
the_dispatcher = Dispatcher.remote(N_TASKS)

In [8]:
workers = [Worker.remote(wid, the_dispatcher, the_collector) for wid in range(N_WORKERS)]

In [9]:
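# Start all workers; work.remote() returns immediately.
for worker in workers:
    worker.work.remote()

from alphazero.utils import Timer

# Poll the monitor every 0.5 s until all tasks have been reported.
with Timer(verbose=True):
    while True:
        time.sleep(.5)
        status = ray.get(the_monitor.get_status.remote())
        if status == N_TASKS:
            break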

elapsed time: 11116.232351 ms
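
The polling loop above never terminates if a worker dies before reporting. A possible refinement is a check-and-wait helper with a timeout, sketched below in plain Python. `CountingMonitor` is a hypothetical stand-in that assumes `TaskMonitor` simply counts `report()` calls; the actual implementation in `alphazero.ray.generic` is not shown here. `wait_for` is likewise a made-up helper name.

```python
import time


class CountingMonitor:
    """Hypothetical stand-in for TaskMonitor: counts reported completions."""

    def __init__(self):
        self.count = 0

    def report(self):
        self.count += 1

    def get_status(self):
        return self.count


def wait_for(get_status, target, interval=0.5, timeout=60.0):
    """Poll get_status() until it reaches target; return False on timeout."""
    deadline = time.monotonic() + timeout
    while get_status() < target:
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


monitor = CountingMonitor()
for _ in range(3):
    monitor.report()

print(wait_for(monitor.get_status, target=3, interval=0.01))                # True
print(wait_for(monitor.get_status, target=4, interval=0.01, timeout=0.05))  # False
```

With the actors from this notebook, the status callback could be `lambda: ray.get(the_monitor.get_status.remote())` and the target `N_TASKS`.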


A quick look at the collected (worker id, task number, result) tuples confirms that the assignment of tasks to actors is sufficiently random to rule out systematic inefficiencies.

In [10]:
result = ray.get(the_collector.get_result.remote())
result

[(2, 4, '5267c11c-5457-4ce3-8d48-5a4e35413223'),
 (7, 8, 'bced6ded-03b1-49b8-b826-83af94eb1142'),
 (3, 1, 'fc29a9ba-750c-4aa8-8ab8-d0be300c382a'),
 (4, 5, 'dec59af2-08a4-494f-b016-bb2d2574c994'),
 (0, 2, 'b2431b6d-e409-409f-a1e7-39cc81a52de9'),
 (5, 6, '59e888be-efe4-479f-b338-1039bea41c11'),
 (4, 12, '5fbaafe2-5c4e-434f-af35-a857fe85033d'),
 (2, 9, '3d311bdf-f9bf-43d8-a022-96c0c27f8ef5'),
 (1, 3, 'e22dc829-0a1a-4e10-bfb6-e57c191237ee'),
 (6, 7, '71136609-16d4-4f26-9ac5-d4afe63d1253'),
 (7, 10, '9ceb0d6d-fe25-4a22-b74f-6cf10c4b9615'),
 (2, 16, '3f7d2ff0-dde7-4f74-ad52-f394cd11e47b'),
 (0, 13, '7414d498-1307-41c4-94ca-aab22302efa9'),
 (2, 20, 'd75d4efd-0d50-4403-a196-73ad93ae9dcd'),
 (3, 11, '605ac780-fc3f-4f5e-bfb6-63cc624be5b1'),
 (5, 14, '5b290cc7-6c1a-451d-a8b2-a0f098d62947'),
 (1, 17, '340fc799-339a-405e-b021-88244d8d2304'),
 (6, 18, 'b3fe2671-1225-4d1e-8bc9-2266316ddd3d'),
 (4, 15, '738be65f-22f0-4e82-80da-7e98e65850a2'),
 (1, 24, 'e78750d5-e2f5-479a-963a-0423891db80d'),
 (0, 21, 

In [11]:
ray.shutdown()

---
# Collecting worker results in a file

In [None]:
rctx = ray.init(ignore_reinit_error=True)

In [20]:
import os
filename = os.getcwd() + "/tmp.pickle"
f = RayFilePickler.remote(filename, 'wb+')

In [21]:
f.write.remote([1,2,3])
f.write.remote([1,2,3]);

In [22]:
f.close.remote();

On the ray dashboard, you can observe that closing the file also removes the actor.

### Check the result

In [23]:
!ls -lt *tmp*

-rw-r--r--  1 wgiersche  staff  44 Aug  5 06:11 tmp.pickle


In [24]:
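from pickle import Unpickler

def read_entries(file_name):
    """Read all objects that were pickled into the file, in write order."""
    rows = []
    # The context manager closes the file once all entries are read.
    with open(file_name, 'rb') as fp:
        while True:
            try:
                rows.append(Unpickler(fp).load())
            except EOFError:
                # End of file reached: every entry has been loaded.
                return rows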

In [25]:
read_entries(file_name=filename)

[[1, 2, 3], [1, 2, 3]]
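
The round trip works because several pickles can be written back to back into one file and loaded again one at a time. The following sketch shows the idea without Ray. `FilePickler` is a hypothetical plain-Python stand-in; the actual implementation of `RayFilePickler` is not shown in this notebook and may differ.

```python
import os
import pickle
import tempfile


class FilePickler:
    """Appends one pickle per write() call to a single file.

    Hypothetical stand-in, not the RayFilePickler implementation.
    """

    def __init__(self, filename, mode="wb"):
        self.fp = open(filename, mode)

    def write(self, obj):
        pickle.dump(obj, self.fp)

    def close(self):
        self.fp.close()


def read_all(filename):
    """Load pickles one after another until the file is exhausted."""
    rows = []
    with open(filename, "rb") as fp:
        while True:
            try:
                rows.append(pickle.load(fp))
            except EOFError:
                return rows


path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
writer = FilePickler(path)
writer.write([1, 2, 3])
writer.write({"seqno": 7})
writer.close()
print(read_all(path))  # [[1, 2, 3], {'seqno': 7}]
```

Wrapping such a writer in a single actor serializes all writes, so concurrent workers should not interleave their output within the file.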

### Clean up

In [29]:
!rm -f tmp.pickle

In [30]:
ray.shutdown()

---
# Efficient Parallel SelfPlay

In [33]:
from alphazero.ray.trainer import SelfPlayDelegator

In [31]:
ray.init(ignore_reinit_error=True);

2022-08-05 06:17:44,080	INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8265


In [34]:
import os
filename = os.getcwd() + "/tmp.pickle"
the_writer = RayFilePickler.remote(filename, 'wb+')

In [35]:
the_counter = SimpleCountingDispatcher.remote(2)

### The *Business Logic*

In [36]:
from alphazero.gomoku_game import RandomBoardInitializer, GomokuGame
from alphazero.interfaces import MctsParams

params = MctsParams(
    cpuct=1.0,
    num_simulations=100,
    model_threshold=.2)

# Number of self-play workers
N_SP = 1

# Number of policy workers
N_P = 1

rbi = RandomBoardInitializer(15, 4, 5, 9, 5, 9)
game = GomokuGame(15, rbi)


## Setting up the zoo of actors
A pool of self-play actors feeds into a round-robin dispatcher that serves a fixed number of policy workers. A fixed number may be necessary when the workers use non-shareable resources such as GPUs.
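
The round-robin idea itself can be sketched in a few lines of plain Python. This is only an illustration under assumed names (`RoundRobinDispatcher`, `evaluate`); it is not the implementation behind `create_pool`, and the "workers" are simple callables rather than Ray actors.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Forwards each request to the next worker in a fixed rotation.

    Hypothetical sketch; not the implementation behind create_pool.
    """

    def __init__(self, workers):
        self._next_worker = cycle(workers)

    def evaluate(self, state):
        worker = next(self._next_worker)
        return worker(state)


# Two made-up "policy workers" that tag the state they were given.
policy_workers = [lambda s: ("worker-0", s), lambda s: ("worker-1", s)]
dispatcher = RoundRobinDispatcher(policy_workers)
print([dispatcher.evaluate(i)[0] for i in range(4)])
# ['worker-0', 'worker-1', 'worker-0', 'worker-1']
```

Round robin balances the load well here because, as stated at the beginning, policy evaluations take roughly the same time each.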

In [37]:
from alphazero.interfaces import PolicySpec
from alphazero.self_play import SelfPlay
from alphazero.policies.ray_impl import HeuristicRayPolicy
from alphazero.ray.trainer import create_pool, PolicyRef

the_dispatcher = create_pool(num_workers=N_P, policy=HeuristicRayPolicy(),
                             board_size=15, cut_off=0.5)
selfplays = [SelfPlay.remote(mcts_params=params) for _ in range(N_SP)]
for selfplay in selfplays:
    selfplay.init.remote(15, game, PolicySpec(pool_ref=PolicyRef(the_dispatcher)))

ModuleNotFoundError: No module named 'alphazero.ray_pool'

In [26]:
workers = [SelfPlayDelegator.remote(1, the_writer, the_counter, selfplay) for selfplay in selfplays]

In [27]:
for worker in workers:
    worker.work.remote()

In [29]:
the_writer.close.remote();

In [30]:
!ls -lt *tmp*

-rw-r--r--  1 wgiersche  staff  11152 Aug  4 23:50 tmp.pickle


In [15]:
ray.shutdown()

In [33]:
trajectories = read_entries(filename)

In [34]:
trajectories

[[(array([129, 126, 144, 111], dtype=uint8),
   array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 12,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 15,  0,  0, 38,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 61,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0, 15,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0, 12,  0,  0, 56,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0, 41,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  