# Exercise 8 - Speed up Serialization

**GOAL:** This exercise illustrates how to speed up serialization by using `ray.put`.

### Concepts for this Exercise - ray.put

Object IDs can be created in multiple ways.
- They are returned by remote function calls.
- They are returned by actor method calls.
- They are returned by `ray.put`.

When an object is passed to `ray.put`, it is serialized using the Apache Arrow format (see https://arrow.apache.org/ for more information about Arrow) and copied into a shared-memory object store. The object is then available to other workers on the same machine via shared memory. If workers on another machine need it, Ray transfers it to that machine automatically.
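
To build intuition for what "available via shared memory" means, here is a minimal sketch using the standard library's `multiprocessing.shared_memory` module (Python 3.8+). It is only an analogy for the idea described above, not a description of how Ray's object store is implemented. The helper names `write_shared` and `read_shared` are hypothetical.

```python
from multiprocessing import shared_memory


def write_shared(payload):
    """Copy `payload` once into a newly created shared-memory block."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def read_shared(name, size):
    """Attach to an existing block by name and read `size` bytes from it."""
    reader = shared_memory.SharedMemory(name=name)
    try:
        # Slicing `reader.buf` yields a view onto the shared block;
        # bytes() then copies the data out so the block can be closed safely.
        return bytes(reader.buf[:size])
    finally:
        reader.close()


payload = b"neural net weights" * 1000
block = write_shared(payload)
try:
    # Any process that knows `block.name` could attach in the same way.
    round_trip = read_shared(block.name, len(payload))
finally:
    block.close()
    block.unlink()  # free the block once nobody needs it anymore
```

The writer pays for one copy into the block. Readers attach by name instead of receiving their own serialized copy, which is the property the object store exploits.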

**When objects are passed into a remote function, Ray puts them in the object store under the hood.** That is, if `f` is a remote function, the code

```python
x = np.zeros(1000)
f.remote(x)
```

is essentially transformed under the hood to

```python
x = np.zeros(1000)
x_id = ray.put(x)
f.remote(x_id)
```

The call to `ray.put` copies the numpy array into the shared-memory object store, from where it can be read by all of the worker processes (without additional copying). However, if you do something like

```python
for i in range(10):
    f.remote(x)
```

then 10 copies of the array will be placed into the object store. This takes up more memory in the object store than is necessary, and it also takes time to copy the array into the object store over and over. This can be made more efficient by placing the array in the object store only once as follows.

```python
x_id = ray.put(x)
for i in range(10):
    f.remote(x_id)
```
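
The difference between the two loops above can be made concrete with a toy model. The sketch below is a hypothetical in-process stand-in (`ToyObjectStore`, `ToyID`, and `submit` are illustrative names, not part of Ray's API). It assumes only what the text states: a raw argument is put into the store implicitly on every call, while an ID is passed through unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyID:
    """Opaque handle returned by ToyObjectStore.put (hypothetical)."""
    index: int


class ToyObjectStore:
    """Toy stand-in for an object store; it only counts stored copies."""

    def __init__(self):
        self.objects = {}

    def put(self, value):
        """Store a fresh copy of `value` and return an ID for it."""
        object_id = ToyID(len(self.objects))
        self.objects[object_id] = bytearray(value)  # one copy per put
        return object_id

    def submit(self, arg):
        """Mimic a remote call: raw values are put implicitly, IDs are reused."""
        return arg if isinstance(arg, ToyID) else self.put(arg)


weights = bytes(1000)

# Passing the raw value: every call triggers an implicit put.
naive = ToyObjectStore()
for _ in range(10):
    naive.submit(weights)

# Putting once and passing the ID: no further copies are made.
shared = ToyObjectStore()
weights_id = shared.put(weights)
for _ in range(10):
    shared.submit(weights_id)

print(len(naive.objects), len(shared.objects))  # prints "10 1"
```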

In this exercise, you will speed up the code below and reduce the memory footprint by calling `ray.put` on the neural net weights before passing them into the remote functions.

**WARNING:** This exercise requires a lot of memory to run. If this notebook is running within a Docker container, the container must be started with a large shared-memory file system. This can be done by passing the `--shm-size` flag when starting the container.
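
Before starting Ray, it can help to check how much shared memory is actually available. The sketch below assumes a Linux host where shared memory is mounted at `/dev/shm`. The function name `shared_memory_gib` is hypothetical, and the check returns `None` on systems without that path.

```python
import os
import shutil


def shared_memory_gib(path="/dev/shm"):
    """Return the capacity of the file system at `path` in GiB.

    Returns None if `path` does not exist (for example on non-Linux hosts).
    """
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / 2**30


size = shared_memory_gib()
if size is None:
    print("No /dev/shm found on this system.")
else:
    print("/dev/shm capacity: {:.2f} GiB".format(size))
```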

In [1]:
import os
os.system('pip install ray')

0

In [0]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import pickle
import numpy as np
import ray
import time

In [3]:
ray.init(num_cpus=4, include_webui=False, ignore_reinit_error=True)

2019-05-16 12:21:49,949	INFO node.py:469 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-16_12-21-49_125/logs.
2019-05-16 12:21:50,064	INFO services.py:407 -- Waiting for redis server at 127.0.0.1:22136 to respond...
2019-05-16 12:21:50,194	INFO services.py:407 -- Waiting for redis server at 127.0.0.1:35395 to respond...
2019-05-16 12:21:50,198	INFO services.py:804 -- Starting Redis shard with 2.58 GB max memory.
2019-05-16 12:21:50,232	INFO node.py:483 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-16_12-21-49_125/logs.
2019-05-16 12:21:50,235	INFO services.py:1427 -- Starting the Plasma object store with 3.87 GB memory using /dev/shm.


{'node_ip_address': '172.28.0.2',
 'object_store_address': '/tmp/ray/session_2019-05-16_12-21-49_125/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2019-05-16_12-21-49_125/sockets/raylet',
 'redis_address': '172.28.0.2:22136',
 'webui_url': None}

Define some neural net weights which will be passed into a number of tasks.

In [0]:
neural_net_weights = {'variable{}'.format(i): np.random.normal(size=2**18)
                      for i in range(50)}

**EXERCISE:** Compare the time required to serialize the neural net weights and copy them into the object store using Ray versus the time required to pickle and unpickle the weights. The big win should be with the time required for *deserialization*.

Note that when you call `ray.put`, in addition to serializing the object, Ray copies it into shared memory, where other workers on the same machine can access it efficiently.

**NOTE:** You don't actually have to do anything here other than run the cell below and read the output.

**NOTE:** Sometimes `ray.put` can be faster than `pickle.dumps`. This is because `ray.put` leverages multiple threads when serializing large objects. Note that this is not possible with `pickle`.

In [6]:
print('Ray - serializing')
%time x_id = ray.put(neural_net_weights)
print('\nRay - deserializing')
%time x_val = ray.get(x_id)

print('\npickle - serializing')
%time serialized = pickle.dumps(neural_net_weights)
print('\npickle - deserializing')
%time deserialized = pickle.loads(serialized)

Ray - serializing
CPU times: user 55.3 ms, sys: 130 ms, total: 185 ms
Wall time: 130 ms

Ray - deserializing
CPU times: user 1.13 ms, sys: 23 µs, total: 1.15 ms
Wall time: 1.08 ms

pickle - serializing
CPU times: user 125 ms, sys: 175 ms, total: 300 ms
Wall time: 300 ms

pickle - deserializing
CPU times: user 47.5 ms, sys: 8.95 ms, total: 56.4 ms
Wall time: 56.6 ms
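
The `%time` magic used above is only available in IPython and Jupyter. In a plain Python script, a comparable wall-clock measurement can be sketched with `time.perf_counter`. The helper `timed` is a hypothetical name, the payload uses plain byte buffers as a stand-in for the numpy arrays, and only the `pickle` half of the comparison is shown because the Ray half needs a running Ray instance.

```python
import pickle
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Stand-in payload: 50 byte buffers instead of the notebook's numpy arrays.
payload = {'variable{}'.format(i): bytes(2**18) for i in range(50)}

serialized, dump_seconds = timed(pickle.dumps, payload)
restored, load_seconds = timed(pickle.loads, serialized)

print('pickle.dumps: {:.1f} ms'.format(dump_seconds * 1e3))
print('pickle.loads: {:.1f} ms'.format(load_seconds * 1e3))
```

Timings from a single run are noisy, so repeat the measurement a few times before drawing conclusions.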


Define a remote function which uses the neural net weights.

In [0]:
@ray.remote
def use_weights(weights, i):
    return i

**EXERCISE:** In the code below, use `ray.put` to avoid copying the neural net weights to the object store multiple times.

In [0]:
# Sleep a little to improve the accuracy of the timing measurements below.
time.sleep(2.0)
start_time = time.time()
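# Put the weights in the object store once and pass the resulting ID to
# every task, instead of passing the weights themselves.
nnw_id = ray.put(neural_net_weights)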
results = ray.get([use_weights.remote(nnw_id, i)
                   for i in range(20)])

end_time = time.time()
duration = end_time - start_time

**VERIFY:** Run some checks to verify that the changes you made to the code were correct. Some of the checks should fail when you initially run the cells. After completing the exercises, the checks should pass.

In [17]:
assert results == list(range(20))
assert duration < 1, ('The experiments ran in {} seconds. This is too '
                      'slow.'.format(duration))

print('Success! The example took {} seconds.'.format(duration))

Success! The example took 0.07344412803649902 seconds.
