## Ray
<!-- video shot="/rb-TmEfJIdI" start="16:37" end="22:00" -->

#### What is Ray?

Throughout this course we've been using the Ray package:

In [1]:
import ray.rllib

![](img/ray-logo.png)

What is Ray? From the [docs](https://docs.ray.io/en/latest/):

> Ray is a general-purpose and universal distributed compute framework.

Ray is also:

- An [active open source project](https://github.com/ray-project/ray) with over 20k stars on GitHub 🤩
- Backed by the unicorn startup [Anyscale](https://www.anyscale.com/), which produced this course 🦄

Notes:

But, back to distributed computing.

#### What is distributed computing?

_Distributed computing_ is computing that involves multiple machines (nodes) distributed across a network.

![](img/supercomputer.png)

Pros:

- Massively improved capabilities: more compute and memory than any single machine can offer

Cons/challenges:

- Synchronization
- Failure
- ...

#### Ray makes distributed computing easy

- The goal of Ray is to make distributed computing easy and accessible.
- Ray handles most of the challenges for users.
- RLlib, tune and the other sub-packages were built on top of Ray.
- This means _RLlib and tune automatically have distributed capabilities._

Notes:

Surprise! RLlib is easy to use and conveniently implements many state-of-the-art RL algorithms, but it has another benefit that we didn't mention until now: built-in distributed computing capabilities. This puts it well ahead of competing packages in how easily the computation can be distributed.

#### RLlib, distributed

- In this course we set up algorithm configs many times.
- But there are some parameters we haven't used before:

In [2]:
from ray.rllib.algorithms.ppo import PPOConfig

In [3]:
ppo_config = (
    PPOConfig()
    .framework("torch")
    .rollouts(num_rollout_workers=4, num_envs_per_worker=2)
    .resources(num_gpus=0)
)

You can read more about specifying resources [here](https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-resources) and about scaling [here](https://docs.ray.io/en/master/rllib/rllib-training.html#scaling-guide).
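To make the numbers in the config above concrete: with `num_rollout_workers=4` and `num_envs_per_worker=2`, a total of 4 × 2 = 8 environment copies are being stepped. The helper below is a hypothetical illustration, not an RLlib function. Its CPU estimate assumes one CPU per rollout worker plus one for the driver process; treat that as an assumption about default settings and check the resources docs linked above for your version.

```python
def sampling_footprint(num_rollout_workers, num_envs_per_worker,
                       cpus_per_worker=1, cpus_for_driver=1):
    """Hypothetical helper (not part of RLlib): rough sizing of a rollout setup."""
    # Each rollout worker steps its own group of environment copies.
    env_copies = num_rollout_workers * num_envs_per_worker
    # Assumed defaults: one CPU per rollout worker, one for the driver.
    cpus = cpus_for_driver + num_rollout_workers * cpus_per_worker
    return env_copies, cpus

env_copies, cpus = sampling_footprint(num_rollout_workers=4, num_envs_per_worker=2)
print(f"{env_copies} environment copies, about {cpus} CPUs")
```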

But... what is a "rollout worker"?

#### Rollout workers

- Rollout workers collect data from the environment (simulator) in parallel.
- For most simulator environments, one can replicate the environment in a cluster.
- Therefore, you can collect data much faster and avoid bottlenecking the training.
- Whatever cluster Ray is connected to on the backend, `num_rollout_workers=4` works seamlessly.

Notes:

In supervised learning, when you're waiting, you're almost certainly waiting for the model to train. In RL, the bottleneck could be either data collection or the model updates. Parallelizing rollouts alleviates the data collection bottleneck.
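The idea of rollout workers can be sketched without Ray: each worker owns its own copy of the environment, collects transitions independently, and the results are merged into one batch for the learner. Everything below is hypothetical (`ToyEnv`, `collect_rollout`, the random policy). Threads are used only to keep the sketch dependency-free; Ray's workers are separate processes that can live on other machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor

class ToyEnv:
    """Hypothetical stand-in for a simulator: a 1-D walk that ends at +/-3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action                 # action is -1 or +1
        done = abs(self.pos) >= 3
        reward = 1.0 if self.pos >= 3 else 0.0
        return self.pos, reward, done

def collect_rollout(worker_id, num_steps):
    """One 'rollout worker': owns an env copy and returns its transitions."""
    env = ToyEnv()
    rng = random.Random(worker_id)         # per-worker seed for reproducibility
    obs = env.reset()
    transitions = []
    for _ in range(num_steps):
        action = rng.choice([-1, 1])       # random policy, for illustration only
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
    return transitions

num_workers, steps_per_worker = 4, 50
with ThreadPoolExecutor(max_workers=num_workers) as pool:
    per_worker = list(pool.map(collect_rollout,
                               range(num_workers),
                               [steps_per_worker] * num_workers))

# Merge the per-worker rollouts into a single training batch.
batch = [t for rollout in per_worker for t in rollout]
print(len(batch))
```

Because the workers never share an environment, adding more of them increases the amount of data collected per unit of wall-clock time without changing the collection logic.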

#### Ray tune, revisited

- Keep in mind that hyperparameter tuning, such as grid search, is also easily distributed, since each trial is independent of the others.
- Fortunately `tune` is also part of Ray and, like RLlib, takes care of this for you! 
- As you can see, Ray + tune + RLlib becomes quite a powerful combination.
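Why grid search distributes so easily can be seen in a small sketch: every configuration is an independent trial, so the trials can run in parallel and only the final comparison needs all the results. This is a standard-library analogy, not the `tune` API; `run_trial` is a hypothetical objective standing in for "train with this config and report a score", and the grid values are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_trial(config):
    """Hypothetical objective: higher is better, peaks at lr=1e-3, hidden=64."""
    lr, hidden = config["lr"], config["hidden"]
    return -abs(lr - 1e-3) * 1000 - abs(hidden - 64) / 64

# Illustrative grid: every combination of values becomes one trial.
grid = {"lr": [1e-4, 1e-3, 1e-2], "hidden": [32, 64]}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

# Trials do not depend on each other, so they can all run concurrently.
with ThreadPoolExecutor() as pool:
    scores = list(pool.map(run_trial, configs))

best_score, best_config = max(zip(scores, configs), key=lambda pair: pair[0])
print(len(configs), best_config)
```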

#### Driver

In all our configs we've had

```python
create_env_on_driver = True
```

This means that a copy of the env is also created on the "driver" process, the one that runs the training.

#### Summary

- Ray is incredibly powerful, and we've only scratched the surface.
- Some other resources:
  - [ray.io](https://www.ray.io/)
  - [Learning Ray](https://www.oreilly.com/library/view/learning-ray/9781098117214/) (book)

#### Let's apply what we learned!

## What is Ray?
<!-- multiple choice -->

What is Ray?

- [ ] The company that creates RLlib. | You might be thinking of Anyscale, the company behind Ray!
- [ ] A sub-package within RLlib that deals with distributed computing.
- [x] A general-purpose distributed computing package that includes RLlib.
- [ ] A reinforcement learning algorithm. 

## Distributing RLlib
<!-- multiple choice -->

What is the primary way in which RLlib makes use of distributed computing abilities?

- [x] Distributed rollout workers generate data from env clones that is fed into the learning algorithm. 
- [ ] The policy neural network training is distributed across multiple nodes.
- [ ] Each node has a separate policy neural network that is trained independently on its own node.

## Experimenting with rollout workers
<!-- coding exercise -->

The code below creates two PPO algorithm instances, one that should use two rollout workers with two envs per worker, and one that uses only one rollout worker with one env per worker. It then prints out the time elapsed to train each one for 5 iterations. Complete and run the code and then compare the times. 

Note: this experiment will _usually_ work. However, this code is running on a server that is potentially being used by multiple learners at the same time, so the runtimes may be influenced by the server load. Also, this server is not a proper cluster, so any benefit would come from parallelizing across multiple CPU cores within one machine.

In [4]:
# EXERCISE

from ray.rllib.algorithms.ppo import PPOConfig
import time
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 

ppo_config_many = (
    PPOConfig()
    .framework("torch")
    .____(____)
    .training(model={"fcnet_hiddens" : [32,32]})
    .environment(env="FrozenLake-v1")
)

ppo_config_single = (
    PPOConfig()
    .framework("torch")
    .____(num_rollout_workers=1, num_envs_per_worker=1)
    .training(model={"fcnet_hiddens" : [32,32]})
    .environment(env="FrozenLake-v1")
)

ppo_many = ppo_config_many.build()
t = time.time()
for i in range(5):
    ppo_many.train()
print(f"Elapsed time with 2 workers, 2 envs each: {time.time()-t:.1f}s.")
ppo_many.stop()

ppo_single = ppo_config_single.build()
t = time.time()
for i in range(5):
    ppo_single.train()
print(f"Elapsed time with 1 worker, 1 env: {time.time()-t:.1f}s.")
ppo_single.stop()


In [1]:
# SOLUTION

from ray.rllib.algorithms.ppo import PPOConfig
import time
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 

ppo_config_many = (
    PPOConfig()
    .framework("torch")
    .rollouts(num_rollout_workers=2, num_envs_per_worker=2)
    .training(model={"fcnet_hiddens" : [32,32]})
    .environment(env="FrozenLake-v1")
)

ppo_config_single = (
    PPOConfig()
    .framework("torch")
    .rollouts(num_rollout_workers=1, num_envs_per_worker=1)
    .training(model={"fcnet_hiddens" : [32,32]})
    .environment(env="FrozenLake-v1")
)

ppo_many = ppo_config_many.build()
t = time.time()
for i in range(5):
    ppo_many.train()
print(f"Elapsed time with 2 workers, 2 envs each: {time.time()-t:.1f}s.")
ppo_many.stop()

ppo_single = ppo_config_single.build()
t = time.time()
for i in range(5):
    ppo_single.train()
print(f"Elapsed time with 1 worker, 1 env: {time.time()-t:.1f}s.")
ppo_single.stop()

2022-11-19 15:51:10,871	INFO worker.py:1518 -- Started a local Ray instance.


Elapsed time with 2 workers, 2 envs each: 8.6s.




Elapsed time with 1 worker, 1 env: 13.6s.
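To compare the two runs, it helps to turn the elapsed times into a speedup factor. The sketch below uses the two timings printed above; your own numbers will differ with server load.

```python
time_many = 8.6     # seconds: 2 workers, 2 envs each (from the run above)
time_single = 13.6  # seconds: 1 worker, 1 env (from the run above)

# Speedup = how many times faster the parallel setup finished.
speedup = time_single / time_many
print(f"Speedup: {speedup:.2f}x")
```

The speedup here is well below 4x even though there are four times as many env copies. A likely reason is that only data collection is parallelized by rollout workers, while the model updates and fixed overheads take the same time in both runs.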
