# Ray RLlib - Overview

© 2019-2020, Anyscale. All Rights Reserved

![Anyscale Academy](../images/AnyscaleAcademy_Logo_clearbanner_141x100.png)

This tutorial, part of [Anyscale Academy](https://anyscale.com/academy), introduces the broad topic of _reinforcement learning_ (RL) and [RLlib](https://ray.readthedocs.io/en/latest/rllib.html), Ray's comprehensive RL library.

The lessons in this tutorial use different _environments_ from [OpenAI Gym](https://gym.openai.com/) to illustrate how to train _policies_.

See the instructions in the [README](../README.md) for setting up your environment to use this tutorial.

Go [here](../Overview.ipynb) for an overview of all tutorials.

## Tutorial Sections

Because of the breadth of RL, this tutorial is divided into several segments. See the _Learning Plan_ section below for a recommended order in which to work through them.

### Core Reinforcement Learning

|    | Lesson | Description |
| :- | :----- | :---------- |
| 00 | [Ray RLlib Overview](00-Ray-RLlib-Overview.ipynb) | Overview of this tutorial. |
| 01 | [Introduction to Reinforcement Learning](01-Introduction-to-Reinforcement-Learning.ipynb) | A quick introduction to the concepts of reinforcement learning. You can skim or skip this lesson if you already understand RL concepts. |
| 02 | [Introduction to RLlib](02-Introduction-to-RLlib.ipynb) | An overview of RLlib, its goals and the capabilities it provides. |
| 03 | [Application - Cart Pole](03-Application-Cart-Pole.ipynb) | The best starting place for learning how to use RL, in this case to train a moving cart to balance a vertical pole. Based on the `CartPole-v0` environment from OpenAI Gym, combined with RLlib. |
| 04 | [Application: Bipedal Walker](04-Bipedal-Walker.ipynb) | Train a two-legged robot simulator. This is an optional lesson, due to the longer compute times required, but fun to try. |
| 05 | [Custom Environments and Reward Shaping](05-Custom-Environments-Reward-Shaping.ipynb) | How to customize environments and rewards for your applications. |
| 06 | [Online Learning with DQN](06-Online-Learning-with-DQN.ipynb) | How to set up a server that simultaneously serves and learns a policy. |
| 07 | [RL References](07-RL-References.ipynb) | References on reinforcement learning. |

Some additional examples you might explore can be found in the `extras` folder:

| Lesson | Description |
| :----- | :---------- |
| [Extra: Application - Mountain Car](extras/Extra-Application-Mountain-Car.ipynb) | Based on the `MountainCar-v0` environment from OpenAI Gym. |
| [Extra: Application - Taxi](extras/Extra-Application-Taxi.ipynb) | Based on the `Taxi-v3` environment from OpenAI Gym. |
| [Extra: Application - Frozen Lake](extras/Extra-Application-Frozen-Lake.ipynb) | Based on the `FrozenLake-v0` environment from OpenAI Gym. |

In addition, exercise solutions for this tutorial can be found [here](solutions/Ray-RLlib-Solutions.ipynb).

For earlier versions of some of these tutorials, see [`rllib_exercises`](https://github.com/ray-project/tutorial/blob/master/rllib_exercises/rllib_colab.ipynb) in the original [github.com/ray-project/tutorial](https://github.com/ray-project/tutorial) project.

### Multi-Armed Bandits

_Multi-Armed Bandits_ (MABs) are a special kind of RL problem with broad and growing applications. The name comes from casino slot machines, the so-called _one-armed bandits_, generalized to a machine that has more than one arm.
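
To make the idea concrete, here is a minimal sketch of a simulated bandit with several arms, played with a simple _epsilon-greedy_ strategy. It uses only the Python standard library and is not taken from the lessons or from RLlib. The function name `run_epsilon_greedy`, the payout probabilities, and the parameter values are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a simulated bandit; return (pull counts, estimated payouts).

    Hypothetical helper for illustration only. Each arm pays 1 with the
    probability given in true_means and 0 otherwise.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm has been pulled
    estimates = [0.0] * n_arms   # running average reward for each arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


# Three arms with assumed payout probabilities; the last arm is the best.
counts, estimates = run_epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated payouts:", [round(e, 2) for e in estimates])
```

Most pulls should end up on the arm with the highest payout probability, while the small `epsilon` keeps the player occasionally trying the other arms. The lessons below cover more refined ways to balance exploration against exploitation.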

|    | Lesson | Description |
| :- | :----- | :---------- |
| 00 | [Multi-Armed-Bandits Overview](multi-armed-bandits/00-Multi-Armed-Bandits-Overview.ipynb) | Overview of this set of lessons. |
| 01 | [Introduction to Multi-Armed Bandits](multi-armed-bandits/01-Introduction-to-Multi-Armed-Bandits.ipynb) | A quick introduction to the concepts of multi-armed bandits (MABs) and how they fit in the spectrum of RL problems. |
| 02 | [Exploration vs. Exploitation Strategies](multi-armed-bandits/02-Exploration-vs-Exploitation-Strategies.ipynb) | A deeper look at algorithms that balance exploration vs. exploitation, the key challenge for efficient solutions. Much of this material is technical and can be skipped on a first reading, but skim at least the first part of the lesson. |
| 03 | [Simple Multi-Armed Bandit](multi-armed-bandits/03-Simple-Multi-Armed-Bandit.ipynb) | A simple example of a multi-armed bandit to illustrate the core ideas. |
| 04 | [Linear Upper Confidence Bound](multi-armed-bandits/04-Linear-Upper-Confidence-Bound.ipynb) | One popular algorithm for exploration vs. exploitation is _Upper Confidence Bound_. This lesson shows how to use a linear version in RLlib. |
| 05 | [Linear Thompson Sampling](multi-armed-bandits/05-Linear-Thompson-Sampling.ipynb) | Another popular algorithm for exploration vs. exploitation is _Thompson Sampling_. This lesson shows how to use a linear version in RLlib. |
| 06 | [Market Example](multi-armed-bandits/06-Market-Example.ipynb) | A simplified real-world example of MABs, finding the optimal stock and bond investment strategy. |

In addition, exercise solutions for this segment of the tutorial can be found [here](multi-armed-bandits/solutions/Multi-Armed-Bandits-Solutions.ipynb).

## Learning Plan

We recommend the following _learning plan_ for working through the lessons:

Start with the overview material for RL and RLlib:

* [Ray RLlib Overview](00-Ray-RLlib-Overview.ipynb)
* [Introduction to Reinforcement Learning](01-Introduction-to-Reinforcement-Learning.ipynb) 
* [Introduction to RLlib](02-Introduction-to-RLlib.ipynb)

Then study several of the lessons for MABs, starting with these lessons:

* [Multi-Armed-Bandits Overview](multi-armed-bandits/00-Multi-Armed-Bandits-Overview.ipynb)
* [Introduction to Multi-Armed Bandits](multi-armed-bandits/01-Introduction-to-Multi-Armed-Bandits.ipynb)
* [Exploration vs. Exploitation Strategies](multi-armed-bandits/02-Exploration-vs-Exploitation-Strategies.ipynb): Skim at least the first part of this lesson.
* [Simple Multi-Armed Bandit](multi-armed-bandits/03-Simple-Multi-Armed-Bandit.ipynb)

As time permits, study one or both of the following lessons:

* [Linear Upper Confidence Bound](multi-armed-bandits/04-Linear-Upper-Confidence-Bound.ipynb)
* [Linear Thompson Sampling](multi-armed-bandits/05-Linear-Thompson-Sampling.ipynb)

Then finish with this more complete example:

* [Market Example](multi-armed-bandits/06-Market-Example.ipynb)

Next, return to the "core" RL lessons, starting with the popular _CartPole_ example:

* [Application: Cart Pole](03-Application-Cart-Pole.ipynb)

If you are working through the lessons at your own pace (i.e., not in a class), work through the _Bipedal Walker_ example:

* [Application: Bipedal Walker](04-Bipedal-Walker.ipynb)

Then finish the rest of the `ray-rllib` lessons:

* [Custom Environments and Reward Shaping](05-Custom-Environments-Reward-Shaping.ipynb)
* [Online Learning with DQN](06-Online-Learning-with-DQN.ipynb)

Other examples are provided for your use. See the `extras` directory for examples that work with other OpenAI Gym environments:

* [Extra: Application - Mountain Car](extras/Extra-Application-Mountain-Car.ipynb)
* [Extra: Application - Taxi](extras/Extra-Application-Taxi.ipynb)
* [Extra: Application - Frozen Lake](extras/Extra-Application-Frozen-Lake.ipynb)

## Getting Help

* The [#tutorial channel](https://ray-distributed.slack.com/archives/C011ML23W5B) on the [Ray Slack](https://ray-distributed.slack.com)
* [Email](mailto:academy@anyscale.com)

Find an issue? Please report it!

* [GitHub issues](https://github.com/anyscale/academy/issues)