# About RLlib

In the [previous lesson](01-Introduction-to-Reinforcement-Learning.ipynb), we learned the basic concepts of reinforcement learning, with a "taste" of [RLlib](https://rllib.io) and [OpenAI Gym](https://gym.openai.com). This lesson takes a step back to provide more information about RLlib and the features it provides. The subsequent lessons will continue our exploration of RL algorithms and tools.

For more information about RLlib and its open source community:

* [documentation](https://ray.readthedocs.io/en/latest/rllib.html)
* [GitHub repo](https://github.com/ray-project/ray/tree/master/rllib#rllib-scalable-reinforcement-learning)

RLlib is structured conceptually like this:

![RLlib architecture](../images/RLlib-architecture.png)

The applications we mentioned in the [Introduction](01-Introduction-to-Reinforcement-Learning.ipynb) appear at the top of the diagram.

Next we decide our agent approach:

* Just one agent? The traditional RL problem.
* Multiple cooperating agents? TODO
* A hierarchy of agents operating at different scopes?
* Offline batch? This relatively new technique is useful when an environment simulator doesn't exist and it's not possible to train in the real environment (e.g., a chemical plant). Instead, we ask what can be learned from the historical log data accrued from the real system.

Beneath the agent approach, the user accesses RLlib through a consistent, concise API.

The API provides a wide selection of popular RL algorithms.

RLlib leverages Ray for efficient, cluster-wide scalability.
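To make the API layer concrete, here is a minimal sketch of training PPO on `CartPole-v0`. It assumes an RLlib release contemporary with this lesson, where trainers are imported from `ray.rllib.agents`; newer releases have reorganized these modules, so treat the import path as an assumption. The helper names `build_config` and `train_cart_pole` are hypothetical and exist only for illustration.

```python
def build_config(num_workers=2, **overrides):
    """Return a plain dict of trainer settings.

    Any key not set here falls back to the algorithm's defaults.
    `num_workers` controls how many parallel rollout workers Ray starts.
    """
    config = {"num_workers": num_workers}
    config.update(overrides)
    return config


def train_cart_pole(config, iterations=3):
    """Train PPO for a few iterations and return the mean episode rewards."""
    # Imported here so the config helper can be used without Ray installed.
    import ray
    # Assumed import path for the RLlib version used in these lessons.
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init(ignore_reinit_error=True)
    try:
        trainer = PPOTrainer(env="CartPole-v0", config=config)
        # Each call to train() runs one training iteration and returns a result dict.
        return [trainer.train()["episode_reward_mean"] for _ in range(iterations)]
    finally:
        ray.shutdown()
```

Calling `train_cart_pole(build_config())` runs three training iterations. Switching to a different algorithm generally means importing a different trainer class while the surrounding code stays the same, which is what "consistent" means in practice.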

TODO: more conceptual information about RLlib, e.g., discuss...

* the modules the user needs to understand
* the integrations with TensorFlow, PyTorch, etc. 

In these lessons, we'll use RLlib with TensorFlow.

Here is the current list of supported algorithms, each of which often specifies a particular system architecture. We'll explore many of these in various lessons:

TODO: Provide summaries.

### High-throughput Architectures

* [Distributed Prioritized Experience Replay (Ape-X)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#distributed-prioritized-experience-replay-ape-x)
* [Importance Weighted Actor-Learner Architecture (IMPALA)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#importance-weighted-actor-learner-architecture-impala)
* [Asynchronous Proximal Policy Optimization (APPO)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#asynchronous-proximal-policy-optimization-appo)

### Gradient-based

* [Soft Actor-Critic (SAC)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#soft-actor-critic-sac)
* [Advantage Actor-Critic (A2C, A3C)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#advantage-actor-critic-a2c-a3c)
* [Deep Deterministic Policy Gradients (DDPG, TD3)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3)
* [Deep Q Networks (DQN, Rainbow, Parametric DQN)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#deep-q-networks-dqn-rainbow-parametric-dqn)
* [Policy Gradients](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#policy-gradients)
* [Proximal Policy Optimization (PPO)](https://docs.ray.io/en/latest/rllib-algorithms.html#proximal-policy-optimization-ppo)

### Gradient-free

* [Augmented Random Search (ARS)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#augmented-random-search-ars)
* [Evolution Strategies](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#evolution-strategies)

### Multi-agent Specific

* [QMIX Monotonic Value Factorisation (QMIX, VDN, IQN)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#qmix-monotonic-value-factorisation-qmix-vdn-iqn)

### Offline

* [Advantage Re-Weighted Imitation Learning (MARWIL)](https://ray.readthedocs.io/en/latest/rllib-algorithms.html#advantage-re-weighted-imitation-learning-marwil)
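Looking back over all of the categories above, each algorithm can also be selected by a short string name when launching an experiment through Ray Tune. The sketch below is illustrative only: the names in `ALGORITHMS` are assumed to match the registry strings of the RLlib version used in these lessons, and `category_of` and `run_experiment` are hypothetical helpers, not part of RLlib.

```python
# Assumed registry names, grouped by the categories listed in this lesson.
ALGORITHMS = {
    "High-throughput Architectures": ["APEX", "IMPALA", "APPO"],
    "Gradient-based": ["SAC", "A2C", "A3C", "DDPG", "TD3", "DQN", "PG", "PPO"],
    "Gradient-free": ["ARS", "ES"],
    "Multi-agent Specific": ["QMIX"],
    "Offline": ["MARWIL"],
}


def category_of(name):
    """Return the category heading for an algorithm name (case-insensitive)."""
    for category, names in ALGORITHMS.items():
        if name.upper() in names:
            return category
    raise KeyError("Unknown algorithm name: {}".format(name))


def run_experiment(name, env="CartPole-v0", max_iterations=5):
    """Launch a short Tune experiment for the named algorithm."""
    # Imported here so the lookup helpers work without Ray installed.
    from ray import tune

    category_of(name)  # Fail early on names that are not in the table above.
    return tune.run(
        name.upper(),
        config={"env": env},
        stop={"training_iteration": max_iterations},
    )
```

For example, `run_experiment("PPO")` would train PPO on `CartPole-v0` for five iterations. Not every algorithm suits every environment: some expect continuous action spaces, QMIX expects a multi-agent environment, and MARWIL expects offline input data, so the default `env` here is only a reasonable choice for some of the names.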

The next lesson, [03: Application: Cart Pole](03-Application-Cart-Pole.ipynb), returns to the _cart pole_ example, where we train a moving cart to balance a vertical pole. The example is based on the `CartPole-v0` environment from OpenAI Gym.