# Ray RLlib - Multi-Armed Bandits - Overview

© 2019-2021, Anyscale. All Rights Reserved

![Anyscale Academy](../../images/AnyscaleAcademyLogo.png)

This part of the [RLlib tutorial](../00-Ray-RLlib-Overview.ipynb) introduces _Multi-Armed Bandits_ (MABs), a popular approach that closely resembles "classic" reinforcement learning (RL) but differs from it in a few ways, which we'll cover in the [first lesson](01-Introduction-to-Multi-Armed-Bandits.ipynb).

[RLlib](https://ray.readthedocs.io/en/latest/rllib.html) provides several bandit algorithms:

* [Linear Upper Confidence Bound (contrib/LinUCB)](https://docs.ray.io/en/latest/rllib-algorithms.html#linear-upper-confidence-bound-contrib-linucb)
* [Linear Thompson Sampling (contrib/LinTS)](https://docs.ray.io/en/latest/rllib-algorithms.html#linear-thompson-sampling-contrib-lints)
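To give a feel for the upper-confidence-bound idea before the lessons, here is a minimal sketch of the classic, non-linear UCB1 rule in plain Python. It does not use RLlib and is not the linear variant listed above. The function name `run_ucb1`, the Bernoulli arm probabilities, and the seed are illustrative assumptions.

```python
import math
import random


def run_ucb1(arm_probs, steps, seed=0):
    """Play a Bernoulli bandit with the classic UCB1 rule (not RLlib's LinUCB).

    arm_probs: hypothetical win probability of each arm, unknown to the player.
    Returns the pull count and summed reward for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms      # times each arm was pulled
    totals = [0.0] * n_arms    # summed reward per arm

    for t in range(1, steps + 1):
        if t <= n_arms:
            # Pull every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # Score = observed mean (exploitation) + a bonus that shrinks
            # as an arm is pulled more often (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals


counts, totals = run_ucb1(arm_probs=[0.2, 0.5, 0.8], steps=2000)
print("pulls per arm:", counts)
```

Over time, most pulls should go to the arm with the highest win probability, while the other arms are still tried occasionally. The lessons below discuss this trade-off and the linear versions that RLlib provides.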

Here are the lessons. Note that the `04a-04c` lessons can be studied in any order.

|     | Lesson | Description |
| :-- | :----- | :---------- |
| 00  | [Multi-Armed-Bandits Overview](00-Multi-Armed-Bandits-Overview.ipynb) | Overview of this set of lessons. |
| 01  | [Introduction to Multi-Armed Bandits](01-Introduction-to-Multi-Armed-Bandits.ipynb) | A quick introduction to the concepts of multi-armed bandits (MABs) and how they fit in the spectrum of RL problems. |
| 02  | [Exploration vs. Exploitation Strategies](02-Exploration-vs-Exploitation-Strategies.ipynb) | A deeper look at algorithms that balance exploration vs. exploitation, the key challenge for efficient solutions. Much of this material is technical and can be skipped on a first reading, but at least skim the first part of the lesson. |
| 03  | [Simple Multi-Armed Bandit](03-Simple-Multi-Armed-Bandit.ipynb) | A simple example of a multi-armed bandit to illustrate the core ideas. |
| 04  | [Linear Upper Confidence Bound](04-Linear-Upper-Confidence-Bound.ipynb) | One popular algorithm for exploration vs. exploitation is _Upper Confidence Bound_. This lesson shows how to use a linear version in RLlib. |
| 05  | [Linear Thompson Sampling](05-Linear-Thompson-Sampling.ipynb) | Another popular algorithm for exploration vs. exploitation is _Thompson Sampling_. This lesson shows how to use a linear version in RLlib. |
| 06  | [Market Example](06-Market-Example.ipynb) | A simplified real-world example of MABs, finding the optimal stock and bond investment strategy. |

In addition, exercise solutions for this tutorial can be found [here](solutions/Multi-Armed-Bandits-Solutions.ipynb).

## Getting Help

* The [#tutorial channel](https://ray-distributed.slack.com/archives/C011ML23W5B) on the [Ray Slack](https://ray-distributed.slack.com). [Click here](https://forms.gle/9TSdDYUgxYs8SA9e8) to join.

Find an issue? Please report it!

* [GitHub issues](https://github.com/anyscale/academy/issues)