## Supervised learning vs. reinforcement learning
<!-- video shot="/0Ml-ccrTVS4" start="00:31" end="03:48" -->

#### Supervised learning refresher

- Supervised learning learns to predict y (most commonly a number or category) from x (most commonly a vector of numbers or an image), given a dataset of labeled (x, y) examples.

<br><br>

➡️ Press the right arrow key to advance to the next slide (and the escape key to see all slides).

#### Supervised learning examples

- Predicting house prices given house features
- Classifying an email as spam or not spam
- Identifying whether an image contains a dog

General idea: predict the output given the input

#### The "API" of supervised learning

![](img/SL-API.png)
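
As a rough sketch of this "API" (a dataset goes in, a trained model that maps $x$ to $y$ comes out), the example below fits a one-feature linear model with ordinary least squares in plain Python. The function name `fit_line` and the toy house data are made up for illustration; they are not part of the course code.

```python
from typing import Callable, List, Tuple


def fit_line(dataset: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Supervised learning in miniature: dataset in, prediction function out.

    Fits y = slope * x + intercept by ordinary least squares.
    """
    n = len(dataset)
    mean_x = sum(x for x, _ in dataset) / n
    mean_y = sum(y for _, y in dataset) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in dataset)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in dataset)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x

    def model(x: float) -> float:
        # The trained model is itself a function from input x to output y.
        return slope * x + intercept

    return model


# Hypothetical toy dataset: (house size in square metres, price in thousands).
toy_dataset = [(50.0, 150.0), (80.0, 240.0), (100.0, 300.0), (120.0, 360.0)]

model = fit_line(toy_dataset)     # training: dataset -> model
predicted_price = model(90.0)     # prediction: x -> y
print(predicted_price)
```

The learning algorithm (`fit_line`) takes the dataset and returns the model, while the model takes $x$ and returns $y$; these are two separate input/output pairs.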

#### The "API" of reinforcement learning

![](img/SL-vs-RL.png)

#### Reinforcement learning

- RL input: an **environment** 
- RL output: a **policy** that makes decisions

We will define environments and policies later in this module!

#### Reinforcement learning examples

- A self-driving car learning to drive
  - Env: road conditions, how the car drives. Policy: the self-driving algorithm
- Learning to play a video game
  - Env: the game. Policy: the game AI
- Learning the "best" sequence of movies to recommend to a user
  - Env: user movie preferences/behaviors. Policy: the recommender system

General idea: you need to make a sequence of decisions and want to act optimally at each step.

Unlike in SL, each decision affects the inputs and outputs the policy encounters later.
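
To make the contrast concrete, here is a minimal sketch of the interaction loop in plain Python. `CorridorEnv` and `always_right_policy` are hypothetical names invented for this example (they are not part of Gym or RLlib); the only point is that each action changes the observation the policy sees next.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk along positions 0..4, goal at 4."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position  # the observation is just the position

    def step(self, action: int):
        # action 0 = move left, action 1 = move right
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def always_right_policy(observation: int) -> int:
    """A policy is a function from an observation to an action."""
    return 1


env = CorridorEnv()
obs = env.reset()
visited = [obs]
done = False
while not done:
    action = always_right_policy(obs)
    obs, reward, done = env.step(action)  # the action determines the next obs
    visited.append(obs)

print(visited)
```

In a supervised setting the examples in the dataset do not change in response to the model's predictions; here, the sequence of observations in `visited` depends entirely on the actions the policy chose.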

Notes:

- The input to supervised learning is a dataset
- The output is the trained model that can make predictions
- The model itself has inputs and outputs ($x$ and $y$)
- As with SL, the output of RL (the policy) is a function that can be used to compute or predict something.
- In the next section we'll delve into environments and policies.


#### Let's apply what we learned!

## Supervised learning or reinforcement learning
<!-- multiple choice -->

#### Would you solve this problem with SL or RL?

_Classifying the species based on an image of a tree_

- [x] SL 
- [ ] RL | Try again -- this is a single prediction rather than a sequence of actions.

#### Would you solve this problem with SL or RL?

_Creating an AI that plays chess_

- [ ] SL | Try again -- the chess AI needs to act optimally at each step in a sequence.
- [x] RL 

#### Would you solve this problem with SL or RL?

_Predicting whether or not a user will like a movie_

- [x] SL 
- [ ] RL | Try again -- this is a single prediction rather than a sequence of actions.


## Frozen Lake preview
<!-- coding exercise -->

In the next section you'll be exploring an RL environment called [Frozen Lake](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/). It is a small game that looks like this:

```
P...
.O.O
...O
O..G
```

The goal is for the player `P` to reach the goal `G` while walking on the ice `.` and avoiding falling into the holes `O`. The ice is slippery though, and sometimes when the player tries to walk in a certain direction, they slip and go a different way. This makes the game challenging as you may slip into a hole by accident.
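
The slipping can be sketched in a few lines of plain Python. This is a simplified stand-in for the real environment, not its implementation: it assumes the player moves in the intended direction with probability 1/3 and otherwise slips to one of the two perpendicular directions (the behaviour the Gym documentation describes for `is_slippery=True`), and it assumes an episode is cut off after 100 steps. The names `slippery_step` and `random_player_success_rate` are made up for this example.

```python
import random

LAKE = ["P...",
        ".O.O",
        "...O",
        "O..G"]

# Row/column offsets for each direction.
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}
PERPENDICULAR = {
    "left": ("up", "down"),
    "right": ("up", "down"),
    "up": ("left", "right"),
    "down": ("left", "right"),
}


def slippery_step(row, col, intended, rng):
    """Move one tile; assumed 1/3 chance each of intended or a perpendicular direction."""
    actual = rng.choice((intended,) + PERPENDICULAR[intended])
    d_row, d_col = MOVES[actual]
    # Walking into the edge of the lake leaves the player in place.
    new_row = min(max(row + d_row, 0), len(LAKE) - 1)
    new_col = min(max(col + d_col, 0), len(LAKE[0]) - 1)
    return new_row, new_col


def random_player_success_rate(episodes=2000, max_steps=100, seed=0):
    """Fraction of episodes in which a uniformly random player reaches G."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        row, col = 0, 0  # the player starts on the top-left tile
        for _ in range(max_steps):  # assumed step limit per episode
            intended = rng.choice(list(MOVES))
            row, col = slippery_step(row, col, intended, rng)
            if LAKE[row][col] == "G":
                successes += 1
                break
            if LAKE[row][col] == "O":
                break  # fell into a hole
    return successes / episodes


rate = random_player_success_rate()
print(f"Random player success rate: {rate:.3f}")
```

Running it gives an estimate, under these assumptions, of how often a purely random player reaches the goal.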

This is your first coding exercise. You'll access the coding exercises for the course by clicking the "Open in Colab" button below. **Before running any exercise code in the Colab notebook, run the cells at the top first; they install the dependencies the exercises need.**

You won't understand the code yet (that is coming!) but you can start exploring already. First, run the code and you will observe a player behaving randomly in the environment. Then, change `USE_RL = False` to `USE_RL = True` and observe a player that learned how to play using RL. When you are done, answer the multiple choice questions below. 

```python
# EXERCISE
from ray.rllib.algorithms.ppo import PPO
from utils import slippery_algo_config, fix_frozen_lake_render
import gym
from IPython import display
import time
import numpy as np

ppo = PPO(env="FrozenLake-v1", config=slippery_algo_config)
ppo.restore("models/FrozenLakeSlippery50/checkpoint_000050/")

env = gym.make("FrozenLake-v1", is_slippery=True)
fix_frozen_lake_render(env)

obs = env.reset()

# The player behaves randomly by default.
# To use a player that learned how to play using RL, change this to True!
USE_RL = False

done = False
while not done:
    if USE_RL:
        action = ppo.compute_single_action(obs, explore=False)
    else:
        action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)

    display.clear_output(wait=True)
    env.render()
    time.sleep(0.25)

if reward == 1:
    print("Success!")
else:
    print("Agent failed to reach the goal :(")

ppo.stop()
```

```
  (Up)
....
.P.O
...O
O..G
Agent failed to reach the goal :(
```


#### Random player

How does the random player do at this Frozen Lake game?

- [x] It usually fails to reach the goal.
- [ ] It usually reaches the goal!

#### RL player

How does the player trained with RL do at this Frozen Lake game?

- [ ] It usually fails to reach the goal.
- [x] It usually reaches the goal!

