# Q Learning
### Introduction
Today, you'll learn about Q-Learning using the Gym API.
> Gym is a standard API for reinforcement learning, and a diverse collection of reference environments.

Reinforcement learning is a subfield of artificial intelligence in which an agent learns by trial and error. Our AI will play the same game over and over again until it learns what it must do to win.

In this workshop, you will:

1. Learn about Q Tables
2. Set up a Gym environment
3. Train an AI to solve the FrozenLake environment (https://www.gymlibrary.ml/environments/toy_text/frozen_lake/)

Let's start by importing the required libraries:

In [None]:
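# This notebook uses the pre-0.26 Gym API: reset(return_info=True), a 4-value step() and render(mode=...)
%pip install pygame "gym<0.26" numpy matplotlib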
import random
import gym
import numpy as np
import matplotlib.pyplot as plt

## 1. Q-Learning

A simple example of Q-Learning: 

![Example of a Q-Learning Environment](./images/example.png)

The illustrations in this notebook come from this freeCodeCamp article: (https://www.freecodecamp.org/news/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc/#:~:text=Q%2DTable%20is%20just%20a,at%20each%20non%2Dedge%20tile.)

In this game, the AI must reach the “End” square in the shortest amount of time while avoiding the bombs and collecting the bonuses.

A Q-Table for this environment could be visualized like this. 

![Example of a Q-Table](./images/qtable.png)

Each column corresponds to one possible action and each row to one possible state; each cell holds the current estimate of how good it is to take that action in that state.

To create such a table in Python, complete and run the following cell:

In [None]:
# Initializing an empty table

def initQtable(x,y):
## Create an array of zeros of shape (x, y): x rows (states) and y columns (actions)
## correction: return np.zeros((x, y))
    return None

qTable = initQtable(5, 4)
print("Q-Table:\n" + str(qTable))
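Once the table exists, reading it is just array indexing: the row is the state, the column is the action, and the best-known action for a state is the column holding the largest value. The sketch below illustrates this with NumPy; the state and action labels are hypothetical and only loosely follow the example table above.

```python
import numpy as np

# Hypothetical labels for a 5-state x 4-action table (not taken from the image)
STATES = ["Start", "Blank", "Bonus", "Bomb", "End"]
ACTIONS = ["Left", "Right", "Up", "Down"]

# One row per state, one column per action; every value starts at 0
q_table = np.zeros((len(STATES), len(ACTIONS)))

# Pretend the AI has learned that moving Down from Start is worth 0.3
q_table[STATES.index("Start"), ACTIONS.index("Down")] = 0.3

# The best action in a state is the column with the highest value in that row
best = int(np.argmax(q_table[STATES.index("Start")]))
print(ACTIONS[best])  # Down
```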

In reinforcement learning, the AI receives a **reward** each time it **acts**.\
Here, the AI would likely receive a **negative reward** (-1) if it chose to move to the right (because there's a bomb).\
If it went anywhere else, it would probably receive a **neutral reward** (0), since it would land on a blank square.\
If it did go to the right, the AI would hit the bomb and the value for [Start, Move_Right] would decrease.\
To compute this new value, we use the following formula:

![QFormula](./images/qformula.png)

#### Translation:

**Q<sup>new</sup>(s<sub>t</sub>, a<sub>t</sub>)** = the new value for the current state and action in our Q-Table\
**Q(s<sub>t</sub>, a<sub>t</sub>)** = the old value for the current state and action in our Q-Table\
**α (learning rate)** = a constant between 0 and 1 that controls how fast our algorithm learns (usually a number between 0.05 and 0.5)\
**r<sub>t</sub> (reward)** = the reward received (as in the example above, -1 if you hit a bomb)\
**𝛾 (discount factor)** = a constant between 0 and 1 that determines how much our AI values future rewards compared to immediate ones\
**max<sub>a</sub> Q(s<sub>t+1</sub>, a)** = the value of the most profitable action available from the new state
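Written out, the update rule reads as follows. Plugging in the bomb example (all Q-values still at 0, and α = 0.05, γ = 0.99 as in the constants used in the next cell) gives the first worked line; the second line assumes, purely for illustration, a reward of 0 and a new state whose best action is already worth 1.0, to show how γ carries future value back to the current state.

$$
Q^{new}(s_t, a_t) = Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
$$

$$
Q^{new}(\text{Start}, \text{Move\_Right}) = 0 + 0.05 \times \left( -1 + 0.99 \times 0 - 0 \right) = -0.05
$$

$$
Q^{new}(s_t, a_t) = 0 + 0.05 \times \left( 0 + 0.99 \times 1.0 - 0 \right) = 0.0495
$$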

Let's try to implement this formula as a Python function:\
(You can use the `LEARNING_RATE` and `DISCOUNT_RATE` constants in your formula)

In [None]:
LEARNING_RATE = 0.05
DISCOUNT_RATE = 0.99

def qFunc(qTable, state, action, reward, newState):
    ## enter your code here:
    Q_New = None
    ##
    # correction: optimal_future_value = max(qTable[newState])
    # correction: Q_New = qTable[state, action] + LEARNING_RATE * (reward + DISCOUNT_RATE * optimal_future_value - qTable[state, action])
    return Q_New

If the value of `qTable[0, 1]` printed below is `-0.05`, congratulations: you have successfully implemented the formula!

In [None]:
qTable = initQtable(5,4)

qTable[0, 1] = qFunc(qTable=qTable, state=0, action=1, reward=-1, newState=3)

print("Q-Table after action:\n" + str(qTable))
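Before moving to Gym, here is a minimal, self-contained sketch of what applying this update over many games does. It uses a hypothetical toy environment (a 5-cell corridor with the goal at the far end, not part of Gym), purely random actions, and the same update rule as `qFunc()`; the names `corridor_step` and `train_corridor` and the values α = 0.5, γ = 0.9 are assumptions chosen for the illustration.

```python
import random
import numpy as np

# Hypothetical toy environment (not part of Gym): a corridor of cells 0 to 4.
# The agent starts in cell 0; reaching cell 4 gives a reward of 1 and ends the game.
N_STATES = 5
N_ACTIONS = 2
LEFT, RIGHT = 0, 1

def corridor_step(state, action):
    """Move one cell left (blocked by the wall at cell 0) or one cell right."""
    new_state = max(state - 1, 0) if action == LEFT else state + 1
    done = new_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return new_state, reward, done

def train_corridor(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(N_ACTIONS)  # purely random actions, like random_action()
            new_state, reward, done = corridor_step(state, action)
            # Same update rule as qFunc(); the goal row is never updated, so it stays at 0
            q[state, action] += alpha * (reward + gamma * np.max(q[new_state]) - q[state, action])
            state = new_state
    return q

q = train_corridor()
print(np.round(q, 3))
print(np.argmax(q[:N_STATES - 1], axis=1))  # greedy action in cells 0-3
```

Even though every action was random, the greedy action in each non-goal cell should end up being 1 (move right), and the "move right" values should shrink by a factor of γ for each extra step away from the goal.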

## 2. Setting up the Gym Environment

Let's get to the fun part!\
Read the documentation for FrozenLake here: https://www.gymlibrary.ml/environments/toy_text/frozen_lake/ and find out how to create the environment.

For now, set the `is_slippery` argument to `False`.


In [None]:
# write some code to load and make the FrozenLake environment:
env = None
# correction: env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

In [None]:
def random_action(env):
    ## the action should be a random integer between 0 and the number of possible actions minus 1 (find this value inside env)
    return None
    # correction: return random.randint(0, env.action_space.n - 1)

In [None]:
observation, info = env.reset(return_info=True)

# Performing an action
action = random_action(env)
observation, reward, done, info = env.step(action)

# Displaying the first frame of the game
plt.imshow(env.render(mode='rgb_array'))

# Printing game info
print(f"actions: {env.action_space.n}\nstates: {env.observation_space.n}")
print(f"Current state: {observation}")

# Closing the environment
env.close()

In this environment, there are **4 possible actions** for each of the **16 possible states**.\
Feel free to play around with the code above to get a better understanding of the API.
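To make those numbers concrete: the observation is a single integer, and on the 4x4 map it encodes the agent's position as `row * 4 + col`. The sketch below decodes it without Gym, assuming the default 4x4 map (`S` start, `F` frozen, `H` hole, `G` goal) and the action encoding listed in the FrozenLake documentation; `describe_state` is a hypothetical helper.

```python
# Action encoding for FrozenLake (as listed in the environment's documentation)
ACTION_NAMES = {0: "Left", 1: "Down", 2: "Right", 3: "Up"}

# Assumed default 4x4 map: S = start, F = frozen, H = hole, G = goal
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(MAP_4X4[0])

def describe_state(state):
    """Convert a state index (0-15) into (row, column, tile)."""
    row, col = divmod(state, N_COLS)
    return row, col, MAP_4X4[row][col]

print(describe_state(0))   # (0, 0, 'S')
print(describe_state(5))   # (1, 1, 'H')
print(describe_state(15))  # (3, 3, 'G')
```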

## 3. Solving the environment

In [None]:
def game_loop(env, qTable, action):
    new_state, reward, done, info = env.step(action)
    ## Use the qFunc() function you wrote above to update the Q-Table
    ## All of the arguments needed by qFunc() can be found inside this function
    ## (`state` is the global variable updated by the loop that calls game_loop())
    qTable[state, action] = None
    # correction: qTable[state, action] = qFunc(state=state, action=action, newState=new_state, qTable=qTable, reward=reward)
    ##
    return qTable, new_state, done, reward

In [None]:
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)
qTable = initQtable(env.observation_space.n, env.action_space.n)

state, info = env.reset(return_info=True)
while (True):
    env.render()
    action = random_action(env)
    qTable, state, done, reward = game_loop(env, qTable, action)
    if done:
        break
env.close()

Now, to see if it works, we will play the game 10,000 times (`EPOCH`) and see how the Q-Table evolves.

In [None]:
EPOCH = 10000

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)
qTable = initQtable(env.observation_space.n, env.action_space.n)

for i in range(EPOCH):
    state, info = env.reset(return_info=True)
    while (True):
        # This time, we won't render the game each frame because it would take too long
        action = random_action(env)
        qTable, state, done, reward = game_loop(env, qTable, action)
        if done:
            break
env.close()

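# Printing the resulting Q-Table (one row per state, one column per action):
# positive values are shown in green, and the best positive value of each row is also underlined
for states in qTable:
    for actions in states:
        if (actions == max(states)):
            print("\033[4m", end="")   # underline
        else:
            print("\033[0m", end="")   # reset formatting
        if (actions > 0):
            print("\033[92m", end="")  # green text
        else:
            print("\033[00m", end="")  # reset formatting (this also removes the underline)
        print(round(actions, 3), end="\t")
    print("\033[0m")  # reset formatting at the end of each row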

Great! We now have a Q-Table that knows which action is best in each state.
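A grid of numbers is hard to read, so it can help to turn the table into a map of arrows. This is a minimal sketch, assuming the FrozenLake action order (0 = Left, 1 = Down, 2 = Right, 3 = Up) and a square map; `policy_grid` is a hypothetical helper, and rows that are still all 0 (for example holes, the goal, or unexplored states) are shown as `·` instead of an arbitrary arrow.

```python
import numpy as np

# FrozenLake action order: 0 = Left, 1 = Down, 2 = Right, 3 = Up
ARROWS = ["←", "↓", "→", "↑"]

def policy_grid(q_table, n_cols=4):
    """Return one string per map row showing the greedy action of each state."""
    cells = []
    for q_row in q_table:
        if np.all(q_row == 0):
            cells.append("·")  # nothing learned yet for this state
        else:
            cells.append(ARROWS[int(np.argmax(q_row))])
    return ["".join(cells[i:i + n_cols]) for i in range(0, len(cells), n_cols)]

# Usage with a small hand-made 2x2 example (not real training output)
example = np.array([
    [0.0, 0.2, 0.5, 0.0],  # state 0: Right is best
    [0.0, 0.7, 0.0, 0.1],  # state 1: Down is best
    [0.0, 0.0, 0.0, 0.0],  # state 2: nothing learned
    [0.3, 0.0, 0.0, 0.0],  # state 3: Left is best
])
print("\n".join(policy_grid(example, n_cols=2)))
# →↓
# ·←
```

With the trained table from the cell above, `print("\n".join(policy_grid(qTable)))` would draw the 4x4 policy.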

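So far, our AI has only chosen random actions. At first, it should indeed explore the different possibilities, but over time it should start exploiting what it has learned to maximise its gain.
To balance the two, we can use the epsilon-greedy strategy: with probability `epsilon`, the AI picks a random action (exploration); otherwise, it picks the action with the highest Q-value for its current state (exploitation).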

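```
epsilon = 0.9
r = random.random()           # uniform number in [0, 1); don't overwrite the `random` module
if r > epsilon:
    action = greedy_action()  # exploit: the action with the highest Q-value
else:
    action = random_action()  # explore: a random action
```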

Try to implement this in the following function, which we'll use to decide which action the AI takes:

In [None]:
def chooseAction(epsilon, qTable, state, env):
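    ## write some code that returns either a random action or the best action
    return None
    # correction: if random.random() > epsilon:
    #                 return int(np.argmax(qTable[state]))
    #             return random_action(env)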
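One detail worth knowing: `np.argmax` returns the *first* index among equal values, so in a state whose Q-values are all still 0 a purely greedy choice always picks action 0. A common variation, sketched below and not required by the exercise, is to break ties at random among the best actions; `greedy_with_ties` is a hypothetical helper name.

```python
import random
import numpy as np

def greedy_with_ties(q_row, rng=random):
    """Return the index of a best action, choosing uniformly among ties."""
    best_value = np.max(q_row)
    best_actions = np.flatnonzero(q_row == best_value)  # indices of all maximal values
    return int(rng.choice(best_actions))

print(int(np.argmax(np.zeros(4))))                         # 0, every time
print(greedy_with_ties(np.array([0.0, 0.5, 0.5, 0.1])))    # either 1 or 2
```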

In [None]:
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

state, info = env.reset(return_info=True)
while (True):
    env.render()
    ## we can give it an epsilon of 0 because we want it to always choose the most profitable action
    action = chooseAction(0, qTable, state, env)
    qTable, state, done, reward = game_loop(env, qTable, action)
    if done:
        break

# Displaying the last frame of the game
plt.imshow(env.render(mode='rgb_array'))

env.close()

If all went well, our AI should easily reach its goal!

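But we're not done...\
You might have noticed that when we create our environment, we pass it a particular argument:\
`is_slippery=False`

This argument makes the game far easier: the AI always moves in the direction it chooses. With `is_slippery=True`, the ice is slippery and the AI sometimes slides in a perpendicular direction instead.
If you want a real challenge, set it to `True`.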
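To get a feel for why this is harder, the sketch below simulates the slipping rule without Gym, assuming it mirrors FrozenLake's behaviour: the AI moves in the intended direction or in one of the two perpendicular directions, each with probability 1/3. With the action order 0 = Left, 1 = Down, 2 = Right, 3 = Up, the perpendicular directions of action `a` are `(a - 1) % 4` and `(a + 1) % 4`; `slippery_direction` is a hypothetical helper.

```python
import random
from collections import Counter

ACTION_NAMES = ["Left", "Down", "Right", "Up"]  # FrozenLake action order

def slippery_direction(action, rng=random):
    """Direction actually taken on slippery ice: intended or perpendicular, 1/3 each."""
    return rng.choice([(action - 1) % 4, action, (action + 1) % 4])

rng = random.Random(0)
counts = Counter(ACTION_NAMES[slippery_direction(2, rng)] for _ in range(30_000))
print(counts)  # about a third each for Down, Right and Up; Left never appears
```

Because the intended move only happens a third of the time, the AI has to learn paths that stay safe even when it slides, which is why it needs more training and a gradually decreasing epsilon.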

In [None]:
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True)

qTable = initQtable(env.observation_space.n, env.action_space.n)

# Training the AI
epsilon = 1.0
for i in range(10000):
    epsilon = max(epsilon - 0.0001, 0)
    state, info = env.reset(return_info=True)
    while (True):
        action = chooseAction(epsilon, qTable, state, env)
        qTable, state, done, reward = game_loop(env, qTable, action)
        if done:
            break

# Testing the AI
wins = 0.0
for i in range(100):
    state, info = env.reset(return_info=True)
    while (True):
        action = chooseAction(0, qTable, state, env)
        _, state, done, reward = game_loop(env, qTable, action)
        if done:
            # increment the number of wins if the AI was successful (you can use one of the variables above for this)
            # ~2 lines of code:
            # correction: if (reward > 0):
            #               wins += 1
            break

print(f"{round(wins / (i+1) * 100, 2)}% winrate")
print(f"{int(wins)} wins out of {i + 1} games")

# Displaying the last frame of the game
plt.imshow(env.render(mode='rgb_array'))

env.close()

Great job! You've completed this workshop!
If you want to continue in the wonderful world of reinforcement learning, you could try:
- A larger map for FrozenLake. For example: `gym.make('FrozenLake-v1', map_name="8x8", is_slippery=True)`
- A different environment from https://www.gymlibrary.ml/
- Going **deeper** with https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
- Trying what you've learned with games you know and love:
    - https://www.gymlibrary.ml/environments/atari/
    - https://pypi.org/project/gym-super-mario-bros/