# Q-Learning Analysis

## Introduction
The purpose of this assignment is to practice implementing Q-Learning in increasingly challenging environments and to reinforce the understanding of reinforcement learning and agent-environment interactions.

## Set Up

In [6]:
import QLearningAgent as qa
import QLearningAgent2 as qa2
import QLearningAgentTaxi as qat

import numpy as np
import coffeegame
import math
import gym

## Results

### QLearningAgent (fixed exploration rate)
#### Learning Output & Q-Table

In [3]:
agent_coffee = qa.QLearningAgent(coffeegame.CoffeeEnv())
agent_coffee.init()

env = gym.make("FrozenLake-v0", is_slippery = False)
agent_frozen = qa.QLearningAgent(env)
agent_frozen.init()


[S]CH
DHC
HHG
  (Right)
S[C]H
DHC
HHG
  (Down)
SCH
D[H]C
HHG
  (Right)
SCH
DH[C]
HHG
  (Down)
SCH
DHC
HH[G]

[S]FFF
FHFH
FFFH
HFFG
  (Down)
SFFF
[F]HFH
FFFH
HFFG
  (Down)
SFFF
FHFH
[F]FFH
HFFG
  (Right)
SFFF
FHFH
F[F]FH
HFFG
  (Down)
SFFF
FHFH
FFFH
H[F]FG
  (Right)
SFFF
FHFH
FFFH
HF[F]G
  (Right)
SFFF
FHFH
FFFH
HFF[G]


#### Find Best Parameter

In [10]:
import find_best_parameter as fbp

fbp.findBestParameter()

0.05 0.4 0.4
0.05 0.4 0.45
0.05 0.4 0.5
0.05 0.4 0.55
0.05 0.4 0.6
0.05 0.5 0.4
0.05 0.5 0.45
0.05 0.5 0.5
0.05 0.5 0.55
0.05 0.5 0.6
0.05 0.6000000000000001 0.4
0.05 0.6000000000000001 0.45
0.05 0.6000000000000001 0.5
0.05 0.6000000000000001 0.55
0.05 0.6000000000000001 0.6
0.05 0.7000000000000001 0.4
0.05 0.7000000000000001 0.45
0.05 0.7000000000000001 0.5
0.05 0.7000000000000001 0.55
0.05 0.7000000000000001 0.6
0.05 0.8 0.4
0.05 0.8 0.45
0.05 0.8 0.5
0.05 0.8 0.55
0.05 0.8 0.6
0.1 0.4 0.4
0.1 0.4 0.45
0.1 0.4 0.5
0.1 0.4 0.55
0.1 0.4 0.6
0.1 0.5 0.4
0.1 0.5 0.45
0.1 0.5 0.5
0.1 0.5 0.55
0.1 0.5 0.6
0.1 0.6000000000000001 0.4
0.1 0.6000000000000001 0.45
0.1 0.6000000000000001 0.5
0.1 0.6000000000000001 0.55
0.1 0.6000000000000001 0.6
0.1 0.7000000000000001 0.4
0.1 0.7000000000000001 0.45
0.1 0.7000000000000001 0.5
0.1 0.7000000000000001 0.55
0.1 0.7000000000000001 0.6
0.1 0.8 0.4
0.1 0.8 0.45
0.1 0.8 0.5
0.1 0.8 0.55
0.1 0.8 0.6
0.15000000000000002 0.4 0.4
0.15000000000000002 0.4 0.4

[[0.1, 0.6000000000000001, 0.4], 0.8729243278503418]

The **findBestParameter()** function (in `find_best_parameter.py`) tries different combinations of alpha, gamma, and epsilon for **QLearningAgent** and compares their efficiency. Each line of the output above is one alpha, gamma, epsilon combination. The run time of each training run is measured, and the best combination is the one with the shortest run time: **alpha = 0.1, gamma = 0.6, epsilon = 0.4**, with a run time of approximately **0.87**. The value 0.6000000000000001 printed for gamma is 0.6 with floating-point rounding error.
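
The source of `find_best_parameter.py` is not shown here, so the following is only a minimal sketch of how such a timing-based search could look. The names `train_chain` and `find_best_parameters`, the toy 1-D chain environment, and the small parameter lists are all hypothetical stand-ins, not the author's code or environments.

```python
import itertools
import random
import time


def train_chain(alpha, gamma, epsilon, episodes=200, n_states=6, seed=0):
    """Tabular Q-learning on a toy 1-D chain (hypothetical stand-in environment)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:  # the last state is the goal
            # Epsilon-greedy selection with a fixed exploration rate.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def find_best_parameters(alphas, gammas, epsilons, train=train_chain):
    """Time one training run per combination and keep the fastest one."""
    best_params, best_time = None, float("inf")
    for alpha, gamma, epsilon in itertools.product(alphas, gammas, epsilons):
        start = time.perf_counter()
        train(alpha, gamma, epsilon)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_params, best_time = [alpha, gamma, epsilon], elapsed
    return [best_params, best_time]


result = find_best_parameters([0.05, 0.1], [0.4, 0.6], [0.4, 0.6])
print(result)
```

Wall-clock time for a single run is noisy, so in a sketch like this it would probably be more robust to repeat each combination several times and compare the averages.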

### QLearningAgent2 (with decay rate)
#### Learning Output & Q-Table

In [7]:
agent_coffee_2 = qa2.QLearningAgent(coffeegame.CoffeeEnv())
agent_coffee_2.init('red', 'coffee game')

env_2 = gym.make("FrozenLake-v0", is_slippery = False)
agent_frozen_2 = qa2.QLearningAgent(env_2)
agent_frozen_2.init('blue', 'frozen lake')


[S]CH
DHC
HHG
  (Right)
S[C]H
DHC
HHG
  (Down)
SCH
D[H]C
HHG
  (Right)
SCH
DH[C]
HHG
  (Down)
SCH
DHC
HH[G]
[[  0.752      -10.           2.92         0.752     ]
 [  0.752        3.2          3.2          2.92      ]
 [  2.92         7.           3.2          3.2       ]
 [  0.           0.           0.           0.        ]
 [-10.           5.           7.           2.92      ]
 [  3.2         10.           7.           3.2       ]
 [  1.38217909   0.96565957   4.99075813  -9.35389181]
 [  1.97684735   4.99898701  10.           3.1999286 ]
 [  0.           0.           0.           0.        ]]
total reward:  11
steps made:  [2, 1, 2, 1]

[S]FFF
FHFH
FFFH
HFFG
  (Down)
SFFF
[F]HFH
FFFH
HFFG
  (Down)
SFFF
FHFH
[F]FFH
HFFG
  (Right)
SFFF
FHFH
F[F]FH
HFFG
  (Right)
SFFF
FHFH
FF[F]H
HFFG
  (Down)
SFFF
FHFH
FFFH
HF[F]G
  (Right)
SFFF
FHFH
FFFH
HFF[G]
[[0.046656   0.07776    0.07776    0.046656  ]
 [0.0466
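
The visible first row of the FrozenLake Q-table can be sanity-checked by hand. The sketch below assumes a discount factor of 0.6 (the value selected by the parameter search above; that `QLearningAgent2` uses it is an assumption), gym's FrozenLake reward of 1 only on reaching G, and the action order Left, Down, Right, Up. The start state is 6 moves from the goal, so a move toward the goal should converge to 0.6^5, and a move into the wall (which leaves the agent in place) to 0.6^6.

```python
import math

gamma = 0.6        # assumed discount factor
steps_to_goal = 6  # shortest path from S to G on the 4x4 map

# Only the final move earns reward 1, discounted by the moves before it.
q_toward_goal = gamma ** (steps_to_goal - 1)
# Left/Up from the start hit the wall, so one extra move is wasted.
q_into_wall = gamma * q_toward_goal

# Compare with the printed row [0.046656, 0.07776, 0.07776, 0.046656].
assert math.isclose(q_toward_goal, 0.07776)
assert math.isclose(q_into_wall, 0.046656)
print(q_toward_goal, q_into_wall)
```

Both values match the printed row, which suggests the table has converged for the start state under these assumptions.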

#### Graph Result
![average reward with decay rate](decay_average_reward.png)

#### Analysis
- According to the graph, the average reward (computed over each block of 1000 episodes) increases during training in both the coffee game (red) and FrozenLake (blue).
- The increase is more visible for the coffee game than for FrozenLake because the maximum reward in the coffee game is 11, while the maximum reward in FrozenLake is 1.
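
The plotting code is not shown, so the following is only a sketch of how a per-block average like the one in the graph could be computed. The function name `block_average` and the optional `max_reward` normalization are hypothetical; dividing by the maximum reward (11 and 1 here) is one way to put both curves on a comparable 0 to 1 scale.

```python
import numpy as np


def block_average(rewards, block_size=1000, max_reward=None):
    """Average episode rewards over consecutive blocks of block_size episodes."""
    rewards = np.asarray(rewards, dtype=float)
    n_blocks = len(rewards) // block_size
    # Drop a trailing partial block so every average covers block_size episodes.
    trimmed = rewards[: n_blocks * block_size]
    averages = trimmed.reshape(n_blocks, block_size).mean(axis=1)
    if max_reward is not None:
        averages = averages / max_reward  # optional rescaling
    return averages


# Tiny example with a block size of 2 instead of 1000.
example = block_average([0, 0, 1, 1, 1, 1, 1], block_size=2)
print(example)
```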

### QLearningAgentTaxi
#### Learning Output & Q-Table

In [12]:
taxi_game = gym.make("Taxi-v3")
agent_taxi = qat.QLearningAgent(taxi_game)
agent_taxi.init()

+---------+
|[34;1mR[0m: | : :G|
| : | : : |
| : :[43m [0m: : |
| | : | : |
|[35mY[0m| : |B: |
+---------+

+---------+
|[34;1mR[0m: | : :G|
| : | : : |
| :[43m [0m: : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (West)
+---------+
|[34;1mR[0m: | : :G|
| : | : : |
|[43m [0m: : : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (West)
+---------+
|[34;1mR[0m: | : :G|
|[43m [0m: | : : |
| : : : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (North)
+---------+
|[34;1m[43mR[0m[0m: | : :G|
| : | : : |
| : : : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (North)
+---------+
|[42mR[0m: | : :G|
| : | : : |
| : : : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (Pickup)
+---------+
|R: | : :G|
|[42m_[0m: | : : |
| : : : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (South)
+---------+
|R: | : :G|
| : | : : |
|[42m_[0m: : : : |
| | : | : |
|[35mY[0m| : |B: |
+---------+
  (South)
+---------+
|R: | : :G|
| : | : : |
| : : : : |
|[42m_[0m

#### Graph Result
![average_reward_taxi](decay_average_reward_taxi.png)

#### Analysis
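- The Taxi game is nondeterministic in that each episode starts from a randomly chosen configuration of taxi position, passenger location, and destination, so the agent cannot rely on a single memorized path. That the agent still learns the game suggests that Q-Learning can cope with this kind of uncertainty.
- The average reward during learning increases as the exploration rate decays.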
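
The decay schedule used by `QLearningAgentTaxi` is not shown. As an illustration of the idea, here is one common choice, an exponential decay from a high exploration rate toward a small floor. The function name `decayed_epsilon` and all default values are assumptions.

```python
import math


def decayed_epsilon(episode, eps_max=1.0, eps_min=0.01, decay_rate=0.001):
    """Exponentially decay the exploration rate from eps_max toward eps_min."""
    return eps_min + (eps_max - eps_min) * math.exp(-decay_rate * episode)


# Exploration rate at a few points during training.
schedule = [decayed_epsilon(ep) for ep in (0, 1000, 5000, 10000)]
print([round(eps, 3) for eps in schedule])
```

With a schedule like this, early episodes are mostly random exploration and later episodes mostly follow the learned Q-table, which is consistent with the average reward rising as the exploration rate falls.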