# $Q$-learning on cartpole
Below, we run the $Q$-learning algorithm on the cartpole example. 

## Importing libraries
First, we import the required libraries. If an import fails, you may have forgotten to switch to the right kernel. See [Prepare a virtual environment](Preparation.ipynb).

In [1]:
import numpy as np
import tensorflow as tf
import datetime as dt
import warnings
warnings.filterwarnings('ignore')
from policy_iteration import PI
from dynamics import CartPole

## Saving directories
Next, we set up some paths to write data and capture some videos for future investigation.

In [2]:
STORE_PATH = '/tmp/cartpole_exp1/Q'
# Compute the timestamp once so that both paths share the same suffix
timestamp = dt.datetime.now().strftime('%d%m%Y%H%M')
data_path = STORE_PATH + f"/data_{timestamp}"
agent_path = STORE_PATH + f"/agent_{timestamp}"
train_writer = tf.summary.create_file_writer(data_path)

## Making the environment
We select the random seed and make the cartpole environment.


In [3]:
Rand_Seed = 1
env_par = {
    'Rand_Seed': Rand_Seed,
    'STORE_PATH': STORE_PATH,
    'monitor': False,
    'threshold': 195.0
}
CP = CartPole(env_par)


## Making the $Q$-learning agent
We make the $Q$-learning agent. You can change the following hyperparameters if you like.
* `hidden_size`: Number of nodes in the hidden layers.
* `GAMMA`: Discount (forgetting) factor applied to future rewards. It should be in $[0, 1]$.
* `num_episodes`: Maximum number of episodes to run.
* `epsilon`: Probability of exploration. It should be in $(0, 1)$.
* `learning_rate_adam`: Learning rate for the Adam optimizer.
* `adam_eps`: The epsilon parameter of the Adam optimizer.
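
To make the roles of `GAMMA` and `epsilon` concrete, here is a minimal sketch of a discounted return and an epsilon-greedy action choice. The helper names `discounted_return` and `epsilon_greedy` are hypothetical and are not part of the `policy_iteration` module. The agent used in this notebook may implement these steps differently.

```python
import numpy as np


def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t (hypothetical helper)."""
    total = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (hypothetical helper)."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random
        return int(rng.integers(len(q_values)))
    # Exploitation: pick the action with the largest estimated Q-value
    return int(np.argmax(q_values))


rng = np.random.default_rng(1)
print(discounted_return([1.0, 1.0, 1.0], gamma=1.0))  # 3.0
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
print(epsilon_greedy(np.array([0.1, 0.9]), epsilon=0.0, rng=rng))  # 1
```

With `GAMMA = 1.0`, as in the cell below, the return is simply the sum of the rewards in the episode. With `epsilon = 0.1`, roughly one action in ten would be random under this scheme.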

In [4]:
agent_par = {
    'num_state': CP.env.observation_space.shape[0],
    'num_actions': CP.env.action_space.n,
    'Rand_Seed': Rand_Seed,
    'hidden_size': 30,
    'GAMMA': 1.0,
    'num_episodes': 5000,
    'epsilon': 0.1, 
    'learning_rate_adam': 0.001,
    'adam_eps': 0.1,
}
policy = PI(agent_par)


## Start learning
Now, we start the learning loop. The loop iterates for at most `num_episodes` episodes. In each iteration:
* The agent drives the environment for one episode. This is called a rollout.
* We update the agent by $Q$-learning using the recorded data.
* We check if the problem is solved.
* We write the data.

At the end of the learning loop, we close the environment.
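
The update and the stopping check can be sketched as follows. This is only an illustration of the standard $Q$-learning target $y = r + \gamma \max_{a'} Q(s', a')$ and of a 100-episode average test. The internals of `policy.update_network` are not shown in this notebook, so the function names `q_learning_targets` and `is_solved` and the array layout are assumptions.

```python
import numpy as np


def q_learning_targets(rewards, next_q_values, dones, gamma):
    """Standard Q-learning targets for a batch of transitions (hypothetical helper).

    next_q_values has shape (batch, num_actions) and holds Q(s', a') for every action.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float64), axis=1)
    # No bootstrapping from terminal states: the (1 - done) factor zeroes the future term
    return rewards + gamma * (1.0 - dones) * max_next_q


def is_solved(total_rewards, threshold, window=100):
    """True if the mean of the last `window` episode rewards exceeds the threshold."""
    if len(total_rewards) < window:
        return False
    return float(np.mean(total_rewards[-window:])) > threshold


rewards = [1.0, 1.0, 1.0]
next_q = [[2.0, 3.0], [0.5, 0.1], [4.0, 4.0]]
dones = [False, False, True]
print(q_learning_targets(rewards, next_q, dones, gamma=1.0).tolist())  # [4.0, 1.5, 1.0]
print(is_solved([200.0] * 100, threshold=195.0))  # True
```

Note that `is_solved` averages the latest `window` rewards, which differs slightly from the slice `tot_rews[-101:-1]` used in the cell below.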

In [5]:
tot_rews = []
mean_100ep = 0
for episode in range(agent_par['num_episodes']):

    # Do one rollout
    states, actions, rewards, new_states, dones = CP.one_rollout(policy)

    # Update the agent
    loss = policy.update_network(states, actions, rewards, new_states, dones)

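    # Check if the problem is solved, using the mean reward over a window of
    # 100 previously recorded episodes (the current episode is appended below)
    if episode > 100:
        mean_100ep = np.mean(tot_rews[-101:-1])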

    tot_reward = sum(rewards)
    tot_rews.append(tot_reward)
    print(f"Episode: {episode}, Reward: {tot_reward}, Mean of 100 cons episodes: {mean_100ep}")
    if mean_100ep > env_par['threshold']:
        print("Problem is solved.")
        policy.network.save(agent_path)
        break

    # Save data
    with train_writer.as_default():
        tf.summary.scalar('reward', tot_reward, step=episode)

# Close the environment
CP.env.close()

# Print the summary of the solution
if mean_100ep > env_par['threshold']:
    print(f"\n\nProblem is solved after {episode} Episode with the mean reward {mean_100ep} over the last 100 episodes")


Episode: 0, Reward: 11.0, Mean of 100 cons episodes: 0
Episode: 1, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 2, Reward: 16.0, Mean of 100 cons episodes: 0
Episode: 3, Reward: 8.0, Mean of 100 cons episodes: 0
Episode: 4, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 5, Reward: 9.0, Mean of 100 cons episodes: 0
Episode: 6, Reward: 9.0, Mean of 100 cons episodes: 0
Episode: 7, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 8, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 9, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 10, Reward: 11.0, Mean of 100 cons episodes: 0
Episode: 11, Reward: 8.0, Mean of 100 cons episodes: 0
Episode: 12, Reward: 13.0, Mean of 100 cons episodes: 0
Episode: 13, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 14, Reward: 8.0, Mean of 100 cons episodes: 0
Episode: 15, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 16, Reward: 10.0, Mean of 100 cons episodes: 0
Episode: 17, Reward: 10.0, Mean of 100 cons episodes: 0
...

Episode: 144, Reward: 9.0, Mean of 100 cons episodes: 9.78
Episode: 145, Reward: 8.0, Mean of 100 cons episodes: 9.79
Episode: 146, Reward: 9.0, Mean of 100 cons episodes: 9.78
Episode: 147, Reward: 8.0, Mean of 100 cons episodes: 9.76
Episode: 148, Reward: 16.0, Mean of 100 cons episodes: 9.75
Episode: 149, Reward: 9.0, Mean of 100 cons episodes: 9.75
Episode: 150, Reward: 8.0, Mean of 100 cons episodes: 9.79
Episode: 151, Reward: 10.0, Mean of 100 cons episodes: 9.79
Episode: 152, Reward: 10.0, Mean of 100 cons episodes: 9.78
Episode: 153, Reward: 10.0, Mean of 100 cons episodes: 9.8
Episode: 154, Reward: 9.0, Mean of 100 cons episodes: 9.76
Episode: 155, Reward: 10.0, Mean of 100 cons episodes: 9.76
Episode: 156, Reward: 12.0, Mean of 100 cons episodes: 9.77
Episode: 157, Reward: 9.0, Mean of 100 cons episodes: 9.77
Episode: 158, Reward: 10.0, Mean of 100 cons episodes: 9.78
Episode: 159, Reward: 8.0, Mean of 100 cons episodes: 9.76
Episode: 160, Reward: 12.0, Mean of 100 cons episo

Episode: 285, Reward: 8.0, Mean of 100 cons episodes: 9.78
Episode: 286, Reward: 9.0, Mean of 100 cons episodes: 9.78
Episode: 287, Reward: 10.0, Mean of 100 cons episodes: 9.76
Episode: 288, Reward: 8.0, Mean of 100 cons episodes: 9.75
Episode: 289, Reward: 10.0, Mean of 100 cons episodes: 9.75
Episode: 290, Reward: 11.0, Mean of 100 cons episodes: 9.75
Episode: 291, Reward: 10.0, Mean of 100 cons episodes: 9.74
Episode: 292, Reward: 9.0, Mean of 100 cons episodes: 9.76
Episode: 293, Reward: 12.0, Mean of 100 cons episodes: 9.76
Episode: 294, Reward: 9.0, Mean of 100 cons episodes: 9.76
Episode: 295, Reward: 10.0, Mean of 100 cons episodes: 9.77
Episode: 296, Reward: 8.0, Mean of 100 cons episodes: 9.76
Episode: 297, Reward: 10.0, Mean of 100 cons episodes: 9.76
Episode: 298, Reward: 10.0, Mean of 100 cons episodes: 9.74
Episode: 299, Reward: 9.0, Mean of 100 cons episodes: 9.75
Episode: 300, Reward: 11.0, Mean of 100 cons episodes: 9.75
Episode: 301, Reward: 11.0, Mean of 100 cons ep

Episode: 423, Reward: 8.0, Mean of 100 cons episodes: 9.82
Episode: 424, Reward: 10.0, Mean of 100 cons episodes: 9.85
Episode: 425, Reward: 9.0, Mean of 100 cons episodes: 9.83
Episode: 426, Reward: 9.0, Mean of 100 cons episodes: 9.84
Episode: 427, Reward: 12.0, Mean of 100 cons episodes: 9.85
Episode: 428, Reward: 12.0, Mean of 100 cons episodes: 9.85
Episode: 429, Reward: 10.0, Mean of 100 cons episodes: 9.88
Episode: 430, Reward: 12.0, Mean of 100 cons episodes: 9.89
Episode: 431, Reward: 10.0, Mean of 100 cons episodes: 9.89
Episode: 432, Reward: 11.0, Mean of 100 cons episodes: 9.92
Episode: 433, Reward: 10.0, Mean of 100 cons episodes: 9.92
Episode: 434, Reward: 8.0, Mean of 100 cons episodes: 9.94
Episode: 435, Reward: 10.0, Mean of 100 cons episodes: 9.94
Episode: 436, Reward: 10.0, Mean of 100 cons episodes: 9.87
Episode: 437, Reward: 8.0, Mean of 100 cons episodes: 9.88
Episode: 438, Reward: 9.0, Mean of 100 cons episodes: 9.89
Episode: 439, Reward: 9.0, Mean of 100 cons ep

Episode: 559, Reward: 10.0, Mean of 100 cons episodes: 10.89
Episode: 560, Reward: 10.0, Mean of 100 cons episodes: 10.9
Episode: 561, Reward: 10.0, Mean of 100 cons episodes: 10.89
Episode: 562, Reward: 9.0, Mean of 100 cons episodes: 10.87
Episode: 563, Reward: 13.0, Mean of 100 cons episodes: 10.88
Episode: 564, Reward: 11.0, Mean of 100 cons episodes: 10.85
Episode: 565, Reward: 11.0, Mean of 100 cons episodes: 10.88
Episode: 566, Reward: 14.0, Mean of 100 cons episodes: 10.87
Episode: 567, Reward: 17.0, Mean of 100 cons episodes: 10.86
Episode: 568, Reward: 13.0, Mean of 100 cons episodes: 10.88
Episode: 569, Reward: 13.0, Mean of 100 cons episodes: 10.94
Episode: 570, Reward: 14.0, Mean of 100 cons episodes: 10.94
Episode: 571, Reward: 17.0, Mean of 100 cons episodes: 10.96
Episode: 572, Reward: 17.0, Mean of 100 cons episodes: 11.0
Episode: 573, Reward: 19.0, Mean of 100 cons episodes: 11.06
Episode: 574, Reward: 17.0, Mean of 100 cons episodes: 11.12
Episode: 575, Reward: 20.0,

Episode: 695, Reward: 15.0, Mean of 100 cons episodes: 23.15
Episode: 696, Reward: 15.0, Mean of 100 cons episodes: 23.04
Episode: 697, Reward: 16.0, Mean of 100 cons episodes: 23.02
Episode: 698, Reward: 18.0, Mean of 100 cons episodes: 22.9
Episode: 699, Reward: 18.0, Mean of 100 cons episodes: 22.8
Episode: 700, Reward: 18.0, Mean of 100 cons episodes: 22.7
Episode: 701, Reward: 18.0, Mean of 100 cons episodes: 22.6
Episode: 702, Reward: 22.0, Mean of 100 cons episodes: 22.51
Episode: 703, Reward: 21.0, Mean of 100 cons episodes: 22.43
Episode: 704, Reward: 22.0, Mean of 100 cons episodes: 22.35
Episode: 705, Reward: 22.0, Mean of 100 cons episodes: 22.21
Episode: 706, Reward: 18.0, Mean of 100 cons episodes: 22.13
Episode: 707, Reward: 25.0, Mean of 100 cons episodes: 22.07
Episode: 708, Reward: 27.0, Mean of 100 cons episodes: 21.93
Episode: 709, Reward: 18.0, Mean of 100 cons episodes: 21.92
Episode: 710, Reward: 21.0, Mean of 100 cons episodes: 21.88
Episode: 711, Reward: 19.0, 

Episode: 828, Reward: 58.0, Mean of 100 cons episodes: 107.77
Episode: 829, Reward: 172.0, Mean of 100 cons episodes: 107.99
Episode: 830, Reward: 46.0, Mean of 100 cons episodes: 108.14
Episode: 831, Reward: 53.0, Mean of 100 cons episodes: 109.46
Episode: 832, Reward: 57.0, Mean of 100 cons episodes: 109.1
Episode: 833, Reward: 59.0, Mean of 100 cons episodes: 109.08
Episode: 834, Reward: 87.0, Mean of 100 cons episodes: 109.09
Episode: 835, Reward: 53.0, Mean of 100 cons episodes: 109.17
Episode: 836, Reward: 60.0, Mean of 100 cons episodes: 109.26
Episode: 837, Reward: 67.0, Mean of 100 cons episodes: 109.17
Episode: 838, Reward: 70.0, Mean of 100 cons episodes: 109.18
Episode: 839, Reward: 52.0, Mean of 100 cons episodes: 109.26
Episode: 840, Reward: 200.0, Mean of 100 cons episodes: 109.13
Episode: 841, Reward: 127.0, Mean of 100 cons episodes: 108.82
Episode: 842, Reward: 87.0, Mean of 100 cons episodes: 110.15
Episode: 843, Reward: 69.0, Mean of 100 cons episodes: 110.65
...

Episode: 961, Reward: 39.0, Mean of 100 cons episodes: 30.47
Episode: 962, Reward: 42.0, Mean of 100 cons episodes: 30.32
Episode: 963, Reward: 60.0, Mean of 100 cons episodes: 30.18
Episode: 964, Reward: 53.0, Mean of 100 cons episodes: 29.96
Episode: 965, Reward: 54.0, Mean of 100 cons episodes: 29.96
Episode: 966, Reward: 55.0, Mean of 100 cons episodes: 30.09
Episode: 967, Reward: 140.0, Mean of 100 cons episodes: 30.09
Episode: 968, Reward: 65.0, Mean of 100 cons episodes: 30.26
Episode: 969, Reward: 64.0, Mean of 100 cons episodes: 30.9
Episode: 970, Reward: 44.0, Mean of 100 cons episodes: 31.18
Episode: 971, Reward: 134.0, Mean of 100 cons episodes: 31.38
Episode: 972, Reward: 52.0, Mean of 100 cons episodes: 31.38
Episode: 973, Reward: 51.0, Mean of 100 cons episodes: 32.43
Episode: 974, Reward: 59.0, Mean of 100 cons episodes: 32.71
Episode: 975, Reward: 92.0, Mean of 100 cons episodes: 32.95
Episode: 976, Reward: 52.0, Mean of 100 cons episodes: 33.37
Episode: 977, Reward: 7

Episode: 1098, Reward: 19.0, Mean of 100 cons episodes: 27.34
Episode: 1099, Reward: 23.0, Mean of 100 cons episodes: 27.1
Episode: 1100, Reward: 25.0, Mean of 100 cons episodes: 26.85
Episode: 1101, Reward: 25.0, Mean of 100 cons episodes: 26.73
Episode: 1102, Reward: 22.0, Mean of 100 cons episodes: 26.53
Episode: 1103, Reward: 25.0, Mean of 100 cons episodes: 26.54
Episode: 1104, Reward: 23.0, Mean of 100 cons episodes: 26.39
Episode: 1105, Reward: 21.0, Mean of 100 cons episodes: 26.34
Episode: 1106, Reward: 24.0, Mean of 100 cons episodes: 26.28
Episode: 1107, Reward: 22.0, Mean of 100 cons episodes: 26.23
Episode: 1108, Reward: 20.0, Mean of 100 cons episodes: 26.17
Episode: 1109, Reward: 21.0, Mean of 100 cons episodes: 26.11
Episode: 1110, Reward: 19.0, Mean of 100 cons episodes: 26.03
Episode: 1111, Reward: 27.0, Mean of 100 cons episodes: 25.94
Episode: 1112, Reward: 23.0, Mean of 100 cons episodes: 25.94
Episode: 1113, Reward: 23.0, Mean of 100 cons episodes: 25.99
...

Episode: 1231, Reward: 41.0, Mean of 100 cons episodes: 28.98
Episode: 1232, Reward: 58.0, Mean of 100 cons episodes: 29.16
Episode: 1233, Reward: 78.0, Mean of 100 cons episodes: 29.37
Episode: 1234, Reward: 53.0, Mean of 100 cons episodes: 29.71
Episode: 1235, Reward: 37.0, Mean of 100 cons episodes: 30.26
Episode: 1236, Reward: 57.0, Mean of 100 cons episodes: 30.5
Episode: 1237, Reward: 42.0, Mean of 100 cons episodes: 30.61
Episode: 1238, Reward: 43.0, Mean of 100 cons episodes: 30.92
Episode: 1239, Reward: 46.0, Mean of 100 cons episodes: 31.03
Episode: 1240, Reward: 44.0, Mean of 100 cons episodes: 31.19
Episode: 1241, Reward: 53.0, Mean of 100 cons episodes: 31.4
Episode: 1242, Reward: 45.0, Mean of 100 cons episodes: 31.57
Episode: 1243, Reward: 42.0, Mean of 100 cons episodes: 31.83
Episode: 1244, Reward: 45.0, Mean of 100 cons episodes: 32.05
Episode: 1245, Reward: 34.0, Mean of 100 cons episodes: 32.08
Episode: 1246, Reward: 32.0, Mean of 100 cons episodes: 32.23
...

Episode: 1362, Reward: 42.0, Mean of 100 cons episodes: 75.52
Episode: 1363, Reward: 46.0, Mean of 100 cons episodes: 75.71
Episode: 1364, Reward: 35.0, Mean of 100 cons episodes: 75.92
Episode: 1365, Reward: 42.0, Mean of 100 cons episodes: 76.23
Episode: 1366, Reward: 42.0, Mean of 100 cons episodes: 76.4
Episode: 1367, Reward: 45.0, Mean of 100 cons episodes: 76.63
Episode: 1368, Reward: 27.0, Mean of 100 cons episodes: 76.83
Episode: 1369, Reward: 24.0, Mean of 100 cons episodes: 77.07
Episode: 1370, Reward: 34.0, Mean of 100 cons episodes: 77.16
Episode: 1371, Reward: 24.0, Mean of 100 cons episodes: 77.26
Episode: 1372, Reward: 30.0, Mean of 100 cons episodes: 77.46
Episode: 1373, Reward: 28.0, Mean of 100 cons episodes: 77.54
Episode: 1374, Reward: 26.0, Mean of 100 cons episodes: 77.66
Episode: 1375, Reward: 26.0, Mean of 100 cons episodes: 77.78
Episode: 1376, Reward: 24.0, Mean of 100 cons episodes: 77.85
Episode: 1377, Reward: 26.0, Mean of 100 cons episodes: 77.96
...

Episode: 1496, Reward: 70.0, Mean of 100 cons episodes: 71.38
Episode: 1497, Reward: 69.0, Mean of 100 cons episodes: 71.72
Episode: 1498, Reward: 85.0, Mean of 100 cons episodes: 71.97
Episode: 1499, Reward: 59.0, Mean of 100 cons episodes: 72.16
Episode: 1500, Reward: 72.0, Mean of 100 cons episodes: 72.33
Episode: 1501, Reward: 71.0, Mean of 100 cons episodes: 72.41
Episode: 1502, Reward: 77.0, Mean of 100 cons episodes: 72.67
Episode: 1503, Reward: 83.0, Mean of 100 cons episodes: 72.71
Episode: 1504, Reward: 84.0, Mean of 100 cons episodes: 72.93
Episode: 1505, Reward: 82.0, Mean of 100 cons episodes: 72.91
Episode: 1506, Reward: 78.0, Mean of 100 cons episodes: 72.8
Episode: 1507, Reward: 87.0, Mean of 100 cons episodes: 72.92
Episode: 1508, Reward: 70.0, Mean of 100 cons episodes: 72.76
Episode: 1509, Reward: 89.0, Mean of 100 cons episodes: 72.84
Episode: 1510, Reward: 70.0, Mean of 100 cons episodes: 72.66
Episode: 1511, Reward: 89.0, Mean of 100 cons episodes: 72.36
...

Episode: 1630, Reward: 33.0, Mean of 100 cons episodes: 70.67
Episode: 1631, Reward: 103.0, Mean of 100 cons episodes: 71.08
Episode: 1632, Reward: 26.0, Mean of 100 cons episodes: 70.74
Episode: 1633, Reward: 89.0, Mean of 100 cons episodes: 71.19
Episode: 1634, Reward: 27.0, Mean of 100 cons episodes: 70.89
Episode: 1635, Reward: 94.0, Mean of 100 cons episodes: 71.17
Episode: 1636, Reward: 27.0, Mean of 100 cons episodes: 70.85
Episode: 1637, Reward: 91.0, Mean of 100 cons episodes: 71.17
Episode: 1638, Reward: 98.0, Mean of 100 cons episodes: 70.7
Episode: 1639, Reward: 100.0, Mean of 100 cons episodes: 71.13
Episode: 1640, Reward: 100.0, Mean of 100 cons episodes: 71.6
Episode: 1641, Reward: 102.0, Mean of 100 cons episodes: 72.13
Episode: 1642, Reward: 100.0, Mean of 100 cons episodes: 72.55
Episode: 1643, Reward: 93.0, Mean of 100 cons episodes: 72.98
Episode: 1644, Reward: 90.0, Mean of 100 cons episodes: 73.35
Episode: 1645, Reward: 98.0, Mean of 100 cons episodes: 73.79
...

Episode: 1762, Reward: 59.0, Mean of 100 cons episodes: 90.56
Episode: 1763, Reward: 62.0, Mean of 100 cons episodes: 90.19
Episode: 1764, Reward: 62.0, Mean of 100 cons episodes: 90.03
Episode: 1765, Reward: 62.0, Mean of 100 cons episodes: 89.74
Episode: 1766, Reward: 73.0, Mean of 100 cons episodes: 89.43
Episode: 1767, Reward: 61.0, Mean of 100 cons episodes: 88.94
Episode: 1768, Reward: 74.0, Mean of 100 cons episodes: 88.78
Episode: 1769, Reward: 69.0, Mean of 100 cons episodes: 88.55
Episode: 1770, Reward: 51.0, Mean of 100 cons episodes: 88.45
Episode: 1771, Reward: 50.0, Mean of 100 cons episodes: 88.32
Episode: 1772, Reward: 36.0, Mean of 100 cons episodes: 87.8
Episode: 1773, Reward: 54.0, Mean of 100 cons episodes: 87.31
Episode: 1774, Reward: 57.0, Mean of 100 cons episodes: 86.8
Episode: 1775, Reward: 49.0, Mean of 100 cons episodes: 86.46
Episode: 1776, Reward: 59.0, Mean of 100 cons episodes: 86.16
Episode: 1777, Reward: 63.0, Mean of 100 cons episodes: 85.81
...

Episode: 1894, Reward: 126.0, Mean of 100 cons episodes: 86.41
Episode: 1895, Reward: 85.0, Mean of 100 cons episodes: 86.51
Episode: 1896, Reward: 93.0, Mean of 100 cons episodes: 87.09
Episode: 1897, Reward: 96.0, Mean of 100 cons episodes: 86.97
Episode: 1898, Reward: 99.0, Mean of 100 cons episodes: 87.23
Episode: 1899, Reward: 104.0, Mean of 100 cons episodes: 87.17
Episode: 1900, Reward: 105.0, Mean of 100 cons episodes: 87.23
Episode: 1901, Reward: 114.0, Mean of 100 cons episodes: 87.22
Episode: 1902, Reward: 102.0, Mean of 100 cons episodes: 87.61
Episode: 1903, Reward: 95.0, Mean of 100 cons episodes: 87.87
Episode: 1904, Reward: 137.0, Mean of 100 cons episodes: 88.12
Episode: 1905, Reward: 112.0, Mean of 100 cons episodes: 88.28
Episode: 1906, Reward: 106.0, Mean of 100 cons episodes: 88.68
Episode: 1907, Reward: 63.0, Mean of 100 cons episodes: 88.81
Episode: 1908, Reward: 75.0, Mean of 100 cons episodes: 89.16
Episode: 1909, Reward: 114.0, Mean of 100 cons episodes: 88.82

Episode: 2026, Reward: 101.0, Mean of 100 cons episodes: 98.7
Episode: 2027, Reward: 96.0, Mean of 100 cons episodes: 98.7
Episode: 2028, Reward: 105.0, Mean of 100 cons episodes: 98.91
Episode: 2029, Reward: 102.0, Mean of 100 cons episodes: 98.96
Episode: 2030, Reward: 100.0, Mean of 100 cons episodes: 99.53
Episode: 2031, Reward: 99.0, Mean of 100 cons episodes: 99.61
Episode: 2032, Reward: 122.0, Mean of 100 cons episodes: 99.84
Episode: 2033, Reward: 110.0, Mean of 100 cons episodes: 99.83
Episode: 2034, Reward: 107.0, Mean of 100 cons episodes: 100.07
Episode: 2035, Reward: 103.0, Mean of 100 cons episodes: 100.49
Episode: 2036, Reward: 102.0, Mean of 100 cons episodes: 100.5
Episode: 2037, Reward: 99.0, Mean of 100 cons episodes: 100.63
Episode: 2038, Reward: 115.0, Mean of 100 cons episodes: 100.71
Episode: 2039, Reward: 90.0, Mean of 100 cons episodes: 100.77
Episode: 2040, Reward: 128.0, Mean of 100 cons episodes: 100.83
Episode: 2041, Reward: 100.0, Mean of 100 cons episodes

Episode: 2154, Reward: 118.0, Mean of 100 cons episodes: 108.83
Episode: 2155, Reward: 116.0, Mean of 100 cons episodes: 108.94
Episode: 2156, Reward: 113.0, Mean of 100 cons episodes: 109.13
Episode: 2157, Reward: 121.0, Mean of 100 cons episodes: 109.17
Episode: 2158, Reward: 134.0, Mean of 100 cons episodes: 109.21
Episode: 2159, Reward: 112.0, Mean of 100 cons episodes: 109.39
Episode: 2160, Reward: 111.0, Mean of 100 cons episodes: 109.52
Episode: 2161, Reward: 109.0, Mean of 100 cons episodes: 109.53
Episode: 2162, Reward: 111.0, Mean of 100 cons episodes: 109.63
Episode: 2163, Reward: 117.0, Mean of 100 cons episodes: 109.62
Episode: 2164, Reward: 119.0, Mean of 100 cons episodes: 109.58
Episode: 2165, Reward: 115.0, Mean of 100 cons episodes: 109.73
Episode: 2166, Reward: 125.0, Mean of 100 cons episodes: 109.81
Episode: 2167, Reward: 115.0, Mean of 100 cons episodes: 109.77
Episode: 2168, Reward: 111.0, Mean of 100 cons episodes: 110.0
Episode: 2169, Reward: 113.0, Mean of 100

Episode: 2282, Reward: 121.0, Mean of 100 cons episodes: 113.13
Episode: 2283, Reward: 115.0, Mean of 100 cons episodes: 113.2
Episode: 2284, Reward: 127.0, Mean of 100 cons episodes: 113.31
Episode: 2285, Reward: 100.0, Mean of 100 cons episodes: 113.46
Episode: 2286, Reward: 117.0, Mean of 100 cons episodes: 113.63
Episode: 2287, Reward: 112.0, Mean of 100 cons episodes: 113.64
Episode: 2288, Reward: 118.0, Mean of 100 cons episodes: 113.83
Episode: 2289, Reward: 114.0, Mean of 100 cons episodes: 113.77
Episode: 2290, Reward: 107.0, Mean of 100 cons episodes: 113.9
Episode: 2291, Reward: 101.0, Mean of 100 cons episodes: 113.92
Episode: 2292, Reward: 110.0, Mean of 100 cons episodes: 113.96
Episode: 2293, Reward: 93.0, Mean of 100 cons episodes: 113.72
Episode: 2294, Reward: 114.0, Mean of 100 cons episodes: 113.74
Episode: 2295, Reward: 93.0, Mean of 100 cons episodes: 113.65
Episode: 2296, Reward: 108.0, Mean of 100 cons episodes: 113.73
Episode: 2297, Reward: 83.0, Mean of 100 con

Episode: 2412, Reward: 123.0, Mean of 100 cons episodes: 113.34
Episode: 2413, Reward: 115.0, Mean of 100 cons episodes: 113.2
Episode: 2414, Reward: 121.0, Mean of 100 cons episodes: 113.61
Episode: 2415, Reward: 95.0, Mean of 100 cons episodes: 113.85
Episode: 2416, Reward: 120.0, Mean of 100 cons episodes: 114.08
Episode: 2417, Reward: 88.0, Mean of 100 cons episodes: 114.1
Episode: 2418, Reward: 137.0, Mean of 100 cons episodes: 114.31
Episode: 2419, Reward: 95.0, Mean of 100 cons episodes: 114.47
Episode: 2420, Reward: 94.0, Mean of 100 cons episodes: 115.14
Episode: 2421, Reward: 100.0, Mean of 100 cons episodes: 115.4
Episode: 2422, Reward: 94.0, Mean of 100 cons episodes: 115.52
Episode: 2423, Reward: 116.0, Mean of 100 cons episodes: 115.65
Episode: 2424, Reward: 110.0, Mean of 100 cons episodes: 115.69
Episode: 2425, Reward: 107.0, Mean of 100 cons episodes: 116.05
Episode: 2426, Reward: 101.0, Mean of 100 cons episodes: 116.37
Episode: 2427, Reward: 112.0, Mean of 100 cons e

Episode: 2542, Reward: 133.0, Mean of 100 cons episodes: 120.13
Episode: 2543, Reward: 111.0, Mean of 100 cons episodes: 119.62
Episode: 2544, Reward: 99.0, Mean of 100 cons episodes: 119.77
Episode: 2545, Reward: 156.0, Mean of 100 cons episodes: 119.71
Episode: 2546, Reward: 141.0, Mean of 100 cons episodes: 119.36
Episode: 2547, Reward: 97.0, Mean of 100 cons episodes: 119.73
Episode: 2548, Reward: 118.0, Mean of 100 cons episodes: 119.77
Episode: 2549, Reward: 107.0, Mean of 100 cons episodes: 119.46
Episode: 2550, Reward: 146.0, Mean of 100 cons episodes: 118.93
Episode: 2551, Reward: 124.0, Mean of 100 cons episodes: 119.01
Episode: 2552, Reward: 114.0, Mean of 100 cons episodes: 119.23
Episode: 2553, Reward: 107.0, Mean of 100 cons episodes: 119.36
Episode: 2554, Reward: 99.0, Mean of 100 cons episodes: 118.97
Episode: 2555, Reward: 106.0, Mean of 100 cons episodes: 118.84
Episode: 2556, Reward: 96.0, Mean of 100 cons episodes: 118.43
Episode: 2557, Reward: 112.0, Mean of 100 co

Episode: 2671, Reward: 147.0, Mean of 100 cons episodes: 134.95
Episode: 2672, Reward: 170.0, Mean of 100 cons episodes: 135.88
Episode: 2673, Reward: 162.0, Mean of 100 cons episodes: 135.97
Episode: 2674, Reward: 162.0, Mean of 100 cons episodes: 136.52
Episode: 2675, Reward: 159.0, Mean of 100 cons episodes: 136.99
Episode: 2676, Reward: 127.0, Mean of 100 cons episodes: 137.19
Episode: 2677, Reward: 146.0, Mean of 100 cons episodes: 137.55
Episode: 2678, Reward: 178.0, Mean of 100 cons episodes: 137.6
Episode: 2679, Reward: 133.0, Mean of 100 cons episodes: 137.42
Episode: 2680, Reward: 126.0, Mean of 100 cons episodes: 137.83
Episode: 2681, Reward: 188.0, Mean of 100 cons episodes: 138.1
Episode: 2682, Reward: 131.0, Mean of 100 cons episodes: 138.18
Episode: 2683, Reward: 162.0, Mean of 100 cons episodes: 139.01
Episode: 2684, Reward: 161.0, Mean of 100 cons episodes: 139.03
Episode: 2685, Reward: 173.0, Mean of 100 cons episodes: 139.53
Episode: 2686, Reward: 200.0, Mean of 100 

Episode: 2800, Reward: 200.0, Mean of 100 cons episodes: 184.02
Episode: 2801, Reward: 180.0, Mean of 100 cons episodes: 184.04
Episode: 2802, Reward: 200.0, Mean of 100 cons episodes: 184.44
Episode: 2803, Reward: 200.0, Mean of 100 cons episodes: 184.24
Episode: 2804, Reward: 200.0, Mean of 100 cons episodes: 184.24
Episode: 2805, Reward: 200.0, Mean of 100 cons episodes: 184.24
Episode: 2806, Reward: 200.0, Mean of 100 cons episodes: 184.24
Episode: 2807, Reward: 200.0, Mean of 100 cons episodes: 184.24
Episode: 2808, Reward: 176.0, Mean of 100 cons episodes: 184.24
Episode: 2809, Reward: 173.0, Mean of 100 cons episodes: 184.29
Episode: 2810, Reward: 168.0, Mean of 100 cons episodes: 184.31
Episode: 2811, Reward: 180.0, Mean of 100 cons episodes: 184.43
Episode: 2812, Reward: 200.0, Mean of 100 cons episodes: 184.58
Episode: 2813, Reward: 170.0, Mean of 100 cons episodes: 184.72
Episode: 2814, Reward: 157.0, Mean of 100 cons episodes: 185.11
Episode: 2815, Reward: 184.0, Mean of 10

## Results
It takes around 5-6 minutes to run the above cell. You will probably see some warnings or errors. Some of these are caused by incompatibilities between libraries. Don't panic. If you get the following at the end, the problem is solved successfully.