# Deep Reinforcement Learning for OpenAI's "CartPole-v0"

In this notebook, we run the experiments, tune the hyperparameters and visualize the results.

The agent classes are in the Python module agents.DQNforCartpole. 
We will first import the DQN agent and perform a number of experiments with it. 

For logging and visualization, the files logz.py and plot.py are used. They have been 
taken from UC Berkeley's course on deep reinforcement learning, homework 2, available here: https://github.com/berkeleydeeprlcourse/homework/tree/master/hw2 

In [1]:
from agents.DQNforCartpole import DQNforCartpole
from environments import Environments
import os, time
from util.plotting import plot_result
import pickle
import json

Using TensorFlow backend.


First, we specify the environment to use. At the moment this choice is trivial, since only one environment has been
implemented: the cartpole.

In [2]:
# create the cartpole environment
env = Environments.importCartpole()


Importing environment CartPole-v0
----------------------------------
CartPole-v0's action space:       Discrete(2)
CartPole-v0's observation space:  Box(4,)

For the cartpole environment, an observation consists of four values:
obs[0]: the horizontal position of the cart (0.0 = center)
obs[1]: the velocity of the cart (0.0 = standing still)
obs[2]: the angle of the pole (0.0 = vertical)
obs[3]: the angular velocity of the pole (0.0 = not rotating)
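Indexing a raw 4-vector as obs[0] ... obs[3] is easy to get wrong. A minimal sketch of a named view of the observation, following the field order listed above; the class `CartPoleObservation` and its `from_array` helper are hypothetical and not part of the environments module.

```python
from typing import NamedTuple, Sequence


class CartPoleObservation(NamedTuple):
    """Named view of a CartPole-v0 observation vector (hypothetical helper)."""
    cart_position: float          # obs[0], 0.0 = center
    cart_velocity: float          # obs[1], 0.0 = standing still
    pole_angle: float             # obs[2], 0.0 = vertical
    pole_angular_velocity: float  # obs[3], 0.0 = not rotating

    @classmethod
    def from_array(cls, obs: Sequence[float]) -> "CartPoleObservation":
        # reject vectors that do not match the Box(4,) observation space
        if len(obs) != 4:
            raise ValueError(f"expected 4 values, got {len(obs)}")
        return cls(*(float(x) for x in obs))
```

Because it is a tuple, a `CartPoleObservation` can still be passed anywhere the network expects a plain sequence of four floats.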


Next, we instantiate a deep Q-learning agent. The agent is based on Mnih et al. (2013): it uses experience replay but,
unlike the agent in Mnih et al. (2015), no target network. For the hyperparameters we largely follow Mnih et al., except
for the replay memory capacity, the neural network architecture and the replay start size. The cartpole observation is
much lower-dimensional than the visual input of the Atari games, so we can get away with a much simpler function
approximator than the CNN used by DeepMind.
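The two ingredients named above can be sketched in a few lines: a fixed-capacity replay memory that only starts serving minibatches once `replay_start_size` transitions are stored, and TD targets computed with the same network that is being trained (no target network). This is a minimal sketch under those assumptions, not the implementation in agents.DQNforCartpole; `ReplayMemory` and `q_targets` are hypothetical names, and `q_values` stands in for a forward pass of the Q-network.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


class ReplayMemory:
    """Fixed-capacity FIFO buffer with uniform sampling (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        # a deque with maxlen evicts the oldest transition once it is full
        self.buffer: deque = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def __len__(self) -> int:
        return len(self.buffer)

    def ready(self, replay_start_size: int) -> bool:
        # learning only starts once enough transitions have been collected
        return len(self.buffer) >= replay_start_size

    def sample(self, batch_size: int) -> List[Transition]:
        # uniform sampling without replacement breaks up temporal correlations
        return random.sample(list(self.buffer), batch_size)


def q_targets(batch: List[Transition],
              q_values: Callable[[Sequence[float]], List[float]],
              discount_rate: float) -> List[float]:
    """TD targets y = r + gamma * max_a' Q(s', a'), computed with the SAME
    network that is being trained, i.e. without a separate target network."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            # no bootstrapping from a terminal state
            targets.append(reward)
        else:
            targets.append(reward + discount_rate * max(q_values(next_state)))
    return targets
```

With a target network (Mnih et al., 2015), `q_values` in `q_targets` would instead be a periodically copied, frozen version of the network.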

In [3]:
# benchmark model: hyperparameters similar to Mnih et al. (2015)
dqn1 = DQNforCartpole(environment=env,
                      learning_rate=0.00025,
                      discount_rate=0.99,
                      exploration_rate=1.0,
                      exploration_rate_min=0.1,
                      exploration_rate_decay=0.999,
                      replay_memory_capacity=10000, 
                      replay_sampling_batch_size=32,
                      nn_architecture=[10],
                      replay_start_size=32,
                      exp_name="dqn1"
                      )

Initializing DQN agent...
 .... dimension of state space: 4
 .... dimension of action space: 2
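The logged Epsilon values further down (0.999 after episode 1, 0.983 after episode 17, 0.908 after episode 96) appear consistent with epsilon being multiplied by `exploration_rate_decay` once per episode, which differs from the linear annealing over the first million frames in Mnih et al. (2015). Assuming that multiplicative schedule, the sketch below computes epsilon per episode and shows an epsilon-greedy action choice; all function names are hypothetical.

```python
import math
import random
from typing import Optional, Sequence

_DEFAULT_RNG = random.Random()


def decayed_epsilon(episode: int, start: float = 1.0, minimum: float = 0.1,
                    decay: float = 0.999) -> float:
    """Epsilon after `episode` multiplicative decay steps, floored at `minimum`."""
    return max(minimum, start * decay ** episode)


def steps_until_minimum(start: float = 1.0, minimum: float = 0.1,
                        decay: float = 0.999) -> int:
    """Smallest n such that start * decay**n <= minimum."""
    return math.ceil(math.log(minimum / start) / math.log(decay))


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Random action with probability epsilon, otherwise the greedy action."""
    generator = rng if rng is not None else _DEFAULT_RNG
    if generator.random() < epsilon:
        return generator.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

If this reading of the schedule is right, `steps_until_minimum()` gives 2302 episodes before epsilon reaches 0.1, whereas after the 100 episodes of a trial epsilon is still about 0.999**100 ≈ 0.905, so the agent is still acting randomly most of the time.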


Next, we do some hyperparameter tuning. First, we double the number of hidden layers, from one hidden layer of 10 units to two.

In [4]:
dqn2 = DQNforCartpole(environment=env,
                      learning_rate=0.00025,
                      discount_rate=0.99,
                      exploration_rate=1.0,
                      exploration_rate_min=0.1,
                      exploration_rate_decay=0.999,
                      replay_memory_capacity=10000, 
                      replay_sampling_batch_size=32,
                      nn_architecture=[10, 10],
                      replay_start_size=32,
                      exp_name="dqn2"
                      )

Initializing DQN agent...
 .... dimension of state space: 4
 .... dimension of action space: 2
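To compare the candidate architectures, it helps to count their trainable parameters. A minimal sketch, assuming that `nn_architecture` lists the widths of fully connected hidden layers (with biases) between the 4-dimensional state input and a 2-unit output layer with one Q-value per action; `count_parameters` is a hypothetical helper.

```python
from typing import Sequence


def count_parameters(n_inputs: int, hidden: Sequence[int], n_outputs: int) -> int:
    """Weights + biases of a fully connected net: inputs -> hidden... -> outputs."""
    sizes = [n_inputs, *hidden, n_outputs]
    # each layer has a fan_in x fan_out weight matrix plus fan_out biases
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


for arch in ([10], [10, 10], [20]):
    print(arch, count_parameters(4, arch, 2))
# [10] 72
# [10, 10] 182
# [20] 142
```

Under these assumptions, doubling the depth (dqn2) adds more parameters than doubling the width of the single hidden layer (dqn3).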


Second, we keep a single hidden layer but double its width from 10 to 20 units.

In [5]:
dqn3 = DQNforCartpole(environment=env,
                      learning_rate=0.00025,
                      discount_rate=0.99,
                      exploration_rate=1.0,
                      exploration_rate_min=0.1,
                      exploration_rate_decay=0.999,
                      replay_memory_capacity=10000, 
                      replay_sampling_batch_size=32,
                      nn_architecture=[20],
                      replay_start_size=32,
                      exp_name="dqn3"
                      )

Initializing DQN agent...
 .... dimension of state space: 4
 .... dimension of action space: 2


In [13]:
allDQNs = [dqn2, dqn3]

# set to True to skip the simulations and only work with previously saved results
bResults = False

# dict mapping each experiment name to the directory holding its logs
target = 'data/logdirs.p'
os.makedirs('data', exist_ok=True)
dict_of_logdirs = dict()
try:
    if os.path.getsize(target) > 0:
        with open(target, "rb") as handle:
            dict_of_logdirs = pickle.load(handle)
        print("Loading dictionary of logdirs")
except (OSError, EOFError, pickle.UnpicklingError):
    print("Creating empty dict")

if not bResults:
    for dqn in allDQNs:
        # make a timestamped directory for this experiment
        logdir = "DQN" + '-cartpole' + '_' + time.strftime("%d-%m-%Y_%H-%M-%S")
        logdir = os.path.join('data', logdir)
        os.makedirs(logdir, exist_ok=True)

        # save the logdir of the current experiment for visualization later on
        dict_of_logdirs[dqn.exp_name] = logdir

        # run the experiment
        dqn.run_numberOfTrials_experiments(
            numberOfTrials=2,
            numberOfEpisodesForEachTrial=100,
            logdir=logdir
        )

    # save dict_of_logdirs to disk
    with open(target, 'wb') as handle:
        pickle.dump(dict_of_logdirs, handle)

Loading dictionary of logdirs
Logging data to data/DQN-cartpole_29-01-2018_15-20-40/trial_1/log.txt
Starting new trial
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |               1 |
|                    Time |           0.304 |
|           AverageReturn |            23.2 |
|               StdReturn |            12.6 |
|               MaxReturn |              58 |
|               MinReturn |              10 |
|                 Epsilon |           0.999 |
|               EpLenMean |            23.2 |
|                EpLenStd |            12.6 |
| AvgScoresFor100Episodes |            22.2 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |               2 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              17 |
|                    Time |            1.33 |
|           AverageReturn |              22 |
|               StdReturn |            12.5 |
|               MaxReturn |              59 |
|               MinReturn |              10 |
|                 Epsilon |           0.983 |
|               EpLenMean |              22 |
|                EpLenStd |            12.5 |
| AvgScoresFor100Episodes |              21 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              18 |
|                    Time |            1.39 |
|           AverageReturn |            21.8 |
|               StdReturn |            12.5 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              33 |
|                    Time |            2.35 |
|           AverageReturn |            22.2 |
|               StdReturn |            12.2 |
|               MaxReturn |              59 |
|               MinReturn |               8 |
|                 Epsilon |           0.968 |
|               EpLenMean |            22.2 |
|                EpLenStd |            12.2 |
| AvgScoresFor100Episodes |            21.2 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              34 |
|                    Time |            2.41 |
|           AverageReturn |            22.2 |
|               StdReturn |            12.1 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              49 |
|                    Time |            3.37 |
|           AverageReturn |            22.3 |
|               StdReturn |            12.7 |
|               MaxReturn |              67 |
|               MinReturn |               8 |
|                 Epsilon |           0.952 |
|               EpLenMean |            22.3 |
|                EpLenStd |            12.7 |
| AvgScoresFor100Episodes |            21.3 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              50 |
|                    Time |            3.43 |
|           AverageReturn |            22.2 |
|               StdReturn |            12.7 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              64 |
|                    Time |            4.37 |
|           AverageReturn |            22.3 |
|               StdReturn |            13.1 |
|               MaxReturn |              67 |
|               MinReturn |               8 |
|                 Epsilon |           0.938 |
|               EpLenMean |            22.3 |
|                EpLenStd |            13.1 |
| AvgScoresFor100Episodes |            21.3 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              65 |
|                    Time |            4.43 |
|           AverageReturn |            22.3 |
|               StdReturn |              13 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              80 |
|                    Time |            5.39 |
|           AverageReturn |            22.5 |
|               StdReturn |            12.9 |
|               MaxReturn |              67 |
|               MinReturn |               8 |
|                 Epsilon |           0.923 |
|               EpLenMean |            22.5 |
|                EpLenStd |            12.9 |
| AvgScoresFor100Episodes |            21.7 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              81 |
|                    Time |            5.45 |
|           AverageReturn |            22.4 |
|               StdReturn |            12.9 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              96 |
|                    Time |             6.4 |
|           AverageReturn |              22 |
|               StdReturn |            12.4 |
|               MaxReturn |              67 |
|               MinReturn |               8 |
|                 Epsilon |           0.908 |
|               EpLenMean |              22 |
|                EpLenStd |            12.4 |
| AvgScoresFor100Episodes |            21.2 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              97 |
|                    Time |            6.47 |
|           AverageReturn |            21.9 |
|               StdReturn |            12.4 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |               9 |
|                    Time |           0.829 |
|           AverageReturn |            20.6 |
|               StdReturn |              11 |
|               MaxReturn |              49 |
|               MinReturn |               9 |
|                 Epsilon |           0.991 |
|               EpLenMean |            20.6 |
|                EpLenStd |              11 |
| AvgScoresFor100Episodes |            19.6 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              10 |
|                    Time |           0.892 |
|           AverageReturn |            20.6 |
|               StdReturn |            10.9 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              25 |
|                    Time |            1.84 |
|           AverageReturn |            21.5 |
|               StdReturn |            14.7 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.975 |
|               EpLenMean |            21.5 |
|                EpLenStd |            14.7 |
| AvgScoresFor100Episodes |            20.5 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              26 |
|                    Time |            1.91 |
|           AverageReturn |            21.4 |
|               StdReturn |            14.6 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              41 |
|                    Time |            2.86 |
|           AverageReturn |            20.7 |
|               StdReturn |            13.4 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |            0.96 |
|               EpLenMean |            20.7 |
|                EpLenStd |            13.4 |
| AvgScoresFor100Episodes |            19.7 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              42 |
|                    Time |            2.93 |
|           AverageReturn |            20.6 |
|               StdReturn |            13.4 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              57 |
|                    Time |            3.88 |
|           AverageReturn |              21 |
|               StdReturn |            12.7 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.945 |
|               EpLenMean |              21 |
|                EpLenStd |            12.7 |
| AvgScoresFor100Episodes |              20 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              58 |
|                    Time |            3.95 |
|           AverageReturn |              21 |
|               StdReturn |            12.6 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              73 |
|                    Time |            4.91 |
|           AverageReturn |            21.4 |
|               StdReturn |            12.9 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |            0.93 |
|               EpLenMean |            21.4 |
|                EpLenStd |            12.9 |
| AvgScoresFor100Episodes |            20.6 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              74 |
|                    Time |            4.97 |
|           AverageReturn |            21.3 |
|               StdReturn |            12.9 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              89 |
|                    Time |            5.95 |
|           AverageReturn |            21.3 |
|               StdReturn |            12.4 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.915 |
|               EpLenMean |            21.3 |
|                EpLenStd |            12.4 |
| AvgScoresFor100Episodes |            19.8 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              90 |
|                    Time |            6.01 |
|           AverageReturn |            21.3 |
|               StdReturn |            12.3 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |               5 |
|                    Time |           0.591 |
|           AverageReturn |              23 |
|               StdReturn |            11.1 |
|               MaxReturn |              56 |
|               MinReturn |              11 |
|                 Epsilon |           0.995 |
|               EpLenMean |              23 |
|                EpLenStd |            11.1 |
| AvgScoresFor100Episodes |              22 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |               6 |
|                    Time |            0.65 |
|           AverageReturn |            22.9 |
|               StdReturn |              11 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              21 |
|                    Time |            1.52 |
|           AverageReturn |            23.5 |
|               StdReturn |            11.2 |
|               MaxReturn |              56 |
|               MinReturn |              11 |
|                 Epsilon |           0.979 |
|               EpLenMean |            23.5 |
|                EpLenStd |            11.2 |
| AvgScoresFor100Episodes |            22.5 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              22 |
|                    Time |            1.58 |
|           AverageReturn |            24.1 |
|               StdReturn |              12 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              37 |
|                    Time |            2.47 |
|           AverageReturn |              23 |
|               StdReturn |            11.4 |
|               MaxReturn |              58 |
|               MinReturn |              10 |
|                 Epsilon |           0.964 |
|               EpLenMean |              23 |
|                EpLenStd |            11.4 |
| AvgScoresFor100Episodes |              22 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              38 |
|                    Time |            2.53 |
|           AverageReturn |            22.9 |
|               StdReturn |            11.4 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              53 |
|                    Time |             3.4 |
|           AverageReturn |            22.6 |
|               StdReturn |            10.9 |
|               MaxReturn |              58 |
|               MinReturn |              10 |
|                 Epsilon |           0.948 |
|               EpLenMean |            22.6 |
|                EpLenStd |            10.9 |
| AvgScoresFor100Episodes |            21.6 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              54 |
|                    Time |            3.46 |
|           AverageReturn |            22.5 |
|               StdReturn |            10.8 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              69 |
|                    Time |            4.36 |
|           AverageReturn |            23.1 |
|               StdReturn |              11 |
|               MaxReturn |              58 |
|               MinReturn |              10 |
|                 Epsilon |           0.933 |
|               EpLenMean |            23.1 |
|                EpLenStd |              11 |
| AvgScoresFor100Episodes |              22 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              70 |
|                    Time |            4.42 |
|           AverageReturn |            23.4 |
|               StdReturn |            11.3 |
...

---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              85 |
|                    Time |            5.33 |
|           AverageReturn |            22.7 |
|               StdReturn |            11.1 |
|               MaxReturn |              58 |
|               MinReturn |              10 |
|                 Epsilon |           0.918 |
|               EpLenMean |            22.7 |
|                EpLenStd |            11.1 |
| AvgScoresFor100Episodes |            21.8 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               1 |
|                    seed |               1 |
|                 Episode |              86 |
|                    Time |            5.39 |
|           AverageReturn |            22.7 |
|               StdReturn |              11 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |               1 |
|                    Time |           0.338 |
|           AverageReturn |            23.9 |
|               StdReturn |            11.3 |
|               MaxReturn |              51 |
|               MinReturn |              10 |
|                 Epsilon |           0.999 |
|               EpLenMean |            23.9 |
|                EpLenStd |            11.3 |
| AvgScoresFor100Episodes |            22.9 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |               2 |
|                    Time |           0.396 |
|           AverageReturn |              24 |
|               StdReturn |            11.2 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              17 |
|                    Time |            1.28 |
|           AverageReturn |            25.6 |
|               StdReturn |            15.3 |
|               MaxReturn |              95 |
|               MinReturn |              10 |
|                 Epsilon |           0.983 |
|               EpLenMean |            25.6 |
|                EpLenStd |            15.3 |
| AvgScoresFor100Episodes |            24.6 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              18 |
|                    Time |            1.34 |
|           AverageReturn |            25.7 |
|               StdReturn |            15.2 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              33 |
|                    Time |             2.2 |
|           AverageReturn |            25.1 |
|               StdReturn |            14.6 |
|               MaxReturn |              95 |
|               MinReturn |              10 |
|                 Epsilon |           0.968 |
|               EpLenMean |            25.1 |
|                EpLenStd |            14.6 |
| AvgScoresFor100Episodes |            24.1 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              34 |
|                    Time |            2.26 |
|           AverageReturn |            24.9 |
|               StdReturn |            14.6 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              49 |
|                    Time |            3.16 |
|           AverageReturn |            23.6 |
|               StdReturn |              14 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.952 |
|               EpLenMean |            23.6 |
|                EpLenStd |              14 |
| AvgScoresFor100Episodes |            22.6 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              50 |
|                    Time |            3.22 |
|           AverageReturn |            23.5 |
|               StdReturn |            13.9 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              65 |
|                    Time |            4.11 |
|           AverageReturn |            23.4 |
|               StdReturn |            13.6 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.937 |
|               EpLenMean |            23.4 |
|                EpLenStd |            13.6 |
| AvgScoresFor100Episodes |            22.4 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              66 |
|                    Time |            4.17 |
|           AverageReturn |            23.3 |
|               StdReturn |            13.6 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              81 |
|                    Time |            5.06 |
|           AverageReturn |              23 |
|               StdReturn |            12.9 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.922 |
|               EpLenMean |              23 |
|                EpLenStd |            12.9 |
| AvgScoresFor100Episodes |            21.4 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              82 |
|                    Time |            5.12 |
|           AverageReturn |            23.1 |
|               StdReturn |            12.8 |
...

---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              97 |
|                    Time |            6.01 |
|           AverageReturn |            22.8 |
|               StdReturn |            12.8 |
|               MaxReturn |              95 |
|               MinReturn |               9 |
|                 Epsilon |           0.908 |
|               EpLenMean |            22.8 |
|                EpLenStd |            12.8 |
| AvgScoresFor100Episodes |            21.5 |
---------------------------------------------
---------------------------------------------
|                 TrialNo |               2 |
|                    seed |               2 |
|                 Episode |              98 |
|                    Time |            6.07 |
|           AverageReturn |            22.8 |
|               StdReturn |            12.8 |
...

In [15]:
# reload the mapping from experiment name to log directory
with open('data/logdirs.p', 'rb') as handle:
    dict_of_logdirs = pickle.load(handle)

In [8]:
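# plot the 100-episode average score for every experiment;
# assumes plot_result takes an experiment's log directory as its first argument
for exp_name, logdir in dict_of_logdirs.items():
    plot_result(logdir, 'AvgScoresFor100Episodes')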
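The per-trial log files (for example data/DQN-cartpole_29-01-2018_15-20-40/trial_1/log.txt above) can also be read directly, e.g. to inspect AvgScoresFor100Episodes numerically. A minimal sketch, assuming that logz.py writes, as the Berkeley version does, one tab-separated header row followed by one tab-separated row of numeric values per logged step; `read_logz_table` and `trial_logs` are hypothetical helpers.

```python
import csv
import os
from typing import Dict, List


def read_logz_table(path: str) -> Dict[str, List[float]]:
    """Parse a tab-separated log with one header row into column -> values."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns: Dict[str, List[float]] = {name: [] for name in reader.fieldnames or []}
        for row in reader:
            for name, value in row.items():
                # assumes every logged value is numeric
                columns[name].append(float(value))
    return columns


def trial_logs(logdir: str) -> List[str]:
    """Paths of trial_*/log.txt files below an experiment's logdir,
    in lexicographic order of the trial directory names."""
    return sorted(
        os.path.join(logdir, entry, "log.txt")
        for entry in os.listdir(logdir)
        if entry.startswith("trial_")
        and os.path.isfile(os.path.join(logdir, entry, "log.txt"))
    )
```

In the notebook, one would loop over `dict_of_logdirs.items()`, call `trial_logs(logdir)` for each experiment and read the `AvgScoresFor100Episodes` column from every file it returns.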