# CartPole-v0 example

This is a detailed tutorial. If you just want to run the example, go to the top-level library directory and run `julia cartpole_mail.jl` on the command line.

In [1]:
using Gym
include("dqn.jl")
using DQN

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library lib

## Loading the function for the default network for the CartPole-v0 example

This is a default network with just one hidden layer. The user can define a more complex network architecture, as long as the function still takes the required inputs and returns the required outputs described in `README.md`.

In [2]:
include("net1.jl") # createNetwork defined here

createNetwork (generic function with 2 methods)
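
To make "one hidden layer" concrete, here is a minimal sketch in plain Julia of what such a Q-network computes: a state vector goes in and one Q-value per action comes out. This is not the code in `net1.jl`, which builds a TensorFlow graph. The names `demo_relu` and `demo_q_values`, the hidden size of 16, and the random weights are all assumptions made for illustration.

```julia
# Hypothetical helper: rectified linear activation applied elementwise.
demo_relu(x) = max.(x, 0)

# Forward pass of a one-hidden-layer network: returns one Q-value per action.
function demo_q_values(s, W1, b1, W2, b2)
    h = demo_relu(W1 * s .+ b1)   # hidden-layer activations
    return W2 * h .+ b2           # linear output layer
end

# Assumed sizes: 4 state entries and 2 actions (CartPole-v0); 16 hidden units is arbitrary.
n_in, n_hid, n_out = 4, 16, 2
W1 = 0.1 * randn(n_hid, n_in)
b1 = zeros(n_hid)
W2 = 0.1 * randn(n_out, n_hid)
b2 = zeros(n_out)

s = [0.01, -0.02, 0.03, 0.04]            # an example state vector
q = demo_q_values(s, W1, b1, W2, b2)
greedy_action = findmax(q)[2]            # 1-based index of the largest Q-value
```

A deeper network would insert further hidden layers between `W1` and `W2`; the input and output sizes stay fixed by the environment.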

## Setting up the OpenAI gym environment and useful variables

The environment is needed by the `frame_step` function and must also be passed to `trainDQN` and `simulateDQN` for the monitoring capabilities to work.

In [3]:
env = GymEnvironment("CartPole-v0")
@show ACTIONS = n_actions(env)         # number of valid actions
@show STATE_SHAPE = obs_dimensions(env);

ACTIONS = n_actions(env) = 2
STATE_SHAPE = obs_dimensions(env) = (4,)


[2016-12-11 02:26:42,582] Making new env: CartPole-v0


## Setting up the function that interacts with the environment

Since the state for CartPole-v0 is just a vector with four entries, little preprocessing is needed here: `preprocess` returns its input unchanged and could be removed, but it is kept for consistency. The `frame_step` function wraps the OpenAI Gym step function and divides each reward by 200 so that rewards lie in $[-1,1]$. The `frame_step` function depends on the environment/model, but if the user is interacting with OpenAI Gym, little extra work is needed here.

In [4]:
preprocess(x, prev_state=nothing) = x

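function frame_step(action, prev_state)
    # advance the environment by one step
    x_t, r_t, is_terminal = step!(env, action)
    # render(env)
    s_t = preprocess(x_t, prev_state)
    # on a terminal step, reset the environment and return the new initial state
    s_0 = is_terminal ? preprocess(reset(env), nothing) : nothing
    # scale the reward by 1/200 before returning it
    s_t, r_t / 200.0, is_terminal, s_0
end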

frame_step (generic function with 1 method)
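
The effect of dividing by 200 can be checked with a little arithmetic. The sketch below assumes that CartPole-v0 pays a reward of +1 per time step and ends an episode after at most 200 steps, and that a logged episode reward is the sum of the scaled per-step rewards. The names `DEMO_MAX_STEPS`, `scaled_reward`, and `implied_steps` are hypothetical and are not part of `dqn.jl`.

```julia
# Assumption: CartPole-v0 gives +1 per step and caps an episode at 200 steps.
const DEMO_MAX_STEPS = 200

# Same scaling as in frame_step: divide the raw reward by 200.
scaled_reward(r) = r / DEMO_MAX_STEPS

# Number of steps implied by an episode reward,
# assuming that reward is the sum of scaled per-step rewards.
implied_steps(episode_reward) = round(Int, episode_reward * DEMO_MAX_STEPS)

println(scaled_reward(1.0))     # per-step reward after scaling
println(implied_steps(0.070))   # steps survived in an episode with reward 0.070
```

Under these assumptions, an episode that lasts the full 200 steps has a total scaled reward of 1.0, so episode rewards stay within the stated range.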

## Initializing the hyperparameters

See `dqn.jl` for descriptions of the hyperparameters included in the custom type `HyperParameters`. The fields of this type are the parameters that the user will need to tune to get better performance from the network.

In [5]:
hyper_params = HyperParameters(ACTIONS, STATE_SHAPE)

DQN.HyperParameters(3.0f-5,0.99f0,1000,4000.0f0,0.05f0,1.0f0,20000,32,1,7500,100000,2,(4,))

## Training the DQN

Running the function below trains the DQN and saves the weights, videos, and logs. It also produces very long output, which is why the maximum number of episodes is set to 100 here. To train the DQN fully, it is best to run `julia cartpole_mail.jl` on the command line.

In [6]:
hyper_params.max_num_episodes = 100
hyper_params.observe = 25
trainDQN(env, frame_step, createNetwork, hyper_params, "test")

[2016-12-11 02:26:49,586] Starting new video recorder writing to /home/carol/DQN.jl/test/videos/openaigym.video.0.14115.video000000.mp4
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.86
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.50GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)


Finished episode     0. Reward=   0.070


[2016-12-11 02:26:57,066] Starting new video recorder writing to /home/carol/DQN.jl/test/videos/openaigym.video.0.14115.video000001.mp4


starting training
Finished episode     1. Reward=   0.095
Finished episode     2. Reward=   0.085
Finished episode     3. Reward=   0.105
Finished episode     4. Reward=   0.085
Finished episode     5. Reward=   0.095
Finished episode     6. Reward=   0.155
Finished episode     7. Reward=   0.110


[2016-12-11 02:27:01,235] Starting new video recorder writing to /home/carol/DQN.jl/test/videos/openaigym.video.0.14115.video000008.mp4


Finished episode     8. Reward=   0.190
Finished episode     9. Reward=   0.120
Finished episode    10. Reward=   0.315
Finished episode    11. Reward=   0.065
Finished episode    12. Reward=   0.045
Finished episode    13. Reward=   0.090
Finished episode    14. Reward=   0.060
Finished episode    15. Reward=   0.075
Finished episode    16. Reward=   0.090
Finished episode    17. Reward=   0.100
Finished episode    18. Reward=   0.165
Finished episode    19. Reward=   0.120
Finished episode    20. Reward=   0.090
Finished episode    21. Reward=   0.335
Finished episode    22. Reward=   0.255
Finished episode    23. Reward=   0.115
Finished episode    24. Reward=   0.050
Finished episode    25. Reward=   0.080
Finished episode    26. Reward=   0.060


[2016-12-11 02:27:05,319] Starting new video recorder writing to /home/carol/DQN.jl/test/videos/openaigym.video.0.14115.video000027.mp4


Finished episode    27. Reward=   0.065
Finished episode    28. Reward=   0.115
Finished episode    29. Reward=   0.395
Finished episode    30. Reward=   0.060
Finished episode    31. Reward=   0.050
Finished episode    32. Reward=   0.065
Finished episode    33. Reward=   0.065
Finished episode    34. Reward=   0.045
Finished episode    35. Reward=   0.115
Finished episode    36. Reward=   0.160
Finished episode    37. Reward=   0.170
Finished episode    38. Reward=   0.125
Finished episode    39. Reward=   0.115
Finished episode    40. Reward=   0.190
Finished episode    41. Reward=   0.155
Finished episode    42. Reward=   0.100
Finished episode    43. Reward=   0.085
Finished episode    44. Reward=   0.110
Finished episode    45. Reward=   0.105
Finished episode    46. Reward=   0.100
Finished episode    47. Reward=   0.135
Finished episode    48. Reward=   0.170
Finished episode    49. Reward=   0.090
Finished episode    50. Reward=   0.085
Finished episode    51. Reward=   0.185


[2016-12-11 02:27:10,993] Starting new video recorder writing to /home/carol/DQN.jl/test/videos/openaigym.video.0.14115.video000064.mp4


Finished episode    64. Reward=   0.195
Finished episode    65. Reward=   0.275
Finished episode    66. Reward=   0.060
Finished episode    67. Reward=   0.125
Finished episode    68. Reward=   0.065
Finished episode    69. Reward=   0.050
Finished episode    70. Reward=   0.110
Finished episode    71. Reward=   0.055
Finished episode    72. Reward=   0.135
Finished episode    73. Reward=   0.170
Finished episode    74. Reward=   0.070
Finished episode    75. Reward=   0.375
Finished episode    76. Reward=   0.055
Finished episode    77. Reward=   0.090
Finished episode    78. Reward=   0.410
Finished episode    79. Reward=   0.160
Finished episode    80. Reward=   0.070
Finished episode    81. Reward=   0.090
Finished episode    82. Reward=   0.255
Finished episode    83. Reward=   0.425
Finished episode    84. Reward=   0.305
Finished episode    85. Reward=   0.045
Finished episode    86. Reward=   0.105
Finished episode    87. Reward=   0.110
Finished episode    88. Reward=   0.170


[2016-12-11 02:27:18,648] Finished writing results. You can upload them to the scoreboard via gym.upload('/home/carol/DQN.jl/test/videos')


## Simulating the DQN from saved weights

Training for just 100 episodes does not give good performance, so the function below loads pre-trained weights and simulates 2 episodes.

In [7]:
simulateDQN(env, frame_step, createNetwork, "test/saved_wgts/weights-2000", 2, hyper_params)

[2016-12-11 02:28:45,126] Creating monitor directory /tmp/dqn/monitor/exp_CartPole-v0_2016-12-11T02:28:45.105
[2016-12-11 02:28:45,127] Starting new video recorder writing to /tmp/dqn/monitor/exp_CartPole-v0_2016-12-11T02:28:45.105/openaigym.video.1.14115.video000000.mp4
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: You must feed a value for placeholder tensor 'placeholder_18' with dtype float
	 [[Node: placeholder_18 = Placeholder[_class=[], dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
W tensorflow/core/framework/op_kernel.cc:968] Invalid argument: You must feed a value for placeholder tensor 'placeholder_18' with dtype float
	 [[Node: placeholder_18 = Placeholder[_class=[], dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
W tensorflow/core/framewor

LoadError: LoadError: Tensorflow error: Status: You must feed a value for placeholder tensor 'placeholder_18' with dtype float
	 [[Node: placeholder_18 = Placeholder[_class=[], dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
	 [[Node: _recv_placeholder_12_0/_10 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_77__recv_placeholder_12_0", _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_placeholder_12_0)]]

while loading In[7], in expression starting on line 1