# Chapter 3: Deep Value-Based Reinforcement Learning

## 1. *DQN*
Implement DQN from Stable Baselines on Breakout from Gym. Turn off Dueling and Priorities. Find out the values of $\alpha$, the learning rate, and $\varepsilon$, the exploration rate, what kind of neural network architecture is used, what the replay buffer size is, and how frequently the target network is updated.

In [5]:
!python breakout.py

A.L.E: Arcade Learning Environment (version 0.7.4+069f8bd)
[Powered by Stella]
2022-05-07 04:23:16.851407: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib/python3.9/site-packages/cv2/../../lib64:
2022-05-07 04:23:16.851450: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
mean reward: 1.0, s.d. of reward: 2.0
  logger.warn(
^C
Traceback (most recent call last):
  File "/workspace/exercises/chapter03/mshibatatt/breakout.py", line 90, in <module>
    main(args.lr, args.ex, args.rb, args.fr, args.t, args.seed, args.hide, args.name, args.save)
  File "/workspace/exercises/chapter03/mshibatatt/breakout.py", line 29, in main
    env.render(mode = 'human')
  File "/opt/conda/lib/python3.9/site-packages/stable_baselines3/common/ve

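According to the Stable-Baselines3 DQN documentation ([here](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html)), the default parameters are:
- $\alpha$ (`learning_rate`): 0.0001
- $\varepsilon$: annealed linearly from 1.0 (`exploration_initial_eps`) to 0.05 (`exploration_final_eps`) over the first 10% of training (`exploration_fraction` = 0.1)
- Neural network architecture: a CNN feature extractor with ReLU activations ([here](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#stable_baselines3.dqn.CnnPolicy))
- Replay buffer size (`buffer_size`): 1000000
- Target network update: every 10000 environment steps (`target_update_interval`); the value 4 is `train_freq`, the number of steps between gradient updates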
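
To make the roles of these parameters concrete, here is a minimal pure-Python sketch of a linear exploration schedule and of the two update intervals. It assumes the parameter names and default values listed in the DQN documentation (`exploration_initial_eps=1.0`, `exploration_final_eps=0.05`, `exploration_fraction=0.1`, `train_freq=4`, `target_update_interval=10000`). The functions `epsilon_at` and `count_updates` are hypothetical helpers written for illustration; they simplify the library's training loop and are not part of Stable-Baselines3.

```python
def epsilon_at(step, total_steps, initial_eps=1.0, final_eps=0.05, fraction=0.1):
    """Anneal epsilon linearly over the first `fraction` of training, then hold it."""
    progress = step / total_steps  # 0.0 at the start, 1.0 at the end of training
    if progress >= fraction:
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / fraction


def count_updates(total_steps, train_freq=4, target_update_interval=10_000):
    """Count gradient updates and target-network syncs in a simplified loop."""
    gradient_updates = 0
    target_syncs = 0
    for step in range(1, total_steps + 1):
        if step % train_freq == 0:
            gradient_updates += 1  # one gradient step on a replay-buffer batch
        if step % target_update_interval == 0:
            target_syncs += 1  # copy the online network into the target network
    return gradient_updates, target_syncs


if __name__ == "__main__":
    for step in (0, 50_000, 100_000, 500_000):
        print(step, round(epsilon_at(step, total_steps=1_000_000), 3))
    print(count_updates(100_000))
```

Under these assumptions the target network is synchronized far less often than the online network is trained, which is why the two intervals should not be confused.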

## 2. *Hyperparameters*
Change each of those hyperparameters, up and down, and note the effect on training speed and on the training outcome: how good is the result? How sensitive is performance to hyperparameter optimization?

In [10]:
!python breakout.py -lr 1e-5 -hide -name alpha_down
!python breakout.py -lr 1e-3 -hide -name alpha_up
!python breakout.py -ex 0.3 -hide -name epsilon_up
!python breakout.py -ex 0.01 -hide -name epsilon_down
!python breakout.py -rb 1e+4 -hide -name replay_buffer_down
!python breakout.py -rb 1e+8 -hide -name replay_buffer_up
!python breakout.py -fr 8 -hide -name frequency_down
!python breakout.py -fr 2 -hide -name frequency_up
!python breakout.py -t 2500000 -hide -name long -save

A.L.E: Arcade Learning Environment (version 0.7.4+069f8bd)
[Powered by Stella]
2022-05-07 05:26:43.068670: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib/python3.9/site-packages/cv2/../../lib64:
2022-05-07 05:26:43.068712: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
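
The source of `breakout.py` is not shown here, so the following is only a sketch of how its command-line interface could be declared with `argparse`. The flag names are taken from the commands and from the `main(args.lr, args.ex, ...)` call visible in the traceback, plus the `-model` flag used for replay. The types, defaults, help texts, and the helper `int_from_sci` are assumptions made for illustration.

```python
import argparse


def int_from_sci(text):
    """Accept integers written in scientific notation, e.g. '1e+4' -> 10000."""
    return int(float(text))


def build_parser():
    parser = argparse.ArgumentParser(description="Train or replay DQN on Breakout")
    # Assumption: None means "keep the library default" for that hyperparameter.
    parser.add_argument("-lr", type=float, default=None, help="learning rate (alpha)")
    parser.add_argument("-ex", type=float, default=None, help="exploration rate (epsilon)")
    parser.add_argument("-rb", type=int_from_sci, default=None, help="replay buffer size")
    parser.add_argument("-fr", type=int, default=None, help="update interval in steps")
    # Placeholder default; the real script's default is unknown.
    parser.add_argument("-t", type=int_from_sci, default=1_000_000, help="training timesteps")
    parser.add_argument("-seed", type=int, default=0, help="random seed")
    parser.add_argument("-hide", action="store_true", help="do not render the game window")
    parser.add_argument("-name", type=str, default="default", help="run name used for logs")
    parser.add_argument("-save", action="store_true", help="save the trained model")
    parser.add_argument("-model", type=str, default=None, help="saved model to replay")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-t", "2500000", "-hide", "-name", "long", "-save"])
    print(args.t, args.hide, args.name, args.save)  # prints: 2500000 True long True
```

Parsing the buffer size through `float` first is what lets values such as `1e+4` be passed on the command line while still ending up as integers.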


In [1]:
!tensorboard --logdir ./tensorboard/

2022-05-07 09:28:01.188446: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-05-07 09:28:01.188499: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-05-07 09:28:03.030004: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-05-07 09:28:03.030053: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-07 09:28:03.030081: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-32-198.ap-northeast-1.compute.internal): /proc/driver/nvidia/version does not exist

NOTE: Using experimental fa

In [None]:
!python breakout.py -model ./long

# The following exercises are skipped

## 3. *Cloud*
Use different computers, experiment with GPU versions to speed up training, consider Colab, AWS, or another cloud provider with fast GPU (or TPU) machines.

## 4. *Gym*
Go to Gym and try different problems. For what kind of problems does DQN work, and what are the characteristics of problems for which it works less well?

## 5. *Stable Baselines*
Go to Stable Baselines and implement different agent algorithms. Try Dueling algorithms and Prioritized experience replay, but also other algorithms, such as Actor-critic or policy-based ones. (These algorithms will be explained in the next chapter.) Note their performance.

## 6. *TensorBoard*
With TensorBoard you can follow the training process as it progresses. TensorBoard works on log files. Try TensorBoard on a Keras exercise and follow different training indicators. Also try TensorBoard on Stable Baselines and see which indicators you can follow.

[Docs](https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html)
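
As a hedged sketch of the approach described in the linked guide: Stable-Baselines3 writes TensorBoard event files when the model is constructed with `tensorboard_log`, and `learn()` accepts a `tb_log_name` for the run. The wrapper function `train_with_tensorboard` and its arguments are hypothetical; `env` is assumed to be an image-based environment suitable for `CnnPolicy`.

```python
def train_with_tensorboard(env, run_name, total_timesteps, log_dir="./tensorboard/"):
    """Train DQN and write TensorBoard event files under a run folder in log_dir."""
    # Imported lazily; requires stable-baselines3 and tensorboard to be installed.
    from stable_baselines3 import DQN

    model = DQN("CnnPolicy", env, tensorboard_log=log_dir, verbose=1)
    # tb_log_name sets the run name shown in TensorBoard's run selector.
    model.learn(total_timesteps=total_timesteps, tb_log_name=run_name)
    return model
```

The logs can then be inspected with `tensorboard --logdir ./tensorboard/`. For DQN, the logged scalars typically include the exploration rate and the training loss; episode reward statistics are only available when the environment is wrapped so that episode returns are recorded.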

## 7. *Checkpointing*
Long training runs in Keras need checkpointing, to save valuable computations in case of a hardware or software failure. Create a large training job, and set up checkpointing. Test everything by interrupting the training, and try to reload the pre-trained checkpoint to restart the training where it left off.

[Docs](https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html?highlight=checkpoint#checkpointcallback)
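
A minimal sketch of checkpointing and resuming with Stable-Baselines3, based on the `CheckpointCallback` described in the linked guide. The helper names `latest_checkpoint` and `train_with_checkpoints`, the directory `./checkpoints/`, and the save frequency are assumptions for illustration. The file pattern `<prefix>_<steps>_steps.zip` is assumed to match the callback's naming and should be checked against the installed version.

```python
import re
from pathlib import Path


def latest_checkpoint(save_dir, prefix="rl_model"):
    """Return the checkpoint with the highest step count, or None if none exists.

    Assumes files are named '<prefix>_<steps>_steps.zip'.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)_steps\.zip$")
    best_path, best_steps = None, -1
    for path in Path(save_dir).glob("*.zip"):
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_steps:
            best_path, best_steps = path, int(match.group(1))
    return best_path


def train_with_checkpoints(env, total_timesteps, save_dir="./checkpoints/", save_freq=50_000):
    """Train DQN with periodic saves, resuming from the newest checkpoint if present."""
    # Imported lazily so latest_checkpoint() works without Stable-Baselines3 installed.
    from stable_baselines3 import DQN
    from stable_baselines3.common.callbacks import CheckpointCallback

    Path(save_dir).mkdir(parents=True, exist_ok=True)
    checkpoint = latest_checkpoint(save_dir)
    if checkpoint is None:
        model = DQN("CnnPolicy", env, verbose=1)  # fresh run
    else:
        model = DQN.load(checkpoint, env=env)  # resume from saved weights

    callback = CheckpointCallback(save_freq=save_freq, save_path=save_dir, name_prefix="rl_model")
    # reset_num_timesteps=False keeps the step counter of a loaded model.
    model.learn(total_timesteps=total_timesteps, callback=callback, reset_num_timesteps=False)
    return model
```

To test the setup, start a run, interrupt it after at least one checkpoint has been written, and call `train_with_checkpoints` again with the same `save_dir`. Note that the model file does not contain the replay buffer by default, so a resumed run starts with an empty buffer unless it is stored and restored separately, for example with `model.save_replay_buffer` and `model.load_replay_buffer`.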