# Chapter 3: Deep Value-Based Reinforcement Learning

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## 1. *DQN*
Implement DQN from Stable Baselines on Breakout from Gym. Turn off Dueling and Priorities. Find out the values of $\alpha$, the learning rate, and of $\varepsilon$, the exploration rate, what kind of neural network architecture is used, what the replay buffer size is, and how frequently the target network is updated.
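Three of the quantities asked about (replay buffer size, uniform versus prioritized sampling, and target-network update frequency) meet in the DQN update step. The following is a minimal NumPy sketch, not the Stable-Baselines3 implementation: a fixed-capacity replay buffer with uniform sampling (i.e. with priorities turned off), the TD target computed from a separate target network, and a hard copy of the online weights into the target network every `target_update_interval` steps. The names `ReplayBuffer`, `td_targets` and `maybe_sync_target` are hypothetical, and Q-values are assumed to arrive as arrays of shape `(batch, n_actions)`.

```python
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity experience replay with uniform (non-prioritized) sampling."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts the oldest transition once full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Every stored transition is equally likely to be drawn (no priorities).
        idx = rng.choice(len(self.storage), size=batch_size, replace=False)
        batch = [self.storage[int(i)] for i in idx]
        # Regroup the list of transitions into one array per field.
        return tuple(np.array(column) for column in zip(*batch))

    def __len__(self):
        return len(self.storage)


def td_targets(rewards, dones, next_q_target, gamma=0.99):
    """y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    next_q_target holds the *target* network's Q-values for the next states.
    """
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * not_done * next_q_target.max(axis=1)


def maybe_sync_target(step, online_params, target_params, target_update_interval):
    """Hard update: copy the online weights into the target network every C steps."""
    if step % target_update_interval == 0:
        return {name: w.copy() for name, w in online_params.items()}
    return target_params
```

Keeping the target network frozen between syncs is what stabilizes the regression target $y$; the buffer size controls how far back in time the uniformly sampled minibatches reach.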

In [None]:
!python breakout.py

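According to the documentation of the Stable-Baselines3 DQN implementation ([here](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html)), the default parameters are listed below. Note that Stable-Baselines3 DQN implements only vanilla DQN, without Dueling networks or Prioritized Experience Replay, so there is nothing to turn off.
- $\alpha$ (`learning_rate`): 0.0001
- $\varepsilon$: not a single constant; it is annealed linearly from `exploration_initial_eps` = 1.0 to `exploration_final_eps` = 0.05 over the first `exploration_fraction` = 0.1 (10%) of the training steps, and then held at 0.05
- Neural network architecture: `CnnPolicy` ([here](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#stable_baselines3.dqn.CnnPolicy)), which uses the NatureCNN feature extractor (three convolutional layers and a 512-unit fully connected layer, all with ReLU activations) followed by a linear layer that outputs one Q-value per action
- Replay buffer size (`buffer_size`): 1000000 transitions
- Target network update frequency (`target_update_interval`): every 10000 environment steps; the value 4 is `train_freq`, i.e. the online network is trained every 4 steps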
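In Stable-Baselines3, $\varepsilon$ is annealed over training rather than fixed, so it helps to see the schedule concretely. The sketch below assumes a linear decay with the documented defaults (`exploration_initial_eps=1.0`, `exploration_final_eps=0.05`, `exploration_fraction=0.1`); `linear_epsilon` and `epsilon_greedy` are hypothetical helpers written for illustration, not SB3 functions.

```python
import numpy as np


def linear_epsilon(step, total_timesteps, initial_eps=1.0, final_eps=0.05, fraction=0.1):
    """Anneal epsilon linearly from initial_eps to final_eps over the first
    `fraction` of training, then hold it at final_eps."""
    progress = min(step / (fraction * total_timesteps), 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


def epsilon_greedy(q_values, eps, rng):
    """With probability eps pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

For a 1,000,000-step run, $\varepsilon$ is 1.0 at step 0, about 0.525 at step 50,000, and stays at 0.05 from step 100,000 onward, so most of the training is spent acting almost greedily.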

## 2. *Hyperparameters*
Change all those hyperparameters up and down, and note the effect on training speed and on the training outcome: how good is the result? How sensitive is performance to hyperparameter optimization?
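One systematic way to move each hyperparameter "up and down" is a one-at-a-time sweep around a baseline. This is a minimal sketch, assuming the SB3 DQN defaults as the baseline and a factor of 10 in each direction; the `DEFAULTS` dictionary and the `one_at_a_time_sweep` helper are hypothetical and only generate configurations, they do not train anything.

```python
import itertools

# Assumed baseline: Stable-Baselines3 DQN default values.
DEFAULTS = {
    "learning_rate": 1e-4,
    "exploration_final_eps": 0.05,
    "buffer_size": 1_000_000,
    "target_update_interval": 10_000,
}


def one_at_a_time_sweep(defaults, factors=(0.1, 10.0)):
    """Vary one hyperparameter at a time by each factor, keeping the others at default."""
    configs = []
    for name, factor in itertools.product(defaults, factors):
        cfg = dict(defaults)
        value = defaults[name] * factor
        # Integer-valued settings (buffer size, update interval) must stay integers.
        cfg[name] = int(value) if isinstance(defaults[name], int) else value
        configs.append((name, factor, cfg))
    return configs
```

Each resulting `cfg` could then be passed as keyword arguments to the `DQN` constructor, one run per configuration. A one-at-a-time sweep needs only 2 runs per hyperparameter (8 here) instead of the 81 of a full three-level grid, at the cost of missing interactions between hyperparameters. Note that a 10x larger replay buffer of Atari frames may not fit in memory.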

In [None]:
!python breakout.py -lr 1e-5 -show False
!python breakout.py -lr 1e-3 -show False

In [None]:
!tensorboard --logdir ./tensorboard/

# Remaining exercises (skipped)

## 3. *Cloud*
Use different computers, experiment with GPU versions to speed up training, consider Colab, AWS, or another cloud provider with fast GPU (or TPU) machines.

## 4. *Gym*
Go to Gym and try different problems. For what kinds of problems does DQN work, and what are the characteristics of problems for which it works less well?

## 5. *Stable Baselines*
Go to Stable Baselines and implement different agent algorithms. Try Dueling algorithms and Prioritized Experience Replay, but also other algorithms, such as actor-critic or policy-based methods. (These algorithms will be explained in the next chapter.) Note their performance.

## 6. *TensorBoard*
With TensorBoard you can follow the training process as it progresses. TensorBoard reads the log files written during training. Try TensorBoard on a Keras exercise and follow different training indicators. Also try TensorBoard on Stable Baselines and see which indicators you can follow.

## 7. *Checkpointing*
Long training runs in Keras need checkpointing, to save valuable computation in case of a hardware or software failure. Create a large training job and set up checkpointing. Test everything by interrupting the training, and try to reload the saved checkpoint to restart the training where it left off.
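The interrupt-and-resume test can be rehearsed without a real model. Keras offers the `tf.keras.callbacks.ModelCheckpoint` callback for this; the sketch below is instead a framework-agnostic stand-in using NumPy, where the "model" is a single hypothetical weight array and one "gradient update" adds 1.0 to it. The helpers `save_checkpoint`, `load_checkpoint` and `train`, and the `stop_at` argument used to simulate a crash, are all hypothetical.

```python
import os
import tempfile

import numpy as np


def save_checkpoint(path, step, weights):
    """Write the step counter and weights to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".npz")
    with os.fdopen(fd, "wb") as f:
        np.savez(f, step=step, **weights)
    # os.replace is atomic, so a crash never leaves a half-written checkpoint at `path`.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (step, weights) from a checkpoint, or (0, None) if none exists yet."""
    if not os.path.exists(path):
        return 0, None
    with np.load(path) as data:
        step = int(data["step"])
        weights = {k: data[k].copy() for k in data.files if k != "step"}
    return step, weights


def train(path, total_steps, checkpoint_every, stop_at=None):
    """Resume from the latest checkpoint (if any) and train until total_steps."""
    step, weights = load_checkpoint(path)
    if weights is None:
        weights = {"w": np.zeros(3)}  # fresh start
    while step < total_steps:
        if stop_at is not None and step == stop_at:
            return step  # simulate an interruption (crash, preemption, Ctrl-C)
        weights["w"] = weights["w"] + 1.0  # stand-in for one gradient update
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, weights)
    return step
```

If a run with `checkpoint_every=10` is interrupted at step 25, the restarted run resumes from step 20, so only the 5 steps after the last checkpoint are recomputed. The checkpoint interval therefore trades disk I/O against the amount of work that can be lost.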