Double Deep Q-Learning (DDQN)
van Hasselt, H., Guez, A., & Silver, D. (2016). Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.
Run with default arguments
./unstable_baselines/dqn/train.sh --rank 0 --seed 1 "BreakoutNoFrameskip-v4"
Run multiple environments with default arguments
./unstable_baselines/dqn/train.sh --rank 0 --seed 1 "BreakoutNoFrameskip-v4" "SeaquestNoFrameskip-v4"
Atari-like environment (Image observation + discrete action)
python -m unstable_baselines.dqn.run --rank 0 --seed 1 --logdir='./log/{env_id}/dqn/{rank}' \
    --logging='training.log' --monitor_dir='monitor' --tb_logdir='' --model_dir='model' \
    --env_id="BreakoutNoFrameskip-v4" --num_envs=8 --num_epochs=312500 \
    --num_steps=4 --num_gradsteps=1 --batch_size=256 --target_update=625 \
    --explore_rate=1.0 --explore_final=0.05 --explore_progress=0.1 \
    --huber --record_video
Total timesteps (samples) ≈ num_envs * num_steps * num_epochs (~10M in this case)
Number of times each sample is reused ≈ (batch_size / num_steps) * (num_gradsteps / num_envs) (~8 in this case)
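The two approximations above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the package: the helper name `sample_budget` is hypothetical. It assumes that each epoch collects `num_steps` transitions from each of the `num_envs` environments and then runs `num_gradsteps` updates on batches of `batch_size`.

```python
def sample_budget(num_envs, num_steps, num_epochs, batch_size, num_gradsteps):
    """Return (total_timesteps, reuse_per_sample) using the approximations above.

    Hypothetical helper for illustration only.
    """
    # New samples gathered over the whole run.
    total_timesteps = num_envs * num_steps * num_epochs
    # Assumed per-epoch accounting: batch_size * num_gradsteps samples are
    # drawn for updates while num_envs * num_steps new samples are collected.
    reuse = (batch_size * num_gradsteps) / (num_steps * num_envs)
    return total_timesteps, reuse


# Values from the command above.
total, reuse = sample_budget(
    num_envs=8, num_steps=4, num_epochs=312500,
    batch_size=256, num_gradsteps=1,
)
print(total, reuse)
```

With the default arguments this gives 10,000,000 timesteps and a reuse factor of 8, matching the figures quoted above.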
BeamRiderNoFrameskip-v4
BreakoutNoFrameskip-v4
PongNoFrameskip-v4
SeaquestNoFrameskip-v4
AsteroidsNoFrameskip-v4
EnduroNoFrameskip-v4
QbertNoFrameskip-v4
MsPacmanNoFrameskip-v4
Learning curve
| env_id | Max rewards | Mean rewards | Std rewards | Train samples | Train seeds | Eval episodes | Eval seed |
|---|---|---|---|---|---|---|---|
| AsteroidsNoFrameskip-v4 | 1530 | 667 | 265.68 | 10M | 1~8 | 20 | 0 |
| BeamRiderNoFrameskip-v4 | 10408 | 6806.6 | 1689.98 | 10M | 1~8 | 20 | 0 |
| BreakoutNoFrameskip-v4 | 385 | 364.45 | 31.98 | 10M | 1~8 | 20 | 0 |
| EnduroNoFrameskip-v4 | 1354 | 838.95 | 276.42 | 10M | 1~8 | 20 | 0 |
| MsPacmanNoFrameskip-v4 | 2700 | 2109.5 | 295.82 | 10M | 1~8 | 20 | 0 |
| PongNoFrameskip-v4 | 21 | 20.9 | 0.3 | 10M | 1~8 | 20 | 0 |
| QbertNoFrameskip-v4 | 11450 | 9575 | 1633.19 | 10M | 1~8 | 20 | 0 |
| SeaquestNoFrameskip-v4 | 11660 | 9434 | 2410.74 | 10M | 1~8 | 20 | 0 |
M = million (1e6)
| env_id | AsteroidsNoFrameskip-v4 | BeamRiderNoFrameskip-v4 | BreakoutNoFrameskip-v4 | EnduroNoFrameskip-v4 | MsPacmanNoFrameskip-v4 | PongNoFrameskip-v4 | QbertNoFrameskip-v4 | SeaquestNoFrameskip-v4 |
|---|---|---|---|---|---|---|---|---|
| num_envs | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| num_epochs | 312500 | 312500 | 312500 | 312500 | 312500 | 312500 | 312500 | 312500 |
| num_steps | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| num_gradsteps | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| batch_size | 256 | 256 | 256 | 256 | 256 | 256 | 256 | 256 |
| target_update | 625 | 625 | 625 | 625 | 625 | 625 | 625 | 625 |
| exploration | Linear(1.0, 0.05) | Linear(1.0, 0.05) | Linear(1.0, 0.05) | Linear(1.0, 0.05) | Linear(1.0, 0.05) | Linear(1.0, 0.05) | Linear(1.0, 0.05) | Linear(1.0, 0.05) |
| explore_progress | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
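The exploration settings above, Linear(1.0, 0.05) with explore_progress 0.1, can be read as an epsilon-greedy rate that decays linearly from 1.0 to 0.05 over the first 10% of training and then stays constant. That reading is an assumption about the flag semantics, and the function below is a hypothetical sketch, not the package's own scheduler.

```python
def linear_epsilon(progress, explore_rate=1.0, explore_final=0.05,
                   explore_progress=0.1):
    """Exploration rate at a training progress in [0, 1].

    Hypothetical helper: assumes epsilon decays linearly from explore_rate
    to explore_final while progress goes from 0 to explore_progress, and is
    held at explore_final afterwards.
    """
    if explore_progress <= 0:
        return explore_final
    # Fraction of the decay phase completed, clipped to [0, 1].
    frac = min(max(progress / explore_progress, 0.0), 1.0)
    return explore_rate + frac * (explore_final - explore_rate)


# Print the schedule at a few points of training progress.
for p in (0.0, 0.05, 0.1, 0.5, 1.0):
    print(f"progress={p:.2f} epsilon={linear_epsilon(p):.3f}")
```

Under this reading, with num_epochs = 312500 the decay phase would cover roughly the first 31250 epochs.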
|  | Box | Discrete | MultiDiscrete | MultiBinary |
|---|---|---|---|---|
| Observation | ✔️ | ✔️ | ✔️ | ✔️ |
| Action | ❌ | ✔️ | ❌ | ❌ |
force_mlp=False
force_mlp=True