### TL;DR

The paper demonstrates that Q-learning, when combined with deep neural networks, learns overoptimistic action values even in deterministic environments such as Atari video games. To correct this, the authors propose a variant of Double Q-learning in which the online network selects the greedy next action and the target network evaluates it, instead of one network doing both. The resulting Double DQN algorithm greatly improves on the performance of the DQN algorithm.
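
To make the difference concrete, here is a minimal NumPy sketch of the two bootstrap targets. It is an illustration under assumptions, not the authors' implementation: the function names `dqn_target` and `double_dqn_target`, the toy Q-values, and the discount `gamma = 0.9` are all made up for this example, and the Q-value arrays stand in for the outputs of the online and target networks on the next state.

```python
import numpy as np


def dqn_target(reward, gamma, q_target_next, done):
    """DQN target: the target network both selects and evaluates the next action."""
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    best_action = q_online_next.argmax(axis=1)
    rows = np.arange(len(best_action))
    evaluated = q_target_next[rows, best_action]
    return reward + gamma * (1.0 - done) * evaluated


# Hypothetical batch of two transitions with three actions each.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])  # second transition is terminal
q_online_next = np.array([[1.0, 2.0, 0.5], [0.3, 0.1, 0.2]])
q_target_next = np.array([[3.0, 1.5, 0.0], [0.4, 0.9, 0.1]])
gamma = 0.9  # assumed discount factor

y_dqn = dqn_target(reward, gamma, q_target_next, done)
y_ddqn = double_dqn_target(reward, gamma, q_online_next, q_target_next, done)
```

In this toy batch the first DQN target is about 3.7 while the Double DQN target is about 2.35, because the online network prefers action 1 and the target network values that action at 1.5 rather than at its own maximum of 3.0. For identical inputs the Double DQN target can never exceed the DQN target, since a maximum is at least as large as any single entry.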