### TL;DR

The paper demonstrates that Q-learning, when combined with deep neural networks, learns overoptimistic action values even in deterministic environments such as Atari video games. As a correction, the authors propose a variant of Double Q-learning, which decouples the choice of the next action from the estimate of its value. The resulting Double DQN algorithm substantially improves on the performance of DQN.
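
To make the difference concrete, here is a minimal sketch of the two bootstrap targets for a single transition. It assumes the usual setup with an online network and a periodically updated target network; the function names, argument names, and the toy Q-values are illustrative assumptions, not code from the paper.

```python
from typing import Sequence


def dqn_target(reward: float, gamma: float,
               q_target_next: Sequence[float], done: bool = False) -> float:
    """Standard DQN target: the target network both picks and scores the next action."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    # Greedy action according to the online network's estimates for the next state.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Value of that action according to the target network.
    return reward + gamma * q_target_next[best_action]


# Toy (made-up) Q-values for the next state, three actions.
q_online_next = [1.0, 2.0, 1.5]
q_target_next = [1.25, 0.5, 3.0]

print(dqn_target(1.0, 0.5, q_target_next))                        # 2.5
print(double_dqn_target(1.0, 0.5, q_online_next, q_target_next))  # 1.25
```

In this toy case the `max` in the DQN target latches onto the largest target-network estimate (3.0), while the Double DQN target uses the target network's value for the action the online network prefers (0.5). When the large estimate is mostly noise, this decoupling is what reduces the upward bias.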