
Adds N-step learning for DQN-based agents. #317

Merged (64 commits) on Nov 26, 2018

Conversation

prabhatnagarajan
Contributor

Known affected agents (all of which share the N-step target sketched below):

  • DQN
  • Double DQN
  • PAL
  • Advantage Learning
  • DPP
  • SARSA
  • Categorical DQN (TODO; waiting on a bug fix)
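
For context, here is a minimal sketch of the N-step target these agents share. This is not the PR's actual code; the function name, argument names, and the target-network bootstrap are my assumptions for illustration.

```python
def n_step_return(rewards, gamma, bootstrap_value, episode_done):
    """Truncated N-step return: discounted sum of the next len(rewards)
    rewards, plus a discounted bootstrap from the target network if the
    episode did not terminate within those steps.

    rewards          -- [r_t, r_{t+1}, ..., r_{t+N-1}]
    gamma            -- discount factor in [0, 1]
    bootstrap_value  -- max_a Q_target(s_{t+N}, a)
    episode_done     -- True if the episode ended within the N steps
    """
    ret = 0.0
    for k, r in enumerate(rewards):
        ret += (gamma ** k) * r
    if not episode_done:
        ret += (gamma ** len(rewards)) * bootstrap_value
    return ret


# Example: N = 3, gamma = 0.99, bootstrap value 10.0 from the target network.
print(n_step_return([1.0, 0.0, 1.0], 0.99, 10.0, episode_done=False))
```

Each agent then regresses Q(s_t, a_t) toward this target instead of the usual 1-step target; SARSA would bootstrap with Q_target(s_{t+N}, a_{t+N}) rather than the max.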

@prabhatnagarajan
Contributor Author

prabhatnagarajan commented Oct 17, 2018

Results:
These are the results for seven domains; each domain and algorithm had at least three runs.

N-step DQN performs equally well or slightly better than DQN in two domains, Qbert and Pong; otherwise it performs worse. Unfortunately, we don't have an external baseline to compare against.

[Training-curve figures: Space Invaders, Breakout, Seaquest, Qbert, Asterix, Pong, BeamRider]

@prabhatnagarajan
Contributor Author

Just eyeballed this PR's DQN results (which are actually Double DQN) and compared them to the Tuned DQN results from #302 (comment).

The results look similar, which suggests that running the N-step implementation as 1-step DQN (N = 1) does not adversely affect performance.
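
For the record, here is why N = 1 should be a no-op (notation mine, not from the PR). The N-step target is

$$R_t^{(N)} = \sum_{k=0}^{N-1} \gamma^k r_{t+k} + \gamma^N \max_{a'} Q_{\text{target}}(s_{t+N}, a'),$$

and setting N = 1 collapses the sum to the single reward $r_t$, leaving the standard 1-step target $r_t + \gamma \max_{a'} Q_{\text{target}}(s_{t+1}, a')$. So comparable performance is exactly what we would expect.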

@muupan
Member

muupan commented Nov 13, 2018

Can you resolve the conflicts?

Review threads (resolved): chainerrl/agents/dqn.py (3), tests/test_replay_buffer.py (2)
@prabhatnagarajan
Contributor Author

The previous version of N-step DQN discarded the last N-1 partial transitions when an episode finished. These are the updated results after making the buffer flush those transitions at episode end (see the sketch after the figures).
[Updated training-curve figures: Asterix, BeamRider, Breakout, Pong, Qbert, Seaquest, Space Invaders]
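
To make the change concrete, here is a minimal sketch of an N-step queue that now flushes its remaining partial transitions at episode end. This is not the PR's actual code; the class, method, and transition-key names are made up for illustration.

```python
from collections import deque


class NStepQueue:
    """Sliding window that collapses up to N raw transitions into one
    N-step transition before handing it to the underlying replay buffer."""

    def __init__(self, n, gamma, replay_buffer):
        self.n = n
        self.gamma = gamma
        self.replay_buffer = replay_buffer
        self.queue = deque()

    def append(self, transition):
        # transition: dict with keys 'state', 'action', 'reward',
        # 'next_state', 'is_terminal' (key names assumed, not ChainerRL's).
        self.queue.append(transition)
        if len(self.queue) == self.n:
            self._emit_oldest()

    def _emit_oldest(self):
        # Discounted reward over everything currently queued.
        ret = sum((self.gamma ** k) * t['reward']
                  for k, t in enumerate(self.queue))
        first, last = self.queue[0], self.queue[-1]
        self.replay_buffer.append({
            'state': first['state'],
            'action': first['action'],
            'n_step_reward': ret,
            'next_state': last['next_state'],
            'is_terminal': last['is_terminal'],
        })
        self.queue.popleft()

    def stop_episode(self):
        # The fix: previously the last N-1 queued transitions were dropped
        # at episode end; now each is emitted with a shorter horizon.
        while self.queue:
            self._emit_oldest()
```

Each flushed transition spans fewer than N steps and ends at the terminal state, so it needs no bootstrap; dropping them meant losing some of the most informative transitions.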

@prabhatnagarajan changed the title from "[WIP] Adds N-step learning for DQN-based agents." to "Adds N-step learning for DQN-based agents." on Nov 26, 2018
@muupan
Member

muupan commented Nov 26, 2018

Looks good!
