
Adds N-step learning for DQN-based agents. #317

Merged (64 commits) on Nov 26, 2018

Conversation

prabhatnagarajan
Contributor

Known affected agents (all of which share the N-step target sketched below):

  • DQN
  • Double DQN
  • PAL
  • Advantage Learning
  • DPP
  • SARSA
  • Categorical DQN (TODO; waiting on a bug fix)
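
For context, here is a minimal sketch of the N-step target these agents share. This is not the PR's actual code; the function name, argument names, and the target-network bootstrap are my assumptions for illustration.

```python
def n_step_return(rewards, gamma, bootstrap_value, episode_done):
    """Truncated N-step return: discounted sum of the next len(rewards)
    rewards, plus a discounted bootstrap from the target network if the
    episode did not terminate within those steps.

    rewards          -- [r_t, r_{t+1}, ..., r_{t+N-1}]
    gamma            -- discount factor in [0, 1]
    bootstrap_value  -- max_a Q_target(s_{t+N}, a)
    episode_done     -- True if the episode ended within the N steps
    """
    ret = 0.0
    for k, r in enumerate(rewards):
        ret += (gamma ** k) * r
    if not episode_done:
        ret += (gamma ** len(rewards)) * bootstrap_value
    return ret


# Example: N = 3, gamma = 0.99, bootstrap value 10.0 from the target network.
print(n_step_return([1.0, 0.0, 1.0], 0.99, 10.0, episode_done=False))
```

Each agent then regresses Q(s_t, a_t) toward this target instead of the usual 1-step target; SARSA would bootstrap with Q_target(s_{t+N}, a_{t+N}) rather than the max.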

@prabhatnagarajan
Contributor Author

prabhatnagarajan commented Oct 17, 2018

Results:
These are the results for seven domains; each domain and algorithm had at least three runs.

N-step DQN performs equally well or slightly better than DQN in two domains, Qbert and Pong; otherwise it performs worse. Unfortunately, we don't have an external baseline to compare against.

[Training-curve figures: Space Invaders, Breakout, Seaquest, Qbert, Asterix, Pong, BeamRider]

@prabhatnagarajan
Contributor Author

Just eyeballed this PR's DQN results (which are actually Double DQN) and compared them to the Tuned DQN results from #302 (comment).

The results look similar, which suggests that running the N-step implementation as 1-step DQN (N = 1) does not adversely affect performance.
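
For the record, here is why N = 1 should be a no-op (notation mine, not from the PR). The N-step target is

$$R_t^{(N)} = \sum_{k=0}^{N-1} \gamma^k r_{t+k} + \gamma^N \max_{a'} Q_{\text{target}}(s_{t+N}, a'),$$

and setting N = 1 collapses the sum to the single reward $r_t$, leaving the standard 1-step target $r_t + \gamma \max_{a'} Q_{\text{target}}(s_{t+1}, a')$. So comparable performance is exactly what we would expect.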

@muupan
Member

muupan commented Nov 13, 2018

Can you resolve the conflicts?

Review threads (resolved): chainerrl/agents/dqn.py (3), tests/test_replay_buffer.py (2)
@prabhatnagarajan
Contributor Author

The previous version of N-step DQN discarded the last N-1 partial transitions when an episode finished. These are the updated results after making the buffer flush those transitions at episode end (see the sketch after the figures).
[Updated training-curve figures: Asterix, BeamRider, Breakout, Pong, Qbert, Seaquest, Space Invaders]
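
To make the change concrete, here is a minimal sketch of an N-step queue that now flushes its remaining partial transitions at episode end. This is not the PR's actual code; the class, method, and transition-key names are made up for illustration.

```python
from collections import deque


class NStepQueue:
    """Sliding window that collapses up to N raw transitions into one
    N-step transition before handing it to the underlying replay buffer."""

    def __init__(self, n, gamma, replay_buffer):
        self.n = n
        self.gamma = gamma
        self.replay_buffer = replay_buffer
        self.queue = deque()

    def append(self, transition):
        # transition: dict with keys 'state', 'action', 'reward',
        # 'next_state', 'is_terminal' (key names assumed, not ChainerRL's).
        self.queue.append(transition)
        if len(self.queue) == self.n:
            self._emit_oldest()

    def _emit_oldest(self):
        # Discounted reward over everything currently queued.
        ret = sum((self.gamma ** k) * t['reward']
                  for k, t in enumerate(self.queue))
        first, last = self.queue[0], self.queue[-1]
        self.replay_buffer.append({
            'state': first['state'],
            'action': first['action'],
            'n_step_reward': ret,
            'next_state': last['next_state'],
            'is_terminal': last['is_terminal'],
        })
        self.queue.popleft()

    def stop_episode(self):
        # The fix: previously the last N-1 queued transitions were dropped
        # at episode end; now each is emitted with a shorter horizon.
        while self.queue:
            self._emit_oldest()
```

Each flushed transition spans fewer than N steps and ends at the terminal state, so it needs no bootstrap; dropping them meant losing some of the most informative transitions.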

@prabhatnagarajan changed the title from "[WIP] Adds N-step learning for DQN-based agents." to "Adds N-step learning for DQN-based agents." on Nov 26, 2018
@muupan
Member

muupan commented Nov 26, 2018

Looks good!
