Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RL4J - Extracted TD Target calculations (StandardDQN and DoubleDQN) #8267

Merged
merged 1 commit into from Oct 9, 2019

Conversation

@aboulang2002
Copy link
Contributor

aboulang2002 commented Oct 4, 2019

What changes were proposed in this pull request?

In this PR, I extracted the TD Target calculations (Standard DQN and Double DQN) from QLearningDiscrete to a bunch of more focused components. First, this should help us when we'll add new algorithms.
Secondly, a fix I did in QLearningDiscrete.setTarget() showed me how tricky this part can be and the necessity to add tests to it.

The way it's plugged in QLearningDiscrete is not perfect but, I'd like to make smaller PRs so, I only made minimal changes to QLearningDiscrete.

How was this patch tested?

unit tests and manual tests

Quick checklist

The following checklist helps ensure your PR is complete:

  • Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
  • Reviewed the Contributing Guidelines and followed the steps within.
  • Created tests for any significant new code additions.
  • Relevant tests for your changes are passing.
Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@AlexDBlack AlexDBlack requested a review from saudet Oct 7, 2019
@saudet
saudet approved these changes Oct 8, 2019
Copy link
Member

saudet left a comment

LGTM, and output for Cartpole appears the same (we still need to patch unrelated issues with the random seeds).

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Oct 8, 2019

@saudet By "issues with the random seeds", are you talking about the new Random() with hard-coded seeds that are scattered in RL4J?

@saudet

This comment has been minimized.

Copy link
Member

saudet commented Oct 8, 2019

Yes, mainly this botched pull #6059, which was meant to fix issue #6050.
Not hard to fix, just need to do it and test it :)

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Oct 8, 2019

Ok, I see. If you don't mind, I'll do the fix in another PR; I'd like to not just fix ExpReplay but also change all uses of java.util.Random to org.nd4j.linalg.api.rng.Random, which is an interface. Being able to pass a mock will be very helpful for the unit tests.

@saudet saudet merged commit 3aa51e2 into eclipse:master Oct 9, 2019
1 check passed
1 check passed
eclipsefdn/eca The author(s) of the pull request is covered by necessary legal agreements in order to proceed!
Details
@aboulang2002 aboulang2002 deleted the aboulang2002:ab2002_rl4j_tdtarget_algorithm branch Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.