Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RL4J - Added Observable & LegacyMDPWrapper #8368

Merged
merged 4 commits into from Nov 26, 2019

Conversation

@aboulang2002
Copy link
Contributor

aboulang2002 commented Nov 8, 2019

What changes were proposed in this pull request?

In this PR, I will remove all the processing done on the observations in the Learning classes. In my mind, the observation should be 'opaque' to the Learning, that is the Learning classes should not know any detail about it.

I also plan to replace the Encodable with an Observation class. The Encodable works if the observation is simple, for example a video frame. But suppose a use case where additional observations could be useful or necessary, for example a video game in which an audio channel carry the info on an imminent enemy attack, the Encodable is not good enough.
Eventually, I'd like the Observation class to be able to handle multiple observation channels, but since the goal of this PR is only to remove the processing done to the observation from the learning classes, I will limit for now the Observation class as a dummy container that output a single INDArray.

Also, Existing MDPs should continue to work. So, I am adding a LegacyMDPWrapper as the glue between Encodable-MDPs and Learning classes that will use the Observation class.

I changed a refac test (refac_QLearningDiscrete.trainStep) to use trainEpoch() instead of trainStep(), a few things surfaced (not yet fixed):

  1. step from initMdp is used to set the variable step in QLearning.trainEpoch(), but it is never used to set stepCounter (which is used in QLearningDiscrete.trainStep()). The result is that if initMdp returns a frame that should be skipped, trainStep() will not skip it since stepCounter will be 0. In other terms, if we use skipFrame = 2 and history length = 5, when initMDP() returns we will have: history = { 0, 2, 4, 6, 8 }, step = 9 and stepCounter = 0. trainStep() uses stepCounter to know if it's a skipped frame and thinks it's not. So history becomes: { 2, 4, 6, 8, 9 } which is wrong.

  2. In QLearningDiscrete, we create the Transition this way:
    Transition<Integer> trans = new Transition(history, action, accuReward, stepReply.isDone(), nhistory[0]);
    nhistory is an array and nhistory[0] is the oldest element. The statement above would create a Transition with the oldest element in the nhistory as the next observation which is incorrect. Again in other words, if history is { 0, 2, 4, 6, 8 } and the next observation is 9, the nhistory will be { 2, 4, 6, 8, 9 } and 2 will be used as the next observation.

How was this patch tested?

Quick checklist

The following checklist helps ensure your PR is complete:

  • Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
  • Reviewed the Contributing Guidelines and followed the steps within.
  • Created tests for any significant new code additions.
  • Relevant tests for your changes are passing.
Signed-off-by: unknown <aboulang2002@yahoo.com>
@aboulang2002 aboulang2002 changed the title [WIP] Added Observable & LegacyMDPWrapper [WIP] RL4J - Added Observable & LegacyMDPWrapper Nov 8, 2019
Signed-off-by: unknown <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Nov 13, 2019

In the last commit, I moved most of the observation processing out of QLearningDiscrete. Next step is use DataSets and to clean the mess in LegacyMDPWrapper.

…orithm

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Nov 15, 2019

Summary of the previous commit:

  • Observation is now using a DataSet, although all the processing is still in LegacyMDPWrapper.
  • Cleaned up the shape/observation manipulation from BaseTDTargetAlgorithm and QLearningDiscrete.
  • Added methods to build observation and nextObservation in Transition, plus unit tests.
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Nov 16, 2019

Try to finalize these changes and let's merge this before it goes over 1000 lines, thanks!

Signed-off-by: unknown <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Nov 18, 2019

Agreed. It wouldn't be reasonable to put everything in this PR.

@aboulang2002 aboulang2002 marked this pull request as ready for review Nov 18, 2019
@aboulang2002 aboulang2002 changed the title [WIP] RL4J - Added Observable & LegacyMDPWrapper RL4J - Added Observable & LegacyMDPWrapper Nov 25, 2019
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Nov 25, 2019

Forgot to remove the [WIP] tag from the title. Sorry about that.

@saudet saudet self-requested a review Nov 26, 2019
@saudet
saudet approved these changes Nov 26, 2019
@saudet saudet merged commit 47c58cf into eclipse:master Nov 26, 2019
1 of 2 checks passed
1 of 2 checks passed
Codacy/PR Quality Review Codacy was unable to analyse your pull request.
Details
eclipsefdn/eca The author(s) of the pull request is covered by necessary legal agreements in order to proceed!
Details
@aboulang2002 aboulang2002 deleted the aboulang2002:ab2002_rl4j_observable branch Nov 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.