RL4J - AsyncTrainingListener #8072

aboulang2002 · 2019-08-03T12:26:23Z

What changes were proposed in this pull request?

In this PR, I add the async equivalent of SyncTrainingListener.
I regrouped the common code in a few generic classes in the org.deeplearning4j.rl4j.learning.listener namespace and added the listener pattern to AsyncLearning and AsyncThread. The dependency to DataManager has been removed (with the exception of IDataManager.StatEntry) from all learning classes.

How was this patch tested?

Unit tests

Quick checklist

The following checklist helps ensure your PR is complete:

Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
Reviewed the Contributing Guidelines and followed the steps within.
Created tests for any significant new code additions.
Relevant tests for your changes are passing.

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-08-03T12:27:10Z

Previous commit: I extracted parts of AsyncThread.run() to private methods to help with code clarity. There is some kind of inverse-order-logic going on in there, which is not bad per se, but the logic should be clearer.
Also, I had to change a bit the code around setExpReplay in QLearning in order to make the unit tests compile.

Signed-off-by: unknown <aboulang2002@yahoo.com>

aboulang2002 · 2019-08-08T00:02:11Z

Change summary:

Removed IDataManager from all learning classes & deprecated ctors that uses it
Added async training listeners & extracted common code from sync training learning into org.deeplearning4j.rl4j.learning.listener
Added listeners support to AsyncLearning and AsyncThread
Moved mdp field from derived classes down to AsyncThread base class (required by the unit tests)
AsyncGlobal: changed setRunning(false) to terminate() and fixed a possible memory leak
Added interface to Policy (also required by the unit tests)

saudet · 2019-08-08T22:29:08Z

I'm starting to wonder, what's so different between SyncTrainingListener and AsyncTrainingListener that they need separate class hierarchies? And which one would get used for algorithms like A2C that are multithreaded, but not "async"?

aboulang2002 · 2019-08-08T23:44:27Z

Edit: I think I can come up with a TrainingListener that would work for both sync and async case. I won't be able to work on this over the weekend but I'll update this PR in the coming days.

saudet · 2019-08-12T02:58:41Z

Yes, ok, I was about to say that it doesn't
matter where events are fired from, and we can have dummy events that are not used for some algorithms. In any case, we'll have a better idea of what we need once we start implementing more algorithms (A2C, PPO, IMPALA, etc), so it doen't need to be perfect now.

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-08-16T12:41:48Z

Finally, I kept the trainingStart / end; they're simple and straightfoward so why not keep them.
Renamed onEpochStarted to onNewEpoch and onEpochFinished to onEpochTrainingResult
Removed onTrainingProgress. I think I will instead add (not in this PR) an event in AsyncGlobal (that keeps the 'master' NN) and in the setTarget of a more derived class of SyncLearning.
Removed SyncListener / AsyncListener listeners. Merged all logic in Listener.
I changed/added protected build*Event() methods in Learning and AsyncThread. That way a user can override these methods if he has a more specific need.

And as you said, @saudet, we'll probably have to change a few things as we continue this refac.

saudet

The changes look OK, but I'm not getting anything in ~/rl4j-data anymore by default when launching say the A3CALE example. What's up with that?

rl4j/rl4j-core/src/main/java/org/deeplearning4j/rl4j/learning/async/AsyncGlobal.java

saudet · 2019-08-18T06:36:48Z

rl4j/rl4j-core/src/main/java/org/deeplearning4j/rl4j/learning/listener/TrainingEvent.java

+/**
+ * The base definition of all training events
+ */
+public class TrainingEvent {


Let's declare this abstract? Since it doesn't hold any data or define any interface though, sounds like we could use generics for that instead. What do you think?

In a previous commit, I used generics. I used a different generic for each event (training start, end, epoch start, end) because one might need a different event for training start than for epoch start. Overall, it seemed a bit overkill to me but, on the other hand, classes that will use the listener will look cleaner. I will add it.

It should define some sort of generic interface or contain data common to all events then that would make sense.

I didn't mean literally a Java interface. I mean, the user will get a TrainingEvent or ITrainingEvent, whatever, doesn't matter, but that has no methods, no data. What are we supposed to do with it?

Suppose that a user has a custom listener that need some value or some object instance when calling for example onTrainingStarted(). That user will need to create a class that inherit TrainingEvent or implements ITrainingEvent, override Learning.buildTrainingStartedEvent() and make it return an instance of his custom TrainingEvent.
TrainingEvent is the base event, and it's empty because RL4J doesn't use it, but a user might need to inherit from it.
But that class indeed look useless. We could remove TrainingEvent and just leave ITrainingEvent, and make buildTrainingStartedEvent() and buildTrainingFinishedEvent() return null instead. That would remove the empty, useless-looking class but a user could still pass custom events to his custom listener.

I see what you mean but in the sync case, there is only SyncLearning and in the async case, there are AsyncLearning, AsyncThread and maybe AsyncGlobal.
What do you think if we design this listener and any future events/listeners in RL4J with this design rule:

Everything that change (i.e. is a different value or instance) between calls are received through a getter of the event parameter. Everything else should be passed as a ctor argument of the particular listener.

What I mean is this:

public class MyCustomTrainingListener implements TrainingListener { private final MyCustomLearning learning; public MyCustomTrainingListener(MyCustomLearning learning) { // Type safety at compile time this.learning = learning; } @Override public ListenerResponse onNewEpoch(IEpochTrainingEvent event) { DoSomethingWith(learning.MyCustomField); return ListenerResponse.CONTINUE; } }

instead of having to cast a learning member of event:

public class MyCustomTrainingListener implements TrainingListener { @Override public ListenerResponse onNewEpoch(IEpochTrainingEvent event) { MyCustomLearning learning = ((MyCustomEpochTrainingEvent)event).getLearning(); // 'Type safety' at run time DoSomethingWith(learning.MyCustomField); return ListenerResponse.CONTINUE; } }

Also, in the case of onNewEpoch and onEpochTrainingResult, there is no way to find the calling AsyncThread instance, we don't have the thread num or the object. Maybe we should add that.

Ah, I see, there is a missing base class for Learning. If we had that, would we still need EpochTrainingEvent? Or is the idea with async learning to keep running the threads in the background even while calling the listeners and so we want to take snapshots of the state?

The base of the problem is that we want the same listener be able to fetch the info it needs no matter who calls it and sync and async behaves very differently. We could create an interface that contains what is common between SyncLearning and AsyncThread and pass that to the event but, if a user tries to implement a custom learning algorithm and he needs to pass some info to a custom listener, he should be able to do so without having to add a new member to the caller's state (for example, because it wouldn't make sense from the use case's point of view).
Can you tell me for what reason you would like the EpochTrainingEvent gone?

It doesn't need to be gone, but it does feel strange... I think we could create a parent ILearning interface or something with these properties instead and pass that to the listeners? Then they could cast that reference however they need to get more information, when needed, without having to create temporary objects like EpochTrainingEvent.

I have removed the 'Event' classes. See comment below for details

aboulang2002 · 2019-08-18T20:26:29Z

Please, @saudet, can you point me to a guide on how to get ALE to run with dl4j-examples?
(I work on Windows with intelliJ if that makes a difference.)

saudet · 2019-08-19T02:01:08Z

@aboulang2002 It's basically just changing the versions here:
https://github.com/deeplearning4j/dl4j-examples/blob/master/pom.xml#L37-L45
And uncommenting the right version of ALE here, which hasn't changed:
https://github.com/deeplearning4j/dl4j-examples/blob/master/rl4j-examples/pom.xml#L105

aboulang2002 · 2019-08-19T13:43:41Z

For ~/rl4j-data being empty, I temporarily removed the video recording. I plan to use the VideoRecorder in my other PR.
I also removed the info file. Do you know if it's still used? It seems to be always the same data, except a timestamp.

saudet · 2019-08-19T14:02:21Z

It should contain information about the training, yes, most importantly the cumulative rewards, but other things as well. If it wasn't doing that, then it's a bug...

aboulang2002 · 2019-08-19T14:08:55Z

It doesn't. Each line seems to contain the config used (which never changes), the stepCounter and a timestamp:
{"trainingName":"A3CDiscreteConv","mdpName":"ALEMDP","conf":{"seed":123,"maxEpochStep":10000,"maxStep":8000000,"numThread":8,"nstep":32,"updateStart":500,"rewardFactor":0.1,"gamma":0.99,"errorClamp":10.0,"targetDqnUpdateFreq":-1},"stepCounter":4154,"millisTime":1566219515877}

Maybe it's the stats file you have in mind?
Example line from the stat file:
{"stepCounter":2782,"epochCounter":3,"reward":1.0,"episodeLength":661,"score":-0.03606547849873702}

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-08-19T14:54:29Z

Summary of changes:

Added interfaces to events
Added common data to epoch events
Added missing headers & javadocs

saudet · 2019-08-20T00:23:42Z

Right, the config is there too. We can limit its output to a single line, but either way, right now it's not outputting anything and that's an issue.

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-08-29T13:31:26Z

I re-introduced the training progress event in order to fix the missing info log file.
I also fixed the bug (see #8107) in QLearningDiscrete. I'd appreciate if someone could double-check the fix. I used the breakout ALE ROM and let it run a few hours. It seems to learn but, before my fix, it seemed to learn too, so I hope I'm not breaking things instead of fixing them.

Summary of changes:

re-introduced a progress event
updated unit tests to reflect changes
fixed issue The output from the current DQN and target DQN #8107

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

…oulang2002/deeplearning4j into ab2002_asynclearning_listener

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-08-30T15:37:42Z

Previous commit:

Resolved conflict (I used github directly. Did not realize the commit would not be signed off... sorry for that)
Removed the class TrainingEvent; it was useless and simpler is better.

saudet · 2019-08-31T02:45:45Z

@flyheroljg Does this fix your issue?

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-09-02T22:30:27Z

Previous commit:

Removed the param from onTrainingStart() and onTrainingEnd(), adjusted code / unit tests.

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 · 2019-09-09T20:35:33Z

I have removed the 'event' classes used in the calls to TrainingListener and replaced them with an interface 'IEpochTrainer' that contains what's common between Learning and AsyncThread.
In DataManagerTrainingListener, while making a final checkup, I realized that some features were missing; Video recording and model saving. While we'll need to record videos differently later, I don't want to break things in the meantime so, I added the missing calls to startMonitor() and stopMonitor().

There was also a call to policy.play() in AsyncLearning. I left this one out because it didn't seem to do anything.

Changes summary:

Removed event classes and changed TrainingListener accordingly
Introduced IEpochTrainer with what's common between Learning and AsyncThread
In DataManager.save(): changed parameter type from Learning to ILearning
DataManagerTrainingListener: added missing call to dataManager.save(learning) and video recording
Added getHistoryProcessor() to ILearning
Added DataManagerTrainingListener unit tests

saudet · 2019-09-17T01:29:53Z

This is looking good! Thanks! The only thing is that I don't seem to be able to make the Cartpole example convergence anymore. Could you move the "fixes" for QLearning in another pull request to get this processed separately and get this PR merged right away?

aboulang2002 · 2019-09-17T23:50:51Z

Thanks @saudet and good catch! I will try to fix the issue and open a new PR.
Also, thanks for your feedback on this PR, I greatly appreciated it.

saudet · 2019-09-18T07:31:57Z

@AlexDBlack I had to create a CQ because this PR changes over 1000 lines:
https://dev.eclipse.org/ipzilla/show_bug.cgi?id=20832
No need to wait after it to merge this though. Please feel free to merge if it's OK with you.

AlexDBlack

Only quickly scanned, but LGTM 👍

Code clarity: Extracted parts of run() into private methods

cf14d27

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

Added listener pattern to async learning

bf453cc

Signed-off-by: unknown <aboulang2002@yahoo.com>

aboulang2002 changed the title ~~[WIP] RL4J - AsyncTrainingListener~~ RL4J - AsyncTrainingListener Aug 8, 2019

aboulang2002 marked this pull request as ready for review August 8, 2019 00:02

aboulang2002 changed the title ~~RL4J - AsyncTrainingListener~~ [WIP] RL4J - AsyncTrainingListener Aug 9, 2019

saudet self-requested a review August 13, 2019 06:24

Merged all listeners logic

7ddc90d

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 changed the title ~~[WIP] RL4J - AsyncTrainingListener~~ RL4J - AsyncTrainingListener Aug 16, 2019

saudet reviewed Aug 18, 2019

View reviewed changes

Added interface and common data to training events

9e69ad4

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

Fixed missing info log file

64f6ca5

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

aboulang2002 and others added 4 commits August 30, 2019 08:49

Merge branch 'master' into ab2002_asynclearning_listener

8cc8b12

Merge branch 'master' into ab2002_asynclearning_listener

33c9511

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

Merge branch 'ab2002_asynclearning_listener' of https://github.com/ab…

0bb082b

…oulang2002/deeplearning4j into ab2002_asynclearning_listener

Fixed bad merge; removed useless TrainingEvent

f0f2def

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

Removed param from training start/end event

848ec5d

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

Removed 'event' classes from the training listener

b327f97

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>

Reverted changes to QLearningDiscrete.setTarget()

fcf6659

saudet requested review from AlexDBlack and saudet September 18, 2019 07:23

saudet approved these changes Sep 18, 2019

View reviewed changes

AlexDBlack approved these changes Sep 19, 2019

View reviewed changes

AlexDBlack merged commit 59f1cbf into deeplearning4j:master Sep 19, 2019

aboulang2002 deleted the ab2002_asynclearning_listener branch September 21, 2019 11:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RL4J - AsyncTrainingListener #8072

RL4J - AsyncTrainingListener #8072

aboulang2002 commented Aug 3, 2019 •

edited

aboulang2002 commented Aug 3, 2019

aboulang2002 commented Aug 8, 2019

saudet commented Aug 8, 2019

aboulang2002 commented Aug 8, 2019 •

edited

saudet commented Aug 12, 2019

aboulang2002 commented Aug 16, 2019

saudet left a comment

saudet Aug 18, 2019

aboulang2002 Aug 19, 2019

saudet Aug 19, 2019

saudet Aug 20, 2019

aboulang2002 Aug 29, 2019 •

edited

aboulang2002 Sep 3, 2019

saudet Sep 4, 2019

aboulang2002 Sep 4, 2019

saudet Sep 5, 2019

aboulang2002 Sep 9, 2019

aboulang2002 commented Aug 18, 2019

saudet commented Aug 19, 2019

aboulang2002 commented Aug 19, 2019

saudet commented Aug 19, 2019

aboulang2002 commented Aug 19, 2019

aboulang2002 commented Aug 19, 2019

saudet commented Aug 20, 2019

aboulang2002 commented Aug 29, 2019

aboulang2002 commented Aug 30, 2019 •

edited

saudet commented Aug 31, 2019

aboulang2002 commented Sep 2, 2019

aboulang2002 commented Sep 9, 2019

saudet commented Sep 17, 2019

aboulang2002 commented Sep 17, 2019

saudet commented Sep 18, 2019

AlexDBlack left a comment

RL4J - AsyncTrainingListener #8072

RL4J - AsyncTrainingListener #8072

Conversation

aboulang2002 commented Aug 3, 2019 • edited

What changes were proposed in this pull request?

How was this patch tested?

Quick checklist

aboulang2002 commented Aug 3, 2019

aboulang2002 commented Aug 8, 2019

saudet commented Aug 8, 2019

aboulang2002 commented Aug 8, 2019 • edited

saudet commented Aug 12, 2019

aboulang2002 commented Aug 16, 2019

saudet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aboulang2002 Aug 29, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aboulang2002 commented Aug 18, 2019

saudet commented Aug 19, 2019

aboulang2002 commented Aug 19, 2019

saudet commented Aug 19, 2019

aboulang2002 commented Aug 19, 2019

aboulang2002 commented Aug 19, 2019

saudet commented Aug 20, 2019

aboulang2002 commented Aug 29, 2019

aboulang2002 commented Aug 30, 2019 • edited

saudet commented Aug 31, 2019

aboulang2002 commented Sep 2, 2019

aboulang2002 commented Sep 9, 2019

saudet commented Sep 17, 2019

aboulang2002 commented Sep 17, 2019

saudet commented Sep 18, 2019

AlexDBlack left a comment

Choose a reason for hiding this comment

aboulang2002 commented Aug 3, 2019 •

edited

aboulang2002 commented Aug 8, 2019 •

edited

aboulang2002 Aug 29, 2019 •

edited

aboulang2002 commented Aug 30, 2019 •

edited