Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RL4J - AsyncTrainingListener #8072

Merged
merged 12 commits into from Sep 19, 2019
Merged

RL4J - AsyncTrainingListener #8072

merged 12 commits into from Sep 19, 2019

Conversation

@aboulang2002
Copy link
Contributor

aboulang2002 commented Aug 3, 2019

What changes were proposed in this pull request?

In this PR, I add the async equivalent of SyncTrainingListener.
I regrouped the common code in a few generic classes in the org.deeplearning4j.rl4j.learning.listener namespace and added the listener pattern to AsyncLearning and AsyncThread. The dependency to DataManager has been removed (with the exception of IDataManager.StatEntry) from all learning classes.

How was this patch tested?

Unit tests

Quick checklist

The following checklist helps ensure your PR is complete:

  • Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
  • Reviewed the Contributing Guidelines and followed the steps within.
  • Created tests for any significant new code additions.
  • Relevant tests for your changes are passing.
Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 3, 2019

Previous commit: I extracted parts of AsyncThread.run() to private methods to help with code clarity. There is some kind of inverse-order-logic going on in there, which is not bad per se, but the logic should be clearer.
Also, I had to change a bit the code around setExpReplay in QLearning in order to make the unit tests compile.

Signed-off-by: unknown <aboulang2002@yahoo.com>
@aboulang2002 aboulang2002 changed the title [WIP] RL4J - AsyncTrainingListener RL4J - AsyncTrainingListener Aug 8, 2019
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 8, 2019

Change summary:

  • Removed IDataManager from all learning classes & deprecated ctors that uses it
  • Added async training listeners & extracted common code from sync training learning into org.deeplearning4j.rl4j.learning.listener
  • Added listeners support to AsyncLearning and AsyncThread
  • Moved mdp field from derived classes down to AsyncThread base class (required by the unit tests)
  • AsyncGlobal: changed setRunning(false) to terminate() and fixed a possible memory leak
  • Added interface to Policy (also required by the unit tests)
@aboulang2002 aboulang2002 marked this pull request as ready for review Aug 8, 2019
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Aug 8, 2019

I'm starting to wonder, what's so different between SyncTrainingListener and AsyncTrainingListener that they need separate class hierarchies? And which one would get used for algorithms like A2C that are multithreaded, but not "async"?

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 8, 2019

Edit: I think I can come up with a TrainingListener that would work for both sync and async case. I won't be able to work on this over the weekend but I'll update this PR in the coming days.

@aboulang2002 aboulang2002 changed the title RL4J - AsyncTrainingListener [WIP] RL4J - AsyncTrainingListener Aug 9, 2019
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Aug 12, 2019

Yes, ok, I was about to say that it doesn't
matter where events are fired from, and we can have dummy events that are not used for some algorithms. In any case, we'll have a better idea of what we need once we start implementing more algorithms (A2C, PPO, IMPALA, etc), so it doen't need to be perfect now.

@saudet saudet self-requested a review Aug 13, 2019
Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 16, 2019

  • Finally, I kept the trainingStart / end; they're simple and straightfoward so why not keep them.
  • Renamed onEpochStarted to onNewEpoch and onEpochFinished to onEpochTrainingResult
  • Removed onTrainingProgress. I think I will instead add (not in this PR) an event in AsyncGlobal (that keeps the 'master' NN) and in the setTarget of a more derived class of SyncLearning.
  • Removed SyncListener / AsyncListener listeners. Merged all logic in Listener.
  • I changed/added protected build*Event() methods in Learning and AsyncThread. That way a user can override these methods if he has a more specific need.

And as you said, @saudet, we'll probably have to change a few things as we continue this refac.

@aboulang2002 aboulang2002 changed the title [WIP] RL4J - AsyncTrainingListener RL4J - AsyncTrainingListener Aug 16, 2019
Copy link
Member

saudet left a comment

The changes look OK, but I'm not getting anything in ~/rl4j-data anymore by default when launching say the A3CALE example. What's up with that?

/**
* The base definition of all training events
*/
public class TrainingEvent {

This comment has been minimized.

Copy link
@saudet

saudet Aug 18, 2019

Member

Let's declare this abstract? Since it doesn't hold any data or define any interface though, sounds like we could use generics for that instead. What do you think?

This comment has been minimized.

Copy link
@aboulang2002

aboulang2002 Aug 19, 2019

Author Contributor

In a previous commit, I used generics. I used a different generic for each event (training start, end, epoch start, end) because one might need a different event for training start than for epoch start. Overall, it seemed a bit overkill to me but, on the other hand, classes that will use the listener will look cleaner. I will add it.

This comment has been minimized.

Copy link
@saudet

saudet Aug 19, 2019

Member

It should define some sort of generic interface or contain data common to all events then that would make sense.

This comment has been minimized.

Copy link
@saudet

saudet Aug 20, 2019

Member

I didn't mean literally a Java interface. I mean, the user will get a TrainingEvent or ITrainingEvent, whatever, doesn't matter, but that has no methods, no data. What are we supposed to do with it?

This comment has been minimized.

Copy link
@aboulang2002

aboulang2002 Aug 29, 2019

Author Contributor

Suppose that a user has a custom listener that need some value or some object instance when calling for example onTrainingStarted(). That user will need to create a class that inherit TrainingEvent or implements ITrainingEvent, override Learning.buildTrainingStartedEvent() and make it return an instance of his custom TrainingEvent.
TrainingEvent is the base event, and it's empty because RL4J doesn't use it, but a user might need to inherit from it.
But that class indeed look useless. We could remove TrainingEvent and just leave ITrainingEvent, and make buildTrainingStartedEvent() and buildTrainingFinishedEvent() return null instead. That would remove the empty, useless-looking class but a user could still pass custom events to his custom listener.

This comment has been minimized.

Copy link
@aboulang2002

aboulang2002 Sep 3, 2019

Author Contributor

I see what you mean but in the sync case, there is only SyncLearning and in the async case, there are AsyncLearning, AsyncThread and maybe AsyncGlobal.
What do you think if we design this listener and any future events/listeners in RL4J with this design rule:

Everything that change (i.e. is a different value or instance) between calls are received through a getter of the event parameter. Everything else should be passed as a ctor argument of the particular listener.

What I mean is this:

public class MyCustomTrainingListener implements TrainingListener {
    private final MyCustomLearning learning;
	
    public MyCustomTrainingListener(MyCustomLearning learning) { // Type safety at compile time
        this.learning = learning; 
    }
	
    @Override
    public ListenerResponse onNewEpoch(IEpochTrainingEvent event) {
        DoSomethingWith(learning.MyCustomField);
        return ListenerResponse.CONTINUE;
    }
}

instead of having to cast a learning member of event:

public class MyCustomTrainingListener implements TrainingListener {

    @Override
    public ListenerResponse onNewEpoch(IEpochTrainingEvent event) {
        MyCustomLearning learning = ((MyCustomEpochTrainingEvent)event).getLearning(); // 'Type safety' at run time
        DoSomethingWith(learning.MyCustomField);
        return ListenerResponse.CONTINUE;
    }
}

Also, in the case of onNewEpoch and onEpochTrainingResult, there is no way to find the calling AsyncThread instance, we don't have the thread num or the object. Maybe we should add that.

This comment has been minimized.

Copy link
@saudet

saudet Sep 4, 2019

Member

Ah, I see, there is a missing base class for Learning. If we had that, would we still need EpochTrainingEvent? Or is the idea with async learning to keep running the threads in the background even while calling the listeners and so we want to take snapshots of the state?

This comment has been minimized.

Copy link
@aboulang2002

aboulang2002 Sep 4, 2019

Author Contributor

The base of the problem is that we want the same listener be able to fetch the info it needs no matter who calls it and sync and async behaves very differently. We could create an interface that contains what is common between SyncLearning and AsyncThread and pass that to the event but, if a user tries to implement a custom learning algorithm and he needs to pass some info to a custom listener, he should be able to do so without having to add a new member to the caller's state (for example, because it wouldn't make sense from the use case's point of view).
Can you tell me for what reason you would like the EpochTrainingEvent gone?

This comment has been minimized.

Copy link
@saudet

saudet Sep 5, 2019

Member

It doesn't need to be gone, but it does feel strange... I think we could create a parent ILearning interface or something with these properties instead and pass that to the listeners? Then they could cast that reference however they need to get more information, when needed, without having to create temporary objects like EpochTrainingEvent.

This comment has been minimized.

Copy link
@aboulang2002

aboulang2002 Sep 9, 2019

Author Contributor

I have removed the 'Event' classes. See comment below for details

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 18, 2019

Please, @saudet, can you point me to a guide on how to get ALE to run with dl4j-examples?
(I work on Windows with intelliJ if that makes a difference.)

@saudet

This comment has been minimized.

Copy link
Member

saudet commented Aug 19, 2019

@aboulang2002 It's basically just changing the versions here:
https://github.com/deeplearning4j/dl4j-examples/blob/master/pom.xml#L37-L45
And uncommenting the right version of ALE here, which hasn't changed:
https://github.com/deeplearning4j/dl4j-examples/blob/master/rl4j-examples/pom.xml#L105

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 19, 2019

For ~/rl4j-data being empty, I temporarily removed the video recording. I plan to use the VideoRecorder in my other PR.
I also removed the info file. Do you know if it's still used? It seems to be always the same data, except a timestamp.

@saudet

This comment has been minimized.

Copy link
Member

saudet commented Aug 19, 2019

It should contain information about the training, yes, most importantly the cumulative rewards, but other things as well. If it wasn't doing that, then it's a bug...

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 19, 2019

It doesn't. Each line seems to contain the config used (which never changes), the stepCounter and a timestamp:
{"trainingName":"A3CDiscreteConv","mdpName":"ALEMDP","conf":{"seed":123,"maxEpochStep":10000,"maxStep":8000000,"numThread":8,"nstep":32,"updateStart":500,"rewardFactor":0.1,"gamma":0.99,"errorClamp":10.0,"targetDqnUpdateFreq":-1},"stepCounter":4154,"millisTime":1566219515877}

Maybe it's the stats file you have in mind?
Example line from the stat file:
{"stepCounter":2782,"epochCounter":3,"reward":1.0,"episodeLength":661,"score":-0.03606547849873702}

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 19, 2019

Summary of changes:

  • Added interfaces to events
  • Added common data to epoch events
  • Added missing headers & javadocs
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Aug 20, 2019

Right, the config is there too. We can limit its output to a single line, but either way, right now it's not outputting anything and that's an issue.

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 29, 2019

I re-introduced the training progress event in order to fix the missing info log file.
I also fixed the bug (see #8107) in QLearningDiscrete. I'd appreciate if someone could double-check the fix. I used the breakout ALE ROM and let it run a few hours. It seems to learn but, before my fix, it seemed to learn too, so I hope I'm not breaking things instead of fixing them.

Summary of changes:

  • re-introduced a progress event
  • updated unit tests to reflect changes
  • fixed issue #8107
Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Aug 30, 2019

Previous commit:

  • Resolved conflict (I used github directly. Did not realize the commit would not be signed off... sorry for that)
  • Removed the class TrainingEvent; it was useless and simpler is better.
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Aug 31, 2019

@flyheroljg Does this fix your issue?

Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Sep 2, 2019

Previous commit:

  • Removed the param from onTrainingStart() and onTrainingEnd(), adjusted code / unit tests.
Signed-off-by: Alexandre Boulanger <aboulang2002@yahoo.com>
@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Sep 9, 2019

I have removed the 'event' classes used in the calls to TrainingListener and replaced them with an interface 'IEpochTrainer' that contains what's common between Learning and AsyncThread.
In DataManagerTrainingListener, while making a final checkup, I realized that some features were missing; Video recording and model saving. While we'll need to record videos differently later, I don't want to break things in the meantime so, I added the missing calls to startMonitor() and stopMonitor().

There was also a call to policy.play() in AsyncLearning. I left this one out because it didn't seem to do anything.

Changes summary:

  • Removed event classes and changed TrainingListener accordingly
  • Introduced IEpochTrainer with what's common between Learning and AsyncThread
  • In DataManager.save(): changed parameter type from Learning to ILearning
  • DataManagerTrainingListener: added missing call to dataManager.save(learning) and video recording
  • Added getHistoryProcessor() to ILearning
  • Added DataManagerTrainingListener unit tests
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Sep 17, 2019

This is looking good! Thanks! The only thing is that I don't seem to be able to make the Cartpole example convergence anymore. Could you move the "fixes" for QLearning in another pull request to get this processed separately and get this PR merged right away?

@aboulang2002

This comment has been minimized.

Copy link
Contributor Author

aboulang2002 commented Sep 17, 2019

Thanks @saudet and good catch! I will try to fix the issue and open a new PR.
Also, thanks for your feedback on this PR, I greatly appreciated it.

@saudet saudet requested review from AlexDBlack and saudet Sep 18, 2019
@saudet
saudet approved these changes Sep 18, 2019
@saudet

This comment has been minimized.

Copy link
Member

saudet commented Sep 18, 2019

@AlexDBlack I had to create a CQ because this PR changes over 1000 lines:
https://dev.eclipse.org/ipzilla/show_bug.cgi?id=20832
No need to wait after it to merge this though. Please feel free to merge if it's OK with you.

Copy link
Contributor

AlexDBlack left a comment

Only quickly scanned, but LGTM 👍

@AlexDBlack AlexDBlack merged commit 59f1cbf into eclipse:master Sep 19, 2019
1 check failed
1 check failed
eclipsefdn/eca The author(s) of the pull request is not covered by necessary legal agreements in order to proceed.
Details
@aboulang2002 aboulang2002 deleted the aboulang2002:ab2002_asynclearning_listener branch Sep 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.