Workspace optimizations and refactoring #4900
Conversation
LGTM overall (of course, what else!). The only real criticism I have is that changing the API so that "almost everything" takes a workspace manager (WSM) seems a little excessive at first glance. I'm sure you had your reasons; maybe it doesn't really work any other way. My hunch says it should have been possible to set a WSM config at the layer and network level.
For instance, in the Keras preprocessor changes you had to change the signature of preprocess, but the WSM isn't used at all in the method body (which my IDE complains about).
Another idea worth exploring is to keep the old methods and wrap them like this:
public INDArray bar(INDArray input) {
    return bar(input, LayerWorkspaceMgr.noWorkspaces());   // delegate to the new overload with a no-op workspace manager
}
This way you don't have to pass "no workspaces" manually all the time.
Best regards,
Max P. (stolen from raver)
@@ -174,7 +176,7 @@ public static INDArray reshapeTimeSeriesTo2d(INDArray labels) {
     return new Pair<>(labels2d, predicted2d);
 }

-    INDArray oneDMask = TimeSeriesUtils.reshapeTimeSeriesMaskToVector(outputMask);
+    INDArray oneDMask = TimeSeriesUtils.reshapeTimeSeriesMaskToVector(outputMask, LayerWorkspaceMgr.noWorkspacesImmutable(), ArrayType.INPUT); //TODO
At this point I can't judge what's left to do here.
Fixed, just an old comment.
public enum FwdPassType {
    STANDARD,
    RNN_TIMESTEP,
    RNN_ACT_STORED
Maybe use RNN_ACTIVATIONS_STORED, if that's what's meant by it. I assume that's only used internally, so it doesn't hurt to be a little more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good idea... renamed. 👍
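For reference, after the rename the enum presumably reads as follows (the closing brace is added here, since the diff excerpt above is truncated):

public enum FwdPassType {
    STANDARD,
    RNN_TIMESTEP,
    RNN_ACTIVATIONS_STORED
}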
public ActivationLayer(Activation activation){
    this(new Builder().activation(activation));
}
useful :)
* INPUT: The array set to the input field of a layer (i.e., input activations)<br>
* ACTIVATIONS: The output activations for a layer's feed-forward method<br>
* ACTIVATION_GRAD
* FF_WORKING_MEM
I'd really appreciate docs here! (Going through the PR top to bottom, I wasn't sure at first what ACTIVATION_GRAD was.)
Fixed, good catch. I intended to fill out the rest :)
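For illustration, a sketch of what the filled-out docs might look like; the descriptions for ACTIVATION_GRAD and FF_WORKING_MEM are inferred from the names and this discussion, not quoted from the final code:

public enum ArrayType {
    /** The array set to the input field of a layer (i.e., input activations) */
    INPUT,
    /** The output activations for a layer's feed-forward method */
    ACTIVATIONS,
    /** Gradients of the loss with respect to a layer's activations, used during backprop (inferred meaning) */
    ACTIVATION_GRAD,
    /** Temporary working memory used within a layer's feed-forward method (inferred meaning) */
    FF_WORKING_MEM
}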
    return this;
}

public Builder with(ArrayType type, String workspaceName, WorkspaceConfiguration configuration){
that's cool builder syntax 👍
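A hypothetical usage sketch based on the with(ArrayType, String, WorkspaceConfiguration) signature above; the builder()/build() entry points, import paths, workspace name, and configuration values are all assumptions, not quoted from the PR:

import org.deeplearning4j.nn.workspace.ArrayType;            // assumed package
import org.deeplearning4j.nn.workspace.LayerWorkspaceMgr;    // assumed package
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.AllocationPolicy;

public class WorkspaceMgrExample {
    public static void main(String[] args) {
        // Illustrative workspace configuration - values are made up
        WorkspaceConfiguration conf = WorkspaceConfiguration.builder()
                .initialSize(10 * 1024 * 1024)                 // 10 MB initial size
                .policyAllocation(AllocationPolicy.STRICT)     // allocate exactly the configured size
                .build();

        // Map the ACTIVATIONS array type to a named workspace ("WS_LAYER_ACT" is hypothetical)
        LayerWorkspaceMgr mgr = LayerWorkspaceMgr.builder()
                .with(ArrayType.ACTIVATIONS, "WS_LAYER_ACT", conf)
                .build();
    }
}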
It isn't the greatest for (developer) usability, sure - though it doesn't impact end users unless they write custom layers etc., as it's internal. I initially considered a "Layer.setWorkspaceManager" type design, but eventually rejected it as likely to introduce bugs. We'd be continually swapping out workspace managers in the layers (note that different feed-forward methods have very different workspace configs), and if we forgot to do that at some point, we'd end up with crashes and really hard-to-track-down bugs and performance issues.
Good question, but I also considered and rejected that. I did it in one place (TimeSeriesUtils), but I'm not adding it to the layers, again to avoid hidden performance/memory issues. Also, we have too many Layer methods already, and I'm very hesitant to add even more :)
That was actually a bug (well, an incomplete implementation) on my part, for those Keras preprocessors. I've fixed it now. :)
COMPLETE BUT REQUIRES ADDITIONAL QA BEFORE MERGING
All deeplearning4j-core tests are passing (CPU) but additional testing is required.
ND4J PR to go with this: https://github.com/deeplearning4j/nd4j/pull/2823
This PR overhauls workspaces in DL4J, to improve performance and memory use.
It also adds a lot of workspace validation, to detect invalid workspace use (bugs, bad layer implementations etc).
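As a rough illustration of the kind of check workspace validation can perform (a generic ND4J sketch, not the PR's actual validation code; the workspace id "EXAMPLE_WS" is made up): arrays created inside a workspace are "attached" to it, and arrays created outside are not, so code can verify an array lives where it is expected to.

import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class WorkspaceValidationSketch {
    public static void main(String[] args) {
        // Default workspace configuration
        WorkspaceConfiguration conf = WorkspaceConfiguration.builder().build();

        INDArray inside;
        try (MemoryWorkspace ws = Nd4j.getWorkspaceManager()
                .getAndActivateWorkspace(conf, "EXAMPLE_WS")) {   // made-up workspace id
            inside = Nd4j.create(10);                  // allocated inside the workspace
            System.out.println(inside.isAttached());   // true: array is workspace-attached
        }
        // 'inside' must not be used here: its memory may be reused once the workspace closes.
        // Validation can catch such misuse by checking attachment/workspace state before use.

        INDArray outside = Nd4j.create(10);            // allocated outside any workspace
        System.out.println(outside.isAttached());      // false: detached array
    }
}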
Goals here:
a. Workspace-related code had become unwieldy in MultiLayerNetwork and ComputationGraph FF/BP methods, and has been simplified (and made more maintainable)
b. A lot of additional workspace-related validation has been added - this should make it considerably easier to detect workspace-related bugs in the future (due to bugs in layers, etc)
In terms of design:
a. The new FF/BP implementations are built with workspaces in mind from scratch.
b. Reduced code redundancy (we had essentially the same FF loops in a number of places, with minor variants)
Also fixes: