
Workspace optimizations and refactoring #4900

Merged
merged 79 commits into from Apr 24, 2018

Conversation

@AlexDBlack
Contributor

commented Apr 6, 2018

COMPLETE BUT REQUIRES ADDITIONAL QA BEFORE MERGING
All deeplearning4j-core tests are passing (CPU) but additional testing is required.

ND4J PR to go with this: deeplearning4j/nd4j#2823

This PR overhauls workspaces in DL4J, to improve performance and memory use.
It also adds a lot of workspace validation, to detect invalid workspace use (bugs, bad layer implementations etc).

Goals here:

  1. Eliminate the performance bottleneck of migrateInputs() etc.: all arrays are allocated directly in the final workspace they belong in
  2. Reduce memory requirements
  3. Improve maintainability and considerably simplify workspace usage:
    a. Workspace-related code had become unwieldy in the MultiLayerNetwork and ComputationGraph FF/BP methods; it has been simplified and made more maintainable
    b. A lot of additional workspace-related validation has been added - this should make it considerably easier to detect workspace-related bugs (due to bugs in layers, etc.) in the future

In terms of design:

  1. Workspace-related code should be handled at the level of array types (input, activations, etc.), not workspace names ("loop_ff" etc.). Layers don't need to know or care about the workspace configuration - only that input arrays use the INPUT enumeration, activation arrays use the ACTIVATIONS enumeration, and so on.
  2. Layers, preprocessors, etc. now need to know about workspaces. This means API changes.
  3. I've also slightly simplified the layer API by removing a number of superfluous methods (unnecessary activate/output variants).
  4. This is a substantial rewrite of the MLN and CG feed-forward and backprop methods:
    a. They are built from scratch with workspaces in mind.
    b. Code redundancy is reduced (we had essentially the same FF loop in a number of places, with minor variants).
  5. Important note: I've deprecated WorkspaceMode.SINGLE/SEPARATE and added WorkspaceMode.ENABLED - the single/separate modes don't really exist anymore.
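To illustrate point 1, here is a minimal, self-contained sketch of the array-type-keyed design. The ArrayType values mirror the enumeration discussed in this PR, but SimpleWorkspaceMgr is a hypothetical stand-in for the real LayerWorkspaceMgr, not DL4J's actual API:

```java
import java.util.EnumMap;
import java.util.Map;

public class WorkspaceSketch {
    // Array types a layer cares about; mirrors the enumeration described in this PR
    public enum ArrayType { INPUT, ACTIVATIONS, ACTIVATION_GRAD, FF_WORKING_MEM }

    // Hypothetical stand-in for LayerWorkspaceMgr: maps array types to workspace
    // names, so layers never reference workspace names ("loop_ff" etc.) directly
    public static class SimpleWorkspaceMgr {
        private final Map<ArrayType, String> workspaces = new EnumMap<>(ArrayType.class);

        public SimpleWorkspaceMgr with(ArrayType type, String workspaceName) {
            workspaces.put(type, workspaceName);
            return this;
        }

        // A layer asks "which workspace should this array type go in?";
        // unmapped types fall back to detached (regular, non-workspace) memory
        public String workspaceFor(ArrayType type) {
            return workspaces.getOrDefault(type, "DETACHED");
        }
    }

    public static void main(String[] args) {
        SimpleWorkspaceMgr mgr = new SimpleWorkspaceMgr()
                .with(ArrayType.INPUT, "WS_LAYER_INPUT")
                .with(ArrayType.ACTIVATIONS, "WS_LAYER_ACT");
        System.out.println(mgr.workspaceFor(ArrayType.INPUT));          // WS_LAYER_INPUT
        System.out.println(mgr.workspaceFor(ArrayType.FF_WORKING_MEM)); // DETACHED
    }
}
```

The point of the design is that the network (MLN/CG) decides the workspace configuration once per forward/backward pass, while layers only deal in array types.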

Also fixes:

@maxpumperla

left a comment

LGTM overall (of course, what else!) The only real criticism I have is that changing the API so that "almost everything" takes a workspace manager (WSM) seems a little excessive at first glance. I'm sure you had your reasons, maybe it doesn't really work any other way. My hunch says it should've been possible to set a WSM config at layer and network level.

For instance, in the Keras preprocessor changes you had to change the signature of preprocess, but the WSM isn't used at all in the body (which my IDE complains about).

Another idea worth exploring is to keep the old methods and wrap them like this:

public Foo bar(INDArray input) {
    return bar(input, LayerWorkspaceMgr.noWorkspaces());
}

This way you don't have to pass a no-workspaces manager manually all the time.

Best regards,
Max P. (stolen from raver)
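The overload-wrapper pattern suggested above can be sketched as a runnable example. WorkspaceMgr is a hypothetical stand-in for DL4J's LayerWorkspaceMgr, and activate is an invented method used purely for illustration:

```java
public class OverloadSketch {
    // Hypothetical stand-in for LayerWorkspaceMgr
    static final class WorkspaceMgr {
        final boolean enabled;
        private WorkspaceMgr(boolean enabled) { this.enabled = enabled; }
        static WorkspaceMgr noWorkspaces() { return new WorkspaceMgr(false); }
    }

    // "New" API: the caller provides a workspace manager explicitly
    static String activate(double[] input, WorkspaceMgr mgr) {
        return (mgr.enabled ? "workspace" : "detached") + ":" + input.length;
    }

    // "Old" API kept as a thin wrapper that defaults to no workspaces
    static String activate(double[] input) {
        return activate(input, WorkspaceMgr.noWorkspaces());
    }

    public static void main(String[] args) {
        System.out.println(activate(new double[]{1, 2, 3})); // detached:3
    }
}
```

As the author notes below, the trade-off is that such wrappers can hide unintended detaches (and thus performance/memory costs) from callers, which is why the PR adds them only in limited places.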

@@ -174,7 +176,7 @@ public static INDArray reshapeTimeSeriesTo2d(INDArray labels) {
return new Pair<>(labels2d, predicted2d);
}

- INDArray oneDMask = TimeSeriesUtils.reshapeTimeSeriesMaskToVector(outputMask);
+ INDArray oneDMask = TimeSeriesUtils.reshapeTimeSeriesMaskToVector(outputMask, LayerWorkspaceMgr.noWorkspacesImmutable(), ArrayType.INPUT); //TODO

@maxpumperla Apr 16, 2018

At this point I can't judge what's left to do here.

@AlexDBlack (Author, Contributor) Apr 17, 2018

Fixed, just an old comment.

public enum FwdPassType {
    STANDARD,
    RNN_TIMESTEP,
    RNN_ACT_STORED
}

@maxpumperla Apr 16, 2018

Maybe use RNN_ACTIVATIONS_STORED, if that's what's meant by it. I assume it's only used internally, so it doesn't hurt to be a little more explicit.

@AlexDBlack (Author, Contributor) Apr 17, 2018

Yep, good idea... renamed. 👍

public ActivationLayer(Activation activation){
    this(new Builder().activation(activation));
}

* INPUT: The array set to the input field of a layer (i.e., input activations)<br>
* ACTIVATIONS: The output activations for a layer's feed-forward method<br>
* ACTIVATION_GRAD
* FF_WORKING_MEM

@maxpumperla Apr 16, 2018

I'd really appreciate having docs here! (Going through the PR top to bottom, I wasn't sure at first what ACTIVATION_GRAD was.)

@AlexDBlack (Author, Contributor) Apr 17, 2018

Fixed, good catch. I intended to fill out the rest :)

return this;
}

public Builder with(ArrayType type, String workspaceName, WorkspaceConfiguration configuration){

@maxpumperla Apr 16, 2018

that's cool builder syntax 👍

@AlexDBlack (Contributor, Author) commented Apr 17, 2018

@maxpumperla

> The only real criticism I have is that changing the API so that "almost everything" takes a workspace manager (WSM) seems a little excessive at first glance. I'm sure you had your reasons, maybe it doesn't really work any other way. My hunch says it should've been possible to set a WSM config at layer and network level.

It isn't the greatest for (developer) usability, sure - though it doesn't impact end users unless they write custom layers etc., as it's all internal.

I initially considered a "Layer.setWorkspaceManager" type design, but eventually rejected it as likely to introduce bugs. We'd be continually swapping out workspace managers in the layers (note that different feed-forward methods have very different workspace configs), and if we forgot to do so at some point, we could end up with crashes and really hard-to-track-down bugs and performance issues.
tl;dr I went with explicit and stateless as the approach least likely to introduce bugs and hidden performance/memory issues (unnecessary/unintended detaches), at the expense of those API changes.

> Another idea worth exploring is to keep the old methods and wrap them like this:

Good suggestion, but I also considered and rejected it. I did this in one place (time series utils), but I'm not adding it to the layers, again to avoid hidden performance/memory issues. Also, we have too many Layer methods already, and I'm very hesitant to add even more :)

> For instance, in the Keras preprocessor changes you had to change the signature of preprocess, but the WSM isn't used at all in the body (which my IDE complains about).

That was actually a bug (well, an incomplete implementation) on my part for those Keras preprocessors. I've fixed it now. :)

@AlexDBlack AlexDBlack force-pushed the ab_workspace_opt branch from e8283ab to 04cc528 Apr 18, 2018
