
Workspace optimizations and refactoring #4900

Merged: 79 commits, Apr 24, 2018
Conversation

@AlexDBlack (Contributor) commented Apr 6, 2018

COMPLETE BUT REQUIRES ADDITIONAL QA BEFORE MERGING
All deeplearning4j-core tests are passing (CPU) but additional testing is required.

ND4J PR to go with this: https://github.com/deeplearning4j/nd4j/pull/2823

This PR overhauls workspaces in DL4J, to improve performance and memory use.
It also adds a lot of workspace validation, to detect invalid workspace use (bugs, bad layer implementations etc).

Goals here:

  1. Eliminate the performance bottleneck of migrateInputs() etc.: all arrays are allocated directly in the final workspace they should be in
  2. Reduce memory requirements
  3. Improve maintainability and considerably simplify workspace usage:
    a. Workspace-related code had become unwieldy in the MultiLayerNetwork and ComputationGraph FF/BP methods; it has been simplified and made more maintainable
    b. A lot of additional workspace-related validation has been added - this should make it considerably easier to detect workspace-related problems (faulty layer implementations, etc.) in the future

In terms of design:

  1. Workspace-related code should be handled at the level of array types (input, activations, etc.), not workspace names ("loop_ff" etc.). Layers etc. don't need to know or care about the workspace configuration - simply that input arrays should use the INPUT enumeration, activation arrays the ACTIVATIONS enumeration, and so on (see the sketch after this list).
  2. Layers, preprocessors, etc. will be workspace-aware; this means API changes.
  3. I've also slightly simplified the layer API by cleaning up a number of superfluous methods (unnecessary activate/output methods)
  4. This is a substantial rewrite of the MLN and CG feedforward and backprop methods:
    a. They are built with workspaces in mind from scratch.
    b. Reduced code redundancy (we had essentially the same FF loop in a number of places, with minor variations)
  5. Important note: I've deprecated WorkspaceMode.SINGLE/SEPARATE and added WorkspaceMode.ENABLED - the single/separate modes don't really exist anymore
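
As a rough sketch of what points 1 and 2 mean in practice (hedged: the exact method names and shape handling below are illustrative, not copied from this diff):

//Illustrative sketch only: a layer's forward pass under the new design.
//The layer allocates by array type; the LayerWorkspaceMgr maps that type to
//whatever workspace (if any) the network configured - no workspace names here.
public INDArray activate(boolean training, LayerWorkspaceMgr workspaceMgr) {
    //input is the layer's input field (ArrayType.INPUT)
    INDArray out = workspaceMgr.createUninitialized(ArrayType.ACTIVATIONS, input.shape());
    //... compute this layer's op, writing the result into out ...
    return out;
}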

Also fixes:

@maxpumperla (Contributor) left a comment

LGTM overall (of course, what else!). The only real criticism I have is that changing the API so that "almost everything" takes a workspace manager (WSM) seems a little excessive at first glance. I'm sure you had your reasons; maybe it doesn't really work any other way. My hunch says it should've been possible to set a WSM config at the layer and network level.

For instance, in the Keras preprocessor changes you had to change the signature of preprocess, but the WSM isn't used at all in the body (which my IDE complains about).

Another idea worth exploring is to keep the old methods and wrap them like this:

public INDArray bar(INDArray input) {
   return bar(input, LayerWorkspaceMgr.noWorkspaces());
}

This way you don't have to pass the no-workspace manager manually all the time.

Best regards,
Max P. (stolen from raver)

@@ -174,7 +176,7 @@ public static INDArray reshapeTimeSeriesTo2d(INDArray labels) {
return new Pair<>(labels2d, predicted2d);
}

-    INDArray oneDMask = TimeSeriesUtils.reshapeTimeSeriesMaskToVector(outputMask);
+    INDArray oneDMask = TimeSeriesUtils.reshapeTimeSeriesMaskToVector(outputMask, LayerWorkspaceMgr.noWorkspacesImmutable(), ArrayType.INPUT); //TODO
Contributor:

At this point I can't judge what's left to do here.

Contributor Author:

Fixed, just an old comment.

public enum FwdPassType {
    STANDARD,          //Standard feed-forward pass
    RNN_TIMESTEP,      //Single-step forward pass, as used by rnnTimeStep(...)
    RNN_ACT_STORED     //Forward pass using stored RNN state/activations
}
Contributor:

maybe use RNN_ACTIVATIONS_STORED, if that's what's meant by it. I assume it's only used internally, so it doesn't hurt to be a little more explicit.

Contributor Author:

Yep, good idea... renamed. 👍

public ActivationLayer(Activation activation){
this(new Builder().activation(activation));
}

Contributor:

useful :)
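
(For example, new ActivationLayer(Activation.RELU) now works directly, instead of going through new ActivationLayer.Builder().activation(Activation.RELU).build().)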

* INPUT: The array set to the input field of a layer (i.e., input activations)<br>
* ACTIVATIONS: The output activations for a layer's feed-forward method<br>
* ACTIVATION_GRAD: The gradients with respect to a layer's activations (the epsilons passed backward during backprop)<br>
* FF_WORKING_MEM: Working memory for temporary/intermediate arrays during a layer's feed-forward pass<br>
Contributor:

I'd really appreciate having docs here! (Going through the PR top to bottom, I wasn't sure at first what ACTIVATION_GRAD was.)

Contributor Author:

Fixed, good catch. I intended to fill out the rest :)

return this;
}

public Builder with(ArrayType type, String workspaceName, WorkspaceConfiguration configuration){
Contributor:

that's cool builder syntax 👍
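
A hedged usage sketch for the builder above (the builder() factory, the "WS_EXAMPLE" name, and the default WorkspaceConfiguration are illustrative assumptions, not taken from this diff):

//Hypothetical: route both input and activation arrays to a single workspace
WorkspaceConfiguration wsConf = WorkspaceConfiguration.builder().build();  //defaults, for illustration
LayerWorkspaceMgr mgr = LayerWorkspaceMgr.builder()
        .with(ArrayType.INPUT, "WS_EXAMPLE", wsConf)
        .with(ArrayType.ACTIVATIONS, "WS_EXAMPLE", wsConf)
        .build();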

@AlexDBlack (Contributor Author) commented Apr 17, 2018

@maxpumperla

> The only real criticism I have is that changing the API so that "almost everything" takes a workspace manager (WSM) seems a little excessive at first glance.

> I'm sure you had your reasons, maybe it doesn't really work any other way. My hunch says it should've been possible to set a WSM config at layer and network level.

It isn't the greatest for (developer) usability, sure - though it doesn't impact end users unless they write custom layers etc., as it's all internal.

I initially considered a "Layer.setWorkspaceManager" type design, but eventually rejected it as likely to introduce bugs. We'd be continually swapping out workspace managers in the layers (note that different feed-forward methods have very different workspace configurations), and if we forgot to do that at some point, we could end up with crashes and really hard-to-track-down bugs and performance issues.
tl;dr I went with explicit and stateless as the approach least likely to introduce bugs and hidden performance/memory issues (unnecessary/unintended detaches), at the expense of those API changes.

> Another idea worth exploring is to keep the old methods and wrap them like this:

Good question, but I also considered and rejected it. I did it in one place (time series utils) but I'm not adding this to the layers, again to avoid hidden performance/memory issues. Also we have too many Layer methods already, and I'm very hesitant to add even more :)
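
(A sketch of what that wrapper looks like for the time series utils case, based on the new signature visible in the diff above; the exact overload is hedged:)

public static INDArray reshapeTimeSeriesMaskToVector(INDArray timeSeriesMask) {
    //Convenience overload: callers outside workspace-aware code paths
    //delegate with a no-workspace manager instead of constructing one
    return reshapeTimeSeriesMaskToVector(timeSeriesMask, LayerWorkspaceMgr.noWorkspaces(), ArrayType.INPUT);
}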

> For instance, in the Keras preprocessor changes you had to change the signature of preprocess, but the WSM isn't used at all in the body (which my IDE complains about).

That was actually a bug (well, an incomplete implementation) on my part, for those Keras preprocessors. I've fixed it now. :)
