
Update DepthwiseConvolution2DLayer.java #6885

Closed
wants to merge 0 commits into from

Conversation

@sorech commented Dec 17, 2018

Interpret weight size dimensions correctly (like other conv layers).

@raver119 (Contributor)

Please explain this change.

@sorech (Author) commented Dec 17, 2018

The standard in other conv layers is:
int outDepth = (int) weights.size(0);
int inDepth = (int) weights.size(1);
int kH = (int) weights.size(2);
int kW = (int) weights.size(3);
Here, depthMultiplier replaces outDepth.
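
(For illustration, a hedged sketch of what that convention would mean for the depthwise layer; the names mirror the snippet above and are not necessarily the actual patch:)

int depthMultiplier = (int) weights.size(0); // replaces outDepth
int inDepth = (int) weights.size(1);
int kH = (int) weights.size(2);
int kW = (int) weights.size(3);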

Running it without this change, it throws an error claiming to want an input depth equal to whatever you set as the kernel height.

@maxpumperla (Contributor)

@sorech I appreciate the effort, but then you should go all the way. The weights come in a certain order, see here:

val weightsShape = new long[] {kernel[0], kernel[1], inputDepth, depthMultiplier};

which was chosen like this since it reflects our C++ impl:

auto weights = INPUT_VARIABLE(1); // [kH, kW, iC, mC] always

Now, unless you want to change everything down to that level (including tests, because what you did right now will break this layer), I'd suggest not doing it. The weights aren't magically in the order you want them to be :D
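
(A hedged illustration of the mismatch: if weights did arrive in [depthMultiplier, inputDepth, kH, kW] order, an ND4J permute along these lines would map them to the order the C++ op expects; a sketch, not code from the patch:)

import org.nd4j.linalg.api.ndarray.INDArray;

// hypothetical: reorder [mC, iC, kH, kW] -> [kH, kW, iC, mC]
INDArray permuted = weights.permute(2, 3, 1, 0);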

@sorech (Author) commented Dec 17, 2018

That all sounds great, but it does not work with the shape of the weight matrix I get using nn.conf.DepthwiseConvolution2D, which in turn calls nn.params.DepthwiseConvolutionParamInitializer.

The conf is:
depthMultiplier = 5
kernelSize = [4, 2]
nIn = 7
and the resulting weight tensor has shapeInformation = [4,5,7,4,2,56,8,2,1,0,1,99], i.e. rank 4 with shape [5, 7, 4, 2].
That maps poorly to the expected {kernel[0], kernel[1], inputDepth, depthMultiplier} order, which here would be [4, 2, 7, 5].
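
(For reference, a minimal sketch of that configuration, assuming the standard DL4J builder API; exact method names are from memory and may differ slightly:)

import org.deeplearning4j.nn.conf.layers.DepthwiseConvolution2D;

DepthwiseConvolution2D layer = new DepthwiseConvolution2D.Builder()
        .kernelSize(4, 2)        // kH = 4, kW = 2
        .nIn(7)                  // input depth
        .depthMultiplier(5)
        .build();
// resulting weight shape: [5, 7, 4, 2] = [depthMultiplier, nIn, kH, kW]
// expected by the native op: [4, 2, 7, 5] = [kH, kW, nIn, depthMultiplier]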

Should the fix rather go in DepthwiseConvolutionParamInitializer?

@AlexDBlack (Contributor)

@sorech Maybe start with a test that reproduces the issue?
Given we already have tests for this layer (that are passing), the problem you are running into might not be exactly what it seems...
https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/gradientcheck/CNNGradientCheckTest.java#L1184-L1250
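
(A hedged sketch of a minimal reproduction along those lines; it assumes MultiLayerNetwork.init(INDArray, boolean), which re-initializes from existing flattened params and thereby exercises the initializeParams = false path discussed below:)

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.DepthwiseConvolution2D;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .list()
        .layer(new DepthwiseConvolution2D.Builder()
                .kernelSize(4, 2).nIn(7).depthMultiplier(5)
                .build())
        .setInputType(InputType.convolutional(8, 8, 7))
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();                        // initializeParams = true path

MultiLayerNetwork net2 = new MultiLayerNetwork(conf.clone());
net2.init(net.params(), true);     // initializeParams = false (reshape) path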

@sorech (Author) commented Dec 20, 2018

Sorry for the delay, been busy. That test uses a different code path (initializeParams = true) in DepthwiseConvolutionParamInitializer:

if (initializeParams) {
    // path used by the gradient-check test: weights freshly initialized
    // in [kH, kW, inputDepth, depthMultiplier] order
    Distribution dist = Distributions.createDistribution(layerConf.getDist());
    int[] kernel = layerConf.getKernelSize();
    int[] stride = layerConf.getStride();
    val inputDepth = layerConf.getNIn();
    double fanIn = inputDepth * kernel[0] * kernel[1];
    double fanOut = depthMultiplier * kernel[0] * kernel[1] / ((double) stride[0] * stride[1]);
    val weightsShape = new long[] {kernel[0], kernel[1], inputDepth, depthMultiplier};
    return WeightInitUtil.initWeights(fanIn, fanOut, weightsShape, layerConf.getWeightInit(), dist, 'c',
            weightView);
} else {
    // path hit when params are supplied externally: note the different
    // [depthMultiplier, nIn, kH, kW] reshape order
    int[] kernel = layerConf.getKernelSize();
    return WeightInitUtil.reshapeWeights(
            new long[] {depthMultiplier, layerConf.getNIn(), kernel[0], kernel[1]}, weightView, 'c');
}

My code uses initializeParams = false, so the shape is given by line 209 (the else branch) instead of line 202, which is what the test exercises.

As you can see, line 209 uses {depthMultiplier, layerConf.getNIn(), kernel[0], kernel[1]}
instead of {kernel[0], kernel[1], inputDepth, depthMultiplier}.
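
(A hedged sketch of the fix that follows from this, making the reshape branch use the same order as the init branch; an illustration, not the merged change:)

} else {
    int[] kernel = layerConf.getKernelSize();
    // hypothetical fix: match the [kH, kW, nIn, depthMultiplier] order
    // used by the initializeParams = true branch
    return WeightInitUtil.reshapeWeights(
            new long[] {kernel[0], kernel[1], layerConf.getNIn(), depthMultiplier}, weightView, 'c');
}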

@raver119 (Contributor)

Ah, so that's what should be fixed then :)

@sorech (Author) commented Dec 20, 2018

@raver119 Will you take care of it?

@maxpumperla (Contributor)

@sorech @raver119 I can take care of it. Cheers.

@maxpumperla (Contributor)

@sorech maybe file an issue for this and I'll start a new PR, ok?

@zhangy10

@maxpumperla Hi, do you have any DepthwiseConvolution2D examples? I am looking for how to use DepthwiseConvolution2D to build a MobileNet (see open issue #8423).

So, is it possible to build up a MobileNet using DepthwiseConvolution2D? Any examples would be great. Thanks.

@maxpumperla (Contributor)

@zhangy10 unfortunately we don't have any advanced examples for this layer, but all gradient tests pass and so does Keras model import. The layer can be used pretty much the same way as many other conv 2d layers. If your goal is to replicate a MobileNet architecture, that shouldn't cause you more problems than other components of this network.
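
(To illustrate, a minimal, untested sketch of one depthwise-separable block in the MobileNet style, assuming the standard DL4J layer builders; channel counts are placeholders:)

import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DepthwiseConvolution2D;
import org.nd4j.linalg.activations.Activation;

// depthwise 3x3 conv followed by a pointwise 1x1 conv
NeuralNetConfiguration.ListBuilder block = new NeuralNetConfiguration.Builder()
        .list()
        .layer(new DepthwiseConvolution2D.Builder()
                .kernelSize(3, 3).stride(1, 1)
                .nIn(32)                    // placeholder input depth
                .depthMultiplier(1)
                .activation(Activation.RELU)
                .build())
        .layer(new ConvolutionLayer.Builder()
                .kernelSize(1, 1).stride(1, 1)
                .nOut(64)                   // placeholder output channels
                .activation(Activation.RELU)
                .build());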

Is there anything in particular that you're struggling with? What have you tried so far?

@zhangy10

@AlexDBlack Hi Alex, do you have any examples showing how to use DepthwiseConvolution2D in the latest release version? And is it good for building a simple MobileNet?

Many thanks.

@AlexDBlack
Copy link
Contributor

@zhangy10 No examples, because it should be possible to use a depthwise convolution layer anywhere you use a "normal" convolution layer. We have unit tests though:
https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/gradientcheck/CNNGradientCheckTest.java#L1178-L1246

As for whether you can use it in a MobileNet architecture - I've never tried it, but I don't see why not. Whether it's better than standard conv2d layers is something you would have to test.

@zhangy10
Copy link

@AlexDBlack Thanks for that, I will have a look at the tests.
