[nnpack] update && support more op #4519

Merged 6 commits into dmlc:master from tornadomeet:nnpack2 on Jan 8, 2017


3 participants

@tornadomeet
Contributor

This PR is moved from #4373. @mli @clcarwin

- algorithm = nnp_convolution_algorithm_implicit_gemm;
+ nnp_convolution_transform_strategy kts = nnp_convolution_transform_strategy_tuple_based;
+ nnp_status status = nnp_status_success;
+ if (batch_size == 1) {
@piiswrong
piiswrong Jan 4, 2017 Member

Is this for test or train here?

@tornadomeet
tornadomeet Jan 5, 2017 Contributor

@piiswrong Yes, NNPACK here supports both train and test.
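The diff above sets a tuple-based transform strategy and then branches on `batch_size == 1`. A minimal sketch of that dispatch, using a hypothetical stand-in enum (`TransformStrategy`, `ChooseStrategy`) so it compiles without `nnpack.h` — the real constants are `nnp_convolution_transform_strategy_tuple_based` and friends from the NNPACK header:

```cpp
#include <cassert>

// Hypothetical mirror of NNPACK's transform-strategy enum; stands in for
// nnp_convolution_transform_strategy so this sketch is self-contained.
enum class TransformStrategy { kTupleBased, kBlockBased };

// Sketch of the selection in the diff: tuple-based is the default, and the
// batch_size == 1 branch takes NNPACK's single-image (inference) path.
TransformStrategy ChooseStrategy(int batch_size) {
  if (batch_size == 1) {
    // Single image: use the tuple-based transform, as in the diff.
    return TransformStrategy::kTupleBased;
  }
  // Assumed batched alternative; the exact choice is NNPACK's, not shown here.
  return TransformStrategy::kBlockBased;
}
```

This is only an illustration of the control flow in the hunk, not NNPACK's actual internals.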

+ // nnp_fully_connected_output will optimize for batch-size > 1,
+ // but the NNPACK FullyConnected result was found to be wrong when batch_size != 2^n,
+ // so NNPACK is only used here when batch_size == 2^n.
+ if ((batch_size > 1) && (!(batch_size & (batch_size - 1)))) {
@piiswrong
piiswrong Jan 4, 2017 Member

do this in create operator

@tornadomeet
tornadomeet Jan 5, 2017 edited Contributor

@piiswrong This cannot be done in the FC op (it can be done in the conv op, and I have updated conv accordingly), because we cannot get the batch size in CreateOp unless we change the interface as in conv.

@piiswrong
piiswrong Jan 7, 2017 Member

Could you change the interface? I don't see any harm in that.

@tornadomeet
tornadomeet Jan 7, 2017 Contributor

No problem, I'll change it.
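The condition being discussed, `!(batch_size & (batch_size - 1))`, is the standard bit trick for testing a power of two: a power of two has exactly one set bit, so clearing its lowest set bit yields zero. A small self-contained sketch (the helper name `IsPowerOfTwo` is mine, not from the patch):

```cpp
#include <cstdint>

// Bit trick from the diff: batch_size is a power of two (2^n) exactly when
// it has a single set bit, i.e. (batch_size & (batch_size - 1)) == 0.
// The FC path gates on this because wrong NNPACK results were observed
// for batch sizes that are not powers of two.
inline bool IsPowerOfTwo(uint32_t batch_size) {
  return batch_size > 0 && (batch_size & (batch_size - 1)) == 0;
}
```

Note the explicit `batch_size > 0` guard: the bare bit trick would wrongly accept zero, which the diff avoids by also requiring `batch_size > 1`.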

+ use_nnpack = true;
+ }
+ // nnp_convolution_output will do optimization for batch-size > 1
+ if ((batch_size > 1) && (param_.stride[0] == 1) &&
@piiswrong
piiswrong Jan 4, 2017 Member

do this in create operator

@tornadomeet
tornadomeet Jan 5, 2017 edited Contributor

@piiswrong Done.
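What this thread converged on: decide once, at operator-creation time (CreateOp), whether NNPACK can handle the convolution, instead of re-testing inside every Forward call. A hedged sketch of that gate, mirroring the diff's condition that `nnp_convolution_output` only optimizes `batch_size > 1` with unit stride (the `ConvShape` struct and `UseNnpackConv` name are hypothetical stand-ins for the real operator parameters):

```cpp
// Minimal stand-in for the convolution parameters available in CreateOp.
struct ConvShape {
  int batch_size;
  int stride_h;
  int stride_w;
};

// Sketch of the CreateOp-time check: nnp_convolution_output optimizes
// batch_size > 1, but only supports unit stride, so fall back otherwise.
// (batch_size == 1 is handled separately via NNPACK's inference path.)
bool UseNnpackConv(const ConvShape& s) {
  return s.batch_size > 1 && s.stride_h == 1 && s.stride_w == 1;
}
```

Doing this once at creation avoids a per-forward branch and keeps the NNPACK/fallback decision in one place.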

@clcarwin
Contributor
clcarwin commented Jan 5, 2017

@mli @tornadomeet

NNPACK performance on Android:

CPU: 4x ARM Cortex-A53 1.21GHz
System: armv7l 32-bit Android 4.4.4

| network | batch size | nnpack+openblas (ms) | openblas (ms) | speedup |
| --- | --- | --- | --- | --- |
| caffenet | 1 | 1469 | 1540 | 1.05x |
| caffenet | 2 | 3027 | 2456 | 0.81x |
| caffenet | 4 | 4444 | 3941 | 0.89x |
| caffenet | 8 | 7191 | 6898 | 0.96x |
| caffenet | 16 | 12724 | 13224 | 1.04x |
| vgg16 | 1 | 5719 | 17567 | 3.07x |
| vgg16 | 2 | 12836 | OOM | |
| vgg16 | 4 | 16954 | OOM | |
| vgg16 | 8 | 27548 | OOM | |
| inception-bn | 1 | 4275 | 6520 | 1.52x |
| inception-bn | 2 | 8798 | 10308 | 1.17x |
| inception-bn | 4 | 11667 | 17646 | 1.51x |
| inception-bn | 8 | 17073 | 30554 | 1.79x |
| inception-bn | 16 | 29097 | 54431 | 1.87x |
| inception-v3 | 1 | 12524 | 12388 | 0.99x |
| inception-v3 | 2 | 22660 | 22324 | 0.98x |
| inception-v3 | 4 | 40081 | 40017 | 0.99x |
| inception-v3 | 8 | 71303 | 70954 | 0.99x |
| gen_v3 | 1 | 3012 | 66429 | 20.05x |
| gen_v3 | 2 | 11724 | 141053 | 12.03x |
| gen_v3 | 4 | 16689 | OOM | |
| gen_v3 | 8 | 31948 | OOM | |
| gen_v3 | 16 | 57284 | OOM | |

gen_v3: mxnet/example/neural-style/end_to_end/gen_v3.py

@tornadomeet
Contributor
tornadomeet commented Jan 5, 2017 edited

@clcarwin Thanks very much! It seems NNPACK is very useful for 3*3 conv, as in gen_v3 and vgg. Also, it seems that NNPACK saves memory; why is that?

@tornadomeet
Contributor

@piiswrong I have moved use_nnpack into CreateOp() of the convolution operator.

@piiswrong
Member

LGTM after changing the FC layer's CreateOp.

tornadomeet added some commits Jan 4, 2017
@tornadomeet tornadomeet [nnpack]docs and makefile 6cefc30
@tornadomeet tornadomeet add missing files f386480
@tornadomeet tornadomeet udpate with recently docs change a50bbd3
@tornadomeet tornadomeet move use_nnpack to when creating op c0d2817
@tornadomeet tornadomeet change fully_connected createop interface to get batch-size 381534c
@tornadomeet
Contributor

@piiswrong done.

@piiswrong piiswrong Merge branch 'master' into nnpack2
cec9f4b
@piiswrong piiswrong merged commit 29307c2 into dmlc:master Jan 8, 2017

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
default Build finished.
@tornadomeet tornadomeet deleted the tornadomeet:nnpack2 branch Jan 9, 2017
@tornadomeet
Contributor

@clcarwin Would you add a description to https://github.com/dmlc/mxnet/blob/master/docs/how_to/nnpack.md of how to build/install NNPACK with MXNet on ARM/Android? This should be very helpful for newbies. Thanks.

@clcarwin
Contributor

@tornadomeet I use Android.mk and Application.mk to build my Android MXNet projects with the ndk-build system. It is incompatible with, and more complex than, MXNet's amalgamation method, so my experience may not suit newbies.

If the MXNet team wants to deprecate the amalgamation method someday, I can PR the ndk-build method.

@tornadomeet
Contributor

@clcarwin got it, thanks.

@rravu3 rravu3 pushed a commit to rravu3/mxnet that referenced this pull request Jan 21, 2017
@tornadomeet tornadomeet + Rahul Ravu [nnpack] update && support more op (#4519)
* [nnpack]docs and makefile

* add missing files

* udpate with recently docs change

* move use_nnpack to when creating op

* change fully_connected createop interface to get batch-size
b732569