I used your ChannelwiseConvolution to implement MobileNet. However, I can only get 2 s/image without MKL (0.90 s/image with MKL) on CPU, while TensorFlow's MobileNet runs at 0.059 s/image. Could you suggest any ideas for improving the speed on CPU?
The running speed of the channel-wise convolution operation really depends on the parallelization strategy.
My implementation uses BatchGEMM, which is slightly faster on very small feature maps (e.g. 7x7 or 14x14).
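To make the BatchGEMM idea concrete, here is a minimal NumPy sketch (not the actual implementation; the function name and loop structure are purely illustrative): depthwise convolution reduces to one small GEMM per channel after an im2col unfold, and a BatchGEMM kernel runs all of those small GEMMs in a single batched call.

```python
import numpy as np

def depthwise_conv_batchgemm(x, w, stride=1, pad=1):
    """Illustrative sketch: x is a (C, H, W) input, w is (C, kH, kW),
    i.e. one filter per channel, as in channel-wise convolution."""
    C, H, W = x.shape
    _, kH, kW = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    Ho = (H + 2 * pad - kH) // stride + 1
    Wo = (W + 2 * pad - kW) // stride + 1
    out = np.empty((C, Ho, Wo))
    for c in range(C):  # one small GEMM per channel; BatchGEMM batches these
        cols = np.empty((kH * kW, Ho * Wo))  # im2col buffer for this channel
        for i in range(Ho):
            for j in range(Wo):
                patch = xp[c, i*stride:i*stride+kH, j*stride:j*stride+kW]
                cols[:, i * Wo + j] = patch.ravel()
        # (1, kH*kW) x (kH*kW, Ho*Wo) matrix product, then reshape
        out[c] = (w[c].ravel() @ cols).reshape(Ho, Wo)
    return out
```

Each per-channel GEMM is tiny (a 1 x kHkW by kHkW x HoWo product), which is why batching them pays off mainly on small feature maps.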
For larger feature maps (e.g. 56x56 and 28x28), I'd recommend using the official convolutional layer with the option num_group = num_filter, as in the sketch below.
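A minimal sketch of that suggestion using MXNet's stock Convolution operator (the block structure and layer names here are made up for illustration; only the num_group = num_filter trick comes from the thread):

```python
import mxnet as mx

def depthwise_block(data, channels, stride=1, name='dw'):
    # Grouped convolution with num_group == num_filter == channels
    # applies one 3x3 filter per input channel, i.e. channel-wise conv.
    conv = mx.sym.Convolution(data=data, num_filter=channels,
                              num_group=channels, kernel=(3, 3),
                              stride=(stride, stride), pad=(1, 1),
                              no_bias=True, name=name + '_conv')
    bn = mx.sym.BatchNorm(data=conv, name=name + '_bn')
    return mx.sym.Activation(data=bn, act_type='relu', name=name + '_relu')

data = mx.sym.Variable('data')        # e.g. a (N, 32, 56, 56) feature map
net = depthwise_block(data, channels=32)
```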
Still, I don't think very high training/testing speed can be achieved using only these high-level interfaces. Deeply optimized CUDA code is necessary for fast channel-wise convolution. :)
@cypw Thanks. I used the official convolutional layer with groups to handle the larger feature maps, and I now get 0.401 s/image (CPU) in MXNet with MKL. That is fast enough for some of my classification tasks.