
Support multithreading in the CPU mode of Solver::Solve #79

Closed
kloudkl opened this issue Feb 7, 2014 · 3 comments

kloudkl commented Feb 7, 2014

In each iteration of Solver::Solve, there are four chances to accelerate the computation.
The first opportunity is the most complex one, since Net::ForwardBackward invokes the Forward and Backward passes of all the layers that comprise the net.

      Dtype loss = net_->ForwardBackward(bottom_vec);

The second chance is more straightforward: an OpenMP directive is enough to parallelize the independent computation for each param_id, as sketched below.

      ComputeUpdateValue();
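A minimal standalone sketch of what I mean (not the actual SGDSolver code; the nested-vector layout and the plain SGD step are placeholders for illustration; build with -fopenmp):

      #include <vector>

      // Each param_id owns a disjoint parameter blob, so the iterations are
      // independent and a single OpenMP directive parallelizes the loop.
      void ComputeUpdateValues(std::vector<std::vector<float> >& updates,
                               const std::vector<std::vector<float> >& diffs,
                               float rate) {
        #pragma omp parallel for
        for (int param_id = 0;
             param_id < static_cast<int>(updates.size()); ++param_id) {
          for (size_t i = 0; i < updates[param_id].size(); ++i) {
            updates[param_id][i] = rate * diffs[param_id][i];  // plain SGD step
          }
        }
      }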

The only extra trick needed for the next occasion is to distinguish between CPU and GPU mode; a sketch follows the snippet.

      net_->Update();
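A standalone sketch of that mode dispatch (the Mode enum and the plain data -= diff update are placeholders, not the real Update code): only the CPU branch gains a pragma, since a GPU implementation already runs as a parallel device kernel.

      #include <vector>

      enum Mode { CPU, GPU };

      void Update(Mode mode, std::vector<float>& data,
                  const std::vector<float>& diff) {
        if (mode == CPU) {
          // CPU path: the element-wise update is embarrassingly parallel.
          #pragma omp parallel for
          for (int i = 0; i < static_cast<int>(data.size()); ++i) {
            data[i] -= diff[i];
          }
        } else {
          // GPU path: launch a device kernel (e.g. an axpy); omitted here.
        }
      }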

The last one involves a plain old OpenMP-friendly nested for loop, sketched after the snippet.

      Test();
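A sketch of that loop with an OpenMP reduction, assuming the per-iteration results have already been computed (calling Net::Forward from several threads at once would need more care, since layers share buffers):

      #include <vector>

      // Sum the test scores across iterations in parallel, then average.
      float AverageTestScore(const std::vector<std::vector<float> >& results) {
        float sum = 0.0f;
        #pragma omp parallel for reduction(+:sum)
        for (int iter = 0; iter < static_cast<int>(results.size()); ++iter) {
          for (size_t k = 0; k < results[iter].size(); ++k) {
            sum += results[iter][k];
          }
        }
        return results.empty() ? 0.0f : sum / results.size();
      }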
kloudkl commented Feb 7, 2014

The code does not have to be changed. Thanks to Michael Rutter, a multi-threaded OpenBLAS package is available for all versions of Ubuntu since Precise (12.04):
Multi-threaded OpenBLAS backported to recent Ubuntu releases
Follow these steps to take advantage of the acceleration:

sudo add-apt-repository ppa:marutter/rdev
sudo apt-get update
sudo apt-get install libopenblas-base
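If you don't want OpenBLAS to grab every core, the thread count can also be capped at run time with the OPENBLAS_NUM_THREADS environment variable.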

Benchmark results are shown in the related issue: #16

Yangqing commented Feb 7, 2014

ComputeUpdateValue() is not a big issue when training large networks. It is the ForwardBackward() function, and the individual layers it invokes, that take the most time. Thus, parallelizing ComputeUpdateValue() will not give us much gain.

sguada commented Feb 7, 2014

Within the ForwardBackward() computation, the convolutional layers are the ones that take most of the time (see #83), so parallelizing the loops there will be the most effective.
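To illustrate, a standalone sketch (a single-channel direct convolution as a stand-in for the real im2col + GEMM code): each image's output in the mini-batch is independent, so the outer loop over images parallelizes cleanly.

      #include <vector>

      // Direct "valid" 2-D convolution of N single-channel H x W images with
      // one K x K kernel; outputs are independent per image.
      void ConvForwardCpu(const std::vector<float>& bottom,  // N * H * W
                          const std::vector<float>& kernel,  // K * K
                          std::vector<float>& top,           // N * OH * OW
                          int N, int H, int W, int K) {
        const int OH = H - K + 1, OW = W - K + 1;
        #pragma omp parallel for
        for (int n = 0; n < N; ++n) {
          for (int oh = 0; oh < OH; ++oh) {
            for (int ow = 0; ow < OW; ++ow) {
              float acc = 0.0f;
              for (int kh = 0; kh < K; ++kh) {
                for (int kw = 0; kw < K; ++kw) {
                  acc += bottom[(n * H + oh + kh) * W + (ow + kw)]
                       * kernel[kh * K + kw];
                }
              }
              top[(n * OH + oh) * OW + ow] = acc;
            }
          }
        }
      }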
