
Support multithreading in the CPU mode of Solver::Solve #79

Closed
kloudkl opened this issue Feb 7, 2014 · 3 comments

kloudkl commented Feb 7, 2014

In each iteration of Solver::Solve, there are four chances to accelerate the computation.
The first opportunity is the most complex one, since Net::ForwardBackward invokes the Forward and Backward passes of all the layers that comprise the net.

      Dtype loss = net_->ForwardBackward(bottom_vec);

The second chance is more straightforward: an OpenMP directive is enough to parallelize the independent computation for each param_id, as sketched below.

      ComputeUpdateValue();
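A minimal standalone sketch of what I mean (not the actual SGDSolver code; the nested-vector layout and the plain SGD step are placeholders for illustration; build with -fopenmp):

      #include <vector>

      // Each param_id owns a disjoint parameter blob, so the iterations are
      // independent and a single OpenMP directive parallelizes the loop.
      void ComputeUpdateValues(std::vector<std::vector<float> >& updates,
                               const std::vector<std::vector<float> >& diffs,
                               float rate) {
        #pragma omp parallel for
        for (int param_id = 0;
             param_id < static_cast<int>(updates.size()); ++param_id) {
          for (size_t i = 0; i < updates[param_id].size(); ++i) {
            updates[param_id][i] = rate * diffs[param_id][i];  // plain SGD step
          }
        }
      }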

The only extra trick needed for the next occasion is to distinguish between CPU and GPU mode; a sketch follows the snippet.

      net_->Update();
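A standalone sketch of that mode dispatch (the Mode enum and the plain data -= diff update are placeholders, not the real Update code): only the CPU branch gains a pragma, since a GPU implementation already runs as a parallel device kernel.

      #include <vector>

      enum Mode { CPU, GPU };

      void Update(Mode mode, std::vector<float>& data,
                  const std::vector<float>& diff) {
        if (mode == CPU) {
          // CPU path: the element-wise update is embarrassingly parallel.
          #pragma omp parallel for
          for (int i = 0; i < static_cast<int>(data.size()); ++i) {
            data[i] -= diff[i];
          }
        } else {
          // GPU path: launch a device kernel (e.g. an axpy); omitted here.
        }
      }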

The last one involves a plain old OpenMP-friendly nested for loop, sketched after the snippet.

      Test();
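A sketch of that loop with an OpenMP reduction, assuming the per-iteration results have already been computed (calling Net::Forward from several threads at once would need more care, since layers share buffers):

      #include <vector>

      // Sum the test scores across iterations in parallel, then average.
      float AverageTestScore(const std::vector<std::vector<float> >& results) {
        float sum = 0.0f;
        #pragma omp parallel for reduction(+:sum)
        for (int iter = 0; iter < static_cast<int>(results.size()); ++iter) {
          for (size_t k = 0; k < results[iter].size(); ++k) {
            sum += results[iter][k];
          }
        }
        return results.empty() ? 0.0f : sum / results.size();
      }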
kloudkl commented Feb 7, 2014

The code does not have to be changed. Thanks to Michael Rutter, a multi-threaded OpenBLAS package is available for all versions of Ubuntu since Precise (12.04):
Multi-threaded OpenBLAS backported to recent Ubuntu releases
Follow these steps to take advantage of the acceleration:

sudo add-apt-repository ppa:marutter/rdev
sudo apt-get update
sudo apt-get install libopenblas-base
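If you don't want OpenBLAS to grab every core, the thread count can also be capped at run time with the OPENBLAS_NUM_THREADS environment variable.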

Benchmark results are shown in the related issue: #16

Yangqing commented Feb 7, 2014

ComputeUpdateValue() is not a big issue when training large networks. It is the ForwardBackward() function, and the individual layers it invokes, that take the most time. Thus, parallelizing ComputeUpdateValue() will not give us much gain.

sguada commented Feb 7, 2014

Within the ForwardBackward() computation, the convolutional layers are the ones that take most of the time (see #83), so parallelizing the loops there will be the most effective.
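To illustrate, a standalone sketch (a single-channel direct convolution as a stand-in for the real im2col + GEMM code): each image's output in the mini-batch is independent, so the outer loop over images parallelizes cleanly.

      #include <vector>

      // Direct "valid" 2-D convolution of N single-channel H x W images with
      // one K x K kernel; outputs are independent per image.
      void ConvForwardCpu(const std::vector<float>& bottom,  // N * H * W
                          const std::vector<float>& kernel,  // K * K
                          std::vector<float>& top,           // N * OH * OW
                          int N, int H, int W, int K) {
        const int OH = H - K + 1, OW = W - K + 1;
        #pragma omp parallel for
        for (int n = 0; n < N; ++n) {
          for (int oh = 0; oh < OH; ++oh) {
            for (int ow = 0; ow < OW; ++ow) {
              float acc = 0.0f;
              for (int kh = 0; kh < K; ++kh) {
                for (int kw = 0; kw < K; ++kw) {
                  acc += bottom[(n * H + oh + kh) * W + (ow + kw)]
                       * kernel[kh * K + kw];
                }
              }
              top[(n * OH + oh) * OW + ow] = acc;
            }
          }
        }
      }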
