
Multi-threading particle swarm #735

Closed · wants to merge 3 commits

Conversation

@tbeason commented Aug 13, 2019

I added Threads.@threads in two spots: compute_cost! and limit_X!. The speedup on my particular problem was approximately 3x going from 1 thread to 6, the bulk of which comes from parallel execution of the cost function.

The user does not have to do anything additional; they just need to start Julia with multiple threads.

I have not run any further tests, and I am not sure how well threading would work on every problem. If there is IO or something else funky inside the cost function (calls to an RNG?), there could be an issue with multi-threading (from what I read in the recent blog post on the homepage).

I am on julia v1.3.0-alpha.
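For reference, the change is essentially a threaded loop over independent per-particle cost evaluations. A minimal sketch (the real `compute_cost!` signature in Optim may differ; the names here are illustrative):

```julia
# Hypothetical sketch of the change: parallelize the per-particle cost
# evaluation with Threads.@threads. Each particle is a column of X and
# writes only to its own slot of `score`, so no locking is needed as
# long as `f` itself is thread-safe.
function compute_cost!(f, n_particles::Int, X::AbstractMatrix, score::AbstractVector)
    Threads.@threads for i in 1:n_particles
        score[i] = f(view(X, :, i))
    end
    return score
end
```

Started with, e.g., `JULIA_NUM_THREADS=6 julia`, the loop iterations are distributed across the available threads.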

@ChrisRackauckas (Contributor)
I think this is the wrong way to go. Instead, I think the interface should be changed so that the user gives a function f!(dx,x) which computes the loss at multiple points. That way if they want it multithreaded, GPU'd, etc. they can do it. Building it all into the package will never make everything work.
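As a rough sketch of that interface, the user would hand Optim a batched objective and choose the parallelism themselves (names here are illustrative, not an actual Optim API):

```julia
# The solver passes a matrix X whose columns are the points to evaluate;
# the user fills F with the loss at each point, parallelizing however
# they like (threads here, but pmap or a GPU kernel would fit the same
# signature). The loss shown is just an example (squared norm).
function batch_objective!(F::AbstractVector, X::AbstractMatrix)
    Threads.@threads for j in axes(X, 2)
        F[j] = sum(abs2, view(X, :, j))
    end
    return F
end
```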

@tbeason (Author) commented Aug 13, 2019
That sounds nice, but it also sounds like a major change. I just wanted to implement something here that at least brings this more on par with solvers in languages like MATLAB, where this is already possible.

Like I said, I haven't tested it on any problems other than mine, but it should work fine as long as the cost function is relatively standalone.

@antoine-levitt (Contributor)
Also tricky because most optimizers don't support multiple evaluations at the same time, which would make it sort of weird to change the API for this single particular case. In general there's an explosion in the properties of the objective functions (differentiable, in place, multiple evaluations) that's pretty nasty to handle...

Short term, I think this feature could be useful, provided it's toggled by a flag. (Also, objective functions that have threaded BLAS calls inside them should not be threaded, at least for now.)

codecov bot commented Aug 13, 2019

Codecov Report

Merging #735 into master will decrease coverage by 0.04%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #735      +/-   ##
==========================================
- Coverage   81.69%   81.64%   -0.05%     
==========================================
  Files          43       43              
  Lines        2414     2414              
==========================================
- Hits         1972     1971       -1     
- Misses        442      443       +1
| Impacted Files | Coverage Δ |
|---|---|
| ...ultivariate/solvers/zeroth_order/particle_swarm.jl | 98.21% <100%> (ø) ⬆️ |
| src/multivariate/solvers/constrained/samin.jl | 75.55% <0%> (-0.75%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

codecov bot commented Aug 13, 2019

Codecov Report

Merging #735 into master will decrease coverage by 0.16%.
The diff coverage is 42.85%.


@@            Coverage Diff             @@
##           master     #735      +/-   ##
==========================================
- Coverage   81.69%   81.52%   -0.17%     
==========================================
  Files          43       43              
  Lines        2414     2419       +5     
==========================================
  Hits         1972     1972              
- Misses        442      447       +5
| Impacted Files | Coverage Δ |
|---|---|
| ...ultivariate/solvers/zeroth_order/particle_swarm.jl | 96.5% <42.85%> (-1.71%) ⬇️ |
| src/multivariate/solvers/constrained/samin.jl | 75.55% <0%> (-0.75%) ⬇️ |


@tbeason (Author) commented Aug 13, 2019
I can add a user supplied threaded::Bool option to the ParticleSwarm type/constructor and a note in the docs tomorrow morning.

@tbeason (Author) commented Aug 14, 2019
I added a user-supplied flag to enable the multi-threading, which, when enabled, just calls the threaded version of the compute_cost! function. I updated the documentation. I removed the threading in the limit_X! function because it would likely only provide a noticeable benefit if both the number of parameters and the number of particles were very large.
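A minimal sketch of how such a flag could route between the serial and threaded evaluation paths (hypothetical names, not the PR's code verbatim):

```julia
# Toy stand-in for the solver options carrying the user-supplied flag.
struct ParticleSwarmOpts
    n_particles::Int
    threaded::Bool
end

# Dispatch on the flag: the threaded branch mirrors the serial one but
# splits the independent per-particle evaluations across threads.
function evaluate_swarm!(f, X, score, opts::ParticleSwarmOpts)
    if opts.threaded
        Threads.@threads for i in 1:opts.n_particles
            score[i] = f(view(X, :, i))
        end
    else
        for i in 1:opts.n_particles
            score[i] = f(view(X, :, i))
        end
    end
    return score
end
```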

There is one outstanding issue I've noticed: the tracking of the total number of function calls is off. When run single-threaded, the number of function calls appears to be (N+1)*P + 1, where N is the number of iterations and P is the number of particles. With threading enabled, it always reports slightly fewer. I snooped around a bit, and it seems this comes from some of Optim's internals, which are a bit beyond me. If someone else could take a look, that would be helpful.

@antoine-levitt (Contributor)
Hm, that's right. The easy way out is to make the counter increment atomic. That's in NLSolversBase, in the objective function wrappers.
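A standalone illustration of the suggested fix, using an atomic counter so concurrent f-calls don't lose increments (NLSolversBase's actual wrappers store a plain Int; this is not its code):

```julia
using Base.Threads: Atomic, atomic_add!, @threads

# Atomic call counter: atomic_add! guarantees each increment lands,
# unlike a plain `calls += 1`, which can drop updates under contention.
calls = Atomic{Int}(0)
f_counted(x) = (atomic_add!(calls, 1); sum(abs2, x))

@threads for i in 1:1000
    f_counted([float(i)])
end

@assert calls[] == 1000  # no lost updates even with many threads
```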

@pkofod (Member) commented Aug 15, 2019
We discussed this on Slack, so I just want to note that I've seen it. I'm still digesting the change and the comments.

For some things @ChrisRackauckas's approach can be nice, but in other cases what's in here is what you need. I think we may just have to offer different modes if we want to support this. As @antoine-levitt mentions, if it's threading you're after, then many times you can take advantage of all your available threads in the objective function yourself. Sometimes the objective function is inherently serial, so it may only be possible to thread at the f-call level. Other times, @ChrisRackauckas is exactly right: I know his use case is that it can be beneficial to collect N different x's, where N is big, and then send them off to a compute node that efficiently handles big batches of solves.

So we're really after some ParallelMode subtypes (or symbols, or whatever) to control the various modes of parallelization. Merging this PR doesn't exclude us from experimenting in the future, but it's totally correct that Chris's need is the hardest to accommodate in terms of rewriting Optim.

@pkofod (Member) commented Aug 16, 2019
> need is the hardest to accommodate in terms of rewriting Optim.

So I have a prototype of this, and it wasn't hard to do at the PSO level, but it won't really play nicely with the NDifferentiable types. This is the "hardest part" I was talking about.

@pkofod (Member) commented Aug 16, 2019
What about something like

`@enum ParallelMode Serial Threading Distributed Batch`

?

Serial is what it is now, Threading is a `@threads` for loop, Distributed is a `pmap` over an `X` which contains the points to evaluate as elements, and Batch is what Chris described.
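A sketch of how such an enum could dispatch the evaluation loop (only the Serial and Threading branches are shown; Distributed would wrap `pmap` and Batch the user-supplied `f!(F, X)` — all names illustrative):

```julia
@enum ParallelMode Serial Threading Distributed Batch

# Evaluate f at every column of X, choosing the loop by mode.
function evaluate!(F, f, X, mode::ParallelMode)
    if mode == Threading
        Threads.@threads for j in axes(X, 2)
            F[j] = f(view(X, :, j))
        end
    else  # Serial fallback; Distributed/Batch would branch similarly
        for j in axes(X, 2)
            F[j] = f(view(X, :, j))
        end
    end
    return F
end
```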

@pkofod (Member) commented Mar 10, 2020
The new PSO will support this. But thanks for bringing this up! :)

@pkofod pkofod closed this Mar 10, 2020