-
-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Asynchronous solvers #5
Comments
In particular I was reading this paper on asynchronous sgd: https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf It's obviously focused on deep learning applications, but thought that some of the ideas may carry over here as well. I'm unsure. |
this is asynchronous variant of ADMM: http://jmlr.org/proceedings/papers/v32/zhange14.pdf However, i am not sure if we need it since the global updates are the fastest (unlike line search step above). |
We may still run into straggler issues where we have to wait for a few workers to finish the last pieces of data before we can synchronize updates and broadcast them out to start work everywhere again. However, it's not clear that this will be a performance issue in practice. |
I would like to push back against this as a significant distraction from the core focus of this effort. We should not be moving towards a tradeoff between accuracy and speed at this point, which is exactly what a move to asynchronous approaches represent. Besides: I'm not convinced yet that the white space above is due to fundamental limits in the algorithms. I'm waiting on some example notebooks from Chris to do some testing, but I have a suspicion there is an error in our existing design. |
I agree that the whitespace above can probably be resolved through other methods. My main goal in this issue is to establish that this is a tool in our toolbox, should it become valuable. I'm fine waiting to think about this idea until synchronization costs become a performance issue. |
@mcg1969 if this is useful I do the following (also from @moody-marlin): import dask.array as da
import numpy as np
from dask_glm.logistic import *
from dask_glm.utils import *
from distributed import Client
c = Client()
## size of problem (no. observations)
N = 1e8
chunks = 1e6
seed = 20009
X = da.random.random((N,2), chunks=chunks)
y = make_y(X, beta=np.array([-1.5, 3]), chunks=chunks)
X, y = persist(X, y)
# newton(X,y)
bfgs(X,y,tol=1e-8) Requires master versions of dask and dask-glm |
That paper shows faster convergence than what you would expect just by filling in whitespace. I suspect that we accelerate a bit just because there is redundancy in the data itself, and so many-small-fast updates on partial datasets converge more quickly. |
Anyway, writing the communication side of that algorithm looks more-or-less trivial from a dask.distributed perspective. I think it would be a fun demonstration. I'm not sure what the update functions should be though. |
@moody-marlin any thoughts on solidifying the async admm solver in algorithms.py? Presumably this would need some hardening of the logic to support convergence checks and whatnot. |
Yea that sounds like a good plan; I can look into this more on Thursday / Friday. |
So far all of our solvers are synchronous. They compute full results in lock-step, for example switching between performing a parallel mat-vec, then doing a line search, and then doing a mat-vec again. These algorithms are common on single-machine hardware but may not be ideal for larger clusters.
The distributed scheduler provides some decent capabilities for full asynchronous computing, which may open us up to new algorithms. Are there asynchronous variants to some of these algorithms that may interest us?
Quick example of asynchronous code:
@hussainsultan @mcg1969 @jcrist @moody-marlin
See Also
The text was updated successfully, but these errors were encountered: