Asynchronous solvers #5

Open
mrocklin opened this issue Jan 26, 2017 · 11 comments

Comments

@mrocklin
Member

mrocklin commented Jan 26, 2017

So far all of our solvers are synchronous. They compute full results in lock-step, for example alternating between a parallel mat-vec, a line search, and then another mat-vec. These algorithms are common on single-machine hardware but may not be ideal for larger clusters.

The distributed scheduler provides some decent capabilities for fully asynchronous computing, which may open us up to new algorithms. Are there asynchronous variants of some of these algorithms that might interest us?

Quick example of asynchronous code:

import random
from distributed import Client, as_completed

client = Client()

data_futures = client.map(load_chunk, chunks)
params = {...}  # placeholder for solver parameters

futures = client.map(compute_update, random.sample(data_futures, 100), **params)

ac = as_completed(futures)  # collection of running futures that yield in order of completion
for future in ac:
    update_info, score = future.result()
    if is_good(score):
        break
    update_params(params, update_info)
    new_future = client.submit(compute_update, random.choice(data_futures), **params)
    ac.add(new_future)

@hussainsultan @mcg1969 @jcrist @moody-marlin


@mrocklin
Member Author

In particular, I was reading this paper on asynchronous SGD: https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf

It's obviously focused on deep learning applications, but I thought that some of the ideas may carry over here as well. I'm unsure.
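For concreteness, a Downpour-style asynchronous update loop might look something like the minimal sketch below using dask.distributed futures. The helpers load_chunk, compute_gradient, and initial_params, and the names chunks, step_size, and max_updates, are all hypothetical placeholders, not part of dask-glm:

import random
from distributed import Client, as_completed

client = Client()

# Hypothetical helpers: load_chunk loads one block of data, and compute_gradient
# returns a gradient estimate from one block given the current parameters.
data_futures = client.map(load_chunk, chunks)
params = initial_params()
step_size, max_updates = 0.01, 1000

futures = [client.submit(compute_gradient, random.choice(data_futures), params)
           for _ in range(10)]
ac = as_completed(futures)
for n_updates, future in enumerate(ac):
    grad = future.result()
    # Apply the (possibly stale) gradient immediately, Downpour-style,
    # without waiting for the other in-flight gradient computations.
    params = params - step_size * grad
    if n_updates >= max_updates:
        break
    ac.add(client.submit(compute_gradient, random.choice(data_futures), params))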

@mrocklin
Member Author

To motivate this thinking, I'm trying to eliminate the white space in profiles like the following:

[task stream profile screenshot]

@hussainsultan
Collaborator

This is an asynchronous variant of ADMM: http://jmlr.org/proceedings/papers/v32/zhange14.pdf However, I am not sure we need it, since the global updates are the fastest part (unlike the line search step above).

@mrocklin
Member Author

We may still run into straggler issues where we have to wait for a few workers to finish the last pieces of data before we can synchronize updates and broadcast them out to start work everywhere again.

However, it's not clear that this will be a performance issue in practice.
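As a rough illustration of one way to soften that barrier, the sketch below synchronizes once a fraction of the per-block updates have arrived rather than waiting for every worker. The names compute_update, merge_updates, data_futures, and params are hypothetical placeholders, not dask-glm functions:

from distributed import as_completed

# Submit one update task per data block, but synchronize once 90% have
# arrived so a few stragglers do not stall the whole round.
futures = [client.submit(compute_update, block, params) for block in data_futures]
needed = int(0.9 * len(futures))

results = []
for future in as_completed(futures):
    results.append(future.result())
    if len(results) >= needed:
        break

params = merge_updates(params, results)  # global update, then broadcast again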

@mcg1969

mcg1969 commented Jan 26, 2017

I would like to push back against this as a significant distraction from the core focus of this effort. We should not be moving towards a tradeoff between accuracy and speed at this point, which is exactly what a move to asynchronous approaches represents.

Besides: I'm not convinced yet that the white space above is due to fundamental limits in the algorithms. I'm waiting on some example notebooks from Chris to do some testing, but I have a suspicion there is an error in our existing design.

@mrocklin
Member Author

I agree that the whitespace above can probably be resolved through other methods. My main goal in this issue is to establish that this is a tool in our toolbox, should it become valuable. I'm fine waiting to think about this idea until synchronization costs become a performance issue.

@mrocklin
Member Author

I'm waiting on some example notebooks from Chris to do some testing

@mcg1969 if it's useful, here is what I do (also from @moody-marlin):

import dask.array as da
import numpy as np
from dask import persist
from dask_glm.logistic import *
from dask_glm.utils import *

from distributed import Client
c = Client()

## size of problem (no. observations)
N = int(1e8)
chunks = int(1e6)
seed = 20009  # currently unused

X = da.random.random((N, 2), chunks=chunks)
y = make_y(X, beta=np.array([-1.5, 3]), chunks=chunks)

X, y = persist(X, y)

# newton(X, y)

bfgs(X, y, tol=1e-8)

Requires master versions of dask and dask-glm

@mrocklin
Member Author

This is an asynchronous variant of ADMM: http://jmlr.org/proceedings/papers/v32/zhange14.pdf However, I am not sure we need it, since the global updates are the fastest part (unlike the line search step above).

That paper shows faster convergence than you would expect just from filling in the whitespace. I suspect that we accelerate a bit because there is redundancy in the data itself, and so many small, fast updates on partial datasets converge more quickly.

@mrocklin
Member Author

Anyway, writing the communication side of that algorithm looks more-or-less trivial from a dask.distributed perspective. I think it would be a fun demonstration. I'm not sure what the update functions should be though.
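For what it's worth, the communication pattern alone might look something like the sketch below, with the actual math hidden behind placeholders: local_update(i, z) would compute a new local solution for block i given the current consensus variable z, and update_consensus would fold that result back in. Neither is a dask-glm function, and n_blocks, max_iterations, and z are assumptions:

from distributed import Client, as_completed

client = Client()

# Kick off one local solve per data block against the initial consensus variable z.
futures = [client.submit(local_update, i, z) for i in range(n_blocks)]

ac = as_completed(futures)
for n, future in enumerate(ac):
    i, x_i = future.result()          # block index and its new local solution
    z = update_consensus(z, x_i)      # fold the local result into the consensus variable
    if n >= max_iterations:
        break
    # Resubmit the same block immediately against the freshest z,
    # without waiting for the other blocks to finish.
    ac.add(client.submit(local_update, i, z))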

@mrocklin
Member Author

@moody-marlin any thoughts on solidifying the async ADMM solver in algorithms.py? Presumably this would need some hardening of the logic to support convergence checks and whatnot.
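As a starting point, a conventional consensus-ADMM stopping test (in the style of Boyd et al.) might look roughly like this; admm_converged and all of its arguments are placeholders, not existing dask-glm code:

import numpy as np

def admm_converged(x_locals, z, z_old, rho, tol=1e-4):
    # Primal residual: how far the local solutions sit from the consensus variable.
    primal_res = np.sqrt(sum(np.linalg.norm(x - z) ** 2 for x in x_locals))
    # Dual residual: how much the consensus variable moved this round.
    dual_res = rho * np.sqrt(len(x_locals)) * np.linalg.norm(z - z_old)
    return primal_res < tol and dual_res < tol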

@cicdw
Collaborator

cicdw commented Apr 25, 2017

Yeah, that sounds like a good plan; I can look into this more on Thursday/Friday.
