Asynchronous solvers #5

Open
mrocklin opened this issue Jan 26, 2017 · 11 comments

Comments

@mrocklin
Member

mrocklin commented Jan 26, 2017

So far all of our solvers are synchronous. They compute full results in lock-step, for example alternating between a parallel mat-vec, a line search, and then another mat-vec. These algorithms are common on single-machine hardware but may not be ideal for larger clusters.

The distributed scheduler provides some decent capabilities for fully asynchronous computing, which may open us up to new algorithms. Are there asynchronous variants of some of these algorithms that might interest us?

Quick example of asynchronous code:

import random
from distributed import Client, as_completed

client = Client()

data_futures = client.map(load_chunk, chunks)
params = {...}  # placeholder for solver parameters

futures = client.map(compute_update, random.sample(data_futures, 100), **params)

ac = as_completed(futures)  # collection of running futures that yield in order of completion
for future in ac:
    update_info, score = future.result()
    if is_good(score):
        break
    update_params(params, update_info)
    new_future = client.submit(compute_update, random.choice(data_futures), **params)
    ac.add(new_future)

@hussainsultan @mcg1969 @jcrist @moody-marlin


@mrocklin
Member Author

In particular, I was reading this paper on asynchronous SGD: https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf

It's obviously focused on deep learning applications, but I thought that some of the ideas may carry over here as well. I'm unsure.
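For concreteness, a Downpour-style asynchronous update loop might look something like the minimal sketch below using dask.distributed futures. The helpers load_chunk, compute_gradient, and initial_params, and the names chunks, step_size, and max_updates, are all hypothetical placeholders, not part of dask-glm:

import random
from distributed import Client, as_completed

client = Client()

# Hypothetical helpers: load_chunk loads one block of data, and compute_gradient
# returns a gradient estimate from one block given the current parameters.
data_futures = client.map(load_chunk, chunks)
params = initial_params()
step_size, max_updates = 0.01, 1000

futures = [client.submit(compute_gradient, random.choice(data_futures), params)
           for _ in range(10)]
ac = as_completed(futures)
for n_updates, future in enumerate(ac):
    grad = future.result()
    # Apply the (possibly stale) gradient immediately, Downpour-style,
    # without waiting for the other in-flight gradient computations.
    params = params - step_size * grad
    if n_updates >= max_updates:
        break
    ac.add(client.submit(compute_gradient, random.choice(data_futures), params))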

@mrocklin
Member Author

To motivate this thinking, I'm trying to eliminate the white space in profiles like the following:

[task stream profile screenshot]

@hussainsultan
Collaborator

This is an asynchronous variant of ADMM: http://jmlr.org/proceedings/papers/v32/zhange14.pdf However, I am not sure we need it, since the global updates are the fastest part (unlike the line search step above).

@mrocklin
Member Author

We may still run into straggler issues where we have to wait for a few workers to finish the last pieces of data before we can synchronize updates and broadcast them out to start work everywhere again.

However, it's not clear that this will be a performance issue in practice.
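As a rough illustration of one way to soften that barrier, the sketch below synchronizes once a fraction of the per-block updates have arrived rather than waiting for every worker. The names compute_update, merge_updates, data_futures, and params are hypothetical placeholders, not dask-glm functions:

from distributed import as_completed

# Submit one update task per data block, but synchronize once 90% have
# arrived so a few stragglers do not stall the whole round.
futures = [client.submit(compute_update, block, params) for block in data_futures]
needed = int(0.9 * len(futures))

results = []
for future in as_completed(futures):
    results.append(future.result())
    if len(results) >= needed:
        break

params = merge_updates(params, results)  # global update, then broadcast again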

@mcg1969

mcg1969 commented Jan 26, 2017

I would like to push back against this as a significant distraction from the core focus of this effort. We should not be moving towards a tradeoff between accuracy and speed at this point, which is exactly what a move to asynchronous approaches represents.

Besides: I'm not convinced yet that the white space above is due to fundamental limits in the algorithms. I'm waiting on some example notebooks from Chris to do some testing, but I have a suspicion there is an error in our existing design.

@mrocklin
Member Author

I agree that the whitespace above can probably be resolved through other methods. My main goal in this issue is to establish that this is a tool in our toolbox, should it become valuable. I'm fine waiting to think about this idea until synchronization costs become a performance issue.

@mrocklin
Member Author

I'm waiting on some example notebooks from Chris to do some testing

@mcg1969 if it's useful, here is what I do (also from @moody-marlin):

import dask.array as da
import numpy as np
from dask import persist
from dask_glm.logistic import *
from dask_glm.utils import *

from distributed import Client
c = Client()

## size of problem (no. observations)
N = int(1e8)
chunks = int(1e6)
seed = 20009  # currently unused

X = da.random.random((N, 2), chunks=chunks)
y = make_y(X, beta=np.array([-1.5, 3]), chunks=chunks)

X, y = persist(X, y)

# newton(X, y)

bfgs(X, y, tol=1e-8)

Requires master versions of dask and dask-glm

@mrocklin
Member Author

This is an asynchronous variant of ADMM: http://jmlr.org/proceedings/papers/v32/zhange14.pdf However, I am not sure we need it, since the global updates are the fastest part (unlike the line search step above).

That paper shows faster convergence than you would expect just from filling in the whitespace. I suspect that we accelerate a bit because there is redundancy in the data itself, and so many small, fast updates on partial datasets converge more quickly.

@mrocklin
Member Author

Anyway, writing the communication side of that algorithm looks more-or-less trivial from a dask.distributed perspective. I think it would be a fun demonstration. I'm not sure what the update functions should be though.
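For what it's worth, the communication pattern alone might look something like the sketch below, with the actual math hidden behind placeholders: local_update(i, z) would compute a new local solution for block i given the current consensus variable z, and update_consensus would fold that result back in. Neither is a dask-glm function, and n_blocks, max_iterations, and z are assumptions:

from distributed import Client, as_completed

client = Client()

# Kick off one local solve per data block against the initial consensus variable z.
futures = [client.submit(local_update, i, z) for i in range(n_blocks)]

ac = as_completed(futures)
for n, future in enumerate(ac):
    i, x_i = future.result()          # block index and its new local solution
    z = update_consensus(z, x_i)      # fold the local result into the consensus variable
    if n >= max_iterations:
        break
    # Resubmit the same block immediately against the freshest z,
    # without waiting for the other blocks to finish.
    ac.add(client.submit(local_update, i, z))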

@mrocklin
Member Author

@moody-marlin any thoughts on solidifying the async ADMM solver in algorithms.py? Presumably this would need some hardening of the logic to support convergence checks and whatnot.
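As a starting point, a conventional consensus-ADMM stopping test (in the style of Boyd et al.) might look roughly like this; admm_converged and all of its arguments are placeholders, not existing dask-glm code:

import numpy as np

def admm_converged(x_locals, z, z_old, rho, tol=1e-4):
    # Primal residual: how far the local solutions sit from the consensus variable.
    primal_res = np.sqrt(sum(np.linalg.norm(x - z) ** 2 for x in x_locals))
    # Dual residual: how much the consensus variable moved this round.
    dual_res = rho * np.sqrt(len(x_locals)) * np.linalg.norm(z - z_old)
    return primal_res < tol and dual_res < tol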

@cicdw
Collaborator

cicdw commented Apr 25, 2017

Yeah, that sounds like a good plan; I can look into this more on Thursday/Friday.
