Merge branches/forks #2

Merged: 18 commits into dask:master on Jan 26, 2017

Conversation

@hussainsultan (Collaborator) commented on Jan 19, 2017:

This PR merges all the work done in forks and branches that @moody-marlin would like to retain.

  • BFGS
  • gradient descent
  • Newton's method
  • proximal gradient descent
  • ADMM for the L1-regularized logistic regression problem

I went ahead and rebased from the moody-marlin/dask-glm@no-api branch. I also cherry-picked some commits around flake8 from moody-marlin/dask-glm@master.

Most of the code here (except some of the ADMM pieces) is not yet tested; tests will be added in future PRs. I tried my best to retain as much of the history as possible while squashing some commits together where it made sense.

resolves #4

@hussainsultan (Collaborator, Author):

Travis is failing; I need to investigate why the tests fail on Travis while they pass on my local machine.

@mrocklin (Member) left a comment:

This looks cool. I'm not yet familiar with the algorithm but hopefully the comments here on computation can be of use.



def logistic_regression(X, y, alpha, rho, over_relaxation):
    N = 5
Member:

What is this N parameter?

Collaborator (Author):

N is the number of equal-sized chunks; it needs to be calculated from the dask array rather than hardcoded.
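For reference, a minimal sketch of deriving N from the dask array's chunk structure instead of hardcoding it (the array and chunk sizes below are only illustrative):

import numpy as np
import dask.array as da

X = da.from_array(np.random.random((100, 2)), chunks=(20, 2))

# Number of row-wise blocks (equivalently len(X.chunks[0])), rather than a hardcoded N = 5:
N = X.numblocks[0]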

RELTOL = 1e-2

for k in range(MAX_ITER):
    beta_x = y.map_blocks(local_update, X, beta_old.T, z[:, 0], u.T, rho, chunks=(5, 2)).compute()
Member:

If we are only using dask.array for map_blocks then it might also make sense to use dask.delayed. Keeping all arrays as numpy arrays (or lists of numpy arrays for X and y) might make things more explicit.

Member:

Separately, it might be nice to allow this function to be used with numpy inputs rather than require that the inputs be dask.arrays.
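For illustration, a minimal sketch of one way to accept either kind of input (the helper name and chunking below are hypothetical, not part of this PR):

import numpy as np
import dask.array as da

def as_dask_array(arr, chunks=None):
    # Pass dask arrays through unchanged; wrap plain numpy arrays otherwise.
    if isinstance(arr, da.Array):
        return arr
    return da.from_array(arr, chunks=chunks or arr.shape)

X = as_dask_array(np.random.random((100, 2)), chunks=(20, 2))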

Collaborator (Author):

If using delayed, would that mean abandoning dask.array as an input? I think it makes sense, but I just want to make sure I am reading it correctly.

Member:

You would do something like the following:

import dask.array as da

if isinstance(X, da.Array):
    X = X.to_delayed()
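To make the suggestion concrete, a minimal sketch of working with the resulting blocks through dask.delayed (the per-block function here is a stand-in, not the PR's local_update, and the reduction is only illustrative):

import dask
import dask.array as da

X = da.random.random((100, 2), chunks=(20, 2))
y = da.random.random((100, 1), chunks=(20, 1))

X_blocks = X.to_delayed().ravel()  # numpy array of delayed blocks, one per chunk
y_blocks = y.to_delayed().ravel()

def block_stat(xb, yb):
    # Placeholder for a per-block update; receives plain numpy arrays.
    return xb.sum() + yb.sum()

partials = [dask.delayed(block_stat)(xb, yb) for xb, yb in zip(X_blocks, y_blocks)]
total = dask.delayed(sum)(partials).compute()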

Member:

This really doesn't matter. The two methods are equivalent. It's just a question as to what is easier for people to reason about. If you prefer dask.array then that's probably best.

    return 1 / (1 + np.exp(-x))


def logistic_loss(w, X, y):
Member:

We might consider decorating functions like this with numba.jit if available
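For the "if available" part, a minimal sketch using a no-op fallback when numba is not installed (a common pattern, shown here only as an illustration):

import numpy as np

try:
    from numba import jit
except ImportError:
    # numba is optional: fall back to a decorator that does nothing.
    def jit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@jit
def sigmoid(x):
    return 1 / (1 + np.exp(-x))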

Member:

(Although for this particular function it doesn't seem to help much; presumably most of the cost is in the dot call.)

In [1]: def logistic_loss(w, X, y):
   ...:     y = y.ravel()
   ...:     z = X.dot(w)
   ...:     yz = y * z
   ...:     idx = yz > 0
   ...:     out = np.zeros_like(yz)
   ...:     out[idx] = np.log(1 + np.exp(-yz[idx]))
   ...:     out[~idx] = (-yz[~idx] + np.log(1 + np.exp(yz[~idx])))
   ...:     out = out.sum()
   ...:     return out
   ...: 

In [2]: import numpy as np

In [3]: X = np.random.random((1000000, 20)); y = np.random.random(1000000); w = np.random.random(
   ...: 20)

In [4]: %time logistic_loss(w, X, y)
CPU times: user 172 ms, sys: 220 ms, total: 392 ms
Wall time: 105 ms
Out[4]: 210159.75275204808

In [5]: import numba

In [6]: fast_logistic_loss = numba.jit(logistic_loss)

In [7]: %time fast_logistic_loss(w, X, y)  # first time incurs compilation cost
CPU times: user 324 ms, sys: 216 ms, total: 540 ms
Wall time: 302 ms
Out[7]: 210159.75275204808

In [8]: %time fast_logistic_loss(w, X, y)  # subsequent times are faster
CPU times: user 132 ms, sys: 212 ms, total: 344 ms
Wall time: 90.3 ms
Out[8]: 210159.75275204808

In [9]: %time fast_logistic_loss(w, X, y)  # subsequent times are faster
CPU times: user 132 ms, sys: 216 ms, total: 348 ms
Wall time: 90.9 ms
Out[9]: 210159.75275204808

Collaborator (Author):

I like the idea of using numba here, but I would hold back on it until we understand the computational bottlenecks before optimizing for speed.

Collaborator:

Yea, and we should consolidate our loss / gradient / hessian calculations. This immediately brings up API considerations, so I'll leave that for the near-future.

@hussainsultan (Collaborator, Author):

@mrocklin thanks for the comments. This is really helpful.

@mrocklin (Member):

Heh, that's quite a push :)

Note that @moody-marlin's revert commits can now be removed. We've merged the persist PR into Dask.

Also, in moody-marlin/dask-glm@master there were some flake8 commits by an external contributor. They may have some value.

@hussainsultan (Collaborator, Author):

@mrocklin yes! I am just making sure all tests pass before I consolidate some of the commits.

@hussainsultan changed the title from "WIP: Add initial implementation of logistic regression with l1 penalty" to "WIP: Merge branches/forks" on Jan 26, 2017.
@hussainsultan (Collaborator, Author):

@moody-marlin @mrocklin could you please provide a review? All tests are passing on Python 2 and 3. Thank you!

Xstep = dot(X, step)

Xbeta, gradient, func, steplen, step, Xstep = da.compute(
    Xbeta, gradient, func, steplen, step, Xstep)
Member:

This line should be removed.

Member:

We want to avoid ever calling compute on a large result (this causes cluster-to-client communication). Instead, we're starting to use persist.
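For illustration, a minimal sketch of the compute-vs-persist distinction (the array and computation below are placeholders, not the PR's variables):

import dask.array as da

x = da.random.random((100000, 20), chunks=(10000, 20))
beta = x.mean(axis=0)

# compute() materializes the result on the client as a numpy array:
beta_np = beta.compute()

# persist() keeps the result as a dask array whose chunks stay on the workers,
# so later steps can reuse it without pulling it back to the client:
beta = beta.persist()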

Member:

This is the only function that we have optimized in this way so far

Collaborator (Author):

Good catch. I will remove it in a separate commit.

@mrocklin (Member):

If we can merge this soon that would allow me to try optimizing a few other algorithms (I have some time today).

@hussainsultan (Collaborator, Author):

@mrocklin feel free to merge it once the tests pass. Thank you!

@mrocklin (Member):

I think that @moody-marlin should probably make the final call on this.

@hussainsultan changed the title from "WIP: Merge branches/forks" to "Merge branches/forks" on Jan 26, 2017.
@cicdw (Collaborator) left a comment:

I'm going to go ahead and merge this PR with the caveat that the ADMM code still needs further testing (for correctness and input shape / chunksize considerations), which will be my priority for the next two days so this doesn't linger too long.

RELTOL = 1e-2

for k in range(MAX_ITER):
    beta_x = y.map_blocks(local_update, X, beta_old.T, z[:, 0], u.T, rho,
Collaborator:

I'm getting a ValueError here whenever I run this with my own data; I think it's because the chunk size is hardcoded. To reproduce, try:

from dask_glm.utils import make_y
import dask.array as da
import numpy as np

X = da.random.random((1e6, 2), chunks=(1e4, 2))
y = make_y(X, beta=np.array([1.5, -3.]), chunks=(1e4, 1))

## need to expand y along a new axis
y = y[:, None]
logistic_regression(X, y, 0.1, 0.1, 0.1)

This can be fixed by making sure both X and y are split into 5 chunks (row-wise).
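For illustration, a minimal sketch of constructing X and y with matching row-wise chunking (the sizes below are only illustrative):

import dask.array as da

X = da.random.random((1000000, 2), chunks=(200000, 2))  # 5 row-wise blocks
y = da.random.random((1000000, 1), chunks=(200000, 1))  # matching 5 blocks

assert X.numblocks[0] == y.numblocks[0] == 5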

Collaborator (Author):

Thanks. Could we open an issue for this so I can handle it in a separate PR? Thank you!

Collaborator:

See #8

    return beta


def logistic_regression(X, y, alpha, rho, over_relaxation):
Collaborator:

Right now it appears your function accepts a y dask array whose shape is (X.shape[0], 1), whereas the other algorithms currently accept y with shape (X.shape[0],). Maybe just add a line y = y[:, None] right at the beginning so the conventions agree and your code runs as-is. We can dig deeper into this later.
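A minimal sketch of that normalization at the top of the function (the rest of the body is elided):

def logistic_regression(X, y, alpha, rho, over_relaxation):
    # Accept y of shape (n,) or (n, 1); use (n, 1) internally.
    if y.ndim == 1:
        y = y[:, None]
    ...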


@cicdw merged commit be8fb6f into dask:master on Jan 26, 2017.
@cicdw mentioned this pull request on Jan 26, 2017.
TomAugspurger pushed a commit that referenced this pull request on Sep 25, 2020.