Merge branches/forks #2
Conversation
Travis is failing - I need to investigate why tests are failing while they pass on my local machine.
This looks cool. I'm not yet familiar with the algorithm but hopefully the comments here on computation can be of use.
def logistic_regression(X, y, alpha, rho, over_relaxation):
    N = 5
What is this N parameter?
N is the number of equal sized chunks. This needs to be calculated with the dask array.
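For example, here is a sketch of deriving it from the dask array itself instead of hardcoding it (the array below is only illustrative):

import dask.array as da

X = da.random.random((1000000, 2), chunks=(200000, 2))   # 5 row-wise blocks
N = X.numblocks[0]                                        # -> 5, same as len(X.chunks[0])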
    RELTOL = 1e-2

    for k in range(MAX_ITER):
        beta_x = y.map_blocks(local_update, X, beta_old.T, z[:, 0], u.T, rho, chunks=(5, 2)).compute()
If we are only using dask.array for map_blocks then it might also make sense to use dask.delayed. Keeping all arrays as numpy arrays (or lists of numpy arrays for X and y) might make things more explicit.
Separately, it might be nice to allow this function to be used with numpy inputs rather than require that the inputs be dask.arrays.
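A hedged sketch of how the entry point might normalize its inputs so plain numpy arrays are also accepted (the helper name here is illustrative, not the project's API):

import numpy as np
import dask.array as da

def as_dask_array(arr, chunks):
    # wrap plain numpy arrays; pass existing dask arrays through untouched
    if isinstance(arr, np.ndarray):
        return da.from_array(arr, chunks=chunks)
    return arr

X = as_dask_array(np.random.random((1000, 2)), chunks=(200, 2))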
If using delayed, would that mean abandoning dask.array as an input? I think it makes sense but just want to make sure I am reading it correctly.
You would do something like the following:
import dask.array as da

if isinstance(X, da.Array):
    X = X.to_delayed()   # numpy array of dask.delayed objects, one per block
This really doesn't matter. The two methods are equivalent. It's just a question as to what is easier for people to reason about. If you prefer dask.array then that's probably best.
    return 1 / (1 + np.exp(-x))


def logistic_loss(w, X, y):
We might consider decorating functions like this with numba.jit if available.
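A common pattern for the "if available" part would be a sketch like this (the no-op fallback decorator is illustrative):

import numpy as np

try:
    from numba import jit
except ImportError:
    def jit(*args, **kwargs):
        # no-op fallback so @jit and @jit(...) both still work without numba installed
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        def wrap(func):
            return func
        return wrap

@jit
def sigmoid(x):
    return 1 / (1 + np.exp(-x))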
(Although for this particular function it doesn't seem to help much; presumably most of the cost is in the dot call.)
In [1]: def logistic_loss(w, X, y):
...: y = y.ravel()
...: z = X.dot(w)
...: yz = y * z
...: idx = yz > 0
...: out = np.zeros_like(yz)
...: out[idx] = np.log(1 + np.exp(-yz[idx]))
...: out[~idx] = (-yz[~idx] + np.log(1 + np.exp(yz[~idx])))
...: out = out.sum()
...: return out
...:
In [2]: import numpy as np
In [3]: X = np.random.random((1000000, 20)); y = np.random.random(1000000); w = np.random.random(
...: 20)
In [4]: %time logistic_loss(w, X, y)
CPU times: user 172 ms, sys: 220 ms, total: 392 ms
Wall time: 105 ms
Out[4]: 210159.75275204808
In [5]: import numba
In [6]: fast_logistic_loss = numba.jit(logistic_loss)
In [7]: %time fast_logistic_loss(w, X, y) # first time incurs compilation cost
CPU times: user 324 ms, sys: 216 ms, total: 540 ms
Wall time: 302 ms
Out[7]: 210159.75275204808
In [8]: %time fast_logistic_loss(w, X, y) # subsequent times are faster
CPU times: user 132 ms, sys: 212 ms, total: 344 ms
Wall time: 90.3 ms
Out[8]: 210159.75275204808
In [9]: %time fast_logistic_loss(w, X, y) # subsequent times are faster
CPU times: user 132 ms, sys: 216 ms, total: 348 ms
Wall time: 90.9 ms
Out[9]: 210159.75275204808
I like the idea of using numba here. I would hold back on it until we understand the computational bottleneck, before optimizing for speed.
Yea, and we should consolidate our loss / gradient / hessian calculations. This immediately brings up API considerations, so I'll leave that for the near-future.
@mrocklin thanks for the comments. This is really helpful.
Heh, that's quite a push :) Note that @moody-marlin's revert commits can now be removed. We've merged the persist PR into Dask. Also, in moody-marlin/dask-glm@master there were some flake8 commits by an external contributor. They may have some value.
@mrocklin yes! I am just making sure all tests pass before I consolidate some of the commits.
1. Keep `y` as a dask array rather than a numpy one (this may introduce a performance regression for threaded use)
2. Use dispatched `dot` rather than `ndarray.dot` in a few cases (see the sketch below)
3. Use numpy for the lstsq computation
- uses dask for x-update step in ADMM
- Flake8 fixes
…nsultan-master
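For reference, a minimal sketch of the "dispatched dot" idea from the commit summary above (illustrative only, not necessarily how dask_glm.utils implements it):

import numpy as np
import dask.array as da

def dot(A, B):
    # stay lazy when either operand is a dask array; fall back to numpy otherwise
    if isinstance(A, da.Array) or isinstance(B, da.Array):
        return da.dot(A, B)
    return np.dot(A, B)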
@moody-marlin @mrocklin could you please provide a review? All tests are passing on Python 2 and 3. Thank you!
    Xstep = dot(X, step)

    Xbeta, gradient, func, steplen, step, Xstep = da.compute(
        Xbeta, gradient, func, steplen, step, Xstep)
This line should be removed.
We want to avoid ever calling compute on a large result (this causes cluster-to-client communication). Instead we're starting to use persist.
This is the only function that we have optimized in this way so far.
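A small sketch of the compute-vs-persist distinction (assumes a dask array; persist pays off most with the distributed scheduler):

import dask.array as da

x = da.random.random((1000000, 100), chunks=(100000, 100))
total = (x - x.mean(axis=0)).sum()

small = total.compute()   # materializes a scalar on the client; fine for small results
x = x.persist()           # starts computing and keeps the blocks in memory; x stays a dask array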
Good catch. I will remove it in a separate commit.
If we can merge this soon, that would allow me to try optimizing a few other algorithms (I have some time today).
@mrocklin feel free to merge it once the tests pass. Thank you!
I think that @moody-marlin should probably make the final call on this.
I'm going to go ahead and merge this PR with the caveat that the ADMM code still needs further testing (for correctness and input shape / chunksize considerations), which will be my priority for the next two days, so this doesn't linger too long.
    RELTOL = 1e-2

    for k in range(MAX_ITER):
        beta_x = y.map_blocks(local_update, X, beta_old.T, z[:, 0], u.T, rho,
                              chunks=(5, 2)).compute()
I'm getting a ValueError here whenever I run this with my own data, I think possibly because the chunksize is hardcoded? To reproduce, try:

import numpy as np
import dask.array as da
from dask_glm.utils import make_y

X = da.random.random((1e6, 2), chunks=(1e4, 2))
y = make_y(X, beta=np.array([1.5, -3.]), chunks=(1e4, 1))

## need to expand y along a new axis
y = y[:, None]

logistic_regression(X, y, 0.1, 0.1, 0.1)

This can be fixed by making sure both X and y are split into 5 chunks (row-wise).
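A hedged sketch of that workaround, continuing from the snippet above (1e6 rows split into 5 blocks gives 200000 rows per block):

X = X.rechunk((200000, 2))   # 5 row-wise blocks, matching the hardcoded N = 5
y = y.rechunk((200000, 1))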
Thanks. Could we open an issue for this so I can handle it in a separate PR? Thank you!
See #8
    return beta


def logistic_regression(X, y, alpha, rho, over_relaxation):
Right now it appears your function accepts a y dask array whose shape is (X.shape[0], 1), whereas the other algorithms are currently implemented accepting y with shape (X.shape[0],). Maybe just add a line right at the beginning, y = y[:, None], so the convention agrees and your code runs as-is. We can dig deeper into this later.
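For example, a sketch of that normalization at the top of the function:

def logistic_regression(X, y, alpha, rho, over_relaxation):
    if y.ndim == 1:
        y = y[:, None]   # promote (n,) to (n, 1) so both calling conventions work
    ...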
This PR merges all the work done in forks and branches that @moody-marlin would like to retain.
I went ahead and rebased from the moody-marlin/dask-glm@no-api branch. I also cherry-picked some commits around flake8 from moody-marlin/dask-glm@master. Most of the code here (except some ADMM things) is not tested; tests will be added in future PRs. I tried my best to retain as much of the history as possible while squashing some commits together where it made sense.
resolves #4