Merge branches/forks #2

Merged: 18 commits into dask:master on Jan 26, 2017

Conversation

@hussainsultan (Collaborator) commented on Jan 19, 2017:

This PR merges all the work done in forks and branches that @moody-marlin would like to retain.

  • BFGS
  • gradient descent
  • Newton's method
  • proximal gradient descent
  • ADMM for the L1-regularized logistic regression problem

I went ahead and rebased from the moody-marlin/dask-glm@no-api branch. I also cherry-picked some commits around flake8 from moody-marlin/dask-glm@master.

Most of the code here (except some of the ADMM pieces) is not yet tested; tests will be added in future PRs. I tried my best to retain as much of the history as possible while squashing some commits together where it made sense.

resolves #4

@hussainsultan (Collaborator, Author):

Travis is failing; I need to investigate why the tests fail on Travis while they pass on my local machine.

@mrocklin (Member) left a comment:

This looks cool. I'm not yet familiar with the algorithm but hopefully the comments here on computation can be of use.



def logistic_regression(X, y, alpha, rho, over_relaxation):
    N = 5
Member:

What is this N parameter?

Collaborator (Author):

N is the number of equal-sized chunks; it needs to be calculated from the dask array rather than hardcoded.
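For reference, a minimal sketch of deriving N from the dask array's chunk structure instead of hardcoding it (the array and chunk sizes below are only illustrative):

import numpy as np
import dask.array as da

X = da.from_array(np.random.random((100, 2)), chunks=(20, 2))

# Number of row-wise blocks (equivalently len(X.chunks[0])), rather than a hardcoded N = 5:
N = X.numblocks[0]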

RELTOL = 1e-2

for k in range(MAX_ITER):
    beta_x = y.map_blocks(local_update, X, beta_old.T, z[:, 0], u.T, rho, chunks=(5, 2)).compute()
Member:

If we are only using dask.array for map_blocks then it might also make sense to use dask.delayed. Keeping all arrays as numpy arrays (or lists of numpy arrays for X and y) might make things more explicit.

Member:

Separately, it might be nice to allow this function to be used with numpy inputs rather than require that the inputs be dask.arrays.
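For illustration, a minimal sketch of one way to accept either kind of input (the helper name and chunking below are hypothetical, not part of this PR):

import numpy as np
import dask.array as da

def as_dask_array(arr, chunks=None):
    # Pass dask arrays through unchanged; wrap plain numpy arrays otherwise.
    if isinstance(arr, da.Array):
        return arr
    return da.from_array(arr, chunks=chunks or arr.shape)

X = as_dask_array(np.random.random((100, 2)), chunks=(20, 2))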

Collaborator (Author):

If using delayed, would that mean abandoning dask.array as an input? I think it makes sense, but I just want to make sure I am reading it correctly.

Member:

You would do something like the following:

import dask.array as da

if isinstance(X, da.Array):
    X = X.to_delayed()
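To make the suggestion concrete, a minimal sketch of working with the resulting blocks through dask.delayed (the per-block function here is a stand-in, not the PR's local_update, and the reduction is only illustrative):

import dask
import dask.array as da

X = da.random.random((100, 2), chunks=(20, 2))
y = da.random.random((100, 1), chunks=(20, 1))

X_blocks = X.to_delayed().ravel()  # numpy array of delayed blocks, one per chunk
y_blocks = y.to_delayed().ravel()

def block_stat(xb, yb):
    # Placeholder for a per-block update; receives plain numpy arrays.
    return xb.sum() + yb.sum()

partials = [dask.delayed(block_stat)(xb, yb) for xb, yb in zip(X_blocks, y_blocks)]
total = dask.delayed(sum)(partials).compute()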

Member:

This really doesn't matter. The two methods are equivalent. It's just a question as to what is easier for people to reason about. If you prefer dask.array then that's probably best.

    return 1 / (1 + np.exp(-x))


def logistic_loss(w, X, y):
Member:

We might consider decorating functions like this with numba.jit if available
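For the "if available" part, a minimal sketch using a no-op fallback when numba is not installed (a common pattern, shown here only as an illustration):

import numpy as np

try:
    from numba import jit
except ImportError:
    # numba is optional: fall back to a decorator that does nothing.
    def jit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@jit
def sigmoid(x):
    return 1 / (1 + np.exp(-x))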

Member:

(Although for this particular function it doesn't seem to help much; presumably most of the cost is in the dot call.)

In [1]: def logistic_loss(w, X, y):
   ...:     y = y.ravel()
   ...:     z = X.dot(w)
   ...:     yz = y * z
   ...:     idx = yz > 0
   ...:     out = np.zeros_like(yz)
   ...:     out[idx] = np.log(1 + np.exp(-yz[idx]))
   ...:     out[~idx] = (-yz[~idx] + np.log(1 + np.exp(yz[~idx])))
   ...:     out = out.sum()
   ...:     return out
   ...: 

In [2]: import numpy as np

In [3]: X = np.random.random((1000000, 20)); y = np.random.random(1000000); w = np.random.random(
   ...: 20)

In [4]: %time logistic_loss(w, X, y)
CPU times: user 172 ms, sys: 220 ms, total: 392 ms
Wall time: 105 ms
Out[4]: 210159.75275204808

In [5]: import numba

In [6]: fast_logistic_loss = numba.jit(logistic_loss)

In [7]: %time fast_logistic_loss(w, X, y)  # first time incurs compilation cost
CPU times: user 324 ms, sys: 216 ms, total: 540 ms
Wall time: 302 ms
Out[7]: 210159.75275204808

In [8]: %time fast_logistic_loss(w, X, y)  # subsequent times are faster
CPU times: user 132 ms, sys: 212 ms, total: 344 ms
Wall time: 90.3 ms
Out[8]: 210159.75275204808

In [9]: %time fast_logistic_loss(w, X, y)  # subsequent times are faster
CPU times: user 132 ms, sys: 216 ms, total: 348 ms
Wall time: 90.9 ms
Out[9]: 210159.75275204808

Collaborator (Author):

I like the idea of using numba here, but I would hold back on it until we understand the computational bottlenecks before optimizing for speed.

Collaborator:

Yea, and we should consolidate our loss / gradient / hessian calculations. This immediately brings up API considerations, so I'll leave that for the near-future.

@hussainsultan (Collaborator, Author):

@mrocklin thanks for the comments. This is really helpful.

@mrocklin (Member):

Heh, that's quite a push :)

Note that @moody-marlin's revert commits can now be removed. We've merged the persist PR into Dask.

Also, in moody-marlin/dask-glm@master there were some flake8 commits by an external contributor. They may have some value.

@hussainsultan (Collaborator, Author):

@mrocklin yes! I am just making sure all tests pass before I consolidate some of the commits.

@hussainsultan changed the title from "WIP: Add initial implementation of logistic regression with l1 penalty" to "WIP: Merge branches/forks" on Jan 26, 2017.
@hussainsultan (Collaborator, Author):

@moody-marlin @mrocklin could you please provide a review? All tests are passing on Python 2 and 3. Thank you!

Xstep = dot(X, step)

Xbeta, gradient, func, steplen, step, Xstep = da.compute(
    Xbeta, gradient, func, steplen, step, Xstep)
Member:

This line should be removed.

Member:

We want to avoid ever calling compute on a large result (this causes cluster-to-client communication). Instead, we're starting to use persist.
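For illustration, a minimal sketch of the compute-vs-persist distinction (the array and computation below are placeholders, not the PR's variables):

import dask.array as da

x = da.random.random((100000, 20), chunks=(10000, 20))
beta = x.mean(axis=0)

# compute() materializes the result on the client as a numpy array:
beta_np = beta.compute()

# persist() keeps the result as a dask array whose chunks stay on the workers,
# so later steps can reuse it without pulling it back to the client:
beta = beta.persist()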

Member:

This is the only function that we have optimized in this way so far

Collaborator (Author):

Good catch. I will remove it in a separate commit.

@mrocklin (Member):

If we can merge this soon that would allow me to try optimizing a few other algorithms (I have some time today).

@hussainsultan (Collaborator, Author):

@mrocklin feel free to merge it once the tests pass. Thank you!

@mrocklin (Member):

I think that @moody-marlin should probably make the final call on this.

@hussainsultan changed the title from "WIP: Merge branches/forks" to "Merge branches/forks" on Jan 26, 2017.
@cicdw (Collaborator) left a comment:

I'm going to go ahead and merge this PR with the caveat that the ADMM code still needs further testing (for correctness and input shape / chunksize considerations), which will be my priority for the next two days so this doesn't linger too long.

RELTOL = 1e-2

for k in range(MAX_ITER):
    beta_x = y.map_blocks(local_update, X, beta_old.T, z[:, 0], u.T, rho,
Collaborator:

I'm getting a ValueError here whenever I run this with my own data; I think it's because the chunk size is hardcoded. To reproduce, try:

from dask_glm.utils import make_y
import dask.array as da
import numpy as np

X = da.random.random((1e6, 2), chunks=(1e4, 2))
y = make_y(X, beta=np.array([1.5, -3.]), chunks=(1e4, 1))

## need to expand y along a new axis
y = y[:, None]
logistic_regression(X, y, 0.1, 0.1, 0.1)

This can be fixed by making sure both X and y are split into 5 chunks (row-wise).
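For illustration, a minimal sketch of constructing X and y with matching row-wise chunking (the sizes below are only illustrative):

import dask.array as da

X = da.random.random((1000000, 2), chunks=(200000, 2))  # 5 row-wise blocks
y = da.random.random((1000000, 1), chunks=(200000, 1))  # matching 5 blocks

assert X.numblocks[0] == y.numblocks[0] == 5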

Collaborator (Author):

Thanks. Could we open an issue for this so I can handle it in a separate PR? Thank you!

Collaborator:

See #8

    return beta


def logistic_regression(X, y, alpha, rho, over_relaxation):
Collaborator:

Right now it appears your function accepts a y dask array whose shape is (X.shape[0], 1), whereas the other algorithms currently accept y with shape (X.shape[0],). Maybe just add a line y = y[:, None] right at the beginning so the conventions agree and your code runs as-is. We can dig deeper into this later.
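A minimal sketch of that normalization at the top of the function (the rest of the body is elided):

def logistic_regression(X, y, alpha, rho, over_relaxation):
    # Accept y of shape (n,) or (n, 1); use (n, 1) internally.
    if y.ndim == 1:
        y = y[:, None]
    ...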


@cicdw merged commit be8fb6f into dask:master on Jan 26, 2017.
@cicdw mentioned this pull request on Jan 26, 2017.
TomAugspurger pushed a commit that referenced this pull request on Sep 25, 2020.