Cholesky fails for sparse matrices #5373

Open
TomMaullin opened this issue Sep 8, 2019 · 9 comments
@TomMaullin

Hi, I am trying to run a Cholesky decomposition on a sparse matrix like so:

import numpy as np
import sparse
import dask.array as da

# Make a random sparse matrix in numpy; XtX = X^T X is then positive definite
X = np.random.randn(100, 100)
X[X < 1.7] = 0
X = X + np.eye(100)
XtX = np.matmul(X.transpose(), X)

# Map it to a dask array of sparse (COO) blocks
XtX_da = da.from_array(XtX, chunks=(10, 10)).map_blocks(sparse.COO).rechunk('auto')

# Perform the Cholesky decomposition
L_da_chol = da.linalg.cholesky(XtX_da).compute()

But I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-8ac43cb2d1f0> in <module>()
     66 
     67     t1 = time.time()
---> 68     L_da_chol = da.linalg.cholesky(XtX_da).compute()
     69     t2 = time.time()
     70     daptime=t2-t1

/usr/local/lib/python3.6/dist-packages/dask/base.py in compute(self, **kwargs)
    154         dask.base.compute
    155         """
--> 156         (result,) = compute(self, traverse=False, **kwargs)
    157         return result
    158 

/usr/local/lib/python3.6/dist-packages/dask/base.py in compute(*args, **kwargs)
    396     keys = [x.__dask_keys__() for x in collections]
    397     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 398     results = schedule(dsk, keys, **kwargs)
    399     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    400 

/usr/local/lib/python3.6/dist-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     74     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     75                         cache=cache, get_id=_thread_get_id,
---> 76                         pack_exception=pack_exception, **kwargs)
     77 
     78     # Cleanup pools associated to dead threads

/usr/local/lib/python3.6/dist-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    460                         _execute_task(task, data)  # Re-execute locally
    461                     else:
--> 462                         raise_exception(exc, tb)
    463                 res, worker_id = loads(res_info)
    464                 state['cache'][key] = res

/usr/local/lib/python3.6/dist-packages/dask/compatibility.py in reraise(exc, tb)
    110         if exc.__traceback__ is not tb:
    111             raise exc.with_traceback(tb)
--> 112         raise exc
    113 
    114     import pickle as cPickle

/usr/local/lib/python3.6/dist-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    228     try:
    229         task, data = loads(task_info)
--> 230         result = _execute_task(task, data)
    231         id = get_id()
    232         result = dumps((result, id))

/usr/local/lib/python3.6/dist-packages/dask/core.py in _execute_task(arg, cache, dsk)
    116     elif istask(arg):
    117         func, args = arg[0], arg[1:]
--> 118         args2 = [_execute_task(a, cache) for a in args]
    119         return func(*args2)
    120     elif not ishashable(arg):

/usr/local/lib/python3.6/dist-packages/dask/core.py in <listcomp>(.0)
    116     elif istask(arg):
    117         func, args = arg[0], arg[1:]
--> 118         args2 = [_execute_task(a, cache) for a in args]
    119         return func(*args2)
    120     elif not ishashable(arg):

/usr/local/lib/python3.6/dist-packages/dask/core.py in _execute_task(arg, cache, dsk)
    117         func, args = arg[0], arg[1:]
    118         args2 = [_execute_task(a, cache) for a in args]
--> 119         return func(*args2)
    120     elif not ishashable(arg):
    121         return arg

/usr/local/lib/python3.6/dist-packages/dask/array/linalg.py in _cholesky_lower(a)
    940 def _cholesky_lower(a):
    941     import scipy.linalg
--> 942     return scipy.linalg.cholesky(a, lower=True)
    943 
    944 

/usr/local/lib/python3.6/dist-packages/scipy/linalg/decomp_cholesky.py in cholesky(a, lower, overwrite_a, check_finite)
     89     """
     90     c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
---> 91                          check_finite=check_finite)
     92     return c
     93 

/usr/local/lib/python3.6/dist-packages/scipy/linalg/decomp_cholesky.py in _cholesky(a, lower, overwrite_a, clean, check_finite)
     17     """Common code for cholesky() and cho_factor()."""
     18 
---> 19     a1 = asarray_chkfinite(a) if check_finite else asarray(a)
     20     a1 = atleast_2d(a1)
     21 

/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
    493 
    494     """
--> 495     a = asarray(a, dtype=dtype, order=order)
    496     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    497         raise ValueError(

/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

/usr/local/lib/python3.6/dist-packages/sparse/_sparse_array.py in __array__(self, **kwargs)
    210         from ._settings import AUTO_DENSIFY
    211         if not AUTO_DENSIFY:
--> 212             raise RuntimeError('Cannot convert a sparse array to dense automatically. '
    213                                'To manually densify, use the todense method.')
    214 

RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.

This was run on a CPU in a Google Colab notebook; any help on resolving this would be greatly appreciated. Ideally, I'm hoping to find a solution that allows at least some of the computation to be done using sparse operations, as time efficiency is a big factor in what I am working on (the above is a toy example; my actual work uses much larger matrices).

@TomAugspurger
Member

I'm not aware of a cholesky decomposition implementation that works with sparse arrays.

@TomMaullin
Author

Update:

Apologies, this works if I set the following:

import os
os.environ["SPARSE_AUTO_DENSIFY"] = "1"
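
For completeness, a sketch of the full workaround, under the assumption that the environment variable has to be set before sparse is first imported (the setting appears to be read when the library loads). Note that the per-block scipy.linalg.cholesky calls still densify each block they touch, so the sparse format is only used for storage here:

import os
os.environ["SPARSE_AUTO_DENSIFY"] = "1"   # assumed to be needed before importing sparse

import numpy as np
import sparse
import dask.array as da

# Same toy example as above
X = np.random.randn(100, 100)
X[X < 1.7] = 0
X = X + np.eye(100)
XtX = np.matmul(X.transpose(), X)

XtX_da = da.from_array(XtX, chunks=(10, 10)).map_blocks(sparse.COO).rechunk('auto')
L_da_chol = da.linalg.cholesky(XtX_da).compute()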

@TomMaullin
Author

TomMaullin commented Sep 8, 2019

I believe the package cvxopt does a sparse Cholesky decomposition, but it uses its own sparse datatype.
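
For reference, a rough sketch of what that looks like with cvxopt's CHOLMOD bindings (assuming cvxopt is installed); the matrix has to be built as, or converted to, cvxopt's own spmatrix type, so this does not plug directly into dask or pydata/sparse arrays:

from cvxopt import cholmod, matrix, spmatrix

# Small symmetric positive definite matrix in cvxopt's triplet format
# (values, row indices, column indices); only the lower triangle is stored
A = spmatrix([4.0, 1.0, 3.0], [0, 1, 1], [0, 0, 1])
b = matrix([1.0, 2.0])

# Solve A x = b via a sparse Cholesky factorization; b is overwritten in place
cholmod.linsolve(A, b)
print(b)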

@mrocklin
Member

mrocklin commented Sep 8, 2019 via email

@TomAugspurger
Member

So if pydata/sparse implements a sparse Cholesky decomposition, what will dask need to do to use it? Right now we call scipy.linalg.cholesky. We would need to dispatch to pydata/sparse's implementation based on the type of the concrete array? If so, we can leave this issue open, but it's blocked until there's a sparse implementation we can use.
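
Purely as a hypothetical sketch of that dispatch (sparse.SparseArray is real, but the sparse branch is a placeholder since no pydata/sparse Cholesky exists yet), the per-block function could look something like:

def _cholesky_lower(a):
    import sparse
    if isinstance(a, sparse.SparseArray):
        # hypothetical branch: call a pydata/sparse Cholesky once one exists
        raise NotImplementedError("no sparse Cholesky implementation available yet")
    import scipy.linalg
    return scipy.linalg.cholesky(a, lower=True)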

@mrocklin
Member

If that happens (and it seems like it might be slightly far away) then I wonder if SciPy might be interested in adopting NEP-18 support. cc @rgommers
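
For context, NEP-18 (__array_function__) lets a NumPy-level function hand control to the array type it is given; a toy illustration with NumPy >= 1.17 (MyArray is made up here purely to show the protocol, and the idea being discussed is SciPy adopting something similar for functions like scipy.linalg.cholesky):

import numpy as np

class MyArray:
    def __array_function__(self, func, types, args, kwargs):
        # intercept NumPy API calls made on this array type
        if func is np.linalg.cholesky:
            return "a custom cholesky would run here"
        return NotImplemented

print(np.linalg.cholesky(MyArray()))  # -> "a custom cholesky would run here"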

@rgommers
Contributor

We are at least interested in exploring it: http://scipy.github.io/devdocs/roadmap.html#support-for-distributed-arrays-and-gpu-arrays

@MickaelRigault

Hi, I don't know if it helps, but https://scikit-sparse.readthedocs.io/en/latest/cholmod.html is an implementation of a sparse Cholesky decomposition (Cython). We use it quite a lot. If it could be natively supported in Dask, that would be very useful.
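
For anyone landing here, a rough sketch of the scikit-sparse / CHOLMOD usage on a single scipy.sparse matrix (assuming scikit-sparse and SuiteSparse are installed; note CHOLMOD applies a fill-reducing permutation, so factor.L() is the factor of a permuted matrix):

import numpy as np
import scipy.sparse
from sksparse.cholmod import cholesky

# Small sparse symmetric positive definite matrix in CSC format
A = scipy.sparse.csc_matrix(np.array([[4.0, 1.0, 0.0],
                                      [1.0, 3.0, 0.0],
                                      [0.0, 0.0, 2.0]]))

factor = cholesky(A)      # sparse Cholesky factorization via CHOLMOD
x = factor(np.ones(3))    # solve A x = b using the factorization
L = factor.L()            # lower-triangular factor of the permuted matrix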

@jsignell
Member

If you are interested in taking this on @MickaelRigault a pull request would be welcome :)
