Cholesky fails for sparse matrices #5373

Open
TomMaullin opened this issue Sep 8, 2019 · 9 comments
@TomMaullin

Hi, I am trying to run a Cholesky decomposition on a sparse matrix like so:

import numpy as np
import sparse
import dask.array as da

# Make a random sparse matrix in numpy; XtX = X^T X is then positive definite
X = np.random.randn(100, 100)
X[X < 1.7] = 0
X = X + np.eye(100)
XtX = np.matmul(X.transpose(), X)

# Map it to a dask array of sparse (COO) blocks
XtX_da = da.from_array(XtX, chunks=(10, 10)).map_blocks(sparse.COO).rechunk('auto')

# Perform the Cholesky decomposition
L_da_chol = da.linalg.cholesky(XtX_da).compute()

But I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-8ac43cb2d1f0> in <module>()
     66 
     67     t1 = time.time()
---> 68     L_da_chol = da.linalg.cholesky(XtX_da).compute()
     69     t2 = time.time()
     70     daptime=t2-t1

/usr/local/lib/python3.6/dist-packages/dask/base.py in compute(self, **kwargs)
    154         dask.base.compute
    155         """
--> 156         (result,) = compute(self, traverse=False, **kwargs)
    157         return result
    158 

/usr/local/lib/python3.6/dist-packages/dask/base.py in compute(*args, **kwargs)
    396     keys = [x.__dask_keys__() for x in collections]
    397     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 398     results = schedule(dsk, keys, **kwargs)
    399     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    400 

/usr/local/lib/python3.6/dist-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     74     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     75                         cache=cache, get_id=_thread_get_id,
---> 76                         pack_exception=pack_exception, **kwargs)
     77 
     78     # Cleanup pools associated to dead threads

/usr/local/lib/python3.6/dist-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    460                         _execute_task(task, data)  # Re-execute locally
    461                     else:
--> 462                         raise_exception(exc, tb)
    463                 res, worker_id = loads(res_info)
    464                 state['cache'][key] = res

/usr/local/lib/python3.6/dist-packages/dask/compatibility.py in reraise(exc, tb)
    110         if exc.__traceback__ is not tb:
    111             raise exc.with_traceback(tb)
--> 112         raise exc
    113 
    114     import pickle as cPickle

/usr/local/lib/python3.6/dist-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    228     try:
    229         task, data = loads(task_info)
--> 230         result = _execute_task(task, data)
    231         id = get_id()
    232         result = dumps((result, id))

/usr/local/lib/python3.6/dist-packages/dask/core.py in _execute_task(arg, cache, dsk)
    116     elif istask(arg):
    117         func, args = arg[0], arg[1:]
--> 118         args2 = [_execute_task(a, cache) for a in args]
    119         return func(*args2)
    120     elif not ishashable(arg):

/usr/local/lib/python3.6/dist-packages/dask/core.py in <listcomp>(.0)
    116     elif istask(arg):
    117         func, args = arg[0], arg[1:]
--> 118         args2 = [_execute_task(a, cache) for a in args]
    119         return func(*args2)
    120     elif not ishashable(arg):

/usr/local/lib/python3.6/dist-packages/dask/core.py in _execute_task(arg, cache, dsk)
    117         func, args = arg[0], arg[1:]
    118         args2 = [_execute_task(a, cache) for a in args]
--> 119         return func(*args2)
    120     elif not ishashable(arg):
    121         return arg

/usr/local/lib/python3.6/dist-packages/dask/array/linalg.py in _cholesky_lower(a)
    940 def _cholesky_lower(a):
    941     import scipy.linalg
--> 942     return scipy.linalg.cholesky(a, lower=True)
    943 
    944 

/usr/local/lib/python3.6/dist-packages/scipy/linalg/decomp_cholesky.py in cholesky(a, lower, overwrite_a, check_finite)
     89     """
     90     c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
---> 91                          check_finite=check_finite)
     92     return c
     93 

/usr/local/lib/python3.6/dist-packages/scipy/linalg/decomp_cholesky.py in _cholesky(a, lower, overwrite_a, clean, check_finite)
     17     """Common code for cholesky() and cho_factor()."""
     18 
---> 19     a1 = asarray_chkfinite(a) if check_finite else asarray(a)
     20     a1 = atleast_2d(a1)
     21 

/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
    493 
    494     """
--> 495     a = asarray(a, dtype=dtype, order=order)
    496     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    497         raise ValueError(

/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

/usr/local/lib/python3.6/dist-packages/sparse/_sparse_array.py in __array__(self, **kwargs)
    210         from ._settings import AUTO_DENSIFY
    211         if not AUTO_DENSIFY:
--> 212             raise RuntimeError('Cannot convert a sparse array to dense automatically. '
    213                                'To manually densify, use the todense method.')
    214 

RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.

This was run on a CPU in a Google Colab notebook; any help on resolving this would be greatly appreciated. Ideally, I'm hoping to find a solution that allows at least some of the computation to be done using sparse operations, as time efficiency is a big factor in what I am working on (the above is a toy example; my actual work uses much larger matrices).

@TomAugspurger
Member

I'm not aware of a cholesky decomposition implementation that works with sparse arrays.

@TomMaullin
Author

Update:

Apologies, this works if I set the following:

import os
os.environ["SPARSE_AUTO_DENSIFY"] = "1"
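
For completeness, a sketch of the full workaround, under the assumption that the environment variable has to be set before sparse is first imported (the setting appears to be read when the library loads). Note that the per-block scipy.linalg.cholesky calls still densify each block they touch, so the sparse format is only used for storage here:

import os
os.environ["SPARSE_AUTO_DENSIFY"] = "1"   # assumed to be needed before importing sparse

import numpy as np
import sparse
import dask.array as da

# Same toy example as above
X = np.random.randn(100, 100)
X[X < 1.7] = 0
X = X + np.eye(100)
XtX = np.matmul(X.transpose(), X)

XtX_da = da.from_array(XtX, chunks=(10, 10)).map_blocks(sparse.COO).rechunk('auto')
L_da_chol = da.linalg.cholesky(XtX_da).compute()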

@TomMaullin
Author

TomMaullin commented Sep 8, 2019

I believe the package cvxopt does a sparse Cholesky decomposition, but it uses its own sparse datatype.
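
For reference, a rough sketch of what that looks like with cvxopt's CHOLMOD bindings (assuming cvxopt is installed); the matrix has to be built as, or converted to, cvxopt's own spmatrix type, so this does not plug directly into dask or pydata/sparse arrays:

from cvxopt import cholmod, matrix, spmatrix

# Small symmetric positive definite matrix in cvxopt's triplet format
# (values, row indices, column indices); only the lower triangle is stored
A = spmatrix([4.0, 1.0, 3.0], [0, 1, 1], [0, 0, 1])
b = matrix([1.0, 2.0])

# Solve A x = b via a sparse Cholesky factorization; b is overwritten in place
cholmod.linsolve(A, b)
print(b)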

@mrocklin
Member

mrocklin commented Sep 8, 2019 via email

@TomAugspurger
Member

So if pydata/sparse implements a sparse Cholesky decomposition, what will dask need to do to use it? Right now we call scipy.linalg.cholesky. We would need to dispatch to pydata/sparse's implementation based on the type of the concrete array? If so, we can leave this issue open, but it's blocked until there's a sparse implementation we can use.
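
Purely as a hypothetical sketch of that dispatch (sparse.SparseArray is real, but the sparse branch is a placeholder since no pydata/sparse Cholesky exists yet), the per-block function could look something like:

def _cholesky_lower(a):
    import sparse
    if isinstance(a, sparse.SparseArray):
        # hypothetical branch: call a pydata/sparse Cholesky once one exists
        raise NotImplementedError("no sparse Cholesky implementation available yet")
    import scipy.linalg
    return scipy.linalg.cholesky(a, lower=True)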

@mrocklin
Member

If that happens (and it seems like it might be slightly far away) then I wonder if SciPy might be interested in adopting NEP-18 support. cc @rgommers
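
For context, NEP-18 (__array_function__) lets a NumPy-level function hand control to the array type it is given; a toy illustration with NumPy >= 1.17 (MyArray is made up here purely to show the protocol, and the idea being discussed is SciPy adopting something similar for functions like scipy.linalg.cholesky):

import numpy as np

class MyArray:
    def __array_function__(self, func, types, args, kwargs):
        # intercept NumPy API calls made on this array type
        if func is np.linalg.cholesky:
            return "a custom cholesky would run here"
        return NotImplemented

print(np.linalg.cholesky(MyArray()))  # -> "a custom cholesky would run here"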

@rgommers
Contributor

We are at least interested in exploring it: http://scipy.github.io/devdocs/roadmap.html#support-for-distributed-arrays-and-gpu-arrays

@MickaelRigault

Hi, I don't know if it helps, but https://scikit-sparse.readthedocs.io/en/latest/cholmod.html is an implementation of a sparse Cholesky decomposition (Cython). We use it quite a lot. If it could be natively supported in Dask, that would be very useful.
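
For anyone landing here, a rough sketch of the scikit-sparse / CHOLMOD usage on a single scipy.sparse matrix (assuming scikit-sparse and SuiteSparse are installed; note CHOLMOD applies a fill-reducing permutation, so factor.L() is the factor of a permuted matrix):

import numpy as np
import scipy.sparse
from sksparse.cholmod import cholesky

# Small sparse symmetric positive definite matrix in CSC format
A = scipy.sparse.csc_matrix(np.array([[4.0, 1.0, 0.0],
                                      [1.0, 3.0, 0.0],
                                      [0.0, 0.0, 2.0]]))

factor = cholesky(A)      # sparse Cholesky factorization via CHOLMOD
x = factor(np.ones(3))    # solve A x = b using the factorization
L = factor.L()            # lower-triangular factor of the permuted matrix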

@jsignell
Member

If you are interested in taking this on @MickaelRigault a pull request would be welcome :)
