When working with sparse matrices, it is desirable to work with them as if they were regular numpy.array objects. Yet many popular array methods don't exist for sparse matrices. spartans aims to help by providing many such operations.
Full example notebook
- Free software: GNU General Public License v3
- Documentation: https://spartans.readthedocs.io.
- Mathematical Operations
  - Rich set of operations not supported on sparse matrices, like variance, cov (covariance matrix) and corrcoef (correlation matrix).
- Easy Indexing
  - Convenient methods to index for "extra" sparse features by variance or by quantity.
- Masking
  - Many algorithms treat the zeros in a sparse matrix as missing data, while others treat missing data as zeros, depending on the use case.
- FeatureMatrix
  - FeatureMatrix is spartans' first-class citizen. It is a wrapper around a scipy.sparse CSR matrix, built with data analysis and data science in mind.
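To make the first and third items concrete, here is a minimal sketch of two such operations written with plain numpy and scipy. It is not spartans' implementation; the helper names column_variance and column_mean_ignoring_zeros are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def column_variance(x):
    """Population variance per column, zeros counted as values (hypothetical helper)."""
    n_rows = x.shape[0]
    mean = np.asarray(x.sum(axis=0)).ravel() / n_rows
    # The element-wise square keeps the matrix sparse.
    mean_of_squares = np.asarray(x.multiply(x).sum(axis=0)).ravel() / n_rows
    # Var[x] = E[x^2] - E[x]^2
    return mean_of_squares - mean ** 2


def column_mean_ignoring_zeros(x):
    """Mean of the stored entries per column, zeros treated as missing (hypothetical helper)."""
    # getnnz counts stored entries, so this assumes no explicitly stored zeros.
    counts = x.getnnz(axis=0)
    sums = np.asarray(x.sum(axis=0)).ravel()
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts  # nan for columns without any stored entry


dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 4.0],
                  [3.0, 0.0, 0.0]])
x_sparse = csr_matrix(dense)

variances = column_variance(x_sparse)            # same values as dense.var(axis=0)
masked_means = column_mean_ignoring_zeros(x_sparse)
```

Neither helper converts the matrix to a dense array. The E[x^2] - E[x]^2 form can lose precision when the mean is large relative to the spread, so treat it as an illustration rather than a numerically robust routine.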
>>> import spartans as st
>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> m = np.array([[1, -2, 0, 50],
...               [0, 0, 0, 100],
...               [1, 0, 0, 80],
...               [1, 4, 0, 0],
...               [0, 0, 0, 0],
...               [0, 4, 0, 0],
...               [0, 0, 0, -50]])
>>> c = csr_matrix(m)
We can get the correlation matrix of m using numpy:
>>> np.corrcoef(m, rowvar=False)
Out[]: array([[ 1. , -0.08, nan, 0.31],
[-0.08, 1. , nan, -0.35],
[ nan, nan, nan, nan],
[ 0.31, -0.35, nan, 1. ]])
This won't work with the sparse matrix c:
>>> np.corrcoef(c, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'
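For reference, the correlation matrix can be derived from sparse products alone, without converting c to a dense array. The following is a minimal sketch using only numpy and scipy; sparse_corrcoef is a hypothetical name, and this is not how spartans is known to implement it.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_corrcoef(x):
    """Column-wise Pearson correlation of a sparse matrix (hypothetical helper)."""
    x = x.astype(np.float64)
    n_rows = x.shape[0]
    col_sum = np.asarray(x.sum(axis=0)).ravel()
    # x.T @ x is a sparse product; only the (n_cols, n_cols) result is densified.
    gram = (x.T @ x).toarray()
    cov = (gram - np.outer(col_sum, col_sum) / n_rows) / (n_rows - 1)
    std = np.sqrt(np.diag(cov))
    # Zero-variance columns produce 0/0 here, which gives nan as in numpy.
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.outer(std, std)


m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]])
c = csr_matrix(m)

corr_from_sparse = sparse_corrcoef(c)
```

The result agrees with np.corrcoef(m, rowvar=False), including the nan entries for the all-zero column.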
With spartans, this can be done directly on the sparse matrix:
>>> st.corr(c)
Out[]: array([[ 1. , -0.08, nan, 0.31],
[-0.08, 1. , nan, -0.35],
[ nan, nan, nan, nan],
[ 0.31, -0.35, nan, 1. ]])
The row and column of nan values appear because the original matrix has a column (feature) that is zero in every row. spartans can handle that using st.non_zero_index(c, axis=0, as_bool=False), which returns array([0, 1, 3]).
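The same column selection can be sketched with scipy alone, which also shows why dropping the all-zero column removes the nan values. This is an illustrative equivalent under the assumption that the matrix stores no explicit zeros; it is not spartans' code.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]])
c = csr_matrix(m)

# Drop explicitly stored zeros (in place) so that stored entries are true non-zeros.
c.eliminate_zeros()

# Indices of the columns that hold at least one non-zero entry.
nonzero_columns = np.flatnonzero(c.getnnz(axis=0))

# Keep only those columns; the result is still a sparse matrix.
c_reduced = c[:, nonzero_columns]

# The reduced matrix is small here, so it is densified only for this check.
corr_reduced = np.corrcoef(c_reduced.toarray(), rowvar=False)
```

Here nonzero_columns is array([0, 1, 3]), and corr_reduced is a 3x3 matrix without nan entries.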
A lot more functionality is in the notebook.
- This open-source project is backed by SentinelOne
- This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.