[Official Link](https://docs.scipy.org/doc/scipy/tutorial/sparse.html)

### Sparse matrices can be stored in several formats, each with its own characteristics and use cases

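* The COO format supports some aggregation methods (such as `mean`), but it cannot be indexed or sliced (at least in the SciPy version used for this notebook)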

In [2]:
from scipy import sparse
import numpy as np

In [3]:
# A regular (dense) matrix
dense = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])

In [7]:
sparse_array = sparse.coo_array(dense)

In [14]:
sparse_array.mean(axis=0)

array([0.33333333, 1.33333333, 2.        , 0.66666667])

In [15]:
sparse_array[0,:]

TypeError: 'coo_array' object is not subscriptable
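COO (coordinate) format keeps three parallel arrays: one row index, one column index and one value per stored entry. There is no per-row index structure, so whole-array aggregations are straightforward but there is no fast way to locate a given row. A minimal sketch inspecting those arrays through `coo_array`'s `row`, `col` and `data` attributes; the entry order printed by the loop is whatever SciPy produced and is not relied on here:

```python
from scipy import sparse
import numpy as np

dense = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
coo = sparse.coo_array(dense)

# COO stores one (row, col, value) triple per nonzero entry
for r, c, v in zip(coo.row, coo.col, coo.data):
    print(f"({r}, {c}) -> {v}")

# Aggregations work directly on these triples; here, the row sums
row_sums = np.asarray(coo.sum(axis=1)).ravel()
print(row_sums)  # → [3 5 5]
```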

* The Compressed Sparse Row (**CSR**) format can be sliced

In [16]:
sparse_array.tocsr()[2:]

<1x4 sparse array of type '<class 'numpy.int64'>'
	with 1 stored elements in Compressed Sparse Row format>
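CSR can be sliced by row efficiently because it keeps a row-pointer array. The entries of row `i` live at positions `indptr[i]:indptr[i+1]` of `indices` (column numbers) and `data` (values). The sketch below reads one row by hand using those attributes. `row_entries` is a hypothetical helper written for illustration, not a SciPy function:

```python
from scipy import sparse
import numpy as np

dense = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
csr = sparse.csr_array(dense)

print(csr.indptr)   # → [0 2 4 5]
print(csr.indices)  # → [0 3 1 2 2]
print(csr.data)     # → [1 2 4 1 5]

def row_entries(m, i):
    # Hypothetical helper: return {column: value} for row i using the CSR arrays
    start, end = m.indptr[i], m.indptr[i + 1]
    return dict(zip(m.indices[start:end].tolist(), m.data[start:end].tolist()))

print(row_entries(csr, 1))  # → {1: 4, 2: 1}
```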

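* Multiplying two sparse matrices stored in one format can return a result in a different format: here, two COO arrays produce a CSR array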

In [17]:
sparse_array @ sparse_array.T

<3x3 sparse array of type '<class 'numpy.int64'>'
	with 5 stored elements in Compressed Sparse Row format>
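Because the result's format can differ from the inputs', it can be worth checking the `.format` attribute and converting explicitly with `asformat()` (or `tocoo()`, `tocsr()`, and so on) when a particular format is needed downstream. A short sketch; the format of `product` is left to SciPy and is only printed, not assumed:

```python
from scipy import sparse
import numpy as np

dense = np.array([[1, 0, 0, 2], [0, 4, 1, 0], [0, 0, 5, 0]])
a = sparse.coo_array(dense)

product = a @ a.T
print(a.format, product.format)  # output format depends on SciPy's implementation

# Convert explicitly when a specific format is required
product_coo = product.asformat("coo")
print(product_coo.format)  # → coo
print(product.toarray())
```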

* Build a sparse matrix by specifying the value stored at each (row, column) index

In [18]:
# csr[row[k], col[k]] = data[k] for every k
row = [0, 0, 1, 1, 2]
col = [0, 3, 1, 2, 2]
data = [1, 2, 4, 1, 5]

csr = sparse.csr_array((data, (row, col)))
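When no `shape` is given, SciPy infers it from the largest indices, here `(max(row) + 1, max(col) + 1) = (3, 4)`. Passing `shape=` explicitly keeps trailing empty rows or columns, for example when the matrix must match a known number of nodes. A sketch; the `(4, 5)` shape is an arbitrary illustration:

```python
from scipy import sparse

row = [0, 0, 1, 1, 2]
col = [0, 3, 1, 2, 2]
data = [1, 2, 4, 1, 5]

inferred = sparse.csr_array((data, (row, col)))
explicit = sparse.csr_array((data, (row, col)), shape=(4, 5))

print(inferred.shape)  # → (3, 4)
print(explicit.shape)  # → (4, 5)
# The extra row 3 and column 4 are present but store nothing
print(explicit.toarray())
```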

In [24]:
class VerifyError(Exception):
    def __init__(self, message: str) -> None:
        super().__init__(message)
        

In [25]:
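def verify(non_zero_data: list, sparse_mat) -> None:
    # Check that sparse_mat[row[k], col[k]] == non_zero_data[k] for every k;
    # row and col are the index lists defined in the cell above
    for r, c, value in zip(row, col, non_zero_data):
        if sparse_mat[r, c] != value:
            raise VerifyError('Failed to build the sparse matrix')
    print('Sparse matrix built successfully')

In [26]:
verify(data, csr)

Sparse matrix built successfully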
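The `verify` function checks entries one at a time. A vectorized alternative converts to a dense NumPy array and uses fancy indexing to pull out all the `(row[k], col[k])` entries at once. This sketch assumes the matrix is small enough to densify:

```python
from scipy import sparse
import numpy as np

row = [0, 0, 1, 1, 2]
col = [0, 3, 1, 2, 2]
data = [1, 2, 4, 1, 5]
csr = sparse.csr_array((data, (row, col)))

# Gather csr[row[k], col[k]] for every k from a dense copy and compare with data
stored = csr.toarray()[row, col]
print(np.array_equal(stored, data))  # → True
```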


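**Sparse matrices are especially well suited to representing the connectivity and edge weights between nodes of a network graph.** In that setting, an explicitly stored 0 can stand for an edge of weight 0, which is different from an entry that is not stored at all (no edge). `eliminate_zeros()` removes such explicit zeros in place.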

In [28]:
row = [0,0,1,1,2,2]
col = [0,3,1,2,2,3]
data = [1,2,4,1,5,0]

csr = sparse.csr_array((data, (row, col)))
csr

<3x4 sparse array of type '<class 'numpy.int64'>'
	with 6 stored elements in Compressed Sparse Row format>
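The "6 stored elements" include the explicit 0 at position (2, 3). `nnz` counts stored entries, while `count_nonzero()` counts only entries whose value is actually nonzero, so comparing the two is a quick way to detect explicit zeros:

```python
from scipy import sparse

row = [0, 0, 1, 1, 2, 2]
col = [0, 3, 1, 2, 2, 3]
data = [1, 2, 4, 1, 5, 0]
csr = sparse.csr_array((data, (row, col)))

print(csr.nnz)              # → 6 (stored entries, including the explicit zero)
print(csr.count_nonzero())  # → 5 (entries whose value is nonzero)
```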

In [29]:
csr.eliminate_zeros()
csr

<3x4 sparse array of type '<class 'numpy.int64'>'
	with 5 stored elements in Compressed Sparse Row format>
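Returning to the graph use case, `scipy.sparse.csgraph` runs graph algorithms directly on a sparse adjacency matrix, where entry `(i, j)` holds the weight of the edge from node `i` to node `j`. The 5-node graph below is made up for illustration, and `directed=False` treats each stored edge as undirected:

```python
from scipy import sparse
from scipy.sparse.csgraph import connected_components, shortest_path

# Hypothetical undirected graph: edges 0-1 (weight 2), 1-2 (weight 3), 3-4 (weight 1)
row = [0, 1, 3]
col = [1, 2, 4]
weight = [2.0, 3.0, 1.0]
adj = sparse.csr_array((weight, (row, col)), shape=(5, 5))

n_components, labels = connected_components(adj, directed=False)
print(n_components)  # → 2
print(labels)        # nodes 0-2 share one label, nodes 3-4 share another

dist = shortest_path(adj, directed=False)
print(dist[0, 2])    # → 5.0 (path 0 -> 1 -> 2)
print(dist[0, 3])    # → inf (different components)
```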

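* When a sparse matrix is built from `(data, (row, col))` and the same (row, col) pair appears more than once, a `coo_array` stores each duplicate as a separate entry and sums them only when converting to dense or aggregating. Building a CSR array this way, as below, sums the duplicates during construction, which is why `has_canonical_format` is `True` and entry (1, 1) is 1 + 3 = 4.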

In [44]:
row = [0,0,1,1,1,2]
col = [0,3,1,1,2,2]
data = [1,2,1,3,1,5]

dupes = sparse.csr_array((data, (row, col)))
dupes[2, 2]

5

In [45]:
dupes.todense()

array([[1, 0, 0, 2],
       [0, 4, 1, 0],
       [0, 0, 5, 0]])

In [46]:
dupes.has_canonical_format

True
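To see unmerged duplicates, build the same input as a `coo_array` and compare it with the CSR version. `has_canonical_format` and `sum_duplicates()` are existing SciPy attributes/methods; the counts in the comments follow from this particular input:

```python
from scipy import sparse

row = [0, 0, 1, 1, 1, 2]
col = [0, 3, 1, 1, 2, 2]
data = [1, 2, 1, 3, 1, 5]

dupes_coo = sparse.coo_array((data, (row, col)))
print(dupes_coo.nnz, dupes_coo.has_canonical_format)  # → 6 False (both (1, 1) entries stored)

dupes_csr = sparse.csr_array((data, (row, col)))
print(dupes_csr.nnz, dupes_csr.has_canonical_format)  # → 5 True (summed on construction)

# Merge duplicates in place; (1, 1) becomes 1 + 3 = 4
dupes_coo.sum_duplicates()
print(dupes_coo.nnz, dupes_coo.has_canonical_format)  # → 5 True
print(dupes_coo.toarray()[1, 1])                      # → 4
```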