In [3]:
import scipy.sparse as sps
import numpy as np

## scipy.sparse --- Overview

##### Sparse matrix classes
    bsr_matrix - Block Sparse Row matrix
    coo_matrix - A sparse matrix in COOrdinate format
    csc_matrix - Compressed Sparse Column matrix
    csr_matrix - Compressed Sparse Row matrix
    dia_matrix - Sparse matrix with DIAgonal storage
    dok_matrix - Dictionary Of Keys based sparse matrix
    lil_matrix - Row-based list of lists sparse matrix
    spmatrix - Sparse matrix base class

##### Submodules
    csgraph - Compressed sparse graph routines
    linalg - sparse linear algebra routines


##### Exception classes
    SparseEfficiencyWarning
    SparseWarning

##### Functions
    eye - Sparse MxN matrix whose k-th diagonal is all ones
    identity - Identity matrix in sparse format
    kron - kronecker product of two sparse matrices
    kronsum - kronecker sum of sparse matrices
    diags - Return a sparse matrix from diagonals
    spdiags - Return a sparse matrix from diagonals
    block_diag - Build a block diagonal sparse matrix
    tril - Lower triangular portion of a matrix in sparse format
    triu - Upper triangular portion of a matrix in sparse format
    bmat - Build a sparse matrix from sparse sub-blocks
    hstack - Stack sparse matrices horizontally (column wise)
    vstack - Stack sparse matrices vertically (row wise)
    rand - Random values in a given shape
    random - Random values in a given shape
    find


##### Saving and loading sparse matrices
    save_npz - Save a sparse matrix to a file using `.npz` format.
    load_npz - Load a sparse matrix from a file using `.npz` format.


##### Identifying sparse matrices
    issparse
    isspmatrix
    isspmatrix_csc
    isspmatrix_csr
    isspmatrix_bsr
    isspmatrix_lil
    isspmatrix_dok
    isspmatrix_coo
    isspmatrix_dia


##### DATA
    __all__ = ['SparseEfficiencyWarning', 'SparseWarning', 'base', 'block_...


### Usage

SciPy provides 7 sparse matrix types:
1. csc_matrix: Compressed Sparse Column format
2. csr_matrix: Compressed Sparse Row format
3. bsr_matrix: Block Sparse Row format
4. lil_matrix: List of Lists format
5. dok_matrix: Dictionary of Keys format
6. coo_matrix: COOrdinate format (aka IJV, triplet format)
7. dia_matrix: DIAgonal format

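To build a matrix efficiently, use `dok_matrix` or `lil_matrix`. `lil_matrix` supports NumPy-like syntax such as slicing and indexing.

Although these matrices resemble NumPy arrays, applying NumPy functions to them directly is discouraged: NumPy may not convert them correctly for the computation and can return incorrect results. To apply a NumPy function, first check whether the SciPy sparse class provides its own implementation, or convert the sparse matrix to a NumPy array beforehand (e.g. with `toarray()`). For operations such as multiplication or inversion, first convert the matrix to `CSC` or `CSR` format. Because `lil_matrix` is row-based, converting it to `CSR` is efficient, while converting it to `CSC` is somewhat less so. All conversions among the `CSR`, `CSC`, and `COO` formats take linear time.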
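A minimal sketch of the workflow described above, with an arbitrary 4x4 matrix chosen only for illustration: build in LIL format, convert to CSR before multiplying, and either use the sparse class's own method or convert to a dense array before calling a NumPy function.

```python
import numpy as np
import scipy.sparse as sps

# Build incrementally in LIL format, which supports NumPy-style indexing.
A = sps.lil_matrix((4, 4))
A[0, :2] = [1.0, 2.0]
A[2, 3] = 5.0
A.setdiag([1.0, 1.0, 1.0, 1.0])  # overwrites the diagonal entries

# Convert to CSR before doing arithmetic such as matrix-vector products.
A_csr = A.tocsr()
y = A_csr @ np.ones(4)

# Prefer the sparse class's own method (.sum()), or convert to a dense
# array explicitly before applying a NumPy function.
total_sparse = A_csr.sum()
total_dense = np.sum(A_csr.toarray())
```

Both totals agree here; the dense route is only reasonable when the matrix is small enough to fit in memory as a full array.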


#### Matrix vector product

The CSR format is especially suitable for fast matrix vector products.

Example 1

Construct a 1000x1000 lil_matrix and add some values to it:
```python
>>> from scipy.sparse import lil_matrix
>>> from scipy.sparse.linalg import spsolve
>>> from numpy.linalg import solve, norm
>>> from numpy.random import rand

>>> A = lil_matrix((1000, 1000))
>>> A[0, :100] = rand(100)
>>> A[1, 100:200] = A[0, :100]
>>> A.setdiag(rand(1000))
```
Now convert it to CSR format and solve A x = b for x:
```python
>>> A = A.tocsr()
>>> b = rand(1000)
>>> x = spsolve(A, b)
```
Convert it to a dense matrix and solve, and check that the result
is the same:
```python
>>> x_ = solve(A.toarray(), b)
```
Now we can compute the norm of the error with:
```python
>>> err = norm(x-x_)
>>> err < 1e-10
True
```
It should be small :)


Example 2

Construct a matrix in COO format:
```python
>>> from scipy import sparse
>>> from numpy import array
>>> I = array([0,3,1,0])
>>> J = array([0,3,1,2])
>>> V = array([4,5,7,9])
>>> A = sparse.coo_matrix((V,(I,J)),shape=(4,4))
```
Notice that the indices do not need to be sorted.

Duplicate (i,j) entries are summed when converting to CSR or CSC.
```python
>>> I = array([0,0,1,3,1,0,0])
>>> J = array([0,2,1,3,1,0,0])
>>> V = array([1,1,1,1,1,1,1])
>>> B = sparse.coo_matrix((V,(I,J)),shape=(4,4)).tocsr()
```
This is useful for constructing finite-element stiffness and mass matrices.
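To make the summing of duplicates concrete, the sketch below repeats the same construction as a standalone script and prints the dense result. The position (0, 0) appears three times and (1, 1) twice in the input.

```python
import numpy as np
from scipy import sparse

I = np.array([0, 0, 1, 3, 1, 0, 0])
J = np.array([0, 2, 1, 3, 1, 0, 0])
V = np.array([1, 1, 1, 1, 1, 1, 1])

# Duplicate (i, j) entries are summed during the COO -> CSR conversion.
B = sparse.coo_matrix((V, (I, J)), shape=(4, 4)).tocsr()
dense = B.toarray()
print(dense)
# [[3 0 1 0]
#  [0 2 0 0]
#  [0 0 0 0]
#  [0 0 0 1]]
```

Seven input triplets collapse into four stored entries, which is the same accumulation pattern used when element contributions are assembled into a global matrix.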

### Further details

CSR column indices are not necessarily sorted. Likewise for CSC row
indices. Use the .sorted_indices() and .sort_indices() methods when
sorted indices are required (e.g., when passing data to other libraries).
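A small sketch of the two methods, using hand-built CSR arrays whose column indices are deliberately unsorted within each row (the values are illustrative only):

```python
import numpy as np
import scipy.sparse as sps

# Row 0 stores columns (2, 0); row 1 stores columns (1, 0): both unsorted.
data = np.array([2.0, 1.0, 4.0, 3.0])
indices = np.array([2, 0, 1, 0])
indptr = np.array([0, 2, 4])
A = sps.csr_matrix((data, indices, indptr), shape=(2, 3))

dense_before = A.toarray()
was_sorted = bool(A.has_sorted_indices)

B = A.sorted_indices()  # returns a sorted copy; A itself is unchanged
A.sort_indices()        # sorts A in place

print(was_sorted, bool(A.has_sorted_indices))
print(A.indices, A.data)
```

Sorting reorders `indices` and `data` together within each row, so the matrix values themselves do not change.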


## `sps.dok_matrix(arg1, shape=None, dtype=None, copy=False)`

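A dictionary-of-keys based sparse matrix. This format allows a sparse matrix to be constructed incrementally and supports arithmetic operations such as addition, subtraction, multiplication, division, and matrix power. Duplicate entries are not allowed. Access to an individual element takes O(1) time.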

##### Args
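- arg1 : a dense matrix, a sparse matrix, or a shape tuple `(M, N)`; passing a shape tuple creates an empty matrix of that shape
- shape : the matrix shape, given as a tuple
- dtype : data type of the matrix elements
- copy : whether to copy the original matrix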

##### Attributes
- shape : a 2-tuple
- ndim : number of dimensions, always 2
- nnz : number of stored (nonzero) elements


##### Examples

In [13]:
S = sps.dok_matrix((5, 5))
for i in range(5):
    for j in range(5):
        S[i, j] = i + j

print(S.keys())
print(S.items())

dict_keys([(0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)])
dict_items([((0, 1), 1.0), ((0, 2), 2.0), ((0, 3), 3.0), ((0, 4), 4.0), ((1, 0), 1.0), ((1, 1), 2.0), ((1, 2), 3.0), ((1, 3), 4.0), ((1, 4), 5.0), ((2, 0), 2.0), ((2, 1), 3.0), ((2, 2), 4.0), ((2, 3), 5.0), ((2, 4), 6.0), ((3, 0), 3.0), ((3, 1), 4.0), ((3, 2), 5.0), ((3, 3), 6.0), ((3, 4), 7.0), ((4, 0), 4.0), ((4, 1), 5.0), ((4, 2), 6.0), ((4, 3), 7.0), ((4, 4), 8.0)])


In [36]:
S = sps.dok_matrix([[1, 0, 0], [2, 3, 4], [0, 5, 0]])
print(S)
S

  (0, 0)	1
  (1, 0)	2
  (1, 1)	3
  (1, 2)	4
  (2, 1)	5


<3x3 sparse matrix of type '<class 'numpy.int32'>'
	with 5 stored elements in Dictionary Of Keys format>

#### `dok_matrix.tocsr(copy=False)`

Converts the matrix to Compressed Sparse Row format. With `copy=False`, the data/indices may be shared between `S` and the resulting `csr_matrix`.
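A brief sketch of the conversion, assuming a small 3x3 matrix with one entry per row (chosen only for illustration). It fills a `dok_matrix` element by element and then inspects the three CSR arrays of the result.

```python
import scipy.sparse as sps

# Fill a DOK matrix one element at a time.
S = sps.dok_matrix((3, 3), dtype=int)
S[0, 0] = 1
S[1, 2] = 4
S[2, 1] = 5

# Convert once construction is finished.
C = S.tocsr()
print(C.indptr)   # row pointers
print(C.indices)  # column index of each stored value
print(C.data)     # stored values, row by row
```

Building in DOK and converting once at the end avoids repeated changes to the CSR sparsity structure.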

#### `dok_matrix.dot(other)`

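Ordinary matrix multiplication. When both `dok_matrix` and `other` are matrices, their dimensions must be compatible. When `other` is a dense (non-sparse) vector, it may be given either as a 1-D array or as a column (2-D) array.

Note that as of NumPy 1.7, `np.dot` is not aware of sparse matrices, so using it on them gives unexpected results or errors. To use `np.dot`, the sparse matrix must first be converted to a dense array, but doing so gives up the performance advantage of the sparse format.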

##### Examples
For example:

In [80]:
A = sps.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
v = sps.csr_matrix([1, 0, -1])
print(v.dot(A))
print(A.dot(v.T))

In [85]:
v = [1, 0, -1]
v = [[1], [0], [-1]]
A.dot(v)

array([[ 1],
       [-3],
       [-1]], dtype=int32)
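The note about `np.dot` can be sketched as follows. The snippet compares the sparse matrix's own `dot` method and the `@` operator against the dense route through `toarray()`; it deliberately does not call `np.dot` on the sparse matrix itself.

```python
import numpy as np
import scipy.sparse as sps

A = sps.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
v = np.array([1, 0, -1])

y_method = A.dot(v)                 # sparse-aware product, returns a 1-D array
y_operator = A @ v                  # the @ operator computes the same product
y_dense = np.dot(A.toarray(), v)    # correct, but gives up sparse storage

print(y_method)  # [ 1 -3 -1]
```

All three give the same vector; only the first two keep `A` in sparse form.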

#### `sps.dok_matrix.todense(order=None, out=None)`

Returns a dense `np.matrix` representation of this matrix.

##### Args
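- order : `'C'` or `'F'`, i.e. whether the returned array uses C (row-major) or Fortran (column-major) order; the default is C order. This argument cannot be specified together with `out`.
- out : if not specified, a new array is allocated for the result; otherwise the result is written into `out`, which must have the same shape and dtype as the sparse matrix.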
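A short sketch of the `order` argument and of the difference between `todense()` and the related `toarray()` method, which accepts the same arguments but returns a plain `np.ndarray`:

```python
import numpy as np
import scipy.sparse as sps

A = sps.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])

M = A.todense()                # numpy.matrix
arr = A.toarray()              # plain numpy.ndarray, C order by default
arr_f = A.toarray(order='F')   # same values, column-major memory layout

print(type(M).__name__, type(arr).__name__)
print(arr_f.flags.f_contiguous)
```

The `order` argument changes only the memory layout, so `arr` and `arr_f` compare equal element by element.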

In [98]:
A = sps.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
A = A.todense()
print(type(A))

<class 'numpy.matrix'>



## `sps.csr_matrix(arg1, shape=None, dtype=None, copy=False)`

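Compressed Sparse Row matrix; supports arithmetic operations such as addition, subtraction, multiplication, division, and matrix power. Its advantages are efficient arithmetic such as `CSR + CSR` and `CSR * CSR`, fast matrix-vector products, and efficient row slicing. Its disadvantages are that changes to the sparsity structure are expensive (consider `LIL` or `DOK`) and that column slicing is slow (consider `CSC`).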

##### Args
- arg1
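    - a 2-D array-like dense matrix or another sparse matrix
    - a tuple `(data, (row, col))`, where all three are 1-D array-likes of equal length; this represents `a[row[k], col[k]] = data[k]`, and entries with the same index are summed (Example 1)
    - a tuple `(data, indices, indptr)`, the standard CSR storage format, in which the column indices of row i are stored in `indices[indptr[i]:indptr[i+1]]` and the corresponding values in `data[indptr[i]:indptr[i+1]]` (Example 2)
    - a shape tuple `(M, N)`, which creates an empty (all-zero) matrix of that shape
- shape : a 2-tuple. When `arg1` is a concrete 2-D array, it should match that array's shape. Otherwise, if omitted, it is inferred from `arg1` as the smallest shape that fits the given indices, so specify it explicitly when a larger matrix is needed.
- dtype : data type of the matrix elements
- copy : whether to copy the input data instead of sharing it where possible; defaults to False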

##### Attributes
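- ndim : int, number of dimensions; always 2
- nnz : number of stored values, including explicitly stored zeros
- data : CSR format data array of the matrix
- indices : CSR format index array of the matrix
- indptr : CSR format index pointer array of the matrix
- has_sorted_indices : whether the indices are sorted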
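The attributes above can be inspected directly. The sketch below uses an illustrative 2x2 matrix with one explicitly stored zero to show that `nnz` counts stored entries, not strictly nonzero values.

```python
import scipy.sparse as sps

# The second triplet stores an explicit zero at position (0, 1).
s = sps.csr_matrix(([1, 0, 2], ([0, 0, 1], [0, 1, 1])), shape=(2, 2))

print(s.data, s.indices, s.indptr)
stored = s.nnz                # stored entries, including the explicit zero
nonzero = s.count_nonzero()   # entries whose value is actually nonzero

s.eliminate_zeros()           # drops explicitly stored zeros in place
stored_after = s.nnz
print(stored, nonzero, stored_after)  # 3 2 2
```

Calling `eliminate_zeros()` brings `nnz` back in line with the number of nonzero values.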

##### Examples
1. Passing `arg1` in `(data, (row, col))` form; entries with the same index are summed:

In [132]:
row = [0, 0, 1, 2, 2, 2]
col = [0, 2, 2, 0, 0, 2]
data = [1, 2, 3, 4, 5, 6]
s = sps.csr_matrix((data, (row, col)))
print(s.toarray())

[[1 0 2]
 [0 0 3]
 [9 0 6]]


2. Passing `arg1` in `(data, indices, indptr)` form:

In [127]:
indptr = [0, 2, 3, 6]
indices = [0, 2, 2, 0, 1, 2]
data = [1, 2, 3, 4, 5, 6]
s1 = sps.csr_matrix((data, indices, indptr), shape=(3, 3))
print(s1.toarray())

# equivalent to
row = [[i]*(indptr[i+1]-indptr[i])  for i in range(len(indptr)-1)]
col = [indices[indptr[i]:indptr[i+1]] for i in range(len(indptr)-1)]
col = [c for col_ in col for c in col_]
row = [r for row_ in row for r in row_]
s2 = sps.csr_matrix((data, (row, col)), shape=(3, 3))
print(s2.toarray())

[[1 0 2]
 [0 0 3]
 [4 5 6]]
[[1 0 2]
 [0 0 3]
 [4 5 6]]


3. Specifying `shape` explicitly; if it is omitted, the shape is inferred as (3, 4):

In [136]:
row = [0, 0, 1, 2, 2, 2, 2]
col = [0, 2, 2, 0, 1, 2, 3]
data = [1, 2, 3, 4, 5, 6, 7]
s = sps.csr_matrix((data, (row, col)), shape=(5, 4))
s_ = sps.csr_matrix((data, (row, col)))
print(s.toarray())
print(s_.shape)

[[1 0 2 0]
 [0 0 3 0]
 [4 5 6 7]
 [0 0 0 0]
 [0 0 0 0]]
(3, 4)


4. Constructing a CSR matrix incrementally:

In [None]:
docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
indptr = [0]
indices = []
data = []
vocabulary = {}
for d in docs:
    for term in d:
        index = vocabulary.setdefault(term, len(vocabulary))
        indices.append(index)
        data.append(1)
    indptr.append(len(indices))
sps.csr_matrix((data, indices, indptr), dtype=int).toarray()

In [163]:
np.random.seed(0)
idx = tuple(np.random.randint(0, 10, size=2))
a = np.arange(100).reshape([10, -1])
# print(a[idx])
stack = [idx]
k = stack.pop()
x, y = k
# for i in range(10):
#     stack += [[]]