# Table of contents

- Sparse matrix representation
    - [CSR](#csr-representation)
    - [DOK](#dok-representation)
    - [LIL](#lil-representation)


# A simple, sparse matrix

We first construct a small sparse matrix to illustrate the sparse matrix representation formats available in Python through `scipy.sparse`.

In [2]:
import numpy as np
from scipy import sparse

In [4]:
X = np.array([[0,0,0,3,0,0,4],
              [0,5,0,0,0,0,0],
              [0,0,5,0,0,4,0],
              [4,0,0,0,0,0,1],
              [0,2,0,0,3,0,0]])
print(X)

[[0 0 0 3 0 0 4]
 [0 5 0 0 0 0 0]
 [0 0 5 0 0 4 0]
 [4 0 0 0 0 0 1]
 [0 2 0 0 3 0 0]]


There are many zeros in this matrix, so let's calculate its sparsity, that is, the fraction of elements that are zero:

In [5]:
sparsity = 1.0 - (np.count_nonzero(X) / X.size)
print("The sparsity of X is ", sparsity)

The sparsity of X is  0.7428571428571429


### CSR representation


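We can convert this **dense** matrix into a sparse matrix with the `sparse.csr_matrix()` constructor. The *Compressed Sparse Row (CSR)* format stores three arrays: the nonzero values (`data`), their column indices (`indices`), and row pointers (`indptr`) that mark where each row starts and ends in the other two arrays: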

In [7]:
# Convert X to a sparse matrix in CSR format

S1 = sparse.csr_matrix(X)

print(f"""
Type of sparse matrix representation: {type(S1)}

Sparse Matrix:\n{S1}

Sparse Data: {S1.data}

Indices of columns: {S1.indices}

Pointers for data: {S1.indptr}
""")


Type of sparse matrix representation: <class 'scipy.sparse._csr.csr_matrix'>

Sparse Matrix:
  (0, 3)	3
  (0, 6)	4
  (1, 1)	5
  (2, 2)	5
  (2, 5)	4
  (3, 0)	4
  (3, 6)	1
  (4, 1)	2
  (4, 4)	3

Sparse Data: [3 4 5 5 4 4 1 2 3]

Indices of columns: [3 6 1 2 5 0 6 1 4]

Pointers for data: [0 2 3 5 7 9]
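
To make the role of the three arrays concrete, here is a minimal sketch that reads individual rows back out of them. The helper name `csr_row` is hypothetical (it is not part of `scipy.sparse`); it only slices the `indptr`, `indices`, and `data` attributes shown above.

```python
import numpy as np
from scipy import sparse

X = np.array([[0, 0, 0, 3, 0, 0, 4],
              [0, 5, 0, 0, 0, 0, 0],
              [0, 0, 5, 0, 0, 4, 0],
              [4, 0, 0, 0, 0, 0, 1],
              [0, 2, 0, 0, 3, 0, 0]])
S1 = sparse.csr_matrix(X)


def csr_row(S, i):
    """Hypothetical helper: column indices and values of the nonzeros in row i."""
    # indptr[i] and indptr[i + 1] delimit row i inside indices and data
    start, end = S.indptr[i], S.indptr[i + 1]
    return S.indices[start:end], S.data[start:end]


# Rebuild the dense matrix row by row using only the three CSR arrays
rebuilt = np.zeros(S1.shape, dtype=S1.dtype)
for i in range(S1.shape[0]):
    cols, vals = csr_row(S1, i)
    rebuilt[i, cols] = vals

# Look at a single row: row 2 holds its nonzeros in columns 2 and 5
cols, vals = csr_row(S1, 2)
print(cols, vals)
```

For row 2, `indptr[2]` is 3 and `indptr[3]` is 5, so the row occupies positions 3 and 4 of `indices` and `data`.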



### DOK representation

Another efficient structure for constructing sparse matrices is the **Dictionary of Keys (DOK)** format, in which a Python dictionary maps `(row, column)` index pairs to the nonzero values of the matrix.

In this representation, `keys()` returns the index pairs and `values()` returns the corresponding nonzero values:

In [9]:
S2 = sparse.dok_matrix(X)

print(f"""
Type of sparse matrix representation: {type(S2)}

Sparse Matrix:\n{S2}

Keys in dictionary: {S2.keys()}

Values in dictionary: {S2.values()}
""")


Type of sparse matrix representation: <class 'scipy.sparse._dok.dok_matrix'>

Sparse Matrix:
  (0, 3)	3
  (0, 6)	4
  (1, 1)	5
  (2, 2)	5
  (2, 5)	4
  (3, 0)	4
  (3, 6)	1
  (4, 1)	2
  (4, 4)	3

Keys in dictionary: dict_keys([(0, 3), (0, 6), (1, 1), (2, 2), (2, 5), (3, 0), (3, 6), (4, 1), (4, 4)])

Values in dictionary: dict_values([3, 4, 5, 5, 4, 4, 1, 2, 3])
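
Because DOK behaves like a dictionary, it is convenient for filling a matrix one element at a time. The following is a small sketch of that workflow, assuming the same nonzero entries as `X`; the variable names `entries`, `D`, and `D_csr` are illustrative only.

```python
import numpy as np
from scipy import sparse

# Nonzero entries of X as (row, column) -> value
entries = {(0, 3): 3, (0, 6): 4, (1, 1): 5,
           (2, 2): 5, (2, 5): 4, (3, 0): 4,
           (3, 6): 1, (4, 1): 2, (4, 4): 3}

# Start from an empty 5 x 7 matrix and assign the nonzeros one by one
D = sparse.dok_matrix((5, 7), dtype=np.int64)
for (i, j), value in entries.items():
    D[i, j] = value

print(D.nnz)      # number of stored elements
print(D[0, 3])    # a stored element
print(D[2, 3])    # an element that was never set reads back as zero

# Convert to CSR once construction is done, e.g. for arithmetic
D_csr = D.tocsr()
```

A common pattern is to build the matrix in DOK (or LIL) and then convert it to CSR for arithmetic and matrix-vector products.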



### LIL representation

The last representation format shown here is the row-based **List of Lists (LIL)** format. It keeps two arrays of lists with one entry per matrix row: `rows` stores the column indices of the nonzero elements in each row, and `data` stores the corresponding values.

In [10]:
S3 = sparse.lil_matrix(X)

print(f"""
Type of sparse matrix representation: {type(S3)}

Sparse Matrix:\n{S3}

Column indices per row: {S3.rows}

Values per row: {S3.data}
""")


Type of sparse matrix representation: <class 'scipy.sparse._lil.lil_matrix'>

Sparse Matrix:
  (0, 3)	3
  (0, 6)	4
  (1, 1)	5
  (2, 2)	5
  (2, 5)	4
  (3, 0)	4
  (3, 6)	1
  (4, 1)	2
  (4, 4)	3

Column indices per row: [list([3, 6]) list([1]) list([2, 5]) list([0, 6]) list([1, 4])]

Values per row: [list([3, 4]) list([5]) list([5, 4]) list([4, 1]) list([2, 3])]
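
As a rough illustration of how the two LIL attributes line up, the sketch below pairs `rows` and `data` entry by entry and then inserts one new element. The inserted position `(1, 4)` and value `7` are arbitrary choices made for this example.

```python
import numpy as np
from scipy import sparse

X = np.array([[0, 0, 0, 3, 0, 0, 4],
              [0, 5, 0, 0, 0, 0, 0],
              [0, 0, 5, 0, 0, 4, 0],
              [4, 0, 0, 0, 0, 0, 1],
              [0, 2, 0, 0, 3, 0, 0]])
S3 = sparse.lil_matrix(X)

# rows[i] and data[i] describe matrix row i: column indices and their values
for i, (cols, vals) in enumerate(zip(S3.rows, S3.data)):
    print(f"row {i}: columns {cols} -> values {vals}")

# Inserting an element only touches the two lists of the affected row
S3[1, 4] = 7
print(S3.rows[1], S3.data[1])
```

Since each row owns its own pair of lists, changing the sparsity structure of one row leaves the other rows untouched, which is why LIL is suited to incremental construction.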



Sparse matrices in `scipy.sparse` also provide a `todense()` method for converting a sparse matrix back to a dense matrix:

In [11]:
# Convert the sparse matrix to a dense matrix

X = S1.todense()

print(X)

[[0 0 0 3 0 0 4]
 [0 5 0 0 0 0 0]
 [0 0 5 0 0 4 0]
 [4 0 0 0 0 0 1]
 [0 2 0 0 3 0 0]]


This can be useful when exploring data, but for a large dataset the dense matrix may not fit in memory, and the conversion can then fail with an error.
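
To get a feel for the difference in memory, the sketch below compares the storage used by a CSR matrix with the size its dense equivalent would have. The matrix is a hypothetical 10,000 x 10,000 random matrix with a density of 0.1%, generated with `sparse.random`; the dense size is only computed, never allocated.

```python
import numpy as np
from scipy import sparse

# Hypothetical example matrix: 10,000 x 10,000 with about 0.1% nonzeros
S = sparse.random(10_000, 10_000, density=0.001, format="csr", dtype=np.float64)

# Memory held by the three CSR arrays
sparse_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes

# Memory a dense array of the same shape and dtype would need
dense_bytes = S.shape[0] * S.shape[1] * S.dtype.itemsize

print(f"CSR:   {sparse_bytes / 1e6:.1f} MB")
print(f"Dense: {dense_bytes / 1e6:.1f} MB")
```

The dense size grows with the number of elements, while the CSR size grows with the number of nonzeros plus one pointer per row.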

# Sparse Matrix from a real dataset