### SPARSE MATRIX

A sparse matrix is a matrix in which the majority of the elements are zero.

Although these zeros usually add no information to the data, in a dense representation they still consume the same memory as non-zero elements. Sparse matrices are widely used in Machine Learning, Natural Language Processing, and large-scale data processing, where storing zero elements is inefficient.

Storing only the non-zero elements has two main benefits: reduced memory usage and, for many operations, faster computation, because the zeros can be skipped.
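To make the memory claim concrete, here is a minimal sketch that compares the bytes used by a dense NumPy array with the bytes used by the same data in COO format. The matrix size, the sparsity level, and the random seed are illustrative assumptions, not values from this notebook.

```python
import numpy as np
from scipy import sparse

# Assumed example: a 1000 x 1000 matrix with at most 10,000 non-zero entries (about 1%).
rng = np.random.default_rng(0)
dense = np.zeros((1000, 1000), dtype=np.float64)
rows = rng.integers(0, 1000, size=10_000)
cols = rng.integers(0, 1000, size=10_000)
dense[rows, cols] = 1.0

# Convert the dense array to COO, which keeps only the non-zero entries.
coo = sparse.coo_matrix(dense)

# The dense array pays for every element; COO pays for one value plus
# one row index and one column index per non-zero entry.
dense_bytes = dense.nbytes
sparse_bytes = coo.data.nbytes + coo.row.nbytes + coo.col.nbytes

print("dense bytes :", dense_bytes)
print("sparse bytes:", sparse_bytes)
```

The exact sparse figure depends on how many random positions collide and on the index dtype SciPy picks, but it should be far smaller than the dense figure at this sparsity.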

### SPARSE MATRIX FORMATS

##### DIAGONAL FORMAT

In [4]:
import numpy as np
from scipy import sparse

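# create a 2D array: each of the 3 rows holds the values for one diagonal
data = np.array([[1,2,3,4]]).repeat(3, axis=0)

# define the offsets: 0 is the main diagonal, negative values are below it,
# positive values are above it
offsets = np.array([0,-1,2])

# build the DIA matrix from the diagonal values and their offsets
mtx = sparse.dia_matrix((data, offsets), shape=(4,4))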

# print the dense matrix representation of the sparse array
mtx.todense()

matrix([[1, 0, 3, 0],
        [1, 2, 0, 4],
        [0, 2, 3, 0],
        [0, 0, 3, 4]])

In [5]:
# print the sparse matrix

print(mtx)

  (0, 0)	1
  (1, 1)	2
  (2, 2)	3
  (3, 3)	4
  (1, 0)	1
  (2, 1)	2
  (3, 2)	3
  (0, 2)	3
  (1, 3)	4
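The placement of the values in the output above can be reproduced by hand. The sketch below assumes the rule that `data[k, j]` lands in column `j` of the diagonal given by `offsets[k]`, that is at row `j - offsets[k]`, and checks the result against `dia_matrix`. The names `manual` and `n` are introduced here for illustration only.

```python
import numpy as np
from scipy import sparse

# Same inputs as in the cell above.
data = np.array([[1, 2, 3, 4]]).repeat(3, axis=0)
offsets = np.array([0, -1, 2])
n = 4

# Fill a dense array by hand: data[k, j] goes to row (j - offsets[k]), column j.
manual = np.zeros((n, n), dtype=data.dtype)
for k, offset in enumerate(offsets):
    for j in range(n):
        i = j - offset
        if 0 <= i < n:  # entries that fall outside the matrix are ignored
            manual[i, j] = data[k, j]

# Compare with the matrix SciPy builds from the same inputs.
mtx = sparse.dia_matrix((data, offsets), shape=(n, n))
print(np.array_equal(manual, mtx.toarray()))
```

This also explains why some values in `data` never appear in the matrix: for a non-zero offset, part of each row of `data` maps to positions outside the 4x4 shape.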


##### LIST OF LISTS FORMAT

In [9]:
# create an empty LIL matrix
mtx = sparse.lil_matrix((4,5))

# prepare a 2x3 array of random 0s and 1s (the values differ on every run)
from numpy.random import rand

data = np.round(rand(2,3))

# assign data using fancy indexing
mtx[:2, [1,2,3]] = data

# view the random data
data

array([[1., 0., 1.],
       [1., 0., 0.]])

In [10]:
# print the dense matrix representation of the sparse array
mtx.todense()

matrix([[0., 1., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [11]:
# print the sparse matrix
print(mtx)

  (0, 1)	1.0
  (0, 3)	1.0
  (1, 1)	1.0
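To see where the name "list of lists" comes from, the sketch below inspects the `rows` and `data` attributes of a LIL matrix. It uses fixed values that mirror the output above instead of random data, so the result is reproducible; the variable name `lil` is introduced here for illustration.

```python
from scipy import sparse

# Fixed example values (the cell above uses random data instead).
lil = sparse.lil_matrix((4, 5))
lil[0, 1] = 1.0
lil[0, 3] = 1.0
lil[1, 1] = 1.0

# rows[i] is a list of the column indices of the non-zero entries in row i.
print(lil.rows.tolist())

# data[i] is a list of the matching values, in the same order.
print(lil.data.tolist())
```

Each matrix row has its own pair of Python lists, which is why inserting elements one at a time is cheap in this format.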


##### DICTIONARY OF KEYS (DOK)

In [12]:
# create a DOK matrix element by element
mtx = sparse.dok_matrix((5,5), dtype=np.float64)

# set every off-diagonal element to 1 and every diagonal element to 0
for ir in range(5):
    for ic in range(5):
        mtx[ir, ic] = 1 * (ir != ic)

# print the dense matrix representation of the sparse array
mtx.todense()

matrix([[0., 1., 1., 1., 1.],
        [1., 0., 1., 1., 1.],
        [1., 1., 0., 1., 1.],
        [1., 1., 1., 0., 1.],
        [1., 1., 1., 1., 0.]])

In [13]:
# print the sparse matrix
print(mtx)

  (0, 1)	1.0
  (0, 2)	1.0
  (0, 3)	1.0
  (0, 4)	1.0
  (1, 0)	1.0
  (1, 2)	1.0
  (1, 3)	1.0
  (1, 4)	1.0
  (2, 0)	1.0
  (2, 1)	1.0
  (2, 3)	1.0
  (2, 4)	1.0
  (3, 0)	1.0
  (3, 1)	1.0
  (3, 2)	1.0
  (3, 4)	1.0
  (4, 0)	1.0
  (4, 1)	1.0
  (4, 2)	1.0
  (4, 3)	1.0


##### COORDINATE FORMAT

In [14]:
# create an empty COO matrix
mtx = sparse.coo_matrix((3,4), dtype=np.int8)

# view the empty matrix before assignment of data
mtx.todense()

matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=int8)

In [15]:
# define the non-zero entries: data[k] is stored at position (row[k], col[k])
row = np.array([0, 3, 1, 0])
col = np.array([0, 2, 1, 2])
data = np.array([4,5,7,1])

# build a new COO matrix from the (data, (row, col)) triplets
mtx = sparse.coo_matrix((data, (row, col)), shape=(4,4))

# view the dense matrix representation of the sparse array
mtx.todense()

matrix([[4, 0, 1, 0],
        [0, 7, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 5, 0]])

In [None]:
# COO matrices do not support indexing, so accessing a single element raises an error
mtx[2, 3]

TypeError: 'coo_matrix' object is not subscriptable
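One common way around this error is to convert the COO matrix to a format that supports indexing before reading individual elements. The sketch below uses CSR via `tocsr()`; CSR is not covered in this notebook, so treat the choice of format as an assumption, and note that indexing support can differ between SciPy versions.

```python
import numpy as np
from scipy import sparse

# Same COO matrix as in the cell above.
row = np.array([0, 3, 1, 0])
col = np.array([0, 2, 1, 2])
data = np.array([4, 5, 7, 1])
coo = sparse.coo_matrix((data, (row, col)), shape=(4, 4))

# Convert to CSR, which supports element access and slicing.
csr = coo.tocsr()

print(csr[2, 3])  # position with no stored entry
print(csr[3, 2])  # position that holds a stored entry
```

A typical workflow is to build the matrix in COO, LIL, or DOK and then convert once to CSR or CSC for indexing and arithmetic.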