## Tutorial 18: Sparse Matrices

How to create, modify, and use sparse matrices with the
`scipy` module.

### Sparsity

In many data science applications, such as text analysis, genetics,
and network analysis, we need to work with matrices in which most
entries are zero. Sometimes as many as 99.999% of the entries are zero.
To save space and computation time, it is useful to store these
matrices in a different format: only the non-zero entries are stored,
and every entry that is not stored is assumed to be zero. In these notes
we show how to create and work with such matrices.
To start, load the `numpy` and `scipy` modules:

In [None]:
import numpy as np
import scipy as sp
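
To get a feel for the savings described above, here is a minimal sketch that uses only `numpy`. The 1000-by-1000 size and the four values are arbitrary choices made for illustration. It keeps only the positions and values of the non-zero entries and compares the storage needed with that of the full array:

```python
import numpy as np

# A 1000-by-1000 dense array that is almost entirely zeros
# (size and values are arbitrary, chosen only for illustration).
dense = np.zeros((1000, 1000))
dense[0, 0] = 4.0
dense[0, 2] = 9.0
dense[1, 1] = 7.0
dense[3, 3] = 5.0

# Keep only the positions and values of the non-zero entries.
rows, cols = np.nonzero(dense)
values = dense[rows, cols]

dense_bytes = dense.nbytes
triplet_bytes = rows.nbytes + cols.nbytes + values.nbytes
density = len(values) / dense.size

print(f"non-zero entries: {len(values)}")
print(f"density:          {density:.6%}")
print(f"dense storage:    {dense_bytes} bytes")
print(f"triplet storage:  {triplet_bytes} bytes")
```

Storing three short arrays (row index, column index, value) instead of the full grid is the idea behind the sparse formats that `scipy` provides.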

### Coordinate format

The easiest way to create a sparse matrix is to start with the COO
(COOrdinate) format. Here, we supply three arrays of equal length: the row
index, the column index, and the value of each non-zero entry. For example,
here we create a sparse 4-by-4 matrix with only 4 non-zero entries:

In [None]:
from scipy.sparse import coo_matrix

row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
A = coo_matrix((data, (row, col)), shape=(4, 4))
A
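
As a quick check on what was stored, the sketch below rebuilds the same matrix and reads back its `shape`, `nnz`, `row`, `col`, and `data` attributes. The printing loop is only an illustration and is not part of the original notebook:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Same matrix as in the cell above.
row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
A = coo_matrix((data, (row, col)), shape=(4, 4))

print(A.shape)  # dimensions of the full matrix
print(A.nnz)    # number of stored entries

# Each stored entry is a (row, column, value) triplet.
for i, j, v in zip(A.row, A.col, A.data):
    print(f"A[{i}, {j}] = {v}")
```

Any position that does not appear among the triplets is treated as zero.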

For illustration, we convert this to a dense numpy array to see the result:

In [None]:
A.toarray()

### Compressed Sparse Row (CSR) matrix

The coordinate format is convenient for building a matrix, but it is poorly
suited to arithmetic and to extracting rows. It is best to convert to another
format, such as CSR, before doing any computation. This is made easy with
scipy:

In [None]:
B = A.tocsr()
B
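
To see why CSR is easier for algorithms to work with, it can help to look at how it lays out the data. The sketch below inspects the `data`, `indices`, and `indptr` arrays of the converted matrix. The variable names `start` and `end` are introduced here for illustration only:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Same matrix as above, converted to CSR.
row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
B = coo_matrix((data, (row, col)), shape=(4, 4)).tocsr()

print(B.data)     # stored values, grouped row by row
print(B.indices)  # column index of each stored value
print(B.indptr)   # row i occupies positions indptr[i]:indptr[i+1]

# Pull out row 0 directly from the underlying arrays.
start, end = B.indptr[0], B.indptr[1]
print(B.indices[start:end], B.data[start:end])

# CSR supports element access and row slicing.
print(B[0, 2])
print(B[0].toarray())
```

Because all entries of a row sit next to each other in memory, row slicing and matrix-vector products only need to scan a contiguous range of the arrays.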

In this format we can carry out many matrix operations while keeping the result sparse,
for example taking a matrix product:

In [None]:
C = B @ B
C
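
One way to confirm that the sparse product behaves as expected is to compare it with the same product computed on dense arrays. The sketch below does that, and also shows a matrix-vector product and an element-wise product via `multiply`. The vector `x` is an arbitrary example:

```python
import numpy as np
from scipy.sparse import coo_matrix, issparse

# Same matrix as above, in CSR format.
row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
B = coo_matrix((data, (row, col)), shape=(4, 4)).tocsr()

# Sparse matrix product; the result is still a sparse matrix.
C = B @ B
print(issparse(C), C.nnz)

# It agrees with the product computed on dense arrays.
dense = B.toarray()
print(np.array_equal(C.toarray(), dense @ dense))

# Sparse matrix times a dense vector gives a dense 1-D array.
x = np.array([1, 2, 3, 4])
y = B @ x
print(y)

# Element-wise product: use multiply, not @.
E = B.multiply(B)
print(E.toarray())
```

The dense comparison is only practical for small matrices like this one; for a large sparse matrix the dense version may not fit in memory.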

These sparse matrices will be useful as we move towards machine learning applications
with textual data.

-------

## Practice