# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>Day 52 - SciPy Sparse Data</b></div>

## **What is Sparse Data**
* Sparse data is data in which most elements are unused (they don't carry any information).

* It can be an array like this one:

[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]

**Sparse Data**: a data set in which most of the values are zero.

**Dense Array**: the opposite of a sparse array; most of its values are not zero.
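
One simple way to put a number on "mostly zero" is the fraction of elements that are zero. Below is a minimal NumPy sketch using the example array above; the variable names `nonzero` and `sparsity` are illustrative choices, not part of any library.

```python
import numpy as np

# The example array from above: 12 elements, only 3 of them nonzero
arr = np.array([1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0])

nonzero = np.count_nonzero(arr)     # elements that carry information
sparsity = 1 - nonzero / arr.size   # fraction of elements that are zero

print(nonzero, sparsity)  # 3 0.75
```

The closer this fraction is to 1, the more a sparse format can save by storing only the nonzero values.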

## **How to Work With Sparse Data**
* SciPy has a module, scipy.sparse, that provides functions for working with sparse data.

* scipy.sparse supports several storage formats; the two we use most are:

**CSC** - Compressed Sparse Column. Efficient arithmetic and fast column slicing.

**CSR** - Compressed Sparse Row. Fast row slicing and faster matrix-vector products.
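
To make the row/column distinction concrete, here is a small sketch that builds the same matrix in both formats, slices a row from the CSR copy and a column from the CSC copy, and multiplies the CSR matrix by a vector. The matrix values and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 6]])

csr = csr_matrix(dense)  # row-oriented storage
csc = csc_matrix(dense)  # column-oriented storage

row2 = csr[2, :].toarray().ravel()  # row slicing suits CSR
col2 = csc[:, 2].toarray().ravel()  # column slicing suits CSC

v = np.array([1, 2, 3])
product = csr @ v  # sparse matrix-vector product, returns a dense NumPy array

print(row2)     # [0 5 6]
print(col2)     # [3 0 6]
print(product)  # [ 9  4 28]
```

Both formats support either kind of slicing; the difference between them is speed, not capability.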



## **CSR Matrix**
* We can create a CSR matrix by passing an array to the function scipy.sparse.csr_matrix().

### **Create a CSR matrix from an array:**

In [1]:
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])

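# The 1-D array is treated as a 1 x 9 matrix; each output line is (row, column) followed by the stored value
print(csr_matrix(arr))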

  (0, 5)	1
  (0, 6)	1
  (0, 8)	2
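
csr_matrix() also accepts the nonzero values directly, as a (data, (row, col)) tuple plus a shape, which avoids building a dense array first. A minimal sketch that rebuilds the same 1 x 9 matrix as the example above (the array names are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Values and their (row, column) positions, matching the array above
data = np.array([1, 1, 2])
row = np.array([0, 0, 0])
col = np.array([5, 6, 8])

mat = csr_matrix((data, (row, col)), shape=(1, 9))

# Converting back to dense recovers the original array
print(mat.toarray())  # [[0 0 0 0 0 1 1 0 2]]
```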


## **Sparse Matrix Methods**
### **Viewing stored data (not the zero items) with the data property:**

In [2]:
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

print(csr_matrix(arr).data)

[1 1 2]
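
The data property is one of three arrays a CSR matrix keeps: indices holds the column of each stored value, and indptr marks where each row starts and ends inside data. The sketch below prints all three for the same matrix and then reads row 2 back by hand; `start`, `end`, `row2_cols` and `row2_vals` are illustrative names.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)

print(mat.data)     # [1 1 2]    stored values, row by row
print(mat.indices)  # [2 0 2]    column index of each stored value
print(mat.indptr)   # [0 0 1 3]  row i lives in data[indptr[i]:indptr[i + 1]]

# Read row 2 straight from the internal arrays
start, end = mat.indptr[2], mat.indptr[3]
row2_cols = mat.indices[start:end]
row2_vals = mat.data[start:end]
print(row2_cols, row2_vals)  # [0 2] [1 2]
```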


### **Counting nonzeros with the count_nonzero() method:**

In [3]:
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

print(csr_matrix(arr).count_nonzero())

3
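
count_nonzero() pairs naturally with the matrix shape: dividing one by the number of entries gives the density (the fraction of nonzero entries) without converting to a dense array. A small sketch on the same matrix; `density` is just an illustrative name.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)

rows, cols = mat.shape
density = mat.count_nonzero() / (rows * cols)  # 3 nonzeros out of 9 entries

print(density)  # 0.3333333333333333
```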


### **Removing explicitly stored zeros from the matrix with the eliminate_zeros() method:**

In [4]:
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
mat.eliminate_zeros()  # removes explicitly stored zeros; none are stored here, so mat is unchanged

print(mat)

  (1, 2)	1
  (2, 0)	1
  (2, 2)	2
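
eliminate_zeros() only matters when a zero is actually stored in the matrix, for example after overwriting a value in place. This sketch creates one such explicit zero and compares nnz (the number of stored entries) with count_nonzero() before and after the call.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)

# Overwrite the stored value at (1, 2) with 0: the entry stays in the structure
mat.data[0] = 0
print(mat.nnz, mat.count_nonzero())  # 3 2

mat.eliminate_zeros()  # drop the explicitly stored zero
print(mat.nnz, mat.count_nonzero())  # 2 2
print(mat.data)                      # [1 2]
```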


### **Summing duplicate entries with the sum_duplicates() method:**

In [5]:
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
mat.sum_duplicates()  # merges repeated (row, column) entries; none exist here, so mat is unchanged

print(mat)


  (1, 2)	1
  (2, 0)	1
  (2, 2)	2
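
sum_duplicates() only changes a matrix that stores the same (row, column) position more than once, which can happen when the index arrays are supplied directly. The sketch below builds a 1 x 3 CSR matrix from a (data, indices, indptr) tuple with column 0 listed twice; the values are arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

# One row, three stored entries; column 0 appears twice
data = np.array([1, 2, 3])
indices = np.array([0, 0, 2])
indptr = np.array([0, 3])

mat = csr_matrix((data, indices, indptr), shape=(1, 3))
print(mat.nnz)  # 3 stored entries, two of them at (0, 0)

mat.sum_duplicates()  # merge entries that share a position by adding them
print(mat.nnz)        # 2
print(mat.data)       # [3 3]
print(mat.toarray())  # [[3 0 3]]
```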


### **Converting from CSR to CSC with the tocsc() method:**