# Dealing with missing numbers in NumPy

### Sections
* [Sample data from a CSV file](#Sample-data-from-a-CSV-file)
* [Determining if a value is missing](#Determining-if-a-value-is-missing)
* [Counting the number of missing values](#Counting-the-number-of-missing-values)
* [Calculating the sum of an array that contains NaNs](#Calculating-the-sum-of-an-array-that-contains-NaNs)
* [Removing all rows that contain missing values](#Removing-all-rows-that-contain-missing-values)
* [Converting missing values to 0](#Convert-missing-values-to-0)
* [Converting certain numbers to NaN](#Converting-certain-numbers-to-NaN)

### Sample data from a CSV file

Let's assume we have a CSV file with missing elements that looks like:

```
%%file example.csv
1,2,3,4
5,6,,8
10,11,12,
```

When reading floating-point data, the `np.genfromtxt` function fills empty fields with `np.nan` by default; its `missing_values` and `filling_values` parameters control which strings count as missing and what replaces them. With this, we can construct a new NumPy `ndarray` object, even with missing elements.

In [None]:
import numpy as np
arr = np.genfromtxt('./example.csv', delimiter=',')

print(f'{arr.shape[0]} x {arr.shape[1]} array:')
print(arr)
# >> 3 x 4 array:
#     [[  1.   2.   3.   4.]
#     [  5.   6.  nan   8.]
#     [ 10.  11.  12.  nan]]
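As a small sketch of how the fill value can be changed, the snippet below reads the same data from an in-memory string instead of a file and passes `filling_values`. The variable names and the placeholder `-1` are assumptions chosen for illustration only.

```python
from io import StringIO

import numpy as np

# Same content as example.csv, held in memory so the snippet is self-contained
csv_text = "1,2,3,4\n5,6,,8\n10,11,12,\n"

# Default behaviour: empty fields become nan
arr = np.genfromtxt(StringIO(csv_text), delimiter=',')

# Hypothetical choice: replace empty fields with -1 instead of nan
filled = np.genfromtxt(StringIO(csv_text), delimiter=',', filling_values=-1)

print(arr)
print(filled)
```

Keeping the default `nan` is usually the safer choice, since a numeric placeholder such as `-1` can be mistaken for real data later on.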

### Determining if a value is missing

The simplest way to test for `NaN`s is the `np.isnan` function:

In [None]:
np.isnan(np.nan)
# >> True
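The reason a dedicated function is needed is that `NaN` never compares equal to anything, including itself, so an `==` test silently fails. A minimal sketch (the `values` array is made up for illustration):

```python
import numpy as np

x = np.nan

# Equality comparison with NaN is always False, even against itself
print(x == x)
# False

# np.isnan is the reliable test
print(np.isnan(x))
# True

# The same holds element-wise for arrays
values = np.array([1.0, np.nan, 3.0])
eq_mask = values == np.nan      # finds nothing
nan_mask = np.isnan(values)     # finds the missing element
print(eq_mask, nan_mask)
```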

Applied to a whole array, it returns a Boolean mask, which is very useful for Boolean indexing of NumPy arrays:

In [None]:
np.isnan(arr)
# array([[False, False, False, False],
#       [False, False,  True, False],
#       [False, False, False,  True]], dtype=bool)
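As a brief illustration of what such a mask can be used for, the sketch below selects the non-missing values and locates the missing ones. The array is rebuilt by hand so the snippet runs on its own.

```python
import numpy as np

# Same values as the array loaded from example.csv
arr = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]])

mask = np.isnan(arr)

# Boolean indexing with the inverted mask returns the valid values as a 1-D array
valid = arr[~mask]
print(valid)

# np.argwhere returns the (row, column) position of each missing value
positions = np.argwhere(mask)
print(positions)
# [[1 2]
#  [2 3]]
```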

### Counting the number of missing values

We can combine `np.isnan` with `np.count_nonzero` to find out how many elements are missing in our array:

In [None]:
np.count_nonzero(np.isnan(arr))
# 2

If we want to determine the number of non-missing elements, we can simply invert the returned Boolean mask with the tilde (`~`) operator:

In [None]:
np.count_nonzero(~np.isnan(arr))
# 10
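The same idea extends to per-column or per-row counts. Since `True` counts as 1 when summed, one possible approach is to sum the mask along an axis:

```python
import numpy as np

# Same values as the array loaded from example.csv
arr = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]])

mask = np.isnan(arr)

# Missing values per column (collapse the rows)
missing_per_column = mask.sum(axis=0)
print(missing_per_column)
# [0 0 1 1]

# Missing values per row (collapse the columns)
missing_per_row = mask.sum(axis=1)
print(missing_per_row)
# [0 1 1]
```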

### Calculating the sum of an array that contains `NaN`s

If a NumPy array contains `NaN`s, `np.sum` is of little use, because a single `NaN` propagates through the addition and the result is `nan`:

In [None]:
np.sum(arr)
# nan

To ignore the missing values, use `np.nansum` instead, which treats `NaN`s as zero:

In [None]:
print('total sum:', np.nansum(arr))
# total sum: 62.0

print('column sums:', np.nansum(arr, axis=0))
# column sums: [16. 19. 15. 12.]

print('row sums:', np.nansum(arr, axis=1))
# row sums: [10. 19. 33.]
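NumPy offers similar `NaN`-aware variants for other reductions. The sketch below uses `np.nanmean`, `np.nanmax`, and `np.nanmin` on the same data; note that `np.nanmean` divides by the number of non-missing values only.

```python
import numpy as np

# Same values as the array loaded from example.csv
arr = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]])

# Column means, ignoring NaNs (the third column averages only 3 and 12)
col_means = np.nanmean(arr, axis=0)
print('column means:', col_means)

# Largest and smallest non-missing values
largest = np.nanmax(arr)
smallest = np.nanmin(arr)
print('max:', largest, 'min:', smallest)
# max: 12.0 min: 1.0
```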

### Removing all rows that contain missing values

Here, we again use a Boolean mask built with `np.isnan`, this time inverted with the tilde (`~`), to return only those rows that **don't contain `NaN`s**. If we wanted the rows that **do** contain `NaN`s, we could just drop the tilde.

In [None]:
arr[~np.isnan(arr).any(axis=1)]
# array([[ 1., 2., 3., 4.]])
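The same pattern can drop columns instead of rows: reduce the mask along `axis=0` and apply it to the second dimension. A minimal sketch, with the array rebuilt by hand:

```python
import numpy as np

# Same values as the array loaded from example.csv
arr = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]])

# True for every column that contains at least one NaN
bad_columns = np.isnan(arr).any(axis=0)

# Keep all rows, but only the columns without NaNs
complete_columns = arr[:, ~bad_columns]
print(complete_columns)
# [[ 1.  2.]
#  [ 5.  6.]
#  [10. 11.]]
```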

### Convert missing values to 0

Many operations propagate `NaN`s into their results or don't accept them at all. To work around this, we can use the `np.nan_to_num` function, which replaces `NaN`s with 0 (and infinities with very large finite numbers).

In [None]:
arr0 = np.nan_to_num(arr)
arr0
# array([[  1.,   2.,   3.,   4.],
#       [  5.,   6.,   0.,   8.],
#       [ 10.,  11.,  12.,   0.]])
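Zero is not always a sensible replacement. As a sketch of two alternatives: the `nan` keyword of `np.nan_to_num` (available in NumPy 1.17 and later) sets a different constant, and `np.where` combined with `np.nanmean` fills each gap with its column mean. The choice of `-1.0` and of mean imputation is an assumption for illustration, not a recommendation.

```python
import numpy as np

# Same values as the array loaded from example.csv
arr = np.array([[1, 2, 3, 4],
                [5, 6, np.nan, 8],
                [10, 11, 12, np.nan]])

# Replace NaNs with a constant other than 0 (requires NumPy >= 1.17)
filled_const = np.nan_to_num(arr, nan=-1.0)
print(filled_const)

# Replace NaNs with the mean of their column; the 1-D array of column
# means broadcasts across the rows
col_means = np.nanmean(arr, axis=0)
filled_mean = np.where(np.isnan(arr), col_means, arr)
print(filled_mean)
# [[ 1.   2.   3.   4. ]
#  [ 5.   6.   7.5  8. ]
#  [10.  11.  12.   6. ]]
```

Both calls return new arrays and leave `arr` unchanged.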

### Converting certain numbers to `NaN`

The reverse also works: using a Boolean mask, we can replace any chosen number with `np.nan`. Here, we turn the zeros back into `NaN`s.

In [None]:
arr0[arr0 == 0] = np.nan
arr0
# array([[  1.,   2.,   3.,   4.],
#       [  5.,   6.,  nan,   8.],
#       [ 10.,  11.,  12.,  nan]])
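A common use of this is replacing a sentinel value that marks missing data. The sketch below assumes a made-up integer array in which `-999` means "missing". Because `NaN` is a floating-point value, an integer array has to be converted to float first.

```python
import numpy as np

# Hypothetical readings where -999 marks a missing measurement
readings = np.array([3, -999, 7, -999, 12])

# Option 1: convert to float, then assign through a Boolean mask
as_float = readings.astype(float)
as_float[as_float == -999] = np.nan
print(as_float)
# [ 3. nan  7. nan 12.]

# Option 2: np.where builds a new float array without modifying the input
cleaned = np.where(readings == -999, np.nan, readings)

print(np.nanmean(cleaned))
```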