# Locating Missing Data

In [1]:
import numpy as np

In [2]:
x = np.array([2, np.nan, 5, 3, np.nan])
x

array([ 2., nan,  5.,  3., nan])

In [3]:
x == np.nan

array([False, False, False, False, False])

We might try to locate NaN values in a vector by comparing it with np.nan and using the result for logical indexing. As the starter script shows, this does not work: NaN is not equal to any value, including itself, so every equality comparison involving NaN returns False.
Instead of ==, we can use the [isnan()](https://numpy.org/doc/stable/reference/generated/numpy.isnan.html#numpy-isnan) function to identify NaN values. The isnan() function takes an array as input and returns a logical (Boolean) array of the same shape, with True wherever the input is NaN.

In [4]:
idx = np.isnan(x)
idx

array([False,  True, False, False,  True])
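To make the reason for the failed comparison more concrete, the following sketch checks how NaN behaves under == and != and compares the result with isnan(). The variable names are illustrative and not part of the original script.

```python
import numpy as np

x = np.array([2, np.nan, 5, 3, np.nan])

# NaN is never equal to anything, including itself (IEEE 754 behaviour),
# so == gives False and != gives True.
nan_equals_itself = np.nan == np.nan
nan_differs_from_itself = np.nan != np.nan

# Consequently, comparing an array with itself flags exactly the NaN entries.
self_mismatch = x != x

# The mask agrees with np.isnan element by element.
same_mask = np.array_equal(self_mismatch, np.isnan(x))

print(nan_equals_itself, nan_differs_from_itself, same_mask)
```

The `x != x` trick works, but isnan() states the intent more clearly and is usually the better choice.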

The [count_nonzero()](https://numpy.org/doc/stable/reference/generated/numpy.count_nonzero.html#numpy-count-nonzero) function counts the number of non-zero elements in an array. In a logical array, True counts as non-zero, so applying it to idx gives the number of missing values.

In [5]:
number_missing = np.count_nonzero(idx)
number_missing

2
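Besides counting the missing values, we may want to know where they are. A minimal sketch, assuming the same x as above and using illustrative variable names, could use flatnonzero() to turn the logical array into positions:

```python
import numpy as np

x = np.array([2, np.nan, 5, 3, np.nan])
idx = np.isnan(x)

# Positions of the True entries in the mask, i.e. where data is missing.
missing_positions = np.flatnonzero(idx)

# Summing a Boolean mask counts its True values, like count_nonzero().
number_missing = int(idx.sum())

# The mean of a Boolean mask is the fraction of True values.
fraction_missing = float(idx.mean())

print(missing_positions, number_missing, fraction_missing)
```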

Some functions allow us to skip, or ignore, missing data. For instance, the [nanmean()](https://numpy.org/doc/stable/reference/generated/numpy.nanmean.html#numpy-nanmean) function computes the mean of the non-NaN elements, and the [nanprod()](https://numpy.org/doc/stable/reference/generated/numpy.nanprod.html#numpy-nanprod) function computes the product while treating NaN values as 1.

In [6]:
x_mean = np.nanmean(x)
x_mean

3.3333333333333335

In [7]:
x_prod = np.nanprod(x)
x_prod

30.0
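One way to see what these functions do is to drop the NaN values by hand with the inverted mask and apply the ordinary methods. This is only a sketch for comparison, with illustrative variable names, and not a description of how NumPy implements them.

```python
import numpy as np

x = np.array([2, np.nan, 5, 3, np.nan])
idx = np.isnan(x)

# Keep only the entries that are not NaN.
valid = x[~idx]

# Ordinary mean and product over the valid entries only.
mean_valid = valid.mean()
prod_valid = valid.prod()

# Without skipping, a single NaN propagates into the result.
plain_mean = x.mean()

print(mean_valid, prod_valid, plain_mean)
```

For this vector, the mean and product of the valid entries match the nanmean() and nanprod() results, while the plain mean is NaN.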

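Sometimes a missing value has a specific meaning, such as a measurement of 0. We can use the logical vector that identifies the missing data to access and change those elements. Note that the replacement values then take part in every calculation, so the mean and product below differ from the nanmean() and nanprod() results.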

In [8]:
x[idx] = 0
x

array([2., 0., 5., 3., 0.])

In [9]:
x_mean = x.mean()
x_mean

2.0

In [10]:
x_prod = x.prod()
x_prod

0.0
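Assigning through the logical vector overwrites x in place. If the original data should be kept, a possible alternative is to build a filled copy instead. The sketch below uses where() and nan_to_num() for this; the `nan` keyword of nan_to_num() assumes NumPy 1.17 or later, and the variable names are illustrative.

```python
import numpy as np

x = np.array([2, np.nan, 5, 3, np.nan])

# where() picks 0 where the mask is True and the original value elsewhere,
# returning a new array and leaving x untouched.
filled = np.where(np.isnan(x), 0, x)

# nan_to_num() also returns a copy by default; note that it additionally
# replaces infinities with large finite numbers (none occur here).
filled_alt = np.nan_to_num(x, nan=0.0)

print(filled, filled_alt, x)
```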