# Missing Data in Pandas

## `None`: Pythonic missing data

In [1]:
import numpy as np
import pandas as pd

In [2]:
vals1 = np.array([1, None, 3, 4])
vals1

array([1, None, 3, 4], dtype=object)

The `dtype=object` means that the best common type NumPy could infer for the array's contents is the generic Python object. Object arrays are useful for some purposes, but every operation on the data is carried out at the Python level, with much more overhead than the typically fast operations on arrays with native types:

In [3]:
for dtype in ['object', 'int']:
    print("dtype =", dtype)
    %timeit np.arange(1E6, dtype=dtype).sum()
    print()

dtype = object
60.8 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

dtype = int
1.81 ms ± 34.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
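
The `%timeit` magic above only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard `timeit` module can be used instead. The array size, the explicit `np.int64` dtype, and the names `native` and `boxed` are choices made for this illustration, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np

# Same values stored two ways: native 64-bit integers vs. generic Python objects.
native = np.arange(100_000, dtype=np.int64)
boxed = native.astype(object)

# Both sums agree; only the execution path differs.
assert native.sum() == boxed.sum()

# Time 20 calls of each sum with the standard-library timeit module.
t_native = timeit.timeit(native.sum, number=20)
t_boxed = timeit.timeit(boxed.sum, number=20)
print(f"int64:  {t_native:.4f} s for 20 sums")
print(f"object: {t_boxed:.4f} s for 20 sums")
```

The object-dtype sum is expected to be the slower of the two, in line with the timings shown above, because each addition goes through Python's object protocol rather than a compiled loop over native integers.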



Using Python objects also means that aggregations like `sum()` or `min()` raise an error when the array contains a `None` value:

In [5]:
vals1.sum()

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

This shows that addition between an integer and `None` is undefined.
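
The error can be reproduced without NumPy, which suggests it comes from Python's own `+` operator rather than from the array. The sketch below shows this, then one possible workaround: dropping the `None` entries before summing. The workaround and the names `present` and `total` are illustrative assumptions, not something the text above prescribes.

```python
import numpy as np

vals1 = np.array([1, None, 3, 4])

# Plain Python raises the same kind of error for int + None.
try:
    1 + None
except TypeError as exc:
    print("plain Python:", exc)

# The object-array sum fails for the same reason.
try:
    vals1.sum()
except TypeError as exc:
    print("NumPy:", exc)

# One possible workaround (illustrative only): keep the entries
# that are not None, then sum what remains.
present = [v for v in vals1 if v is not None]
total = sum(present)
print(total)
```

Filtering like this silently discards the missing entries, so it only makes sense when ignoring them is the intended behaviour.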

## `NaN`

In [6]:
np.array_split?