# Missing Data Handling
* Missing data in Pandas is often represented by NumPy "NaN" values
* NaN is a float, so Pandas stores numeric data containing it as float64, which keeps the data usable in vectorized operations

In [1]:
import numpy as np
import pandas as pd

In [2]:
sales = [0, 5, 155, np.nan, 518]
items = ["coffee", "bananas", "tea", "coconut", "sugar"]

sales_series = pd.Series(sales, index=items, name="Sales")
sales_series

coffee       0.0
bananas      5.0
tea        155.0
coconut      NaN
sugar      518.0
Name: Sales, dtype: float64
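
To make the point about floats and vectorized operations concrete, here is a minimal sketch that reuses the sales data from the cell above. The names `nan_is_float`, `doubled` and `total` are illustrative only.

```python
import numpy as np
import pandas as pd

sales_series = pd.Series(
    [0, 5, 155, np.nan, 518],
    index=["coffee", "bananas", "tea", "coconut", "sugar"],
    name="Sales",
)

# NaN is an ordinary Python float, which is why the integers were upcast to float64
nan_is_float = isinstance(np.nan, float)

# Element-wise arithmetic runs over the whole Series; the missing entry stays NaN
doubled = sales_series * 2

# Reductions such as .sum() skip NaN by default (skipna=True)
total = sales_series.sum()
```

Arithmetic propagates the missing value, while aggregations ignore it unless `skipna=False` is passed.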

### Pandas released its own missing data type, "NA".
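* This allows integer data with missing values to keep an integer dtype (such as the nullable "Int64") instead of being converted to float; without an explicit dtype, as in the cell below, the Series is stored as object
* This is still a relatively new feature, and when something goes wrong the data usually ends up converted to NumPy's NaN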

In [3]:
sales = [0, 5, 155, pd.NA, 518]
items = ["coffee", "bananas", "tea", "coconut", "sugar"]

sales_series = pd.Series(sales, index=items, name="Sales")
sales_series

coffee        0
bananas       5
tea         155
coconut    <NA>
sugar       518
Name: Sales, dtype: object
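
The output above has dtype object because no dtype was requested. As a hedged sketch, passing the nullable integer dtype "Int64" (capital "I") is one way to keep the values as integers alongside `pd.NA`. The variable names here are illustrative.

```python
import pandas as pd

sales_int = pd.Series(
    [0, 5, 155, pd.NA, 518],
    index=["coffee", "bananas", "tea", "coconut", "sugar"],
    name="Sales",
    dtype="Int64",  # pandas' nullable integer dtype, distinct from NumPy's "int64"
)

# The dtype stays integer even though one value is missing
dtype_name = str(sales_int.dtype)

# pd.NA is detected by .isna() just like NaN
missing_mask = sales_int.isna()

# Aggregations skip the missing value by default
total = sales_int.sum()
```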

## Identifying Missing Data

* The .isna() and .value_counts() methods let you identify missing data in a Series
* The .isna() method returns True if a value is missing, and False otherwise


In [4]:
checklist = pd.Series(['Complete', np.nan, np.nan, np.nan, 'Complete'])
checklist

0    Complete
1         NaN
2         NaN
3         NaN
4    Complete
dtype: object

In [6]:
checklist.isna() 

0    False
1     True
2     True
3     True
4    False
dtype: bool
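
Because `.isna()` returns booleans, the mask can be aggregated directly. The sketch below counts the missing entries and computes their share; `missing_count` and `missing_share` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

checklist = pd.Series(["Complete", np.nan, np.nan, np.nan, "Complete"])

# True counts as 1 and False as 0, so summing the mask counts the missing values
missing_count = checklist.isna().sum()

# The mean of the mask is the fraction of entries that are missing
missing_share = checklist.isna().mean()
```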

* The .value_counts() method returns unique values and their frequency

In [8]:
checklist.value_counts() # .value_counts() excludes NaN and None values by default

Complete    2
Name: count, dtype: int64

In [None]:
checklist.value_counts(dropna=False) # dropna=False includes missing values in the counts
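
The cell above was not executed, so here is a small self-contained sketch of what `dropna=False` changes. The names `counts` and `nan_count` are illustrative.

```python
import numpy as np
import pandas as pd

checklist = pd.Series(["Complete", np.nan, np.nan, np.nan, "Complete"])

# With dropna=False, missing values get their own row in the result
counts = checklist.value_counts(dropna=False)

# Pull out the count stored under the NaN label
nan_count = counts[counts.index.isna()].iloc[0]
```

With missing values included, the counts add up to the length of the Series.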

## Handling Missing Data

* The .dropna() and .fillna() methods let you handle missing data in a Series
* The .dropna() method removes NaN values from your Series or DataFrame
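
A minimal sketch of both methods on the checklist Series from earlier. The replacement label "Incomplete" is an assumption chosen for illustration; any value appropriate to the data could be used.

```python
import numpy as np
import pandas as pd

checklist = pd.Series(["Complete", np.nan, np.nan, np.nan, "Complete"])

# .dropna() returns a new Series without the missing entries;
# the surviving rows keep their original index labels (0 and 4)
dropped = checklist.dropna()

# .fillna() returns a new Series with missing entries replaced by the given value
filled = checklist.fillna("Incomplete")  # "Incomplete" is a hypothetical placeholder
```

Both methods return a new object by default, so `checklist` itself still contains its NaN values afterwards.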