# Missing Values

Dealing with missing data is a common challenge when working with real-world data. Let's explore the methods pandas provides for identifying and handling missing values in a dataset.

**Sample data**:

```python
import pandas as pd

data = {
    'employee_id': [2, 3, 4, 7, 8, float('nan')],
    'name': ['Sal', 'Yang', 'Khaya', 'Lin', 'Eve', 'Mike'],
    'department': ['Sales', 'Marketing', 'Engineering', 'Sales', 'Engineering', None],
    'salary': [60000, 75000, 80000, 62000, 90000, 70000]
}

employees = pd.DataFrame(data)
employees
```

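|   | employee_id | name | department | salary |
|---|---|---|---|---|
| 0 | 2.0 | Sal | Sales | 60000 |
| 1 | 3.0 | Yang | Marketing | 75000 |
| 2 | 4.0 | Khaya | Engineering | 80000 |
| 3 | 7.0 | Lin | Sales | 62000 |
| 4 | 8.0 | Eve | Engineering | 90000 |
| 5 | NaN | Mike | None | 70000 |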
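The `employee_id` values display as `2.0` and `3.0` rather than `2` and `3` because `NaN` is a floating-point value, so pandas stores the whole column as `float64`. Below is a minimal sketch of pandas' nullable `Int64` dtype, which keeps the IDs as integers and marks the gap with `pd.NA`. It assumes pandas 1.0 or later; `ids`, `as_float` and `as_int` are illustrative names.

```python
import pandas as pd

ids = [2, 3, 4, 7, 8, float('nan')]

# A plain column containing NaN is upcast to float64
as_float = pd.Series(ids)
print(as_float.dtype)  # float64

# The nullable Int64 dtype keeps integers and stores the gap as pd.NA
as_int = pd.Series(ids, dtype='Int64')
print(as_int.dtype)  # Int64

# isna recognises pd.NA as missing, just like NaN
print(as_int.isna().tolist())  # [False, False, False, False, False, True]
```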


## Identifying missing values

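The `isna` method returns a DataFrame of the same shape that is `True` wherever an entry is missing. It only detects the standard missing-value markers: `NaN`, `None`, `NaT` and `pd.NA`.

Note that both the missing `employee_id` (`NaN`) and the missing `department` (`None`) are identified as missing entries.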

```python
employees.isna()
```

|   | employee_id | name | department | salary |
|---|---|---|---|---|
| 0 | False | False | False | False |
| 1 | False | False | False | False |
| 2 | False | False | False | False |
| 3 | False | False | False | False |
| 4 | False | False | False | False |
| 5 | True | False | True | False |
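Because `isna` only recognises the standard markers, placeholders such as the string `'N/A'` or an empty string are treated as ordinary values. Here is a minimal sketch that uses a hypothetical `raw` frame (not part of the sample data) and converts such placeholders to `NaN` before checking. The choice of placeholders is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: missing departments recorded in three different ways
raw = pd.DataFrame({'department': ['Sales', None, 'N/A', '']})

# Only the None entry is detected as missing
print(raw['department'].isna().tolist())  # [False, True, False, False]

# Replace the placeholder strings with NaN so isna detects them too
cleaned = raw.replace(['N/A', ''], np.nan)
print(cleaned['department'].isna().tolist())  # [False, True, True, True]
```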


To count the missing entries in each column, sum the result of `isna`. Each `True` counts as 1 and each `False` as 0.

```python
employees.isna().sum()
```

```
employee_id    1
name           0
department     1
salary         0
dtype: int64
```
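Two related patterns are sketched below on the same `employees` data. Taking the mean of `isna` instead of the sum gives the fraction of missing entries per column. Combining `isna` with `any(axis=1)` selects the rows that contain at least one missing value. The names `missing_share` and `has_missing` are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    'employee_id': [2, 3, 4, 7, 8, float('nan')],
    'name': ['Sal', 'Yang', 'Khaya', 'Lin', 'Eve', 'Mike'],
    'department': ['Sales', 'Marketing', 'Engineering', 'Sales', 'Engineering', None],
    'salary': [60000, 75000, 80000, 62000, 90000, 70000],
})

# The mean of the booleans is the fraction of missing entries in each column
missing_share = employees.isna().mean()
print(missing_share)

# True for each row that has at least one missing value
has_missing = employees.isna().any(axis=1)
print(employees[has_missing])  # only Mike's row (index 5)
```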

## Handling missing data by dropping values

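Drop every row that contains at least one missing value. Like most pandas methods, `dropna` returns a new DataFrame and leaves `employees` unchanged: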

```python
employees.dropna()
```

|   | employee_id | name | department | salary |
|---|---|---|---|---|
| 0 | 2.0 | Sal | Sales | 60000 |
| 1 | 3.0 | Yang | Marketing | 75000 |
| 2 | 4.0 | Khaya | Engineering | 80000 |
| 3 | 7.0 | Lin | Sales | 62000 |
| 4 | 8.0 | Eve | Engineering | 90000 |


We can also drop observations based on more specific conditions.

Drop only rows in which every cell is missing:
```python
employees.dropna(how='all')
```

Drop columns that contain only missing values:
```python
employees.dropna(axis=1, how='all')
```
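The sample data has no fully empty rows or columns. The sketch below therefore uses a small hypothetical frame `df` that does have them. It also shows two further `dropna` parameters: `subset`, which only checks the listed columns, and `thresh`, which keeps rows with at least that many non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely missing, column 'c' is entirely missing
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, np.nan],
    'b': [np.nan, np.nan, 6.0, 8.0],
    'c': [np.nan, np.nan, np.nan, np.nan],
})

print(df.dropna().shape)                               # (0, 3): every row has a NaN in 'c'
print(df.dropna(how='all').index.tolist())             # [0, 2, 3]
print(df.dropna(axis=1, how='all').columns.tolist())   # ['a', 'b']
print(df.dropna(subset=['a']).index.tolist())          # [0, 2]
print(df.dropna(thresh=2).index.tolist())              # [2]
```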

## Handling missing data by filling values

We can also replace missing values with suitable substitutes. This is often necessary when building models, since many of them require every entry to have a value.

Fill in missing data with zeros:

```python
employees.fillna(0)
```

|   | employee_id | name | department | salary |
|---|---|---|---|---|
| 0 | 2.0 | Sal | Sales | 60000 |
| 1 | 3.0 | Yang | Marketing | 75000 |
| 2 | 4.0 | Khaya | Engineering | 80000 |
| 3 | 7.0 | Lin | Sales | 62000 |
| 4 | 8.0 | Eve | Engineering | 90000 |
| 5 | 0.0 | Mike | 0 | 70000 |
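A single scalar passed to `fillna` is applied to every column, which is why the missing `department` became the number `0`. Below is a minimal sketch that instead passes a dictionary mapping column names to fill values. The placeholder values `0` and `'Unknown'` are illustrative choices, not recommendations.

```python
import pandas as pd

employees = pd.DataFrame({
    'employee_id': [2, 3, 4, 7, 8, float('nan')],
    'name': ['Sal', 'Yang', 'Khaya', 'Lin', 'Eve', 'Mike'],
    'department': ['Sales', 'Marketing', 'Engineering', 'Sales', 'Engineering', None],
    'salary': [60000, 75000, 80000, 62000, 90000, 70000],
})

# One fill value per column; columns not listed are left unchanged
filled = employees.fillna({'employee_id': 0, 'department': 'Unknown'})
print(filled.loc[5])
```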


Fill in missing values with the mean of their column. For example, if some salaries were missing, we could use the mean of the salaries we do have as the fill value.

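```python
mean_salary = employees.salary.mean()  # NaN values are skipped by default
# Fill only salary; a bare scalar would also fill employee_id and department
employees.fillna({'salary': mean_salary})
```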
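The sample has no missing salaries, so the sketch below uses a hypothetical variant in which Mike's salary was never recorded. It checks that the gap is filled with the mean of the five recorded salaries. The frame name `salaries` is illustrative.

```python
import pandas as pd

# Hypothetical variant of the sample data in which Mike's salary is missing
salaries = pd.DataFrame({
    'name': ['Sal', 'Yang', 'Khaya', 'Lin', 'Eve', 'Mike'],
    'salary': [60000, 75000, 80000, 62000, 90000, float('nan')],
})

# mean() skips NaN, so this averages the five recorded salaries
mean_salary = salaries['salary'].mean()
print(mean_salary)  # 73400.0

# Fill only the salary column with that mean
filled = salaries.fillna({'salary': mean_salary})
print(filled.loc[5, 'salary'])  # 73400.0
```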