# Missing Data Points.

### Handling missing data points with pandas.

When working with real-world data, we often run into values that are missing. Missing values can cause problems in an analysis, so pandas automatically marks them with a null value, displayed as `NaN` (Not a Number). This makes them easy to detect and handle.

```python
import pandas as pd
import numpy as np
```

```python
# np.nan marks a value as missing
d = {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]}
df = pd.DataFrame(d)
```

```python
df
```

|   | A   | B   | C |
|---|-----|-----|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
| 2 | NaN | NaN | 3 |
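In this DataFrame the NaN values were inserted explicitly with `np.nan`. As a minimal sketch of pandas filling in NaN on its own, the example below reads a small, made-up CSV string that contains empty fields. `read_csv` treats empty fields as missing by default, and `isna()` flags each missing cell. The names `csv_text` and `df_csv` are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical CSV text: row 0 has no value for B, row 1 has no value for A
csv_text = "A,B\n1,\n,3\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)               # the empty fields are shown as NaN
print(df_csv.isna())        # True wherever a value is missing
print(df_csv.isna().sum())  # number of missing values per column
```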


There are two ways to drop (remove) data points with NaN values, and both use the `df.dropna()` function. By default (`axis=0`), it removes every row that contains at least one NaN value. Setting `axis=1` instead removes every column that contains at least one NaN value.

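```python
# Default axis=0: drop every row that contains a NaN
df.dropna()
```

|   | A   | B   | C |
|---|-----|-----|---|
| 0 | 1.0 | 5.0 | 1 |

```python
# axis=1: drop every column that contains a NaN
df.dropna(axis=1)
```

|   | C |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |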
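Note that `dropna()` returns a new DataFrame and leaves `df` itself unchanged, so the result has to be assigned to a variable if you want to keep it. The sketch below rebuilds the same `df` so that it runs on its own and compares the shapes. The names `rows_kept` and `cols_kept` are just illustrative.

```python
import numpy as np
import pandas as pd

d = {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]}
df = pd.DataFrame(d)

rows_kept = df.dropna()        # default axis=0: drop rows containing any NaN
cols_kept = df.dropna(axis=1)  # axis=1: drop columns containing any NaN

print(df.shape)         # the original is untouched: (3, 3)
print(rows_kept.shape)  # (1, 3)
print(cols_kept.shape)  # (3, 1)
```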


We can also control which rows are dropped based on how many non-NaN values they contain. This is done with the `thresh` parameter of the `dropna()` function.

`thresh` sets the minimum number of non-NaN values a row needs in order to be kept (for example, at least 2 non-NaN values). Rows with fewer non-NaN values than the threshold are dropped.

**(As seen below)**

```python
# Keep only rows with at least 2 non-NaN values.
# Row 2 (the third row) is dropped: it has only 1 non-NaN value, fewer than the threshold.
df.dropna(thresh=2)
```

|   | A   | B   | C |
|---|-----|-----|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
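To make the `thresh` rule concrete, here is a sketch that applies it by hand. It counts the non-NaN values in each row with `notna().sum(axis=1)` and keeps the rows whose count is at least the threshold. This is an illustrative equivalent, not how pandas implements `dropna` internally. `threshold`, `non_nan_per_row`, and `manual` are hypothetical names.

```python
import numpy as np
import pandas as pd

d = {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]}
df = pd.DataFrame(d)

threshold = 2
non_nan_per_row = df.notna().sum(axis=1)   # row 0 -> 3, row 1 -> 2, row 2 -> 1
manual = df[non_nan_per_row >= threshold]  # keep rows with at least `threshold` values

print(non_nan_per_row)
print(manual.equals(df.dropna(thresh=threshold)))  # True
```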


The previous portion covered removing NaN values from the DataFrame. Now we will focus on filling (replacing) the missing values with the `fillna()` function.

In the first example below, we fill the missing values with the string "FILL VALUE" to demonstrate the syntax. In practice, a common choice for numeric data is to fill missing values with the mean of the existing values in that column, as in the second example.

```python
df.fillna(value="FILL VALUE")
```

|   | A          | B          | C |
|---|------------|------------|---|
| 0 | 1.0        | 5.0        | 1 |
| 1 | 2.0        | FILL VALUE | 2 |
| 2 | FILL VALUE | FILL VALUE | 3 |


```python
# Replace NaN in column A with the mean of its non-missing values, (1.0 + 2.0) / 2
df['A'].fillna(value=df['A'].mean())
```

```
0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
```
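The same idea extends to the whole DataFrame. `fillna()` also accepts a Series (or dict) that maps column names to fill values, so passing `df.mean()` fills each column's gaps with that column's own mean. This is a minimal sketch that assumes every column is numeric, as it is here. `column_means` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

d = {'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]}
df = pd.DataFrame(d)

# df.mean() skips NaN by default: A -> 1.5, B -> 5.0, C -> 2.0
column_means = df.mean()
filled = df.fillna(value=column_means)  # each column is filled with its own mean
print(filled)
```

Column C has no missing values, so it comes back unchanged.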