Many real-world applications involve missing data. Both pandas and scikit-learn (sklearn) provide methods for filling in missing values. Run the following examples to make sure you are familiar with how each library handles missing data.

(1) Fill the missing data with 0, with predefined per-column values, and with column means in pandas.

In [1]:
import numpy as np
import pandas as pd

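# Build a small DataFrame with missing values (NaN) in columns A, B, and C.
df = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list('ABCD'),
)

# Fill every missing value with 0.
filleddf = df.fillna(0)

# Fill each column with its own predefined value.
values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
df_predefine = df.fillna(value=values)

# Fill each column with its mean. Column C has no observed values,
# so its mean is NaN and the column stays unfilled.
df_mean = df.fillna(df.mean())

print(df)
print(filleddf)
print(df_predefine)
print(df_mean)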


     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4
     A    B    C  D
0  0.0  2.0  0.0  0
1  3.0  4.0  0.0  1
2  0.0  0.0  0.0  5
3  0.0  3.0  0.0  4
     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  2.0  1
2  0.0  1.0  2.0  5
3  0.0  3.0  2.0  4
     A    B   C  D
0  3.0  2.0 NaN  0
1  3.0  4.0 NaN  1
2  3.0  3.0 NaN  5
3  3.0  3.0 NaN  4
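
In the last output above, column C is still NaN after the mean fill, because a column with no observed values has no mean. The sketch below counts the missing values per column and then applies a second fill with 0 as a fallback. The choice of 0 as the fallback and the names missing_before, missing_after, and df_mean_then_zero are assumptions made for illustration only; the right fallback depends on your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list("ABCD"),
)

# Count the missing values in each column before filling.
missing_before = df.isna().sum()

# Fill with column means first. Column C is entirely NaN, so its mean is NaN
# and it is left unfilled; the second fillna(0) is an assumed fallback.
df_mean_then_zero = df.fillna(df.mean()).fillna(0)

# Count the missing values again to confirm nothing is left unfilled.
missing_after = df_mean_then_zero.isna().sum()

print(missing_before)
print(df_mean_then_zero)
print(missing_after)
```

Checking isna().sum() before and after a fill is a quick way to notice columns that a single strategy could not handle.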


(2) Fill the missing data using scikit-learn's SimpleImputer.

In [2]:
import numpy as np
from sklearn.impute import SimpleImputer
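# Learn the per-column means from the training data (NaN entries are ignored).
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

# Fill the missing entries of X with the means learned above, not the means of X.
X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
X_filled = imp_mean.transform(X)
print(X_filled)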


[[ 7.   2.   3. ]
 [ 4.   3.5  6. ]
 [10.   3.5  9. ]]
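
The fill values in the output above (7, 3.5) come from the data passed to fit, not from X itself. As a minimal sketch, the code below inspects the learned means through the imputer's statistics_ attribute and then, for comparison, uses strategy='median' with fit_transform so that the statistics are computed from X directly. The variable names train, imp_median, and X_median are chosen here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

train = [[7, 2, 3], [4, np.nan, 6], [10, 5, 9]]
X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]

# Fit on the training data and inspect the per-column means that were learned.
imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imp_mean.fit(train)
print(imp_mean.statistics_)

# For comparison: learn per-column medians from X itself and fill in one step.
imp_median = SimpleImputer(missing_values=np.nan, strategy="median")
X_median = imp_median.fit_transform(X)
print(X_median)
```

Fitting on one dataset and transforming another is the usual pattern when the same fill values must be applied to both training and test data.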
