# Simple Imputer


***The `SimpleImputer` class from the `sklearn.impute` module is a useful tool for handling missing data in machine learning pipelines. It replaces missing values with either a constant or a statistic of the corresponding feature: its mean, median, or most frequent value.***

***Replace missing values using a descriptive statistic (e.g. mean, median, or most frequent) along each column, or using a constant value.***
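The sketch below shows how the `strategy` parameter changes the fill value. It uses a made-up single-column array, not the loan data. `fill_value` is only used by the `'constant'` strategy and is ignored by the others.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy single-column data with one missing entry (hypothetical values)
X = np.array([[1.0], [2.0], [2.0], [7.0], [np.nan]])

results = {}
for strategy in ["mean", "median", "most_frequent", "constant"]:
    # fill_value only matters for the "constant" strategy
    imputer = SimpleImputer(strategy=strategy, fill_value=0.0)
    # The last row is the one that was NaN; read back its filled value
    results[strategy] = imputer.fit_transform(X)[-1, 0]

print(results)
```

The mean (3.0) is pulled upward by the single large value 7.0. The median and the most frequent value both give 2.0.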

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
data = pd.read_csv('LoanApprovalPrediction.csv')
data.head(3)
```

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0.0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1.0 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0.0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |


```python
data.isnull().sum()
```

```
Loan_ID               0
Gender                0
Married               0
Dependents           12
Education             0
Self_Employed         0
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           21
Loan_Amount_Term     14
Credit_History       49
Property_Area         0
Loan_Status           0
dtype: int64
```
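Raw counts are easier to compare as shares of all rows. For example, Credit_History is missing in 49 of the 598 rows reported by `data.info()`, which is about 8%. The sketch below computes those shares and picks out the columns that actually need imputation. It runs on a small made-up frame: the column names are borrowed from the loan data, but the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the loan data
df = pd.DataFrame({
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    "Credit_History": [1.0, np.nan, np.nan, 0.0],
    "Gender": ["Male", "Male", "Female", "Male"],
})

# isna().mean() gives the fraction of missing values in each column
missing_share = df.isna().mean()

# Keep only the columns that contain at least one NaN
cols_with_nan = missing_share[missing_share > 0].index.tolist()

print(missing_share)
print(cols_with_nan)
```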

```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 598 entries, 0 to 597
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Loan_ID            598 non-null    object
 1   Gender             598 non-null    object
 2   Married            598 non-null    object
 3   Dependents         586 non-null    float64
 4   Education          598 non-null    object
 5   Self_Employed      598 non-null    object
 6   ApplicantIncome    598 non-null    int64
 7   CoapplicantIncome  598 non-null    float64
 8   LoanAmount         577 non-null    float64
 9   Loan_Amount_Term   584 non-null    float64
 10  Credit_History     549 non-null    float64
 11  Property_Area      598 non-null    object
 12  Loan_Status        598 non-null    object
dtypes: float64(5), int64(1), object(7)
memory usage: 60.9+ KB
```


```python
data.select_dtypes(include=['float64']).columns
```

```
Index(['Dependents', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term',
       'Credit_History'],
      dtype='object')
```
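Dependents and Credit_History look like integer codes (values such as 0.0 and 1.0 in the preview), yet they are stored as float64. A probable reason is that a NumPy-backed integer column cannot hold NaN, so pandas upcasts it to float64 when values are missing. This is an assumption, since the raw CSV is not shown. The small sketch below demonstrates that behavior and shows pandas' nullable `Int64` dtype as an alternative.

```python
import numpy as np
import pandas as pd

# An integer column without gaps keeps an integer dtype
complete = pd.Series([0, 1, 1])

# The same kind of values with one gap is upcast to float64 so NaN can be stored
with_gap = pd.Series([0, 1, np.nan])

# The nullable "Int64" extension dtype holds missing values without upcasting
nullable = pd.Series([0, 1, None], dtype="Int64")

print(complete.dtype, with_gap.dtype, nullable.dtype)
```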

***SimpleImputer***

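```python
from sklearn.impute import SimpleImputer

# Replace every NaN with the mean of its column
Si = SimpleImputer(strategy='mean')

# fit() learns one mean per column and transform() fills the NaNs;
# fit_transform() does both and returns a NumPy array
Array = Si.fit_transform(data[['Dependents', 'CoapplicantIncome', 'LoanAmount',
                               'Loan_Amount_Term', 'Credit_History']])
```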
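`fit_transform` is shorthand for `fit` followed by `transform`. The per-column values learned during `fit` are stored in the fitted imputer's `statistics_` attribute. With `strategy='mean'`, these values match pandas' NaN-skipping column means. This is why row 0's missing LoanAmount is filled with that column's mean. The following sketch uses a hypothetical two-column frame rather than the loan data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for two of the loan columns
df = pd.DataFrame({
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    "Loan_Amount_Term": [360.0, 360.0, np.nan, 180.0],
})

imputer = SimpleImputer(strategy="mean")
imputer.fit(df)                 # learn one mean per column
filled = imputer.transform(df)  # replace each NaN with its column's mean

# statistics_ holds the learned fill values, in column order
print(imputer.statistics_)
# DataFrame.mean() skips NaN by default, so it agrees with statistics_
print(df.mean().to_numpy())
```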

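```python
# Wrap the array in a DataFrame; the float64 columns appear in the same order
# as the list passed to the imputer, so the labels line up
new_data = pd.DataFrame(Array, columns=data.select_dtypes(include=['float64']).columns)
```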

```python
new_data.head(5)
```

| | Dependents | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 144.968804 | 360.0 | 1.0 |
| 1 | 1.0 | 1508.0 | 128.0 | 360.0 | 1.0 |
| 2 | 0.0 | 0.0 | 66.0 | 360.0 | 1.0 |
| 3 | 0.0 | 2358.0 | 120.0 | 360.0 | 1.0 |
| 4 | 0.0 | 0.0 | 141.0 | 360.0 | 1.0 |
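With `strategy='mean'`, every column is filled with its mean. For count-like or binary columns such as Dependents and Credit_History, this can produce values that are not valid codes. A common alternative is to use `most_frequent` for such columns and `median` for amounts, since the median is less sensitive to extreme values, and to combine the two with `ColumnTransformer`. This is an option rather than the notebook's approach. The frame below is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical data: two discrete code columns and one amount column
df = pd.DataFrame({
    "Dependents": [0.0, 1.0, 0.0, np.nan],
    "Credit_History": [1.0, np.nan, 1.0, 0.0],
    "LoanAmount": [66.0, 120.0, np.nan, 600.0],
})

# Each (name, transformer, columns) triple applies its own imputation strategy
ct = ColumnTransformer([
    ("codes", SimpleImputer(strategy="most_frequent"), ["Dependents", "Credit_History"]),
    ("amounts", SimpleImputer(strategy="median"), ["LoanAmount"]),
])

# Output columns follow the transformer order listed above
filled = pd.DataFrame(ct.fit_transform(df),
                      columns=["Dependents", "Credit_History", "LoanAmount"])
print(filled)
```

In this toy data, the mean of LoanAmount (262.0) would be dominated by the value 600.0, while the median fills the gap with 120.0.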


```python
new_data.isna().sum()
```

```
Dependents           0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
dtype: int64
```
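In this notebook, the imputer is fit on the full dataset. When the imputed data feeds a model, a common practice is to learn the fill values from the training split only, so that test rows do not influence them. Placing the imputer inside a `Pipeline` handles this automatically. The sketch below uses synthetic data, and `LogisticRegression` is chosen only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic features with roughly 10% of entries blanked out (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# fit() learns the imputation means from X_train only, then trains the model
model = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
model.fit(X_train, y_train)

# score() reuses the training means to fill NaNs in X_test before predicting
score = model.score(X_test, y_test)
print(score)

# The fitted imputer's means equal the NaN-ignoring column means of X_train
print(model.named_steps["simpleimputer"].statistics_)
```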