
In [1]:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

**Initialize the IterativeImputer**.
`IterativeImputer` is an estimator class in scikit-learn's `sklearn.impute` module that imputes missing values iteratively. It models each feature that has missing values as a function of the other features and refines the estimates over several rounds of regression. The class is still experimental, which is why `enable_iterative_imputer` must be imported before it.
By default, the maximum number of rounds is 10, set via the `max_iter` argument. If the dataset has many missing values, the estimates may need more rounds to stabilize, so consider increasing this value.
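To make the iterative idea concrete, here is a minimal sketch on synthetic data. The array `X`, the hidden rows, and the name `toy_imputer` are assumptions made up for illustration and are not part of the notebook's dataset. The second column is built to depend on the first, so the imputer has something to learn from.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Synthetic data (assumption): the second column is roughly twice the first.
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

# Hide 20 entries of the second column.
X_missing = X.copy()
missing_rows = rng.choice(200, size=20, replace=False)
X_missing[missing_rows, 1] = np.nan

toy_imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = toy_imputer.fit_transform(X_missing)

# n_iter_ reports how many rounds were actually run (at most max_iter).
print("rounds run:", toy_imputer.n_iter_)

# Compare the imputed entries with the values that were hidden.
mae = np.abs(X_filled[missing_rows, 1] - X[missing_rows, 1]).mean()
print("mean absolute error on hidden entries:", mae)
```

If `n_iter_` equals `max_iter` and scikit-learn emits a convergence warning, that is a hint that a larger `max_iter` may be worth trying.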

In [2]:
# Define the imputer; random_state makes the result reproducible
imputer = IterativeImputer(random_state=100, max_iter=10)

# Load the churn dataset, which contains missing values
import pandas as pd
file_path = "https://raw.githubusercontent.com/selva86/datasets/master/Churn_Modelling_m.csv"
df = pd.read_csv(file_path)
df.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619.0,France,Female,42.0,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608.0,Spain,Female,41.0,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502.0,France,,,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699.0,France,,39.0,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850.0,Spain,Female,43.0,2,,1,1,1,79084.1,0


Consider three numerical features: Balance, Age, and Exited. These features will be used to train the imputer.

In [3]:
df_train = df.loc[:, ["Balance", "Age", "Exited"]]
df_train.head()

Unnamed: 0,Balance,Age,Exited
0,0.0,42.0,1
1,83807.86,41.0,0
2,159660.8,,1
3,0.0,39.0,0
4,,43.0,0


The main step is to fit the imputer on the data and then fill in the missing values with the `transform` method.

In [4]:
imputer.fit(df_train)
df_imputed = imputer.transform(df_train)
df_imputed[:10]

array([[0.00000000e+00, 4.20000000e+01, 1.00000000e+00],
       [8.38078600e+04, 4.10000000e+01, 0.00000000e+00],
       [1.59660800e+05, 4.47681408e+01, 1.00000000e+00],
       [0.00000000e+00, 3.90000000e+01, 0.00000000e+00],
       [7.25930035e+04, 4.30000000e+01, 0.00000000e+00],
       [1.13755780e+05, 4.40000000e+01, 1.00000000e+00],
       [0.00000000e+00, 5.00000000e+01, 0.00000000e+00],
       [1.15046740e+05, 2.90000000e+01, 1.00000000e+00],
       [1.42051070e+05, 4.40000000e+01, 0.00000000e+00],
       [1.34603880e+05, 2.70000000e+01, 0.00000000e+00]])
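Note that `transform` returns a plain NumPy array, so the column names are lost. A minimal sketch of wrapping the result back into a DataFrame follows. The frame `toy_train` is a hypothetical miniature built from a few of the rows shown above, and `toy_imputer` and `toy_imputed` are illustrative names. It also uses `fit_transform`, which fits and transforms in a single call.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical miniature of df_train (assumption, for illustration only).
toy_train = pd.DataFrame({
    "Balance": [0.0, 83807.86, 159660.8, 0.0, np.nan, 113755.78],
    "Age": [42.0, 41.0, np.nan, 39.0, 43.0, 44.0],
    "Exited": [1, 0, 1, 0, 0, 1],
})

toy_imputer = IterativeImputer(random_state=100, max_iter=10)

# fit_transform fits the imputer and returns the imputed NumPy array.
toy_array = toy_imputer.fit_transform(toy_train)

# Restore the column names and the row index of the input frame.
toy_imputed = pd.DataFrame(
    toy_array, columns=toy_train.columns, index=toy_train.index
)
print(toy_imputed)
```

Keeping the original index makes it safe to assign the imputed columns back to the source DataFrame by label.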

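The last step is to write the imputed values back into the original DataFrame. The three selected columns no longer contain missing values; columns that were not passed to the imputer, such as CreditScore and EstimatedSalary, still do.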

In [5]:
df.loc[:, ["Balance", "Age", "Exited"]] = df_imputed
df.head(10)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619.0,France,Female,42.0,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608.0,Spain,Female,41.0,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502.0,France,,44.768141,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699.0,France,,39.0,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850.0,Spain,Female,43.0,2,72593.003473,1,1,1,79084.1,0
5,6,15574012,Chu,645.0,Spain,Male,44.0,8,113755.78,2,1,0,,1
6,7,15592531,Bartlett,822.0,France,Male,50.0,7,0.0,2,1,1,10062.8,0
7,8,15656148,Obinna,376.0,Germany,Female,29.0,4,115046.74,4,1,0,119346.88,1
8,9,15792365,He,501.0,France,Male,44.0,4,142051.07,2,0,1,74940.5,0
9,10,15592389,H?,,France,Male,27.0,2,134603.88,1,1,1,71725.73,0
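A quick way to confirm which columns still contain gaps is to count missing values per column. The sketch below uses a small hypothetical frame named `toy` rather than the downloaded dataset, so it runs without network access; the same `isna().sum()` call can be applied to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame (assumption): two columns already imputed, two untouched.
toy = pd.DataFrame({
    "CreditScore": [619.0, 608.0, np.nan],
    "Age": [42.0, 41.0, 44.768141],
    "Balance": [0.0, 83807.86, 159660.8],
    "EstimatedSalary": [101348.88, np.nan, 113931.57],
})
imputed_cols = ["Balance", "Age"]

# Count the missing values in every column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# The imputed columns should report zero missing values.
print("imputed columns complete:", bool((missing_per_column[imputed_cols] == 0).all()))
```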
