MICE stands for Multivariate Imputation by Chained Equations (also called Multiple Imputation by Chained Equations).


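Assumptions (missing-data mechanisms)
1. MCAR (Missing Completely at Random): the chance that a value is missing does not depend on any data, observed or not.
2. MAR (Missing at Random): the chance that a value is missing depends only on the observed data.
3. MNAR (Missing Not at Random): the chance that a value is missing depends on the missing value itself.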
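The three mechanisms differ in what the probability of a value being missing depends on. The NumPy sketch below simulates each one on made-up age and income columns (the column names, probabilities, and seed are illustrative assumptions, not part of these notes). Under MCAR the missing and observed rows have similar incomes; under MAR the difference is explained by the observed age column, which is the kind of relationship MICE's regression models can use; under MNAR it depends on the hidden income itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.normal(40, 10, n)                   # always observed
income = 1000 * age + rng.normal(0, 5000, n)  # the column that will get gaps

# MCAR: every income value has the same 20% chance of being missing
mcar_mask = rng.random(n) < 0.2

# MAR: missingness depends only on an observed column (older people skip the question more)
mar_mask = rng.random(n) < np.where(age > 45, 0.4, 0.05)

# MNAR: missingness depends on the unobserved value itself (high earners skip it more)
mnar_mask = rng.random(n) < np.where(income > np.median(income), 0.4, 0.05)

for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    print(f"{name}: {mask.mean():.1%} missing, "
          f"mean income of missing rows = {income[mask].mean():,.0f}, "
          f"of observed rows = {income[~mask].mean():,.0f}")
```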


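MICE is mainly used when data are missing at random (MAR); it also works under MCAR, but its results can be biased under MNAR.
Advantages: accurate, because each feature is imputed from its relationships with the other features, which is usually better than filling with a column mean or median.
Disadvantages: slow and memory-intensive, because a separate model is fitted for every incomplete feature in every round.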

The MICE algorithm, short for "Multiple Imputation by Chained Equations," is a method used for imputing missing values in a dataset through iterative regression modeling. Here's a brief outline of its steps:

Initialization: Start with a dataset containing missing values and fill each gap with a simple placeholder, such as the column mean.

Imputation Rounds: Repeat the following steps until convergence:

Step 1 (Sequential Imputation):
For each variable with missing values, fit a regression model that predicts it from the other variables, using the rows where it is observed.
Step 2 (Update):
Replace that variable's missing entries with the model's predictions, so the variables visited later in the same round use the updated values.
Step 3 (Repeat):
Repeat steps 1 and 2 for further rounds.
Convergence: Stop when the imputed values stabilize (no significant change between rounds) or after a predetermined number of iterations.

Output: Return the dataset with imputed missing values.

Because each variable is predicted from the others, the imputed values reflect the relationships between variables instead of ignoring them, as mean or median imputation does.
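The steps above can be sketched from scratch, assuming numeric data with NaN marking missing values and ordinary linear regression as the per-variable model. The function name mice_sketch is hypothetical; it produces a single imputation, and its stopping rule (largest change between rounds divided by the largest observed absolute value) is a simple choice for illustration, similar in spirit to the rule described in scikit-learn's IterativeImputer documentation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def mice_sketch(X, max_iter=10, tol=1e-3):
    """Illustrative chained-equations loop producing a single imputation."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)

    # Initialization: fill every gap with its column mean as a placeholder
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(max_iter):
        X_prev = X.copy()
        for j in range(X.shape[1]):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue  # column is fully observed, nothing to impute
            others = np.delete(X, j, axis=1)
            # Step 1: regress column j on the other columns, using rows where j is observed
            model = LinearRegression().fit(others[~rows_missing], X[~rows_missing, j])
            # Step 2: overwrite the missing entries of column j with the predictions
            X[rows_missing, j] = model.predict(others[rows_missing])
        # Step 3 / convergence: stop when the largest change is small relative to the data scale
        change = np.max(np.abs(X - X_prev)) / np.max(np.abs(X[~missing]))
        if change < tol:
            break
    return X


# Usage on age/income values like those in the sample data used later in these notes
X_demo = np.array([[25, 50000], [35, np.nan], [np.nan, 60000], [45, np.nan],
                   [30, 70000], [np.nan, 80000], [40, np.nan], [55, 90000]])
X_filled = mice_sketch(X_demo)
print(X_filled.round(1))
```

Columns are visited in a fixed left-to-right order here; library implementations typically let you choose the visiting order and the estimator.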

<h3>IterativeImputer</h3>

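"IterativeImputer" is a scikit-learn class inspired by the MICE (Multiple Imputation by Chained Equations) algorithm. It models each feature that has missing values as a function of the other features and imputes them in rounds, using BayesianRidge as the default estimator. Because it is still experimental, "from sklearn.experimental import enable_iterative_imputer" must run before IterativeImputer can be imported. Unlike R's mice package, it returns a single imputation rather than several.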
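scikit-learn's documentation notes that IterativeImputer can still be used for multiple imputation by running it several times on the same data with sample_posterior=True and a different random seed each time. A minimal sketch on age and income values like those in the example that follows (five draws and max_iter=10 are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (must precede the next import)
from sklearn.impute import IterativeImputer

X = np.array([[25, 50000], [35, np.nan], [np.nan, 60000], [45, np.nan],
              [30, 70000], [np.nan, 80000], [40, np.nan], [55, 90000]])
missing = np.isnan(X)

draws = []
for seed in range(5):
    imputer = IterativeImputer(
        max_iter=10,              # rounds to run (tol-based early stopping is skipped when sampling)
        initial_strategy="mean",  # placeholder fill before the first round
        sample_posterior=True,    # draw each fill from the default BayesianRidge predictive distribution
        random_state=seed,        # a different seed gives a different plausible fill
    )
    draws.append(imputer.fit_transform(X))

stack = np.stack(draws)  # shape (5, n_samples, n_features)
print("imputed cells, one row per draw:\n", stack[:, missing].round(1))
print("between-draw std of each imputed cell:", stack[:, missing].std(axis=0).round(1))
```

Each completed dataset would then be analysed separately and the results pooled; the spread between draws reflects the uncertainty about the missing values.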

In [15]:
import pandas as pd
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Create a sample dataset with missing values
data = {
    'age': [25, 35, np.nan, 45, 30, np.nan, 40, 55],
    'income': [50000, np.nan, 60000, np.nan, 70000, 80000, np.nan, 90000],
    'education': ['Bachelor', 'Master', 'PhD', np.nan, 'Master', 'Bachelor', np.nan, 'PhD']
}

df = pd.DataFrame(data)


In [16]:
from sklearn.preprocessing import LabelEncoder

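# Apply label encoding to the 'education' column (LabelEncoder sorts classes alphabetically:
# Bachelor=0, Master=1, PhD=2). Caution: .astype(str) turns NaN into the string 'nan',
# which becomes a fourth category (code 3), so the two missing education values are
# encoded rather than imputed; see rows 3 and 6 of the output below.
label_encoder = LabelEncoder()
encoded_education = label_encoder.fit_transform(df['education'].astype(str))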

# Replace the 'education' column with the encoded values
df['education'] = encoded_education

# Now, use the IterativeImputer with linear regression
mice_imputer = IterativeImputer(estimator=LinearRegression())

# Fit and transform the dataset
imputed_data = mice_imputer.fit_transform(df)

# Convert the imputed array back to a DataFrame
imputed_df = pd.DataFrame(imputed_data, columns=df.columns)

print(imputed_df)


         age        income  education
0  25.000000  50000.000000        0.0
1  35.000000  68051.101334        1.0
2  29.142931  60000.000000        2.0
3  45.000000  80987.102651        3.0
4  30.000000  70000.000000        1.0
5  44.125405  80000.000000        0.0
6  40.000000  75094.445888        3.0
7  55.000000  90000.000000        2.0
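In the cell above, .astype(str) turns each NaN into the string 'nan', so LabelEncoder gives it its own code (3) and rows 3 and 6 keep that code instead of being imputed. A minimal sketch of one possible fix, assuming education is treated as ordinal (Bachelor < Master < PhD): map the levels to codes by hand so NaN survives, let IterativeImputer fill it, then round the continuous prediction back to the nearest valid level. The explicit mapping, the rounding step, and the education_label column are illustrative choices, not part of the original notebook, and the result will differ from the table above because education is now imputed too.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'age': [25, 35, np.nan, 45, 30, np.nan, 40, 55],
    'income': [50000, np.nan, 60000, np.nan, 70000, 80000, np.nan, 90000],
    'education': ['Bachelor', 'Master', 'PhD', np.nan, 'Master', 'Bachelor', np.nan, 'PhD'],
})

# Hand-chosen ordinal codes; .map leaves NaN as NaN, so the gaps stay visible to the imputer
levels = ['Bachelor', 'Master', 'PhD']
df['education'] = df['education'].map({name: code for code, name in enumerate(levels)})

imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)
imputed_df = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# Regression returns continuous values: snap education back to a valid code, then to a label
imputed_df['education'] = imputed_df['education'].round().clip(0, len(levels) - 1).astype(int)
imputed_df['education_label'] = imputed_df['education'].map(dict(enumerate(levels)))
print(imputed_df)
```

Rounding a regression output is a simple compromise; imputing a categorical column with a classifier is another option.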




In [19]:
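# Initialize the MICE imputer with the default estimator (BayesianRidge)
mice_imputer = IterativeImputer()

# Fit and transform the original df, which still has missing age and income values
# (imputed_df is already complete, so fitting on it would change nothing)
imputed_data = mice_imputer.fit_transform(df)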
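When no estimator is passed, as in the cell above, IterativeImputer uses BayesianRidge. A small sketch comparing it with LinearRegression on the age and income columns, and reading the fitted imputer's n_iter_ attribute to see how many rounds ran (max_iter=25 and tol=1e-3 are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({
    'age': [25, 35, np.nan, 45, 30, np.nan, 40, 55],
    'income': [50000, np.nan, 60000, np.nan, 70000, 80000, np.nan, 90000],
})
mask = X.isna().to_numpy()

results = {}
for name, estimator in [("BayesianRidge (default)", None),
                        ("LinearRegression", LinearRegression())]:
    imputer = IterativeImputer(estimator=estimator, max_iter=25, tol=1e-3)
    filled = imputer.fit_transform(X)
    results[name] = filled
    # n_iter_ is the number of rounds actually run; below max_iter means the tol criterion was met
    print(f"{name}: {imputer.n_iter_} rounds, imputed cells: {filled[mask].round(1)}")
```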




In [20]:
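# Count missing values per column (isnull() is an alias of isna(), so one call is enough).
# education shows 0 because its NaNs were encoded as the 'nan' category in the In [16] cell.
df.isna().sum()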


age          2
income       3
education    0
dtype: int64

In [21]:
# Re-run the LinearRegression-based imputation on the original df
mice_imputer = IterativeImputer(estimator=LinearRegression())

# Fit and transform the dataset (df still contains the missing values)
imputed_data = mice_imputer.fit_transform(df)

# Convert the imputed array back to a DataFrame
imputed_df = pd.DataFrame(imputed_data, columns=df.columns)

print(imputed_df)

         age        income  education
0  25.000000  50000.000000        0.0
1  35.000000  68051.101334        1.0
2  29.142931  60000.000000        2.0
3  45.000000  80987.102651        3.0
4  30.000000  70000.000000        1.0
5  44.125405  80000.000000        0.0
6  40.000000  75094.445888        3.0
7  55.000000  90000.000000        2.0


In [23]:
# Confirm that no missing values remain after imputation
imputed_df.isna().sum()

age          0
income       0
education    0
dtype: int64
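A zero count only shows that every gap was filled, not that the fills are good. One common check is to hide observed values whose true values are known, impute them, and measure the error. The sketch below does this on synthetic data (the age-income relationship, the 20% masking rate, and the seed are assumptions made for illustration) and compares IterativeImputer with SimpleImputer's mean strategy, which illustrates the accuracy advantage listed at the top of these notes.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)
income = 1500 * age + rng.normal(0, 5000, n)  # synthetic, strongly related to age
X_true = np.column_stack([age, income])

# Hide 20% of income values completely at random so the true answers are known
X_missing = X_true.copy()
hidden = rng.random(n) < 0.2
X_missing[hidden, 1] = np.nan


def rmse(imputer):
    """Root-mean-squared error of the imputer on the hidden income values."""
    filled = imputer.fit_transform(X_missing)
    return np.sqrt(np.mean((filled[hidden, 1] - X_true[hidden, 1]) ** 2))


rmse_mice = rmse(IterativeImputer(random_state=0))
rmse_mean = rmse(SimpleImputer(strategy="mean"))
print(f"RMSE on hidden income: IterativeImputer={rmse_mice:,.0f}, mean imputation={rmse_mean:,.0f}")
```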