# **Random Imputation**

Random imputation replaces each missing value with a **randomly selected value** drawn from the observed (non-missing) values of the same variable. Because the draws come from the variable's own empirical distribution, the method preserves that distribution, but it adds noise and can introduce bias if not used carefully.

- **Advantages**:
    - Simple to implement
    - Preserves the variable's distribution
    - Quick to compute
    - Well suited for linear models, as it **doesn't distort the distribution**, regardless of the percentage of missing values
- **Disadvantages**:
    - Can introduce noise and bias
    - Does not account for relationships between variables
    - May not be appropriate for all variables
    - Affects the covariance with other features
    - **Must be implemented manually with pandas; scikit-learn has no built-in imputer for it**
    - Memory heavy for deployment, as we need to store the original training set to draw values from when replacing the NA in incoming observations
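The idea described above can be sketched in a few lines of pandas. This is a minimal illustration, not a definitive implementation: the helper name `random_sample_impute`, its `donor` and `seed` parameters, and the toy `age` values are all assumptions made up for this example.

```python
import numpy as np
import pandas as pd


def random_sample_impute(series: pd.Series, donor: pd.Series, seed: int = 0) -> pd.Series:
    """Fill NaNs in `series` with values drawn at random from the observed `donor` values.

    Hypothetical helper for illustration; not part of pandas or scikit-learn.
    """
    filled = series.copy()
    missing = filled.isna()
    if not missing.any():
        return filled
    # Draw one observed donor value per missing entry, with replacement.
    draws = donor.dropna().sample(n=int(missing.sum()), replace=True, random_state=seed)
    # Assign by position so the donor's index labels do not interfere.
    filled.loc[missing] = draws.to_numpy()
    return filled


# Toy data (made up for illustration): 3 of 8 values are missing.
age = pd.Series([22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0, np.nan], name="Age")
age_imputed = random_sample_impute(age, donor=age, seed=42)
print(age_imputed)
```

Passing a fixed `seed` keeps the result reproducible. Observed values are left untouched, and every filled value is one that already occurs in the donor column.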




In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer

In [5]:
df = pd.read_csv('train.csv')[['Age','Fare','Survived']]
df.head()

|   | Age | Fare | Survived |
|---|-----|------|----------|
| 0 | 22.0 | 7.25 | 0 |
| 1 | 38.0 | 71.2833 | 1 |
| 2 | 26.0 | 7.925 | 1 |
| 3 | 35.0 | 53.1 | 1 |
| 4 | 35.0 | 8.05 | 0 |


In [None]:
df.isnull().mean()*100

Age         19.86532
Fare         0.00000
Survived     0.00000
dtype: float64

In [12]:
x = df.drop('Survived', axis=1)
y = df['Survived']  

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
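After a split like this, one possible next step is to draw replacement values from the training set only and use them for both splits, so that the test set never feeds the imputer. The sketch below is self-contained and uses synthetic stand-ins for `x_train` and `x_test` (the values are made up, not taken from `train.csv`); the column name `Age_imputed` and the variable `age_pool` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for x_train / x_test (made-up values, not from train.csv).
rng = np.random.default_rng(0)
x_train = pd.DataFrame({"Age": rng.normal(30, 12, 200).round(),
                        "Fare": rng.exponential(30, 200)})
x_test = pd.DataFrame({"Age": rng.normal(30, 12, 50).round(),
                       "Fare": rng.exponential(30, 50)})
# Blank out roughly 20% of Age in each split.
x_train.loc[rng.random(200) < 0.2, "Age"] = np.nan
x_test.loc[rng.random(50) < 0.2, "Age"] = np.nan

# Donor pool: observed training ages only.
age_pool = x_train["Age"].dropna()

for name, frame in (("train", x_train), ("test", x_test)):
    missing = frame["Age"].isna()
    # One random draw from the training pool per missing entry.
    draws = age_pool.sample(n=int(missing.sum()), replace=True, random_state=42)
    frame["Age_imputed"] = frame["Age"]
    frame.loc[missing, "Age_imputed"] = draws.to_numpy()
    print(name, "remaining NaNs:", frame["Age_imputed"].isna().sum())

# If the distribution is preserved, the spread should change very little.
print("std before:", x_train["Age"].std(), "std after:", x_train["Age_imputed"].std())
```

Keeping the imputed values in a separate column makes it easy to compare the original and imputed distributions. The need to keep `age_pool` around at prediction time is the memory cost mentioned in the disadvantages.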




