# Simple Imputer


In [1]:
import pandas as pd
from sklearn.impute import SimpleImputer

In [2]:
dataset = pd.read_csv("../../Datasets/Data.csv")

In [3]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Country    10 non-null     object 
 1   Age        8 non-null      float64
 2   Salary     8 non-null      float64
 3   Purchased  10 non-null     object 
dtypes: float64(2), object(2)
memory usage: 452.0+ bytes


In [4]:
dataset.describe()

            Age        Salary
count       8.0           8.0
mean      40.25       62750.0
std    6.734771  12691.391908
min        30.0       48000.0
25%        36.5       53500.0
50%        39.0       59500.0
75%        45.0       70000.0
max        50.0       83000.0


In [5]:
dataset.isna().sum()

Country      0
Age          2
Salary       2
Purchased    0
dtype: int64

In [6]:
dataset.columns

Index(['Country', 'Age', 'Salary', 'Purchased'], dtype='object')

`SimpleImputer` is a scikit-learn class for imputing missing values in a dataset. It replaces each missing value according to a chosen strategy, such as the mean, median, or most frequent value of the column it belongs to.

`SimpleImputer` takes several parameters that control the imputation. The main ones are:

- `missing_values`: The placeholder that marks a value as missing. The default is `np.nan`.
- `strategy`: The imputation strategy. The default is `'mean'`; the other options are `'median'`, `'most_frequent'`, and `'constant'`. `'mean'` and `'median'` work only on numeric data, while `'most_frequent'` and `'constant'` also work on strings.
- `fill_value`: The replacement value used when `strategy` is `'constant'`. The default is `None`, in which case missing values are replaced with `0` for numeric data and with the string `"missing_value"` for string or object data.
- `copy`: Whether to impute on a copy of the input data rather than modifying it in place. The default is `True`.
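
The effect of `strategy` and `fill_value` is easiest to see on a tiny column. The following is a minimal sketch: the `toy` frame and its values are invented for illustration and are not taken from `Data.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data (not from Data.csv): one missing value at row 2
toy = pd.DataFrame({"Age": [20.0, 20.0, np.nan, 40.0, 90.0]})

results = {}
for strategy in ["mean", "median", "most_frequent"]:
    imputer = SimpleImputer(strategy=strategy)
    # fit_transform returns a NumPy array by default; pick the filled cell
    results[strategy] = imputer.fit_transform(toy)[2, 0]

# With strategy="constant", the replacement comes from fill_value
constant_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
results["constant"] = constant_imputer.fit_transform(toy)[2, 0]

for name, value in results.items():
    print(f"{name}: {value}")
# mean: 42.5
# median: 30.0
# most_frequent: 20.0
# constant: 0.0
```

Only the non-missing values (20, 20, 40, 90) enter each statistic, which is why the mean is 42.5 rather than 34.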

The following cell applies `SimpleImputer` with its default settings to the numeric `Age` and `Salary` columns of the dataset:


In [7]:
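# Default settings: strategy="mean", missing_values=np.nan
imputer = SimpleImputer()
# Learn the column means and fill the missing Age and Salary values
dataset[["Age", "Salary"]] = imputer.fit_transform(dataset[["Age", "Salary"]])
dataset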

   Country    Age   Salary Purchased
0   France   44.0  62750.0        No
1    Spain  40.25  48000.0       Yes
2  Germany   30.0  54000.0        No
3    Spain   38.0  61000.0        No
4  Germany   40.0  62750.0       Yes
5   France   35.0  58000.0       Yes
6    Spain  40.25  52000.0        No
7   France   48.0  79000.0       Yes
8  Germany   50.0  83000.0        No
9   France   37.0  67000.0       Yes


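In this example, we create a `SimpleImputer` object called `imputer` without arguments, so it uses the default `'mean'` strategy. Its `fit_transform` method computes the mean of each selected column from the non-missing values and replaces the missing entries with it: the two missing `Age` values become 40.25 and the two missing `Salary` values become 62750.0, matching the means reported by `dataset.describe()`.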
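
`fit_transform` combines two steps: `fit` learns the fill value for each column, and `transform` applies those values. The sketch below separates the two on invented data (the `train` and `new_rows` frames are hypothetical, not part of `Data.csv`) and reads the learned values from the fitted imputer's `statistics_` attribute.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data used only to illustrate fit vs. transform
train = pd.DataFrame({
    "Age": [30.0, np.nan, 50.0],
    "Salary": [40000.0, 60000.0, np.nan],
})
new_rows = pd.DataFrame({"Age": [np.nan], "Salary": [np.nan]})

imputer = SimpleImputer(strategy="mean")
imputer.fit(train)                  # learn one mean per column

learned = imputer.statistics_       # per-column fill values
print(learned)                      # [   40. 50000.]

# Apply the means learned from `train` to rows not seen during fit
filled_new = imputer.transform(new_rows)
print(filled_new)                   # [[   40. 50000.]]
```

Splitting the steps this way lets one fitted imputer fill later data, such as a held-out test split, with the values learned at fit time.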
