# Data Preprocessing

## P3: Feature Scaling - Standardization & Normalization


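Many machine learning models are sensitive to the scale of their input features. This is especially true of distance-based methods such as KNN and of models trained with gradient-based optimization. Without scaling, a feature measured in large units can dominate distances or slow convergence.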
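The sketch below shows the distance problem on hypothetical data that is not from the original. It uses two features on very different scales: annual income in dollars and age in years. The helper `nearest_to_row0` is an illustrative name. On the raw values, the Euclidean distance is driven almost entirely by income. After `StandardScaler`, both features contribute.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: columns are [annual income in dollars, age in years]
X = np.array([
    [50_000, 25],   # row 0: the query point
    [51_000, 60],   # row 1: almost the same income, 35 years older
    [58_000, 26],   # row 2: 8,000 more income, almost the same age
    [90_000, 40],
    [30_000, 45],
], dtype=float)

def nearest_to_row0(points: np.ndarray) -> int:
    """Index of the row (other than row 0) closest to row 0 by Euclidean distance."""
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return int(np.argmin(dists)) + 1

raw_nearest = nearest_to_row0(X)                                      # income dominates
scaled_nearest = nearest_to_row0(StandardScaler().fit_transform(X))  # both features count
print(raw_nearest, scaled_nearest)  # 1 2
```

On the raw data, a 1,000-dollar income gap outweighs a 35-year age gap, so row 1 looks closest. After standardization, row 2 becomes the nearest neighbour. Which answer is "right" depends on the problem. Without scaling, though, the choice is made by the units of measurement.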

### Techniques Covered:
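- **Standardization**: Rescales each feature to mean 0 and standard deviation 1 using $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are that feature's mean and standard deviation.
- **Normalization**: Rescales each feature to the range [0, 1] (Min-Max Scaling) using $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$.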
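Both formulas can be computed directly. The sketch below applies them with NumPy to the Height values from the sample dataset in Step 1. It uses the population standard deviation (`ddof=0`, NumPy's default), which is also what `StandardScaler` uses.

```python
import numpy as np

height = np.array([150, 160, 165, 170, 180], dtype=float)

# Standardization: z = (x - mean) / std, with the population std (ddof=0)
z = (height - height.mean()) / height.std()

# Min-max normalization: x' = (x - min) / (max - min)
x_norm = (height - height.min()) / (height.max() - height.min())

print(z)       # mean is 165 and std is 10, giving -1.5, -0.5, 0.0, 0.5, 1.5
print(x_norm)  # range is 30, giving 0, ~0.333, 0.5, ~0.667, 1
```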


### Step 1: Import libraries and create sample data

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Sample dataset
data = {'Height': [150, 160, 165, 170, 180],
        'Weight': [50, 55, 60, 65, 70]}
df = pd.DataFrame(data)
df
```

### Step 2: Standardization using `StandardScaler`

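```python
# fit_transform learns each column's mean and standard deviation, then applies z = (x - mean) / std
scaler_std = StandardScaler()
scaled_std = scaler_std.fit_transform(df)

# fit_transform returns a NumPy array, so wrap it back into a labeled DataFrame
df_std = pd.DataFrame(scaled_std, columns=['Height_std', 'Weight_std'])
df_std
```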
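The sketch below inspects the fitted `mean_` and `scale_` attributes to show what the scaler actually learned. It rebuilds the same sample `df` so that it runs on its own. It also shows a common pitfall: pandas' `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), so it does not match `scale_`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same sample data as Step 1
df = pd.DataFrame({'Height': [150, 160, 165, 170, 180],
                   'Weight': [50, 55, 60, 65, 70]})
scaler_std = StandardScaler().fit(df)

# Per-column statistics learned during fit
print(scaler_std.mean_)   # column means: 165 and 60
print(scaler_std.scale_)  # population std (ddof=0): 10 and sqrt(50) ≈ 7.071

# pandas .std() defaults to ddof=1 (sample std), which differs from scale_
print(df.std().to_numpy())        # ≈ 11.18 and 7.906
print(df.std(ddof=0).to_numpy())  # matches scaler_std.scale_

# inverse_transform maps standardized values back to the original units
restored = scaler_std.inverse_transform(scaler_std.transform(df))
print(np.allclose(restored, df.to_numpy()))  # True
```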

### Step 3: Normalization using `MinMaxScaler`

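```python
# fit_transform learns each column's minimum and maximum, then applies x' = (x - min) / (max - min)
scaler_minmax = MinMaxScaler()
scaled_minmax = scaler_minmax.fit_transform(df)

# Wrap the returned NumPy array back into a labeled DataFrame
df_minmax = pd.DataFrame(scaled_minmax, columns=['Height_norm', 'Weight_norm'])
df_minmax
```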
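The sketch below shows what `MinMaxScaler` stores and how it treats data outside the fitted range. The new row (Height 190, Weight 45) is a hypothetical value that is not from the original. Values outside the training minimum and maximum map outside [0, 1] unless `clip=True` is set, which is available in scikit-learn 0.24 and later.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same sample data as Step 1
df = pd.DataFrame({'Height': [150, 160, 165, 170, 180],
                   'Weight': [50, 55, 60, 65, 70]})
scaler_minmax = MinMaxScaler().fit(df)

print(scaler_minmax.data_min_)  # per-column minimum: 150 and 50
print(scaler_minmax.data_max_)  # per-column maximum: 180 and 70

# Hypothetical new sample outside the fitted range
new = pd.DataFrame({'Height': [190], 'Weight': [45]})
out = scaler_minmax.transform(new)
print(out)  # Height: (190 - 150) / 30 ≈ 1.333, Weight: (45 - 50) / 20 = -0.25

# clip=True forces transformed values back into feature_range
clipped = MinMaxScaler(clip=True).fit(df).transform(new)
print(clipped)  # Height clipped to 1.0, Weight clipped to 0.0
```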

### Step 4: Comparison of Original and Scaled Data

```python
# Side-by-side view: original, standardized, and normalized columns
pd.concat([df, df_std, df_minmax], axis=1)
```
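The sketch below is a quick sanity check of the comparison table. It rebuilds `df_std` and `df_minmax` from the same sample data. Standardized columns should have mean 0 and (population) standard deviation 1. Normalized columns should span exactly [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Rebuild the Step 1 data and both scaled versions
df = pd.DataFrame({'Height': [150, 160, 165, 170, 180],
                   'Weight': [50, 55, 60, 65, 70]})
df_std = pd.DataFrame(StandardScaler().fit_transform(df),
                      columns=['Height_std', 'Weight_std'])
df_minmax = pd.DataFrame(MinMaxScaler().fit_transform(df),
                         columns=['Height_norm', 'Weight_norm'])

# Standardized columns: mean ≈ 0 and population std ≈ 1
std_ok = np.allclose(df_std.mean(), 0.0) and np.allclose(df_std.std(ddof=0), 1.0)

# Normalized columns: minimum 0 and maximum 1
minmax_ok = np.allclose(df_minmax.min(), 0.0) and np.allclose(df_minmax.max(), 1.0)

print(std_ok, minmax_ok)  # True True
```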

### Conclusion:
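- Use **Standardization** for scale-sensitive algorithms such as Logistic Regression, SVM, and KNN, especially when features have no natural bounds.
- Use **Normalization** when every feature must lie in a fixed, bounded range such as [0, 1]. Because the observed minimum and maximum set that range, min-max scaling is sensitive to outliers.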
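In practice, fit the scaler on the training data only and then apply it unchanged to the test data. Otherwise, test-set statistics leak into preprocessing. The sketch below puts `StandardScaler` and the model in one `Pipeline`. It uses scikit-learn's bundled Iris dataset and a 5-neighbour KNN classifier as illustrative choices; neither is from the original.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The pipeline fits the scaler on X_train only, then reuses it for X_test
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# The scaler's statistics come from the training split alone
scaler = model.named_steps['standardscaler']
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
```

The same pattern works with `MinMaxScaler` in place of `StandardScaler`.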


### Recorded Outputs

Output of Step 1 (`df`):

|   | Height | Weight |
|---|--------|--------|
| 0 | 150 | 50 |
| 1 | 160 | 55 |
| 2 | 165 | 60 |
| 3 | 170 | 65 |
| 4 | 180 | 70 |

Output of Step 2 (`df_std`):

|   | Height_std | Weight_std |
|---|------------|------------|
| 0 | -1.5 | -1.414214 |
| 1 | -0.5 | -0.707107 |
| 2 | 0.0 | 0.0 |
| 3 | 0.5 | 0.707107 |
| 4 | 1.5 | 1.414214 |