# **Feature scaling**
Feature scaling is a technique used in machine learning to bring the independent variables (features) of a dataset onto a comparable range. It matters most for algorithms that compute distances between data points, such as k-nearest neighbors or k-means.

# Why Feature Scaling is Important:
**Improves Model Performance:**<br> Many algorithms perform better or converge faster when features are on a similar scale.
<br>**Prevents Dominance:**<br> Without scaling, features with larger ranges can dominate the distance calculations, leading to biased results.
<br>**Facilitates Fair Comparisons:**<br> It allows for fair comparisons between features measured in different units.
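
To make the dominance point concrete, here is a minimal sketch with made-up numbers. The `income` and `age` columns and the `euclidean` helper are hypothetical and are not part of the dataset used in the rest of this notebook. It compares Euclidean distances before and after rescaling each column to [0, 1].

```python
import numpy as np

# Hypothetical data: column 0 is income in dollars, column 1 is age in years.
X = np.array([
    [50_000.0, 25.0],   # person 0
    [51_000.0, 60.0],   # person 1: similar income, very different age
    [90_000.0, 26.0],   # person 2: very different income, similar age
])

def euclidean(a, b):
    """Plain Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Unscaled: the income column is thousands of times larger than age,
# so the 35-year age gap barely changes the distance.
d01_raw = euclidean(X[0], X[1])
d02_raw = euclidean(X[0], X[2])

# Rescale each column to [0, 1] so both features contribute comparably.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d01_scaled = euclidean(X_scaled[0], X_scaled[1])
d02_scaled = euclidean(X_scaled[0], X_scaled[2])

print("raw distances:   ", d01_raw, d02_raw)
print("scaled distances:", d01_scaled, d02_scaled)
```

On the raw data, person 1 looks far closer to person 0 than person 2 does, purely because income is measured in larger numbers. After scaling, the two distances are of similar size.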

In [6]:
import numpy as np
import pandas as pd


# Create a dummy dataset
data = {
    'Feature1': np.random.randint(0, 100, 10),
    'Feature2': np.random.rand(10) * 100,
    'Feature3': np.random.normal(50, 15, 10)
}

# Convert to a DataFrame
df = pd.DataFrame(data)

print("Original Data:")
print(df)

Original Data:
   Feature1   Feature2   Feature3
0         3  27.381574  60.458675
1        77  66.301887  63.601052
2        50  61.073219  57.498848
3        21   0.538940  22.183264
4        17  58.523628  21.815965
5        79  28.268845  67.968153
6        14  27.471242  55.786147
7        51   3.559545  27.193534
8        87  66.467743  26.444807
9        14  10.557644  51.054420


# **Min-Max Scaling:**

Formula: <br>
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$


In [None]:
from sklearn.preprocessing import MinMaxScaler

# Fit a MinMaxScaler on Feature1 and rescale it to the [0, 1] range
df['Feature1'] = MinMaxScaler().fit_transform(df[['Feature1']])

print(df)

   Feature1   Feature2   Feature3
0  0.000000  27.381574  60.458675
1  0.880952  66.301887  63.601052
2  0.559524  61.073219  57.498848
3  0.214286   0.538940  22.183264
4  0.166667  58.523628  21.815965
5  0.904762  28.268845  67.968153
6  0.130952  27.471242  55.786147
7  0.571429   3.559545  27.193534
8  1.000000  66.467743  26.444807
9  0.130952  10.557644  51.054420
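
As a sanity check, the min-max formula can be applied by hand with NumPy. The sketch below copies the `Feature1` values from the "Original Data" output; `min_max_scale` is an illustrative helper, not a scikit-learn function, and its handling of a constant feature is an assumption made for this example.

```python
import numpy as np

# Feature1 values copied from the "Original Data" output.
feature1 = np.array([3, 77, 50, 21, 17, 79, 14, 51, 87, 14], dtype=float)

def min_max_scale(x):
    """Rescale a 1-D array to [0, 1] using (x - min) / (max - min)."""
    span = x.max() - x.min()
    if span == 0:
        # Assumption for this sketch: a constant feature maps to all zeros
        # instead of dividing by zero.
        return np.zeros_like(x)
    return (x - x.min()) / span

scaled = min_max_scale(feature1)
print(np.round(scaled, 6))
```

Here the minimum is 3 and the maximum is 87, so for example 77 becomes (77 - 3) / 84, which agrees with the scaled `Feature1` column printed above.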


# **Standard scaling <br>(also known as z-score normalization):**

$$z = \frac{x - \mu}{\sigma}$$

Where:<br>
z is the standardized value (z-score).<br>
x is the original value.<br>
μ is the mean of the feature.<br>
σ is the standard deviation of the feature. `StandardScaler` uses the population standard deviation (dividing by n, not n - 1).

In [10]:

from sklearn.preprocessing import StandardScaler

# Standardize Feature2 to zero mean and unit variance
df['Feature2'] = StandardScaler().fit_transform(df[['Feature2']])
print(df)

   Feature1  Feature2   Feature3
0  0.000000 -0.308119  60.458675
1  0.880952  1.262996  63.601052
2  0.559524  1.051928  57.498848
3  0.214286 -1.391689  22.183264
4  0.166667  0.949007  21.815965
5  0.904762 -0.272302  67.968153
6  0.130952 -0.304499  55.786147
7  0.571429 -1.269754  27.193534
8  1.000000  1.269691  26.444807
9  0.130952 -0.987259  51.054420
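
The z-score formula can also be checked by hand. This sketch copies the `Feature2` values from the "Original Data" output; `z_score` and its `ddof` parameter are illustrative names, not part of scikit-learn. It computes the result with both the population (`ddof=0`) and sample (`ddof=1`) standard deviation to show that the choice changes the numbers slightly.

```python
import numpy as np

# Feature2 values copied from the "Original Data" output.
feature2 = np.array([
    27.381574, 66.301887, 61.073219, 0.538940, 58.523628,
    28.268845, 27.471242, 3.559545, 66.467743, 10.557644,
])

def z_score(x, ddof=0):
    """Standardize a 1-D array: subtract the mean, divide by the std.

    ddof=0 gives the population standard deviation (divide by n);
    ddof=1 gives the sample standard deviation (divide by n - 1).
    """
    mu = x.mean()
    sigma = x.std(ddof=ddof)
    return (x - mu) / sigma

z_population = z_score(feature2, ddof=0)
z_sample = z_score(feature2, ddof=1)

print(np.round(z_population, 6))
print(np.round(z_sample, 6))
```

The `ddof=0` result should agree with the standardized `Feature2` column above up to the rounding of the copied inputs, while the `ddof=1` values are slightly smaller in magnitude. This is worth keeping in mind when comparing against `pandas.Series.std()`, which defaults to `ddof=1`.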


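# **Robust Scaling:**

Formula: <br>
$$x' = \frac{x - \mathrm{median}(x)}{Q_{75}(x) - Q_{25}(x)}$$

Where Q75 and Q25 are the 75th and 25th percentiles of the feature, so the denominator is the interquartile range. Because the median and the interquartile range are barely affected by extreme values, this scaling is less sensitive to outliers than min-max or standard scaling.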

In [None]:

from sklearn.preprocessing import RobustScaler

# Center Feature3 on its median and scale it by its interquartile range
df['Feature3'] = RobustScaler().fit_transform(df[['Feature3']])
print(df)

   Feature1  Feature2  Feature3
0  0.000000 -0.308119  0.212726
1  0.880952  1.262996  0.307699
2  0.559524  1.051928  0.123269
3  0.214286 -1.391689 -0.944095
4  0.166667  0.949007 -0.955196
5  0.904762 -0.272302  0.439689
6  0.130952 -0.304499  0.071505
7  0.571429 -1.269754 -0.792667
8  1.000000  1.269691 -0.815296
9  0.130952 -0.987259 -0.071505
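
The robust scaling formula can be reproduced with NumPy as well. The sketch below copies the `Feature3` values from the "Original Data" output; `robust_scale_1d` is an illustrative helper, and it assumes the 25th and 75th percentiles are computed with NumPy's default linear interpolation.

```python
import numpy as np

# Feature3 values copied from the "Original Data" output.
feature3 = np.array([
    60.458675, 63.601052, 57.498848, 22.183264, 21.815965,
    67.968153, 55.786147, 27.193534, 26.444807, 51.054420,
])

def robust_scale_1d(x):
    """Center on the median and divide by the interquartile range."""
    median = np.median(x)
    # Assumption: percentiles use NumPy's default (linear) interpolation.
    q25, q75 = np.percentile(x, [25, 75])
    return (x - median) / (q75 - q25)

scaled = robust_scale_1d(feature3)
print(np.round(scaled, 6))
```

For this column the median is the average of the two middle values (51.054420 and 55.786147), which is why those two rows end up at -0.071505 and 0.071505, symmetric around zero.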
