# Feature scaling

Feature scaling is a data transformation that adjusts the range of the individual features (variables) in a dataset so that they sit on comparable scales. This helps many machine learning algorithms, particularly those based on distances (such as k-nearest neighbors) or trained with gradient descent, which can otherwise be dominated or slowed down by the features with the largest numeric ranges.
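To make the effect of scale concrete, here is a minimal sketch using two made-up samples. The features (income in dollars, age in years) and their values are hypothetical and are not part of the dataset used later. It shows how much of the squared Euclidean distance comes from each feature when nothing is scaled.

```python
import numpy as np

# Hypothetical samples: [annual income in dollars, age in years]
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])

# Per-feature contribution to the squared Euclidean distance
squared_diffs = (a - b) ** 2
distance = np.sqrt(squared_diffs.sum())

# Share of the squared distance contributed by each feature
shares = squared_diffs / squared_diffs.sum()

print(distance)
print(shares)
```

The income difference accounts for more than 99% of the squared distance, even though the age gap of 35 years is arguably the more meaningful difference between the two samples.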

Some common methods for feature scaling:

1. Min-Max Scaling (Normalization)

   Scale features to a specific range, usually between 0 and 1.

   Formula: **X_normalized = (X - X_min) / (X_max - X_min)**

   Use MinMaxScaler from scikit-learn or implement it manually.

2. Standardization

   Scale features to have a mean of 0 and a standard deviation of 1.

   Formula: **X_standardized = (X - X_mean) / X_std_dev**

   Use StandardScaler from scikit-learn or implement it manually.
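As a rough illustration of the "implement it manually" option, both formulas can be written directly with pandas. This is a sketch, not a replacement for the scikit-learn scalers; the variable names `manual_minmax` and `manual_standardized` are chosen here for illustration, and the data matches the example DataFrame used in the sections below.

```python
import pandas as pd

# Same values as the example DataFrame used in the sections below
df = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [1, 2, 3, 4, 5],
    'Feature3': [1.5, 2.3, 3.8, 4.9, 5.1],
})

# Min-max scaling: (X - X_min) / (X_max - X_min), applied column by column
manual_minmax = (df - df.min()) / (df.max() - df.min())

# Standardization: (X - X_mean) / X_std_dev
# ddof=0 gives the population standard deviation, which is what
# StandardScaler uses; pandas defaults to ddof=1 (sample standard deviation).
manual_standardized = (df - df.mean()) / df.std(ddof=0)

print(manual_minmax)
print(manual_standardized)
```

One caveat with this manual version: if a column is constant, both denominators are zero and the result for that column is NaN.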

# Min-Max Scaling (Normalization)

In [1]:
import pandas as pd

data = {
    'Feature1' : [10,20,30,40,50],
    'Feature2' : [1,2,3,4,5],
    'Feature3' : [1.5,2.3,3.8,4.9,5.1]
}

df = pd.DataFrame(data)

In [2]:
df

|   | Feature1 | Feature2 | Feature3 |
|---|----------|----------|----------|
| 0 | 10 | 1 | 1.5 |
| 1 | 20 | 2 | 2.3 |
| 2 | 30 | 3 | 3.8 |
| 3 | 40 | 4 | 4.9 |
| 4 | 50 | 5 | 5.1 |


In [3]:
from sklearn.preprocessing import MinMaxScaler

# Initialize the scaler
scaler = MinMaxScaler()

In [4]:
# Fit the scaler on df and return the scaled values as a NumPy array
scaler.fit_transform(df)

array([[0.        , 0.        , 0.        ],
       [0.25      , 0.25      , 0.22222222],
       [0.5       , 0.5       , 0.63888889],
       [0.75      , 0.75      , 0.94444444],
       [1.        , 1.        , 1.        ]])

In [5]:
scaled_array = scaler.fit_transform(df)

In [6]:
# Convert the scaled array back to a DataFrame, keeping the original column names
scaled_df = pd.DataFrame(scaled_array, columns=df.columns)

In [7]:
scaled_df

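|   | Feature1 | Feature2 | Feature3 |
|---|----------|----------|----------|
| 0 | 0.00 | 0.00 | 0.000000 |
| 1 | 0.25 | 0.25 | 0.222222 |
| 2 | 0.50 | 0.50 | 0.638889 |
| 3 | 0.75 | 0.75 | 0.944444 |
| 4 | 1.00 | 1.00 | 1.000000 |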
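The cells above fit and transform the same DataFrame in one step. A scaler can also be fitted once and then reused on other rows. The sketch below assumes a hypothetical set of unseen rows (`new`, not part of the original example) and shows that values outside the fitted range land outside 0 to 1, and that `inverse_transform` recovers the original units.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({'Feature1': [10, 20, 30, 40, 50]})
new = pd.DataFrame({'Feature1': [25, 60]})  # hypothetical unseen rows

scaler = MinMaxScaler()
scaler.fit(train)  # learn the column minimum (10) and maximum (50)

# Reuse the fitted minimum and maximum on the new rows
new_scaled = scaler.transform(new)
print(new_scaled)  # 25 maps to 0.375; 60 is above the fitted maximum and maps to 1.25

# Map the scaled values back to the original units
restored = scaler.inverse_transform(new_scaled)
print(restored)
```

Fitting on one set of rows and calling only `transform` on the rest keeps the scaling consistent across both sets.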


# Standardization

In [8]:
import pandas as pd

data = {
    'Feature1' : [10,20,30,40,50],
    'Feature2' : [1,2,3,4,5],
    'Feature3' : [1.5,2.3,3.8,4.9,5.1]
}

df = pd.DataFrame(data)

In [9]:
df

|   | Feature1 | Feature2 | Feature3 |
|---|----------|----------|----------|
| 0 | 10 | 1 | 1.5 |
| 1 | 20 | 2 | 2.3 |
| 2 | 30 | 3 | 3.8 |
| 3 | 40 | 4 | 4.9 |
| 4 | 50 | 5 | 5.1 |


In [10]:
from sklearn.preprocessing import StandardScaler

# Initialize the scaler
scaler = StandardScaler()

In [11]:
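# Fit and transform in one step, then wrap the result in a DataFrame with the original column names
standardized_df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)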

In [12]:
standardized_df

|   | Feature1 | Feature2 | Feature3 |
|---|----------|----------|----------|
| 0 | -1.414214 | -1.414214 | -1.424940 |
| 1 | -0.707107 | -0.707107 | -0.860607 |
| 2 | 0.000000 | 0.000000 | 0.197516 |
| 3 | 0.707107 | 0.707107 | 0.973474 |
| 4 | 1.414214 | 1.414214 | 1.114557 |
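One way to sanity-check the result is to inspect what the scaler learned and confirm that each standardized column has a mean of about 0 and a standard deviation of 1. The following is a small sketch of that check; it rebuilds the same DataFrame so it can run on its own, and the name `standardized` is chosen here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [1, 2, 3, 4, 5],
    'Feature3': [1.5, 2.3, 3.8, 4.9, 5.1],
})

scaler = StandardScaler()
standardized = scaler.fit_transform(df)  # NumPy array

# Parameters learned during fit
print(scaler.mean_)   # per-column means
print(scaler.scale_)  # per-column population standard deviations

# Column-wise checks on the standardized values
print(standardized.mean(axis=0))  # approximately 0 for every column
print(standardized.std(axis=0))   # 1 for every column (NumPy's std uses ddof=0 by default)
```

Calling `standardized_df.std()` in pandas would report about 1.118 rather than 1 for these five rows, because pandas defaults to the sample standard deviation (ddof=1).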
