## **Feature Scaling**

### **1- Min-Max Scaler**

Min-max scaling (normalization) rescales each feature to a fixed range, usually [0, 1].

The formula for min-max scaling is:

$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

where $X$ is the original feature value, $X_{min}$ is the minimum value of the feature, and $X_{max}$ is the maximum value of the feature.

In [None]:
# Worked example for the values 8, 9, 3, 4, 2 (X_min = 2, X_max = 9)
# 8 ---> (8 - 2) / (9 - 2)
# 6 / 7 ≈ 0.857, which lies in the range [0, 1]
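
The worked example above can be reproduced with a few lines of NumPy. This is only a minimal sketch of the formula applied by hand; the variable names `x` and `x_scaled` are illustrative and not part of the original notebook.

```python
import numpy as np

# Values from the worked example (minimum is 2, maximum is 9)
x = np.array([8, 9, 3, 4, 2], dtype=float)

# Apply X_scaled = (X - X_min) / (X_max - X_min) element-wise
x_scaled = (x - x.min()) / (x.max() - x.min())

print(x_scaled.round(3))
```

The value 8 maps to roughly 0.857, the maximum (9) maps to 1, and the minimum (2) maps to 0.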

In [None]:
import numpy as np  # needed for np.array in the next cell
from sklearn.preprocessing import MinMaxScaler

In [None]:
data = np.array([[10,1000], [5,500], [3,300], [8,800]])
data

array([[  10, 1000],
       [   5,  500],
       [   3,  300],
       [   8,  800]])

In [None]:
# Create the min-max scaler
min_max_scaler = MinMaxScaler()
# Fit MinMaxScaler on the data and transform it in one step
data_Scaled = min_max_scaler.fit_transform(data)

In [None]:
data_Scaled

array([[1.        , 1.        ],
       [0.28571429, 0.28571429],
       [0.        , 0.        ],
       [0.71428571, 0.71428571]])
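
As a sanity check, the scaler's output can be compared with the formula applied column by column. The sketch below also shows the `feature_range` parameter of `MinMaxScaler`, which selects a target interval other than [0, 1]. The names `manual`, `sk`, and `wide` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[10, 1000], [5, 500], [3, 300], [8, 800]])

# Manual column-wise min-max scaling (axis=0 means "per feature")
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual = (data - col_min) / (col_max - col_min)

# Same computation through scikit-learn
sk = MinMaxScaler().fit_transform(data)
print(np.allclose(manual, sk))  # True

# feature_range changes the target interval, here [-1, 1]
wide = MinMaxScaler(feature_range=(-1, 1)).fit_transform(data)
print(wide.min(), wide.max())
```

Both columns produce identical scaled values because the second column is exactly 100 times the first, and min-max scaling removes that constant factor.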

### **2- Standardization**

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# Create the StandardScaler
scaler = StandardScaler()

# Fit StandardScaler on the data and transform it in one step
scaled_data = scaler.fit_transform(data)

In [None]:
scaled_data

array([[ 1.29986737,  1.29986737],
       [-0.55708601, -0.55708601],
       [-1.29986737, -1.29986737],
       [ 0.55708601,  0.55708601]])
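
Standardization subtracts each feature's mean and divides by its standard deviation, so every column ends up with zero mean and unit variance. The sketch below reproduces the output above by hand; it assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The names `mean`, `std`, and `z` are illustrative.

```python
import numpy as np

data = np.array([[10, 1000], [5, 500], [3, 300], [8, 800]], dtype=float)

# Per-feature mean and population standard deviation (ddof=0 is NumPy's default)
mean = data.mean(axis=0)
std = data.std(axis=0)

# z = (x - mean) / std, applied column by column through broadcasting
z = (data - mean) / std

print(z)
print(z.mean(axis=0), z.std(axis=0))
```

After scaling, the column means are 0 and the column standard deviations are 1, up to floating-point error.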

In [None]:
# Inverse transform to recover the original data
original_data = scaler.inverse_transform(scaled_data)

In [None]:
original_data

array([[  10., 1000.],
       [   5.,  500.],
       [   3.,  300.],
       [   8.,  800.]])
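
The inverse transform simply undoes the standardization: multiply by the fitted standard deviation and add back the fitted mean. A minimal sketch using the `mean_` and `scale_` attributes that a fitted `StandardScaler` exposes; the names `z` and `restored` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[10, 1000], [5, 500], [3, 300], [8, 800]])

scaler = StandardScaler()
z = scaler.fit_transform(data)

# Undo z = (x - mean) / scale by hand
restored = z * scaler.scale_ + scaler.mean_

# The manual result agrees with the built-in inverse_transform
print(np.allclose(restored, scaler.inverse_transform(z)))  # True
```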

## **Example**

In [None]:
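# Plan for this example:
# 1- Read the dataset
# 2- Inspect it with info()
# 3- Summarize it with describe()
# 4- Check the unique values
#
# Then apply:
# - Min-max scaler
# - Standard scaler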

In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv("C:/Users/Ram/Desktop/NTI ML track/Day 1 (data prep)/SampleFile.csv")
data.head()

|   | LotArea | MSSubClass |
|---|---------|------------|
| 0 | 8450    | 60         |
| 1 | 9600    | 20         |
| 2 | 11250   | 60         |
| 3 | 9550    | 70         |
| 4 | 14260   | 60         |


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   LotArea     1460 non-null   int64
 1   MSSubClass  1460 non-null   int64
dtypes: int64(2)
memory usage: 22.9 KB


In [None]:
data.shape

(1460, 2)

In [None]:
data['LotArea'].unique()

array([ 8450,  9600, 11250, ..., 17217, 13175,  9717], dtype=int64)

In [None]:
data['MSSubClass'].unique()

array([ 60,  20,  70,  50, 190,  45,  90, 120,  30,  85,  80, 160,  75,
       180,  40], dtype=int64)

In [None]:
min_lotarea = np.min(data["LotArea"])
min_lotarea

1300

In [None]:
max_lotarea = np.max(data["LotArea"])
max_lotarea

215245
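
With the minimum and maximum known, the min-max formula can be applied to `LotArea` by hand. The sketch below hard-codes the first five `LotArea` values shown by `data.head()` together with the minimum (1300) and maximum (215245) computed above, so it runs without the CSV file. The names `lot_area_head` and `scaled_head` are illustrative.

```python
import numpy as np

# First five LotArea values from data.head()
lot_area_head = np.array([8450, 9600, 11250, 9550, 14260], dtype=float)

# Minimum and maximum of the full LotArea column, as computed above
min_lotarea, max_lotarea = 1300, 215245

# X_scaled = (X - X_min) / (X_max - X_min)
scaled_head = (lot_area_head - min_lotarea) / (max_lotarea - min_lotarea)

print(scaled_head.round(6))
```

The first value comes out at about 0.033420, which is what `MinMaxScaler` should report for the same row.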

In [None]:
# 1- min max scaler
min_max_scaler = MinMaxScaler()

min_max_scaler_data = min_max_scaler.fit_transform(data)

In [None]:
min_max_scaler_data

array([[0.0334198 , 0.23529412],
       [0.03879502, 0.        ],
       [0.04650728, 0.23529412],
       ...,
       [0.03618687, 0.29411765],
       [0.03934189, 0.        ],
       [0.04037019, 0.        ]])

In [None]:
min_max_scaler_data = pd.DataFrame(min_max_scaler_data, columns = data.columns)
min_max_scaler_data

|      | LotArea  | MSSubClass |
|------|----------|------------|
| 0    | 0.033420 | 0.235294   |
| 1    | 0.038795 | 0.000000   |
| 2    | 0.046507 | 0.235294   |
| 3    | 0.038561 | 0.294118   |
| 4    | 0.060576 | 0.235294   |
| ...  | ...      | ...        |
| 1455 | 0.030929 | 0.235294   |
| 1456 | 0.055505 | 0.000000   |
| 1457 | 0.036187 | 0.294118   |
| 1458 | 0.039342 | 0.000000   |


In [None]:
# Column-wise maximum; DataFrame.max() avoids the warning that np.max raises on a DataFrame
max_value_minmax = min_max_scaler_data.max()
max_value_minmax

LotArea       1.0
MSSubClass    1.0
dtype: float64

In [None]:
# Column-wise minimum; DataFrame.min() avoids the warning that np.min raises on a DataFrame
min_value_minmax = min_max_scaler_data.min()
min_value_minmax

LotArea       0.0
MSSubClass    0.0
dtype: float64

In [None]:
# 2- Standard scaler
std_scaler = StandardScaler()
std_scaler_data = std_scaler.fit_transform(data)
std_scaler_data

array([[-0.20714171,  0.07337496],
       [-0.09188637, -0.87256276],
       [ 0.07347998,  0.07337496],
       ...,
       [-0.14781027,  0.30985939],
       [-0.08016039, -0.87256276],
       [-0.05811155, -0.87256276]])

In [None]:
std_scaler_data = pd.DataFrame(std_scaler_data, columns = data.columns)
std_scaler_data

|      | LotArea   | MSSubClass |
|------|-----------|------------|
| 0    | -0.207142 | 0.073375   |
| 1    | -0.091886 | -0.872563  |
| 2    | 0.073480  | 0.073375   |
| 3    | -0.096897 | 0.309859   |
| 4    | 0.375148  | 0.073375   |
| ...  | ...       | ...        |
| 1455 | -0.260560 | 0.073375   |
| 1456 | 0.266407  | -0.872563  |
| 1457 | -0.147810 | 0.309859   |
| 1458 | -0.080160 | -0.872563  |


In [None]:
print(std_scaler_data.max())

LotArea       20.518273
MSSubClass     3.147673
dtype: float64

In [None]:
print(std_scaler_data.min())

LotArea      -0.923729
MSSubClass   -0.872563
dtype: float64

In [None]:
# 3- Max abs scaler
from sklearn.preprocessing import MaxAbsScaler

max_abs_scaler = MaxAbsScaler()
max_abs_scaler_data = max_abs_scaler.fit_transform(data)
max_abs_scaler_data = pd.DataFrame(max_abs_scaler_data, columns = data.columns)
max_abs_scaler_data.head()

|   | LotArea  | MSSubClass |
|---|----------|------------|
| 0 | 0.039258 | 0.315789   |
| 1 | 0.044600 | 0.105263   |
| 2 | 0.052266 | 0.315789   |
| 3 | 0.044368 | 0.368421   |
| 4 | 0.066250 | 0.315789   |
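
`MaxAbsScaler` divides each feature by its largest absolute value, so the data is not shifted or centered. The sketch below checks the rows above by hand. It hard-codes the first five rows from `data.head()` and assumes the per-column maximum absolute values are 215245 for `LotArea` (the maximum computed earlier) and 190 for `MSSubClass` (the largest entry in its `unique()` output); both columns are non-negative, so the maximum and the maximum absolute value coincide. The names `head`, `max_abs`, and `scaled` are illustrative.

```python
import numpy as np

# First five rows of (LotArea, MSSubClass) from data.head()
head = np.array(
    [[8450, 60], [9600, 20], [11250, 60], [9550, 70], [14260, 60]],
    dtype=float,
)

# Assumed per-column maximum absolute values for the full dataset
max_abs = np.array([215245, 190], dtype=float)

# X_scaled = X / max(|X|), applied per column through broadcasting
scaled = head / max_abs

print(scaled.round(6))
```

Because nothing is subtracted, the smallest scaled value is not forced to 0, which explains why the column minimums of the scaled data stay above zero.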


In [None]:
print(max_abs_scaler_data.max())

LotArea       1.0
MSSubClass    1.0
dtype: float64

In [None]:
print(max_abs_scaler_data.min())

LotArea       0.006040
MSSubClass    0.105263
dtype: float64
