<center>
    <h1 id='scaling-standardization-and-normalization' style='color:#7159c1'>🔨 Scaling, Standardization and Normalization 🔨</h1>
    <i>Transforming Numerical Variables</i>
</center>

```
- Scaling
- Standardization
- Normalization
- Conclusions
```

---

<h1 id='0-scaling' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>0 | Scaling</h1>

It's used to change the RANGE of the data. With min-max scaling (`MinMaxScaler`), each feature's RANGE goes from 0 to 1 by default.
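
For reference, min-max scaling applies `x_scaled = (x - min) / (max - min)` to each column, where `min` and `max` are learned from the data passed to `fit`. A minimal sketch on a small made-up array (the values are hypothetical, not from the housing dataset) that computes the formula by hand and compares it with `MinMaxScaler`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two columns on very different ranges
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 1000.0]])

# Manual min-max scaling, column by column
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# Same transformation with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))  # True
```

Both columns now run from 0 to 1, even though the second one originally spanned hundreds of units.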

As for models, you'll need to scale the data when you're using methods that rely on how far apart data points are, or that are sensitive to the magnitude of the features, such as:
	
```
/ Gradient Descent Optimization
/ Support Vector Machines (SVM)
/ K-Nearest Neighbors (KNN) 
```
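
To get a feel for why distance-based models care, here is a rough sketch comparing KNN with and without min-max scaling. It uses scikit-learn's built-in wine dataset as a stand-in (an assumption for illustration, not the housing data used in this notebook), where one column (`proline`) has values far larger than most of the others. Exact scores depend on the library version, so none are quoted; as far as I can tell, the scaled pipeline should score clearly higher.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)

# KNN on the raw features: distances are dominated by large-valued columns
raw_knn = KNeighborsClassifier(n_neighbors=5)
raw_score = cross_val_score(raw_knn, X, y, cv=5).mean()

# KNN after min-max scaling; the pipeline re-fits the scaler on each training fold
scaled_knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_score = cross_val_score(scaled_knn, X, y, cv=5).mean()

print(f"raw:    {raw_score:.3f}")
print(f"scaled: {scaled_score:.3f}")
```

Putting the scaler inside `make_pipeline` mirrors the fit-on-train / transform-on-validation pattern: the scaler never sees the fold it is evaluated on.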

```python
# ---- Reading Dataset ----
import pandas as pd  # pip install pandas
from sklearn.model_selection import train_test_split  # pip install scikit-learn
from sklearn.preprocessing import MinMaxScaler

houses_df = pd.read_csv('./datasets/melb_data.csv')
houses_df = houses_df.select_dtypes(exclude=['object'])  # keep numeric columns only

# features: every numeric column from 'Price' onward; target: 'Rooms'
x_train_df, x_valid_df, y_train_df, y_valid_df = train_test_split(
    houses_df.loc[:, 'Price':]
    , houses_df.loc[:, 'Rooms']
    , train_size=0.70
    , test_size=0.30
)

# ---- Scaling ----
# fit on the training split only, then reuse the learned min/max on the validation split
min_max_scaler = MinMaxScaler()
scaled_x_train_df = min_max_scaler.fit_transform(x_train_df)
scaled_x_valid_df = min_max_scaler.transform(x_valid_df)

scaled_x_train_df
```

```
array([[0.15729627, 0.08316008, 0.05834186, ..., 0.52926281, 0.49078253,
        0.24690435],
       [0.05255843, 0.3014553 , 0.04298874, ..., 0.59203897, 0.36868869,
        0.04065231],
       [0.15552748, 0.28690229, 0.16888434, ..., 0.33322136, 0.57651802,
        0.50091117],
       ...,
       [0.10511687, 0.09147609, 0.03172979, ..., 0.5112917 , 0.4492287 ,
        0.15625438],
       [0.09854706, 0.23284823, 0.12998976, ..., 0.45870908, 0.61933562,
        0.24335311],
       [0.36955148, 0.06860707, 0.21084954, ..., 0.43312834, 0.47079296,
        0.1416289 ]])
```
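
Because the scaler is fitted on the training split only, validation values that fall outside the training minimum or maximum are mapped outside [0, 1]. A toy sketch with hypothetical one-column data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical one-feature training and validation data
train = np.array([[10.0], [20.0], [30.0]])
valid = np.array([[5.0], [25.0], [40.0]])

scaler = MinMaxScaler()
scaler.fit(train)                   # learns min=10 and max=30 from the training data only
valid_scaled = scaler.transform(valid)

# expected values: -0.25, 0.75 and 1.5
print(valid_scaled.ravel())
```

This is expected behavior rather than a bug. If strictly bounded output is required, recent scikit-learn versions offer `MinMaxScaler(clip=True)`.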

<h1 id='1-standardization' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>1 | Standardization</h1>

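It's similar to Scaling, but the range doesn't go from 0 to 1; it varies with the data. `StandardScaler` subtracts each feature's mean and divides by its standard deviation, so the result has mean 0 and standard deviation 1 on the training data. `RobustScaler` subtracts the median and divides by the interquartile range instead, which makes it less sensitive to outliers.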
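
For reference, standardization computes `z = (x - mean) / std` per column. A minimal sketch on a hypothetical one-column array, comparing the manual formula with `StandardScaler` (which uses the population standard deviation, `ddof=0`, the same as NumPy's default `std`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy column with mean 5 and (population) standard deviation 2
X = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

# Manual standardization: subtract the mean, divide by the standard deviation
mean = X.mean(axis=0)
std = X.std(axis=0)  # ddof=0
X_manual = (X - mean) / std

X_sklearn = StandardScaler().fit_transform(X)

print(X_manual.ravel())
print(np.allclose(X_manual, X_sklearn))   # True
print(X_manual.mean(), X_manual.std())    # approximately 0.0 and 1.0
```

Unlike min-max scaling, the output has no fixed bounds: here it runs from -1.5 to 2.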

As for models, you'll need to standardize the data when you're using methods that rely on how far apart data points are, or that are sensitive to the magnitude of the features, such as:

```
/ Gradient Descent Optimization
/ Support Vector Machines (SVM)
/ K-Nearest Neighbors (KNN)
```

```python
# ---- Robust Scaler ----
# centers each feature on its median and divides by its interquartile range (IQR)
from sklearn.preprocessing import RobustScaler

robust_scaler = RobustScaler()
robust_scaled_x_train_df = robust_scaler.fit_transform(x_train_df)
robust_scaled_x_valid_df = robust_scaler.transform(x_valid_df)

robust_scaled_x_train_df
```

```
array([[ 0.619585  , -0.79411765, -0.25961538, ...,  0.29404171,
        -0.19422401, -0.16730164],
       [-0.58755005,  0.75      , -0.40384615, ...,  0.77656405,
        -1.12983494, -0.8984595 ],
       [ 0.59919913,  0.64705882,  0.77884615, ..., -1.21281033,
         0.46277117,  0.7331456 ],
       ...,
       [ 0.01820167, -0.73529412, -0.50961538, ...,  0.15590864,
        -0.512653  , -0.4886533 ],
       [-0.05751729,  0.26470588,  0.41346154, ..., -0.24826216,
         0.79088446, -0.17989067],
       [ 3.06589006, -0.89705882,  1.17307692, ..., -0.4448858 ,
        -0.34740503, -0.54050025]])
```
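
With default settings, `RobustScaler` subtracts each column's median and divides by its interquartile range (75th minus 25th percentile). A toy sketch with a hypothetical column containing one extreme outlier, comparing the manual formula with `RobustScaler` and contrasting it with `StandardScaler`:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical column with one extreme outlier (1000)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

# Manual robust scaling: (x - median) / IQR
median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)
X_manual = (X - median) / (q3 - q1)

X_robust = RobustScaler().fit_transform(X)
X_standard = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_robust))  # True
# First five (ordinary) points under each scaler
print(X_robust[:5].ravel())
print(X_standard[:5].ravel())
```

With `RobustScaler`, the five ordinary points keep a spread from -1 to 0.6, whereas `StandardScaler` squeezes them into a narrow band around -0.45 because the outlier inflates the standard deviation.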

```python
# ---- Standard Scaler ----
# subtracts each feature's mean and divides by its standard deviation
from sklearn.preprocessing import StandardScaler

standard_scaler = StandardScaler()
standard_scaled_x_train_df = standard_scaler.fit_transform(x_train_df)
standard_scaled_x_valid_df = standard_scaler.transform(x_valid_df)

standard_scaled_x_train_df
```

```
array([[ 0.3873995 , -1.05932158, -0.53401387, ...,  0.46163746,
        -0.19196051, -0.43866295],
       [-0.90198338,  0.74442916, -0.69943005, ...,  1.07550397,
        -1.35938169, -1.44046555],
       [ 0.36562463,  0.62417911,  0.65698262, ..., -1.45538466,
         0.62781409,  0.79509304],
       ...,
       [-0.25495892, -0.99060726, -0.82073524, ...,  0.28590411,
        -0.58928459, -0.87896586],
       [-0.33583698,  0.17753607,  0.2379283 , ..., -0.22828332,
         1.03722188, -0.45591193],
       [ 3.0003829 , -1.17957163,  1.10912018, ..., -0.47842855,
        -0.3830942 , -0.95000442]])
```

<h1 id='2-normalization' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>2 | Normalization</h1>

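It's used to change the DISTRIBUTION of the data. In a nutshell, normalization here means reshaping the distribution of the data so that it gets closer to a Normal Distribution (Gaussian Distribution or Bell Curve). Be aware that the term is overloaded: scikit-learn's `Normalizer`, used in the cell below, rescales each row (sample) to unit length (L2 norm by default) and does not make a feature Gaussian.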
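
One scikit-learn tool built specifically for reshaping a feature toward a bell curve is `PowerTransformer` (a swapped-in example, not the `Normalizer` used further down). A minimal sketch on synthetic, strongly right-skewed (log-normal) data, using the Box-Cox method (which requires strictly positive values) and measuring skewness with `scipy.stats.skew`:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Synthetic, strongly right-skewed feature (hypothetical, for illustration only)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Box-Cox finds a power transform that makes the column as Gaussian as possible,
# then standardizes it (standardize=True is the default)
pt = PowerTransformer(method='box-cox', standardize=True)
X_gauss = pt.fit_transform(X)

print(f"skew before: {skew(X.ravel()):.2f}")
print(f"skew after:  {skew(X_gauss.ravel()):.2f}")
```

A skewness near 0 after the transform suggests a roughly symmetric, bell-shaped column. `QuantileTransformer(output_distribution='normal')` is another option for the same goal.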

As for models, you'll need to normalize the data when using models that assume normally distributed features, such as:

```
/ Linear Discriminant Analysis (LDA)
/ Gaussian Naive Bayes
```

Tip: any method with "Gaussian" in its name probably expects (approximately) normally distributed data.

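```python
# ---- Normalization ----
from sklearn.preprocessing import Normalizer

# Normalizer does not accept missing values, so drop incomplete rows
# and drop the same rows from the targets to keep X and y aligned
train_mask = x_train_df.notna().all(axis=1)
valid_mask = x_valid_df.notna().all(axis=1)
x_train_df, y_train_df = x_train_df[train_mask], y_train_df[train_mask]
x_valid_df, y_valid_df = x_valid_df[valid_mask], y_valid_df[valid_mask]

# rescales each row (sample) to unit L2 norm; fit() learns nothing from the data
normalizer = Normalizer()
normalized_x_train_df = normalizer.fit_transform(x_train_df)
normalized_x_valid_df = normalizer.transform(x_valid_df)

normalized_x_train_df
```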

```
array([[ 9.99987672e-01,  3.00748172e-06,  2.29846790e-03, ...,
        -2.84003190e-05,  1.09002438e-04,  4.16009909e-03],
       [ 9.99970967e-01,  2.89412755e-05,  6.07168001e-03, ...,
        -7.52958179e-05,  2.89119949e-04,  2.23346809e-03],
       [ 9.99990810e-01,  7.88855859e-06,  3.49880775e-03, ...,
        -4.39141208e-05,  1.68075184e-04,  9.30385881e-04],
       ...,
       [ 9.99984701e-01,  4.79818177e-06,  3.30529294e-03, ...,
        -4.12063489e-05,  1.58049490e-04,  3.91815161e-03],
       [ 9.99970921e-01,  1.29476004e-05,  3.61492378e-03, ...,
        -4.37299422e-05,  1.67744486e-04,  6.30848707e-03],
       [ 9.99998638e-01,  1.09634402e-06,  1.06511483e-03, ...,
        -1.25738367e-05,  4.81577750e-05,  1.08969951e-03]])
```
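
`Normalizer` divides each row by its own length (L2 norm by default). That also explains the output above: the first column, `Price`, is so much larger than the other raw features that it accounts for almost all of each row's length, so it comes out close to 1. A toy sketch with hypothetical two-feature rows:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical rows: a balanced row, and a row where one feature is huge
X = np.array([[3.0, 4.0],
              [600_000.0, 3.0]])

# Manual L2 normalization: divide each row by its Euclidean length
X_manual = X / np.linalg.norm(X, axis=1, keepdims=True)

X_sklearn = Normalizer(norm='l2').fit_transform(X)

print(np.allclose(X_manual, X_sklearn))   # True
print(np.linalg.norm(X_sklearn, axis=1))  # each row has length 1 (up to rounding)
print(X_sklearn[0])                       # approximately [0.6, 0.8]
print(X_sklearn[1])                       # first value is almost exactly 1
```

If that domination is unwanted, one option is to scale the columns first (for example with `StandardScaler`) and normalize the rows afterwards.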

<h1 id='3-conclusions' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>3 | Conclusions</h1>

> **Explanation Scale/Standardization**

It's like converting Brazilian reais to dollars, where 1 dollar was worth about 5 reais at the time of writing. If we don't scale, the model treats 1 dollar as equal to 1 real, and that's not true.

Another example is height and weight, which can be recorded in different units: 1 inch equals 2.54 cm and 1 pound is about 0.45 kg, so the data has to be brought to a common scale before the model compares it.
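
One way to see how standardization handles the unit problem: once a column is standardized, it no longer matters whether it was measured in inches or centimetres, because multiplying a column by a positive constant leaves its z-scores unchanged. A minimal sketch with hypothetical heights:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical heights of three people, in inches
heights_in = np.array([[60.0], [66.0], [72.0]])
# The same heights converted to centimetres (1 inch = 2.54 cm)
heights_cm = heights_in * 2.54

z_in = StandardScaler().fit_transform(heights_in)
z_cm = StandardScaler().fit_transform(heights_cm)

print(z_in.ravel())
print(np.allclose(z_in, z_cm))  # True: the unit no longer matters
```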

---

> **Another Explanation Just to Get the Feeling**

Scaling, standardization and normalization keep the model from treating some features as more important than others just because of their scale, such as treating salary (from 40,000 to 210,000) as more important than age (from 18 to 100).
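
A small numeric illustration of the salary/age example, using Euclidean distance the way a KNN-style model would. The three people are hypothetical, and the scaler is fitted on the ranges quoted above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical people: [salary, age]
A = np.array([50_000.0, 25.0])
B = np.array([51_000.0, 60.0])   # similar salary, very different age
C = np.array([60_000.0, 26.0])   # different salary, similar age

# Raw features: salary differences swamp age differences, so A looks closer to B
print(np.linalg.norm(A - B), np.linalg.norm(A - C))

# Fit the scaler on the quoted ranges: salary 40,000-210,000, age 18-100
scaler = MinMaxScaler().fit([[40_000.0, 18.0], [210_000.0, 100.0]])
A_s, B_s, C_s = scaler.transform(np.vstack([A, B, C]))

# Scaled features: both columns live in [0, 1], so age counts as well and A is closer to C
print(np.linalg.norm(A_s - B_s), np.linalg.norm(A_s - C_s))
```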

---

<h1 id='reach-me' style='color:#7159c1; border-bottom:3px solid #7159c1; letter-spacing:2px; font-family:JetBrains Mono; font-weight: bold; text-align:left; font-size:240%;padding:0'>📫 | Reach Me</h1>

> **Email** - [csfelix08@gmail.com](mailto:csfelix08@gmail.com)

> **Linkedin** - [linkedin.com/in/csfelix/](https://www.linkedin.com/in/csfelix/)

> **GitHub** - [CSFelix](https://github.com/CSFelix)

> **Kaggle** - [DSFelix](https://www.kaggle.com/dsfelix)

> **Portfolio** - [CSFelix.io](https://csfelix.github.io/).