# CHAPTER 4: FEATURE ENGINEERING AND SELECTION
## Feature Scaling
In this notebook, I explore feature scaling in machine learning (ML). Feature scaling is a preprocessing step that standardizes or normalizes the range of the independent variables (features) in a dataset. It puts features on comparable scales so that, in scale-sensitive models, a feature with a large numeric range does not dominate the others.

Throughout this notebook, I will delve into various techniques and methodologies for feature scaling using Python. Leveraging libraries such as scikit-learn and pandas, I will demonstrate how to preprocess data by scaling features, handling outliers, and transforming variables to meet the assumptions of ML algorithms effectively.

#### *Jose Ruben Garcia Garcia*
#### *February 2024*
*Reference: Practical Machine Learning Python Problems Solver*


### Loading and Visualizing the Data

In [4]:
# Import the required libraries
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
import numpy as np
import pandas as pd
np.set_printoptions(suppress=True)  # print floats without scientific notation

In [5]:
views = pd.DataFrame([1295., 25., 19000., 5., 1., 300.], columns=['views'])
views

|   | views |
|---|-------|
| 0 | 1295.0 |
| 1 | 25.0 |
| 2 | 19000.0 |
| 3 | 5.0 |
| 4 | 1.0 |
| 5 | 300.0 |


### Standardized Scaling
#### Z-Score Scaling
This method standardizes each value in a feature column by subtracting the column mean and dividing by the column standard deviation, so that the scaled column has a mean of 0 and a variance of 1.

In [6]:
ss = StandardScaler()
views['zscore'] = ss.fit_transform(views[['views']])
views

|   | views | zscore |
|---|-------|--------|
| 0 | 1295.0 | -0.307214 |
| 1 | 25.0 | -0.489306 |
| 2 | 19000.0 | 2.231317 |
| 3 | 5.0 | -0.492173 |
| 4 | 1.0 | -0.492747 |
| 5 | 300.0 | -0.449877 |


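We can see that the values in the `zscore` column are standardized: they are centered around 0, and the outlier (19000 views) stands out at more than two standard deviations above the mean.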
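As a quick sanity check on the statement above, the sketch below refits the scaler on the same `views` data and confirms that the scaled column has a mean of about 0 and a standard deviation of about 1. The variable names (`z_mean`, `z_std`, `learned_mean`, `learned_std`) are my own; `mean_` and `scale_` are the attributes where a fitted `StandardScaler` stores the statistics it learned.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same toy data used throughout this notebook
views = pd.DataFrame([1295., 25., 19000., 5., 1., 300.], columns=['views'])

ss = StandardScaler()
zscore = ss.fit_transform(views[['views']]).ravel()

# After standardization the column should have mean ~0 and std ~1
z_mean = zscore.mean()
z_std = zscore.std()  # NumPy default is the population std (ddof=0)

# Statistics the scaler learned during fit
learned_mean = ss.mean_[0]
learned_std = ss.scale_[0]

print(z_mean, z_std)
print(learned_mean, learned_std)
```

Keeping the fitted scaler around matters in practice: the same `learned_mean` and `learned_std` should be reused (via `ss.transform`) on any new data, rather than refitting.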

In [8]:
# Compute the z-score manually for the first row

vw = np.array(views['views'])
(vw[0] - np.mean(vw)) / np.std(vw)  # np.std defaults to the population standard deviation (ddof=0)

-0.30721413311687235
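One detail worth flagging when reproducing the z-score by hand: `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also NumPy's default, whereas pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`). The sketch below, with variable names of my own choosing, computes the z-score for every row both ways and compares each result against scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

views = pd.DataFrame([1295., 25., 19000., 5., 1., 300.], columns=['views'])
col = views['views']

# Reference values from scikit-learn
sk_z = StandardScaler().fit_transform(views[['views']]).ravel()

# Population standard deviation (ddof=0): same convention as StandardScaler
z_pop = (col - col.mean()) / col.std(ddof=0)

# Sample standard deviation (pandas default, ddof=1): a different divisor
z_sample = (col - col.mean()) / col.std()

matches_pop = np.allclose(sk_z, z_pop)
matches_sample = np.allclose(sk_z, z_sample)
print(matches_pop, matches_sample)
```

With only six rows the two conventions differ noticeably, so a manual check done with pandas defaults will not match the `zscore` column.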

### Min-Max Scaling
With this method, we can rescale the feature so that every value falls within the range 0 to 1: the minimum maps to 0 and the maximum maps to 1.

In [10]:
# Scale with scikit-learn's MinMaxScaler
mms = MinMaxScaler()
views['minmax'] = mms.fit_transform(views[['views']])
views

|   | views | zscore | minmax |
|---|-------|--------|--------|
| 0 | 1295.0 | -0.307214 | 0.068109 |
| 1 | 25.0 | -0.489306 | 0.001263 |
| 2 | 19000.0 | 2.231317 | 1.0 |
| 3 | 5.0 | -0.492173 | 0.000211 |
| 4 | 1.0 | -0.492747 | 0.0 |
| 5 | 300.0 | -0.449877 | 0.015738 |


In [12]:
# Manual calculation for the first row: (x - min) / (max - min)
(vw[0] - np.min(vw)) / (np.max(vw) - np.min(vw))

0.06810884783409653

### Robust Scaling
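The disadvantage of the min-max scaler is that outliers strongly affect the scaled values, because the minimum and maximum define the range. Robust scaling instead uses statistics that are less sensitive to outliers: it subtracts the median and divides by the interquartile range (IQR), the difference between the 75th and 25th percentiles.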
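To make the outlier problem concrete, the sketch below applies min-max scaling to the same numbers with and without the 19000-view row. Dropping the row is purely for illustration (it is not something the notebook does elsewhere), and the variable names are my own.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

vw = np.array([1295., 25., 19000., 5., 1., 300.]).reshape(-1, 1)

# Illustration only: the same data without the 19000-view outlier
vw_no_outlier = vw[vw < 19000.].reshape(-1, 1)

with_outlier = MinMaxScaler().fit_transform(vw).ravel()
without_outlier = MinMaxScaler().fit_transform(vw_no_outlier).ravel()

# With the outlier, every other value is squeezed close to 0
print(with_outlier)
# Without it, the remaining values spread over the full [0, 1] range
print(without_outlier)
```

With the outlier present, the five ordinary values all land below 0.07, so most of the 0 to 1 range is spent on a single point.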

In [13]:
rs = RobustScaler()
views['robust'] = rs.fit_transform(views[['views']])
views

|   | views | zscore | minmax | robust |
|---|-------|--------|--------|--------|
| 0 | 1295.0 | -0.307214 | 0.068109 | 1.092883 |
| 1 | 25.0 | -0.489306 | 0.001263 | -0.13269 |
| 2 | 19000.0 | 2.231317 | 1.0 | 18.178528 |
| 3 | 5.0 | -0.492173 | 0.000211 | -0.15199 |
| 4 | 1.0 | -0.492747 | 0.0 | -0.15585 |
| 5 | 300.0 | -0.449877 | 0.015738 | 0.13269 |


In [14]:
# Manual calculation for the first row: (x - median) / IQR
quartiles = np.percentile(vw, (25., 75.))
iqr = quartiles[1] - quartiles[0]
(vw[0] - np.median(vw)) / iqr

1.0928829915560916
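The manual calculation extends to every row, and the fitted scaler exposes the statistics it used. The sketch below assumes the default `RobustScaler` settings (centering on the median, scaling by the 25th to 75th percentile range); `center_` and `scale_` are the fitted attributes, and the other names are my own.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

vw = np.array([1295., 25., 19000., 5., 1., 300.])

rs = RobustScaler()  # defaults: center on the median, scale by the IQR
sk = rs.fit_transform(vw.reshape(-1, 1)).ravel()

# Vectorized manual robust scaling: (x - median) / IQR
q25, q75 = np.percentile(vw, [25, 75])
manual = (vw - np.median(vw)) / (q75 - q25)

print(np.allclose(manual, sk))
print(rs.center_[0], rs.scale_[0])  # median and IQR learned during fit
```

For this data the median is 162.5 and the IQR is 1036.25, neither of which moves much if the 19000-view row grows even larger, which is why the non-outlier rows keep a usable spread.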