# <p style='text-align: center;'> Feature Scaling </p>

## Feature Scaling :
- Feature Scaling refers to the methods or techniques used to normalize the range of independent variables in our data, or in 
  other words, the methods to set the feature value range within a similar scale.
  
  
- Variables with bigger magnitude / larger value range dominate over those with smaller magnitude / value range.


- Scale of the features is an important consideration when building machine learning models.


- Feature scaling is generally the last step in the data preprocessing pipeline, performed just before training the machine 
  learning algorithms.
  
  
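- The scaling methods covered here are linear transformations, so they preserve the shape of the original distribution.


- After standardization, the minimum and maximum values of different variables may still differ.


- Outliers are preserved: extreme values remain extreme after scaling.


- Feature scaling matters for some machine learning algorithms, particularly those sensitive to feature magnitude, such as 
  distance-based methods.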
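
To make the dominance of large-magnitude variables concrete, the following minimal sketch compares Euclidean distances before and after standardization. The feature names (age, income) and all values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data: column 0 is age in years, column 1 is income.
X = np.array([
    [25.0, 50_000.0],   # A
    [60.0, 51_000.0],   # B: very different age, similar income
    [26.0, 58_000.0],   # C: similar age, somewhat different income
    [40.0, 90_000.0],   # D
])

def euclidean(a, b):
    """Euclidean distance between two 1-D points."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Raw distances from A: the income column dominates because of its range.
raw_to_second = euclidean(X[0], X[1])
raw_to_third = euclidean(X[0], X[2])

# Standardize each column (population standard deviation, ddof=0).
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# After scaling, the large age gap between A and B is no longer hidden.
scaled_to_second = euclidean(X_scaled[0], X_scaled[1])
scaled_to_third = euclidean(X_scaled[0], X_scaled[2])

print("raw:   ", raw_to_second, raw_to_third)
print("scaled:", scaled_to_second, scaled_to_third)
```

On the raw data, C looks farther from A than B does, purely because income is measured in larger numbers. After scaling, the ordering reverses and the 35-year age gap counts.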


## Various Feature Scaling Techniques :
- Standardization.
- Normalization.

<b> Here we create our own dataset with a single column "x" in a DataFrame named "df". </b>

In [1]:
# Importing the necessary libraries.
import pandas as pd

# Create DataFrame.
df = pd.DataFrame({"x":[1,2,3,4,5]})

# print the DataFrame.
df

Unnamed: 0,x
0,1
1,2
2,3
3,4
4,5


### Standard Scaling / Standardization :
- Standardization involves centering the variable's mean at zero (0) and scaling its variance to one (1).


- Standardize features by removing the mean and scaling to unit variance.


- The standard score of a sample x is calculated as :

        z = (x - u) / s
        
        
- where,
        u is the mean of the training samples or zero if with_mean=False.
        s is the standard deviation of the training samples or one if with_std=False.
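
The effect of the with_mean and with_std options mentioned above can be sketched as follows. This is a minimal illustration on the same five values used in this notebook, not a recommendation on which settings to use.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Default: subtract the mean (u = 3) and divide by the standard deviation s.
z_default = StandardScaler().fit_transform(X).ravel()

# with_mean=False: u is treated as zero, so values are only divided by s.
z_no_center = StandardScaler(with_mean=False).fit_transform(X).ravel()

# with_std=False: s is treated as one, so values are only centered.
z_no_scale = StandardScaler(with_std=False).fit_transform(X).ravel()

print(z_default)
print(z_no_center)
print(z_no_scale)
```

With with_std=False the output is simply x minus the mean, and with with_mean=False it is x divided by the standard deviation without any shift.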
        
        
<b> Standardization : </b>
- Centers the mean at zero (0).
- Scales the variance to one (1).
    

<b> Standardization using StandardScaler in sklearn. </b>

In [2]:
# Write a Python program to demonstrate standardization of the "x" variable with StandardScaler.

# import StandardScaler from sklearn.preprocessing.
from sklearn.preprocessing import StandardScaler

# Create an instance of StandardScaler() and store it in "sc" object.
sc = StandardScaler()

# Apply the fit_transform() method and store the result in a new column called "X-sc_sk".
df["X-sc_sk"] = sc.fit_transform(df[["x"]])

# Print the DataFrame.
df


Unnamed: 0,x,X-sc_sk
0,1,-1.414214
1,2,-0.707107
2,3,0.0
3,4,0.707107
4,5,1.414214


- In the above example, we created an instance of StandardScaler() and stored it in the "sc" object, then applied 
  fit_transform(). The fit step calculates the mean and variance of every feature in the data, and the transform step 
  rescales each feature using its mean and variance. The result is stored in a new column called "X-sc_sk" and 
  printed.
  
  
- Here the mean of "x" (3) is mapped to 0. The value x = 4, one unit above the mean, is mapped to 0.707, meaning it lies 
  0.707 standard deviations above the mean. The value x = 5, two units above the mean, is mapped to 1.414, meaning it 
  lies 1.414 standard deviations above the mean.
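
The split between fit and transform described above can be sketched by calling the two steps separately. The unseen value 6 is a hypothetical new data point, added only to show that the statistics learned during fit are reused.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_new = np.array([[6.0]])  # hypothetical unseen value

sc = StandardScaler()
sc.fit(X_train)  # learns the per-feature mean and variance

learned_mean = sc.mean_[0]   # 3.0
learned_var = sc.var_[0]     # 2.0
learned_std = sc.scale_[0]   # square root of the variance, approx. 1.4142

# transform() reuses the learned statistics; it does not recompute them.
train_scaled = sc.transform(X_train).ravel()
new_scaled = sc.transform(X_new).ravel()  # (6 - 3) / 1.4142, approx. 2.1213

print(learned_mean, learned_var, learned_std)
print(train_scaled)
print(new_scaled)
```

Fitting on the training data and only transforming new data keeps both on the same scale.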

<b> Standardization using pandas. </b>

In [3]:
# Write a Python program to demonstrate standardization of the "x" variable with pandas.

# import pandas as pd
import pandas as pd

# calculate the standardization and the same is stored in new column called "X_sc_p".
df["X_sc_p"] = (df["x"] - df["x"].mean())/df["x"].std(ddof=0)

# print the result.
df

Unnamed: 0,x,X-sc_sk,X_sc_p
0,1,-1.414214,-1.414214
1,2,-0.707107,-0.707107
2,3,0.0,0.0
3,4,0.707107,0.707107
4,5,1.414214,1.414214


- In the above example, we have calculated Standardization by pandas.


- The standardization results from sklearn (StandardScaler) and from pandas are the same.


- The above example calculates standardization with pandas as follows :

            df["x"] - df["x"].mean()
            ------------------------
              df["x"].std(ddof=0)
                
                
- Where, 
        df["x"] : each data point of the "x" variable.
        df["x"].mean() : mean of the "x" variable.
        df["x"].std(ddof=0) : standard deviation of the "x" variable.
        ddof=0 normalizes by N instead of the pandas default N-1 when calculating the standard deviation.
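
To see why ddof matters here, the following minimal sketch compares the pandas default (ddof=1) with ddof=0 on the same five values. Only ddof=0 reproduces the StandardScaler output shown earlier.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

std_sample = x.std()            # pandas default, ddof=1: divides by N-1
std_population = x.std(ddof=0)  # divides by N, as StandardScaler does

z_sample = (x - x.mean()) / std_sample
z_population = (x - x.mean()) / std_population

print(std_sample, std_population)   # approx. 1.5811 and 1.4142
print(z_sample.round(6).tolist())
print(z_population.round(6).tolist())
```

With the default ddof=1 the first value becomes roughly -1.2649 instead of -1.4142, so omitting ddof=0 would make the pandas column disagree with the sklearn column.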
        



### Min Max Scaling :
Min-max scaling is similar to z-score normalization in that it replaces every value in a column with a new value using a 
  formula. In this case, the formula is :
  
                      (x - xmin)
         x_scaled = -------------
                    (xmax - xmin)
             
             
Where:

   - x_scaled is our new value
   - x is the original cell value
   - xmin is the minimum value of the column
   - xmax is the maximum value of the column
      
      
Using this formula, we will see that the values of each column will now be between zero and one.
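
As a small illustration of the formula applied column by column, the sketch below scales two columns with very different ranges. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data with two very different value ranges.
data = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "salary": [20_000.0, 35_000.0, 50_000.0, 80_000.0],
})

col_min = data.min()   # per-column minimum (xmin)
col_max = data.max()   # per-column maximum (xmax)

# Apply x_scaled = (x - xmin) / (xmax - xmin) to every column at once.
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
```

Each column ends up with a minimum of 0 and a maximum of 1, regardless of its original units. A column whose values are all identical would produce a division by zero, which this sketch does not handle.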

<b> Here we create our own dataset with a single column "x" in a DataFrame named "df1". </b>

In [6]:
# Importing the necessary libraries.
import pandas as pd

# Create DataFrame.
df1 = pd.DataFrame({"x":[1,2,3,4,5]})

# print the DataFrame.
df1

Unnamed: 0,x
0,1
1,2
2,3
3,4
4,5


<b> Min Max Scaling using MinMaxScaler in sklearn. </b>

In [8]:
# Write a Python program to demonstrate Min Max Scaling of the "x" variable with MinMaxScaler.

# import MinMaxScaler from sklearn.preprocessing.
from sklearn.preprocessing import MinMaxScaler

# Create an instance of MinMaxScaler() and store it in "min_max" object.
min_max = MinMaxScaler()

# Apply the fit_transform() method and store the result in a new column called "X-mm_sk".
df1["X-mm_sk"] = min_max.fit_transform(df1[["x"]])

# Print the DataFrame.
df1


Unnamed: 0,x,X-mm_sk
0,1,0.0
1,2,0.25
2,3,0.5
3,4,0.75
4,5,1.0


- In the above example, we created an instance of MinMaxScaler() and stored it in the "min_max" object, then applied fit_transform(), which computes the minimum and maximum to be used for scaling and then transforms the data. The result is stored in a new column called "X-mm_sk" and printed.



- Here we can see that the scaler is defined, fit on the whole dataset and then used to create a transformed version of the dataset with each column normalized independently. We can see that the largest raw value for each column now has the value 1.0 and the smallest value for each column now has the value 0.0.
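
The minimum and maximum learned during fitting can be inspected directly, and they are reused when transforming other data. In this minimal sketch the values 0 and 6 are hypothetical new data points that fall outside the fitted range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_new = np.array([[0.0], [3.0], [6.0]])  # hypothetical unseen values

min_max = MinMaxScaler()
min_max.fit(X_train)  # learns the per-feature minimum and maximum

learned_min = min_max.data_min_[0]   # 1.0
learned_max = min_max.data_max_[0]   # 5.0

# New values are scaled with the learned min and max, not their own.
new_scaled = min_max.transform(X_new).ravel()

print(learned_min, learned_max)
print(new_scaled)
```

With the default settings, values outside the range seen during fitting land outside 0 to 1, so the guarantee of a 0 to 1 range applies only to the data the scaler was fitted on.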

<b> Min Max Scaling using pandas. </b>

In [9]:
# Write a Python program to demonstrate Min Max Scaling of the "x" variable with pandas.

# import pandas as pd
import pandas as pd

# Calculate the Min Max Scaling and store the result in a new column called "X_mm_p".
df1["X_mm_p"] = (df1["x"] - df1["x"].min())/(df1["x"].max() - df1["x"].min())

# print the result.
df1

Unnamed: 0,x,X-mm_sk,X_mm_p
0,1,0.0,0.0
1,2,0.25,0.25
2,3,0.5,0.5
3,4,0.75,0.75
4,5,1.0,1.0


- In the above example, we have calculated Min Max Scaling by pandas.


- The Min Max Scaling results from sklearn (MinMaxScaler) and from pandas are the same.


- The above example calculates Min Max Scaling with pandas as follows :
    
    
                      (df1["x"] - df1["x"].min())
    df1["X_mm_p"] = ---------------------------------
                    (df1["x"].max() - df1["x"].min())
        
        
- Where, 
        df1["x"] : each data point of the "x" variable.
        df1["x"].min() : minimum value of the "x" variable.
        df1["x"].max()  : maximum value of the "x" variable.
        
        
- Here we can see that the largest raw value for each column now has the value 1.0 and the smallest value for each column now 
  has the value 0.0.
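
The earlier point that scaling preserves outliers can be checked with the same pandas formula. This is a minimal sketch on a hypothetical column in which 100 is an extreme value.

```python
import pandas as pd

# Hypothetical column with one extreme value.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Same min-max formula as above.
s_mm = (s - s.min()) / (s.max() - s.min())

print(s_mm.round(4).tolist())
```

The outlier is mapped to 1.0 and the remaining values are squeezed into a narrow band near 0, so the outlier is still an outlier after scaling.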
  