#### Practising Feature Transformation and Scaling with an Example


In [102]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


df = pd.DataFrame({
    'Income': [250000, 18000, 12000, 1000000],
    'Age': [25, 30, 51, 42],
    'Department': ['HR','Insurance','Marketing','Finance'],
    'Balance' : [100.0, -263.0, 2000.0, -5.0]
})

In [103]:
df

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 250000 | 25 | HR | 100.0 |
| 1 | 18000 | 30 | Insurance | -263.0 |
| 2 | 12000 | 51 | Marketing | 2000.0 |
| 3 | 1000000 | 42 | Finance | -5.0 |


### Step 1:
 

Why do we need feature transformation and scaling?
Datasets often contain columns measured in different units: one column might be in kilograms while another is in centimeters. Even without units, ranges can differ enormously: an Income column can range from 20,000 to 100,000 or more, while an Age column ranges from 0 to about 100. Income values are therefore roughly 1,000 times larger than Age values, so algorithms that rely on distances or gradients (for example k-nearest neighbours or gradient descent) would be dominated by Income unless the features are brought to a comparable scale.


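Before applying any feature transformation or scaling technique, we need to set aside the categorical column (Department), because non-numeric values cannot be scaled. In this example only Income and Age are scaled (`col_names` below); Department and Balance are left unchanged in `df_scaled`.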
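To make the scale problem concrete, the sketch below computes the Euclidean distance between rows 1 and 2 of the example DataFrame using only Income and Age. It is a minimal illustration (the variable names are hypothetical, not from the original notebook): the 6,000 income gap contributes almost all of the squared distance, while the 21-year age gap contributes roughly 0.001%.

```python
import numpy as np

# Income and Age for rows 1 and 2 of the example DataFrame
a = np.array([18000.0, 30.0])
b = np.array([12000.0, 51.0])

# Euclidean distance on the raw, unscaled values
raw_distance = np.linalg.norm(a - b)

# Share of the squared distance contributed by each feature
diff_sq = (a - b) ** 2
income_share = diff_sq[0] / diff_sq.sum()
age_share = diff_sq[1] / diff_sq.sum()

print(f"raw distance: {raw_distance:.2f}")
print(f"income share: {income_share:.6f}, age share: {age_share:.6f}")
```

A distance-based model would therefore treat these two employees as differing almost entirely in income, which is what scaling is meant to prevent.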


In [104]:
df_scaled = df.copy()
col_names = ['Income', 'Age']
features = df_scaled[col_names]

### MinMax Scaler

- Scales each feature to the range [0, 1] by default. A different range can be set with the `feature_range` parameter.

- MinMaxScaler scales every feature separately.

- This transformation is often used as an alternative to zero-mean, unit-variance scaling.

- MinMaxScaler doesn't reduce the effect of outliers; it linearly scales the data into a fixed range, where the largest data point maps to the maximum of the range and the smallest data point maps to the minimum.

#### Transformation formula 

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X_scaled = X_std * (max - min) + min

where  min, max = feature_range.
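The two formulas above can be checked directly against scikit-learn. This is a minimal sketch: `min_max_scale` is a hypothetical helper written for illustration (it uses `lo`/`hi` instead of `min`/`max` to avoid shadowing Python built-ins), and it does not handle constant columns, where the denominator would be zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Income and Age columns from the example DataFrame
X = np.array([[250000, 25],
              [18000, 30],
              [12000, 51],
              [1000000, 42]], dtype=float)

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Hand-rolled MinMax formula, applied column by column."""
    lo, hi = feature_range
    X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    return X_std * (hi - lo) + lo

manual = min_max_scale(X, feature_range=(5, 10))
sk = MinMaxScaler(feature_range=(5, 10)).fit_transform(X)

print(np.round(manual, 6))
print(np.allclose(manual, sk))  # the formula and MinMaxScaler agree
```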

In [105]:
# import the library

from sklearn.preprocessing import MinMaxScaler

In [106]:
scaler = MinMaxScaler()
df_scaled[col_names] = scaler.fit_transform(features.values)

In [107]:
df_scaled  # in each scaled column, the maximum value becomes 1 and the minimum becomes 0

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 0.240891 | 0.0 | HR | 100.0 |
| 1 | 0.006073 | 0.192308 | Insurance | -263.0 |
| 2 | 0.0 | 1.0 | Marketing | 2000.0 |
| 3 | 1.0 | 0.653846 | Finance | -5.0 |


In [108]:
# Specifying a custom output range
scaler = MinMaxScaler(feature_range=(5, 10))

df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 6.204453 | 5.0 | HR | 100.0 |
| 1 | 5.030364 | 5.961538 | Insurance | -263.0 |
| 2 | 5.0 | 10.0 | Marketing | 2000.0 |
| 3 | 10.0 | 8.269231 | Finance | -5.0 |


### Standard Scaler

- StandardScaler is sensitive to outliers, and the features may scale differently from each other in the presence of outliers.

- Standardize features by removing the mean and scaling to unit variance.

    The standard score of a sample x is calculated as:

           z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
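The standard score formula can be reproduced by hand. The minimal sketch below assumes only the Income and Age columns of the example; the key detail is that `s` is the population standard deviation (`ddof=0`), which is what `StandardScaler` stores in `scale_`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Income and Age columns from the example DataFrame
X = np.array([[250000, 25], [18000, 30], [12000, 51], [1000000, 42]], dtype=float)

u = X.mean(axis=0)           # per-column mean
s = X.std(axis=0, ddof=0)    # population standard deviation
z_manual = (X - u) / s

scaler = StandardScaler().fit(X)
print(np.allclose(z_manual, scaler.transform(X)))
print(scaler.mean_, scaler.scale_)  # the fitted u and s
```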

In [109]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

In [110]:
df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | -0.1732 | -1.179536 | HR | 100.0 |
| 1 | -0.747236 | -0.688062 | Insurance | -263.0 |
| 2 | -0.762082 | 1.376125 | Marketing | 2000.0 |
| 3 | 1.682519 | 0.491473 | Finance | -5.0 |


In [111]:
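df_scaled.describe()  # the Income and Age means are 0 up to floating-point rounding (e.g. 5.55e-17)
# The std shows 1.154701 rather than 1 because describe() uses the sample std (ddof=1), while StandardScaler uses the population std (ddof=0); for 4 rows the ratio is sqrt(4/3) ≈ 1.1547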

|   | Income | Age | Balance |
|---|---|---|---|
| count | 4.0 | 4.0 | 4.0 |
| mean | 0.0 | 5.551115e-17 | 458.0 |
| std | 1.154701 | 1.154701 | 1039.252616 |
| min | -0.762082 | -1.179536 | -263.0 |
| 25% | -0.750948 | -0.8109308 | -69.5 |
| 50% | -0.460218 | -0.09829464 | 47.5 |
| 75% | 0.290729 | 0.7126361 | 575.0 |
| max | 1.682519 | 1.376125 | 2000.0 |
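The 1.154701 in the `std` row comes from the degrees-of-freedom convention, not from floating-point error. A minimal sketch on the same Income and Age values (the DataFrame name `df_z` is hypothetical) shows both conventions side by side:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

X = np.array([[250000, 25], [18000, 30], [12000, 51], [1000000, 42]], dtype=float)
df_z = pd.DataFrame(StandardScaler().fit_transform(X), columns=["Income", "Age"])

print(df_z.std(ddof=0))  # population std, the convention StandardScaler uses
print(df_z.std())        # pandas default ddof=1, which describe() reports
print(np.sqrt(4 / 3))    # the n / (n - 1) correction for n = 4 rows
```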


### MaxAbsScaler

- Scale each feature by its maximum absolute value.

- This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0.

- It does not shift/center the data, and thus does not destroy any sparsity.

- This scaler can also be applied to sparse CSR or CSC matrices.

- MaxAbsScaler doesn't reduce the effect of outliers; it only scales the data down linearly.
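A minimal sketch of what MaxAbsScaler computes, assuming the example's Income, Age and Balance columns: every column is divided by its largest absolute value, with no shift, so signs are kept and Balance lands in [-1, 1]. The second part uses a small made-up sparse matrix (not from the original) to show that zeros stay zero, so a CSR input remains sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Income, Age and Balance from the example; Balance contains negative values
X = np.array([[250000, 25, 100.0],
              [18000, 30, -263.0],
              [12000, 51, 2000.0],
              [1000000, 42, -5.0]])

max_abs = np.abs(X).max(axis=0)  # largest absolute value per column
manual = X / max_abs             # no centering, so signs and zeros are preserved

scaler = MaxAbsScaler()
print(np.allclose(manual, scaler.fit_transform(X)))
print(manual[:, 2])              # Balance scaled into [-1, 1]

# A small illustrative CSR matrix: the scaled result is still sparse
X_sparse = sparse.csr_matrix(np.array([[0.0, 4.0], [2.0, 0.0], [0.0, -8.0]]))
scaled_sparse = MaxAbsScaler().fit_transform(X_sparse)
print(sparse.issparse(scaled_sparse), scaled_sparse.nnz)
```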

In [112]:
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler()



In [113]:
df_scaled[col_names] = scaler.fit_transform(features.values)


In [114]:
df_scaled

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 0.25 | 0.490196 | HR | 100.0 |
| 1 | 0.018 | 0.588235 | Insurance | -263.0 |
| 2 | 0.012 | 1.0 | Marketing | 2000.0 |
| 3 | 1.0 | 0.823529 | Finance | -5.0 |


In [93]:
df["Income"].max(), df["Age"].max(), df['Balance'].max()
# Check the column maxima before scaling: MaxAbsScaler divides each column by its maximum absolute value (here equal to the maximum, since Income and Age are positive)

(1000000, 51, 2000.0)

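1. Each value in the Income column is divided by 1000000.
2. Each value in the Age column is divided by 51.
3. Balance is not in `col_names`, so it was left unscaled. Had it been included, each value would be divided by its maximum absolute value, 2000 (since |-263| < 2000).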

### RobustScaler

- Scale features using statistics that are robust to outliers.

- The scaler removes the median and scales the data according to a quantile range (by default the IQR, the interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).

- Formula

        x_scaled = (x - median) / (Q3 - Q1)
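A minimal sketch that checks the formula against `RobustScaler` on the example's Income and Age columns. It assumes NumPy's default linear interpolation for percentiles, which is what produces the values in the output table below (for Income: median 134000, Q1 16500, Q3 437500).

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Income and Age columns from the example DataFrame
X = np.array([[250000, 25], [18000, 30], [12000, 51], [1000000, 42]], dtype=float)

median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)  # linear interpolation (NumPy default)
manual = (X - median) / (q3 - q1)

scaler = RobustScaler()  # defaults: with_centering=True, quantile_range=(25.0, 75.0)
print(np.allclose(manual, scaler.fit_transform(X)))
print(scaler.center_, scaler.scale_)  # fitted median and IQR per column
```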

In [115]:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()

In [116]:
df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 0.275534 | -0.709677 | HR | 100.0 |
| 1 | -0.275534 | -0.387097 | Insurance | -263.0 |
| 2 | -0.289786 | 0.967742 | Marketing | 2000.0 |
| 3 | 2.057007 | 0.387097 | Finance | -5.0 |


### Quantile Transformer Scaler

- Transforms features using quantile information.

- This method transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. 

- It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme.

- The transformation is applied on each feature independently. First an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution. 

- The obtained values are then mapped to the desired output distribution using the associated quantile function. Feature values of new/unseen data that fall below or above the fitted range will be mapped to the bounds of the output distribution. Note that this transform is non-linear.

- It may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.

- Because the mapping is non-linear, it suits features whose distributions are strongly skewed or otherwise far from Gaussian.
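A minimal sketch comparing the two output distributions on the example's Income column. With 4 distinct values, the uniform output is simply each value's rank divided by 3; with `output_distribution="normal"` the same ranks are passed through the standard normal quantile function, and the two extremes are clipped to large but finite values. `n_quantiles=4` is set explicitly because the default (1000) exceeds the number of rows, and scikit-learn would otherwise warn and lower it.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

income = np.array([[250000], [18000], [12000], [1000000]], dtype=float)

uniform = QuantileTransformer(n_quantiles=4).fit_transform(income)
normal = QuantileTransformer(n_quantiles=4, output_distribution="normal").fit_transform(income)

print(uniform.ravel())  # ranks spread evenly over [0, 1]
print(normal.ravel())   # same ordering, mapped to a standard normal scale
```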

In [117]:
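from sklearn.preprocessing import QuantileTransformer
# n_quantiles (default 1000) cannot exceed the number of rows; with only 4 rows
# scikit-learn would warn and lower it to 4, so set it explicitly
scaler = QuantileTransformer(n_quantiles=4)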

df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled



|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 0.666667 | 0.0 | HR | 100.0 |
| 1 | 0.333333 | 0.333333 | Insurance | -263.0 |
| 2 | 0.0 | 1.0 | Marketing | 2000.0 |
| 3 | 1.0 | 0.666667 | Finance | -5.0 |


- The effects of both the RobustScaler and the QuantileTransformer are easier to see on a larger dataset than on one with only 4 rows.

### Log Transform

- Used to convert a skewed distribution into a normal or less-skewed distribution.

- Takes the log of every value in a column; the log values (stored here as a separate column) are then fed to the machine learning algorithm instead of the raw values.

- Log transforms work especially well on right-skewed data, often bringing it close to a normal distribution. They are only defined for strictly positive values, so a column such as Balance (which contains negative values) cannot be log-transformed directly.
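Four rows are too few to see the effect on skewness, so this minimal sketch uses synthetic, log-normally distributed "income-like" data (an assumption for illustration only, not data from the original notebook). The raw values have a long right tail; after `np.log` the sample skewness is close to 0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed data: log-normal, so its log is normally distributed
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000))

log_income = np.log(income)  # requires strictly positive values

print(f"skew before: {income.skew():.2f}")
print(f"skew after log: {log_income.skew():.2f}")
```

If a column can contain zeros, `np.log1p` (log of 1 + x) is a common alternative; neither function handles negative values.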

In [118]:
df['log_income'] = np.log(df['Income'])

In [119]:
df  # a new column stores the log values of the Income column

|   | Income | Age | Department | Balance | log_income |
|---|---|---|---|---|---|
| 0 | 250000 | 25 | HR | 100.0 | 12.429216 |
| 1 | 18000 | 30 | Insurance | -263.0 | 9.798127 |
| 2 | 12000 | 51 | Marketing | 2000.0 | 9.392662 |
| 3 | 1000000 | 42 | Finance | -5.0 | 13.815511 |


### Power Transformer Scaler

- Applies a power transformation to each feature to make its distribution more Gaussian (normal).

- Each observation is raised to a power to transform the data.

- There are two types of power transformation techniques:

        a. Box-Cox Transform
        b. Yeo-Johnson Transform

- `PowerTransformer` automates the choice of power by estimating a parameter called lambda (λ) for each feature via maximum likelihood, so the power does not have to be picked by hand.

##### Box - Cox transform 

- This technique transforms the data by raising each observation to a power.

- The power is denoted by lambda (λ).

- The transform has two cases, depending on whether λ is zero:

        y = (x^λ - 1) / λ    if λ ≠ 0
        y = log(x)           if λ = 0

- Box-Cox requires strictly positive input values.
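The two cases can be checked against scikit-learn. This minimal sketch fits `PowerTransformer(method="box-cox", standardize=False)` so the raw Box-Cox values (without the default standardization) are returned, then recomputes them with a hand-written helper, `box_cox`, which is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Income and Age columns from the example (both strictly positive)
X = np.array([[250000, 25], [18000, 30], [12000, 51], [1000000, 42]], dtype=float)

# standardize=False returns the raw Box-Cox values so they can be compared with the formula
pt = PowerTransformer(method="box-cox", standardize=False).fit(X)

def box_cox(x, lam):
    """Box-Cox for strictly positive x: (x**lam - 1) / lam, or log(x) when lam == 0."""
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1) / lam

manual = np.column_stack([box_cox(X[:, j], lam) for j, lam in enumerate(pt.lambdas_)])
print(pt.lambdas_)  # one fitted lambda per column
print(np.allclose(manual, pt.transform(X)))
```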

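In [120]:
from sklearn.preprocessing import PowerTransformer

# Create the Box-Cox scaler in the same cell that applies it. If `scaler` is defined
# in an earlier cell (In [101]) and the QuantileTransformer cell (In [117]) runs in
# between, `scaler` is still the QuantileTransformer and this cell only reproduces
# the quantile output.
scaler = PowerTransformer(method='box-cox')
df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled

Box-Cox only accepts strictly positive values, which holds for Income and Age. Because `PowerTransformer` standardizes its output by default (`standardize=True`), each transformed column has zero mean and unit variance.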


##### Yeo - Johnson Transform

- This is also a power transform technique: each observation is raised to a power λ fitted to the data.
- It extends Box-Cox so that it can also be applied to zero and negative values.
- In scikit-learn, `method='yeo-johnson'` is the default for the `PowerTransformer` class.
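To see why Yeo-Johnson works on negative values, this minimal sketch applies it to the example's Balance column, which Box-Cox would reject. The hypothetical helper `yeo_johnson` writes out the standard piecewise definition: non-negative values use `((y + 1)**λ - 1) / λ`, negative values use `-((1 - y)**(2 - λ) - 1) / (2 - λ)`, with log cases at λ = 0 and λ = 2. It is compared with scikit-learn's output using `standardize=False`.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Balance from the example: mixes positive and negative values
balance = np.array([[100.0], [-263.0], [2000.0], [-5.0]])

pt = PowerTransformer(method="yeo-johnson", standardize=False).fit(balance)
lam = pt.lambdas_[0]

def yeo_johnson(y, lam):
    """Yeo-Johnson transform applied element-wise (hand-written sketch)."""
    out = np.empty_like(y, dtype=float)
    pos = y >= 0
    if lam != 0:
        out[pos] = ((y[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if lam != 2:
        out[~pos] = -((1 - y[~pos]) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

manual = yeo_johnson(balance[:, 0], lam)
print(lam)
print(np.allclose(manual, pt.transform(balance)[:, 0]))
```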

In [121]:
from sklearn.preprocessing import PowerTransformer
scaler = PowerTransformer(method = 'yeo-johnson')


df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled


|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 0.654936 | -1.272634 | HR | 100.0 |
| 1 | -0.835315 | -0.610047 | Insurance | -263.0 |
| 2 | -1.105017 | 1.284852 | Marketing | 2000.0 |
| 3 | 1.285397 | 0.597829 | Finance | -5.0 |


### Unit Vector Scaler/Normalizer

- Normalization is the process of scaling individual samples to have unit norm 

- The Normalizer works on the rows! Each row of the dataframe with at least one non-zero component is rescaled independently of other samples so that its norm (l1, l2, or inf) equals one.

##### Points to remember

- If we use the L1 norm, each row is divided by the sum of its absolute values, so the absolute values along each row sum to 1.
- If we use the L2 norm, each row is divided by its Euclidean length (the square root of the sum of its squared values), so the squared values along each row sum to 1.
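A minimal sketch verifying both points on the example's Income and Age columns: after L1 normalization the absolute values in each row sum to 1, and after L2 normalization the squared values in each row sum to 1. The manual L2 version divides each row by its Euclidean length.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[250000, 25], [18000, 30], [12000, 51], [1000000, 42]], dtype=float)

l1 = Normalizer(norm="l1").fit_transform(X)
l2 = Normalizer(norm="l2").fit_transform(X)

# Each ROW is rescaled independently of the others
print(np.abs(l1).sum(axis=1))  # sum of absolute values per row
print((l2 ** 2).sum(axis=1))   # sum of squares per row

# Manual L2: divide every row by its Euclidean length
manual_l2 = X / np.linalg.norm(X, axis=1, keepdims=True)
print(np.allclose(manual_l2, l2))
```

Because Income is so much larger than Age in every row, the normalized Income values are all close to 1, which matches the output table below.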

In [122]:
from sklearn.preprocessing import Normalizer
scaler = Normalizer(norm = 'l2')
# norm = 'l2' is default

df_scaled[col_names] = scaler.fit_transform(features.values)
df_scaled

|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 1.0 | 0.0001 | HR | 100.0 |
| 1 | 0.999999 | 0.001667 | Insurance | -263.0 |
| 2 | 0.999991 | 0.00425 | Marketing | 2000.0 |
| 3 | 1.0 | 4.2e-05 | Finance | -5.0 |


### Custom transformer

- Based on domain knowledge of the data, custom transformations can be applied to bring the data closer to a normal distribution.

- A custom transform can be any function, such as sin, cos, tan, or cube.

##### Example

1. sin_transformed_data = np.sin(df[col_names])
2. cos_transformed_data = np.cos(df[col_names])
3. tan_transformed_data = np.tan(df[col_names])

In [127]:
# A simple example: applying a FunctionTransformer

from sklearn.preprocessing import FunctionTransformer
transformer = FunctionTransformer(np.log2, validate=True)

# FunctionTransformer learns nothing in fit, but fit_transform is the conventional call
df_scaled[col_names] = transformer.fit_transform(features.values)
df_scaled



|   | Income | Age | Department | Balance |
|---|---|---|---|---|
| 0 | 17.931569 | 4.643856 | HR | 100.0 |
| 1 | 14.135709 | 4.906891 | Insurance | -263.0 |
| 2 | 13.550747 | 5.672425 | Marketing | 2000.0 |
| 3 | 19.931569 | 5.392317 | Finance | -5.0 |


The output shows log base 2 applied to Income and Age.
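`FunctionTransformer` also accepts an `inverse_func`, which makes a custom transform reversible. This minimal sketch swaps in `np.log1p` (log of 1 + x, which also accepts zeros) with `np.expm1` as its inverse; this pairing is an illustrative choice, not the transform used in the notebook cell above.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[250000, 25], [18000, 30], [12000, 51], [1000000, 42]], dtype=float)

# log1p = log(1 + x) is defined at 0; expm1 = exp(x) - 1 undoes it
transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
X_log = transformer.fit_transform(X)
X_back = transformer.inverse_transform(X_log)

print(np.round(X_log, 4))
print(np.allclose(X_back, X))  # the original values are recovered
```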



### Revision

#### Feature Transformation Techniques

1. Function Transformers
2. Power Transformers
3. Quantile Transformers

#### 1. Function Transformers

1. Log Transforms - The log transform is one of the simplest transformations: the log is applied to every value of the feature, and the log values are fed to the machine learning algorithm instead of the raw data. It works well on right-skewed data and requires strictly positive values.

              Syntax: df['log_income'] = np.log(df['Income'])

2. Square Transforms - The square of every observation is used instead of the original value. Squaring stretches large values more than small ones, so it can reduce left skew.

              Syntax:  transformed_data = np.square(data)
                
3. Square Root Transforms - The square root of every observation is used. It compresses large values, so it works well on moderately right-skewed data (a milder alternative to the log transform). It requires non-negative values.

              Syntax:  transformed_data = np.sqrt(data)
              
4. Reciprocal Transforms - The reciprocal (1/x) of every observation is used. This can help some datasets get closer to a normal distribution. It is undefined for zero, and `np.reciprocal` performs integer arithmetic on integer input, so convert to float first.

              Syntax: transformed_data = np.reciprocal(data.astype(float))
              
5. Custom Transform               
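A minimal sketch of the square, square-root and reciprocal transforms above. The data is synthetic (an assumption for illustration): a log-normal sample with a long right tail and a Beta(5, 1) sample with a long left tail on (0, 1). The square root reduces the right skew, squaring reduces the left skew, and the last lines show why integer input should be cast to float before `np.reciprocal`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
right_skewed = pd.Series(rng.lognormal(mean=0, sigma=1, size=1000))  # long right tail
left_skewed = pd.Series(rng.beta(5, 1, size=1000))                   # long left tail

# Square root compresses large values, reducing right skew
print(right_skewed.skew(), np.sqrt(right_skewed).skew())

# Squaring stretches large values, reducing left skew
print(left_skewed.skew(), np.square(left_skewed).skew())

# np.reciprocal uses integer arithmetic on integer input
ints = np.array([2, 4, 5])
print(np.reciprocal(ints))                # [0 0 0]
print(np.reciprocal(ints.astype(float)))  # 0.5, 0.25, 0.2
```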

#### 2. Power Transformers


1. Box-Cox Transform - Transforms the data by raising each observation to a power, denoted by lambda (λ). There are two cases: when λ = 0 the transform is log(x), and when λ ≠ 0 it is (x^λ - 1) / λ. The input must be strictly positive.

                    Syntax: from sklearn.preprocessing import PowerTransformer 
                            boxcox = PowerTransformer(method='box-cox') 
                            data_transformed = boxcox.fit_transform(data) 
                            
2. Yeo-Johnson Transform - Also a power transform, where each observation is raised to a fitted power λ. It extends Box-Cox so that it can also be applied to zero and negative values.

                    Syntax: from sklearn.preprocessing import PowerTransformer
                            scaler = PowerTransformer(method = 'yeo-johnson')


#### 3. Quantile Transformers

- Quantile transformation techniques are feature transformation techniques that can be applied to any numerical data.

- The transformer maps each feature through its estimated cumulative distribution function, so the output follows a uniform distribution (the default) or, with `output_distribution='normal'`, a normal distribution before it is fed to the machine learning algorithm.

                    Syntax: from sklearn.preprocessing import QuantileTransformer
                            scaler = QuantileTransformer()

Learning sources: Analytics Vidhya and GeeksforGeeks