# **Introduction:**


In my machine learning journey, I have found that feature preprocessing often improves my evaluation metric more than any other step, such as choosing a model algorithm or tuning hyperparameters.

**Feature preprocessing is one of the most crucial steps in building a machine learning model.** With too few features, the model won’t have much to learn from. With too many, we might be feeding it unnecessary information. Beyond the number of features, the values within each feature need to be considered as well.

There are established ways of dealing with categorical data, namely encoding it in different ways. However, a large chunk of the process involves dealing with continuous variables, and there are various methods for doing so. Some of them include converting the variables to a normal distribution or converting them to categorical variables.

**These techniques are:**

*  **Feature Transformation and**
*  **Feature Scaling.**

# **Why do we need Feature Transformation and Scaling?**


Datasets often have columns measured in different units: one column may be in kilograms while another is in centimeters. The ranges can differ widely too. An income column might range from 20,000 to 100,000 or even more, while an age column ranges from 0 to 100 at most. Thus, income is about 1,000 times larger than age.

But how can we be sure that the model treats both these variables equally? When we feed these features to the model as is, there is every chance that income will influence the result more simply because of its larger values, particularly in models that rely on distances or gradient descent. But this doesn’t necessarily mean it is more important as a predictor. So, to give both age and income a fair weight, we need feature scaling.
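
To make this concrete, here is a small sketch of how a distance-based model would see unscaled age and income. The three people and their numbers are made up purely for illustration.

```python
import numpy as np

# Hypothetical people described as (age in years, income).
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 50_500.0])  # very different age, almost the same income
c = np.array([26.0, 52_000.0])  # almost the same age, modestly different income

# Euclidean distances on the raw, unscaled features.
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

# Share of the squared a-b distance that comes from the age column.
age_share_ab = (a[0] - b[0]) ** 2 / dist_ab ** 2

print(dist_ab, dist_ac, age_share_ab)
```

On the raw values, `a` looks closer to `b` than to `c` even though `a` and `b` are 35 years apart, because the income column contributes almost all of the distance.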

In most examples of machine learning models, you will have seen either the Standard Scaler or the MinMax Scaler. However, the sklearn library offers many other feature transformation and scaling techniques as well, which we can leverage depending on the data we are dealing with.

# **1. Feature Scaling:**






## **1.1 MinMax Scaler:**

The MinMax scaler is one of the simplest scalers to understand. By default, it rescales each feature to the range [0, 1]. The formula for the scaled value is:

x_scaled = (x - x_min)/(x_max - x_min)

Note that the scaling is done for every feature separately, using that feature’s own minimum and maximum. Though (0, 1) is the default range, we can define our own minimum and maximum for the output range as well.
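
As a minimal sketch, the snippet below applies sklearn's `MinMaxScaler` to a made-up age/income array and checks it against the formula computed by hand. The data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: columns are age (years) and income.
X = np.array([[25.0, 20_000.0],
              [40.0, 60_000.0],
              [60.0, 100_000.0]])

# Default feature_range is (0, 1); each column is scaled independently.
X_scaled = MinMaxScaler().fit_transform(X)

# The same result from the formula, applied column by column.
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# A custom output range can be requested through feature_range.
X_custom = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print(X_scaled)
```

In practice the scaler is usually fitted on the training data only and then reused to transform validation and test data, so that the same minimum and maximum are applied everywhere.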

## **1.2 Standardization or Standard Scaler:**


Standardization is a scaling technique where the values are centred around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.

After scaling:

mean = 0
standard deviation = 1

The formula for the scaled value is:

x_scaled = (x - mean)/std_dev
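
Standardization subtracts each column's mean and then divides by that column's standard deviation. The sketch below, on made-up data, compares sklearn's `StandardScaler` with the same computation done by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: columns are age (years) and income.
X = np.array([[25.0, 20_000.0],
              [40.0, 60_000.0],
              [60.0, 100_000.0]])

X_scaled = StandardScaler().fit_transform(X)

# Manual version: subtract the column mean, divide by the column std.
# np.std defaults to the population std (ddof=0), which StandardScaler also uses.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After the transform, each column has a mean of (approximately) zero and a standard deviation of one, regardless of its original units.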



## **1.3 MaxAbs Scaler:**

In simplest terms, the MaxAbs scaler finds the maximum absolute value of each column and divides each value in the column by it.

Thus, it first takes the absolute value of each value in the column and then takes the maximum out of those. Dividing by this maximum scales the data to the range **[-1, 1]** while keeping the sign of each value. If a column has no negative values, its scaled values lie in [0, 1].
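
A short sketch of this, using sklearn's `MaxAbsScaler` on a made-up array with mixed signs and a hand-written version for comparison:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical data with positive and negative values in each column.
X = np.array([[-4.0, 10.0],
              [2.0, -50.0],
              [1.0, 25.0]])

X_scaled = MaxAbsScaler().fit_transform(X)

# Manual version: divide each column by its largest absolute value.
X_manual = X / np.abs(X).max(axis=0)

print(X_scaled)
# First column is divided by 4, second by 50:
# [[-1.    0.2 ]
#  [ 0.5  -1.  ]
#  [ 0.25  0.5 ]]
```

Because the data is only divided and never shifted, zeros stay zeros, which is one reason this scaler is often used with sparse data.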

## **1.4 Robust Scaler:**

As you may have noticed, each of the scalers we used so far relies on values like the mean, maximum and minimum of the columns. All these values are sensitive to outliers. If there are many outliers in the data, they will pull the mean, the max value or the min value towards them. Thus, even if we scale this data using the above methods, the bulk of the values can end up squeezed into a narrow part of the scaled range.



The Robust Scaler, as the name suggests, is not sensitive to outliers. This scaler:

*  removes the median from the data,
*  scales the data by the Interquartile Range (IQR).


The interquartile range is simply the difference between the third and first quartiles of the variable:

IQR = Q3 - Q1

Thus, the formula would be:

x_scaled = (x - median)/(Q3 - Q1)
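
The robust scaling described here subtracts the median and divides by the interquartile range. As a sketch, the snippet below runs sklearn's `RobustScaler` on a made-up column with one extreme outlier and repeats the computation by hand.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical single feature with one extreme outlier (100).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Defaults: centre on the median, scale by the 25th-75th percentile range.
X_scaled = RobustScaler().fit_transform(X)

# Manual version: subtract the median, divide by IQR = Q3 - Q1.
q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
X_manual = (X - median) / (q3 - q1)

print(X_scaled.ravel())
# Q1 = 2, median = 3, Q3 = 4, so the result is [-1. -0.5 0. 0.5 48.5]
```

The outlier is still far from the rest after scaling, but it did not affect how the other four values were scaled, which is the point of using the median and IQR.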



## **1.5 Quantile Transformer Scaler:**


One of the most interesting feature transformation techniques that I have used, the **Quantile Transformer Scaler maps the variable’s distribution to a target distribution, such as a normal distribution,** and scales it accordingly. (In sklearn’s QuantileTransformer, the default target is a uniform distribution; a normal output has to be requested explicitly.) Because the mapping depends on the rank of each value rather than its magnitude, it also reduces the influence of outliers. Here are a few important points regarding the Quantile Transformer Scaler:

* It estimates the cumulative distribution function (CDF) of the variable

* It uses this CDF to map the values to a uniform distribution on [0, 1]

* It maps the obtained values to the desired output distribution using the associated quantile function
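
The steps above can be sketched with sklearn's `QuantileTransformer`. The skewed data here is randomly generated for illustration, and `n_quantiles=100` is an arbitrary choice for this example (it must not exceed the number of samples).

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Hypothetical right-skewed feature drawn from an exponential distribution.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(1000, 1))

# Default output distribution is uniform on [0, 1].
X_uniform = QuantileTransformer(n_quantiles=100).fit_transform(X)

# Request a normal (Gaussian) output distribution explicitly.
qt_normal = QuantileTransformer(n_quantiles=100, output_distribution="normal")
X_normal = qt_normal.fit_transform(X)

print(X_normal.mean(), X_normal.std())
```

The transform is monotonic, so the ordering of the values is preserved; only the spacing between them changes to match the target distribution.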



# **2. Feature Transformation:**



## **2.1 Log Transform:**


The Log Transform is one of the most popular transformation techniques out there. It is primarily used to convert a skewed distribution to a normal or less-skewed distribution. In this transform, we take the log of the values in a column and use these values as the column instead. Since the log is only defined for positive numbers, the column must not contain zeros or negative values.

Why does it work? The log function compresses large values far more than small ones, which pulls in a long right tail. Here is an example using the base-10 log:

log(10) = 1

log(100) = 2, and

log(10000) = 4.
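
The same base-10 examples can be reproduced with NumPy. The sketch also shows `np.log1p`, which computes the natural log of (1 + x) and is one common way to handle columns that contain zeros; the sample values are made up.

```python
import numpy as np

# The base-10 examples from the text.
values = np.array([10.0, 100.0, 10_000.0])
log10_values = np.log10(values)  # [1. 2. 4.]

# Hypothetical column that contains a zero: log(0) is undefined,
# so log1p (natural log of 1 + x) is used instead.
counts = np.array([0.0, 9.0, 99.0, 999.0])
log1p_counts = np.log1p(counts)

print(log10_values)
print(log1p_counts)
```

Note how the input values span three orders of magnitude while the transformed values span only a few units.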



## **2.2 Reciprocal Transformation:**

reciVal = (1/features_value)
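
A minimal NumPy sketch of this formula on made-up positive values is shown below. The reciprocal is undefined at zero, so this example assumes the column contains no zeros.

```python
import numpy as np

# Hypothetical strictly positive feature values (no zeros allowed).
features_value = np.array([0.5, 1.0, 4.0, 10.0])

reci_val = 1.0 / features_value

print(reci_val)
# [2.   1.   0.25 0.1 ]
```

For positive inputs the transform reverses the ordering: the largest original value becomes the smallest transformed value.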

## **2.3 Square Root Transformation:**
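
As a brief sketch, the square root transformation replaces each value with its square root. It compresses large values less aggressively than the log and, unlike the log, is defined at zero (but not for negative numbers). The sample values are made up.

```python
import numpy as np

# Hypothetical non-negative feature values.
values = np.array([0.0, 1.0, 4.0, 100.0, 10_000.0])

sqrt_values = np.sqrt(values)

print(sqrt_values)
# [  0.   1.   2.  10. 100.]
```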

## **2.4 Box-Cox Transformation:**
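
The Box-Cox transformation is a family of power transforms for strictly positive data, controlled by a parameter lambda: the transformed value is (x^lambda - 1)/lambda when lambda is not zero, and log(x) when lambda is zero. The sketch below uses SciPy's `scipy.stats.boxcox` on randomly generated data; using SciPy here rather than another library is a choice made for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical strictly positive, right-skewed feature.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# With lmbda omitted, SciPy estimates lambda by maximum likelihood
# and returns both the transformed data and the fitted lambda.
x_boxcox, fitted_lambda = stats.boxcox(x)

# The same transform written out by hand (valid when lambda != 0).
x_manual = (x ** fitted_lambda - 1.0) / fitted_lambda

# With lambda fixed at 0, Box-Cox reduces to the natural log.
x_log = stats.boxcox(x, lmbda=0.0)

print(fitted_lambda)
```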

## **2.5 Other Custom Transformations:**
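
One way to apply a custom transformation within sklearn is to wrap a plain function in `FunctionTransformer`. The sketch below wraps `np.log1p` with `np.expm1` as its inverse; the choice of function and the sample data are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical non-negative feature column.
X = np.array([[0.0], [9.0], [99.0], [999.0]])

# Wrap any element-wise function; inverse_func is optional.
log1p_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X_transformed = log1p_transformer.fit_transform(X)
X_restored = log1p_transformer.inverse_transform(X_transformed)

print(X_transformed.ravel())
```

Wrapping the function this way lets it be used like any other sklearn transformer, for example as a step in a pipeline.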