Most machine learning algorithms depend on statistics: even when it is not explicit, they use a statistical approach to find patterns in the data. In statistics, a normal distribution is usually the distribution a statistician hopes the data will follow.

A normal distribution makes it easier for statisticians to model complex patterns in the data and draw useful insights from it.

In practice, however, not every dataset is normally distributed. Data that is skewed or otherwise non-normal often needs preprocessing before a machine learning algorithm is applied to it.

Feature transformers are functions applied to data that is not normally distributed. Once applied, there is a good chance that the transformed data will be much closer to a normal distribution.

Feature transformation techniques:
Function Transformers
Power Transformers
Quantile Transformers

A. Function Transformers
A function transformer applies a chosen mathematical function to every observation in the data to bring its distribution closer to normal.

Four function transformations are commonly used. They are covered below, and they often (though not always) make the data approximately normal.

Log Transform
The logarithm is applied to every observation, and the log-transformed values are then fed to the machine learning algorithm.
The log transform works well on right-skewed data: it compresses the long right tail and often turns such data into approximately normally distributed data. Because log(0) is undefined, the example below uses np.log1p, which computes log(1 + x) and therefore also handles zeros.
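The random integers in the cell below are uniformly distributed, so they show only the mechanics of the transform. As a minimal sketch of the effect on genuinely right-skewed data, the block below assumes synthetic log-normal data and measures skewness with scipy.stats.skew (neither appears in the original example). A skewness near 0 is what a symmetric, normal-like distribution would show.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer

# Assumption: synthetic log-normal data stands in for a real right-skewed feature
rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(5_000, 1))

# log1p(x) = log(1 + x); unlike np.log it is also defined at 0
log_transformer = FunctionTransformer(func=np.log1p)
x_log = log_transformer.fit_transform(x)

skew_before = skew(x.ravel())
skew_after = skew(x_log.ravel())
print(f"skewness before: {skew_before:.2f}, after log1p: {skew_after:.2f}")
```

The long right tail is compressed, so the skewness drops sharply.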

In [59]:
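import numpy as np
from sklearn.preprocessing import FunctionTransformer
# Wrap np.log1p, i.e. log(1 + x), so it behaves like any scikit-learn transformer
transform = FunctionTransformer(func=np.log1p)
# 1000 random integers from 10 to 98; uniform rather than skewed, so this only shows the mechanics
data = np.random.randint(10,99,1000)
transformed_data = transform.fit_transform(data)
transformed_data[:10]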

array([4.48863637, 4.15888308, 4.2341065 , 3.93182563, 3.63758616,
       4.06044301, 4.59511985, 4.59511985, 3.95124372, 4.57471098])

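Square Root Transform
In this transform, the square root of every observation is calculated, so the data must be non-negative. Like the log, the square root compresses large values more than small ones, so it works on right-skewed data. It is milder than the log and suits moderately right-skewed data. It does not fix left-skewed data (it makes the left skew slightly worse); for left-skewed data a power greater than 1, such as squaring, is used instead.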
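The square root is concave, so it shrinks large values more than small ones: that pulls in a long right tail but stretches a long left tail. A small sketch, assuming synthetic exponential data (right-skewed) and a mirrored copy of it (left-skewed), with scipy.stats.skew as the skewness measure:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Right-skewed sample: exponential data has a long right tail (skewness around 2)
right = rng.exponential(scale=1.0, size=10_000)

# Left-skewed sample: the exponential mirrored around 20, clipped to stay positive for sqrt
left = np.clip(20.0 - rng.exponential(scale=1.0, size=10_000), 1e-6, None)

for name, sample in [("right-skewed", right), ("left-skewed", left)]:
    print(f"{name}: skew {skew(sample):+.2f} -> after sqrt {skew(np.sqrt(sample)):+.2f}")
```

The right-skewed sample moves toward 0, while the left-skewed one moves further from it.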

In [61]:
transformed_data = np.sqrt(data)
transformed_data[:10]

array([9.38083152, 7.93725393, 8.24621125, 7.07106781, 6.08276253,
       7.54983444, 9.89949494, 9.89949494, 7.14142843, 9.79795897])

Reciprocal Transform
In this transformation, every observation x is replaced by its reciprocal, 1/x. This can help on some strongly right-skewed, positive datasets. It is undefined for zero, and it reverses the order of the values (the largest x becomes the smallest 1/x).
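Here is a sketch of a case where the reciprocal helps. It assumes, purely for illustration, that the observed feature is the reciprocal of a roughly normal quantity (for example a time per unit when the underlying rate is normal). Taking 1/x then recovers the symmetric quantity:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

# Assumption: the underlying rate is normal and the observed feature is 1 / rate
rate = rng.normal(loc=10.0, scale=1.0, size=10_000)
x = 1.0 / rate  # positive and right-skewed

# x is already float; on an integer array np.reciprocal would truncate to 0
x_recip = np.reciprocal(x)

print(f"skew of x: {skew(x):+.2f}, skew of 1/x: {skew(x_recip):+.2f}")
# Note: 1/x reverses the ordering (the largest x maps to the smallest 1/x)
```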

In [66]:
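# Convert to float first: on an integer array np.reciprocal does integer
# division, so every value greater than 1 becomes 0
transformed_data = np.reciprocal(data.astype(float))
transformed_data[:5]

array([0.01136364, 0.01587302, 0.01470588, 0.02      , 0.02702703])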

Custom Transforms
The log and square root transforms are not suitable for every dataset, because each dataset has its own patterns and complexity. Based on domain knowledge of the data, a custom transformation can be applied instead to bring the data closer to a normal distribution. A custom transform can be any function, such as a cube, or trigonometric functions such as sin, cos and tan.
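A custom transform can be wrapped in scikit-learn's FunctionTransformer so it can be used like the built-in transformers. The sketch below assumes a cube-root transform (np.cbrt), picked here only as an example. Unlike log and square root it accepts negative values, and supplying inverse_func lets transformed values be mapped back. The helper name cube is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def cube(x):
    # Hypothetical helper: inverse of the cube root, used to map values back
    return np.power(x, 3)

# check_inverse=True (the default) makes fit() warn if func and inverse_func don't round-trip
cbrt_transformer = FunctionTransformer(func=np.cbrt, inverse_func=cube)

x = np.array([[-27.0], [-1.0], [0.0], [8.0], [64.0]])
x_cbrt = cbrt_transformer.fit_transform(x)
x_back = cbrt_transformer.inverse_transform(x_cbrt)

print(x_cbrt.ravel())  # cube roots of the inputs, negatives included
```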

In [74]:
import numpy as np
# Trigonometric functions as example custom transforms (inputs are treated as radians)
sin_transformed_data = np.sin(data)
cos_transformed_data = np.cos(data)
tan_transformed_data = np.tan(data)
print("sin:", sin_transformed_data[:5])
print("cos:", cos_transformed_data[:4])
print("tan:", tan_transformed_data[:3])

sin: [ 0.0353983   0.1673557  -0.89792768 -0.26237485 -0.64353813]
cos: [0.99937328 0.98589658 0.44014302 0.96496603]
tan: [ 0.0354205   0.16974975 -2.0400816 ]


B. Power Transformers
Power transformers raise the data observations to a power, λ, estimated from the data, to make the distribution more Gaussian-like.

Two types of Power Transformation techniques:
Box-Cox Transform
Yeo-Johnson Transform

Box-Cox Transform
This technique transforms the data observations by raising them to a power.
The power is denoted by lambda (λ):
$$
x_i^{(\lambda)} =
\begin{cases}
\ln x_i, & \lambda = 0 \\
\dfrac{x_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0
\end{cases}
$$
The value of λ is found by searching over candidate values and choosing the one that makes the transformed data closest to normal; scikit-learn's PowerTransformer estimates it by maximum likelihood. The chosen λ is then used to transform the data.
Candidate values of λ typically range from -5 to 5. The transformed values themselves are not restricted to that range.
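The λ search can be sketched directly. This sketch assumes a grid of candidate λ values from -5 to 5 in steps of 0.01 and synthetic log-normal data. It scores each λ with the Box-Cox log-likelihood, the same criterion scipy.stats.boxcox maximizes. The helper names boxcox and boxcox_loglik are hypothetical. The grid result is compared with scipy's numerical optimum:

```python
import numpy as np
from scipy import stats

def boxcox(x, lam):
    """Box-Cox transform of strictly positive x for a given lambda (hypothetical helper)."""
    if abs(lam) < 1e-12:
        return np.log(x)
    return (np.power(x, lam) - 1.0) / lam

def boxcox_loglik(x, lam):
    """Box-Cox log-likelihood of lambda, assuming the transformed data is normal."""
    y = boxcox(x, lam)
    n = x.size
    return (lam - 1.0) * np.sum(np.log(x)) - 0.5 * n * np.log(np.var(y))

rng = np.random.default_rng(7)
x = rng.lognormal(mean=1.0, sigma=0.5, size=2_000)  # positive, right-skewed

# Grid search: examine every candidate lambda in [-5, 5] and keep the best-scoring one
grid = np.linspace(-5.0, 5.0, 1001)
scores = [boxcox_loglik(x, lam) for lam in grid]
best_lam = grid[int(np.argmax(scores))]

# scipy maximizes the same likelihood with a numerical optimizer
_, scipy_lam = stats.boxcox(x)

print(f"grid-search lambda: {best_lam:.2f}, scipy lambda: {scipy_lam:.3f}")
```

For log-normal data both searches should land near λ = 0, which is close to a plain log transform.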
A major disadvantage of this technique is that it can only be applied to strictly positive observations. It cannot be used when the data contains zero or negative values.
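Here is a quick check of this restriction. The sketch uses a made-up feature that contains a zero. scikit-learn's PowerTransformer raises a ValueError for Box-Cox on such data, though the exact message wording may vary by version:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up feature containing a zero
x_with_zero = np.array([[0.0], [1.0], [2.0], [5.0], [10.0]])

error_message = None
try:
    PowerTransformer(method="box-cox").fit(x_with_zero)
except ValueError as err:
    # scikit-learn rejects non-positive input for Box-Cox
    error_message = str(err)
    print("Box-Cox failed:", error_message)
```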

In [95]:
from sklearn.preprocessing import PowerTransformer
boxcox = PowerTransformer(method='box-cox')
# 3 rows (samples) x 10 columns (features) of random integers from 1 to 99;
# the lower bound is 1 because Box-Cox fails on zeros
data2 = np.random.randint(1, 100, size=(3, 10))
data_transformed = boxcox.fit_transform(data2)
data_transformed

array([[ 0.30149286,  1.35736088,  0.80430489,  1.3882269 ,  1.21758767,
        -0.44174165,  0.79672738,  0.03676794,  0.39007534,  0.4481453 ],
       [ 1.04584311, -0.33490933,  0.60523042, -0.4604052 , -1.23177874,
        -0.94259307, -1.41025257, -1.24271484,  0.98219677, -1.38569838],
       [-1.34733598, -1.02245155, -1.40953531, -0.9278217 ,  0.01419107,
         1.38433472,  0.61352519,  1.20594691, -1.37227211,  0.93755309]])

Yeo-Johnson Transform
This is also a power transform technique: the data observations are raised to a power to transform the data. It extends the Box-Cox transformation so that it can also be applied to zero and negative values.
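For reference, the Yeo-Johnson transform for a fixed λ treats non-negative and negative values separately. The sketch below writes it out by hand and checks it against scipy.stats.yeojohnson for a few λ values. Note that yeo_johnson is a hypothetical helper, not a library function:

```python
import numpy as np
from scipy import stats

def yeo_johnson(x, lam):
    """Yeo-Johnson transform for a fixed lambda; defined for any real x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative values: a Box-Cox transform of (x + 1)
    if abs(lam) < 1e-12:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = (np.power(x[pos] + 1.0, lam) - 1.0) / lam
    # Negative values: mirrored transform of (1 - x) with power (2 - lambda)
    if abs(lam - 2.0) < 1e-12:
        out[~pos] = -np.log1p(-x[~pos])
    else:
        out[~pos] = -(np.power(1.0 - x[~pos], 2.0 - lam) - 1.0) / (2.0 - lam)
    return out

x = np.array([-10.0, -2.5, -1.0, 0.0, 0.5, 3.0, 12.0])
for lam in (0.0, 0.5, 1.0, 2.0):
    ours = yeo_johnson(x, lam)
    ref = stats.yeojohnson(x, lmbda=lam)  # scipy's implementation, for comparison
    print(lam, np.allclose(ours, ref))
```

With λ = 1 both branches reduce to the identity, which is a handy sanity check.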

In scikit-learn, Yeo-Johnson is the default method of the PowerTransformer class (method='yeo-johnson').
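Here is a minimal sketch confirming the default. It assumes two synthetic features that include negative values. It also shows the fitted lambdas_ attribute (one λ per feature) and the effect of standardize=True. That option is on by default and rescales each output feature to zero mean and unit variance:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(3)
# Two synthetic features containing negative values, which Box-Cox could not handle
X = np.column_stack([
    rng.normal(size=500),
    rng.exponential(scale=2.0, size=500) - 1.0,
])

pt = PowerTransformer()  # no method given
X_t = pt.fit_transform(X)

print(pt.method)    # 'yeo-johnson' by default
print(pt.lambdas_)  # one fitted lambda per feature
# standardize=True (default): each output column has mean ~0 and std ~1
print(X_t.mean(axis=0).round(6), X_t.std(axis=0).round(6))
```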

In [105]:
# No method argument: PowerTransformer defaults to Yeo-Johnson
yeojohnson = PowerTransformer()
# 3 rows x 10 columns of random integers in [-100, 100), including negative values
data2 = np.random.randint(-100, 100, size=(3, 10))
data_transformed = yeojohnson.fit_transform(data2)
data_transformed

array([[ 0.40386597, -1.30433908, -1.39924902,  0.69410758,  1.25060315,
        -1.40270371,  1.30585089, -1.24192224, -1.23652482,  0.69568874],
       [-1.37567453,  1.12548198,  0.52192636, -1.41413439, -1.19713531,
         0.54541364, -0.18274986,  1.20681252,  1.21261481, -1.41415243],
       [ 0.97180856,  0.1788571 ,  0.87732266,  0.72002681, -0.05346783,
         0.85729007, -1.12310103,  0.03510971,  0.02391001,  0.71846369]])

C. Quantile Transformers
Quantile transformation is a feature transformation technique that can be applied to any numerical data. It is implemented in scikit-learn as QuantileTransformer.

The transformer maps each input feature through an estimate of its cumulative distribution function (built from its quantiles). The output then follows the chosen distribution, for example a normal one, before it is fed to the machine learning algorithm.
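The mapping can be sketched by hand for one continuous feature without ties. First, ranks give an empirical CDF value in [0, 1] for each observation. Then the normal quantile function scipy.stats.norm.ppf turns those values into normal scores. This is a simplified sketch. It assumes n_quantiles equals the sample size and compares with scikit-learn only away from the extremes, where scikit-learn applies its own clipping:

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=500)  # continuous, right-skewed, no ties

# Step 1: empirical CDF via ranks, spreading the values uniformly over [0, 1]
ranks = stats.rankdata(x) - 1  # 0 .. n-1
u = ranks / (x.size - 1)

# Step 2: map the uniform values through the standard normal quantile function
# (clip so the minimum and maximum don't become -inf / +inf)
eps = 1e-7
z_manual = stats.norm.ppf(np.clip(u, eps, 1 - eps))

# scikit-learn's version, with one quantile per sample
qt = QuantileTransformer(n_quantiles=x.size, output_distribution="normal")
z_sklearn = qt.fit_transform(x.reshape(-1, 1)).ravel()

interior = (u > 0.01) & (u < 0.99)  # compare away from the clipped extremes
print(np.max(np.abs(z_manual[interior] - z_sklearn[interior])))
```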

The output_distribution parameter sets the target distribution and can be 'uniform' (the default) or 'normal'.
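Here is a small sketch of both settings on the same synthetic, skewed feature (the data and n_quantiles=100 are assumptions for illustration). With 'uniform', the output stays in [0, 1]. With 'normal', it is centered around 0:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(11)
X = rng.lognormal(size=(1_000, 1))  # skewed, strictly positive feature

print(QuantileTransformer().output_distribution)  # the default: 'uniform'

results = {}
for dist in ("uniform", "normal"):
    qt = QuantileTransformer(n_quantiles=100, output_distribution=dist)
    results[dist] = qt.fit_transform(X)
    X_t = results[dist]
    print(f"{dist:>7}: min={X_t.min():.3f}, max={X_t.max():.3f}, mean={X_t.mean():.3f}")
```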

In [129]:
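from sklearn.preprocessing import QuantileTransformer
quantile_trans = QuantileTransformer(output_distribution='normal')
# 1000 samples x 2 features; with only a handful of samples the quantiles would be
# poorly estimated (and n_quantiles, default 1000, would be reduced with a warning)
data3 = np.random.randint(-10, 10, size=(1000, 2))
data_transformed = quantile_trans.fit_transform(data3)
# Show the quantile-transformed data, not transformed_data from the earlier examples
data_transformed[:5]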

Key Takeaways
Feature transformation techniques are used to bring the data closer to a normal distribution, which can improve the performance of the algorithm.
The log transform works well on right-skewed data, and the square root transform is a milder option for moderately right-skewed data.
Based on domain knowledge of the problem and the data, custom transformations can also be applied effectively.
Box-Cox transformations can be applied only to strictly positive data; the best λ is typically searched for between -5 and 5.
The Yeo-Johnson transformation can be applied to zero and negative values as well.
Conclusion
In this article, we discussed some of the most widely used data transformation techniques for bringing data from other distributions closer to a normal distribution. They make preprocessing complex data easier and are also useful when answering interview questions on the topic.