## Mathematical Transformation

Real-world data is rarely normally distributed, so we need a way to bring data that does not look normal closer to a normal distribution.

We use different transformations for this.

- A transformation is a mathematical operation applied to every value of a feature in order to make its distribution closer to a normal distribution.

There are many transformations. Some of them are:
- Log Transform
- Box-Cox Transform
- Reciprocal Transform
- Power Transform
- Yeo-Johnson Transform

### What happens when we apply these transformations? How does the performance of the model improve?

- The distribution of the data, as seen in its PDF (Probability Density Function), moves closer to a normal distribution.
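One way to see this effect in numbers is to measure skewness before and after a transform. The snippet below is a minimal sketch on synthetic log-normal data (an assumption made for illustration, not part of the Titanic example), where the log of the sample is normal by construction.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Synthetic right-skewed sample (log-normal), used only for illustration
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

skew_before = skew(x)          # strongly positive for right-skewed data
skew_after = skew(np.log(x))   # the log of a log-normal sample is normal

print(f"skewness before: {skew_before:.2f}")
print(f"skewness after:  {skew_after:.2f}")
```

A skewness close to zero indicates a roughly symmetric distribution, which is what we are aiming for.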

### Why do we want to transform data into a normal distribution? What is the benefit?

- Statistics is the parent field of machine learning, and many statistical methods are built around the normal distribution. Whenever a statistician finds normally distributed data in a problem, however complex, the problem becomes much easier to handle. That shows how important the normal distribution is.
- In machine learning, algorithms such as "Linear Regression" and "Logistic Regression" are typically used on the assumption that the data is approximately normally distributed, and they tend to perform better when it is.
- If the data is not normal, we try to make it normal. Other algorithms such as "Decision Tree" and "Random Forest" are not affected by the type of distribution the data has.

scikit-learn provides three commonly used transformers:

1. Function Transformer (`FunctionTransformer`)  
   (can apply Log Transform, Square/Square Root Transform, Reciprocal Transform, or any custom function)

2. Power Transformer (`PowerTransformer`)  
   (implements Box-Cox and Yeo-Johnson)

3. Quantile Transformer (`QuantileTransformer`)
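As a rough sketch of how the three classes are used, the snippet below fits each one on the same synthetic right-skewed column. The data, the dictionary keys, and the chosen parameters (`n_quantiles=100`, `output_distribution="normal"`) are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import (
    FunctionTransformer,
    PowerTransformer,
    QuantileTransformer,
)

rng = np.random.default_rng(42)
# Synthetic right-skewed column (exponential), for illustration only
X_demo = rng.exponential(scale=2.0, size=(500, 1))

transformers = {
    "function (log1p)": FunctionTransformer(func=np.log1p),
    "power (yeo-johnson)": PowerTransformer(method="yeo-johnson"),
    "quantile (normal)": QuantileTransformer(
        n_quantiles=100, output_distribution="normal", random_state=0
    ),
}

# All three share the same fit/transform interface
results = {name: t.fit_transform(X_demo) for name, t in transformers.items()}

for name, X_t in results.items():
    print(name, X_t.shape, round(float(X_t.mean()), 3))
```

Because all three follow the same fit/transform interface, they can be swapped in and out of a pipeline without changing the surrounding code.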


# Function Transformation

## 1. Log Transformation

<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:640/format:webp/1*hd5oOa6YFMxkLvMNQlM-9Q.png" width="300px"><br>
  <em>Right skewed to Normal distributed</em>
</p>

To apply a log transform, we take the logarithm of every value in the column.

- Ex: Suppose we have the age column from the Titanic dataset and want to apply a logarithmic transformation. We take the logarithm of every value in the age column. The base (natural log, base 2, or base 10) is a matter of preference, because changing the base only rescales the result. After taking the logarithm, right-skewed data moves closer to a normal distribution. It may not become perfectly normal, but it improves on the original shape.

#### When to use Log Transform

- The log transform cannot be applied to negative values, because the logarithm of a negative number is undefined.
- If the data is RIGHT SKEWED, the log transform pulls in the long right tail and shifts the distribution towards the centre.
- The log compresses a very wide range of values into a comparable scale, so relationships look more linear and linear models such as linear regression or logistic regression often perform better.
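The range compression and the problem at zero can both be seen on a handful of values. This is a small sketch with made-up numbers: `np.log10` returns `-inf` at zero, while `np.log1p` (which computes log(1 + x)) stays finite.

```python
import numpy as np

# Made-up values spanning several orders of magnitude, including a zero
values = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

# Suppress the divide-by-zero warning so the -inf result is visible
with np.errstate(divide="ignore"):
    plain_log = np.log10(values)    # log10(0) evaluates to -inf

shifted_log = np.log1p(values)      # log(1 + x), defined at x = 0

print("log10:", plain_log)
print("log1p:", shifted_log)
```

The values 1 to 1000 end up evenly spaced after `log10`, which is the "very big range into an equivalent scale" effect described above.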


## 2. Reciprocal (1/x) Transformation

With 1/x, large values become small and small values become large, so for positive data the order of the values is reversed.  
This is an unusual transform that is only used occasionally.
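A minimal sketch of the reciprocal transform with `FunctionTransformer`, using a few made-up positive values. Passing `np.reciprocal` as both `func` and `inverse_func` is a choice made here because 1/x is its own inverse.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# 1/x undoes itself, so the same function serves as the inverse
reciprocal = FunctionTransformer(func=np.reciprocal, inverse_func=np.reciprocal)

# Made-up positive values in increasing order
X_demo = np.array([[0.5], [2.0], [10.0], [100.0]])

X_recip = reciprocal.fit_transform(X_demo)
X_back = reciprocal.inverse_transform(X_recip)

print(X_recip.ravel())  # the ordering of the values is reversed
print(X_back.ravel())   # the original values are recovered
```

Note that the transform is undefined at zero, so zeros have to be handled before it is applied.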

## 3. Square (x²) Transformation

<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:480/format:webp/1*CbcBp8OrbZiVc38sp-2uNw.png" width="300px"><br>
  <em>Square</em>
</p>

- x × x, i.e. every value is squared.
- This is mainly used for "LEFT SKEWED DATA".
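To check the claim on left-skewed data, the sketch below squares a synthetic Beta(5, 1) sample (chosen here only because it has a long left tail) and compares skewness before and after.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)
# Synthetic left-skewed sample: Beta(5, 1) piles up near 1 with a long left tail
x = rng.beta(5.0, 1.0, size=5000).reshape(-1, 1)

square = FunctionTransformer(func=np.square)
x_squared = square.fit_transform(x)

skew_before = skew(x.ravel())
skew_after = skew(x_squared.ravel())

print(f"skewness before squaring: {skew_before:.2f}")
print(f"skewness after squaring:  {skew_after:.2f}")
```

On this sample the skewness stays negative but moves closer to zero, so squaring reduces the left skew without removing it completely.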

## 4. Sqrt Transformation

To be honest, the square root transform is rarely very useful, but it is worth trying. It compresses large values the way the log does, only more gently.

**Note:**  
A transformation should not be applied to data that contains missing values, so deal with the missing values before applying it.
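One way to enforce this ordering is to chain the imputation and the transformation in a pipeline. The sketch below assumes mean imputation and a `log1p` transform on a tiny made-up column. Both choices are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Made-up column with two missing values
X_demo = np.array([[22.0], [np.nan], [38.0], [np.nan], [54.0]])

# Step 1 fills the NaNs with the column mean, step 2 applies log(1 + x)
pipe = make_pipeline(
    SimpleImputer(strategy="mean"),
    FunctionTransformer(func=np.log1p),
)

X_out = pipe.fit_transform(X_demo)
print(X_out.ravel())
```

Putting both steps in one pipeline also means the imputation mean is learned on the training data only when the pipeline is fitted inside a train/test split or cross-validation.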

### Example

**Titanic dataset:**  
We take the columns `Age`, `Fare`, and `Survived` from the Titanic dataset and compare the results before and after transformation.

```python
import pandas as pd
import numpy as np

import scipy.stats as stats

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

from sklearn.preprocessing import FunctionTransformer
from sklearn.compose import ColumnTransformer

# importing all necessary libraries
```
```python
# Load the Titanic dataset, keeping only three columns
df = pd.read_csv('train.csv', usecols=['Age', 'Fare', 'Survived'])

# Count the null values in each column
df.isnull().sum()

# Output:
# Survived      0
# Age         177
# Fare          0
# dtype: int64
```
```python
# Replace null values in Age with the column mean
# (assigning back avoids the deprecated chained inplace fillna)
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Split the data into input features and target variable
X = df.iloc[:,1:3]
y = df.iloc[:,0]

# Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
```
```python
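# PDF and QQ plot before transformation, to check normality

# Age column: PDF and QQ plot
plt.figure(figsize=(14,4))
plt.subplot(121)
# histplot with a KDE curve replaces distplot, which is deprecated in recent seaborn
sns.histplot(X_train['Age'], kde=True, stat='density')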
plt.title('Age PDF')

plt.subplot(122)
stats.probplot(X_train['Age'], dist="norm", plot=plt)
plt.title('Age QQ Plot')

plt.show()
```
<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*zvNHsUkW_CM1q0YK_DVwZA.png" width="300px"><br>
  <em>PDF & QQ Plot before Transformation: Titanic Age Column</em>
</p>

```python
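# Fare column: PDF and QQ plot before transformation

plt.figure(figsize=(14,4))
plt.subplot(121)
# histplot with a KDE curve replaces distplot, which is deprecated in recent seaborn
sns.histplot(X_train['Fare'], kde=True, stat='density')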
plt.title('Fare PDF')

plt.subplot(122)
stats.probplot(X_train['Fare'], dist="norm", plot=plt)
plt.title('Fare QQ Plot')

plt.show()
```
<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*bVEqbN_LMtdf-yeiUXR01g.png" width="300px"><br>
  <em>PDF & QQ Plot before Transformation: Titanic Fare Column</em>
</p>

```python
# applying linear and tree based algorithm
clf = LogisticRegression()
clf2 = DecisionTreeClassifier()

clf.fit(X_train,y_train)
clf2.fit(X_train,y_train)
    
y_pred = clf.predict(X_test)
y_pred1 = clf2.predict(X_test)
    
print("Accuracy LR",accuracy_score(y_test,y_pred))
print("Accuracy DT",accuracy_score(y_test,y_pred1))

# Output:
# Accuracy LR 0.6480446927374302
# Accuracy DT 0.6480446927374302
```
```python
# Apply the log transformation to the 'Age' and 'Fare' columns
# to see whether they move towards a normal distribution
trf = FunctionTransformer(func=np.log1p)

X_train_transformed = trf.fit_transform(X_train)
X_test_transformed = trf.transform(X_test)

clf = LogisticRegression()
clf2 = DecisionTreeClassifier()

clf.fit(X_train_transformed,y_train)
clf2.fit(X_train_transformed,y_train)
    
y_pred = clf.predict(X_test_transformed)
y_pred1 = clf2.predict(X_test_transformed)
    
print("Accuracy LR",accuracy_score(y_test,y_pred))
print("Accuracy DT",accuracy_score(y_test,y_pred1))

# Output:
# LR 0.678027465667915
# DT 0.6610736579275905


# 10-fold cross-validation for a more reliable estimate

X_transformed = trf.fit_transform(X)

clf = LogisticRegression()
clf2 = DecisionTreeClassifier()

print("LR",np.mean(cross_val_score(clf,X_transformed,y,scoring='accuracy',cv=10)))
print("DT",np.mean(cross_val_score(clf2,X_transformed,y,scoring='accuracy',cv=10)))

# Output:
# LR 0.678027465667915
# DT 0.6565917602996254
```
```python
plt.figure(figsize=(14,4))

plt.subplot(121)
stats.probplot(X_train['Fare'], dist="norm", plot=plt)
plt.title('Fare Before Log')

plt.subplot(122)
stats.probplot(X_train_transformed['Fare'], dist="norm", plot=plt)
plt.title('Fare After Log')

plt.show()
```
<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*7I_ozvl0vOYeV9LFtCGjsw.png" width="300px"><br>
  <em>Fare data point distribution Before and after transformation</em>
</p>

```python
plt.figure(figsize=(14,4))

plt.subplot(121)
stats.probplot(X_train['Age'], dist="norm", plot=plt)
plt.title('Age Before Log')

plt.subplot(122)
stats.probplot(X_train_transformed['Age'], dist="norm", plot=plt)
plt.title('Age After Log')

plt.show()
```
<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*Cr7gBM5HjCWmASK_gShX3g.png" width="300px"><br>
  <em>Age data point distribution before and after transformation</em>
</p>


### Observation

- Before transformation, the age column already looks approximately normal. In the QQ plot, most of the data points lie close to the line, with only a few deviations.
- The fare column, however, is clearly right-skewed, which indicates the need for a log transformation. Before the transformation, we applied two different algorithms to assess accuracy: a linear algorithm (Logistic Regression) and a tree-based algorithm (Decision Tree). The Logistic Regression model achieved 64% accuracy, while the Decision Tree model achieved 68% accuracy.
- Next, we applied a log transformation to both columns using `np.log1p` instead of `np.log`, because `np.log1p` computes log(1 + x) and therefore handles zero values in the dataset. The transformation had a larger effect on the linear model than on the tree-based model, because decision trees split the data on thresholds and such splits are largely unaffected by a monotonic rescaling of a feature.
- To check the result, we ran 10-fold cross-validation, which gave about 67% accuracy for the logistic regression model. This suggests that the gain for the linear model is not an artifact of a single train/test split.
- In the plots, the fare column, which was far from a normal distribution, moved closer to normality after the transformation. The age column, which was already close to normal, moved further from normality after the transformation. This suggests that a transformation is unnecessary, and can even hurt, when the distribution is already close to normal.
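Following the last observation, a natural next step is to transform only the skewed column and leave the other one untouched. The sketch below does this with `ColumnTransformer` on a synthetic stand-in for the `Age` and `Fare` columns. The generated values are assumptions for illustration and are not the Titanic data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)
# Synthetic stand-in: a roughly normal Age and a right-skewed Fare
X_demo = pd.DataFrame({
    "Age": rng.normal(30, 10, size=200).clip(1, 80),
    "Fare": rng.lognormal(3, 1, size=200),
})

# Apply log1p to Fare only; Age is passed through unchanged
log_fare_only = ColumnTransformer(
    transformers=[("log_fare", FunctionTransformer(func=np.log1p), ["Fare"])],
    remainder="passthrough",
)

X_out = log_fare_only.fit_transform(X_demo)
# Column order in the output: transformed columns first, then the remainder
print(X_out[:3])
```

Keep in mind that the output is a NumPy array whose first column is the transformed `Fare`, followed by the untouched `Age`.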


# Power Transformation

## 1. Box-Cox Transformation

<p align="center">
  <img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*_aCKBEOaJ54i-L6-bG36vA.png" width="300px"><br>
  <em>PDF & QQ Plot before Transformation: Titanic Age Column</em>
</p>

- The transform is named after the statisticians George Box and David Cox.
- It is a "General Transform" that can be applied to any dataset with strictly positive values.
- Box-Cox can bring many kinds of skewed distributions close to a "NORMAL DISTRIBUTION".
- It is a general family of transformations. The "Log Transform" and "Square Root Transform" are special cases of it.

$$
y(\lambda) = 
\begin{cases} 
\frac{y^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0 \\
\ln(y), & \text{if } \lambda = 0 
\end{cases}
$$

---

### Parameters and Constraints

* **$y$**: The original data value.  
  **Note:** The Box-Cox transformation requires all $y$ values to be strictly positive ($y > 0$).
* **$\lambda$**: The transformation parameter. Common values include:
  * **$\lambda = 1$**: No transformation (identity).
  * **$\lambda = 0$**: Logarithmic transformation ($\ln(y)$).
  * **$\lambda = 0.5$**: Square root transformation ($\sqrt{y}$, up to a linear rescaling).
  * **$\lambda = -1$**: Reciprocal transformation ($1/y$, up to a linear rescaling).
* **$\ln$**: The natural logarithm.

Here, $y$ is the input variable to which we apply the transformation $(y^\lambda - 1) / \lambda$, and the result depends on the value of lambda.

Lambda ($\lambda$) is the power to which $y$ is raised. We want to find the optimal power for $y$, which could be any value: $y$ squared, $y$ cubed, or even $y$ raised to the power of 1.5.
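The piecewise definition above can be written directly in NumPy. The helper `box_cox` below is a hypothetical name introduced for this sketch. It is compared against `scipy.stats.boxcox` called with a fixed `lmbda`, which applies the same formula.

```python
import numpy as np
from scipy import stats

def box_cox(y, lam):
    """Box-Cox transform for a fixed lambda (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(y)              # the lambda = 0 branch
    return (y ** lam - 1.0) / lam     # the lambda != 0 branch

y = np.array([0.5, 1.0, 2.0, 4.0, 8.0])

for lam in (1.0, 0.5, 0.0, -1.0):
    ours = box_cox(y, lam)
    reference = stats.boxcox(y, lmbda=lam)  # SciPy with a fixed lambda
    print(lam, np.allclose(ours, reference))
```

With $\lambda = 1$ the result is simply $y - 1$, which shifts the data without changing its shape.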

The exponent $\lambda$ is typically searched over a range such as -5 to 5. During the search we evaluate candidate values of $\lambda$ across this range and check which one works best.

Finally, we choose the optimal value for the variable, meaning the $\lambda$ whose transformed data is the best approximation to a normal distribution.

There are two techniques to find the optimal $\lambda$:

A- MAXIMUM LIKELIHOOD (the same estimation principle used to fit Logistic Regression)

B- BAYESIAN STATISTICS (it comes under "Inferential statistics")
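The maximum likelihood route can be sketched as a search over candidate values of $\lambda$: score each one with the Box-Cox log-likelihood and keep the best. The grid of 201 points and the synthetic log-normal sample are assumptions for illustration. `scipy.stats.boxcox` without a fixed `lmbda` performs its own likelihood maximisation and is used here as a cross-check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic strictly positive, right-skewed sample (illustration only)
x = rng.lognormal(mean=0.0, sigma=0.5, size=2000)

# Score every candidate lambda between -5 and 5 by its log-likelihood
lambdas = np.linspace(-5, 5, 201)
log_likelihood = np.array([stats.boxcox_llf(lam, x) for lam in lambdas])
best_grid = lambdas[np.argmax(log_likelihood)]

# SciPy's own maximum likelihood estimate, for comparison
_, best_mle = stats.boxcox(x)

print(f"best lambda on the grid: {best_grid:.2f}")
print(f"lambda from stats.boxcox: {best_mle:.2f}")
```

Since the sample is log-normal, the estimated $\lambda$ should land near 0, which corresponds to the log transform.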

The Box-Cox transform is strictly applicable only to numbers GREATER THAN ZERO. Zero and negative values will not work.
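This restriction shows up directly in scikit-learn. In the sketch below, a made-up column containing a zero is rejected by `PowerTransformer(method="box-cox")` with a `ValueError`, while the Yeo-Johnson method accepts the same data.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up column that contains a zero
X_demo = np.array([[0.0], [1.0], [2.0], [5.0], [9.0], [20.0]])

try:
    PowerTransformer(method="box-cox").fit(X_demo)
    box_cox_ok = True
except ValueError as err:
    box_cox_ok = False
    print("Box-Cox rejected the data:", err)

# Yeo-Johnson has no positivity requirement
X_yj = PowerTransformer(method="yeo-johnson").fit_transform(X_demo)
print(X_yj.ravel())
```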


## 2. Yeo-Johnson Transformation

$$
y(\lambda) = 
\begin{cases} 
\frac{(y + 1)^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0, y \ge 0 \\
\ln(y + 1), & \text{if } \lambda = 0, y \ge 0 \\
\frac{-[(-y + 1)^{2-\lambda} - 1]}{2 - \lambda}, & \text{if } \lambda \neq 2, y < 0 \\
-\ln(-y + 1), & \text{if } \lambda = 2, y < 0
\end{cases}
$$

- It addresses the main restriction of the Box-Cox transform, which cannot be applied to zero or negative numbers.
- It was proposed by the statisticians In-Kwon Yeo and Richard Johnson.
- It is a variation of the Box-Cox transform, adjusted so that it also works on "Negative" and "Zero" values.
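The four cases of the formula above map onto a short NumPy function. The helper `yeo_johnson` is a hypothetical name introduced for this sketch. It is compared against `scipy.stats.yeojohnson` with a fixed `lmbda` on made-up values that include negatives and a zero.

```python
import numpy as np
from scipy import stats

def yeo_johnson(y, lam):
    """Yeo-Johnson transform for a fixed lambda (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0

    # Non-negative values: Box-Cox applied to (y + 1)
    if lam != 0:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])

    # Negative values: mirrored version with exponent (2 - lambda)
    if lam != 2:
        out[~pos] = -((-y[~pos] + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

y = np.array([-3.0, -0.5, 0.0, 0.5, 4.0])

for lam in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(lam, np.allclose(yeo_johnson(y, lam), stats.yeojohnson(y, lmbda=lam)))
```

With $\lambda = 1$ both branches reduce to the identity, so the data is returned unchanged.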

**Note:**  
We use the `PowerTransformer` class of scikit-learn, which contains both implementations. When a feature's distribution is not normal and we are using an algorithm that works well on normally distributed data, such as Linear Regression, Logistic Regression, or KNN, it is worth trying both transformations. In the end we only need to fit the transformer, which learns $\lambda$ from the data, and compare the results to decide whether to apply Box-Cox or Yeo-Johnson.
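A comparison along these lines might look like the sketch below, which puts each method in a pipeline with Logistic Regression and scores it with 5-fold cross-validation. The synthetic features and labels are assumptions for illustration. On real data the same loop would run on your own `X` and `y`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
n = 400
# Synthetic, strictly positive, right-skewed features (illustration only)
X_demo = rng.lognormal(mean=0.0, sigma=1.0, size=(n, 2))
# Labels depend on the log of the first feature, plus some noise
y_demo = (np.log(X_demo[:, 0]) + rng.normal(0, 0.5, size=n) > 0).astype(int)

scores = {}
for method in ("box-cox", "yeo-johnson"):
    # The transformer is refitted inside each fold, so lambda is learned
    # from the training part only
    model = make_pipeline(PowerTransformer(method=method), LogisticRegression())
    scores[method] = cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="accuracy"
    ).mean()

for method, score in scores.items():
    print(f"{method}: mean accuracy {score:.3f}")
```

After fitting a `PowerTransformer` on its own, the learned exponents are available in its `lambdas_` attribute, which helps when inspecting what each method chose.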


---

## Credits

**Prepared by:**  
**Chetan Sharma**  
AIML / Data Science Notes  

🔗 **GitHub:** [github.com/Chetan559](https://github.com/Chetan559)  
🌐 **Portfolio:** [chetan559.github.io](https://chetan559.github.io)  
💼 **LinkedIn:** [linkedin.com/in/sharma-chetan-k](https://www.linkedin.com/in/sharma-chetan-k/)  

These notes were compiled for learning, revision, and academic understanding. 
