# 📌 1️⃣ Mathematical Transformations
#### Mathematical transformations are applied to numerical features to:

#### Reduce skewness
#### Make data more normally distributed
#### Reduce effect of outliers
#### Improve model performance (especially for Linear Regression)

# 🔹 A) Function Transformations
#### These are simple mathematical functions applied directly to a feature.

## 1️⃣ Log Transformation

### Formula:
```sh
X′ = log(X)
```

### ✅ Used When:
<ul>
    <li>Data is right-skewed</li>
    <li> Large positive values dominate</li>
</ul>
 
### Example:
```sh
Income: 10k, 15k, 20k, 500k
```
#### After log, large values shrink much more than small ones, so the 500k outlier moves far closer to the rest.
#### ⚠️ Important:

<ul>
    <li>Only works for positive values</li>
    <li>Cannot apply log to 0 or negative numbers</li>
</ul>
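A minimal NumPy sketch of the income example above. Base 10 is an illustrative choice for readability (any base only rescales the result by a constant factor); the variable names are hypothetical.

```python
import numpy as np

# Income values from the example above: 10k, 15k, 20k, 500k
income = np.array([10_000, 15_000, 20_000, 500_000], dtype=float)

# Log transform (base 10 for readability); every value must be > 0
log_income = np.log10(income)
print(log_income)  # approx. [4.0, 4.176, 4.301, 5.699]

# Before: the largest value is 50x the smallest.
# After: the whole range spans only about 1.7 units.
print(income.max() / income.min())
print(log_income.max() - log_income.min())
```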


## 2️⃣ Reciprocal Transformation

### Formula:
```sh
X′ = 1 / X
```

### ✅ Used When:

<ul>
    <li>Extremely large values need compression</li>
    <li>Strong right skew</li>
</ul>

### ⚠️ Risk:
<ul>
    <li>Very sensitive to values close to 0 (tiny inputs produce huge outputs)</li>
    <li>Cannot divide by 0</li>
</ul>
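A small sketch of the reciprocal transform on hypothetical values. It shows the compression of the large value and one side effect worth knowing: for positive data, the order of values is reversed.

```python
import numpy as np

# Hypothetical positive values with one extreme value (100)
x = np.array([1.0, 2.0, 4.0, 100.0])

# Reciprocal transform: 100 is squeezed down to 0.01
recip = 1.0 / x
print(recip)  # values: 1, 0.5, 0.25, 0.01

# Side effect: the largest x becomes the smallest X' (order reversed)
print(np.all(np.diff(recip) < 0))

# Sensitivity near 0: halving a small input doubles the output
# (1 / 0.02 = 50, but 1 / 0.01 = 100)
```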

## 3️⃣ Square Transformation

### Formula:
```sh
X′ = X^2
```
### ✅ Used When:
<ul>
    <li>Data is left-skewed</li>
    <li>You want to amplify larger values</li>
</ul>
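A sketch on a tiny, hypothetical left-skewed sample (one low value, the rest bunched near the top), using `scipy.stats.skew` to measure skewness before and after squaring.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical left-skewed sample
x = np.array([2.0, 8.0, 9.0, 10.0])

# Square transformation stretches the high end
x_sq = x ** 2

# Skewness moves toward 0 (less negative) after squaring
print(round(skew(x), 3), round(skew(x_sq), 3))  # approx. -0.979 -0.738
```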


## 4️⃣ Square Root Transformation

### Formula:
```sh
X′ = sqrt(X)
```

### ✅ Used When:
<ul>
    <li>Mild right skew</li>
    <li>Counts or frequency data</li>
</ul>
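A sketch on hypothetical count data (e.g. visits per user). Counts are never negative, which is what makes the square root safe here; it is undefined for negative values.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical right-skewed counts
counts = np.array([1, 1, 2, 4, 16], dtype=float)

# Square root transform (requires counts >= 0)
sqrt_counts = np.sqrt(counts)
print(sqrt_counts)  # values: 1, 1, ~1.414, 2, 4

# Right skew is reduced, but more gently than a log would
print(round(skew(counts), 2), round(skew(sqrt_counts), 2))  # approx. 1.37 1.14
```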



# 🎯 Why Function Transformations?

## Linear models (especially Linear Regression) assume:
<ul>
    <li>Normally distributed errors (residuals)</li>
    <li>A linear relationship between features and target</li>
    <li>Homoscedasticity (constant variance of the errors)</li>
</ul>

### Transformations can help the data meet these assumptions more closely.

# 🔹 B) Power Transformations

#### More advanced than simple function transformations.
#### They automatically estimate the best transformation parameter (λ) from the data.

--- 

## 1️⃣ Box-Cox Transformation

### Formula (general form):
```sh
X′ = (X^λ - 1) / λ    if λ ≠ 0
X′ = log(X)           if λ = 0
```
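To make the formula concrete, here is a hand-written version (the helper name `box_cox` is hypothetical) checked against SciPy's `scipy.stats.boxcox`, which applies the same formula for a fixed λ and estimates λ by maximum likelihood when `lmbda` is omitted.

```python
import numpy as np
from scipy import stats

def box_cox(x, lam):
    """Hypothetical helper: Box-Cox for a fixed lambda (x must be > 0)."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)          # limiting case as lambda -> 0
    return (x ** lam - 1) / lam   # general form

x = np.array([1.0, 2.0, 5.0, 10.0, 50.0])

# The hand-written formula agrees with SciPy for a fixed lambda
print(np.allclose(box_cox(x, 0.5), stats.boxcox(x, lmbda=0.5)))
print(np.allclose(box_cox(x, 0.0), stats.boxcox(x, lmbda=0)))

# With lmbda omitted, SciPy also returns its estimated lambda
x_bc, best_lambda = stats.boxcox(x)
print(best_lambda)
```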

### ✅ Used When:

<ul>
    <li>Data is strictly positive</li>
    <li>Need to reduce skewness</li>
    <li>You want the transformation parameter chosen automatically</li>
</ul>

### ⚠️ Limitation:
#### Works only for strictly positive data (X > 0)

### In sklearn:
```python
from sklearn.preprocessing import PowerTransformer
# Box-Cox: lambda is estimated per feature; input must be strictly positive
pt = PowerTransformer(method='box-cox')
```
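A fuller sketch of the sklearn workflow on a synthetic, hypothetical right-skewed feature (log-normal values, similar in shape to incomes). Note that `PowerTransformer` also standardizes the output (zero mean, unit variance) by default, and that Box-Cox raises a `ValueError` when it sees zero or negative values.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic, strictly positive, right-skewed feature (one column)
rng = np.random.default_rng(0)
X = rng.lognormal(mean=10, sigma=1, size=(500, 1))

pt = PowerTransformer(method='box-cox')  # standardize=True by default
X_bc = pt.fit_transform(X)

print(pt.lambdas_)                       # estimated lambda for each column
print(skew(X[:, 0]), skew(X_bc[:, 0]))   # skewness before vs after

# Box-Cox rejects data containing 0 (or negative values)
try:
    PowerTransformer(method='box-cox').fit(np.array([[0.0], [1.0], [2.0]]))
except ValueError as err:
    print("ValueError:", err)
```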

--- 

## 2️⃣ Yeo-Johnson Transformation
#### Similar to Box-Cox, but it does not require positive data.

### ✅ Works With:
<ul>
    <li>Positive values</li>
    <li>Zero values</li>
    <li>Negative values</li>
</ul>

#### This makes it more flexible; it is also the default `method` of sklearn's `PowerTransformer`.

### In sklearn:
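```python
from sklearn.preprocessing import PowerTransformer
# Yeo-Johnson: accepts positive, zero and negative values
pt = PowerTransformer(method='yeo-johnson')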
```
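A sketch with a hypothetical feature that mixes negative, zero and positive values (e.g. profit/loss), where Box-Cox could not be used. Yeo-Johnson is monotonic, so the original ordering of the values is kept.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical feature with negative, zero and positive values
X = np.array([[-50.0], [-5.0], [0.0], [3.0], [10.0], [40.0], [400.0]])

pt = PowerTransformer(method='yeo-johnson')
X_yj = pt.fit_transform(X)   # also standardized by default

print(pt.lambdas_)           # estimated lambda
print(X_yj.ravel())          # transformed values, same order as the input
```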

---

# 📌 QQ Plot (Quantile-Quantile Plot)

#### A QQ plot is used to visually check whether data follows a normal distribution.

## 🧠 What It Does

#### It compares:
<ul>
    <li>Quantiles of your data</li>
    <li>Theoretical quantiles of a normal distribution</li>
</ul>
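The comparison above can be computed by hand, without plotting. This is a sketch on a synthetic, hypothetical sample; the `(i - 0.5) / n` plotting positions are one common convention and an assumption here (`scipy.stats.probplot` uses a slightly different formula).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=50, scale=10, size=500)   # hypothetical sample

# Quantiles of the data: simply the sorted values
sample_q = np.sort(data)

# Matching quantiles of a standard normal distribution
n = len(data)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical_q = stats.norm.ppf(probs)

# A QQ plot draws theoretical_q (x-axis) against sample_q (y-axis).
# For normal data the pairs are almost perfectly linear.
r = np.corrcoef(theoretical_q, sample_q)[0, 1]
print(round(r, 4))
```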

## 📈 Interpretation

#### If points lie roughly on a straight diagonal line →
#### Data is approximately normally distributed.

#### If points curve away from the line →
#### Data is skewed or non-normal.

## Right Skew:
#### Points curve upward, above the line, at the right end.

## Left Skew:
#### Points curve downward, below the line, at the left end.
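A sketch that checks both patterns numerically with `scipy.stats.probplot` (called without `plot`, it returns the plotted points and the fitted line). The exponential sample is hypothetical; negating it gives a mirror-image left-skewed sample, and `line_gap` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
right_skewed = rng.exponential(scale=1.0, size=500)   # hypothetical data
left_skewed = -right_skewed                           # mirror image

def line_gap(sample):
    """Hypothetical helper: gap between the lowest/highest point and the QQ line."""
    (osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
    fitted = slope * osm + intercept
    return osr[0] - fitted[0], osr[-1] - fitted[-1]

# Right skew: the highest point sits well above the line (positive gap)
print(line_gap(right_skewed))
# Left skew: the lowest point sits well below the line (negative gap)
print(line_gap(left_skewed))
```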

### Example:
```python
import matplotlib.pyplot as plt
import scipy.stats as stats

# Assumes df is an existing pandas DataFrame with a numeric 'age' column
stats.probplot(df['age'], dist="norm", plot=plt)
plt.show()
```

---

## 🎯 Why QQ Plot Is Important

#### Before applying:
<ul>
    <li>Linear Regression</li>
    <li>Parametric tests</li>
    <li>Statistical modeling</li>
</ul>

#### We check normality first (for Linear Regression, specifically the normality of the residuals).

#### After a transformation, we re-check the QQ plot to see whether the points now lie closer to the line.
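The before/after check can also be summarized by a single number: the `r` value that `scipy.stats.probplot` returns for its fitted line (closer to 1.0 means a straighter QQ plot). A sketch on a synthetic, hypothetical log-normal feature:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0, sigma=1, size=500)   # hypothetical right-skewed feature

# probplot returns ((osm, osr), (slope, intercept, r)); r measures straightness
_, (_, _, r_before) = stats.probplot(x, dist="norm")
_, (_, _, r_after) = stats.probplot(np.log(x), dist="norm")

print(f"r before log: {r_before:.3f}, after log: {r_after:.3f}")
```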