Today's topics:

1) Log transform
2) Reciprocal transform
3) Power (square / square root)

Next day:
1) Box-Cox
2) Yeo-Johnson

# Day 30 – Function Transformer (Variable Transformation)

## Is the end goal a normal distribution?

**Short answer:** ❌ No, not always  
**Better answer:** 🎯 The goal is a *more usable* distribution for the model, not a perfectly normal one.

---

## 🎯 The Real End Goal

> **To make features more suitable for a model's assumptions and learning behavior.**

Normal distribution is a **means**, not the **end**.

---

## 📈 Why People Care About Normal Distribution

Some ML models **assume, or benefit from,** features that are roughly normally distributed (or at least not heavily skewed):

- Linear Regression  
- Logistic Regression  
- Support Vector Machines (SVM)  
- Principal Component Analysis (PCA)  
- K-Nearest Neighbors (KNN)

For these models, reduced skewness helps with:
- Stable gradients  
- Faster convergence  
- Better interpretability  
- Improved performance

---

## 🧠 Role of FunctionTransformer

**FunctionTransformer does NOT aim to normalize data.**

Its purpose is:

> **Apply custom Python functions safely inside an sklearn pipeline.**

Typical use cases:
- Log / square root transformations  
- Feature creation  
- Outlier capping  
- Domain-specific logic  
- Text or date preprocessing  

Sometimes these transformations make data closer to normal; sometimes they don't.  
And that‚Äôs perfectly fine.
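To make "custom functions inside a pipeline" concrete, here is a minimal sketch. The capping rule `cap_values`, its fixed bounds (0 to 100), and the toy data are all hypothetical; `FunctionTransformer` and its `kw_args` parameter are real scikit-learn API. Because `FunctionTransformer` learns nothing during `fit`, the bounds are fixed up front rather than estimated from the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def cap_values(X, lower, upper):
    """Hypothetical domain rule (illustrative only): clip every value into [lower, upper]."""
    return np.clip(X, lower, upper)


pipe = Pipeline([
    # Stateless custom step: bounds are passed in via kw_args, not learned
    ("cap", FunctionTransformer(cap_values, kw_args={"lower": 0.0, "upper": 100.0})),
    # Plain NumPy function used directly as a transformation step
    ("log", FunctionTransformer(np.log1p)),
    ("model", LinearRegression()),
])

# Toy data: the last row is an extreme value
X = np.array([[5.0], [20.0], [50.0], [80.0], [5000.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pipe.fit(X, y)

# Inspect what the capping step produced
capped = pipe.named_steps["cap"].transform(X)
print(capped.ravel())
print(pipe.predict(X))
```

Keeping the custom logic inside the pipeline means the same capping and log steps are applied automatically at prediction time.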

---

## ⚡ When Normal Distribution *Actually* Matters

When a model genuinely benefits from near-normal features, **PowerTransformer** is the more suitable tool.

### PowerTransformer (Box-Cox / Yeo-Johnson):
- Explicitly attempts to **Gaussianize features**
- Automatically estimates the optimal power (λ) by maximum likelihood
- Reduces skewness and stabilizes variance

**Difference:**
- `FunctionTransformer` → *You define the logic*
- `PowerTransformer` → *The algorithm finds the best transformation*
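A minimal sketch of that difference in practice, assuming a simulated right-skewed feature (exponential data). `PowerTransformer`, its `method` parameter, and the fitted `lambdas_` attribute are scikit-learn API; the `skewness` helper is an illustrative function written here, not a library call.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def skewness(a):
    """Illustrative helper: biased sample skewness (mean of cubed z-scores)."""
    a = np.asarray(a, dtype=float).ravel()
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 3))


rng = np.random.default_rng(0)
# Simulated right-skewed, strictly positive feature, shape (n_samples, 1)
X = rng.exponential(scale=2.0, size=(1000, 1))

# Yeo-Johnson (the default) also accepts zero and negative values;
# method="box-cox" requires strictly positive input.
pt = PowerTransformer(method="yeo-johnson")
X_pt = pt.fit_transform(X)

print("estimated lambda:", pt.lambdas_)
print("skew before:", round(skewness(X), 2))
print("skew after: ", round(skewness(X_pt), 2))
```

You never pick λ yourself here; it is fitted per feature, which is exactly what separates this from a hand-written `FunctionTransformer`.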

---

## 🌳 Tree-Based Models & Distribution

Tree-based models **do NOT require** normal distributions:

- Decision Trees  
- Random Forest  
- XGBoost  
- LightGBM  

For these models:
- Feature scaling is often unnecessary  
- Skewness usually doesn't hurt  
- Splits are based on thresholds, not distribution shape  
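Because splits depend only on the *order* of values, a monotonic transformation such as `log1p` should not change which training rows end up in which leaf. The sketch below checks this on simulated data (the feature, target, and `max_depth=3` are arbitrary assumptions): a decision tree trained on raw values and one trained on log-transformed values give the same predictions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
# Simulated right-skewed feature and a noisy target that depends on it
X = rng.exponential(scale=100.0, size=(100, 1))
y = np.sqrt(X[:, 0]) + rng.normal(0, 1, size=100)

raw_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
log_tree = make_pipeline(
    FunctionTransformer(np.log1p),  # monotonic: preserves the order of values
    DecisionTreeRegressor(max_depth=3, random_state=0),
).fit(X, y)

# Same ordering -> same candidate partitions -> same leaf predictions
print(np.allclose(raw_tree.predict(X), log_tree.predict(X)))
```

The thresholds themselves differ (one is in raw units, the other in log units), but the partitions of the data, and therefore the predictions, match.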

---

## 🧠 Correct Mental Model

Think in this order:

1. Model assumptions  
2. Feature behavior (skew, outliers)  
3. Interpretability  
4. Generalization  

👉 Normality is **optional**  
👉 Feature usefulness is **mandatory**

---


sklearn provides three transformers for this: 1) `FunctionTransformer` 2) `PowerTransformer` 3) `QuantileTransformer`

Using `FunctionTransformer`: a) log transform, b) reciprocal, c) square / square root, d) custom function

### How do we know whether the data needs to be normal for the model?

#### Models that CARE about normality:
1) Linear Regression
2) Logistic Regression
3) SVM
4) PCA
5) KNN

#### Models that DON'T CARE
1) Decision Trees
2) Random Forest
3) XGBoost / LightGBM

If your model doesn't care → don't waste time forcing normality.

## Q–Q Plots

If a feature is normally distributed, the points of its Q–Q plot (sorted sample values against theoretical normal quantiles) lie close to the straight diagonal line.
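A minimal sketch of how the Q–Q points are built, using only NumPy and the standard library's `statistics.NormalDist`. The helpers `qq_points` and `qq_correlation` are illustrative names, the plotting positions `(i - 0.5) / n` are one common convention (assumed here), and the data is simulated. The correlation between the two axes summarizes how straight the Q–Q line is; plotting `theoretical` against `sample` with matplotlib gives the picture itself, and `scipy.stats.probplot` offers a ready-made version.

```python
import numpy as np
from statistics import NormalDist


def qq_points(x):
    """Illustrative helper: (theoretical normal quantiles, sorted standardized sample)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Plotting positions (i - 0.5) / n for i = 1..n, all strictly inside (0, 1)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    standardized = (x - x.mean()) / x.std()
    return theoretical, standardized


def qq_correlation(x):
    """Correlation of the Q-Q points: values near 1.0 mean they lie on a straight line."""
    theoretical, sample = qq_points(x)
    return float(np.corrcoef(theoretical, sample)[0, 1])


rng = np.random.default_rng(0)
normal_data = rng.normal(loc=50, scale=10, size=1000)
skewed_data = rng.exponential(scale=10, size=1000)

print("normal:", round(qq_correlation(normal_data), 3))
print("skewed:", round(qq_correlation(skewed_data), 3))
```

The skewed sample bends away from the diagonal in its upper tail, which shows up as a noticeably lower correlation.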

## Log Transform

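A log transform applies the logarithm to a feature to reduce right skew, compress large values, and stabilize variance.

1) Not defined for negative values (log(0) is also undefined, which is why `np.log1p`, i.e. log(1 + x), is often used when zeros are present)
2) Works on right-skewed data: it pulls the long right tail in toward the center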


Log transform is used when:

1) Data is right-skewed
2) A few large values dominate the feature
3) Variance increases with magnitude
4) The relationship with the target is non-linear

Common real-world examples:
1) Income
2) Salary
3) House prices
4) Population
5) Medical counts

```python 
from sklearn.preprocessing import FunctionTransformer
import numpy as np

log_transformer = FunctionTransformer(np.log1p)
```
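A short sketch of the effect, assuming a simulated log-normal "income"-like feature (the data and the `skewness` helper are illustrative, not from a library). `inverse_func=np.expm1` is an optional `FunctionTransformer` parameter that lets you map transformed values back to the original scale.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def skewness(a):
    """Illustrative helper: biased sample skewness (mean of cubed z-scores)."""
    a = np.asarray(a, dtype=float).ravel()
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 3))


rng = np.random.default_rng(42)
# Simulated right-skewed "income"-like feature, shape (n_samples, 1)
income = rng.lognormal(mean=10, sigma=1.0, size=(1000, 1))

# log1p computes log(1 + x), so zeros are allowed; expm1 undoes it
log_transformer = FunctionTransformer(np.log1p, inverse_func=np.expm1)
income_log = log_transformer.fit_transform(income)

print("skew before:", round(skewness(income), 2))
print("skew after: ", round(skewness(income_log), 2))

# inverse_transform recovers the original values
recovered = log_transformer.inverse_transform(income_log)
print(np.allclose(recovered, income))
```

Log-normal data is the textbook case where the log transform lands close to symmetric; real features will usually be less tidy.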

# Reciprocal Transformation (1/x)

Related transformations covered below: square (x²) and square root (x^(1/2) = √x).


**Use cases:**
- When large values dominate the feature
- When the relationship with the target is inverse
- To reduce the effect of extreme large values

⚠️ **Important:** 1/x is undefined at x = 0, so handle zero values carefully.
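One way to sidestep the zero problem is the shifted reciprocal 1/(x + 1), sketched below under the assumption that the feature is non-negative. The helper `safe_reciprocal` and the toy values are hypothetical; using a named function instead of a lambda also keeps the transformer picklable with the standard `pickle` module.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def safe_reciprocal(x):
    """Hypothetical helper: shifted reciprocal 1 / (x + 1), defined for all x >= 0."""
    return 1.0 / (np.asarray(x, dtype=float) + 1.0)


rec_transformer = FunctionTransformer(safe_reciprocal)

# Toy non-negative feature containing a zero and one very large value
X = np.array([[0.0], [1.0], [4.0], [9.0], [999.0]])
X_rec = rec_transformer.fit_transform(X)
print(X_rec.ravel())
```

The zero maps to 1.0 instead of raising a division error, the extreme value 999 shrinks to 0.001, and the order of the values is reversed (larger inputs give smaller outputs), which is what makes this transform fit inverse relationships.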

```python
from sklearn.preprocessing import FunctionTransformer

# Shifted reciprocal 1 / (x + 1): avoids division by zero when x >= 0
reciprocal_transformer = FunctionTransformer(
    func=lambda x: 1 / (x + 1)
)
```


# Square Transformation (x²)


---

### Purpose
- Emphasizes larger values
- Increases separation between higher magnitudes
- Lets linear models capture quadratic (non-linear) relationships

---

### Effect on Distribution
- Large values grow much faster than small values
- Can **increase right skewness**
- Variance increases significantly

---

### When to Use
- When larger values should have more influence
- When the feature–target relationship is non-linear
- For polynomial or interaction-based models
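A minimal sketch of the non-linear case, assuming simulated data where the target grows with the square of a non-negative feature. A plain `LinearRegression` on raw `x` is compared with the same model after a `FunctionTransformer(np.square)` step; `.score` returns R² on the training data here, so this only illustrates fit, not generalization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)
# Simulated data: target depends on x squared, plus noise
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] ** 2 + rng.normal(0, 5, size=200)

raw_model = LinearRegression().fit(X, y)
squared_model = make_pipeline(
    FunctionTransformer(np.square),  # x -> x ** 2 before the linear fit
    LinearRegression(),
).fit(X, y)

print("R^2 raw x:    ", round(raw_model.score(X, y), 3))
print("R^2 squared x:", round(squared_model.score(X, y), 3))
```

The squared feature turns a curved relationship into a straight one, so the linear model fits it almost perfectly.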

---

### When NOT to Use
- Already highly right-skewed data
- Presence of extreme outliers
- When the model is sensitive to scale

---

### Python Example
```python
from sklearn.preprocessing import FunctionTransformer

# Squares every value element-wise: x -> x ** 2
square_transformer = FunctionTransformer(
    func=lambda x: x ** 2
)
```


# Square Root Transformation (√x)

Note: the square root is `x^(1/2)`.  
`x^(-1/2)` represents **1 / √x**, which is a *different* transformation.

---

## Purpose

- Reduce right skewness  
- Compress large values  
- Stabilize variance  
- Can improve performance of linear and distance-based models  

---

## Effect on Distribution

- Large values are compressed more than small ones  
- Distribution becomes more symmetric  
- Less aggressive than log transformation  
- Helps reduce the impact of outliers  

📌 Does **not** guarantee a normal distribution.

---

## When to Use

- Moderately right-skewed data  
- Count-based features (e.g. frequency, visits, events)  
- When log transformation is too strong  
- When values are non-negative  
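A sketch comparing square root and log on a count feature, assuming simulated negative-binomial "visits" data (the data and the `skewness` helper are illustrative). On this kind of data the square root usually reduces skew less than `log1p` does, and the log can overshoot into left skew, which is why √x is the gentler option when log is too strong.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def skewness(a):
    """Illustrative helper: biased sample skewness (mean of cubed z-scores)."""
    a = np.asarray(a, dtype=float).ravel()
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 3))


rng = np.random.default_rng(7)
# Simulated count feature (e.g. visits per user): non-negative integers, right-skewed
visits = rng.negative_binomial(n=2, p=0.1, size=(5000, 1)).astype(float)

visits_sqrt = FunctionTransformer(np.sqrt).fit_transform(visits)
visits_log = FunctionTransformer(np.log1p).fit_transform(visits)

print("skew raw: ", round(skewness(visits), 2))
print("skew sqrt:", round(skewness(visits_sqrt), 2))
print("skew log: ", round(skewness(visits_log), 2))
```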

---

## When NOT to Use

- Negative values (unless data is shifted)  
- Left-skewed distributions  
- When strong compression is required  

---

## Python Implementation

### Using NumPy
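```python
import numpy as np

# Example non-negative feature (shape: n_samples x 1)
X = np.array([[0.0], [4.0], [25.0], [100.0]])
X_sqrt = np.sqrt(X)  # element-wise square root: [[0.], [2.], [5.], [10.]]
```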