# Function Transformer & Power Transformer - Complete Guide

## Table of Contents
1. [Overview](#overview)
2. [Function Transformer](#function-transformer)
3. [Power Transformer](#power-transformer)
4. [Comparison & When to Use](#comparison--when-to-use)
5. [Implementation Examples](#implementation-examples)
6. [Best Practices](#best-practices)
7. [Key Libraries](#key-libraries)
8. [Summary](#summary)

---

## Overview

### Why Transform Data?

Data transformation matters in machine learning because:
- **Some models work best with roughly normal features** (e.g., Linear Regression, Logistic Regression); linear regression's statistical inference also assumes normally distributed residuals
- **Skewed data can reduce model accuracy**, since a few extreme values dominate the fit
- **Symmetric, well-scaled features help gradient-based algorithms converge**
- **Compressing transformations (e.g., log) reduce the influence of outliers**

### Types of Transformers

1. **Function Transformers** - Apply custom mathematical functions
2. **Power Transformers** - Box-Cox & Yeo-Johnson transformations (a parametric family that includes log, square root, and reciprocal as special cases)
3. **Quantile Transformers** - Transform to uniform or normal distribution

---

## Function Transformer

### What is Function Transformer?

`FunctionTransformer` from sklearn allows you to apply **any custom mathematical function** to your data. It's a wrapper that makes custom transformations compatible with sklearn pipelines.

### Common Mathematical Transformations

#### 1. Log Transform
```python
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# Using log1p (log(1+x)) - safer with zeros
trf = FunctionTransformer(func=np.log1p)
```

**Use Case:**
- **Right-skewed data** (long tail on right)
- **Non-negative values** (`np.log1p` handles zeros; plain `np.log` needs x > 0)
- Compresses large values, expands small values

**Example:** Income, prices, population data
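A minimal sketch of the compression effect on a few hypothetical values. It also uses `FunctionTransformer`'s `inverse_func` parameter so transformed values can be mapped back to the original scale with `np.expm1`, the exact inverse of `np.log1p`.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical right-skewed values: each one is roughly 10x the previous
X = np.array([[0.0], [9.0], [99.0], [999.0]])

# inverse_func lets you map transformed values back to the original scale
trf = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
X_log = trf.fit_transform(X)
print(X_log.ravel())  # gaps of 9, 90 and 900 become equal steps of log(10), about 2.303

# Round trip back to the original units
X_back = trf.inverse_transform(X_log)
print(np.allclose(X_back, X))  # True
```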

#### 2. Square Root Transform
```python
trf = FunctionTransformer(func=np.sqrt)
```

**Use Case:**
- **Moderately right-skewed data**
- **Count data** (Poisson distributed)
- Less aggressive than log transform
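To make "less aggressive" concrete, the sketch below compares both transforms on a few hypothetical counts, using the ratio between the two largest values as a rough measure of how much of the original spread survives.

```python
import numpy as np

# Hypothetical count data with one large value (original ratio 100 / 4 = 25)
counts = np.array([0.0, 1.0, 4.0, 100.0])

sqrt_vals = np.sqrt(counts)   # [0, 1, 2, 10]
log_vals = np.log1p(counts)   # roughly [0, 0.69, 1.61, 4.62]

# Ratio of the largest to the second-largest value after each transform:
# sqrt keeps more of the original spread than log1p does
print(sqrt_vals[-1] / sqrt_vals[-2])  # 5.0
print(log_vals[-1] / log_vals[-2])
```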

#### 3. Square/Power Transform
```python
# Square transform: x -> x**2
trf = FunctionTransformer(func=np.square)
# Equivalent forms
trf = FunctionTransformer(func=lambda x: x ** 2)
trf = FunctionTransformer(func=lambda x: np.power(x, 2))
```

**Use Case:**
- **Left-skewed data** (long tail on left)
- Spreads out large values and compresses small ones (for non-negative data)
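A small check, on hypothetical data, that squaring reduces left skew. `scipy.stats.skew` is used as the skewness measure here (its default is the biased sample estimator, so values differ slightly from pandas' `.skew()`).

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer

# Hypothetical left-skewed sample: most values near 10, one small outlier
x = np.array([1.0, 7.0, 8.0, 9.0, 9.0, 10.0, 10.0, 10.0]).reshape(-1, 1)

trf = FunctionTransformer(func=np.square)
x_sq = trf.fit_transform(x)

print(skew(x.ravel()))     # about -1.76
print(skew(x_sq.ravel()))  # about -1.17: still left-skewed, but less so
```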

#### 4. Reciprocal Transform
```python
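trf = FunctionTransformer(func=lambda x: 1 / x)
# Equivalent negative-power form; use a float exponent, since NumPy
# rejects negative integer powers of integer arrays
trf = FunctionTransformer(func=lambda x: np.power(x, -1.0))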
```

**Use Case:**
- **Highly right-skewed data**
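- Inverts magnitudes (small → large, large → small) and reverses the order of values

**Warning:** Cannot use on zeros (division by zero); avoid it on data that mixes negative and positive values, because the output no longer preserves ordering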

#### 5. Custom Functions
```python
# Custom transformation
def custom_transform(x):
    return np.log(x + 1) * 2

trf = FunctionTransformer(func=custom_transform)
```
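If the constant in `custom_transform` should be adjustable, `FunctionTransformer`'s `kw_args` parameter can pass it in. `scaled_log` and `factor` below are hypothetical names used only for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def scaled_log(x, factor=1.0):
    """Hypothetical generalization of custom_transform: log(x + 1) * factor."""
    return np.log1p(x) * factor

# kw_args is forwarded to func on every call, so factor=2.0 reproduces custom_transform
trf = FunctionTransformer(func=scaled_log, kw_args={'factor': 2.0})
X = np.array([[0.0], [np.e - 1]])
X_out = trf.fit_transform(X)
print(X_out.ravel())  # approximately [0, 2]

# kw_args is a regular estimator parameter, so set_params can change it
# (which also makes it tunable with tools such as GridSearchCV)
trf.set_params(kw_args={'factor': 3.0})
X_out3 = trf.fit_transform(X)
print(X_out3.ravel())  # approximately [0, 3]
```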

### Checking Data Distribution

#### Method 1: Histogram with KDE (estimated PDF - Probability Density Function; `sns.distplot` is deprecated, so use `sns.histplot`)
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Visualize distribution
sns.histplot(data=df, x='Fare', kde=True)
plt.show()
```
- Bell curve = normal distribution
- Tail on right = right-skewed
- Tail on left = left-skewed

#### Method 2: Skewness Value
```python
import pandas as pd

skewness = df['Fare'].skew()
print(f"Skewness: {skewness}")
```

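**Interpretation** (common rules of thumb):
- `0` = perfectly symmetric (a normal distribution has zero skew, but zero skew alone doesn't prove normality)
- `-0.5 to 0.5` = approximately symmetric
- `0.5 to 1` in absolute value = moderately skewed
- `> 1` in absolute value = highly skewed
- Positive = right-skewed
- Negative = left-skewed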
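The rule-of-thumb thresholds can be wrapped in a small helper. `describe_skew` is a hypothetical name, the 0.5 and 1 cut-offs are conventions rather than a standard, and the samples are synthetic.

```python
import numpy as np
import pandas as pd

def describe_skew(series: pd.Series) -> str:
    """Hypothetical helper applying the rule-of-thumb skewness thresholds."""
    s = series.skew()
    size = abs(s)
    if size < 0.5:
        label = 'approximately symmetric'
    elif size <= 1:
        label = 'moderately skewed'
    else:
        label = 'highly skewed'
    if size >= 0.5:
        label += ' (right)' if s > 0 else ' (left)'
    return label

rng = np.random.default_rng(0)
normal_result = describe_skew(pd.Series(rng.normal(size=1_000)))
lognormal_result = describe_skew(pd.Series(rng.lognormal(size=1_000)))
print(normal_result)     # approximately symmetric
print(lognormal_result)  # highly skewed (right)
```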

#### Method 3: QQ Plot (Quantile-Quantile Plot)
```python
import scipy.stats as stats

stats.probplot(df['Fare'], dist="norm", plot=plt)
plt.show()
```

**Interpretation:**
- Points on diagonal line = normal distribution
- Points deviate from line = non-normal
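Besides drawing the plot, `stats.probplot` returns the fitted line, including the correlation coefficient `r` between the sample and theoretical quantiles; a value close to 1 means the points sit near a straight line. The sketch compares synthetic normal and lognormal samples.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=500)
skewed_sample = rng.lognormal(size=500)

# Without plot=..., probplot returns ((quantile pairs), (slope, intercept, r))
_, (_, _, r_normal) = stats.probplot(normal_sample, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print(f"normal r = {r_normal:.3f}, lognormal r = {r_skewed:.3f}")
```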

### Implementation Example

```python
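from sklearn.preprocessing import FunctionTransformer
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# Load data
df = pd.read_csv('titanic.csv')
x = df[['Age', 'Fare']]
y = df['Survived']

# Split data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Age has missing values in the Titanic data; fill them with the training mean
age_mean = x_train['Age'].mean()
x_train = x_train.fillna({'Age': age_mean})
x_test = x_test.fillna({'Age': age_mean})

# Apply log transform to Fare column; Age passes through unchanged.
# Output column order: transformed columns first (Fare), then the remainder (Age)
trf = ColumnTransformer([
    ('log', FunctionTransformer(np.log1p), ['Fare'])
], remainder='passthrough')

x_train_transformed = trf.fit_transform(x_train)
x_test_transformed = trf.transform(x_test)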
```

### Real-World Example: Titanic Dataset

**Problem:** Fare column is highly right-skewed

**Before Transformation:**
- Logistic Regression: 70% accuracy
- Decision Tree: 68% accuracy

**After Log Transformation:**
- Logistic Regression: 73% accuracy ✅
- Decision Tree: 73% accuracy ✅

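**Result:** about +3 percentage points for Logistic Regression on this single split. Because `log1p` is monotonic (it preserves the order of values), a decision tree can make the same splits before and after the transform, so a change in tree accuracy more likely reflects randomness in the split or model than the transformation itself. Confirm any gain with cross-validation.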

---

## Power Transformer

### What is Power Transformer?

`PowerTransformer` applies **Box-Cox** or **Yeo-Johnson** transformations to make data more Gaussian-like (normally distributed). Unlike `FunctionTransformer`, it **automatically estimates the optimal transformation parameter λ (lambda) for each feature** using maximum likelihood.

### Import
```python
from sklearn.preprocessing import PowerTransformer
```

### Output Range
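By default (`standardize=True`), `PowerTransformer` also rescales the output to zero mean and unit variance, so when the result is close to Gaussian most values fall roughly between **-3 and +3**. Set `standardize=False` to keep the unscaled transformed values.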
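The effect of `standardize` can be seen on a synthetic lognormal feature. The statistics of the unstandardized output depend on the data, so only the standardized ones are described in the comments.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(1_000, 1))  # hypothetical right-skewed feature

scaled = PowerTransformer().fit_transform(X)                # standardize=True (default)
raw = PowerTransformer(standardize=False).fit_transform(X)  # transformed, not rescaled

print(scaled.mean(), scaled.std())    # ~0 and ~1
print(raw.mean(), raw.std())          # on the scale of the transformed data
print(np.mean(np.abs(scaled) < 3))    # share of standardized values within ±3
```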

---

### 1. Box-Cox Transformation

#### Characteristics
- **Only for positive values (x > 0)**
- Automatically finds optimal λ (lambda) parameter
- Generalizes the log transform (λ = 0 is exactly log(x))
- Part of the power transform family

#### Formula
```
y = (x^λ - 1) / λ   when λ ≠ 0
y = log(x)          when λ = 0
```
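The formula can be checked numerically. The sketch writes it out directly (`box_cox` is a hypothetical helper) and compares it with `scipy.stats.boxcox` for a few fixed λ values. `PowerTransformer` additionally standardizes its output by default, so its results will not match these raw values.

```python
import numpy as np
from scipy import stats

def box_cox(x, lam):
    """Hypothetical helper: the Box-Cox formula written out (x must be > 0)."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1) / lam

x = np.array([0.5, 1.0, 2.0, 8.0])
for lam in (-1.0, 0.0, 0.5, 2.0):
    # scipy.stats.boxcox applies the same formula when lmbda is fixed
    assert np.allclose(box_cox(x, lam), stats.boxcox(x, lmbda=lam))

# Without lmbda, scipy estimates lambda by maximum likelihood
y, best_lam = stats.boxcox(x)
print(best_lam)
```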

#### Implementation
```python
from sklearn.preprocessing import PowerTransformer

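# Box-Cox transformation (every value in x_train must be > 0;
# if some are zero, one workaround is to shift them, e.g. x_train + 1e-6)
pt = PowerTransformer(method='box-cox')
x_transformed = pt.fit_transform(x_train)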

# View lambda parameters for each feature
print(pt.lambdas_)
```

#### Limitations
- **CANNOT handle zeros or negative values**
- Data must be strictly positive (x > 0)
- If data contains zeros or negatives → use Yeo-Johnson
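The sketch below shows the failure on hypothetical data containing a zero, plus two common workarounds. Shifting by `1e-6` is an arbitrary choice, and the constant influences the fitted λ.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[0.0], [1.0], [3.0], [10.0], [50.0]])  # hypothetical feature with a zero

box_cox_failed = False
try:
    PowerTransformer(method='box-cox').fit(X)
except ValueError as err:
    box_cox_failed = True
    print("Box-Cox failed:", err)

# Workaround 1: shift by a small constant so every value is strictly positive
pt_bc = PowerTransformer(method='box-cox').fit(X + 1e-6)

# Workaround 2: Yeo-Johnson accepts zeros directly
pt_yj = PowerTransformer(method='yeo-johnson').fit(X)
print(pt_bc.lambdas_, pt_yj.lambdas_)
```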

---

### 2. Yeo-Johnson Transformation

#### Characteristics
- **Works with ANY real numbers** (positive, negative, zero)
- More flexible than Box-Cox
- scikit-learn's default (`PowerTransformer(method='yeo-johnson')`)
- For non-negative values, equivalent to Box-Cox applied to x + 1
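Yeo-Johnson is piecewise: non-negative values get Box-Cox applied to x + 1, while negative values get Box-Cox with parameter 2 - λ applied to 1 - x, then negated. A direct implementation (`yeo_johnson` is a hypothetical helper) can be compared with `scipy.stats.yeojohnson` for fixed λ:

```python
import numpy as np
from scipy import stats

def yeo_johnson(x, lam):
    """Hypothetical helper: piecewise Yeo-Johnson transform for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative values: Box-Cox applied to (x + 1)
    if lam != 0:
        out[pos] = ((x[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log1p(x[pos])
    # Negative values: Box-Cox with parameter (2 - lam) applied to (1 - x), negated
    if lam != 2:
        out[~pos] = -((1 - x[~pos]) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for lam in (0.0, 0.5, 1.0, 2.0):
    assert np.allclose(yeo_johnson(x, lam), stats.yeojohnson(x, lmbda=lam))
```

With λ = 1 both branches reduce to the identity, which is why λ = 1 means "no change".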

#### Implementation
```python
# Yeo-Johnson transformation (handles all values)
pt = PowerTransformer(method='yeo-johnson')
x_transformed = pt.fit_transform(x_train)

# View lambda parameters
print(pt.lambdas_)
```

#### Advantages over Box-Cox
✅ Handles negative values  
✅ Handles zeros  
✅ No need to shift the data to make it positive  
✅ Recommended as default

---

### Understanding Lambda (λ) Values

The lambda parameter indicates **how much transformation** was needed:

```python
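import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Fit both methods on the same training data
# (Box-Cox needs strictly positive input, so zeros are shifted by a small constant)
pt_boxcox = PowerTransformer(method='box-cox').fit(x_train + 1e-6)
pt_yeojohnson = PowerTransformer(method='yeo-johnson').fit(x_train)

# Compare lambdas between Box-Cox and Yeo-Johnson
pd.DataFrame({
    'feature': x_train.columns,
    'box_cox_lambda': pt_boxcox.lambdas_,
    'yeo_johnson_lambda': pt_yeojohnson.lambdas_
})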
```

**Interpretation** (Box-Cox, up to a shift and scale; for Yeo-Johnson on non-negative data, read x as x + 1):

| Lambda (λ) | Transformation Applied |
|------------|------------------------|
| `λ = 2` | Square (x²) |
| `λ = 1` | No change in shape (x¹ = x) |
| `λ = 0.5` | Square root (√x) |
| `λ = 0` | Log transform |
| `λ = -1` | Reciprocal (1/x) |

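**Important Notes:**
- **How far λ is from 1 shows how strong the transformation is**: λ close to 1 means the feature was already close to the target shape
- λ < 1 compresses large values (corrects right skew); λ > 1 stretches them (corrects left skew)
- Different λ values are optimal for different distributions; a lower λ is not better or worse by itself
- Always validate with plots, not just λ values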
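A small synthetic experiment illustrates the direction of λ: a right-skewed feature and a mirrored, left-skewed feature are fitted together, and Yeo-Johnson should pick λ < 1 for the first and λ > 1 for the second.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
right = rng.lognormal(sigma=0.5, size=1_000)     # long right tail
left = 5 - rng.lognormal(sigma=0.3, size=1_000)  # mirrored: long left tail

pt = PowerTransformer(method='yeo-johnson')
pt.fit(np.column_stack([right, left]))
print(pt.lambdas_)  # first lambda < 1 (compresses large values), second > 1 (stretches them)
```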

---

### Real-World Example: Concrete Strength Dataset

**Dataset:** 1030 samples, 9 columns (8 input features + target)
- Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age
- **Target:** Strength (concrete compressive strength)

#### Lambda Values Comparison

```
Feature              | Box-Cox λ | Yeo-Johnson λ
---------------------|-----------|---------------
Cement               | 0.172     | 0.170
Blast Furnace Slag   | 0.025     | 0.017
Fly Ash              | -0.032    | -0.136
Water                | 0.810     | 0.808
Superplasticizer     | 0.100     | 0.264
Coarse Aggregate     | 1.129     | 1.129
Fine Aggregate       | 1.830     | 1.831
Age                  | 0.049     | 0.002
```

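**Observations:**
- Blast Furnace Slag, Fly Ash, and Age have λ near 0 (close to a log transform), so they needed strong compression of large values
- Water and Coarse Aggregate needed little transformation (λ near 1), while Fine Aggregate (λ ≈ 1.83) was stretched rather than compressed
- The two methods agree closely for most features; the largest gaps (Fly Ash, Superplasticizer) are on features that contain zeros, which Box-Cox can only handle after shifting the data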

---

### Visualization Function

```python
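import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

def plots(original, transformed):
    """Compare one feature's distribution before and after transformation.

    Both arguments are 1-D (a pandas Series, list, or NumPy array).
    """
    original = np.ravel(np.asarray(original, dtype=float))
    transformed = np.ravel(np.asarray(transformed, dtype=float))

    fig, axes = plt.subplots(1, 4, figsize=(20, 4))

    # PDF (Probability Density Function) estimates
    sns.histplot(original, kde=True, ax=axes[0])
    axes[0].set_title('Original Distribution')

    sns.histplot(transformed, kde=True, ax=axes[1])
    axes[1].set_title('Transformed Distribution')

    # QQ Plots
    stats.probplot(original, dist="norm", plot=axes[2])
    axes[2].set_title('Original QQ Plot')

    stats.probplot(transformed, dist="norm", plot=axes[3])
    axes[3].set_title('Transformed QQ Plot')

    plt.tight_layout()
    plt.show()

# Usage: first feature of x_train vs. the first column of a transformer's output
# (e.g. x_train_transformed = pt.fit_transform(x_train))
plots(x_train.iloc[:, 0], x_train_transformed[:, 0])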
```

---

## Comparison & When to Use

### Function Transformer vs Power Transformer

| Aspect | Function Transformer | Power Transformer |
|--------|---------------------|-------------------|
| **Transformation** | Manual (you choose function) | Automatic (finds optimal λ) |
| **Flexibility** | Highly flexible (any function) | Fixed methods (Box-Cox/Yeo-Johnson) |
| **Optimization** | No automatic optimization | Automatically finds best transformation |
| **Use Case** | When you know which function to use | When unsure about transformation |
| **Output Range** | Depends on function | Zero mean, unit variance by default (`standardize=True`), so mostly within about ±3 |
| **Complexity** | Simple | More complex (statistical) |

### Decision Tree: Which to Use?

```
Is your data skewed?
    │
    ├─ No → Maybe no transformation needed
    │
    └─ Yes → Is it right-skewed?
            │
            ├─ Yes → Does it contain zeros or negatives?
            │        │
            │        ├─ No (all > 0) → Try: Log Transform OR Box-Cox
            │        │
            │        ├─ Zeros, no negatives → Try: log1p Transform OR Yeo-Johnson
            │        │
            │        └─ Negatives → Try: Yeo-Johnson
            │
            └─ No (left-skewed) → Try: Square/Power Transform OR Yeo-Johnson
```
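The same logic as a hypothetical helper for a single pandas Series. The 0.5 skewness threshold is an assumption taken from the rule of thumb for moderate skew, and the zeros/negatives branch is split because `log1p` is only defined for values above -1.

```python
import numpy as np
import pandas as pd

def suggest_transform(series: pd.Series, threshold: float = 0.5) -> str:
    """Hypothetical helper that walks the decision tree for one feature."""
    s = series.dropna()
    skewness = s.skew()
    if abs(skewness) < threshold:
        return 'maybe no transformation needed'
    if skewness < 0:
        return 'square/power transform or Yeo-Johnson'
    if (s < 0).any():
        return 'Yeo-Johnson'
    if (s == 0).any():
        return 'log1p or Yeo-Johnson'
    return 'log or Box-Cox'

rng = np.random.default_rng(0)
fare_like = pd.Series(np.r_[0.0, rng.lognormal(size=1_000)])  # hypothetical: right-skewed with a zero
print(suggest_transform(fare_like))  # log1p or Yeo-Johnson
```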

### Quick Reference Guide

#### Use Function Transformer When:
✅ You know exactly which mathematical function to apply  
✅ Need custom/domain-specific transformations  
✅ Want full control over transformation  
✅ Building pipelines with custom logic  

#### Use Power Transformer When:
✅ Unsure which transformation is best  
✅ Want automatic optimization  
✅ Need standardized output  
✅ Want a separately fitted λ for each of many features  
✅ Want statistical rigor (Box-Cox/Yeo-Johnson are well-established)  

---

## Implementation Examples

### Example 1: Function Transformer in Pipeline

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import numpy as np

# Create pipeline with log transformation
pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),  # Titanic 'Age' has missing values
    ('log_transform', FunctionTransformer(np.log1p)),
    ('classifier', LogisticRegression())
])

# Fit and predict
pipeline.fit(x_train, y_train)
y_pred = pipeline.predict(x_test)
```

### Example 2: ColumnTransformer with Function Transformer

```python
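import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# Apply different transformations to different columns
trf = ColumnTransformer([
    ('log', FunctionTransformer(np.log1p), ['Fare']),
    ('sqrt', FunctionTransformer(np.sqrt), ['Age'])
], remainder='passthrough')

# Fit on training data only, then reuse the fitted transformer on test data
x_train_transformed = trf.fit_transform(x_train)
x_test_transformed = trf.transform(x_test)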
```

### Example 3: Power Transformer (Box-Cox)

```python
from sklearn.preprocessing import PowerTransformer

# Box-Cox (positive values only)
pt = PowerTransformer(method='box-cox')
x_transformed = pt.fit_transform(x_train)

# Check transformation parameters
print("Lambda values:", pt.lambdas_)
```

### Example 4: Power Transformer (Yeo-Johnson)

```python
from sklearn.preprocessing import PowerTransformer

# Yeo-Johnson (handles all values)
pt = PowerTransformer(method='yeo-johnson')
x_transformed = pt.fit_transform(x_train)

# In pipeline
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('power_transform', PowerTransformer(method='yeo-johnson')),
    ('regressor', LinearRegression())
])

pipeline.fit(x_train, y_train)
```

### Example 5: Comparing Transformations

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, PowerTransformer

# Each candidate is a full pipeline, so every transformer is refit inside
# each CV training fold (no leakage) and all candidates see the same columns
log_fare = ColumnTransformer([
    ('log', FunctionTransformer(np.log1p), ['Fare'])
], remainder='passthrough')

candidates = {
    'Original': make_pipeline(SimpleImputer(), LogisticRegression()),
    'Log Transform': make_pipeline(log_fare, SimpleImputer(), LogisticRegression()),
    'Power Transform': make_pipeline(SimpleImputer(), PowerTransformer(method='yeo-johnson'),
                                     LogisticRegression()),
}

for name, model in candidates.items():
    score = cross_val_score(model, x_train, y_train, cv=5, scoring='accuracy').mean()
    print(f"{name}: {score:.4f}")
```

---

## Best Practices

### 1. Always Visualize Before & After
```python
# Before transformation
sns.histplot(data=df, x='Fare', kde=True)
plt.title('Before Transformation')
plt.show()

# Apply transformation
trf = FunctionTransformer(np.log1p)
df['Fare_transformed'] = trf.fit_transform(df[['Fare']])

# After transformation
sns.histplot(data=df, x='Fare_transformed', kde=True)
plt.title('After Transformation')
plt.show()
```

### 2. Use Cross-Validation to Evaluate
```python
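from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.linear_model import LogisticRegression

# Use cross-validation instead of a single train-test split; the transformer sits
# inside the pipeline so it is refit on each training fold (x, y: raw features/target)
model = make_pipeline(PowerTransformer(), LogisticRegression())
scores = cross_val_score(model, x, y, cv=5)
print(f"Mean accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")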
```

### 3. Handle Zeros in Log Transform
```python
# ❌ Bad - np.log(0) returns -inf (with a RuntimeWarning), which breaks most models
np.log(x)

# ✅ Good - log1p maps 0 to 0 and is defined for all x > -1
np.log1p(x)  # equivalent to log(1 + x), but more accurate for small x
```
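A quick demonstration of both points, zero handling and small-value accuracy, using plain NumPy:

```python
import numpy as np

x = np.array([0.0, 1e-10, 1.0])

with np.errstate(divide='ignore'):  # silence the RuntimeWarning to show the value
    print(np.log(x[0]))             # -inf
print(np.log1p(x[0]))               # 0.0

# log1p is also more accurate than log(1 + x) for very small x
print(np.log1p(x[1]))               # about 1e-10
print(np.log(1 + x[1]))             # slightly off: 1 + 1e-10 loses digits in float64
```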

### 4. Document Your Transformations
```python
# Keep track of what transformations were applied
transformation_history = {
    'Fare': 'log1p',
    'Age': 'sqrt',
    'Income': 'yeo-johnson'
}
```

### 5. Apply Same Transformation to Test Set
```python
# ✅ Correct way
trf = PowerTransformer()
x_train_transformed = trf.fit_transform(x_train)
x_test_transformed = trf.transform(x_test)  # Only transform, don't fit again

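# ❌ Wrong way - refitting estimates new λ values from the test data, so test features
# no longer match what the model was trained on (and test statistics leak into preprocessing)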
x_test_transformed = trf.fit_transform(x_test)
```

---

## Key Libraries

```python
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

# Sklearn transformers
from sklearn.preprocessing import FunctionTransformer, PowerTransformer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Model evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, r2_score
```

---

## Summary

### Function Transformer
- Manual control over transformation
- Use specific mathematical functions (log, sqrt, square, reciprocal)
- Great when you know what transformation to apply
- Flexible for custom transformations

### Power Transformer
- **Box-Cox**: Positive values only, automatic optimization
- **Yeo-Johnson**: Works with all values (recommended default)
- Automatically finds optimal transformation parameter (λ)
- Standardized output by default (zero mean, unit variance; mostly within about ±3)

### When to Transform
- Data is significantly skewed (|skewness| > 1)
- Model performance is suboptimal
- Algorithm assumes normality (Linear/Logistic Regression)
- Presence of outliers affecting model

### When NOT to Transform
- Data is already normally distributed
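- Using tree-based models (they split on thresholds, so monotonic transformations such as log, sqrt, Box-Cox, and Yeo-Johnson usually don't change their results)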
- Transformation doesn't improve validation metrics
- Domain knowledge suggests transformation is inappropriate

**Remember:** Always validate improvements using cross-validation, not just visual inspection or lambda values!
