# Box–Cox Transform

## Box–Cox Transformation (Theoretical Explanation)

The **Box–Cox transformation** is a family of power transformations used to make data more **normally distributed** and to **stabilize variance**.  
It is widely used in statistical modeling and machine learning when features are **strictly positive**.

---

## Why Box–Cox Transformation?

Many statistical and ML models assume:
- Linear relationships
- Homoscedasticity (constant variance)
- Approximately normal residuals (and, for some methods, approximately normal features)

However, real-world data is often:
- Right-skewed
- Heavy-tailed
- Heteroscedastic

> **Box–Cox addresses these issues by automatically selecting a power parameter (λ) that best normalizes the data.**

---

## Mathematical Definition

For a strictly positive variable \( x > 0 \), the Box–Cox transformation is defined as:

\[
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log(x), & \lambda = 0
\end{cases}
\]

Where:
- \( \lambda \) is a real-valued parameter
- \( \lambda \) controls the strength and type of transformation
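
The piecewise definition above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: the helper name `box_cox` and the sample values are assumptions made for the example, and `scipy.stats.boxcox` with a fixed `lmbda` is used only as a cross-check.

```python
import numpy as np
from scipy import stats


def box_cox(x, lam):
    """Apply the Box-Cox transform element-wise for a fixed lambda (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(x)            # limiting case as lambda -> 0
    return (x ** lam - 1.0) / lam   # general power case


x = np.array([0.5, 1.0, 2.0, 10.0])     # illustrative values only
manual = box_cox(x, 0.3)
reference = stats.boxcox(x, lmbda=0.3)  # SciPy with lambda held fixed
```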

---

## Role of the λ (Lambda) Parameter

The value of \( \lambda \) determines the shape of the transformation:

| λ value | Equivalent transformation (up to shift and scale) |
|------|---------------------------|
| 1 | No change in shape (\( x - 1 \)) |
| 0 | Log transformation |
| 0.5 | Square root |
| -1 | Reciprocal |
| 2 | Square |

**In practice, the optimal λ is estimated from the data**, typically by maximum likelihood estimation.
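
The correspondences in the table hold up to a constant shift and scale, because the Box–Cox formula subtracts 1 and divides by λ. A small check, assuming SciPy's element-wise `scipy.special.boxcox` and a handful of made-up sample values, makes the exact forms visible:

```python
import numpy as np
from scipy.special import boxcox  # element-wise Box-Cox for a given lambda

x = np.array([0.25, 1.0, 4.0, 9.0])  # illustrative values only

identity_like = boxcox(x, 1.0)     # equals x - 1
log_like = boxcox(x, 0.0)          # equals log(x)
sqrt_like = boxcox(x, 0.5)         # equals 2 * (sqrt(x) - 1)
reciprocal_like = boxcox(x, -1.0)  # equals 1 - 1/x
square_like = boxcox(x, 2.0)       # equals (x**2 - 1) / 2
```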

---

## Key Assumptions

The Box–Cox transformation has a **strict requirement**:

> **All data values must be strictly positive (\( x > 0 \)).**

This means:
- Zero values are not allowed
- Negative values are not allowed
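
As a quick illustration of this requirement, the sketch below passes an array containing a zero to `scipy.stats.boxcox`, which rejects non-positive input with a `ValueError`. The sample array and the `rejected` flag are made up for the example.

```python
import numpy as np
from scipy import stats

data_with_zero = np.array([0.0, 1.5, 3.2, 8.0, 12.5])  # illustrative values only

try:
    stats.boxcox(data_with_zero)
    rejected = False
except ValueError:
    rejected = True  # SciPy refuses data that is not strictly positive
```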

---

## What Box–Cox Achieves

- Reduces skewness
- Stabilizes variance
- Makes data more symmetric
- Can improve the fit of linear and other parametric models

It does **not guarantee perfect normality**, but often brings the distribution much closer.
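
One way to see the effect is to compare sample skewness before and after the transform. The sketch below uses a synthetic log-normal sample (an assumption chosen because it is strongly right-skewed) and lets `scipy.stats.boxcox` estimate λ by maximum likelihood.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=2000)  # synthetic right-skewed sample

transformed, lam = stats.boxcox(raw)  # lambda estimated by maximum likelihood

skew_before = stats.skew(raw)
skew_after = stats.skew(transformed)
```

For log-normal data the estimated λ should land near 0, which corresponds to the log transformation.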

---

## When to Use Box–Cox

- Feature values are strictly positive
- Strong right skew is present
- Linear or parametric models are used

Examples:
- Prices
- Income
- Population counts
- Ticket fares (like Titanic Fare)

---

## When NOT to Use Box–Cox

- Zero values present
- Negative values present
- Tree-based models (often unnecessary)

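For data with zeros or negative values, the **Yeo–Johnson transformation** is the usual alternative. Tree-based models are largely insensitive to monotonic feature transformations, so they typically need neither.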

---

## Key Takeaway

> **Box–Cox is a data-driven power transformation that estimates an optimal λ to reduce skewness and stabilize variance, making strictly positive data more normally distributed.**

---


## Estimating λ in Box–Cox Transformation

There are two main techniques to estimate the power parameter λ:

### Maximum Likelihood Estimation (MLE)
- Chooses the λ that maximizes the log-likelihood of the transformed data under a normal model
- Fast and data-driven
- Most commonly used in practice (this is how scikit-learn's `PowerTransformer` estimates λ)
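
To make the maximum likelihood idea concrete, the sketch below evaluates the Box–Cox log-likelihood on a grid of candidate λ values with `scipy.stats.boxcox_llf` and compares the best grid point with SciPy's own MLE estimate. The gamma-distributed sample, the grid range, and the variable names are assumptions for illustration. Real implementations use a numerical optimizer rather than a grid.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.gamma(shape=2.0, scale=3.0, size=1000)  # synthetic strictly positive data

# Evaluate the Box-Cox log-likelihood on a grid of candidate lambdas.
grid = np.linspace(-2.0, 2.0, 401)
llf = np.array([stats.boxcox_llf(lam, x) for lam in grid])
lam_grid = grid[np.argmax(llf)]

# SciPy's own maximum likelihood estimate, for comparison.
_, lam_scipy = stats.boxcox(x)
```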

### Bayesian Estimation
- Treats λ as a random variable
- Combines prior belief with observed data
- Yields a posterior distribution over λ rather than a single point estimate, at a higher computational cost

### Key Point
> **MLE is preferred for most ML workflows; Bayesian methods are used when uncertainty or prior knowledge is important.**


# Yeo–Johnson Transform

The **Yeo–Johnson transformation** is a power transformation used to make data more normally distributed and to stabilize variance.

---

### Key Idea
Yeo–Johnson extends the Box–Cox transformation to support zero and negative values.

---

### Main Characteristics
- Automatically estimates the optimal power parameter (λ)
- Reduces skewness
- Improves distribution symmetry
- More flexible than Box–Cox

---

### Data Requirements
- Works with positive values
- Works with zero
- Works with negative values

---

### When to Use
- Data contains zero or negative values
- Strict positivity cannot be guaranteed
- Using linear or distance-based models

---

### Comparison with Box–Cox
- Box–Cox requires strictly positive data
- Yeo–Johnson works on all real values

---



## Yeo–Johnson Transformation (Short Notes + Formula)

The **Yeo–Johnson transformation** generalizes Box–Cox so that it works with **all real values**.

---

### Key Idea
Yeo–Johnson extends Box–Cox by allowing:
- Zero values
- Negative values

---

## Yeo–Johnson Transformation Formula

For a real-valued variable `x` and power parameter `λ`:

### Case 1: x ≥ 0

- If λ ≠ 0  
  f(x) = ((x + 1)^λ − 1) / λ

- If λ = 0  
  f(x) = log(x + 1)

---

### Case 2: x < 0

- If λ ≠ 2  
  f(x) = − [ ( (−x + 1)^(2 − λ) − 1 ) / (2 − λ) ]

- If λ = 2  
  f(x) = − log(−x + 1)
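
The four cases above translate almost line for line into NumPy. The sketch below is a minimal illustration: the helper name `yeo_johnson` and the sample values are assumptions, and `scipy.stats.yeojohnson` with a fixed `lmbda` serves only as a cross-check.

```python
import numpy as np
from scipy import stats


def yeo_johnson(x, lam):
    """Apply the Yeo-Johnson transform element-wise for a fixed lambda (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0

    # Case 1: x >= 0
    if lam != 0:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])

    # Case 2: x < 0
    if lam != 2:
        out[~pos] = -(((-x[~pos] + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam))
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # illustrative values only
manual = yeo_johnson(x, 0.7)
reference = stats.yeojohnson(x, lmbda=0.7)  # SciPy with lambda held fixed
```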

---

### What λ Controls
- λ determines the strength and shape of the transformation
- The optimal λ is estimated automatically (usually via Maximum Likelihood)
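
In scikit-learn, automatic estimation is exposed through `PowerTransformer`. The sketch below fits it on a synthetic skewed feature (a shifted exponential sample, chosen here only so that negative values are present) and reads the fitted λ from the `lambdas_` attribute. It is a usage illustration under those assumptions, not a recommended pipeline.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)
# Synthetic skewed feature that includes negative values.
feature = rng.exponential(scale=2.0, size=(500, 1)) - 1.0

pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(feature)

lam = pt.lambdas_[0]                           # fitted lambda for the single column
recovered = pt.inverse_transform(transformed)  # maps back to the original scale
```

With `standardize=True`, the output is also rescaled to zero mean and unit variance after the power transform.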

---

### Main Characteristics
- Reduces skewness
- Stabilizes variance
- Produces more symmetric distributions
- Safer default than Box–Cox when strict positivity cannot be guaranteed

---

### When to Use
- Data contains zero or negative values
- Distribution is skewed
- Using linear or distance-based models

---

### Comparison with Box–Cox
- Box–Cox: requires x > 0
- Yeo–Johnson: works for all real x
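
The two transforms are closely related on non-negative data: from the formulas, Yeo–Johnson with parameter λ applied to x ≥ 0 equals Box–Cox with the same λ applied to x + 1. A short numerical check, using made-up sample values and an arbitrary λ, illustrates this:

```python
import numpy as np
from scipy import stats

x_nonneg = np.array([0.0, 0.5, 2.0, 7.0])  # illustrative non-negative values
lam = 0.4                                  # arbitrary fixed lambda

yj = stats.yeojohnson(x_nonneg, lmbda=lam)
bc_shifted = stats.boxcox(x_nonneg + 1.0, lmbda=lam)  # Box-Cox on x + 1
```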

---

### Key Takeaway
Yeo–Johnson is a flexible, data-driven power transformation that automatically adapts to both positive and negative values.
