# 🛠️ Data Transformation: Reshaping for Better Modeling

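Many Machine Learning algorithms behave better when features are roughly symmetric and on a **similar scale**. The exact assumptions differ by model: Linear Regression assumes **normally distributed residuals** (not features), LDA assumes Gaussian features within each class, and Logistic Regression makes no normality assumption but is sensitive to scale when regularized or trained with gradient descent. Transformation is how we reshape "messy" real-world data so it fits these assumptions more closely.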

---

### 1. Power Transformations (Fixing Shape)
These are used to reduce skewness and make a distribution look more Gaussian (Normal).

#### A. Log Transformation ($y = \ln(x)$)
* **Best for:** Right-skewed data, Log-normal data, or data with multiplicative growth (e.g., income, house prices).
* **Effect:** Compresses the long tail and spreads out the smaller values.
* **Constraint:** Only works for strictly positive data ($x > 0$).
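
As a minimal sketch of the effect described above, the snippet below draws a hypothetical log-normal sample (the "income" framing, the seed, and the distribution parameters are illustrative assumptions, not values from these notes) and compares skewness before and after taking the natural log.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Hypothetical right-skewed sample: log-normal values, all strictly positive
x = rng.lognormal(mean=10.0, sigma=1.0, size=5000)

# Natural log transform; valid only because every value is > 0
x_log = np.log(x)

skew_before = skew(x)
skew_after = skew(x_log)

print(f"Skewness before: {skew_before:.2f}")
print(f"Skewness after:  {skew_after:.2f}")
```

If the data contains zeros, `np.log1p(x)` (which computes $\ln(1 + x)$) is a common workaround, although it is a slightly different transformation.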



#### B. Box-Cox Transformation
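* **Description:** A family of power transformations indexed by a parameter $\lambda$ (lambda). The value of $\lambda$ is estimated from the data (typically by maximum likelihood) so that the transformed values are as close to normal as possible.
* **Formula:** $y = \frac{x^\lambda - 1}{\lambda}$ if $\lambda \neq 0$, and $y = \ln(x)$ if $\lambda = 0$. The log transformation is therefore a special case.
* **Constraint:** Only works for strictly positive data ($x > 0$).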
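
One way to try this out is `scipy.stats.boxcox`, which returns both the transformed data and the fitted $\lambda$ when no `lmbda` is supplied. The sketch below uses a hypothetical log-normal sample (the seed and parameters are assumptions for illustration) and re-applies the formula by hand to confirm it matches.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical strictly positive, right-skewed sample
x = rng.lognormal(mean=0.0, sigma=0.75, size=2000)

# With no lmbda argument, boxcox estimates lambda by maximum likelihood
x_bc, lam = stats.boxcox(x)

# Re-apply the Box-Cox formula manually using the fitted lambda
if lam != 0:
    manual = (x**lam - 1) / lam
else:
    manual = np.log(x)

print(f"Fitted lambda: {lam:.3f}")
print("Matches manual formula:", np.allclose(x_bc, manual))
```

Because the sample is log-normal, the fitted $\lambda$ should land close to 0, which corresponds to a plain log transform.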

#### C. Yeo-Johnson Transformation
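* **Description:** A generalization of Box-Cox that applies one power formula to non-negative values and a mirrored one to negative values, again controlled by a fitted $\lambda$.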
* **Advantage:** Works for positive, zero, **and negative** values.
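
The difference from Box-Cox can be sketched with `scipy.stats.yeojohnson` on data that includes negative values. The shifted exponential sample below is a hypothetical example chosen only so that some values fall below zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical skewed sample shifted so that it contains negative values
x = rng.exponential(scale=2.0, size=2000) - 1.0

# Box-Cox rejects non-positive input with a ValueError
try:
    stats.boxcox(x)
    boxcox_failed = False
except ValueError:
    boxcox_failed = True

# Yeo-Johnson accepts the same data and fits its own lambda
x_yj, lam = stats.yeojohnson(x)

skew_before = stats.skew(x)
skew_after = stats.skew(x_yj)

print("Box-Cox raised an error:", boxcox_failed)
print(f"Yeo-Johnson lambda: {lam:.3f}")
print(f"Skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```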

---

### 2. Scaling Transformations (Fixing Magnitude)
These do not change the *shape* of the distribution, but they change the *range* of the values so that one feature doesn't "overpower" another due to its size.

| Method | Formula | Resulting Scale | Best For... |
| :--- | :--- | :--- | :--- |
| **Standardization** (Z-score) | $z = \frac{x - \mu}{\sigma}$ | Mean 0, std 1 (unbounded range) | Scale-sensitive algorithms (SVM, regularized Logistic Regression, PCA). |
| **Normalization** (Min-Max) | $x' = \frac{x - x_{min}}{x_{max} - x_{min}}$ | $[0, 1]$ | Image processing and Neural Networks; sensitive to outliers because it depends on the min and max. |
| **Robust Scaling** | $x' = \frac{x - Q_2}{Q_3 - Q_1}$ | Median 0, IQR 1 (unbounded range) | Data with **many outliers** (uses median/IQR instead of mean/std). |
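
To see how the three methods in the table treat an outlier, here is a small sketch using scikit-learn's `StandardScaler`, `MinMaxScaler`, and `RobustScaler` with their default settings. The six-value array is a made-up example with one extreme value.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical feature with a single large outlier (100.0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]).reshape(-1, 1)

z = StandardScaler().fit_transform(x)   # (x - mean) / std
m = MinMaxScaler().fit_transform(x)     # (x - min) / (max - min)
r = RobustScaler().fit_transform(x)     # (x - median) / IQR

# Manual robust scaling, following the table's formula
q1, q2, q3 = np.percentile(x, [25, 50, 75])
r_manual = (x - q2) / (q3 - q1)

for name, arr in [("standard", z), ("min-max", m), ("robust", r)]:
    print(name, np.round(arr.ravel(), 2))
```

With min-max scaling, the outlier pins the top of the range and the five ordinary values are squeezed near 0. Robust scaling keeps those five values spread out because the median and IQR are barely affected by the outlier.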



---

### 3. Why Transform? (The "Data Science" Impact)

1. **Stabilize Variance:** Helps make the spread of the errors (residuals) roughly constant across the range of predicted values (Homoscedasticity).
2. **Handle Outliers:** Log or Square Root transforms compress large values, which reduces the "pull" of extreme outliers on the fitted model.
3. **Linearity:** Can turn a curved relationship (e.g., exponential) into a straight-line relationship that Linear Regression can fit.
4. **Algorithm Convergence:** Gradient Descent (used in Deep Learning) usually converges much faster when all features are on a similar scale.
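
The linearity point can be illustrated with a short sketch. It assumes a hypothetical exponential relationship $y = a e^{bx}$ with multiplicative noise (the values of `a_true`, `b_true`, and the noise level are arbitrary choices), then fits a straight line to $\ln(y)$, since $\ln(y) = \ln(a) + bx$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical exponential relationship with multiplicative noise
a_true, b_true = 2.0, 0.8
x = np.linspace(0, 5, 200)
y = a_true * np.exp(b_true * x) * rng.lognormal(mean=0.0, sigma=0.1, size=x.size)

# Linear correlation before and after taking the log of the target
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log(y))[0, 1]

# Straight-line fit in log space: slope estimates b, intercept estimates ln(a)
slope, intercept = np.polyfit(x, np.log(y), deg=1)
b_hat = slope
a_hat = np.exp(intercept)

print(f"Correlation with y:     {r_raw:.3f}")
print(f"Correlation with ln(y): {r_log:.3f}")
print(f"Estimated a: {a_hat:.2f}, estimated b: {b_hat:.2f}")
```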

---

### 🐍 Python: Applying Transformations


```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, StandardScaler

# 1. Create skewed data (Pareto-distributed, heavy right tail)
data = np.random.pareto(a=2, size=1000).reshape(-1, 1)

# 2. Apply Power Transformer (Yeo-Johnson)
#    Note: PowerTransformer uses standardize=True by default, so its output
#    already has zero mean and unit variance.
pt = PowerTransformer(method='yeo-johnson')
data_transformed = pt.fit_transform(data)

# 3. Apply Standardization
#    Shown for completeness; it is effectively a no-op here because of the
#    default noted above. It matters if you pass standardize=False in step 2.
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_transformed)

# Result: data is closer to normal (via pt) and standardized (via scaler)
print(f"Mean: {np.mean(data_scaled):.2f}, Std: {np.std(data_scaled):.2f}")
```

Output:

```
Mean: -0.00, Std: 1.00
```
