## **Non-Gaussian Distributions**


## 🔍 **What is a Probability Distribution?**

A **probability distribution** shows how the values of a random variable are distributed. It tells you:
- **Which values** the variable can take.
- **How likely** each value is to occur.

The most commonly known probability distribution is the **Gaussian distribution (Normal distribution)**, which looks like a **bell curve**. It assumes that data is centered around a mean, with symmetrical spread (variance) on both sides.



## 🚨 **What is a Non-Gaussian Distribution?**

A **non-Gaussian distribution** is any distribution that does **NOT** follow the shape of the normal bell curve. These distributions can be skewed, heavy-tailed, multimodal, or have other unique characteristics.

Many real-world datasets are **non-Gaussian**!

### 🔎 **How to Identify a Non-Gaussian Distribution?**
Look for these signs:

| Property               | Gaussian (Normal) Distribution      | Non-Gaussian Distribution            |
|------------------------|-------------------------------------|--------------------------------------|
| Shape                  | Bell curve                          | Skewed, heavy-tailed, multimodal     |
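| Mean = Median = Mode   | Yes                                  | Not necessarily (they differ for skewed data) |
| Symmetry               | Symmetrical                         | Often asymmetrical (but can be symmetrical, e.g. uniform) |
| Tails                  | Thin (light tails)                   | Often thicker (heavy tails), sometimes lighter (e.g. uniform) |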
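
The signs in the table can also be checked numerically. The sketch below is one possible approach, not the only one: it compares mean and median, measures skewness and excess kurtosis, and runs a Shapiro-Wilk normality test from SciPy. The sample data, the seed, and the names `samples` and `report` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Two illustrative samples: one Gaussian, one right-skewed
samples = {
    "gaussian": rng.normal(loc=0.0, scale=1.0, size=2000),
    "right_skewed": rng.exponential(scale=2.0, size=2000),
}

report = {}
for name, x in samples.items():
    _, p_value = stats.shapiro(x)  # small p-value suggests the data is not normal
    report[name] = {
        "mean": float(np.mean(x)),
        "median": float(np.median(x)),
        "skewness": float(stats.skew(x)),             # 0 for a symmetric distribution
        "excess_kurtosis": float(stats.kurtosis(x)),  # 0 for a Gaussian
        "shapiro_p": float(p_value),
    }

for name, row in report.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

For the skewed sample, the mean sits above the median and the skewness is clearly positive, which matches the "Mean = Median = Mode" and "Symmetry" rows of the table.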



### 📊 **Types of Non-Gaussian Distributions**

1️⃣ **Skewed Distribution**  
- **Example**: Income distribution in a population.
- **Explanation**: Most people earn relatively modest amounts, but a few individuals earn very high salaries, causing a **right-skewed** distribution.

2️⃣ **Bimodal Distribution**  
- **Example**: Test scores from two groups of students.
- **Explanation**: The distribution has **two peaks** (modes), meaning two different groups exist within the data.

3️⃣ **Heavy-Tailed Distribution**  
- **Example**: Stock market returns.
- **Explanation**: Heavy-tailed distributions have more extreme values (outliers) compared to a Gaussian distribution. This makes risk analysis in finance very challenging.

4️⃣ **Uniform Distribution**  
- **Example**: Rolling a fair die.
- **Explanation**: Each outcome has **equal probability**.

5️⃣ **Exponential Distribution**  
- **Example**: Time between arrivals of customers at a store.
- **Explanation**: This is used to model **time-based events**, where shorter intervals are more likely than longer intervals.
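
To get a feel for the five types listed above, the following sketch draws a synthetic sample for each one and prints its skewness and excess kurtosis. The specific generators and parameters (log-normal for income, a two-Gaussian mixture for test scores, a Student t with 3 degrees of freedom for heavy tails) are illustrative assumptions, not the only way to model these cases.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5000

examples = {
    # Income-like data: log-normal is a common stand-in for right skew
    "skewed": rng.lognormal(mean=10.5, sigma=0.8, size=n),
    # Two groups of test scores, centered at 55 and 85
    "bimodal": np.concatenate([rng.normal(55, 5, n // 2), rng.normal(85, 5, n // 2)]),
    # Student t with 3 degrees of freedom produces frequent extreme values
    "heavy_tailed": rng.standard_t(df=3, size=n),
    # Fair six-sided die: integers 1..6 with equal probability
    "uniform": rng.integers(1, 7, size=n),
    # Waiting times between arrivals
    "exponential": rng.exponential(scale=3.0, size=n),
}

summary = {
    name: (float(stats.skew(x)), float(stats.kurtosis(x)))
    for name, x in examples.items()
}
for name, (skew, kurt) in summary.items():
    print(f"{name:13s} skew={skew:7.2f}  excess kurtosis={kurt:7.2f}")
```

A Gaussian has skewness 0 and excess kurtosis 0, so large positive skewness (skewed, exponential), negative excess kurtosis (uniform, bimodal), or large positive excess kurtosis (heavy-tailed) each point to a different kind of departure from normality.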



### 🧪 **Why Does Non-Gaussian Data Matter in Machine Learning?**

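Some ML methods make Gaussian assumptions: **Linear Regression** assumes normally distributed *residuals* for its standard inference, and models such as **Gaussian Naive Bayes** and **LDA** assume Gaussian features. Many others (for example, tree-based models) make no such assumption. When a model is sensitive to skew or outliers, you might need to apply **transformations** to make the data more normal-like.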

🔧 **Techniques to Handle Non-Gaussian Data:**
1. **Log Transformation** – Used for right-skewed data.
2. **Box-Cox Transformation** – Makes data more Gaussian-like.
3. **Standard Scaling (Z-score)** – Centers data but doesn’t change distribution shape.



### 📚 **Real-World Examples of Non-Gaussian Data in Machine Learning**

| Use Case               | Distribution Type         | Explanation                               |
|------------------------|---------------------------|-------------------------------------------|
| Customer Income        | Skewed Distribution        | Most customers earn low, few earn high.   |
| Website Traffic        | Bimodal Distribution       | Peaks during morning and evening.         |
| Social Media Likes     | Heavy-Tailed Distribution  | Some posts go viral, most don’t.          |
| Call Center Wait Times | Exponential Distribution   | Short wait times are more common.         |


### 🤔 **Key Takeaways**
- **Gaussian distribution** is not always a good assumption for real-world data.
- **Non-Gaussian distributions** can be skewed, multimodal, or heavy-tailed.
- Handling non-Gaussian data often benefits from **feature transformation techniques** in ML.
- Understanding the distribution helps in **better model selection** and **more accurate predictions**.

---



![](non-gauss.png)

---

![](uniform.png)

---

## 🔧 **1️⃣ Log Transformation (for Right-Skewed Data)**
### 📘 **What is Log Transformation?**
A **log transformation** is a mathematical technique used to handle **right-skewed data**. It **compresses larger values more than smaller values**, reducing the impact of outliers and making the data closer to a normal distribution.

### 📈 **When to Use:**
- When your data has a **long right tail** (right-skewed).
- When dealing with **income**, **prices**, **population sizes**, etc., where the values are positive and span several orders of magnitude.



### 🤔 **Why Use Log Transformation?**
Imagine you have the following data:

| Income ($) |
|------------|
| 30,000     |
| 50,000     |
| 100,000    |
| 500,000    |
| 1,000,000  |

Without a log transformation, the **larger incomes dominate** the dataset. A log transformation **scales down** these large values, making the dataset more balanced.
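
As a small worked example, the sketch below applies a base-10 log to the five incomes from the table. Base 10 is chosen here only because the results are easy to read (the code examples elsewhere use the natural log); the variable names are illustrative.

```python
import numpy as np

# The five incomes from the table above
incomes = np.array([30_000, 50_000, 100_000, 500_000, 1_000_000], dtype=float)
log_incomes = np.log10(incomes)

for raw, logged in zip(incomes, log_incomes):
    print(f"{raw:>12,.0f} -> {logged:.3f}")

# Ratio between the largest and smallest value, before and after
raw_ratio = incomes.max() / incomes.min()          # about 33x
log_ratio = log_incomes.max() / log_incomes.min()  # about 1.3x
print(f"raw ratio: {raw_ratio:.1f}, log ratio: {log_ratio:.2f}")
```

On the raw scale the largest income is roughly 33 times the smallest; on the log scale the values run from about 4.5 to 6, so no single value dominates.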



### ✅ **Example:**
```python
import numpy as np
import matplotlib.pyplot as plt

# Original right-skewed data
data = np.random.exponential(scale=2, size=1000)

# Apply log transformation
log_data = np.log1p(data)

# Plot original vs transformed data
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange', alpha=0.7)
plt.title("Original Data (Right-Skewed)")

plt.subplot(1, 2, 2)
plt.hist(log_data, bins=30, color='green', alpha=0.7)
plt.title("Log Transformed Data")

plt.show()
```



### 🔍 **Key Points:**
- **`np.log1p(data)`** computes **log(1 + x)**, a log transformation that stays defined when the data contains zeros.
- Notice how the long right tail is **compressed**, making the data closer to a normal distribution.



## 🔧 **2️⃣ Box-Cox Transformation (General Transformation)**
### 📘 **What is Box-Cox Transformation?**
The **Box-Cox Transformation** is a **general method** to make data **more Gaussian-like** by adjusting the distribution shape using a **lambda (λ)** parameter.

Unlike the log transformation, **Box-Cox** works for both **right-skewed** and **left-skewed** data.



### 🤔 **Why Use Box-Cox?**
It’s more flexible than a log transformation because it can handle **various types of skewness** by tuning the **lambda (λ)** parameter.

| Lambda (λ) Value | Transformation Applied     |
|------------------|----------------------------|
| λ = 0            | Log Transformation         |
| λ = 1            | No Transformation (Original Data) |
| λ = -1           | Reciprocal Transformation   |



### ✅ **Example:**
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Original right-skewed data
data = np.random.exponential(scale=2, size=1000)

# Apply Box-Cox transformation
box_cox_data, _ = stats.boxcox(data + 1)  # Add 1 to avoid issues with zero values

# Plot original vs transformed data
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange', alpha=0.7)
plt.title("Original Data (Right-Skewed)")

plt.subplot(1, 2, 2)
plt.hist(box_cox_data, bins=30, color='blue', alpha=0.7)
plt.title("Box-Cox Transformed Data")

plt.show()
```



### 🔍 **Key Points:**
- **Box-Cox requires strictly positive values.** That is why the example adds 1 to the data before transforming it.
- The transformation adjusts based on the **lambda (λ)** parameter to make data more Gaussian-like.



## 🔧 **3️⃣ Standard Scaling (Z-Score Normalization)**
### 📘 **What is Standard Scaling?**
**Standard Scaling** (also known as **Z-score normalization**) transforms the data by **centering it at 0** and scaling it to have a **standard deviation of 1**.

Unlike log or Box-Cox transformations, **Z-score scaling doesn’t change the distribution shape**. It just **standardizes** the values to make them easier to compare.



### 🧮 **Formula for Z-Score:**
$$
Z = \frac{X - \mu}{\sigma}
$$
Where:
- $ X $ = Original value
- $ \mu $ = Mean of the data
- $ \sigma $ = Standard deviation of the data
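
The formula can be applied by hand and compared with Scikit-Learn. This is a minimal sketch on an assumed right-skewed sample; it also checks the claim that Z-score scaling leaves the shape (here measured by skewness) unchanged. `StandardScaler` uses the population standard deviation, which matches NumPy's default `std()`.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1000)  # right-skewed sample

# Z = (X - mu) / sigma, with the population standard deviation (ddof=0)
z_manual = (x - x.mean()) / x.std()

# StandardScaler expects a 2-D array of shape (n_samples, n_features)
z_sklearn = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()

print("same as StandardScaler:", np.allclose(z_manual, z_sklearn))
print("mean after:", round(float(z_manual.mean()), 6))
print("std after: ", round(float(z_manual.std()), 6))
print("skewness before:", round(float(stats.skew(x)), 3))
print("skewness after: ", round(float(stats.skew(z_manual)), 3))
```

The skewness is identical before and after, because subtracting a constant and dividing by a positive constant cannot change the shape of a distribution.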



### 🤔 **Why Use Standard Scaling?**
- When you want to ensure all features in your dataset are on the **same scale**.
- It’s particularly useful for **machine learning algorithms** that are sensitive to different scales, such as **KNN**, **SVM**, and **PCA**.



### ✅ **Example:**
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Sample data
data = np.random.rand(100, 1) * 100  # Random values between 0 and 100

# Apply Standard Scaling
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# Plot original vs scaled data
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange', alpha=0.7)
plt.title("Original Data")

plt.subplot(1, 2, 2)
plt.hist(scaled_data, bins=30, color='green', alpha=0.7)
plt.title("Standard Scaled Data")

plt.show()
```



### 🔍 **Key Points:**
- **StandardScaler** in **Scikit-Learn** is used for Z-score scaling.
- Notice how the scaled data is centered around **0** with a **standard deviation of 1**.



## 🛠 **Summary: When to Use Which Transformation?**

| Transformation      | Use Case                                | Example                                  |
|---------------------|-----------------------------------------|------------------------------------------|
| **Log Transformation** | Right-skewed data                      | Income, Prices                           |
| **Box-Cox Transformation** | Strictly positive data, right- or left-skewed | Power Consumption, Population Sizes      |
| **Standard Scaling**  | Standardizing data without changing shape | Machine Learning Algorithms              |

---

![](log.png)
![](box-cox.png)
![](scaling.png)

---

# 🧪 **What is Yeo-Johnson Transformation?**

The **Yeo-Johnson Transformation** is a **generalized transformation method** that helps make your data look more **Gaussian-like** (normally distributed). It is similar to the **Box-Cox Transformation**, but with one important difference:

✅ **It can handle both positive and negative values!**  
✅ **It works for skewed data (both left-skewed and right-skewed).**  



## 📈 **When to Use Yeo-Johnson Transformation?**
Use the Yeo-Johnson Transformation when:

- Your data contains **both positive and negative values**.
- Your data is **skewed** (either left or right).
- You need a **more flexible transformation** compared to Box-Cox.



## 🧮 **The Formula for Yeo-Johnson:**

$$
y(\lambda) = 
\begin{cases} 
\left[ \left( y + 1 \right)^\lambda - 1 \right] / \lambda & \text{if } y \geq 0, \lambda \neq 0 \\
\log(y + 1) & \text{if } y \geq 0, \lambda = 0 \\
- \left[ \left( -y + 1 \right)^{2 - \lambda} - 1 \right] / (2 - \lambda) & \text{if } y < 0, \lambda \neq 2 \\
-\log(-y + 1) & \text{if } y < 0, \lambda = 2
\end{cases}
$$

Don't worry if this looks complicated! It basically adjusts the transformation based on whether the values are **positive or negative**, and uses a **lambda (λ)** parameter to control the transformation strength.
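
The four branches translate almost line by line into NumPy. The sketch below is an illustrative implementation for a fixed λ (the helper name `yeo_johnson_manual` is hypothetical), checked against `scipy.stats.yeojohnson` called with the same `lmbda`.

```python
import numpy as np
from scipy import stats


def yeo_johnson_manual(y, lam):
    """Yeo-Johnson transform for a fixed lambda, following the piecewise formula."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0

    # Branches for y >= 0
    if lam != 0:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])

    # Branches for y < 0
    if lam != 2:
        out[~pos] = -(((-y[~pos] + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out


y = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# Compare with scipy for several lambdas, including the special cases 0 and 2
checks = {
    lam: bool(np.allclose(yeo_johnson_manual(y, lam), stats.yeojohnson(y, lmbda=lam)))
    for lam in (0.0, 0.5, 1.0, 2.0)
}
print(checks)
```

With λ = 1 both branches reduce to the identity, so the data is returned unchanged. Values of λ below 1 compress the right tail, and values above 1 compress the left tail.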



## 🤔 **Difference Between Yeo-Johnson and Box-Cox:**

| Feature                | Box-Cox Transformation               | Yeo-Johnson Transformation             |
|------------------------|--------------------------------------|---------------------------------------|
| **Handles Negative Values** | ❌ No                                 | ✅ Yes                                 |
| **Requires Positive Values** | ✅ Yes (all values must be > 0)        | ❌ No (can handle both positive & negative) |
| **Flexibility**         | Works for right-skewed & left-skewed | Works for all types of skewed data    |



## ✅ **Example in Python:**

Let's apply the **Yeo-Johnson Transformation** on a dataset with **both positive and negative values**.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PowerTransformer

# Generate right-skewed data with both positive and negative values
data = np.random.exponential(scale=5, size=1000) - 3  # shifted exponential, so all values are >= -3

# Apply Yeo-Johnson Transformation
yeo_johnson = PowerTransformer(method='yeo-johnson')
transformed_data = yeo_johnson.fit_transform(data.reshape(-1, 1))

# Plot original vs transformed data
plt.figure(figsize=(12, 6))

# Original data
plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange', alpha=0.7)
plt.title("Original Data")

# Transformed data
plt.subplot(1, 2, 2)
plt.hist(transformed_data, bins=30, color='green', alpha=0.7)
plt.title("Yeo-Johnson Transformed Data")

plt.tight_layout()
plt.show()
```



## 🔍 **Explanation of the Code:**
- **`PowerTransformer(method='yeo-johnson')`**: The Scikit-Learn transformer that applies the Yeo-Johnson transformation. It estimates **λ** from the data and, by default (`standardize=True`), also rescales the output to zero mean and unit variance.
- **Original Data:** The histogram shows the raw values, which include both positive and negative numbers.
- **Transformed Data:** The Yeo-Johnson transformation adjusts the data to make it look more like a **normal distribution**.



## 📊 **How Yeo-Johnson Works with Different Skewed Data:**

| Data Type        | Effect of Yeo-Johnson Transformation |
|------------------|-------------------------------------|
| Right-skewed     | Reduces the **right tail** to make it more symmetric |
| Left-skewed      | Reduces the **left tail** to make it more symmetric  |
| Positive values  | Adjusts values like **Box-Cox**     |
| Negative values  | Works without requiring any shift   |
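
The first two rows of the table can be checked by measuring skewness before and after the transformation. In this sketch the two datasets are assumptions chosen for illustration: a shifted exponential sample (right-skewed) and its mirror image (left-skewed), both containing negative values.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(3)
base = rng.exponential(scale=2.0, size=2000)

datasets = {
    "right_skewed": base - 1.0,  # long right tail, some negative values
    "left_skewed": 1.0 - base,   # mirror image: long left tail
}

skew_change = {}
for name, x in datasets.items():
    pt = PowerTransformer(method="yeo-johnson")  # standardize=True by default
    transformed = pt.fit_transform(x.reshape(-1, 1)).ravel()
    skew_change[name] = {
        "before": float(stats.skew(x)),
        "after": float(stats.skew(transformed)),
        "lambda": float(pt.lambdas_[0]),  # fitted lambda for the single feature
    }
    print(name, {key: round(value, 3) for key, value in skew_change[name].items()})
```

In both cases the absolute skewness after the transformation should be smaller than before, and no manual shift of the data is needed.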



## 🧪 **When Should You Use Yeo-Johnson vs. Box-Cox?**

| Scenario                         | Transformation to Use     |
|----------------------------------|--------------------------|
| Data has **only positive values** | Box-Cox Transformation (Yeo-Johnson also works) |
| Data has **positive & negative values** | Yeo-Johnson Transformation |
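
The practical consequence of this choice shows up as soon as the data contains a non-positive value. The sketch below, on an assumed sample with negative values, shows `scipy.stats.boxcox` rejecting the input while `scipy.stats.yeojohnson` accepts it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mixed = rng.normal(loc=0.0, scale=5.0, size=500)  # contains negative values

# Box-Cox raises ValueError when any value is zero or negative
try:
    stats.boxcox(mixed)
    boxcox_ok = True
except ValueError as err:
    boxcox_ok = False
    print("Box-Cox failed:", err)

# Yeo-Johnson accepts the same array and returns the fitted lambda as well
yj_values, yj_lambda = stats.yeojohnson(mixed)
print("Yeo-Johnson lambda:", round(float(yj_lambda), 3))
```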



## 🧩 **Advantages of Yeo-Johnson:**
- Works for both **positive and negative values**.
- Reduces skewness in the data.
- Can improve **model performance** for methods that rely on or benefit from roughly normal data (e.g., **linear regression** with normally distributed errors, **Gaussian Naive Bayes**).



## ⚠️ **Important Notes:**
- **Yeo-Johnson** works well for datasets with both **positive and negative values**.
- For **Box-Cox**, all values must be **greater than zero**.
- Both transformations use the **lambda (λ)** parameter to control the transformation strength.



## ✅ **Summary:**

| Transformation      | Handles Negative Values | Handles Skewed Data | Normalizes Data | Common Use Case                |
|---------------------|------------------------|---------------------|-----------------|--------------------------------|
| **Log Transformation** | ❌ No                   | ✅ Yes               | ✅ Yes          | Right-skewed data              |
| **Box-Cox Transformation** | ❌ No                   | ✅ Yes               | ✅ Yes          | Strictly positive skewed data  |
| **Yeo-Johnson Transformation** | ✅ Yes                  | ✅ Yes               | ✅ Yes          | Both positive & negative values |

---

![](yeo-jhonson.png)

# 📚 **Transformations in Statistics**

**Transformations** in statistics are mathematical operations applied to a dataset to make it easier to analyze, interpret, and model. They are often used to:

✅ **Reduce skewness**  
✅ **Stabilize variance**  
✅ **Make data more Gaussian-like**  
✅ **Improve model performance**



## 🧩 **Types of Transformations:**

1. **Log Transformation**  
2. **Square Root Transformation**  
3. **Reciprocal Transformation**  
4. **Box-Cox Transformation**  
5. **Yeo-Johnson Transformation**  
6. **Power Transformation**  
7. **Z-score Normalization (Standard Scaling)**  
8. **Min-Max Scaling (Normalization)**

Let’s explore each transformation in detail.



## 🧮 **1️⃣ Log Transformation**

**Formula:**  
$$
y' = \log(y)
$$

**Purpose:**  
- Reduces **right skewness**.  
- Helps when the data has **large outliers**.

**Example:** Income data often has a **right-skewed** distribution, where most people earn average salaries, and a few earn very high amounts.

### ✅ **Python Code Example:**

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data
data = np.random.exponential(scale=2, size=1000)

# Log transformation
log_data = np.log(data + 1)  # Adding 1 to avoid log(0)

# Plot original vs transformed data
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange')
plt.title("Original Data")

plt.subplot(1, 2, 2)
plt.hist(log_data, bins=30, color='green')
plt.title("Log Transformed Data")

plt.show()
```



## 🧮 **2️⃣ Square Root Transformation**

**Formula:**  
$$
y' = \sqrt{y}
$$

**Purpose:**  
- Useful for **count data** (e.g., number of visits to a website).  
- Reduces the impact of **large values** and stabilizes **variance**.

**Example:** If you have count data like **number of customer visits** or **sales transactions**, this transformation helps reduce skewness.

### ✅ **Python Code Example:**

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample count data
count_data = np.random.poisson(lam=5, size=1000)

# Square root transformation
sqrt_data = np.sqrt(count_data)

# Plot original vs transformed data
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(count_data, bins=30, color='orange')
plt.title("Original Count Data")

plt.subplot(1, 2, 2)
plt.hist(sqrt_data, bins=30, color='green')
plt.title("Square Root Transformed Data")

plt.show()
```
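
The variance-stabilizing effect is easiest to see by comparing Poisson samples with different means. For Poisson counts the variance equals the mean, so it grows with `lam`, while the variance of the square roots stays close to 0.25. The rates and sample size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

variances = {}
for lam in (10, 50, 200):
    counts = rng.poisson(lam=lam, size=100_000)
    variances[lam] = (float(counts.var()), float(np.sqrt(counts).var()))
    print(
        f"lam={lam:3d}  var(raw)={variances[lam][0]:7.2f}  "
        f"var(sqrt)={variances[lam][1]:.3f}"
    )
```

The raw variance changes by a factor of about 20 across the three samples, while the variance after the square root transformation is nearly constant.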



## 🧮 **3️⃣ Reciprocal Transformation**

**Formula:**  
$$
y' = \frac{1}{y}
$$

**Purpose:**  
- Reduces the impact of **large values**.  
- Often used for **right-skewed** data.

**Example:** Useful for **rates or ratios**, like **speed (1/time)** or **frequency**.
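
A minimal sketch of the reciprocal transformation on a handful of made-up completion times (hypothetical values, strictly positive). It shows two things to keep in mind: the large values are pulled close to zero, and the ordering of the data is reversed.

```python
import numpy as np

# Hypothetical task-completion times in minutes (must be non-zero)
times = np.array([1.0, 2.0, 4.0, 10.0, 100.0])
rates = 1.0 / times  # tasks per minute

print("times:", times)
print("rates:", rates)
print("spread before:", times.max() - times.min())
print("spread after: ", rates.max() - rates.min())

# The largest time becomes the smallest rate, so the order flips
order_reversed = bool(np.all(np.diff(rates) < 0))
print("order reversed:", order_reversed)
```

Because the order flips, a feature that was positively related to the target becomes negatively related after this transformation. The transformation is also undefined for zero values.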



## 🧮 **4️⃣ Box-Cox Transformation**

**Formula:**  
$$
y'(\lambda) = 
\begin{cases} 
\frac{y^\lambda - 1}{\lambda} & \lambda \neq 0 \\
\log(y) & \lambda = 0
\end{cases}
$$

**Purpose:**  
- Makes data **more Gaussian-like**.  
- Works only for **positive values**.

**Lambda (λ):** The parameter that controls the transformation.

### ✅ **Python Code Example:**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boxcox

# Sample data
data = np.random.exponential(scale=2, size=1000)

# Apply Box-Cox Transformation
boxcox_data, lambda_val = boxcox(data + 1)  # Adding 1 to avoid issues with zero values

print(f"Optimal Lambda: {lambda_val}")

# Plot original vs transformed data
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange')
plt.title("Original Data")

plt.subplot(1, 2, 2)
plt.hist(boxcox_data, bins=30, color='green')
plt.title("Box-Cox Transformed Data")

plt.show()
```



## 🧮 **5️⃣ Yeo-Johnson Transformation**  
(Explained earlier)



## 🧮 **6️⃣ Power Transformation (Generalized)**

**Formula:**  
$$
y' = y^\lambda
$$

**Purpose:**  
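- Stabilizes **variance**.  
- Reduces **skewness**.  
- **Box-Cox** and **Yeo-Johnson** are both members of this power family; Scikit-Learn's `PowerTransformer` implements these two methods.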

### ✅ **Python Code Example:**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PowerTransformer

# Sample data
data = np.random.normal(loc=5, scale=10, size=1000)

# Apply Power Transformation (Yeo-Johnson method)
pt = PowerTransformer(method='yeo-johnson')
transformed_data = pt.fit_transform(data.reshape(-1, 1))

# Plot original vs transformed data
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.hist(data, bins=30, color='orange')
plt.title("Original Data")

plt.subplot(1, 2, 2)
plt.hist(transformed_data, bins=30, color='green')
plt.title("Power Transformed Data")

plt.show()
```



## 🧮 **7️⃣ Z-score Normalization (Standard Scaling)**

**Formula:**  
$$
z = \frac{x - \mu}{\sigma}
$$

**Purpose:**  
- Rescales the data to have **mean = 0** and **standard deviation = 1**.  
- Does not change the **shape** of the distribution.



## 🧮 **8️⃣ Min-Max Scaling (Normalization)**

**Formula:**  
$$
x' = \frac{x - \min(x)}{\max(x) - \min(x)}
$$

**Purpose:**  
- Rescales data to a **fixed range**, typically **[0, 1]**.  
- Useful for **features with different scales**.
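
The formula is simple enough to apply by hand. This sketch uses a small made-up array and compares the manual result with Scikit-Learn's `MinMaxScaler`, whose default output range is [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([10.0, 20.0, 25.0, 40.0, 110.0])

# x' = (x - min) / (max - min)
x_manual = (x - x.min()) / (x.max() - x.min())

# MinMaxScaler expects a 2-D array of shape (n_samples, n_features)
x_sklearn = MinMaxScaler().fit_transform(x.reshape(-1, 1)).ravel()

print("manual: ", x_manual)
print("sklearn:", x_sklearn)
```

The single large value (110) maps to 1 and squeezes the other values into the lower third of the range, which illustrates that Min-Max scaling is sensitive to outliers and, like Z-score scaling, does not change the shape of the distribution.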

## 📊 **Comparison of Transformations:**

| Transformation       | Purpose                               | Handles Negative Values? | Stabilizes Variance? | Reduces Skewness? |
|----------------------|---------------------------------------|-------------------------|----------------------|-------------------|
| Log Transformation   | Reduces right skew                    | ❌ No                    | ✅ Yes               | ✅ Yes            |
| Square Root          | Stabilizes variance                   | ❌ No (non-negative only) | ✅ Yes               | ✅ Yes            |
| Reciprocal           | Reduces impact of large values        | ❌ No                    | ✅ Yes               | ✅ Yes            |
| Box-Cox              | Makes data Gaussian-like              | ❌ No                    | ✅ Yes               | ✅ Yes            |
| Yeo-Johnson          | Makes data Gaussian-like              | ✅ Yes                  | ✅ Yes               | ✅ Yes            |
| Power Transformation | Generalized transformation            | Depends on method (Yeo-Johnson: yes, Box-Cox: no) | ✅ Yes               | ✅ Yes            |
| Z-score Scaling      | Centers data                          | ✅ Yes                  | ❌ No                | ❌ No             |
| Min-Max Scaling      | Rescales data to [0, 1]               | ✅ Yes                  | ❌ No                | ❌ No             |


## ✅ **Summary:**
Transformations help handle skewed data, stabilize variance, and improve the performance of statistical models. Here’s when to use each:

- **Log Transformation**: Right-skewed data with large outliers.  
- **Box-Cox**: Positive data only, when you want a Gaussian-like distribution.  
- **Yeo-Johnson**: For both positive and negative values.  
- **Z-score Scaling**: Standardizes data without changing distribution shape.  
- **Min-Max Scaling**: Rescales features to a fixed range.

---

![](sqrt.png)