# Boxplot Rule (1.5 × IQR)

This method uses the Interquartile Range (IQR) to define boundaries for "normal" data values. Any points outside these boundaries are considered outliers.

### What is IQR?

The IQR measures the spread of the middle 50% of the data. It is the range between the 25th percentile (Q1) and the 75th percentile (Q3).

The formulas:

```
IQR         = Q3 - Q1
Lower Bound = Q1 - 1.5 × IQR
Upper Bound = Q3 + 1.5 × IQR
```

Outlier detection rule:

- Any value < Lower Bound is a lower outlier.
- Any value > Upper Bound is an upper outlier.
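The rule only needs Q1 and Q3, so it can be written without committing to any particular quartile method. Below is a minimal Python sketch. The helper names `iqr_fences` and `classify` are illustrative and do not come from any library. The quartiles Q1 = 12 and Q3 = 15.75 are just sample inputs.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_bound, upper_bound) for the k × IQR rule (k = 1.5 by default)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify(value, lower_bound, upper_bound):
    """Apply the outlier rule; values exactly on a bound are NOT outliers (strict < and >)."""
    if value < lower_bound:
        return "lower outlier"
    if value > upper_bound:
        return "upper outlier"
    return "inside"


# Sample quartiles: Q1 = 12, Q3 = 15.75, so IQR = 3.75
lower, upper = iqr_fences(12, 15.75)
for v in (5, 14, 120):
    print(v, classify(v, lower, upper))
```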

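### Why 1.5? Mathematical Intuition

The 1.5 multiplier is a convention rather than a derived constant. The normal distribution shows why it is a sensible one. For a normal distribution with mean μ and standard deviation σ:

- Q1 ≈ μ - 0.675σ
- Q3 ≈ μ + 0.675σ
- IQR ≈ 1.35σ
- 1.5 × IQR ≈ 2.0σ

The upper bound is therefore Q3 + 1.5 × IQR ≈ μ + 0.675σ + 2.0σ ≈ μ + 2.7σ. By symmetry, the lower bound is ≈ μ - 2.7σ.

These bounds capture about 99.3% of a normal distribution. Roughly 0.7% of perfectly normal data is therefore still flagged as outliers.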
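The ±2.7σ and 99.3% figures can be checked numerically. This sketch uses only the standard library's `statistics.NormalDist`.

For a standard normal distribution, Q1 = -Q3. The upper bound is then Q3 + 1.5 × (2 × Q3) = 4 × Q3.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1, so results are in units of σ
std_normal = NormalDist(mu=0.0, sigma=1.0)

q3 = std_normal.inv_cdf(0.75)   # ≈ 0.6745
q1 = -q3                        # symmetric around the mean
iqr = q3 - q1                   # ≈ 1.349
upper_bound = q3 + 1.5 * iqr    # = 4 × q3 ≈ 2.698
lower_bound = q1 - 1.5 * iqr    # ≈ -2.698

# Share of the distribution that lies between the two bounds
coverage = std_normal.cdf(upper_bound) - std_normal.cdf(lower_bound)

print(f"Q3 = {q3:.4f}σ, IQR = {iqr:.4f}σ")
print(f"Bounds at ±{upper_bound:.3f}σ")
print(f"Inside bounds: {coverage:.2%}, flagged: {1 - coverage:.2%}")
```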

### Step-by-Step Example

Let's use our familiar dataset:

```
Dataset: [10, 12, 12, 13, 14, 15, 16, 120]
```
#### Step 1: Sort the Data

```
Sorted: [10, 12, 12, 13, 14, 15, 16, 120]    (n = 8)
```

#### Step 2: Calculate Q1 (25th percentile)

```
Position of Q1 = 0.25 × (n + 1) = 0.25 × 9 = 2.25
```

A position of 2.25 means Q1 lies 25% of the way from the 2nd value to the 3rd value. Positions are counted from 1.

```
2nd value = 12, 3rd value = 12
Q1 = 12 + 0.25 × (12 - 12) = 12
```

#### Step 3: Calculate Q3 (75th percentile)

```
Position of Q3 = 0.75 × (n + 1) = 0.75 × 9 = 6.75
```

Q3 therefore lies 75% of the way from the 6th value to the 7th value:

```
6th value = 15, 7th value = 16
Q3 = 15 + 0.75 × (16 - 15) = 15.75
```

#### Step 4: Calculate IQR

```
IQR = Q3 - Q1 = 15.75 - 12 = 3.75
```

#### Step 5: Calculate Boundaries

```
Lower Bound = Q1 - 1.5 × IQR = 12 - 1.5 × 3.75 = 12 - 5.625 = 6.375
Upper Bound = Q3 + 1.5 × IQR = 15.75 + 1.5 × 3.75 = 15.75 + 5.625 = 21.375
```

#### Step 6: Identify Outliers

- No value is below 6.375, so there are no lower outliers.
- 120 is above 21.375, so 120 is an upper outlier.

#### Visualizing with a Boxplot

A boxplot draws these same quantities. The exact quartiles depend on which quartile method the plotting software uses.

```
Lower whisker: 10      (smallest value ≥ Lower Bound)
Q1:            12
Median (Q2):   13.5    (average of the 4th and 5th values, 13 and 14)
Q3:            15.75
Upper whisker: 16      (largest value ≤ Upper Bound)
Outlier:       • 120   (drawn as an individual point)
```

### Different Calculation Methods for Quartiles

There are several conventions for calculating quartiles, and the example above used Method 1. Common ones, using 1-based positions in the sorted data:

- Method 1: position = 0.25 × (n + 1) for Q1 and 0.75 × (n + 1) for Q3. Excel calls this convention "exclusive" (`QUARTILE.EXC`).
- Method 2: position = 0.25 × n + 0.75 for Q1. This simplifies to 0.25 × (n - 1) + 1, so it gives the same Q1 as Method 3.
- Method 3 (NumPy/pandas default): position = 0.25 × (n - 1) + 1 for Q1 and 0.75 × (n - 1) + 1 for Q3. Excel calls this convention "inclusive" (`QUARTILE.INC`).

Let's see Method 3, which NumPy and pandas use by default:

```
1-based positions: 1→10, 2→12, 3→12, 4→13, 5→14, 6→15, 7→16, 8→120

Q1 position = 0.25 × (8 - 1) + 1 = 2.75
Q1 = 12 + 0.75 × (12 - 12) = 12

Q3 position = 0.75 × (8 - 1) + 1 = 6.25
Q3 = 15 + 0.25 × (16 - 15) = 15.25

IQR = 15.25 - 12 = 3.25
Lower Bound = 12 - 1.5 × 3.25 = 7.125
Upper Bound = 15.25 + 1.5 × 3.25 = 20.125
```

The bounds differ slightly from Method 1, but 120 is still identified as an outlier.
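Both quartile conventions can be reproduced with NumPy instead of by hand. This sketch assumes NumPy 1.22 or later, where `np.percentile` accepts a `method` argument.

- `'weibull'` places the p-th quantile at 1-based position p × (n + 1), which is Method 1.
- `'linear'`, the default, uses p × (n - 1) + 1, which is Method 3.

The sketch also computes the whisker ends described in the boxplot step.

```python
import numpy as np

data = np.array([10, 12, 12, 13, 14, 15, 16, 120])

results = {}
for method in ("weibull", "linear"):  # Method 1, then Method 3
    q1, q3 = np.percentile(data, [25, 75], method=method)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Whiskers reach the most extreme values that are still inside the bounds
    inside = data[(data >= lower) & (data <= upper)]
    outliers = data[(data < lower) | (data > upper)]

    results[method] = {
        "q1": float(q1),
        "q3": float(q3),
        "bounds": (float(lower), float(upper)),
        "whiskers": (int(inside.min()), int(inside.max())),
        "outliers": outliers.tolist(),
    }
    print(method, results[method])
```

The two methods give different quartiles and bounds. For this dataset the whiskers (10 and 16) and the single outlier (120) are the same.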

### When to Use the Boxplot Rule
Excellent for:

- Non-normal distributions (it doesn't assume normality)
- Skewed data (the quartiles, unlike the mean and standard deviation, are not distorted by extreme values)
- Quick visual analysis (boxplots are intuitive)
- Most real-world scenarios (a general-purpose method)

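Considerations:

- Different software may use different quartile calculation methods, so the bounds can differ slightly between tools.
- The 1.5 multiplier is conventional but can be adjusted (3.0 is commonly used to mark "far outliers").
- On very large datasets the rule flags a roughly fixed share of points (about 0.7% of normally distributed data), so it can flag many ordinary values.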
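The multiplier and the dataset size can be explored with a small simulation. It uses NumPy with a default-quantile setup; the seed and sample size are arbitrary choices.

The sketch draws one million values from a standard normal distribution, so there are no genuine outliers. It then counts how many points each multiplier flags. Going by the normal-distribution argument above, expect roughly 0.7% (several thousand points) at k = 1.5 and only a handful at k = 3.0.

```python
import numpy as np

# Illustrative setup: purely normal data, so every flagged point is an ordinary value
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

q1, q3 = np.percentile(sample, [25, 75])  # NumPy default (Method 3) quartiles
iqr = q3 - q1

flagged = {}
for k in (1.5, 3.0):
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged[k] = int(np.count_nonzero((sample < lower) | (sample > upper)))
    print(f"k = {k}: {flagged[k]} of {sample.size} points flagged "
          f"({flagged[k] / sample.size:.3%})")
```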

```python
import numpy as np
import pandas as pd


def _quantile_method1(sorted_data, p):
    """Quartile via Method 1: 1-based position p * (n + 1), linearly interpolated."""
    n = len(sorted_data)
    pos = p * (n + 1)
    # For very small samples the position can fall outside [1, n].
    # Clamp to the min/max (one common convention) instead of letting a
    # negative index wrap around or an index run past the end.
    if pos <= 1:
        return float(sorted_data[0])
    if pos >= n:
        return float(sorted_data[-1])
    below = int(pos)       # 1-based position of the value just below pos
    frac = pos - below     # fraction of the way to the next value
    lo, hi = sorted_data[below - 1], sorted_data[below]
    return float(lo + frac * (hi - lo))


def detect_outliers_iqr(data, method=1):
    """
    Detect outliers using the 1.5 × IQR method.

    method=1: quartile position p * (n + 1)      (Method 1 above)
    method=3: NumPy's default 'linear' method    (Method 3 above)

    Returns (outliers, q1, q3, iqr, lower_bound, upper_bound), where
    outliers is a list of (index, value) pairs.
    """
    data = np.asarray(data)

    # Calculate quartiles based on method
    if method == 1:
        sorted_data = np.sort(data)
        q1 = _quantile_method1(sorted_data, 0.25)
        q3 = _quantile_method1(sorted_data, 0.75)
    elif method == 3:
        q1, q3 = np.percentile(data, [25, 75], method='linear')
    else:
        raise ValueError("method must be 1 or 3")

    # Calculate IQR and bounds
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    # Identify outliers (strictly outside the bounds); .item() converts
    # NumPy scalars to plain Python numbers for readable output
    outliers = [(i, value.item()) for i, value in enumerate(data)
                if value < lower_bound or value > upper_bound]

    return outliers, q1, q3, iqr, lower_bound, upper_bound


# Example usage
data = [10, 12, 12, 13, 14, 15, 16, 120]
outliers, q1, q3, iqr, lower_bound, upper_bound = detect_outliers_iqr(data)

print(f"Q1: {q1:.2f}, Q3: {q3:.2f}, IQR: {iqr:.2f}")
print(f"Lower Bound: {lower_bound:.2f}, Upper Bound: {upper_bound:.2f}")
print(f"Outliers: {outliers}")

# Using pandas (whose default quantile interpolation matches Method 3)
df = pd.DataFrame({'values': data})
Q1 = df['values'].quantile(0.25)
Q3 = df['values'].quantile(0.75)
IQR = Q3 - Q1
pandas_lower = Q1 - 1.5 * IQR
pandas_upper = Q3 + 1.5 * IQR

print(f"\nPandas method - Lower: {pandas_lower:.2f}, Upper: {pandas_upper:.2f}")
```

Output:

```
Q1: 12.00, Q3: 15.75, IQR: 3.75
Lower Bound: 6.38, Upper Bound: 21.38
Outliers: [(7, 120)]

Pandas method - Lower: 7.12, Upper: 20.12
```
