

# Grubbs' Test

 Grubbs' Test is a formal statistical hypothesis test that determines whether the most extreme value in a dataset is a statistically significant outlier.

Note: It can detect only ONE outlier at a time. If you suspect multiple outliers, run it iteratively or use the Generalized ESD test.

The Test Statistic:


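```
G = |suspect value - mean| / standard deviation
```

Here the mean and standard deviation are the sample values (standard deviation with n - 1 in the denominator), computed from the full dataset including the suspect value.
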
Hypotheses:

H₀ (Null Hypothesis): There are no outliers in the dataset

H₁ (Alternative Hypothesis): There is exactly one outlier in the dataset

### How It Works
Identify the suspect value - the point farthest from the mean

Calculate the test statistic G

Compare G with the critical value from Grubbs' distribution

If G > critical value, reject H₀ and conclude the suspect is an outlier
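
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a full implementation: the helper name `grubbs_decision` is hypothetical, and the critical value is assumed to be supplied by the caller (for example, looked up in a Grubbs table) instead of being computed. The data and the value 2.215 are the ones used in the worked example in this document.

```python
import statistics


def grubbs_decision(data, g_critical):
    """Return (suspect, G, is_outlier) for a two-sided Grubbs' test.

    g_critical is assumed to come from a Grubbs table for the
    chosen sample size and significance level.
    """
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1)

    # Step 1: the suspect is the point farthest from the mean
    suspect = max(data, key=lambda x: abs(x - mean))

    # Step 2: test statistic
    g = abs(suspect - mean) / std

    # Steps 3-4: compare with the critical value and decide
    return suspect, g, g > g_critical


# 2.215 is the tabulated two-sided critical value for n = 9, alpha = 0.05
suspect, g, flagged = grubbs_decision([8, 10, 12, 13, 14, 19, 25, 30, 100], 2.215)
print(f"suspect = {suspect}, G = {g:.3f}, outlier = {flagged}")
```

Passing the critical value in keeps the sketch free of third-party dependencies; computing it from the t-distribution needs a statistics library.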

Critical Value Formula:


$$
G_{\text{critical}} = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^2_{\alpha/(2n),\,n-2}}{n-2+t^2_{\alpha/(2n),\,n-2}}}
$$

Where:

n = sample size

t = upper critical value of the t-distribution with n - 2 degrees of freedom, taken at tail probability α/(2n)

α = significance level (usually 0.05)
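
As a rough check of the formula, the critical value can be computed numerically. The sketch below assumes SciPy is available and uses `scipy.stats.t.ppf` for the t quantile. The function name `grubbs_critical` and its `two_sided` flag are illustrative choices, not part of any library.

```python
import math

from scipy import stats


def grubbs_critical(n, alpha=0.05, two_sided=True):
    """Critical value for Grubbs' test from the t-distribution formula."""
    if n < 3:
        raise ValueError("n must be at least 3")
    # Two-sided tests use tail probability alpha/(2n); one-sided tests use alpha/n
    tail_prob = alpha / (2 * n) if two_sided else alpha / n
    t = stats.t.ppf(1 - tail_prob, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t**2 / (n - 2 + t**2))


# Print a small table of two-sided critical values at alpha = 0.05
for n in (5, 9, 10, 20):
    print(f"n = {n:2d}: G_critical = {grubbs_critical(n):.3f}")
```

For n = 9 this should agree with the tabulated value of 2.215 quoted in the step-by-step example.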

### Step-by-Step Example
Let's work through a clear example:

```
Dataset: [8, 10, 12, 13, 14, 19, 25, 30, 100]
Parameters: α = 0.05
```

Step 1: Identify Suspect Value

Data: [8, 10, 12, 13, 14, 19, 25, 30, 100]

The value 100 lies farthest from the mean, so it is the suspect value

Step 2: Calculate Test Statistic

```
Mean (x̄) = (8+10+12+13+14+19+25+30+100)/9 = 25.67
Std Dev (s, sample) = 28.79

G = |100 - 25.67| / 28.79 = 74.33 / 28.79 = 2.58
```

Step 3: Find Critical Value

For n=9, α=0.05:

From Grubbs' table: G_critical = 2.215

Or calculate: tail probability for the t-value = α/(2n) = 0.05/(2×9) = 0.00278

Degrees of freedom = n-2 = 7

t = 3.947 (from t-distribution)

```
G_critical = (8/3) × √(3.947² / (7 + 3.947²)) = 2.667 × √(15.58 / 22.58) = 2.215
```

Step 4: Make Decision

```
G = 2.58 > G_critical = 2.215
```

Reject H₀ - 100 is a statistically significant outlier

### Two-Sided vs One-Sided Test
Grubbs' Test can detect outliers on both sides or just one:

Two-Sided Test (Most Common):

Tests for max OR min being outlier

```
G = max(|x_i - mean|) / std
```

One-Sided Test:

Upper: Only tests maximum value, G = (max - mean) / std

Lower: Only tests minimum value, G = (mean - min) / std

For a one-sided test, the critical value uses α/n in place of α/(2n)
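
To make the difference between the variants concrete, the three statistics can be computed side by side for the example dataset. The sketch below uses only the standard library, and the function name `grubbs_statistics` is hypothetical. It computes the statistics only, not the critical values.

```python
import statistics


def grubbs_statistics(data):
    """Return the upper, lower, and two-sided Grubbs statistics."""
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1)

    g_upper = (max(data) - mean) / std   # tests only the maximum
    g_lower = (mean - min(data)) / std   # tests only the minimum
    g_two_sided = max(g_upper, g_lower)  # largest absolute deviation
    return g_upper, g_lower, g_two_sided


data = [8, 10, 12, 13, 14, 19, 25, 30, 100]
g_upper, g_lower, g_two = grubbs_statistics(data)
print(f"upper: {g_upper:.3f}, lower: {g_lower:.3f}, two-sided: {g_two:.3f}")
```

The two-sided statistic always equals the larger of the two one-sided statistics, because the largest absolute deviation from the mean occurs at either the minimum or the maximum.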

### When to Use Grubbs' Test
Excellent for:

Formal statistical testing requiring rigor

Quality control processes

Scientific research and publications

Regulated industries (pharma, manufacturing)

Single outlier detection with normal data

#### Requirements:

Data should be approximately normally distributed

Only one outlier expected (or use iteratively)

Sample size typically 3 ≤ n ≤ 100

#### Limitations:

Only detects one outlier at a time

Masking effect: Multiple outliers can hide each other

Assumes normality

Requires iterative application for multiple outliers
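
The masking effect is easy to reproduce. In the sketch below, a second value of 100 is appended to the example dataset (a hypothetical modification made purely for illustration). The two-sided critical values 2.215 (n = 9) and 2.290 (n = 10) at α = 0.05 are assumed to come from a Grubbs table, and `grubbs_g` is an illustrative helper name.

```python
import statistics


def grubbs_g(data):
    """Two-sided Grubbs statistic: largest |x - mean| / sample std."""
    mean = statistics.mean(data)
    std = statistics.stdev(data)
    return max(abs(x - mean) for x in data) / std


# Tabulated two-sided critical values at alpha = 0.05
G_CRITICAL = {9: 2.215, 10: 2.290}

single = [8, 10, 12, 13, 14, 19, 25, 30, 100]
double = single + [100]  # hypothetical second outlier

for name, data in (("one outlier", single), ("two outliers", double)):
    g = grubbs_g(data)
    flagged = g > G_CRITICAL[len(data)]
    print(f"{name}: G = {g:.3f}, flagged = {flagged}")
```

With the second 100 present, the inflated mean and standard deviation pull G below the critical value, so neither 100 is flagged and an iterative procedure would stop at its first step.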

In [1]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def grubbs_test(data, alpha=0.05, test_type='two-sided'):
    """
    Perform Grubbs' Test for Outliers
    """
    data = np.array(data)
    n = len(data)
    
    if n < 3:
        raise ValueError("Grubbs' Test requires at least 3 data points")
    
    # Calculate basic statistics
    mean = np.mean(data)
    std = np.std(data, ddof=1)
    
    # Calculate test statistic based on test type
    if test_type == 'two-sided':
        deviations = np.abs(data - mean)
        max_dev_idx = np.argmax(deviations)
        g_calculated = deviations[max_dev_idx] / std
        suspect_value = data[max_dev_idx]
        
    elif test_type == 'upper':
        g_calculated = (np.max(data) - mean) / std
        suspect_value = np.max(data)
        max_dev_idx = np.argmax(data)
        
    elif test_type == 'lower':
        g_calculated = (mean - np.min(data)) / std
        suspect_value = np.min(data)
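        max_dev_idx = np.argmin(data)
        
    else:
        raise ValueError("test_type must be 'two-sided', 'upper', or 'lower'")
    
    # Calculate critical value (one-sided tests use alpha/n in place of alpha/(2n))
    tail_prob = alpha / (2 * n) if test_type == 'two-sided' else alpha / n
    t_critical = stats.t.ppf(1 - tail_prob, n - 2)
    g_critical = ((n - 1) / np.sqrt(n)) * np.sqrt(t_critical**2 / (n - 2 + t_critical**2))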
    
    # Make decision
    is_outlier = g_calculated > g_critical
    
    return {
        'suspect_value': suspect_value,
        'suspect_index': max_dev_idx,
        'test_statistic': g_calculated,
        'critical_value': g_critical,
        'is_outlier': is_outlier,
        'mean': mean,
        'std': std,
        'alpha': alpha,
        'test_type': test_type
    }

# Example 1: Clear outlier
print("=== Example 1: Clear Outlier ===")
data1 = [8, 10, 12, 13, 14, 19, 25, 30, 100]
result1 = grubbs_test(data1)

print(f"Data: {data1}")
print(f"Suspect value: {result1['suspect_value']}")
print(f"Test statistic (G): {result1['test_statistic']:.3f}")
print(f"Critical value: {result1['critical_value']:.3f}")
print(f"Outlier? {result1['is_outlier']}")
print(f"Decision: {'REJECT H0 - Outlier detected' if result1['is_outlier'] else 'FAIL TO REJECT H0 - No outlier'}")

# Example 2: Borderline case
print("\n=== Example 2: Borderline Case ===")
data2 = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30]  # 30 might be outlier
result2 = grubbs_test(data2)

print(f"Suspect value: {result2['suspect_value']}")
print(f"Test statistic (G): {result2['test_statistic']:.3f}")
print(f"Critical value: {result2['critical_value']:.3f}")
print(f"Outlier? {result2['is_outlier']}")

# Example 3: No outlier
print("\n=== Example 3: No Outlier ===")
data3 = [15, 16, 17, 18, 19, 20, 21, 22, 23]
result3 = grubbs_test(data3)

print(f"Suspect value: {result3['suspect_value']}")
print(f"Test statistic (G): {result3['test_statistic']:.3f}")
print(f"Critical value: {result3['critical_value']:.3f}")
print(f"Outlier? {result3['is_outlier']}")

# Iterative Grubbs' Test for multiple outliers
def iterative_grubbs_test(data, alpha=0.05, max_iterations=5):
    """
    Apply Grubbs' Test iteratively to find multiple outliers
    """
    # Work on NumPy arrays: mask-based indexing fails on a plain Python list
    current_data = np.array(data)
    # Position of each remaining point in the original data
    current_indices = np.arange(len(current_data))
    outliers = []
    
    for iteration in range(max_iterations):
        if len(current_data) < 3:
            break
            
        result = grubbs_test(current_data, alpha)
        
        if not result['is_outlier']:
            break
            
        # Map the suspect's current position back to its original index
        pos = result['suspect_index']
        
        outliers.append({
            'iteration': iteration + 1,
            'value': result['suspect_value'],
            'original_index': current_indices[pos],
            'test_statistic': result['test_statistic'],
            'critical_value': result['critical_value']
        })
        
        # Remove only the flagged point (not every duplicate of its value)
        current_data = np.delete(current_data, pos)
        current_indices = np.delete(current_indices, pos)
    
    return outliers

# Test iterative approach
print("\n=== Iterative Grubbs' Test ===")
data_multiple = [8, 10, 12, 13, 14, 19, 25, 30, 100, 150]  # Two outliers
outliers_found = iterative_grubbs_test(data_multiple)

print(f"Original data: {data_multiple}")
print(f"Outliers found: {len(outliers_found)}")
for outlier in outliers_found:
    print(f"Iteration {outlier['iteration']}: Value {outlier['value']} at index {outlier['original_index']}")

# Visualization
def plot_grubbs_results(data, result):
    plt.figure(figsize=(10, 4))
    
    # Plot data points
    colors = ['red' if i == result['suspect_index'] else 'blue' for i in range(len(data))]
    plt.scatter(range(len(data)), data, c=colors, s=100, alpha=0.7)
    
    # Add mean and std lines
    plt.axhline(y=result['mean'], color='green', linestyle='--', label=f'Mean: {result["mean"]:.2f}')
    plt.axhline(y=result['mean'] + result['std'], color='orange', linestyle=':', label='Mean ± 1σ')
    plt.axhline(y=result['mean'] - result['std'], color='orange', linestyle=':')
    
    plt.title(f"Grubbs' Test: G = {result['test_statistic']:.3f}, Critical = {result['critical_value']:.3f}")
    plt.xlabel('Data Index')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

plot_grubbs_results(data1, result1)

=== Example 1: Clear Outlier ===
Data: [8, 10, 12, 13, 14, 19, 25, 30, 100]
Suspect value: 100
Test statistic (G): 2.582
Critical value: 2.215
Outlier? True
Decision: REJECT H0 - Outlier detected

=== Example 2: Borderline Case ===
Suspect value: 30
Test statistic (G): 2.535
Critical value: 2.290
Outlier? True

=== Example 3: No Outlier ===
Suspect value: 15
Test statistic (G): 1.461
Critical value: 2.215
Outlier? False

=== Iterative Grubbs' Test ===
Original data: [8, 10, 12, 13, 14, 19, 25, 30, 100, 150]
Outliers found: 2
Iteration 1: Value 150 at index 9
Iteration 2: Value 100 at index 8