Alanperry1/HELIX
HELIX: Heteroskedasticity Evaluation via Localized Inference and eXtended models

HELIX is a from-scratch implementation of four heteroskedasticity tests that go beyond the standard Breusch-Pagan (BP) test: the classic BP test itself, White's general test, a neural network-based detector, and a novel sliding-window Local BP variant that pinpoints where in the feature space variance changes -- something no classical test can do.

Overview

Classical heteroskedasticity tests (Breusch-Pagan, White) are global and rely on asymptotic chi-squared distributions. HELIX extends them along three axes:

| Limitation of classical tests | How this project addresses it |
|---|---|
| Linear auxiliary regression only | HELIX Neural BP uses an MLP to detect arbitrary nonlinear variance patterns |
| Asymptotic p-values (break down at small n) | HELIX bootstrap permutation test gives exact finite-sample p-values |
| Global scope (one number for the whole dataset) | HELIX Local BP runs sliding-window tests, revealing where heteroskedasticity lives |

The Four Tests in HELIX

1. Classic Breusch-Pagan

Standard LM test. Regresses squared OLS residuals on the original predictors. Tests whether variance depends linearly on X.

  • Statistic: LM = n * R-squared from auxiliary regression
  • Distribution: chi-squared(k-1), where k counts the auxiliary regressors including the constant
  • Validated against statsmodels.het_breuschpagan (exact match)
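The mechanics can be sketched with plain NumPy. The repository's implementation lives in `bp_tester.py`; the standalone `breusch_pagan` function below is an illustrative sketch, not the project's code:

```python
import numpy as np
from scipy import stats

def breusch_pagan(X, y):
    """Classic BP LM test: regress squared OLS residuals on X, LM = n * R^2."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])          # add intercept
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # OLS fit
    u2 = (y - Xc @ beta) ** 2                      # squared residuals
    gamma, *_ = np.linalg.lstsq(Xc, u2, rcond=None)  # auxiliary regression
    ss_res = np.sum((u2 - Xc @ gamma) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    LM = n * (1 - ss_res / ss_tot)                 # n * R-squared
    df = X.shape[1]                                # predictors, excl. constant
    return LM, stats.chi2.sf(LM, df)
```

Under homoskedasticity the auxiliary R-squared is near zero, so LM stays small and the chi-squared p-value stays large.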

2. White's Test

Extends BP by adding squared terms and cross-products (via PolynomialFeatures(degree=2)) to the auxiliary regression. Catches nonlinear heteroskedasticity that BP misses (e.g., U-shaped variance).

  • Statistic: LM = n * R-squared from augmented auxiliary regression
  • Distribution: chi-squared(q-1), where q is the number of augmented auxiliary regressors (constant, original features, squares, and cross-products)
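A sketch of the same idea: the only change from classic BP is that the auxiliary regression runs on the `PolynomialFeatures(degree=2)` expansion. The `white_test` function here is illustrative, not the repository's implementation:

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import PolynomialFeatures

def white_test(X, y):
    """White's test: BP with squares and cross-products in the auxiliary step."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u2 = (y - Xc @ beta) ** 2                      # squared OLS residuals
    # Augment: constant, x_i, x_i^2, and x_i * x_j cross-products
    Z = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)
    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    ss_res = np.sum((u2 - Z @ gamma) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    LM = n * (1 - ss_res / ss_tot)
    df = Z.shape[1] - 1                            # q - 1
    return LM, stats.chi2.sf(LM, df)
```

Because the squared terms enter the auxiliary regression, a U-shaped variance pattern (variance high at both ends of a predictor's range) produces a large R-squared here even though linear BP sees nothing.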

3. Neural BP

Replaces the linear auxiliary regression with a 2-layer MLP (64, 32 hidden units). If the network can predict squared residuals from X, heteroskedasticity exists and may be arbitrarily nonlinear.

  • Uses bootstrap permutation (default 500 iterations) for the p-value
  • No distributional assumptions -- works on any sample size
  • Catches patterns that even White's test misses
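The permutation logic can be sketched as follows. This is an illustrative standalone function, not the repository's code, and it defaults to fewer iterations than the project's 500 for brevity:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def neural_bp(X, y, n_bootstrap=200, random_state=42):
    """Neural BP: can an MLP predict squared OLS residuals from X?"""
    rng = np.random.default_rng(random_state)
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u2 = (y - Xc @ beta) ** 2                      # squared OLS residuals

    def mlp_r2(target):
        net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=300,
                           random_state=random_state)
        return net.fit(X, target).score(X, target)

    observed = mlp_r2(u2)
    # Permutation null: shuffling u2 breaks any dependence on X, so the
    # null R2 values show what the statistic looks like under homoskedasticity.
    null = [mlp_r2(rng.permutation(u2)) for _ in range(n_bootstrap)]
    p = (1 + sum(r >= observed for r in null)) / (1 + n_bootstrap)
    return observed, p
```

The p-value is the fraction of permuted fits that match or beat the observed R-squared (with the usual +1 correction), so no chi-squared approximation is needed at any sample size.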

4. Local BP

Runs the classic BP test in sliding windows along a chosen predictor, with 50% overlap. Outputs a per-window p-value curve showing exactly which regions of the feature space are heteroskedastic and which are not.

  • Window size: configurable fraction of n (default 0.3)
  • Sort column: configurable (default: first predictor)
  • Unique capability: no other standard test tells you where the problem is
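A minimal sketch of the windowing scheme (illustrative, not the repository's implementation): sort by one predictor, slide a window covering a fixed fraction of n with 50% overlap, and run classic BP inside each window.

```python
import numpy as np
from scipy import stats

def local_bp(X, y, window_size=0.3, sort_col=0):
    """Sliding-window BP along one predictor; returns (window center, p) pairs."""
    n = len(y)
    order = np.argsort(X[:, sort_col])
    Xs, ys = X[order], y[order]
    w = max(int(window_size * n), X.shape[1] + 2)  # window length in samples
    step = max(w // 2, 1)                          # 50% overlap
    results = []
    for start in range(0, n - w + 1, step):
        Xw, yw = Xs[start:start + w], ys[start:start + w]
        Xc = np.column_stack([np.ones(w), Xw])
        beta, *_ = np.linalg.lstsq(Xc, yw, rcond=None)
        u2 = (yw - Xc @ beta) ** 2
        gamma, *_ = np.linalg.lstsq(Xc, u2, rcond=None)
        r2 = 1 - np.sum((u2 - Xc @ gamma) ** 2) / np.sum((u2 - u2.mean()) ** 2)
        LM = w * r2
        p = stats.chi2.sf(LM, Xw.shape[1])
        results.append((Xw[:, sort_col].mean(), p))  # center of window, p-value
    return results
```

Plotting the p-values against the window centers gives the per-window curve described above: dips below alpha mark the regions of the sort predictor where variance actually changes.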

Visualization

plot_diagnostics() produces a 2x2 diagnostic figure:

| Panel | Plot | What it shows |
|---|---|---|
| Top-left | Residuals vs Fitted | Classic residual plot with LOWESS trend |
| Top-right | Scale-Location | sqrt(abs(residuals)) vs fitted -- a rising trend indicates heteroskedasticity |
| Bottom-left | Local p-value curve | Per-window BP p-values with alpha threshold and shaded rejection regions |
| Bottom-right | Variance heatmap | Feature-space scatter colored by abs(residual) -- shows where variance concentrates |

Installation

pip install numpy scipy scikit-learn matplotlib

Optional (for cross-validation only):

pip install statsmodels

Quick Start

import numpy as np
from bp_tester import HeteroskedasticityTester

np.random.seed(42)
n = 200
x1 = np.random.uniform(0, 5, n)
x2 = np.random.uniform(0, 5, n)
X = np.column_stack([x1, x2])

eps = np.random.normal(0, 0.4 * x1)
y = 2 + 1.5 * x1 - 0.8 * x2 + eps

tester = HeteroskedasticityTester(X, y)
results = tester.run_all(alpha=0.05)

Run individual tests

tester.breusch_pagan()
tester.white_test()
tester.neural_bp(n_bootstrap=300)
tester.local_bp(window_size=0.25, sort_col=0)
tester.plot_diagnostics()

API Reference

HeteroskedasticityTester(X, y)

| Parameter | Type | Description |
|---|---|---|
| X | array (n, p) | Feature matrix without a constant column (added internally) |
| y | array (n,) | Response vector |

Methods

| Method | Returns | Key parameters |
|---|---|---|
| breusch_pagan() | {test, LM, p_value} | -- |
| white_test() | {test, LM, p_value} | -- |
| neural_bp(n_bootstrap, random_state) | {test, R2, bootstrap_p, n_bootstrap} | n_bootstrap=500, random_state=42 |
| local_bp(window_size, sort_col) | {test, sort_col, windows} | window_size=0.3, sort_col=0 |
| plot_diagnostics(alpha, sort_col) | matplotlib Figure | alpha=0.05, sort_col=0 |
| run_all(alpha, neural_bootstrap, sort_col) | dict of all results | Runs everything, prints a summary, shows plots |

Project Structure

BP test/
    bp_tester.py       Core module -- HeteroskedasticityTester class (the HELIX engine)
    demo.ipynb         Demo notebook with synthetic data, full walkthrough, and comparison figures
    research_paper.md  Research paper on HELIX findings and methodology
    figures/           Auto-generated diagnostic figures from the notebook
    README.md          This file
    .gitignore

What to Do When Heteroskedasticity Is Detected

| Strategy | When to use it |
|---|---|
| HC3 robust standard errors | Quick fix -- keeps OLS coefficients, corrects SEs |
| WLS (Weighted Least Squares) | When you can estimate the variance function |
| Log-transform y | When variance grows proportionally with the mean |
| GLS (Generalized Least Squares) | Full generalized error structure |
| Model the variance explicitly | GARCH (time series), evidential regression |
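The quick fix is worth seeing concretely. HC3 keeps the OLS point estimates and replaces the covariance with the sandwich (X'X)^-1 X' diag(u_i^2 / (1 - h_ii)^2) X (X'X)^-1, where h_ii are the leverages. A minimal NumPy sketch (the function name is ours, not part of HELIX):

```python
import numpy as np

def hc3_standard_errors(X, y):
    """OLS coefficients with HC3 heteroskedasticity-robust standard errors."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])          # add intercept
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    beta = XtX_inv @ Xc.T @ y                      # ordinary OLS estimates
    u = y - Xc @ beta                              # residuals
    h = np.einsum('ij,jk,ik->i', Xc, XtX_inv, Xc)  # leverages h_ii
    # Sandwich "meat": X' diag(u_i^2 / (1 - h_ii)^2) X
    meat = Xc.T @ (Xc * (u ** 2 / (1 - h) ** 2)[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

Equivalently, if statsmodels is installed, `sm.OLS(y, X).fit(cov_type="HC3")` computes the same correction; either way the coefficients are unchanged and only the standard errors move.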

License

MIT
