In [None]:
# Example: Breusch-Pagan Test in Python
# For this example, we’ll use the following dataset, which describes the attributes of 10 basketball players:

In [3]:
import numpy as np
import pandas as pd

#create dataset
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#view dataset
df

   rating  points  assists  rebounds
0      90      25        5        11
1      85      20        7         8
2      82      14        7        10
3      88      16        8         6
4      94      27        5         6
5      90      20        7         9
6      76      12        6         6
7      75      15        9        10
8      87      14        9        10
9      86      19        5         7


# Step 1: Fit a multiple linear regression model.
# First, we’ll regress rating on points, assists, and rebounds:

Dependent variable: The dependent variable is the one whose value depends on the other variables.
In this regression analysis, Y (rating) is the dependent variable because we want to analyse the effect of
X (points, assists, rebounds) on Y (rating).
Model: Ordinary Least Squares (OLS) is the most widely used method for estimating a linear regression.
Under the classical assumptions, one of which is constant error variance (homoscedasticity), it gives the best linear unbiased estimate of the true population regression line. The principle of OLS is to minimize the sum of squared residuals, $\sum_i e_i^2$.
Number of observations: The number of observations is the size of our sample, i.e. N = 10.
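
As a rough illustration of the least-squares principle described above, the sketch below solves the same regression with NumPy alone. The names X, y, beta, sse and nudged are illustrative and not part of the original notebook, and np.linalg.lstsq is used only as a stand-in for the fitting step, not as a description of how statsmodels is implemented.

```python
import numpy as np

# Same data as the dataset above
rating = np.array([90, 85, 82, 88, 94, 90, 76, 75, 87, 86], dtype=float)
points = np.array([25, 20, 14, 16, 27, 20, 12, 15, 14, 19], dtype=float)
assists = np.array([5, 7, 7, 8, 5, 7, 6, 9, 9, 5], dtype=float)
rebounds = np.array([11, 8, 10, 6, 6, 9, 6, 10, 10, 7], dtype=float)

# Design matrix: a column of ones (intercept) followed by the three predictors
X = np.column_stack([np.ones_like(rating), points, assists, rebounds])
y = rating

# Least-squares solution: the coefficients that minimize the sum of squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)


def sse(coefs):
    """Sum of squared residuals for a given coefficient vector."""
    residuals = y - X @ coefs
    return float(residuals @ residuals)


print("coefficients (intercept, points, assists, rebounds):", beta)
print("minimized sum of squared residuals:", sse(beta))

# Moving a coefficient away from the solution increases the sum of squares
nudged = beta + np.array([0.0, 0.1, 0.0, 0.0])
print("sum of squares after nudging the points coefficient:", sse(nudged))
```

The last two printed values show the idea behind OLS: any coefficient vector other than the least-squares solution gives a larger sum of squared residuals.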



In [4]:
import statsmodels.formula.api as smf

#fit regression model
fit = smf.ols('rating ~ points+assists+rebounds', data=df).fit()

#view model summary
print(fit.summary())

                            OLS Regression Results                            
Dep. Variable:                 rating   R-squared:                       0.623
Model:                            OLS   Adj. R-squared:                  0.434
Method:                 Least Squares   F-statistic:                     3.299
Date:                Tue, 10 Dec 2024   Prob (F-statistic):             0.0995
Time:                        19:01:19   Log-Likelihood:                -26.862
No. Observations:                  10   AIC:                             61.72
Df Residuals:                       6   BIC:                             62.93
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     62.4716     14.588      4.282      0.0



In [7]:
# Next, we’ll perform a Breusch-Pagan test to determine if heteroscedasticity is present.
# Use het_breuschpagan(fit.resid, fit.model.exog) to perform the Breusch-Pagan test.
# fit.resid: residuals from the fitted model.
# fit.model.exog: design matrix of the model (the intercept column plus the independent variables).
from statsmodels.stats.diagnostic import het_breuschpagan

#perform Breusch-Pagan test
names = ['Lagrange multiplier statistic', 'p-value', 'f-value', 'f p-value']
test = het_breuschpagan(fit.resid, fit.model.exog)

(names, test)

(['Lagrange multiplier statistic', 'p-value', 'f-value', 'f p-value'],
 (6.003951995818434,
  0.11141811013399579,
  3.004944880309619,
  0.11663863538255272))

A Breusch-Pagan test uses the following null and alternative hypotheses:

The null hypothesis (H0): Homoscedasticity is present (the error variance is constant across observations).

The alternative hypothesis (Ha): Homoscedasticity is not present (i.e. heteroscedasticity exists).
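
To make the test less of a black box, here is a minimal sketch of how a Lagrange multiplier statistic of this kind can be computed by hand. It assumes the studentized (Koenker) form of the test, which we believe matches the default behaviour of het_breuschpagan: regress the squared residuals on the same design matrix, then multiply the R-squared of that auxiliary regression by the number of observations. The helper r_squared and the variable names are illustrative, not part of statsmodels.

```python
import numpy as np

# Same data as the dataset above
rating = np.array([90, 85, 82, 88, 94, 90, 76, 75, 87, 86], dtype=float)
points = np.array([25, 20, 14, 16, 27, 20, 12, 15, 14, 19], dtype=float)
assists = np.array([5, 7, 7, 8, 5, 7, 6, 9, 9, 5], dtype=float)
rebounds = np.array([11, 8, 10, 6, 6, 9, 6, 10, 10, 7], dtype=float)

# Design matrix with an intercept column, mirroring fit.model.exog
X = np.column_stack([np.ones_like(rating), points, assists, rebounds])


def r_squared(design, response):
    """R-squared of a least-squares regression of response on design (hypothetical helper)."""
    coefs, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coefs
    centered = response - response.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)


# Step 1: residuals from the original regression of rating on the predictors
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
resid = rating - X @ beta

# Step 2: auxiliary regression of the squared residuals on the same design matrix
aux_r2 = r_squared(X, resid ** 2)

# Step 3: LM statistic = n * R-squared of the auxiliary regression
n = len(rating)
lm_stat = n * aux_r2
print("Lagrange multiplier statistic:", lm_stat)
```

Under the null hypothesis, this statistic is compared with a chi-square distribution whose degrees of freedom equal the number of predictors (3 here). A large value means the squared residuals are well explained by the predictors, which is evidence against constant variance.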


In this example, the Lagrange multiplier statistic for the test is 6.004 and the corresponding p-value is 0.1114.
Because this p-value is not less than 0.05, we fail to reject the null hypothesis. 
We do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
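
The decision rule above can also be written out in code. The sketch below copies the four values reported by het_breuschpagan in this example and applies the 0.05 threshold; the names alpha, results, reject_null and conclusion are illustrative.

```python
# Values copied from the het_breuschpagan output above
names = ['Lagrange multiplier statistic', 'p-value', 'f-value', 'f p-value']
test = (6.003951995818434,
        0.11141811013399579,
        3.004944880309619,
        0.11663863538255272)

alpha = 0.05  # significance level used in the text

# Pair each label with its value for easier reading
results = dict(zip(names, test))
for name, value in results.items():
    print(f"{name}: {value:.4f}")

# Reject H0 (homoscedasticity) only if the p-value is below alpha
reject_null = results['p-value'] < alpha
if reject_null:
    conclusion = "Reject H0: evidence of heteroscedasticity."
else:
    conclusion = "Fail to reject H0: insufficient evidence of heteroscedasticity."
print(conclusion)
```

Both p-values (0.1114 for the Lagrange multiplier version and 0.1166 for the F version) are above 0.05, so either form of the test leads to the same conclusion here.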