### Interpretation

One of the key advantages of linear regression is its interpretability. Each coefficient directly represents the size and direction of the relationship between a predictor and the target variable, holding other predictors constant. This makes it straightforward to assess which factors are most influential and whether their effect is positive or negative. In contrast, many machine learning algorithms such as neural networks may achieve higher predictive accuracy, but often function as 'black boxes'. While they can model complex, nonlinear relationships, it is usually much harder to disentangle and explain the contribution of each predictor to the final prediction. Linear regression models, by comparison, provide a transparent, human-readable representation of the underlying process. This makes them particularly valuable not only for prediction but also for developing a deeper understanding of the phenomena under study.

In the example below we fit a simple linear model and then discuss each element of the summary.


In [None]:
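import numpy as np
import statsmodels.api as sm

n = 1000  # number of observations
X = np.random.normal(size=(n, 2))  # two independent standard-normal predictors
X1 = X[:, 0]
X2 = X[:, 1]
eps = np.random.normal(size=n)  # noise term
y = 1 + 2 * X1 + 3 * X2 + eps  # true intercept 1, true slopes 2 and 3

X_with_const = sm.add_constant(X)  # prepend a column of ones for the intercept
model = sm.OLS(y, X_with_const)
results = model.fit()

print(results.summary())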


                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.929
Model:                            OLS   Adj. R-squared:                  0.929
Method:                 Least Squares   F-statistic:                     6522.
Date:                Tue, 02 Sep 2025   Prob (F-statistic):               0.00
Time:                        10:14:28   Log-Likelihood:                -1414.8
No. Observations:                1000   AIC:                             2836.
Df Residuals:                     997   BIC:                             2850.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.9918      0.032     31.432      0.0

#### Summary
This section provides the following summary statistics about the model:
- Dep. Variable: The name of the response variable (y in this case).
- Model: The type of regression used (OLS = ordinary least squares).
- Method: How the parameters were estimated (least squares).
- No. Observations: Number of data points used (1000).
- Df Residuals: Degrees of freedom for residuals = (observations − number of parameters estimated). Here, 1000 − 3 = 997.
- Df Model: Number of predictors included (2 in this case: x1 and x2).
- R-squared: Proportion of variance in y explained by the predictors. Here 0.929 means the model explains 92.9% of the variation.
- Adj. R-squared: Adjusted R² accounts for the number of predictors, penalising unnecessary variables.
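- F-statistic: Tests the null hypothesis that all slope coefficients are zero, i.e. whether the model as a whole is statistically significant (a very large value, as here, is strong evidence that at least one predictor matters).
- Prob (F-statistic): The p-value for the F-test (0.00 means the p-value is smaller than the displayed precision, so the model is highly significant).
- Log-Likelihood: A measure of model fit; higher (less negative) is better.
- AIC/BIC: Model selection criteria that trade off fit against complexity. When comparing models fitted to the same data, lower values are preferred.
- Covariance Type: The method used to compute the standard errors (here nonrobust, which assumes the errors have constant variance).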
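To make these definitions concrete, the sketch below recomputes several of the summary statistics by hand with NumPy. It regenerates data in the same way as the example, but with a fixed seed (an assumption, since the original run was unseeded), so the numbers will differ slightly from the output shown above. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the original example is unseeded
n = 1000
X = rng.normal(size=(n, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=n)

# Design matrix with an intercept column, fitted by ordinary least squares
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

k = A.shape[1] - 1      # Df Model: number of predictors (2)
df_resid = n - (k + 1)  # Df Residuals: 1000 - 3 = 997

ss_res = np.sum(resid ** 2)           # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = ((ss_tot - ss_res) / k) / (ss_res / df_resid)

# Gaussian log-likelihood evaluated at the variance estimate ss_res / n
log_lik = -n / 2 * (np.log(2 * np.pi) + np.log(ss_res / n) + 1)
aic = 2 * (k + 1) - 2 * log_lik
bic = np.log(n) * (k + 1) - 2 * log_lik

print(f"R-squared:      {r2:.3f}")
print(f"Adj. R-squared: {adj_r2:.3f}")
print(f"F-statistic:    {f_stat:.1f}")
print(f"Log-Likelihood: {log_lik:.1f}")
print(f"AIC: {aic:.1f}  BIC: {bic:.1f}")
```

The fitted statsmodels results object exposes the same quantities as attributes such as `results.rsquared`, `results.rsquared_adj`, `results.fvalue`, `results.llf`, `results.aic` and `results.bic`.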

#### Coefficients
This section provides information about the estimates of the coefficients:
- coef: Model estimate of the coefficient
- std err: Standard error in the estimate of the coefficient
- t: t-statistic (coef divided by std err) for whether the coefficient is statistically different from 0
- P>|t|: p-value for the test of whether the coefficient is statistically different from 0
- [0.025: Lower bound of 95% confidence interval for coefficient
- 0.975]: Upper bound of 95% confidence interval for coefficient

When interpreting the coefficients, it is important to check whether they are statistically significant. If a coefficient is not significant, the data do not provide enough evidence that the predictor has an effect on the target, and the predictor may be omitted. It is also useful to check the size of the effect (i.e. the size of the coefficient): a predictor with only a very small effect may also be worth omitting.

We interpret the coefficients as follows:
- "x1 has a positive impact on y: a unit increase in x1 leads on average to a 2.0130 increase in y, holding all else equal"
- "x2 has a positive impact on y: a unit increase in x2 leads on average to a 2.9594 increase in y, holding all else equal"
- "If all predictors are set to 0, on average we predict y to be 0.9918 (i.e. the constant term)"

Note that the interpretation of the constant term may not always make logical sense in the real world (e.g. if one of the predictors were height, a height of 0 would not be meaningful).

#### Test Statistics & Diagnostics
These statistics help assess whether the assumptions of OLS regression hold and whether the model is reliable:
- Omnibus: A combined test for skewness and kurtosis in the residuals. A low value (and high p-value) suggests the residuals are approximately normally distributed.
- Prob(Omnibus): The p-value for the Omnibus test. A high value (> 0.05) indicates we fail to reject normality of residuals.
- Jarque-Bera (JB): Another test of whether the residuals have skewness and kurtosis consistent with a normal distribution. Like Omnibus, a high p-value suggests normal residuals.
- Skew: Measures asymmetry in the distribution of residuals. A value close to 0 indicates symmetry. Positive skew means a long right tail; negative skew means a long left tail.
- Kurtosis: Measures how heavy the tails of the residual distribution are relative to a normal distribution (often loosely described as “peakedness”). A value close to 3 indicates normal-like tails. Values > 3 mean heavier tails, values < 3 mean lighter tails.
- Durbin-Watson: Tests for autocorrelation (correlation between successive residuals). The statistic ranges from 0 to 4, and a value of 2 indicates no autocorrelation. Values < 2 suggest positive autocorrelation, values > 2 suggest negative autocorrelation.
- Cond. No. (Condition Number): Indicates multicollinearity (how correlated predictors are). Small values (near 1) mean the predictors are close to uncorrelated. Large values (typically > 30) indicate potential multicollinearity problems, although predictors measured on very different scales can also inflate this number.
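Most of these diagnostics are simple functions of the residuals and the design matrix. The sketch below recomputes them by hand with NumPy under an assumed seed. The Omnibus test is left out because its formula is more involved. The Jarque-Bera p-value uses the fact that a chi-squared variable with 2 degrees of freedom has survival function exp(-x/2).

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the original example is unseeded
n = 1000
X = rng.normal(size=(n, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(size=n)

# Fit by ordinary least squares and take the residuals
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Skew and (non-excess) kurtosis from central moments of the residuals
centered = resid - resid.mean()
m2 = np.mean(centered ** 2)
skew = np.mean(centered ** 3) / m2 ** 1.5
kurtosis = np.mean(centered ** 4) / m2 ** 2  # a normal distribution gives 3

# Jarque-Bera statistic and its chi-squared(2) p-value
jb = n / 6 * (skew ** 2 + (kurtosis - 3) ** 2 / 4)
jb_pvalue = np.exp(-jb / 2)

# Durbin-Watson: squared successive differences relative to squared residuals
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Condition number: ratio of largest to smallest singular value of the design matrix
cond_no = np.linalg.cond(A)

print(f"Skew: {skew:.3f}  Kurtosis: {kurtosis:.3f}")
print(f"Jarque-Bera: {jb:.3f}  Prob(JB): {jb_pvalue:.3f}")
print(f"Durbin-Watson: {dw:.3f}")
print(f"Cond. No.: {cond_no:.2f}")
```

For ready-made versions, `statsmodels.stats.stattools` provides `durbin_watson`, `jarque_bera` and `omni_normtest`, each of which takes the residuals as input.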