# Phase 2: OLS Replication (SPSS-Equivalent)

This notebook uses Python to replicate the multiple linear regression analysis originally conducted in SPSS, before any methodological improvements are applied.

## Data Source

This analysis uses the cleaned and constructed dataset produced in `01_data_cleaning.ipynb`.
All composite variables and preprocessing steps are documented there.

In [1]:
# Imports
import pandas as pd
import statsmodels.api as sm

In [2]:
# Load cleaned data

df = pd.read_csv("E:/kaust_fellowship_bootcamp/projects/undergraduate_thesis_python/data/processed/ta_christian_constructed.csv")

In [3]:
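# Note: a notebook cell displays only its last expression, so the bare df.shape below shows nothing
df.shape
df[["X1","X2","X3","Y"]].head()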

|  | X1 | X2 | X3 | Y |
|---|---|---|---|---|
| 0 | 19.0 | 15.0 | 17.0 | 11.0 |
| 1 | 15.0 | 12.0 | 15.0 | 13.0 |
| 2 | 20.0 | 14.0 | 15.0 | 10.0 |
| 3 | 20.0 | 18.0 | 20.0 | 17.0 |
| 4 | 16.0 | 15.0 | 15.0 | 15.0 |


In [4]:
X = df[["X1", "X2", "X3"]]
X = sm.add_constant(X)   # intercept
y = df["Y"]

ols_model = sm.OLS(y, X).fit()

In [5]:
ols_model.summary()

|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.136 |
| Model: | OLS | Adj. R-squared: | 0.102 |
| Method: | Least Squares | F-statistic: | 3.941 |
| Date: | Tue, 16 Dec 2025 | Prob (F-statistic): | 0.0115 |
| Time: | 16:17:44 | Log-Likelihood: | -178.08 |
| No. Observations: | 79 | AIC: | 364.2 |
| Df Residuals: | 75 | BIC: | 373.6 |
| Df Model: | 3 |  |  |
| Covariance Type: | nonrobust |  |  |

|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 8.8652 | 2.510 | 3.532 | 0.001 | 3.865 | 13.865 |
| X1 | -0.0688 | 0.138 | -0.500 | 0.618 | -0.343 | 0.205 |
| X2 | 0.3471 | 0.101 | 3.431 | 0.001 | 0.146 | 0.549 |
| X3 | -0.0617 | 0.172 | -0.359 | 0.720 | -0.404 | 0.280 |

|  |  |  |  |
|---|---|---|---|
| Omnibus: | 0.422 | Durbin-Watson: | 1.825 |
| Prob(Omnibus): | 0.81 | Jarque-Bera (JB): | 0.584 |
| Skew: | 0.12 | Prob(JB): | 0.747 |
| Kurtosis: | 2.654 | Cond. No. | 249.0 |
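
As a quick sanity check, several figures in the summary above can be recomputed from one another using the standard OLS identities. The sketch below uses only the rounded values printed in the summary, so the results should agree with the reported figures only up to rounding; the variable names are chosen here for illustration.

```python
import math

# Rounded values copied from the regression summary above.
n_obs = 79            # No. Observations
k = 3                 # Df Model (number of predictors, excluding the intercept)
r2 = 0.136            # R-squared
log_lik = -178.08     # Log-Likelihood

df_resid = n_obs - k - 1  # residual degrees of freedom

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid

# Overall F-statistic expressed in terms of R-squared.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

# Information criteria; the parameter count includes the intercept.
n_params = k + 1
aic = -2 * log_lik + 2 * n_params
bic = -2 * log_lik + n_params * math.log(n_obs)

print(f"Df Residuals: {df_resid}")
print(f"Adj. R-squared: {adj_r2:.3f}")
print(f"F-statistic: {f_stat:.3f}")
print(f"AIC: {aic:.1f}  BIC: {bic:.1f}")
```

Small gaps between these recomputed values and the reported ones are expected, because the inputs are themselves rounded to three decimals.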


## Comparison with SPSS Results

The OLS regression estimated in Python reproduces the coefficient estimates, standard errors, and significance levels obtained in SPSS. The sign, magnitude, and statistical significance of every coefficient agree up to rounding differences, which confirms that the Python implementation replicates the original analysis.
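
One way to make an "identical up to rounding" check explicit is to compare each Python estimate with the reference value at the precision the reference is reported in. The following is a minimal sketch under stated assumptions: `compare_to_reference` is a hypothetical helper, the reference dictionary would have to be filled in by hand from the SPSS output (no SPSS values are reproduced here), and the demo numbers are placeholders rather than results.

```python
import pandas as pd


def compare_to_reference(python_values, reference, decimals=3):
    """Flag whether each estimate matches a reference value reported to `decimals` places.

    `python_values`: pandas Series of estimates (e.g. a fitted model's `params`).
    `reference`: dict of term name -> value typed in from the reference output.
    Hypothetical helper for illustration; not part of pandas or statsmodels.
    """
    ref = pd.Series(reference, dtype=float).reindex(python_values.index)
    # A value rounded to `decimals` places can differ from the unrounded
    # value by at most half a unit in the last reported place.
    tol = 0.5 * 10 ** (-decimals) + 1e-12
    abs_diff = (python_values - ref).abs()
    return pd.DataFrame(
        {
            "python": python_values,
            "reference": ref,
            "abs_diff": abs_diff,
            "match": abs_diff <= tol,  # a missing reference value yields False
        }
    )


# Placeholder numbers only, to show the call pattern (not SPSS or thesis results).
demo_estimates = pd.Series({"const": 1.23456, "X1": -0.98765})
demo_reference = {"const": 1.235, "X1": -0.988}

demo_check = compare_to_reference(demo_estimates, demo_reference, decimals=3)
print(demo_check)
```

With the fitted model from this notebook, the call would look like `compare_to_reference(ols_model.params, spss_coefs)`, where `spss_coefs` is a dictionary of coefficients transcribed from the SPSS table.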


**Status:** Phase 2 complete. The baseline OLS results match the original SPSS analysis.