In [38]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_diabetes

In [39]:
# Load the diabetes dataset, split it into features (x) and target (y), and get the feature names
diabetes = load_diabetes()
x = diabetes.data
y = diabetes.target
featureNames = diabetes.feature_names

# Prepend an intercept column and wrap the result in a DataFrame so the summary shows feature names
xWithConst = sm.add_constant(x)
xDF = pd.DataFrame(xWithConst, columns=['const'] + list(featureNames))

# Create and fit the model
model = sm.OLS(y, xDF)
results = model.fit()

# Collect the key per-feature statistics in a compact table
specificSummary = pd.DataFrame({
    'Coefficient': results.params,
    'Std Error': results.bse,
    't-value': results.tvalues,
    'P-value': results.pvalues,
})
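As a sketch of what `sm.OLS(...).fit()` computes with the default nonrobust covariance, the coefficients, standard errors and t-values can be reproduced with plain NumPy. The helper `ols_by_hand` and the synthetic data below are illustrative assumptions, not part of the notebook:

```python
import numpy as np


def ols_by_hand(X, y):
    """Return coefficients, standard errors and t-values for y ~ X.

    Assumes X already contains an intercept column, as produced by
    sm.add_constant in the notebook. Hypothetical helper for illustration.
    """
    n, p = X.shape
    # Least-squares coefficients (more stable than inverting X'X to solve)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Residual variance with n - p degrees of freedom ("Df Residuals")
    sigma2 = residuals @ residuals / (n - p)
    # Classical (nonrobust) covariance of the estimates: sigma^2 (X'X)^-1
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


# Synthetic check with known coefficients: y = 3 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
n = 500
x_syn = rng.normal(size=(n, 2))
X_syn = np.column_stack([np.ones(n), x_syn])
y_syn = 3 + 2 * x_syn[:, 0] - 1 * x_syn[:, 1] + rng.normal(scale=0.5, size=n)

beta, se, t = ols_by_hand(X_syn, y_syn)
print(np.round(beta, 2))
```

Applied to the notebook's design matrix, `ols_by_hand(xWithConst, y)` should agree with `results.params`, `results.bse` and `results.tvalues` up to floating-point error.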

In [40]:
# statsmodels also provides a complete built-in summary of the fitted model
print("\nLinear Regression Model Summary: \n")
print(results.summary())


Linear Regression Model Summary: 

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.518
Model:                            OLS   Adj. R-squared:                  0.507
Method:                 Least Squares   F-statistic:                     46.27
Date:                Tue, 22 Oct 2024   Prob (F-statistic):           3.83e-62
Time:                        08:14:46   Log-Likelihood:                -2386.0
No. Observations:                 442   AIC:                             4794.
Df Residuals:                     431   BIC:                             4839.
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        152
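The header statistics above follow from R², the number of observations n = 442 and the number of predictors k = 10 ("Df Model"), with n - k - 1 = 431 residual degrees of freedom. A minimal sketch of the standard formulas, using the rounded R² from the header (the function names are illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-test of H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k, r2 = 442, 10, 0.518  # values read from the summary header
print(round(adjusted_r2(r2, n, k), 3))  # 0.507
print(round(f_statistic(r2, n, k), 2))  # 46.32
```

The small gap to the reported F-statistic of 46.27 comes from using R² rounded to three decimals.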

In [41]:
print("\nSpecific Features Statistics Summary: \n")
print(specificSummary)


Specific Features Statistics Summary: 

       Coefficient   Std Error    t-value        P-value
const   152.133484    2.575854  59.061366  1.010082e-208
age     -10.009866   59.749247  -0.167531   8.670306e-01
sex    -239.815644   61.222344  -3.917126   1.041671e-04
bmi     519.845920   66.533445   7.813302   4.296391e-14
bp      324.384646   65.421992   4.958343   1.024278e-06
s1     -792.175639  416.679870  -1.901161   5.794761e-02
s2      476.739021  339.030495   1.406183   1.603902e-01
s3      101.043268  212.531457   0.475427   6.347233e-01
s4      177.063238  161.475795   1.096531   2.734587e-01
s5      751.273700  171.899982   4.370412   1.555899e-05
s6       67.626692   65.984282   1.024891   3.059895e-01
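Each t-value in this table is the coefficient divided by its standard error, and the p-value is the two-sided tail probability of a Student t distribution with 431 residual degrees of freedom. A small sketch that recomputes two rows from the printed numbers, assuming SciPy is available (`t_and_p` is a hypothetical helper):

```python
from scipy import stats

DF_RESID = 431  # "Df Residuals": 442 observations minus 11 parameters


def t_and_p(coef, se, df=DF_RESID):
    """t statistic and two-sided p-value for H0: coefficient = 0."""
    t = coef / se
    # Two-sided p-value from the t distribution with df degrees of freedom
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


# bmi row: strongly significant
t_bmi, p_bmi = t_and_p(519.845920, 66.533445)
# age row: clearly not significant
t_age, p_age = t_and_p(-10.009866, 59.749247)

print(f"bmi: t = {t_bmi:.4f}, p = {p_bmi:.3e}")
print(f"age: t = {t_age:.4f}, p = {p_age:.4f}")
```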


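To conclude, in this regression model the variables sex, bmi (body mass index), bp (blood pressure), and s5 are highly significant: their |t|-values are large (roughly 3.9 to 7.8) and their p-values are well below 0.001, indicating a clear effect on the target (disease progression one year after baseline) when the other predictors are held fixed. In contrast, age, s1, s2, s3, s4, and s6 are not statistically significant at the 5% level: their |t|-values are below 2 and their 95% confidence intervals include zero, with s1 being borderline (p ≈ 0.058). This means the data provide no clear evidence of an effect for these variables once the others are in the model, rather than proving they are irrelevant; the large standard errors of s1 and s2 are consistent with strong correlation among the serum measurements.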
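The link between "p-value below 0.05" and "95% confidence interval excludes zero" can be checked directly from the table: each interval is the coefficient plus or minus t₀.₉₇₅,₄₃₁ times the standard error. A sketch using the printed coefficients and standard errors, assuming SciPy is available:

```python
from scipy import stats

DF_RESID = 431
# (coefficient, std error) pairs copied from the per-feature table
table = {
    "age": (-10.009866, 59.749247),
    "sex": (-239.815644, 61.222344),
    "bmi": (519.845920, 66.533445),
    "bp": (324.384646, 65.421992),
    "s1": (-792.175639, 416.679870),
    "s2": (476.739021, 339.030495),
    "s3": (101.043268, 212.531457),
    "s4": (177.063238, 161.475795),
    "s5": (751.273700, 171.899982),
    "s6": (67.626692, 65.984282),
}

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, DF_RESID)  # roughly 1.97 for 431 df

significant = []
for name, (coef, se) in table.items():
    lower, upper = coef - t_crit * se, coef + t_crit * se
    # The interval excludes zero exactly when the two-sided p-value < alpha
    excludes_zero = lower > 0 or upper < 0
    if excludes_zero:
        significant.append(name)
    print(f"{name:>4}: [{lower:9.2f}, {upper:9.2f}]  significant={excludes_zero}")

print(significant)  # ['sex', 'bmi', 'bp', 's5']
```

In the notebook itself, `results.conf_int(alpha=0.05)` returns the same bounds for every parameter without retyping the table.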