# Example of Linear Regression 

In this notebook, we use linear regression to summarize the relationship between the target and the features of the scikit-learn diabetes dataset. We use the statsmodels library to fit the linear regression model.

In [24]:
# import libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import datasets
import statsmodels.api as sm

In [5]:
# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

In [6]:
# Fit an OLS model: statsmodels expects the target (endog) first, then the features (exog)
mod = sm.OLS(diabetes_y, diabetes_X)
res = mod.fit()

In [15]:
# Inspect the feature matrix
diabetes_X

array([-0.00188202, -0.04464164, -0.05147406, -0.02632753, -0.00844872,
       -0.01916334,  0.07441156, -0.03949338, -0.06833155, -0.09220405])

In [19]:
# Build one list of observations per feature:
# transposing the (442, 10) matrix turns each feature column into a row
list_feature = [list(column) for column in diabetes_X.T]

In [29]:
# create a DataFrame

## features (one column each)
diabetes = pd.DataFrame(
    {
        f'feature_{i}':list_feature[i] for i in range(len(list_feature))
    }
)

## target
diabetes['target'] = list(diabetes_y)
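Because `diabetes_X` already has one column per feature, the transpose and dictionary steps above can be collapsed into a single `pd.DataFrame` call. The sketch below keeps the notebook's `feature_i` naming; the names `diabetes_alt`, `bunch` and `named` are illustrative only. scikit-learn can also return a DataFrame directly with `as_frame=True`, although its columns then carry the dataset's own names (`age`, `sex`, `bmi`, ...) instead of `feature_i`.

```python
import pandas as pd
from sklearn import datasets

diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# One column per feature, named feature_0 ... feature_9 as in the notebook
diabetes_alt = pd.DataFrame(
    diabetes_X,
    columns=[f'feature_{i}' for i in range(diabetes_X.shape[1])],
)
diabetes_alt['target'] = diabetes_y

# Alternative: let scikit-learn build the DataFrame (columns age, sex, bmi, ..., target)
bunch = datasets.load_diabetes(as_frame=True)
named = bunch.frame  # 10 feature columns plus a 'target' column

print(diabetes_alt.shape)
print(list(named.columns))
```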

In [49]:
# Fit an OLS model of target on the 10 features (no intercept column is added, so statsmodels reports an uncentered R-squared)
mod = sm.OLS(diabetes['target'], diabetes.drop('target', axis=1))
res = mod.fit()

In [51]:
# Summarize results
print(res.summary())

                                 OLS Regression Results                                
Dep. Variable:                 target   R-squared (uncentered):                   0.106
Model:                            OLS   Adj. R-squared (uncentered):              0.085
Method:                 Least Squares   F-statistic:                              5.100
Date:                Sun, 24 Nov 2024   Prob (F-statistic):                    4.72e-07
Time:                        01:12:56   Log-Likelihood:                         -2873.9
No. Observations:                 442   AIC:                                      5768.
Df Residuals:                     432   BIC:                                      5809.
Df Model:                          10                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

##### Analysis of the results
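- **$R^{2}$** : the model has no intercept, so the $R^{2}=0.106$ reported here is the *uncentered* $R^{2}$, i.e. $1 - \text{SSR}/\sum_i y_i^{2}$. It means the features account for 10.6% of the sum of squares of *target* measured from zero, not from its mean, so it should not be read as the usual share of variance explained. Because the features of this dataset are mean-centered and no intercept is fitted, the model cannot capture the mean of *target*, which largely explains the low value.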
- **coef** : these are the coefficients that multiply the independent variables in the regression model. For example, the coefficient of *feature_0* means that a one-unit increase in *feature_0*, holding the other features fixed, is associated with a decrease of 10.0099 in *target*.
- **$P>\lvert t \rvert$ (p-value)** : coefficients with p-values above 0.05 are not statistically significant at the 5% level. The overall F-test (Prob (F-statistic) = 4.72e-07), however, indicates that the features are jointly significant.
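- **AIC** & **BIC** : AIC = 5768 and BIC = 5809 look large, but their absolute values do not measure model quality on their own; they depend on the sample size and the scale of *target*, and are only meaningful when comparing models fitted to the same data (lower is better).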