## Statistical and Econometric Analysis

In this section, we study the relationships between GDP, CO₂ emissions, energy consumption, and other greenhouse gases (GHGs) for France.  
We apply classical statistical methods together with econometric tests to understand the links, the direction of causality, and the temporal dynamics.


In [41]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, grangercausalitytests
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy.stats import shapiro

#### Data loading and cleaning

- Check for missing values.
- Select the relevant variables.
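The missing-value check listed above can be sketched on a small, hypothetical DataFrame (made-up values in arbitrary units, not the OWID data): count the gaps per column, then keep only the rows where the key variables are all present, which is what the `dropna(subset=[...])` call in the next cell does for `gdp`, `co2` and `energy_per_capita`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the France subset (arbitrary units)
df = pd.DataFrame({
    "year": [1965, 1966, 1967, 1968],
    "gdp": [1.00, 1.05, np.nan, 1.15],
    "co2": [10.0, 10.4, 11.1, np.nan],
    "energy_per_capita": [5.0, 5.1, 5.3, 5.6],
})

# Number and share of missing values per column
missing = df.isna().sum()
missing_share = df.isna().mean()
print(pd.DataFrame({"n_missing": missing, "share": missing_share}))

# Keep only rows where every key variable is observed
key_cols = ["gdp", "co2", "energy_per_capita"]
clean = df.dropna(subset=key_cols).copy()
print(clean)
```

Printing the share as well as the count makes it easy to spot a column that is mostly empty and should be dropped rather than used to filter rows.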


In [42]:
# === 📂 2. Data loading ===
df = pd.read_csv("C:/Users/y.cohen/Desktop/Mémoire/Data/owid-co2-data.csv")
print("Available columns:", df.columns.tolist())

# Keep only France
df_fr = df[df["country"] == "France"].copy()

# Select the relevant columns
cols = [
    "year", "gdp", "population", "co2", "co2_per_capita", "energy_per_capita",
    "total_ghg", "oil_co2", "gas_co2",
    "primary_energy_consumption", "methane_per_capita", "temperature_change_from_ghg", "co2_per_gdp",
]

# .copy() makes df_fr_sel an independent frame, which avoids pandas' SettingWithCopyWarning
df_fr_sel = df_fr[cols].copy()

# Compute GDP per capita (just in case it is not provided)
df_fr_sel["gdp_per_capita"] = df_fr_sel["gdp"] / df_fr_sel["population"]

# Drop rows with missing values in the key columns
df_fr_sel = df_fr_sel.dropna(subset=["gdp", "co2", "energy_per_capita"])

df_fr_sel.head()



Available columns: ['country', 'year', 'iso_code', 'population', 'gdp', 'cement_co2', 'cement_co2_per_capita', 'co2', 'co2_growth_abs', 'co2_growth_prct', 'co2_including_luc', 'co2_including_luc_growth_abs', 'co2_including_luc_growth_prct', 'co2_including_luc_per_capita', 'co2_including_luc_per_gdp', 'co2_including_luc_per_unit_energy', 'co2_per_capita', 'co2_per_gdp', 'co2_per_unit_energy', 'coal_co2', 'coal_co2_per_capita', 'consumption_co2', 'consumption_co2_per_capita', 'consumption_co2_per_gdp', 'cumulative_cement_co2', 'cumulative_co2', 'cumulative_co2_including_luc', 'cumulative_coal_co2', 'cumulative_flaring_co2', 'cumulative_gas_co2', 'cumulative_luc_co2', 'cumulative_oil_co2', 'cumulative_other_co2', 'energy_per_capita', 'energy_per_gdp', 'flaring_co2', 'flaring_co2_per_capita', 'gas_co2', 'gas_co2_per_capita', 'ghg_excluding_lucf_per_capita', 'ghg_per_capita', 'land_use_change_co2', 'land_use_change_co2_per_capita', 'methane', 'methane_per_capita', 'nitrous_oxide', 'nitr

| | year | gdp | population | co2 | co2_per_capita | energy_per_capita | total_ghg | oil_co2 | gas_co2 | primary_energy_consumption | methane_per_capita | temperature_change_from_ghg | co2_per_gdp | gdp_per_capita |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17241 | 1965 | 727557400000.0 | 48772824.0 | 364.23 | 7.468 | 27967.74 | 500.22 | 157.921 | 10.45 | 1347.172 | 2.413 | 0.011 | 0.501 | 14917.27046 |
| 17242 | 1966 | 764514100000.0 | 49184652.0 | 381.139 | 7.749 | 28277.855 | 516.293 | 176.667 | 11.676 | 1376.653 | 2.359 | 0.011 | 0.499 | 15543.752848 |
| 17243 | 1967 | 799864800000.0 | 49574907.0 | 405.496 | 8.179 | 29724.305 | 543.559 | 203.069 | 13.199 | 1461.883 | 2.37 | 0.012 | 0.507 | 16134.468651 |
| 17244 | 1968 | 835171400000.0 | 49947252.0 | 419.073 | 8.39 | 30989.717 | 560.515 | 221.078 | 15.642 | 1538.555 | 2.36 | 0.012 | 0.502 | 16721.067765 |
| 17245 | 1969 | 893107400000.0 | 50355300.0 | 444.28 | 8.823 | 33774.48 | 583.036 | 245.049 | 17.898 | 1691.664 | 2.344 | 0.012 | 0.497 | 17736.115912 |


## Linear regressions

- Regression 1: `gdp ~ co2 + energy_per_capita`
- Regression 2: `gdp_per_capita ~ total_ghg + oil_co2 + gas_co2`
- Regression 3: `gdp_per_capita ~ population + primary_energy_consumption + methane_per_capita + temperature_change_from_ghg`

- Analysis of the coefficients, their significance, and R².
- Check of the assumptions (normality, homoscedasticity, multicollinearity).


In [43]:
# Model 1: gdp ~ co2 + energy_per_capita
X1 = df_fr_sel[["co2", "energy_per_capita"]]
X1 = sm.add_constant(X1)
y1 = df_fr_sel["gdp"]

model1 = sm.OLS(y1, X1).fit()
print("=== Model 1: GDP ~ CO2 + energy per capita ===")
print(model1.summary())



=== Model 1: GDP ~ CO2 + energy per capita ===
                            OLS Regression Results                            
Dep. Variable:                    gdp   R-squared:                       0.741
Model:                            OLS   Adj. R-squared:                  0.732
Method:                 Least Squares   F-statistic:                     78.82
Date:                Fri, 01 Aug 2025   Prob (F-statistic):           7.08e-17
Time:                        11:39:36   Log-Likelihood:                -1614.1
No. Observations:                  58   AIC:                             3234.
Df Residuals:                      55   BIC:                             3240.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------

In [46]:
# Model 2: gdp_per_capita ~ total_ghg + oil_co2 + gas_co2
X2 = df_fr_sel[["total_ghg", "oil_co2", "gas_co2"]]
X2 = sm.add_constant(X2)
y2 = df_fr_sel["gdp_per_capita"]

model2 = sm.OLS(y2, X2).fit()
print("\n=== Model 2: GDP per capita ~ total GHG + oil CO2 + gas CO2 ===")
print(model2.summary())



=== Model 2: GDP per capita ~ total GHG + oil CO2 + gas CO2 ===
                            OLS Regression Results                            
Dep. Variable:         gdp_per_capita   R-squared:                       0.972
Model:                            OLS   Adj. R-squared:                  0.971
Method:                 Least Squares   F-statistic:                     632.8
Date:                Fri, 01 Aug 2025   Prob (F-statistic):           4.97e-42
Time:                        11:39:39   Log-Likelihood:                -495.79
No. Observations:                  58   AIC:                             999.6
Df Residuals:                      54   BIC:                             1008.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------

In [49]:
# Model 3: gdp_per_capita ~ population + primary_energy_consumption + methane_per_capita + temperature_change_from_ghg
X3 = df_fr_sel[["population", "primary_energy_consumption", "methane_per_capita", "temperature_change_from_ghg"]]
X3 = sm.add_constant(X3)
y3 = df_fr_sel["gdp_per_capita"]

model3 = sm.OLS(y3, X3).fit()
print("\n=== Model 3: GDP per capita ~ population + primary energy + methane per capita + GHG-driven temperature change ===")
print(model3.summary())



=== Model 3: GDP per capita ~ population + primary energy + methane per capita + GHG-driven temperature change ===
                            OLS Regression Results                            
Dep. Variable:         gdp_per_capita   R-squared:                       0.994
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     2030.
Date:                Fri, 01 Aug 2025   Prob (F-statistic):           2.82e-57
Time:                        11:40:06   Log-Likelihood:                -453.72
No. Observations:                  58   AIC:                             917.4
Df Residuals:                      53   BIC:                             927.7
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                                  coef    std err          t      P>|t|      [0.025    

#### Coefficients, significance, and R²

- Inspect the value and sign of each coefficient to interpret the effect of each variable.
- Check the p-values for statistical significance (5% threshold).
- Use the R² (and the adjusted R²) to assess goodness of fit.


In [None]:
# Helper that checks the main OLS assumptions of a fitted model
def check_model_assumptions(model, X, y, model_name):
    # y is not used by the checks; it is kept so the existing calls stay valid
    resid = model.resid

    # 1. Normality of residuals (Shapiro-Wilk, H0: residuals are normal)
    stat, p = shapiro(resid)
    print(f"\nResidual normality test ({model_name}):")
    print(f"  Shapiro-Wilk statistic = {stat:.4f}, p-value = {p:.4f}")
    print("  Normal residuals?" + (" Yes" if p > 0.05 else " No"))

    # 2. Homoscedasticity (Breusch-Pagan, H0: constant residual variance)
    bp_test = het_breuschpagan(resid, model.model.exog)
    print(f"\nHomoscedasticity test (Breusch-Pagan) ({model_name}):")
    print(f"  Statistic = {bp_test[0]:.4f}, p-value = {bp_test[1]:.4f}")
    print("  Homoscedastic?" + (" Yes" if bp_test[1] > 0.05 else " No"))

    # 3. Multicollinearity (VIF); the VIF reported for the constant is not meaningful
    vif_data = pd.DataFrame()
    vif_data["variable"] = X.columns
    vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
    print(f"\nMulticollinearity (VIF) ({model_name}):")
    print(vif_data)


# Apply the checks to models 1 and 2 (model 3 is checked in the next cell)
check_model_assumptions(model1, X1, y1, "Model 1")
check_model_assumptions(model2, X2, y2, "Model 2")


Residual normality test (Model 1):
  Shapiro-Wilk statistic = 0.9318, p-value = 0.0029
  Normal residuals? No

Homoscedasticity test (Breusch-Pagan) (Model 1):
  Statistic = 13.5537, p-value = 0.0011
  Homoscedastic? No

Multicollinearity (VIF) (Model 1):
            variable        VIF
0              const  98.104451
1                co2   1.004767
2  energy_per_capita   1.004767

Residual normality test (Model 2):
  Shapiro-Wilk statistic = 0.9589, p-value = 0.0473
  Normal residuals? No

Homoscedasticity test (Breusch-Pagan) (Model 2):
  Statistic = 8.6100, p-value = 0.0350
  Homoscedastic? No

Multicollinearity (VIF) (Model 2):
    variable         VIF
0      const  156.136115
1  total_ghg   11.247694
2    oil_co2    6.924047
3    gas_co2    3.053002


In [52]:
check_model_assumptions(model3, X3, y3, "Model 3")


Residual normality test (Model 3):
  Shapiro-Wilk statistic = 0.9861, p-value = 0.7471
  Normal residuals? Yes

Homoscedasticity test (Breusch-Pagan) (Model 3):
  Statistic = 20.7247, p-value = 0.0004
  Homoscedastic? No

Multicollinearity (VIF) (Model 3):
                      variable           VIF
0                        const  12219.177684
1                   population     68.266941
2   primary_energy_consumption      4.804877
3           methane_per_capita     66.778875
4  temperature_change_from_ghg     81.517664


#### Alternative regression: extended analysis with demographic, energy, and climate variables

Here, GDP per capita is modelled as a function of population, energy consumption, methane emissions per capita, the temperature change attributable to GHGs, and the carbon intensity of GDP (`co2_per_gdp`). This extends the Model 3 specification with the carbon intensity of GDP, giving five regressors (`Df Model: 5` in the output below).


=== Model 3: GDP per capita ~ population + energy consumption + methane per capita + temperature change + CO2 intensity of GDP ===
                            OLS Regression Results                            
Dep. Variable:         gdp_per_capita   R-squared:                       0.994
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     1594.
Date:                Fri, 01 Aug 2025   Prob (F-statistic):           1.35e-55
Time:                        11:36:31   Log-Likelihood:                -453.72
No. Observations:                  58   AIC:                             919.4
Df Residuals:                      52   BIC:                             931.8
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                                  coef    std err          t      P>|t| 

## Log-log models

- Apply a logarithmic transformation (log(x + 1) where needed, i.e. when a series contains zeros).
- Use log-log models so that the coefficients can be read as elasticities.

- Examples:
    - `log(gdp) ~ log(co2) + log(energy_per_capita)`
    - `log(gdp_per_capita) ~ log(total_ghg) + log(oil_co2) + log(gas_co2)`


## Time-series analysis

- Stationarity test (ADF test).
- Cointegration analysis.
- If possible: error correction model (ECM).


## Granger causality

- Causality test between energy and GDP.
- Causality test between CO₂ and GDP.

- Interpretation of the results.


## Summary

- Discussion of the observed statistical relationships.
- Limitations and avenues for further work.
