# Your mission

You started working on the Ecowatt project at RTE. In order to avoid possible shortage, one must plan for peaks in national electricity. You manager Mark is going on holidays for a week. You will be sole responsible for forecasting the weekly demand, while he is absent.

In order to prevent electricity shortage, you must accurately forecast the demand 7 days ahead, on an hourly basis.

Your mission is to train an accurate predictive model with the lowest root mean squared error (RMSE). Mark is a very technical guy, he likes to understand all technical details and would like you to compare the performances of classical models and neural-net based models.


Your **target variable** is the consommation_totale

**Data source** : https://data.enedis.fr/pages/accueil/

# Import

In [1]:
import os
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive

In [3]:
drive.mount('/content/gdrive')
os.chdir("/content/gdrive/MyDrive/Thales/EI_ST4_G1/EI_TS_CS-20230526T084435Z-001/EI_TS_CS")

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [4]:
%run ./utils.ipynb

In [5]:
FILE_PATH = "data/bilan.csv"
TARGET = "consommation_totale"
EXOGENEOUS= "Température normale lissée (°C)"

## Prepare the data

Define here the range of your train/test split

In [6]:
from re import X
def read_data(data_path : str = "data/bilan-electrique-demi-heure_juin.csv") -> pd.DataFrame:
    df = pd.read_csv(data_path)
    df['horodate'] = pd.to_datetime(df['horodate']) # Convert 'horodate' column to datetime
    df = df.set_index('horodate') # Set 'horodate' as the index
    # Filter rows where the value of column "Mois" is 6
    df = df[df['Mois'].isin([5,6])]
    hourly_avg = df.groupby('Mois')['consommation_totale'].mean().reset_index(name='consommation_totale')

    return df

df = read_data(FILE_PATH)
df.sort_values(by='horodate', inplace = True)
X_train = df[-1000:-100]
X_test = df[-100:]


In [7]:
df

Unnamed: 0_level_0,Unnamed: 0,Mois,Injection RTE (W),Refoulement RTE (W),Pertes modélisées (W),consommation_totale,Consommation totale télérelevée (W),Consommation HTA télérelevée (W),Consommation totale profilée (W),Consommation HTA profilée (W),...,Production décentralisée profilée (W),Production photovoltaïque profilée (W),Production autre profilée (W),Température réalisée lissée (°C),Température normale lissée (°C),Production éolienne totale (W),Production photovoltaïque totale (W),Pseudo rayonnement,Consommation HTA totale (W),Soutirage net vers autres GRD (W)
horodate,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2018-05-13 00:00:00+02:00,0.0,5.0,3.301608e+10,1.388999e+09,1.492712e+09,3.373690e+10,7.529868e+09,7.526492e+09,2.620703e+10,725187087.0,...,17084661.0,0.0,276802.0,14.0,15.2,2.429495e+09,27000.0,29.0,8.251680e+09,309480116.0
2018-05-13 00:30:00+02:00,0.0,5.0,3.033121e+10,1.540171e+09,1.345570e+09,3.118363e+10,7.476941e+09,7.473625e+09,2.370669e+10,704894715.0,...,17117773.0,0.0,276802.0,14.0,15.2,2.546684e+09,28333.0,29.0,8.178520e+09,281335021.0
2018-05-13 01:00:00+02:00,0.0,5.0,2.926180e+10,1.762578e+09,1.287855e+09,3.024552e+10,7.420063e+09,7.416690e+09,2.282546e+10,677423933.0,...,17100606.0,0.0,276802.0,14.0,15.1,2.820163e+09,26333.0,30.0,8.094114e+09,265921997.0
2018-05-13 01:30:00+02:00,0.0,5.0,2.845965e+10,1.913083e+09,1.244898e+09,2.953303e+10,7.370166e+09,7.366722e+09,2.216286e+10,655407562.0,...,17117763.0,0.0,276802.0,14.0,15.1,2.997843e+09,28667.0,30.0,8.022130e+09,248347310.0
2018-05-13 02:00:00+02:00,0.0,5.0,2.788705e+10,2.021851e+09,1.214410e+09,2.899632e+10,7.360914e+09,7.357456e+09,2.163540e+10,646885058.0,...,17218295.0,0.0,276802.0,13.9,15.0,3.105045e+09,26000.0,31.0,8.004341e+09,239865276.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-05-12 21:30:00+02:00,0.0,5.0,3.144377e+10,1.668876e+09,2.130123e+09,3.253997e+10,1.417252e+10,1.037331e+10,1.836745e+10,7857570.0,...,5685293.0,9740.0,1724126.0,13.3,15.6,3.429022e+09,574664.0,32.0,1.038117e+10,248111450.0
2023-05-12 22:00:00+02:00,0.0,5.0,3.167001e+10,1.604080e+09,2.154198e+09,3.279717e+10,1.418533e+10,1.036780e+10,1.861184e+10,8238396.0,...,5628909.0,4413.0,1724126.0,13.2,15.4,3.424402e+09,615659.0,35.0,1.037604e+10,243759584.0
2023-05-12 22:30:00+02:00,0.0,5.0,3.322081e+10,1.710208e+09,2.306844e+09,3.426999e+10,1.393841e+10,1.026750e+10,2.033158e+10,7963496.0,...,5713106.0,2858.0,1724126.0,13.1,15.3,3.644562e+09,542160.0,37.0,1.027547e+10,273891883.0
2023-05-12 23:00:00+02:00,0.0,5.0,3.267876e+10,1.774022e+09,2.269769e+09,3.375851e+10,1.367329e+10,1.014128e+10,2.008523e+10,7761054.0,...,5679794.0,4748.0,1724126.0,13.0,15.2,3.717776e+09,583186.0,40.0,1.014904e+10,278342766.0


# Modeling with ARIMAX


## Modeling
The following code allows ARIMAX modelling using the temperature as an exogeneous variable.

In [9]:
X_test[TARGET]

horodate
2023-05-10 22:00:00+02:00    3.294440e+10
2023-05-10 22:30:00+02:00    3.428132e+10
2023-05-10 23:00:00+02:00    3.371477e+10
2023-05-10 23:30:00+02:00    3.330184e+10
2023-05-11 00:00:00+02:00    3.229186e+10
                                 ...     
2023-05-12 21:30:00+02:00    3.253997e+10
2023-05-12 22:00:00+02:00    3.279717e+10
2023-05-12 22:30:00+02:00    3.426999e+10
2023-05-12 23:00:00+02:00    3.375851e+10
2023-05-12 23:30:00+02:00    3.339388e+10
Name: consommation_totale, Length: 100, dtype: float64

In [8]:
model = ARIMA(X_train[TARGET], order=(2,1,1), exog=X_train[EXOGENEOUS])
model.initialize_approximate_diffuse()
model_fit = model.fit()
yhat = model_fit.forecast(steps=1, exog=np.array(X_test[EXOGENEOUS])[1])

  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  warn('Non-stationary starting autoregressive parameters'
  warn('Non-invertible starting MA parameters found.'
  return get_prediction_index(


In [10]:
predictions = list()
for t in range(1,len(X_test)):
        yhat = model_fit.forecast(exog=25)
        predictions.append(yhat)

predictions


  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return get_prediction_index(
  return

[900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.627631e+10
 dtype: float64,
 900    3.62

In [11]:
yhat, np.array(X_train[TARGET])[1]

(900    3.627631e+10
 dtype: float64,
 26121888309.0)

In [12]:
np.array(X_train[TARGET])[1]

26121888309.0

In [13]:
np.array(X_test[EXOGENEOUS])[1]

15.1

In [14]:
X_t = X_test
for _ in range(100):
    new_row = {col: 0 for col in X_train.columns}  # Crée un dictionnaire avec toutes les colonnes du dataframe et des valeurs de 0
    X_t = X_t.append(new_row, ignore_index=True)

X_t

  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.append(new_row, ignore_index=True)
  X_t = X_t.a

Unnamed: 0.1,Unnamed: 0,Mois,Injection RTE (W),Refoulement RTE (W),Pertes modélisées (W),consommation_totale,Consommation totale télérelevée (W),Consommation HTA télérelevée (W),Consommation totale profilée (W),Consommation HTA profilée (W),...,Production décentralisée profilée (W),Production photovoltaïque profilée (W),Production autre profilée (W),Température réalisée lissée (°C),Température normale lissée (°C),Production éolienne totale (W),Production photovoltaïque totale (W),Pseudo rayonnement,Consommation HTA totale (W),Soutirage net vers autres GRD (W)
0,0.0,5.0,3.218808e+10,1.144189e+09,2.106770e+09,3.294440e+10,1.476568e+10,1.102946e+10,1.817872e+10,8238412.0,...,5741565.0,7202.0,1724126.0,14.4,15.2,2.563959e+09,2307096.0,50.0,1.103770e+10,249722700.0
1,0.0,5.0,3.369206e+10,1.107553e+09,2.246126e+09,3.428132e+10,1.451944e+10,1.093526e+10,1.976188e+10,7946833.0,...,5796968.0,4230.0,1724126.0,14.3,15.1,2.545772e+09,2021571.0,53.0,1.094320e+10,281329083.0
2,0.0,5.0,3.307972e+10,1.178911e+09,2.202671e+09,3.371477e+10,1.427900e+10,1.082357e+10,1.943577e+10,7736071.0,...,5773640.0,4483.0,1724126.0,14.2,15.0,2.635203e+09,1933188.0,55.0,1.083131e+10,288002558.0
3,0.0,5.0,3.262051e+10,1.196851e+09,2.174754e+09,3.330184e+10,1.404498e+10,1.070550e+10,1.925686e+10,7448628.0,...,5673768.0,6123.0,1724126.0,14.2,14.9,2.675458e+09,1843765.0,55.0,1.071295e+10,291647217.0
4,0.0,5.0,3.153582e+10,1.225213e+09,2.083524e+09,3.229186e+10,1.392460e+10,1.065686e+10,1.836727e+10,7221505.0,...,5734526.0,20486.0,1724126.0,14.2,14.9,2.700519e+09,1818326.0,55.0,1.066408e+10,278613300.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0
196,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0
197,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0
198,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0


In [18]:
for i in range(100):
    new_row = pd.Series({'horodate': 2023-05-10 21:30:00+02:00, 'Colonne2': valeur2, ...})  # Remplacez "Colonne1", "Colonne2", ... par les noms de vos colonnes et "valeur1", "valeur2", ... par les valeurs correspondantes pour chaque ligne
    df = df.append(new_row, ignore_index=True)

# Afficher le dataframe résultant


SyntaxError: ignored

In [19]:
parameters = (2,1,1)
errors, predictions = evaluate_arimax_model(
    X_train[TARGET],
    X_t[TARGET],
    X_train[EXOGENEOUS],
    X_t[EXOGENEOUS],
    parameters
    )



  warn('Non-stationary starting autoregressive parameters'
  warn('Non-invertible starting MA parameters found.'


In [20]:
X_t

Unnamed: 0.1,Unnamed: 0,Mois,Injection RTE (W),Refoulement RTE (W),Pertes modélisées (W),consommation_totale,Consommation totale télérelevée (W),Consommation HTA télérelevée (W),Consommation totale profilée (W),Consommation HTA profilée (W),...,Production décentralisée profilée (W),Production photovoltaïque profilée (W),Production autre profilée (W),Température réalisée lissée (°C),Température normale lissée (°C),Production éolienne totale (W),Production photovoltaïque totale (W),Pseudo rayonnement,Consommation HTA totale (W),Soutirage net vers autres GRD (W)
0,0.0,5.0,3.218808e+10,1.144189e+09,2.106770e+09,3.294440e+10,1.476568e+10,1.102946e+10,1.817872e+10,8238412.0,...,5741565.0,7202.0,1724126.0,14.4,15.2,2.563959e+09,2307096.0,50.0,1.103770e+10,249722700.0
1,0.0,5.0,3.369206e+10,1.107553e+09,2.246126e+09,3.428132e+10,1.451944e+10,1.093526e+10,1.976188e+10,7946833.0,...,5796968.0,4230.0,1724126.0,14.3,15.1,2.545772e+09,2021571.0,53.0,1.094320e+10,281329083.0
2,0.0,5.0,3.307972e+10,1.178911e+09,2.202671e+09,3.371477e+10,1.427900e+10,1.082357e+10,1.943577e+10,7736071.0,...,5773640.0,4483.0,1724126.0,14.2,15.0,2.635203e+09,1933188.0,55.0,1.083131e+10,288002558.0
3,0.0,5.0,3.262051e+10,1.196851e+09,2.174754e+09,3.330184e+10,1.404498e+10,1.070550e+10,1.925686e+10,7448628.0,...,5673768.0,6123.0,1724126.0,14.2,14.9,2.675458e+09,1843765.0,55.0,1.071295e+10,291647217.0
4,0.0,5.0,3.153582e+10,1.225213e+09,2.083524e+09,3.229186e+10,1.392460e+10,1.065686e+10,1.836727e+10,7221505.0,...,5734526.0,20486.0,1724126.0,14.2,14.9,2.700519e+09,1818326.0,55.0,1.066408e+10,278613300.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0
196,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0
197,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0
198,0.0,0.0,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000e+00,0.0,0.0,0.000000e+00,0.0


## Search for the best ARIMA model
We use grid search to search for the best ARIMA parameters that gives the lowest error. This follows the Box-Jenkins methology.

In [21]:
best_cfg, best_score = arimax_grid_search(X_train[TARGET],
                                            X_test[TARGET],
                                         X_train[EXOGENEOUS],
                                         X_test[EXOGENEOUS],
                                            range(1,3),range(0,3),range(0,3))

ARIMAX(1,0,0) RMSE=3731990544949408256.000
ARIMAX(1,0,1) RMSE=81982533378624208896.000




ARIMAX(1,0,2) RMSE=767421171211751795982336.000
ARIMAX(1,1,0) RMSE=402687624679173248.000
ARIMAX(1,1,1) RMSE=412941514895165312.000
ARIMAX(1,1,2) RMSE=388435846728852544.000
ARIMAX(1,2,0) RMSE=454823868282656832.000
ARIMAX(1,2,1) RMSE=456078297012604480.000
ARIMAX(1,2,2) RMSE=457983412346594752.000
ARIMAX(2,0,0) RMSE=2250822671491883264.000
ARIMAX(2,0,1) RMSE=96462714333990748160.000




ARIMAX(2,0,2) RMSE=13498845818685838503968768.000
ARIMAX(2,1,0) RMSE=399637185980680320.000


  warn('Non-stationary starting autoregressive parameters'
  warn('Non-invertible starting MA parameters found.'


ARIMAX(2,1,1) RMSE=402759836287938496.000
ARIMAX(2,1,2) RMSE=398934802062945088.000
ARIMAX(2,2,0) RMSE=456146767227266240.000


  warn('Non-stationary starting autoregressive parameters'
  warn('Non-invertible starting MA parameters found.'


ARIMAX(2,2,1) RMSE=458625572093431104.000
ARIMAX(2,2,2) RMSE=456493363601663168.000
Best ARIMAX(1, 1, 2) MSE=388435846728852544.000


In [22]:
print(best_cfg, best_score)

(1, 1, 2) 3.8843584672885254e+17


In [23]:
import statsmodels.api as sm
model = sm.tsa.ARIMA(X_train[TARGET], order=(2,1,1))
fitted = model.fit()
fitted.summary()

  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  warn('Non-stationary starting autoregressive parameters'
  warn('Non-invertible starting MA parameters found.'


0,1,2,3
Dep. Variable:,consommation_totale,No. Observations:,900.0
Model:,"ARIMA(2, 1, 1)",Log Likelihood,-19675.396
Date:,"Thu, 29 Jun 2023",AIC,39358.793
Time:,13:28:04,BIC,39377.998
Sample:,0,HQIC,39366.13
,- 900,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
ar.L1,0.2497,0.193,1.296,0.195,-0.128,0.627
ar.L2,0.1250,0.060,2.092,0.036,0.008,0.242
ma.L1,0.0219,0.195,0.112,0.911,-0.361,0.405
sigma2,3.675e+17,,,,,

0,1,2,3
Ljung-Box (L1) (Q):,170.61,Jarque-Bera (JB):,9389.41
Prob(Q):,0.0,Prob(JB):,0.0
Heteroskedasticity (H):,0.57,Skew:,-1.65
Prob(H) (two-sided):,0.0,Kurtosis:,18.48


## Visualization
To have a better view on the difference between true and predict values, we visualize them by plotting both the signals.

In [24]:
# prepare the dataset for plotting
df_reset = df.reset_index()
predict_date = df_reset["horodate"]
df_predict = pd.DataFrame(zip(predict_date[-100:],
                              predictions, X_test[TARGET].values),
                          columns=["date", "predict", "true"])

In [25]:
df_predict

Unnamed: 0,date,predict,true
0,2023-05-10 22:00:00+02:00,3.272017e+10,3.294440e+10
1,2023-05-10 22:30:00+02:00,3.533271e+10,3.428132e+10
2,2023-05-10 23:00:00+02:00,3.327312e+10,3.371477e+10
3,2023-05-10 23:30:00+02:00,3.307255e+10,3.330184e+10
4,2023-05-11 00:00:00+02:00,3.158985e+10,3.229186e+10
...,...,...,...
95,2023-05-12 21:30:00+02:00,3.250881e+10,3.253997e+10
96,2023-05-12 22:00:00+02:00,3.298893e+10,3.279717e+10
97,2023-05-12 22:30:00+02:00,3.527444e+10,3.426999e+10
98,2023-05-12 23:00:00+02:00,3.340724e+10,3.375851e+10


In [29]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, accuracy_score
import statsmodels.api as sm

model = sm.tsa.ARIMA(X_train[TARGET], order=(1,1,2))
model_fit = model.fit()

# Faire des prédictions sur l'ensemble de test

predictions = model_fit.predict(start=0, end=99)

mae = mean_absolute_error(X_test[TARGET], predictions)
rmse = mean_squared_error(X_test[TARGET], predictions, squared=False)
r2 = r2_score(X_test[TARGET], predictions)

print("MAE:", mae)
print("RMSE:", rmse)
print("R²:", r2)

  self._init_dates(dates, freq)
  self._init_dates(dates, freq)
  self._init_dates(dates, freq)


MAE: 4798225142.591723
RMSE: 6644142375.605713
R²: -2.7316524847200094


In [33]:
df_predict.describe()

Unnamed: 0,predict,true
count,100.0,100.0
mean,33061680000.0,33107450000.0
std,3616209000.0,3456772000.0
min,25799850000.0,25938140000.0
25%,31352390000.0,31179770000.0
50%,33172840000.0,33377770000.0
75%,36090860000.0,35724360000.0
max,39229540000.0,38865680000.0


In [34]:
from datetime import timedelta
d = timedelta(days=357)

data_future = df_predict["date"] + d
X_t['horodate'] = X_t['horodate'] + d
X_t

KeyError: ignored

In [35]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=df_predict["date"], y=df_predict["predict"], name="predict"))
fig.add_trace(go.Scatter(x=df_predict["date"], y=df_predict["true"], name="true"))

fig.update_layout(title="Predictions Juin 2023")

# Modeling with other models

Try other other models : random forest, xgboost ...