# SARIMAX on electricity carbon intensity

The website [**electricitymap.org**](https://electricitymap.org) provides real-time data visualizations of the origin of electricity consumed around the world. In particular, it provides an hourly (or sub-hourly) view of the CO2 intensity of electricity, in grams of CO2-equivalent per kWh of electricity consumed (gCO2e/kWh).

<img src='electricitymap.jpg' width = 500>


Carbon intensity fluctuates a lot depending on the season, weather conditions, imports from neighboring countries, etc.

Your goal is to **forecast the hourly carbon intensity of electricity in France up to 48 hours ahead**, so as to indicate when it is best to consume electricity (e.g. to charge an electric car).

## Challenge

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [0]:
# Load the CSV (about 50 MB)
df = pd.read_csv('https://wagon-public-datasets.s3.amazonaws.com/electricity_map_france.csv', parse_dates=['datetime'], index_col=['datetime'])
df = df['2016':]

In [0]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 58948 entries, 2013-01-01 23:00:00+00:00 to 2019-09-26 08:00:00+00:00
Data columns (total 72 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   timestamp                                     58948 non-null  int64  
 1   zone_name                                     58948 non-null  object 
 2   carbon_intensity_avg                          58870 non-null  float64
 3   carbon_intensity_production_avg               58870 non-null  float64
 4   carbon_intensity_discharge_avg                36841 non-null  float64
 5   carbon_intensity_import_avg                   28591 non-null  float64
 6   carbon_rate_avg                               42118 non-null  float64
 7   total_production_avg                          58870 non-null  float64
 8   total_storage_avg                             58870 non-null  float64
 9   total_discharg

### Your challenge
- Your goal is to predict `carbon_intensity_avg` up to 48h ahead.
- We have 6 years of data at hourly granularity! That is enough to compute a proper cross-validated `rmpe` score over the whole dataset!
- You have access to exogenous forecasts, in columns prefixed by `latest_forecasted_`
    - e.g. you can use `latest_forecasted_price_avg(t+i)` when trying to predict `carbon_intensity_avg(t+i)`
    - for i in [1..48]
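
One way to picture a cross-validated score for a 48h horizon is a walk-forward evaluation: cut the series at several points, forecast the next 48 hours from each cut, and score each window. The sketch below is only an illustration under stated assumptions. It runs on a synthetic hourly series rather than the real dataset, it uses a seasonal-naive baseline (repeat the last 24 hours) in place of a real model, and it reports RMSE as a stand-in because `rmpe` is not defined in this notebook. The names `seasonal_naive_forecast`, `walk_forward_rmse` and `y_toy` are hypothetical helpers, not part of the challenge.

```python
import numpy as np
import pandas as pd

HORIZON = 48  # forecast horizon in hours, as stated in the challenge


def seasonal_naive_forecast(history, horizon=HORIZON, period=24):
    """Baseline: repeat the last `period` observed values to cover `horizon` steps."""
    last_cycle = history.to_numpy()[-period:]
    repeats = int(np.ceil(horizon / period))
    return np.tile(last_cycle, repeats)[:horizon]


def walk_forward_rmse(y, n_folds=5, horizon=HORIZON, min_train=24 * 7):
    """RMSE on `n_folds` consecutive, non-overlapping test windows at the end of `y`."""
    first_cutoff = len(y) - n_folds * horizon
    if first_cutoff < min_train:
        raise ValueError("not enough data for the requested number of folds")
    scores = []
    for k in range(n_folds):
        cutoff = first_cutoff + k * horizon
        # Train only on the past, test on the next `horizon` hours.
        train, test = y.iloc[:cutoff], y.iloc[cutoff:cutoff + horizon]
        pred = seasonal_naive_forecast(train, horizon)
        scores.append(float(np.sqrt(np.mean((test.to_numpy() - pred) ** 2))))
    return scores


# Synthetic hourly series with a daily cycle (an assumption, NOT the real data).
rng = np.random.default_rng(0)
n_hours = 24 * 30
index = pd.date_range("2019-01-01", periods=n_hours, freq=pd.Timedelta(hours=1))
values = 60 + 20 * np.sin(2 * np.pi * np.arange(n_hours) / 24) + rng.normal(0, 2, n_hours)
y_toy = pd.Series(values, index=index, name="carbon_intensity_avg")

scores = walk_forward_rmse(y_toy)
print("RMSE per fold:", np.round(scores, 2))
print("Mean RMSE:", round(float(np.mean(scores)), 2))
```

Any real model should at least beat this kind of baseline on the same folds. Swapping the baseline for another forecaster only requires replacing the call inside the loop.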

### Hints

- You can build a `SARIMAX` model with `exog` features
- Or, you can build your own "traditional" ML-based model, optimizing for the 48h ahead time horizon

## Your turn

In [0]:
y = df['carbon_intensity_avg']

In [0]:
%matplotlib widget
y.plot()
df.latest_forecasted_price_avg.plot()
