334 adaptive conformal predictions for time series #1

Merged
Changes from all commits
Commits
51 commits
0b85b9f
check lower and upper bounds for regression coverage
JumpingDino Sep 11, 2023
111d96b
add a function to check array nans and infs
JumpingDino Sep 11, 2023
2cd0093
minor fix to output bool
JumpingDino Sep 11, 2023
0984a54
adding nan checks
JumpingDino Sep 11, 2023
ec1950c
correct raise in nan checking. create check array length
JumpingDino Sep 12, 2023
9a2bacd
add tests
JumpingDino Sep 12, 2023
be293d9
add checks in runtime
JumpingDino Sep 12, 2023
eb32639
fixing checks ordering
JumpingDino Sep 12, 2023
6749c93
evaluation of tests
JumpingDino Sep 12, 2023
8956c3c
flake8 linting
JumpingDino Sep 12, 2023
5b9ba62
fixes for mypy
JumpingDino Sep 12, 2023
d742a48
Merge branch 'scikit-learn-contrib:master' into 330-add-checks-for-me…
JumpingDino Sep 23, 2023
7af0305
break a function to check NaN and inf values and refactor NaN to appl…
rafael-saraiva-dh Sep 23, 2023
30cf7f9
refactor to test array to see if all values are NaNs
rafael-saraiva-dh Sep 23, 2023
5fc0810
add checks for infinite values
rafael-saraiva-dh Sep 23, 2023
426c926
add test for check_array_inf
rafael-saraiva-dh Sep 23, 2023
0832633
ordering imports
rafael-saraiva-dh Sep 25, 2023
c47900e
remove irrelevant try except
rafael-saraiva-dh Sep 25, 2023
678545f
check array length
rafael-saraiva-dh Sep 25, 2023
551bd7c
changing error string
rafael-saraiva-dh Sep 30, 2023
a2188b9
changing error string and check_array_inf function
rafael-saraiva-dh Sep 30, 2023
34f7968
linting
rafael-saraiva-dh Sep 30, 2023
ebb00f3
Update mapie/metrics.py
JumpingDino Oct 3, 2023
3aedeaa
Update mapie/utils.py
JumpingDino Oct 3, 2023
fcea354
Update mapie/utils.py
JumpingDino Oct 3, 2023
90f9ad2
fixing docstring from check_arrays_length
JumpingDino Oct 3, 2023
bf48fd3
Update metrics.py
thibaultcordier Oct 4, 2023
d8ad782
reordering cast
rafael-saraiva-dh Oct 4, 2023
73ca185
reformulating arrays length
rafael-saraiva-dh Oct 5, 2023
e0416b5
merge
rafael-saraiva-dh Oct 6, 2023
3d38ae9
add name and ypdate HISTORY.rst
rafael-saraiva-dh Oct 6, 2023
614293e
Merge pull request #358 from JumpingDino/330-add-checks-for-metrics
thibaultcordier Oct 9, 2023
73746e3
Merge branch 'master' into 334-adaptive-conformal-predictions-for-tim…
thibaultcordier Nov 10, 2023
6ce4d77
UPD: latex formula
Nov 10, 2023
8845539
FIX: remove problematic formula
Nov 10, 2023
6afbaf8
TMP: relax a test
thibaultcordier Nov 13, 2023
e37bb29
UPD: python and numpy version
thibaultcordier Nov 10, 2023
d7db261
UPD: change CI/CD on conditions
Nov 13, 2023
a236b77
Merge branch 'numpy-version-patch' into 334-adaptive-conformal-predic…
Nov 13, 2023
504b75f
UPD: improve metric tests
Nov 13, 2023
6bebaad
FIX: better value error raise
Nov 13, 2023
64bf885
UPD: improve docstring
Nov 13, 2023
ab76b99
UPD: style of test
Nov 13, 2023
f259f4f
UPD: format code example
Nov 13, 2023
f6a6e1d
UPD: better docstring
Nov 13, 2023
721d676
FIX: change mu to 1-alpha
Nov 14, 2023
bd013b9
FIX: ssl default
Nov 14, 2023
8052192
UPD: remove large files and transform nb to py file
Nov 14, 2023
60e8138
FIX: typo in results
Nov 14, 2023
fb848d4
UPD: add a header docstring
Nov 14, 2023
379db90
FIX: lint error
Nov 14, 2023
20 changes: 16 additions & 4 deletions .github/workflows/test.yml
@@ -1,6 +1,12 @@
 name: Unit tests

-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - dev
+      - main
+      - master
+  pull_request:

 jobs:
   build:
@@ -19,10 +25,16 @@ jobs:
           numpy-version: 1.21.4
         - os: ubuntu-latest
           python-version: "3.10"
-          numpy-version: 1.22.3
+          numpy-version: 1.22.4
+        - os: ubuntu-latest
+          python-version: "3.11"
+          numpy-version: 1.25.2
         - os: windows-latest
-          python-version: "3.10"
-          numpy-version: 1.22.3
+          python-version: "3.11"
+          numpy-version: 1.25.2
+        - os: macos-latest
+          python-version: "3.11"
+          numpy-version: 1.25.2
     defaults:
       run:
         shell: bash -l {0}
1 change: 1 addition & 0 deletions AUTHORS.rst
@@ -34,5 +34,6 @@ Contributors
* Sofiane Ziane <sziane@quantmetry.com>
* Remi Colliaux <rcolliaux@quantmetry.com>
* Arthur Phan <aphan@quantmetry.com>
* Rafael Saraiva <rafael.saraiva.de@gmail.com>

To be continued ...
2 changes: 2 additions & 0 deletions HISTORY.rst
@@ -4,8 +4,10 @@ History

##### (##########)
------------------

* Add the Adaptive Conformal Inference (ACI) method for MapieTimeSeriesRegressor.
* Add the Coverage Width-based Criterion (CWC) metric.
* Add new checks for metrics calculations.

0.7.0 (2023-09-14)
------------------
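As a quick illustration of the ACI entry added to the changelog above, here is a minimal, hedged usage sketch based on the example introduced later in this PR. The synthetic data, estimator settings, and parameter values are illustrative placeholders, not part of the changelog or the PR itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from mapie.time_series_regression import MapieTimeSeriesRegressor

# Toy data standing in for a real time series (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, y_train, X_new, y_new = X[:150], y[:150], X[150:], y[150:]

# Fit with the new "aci" method and predict intervals at alpha=0.1.
mapie_aci = MapieTimeSeriesRegressor(
    RandomForestRegressor(n_estimators=50), method="aci",
    agg_function="mean", n_jobs=-1
)
mapie_aci.fit(X_train, y_train)
y_pred, y_pis = mapie_aci.predict(X_new, alpha=0.1, ensemble=True)

# As each new observation arrives, update the residuals and adapt the
# effective alpha_t (gamma controls the strength of the ACI correction),
# mirroring the per-step calls made in the PR's example script.
mapie_aci.partial_fit(X_new[:1], y_new[:1])
mapie_aci.adapt_conformal_inference(X_new[:1], y_new[:1], gamma=0.04)
```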
@@ -71,7 +71,6 @@ def get_1d_data_with_heteroscedastic_noise(
# Here, the noise is considered as *heteroscedastic*, since it will increase
# linearly with :math:`x`.


min_x, max_x, n_samples, noise = 0, 5, 300, 0.5
(
X_train, y_train, X_test, y_test, y_mesh
@@ -113,10 +112,9 @@ def get_1d_data_with_heteroscedastic_noise(

##############################################################################
# We then estimate the prediction intervals for all the strategies very easily
# with a
# `fit` and `predict` process. The prediction interval's lower and upper bounds
# are then saved in a DataFrame. Here, we set an alpha value of 0.05
# in order to obtain a 95% confidence for our prediction intervals.
# with a `fit` and `predict` process. The prediction interval's lower and
# upper bounds are then saved in a DataFrame. Here, we set an alpha value of
# 0.05 in order to obtain a 95% confidence for our prediction intervals.

STRATEGIES = {
"naive": dict(method="naive"),
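The reworded comment above describes the generic MAPIE `fit` and `predict` workflow with `alpha=0.05` for 95% prediction intervals. Here is a self-contained, hedged sketch of that process; the data and estimator below are made-up placeholders rather than the tutorial's own dataset and strategies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from mapie.regression import MapieRegressor

# Placeholder 1D heteroscedastic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(300, 1))
y = X.ravel() + rng.normal(scale=0.5 * X.ravel())

mapie = MapieRegressor(LinearRegression(), method="plus", cv=5)
mapie.fit(X, y)
# alpha=0.05 targets 95% prediction intervals; y_pis has shape (n, 2, 1)
# holding the lower and upper bounds for each sample.
y_pred, y_pis = mapie.predict(X, alpha=0.05)
```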
@@ -33,6 +33,7 @@
from io import BytesIO
from typing import Any, Optional, Tuple
from urllib.request import urlopen
import ssl
from zipfile import ZipFile

import matplotlib.pyplot as plt
@@ -68,6 +69,7 @@ def get_X_y() -> Tuple[NDArray, NDArray]:
zip_folder = "BlogFeedback.zip"
csv_file = "blogData_train.csv"
url = website + page + folder + zip_folder
ssl._create_default_https_context = ssl._create_unverified_context
resp = urlopen(url)
zipfile = ZipFile(BytesIO(resp.read()))
df = pd.read_csv(zipfile.open(csv_file)).to_numpy()
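A side note on the SSL workaround added in the hunk above: a hedged sketch of an alternative that scopes the unverified context to a single request instead of overriding the process-wide default. The URL below is a placeholder.

```python
import ssl
from urllib.request import urlopen

# Unverified context passed only to this call; other HTTPS requests in the
# process keep certificate verification enabled.
ctx = ssl._create_unverified_context()
resp = urlopen("https://example.com/BlogFeedback.zip", context=ctx)
```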
@@ -0,0 +1,282 @@
"""
======================================================================
Reproduction of part of the paper experiments of Zaffran et al. (2022)
======================================================================

:class:`~mapie.regression.MapieTimeSeriesRegressor` is used to reproduce
part of the experiments of Zaffran et al. (2022) [1], in which the authors
argue that Adaptive Conformal Inference (ACI, Gibbs & Candès, 2021) [2],
developed for time series under distribution shift, is also a good
procedure for time series with general dependency.

For a given model, the simulation fits the MAPIE regressor with the "aci"
method on a dataset taken from the article and available in the GitHub
repository 'https://github.com/mzaffran/AdaptiveConformalPredictionsTimeSeries',
and compares the bounds of the prediction intervals (PIs).

In order to reproduce the results of the GitHub repository, we reuse the
``RandomForestRegressor`` regression model and follow the same conformal
prediction procedure (see 'https://github.com/mzaffran/\
AdaptiveConformalPredictionsTimeSeries/blob/\
131656fe4c25251bad745f52db3c2d7cb1c24bbb/models.py').

This simulation is carried out to check that the "aci" method implemented
in MAPIE gives the same results as [1], i.e. that the same PI bounds are
obtained.

[1] Zaffran, M., Féron, O., Goude, Y., Josse, J., & Dieuleveut, A. (2022).
Adaptive conformal predictions for time series.
In International Conference on Machine Learning (pp. 25834-25866). PMLR.

[2] Gibbs, I., & Candes, E. (2021). Adaptive conformal inference under
distribution shift.
Advances in Neural Information Processing Systems, 34, 1660-1672.
"""
import warnings

from typing import Tuple
from urllib.request import urlopen
import ssl
import pickle

import datetime
import numpy as np
import pandas as pd
from matplotlib import pylab as plt
from sklearn.ensemble import RandomForestRegressor

from mapie.time_series_regression import MapieTimeSeriesRegressor

from mapie._typing import NDArray

warnings.simplefilter("ignore")


#########################################################
# Global random forests parameters
#########################################################

def init_model():
    # the number of trees in the forest
    n_estimators = 1000

    # the minimum number of samples required to be at a leaf node
    # (default skgarden's parameter)
    min_samples_leaf = 1

    # the number of features to consider when looking for the best split
    # (default skgarden's parameter)
    max_features = 6

    model = RandomForestRegressor(
        n_estimators=n_estimators,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        random_state=1
    )

    return model


#########################################################
# Get data
#########################################################

def get_data() -> pd.DataFrame:
    """
    Download the French electricity spot price dataset (2016-2019 extract)
    from the repository of Zaffran et al. (2022).

    Returns
    -------
    pd.DataFrame
        The raw dataset of hourly spot prices and their features.
    """
    website = "https://raw.githubusercontent.com/"
    page = "mzaffran/AdaptiveConformalPredictionsTimeSeries/"
    folder = "131656fe4c25251bad745f52db3c2d7cb1c24bbb/data_prices/"
    file = "Prices_2016_2019_extract.csv"
    url = website + page + folder + file
    ssl._create_default_https_context = ssl._create_unverified_context
    df = pd.read_csv(url)
    return df


#########################################################
# Get & Present data
#########################################################

data = get_data()

date_data = pd.to_datetime(data.Date)

plt.figure(figsize=(10, 5))
plt.plot(date_data, data.Spot, color='black', linewidth=0.6)

locs, labels = plt.xticks()
new_labels = ['2016', '2017', '2018', '2019', '2020']
plt.xticks(locs[0:len(locs):2], labels=new_labels)

plt.xlabel('Date')
plt.ylabel('Spot price (\u20AC/MWh)')

plt.show()

#########################################################
# Prepare data
#########################################################

limit = datetime.datetime(2019, 1, 1, tzinfo=datetime.timezone.utc)
id_train = data.index[pd.to_datetime(data['Date'], utc=True) < limit].tolist()
id_test = data.index[pd.to_datetime(data['Date'], utc=True) >= limit].tolist()

data_train = data.iloc[id_train, :]
data_test = data.iloc[id_test, :]

features = ['hour'] + ['dow_%d' % i for i in range(7)] \
+ ['lag_24_%d' % i for i in range(24)] \
+ ['lag_168_%d' % i for i in range(24)] + ['conso']
X_train = data_train.loc[:, features]
y_train = data_train.Spot

X_train_0 = X_train.loc[X_train.hour == 0]
y_train_0 = data_train.loc[data_train.hour == 0, 'Spot']

X_test = data_test.loc[:, features]
y_test = data_test.Spot

X_test_0 = X_test.loc[X_test.hour == 0]
y_test_0 = data_test.loc[data_test.hour == 0, 'Spot']


#########################################################
# Prepare model
#########################################################

alpha = 0.1
gap = 1
iteration_max = 10
gamma = 0.04

model = init_model()
mapie_aci = MapieTimeSeriesRegressor(
model, method="aci", agg_function="mean", n_jobs=-1
)


#########################################################
# Reproduce experiment and results
#########################################################

all_x_train = [
np.array(data_train.loc[data_train.hour == h]) for h in range(24)
]

train_size = all_x_train[0].shape[0]
idx = np.array(range(train_size))
n_half = int(np.floor(train_size/2))

X_train_0 = X_train_0[:n_half]
y_train_0 = y_train_0[:n_half]

mapie_aci = mapie_aci.fit(X_train_0, y_train_0)
y_pred_aci_npfit, y_pis_aci_npfit = mapie_aci.predict(
X_test_0, alpha=alpha, ensemble=True, optimize_beta=False
)
print("MAPIE estimator fitted!")

# step 0
y_pred_aci_pfit = np.zeros(y_pred_aci_npfit.shape)
y_pis_aci_pfit = np.zeros(y_pis_aci_npfit.shape)
y_pred_aci_pfit[:gap], y_pis_aci_pfit[:gap, :, :] = mapie_aci.predict(
X_test_0.iloc[:gap, :], alpha=alpha, ensemble=True, optimize_beta=False
)

# step t
for step in range(1, min(len(X_test_0), iteration_max + 1), gap):

    mapie_aci.estimator_.single_estimator_.fit(
        X_test_0.iloc[(step - gap):step, :],
        y_test_0.iloc[(step - gap):step]
    )

    mapie_aci.partial_fit(
        X_test.iloc[(step - gap):step, :],
        y_test.iloc[(step - gap):step],
    )

    mapie_aci.adapt_conformal_inference(
        X_test_0.iloc[(step - gap):step, :],
        y_test_0.iloc[(step - gap):step],
        gamma=gamma
    )

    (
        y_pred_aci_pfit[step:step + gap],
        y_pis_aci_pfit[step:step + gap, :, :],
    ) = mapie_aci.predict(
        X_test_0.iloc[step:(step + gap), :],
        alpha=alpha,
        ensemble=True,
        optimize_beta=True
    )

results = y_pis_aci_pfit.copy()


#########################################################
# Get referenced result to reproduce
#########################################################

def get_pickle() -> Tuple[NDArray, NDArray]:
    """
    Download the reference results (ACP method, gamma=0.04, random forest)
    published by Zaffran et al. (2022).

    Returns
    -------
    The unpickled results, containing the lower ('Y_inf') and upper
    ('Y_sup') bounds of the reference prediction intervals.
    """
    website = "https://github.com/"
    page = "mzaffran/AdaptiveConformalPredictionsTimeSeries/raw/"
    folder = "131656fe4c25251bad745f52db3c2d7cb1c24bbb/results/"
    folder += "Spot_France_Hour_0_train_2019-01-01/"
    file = "ACP_0.04_RF.pkl"
    url = website + page + folder + file
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        loaded_data = pickle.load(urlopen(url))
    except FileNotFoundError:
        print(f"The file {file} was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
    return loaded_data


data_ref = get_pickle()


#########################################################
# Compare results
#########################################################

# Flatten the array to shape (n, 2)
results_ref = np.concatenate(
[data_ref['Y_inf'], data_ref['Y_sup']], axis=0
).T
results = np.array(results.reshape(-1, 2))

# Compare the reproduced PI bounds with the reference bounds
comparison_result_Y_inf = np.array_equal(
results[:iteration_max, 0],
results_ref[:iteration_max, 0]
)
comparison_result_Y_sup = np.array_equal(
results[:iteration_max, 1],
results_ref[:iteration_max, 1]
)

# Print the comparison results
print(f"Comparison (Y_inf): {results[:iteration_max, 0]}")
print(f"Comparison (Y_inf ref): {results_ref[:iteration_max, 0]}")
print(f"Comparison (Y_sup): {results[:iteration_max, 1]}")
print(f"Comparison (Y_sup ref): {results_ref[:iteration_max, 1]}")
print(f"Comparison for ACP_0.04 (Y_inf): {comparison_result_Y_inf}")
print(f"Comparison for ACP_0.04 (Y_sup): {comparison_result_Y_sup}")
5 changes: 0 additions & 5 deletions examples/regression/4-tutorials/plot_ts-tutorial.py
@@ -43,12 +43,7 @@ class that block bootstraps the training set.
the size of the intervals will grow.
Conversely, if the real values are in the coverage,
the size of the intervals will decrease.

You can use a gamma coefficient to adjust the strength of the correction.
The correction formula is
:math:`\alpha_t = \alpha_{t-1} + \gamma (\alpha - 1_{y_t \notin C_{\alpha_{t-1}}(X_t)})`,
where :math:`C_{\alpha_{t-1}}(X_t)` is the prediction interval built at time
:math:`t` with the current level :math:`\alpha_{t-1}`.
If :math:`\gamma = 0`, the conformal inference is not adapted.

References
----------
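To make the correction rule quoted in the removed docstring above concrete, here is a small, hedged sketch of the update. It is a standalone illustration of the formula, not MAPIE's internal implementation; the function name and numbers are chosen for the example only.

```python
def aci_update(alpha_prev: float, alpha: float, gamma: float, missed: bool) -> float:
    # alpha_t = alpha_{t-1} + gamma * (alpha - err_t), where err_t = 1 if the
    # last true value fell outside the previous interval, 0 otherwise.
    return alpha_prev + gamma * (alpha - float(missed))

# With alpha=0.1 and gamma=0.04: a miss lowers alpha_t to 0.064 (wider
# intervals next step), while a hit raises it to 0.104 (slightly narrower).
print(aci_update(0.1, alpha=0.1, gamma=0.04, missed=True))
print(aci_update(0.1, alpha=0.1, gamma=0.04, missed=False))
```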