[timeseries] Speed up the train/val splitter #2586

shchur · 2022-12-20T14:54:03Z

Description of changes:

Speed up the append_suffix_to_item_id function
Replace the surprisingly slow DataFrame.loc[index] operation with DataFrame.query("item_id in @index").
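
The second change can be illustrated on a toy frame. This is a minimal sketch, assuming a DataFrame indexed by a two-level (item_id, timestamp) MultiIndex; the helper names make_frame, select_with_loc and select_with_query are hypothetical and are not part of AutoGluon. It only shows that both selections return the same rows, and makes no claim about their relative speed on such a small input.

```python
import numpy as np
import pandas as pd


def make_frame(num_items=3, num_timesteps=4):
    """Build a toy frame indexed by (item_id, timestamp). Hypothetical helper."""
    item_ids = [f"item_{i}" for i in range(num_items)]
    timestamps = pd.date_range("2022-01-01", periods=num_timesteps, freq="D")
    index = pd.MultiIndex.from_product(
        [item_ids, timestamps], names=["item_id", "timestamp"]
    )
    return pd.DataFrame({"target": np.arange(len(index), dtype=float)}, index=index)


def select_with_loc(df, item_ids):
    # Label-based lookup on the first index level (the operation the PR replaces).
    return df.loc[item_ids]


def select_with_query(df, item_ids):
    # "item_id" refers to the index level name; "@item_ids" refers to the
    # local Python variable of the same name.
    return df.query("item_id in @item_ids")


df = make_frame()
wanted = ["item_0", "item_2"]
by_loc = select_with_loc(df, wanted)
by_query = select_with_query(df, wanted)
print(by_query)
```

Here the requested item ids are listed in the same order as they appear in the index, so the two results can be compared row by row.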

Testing on a subset of 5000 items from the M5 competition dataset:

Using code currently on master:

Loaded dataset with 7559974 rows and 5000 items.
df.slice_by_timestep(None, -prediction_length): 5.75s
LastWindowSplitter.split(df, prediction_length): 63.83s

After current PR:

Loaded dataset with 7559974 rows and 5000 items.
df.slice_by_timestep(None, -prediction_length): 1.38s
LastWindowSplitter.split(df, prediction_length): 8.30s

Code for reproducing the results

import time
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame
from autogluon.timeseries.splitter import LastWindowSplitter

prediction_length = 28
# Dataset consists of the first 5000 items of the M5 competition dataset
raw_data = pd.read_parquet("../m5/data/subset.parquet")
static = pd.read_parquet("../m5/data/static.parquet")

raw_data["item_id"] = raw_data["item_id"].astype("str")
static["item_id"] = static["item_id"].astype("str")
static.set_index("item_id", inplace=True)

print(f"Loaded dataset with {len(raw_data)} rows and {raw_data['item_id'].nunique()} items.")
df = TimeSeriesDataFrame(raw_data, static_features=static)

start = time.time()
df.slice_by_timestep(None, -prediction_length)
print(f"df.slice_by_timestep(None, -prediction_length): {time.time() - start:.2f}s")

start = time.time()
splitter = LastWindowSplitter()
train_data, val_data = splitter.split(df, prediction_length)
print(f"LastWindowSplitter.split(df, prediction_length): {time.time() - start:.2f}s")
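
For readers unfamiliar with what the benchmark above measures, the following is a rough sketch of a last-window holdout on a plain pandas DataFrame. It assumes that slice_by_timestep(None, -prediction_length) keeps all but the last prediction_length timesteps of every item, like a Python slice applied per item. The functions make_frame and drop_last_timesteps are hypothetical; this is not the AutoGluon implementation and not the code changed in this PR.

```python
import pandas as pd


def make_frame(lengths):
    """Build a toy (item_id, timestamp) frame; `lengths` maps item_id -> number of rows."""
    frames = []
    for item_id, length in lengths.items():
        timestamps = pd.date_range("2022-01-01", periods=length, freq="D")
        index = pd.MultiIndex.from_product(
            [[item_id], timestamps], names=["item_id", "timestamp"]
        )
        frames.append(pd.DataFrame({"target": range(length)}, index=index))
    return pd.concat(frames)


def drop_last_timesteps(df, prediction_length):
    # For each row, count how many rows of the same item come after it
    # (0 for the last row of an item).
    rows_from_end = df.groupby(level="item_id", sort=False).cumcount(ascending=False)
    # Keep only rows that have at least `prediction_length` rows after them.
    return df[(rows_from_end >= prediction_length).to_numpy()]


toy = make_frame({"A": 5, "B": 3})
train = drop_last_timesteps(toy, prediction_length=2)
train_sizes = train.groupby(level="item_id").size().to_dict()
print(train_sizes)
```

The sketch uses a single grouped cumcount instead of a Python loop over items, which is the kind of vectorized operation that tends to matter when a dataset has thousands of items.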

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

timeseries/src/autogluon/timeseries/dataset/ts_dataframe.py

timeseries/tests/unittests/test_ts_dataset.py

github-actions · 2022-12-20T16:55:48Z

Job PR-2586-77c7f57 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2586/77c7f57/index.html

canerturkmen

LGTM! Just some questions.

timeseries/src/autogluon/timeseries/dataset/ts_dataframe.py

timeseries/src/autogluon/timeseries/splitter.py

github-actions · 2022-12-22T17:28:18Z

Job PR-2586-12d2b30 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2586/12d2b30/index.html

github-actions · 2022-12-23T13:51:14Z

Job PR-2586-7d41cb1 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2586/7d41cb1/index.html

shchur commented Dec 20, 2022

timeseries/src/autogluon/timeseries/dataset/ts_dataframe.py (outdated)

timeseries/tests/unittests/test_ts_dataset.py (outdated)

shchur requested a review from canerturkmen December 20, 2022 14:57

shchur added the "module: timeseries" label Dec 20, 2022

shchur mentioned this pull request Dec 20, 2022

[timeseries] Fix several bottlenecks in the training procedure #2577

Closed

shchur force-pushed the faster-splitter branch from 097d2a0 to 77c7f57 December 20, 2022 15:27

shchur added this to the 0.6.2 Release milestone Dec 21, 2022

canerturkmen approved these changes Dec 22, 2022

timeseries/src/autogluon/timeseries/dataset/ts_dataframe.py

timeseries/src/autogluon/timeseries/splitter.py (outdated)

timeseries/src/autogluon/timeseries/splitter.py

shchur added 3 commits December 23, 2022 11:46

Speed up multi-window splitter

57d57bb

Move _append_suffix_to_item_id to a local method

6cf0cc1

Add test for static_features order

7d41cb1

shchur force-pushed the faster-splitter branch from 12d2b30 to 7d41cb1 December 23, 2022 12:17

shchur merged commit 78b0426 into autogluon:master Dec 23, 2022

shchur deleted the faster-splitter branch December 23, 2022 13:51
