
The purpose of these notebooks is to demonstrate the application of time series clustering techniques to split the target time series (TTS) data into homogeneous subsets that may produce more accurate forecasts when each subset is trained individually with the Forecast service. The intuition is that training Forecast models on clustered data allows them to learn stronger patterns from homogeneous subsets of the time series.

Because a risk of over-fitting exists with very high cluster counts, we set num_clusters=3 with the intention of splitting the time series dataset into subsets of "fast moving", "slow moving", and "intermittent demand" items. Also, clustering is not advised for datasets with fewer than a thousand time series, since splitting an already small dataset could have a limiting effect on deep learning models.
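Once each item has been assigned to a cluster, the TTS can be partitioned into per-cluster subsets, each of which is then imported and trained as a separate dataset in Forecast. A minimal sketch of that split, using a hypothetical toy TTS and a hypothetical item-to-cluster mapping (as would come out of the clustering notebook):

```python
import pandas as pd

# Hypothetical TTS in the schema Forecast expects (item_id, timestamp, demand)
tts = pd.DataFrame({
    "item_id": ["A", "A", "B", "B", "C", "C"],
    "timestamp": pd.date_range("2021-01-01", periods=2).tolist() * 3,
    "demand": [5, 7, 0, 1, 30, 28],
})

# Hypothetical item -> cluster assignment, e.g. produced by DBA-KMeans
cluster_of = {"A": "slow", "B": "intermittent", "C": "fast"}
tts["cluster"] = tts["item_id"].map(cluster_of)

# One DataFrame per cluster; each would be exported and trained separately
subsets = {name: df.drop(columns="cluster") for name, df in tts.groupby("cluster")}
```

Each entry of `subsets` keeps the original three-column TTS schema, so it can be written straight back to CSV for a per-cluster Forecast dataset import.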

We leverage the tslearn.clustering module of the Python package tslearn to cluster the time series dataset using the DTW Barycenter Averaging (DBA) KMeans algorithm, with Dynamic Time Warping (DTW) distance as the metric.

The collection includes two notebooks: an optional first notebook covering data cleaning and processing, and the main notebook covering time series clustering. We use the open source UCI Online Retail II Data Set for this demonstration.

Please note that these notebooks cover only the preprocessing and data preparation steps related to clustering time series data. The reader is referred to the Forecast Developer Guide for model training and evaluation.

Table of contents:

References: