Supported Datasets

PaddleTS currently supports dozens of datasets including the four major tasks of time series prediction, time series anomaly detection, time series imputation, and time series classification.

Time Series Prediction

Time series forecasting is one of the most important tasks in time series. At present, we integrate data from multiple scenarios such as electricity, weather, disease, and exchange rate for time series forecasting.

1. ETT-small

Data source: https://github.com/zhouhaoyi/ETDataset
Data brief introduction: Power transformer data containing 7 variables, used to forecast electricity demand in various regions. Divided into minute-level and hour-level data
Dataset names: ETTh1, ETTh2, ETTm1, ETTm2.

2. Weather

Data source: https://www.bgc-jena.mpg.de/wetter
Data introduction: contains 22 variables sampled weather data every ten minutes, used to predict the weather.
Dataset name: Weather.

3. ILI

Data source: https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
Data brief introduction: Contains daily sampling of 22 variables, influenza disease data spanning 19 years, used to predict the proportion of influenza patients.
Dataset name: ILI.

4. Traffic

Data source: https://pems.dot.ca.gov
Data brief introduction: Contains 862 sensors, hourly road occupancy ratio.
Dataset name: Traffic.

5. Exchange

Data source: https://github.com/laiguokun/multivariate-time-series-data
Data profile: Contains daily sampled exchange rate data spanning 36 years.
Dataset name: Exchange.

6. ECL

Data source: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014
Data introduction: Contains hourly electricity consumption data of 321 customers spanning 4 years, used to predict electricity demand in various regions.
Dataset name: ECL.

6. M4

Data source: https://www.kaggle.com/datasets/yogesh94/m4-forecasting-competition-dataset
Data introduction: 100,000 pieces of univariate data including demography, finance, industry, macroeconomics, microeconomics, etc.
Dataset name: M4-Yearly-train, M4-Yearly-test, M4-Monthly-train, M4-Monthly-test, M4-Weekly-train, M4-Weekly-test, M4-Daily-train, M4-Daily-test, M4-Hourly-train, M4-Hourly-test.

7. WTH

Data source: https://www.ncei.noaa.gov/data/local-climatological-data/
Data introduction: hourly weather data with 12 variables spanning 4 years, used to predict the weather in various regions.
Dataset names: WTH, UNI_WTH.

Time Series Anomaly Detection

Detecting anomalies from monitoring data is vital to industrial maintenance. We provide widely-used anomaly detection benchmarks: SMD, MSL, SMAP, SWaT, PSM, covering service monitoring, space & earth exploration, and water treatment applications.

1. SMD

Data source: https://github.com/NetManAIOps/OmniAnomaly/tree/master/ServerMachineDataset
Data brief introduction: SMD (Server Machine Dataset is a 5-week-long dataset that is collected from a large Internet company with 38 dimensions.
Dataset names: smd_train, smd_test.

2. SMAP

Data source: https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv
Data brief introduction: SMAP is from NASA with 25 dimensions, which contain the telemetry anomaly data derived from the Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems.
Dataset names: smap_train, smap_test.

3. MSL

Data source: https://s3-us-west-2.amazonaws.com/telemanom/data.zip
Data brief introduction: MSL (Mars Science Laboratory) is from NASA with 55 dimensions, which contain the telemetry anomaly data derived from the Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems.
Dataset names: msl_train, msl_test.

4. SWAT

Data source: https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info
Data brief introduction: ) SWaT(Secure Water Treatment) is obtained from 51 sensors of the critical infrastructure system under continuous operations.
Dataset names: swat_train, swat_test.

5. PSM

Data source: https://cloud.tsinghua.edu.cn/d/9605612594f0423f891e/files/?p=%2FPSM%2Ftrain.csv
Data brief introduction: PSM (Pooled Server Metrics) is collected internally from multiple application server nodes at eBay with 26 dimensions.
Dataset names: psm_train, psm_test.

6. NAB_TEMP

Data source: https://github.com/numenta/NAB
Data brief introduction: The Numenta Anomaly Benchmark (NAB) provides streaming data to research anomaly detection algorithms. NAB_TEMP is the temperature dataset.
Dataset names: NAB_TEMP

Time Series Classification

1. UEA

Data source: https://www.timeseriesclassification.com/index.php
Data brief introduction: UEA Time Series Classification dataset includes 10 multivariate datasets, covering the gesture, action and audio recognition, medical diagnosis by heartbeat monitoring and other practical tasks.
Dataset names: EthanolConcentration_Train, EthanolConcentration_Test, FaceDetection_Train, FaceDetection_Test, Handwriting_Train, Handwriting_Test, Heartbeat_Train, Heartbeat_Test, JapaneseVowels_Train, JapaneseVowels_Test, PEMSSF_Train, PEMSSF_Test, SelfRegulationSCP1_Train, SelfRegulationSCP1_Test, SelfRegulationSCP2_Train, SelfRegulationSCP2_Test, SpokenArabicDigits_Train, SpokenArabicDigits_Test, UWaveGestureLibrary_Train, UWaveGestureLibrary_Test.

2. BasicMotions

Data source: https://timeseriesclassification.com/description.php?Dataset=BasicMotions
Data brief introduction: The data was generated as part of a student project where four students performed four activities whilst wearing a smart watch. here are classes: walking, resting, running and badminton. Participants were required to record motion a total of five times, and the data is sampled once every tenth of a second, for a ten second period.
Dataset names: BasicMotions_Train, BasicMotions_Test.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

supported_datasets.rst

supported_datasets.rst

Supported Datasets

Time Series Prediction

1. ETT-small

2. Weather

3. ILI

4. Traffic

5. Exchange

6. ECL

6. M4

7. WTH

Time Series Anomaly Detection

1. SMD

2. SMAP

3. MSL

4. SWAT

5. PSM

6. NAB_TEMP

Time Series Classification

1. UEA

2. BasicMotions

Files

supported_datasets.rst

Latest commit

History

supported_datasets.rst

File metadata and controls

Supported Datasets

Time Series Prediction

1. ETT-small

2. Weather

3. ILI

4. Traffic

5. Exchange

6. ECL

6. M4

7. WTH

Time Series Anomaly Detection

1. SMD

2. SMAP

3. MSL

4. SWAT

5. PSM

6. NAB_TEMP

Time Series Classification

1. UEA

2. BasicMotions