
Datasets

melwes edited this page Aug 4, 2025 · 1 revision

A list of relevant datasets, sorted by domain. All datasets listed here should be publicly accessible. Feel free to add datasets relevant to your domain via a pull request.

Time-Series

| Dataset Name | Tasks | Short Description | Link | Labels | Combined with Source Datasets | Combined with Target Datasets | Size | Resolution | License | Papers Referencing It |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MIMIC-III (Adult-AHRF dataset) | Mortality Prediction | Critical care database containing 58000 admission records (38645 adults and 7875 neonates). | MIMIC-III | xxx | - | Child-AHRF dataset | 38645 adults and 7875 neonates | xxx | PhysioNet Credentialed Health Data License 1.5.0 | Dissanayake & Fernando 2021 |
| Child-AHRF dataset | Mortality Prediction | Contains admission records of 398 children. | TODO | xxx | MIMIC-III (Adult-AHRF dataset) | - | 398 children | xxx | xxx | Dissanayake & Fernando 2021 |
| Brains4Cars | Advanced Driver Assistance (ADAS) | Contains 1180 miles of freeway driving by 10 drivers. | Brains4Cars | | - | xxx | 10 drivers and 1180 miles | xxx | custom | Dissanayake & Fernando 2021 |
| Boiler Fault Detection Dataset | Fault Detection | Data from 3 boilers over 2 years (2014-2016). | xxx | xxx | xxx | xxx | xxx | xxx | xxx | Dissanayake & Fernando 2021 |
| Air Quality Forecast Dataset | Air Quality Forecast | Air quality data, meteorological data, and weather forecast data covering 4 Chinese cities, recorded hourly for 2014-2015. | Air Quality Forecast Dataset | xxx | xxx | xxx | 4 cities, 2014-2015 | 1 datapoint/hour | xxx | Dissanayake & Fernando 2021 |
| UCIHAR | Classification (activity recognition) | Data from three sensors: accelerometer, gyroscope, and body sensors. The sensors were applied to 30 subjects. Each subject performed six activities (walking, walking upstairs, walking downstairs, sitting, standing, and lying down). | UCIHAR | walking, walking upstairs, walking downstairs, sitting, standing, lying down | cross-subject DA | cross-subject DA | 30 subjects, 10299 instances | xxx | Creative Commons Attribution 4.0 International | Ragab & Eldele 2023 |
| WISDM | Classification (activity recognition) | Accelerometer sensors were applied to 36 subjects. Each subject performed six activities (walking, jogging, walking upstairs, walking downstairs, sitting, and standing). The recordings for each subject are highly imbalanced across classes. | WISDM | walking, jogging, walking upstairs, walking downstairs, sitting, standing | cross-subject DA | cross-subject DA | 36 subjects | xxx | xxx | Ragab & Eldele 2023 |
| Heterogeneity Human Activity Recognition (HHAR) | Classification (activity recognition) | Smartphone and smartwatch sensor readings of 9 subjects. | HHAR | Biking, Sitting, Standing, Walking, Stair Up, Stair Down | cross-subject DA | cross-subject DA | 9 subjects, 43930257 instances | xxx | Creative Commons Attribution 4.0 International | Ragab & Eldele 2023 |
| Sleep-EDF | Classification (sleep stage) | The Sleep-EDF database contains 197 whole-night PolySomnoGraphic sleep recordings with EEG, EOG, chin EMG, and event markers. Some records also contain respiration and body temperature. The corresponding hypnograms (sleep patterns) were manually scored by well-trained technicians according to the Rechtschaffen and Kales manual and are also available. | Sleep-EDF | Wake, Non-Rapid Eye Movement stages N1, N2, N3, Rapid Eye Movement (REM) | cross-channel DA | cross-channel DA | 20 subjects, 197 whole-night PolySomnoGraphic sleep recordings | xxx | Open Data Commons Attribution License v1.0 | Ragab & Eldele 2023 |
| Machine Fault Diagnosis (MFD) | Fault Detection | Collected by Paderborn University to identify various types of incipient faults using vibration signals. The data were collected under four different operating conditions; in the referencing paper's experiments, each condition is treated as a separate domain. | xxx | xxx | cross-condition DA | cross-condition DA | xxx | xxx | xxx | Ragab & Eldele 2023 |
| VitalDB | | A comprehensive dataset of 6,388 surgical patients, composed of intraoperative biosignals and clinical information. The biosignal data is high quality, including 500 Hz waveform signals and numeric values at intervals of 1-7 seconds. More than 60 surgery-related clinical parameters are also provided to help interpret the signals. | VitalDB | xxx | xxx | xxx | 6,388 patients | xxx | custom | - |

Computer Vision

Natural Language Processing (NLP)

Speech

Multi-Modal
