dple/Datasets


Datasets for Machine Learning


The datasets are grouped by machine learning problem.

Anomaly Detection

| Dataset | No. of features | No. of records | Outliers | Description |
|---|---|---|---|---|
| http | 3 | 567,497 | 2,211 (0.4%) | Download |
| smtp | 3 | 95,156 | 30 (0.03%) | Download |
| annthyroid | 6 | 6,832 | 534 (7.42%) | Source: UCI |
| thyroid | 6 | 3,772 | 93 (2.5%) | Source: UCI |
| satelite | 36 | 6,435 | 2,036 (32%) | Source: UCI |
| pima | 8 | 768 | 268 (35%) | The Pima Indians Diabetes Database was provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Download |
| arrhythmia | 274 | 452 | 66 (15%) | The aim is to determine the type of arrhythmia from the ECG recordings. Source: UCI |
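
The percentages in the Outliers column are the outlier count divided by the record count. The following is a minimal Python sketch that recomputes a few of them from the counts listed above; the function name `outlier_rate` is made up for illustration and is not part of this repository.

```python
def outlier_rate(n_outliers: int, n_records: int) -> float:
    """Return the share of outliers as a percentage of all records."""
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    return 100.0 * n_outliers / n_records


# Counts copied from the Anomaly Detection table: (outliers, records).
counts = {
    "http": (2211, 567497),
    "smtp": (30, 95156),
    "pima": (268, 768),
}

# Round to two decimals for display; the table rounds more coarsely.
rates = {name: round(outlier_rate(o, n), 2) for name, (o, n) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Dividing the result by 100 gives the fraction form that many anomaly detectors expect as their expected-outlier-share setting.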

Fraud Detection

| Dataset | No. of features | No. of records | Outliers | Description |
|---|---|---|---|---|
| Credit Card Fraud Detection | 31 | 284,807 | 492 (0.172%) | The dataset contains transactions made by credit cards in September 2013 by European cardholders. It was collected and analysed during a research collaboration between Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles). Download |
| IEEE-CIS Fraud Detection | 434 | 569,877 | 20,633 (3%) | The dataset is drawn from Vesta's real-world e-commerce transactions. The data is split into two files, identity and transaction, which are joined by TransactionID. Download |
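
The IEEE-CIS description says the identity and transaction files are joined by TransactionID. The sketch below shows one way to do that join with the Python standard library. It assumes a left join (every transaction is kept, whether or not it has an identity row), and the sample rows and every column name other than `TransactionID` are placeholders, not the real schema.

```python
import csv
import io


def left_join_on_transaction_id(transactions, identities):
    """Attach identity columns to each transaction row (left join)."""
    # Index identity rows by their key so each lookup is a dict access.
    identity_by_id = {row["TransactionID"]: row for row in identities}
    joined = []
    for row in transactions:
        merged = dict(row)
        # Transactions without an identity row keep only their own columns.
        merged.update(identity_by_id.get(row["TransactionID"], {}))
        joined.append(merged)
    return joined


# Tiny made-up samples standing in for the two files (hypothetical columns).
transaction_csv = "TransactionID,amount,is_fraud\n1,50.0,0\n2,120.5,1\n3,9.99,0\n"
identity_csv = "TransactionID,device\n1,mobile\n2,desktop\n"

transactions = list(csv.DictReader(io.StringIO(transaction_csv)))
identities = list(csv.DictReader(io.StringIO(identity_csv)))

joined = left_join_on_transaction_id(transactions, identities)
for row in joined:
    print(row)
```

To use real files, replace the `io.StringIO` objects with opened file handles. With 434 features, a dataframe library may be more convenient than plain dictionaries.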

Streaming data

| Dataset | No. of features | No. of records | Outliers | Description |
|---|---|---|---|---|
| Mulcross | 4 | 262,144 | 2 dense clusters (10%) | A synthetic multivariate normal distribution with two dense anomaly clusters. Download |
| Covertype | 10 | 286,048 | 0.9% | Predicting forest cover type from cartographic variables only. Source: UCI |
| Adult | 6 | 35,760 | class > 50k (3.21%) | The prediction task is to determine whether a person makes over 50K a year. Source: UCI |
| Weather | 8 | 18,159 | rain: 5,698 (31%) | The National Oceanic and Atmospheric Administration (NOAA) measured weather from over 7,000 weather stations worldwide. Records date back to the mid-1900s, providing a wide scope of weather trends. Daily measurements include a variety of features (temperature, pressure, wind speed, etc.) as well as a series of indicators for precipitation and other weather-related events. Source: NOAA |
| Shuttle | 9 | 49,097 | classes 2, 3, 5-7 (7%) | The Shuttle dataset contains 9 attributes, all of which are numerical. Approximately 80% of the data belongs to class 1. Source: UCI |
| KDDCUP99 | 41 | 494,021 | 23 classes | The KDDCUP99 stream was collected from the KDD CUP challenge in 1999. The task is to build predictive models capable of distinguishing between intrusions and normal connections. Source: KDD CUP challenge |
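
Several of the streaming datasets define outliers through class labels, for example classes 2, 3 and 5-7 for Shuttle. The sketch below shows one way to turn such multi-class labels into binary outlier flags while reading one record at a time. The function name and sample records are hypothetical, and since the table does not say how class 4 is handled, the sketch leaves it with the inliers as an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

# Classes treated as outliers, per the Shuttle row (2, 3, 5-7).
OUTLIER_CLASSES = {2, 3, 5, 6, 7}


def stream_with_anomaly_labels(
    records: Iterable[Tuple[List[float], int]],
) -> Iterator[Tuple[List[float], int]]:
    """Yield (features, is_outlier) one record at a time."""
    for features, class_label in records:
        # 1 marks an outlier class, 0 everything else (including class 4).
        yield features, int(class_label in OUTLIER_CLASSES)


# Made-up records: (features, class label). Real Shuttle rows have 9 attributes.
sample = [
    ([0.1, 0.2], 1),
    ([0.3, 0.1], 4),
    ([5.0, 4.2], 2),
    ([0.2, 0.2], 1),
    ([6.1, 3.3], 7),
]

labelled = list(stream_with_anomaly_labels(sample))
n_outliers = sum(flag for _, flag in labelled)
print(f"{n_outliers} of {len(labelled)} records flagged as outliers")
```

Because the function is a generator, it can wrap any iterable source, such as a file reader, without loading the whole dataset into memory.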
