
Density Estimation Benchmark Datasets

A collection of datasets commonly used in machine learning as benchmarks for density estimation.
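In this setting, density estimation means learning a joint distribution over a dataset's binary variables from its train split. The validation split is typically used for tuning, and models are commonly compared by their average log-likelihood on the test split. The sketch below shows one very simple baseline under that protocol: a fully factorized model that treats each variable as an independent Bernoulli with additive (Laplace) smoothing. It is an illustrative assumption, not a method from the cited papers or code from this repository, and the function names `fit_bernoulli` and `avg_log_likelihood` are hypothetical.

```python
import math
from typing import List, Sequence


def fit_bernoulli(train: Sequence[Sequence[int]], alpha: float = 1.0) -> List[float]:
    """Estimate P(X_j = 1) for every variable j with additive smoothing.

    alpha > 0 keeps every probability strictly inside (0, 1), so the
    log-likelihood below never takes log(0).
    """
    if not train:
        raise ValueError("training data is empty")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(train)
    ones = [0] * len(train[0])
    for row in train:
        for j, v in enumerate(row):
            ones[j] += v
    return [(c + alpha) / (n + 2 * alpha) for c in ones]


def avg_log_likelihood(params: Sequence[float], data: Sequence[Sequence[int]]) -> float:
    """Mean natural-log likelihood per instance under the independent model."""
    if not data:
        raise ValueError("evaluation data is empty")
    total = 0.0
    for row in data:
        for p, v in zip(params, row):
            total += math.log(p if v == 1 else 1.0 - p)
    return total / len(data)
```

Because every parameter lies strictly between 0 and 1, the returned value is always negative; values closer to zero indicate a better fit. A structured model, such as the Markov networks learned in the cited papers, is expected to improve on this baseline by capturing dependencies between variables.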

If you use any of these datasets, please cite the papers that introduced them [1][2][3].

Datasets

| Dataset | Type | #vars | #train | #valid | #test | Density | Abbrv. |
|---|---|---:|---:|---:|---:|---:|---|
| NLTCS [1] | binary | 16 | 16181 | 2157 | 3236 | 0.332 | NLTCS |
| MSNBC [1] | binary | 17 | 291326 | 38843 | 58265 | 0.166 | msnbc |
| KDDCup2k [1] | binary | 65 | 180092 | 19907 | 34955 | 0.008 | kdd |
| Plants [1] | binary | 69 | 17412 | 2321 | 3482 | 0.180 | plants |
| Audio [1] | binary | 100 | 15000 | 2000 | 3000 | 0.199 | baudio |
| Jester [1] | binary | 100 | 9000 | 1000 | 4116 | 0.608 | jester |
| Netflix [1] | binary | 100 | 15000 | 2000 | 3000 | 0.541 | bnetflix |
| Accidents [2] | binary | 111 | 12758 | 1700 | 2551 | 0.291 | accidents |
| Retail [2] | binary | 135 | 22041 | 2938 | 4408 | 0.024 | tretail |
| Pumsb-star [2] | binary | 163 | 12262 | 1635 | 2452 | 0.270 | pumsb_star |
| DNA [2] | binary | 180 | 1600 | 400 | 1186 | 0.253 | dna |
| Kosarek [2] | binary | 190 | 33375 | 4450 | 6675 | 0.020 | kosarek |
| MSWeb [1] | binary | 294 | 29441 | 3270 | 5000 | 0.010 | MSWeb |
| Book [1] | binary | 500 | 8700 | 1159 | 1739 | 0.016 | book |
| EachMovie [1] | binary | 500 | 4525 | 1002 | 591 | 0.059 | tmovie |
| WebKB [1] | binary | 839 | 2803 | 558 | 838 | 0.064 | cwebkb |
| Reuters-52 [1] | binary | 889 | 6532 | 1028 | 1540 | 0.036 | cr52 |
| 20 NewsGroup [1] | binary | 910 | 11293 | 3764 | 3764 | 0.049 | c20ng |
| Movie reviews [3] | binary | 1001 | 1600 | 150 | 250 | 0.140 | moviereview |
| BBC [2] | binary | 1058 | 1670 | 225 | 330 | 0.078 | bbc |
| Voting [3] | binary | 1359 | 1214 | 200 | 350 | 0.333 | voting |
| Ad [2] | binary | 1556 | 2461 | 327 | 491 | 0.008 | ad |
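The table lists, for each dataset, the number of binary variables and the size of each split. The sketch below reads one split and computes its density as a sanity check. It rests on three assumptions to verify against the `datasets` folder: each split is a comma-separated file of 0/1 values with one instance per row and no header; files follow a `<root>/<abbrv>/<abbrv>.<split>.data` layout; and the Density column means the fraction of entries equal to 1. The helpers `split_path`, `parse_binary_rows`, `load_split` and `density` are hypothetical and are not part of `utils.py`.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def split_path(root: str, abbrv: str, split: str) -> Path:
    # Hypothetical layout <root>/<abbrv>/<abbrv>.<split>.data; check the repo.
    return Path(root) / abbrv / f"{abbrv}.{split}.data"


def parse_binary_rows(lines: Iterable[str]) -> List[List[int]]:
    """Parse comma-separated 0/1 lines into integer rows; blank lines are skipped."""
    rows: List[List[int]] = []
    for record in csv.reader(lines):
        if not record:
            continue  # csv.reader yields [] for an empty line
        row = [int(v) for v in record]
        if any(v not in (0, 1) for v in row):
            raise ValueError("found a value other than 0 or 1")
        if rows and len(row) != len(rows[0]):
            raise ValueError("rows have different numbers of variables")
        rows.append(row)
    return rows


def load_split(path: Path) -> List[List[int]]:
    """Read one split file from disk (format assumed as described above)."""
    with open(path, newline="") as f:
        return parse_binary_rows(f)


def density(rows: List[List[int]]) -> float:
    """Fraction of all entries that equal 1 (0.0 for empty input)."""
    total = sum(len(r) for r in rows)
    return sum(sum(r) for r in rows) / total if total else 0.0
```

If the assumed layout and format hold, `load_split(split_path("datasets", "nltcs", "train"))` should return 16181 rows of 16 variables, matching the NLTCS row of the table. The validation step rejects non-binary values and ragged rows, which catches a wrong delimiter or a stray header early.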

Introduced in:

[1] Daniel Lowd, Jesse Davis: Learning Markov Network Structure with Decision Trees. ICDM 2010

[2] Jan Van Haaren, Jesse Davis: Markov Network Structure Learning: A Randomized Feature Generation Approach. AAAI 2012

[3] Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, Guy Van den Broeck: Tractable Learning for Complex Probability Queries. NIPS 2015
