
GiulioRossetti/cdlib_datasets


CDlib datasets


Remote repository of public domain network datasets (along with their ground truth clusterings) for the CDlib library.

For instructions on how to load the data within CDlib, refer to the official documentation.

Available datasets

The available network datasets, both real-world and synthetically generated, are listed below.

Real world

| Network Name | Network Type | Upstream |
| --- | --- | --- |
| Karate Club | Social | UCINET |
| Youtube | Social | SNAP |
| DBLP | Scientific Collaboration | SNAP |
| Amazon | Co-Purchases | SNAP |

Synthetic

LFR Benchmark datasets:

Set of networks with planted community partitions generated using the networkx implementation of the Lancichinetti-Fortunato-Radicchi benchmark.

“Benchmark graphs for testing community detection algorithms”, Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi, Phys. Rev. E 78, 046110 (2008).

Dataset names follow the pattern:

LFR_N{number of nodes}_ad{average degree}_mc{min community size}_mu{mixing coefficient}

where:

  • number of nodes: [1000, 5000, 10000, 50000, 100000]
  • average degree: [5]
  • min community size: [50]
  • mixing coefficient: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
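
The naming pattern and parameter grid above can be expanded into the full list of LFR dataset names. The sketch below is illustrative only: the helper `lfr_dataset_names` is hypothetical, and the exact rendering of the mixing coefficient in the name (for example `mu0.1`) is an assumption that should be checked against the names actually published in the repository.

```python
from itertools import product

# Parameter grid as listed in this README.
NODES = [1000, 5000, 10000, 50000, 100000]
AVERAGE_DEGREES = [5]
MIN_COMMUNITY_SIZES = [50]
MIXING_COEFFICIENTS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def lfr_dataset_names():
    """Return every name matching LFR_N{n}_ad{ad}_mc{mc}_mu{mu}.

    Hypothetical helper: the formatting of `mu` is assumed, not confirmed.
    """
    return [
        f"LFR_N{n}_ad{ad}_mc{mc}_mu{mu}"
        for n, ad, mc, mu in product(
            NODES, AVERAGE_DEGREES, MIN_COMMUNITY_SIZES, MIXING_COEFFICIENTS
        )
    ]


names = lfr_dataset_names()
print(len(names))  # 5 node counts x 9 mixing coefficients = 45
print(names[0])    # LFR_N1000_ad5_mc50_mu0.1
```

With a single value for the average degree and the minimum community size, the grid yields 45 networks, one per combination of node count and mixing coefficient.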

The power-law exponent is fixed at 3 for the degree distribution and at 1.5 for the community size distribution.
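
As a rough illustration of how a comparable network might be generated, the sketch below calls networkx's `LFR_benchmark_graph` with the parameters stated in this README. It is not the script used to build the published datasets: the wrapper `generate_lfr`, its defaults, and the seed are assumptions, and the generator may fail to converge for some parameter combinations.

```python
import networkx as nx

TAU1 = 3    # power-law exponent of the degree distribution
TAU2 = 1.5  # power-law exponent of the community size distribution


def generate_lfr(n, mu, average_degree=5, min_community=50, seed=None):
    """Generate an LFR graph and its planted partition.

    Hypothetical wrapper; defaults mirror the values listed in this README.
    Returns the graph and a set of frozensets, one per planted community.
    """
    graph = nx.LFR_benchmark_graph(
        n,
        TAU1,
        TAU2,
        mu,
        average_degree=average_degree,
        min_community=min_community,
        seed=seed,
    )
    # networkx stores each node's community as a set in the "community" attribute.
    communities = {frozenset(graph.nodes[v]["community"]) for v in graph}
    return graph, communities


# Small example using the parameters from the networkx documentation.
graph, communities = generate_lfr(250, 0.1, min_community=20, seed=10)
print(graph.number_of_nodes(), len(communities))
```

The usage example relies on a small graph (250 nodes, minimum community size 20) because it is the configuration shown in the networkx documentation. The dataset sizes listed above take longer to generate and are not guaranteed to converge with an arbitrary seed.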
