A toolset to work with the Wikidata Graph
Generalized Conventional Mutual Information (GenConvMI) - NMI for overlapping (soft, fuzzy) clusters (communities), compatible with standard NMI, pure C++ version (single executable)
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures
Extremely fast evaluation of the extrinsic clustering measures: various (mean) F1 measures and Omega Index (Fuzzy Rand Index) for the multi-resolution clustering with overlaps/covers, standard NMI, clusters labeling
Overlapping Normalized Mutual Information and Omega Index evaluation for the overlapping community structure produced by clustering algorithms
Python Multi-Process Execution Pool: concurrent asynchronous execution pool with custom resource constraints (memory, timeouts, affinity, CPU cores and caching), load balancing, and profiling of external apps on NUMA architectures
This repository contains the pipeline for table detection/extraction from 'Bundesarchive' documents.
Implementation of HistoSketch and D2HistoSketch in MATLAB
Merger of clustering resolution levels with filtering and cluster deduplication. Flattens a hierarchy or list of multiple resolution levels (clusterings) into a single flat clustering (collection), synchronizing the node base and deduplicating clusters.
Python Benchmarking Framework for the Evaluation of Clustering Algorithms: network generation and shuffling; failover execution and resource consumption tracing (peak RAM RSS, CPU, ...); evaluation of modularity, conductance, NMI and F1 score for overlapping communities
RG (Randomized Greedy clustering), CGGC_RG (Core Groups Graph ensemble Clustering) or CGGCi_RG (Core Groups Graph ensemble Clustering Iterative) algorithms
Type Inference Evaluation Scripts & Accessory Apps (used for the StaTIX benchmarking)
Statistical Type Inference (both fully automatic and semi-supervised) for RDF datasets
(Scalable) High-order proximity-preserving Unique node embeddings for undirected graphs
Network (Graph) Format Converter: RCG, Pajek, Metis, NSL (NCol, SNAP, ...)
Benchmark for Centroid Decomposition of streams
A collection of 30 random Wikipedia pages manually annotated with entities.
Log Analysis Stack
Extended version of the Lancichinetti-Fortunato-Radicchi Benchmark for Undirected Weighted Overlapping networks to evaluate clustering algorithms using generated ground-truth communities