Skip to content

QEDan/links_clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Links Online Clustering

Python implementation of the Links Online Clustering algorithm: https://arxiv.org/abs/1801.10123

Title: Links: A High-Dimensional Online Clustering Method

Authors: Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno

Overview

This is a clustering algorithm for online data. That is, it will predict cluster membership for vectors that it is shown one-by-one. It does not require examining the entire dataset to predict cluster membership.

It works by maintaining a two-level hierarchy of clusters and subclusters. Each subcluster has a centroid that is compared with new vector for prediction using cosine similarity. Depending on the previous data that has been seen, the new data point can be assigned to an existing cluster/subcluster, assigned to a new subcluster within an existing cluster, or assigned to a new subcluster and cluster.

Instantiating the class requires 3 hyperparameters:

  • cluster_similarity_threshold
  • subcluster_similarity_threshold
  • pair_similarity_maximum

These details are best understood by reading the paper.

Installation

pip install -r requirements.txt

Usage example

from links_cluster import LinksCluster

...
links_cluster = LinksCluster(cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum)
for vector in data:
    predicted_cluster = links_cluster.predict(vector)

For more usage examples, see the tests.

About

Implementation of the Links Online Clustering algorithm: https://arxiv.org/abs/1801.10123

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages