CZ-NIC/silicon-clustering

SIngle-LInkage CONnectivity clustering

A Python library for fast approximate single-linkage clustering with a given Euclidean distance or cosine similarity threshold.

Supports both dense arrays (numpy) and sparse matrices (scipy) and visualisation via matplotlib. The algorithm easily clusters 10000 points in 30 dimensions in under a second.
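
For reference, exact single-linkage clustering with a distance threshold amounts to finding the connected components of the graph that links every pair of points no farther apart than the threshold. The following is a minimal brute-force sketch of that definition in plain Python. It is not the library's algorithm or API: `threshold_clusters` and `max_dist` are hypothetical names used only for illustration, and the quadratic pairwise loop is exactly the cost that a fast approximation tries to avoid.

```python
import math

def threshold_clusters(points, max_dist):
    """Exact single-linkage clusters: connected components of the graph
    linking every pair of points at Euclidean distance <= max_dist.
    Hypothetical helper for illustration, not part of silicon."""
    parent = list(range(len(points)))

    def find(i):
        # Find the root of i, halving the path on the way up.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute force over all pairs: O(n^2) distance evaluations.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)

    # Group point indices by the root of their component.
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(threshold_clusters(points, max_dist=0.6))  # [[0, 1, 2], [3]]
```

Points 0 and 2 are 1.0 apart, beyond the threshold, yet they share a cluster because point 1 links them. This chaining behaviour is what distinguishes single-linkage clustering from other linkage criteria.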

This module started as an exploratory data project at CZ.NIC Labs for clustering captured honeypot records and grew into a separate library.

Install from PyPI with pip install silicon-clustering

Github: CZ-NIC/silicon-clustering

Docs: silicon-clustering.readthedocs.io

Author: Tomáš Gavenčiak, gavento@ucw.cz

Usage example

import silicon, numpy
# use a fixed seed to get the same data
rnd = numpy.random.RandomState(42)
# create some data
data = rnd.uniform(-1.0, 1.0, (1000, 3))
# create and run the clustering instance
c = silicon.CosineClustering(data, rnd=rnd, sim_threshold=0.995)
c.run()
print(c.clusters())

import matplotlib.pyplot as plt
# plot the data overview
c.plot()
plt.show()
# plot the largest cluster
c.clusters_by_size()[0].plot()
plt.show()
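
To get a feel for how strict a similarity threshold such as the 0.995 used above is, it can help to translate it into a Euclidean distance. For vectors scaled to unit length, |u - v|^2 = 2 - 2 cos(u, v), so a cosine similarity threshold corresponds to a distance threshold on the normalized points. The sketch below checks this identity in plain Python. The helper names are hypothetical, and whether silicon normalizes the data internally in this way is an assumption not confirmed here.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def normalize(u):
    # Scale u to unit Euclidean length.
    norm = math.hypot(*u)
    return tuple(a / norm for a in u)

def sim_to_unit_distance(sim):
    # For unit vectors: |u - v|^2 = 2 - 2 * cos(u, v).
    return math.sqrt(2.0 - 2.0 * sim)

u, v = (1.0, 2.0, 2.0), (2.0, 1.0, 2.0)
sim = cosine_similarity(u, v)
dist = math.dist(normalize(u), normalize(v))

# The distance between the normalized vectors matches the formula.
assert math.isclose(dist, sim_to_unit_distance(sim))

# A similarity threshold of 0.995 is a distance of about 0.1 on the unit sphere.
print(sim_to_unit_distance(0.995))
```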