Build an agglomerative hierarchical clustering algorithm from scratch, i.e. WITHOUT any advanced libraries such as NumPy, pandas, or scikit-learn.

OlaPietka/Agglomerative-Hierarchical-Clustering-from-scratch

AgglomerativeHierarchicalClusterFromScratching

Agglomerative hierarchical clustering algorithm from scratch (i.e. without advanced libraries such as NumPy, pandas, or scikit-learn)

Algorithm

During the clustering process, we iteratively merge the two most similar clusters until only $K$ clusters are left. At initialization, each data point forms its own cluster.
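The merge loop described above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual code: the names `agglomerate`, `euclidean`, and `single_link` are hypothetical, and single link with Euclidean distance is assumed as the default cluster distance only to keep the example self-contained.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_link(ci, cj):
    """Distance between clusters = distance of their closest pair of points.

    Assumed default for this sketch; any cluster distance can be passed instead.
    """
    return min(euclidean(p, q) for p in ci for q in cj)


def agglomerate(points, k, cluster_distance=single_link):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")

    # Initialization: every data point forms its own cluster.
    clusters = [[p] for p in points]

    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)

        # Merge cluster j into cluster i, then drop cluster j (j > i always).
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    print(agglomerate(data, 3))
```

Recomputing every pairwise cluster distance on each pass keeps the sketch short but is slow for large inputs; caching distances between passes would be the natural next step.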

Cluster similarity measures

The similarity of two clusters $C_i, C_j$ is determined by a distance measure.

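Single link

$$D(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} d(p, q)$$

Complete link

$$D(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} d(p, q)$$

Average link

$$D(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$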

The smaller the distance, the more similar the two clusters are. In the equations, $d(p, q)$ is a distance measure between two data points, e.g. the Euclidean distance, defined by

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where $p_i, q_i$ are the $i$-th coordinates of $p, q$ and $n$ is the number of dimensions.
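As a rough illustration of the three measures, the sketch below computes each one in plain Python for clusters given as lists of coordinate tuples. The function names are hypothetical and are not taken from the repository's code; the formulas used are the standard definitions of single, complete, and average link with Euclidean point distance.

```python
import math


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_link(ci, cj):
    """Smallest point-to-point distance between the two clusters."""
    return min(euclidean(p, q) for p in ci for q in cj)


def complete_link(ci, cj):
    """Largest point-to-point distance between the two clusters."""
    return max(euclidean(p, q) for p in ci for q in cj)


def average_link(ci, cj):
    """Mean of all point-to-point distances between the two clusters."""
    total = sum(euclidean(p, q) for p in ci for q in cj)
    return total / (len(ci) * len(cj))


if __name__ == "__main__":
    a = [(0, 0)]
    b = [(3, 4), (6, 8)]
    # The two pairwise distances here are 5 and 10.
    print(single_link(a, b), complete_link(a, b), average_link(a, b))
```

All three functions share the same signature, so a clustering loop can take whichever one is wanted as a parameter.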

Sample usage

python main.py -d sample_input.txt -k 4 -m 0
