Materials for the subspace clustering and coclustering course at the School of Electrical and Computer Engineering (FEEC-UNICAMP). The goal is simply to collect repositories and useful materials for the clustering course. The official site, by Prof. Fernando Von Zuben and Rosana Veroneze, is here.
Clustering or cluster analysis takes a mass of observations and separates them, using some measure of dissimilarity, into distinct groups called clusters (disjoint subsets of the whole dataset). Each group is expected to exhibit some internal homogeneity.
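As a rough illustration of this definition, here is a minimal k-means sketch in plain Python. It is only one of many clustering methods and is not taken from the course material; the function name `kmeans`, the toy points, and the choice of Euclidean distance as the dissimilarity measure are all assumptions made for the example.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means on 2-D points (illustrative sketch, not course code)."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid,
        # using Euclidean distance as the dissimilarity measure.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels)
```

Each point receives exactly one label, so the clusters are disjoint subsets of the dataset, and points that share a label lie close to the same centroid, which is the internal homogeneity mentioned above.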
- Linear Algebra by Prof. Strang - Excellent linear algebra lectures
- The 5 Clustering Algorithms Data Scientists Need to Know - Introduction to well-known clustering algorithms for Data Scientists
- Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering - Hans-Peter Kriegel et al. (reference 18 of the class material)
- Subspace clustering - Hans-Peter Kriegel et al. (reference 19 of the class material)
- A Tutorial on Subspace Clustering - R. Vidal
- Subspace Clustering for High Dimensional Data: A Review - L. Parsons, E. Haque, and H. Liu (reference 25 of the class material)
- ClustNails: Visual Analysis of Subspace Clusters - The paper has good figures for understanding subspace clustering.
- A Geometric Analysis of Subspace Clustering with Outliers - Another paper with good figures for understanding subspace clustering.
The scikit-learn package implements many clustering algorithms; you can see them here. How to install it on Ubuntu 16.04 (requires NumPy and SciPy):
- Python 2.7
$ pip install -U scikit-learn
- Python 3.5
$ pip3 install -U scikit-learn
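Once the package is installed, a quick way to check that it works is to run one of its clustering algorithms on a few points. The snippet below is a minimal sketch using `KMeans` from `sklearn.cluster`; the toy data and the parameter values (`n_clusters=2`, `n_init=10`, `random_state=0`) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# Fit k-means with two clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(labels)                  # one cluster index per row of X
print(model.cluster_centers_)  # one centroid per cluster
```

The first three rows should share one label and the last three the other; which group is numbered 0 and which is numbered 1 is arbitrary.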
- Fuzzy c-means. Another good example here.
- Self-organizing maps - MiniSom is a minimalistic, NumPy-based implementation of Self-Organizing Maps (SOM)
- ProClus: The ProClus Algorithm for Projected Clustering - R package
- Subspace clustering - R package