Fastcluster.jl

Julia wrapper for Daniel Muellner's fastcluster library for hierarchical clustering.

Installation

using Pkg
Pkg.add(url="https://github.com/jmboehm/Fastcluster.jl")

Usage

The main function is

linkage(d::Array{T,2}, method::Symbol) where {T<:Real}

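which returns a tuple (m, h) describing the dendrogram: m encodes the sequence of merges (a Vector{Int32}; see cutree below) and h holds the corresponding merge heights. The input arguments are: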

d::Array{T,2} is the n×n dissimilarity matrix between the n points to cluster. You can use the Distances.jl package to generate it (see the example below).
method::Symbol is one of :single, :complete, :average, :weighted, :ward, :centroid or :median. These clustering methods are described in the fastcluster documentation. Note that :ward behaves differently from the R and Python interfaces, and :centroid and :median differ from the Python interface (see below).
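To make the difference between the simpler methods concrete, here is a naive O(n³) sketch in plain Julia of what :single, :complete and :average compute on a full dissimilarity matrix. It is not fastcluster's algorithm and does not reproduce the m encoding; the function name naive_merge_heights is hypothetical, and it returns only the sequence of merge heights, which is what we understand h to contain.

```julia
# Naive O(n^3) agglomerative clustering on a full dissimilarity matrix.
# Illustration only: not fastcluster's algorithm, and it returns only the merge
# heights, not a merge encoding like m.
function naive_merge_heights(d::AbstractMatrix{<:Real}, method::Symbol)
    method in (:single, :complete, :average) ||
        throw(ArgumentError("this sketch supports only :single, :complete and :average"))
    n = size(d, 1)
    D = Float64.(d)          # working copy: dissimilarities between current clusters
    sizes = ones(Int, n)     # number of observations in each cluster
    active = trues(n)        # clusters that have not been absorbed yet
    heights = Float64[]
    for _ in 1:n-1
        # find the closest pair of active clusters
        best, bi, bj = Inf, 0, 0
        for i in 1:n, j in i+1:n
            if active[i] && active[j] && D[i, j] < best
                best, bi, bj = D[i, j], i, j
            end
        end
        push!(heights, best)
        # merge cluster bj into bi and update its dissimilarity to every other cluster k
        for k in 1:n
            (active[k] && k != bi && k != bj) || continue
            dik, djk = D[bi, k], D[bj, k]
            dnew = method === :single   ? min(dik, djk) :
                   method === :complete ? max(dik, djk) :
                   (sizes[bi] * dik + sizes[bj] * djk) / (sizes[bi] + sizes[bj])  # :average
            D[bi, k] = D[k, bi] = dnew
        end
        sizes[bi] += sizes[bj]
        active[bj] = false
    end
    return heights
end

d = [0.0 1.0 3.0; 1.0 0.0 2.0; 3.0 2.0 0.0]   # three points at 0, 1 and 3 on a line
naive_merge_heights(d, :single)                # merge heights for single linkage
```

On these three points all three methods first merge the two closest points at height 1; the second merge happens at 2 (:single), 3 (:complete) or 2.5 (:average).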

The function

linkage!(d::Array{T,2}, method::Symbol) where {T<:Real}

is a memory-saving variant that lets fastcluster overwrite parts of d instead of allocating additional memory for the computation, so the contents of d should not be relied upon after the call.
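A back-of-the-envelope calculation shows why avoiding extra allocations matters for large inputs. The sketch below only does arithmetic (the helper names are hypothetical): an n×n Float64 matrix such as d occupies 8n² bytes, and the condensed upper triangle without the diagonal, which to our understanding is the format fastcluster's C core works on, occupies 8·n(n−1)/2 bytes. It does not measure what Fastcluster.jl actually allocates.

```julia
# Back-of-the-envelope memory sizes for Float64 dissimilarities (8 bytes each).
full_matrix_bytes(n::Integer) = 8 * n^2                # n×n matrix such as d
condensed_bytes(n::Integer)   = 8 * (n * (n - 1) ÷ 2)  # upper triangle, no diagonal

for n in (1_000, 10_000, 50_000)
    println(n, " observations: full ", full_matrix_bytes(n) / 1e9,
            " GB, condensed ", condensed_bytes(n) / 1e9, " GB")
end
```

For n = 10,000 the full matrix takes 0.8 GB and the condensed triangle about 0.4 GB, so even one extra copy is significant.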

Finally, cut the dendrogram at the height that yields a specified number of clusters k with the function

cutree(m::Vector{Int32}, nobs::Int64, k::Int64)

where

m::Vector{Int32} is the m component of the dendrogram returned by linkage().
nobs::Int64 is the number of original observations. By default this is (length(m)>>1)+1, since m stores two entries for each of the nobs - 1 merges.
k::Int64 is the desired number of clusters.

This function behaves very similarly to its counterparts in R and Python.
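Conceptually, asking for k clusters means replaying only the first nobs - k of the nobs - 1 merges. The sketch below illustrates this with a union-find structure. It assumes a hypothetical, simplified merge format (one observation index from each of the two clusters joined at each step) rather than the actual encoding of m, so cut_into_k is not a drop-in replacement for cutree; the helper nobs_from_flat merely restates the (length(m)>>1)+1 formula.

```julia
# Sketch of "cutting" an agglomerative merge sequence into k clusters.
# HYPOTHETICAL input format: merges[s] = (a, b) joins the cluster containing
# observation a with the cluster containing observation b at step s.
# This is not the encoding used by Fastcluster's m.
function cut_into_k(merges::AbstractVector{<:Tuple{Integer,Integer}}, nobs::Integer, k::Integer)
    1 <= k <= nobs || throw(ArgumentError("k must satisfy 1 <= k <= nobs"))
    parent_of = collect(1:nobs)                    # union-find forest
    # find the root of x, compressing the path along the way
    root(x) = parent_of[x] == x ? x : (parent_of[x] = root(parent_of[x]))
    for (a, b) in merges[1:nobs-k]                 # replay only the first nobs - k merges
        parent_of[root(a)] = root(b)
    end
    labels = zeros(Int, nobs)
    ids = Dict{Int,Int}()                          # root -> label, in order of first appearance
    for i in 1:nobs
        labels[i] = get!(ids, root(i), length(ids) + 1)
    end
    return labels
end

# The formula from the text: two entries per merge means 2(nobs - 1) entries in total.
nobs_from_flat(m::AbstractVector) = (length(m) >> 1) + 1

merges = [(1, 2), (3, 4), (1, 3)]
cut_into_k(merges, nobs_from_flat(zeros(Int32, 2 * length(merges))), 2)
```

Here four observations are merged as (1,2), then (3,4), then (1,3); cutting into two clusters replays only the first two merges and returns [1, 1, 2, 2].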

Example

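using RDatasets, Distances, Fastcluster

df = dataset("datasets", "iris")

# one row per flower (observation), columns SepalWidth and SepalLength
points = Matrix{Float64}(df[:, [:SepalWidth, :SepalLength]])
d = pairwise(Euclidean(), points, dims=1)    # 150×150 dissimilarity matrix
m, h = linkage(d, :single)
cut = cutree(m, (length(m)>>1)+1, 3)          # cluster assignment for each observation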
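In the example, dims=1 tells pairwise that each row of points is one observation, so d is 150×150. The following plain-Julia sketch computes the same kind of Euclidean matrix without Distances.jl; euclidean_pairwise_rows is a hypothetical helper, useful mainly for seeing exactly what linkage receives.

```julia
# Euclidean distances between the rows of `points` (rows = observations),
# i.e. what pairwise(Euclidean(), points, dims=1) computes.
function euclidean_pairwise_rows(points::AbstractMatrix{<:Real})
    n = size(points, 1)
    d = zeros(Float64, n, n)              # diagonal stays 0
    for j in 1:n, i in j+1:n
        # fill both triangles so the result is symmetric
        d[i, j] = d[j, i] = sqrt(sum(abs2, @view(points[i, :]) .- @view(points[j, :])))
    end
    return d
end

points = [0.0 0.0; 3.0 4.0; 6.0 8.0]     # three observations in 2-D
d = euclidean_pairwise_rows(points)        # 3×3 matrix; d[1, 2] == 5.0, d[1, 3] == 10.0
```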

Important Caveat for Ward Linkage

NOTE: For the methods :ward, :centroid and :median, the function assumes that the dissimilarities are squared Euclidean distances (e.g. as computed by SqEuclidean() in Distances.jl). This differs from the R interface of fastcluster, which, for the ward.D2 method, squares the distances that are passed to the hclust function. (The Python interface squares the distances passed to the linkage function for all three methods, :ward, :centroid and :median.) We chose this approach to save memory.

Hence, the following two snippets produce the same output:

using RDatasets, Distances, Fastcluster
df = dataset("datasets", "iris")
points = Matrix{Float64}(df[:, [:SepalWidth, :SepalLength]])
d = pairwise(SqEuclidean(), points, dims=1)   # squared Euclidean distances, as :ward expects
m, h = linkage(d, :ward)
cut = Fastcluster.cutree(m, (length(m)>>1)+1, 3)

using RDatasets, Distances
using RCall

df = dataset("datasets", "iris")
points = Matrix{Float64}(df[:, [:SepalWidth, :SepalLength]])
d = pairwise(Euclidean(), points, dims=1)     # plain (unsquared) Euclidean distances
@rput d                                       # copy d into the R session
R"library('fastcluster')"
R"clusters <- hclust(as.dist(d), method = 'ward.D2')"   # ward.D2 squares d itself
R"clusterCut <- cutree(clusters, k = 3)"
@rget clusterCut                              # copy the cluster labels back to Julia
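Since the two snippets differ only in whether d holds plain or squared Euclidean distances, a Euclidean matrix you already have can be squared elementwise (d .^ 2) before calling linkage with :ward, :centroid or :median. The sketch below uses a hypothetical helper, sqeuclidean_pairwise_rows, to compute squared Euclidean distances between rows in plain Julia (the quantity pairwise(SqEuclidean(), points, dims=1) returns) and checks that squaring the Euclidean matrix gives the same input.

```julia
# Squared Euclidean dissimilarities between the rows of `points`
# (the same quantity pairwise(SqEuclidean(), points, dims=1) returns).
function sqeuclidean_pairwise_rows(points::AbstractMatrix{<:Real})
    n = size(points, 1)
    d = zeros(Float64, n, n)
    for j in 1:n, i in j+1:n
        d[i, j] = d[j, i] = sum(abs2, @view(points[i, :]) .- @view(points[j, :]))
    end
    return d
end

points = [0.0 0.0; 3.0 4.0; 1.0 1.0]
dsq  = sqeuclidean_pairwise_rows(points)   # input suitable for :ward, :centroid, :median
deuc = sqrt.(dsq)                          # plain Euclidean distances, as passed to R's ward.D2
println(deuc .^ 2 ≈ dsq)                   # prints true: squaring recovers the :ward input
```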
