In [None]:
%matplotlib inline

# Load Data

The dataset is a small, representative SPED (scanning precession electron diffraction) dataset containing only a few distinct components.

In [None]:
import hyperspy.api as hs
data = hs.load('./hyperspy-demo-data.hspy')
# Work in double precision for the decomposition and clustering steps
data.change_dtype('float64')

# Decomposition

Decompose the data using PCA. scikit-learn's PCA subtracts the mean before decomposing, so the first component does not simply reproduce the mean pattern, which improves clustering.

Then plot the loadings to check whether clusters actually exist in the data. Overly complex datasets will not cluster well.
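The effect of centering can be checked on toy data. The NumPy sketch below uses synthetic patterns (not the demo dataset) that mix two non-negative prototype patterns, and compares the first SVD component with and without subtracting the mean first; subtracting the mean is what PCA does. Without centering, the first component points along the mean pattern; with centering, it captures the variation between patterns instead.

```python
import numpy as np

# Toy "diffraction patterns" (rows = probe positions, columns = pixels):
# each row is a linear mix of two non-negative prototype patterns.
p1 = np.array([2.0, 0.0, 1.0, 1.0])
p2 = np.array([0.0, 2.0, 1.0, 1.0])
t = np.linspace(0.0, 1.0, 101)[:, None]  # mixing fraction per probe position
X = t * p1 + (1.0 - t) * p2

mean_pattern = X.mean(axis=0)


def abs_cosine(a, b):
    """|cos| between two vectors; SVD vectors have an arbitrary sign."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Uncentered SVD: the first component reproduces the mean pattern.
_, _, vt_raw = np.linalg.svd(X, full_matrices=False)
# Centered SVD (what PCA does): subtract the mean first.
_, _, vt_pca = np.linalg.svd(X - mean_pattern, full_matrices=False)

sim_raw = abs_cosine(vt_raw[0], mean_pattern)
sim_pca = abs_cosine(vt_pca[0], mean_pattern)
print(f"first component vs mean: uncentered {sim_raw:.3f}, centered {sim_pca:.3f}")
```

The similarity should be close to 1 for the uncentered case and close to 0 for the centered one.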

In [None]:
data.decomposition(algorithm='sklearn_pca', output_dimension=2)

In [None]:
loadings = data.learning_results.loadings

In [None]:
import matplotlib.pyplot as plt
# One point per probe position, placed by its first two PCA loadings
plt.scatter(loadings[:, 0], loadings[:, 1], s=1, c='k')
plt.axis('equal')

# Clustering

Clustering uses skcmeans as the engine. Its `hyperspy` submodule lets you call it in much the same way as `data.decomposition` above.

`alg` stores the algorithm object used for clustering. `skcmeans` algorithms come with limited plotting utilities.

## Probabilistic algorithm

The probabilistic algorithm is the default. It produces approximately spherical clusters.
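For intuition, here is a minimal NumPy sketch of the classic probabilistic fuzzy c-means updates, assuming skcmeans' default algorithm follows that standard formulation. It is not the skcmeans implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2` and the farthest-point initialization are illustrative choices. "Probabilistic" refers to the constraint that each sample's memberships sum to 1, and the plain Euclidean distance is what makes the clusters roughly spherical.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, eps=1e-10):
    """Minimal probabilistic fuzzy c-means on the rows of X (illustrative only).

    Returns (centers, u): centers has shape (n_clusters, n_features) and
    u has shape (n_clusters, n_samples), with each column summing to 1.
    """
    # Farthest-point initialization: start from the first sample, then keep
    # adding the sample farthest from every center chosen so far.
    centers = [X[0]]
    for _ in range(n_clusters - 1):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Euclidean distance from every center to every sample, shape (c, n);
        # eps avoids division by zero when a center sits on a sample.
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + eps
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
        # Center update: mean of the samples weighted by u^m.
        w = u ** m
        centers = (w @ X) / w.sum(axis=1, keepdims=True)
    return centers, u


# Usage: three well-separated 2-D blobs.
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(loc, 0.1, size=(50, 2))
                   for loc in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])
centers, u = fuzzy_c_means(blobs, n_clusters=3)
print(np.round(centers, 2))
```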

In [None]:
from skcmeans.hyperspy import cluster, get_cluster_centers, get_cluster_memberships
# Cluster the PCA loadings (rather than the raw patterns) into 3 groups
alg = cluster(data, n_clusters=3, use_decomposition_results=True)

In [None]:
alg.plot(loadings)

In [None]:
# Get the cluster centers as a HyperSpy signal. Because the clustering used
# PCA loadings, which are computed on mean-subtracted data, the mean pattern
# must be added back to the centers.
centers = get_cluster_centers(data) + data.mean()

hs.plot.plot_images(centers, tight_layout=True)
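The arithmetic behind adding the mean back can be sketched with NumPy on toy data: a center found in loading space only maps back to a full pattern once it is multiplied by the factors and the mean that PCA removed is added again. The array layout here (factors stored as rows) is an illustrative choice and may differ from HyperSpy's internal conventions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 60 "patterns" of 5 pixels drawn from two groups.
group_means = np.array([[5.0, 1.0, 1.0, 0.0, 2.0],
                        [1.0, 5.0, 1.0, 2.0, 0.0]])
labels = np.repeat([0, 1], 30)
X = group_means[labels] + rng.normal(0.0, 0.05, size=(60, 5))

# PCA by hand: subtract the mean, then keep 2 SVD components.
mean_pattern = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean_pattern, full_matrices=False)
loadings = U[:, :2] * S[:2]  # one row of scores per pattern
factors = Vt[:2]             # one row per component pattern

# Stand-in for a cluster center in loading space: mean loading of group 0.
center_loading = loadings[labels == 0].mean(axis=0)

target = X[labels == 0].mean(axis=0)     # true average pattern of group 0
without_mean = center_loading @ factors  # mean not added back
with_mean = center_loading @ factors + mean_pattern

err_without = np.abs(without_mean - target).max()
err_with = np.abs(with_mean - target).max()
print(f"max error without mean: {err_without:.3f}, with mean: {err_with:.3f}")
```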

In [None]:
memberships = get_cluster_memberships(data)  # Get cluster memberships as a hyperspy signal
hs.plot.plot_images(memberships, tight_layout=True)
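Because the memberships are fuzzy, a common next step is to collapse them into a hard phase map by taking, at each probe position, the cluster with the highest membership. The sketch below uses a small hypothetical membership array laid out as (cluster, y, x). With the HyperSpy signal you would work on `memberships.data`, but that axis order is an assumption here and should be checked against `memberships.axes_manager` first.

```python
import numpy as np

# Hypothetical fuzzy memberships for 3 clusters on a 2 x 3 scan grid,
# laid out as (cluster, y, x); each pixel's memberships sum to 1.
memberships = np.array([
    [[0.90, 0.80, 0.10], [0.20, 0.05, 0.30]],
    [[0.05, 0.15, 0.85], [0.10, 0.90, 0.30]],
    [[0.05, 0.05, 0.05], [0.70, 0.05, 0.40]],
])
assert np.allclose(memberships.sum(axis=0), 1.0)

# Hard assignment: the most likely cluster at each probe position.
phase_map = memberships.argmax(axis=0)
# Confidence: the winning membership value; low values flag mixed regions.
confidence = memberships.max(axis=0)
print(phase_map)
print(confidence)
```

Positions where the winning membership is low (0.4 at the bottom-right pixel here) are candidates for mixed or boundary regions rather than a single phase.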

## Gustafson-Kessel variation

Setting `gustafson_kessel=True` allows the clusters to adopt elliptical shapes. This usually, though not always, improves results, but it can take much longer on large datasets.
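Gustafson-Kessel clustering replaces the Euclidean distance with a per-cluster Mahalanobis-type distance `d^2 = (x - c)^T A (x - c)`, where `A = (rho * det F)^(1/n) * F^-1` is built from the cluster's fuzzy covariance `F` and its volume is fixed at `det A = rho`. The NumPy sketch below (helper names are hypothetical, not part of skcmeans) shows that for an elongated cluster, a point along the long axis is much "closer" than one at the same Euclidean distance across it. Computing and inverting `F` for every cluster at each iteration is also where the extra cost comes from.

```python
import numpy as np


def gk_norm_matrix(points, center, weights, rho=1.0):
    """Hypothetical helper: Gustafson-Kessel norm-inducing matrix of one cluster.

    weights are the cluster memberships raised to the fuzzifier (u**m).
    """
    diff = points - center
    # Fuzzy covariance matrix F of the cluster.
    F = (weights[:, None] * diff).T @ diff / weights.sum()
    n = points.shape[1]
    # Scale F^-1 so that det(A) == rho (fixed cluster volume).
    return (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)


def gk_distance_sq(points, center, A):
    """Squared distance (x - c)^T A (x - c) for each row x of points."""
    diff = points - center
    return np.einsum("ij,jk,ik->i", diff, A, diff)


# An elongated cluster: spread along x, narrow along y, all memberships 1.
cluster = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 0.1], [0.0, -0.1]])
center = cluster.mean(axis=0)
A = gk_norm_matrix(cluster, center, weights=np.ones(len(cluster)))

# Two query points at the same Euclidean distance (0.5) from the center.
queries = np.array([[0.5, 0.0], [0.0, 0.5]])
d2 = gk_distance_sq(queries, center, A)
print(d2)
```

With the Euclidean distance both queries would score 0.25; here the point across the narrow axis ends up 100 times farther than the point along the long axis.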

In [None]:
alg = cluster(data, n_clusters=3, use_decomposition_results=True, gustafson_kessel=True)

In [None]:
ax = plt.figure().add_subplot(111)
alg.plot(loadings, ax=ax)

In [None]:
centers = get_cluster_centers(data)
hs.plot.plot_images(centers + data.mean(), tight_layout=True, colorbar=None)

In [None]:
memberships = get_cluster_memberships(data)
hs.plot.plot_images(memberships, tight_layout=True, colorbar='single')

## Without decomposition

For some signals, clustering on the decomposition loadings is not appropriate. This is unusual for SPED data, but clustering directly on the raw data also works:
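One way to see why clustering on the loadings is the usual choice: without decomposition, each diffraction pattern becomes a sample with one feature per detector pixel. The NumPy sketch below uses a hypothetical 20 × 30 scan of 64 × 64 patterns (not the demo data's shape) and compares the feature counts, including the size of the per-cluster covariance matrix the Gustafson-Kessel variant would need. That skcmeans flattens the data this way internally is an assumption.

```python
import numpy as np

# Hypothetical SPED-like array: 20 x 30 probe positions, 64 x 64 detector pixels.
nav_shape, sig_shape = (20, 30), (64, 64)
data = np.zeros(nav_shape + sig_shape, dtype=np.float32)

# Clustering the raw data treats every diffraction pattern as one sample,
# so the feature matrix has one column per detector pixel.
raw_features = data.reshape(int(np.prod(nav_shape)), int(np.prod(sig_shape)))

# Clustering on 2 PCA loadings uses only 2 features per sample instead.
loading_features = np.zeros((int(np.prod(nav_shape)), 2))

# Entries in one n_features x n_features covariance matrix (Gustafson-Kessel).
cov_entries_raw = raw_features.shape[1] ** 2
cov_entries_loadings = loading_features.shape[1] ** 2
print(raw_features.shape, loading_features.shape)
print(cov_entries_raw, cov_entries_loadings)
```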

In [None]:
alg = cluster(data, n_clusters=3, use_decomposition_results=False)

In [None]:
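centers = get_cluster_centers(data)
# No PCA was used here, so (assuming the centers are returned in data space)
# the mean should not be added back.
hs.plot.plot_images(centers, tight_layout=True, colorbar=None)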

In [None]:
memberships = get_cluster_memberships(data)
hs.plot.plot_images(memberships, tight_layout=True, colorbar='single')