Gregory Sharov edited this page Jul 26, 2016 · 2 revisions


The following three Spider protocols (Ward, Diday and K-means) cluster and classify input particles, assuming that correspondence analysis or principal component analysis (CA/PCA) has already been run. All three protocols need an input particle set, the output file from the previous CA/PCA run, and the number of eigenfactors to use. K-means additionally requires the desired number of classes.

spider - classify-diday

Performs automatic clustering using Diday’s method and Hierarchical Ascendant Classification (HAC) using Ward’s criterion on factors produced by CA or PCA. Uses the Spider CL CLA program.

spider - classify-kmeans

Performs automatic K-Means clustering and classification on factors produced by CA or PCA. Uses the Spider CL KM program. K-Means is a classification method that divides the data into a user-specified number of clusters. Two random image "seeds" are chosen, and their centers of gravity are computed. A partition is drawn down the middle between the centers, the new centers of gravity are computed, and the process is repeated a given number of times. The final result is VERY dependent on which image seeds are chosen first.
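The iteration described above can be sketched in a few lines of Python. This is a generic k-means loop under stated assumptions, not the code behind CL KM: the function name `kmeans`, its parameters, and the toy factor coordinates are all hypothetical, and the seeds are simply drawn at random from the input points.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Generic k-means on factor coordinates (illustrative, not CL KM itself)."""
    rng = random.Random(seed)
    # Pick k distinct input points as the initial "seeds".
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the center of gravity of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical factor coordinates: one tuple of eigenfactor values per particle.
factors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels, centers = kmeans(factors, k=2, seed=1)
print(labels, centers)
```

Because the only source of randomness is the choice of seeds, rerunning with a different `seed` value is a simple way to check how stable a partition is on your own data.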

spider - classify-ward

Performs hierarchical ascendant classification (HAC) using Ward’s criterion on factors produced by CA or PCA. Uses the Spider CL HC program.

Analyzing results

When the protocol is finished you may click on the Analyze Results button. The main results are:

  • dendrogram (fig. 1). The class relationships are represented as a dendrogram (tree structure) for both the Ward and Diday protocols. Every vertical line at the bottom of the dendrogram (with no cut-off) represents an input particle. Each vertical line higher up is an average of the images, or vertical lines, below it. The threshold is a scaled value from 0 to 100 (default is 0.5) that tells the viewer how far "up" the dendrogram to look. A threshold set at the bottom results in as many classes as there are input images. A median threshold value of 50 results in fewer classes, and a top-level threshold gives a single class containing all the inputs.

  • class averages (fig. 2). The K-means protocol simply displays the class averages it produced. The Ward and Diday protocols can also produce class averages, as an alternative to the dendrogram, for evaluating the clustering results. The key parameter, maximum level, defines how many cluster levels are displayed, with a class average calculated for each cluster. You can picture this as looking at your dendrogram from the top, with all clusters below the maximum level merged.
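To make the relation between the dendrogram threshold and the number of classes concrete, here is a minimal Python sketch of hierarchical clustering with Ward's criterion followed by a threshold cut. It is not the Spider or Scipion implementation. The function names are hypothetical, and it assumes that the 0 to 100 threshold is a linear rescaling of the merge heights against the topmost merge, which may differ from how the viewer actually scales it.

```python
def ward_merges(points):
    """Agglomerative clustering with Ward's criterion; returns merge heights."""
    # Each cluster is (size, centroid); start with one cluster per point.
    clusters = [(1, tuple(p)) for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ni, ci), (nj, cj) = clusters[i], clusters[j]
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Ward cost: increase in within-cluster sum of squares.
                cost = ni * nj / (ni + nj) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ni, ci), (nj, cj) = clusters[i], clusters[j]
        centroid = tuple((ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((ni + nj, centroid))
        heights.append(cost)
    return heights


def classes_at_threshold(heights, n_points, threshold):
    """Number of classes when the tree is cut at a threshold scaled to 0-100."""
    top = max(heights) or 1.0  # assumed scaling: topmost merge maps to 100
    # Every merge at or below the cut joins two classes into one.
    merged = sum(1 for h in heights if 100.0 * h / top <= threshold)
    return n_points - merged


# Hypothetical factor coordinates forming two tight pairs.
factors = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
heights = ward_merges(factors)
for t in (0, 50, 100):
    print(t, classes_at_threshold(heights, len(factors), t))
```

In this sketch a threshold at the bottom keeps every input as its own class and a threshold at the top leaves a single class, which mirrors the behaviour described for the dendrogram viewer.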

Figure 1. Dendrogram of clusters, showing each class number and the number of particles it contains.

Figure 2. Class averages dendrogram.

