SpiderProtCAPCA

Jose Miguel de la Rosa Trevin edited this page May 21, 2016 · 2 revisions



spider - capca

This protocol runs Correspondence Analysis (CA) or Principal Component Analysis (PCA), based on the CA S operation of Spider.

The main goal of this method is dimensionality reduction, i.e. expressing each MxN image with only a few terms called eigenvectors. In this way the whole data set, in which every image is a point in an MxN-dimensional space, can be described by a few eigenvectors. CA is usually followed by classification.
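To make the idea concrete, here is a minimal sketch in Python with NumPy. It assumes the particles are already loaded as a stack of MxN arrays, and it performs plain PCA through a singular value decomposition, not Spider's CA S implementation. The function names `pca_reduce` and `reconstruct` are hypothetical. It shows how each image is reduced to a few coordinates and then approximated as the average image plus a linear combination of eigenimages.

```python
import numpy as np

def pca_reduce(images: np.ndarray, n_factors: int):
    """Hypothetical helper: plain PCA of an image stack via SVD.

    images: array of shape (n_images, M, N).
    Returns (average, eigenimages, coords) with shapes
    (M, N), (n_factors, M, N) and (n_images, n_factors).
    """
    n_images, m, n = images.shape
    data = images.reshape(n_images, m * n).astype(float)  # one row per image
    average = data.mean(axis=0)
    centered = data - average
    # Rows of vt are orthonormal eigenvectors of the covariance matrix,
    # ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigvecs = vt[:n_factors]                # keep only the first few factors
    coords = centered @ eigvecs.T           # coordinates of each image on each factor
    return average.reshape(m, n), eigvecs.reshape(n_factors, m, n), coords

def reconstruct(average, eigenimages, coords):
    """Approximate each image as average + linear combination of eigenimages."""
    k = eigenimages.shape[0]
    flat = average.ravel() + coords @ eigenimages.reshape(k, -1)
    return flat.reshape((coords.shape[0],) + average.shape)
```

With synthetic data built from an average plus two patterns, `reconstruct(*pca_reduce(images, 2))` reproduces the input up to rounding; with real particles the residual shrinks as more factors are kept, and each image is summarized by `n_factors` numbers instead of MxN pixels.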

Input parameters

The protocol form (fig. 1) takes the following input parameters:

  • Input particles: should preferably be aligned and, optionally, low-pass filtered.

  • Analysis type: CA, PCA or iterative PCA (IPCA). CA is the preferred method for finding inter-image variations. PCA measures the distance between data vectors with the Euclidean metric, while CA uses the chi-squared metric. CA is superior here because it ignores differences in exposure between images, which eliminates the need to rescale the image densities. For very large problems (a covariance matrix with dimensions on the order of thousands), the methods used for CA and PCA are slow and inaccurate, and the computation may fail due to limited numerical accuracy or enter an endless loop. In these cases use 'Iterative PCA analysis' instead. The same strategy may help if you get the error message * ERROR: DIAGONALIZATION FAILURE when using CORAN.

  • Additive constant: usually set to 0, so that the pixel values are rescaled automatically if necessary.

  • Number of factors (eigenvectors): depends on the size of the data set and on what you are trying to analyse, but about 25 is normally enough.

  • Mask (optional): a mask covering the area of interest you want to analyse.
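The claim that CA ignores exposure differences can be checked numerically. The sketch below computes the chi-squared distance of CA (each image divided by its total, squared differences weighted by the inverse of each pixel's share of the grand total) next to the Euclidean distance of PCA. It assumes flattened, non-negative images, and the helper names are hypothetical; it illustrates the metric only, not how Spider computes it internally. CA works on non-negative data, which is presumably what the additive constant above is for.

```python
import numpy as np

def chi2_distances(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise chi-squared distances between rows (images).

    data: (n_images, n_pixels) array of non-negative values; every pixel
    column is assumed to have a non-zero total.
    """
    if np.any(data < 0):
        raise ValueError("CA needs non-negative values; add a constant first")
    col_mass = data.sum(axis=0) / data.sum()             # share of each pixel in the grand total
    profiles = data / data.sum(axis=1, keepdims=True)    # each image divided by its own total
    diff = profiles[:, None, :] - profiles[None, :, :]   # all pairs of rows
    return np.sqrt((diff ** 2 / col_mass).sum(axis=2))

def euclidean_distances(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise Euclidean distances between rows (images)."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.uniform(1.0, 2.0, size=64)        # a strictly positive "image"
    other = rng.uniform(1.0, 2.0, size=64)      # an unrelated image
    data = np.stack([img, 2.5 * img, other])    # row 1 = row 0 at higher exposure
    print("Euclidean 0-1:", euclidean_distances(data)[0, 1])  # large
    print("chi-squared 0-1:", chi2_distances(data)[0, 1])     # zero up to rounding
    print("chi-squared 0-2:", chi2_distances(data)[0, 2])     # non-zero
```

Because each image is divided by its own total before comparison, a uniformly brighter copy has exactly the same profile as the original, so only the pattern of densities, not the overall exposure, contributes to the distance.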

Figure 1. GUI input form of the spider - capca protocol


Analyzing results

When the protocol has finished, click the Analyze Results button. Among the numerous output files, the main results are:

  • Eigenimages (fig. 2, right): qualitatively, eigenimages are the systematic variations of the input images; think of each image as the average image plus a linear combination of the eigenimages. The best way to decide which eigenvectors are useful for further analysis and which come from noise is to view a histogram showing the percentage of the total variance (eigenvalue) accounted for by each factor.

  • Eigenvalue histogram (fig. 2, left): shows how much of the variation is accounted for by each factor.

  • Factor maps: 2D plots of the coordinates of each image on a pair of factors (e.g., factor 1 vs. factor 2). Once you know which eigenvectors are meaningful and which come from noise, you can display factor maps of selected pairs of factors to visualize clustering, if any.
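As a rough analogue of the eigenvalue histogram and factor maps, the following sketch computes the percentage of variance carried by each factor and the coordinates of every image on a chosen pair of factors. It uses plain PCA in NumPy with hypothetical helper names; it does not read the files the protocol writes, and Spider's own percentages (especially for CA) are computed by its own code.

```python
import numpy as np

def variance_percentages(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: % of total variance per factor, in decreasing order.

    data: (n_images, n_pixels) array, one flattened image per row.
    """
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # singular values, descending
    eigenvalues = s ** 2 / (data.shape[0] - 1)      # covariance eigenvalues
    return 100.0 * eigenvalues / eigenvalues.sum()

def factor_map(data: np.ndarray, fx: int, fy: int) -> np.ndarray:
    """Hypothetical helper: coordinates of each image on factors fx and fy (1-based)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[[fx - 1, fy - 1]].T        # shape (n_images, 2)

def text_histogram(percent: np.ndarray, n_show: int = 10) -> str:
    """Crude text version of the eigenvalue histogram (one '#' per 2 %)."""
    lines = []
    for i, p in enumerate(percent[:n_show], start=1):
        lines.append(f"factor {i:2d} {p:6.2f}% " + "#" * int(round(p / 2)))
    return "\n".join(lines)
```

Typical use is to print `text_histogram(variance_percentages(data))`, look for where the bars flatten into a noise floor, and then scatter-plot the two columns returned by `factor_map(data, 1, 2)` (or another pair of meaningful factors) to look for clusters.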

More details about analyzing results can be found here.

Figure 2. Displaying CA/PCA results


