spider - capca
This protocol runs Correspondence Analysis (CA) or Principal Component Analysis (PCA), based on the CA S program from SPIDER.
The main goal of this method is dimensionality reduction, i.e., expressing each MxN image using only a few terms called eigenvectors. In this way the whole data set can be represented by a few eigenvectors in MxN-dimensional space. Usually, CA is followed by classification.
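The idea can be sketched with plain numpy (this is an illustrative example, not the SPIDER/Scipion code; the image sizes and counts are made up): each centered image is approximated as the data-set average plus a linear combination of a few eigenvectors.

```python
import numpy as np

# Hypothetical stack of 100 images of 64x64 pixels, flattened to row vectors.
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 64 * 64))

# Center the data; the right singular vectors of the centered matrix are
# the eigenvectors (eigenimages) of the covariance matrix.
mean = images.mean(axis=0)
centered = images - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)

# Keep only the first k eigenvectors.
k = 25
eigenimages = vt[:k]                 # shape (k, 64*64)
coords = centered @ eigenimages.T    # each image's k coordinates (factors)

# Each image is approximated as the mean plus a linear combination
# of the k eigenimages; with all eigenvectors kept, this is exact.
approx = mean + coords @ eigenimages
```

With k much smaller than the number of pixels, `coords` is a compact description of each image, which is what the subsequent classification step works on.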
The following input parameters have to be provided (fig. 1):
Input particles: should preferably be aligned and, optionally, low-pass filtered.
Analysis type: CA, PCA, or iterative PCA (IPCA). CA is the preferred method for finding inter-image variations. PCA measures the distance between data vectors with the Euclidean distance, whereas CA uses the chi-squared distance. CA has the advantage of ignoring differences in exposure between images, which eliminates the need to rescale the image densities. For very large problems (covariance matrix sizes on the order of thousands), the methods used for CA and PCA are slow and inaccurate; the computation may fail on numerical accuracy or enter an endless loop. In these cases use 'Iterative PCA analysis' instead. The same strategy may be useful if you get the error message *ERROR: DIAGONALIZATION FAILURE when using CORAN.
Additive constant: usually set to 0, in which case the pixel values are rescaled automatically if necessary.
Number of factors (eigenvectors): depends on the size of the data set and what you are trying to analyse, but about 25 is normally enough.
Optionally you can provide a mask that covers the area of interest you want to analyse.
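The difference between the Euclidean distance used by PCA and the chi-squared distance used by CA can be illustrated with a small numpy sketch (hypothetical data, not the SPIDER implementation): the chi-squared distance compares normalized density profiles weighted by column masses, so a global exposure change leaves it unaffected.

```python
import numpy as np

# Two hypothetical images (flattened, positive densities); the second is
# the first with doubled exposure (all densities scaled by 2).
rng = np.random.default_rng(1)
a = rng.uniform(1.0, 2.0, size=256)
b = 2.0 * a

# Euclidean distance (used by PCA) changes with exposure...
d_euclid = np.linalg.norm(a - b)

# ...while the chi-squared distance (used by CA) compares row profiles,
# i.e. densities normalized by their row totals, weighted by column masses.
data = np.vstack([a, b])
profiles = data / data.sum(axis=1, keepdims=True)
col_mass = data.sum(axis=0) / data.sum()
d_chi2 = np.sqrt((((profiles[0] - profiles[1]) ** 2) / col_mass).sum())

# d_euclid is large, but d_chi2 is ~0: CA ignores the exposure difference.
```

This is why CA does not require rescaling the image densities beforehand, while PCA is sensitive to such multiplicative differences.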
When the protocol is finished you may click on the Analyze Results button. Among numerous output files the main results are:
eigenimages (fig. 2, right). Qualitatively, eigenimages are the systematic variations of the input images. Think of expressing each image as the average plus some linear combination of the eigenimages. The best way to determine which eigenvectors are useful for further analysis and which come from noise is to view a histogram showing the percentage of eigenvalue variance accounted for by each factor.
eigenvalue histogram (fig. 2, left) - shows how much variation is accounted for by each factor
factor maps - 2D plots showing the coordinates of each image along a pair of factors (e.g., 1 vs. 2). Once you know which eigenvectors carry meaning and which come from noise, you can display 2D factor maps of selected pairs of factors to visualize clustering (if any).
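The quantities behind these two outputs can be sketched in numpy (again an illustrative example with made-up data, not the protocol's own code): the eigenvalue histogram plots the percentage of variance per factor, and a factor map is simply a scatter of each image's coordinates on two chosen factors.

```python
import numpy as np

# Hypothetical data matrix: 100 images, 4096 pixels each.
rng = np.random.default_rng(2)
data = rng.normal(size=(100, 4096))
centered = data - data.mean(axis=0)

# Squared singular values are the eigenvalues of the covariance matrix;
# their normalized values are what the eigenvalue histogram shows.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
eigenvalues = s ** 2
variance_pct = 100.0 * eigenvalues / eigenvalues.sum()

# Coordinates of each image on the first two factors: a factor map
# (e.g., factor 1 vs. factor 2) is a scatter plot of these two columns.
coords = centered @ vt[:2].T   # shape (100, 2)
```

A sharp drop in `variance_pct` after the first few factors suggests those factors carry signal and the rest are noise; clusters in the `coords` scatter suggest distinct image classes.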
More details about analyzing results can be found here.