Multiple Structure Alignment Datastructures #277

Closed
wants to merge 56 commits into from

@sbliven sbliven commented Jun 5, 2015

Introduces new data structures for structure alignments, created along with @lafita. The data structure can represent standard pairwise alignments, but also multiple alignments, flexible alignments, and non-topological alignments (#126).

The structure consists of a hierarchy of objects:

  1. MultipleAlignmentEnsemble: A collection of alignments over a set of structures.
  2. MultipleAlignment: A single alignment.
  3. BlockSet: A portion of the alignment with a single rigid superposition.
  4. Block: A portion of the alignment with preserved sequence order. Stores the actual aligned residues for each position (which may be gaps).
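
To make the nesting concrete, here is a minimal sketch of the four levels. The class names ending in `Sketch`, the `rows` field, and the structure names are illustrative assumptions, not the classes added in this pull request; the sketch only shows how an ensemble contains alignments, which contain block sets, which contain blocks whose positions may be gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the four levels. These are NOT the BioJava classes.
class BlockSketch {
    // rows.get(s).get(c): residue index of structure s at column c, or null for a gap
    final List<List<Integer>> rows = new ArrayList<>();

    int length() {
        return rows.isEmpty() ? 0 : rows.get(0).size();
    }
}

class BlockSetSketch {
    // Blocks that share one rigid superposition
    final List<BlockSketch> blocks = new ArrayList<>();

    int length() {
        int total = 0;
        for (BlockSketch b : blocks) total += b.length();
        return total;
    }
}

class AlignmentSketch {
    final List<BlockSetSketch> blockSets = new ArrayList<>();

    int length() {
        int total = 0;
        for (BlockSetSketch bs : blockSets) total += bs.length();
        return total;
    }
}

class EnsembleSketch {
    // Structure names identify the aligned structures (hypothetical values below)
    final List<String> structureNames = new ArrayList<>();
    final List<AlignmentSketch> alignments = new ArrayList<>();
}

class HierarchyDemo {
    static EnsembleSketch build() {
        BlockSketch first = new BlockSketch();
        first.rows.add(Arrays.asList(0, 1, 2));
        first.rows.add(Arrays.asList(5, null, 6)); // null marks a gap

        BlockSketch second = new BlockSketch();
        second.rows.add(Arrays.asList(10, 11));
        second.rows.add(Arrays.asList(20, 21));

        BlockSetSketch rigidPart = new BlockSetSketch();
        rigidPart.blocks.add(first);
        rigidPart.blocks.add(second);

        AlignmentSketch alignment = new AlignmentSketch();
        alignment.blockSets.add(rigidPart);

        EnsembleSketch ensemble = new EnsembleSketch();
        ensemble.structureNames.add("structureA");
        ensemble.structureNames.add("structureB");
        ensemble.alignments.add(alignment);
        return ensemble;
    }

    public static void main(String[] args) {
        EnsembleSketch ensemble = build();
        System.out.println("columns: " + ensemble.alignments.get(0).length());
    }
}
```

A standard pairwise alignment would be the special case with two structure names, one alignment, one block set, and one block.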

Some documentation still needs to be written and will be added to the cookbook.

A few other design decisions bear mention:

  • The ensemble stores references to the aligned atom arrays.
  • In the normal case, atom arrays can be regenerated from the structure names list, which is currently of type String but will change to StructureIdentifier once #81 (Make loading of structures more consistent) is completed.
  • All levels of the hierarchy can serve as a cache for various scores (e.g. RMSD, TM-score), but such scores are not standardized and should be recalculated when needed by client code.
  • The superposition matrices are now stored using vecmath Matrix4d objects. To support flexible alignments, the definitive matrices are stored in each BlockSet. However, a default matrix can be stored in MultipleAlignment to save memory for rigid alignments.
  • AFPChain can be converted directly to MultipleAlignmentEnsemble.
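
The score cache and the default-matrix idea from the list above could look roughly like the following. This is a sketch under assumptions: the class and method names are made up for illustration, a `double[16]` row-major array stands in for vecmath's `Matrix4d` so that the example has no dependencies, and the "fall back to the parent's default when a block set has no matrices of its own" lookup is one plausible reading of the description, not a confirmed implementation detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class: any level of the hierarchy may cache named scores.
class ScoreCacheSketch {
    private final Map<String, Double> scores = new HashMap<>();

    void putScore(String name, double value) { scores.put(name, value); }

    // Returns null when the score is not cached, which tells the caller to recalculate it.
    Double getScore(String name) { return scores.get(name); }

    void clear() { scores.clear(); }
}

class RigidAlignmentSketch extends ScoreCacheSketch {
    // Default transformations: one 4x4 row-major matrix per structure.
    // double[16] is a stand-in for vecmath's Matrix4d.
    List<double[]> defaultTransformations;
    final List<FlexibleBlockSetSketch> blockSets = new ArrayList<>();
}

class FlexibleBlockSetSketch extends ScoreCacheSketch {
    private final RigidAlignmentSketch parent;
    private List<double[]> ownTransformations; // null means "use the parent's default"

    FlexibleBlockSetSketch(RigidAlignmentSketch parent) {
        this.parent = parent;
        parent.blockSets.add(this);
    }

    void setTransformations(List<double[]> transformations) {
        ownTransformations = transformations;
    }

    // Assumed lookup order: own matrices first, then the alignment-level default.
    List<double[]> getTransformations() {
        return ownTransformations != null ? ownTransformations : parent.defaultTransformations;
    }
}

class TransformDemo {
    static double[] identity() {
        return new double[] {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
    }

    static double[] translation(double x, double y, double z) {
        return new double[] {1, 0, 0, x,  0, 1, 0, y,  0, 0, 1, z,  0, 0, 0, 1};
    }

    public static void main(String[] args) {
        RigidAlignmentSketch alignment = new RigidAlignmentSketch();
        alignment.defaultTransformations = new ArrayList<>();
        alignment.defaultTransformations.add(identity());

        FlexibleBlockSetSketch rigid = new FlexibleBlockSetSketch(alignment);
        FlexibleBlockSetSketch flexible = new FlexibleBlockSetSketch(alignment);
        List<double[]> own = new ArrayList<>();
        own.add(translation(1, 2, 3));
        flexible.setTransformations(own);

        System.out.println(rigid.getTransformations() == alignment.defaultTransformations);
        System.out.println(flexible.getTransformations() == own);
    }
}
```

With this layout a rigid alignment stores its matrices once, while a flexible alignment pays for extra matrices only in the block sets that need them.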

This pull request also bundles concurrent development of:

  • GUI improvements
  • The creation of a Monte Carlo-based optimization strategy for refining structural alignments
  • A new AtomCache.getRepresentativeAtoms() method (that should replace getAtoms() everywhere)

etc. etc. !

This is a fairly major feature addition, so I'll leave this request open for a few days to allow comments.

lafita and others added 30 commits April 20, 2015 18:10
The core data structures for the Multiple Alignment object have been
created: MultipleAlignment, BlockSet, Block, Pose.
The distanceMatrix is renamed to distanceTables to match with the
AFPChain nomenclature. The description of replaceOptAln has also been
changed to be more general.
The pose contains the translation and the rotationMatrix as information
of the 3D transformation of the proteins. A Demo for the display of the
multiple alignment has been created.
In order to generalize the 3D GUI features of the Structure Alignment
and implement a Multiple Alignment GUI for the new MultipleAlignment
object.
The multiple alignments can be visualized through the
MultipleAlignmentJmol class, adapted from the StructureAlignmentJmol.
The coloring of the different blocks and the alignment menus are still
not implemented.
Gaps are described by null values in the Blocks of the
MultipleAlignment. Now the Jmol class accounts for these gaps and does
not color them.
from the Pose class, because it is a static variable that does not
depend on the specific BlockSet. It only stores the intra-residue
distances of every protein.
The wrong line was commented out, so the molecule was not colored.
Adapted the display method in StructureAlignmentDisplay to rotate and
display in Jmol the atoms of a MultipleAlignment.
Minor changes to respond to TODOs
Interfaces for the classes Block, Pose and BlockSet have been created to
generalize and document all the methods needed for a MultipleAlignment
object.
The interfaces have been implemented again and the Jmol display also
works for the new MultipleAlignment DS composition.
Add some methods to calculate internal variables (update), and moved the
cache variables (RMSD, TM-score, similarity, coverage) from the
MultipleAlignment to these two classes.
Another layer in the OO data structure has been added to allow returning
alternative alignments. An ensemble of MSTA is a collection of
MultipleAlignment objects. Another change has been the addition of two
different implementations of Pose, one to determine global
superimpositions and another to determine flexible part
superimpositions.
When an object is created with the constructor and its parent is set,
the parent also gets a link to the object automatically.
The Ensemble can calculate the distance Matrices for every structure in
the updateDistanceMatrix() method. Automatic cross-references added to
the setParent() methods, for consistency.
All pairwise structural comparisons are evaluated to build the
background distance Matrices. Atoms can be rotated from Pose as well.
A new Pose abstract implementation has been created that calculates the
TM-score and RMSD of the alignment. AlignmentJmol has been renamed to
AbstractAlignmentJmol to make clear that it is an abstract class.
A constructor for a new MultipleAlignment can be used from an AFPChain.
It creates an equivalent alignment object, for backwards compatibility.
The clone methods now entirely change the links between the cloned and
the original objects so that no cross-links occur.
An initial implementation of the CEMC algorithm for multiple structure
alignment has been created. Now a seed MultipleAlignment can be created
with a parallel pairwise all-to-all alignment. The MC optimization is
still not implemented. A demo is available under the structure-gui
package.
In the transition to replace AFPChain with the MultipleAlignment class.
A core structure for the CEMC algorithm has also been created.
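
One of the commits above describes gaps as null values in the Blocks, which the Jmol display then skips when coloring. A minimal sketch of that convention follows; `GapSketch` and its method names are hypothetical, and each row is simply a list of residue indices for one structure, with null standing for a gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GapSketch {
    // Residue indices of one structure that are actually aligned (nulls skipped).
    static List<Integer> alignedResidues(List<Integer> row) {
        List<Integer> aligned = new ArrayList<>();
        for (Integer residue : row) {
            if (residue != null) aligned.add(residue);
        }
        return aligned;
    }

    // Number of columns in which no structure has a gap.
    static int gaplessColumns(List<List<Integer>> rows) {
        if (rows.isEmpty()) return 0;
        int count = 0;
        for (int col = 0; col < rows.get(0).size(); col++) {
            boolean full = true;
            for (List<Integer> row : rows) {
                if (row.get(col) == null) {
                    full = false;
                    break;
                }
            }
            if (full) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = new ArrayList<>();
        rows.add(Arrays.asList(0, 1, 2, 3));
        rows.add(Arrays.asList(5, null, 6, 7));
        rows.add(Arrays.asList(null, 9, 10, 11));
        System.out.println(alignedResidues(rows.get(1)));
        System.out.println(gaplessColumns(rows));
    }
}
```

Any code that walks a block, such as a display or a scoring routine, needs the same null check before dereferencing a position.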
lafita and others added 26 commits May 6, 2015 12:24
Only the CA atoms were rotated before. Now the whole structure is
rotated. The Atoms are now a cache variable of the alignment, the real
identifiers are the structureNames. The methods download the structures
from the identifiers if the atoms are not present in the alignment.
The first class was only used in DB search and was very specific. Now it
is general enough to allow any threaded pairwise alignment calculation.
The central structure identification is not the atomArrays, but the
structureNames (from where the arrays can be recovered if they are not
present)
Added more families and examples to the DemoCEMC
The calculation of the angle was not possible because cos(theta) was out
of range [-1,1].
Multiple alignments can be performed, but no gaps or circular
permutations are handled yet.
With the idea that using vecmath more consistently throughout will increase
performance, but that Atom/JAMA-based code will stick around for a while.
Interfaces no longer inherit from Cloneable, so implementations should
flag themselves specifically.
- Added ScoresCache to all levels of the hierarchy, which allows
  algorithm-specific scores to be added and retrieved. Replaces
  several methods for individual scores.
- Removed update methods from the interface
- Removed Pose in favor of raw vecmath transformation matrices
Adapt the alignment panel to work with the MultipleAlignment DS. The
changes still don't work, waiting for the last changes in the DS.
This should be the preferred way of fetching CA atoms
Only the connection between the panel and the jmol has to be
implemented.
- Setters now only modify downstream parts of the hierarchy. For instance, calling
  MultipleAlignment.setEnsemble() changes the alignment links without touching
  either new or old ensemble links, but MultipleAlignment.setBlockSet() does
  modify the alignment link for each block set.
- Added clear() methods for resetting cached variables
- Added MultipleAlignmentEnsemble.addMultipleAlignment() method, rather
  than modifying the underlying list directly.
- Improve documentation somewhat
- Fix infinite loop in toString methods
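
The "setters only modify downstream" rule described above can be illustrated with a small sketch. All class names here are hypothetical, and the behaviour of `addMultipleAlignment()` (registering the child and setting its back-link) is an assumption about how the upstream object would keep both sides consistent; only the asymmetry between the two setters is taken from the commit message.

```java
import java.util.ArrayList;
import java.util.List;

class EnsembleNode {
    final List<AlignmentNode> alignments = new ArrayList<>();

    // Upstream object registers the child and (assumed) sets the child's back-link.
    void addMultipleAlignment(AlignmentNode alignment) {
        alignments.add(alignment);
        alignment.setEnsemble(this);
    }
}

class AlignmentNode {
    private EnsembleNode ensemble;
    private List<BlockSetNode> blockSets = new ArrayList<>();

    // Changes only this object's own link; neither ensemble's list is touched.
    void setEnsemble(EnsembleNode ensemble) {
        this.ensemble = ensemble;
    }

    EnsembleNode getEnsemble() { return ensemble; }

    // Downstream update: every block set gets its back-link pointed at this alignment.
    void setBlockSets(List<BlockSetNode> blockSets) {
        this.blockSets = blockSets;
        for (BlockSetNode bs : blockSets) bs.setAlignment(this);
    }

    List<BlockSetNode> getBlockSets() { return blockSets; }
}

class BlockSetNode {
    private AlignmentNode alignment;

    void setAlignment(AlignmentNode alignment) { this.alignment = alignment; }

    AlignmentNode getAlignment() { return alignment; }
}

class SetterDemo {
    public static void main(String[] args) {
        EnsembleNode oldEnsemble = new EnsembleNode();
        EnsembleNode newEnsemble = new EnsembleNode();
        AlignmentNode alignment = new AlignmentNode();

        oldEnsemble.addMultipleAlignment(alignment);
        alignment.setEnsemble(newEnsemble); // upstream lists stay as they were

        System.out.println(oldEnsemble.alignments.contains(alignment));
        System.out.println(newEnsemble.alignments.isEmpty());
    }
}
```

The practical consequence is that moving an alignment between ensembles takes an explicit call on the ensemble side; calling the setter alone leaves the old ensemble still listing the alignment.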
Conflicts:
	biojava-structure/src/main/java/org/biojava/nbio/structure/align/model/MultipleAlignmentImpl.java
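
An earlier commit in this batch mentions that the rotation angle could not be calculated because cos(theta) fell outside [-1, 1]. The commit message does not say how it was fixed; a common remedy, sketched here under that caveat, is to clamp the cosine before calling `Math.acos`, since floating-point rounding in a near-identity rotation can push the value slightly past 1. The class and method names are hypothetical.

```java
class AngleSketch {
    // Rotation angle in radians of a 3x3 rotation matrix, using
    // trace(R) = 1 + 2*cos(theta).
    static double rotationAngle(double[][] r) {
        double cosTheta = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0;
        // Rounding can leave cosTheta slightly outside [-1, 1], where
        // Math.acos returns NaN, so clamp it first.
        cosTheta = Math.max(-1.0, Math.min(1.0, cosTheta));
        return Math.acos(cosTheta);
    }

    public static void main(String[] args) {
        double[][] quarterTurnAboutZ = {
            {0, -1, 0},
            {1,  0, 0},
            {0,  0, 1}
        };
        System.out.println(rotationAngle(quarterTurnAboutZ));
    }
}
```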

sbliven commented Jun 5, 2015

Oops, this can go in 4.1

@sbliven sbliven closed this Jun 5, 2015

sbliven commented Jun 5, 2015

Replaced by #278
