Improve structural alignment datastructures #126

Closed
sbliven opened this issue May 12, 2014 · 4 comments
Labels
enhancement (Improvement of existing code or method), new feature (New method or data structure)
Milestone

Comments

@sbliven
Member

sbliven commented May 12, 2014

BioJava contains a number of algorithms for aligning protein structures. In the most general case, an alignment consists of a mapping between residues of two (or more) proteins. However, for historical and performance reasons alignments are stored as linear, sorted arrays. This makes it difficult to express cases where the order of aligned residues differs between the two proteins. For instance, storing the following alignment requires some creative work-arounds:

 123456
 456123
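
To illustrate why this case is awkward, here is a minimal sketch that stores the example above as an explicit residue-to-residue map and checks whether it could fit into a pair of sorted arrays. The class and method names are hypothetical and are not part of the BioJava API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in BioJava.
class CircularPermutationExample {

    // Builds the mapping 1->4, 2->5, 3->6, 4->1, 5->2, 6->3 from the example.
    static Map<Integer, Integer> exampleMapping() {
        int[] first = {1, 2, 3, 4, 5, 6};
        int[] second = {4, 5, 6, 1, 2, 3};
        Map<Integer, Integer> mapping = new LinkedHashMap<Integer, Integer>();
        for (int i = 0; i < first.length; i++) {
            mapping.put(first[i], second[i]);
        }
        return mapping;
    }

    // Returns true if the partners strictly increase when the first protein's
    // residues are visited in ascending order, i.e. the alignment could be
    // stored as two sorted arrays.
    static boolean isSequential(Map<Integer, Integer> mapping) {
        int previous = Integer.MIN_VALUE;
        for (int partner : new TreeMap<Integer, Integer>(mapping).values()) {
            if (partner <= previous) {
                return false;
            }
            previous = partner;
        }
        return true;
    }
}
```

For the example mapping, `isSequential` returns false because residue 4 of the first protein maps back to residue 1 of the second.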

Additionally, the class to store structural alignments (AFPChain) contains a number of unnecessary, poorly documented, or algorithm-specific parameters which should be removed or refactored.

AFPChain should be refactored into a new data structure that

  • Is flexible enough to store topology-independent alignments
  • Efficiently utilizes memory
  • Has good performance for common tasks
  • Works for rigid body, flexible, and local structural alignments
  • Maintains consistency when modified
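
As a rough sketch of two of these goals (topology independence and consistency under modification), a pairwise alignment could keep a forward and an inverse map that are only ever updated together. This is an assumption about one possible design, not a proposal for the final API; `PairwiseResidueAlignment` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not an existing BioJava class.
class PairwiseResidueAlignment {
    private final Map<Integer, Integer> firstToSecond = new HashMap<Integer, Integer>();
    private final Map<Integer, Integer> secondToFirst = new HashMap<Integer, Integer>();

    // Aligns residue a of the first structure with residue b of the second.
    // Any earlier pair that used either residue is dropped first, so the two
    // maps always remain exact inverses of each other.
    void align(int a, int b) {
        Integer oldB = firstToSecond.remove(a);
        if (oldB != null) {
            secondToFirst.remove(oldB);
        }
        Integer oldA = secondToFirst.remove(b);
        if (oldA != null) {
            firstToSecond.remove(oldA);
        }
        firstToSecond.put(a, b);
        secondToFirst.put(b, a);
    }

    // Partner of residue a in the second structure, or null if unaligned.
    Integer partnerInSecond(int a) {
        return firstToSecond.get(a);
    }

    // Partner of residue b in the first structure, or null if unaligned.
    Integer partnerInFirst(int b) {
        return secondToFirst.get(b);
    }

    // Number of aligned residue pairs.
    int size() {
        return firstToSecond.size();
    }
}
```

Because no ordering is imposed on the pairs, a circular permutation is stored the same way as a sequential alignment. The trade-off is higher memory use than plain sorted arrays, which the memory and performance goals above would have to weigh.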

Suggested for GSoC 2013

@willishf
Contributor

Spencer

From a Google Summer of Code project a couple of years ago that Andreas and I
coordinated, we have a very reasonable, modern sequence aligner. Details at
http://biojava.org/wiki/BioJava:CookBook#Pairwise_and_Multiple_Sequence_Alignment

The developer was a PhD student who is no longer active so this code needs
a new owner.

Scooter


@andreasprlic
Member

Spencer: what is the "creative work-around" that you took to store that alignment?

@sbliven
Member Author

sbliven commented May 15, 2014

@willishf This issue is about structure alignment. However, we don't have any order-independent sequence alignment algorithms either, and SequencePair wouldn't be able to store them if we did. So really the issue could be a feature request in both the structure and sequence spaces.

@andreasprlic Storing each side of the circular permutation (CP) in a separate block in the AFPChain, then getting you to fix all the places in AFPWriter and other classes that assumed a sequential order between blocks. There are still places which assume sequential order within blocks for performance.
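
A minimal sketch of that block-based workaround, using plain arrays rather than AFPChain itself: the `[block][chain][position]` layout and all names below are assumptions made for illustration. Each block is sorted internally, but the blocks are not in sequential order relative to each other.

```java
// Hypothetical illustration; this does not use or mirror the AFPChain API.
class BlockedAlignmentExample {

    // The alignment 123456 / 456123 split at the permutation point,
    // indexed as [block][chain][position].
    static int[][][] exampleBlocks() {
        return new int[][][] {
            { {1, 2, 3}, {4, 5, 6} },
            { {4, 5, 6}, {1, 2, 3} }
        };
    }

    // True if residue numbers strictly increase inside every block.
    static boolean sequentialWithinBlocks(int[][][] blocks) {
        for (int[][] block : blocks) {
            for (int[] chain : block) {
                for (int i = 1; i < chain.length; i++) {
                    if (chain[i] <= chain[i - 1]) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // True if, for every chain, each block starts after the previous block
    // ends. Assumes every block holds at least one aligned position.
    static boolean sequentialAcrossBlocks(int[][][] blocks) {
        for (int b = 1; b < blocks.length; b++) {
            for (int c = 0; c < blocks[b].length; c++) {
                int[] previous = blocks[b - 1][c];
                int[] current = blocks[b][c];
                if (current[0] <= previous[previous.length - 1]) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For the example, `sequentialWithinBlocks` holds while `sequentialAcrossBlocks` does not, which is the condition that code assuming a sequential order between blocks would trip over.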

Basically, this issue is for a rewrite of AFPChain that I've long been thinking about. AFPChain was basically a bean for all the globals during the jFATCAT/jCE port, but now it is the core class for structure alignment. I would prefer a more conceptual data model of what constitutes an alignment. For the structure package this must include order-independent concepts to support CE-CP and derivative algorithms.

@lafita
Member

lafita commented Feb 25, 2016

This should have been closed when the MultipleAlignment data structure (to store multiple structural alignments) was merged in #278.

@lafita lafita closed this as completed Feb 25, 2016
kamildoleglo pushed a commit to kamildoleglo/biojava that referenced this issue Jul 3, 2020