internal-coords vector assembly update #3774
Codecov Report
@@ Coverage Diff @@
## master #3774 +/- ##
==========================================
- Coverage 84.05% 78.82% -5.24%
==========================================
Files 321 301 -20
Lines 54093 51749 -2344
==========================================
- Hits 45470 40790 -4680
- Misses 8623 10959 +2336
New bpbp-gist.py sample code for exercising features in this PR, plus the two example scripts above, uploaded to rtm-biopython-scripts.
Closes #3793
d13bc43 to 5dd44fb
Working on improving test coverage and hoping to resolve #3802
b5ec61f to eaf3794
…key(), rm L12, L23 refs
eaf3794 to 3301155
JoaoRodrigues
left a comment
Sorry this took so long @rob-miller - work+life got in the way. 👍 for me to merge. It would be really nice at some point to have a doc showcasing how useful this code can be for the users!
PR overhauls the vector assembly code and makes heavier use of numpy for fast vectorized operations.
I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.
I have read the CONTRIBUTING.rst file, have run pre-commit locally, and understand that AppVeyor and TravisCI will be used to confirm the Biopython unit tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files NEWS.rst and CONTRIB.rst as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
This is an update / upgrade to the Bio/PDB/internal_coords module, primarily around 'vectorizing' ('NumPy-izing') the generation of atom coordinates from bond lengths, angles and dihedral angles.
Will come up with News entry when happy to merge; last time forgot to mention 3D printing aspect.
Primary improvements / additions:
Assembly (internal_to_atom_coordinates()) is now 'vectorized': the steps are processed as NumPy commands on arrays of data as the coordinates become available. Assembly still depends on preceding regions of a chain, but, for example, both the side chain and backbone can be assembled in the same NumPy steps once the shared initial coordinates are set. Chains over 100 residues assemble in about half the time of the serial algorithm, while the speedup for shorter chains (e.g. 1crn, 46 residues) is insignificant. For more specific cases, like changing all side chain chi1 angles on a fixed backbone, the speedup is much greater as all side chains can be processed in parallel. The serial algorithm can still be optionally activated, and is triggered when start and end positions are specified for assembly. The reverse procedure, atom_to_internal_coordinates(), was vectorized in the initial release.
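To make the idea of processing assembly steps "as NumPy commands on arrays of data" concrete, here is a minimal sketch that places one new atom per row from three already-placed atoms plus a bond length, bond angle and dihedral. It is a generic NeRF-style placement written for illustration, not the code in this PR; the function name `place_atoms`, its argument layout and the dihedral sign convention are all assumptions.

```python
import numpy as np


def place_atoms(a, b, c, length, angle, torsion):
    """Place one atom d per row (hypothetical helper, not the PR's code).

    a, b, c: (N, 3) arrays of already-placed atoms.
    length: |c-d|, angle: b-c-d, torsion: a-b-c-d (radians), each shape (N,).
    """
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    bc = c - b
    bc /= np.linalg.norm(bc, axis=1, keepdims=True)  # unit vector b -> c
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n, axis=1, keepdims=True)  # normal to the a-b-c plane
    m = np.cross(n, bc)  # completes the local right-handed frame (bc, m, n)
    # Components of d relative to c, expressed in the local frame.
    d_local = np.stack(
        [
            -length * np.cos(angle),
            length * np.sin(angle) * np.cos(torsion),
            length * np.sin(angle) * np.sin(torsion),
        ],
        axis=1,
    )
    return c + d_local[:, :1] * bc + d_local[:, 1:2] * m + d_local[:, 2:3] * n


# Two placements computed in one call: same a, b, c, torsion 0 (cis) and pi (trans).
a = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
b = np.zeros((2, 3))
c = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
d = place_atoms(
    a,
    b,
    c,
    length=np.array([1.5, 1.5]),
    angle=np.array([np.pi / 2, np.pi / 2]),
    torsion=np.array([0.0, np.pi]),
)
```

Every row is independent once a, b and c are known, which is why many side chains on a fixed backbone can be handled in a single call.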
AtomArray: To support the vectorized assembly process, all atom coordinates are relocated into one NumPy array and the coordinates in the Biopython Atom objects become views into this larger array. (This was discussed on the Biopython mailing list in October 2019 as "Overhauling of Bio.PDB module", but the result here just came about as a result of vectorizing the assembly step.) The atomArray is created and made available when internal coordinates are calculated (atom_to_internal_coordinates()). Subsets of this array can be accessed using AtomKey objects, and generating a 2D distance plot is a single (admittedly arcane) line of NumPy code.
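The two NumPy behaviours this relies on, row views that share memory with a parent array and a one-line pairwise distance matrix, can be sketched without Biopython. The array `atom_array` below is a hypothetical stand-in for the shared coordinate array, and plain integer indices stand in for AtomKey lookups; neither reflects the module's actual attribute names.

```python
import numpy as np

# Hypothetical stand-in for the shared coordinate array: 4 atoms, xyz each.
atom_array = np.array(
    [
        [0.0, 0.0, 0.0],
        [3.0, 0.0, 0.0],
        [3.0, 4.0, 0.0],
        [0.0, 4.0, 0.0],
    ]
)

# Basic row indexing returns views, so each per-atom coordinate shares
# memory with the big array instead of holding its own copy.
atom_coords = [atom_array[i] for i in range(len(atom_array))]
atom_coords[1][0] = 6.0  # edit through the per-atom view ...
changed_via_view = atom_array[1, 0]  # ... and the shared array sees it
atom_array[1, 0] = 3.0  # restore the original value

# All pairwise distances in one broadcast expression: an (N, N) matrix
# that could be handed directly to a plotting routine.
dist = np.linalg.norm(atom_array[:, None, :] - atom_array[None, :, :], axis=-1)
```

The broadcast subtraction builds an (N, N, 3) array of difference vectors, so memory grows quadratically with the number of atoms.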
Both the assembly and internal coordinate calculation algorithms are based on computing coordinate spaces for relevant triples (hedra) of atoms. The coordinate transform matrices to and from these spaces for every dihedral are available for use, for example to compare residue environments or pairwise interactions. The steps are essentially:
chi1 = ric0.pick_angle("chi1") # chi1 space defined with CA at origin
cst = np.transpose(chi1.cst) # transform TO chi1 space
newAtomCoords = oldAtomCoords.dot(cst)
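For readers unfamiliar with coordinate space transforms, the sketch below builds a 4x4 homogeneous matrix from three atoms and applies it with the same transpose-then-dot pattern as the steps above. The helper `coord_space` and its axis convention (middle atom at the origin, third atom on +Z, first atom in the XZ plane) are assumptions chosen for illustration and may differ from the convention used in internal_coords.

```python
import numpy as np


def coord_space(a0, a1, a2):
    """Return a 4x4 transform for three atoms (hypothetical helper).

    Assumed convention: a1 moves to the origin, a2 lands on +Z and
    a0 lands in the XZ plane with x >= 0.
    """
    a0, a1, a2 = (np.asarray(v, dtype=float) for v in (a0, a1, a2))
    z = a2 - a1
    z /= np.linalg.norm(z)
    v = a0 - a1
    x = v - v.dot(z) * z  # part of v perpendicular to z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    rot = np.vstack([x, y, z])  # rows are the new axes
    mtx = np.eye(4)
    mtx[:3, :3] = rot
    mtx[:3, 3] = -rot @ a1  # translate a1 to the origin after rotating
    return mtx


# Hypothetical coordinates for three bonded atoms.
a0 = np.array([1.0, 2.0, 3.0])
a1 = np.array([2.0, 0.5, 1.0])
a2 = np.array([0.0, 1.0, 4.0])

mtx = coord_space(a0, a1, a2)
# Homogeneous row vectors (x, y, z, 1), transformed as rows via the transpose.
old_atom_coords = np.array([[*a0, 1.0], [*a1, 1.0], [*a2, 1.0]])
new_atom_coords = old_atom_coords.dot(np.transpose(mtx))
```

Because the matrix is a rigid-body transform, interatomic distances are unchanged, so any neighbouring atoms passed through it keep their geometry relative to the three reference atoms.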
See attached sample code presenting phe-phe pair interactions as PDB files for inspection (proof of concept only, not carefully selected interacting pairs). Note this should/will use PR #3676 if/when it is approved but for now there is a redundant pdb_residue_string() routine (and supporting data structures) supplied with the internal_coordinates module.
bugfixes / smaller things:
ca-plot-7rsa.py.txt
phe-pairs-2.py.txt