Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for DSSP v4.0 #3775

Merged
merged 6 commits into from
Aug 7, 2022
Merged

Conversation

cbalbin-bio
Copy link
Contributor

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit locally,
    and understand that AppVeyor and TravisCI will be used to confirm the Biopython unit
    tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #3763 #3433

This is a patch to the current DSSP parser to support DSSP v4.0 . The DSSP parser currently only handles up to v3.x.x. I think supporting v4.0 is important as that is what is install by default when using apt on Ubuntu 21.04 or greater. Currently it is not mentioned in the documentation that v4.0 is not supported and this could be undesirable to some users.

DSSP v4.0 shifts to a mmCIF format by default. You can output a DSSP v3.0 style report using the flag --output-format=dssp. While I am a big fan of mmCIFs gain in popularity. DSSP 4.0 s mmCIF out is missing some critical items such as accessibility; see comment.
Thus I think it is better to stick with the classic DSSP format output.

DSSP v4.0 also now uses auth_asym_id for the chain IDs instead of label_asym_id when ran on mmCIF files see comment. For older versions of DSSP, a mapping had to be done from the label to auth chain ID f85e4d9. This is now disabled when a DSSP v4.0 or higher executable is detected.

In order to enable the expected experience for the most users, whether using PDB or mmCIF format files, or DSSP v3 vs v4 I added a version check to the initializer of the DSSP class. This check uses code similar to that found the the DSSP unit test. If it is found that the executable has a version greater than or equal to '4.0.0' the flag --output-format=dssp is added to the dssp command. Also if the user is found to be using v4.0.0 or greater with an mmCIF file. The label to auth chain ID mapping is disabled as the results already contain the auth chain ID instead.

Thanks for your time,

Christian

@cbalbin-bio cbalbin-bio changed the title Dssp updates Add Support for DSSP v4.0 Oct 29, 2021
@codecov
Copy link

codecov bot commented Oct 29, 2021

Codecov Report

Merging #3775 (bbb114b) into master (c6da294) will increase coverage by 82.11%.
The diff coverage is 27.77%.

❗ Current head bbb114b differs from pull request most recent head b43f4b5. Consider uploading reports for the commit b43f4b5 to get more accurate results

@@             Coverage Diff             @@
##           master    #3775       +/-   ##
===========================================
+ Coverage        0   82.11%   +82.11%     
===========================================
  Files           0      298      +298     
  Lines           0    49955    +49955     
===========================================
+ Hits            0    41023    +41023     
- Misses          0     8932     +8932     
Impacted Files Coverage Δ
Bio/PDB/DSSP.py 53.17% <27.77%> (ø)
Bio/SeqIO/__init__.py 84.66% <0.00%> (ø)
Bio/Align/Applications/_ClustalOmega.py 100.00% <0.00%> (ø)
Bio/ExPASy/ScanProsite.py 30.15% <0.00%> (ø)
Bio/SVDSuperimposer/__init__.py 86.95% <0.00%> (ø)
Bio/AlignIO/Interfaces.py 90.90% <0.00%> (ø)
Bio/Align/nexus.py 76.31% <0.00%> (ø)
Bio/phenotype/pm_fitting.py 12.28% <0.00%> (ø)
Bio/Blast/NCBIWWW.py 77.17% <0.00%> (ø)
Bio/GenBank/Record.py 91.96% <0.00%> (ø)
... and 289 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@rob-miller
Copy link
Contributor

This is working well for me (once I read the docs on how to skip the file type test) and I support merging.

May I suggest changing make_dssp_dict() to use as_handle() instead of open():

from Bio.File import as_handle
...
def make_dssp_dict(filename):
    with as_handle(filename, "r") as handle:
        return _make_dssp_dict(handle)

This maintains current functionality and further enables the user to get the dssp output from their own subprocess if/when the one in this code fails again.

Also

    dssp : string
        The dssp executable (ie. the argument to subprocess)

might be better described as something like

    dssp : string
        The name of the dssp executable ('dssp' or 'mkdssp', ie. the argument to subprocess)

@rob-miller
Copy link
Contributor

Just to note, I've run this on 6,145 mmCIF chains from a current Dunbrack PISCES nonredundant list, and the only issue is an alignment error on PDB code 3sqz chain A due to Cys 111 being a non-standard residue. The PDB format file for this chain is handled correctly.

@JoaoRodrigues
Copy link
Member

Going back to this, thanks for the ping @rob-miller. I downloaded DSSP versions 3.x.x and 4.x.x (from their github repos) and installed them locally. I am having an issue with reading mmCIF files with version 4.x.x. The tests will pass on version 3.x.x but version 4.x.x produces an empty dssp_dict file. I am looking into it, but it might be related with issues with label_ and auth_ fields in the cif file.

@JoaoRodrigues
Copy link
Member

Uh oh, sorry... my git pull origin master created this mess!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_PDB_DSSP failure with mmcif_chain_dict[chain_id] KeyError: 'E'
3 participants