Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_PDB_DSSP failure with mmcif_chain_dict[chain_id] KeyError: 'E' #3763

Closed
emollier opened this issue Oct 13, 2021 · 4 comments · Fixed by #3775
Closed

test_PDB_DSSP failure with mmcif_chain_dict[chain_id] KeyError: 'E' #3763

emollier opened this issue Oct 13, 2021 · 4 comments · Fixed by #3775
Assignees

Comments

@emollier
Copy link
Contributor

Greetings,

When trying to upgrade the python-biopython package to version 1.79 in Debian Sid, I'm getting the following unit test error:

======================================================================
ERROR: test_dssp_with_mmcif_file_and_different_chain_ids (test_PDB_DSSP.DSSP_tool_test)
Test DSSP generation from MMCIF which has different label and author chain IDs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.9/build/Tests/test_PDB_DSSP.py", line 125, in test_dssp_with_mmcif_file_and_different_chain_ids
    dssp = DSSP(model, pdbfile)
  File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.9/build/Bio/PDB/DSSP.py", line 476, in __init__
    chain_id = mmcif_chain_dict[chain_id]
KeyError: 'E'

It turns out that the KeyError occurs a couple of lines before the try: except KeyError: statement which, I suppose, is the object of the test. Note that this occurs with the patch mentionned in the issue #3433, in order to work around the issue caused by the new default file format, so there is a good chance this is a self inflicted breakage. If need be, you can refer to the full log on Salsa CI, on which I added a couple of print in the loop, in an attempt to understand what happens.

Core components versions are:

>>> import sys; print(sys.version)
3.9.7 (default, Sep 24 2021, 09:43:00)
>>> import platform; print(platform.python_implementation()); print(platform.platform())
CPython
Linux-5.14.0-1-amd64-x86_64-with-glibc2.32

Current versions of mkdssp and libcifpp available in Sid are:

$ mkdssp --version
mkdssp version 4.0.0 2020-12-18
$ dpkg -l | grep libcifpp
ii  libcifpp1  1.0.1-3  amd64  Library files for libcifpp

Kind Regards,
Étienne.

@peterjc
Copy link
Member

peterjc commented Oct 15, 2021

I think we can close this issue as a duplicate of #3433, not yet supporting DSSP v4. But the test failure is worth knowing about.

@emollier
Copy link
Contributor Author

Hi Peter,

Yes, I thought it would be better that you'd be aware of the issue. Please feel free to sort the bug entry in the way you think is the most sensible; I was a bit unsure of the channel to use to be honest.

For information, I uploaded to Experimental with the following workaround to skip the test, in case the mkdssp version is too recent:

--- python-biopython.orig/Tests/test_PDB_DSSP.py
+++ python-biopython/Tests/test_PDB_DSSP.py
@@ -117,8 +117,8 @@
 
     def test_dssp_with_mmcif_file_and_different_chain_ids(self):
         """Test DSSP generation from MMCIF which has different label and author chain IDs."""
-        if self.dssp_version < StrictVersion("2.2.0"):
-            self.skipTest("Test requires DSSP version 2.2.0 or greater")
+        if self.dssp_version < StrictVersion("2.2.0") or self.dssp_version >= StrictVersion("4.0.0"):
+            self.skipTest("Test requires DSSP version between 2.2.0 and 4.0.0.")
 
         pdbfile = "PDB/1A7G.cif"
         model = self.cifparser.get_structure("1A7G", pdbfile)[0]

As usual, Thanks for your time!

Have a nice day, :)
Étienne.

@cbalbin-bio
Copy link
Contributor

cbalbin-bio commented Oct 29, 2021

Hey,

I just wanted to mention something I noticed about running DSSP v4.0 on mmCIF files. The old version of DSSP used to use the label_asym_id field for the chain identifiers when running on mmCIF files. But it seems the new version uses auth_asym_id.

Overall I see this as an improvement.

But I mention this because the current DSSP parsing code detects if the input file is a mmCIF and then assumes the dssp chains are label_asym_id and maps them to auth_asym_id. See: f85e4d9

But now in v4.0 even when passing --output-format=dssp, the chain idents are already that of auth_asym_id. So I believe the current code would try to map the chains in the dssp output to auth_asym_id even though it is already that of auth_asym_id.

So it may need to detect what version of DSSP is running and disable the mapping if it is version 4.0 or greater. Otherwise it may 1) cause a mmcif_chain_dict[chain_id] KeyError and 2) falsely map across chain ident schemes. NOTE: I have not actually tried running the biopython DSSP parser on v4.0. Just bringing this up because I think its worth mentioning.

I really wish PDB would have stuck with a single convention for chain identifiers.

Attached are some example files where you can see the difference in the chain column between versions. Here is the PDB link for the example used.

dssp2.3_1nvx_cif.txt
dssp4_1nvx_cif.txt

Thanks,
Christian

Edit to tag @jgreener64 as he will probably know best how to handle this potential issue.

Thanks again !

@jgreener64
Copy link
Contributor

Good to know, should we move to support the new DSSP version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants