DsspApp unable to work on multicharacter chain identifiers #264

saiden89 · 2020-12-09T15:10:55Z

The issue

When using DsspApp.annotate_sse on an extracted chain whose chain ID has more than one character, the program raises an error.

Expected behaviour

annotate_sse should work regardless of the chain ID.

Steps to reproduce

Minimal code to reproduce

from tempfile import gettempdir
import biotite.structure as struc
import biotite.structure.io.mmtf as mmtf
import biotite.database.rcsb as rcsb
import biotite.application.dssp as dssp


# Download the structure in MMTF format and load the first model
file_name = rcsb.fetch("4YBB", "mmtf", gettempdir())
mmtf_file = mmtf.MMTFFile.read(file_name)
array = mmtf.get_structure(mmtf_file, model=1)
# Keep only amino acid residues
tk_dimer = array[struc.filter_amino_acids(array)]
# Select the chain with the two-character ID "BB"
bb_chain = tk_dimer[tk_dimer.chain_id == "BB"]
# This call fails with the error shown below
sse = dssp.DsspApp.annotate_sse(bb_chain)

Actual behaviour

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "biotite/application/dssp/app.py", line 130, in annotate_sse
    app.join()
  File "biotite/application/application.py", line 58, in wrapper
    return func(*args, **kwargs)
  File "biotite/application/application.py", line 153, in join
    self.evaluate()
  File "biotite/application/dssp/app.py", line 71, in evaluate
    super().evaluate()
  File "biotite/application/localapp.py", line 236, in evaluate
    raise SubprocessError(
subprocess.SubprocessError: 'mkdssp' returned with exit code 1: DSSP could not be created due to an error: bad lexical cast: source type value could not be interpreted as target


padix-key · 2020-12-09T15:37:26Z

Thank you for the report. I edited your code snippet: I think you meant tk_dimer instead of filtered_amino, as otherwise I get a NameError. After the correction I was able to reproduce the reported error.

saiden89 · 2020-12-09T15:39:53Z

Yes, sorry. I copied it from your page "Four ways to get the secondary structure of a protein" but used another PDB entry.

padix-key · 2020-12-09T15:56:21Z

I think the problem is that the mkdssp software takes PDB files as input, and the PDB format does not support multi-character chain IDs. However, according to the help message of the program, mkdssp also seems to support .cif files. So I think there are two issues to solve:

Raise an error when a PDB file is created from a structure with multi-character chain IDs
Use mmCIF files as mkdssp input
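The first point could look roughly like the following minimal sketch. It is not Biotite's actual code: `check_pdb_chain_ids` is a hypothetical helper, and it relies only on the fact that the fixed-column PDB format reserves a single character (column 22 of ATOM/HETATM records) for the chain identifier.

```python
def check_pdb_chain_ids(chain_ids):
    """Raise ValueError if any chain ID cannot be written to a PDB file.

    Hypothetical helper: the PDB format stores the chain identifier in a
    single fixed column, so IDs longer than one character cannot be
    represented without truncating them or shifting the other columns.
    """
    too_long = sorted({cid for cid in chain_ids if len(cid) > 1})
    if too_long:
        raise ValueError(
            "The PDB format supports only single-character chain IDs, "
            f"but the structure contains: {', '.join(too_long)}"
        )


# A chain extracted from 4YBB carries the two-character ID "BB"
try:
    check_pdb_chain_ids(["BB", "BB", "BB"])
except ValueError as err:
    print(err)
```

Failing early like this gives a readable message instead of the opaque "bad lexical cast" error from mkdssp.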

However, I just tried to give bb_chain to mkdssp as an mmCIF file, and the program returned the same error, even after changing the chain ID from BB to B. Hence, I think that mkdssp requires more information from the mmCIF file than the atom_site category.

saiden89 · 2020-12-09T18:25:45Z

I did some testing, and it seems that mkdssp v3.0.0 is able to process multi-character chain IDs with just the atom_site category, provided the file extension is .cif. If, for example, the extension is .pdbx, mkdssp refuses to work and gives the same error.
My file 4YBB.cif:

data_4YBB 
# 
loop_
_atom_site.group_PDB 
_atom_site.id 
_atom_site.type_symbol 
_atom_site.label_atom_id 
_atom_site.label_alt_id 
_atom_site.label_comp_id 
_atom_site.label_asym_id 
_atom_site.label_entity_id 
_atom_site.label_seq_id 
_atom_site.pdbx_PDB_ins_code 
_atom_site.Cartn_x 
_atom_site.Cartn_y 
_atom_site.Cartn_z 
_atom_site.occupancy 
_atom_site.B_iso_or_equiv 
_atom_site.pdbx_formal_charge 
_atom_site.auth_seq_id 
_atom_site.auth_comp_id 
_atom_site.auth_asym_id 
_atom_site.auth_atom_id 
_atom_site.pdbx_PDB_model_num 
ATOM   84609  N  N     . VAL W   2  1    ? -21.193  87.636   15.757   1.00 69.69  ?  4    VAL BB N     1 
...
ATOM   86361  N  NE2   . GLN W   2  224  ? -30.148  73.033   12.778   1.00 79.30  ?  227  GLN BB NE2   1 
#

Produces the correct output. The same content but in a file named 4YBB.pdbx gives:

bad lexical cast: source type value could not be interpreted as target
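Since mkdssp v3.0.0 apparently decides how to parse the input based on the file extension, a wrapper has to make sure the temporary input file ends in ".cif". A minimal sketch, assuming the `-i <input> -o <output>` command line of mkdssp 3.x; `prepare_dssp_input` is a hypothetical helper, and the command is only built here, not executed.

```python
import os
import tempfile


def prepare_dssp_input(cif_text):
    """Write mmCIF text to a temporary '.cif' file and return a command line.

    Assumptions: mkdssp 3.x selects its parser from the file extension
    (as observed in this thread) and accepts '-i <input> -o <output>'.
    """
    # The '.cif' suffix matters: the same content named '*.pdbx' was rejected
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".cif", delete=False
    ) as in_file:
        in_file.write(cif_text)
    out_path = os.path.splitext(in_file.name)[0] + ".dssp"
    return ["mkdssp", "-i", in_file.name, "-o", out_path]


cmd = prepare_dssp_input("data_4YBB\n#\n")
print(cmd)
# To actually run it (requires mkdssp on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
os.remove(cmd[2])  # clean up the temporary input file
```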

padix-key · 2020-12-10T08:57:58Z

I identified the problem with my test: mkdssp seems to also require the occupancy, pdbx_formal_charge and B_iso_or_equiv fields. In the upcoming PR, I simply set default values for these fields if the input AtomArray lacks these annotations.
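The idea of filling in defaults can be sketched on plain per-atom columns before they are written as an atom_site loop. This is an illustration, not the code from the PR: the helper names and the default values (occupancy 1.00, B-factor 0.00, formal charge 0) are assumptions, and the PR itself works on AtomArray annotations rather than string columns.

```python
# Hypothetical defaults; the values used in the actual PR may differ
DEFAULTS = {
    "occupancy": "1.00",
    "B_iso_or_equiv": "0.00",
    "pdbx_formal_charge": "0",
}


def fill_missing_atom_site_fields(atom_site):
    """Return a copy of an atom_site column dict with defaults added.

    ``atom_site`` maps field names (without the '_atom_site.' prefix)
    to equal-length lists of string values, one per atom.
    """
    n_atoms = len(next(iter(atom_site.values())))
    filled = {key: list(values) for key, values in atom_site.items()}
    for field, default in DEFAULTS.items():
        if field not in filled:
            # Only absent fields are filled; existing values are kept
            filled[field] = [default] * n_atoms
    return filled


def to_atom_site_loop(atom_site):
    """Format the columns as an mmCIF 'loop_' block, one row per atom."""
    lines = ["loop_"]
    lines += [f"_atom_site.{field}" for field in atom_site]
    for row in zip(*atom_site.values()):
        lines.append(" ".join(row))
    return "\n".join(lines)


atom_site = {
    "label_atom_id": ["N", "CA"],
    "auth_asym_id": ["BB", "BB"],
    "occupancy": ["0.50", "0.50"],
}
filled = fill_missing_atom_site_fields(atom_site)
print(to_atom_site_loop(filled))
```

Existing annotations are left untouched, so only structures that genuinely lack these fields receive placeholder values.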

Fix #264

padix-key · 2020-12-15T14:33:07Z

This issue should be fixed now

saiden89 · 2020-12-15T15:09:38Z

Confirmed. Thank you very much!

padix-key mentioned this issue Dec 10, 2020

Fix #264 #265

Merged

padix-key closed this as completed in #265 Dec 15, 2020

padix-key added a commit that referenced this issue Dec 15, 2020

Merge pull request #265 from padix-key/issue-264

4a609e7

Fix #264
