
DsspApp unable to work on multicharacter chain identifiers #264

Closed
saiden89 opened this issue Dec 9, 2020 · 7 comments · Fixed by #265

Comments

@saiden89

saiden89 commented Dec 9, 2020

The issue

When using DsspApp.annotate_sse() on an extracted chain, the program raises an error if the chain ID has more than one character.

Expected behaviour

annotate_sse() should work regardless of the chain ID.

Steps to reproduce

Minimal code to reproduce

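```python
from tempfile import gettempdir
import biotite.structure as struc
import biotite.structure.io.mmtf as mmtf
import biotite.database.rcsb as rcsb
import biotite.application.dssp as dssp


# Download 4YBB in MMTF format into a temporary directory
file_name = rcsb.fetch("4YBB", "mmtf", gettempdir())
mmtf_file = mmtf.MMTFFile.read(file_name)
# Load the first model and keep only amino acid residues
array = mmtf.get_structure(mmtf_file, model=1)
tk_dimer = array[struc.filter_amino_acids(array)]
# Select chain "BB", which has a two-character chain ID
bb_chain = tk_dimer[tk_dimer.chain_id == "BB"]
# Raises SubprocessError: mkdssp returns with exit code 1
sse = dssp.DsspApp.annotate_sse(bb_chain)
```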

Actual behaviour

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "biotite/application/dssp/app.py", line 130, in annotate_sse
    app.join()
  File "biotite/application/application.py", line 58, in wrapper
    return func(*args, **kwargs)
  File "biotite/application/application.py", line 153, in join
    self.evaluate()
  File "biotite/application/dssp/app.py", line 71, in evaluate
    super().evaluate()
  File "biotite/application/localapp.py", line 236, in evaluate
    raise SubprocessError(
subprocess.SubprocessError: 'mkdssp' returned with exit code 1: DSSP could not be created due to an error: bad lexical cast: source type value could not be interpreted as target
```
@padix-key
Member

padix-key commented Dec 9, 2020

Thank you for the report. I edited your code snippet: I think you meant tk_dimer instead of filtered_amino, as otherwise I get a NameError. After the correction I was able to reproduce the reported error.

@saiden89
Author

saiden89 commented Dec 9, 2020

Yes, sorry. I copied it from your example page "Four ways to get the secondary structure of a protein" but used another PDB entry instead.

@padix-key
Member

padix-key commented Dec 9, 2020

I think the problem is that mkdssp takes PDB files as input, and the PDB format does not support multi-character chain IDs. However, according to the program's help message, mkdssp also seems to support .cif files. So I think there are two issues to solve:

  • Raise an error when a PDB file is created with a multi-character chain ID
  • Use mmCIF files for mkdssp input

However, I just tried passing bb_chain to mkdssp as an mmCIF file, and the program returned the same error, even after changing the chain ID from BB to B. Hence, I think mkdssp requires more information from the mmCIF file than just the atom_site category.
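The first of the two points listed above (raising an error for multi-character chain IDs in PDB output) could be covered by a check along these lines. This is a minimal, library-independent sketch: `check_pdb_chain_ids` is a hypothetical helper, not a Biotite function. It relies only on the fact that the PDB format reserves a single column (column 22 of ATOM/HETATM records) for the chain identifier.

```python
from typing import Iterable


def check_pdb_chain_ids(chain_ids: Iterable[str]) -> None:
    """Raise ValueError if any chain ID does not fit the PDB format.

    Hypothetical helper: the PDB format reserves exactly one column
    (column 22 of ATOM/HETATM records) for the chain identifier.
    """
    # Collect each offending chain ID once, in a stable order
    too_long = sorted({cid for cid in chain_ids if len(cid) > 1})
    if too_long:
        raise ValueError(
            "Chain IDs with more than one character cannot be written "
            f"to a PDB file: {', '.join(too_long)}"
        )


# Usage: single-character IDs pass, "BB" (as in 4YBB) is rejected
check_pdb_chain_ids(["A", "B"])
try:
    check_pdb_chain_ids(["A", "BB"])
except ValueError as err:
    print(err)
```

Failing early with a descriptive message would replace the opaque "bad lexical cast" error that currently surfaces from mkdssp.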

@saiden89
Author

saiden89 commented Dec 9, 2020

I did some testing, and it seems that mkdssp v3.0.0 is able to process multi-character chain IDs with just the atom_site category, provided the file extension is .cif. If the extension is, for example, .pdbx, mkdssp refuses to work and gives the same error.
My file 4YBB.cif:

```
data_4YBB
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM   84609  N  N     . VAL W   2  1    ? -21.193  87.636   15.757   1.00 69.69  ?  4    VAL BB N     1
...
ATOM   86361  N  NE2   . GLN W   2  224  ? -30.148  73.033   12.778   1.00 79.30  ?  227  GLN BB NE2   1
#
```

This file produces the correct output. The same content in a file named 4YBB.pdbx gives:

```
bad lexical cast: source type value could not be interpreted as target
```
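Given the observation above that mkdssp v3.0.0 accepts an atom_site-only file as long as its extension is .cif, a wrapper could write its temporary input file with that suffix. The following is a hypothetical sketch: `write_atom_site_cif` and `mkdssp_command` are illustrative names, not Biotite functions. It only builds the command line instead of running mkdssp, and the `-i`/`-o` options are assumed to match the mkdssp 3.x command line interface.

```python
import tempfile
from pathlib import Path

# Column order taken from the 4YBB.cif example in this thread
ATOM_SITE_COLUMNS = [
    "group_PDB", "id", "type_symbol", "label_atom_id", "label_alt_id",
    "label_comp_id", "label_asym_id", "label_entity_id", "label_seq_id",
    "pdbx_PDB_ins_code", "Cartn_x", "Cartn_y", "Cartn_z", "occupancy",
    "B_iso_or_equiv", "pdbx_formal_charge", "auth_seq_id", "auth_comp_id",
    "auth_asym_id", "auth_atom_id", "pdbx_PDB_model_num",
]


def write_atom_site_cif(rows, block_name, directory):
    """Write an atom_site-only mmCIF file; each row is a list of 21 strings.

    The file always gets a .cif suffix, since mkdssp rejected the
    same content when the extension was .pdbx.
    """
    lines = [f"data_{block_name}", "#", "loop_"]
    lines += [f"_atom_site.{col}" for col in ATOM_SITE_COLUMNS]
    for row in rows:
        if len(row) != len(ATOM_SITE_COLUMNS):
            raise ValueError("Each row needs one value per atom_site column")
        lines.append(" ".join(row))
    lines.append("#")
    path = Path(directory) / f"{block_name}.cif"
    path.write_text("\n".join(lines) + "\n")
    return path


def mkdssp_command(in_path, out_path):
    """Build (but do not run) an mkdssp call; -i/-o assumed from mkdssp 3.x."""
    if Path(in_path).suffix != ".cif":
        raise ValueError("mkdssp input should have a .cif extension")
    return ["mkdssp", "-i", str(in_path), "-o", str(out_path)]


# Usage with the first atom of chain BB from the example file
row = ["ATOM", "84609", "N", "N", ".", "VAL", "W", "2", "1", "?",
       "-21.193", "87.636", "15.757", "1.00", "69.69", "?", "4",
       "VAL", "BB", "N", "1"]
with tempfile.TemporaryDirectory() as tmp:
    cif_path = write_atom_site_cif([row], "4YBB", tmp)
    print(mkdssp_command(cif_path, Path(tmp) / "4YBB.dssp"))
```

The returned list could then be passed to `subprocess.run()`; checking the suffix up front turns the extension pitfall into an explicit error.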

@padix-key
Member

I identified the problem with my test: mkdssp seems to also require the occupancy, pdbx_formal_charge and B_iso_or_equiv fields. In the upcoming PR, I simply set default values for these fields if the input AtomArray lacks these annotations.
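A minimal sketch of such a fallback follows. It assumes the atom_site columns are held in a plain dictionary of per-atom lists (a stand-in for the AtomArray annotations), and it assumes 1.0, 0.0 and 0 as default occupancy, B-factor and formal charge. The values actually chosen in the PR are not stated in this thread, and `fill_required_fields` is a hypothetical name.

```python
# Hypothetical defaults; the thread does not state the values used in the PR
DSSP_REQUIRED_DEFAULTS = {
    "occupancy": 1.0,
    "B_iso_or_equiv": 0.0,
    "pdbx_formal_charge": 0,
}


def fill_required_fields(atom_site, n_atoms):
    """Return a copy of atom_site with missing mkdssp fields filled in.

    atom_site maps mmCIF column names to lists with one value per atom.
    Columns that are already present are kept unchanged.
    """
    filled = dict(atom_site)
    for column, default in DSSP_REQUIRED_DEFAULTS.items():
        if column not in filled:
            filled[column] = [default] * n_atoms
    return filled


# Usage: a two-atom structure without occupancy, B-factor or charge data
atom_site = {"auth_asym_id": ["BB", "BB"], "auth_atom_id": ["N", "CA"]}
completed = fill_required_fields(atom_site, n_atoms=2)
print(completed["occupancy"], completed["B_iso_or_equiv"])
```

Only missing columns are filled, so real experimental values are never overwritten.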

@padix-key
Member

This issue should be fixed now.

@saiden89
Author

Confirmed. Thank you very much!
