PDB parsing fails if the charge is specified as "+X" instead of "X+" #421

Closed
aurelg opened this issue Sep 7, 2022 · 0 comments · Fixed by #422

Contributor

aurelg commented Sep 7, 2022

Hi!

Thanks for your great work! Here's a small bug report:

Context

I have a problem parsing PDB files that were apparently written with REFMAC 5.8.0258.

Symptom

After this file is loaded (as pdb_file), a call to .get_structure(altloc="all", extra_fields=["charge"]) fails.

That is, the following code:

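import biotite.structure.io.pdb as pdb

# Request the optional "charge" annotation; this is what triggers the failure
pdb.PDBFile.read("1atp_charged.pdb").get_structure(
    altloc="all",
    extra_fields=["charge"],
)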

leads to the following error:

Traceback (most recent call last):
  File "/YYYYYYY", line 5, in <module>
    pdb.PDBFile.read("1atp_charged.pdb").get_structure(
  File "/XXXXXX/biotite/structure/io/pdb/file.py", line 400, in get_structure
    "charge", np.where(charge == "  ", "0", charge).astype(int)
ValueError: invalid literal for int() with base 10: '2+'

Root cause analysis

The PDB format specification explicitly states: "Columns 79 - 80 indicate any charge on the atom, e.g., 2+, 1-. In most cases, these are blank."

Biotite consequently reverses the charge column before parsing in io/pdb/file.py:370:

charge_raw[i] = line[_charge][::-1]

Unfortunately, REFMAC (and possibly others?) seems to store charges as follows (notice the +2 instead of 2+, here):

HETATM 2937 MN    MN E 351      14.676   8.154  -3.730  1.00 23.67          MN+2
HETATM 2938 MN    MN E 352      13.555   6.903  -0.131  1.00 21.04          MN+2

This notation violates the PDB format specification and breaks parsing with biotite: reversing the charge column turns +2 into 2+, which cannot be converted to an integer, hence the stack trace above.
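To make the failure concrete, here is a minimal sketch that reproduces it without biotite. The helper names `charge_field` and `reverse_then_int` are hypothetical and only mimic the reverse-then-convert step described above; the record is a blank-padded stand-in, not a line from a real file.

```python
def charge_field(element_and_charge: str) -> str:
    """Return columns 79-80 of a stand-in PDB record.

    Only columns 77-80 (element symbol and charge) matter here, so the
    rest of the record is padded with blanks (an assumption for brevity).
    """
    line = "HETATM".ljust(76) + element_and_charge
    return line[78:80]  # columns 79-80 are the 0-based slice 78:80


def reverse_then_int(field: str) -> int:
    """Mimic the approach described above: blank means 0, else reverse and convert."""
    if field == "  ":
        return 0
    return int(field[::-1])


# Spec-compliant notation: "2+" is reversed to "+2", which int() accepts
print(reverse_then_int(charge_field("MN2+")))

# REFMAC-style notation: "+2" is reversed to "2+", which int() rejects
try:
    reverse_then_int(charge_field("MN+2"))
except ValueError as err:
    print(err)
```

The second call raises the same kind of `ValueError` as in the traceback, because `int()` accepts a leading sign but not a trailing one.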

Possible mitigation and corresponding motivation

Adding support for non-standard PDB files might be beyond the scope of biotite. However, in this particular case, the fix should have no side effect and would allow biotite users to load their files.

The line io/pdb/file.py:370

charge_raw[i] = line[_charge][::-1]

would have to be replaced by:

if line[_charge][0] in "+-":
    # Sign comes first (e.g. "+2"): int() can already parse this
    charge_raw[i] = line[_charge]
else:
    # Spec-compliant order (e.g. "1-"): reverse it to get "-1"
    charge_raw[i] = line[_charge][::-1]
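The same idea can be checked in isolation with a small standalone function. This is only a sketch of the proposed logic, not biotite's actual code: `parse_charge` is a hypothetical name, and it additionally treats a blank field as 0, which the existing code handles in a separate step.

```python
def parse_charge(field: str) -> int:
    """Parse a two-character PDB charge field in either sign order.

    Accepts the spec-compliant form ("2+", "1-"), the sign-first form
    seen in the REFMAC output ("+2", "-1"), and a blank field (charge 0).
    """
    field = field.strip()
    if not field:
        return 0
    if field[0] in "+-":
        # Sign-first notation: int() parses it directly
        return int(field)
    # Spec-compliant notation: move the trailing sign to the front
    return int(field[::-1])


for raw in ("2+", "1-", "+2", "-1", "  "):
    print(repr(raw), "->", parse_charge(raw))
```

Fields that contain only a sign, or anything other than a digit and a sign, still raise `ValueError` in this sketch, so genuinely malformed records are not silently accepted.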

If you agree, I can make a PR?
