openfold/np/protein.py:to_pdb(): chain_tag sometimes not set #254

Open
flowers9 opened this issue Dec 23, 2022 · 3 comments

@flowers9

I found what appears to be a rare case (once in millions of proteins) where the loop in to_pdb() sometimes fails to set chain_tag before closing the chain, causing an error:

Traceback (most recent call last):
  File "/pscratch/sd/f/flowers/esm/scripts/esmfold_inference.py", line 186, in <module>
    pdbs = model.output_to_pdb(output)
  File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/esm/esmfold/v1/esmfold.py", line 303, in output_to_pdb
    return output_to_pdb(output)
  File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/esm/esmfold/v1/misc.py", line 115, in output_to_pdb
    pdbs.append(to_pdb(pred))
  File "/pscratch/sd/f/flowers/miniconda3/lib/python3.9/site-packages/openfold/np/protein.py", line 373, in to_pdb
    f"{chain_tag:>1}{residue_index[i]:>4}"
UnboundLocalError: local variable 'chain_tag' referenced before assignment

It's possible esmfold was passing bad parameters, but adding a check to set chain_tag to "A" if not set allowed the code to run without errors.
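For context, here is a toy Python sketch of how this class of error can arise: a name assigned only inside a loop body that may be skipped on every iteration is left unbound when it is read afterwards. The function `close_chain` and its `masks` argument are hypothetical stand-ins for illustration, not the actual `to_pdb()` code.

```python
def close_chain(masks):
    """Toy stand-in (hypothetical), not openfold's to_pdb()."""
    for mask in masks:
        if mask < 0.5:
            continue  # skipped entries never reach the assignment below
        chain_tag = "A"
    # If every entry was skipped (or masks is empty), chain_tag was never bound.
    return f"TER {chain_tag}"


print(close_chain([1.0, 0.0]))  # prints "TER A"

try:
    close_chain([0.0, 0.0])
except UnboundLocalError as exc:
    print(type(exc).__name__)  # prints "UnboundLocalError"
```

Whether the real loop hits this path for the same reason is an assumption; the sketch only shows the general mechanism.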

The protein in question was

MAPVKVFGPAKSRNVARVLVCLEEVGAEYEVVDMDLKALEHKSPEHLARNPFGQTPAFQDGDLLLFESRAISRYVLRKYKTNQVDLLREGNLKEAAMVDVWTEVDAHTYNPAISPVVYECLINPLVLGIPTNQKVVDESLEKLKKALEVYEAHLSKDKYLAGDFMSFADINHFPHTCSFMAAPHAVLFDSYPYVKAWWERLMARPSIKKLSASLAPPKA*

And the tail of the output pdb (when run with the modified code) was:

ATOM 1736 CB ALA A 219 -14.556 -18.156 -6.584 1.00 83.46 C
ATOM 1737 O ALA A 219 -16.753 -18.815 -4.504 1.00 84.66 O
TER 1738 UNK A 220
PARENT N/A
TER 1739 ALA A 1
END

@gahdritz
Collaborator

Hm peculiar. Could you share the modification you made?

@flowers9
Author

flowers9 commented Jan 29, 2023

--- protein.py.orig	2023-01-28 22:31:40.566683304 -0800
+++ protein.py	2023-01-28 22:31:23.543314000 -0800
@@ -367,8 +367,10 @@
         if(should_terminate):
             # Close the chain.
             chain_end = "TER"
+            if atom_index == 1:
+                chain_tag = "A"
             chain_termination_line = (
                 f"{chain_end:<6}{atom_index:>5}      "
                 f"{res_1to3(aatype[i]):>3} "
                 f"{chain_tag:>1}{residue_index[i]:>4}"
             )

Just to prevent chain_tag from being undefined right there. I mean, you could check for it being undefined, but it'll only happen if atom_index is 1, so.
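The alternative mentioned above, guarding against the name being undefined at all, can be sketched by binding a default before the loop. This is a minimal toy example, not a patch to `protein.py`: `close_chain_safe`, its `entries` argument, and `default_tag` are all hypothetical names.

```python
def close_chain_safe(entries, default_tag="A"):
    """Toy example: entries is an iterable of (mask, tag) pairs (hypothetical input)."""
    chain_tag = default_tag  # bound up front, so it is always defined after the loop
    for mask, tag in entries:
        if mask < 0.5:
            continue  # masked entries leave chain_tag unchanged
        chain_tag = tag
    return f"TER {chain_tag}"


print(close_chain_safe([(0.0, "B")]))  # prints "TER A"
print(close_chain_safe([(1.0, "B")]))  # prints "TER B"
```

Either approach only silences the exception; neither says anything about whether the resulting record is meaningful.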

@gahdritz
Collaborator

It still looks like it's outputting an extra TER and PARENT line. I'll look into this.
