
Custom template results in huge difference with alphafold #95

Closed · empyriumz opened this issue Apr 12, 2022 · 11 comments

empyriumz commented Apr 12, 2022

Hi there,

Thanks a lot for your effort in implementing a trainable AlphaFold in PyTorch.

I came across an interesting paper claiming that templates built from experimental cryo-EM density maps can improve AlphaFold's accuracy.

The authors provide a Colab notebook here. I tried the notebook, and it worked as intended.

As an example, the PDB entry 7KU7:
Input fasta sequence: PLREAKDLHTALHIGPRALSKACNISMQQAREVVQTCPHCNSAPALEAGVNPRGLGPLQIWQTDFTLEPRMAPRSWLAVTVDTASSAIVVTQHGRVTSVAVQHHWATAIAVLGRPKAIKTDNGSCFTSKSTREWLARWGIAHTTGIPGNSQGQAMVERANRLLKDKIRVLAEGDGFMKRIPTSKQGELLAKAMYALNHFERGENTKTPIQKHWRPTVLTEGPPVKIRIETGEWEKGWNVLVWGRGYAAVKNRDTDKVIWVPSRKVKPDITQKDEVTKK

I supplied a custom template in CIF format:
https://drive.google.com/file/d/1DUN793nHr0aRRSp29_FwgTGUREwTHcfp/view?usp=sharing

By using this template and turning off the MSA (skip_all_msa == True, equivalent to using a dummy MSA), the mean pLDDT score is about 90, which is higher than with an MSA but no custom template.
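For context, the "dummy MSA" is just the query sequence standing in for a real alignment. Below is a minimal sketch of what such a feature could look like, assuming AlphaFold-style feature names ("msa", "deletion_matrix_int", "num_alignments") and HHblits amino-acid IDs; it is an illustration, not the notebook's actual code.

import numpy as np

# Assumption: AlphaFold-style MSA feature names and HHblits residue IDs
# (A=0, C=1, ..., Y=19, in alphabetical order of the one-letter codes).
HHBLITS_AA_TO_ID = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}

def make_dummy_msa_features(sequence: str) -> dict:
    """Build MSA features containing only the query sequence itself."""
    msa = np.array([[HHBLITS_AA_TO_ID[res] for res in sequence]], dtype=np.int32)
    return {
        "msa": msa,                                 # shape (1, L): the query alone
        "deletion_matrix_int": np.zeros_like(msa),  # no deletions anywhere
        "num_alignments": np.full(len(sequence), 1, dtype=np.int32),
    }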


When I tried to replicate the above procedure in OpenFold, however, the template didn't seem to help: the mean pLDDT score was below 40 for models 1 through 5.

To quickly reproduce the results:

  1. I create an empty directory and pass it as --use_precomputed_alignments, which leads the data pipeline to use a dummy MSA and an empty template set.

  2. Then I load the template features generated in the Colab notebook, template_feature_7ku7.pkl (https://drive.google.com/file/d/1pnZ8pwQZTgcOsHTikQ6X7PQ1bqQs3tqt/view?usp=sharing):

import pickle

# Load the template features exported from the Colab notebook.
with open("template_feature_7ku7.pkl", "rb") as f:
    template_feature = pickle.load(f)

# Merge them into the input feature dict built by the data pipeline.
feature_dict = {**feature_dict, **template_feature}

The rest of the code is left intact.
So, could you help me check whether there is anything wrong with my approach, or whether this is a bug in the template-related code within OpenFold?
Thank you very much.

gahdritz (Collaborator) commented Apr 12, 2022

Could you share your complete OpenFold script? Where exactly are you inserting this snippet? Additionally, are you using this Colab's version of AlphaFold to get the high pLDDT scores, or a similar hack in the official DeepMind version?

empyriumz (Author) commented

Sure, you can find it here:
https://gist.github.com/empyriumz/4fddf49c3bf09ef7c0a11b9e2d453189
Two major differences:

  • template_featurizer = None (line 60), then the template features are loaded manually (lines 119-121); see the sketch after this list.

  • I create an empty folder ./alignments/7KU7 and run the script with --use_precomputed_alignments ./alignments.

For your reference, the input FASTA is

>7KU7
PLREAKDLHTALHIGPRALSKACNISMQQAREVVQTCPHCNSAPALEAGVNPRGLGPLQIWQTDFTLEPRMAPRSWLAVTVDTASSAIVVTQHGRVTSVAVQHHWATAIAVLGRPKAIKTDNGSCFTSKSTREWLARWGIAHTTGIPGNSQGQAMVERANRLLKDKIRVLAEGDGFMKRIPTSKQGELLAKAMYALNHFERGENTKTPIQKHWRPTVLTEGPPVKIRIETGEWEKGWNVLVWGRGYAAVKNRDTDKVIWVPSRKVKPDITQKDEVTKK
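A minimal sketch of those two changes (variable names are assumed from this thread, not copied verbatim from the gist):

import pickle

# (1) Disable OpenFold's built-in template search so the pipeline
#     produces no template features of its own.
template_featurizer = None

# (2) After the data pipeline builds feature_dict for the query, splice
#     in the template features exported from the Colab notebook.
with open("template_feature_7ku7.pkl", "rb") as f:
    template_feature = pickle.load(f)
feature_dict = {**feature_dict, **template_feature}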

gahdritz (Collaborator) commented Apr 12, 2022

Great, thanks --- I was editing my question to add a second one just as you responded. Did you use the official DeepMind AlphaFold to get those high pLDDT values, or this third-party Colab?

empyriumz (Author) commented

Oh, I used this Colab notebook:
https://colab.research.google.com/github/phenix-project/Colabs/blob/main/alphafold2/AlphaFold2.ipynb
which says:

This notebook is derived from ColabFold and the DeepMind AlphaFold2 Colab.

I'm not sure how it differs from the official DeepMind AlphaFold. At least the template processing part is a bit different, which is why I dumped their processed template features and used them in OpenFold.

gahdritz (Collaborator) commented Apr 12, 2022

If you have time, could you try the same hack in the official DeepMind Colab? I'd do it myself, but I'm rate-limited on Colab at the moment. You should be able to insert the template features between the MSA generation pane and the model inference pane.

empyriumz (Author) commented

No problem, I'll see how the DeepMind Colab performs in this case.

empyriumz (Author) commented

Hi @gahdritz,
I just finished the experiment on the official AlphaFold Colab notebook by manually uploading the full input feature dictionary (dummy MSA + custom template).

The mean pLDDT was above 90 for the first two models; then I was cut off by the time limit.

gahdritz (Collaborator) commented

OK, I'll investigate this further.

empyriumz (Author) commented

Thank you! Let me know if you need more information.

gahdritz (Collaborator) commented Apr 19, 2022

Thanks for bringing this to our attention---this one was a doozy. Several issues were ultimately responsible for the discrepancy. First, there were a couple of bugs in the OpenFold template processing pipeline, which I've fixed in 591d10d. Second, OpenFold and AlphaFold differ slightly in the naming of the template atom mask feature: AlphaFold calls it template_all_atom_masks, while OpenFold drops the "s" and calls it template_all_atom_mask. As bad luck would have it, the pickled template feature dictionary you sent contains one of each, which obscured the problem.

After pulling the latest OpenFold commit and running:

template_feature["template_all_atom_mask"] = template_feature["template_all_atom_masks"]

on the unpickled template feature dict you sent earlier, repeating your experiment on the same protein gives an average pLDDT of ~91.21, almost identical to AlphaFold's.
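For anyone mixing AlphaFold- and OpenFold-generated features, a slightly more defensive version of that shim may help; the two key names are the ones discussed above, but the helper itself is only a sketch:

def normalize_template_mask_key(features: dict) -> dict:
    """Ensure the OpenFold-style template mask key is present.

    AlphaFold emits "template_all_atom_masks" (with a trailing "s"),
    while OpenFold expects "template_all_atom_mask". If the AlphaFold
    spelling is present, copy its values into the OpenFold key,
    matching the fix that resolved this issue.
    """
    if "template_all_atom_masks" in features:
        features["template_all_atom_mask"] = features["template_all_atom_masks"]
    return features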

empyriumz (Author) commented

Thank you so much, @gahdritz!
I'll pull the new version and test it myself.
