
Custom template results in huge difference with alphafold #95

Closed · empyriumz opened this issue Apr 12, 2022 · 11 comments

empyriumz commented Apr 12, 2022

Hi there,

Thanks a lot for your effort in implementing a trainable AlphaFold in PyTorch.

I came across an interesting paper claiming that templates built from experimental cryo-EM density maps can improve AlphaFold's accuracy.

The authors provide a Colab notebook here. I tried the notebook, and it worked as intended.

As an example, the PDB entry 7KU7:
Input fasta sequence: PLREAKDLHTALHIGPRALSKACNISMQQAREVVQTCPHCNSAPALEAGVNPRGLGPLQIWQTDFTLEPRMAPRSWLAVTVDTASSAIVVTQHGRVTSVAVQHHWATAIAVLGRPKAIKTDNGSCFTSKSTREWLARWGIAHTTGIPGNSQGQAMVERANRLLKDKIRVLAEGDGFMKRIPTSKQGELLAKAMYALNHFERGENTKTPIQKHWRPTVLTEGPPVKIRIETGEWEKGWNVLVWGRGYAAVKNRDTDKVIWVPSRKVKPDITQKDEVTKK

I supplied a custom template in CIF format:
https://drive.google.com/file/d/1DUN793nHr0aRRSp29_FwgTGUREwTHcfp/view?usp=sharing

By using this template and turning off the MSA (skip_all_msa == True, equivalent to using a dummy MSA), the mean pLDDT score is about 90, which is higher than with an MSA but no custom template.
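For context, the "dummy MSA" is just the query sequence standing in for a real alignment. Below is a minimal sketch of what such a feature could look like, assuming AlphaFold-style feature names ("msa", "deletion_matrix_int", "num_alignments") and HHblits amino-acid IDs; it is an illustration, not the notebook's actual code.

import numpy as np

# Assumption: AlphaFold-style MSA feature names and HHblits residue IDs
# (A=0, C=1, ..., Y=19, in alphabetical order of the one-letter codes).
HHBLITS_AA_TO_ID = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}

def make_dummy_msa_features(sequence: str) -> dict:
    """Build MSA features containing only the query sequence itself."""
    msa = np.array([[HHBLITS_AA_TO_ID[res] for res in sequence]], dtype=np.int32)
    return {
        "msa": msa,                                 # shape (1, L): the query alone
        "deletion_matrix_int": np.zeros_like(msa),  # no deletions anywhere
        "num_alignments": np.full(len(sequence), 1, dtype=np.int32),
    }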


When I tried to replicate the above procedure in OpenFold, however, the template didn't seem to help: the mean pLDDT score was below 40 for models 1 through 5.

To quickly reproduce the results:

  1. I create an empty directory and pass it as --use_precomputed_alignments, which leads the data pipeline to use a dummy MSA and an empty template set.

  2. Then I load the template features generated in the Colab notebook, template_feature_7ku7.pkl (https://drive.google.com/file/d/1pnZ8pwQZTgcOsHTikQ6X7PQ1bqQs3tqt/view?usp=sharing):

import pickle

# Load the template features exported from the Colab notebook.
with open("template_feature_7ku7.pkl", "rb") as f:
    template_feature = pickle.load(f)

# Merge them into the input feature dict built by the data pipeline.
feature_dict = {**feature_dict, **template_feature}

The rest of the code is left intact.
So, could you help me check whether there is anything wrong with my approach, or whether this is a bug in the template-related code within OpenFold?
Thank you very much.

gahdritz (Collaborator) commented Apr 12, 2022

Could you share your complete OpenFold script? Where exactly are you inserting this snippet? Additionally, are you using this Colab's version of AlphaFold to get the high pLDDT scores, or a similar hack in the official DeepMind version?

empyriumz (Author) commented

Sure, you can find it here:
https://gist.github.com/empyriumz/4fddf49c3bf09ef7c0a11b9e2d453189
Two major differences:

  • template_featurizer = None (line 60), then the template features are loaded manually (lines 119-121); see the sketch after this list.

  • I create an empty folder ./alignments/7KU7 and run the script with --use_precomputed_alignments ./alignments.

For your reference, the input FASTA is

>7KU7
PLREAKDLHTALHIGPRALSKACNISMQQAREVVQTCPHCNSAPALEAGVNPRGLGPLQIWQTDFTLEPRMAPRSWLAVTVDTASSAIVVTQHGRVTSVAVQHHWATAIAVLGRPKAIKTDNGSCFTSKSTREWLARWGIAHTTGIPGNSQGQAMVERANRLLKDKIRVLAEGDGFMKRIPTSKQGELLAKAMYALNHFERGENTKTPIQKHWRPTVLTEGPPVKIRIETGEWEKGWNVLVWGRGYAAVKNRDTDKVIWVPSRKVKPDITQKDEVTKK
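A minimal sketch of those two changes (variable names are assumed from this thread, not copied verbatim from the gist):

import pickle

# (1) Disable OpenFold's built-in template search so the pipeline
#     produces no template features of its own.
template_featurizer = None

# (2) After the data pipeline builds feature_dict for the query, splice
#     in the template features exported from the Colab notebook.
with open("template_feature_7ku7.pkl", "rb") as f:
    template_feature = pickle.load(f)
feature_dict = {**feature_dict, **template_feature}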

gahdritz (Collaborator) commented Apr 12, 2022

Great, thanks --- I was editing my question to add a second one just as you responded. Did you use the official DeepMind AlphaFold to get those high pLDDT values, or this third-party Colab?

empyriumz (Author) commented

Oh, I used this Colab notebook:
https://colab.research.google.com/github/phenix-project/Colabs/blob/main/alphafold2/AlphaFold2.ipynb
which says:

This notebook is derived from ColabFold and the DeepMind AlphaFold2 Colab.

I'm not sure how it differs from the official DeepMind AlphaFold. At least the template processing part is a bit different, which is why I dumped their processed template features and used them in OpenFold.

gahdritz (Collaborator) commented Apr 12, 2022

If you have time, could you try the same hack in the official DeepMind Colab? I'd do it myself, but I'm rate-limited on Colab at the moment. You should be able to insert the template features between the MSA generation pane and the model inference pane.

empyriumz (Author) commented

No problem, I'll see how the DeepMind Colab performs in this case.

empyriumz (Author) commented

Hi @gahdritz,
I just finished the experiment on the official AlphaFold Colab notebook by manually uploading the full input feature dictionary (dummy MSA + custom template).

The mean pLDDT was above 90 for the first two models; then I was cut off by the time limit.

gahdritz (Collaborator) commented

OK, I'll investigate this further.

empyriumz (Author) commented

Thank you! Let me know if you need more information.

gahdritz (Collaborator) commented Apr 19, 2022

Thanks for bringing this to our attention---this one was a doozy. Several issues were ultimately responsible for the discrepancy. First, there were a couple of bugs in the OpenFold template processing pipeline, which I've fixed in 591d10d. Second, OpenFold and AlphaFold differ slightly in the naming of the template atom mask feature: AlphaFold calls it template_all_atom_masks, while OpenFold drops the "s" and calls it template_all_atom_mask. As bad luck would have it, the pickled template feature dictionary you sent contains one of each, which obscured the problem.

After pulling the latest OpenFold commit and running:

template_feature["template_all_atom_mask"] = template_feature["template_all_atom_masks"]

on the unpickled template feature dict you sent earlier, repeating your experiment on the same protein gives an average pLDDT of ~91.21, almost identical to AlphaFold's.
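For anyone mixing AlphaFold- and OpenFold-generated features, a slightly more defensive version of that shim may help; the two key names are the ones discussed above, but the helper itself is only a sketch:

def normalize_template_mask_key(features: dict) -> dict:
    """Ensure the OpenFold-style template mask key is present.

    AlphaFold emits "template_all_atom_masks" (with a trailing "s"),
    while OpenFold expects "template_all_atom_mask". If the AlphaFold
    spelling is present, copy its values into the OpenFold key,
    matching the fix that resolved this issue.
    """
    if "template_all_atom_masks" in features:
        features["template_all_atom_mask"] = features["template_all_atom_masks"]
    return features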

empyriumz (Author) commented

Thank you so much, @gahdritz!
I'll pull the new version and test it myself.
