
Protein Embedding with last activation layers? #15

Closed
victormaricato opened this issue Jul 16, 2021 · 9 comments

@victormaricato

Is it possible to obtain the last activation values using AlphaFold?

Something like what ESM allows with its model.forward method.

@ptynecki

ptynecki commented Jul 19, 2021

Let me make the question more precise:

How can we execute the AF2 pipeline to get a fixed-length numeric vector that represents a single AA sequence?
If that is possible, should we expect the AA sequence length to be capped at 512, 1280, or some other limit?

@russbates

russbates commented Jul 20, 2021

Hi,
Although the ability to return the final representations/embeddings is not currently exposed in the RunModel container, it should be possible to enable it by adding a return_representations=True keyword argument here:
https://github.com/deepmind/alphafold/blob/d26287ea57e1c5a71372f42bf16f486bb9203068/alphafold/model/model.py#L64
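
For readers following along, a minimal sketch of what that edit might look like, based on how the AlphaFold module is invoked inside RunModel in model.py (treat this as an outline rather than a verified patch; the exact line varies between commits):

# alphafold/model/model.py -- sketch of the suggested edit;
# `modules` here is alphafold.model.modules.
def _forward_fn(batch):
    model = modules.AlphaFold(self.config.model)
    return model(
        batch,
        is_training=False,
        compute_loss=False,
        ensemble_representations=True,
        return_representations=True)  # the suggested keyword argument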

@xinformatics

I didn't run the actual model, but I was using the Jupyter notebook provided by @sokrypton. He suggested editing the class AlphaFold (located in alphafold/model/modules.py) and setting return_representations=True.

In the Jupyter notebook he provided,

prediction_result = model_runner.predict(processed_feature_dict)

returns prediction_result as a dictionary that includes a 'representations' key:

prediction_result.keys()
dict_keys(['distogram', 'experimentally_resolved', 'masked_msa', 'predicted_lddt', 'representations', 'structure_module', 'plddt'])

That key holds a nested dictionary:

prediction_result['representations'].keys()
dict_keys(['msa', 'msa_first_row', 'pair', 'single', 'structure_module'])

It contains the learned representations, although I am not sure which one to use. Hope it helps.
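
To make that concrete, here is a minimal sketch of inspecting those arrays (assuming model_runner and processed_feature_dict are set up as in the notebook; L is the sequence length and the channel sizes depend on the model config):

import numpy as np

# With return_representations=True, the prediction dict also carries the
# intermediate embeddings under the 'representations' key.
prediction_result = model_runner.predict(processed_feature_dict)
reps = prediction_result['representations']

single = np.asarray(reps['single'])  # per-residue embedding, shape (L, C)
pair = np.asarray(reps['pair'])      # pairwise embedding, shape (L, L, C_pair)

# Print every available representation together with its shape.
print({key: np.asarray(value).shape for key, value in reps.items()})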

@tfgg tfgg closed this as completed Jul 21, 2021
@xinformatics

@tfgg Could you suggest which representation would be a good choice as a protein embedding for downstream tasks, given that I get five different representations from the prediction result?

@ptynecki

@tfgg
Is there any reason why this thread was closed? @xinformatics shared some tips, but the main questions still haven't been answered.

Thank you for considering this.

@ricomnl

ricomnl commented Jul 26, 2021

@xinformatics The first section of the article The AlphaFold2 Method Paper: A Fount of Good Ideas suggests that s_i is the embedding you want to use. It corresponds to the 'single' key in the prediction_result['representations'] dict.

At every step of the process, {s_i} is kept updated, communicating back and forth with {z_{ij}}, so that whatever is built up in {z_{ij}} is made accessible to {s_i}. As a result {s_i} is front and center in all the major modules. And at the end, in the structure module, it is ultimately {s_i}, not {z_{ij}}, that encodes the structure (where the quaternions get extracted to generate the structure). This avoids the awkwardness of having to project the 2D representation onto 3D space.
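
Building on that, one way to get the fixed-length vector asked about above is to mean-pool the per-residue 'single' representation. A sketch (the pooling choice is an assumption, not something AlphaFold prescribes):

import numpy as np

# 'single' is per-residue, shape (L, C); averaging over the residue axis
# yields a fixed-length vector regardless of sequence length L.
single = np.asarray(prediction_result['representations']['single'])
embedding = single.mean(axis=0)  # shape (C,)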

@xinformatics

@rmeinl Thank you so much. I was thinking along similar lines. Actually, the problem in my case is that I only need the representations (not the final PDB output), and I haven't figured out how to run AF2 prediction in a loop. I have 964 sequences, and I want to avoid running AF2 manually on each one. The embedding extraction code is available in my Alphafold repository on GitHub.
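
A rough sketch of such a loop (sequences and make_features are hypothetical placeholders for your own I/O and feature-generation steps; process_features and predict are the RunModel methods used in the notebook):

import numpy as np

embeddings = {}
for seq_id, sequence in sequences.items():
    raw_features = make_features(sequence)  # placeholder: your feature pipeline
    processed = model_runner.process_features(raw_features, random_seed=0)
    result = model_runner.predict(processed)
    # Mean-pool the per-residue 'single' representation into one vector.
    embeddings[seq_id] = np.asarray(
        result['representations']['single']).mean(axis=0)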

@ricomnl

ricomnl commented Jul 26, 2021

Ah, interesting! I'm looking at a similar task. Two things I'll look at are 1) "turning off" the recycling step (doing a single pass only) and 2) using only one of the models (instead of running all of them and selecting the best-scoring one, as in the provided AlphaFold.ipynb).

[...]
model_names = ['model_1', 'model_2', 'model_3', 'model_4', 'model_5', 'model_2_ptm']

[...]
for model_name in model_names:
   [...]

[...]
# Find the best model according to the mean pLDDT
best_model_name = max(plddts.keys(), key=lambda x: plddts[x].mean())

[...]
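
A sketch of both ideas together, using the config keys from the 2021 release (the params path is a placeholder; double-check the key names against your AlphaFold version):

from alphafold.model import config, data, model

# Use a single model and turn recycling off (one forward pass only).
cfg = config.model_config('model_1')
cfg.data.common.num_recycle = 0  # input-pipeline side
cfg.model.num_recycle = 0        # network side

params = data.get_model_haiku_params(
    model_name='model_1', data_dir='/path/to/params')  # placeholder path
model_runner = model.RunModel(cfg, params)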

@pykao

pykao commented May 10, 2022

Hi @xinformatics,

I set return_representations=True within alphafold/model/modules.py, relaunched the Docker container, and ran the same experiment again. However, the features.pkl output is still the same. Could you please point out which Jupyter notebook ColabFold uses to generate the protein embedding?

Best,
Po-Yu
