
Getting "ValueError: Cannot create a tensor proto whose content is larger than 2GB." for smaller sequences #71

Closed
DavidB256 opened this issue Jul 29, 2021 · 26 comments
Labels: error report (Something isn't working)

Comments

@DavidB256

Hello! I'm getting the error "ValueError: Cannot create a tensor proto whose content is larger than 2GB." when running AlphaFold jobs for proteins longer than ~650 residues. I have tried using --max_template_date=1900-01-01 in order to limit the MSA size, but this has not helped. I am using 1 GPU, 8 CPU cores, and 150 GB of memory on my university's supercomputer. I am not interested in using --preset=reduced_dbs. Thanks!!

@abridgland
Contributor

Hi, would you mind sharing your input sequence and/or the MSA size? We wouldn’t expect to see this error from inputs of that length but there have been a few other reports like this, so we would like to investigate it further.

@DavidB256
Author

> Hi, would you mind sharing your input sequence and/or the MSA size? We wouldn't expect to see this error from inputs of that length but there have been a few other reports like this, so we would like to investigate it further.

Here is a link to a Google Drive folder containing the .fasta file and .a3m file for my AlphaFold job: https://drive.google.com/drive/folders/1bGS80lNFlo9xUDXQs71K5im_1ZihBPut?usp=sharing

If there are any more files or contextual information that I should provide, please let me know! Thanks so much for taking a look at my issue.

@ShuminBAL

same issue here.

@biohegedus

Hi,

I have the same issue:

sp|Q9H222|ABCG5_HUMAN, as a control run for my install; this protein is in the AlphaFold DB.
Approx. 650 a.a., --preset=reduced_dbs

I0803 14:46:22.664624 140420767680320 run_docker.py:193] I0803 12:46:22.663788 139992798213952 run_alphafold.py:142] Running model model_1
...
I0803 14:46:23.571433 140420767680320 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 528, in make_tensor_proto
I0803 14:46:23.571529 140420767680320 run_docker.py:193] "Cannot create a tensor proto whose content is larger than 2GB.")
I0803 14:46:23.571625 140420767680320 run_docker.py:193] ValueError: Cannot create a tensor proto whose content is larger than 2GB.

If I run with a shorter segment, ~280 a.a., I get a nice result.

@biohegedus

It seems that the protein length was not my problem. I obtained my shorter segment by removing a domain that is highly conserved. The alignment and feature files that included the highly conserved domain were extremely large (over 5 GB). However, I do not understand why this happens, and it did not happen when DeepMind generated the structure for this protein.

@chrisroat

Would it be recommended (or not recommended) to tweak either of these parameters to deal with this error? (I am facing it with 1200 a.a. sequences)

mgnify_max_hits: int = 501
uniref_max_hits: int = 10000
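
For context, lowering these caps means passing smaller values when the data pipeline is built. Below is a minimal sketch (not from this thread), assuming the DataPipeline constructor in alphafold/data/pipeline.py accepts mgnify_max_hits and uniref_max_hits as keyword arguments with the defaults quoted above; the hypothetical other_pipeline_kwargs stands in for the tool-binary and database-path arguments normally supplied by run_alphafold.py. Note that, as the later comments show, uniref_max_hits was not actually applied inside pipeline.py at the time, so lowering it only takes effect together with the patch discussed below.

# Sketch only, not AlphaFold's documented API: assumes DataPipeline exposes
# the mgnify_max_hits / uniref_max_hits keyword arguments quoted above.
# `other_pipeline_kwargs` is a hypothetical placeholder for the binary and
# database path arguments normally passed in run_alphafold.py.
from alphafold.data import pipeline

data_pipeline = pipeline.DataPipeline(
    mgnify_max_hits=200,     # default 501
    uniref_max_hits=2000,    # default 10000
    **other_pipeline_kwargs,
)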

@abridgland added the "error report" label on Sep 1, 2021
@arashnh11

Following

@arashnh11

@abridgland A similar error here; I don't think it's the residue count. I successfully folded a giant protein of 4,800 residues without an issue, but I am now getting the error on a shorter chain of 1,480 amino acids.
Maybe it is related to the graph-proto definition for the tensor?
https://stackoverflow.com/questions/51470991/create-a-tensor-proto-whose-content-is-larger-than-2gb
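
For reference, the 2 GB limit discussed in that StackOverflow question comes from TensorFlow 1.x serializing constants directly into the GraphDef protobuf, which is capped at 2 GB. A minimal standalone illustration (not AlphaFold code) of the failure and the usual feed-at-runtime workaround, assuming a machine with a few GB of free RAM:

# Standalone illustration of the TF1 graph-proto limit; not AlphaFold code.
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

big = np.zeros(600_000_000, dtype=np.float32)  # ~2.4 GB of raw data

# tf.constant(big) would embed the array in the graph proto and raise
# "ValueError: Cannot create a tensor proto whose content is larger than 2GB."

# Feeding the array at run time keeps it out of the serialized graph.
ph = tf.placeholder(tf.float32, shape=big.shape)
total = tf.reduce_sum(ph)
with tf.Session() as sess:
    print(sess.run(total, feed_dict={ph: big}))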

@mbassalbioinformatics

Same issue here for a protein with 1070 AA:

MSRRKQAKPQHINSEEDQGEQQPQQQTPEFADAAPAAPAAGELGAPVNHPGNDEVASEDE
ATVKRLRREETHVCEKCCAEFFSISEFLEHKKNCTKNPPVLIMNDSEGPVPSEDFSGAVL
SHQPTSPGSKDCHRENGGSSEDMKEKPDAESVVYLKTETALPPTPQDISYLAKGKVANTN
VTLQALRGTKVAVNQRSADALPAPVPGANSIPWVLEQILCLQQQQLQQIQLTEQIRIQVN
MWASHALHSSGAGADTLKTLGSHMSQQVSAAVALLSQKAGSQGLSLDALKQAKLPHANIP
SATSSLSPGLAPFTLKPDGTRVLPNVMSRLPSALLPQAPGSVLFQSPFSTVALDTSKKGK
GKPPNISAVDVKPKDEAALYKHKCKYCSKVFGTDSSLQIHLRSHTGERPFVCSVCGHRFT
TKGNLKVHFHRHPQVKANPQLFAEFQDKVAAGNGIPYALSVPDPIDEPSLSLDSKPVLVT
TSVGLPQNLSSGTNPKDLTGGSLPGDLQPGPSPESEGGPTLPGVGPNYNSPRAGGFQGSG
TPEPGSETLKLQQLVENIDKATTDPNECLICHRVLSCQSSLKMHYRTHTGERPFQCKICG
RAFSTKGNLKTHLGVHRTNTSIKTQHSCPICQKKFTNAVMLQQHIRMHMGGQIPNTPLPE
NPCDFTGSEPMTVGENGSTGAICHDDVIESIDVEEVSSQEAPSSSSKVPTPLPSIHSASP
TLGFAMMASLDAPGKVGPAPFNLQRQGSRENGSVESDGLTNDSSSLMGDQEYQSRSPDIL
ETTSFQALSPANSQAESIKSKSPDAGSKAESSENSRTEMEGRSSLPSTFIRAPPTYVKVE
VPGTFVGPSTLSPGMTPLLAAQPRRQAKQHGCTRCGKNFSSASALQIHERTHTGEKPFVC
NICGRAFTTKGNLKVHYMTHGANNNSARRGRKLAIENTMALLGTDGKRVSEIFPKEILAP
SVNVDPVVWNQYTSMLNGGLAVKTNEISVIQSGGVPTLPVSLGATSVVNNATVSKMDGSQ
SGISADVEKPSATDGVPKHQFPHFLEENKIAVS

Any advice on how to fix this issue would be much appreciated. For now I'm running AlphaFold through the run_docker.py script.

@biohegedus

biohegedus commented Sep 25, 2021

I think that I solved this issue last week.
Based on @chrisroat's question, I tried to play with uniref_max_hits. However, it did not have any effect. I realized that this option is not actually applied in the pipeline.py script. I believe two lines are missing from the script:

  • uniref90_msa = uniref90_msa[:self.uniref_max_hits] # hege
  • uniref90_deletion_matrix = uniref90_deletion_matrix[:self.uniref_max_hits] # hege

Download the patch: http://alphafold.hegelab.org/pipeline.patch
and apply it as: patch alphafold/data/pipeline.py pipeline.patch

Then it should work with the default values.
You will probably need to rebuild the Docker image (I use AF2 outside Docker).

@mbassalbioinformatics

OK, for me this patch seems to have fixed the issue, and I've been able to generate a prediction for my protein. Thanks @biohegedus for the support!!

@chrisroat

@biohegedus - Would you be able to create a fork with a branch containing your patch, and then create a PR?

I know that DeepMind doesn't accept patches, preferring to push from their internal piper repo. But I think that if you generate a PR, it is a more efficient way to make your code available to others (and to DeepMind, should they choose to internalize it). After all, git is all about sharing & open source!

@biohegedus

I thought about forking, since I have corrected some other issues.
However, I am new to git. What do you suggest?

I started to use and modify a non-docker fork, since it is easier to test and modify.
So should I create a fork of that, make changes, and make a pull request to that?
Or should I create two forks?

Thanks.

@hegelab

hegelab commented Oct 7, 2021

My last comment is obsolete - I figured out how to do the forking...

@hegelab

hegelab commented Oct 7, 2021

Soon you will find the incorporated patches here - https://github.com/hegelab/alphafold

@chrisroat

Great. Looking forward to it. Feel free to reach out if you need some help. Something like this might work, if you are operating in your fork:

git checkout -b fix_max_hits  # create new branch
<apply patch>
git add alphafold/data/pipeline.py  # Stage file for commit
git commit -m "Use uniref_max_hits in pipeline.py"  # Commit
git push --set-upstream origin fix_max_hits  # Push new branch and its commit to github

Then you can create a pull request from the branch you created.

Once it's available, we can get some feedback from deepmind, hopefully.

@ekiefl

ekiefl commented Oct 25, 2021

For anyone who does not want to apply a patch without looking at its contents, this is @biohegedus's patch:

--- alphafold/data/pipeline.py	2021-09-25 19:18:02.756831226 +0200
+++ pipeline-ht.py	2021-09-25 19:25:28.710044994 +0200
@@ -158,6 +158,8 @@
     hhsearch_hits = parsers.parse_hhr(hhsearch_result)
     mgnify_msa = mgnify_msa[:self.mgnify_max_hits]
     mgnify_deletion_matrix = mgnify_deletion_matrix[:self.mgnify_max_hits]
+    uniref90_msa = uniref90_msa[:self.uniref_max_hits]  # hege
+    uniref90_deletion_matrix = uniref90_deletion_matrix[:self.uniref_max_hits] # hege
 
     if self._use_small_bfd:
       jackhmmer_small_bfd_result = self.jackhmmer_small_bfd_runner.query(

I will edit this post once I've tested the patch and confirmed that it works.

@hegelab

hegelab commented Oct 25, 2021

Hi,

I suggest using this fork instead of patching:
https://github.com/hegelab/alphafold

@chrisroat

@hegelab - there are 5 commits on your main branch, one of which even creates a log file at a hardcoded path that won't exist on most machines. This goes way beyond the 2-line patch being advocated above.

I'd suggest creating a separate branch with just the changes you think are important (without renaming/moving to .orig), and creating a PR. That way, DeepMind can incorporate or comment on your changes.

@ekiefl

ekiefl commented Oct 25, 2021

> Hi,
>
> I suggest using this fork instead of patching:
> https://github.com/hegelab/alphafold

It seems you are making substantial changes in your fork with no interest in merging them into the official community project. Why not cast your changes into a PR so the official community can benefit from them?

While your changes seem great, you're quickly diverging from the official project, which is extremely undesirable for people like me, who intend to follow the development path of deepmind's alphafold rather than unofficial offshoots. Except in circumstances such as discontinued development by deepmind, I don't see the disadvantage in contributing to this community project, rather than creating your own fork.

For these reasons, I have no interest in using your fork; however, thank you for the patch. I would recommend you create a PR with all of the other beneficial features you've added to alphafold, rather than going rogue unnecessarily.

@hegelab

hegelab commented Oct 25, 2021

@chrisroat

  1. I had edited that file on GitHub and removed the hard-coded log file, so I think it is no longer there. (Writing to a log file used for debugging caused a file permission issue…)
  2. I still do not understand the forking concept well. It was difficult to realize that PR means pull request... So I have to create a pull request to push it back to the official repository... I will try.

@ekiefl Yes, I made substantial changes, since the "2GB memory problem" is just one manifestation of file reading/memory issues. Even if you correct this, you can still run into out-of-system-memory errors. So I think you want the scripts with the substantial changes.

@ekiefl

ekiefl commented Oct 25, 2021

> @ekiefl Yes, I made substantial changes, since the "2GB memory problem" is just one manifestation of file reading/memory issues. Even if you correct this, you can still run into out-of-system-memory errors. So I think you want the scripts with the substantial changes.

Good to know @hegelab.

> I still do not understand the forking concept well. It was difficult to realize that PR means pull request... So I have to create a pull request to push it back to the official repository... I will try.

I would recommend submitting several PRs, with as few changes as possible in each. Each PR should accomplish exactly one thing. For example, this 2 line patch is the perfect size for a PR, and could be called "Enforce max size for uniref90 MSA and deletion matrix". This way deepmind may scrutinize, discuss, and test it in detail before merging it into the main repository. If you submit all of the changes in a single PR, it will likely never be merged because there are too many moving parts for deepmind to dissect.

Please reach out if you want my help with the mechanics of this process.

@chrisroat

I added a PR for the two lines change to use the uniref_max_hits. The basic steps:

  • Make a fork and clone it (You have already done this, I think)

  • Create a branch, commit your change, and push it to your fork

git checkout -b uniref_max_hits
# make changes
git commit -a -m "Use uniref_max_hits to limit sequence/matrix size"
git push --set-upstream origin uniref_max_hits
  • Now you can create a PR. Go to your fork in the github UI. If you go immediately, you will see a “xxx had recent pushes xx minutes ago” with a “Compare & pull request” button. Click the button. Add any additional information. Click “Create pull request”.

Note that I referenced this issue in the PR description. This will tie this Issue conversation to the PR by putting links in both places.

@hegelab

hegelab commented Oct 25, 2021

@ekiefl and @chrisroat Thanks, I am starting to understand the idea of all this - now I feel that I am not a professional programmer... I will create a clean fork, introduce my substantial changes for avoiding system memory errors, create a branch, and create a pull request. Tomorrow or Wednesday...

@hegelab

hegelab commented Oct 29, 2021

@chrisroat Thanks again. I cleaned up my alphafold fork and introduced only those changes that decrease memory usage in the case of large jackhmmer outputs and submitted a pull request.

@Augustin-Zidek
Collaborator

Addressed in 0be2b30.
