
Getting "ValueError: Cannot create a tensor proto whose content is larger than 2GB." for smaller sequences #71

Closed
DavidB256 opened this issue Jul 29, 2021 · 26 comments
Labels: error report (Something isn't working)

Comments

@DavidB256

Hello! I'm getting the error "ValueError: Cannot create a tensor proto whose content is larger than 2GB." when running AlphaFold jobs for proteins longer than ~650 residues. I have tried using --max_template_date=1900-01-01 in order to limit the MSA size, but this has not helped. I am using 1 GPU, 8 CPU cores, and 150 GB of memory on my university's supercomputer. I am not interested in using --preset=reduced_dbs. Thanks!!

@abridgland
Contributor

Hi, would you mind sharing your input sequence and/or the MSA size? We wouldn’t expect to see this error from inputs of that length but there have been a few other reports like this, so we would like to investigate it further.

@DavidB256
Author

> Hi, would you mind sharing your input sequence and/or the MSA size? We wouldn't expect to see this error from inputs of that length but there have been a few other reports like this, so we would like to investigate it further.

Here is a link to a Google Drive folder containing the .fasta file and .a3m file for my AlphaFold job: https://drive.google.com/drive/folders/1bGS80lNFlo9xUDXQs71K5im_1ZihBPut?usp=sharing

If there are any more files or contextual information that I should provide, please let me know! Thanks so much for taking a look at my issue.

@ShuminBAL

same issue here.

@biohegedus

Hi,

I have the same issue:

sp|Q9H222|ABCG5_HUMAN, as a control run for my install; this protein is in the AlphaFold DB.
Approx. 650 a.a., --preset=reduced_dbs

I0803 14:46:22.664624 140420767680320 run_docker.py:193] I0803 12:46:22.663788 139992798213952 run_alphafold.py:142] Running model model_1
...
I0803 14:46:23.571433 140420767680320 run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 528, in make_tensor_proto
I0803 14:46:23.571529 140420767680320 run_docker.py:193] "Cannot create a tensor proto whose content is larger than 2GB.")
I0803 14:46:23.571625 140420767680320 run_docker.py:193] ValueError: Cannot create a tensor proto whose content is larger than 2GB.

If I run with a shorter segment, ~280 a.a., I get a nice result.

@biohegedus

It seems that the protein length was not my problem. I obtained my shorter segment by removing a domain that is highly conserved. The alignment and feature files that included the highly conserved domain were extremely large (over 5 GB). However, I do not understand why this happens, and it did not happen when DeepMind generated the structure for this protein.

@chrisroat

Would it be recommended (or not recommended) to tweak either of these parameters to deal with this error? (I am facing it with 1200 a.a. sequences)

mgnify_max_hits: int = 501
uniref_max_hits: int = 10000
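
For context, lowering these caps means passing smaller values when the data pipeline is built. Below is a minimal sketch (not from this thread), assuming the DataPipeline constructor in alphafold/data/pipeline.py accepts mgnify_max_hits and uniref_max_hits as keyword arguments with the defaults quoted above; the hypothetical other_pipeline_kwargs stands in for the tool-binary and database-path arguments normally supplied by run_alphafold.py. Note that, as the later comments show, uniref_max_hits was not actually applied inside pipeline.py at the time, so lowering it only takes effect together with the patch discussed below.

# Sketch only, not AlphaFold's documented API: assumes DataPipeline exposes
# the mgnify_max_hits / uniref_max_hits keyword arguments quoted above.
# `other_pipeline_kwargs` is a hypothetical placeholder for the binary and
# database path arguments normally passed in run_alphafold.py.
from alphafold.data import pipeline

data_pipeline = pipeline.DataPipeline(
    mgnify_max_hits=200,     # default 501
    uniref_max_hits=2000,    # default 10000
    **other_pipeline_kwargs,
)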

@abridgland added the "error report" label on Sep 1, 2021
@arashnh11

Following

@arashnh11

@abridgland A similar error here; I don't think it's the residue count. I successfully folded a giant protein of 4,800 residues without an issue, but I am now getting the error on a shorter chain of 1,480 amino acids.
Maybe it is related to the graph-proto definition for the tensor?
https://stackoverflow.com/questions/51470991/create-a-tensor-proto-whose-content-is-larger-than-2gb
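
For reference, the 2 GB limit discussed in that StackOverflow question comes from TensorFlow 1.x serializing constants directly into the GraphDef protobuf, which is capped at 2 GB. A minimal standalone illustration (not AlphaFold code) of the failure and the usual feed-at-runtime workaround, assuming a machine with a few GB of free RAM:

# Standalone illustration of the TF1 graph-proto limit; not AlphaFold code.
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

big = np.zeros(600_000_000, dtype=np.float32)  # ~2.4 GB of raw data

# tf.constant(big) would embed the array in the graph proto and raise
# "ValueError: Cannot create a tensor proto whose content is larger than 2GB."

# Feeding the array at run time keeps it out of the serialized graph.
ph = tf.placeholder(tf.float32, shape=big.shape)
total = tf.reduce_sum(ph)
with tf.Session() as sess:
    print(sess.run(total, feed_dict={ph: big}))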

@mbassalbioinformatics

Same issue here for a protein with 1070 AA:

MSRRKQAKPQHINSEEDQGEQQPQQQTPEFADAAPAAPAAGELGAPVNHPGNDEVASEDE
ATVKRLRREETHVCEKCCAEFFSISEFLEHKKNCTKNPPVLIMNDSEGPVPSEDFSGAVL
SHQPTSPGSKDCHRENGGSSEDMKEKPDAESVVYLKTETALPPTPQDISYLAKGKVANTN
VTLQALRGTKVAVNQRSADALPAPVPGANSIPWVLEQILCLQQQQLQQIQLTEQIRIQVN
MWASHALHSSGAGADTLKTLGSHMSQQVSAAVALLSQKAGSQGLSLDALKQAKLPHANIP
SATSSLSPGLAPFTLKPDGTRVLPNVMSRLPSALLPQAPGSVLFQSPFSTVALDTSKKGK
GKPPNISAVDVKPKDEAALYKHKCKYCSKVFGTDSSLQIHLRSHTGERPFVCSVCGHRFT
TKGNLKVHFHRHPQVKANPQLFAEFQDKVAAGNGIPYALSVPDPIDEPSLSLDSKPVLVT
TSVGLPQNLSSGTNPKDLTGGSLPGDLQPGPSPESEGGPTLPGVGPNYNSPRAGGFQGSG
TPEPGSETLKLQQLVENIDKATTDPNECLICHRVLSCQSSLKMHYRTHTGERPFQCKICG
RAFSTKGNLKTHLGVHRTNTSIKTQHSCPICQKKFTNAVMLQQHIRMHMGGQIPNTPLPE
NPCDFTGSEPMTVGENGSTGAICHDDVIESIDVEEVSSQEAPSSSSKVPTPLPSIHSASP
TLGFAMMASLDAPGKVGPAPFNLQRQGSRENGSVESDGLTNDSSSLMGDQEYQSRSPDIL
ETTSFQALSPANSQAESIKSKSPDAGSKAESSENSRTEMEGRSSLPSTFIRAPPTYVKVE
VPGTFVGPSTLSPGMTPLLAAQPRRQAKQHGCTRCGKNFSSASALQIHERTHTGEKPFVC
NICGRAFTTKGNLKVHYMTHGANNNSARRGRKLAIENTMALLGTDGKRVSEIFPKEILAP
SVNVDPVVWNQYTSMLNGGLAVKTNEISVIQSGGVPTLPVSLGATSVVNNATVSKMDGSQ
SGISADVEKPSATDGVPKHQFPHFLEENKIAVS

Any advice on how to fix this issue would be much appreciated. For now I'm running AlphaFold through the run_docker.py script.

@biohegedus

biohegedus commented Sep 25, 2021

I think that I solved this issue last week.
Based on @chrisroat's question, I tried to play with uniref_max_hits. However, it did not have any effect. I realized that this option is not actually applied in the pipeline.py script. I believe two lines are missing from the script:

  • uniref90_msa = uniref90_msa[:self.uniref_max_hits] # hege
  • uniref90_deletion_matrix = uniref90_deletion_matrix[:self.uniref_max_hits] # hege

Download the patch: http://alphafold.hegelab.org/pipeline.patch
and apply it as: patch alphafold/data/pipeline.py pipeline.patch

Then it should work with the default values.
You will probably need to rebuild the Docker image (I use AF2 outside Docker).

@mbassalbioinformatics

OK, for me this patch seems to have fixed the issue, and I've been able to generate a prediction for my protein. Thanks @biohegedus for the support!!

@chrisroat

@biohegedus - Would you be able to create a fork with a branch containing your patch, and then create a PR?

I know that DeepMind doesn't accept patches, preferring to push from their internal piper repo. But I think that if you generate a PR, it is a more efficient way to make your code available to others (and to DeepMind, should they choose to internalize it). After all, git is all about sharing & open source!

@biohegedus

I thought about forking, since I have corrected some other issues.
However, I am new to git. What do you suggest?

I started to use and modify a non-docker fork, since it is easier to test and modify.
So should I create a fork of that, make changes, and make a pull request to that?
Or should I create two forks?

Thanks.

@hegelab

hegelab commented Oct 7, 2021

My last comment is obsolete - I figured out how to do the forking...

@hegelab

hegelab commented Oct 7, 2021

Soon you will find the incorporated patches here - https://github.com/hegelab/alphafold

@chrisroat

Great. Looking forward to it. Feel free to reach out if you need some help. Something like this might work, if you are operating in your fork:

git checkout -b fix_max_hits  # create new branch
<apply patch>
git add alphafold/data/pipeline.py  # Stage file for commit
git commit -m "Use uniref_max_hits in pipeline.py"  # Commit
git push --set-upstream origin fix_max_hits  # Push new branch and its commit to github

Then you can create a pull request from the branch you created.

Once it's available, we can get some feedback from deepmind, hopefully.

@ekiefl

ekiefl commented Oct 25, 2021

For anyone who does not want to apply a patch without looking at its contents, this is @biohegedus's patch:

--- alphafold/data/pipeline.py	2021-09-25 19:18:02.756831226 +0200
+++ pipeline-ht.py	2021-09-25 19:25:28.710044994 +0200
@@ -158,6 +158,8 @@
     hhsearch_hits = parsers.parse_hhr(hhsearch_result)
     mgnify_msa = mgnify_msa[:self.mgnify_max_hits]
     mgnify_deletion_matrix = mgnify_deletion_matrix[:self.mgnify_max_hits]
+    uniref90_msa = uniref90_msa[:self.uniref_max_hits]  # hege
+    uniref90_deletion_matrix = uniref90_deletion_matrix[:self.uniref_max_hits] # hege
 
     if self._use_small_bfd:
       jackhmmer_small_bfd_result = self.jackhmmer_small_bfd_runner.query(

I will edit this post once I've tested the patch and confirmed that it works.

@hegelab

hegelab commented Oct 25, 2021

Hi,

I suggest using this fork instead of patching:
https://github.com/hegelab/alphafold

@chrisroat

@hegelab - there are 5 commits on your main branch, one of which even creates a log file at a hardcoded path that won't exist on most machines. This goes way beyond the 2-line patch being advocated above.

I'd suggest creating a separate branch with just the changes you think are important (without renaming/moving to .orig), and creating a PR. That way, DeepMind can incorporate or comment on your changes.

@ekiefl

ekiefl commented Oct 25, 2021

> Hi,
>
> I suggest using this fork instead of patching:
> https://github.com/hegelab/alphafold

It seems you are making substantial changes in your fork with no interest in merging them into the official community project. Why not cast your changes into a PR so the official community can benefit from them?

While your changes seem great, you're quickly diverging from the official project, which is extremely undesirable for people like me, who intend to follow the development path of deepmind's alphafold rather than unofficial offshoots. Except in circumstances such as discontinued development by deepmind, I don't see the disadvantage in contributing to this community project, rather than creating your own fork.

For these reasons, I have no interest in using your fork; however, thank you for the patch. I would recommend you create a PR with all of the other beneficial features you've added to alphafold, rather than going rogue unnecessarily.

@hegelab

hegelab commented Oct 25, 2021

@chrisroat

  1. I had edited that file on GitHub and removed the hard-coded log file, so I think it is no longer there. (Writing to a log file used for debugging caused a file permission issue…)
  2. I still do not understand the forking concept well. It was difficult to realize that PR means pull request... So I have to create a pull request to push it back to the official repository... I will try.

@ekiefl Yes, I made substantial changes, since the "2GB memory problem" is just one manifestation of file reading/memory issues. Even if you correct this, you can still run into out-of-system-memory errors. So I think you want the scripts with the substantial changes.

@ekiefl

ekiefl commented Oct 25, 2021

> @ekiefl Yes, I made substantial changes, since the "2GB memory problem" is just one manifestation of file reading/memory issues. Even if you correct this, you can still run into out-of-system-memory errors. So I think you want the scripts with the substantial changes.

Good to know @hegelab.

> I still do not understand the forking concept well. It was difficult to realize that PR means pull request... So I have to create a pull request to push it back to the official repository... I will try.

I would recommend submitting several PRs, with as few changes as possible in each. Each PR should accomplish exactly one thing. For example, this 2 line patch is the perfect size for a PR, and could be called "Enforce max size for uniref90 MSA and deletion matrix". This way deepmind may scrutinize, discuss, and test it in detail before merging it into the main repository. If you submit all of the changes in a single PR, it will likely never be merged because there are too many moving parts for deepmind to dissect.

Please reach out if you want my help with the mechanics of this process.

@chrisroat

I added a PR for the two lines change to use the uniref_max_hits. The basic steps:

  • Make a fork and clone it (You have already done this, I think)

  • Create a branch, commit your change, and push it to your fork

git checkout -b uniref_max_hits
# make changes
git commit -a -m "Use uniref_max_hits to limit sequence/matrix size"
git push --set-upstream origin uniref_max_hits
  • Now you can create a PR. Go to your fork in the github UI. If you go immediately, you will see a “xxx had recent pushes xx minutes ago” with a “Compare & pull request” button. Click the button. Add any additional information. Click “Create pull request”.

Note that I referenced this issue in the PR description. This will tie this Issue conversation to the PR by putting links in both places.

@hegelab

hegelab commented Oct 25, 2021

@ekiefl and @chrisroat Thanks, I am starting to understand the idea of all this - now I feel that I am not a professional programmer... I will create a clean fork, introduce my substantial changes for avoiding system memory errors, create a branch, and create a pull request. Tomorrow or Wednesday...

@hegelab

hegelab commented Oct 29, 2021

@chrisroat Thanks again. I cleaned up my alphafold fork and introduced only those changes that decrease memory usage in the case of large jackhmmer outputs and submitted a pull request.

@Augustin-Zidek
Collaborator

Addressed in 0be2b30.
