Getting "ValueError: Cannot create a tensor proto whose content is larger than 2GB." for smaller sequences #71
Hello! I'm getting the error "ValueError: Cannot create a tensor proto whose content is larger than 2GB." when running AlphaFold jobs for proteins longer than ~650 residues. I have tried using `--max_template_date=1900-01-01` in order to limit MSA size, but this has not helped. I am using 1 GPU, 8 CPU cores, and 150GB of memory on my university's supercomputer. I am not interested in using `--preset=reduced_dbs`. Thanks!!

Comments
Hi, would you mind sharing your input sequence and/or the MSA size? We wouldn't expect to see this error from inputs of that length, but there have been a few other reports like this, so we would like to investigate further.
Here is a link to a Google Drive folder containing the .fasta file and .a3m file for my AlphaFold job: https://drive.google.com/drive/folders/1bGS80lNFlo9xUDXQs71K5im_1ZihBPut?usp=sharing If there are any more files or contextual information that I should provide, please let me know! Thanks so much for taking a look at my issue.
Same issue here.
Hi, I have the same issue. If I run with a shorter segment (~280 a.a.), I get a nice result.
It seems that the protein length was not actually my problem. I got my shorter segment by removing a domain which is highly conserved. The alignment and feature files for the construct with the highly conserved domain were extremely large (over 5GB). However, I do not understand why this is happening, and it did not happen when DeepMind generated the structure for this protein.
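For context, the 2GB figure is a hard limit in TensorFlow's protobuf serialization: a single constant cannot be embedded in a graph if its contents exceed 2GB, which appears to be what happens when a very large MSA is fed through the feature pipeline. A minimal sketch that reproduces the same ValueError, assuming TF1-style graph mode (eager tensors are not affected by this limit):

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # TF1-style graph mode

# ~2.4 GB of raw float32 data -- larger than the 2 GB protobuf limit.
# (Allocating this array itself needs a few GB of RAM.)
big = np.zeros((600_000_000,), dtype=np.float32)

# In graph mode this tries to embed the array in a tensor proto and raises:
# ValueError: Cannot create a tensor proto whose content is larger than 2GB.
const = tf.constant(big)
```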
Would it be recommended (or not recommended) to tweak either of these parameters to deal with this error? (I am facing it with 1200 a.a. sequences.) `mgnify_max_hits: int = 501`
Following
@abridgland A similar error here; I don't think it's the residue count. I successfully folded a giant protein of 4,800 residues without an issue, but now I am getting the error on a shorter chain of 1,480 amino acids.
Same issue here for a protein with 1070 AA. Any advice on how to fix this would be much appreciated. For now I'm running AlphaFold through the run_docker.py script.
I think that I solved this issue last week.
Download the patch from http://alphafold.hegelab.org/pipeline.patch, then with the default values it should work.
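If you have not applied a .patch file before, something along these lines should work from the root of your AlphaFold checkout (a sketch only; the backup step is just a precaution, and the invocation may need adjusting to how your tree is laid out):

```bash
# Download the patch and apply it to the data pipeline module.
wget http://alphafold.hegelab.org/pipeline.patch
# Keep a backup of the original file so the change is easy to revert.
cp alphafold/data/pipeline.py alphafold/data/pipeline.py.bak
patch alphafold/data/pipeline.py < pipeline.patch
```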
OK, for me this patch seems to have fixed the issue and I've been able to generate a prediction for my protein. Thanks @biohegedus for the support!!
@biohegedus - Would you be able to create a fork with a branch containing your patch, and then create a PR? I know that DeepMind doesn't accept patches, preferring to push from their internal Piper repo. But if you generate a PR, it is a more efficient way to make your code available to others (and to DeepMind, should they choose to internalize it). After all, git is all about sharing and open source!
I thought about forking, since I have corrected some other issues. I started to use and modify a non-Docker fork, since it is easier to test and modify. Thanks.
My last comment is obsolete - I figured out how to fork properly...
Soon you will find the incorporated patches here: https://github.com/hegelab/alphafold
Great. Looking forward to it. Feel free to reach out if you need some help. Something like this might work, if you are operating in your fork (a sketch of the usual git flow; the branch name below is just an example):
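```bash
# Create a branch that contains only this fix (branch name is illustrative).
git checkout -b limit-uniref90-hits
git add alphafold/data/pipeline.py
git commit -m "Enforce uniref_max_hits for the uniref90 MSA"
# Push the branch to your fork on GitHub.
git push -u origin limit-uniref90-hits
```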
Then you can create a pull request from the branch you created. Once it's available, we can hopefully get some feedback from DeepMind.
For anyone who does not want to apply a patch without looking at its contents, this is @biohegedus's patch:

```diff
--- alphafold/data/pipeline.py	2021-09-25 19:18:02.756831226 +0200
+++ pipeline-ht.py	2021-09-25 19:25:28.710044994 +0200
@@ -158,6 +158,8 @@
     hhsearch_hits = parsers.parse_hhr(hhsearch_result)
     mgnify_msa = mgnify_msa[:self.mgnify_max_hits]
     mgnify_deletion_matrix = mgnify_deletion_matrix[:self.mgnify_max_hits]
+    uniref90_msa = uniref90_msa[:self.uniref_max_hits]  # hege
+    uniref90_deletion_matrix = uniref90_deletion_matrix[:self.uniref_max_hits]  # hege
     if self._use_small_bfd:
       jackhmmer_small_bfd_result = self.jackhmmer_small_bfd_runner.query(
```

I will edit this post once I've tested the patch and confirm that it works.
Hi, I suggest using this fork instead of patching: https://github.com/hegelab/alphafold
@hegelab - There are 5 commits on your main branch, one of which even creates a log file at a hardcoded path that won't exist on most machines. This goes way beyond the 2-line patch being advocated above. I'd suggest creating a separate branch with just the changes you think are important (without renaming/moving to .orig), and creating a PR. That way, DeepMind can incorporate or comment on your changes.
It seems you are making substantial changes in your fork with no interest in merging them into the official community project. Why not cast your changes into a PR so the official community can benefit from them? While your changes seem great, you're quickly diverging from the official project, which is extremely undesirable for people like me who intend to follow the development path of DeepMind's AlphaFold rather than unofficial offshoots. Except in circumstances such as discontinued development by DeepMind, I don't see the disadvantage in contributing to this community project rather than creating your own fork. For these reasons, I have no interest in using your fork; however, thank you for the patch. I would recommend you create a PR with all of the other beneficial features you've added to AlphaFold, rather than going rogue unnecessarily.
@ekiefl Yes, I made substantial changes, since the "2GB memory problem" is just one manifestation of the file-reading/memory issues. Even if you correct this, you can run into out-of-system-memory errors. So I think you want the scripts with the substantial changes.
Good to know @hegelab. I would recommend submitting several PRs, with as few changes as possible in each. Each PR should accomplish exactly one thing. For example, this two-line patch is the perfect size for a PR, and could be called "Enforce max size for uniref90 MSA and deletion matrix". This way DeepMind can scrutinize, discuss, and test it in detail before merging it into the main repository. If you submit all of the changes in a single PR, it will likely never be merged because there are too many moving parts for DeepMind to dissect. Please reach out if you want my help with the mechanics of this process.
I added a PR for the two-line change to use the `uniref_max_hits` limit.
Note that I referenced this issue in the PR description. This will tie this issue conversation to the PR by putting links in both places.
@ekiefl and @chrisroat Thanks, I am starting to understand the idea behind all this - now I feel that I am not a professional programmer... I will create a clean fork, introduce my substantial changes for avoiding system memory errors, create a branch, and create a pull request. Tomorrow or Wednesday...
@chrisroat Thanks again. I cleaned up my AlphaFold fork, kept only those changes that decrease memory usage in the case of large jackhmmer outputs, and submitted a pull request.
Addressed in 0be2b30.