
Memory Issue on [INFO] Merge chunked contigs vcf files #45

Closed
fidibidi opened this issue Aug 26, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@fidibidi

Hi guys,

I'm running Clair3 and have encountered this issue twice now; I'm not sure what's going on.

I'm running on a virtual machine with 64 GB of memory and a disk with ~85 GB free.

The BAM I am processing is around 90 GB.

Here is the runlog

run_clair3.log

Any info appreciated,
Thanks!

@fidibidi
Author

Also!

Is it possible to resume this step?

@zhengzhenxian
Collaborator

Hi,

It looks like an out-of-memory issue. Here are some suggestions for your reference:

  1. There are many more candidate sites than expected. We suggest using a higher AF cutoff by setting a larger --snp_min_af and --indel_min_af (the defaults are 0.08 and 0.15), or enabling --fast_mode. This sacrifices a little sensitivity to reduce calling time and memory usage. Please also try the Guppy2 model if your base-caller (Guppy) version is <3.6.0.

  2. Your log shows that some of your parallel pileup jobs also hit OOM after calling several chromosomes. Please try reducing --threads (the log shows you were using all 12 of your CPU threads).

We will also improve the SortVcf sub-module in the next few days to reduce memory caching in cases with massive numbers of candidates.

Hope it helps!
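For reference, the suggestions above translate to a command line roughly like the following. This is only a sketch: the input/output paths are placeholders, and the thread count and AF values are example numbers chosen above the defaults, not recommendations for any specific dataset.

```shell
# Hypothetical Clair3 invocation (placeholder paths -- adjust for your data).
# --threads:      fewer than the 12 available cores, to avoid OOM in parallel jobs
# --snp_min_af:   raised above the 0.08 default
# --indel_min_af: raised above the 0.15 default
./run_clair3.sh \
  --bam_fn=input.bam \
  --ref_fn=ref.fa \
  --output=./clair3_out \
  --platform=ont \
  --model_path="$(pwd)/models/ont" \
  --threads=6 \
  --snp_min_af=0.12 \
  --indel_min_af=0.20
```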

@zhengzhenxian zhengzhenxian added the enhancement New feature or request label Aug 27, 2021
@fidibidi
Author

Thank you for the response! Will try these suggestions.

@fidibidi
Author

A few more questions:

  1. For our validation run we are using data from , in this case GM24385_2.fast5.tar.gz and GM24385_2.fastq.gz; we aren't exactly sure what version of Guppy was used to process the fast5s.
    Are you aware of a method for determining which version of Guppy was used to process FAST5 files?

  2. When running Clair3, I use the following parameters for the model settings:
    --platform="ont" --model_path="$(pwd)/models/ont"

Which model ends up being used in this scenario, given that there are five models present in that location?

Thanks!

@zhengzhenxian
Collaborator

It seems your link is not visible. You might find base-caller version details in the BAM header.
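One way to check, assuming samtools is installed and the basecaller/aligner recorded program records: the @PG lines of the BAM header usually carry a program name (PN) and version (VN). The file name below is a placeholder.

```shell
# Print the program (@PG) records from the BAM header; the base-caller
# typically appears here with its name (PN) and version (VN).
samtools view -H input.bam | grep '^@PG'
```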

Alternatively, you could base-call GM24385_2.fast5.tar.gz yourself using the latest Guppy version (we suggest Guppy >=3.6.0) from the ONT community.

We suggest using our default ONT Guppy3-4 model (in your path $(pwd)/models/ont) for Guppy3- or Guppy4-base-called datasets, or the Guppy5 model (download here) for datasets base-called by Guppy5 (v5.0.6). Just unzip the file and point --model_path at the new folder to run the Guppy5 model.

Hope it helps!

@fidibidi
Author

Hey guys!

The suggestion to reduce threads is again what we opted for, at least for the time being, and we had some success with it. However, another run is now failing with the error:

Too many open files..
run_clair3 (3).log

Any thoughts on this?
Thanks!

@aquaskyline aquaskyline reopened this Sep 1, 2021
@aquaskyline
Member

aquaskyline commented Sep 1, 2021

In your running environment, please run ulimit -a. My environment gave the following output:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773361
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 773361
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Your error was triggered by the open files limit being too low. It's a system setting, so it can usually only be changed by your system administrator. Another workaround is to lower the number of THREADS further so that fewer files are open at the same time. But to achieve Clair3's full speed, we suggest lifting the system limit.
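For what it's worth, the soft limit can be raised up to the hard limit in the current shell without administrator help; going beyond the hard limit (e.g. via /etc/security/limits.conf) is what requires root. A minimal sketch:

```shell
# Show the current soft limit on open file descriptors
ulimit -Sn

# Raise the soft limit to the hard limit for this shell session only.
# Raising the hard limit itself requires root (e.g. editing
# /etc/security/limits.conf).
ulimit -Sn "$(ulimit -Hn)"
ulimit -Sn
```

Run Clair3 from the same shell session afterwards so the new limit applies.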

@fidibidi
Author

fidibidi commented Sep 1, 2021

Awesome, thanks for the suggestion!

For anyone who may encounter this issue:

I found this thread the most helpful for debugging why my root user's open-file limits were increased but my normal user's were not:

https://superuser.com/questions/1200539/cannot-increase-open-file-limit-past-4096-ubuntu

@zhengzhenxian
Collaborator

Hi,

The new release (v0.1-r6) reduces the memory footprint when merging VCFs and lowers the ulimit -n requirement. Please feel free to give it a try. Thanks!
