fastqc is out of RAM #2989

Closed
naumenko-sa opened this issue Oct 22, 2019 · 6 comments

@naumenko-sa
Contributor

Hi!

In some rare cases, 250 MB of RAM is not enough for fastqc:

[2019-10-22T18:21Z] Started analysis of sort-downsample.bam
[2019-10-22T18:21Z] Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
[2019-10-22T18:21Z] 	at java.lang.String.toCharArray(String.java:2899)
[2019-10-22T18:21Z] 	at uk.ac.babraham.FastQC.Modules.PerSequenceGCContent.truncateSequence(PerSequenceGCContent.java:222)
[2019-10-22T18:21Z] 	at uk.ac.babraham.FastQC.Modules.PerSequenceGCContent.processSequence(PerSequenceGCContent.java:174)
[2019-10-22T18:21Z] 	at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:88)
[2019-10-22T18:21Z] 	at java.lang.Thread.run(Thread.java:748)
[2019-10-22T18:21Z] Uncaught exception occurred

Running fastqc from the command line with -t 2 allocates 500M and works:
s-andrews/FastQC#24

It looks like we don't have an option to control fastqc threads or memory from bcbio.yaml.

"-t", str(num_cores), "--extract", "-o", tx_tmp_dir, "-f", frmt, bam_file]

And we need it.

SN

@naumenko-sa naumenko-sa self-assigned this Oct 22, 2019
@roryk roryk added the bug label Oct 22, 2019
@roryk
Collaborator

roryk commented Oct 22, 2019

Thanks, what was the FastQC command line that failed? I think we can add an explicit -Xmx500m to the FastQC call to give it more memory.

@naumenko-sa
Contributor Author

The VM got killed already. It was a regular fastqc command.

I think fastqc is a Perl script that does not let you set the Java memory directly. Instead, it hardcodes mem = threads * 250M:

https://github.com/s-andrews/FastQC/blob/master/fastqc#L166

So I will just use 2 threads, as suggested in s-andrews/FastQC#24.
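
The arithmetic described above can be sketched in a few lines of Python. This only illustrates the rule as stated in this thread (250 MB of Java heap per thread); it is not the wrapper's actual Perl code, and the function and constant names are hypothetical.

```python
# Per-thread heap size, as described in this thread (assumption: 250 MB).
MB_PER_THREAD = 250


def fastqc_heap_mb(threads: int) -> int:
    """Return the Java heap (in MB) implied by the threads * 250M rule."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * MB_PER_THREAD


if __name__ == "__main__":
    # Compare the failing case (-t 1), the workaround (-t 2),
    # and the default bcbio setting (-t 16).
    for t in (1, 2, 16):
        print(f"-t {t}: {fastqc_heap_mb(t)} MB")
```

Under this rule, -t 1 gets 250 MB, -t 2 gets 500 MB, and -t 16 gets 4000 MB, which matches the observation that two threads are enough here.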

@roryk
Collaborator

roryk commented Oct 22, 2019

Ah, interesting!

@naumenko-sa
Contributor Author

In the default bcbio installation in bcbio/galaxy/bcbio_system.yaml we have

resources:
  default:
    cores: 16

so fastqc is being called as fastqc -t 16 ... and we never face the fastqc memory issue.

However, in some installations people set this in bcbio/galaxy/bcbio_system.yaml:

resources:
  default:
    cores: 1

So the default is 1, the fastqc call becomes fastqc -t 1 ..., and we hit the memory issue.

Reminding myself how cores are set for a multicore project.
There are three places:

  • bcbio_nextgen.py -n NUMCORES (default = 1, typically 20-40)
  • bcbio/galaxy/bcbio_system.yaml: resources/default/cores (default=16)
  • project/config/bcbio.yaml: resources/default/cores

Probably the easiest way to tackle this is to avoid setting cores: 1 in a system-wide config and to use a minimum of cores: 2.

SN

@chapmanb
Member

Sergey;
Great catch on the underlying issue. I pushed a small fix that floors the number of threads used at 2 to avoid hitting the low memory issues even if folks set cores: 1, so that it'll work in these cases as well. Hope this avoids the problems going forward.
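
A minimal sketch of what flooring the thread count at 2 could look like is below. It is not the actual commit: the function names and the constant are hypothetical, and only the FastQC flags (-t, --extract, -o, -f) are taken from the call quoted earlier in this issue.

```python
# Hypothetical floor: 2 threads corresponds to 500 MB under the
# threads * 250M rule discussed in this issue.
MIN_FASTQC_THREADS = 2


def fastqc_threads(num_cores: int) -> int:
    """Return the thread count to pass to FastQC, never below the floor."""
    return max(MIN_FASTQC_THREADS, num_cores)


def fastqc_args(num_cores: int, out_dir: str, frmt: str, in_file: str) -> list:
    """Build the FastQC argument list with the floored thread count."""
    return ["-t", str(fastqc_threads(num_cores)), "--extract",
            "-o", out_dir, "-f", frmt, in_file]


if __name__ == "__main__":
    # With cores: 1 the call still requests two threads.
    print(fastqc_args(1, "qc_out", "bam", "sort-downsample.bam"))
```

With this floor, a configuration of cores: 1 still results in -t 2, while larger settings such as cores: 16 pass through unchanged.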

@naumenko-sa
Contributor Author

Thanks, @chapmanb !

jfy133 added a commit to nf-core/eager that referenced this issue Apr 8, 2020
lordkev pushed a commit to lordkev/eager that referenced this issue May 6, 2020