OOM for 1.5 Terabytes? #438

Closed
moldach opened this issue Jan 10, 2021 · 3 comments

moldach commented Jan 10, 2021

I'm trying to run GRIDSS for the first time and I get OOM errors no matter how much RAM I throw at it.

#!/bin/bash
#SBATCH --job-name=gridss_bigmem # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=moldach@ucalgary.ca # Where to send mail
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=8 # How many cores?
#SBATCH --mem-per-cpu=250G
#SBATCH --partition=bigmem
#SBATCH --output=nap_gridss_test_bigmem_%j.log # Standard output and error log
#SBATCH --error=nap_gridss_test_bigmem_%j.err # Error log
#SBATCH --time=12:00:00
pwd; hostname; date

/project/M-mtgraovac182840/tools/gridss.sh \
        --reference /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa \
        --output gridss.vcf.gz \
        --assembly ./ \
        --threads 1 \
        --jar /project/M-mtgraovac182840/tools/gridss-2.10.2-gridss-jar-with-dependencies.jar \
        --workingdir ./ \
        --jvmheap 2000g \
        --steps All \
        --maxcoverage 50000 \
        --labels proband \
        proband_trim_bwaMEM_sort_dedupped.bam

date
Sun Jan 10 12:08:16 MST 2021: Full log file is: ./gridss.full.20210110_120816.sm1.286547.log
Sun Jan 10 12:08:16 MST 2021: Found /usr/bin/time
Sun Jan 10 12:08:16 MST 2021: Using GRIDSS jar /project/M-mtgraovac182840/tools/gridss-2.10.2-gridss-jar-with-dependencies.jar
Sun Jan 10 12:08:16 MST 2021: Using reference genome "/project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa"
Sun Jan 10 12:08:16 MST 2021: Using assembly bam  ./
Sun Jan 10 12:08:16 MST 2021: Using output VCF gridss.vcf.gz
Sun Jan 10 12:08:16 MST 2021: Using 1 worker threads.
Sun Jan 10 12:08:16 MST 2021: Using no blacklist bed. The encode DAC blacklist is recommended for hg19.
Sun Jan 10 12:08:16 MST 2021: Using JVM maximum heap size of 2000g for assembly and variant calling.
Sun Jan 10 12:08:16 MST 2021: Using input file proband_trim_bwaMEM_sort_dedupped.bam
Sun Jan 10 12:08:16 MST 2021: label is proband
Sun Jan 10 12:08:16 MST 2021: Found /project/M-mtgraovac182840/tools/R-4.0.3/bin/Rscript
Sun Jan 10 12:08:16 MST 2021: Found /project/M-mtgraovac182840/tools/samtools-1.3.1/bin/samtools
Sun Jan 10 12:08:16 MST 2021: Found /usr/bin/java
Sun Jan 10 12:08:16 MST 2021: Found /project/M-mtgraovac182840/tools/bwa
Sun Jan 10 12:08:16 MST 2021: samtools version: 1.3.1+htslib-1.3.1
Sun Jan 10 12:08:16 MST 2021: R version: R scripting front-end version 4.0.3 (2020-10-10)
Sun Jan 10 12:08:16 MST 2021: bwa Version: 0.7.15-r1140
Sun Jan 10 12:08:16 MST 2021: time version: time (GNU Time) UNKNOWN
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David Keppel, David MacKenzie, and Assaf Gordon.
Sun Jan 10 12:08:16 MST 2021: bash version: GNU bash, version 4.4.19(1)-release (x86_64-redhat-linux-gnu)
Sun Jan 10 12:08:16 MST 2021: java version: openjdk version "1.8.0_272" OpenJDK Runtime Environment (build 1.8.0_272-b10)       OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)
Sun Jan 10 12:08:18 MST 2021: Max file handles: 131072
Sun Jan 10 12:08:18 MST 2021: Running GRIDSS steps: setupreference, preprocess, assemble, call,
Sun Jan 10 12:08:18 MST 2021: Running   PrepareReference        (once-off setup for reference genome)
INFO    2021-01-10 12:08:18     Defaults        Found file for property samjdk.reference_fasta: /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
INFO    2021-01-10 12:08:18     PrepareReference

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    PrepareReference -REFERENCE_SEQUENCE /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
**********


12:08:18.987 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/project/M-mtgraovac182840/tools/gridss-2.10.2-gridss-jar-with-dependencies.jar!/com/intel/gkl/native/libgkl_compression.so
[Sun Jan 10 12:08:19 MST 2021] PrepareReference REFERENCE_SEQUENCE=/project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa    CREATE_SEQUENCE_DICTIONARY=true CREATE_GRIDSS_REFERENCE_CACHE=true CREATE_BWA_INDEX_IMAGE=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Jan 10 12:08:19 MST 2021] Executing as moldach@sm1 on Linux 4.18.0-193.28.1.el8_2.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_272-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.10.2-gridss
INFO    2021-01-10 12:08:19     PrepareReference        Sequence dictionary found.
INFO    2021-01-10 12:08:19     PrepareReference        Creating GRIDSS reference cache file /project/M-mtgraovac182840/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gridsscache
INFO    2021-01-10 12:08:19     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1287_PATCH
INFO    2021-01-10 12:08:22     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1459_PATCH
INFO    2021-01-10 12:08:24     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_SSTO
INFO    2021-01-10 12:08:25     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_MCF
INFO    2021-01-10 12:08:28     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_COX
INFO    2021-01-10 12:08:30     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_MANN
INFO    2021-01-10 12:08:32     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_APD
INFO    2021-01-10 12:08:34     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_QBL
INFO    2021-01-10 12:08:36     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR6_MHC_DBB
INFO    2021-01-10 12:08:38     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1433_PATCH
INFO    2021-01-10 12:08:40     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1257_PATCH
INFO    2021-01-10 12:08:47     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR17_1
INFO    2021-01-10 12:08:48     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1292_PATCH
INFO    2021-01-10 12:08:51     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR5_1_CTG1
INFO    2021-01-10 12:08:53     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1592_PATCH
INFO    2021-01-10 12:08:55     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1425_PATCH
INFO    2021-01-10 12:08:56     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1462_PATCH
INFO    2021-01-10 12:08:58     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG7_PATCH
INFO    2021-01-10 12:09:01     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1458_PATCH
INFO    2021-01-10 12:09:03     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_LRC_J_CTG1
INFO    2021-01-10 12:09:03     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_LRC_S_CTG1
INFO    2021-01-10 12:09:04     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_LRC_I_CTG1
INFO    2021-01-10 12:09:05     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1079_PATCH
INFO    2021-01-10 12:09:05     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1443_HG1444_PATCH
INFO    2021-01-10 12:09:07     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG979_PATCH
INFO    2021-01-10 12:09:08     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_LRC_T_CTG1
INFO    2021-01-10 12:09:09     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_COX1_CTG1
INFO    2021-01-10 12:09:10     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_PGF1_CTG1
INFO    2021-01-10 12:09:11     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1453_PATCH
INFO    2021-01-10 12:09:13     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1423_PATCH
INFO    2021-01-10 12:09:15     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1437_PATCH
INFO    2021-01-10 12:09:16     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG865_PATCH
INFO    2021-01-10 12:09:18     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1434_PATCH
INFO    2021-01-10 12:09:19     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1438_PATCH
INFO    2021-01-10 12:09:21     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_PGF2_CTG1
INFO    2021-01-10 12:09:22     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1293_PATCH
INFO    2021-01-10 12:09:25     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1426_PATCH
INFO    2021-01-10 12:09:27     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR19LRC_COX2_CTG1
INFO    2021-01-10 12:09:28     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1308_PATCH
INFO    2021-01-10 12:09:29     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1463_PATCH
INFO    2021-01-10 12:09:31     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG987_PATCH
INFO    2021-01-10 12:09:32     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG19_PATCH
INFO    2021-01-10 12:09:33     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG953_PATCH
INFO    2021-01-10 12:09:36     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HSCHR4_1
INFO    2021-01-10 12:09:38     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG730_PATCH
INFO    2021-01-10 12:09:39     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1350_HG959_PATCH
INFO    2021-01-10 12:09:40     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG306_PATCH
INFO    2021-01-10 12:09:41     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1082_HG167_PATCH
INFO    2021-01-10 12:09:43     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG75_PATCH
INFO    2021-01-10 12:09:44     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1424_PATCH
INFO    2021-01-10 12:09:46     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG174_HG254_PATCH
INFO    2021-01-10 12:09:49     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG357_PATCH
INFO    2021-01-10 12:09:51     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG29_PATCH
INFO    2021-01-10 12:09:53     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG256_PATCH
INFO    2021-01-10 12:09:54     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG183_PATCH
INFO    2021-01-10 12:09:55     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1146_PATCH
INFO    2021-01-10 12:09:56     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1441_PATCH
INFO    2021-01-10 12:09:58     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG871_PATCH
INFO    2021-01-10 12:10:00     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG104_HG975_PATCH
INFO    2021-01-10 12:10:02     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG1442_PATCH
INFO    2021-01-10 12:10:04     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG185_PATCH
INFO    2021-01-10 12:10:05     TwoBitBufferedReferenceSequenceFile     Caching reference genome contig HG122_PATCH
[Sun Jan 10 12:10:05 MST 2021] gridss.PrepareReference done. Elapsed time: 1.78 minutes.
Runtime.totalMemory()=3621257216
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSubsequenceAt(AbstractIndexedFastaSequenceFile.java:184)
        at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:49)
        at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSequence(AbstractIndexedFastaSequenceFile.java:162)
        at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSequence(IndexedFastaSequenceFile.java:49)
        at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile.cacheLoad(TwoBitBufferedReferenceSequenceFile.java:172)
        at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile.lambda$save$1(TwoBitBufferedReferenceSequenceFile.java:73)
        at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile$$Lambda$36/1896294051.accept(Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at au.edu.wehi.idsv.picard.TwoBitBufferedReferenceSequenceFile.save(TwoBitBufferedReferenceSequenceFile.java:73)
        at gridss.PrepareReference.doWork(PrepareReference.java:48)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at gridss.PrepareReference.main(PrepareReference.java:89)
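
For scale, the Runtime.totalMemory() value printed just before the exception can be converted to GiB. The following is a minimal sketch in shell; the helper name bytes_to_gib is made up for illustration. Note that totalMemory() reports the heap the JVM currently has committed, not the configured maximum, so the result only suggests that this step was running with a heap far smaller than the 2000g passed via --jvmheap.

```shell
# Convert a byte count (as printed by Runtime.totalMemory()) to GiB,
# rounded to two decimal places. awk is used for floating-point division.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Value taken from the log line "Runtime.totalMemory()=3621257216".
bytes_to_gib 3621257216   # prints 3.37
```

So the JVM had roughly 3.4 GiB of heap committed when it ran out of memory, against a 31G reference FASTA.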
d-cameron (Member) commented Jan 10, 2021 via email

Would an additional option to override the fixed heap size commands be useful?

moldach (Author) commented Jan 27, 2021

Hi @d-cameron, sorry for the slow response.

The reference genome (Ensembl's TopLevel) Homo_sapiens.GRCh37.dna.toplevel.fa is 31G in size and contains alt/patch contigs.

Also, your assembly parameter needs to be a file not directory (eg --assembly ./assembly.bam)

Thanks!
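
The point about --assembly can be caught before submitting a job. This is a rough sketch of a pre-flight check, assuming only that the argument must not be an existing directory; check_assembly_path is a hypothetical helper and is not part of gridss.sh.

```shell
# Hypothetical pre-flight check: succeeds (returns 0) unless the given
# path is an existing directory, in which case it prints an error to
# stderr and returns 1.
check_assembly_path() {
    if [ -d "$1" ]; then
        echo "error: --assembly must be a file, got directory: $1" >&2
        return 1
    fi
    return 0
}

check_assembly_path ./assembly.bam && echo "ok: ./assembly.bam"
check_assembly_path ./ || echo "rejected: ./"
```

With this in the submission script, the original invocation (--assembly ./) would be rejected immediately instead of failing later in the run.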

Would an additional option to override the fixed heap size commands be useful?

I was able to run it successfully by making hard-coded changes to gridss.sh; however, an option to override the fixed heap size would likely be preferable for others.
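
One common shell pattern for this kind of override is a parameter default: the script keeps its fixed value unless the caller sets an environment variable. The sketch below only illustrates the pattern; the variable name GRIDSS_FIXED_HEAP and the 4g default are hypothetical and are not taken from gridss.sh.

```shell
# Hypothetical: choose the heap size for the fixed-heap steps, letting the
# caller override the built-in default through an environment variable.
fixed_heap() {
    echo "${GRIDSS_FIXED_HEAP:-4g}"
}

# No override set: the built-in default is used.
unset GRIDSS_FIXED_HEAP
echo "-Xmx$(fixed_heap)"   # prints -Xmx4g

# Caller-supplied override, e.g. for a large reference with many contigs.
GRIDSS_FIXED_HEAP=32g
echo "-Xmx$(fixed_heap)"   # prints -Xmx32g
```

An environment variable leaves existing command lines working unchanged; a dedicated command-line flag would be the other obvious design.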

d-cameron pushed a commit that referenced this issue Feb 22, 2021