Error: Cannot read from buffer; Error: cannot load book-keeping #7012

Closed
ccastane9 opened this issue Dec 21, 2020 · 24 comments

@ccastane9

Hello,
I have been using the GenotypeGVCFs function to call variants on roughly 300 whole-genome-sequenced individuals. I have not run into any issues when calling variants for these same individuals on the majority of chromosomes; however, when I use the same script for chromosomes 1, 2 and 3 of the species I get the error "Couldn't create GenomicsDBFeatureReader" as in issue #6616. I believe our issues may differ, though, because I also get the errors "Cannot read from buffer" and "Cannot load book-keeping; Reading tile offsets failed".

Below is the computer output:

Using GATK jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx16g -jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar GenotypeGVCFs --reference /data1/EquCab/_ECA30/Equus_caballus.EquCab3.0.dna_sm.toplevel.fa/ -V gendb://ECA3_GenomicsDB_260/1 -O ECA3_GenomicsDB_260.1.g.vcf.gz
13:56:51.939 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 21, 2020 1:56:52 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:56:52.185 INFO GenotypeGVCFs - ------------------------------------------------------------
13:56:52.186 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.8.1
13:56:52.186 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
13:56:52.186 INFO GenotypeGVCFs - Executing as ccastane9@andersserver-01.cvm.tamu.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
13:56:52.186 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-b01
13:56:52.186 INFO GenotypeGVCFs - Start Date/Time: December 21, 2020 1:56:51 PM CST
13:56:52.186 INFO GenotypeGVCFs - ------------------------------------------------------------
13:56:52.186 INFO GenotypeGVCFs - ------------------------------------------------------------
13:56:52.187 INFO GenotypeGVCFs - HTSJDK Version: 2.23.0
13:56:52.187 INFO GenotypeGVCFs - Picard Version: 2.22.8
13:56:52.187 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:56:52.187 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:56:52.187 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:56:52.187 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:56:52.187 INFO GenotypeGVCFs - Deflater: IntelDeflater
13:56:52.188 INFO GenotypeGVCFs - Inflater: IntelInflater
13:56:52.188 INFO GenotypeGVCFs - GCS max retries/reopens: 20
13:56:52.188 INFO GenotypeGVCFs - Requester pays: disabled
13:56:52.188 INFO GenotypeGVCFs - Initializing engine
13:56:53.115 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
[TileDB::Buffer] Error: Cannot read from buffer; End of buffer reached.
[TileDB::BookKeeping] Error: Cannot load book-keeping; Reading tile offsets failed.
13:57:15.762 INFO GenotypeGVCFs - Shutting down engine
[December 21, 2020 1:57:15 PM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.40 minutes.
Runtime.totalMemory()=2119696384


A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader


org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:410)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:326)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.io.IOException: GenomicsDB JNI Error: VariantQueryProcessorException : Could not open array 1$1$188260577 at workspace: /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/1
TileDB error message : [TileDB::BookKeeping] Error: Cannot load book-keeping; Reading tile offsets failed
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:200)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:85)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:407)
... 12 more

I'm assuming the problem is something in the files of array 1$1$188260577, possibly the __book_keeping.tdb.gz file, although I'm not sure how to go about troubleshooting the issue. I also recreated the database for these chromosomes (still using the same scripts as for the other chromosomes, where variant calling was successful) to see if perhaps something went wrong during the initial database creation, but I still received this error when I tried to call variants.
What is most confusing to me is that this issue isn't happening for every chromosome, just the first 3. Any advice to get over this hump is greatly appreciated, and let me know if there is more information you need to help troubleshoot.

Thanks,
Caitlin

@droazen
Collaborator

droazen commented Jan 4, 2021

@nalinigans @mlathara ^^

@nalinigans
Collaborator

@ccastane9, looks like a memory issue. Some questions -

  1. What are the sizes of the book-keeping files in your GenomicsDB workspace? Try running find /ECA3_GenomicsDB_260 -name __book_keeping.tdb.gz -ls.
  2. Is /ECA3_GenomicsDB_260 on NFS or another shared Posix FS? Can you try running GenotypeGVCFs with --genomicsdb-shared-posixfs-optimizations turned on?
  3. What does your hardware configuration look like, memory wise?
  4. What are your -Xmx and -Xms java options?
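For anyone who prefers to script the check in question 1, here is a minimal Python sketch that does roughly what the find command does: it walks a workspace and reports the size of every book-keeping file. The function name `book_keeping_sizes` is made up for illustration and is not part of GATK or GenomicsDB; only the file name `__book_keeping.tdb.gz` comes from this thread.

```python
import os


def book_keeping_sizes(workspace, name="__book_keeping.tdb.gz"):
    """Map each book-keeping file found under `workspace` to its size in bytes.

    Hypothetical helper: it only walks the directory tree and does not
    open or validate the files.
    """
    sizes = {}
    for root, _dirs, files in os.walk(workspace):
        if name in files:
            path = os.path.join(root, name)
            sizes[path] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    # Workspace path as it appears in the logs above; adjust for your system.
    found = book_keeping_sizes("/data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260")
    for path, size in sorted(found.items()):
        print(f"{size:>15,d}  {path}")
```

Unusually large files for a particular chromosome would be worth mentioning when reporting the problem.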

@ccastane9
Author

@nalinigans, to answer your questions as best as I can (sorry, I'm a bit of a novice)

  1. The size of the book-keeping file is 20,779,823 bytes (~21 MB)
  2. I believe it is a shared POSIX FS. Running with this option produced a similar error, except this time the message was "Cannot load book-keeping; Reading MBR failed" (output below)
  3. Available memory: ~89 GB
  4. I am running with the -Xmx16g Java option

Newest output:
Using GATK jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar GenotypeGVCFs --genomicsdb-shared-posixfs-optimizations --reference /data1/EquCab/_ECA30/Equus_caballus.EquCab3.0.dna_sm.toplevel.fa/ -V gendb://ECA3_GenomicsDB_260/3 -O ECA3_GenomicsDB_260.3.g.vcf.gz
16:26:34.912 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 06, 2021 4:26:35 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:26:35.417 INFO GenotypeGVCFs - ------------------------------------------------------------
16:26:35.418 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.8.1
16:26:35.418 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
16:26:35.420 INFO GenotypeGVCFs - Executing as ccastane9@andersserver-01.cvm.tamu.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
16:26:35.421 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-b01
16:26:35.421 INFO GenotypeGVCFs - Start Date/Time: January 6, 2021 4:26:34 PM CST
16:26:35.421 INFO GenotypeGVCFs - ------------------------------------------------------------
16:26:35.421 INFO GenotypeGVCFs - ------------------------------------------------------------
16:26:35.422 INFO GenotypeGVCFs - HTSJDK Version: 2.23.0
16:26:35.423 INFO GenotypeGVCFs - Picard Version: 2.22.8
16:26:35.423 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:26:35.423 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:26:35.426 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:26:35.426 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:26:35.427 INFO GenotypeGVCFs - Deflater: IntelDeflater
16:26:35.427 INFO GenotypeGVCFs - Inflater: IntelInflater
16:26:35.427 INFO GenotypeGVCFs - GCS max retries/reopens: 20
16:26:35.427 INFO GenotypeGVCFs - Requester pays: disabled
16:26:35.427 INFO GenotypeGVCFs - Initializing engine
16:26:37.201 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
[TileDB::Buffer] Error: Cannot read from buffer; End of buffer reached.
[TileDB::BookKeeping] Error: Cannot load book-keeping; Reading MBR failed.
16:26:39.459 INFO GenotypeGVCFs - Shutting down engine
[January 6, 2021 4:26:39 PM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=2303197184


A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
[ccastane9@andersserver-01 GenomicsDB]$ bash *_genotype.3.sh
Using GATK jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx16g -jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar GenotypeGVCFs --genomicsdb-shared-posixfs-optimizations --reference /data1/EquCab/_ECA30/Equus_caballus.EquCab3.0.dna_sm.toplevel.fa/ -V gendb://ECA3_GenomicsDB_260/3 -O ECA3_GenomicsDB_260.3.g.vcf.gz
16:27:53.573 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 06, 2021 4:27:54 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:27:54.132 INFO GenotypeGVCFs - ------------------------------------------------------------
16:27:54.133 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.8.1
16:27:54.133 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
16:27:54.143 INFO GenotypeGVCFs - Executing as ccastane9@andersserver-01.cvm.tamu.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
16:27:54.143 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-b01
16:27:54.144 INFO GenotypeGVCFs - Start Date/Time: January 6, 2021 4:27:53 PM CST
16:27:54.144 INFO GenotypeGVCFs - ------------------------------------------------------------
16:27:54.144 INFO GenotypeGVCFs - ------------------------------------------------------------
16:27:54.145 INFO GenotypeGVCFs - HTSJDK Version: 2.23.0
16:27:54.145 INFO GenotypeGVCFs - Picard Version: 2.22.8
16:27:54.145 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:27:54.145 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:27:54.145 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:27:54.146 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:27:54.146 INFO GenotypeGVCFs - Deflater: IntelDeflater
16:27:54.146 INFO GenotypeGVCFs - Inflater: IntelInflater
16:27:54.146 INFO GenotypeGVCFs - GCS max retries/reopens: 20
16:27:54.146 INFO GenotypeGVCFs - Requester pays: disabled
16:27:54.146 INFO GenotypeGVCFs - Initializing engine
16:27:55.873 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
[TileDB::Buffer] Error: Cannot read from buffer; End of buffer reached.
[TileDB::BookKeeping] Error: Cannot load book-keeping; Reading MBR failed.
16:27:58.483 INFO GenotypeGVCFs - Shutting down engine
[January 6, 2021 4:27:58 PM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=2231894016


A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader


org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:410)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:326)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.io.IOException: GenomicsDB JNI Error: VariantQueryProcessorException : Could not open array 3$1$121351753 at workspace: /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/3
TileDB error message : [TileDB::BookKeeping] Error: Cannot load book-keeping; Reading MBR failed
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:200)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:85)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:407)
... 12 more

@nalinigans
Collaborator

Your -Xmx option seems to be rather small. Can you try 32G or 48G? Also, it would be helpful for debugging if you could share the following files from the GenomicsDB array /ECA3_GenomicsDB_260/3: __array_schema.tdb and any one __book_keeping.tdb.gz from one of the directories under /ECA3_GenomicsDB_260/3.

@ccastane9
Author

Hopefully I have attached the files correctly. I also tried increasing the -Xmx option but received the same error. Thanks again for all of your help!
__array_schema.zip

__book_keeping.tdb.zip

@nalinigans
Collaborator

nalinigans commented Jan 12, 2021

Thanks, we have reproduced the issue with your files.

Did you see any errors logged during the GenomicsDBImport phase? What OS are you running on? Will you be able to help by re-running GenomicsDBImport with a debug version of the libtiledbgenomicsdb.so and ./gatk --java-options "-Dgenomicsdb.library.path=/path/to/libtiledbgenomicsdb.so" GenomicsDBImport? If so, will build and share a debug version with you.

As a workaround for now, can you split the intervals to GenomicsDBImport - see https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists? Splitting the chromosome into 2 or 3 roughly equal regions may help.

@ccastane9
Author

Hi, I do not recall seeing any of those errors during the GenomicsDBImport phase, nor can I find any errors readily logged, although I can recreate the database for chromosome 3 and tell you for sure whether any errors are logged. I am running GenomicsDBImport on a Linux v3.10.0-1127.19.1.el7.x86_64 amd64 server, and yes, I can rerun with a debug version.

@nalinigans
Collaborator

nalinigans commented Jan 12, 2021

@ccastane9, what flavor of Linux is your server running on?

@ccastane9
Author

@nalinigans Scientific Linux 7.9

@nalinigans
Collaborator

Hi, you have probably deleted the previous comment with the "No space left on device" error, but I was wondering if you could check that you have enough space, import to a new workspace, and turn on --genomicsdb-shared-posixfs-optimizations with GenomicsDBImport?

@ccastane9
Author

Yes, the "no space left on device" error was temporary and has been fixed. I'm creating a new workspace with --genomicsdb-shared-posixfs-optimizations turned on.

@nalinigans
Collaborator

Did you reproduce the issue after using --genomicsdb-shared-posixfs-optimizations with GenomicsDBImport?

@ccastane9
Author

@nalinigans I did. The import was a success and I saw no errors pop up; however, I still cannot call variants. Below is the output from GenomicsDBImport, followed by the same errors from when I tried to call variants.

GenomicsDBImport output
11:28:17.308 INFO IntervalArgumentCollection - Processing 121351753 bp from intervals
11:28:17.574 INFO GenomicsDBImport - Done initializing engine
11:28:18.095 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
11:28:18.119 INFO GenomicsDBImport - Vid Map JSON file will be written to /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/3/vidmap.json
11:28:18.119 INFO GenomicsDBImport - Callset Map JSON file will be written to /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/3/callset.json
11:28:18.119 INFO GenomicsDBImport - Complete VCF Header will be written to /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/3/vcfheader.vcf
11:28:18.120 INFO GenomicsDBImport - Importing to workspace - /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/3
11:28:18.120 INFO ProgressMeter - Starting traversal
11:28:18.120 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
11:28:24.736 INFO GenomicsDBImport - Importing batch 1 with 260 samples
05:39:42.050 INFO ProgressMeter - 3:1 2531.4 1 0.0
05:39:42.051 INFO GenomicsDBImport - Done importing batch 1/1
05:39:42.060 INFO ProgressMeter - 3:1 2531.4 1 0.0
05:39:42.061 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 2531.4 minutes.
05:39:42.061 INFO GenomicsDBImport - Import completed!
05:39:42.061 INFO GenomicsDBImport - Shutting down engine
[January 16, 2021 5:39:42 AM CST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 2,531.64 minutes.
Runtime.totalMemory()=9711910912
Tool returned:
true
Calling Variants Attempt
Using GATK jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx32g -jar /data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar GenotypeGVCFs --genomicsdb-shared-posixfs-optimizations --reference /data1/EquCab/_ECA30/Equus_caballus.EquCab3.0.dna_sm.toplevel.fa/ -V gendb://ECA3_GenomicsDB_260/3 -O ECA3_GenomicsDB_260.3.g.vcf.gz
21:16:35.251 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data1/_software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 17, 2021 9:16:35 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
21:16:35.496 INFO GenotypeGVCFs - ------------------------------------------------------------
21:16:35.497 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.8.1
21:16:35.497 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
21:16:35.497 INFO GenotypeGVCFs - Executing as ccastane9@andersserver-01.cvm.tamu.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
21:16:35.497 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_275-b01
21:16:35.497 INFO GenotypeGVCFs - Start Date/Time: January 17, 2021 9:16:35 PM CST
21:16:35.497 INFO GenotypeGVCFs - ------------------------------------------------------------
21:16:35.497 INFO GenotypeGVCFs - ------------------------------------------------------------
21:16:35.498 INFO GenotypeGVCFs - HTSJDK Version: 2.23.0
21:16:35.498 INFO GenotypeGVCFs - Picard Version: 2.22.8
21:16:35.498 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:16:35.498 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:16:35.498 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:16:35.498 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:16:35.498 INFO GenotypeGVCFs - Deflater: IntelDeflater
21:16:35.499 INFO GenotypeGVCFs - Inflater: IntelInflater
21:16:35.499 INFO GenotypeGVCFs - GCS max retries/reopens: 20
21:16:35.499 INFO GenotypeGVCFs - Requester pays: disabled
21:16:35.499 INFO GenotypeGVCFs - Initializing engine
21:16:36.737 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
[TileDB::Buffer] Error: Cannot read from buffer; End of buffer reached.
[TileDB::BookKeeping] Error: Cannot load book-keeping; Reading MBR failed.
21:16:38.472 INFO GenotypeGVCFs - Shutting down engine
[January 17, 2021 9:16:38 PM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=2551709696


A USER ERROR has occurred: Couldn't create GenomicsDBFeatureReader


org.broadinstitute.hellbender.exceptions.UserException: Couldn't create GenomicsDBFeatureReader
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:410)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:326)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:282)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.initializeDrivingVariants(VariantLocusWalker.java:76)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:67)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:709)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.onStartup(VariantLocusWalker.java:63)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.io.IOException: GenomicsDB JNI Error: VariantQueryProcessorException : Could not open array 3$1$121351753 at workspace: /data1/EquCab/GenomicsDB/ECA3_GenomicsDB_260/3
TileDB error message : [TileDB::BookKeeping] Error: Cannot load book-keeping; Reading MBR failed
at org.genomicsdb.reader.GenomicsDBQueryStream.jniGenomicsDBInit(Native Method)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:209)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:182)
at org.genomicsdb.reader.GenomicsDBQueryStream.<init>(GenomicsDBQueryStream.java:91)
at org.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:200)
at org.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:85)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:407)
... 12 more

@nalinigans
Collaborator

@ccastane9, it looks like we are hitting the limits of zlib memory-wise. I had asked before, but as a workaround for now, can you split the intervals to GenomicsDBImport - see https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists? Splitting the chromosome into 2 or 3 roughly equal regions may help.
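As a rough illustration of "2 or 3 roughly equal regions", the following Python sketch splits a contig of known length into contiguous 1-based intervals. The helper name `split_contig` is hypothetical; the contig name `3` and the length 121351753 are taken from the logs in this thread. The endpoints are purely arithmetic and do not account for anything in the reference sequence itself.

```python
def split_contig(contig, length, parts):
    """Split positions 1..length into `parts` contiguous intervals.

    Interval sizes differ by at most one base. Returns strings in the
    contig:start-end form accepted by -L.
    """
    if parts < 1 or length < parts:
        raise ValueError("parts must be between 1 and length")
    base, extra = divmod(length, parts)
    intervals = []
    start = 1
    for i in range(parts):
        # The first `extra` intervals absorb the remainder, one base each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        intervals.append(f"{contig}:{start}-{end}")
        start = end + 1
    return intervals


if __name__ == "__main__":
    for interval in split_contig("3", 121351753, 3):
        print(interval)
```

Each printed interval could then be passed to GenomicsDBImport with -L, either one per workspace or several in the same workspace.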

@ccastane9
Author

@nalinigans, would this be an appropriate interval command? The chromosome is roughly 121Mb, so I plan on using 3 intervals to GenomicsDBImport. Or do I need to add commas?

/data1/_software/gatk-4.1.8.1/gatk --java-options "-Xmx16g" GenomicsDBImport --reference /data1/EquCab/_ECA30/Equus_caballus.EquCab3.0.dna_sm.toplevel.fa/ \
-V /data1/EquCab/gVCF_public/xxxx.3.g.vcf.gz \
-L chr3:1-40450000 \
--genomicsdb-workspace-path /data1/EquCab/GenomicsDB/ECA3_GenomicsDB/3_part1

@mlathara
Contributor

@ccastane9 The command you show is on the right track. Couple of things:

  • Since you are apparently using whole genome you'll want to figure out areas of the reference genome that have consecutive Ns and start/end your intervals there (as stated in the link @nalinigans posted)
  • You have the chr prefix in your intervals but that may or may not be needed based on what your reference and vcfs expect for the contigs. From the naming of your interval folders, I would assume the chr prefix should not be included but I may be wrong.

For completeness, I'll note that you don't necessarily have to have a single interval per workspace (though you may want to for scatter gather parallelism). You can specify multiple intervals per workspace. In your case, that could look something like -L 3:1-40450000 -L 3:40450001-80000000 -L 3:80000001-121351753 -imr OVERLAPPING_ONLY. Note that I picked the interval endpoints in an ad hoc fashion -- you'll want to use the consecutive Ns as your guideline instead. Also, the -imr OVERLAPPING_ONLY is important in this case with multiple abutting intervals in order to ensure that the intervals don't get merged. Otherwise, the tool will merge the abutting intervals and only output a single interval...
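To make the "use the consecutive Ns as your guideline" advice concrete, here is a hedged Python sketch. It assumes the contig sequence is already loaded as a string, finds runs of N, moves each ideal equal-split breakpoint to the midpoint of the nearest run, and builds the matching -L arguments followed by -imr OVERLAPPING_ONLY. All function names and the `min_len` threshold are illustrative assumptions, not part of GATK; cutting at the midpoint of a run is just one possible choice.

```python
import re


def find_n_runs(seq, min_len=1):
    """Return 1-based inclusive (start, end) for each run of N at least min_len long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq)]


def snap_breakpoints(length, parts, n_runs):
    """Choose up to parts-1 cut positions, each at the midpoint of the N run
    nearest to the ideal equal-split position. Falls back to the ideal
    position when no N runs are available."""
    cuts = set()
    for i in range(1, parts):
        ideal = length * i // parts
        if n_runs:
            start, end = min(n_runs, key=lambda r: abs((r[0] + r[1]) // 2 - ideal))
            cut = (start + end) // 2
        else:
            cut = ideal
        if 1 <= cut < length:
            cuts.add(cut)
    return sorted(cuts)


def interval_args(contig, length, cuts):
    """Build abutting -L arguments plus -imr OVERLAPPING_ONLY so the
    abutting intervals are not merged into one."""
    args = []
    start = 1
    for cut in list(cuts) + [length]:
        args += ["-L", f"{contig}:{start}-{cut}"]
        start = cut + 1
    return args + ["-imr", "OVERLAPPING_ONLY"]


if __name__ == "__main__":
    # Toy sequence standing in for a real chromosome.
    seq = "A" * 10 + "N" * 4 + "C" * 10 + "N" * 2 + "G" * 10
    runs = find_n_runs(seq, min_len=2)
    cuts = snap_breakpoints(len(seq), 3, runs)
    print(" ".join(interval_args("3", len(seq), cuts)))
```

If two ideal breakpoints snap to the same N run, the sketch simply returns fewer intervals than requested rather than inventing a cut elsewhere.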

@ccastane9
Author

@mlathara thank you so much for the clarification, I will try to break this chromosome into multiple intervals for the GenomicsDBImport and once more try to call variants.

Again, truly appreciate all of the help!

@ccastane9
Author

@nalinigans @mlathara it seems that breaking into intervals during the GenomicsDBImport is solving the problem and allowing me to joint call variants.

Thanks for all of the help in this!

@ccastane9
Author

ccastane9 commented Jan 20, 2021

Actually, it seems it will let me call variants for part 1 of my chromosome, although I seem to get empty files for the part 2 and part 3 databases when trying to joint call the variants in those intervals. In my GenotypeGVCFs script I also specified intervals that match those used with GenomicsDBImport. GenotypeGVCFs itself does seem to be running without throwing errors, though, and may just be taking a while to start processing variants.

Edit: it was just taking a while!
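For readers reproducing this setup, the sketch below assembles one GenotypeGVCFs command per part, with -L matching the interval each workspace was imported with. It only builds argument lists and runs nothing. The helper name, the workspace and output names, the reference path, and the interval endpoints are all placeholders; only the flags themselves (--java-options, --reference, -V gendb://, -L, -O) appear in this thread.

```python
def genotype_command(gatk, reference, workspace, interval, output, heap="16g"):
    """Return the argument list for one GenotypeGVCFs run restricted to
    the interval that `workspace` was imported with (hypothetical helper)."""
    return [
        gatk, "--java-options", f"-Xmx{heap}",
        "GenotypeGVCFs",
        "--reference", reference,
        "-V", f"gendb://{workspace}",
        "-L", interval,
        "-O", output,
    ]


if __name__ == "__main__":
    # Placeholder part names and intervals; use the ones from your own import.
    parts = {
        "3_part1": "3:1-40450000",
        "3_part2": "3:40450001-80000000",
        "3_part3": "3:80000001-121351753",
    }
    for workspace, interval in parts.items():
        cmd = genotype_command(
            "gatk", "reference.fa", workspace, interval, f"{workspace}.vcf.gz"
        )
        print(" ".join(cmd))
```

Keeping the interval and workspace paired in one mapping makes it harder to query a workspace with an interval it was not imported with.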

@dwuab

dwuab commented Aug 17, 2021

What is the status of this issue? I have encountered exactly the same error with GATK 4.2.1.0.

@mlathara
Contributor

@dwuab which issue/error are you specifically referring to? As indicated in the last message before you posted, the previous user was able to use GenomicsDBImport and GenotypeGVCFs after following our suggestions to break up large chromosomes into smaller intervals.

@dwuab

dwuab commented Aug 29, 2021

@mlathara The workaround of using multiple intervals does work (it took me more than one week to confirm, implying how annoying this bug could be), but several versions later, GATK still has this issue and could not even produce an informative error message. I could not find any mention of this issue in GATK's documentation. Is this bug going to be fixed?

@nalinigans
Collaborator

@dwuab, we are making some performance improvements to GenomicsDB and are still in the testing stage. Could you try gatk from this branch https://github.com/broadinstitute/gatk/tree/genomicsdb_142 to import large intervals (the ones that were problematic before) and let us know how it goes?

@dwuab

dwuab commented Oct 16, 2021

@nalinigans Thanks. I tried the genomicsdb_142 branch. While I did not try it on chr1 and chr2, I tried it on chrX. With GATK 4.2.1.0, I encountered the problem described above while importing and joint-calling chrX, which is a bit weird since I had no problem with chr3, 4, etc. Now with genomicsdb_142, chrX has been imported with only one interval, and joint-calling is running fine at this moment.
