How to avoid java.lang.ArrayIndexOutOfBoundsException when indexing a vcf.gz file? #8747

erah1 opened this issue Mar 21, 2024 · 1 comment
erah1 commented Mar 21, 2024

Hello,

Could you help me with this? I ran this code:

prg=/home/user1/Programs/gatk-4.5.0.0
log_dir=/home/user1/Programs/logs
java -Xmx64g -XX:ParallelGCThreads=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true \
     -jar ${prg}/gatk-package-4.5.0.0-local.jar IndexFeatureFile -I ${dir}/snp_allsamples.vcf.gz \
     --output snp_allsamples.vcf.tbi \
     2>${log_dir}/snp_allsamples_gvcf_index.err

and received the following error message:

09:36:35.254 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/user1/Programs/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:36:35.386 INFO  IndexFeatureFile - ------------------------------------------------------------
09:36:35.389 INFO  IndexFeatureFile - The Genome Analysis Toolkit (GATK) v4.5.0.0
09:36:35.389 INFO  IndexFeatureFile - For support and documentation go to https://software.broadinstitute.org/gatk/
09:36:35.389 INFO  IndexFeatureFile - Executing as user1@xxx.xx on Linux v5.4.0-150-generic amd64
09:36:35.389 INFO  IndexFeatureFile - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src
09:36:35.389 INFO  IndexFeatureFile - Start Date/Time: March 21, 2024 at 9:36:35 a.m. CST
09:36:35.390 INFO  IndexFeatureFile - ------------------------------------------------------------
09:36:35.390 INFO  IndexFeatureFile - ------------------------------------------------------------
09:36:35.390 INFO  IndexFeatureFile - HTSJDK Version: 4.1.0
09:36:35.391 INFO  IndexFeatureFile - Picard Version: 3.1.1
09:36:35.391 INFO  IndexFeatureFile - Built for Spark Version: 3.5.0
09:36:35.391 INFO  IndexFeatureFile - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:36:35.391 INFO  IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:36:35.392 INFO  IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:36:35.392 INFO  IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:36:35.392 INFO  IndexFeatureFile - Deflater: IntelDeflater
09:36:35.392 INFO  IndexFeatureFile - Inflater: IntelInflater
09:36:35.392 INFO  IndexFeatureFile - GCS max retries/reopens: 20
09:36:35.392 INFO  IndexFeatureFile - Requester pays: disabled
09:36:35.393 INFO  IndexFeatureFile - Initializing engine
09:36:35.393 INFO  IndexFeatureFile - Done initializing engine
09:36:35.502 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/user1/snp_allsamples.vcf.gz
09:36:35.518 INFO  ProgressMeter - Starting traversal
09:36:35.518 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Records Processed   Records/Minute
09:36:36.979 INFO  IndexFeatureFile - Shutting down engine
[March 21, 2024 at 9:36:36 a.m. CST] org.broadinstitute.hellbender.tools.IndexFeatureFile done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1241513984
java.lang.ArrayIndexOutOfBoundsException: Index 37451 out of bounds for length 37451
        at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:102)
        at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
        at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:92)
        at htsjdk.tribble.index.IndexFactory.createIndex(IndexFactory.java:529)
        at htsjdk.tribble.index.IndexFactory.createTabixIndex(IndexFactory.java:476)
        at htsjdk.tribble.index.IndexFactory.createTabixIndex(IndexFactory.java:502)
        at htsjdk.tribble.index.IndexFactory.createIndex(IndexFactory.java:403)
        at org.broadinstitute.hellbender.tools.IndexFeatureFile.createAppropriateIndexInMemory(IndexFeatureFile.java:109)
        at org.broadinstitute.hellbender.tools.IndexFeatureFile.doWork(IndexFeatureFile.java:75)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
        at org.broadinstitute.hellbender.Main.main(Main.java:306)

Thank you

@evanizer8

This is probably too late to be of any help, but I had the exact same issue, down to the index reported as out of bounds. Maybe others will stumble upon this and find it useful, as I have. I found some pertinent info here:
https://gatk.broadinstitute.org/hc/en-us/community/posts/12862204385051-Is-it-feasible-to-use-the-extracted-vcf-gz-file-for-CombineGVCFs-and-GenotypeGVCFs

Though it seems they never got around to adding a more useful error message. Anyway, I did as advised and split the chromosomes (I'm working with barley, and the sequence lengths are > 2^29, which is the largest coordinate a tabix index can represent).
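
For anyone wondering where the number in the exception comes from, here is a minimal sketch of the binning function (`reg2bin`) as given in the SAM/tabix specifications. It assumes that htsjdk's `BinningIndexBuilder` uses this same scheme and sizes its per-bin array at 37451, which is what the error message suggests; I have not checked the htsjdk source to confirm that. The constant and helper names below are my own.

```python
# Sketch of the standard binning scheme (min_shift=14, depth=5) from the
# SAM/tabix specifications. Names are illustrative, not htsjdk identifiers.

TABIX_MAX_COORD = 1 << 29   # 536,870,912: largest coordinate the scheme covers
ARRAY_LENGTH = 37451        # taken from the exception text in this issue


def reg2bin(beg: int, end: int) -> int:
    """Return the bin for the 0-based, half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)   # 16 kbp bins, offset 4681
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)   # 128 kbp bins, offset 585
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)    # 1 Mbp bins, offset 73
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)    # 8 Mbp bins, offset 9
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)    # 64 Mbp bins, offset 1
    return 0


def snp_bin(pos_1based: int) -> int:
    """Bin for a single-base record at a 1-based VCF POS."""
    return reg2bin(pos_1based - 1, pos_1based)


def overflows_array(pos_1based: int, array_length: int = ARRAY_LENGTH) -> bool:
    """True if the computed bin would index past an array of that length."""
    return snp_bin(pos_1based) >= array_length


if __name__ == "__main__":
    for pos in (1, TABIX_MAX_COORD, 536_903_681):
        print(pos, snp_bin(pos), overflows_array(pos))
```

Under those assumptions, a densely called chromosome longer than 2^29 bp would hit bin 37451 as soon as the records pass roughly 536.9 Mbp, which would explain why two unrelated datasets report the identical index.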

But when I try indexing the bgzipped second "halves" of each chromosome with IndexFeatureFile, I get the same message again! When they're not bgzipped, however, it actually works.
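
One possible explanation, which is a guess and not something confirmed in this thread: as far as I understand, IndexFeatureFile builds a tabix (.tbi) index for bgzipped input but a Tribble (.idx) index for plain-text VCF, and only the tabix index uses the binning scheme. If the second halves still carry their original POS values, they would still exceed the limit. The sketch below simply reports the largest POS per contig so this can be checked; the function names are hypothetical and it only assumes a tab-delimited VCF, plain or bgzipped.

```python
import gzip
import sys
from typing import Dict, Iterable

TABIX_MAX_COORD = 1 << 29  # 536,870,912


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def max_pos_per_contig(lines: Iterable[str]) -> Dict[str, int]:
    """Return the largest 1-based POS seen for each CHROM."""
    maxima: Dict[str, int] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        p = int(pos)
        if p > maxima.get(chrom, 0):
            maxima[chrom] = p
    return maxima


def contigs_over_limit(lines: Iterable[str],
                       limit: int = TABIX_MAX_COORD) -> Dict[str, int]:
    """Return only the contigs whose largest POS exceeds the limit."""
    return {c: p for c, p in max_pos_per_contig(lines).items() if p > limit}


if __name__ == "__main__":
    with open_vcf(sys.argv[1]) as handle:
        for contig, pos in contigs_over_limit(handle).items():
            print(f"{contig}\tmax POS {pos} exceeds {TABIX_MAX_COORD}")
```

If this prints nothing for a file that still fails to index, the cause is something else and the guess above is wrong.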
