Enable CSI index reading for bgzipped VCF files #6110
Comments
@tfenne I don't think htsjdk supports CSI for VCF; I'm pretty sure it was only wired up for BAM. Defaulting index creation to off when the references are too big is a good idea.
Having the same issue here. I follow essentially the same steps, using .csi indexing for BAM files and .idx indexing for .g.vcf files. I would have to carry ~1000 uncompressed *.g.vcf files into GenomicsDBImport, and I simply don't have the disk space for that manoeuvre.
This is still a problem. Is anyone working on it?
I have the same problem with samtools indexing: to use the index with GATK I need it in .bai format, because the .csi index is not supported in GATK!
Is it time to solve this?
I have the same problem when I run BaseRecalibrator.
We also have the same issue.
Feature request
Tool(s) or class(es) involved
Any tools that read VCFs, but specifically GenotypeGVCFs
Description
I'm working with genomes that have chromosomes too long for both the BAI and tabix index formats. For BAMs, I work around the problem by disabling on-the-fly index generation in Picard/GATK-based tools and then running `samtools index --csi` to generate a CSI index, which GATK will happily use.

Then I ran into exactly the same problem with VCFs. If I'm using bgzipped VCFs, I have to disable index creation in GATK, because it fails when it hits a feature with a position higher than `512 * 2^20` (2^29 = 536,870,912). It's possible to then generate a CSI index using (surprisingly) `tabix`, but I can't find a way to get GATK to detect and use a CSI index for a bgzipped VCF. I think almost everything that is needed is already there in HTSJDK; it's probably just a case of auto-detecting the .csi index.

For now I'm working around this by using uncompressed VCFs, as the .idx format doesn't have the same limit. But it's not great having uncompressed VCFs.
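The auto-detection step could look roughly like the following Python sketch. It assumes the common sidecar naming conventions (`sample.vcf.gz.tbi`, `sample.vcf.gz.csi`, `sample.vcf.idx`). The function name `find_vcf_index` and the preference for .tbi over .csi are hypothetical; this is not how GATK or HTSJDK actually resolve indexes.

```python
import os
from typing import Optional

# Hypothetical lookup order: for bgzipped VCFs try a tabix index first, then
# fall back to CSI; for plain-text VCFs look for a Tribble .idx index.
BGZF_INDEX_SUFFIXES = (".tbi", ".csi")
PLAIN_INDEX_SUFFIXES = (".idx",)


def find_vcf_index(vcf_path: str) -> Optional[str]:
    """Return the path of an existing index next to vcf_path, or None."""
    if vcf_path.endswith(".gz"):  # treat *.gz as a bgzipped VCF
        suffixes = BGZF_INDEX_SUFFIXES
    else:
        suffixes = PLAIN_INDEX_SUFFIXES
    for suffix in suffixes:
        candidate = vcf_path + suffix
        if os.path.exists(candidate):
            return candidate
    return None
```

With only `sample.vcf.gz.csi` on disk, `find_vcf_index("sample.vcf.gz")` returns the .csi path instead of failing for lack of a .tbi.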
Bonus: it would be nice if GATK automatically turned off index creation for bgzipped VCFs when any sequence in the sequence dictionary is longer than tabix supports.
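To illustrate the limit and the suggested default, here is a small Python sketch. It assumes the binning scheme described in the CSI specification, where BAI and tabix correspond to `min_shift = 14` and `depth = 5`, so the largest indexable coordinate is 2^(14 + 3·5) = 2^29. The functions `csi_depth_for` and `should_create_tabix_index` are hypothetical illustrations, not GATK, HTSJDK, or htslib APIs.

```python
# BAI and tabix use a fixed binning scheme: 16 kbp leaf bins (min_shift = 14)
# and 5 levels (depth = 5). CSI makes both parameters configurable.
BAI_TABIX_MIN_SHIFT = 14
BAI_TABIX_DEPTH = 5


def max_indexable_length(min_shift: int, depth: int) -> int:
    """Coordinate span covered by a binning index with these parameters."""
    return 1 << (min_shift + 3 * depth)


# 2**29 == 512 * 2**20 == 536,870,912
TABIX_MAX_LENGTH = max_indexable_length(BAI_TABIX_MIN_SHIFT, BAI_TABIX_DEPTH)


def csi_depth_for(length: int, min_shift: int = BAI_TABIX_MIN_SHIFT) -> int:
    """Smallest CSI depth whose binning scheme covers a sequence of this length."""
    depth = 0
    while max_indexable_length(min_shift, depth) < length:
        depth += 1
    return depth


def should_create_tabix_index(contig_lengths: dict) -> bool:
    """Hypothetical default: build a tabix index only if every contig fits."""
    return all(length <= TABIX_MAX_LENGTH for length in contig_lengths.values())
```

For example, a 700 Mbp contig exceeds the tabix limit, so `should_create_tabix_index` returns `False`, while `csi_depth_for(700_000_000)` returns 6, one level deeper than BAI/tabix.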