I am testing elPrep, mostly for the variant-calling step: hoping to use it as a drop-in replacement for GATK4. I am using a small test BAM, paired-end reads, some mates map to different chromosomes. I noticed that in sfm mode, every read whose mate maps to a different chrom gets duplicated in the output.bam. This doesn't affect the GVCFs produced by elPrep, but the resulting BAM is not spec-compliant, and I suspect this may affect other tools working on the elPrep-produced BAMs. Details below.
Run elprep in sfm mode:
elprep sfm testin.bam testout_sfm.bam --haplotypecaller test_sfm.g.vcf.gz --reference /data/HumanGenome/hs38DH.elfasta
Run elprep in filter mode:
elprep filter testin.bam testout_filter.bam --haplotypecaller test_filter.g.vcf.gz --reference /data/HumanGenome/hs38DH.elfasta
Compare small BAMs sfm vs filter:
/home/nthierry/Software/BamUtil/bamUtil_1.0.13/bamUtil/bin/bam diff --in1 testout_sfm.bam --in2 testout_filter.bam
-> testout_sfm.bam contains duplicate lines for each read whose mate maps to a different chromosome.
Question: does this impact the elprep GVCFs?
zdiff test_sfm.g.vcf.gz test_filter.g.vcf.gz
-> only diff is the header ##elPrepCommandLine , no consequence for the elprep variant-caller.
The bug doesn't occur if I don't call variants:
elprep filter testin.bam ttt_filter.bam
elprep sfm testin.bam ttt_sfm.bam
/home/nthierry/Software/BamUtil/bamUtil_1.0.13/bamUtil/bin/bam diff --in1 ttt_sfm.bam --in2 ttt_filter.bam
-> no difference
The SAM spec says:
[QNAME] In a SAM file, a read may occupy multiple alignment lines, when its alignment is chimeric or when multiple mappings are given.
[FLAG] For each read/contig in a SAM file, it is required that one and only one line associated with the read satisfies ‘FLAG & 0x900 == 0’
-> the elPrep-produced BAMs with duplicate lines don't seem compliant (AFAICT?)
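The FLAG rule quoted above can be checked mechanically. Below is a minimal sketch, not part of elPrep or bamUtil: it reads SAM text lines and counts, for each read, how many lines satisfy FLAG & 0x900 == 0. The function names are hypothetical, and it assumes a "read" is identified by its QNAME plus the first/last segment bits (0x40/0x80), so the two mates of a pair are counted separately.

```python
from collections import Counter


def primary_line_counts(sam_lines):
    """Count lines with FLAG & 0x900 == 0 per read.

    Assumption: a read is keyed by (QNAME, FLAG & 0xC0), i.e. the name
    plus the first/last segment bits, so mates are tracked separately.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        key = (qname, flag & 0xC0)
        counts[key] += 0  # register the read even if it has no primary line
        if flag & 0x900 == 0:
            counts[key] += 1
    return counts


def spec_violations(sam_lines):
    """Return the reads that do not have exactly one primary line."""
    return {k: n for k, n in primary_line_counts(sam_lines).items() if n != 1}
```

One possible way to use it is to pipe the text output of `samtools view -h testout_sfm.bam` into a small script that passes `sys.stdin` to `spec_violations`; a read reported with a count of 2 would correspond to the duplicated lines described above.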
Regards,
Nicolas
This is a bug I am aware of and have already fixed internally, but I have not released the fix yet.
You are correct that the bug only occurs when using the haplotype caller and that the VCF output is not affected. It occurs because the final BAM is incorrectly merged from the intermediate split files.