Problems running Mutect2 #6695
Comments
Does the mutect2 directory exist in your current dir?
Mutect2 is in a conda environment, and my working directory is different from that path.
@ashwini06 Could you post the entire command line you are using? Some of it appears to have been cut off.
@ashwini06 Following up on this to see if you are still experiencing problems.
@fleharty: Thanks for the follow-up. Sorry I missed your previous reply. Yes, the problem with Mutect2 still exists. gatk4 exists in my conda environment path.
Here is my full command line:
@avalind This appears to be a different error from the one you were previously encountering. Is there a way that you can share your bam? Also, are you sure that you intend to have insertion and deletion qualities? This is something we haven't been using for a few years now.
@fleharty: You can download the bam file using the shared link. https://ki.box.com/s/b9fe0854eccclz85vvkktd2qfqquyq71
This bam appears to be malformed and it fails Picard ValidateSamFile. I think you'll need to examine the earlier stages of your pipeline that produce your bam to ensure you get a correctly formed bam. I'm going to close this ticket now since this doesn't appear to be an issue with Mutect2.
(base) wm462-624:Downloads fleharty$ java -jar $PICARD ValidateSamFile I=concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam
********** NOTE: Picard's command line syntax is changing.
********** For more information, please see:
********** The command line looks like this in the new syntax:
********** ValidateSamFile -I concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam
11:25:52.673 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/fleharty/resources/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
@avalind I got an e-mail saying that you ran picard and had no errors, but I don't see that comment here.
@fleharty I think you meant to tag @ashwini06 (the creator of this issue). I also received that email, maybe @ashwini06 deleted the comment shortly after posting it?
@fleharty @avalind I tried validating my bam file and I don't see any errors. Even samtools flagstat works fine on my bam file. Do you still think my bam file is malformed? PS: @fleharty used Picard version 2.20.4-SNAPSHOT, whereas I used v2.23.2 to run Picard ValidateSamFile.
Bumping this since I ran into the same error while helping QC a colleague's data. Running GATK 4.1.8.1 produces the following: https://www.dropbox.com/s/2uleabl53dmg9y3/Screenshot%202020-07-28%2000.35.45.png
This is on targeted capture data (Twist custom capture) run through our core facility's Sentieon pipeline, using the 'consensus' reads mapped to 1kg_grch37; using the raw reads works fine. I'm not very familiar with Sentieon's pipelines, but the steps to generate the UMI consensus reads are described at https://support.sentieon.com/appnotes/umi/.
At first I thought that the discrepancy between @fleharty's ValidateSamFile run and yours, @ashwini06, could be because the newer version of Picard uses an updated version of htsjdk (v2.23.0), but that is the same version of htsjdk that's included in GATK 4.1.8.1, so it seems unlikely. Walking through the commits between Picard 2.22.8 (the one bundled with GATK 4.1.8.1) and 2.23.2 doesn't (at least at first glance) show any commits changing code that could explain the difference in behaviour.
After more digging around, it seems that in the case of partial alignment (i.e. hard-clipped bases), the BD and BI tags that Sentieon just copies from the consensus fastq aren't trimmed to the actual length of the aligned sequence. They are therefore too long, and it's this that causes problems. As these are non-standard tags, the SAM/BAM format specification doesn't say anything on whether their length must equal that of the aligned segment of bases, but it clearly doesn't make any sense to have quality data on bases that are not part of the alignment (= hard clipped), so IMHO the solution here would be for Sentieon to fix their tool. I've written a small utility that trims the BD and BI tags (based on the CIGAR string) to have the same length as the actual aligned segment of the read: https://github.com/avalind/doppelganger.
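For anyone who wants to see the idea in code, here is a minimal sketch of CIGAR-based trimming on SAM text records (e.g. the output of `samtools view -h`). It is not the doppelganger implementation, just an illustration. The function names are made up for this example, and it assumes that an untrimmed BD/BI value covers the full consensus read in the same orientation as the stored SEQ, which may not hold for reverse-strand reads.

```python
import re

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def hard_clip_lengths(cigar):
    """Return (leading, trailing) hard-clipped base counts of a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    leading = ops[0][0] if ops and ops[0][1] == "H" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "H" else 0
    return leading, trailing


def stored_sequence_length(cigar):
    """Bases kept in SEQ: M, I, S, = and X consume the query; H does not."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")


def trim_tag(value, cigar):
    """Cut the hard-clipped ends off a per-base tag value (hypothetical helper)."""
    if cigar == "*":  # no alignment information, leave the tag alone
        return value
    expected = stored_sequence_length(cigar)
    if len(value) == expected:  # already matches SEQ, nothing to do
        return value
    leading, trailing = hard_clip_lengths(cigar)
    if len(value) != leading + expected + trailing:
        raise ValueError(f"tag length {len(value)} does not fit CIGAR {cigar}")
    return value[leading:leading + expected]


def trim_sam_line(line, tags=("BD", "BI")):
    """Trim the given string-typed tags in one tab-separated SAM record."""
    fields = line.rstrip("\n").split("\t")
    cigar = fields[5]
    for i in range(11, len(fields)):  # optional fields start at column 12
        name, typ, value = fields[i].split(":", 2)
        if name in tags and typ == "Z":
            fields[i] = f"{name}:Z:{trim_tag(value, cigar)}"
    return "\t".join(fields)
```

With a CIGAR of `3H5M2H`, a ten-character BD value loses its first three and last two characters, so it ends up the same length as the five stored bases. Header lines (starting with `@`) should be passed through unchanged rather than fed to `trim_sam_line`.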
I have problems running gatk Mutect2.
gatk version
command-line
gatk Mutect2 -R /home/proj/stage/cancer/reference/GRCh37/genome/human_g1k_v37_decoy.fasta -L /home/proj/stage/cancer/reference/target_capture_bed/production/balsamic/gicfdna_3.1_hg1
Error