You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
To improve parallelism, I try to split my all chrome bed to 25 files.Parallel running the flowing command brings me signficient performance improvement.
If your all_chrome_bed is identical to the concatenation of chr1_bed, chr2_bed etc I would have expected no discrepancies. Fortunately, the discrepancies are very small, and the annotations with the discrepancies will not exist in the next version of Mutect, which we will be releasing in a few months.
I'm going to close this issue because it's not a bug. Several things in the code of Mutect2 and FilterMutectCalls adapt as they traverse the genome and it's possible that some learned parameter shifts minutely. For example, the assembly graph pruning algorithm uses knowledge of previously assembled regions to better distinguish between errors and somatic variation. It's also possible that somewhere we forgot to give something a fixed random seed.
In full honesty, I wish that I knew exactly what causes the 3142 to become 3143, and I regret that I don't have time for it. Nonetheless, in principle it is not cause for alarm.
Bug Report
Affected tool(s) or class(es)
Mutect2
Affected version(s)
I test this problem in two versions, V4.1.4.1 and V4.3.0.0.They all have this problem.
Description
Following the recommendations of the 'Best Practice Workflows', I run mutect2 in the following command.
java -jar -Djava.io.tmpdir=${tmpDir} -Xms2g -Xmx16g
/mnt/bin/gatk-4.1.4.1/gatk-package-4.1.4.1-SNAPSHOT-local.jar Mutect2
--native-pair-hmm-threads 32
-R ${Fasta}
-I ${cancer_bam}
-I ${normal_bam}
--tumor-sample cancer --normal-sample normal
-L ${all_chrome_bed}
--bam-output ${bam_output}
-O ${vcf_output}
To improve parallelism, I try to split my all chrome bed to 25 files.Parallel running the flowing command brings me signficient performance improvement.
java -jar -Djava.io.tmpdir=${tmpDir} -Xms2g -Xmx16g
/mnt/bin/gatk-4.1.4.1/gatk-package-4.1.4.1-SNAPSHOT-local.jar Mutect2
--native-pair-hmm-threads 32
-R ${Fasta}
-I ${cancer_bam}
-I ${normal_bam}
--tumor-sample cancer --normal-sample normal
-L ${chr1_bed}
--bam-output ${chr1_bam_output}
-O ${chr1_vcf_output}
java -jar -Djava.io.tmpdir=${tmpDir} -Xms2g -Xmx16g
/mnt/bin/gatk-4.1.4.1/gatk-package-4.1.4.1-SNAPSHOT-local.jar Mutect2
--native-pair-hmm-threads 32
-R ${Fasta}
-I ${cancer_bam}
-I ${normal_bam}
--tumor-sample cancer --normal-sample normal
-L ${chr2_bed}
--bam-output ${chr2_bam_output}
-O ${chr2_vcf_output}
But when I examined the vcf results produced by both modes of operation, I found consistency issues.
Expected behavior
Let's focus on chromosome 2.I expect 100% consistency between the following two runs.
Actual behavior
21 44513281 . ACCACCG A . . AS_SB_TABLE=1282,1133|5,3;DP=3142;ECNT=2;MBQ=33,35;MFRL=173,197;MMQ=60,60;MPOS=25;NALOD=1.21;NLOD=161.38;POPAF=6.00;RPA=3,2;RU=CCACCG;STR;TLOD=11.75 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1:1810,7:4.456e-03:1817:555,1:575,4:1581,6:0|1:44513281_ACCACCG_A:44513281:980,830,4,3 0|0:605,1:3.469e-03:606:179,0:184,1:560,1:0|1:44513281_ACCACCG_A:44513281:302,303,1,0
21 44513281 . ACCACCG A . . AS_SB_TABLE=1282,1134|5,3;DP=3143;ECNT=2;MBQ=33,35;MFRL=172,197;MMQ=60,60;MPOS=25;NALOD=1.21;NLOD=161.08;POPAF=6.00;RPA=3,2;RU=CCACCG;STR;TLOD=11.75 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1:1811,7:4.462e-03:1818:555,1:572,4:1579,6:0|1:44513281_ACCACCG_A:44513281:982,829,4,3 0|0:605,1:3.474e-03:606:178,0:184,1:559,1:0|1:44513281_ACCACCG_A:44513281:300,305,1,0
Feature request
How to Scatter Gather correctly can ensure that the results are exactly the same.
The text was updated successfully, but these errors were encountered: