Using the native PairHMM in HaplotypeCaller can change the QUAL field vs. using the java LOGLESS_CACHING PairHMM #1572

droazen · 2016-03-11T15:55:57Z

Using the native PairHMM in the current version of the HaplotypeCaller can sometimes change the QUAL field in called variants. For example:

$ diff hc.vcf hc_javahmm.vcf 
79c79
< 20    10008221    .   T   C   3224.77 .       AC=2;AF=1.00;AN=2;DP=80;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.17;QD=27.78;SOR=0.850 GT:AD:DP:GQ:PL  1/1:0,80:80:99:3253,240,0
---
> 20    10008221    .   T   C   3224.61 .       AC=2;AF=1.00;AN=2;DP=80;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.17;QD=27.78;SOR=0.850 GT:AD:DP:GQ:PL  1/1:0,80:80:99:3253,240,0


droazen · 2016-03-11T15:58:05Z

@gspowley any thoughts on this?

droazen · 2016-03-11T16:03:48Z

To reproduce, check out the dr_runnable_haplotypecaller branch, then run (on a machine with AVX):

./gatk-launch HaplotypeCaller -R src/test/resources/large/human_g1k_v37.20.21.fasta -I src/test/resources/large/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam -O hc_native.vcf

and

./gatk-launch HaplotypeCaller -R src/test/resources/large/human_g1k_v37.20.21.fasta -I src/test/resources/large/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam -O hc_java.vcf -pairHMM LOGLESS_CACHING

Then: diff hc_java.vcf hc_native.vcf

gspowley · 2016-03-11T18:51:37Z

@droazen Yes, this is because the native PairHMM uses single-precision floating point with Flush To Zero (FTZ) enabled, while the Java PairHMM uses double precision without FTZ. I planned to address this when we integrate native PairHMM into HaplotypeCaller. It looks like the time is here.
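To make the single-versus-double precision point concrete, here is a minimal Java sketch. It is not GATK or GKL code; the class and method names are hypothetical and chosen only for illustration. It multiplies the same small probability repeatedly in `float` and in `double`: the `float` product drops out of the representable range and becomes exactly zero, while the `double` product is still a normal nonzero value. A likelihood computation that mixes many such products can therefore end up with slightly different totals depending on the precision used.

```java
// Illustrative sketch only; not GATK or GKL code. All names are hypothetical.
final class PrecisionSketch {

    // Multiply a probability by itself n times using single precision.
    static float productFloat(float p, int n) {
        float acc = 1.0f;
        for (int i = 0; i < n; i++) {
            acc *= p;
        }
        return acc;
    }

    // Same computation using double precision.
    static double productDouble(double p, int n) {
        double acc = 1.0;
        for (int i = 0; i < n; i++) {
            acc *= p;
        }
        return acc;
    }

    public static void main(String[] args) {
        // 0.1^50 is far below the smallest positive float, but well within double range.
        System.out.println("float  product: " + productFloat(0.1f, 50));
        System.out.println("double product: " + productDouble(0.1, 50));
    }
}
```

With FTZ enabled, the single-precision product would reach zero a few iterations earlier, because values are zeroed as soon as they become subnormal rather than after they underflow completely.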

For now, you can configure native PairHMM to use double precision and not use FTZ. With the diff below, the VCFs from native and Java PairHMM are exactly the same.

In the future, we need to enable FTZ in the Java PairHMM and provide the option to use single precision or double precision in native PairHMM.

--- i/src/main/cpp/VectorLoglessPairHMM/LoadTimeInitializer.cc
+++ w/src/main/cpp/VectorLoglessPairHMM/LoadTimeInitializer.cc
@@ -23,7 +23,7 @@ LoadTimeInitializer::LoadTimeInitializer()            //will be called when library is loa
   //Very important to get good performance on Intel processors
   //Function: enabling FTZ converts denormals to 0 in hardware
   //Denormals cause microcode to insert uops into the core causing big slowdown
-  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
+  //  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

   //Profiling: times for compute and transfer (either bytes copied or pointers copied)
   m_compute_time = 0;
diff --git i/src/main/cpp/VectorLoglessPairHMM/org_broadinstitute_hellbender_utils_pairhmm_VectorLoglessPairHMM.cc w/src/main/cpp/VectorLoglessPairH
index f45153e..70cf54f 100644
--- i/src/main/cpp/VectorLoglessPairHMM/org_broadinstitute_hellbender_utils_pairhmm_VectorLoglessPairHMM.cc
+++ w/src/main/cpp/VectorLoglessPairHMM/org_broadinstitute_hellbender_utils_pairhmm_VectorLoglessPairHMM.cc
@@ -6,7 +6,7 @@

 using namespace std;

-bool use_double = false;
+bool use_double = true;

 //Should be called only once for the whole Java process - initializes field ids for the classes JNIReadDataHolderClass
 //and JNIHaplotypeDataHolderClass
diff --git i/src/main/java/org/broadinstitute/hellbender/utils/pairhmm/VectorLoglessPairHMM.java w/src/main/java/org/broadinstitute/hellbender/utils
index 18370b0..74a1dad 100644
--- i/src/main/java/org/broadinstitute/hellbender/utils/pairhmm/VectorLoglessPairHMM.java
+++ w/src/main/java/org/broadinstitute/hellbender/utils/pairhmm/VectorLoglessPairHMM.java
@@ -1,6 +1,7 @@
 package org.broadinstitute.hellbender.utils.pairhmm;

-import org.apache.log4j.Logger;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
 import org.broadinstitute.hellbender.exceptions.UserException;
 import org.broadinstitute.hellbender.utils.genotyper.ReadLikelihoods;
 import org.broadinstitute.hellbender.utils.genotyper.LikelihoodMatrix;
@@ -20,7 +21,7 @@ import java.util.Map;
  */
 public final class VectorLoglessPairHMM extends LoglessPairHMM {

-    final static Logger logger = Logger.getLogger(VectorLoglessPairHMM.class);
+    private static final Logger logger = LogManager.getLogger(VectorLoglessPairHMM.class);
     final static Boolean runningOnMac = System.getProperty("os.name", "unknown").toLowerCase().startsWith("mac");
     long threadLocalSetupTimeDiff = 0;
     long pairHMMSetupTime = 0;

droazen · 2016-03-11T19:45:11Z

@gspowley Excellent, thanks for the detailed reply! I didn't realize that the FTZ issue hadn't yet been addressed.

I'll leave the native code as-is in my HC branch for now and let you address the issue as you see fit in a separate branch.

droazen · 2016-08-01T14:45:41Z

The solution is to set FTZ in main(), as discussed in #1771.

erniebrau · 2017-06-09T13:16:13Z

@droazen Would it make sense to make the FTZ setting a CLI argument? For example, something like --disable-flush-to-zero to disable FTZ, which would be on by default.

droazen · 2017-06-09T14:44:22Z

@erniebrau Yes, that would certainly make sense.

erniebrau · 2017-06-09T15:47:51Z

OK. I will submit a PR with this as soon as we release a version of GKL with the necessary FTZ functionality.


erniebrau · 2017-08-15T17:58:15Z

@droazen I cannot reproduce this issue. I've tried doing what you say in your comment above but the results of running Java and (FTZ-enabled) AVX versions of HC are identical. I also tried running analogous experiments using several different BAM files in the repo but, again, got identical results when comparing Java vs AVX.

I need a pair of files whose results are different between the implementations, so that I can confirm that enabling FTZ in the Java PairHMM fixes this issue. Any suggestions as to where I can find such files?

sooheelee · 2018-09-17T19:54:42Z

Was the --disable-flush-to-zero CLI option merged at some point? Can someone please point to the PR? Thank you.

droazen · 2018-09-17T20:16:21Z

@sooheelee No, I don't believe that this option was ever implemented.

gspowley · 2018-09-17T20:23:59Z

Right, the code to change the flush to zero flag is available in GKL, but the option to set the flush to zero flag from GATK was not implemented.

sooheelee · 2018-09-17T20:57:52Z

I have a researcher interested in understanding the WARN:

WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM

at https://gatkforums.broadinstitute.org/gatk/discussion/11433/haplotypecaller-warnings-depthpersamplehc#latest. I'd like to be able to elaborate on the matter. I see these parameters for HaplotypeCaller that seem to be related (please correct me if I am mistaken):

--native-pair-hmm-use-double-precision / NA
use double precision in the native pairHmm. This is slower but matches the java implementation better

and

--pair-hmm-implementation / -pairHMM
The PairHMM implementation to use for genotype likelihood calculations. The various implementations balance a tradeoff of accuracy and runtime.

The --pair-hmm-implementation argument is an enumerated type (Implementation), which can have one of the following values:

EXACT
ORIGINAL
LOGLESS_CACHING
AVX_LOGLESS_CACHING
AVX_LOGLESS_CACHING_OMP
EXPERIMENTAL_FPGA_LOGLESS_CACHING
FASTEST_AVAILABLE

where the former is false and the latter is FASTEST_AVAILABLE by default. Can some of these parameters change HaplotypeCaller's use of flush-to-zero?

droazen · 2018-09-17T21:19:49Z

@sooheelee "Flush to zero" or FTZ is a CPU flag that causes extremely small floating-point values to be treated as 0. Enabling this flag improves performance and consistency across PairHMM implementations, without causing precision loss of any significance.
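As a rough illustration of what "treated as 0" means, the sketch below emulates the effect of FTZ on a single-precision value in plain Java. This is only an emulation under stated assumptions: the real flag is a CPU control setting applied by the hardware to the results of floating-point operations, standard Java has no API to set it, and the class and method names here are hypothetical.

```java
// Illustrative emulation only; real FTZ is applied by the CPU, not by code like this.
final class FlushToZeroSketch {

    // Returns a signed zero if x is subnormal (nonzero but smaller in magnitude
    // than Float.MIN_NORMAL); otherwise returns x unchanged.
    static float flushToZero(float x) {
        if (x != 0.0f && Math.abs(x) < Float.MIN_NORMAL) {
            return Math.copySign(0.0f, x);
        }
        return x;
    }

    public static void main(String[] args) {
        float subnormal = Float.MIN_NORMAL / 4.0f; // below the normal float range
        float normal = 1e-30f;                     // small, but still a normal float
        System.out.println("subnormal input -> " + flushToZero(subnormal));
        System.out.println("normal input    -> " + flushToZero(normal));
    }
}
```

Only values below `Float.MIN_NORMAL` (about 1.18e-38) are affected, which is why the resulting differences in the output are so small.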

ldgauthier · 2020-02-24T14:28:54Z

@droazen do we still care about this? Given that we've been moving to non-exact match integration tests and fuzzy comparisons in production, maybe this isn't worth the effort.
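For context, a fuzzy comparison of the kind mentioned here could look roughly like the following Java sketch. It is not the GATK test code: the class name, method names, and the tolerance values are all assumptions made for illustration. It reads the QUAL column from two tab-delimited VCF data lines and checks whether the values agree within a relative tolerance, using the QUAL values from the diff at the top of this issue (3224.77 and 3224.61).

```java
// Illustrative sketch only; not the GATK integration-test code. All names are hypothetical.
final class QualFuzzyCompare {

    // Extracts QUAL, the 6th tab-separated column of a VCF data line.
    // Assumes QUAL is numeric; a missing value (".") would throw NumberFormatException.
    static double qual(String vcfLine) {
        String[] fields = vcfLine.split("\t");
        return Double.parseDouble(fields[5]);
    }

    // True if a and b differ by at most relTol times the larger magnitude.
    static boolean approxEqual(double a, double b, double relTol) {
        return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        String nativeLine = "20\t10008221\t.\tT\tC\t3224.77\t.\tAC=2";
        String javaLine = "20\t10008221\t.\tT\tC\t3224.61\t.\tAC=2";
        double a = qual(nativeLine);
        double b = qual(javaLine);
        // The tolerances below are arbitrary example values, not project settings.
        System.out.println("within 1e-4: " + approxEqual(a, b, 1e-4));
        System.out.println("within 1e-6: " + approxEqual(a, b, 1e-6));
    }
}
```

The two QUAL values differ by about 5 parts in 100,000, so they pass with a relative tolerance of 1e-4 and fail with 1e-6.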

droazen added this to the alpha-2 milestone Mar 11, 2016

droazen added bug HaplotypeCaller labels Mar 11, 2016

droazen self-assigned this Mar 11, 2016

droazen modified the milestones: alpha-3, alpha-2 Jul 1, 2016

droazen assigned gspowley and unassigned droazen Aug 1, 2016

droazen mentioned this issue Aug 1, 2016

Determine why FTZ gets unset during integration tests #1771

Closed

droazen modified the milestones: beta, alpha-3 Mar 20, 2017

droazen modified the milestones: 4.0 release, beta May 30, 2017

erniebrau pushed a commit that referenced this issue Jun 27, 2017

Enabled flush-to-zero by default and added option to disable. Addresses #1572 (0f640a7)

erniebrau mentioned this issue Jun 27, 2017

Enabled flush-to-zero by default and added option to disable. Address… #3177

Closed

erniebrau assigned erniebrau and unassigned gspowley Aug 15, 2017

droazen removed this from the Engine-4.0 milestone Oct 17, 2017

droazen added GKL NativeLibraries and removed bug labels Oct 17, 2017
