Bug Report
Affected tool(s)
TargetMetricsCollector, HsMetricCollector, WgsMetrics
Affected version(s)
Latest public release version [3.20]
Latest development/master branch as of [7/30/24]
Description
There are two potential bugs that I have noted, both of which revolve around the FOLD_80_BASE_PENALTY metric.
The FOLD_80_BASE_PENALTY metric is defined as:
However the calculation for the FOLD_80_BASE_PENALTY does not actually filter out the zero-coverage targets from the histogram which it uses for its getPercentile() calculation. You can see this when you look at the code for the metric:
picard/src/main/java/picard/analysis/directed/TargetMetricsCollector.java
Lines 728 to 762 in d8d87c9
Issue 1 -- Base Coverage Percentile Function:
In line 731, if an interval is found to have 0 coverage across all its bases, the histogram which is later used to calculate the 20th percentile has bin '0' incremented by the length of the interval: highQualityDepthHistogram.increment(0, c.interval.length()). Thus, the highQualityDepthHistogram includes zero-coverage target bases. Once all bases in the given intervals have been counted towards the highQualityDepthHistogram, the histogram is then used in line 762 for the calculation of Fold80: metrics.FOLD_80_BASE_PENALTY = metrics.MEAN_TARGET_COVERAGE / highQualityDepthHistogram.getPercentile(0.2). As a result, the FOLD_80_BASE_PENALTY metric is not calculating the fold over-coverage necessary to raise 80% of the non-zero coverage target bases to the mean coverage, but rather the fold over-coverage necessary to raise 80% of all target bases to the mean coverage.
Issue 2 -- Mean Coverage Calculation:
At line 762, FOLD_80_BASE_PENALTY is defined to be metrics.MEAN_TARGET_COVERAGE / highQualityDepthHistogram.getPercentile(0.2). The metrics.MEAN_TARGET_COVERAGE, which for this calculation should be the mean coverage of the non-zero-coverage target bases, is calculated in line 754: metrics.MEAN_TARGET_COVERAGE = (double) totalCoverage / metrics.TARGET_TERRITORY. totalCoverage is the sum of all the depths at each target base, while metrics.TARGET_TERRITORY is defined at line 407: metrics.TARGET_TERRITORY = targetTerritory. targetTerritory, in turn, is defined in line 301 as: this.targetTerritory = Interval.countBases(uniqueTargets). In summary, metrics.MEAN_TARGET_COVERAGE is the mean coverage of all target bases, while the FOLD_80_BASE_PENALTY metric is defined as requiring the mean coverage of non-zero-coverage target bases. Although the output MEAN_TARGET_COVERAGE metric should independently be calculated including zero-coverage target bases, it should not be the value used for the calculation of the FOLD_80_BASE_PENALTY metric.
Steps to reproduce
Run CollectHsMetrics on any exome bam file that has any number of zero-coverage target bases. The output FOLD_80_BASE_PENALTY will not match what you get if you manually calculate the FOLD_80_BASE_PENALTY.
Minimal test case:
Download a sample exome .bam file, open it in IGV, and find a small interval (I would recommend ~30 bases) where some of the bases are at 0 coverage.
Copy the sample's reference exome target and bait interval list files, delete all of their intervals and add in the ~30 base interval that you found.
Download the respective Fasta/Fai reference files.
Run the gatk 4.5.0.0 or 4.6.0.0 Docker image.
Note the FOLD_80_BASE_PENALTY value in the HsMetrics output file.
Calculate the FOLD_80_BASE_PENALTY by hand for the interval.
Note that the two FOLD_80_BASE_PENALTY values are not equivalent.
Example for test case:
Using chr7:142,560,458-142,560,485.
This is the depth profile over that region shown in IGV:
- Before Modification:
- After Modification:
CollectHsMetrics output FOLD_80_BASE_PENALTY value: FOLD_80_BASE_PENALTY = undefined
Calculated expected FOLD_80_BASE_PENALTY output value:
- Excluding Zero-Coverage-Target-Bases:
- Including Zero-Coverage-Target-Bases:
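A hand calculation of this kind can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the depths are made up and are not the real profile of chr7:142,560,458-142,560,485, the class and method names (Fold80ByHand, percentile, fold80) are hypothetical, and the percentile uses a simple nearest-rank rule that may not match Histogram.getPercentile exactly.

```java
import java.util.Arrays;

class Fold80ByHand {
    /** Nearest-rank percentile: smallest depth such that at least fraction p of the bases are at or below it. */
    static int percentile(int[] sortedDepths, double p) {
        int index = (int) Math.ceil(p * sortedDepths.length) - 1;
        return sortedDepths[Math.max(index, 0)];
    }

    /** Mean depth divided by the 20th-percentile depth, optionally ignoring zero-coverage bases. */
    static double fold80(int[] depths, boolean excludeZeroCoverage) {
        int[] kept = Arrays.stream(depths)
                .filter(d -> !excludeZeroCoverage || d > 0)
                .sorted()
                .toArray();
        if (kept.length == 0) {
            return Double.NaN; // nothing left to average
        }
        double mean = Arrays.stream(kept).average().orElse(Double.NaN);
        // Floating-point division: a 20th percentile of 0 gives Infinity, not an exception.
        return mean / percentile(kept, 0.2);
    }

    public static void main(String[] args) {
        // Hypothetical per-base depths for a 10-base interval (assumption, not real data).
        int[] depths = {0, 0, 0, 8, 8, 10, 10, 10, 12, 12};
        System.out.println("excluding zero-coverage bases: " + fold80(depths, true));
        System.out.println("including zero-coverage bases: " + fold80(depths, false));
    }
}
```

With these made-up depths, excluding the zero-coverage bases gives 10 / 8 = 1.25, while including them puts the 20th percentile at 0, so the ratio is not finite.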
Conclusion:
Since the two FOLD_80_BASE_PENALTY output values (1.136, undefined) are not equal, there must be an error in the FOLD_80_BASE_PENALTY calculation regarding zero-coverage target bases.
Expected behavior
The metrics collector returns the FOLD_80_BASE_PENALTY value required to raise 80% of the bases in the non-zero coverage target region to the mean coverage of that non-zero coverage target region.
Actual behavior
The metrics collector returns the FOLD_80_BASE_PENALTY value required to raise 80% of the bases in the target region to the mean coverage of that target region.
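The difference between the expected and actual behavior can be illustrated with a small stand-in for the depth histogram. This is a rough sketch, not Picard code: DepthHistogramSketch and its methods are hypothetical, the bin counts are invented, and the cumulative-count percentile is an assumption about how a histogram percentile might be computed. It follows the reading of the metric used in this report, where both the mean and the 20th percentile are taken over bases with non-zero coverage.

```java
import java.util.Map;
import java.util.TreeMap;

class DepthHistogramSketch {
    // depth -> number of target bases at that depth (simplified stand-in for the collector's histogram)
    private final TreeMap<Integer, Long> bins = new TreeMap<>();

    void increment(int depth, long bases) {
        bins.merge(depth, bases, Long::sum);
    }

    /** Mean depth over the 20th-percentile depth, over all bases or only bases with depth > 0. */
    double fold80(boolean excludeZeroCoverage) {
        long totalBases = 0;
        long totalCoverage = 0;
        for (Map.Entry<Integer, Long> bin : bins.entrySet()) {
            if (excludeZeroCoverage && bin.getKey() == 0) continue;
            totalBases += bin.getValue();
            totalCoverage += bin.getKey() * bin.getValue();
        }
        if (totalBases == 0) return Double.NaN;
        double mean = (double) totalCoverage / totalBases;

        // Walk the bins in ascending depth until 20% of the counted bases are covered.
        long seen = 0;
        for (Map.Entry<Integer, Long> bin : bins.entrySet()) {
            if (excludeZeroCoverage && bin.getKey() == 0) continue;
            seen += bin.getValue();
            if ((double) seen / totalBases >= 0.2) return mean / bin.getKey();
        }
        return Double.NaN; // not reached when totalBases > 0
    }

    public static void main(String[] args) {
        DepthHistogramSketch h = new DepthHistogramSketch();
        h.increment(0, 30);  // a fully uncovered 30-base interval lands in bin 0
        h.increment(20, 40); // invented counts for the covered bases
        h.increment(25, 50);
        h.increment(40, 10);
        System.out.println("expected (non-zero bases only): " + h.fold80(true));
        System.out.println("actual (all target bases):      " + h.fold80(false));
    }
}
```

In this invented example the non-zero-only calculation yields 24.5 / 20 = 1.225, whereas counting bin 0 makes the 20th percentile 0 and the ratio non-finite.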