FastQC: Handle NaN in Sequence Content Plot
Closes #1246
ewels committed Mar 31, 2021
1 parent 744d666 commit 414fcc5
Showing 2 changed files with 12 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,8 @@
     - Fixed bug where `QUAL` value `.` would crash MultiQC ([#1400](https://github.com/ewels/MultiQC/issues/1400))
 - **bowtie2**
     - Fix bug where HiSAT2 paired-end bar plots were missing unaligned reads ([#1230](https://github.com/ewels/MultiQC/issues/1230))
+- **FastQC**
+    - Replace `NaN` with `0` in the _Per Base Sequence Content_ plot to avoid crashing the plot ([#1246](https://github.com/ewels/MultiQC/issues/1246))
 - **Picard**
     - Fixed bug in `ValidateSamFile` module where additional whitespace at the end of the file would cause MultiQC to crash ([#1397](https://github.com/ewels/MultiQC/issues/1397))
 - **Somalier**
11 changes: 10 additions & 1 deletion multiqc/modules/fastqc/fastqc.py
@@ -15,6 +15,7 @@
 import io
 import json
 import logging
+import math
 import os
 import re
 import zipfile
@@ -465,14 +466,22 @@ def sequence_content_plot(self):
                 }
             except KeyError:
                 pass
+
             # Old versions of FastQC give counts instead of percentages
             for b in data[s_name]:
                 tot = sum([data[s_name][b][base] for base in ["a", "c", "t", "g"]])
                 if tot == 100.0:
-                    break
+                    break  # Stop loop after one iteration if summed to 100 (percentages)
                 else:
                     for base in ["a", "c", "t", "g"]:
                         data[s_name][b][base] = (float(data[s_name][b][base]) / float(tot)) * 100.0
+
+            # Replace NaN with 0
+            for b in data[s_name]:
+                for base in ["a", "c", "t", "g"]:
+                    if math.isnan(float(data[s_name][b][base])):
+                        data[s_name][b][base] = 0
+
         if len(data) == 0:
             log.debug("sequence_content not found in FastQC reports")
             return None
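
The two passes in this hunk (rescale counts to percentages, then replace `NaN` with `0`) can be tried in isolation. The following is a minimal sketch, not the MultiQC code itself: the helper name `normalise_sequence_content` and the `rows` argument are hypothetical, and it assumes the `NaN` values are already present in the parsed per-base data for one sample.

```python
import math

BASES = ("a", "c", "t", "g")


def normalise_sequence_content(rows):
    """Hypothetical stand-alone version of the two loops in the hunk.

    `rows` maps a base position to a dict holding one value per base,
    mirroring data[s_name] in the diff (an assumption for illustration).
    """
    # Pass 1: old FastQC versions report counts, so rescale each position
    # to percentages. If the first position already sums to 100, the data
    # is taken to be percentages and the loop stops.
    for row in rows.values():
        tot = sum(float(row[base]) for base in BASES)
        if tot == 100.0:
            break
        for base in BASES:
            row[base] = (float(row[base]) / tot) * 100.0

    # Pass 2: NaN survives the rescaling (NaN / NaN is still NaN),
    # so replace it with 0 before the values reach the plot.
    for row in rows.values():
        for base in BASES:
            if math.isnan(float(row[base])):
                row[base] = 0
    return rows


example = normalise_sequence_content(
    {
        1: {"a": 2, "c": 2, "t": 2, "g": 2},
        2: {"a": float("nan"), "c": float("nan"), "t": float("nan"), "g": float("nan")},
    }
)
```

Here position 1 is rescaled to 25.0 for every base, and position 2 ends up as all zeros. A separate second pass is needed because the first loop can `break` early, which would otherwise leave later `NaN` values untouched.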
