Picard: MarkDuplicates: account for missing trailing 0 #2094

Merged
merged 3 commits into from
Oct 3, 2023
6 changes: 2 additions & 4 deletions CHANGELOG.md
@@ -17,10 +17,8 @@

### Module updates

- **FastQC**:
- Add top overrepresented sequences table ([#2075](https://github.com/ewels/MultiQC/pull/2075))
- **Picard**:
- Fix parsing mixed strings/numbers, account for trailing tab ([#2083](https://github.com/ewels/MultiQC/pull/2083))
- **FastQC**: Add top overrepresented sequences table ([#2075](https://github.com/ewels/MultiQC/pull/2075))
- **Picard**: MarkDuplicates: Fix parsing mixed strings/numbers, account for missing trailing `0` ([#2083](https://github.com/ewels/MultiQC/pull/2083), [#2094](https://github.com/ewels/MultiQC/pull/2094))

## [MultiQC v1.16](https://github.com/ewels/MultiQC/releases/tag/v1.16) - 2023-09-22

35 changes: 17 additions & 18 deletions multiqc/modules/picard/MarkDuplicates.py
@@ -4,7 +4,7 @@
import math
import os
import re
from collections import OrderedDict
from collections import OrderedDict, defaultdict

from multiqc import config
from multiqc.plots import bargraph
@@ -80,7 +80,7 @@ def save_table_results(s_name, base_s_name, keys, parsed_data, recompute_merged_
for f in self.find_log_files(log_key, filehandles=True):
s_name = f["s_name"]
base_s_name = f["s_name"]
parsed_data = {}
parsed_lists = defaultdict(list)
keys = None
in_stats_block = False
recompute_merged_metrics = False
@@ -112,13 +112,14 @@ def save_table_results(s_name, base_s_name, keys, parsed_data, recompute_merged_
# Split the values columns
vals = l.rstrip("\n").split("\t")

# End of the METRICS table, or multiple libraries and we're not merging them
if len(vals) < 6 or (not merge_multiple_libraries and len(parsed_data) > 0):
# End of the METRICS table, or multiple libraries, and we're not merging them
if len(vals) < 6 or (not merge_multiple_libraries and len(parsed_lists) > 0):
parsed_data = {k: parsed_list[0] for k, parsed_list in parsed_lists.items()}
if save_table_results(s_name, base_s_name, keys, parsed_data, recompute_merged_metrics):
# Reset for next file if returned True
s_name = f["s_name"]
base_s_name = f["s_name"]
parsed_data = {}
parsed_lists = defaultdict(list)
keys = None
in_stats_block = False
recompute_merged_metrics = False
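
Note on the split above: `rstrip("\n")` removes only the newline, so a row whose last value is missing (just a trailing tab) still splits into the same number of fields as the header. The sketch below illustrates this with a made-up, shortened header and row; the real METRICS table has more columns, and the variable names `header` and `row` are assumptions for illustration only.

```python
# Hypothetical, shortened header and data row for illustration.
header = "LIBRARY\tREAD_PAIRS_EXAMINED\tESTIMATED_LIBRARY_SIZE\n"
row = "lib1\t1000\t\n"  # the last value is missing, only the tab remains

keys = header.rstrip("\n").split("\t")
vals = row.rstrip("\n").split("\t")

# The trailing tab yields an empty final field, so the lengths still match.
print(vals)
print(len(keys) == len(vals))

# Stripping all whitespace instead would drop the empty field.
naive_vals = row.strip().split("\t")
print(len(naive_vals))
```

Because the empty field survives as `""`, the later `float(val)` call raises `ValueError` and the value is stored as a string, which is the case the merge step has to handle.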
@@ -129,26 +130,24 @@ def save_table_results(s_name, base_s_name, keys, parsed_data, recompute_merged_
if keys and vals and len(keys) == len(vals):
for i, k in enumerate(keys):
# More than one library present and merging stats
if k in parsed_data:
if k in parsed_lists:
recompute_merged_metrics = True

val = vals[i].strip()
try:
val_float = float(val)
except ValueError:
# Account for string values
if k not in parsed_data: # First library
parsed_data[k] = val
else:
parsed_data[k] += "/" + val
parsed_lists[k].append(val)
else:
# Numerical values we can just add up
if k not in parsed_data: # First library
parsed_data[k] = val_float
elif isinstance(parsed_data[k], float):
parsed_data[k] += val_float
else:
parsed_data[k] += "/" + val
parsed_lists[k].append(val_float)

parsed_data = {}
for k in parsed_lists:
# Sometimes a numerical column will contain an empty string, so convert "" to 0.0
if all(isinstance(x, float) or x == "" for x in parsed_lists[k]):
parsed_data[k] = sum(0.0 if x == "" else x for x in parsed_lists[k])
else:
parsed_data[k] = "/".join(str(x) for x in parsed_lists[k])

# Superfluous function call to confirm that it is used in this module
# Replace None with actual version if it is available
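
For reviewers who want to try the new merging behaviour outside the module, here is a minimal standalone sketch of the accumulate-then-merge logic from the diff. The helper name `merge_rows` and the sample rows are hypothetical; only the per-column list accumulation and the final sum-or-join step follow the changed code.

```python
from collections import defaultdict


def merge_rows(keys, rows):
    """Hypothetical helper mirroring the accumulate-then-merge logic in the diff."""
    parsed_lists = defaultdict(list)
    for vals in rows:
        for k, val in zip(keys, vals):
            val = val.strip()
            try:
                parsed_lists[k].append(float(val))
            except ValueError:
                # Strings, including the empty string, are kept as-is.
                parsed_lists[k].append(val)

    parsed_data = {}
    for k, items in parsed_lists.items():
        # A column counts as numeric if every entry is a float or an empty string.
        if all(isinstance(x, float) or x == "" for x in items):
            parsed_data[k] = sum(0.0 if x == "" else x for x in items)
        else:
            parsed_data[k] = "/".join(str(x) for x in items)
    return parsed_data


# Two made-up libraries; the second one has no value in the last column.
keys = ["LIBRARY", "READ_PAIRS_EXAMINED", "ESTIMATED_LIBRARY_SIZE"]
rows = [["libA", "100", "5000"], ["libB", "50", ""]]
merged = merge_rows(keys, rows)
print(merged)
```

Collecting values into lists first and deciding on the column type only at the end avoids the earlier problem where the type of the first library's value determined how later values were combined.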