Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module: Freyja #1903

Merged
merged 29 commits into from Aug 31, 2023
Merged

New module: Freyja #1903

merged 29 commits into from Aug 31, 2023

Conversation

Joon-Klaps
Copy link
Contributor

@Joon-Klaps Joon-Klaps commented Apr 13, 2023

Adress Issue #1902
The nf-core/viralrecon pipeline is being extended with the Freyja module to analyse mixed SARS-CoV-2 samples (i.e wastewater samples)
PR: nf-core/viralrecon#375

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated
  • There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
  • Code is tested and works locally (including with --lint flag)
  • docs/README.md is updated with link to below
  • docs/modulename.md is created
  • Everything that can be represented with a plot instead of a table is a plot
  • Report sections have a description and help text (with self.add_section)
  • There aren't any huge tables with > 6 columns (explain reasoning if so)
  • Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
  • Module does not do any significant computational work

@@ -345,6 +345,10 @@ flash/hist:
flexbar:
contents: "Flexbar - flexible barcode and adapter removal"
shared: true
freyja:
fn: "*.tsv"
contents: "summarized"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels not very specific - do you think more conditions can be added? Is it fair to expect that files are named *demix_outs.tsv?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not...
Perhaps we could check more contents lines? Like require lineages and abundances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely if its better to add those checks. I thought it would be enough because of the number of lines we specify.


# Percentages don't always add up to 1, show a warning if this is the case
if sum(d.values()) != 1:
log.warning(f"Freyja {s_name}: percentages don't sum to 1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often does this happen? In all the 3 test examples, the percentages do not sum to 1. If it happens frequently, we probably shouldn't be polluting the logs with these warnings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it will always be the case unless amplicon or any other targeted sequencing approach is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, merged before I remembered that we didn't fully address this issue. I think I will remove this warning in a separate PR, or make it a debug message.

headers["Top_lineage_freyja"] = {
"title": "Top lineage (Freyja)",
"description": "The most abundant lineage in the sample",
"scale": "RdYlGn-rev", # Not sure if this is the best scale
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the categorical scales should be good here (like Set2, etc, see: https://multiqc.info/docs/development/plots/#table-colour-scales), though I can't make it work 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's not working because the color is applied to the bar, which has a width of zero here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I pushed a change to use categorical colours instead, let me know if it looks good:

Screenshot 2023-08-30 at 15 26 43 Screenshot 2023-08-30 at 15 25 48 looks good Screenshot 2023-08-30 at 15 24 50

And would love @ewels opinion on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vladsavelyev
Copy link
Member

Thanks a lot for contributing this module! I added a few changes to the branch.

I don't have a strong opinion on this, but one of a bit unusual things for MultiQC is the use of a non-numerical column in the general stats (i.e., the most abundant lineage here). I'm interested in @ewels's take on it, whether we want it, and what would be the best color scale. The barplot pretty much shows the same information, but we want something to be in the general stats anyway, right?

I added the use of a categorial scale into this PR, but I don't like that the generated colors in the result don't match the colors used in the barplot; I'm going to look into whether I can make them match.

@vladsavelyev vladsavelyev added the awaits-review Awaiting final review and merge. label Aug 30, 2023

> **Note**: Lineage designation is based on the used WHO nomenclature, which could vary over time.
""",
plot=bargraph.plot(self.freyja_data, None, pconfig),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to make sure the bars have the same sorting order, e.g. on bar 2 Omicron goes at the tail, and on the bar 3 it is in the head.

Screenshot 2023-08-30 at 16 23 09

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting order was simple (pushed a change), however, making sure the barchart colors are the same as the general stats column colors is not obvious: for the column, more pastel colors look better, whereas on the barchart, the bright default highcharts colors look the best.

@vladsavelyev vladsavelyev changed the base branch from master to colors August 31, 2023 13:46
@vladsavelyev
Copy link
Member

Stacked on top #2017 - do no merge until that PR is merged.

@vladsavelyev vladsavelyev changed the base branch from colors to master August 31, 2023 15:57
@vladsavelyev vladsavelyev self-requested a review August 31, 2023 16:35
@vladsavelyev vladsavelyev changed the title Add new module Freyja New module: Freyja Aug 31, 2023
@vladsavelyev vladsavelyev merged commit 33db316 into MultiQC:master Aug 31, 2023
11 checks passed
vladsavelyev added a commit that referenced this pull request Sep 1, 2023
… than 1 (as it is almost always going to be the case), see #1903
vladsavelyev added a commit that referenced this pull request Sep 1, 2023
… than 1 (as it is almost always going to be the case), see #1903 (#2022)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaits-review Awaiting final review and merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants