FastQC: add top overrepresented sequences table #2075

Merged
merged 11 commits into from
Sep 28, 2023

Conversation

vladsavelyev
Member

Fix #926

Add a table into the FastQC module showing the most common overrepresented sequences across all samples:

[Image: Screenshot 2023-09-25 at 20 10 37]

By default, it shows the top 20 sequences that occur in the largest number of samples. The number can be customised:

fastqc_config:
  top_overrepresented_sequences: 50

It can also be customised to rank sequences by the total number of occurrences instead of the number of samples:

fastqc_config:
  top_overrepresented_sequences_by: "total"
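
The two ranking modes described above can be illustrated with a minimal sketch. This is not the module's actual implementation: the function name `top_overrepresented`, the input layout (sample name mapped to sequence counts), and the `"samples"` value for the default mode are assumptions made for illustration; only the `"total"` value appears in this PR.

```python
from collections import Counter


def top_overrepresented(per_sample, top_n=20, by="samples"):
    """Rank sequences across samples (hypothetical helper, not MultiQC API).

    per_sample: dict mapping sample name -> {sequence: read count}.
    by: "samples" ranks by how many samples contain the sequence,
        "total" ranks by the summed read count across all samples.
    Returns a list of (sequence, n_samples, total_count) tuples.
    """
    n_samples = Counter()  # number of samples each sequence appears in
    total = Counter()      # summed occurrences of each sequence
    for seq_counts in per_sample.values():
        for seq, count in seq_counts.items():
            n_samples[seq] += 1
            total[seq] += count

    if by == "samples":
        # Primary key: sample count; total count only breaks ties
        sort_key = lambda s: (n_samples[s], total[s])
    elif by == "total":
        # Primary key: total count; sample count only breaks ties
        sort_key = lambda s: (total[s], n_samples[s])
    else:
        raise ValueError(f"Unknown ranking mode: {by!r}")

    ranked = sorted(total, key=sort_key, reverse=True)
    return [(s, n_samples[s], total[s]) for s in ranked[:top_n]]


# Made-up example data: AAAA is in every sample, CCCC dominates one sample
example = {
    "s1": {"AAAA": 10, "CCCC": 500},
    "s2": {"AAAA": 20},
    "s3": {"AAAA": 5, "GGGG": 40},
}
by_samples = top_overrepresented(example, by="samples")
by_total = top_overrepresented(example, by="total")
print([row[0] for row in by_samples])
print([row[0] for row in by_total])
```

In this toy data, a sequence present at low levels in every sample leads the per-sample ranking, while a sequence that is abundant in a single sample leads the ranking by total.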

@vladsavelyev vladsavelyev added this to the MultiQC v1.17 milestone Sep 25, 2023
@vladsavelyev vladsavelyev added the awaits-review Awaiting final review and merge. label Sep 25, 2023
@github-actions
Contributor

github-actions bot commented Sep 25, 2023

🚀 Deployed on https://mqc-pr-2075--multiqc.netlify.app

@ewels
Member

ewels commented Sep 26, 2023

How about having a third column that shows the read count as a percentage of the total read count across all samples?

@vladsavelyev
Member Author

> How about having a third column that shows the read count as a percentage of the total read count across all samples?

That's always going to be close to zero, so I'm not sure a column like this is helpful 🤔
[Image: Screenshot 2023-09-27 at 11 03 45]

Though maybe it is, as it just means that everything is okay? Maybe I should add another column with the max percentage across samples, to quickly identify the bad sequence in a bad sample.
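
To make the difference between the two candidate columns concrete, here is a rough sketch with made-up numbers. The function `sequence_percentages` and its inputs are hypothetical and are not part of this PR; it only illustrates why the pooled percentage stays small while the per-sample maximum highlights one bad sample.

```python
def sequence_percentages(seq_counts, total_reads):
    """Compute two summary percentages for one sequence (hypothetical helper).

    seq_counts: dict mapping sample name -> read count of this sequence
                (samples where it was not reported can be omitted).
    total_reads: dict mapping sample name -> total read count of the sample.
    Returns (pooled_pct, max_pct):
      pooled_pct: sequence reads as a percentage of all reads in all samples
      max_pct: highest per-sample percentage of this sequence
    """
    all_reads = sum(total_reads.values())
    if all_reads == 0:
        return 0.0, 0.0
    pooled_pct = 100.0 * sum(seq_counts.values()) / all_reads
    max_pct = max(
        100.0 * seq_counts.get(sample, 0) / n_reads
        for sample, n_reads in total_reads.items()
        if n_reads > 0
    )
    return pooled_pct, max_pct


# Made-up data: four samples of one million reads each, one of them contaminated
total_reads = {"s1": 1_000_000, "s2": 1_000_000, "s3": 1_000_000, "s4": 1_000_000}
seq_counts = {"s1": 50_000, "s2": 100}
pooled_pct, max_pct = sequence_percentages(seq_counts, total_reads)
print(f"pooled: {pooled_pct:.2f}%, max in a single sample: {max_pct:.2f}%")
```

With these numbers the pooled value is diluted by the clean samples, whereas the maximum points directly at the sample where the sequence is abundant.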

@vladsavelyev vladsavelyev changed the title FastQC: top overrepresented sequences table FastQC: add top overrepresented sequences table Sep 27, 2023
@vladsavelyev
Member Author

@multiqc-bot changelog

Member

@ewels ewels left a comment

Great!

Resolved review comments (now outdated): CHANGELOG.md (1), docs/modules/fastqc.md (2)
@ewels
Member

ewels commented Sep 28, 2023

@multiqc-bot fix linting

@ewels
Member

ewels commented Sep 28, 2023

> Maybe I should add another column with the max percentage across samples, to quickly identify the bad sequence in a bad sample.

Could be nice, but might be approaching overkill a little. I think I'm happy to merge as-is for now; we can wait to see if we get any feedback about this and add it at a later date.

@ewels
Member

ewels commented Sep 28, 2023

@multiqc-bot fix linting

@ewels
Member

ewels commented Sep 28, 2023

@ewels ewels merged commit f64ee52 into master Sep 28, 2023
11 checks passed
@ewels ewels deleted the fastqc-overrep branch September 28, 2023 23:21
a-frantz added a commit to a-frantz/MultiQC that referenced this pull request Oct 2, 2023
* master:
  Just run CI on the oldest + newest supported Python versions (MultiQC#2074)
  Picard: fix parsing mixed strings/numbers, account for trailing tab (MultiQC#2083)
  FastQC: add top overrepresented sequences table (MultiQC#2075)
  Add GitHub Actions bot workflow to fix code linting from a PR comment (MultiQC#2082)
  Use custom exception type instead of `UserWarning` when no samples are found. (MultiQC#2049)
  Lint modules for missing `self.add_software_version` (MultiQC#2081)
  Changelog bot: Update docs (MultiQC#2077)
  Changelog action: remove `.capitalize()`, add changelog entry (MultiQC#2080)
  Add action to populate the change log from PR titles triggered by `@multiqc-bot changelog` (MultiQC#2025)

# Conflicts:
#	CHANGELOG.md
#	multiqc/modules/ngsderive/ngsderive.py
Labels
awaits-review Awaiting final review and merge. module: enhancement
Development

Successfully merging this pull request may close these issues.

Access to top over-represented nucleotide sequences
3 participants