Add MosaiCatcher module #1910

Open
wants to merge 26 commits into
base: main

@weber8thomas weber8thomas commented Apr 25, 2023

This PR adds a new module for the MosaiCatcher counts method. MosaiCatcher counts Strand-seq reads and classifies the strand state of each chromosome in each cell using a Hidden Markov Model. The software is used as the first step after obtaining BAM files from Strand-seq data.

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated
  • There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
  • Code is tested and works locally (including with --lint flag)
  • docs/README.md is updated with link to below
  • docs/modulename.md is created
  • Everything that can be represented with a plot instead of a table is a plot
  • Report sections have a description and help text (with self.add_section)
  • There aren't any huge tables with > 6 columns (explain reasoning if so)
  • Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
  • Module does not do any significant computational work

@weber8thomas
Author

Hi @ewels! I'm struggling to resolve the CI issues with this new module. Here is part of the logs:

|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Aligned_reads_in_QC_coverage_region_pct-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Aligned_bases_in_QC_coverage_region_pct-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Average_alignment_coverage_over_QC_coverage_region-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Uniformity_of_coverage_PCT_0_2_mean_over_QC_coverage_region-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Mean_Median_autosomal_coverage_ratio_over_QC_coverage_region-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_1x_inf-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_10x_inf-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_20x_inf-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_50x_inf-1) ## 
|            report | LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_100x_inf-1) ## 
|           multiqc | Compressing plot data
|           multiqc | Report      : full_report.html
|           multiqc | Data        : full_report_data
|           multiqc | MultiQC complete
|           multiqc | 6 flat-image plots used in the report due to large sample numbers
|           multiqc | To force interactive plots, use the '--interactive' flag. See the documentation (https://multiqc.info/docs/#flat--interactive-plots).
|           multiqc | Found 22 linting errors!
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Aligned_reads_in_QC_coverage_region_pct-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Aligned_reads_in_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Aligned_bases_in_QC_coverage_region_pct-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Aligned_bases_in_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Average_alignment_coverage_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Uniformity_of_coverage_PCT_0_2_mean_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Average_chr_X_coverage_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Average_chr_Y_coverage_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Average_mitochondrial_coverage_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Average_autosomal_coverage_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Median_autosomal_coverage_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: >dragen/coverage_metrics.py< HTML ID was a duplicate (Mean_Median_autosomal_coverage_ratio_over_QC_coverage_region-1) ## plot=table.plot(data_by_sample, own_tabl_headers, pconfig={"namespace": NAMESPACE}),
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Aligned_reads_in_QC_coverage_region_pct-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Aligned_bases_in_QC_coverage_region_pct-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Average_alignment_coverage_over_QC_coverage_region-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Uniformity_of_coverage_PCT_0_2_mean_over_QC_coverage_region-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-Mean_Median_autosomal_coverage_ratio_over_QC_coverage_region-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_1x_inf-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_10x_inf-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_20x_inf-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_50x_inf-1) ## 
LINT: HTML ID was a duplicate (mqc-generalstats-dragen_coverage-PCT_of_QC_coverage_region_with_coverage_100x_inf-1) ## 
Error: Process completed with exit code 1.

Many thanks in advance for your help!

Thomas

@weber8thomas weber8thomas changed the title from "Mosaicatcher" to "Add MosaiCatcher module" May 17, 2023
@ewels
Member

ewels commented May 18, 2023

Yeah I desperately need to merge another PR for DRAGEN to solve these - it's not your PR that's at fault.

I'm trying to implement some new back-end stuff for MultiQC before the next release. When that's done I'll try to do a PR review sprint and get that in and sorted. Apologies for the inconvenience.

@weber8thomas
Author

> Yeah I desperately need to merge another PR for DRAGEN to solve these - it's not your PR that's at fault.
>
> I'm trying to implement some new back-end stuff for MultiQC before the next release. When that's done I'll try to do a PR review sprint and get that in and sorted. Apologies for the inconvenience.

All fine, no worries :)

Cheers! Thomas

    "hidden": False,
}
headers["suppl"] = {
    "title": "suppl",
Member

Please capitalize the column names (Cell, Mapped, etc.) and replace the underscores with spaces.

You should also add a "namespace": "MosaiCatcher" field to the headers.

Author

Do you mean the title key in the nested dictionary, or the keys of the headers dict?
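
One reading of the suggestion above, as a minimal sketch rather than the code in this PR: leave the dict keys untouched, capitalize the display `title` of each column with underscores replaced by spaces, and set `namespace` on every entry. The helper `make_headers` and the column name `good_reads` are hypothetical; `cell`, `mapped` and `suppl` are taken from the thread.

```python
def make_headers(columns, namespace="MosaiCatcher"):
    """Build a headers dict keyed by the raw column names (hypothetical helper)."""
    headers = {}
    for col in columns:
        headers[col] = {
            # Display title only: underscores become spaces, first letter upper-cased.
            "title": col.replace("_", " ").capitalize(),
            # Same namespace on every column, as requested in the review.
            "namespace": namespace,
            "hidden": False,
        }
    return headers


# "good_reads" is an illustrative column name, not one confirmed by the PR.
headers = make_headers(["cell", "mapped", "suppl", "good_reads"])
print(headers["good_reads"]["title"])
```

Keeping the raw names as keys means the parsed data can still be looked up by the names used in the tool's output, while only the displayed titles change.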

if not log_files:
    log.warning("No log files found for MosaiCatcher module")
else:
    print("Log files found:", log_files)
Member

The module should call self.ignore_samples after parsing and, if no log files are found, raise a UserWarning instead of logging a warning. See the docs and example.
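
The control flow requested here can be sketched as follows. This is a standalone illustration, not MultiQC code: the function `ignore_samples` below is a simplified stand-in for the module method `self.ignore_samples` (it filters on a plain set of names passed in by the caller), and `collect_samples` is a hypothetical helper.

```python
import logging

log = logging.getLogger(__name__)


def ignore_samples(data, ignored):
    """Simplified stand-in for self.ignore_samples: drop samples named in `ignored`."""
    return {name: values for name, values in data.items() if name not in ignored}


def collect_samples(parsed, ignored=()):
    """Filter the parsed samples, then stop if nothing is left (hypothetical helper)."""
    data = ignore_samples(parsed, set(ignored))
    if len(data) == 0:
        # Raise instead of logging a warning and printing, as the review asks.
        raise UserWarning
    log.info("Found %d reports", len(data))
    return data
```

With this shape, the `print` call and the `log.warning` branch in the snippet under review both disappear: the empty case is reported by the exception, and the non-empty case by a single log line.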

Author

Got it, done!

Member

Thanks!

@vladsavelyev vladsavelyev self-requested a review September 2, 2023 21:17
Member

@vladsavelyev vladsavelyev left a comment

Thanks a lot for the contribution!

Left some comments.

@vladsavelyev vladsavelyev added the "waiting: changes" label (Issue / PR is on hold, waiting for requested changes) Sep 2, 2023
@weber8thomas
Author

Hi @vladsavelyev! :) Thanks for the review! I wanted to filter out some columns to be included into the report but didn't find a good way to do it, do you have any tips on this?

@vladsavelyev
Member

Hey!

> I wanted to filter out some columns to be included into the report but didn't find a good way to do it, do you have any tips on this?

Not sure I understand! Do you mean the general stats table, or the table generated by this module? All columns that you specify in headers will be included, and those that you label as hidden=True won't show by default unless selected in the report's UI.
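
To illustrate the behaviour described above, here is a small sketch. The helper `visible_by_default` is hypothetical and not part of MultiQC; it only mirrors the rule that a column listed in headers is included, and that one marked `"hidden": True` is not shown until selected in the report's UI. The choice of which column to hide is illustrative.

```python
def visible_by_default(headers):
    """Return the keys of columns shown when the report opens (hypothetical helper)."""
    return [key for key, cfg in headers.items() if not cfg.get("hidden", False)]


headers = {
    "cell": {"title": "Cell"},
    "mapped": {"title": "Mapped"},
    # Still part of the table, but hidden until enabled in the report's UI.
    "suppl": {"title": "Suppl", "hidden": True},
}

print(visible_by_default(headers))
```

Following the same rule, a column that should never appear in the table would simply be left out of headers.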

@weber8thomas
Author

Seems to work! Let me know if I still need to modify something :)

Labels
waiting: changes Issue / PR is on hold, waiting for requested changes

3 participants