@FerriolCalvet
Collaborator

Add a comparison of germline and somatic variants across the cohort to quantify potential inter-sample contamination.

AI summary

This pull request introduces a new process for computing contamination in mutation data and integrates it into the mutation preprocessing workflow. The key changes include adding the COMPUTE_CONTAMINATION process, updating the workflow to use it, and modifying channels to handle the necessary data.

Addition of the COMPUTE_CONTAMINATION process:

  • Added a new process COMPUTE_CONTAMINATION in modules/local/contamination/main.nf. This process computes contamination using input mutation files (maf and somatic_maf), outputs contamination results as TSV files, and optionally generates contamination plots as PDFs. It also records the Python version used in a versions.yml file.
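As a rough illustration of the idea behind the process (not the code in this PR), here is a minimal Python sketch of a germline-versus-somatic comparison. It assumes variants are keyed by (chromosome, position, ref, alt) and that the signal of interest is the fraction of one sample's somatic calls that match another sample's germline variants. The function name `shared_fraction` and the input layout are hypothetical.

```python
# Hypothetical sketch: the actual script in modules/local/contamination may differ.
# A variant is assumed to be identified by (chrom, pos, ref, alt).
Variant = tuple[str, int, str, str]


def shared_fraction(
    germline: dict[str, set[Variant]],
    somatic: dict[str, set[Variant]],
) -> dict[tuple[str, str], float]:
    """For each ordered pair (recipient, donor), return the fraction of the
    recipient's somatic calls that are germline variants of the donor."""
    result: dict[tuple[str, str], float] = {}
    for recipient, somatic_calls in somatic.items():
        for donor, germline_calls in germline.items():
            if donor == recipient:
                continue  # a sample cannot contaminate itself
            shared = len(somatic_calls & germline_calls)
            # Guard against samples with no somatic calls.
            fraction = shared / len(somatic_calls) if somatic_calls else 0.0
            result[(recipient, donor)] = fraction
    return result
```

A high fraction for a pair would suggest that the donor's germline variants are leaking into the recipient's somatic calls; what counts as "high" is left open here.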

Integration into the mutation preprocessing workflow:

  • Imported the new COMPUTE_CONTAMINATION process into the mutation preprocessing workflow by adding an include statement in subworkflows/local/mutationpreprocessing/main.nf.
  • Updated the MUTATION_PREPROCESSING workflow to create a new channel raw_muts_all_samples by joining metadata with named mutation files, and passed this channel along with muts_all_samples to the COMPUTE_CONTAMINATION process.
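The channel join mentioned above behaves like an inner join on a shared key. The Python sketch below mimics that behaviour for two lists of (key, value) pairs; it is only an analogy for how the raw_muts_all_samples channel might be assembled, and the helper `join_by_key` and the use of a plain sample ID as the key are assumptions.

```python
# Hypothetical analogy for a key-based channel join; not Nextflow code.
def join_by_key(left, right):
    """Inner-join two lists of (key, value) pairs on the key.

    Returns (key, left_value, right_value) tuples in the order of `left`,
    dropping keys that are missing from `right`.
    """
    right_index = {key: value for key, value in right}
    return [
        (key, value, right_index[key])
        for key, value in left
        if key in right_index
    ]


# Example inputs: metadata per sample and a named mutation file per sample.
metadata = [("s1", {"batch": 1}), ("s2", {"batch": 2})]
mutation_files = [("s2", "s2.maf"), ("s3", "s3.maf")]
joined = join_by_key(metadata, mutation_files)
```

Only keys present on both sides survive, so a sample with metadata but no mutation file (or the reverse) would silently drop out of the joined result.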

- Plotting outputs are still missing
- A summary table of contamination comparisons is still missing
- Not yet tested
@FerriolCalvet self-assigned this on Jul 16, 2025
@FerriolCalvet requested a review from Copilot on July 16, 2025 14:54
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a contamination quantification step to the mutation preprocessing workflow by introducing a new COMPUTE_CONTAMINATION process and wiring it into the existing pipeline.

  • Added COMPUTE_CONTAMINATION process in modules/local/contamination/main.nf to compute contamination metrics and produce TSV and optional PDF outputs.
  • Imported and invoked COMPUTE_CONTAMINATION in subworkflows/local/mutationpreprocessing/main.nf, with new channels for raw and somatic mutations.

Reviewed Changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| subworkflows/local/mutationpreprocessing/main.nf | Imported COMPUTE_CONTAMINATION and created the raw_muts_all_samples channel before invoking the contamination step |
| modules/local/contamination/main.nf | Defined the COMPUTE_CONTAMINATION process with inputs, outputs, script, and stub sections |

Comments suppressed due to low confidence (2)

modules/local/contamination/main.nf:10

  • The variable name meta2 is ambiguous; consider renaming it to something more descriptive like somatic_meta to clearly distinguish it from the primary meta.
    tuple val(meta2), path(somatic_maf)

modules/local/contamination/main.nf:1

  • There are no tests covering the new COMPUTE_CONTAMINATION process. Consider adding unit or integration tests to validate the TSV and PDF outputs under various data scenarios.
process COMPUTE_CONTAMINATION {

@FerriolCalvet deleted the branch dev on July 24, 2025 12:18
@FerriolCalvet reopened this on Jul 24, 2025
Collaborator Author

@FerriolCalvet left a comment

Good

@FerriolCalvet merged commit 256801b into dev on Jul 31, 2025
@FerriolCalvet deleted the check-contamination branch on August 1, 2025 08:08

Development

Successfully merging this pull request may close these issues.

Check contamination between samples based on the germline variants

2 participants