Add contamination check #317
Merged
Add comparison of germline and somatic variants of the cohort to quantify potential intersample contamination
AI summary
This pull request introduces a new process for computing contamination in mutation data and integrates it into the mutation preprocessing workflow. The key changes include adding the `COMPUTE_CONTAMINATION` process, updating the workflow to use it, and modifying channels to handle the necessary data.

Addition of the `COMPUTE_CONTAMINATION` process:
- Added `COMPUTE_CONTAMINATION` in `modules/local/contamination/main.nf`. This process computes contamination using input mutation files (`maf` and `somatic_maf`), outputs contamination results as TSV files, and optionally generates contamination plots as PDFs. It also records the Python version used in a `versions.yml` file.

Integration into the mutation preprocessing workflow:
- Integrated the `COMPUTE_CONTAMINATION` process into the mutation preprocessing workflow by adding an `include` statement in `subworkflows/local/mutationpreprocessing/main.nf`.
- Updated the `MUTATION_PREPROCESSING` workflow to create a new channel `raw_muts_all_samples` by joining metadata with named mutation files, and passed this channel along with `muts_all_samples` to the `COMPUTE_CONTAMINATION` process.
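
The summary does not show the script that `COMPUTE_CONTAMINATION` runs, so the following is only a minimal Python sketch of one way a germline-versus-somatic comparison could quantify intersample contamination. The function names (`variant_key`, `contamination_fractions`), the record fields (`chrom`, `pos`, `ref`, `alt`), and the metric itself (the fraction of one sample's somatic calls that match another sample's germline variants) are assumptions made for illustration, not the code merged in this PR.

```python
def variant_key(record):
    """Identify a variant by chromosome, position, reference and alternate allele.

    The field names are hypothetical; a real MAF file uses its own column names.
    """
    return (record["chrom"], record["pos"], record["ref"], record["alt"])


def contamination_fractions(germline, somatic):
    """Return {(recipient, donor): fraction} for every ordered pair of distinct samples.

    `germline` and `somatic` map a sample name to a list of variant records.
    The fraction is the share of the recipient's somatic calls that coincide
    with the donor's germline variants. A high value hints that donor material
    may be present in the recipient sample.
    """
    germline_keys = {s: {variant_key(r) for r in recs} for s, recs in germline.items()}
    somatic_keys = {s: {variant_key(r) for r in recs} for s, recs in somatic.items()}

    fractions = {}
    for recipient, som in somatic_keys.items():
        for donor, germ in germline_keys.items():
            if donor == recipient:
                continue  # a sample cannot contaminate itself
            shared = len(som & germ)
            # Guard against samples with no somatic calls.
            fractions[(recipient, donor)] = shared / len(som) if som else 0.0
    return fractions
```

Under these assumptions, each `(recipient, donor)` entry could be written out as one row of a TSV, which would match the tabular output the summary describes.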