feat: Introduce GENS to WGS workflow #1279

Merged
merged 80 commits into from Oct 27, 2023

Conversation

mathiasbio
Collaborator

@mathiasbio mathiasbio commented Oct 5, 2023

This PR:

Adds rules and a script to create the final coverage.bed and baf.bed files to upload to GENS.
Only active for WGS samples.
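
As a rough illustration of what a baf.bed record could contain, the sketch below derives a B-allele frequency (alt depth divided by ref plus alt depth) from the `AD` field of a single-sample VCF. This is not the script added in this PR: the function name `baf_records`, the 0-based BED coordinates, and the choice to skip sites without usable depths are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple


def baf_records(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, baf) for single-sample VCF records with an AD field.

    Hypothetical helper: BAF is taken as alt_depth / (ref_depth + alt_depth).
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns present
        chrom, pos = fields[0], int(fields[1])
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        if ad_index >= len(sample_values):
            continue  # trailing fields were dropped for this sample
        try:
            depths = [int(depth) for depth in sample_values[ad_index].split(",")]
        except ValueError:
            continue  # missing value such as "."
        if len(depths) < 2:
            continue
        ref_depth, alt_depth = depths[0], depths[1]
        total = ref_depth + alt_depth
        if total == 0:
            continue  # avoid division by zero at uncovered sites
        # VCF positions are 1-based; BED intervals are 0-based, half-open.
        yield chrom, pos - 1, pos, alt_depth / total
```

Feeding it the lines of a decompressed VCF yields one tuple per usable site, which can then be written out tab-separated.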

Added new functionality in BALSAMIC to allow visualisation of coverage and BAF in GENS in WGS:

  • New command line arguments for config case required for creating GENS files
  • Additional DNAscope call that uses a gnomAD VCF of variants with AF > 0.05 as the --given argument, for creating the BAF files
  • Rules using GATK tools for creating normalised coverage BED files
  • Python script for pre-processing the created files into different window sizes for the zoom feature in GENS
  • Workflow for generation of GENS PON
  • Flexibility in PON workflow creation, choosing either CNVkit or GENS
  • Created 2 PONs for GENS coverage, one based on 100 female and one on 100 male WGS normal samples
    • balsamic_pon_100bp.female.v1.hdf5
    • balsamic_pon_100bp.male.v1.hdf5
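
The pre-processing into different window sizes can be pictured as averaging fine-grained bins into coarser windows, one set per zoom level. The sketch below is only an assumption about the general idea, not the contents of preprocess_gens.py; `aggregate_bins` and the example window sizes are hypothetical names and values.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Bin = Tuple[str, int, int, float]  # chrom, start, end, value


def aggregate_bins(bins: Iterable[Bin], window_size: int) -> List[Bin]:
    """Average per-bin values into fixed windows of `window_size` bp.

    Each input bin is assigned to the window that contains its start position.
    Windows appear in the order they are first seen in the input.
    """
    grouped: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for chrom, start, _end, value in bins:
        window_start = (start // window_size) * window_size
        grouped[(chrom, window_start)].append(value)
    return [
        (chrom, window_start, window_start + window_size, mean(values))
        for (chrom, window_start), values in grouped.items()
    ]


# Hypothetical zoom levels; the window sizes used by GENS are not stated here.
EXAMPLE_WINDOW_SIZES = {"overview": 1000, "detail": 100}
```

Calling `aggregate_bins` once per entry in `EXAMPLE_WINDOW_SIZES` would give one coarser track per zoom level from the same 100 bp input.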

New rules seen here:
image

Changed:

  • Added extra arguments for starting PON workflow creation (--pon-workflow)

Pre-steps:

The MIP reference contains the decoy sequence hs37d5, which is missing from the BALSAMIC reference fasta. The decoy could absorb reads when the PON is created in MIP, so using that PON for GENS in BALSAMIC carries a small risk of false positive or false negative CNVs. New PONs were therefore created for BALSAMIC (config and project structure here: /home/proj/development/cancer/development_files/GENS_PON_CREATION):

  • Creation of balsamic GENS PON 100 WGS, 30X from females
  • Creation of balsamic GENS PON 100 WGS, 30X from males

Integrity tests:

  • Successfully ran unitedbeagle T/N TGA without GENS reference-inputs
  • Successfully generated final GENS files for WGS T/N case
  • GENS files are saved in hk files.

image

Practical use-case tests:

https://gens-stage.scilifelab.se/

  • View CNVs in GENS for fleetjay. Quick overview of normal and tumor GENS results looks good! (2023-10-10 import date)
  • View CNVs in GENS for new case with new PONs (trustyfish).
  • upload manually to housekeeper: (housekeeper add file -kip -b trustyfish -t bed -t fracsnp -t scout -t gens -t ACC /path/to/file/analysis/cnv/ACC.baf.bed.gz etc...)
  • upload manually to GENS (cg upload gens trustyfish)
  • view and verify results in GENS look ok
  • verify access to GENS via Scout (blocked by "Ensure compatibility of Scout links to GENS for uploaded balsamic samples" #1298; since that issue is linked into the feature issue for GENS (GENS #1110), I'm checking this off, as it will be solved by different PRs)

Review and tests:

  • Tests pass
  • Code review
  • New code is executed and covered by tests, and test approve

@mathiasbio mathiasbio changed the title feat: Introduce gens to WGS workflow feat: Introduce GENS to WGS workflow Oct 5, 2023
@codecov

codecov bot commented Oct 5, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
BALSAMIC/commands/base.py 100.00% <100.00%> (ø)
BALSAMIC/commands/config/base.py 100.00% <100.00%> (ø)
BALSAMIC/commands/config/case.py 100.00% <100.00%> (+3.44%) ⬆️
BALSAMIC/commands/config/pon.py 100.00% <100.00%> (+4.87%) ⬆️
BALSAMIC/commands/init/base.py 100.00% <100.00%> (ø)
BALSAMIC/commands/options.py 100.00% <100.00%> (ø)
BALSAMIC/commands/report/base.py 100.00% <100.00%> (ø)
BALSAMIC/commands/report/deliver.py 100.00% <100.00%> (ø)
BALSAMIC/commands/report/status.py 100.00% <100.00%> (ø)
BALSAMIC/commands/run/analysis.py 100.00% <100.00%> (ø)
... and 20 more


@mathiasbio mathiasbio linked an issue Oct 6, 2023 that may be closed by this pull request
4 tasks
@mathiasbio mathiasbio added this to the Release 13 milestone Oct 6, 2023
@mathiasbio mathiasbio self-assigned this Oct 6, 2023
@mathiasbio mathiasbio marked this pull request as ready for review October 9, 2023 12:36
@mathiasbio mathiasbio requested a review from a team as a code owner October 9, 2023 12:36
@mathiasbio
Collaborator Author

I still need to add some tests to fix the coverage but it could be nice to get some feedback at this point anyway to get ahead of the review process 🔥

@mathiasbio mathiasbio mentioned this pull request Oct 26, 2023
4 tasks
Contributor

@ivadym ivadym left a comment

Amazing job with this PR and the work behind it, well done ⭐

I just left you some minor formatting and pytest suggestions

BALSAMIC/assets/scripts/preprocess_gens.py (outdated)
BALSAMIC/assets/scripts/preprocess_gens.py (outdated)
BALSAMIC/assets/scripts/preprocess_gens.py (outdated)
BALSAMIC/assets/scripts/preprocess_gens.py (outdated)
BALSAMIC/assets/scripts/preprocess_gens.py (outdated)
BALSAMIC/constants/analysis.py (outdated)
BALSAMIC/snakemake_rules/pon/cnvkit_create_pon.rule (outdated)
BALSAMIC/snakemake_rules/pon/gens_create_pon.rule (outdated)
BALSAMIC/utils/io.py
BALSAMIC/assets/scripts/preprocess_gens.py
@sonarcloud

sonarcloud bot commented Oct 27, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
4 Code Smells (rating A)

No Coverage information
0.0% Duplication

@mathiasbio mathiasbio merged commit b7cbddb into develop Oct 27, 2023
9 checks passed
@mathiasbio mathiasbio deleted the introduce_gens branch October 27, 2023 10:57
@ivadym ivadym mentioned this pull request Nov 6, 2023
@mathiasbio mathiasbio mentioned this pull request Nov 22, 2023
3 tasks
mathiasbio added a commit that referenced this pull request Nov 22, 2023
Adds updates to the ReadTheDocs mainly to reflect the additions and changes made from the GENS (#1279) and Restructuring PR (#1176).
Successfully merging this pull request may close these issues.

GENS
2 participants