Validation of DelID using YKOC data#7

Merged
owlang merged 7 commits into master from validation on May 20, 2021

@owlang owlang commented May 20, 2021

These scripts include everything needed to download the data, align it, run it through DeletionID, and pile up the coverage to generate publication figures and tables. The samples come from Puddu et al. (2019), who performed whole-genome sequencing (WGS) on thousands of diploid knockout strains from the Yeast Knockout Collection (YKOC).

OLIVIA LANG and others added 7 commits April 23, 2021 04:02
The job/01_align_fastq.pbs scripts align YKOC FASTQ files to the sacCer3 genome using PughLab core pipeline alignment defaults. This includes filtering duplicates out of the BAM files.

The .gitignore file is updated to ignore FASTQ and BAM files.
The 02_indexed_runDID.pbs script runs DeletionID on a single sample from YKOC by creating a temp directory and symlinking to the sample BAM files. This works around DeletionID expecting an input directory of files rather than a single file.
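
The temp-directory workaround can be sketched in Python as follows. This is only an illustration of the idea, not the contents of the PBS script: the function name `stage_single_sample`, the `did_` prefix, and the assumption that the index sits next to the BAM as `<name>.bam.bai` are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def stage_single_sample(bam_path, parent_dir=None):
    """Create a temp directory holding symlinks to one sample's BAM and index.

    Hypothetical helper: the real script may name and lay out things differently.
    """
    bam = Path(bam_path).resolve()
    staging = Path(tempfile.mkdtemp(prefix="did_", dir=parent_dir))
    # Link the BAM itself so the directory contains exactly one sample.
    os.symlink(bam, staging / bam.name)
    # Link the index too when it sits next to the BAM (assumed name: sample.bam.bai).
    index = bam.with_name(bam.name + ".bai")
    if index.exists():
        os.symlink(index, staging / index.name)
    return staging
```

The returned directory can then be handed to DeletionID as its input directory and removed (for example with `shutil.rmtree`) once the run finishes.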

The YKOC-wgs/results/ID directory is added to the .gitignore file.
The information from Puddu et al. can be used as a "true" set for evaluating the performance of DeletionID. They used an alternative approach for verifying knockouts, so it provides an orthogonal verification of DeletionID's performance.

The README is updated to describe the new file and to fix a small typo.
The PBS script `job/03_tally_results.pbs` calls `scripts/analyze_ykoc_results.py` and `scripts/make_venn_diagram.py` to analyze DeletionID's performance on the YKOC samples. It further breaks down the output into samples whose KOs were verified by only DeletionID, only Puddu et al., both, or neither.

The `analyze_ykoc_results.py` script combines the metadata from the several files listed here with the DeletionID results into a master results table, which will become one of the GenoPipe paper's supplementary tables.

The `make_venn_diagram.py` script draws a Venn diagram from the tallied DeletionID results, showing the KOs verified by DeletionID versus those verified by the Puddu et al. paper. This figure is named 3C.
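
The four-way breakdown behind the tally and the Venn diagram can be sketched as below. This is a minimal illustration, not the code in `analyze_ykoc_results.py`: the function names, the category labels, and the example records are all made up for the sketch.

```python
from collections import Counter


def classify(found_by_deletionid, found_by_puddu):
    """Assign a sample to one of the four verification categories."""
    if found_by_deletionid and found_by_puddu:
        return "both"
    if found_by_deletionid:
        return "DeletionID only"
    if found_by_puddu:
        return "Puddu only"
    return "neither"


def tally(samples):
    """Count categories for (sample_id, did_verified, puddu_verified) records."""
    return Counter(classify(did, puddu) for _, did, puddu in samples)


# Invented records, for illustration only.
example = [
    ("s1", True, True),
    ("s2", True, False),
    ("s3", False, True),
    ("s4", False, False),
    ("s5", True, True),
]
counts = tally(example)
```

The "DeletionID only", "Puddu only", and "both" counts are the three numbers a two-set Venn diagram needs; "neither" falls outside both circles.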
The Venn diagram script requires a title in the script call, which was omitted in the previous commit. This commit fixes that, titling the figure "Figure3C".
The aligned reads are piled up into a BedGraph-formatted file, and the values are normalized by a custom Python script (`scripts/normalize_BedGraph.py`). Another script then identifies the expected knockout interval, pulls out the coverage within a 6000 bp window centered on the expected knockout gene, and formats it as a CDT file.
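
A rough sketch of the normalize-then-window step is shown below. The normalization scheme here (dividing by the length-weighted mean coverage) is an assumption, since the commit does not say what `scripts/normalize_BedGraph.py` does; the function names and the in-memory `(chrom, start, end, value)` tuples are likewise hypothetical. Coordinates follow the BedGraph convention of 0-based, half-open intervals.

```python
def normalize(intervals):
    """Scale values by the length-weighted mean coverage (assumed scheme)."""
    total_bp = sum(end - start for _, start, end, _ in intervals)
    total_cov = sum((end - start) * value for _, start, end, value in intervals)
    mean = total_cov / total_bp if total_bp else 0.0
    if mean == 0:
        return list(intervals)  # nothing to scale by
    return [(c, s, e, v / mean) for c, s, e, v in intervals]


def window_coverage(intervals, chrom, gene_start, gene_end, width=6000):
    """Per-base coverage in a `width`-bp window centered on the gene midpoint."""
    midpoint = (gene_start + gene_end) // 2
    win_start = midpoint - width // 2
    win_end = win_start + width
    row = [0.0] * width  # positions with no interval stay at zero
    for c, start, end, value in intervals:
        if c != chrom:
            continue
        # Clip the interval to the window; an empty range means no overlap.
        lo = max(start, win_start)
        hi = min(end, win_end)
        for pos in range(lo, hi):
            row[pos - win_start] = value
    return row
```

Each returned row corresponds to one sample's coverage across the window, which is the kind of single-row matrix a CDT file would hold. A successful knockout would be expected to show a dip in coverage around the center of the row.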

The new results directory BedGraphs, which holds the raw and normalized pileup BedGraphs and the single-interval CDTs for each sample, is added to the .gitignore.

The README explains why the number of samples in the EBI sample set differs from the number of sample accessions in the publication (Puddu et al. 2019).
A heatmap is generated for each group of samples: those identified by both the Puddu et al. and DeletionID methods, by neither, by only DeletionID, or by only Puddu et al. These heatmaps are made from CDT files that concatenate the per-sample CDT rows generated by the "04" script. Traces are also generated for the ORF boundaries using a version of ScriptManager that supports colors with an alpha channel.

Figures 3E and 3F are genome browser shots that zoom in on some of the coverage rows from the heatmap, highlighting various cases where DeletionID did not identify the expected gene knockout.
@owlang owlang merged commit 0de893b into master May 20, 2021