The job/01_align_fastq.pbs script aligns YKOC FASTQ files to the sacCer3 genome using the PughLab core pipeline's alignment defaults, which include filtering the BAM files for duplicates. The .gitignore file is updated to ignore FASTQ and BAM files.
The 02_indexed_runDID.pbs script runs DeletionID on a single YKOC sample by creating a temp directory that holds symlinks to the sample's BAM files. This works around DeletionID expecting an input directory of files. The YKOC-wgs/results/ID directory is added to the .gitignore file.
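The staging step described above can be sketched in Python. This is a minimal illustration, not the contents of the PBS script: the function name `stage_sample`, the `did_` prefix, and the `.bam.bai` index naming are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def stage_sample(bam_path, workdir=None):
    """Create a temp directory holding symlinks to one sample's BAM and index.

    Hypothetical helper: the returned directory can be handed to a tool
    that expects an input directory rather than a single file.
    """
    bam = Path(bam_path).resolve()
    staging = Path(tempfile.mkdtemp(prefix="did_", dir=workdir))
    # Assumed index naming: <sample>.bam.bai next to the BAM file.
    for src in (bam, Path(str(bam) + ".bai")):
        if src.exists():
            os.symlink(src, staging / src.name)
    return staging
```

The caller is responsible for removing the staging directory (for example with `shutil.rmtree`) once the run finishes; the original BAM files are untouched because only the links are deleted.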
The information from Puddu et al can be used as a "true" set for evaluating the performance of DeletionID. They used an alternative approach for verifying knockouts, so it provides an orthogonal verification of DeletionID's performance. The README is updated to describe the new file and to fix a small typo.
The PBS script `job/03_tally_results.pbs` calls `scripts/analyze_ykoc_results.py` and `scripts/make_venn_diagram.py` to analyze DeletionID's performance on the YKOC samples. It further breaks down the output into samples whose KOs were verified by only DeletionID, only Puddu et al, both, or neither. The `analyze_ykoc_results.py` script combines the metadata from the several files listed here with the DeletionID results into a master results table, which will become one of the GenoPipe paper's supplementary tables. The `make_venn_diagram.py` script draws a Venn diagram from the tallied DeletionID results, showing the KOs verified by DeletionID versus those verified by the Puddu et al paper. This figure is named 3C.
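The four-way breakdown described above amounts to set arithmetic over sample IDs. The sketch below illustrates that idea only; the function name, the category labels, and the sample IDs are hypothetical and are not taken from `analyze_ykoc_results.py`.

```python
def tally_verification(samples, did_hits, puddu_hits):
    """Split sample IDs into the four verification categories."""
    samples = set(samples)
    did_hits = set(did_hits)
    puddu_hits = set(puddu_hits)
    return {
        "both": samples & did_hits & puddu_hits,
        "deletionid_only": (samples & did_hits) - puddu_hits,
        "puddu_only": (samples & puddu_hits) - did_hits,
        "neither": samples - did_hits - puddu_hits,
    }


# Hypothetical sample IDs, for illustration only.
groups = tally_verification(
    samples=["s1", "s2", "s3", "s4"],
    did_hits=["s1", "s2"],
    puddu_hits=["s2", "s3"],
)
counts = {name: len(ids) for name, ids in groups.items()}
print(counts)
# {'both': 1, 'deletionid_only': 1, 'puddu_only': 1, 'neither': 1}
```

The `both`, `deletionid_only`, and `puddu_only` counts are the three regions a two-set Venn diagram needs; `neither` falls outside both circles.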
The Venn diagram script requires a title in the script call, which was omitted in the previous commit. This commit fixes that, titling the figure "Figure3C".
The aligned reads are piled up into a bedGraph-formatted file and the values are normalized by a custom Python script (`scripts/normalize_BedGraph.py`). Another script then identifies the expected knockout interval, extracts the coverage within a 6000 bp window centered on the expected knockout gene, and writes it as a CDT file. The new results directory, BedGraphs, which holds the raw and normalized pileup bedGraphs and the single-interval CDTs for each sample, is added to the .gitignore. The README explains why the number of samples in the EBI sample set differs from the sample accessions in the publication (Puddu et al 2019).
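The normalization and windowing steps above can be sketched as follows. The commit does not say how `normalize_BedGraph.py` scales the values, so scaling to a length-weighted mean of 1 is an assumption here, as are the function names and the `(chrom, start, end, value)` tuple layout with 0-based, half-open coordinates.

```python
def normalize(intervals):
    """Scale values so the length-weighted mean coverage is 1 (assumed scheme)."""
    total_bp = sum(end - start for _, start, end, _ in intervals)
    total_cov = sum((end - start) * val for _, start, end, val in intervals)
    mean = total_cov / total_bp if total_bp else 0.0
    if mean == 0:
        return list(intervals)
    return [(c, s, e, v / mean) for c, s, e, v in intervals]


def window_coverage(intervals, chrom, gene_start, gene_end, width=6000):
    """Per-base coverage in a `width`-bp window centered on the gene midpoint."""
    mid = (gene_start + gene_end) // 2
    win_start = mid - width // 2
    win_end = win_start + width
    row = [0.0] * width  # positions without coverage stay at 0
    for c, s, e, v in intervals:
        if c != chrom or e <= win_start or s >= win_end:
            continue
        # Clip the interval to the window and fill one value per base.
        for pos in range(max(s, win_start), min(e, win_end)):
            row[pos - win_start] = v
    return win_start, row
```

Returning one value per base keeps every sample's row the same length, which is what lets rows from different samples be stacked into a single heatmap later.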
A heatmap is generated for each group of samples: those identified by both the Puddu and DeletionID methods, by neither, by only DeletionID, or by only Puddu. These heatmaps are made from CDT files that concatenate the single-sample CDT rows generated by the "04" script. Traces are also generated for the ORF boundaries using a version of ScriptManager that supports colors with an alpha channel. Figures 3E and 3F are genome browser shots that zoom in on some of the coverage rows from the heatmap, highlighting various cases where DeletionID did not identify the expected gene knockout.
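Concatenating the per-sample CDT rows can be sketched as below. This assumes each CDT is a tab-delimited text file with one header line followed by data rows, and that all files in a group share the same header; `concat_cdts` is a hypothetical helper, not one of the scripts in this pull request.

```python
from pathlib import Path


def concat_cdts(cdt_paths, out_path):
    """Merge single-sample CDT files that share a header into one multi-row CDT."""
    header = None
    rows = []
    for path in cdt_paths:
        lines = Path(path).read_text().splitlines()
        if not lines:
            continue  # skip empty files
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError(f"{path}: header differs from the first file")
        rows.extend(line for line in lines[1:] if line.strip())
    if header is None:
        raise ValueError("no non-empty CDT files given")
    Path(out_path).write_text("\n".join([header, *rows]) + "\n")
    return len(rows)
```

Checking the header before appending guards against mixing rows built with different window widths, which would otherwise produce a ragged matrix.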
These scripts include everything needed to download the data, align it, run it through DeletionID, and pile up the coverage to generate publication figures and tables for the samples from Puddu et al, 2019, who performed whole-genome sequencing (WGS) on thousands of Yeast Knockout Collection diploid knockout strains.