
Validation DelID using Simulated Synthetic Deletion data#6

Merged
owlang merged 9 commits into master from validation
Apr 23, 2021




@owlang owlang commented Apr 23, 2021

Here is the code for simulating deletions in yeast to validate DeletionID. Also included are the scripts for building the figures.

OLIVIA LANG and others added 9 commits March 31, 2021 18:09
Organize and set up the YKOC-wgs directory to run DeletionID on the thousands of samples generated by Puddu et al. (2019).

YKOC-wgs/README
-information about how the metadata files were generated.
-outline step-wise plan
YKOC-wgs/results/README
-placeholder for results directory
YKOC-wgs/logs/README
-placeholder for logs directory
../.gitignore
-exclude log files
201213_STable1_del2srs.txt
-deletion background to ERS accession information
210223_EBIaccessions_PRJEB27160.txt
-download and EBI accession information
210316_sgd_names_and_aliases.txt
-mapping gene names with aliases reference file
The intermediate directory `logs` was missing from the `YKOC-wgs/logs/*out-*` and `YKOC-wgs/logs/*err-*` filepaths.
The PBS script can be used to download the raw FASTQ files from Puddu et al. (2019).
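The download step could be driven directly by the EBI metadata file. The Python sketch below assumes the accession file is a tab-separated ENA filereport whose header includes `run_accession`, `read_count`, and `fastq_ftp` columns (ENA lists FASTQ paths in `fastq_ftp` separated by semicolons and without a URL scheme). The column layout and the function name `parse_filereport` are assumptions for illustration, not a description of the actual PBS script.

```python
import csv
import io


def parse_filereport(handle):
    """Yield (run_accession, read_count, fastq_urls) rows from an ENA filereport.

    Assumes (hypothetically) a tab-separated file with a header row containing
    `run_accession`, `read_count`, and `fastq_ftp`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # fastq_ftp holds one path per read file (e.g. _1 and _2 for paired-end).
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        urls = ["ftp://" + p for p in paths]
        # Some runs may lack a read count; keep them but mark the count as unknown.
        count = int(row["read_count"]) if row["read_count"] else None
        yield row["run_accession"], count, urls
```

Each URL could then be handed to a downloader such as `wget` inside the job script, and the read count kept alongside it for later QC.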
`201213_Puddu_2019_STable1_del2ers.txt`
Supplementary Table 1 was renamed to be more descriptive and clear.

`210403_PRJEB27160_accessions.txt`
EBI metadata was updated to include more information for each sample including read count information.

`210316_sgd_names_and_aliases.txt`
Instead of using the table downloaded from YeastMine, I found a table on the SGD downloads page that maps standard names to aliases. Downloading this file and removing it afterward will be handled by the job scripts that use it.
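A name-to-alias table like this is typically used to normalize whatever gene name a sample is labeled with. This is a minimal sketch that assumes a tab-separated file with the systematic name, standard name, and pipe-separated aliases in columns 0, 1, and 2. Those column positions and the helpers `build_alias_map` and `resolve` are hypothetical and should be adjusted to the real file.

```python
def build_alias_map(lines, name_col=0, standard_col=1, alias_col=2, alias_sep="|"):
    """Map every systematic name, standard name, and alias to one canonical name.

    The canonical name is the standard name, falling back to the systematic
    name when no standard name exists. Column positions are assumptions.
    """
    lookup = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        systematic = fields[name_col]
        standard = fields[standard_col] if len(fields) > standard_col else ""
        canonical = standard or systematic
        names = [systematic, standard]
        if len(fields) > alias_col:
            names.extend(fields[alias_col].split(alias_sep))
        for name in names:
            if name:
                # setdefault keeps the first gene seen if an alias is shared.
                lookup.setdefault(name.upper(), canonical)
    return lookup


def resolve(lookup, name):
    """Return the canonical name, or the input unchanged if it is unknown."""
    return lookup.get(name.upper(), name)
```

Keys are upper-cased so lookups are case-insensitive. Because some yeast aliases are shared between genes, the sketch keeps the first mapping it sees rather than overwriting it.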
Change the simulated dataset sizes to 500K, 1M, 2M, 3M, 4M, and 5M reads.

depth_simulations.txt
-rewrite synthetic genome, read count, and seed parameter combinations
../scripts/simulate.sh
-add the newly tested depths to the list of depth encodings
job/run_depth_*
-write new PBS scripts for the new read depths
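The parameter file enumerates every combination of synthetic genome, read depth, and seed. The sketch below generates such rows in Python. The six depth labels come from this commit, but the genome names, the seed values, and the tab-separated `genome, reads, seed` column order are assumptions, not the actual format of depth_simulations.txt.

```python
import itertools

# Depths tested in this PR, written as the short labels used in the text.
DEPTHS = ["500K", "1M", "2M", "3M", "4M", "5M"]
_SUFFIX = {"K": 1_000, "M": 1_000_000}


def depth_to_reads(label):
    """Convert a label such as '500K' or '2M' into an integer read count."""
    return int(label[:-1]) * _SUFFIX[label[-1].upper()]


def simulation_rows(genomes, seeds, depths=DEPTHS):
    """Yield one tab-separated row per (genome, depth, seed) combination."""
    for genome, depth, seed in itertools.product(genomes, depths, seeds):
        yield f"{genome}\t{depth_to_reads(depth)}\t{seed}"
```

With one genome and two seeds this yields 1 × 6 × 2 = 12 rows, one per independent simulation job.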
Wrote PBS scripts that execute each simulation independently and time each run for performance analysis. Each sample gets its own temporary directory containing its *.bam and *.bai files, and identify-Deletion.sh is called on that directory.
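The same isolate-then-time pattern can be sketched in Python. It assumes the index sits next to the BAM as `<name>.bam.bai`, and `command` is a hypothetical stand-in for the real identify-Deletion.sh invocation, whose arguments are not given here. This illustrates the pattern only and is not the PBS scripts themselves.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def run_in_tempdir(bam, command):
    """Run `command` in a fresh temp dir holding one sample's BAM and index.

    Assumes (hypothetically) the index is named `<name>.bam.bai`.
    Returns (exit code, wall-clock seconds).
    """
    bam = Path(bam)
    bai = bam.with_name(bam.name + ".bai")
    with tempfile.TemporaryDirectory() as tmp:
        # Copy both files so each run sees only its own sample.
        for f in (bam, bai):
            shutil.copy(f, tmp)
        start = time.perf_counter()
        proc = subprocess.run(command, cwd=tmp)
        elapsed = time.perf_counter() - start
    # The temporary directory and its copies are removed on leaving the block.
    return proc.returncode, elapsed
```

Timing only the `subprocess.run` call keeps file copying out of the measured runtime. Symlinks would avoid the copy cost on large BAMs, provided the tool follows them.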
The script `job/tally_results_and_runtime.sh` identifies the simulated samples that uniquely found the simulated deletion interval (using the `scripts/check_ID_tally.sh` bash script) and uses `grep` to extract the runtime metrics from the log files.
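One reading of "uniquely found" is that a sample succeeds when DeletionID reports exactly one deletion and that deletion overlaps the simulated interval. The sketch below encodes that reading. It assumes BED-like `chrom start end` output with half-open coordinates. Both the interpretation and the format are assumptions, and this is not the logic of `scripts/check_ID_tally.sh`.

```python
def read_intervals(lines):
    """Parse BED-like lines into (chrom, start, end) tuples, skipping headers."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def uniquely_found(reported, simulated):
    """Exactly one reported deletion, and it overlaps the simulated one."""
    return len(reported) == 1 and overlaps(reported[0], simulated)


def tally(samples, simulated):
    """samples maps sample name -> list of output lines; returns successes."""
    return sorted(
        name
        for name, lines in samples.items()
        if uniquely_found(read_intervals(lines), simulated)
    )
```

A stricter check could also require the reported breakpoints to fall within some tolerance of the simulated ones instead of accepting any overlap.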

The lists of successful simulated samples are in the <deletion>_<depth>_tally.txt files.

Runtimes are listed in the <deletion>_<depth>_runtime.txt files.
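If the jobs were timed with bash's `time` keyword under its default format, each log contains a line such as `real	1m2.500s`. Under that assumption, the `grep` step can be mirrored in Python as shown below. Logs written with `time -p` or a custom `TIMEFORMAT` would need a different pattern.

```python
import re

# Matches bash's default `time` output line, e.g. "real\t1m2.500s".
_REAL = re.compile(r"^real\s+(\d+)m(\d+(?:\.\d+)?)s\s*$")


def real_seconds(lines):
    """Return the wall-clock seconds from every 'real' line in a log."""
    seconds = []
    for line in lines:
        match = _REAL.match(line)
        if match:
            seconds.append(int(match.group(1)) * 60 + float(match.group(2)))
    return seconds
```

For example, `real_seconds(open(log_path))` with a hypothetical `log_path` returns one value per timed command in that log.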
Figure 3A (bar plot of tallied successes across simulation parameters) and Figure 3B (box plot of runtimes for each simulation set) are generated by these scripts. The *_config.txt files are used when calling the Python scripts that build the matplotlib figures from the results/*_tally.txt and results/*_runtime.txt files.
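Before plotting, the bar heights for Figure 3A reduce to a count of successful samples per (deletion, depth) pair. The sketch below computes that from the `<deletion>_<depth>_tally.txt` naming described above. It assumes one sample per non-empty line, and it takes the depth to be the last underscore-separated field so that deletion names containing underscores still parse. The helper names are hypothetical, and the real scripts read their settings from the *_config.txt files instead.

```python
from pathlib import Path


def parse_result_name(path, suffix):
    """Split '<deletion>_<depth>_<suffix>.txt' into (deletion, depth)."""
    name = Path(path).name
    tail = f"_{suffix}.txt"
    if not name.endswith(tail):
        raise ValueError(f"unexpected file name: {name}")
    deletion, depth = name[: -len(tail)].rsplit("_", 1)
    return deletion, depth


def count_successes(results_dir):
    """Count non-empty lines (one per successful sample) in each tally file."""
    counts = {}
    for path in sorted(Path(results_dir).glob("*_tally.txt")):
        key = parse_result_name(path, "tally")
        with open(path) as fh:
            counts[key] = sum(1 for line in fh if line.strip())
    return counts
```

The resulting dictionary can feed `matplotlib.pyplot.bar` directly. Dividing each count by the number of seeds per combination would turn it into a success rate.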
@owlang owlang merged commit db8aa6f into master Apr 23, 2021
