Conversation
The scripts for running StrainID on the human simulations were stripped of a no-longer-accurate comment about how the script runs.
I needed to update the job/run_depth* and job/run_StrainID* PBS scripts with the new sacCer3 simulation depths (moving from 1M, 2M, 3M, 4M, 5M to 10K, 50K, 100K, 500K, 1M, and 2M). Helper files, including `depth_simulations.txt`, needed to be updated with the new depths and seeds, and the simulation script needed to be updated so that it could support 10K and 50K read depths.
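As a rough illustration of the depth and seed bookkeeping described above, the sketch below converts depth labels such as 10K or 2M into read counts and pairs each depth with reproducible seeds. The row layout, the number of replicates, and the function names are assumptions made for illustration; the actual format of `depth_simulations.txt` is not shown in this pull request.

```python
import random

# Depth labels taken from the commit message above.
DEPTH_LABELS = ["10K", "50K", "100K", "500K", "1M", "2M"]
SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_depth(label: str) -> int:
    """Convert a label such as '50K' or '2M' into a read count."""
    label = label.strip().upper()
    if label and label[-1] in SUFFIXES:
        return int(label[:-1]) * SUFFIXES[label[-1]]
    return int(label)


def depth_seed_rows(labels, replicates=2, master_seed=0):
    """Yield hypothetical (label, reads, seed) rows, one per replicate.

    The replicate count and seed range are assumptions, not values
    taken from the repository.
    """
    rng = random.Random(master_seed)  # fixed master seed keeps rows reproducible
    for label in labels:
        for _ in range(replicates):
            yield label, parse_depth(label), rng.randrange(1, 10**6)


if __name__ == "__main__":
    for label, reads, seed in depth_seed_rows(DEPTH_LABELS):
        print(f"{label}\t{reads}\t{seed}")
```

Deriving every seed from one master seed means the same table can be regenerated later, which is convenient when a simulation at a given depth has to be rerun.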
The previous commit was missing this straggling PBS script for running StrainID on CENPK data at a depth of 100K.
The PBS script for generating the synthetic genomes to simulate from is updated in this commit to point to the correct sacCer3 VCF files, which contain ALL variants and not just the subset of variants unique to the strain. The change in reference VCF files for the sacCer3 default run in a prior update precipitated the need for this update.
For each experiment (strain x depth), parse out the StrainID scores and runtimes using the tally PBS script and the two helper Python scripts for parsing the results.
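A minimal sketch of what such a tally step might look like is given below. It assumes, purely for illustration, that each experiment produces a tab-delimited score file (`strain<TAB>score`) and a log containing a bash `time`-style `real` line; neither format is confirmed by this pull request, and the function names are hypothetical rather than those of the two helper scripts.

```python
import re


def parse_scores(text):
    """Parse assumed 'strain<TAB>score' lines into a {strain: score} dict."""
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        strain, value = line.split("\t")[:2]
        scores[strain] = float(value)
    return scores


# Matches a bash `time`-style line such as "real    1m23.456s" (assumed log format).
REAL_TIME = re.compile(r"^real\s+(\d+)m([\d.]+)s", re.MULTILINE)


def parse_runtime_seconds(log_text):
    """Return wall-clock seconds from the 'real' line, or None if absent."""
    match = REAL_TIME.search(log_text)
    if match is None:
        return None
    return int(match.group(1)) * 60 + float(match.group(2))


def tally(experiments):
    """Build one (strain, depth, top_scoring_strain, seconds) row per experiment.

    `experiments` maps (strain, depth) to a (score_text, log_text) pair.
    """
    rows = []
    for (strain, depth), (score_text, log_text) in sorted(experiments.items()):
        scores = parse_scores(score_text)
        best = max(scores, key=scores.get) if scores else None
        rows.append((strain, depth, best, parse_runtime_seconds(log_text)))
    return rows
```

Keeping score parsing and runtime parsing in separate functions mirrors the two-helper split mentioned above and lets each be checked on its own.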
There were some typos left over from a copy-paste of the ENCODE processing scripts. This commit fixes the typos in the comment descriptions.
This commit includes a python script and an update to the tally job for generating jitter/strip plots of the simulation StrainID scores.
scripts/make_jitter.py - seaborn library-based plot for showing spread of scores assigned to each strain for each simulation "experiment" (synthetic_strain x depth)
job/tally_results.sh - PBS script updated with py script calls to generate figs
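The sketch below shows one way a seaborn jitter/strip plot of this kind could be produced. It is not the contents of `scripts/make_jitter.py`: the input structure, the column names, and both function names are assumptions chosen for illustration.

```python
def to_long_records(scores_by_experiment):
    """Flatten {(synthetic_strain, depth): {candidate: score}} into row dicts."""
    records = []
    for (synthetic, depth), scores in scores_by_experiment.items():
        for candidate, score in scores.items():
            records.append(
                {
                    "experiment": f"{synthetic} x {depth}",
                    "candidate_strain": candidate,
                    "score": score,
                }
            )
    return records


def plot_jitter(records, out_path):
    """Draw one jittered point per (experiment, candidate strain) score."""
    # Imported here so the reshaping helper works without plotting libraries.
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display needed on a cluster node
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    frame = pd.DataFrame.from_records(records)
    fig, ax = plt.subplots(figsize=(8, 4))
    sns.stripplot(
        data=frame,
        x="experiment",
        y="score",
        hue="candidate_strain",
        jitter=True,
        dodge=True,
        ax=ax,
    )
    ax.set_ylabel("StrainID score")
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_jitter(to_long_records(scores), "jitter.png")` would write the figure to disk, which suits a non-interactive PBS job; the output filename here is only an example.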
This pull request includes changes made to shift the sacCer3 simulations to shallower depths and to parse, tally, and visualize the StrainID score results and performance (runtimes).