
Various VQSR optimizations in both runtime and accuracy.

-- For very large whole-genome datasets in which more than 2M variants overlap the training data, randomly downsample the training set used to build the Gaussian mixture model.
-- Annotations are ordered by the difference in means between known and novel instead of by their standard deviation.
-- Removed the training set quality score threshold.
-- Now uses 2 Gaussians by default for the negative model.
-- The num bad argument has been removed; the model now chooses the cutoffs itself by looking at the LOD scores.
-- Model plots are now generated much faster.
-- Stricter threshold for determining model convergence.
-- All VQSR integration tests are updated to reflect these changes to the model.
-- Add test for downsampling of training data.
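
The downsampling and annotation-ordering items above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code from this commit: the function names, the fixed seed, the choice to downsample to exactly the 2M threshold, and the use of the absolute difference sorted largest-first are all assumptions made for the example.

```python
import random

# Threshold taken from the commit message ("over 2M variants").
# Using it as the downsampling target size as well is an assumption.
MAX_TRAINING_VARIANTS = 2_000_000


def downsample_training_set(training, max_size=MAX_TRAINING_VARIANTS, seed=0):
    """Return the training data unchanged if it is small enough,
    otherwise a uniform random subset of max_size items (hypothetical helper)."""
    training = list(training)
    if len(training) <= max_size:
        return training
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same subset
    return rng.sample(training, max_size)


def order_annotations(known_means, novel_means):
    """Order annotation names by |mean(known) - mean(novel)|, largest first.

    Both arguments map annotation name -> mean value (hypothetical helper;
    the sort direction and use of the absolute value are assumptions)."""
    return sorted(
        known_means,
        key=lambda name: abs(known_means[name] - novel_means[name]),
        reverse=True,
    )
```

Sampling without replacement keeps every retained training variant distinct, and the fixed seed keeps results reproducible, which matters for integration tests that compare against stored output.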
Ryan Poplin committed Sep 30, 2013
1 parent 5a6bb56 commit 3c7d94af4dfaa4f4621275cc75e851ca6d7d26ec
