Various VQSR optimizations in both runtime and accuracy.
-- For very large whole genome datasets with over 2M variants overlapping the training data, randomly downsample the training set that gets used to build the Gaussian mixture model.
-- Annotations are ordered by the difference in means between known and novel instead of by their standard deviation.
-- Removed the training set quality score threshold.
-- Now uses 2 Gaussians by default for the negative model.
-- The num bad argument has been removed; the cutoffs are now chosen by the model itself by looking at the LOD scores.
-- Model plots are now generated much faster.
-- Stricter threshold for determining model convergence.
-- All VQSR integration tests change because of these changes to the model.
-- Add test for downsampling of training data.
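
The first two changes above can be illustrated with a minimal sketch. This is not the code from this commit: the function names, the fixed seed, the exact 2,000,000 cap, and the choice to sort by descending absolute difference are all assumptions made for illustration.

```python
import random
from statistics import mean

# Assumed cap, taken from the "over 2M variants" figure in the message above.
MAX_TRAINING_VARIANTS = 2_000_000


def downsample_training_set(training, max_size=MAX_TRAINING_VARIANTS, seed=0):
    """Return at most max_size training items, sampled uniformly without replacement.

    Hypothetical helper: the seed is fixed only so the sketch is reproducible.
    """
    if len(training) <= max_size:
        return list(training)
    rng = random.Random(seed)
    return rng.sample(training, max_size)


def order_annotations_by_mean_difference(known, novel):
    """Order annotation names by |mean(known) - mean(novel)|, largest first.

    known and novel map an annotation name to a list of its values.
    Using the absolute difference and descending order is an assumption;
    the message only says the ordering uses the difference in means.
    """
    def separation(name):
        return abs(mean(known[name]) - mean(novel[name]))

    return sorted(known, key=separation, reverse=True)
```

For example, `downsample_training_set(list(range(100)), max_size=10)` returns 10 distinct items from the input, and an input at or below the cap is returned unchanged.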