v1.5

jonassibbesen released this 01 Apr 22:56

Release featuring:

  • Noise parameter estimation: Changed noise parameter estimation so that all variation types (except nested) are now used. This allows BayesTyper to run on variant sets containing few or even no SNVs. In addition, the minimum requirement on the number of variants needed for noise estimation has been removed and replaced with a warning.

  • Noise genotyping mode: Added a new genotyping mode (--noise-genotyping) where noise parameters and genotypes are estimated jointly instead of sequentially. This allows uncertainty in the noise estimates to be propagated directly into the genotype posteriors. For larger genomes the noise estimates are generally fairly stable, but for smaller genomes with few variants this is often not the case. In this mode all variants, including nested ones, are used for noise estimation. Note that this mode will in most cases be slower and require more memory than the default.

  • Seeding and threading: Fixed seeding so that identical results (within floating-point error) are obtained across runs regardless of the number of threads used. Previously, the same number of threads was needed to get identical results with the same seed.

  • Genotype quality: Added genotype quality (GQ) as a sample attribute to the bayesTyper genotype output. The quality is calculated from the maximum genotype posterior probability (GPP) and is Phred-scaled.

  • Filters: Removed the --min-homozygote-genotypes filter from bayesTyper genotype. Due to several improvements to BayesTyper over the last couple of releases, this filter is not as important as it used to be. Note that it is still possible to apply the filter using bayesTyperTools filter.

  • Haplotype option: Renamed the option for setting the maximum number of haplotype candidates per sample to --max-number-of-sample-haplotypes and increased its default value to 32. A higher value has been shown to give better results when genotyping a small number of samples. Note that this increase might result in longer computation time, especially for more complex variant clusters.

  • Prior option: Changed the default parameters of the gamma-distributed noise rate prior (--noise-rate-prior) to better reflect the expected Illumina error rate.

  • Insertion alleles: Added support for insertions in bayesTyperTools convertAllele. The sequences stored in the variant attributes SEQ or SVINSSEQ are now used as the inserted sequence for <INS> alleles. In addition, a FASTA file containing the inserted sequences can be given, with >"name" matching <"name">. Furthermore, support has been added for partial insertions (Manta output) where the center and length are unknown.

  • Scripts: Removed addMaxGenotypePosterior since it is no longer relevant now that genotype qualities are calculated during genotyping. Added a filterAlleleCallsetOrigin script that can filter alleles based on their callset origin (ACO).

  • General: Made minor improvements to the inference algorithm. Converted some common asserts related to input data into more readable error messages.
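
For readers unfamiliar with Phred scaling, the genotype quality (GQ) described above can be illustrated with a minimal sketch. It assumes the conventional definition, GQ = -10 · log10(1 - max GPP); the function name `phred_gq`, the rounding to an integer, and the cap of 99 are illustrative assumptions and are not taken from the BayesTyper source.

```python
import math


def phred_gq(gpps, cap=99):
    """Phred-scale the probability that the most likely genotype is wrong.

    gpps: genotype posterior probabilities (GPP) for one sample at one variant.
    cap:  assumed upper bound, used when the error probability is zero
          (hypothetical; BayesTyper's actual handling may differ).
    """
    max_gpp = max(gpps)
    error = 1.0 - max_gpp  # posterior probability that the call is incorrect
    if error <= 0.0:
        return cap
    # Rounding to an integer is an assumption made for this sketch.
    return min(cap, round(-10.0 * math.log10(error)))


if __name__ == "__main__":
    for posteriors in ([0.9, 0.09, 0.01], [0.999, 0.001], [1.0, 0.0]):
        print(posteriors, phred_gq(posteriors))
```

Under this definition a maximum posterior of 0.9 corresponds to a quality of 10 and 0.999 to a quality of 30, so each additional 10 units means a tenfold lower chance that the called genotype is wrong.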