Releases: bioinformatics-centre/BayesTyper
v1.5
Release featuring:
- Noise parameter estimation: Changed noise parameter estimation so that all variant types (except nested variants) are now used. This allows BayesTyper to run on variant sets containing few or even no SNVs. In addition, the minimum number of variants required for noise estimation has been removed and replaced with a warning.
- Noise genotyping mode: Added a new genotyping mode (--noise-genotyping) in which noise parameters and genotypes are estimated jointly instead of sequentially. This allows uncertainty in the noise estimates to propagate directly into the genotype posteriors. For larger genomes the noise estimates are generally fairly stable; for smaller genomes with few variants, however, this is often not the case. In this mode all variants, including nested ones, are used for noise estimation. Note that this mode will in most cases be slower and require more memory than the default.
- Seeding and threading: Fixed seeding so that identical results (within floating-point error) are obtained across runs regardless of the number of threads used. Previously, the same number of threads was needed to get identical results with the same seed.
- Genotype quality: Added genotype quality (GQ) as a sample attribute to the bayesTyper genotype output. The quality is calculated from the maximum genotype posterior probability (GPP) and is Phred-scaled.
- Filters: Removed the --min-homozygote-genotypes filter from bayesTyper genotype. Due to several improvements to BayesTyper over the last couple of releases, this filter is no longer as important as it used to be. Note that the filter can still be applied using bayesTyperTools filter.
- Haplotype option: Renamed the option for setting the maximum number of haplotype candidates per sample to --max-number-of-sample-haplotypes and increased its default value to 32. A higher value has been shown to give better results when genotyping a small number of samples. Note that this increase might result in longer computation time, especially for more complex variant clusters.
- Prior option: Changed the default parameters of the gamma-distributed noise rate prior (--noise-rate-prior) to better reflect the expected Illumina error rate.
- Insertion alleles: Added support for insertions in bayesTyperTools convertAllele. The sequences stored in the variant attributes SEQ or SVINSSEQ are now used as the inserted sequence for <INS> alleles. In addition, a fasta file containing the inserted sequences can be given, in which a record header >"name" matches the allele <"name">. Furthermore, support has been added for partial insertions (Manta output) where the center and length are unknown.
- Scripts: Removed addMaxGenotypePosterior, since it is no longer relevant now that genotype qualities are calculated during genotyping. Added the filterAlleleCallsetOrigin script, which can filter alleles based on their origin (ACO).
- General: Made smaller improvements to the inference algorithm. Converted some common asserts related to input data into more readable error messages.
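The genotype quality entry above describes GQ as the Phred-scaled maximum genotype posterior probability (GPP). A minimal Python sketch of that calculation follows. It assumes the conventional transform GQ = -10 log10(1 - max GPP), which Phred-scales the probability that the most probable genotype is wrong. The function name `genotype_quality` is hypothetical, and any capping or rounding that BayesTyper applies is not reproduced here.

```python
import math


def genotype_quality(gpp):
    """Phred-scaled genotype quality from a list of genotype posteriors.

    Assumes GQ = -10 * log10(1 - max(GPP)), i.e. the Phred-scaled
    probability that the most probable genotype is wrong. Any capping
    or rounding applied by BayesTyper is not modelled here.
    """
    error = 1.0 - max(gpp)
    if error <= 0.0:
        # A posterior of exactly 1 would give an infinite quality.
        return math.inf
    return -10.0 * math.log10(error)


# Posteriors for the genotypes 0/0, 0/1 and 1/1 of a biallelic site.
print(round(genotype_quality([0.001, 0.99, 0.009]), 1))  # 20.0
```

Under this transform each additional "9" in the maximum posterior adds 10 to GQ. For example, 0.9 gives 10, 0.99 gives 20, and 0.999 gives 30.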
v1.4.1
Patch fixing:
- Bug resulting in variants being incorrectly excluded when the reference allele in the vcf file is uppercase and the reference is lowercase.
- Bug resulting in the code trying to estimate genomic parameters from kmers with a multiplicity above 2, causing it to occasionally fail during kmer classification.
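The first v1.4.1 fix concerns comparing an uppercase VCF REF allele against reference sequence written in lowercase (the usual way soft-masked bases appear in a reference fasta). The standard remedy is a case-insensitive comparison. The Python sketch below illustrates that idea with a hypothetical helper, `ref_allele_matches`, and a made-up sequence. It is not BayesTyper's code.

```python
def ref_allele_matches(reference, pos, ref_allele):
    """Return True if ref_allele matches reference at 0-based pos, ignoring case.

    Lowercase (soft-masked) reference bases and uppercase VCF REF alleles
    are compared after normalising both to uppercase.
    """
    segment = reference[pos:pos + len(ref_allele)]
    # A REF allele running past the end of the sequence cannot match.
    return len(segment) == len(ref_allele) and segment.upper() == ref_allele.upper()


reference = "ACGTacgtAC"  # positions 4-7 are soft-masked
print(ref_allele_matches(reference, 4, "ACG"))  # True
print(ref_allele_matches(reference, 4, "AGG"))  # False
```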
v1.4
Release featuring:
- Sparsity estimation: Fixed bug when estimating the sparsity parameter used for the population prior. This fix should result in better estimates for complex clusters.
- Ploidy input file: The ploidy of each chromosome for each gender (female and male) can now be specified using --chromosome-ploidy-file in bayesTyper genotype. Ploidy levels 0, 1 (haploid) and 2 (diploid) are supported. Human ploidy levels are assumed if no file is given (see the wiki for more details).
- Genomic parameter estimation: Genomic parameters are now estimated using either haploid or diploid k-mers. The ploidy level with the highest number of informative k-mers is used for estimation.
- Noise parameter estimation: Noise parameters are now estimated using SNVs across all supported ploidy levels. In addition, SNVs in clusters are now also used in parameter estimation.
- Error handling: Incorrect inputs now produce more informative error messages.
v1.3.1
Patch fixing incompatibility with bcftools merge.
v1.3
Major overhaul of BayesTyper. Important new features:
- New interface: bayesTyper has been refactored into bayesTyper cluster and bayesTyper genotype. The cluster step partitions the variants into units that can then be genotyped. Please refer to the new README for details on how to update your pipeline.
- Much reduced memory: Only the graphs and k-mers for a single unit need to reside in memory at any one time; the rest remains on disk. A Bloom filter stores information about k-mers shared across units. This design ensures that memory usage is (almost) independent of the number of candidate variants.
- Cluster support: Each unit can be genotyped independently and hence distributed across nodes on a cluster, followed by simple concatenation of the unit VCF files (e.g. using bcftools concat).
- Simultaneous genotyping and filtering: bayesTyperTools filter no longer needs to be run separately. Hard filters are now applied up front by bayesTyper genotype. Genotypes can still be refiltered using bayesTyperTools filter after genotyping if necessary.
- Parallel read Bloom filter generation: bayesTyperTools makeBloom can now use multiple threads (and scales very well with the number of threads).
- Snakemake workflow: We have added an example Snakemake workflow to the repo; it can orchestrate the entire pipeline, from BAM file(s) through variant candidates to final genotypes.
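The v1.3 notes state that a Bloom filter stores information about k-mers shared across units. The Python sketch below is a generic illustration only, not BayesTyper's code. It implements a textbook Bloom filter with double hashing and uses two filters to flag k-mers that occur in more than one unit. The unit sequences, k = 5, the filter sizes, the `kmers` helper and the `seen`/`shared` bookkeeping are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: num_hashes bit positions per item in a num_bits array.

    Queries may return false positives but never false negatives.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two 64-bit values of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical units; a k-mer that reappears in a later unit is recorded as shared.
units = {"unit1": "ACGTACGTGA", "unit2": "TTACGTACCA"}
seen = BloomFilter(num_bits=1 << 16, num_hashes=4)
shared = BloomFilter(num_bits=1 << 16, num_hashes=4)
for seq in units.values():
    unit_kmers = set(kmers(seq, 5))
    # Check against k-mers from earlier units before adding this unit's own.
    for kmer in unit_kmers:
        if kmer in seen:
            shared.add(kmer)
    for kmer in unit_kmers:
        seen.add(kmer)

print("ACGTA" in shared)  # True: occurs in both units
```

The trade-off is the usual one for Bloom filters. The bit array has a fixed size however many k-mers are inserted, and the price is a false-positive rate that grows as the array fills. A k-mer unique to one unit may occasionally be reported as shared, but a shared k-mer is never missed.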
v1.2
This release contains the following major changes to BayesTyper:
- New haplotype generation approach based on Bloom filters.
  - Reduced memory usage, especially for low-coverage data.
  - Removal of singleton k-mers is no longer needed for high-coverage data.
- Variant alleles longer than 500,000 nts are now excluded by default.
  - Reduced computation time and memory usage.
  - Can be changed using the option --max-allele-length.
Please note that gzip parsing is currently not working in the static build.
v1.1
Update README.md
v1.0
BayesTyper (v1.0)
v0.9: First release
First release