Releases · HKU-BAL/Clair3
v0.1-r12
- CRAM input is supported (#117).
- Bumped dependency versions: Python 3.9 (#96), TensorFlow 2.8, Samtools 1.15.1, WhatsHap 1.4.
- The VCF DP tag now shows raw coverage for both pileup and full-alignment calls (before r12, sub-sampled coverage was shown for pileup calls if average DP > 144) (#128).
- Fixed Illumina representation unification out-of-range error in training (#110).
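With CRAM input supported in r12, `--bam_fn` can point at a `.cram` file directly. A minimal dry-run sketch, assuming the standard `run_clair3.sh` entry point; the file paths and model name below are placeholders, not verified values:

```shell
# Dry-run sketch of a CRAM-based Clair3 call (r12+). run_clair3.sh and the
# flags follow the Clair3 README; sample.cram, GRCh38.fa and the model path
# are placeholders. The command is printed rather than executed.
cmd="./run_clair3.sh \
  --bam_fn=sample.cram \
  --ref_fn=GRCh38.fa \
  --threads=8 \
  --platform=ont \
  --model_path=models/r941_prom_sup_g5014 \
  --output=clair3_cram_out"
echo "$cmd"
```

Note the reference FASTA passed to `--ref_fn` must match the one the CRAM was compressed against, since CRAM stores reads relative to the reference.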
v0.1-r11.1
Please ignore this pre-release; it exists only so that Zenodo can pull and archive Clair3 for the first time.
v0.1-r11
- Variant calling is ~2.5x faster than v0.1-r10, tested on ONT Q20 data, with feature generation for both pileup and full-alignment now implemented in C (co-contributors @cjw85, @ftostevin-ont, @EpiSlim).
- Added the lightning-fast `longphase` as an option for phasing. Enable it with the `--longphase_for_phasing` option. The new option is disabled by default to align with the default behavior of previous versions, but we recommend enabling it when calling human variants with ≥20x long-reads.
- Added the `--min_coverage` and `--min_mq` options (#83).
- Added the `--min_contig_size` option to skip calling variants in short contigs when using a genome assembly as input.
- Read haplotagging, which runs after phasing and before full-alignment calling, is now integrated into full-alignment calling to avoid generating an intermediate BAM file.
- Supported the `.csi` BAM index for large references (#90).

For more speedup details, please check Notes on r11.
Two minor patches for v0.1-r11 are included in all installation options.
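The four new r11 options can be combined in one run. A dry-run sketch; only the flag names come from the release notes above, while the base command, paths, and the example threshold values are illustrative placeholders:

```shell
# Dry-run sketch combining the r11 options: longphase-based phasing plus
# candidate filters. Only the four new flags are from the release notes;
# the remaining flags, paths, and the threshold values are placeholders.
cmd="./run_clair3.sh \
  --bam_fn=assembly_reads.bam \
  --ref_fn=assembly.fa \
  --threads=16 \
  --platform=ont \
  --model_path=models/ont \
  --output=clair3_out \
  --longphase_for_phasing \
  --min_coverage=4 \
  --min_mq=5 \
  --min_contig_size=1000000"
echo "$cmd"
```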
v0.1-r10
- Added a new ONT Guppy5 model (`r941_prom_sup_g5014`). Click here for some benchmarking results. This `sup` model is also applicable to reads called using the `hac` and `fast` modes. The old `r941_prom_sup_g506` model, which was fine-tuned from the Guppy3,4 model, is now obsolete.
- Added the `--var_pct_phasing` option to control the percentage of top-ranked heterozygous pileup variants used for WhatsHap phasing.
v0.1-r9
v0.1-r8
v0.1-r7
- Increased `var_pct_full` in ONT mode from 0.3 to 0.7. Indel F1-score increased by ~0.2%, but calling a ~50x ONT dataset took ~30 minutes longer.
- Expanded the fall-through to the next most likely variant when the network prediction has insufficient read coverage (#53, commit 09a7d18, contributor @ftostevin-ont); accuracy improved on complex Indels.
- Streamlined the pileup and full-alignment training workflows, reducing disk-space demand in model training (#55, commit 09a7d18, contributor @ftostevin-ont).
- Added a `mini_epochs` option to Train.py; performance slightly improved when training a model for ONT Q20 data using mini-epochs (#60, contributor @ftostevin-ont).
- Massively reduced disk-space demand when outputting GVCF. GVCF intermediate files are now compressed with lz4, making them five times smaller with little speed penalty.
- Added `--remove_intermediate_dir` to remove intermediate files as soon as they are no longer needed (#48).
- Renamed the ONT pre-trained models using Medaka's naming convention.
- Fixed training data spilling over into validation data (#57).
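For constrained disk space, the r7 cleanup flag can simply be appended to an ordinary run. A dry-run sketch; `--remove_intermediate_dir` is from the release notes, everything else is a placeholder:

```shell
# Dry-run sketch of the r7 cleanup flag. --remove_intermediate_dir is named
# in the release notes; the base command and paths are placeholders.
base="./run_clair3.sh --bam_fn=sample.bam --ref_fn=GRCh38.fa \
  --threads=8 --platform=ont --model_path=models/ont --output=out"
cmd="$base --remove_intermediate_dir"
echo "$cmd"
```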
v0.1-r6
v0.1-r5
- Modified the data generator in model training to avoid memory exhaustion and an unexpected segmentation fault in TensorFlow (contributor @ftostevin-ont).
- Simplified the Dockerfile workflow to reuse container caching (contributor @amblina).
- Fixed ALT output for reference calls (contributor @wdecoster).
- Fixed a bug in multi-allelic AF computation (the AF of [ACGT]Del variants was wrong before r5).
- Added the AD tag to the GVCF output.
- Added the `--call_snp_only` option to call SNPs only (#40).
- Added pileup and full-alignment output validity checks to avoid workflow crashes (#32, #38).
v0.1-r4
- Installable via Bioconda.
- Added an ONT Guppy2 model to the images (`ont_guppy2`). Click here for more benchmarking results. The results show you have to use the Guppy2 model for data from Guppy2 or earlier.
- Added Google Colab notebooks for a quick demo.
- Fixed a bug when there are too few variant candidates (#28).