Releases: HKU-BAL/Clair3


20 Aug 03:21
  1. CRAM input is supported (#117).
  2. Bumped dependency versions to Python 3.9 (#96), TensorFlow 2.8, Samtools 1.15.1, and WhatsHap 1.4.
  3. The VCF DP tag now shows raw coverage for both pileup and full-alignment calls (#128). Before r12, sub-sampled coverage was shown for pileup calls if the average DP was > 144.
  4. Fixed Illumina representation unification out-of-range error in training (#110).
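
For readers who want to check the DP change in their own output, here is a minimal sketch of reading the per-sample DP value from one VCF record. It relies only on the standard VCF column layout (FORMAT in column 9, the first sample in column 10). The record below is made up for illustration and is not real Clair3 output, and `sample_dp` is a hypothetical helper name.

```python
def sample_dp(vcf_line):
    """Return the DP value of the first sample in a VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")      # FORMAT column, e.g. GT:GQ:DP:AD:AF
    values = fields[9].split(":")    # first sample column
    sample = dict(zip(keys, values))
    dp = sample.get("DP")
    # "." is the VCF missing-value marker
    return None if dp in (None, ".") else int(dp)


# Illustrative record only (assumed field layout, invented values).
record = "chr1\t10492\t.\tC\tT\t21.5\tPASS\tP\tGT:GQ:DP:AD:AF\t0/1:21:180:95,85:0.4722"
print(sample_dp(record))
```

With raw coverage reported, DP values above 144 can appear for pileup calls in high-coverage regions.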


13 Jun 03:28
v0.1-r11.1 Pre-release

Users, please ignore this pre-release. This pre-release is for Zenodo to pull and archive Clair3 for the first time.


04 Apr 10:16
  1. Variant calling is ~2.5x faster than v0.1-r10 when tested with ONT Q20 data, with feature generation in both pileup and full-alignment now implemented in C (co-contributors @cjw85, @ftostevin-ont, @EpiSlim).
  2. Added the lightning-fast longphase as an option for phasing. Enable longphase with the --longphase_for_phasing option. The new option is disabled by default to match the default behavior of previous versions, but we recommend enabling it when calling human variants with ≥20x long reads.
  3. Added --min_coverage and --min_mq options (#83).
  4. Added --min_contig_size option to skip calling variants in short contigs when using genome assembly as input.
  5. Read haplotagging, previously a separate step between phasing and full-alignment calling, is now integrated into full-alignment calling to avoid generating an intermediate BAM file.
  6. Supported .csi BAM index for large references (#90).

For more speedup details, please check Notes on r11.
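
The options introduced in this release can be combined on one command line. The sketch below only assembles and prints such a command; it does not run anything. The option names --longphase_for_phasing, --min_coverage, --min_mq, and --min_contig_size come from the notes above. The entry-point name `run_clair3.sh`, the remaining options, the `--name=value` form, and all file paths and values are assumptions for illustration and should be checked against the Clair3 usage text.

```python
import shlex


def build_clair3_command(bam, ref, model_dir, out_dir, threads=8,
                         use_longphase=False, min_coverage=None,
                         min_mq=None, min_contig_size=None):
    """Assemble an argv list for a Clair3 run; nothing is executed."""
    cmd = [
        "run_clair3.sh",              # assumed entry-point script name
        f"--bam_fn={bam}",            # assumed input options
        f"--ref_fn={ref}",
        f"--threads={threads}",
        "--platform=ont",
        f"--model_path={model_dir}",
        f"--output={out_dir}",
    ]
    if use_longphase:
        # Disabled by default, per the release note above.
        cmd.append("--longphase_for_phasing")
    if min_coverage is not None:
        cmd.append(f"--min_coverage={min_coverage}")
    if min_mq is not None:
        cmd.append(f"--min_mq={min_mq}")
    if min_contig_size is not None:
        cmd.append(f"--min_contig_size={min_contig_size}")
    return cmd


# Hypothetical paths and thresholds, chosen only to show the shape of the call.
cmd = build_clair3_command("sample.bam", "ref.fa", "models/ont", "out",
                           use_longphase=True, min_coverage=4)
print(shlex.join(cmd))
```

Keeping the optional flags behind `None` defaults means an unset option is simply omitted, so the tool's own defaults apply.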

v0.1-r11 minor 2 patches are included in all installation options


13 Jan 12:43
  1. Added a new ONT Guppy5 model (r941_prom_sup_g5014). Click here for some benchmarking results. This sup model is also applicable to reads basecalled in hac and fast modes. The old r941_prom_sup_g506 model, which was fine-tuned from the Guppy3,4 model, is now obsolete.

  2. Added the --var_pct_phasing option to control the percentage of top-ranked heterozygous pileup variants used for WhatsHap phasing.
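
To illustrate what a "percentage of top-ranked heterozygous variants" means, here is a minimal sketch of such a selection. It is not Clair3's implementation: ranking by a `qual` score, the dictionary layout, and the function name `select_for_phasing` are all assumptions made for this example.

```python
def select_for_phasing(variants, var_pct_phasing):
    """Keep the top fraction of heterozygous variants, ranked by quality."""
    # Only heterozygous calls are informative for phasing.
    het = [v for v in variants if v["gt"] == "0/1"]
    # Assumed ranking criterion: higher quality first.
    het.sort(key=lambda v: v["qual"], reverse=True)
    keep = int(len(het) * var_pct_phasing)
    return het[:keep]


# Invented pileup calls for illustration.
calls = [
    {"pos": 100, "gt": "0/1", "qual": 30.0},
    {"pos": 200, "gt": "1/1", "qual": 35.0},
    {"pos": 300, "gt": "0/1", "qual": 12.0},
    {"pos": 400, "gt": "0/1", "qual": 25.0},
    {"pos": 500, "gt": "0/1", "qual": 8.0},
]
selected = select_for_phasing(calls, 0.5)
print([v["pos"] for v in selected])
```

A lower fraction passes fewer, higher-confidence variants to the phasing step; a higher fraction passes more variants at the cost of including lower-quality calls.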


01 Dec 12:05
Added the --enable_long_indel option to output indel variant calls >50bp (#64). Click here to see more benchmarking results.


11 Nov 13:59
  1. Added the --enable_phasing option that adds a step after Clair3 calling to output variants phased by WhatsHap (#63).
  2. Fixed unexpected program termination on successful runs.


19 Oct 09:10
  1. Increased var_pct_full in ONT mode from 0.3 to 0.7. Indel F1-score increased ~0.2%, but took ~30 minutes longer to finish calling a ~50x ONT dataset.
  2. Expanded fall-through to the next most likely variant when the network prediction has insufficient read coverage (#53 commit 09a7d18, contributor @ftostevin-ont); accuracy improved on complex indels.
  3. Streamlined the pileup and full-alignment training workflows. Reduced disk space demand in model training (#55 commit 09a7d18, contributor @ftostevin-ont).
  4. Added the mini_epochs option; performance slightly improved when training a model for ONT Q20 data using mini-epochs (#60, contributor @ftostevin-ont).
  5. Massively reduced disk space demand when outputting GVCF. Now compressing GVCF intermediate files with lz4, five times smaller with little speed penalty.
  6. Added --remove_intermediate_dir to remove intermediate files as soon as they are no longer needed (#48).
  7. Renamed ONT pre-trained models with Medaka's naming convention.
  8. Fixed training data spilling over to validation data (#57).


04 Sep 13:47
  1. Reduced memory footprint at the SortVcf stage (#45).
  2. Reduced ulimit -n (number of files simultaneously opened) requirement (#45, #47).
  3. Added Clair3-Illumina package in bioconda (#42).
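
Because the `ulimit -n` requirement matters when running on shared systems, a quick check of the current open-file limit can be useful before starting a run. The sketch below uses Python's standard `resource` module (Unix only). The threshold of 1024 is an arbitrary illustrative value, not a figure from these notes, and `ensure_open_file_limit` is a hypothetical helper name.

```python
import resource


def ensure_open_file_limit(wanted):
    """Raise the soft open-file limit toward `wanted`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Nothing to do if the soft limit is unlimited or already high enough.
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft
    # An unprivileged process may raise its soft limit only up to the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


# 1024 is an assumed example threshold.
print(ensure_open_file_limit(1024))
```

This is equivalent in spirit to checking `ulimit -n` in the shell before launching the workflow.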


19 Jul 15:11
  1. Modified the data generator in model training to avoid memory exhaustion and unexpected segmentation faults in TensorFlow (contributor @ftostevin-ont).
  2. Simplified dockerfile workflow to reuse container caching (contributor @amblina).
  3. Fixed ALT output for reference calls (contributor @wdecoster).
  4. Fixed a bug in multi-allelic AF computation (AF of [ACGT]Del variants was wrong before r5).
  5. Added AD tag to the GVCF output.
  6. Added the --call_snp_only option to call SNPs only (#40).
  7. Added pileup and full-alignment output validity check to avoid workflow crashing (#32, #38).


28 Jun 13:44
  1. Added installation via bioconda.
  2. Added an ONT Guppy2 model to the images (ont_guppy2). Click here for more benchmarking results. The results show you have to use the Guppy2 model for Guppy2 or earlier data.
  3. Added Google Colab notebooks for a quick demo.
  4. Fixed a bug when there are too few variant candidates (#28).