0.18.0

gspowley released this 03 Aug 19:13
973fb6a

Changes:

  • 973fb6a Update TileDB and java/spark for version 0.18.0 (#451)
  • bc44a82 Merge pull request #447 from TileDB-Inc/awenocur/sc-19637/port-catch-search
  • e089dfc Add optional ingestion tasks (#448)
  • b3aa808 port FindCatch_EP from core
  • f03ad02 Fix global order violation (#446)

Ingestion Tasks

This release adds optional ingestion tasks that compute allele counts and allele frequencies. Each enabled task creates an array of summary statistics at ingestion time. This is efficient because the ingestion process already reads every VCF record.

Ingestion tasks must be enabled when the dataset is created.

Python

import tiledbvcf

uri = "vcf.tdb"  # placeholder dataset URI, matching the CLI example below
ds = tiledbvcf.Dataset(uri, mode="w")

# enable both ingestion tasks
ds.create_dataset(enable_allele_count=True, enable_variant_stats=True)

CLI

# enable both ingestion tasks
tiledbvcf create -u vcf.tdb --enable-allele-count --enable-variant-stats

# enable the allele_count ingestion task only
tiledbvcf create -u vcf.tdb --enable-allele-count

Ingestion task overview

  • The allele_count task creates an array of called allele counts, for example:
             pos  ref                  alt  filter   gt  count
    0    5041703   TA    T,TAAAAAAAAAAAAAA    PASS  1,2      1
    1    5041703   TA  T,TAAAAAAAAAAAAAAAA    PASS  1,2      1
    2    5041703  TAA                 T,TA    PASS  1,2      1
    3    5046753    G                C,GCC    PASS  1,2     11
  • The variant_stats task creates an array used to compute allele frequencies, for example:
    pos       allele    ac    an        af
    43036308  C        781  1026  0.761209
              T        245  1026  0.238791
    43036324  A       1218  3708  0.328479
              G       2490  3708  0.671521
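
The relationship between the columns in the variant_stats example can be checked with a few lines of Python. This is only a sketch based on the example rows above: it assumes that an at a position equals the sum of ac over the alleles listed there (which holds for these rows) and that af = ac / an. The names ROWS and allele_frequencies are hypothetical and are not part of the tiledbvcf API.

```python
from collections import defaultdict

# (pos, allele, ac) rows copied from the variant_stats example above
ROWS = [
    (43036308, "C", 781),
    (43036308, "T", 245),
    (43036324, "A", 1218),
    (43036324, "G", 2490),
]


def allele_frequencies(rows):
    """Return {(pos, allele): (ac, an, af)}.

    Assumption: an is the sum of ac over all alleles at the same position.
    """
    an_by_pos = defaultdict(int)
    for pos, _allele, ac in rows:
        an_by_pos[pos] += ac
    return {
        (pos, allele): (ac, an_by_pos[pos], ac / an_by_pos[pos])
        for pos, allele, ac in rows
    }


if __name__ == "__main__":
    for (pos, allele), (ac, an, af) in allele_frequencies(ROWS).items():
        print(f"{pos}  {allele}  {ac:>5}  {an:>5}  {af:.6f}")
```

Under these assumptions the computed an and af values match the example table to six decimal places.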

This list of changes was auto-generated.