0.18.0
Changes:
- 973fb6a Update TileDB and java/spark for version 0.18.0 (#451)
- bc44a82 Merge pull request #447 from TileDB-Inc/awenocur/sc-19637/port-catch-search
- e089dfc Add optional ingestion tasks (#448)
- b3aa808 port FindCatch_EP from core
- f03ad02 Fix global order violation (#446)
Ingestion Tasks
This release adds optional ingestion tasks that efficiently compute allele counts and allele frequencies. Each ingestion task writes an array of summary statistics at ingestion time. This is efficient because ingestion already reads every VCF record, so the statistics are gathered without a separate pass over the data.
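The single-pass idea can be illustrated with a minimal sketch. It assumes a hypothetical in-memory record format; the `records` list and the `ingest` function are illustrations only, not TileDB-VCF APIs. This is not the TileDB-VCF implementation, which writes TileDB arrays. The sketch only shows that both kinds of statistics can be updated inside the loop that already visits each record. `gt_counts` loosely mirrors the allele_count columns, and `ac`/`an` mirror the variant_stats columns.

```python
from collections import Counter

# Hypothetical VCF-like records: REF, comma-separated ALTs, FILTER, and a
# diploid genotype as indices into [REF, *ALTs] (None marks a missing call).
records = [
    {"pos": 100, "ref": "C", "alt": "T", "filter": "PASS", "gt": (0, 1)},
    {"pos": 100, "ref": "C", "alt": "T", "filter": "PASS", "gt": (0, 1)},
    {"pos": 200, "ref": "A", "alt": "G", "filter": "PASS", "gt": (1, 1)},
    {"pos": 200, "ref": "A", "alt": "G", "filter": "PASS", "gt": (0, None)},
]


def ingest(records):
    """Store records and update both statistics in one pass over the data."""
    stored = []                # stands in for the main data array
    gt_counts = Counter()      # (pos, ref, alt, filter, gt) -> count
    ac = Counter()             # (pos, allele) -> allele count
    an = Counter()             # pos -> number of called alleles
    for rec in records:
        stored.append(rec)     # the ingestion work that happens anyway
        gt = ",".join("." if i is None else str(i) for i in rec["gt"])
        gt_counts[(rec["pos"], rec["ref"], rec["alt"], rec["filter"], gt)] += 1
        alleles = [rec["ref"], *rec["alt"].split(",")]
        for i in rec["gt"]:
            if i is not None:  # missing calls do not contribute to AN
                ac[(rec["pos"], alleles[i])] += 1
                an[rec["pos"]] += 1
    return stored, gt_counts, ac, an


stored, gt_counts, ac, an = ingest(records)
print(gt_counts[(100, "C", "T", "PASS", "0,1")])  # 2
print(ac[(100, "T")], an[100])                     # 2 4
print(ac[(200, "G")], an[200])                     # 2 3
```

Each record is touched exactly once, so computing the statistics adds only constant work per record on top of storing it.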
Ingestion tasks must be enabled when the dataset is created.
Python
import tiledbvcf

uri = "vcf.tdb"  # dataset URI, same as in the CLI example
ds = tiledbvcf.Dataset(uri, mode="w")
# enable both ingestion tasks
ds.create_dataset(enable_allele_count=True, enable_variant_stats=True)
CLI
# enable both ingestion tasks
tiledbvcf create -u vcf.tdb --enable-allele-count --enable-variant-stats
# enable the allele_count ingestion task only
tiledbvcf create -u vcf.tdb --enable-allele-count
Ingestion task overview
- The allele_count task creates an array of called allele counts, for example:
   pos      ref  alt                  filter  gt   count
0  5041703  TA   T,TAAAAAAAAAAAAAA    PASS    1,2  1
1  5041703  TA   T,TAAAAAAAAAAAAAAAA  PASS    1,2  1
2  5041703  TAA  T,TA                 PASS    1,2  1
3  5046753  G    C,GCC                PASS    1,2  11
- The variant_stats task creates an array used to compute allele frequencies, for example:
pos       allele  ac    an    af
43036308  C       781   1026  0.761209
43036308  T       245   1026  0.238791
43036324  A       1218  3708  0.328479
43036324  G       2490  3708  0.671521
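The af column in the variant_stats example is the allele count divided by the allele number (ac / an). A short standard-library check against the example rows reproduces it. Reading an as the number of called alleles at the position is our interpretation of the columns; in this example it equals the sum of ac across the alleles at each position (781 + 245 = 1026 and 1218 + 2490 = 3708).

```python
# (pos, allele, ac, an) rows taken from the variant_stats example.
rows = [
    (43036308, "C", 781, 1026),
    (43036308, "T", 245, 1026),
    (43036324, "A", 1218, 3708),
    (43036324, "G", 2490, 3708),
]

# Allele frequency: the share of called alleles at a position that are this allele.
af = {(pos, allele): ac / an for pos, allele, ac, an in rows}

for (pos, allele), freq in af.items():
    print(pos, allele, f"{freq:.6f}")
```

The printed values, rounded to six decimal places, match the af column of the example.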
This list of changes was auto-generated.