v2.0

@dcdehaas dcdehaas released this 15 Apr 12:46

This release increments the IGD file format version from v2 to v3. There are no substantive VCF-related changes.

  • Simplified missing data handling. Previously, missing data was stored in its own table as sparse lists and loaded all at once, which could use a fair amount of RAM for any non-trivial amount of missing data. Now missing data is stored as just another row among the "regular" data rows.
  • Each row can be either sparse or non-sparse. Sparse rows are lists of sample indexes (as missing data was stored before). Non-sparse rows are bit vectors (as all other data was stored before). For large datasets, this change makes the resulting file significantly smaller.
  • The API for writing IGD data rows was simplified.
  • Faster processing of the bit vector representation.
  • Added an example in igdpp for computing allele frequency.
  • In-memory storage of allele values (the strings) was reworked to be significantly smaller. More than 6x smaller for most alleles.
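
To make the sparse versus non-sparse distinction concrete, here is a minimal Python sketch of the idea. It is not the IGD on-disk format and does not use the library's API: the function names, the 4-bytes-per-index cost, the bit ordering, and the rule for choosing an encoding are all assumptions made for illustration. It also assumes one allele per sample index, so the frequency is simply carriers divided by samples.

```python
# Illustrative only: NOT the IGD file layout or the library's API.
# Every name below (encode_row, count_carriers, allele_frequency) is hypothetical.

def encode_row(sample_indexes, num_samples):
    """Encode one row using whichever representation is smaller.

    Assumed costs: 4 bytes per stored sample index (sparse),
    1 bit per sample (non-sparse bit vector).
    """
    sparse_bytes = 4 * len(sample_indexes)
    dense_bytes = (num_samples + 7) // 8
    if sparse_bytes < dense_bytes:
        return ("sparse", sorted(sample_indexes))
    bits = bytearray(dense_bytes)
    for i in sample_indexes:
        bits[i // 8] |= 1 << (i % 8)  # set the bit for sample i
    return ("dense", bytes(bits))


def count_carriers(row):
    """Count the samples present in a row, for either encoding."""
    kind, payload = row
    if kind == "sparse":
        return len(payload)
    return sum(bin(byte).count("1") for byte in payload)  # popcount per byte


def allele_frequency(row, num_samples):
    """Fraction of samples carrying the allele (one allele per sample index)."""
    return count_carriers(row) / num_samples


rare = encode_row([3, 17], 1000)                # few carriers: sparse is smaller
common = encode_row(range(0, 1000, 2), 1000)    # many carriers: bit vector is smaller
print(rare[0], allele_frequency(rare, 1000))
print(common[0], allele_frequency(common, 1000))
```

Under these assumed costs, a row with few carriers takes a handful of bytes as an index list, while a row with many carriers is capped at one bit per sample, which is why allowing both encodings per row can shrink large files.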