html for huge cohorts

@brentp brentp released this 29 Oct 20:19
· 146 commits to master since this release

somalier was already fast enough to use on large cohorts (>5,000 samples), but the html output was not useful at that scale.
This release fixes that by sub-sampling pairs of samples that are expected to be unrelated and also appear to be unrelated according to the genotype information.
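
As a rough illustration of the idea (not somalier's actual implementation), the sketch below keeps every pair that is expected to be related or looks related, and randomly sub-samples the rest. The function name, the dictionary keys, the `relatedness_cutoff` value, and the `max_unrelated` cap are all hypothetical and chosen only for this example.

```python
import random


def subsample_pairs(pairs, max_unrelated=1000, relatedness_cutoff=0.05, seed=42):
    """Keep all informative pairs plus a random subset of uninformative ones.

    Each pair is a dict with two hypothetical keys:
      expected_relatedness: value implied by the pedigree (0.0 = unrelated)
      relatedness:          value estimated from the genotypes
    """
    rng = random.Random(seed)
    keep, unrelated = [], []
    for pair in pairs:
        expected_unrelated = pair["expected_relatedness"] == 0.0
        appears_unrelated = pair["relatedness"] < relatedness_cutoff
        if expected_unrelated and appears_unrelated:
            # Expected and observed agree that the pair is unrelated,
            # so it is a candidate for sub-sampling.
            unrelated.append(pair)
        else:
            # Related pairs and unexpected results are always kept.
            keep.append(pair)
    if len(unrelated) > max_unrelated:
        unrelated = rng.sample(unrelated, max_unrelated)
    return keep + unrelated
```

With this approach the number of plotted points is bounded by the cap plus the number of related or discordant pairs, rather than growing with the square of the cohort size.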

v0.2.6

  • for large cohorts (>1K samples) the html output is now usable.
    it randomly subsets pairs of samples that are expected to be unrelated and also appear unrelated from the genotypes.
  • better error messages for bad input
  • inspect the environment variable SOMALIER_ALLOWED_FILTERS so that users can
    give a comma-delimited list of FILTERs that should be allowed (by default only PASS and RefCall
    variants are considered). This is useful for some GVCF formats.
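
To make the SOMALIER_ALLOWED_FILTERS behavior concrete, here is a minimal Python sketch of how a comma-delimited allow-list could be read from the environment and applied to a variant's FILTER value. It is not somalier's code. The function names are hypothetical, and it assumes that the listed values are added to the defaults (PASS and RefCall) rather than replacing them. Handling of missing (`.`) or multi-valued FILTER fields is left out.

```python
import os

# Defaults stated in the release notes.
DEFAULT_ALLOWED = ("PASS", "RefCall")


def allowed_filters(environ=None):
    """Return the set of FILTER values to accept.

    Assumption: values from SOMALIER_ALLOWED_FILTERS extend the defaults.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SOMALIER_ALLOWED_FILTERS", "")
    # Split on commas and drop empty entries and surrounding whitespace.
    extra = [name.strip() for name in raw.split(",") if name.strip()]
    return set(DEFAULT_ALLOWED) | set(extra)


def variant_is_allowed(filter_value, allowed):
    """Accept a variant only if its FILTER value is in the allowed set."""
    return filter_value in allowed
```

For example, with the variable set to `LowQual,q10` (hypothetical filter names), a variant whose FILTER is `LowQual` would be accepted under these assumptions, while one marked with any other unlisted filter would still be skipped.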

sites files

These sites files are build-specific, but as of this release, once the sites are extracted, the resulting somalier files can be used to compare samples even across genome builds.

sites.hg19.vcf.gz
sites.hg38.nochr.vcf.gz
sites.GRCh37.vcf.gz
sites.hg38.vcf.gz