Releases: brentp/somalier

v0.1.5

19 Apr 20:36

v0.1.5

  • add experimental contamination estimate. this simply prints to stderr the sample and
    inferred source (another sample) of contamination along with the estimated level of
    contamination and the number of sites used to estimate it.
  • fix threading bug with large numbers of samples.
  • more lenient ped file parsing ("Female" will be recognized in sex column and
    "Affected" in phenotype column).
  • the html output now allows selecting a single sample to be highlighted in the plot;
    this makes it easy to find a sample of interest in a large cohort.
  • the output now includes a new metric for proportion of sites with an allele balance
    > 0.02 and < 0.2 or > 0.8 and < 0.98. this turns out to be a nice QC (high is bad).
  • for low coverage or targeted sites, sometimes nan values would stop the entire
    html page from working; this has been fixed.
  • make sure all reported relationships are plotted in correct colors (#14)
  • plotting fixes (#15)
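
The allele-balance metric described above can be illustrated with a small sketch. This is not somalier's implementation; the function name `ab_outlier_fraction` is hypothetical, and the choice of denominator (all sites with a non-nan allele balance) is an assumption, since the notes do not say which sites are counted.

```python
def ab_outlier_fraction(allele_balances):
    """Fraction of sites with allele balance in (0.02, 0.2) or (0.8, 0.98).

    Assumption: nan values (sites without usable coverage) are dropped and
    the remaining sites form the denominator.
    """
    usable = [ab for ab in allele_balances if ab == ab]  # nan != nan
    if not usable:
        return float("nan")
    flagged = sum(1 for ab in usable if 0.02 < ab < 0.2 or 0.8 < ab < 0.98)
    return flagged / len(usable)


# Two of the five sites (0.05 and 0.9) fall in the suspicious ranges.
example = ab_outlier_fraction([0.0, 0.05, 0.5, 0.9, 1.0])
```

Clean hom-ref, het, and hom-alt sites sit near 0, 0.5, and 1, so a high value here suggests that many sites carry a small fraction of unexpected reads.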

Install

This release comes with 2 linux binaries:

  • somalier_static is a completely static binary; just wget, chmod+x (get a sites file) and go.
  • somalier_shared requires htslib (and libhts.so). use this binary if you need to access S3 or https files.

Sites files

hg38:

sites.hg38.vcf.gz
sites.chr.hg38.vcf.gz (for hg38 VCFs with "chr" prefix)

GRCh37:

sites.vcf.gz

.list to specify bam/cram and index paths.

13 Feb 18:24

v0.1.4

  • if a file ending with ".list" is given as an argument (instead of .bam, .cram), it can contain
    paths to the alignment files and optionally the indexes. e.g.
https://abc/path/to/aaa.bam https://abc/indexes/path/aaa.bam.bai
https://abc/path/to/bbb.bam https://abc/indexes/path/bbb.bam.bai

These can be space, comma, or tab-delimited.
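
To make the ".list" format concrete, here is a minimal parsing sketch. It is an illustration rather than somalier's own parser: the function name `parse_list_file` is hypothetical, and it assumes that paths themselves contain no spaces, commas, or tabs.

```python
import re


def parse_list_file(text):
    """Return (alignment_path, index_path_or_None) for each non-empty line.

    Fields may be separated by spaces, commas, or tabs. The index path is
    optional, so a line with a single field yields None for the index.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = [f for f in re.split(r"[ \t,]+", line) if f]
        alignment = fields[0]
        index = fields[1] if len(fields) > 1 else None
        entries.append((alignment, index))
    return entries


example = parse_list_file(
    "https://abc/path/to/aaa.bam https://abc/indexes/path/aaa.bam.bai\n"
    "https://abc/path/to/bbb.bam,https://abc/indexes/path/bbb.bam.bai\n"
)
```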

here are the current best sites files for hg38:

sites.hg38.vcf.gz
sites.chr.hg38.vcf.gz (for hg38 VCFs with "chr" prefix)

fixes for deep coverage and better QC metrics.

13 Feb 16:28

v0.1.3

  • if a sample had > 1 allele that was neither REF nor ALT at a given site, it was assigned
    an unknown genotype. This was too stringent for deep sequencing so it was changed to a
    proportion (> 0.04, or 1 in 25 alleles). (#7)
  • for samples with sparse coverage, e.g. from targeted sequencing projects, mean depth is
    not very informative because it gets washed out by all the zero-depth sites. The new columns
    gt_depth_mean, gt_depth_std, and gt_depth_skew report the values for the depth at genotyped
    sites--those meeting the depth requirement (default of 7).
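
Both changes in this release can be sketched in a few lines. This is a rough illustration, not somalier's code: the function names are hypothetical, the denominator for the 0.04 proportion (all observed alleles at the site) is an assumption, "meeting the depth requirement" is read as depth >= 7, and the population standard deviation and moment-based skewness are assumed definitions that may differ from what somalier reports.

```python
import math


def is_unknown_genotype(n_ref, n_alt, n_other, max_other_fraction=0.04):
    """Flag a site as unknown when too many alleles are neither REF nor ALT.

    Assumption: the proportion is n_other over all observed alleles, and a
    site with no observed alleles is treated as unknown.
    """
    total = n_ref + n_alt + n_other
    if total == 0:
        return True
    return n_other / total > max_other_fraction


def genotyped_depth_stats(depths, min_depth=7):
    """Mean, std, and skew of depth over sites with depth >= min_depth."""
    kept = [d for d in depths if d >= min_depth]
    n = len(kept)
    if n == 0:
        nan = float("nan")
        return nan, nan, nan
    mean = sum(kept) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in kept) / n)
    if std == 0:
        return mean, std, 0.0
    skew = (sum((d - mean) ** 3 for d in kept) / n) / std ** 3
    return mean, std, skew


# Deep site: 2 stray alleles out of 100 no longer make the genotype unknown.
deep_site_unknown = is_unknown_genotype(50, 48, 2)

# Sparse sample: the overall mean depth is 10, but the mean over the
# genotyped sites (10, 20, 30) is 20.
gt_mean, gt_std, gt_skew = genotyped_depth_stats([0, 0, 0, 10, 20, 30])
```

Under the old rule (more than 1 non-REF/ALT allele), the deep site in the example would have been discarded despite 98% of its alleles being consistent.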

plot-aesthetics, fixes for RNA-Seq, more depth diagnostics

07 Feb 19:11

v0.1.2

  • allow lower-case reference alleles in case of masked genomes (see #5)
  • set relatedness values < -1.5 to -1.5 in the plot
  • fix bug that affected relatedness calcs especially in RNA-Seq
  • add more diagnostic values (allele-balance and number of non ref/alt bases)

see previous release for sites file. For hg38, it's possible to use: https://github.com/brentp/peddy/blob/master/peddy/GRCH38.sites
(if your chromosomes have the 'chr' prefix, that must be added to this sites file).

bug-fixes and interaction in html output

26 Jan 18:14

v0.1.1

  • fix bug in plot labels
  • better inter-plot interaction in html

continue to use sites.vcf.gz from previous release (GRCh37 only)

v0.1.0

18 Jan 23:31

this release improves the parallelization by sample and provides a better (GRCh37) sites file. It is recommended to use this file. It has fewer sites (23K) but they will work for BS-Seq data and should provide a slightly better relatedness estimate than the 37K from the previous release.

It also removes the heatmap plot in favor of a depth (diagnostic) plot.

sites.vcf.gz

first release

07 Aug 18:35

see binary attached (built on oldish system so should avoid libc problems on most systems).

the sites.vcf.gz will work for GRCh37. the next release will provide one that works for hg38, but any set of common variants will work.
sites.vcf.gz