Joe Zhu edited this page Nov 18, 2019 · 2 revisions

Making sense of the output

Output files

dEploid writes its output as text files whose names begin with the prefix given by the -o flag.

prefix.log

The log file records the dEploid version, the input file paths, the parameters used, and the proportion estimates at the final iteration.

prefix.llk

Log likelihood of the MCMC chain.

prefix.prop

MCMC updates of the proportion estimates.

prefix.hap

Haplotypes at the final iteration, as a plain text file.

prefix.vcf

When flag -vcfOut is turned on, haplotypes are saved at the final iteration in VCF format.

prefix.single[i]

When the -exportPostProb flag is turned on, the posterior probabilities of strain [i] at the final iteration are saved.
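As a quick sanity check after a run, the file names listed above can be tested against a prefix. The sketch below is a minimal helper, not part of dEploid: the function name `check_deploid_outputs` is hypothetical, and the suffixes are taken from this page only. A missing `.vcf` is expected unless -vcfOut was used, so the helper reports missing files without treating them as errors.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with dEploid): report which of the
# output files described on this page exist for a given -o prefix.
check_deploid_outputs() {
    prefix=$1
    # Suffixes as listed on this page; .vcf is only written with -vcfOut.
    for suffix in log llk prop hap vcf; do
        if [ -e "$prefix.$suffix" ]; then
            echo "found    $prefix.$suffix"
        else
            echo "missing  $prefix.$suffix"
        fi
    done
}
```

For instance, `check_deploid_outputs PG0390-CNopanel` would be run from the directory that holds the output of Example 1 below.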

DEploid-IBD

When the -ibd flag is used, DEploid first learns the number of strains and their proportions with an identity-by-descent model ('DEploid-IBD'). It then fixes the number of strains and their proportions, and trains the haplotypes using the original DEploid algorithm ('DEploid-classic'). The outputs of the two stages are labelled with ".ibd" and ".classic" respectively, appended to the prefix.

DEploid-BEST

When the -best flag is used, 'DEploid-BEST' runs the deconvolution algorithms in an optimised sequence to best report the number of strains, proportions and haplotypes. First, the program ('DEploid-Lasso') learns the number of strains with an optimised reference panel; ".chooseK" is appended to the prefix for these outputs (NOTE: likelihood is not tracked in this case). Next, the program ('DEploid-IBD') fixes the number of strains and tunes the strain proportions with an identity-by-descent model; ".ibd" is appended to the prefix for these outputs. Finally, the program ('DEploid-Lasso') fixes the number of strains and proportions, and uses the optimised reference panel again to train and report the haplotypes; ".final" is appended to the prefix for these outputs. When -vcfOut is applied, the VCF output contains only the final haplotypes.
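The stage labels described for -ibd and -best can be summarised as a small lookup. This is only a sketch of the naming convention stated on this page: the function name `staged_prefixes` and its `ibd`/`best` mode arguments are hypothetical, and it prints the staged prefixes without checking which files each stage actually writes.

```shell
#!/bin/sh
# Hypothetical helper: print the staged output prefixes for a run,
# following the labels given on this page.
#   ibd  -> prefix.ibd, prefix.classic
#   best -> prefix.chooseK, prefix.ibd, prefix.final
staged_prefixes() {
    mode=$1
    prefix=$2
    case "$mode" in
        ibd)  stages="ibd classic" ;;
        best) stages="chooseK ibd final" ;;
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    for stage in $stages; do
        echo "$prefix.$stage"
    done
}
```

Calling `staged_prefixes best myRun` lists the three staged prefixes in the order the stages run, which is a convenient starting point for globbing the per-stage files.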

Example of output interpretation

Example 1. Standard deconvolution output

    $ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -noPanel -o PG0390-CNopanel -seed 1
    $ utilities/interpretDEploid.r -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -dEprefix PG0390-CNopanel \
    -o PG0390-CNopanel -ring

interpretDEploidFigure.1

The top three figures are the same as the figures shown in the data example, with the small addition of the inferred within-sample allele frequency (WSAF), marked in blue in the top right figure.

  • The bottom left figure shows the history of the relative strain proportions along the MCMC chain.
  • The middle figure shows the correlation between the expected and observed allele frequencies in the sample.
  • The right figure shows the changes in the MCMC likelihood.
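To read the end point of the proportion history as numbers rather than from the plot, the last row of prefix.prop can be printed directly. The sketch below rests on an assumption about the file layout that this page does not state: one whitespace-separated row per MCMC update, with one column per strain. The function name `final_proportions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the proportions from the last non-empty
# row of a .prop file, one value per line.
# ASSUMPTION: each row is one MCMC update, columns are strains,
# separated by spaces or tabs.
final_proportions() {
    awk 'NF { n = NF; for (i = 1; i <= NF; i++) v[i] = $i }
         END { for (i = 1; i <= n; i++) print v[i] }' "$1"
}
```

For Example 1 this would be called as `final_proportions PG0390-CNopanel.prop`; the values should agree with the final proportion estimates recorded in the log file.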

interpretDEploidFigure.2

This panel figure shows the within-sample allele frequencies (WSAF) across all 14 chromosomes. Expected and observed WSAF are marked in blue and red respectively.

Example 2. Haplotype painting from a given panel

dEploid can take its own output haplotypes and calculate the posterior probabilities of each deconvoluted strain against the reference panel. In this example, the reference panel includes four lab strains: 3D7 (red), Dd2 (dark orange), HB3 (orange) and 7G8 (yellow).

    $ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -panel data/exampleData/labStrains.eg.panel.txt \
    -o PG0390-CPanel -seed 1 -k 3
    $ ./dEploid -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -panel data/exampleData/labStrains.eg.panel.txt \
    -o PG0390-CPanel \
    -painting PG0390-CPanel.hap \
    -initialP 0.8 0 0.2 -k 3
    $ utilities/interpretDEploid.r -vcf data/exampleData/PG0390-C.eg.vcf.gz \
    -plaf data/exampleData/labStrains.eg.PLAF.txt \
    -dEprefix PG0390-CPanel \
    -o PG0390-CPanel -ring

PG0390fwdBwdRing

Example 3. Deconvolution followed by IBD painting

In addition to lab-mixed samples, here we show an example of dEploid deconvoluting the field sample PD0577-C.

PD0577inbreeding