
Diagnostic Plots

M. Brown edited this page Apr 24, 2020 · 20 revisions

Diagnostic Plots and Tables:

Several diagnostic plots and tables are created for the Bayesian network model to help assess convergence and to provide credibility intervals. These plots are not generated by default; to produce them, set the R option diagnostic = TRUE.


Density And Trace Plots

Density and trace plots help determine whether the Markov chains are stable and how symmetric the posterior distribution is. Trace plots should appear flat, which indicates that the simulated values come from a stable Markov chain. Because the simulation is stochastic, the traces of continuous variables should look like flat 'fuzzy caterpillars'. Density plots show the posterior distribution of the simulated values and should appear relatively symmetrical around the mean.


The following figures are trace and density plots for discrete and continuous posterior distributions, produced from the example data provided with infercnv. The left figure (epsilon) shows sampling of the predicted states for the first two cells in an identified CNV region, drawn from a categorical distribution with a Dirichlet prior. The right figure (theta) shows sampling of the predicted state probabilities for the first two predicted CNVs, drawn from a Dirichlet distribution.


Autocorrelation Plots

Autocorrelation plots are a way to evaluate the randomness of the simulated values by comparing the values at each iteration with those from previous iterations (lags). The plots show correlation coefficients, which range from -1 to 1. Values near 1 or -1 indicate strong correlation between iterations, while values near 0 indicate no correlation.
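As a rough illustration of the quantity these plots display, the following Python sketch computes the sample autocorrelation of a chain at a given lag. It is not infercnv code: the function name `autocorrelation` and the two simulated chains are hypothetical, chosen only to contrast independent draws with a strongly correlated sequence.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of `chain` at a non-negative lag."""
    n = len(chain)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(chain)")
    mean = sum(chain) / n
    # Lag-0 autocovariance (biased estimator, divides by n).
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    # Autocovariance between values `lag` iterations apart.
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var


random.seed(0)

# Hypothetical well-mixed chain: independent draws, so lags >= 1 sit near 0.
independent = [random.gauss(0, 1) for _ in range(5000)]

# Hypothetical poorly mixed chain: AR(1) with coefficient 0.9, so each value
# depends strongly on the previous one.
sticky = [0.0]
for _ in range(4999):
    sticky.append(0.9 * sticky[-1] + random.gauss(0, 1))

for lag in (0, 1, 5, 20):
    print(lag,
          round(autocorrelation(independent, lag), 3),
          round(autocorrelation(sticky, lag), 3))
```

At lag 0 the coefficient is always 1. For the independent chain the coefficients at higher lags stay close to 0, which is the pattern a good autocorrelation plot shows; for the correlated chain they decay only slowly.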

The following figures are autocorrelation plots for the first two identified CNVs. They are examples of good autocorrelation plots, in which the autocorrelation converges to 0.

Gelman Plots

Gelman plots are another tool for assessing convergence. They show the Gelman-Rubin statistic as the sampling iterations progress along the six chains. The Gelman-Rubin statistic compares the variance between chains with the variance within chains, so a value close to 1 means that the chains are essentially indistinguishable from one another. A plot is created for each of the six chains, which correspond to the CNV state probabilities (theta). If the chains are converging, the black line should settle onto the horizontal line at 1.
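To make the between-chain versus within-chain comparison concrete, here is a minimal Python sketch of the basic Gelman-Rubin statistic (the potential scale reduction factor). It is an illustration under stated assumptions, not the routine infercnv uses: the function name `gelman_rubin` and the simulated chains are hypothetical, and dedicated MCMC packages may apply additional corrections, so their values can differ slightly.

```python
import random


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m
    # B: between-chain variance, scaled by the chain length n.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # W: average of the within-chain sample variances.
    w = sum(sum((x - cm) ** 2 for x in c) / (n - 1)
            for c, cm in zip(chains, chain_means)) / m
    if w == 0:
        raise ValueError("chains are constant; the statistic is undefined")
    # Pooled variance estimate mixing within- and between-chain variation.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


random.seed(1)

# Six hypothetical chains sampling the same distribution: value close to 1.
mixed = [[random.gauss(0, 1) for _ in range(2000)] for _ in range(6)]

# Six hypothetical chains stuck around different means: value well above 1.
unmixed = [[random.gauss(k, 1) for _ in range(2000)] for k in range(6)]

print("mixed:  ", round(gelman_rubin(mixed), 3))
print("unmixed:", round(gelman_rubin(unmixed), 3))
```

When the chains agree, the between-chain term contributes almost nothing and the ratio stays near 1. When the chains disagree, the pooled variance exceeds the within-chain variance and the statistic rises above 1.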

The following figures are Gelman plots for the first two identified CNVs. These plots are examples of what good Gelman plots look like with convergence on 1.

Summary Tables

Summary tables are created for each CNV region. These summary tables contain the Mean, Standard Deviation, 95% CI, Median, and Geweke Diagnostic (with Fraction In 1st Window = 0.1 and 2nd Window = 0.5) for the posterior distribution of each of the 6 states (thetas). The Geweke convergence diagnostic is a Z-score that compares the mean of the first window of the chain with the mean of the second window. If the chain has converged, the Z-score approximately follows a Normal(0,1) distribution, so absolute values of 1.96 or greater are considered significant and suggest that the chain has not converged. The 95% credibility interval is bounded by the 2.5% and 97.5% quantiles, and the 50% quantile is the median.
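The two less familiar columns of the table can be sketched in Python as follows. This is a simplified illustration, not the code that produces the infercnv tables: the names `geweke_z` and `credible_interval` are hypothetical, and the Z-score here uses plain sample variances, whereas Geweke's diagnostic estimates the standard errors from the spectral density at frequency zero to account for autocorrelation. The simplified version is therefore only reasonable for weakly autocorrelated draws.

```python
import random
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke Z-score: first `first` fraction vs last `last` fraction."""
    n = len(chain)
    a = chain[: int(first * n)]        # early window (default: first 10%)
    b = chain[n - int(last * n):]      # late window (default: last 50%)
    if len(a) < 2 or len(b) < 2:
        raise ValueError("chain is too short for the requested windows")
    # Assumption: naive standard error from sample variances, which ignores
    # autocorrelation within each window.
    se = (statistics.variance(a) / len(a)
          + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def credible_interval(chain):
    """Return the (2.5%, 50%, 97.5%) quantiles of the sampled values."""
    # n=40 yields 39 cut points in 2.5% steps: the first is the 2.5% quantile
    # and the last is the 97.5% quantile.
    cuts = statistics.quantiles(chain, n=40, method="inclusive")
    return cuts[0], statistics.median(chain), cuts[-1]


random.seed(2)

# Hypothetical stationary chain: the two windows share the same mean.
stationary = [random.gauss(0.5, 0.1) for _ in range(4000)]

# Hypothetical drifting chain: the mean keeps rising, so it has not converged.
drifting = [t / 1000 + random.gauss(0, 0.1) for t in range(4000)]

print("stationary Z:", round(geweke_z(stationary), 2))
print("drifting Z:  ", round(geweke_z(drifting), 2))
print("95% interval and median:", credible_interval(stationary))
```

For the drifting chain the early and late windows have clearly different means, so the Z-score falls far outside the range of -1.96 to 1.96, which is the signal the table uses to flag a lack of convergence.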

Summary table for the first predicted CNV. Mean, Standard Deviation, 95% CI, Median, Geweke Diagnostic with Fraction In 1st Window = 0.1 and 2nd Window = 0.5
