R/pearson.R

### Pearson-type goodness-of fit test (Aguirre-Hernandez and Farewell, 2002; Titman and Sharples, 2007)


#' Pearson-type goodness-of-fit test
#' 
#' Pearson-type goodness-of-fit test for multi-state models fitted to
#' panel-observed data.
#' 
#' This method (Aguirre-Hernandez and Farewell, 2002) is intended for data
#' which represent observations of the process at arbitrary times ("snapshots",
#' or "panel-observed" data). For data which represent the exact transition
#' times of the process, \code{\link{prevalence.msm}} can be used to assess
#' fit, though without a formal test.
#' 
#' When times of death are known exactly, states are misclassified, or an
#' individual's final observation is a censored state, the modification by
#' Titman and Sharples (2008) is used. The only form of censoring supported is
#' a state at the end of an individual's series which represents an unknown
#' transient state (i.e. the individual is only known to be alive at this
#' time). Other types of censoring are omitted from the data before performing
#' the test.
#' 
#' See the references for further details of the methods.  The method used for
#' censored states is a modification of the method in the appendix to Titman
#' and Sharples (2008), described at
#' \url{https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/robustcensoring.pdf}
#' (Titman, 2007).
#' 
#' Groupings of the time since initiation, the time interval and the impact of
#' covariates are based on equally-spaced quantiles.  The number of groups
#' should be chosen that there are not many cells with small expected numbers
#' of transitions, since the deviance statistic will be unstable for sparse
#' contingency tables.  Ideally, the expected numbers of transitions in each
#' cell of the table should be no less than about 5.  Conversely, the power of
#' the test is reduced if there are too few groups. Therefore, some sensitivity
#' analysis of the test results to the grouping is advisable.
#' 
#' Saved model objects fitted with previous versions of R (versions less than
#' 1.2) will need to be refitted under the current R for use with
#' \code{pearson.msm}.
#' 
#' @param x A fitted multi-state model, as returned by \code{\link{msm}}.
#' @param transitions This should be an integer vector indicating which
#' interval transitions should be grouped together in the contingency table.
#' Its length should be the number of allowed interval transitions, excluding
#' transitions from absorbing states to absorbing states.
#' 
#' The allowed interval transitions are the set of pairs of states \eqn{(a,b)}
#' for which it is possible to observe \eqn{a} at one time and \eqn{b} at any
#' later time.  For example, in a "well-disease-death" model with allowed
#' \emph{instantaneous} 1-2, 2-3 transitions, there are 5 allowed
#' \emph{interval} transitions. In numerical order, these are 1-1, 1-2, 1-3,
#' 2-2 and 2-3, excluding absorbing-absorbing transitions.
#' 
#' Then, to group transitions 1-1,1-2 together, and transitions 2-2,2-3
#' together, specify
#' 
#' \code{transitions = c(1,1,2,3,3)}.
#' 
#' Only transitions from the same state may be grouped.  By default, each
#' interval transition forms a separate group.
#' @param timegroups Number of groups based on quantiles of the time since the
#' start of the process.
#' @param intervalgroups Number of groups based on quantiles of the time
#' interval between observations, within time groups
#' @param covgroups Number of groups based on quantiles of \eqn{\sum_r
#' q_{irr}}{sum_r q_{irr}}, where \eqn{q_{irr}} are the diagonal entries of the
#' transition intensity matrix for the \emph{i}th transition.  These are a
#' function of the covariate effects and the covariate values at the \emph{i}th
#' transition: \eqn{q_{irr}} is minus the sum of the off-diagonal entries
#' \eqn{q_{rs}^{(0)} exp (\beta_{rs}^T z_i)} on the \emph{r}th row.
#' 
#' Thus \code{covgroups} summarises the impact of covariates at each
#' observation, by calculating the overall rate of progression through states
#' at that observation.
#' 
#' For time-inhomogeneous models specified using the \code{pci} argument to
#' \code{\link{msm}}, if the only covariate is the time period,
#' \code{covgroups} is set to 1, since \code{timegroups} ensures that
#' transitions are grouped by time.
#' @param groups A vector of arbitrary groups in which to categorise each
#' transition. This can be an integer vector or a factor.  This can be used to
#' diagnose specific areas of poor fit.  For example, the contingency table
#' might be grouped by arbitrary combinations of covariates to detect types of
#' individual for whom the model fits poorly.
#' 
#' The length of \code{groups} should be \code{x$data$n}, the number of
#' observations used in the model fit, which is the number of observations in
#' the original dataset with any missing values excluded.  The value of
#' \code{groups} at observation \eqn{i} is used to categorise the transition
#' which \emph{ends} at observation i. Values of \code{groups} at the first
#' observation for each subject are ignored.
#' @param boot Estimate an "exact" p-value using a parametric bootstrap.
#' 
#' All objects used in the original call to \code{\link{msm}} which produced
#' \code{x}, such as the \code{qmatrix}, should be in the working environment,
#' or else an \dQuote{object not found} error will be given.  This enables the
#' original model to be refitted to the replicate datasets.
#' 
#' Note that \code{groups} cannot be used with bootstrapping, as the simulated
#' observations will not be in the same categories as the original
#' observations.
#' @param B Number of bootstrap replicates.
#' @param next.obstime This is a vector of length \code{x$data$n} (the number
#' of observations used in the model fit) giving the time to the next
#' \emph{scheduled} observation following each time point.  This is only used
#' when times to death are known exactly.
#' 
#' For individuals who died (entered an absorbing state) before the next
#' scheduled observation, and the time of death is known exactly,
#' \code{next.obstime} would be \emph{greater} than the observed death time.
#' 
#' If the individual did not die, and a scheduled observation did follow that
#' time point, \code{next.obstime} should just be the same as the time to that
#' observation.
#' 
#' \code{next.obstime} is used to determine a grouping of the time interval
#' between observations, which should be based on scheduled observations. If
#' exact times to death were used in the grouping, then shorter intervals would
#' contain excess deaths, and the goodness-of-fit statistic would be biased.
#' 
#' If \code{next.obstime} is unknown, it is multiply-imputed using a
#' product-limit estimate based on the intervals to observations other than
#' deaths. The resulting tables of transitions are averaged over these
#' imputations.  This may be slow.
#' @param N Number of imputations for the estimation of the distribution of the
#' next scheduled observation time, when there are exact death times.
#' @param indep.cens If \code{TRUE}, then times to censoring are included in
#' the estimation of the distribution to the next scheduled observation time.
#' If \code{FALSE}, times to censoring are assumed to be systematically
#' different from other observation times.
#' @param maxtimes A vector of length \code{x$data$n}, or a common scalar,
#' giving an upper bound for the next scheduled observation time.  Used in the
#' multiple imputation when times to death are known exactly.  If a value
#' greater than \code{maxtimes} is simulated, then the next scheduled
#' observation is taken as censored.  This should be supplied, if known.  If
#' not supplied, this is taken to be the maximum interval occurring in the
#' data, plus one time unit.  For observations which are not exact death times,
#' this should be the time since the previous observation.
#' @param pval Calculate a p-value using the improved approximation of Titman
#' (2009).  This is optional since it is not needed during bootstrapping, and
#' it is computationally non-trivial.  Only available currently for non-hidden
#' Markov models for panel data without exact death times.  Also not available
#' for models with censoring, including time-homogeneous models fitted with the
#' \code{pci} option to \code{\link{msm}}.
#' @return A list whose first two elements are contingency tables of observed
#' transitions \eqn{O} and expected transitions \eqn{E}, respectively, for each
#' combination of groups.  The third element is a table of the deviances
#' \eqn{(O - E)^2 / E} multiplied by the sign of \eqn{O - E}.  If the expected
#' number of transitions is zero then the deviance is zero.  Entries in the
#' third matrix will be bigger in magnitude for groups for which the model fits
#' poorly.  \cr
#' 
#' \item{list("\"test\"")}{the fourth element of the list, is a data frame with
#' one row containing the Pearson-type goodness-of-fit test statistic
#' \code{stat}.  The test statistic is the sum of the deviances.  For
#' panel-observed data without exact death times, misclassification or censored
#' observations, \code{p} is the p-value for the test statistic calculated
#' using the improved approximation of Titman (2009).
#' 
#' For these models, for comparison with older versions of the package,
#' \code{test} also presents \code{p.lower} and \code{p.upper}, which are
#' theoretical lower and upper limits for the p-value of the test statistic,
#' based on \ifelse{latex}{\eqn{\chi^2}}{chi-squared} distributions with
#' \code{df.lower} and \code{df.upper} degrees of freedom, respectively.
#' \code{df.upper} is the number of independent cells in the contingency table,
#' and \code{df.lower} is \code{df.upper} minus the number of estimated
#' parameters in the model.}
#' 
#' \item{list("\"intervalq\"")}{(not printed by default) contains the
#' definition of the grouping of the intervals between observations.  These
#' groups are defined by quantiles within the groups corresponding to the time
#' since the start of the process.}
#' 
#' \item{list("\"sim\"")}{If there are exact death times, this contains
#' simulations of the contingency tables and test statistics for each
#' imputation of the next scheduled sampling time.  These are averaged over to
#' produce the presented tables and test statistic. This element is not printed
#' by default.
#' 
#' With exact death times, the null variance of the test statistic (formed by
#' taking mean of simulated test statistics) is less than twice the mean
#' (Titman, 2008), and the null distribution is not
#' \ifelse{latex}{\eqn{\chi^2}}{chi-squared}.  In this case, \code{p.upper} is an
#' upper limit for the true asymptotic p-value, but \code{p.lower} is not a
#' lower limit, and is not presented.}
#' 
#' \item{list("\"boot\"")}{If the bootstrap has been used, the element will
#' contain the bootstrap replicates of the test statistics (not printed by
#' default).}
#' 
#' \item{list("\"lambda\"")}{If the Titman (2009) p-value has been calculated,
#' this contains the weights defining the null distribution of the test
#' statistic as a weighted sum of \ifelse{latex}{\eqn{\chi^2_1}}{chi-squared(1)}
#' random variables (not printed by default).}
#' @author Andrew Titman \email{a.titman@@lancaster.ac.uk}, Chris Jackson
#' \email{chris.jackson@@mrc-bsu.cam.ac.uk}
#' @seealso \code{\link{msm}}, \code{\link{prevalence.msm}},
#' \code{\link{scoreresid.msm}},
#' @references Aguirre-Hernandez, R. and Farewell, V. (2002) A Pearson-type
#' goodness-of-fit test for stationary and time-continuous Markov regression
#' models. \emph{Statistics in Medicine} 21:1899-1911.
#' 
#' Titman, A. and Sharples, L. (2008) A general goodness-of-fit test for Markov
#' and hidden Markov models. \emph{Statistics in Medicine} 27(12):2177-2195
#' 
#' Titman, A. (2009) Computation of the asymptotic null distribution of
#' goodness-of-fit tests for multi-state models. \emph{Lifetime Data Analysis}
#' 15(4):519-533.
#' 
#' Titman, A. (2008) Model diagnostics in multi-state models of biological
#' systems. PhD thesis, University of Cambridge.
#' @keywords models
#' @examples
#' 
#' psor.q <- rbind(c(0,0.1,0,0),c(0,0,0.1,0),c(0,0,0,0.1),c(0,0,0,0))
#' psor.msm <- msm(state ~ months, subject=ptnum, data=psor,
#'                 qmatrix = psor.q, covariates = ~ollwsdrt+hieffusn,
#'                 constraint = list(hieffusn=c(1,1,1),ollwsdrt=c(1,1,2)))
#' pearson.msm(psor.msm, timegroups=2, intervalgroups=2, covgroups=2)
#' # More 1-2, 1-3 and 1-4 observations than expected in shorter time
#' # intervals - the model fits poorly.
#' # A random effects model might accommodate such fast progressors.
#' 
#' @export pearson.msm
pearson.msm <- function(x, transitions=NULL, timegroups=3, intervalgroups=3, covgroups=3, groups=NULL,
                        boot=FALSE, B=500,
                        next.obstime=NULL, # user-supplied next observation times, if known.
                        N=100,
                        indep.cens=TRUE, # use censoring times when calculating the empirical distribution of sampling times
                        maxtimes=NULL,# upper limit for imputed next observation times
                        pval=TRUE # calculate p-values for test. false if calling from bootstrap
                        ){

    dat <- x$data$mf
    ## Error handling
    if (!inherits(x, "msm")) stop("expected \"x\" t to be a msm model")
    if (x$hmodel$hidden && !x$emodel$misc) stop("only HMMs handled are misclassification models specified using \"ematrix\"")
    if (any(dat$"(obstype)"==2)) stop("exact transition times are not supported, only panel-observed data")
    if (!is.null(transitions) && !is.numeric(transitions)) stop("expected \"transitions\" to be numeric")
    if (!is.numeric(timegroups) || length(timegroups) != 1) stop ("expected \"timegroups\" to be a single number")
    if (!is.numeric(intervalgroups) || length(intervalgroups) != 1) stop ("expected \"intervalgroups\" to be a single number")
    if (!is.numeric(covgroups) || length(covgroups) != 1) stop ("expected \"covgroups\" to be a single number")
    if (!is.numeric(B) || length(B) != 1) stop ("expected \"B\" to be a single number")
    if (!is.numeric(N) || length(N) != 1) stop ("expected \"N\" to be a single number")

    ## Use only one covariate group for pci models with no other covariates
    if (!is.null(x$pci) && length(grep("timeperiod\\[([0-9]+|Inf),([0-9]+|Inf)\\)", x$qcmodel$covlabels)) == x$qcmodel$ncovs)
        covgroups <- 1

    ## Label various constants
    nst <- x$qmodel$nstates
    exact.death <- any(dat$"(obstype)" == 3)
    dstates <- if (exact.death) absorbing.msm(x) else NULL
    ndstates <- if (exact.death) transient.msm(x) else 1:nst
    nndstates <- length(ndstates)

    ## Do minimum needed to port this to msm version 1.4
    od <- dat[,c("(subject)","(time)","(state)","(obstype)")]
    names(od) <- c("subject","time","state","obstype")
    od$cov <- dat[,attr(dat,"covnames.q"),drop=FALSE]
    if (x$emodel$misc) od$misccov <- dat[,attr(dat,"covnames.e")]
    ncovs <- x$qcmodel$ncovs
    od$state <- factor(od$state, levels=sort(unique(od$state)))
    n <- nrow(od)

    ## Method is restricted to "terminal" censoring (means not dead, occurs at end)
    ## Drop censored states not at the end of individual series
    lastobs <- c(od$subject[1:(n-1)] != od$subject[2:n], TRUE)
    cens.notend <- (od$state %in% x$cmodel$censor) &  (!lastobs)
    if (any(cens.notend)) {
        od <- od[!cens.notend,]
        warning("Omitting censored states not at the end of individual series")
    }
    ## Drop individuals with only one observation
    od <- od[!od$subject %in% unique(od$subject)[table(od$subject)==1],]

    ## Drop observations of censoring types other than "not dead"
    if (length(x$cmodel$censor) >= 2) {
        ind <- NULL
        for (i in 1:length(x$cmodel$censor)) {
            if (identical(x$cmodel$states[x$cmodel$index[i] : (x$cmodel$index[i+1]-1)], as.numeric(transient.msm(x))))
            {ind <- i; break}
        }
        if (is.null(ind)){
            warning("Omitting all censored states")
            cens.drop <- od$state %in% x$cmodel$censor
        }
        else {
            cens.drop <- od$state %in% x$cmodel$censor[-ind]
            warning("Omitting censored states of types other than ",x$cmodel$censor[ind])
        }
        od <- od[!cens.drop,]
    }
    n <- nrow(od)
    nstcens <- length(unique(od$state)) # no. unique states in data, should be same as number of markov states plus number of censor types (max 1)
    cens <- nstcens > nst # is there any remaining censoring
    od$prevstate <- factor(c(NA,od$state[1:(n-1)]),  levels=1:nndstates)
    od$prevtime <- c(NA, od$time[1:(n-1)])
    od$ind <- 1:n # index into original data (useful if any rows of od are dropped)
    od$firstobs <- rep(tapply(1:n,od$subject,min)[as.character(unique(od$subject))],
                    table(od$subject)[as.character(unique(od$subject))])
    od$obsno <- 1:n - od$firstobs + 1
    od$subj.num <- match(od$subject, unique(od$subject))
    if (!is.null(next.obstime) && (!is.numeric(next.obstime) || length(next.obstime) != n)) stop (paste("expected \"next.obstime\" to be a numeric vector length", n))
    od$timeinterval <- if (is.null(next.obstime)) (od$obsno>1)*(od$time - od$time[c(1,1:(n-1))]) else next.obstime
    if (!is.null(maxtimes) && (!is.numeric(maxtimes) || !(length(maxtimes) %in% c(1,n)))) stop (paste("expected \"maxtimes\" to be a numeric vector length 1 or", n))
    if (!is.null(groups) && (!(length(groups) == n))) stop (paste("expected \"groups\" to be a vector length", n))
    od$maxtimes <- if (is.null(maxtimes))
        ifelse(od$state %in% dstates, max(od$timeinterval) + 1, od$timeinterval) ## Set max possible obs time for deaths to be max observed plus arbitrary one unit
    else rep(maxtimes, length=n) # if supplied as a scalar, use it for all obs
    od$usergroup <- factor(if (!is.null(groups)) groups else rep(1, n)) ## User-supplied groups

    ## Check and label transition groupings. Store data in a list
    ## Includes transitions to terminal censoring as separate states
    trans <- list()
    trans$allowed <- intervaltrans.msm(x, exclude.absabs=TRUE, censor=cens)
    trans$labsall <- paste(trans$allowed[,1], trans$allowed[,2], sep="-")
    trans$allowed[trans$allowed[,2] > nst, 2] <- nst+1 # these are to be used as indices, so label censoring as nst+1 not e.g. 99
    trans$na <- nrow(trans$allowed)
    trans$use <- if (is.null(transitions)) 1:trans$na else transitions  # why was this a matrix before?
    if (length(trans$use) != trans$na)
        stop("Supplied ", length(trans$use), " transition indices, expected ", trans$na)
    else if (!x$emodel$misc &&
             ! all(tapply(trans$allowed[,1], trans$use, function(u)length(unique(u))) == 1) )
        stop("Only transitions from the same origin can be grouped")
    ## convert ordinal transition indices to informative labels
    trans$labsagg <- tapply(trans$labsall, trans$use, function(u)paste(u,collapse=","))
    trans$from <- trans$allowed[,1][!duplicated(trans$use)] # from-state corresp to each element of trans$labsagg
    trans$to <- trans$allowed[,2][!duplicated(trans$use)] # to-state corresp to each element of trans$labsagg
    trans$ngroups <- length(unique(trans$use))

    ## Determine unique Q matrices determined by covariate combinations
    if (ncovs>0) {
        uniq <- unique(od$cov)
        nouniq <- dim(uniq)[1]
        pastedu <- do.call("paste",uniq)
        pastedc <- do.call("paste",od$cov)
        qmatindex <- match(pastedc,pastedu)
        qmat <- array(0,dim=c(nst,nst,nouniq))
        for (i in 1:nouniq)
            qmat[,,i] <- qmatrix.msm(x, covariates=as.list(uniq[i,,drop=FALSE]), ci="none")
    }else{
        qmatindex <- rep(1,n)
        nouniq <- 1
        qmat <- array(qmatrix.msm(x,ci="none"),dim=c(nst,nst,1))
    }
    qmatmaster <- qmat
    qmat <- qmat[,,qmatindex]
    od$rates <- apply(qmat,3,function(u) sum(diag(u)))

    ## Now work with a dataset with one row per transition
    md <- od[od$obsno>1,]
    qmat <- qmat[,,od$obsno>1]
    qmatindex <- qmatindex[od$obsno>1]
    ntrans <- nrow(md)
    nfromstates <- length(unique(md$prevstate))

    ## Groups based on time since initiation (not observation number)
    timegroups.use <- min(length(unique(md$time)), timegroups)
    md$timegroup <- qcut(md$time, timegroups.use)

    ## Group time differences by quantiles within time since initiation
    intervalq <- tapply(md$timeinterval[md$state %in% ndstates],
                        md$timegroup[md$state %in% ndstates],
                        ## Categorise quantiles based only on full observations, not deaths or censoring.
                        function(x) quantile(x,probs=seq(0,1,1/intervalgroups)))
    md$intervalgroup <- rep(1, ntrans)
    for (i in levels(md$timegroup))
        md$intervalgroup[md$timegroup==i] <- unclass(qcut(md$timeinterval[md$timegroup==i], qu = intervalq[[i]]))
    md$intervalgroup <- factor(md$intervalgroup)

    ## Groups based on covariates
    covgroups.use <- min(length(unique(md$rates)), covgroups)
    md$covgroup <- if (ncovs > 0) qcut(md$rates, covgroups.use) else factor(rep(1,ntrans))
    groupdims <- c(length(levels(md$timegroup)),length(levels(md$intervalgroup)),length(levels(md$covgroup)),length(levels(md$usergroup)))
    groupdimnames <- list(levels(md$timegroup),levels(md$intervalgroup),levels(md$covgroup),levels(md$usergroup))

    ## Determine empirical distribution of time interval lengths.
    ## Then impute next scheduled observation for transitions which end in exact death times
    md$obtype <- rep(0, ntrans)
    if (exact.death) md$obtype[md$state %in% dstates] <- 1
    md$obtype[md$state %in% x$cmodel$censor] <- 2
    md$cens <- factor(as.numeric(md$obtype==2), levels=c(0,1))
    ndeath  <-  sum(md$obtype==1)

    if (exact.death && is.null(next.obstime)) {
        cat("Imputing sampling times after deaths...\n")
        incl <- if (indep.cens) (0:2) else (0:1)
        empiricaldist <- empiricaldists(md$timeinterval[md$obtype %in% incl], md$state[md$obtype %in% incl],
                                        as.numeric(md$timegroup[md$obtype %in% incl]), timegroups, ndstates)
        imputation <- array(0,c(ndeath,N,4))
        dimnames(imputation) <- list(NULL, NULL, c("times","cens","intervalgroup","timeqmatindex"))
        deathindex <- which(md$obtype==1)
        for (i in 1:ndeath) {
            mintime <- md$timeinterval[deathindex[i]]
            centime <- md$maxtimes[deathindex[i]] # - md$time[deathindex[i]] + mintime
            tg <- md$timegroup[deathindex[i]]
            ## returns list of times, whether would have ended in censoring, and time category.
            st <- sampletimes(mintime, centime, empiricaldist[,tg,empiricaldist["time",tg,]>0,drop=FALSE], N, tg, intervalq)
            for (j in c("times","cens","intervalgroup"))
                imputation[i,,j] <- st[,j]
        }
    }
    else imputation <- deathindex <- NULL
    ndeathindex <- setdiff(1:ntrans, deathindex)

    ## Transition probability matrices are indexed by unique combinations of time intervals and Q matrices.
    timeint <- c(md$timeinterval[md$obtype != 1],c(imputation[,,"times"])) # time intervals, excluding deaths, concatenated with imputations of next interval after death
    qmatint <- c(qmatindex[md$obtype != 1],rep(qmatindex[deathindex],N)) # index into unique Q matrices
    timeqmat <- paste(timeint,qmatint,sep="-")
    timeqmata <- data.frame(timeint,qmatint)[!duplicated(timeqmat),]
    pastedu <- unique(timeqmat)
    timeqmatindex <- match(timeqmat,pastedu)
    ## Work out the transition probability matrix for each unique time interval and Q matrix
    npmats<-length(pastedu)
    pmi <- array(0,dim=c(nst,nst,npmats))
    for (i in unique(timeqmata[,2]))
        pmi[,,timeqmata[,2]==i] <- MatrixExp(qmatmaster[,,i], timeqmata[timeqmata[,2]==i,1], method="pade")
    md$timeqmatindex<-rep(0,ntrans)
    md$timeqmatindex[md$obtype != 1] <- timeqmatindex[1:(ntrans-ndeath)]
    if (exact.death && is.null(next.obstime))
        imputation[,,"timeqmatindex"] <- timeqmatindex[(ntrans-ndeath+1):length(timeqmatindex)]

    ### Calculate transition probabilities for non-death intervals
    prob <- array(0,dim=c(nst,ntrans)) ## array of obs state probs, conditional on previous obs state
    if (x$emodel$misc) {
        misccov <-
            if (x$ecmodel$ncovs > 0) od$misccov[od$obsno>1,,drop=FALSE]
            else NULL
        p.true <- array(dim=c(nst, ntrans)) # prob of each true state conditional on complete history including current obs
        initp <- x$hmodel$initpmat
        if (x$ecmodel$ncovs > 0) {
            uniqmisc <- unique(misccov)
            ematindex <- match(do.call("paste",misccov), do.call("paste", uniqmisc))
            emat <- array(0,dim=c(nst,nst,nrow(uniqmisc)))
            for (i in 1:nrow(uniqmisc))
                emat[,,i] <- ematrix.msm(x, covariates=as.list(uniqmisc[i,]),ci="none")
        }
        else emat <- ematrix.msm(x, ci="none")
        for (i in 1:ntrans) {
            ematrix <- if (x$ecmodel$ncovs>0) emat[,,ematindex[i]] else emat
	    if (md$state[i] %in% ndstates) {
              T <- pmi[,,md$timeqmatindex[i]] * matrix(ematrix[,md$state[i]], nrow=nst, ncol=nst, byrow=TRUE)
              p.true[,i] <- if (md$obsno[i] == 2) t(initp[md$subj.num[i],]) %*% T else t(p.true[,i-1]) %*% T
              p.true[,i] <- p.true[,i] / sum(p.true[,i])  # prob of each true state
            }
	    if (!(md$state[i] %in% dstates)) {
                prob[,i] <-
                    if (md$obsno[i] == 2) initp[md$subj.num[i],] %*% pmi[,,md$timeqmatindex[i]] %*% ematrix
                    else p.true[,i-1] %*% pmi[,,md$timeqmatindex[i]] %*% ematrix
            }
        }
    }
    else
        prob[cbind(rep(1:nst,ntrans-ndeath),rep((1:ntrans)[ndeathindex],each=nst))] <-
            pmi[cbind(rep(md$prevstate[(1:ntrans)[ndeathindex]],each=nst),
                      rep(1:nst,ntrans - ndeath),rep(md$timeqmatindex[(1:ntrans)[ndeathindex]],each=nst))]

    if (exact.death && is.null(next.obstime)) {
        stat.sim <- rep(0,N)
        obs.rep <- exp.rep <- dev.rep <- array(0, dim=c(groupdims, trans$ngroups, N))
        dimnames(obs.rep) <- dimnames(exp.rep) <- dimnames(dev.rep) <- c(groupdimnames, list(trans$labsagg), list(1:N))
        cat("Calculating replicates of test statistics for imputations...\n")
        for (i in 1:N) {
            ## Calculate transition probabilities for death intervals
            if (x$emodel$misc) {
                for (j in 1:ndeath) {
                    k <- deathindex[j]
                    ematrix <- if (x$ecmodel$ncovs>0) emat[,,ematindex[k]] else emat
                    prob[,k] <-
                        if (md$obsno[k] == 2) initp[md$subj.num[k],] %*% pmi[,,imputation[j,i,"timeqmatindex"]] %*% ematrix
                        else p.true[,k-1] %*% pmi[,,imputation[j,i,"timeqmatindex"]] %*% ematrix
                }
            }
            else
                prob[cbind(rep(1:nst,ndeath),rep(deathindex,each=nst))] <-
                    pmi[cbind(rep(md$prevstate[deathindex],each=nst),rep(1:nst,ndeath),rep(imputation[,i,"timeqmatindex"],each=nst))]

            md$intervalgroup[deathindex] <- factor(imputation[,i,"intervalgroup"], labels=levels(md$intervalgroup[deathindex])[sort(unique(imputation[,i,"intervalgroup"]))])

            md$cens[deathindex] <- as.numeric(imputation[,i,"cens"]) # factor levels 0/1

            ## Observed transition table, including to censoring indicators
            obs.rep.i <- table(md$state, md$prevstate, md$timegroup, md$intervalgroup, md$covgroup, md$usergroup)

            ## Expected transition table
            exp.cens <- array(0, dim = c(nst, nndstates, groupdims, 2))
            for (j in 1:nst)
                exp.cens[j,,,,,,] <- tapply(prob[j,], list(md$prevstate,md$timegroup,md$intervalgroup,md$covgroup,md$usergroup,md$cens), sum)
            exp.cens <- replace(exp.cens,is.na(exp.cens),0)
            exp.unadj <- array(0, dim=c(nstcens, nndstates, groupdims))
            exp.unadj[1:nst,,,,,] <- exp.cens[,,,,,,1]
            for (j in dstates)
                exp.unadj[j,,,,,] <- exp.cens[j,,,,,,1] + exp.cens[j,,,,,,2]

            if (cens)
                exp.unadj[nst+1,,,,,] <- apply(exp.cens[1:nndstates,,,,,,2,drop=FALSE], 2:6, sum) # total censored from any non-death state
            ## Remove very small values, caused by inaccuracy in numerical matrix exponential.
            exp.unadj <- replace(exp.unadj,(exp.unadj<sqrt(.Machine$double.eps)),0)
            ## Adjust it for censoring
            exp.rep.i <- adjust.expected.cens(exp.unadj, obs.rep.i, nst, ndstates, dstates, groupdims, N, cens, md, nstcens)

            agg <- agg.tables(obs.rep.i, exp.rep.i, groupdims, groupdimnames, trans)
            obs.rep.i <- agg$obs; exp.rep.i <- agg$exp
            dev.rep.i <- (obs.rep.i - exp.rep.i)^2/(exp.rep.i + (exp.rep.i==0))
            obs.rep[,,,,,i] <- obs.rep.i
            exp.rep[,,,,,i] <- exp.rep.i
            dev.rep[,,,,,i] <- dev.rep.i
            stat.sim[i] <- sum(dev.rep.i)
        }
        obstable <- apply(obs.rep, 1:5, mean)
        exptable <- apply(exp.rep, 1:5, mean)
        exptable <- replace(exptable,(exptable < sqrt(.Machine$double.eps)), 0)
        devtable <- apply(dev.rep, 1:5, mean)
        stat <- mean(stat.sim)
    }
    else {
        obstable <- exptable <- array(0,dim=c(nstcens, nndstates, groupdims))
        dimnames(obstable) <- dimnames(exptable) <- c(list(1:nstcens, 1:nndstates), groupdimnames)
        ## ?? why is obstable redefined?
        obstable <- table(md$state, md$prevstate, md$timegroup, md$intervalgroup, md$covgroup, md$usergroup)
        exp.cens <- array(0, dim = c(nst,nndstates,groupdims,2))
        for (j in 1:nst)
            exp.cens[j,,,,,,] <- tapply(prob[j,], list(md$prevstate,md$timegroup,md$intervalgroup,md$covgroup,md$usergroup,md$cens), sum)
        exp.cens <- replace(exp.cens,is.na(exp.cens),0)
        exptable <- array(0, dim=c(nstcens, nndstates, groupdims))
        exptable[1:nst,,,,,] <- exp.cens[,,,,,,1]
        exptable[nst,,,,,] <- exp.cens[nst,,,,,,1] + exp.cens[nst,,,,,,2]
        if (cens)
            exptable[nst+1,,,,,] <- apply(exp.cens[1:nndstates,,,,,,2,drop=FALSE], 2:6, sum) # total censored from any non-death state
        exptable <- replace(exptable,(exptable < sqrt(.Machine$double.eps)), 0)
        ## adjust expected values for censoring if necessary
        if (cens)
            exptable <- adjust.expected.cens(exptable, obstable, nst, ndstates, dstates, groupdims, N, cens, md, nstcens)
        agg <- agg.tables(obstable, exptable, groupdims, groupdimnames, trans)
        obstable <- agg$obs; exptable <- agg$exp
        devtable <- (obstable-exptable)^2/(exptable+(exptable==0))
        stat <- sum(devtable)
    }

    n.indep.trans <- length(trans$from[trans$to <= nst]) - length(unique(trans$from[trans$to <= nst]))
    ## number of states x categories from which there were zero transitions, excluding states with only one destination (which were already removed in the previous line)

    ## Bug (Gavin Chan) - example where apply() returns object with dims 3,1,1,1, but 5 dims in subset
    ## obstable has dim 3,1,1,1,4 ,  4=nstcens
    ## psor: dim(obstable) = 2 2 2 1 9, apply returns object with dims (3,2,2,2,1).
    ## tapply only has one category in broken version?
    ## trans$from = 1, 1, 1  : only one from category?
    ## shouldn't drop first dimension
    n.from <- apply(obstable, 1:4, function(u)tapply(u, trans$from, sum)) # no trans from each state, by category
    n.from <- array(n.from,  dim=c(length(unique(trans$from)), dim(obstable)[1:4]))
    ndf <- as.numeric(table(trans$from) -  1) # number of independent to-states for each from-state
    n.zero <- sum((n.from==0) * ndf) # zero cells to exclude from the degrees of freedom

    df.upper <- prod(dim(obstable)[1:4])*n.indep.trans - n.zero
    df.lower <- df.upper - length(x$opt$par)

    ## Calculate accurate p-value for panel data Titman (Lifetime Data Analysis, 2011)
    ## notation mostly from that paper
    acc.p <- FALSE
    if ( pval && (x$cmodel$ncens==0) && !exact.death && !x$emodel$misc ) {
        acc.p <- TRUE
### compute P
        md$grouplab <- interaction(md$timegroup,md$intervalgroup,md$covgroup,md$usergroup,factor(md$prevstate))
        md$group <- match(md$grouplab, sort(unique(md$grouplab))) # in order of category, not data
        ## Only include categories observed in data, not all possible cross-classifications ncat*nfrom
        C <- length(unique(md$group))
        R <- nst
        ncat <- prod(groupdims)
        from <- trans$from; to <- trans$to
        nfrom <- length(unique(from))
        Parr <- array(0, dim=c(R, ncat, nfrom))
        for (i in 1:nfrom) {
            fi <- unique(from)[i]
            et <- 1/sqrt(exptable[,,,,from==fi])
            Parr[to[from==fi],,i] <- t(array(et, dim=c(ncat,length(to[from==fi]))))
        }
        Pmat <- array(Parr, dim=c(R, ncat*nfrom))
        Pmat <- Pmat[,is.finite(colSums(Pmat))] # drop empty categories
        if(ncol(Pmat) != C)
            stop("Remove any absorbing-absorbing transitions from data and refit model")
        ## dim RC RC, tostate changes fastest, then category
        P <- diag(as.vector(Pmat))

        ## compute Sigma: block diagonal.
        pr <- matrix(0, nrow=ntrans, ncol=nst)
        for (i in 1:ntrans)
            pr[i,] <- pmatrix.msm(x, t=md$timeinterval[i], t1=md$time[i], covariates=if(ncovs>0)as.list(md$cov[i,])else 0)[md$prevstate[i],]
        Sigma <- matrix(0, nrow=C*nst,ncol=C*nst)
        for (c in 1:C) {
            block <- matrix(0,nrow=nst,ncol=nst)
            prc <- pr[md$group==c,,drop=FALSE]
            for (r in 1:nst)
                block[r,-r] <- - colSums(prc[,r]*prc[,-r,drop=FALSE])
            diag(block) <- colSums(prc * (1 - prc))
            rows <- cols <- (c-1)*nst + 1:nst
            Sigma[rows,cols] <- block
        }
        PSigmaPT <- P %*% Sigma %*% t(P)

        ## compute Psi  = Cov(score(theta), Orc), where Orc is observed counts in pearson table
        ## arranged as npars x RC
        ## bottom left and top right blocks wrong way round in paper
        dp <- Ccall.msm(x$paramdata$opt$par, do.what="dpmat", expand.data(x), x$qmodel, x$qcmodel, x$cmodel, x$hmodel, x$paramdata) # ntrans x R x npars: trans in data order
        Psiarr <- apply(dp, c(2,3), function(x)tapply(x, md$group, sum))
        npars <- x$paramdata$nopt # only includes qbase,qcov, since no hmms here
        ## permute C x R x npars Psiarr to R x C x npars Psi and then collapse first two dims
        Psi <- t(array(aperm(Psiarr,c(2,1,3)), dim=c(R*C,npars)))
        ## rows ordered with tostate changing fastest, then category (match P)
        EI <- 0.5*x$paramdata$info
        Omega <- rbind(cbind(EI, Psi %*% P), cbind(t(P) %*% t(Psi),PSigmaPT))

        ## compute B: RC x npars matrix with entries derc/dtheta_m / sqrt(erc)
        ## erc(theta) is just sum over (i in that cat) of prc(theta)
        Barr <- Psiarr
        fromgroups <- md$prevstate[!duplicated(md$group)][order(unique(md$group))] # ugh
        for (i in 1:nfrom) {
            fi <- unique(from)[i]
            rows <- fromgroups==fi
            et <- exptable[,,,,from==fi]
            Barr[rows,to[from==fi],] <- - Barr[rows,to[from==fi],]  / as.numeric(sqrt(et[et>0]))
        }
        Bmat <- array(aperm(Barr,c(2,1,3)), dim=c(R*C,npars))
        A <- cbind(Bmat %*% solve(EI), diag(R*C))
        V <- A %*% Omega %*% t(A)
        lambda <- eigen(V,only.values=TRUE)$values
        psi <- function(u)prod((1 - 2i*lambda*u)^(-0.5))
        fn <- function(u){
            res <- numeric(length(u))
            for (i in seq_along(u))
                res[i] <- Im(psi(u[i])*exp(-1i*u[i]*stat) / (2*pi*u[i]))
            res
        }
        int <- try(integrate(fn, -Inf, Inf))
        if (inherits(int, "try-error"))
            { message("Unable to calculate more accurate p-value"); p.acc <- p.err <- NULL}
        else {
            p.acc <- 0.5 + int$value
            p.err <- int$abs.error
            if (p.acc - 2*p.err < 0) p.acc <- 0
            if (p.acc + 2*p.err > 1) p.acc <- 1
        }
    }
    test <- data.frame(stat=stat)
    if (acc.p) test$p <- p.acc
    test$df.lower <- if (exact.death && is.null(next.obstime)) NA else df.lower
    test$p.lower <- if (exact.death && is.null(next.obstime)) NA else 1 - pchisq(stat, df.lower)
    test$df.upper <- df.upper
    test$p.upper <- 1-pchisq(stat, df.upper)
    rownames(test) <- ""

    ## Simulated observation times to use as sampling frame for bootstrapped data
    if (!is.null(imputation)) {
      imp.times <- matrix(rep(x$data$mf$"(time)", N), nrow=length(x$data$mf$"(time)"), ncol=N)
      prevtime <- imp.times[which(x$data$mf$"(state)" %in% dstates) - 1,]
      imp.times[x$data$mf$"(state)" %in% dstates, ] <- prevtime + imputation[,,"times"]
    }
    else imp.times <- NULL
    if (boot) {
        if (!is.null(groups)) stop("Bootstrapping not valid with user-specified groups")
        cat("Starting bootstrap refitting...\n")
        boot.stats <- pearson.boot.msm(x, imp.times=imp.times, transitions=transitions, timegroups=timegroups, intervalgroups=intervalgroups, covgroups=covgroups, groups=groups,
                                       B=B)
        test$p.boot <- sum(boot.stats > stat) / B
    }
    pearson <- list(observed=obstable, expected=exptable, deviance=devtable*sign(obstable-exptable), test=test, intervalq=intervalq)
    names(pearson) <- c("Observed","Expected","Deviance*sign(O-E)","test","intervalq")
    if (exact.death && is.null(next.obstime)) pearson$sim <- list(observed=obs.rep, expected=exp.rep, deviances=dev.rep, stat=stat.sim, imputation=imputation)
    if (boot) pearson$boot <- boot.stats
    if (acc.p) pearson$lambda <- lambda
    pearson <- reformat.pearson.msm(pearson)
    class(pearson) <- "pearson.msm"
    pearson
}

### Reformat array output from Pearson test as a matrix with transition type in columns

reformat.pearson.msm <- function(pearson) {
    pp <- pearson
    nd <- length(dim(pearson$Observed))
    for (i in 1:3) {
        dim(pp[[i]]) <- c(prod(dim(pp[[i]])[-nd]), dim(pp[[i]])[nd])
        rnames <- do.call("expand.grid", dimnames(pearson$Observed)[1:4])
        colnames(pp[[i]]) <- dimnames(pearson$Observed)[[nd]]
        pp[[i]] <- as.data.frame(pp[[i]])
        pp[[i]] <- cbind(rnames, pp[[i]])
        colnames(pp[[i]])[1:4] <- c("Time","Interval","Cov","User")
        drop <- NULL # drop columns labelling categories with only one value
        for (j in 1:4)
            if (length(unique(pp[[i]][,j]))==1) drop <- c(drop,j)
        pp[[i]] <- pp[[i]][,-drop]
    }
    pp
}

### Keep the simulation and bootstrap output but don't print it

#' @export
print.pearson.msm <- function(x, ...){
    print(x[!(names(x) %in% c("sim","boot","intervalq","lambda"))])
}

### Adaptation of cut(quantile()) so that if two quantiles are equal, a unique category is created for x equal to that
### For point intervals, the label is just the value of the point
### Else, the labels are in (a,b] or [a,b) type interval notation

qcut <- function(x, n, qu=NULL, eps=1e-06, digits=2, drop.unused.levels=FALSE) {
    if (is.null(qu))
        qu <- quantile(x, probs = seq(0, 1, 1/n))
    q2.equal <- sequence(table(qu)) <= 2
    qu <- qu[q2.equal] ## If three or more quantiles are equal, only keep two of them.
    ## Ensure all x fall within intervals, widening upper and lower intervals if necessary.
    qu[qu==max(qu)] <- max(max(qu),x)
    qu[qu==min(qu)] <- min(min(qu),x)
    n <- length(qu)-1
    lagq <- c(-Inf, qu[1:n])
    qlabs <- paste("[",round(qu[1:n],digits),",",round(qu[2:(n+1)],digits),")",sep="")
    inti <- qu[1:n]==qu[2:(n+1)] # which intervals are points
    qlabs[inti] <- round(qu[1:n][inti], digits)
    lastopen <- if (n==1) FALSE else c(FALSE, inti[1:(n-1)])
    substr(qlabs[lastopen],1,1) <- "(" # interval with a point interval just before it is open on the left
    if (!inti[n])
       substr(qlabs[length(qlabs)], nchar(qlabs[length(qlabs)]), nchar(qlabs[length(qlabs)])) <- "]" # last interval is closed on the right
    qu[qu==lagq] <- qu[qu==lagq] + eps
    x.cut <- cut(x, qu, right=FALSE, include.lowest=TRUE, labels=qlabs)
    tx <- tapply(x, x.cut, unique)
    nxcats <- sapply(tx, length) # number of unique x values in each category
    nxcats[is.na(tx)] <- 0
    ## If only one unique x value is in a category, then label the category with that value
    if (any(nxcats==1))
        levels(x.cut)[nxcats == 1] <- round(unlist(tx[nxcats==1]), digits)
    if (drop.unused.levels)
      x.cut <- factor(x.cut, exclude=NULL) # drop unused factor levels
    x.cut
}


pearson.boot.msm <- function(x, imp.times=NULL, transitions=NULL, timegroups=4, intervalgroups=4, covgroups=4, groups=NULL, B=500){
  bootstat <- numeric(B)
  x$call$formula <- if (x$emodel$misc) substitute(obs ~ time) else substitute(state ~ time)
  x$call$qmatrix <- qmatrix.msm(x,ci="none") # put MLE in inits.
  x$call$hessian <- x$call$death <- FALSE
  x$call$obstype <- NULL
  x$call$subject <- substitute(subject)
  x$call$subject.weights <- substitute(subject.weights)
  i <- 1
  while (i <= B) {
    if (!is.null(imp.times))
     x$data$mf$"(time)" <- imp.times[,sample(ncol(imp.times), size=1)]  # resample one of the imputed sets of observation times
    boot.df <- simfitted.msm(x,drop.absorb=TRUE)
    x$call$data <- substitute(boot.df)
    refit.msm <- try(eval(x$call)) # estimation might not converge for a particular bootstrap resample
    if (inherits(refit.msm, "msm")) {
      p <- pearson.msm(refit.msm, transitions=transitions, timegroups=timegroups,
                       intervalgroups=intervalgroups, covgroups=covgroups, groups=groups, boot=FALSE, pval=FALSE)
      bootstat[i] <- p$test$stat
      i <- i + 1
    }
  }
  bootstat
}

### Work out empirical distribution of sampling times for each observation (or time since initiation) group:

empiricaldists  <-  function(timeinterval, state, obgroup, obgroups, ndstates) {
    empdist <- array(0,c(3, obgroups, max(table(obgroup[state %in% ndstates])))) # Need times, cum value and point mass value
    dimnames(empdist) <- list(c("time", "surv", "pdeath"), levels(obgroup), NULL)
    ## First dimension:   1: time, 2: surv prob 3: death prob in prev interval
    ## Second dim: observation group.  Third dim: size of biggest obs group (i.e. maximum number of event times for KM estimate)
    for (i in 1:obgroups) {
        ##s1 <- survfit(Surv(timeinterval[obgroup==i],(state[obgroup==i] %in% ndstates))~1,type="kaplan-meier")
        ##survfit seems to ignore increments of small size ## TODO investigate this
        ##Instead use own code to create KM
        t <- round(timeinterval[(obgroup==i & state %in% ndstates)],4)
        eligt <- sort(unique(t))
        events <- table(t)
        allt <- sort(round(timeinterval[obgroup==i],4))
        empdist["time",i,1:length(eligt)] <- eligt
        for (j in 1:length(eligt)) {
            nrisk <- sum(allt + sqrt(.Machine$double.eps) >= eligt[j])
            if (nrisk==0) nrisk <- 1 # avoid floating point fuzz in comparing last point
            empdist["surv",i,j] <-
                if (j>1) (empdist["surv",i,j-1]*(1 - events[j]/nrisk))
                else  (1 - events[j]/nrisk)
            empdist["pdeath",i,j] <-
                if (j>1) (empdist["surv",i,j-1] - empdist["surv",i,j])
                else (1 - empdist["surv",i,j])
            ## Now deals with ties correctly
        }
    }
    empdist
}

### For a single death, sample N points from the distribution of the next sampling time
### mintime, centime: minimum and maximum possible times
### dist: matrix of times, survival probs and previous-interval death probs

sampletimes <- function(mintime, centime, dist, N, obgroup, intervalq) {
    ## dist takes the overall distribution from the relevant obgroup, but since we supply obgroup it might be better to just supply the whole thing
    dist <- array(dist, dim=dim(dist)[c(1,3)], dimnames=dimnames(dist)[c(1,3)])
    dist <- dist[,dist[1,]>mintime,drop=FALSE] # Remove times before the minimum possible time.
    if (length(dist) > 0) {
        dist[c("surv","pdeath"),] <- dist[c("surv","pdeath"),] / dist["surv",1] # Normalise
        ## Compute the censoring fraction
        cen <- 1 - sum(dist["pdeath",dist["time",]<centime])
        dist <- dist[, dist["time",]<centime, drop=FALSE]
        if (length(dist)>0) {
            cend <- as.numeric(runif(N) < cen)
            if (length(dist["time",])>1) {
                ## Suppress "Walker's alias method used, results incompatible with R < 2.2.0" warning
                times <- suppressWarnings(sample(dist["time",],N,replace=TRUE,prob=dist["pdeath",])) * (cend==0) + centime*(cend==1)
            }else{
                times <- dist["time",]*(cend==0) + centime*(cend==1)
            }
        }else{
            cend <- rep(1,N)
            times <- rep(centime,N)
        }
    }else{
        cend <- rep(1,N)
        times <- rep(centime,N)
    }
    intervalgroup <- qcut(times,qu=intervalq[[obgroup]])
    cbind(times=times, cens=cend, intervalgroup=intervalgroup)
}

## make internal to pearson function
## requires table of expected values with dims:  nstcens, nndstates, groupdims
## also observed table with same dims
## prob need to calculate these in every case.
## why not do this calculation with the ungrouped version of the table, collapse the first two dims, then aggregate the table by trans$use


### Replace fromstate,tostate dimensions at beginning by a single allowed-transition dimension at end

agg.tables <- function(obs.full, exp.full, groupdims, groupdimnames, trans) {
    obstable <- exptable <- array(dim=c(groupdims, trans$na))
    for (i in 1:trans$na) {
        obstable[,,,,i] <- obs.full[trans$allowed[i,2],trans$allowed[i,1],,,,]
        exptable[,,,,i] <- exp.full[trans$allowed[i,2],trans$allowed[i,1],,,,]
    }
    obstable <- replace(obstable,is.na(obstable),0)
    exptable <- replace(exptable,is.na(exptable),0)
    ## Aggregate by transition groups
    obstable  <-  aperm(apply(obstable, 1:4, function(u)tapply(u, trans$use, sum)), c(2:5,1))
    exptable <- aperm(apply(exptable, 1:4, function(u)tapply(u, trans$use, sum)), c(2:5,1))
    dimnames(obstable) <- dimnames(exptable) <- c(groupdimnames, list(trans$labsagg))
    list(obs=obstable, exp=exptable)
}

### Adjust expected counts to account for informative censoring times, according to method in paper addendum.

adjust.expected.cens <- function(exp.unadj, obs,
                                 nst, ndstates, dstates, groupdims, N, cens, md, nstcens)
{
    ## Compute total expected counts from each from-state, summed over destination state (n-tilde in paper addendum)
    nndstates <- length(ndstates)
    dims <- c(nndstates, groupdims)
    nobs <- apply(obs, 2:6, sum)  # Obs trans summed over first dimension = destination state = Number of trans from state r in group
    nobs.xd <- nobs - array(obs[nst,,,,,], dim=dims)  # Number of trans excluding death
    nobs.xdc <- if (cens) nobs.xd - array(obs[nst+1,,,,,,drop=FALSE], dim=dims) else nobs.xd # excluding death and censoring
    po <- array(0, dim = c(nst+1, dims))  # p-hat in appendix to paper, MLEs from unrestricted alternative model
    for (j in ndstates) # prop of uncensored trans from state r (and group) ending in state j (not death), multiplied by prop of trans not ending in death.
        po[j,,,,,] <- (array(obs[j,,,,,], dim=dims) * nobs.xd)/(nobs.xdc*nobs + (nobs.xdc*nobs==0))
    for (j in dstates)
        po[nst,,,,,] <- array(obs[j,,,,,], dim=dims) / (nobs + (nobs==0)) # prop of trans from state r (and group) ending in death. (if zero in denom, this is 0)
    po[nst+1,,,,,] <- 1 - apply(po[dstates,,,,,,drop=FALSE], 2:6, sum)
    po <- replace(po, is.na(po), 0)

    nexp <- apply(exp.unadj, 2:6, sum)
    nexp.xd <- nexp - array(exp.unadj[nst,,,,,], dim=dims)
    nexp.xdc <- if (cens) nexp.xd - array(exp.unadj[nst+1,,,,,,drop=FALSE],dim=dims) else nexp.xd
    ps <- array(0,dim=c(nst+1, dims)) # p-tilde-star in addendum to paper. "robustified" MLEs from null Markov model
    for (j in ndstates)
        ps[j,,,,,] <- (array(exp.unadj[j,,,,,], dim=dims)*nexp.xd)/(nexp.xdc*nexp + (nexp.xdc*nexp==0))
    for (j in dstates)
        ps[j,,,,,] <- array(exp.unadj[j,,,,,], dim=dims)/(nexp + (nexp==0))
    ps[nst+1,,,,,] <- 1 - apply(ps[dstates,,,,,,drop=FALSE], 2:6, sum)
    ps <- replace(ps,is.na(ps),0)

    ncen <- tapply(md$cens, list(md$prevstate,md$timegroup,md$intervalgroup,md$covgroup,md$usergroup), function(x)sum(x==1))
    dim(ncen) <- dims
    ncen <- replace(ncen, is.na(ncen), 0)
    exp.adj <- array(0, dim=c(nstcens,dims)) #  = p-tilde-star * n / phat  (in notation of paper)  =   ps * obs / po
    exp.adj[ndstates,,,,,] <-
        ps[ndstates,,,,,,drop=FALSE] *
            rep(nobs.xdc * nobs,each=nndstates) /
                rep(nobs.xd + (nobs.xd==0),each=nndstates)
    for (j in dstates)
        exp.adj[j,,,,,] <- array(ps[j,,,,,], dim=dims) * nobs
    if (cens)
        exp.adj[nst+1,,,,,] <- nobs - apply(exp.adj[1:nst,,,,,,drop=FALSE], 2:6, sum)

    exp.adj
}