# Defacing pre-registration - Statistical analysis in R

## Load simulated data

In [2]:
df <- read.table(file = 'SimulatedDefacedRatings.csv', sep = ",", header = TRUE, stringsAsFactors = TRUE)

The simulated data were generated by running the `SimulateDefacedRatings.ipynb` notebook.
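Before modelling, it is worth checking how the columns were parsed: `stringsAsFactors = TRUE` orders factor levels alphabetically, and integer-coded identifiers stay numeric. Below is a minimal sketch on a made-up toy table. It assumes the columns are named `rater`, `defaced` and `ratings`, as in the code of this notebook, and that the rating labels are excluded, poor, good and excellent; the real CSV may differ.

```r
# Toy stand-in for the CSV (rows are invented for illustration only)
toy_csv <- "rater,defaced,ratings
1,original,good
1,defaced,poor
2,original,excellent
2,defaced,excluded"

toy <- read.table(text = toy_csv, sep = ",", header = TRUE, stringsAsFactors = TRUE)

# Alphabetical order, which is not the quality order
default_levels <- levels(toy$ratings)
print(default_levels)

# Impose the intended ordering, from worst to best quality
toy$ratings <- factor(toy$ratings, levels = c("excluded", "poor", "good", "excellent"))

# Integer rater IDs are read as numbers; make them categorical
toy$rater <- factor(toy$rater)

str(toy)
```

The same two `factor()` calls could be applied to `df` right after loading, if its columns turn out to be parsed this way.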

## Continuation ratio mixed effects regression

The ratings are ordinal, i.e. categorical variables with a natural ordering of their levels. To model between-rater variability, we use mixed effects regression from the `GLMMadaptive` package in R. This analysis is inspired by https://drizopoulos.github.io/GLMMadaptive/articles/Ordinal_Mixed_Models.html.

The continuation ratio model comes in two formulations. The forward formulation specifies that subjects have to ‘pass through’ one category to get to the next one. The backward formulation is commonly used when progression through the ratings (excluded, poor, good, excellent) is represented by increasing integer values, and interest lies in estimating the odds of better quality compared to worse quality. The backward formulation is thus suitable for our analysis.
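In symbols, the two formulations can be sketched as follows. The notation follows the conventions of the GLMMadaptive vignette linked above and is an assumption here, since this notebook does not define it: $y_{ij}$ is the $j$-th rating in group $i$, $k$ indexes the rating categories, $\alpha_k$ are category-specific intercepts, $\beta$ the fixed effects and $b_i$ the random effects.

$$
\text{backward:}\quad \log\frac{\Pr(y_{ij} = k \mid y_{ij} \le k)}{\Pr(y_{ij} < k \mid y_{ij} \le k)} = \alpha_k + x_{ij}^\top \beta + z_{ij}^\top b_i
$$

$$
\text{forward:}\quad \log\frac{\Pr(y_{ij} = k \mid y_{ij} \ge k)}{\Pr(y_{ij} > k \mid y_{ij} \ge k)} = \alpha_k + x_{ij}^\top \beta + z_{ij}^\top b_i
$$

Under the backward formulation, each conditional odds compares landing exactly in category $k$ with landing in any lower category, given that the rating is at most $k$.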

An advantage of the continuation ratio model is that its likelihood can easily be re-expressed so that it can be fitted with software that fits (mixed effects) logistic regression. This formulation requires a couple of data management steps: creating separate records for each measurement, and suitably replicating the corresponding rows of the design matrices $X_i$ and $Z_i$. In addition, a new ‘cohort’ variable is constructed denoting the category to which the specific measurement of the i-th subject belongs. An extra advantage of this formulation is that we can easily evaluate whether specific covariates satisfy the ordinality assumption (i.e., that their coefficients are independent of the category k) by including their interaction with the ‘cohort’ variable in the model and testing its significance.
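The record-splitting step described above can be illustrated by hand in base R. This is a simplified sketch of the idea for the backward formulation, not the implementation used by `cr_setup()`; the helper `expand_backward_cr()` and the toy ratings are hypothetical.

```r
# Hypothetical helper: backward continuation-ratio expansion written by hand.
# For each rating, one binary record is created per cut-point k (visited from
# the top category downwards) for which the rating is still <= k.
expand_backward_cr <- function(y) {
  stopifnot(is.factor(y))
  lev <- levels(y)
  K <- length(lev)
  code <- as.integer(y)  # 1 = lowest category, K = highest
  out <- vector("list", length(y))
  for (i in seq_along(y)) {
    ks <- seq(K, max(code[i], 2))  # cut-points this rating passes through
    out[[i]] <- data.frame(
      id = i,
      cohort = ifelse(ks == K, "all", paste0("y<=", lev[ks])),
      y_new = as.integer(code[i] == ks)  # 1 if the rating stops at cut-point k
    )
  }
  do.call(rbind, out)
}

toy <- factor(c("excellent", "good", "poor", "excluded"),
              levels = c("excluded", "poor", "good", "excellent"))
expanded <- expand_backward_cr(toy)
print(expanded)
```

Each original rating contributes one binary record per cut-point it is still at risk for, which is why the expanded data set has more rows than the original one.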


In [4]:
library(GLMMadaptive)

## Data set-up for the continuation ratio model (backward formulation)
cr_vals <- cr_setup(df$ratings, direction = "backward")
cr_data <- df[cr_vals$subs, ]
cr_data$ratings_new <- cr_vals$y
cr_data$cohort <- cr_vals$cohort

## Fits generalized linear mixed effects model
fm <- mixed_model(ratings_new ~ cohort + defaced + rater, random = ~ 1 | rater, data = cr_data, family = binomial(), na.action = na.exclude)
fm


Call:
mixed_model(fixed = ratings_new ~ cohort + defaced + rater, random = ~1 | 
    rater, data = cr_data, family = binomial(), na.action = na.exclude)


Model:
 family: binomial
 link: logit 

Random effects covariance matrix:
                 StdDev
(Intercept) 0.009017118

Fixed effects:
      (Intercept)     cohorty<=good cohorty<=excluded   defacedoriginal 
    -0.9129735766      0.3668093138      0.6392333953     -0.0006134864 
           rater2            rater3            rater4            rater5 
    -0.1263234751     -0.0688977084     -0.1184201225      0.0001946318 
           rater6            rater7            rater8            rater9 
    -0.0242123897     -0.2368901397     -0.3441942007     -0.3221685507 
          rater10           rater11           rater12 
    -0.3926695910     -0.1975251922     -0.2950198344 

log-Lik: -19062.91


`random = ~ 1 | rater` accounts for between-rater variability: it lets the intercept take a different value for each rater.
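To get a feel for the size of this random intercept, the estimates printed above can be pushed through the inverse logit. This is a rough sketch using the fixed intercept and the random-effect standard deviation copied from the output of `fm`; the ±2 SD range is an illustrative choice, and the probabilities refer to the reference levels of all covariates.

```r
# Values copied from the printed output of fm
beta0 <- -0.9129735766    # fixed intercept (logit scale)
sd_rater <- 0.009017118   # standard deviation of the random intercept

# Rater-specific intercept shifts at -2, 0 and +2 standard deviations
shifts <- c(low = -2, typical = 0, high = 2) * sd_rater

# Convert the shifted intercepts to probabilities
p <- plogis(beta0 + shifts)
print(round(p, 4))

# Spread in probability between a "low" and a "high" rater
spread <- diff(range(p))
print(spread)
```

The spread is well below one percentage point, so the estimated random intercept barely moves the probabilities. One possible reason is that `rater` also enters the model as a fixed effect, which leaves little variation for the random intercept to absorb.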

### Relaxing continuation ratio assumption

We can relax the ordinality assumption for the `defaced` variable, that is, allow the effect of `defaced` to differ across the response categories of our ordinal outcome y. This is achieved by including the interaction term between the `defaced` and `cohort` variables.
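What the interaction adds can be seen directly in the design matrix. The sketch below builds a toy grid in base R; the level labels mirror those printed in the model output but are otherwise assumptions.

```r
# Toy grid with the cohort and defaced levels (labels assumed from the output)
toy <- expand.grid(
  cohort = factor(c("all", "y<=good", "y<=poor"),
                  levels = c("all", "y<=good", "y<=poor")),
  defaced = factor(c("defaced", "original"),
                   levels = c("defaced", "original"))
)

# Additive model: one defaced effect shared by all cohorts
X_add <- model.matrix(~ cohort + defaced, data = toy)

# Interaction model: the defaced effect may differ per cohort
X_int <- model.matrix(~ cohort * defaced, data = toy)

# Columns that only exist in the interaction model
extra_cols <- setdiff(colnames(X_int), colnames(X_add))
print(extra_cols)
```

With three cohorts, the interaction contributes two extra coefficients, one per non-reference cohort, which is the number of degrees of freedom a likelihood ratio test between the two models would use.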

In [8]:
gm <- mixed_model(ratings_new ~ cohort * defaced + rater, random = ~ 1 | rater, data = cr_data, family = binomial())
gm


Call:
mixed_model(fixed = ratings_new ~ cohort * defaced + rater, random = ~1 | 
    rater, data = cr_data, family = binomial())


Model:
 family: binomial
 link: logit 

Random effects covariance matrix:
                StdDev
(Intercept) 0.00835499

Fixed effects:
                 (Intercept)                cohorty<=good 
                 -1.27016209                   0.37607078 
               cohorty<=poor               defaceddefaced 
                  1.10847654                   0.54198717 
                      rater2                       rater3 
                 -0.06513242                  -0.05340367 
                      rater4                       rater5 
                  0.02817786                   0.17916487 
                      rater6                       rater7 
                  0.11767470                   0.14844202 
                      rater8                       rater9 
                  0.25101273                   0.19715950 
                     rat

To test whether these extensions are required we can perform a likelihood ratio test using the anova() method:

In [37]:
anova(fm,gm)


        AIC      BIC   log.Lik   LRT df p.value
fm 37740.76 37748.52 -18854.38                 
gm 37710.82 37719.54 -18837.41 33.94  2 <0.0001


**Discussion** The first test rejects the null hypothesis, meaning that `defaced` does not satisfy the continuation ratio assumption. The interaction term cohort x defaced is therefore necessary.
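The likelihood ratio statistic in the table can be reproduced by hand from the two log-likelihoods. This is a small check using the values copied from the `anova()` output above, assuming the usual chi-squared reference distribution.

```r
# Log-likelihoods and degrees of freedom copied from the anova(fm, gm) table
loglik_fm <- -18854.38
loglik_gm <- -18837.41
df_diff <- 2

# LRT statistic: twice the gain in log-likelihood of the larger model
lrt <- 2 * (loglik_gm - loglik_fm)

# Upper-tail chi-squared probability
p_value <- pchisq(lrt, df = df_diff, lower.tail = FALSE)

print(c(LRT = lrt, p.value = p_value))
```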

**We also test whether we need to relax the ordinality assumption for the rater predictor.**

In [35]:
#gm2 <- mixed_model(ratings_new ~ cohort * defaced + cohort * rater, random = ~ 1 | rater, data = cr_data, family = binomial())
#gm2


Call:
mixed_model(fixed = ratings_new ~ cohort * defaced + rater, random = ~1 | 
    rater, data = cr_data, family = binomial())


Model:
 family: binomial
 link: logit 

Random effects covariance matrix:
                 StdDev
(Intercept) 0.008354982

Fixed effects:
                  (Intercept)                 cohorty<=good 
                  -0.72818126                    0.10594841 
                cohorty<=poor               defacedoriginal 
                   1.19284550                   -0.54196754 
                       rater2                        rater3 
                  -0.06513242                   -0.05340367 
                       rater4                        rater5 
                   0.02817786                    0.17916487 
                       rater6                        rater7 
                   0.11767470                    0.14844202 
                       rater8                        rater9 
                   0.25101273                    0.19715950

In [36]:
#anova(gm, gm2)


         AIC      BIC   log.Lik   LRT df p.value
gm  37710.82 37719.54 -18837.41                 
gm2 37730.15 37749.54 -18825.07 24.67 22  0.3132


**Discussion** The second test does not reject the null hypothesis, meaning that we have no evidence that `rater` violates the continuation ratio assumption. The interaction term cohort x rater is therefore not necessary.

`gm` is our final model.

### Effect plot of conditional probabilities

In [55]:
library(lattice)

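#Extract data necessary to plot
#rater must be a factor with one entry per rater: the raw column repeats every
#observation and, if numeric, no longer matches the fitted rater coefficients
nDF <- with(cr_data, expand.grid(cohort = levels(cohort), defaced = levels(defaced), 
                                 rater = factor(sort(unique(rater)))))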

plot_data <- effectPlotData(gm, nDF, direction="backward")

#Plot
expit <- function (x) exp(x) / (1 + exp(x))
my_panel_bands <- function(x, y, upper, lower, fill, col, subscripts, ..., font, 
                           fontface) {
    upper <- upper[subscripts]
    lower <- lower[subscripts]
    panel.polygon(c(x, rev(x)), c(upper, rev(lower)), col = fill, border = FALSE, ...)
}


xyplot(expit(pred) ~ rater | defaced, group = cohort, data = plot_data, 
       upper = expit(plot_data$upp), low = expit(plot_data$low), type = "l",
       panel = function (x, y, ...) {
           panel.superpose(x, y, panel.groups = my_panel_bands, ...)
           panel.xyplot(x, y, lwd = 2,  ...)
       }, xlab = "Rater", ylab = "Continuation Ratio Probabilities")

“variable 'rater' is not a factor”


ERROR: Error in X %*% betas: non-conformable arguments


### Effect plot of marginal probabilities

The effect plot of the previous section depicts the conditional probabilities according to the backward formulation of the continuation ratio model. However, the marginal probabilities of each category are easier to interpret.
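The link between the two kinds of probabilities can be sketched in base R. Under the backward formulation, the marginal probability of a category is its conditional probability multiplied by the probability of not having stopped at any higher category. The conditional probabilities below are hypothetical numbers chosen for illustration, not model estimates.

```r
# Hypothetical conditional probabilities P(Y = k | Y <= k), top category first
cond <- c(excellent = 0.30, good = 0.45, poor = 0.60)
n <- length(cond)

# Probability of still being "at risk" when each cut-point is reached:
# 1 for the top category, then the running product of (1 - conditional)
reach <- c(1, cumprod(1 - unname(cond)))

# Marginal probabilities; the lowest category collects the remaining mass
marginal <- c(cond * reach[seq_len(n)], excluded = reach[n + 1])

print(round(marginal, 3))
print(sum(marginal))
```

By construction the marginal probabilities sum to one, which is a convenient sanity check on any set of fitted conditional probabilities.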

In [53]:
#Extract data for plot
plot_data_m <- effectPlotData(fm, nDF, CR_cohort_varname = "cohort", 
                              direction = "backward")
#Plot (legend entries are the four rating categories of df$ratings)
key <- list(space = "top", rep = FALSE,
            text = list(levels(df$ratings)[1:2]),
            lines = list(lty = c(1, 1), lwd = c(2, 2), col = c("#0080ff", "#ff00ff")),
            text = list(levels(df$ratings)[3:4]),
            lines = list(lty = c(1, 1), lwd = c(2, 2), col = c("darkgreen", "#ff0000")))

xyplot(expit(pred) ~ rater | defaced, group = ordinal_response, data = plot_data_m, 
       upper = expit(plot_data_m$upp), low = expit(plot_data_m$low), type = "l",
       panel = function (x, y, ...) {
           panel.superpose(x, y, panel.groups = my_panel_bands, ...)
           panel.xyplot(x, y, lwd = 2, ...)
       }, xlab = "Rater", ylab = "Marginal Probabilities", key = key)

“variable 'rater' is not a factor”


ERROR: Error in X %*% betas: non-conformable arguments
