# Defacing pre-registration - Statistical analysis in R

## Load simulated data

In [16]:
df <- read.table(file = 'SimulatedDefacedRatings.csv', sep = ",", header = TRUE, stringsAsFactors = TRUE)

The simulated data were generated by running the `SimulateDefacedRatings.ipynb` notebook.

## Ordered logistic regression

Before increasing the complexity of the statistical model to account for between-rater variability, I first ran an ordered logistic regression to build understanding. The analysis is inspired by https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/

In [17]:
## Fit ordered logistic regression

library(MASS)
m <- polr(ratings ~ defaced + rater, data = df, Hess=TRUE, method = "logistic")
m

Call:
polr(formula = ratings ~ defaced + rater, data = df, Hess = TRUE, 
    method = "logistic")

Coefficients:
defacedoriginal           rater 
   -0.004995892    -0.022403282 

Intercepts:
excellent|excluded      excluded|good          good|poor 
        -0.9580290         -0.1320461          0.9408084 

Residual Deviance: 38218.07 
AIC: 38228.07 
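
The intercept labels in the output above (`excellent|excluded`, `excluded|good`, `good|poor`) follow alphabetical order, which is how R sorts factor levels by default when strings are converted to factors. If the intended quality order differs, the levels need to be set explicitly before fitting. A minimal sketch on a toy vector, assuming the order `excluded < poor < good < excellent` (this order and the names `ratings_raw`, `quality_order` and `ratings_ord` are assumptions for illustration, not taken from the notebook):

```r
# Toy ratings standing in for df$ratings (hypothetical values)
ratings_raw <- c("good", "poor", "excellent", "excluded", "good")

# Default conversion: levels are sorted alphabetically
default_levels <- levels(factor(ratings_raw))
default_levels

# Explicit ordering (ASSUMED order, from lowest to highest quality)
quality_order <- c("excluded", "poor", "good", "excellent")
ratings_ord <- factor(ratings_raw, levels = quality_order, ordered = TRUE)

levels(ratings_ord)      # levels now follow quality_order
as.integer(ratings_ord)  # integer codes follow the explicit order
```

With an ordered factor like this, the thresholds reported by `polr` would separate adjacent quality levels rather than alphabetically adjacent labels.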

**Compute p-values associated with the predictors' regression coefficients.** These p-values indicate whether adding each predictor improves the model fit.
One way to calculate a p-value in this case is to compare the t-value against the standard normal distribution, as in a z-test. Strictly, this holds only with infinite degrees of freedom, but it is a reasonable approximation for large samples and becomes increasingly biased as the sample size decreases.

In [20]:
## Compute p-values

# Store table
ctable <- coef(summary(m))

# calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2

# combined table
(ctable <- cbind(ctable, "p value" = p))

| | Value | Std. Error | t value | p value |
|---|---|---|---|---|
| defacedoriginal | -0.004995892 | 0.030417672 | -0.1642431 | 0.8695398 |
| rater | -0.022403282 | 0.004403304 | -5.087834 | 3.621762e-07 |
| excellent\|excluded | -0.958029012 | 0.036958519 | -25.9217372 | 3.788642e-148 |
| excluded\|good | -0.132046061 | 0.036016872 | -3.666228 | 0.0002461545 |
| good\|poor | 0.940808416 | 0.037324476 | 25.2062058 | 3.424855e-140 |
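
As a worked check of the z-test approximation, the two-sided p-values for the two predictors can be recomputed by hand from the t values printed in the table. The vector names `t_vals` and `p_vals` are introduced here only for illustration:

```r
# t values copied from the coefficient table above
t_vals <- c(defacedoriginal = -0.1642431, rater = -5.087834)

# Two-sided p-value: twice the upper-tail area of the standard normal
# beyond |t|
p_vals <- 2 * pnorm(abs(t_vals), lower.tail = FALSE)
p_vals
```

The results agree with the `p value` column: roughly 0.87 for `defacedoriginal` and roughly 3.6e-07 for `rater`.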


**Compute confidence intervals.** Another way to judge whether the predictors improve the model fit is to compute confidence intervals for the parameter estimates. These can be obtained either by profiling the likelihood function or by using the standard errors and assuming a normal distribution. Note that profiled CIs are not symmetric (although they are usually close to symmetric). If the 95% CI does not include 0, the parameter estimate is statistically significant at the 5% level.

In [22]:
## Compute confidence interval

(ci <- confint(m)) # default method gives profiled CIs
confint.default(m) # CIs assuming normality

Waiting for profiling to be done...



| | 2.5 % | 97.5 % |
|---|---|---|
| defacedoriginal | -0.06461512 | 0.05462191 |
| rater | -0.03103604 | -0.01377503 |

| | 2.5 % | 97.5 % |
|---|---|---|
| defacedoriginal | -0.06461343 | 0.05462165 |
| rater | -0.0310336 | -0.01377296 |
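
The normal-approximation intervals returned by `confint.default(m)` can be reproduced by hand as estimate ± z × standard error. A short sketch using the estimates and standard errors printed in the coefficient table earlier; the names `est`, `se`, `wald_ci` and `excludes_zero` are illustrative:

```r
# Estimates and standard errors copied from the coefficient table
est <- c(defacedoriginal = -0.004995892, rater = -0.022403282)
se  <- c(defacedoriginal =  0.030417672, rater =  0.004403304)

# 97.5th percentile of the standard normal (about 1.96)
z <- qnorm(0.975)

# Wald 95% confidence intervals
wald_ci <- cbind("2.5 %" = est - z * se, "97.5 %" = est + z * se)
wald_ci

# TRUE when the interval lies entirely on one side of zero
excludes_zero <- wald_ci[, "2.5 %"] > 0 | wald_ci[, "97.5 %"] < 0
excludes_zero
```

The bounds match the second set of intervals above. Only the `rater` interval excludes zero.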


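**Discussion** The p-values and the confidence intervals agree. The `rater` coefficient differs significantly from zero (p ≈ 3.6e-07, and both 95% CIs exclude 0), whereas the `defaced` coefficient does not (p ≈ 0.87, and both 95% CIs span 0). These results therefore support keeping the rater predictor in the model, but they give no evidence that the defacing predictor improves the fit on these simulated data.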

### Verifying proportional odds assumption

One of the assumptions underlying ordinal logistic (and ordinal probit) regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, and so on. This is called the proportional odds assumption or the parallel regression assumption. Because the relationship between all pairs of groups is the same, there is only one set of coefficients. If this were not the case, we would need a different set of coefficients for each pair of outcome groups.
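
To make the assumption concrete, `polr` models the cumulative logit as `logit P(Y <= k) = zeta_k - eta`, with one threshold `zeta_k` per cutpoint and a single linear predictor `eta` shared by all cutpoints. The sketch below plugs in the thresholds and the `defacedoriginal` coefficient from the fitted model above, holding `rater` at 0 (an arbitrary reference value chosen only for illustration). It shows that the shift between the two `defaced` levels is identical at every threshold:

```r
# Thresholds (intercepts) and defacing coefficient from the polr output
zeta <- c("excellent|excluded" = -0.9580290,
          "excluded|good"      = -0.1320461,
          "good|poor"          =  0.9408084)
beta_original <- -0.004995892

# Cumulative logits, logit P(Y <= k) = zeta_k - eta, with rater held at 0
logit_defaced  <- zeta                  # eta = 0 for the reference level
logit_original <- zeta - beta_original  # eta = beta_original

# Under proportional odds the shift is the same at every threshold
shift <- logit_defaced - logit_original
shift

# Implied category probabilities for the reference level
cum_p <- plogis(logit_defaced)
probs <- diff(c(0, unname(cum_p), 1))
names(probs) <- c("excellent", "excluded", "good", "poor")
probs
```

Relaxing the assumption would amount to letting `beta_original` take a different value at each threshold, which is what the checks in this section look for.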

In [26]:
## Estimates the values that will be graphed

#The sf function calculates the log odds of being greater than or equal to each value of the target variable
sf <- function(y) {
  c('Y>=1' = qlogis(mean(y >= 1)),
    'Y>=2' = qlogis(mean(y >= 2)),
    'Y>=3' = qlogis(mean(y >= 3)),
    'Y>=4' = qlogis(mean(y >= 4)))
}

#calls the function sf on several subsets of the data defined by the predictors
(s <- with(df, summary(as.numeric(ratings) ~ defaced + rater, fun=sf)))

 Length   Class    Mode 
      3 formula    call 
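
A base R sketch of the same group-wise computation, which does not depend on the formula method of `summary()` provided by the Hmisc package. It runs on a small toy data frame (`toy`, hypothetical random data, not the simulated ratings) and omits `Y>=1`, whose log odds are always infinite because every observation satisfies it:

```r
set.seed(42)

# Hypothetical toy data with the same column names as df
toy <- data.frame(
  ratings = factor(sample(c("excellent", "excluded", "good", "poor"),
                          200, replace = TRUE)),
  defaced = factor(sample(c("defaced", "original"), 200, replace = TRUE))
)

# Log odds of being at or above each cutpoint (cutpoint 1 omitted)
sf <- function(y) {
  c("Y>=2" = qlogis(mean(y >= 2)),
    "Y>=3" = qlogis(mean(y >= 3)),
    "Y>=4" = qlogis(mean(y >= 4)))
}

# Apply sf to the numeric ratings within each level of defaced;
# one row per level, one column per cutpoint
s <- t(sapply(split(as.numeric(toy$ratings), toy$defaced), sf))
s
```

Each row gives the empirical log odds for one level of `defaced`. Under proportional odds, the gaps between columns should be similar across rows.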

The call above is meant to produce a table of the (linear) predicted values we would get if we regressed our dependent variable on our predictor variables one at a time, without the parallel slopes assumption. Note that calling `summary()` on a formula with a `fun` argument relies on the Hmisc package. Without `library(Hmisc)`, base R only reports the length, class and mode of the formula, which is what the `Length Class Mode` output of the cell shows. We can evaluate the parallel slopes assumption by running a series of binary logistic regressions with varying cutpoints on the dependent variable and checking the equality of coefficients across cutpoints. We thus relax the parallel slopes assumption to check its tenability. To accomplish this, we transform the original, ordinal dependent variable (here `ratings`) into a new, binary dependent variable that equals 0 if the ordinal variable is < some value a, and 1 if it is >= a. This is done for k-1 levels of the ordinal variable and is executed by the `as.numeric(ratings) >= a` coding below.
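
The procedure described above can be sketched as a loop over the k-1 cutpoints. The example runs on a small toy data frame (`toy`, hypothetical random data) so that it is self-contained. The names `cutpoints` and `coefs` are likewise illustrative:

```r
set.seed(1)

# Hypothetical toy data with the same column names as df
toy <- data.frame(
  ratings = factor(sample(c("excellent", "excluded", "good", "poor"),
                          400, replace = TRUE)),
  defaced = factor(sample(c("defaced", "original"), 400, replace = TRUE))
)

# Cutpoints 2..k: cutpoint 1 is skipped because every observation is >= 1
cutpoints <- 2:nlevels(toy$ratings)

# One binary logistic regression per cutpoint; keep the coefficients
coefs <- sapply(cutpoints, function(a) {
  fit <- glm(I(as.numeric(ratings) >= a) ~ defaced,
             family = binomial, data = toy)
  coef(fit)
})
colnames(coefs) <- paste0("Y>=", cutpoints)
coefs
```

If the `defacedoriginal` row is roughly constant across columns, the parallel slopes assumption looks tenable for that predictor.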

In [29]:
## Run series of binary logistic regression and check equality of coefficient across cutpoints
glm(I(as.numeric(ratings) >= 2) ~ defaced, family="binomial", data = df)


Call:  glm(formula = I(as.numeric(ratings) >= 2) ~ defaced, family = "binomial", 
    data = df)

Coefficients:
    (Intercept)  defacedoriginal  
       0.808596         0.007423  

Degrees of Freedom: 13919 Total (i.e. Null);  13918 Residual
Null Deviance:	    17180 
Residual Deviance: 17180 	AIC: 17180

In [30]:
glm(I(as.numeric(ratings) >= 3) ~ defaced, family="binomial", data = df)


Call:  glm(formula = I(as.numeric(ratings) >= 3) ~ defaced, family = "binomial", 
    data = df)

Coefficients:
    (Intercept)  defacedoriginal  
      -0.004598        -0.015518  

Degrees of Freedom: 13919 Total (i.e. Null);  13918 Residual
Null Deviance:	    19300 
Residual Deviance: 19300 	AIC: 19300
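
Using the coefficients printed by the two fits above, the predicted logits for each level of `defaced` and the gap between the two cutpoints can be computed directly. The variable names below are illustrative:

```r
# Coefficients copied from the two glm outputs above
b_ge2 <- c(intercept =  0.808596, original =  0.007423)  # ratings >= 2
b_ge3 <- c(intercept = -0.004598, original = -0.015518)  # ratings >= 3

# Predicted logits for each level of defaced at each cutpoint
logit_ge2 <- c(defaced  = b_ge2[["intercept"]],
               original = b_ge2[["intercept"]] + b_ge2[["original"]])
logit_ge3 <- c(defaced  = b_ge3[["intercept"]],
               original = b_ge3[["intercept"]] + b_ge3[["original"]])

# Gap between the cutpoint-2 and cutpoint-3 logits, per level of defaced
gap <- logit_ge2 - logit_ge3
round(gap, 3)
```

The gap is about 0.813 for `defaced` and 0.836 for `original`.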

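**Discussion** When defaced = 'defaced', the difference between the predicted values for ratings >= 2 and ratings >= 3 is roughly 0.81 (0.8086 + 0.0046 = 0.8132). For defaced = 'original', the difference in predicted values for ratings >= 2 and ratings >= 3 is roughly 0.84 ((0.8086 + 0.0074) - (-0.0046 - 0.0155) = 0.8361). The two differences are close, and the `defacedoriginal` coefficient is near zero at both cutpoints (0.007 and -0.016). These fits therefore give no indication that the parallel slopes assumption is violated for the defacing predictor, although only two of the three cutpoints were examined and the standard errors of the coefficients are not shown.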