# Defacing pre-registration - Statistical analysis in R

We run simulations in order to evaluate how many missing values the analysis can handle. The goal is to determine the maximum proportion of missing values before the model stops converging. We start with all raters rating all subjects and then iteratively drop subjects.
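The drop schedule used by the simulation loop (580 subjects, 50 dropped per iteration) can be tabulated up front. This is a minimal base R sketch. It assumes, as the loop's call `simulate_data(n_sub - j, ...)` suggests, that iteration `j` keeps `n_sub - j` subjects and is labelled with `j / n_sub * 100` percent missing values.

```r
# Values copied from the simulation cell below
n_sub  <- 580  # subjects available in the dataset
n_drop <- 50   # subjects dropped per iteration

dropped <- seq(0, n_sub, by = n_drop)  # 0, 50, ..., 550 (600 would exceed n_sub)

schedule <- data.frame(
  dropped     = dropped,
  remaining   = n_sub - dropped,                 # assumed: subjects still rated
  pct_missing = round(dropped / n_sub * 100, 2)  # same label as the loop prints
)
print(schedule)
```

The last iteration keeps only 30 subjects (about 94.83% missing), so the grid is coarse near the end. A smaller `n_drop` would refine the search for the convergence limit.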

#### Function to simulate data with missing values

```r
# Loads simulate_data(), which is called in the loop below
source("simulate_data.R")
```

#### Run `lmer` on datasets with increasing numbers of missing values

```r
library(lme4)       # lmer()
library(coefplot2)  # coefplot2(), only needed for the commented-out plot below

n_sub <- 580   # number of subjects available in the dataset
n_drop <- 50   # number of subjects dropped at each iteration
n_rater <- 20  # number of raters
# Percentage of biased ratings for each rater (one value per rater)
perc_biased <- rep(c(2, 20, 40, 40, 40, 40, 60, 60, 60, 80), times = n_rater / 10)
ratings_range <- seq(0, 100, length.out = 101)
print(ratings_range)
labels <- c('excluded', '0.1', 'poor', '0.3', 'acceptable', '0.5', 'good', '0.7', 'very good', '0.9', 'excellent')
bias <- 1

for (j in seq(0, n_sub, by = n_drop)) {
    # j subjects are dropped; j * 100 / n_sub is the percentage of missing values
    print(sprintf("_______________%.02f missing values__________", j * 100 / n_sub))

    # Output path for this iteration (built but not used in this cell)
    filename <- sprintf("SimulatedData/SimulatedDefacedRatings_%.2fMissing.Rda", j / n_sub * 100)

    df <- simulate_data(n_sub - j, n_sub, n_rater, perc_biased, ratings_range = ratings_range, bias = bias)

    # Random intercept per rater; rows with a missing value in any model variable are dropped
    fm1 <- lmer(as.numeric(ratings) ~ defaced + (1 | rater), data = df, na.action = na.omit, REML = TRUE)

    print(summary(fm1))

    ## Visualize fixed effect regression coefficients
    #coefplot2(fm1)
}
```

```
 [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
[1] "_______________0.00 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (1 | rater)
   Data: df

REML criterion at convergence: 69863.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8105 -0.7568  0.1536  1.0417  1.5202 

Random effects:
 Groups   Name        Variance Std.Dev.
 rater    (Intercept) 0.007201 0.08486 
 Residual             1.186768 1.08939 
Number of obs: 23200, groups:  rater, 20

Fixed effects:
               Estimate Std. Error t value
(Intercept)      2.4972     0.0215   116.1
defaceddefaced   0.3333     0.0143    23.3

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.333
[1] "_______________8.62 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (1 | rater)
   Data: df

REML criterion at convergence: 63800.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max
```
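The loop above prints each summary but does not record whether a fit had problems. lme4 reports convergence failures as warnings (for example "Model failed to converge ..."), so one option is to capture warnings and errors per iteration. The helper `fit_without_warnings()` below is hypothetical and not part of lme4. Treating "no warning and no error" as "converged" is an assumption. The helper also flags unrelated warnings, and it does not capture messages, only warnings and errors.

```r
# Hypothetical helper (not an lme4 function): evaluate a model-fitting
# expression, collect any warnings or errors it raises, and report whether
# it finished cleanly. A clean run is used as a proxy for "converged".
fit_without_warnings <- function(expr) {
  problems <- character(0)
  value <- tryCatch(
    withCallingHandlers(
      expr,  # the promise is forced here, inside the handlers
      warning = function(w) {
        problems <<- c(problems, conditionMessage(w))
        invokeRestart("muffleWarning")  # record the warning and keep going
      }
    ),
    error = function(e) {
      problems <<- c(problems, conditionMessage(e))
      NULL  # no usable fit
    }
  )
  list(value = value, ok = length(problems) == 0, problems = problems)
}

# Base R demonstrations (no lme4 needed)
ok_fit  <- fit_without_warnings(lm(dist ~ speed, data = cars))
bad_fit <- fit_without_warnings(as.integer("not a number"))  # coercion warning
err_fit <- fit_without_warnings(stop("boom"))                 # error

ok_fit$ok   # TRUE
bad_fit$ok  # FALSE
err_fit$ok  # FALSE
```

Inside the simulation loop it could wrap the existing call, e.g. `res <- fit_without_warnings(lmer(as.numeric(ratings) ~ defaced + (1 | rater), data = df, na.action = na.omit, REML = TRUE))`. The search could then stop at the first `j` for which `res$ok` is `FALSE`.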

### What does `na.omit` do?

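```r
# 3 x 2 matrix filled with NA, then partially populated:
# row 1 = (NA, 3), row 2 = (4, 4), row 3 = (NA, 6)
test <- matrix(NA, nrow = 3, ncol = 2)
test[1, 2] <- 3
test[2, 1] <- 4
test[2, 2] <- 4
test[3, 2] <- 6

# na.exclude() drops the same incomplete rows as na.omit()
na.exclude(test)
```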

| 0 | 1 |
|---|---|
| 4 | 4 |


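It discards every row (of a data frame or matrix) that contains at least one missing value. Here rows 1 and 3 are dropped and only row 2, (4, 4), is kept. `na.exclude()` removes the same rows. The two differ only when used as a model's `na.action`: with `na.exclude`, residuals and fitted values are padded with `NA` back to the original number of rows.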
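The difference between `na.omit` and `na.exclude` only becomes visible when they are passed as the `na.action` of a model fit. This is a small base R sketch on a toy data frame. The values are made up for illustration, and `lm` stands in for `lmer`, which is fitted with `na.action = na.omit` in the loop above.

```r
# Toy data frame (hypothetical values): rows 3 and 4 each contain one NA
df <- data.frame(
  x = c(1, 2, NA, 4, 5),
  y = c(2.1, 3.9, 6.2, NA, 10.1)
)

# Both keep only the complete rows (1, 2 and 5)
print(na.omit(df))
print(complete.cases(df))  # TRUE TRUE FALSE FALSE TRUE

# As a model's na.action they differ in the length of per-row outputs
fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)

length(residuals(fit_omit))  # 3: one residual per complete row
length(residuals(fit_excl))  # 5: NA inserted at rows 3 and 4
```

Padding with `na.exclude` is convenient when residuals or predictions have to be written back into the original data frame row by row.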