# Defacing pre-registration - Statistical analysis in R

We run simulations to evaluate how many missing values the model can handle. The goal is to determine the maximum proportion of missing values the data can contain before the model stops converging. We start with all raters rating all subjects and then iteratively drop subjects.
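As a quick sanity check on this design, the grid of missingness levels can be computed by hand. The sketch below reuses the parameter values from the run cell further down (580 subjects, 100 raters, 50 subjects dropped per step). It assumes that each rater rates every remaining subject twice, once per image condition; this is inferred from the 116000 observations reported for the complete dataset, not stated in the notebook. The name `grid` is only illustrative.

```r
n_sub   <- 580  # subjects available in the dataset
n_drop  <- 50   # subjects dropped at each step
n_rater <- 100  # raters
n_cond  <- 2    # ASSUMPTION: each subject is rated in two conditions (original, defaced)

# Number of dropped subjects at each step of the simulation
dropped <- seq(0, n_sub, by = n_drop)

grid <- data.frame(
  dropped     = dropped,
  pct_missing = round(dropped / n_sub * 100, 2),
  n_obs       = (n_sub - dropped) * n_rater * n_cond
)
print(grid)
```

Under that assumption, the `n_obs` column should match the "Number of obs" line of each model summary below.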

#### Function to simulate data with missing values

In [4]:
source("simulate_data.R")

#### Run lmer on dataset with missing values

In [10]:
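library(lme4)

n_sub <- 580    # number of subjects available in the dataset
n_drop <- 50    # number of subjects dropped at each iteration
n_rater <- 100  # number of raters
# Percentage of biased ratings for each rater (pattern of 10 values repeated to cover all raters)
perc_biased <- rep(c(2,20,40,40,40,40,60,60,60,80), times = n_rater/10)

for (j in seq(0, n_sub, by = n_drop)) {
    # j is the number of dropped subjects; the file name records the percentage missing
    df <- simulate_data(n_sub-j, n_sub, n_rater, perc_biased, file=sprintf("SimulatedData/SimulatedDefacedRatings_%.2fMissing.Rda",j/n_sub*100))

    # Fixed effect of defacing, with a random intercept and random defacing slope per rater
    fm1 <- lmer(as.numeric(ratings) ~ defaced + (defaced | rater), data=df, na.action=na.omit, REML = TRUE)

    # The printed value is the percentage of missing subjects
    print(sprintf("_______________%.02f missing values__________", j*100/n_sub))
    print(summary(fm1))

    ## Optional: visualize fixed effect regression coefficients (requires the coefplot2 package)
    #coefplot2::coefplot2(fm1)
}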

[1] "_______________0.00 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 349373.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9267 -0.7896  0.1225  0.9968  1.4061 

Random effects:
 Groups   Name           Variance  Std.Dev. Corr
 rater    (Intercept)    0.0004372 0.02091      
          defaceddefaced 0.0214948 0.14661  0.02
 Residual                1.1871586 1.08957      
Number of obs: 116000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.490724   0.004984  499.75
defaceddefaced 0.332879   0.015996   20.81

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.249


boundary (singular) fit: see ?isSingular



[1] "_______________8.62 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 319315.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9487 -0.7807  0.1368  0.9683  1.4053 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.00000  0.0000       
          defaceddefaced 0.02615  0.1617    NaN
 Residual                1.18769  1.0898       
Number of obs: 106000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.497755   0.004734  527.64
defaceddefaced 0.330094   0.017502   18.86

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.270
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



boundary (singular) fit: see ?isSingular



[1] "_______________17.24 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 288718.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9914 -0.7770  0.1429  0.9665  1.4397 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.00000  0.0000       
          defaceddefaced 0.02634  0.1623    NaN
 Residual                1.18165  1.0870       
Number of obs: 96000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.500729   0.004962  504.01
defaceddefaced 0.331063   0.017683   18.72

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.281
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



“Model failed to converge with max|grad| = 0.0131452 (tol = 0.002, component 1)”


[1] "_______________25.86 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 259610.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9692 -0.7779  0.1420  0.9840  1.3883 

Random effects:
 Groups   Name           Variance  Std.Dev. Corr
 rater    (Intercept)    0.0001105 0.01051      
          defaceddefaced 0.0199480 0.14124  1.00
 Residual                1.1949455 1.09314      
Number of obs: 86000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.503372   0.005375   465.7
defaceddefaced 0.329047   0.015971    20.6

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.151
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.0131452 (tol = 0.002, component 1)



boundary (singular) fit: see ?isSingular



[1] "_______________34.48 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 229043.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9493 -0.7904  0.1225  0.9751  1.4065 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.0000   0.0000       
          defaceddefaced 0.0263   0.1622    NaN
 Residual                1.1885   1.0902       
Number of obs: 76000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.497421   0.005593   446.6
defaceddefaced 0.330132   0.018043    18.3

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.310
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



“Model failed to converge with max|grad| = 0.00201509 (tol = 0.002, component 1)”


[1] "_______________43.10 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 198980.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9539 -0.7794  0.1224  0.9769  1.4162 

Random effects:
 Groups   Name           Variance  Std.Dev. Corr
 rater    (Intercept)    0.0002477 0.01574      
          defaceddefaced 0.0189072 0.13750  1.00
 Residual                1.1897243 1.09074      
Number of obs: 66000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.492182   0.006207  401.50
defaceddefaced 0.331030   0.016161   20.48

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.144
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00201509 (tol = 0.002, component 1)



“Model failed to converge with max|grad| = 0.00269179 (tol = 0.002, component 1)”


[1] "_______________51.72 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 168753.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9281 -0.7931  0.1302  1.0000  1.4252 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.00016  0.01265      
          defaceddefaced 0.01866  0.13661  1.00
 Residual                1.18776  1.08984      
Number of obs: 56000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.499821   0.006635  376.77
defaceddefaced 0.330750   0.016476   20.07

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.230
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00269179 (tol = 0.002, component 1)



boundary (singular) fit: see ?isSingular



[1] "_______________60.34 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 138392.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9306 -0.7850  0.1287  1.0049  1.4189 

Random effects:
 Groups   Name           Variance  Std.Dev. Corr
 rater    (Intercept)    0.0003035 0.01742      
          defaceddefaced 0.0174879 0.13224  1.00
 Residual                1.1814085 1.08693      
Number of obs: 46000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.493087   0.007376  338.01
defaceddefaced 0.333522   0.016662   20.02

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.231
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



boundary (singular) fit: see ?isSingular



[1] "_______________68.97 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 108024.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9804 -0.7719  0.1240  0.9671  1.4864 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.002086 0.04567      
          defaceddefaced 0.013924 0.11800  1.00
 Residual                1.170899 1.08208      
Number of obs: 36000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)    2.492722   0.009269  268.94
defaceddefaced 0.335444   0.016412   20.44

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.073
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



boundary (singular) fit: see ?isSingular



[1] "_______________77.59 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 78302.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9817 -0.7852  0.1249  0.9926  1.4602 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.002396 0.04895      
          defaceddefaced 0.011125 0.10548  1.00
 Residual                1.183062 1.08769      
Number of obs: 26000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)     2.51523    0.01072  234.59
defaceddefaced  0.32985    0.01712   19.26

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.214
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



boundary (singular) fit: see ?isSingular



[1] "_______________86.21 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 48088.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9599 -0.7759  0.1324  0.9670  1.5462 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.006507 0.08066      
          defaceddefaced 0.007077 0.08412  1.00
 Residual                1.172981 1.08304      
Number of obs: 16000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)     2.50412    0.01455  172.11
defaceddefaced  0.33350    0.01908   17.48

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.284
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular



boundary (singular) fit: see ?isSingular



[1] "_______________94.83 missing values__________"
Linear mixed model fit by REML ['lmerMod']
Formula: as.numeric(ratings) ~ defaced + (defaced | rater)
   Data: df

REML criterion at convergence: 18134.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7612 -0.7542  0.1598  1.0466  1.3794 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 rater    (Intercept)    0.000000 0.00000      
          defaceddefaced 0.008715 0.09336   NaN
 Residual                1.196772 1.09397      
Number of obs: 6000, groups:  rater, 100

Fixed effects:
               Estimate Std. Error t value
(Intercept)     2.49100    0.01997  124.72
defaceddefaced  0.33233    0.02975   11.17

Correlation of Fixed Effects:
            (Intr)
defaceddfcd -0.671
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
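
Reading convergence warnings off twelve printed summaries is error-prone, so one possible alternative is to collect the diagnostics of each fit in a single row. The sketch below is not part of the original analysis: `fit_diagnostics` is a hypothetical helper, and it is demonstrated on lme4's built-in `sleepstudy` data rather than on the simulated ratings. It relies on `isSingular()` and on the convergence messages that lme4 stores in the `optinfo` slot of the fitted model.

```r
library(lme4)

# Hypothetical helper: summarize the convergence status of one fitted lmer model
fit_diagnostics <- function(fm) {
  # Convergence messages recorded by lme4's post-fit checks (NULL if there are none)
  msgs <- fm@optinfo$conv$lme4$messages
  data.frame(
    n_obs        = nobs(fm),
    singular     = isSingular(fm),
    conv_warning = any(grepl("failed to converge", msgs))
  )
}

# Demonstration on a dataset shipped with lme4 (not the simulated ratings)
fm_demo <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = TRUE)
diag <- fit_diagnostics(fm_demo)
print(diag)
```

In the loop above, calling such a helper on `fm1` at each iteration and binding the rows together with `rbind` would give one table with a row per missingness level.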



### What does na.omit do?

In [3]:
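# Build a 3x2 matrix of NAs, then fill in a few cells
test <- matrix(NA, nrow = 3, ncol = 2)
test[1,2] <- 3
test[2,1] <- 4
test[2,2] <- 4
test[3,2] <- 6

# Rows 1 and 3 still contain an NA in column 1
na.exclude(test)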

0,1
4,4


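It discards every row that contains at least one missing value: here only the second row (4, 4) is complete, so it is the only one kept. Note that the cell calls `na.exclude`, which drops the same rows as `na.omit`; the two differ only in how residuals and predictions are padded for the dropped rows.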
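
To make the behaviour concrete in a modelling context, here is a minimal sketch using base R's `lm` on a small made-up data frame. The data and the names `d`, `fit_omit` and `fit_excl` are purely illustrative. It shows that `na.omit` and `na.exclude` fit the model on the same complete rows and differ only in the length of the returned residuals.

```r
# Toy data with one missing response value (row 2)
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, NA, 6.2, 7.9, 10.1))

# Both actions drop row 2 before fitting
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# Same coefficients, since the same rows are used
all.equal(coef(fit_omit), coef(fit_excl))

# na.omit returns residuals for the complete rows only;
# na.exclude pads the dropped row with NA so the length matches the input
res_omit <- residuals(fit_omit)
res_excl <- residuals(fit_excl)
length(res_omit)
length(res_excl)
```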