# Comparing Correlations

Several groups have designs where they want to compare correlations. For example, you might be asking, "Is the correlation between item norms from groups A and B larger for CLMM estimates than for LMM estimates?"

Here's a quick example showing how to run this comparison in R.


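Imagine we have the following data, with estimates for group A and group B from an LMM (`lmer()`) and a CLMM (`clmm()`). The first code block below simulates such data and saves it as a CSV file; the second reads that file back in from GitHub.

(I think these data are organised the same way as the data of all the groups I talked to, but let me know if not!)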

```r
library(dplyr)
library(tidyr)
library(readr)
library(faux)
set.seed(42)

# simulate item-level estimates: 100 items, 4 correlated variables
results_wide <- rnorm_multi(
    n = 100,
    vars = 4,
    mu = c(3, 3, 0, 0),
    sd = c(2.3, 2, 1, 1),
    # full 4 x 4 correlation matrix, given row by row
    r = c(
        1, 0.7, 0.68, 0.7,
        0.7, 1, 0.7, 0.69,
        0.68, 0.7, 1, 0.88,
        0.7, 0.69, 0.88, 1
    ),
    varnames = c("A_LMM", "B_LMM", "A_CLMM", "B_CLMM")
) |>
    as_tibble() |>
    # add an item identifier and move it to the first column
    mutate(item_id = sprintf("word_%03d", 1:n())) |>
    select(item_id, everything())

# save the simulated data
write_csv(results_wide, "example_wide_res.csv")
```

```r
library(readr)
library(cocor)

# read the example data (wide format: one row per item)
results_wide <- read_csv("https://raw.githubusercontent.com/JackEdTaylor/expra-wise24/master/lecture/introduction/example_wide_res.csv")

print(results_wide)
```

```
# A tibble: 100 x 5
   item_id   A_LMM  B_LMM  A_CLMM B_CLMM
   <chr>     <dbl>  <dbl>   <dbl>  <dbl>
 1 word_001  0.647 -1.02  -0.198  -0.168
 2 word_002  5.09   3.20  -0.0241  0.397
 3 word_003  1.63   3.53  -0.800  -0.821
 4 word_004  3.43   0.839 -1.98   -1.54 
 5 word_005  1.37   2.48   0.508   0.399
 6 word_006  3.12   2.79   0.691   0.662
 7 word_007 -0.712  0.477 -0.920  -0.716
 8 word_008  2.92   3.00   0.402   0.877
...
```


Notice that the data are in wide format: the estimates for each item are in separate columns, with one row per item. This matters because `cocor()` uses the structure of the data to decide which kind of correlation comparison to run. Passing a single data frame in which every row is one item tells it that all four estimates refer to the same items, i.e. that the two correlations come from dependent groups.
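If your estimates are instead in long format (one row per item, group, and model), a minimal sketch like the one below could reshape them with `tidyr::pivot_wider()`. The column names `group`, `model`, and `estimate` are hypothetical, and the two-item data frame is made up from the first two rows printed above; adjust the names to match your own results.

```r
library(dplyr)
library(tidyr)

# hypothetical long-format results: one row per item x group x model
results_long <- tibble(
  item_id  = rep(c("word_001", "word_002"), each = 4),
  group    = rep(c("A", "B", "A", "B"), times = 2),
  model    = rep(c("LMM", "LMM", "CLMM", "CLMM"), times = 2),
  estimate = c(0.647, -1.02, -0.198, -0.168, 5.09, 3.20, -0.0241, 0.397)
)

# one row per item, one column per group x model combination
results_wide_demo <- results_long |>
  pivot_wider(
    names_from  = c(group, model),  # builds names like "A_LMM"
    values_from = estimate,
    names_sep   = "_"
  )

print(results_wide_demo)
```

The resulting columns (`item_id`, `A_LMM`, `B_LMM`, `A_CLMM`, `B_CLMM`) match the layout used in the rest of this example.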

Remember that we can check the correlation matrix like so:

```r
results_wide |>
  # remove non-numeric variable
  select(-item_id) |>
  # get correlation matrix
  cor()
```

```
           A_LMM     B_LMM    A_CLMM    B_CLMM
A_LMM  1.0000000 0.7749739 0.6803930 0.6803572
B_LMM  0.7749739 1.0000000 0.7273935 0.7014971
A_CLMM 0.6803930 0.7273935 1.0000000 0.9012779
B_CLMM 0.6803572 0.7014971 0.9012779 1.0000000
```


A_LMM and B_LMM have a correlation of r = .77.

A_CLMM and B_CLMM have a correlation of r = .90.

## Using `cocor()`

We can use `cocor()` to test whether these two correlations differ significantly in size.

#### Notes on syntax

The syntax for `cocor()` formulas:
* Starts with a `~`
* Has the first correlation as `var_1 + var_2`
* Uses a `|` symbol to separate the two correlations being compared
* Has the second correlation as `var_3 + var_4`
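Because the formula is just `~ var_1 + var_2 | var_3 + var_4`, it can also be built from strings, which might help if you want to run several comparisons. `make_cocor_formula()` below is a hypothetical helper, not part of `cocor`; it only uses base R's `as.formula()`.

```r
# hypothetical helper: compare the correlation of var_1 and var_2
# with the correlation of var_3 and var_4
make_cocor_formula <- function(var_1, var_2, var_3, var_4) {
  as.formula(
    paste0("~ ", var_1, " + ", var_2, " | ", var_3, " + ", var_4),
    env = parent.frame()  # attach the caller's environment, not the helper's
  )
}

f <- make_cocor_formula("A_LMM", "B_LMM", "A_CLMM", "B_CLMM")
print(f)  # ~A_LMM + B_LMM | A_CLMM + B_CLMM
```

The resulting formula could then be passed to `cocor()` in place of the typed-out one, e.g. `cocor(f, data = data.frame(results_wide))`.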

#### Notes on bugs

When you specify `data = my_data_frame`, you may have to convert the data explicitly to a plain data frame with `data.frame(my_data_frame)`. This is because `cocor()` doesn't seem to handle tidyverse tibbles correctly, even though tibbles are technically data frames too.
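The snippet below illustrates how a tibble differs from a plain data frame. We haven't checked `cocor`'s source, so which of these differences causes the problem is an assumption: a tibble carries extra classes, and selecting a single column with `[` returns another tibble rather than a vector.

```r
library(dplyr)

# a small tibble and its plain data.frame equivalent
tbl <- tibble(x = 1:3, y = c(2, 4, 7))
df  <- data.frame(tbl)

class(tbl)  # "tbl_df" "tbl" "data.frame"
class(df)   # "data.frame"

# selecting one column with [ keeps a tibble, but drops to a vector for a data.frame
class(tbl[, "x"])  # "tbl_df" "tbl" "data.frame"
class(df[, "x"])   # "integer"
```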

#### Example Usage

```r
cocor(
    ~ A_LMM + B_LMM | A_CLMM + B_CLMM,
    data = data.frame(results_wide)
)
```

```
  Results of a comparison of two nonoverlapping correlations based on dependent groups

Comparison between r.jk (A_LMM, B_LMM) = 0.775 and r.hm (A_CLMM, B_CLMM) = 0.9013
Difference: r.jk - r.hm = -0.1263
Related correlations: r.jh = 0.6804, r.jm = 0.6804, r.kh = 0.7274, r.km = 0.7015
Data: data.frame(results_wide): j = A_LMM, k = B_LMM, h = A_CLMM, m = B_CLMM
Group size: n = 100
Null hypothesis: r.jk is equal to r.hm
Alternative hypothesis: r.jk is not equal to r.hm (two-sided)
Alpha: 0.05

pearson1898: Pearson and Filon's z (1898)
  z = -3.2476, p-value = 0.0012
  Null hypothesis rejected

dunn1969: Dunn and Clark's z (1969)
  z = -3.6893, p-value = 0.0002
  Null hypothesis rejected

steiger1980: Steiger's (1980) modification of Dunn and Clark's z (1969) using average correlations
  z = -3.6859, p-value = 0.0002
  Null hypothesis rejected

raghunathan1996: Raghunathan, Rosenthal, and Rubin's (1996) modification of Pearson and Filon's z (1898)
  z = -3.6893, p-value = 0.0002
  Null hypothesis rejected

...
```

This output gives several tests of whether the two correlations differ, each based on a different method. The tests are two-sided; the negative *z* values show that the LMM correlation (r.jk) is the smaller of the two, i.e. the correlation is larger for the CLMMs. The different methods usually give similar results. For the ExPra, if you didn't preregister which test you would use, then Pearson and Filon's z (1898) will usually suffice.
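To show what these tests are doing, here is a minimal base-R sketch of two of them, computed from the six correlations printed above rather than from the raw data. The function names are hypothetical, and the formulas follow the usual statements of Pearson and Filon's z and Dunn and Clark's z for nonoverlapping correlations in dependent groups, as summarised by Steiger (1980). With these inputs they should closely reproduce the `pearson1898` and `dunn1969` lines of the `cocor()` output; for real analyses, `cocor()` itself remains the safer choice.

```r
# n times the asymptotic covariance between r_jk and r_hm, for two
# nonoverlapping correlations measured on the same items
psi_nonoverlap <- function(r_jk, r_hm, r_jh, r_jm, r_kh, r_km) {
  0.5 * r_jk * r_hm * (r_jh^2 + r_jm^2 + r_kh^2 + r_km^2) +
    r_jh * r_km + r_jm * r_kh -
    (r_jk * r_jh * r_jm + r_jk * r_kh * r_km +
       r_jh * r_kh * r_hm + r_jm * r_km * r_hm)
}

# Pearson and Filon's (1898) z, on the raw correlation scale
pearson_filon_z <- function(r_jk, r_hm, r_jh, r_jm, r_kh, r_km, n) {
  psi <- psi_nonoverlap(r_jk, r_hm, r_jh, r_jm, r_kh, r_km)
  z <- sqrt(n) * (r_jk - r_hm) /
    sqrt((1 - r_jk^2)^2 + (1 - r_hm^2)^2 - 2 * psi)
  c(z = z, p = 2 * pnorm(-abs(z)))  # two-sided p-value
}

# Dunn and Clark's (1969) z, on Fisher's z scale
dunn_clark_z <- function(r_jk, r_hm, r_jh, r_jm, r_kh, r_km, n) {
  psi <- psi_nonoverlap(r_jk, r_hm, r_jh, r_jm, r_kh, r_km)
  # scaled covariance between the Fisher-transformed correlations
  c_z <- psi / ((1 - r_jk^2) * (1 - r_hm^2))
  z <- sqrt(n - 3) * (atanh(r_jk) - atanh(r_hm)) / sqrt(2 - 2 * c_z)
  c(z = z, p = 2 * pnorm(-abs(z)))  # two-sided p-value
}

# correlations from the matrix above
# (j = A_LMM, k = B_LMM, h = A_CLMM, m = B_CLMM)
rs <- list(
  r_jk = 0.7749739, r_hm = 0.9012779,
  r_jh = 0.6803930, r_jm = 0.6803572,
  r_kh = 0.7273935, r_km = 0.7014971,
  n = 100
)

pf <- do.call(pearson_filon_z, rs)
dc <- do.call(dunn_clark_z, rs)
round(pf, 4)
round(dc, 4)
```

The main design difference is the scale: Pearson and Filon's test works with the correlations directly, while Dunn and Clark's first applies Fisher's z-transformation (`atanh()`), which is why their *z* values differ slightly.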

#### How to Report

We could report this as:

> We found that the correlation between item estimates from group A and group B was larger for the CLMMs (*r* = .90) than for the LMMs (*r* = .77). Comparing the sizes of these correlations with Pearson and Filon's (1898) *z* test revealed that the correlation between the CLMM estimates was significantly larger than the correlation between the LMM estimates (*z* = -3.25, *p* = .001).
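If you want to fill these numbers in programmatically, a small helper like the hypothetical `apa_num()` below drops the leading zero that APA style omits for statistics that cannot exceed 1 (such as *r* and *p*). It is only a sketch: it rounds with `sprintf()` and does not switch to *p* < .001 for very small *p*-values. The values are copied from the output above.

```r
# hypothetical helper: format a number with a fixed number of decimals
# and drop the leading zero (for statistics bounded by 1, like r and p)
apa_num <- function(x, digits = 2) {
  sub("^(-?)0\\.", "\\1.", sprintf(paste0("%.", digits, "f"), x))
}

r_lmm  <- 0.7749739
r_clmm <- 0.9012779
z      <- -3.2476
p      <- 0.0012

sprintf(
  "CLMMs (r = %s) vs LMMs (r = %s): z = %.2f, p = %s",
  apa_num(r_clmm), apa_num(r_lmm), z, apa_num(p, digits = 3)
)
# "CLMMs (r = .90) vs LMMs (r = .77): z = -3.25, p = .001"
```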