cohens_d gives different results from cohen.d function from package psych. #222

studerus · 2020-12-04T12:23:52Z

Here is an example:

cohens_d(CO2$uptake, CO2$Treatment)$Cohens_d

gives 0.6652288

psych::cohen.d(CO2[, c('uptake', 'Treatment')], 'Treatment')$cohen.d

gives -0.6732924

When I calculate cohen's d manually, I can confirm the results of the psych package. So, it seems that the bug is in the effectsize package.

The text was updated successfully, but these errors were encountered:

mattansb · 2020-12-04T13:47:36Z

@studerus ,

library(dplyr)

mtcars$am <- factor(mtcars$am)

sum <- mtcars %>% 
  group_by(am) %>% 
  summarise(across(
    .cols = mpg, 
    .fns = list(mean = mean, sd = sd, n = length), 
    .names = "{fn}"
  ))
#> `summarise()` ungrouping output (override with `.groups` argument)

sum
#> # A tibble: 2 x 4
#>   am     mean    sd     n
#>   <fct> <dbl> <dbl> <int>
#> 1 0      17.1  3.83    19
#> 2 1      24.4  6.17    13

Using the following to get the estimate of the population’s (pooled) sd:

We get the same results from effectsize:

with(sum, {
  d <- mean[1] - mean[2]
  sp <- sqrt(((n[1] - 1) * sd[1] ^ 2 + (n[2] - 1) * sd[2] ^ 2) / (n[1] + n[2] - 2))
  d / sp
})
#> [1] -1.477947

effectsize::cohens_d(mtcars$mpg, mtcars$am)$Cohens_d
#> [1] -1.477947

However, in psych instead of dividing by n1 + n2 - 2, the function divides by n1 + n2:

with(sum, {
  d <- mean[1] - mean[2]
  sp <- sqrt(((n[1] - 1) * sd[1] ^ 2 + (n[2] - 1) * sd[2] ^ 2) / (n[1] + n[2]))
  d / sp
})
#> [1] -1.526417

psych::cohen.d(mtcars$mpg, mtcars$am)$cohen.d[2]
#> [1] 1.526417

(and also subtracts the first mean from the second… But that would only change the sign.)

I cannot find a source for using this formulation - so I'm not sure if this is a mistake, or deliberate on the part of psych?

studerus · 2020-12-04T14:56:38Z

Thanks. In my german statistics textbook, the formula for the pooled standard deviation is:

where is the sample estimation of the variance.

x1 <- mtcars$mpg[mtcars$am == 0]
x2 <- mtcars$mpg[mtcars$am == 1]

n1 <- length(x1)
n2 <- length(x2)

#Calculate sample variance. I do not use the var function as this would divide
#by n - 1 and give the population variance
var1 <- sum((mean(x1) - x1)^2) / n1
var2 <- sum((mean(x2) - x2)^2) / n2

#Calculate pooled standard deviation
pooled_sd <- sqrt((var1 * n1 + var2 * n2) / (n1 + n2))

#Cohen's d
(mean(x1) - mean(x2)) / pooled_sd
#> [1] -1.526417

I therefore get the same result as the psych package

DominiqueMakowski · 2020-12-04T15:06:53Z

Isn't this related to the Bessel's correction? (I think it mentions it here)

mattansb · 2020-12-04T15:11:15Z

Yes, I would expect to use Bessel's correction, to estimate the populations Cohen's d. Without it, it would say psych is returning a biased estimate (or the sample's d).

@studerus what book are you using? Interesting that it uses the sample SD instead of the estimated population SD.

@DominiqueMakowski I guess I can note this in the docs of cohens_d / pooled_sd? What do you think?

studerus · 2020-12-04T16:15:37Z

@mattansb
It's this german introductory statistics textbook for psychology students:
https://www.amazon.de/Forschungsmethoden-Statistik-Psychologen-Sozialwissenschaftler-Pearson/dp/3868943218/

studerus · 2020-12-04T16:53:10Z

My book also says that you can calculate Cohen's d from the Pearson correlation with this formula:

where p and q are the proportions of the sample sizes of the two groups from the total sample.

If I use the d that I calculated above and that you get with psych, I could confirm that I get r. With the estimate from effectsize package I get different results.

p <- n1 / (n1 + n2)
q <- n2 / (n1 + n2)
d / sqrt(d^2 + (1/ (p * q)))
#-0.5998324

with(mtcars, cor(mpg, am))
#[1] 0.5998324

strengejacke · 2020-12-04T17:05:06Z

FWIW:

effectsize::cohens_d(CO2$uptake, CO2$Treatment)$Cohens_d
#> [1] 0.6652288

-1 * psych::cohen.d(CO2[, c('uptake', 'Treatment')], 'Treatment')$hedges.g
#>    uptake 
#> 0.6652288

^{Created on 2020-12-04 by the reprex package (v0.3.0)}

studerus · 2020-12-04T17:22:32Z

This might explain it:

Source: https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00863/full

mattansb · 2020-12-04T20:11:58Z

Looking at McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological methods, 11(4), 386., it seems that:

sometimes the unbiased (population) Cohen's d is called Hedge's g,
sometimes Hedge's _g_is given without the 1 - 3 / (4 * n - 9) correction

Which seems to be what psych is doing.

However, as it is given in Harris Cooper, Larry V. Hedges, Jeffrey C. Valentine - The Handbook of Research Synthesis and Meta-Analysis-Russell Sage Foundation (2019) (and in any other places - see wikipedia, with all the references there):

Cohen's d is with the unbiased pooled SD (as given by Cohen 1988!)
Hendge's g is Cohen's d with the small sample bias correction (as given in Hedge, 1981!) .

Which is what we do here in effectsize, which I feel comfortable with leaving as is. (I will make this explicit in the docs to reflect this.)

Maybe we can add to cohens_d the option to give the sample estimate (without CIs) instead of the population estimate? Does anyone want that? @IndrajeetPatil you know what the masses want...

The one "problem" I find here (which is unrelated to what @studerus brought here) is that the correction option, which gives McGrath & Meyer, 2006 correction, is their version of Hedge's g's correction. That is, from what I understand, they suggest the ((n - 3) / (n - 2.25)) * sqrt((n - 2) / n) correction instead of the 1 - 3 / (4 * n - 9) correction.
Perhaps @DominiqueMakowski / @IndrajeetPatil can shed some light here?

mattansb · 2020-12-04T20:12:35Z

(If there's one thing you can't say about me, it's that I don't do my homework!)

mattansb · 2020-12-05T12:25:24Z

Okay here's what I've done:

Made it clear in the docs that we are returning the population estimates.
The correction argument only works in hedges_g() - more importantly, only one correction at a time - can choose between Hedges' original, and the "newer" one.

@IndrajeetPatil Please see how this affects any of your code.

IndrajeetPatil · 2020-12-05T15:47:27Z

Thanks for the heads-up!

I think some tests for statsExpressions will fail with these changes and CRAN-revdep will complain. But just mention in cran_comments that the maintainer has already been informed and will be submitting a new version. That should help avoid any hold-up.

As for what the masses want, I am actually not sure. The first time we talked about this, we had compared results from effectsize with JASP and jamovi and they closely matched. So I am guessing there doesn't seem to be consensus on this issue. That said, you are far deeper into the weeds here and I'd trust your informed judgment here over my superficial read of the situation.

mattansb · 2020-12-06T07:11:54Z

The first time we talked about this, we had compared results from effectsize with JASP and jamovi and they closely matched

I forgot about this!

you are far deeper into the weeds here

Feel more like I'm off the deep end 🤪

Alright, closing...

IndrajeetPatil added the Discussion 🦜 Talking about our ~feelings~ stats label Dec 4, 2020

mattansb added a commit that referenced this issue Dec 5, 2020

"fix" d and g (with corrections) #222

c39f329

mattansb added bug 🐜 Something isn't working and removed Discussion 🦜 Talking about our ~feelings~ stats labels Dec 5, 2020

mattansb mentioned this issue Dec 5, 2020

next update round: Christmas special easystats/easystats#83

Closed

9 tasks

mattansb closed this as completed Dec 6, 2020

mattansb mentioned this issue Dec 6, 2020

Perp for CRAN 0.4.1 #223

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cohens_d gives different results from cohen.d function from package psych. #222

cohens_d gives different results from cohen.d function from package psych. #222

studerus commented Dec 4, 2020

mattansb commented Dec 4, 2020

studerus commented Dec 4, 2020

DominiqueMakowski commented Dec 4, 2020

mattansb commented Dec 4, 2020

studerus commented Dec 4, 2020

studerus commented Dec 4, 2020

strengejacke commented Dec 4, 2020

studerus commented Dec 4, 2020

mattansb commented Dec 4, 2020

mattansb commented Dec 4, 2020

mattansb commented Dec 5, 2020

IndrajeetPatil commented Dec 5, 2020

mattansb commented Dec 6, 2020

cohens_d gives different results from cohen.d function from package psych. #222

cohens_d gives different results from cohen.d function from package psych. #222

Comments

studerus commented Dec 4, 2020

mattansb commented Dec 4, 2020

studerus commented Dec 4, 2020

DominiqueMakowski commented Dec 4, 2020

mattansb commented Dec 4, 2020

studerus commented Dec 4, 2020

studerus commented Dec 4, 2020

strengejacke commented Dec 4, 2020

studerus commented Dec 4, 2020

mattansb commented Dec 4, 2020

mattansb commented Dec 4, 2020

mattansb commented Dec 5, 2020

IndrajeetPatil commented Dec 5, 2020

mattansb commented Dec 6, 2020