Skip to content

Query regarding the calculation of Nagelkerke R2 #852

@ianchlee

Description

@ianchlee

I am trying to use Nagelkerke's R2 for some general linear models (negbinom) using performance::r2_nagelkerke.
The formula to Cox & Snell's R2 ($R^2_{CS}$) and Nagelkerke's R2 ($R^2_N$) are as follows (ref):

$R^2_{CS} = 1 - \left( \frac {\text{L}(M_{full})} {\text{L}(M_{null})} \right) ^{\frac {2} {n}} = 1 -\text{exp} \left(\frac{2}{n} \times \left(\text{logLik}(M_{full}) - \text{logLik}(M_{null})\right)\right)$

$R^2_N = \frac {1 - \left( \frac {\text{L}(M_{full})} {\text{L}(M_{null})} \right) ^{\frac {2} {n}}} {1 - \text{L}(M_{null}) ^ {\frac {2} {n}} } = \frac {R^2_{CS}} {1 - \text{exp} \left(\frac{2}{n} \times \text{logLik}(M_{null})\right) } $

Based on the source code to r2_nagelkerke.glm, I can benchmark the following calculation:

library(performance)
library(insight)

# create model glm
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm_dt <- data.frame(treatment, outcome, counts) # showing data

glm_model <- glm(data = glm_dt, counts ~ outcome, family = poisson())
# null model, same as null_model(glm_model)
model0 <- glm(data = glm_dt, counts ~ 1, family = poisson())

n <- n_obs(glm_model, disaggregate = TRUE)
model_dev <- glm_model$deviance
null_dev <- glm_model$null.deviance

cs_r2 <- ( 1 - exp( (model_dev - null_dev) / n ) )
r2_coxsnell(glm_model) == cs_r2
# > true

r2_nag_func <- r2_nagelkerke(glm_model)
r2_nag_calc <- cs_r2 / ( 1 - exp(-null_dev / n) )

r2_nag_func == r2_nag_calc
# > true

I understand that the deviance is different from log likelihood and can be calculated from difference of log-likelihood (see link), with
$\text{dev} = 2 \times \left( \text{logLik}(M_{saturated}) - \text{logLik}(M_{proposed}) \right)$

null_dev == -2 * logLik(model0) 
# > false
model_dev == -2 * logLik(glm_model) 
# > false

For Cox & Snell's R2 (CS R2), the result should be the same, as the common term of $\text{logLik}(M_{saturated})$ is cancelled:

D.LL_glm <- as.numeric(-2 * logLik(glm_model))
D.LL_null <- as.numeric(-2 * logLik(model0))

all.equal(cs_r2,  1 - exp( (D.LL_glm - D.LL_null) / n) )
# > true

However, with Nagelkerke's R2, wouldn't dividing the$R^2_{CS}$ with $1 - \text{exp} \left(- \frac{\text{dev} (M_{null}) } {n} \right)$ be different from $1 - \text{exp} \left(\frac{2}{n} \times \text{logLik}(M_{null})\right)$?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions