Query regarding the calculation of Nagelkerke R2

I am trying to use Nagelkerke's R2 for some general linear models (negbinom) using `performance::r2_nagelkerke`.
The formula to Cox & Snell's R2 ($R^2_{CS}$) and Nagelkerke's R2 ($R^2_N$) are as follows ([ref](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/)):

$R^2_{CS} = 1 - \left( \frac {\text{L}(M_{full})} {\text{L}(M_{null})} \right) ^{\frac {2} {n}} = 1 -\text{exp} \left(\frac{2}{n} \times \left(\text{logLik}(M_{full}) - \text{logLik}(M_{null})\right)\right)$

$R^2_N = \frac {1 - \left( \frac {\text{L}(M_{full})} {\text{L}(M_{null})} \right) ^{\frac {2} {n}}} {1 - \text{L}(M_{null}) ^ {\frac {2} {n}} } = \frac {R^2_{CS}} {1 - \text{exp} \left(\frac{2}{n} \times \text{logLik}(M_{null})\right) } $

Based on the source code to `r2_nagelkerke.glm`, I can benchmark the following calculation:

```
library(performance)
library(insight)

# create model glm
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm_dt <- data.frame(treatment, outcome, counts) # showing data

glm_model <- glm(data = glm_dt, counts ~ outcome, family = poisson())
# null model, same as null_model(glm_model)
model0 <- glm(data = glm_dt, counts ~ 1, family = poisson())

n <- n_obs(glm_model, disaggregate = TRUE)
model_dev <- glm_model$deviance
null_dev <- glm_model$null.deviance

cs_r2 <- ( 1 - exp( (model_dev - null_dev) / n ) )
r2_coxsnell(glm_model) == cs_r2
# > true

r2_nag_func <- r2_nagelkerke(glm_model)
r2_nag_calc <- cs_r2 / ( 1 - exp(-null_dev / n) )

r2_nag_func == r2_nag_calc
# > true
```

I understand that the deviance is different from log likelihood and can be calculated from difference of log-likelihood (see [link](https://stats.stackexchange.com/questions/237702/comparing-models-using-the-deviance-and-log-likelihood-ratio-tests)), with 
$\text{dev} = 2 \times \left( \text{logLik}(M_{saturated}) - \text{logLik}(M_{proposed}) \right)$
```
null_dev == -2 * logLik(model0) 
# > false
model_dev == -2 * logLik(glm_model) 
# > false
```

For Cox & Snell's R2 (CS R2), the result should be the same, as the common term of $\text{logLik}(M_{saturated})$ is cancelled:

```
D.LL_glm <- as.numeric(-2 * logLik(glm_model))
D.LL_null <- as.numeric(-2 * logLik(model0))

all.equal(cs_r2,  1 - exp( (D.LL_glm - D.LL_null) / n) )
# > true
```

However, with Nagelkerke's R2, wouldn't dividing the$R^2_{CS}$ with $1 - \text{exp} \left(- \frac{\text{dev} (M_{null}) } {n} \right)$ be different from $1 - \text{exp} \left(\frac{2}{n} \times \text{logLik}(M_{null})\right)$? 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Query regarding the calculation of Nagelkerke R2 #852

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Query regarding the calculation of Nagelkerke R2 #852

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions