Wrong DFs in `model_parameters.fixest` #892

vincentarelbundock · 2023-08-15T01:19:34Z

parameters::model_parameters appears to use the wrong degrees of freedom to compute p values in this model, which results in an overly conservative test:

library(parameters)
library(fixest)
library(carData)
d <- Greene
d$dv <- ifelse(Greene$decision == 'yes', 1, 0)

mod <- feglm(dv ~ language | judge,
             data = d,
             cluster = c('judge'), family = 'logit')

# many degrees of freedom
insight::get_df(mod)
# [1] 382

# normal and t give pretty similar results
pnorm(-2.48139) * 2
# [1] 0.01308711

pt(-2.48139, df = insight::get_df(mod)) * 2
# [1] 0.0135165

# but parameters::model_parameters uses 9 DF
pt(-2.48139, df = 9) * 2
# [1] 0.03491173

model_parameters(mod)
# # Fixed Effects
# 
# Parameter         | Log-Odds |   SE |         95% CI |     z | df |     p
# -------------------------------------------------------------------------
# language [French] |    -0.68 | 0.28 | [-1.31, -0.06] | -2.48 |  9 | 0.035
#

The text was updated successfully, but these errors were encountered:

bwiernik · 2023-08-15T01:29:17Z

I haven't dug into this, but I would assume the correct df here is 9 given that it's a fixed effects estimate, so the get_dof() is what I would expect to be wrong

vincentarelbundock · 2023-08-15T01:54:41Z

I don’t think so. This feglm command is equivalent to estimating a glm() model with +factor(judge). That introduces 10 additional dummies, so we are estimating at most 11 parameters. Going from 384 observations to 9 degrees of freedom would be a steep penalty…

@lrberge, what is the best way for us to extract the relevant degrees of freedom to compute the p value with pt()?

library(parameters)
library(fixest)
library(carData)

d <- Greene
d$dv <- ifelse(Greene$decision == 'yes', 1, 0)

# These two models are equivalent
coef(feglm(dv ~ language | judge,
             data = d,
             family = 'logit'))["languageFrench"]
# languageFrench 
#     -0.6837183
coef(glm(dv ~ language + factor(judge), d, family = binomial))["languageFrench"]
# languageFrench 
#     -0.6837183

bwiernik · 2023-08-15T01:56:41Z

That's fair--like I said I hadn't dug into the example--what's the p value reported by the package itself?

vincentarelbundock · 2023-08-15T02:01:12Z

what's the p value reported by the package itself?

Not 100% sure, but the package reports a much smaller p, so it's either using pnorm() or a large df.

bwiernik · 2023-08-15T02:07:59Z

I can investigate this weekend

lrberge · 2023-08-15T06:15:27Z

Degrees of freedom are tricky because they depend on the type of standard error computed. In particular, when SEs are clustered, the variables nested in the clusters are discarded from DoF calculation. So that they differ from the regular DoF we have in mind.

I think that's what happens in your example. To get the DoF as used in fixest, there is the function degrees_freedom. To get the one we have in mind (N - all vars), there is degrees_freedom_iid.

I hope it helps!

strengejacke · 2023-08-15T08:24:32Z

There are three types of df in fixest::degrees_freedom(), in parameters we use type "t" for wald, and "resid" for residual" df. This works for ci() and p_value(), but does not seem to be passed by model_parameters(). The default used in fixest seems residual-df, i.e. 382.

library(parameters)
library(fixest)
#> 
#> Attaching package: 'fixest'
#> The following objects are masked from 'package:parameters':
#> 
#>     demean, dof
library(carData)
d <- Greene
d$dv <- ifelse(Greene$decision == 'yes', 1, 0)

mod <- feglm(dv ~ language | judge,
             data = d,
             cluster = c('judge'), family = 'logit')

summary(mod)
#> GLM estimation, family = binomial(link = "logit"), Dep. Var.: dv
#> Observations: 384 
#> Fixed-effects: judge: 10
#> Standard-errors: Clustered (judge) 
#>                 Estimate Std. Error  t value Pr(>|t|)    
#> languageFrench -0.683718   0.275539 -2.48139 0.013087 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -211.1   Adj. Pseudo R2: 0.053406
#>            BIC:  487.6     Squared Cor.: 0.114888

ci(mod, method = "wald")
#>        Parameter   CI   CI_low     CI_high
#> 1 languageFrench 0.95 -1.30703 -0.06040618
ci(mod, method = "residual")
#>        Parameter   CI    CI_low    CI_high
#> 1 languageFrench 0.95 -1.225481 -0.1419557

p_value(mod, method = "wald")
#>        Parameter          p
#> 1 languageFrench 0.03491193
p_value(mod, method = "residual")
#>        Parameter          p
#> 1 languageFrench 0.01351663

model_parameters(mod, ci_type = "wald")
#> # Fixed Effects
#> 
#> Parameter         | Log-Odds |   SE |         95% CI |     z | df |     p
#> -------------------------------------------------------------------------
#> language [French] |    -0.68 | 0.28 | [-1.31, -0.06] | -2.48 |  9 | 0.035
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald z-distribution approximation.
#> 
#> The model has a log- or logit-link. Consider using `exponentiate =
#>   TRUE` to interpret coefficients as ratios.
model_parameters(mod, ci_type = "residual")
#> # Fixed Effects
#> 
#> Parameter         | Log-Odds |   SE |         95% CI |     z | df |     p
#> -------------------------------------------------------------------------
#> language [French] |    -0.68 | 0.28 | [-1.31, -0.06] | -2.48 |  9 | 0.035
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald z-distribution approximation.

^{Created on 2023-08-15 with reprex v2.0.2}

strengejacke · 2023-08-15T10:57:37Z

What would be good to know is a) whether fixest() always return a t-value (and not z) as test statistic, and if not, b), what is the best way to determine, which test statistic was used? insight::find_statistic() currently returns "z-statistic" for the above model, although the output states t value.

I guess, fixing insight::find_statistic() and probably insight::get_dof() first, and then revising the methods in parameters (defaulting to ci_method = "residuals") should be the next steps to resolve this issue.

strengejacke · 2023-08-15T10:59:32Z

(cf.

parameters/R/methods_fixest.R

Line 25 in 81ebf40

degrees_of_freedom.fixest <- function(model, method = "wald", ...) {

)

lrberge · 2023-08-15T20:54:34Z

I'm on holidays w/t computer so can't be very thorough and sorry my first reply was off, but:

the "conservative DF" of the OP is the one fixest uses for the t test. This is governed by how the vcov is computed, see ssc argument t.df. If the user passes ssc(t.df="conv") when computing the vcov, we end up with stg close to the normal with large N and clustered SEs (the one the OP had in mind). The ref on this is a paper by the usual suspects Cameron et al on clustered SEs.
on z vs t: feols is t, for feglm it depends on the model. It actually mimics glm so it should be t for Poisson and logit (methinks)
on the best way to find out... I don't know on the top of my head. It's in the column names of the $coeftable item for sure but there may be better ways

strengejacke · 2023-08-23T10:59:25Z

There's a mismatch between summary() and $coeftable:

library(fixest)
library(carData)

d <- Greene
d$dv <- ifelse(Greene$decision == 'yes', 1, 0)

# These two models are equivalent
m <- feglm(dv ~ language | judge,
             data = d,
             family = 'logit')

# z-value?
m$coeftable
#>                  Estimate Std. Error   z value   Pr(>|z|)
#> languageFrench -0.6837183  0.3241979 -2.108953 0.03494862

# t-value?
summary(m)
#> GLM estimation, family = binomial(link = "logit"), Dep. Var.: dv
#> Observations: 384 
#> Fixed-effects: judge: 10
#> Standard-errors: Clustered (judge) 
#>                 Estimate Std. Error  t value Pr(>|t|)    
#> languageFrench -0.683718   0.275539 -2.48139 0.013087 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -211.1   Adj. Pseudo R2: 0.053406
#>            BIC:  487.6     Squared Cor.: 0.114888

^{Created on 2023-08-23 with reprex v2.0.2}

lrberge · 2023-08-28T14:24:07Z

Indeed, it's normal. Since computing VCOVs is costly, it is something that is delayed in fixest until the user asks to to something with the object (and it does impact coeftable). See here: .

Once you pass summary(), it's the "right" standard-errors.

I advise in the doc ()I think but maybe not! to use user-level methods to access the objects, like the function coeftable, but it's not super clear, and, importantly it is not the standard in R (where we're used to access stuff directly in the object).

Sorry for the implementation that creates confusion.

Fixes #892

strengejacke · 2023-09-09T09:16:19Z

library(parameters)
library(fixest)
#> 
#> Attaching package: 'fixest'
#> The following objects are masked from 'package:parameters':
#> 
#>     demean, dof
library(carData)
d <- Greene
d$dv <- ifelse(Greene$decision == 'yes', 1, 0)

mod <- feglm(dv ~ language | judge,
             data = d,
             cluster = c('judge'), family = 'logit')

summary(mod)
#> GLM estimation, family = binomial(link = "logit"), Dep. Var.: dv
#> Observations: 384 
#> Fixed-effects: judge: 10
#> Standard-errors: Clustered (judge) 
#>                 Estimate Std. Error  t value Pr(>|t|)    
#> languageFrench -0.683718   0.275539 -2.48139 0.013087 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -211.1   Adj. Pseudo R2: 0.053406
#>            BIC:  487.6     Squared Cor.: 0.114888

model_parameters(mod)
#> # Fixed Effects
#> 
#> Parameter         | Log-Odds |   SE |         95% CI | t(382) |     p
#> ---------------------------------------------------------------------
#> language [French] |    -0.68 | 0.28 | [-1.23, -0.14] |  -2.48 | 0.014
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald t-distribution approximation.
#> 
#> The model has a log- or logit-link. Consider using `exponentiate =
#>   TRUE` to interpret coefficients as ratios.

^{Created on 2023-09-09 with reprex v2.0.2}

Do we have any examples for z-statistic models?

strengejacke · 2023-09-09T09:17:56Z

(requires easystats/insight#802 to be installed to work correctly)

strengejacke · 2023-09-09T10:34:33Z

Ok, this is confusing. When do I know that when I have a t-statistic, whether we need residual- or t-df to compute p-values?

library(parameters)
library(fixest)
#> 
#> Attaching package: 'fixest'
#> The following objects are masked from 'package:parameters':
#> 
#>     demean, dof
library(carData)

data("qol_cancer")
d <- Greene
d$dv <- ifelse(Greene$decision == "yes", 1, 0)
qol_cancer <- cbind(
  qol_cancer,
  datawizard::demean(qol_cancer, select = c("phq4", "QoL"), group = "ID")
)

mod1 <- feglm(dv ~ language | judge,
  data = d,
  cluster = c("judge"), family = "logit"
)
mod2 <- fixest::feols(
  QoL ~ time + phq4 | ID,
  data = qol_cancer
)

summary(mod1)
#> GLM estimation, family = binomial(link = "logit"), Dep. Var.: dv
#> Observations: 384 
#> Fixed-effects: judge: 10
#> Standard-errors: Clustered (judge) 
#>                 Estimate Std. Error  t value Pr(>|t|)    
#> languageFrench -0.683718   0.275539 -2.48139 0.013087 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -211.1   Adj. Pseudo R2: 0.053406
#>            BIC:  487.6     Squared Cor.: 0.114888
statistic <- insight::get_statistic(mod1)$Statistic
statistic
#> [1] -2.481386

# no "t" df here, mismatch between p and p from summary
degrees_freedom(mod1, type = "t")
#> [1] 9
dof <- 9
2 * stats::pt(abs(statistic), df = dof, lower.tail = FALSE)
#> [1] 0.03491193

# resid-df are close and *almost* match "p" from summary
degrees_freedom(mod1, type = "resid")
#> [1] 382
dof <- 382
2 * stats::pt(abs(statistic), df = dof, lower.tail = FALSE)
#> [1] 0.01351663

# seems like Inf seems correct here
dof <- Inf
2 * stats::pt(abs(statistic), df = dof, lower.tail = FALSE)
#> [1] 0.01308724

summary(mod2)
#> OLS estimation, Dep. Var.: QoL
#> Observations: 564 
#> Fixed-effects: ID: 188
#> Standard-errors: Clustered (ID) 
#>      Estimate Std. Error  t value   Pr(>|t|)    
#> time  1.08785   0.669197  1.62561 1.0572e-01    
#> phq4 -3.66052   0.671116 -5.45438 1.5425e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 10.1     Adj. R2: 0.613325
#>              Within R2: 0.179856
statistic <- insight::get_statistic(mod2)$Statistic
statistic
#> [1]  1.625609 -5.454377

# now t-df are correct and match "p" from summary
degrees_freedom(mod2, type = "t")
#> [1] 187
dof <- 187
2 * stats::pt(abs(statistic), df = dof, lower.tail = FALSE)
#> [1] 1.057170e-01 1.542519e-07

# resid-df are *not* correct and do not match "p" from summary
degrees_freedom(mod2, type = "resid")
#> [1] 561
dof <- 561
2 * stats::pt(abs(statistic), df = dof, lower.tail = FALSE)
#> [1] 1.045945e-01 7.379301e-08

# Inf not correct
dof <- Inf
2 * stats::pt(abs(statistic), df = dof, lower.tail = FALSE)
#> [1] 1.040329e-01 4.914483e-08

^{Created on 2023-09-09 with reprex v2.0.2}

strengejacke · 2023-09-09T10:40:52Z

To summarize:

feglm, df = Inf, test-statistic = t
feols, df = resid, test-static = t

strengejacke · 2023-09-09T10:41:58Z

So we need to check

> mod1$method
[1] "feglm"
> mod1$method_type
[1] "feglm"
> mod2$method
[1] "feols"
> mod2$method_type
[1] "feols"

?

Fixes #892

* Wrong DFs in `model_parameters.fixest` Fixes #892 * desc, version * Wrong DFs in `model_parameters.fixest` Fixes #892 * add test * fix test * fix

lrberge · 2023-09-09T21:34:05Z

The z value is used for binomial and Poisson models (quasi families use t). This is identical to what stats::glm does.

To be clear:

library(fixest)

# tiny data set
base = setNames(head(iris, 10), c("y", "x1", "x2", "x3", "species"))
set.seed(1)
base$y01 = sample(0:1, nrow(base), TRUE)

gather_pvalues = function(family){
    est_fixest = feglm(y01 ~ x1 + x2, base, family = family, vcov = iid ~ ssc(adj = FALSE))
    est_glm    =   glm(y01 ~ x1 + x2, base, family = family)
    
    tstat = tstat(est_fixest, keep = "x1")
    
    res = c(fixest = pvalue(est_fixest, keep = "x1"),
            glm = pvalue(est_glm, keep = "x1"),
            normal = pnorm(-abs(tstat)) * 2,
            "t" = pt(-abs(tstat), degrees_freedom(est_fixest, "resid")) * 2)
    
    round(res, 4)
}

# N = normal; t = Student
cbind("logit (N)" = gather_pvalues(binomial()),
      "probit (N)" = gather_pvalues(binomial("probit")),
      "poisson (N)" = gather_pvalues(poisson()),
      "quasipoisson (t)" = gather_pvalues(quasipoisson()),
      "gaussian (t)" = gather_pvalues(gaussian()))
#>           logit (N) probit (N) poisson (N) quasipoisson (t) gaussian (t)
#> fixest.x1    0.2139     0.2099      0.3168           0.3275       0.2707
#> glm.x1       0.2139     0.2099      0.3168           0.3275       0.2707
#> normal.x1    0.2139     0.2099      0.3168           0.2925       0.2318
#> t.x1         0.2539     0.2502      0.3501           0.3275       0.2707

BTW there was a century old display bug in fixest output.... (z values displaying as t values.) So using the names of the coeftable certainly didn't work. Thanks for making me fix it!

strengejacke · 2023-09-10T09:50:21Z

thanks, that helped me understand how to fix this in parameters!

vincentarelbundock · 2023-09-10T13:13:52Z

thanks all for your work and insights into this. I really appreciate your time!

vincentarelbundock mentioned this issue Aug 15, 2023

Stars in fixest vincentarelbundock/modelsummary#651

Closed

strengejacke added bug 🐛 Something isn't working 3 investigators ❔❓ Need to look further into this issue labels Sep 1, 2023

strengejacke mentioned this issue Sep 9, 2023

fix issues in find_statistic for fixest easystats/insight#802

Merged

strengejacke self-assigned this Sep 9, 2023

strengejacke added a commit that referenced this issue Sep 9, 2023

Wrong DFs in model_parameters.fixest

fd11fb6

Fixes #892

strengejacke mentioned this issue Sep 9, 2023

Wrong DFs in model_parameters.fixest #899

Merged

strengejacke added a commit that referenced this issue Sep 9, 2023

Wrong DFs in model_parameters.fixest

a131fca

Fixes #892

strengejacke closed this as completed in #899 Sep 9, 2023

strengejacke added a commit that referenced this issue Sep 9, 2023

Wrong DFs in model_parameters.fixest (#899)

d4c4b5a

* Wrong DFs in `model_parameters.fixest` Fixes #892 * desc, version * Wrong DFs in `model_parameters.fixest` Fixes #892 * add test * fix test * fix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong DFs in `model_parameters.fixest` #892

Wrong DFs in `model_parameters.fixest` #892

vincentarelbundock commented Aug 15, 2023

bwiernik commented Aug 15, 2023

vincentarelbundock commented Aug 15, 2023

bwiernik commented Aug 15, 2023

vincentarelbundock commented Aug 15, 2023

bwiernik commented Aug 15, 2023

lrberge commented Aug 15, 2023

strengejacke commented Aug 15, 2023 •

edited

Loading

strengejacke commented Aug 15, 2023

strengejacke commented Aug 15, 2023

lrberge commented Aug 15, 2023

strengejacke commented Aug 23, 2023

lrberge commented Aug 28, 2023

strengejacke commented Sep 9, 2023

strengejacke commented Sep 9, 2023

strengejacke commented Sep 9, 2023 •

edited

Loading

strengejacke commented Sep 9, 2023

strengejacke commented Sep 9, 2023

lrberge commented Sep 9, 2023

strengejacke commented Sep 10, 2023

vincentarelbundock commented Sep 10, 2023

Wrong DFs in model_parameters.fixest #892

Wrong DFs in model_parameters.fixest #892

Comments

vincentarelbundock commented Aug 15, 2023

bwiernik commented Aug 15, 2023

vincentarelbundock commented Aug 15, 2023

bwiernik commented Aug 15, 2023

vincentarelbundock commented Aug 15, 2023

bwiernik commented Aug 15, 2023

lrberge commented Aug 15, 2023

strengejacke commented Aug 15, 2023 • edited Loading

strengejacke commented Aug 15, 2023

strengejacke commented Aug 15, 2023

lrberge commented Aug 15, 2023

strengejacke commented Aug 23, 2023

lrberge commented Aug 28, 2023

strengejacke commented Sep 9, 2023

strengejacke commented Sep 9, 2023

strengejacke commented Sep 9, 2023 • edited Loading

strengejacke commented Sep 9, 2023

strengejacke commented Sep 9, 2023

lrberge commented Sep 9, 2023

strengejacke commented Sep 10, 2023

vincentarelbundock commented Sep 10, 2023

Wrong DFs in `model_parameters.fixest` #892

Wrong DFs in `model_parameters.fixest` #892

strengejacke commented Aug 15, 2023 •

edited

Loading

strengejacke commented Sep 9, 2023 •

edited

Loading