Effect sizes in Bayesian regressions #8

Closed · DominiqueMakowski opened this issue Oct 13, 2019 · 28 comments
Labels: enhancement 🔥 New feature or request
@DominiqueMakowski (Member)

Opening a new thread as we strayed from the topic of #7

this correspondence between test statistics and indices of effect size is quite convenient in some cases... What would be the alternative for Bayesian models??

Unfortunately, there isn't any... It seems like any effect size in a Bayesian framework actually needs to be computed from the posteriors (even when it's simple, it's hard... good luck...).
Generally, there seems to be very little work on standardized effect sizes in Bayesian modeling. This is probably because, until recently, most users of Bayes were high-level stats / modeling users who felt they were above the humble needs of the social sciences and were not interested in standardized anything.

But I read in some tweet (or maybe on the JASP forum?) that the JASP team is working on some partial eta squared / beta stuff...

EDIT: found it: https://onlinelibrary.wiley.com/doi/full/10.1111/stan.12173

Originally posted by @mattansb in #7 (comment)

DominiqueMakowski changed the title from "Effect size in Bayesian regressions" to "Effect sizes in Bayesian regressions" on Oct 13, 2019
@DominiqueMakowski (Member Author)

Marsman's paper seems very interesting (unfortunately I didn't manage to get through the wall of equations 😞)

If I understand correctly, the only existing alternative for Bayesian regressions (currently in the works in this package) is a smart standardization of the posterior that would approximate (partial) correlations or some standardized differences?

@DominiqueMakowski (Member Author) commented Oct 13, 2019

@mattansb I've implemented an approximation to the test statistic for Bayesian models. Not sure if it makes a lot of sense from a theoretical perspective, but it seems to work from an applied one 😅

https://easystats.github.io/effectsize/articles/bayesian_models.html

The idea is to get t from the coefficient and the posterior SD, find the frequentist DoF (😱) and get the r posterior... Seems almost too easy

@mattansb (Member)

I think it's time for a dev branch so we can discuss commits...

@DominiqueMakowski (Member Author)

👍

@DominiqueMakowski (Member Author)

dev is up

@mattansb (Member)

Okay, so I'll split my comment into de facto and de jure:

de facto

You are doing:

se <- sd(posterior)
t <- posterior / se
df <- from_freq_magic(...)

convert_t_to_r(t, df)

This will probably almost always give results very close to the true values....
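
To make that concrete, here is a minimal runnable sketch of the recipe above; the draws and the df are fabricated for illustration, and the last step uses the textbook r = t / sqrt(t^2 + df) conversion rather than convert_t_to_r itself:

set.seed(1)
posterior <- rnorm(4000, mean = 0.5, sd = 0.2)  # fake draws of one coefficient
df_error <- 28                                  # hypothetical df from an equivalent frequentist fit

t_draws <- posterior / sd(posterior)             # a "t" per draw, as above
r_draws <- t_draws / sqrt(t_draws^2 + df_error)  # textbook t-to-r conversion
quantile(r_draws, c(.025, .5, .975))             # the resulting r "posterior"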

Except...

How does this function deal with multiple dfs (I can't seem to re-build the package to check)? Like in mixed models? If it cannot return the approximately-correct df, the calculation will not be a good approximation...

de jure

The idea of degrees of freedom in a Bayesian framework is almost heresy! That is, df represent the ~"number of bits of data free to change that can still result in this estimate". But in Bayes the estimates aren't based only on the data, but also on the prior (and thus also on previous data, and so forth). (Also, there isn't one estimate, but a whole distribution.)

So in the most technical sense, this is wrong 😅

But see de facto above 😉

@DominiqueMakowski (Member Author) commented Oct 13, 2019

How does this function deal with multiple dfs

Here lies the problem. Currently, I added this function in parameters, and for mixed models the most straightforward and quick method is to 1) refit the model using lme4 (see here) and 2) use the Kenward-Roger approximation (same as for p values for lme4).
It's clearly not the best way to tackle this.
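
For reference, a sketch of what that refit route can look like; the model and contrast below are purely illustrative, with pbkrtest doing the actual Kenward-Roger computation:

# Illustrative only: a frequentist analogue fitted with lme4,
# then Kenward-Roger df for one coefficient via pbkrtest.
library(lme4)
library(pbkrtest)

freq_model <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

L <- c(0, 1)  # contrast picking out the `Days` fixed effect
pbkrtest::get_Lb_ddf(freq_model, L)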

The idea of degrees of freedom in a Bayesian framework is almost heresy!

Haha, I'd bet this hybrid approach would trigger all the Bayestremists 😅 As a pragmatist, I'd tend to say "as long as it serves better science", but it's true that it would be best to also be theoretically correct... But as we said, are there currently any better alternatives out there?

Another thing I am wondering about is non-linear models, e.g., logistic regression. The freq models usually report a z value. But as we only have access to the sample SD of the posteriors (same as for linear models), does it make sense to still compute t and then convert t to r, or should we use z values?

@mattansb (Member)

Another thing I am wondering is for non-linear models

For these models there's no need for all this "black magic", because they have the OR / log odds / rate etc. as the model parameters!
Actually, this problem of effect sizes is only a problem in linear models (because you need to estimate the variance explained and the error variance).

is there currently any better alternatives out there?

posterior_predict used like this can give some form of variance explained, but it's not really partialized (so its interpretation is not straightforward...)
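
As a sketch of that idea (this mirrors the logic behind rstanarm's bayes_R2(); posterior_epred() gives the expected values per draw, and none of this is necessarily what the package will end up doing):

# A PPD-flavoured R2, computed per posterior draw; illustrative only.
library(rstanarm)

fit <- stan_glm(mpg ~ wt + qsec, data = mtcars, refresh = 0)
ypred <- posterior_epred(fit)             # draws x observations
res <- sweep(-ypred, 2, mtcars$mpg, "+")  # residuals y - ypred, per draw
r2 <- apply(ypred, 1, var) / (apply(ypred, 1, var) + apply(res, 1, var))
quantile(r2, c(.05, .5, .95))             # a posterior of (non-partial) R2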

@DominiqueMakowski (Member Author)

For these models there's not need for all this "black magic" because they have OR / log odds / rate etc. as the model parameters!

Forgot about that haha

posterior_predict used like this can give some form of variance explained

Interesting... but not quite there yet 😞

@DominiqueMakowski (Member Author)

library(parameters)
library(effectsize)
#> 
#> Attaching package: 'effectsize'
#> The following objects are masked from 'package:parameters':
#> 
#>     cohens_f, epsilon_squared, eta_squared, omega_squared

model <- glm(vs ~ cyl + disp + drat, data = mtcars, family = "binomial") 

parameters <- model_parameters(model)
#> Waiting for profiling to be done...
parameters$r <- convert_z_to_r(parameters$z, n = insight::n_obs(model))
parameters$r_from_odds <- convert_odds_to_r(parameters$Coefficient, log = TRUE)
parameters
#> Parameter   | Coefficient |    SE |         95% CI |     z | df |     p |     r | r_from_odds
#> ---------------------------------------------------------------------------------------------
#> (Intercept) |       25.09 | 11.88 | [ 6.43, 56.94] |  2.11 | 28 | < .05 |  0.35 |        0.99
#> cyl         |       -1.99 |  1.05 | [-4.57, -0.13] | -1.89 | 28 | 0.06  | -0.32 |       -0.48
#> disp        |       -0.01 |  0.02 | [-0.06,  0.02] | -0.47 | 28 | > .1  | -0.08 |        0.00
#> drat        |       -3.18 |  2.16 | [-8.70,  0.49] | -1.47 | 28 | > .1  | -0.25 |       -0.66

Created on 2019-10-13 by the reprex package (v0.3.0)

The two "r"s do not match, though... any idea why?

@mattansb (Member) commented Oct 13, 2019

I think it's because in a multiple regression setting, z and log odds don't directly correspond.

In any case, r doesn't make any sense in a non-Gaussian model. All of these "just because you can doesn't mean it makes sense" uses should be documented somewhere... Maybe drop convert_odds_to_r altogether? What users would probably want here is to standardize the coefficients...
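
(For instance, something like this; standardize_parameters() is effectsize's current API for that, shown only as a sketch:)

library(effectsize)

m <- glm(vs ~ cyl + disp + drat, data = mtcars, family = "binomial")
standardize_parameters(m)  # change in log-odds per SD of each predictor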

Also, convert_F_to_r usually won't make sense: if df > 1 it would be a multiple R, usually reported as R2, and then it's just partial eta squared.

Also also... the file names and contents for all these functions need to be made more concise; it took me way too long to find convert_z_to_r and convert_odds_to_r!

Also also also, @DominiqueMakowski this thread is about Bayes!

@DominiqueMakowski (Member Author)

z and log odds don't directly correspond

That's strange... Which one gives the most "accurate" r/d? For instance, in a meta-analysis, if you want to convert everything to d, it's kind of problematic if two approaches give quite different values...

In any case r doesn't make any sense in a non-Gaussian model.

If you want to interpret it as variance explained, sure, but it makes sense if you just want to have all your effects on the same scale... and possibility + good documentation > impossibility (as we discussed several times ^^)

it would be a multiple R, usually reported as R2, and then it's just partial eta squared.

Since eta squared is equivalent to R2, which is a more generic name ("Eta Squared, however, is used specifically in ANOVA models.", here), wouldn't it be clearer to rename all of those as t_to_r2 etc.? (It looks better too.)
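
(A quick numeric check of that equivalence, in a one-factor model:)

# In a model with a single term, eta squared and R2 are the same number:
m <- lm(mpg ~ factor(cyl), data = mtcars)
summary(m)$r.squared                             # R2

a <- anova(m)
a["factor(cyl)", "Sum Sq"] / sum(a[["Sum Sq"]])  # eta2 = SS_effect / SS_total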

The file names and contents for all these functions need to be made more concise

Agreed

this thread is about Bayes!

U started! 😁

@mattansb (Member)

Which one gives the most "accurate" r/d?

For d, the odds-to-d conversion looks like the correct one.

but it makes sense if you just want to have all your effects on the same scales...

It "makes sense" to want that, but it doesn't make sense to get that in this method - you should instead use parameters_standardize.

wouldn't it be clearer to rename all those as t_to_r2 etc

I think it's a great idea to have!

F_to_partial_R2 <- F_to_partial_eta_squared
t_to_partial_R2 <- t_to_partial_eta_squared 

Then convert_t_to_r could be:

convert_t_to_partial_r <- function(t, df_error) sign(t) * sqrt(t_to_partial_R2(t, df_error))

(I think it is important to have "partial" in the names, or at the very least in the doc title; users shouldn't think they're getting the simple correlation.)
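
A self-contained version of the proposal, using only the formula partial R2 = t^2 / (t^2 + df_error) (not the final package API):

t_to_partial_R2 <- function(t, df_error) t^2 / (t^2 + df_error)

convert_t_to_partial_r <- function(t, df_error) {
  sign(t) * sqrt(t_to_partial_R2(t, df_error))
}

convert_t_to_partial_r(t = 2.5, df_error = 28)  # ~0.43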


Another thing I am wondering is for non-linear models, e.g., logistic regression.

You started!

@mattansb (Member)

Going back to:

I've implemented an approximation to the test statistic for Bayesian models. Not sure if it makes a lot of sense from a theoretical perspective, but it seems to work from an applied one 😅

I said that de facto...

This will probably almost always give results very close to the true values....

I'd like to amend that: it will work in cases where the prior is relatively lax (i.e., not strong); in these cases SD_posterior ≈ SE_frequentist (because the posterior ≈ the likelihood function). But this equality breaks with strong priors, where SD_posterior < SE_frequentist, which will lead to mis-estimating the effect size.
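
A quick way to see this caveat in action (the tight prior below is deliberately extreme, and everything here is illustrative):

# Posterior SD under a lax vs. a strong prior, against the frequentist SE.
library(rstanarm)

lax   <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
tight <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0,
                  prior = normal(0, 0.5))

c(sd_lax   = sd(as.data.frame(lax)$wt),    # ~ frequentist SE
  sd_tight = sd(as.data.frame(tight)$wt),  # shrunk by the prior
  se_freq  = summary(lm(mpg ~ wt, mtcars))$coefficients["wt", "Std. Error"])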

@strengejacke (Member) commented Oct 13, 2019

posterior_predict used like this can give some form of variance explained

Which is implemented in performance::icc() :-)

@mattansb (Member)

@strengejacke Interesting... Care to take a stab at that here for Gaussian models? (This will give a non-partial R2, if I understand correctly?)

@mattansb (Member)

@DominiqueMakowski I suggest moving the related functions and vignette to a separate branch, as they don't fully work yet / there's an ongoing debate here about how to do this, and it is already possible to get regular betas (standardized coefficients) for Bayesian models.

@DominiqueMakowski (Member Author)

Yes, I agree, or rename it as something WIP

mattansb added a commit that referenced this issue Mar 21, 2020
mattansb mentioned this issue Mar 25, 2020
@mattansb (Member)

@DominiqueMakowski What is left to do here?

Standardized regression coeffs are available... Are we talking about an Eta2-like effect size?

@DominiqueMakowski (Member Author)

I am talking about a consistent and robust method to get a unified ("similar") index for different kinds of models (regressions with factors, interactions, and logistic models) 😬

mattansb added a commit that referenced this issue May 25, 2020
@mattansb (Member)

Okay, I've come up with an idea:

  1. Sample from the PPD (posterior predictive distribution).
  2. For each sample:
    2.1. Fit a linear model.
    2.2. Estimate the effect size.
  3. Use this posterior distribution of effect sizes to get the HDI etc.

library(rstanarm)
library(effectsize)

eta_squared_posterior <- function(model, partial = FALSE, nsamps = 1000, verbose = TRUE, ...) {
  if (!insight::model_info(model)$is_linear) {
    stop("Only applicable to linear models")
  }

  if (partial) {
    warning("Only support non partial PVE.")
  }

  # get ppd
  ppd <- rstantools::posterior_predict(model)
  nsamps <- min(nsamps, nrow(ppd))
  i <- sample(nrow(ppd), size = nsamps)
  ppd <- ppd[i,]

  # get model data
  f <- insight::find_formula(model)$conditional
  X <- insight::get_predictors(model)
  resp_name <- insight::find_response(model)


  if (verbose) {
    message("Sampleing effect size... This can take a while...")
  }
  res <- apply(ppd, 1, function(r) {
    # sampled outcome + predictors
    temp_dat <- X
    temp_dat[[resp_name]] <- r

    # fit a simple linear model
    temp_fit <- lm(f, temp_dat)

    # compute effect size
    ANOVA <- car::Anova(temp_fit, type = 3)
    es <- eta_squared(ANOVA, ci = NA, partial = FALSE)

    data.frame(t(setNames(es$Eta_Sq, es$Parameter)), check.names = FALSE)
  })

  res <- do.call("rbind", res)
  return(res)
}


model <- stan_lmer(mpg ~ wt + qsec * factor(am) + (1|cyl),
                   data = mtcars, refresh = 0)
pp_eta2 <- eta_squared_posterior(model)
#> Sampling effect size... This can take a while...

bayestestR::describe_posterior(pp_eta2)
#> # Description of Posterior Distributions
#> 
#> Parameter       | Median |         89% CI | pd |        89% ROPE | % in ROPE
#> ----------------------------------------------------------------------------
#> wt              |  0.408 | [0.178, 0.642] |  1 | [-0.100, 0.100] |     0.000
#> qsec            |  0.105 | [0.000, 0.276] |  1 | [-0.100, 0.100] |    53.984
#> factor(am)      |  0.024 | [0.000, 0.100] |  1 | [-0.100, 0.100] |   100.000
#> qsec:factor(am) |  0.036 | [0.000, 0.133] |  1 | [-0.100, 0.100] |    92.368

## Compare to:

library(magrittr)
lm(mpg ~ wt + qsec * factor(am), data = mtcars) %>%
  car::Anova(type = 3) %>%
  eta_squared(partial = FALSE)
#> Parameter       | Eta2 |       90% CI
#> -------------------------------------
#> wt              | 0.44 | [0.21, 0.60]
#> qsec            | 0.08 | [0.00, 0.27]
#> factor(am)      | 0.05 | [0.00, 0.21]
#> qsec:factor(am) | 0.07 | [0.00, 0.24]

Created on 2020-05-25 by the reprex package (v0.3.0)

For now, this only works with non-partial eta squared, so it gives the total (unique) variance explained. To speed things up, I only use a sub-sample of the PPD.

you can find it here:
https://github.com/easystats/effectsize/blob/dev/WIP/eta_squared_posterior.R

@DominiqueMakowski (Member Author)

Finally, some good news on this front; that looks super promising!

So let me rephrase to see if I understood correctly:

Let's say you have a model y ~ x with 10 obs.

  1. For each of the 10 obs of y, you get the posterior distribution of the prediction (of, for instance, 200 samples).
  2. For each of these 200 samples, you fit a linear model predicting the predicted values from the original data (?)
  3. For each parameter, you get 200 samples of (for instance) eta2, which you consider as your posterior distribution of eta2.

mattansb added this to the CRAN 0.4.0 milestone May 25, 2020
mattansb added the enhancement 🔥 New feature or request label May 25, 2020
@mattansb (Member)

Exactly!

I will have to think about whether there is any way to make it faster, as it is currently pretty slow (working across each sample...).

mattansb self-assigned this May 26, 2020
@mattansb (Member)

A collection of Stan threads that seem to support the logic of my method:

Using the PPD

(I also asked here if my method is applicable. Waiting for a response...)

Using the posterior parameters


@mattansb (Member)

#107
