-
-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lmer and arm::sim instead of stan_glmer on large data? #219
Comments
This seems reasonable, as this exact same procedure is used by the |
Also, you might find |
Looking at Gelman & Hill (2007). pages 140 following, making inference based on simulations can be called an "informal Bayesian approach". Technically, you are summarizing a vector. This is what is also done for "simple" bootstrapping etc., and then you take the mean/median as "point estimate", and some quantiles for the confidence intervals. So, yes, you can simply compute the hdi or eti from your simulated draws. I haven't checked for other models than simple linear models, but at least I'm implementing a metod for the |
And you can also use ROPE - it's similar to saying, for a "simple" frequentist model w/o simulations, "we have a significant effect, but it is such small that it is practically not relevant". You do something similar with ROPE. |
Here are examples from the current implementation. Objects from library(arm)
library(lme4)
library(bayestestR)
m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
sim_fit <- arm::sim(m, 1000)
hdi(sim_fit)
#> Parameter CI CI_low CI_high
#> 1 (Intercept) 89 240.332765 261.07408
#> 2 Days 89 8.307967 13.15562
hdi(sim_fit, effects = "all")
#> Parameter CI CI_low CI_high Group
#> 1 (Intercept) 89 240.3327645 261.0740824 fixed
#> 2 Days 89 8.3079667 13.1556201 fixed
#> 3 (Intercept).308 89 -16.0073828 24.8423555 random
#> 4 Days.308 89 4.3755654 13.0849383 random
#> 5 (Intercept).309 89 -61.8609206 -20.8313242 random
#> 6 Days.309 89 -13.1609398 -4.6711144 random
#> 7 (Intercept).310 89 -56.8298998 -17.5504370 random
#> 8 Days.310 89 -9.7004658 -0.9473180 random
#> 9 (Intercept).330 89 4.6399315 45.1686082 random
#> 10 Days.330 89 -9.6072097 -0.9570290 random
#> 11 (Intercept).331 89 4.4185954 42.5649342 random
#> 12 Days.331 89 -7.1452027 1.3021877 random
#> 13 (Intercept).332 89 -8.0982316 31.2101980 random
#> 14 Days.332 89 -4.0594400 4.4935157 random
#> 15 (Intercept).333 89 -5.0627446 35.5392311 random
#> 16 Days.333 89 -4.8338501 3.7091377 random
#> 17 (Intercept).334 89 -25.9108186 13.8798611 random
#> 18 Days.334 89 -3.3262606 5.0170427 random
#> 19 (Intercept).335 89 -22.5379495 19.2082650 random
#> 20 Days.335 89 -15.3973349 -6.4065836 random
#> 21 (Intercept).337 89 12.6562494 55.1728956 random
#> 22 Days.337 89 4.5316538 13.6732566 random
#> 23 (Intercept).349 89 -44.7094372 -3.1723557 random
#> 24 Days.349 89 -2.7934368 6.3273793 random
#> 25 (Intercept).350 89 -31.8035689 10.0281715 random
#> 26 Days.350 89 2.2081889 10.7003413 random
#> 27 (Intercept).351 89 -16.3572131 25.8744674 random
#> 28 Days.351 89 -7.7999446 0.9852168 random
#> 29 (Intercept).352 89 0.3227598 41.7302096 random
#> 30 Days.352 89 -0.9254458 8.4896326 random
#> 31 (Intercept).369 89 -15.4252308 25.1249666 random
#> 32 Days.369 89 -3.8442958 4.7227639 random
#> 33 (Intercept).370 89 -47.7736334 -5.0288098 random
#> 34 Days.370 89 0.7492633 9.4866305 random
#> 35 (Intercept).371 89 -19.1119936 22.8159279 random
#> 36 Days.371 89 -5.5520323 3.1366488 random
#> 37 (Intercept).372 89 -8.0621136 33.6017260 random
#> 38 Days.372 89 -3.4948385 5.2878925 random
eti(sim_fit)
#> Parameter CI CI_low CI_high
#> 1 (Intercept) 89 241.151504 262.0106
#> 2 Days 89 7.962468 12.9025
ci(sim_fit, effects = "random", parameters = "^Days")
#> Parameter CI CI_low CI_high
#> 2 Days.308 89 4.8214107 13.7000214
#> 4 Days.309 89 -13.0035203 -4.5120176
#> 6 Days.310 89 -9.9596965 -1.0915085
#> 8 Days.330 89 -9.2507534 -0.3174804
#> 10 Days.331 89 -7.3513576 1.1702467
#> 12 Days.332 89 -4.4429156 4.2187887
#> 14 Days.333 89 -4.6865175 3.8929752
#> 16 Days.334 89 -3.4735085 4.9383328
#> 18 Days.335 89 -15.3919325 -6.3964247
#> 20 Days.337 89 3.5976629 12.9306430
#> 22 Days.349 89 -3.1541218 5.9806673
#> 24 Days.350 89 2.2605994 10.7451412
#> 26 Days.351 89 -7.5749799 1.3499830
#> 28 Days.352 89 -1.2548782 8.3217727
#> 30 Days.369 89 -3.7034767 5.0224508
#> 32 Days.370 89 0.1093224 9.0175893
#> 34 Days.371 89 -5.4278820 3.3033422
#> 36 Days.372 89 -3.2618426 5.6145144 Created on 2019-08-27 by the reprex package (v0.3.0) |
We can also extend the functions to simulatons from "simple" models, where And we could also extend the support to functions like
|
Ok, once you update both bayestestR and see from GitHub, you can also plot: library(arm)
library(lme4)
library(bayestestR)
m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
sim_fit <- arm::sim(m, 1000)
x <- ci(sim_fit, ci = c(.5, .8, .95), effects = "random", parameters = "^Days")
plot(x) Created on 2019-08-27 by the reprex package (v0.3.0) |
Ok, I have implemented a So, now you need to update insight, bayestestR and see from GitHub, and you can use following functions with objects from
The plot-method for Actually, we could make sim/sim.merMod objects also work for pd, p_map, bayesfactor etc. I'l look at it. |
bayesfactors do work now as well, however, I'm not sure about the prior-specification. By default, normal priors N(0,2) are used: library(arm)
library(lme4)
library(bayestestR)
m <- lmer(mpg ~ wt + gear + (1 | cyl), mtcars)
sim_fit <- arm::sim(m, 1000)
plot(bayesfactor(sim_fit))
#> Warning in bayesfactor_parameters.sim.merMod(..., prior = prior, direction
#> = direction, : Prior not specified! Please specify priors (with column
#> order matching 'posterior') to get meaningful results. Using normally
#> distributed priors with location=0 and scale=2 as default. Created on 2019-08-27 by the reprex package (v0.3.0) @mattansb any idea for sensible defaults? |
Ok, I think the most important functions are covered for arm:sim-objects. That means you can now also use
Here an example of the current implementation: library(arm)
library(lme4)
library(bayestestR)
data(mtcars)
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
x <- arm::sim(m, 1000)
describe_posterior(x, test = "all")
#> Warning in bayesfactor_parameters.sim(x, prior = bf_prior, ...): Prior not
#> specified! Please specify priors (with column order matching 'posterior')
#> to get meaningful results. Using normally distributed priors with
#> location=0 and scale=2 as default.
#> Parameter Median CI CI_low CI_high p_MAP pd
#> 1 (Intercept) 43.603826962 89 35.32796965 50.48548749 0.00000000 1.000
#> 2 wt -3.818821010 89 -5.44566695 -2.07529838 0.00000000 1.000
#> 3 cyl -1.744702403 89 -2.72878740 -0.67135432 0.03530381 0.996
#> 4 gear -0.490838016 89 -1.91306977 0.60695506 0.77366017 0.734
#> 5 disp 0.006572141 89 -0.01075689 0.02754414 0.96379712 0.700
#> ROPE_CI ROPE_low ROPE_high ROPE_Percentage ROPE_Equivalence BF
#> 1 89 -0.1 0.1 0.00000000 Rejected 9.747696e+07
#> 2 89 -0.1 0.1 0.00000000 Rejected 1.131505e+02
#> 3 89 -0.1 0.1 0.00000000 Rejected 8.860870e+00
#> 4 89 -0.1 0.1 0.08417508 Undecided 5.035772e-01
#> 5 89 -0.1 0.1 1.00000000 Accepted 6.321714e-03
m <- lmer(mpg ~ wt + gear + (1 | cyl), mtcars)
sim_fit <- arm::sim(m, 1000)
describe_posterior(sim_fit, test = "all")
#> Warning in bayesfactor_parameters.sim.merMod(x, prior = bf_prior, ...):
#> Prior not specified! Please specify priors (with column order matching
#> 'posterior') to get meaningful results. Using normally distributed priors
#> with location=0 and scale=2 as default.
#> Parameter Median CI CI_low CI_high p_MAP pd ROPE_CI
#> 1 (Intercept) 33.8506438 89 24.912640 40.7477347 0.0000000 1.000 89
#> 2 wt -3.7075638 89 -4.867696 -2.2566382 0.0000000 1.000 89
#> 3 gear -0.4131907 89 -1.600556 0.8184972 0.8257983 0.733 89
#> ROPE_low ROPE_high ROPE_Percentage ROPE_Equivalence BF
#> 1 -0.1 0.1 0.00000000 Rejected 2.125676e+05
#> 2 -0.1 0.1 0.00000000 Rejected 1.275944e+03
#> 3 -0.1 0.1 0.09652076 Undecided 4.160897e-01 Created on 2019-08-27 by the reprex package (v0.3.0) |
@strengejacke it seems the flat priors are used, so by definition BFs will always be |
ok, but this won't work with |
No... But the original model can be used. rope.merMod <- function(x, ..., n.sims = NULL){
if (!is.null(n.sims)) {
x <- arm::sim(x, n.sims = n.sims)
rope.sim(x, ...)
}
...
} (or something along those lines) And then in all cases the model can be used as is.
Nope 😊 If the priors are flat, then the posteriors are based on those flat priors... So any different priors would need different posteriors... That's just the price one pays for using flat priors! |
In other words - if we set some / any default "priors", we will be providing misleading erroneous results... |
Ok, so I will remove BF-support for sim/sim.merMod objects... I was just curious because we had data frame methods as well... |
I made those for use with Bayesian models non yet supported by us (🤫), but they too do not have default priors - they just always return a 1 - the most uninformative BF possible haha library(bayestestR)
x <- data.frame(A = rnorm(1000))
bayesfactor_parameters(x)
#> Warning in bayesfactor_parameters.data.frame(x): Prior not specified!
#> Please specify priors (with column order matching 'posterior') to get
#> meaningful results.
#> Loading required namespace: logspline
#> # Bayes Factor (Savage-Dickey density ratio)
#>
#> Parameter Bayes Factor
#> A 1
#>
#> * Evidence Against The Null: [0] Created on 2019-08-27 by the reprex package (v0.3.0) |
Ok, my argument would be that rstanarm, if no priors provided, "creates" weakly informative priors based on the data. Thus, I would say we could do the same for arm::sim objects, since we have the data, and the "posterior" from arm::sim is some kind of posterior including priors, because they used a multivariate normal distribution to get the simulated posterior... ;-) |
Okay, just saw that too in Gelamn & Hall's book (pp 142-143). In that case, the priors aren't even flat, as there are no priors, because they aren't really using posteriors! They use more of a sampling-distribution-type-procedure (similar to bootstrapping, which though similar in practice, is not the same). |
Awesome work and very interesting discussion 👏🙌 Based on what you said, I'd tend to think that it still makes sense to add the BF method, even tho for extremists Bayesians it doesn't really make sense (but what does for them 😁). I have a prior that it could still be practically useful for people ;) As often, I support returning a thoroughly documented result. In fact, it would be interesting to write a blogpost discussing a "Bayesian approach to frequentist algorithms", such as bootstrapping/simulation. |
I don't think I'm being an extremists Bayesian by saying that a Bayes factor against the null with no prior on the alternative will be definition always support the null... This is exactly one of the (so called) limitations of the Bayes factor - one needs to specify a prior, otherwise - what is being tested? |
What about a result + a warning? |
I think the method is already there, I moved it to the WIP folder: https://github.com/easystats/bayestestR/blob/master/WIP/bayesfactor_sim.R |
Result and warning is good (:
Will try and get to this tomorrow
Welcome back!
…--
Mattan S. Ben-Shachar, PhD student
Department of Psychology & Zlotowski Center for Neuroscience
Ben-Gurion University of the Negev
The Developmental ERP Lab
On Sat, Sep 7, 2019, 18:40 Daniel ***@***.***> wrote:
But if you guys really want a Bf method that will reflect this 👆, I will
add it.
I think the method is already there, I moved it to the WIP folder:
https://github.com/easystats/bayestestR/blob/master/WIP/bayesfactor_sim.R
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#219?email_source=notifications&email_token=AINRP6BTPCWXT5KDEUSTDLLQIPDOVA5CNFSM4IPUDZ22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6E3MLY#issuecomment-529118767>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AINRP6CIOEISIIEJJ6EEGHDQIPDOVANCNFSM4IPUDZ2Q>
.
|
library(arm)
library(lme4)
library(bayestestR)
m <- lmer(mpg ~ wt + gear + (1 | cyl), mtcars)
sim_fit <- arm::sim(m, 1000)
Bfs <- bayesfactor_parameters(sim_fit)
#> Warning: Simulated posteriors cannot be compared to non-existant priors.
#> Bayes factors are by definition equal to zero.
#> Loading required namespace: logspline
Bfs
#> # Bayes Factor (Savage-Dickey density ratio)
#>
#> Parameter Bayes Factor
#> (Intercept) 0.00e+00
#> wt 0.00e+00
#> gear 0.00e+00
#>
#> * Evidence Against The Null: [0]
plot(Bfs) Created on 2019-09-08 by the reprex package (v0.3.0) |
@mattansb I'm not sure if this makes sense, since it's not very instructive. What about my suggestion using weakly informative priors? Else, I would indeed say to not add a BF method, we can't pull any information out of it. |
I completely agree that, as is, this makes no sense and no Bayesian information can be gained from such a statistic... As for any weakly informed priors - though technically possible, would be wrong (an in, they would produce a statistic, but it would be wrong to call it a Bayes factor), and so I feel it would be misleading to even allow... I'm okay with leaving it like this, or dropping this method. Your call 😎 |
Yes, I understand your point. I would suggest directly throwing a warning, w/o printing any "result", or what would you both say? Although the simulation of the "posterior" is based on the coefficients (location) and variance-covariance matrix (scale), so assuming weakly informative priors with location/scale based on the data doesn't seem to be completely nonsense (maybe only a bit nonsense) ;-) |
I'm not saying the samples make no sense, just that the are not real posteriors - do not reflect updated priors, and thus the "updating" cannot be quantified.
So, an error? 😜 |
Still, when you fit a model in rstanarm, the functions automatically calculate "weakly informative priors", which are centered around location 0 with scale based on the predictor's variance (exactly what I did with the original data from model that was processed in the simulations). So we literally started from "no priors" as well, it's just that rstanarm created weakly informative priors. My problem here is, that what I have done here feels at least "technically" similar, because we started from weakly informative priors and got the simulated draws ("posterior"). |
This is not true - the posteriors are unrelated to the priors - you don't start here with the priors and end with the posteriors, but instead you make prior and posterior draws in an unrelated manner. I.e., the posteriors are not the results of the priors, and that makes all the difference! So to reiterate, the problem isn't with the priors you suggest, or with the posteriors made by What
What you're suggesting:
This isn't just some technical note, or me being unnecessarily stubborn, this is exactly the essence of the Bayes factor - this is what it measures. If someone wants to get BFs from freq models for comparing models, we have a different functions for that 😎 |
True, it's just that "weakly informative priors" based on the data frame or simulating from results from freq model are in so far related, as freq models assume "null" hypothesis, i.e. something centered around zero. But I think, and agree with you, if it is statistically not sound, we should not support such things. So ...
|
Then I shall remove the methods 😃 |
The initial issue was to have hdi() etc. for sim-objects, which has been implemented. So yes, we can close this now, I guess. |
Do not remove it entirely but throw informative error explaining why it doesn't make sense "Bayes factors are based on the shift from a prior to a posterior. As simulated draws are unrelated to any priors, computing Bayes factors does not make sense" |
Hello!
At my company I spend a lot of time analyzing experiments and the datasets I'm working with are rather large. I typically build multilevel models with lmer. Building these models with the
brm
function or thestan_glmer
function isn't practical because it would take forever for them to build. As a workaround, I try to get a posterior distribution of coefficient estimates by using thesim
function from the arm package on the model object. I then take the fixed effects data frame from that object and pass it through to thehdi
andrope
functions from thebayestestR
package. What are the major differences between using stan_glmer and the method I've outlined above? Is it OK for me to report on things like HDI and ROPE for coefficient estimates using my method or is that statistically incorrect? Any help would be greatly appreciated! Below is an example of my workflow.The text was updated successfully, but these errors were encountered: