Add option to broom tidiers to standardize parameters #28

Closed
IndrajeetPatil opened this issue Mar 7, 2019 · 7 comments

Comments

@IndrajeetPatil

Standardized regression coefficients are much easier to compare, interpret, and visualize than unstandardized ones, so it would be nice if broom tidiers gained a standardize argument that controls whether the regression coefficients are standardized.

A few other packages have tried to do this.

@wlandau

wlandau commented Mar 7, 2019

Maybe rstanarm too?

@IndrajeetPatil
Author

IndrajeetPatil commented Mar 7, 2019

Apparently, rstanarm doesn't have any function that does that (https://discourse.mc-stan.org/t/obtaining-standardized-coefficients-from-rstanarm-package-in-r/3603).

@wlandau

wlandau commented Mar 7, 2019

Could broom itself do the standardization on the posterior samples? Or would that fall outside scope?

@tjmahr

tjmahr commented Mar 7, 2019

If we are doing stuff on posterior samples, then tidybayes might be better. I think of that as the Bayesian broom. (Sorry Alex 😅.)

What would we need to standardize the coefficients? Is there some magic we can do with a variance-covariance matrix? Or do we have to dig inside the model's data/matrix, take the SD of the numeric variables and do Gelman's thing to the binary variables?

@IndrajeetPatil
Author

Haha. broom actually no longer contains anything Bayesian. All relevant tidiers were moved to broom.mixed. That said, I don't really know Bayesian statistics that well and so I don't have much to add on how to standardize regression estimates from these models.

Yes, the latter sounds about right. Need to think more about the implementation.

@alexpghayes

Re: Bayes stuff: totally agree, tidybayes is the place for this.

I haven't really invested any effort into this in broom so far mostly because the effort-reward ratio feels low, so I've watched the dotwhisker approach from a distance. There are basically two options: standardize the input data, or standardize the final design matrix. See the dotwhisker discussion for commentary -- roughly, it seems like either would be fine.

If you want to standardize the input data, this is pre fit() time, and the appropriate tool is recipes in my opinion. If you want to standardize the predictors, I still think that the appropriate tool is recipes, just with some additional steps.
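To make the two options concrete, here is a minimal base R sketch. It uses scale() as a stand-in for the recipes steps recommended above, and the mtcars data and the formula are assumptions chosen only for illustration. For a main-effects-only model the two options should give the same slopes; they diverge once the formula creates derived columns such as interactions.

```r
# Illustrative data; any numeric data frame would do.
dat <- mtcars[, c("mpg", "wt", "hp")]

# Option 1: standardize the input data, then fit.
# The interaction term is the product of the already-standardized inputs.
dat_std <- as.data.frame(scale(dat))
fit_input <- lm(mpg ~ wt * hp, data = dat_std)

# Option 2: build the design matrix first, then standardize its columns.
# The interaction column wt:hp is itself centered and scaled.
X <- model.matrix(mpg ~ wt * hp, data = dat)[, -1]  # drop the intercept column
X_std <- scale(X)
y_std <- as.numeric(scale(dat$mpg))
fit_design <- lm(y_std ~ X_std)

# Compare the two sets of coefficients side by side.
coef(fit_input)
coef(fit_design)
```

Which of the two is preferable depends on how you want to interpret the interaction term.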

If you want to standardize coefficients after the fact, that means going to find the terms object in a given model and calling model.frame() / model.matrix() and standardizing those. My experience is that dealing with bizarro edge cases based on idiosyncratic and/or partial support for formulas and model.matrix() is one of the more painful parts of the R universe. Especially when packages allow multiple forms of data specification (i.e. formula and x/y interfaces), the resulting model objects never have all the information you need to recreate the original model preprocessing.
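For the simplest case, the after-the-fact approach might look like the sketch below. standardize_coefs() is a hypothetical helper, not a broom function, and it assumes an lm fit where both model.matrix() and model.frame() can recover the design matrix and the response. Each slope is multiplied by the SD of its design-matrix column and divided by the SD of the response.

```r
# Hypothetical helper (not part of broom): post-hoc standardization for lm fits.
standardize_coefs <- function(fit) {
  X <- model.matrix(fit)                  # design matrix as used in the fit
  y <- model.response(model.frame(fit))   # response vector
  keep <- colnames(X) != "(Intercept)"    # the intercept has no SD to scale by
  sd_x <- apply(X[, keep, drop = FALSE], 2, sd)
  coef(fit)[keep] * sd_x / sd(y)
}

# Illustrative model; dummy columns from factor(cyl) are scaled like any other column.
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
standardize_coefs(fit)
```

This corresponds to standardizing the final design matrix rather than the input data. Model classes that lack a working model.matrix() or model.frame() method are exactly where the edge cases described above would appear.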

I see the appeal of a standardize argument and am happy to provide guidance if someone wants to take it on (in particular I can point out several gotchas you might stumble onto), but just wanted to express why I've previously been hesitant about this.

@alexpghayes

Additional thought: you might want to standardize after the fact so you could get both the standardized and original scale coefficients without fitting the model twice. Again, recipes will support this type of thing in the future because it will allow undoing steps, so you could fit on the recipes-standardized data, extract the column scales with tidy(step_scale) or perhaps a more immediate step_undo_scaling() type operation and recover the original scale coefs that way.
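The arithmetic behind recovering original-scale coefficients can be sketched in base R. Here scale() stands in for the recipes steps, and the stored centers and scales play the role of what tidy(step_scale) would report; the data and variable names are assumptions for illustration. Only the predictors are standardized in this sketch, not the response.

```r
x_vars <- c("wt", "hp")

# Standardize the predictors and keep the centers and scales that were used.
X_std <- scale(mtcars[, x_vars])
centers <- attr(X_std, "scaled:center")
scales <- attr(X_std, "scaled:scale")

# Fit once, on the standardized predictors.
dat_std <- data.frame(mpg = mtcars$mpg, X_std)
fit_std <- lm(mpg ~ wt + hp, data = dat_std)
b_std <- coef(fit_std)

# Undo the scaling: each slope is divided by its column scale, and the
# intercept absorbs the shift introduced by centering.
slopes <- b_std[x_vars] / scales[x_vars]
intercept <- b_std["(Intercept)"] - sum(slopes * centers[x_vars])
b_orig <- c(intercept, slopes)
b_orig
```

The result should match a direct fit of lm(mpg ~ wt + hp, data = mtcars), so both sets of coefficients come from a single model fit.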


4 participants