-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow sandwich/bootstrap/jackknife standard errors as options #42
Comments
On this issue a I have written a basic package to calculate autocorrelation and heteroskedasticity robust standard error https://github.com/gragusa/VCOV.jl. There are methods that work for I am planing to finish it in the next couple of weeks. However, it would be nice to have some infrastructure setup at the For instance, for models such that the parameters solve \sum_{i=1}^n g(W_i, \hat{\beta}) = 0, we could think of methods (for For showing the output would be only a matter to extend the |
@gragusa I had missed your comment, but it sounds cool. I guess open an issue against StatsBase with suggestions about the API to discuss this. |
I wrote a (registered) package CovarianceMatrices.jl that is already pretty
|
I also think one should be able to specify standard errors in A nice solution, which relies on multiple dispatch, would be to define an AbstractVcov type, and use it in the signature of Errors method could then be specified as an argument to fit(LinearModel, formula, df, VceCluster(:State))
fit(LinearModel, formula, df, VceHAC(:Year, lag = 2, kernell = Bartlett)) I've tried to do such an implementation in my fixed effects package |
Interesting arguments. The interface you suggest sounds good, but what about the implementation? How, when and with what arguments should the |
I think in principle having the fitting method dispatch on That's why in I think he would be enough to have a Giuseppe Ragusa On Thu, Jun 25, 2015 at 2:04 PM, Milan Bouchet-Valat <
|
@gragusa Why do you think there should be code duplication? The The problems with only choosing the standard errors type when calling |
Exactly, computing errors as part of the In fact, every type child of The fit function in just written function fit(formula, df, vcov_method::AbstractVcov)
...
# instead of the df[complete_cases(df[allvars(formula)] step in ModelFrame
df[complete_cases(df[allvars(formula), allvars(vcov_method])
.....
# instead of returning the usual covariance matrix
vcov(vcov_method, df, Xhat, X, residuals)
end This is actually similar to how Stata work: the |
The problem is functional redundancy (for luck of a better word). In stata you have to refit the model to get new standard error. In julia you could
and then have
to get new standard errors. So you have two different interfaces to do the same thing. |
I don't see why that would be an issue. It works the same in R. Also, these are not really the same since |
Thinking more about it, I do not agree to include variance information in the If the variance of the error vector is not a multiple of the identity matrix, then the efficient estimator is different. So I would expect a different fitting method. Conceptually, i would expect
and
What i am trying to say is that a variance in |
@gragusa Sorry, I'm not familiar with the methods you're talking about. But it seems to me that in many cases several variance estimation methods can be used with a given fitting method. The simplest example is OLS regression, which can be combined with asymptotic (the default), heteroskedasticy-robust/sandwich, jackknife, bootstrap, and several kinds of survey design-based methods. Anyway, we're not saying you shouldn't be allowed to pass another method to Finally, I don't think you addressed @matthieugomez's point that for clustered standard errors, one needs to get rid of missing values on the clustering variable before the fit. |
I had a
In addition, it has to provide the following methods,
A variance covariance package should be able to compute any variance covariance estimator for linear models with that information. If packages save in their structs the inverse of the normal matrix or the pseudo-inverse (bread), these could be accessed and avoid computing the inverse twice.
|
@Nosferican The best way to show that this design suits people's needs is to put them in a package and write implementations using it for the various packages that can take advantage of it. (BTW, in Julia we usually omit the |
Here is a prototype VarianceCovarianceEstimators.jl. It provides OLS, and the multi-way clustering generalized estimators for HC1, HC2, and HC3. It relies on the |
Just a quote from a user that supports specifying standard errors in "I don’t know if GLM.jl supports somehow directly providing the adjusted SEs, in which case this request would be obsolete. So far run regressions with GLM.fit() and adjust the SEs afterwards using CovarianceMatrices.jl. This is a bit tiresome, if you have many regressions." |
I think between @matthieugomez, @lbittarello and I, we could probably commit to developing a solution à la
Does that sound like a good plan? @dmbates, would that work for MixedModels? For example, using the new StatsModels.Formula and passing (|) syntax to a dedicated parser. Let me know what y'all think. I would say most of the features are implemented already across the board, but need to be organized. Something I started experimenting with was to borrow the GLM framework for fitting the model and use it as a wrapper for the model matrix, model response, weights, etc. That way, the package would have access to GLM estimators, easier for CovarianceEstimators to be ported, etc. |
Can we keep this discussion focused on the covariance issue? There was a previous discussion about tests at JuliaStats/HypothesisTests.jl#121. @Nosferican Can you describe changes that would be needed in StatsBase/StatsModels to suit your needs? Then others could comment on whether it works for them. Apart from the functions you described above, maybe we'd need |
See Microeconometrics. It requires models to define The alternative I am exploring right now is having the Regression package use Estimators that can be computed as transformations to the linear predictor and the response can still use the framework by first computing the transformations and then passing these to the As for support in other packages, there will be probably a CorrelationStructure package to handle computing distances for special dimensions (temporal: period, In summary, if we with Support for linear models can be achieved already using the current hierarchy inheriting from |
OK. I guess it's up to you, @dmbates, @matthieugomez and @lbittarello whether you want to base your packages on GLM or on something more generic living in StatsBase/StatsModels. |
One advantage is that we can put more effort to make GLM as good as it can be. For example, rather than checking all possible failure cases in for logistic that can either be worked in the utilities package or even in GLM (e.g., handling linearly dependent variables). Either way, the dependency on GLM is soft. A package wouldn't actually need to use GLM (just behave like GLM). Specifying |
That is sort-of what I do in |
Here's the rationale for The asymptotic variance of GMM estimators takes the form E (ψ Ω ψ'), where ψ is the influence function and Ω is the correlation structure. Moreover, ψ = (E J)⁻¹ s, where J is the Jacobian of the score vector (a.k.a. the Hessian matrix) and s is the score vector (i.e., the moment conditions). If we have the score and the Jacobian, we can compute all sorts of variances (White, clustered, etc.) by varying Ω. GMM subsumes almost all parametric estimators in econometrics (as well as many semiparametric ones). (I'm not familiar with biostatistics and other fields.) For example, MLE and GLM are special cases. We can also interpret two-stage estimators (e.g., IPW) as special cases if both stages are parametric. Therefore, basing the variance on |
Is there any update on this? |
Which extension in particular are you interested in? I am in the process of
extending the API and suggestions are very useful. What is your use case?
On Thu, 14 Jun 2018 at 11:59, Ilja Kantorovitch ***@***.***> wrote:
Is there any update on this?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#42 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAfRgoYYSy7rzkXFwIFgQY95rKGzBgbjks5t8jQLgaJpZM4BUdvk>
.
--
Giuseppe Ragusa
|
I don't need anything fancy, but to make my workflow work in Julia I need the following:
Then either 2a. Specify during the regression already what kind of standard errors I need (for example HAC or HC - robust SEs) Most importantly then
For now I do 1 -> 2b -> 3 in R. |
Got vcov, see the new additions to StatsBase ( |
@nalimilan Essentially there needs to be a way to combine the outputs of
I hope I made myself sufficiently clear. For illustration I post the R code that I am using:
I don't think that this is possible at the moment using the above-mentioned packages, but this is a very standard setup for me. |
The Julia way is to avoid type-piracy. StatsBase provides the The issue with the R implementation is that many times the |
@IljaK91 Something you can do is:
which gives
|
did this work under julia 1.0.1 for you? I am getting
|
RegressionTables.jl does not support Julia 0.7/1.0 yet. |
@iwelch Please use Discourse for questions. GitHub isn't the place for that, especially as this one is about a different package. Thanks! |
@gragusa Your solution does not work anymore :( Can it be amended or is there a more standard solution by now?
|
This is more a feature request or policy question than a bug report.
I'm wondering whether you would like to add an argument allowing to easily compute sandwich (heteroskedasticity-robust), bootstrap, jackknife and possibly other types of variance-covariance matrix and standard errors, instead of the asymptotic ones. Or whether you would like to provide a function to compute these after the model has been fitted.
R makes it relatively cumbersome to get these common standard errors (see e.g. [1] or [2]), which either means that researchers with limited programming skills are incited to keep using Stata, or to keep relying only on asymptotic standard errors. I see no good reason why such an option shouldn't be provided in the standard modeling package.
1: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
2: http://cran.r-project.org/web/packages/sandwich/vignettes/sandwich-OOP.pdf
The text was updated successfully, but these errors were encountered: