Expanding fweights and pweights #283

Closed
jeffwong opened this issue Jul 15, 2017 · 2 comments

Comments

@jeffwong

jeffwong commented Jul 15, 2017

I would like to bring fweights and pweights into the GLM package.

The type of weights affects the vcov function in the GLM package. Here is a reference on how the covariance varies with the weight type:

One thing I would like to add is the ability to combine fweights and pweights in a GLM. I think we can start here in StatsBase by exposing the combination of weights. For example, I believe the appropriate varcorrection function for combining fweights and pweights would be:

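```julia
@inline function varcorrection(fw::FrequencyWeights, pw::ProbabilityWeights, corrected::Bool=false)
    # The frequency weights determine how large the sample is
    n_f = fw.sum
    if corrected
        # Number of observations with a nonzero probability weight
        n = count(!iszero, pw)
        n / (n_f * (n - 1))
    else
        1 / n_f
    end
end
```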

The intuition is that the fweights tell us how many data points our sample actually contains, while the pweights give a relative weighting within the sample according to the probability that each data point was sampled. The varcorrection therefore replaces the "s" (the sum of the probability weights) with the sum of the fweights, since that is the component that tells us how large the sample is.

This is effectively just ProbabilityWeights with sum = fw.sum, so it may not need a separate varcorrection method. Still, I wonder whether writing it this way is clearer, since it makes explicit that fweights and pweights are being combined.
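To illustrate the equivalence, here is a dependency-free sketch on plain vectors rather than StatsBase weight types. The names `pweights_correction` and `combined_correction` are hypothetical, and the ProbabilityWeights correction is assumed to have the same `n / (s * (n - 1))` form as the proposal above, with `s` the weight sum. The combined version is then just a wrapper that passes `sum(fw)` as `s`.

```julia
# Hypothetical stand-in for the ProbabilityWeights correction: `s` is the
# weight sum and `n` counts the observations with nonzero probability weight.
function pweights_correction(pw::AbstractVector{<:Real}, s::Real, corrected::Bool=false)
    if corrected
        n = count(!iszero, pw)
        n / (s * (n - 1))
    else
        1 / s
    end
end

# Hypothetical combined correction: the same formula, but the sample size
# `s` is taken from the frequency weights instead of the probability weights.
combined_correction(fw::AbstractVector{<:Real}, pw::AbstractVector{<:Real}, corrected::Bool=false) =
    pweights_correction(pw, sum(fw), corrected)

fw = [2, 1, 3, 4]           # sum(fw) == 10 sampled units
pw = [0.1, 0.4, 0.2, 0.3]   # all nonzero, so n == 4

println(combined_correction(fw, pw, true))   # 4 / (10 * 3)
println(combined_correction(fw, pw))         # 1 / 10
```

Since the wrapper adds no logic of its own, choosing between a dedicated method and a ProbabilityWeights object with an overridden sum is mostly a question of readability.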

In GLM, I plan to write a similar function that combines fweights and pweights into a single weight vector that is used for maximum likelihood
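One plausible way to build that single vector, assuming the two kinds of weight simply multiply per observation, is the elementwise product `fw .* pw`. The sketch below uses hypothetical names (`combine_weights`, `weighted_mean`) and checks the idea on the simplest maximum-likelihood case, the mean of a Gaussian, whose weighted MLE is the weighted average. Using `fw .* pw` on the compact data gives the same estimate as expanding each row `fw[i]` times and applying `pw` alone. This only checks the point estimate; the variance still needs the correction discussed above.

```julia
# Hypothetical helper: fold frequency weights into probability weights.
combine_weights(fw::AbstractVector{<:Integer}, pw::AbstractVector{<:Real}) = fw .* pw

# Weighted MLE of a Gaussian mean (an intercept-only weighted least
# squares fit): the weighted average of y.
weighted_mean(y::AbstractVector{<:Real}, w::AbstractVector{<:Real}) = sum(w .* y) / sum(w)

y  = [1.0, 2.0, 4.0]
fw = [2, 1, 3]          # row i stands for fw[i] identical sampled units
pw = [0.5, 1.0, 2.0]    # probability weight of each unit

# Compact data with the combined weight vector
est_compact = weighted_mean(y, combine_weights(fw, pw))

# Expanded data: repeat row i fw[i] times and keep only the pweights
y_expanded  = reduce(vcat, [fill(y[i], fw[i]) for i in eachindex(y)])
pw_expanded = reduce(vcat, [fill(pw[i], fw[i]) for i in eachindex(pw)])
est_expanded = weighted_mean(y_expanded, pw_expanded)

println(est_compact ≈ est_expanded)
```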

@ararslan
Member

I'll preface this by saying that I have rarely had to work with weights, so take my opinions with a grain of salt... I believe @nalimilan and @rofinn are better versed in weights.

> In GLM, I plan to write a similar function that combines fweights and pweights into a single weight vector that is used for maximum likelihood

👍 Though it sounds like that function should probably live in this package instead, since it could be useful to other packages besides GLM.

The function you've shown here seems reasonable, I think, though we'll need to do some zero checking. (For example, the probability weights could be [0,0,1,0] or similar, in which case n - 1 == 0 and we'd divide by zero in the corrected case.)
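A minimal sketch of that zero checking, written on plain vectors with a hypothetical function name `checked_combined_correction`. It assumes that throwing an `ArgumentError` is the right response to degenerate input, which StatsBase may or may not prefer. Without a guard, Julia's `/` on these values returns `Inf` rather than raising, so the degenerate case could otherwise pass silently.

```julia
# Hypothetical variant of the proposed correction with explicit guards.
function checked_combined_correction(fw::AbstractVector{<:Real}, pw::AbstractVector{<:Real},
                                     corrected::Bool=false)
    length(fw) == length(pw) ||
        throw(DimensionMismatch("fw and pw must have the same length"))
    n_f = sum(fw)
    # Guard 1: the frequency weights must describe a nonempty sample
    n_f > 0 || throw(ArgumentError("sum of frequency weights must be positive"))
    corrected || return 1 / n_f
    n = count(!iszero, pw)
    # Guard 2: the corrected form divides by n - 1, so it needs at least
    # two observations with a nonzero probability weight
    n > 1 || throw(ArgumentError("corrected variance needs at least two nonzero probability weights"))
    return n / (n_f * (n - 1))
end
```

With this version, `checked_combined_correction([1, 1, 1, 1], [0, 0, 1, 0], true)` throws an `ArgumentError`, while the uncorrected call with the same weights still returns `1 / 4`.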

@nalimilan
Member

Closing in favor of JuliaStats/GLM.jl#186.
