
Split AnalyticWeights into several types #758

Open

ParadaCarleton opened this issue Jan 19, 2022 · 17 comments
@ParadaCarleton
Contributor

As mentioned when working on #754, it might make sense to split AnalyticWeights into two separate types of weights:

  1. PrecisionWeights, which represent observations with a known variance. The weight for each observation should be equal to 1/σ^2.
  2. BatchWeights, which represent observations that are themselves averages of several observations (as used in e.g. meta-analysis). Each weight should be an integer equal to the number of observations in the batch. (A rough sketch of both types follows below.)
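
For concreteness, here is a purely hypothetical sketch of what the two kinds of weights could look like as Julia types; neither name exists in StatsBase, and the field layout is only an assumption:

```julia
# Hypothetical sketch only; neither type exists in StatsBase.
struct PrecisionWeights{T<:Real}
    values::Vector{T}    # each entry is 1/σ² for the corresponding observation
end

struct BatchWeights
    values::Vector{Int}  # each entry is the number of raw observations averaged into that value
end
```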

It's possible to break these down further -- BatchWeights might be broken down based on whether they're homogeneous (identical means) or heterogeneous (differing means), and PrecisionWeights could be broken down based on whether each observation has a variance that is completely known or known only up to a constant.

This would solve the current problem where the ambiguity of AnalyticWeights can result in nonsense answers -- for instance, var currently returns an estimate that assumes a sample size equal to length(weights), which is incorrect if each observation is actually an average of several sub-observations (the way the docstring for AnalyticWeights says they should be used).
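
To make this concrete, here is a minimal illustration using the existing StatsBase API (the numbers are made up):

```julia
using Statistics, StatsBase

batch_means = [1.9, 2.1, 2.6]   # each value is the mean of a batch of raw observations
batch_sizes = [10, 20, 30]      # 60 raw observations in total

# The weighted variance is computed from the 3 recorded values and their
# relative weights; nothing in the call reflects the 60 raw observations behind
# them, so the result is not an estimate of the per-observation variance.
var(batch_means, aweights(batch_sizes); corrected = true)
```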

@nalimilan
Member

Indeed we really need to clarify the definition of weights. Though I'm not sure whether we should deprecate AnalyticWeights altogether, or just define them more precisely and possibly add another kind of weights for the case they don't support. We should first review our current implementation and check what definition it uses (hoping that it's consistent everywhere).

Also, before adding a new weights type, I'd like to know whether that exists in other implementations. If not, we have to consider very seriously why.

FWIW, here's the link to SAS's documentation for the VARDEF argument to PROC MEANS: https://v8doc.sas.com/sashtml/proc/z0146729.htm#z0093503. All formulas are given. In this procedure, weights are analytic weights. VARDEF=DF estimates the variance of an observation with a unit weight (this is equivalent to what we do for frequency weights), and VARDEF=WGT estimates the variance of an observation with average weight (this is equivalent to corrected=false for analytic weights in StatsBase). The formula we use for analytic weights when corrected=true isn't supported by SAS AFAICT.
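
A rough Julia rendering of the two divisors, based on my reading of that page (worth double-checking against the link):

```julia
using Statistics, StatsBase

# Weighted variance with a SAS-style VARDEF divisor; only :DF and :WGT are
# sketched here, and the formulas are my reading of the linked documentation.
function sas_weighted_var(x, w; vardef::Symbol = :DF)
    m   = sum(w .* x) / sum(w)        # weighted mean
    css = sum(w .* (x .- m) .^ 2)     # weighted corrected sum of squares
    d   = vardef === :DF  ? length(x) - 1 :
          vardef === :WGT ? sum(w)        :
          error("only :DF and :WGT are sketched")
    return css / d
end

x, w = [1.0, 2.0, 4.0], [2.0, 3.0, 5.0]
# :WGT should agree with StatsBase's uncorrected analytic-weight variance:
sas_weighted_var(x, w; vardef = :WGT) ≈ var(x, aweights(w); corrected = false)
```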

@ParadaCarleton
Contributor Author

Indeed we really need to clarify the definition of weights. Though I'm not sure whether we should deprecate AnalyticWeights altogether, or just define them more precisely and possibly add another kind of weights for the case they don't support. We should first review our current implementation and check what definition it uses (hoping that it's consistent everywhere).

The current implementation uses the following definition/assumptions:

  1. Every weight is proportional to, but does not equal, the precision (inverse-variance) of a particular observation.
  2. The total number of observations equals length(weights).

I haven't seen extra weight types like this in other languages. The reasons vary; sometimes the problem is solved with keyword arguments, sometimes by clearly documenting in each function what the weights mean.

@nalimilan
Member

OK. Do you have examples of either case?

@nalimilan
Member

The total number of observations equals length(weights).

Why does the result depend on the scale of weights if the number of observations is taken to be the length? :-/

@ParadaCarleton
Contributor Author

The total number of observations equals length(weights).

Why does the result depend on the scale of weights if the number of observations is taken to be the length? :-/

For the current implementation, it doesn't. However, for most implementations outside of Hmisc (e.g. Stata) the scale does matter. The docstring also implies the scale should matter by giving the following examples of how AnalyticWeights should be used:

  1. First it defines them as inverse-variance weights, in which case the weight for each observation should affect the variance calculations: large weights mean the precision is high and the variance is small.
  2. Then it says they could be used as weights for batches, where the weight for each observation is the total number of samples taken. In that case, large weights mean each value is the average of many observations. If each value is the average of many observations, but the observed variance for the values as a whole is held constant, that increases the implied variance for a single observation (spelled out below).
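
To spell out the tension (my own summary, not wording from the docstring): under reading (1) each recorded value satisfies Var(x_i) = c / w_i for some constant c, while under reading (2) x_i is the mean of w_i raw observations with common variance σ^2, so

$$
\operatorname{Var}(x_i) = \frac{\sigma^2}{w_i}
\quad\Longrightarrow\quad
\sigma^2 = w_i \, \operatorname{Var}(x_i).
$$

Both readings make Var(x_i) proportional to 1/w_i, but under (2) the per-observation variance σ^2 grows with w_i when the spread of the recorded values is held fixed, which is exactly the scale dependence at issue.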

@ParadaCarleton
Contributor Author

OK. Do you have examples of either case?

As an example of documenting this, Stata explains here how the weights behave, noting that:

the variance is a function of α, the average of the reciprocal weights. If the weights are scaled arbitrarily, so is the variance.

Hmisc uses a keyword argument, where the sample size for a weighted variance is treated as length(x) when normwt=TRUE; otherwise, the sample size is considered to be the sum of the weights. See here.
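
For reference, a rough Julia transcription of the behavior described for Hmisc (the actual R implementation may differ in its details):

```julia
# With normwt = true, the weights are rescaled to sum to length(x), so the
# implied sample size is length(x); otherwise it is sum(w).
function hmisc_like_var(x, w; normwt::Bool = false)
    w = normwt ? w .* (length(x) / sum(w)) : float.(w)
    m = sum(w .* x) / sum(w)
    return sum(w .* (x .- m) .^ 2) / (sum(w) - 1)
end
```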

@nalimilan
Member

Interesting link about Stata. So what I understand from there is that analytic weights in Stata are really used for the case where cases are averages of independent observations, but the estimate of the mean squared error given by regress is not an estimate of the population MSE/variance but rather of the average variance of the cases in the dataset. This is necessary to be insensitive to the scale of the weights. Is that right?

Our docstring is modeled after Stata's analytic weights, so it inherits its strengths and weaknesses. AFAICT it's not plain wrong to say that AnalyticWeights represent the number of observations that have been averaged; but it's misleading as then var does not give an estimate of the variance in the original population, which people may assume it does.

So what I seem to understand is that we could keep AnalyticWeights to represent inverse-variance weights, which are scale-independent. But we would introduce a new kind of weights (now or later) which would represent the number of averaged observations, and for which var would estimate the population variance. What do you think?

nalimilan mentioned this issue Jan 24, 2022
@ParadaCarleton
Contributor Author

Interesting link about Stata. So what I understand from there is that analytic weights in Stata are really used for the case where cases are averages of independent observations, but the estimate of the mean squared error given by regress is not an estimate of the population MSE/variance but rather of the average variance of the cases in the dataset. This is necessary to be insensitive to the scale of the weights. Is that right?

Our docstring is modeled after Stata's analytic weights, so it inherits its strengths and weaknesses. AFAICT it's not plain wrong to say that AnalyticWeights represent the number of observations that have been averaged; but it's misleading as then var does not give an estimate of the variance in the original population, which people may assume it does.

Yes, although even here we run up against several ambiguities:

  1. var returns the variance for an observation whose weight equals the sample mean of the weights, which may differ from the average weight in the population. If the mean population weight is unknown, e.g. when sampling from a population where subgroups have differing variances, the estimate returned by var will be downward-biased. Populations where the mean weight is unknown are common when dealing with heteroscedastic data.
  2. There's an ambiguity in whether observations can be treated as coming from homogeneous groups or not -- the appropriate variance calculation differs depending on whether or not we can assume all populations have the same mean.

I believe Stata's choice is mostly motivated by their defining the standard error as sqrt(var(x) / length(x)).

So what I seem to understand is that we could keep AnalyticWeights to represent inverse-variance weights, which are scale-independent. But we would introduce a new kind of weights (now or later) which would represent the number of averaged observations, and for which var would estimate the population variance. What do you think?

I personally think the best solution is to create several types of weights we can document unambiguously, while leaving AnalyticWeights unchanged but deprecated to avoid breaking old code. I think this minimizes the risk of errors.

@nalimilan
Member

OK. Could you write a detailed proposal regarding which formulas these new weights types would use in var, sem, quantile, etc.?

@ParadaCarleton
Contributor Author

OK. Could you write a detailed proposal regarding which formulas these new weights types would use in var, sem, quantile, etc.?

Sure; but actually, let me think about this a little bit more, since I'm starting to wonder how much of this should go in StatsBase and how much should go in a different package for meta-analysis.

@ParadaCarleton
Contributor Author

OK. Could you write a detailed proposal regarding which formulas these new weights types would use in var, sem, quantile, etc.?

OK, thinking about it some more, I believe most of the weight types can be folded into AnalyticWeights, to avoid creating too many types to deal with, but this is only possible if we add extra information to AnalyticWeights: namely, the sample size.

The only type I can't see how to fold in is inverse-variance weights that are known exactly ahead of time, but that's definitely an extremely rare use case.

@nalimilan
Member

OK. So you suggest we store the sample size as a separate field, defaulting to length(wv) to preserve the current behavior? And then if weights represent the average of several observations, you would set it to the original number of independent observations?

@ParadaCarleton
Contributor Author

ParadaCarleton commented Feb 16, 2022

OK. So you suggest we store the sample size as a separate field, defaulting to length(wv) to preserve the current behavior? And then if weights represent the average of several observations, you would set it to the original number of independent observations?

Correct; that way, var is an estimator for the population variance. Although I'm a little iffy on making length(wv) the default behavior, and think we should deprecate that (later on, in case anyone's using it right now). Using length(wv) implies each weight represents a single observation, in which case why use weights at all? I think setting the sample size equal to the sum of the weights would be a better default.
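
For illustration, here is a hypothetical shape for such a weights vector; none of these names, fields, or defaults exist in StatsBase, and the sum(w) default simply mirrors the suggestion above:

```julia
# Hypothetical only: an analytic-style weights vector that also records the
# total number of underlying observations.
struct SizedAnalyticWeights{T<:Real} <: AbstractVector{T}
    values::Vector{T}
    n::Int                       # total number of raw observations
end
SizedAnalyticWeights(w::AbstractVector{<:Real}; n::Integer = round(Int, sum(w))) =
    SizedAnalyticWeights(float.(w), Int(n))

# Minimal AbstractVector interface so the sketch behaves like a vector.
Base.size(wv::SizedAnalyticWeights) = size(wv.values)
Base.getindex(wv::SizedAnalyticWeights, i::Int) = wv.values[i]
```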

@nalimilan
Member

@gragusa This discussion is probably relevant for JuliaStats/GLM.jl#487.

@ParadaCarleton
Contributor Author

The more I've thought about it, the more I think the simplest solution might be best: clearly defining AnalyticWeights to mean "the average of n samples, as in meta-analysis." This is by far the most common use case. In that case, the weights should be unambiguously documented as requiring weights.total to exactly equal the total/combined sample size for all experiments, and we should avoid referring to them as inverse-variance or reliability weights, which is likely to cause confusion.
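
For example, under that definition (with made-up numbers), three studies of sizes 10, 20, and 30 would get weights whose total equals the combined sample size:

```julia
using StatsBase

w = aweights([10, 20, 30])
sum(w)   # 60 == the combined number of underlying observations across the studies
```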

This would require deprecating any variance methods that allow passing AnalyticWeights, however. Unlike for the case of inverse probability weights, there are many important assumptions that go into a meta-analysis, most notably about homogeneity of means and variances. And when the recorded sample variance within each group is unknown, and has to be inferred from the vector of means, the best estimator is highly inefficient; users should probably be discouraged from using it, unless they know what they're doing and know there's no better estimator available.

@nalimilan
Member

This definition sounds fine as it's quite precise. Though as long as the estimator makes some sense, I wouldn't completely deprecate it, but instead add a warning to the docs. Also, are you sure all our current methods are consistent with that definition? They would have to be scale-dependent, right?

Do you still think we would need to add another weights type like PrecisionWeights?

@ParadaCarleton
Contributor Author

This definition sounds fine as it's quite precise. Though as long as the estimator makes some sense, I wouldn't completely deprecate it, but instead add a warning to the docs.

The problem is that asking for the variance here is ambiguous: are you asking for the variance across the aggregates, or across individual cases? One is much bigger than the other. There's also the problem that lots of users won't read the docs from top to bottom, especially since they think they know what var means. Most of the time, someone interested in one definition won't realize the other definition is even a possibility!

Do you still think we would need to add another weights type like PrecisionWeights?

I don't think we'd have to, given that situations with a precisely-known variance are rare outside of textbook problems, but there's not much reason to exclude them. They're just not a very high priority IMO.

Also, are you sure all our current methods are consistent with that definition? They would have to be scale-dependent, right?

Methods would have to be scale-dependent for sem or var if we included them, but that shouldn't be a problem for means.
