
Conversation

@nmdefries
Contributor

@nmdefries nmdefries commented May 26, 2021

Description

Documented and actual calculations disagree on use of weights in binary.R standard errors. Define and use effective sample size (corresponding to respondent weights) instead of raw sample size (corresponding to uniform weights).

Changelog

  • Define effective sample size as 1/sum(w_i^2), which equals the raw sample size when using uniform weights; see the sketch after this list.
  • Use effective sample size instead of raw sample size in SE calculation.
  • Use effective sample size when adjusting binary proportions with Jeffreys prior.
  • Use effective sample size when calculating binary SEs.
  • Add tests to check that implementation matches definition as given in our documentation.
  • Update gold_receiving files.
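As a minimal sketch of the first changelog item (illustrative only, not the package code):

```r
# Effective sample size from respondent weights, assuming the weights are
# first normalized to sum to 1 (as in the definition above).
effective_sample_size <- function(weights) {
  w <- weights / sum(weights)
  1 / sum(w^2)
}

effective_sample_size(rep(1, 10))     # uniform weights: equals the raw n, 10
effective_sample_size(c(5, 1, 1, 1))  # skewed weights: about 2.3, well below the raw n of 4
```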

Fixes

Code changes for #1271.

@nmdefries nmdefries marked this pull request as draft May 26, 2021 17:22
@nmdefries nmdefries changed the title from "[fb-survey] Use effective sample size to calculate binary SEs" to "[fb-survey] Use effective sample size for binary SEs" May 26, 2021
@nmdefries nmdefries marked this pull request as ready for review October 19, 2021 14:47
@capnrefsmmat
Contributor

I think we definitely need a couple of unit tests for SE calculations using weighted data, maybe by making up five or ten weights and comparing the result to our documentation. Without that, I'll be nervous.
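Something along these lines, for instance (a self-contained sketch of the kind of check being requested; weighted_se is a stand-in defined here, not the actual package function):

```r
library(testthat)

# Stand-in for the documented weighted-SE calculation: normalize the weights
# to sum to 1, then take sqrt(sum(w_i^2 * (y_i - mean)^2)).
weighted_se <- function(y, w) {
  w <- w / sum(w)
  mu <- sum(w * y)
  sqrt(sum(w^2 * (y - mu)^2))
}

test_that("weighted SE reduces to sqrt(p * (1 - p) / n) under uniform weights", {
  y <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
  n <- length(y)
  p <- mean(y)
  expect_equal(weighted_se(y, rep(1, n)), sqrt(p * (1 - p) / n))
})
```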

Also, does this PR only change binary indicators, and not the CLI/ILI indicators? Those will also need correct weighting.

@nmdefries
Contributor Author

nmdefries commented Oct 22, 2021

Sure, I'll add some tests.

CLI/ILI should already match the documentation. Here's the count indicators code for effective sample size and initial SE, and for adjusted SE.

I believe it's only binary indicators that haven't been incorporating weights properly. Let me know if I'm misunderstanding something.

@nmdefries
Contributor Author

nmdefries commented Oct 27, 2021

Discovered some things.

About the docs

About the code

  • binary.R::jeffreys_percentage should probably use effective_sample_size when adding the Jeffreys adjustment. It's currently using plain sample_size. This means that previous proportions will have been calculated differently (incorrectly?). Not sure if it's wrong, or if both ways are technically valid and it's the user's choice. However, count indicators do use effective sample size when applying the Jeffreys prior.
  • The formula for adjusted SE for count indicators in the code isn't exact. Out of (1 + 2 n_eff + n_eff^2) * old_se^2, the first two terms appear to have been dropped (since they're much smaller than the third term), which means that it is close to but doesn't exactly agree with the docs.
  • The formula for SE for binary indicators also appears to be approximate. It's not clear to me that the p * (1-p) / n formula generalizes to the weighted case. Indeed, the documented formula and the in-code formula give increasingly different results the farther the weights are from uniform.

These last two points are causing the test failures, although the absolute difference between the implemented and from-the-docs SEs is smallish, O(0.001-0.01); a quick comparison is sketched below.
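For example (a rough, self-contained sketch, not the package code; "documented" and "in-code" here follow the description in the last bullet):

```r
set.seed(42)
n <- 500
y <- rbinom(n, 1, 0.3)
w <- runif(n)
w <- w / sum(w)                            # non-uniform weights, normalized to sum to 1

p_hat   <- sum(w * y)
se_doc  <- sqrt(sum(w^2 * (y - p_hat)^2))  # documented weighted formula
se_code <- sqrt(p_hat * (1 - p_hat) / n)   # unweighted-style p * (1-p) / n formula

se_doc - se_code  # small but nonzero; the gap grows as the weights get less uniform
```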

@capnrefsmmat
Contributor

I'll set aside the documentation points for now, since we should figure out how to fix the code and then ensure all documentation matches it.

binary.R::jeffreys_percentage should probably use effective_sample_size when adding the Jeffreys adjustment. It's currently using plain sample_size. This means that previous proportions will have been calculated differently (incorrectly?). Not sure if it's wrong, or if both ways are technically valid and it's the user's choice. However, count indicators do use effective sample size when applying the Jeffreys prior.

It's more than that: compute_binary_response does not attempt to generate an effective sample size at all, and just uses the sample size. So the proportions, Jeffreys adjustments, and SEs all match: they don't use effective sample size.

The formula for adjusted SE for count indicators in the code isn't exact. Out of (1 + 2 n_eff + n_eff^2) * old_se^2, the first two terms appear to have been dropped (since they're much smaller than the third term), which means that it is close to but doesn't exactly agree with the docs.

Your link points to jeffreys_se, but the main SE calculation for count indicators is in compute_count_response. jeffreys_se is intended to adjust that standard error to be calculated as if the proportion had the Jeffreys adjustment applied. I believe that calculation correctly matches the documentation.

The formula for SE for binary indicators also appears to be approximate. It's not clear to me that the p * (1-p) / n formula generalizes to the weighted case. Indeed, the documented formula and the in-code formula give increasingly different results the farther the weights are from uniform.

Yes, I think you're right here, and the code needs to match the documentation.

@nmdefries
Contributor Author

nmdefries commented Oct 28, 2021

binary.R::jeffreys_percentage should probably use effective_sample_size when adding the Jeffreys adjustment. It's currently using plain sample_size. This means that previous proportions will have been calculated differently (incorrectly?). Not sure if it's wrong, or if both ways are technically valid and it's the user's choice. However, count indicators do use effective sample size when applying the Jeffreys prior.

It's more than that: compute_binary_response does not attempt to generate an effective sample size at all, and just uses the sample size.

I agree, but I've already fixed that particular aspect: I've defined effective_sample_size using respondent weights, so that where it is used, it's the correct value. But effective_sample_size is not being used in binary.R::jeffreys_percentage. At the moment, we're adjusting binary estimates like

(percentage * sample_size + 50) / (sample_size + 1)

regardless of whether the estimate is weighted or unweighted. But we should be using (the correct value for) effective sample size instead, right?

The implication of this is that historical percentages for binary indicators are wrong. Perhaps we should delete them along with SEs.
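Concretely, the two variants under discussion look like this (a sketch built from the formula above; the function names and numbers are made up, and neither function is copied from binary.R):

```r
# Current behaviour: adjust using the raw sample size.
jeffreys_raw <- function(percentage, sample_size) {
  (percentage * sample_size + 50) / (sample_size + 1)
}

# Proposed behaviour: same formula, but with the effective sample size.
jeffreys_eff <- function(percentage, effective_sample_size) {
  (percentage * effective_sample_size + 50) / (effective_sample_size + 1)
}

# E.g. 1000 respondents whose weights give an effective sample size of 200:
jeffreys_raw(0, 1000)  # about 0.05
jeffreys_eff(0, 200)   # about 0.25, a noticeably larger pull away from zero
```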

The formula for adjusted SE for count indicators in the code isn't exact. Out of (1 + 2 n_eff + n_eff^2) * old_se^2, the first two terms appear to have been dropped (since they're much smaller than the third term), which means that it is close to but doesn't exactly agree with the docs.

Your link points to jeffreys_se, but the main SE calculation for count indicators is in compute_count_response. jeffreys_se is intended to adjust that standard error to be calculated as if the proportion had the Jeffreys adjustment applied. I believe that calculation correctly matches the documentation.

The unadjusted formula looks fine; it's the adjustment formula that is inexact. Let me write out the algebra.

So starting from the documentation for SE adjustment (minus the typo),

[screenshot of the documented SE adjustment formula]

But the implemented formula is [screenshot of the implemented formula, not preserved].

So there are these two missing terms in the square root, 1 * s_p^2 and 2 * n_eff * s_p^2, which appear to have been dropped because they're small relative to n_eff^2 * s_p^2. It's not wrong, but it's no harder to calculate the exact value, and it doesn't match the documentation. So we can change the adjustment formula to be exact, or mention the approximation in the docs. Or I guess decide it's close enough not to matter?
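For a sense of scale (plain arithmetic on the terms named above; n_eff = 250 is just an illustrative value):

```r
n_eff   <- 250
kept    <- n_eff^2          # coefficient of s_p^2 that the implementation keeps
dropped <- 1 + 2 * n_eff    # coefficients of s_p^2 that it drops
dropped / (kept + dropped)  # about 0.008, i.e. under 1% of the quantity inside the square root
```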

The formula for SE for binary indicators also appears to be approximate. It's not clear to me that the p * (1-p) / n formula generalizes to the weighted case.

Yes, I think you're right here, and the code needs to match the documentation.

👍

@capnrefsmmat
Contributor

I agree, but I've already fixed that particular aspect: I've defined effective_sample_size using respondent weights, so that where it is used, it's the correct value. But effective_sample_size is not being used in binary.R::jeffreys_percentage. At the moment, we're adjusting binary estimates like

(percentage * sample_size + 50) / (sample_size + 1)

regardless of whether the estimate is weighted or unweighted. But we should be using (the correct value for) effective sample size instead, right?

The implication of this is that historical percentages for binary indicators are wrong. Perhaps we should delete them along with SEs.

Well... in principle, the Jeffreys adjustment is a hack. We're taking the actual proportion and deliberately fudging it so it can't be 0. A Jeffreys prior is one such way to fudge it, and can be justified by appeal to a Bayesian inference procedure, but there are others. The approach we're using matches the Jeffreys prior in the unweighted case but not in the weighted case. How bad is that?

Not too bad, I don't think, as long as the difference between the two approaches is not large and we document it. In fact this current approach does have one advantage: the user can undo it using data in the API, so they can see the actual sample proportion.
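That inversion is just the adjustment formula quoted above run backwards (a sketch; the function name and values are illustrative):

```r
# If adjusted = (percentage * sample_size + 50) / (sample_size + 1),
# the raw sample percentage can be recovered from the reported values:
undo_jeffreys <- function(adjusted, sample_size) {
  (adjusted * (sample_size + 1) - 50) / sample_size
}

adj <- (30 * 200 + 50) / (200 + 1)  # forward adjustment of a 30% estimate, n = 200
undo_jeffreys(adj, 200)             # recovers 30
```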

The unadjusted formula looks fine; it's the adjustment formula that is inexact. Let me write out the algebra.

So starting from the documentation for SE adjustment (minus the typo),

[screenshot of the documented SE adjustment formula]

But the implemented formula is [screenshot of the implemented formula, not preserved].

So there are these two missing terms in the square root, 1 * s_p^2 and 2 * n_eff * s_p^2, which appear to have been dropped because they're small relative to n_eff^2 * s_p^2. It's not wrong, but it's no harder to calculate the exact value, and it doesn't match the documentation. So we can change the adjustment formula to be exact, or mention the approximation in the docs. Or I guess decide it's close enough not to matter?

Hmmm. How did you arrive at the conclusion that the documented version should not have that n_e? Note that n_e in the documentation is one over the sum of the squared weights, and not just the sum of the weights.

The original derivation is documented here: https://github.com/cmu-delphi/covid-19/blob/deeb4dc1e9a30622b415361ef6b99198e77d2a94/facebook/prepare-extracts/aggregations-setup.R#L210-L247

It uses slightly different notation, so let me do it here. Suppose w_i is the weight for observation i, after normalization so they sum to 1. Let y_i be the response for observation i and let \hat \mu be the mean over all observations. Then we have that

se_hat^2 = sum(w_i^2 * (y_i - mu_hat)^2)

Now we add a "fake" observation with value 1/2. Working on both sides,

[rendered equation not preserved]

Now we solve for the new standard error:

[rendered equation not preserved]

Now we see the outline of what's implemented in jeffreys_se, since the weight normalization means that the sum of the weights is the sample size.

The remaining question is whether jeffreys_se is using the correct inputs to achieve the desired results. First, it's using sample_size instead of the sum of the weights. As it's written now, sample_size is actually the effective sample size produced in compute_count_response, i.e.

[rendered equation not preserved]

where n_e is defined as in the documentation. Second, it's using

[rendered equation not preserved]

as the "old se" instead of the one defined above that correctly accounts for the weights. If we substitute these choices in, what do we get?

I dunno, I'm tired, let me finish this later. But evidently I've gone about this a different way than you did. Any idea where we diverged specifically, so we can figure out what's wrong before I finish this off?

@nmdefries
Contributor Author

nmdefries commented Nov 1, 2021

The approach we're using matches the Jeffreys prior in the unweighted case but not in the weighted case. How bad is that?

Not too bad, I don't think, as long as the difference between the two approaches is not large and we document it. In fact this current approach does have one advantage: the user can undo it using data in the API, so they can see the actual sample proportion.

The difference is small (something like O(0.01)) on test data using weights drawn from a uniform distribution; haven't compared with real data.

But we are applying the Jeffreys prior using effective sample size for count indicators vs plain sample size for binary indicators, so there's some inconsistency there. Is using raw sample size outright wrong in the weighted case, or is it the user's choice, and either raw or weighted sample size would be valid (since the adjustment is kind of hacky, like you said)? Are we justified in using plain sample size?

The original derivation is documented here: https://github.com/cmu-delphi/covid-19/blob/deeb4dc1e9a30622b415361ef6b99198e77d2a94/facebook/prepare-extracts/aggregations-setup.R#L210-L247

Thanks for linking the derivation, that's helpful. I didn't realize there were different definitions of n_eff floating around.

RE what I think is a typo,

[screenshot of the documented formula with the suspected typo]

Adding more in a separate comment on the derivation you linked.

@capnrefsmmat
Contributor

Okay, I just sent you an invite to an Overleaf document (basically Google Docs for LaTeX, if you haven't used it before) so we can work this out there instead of sharing screenshots of LaTeX snippets. I've written out most of the derivation, with some corrections made for typos and such.

So far I'm leaning in the direction that you're right and the code has a typo. But I'd like to write out the derivation in the code in detail and see how, precisely, the two versions differ and where we might have gone wrong in the code; that seems prudent when we don't quite understand how the code got this way, since maybe it's smarter than we are.

@nmdefries
Contributor Author

Thanks for setting up the doc. I added a derivation in the same vein as the one you linked but assuming that the weights are normalized.

@capnrefsmmat
Contributor

I've updated the Overleaf: I think the difference between code and documentation can be reduced to a simple difference (w0 = 1/neff vs 1/(1 + neff)). Does that look right to you?

If so, then we can proceed to deciding which one is more correct.
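For scale: the two candidate weights are numerically very close at realistic effective sample sizes, which is part of why the discrepancy is subtle (illustrative value only):

```r
n_eff <- 250
c(one_over_neff = 1 / n_eff, one_over_neff_plus_one = 1 / (1 + n_eff))
# 0.004000 vs 0.003984
```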

@nmdefries
Contributor Author

Yep, looks good!

@capnrefsmmat
Contributor

Made a further update to reorganize the document so it's easy to compare the approaches and understand them. I think that helped me realize a further issue in the documented version that explains why it's different, beyond the choice of w_0. I've added extra explanation.

I think the conclusion is that the code is correct and the documentation is wrong. Your definition followed the documentation and started at equation (30), but the code started at equation (19), which is more correct. Or at least it's more correct if you want to add the extra fake weight to both numerator and denominator. Let me know if you agree.

@nmdefries
Contributor Author

Yes, I agree with your conclusion. The discussion of what w_0 should be in the unweighted case is particularly compelling in support of the implemented definition.

The derivations in question, for documentation purposes.

@nmdefries
Contributor Author

I sent you an invite to another Overleaf doc dealing with the binary case.

Transform p̂: recalculate it as if there were one additional observation i = 0 with Y_0 = 1/2, then calculate the standard error according to the equation above while including that extra observation in the sum.

We can't do this directly using the explicit sum(w_i^2 * (Y_i - p_hat)^2), since the lists of weights and responses aren't available in the recalculation/adjustment function. More detail in the doc.

@capnrefsmmat
Contributor

Thanks, the document looks good. I think we can go ahead and implement option 2, and prepare a separate PR to fix the documentation for both binary and count indicators.

@capnrefsmmat
Contributor

And if I understand correctly, all this work led to the conclusion "This PR is correct"? So we just need to prepare the documentation?

@krivard
Contributor

krivard commented Nov 11, 2021

(fix the tests)

clarify comment

add uniform tests, increase sample size, correct counts formula

update tests

new test names
@nmdefries nmdefries force-pushed the fb-package-weight-binary-ses branch from edaa3bf to f13f9c7 on November 12, 2021 17:47
@nmdefries
Contributor Author

And derivation/discussion for binary indicators

This PR isn't quite done, I still need to repair the integration tests.

@nmdefries
Contributor Author

@capnrefsmmat This is ready to review.

Contributor

@capnrefsmmat capnrefsmmat left a comment


Looking good. Let's plan to deploy this the week of Nov 30, and figure out a backfill plan with Katie to begin that week.

@nmdefries
Contributor Author

Wait for #1397 to be merged.

@nmdefries
Contributor Author

@krivard This is ready to merge. Since the CTIS API and contingency table pipelines are turned off at this point, this will be used for regenerating/backfilling data with the new, corrected SE definition.

@krivard krivard merged commit 8065c6a into main Jul 15, 2022
@krivard krivard deleted the fb-package-weight-binary-ses branch July 15, 2022 18:57