Improve smoothed Facebook signal in locations that only occasionally meet sample size thresholds #36

Closed
capnrefsmmat opened this issue May 21, 2020 · 5 comments
Labels
CTIS Improvements and reporting for CTIS

Comments

@capnrefsmmat
Contributor

If a county only has one observation over two weeks, the raw signal will have a spike in it; the smoothed signal will spike and then drop later, since it takes the last 7 days of data.

If a county consistently has observations, but most days they're under the sample size threshold, the raw signal will report NAs on most days and signals on others. That's fine, but the smoothing uses the past 7 days, and will smooth over the NAs and possibly create strange visual artifacts.

Is there a better smoothing or filtering method to avoid this?
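
To make the first artifact concrete, here is a minimal Python sketch (not the pipeline's R code). It assumes the smoother is a plain trailing 7-day mean over whichever days have a value; the real `smooth_aggregation_skiprollq` may well behave differently, and `trailing_mean` is a hypothetical name used only for illustration.

```python
from typing import List, Optional, Sequence


def trailing_mean(values: Sequence[Optional[float]], window: int = 7) -> List[Optional[float]]:
    """Mean of the non-missing values in the trailing `window` days.

    Assumption: missing days (None) are skipped rather than treated as zero.
    Returns None for a day whose whole window is missing.
    """
    out: List[Optional[float]] = []
    for i in range(len(values)):
        recent = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(recent) / len(recent) if recent else None)
    return out


# Hypothetical county: a single usable observation (day 3) in two weeks.
raw: List[Optional[float]] = [None] * 14
raw[3] = 5.0

smoothed = trailing_mean(raw)
# Days 3 through 9 carry the single value; from day 10 on the window is empty again.
for day, (r, s) in enumerate(zip(raw, smoothed)):
    print(day, r, s)
```

Under that assumption, one observation turns into a seven-day plateau that then vanishes abruptly, which matches the "spike and then drop" described above.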

@krivard
Contributor

krivard commented May 21, 2020

It looks like we're actually dropping counties below the sample size threshold, not reporting NA. In aggregations-setup.R:

    filter_WITHSMALL = function(WITHSMALL.aggregation, max.allowed.normalized.weight=default.small.normalized.weight) {
        WITHSMALL.aggregation %>>%
            (? range(.$MaxHouseholdMixedWeight)) %>>%
            ("before num response filter" ? nrow(.)) %>>%
            dplyr::filter(NumberResponses >= 100L) %>>%
            ("after num response filter" ? nrow(.)) %>>%
            dplyr::filter(EffectiveNumberResponses >= 100L) %>>%
            ("after effective num response filter" ? nrow(.)) %>>%
            dplyr::filter(MaxHouseholdMixedWeight <= max.allowed.normalized.weight) %>>%

We do this after smoothing though, so I'm not sure how that affects the weird spike behavior.
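
As a rough illustration of "dropped rather than NA", the three filters visible in the excerpt can be sketched in Python. The row dictionaries, the `filter_small` name, and the `max_weight` value are assumptions made for this example; the real default for `max.allowed.normalized.weight` is not shown in the excerpt, and the R function continues past the part quoted here.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]


def filter_small(rows: List[Row], max_weight: float, min_responses: int = 100) -> List[Row]:
    """Keep only rows passing the three filters quoted from aggregations-setup.R.

    Rows that fail are removed entirely; no NA placeholder is left behind.
    """
    return [
        r for r in rows
        if r["NumberResponses"] >= min_responses
        and r["EffectiveNumberResponses"] >= min_responses
        and r["MaxHouseholdMixedWeight"] <= max_weight
    ]


# Hypothetical aggregated rows for four locations on one day.
rows: List[Row] = [
    {"geo": "A", "NumberResponses": 250, "EffectiveNumberResponses": 180.0, "MaxHouseholdMixedWeight": 0.01},
    {"geo": "B", "NumberResponses": 90, "EffectiveNumberResponses": 80.0, "MaxHouseholdMixedWeight": 0.01},
    {"geo": "C", "NumberResponses": 120, "EffectiveNumberResponses": 95.5, "MaxHouseholdMixedWeight": 0.01},
    {"geo": "D", "NumberResponses": 300, "EffectiveNumberResponses": 200.0, "MaxHouseholdMixedWeight": 0.05},
]

kept = filter_small(rows, max_weight=0.02)  # 0.02 is an arbitrary illustrative cutoff
print([r["geo"] for r in kept])
```

Location B fails the raw count filter, C fails the effective count filter, and D fails the weight filter, so only A remains in the output and the others simply have no row for that day.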

For example, in prepare-covidalert-firstsmoothed_last7_1in7-noagsf-yesweighted-aggregations.R:

    #### Prepare in earlier Delphi-internal access&format:
    firstsmoothed_last7_1in7.noagsf.yesweighted.daily.hrr.for.delphi =
        NOSHARE.WITHSMALL.unsmoothed.yesweighted.daily.hrr %>>%
        smooth_aggregation_skiprollq("HRRnum",7L,c(1L,7L)) %>>%
        filter_WITHSMALL() %>>%
        dplyr::filter(Date <= LAST_WEIGHTS) %>>%
        {.}

@krivard
Contributor

krivard commented May 26, 2020

@nloliveira It's also worth investigating whether the effect seen here in the Facebook household CLI signals also occurs in other sources (GHT comes to mind) and, if so, whether they can or should share a similar solution.

@krivard
Contributor

krivard commented Jul 1, 2020

The new Facebook pipeline may not have this problem, since smoothing happens before the sample size filters.

@capnrefsmmat capnrefsmmat added the CTIS Improvements and reporting for CTIS label Jul 5, 2020
@dshemetov dshemetov reopened this Jul 27, 2020
@dshemetov
Contributor

Woops, misclick.

@capnrefsmmat
Contributor Author

In the new pipeline, this should only happen to a county that gets nearly 100 observations per 7 days. Some 7-day periods will be reported and some will be omitted. This should not cause jumps or drops in the signal, though. Closing for now; we can open a new issue if there's a more specific problem with the new behavior.
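
The borderline case described above can be sketched as follows. This is a Python illustration under stated assumptions, not the new pipeline's code: it assumes each reported value pools the trailing 7 days of responses and is published only when the pooled count reaches 100. The `pooled_windows` name and the daily counts are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_SAMPLE = 100  # threshold mentioned in the thread; assumed to apply to the pooled window


def pooled_windows(daily: List[Tuple[int, int]], window: int = 7) -> List[Optional[float]]:
    """Percent positive over each trailing `window`-day pool of responses.

    `daily` holds (responses, positives) per day. A window is reported only
    when it is complete and its pooled response count is at least MIN_SAMPLE.
    """
    out: List[Optional[float]] = []
    for i in range(len(daily)):
        if i + 1 < window:
            out.append(None)  # not enough history for a full window yet
            continue
        chunk = daily[i - window + 1: i + 1]
        n = sum(responses for responses, _ in chunk)
        k = sum(positives for _, positives in chunk)
        out.append(100.0 * k / n if n >= MIN_SAMPLE else None)
    return out


# Hypothetical county hovering around 100 responses per 7 days.
daily = [(15, 1)] * 7 + [(10, 1)] * 2 + [(15, 1)] * 7

estimates = pooled_windows(daily)
reported = [v is not None for v in estimates]
print(reported)
```

In this made-up series the windows ending on days 6, 7, 14, and 15 are reported and the ones in between are omitted, while the reported values stay close to one another. That is consistent with the expectation that omitted periods show up as gaps rather than as jumps or drops.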
