Script to amend monthly rollup microdata #1650
Add state field based on county FIPS. Change name of `wave` field to `version`.
capnrefsmmat
left a comment
I think this will be fine, just slightly more complicated than it has to be, since the FIPS conversion can be easier.
Not sure if it's worth changing the conversion if it already works. Could you maybe add an assertion after line 42 that checks that all rows with FIPS codes get a non-NA state? That should definitely happen, but if there's something weird about our mapping files (like with territories), we could get issues, and we don't want to have missingness because of that.
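A minimal sketch of the kind of check suggested here, assuming a data frame `data` with a character `fips` column and a derived `state` column. The `state_lookup` vector and the toy rows are hypothetical and stand in for the real mapping files; deriving the state from the first two digits of the county FIPS code is one possible simpler conversion, not necessarily the one the script uses.

```r
# Hypothetical mapping from 2-digit state FIPS prefix to state abbreviation.
# The real script would use the project's mapping files instead.
state_lookup <- c("01" = "al", "06" = "ca", "42" = "pa")

# Toy microdata: the last row has no county FIPS at all.
data <- data.frame(
  fips = c("01001", "06037", "42003", NA),
  stringsAsFactors = FALSE
)

# The first two digits of a 5-digit county FIPS code identify the state.
data$state <- unname(state_lookup[substr(data$fips, 1, 2)])

# Every row that has a FIPS code must have received a state. A failure here
# would point at a gap in the mapping (for example, a territory).
missing_state <- !is.na(data$fips) & is.na(data$state)
stopifnot(!any(missing_state))
```

Rows without a FIPS code are allowed to keep an NA state; the assertion only fires when a FIPS code is present but the lookup returned nothing.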
    # some people enter 9-digit ZIPs, which could make them easily identifiable in
    # the individual output files. rather than truncating to 5 digits -- which may
    # turn nonsense entered by some respondents into a valid ZIP5 -- we simply
    # replace these ZIPs with NA.
    data$zip5 <- ifelse(nchar(data$zip5) > 5, NA_character_,
The logic for these two functions is borrowed from delphiFacebook/R/responses.R, but I'm wondering if this line is supposed to be handling A3. We drop zip5 anyway, so nulling it out here seems unnecessary. @capnrefsmmat
filter_complete_responses does the other half of the work:
covidcast-indicators/facebook/delphiFacebook/R/responses.R
Lines 772 to 777 in 3590e64
    # what zip5 values have a large enough population (>100) to include in micro
    # output. Those with too small of a population are blanked to NA
    zip_metadata <- produce_zip_metadata(params$static_dir)[, c("zip5", "keep_in_agg")]
    zipitude <- left_join(data_full, zip_metadata, by = "zip5")
    change_zip <- !is.na(zipitude$keep_in_agg) & !zipitude$keep_in_agg
    data_full$A3[change_zip] <- NA
We join with the ZIP metadata based on `zip5`, then blank out `A3` based on the results (not `zip5`). Don't ask me why we need to use two different columns; undoubtedly the code did that in early 2020 and the logic just got ported over to this version.
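To make the two-column logic concrete, here is a small sketch on toy data. The data frames and their values are made up for illustration, and base R `match` stands in for the `left_join` used in the quoted code; only the column names `zip5`, `A3`, and `keep_in_agg` come from the snippet above.

```r
# Toy responses: zip5 is the cleaned ZIP, A3 is the raw survey answer.
data_full <- data.frame(
  zip5 = c("15213", "99999", "00000", NA),
  A3   = c("15213", "99999", "00000", NA),
  stringsAsFactors = FALSE
)

# Toy ZIP metadata: keep_in_agg is FALSE for ZIPs with too small a population.
zip_metadata <- data.frame(
  zip5 = c("15213", "99999"),
  keep_in_agg = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Look up keep_in_agg by zip5 (equivalent to a left join on zip5 here,
# because zip5 is unique in the toy metadata).
keep <- zip_metadata$keep_in_agg[match(data_full$zip5, zip_metadata$zip5)]

# Blank A3 only where the ZIP is known and flagged as too small. ZIPs that
# are absent from the metadata (keep is NA) are left untouched.
change_zip <- !is.na(keep) & !keep
data_full$A3[change_zip] <- NA
```

In this sketch only the second row has `A3` blanked; `zip5` itself is never modified, which matches the behavior described in the comment.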
@krivard This is ready to merge.
Description
Add state field based on county FIPS. Change name of `wave` field to `version`.
Changelog
microdata_add_state_col__rename_wave.R