@nmdefries
Contributor

Description

Add state field based on county FIPS. Change name of wave field to
version.

Changelog

  • Add script microdata_add_state_col__rename_wave.R

Add state field based on county FIPS. Change name of `wave` field to
`version`.
@nmdefries nmdefries requested a review from capnrefsmmat June 28, 2022 21:06
Contributor

@capnrefsmmat capnrefsmmat left a comment

I think this will be fine, just slightly more complicated than it has to be, since the FIPS conversion can be easier.

Not sure if it's worth changing the conversion if it already works. Could you maybe add an assertion after line 42 that checks that all rows with FIPS codes get a non-NA state? That should definitely happen, but if there's something weird about our mapping files (like with territories), we could get issues, and we don't want to have missingness because of that.
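A minimal sketch of the kind of assertion requested above, not the code that was merged. The column names `fips` and `state`, the toy data, and the `state_lookup` table are all assumptions made for illustration; the actual script may use different names and a different mapping source. The sketch relies on the fact that the first two digits of a 5-digit county FIPS code are the state FIPS code.

```r
# Hypothetical example data; the column name `fips` is an assumption.
data <- data.frame(
  fips = c("42003", "06037", NA, "72001"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from 2-digit state FIPS prefix to postal abbreviation.
# A real mapping file would cover all states and territories.
state_lookup <- c("42" = "PA", "06" = "CA", "72" = "PR")

# Derive the state from the first two digits of the county FIPS code.
# Rows with a missing FIPS code get a missing state.
data$state <- unname(state_lookup[substr(data$fips, 1, 2)])

# Assertion: every row that has a FIPS code must also have a state.
missing_state <- !is.na(data$fips) & is.na(data$state)
stopifnot(!any(missing_state))
```

If the lookup lacked an entry for a territory such as `"72"`, `stopifnot` would halt the script instead of silently writing `NA` states to the output.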

@nmdefries nmdefries requested a review from capnrefsmmat June 30, 2022 16:12
Comment on lines +73 to +77
# some people enter 9-digit ZIPs, which could make them easily identifiable in
# the individual output files. rather than truncating to 5 digits -- which may
# turn nonsense entered by some respondents into a valid ZIP5 -- we simply
# replace these ZIPs with NA.
data$zip5 <- ifelse(nchar(data$zip5) > 5, NA_character_,
Contributor Author

The logic for these two functions is borrowed from delphiFacebook/R/responses.R, but I'm wondering if this line is supposed to be handling A3. We drop zip5 anyway, so nulling it out here seems unnecessary. @capnrefsmmat

Contributor

filter_complete_responses does the other half of the work:

# what zip5 values have a large enough population (>100) to include in micro
# output. Those with too small of a population are blanked to NA
zip_metadata <- produce_zip_metadata(params$static_dir)[, c("zip5", "keep_in_agg")]
zipitude <- left_join(data_full, zip_metadata, by = "zip5")
change_zip <- !is.na(zipitude$keep_in_agg) & !zipitude$keep_in_agg
data_full$A3[change_zip] <- NA

We join with the ZIP metadata on zip5, then blank out A3 (not zip5) based on the results. Don't ask me why we need to use two different columns; undoubtedly the code did that in early 2020 and the logic just got ported over to this version.
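To make the two-column behavior concrete, here is a small self-contained sketch on toy data. The rows and the `keep_in_agg` values are invented, and base R `match` stands in for the `left_join` in the snippet above so the example runs without dplyr; it is an illustration of the logic, not the pipeline's code.

```r
# Toy responses: `zip5` is the cleaned ZIP, `A3` is the raw survey answer.
data_full <- data.frame(
  zip5 = c("15213", "99999", "00001"),
  A3   = c("15213", "99999", "00001"),
  stringsAsFactors = FALSE
)

# Toy metadata: keep_in_agg is FALSE for ZIPs with too small a population.
zip_metadata <- data.frame(
  zip5        = c("15213", "00001"),
  keep_in_agg = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Look up keep_in_agg by zip5; ZIPs absent from the metadata yield NA.
keep <- zip_metadata$keep_in_agg[match(data_full$zip5, zip_metadata$zip5)]

# Blank A3 only where the metadata explicitly says not to keep the ZIP.
change_zip <- !is.na(keep) & !keep
data_full$A3[change_zip] <- NA
```

In this sketch only the third row's `A3` is blanked. The ZIP that is missing from the metadata keeps its `A3` value, because `change_zip` requires a non-`NA` `keep_in_agg`.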

@nmdefries
Contributor Author

@krivard This is ready to merge.

@krivard krivard merged commit 7e5c8a4 into main Jul 11, 2022
@krivard krivard deleted the ndefries/microdata-state-col branch July 11, 2022 19:36
4 participants