Total cvap is not exactly the sum of each race #2

Closed
kuriwaki opened this issue Jul 18, 2022 · 3 comments
Comments

@kuriwaki
Contributor

This graph compares the total CVAP with the sum of the cvap_race partitions:

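library(cvap)
library(dplyr)
library(ggplot2)

# Block-level CVAP estimates for Delaware
block_est <- cvap_distribute_censable('DE')

block_est %>% 
    # Implied total: sum of the race-specific CVAP estimates
    mutate(cvap_implied_total = cvap_white + cvap_black + cvap_hisp + cvap_asian + 
               cvap_aian + cvap_nhpi + cvap_two + cvap_other) %>% 
    ggplot(aes(cvap, cvap_implied_total))  + 
    geom_abline(alpha = 0.5, color = "red") +  # 45-degree reference line
    geom_point(size = 0.1, alpha = 0.25) +
    coord_equal() + theme_bw()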

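[Image: scatter plot of cvap (x-axis) against cvap_implied_total (y-axis), with a red 45-degree reference line]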

So the implied total of races is not exactly the same as the cvap variable, though they are highly correlated. I guess this makes sense in that each cvap variable is separately estimated and so some noise gets included independently in each one. Might be worth a documentation note somewhere.

This particular discrepancy can be relevant in EI applications, when the row margins must add up exactly to the grand total.
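One way to see which rows would violate that EI constraint is to compare the two totals row by row. This is only a minimal sketch on made-up numbers, not real output from cvap_distribute_censable(): the column names follow the snippet above, while the toy values and the 0.5 tolerance are assumptions for illustration.

```r
# Toy block-level rows (made-up values); column names follow the snippet above.
race_cols <- c("cvap_white", "cvap_black", "cvap_hisp", "cvap_asian",
               "cvap_aian", "cvap_nhpi", "cvap_two", "cvap_other")

toy <- data.frame(
  cvap       = c(100, 50, 20),
  cvap_white = c(60, 31.5, 10),
  cvap_black = c(20, 10, 4),
  cvap_hisp  = c(10, 5, 3),
  cvap_asian = c(5, 2, 1),
  cvap_aian  = c(1, 1, 0),
  cvap_nhpi  = c(1, 0, 0),
  cvap_two   = c(2, 1, 1),
  cvap_other = c(1, 1, 0.25)
)

# Implied total: sum of the race-specific columns in each row.
toy$cvap_implied_total <- unname(rowSums(toy[, race_cols]))

# Gap between the reported total and the implied total.
toy$gap <- toy$cvap - toy$cvap_implied_total

# Flag rows whose margins agree within an (arbitrary) tolerance.
tol <- 0.5
toy$margins_ok <- abs(toy$gap) <= tol

toy[, c("cvap", "cvap_implied_total", "gap", "margins_ok")]
```

Rows where margins_ok is FALSE are the ones an EI routine that requires exact row margins would reject.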

@christopherkenny
Owner

Thanks Shiro. This is an interesting point. They should always align at the block group level, but the block level is necessarily estimated and it uses the subgroup that matches for that purpose. I can add a note.

Maybe it's worth adding a method that distributes everything by the same piece, so that the counts are consistent. Would that be helpful?

@kuriwaki
Contributor Author

A version that is consistent with respect to totals would be nice for EI, especially if that carries through to the VTD level. But it's not a strict necessity for me at the moment. For now we can ignore the total cvap variable and use the implied total as the total cvap population. Thanks!
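That workaround could look roughly like the sketch below: take the implied total as the grand total, then sum blocks up to VTDs. Because every column is aggregated with a plain sum, the row margins still add up after aggregation. The vtd column, the cvap_total name, and all values are hypothetical; only the race column names come from this thread.

```r
race_cols <- c("cvap_white", "cvap_black", "cvap_hisp", "cvap_asian",
               "cvap_aian", "cvap_nhpi", "cvap_two", "cvap_other")

# Toy blocks (made-up values) with a hypothetical VTD assignment.
blocks <- data.frame(
  vtd        = c("A", "A", "B"),
  cvap       = c(100, 50, 20),
  cvap_white = c(60, 31.5, 10),
  cvap_black = c(20, 10, 4),
  cvap_hisp  = c(10, 5, 3),
  cvap_asian = c(5, 2, 1),
  cvap_aian  = c(1, 1, 0),
  cvap_nhpi  = c(1, 0, 0),
  cvap_two   = c(2, 1, 1),
  cvap_other = c(1, 1, 0.25)
)

# Ignore the reported total and use the implied total instead.
blocks$cvap_total <- unname(rowSums(blocks[, race_cols]))

# Sum the implied total and the race columns within each VTD.
vtds <- aggregate(blocks[, c("cvap_total", race_cols)],
                  by = list(vtd = blocks$vtd), FUN = sum)

# Check that the margins still add up at the VTD level.
isTRUE(all.equal(vtds$cvap_total, unname(rowSums(vtds[, race_cols]))))
```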

christopherkenny added a commit that referenced this issue Mar 16, 2023
@christopherkenny
Owner

Sorry @kuriwaki, I let this get lost.

I've added an option include_implied, which adds a column impl_cvap for the implied cvap. This allows you to retain both the "best" total and the "implied" total. The default for include_implied is TRUE, since it adds only about 10 ms to the runtime on average!
