Manually Spot-Check Maternal Mortality Data #2891

Closed
4 tasks
benhammondmusic opened this issue Feb 9, 2024 · 1 comment


benhammondmusic commented Feb 9, 2024

Manually Spot-Check Data

Do #2890 first. Remember that our automated tests can only confirm that the code itself functions as expected; they do not ensure scientific or mathematical accuracy. Before writing extensive tests, we need to manually confirm that our calculations and data transformations are accurate. Document the following checks in the PR description:

  • Pick a random sample of data points, aiming to cover every demographic breakdown and every geographic breakdown. Also check data points from any known edge cases.
  • CHECK 100K / PCT RATES: If the rates come directly from the source data, confirm that the values pass unchanged from the source into BigQuery and on into the exported Bucket files. If the rates are calculated in the datasource file, "show your work": recreate each calculation, showing the source numerator and the source denominator.
  • Repeat the manual calculations and documentation above for every displayed metric: PCT_SHARE, PCT_RELATIVE_INEQUITY, POP_PCT_SHARE, etc.
  • Side-by-side screenshots of the source table and the HET BigQuery table are also helpful to include in the PR documentation.
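
As a rough illustration of the "show your work" step for a per 100k rate, the sketch below recomputes a rate from a source numerator and denominator and compares it with the stored value. The helper names (`per_100k`, `matches`), the tolerance, and the sample numbers are all hypothetical and are not taken from the HET codebase or from any source table.

```python
import math


def per_100k(numerator: float, denominator: float) -> float:
    """Return numerator / denominator scaled to a rate per 100,000."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator * 100_000


def matches(recomputed: float, stored: float, abs_tol: float = 0.05) -> bool:
    """True if the recomputed rate agrees with the stored rate within abs_tol."""
    return math.isclose(recomputed, stored, rel_tol=0.0, abs_tol=abs_tol)


# Hypothetical values for illustration only (not real source data).
source_numerator = 50        # e.g. count of the condition in the source table
source_denominator = 200_000  # e.g. population in the source table
stored_rate = 25.0           # e.g. the per 100k value found in the exported table

recomputed = per_100k(source_numerator, source_denominator)
print(f"recomputed={recomputed:.1f} stored={stored_rate} ok={matches(recomputed, stored_rate)}")
```

The absolute tolerance is an assumption meant to absorb rounding applied when the stored value was written; pick it to match the number of decimals actually stored.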

benhammondmusic commented

SPOT CHECK

Current Year, National, Hispanic, Per 100k

Screenshot 2024-04-24 at 11 08 15 AM

2001, Alabama, Hispanic, Per 100k

Screenshot 2024-04-24 at 11 19 44 AM

1999, National, Black NH, pct_share of MM

In BigQuery
pct_rel_inequity = +206.7%

pct_rel_inequity = (pct_share of condition - pct_share of population) / pct_share of population

pct_share of condition = a race group's count of the condition / total count of the condition across all races

black mm count = 186
all mm count = 505

black pct_share_mm = 186/505 = 36.8%

black lb count = 593,200
all lb count = 3,965,200

black pct_share_pop (lb) = 593,200 / 3,965,200 = 15.0%

(36.8 - 15.0) / 15.0 = +145.3%, i.e. 145.3% higher than expected (not 145.3% of expected)

The manual result (+145.3%) does not match the BigQuery value (+206.7%). Need to do #3192 to calculate our pct_rel_inequity figures more accurately.
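
The manual calculation above can be reproduced with a short script. This is only a sketch of the formula as written in this comment, using the counts quoted here; the function names are hypothetical and it is not the HET pipeline code.

```python
def pct_share(group_count: float, total_count: float) -> float:
    """A group's share of a total, as a percentage."""
    return group_count / total_count * 100


def pct_relative_inequity(share_of_condition: float, share_of_population: float) -> float:
    """(pct_share of condition - pct_share of population) / pct_share of population, as a percentage."""
    return (share_of_condition - share_of_population) / share_of_population * 100


# Counts quoted in the spot check (1999, National, Black NH).
black_mm, all_mm = 186, 505                # maternal mortality counts
black_lb, all_lb = 593_200, 3_965_200      # population counts ("lb") used as the denominator

share_mm = pct_share(black_mm, all_mm)     # about 36.8
share_pop = pct_share(black_lb, all_lb)    # about 15.0

# Same arithmetic as the manual check: shares rounded to one decimal first.
rounded = pct_relative_inequity(round(share_mm, 1), round(share_pop, 1))
# Same formula without the intermediate rounding.
unrounded = pct_relative_inequity(share_mm, share_pop)

print(f"pct_share_mm={share_mm:.1f}% pct_share_pop={share_pop:.1f}%")
print(f"pct_rel_inequity rounded inputs: {rounded:+.1f}%")    # +145.3%
print(f"pct_rel_inequity unrounded inputs: {unrounded:+.1f}%")  # +146.2%
```

Rounding the two shares before dividing moves the result by roughly one percentage point, but neither version reproduces the +206.7% stored in BigQuery.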
