Manually Spot-Check Maternal Mortality Data #2891

benhammondmusic · 2024-02-09T04:21:12Z

Manually Spot-Check Data

Do #2890 first. Keep in mind that our automated tests can only confirm that the CODE itself functions the way we expect; they do NOT ensure scientific or mathematical accuracy. Before writing extensive tests, we need to manually confirm that our calculations and data transformations are accurate. The following checks should be documented in the description of a PR:

Pick a random sample of data points, aiming to include something from every demographic breakdown and every geographic breakdown. Also make sure data points from any known edge cases are checked.
CHECK 100K / PCT RATES: If you are obtaining rates directly from the source data, you can simply confirm that the data passes accurately from the source into BigQuery and on into the exported bucket. Otherwise, if calculations happen in the datasource file, "show your work": recreate those calculations, showing the source numerator and the source denominator.
Repeat the above manual calculations and documentation for all displayed metrics: PCT_SHARE, PCT_RELATIVE_INEQUITY, POP_PCT_SHARE, etc.
Screenshots showing the source table and the HET BigQuery table side by side can also be helpful to include in the PR documentation.
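The sampling and "show your work" steps above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the helper names (`stratified_sample`, `per_100k`, `matches`) and the row keys `demographic` and `geo` are hypothetical, and the tolerance is an arbitrary choice.

```python
import random
from collections import defaultdict


def stratified_sample(rows, keys, per_group=1, seed=0):
    """Pick up to `per_group` rows from every combination of `keys`,
    so each demographic / geographic breakdown is represented."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def per_100k(numerator, denominator):
    """Recreate a per-100k rate from the source numerator and denominator.
    Returns None when the denominator is missing or zero."""
    if not denominator:
        return None
    return numerator * 100_000 / denominator


def matches(expected, actual, tol=0.05):
    """True when the hand-calculated value agrees with the pipeline value
    within `tol` (an assumed tolerance to absorb rounding)."""
    return abs(expected - actual) <= tol
```

For each sampled row, the hand-calculated `per_100k(...)` value would be compared against the value in the BigQuery table using `matches(...)`, and any mismatch written up in the PR description.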

benhammondmusic · 2024-04-24T17:48:57Z

SPOT CHECK

Current Year, National, Hispanic, Per 100k

2001, Alabama, Hispanic, Per 100k

1999, National, Black NH, pct_share of MM

In BigQuery
pct_rel_inequity = +206.7%

pct_rel_inequity =
(pct_share of condition - pct_share of population) / pct_share of population

pct_share of condition = a race group's count of condition / total count of condition across all races

Black MM (maternal mortality) count = 186
all MM count = 505

Black pct_share_mm = 186 / 505 = 36.8%

Black LB (live births) count = 593,200
all LB count = 3,965,200

Black pct_share_pop (LB) = 593,200 / 3,965,200 = 15.0%

(36.8 - 15.0) / 15.0 = +145.3%, i.e. 145.3% higher than expected (not 145.3% of expected)
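The hand calculation above can be reproduced with a short Python sketch. The counts are the ones quoted in this comment; the function names are hypothetical and are not taken from the datasource file. The sketch computes the figure twice, once from shares rounded to one decimal (as done by hand) and once from unrounded shares, to show how much rounding alone moves the result.

```python
def pct_share(group_count, total_count):
    """A group's share of a total, as a percentage."""
    return group_count / total_count * 100


def pct_rel_inequity(share_of_condition, share_of_population):
    """(pct_share of condition - pct_share of population) / pct_share of population,
    expressed as a percentage."""
    return (share_of_condition - share_of_population) / share_of_population * 100


# 1999, National, Black: counts quoted in the comment above
black_mm, all_mm = 186, 505
black_lb, all_lb = 593_200, 3_965_200

share_mm = pct_share(black_mm, all_mm)    # rounds to 36.8
share_pop = pct_share(black_lb, all_lb)   # rounds to 15.0

# Same arithmetic as the hand calculation, using shares rounded to one decimal
rounded = pct_rel_inequity(round(share_mm, 1), round(share_pop, 1))

# Same formula without intermediate rounding
unrounded = pct_rel_inequity(share_mm, share_pop)

print(f"rounded shares:   {rounded:+.1f}%")
print(f"unrounded shares: {unrounded:+.1f}%")
```

The two results differ by less than one percentage point, so rounding of the intermediate shares does not account for the gap to the +206.7% shown in BigQuery.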

This hand-calculated value does not match the +206.7% in BigQuery. We need to do #3192 to calculate our pct_rel_inequity figures more accurately.

benhammondmusic added Data 👕 T-Shirt MD Python labels Feb 9, 2024

benhammondmusic added this to the Maternal Mortality Data Pipeline milestone Feb 9, 2024

benhammondmusic mentioned this issue Feb 9, 2024

Generate Maternal Mortality "golden_data" files #2892

Closed

6 tasks

benhammondmusic assigned JDemlow and alinix1 Apr 16, 2024

benhammondmusic closed this as completed May 15, 2024

benhammondmusic commented Feb 9, 2024 • edited by alinix1