Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rounding/Truncation Error when calculating strata incidence rates #1307

Closed
chrisknoll opened this issue Aug 22, 2019 · 0 comments · Fixed by #1310
Closed

Rounding/Truncation Error when calculating strata incidence rates #1307

chrisknoll opened this issue Aug 22, 2019 · 0 comments · Fixed by #1310
Assignees
Milestone

Comments

@chrisknoll
Copy link
Collaborator

When creating an incidence rate calculation with multiple strata, and one of the strata groups included the entire cohort, there is a mismatch in the summary time at risk compared to the individual strata group's time at risk, as shown below:

image

Note: this is on PDW. This appears to be due to how the CAST(SUM(tar/365.25) as bigint) is casting all of the inner expressions to BIGINT before summing, which leads to a small truncation of decimal values for each row aggregated. The expectation was that the SUM would sum the decimal values together and then cast the result to BIGINT, but that does not appear to be the case.

After some testing, the fix is to write it in the following form:

old:

select 77 as analysis_id, T.target_id, T.outcome_id, CAST(E.strata_mask AS bigint) as bitmask,
  COUNT(subject_id) as person_count, 
  CAST(sum(1.0 * time_at_risk / 365.25) AS BIGINT) as time_at_risk,
  sum(is_case) as cases
from #time_at_risk T
JOIN (
  select E.event_id, E.person_id, E.start_date, E.end_date, SUM(coalesce(POWER(cast(2 as bigint), SC.strata_sequence), 0)) as strata_mask
  FROM #analysis_events E
  LEFT JOIN #strataCohorts SC on SC.person_id = E.person_id and SC.event_id = E.event_id
  group by E.event_id, E.person_id, E.start_date, E.end_date
) E on T.subject_id = E.person_id and T.cohort_start_date = E.start_date and T.cohort_end_date = E.end_date
GROUP BY T.target_id, T.outcome_id, E.strata_mask
)

Fixed:

select 77 as analysis_id, T.target_id, T.outcome_id, CAST(E.strata_mask AS bigint) as bitmask,
  COUNT(subject_id) as person_count, 
  CAST(ROUND(sum(1.0 * time_at_risk / 365.25), 0) AS BIGINT) as time_at_risk,
  sum(is_case) as cases
from #time_at_risk T
JOIN (
  select E.event_id, E.person_id, E.start_date, E.end_date, SUM(coalesce(POWER(cast(2 as bigint), SC.strata_sequence), 0)) as strata_mask
  FROM #analysis_events E
  LEFT JOIN #strataCohorts SC on SC.person_id = E.person_id and SC.event_id = E.event_id
  group by E.event_id, E.person_id, E.start_date, E.end_date
) E on T.subject_id = E.person_id and T.cohort_start_date = E.start_date and T.cohort_end_date = E.end_date
GROUP BY T.target_id, T.outcome_id, E.strata_mask

In the fixed version the ROUND seems to be applied after the SUM completes, and the resulting round value is cast into bigint (to store the final result as an integer). This leads to a matching result of the ovarll cohort IR and the strata group that contains all members of the cohort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant