Documentation on cohort stats tables #79

ablack3 · 2023-02-22T15:49:36Z

I'm not aware of any documentation that describes how to interpret the stats/attrition tables created during cohort generation.

@catalamarti decoded the inclusion_result table:

This table contains 4 columns: cohort_definition_id, inclusion_rule_mask, person_count, mode_id

cohort_definition_id refers to the cohort.
person_count is the number of persons that fulfill the inclusion_rule_mask condition.
mode_id is 0 to refer to events and 1 to refer to subjects.
inclusion_rule_mask defines the set of inclusion rules associated with each row of the table.

inclusion_rule_mask interpretation:

Each inclusion rule contributes 2^(inclusion_id) to the mask, and the inclusion_rule_mask is the sum of the contributions of the rules that are satisfied. Each mask value therefore identifies one combination of inclusion rules, i.e. one subset of the cohort.

Example:
Let's say we have 3 inclusion rules (inclusion rule 0, inclusion rule 1, and inclusion rule 2).
The first inclusion will contribute 2^0=1, the second 2^1=2, and the third inclusion: 2^2=4.
So, for example, individuals that fulfill the third and second conditions, but not the first will be recorded in inclusion_rule_mask = 2 + 4 = 6.

See the table below for all combinations for the three rules example:

inclusion 0   inclusion 1   inclusion 2   inclusion_rule_mask
  no            no            no            0
  yes           no            no            1
  no            yes           no            2
  yes           yes           no            3
  no            no            yes           4
  yes           no            yes           5
  no            yes           yes           6
  yes           yes           yes           7
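As a rough illustration of the encoding in the table above, here is a minimal Python sketch that converts between a list of satisfied rule ids and the mask value. The helper names mask_for and rules_in are made up for this example and are not part of CohortGenerator or circe-be.

```python
def mask_for(rule_ids):
    """Sum 2**i over the satisfied rule ids, e.g. [1, 2] -> 2 + 4 = 6."""
    return sum(1 << i for i in set(rule_ids))


def rules_in(mask):
    """List the rule ids whose bit is set in the mask, e.g. 5 -> [0, 2]."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


# Reproduce the three-rule table: one line per mask value 0..7.
for mask in range(8):
    print(mask, rules_in(mask))
```

Decoding and re-encoding any mask gives back the same number, which is why each row of the table corresponds to exactly one combination of rules.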

In this case, we will build our attrition as:
all qualifying initial events: 0+1+2+3+4+5+6+7
satisfy inclusion 0: 1+3+5+7
satisfy inclusion 0 and 1: 3+7
satisfy inclusion 0, 1, and 2: 7
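The sums above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name attrition is hypothetical, the counts are invented, and the input is assumed to be the person_count per inclusion_rule_mask for a single cohort and a single mode_id.

```python
def attrition(count_by_mask, n_rules):
    """Return [all qualifying, pass rule 0, pass rules 0-1, ...]."""
    steps = [sum(count_by_mask.values())]  # every row, including mask 0
    required = 0
    for rule_id in range(n_rules):
        required |= 1 << rule_id  # add this rule's bit to the requirement
        steps.append(
            sum(n for mask, n in count_by_mask.items()
                if (mask & required) == required)
        )
    return steps


# Invented counts for masks 0..7 (three inclusion rules).
example = {0: 10, 1: 20, 2: 5, 3: 15, 4: 8, 5: 12, 6: 4, 7: 26}
print(attrition(example, 3))  # [100, 73, 41, 26]
```

With these made-up counts, the second entry is 20 + 15 + 12 + 26 (masks 1, 3, 5, 7), the third is 15 + 26 (masks 3 and 7), and the last is the count for mask 7 alone.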

Can we add this to the CohortGenerator vignette or put it somewhere else?

anthonysena · 2023-02-28T15:58:27Z

Thanks @ablack3 for raising this issue and agree this would be useful to document, either in this package or in circe-be. Tagging @chrisknoll since I am unsure what resources (beyond your write up above) exist. If we could link to those resource(s) in this issue, we could then add it to the CohortGenerator package (or link to it from the CG package).

chrisknoll · 2023-03-20T18:27:09Z

Hi, everyone,
Sorry for the late reply here, just wanted to clarify something:

In this case, we will build our attrition as:
all qualifying initial events: 0+1+2+3+4+5+6+7
satisfy inclusion 0: 1+3+5+7
satisfy inclusion 0 and 1: 3+7
satisfy inclusion 0, 1, and 2: 7

The 'all qualifying' line is correct: if you want to know the count of people who had entry events, you would just sum up all the rows (including the 0 row, where we record the number of people that matched 0 rules). I was a little confused when @ablack3 described it as 0+1+2+3+4+5+6+7, but that's all the combinations of 3 inclusion rules, so that's technically correct. There's just a simpler implementation: sum up the counts in inclusion_result.

To find the rows that match certain inclusion rules, you would use the bitwise AND operator & to see if the number in the inclusion_rule_mask column matches the inclusion rules you want to test. So, 'satisfy inclusion 0' means: is the first bit (2^0 = 1) set? To find out, you would do inclusion_rule_mask & 1 = 1. In this case any result > 0 would indicate that the flag is set; however, in the multi-bit test, it becomes clearer why you compare against the tested value:

'satisfy inclusion 0 and 1' means that the bits you are looking for are 2^0 + 2^1 = 1+2 = 3 (same as the inclusion_rule_mask from the above table). To find the rows that have those 2 bits set: inclusion_rule_mask & 3 = 3. Why the = 3? Because if you tested a row where inclusion_rule_mask was 1, the above bitwise-and would give 1 & 3 = 1 (i.e. only the 1 bit of the 3 is set). What you want to ensure is that the bitwise-and results in the same value as the bits you are testing: 3 & 3 = 3 and 7 & 3 = 3. The other rows are: 0 & 3 = 0, 1 & 3 = 1, 2 & 3 = 2, 4 & 3 = 0, 5 & 3 = 1, 6 & 3 = 2. Note how some of these give a number > 0, but not the number you are testing for. Anything that is not 3 does not match on the first AND second rule.
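To make the difference between the "> 0" test and the "= 3" test concrete, here is a small Python sketch that runs both over the eight masks of the three-rule example. The variable names are illustrative only.

```python
required = 0b011  # rules 0 and 1: 2**0 + 2**1 = 3

# Show the bitwise-and result for every mask in the three-rule example.
for mask in range(8):
    anded = mask & required
    print(f"{mask} & {required} = {anded}")

# "> 0" passes masks with at least one of the two bits set.
any_bit = [m for m in range(8) if (m & required) > 0]
# "== required" passes only masks with both bits set.
all_bits = [m for m in range(8) if (m & required) == required]

print(any_bit)   # [1, 2, 3, 5, 6, 7]
print(all_bits)  # [3, 7]
```

Only masks 3 and 7 pass the equality test, which matches the "satisfy inclusion 0 and 1: 3+7" line of the attrition.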

I hope this clarifies things. I was personally a little confused when I read "satisfy inclusion 0: 1+3+5+7", but I now understand that to mean you add up the rows where inclusion_rule_mask = 1 or 3 or 5 or 7. Mechanically, that is filter(bitwAnd(inclusion_rule_mask, 1) == 1) %>% summarise(sum(person_count)) (in R dplyr pseudocode, where base R spells the bitwise AND as bitwAnd() :) )
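The same filter-then-sum step can be sketched in plain Python over rows shaped like inclusion_result. The rows below are invented, and persons_matching is a hypothetical helper. Restricting to one cohort_definition_id and one mode_id is an assumption based on the column list earlier in this thread, since summing across both mode_id values would mix event and subject counts.

```python
def persons_matching(rows, cohort_id, mode_id, required_mask):
    """Sum person_count over rows whose mask has every required bit set."""
    return sum(
        r["person_count"]
        for r in rows
        if r["cohort_definition_id"] == cohort_id
        and r["mode_id"] == mode_id
        and (r["inclusion_rule_mask"] & required_mask) == required_mask
    )


# Invented rows: (cohort_definition_id, inclusion_rule_mask, person_count, mode_id)
columns = ("cohort_definition_id", "inclusion_rule_mask", "person_count", "mode_id")
rows = [dict(zip(columns, values)) for values in [
    (1, 0, 10, 0), (1, 1, 20, 0), (1, 3, 15, 0), (1, 7, 26, 0),
    (1, 1, 18, 1), (1, 7, 25, 1),
    (2, 1, 99, 0),
]]

print(persons_matching(rows, cohort_id=1, mode_id=0, required_mask=1))  # 61
print(persons_matching(rows, cohort_id=1, mode_id=0, required_mask=3))  # 41
```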

ablack3 · 2023-03-20T19:05:37Z

Thanks @chrisknoll. This would be really helpful to add to a vignette or some other documentation. So actually I have to give @catalamarti credit for decoding this. I just read over it and posted his description here.

pa-nathaniel · 2023-09-01T12:20:50Z

Jumping in this thread as I got here while trying to figure out how to obtain a cohort attrition table from cohorts created by CohortGenerator::generateCohortSet().

Essentially we're trying to figure out how to generate a table where each row represents an inclusion criterion, with a column that shows the number of persons remaining in the cohort after that inclusion criterion is applied.

Have there been updates to the documentation here on how to create this?

FYI I also posted a similar question in https://forums.ohdsi.org/t/how-to-get-attrition-table-in-hades/19746.

chrisknoll · 2023-09-01T15:57:05Z

Hi, I think there has been a request to provide a function that can read the cohort generation stats tables (that store the individual inclusion rule matches, and the combination of inclusion rules described above).

I've seen implementations of building this attrition table in JavaScript (Atlas does it this way), but we've also done it internally using R code (I believe @gowthamrao and Joel Swerdel have implemented this). I think it would make sense to expose a CohortGenerator function to read the results in the generation stats tables and produce attrition tables, and I'm happy to help make a PR to implement this feature.

pa-nathaniel · 2023-09-01T21:20:48Z

Thanks @chrisknoll ! We're trying right now to come up with something (will share if we get it right), but until then would love to see what you and others have come up with.

anthonysena · 2023-10-18T16:20:12Z

Relating this to #123: the two issues are a bit different, but the documentation should cover both approaches.
