Documentation on cohort stats tables #79

Open
ablack3 opened this issue Feb 22, 2023 · 7 comments

Comments

@ablack3
Contributor

ablack3 commented Feb 22, 2023

I'm not aware of any documentation that describes how to interpret the stats/attrition tables created during cohort generation.

@catalamarti decoded the inclusion_result table:

This table contains 4 columns: cohort_definition_id, inclusion_rule_mask, person_count, mode_id

  • cohort_definition_id refers to the cohort.
  • person_count is the number of persons that fulfill the inclusion_rule_mask condition
  • mode_id is 0 to refer to events and 1 to refer to subjects
  • inclusion_rule_mask defines the set of inclusion rules to associate with each row of the table

inclusion_rule_mask interpretation:

Each inclusion rule that is satisfied contributes 2^(inclusion_id) to the mask, and the inclusion_rule_mask is the sum of those contributions. Each mask value therefore identifies one combination of satisfied inclusion rules.

Example:
Let's say we have 3 inclusion rules (inclusion rule 0, inclusion rule 1, and inclusion rule 2).
The first inclusion rule contributes 2^0 = 1, the second 2^1 = 2, and the third 2^2 = 4.
So, for example, individuals that fulfill the second and third conditions but not the first are recorded under inclusion_rule_mask = 2 + 4 = 6.

See the table below for all combinations for the three rules example:

inclusion 0   inclusion 1   inclusion 2   inclusion_rule_mask
  no            no            no            0
  yes           no            no            1
  no            yes           no            2
  yes           yes           no            3
  no            no            yes           4
  yes           no            yes           5
  no            yes           yes           6
  yes           yes           yes           7
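
The table above can be reproduced with a few lines of code. This is a minimal sketch in Python, used here only for illustration; the helper names `rules_satisfied` and `mask_for` are hypothetical and not part of CohortGenerator or circe-be.

```python
def rules_satisfied(mask: int, n_rules: int) -> list[bool]:
    """For each inclusion rule id 0..n_rules-1, report whether its bit is set in mask."""
    return [bool(mask & (1 << rule_id)) for rule_id in range(n_rules)]


def mask_for(rule_ids) -> int:
    """Build the mask for a set of satisfied inclusion rule ids (sum of 2^id)."""
    return sum(1 << rule_id for rule_id in set(rule_ids))


# Print the same 8 rows as the three-rule table: one yes/no per rule, then the mask.
for mask in range(2 ** 3):
    flags = ["yes" if ok else "no" for ok in rules_satisfied(mask, 3)]
    print(*flags, mask)
```

Shifting `1 << rule_id` is the same as computing 2^rule_id, so `mask_for([1, 2])` gives the 2 + 4 = 6 from the example.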

In this case, we will build our attrition as:
all qualifying initial events: 0+1+2+3+4+5+6+7
satisfy inclusion 0: 1+3+5+7
satisfy inclusion 0 and 1: 3+7
satisfy inclusion 0, 1, and 2: 7

Can we add this to the CohortGenerator vignette or put it somewhere else?

@anthonysena
Collaborator

Thanks @ablack3 for raising this issue. I agree this would be useful to document, either in this package or in circe-be. Tagging @chrisknoll since I am unsure what resources (beyond your write-up above) exist. If we could link to those resource(s) in this issue, we could then add them to the CohortGenerator package (or link to them from the CG package).

@chrisknoll

chrisknoll commented Mar 20, 2023

Hi, everyone,
Sorry for the late reply here, just wanted to clarify something:

In this case, we will build our attrition as:
all qualifying initial events: 0+1+2+3+4+5+6+7
satisfy inclusion 0: 1+3+5+7
satisfy inclusion 0 and 1: 3+7
satisfy inclusion 0, 1, and 2: 7

The 'all qualifying' line is correct: if you want to know the count of people who had entry events, you would just sum up all the rows (including the 0 row; we record the number of people that matched 0 rules). I was a little confused when @ablack3 described it as 0+1+2+3+4+5+6+7, but that's all the combinations of 3 inclusion rules, so it's technically correct. There's just a simpler implementation: sum up the counts in inclusion_result.

To find the rows that match certain inclusion rules, you would use the bitwise AND operator & to check whether the number in the inclusion_rule_mask column matches the inclusion rules you want to test. So 'satisfy inclusion 0' means: is the first bit (2^0 = 1) set? To find out, you would do inclusion_rule_mask & 1 = 1. In this case any result > 0 would indicate that the flag is set; however, in the multi-bit test it becomes clearer why you compare against the tested value:

'satisfy inclusion 0 and 1' means that the bits you are looking for are 2^0 + 2^1 = 1 + 2 = 3 (same as the inclusion_rule_mask from the above table). To find the rows that have those 2 bits set: inclusion_rule_mask & 3 = 3. Why the = 3? Because if you tested a row where inclusion_rule_mask was 1, the bitwise AND would give 1 & 3 = 1 (i.e. only the 1 bit of the 3 is set). What you want to ensure is that the bitwise AND results in the same value as the bits you are testing: 3 & 3 = 3 and 7 & 3 = 3. The other rows are: 0 & 3 = 0, 1 & 3 = 1, 2 & 3 = 2, 4 & 3 = 0, 5 & 3 = 1, 6 & 3 = 2. Note how some of these give a number > 0, but not the number you are testing for. Anything that is not 3 does not match on the first AND second rule.
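
The multi-bit test can be checked by running it over all eight mask values from the three-rule example. This is a small illustrative sketch in Python; `satisfies_all` is a hypothetical helper name, not an existing function in any OHDSI package.

```python
def satisfies_all(mask: int, required: int) -> bool:
    """True when every bit set in `required` is also set in `mask`."""
    return (mask & required) == required


required = 0b011  # inclusion rules 0 and 1: 2^0 + 2^1 = 3

# Show the raw bitwise AND next to the equality test for every mask 0..7.
for mask in range(8):
    print(mask, mask & required, satisfies_all(mask, required))

# Only the masks that have both bits set pass the test.
matching = [mask for mask in range(8) if satisfies_all(mask, required)]
```

Comparing against `required`, rather than checking for a result greater than 0, is what excludes masks such as 1, 2, 5, and 6, where only one of the two bits is set.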

I hope this clarifies things. I was personally a little confused when I read "satisfy inclusion 0: 1+3+5+7", but I now understand that to mean you add up the rows where the inclusion rule mask is 1, 3, 5, or 7. Mechanically, that is filter(inclusion_rule_mask & 1 = 1) %>% sum(person_count) (in R dplyr pseudocode :) )
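
The filter-and-sum step can be written out as a runnable sketch. It is in Python rather than R, purely for illustration; the function name `count_matching` and the counts are hypothetical, and the rows are assumed to belong to a single cohort_definition_id and mode_id.

```python
from typing import Iterable, Mapping


def count_matching(rows: Iterable[Mapping[str, int]], required_mask: int) -> int:
    """Sum person_count over rows whose mask has every bit of required_mask set."""
    return sum(
        row["person_count"]
        for row in rows
        if (row["inclusion_rule_mask"] & required_mask) == required_mask
    )


# Hypothetical counts for masks 0..7; these are made-up numbers, not real data.
rows = [
    {"inclusion_rule_mask": mask, "person_count": count}
    for mask, count in enumerate([10, 20, 5, 15, 8, 12, 6, 24])
]

total = count_matching(rows, 0)          # a required mask of 0 matches every row
rule_0 = count_matching(rows, 1)         # rows with mask 1, 3, 5, 7
rules_0_and_1 = count_matching(rows, 3)  # rows with mask 3, 7
all_three = count_matching(rows, 7)      # row with mask 7 only
```

In actual R, `&` is the logical AND operator, so a runnable dplyr version would need a bitwise function such as base R's `bitwAnd()` in place of the `&` in the pseudocode.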

@ablack3
Contributor Author

ablack3 commented Mar 20, 2023

Thanks @chrisknoll. This would be really helpful to add to a vignette or some other documentation. So actually I have to give @catalamarti credit for decoding this. I just read over it and posted his description here.

@pa-nathaniel

Jumping in this thread as I got here while trying to figure out how to obtain a cohort attrition table from cohorts created by CohortGenerator::generateCohortSet().

Essentially, we're trying to figure out how to generate a table where each row represents an inclusion criterion, with a column that shows the number of persons remaining in the cohort after that inclusion criterion has been applied.

Have there been updates to the documentation here on how to create this?

FYI I also posted a similar question in https://forums.ohdsi.org/t/how-to-get-attrition-table-in-hades/19746.
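
A table of the kind asked about here could be built from inclusion_result rows roughly as follows. This is a sketch under stated assumptions, not an existing CohortGenerator function: `attrition_table`, the rule names, and all counts are hypothetical, the rows are assumed to be already loaded as dictionaries, and inclusion rule ids are assumed to run 0, 1, 2, ... in the order the rules are applied.

```python
def attrition_table(rows, rule_names, cohort_definition_id, mode_id):
    """Build [(label, count), ...]: all entry events first, then the count
    remaining after each inclusion rule is applied in order (rule ids 0, 1, ...)."""
    selected = [
        row for row in rows
        if row["cohort_definition_id"] == cohort_definition_id
        and row["mode_id"] == mode_id
    ]
    table = [("all qualifying initial events",
              sum(row["person_count"] for row in selected))]
    required = 0
    for rule_id, name in enumerate(rule_names):
        required |= 1 << rule_id  # add this rule's bit to the cumulative mask
        remaining = sum(
            row["person_count"] for row in selected
            if (row["inclusion_rule_mask"] & required) == required
        )
        table.append((name, remaining))
    return table


# Hypothetical inclusion_result rows; the counts and rule names are made up.
counts_by_mask = [10, 20, 5, 15, 8, 12, 6, 24]
example_rows = [
    {"cohort_definition_id": 1, "mode_id": 1,
     "inclusion_rule_mask": mask, "person_count": count}
    for mask, count in enumerate(counts_by_mask)
]
# A row from another cohort, to show that it is filtered out.
example_rows.append({"cohort_definition_id": 2, "mode_id": 1,
                     "inclusion_rule_mask": 7, "person_count": 999})

example_table = attrition_table(
    example_rows, ["rule 0", "rule 1", "rule 2"],
    cohort_definition_id=1, mode_id=1)
for label, count in example_table:
    print(f"{label}: {count}")
```

Each row uses a cumulative mask (1, then 3, then 7 for three rules), which matches the "satisfy inclusion 0", "0 and 1", "0, 1, and 2" steps described earlier in the thread.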

@chrisknoll

chrisknoll commented Sep 1, 2023

Hi, I think there has been a request to provide a function that can read the cohort generation stats tables (that store the individual inclusion rule matches, and the combination of inclusion rules described above).

I've seen implementations of building this attrition table in JavaScript (Atlas does it this way), and we've also done it internally using R code (I believe @gowthamrao and Joel Swerdel have implemented this). I think it would make sense to expose a CohortGenerator function to read the results in the generation stats tables and produce attrition tables, and I'm happy to help make a PR to implement this feature.

@pa-nathaniel

Thanks @chrisknoll! We're currently trying to come up with something (will share if we get it right), but until then we would love to see what you and others have come up with.

@anthonysena
Collaborator

Relating this to #123: the two issues are a bit different, but the documentation should cover both approaches.
