Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Verification: Crohn's Disease Cohort #102

Open
callahantiff opened this issue Jul 22, 2019 · 13 comments
Open

Query Verification: Crohn's Disease Cohort #102

callahantiff opened this issue Jul 22, 2019 · 13 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@callahantiff
Copy link
Owner

callahantiff commented Jul 22, 2019

@mgkahn - Can you please help me verify the query to select Crohn's Disease patients?


COHORT CRITERIA
Case Criteria:
CASE TYPE 1: Crohn’s and medications only

  • dx_case_inclusion_criteria_1: Presence of >2 occurrences of relevant diagnosis codes (ICD9 codes - criteria # 1)
  • rx_case_inclusion_criteria_1: Presence of >=1 medications (drug strings - criteria # 2)
  • dx_case_exclusion_criteria_1: NOT presence of any ulcerative colitis diagnosis codes (ICD9 codes - criteria # 3)

CASE TYPE 2: Crohn’s and medications with Ulcerative Colitis Diagnosis

  • dx_case_inclusion_criteria_1: Presence of >2 occurrences of relevant diagnosis codes (ICD9 codes - criteria # 1)
  • rx_case_inclusion_criteria_1: Presence of >=1 medications (drug strings - criteria # 2)
  • dx_case_inclusion_criteria_2: Presence of >2 ulcerative colitis diagnosis codes (ICD9 codes - criteria # 4)

Control Criteria:

  • None of the above criteria:
    • dx_control_exclusion_criteria_1: NOT presence of >2 occurrences of relevant codes (ICD9 codes - criteria # 5)
    • rx_control_exclusion_criteria_1: NOT presence of >=1 medications (drug strings - criteria # 6)
    • dx_control_exclusion_criteria_2: NOT presence of >2 ulcerative colitis diagnosis codes (ICD9 codes - criteria # 7)
  • dx_control_exclusion_criteria_3: NOT presence of ANY exclusionary diagnosis code (ICD9 codes - criteria # 8)
  • dx_control_exclusion_criteria_4: NOT presence of ANY diagnosis keywords (condition strings - criteria # 9)


Cohort Logic Table

COHORT CHUNK LOGICAL OPERATOR
CASE TYPE 1 dx_case_inclusion_criteria_1 AND
CASE TYPE 1 rx_case_inclusion_criteria_1 AND
CASE TYPE 1 dx_case_exclusion_criteria_1 ---
CASE TYPE 2 dx_case_inclusion_criteria_1 AND
CASE TYPE 2 rx_case_inclusion_criteria_1 AND
CASE TYPE 2 dx_case_inclusion_criteria_2 ---
CONTROL dx_control_exclusion_criteria_1 AND
CONTROL rx_control_exclusion_criteria_1 AND
CONTROL dx_control_exclusion_criteria_2 AND
CONTROL dx_control_exclusion_criteria_3 AND
CONTROL dx_control_exclusion_criteria_4 ---

NOTE.

  1. Replace all instances of {database} with CHCO_DeID_Oct2018
  2. Comment out all instances of {code_set_group}

Query can be found here and is also included below:

WITH dx_case_inclusion_criteria_1 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.condition_occurrence co, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 1
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  co.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT co.condition_concept_id) > 2
),

rx_case_inclusion_criteria_1 AS (
SELECT de.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.drug_exposure de, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  de.drug_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 2 
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  de.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT de.drug_concept_id) >= 1
),

dx_case_exclusion_criteria_1 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set
FROM 
  {database}.condition_occurrence co,
  {database}.CROHNSDISEASE_COHORT_VARS cohort
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64)
  AND cohort.phenotype_definition_number = 3
  AND cohort.standard_code_set = {code_set_group}
GROUP BY co.person_id, cohort.standard_code_set
HAVING 
  COUNT(DISTINCT co.condition_concept_id) >= 1
),

dx_case_inclusion_criteria_2 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.condition_occurrence co, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 4
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  co.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT co.condition_concept_id) > 2
),

dx_control_exclusion_criteria_1 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.condition_occurrence co, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 5
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  co.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT co.condition_concept_id) > 2
),

rx_control_exclusion_criteria_1 AS (
SELECT de.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.drug_exposure de, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  de.drug_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 6
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  de.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT de.drug_concept_id) >= 1
),

dx_control_exclusion_criteria_2 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.condition_occurrence co, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 7
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  co.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT co.condition_concept_id) > 2
),

dx_control_exclusion_criteria_3 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.condition_occurrence co, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 8
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  co.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT co.condition_concept_id) >= 1
),

dx_control_exclusion_criteria_4 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.condition_occurrence co, 
  {database}.CROHNSDISEASE_COHORT_VARS cohort 
WHERE 
  co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 9
  AND cohort.standard_code_set = {code_set_group}
GROUP BY 
  co.person_id, cohort.standard_code_set 
HAVING 
  COUNT(DISTINCT co.condition_concept_id) >= 1
)

SELECT * FROM
  (SELECT person_id, code_set, 'CROHNSDISEASE_CASE_TYPE1' AS cohort_type
    FROM (
    (SELECT person_id, code_set FROM dx_case_inclusion_criteria_1
      INTERSECT DISTINCT 
     SELECT person_id, code_set FROM rx_case_inclusion_criteria_1)
      EXCEPT DISTINCT
     SELECT person_id, code_set FROM dx_case_exclusion_criteria_1)
   GROUP BY person_id, code_set, cohort_type)

  UNION ALL

  (SELECT person_id, code_set, 'CROHNSDISEASE_CASE_TYPE2' AS cohort_type
    FROM (
     SELECT person_id, code_set FROM dx_case_inclusion_criteria_1
      INTERSECT DISTINCT 
     SELECT person_id, code_set FROM rx_case_inclusion_criteria_1
      INTERSECT DISTINCT 
     SELECT person_id, code_set FROM dx_case_inclusion_criteria_2)
   GROUP BY person_id, code_set, cohort_type)

  UNION ALL
  (SELECT p.person_id, {code_set_group} AS code_set, 'CROHNSDISEASE_CONTROL' AS cohort_type
    FROM {database}.person p
    WHERE p.person_id NOT IN (
      SELECT person_id FROM dx_control_exclusion_criteria_1
        INTERSECT DISTINCT
       SELECT person_id FROM rx_control_exclusion_criteria_1
        INTERSECT DISTINCT
       SELECT person_id FROM dx_control_exclusion_criteria_2)
        INTERSECT DISTINCT
       SELECT person_id FROM dx_control_exclusion_criteria_3
        INTERSECT DISTINCT
       SELECT person_id FROM dx_control_exclusion_criteria_4)
  GROUP BY p.person_id, code_set, cohort_type)
;
@callahantiff callahantiff added the help wanted Extra attention is needed label Jul 22, 2019
@callahantiff callahantiff self-assigned this Jul 22, 2019
@callahantiff callahantiff added this to Needed Scripts in Coding Tasks via automation Jul 22, 2019
@mgkahn
Copy link
Collaborator

mgkahn commented Aug 6, 2019

Edited to reflect "three or more" (>2) not 2.

dx_case_inclusion_criteria_1: The SQL is written as if the criteria required the 3 or more occurrences to be different inclusion diagnoses. Another interpretation of the same English is that the same/different inclusion diagnosis appears in three different encounters. If the second sentence is the intended interpretation, then you need to bring in three visit_occurrences (v1, v2,v3), make sure then are not equal (e.g. V1< V2, V2 < V3) and look at their linked diagnoses independently as I sketch out with a previous phenotype that needed two visits 30 days apart.

@mgkahn
Copy link
Collaborator

mgkahn commented Aug 6, 2019

dx_case_exclusion_criteria_1: I do not see justification for
HAVING
COUNT(DISTINCT co.condition_concept_id) > 1

The English ("any dx") implies that any instant of an exclusion criteria is qualifying. If so, a count >= 1 (including the equals) is correct.

@mgkahn
Copy link
Collaborator

mgkahn commented Aug 6, 2019

dx_case_inclusion_criteria_2
Same question as previous. The current query looks for two distinct UC codes rather than the same UC code at two different visits. Make sure the query is the intended semantics of the English.

CASE TYPE 2 is a confusing case definition. CASE TYPE 1 explicitly removes patients with any UC diagnosis yet CASE TYPE 2 explicitly puts them back in. Between these two definitions, the only UC cases not included have only 1 UC Dx.

Is this the intended behavior of the combination of the two inclusion types? Seems so by the text after "CASE TYPE 2" but weird.

@mgkahn
Copy link
Collaborator

mgkahn commented Aug 6, 2019

If I followed the logic for controls correctly, the query would allow a person to be declared a control if they had 0-1 relevant codes, 1 relevant medication, 0-1 ulcerative colitis codes and none of the exclusion_criteria_3 and _4 codes. Without looking at what is in exclusion_criteria_3 and _4 codes, this logic would allow somebody who had some "hints" about Crohn's and UC to be included.

@mgkahn
Copy link
Collaborator

mgkahn commented Aug 6, 2019

End of comments for Crohn's

@callahantiff
Copy link
Owner Author

Edited to reflect "three or more" (>2) not 2.

dx_case_inclusion_criteria_1: The SQL is written as if the criteria required the 3 or more occurrences to be different inclusion diagnoses. Another interpretation of the same English is that the same/different inclusion diagnosis appears in three different encounters. If the second sentence is the intended interpretation, then you need to bring in three visit_occurrences (v1, v2,v3), make sure then are not equal (e.g. V1< V2, V2 < V3) and look at their linked diagnoses independently as I sketch out with a previous phenotype that needed two visits 30 days apart.

This is the only information we are given:

image

Which, means my initial line of >2 was OK, right?

@callahantiff
Copy link
Owner Author

dx_case_exclusion_criteria_1: I do not see justification for
HAVING
COUNT(DISTINCT co.condition_concept_id) > 1

The English ("any dx") implies that any instant of an exclusion criteria is qualifying. If so, a count >= 1 (including the equals) is correct.

OK, I agree with that. In that case, I think I need to verify all exclusion criteria for "any occurrence" and make sure it matches this.

@callahantiff
Copy link
Owner Author

dx_case_inclusion_criteria_2
Same question as previous. The current query looks for two distinct UC codes rather than the same UC code at two different visits. Make sure the query is the intended semantics of the English.

CASE TYPE 2 is a confusing case definition. CASE TYPE 1 explicitly removes patients with any UC diagnosis yet CASE TYPE 2 explicitly puts them back in. Between these two definitions, the only UC cases not included have only 1 UC Dx.

Is this the intended behavior of the combination of the two inclusion types? Seems so by the text after "CASE TYPE 2" but weird.

Here is the information I am provided:

Screen Shot 2019-08-06 at 23 06 49

Going off of this, like the first example, I believe it is represented correctly. If you still disagree, I'm happy to discuss further. I tried hard to only go off of what the text indicated and not include additional knowledge as that should represent what an "average" person might interpret. Or, at least that was my goal in doing this in the most reproducible manner.

@callahantiff
Copy link
Owner Author

If I followed the logic for controls correctly, the query would allow a person to be declared a control if they had 0-1 relevant codes, 1 relevant medication, 0-1 ulcerative colitis codes and none of the exclusion_criteria_3 and _4 codes. Without looking at what is in exclusion_criteria_3 and _4 codes, this logic would allow somebody who had some "hints" about Crohn's and UC to be included.

So, for the control criteria, the definition doc states:

image

Where "none of the above" refers to the case criteria and the "also excludes" is an additional list of ICD codes mentioned in dx_control_exclusion_criteria_3 and keywords I have included in dx_control_exclusion_criteria_4. Given this, do you agree with the way I have written the control query?

The full doc is here if you want more than than what I have included in this and the other issues.

@mgkahn
Copy link
Collaborator

mgkahn commented Aug 7, 2019

So weird..... Do they provide any examples of implementations of this phenotype on the web site so we can see how others have implemented these criteria?

@callahantiff
Copy link
Owner Author

So weird..... Do they provide any examples of implementations of this phenotype on the web site so we can see how others have implemented these criteria?

This goes back to a fun conversation we had at very beginning of this project 😄. In general, most of the phenotypes we are exploring have been implemented by several universities, but the results of these implantations are not publicly available, although you can gain access to them if you create an account.

All that said, the point of this project (in my opinion) is to demonstrate what an average, unbiased approach built on top of OMOP looks like. While I understand wanting to go see how others did it, we also know others have made mistakes and how are we to know you is "right". Further, if we peek for this phenotype, then we would need to do that for the others too otherwise we are introducing more bias. I'd prefer to write a paper that shows a realistic perspective on what it's like to implement eMERGE phenotypes. I strongly believe that it's the only way we will be able to understand how these things can be automated in the future.

@mgkahn
Copy link
Collaborator

mgkahn commented Aug 11, 2019

NEW COMMENT: Logic for exclusion criteria is wrong. You need an OR (UNION) rather than an AND. Current logics says that an exclusion needs to meet all exclusion criteria. My understanding is that you are excluded if you meet ANY exclusion criteria. If so, then an OR (UNION) is the right logic in your NOT IN subquery.

@callahantiff
Copy link
Owner Author

NEW COMMENT: Logic for exclusion criteria is wrong. You need an OR (UNION) rather than an AND. Current logics says that an exclusion needs to meet all exclusion criteria. My understanding is that you are excluded if you meet ANY exclusion criteria. If so, then an OR (UNION) is the right logic in your NOT IN subquery.

Thanks for bringing this up, this was something that I went back and forth on too. The criteria that we were provided says that the control group can contain "none" of the case group criteria as well as some other criteria. I interpreted that as meaning they wanted to identify people that were the opposite of the case group in that they should have none of the criteria that was used to select cases. Do you disagree? I am happy to do whatever you think is best, but wanted to explain why I chose "AND" and not "OR".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
Coding Tasks
  
Needed Scripts
Development

No branches or pull requests

2 participants