# Project: Performance of phenotype algorithms for the identification of opioid-exposed infants, Andrew D. Wiese et al. Hospital Pediatrics 2024
# Title: Identify birthing parents with evidence of outpatient opioid drug exposure from drug exposure data
# Summary: 
## Identify birthing parents with evidence of outpatient opioid drug exposure from drug exposure data from 30 to 2 days prior to delivery

# Notes:
- x_drug_exposure is a VUMC-specific table containing extra information about drug exposures. It was used to filter for outpatient drug exposures based on the source document type, e.g. EPIC OP ORDER, in addition to using visit_occurrence_id to select outpatient drug exposures.

##### Algorithm steps:
```
1. Load person table
2. Load mom-baby table
3. Filter mom-baby table to births in 2010 or later
4. Join filtered table to person to get mom-baby pairs
5. Load drug exposure table
6. Filter drug exposure table to moms in pairs
7. Join drug exposure with opioid search terms on drug name
8. Filter out excluded drugs from join
9. Join back to mom-baby pairs table
10. Filter to drugs starting 30 to 2 days before birth
11. Join to drug exposure extra table (VUMC only)
12. Join back to original drug exposure table
13. Filter for outpatient records:
    - Has occurrence ID and visit concept ID 9202
    - No occurrence ID, visit end date not null:
        - Visit start <= drug start <= visit end
    - No occurrence ID, visit end date null:
        - Visit start <= drug start <= visit end + 3 days
    - Has x_doc_type indicating an outpatient visit
14. Output the final filtered table with selected columns
```

##### Data Dictionaries:

**person**: Person dimension table  
**mom_baby_step1**: Table with mom-baby pairs and birth dates   
**drug_exposure**: Table with drug exposures for persons  
**mom_drug_search_term_list**: List of opioid search terms  

**mom_baby_step1_2010_mombabypair**: Mom-baby pairs filtered to births in 2010 or later  
**drug_exp_mom_df**: Drug exposures for moms in pairs  
**mom_drug_info**: Joined drug exposures and search terms  
**mom_opioid_within_dob_30days**: Filtered to drug exposures starting 30 to 2 days before birth  
**mom_opioid_within_dob_30days_extra**: Joined to drug exposure extra table  
**mom_opioid_within_dob_30days_all**: Joined back to original drug exposure  
**mom_drug_dob30days_all_doctype_op**: Final output table  

##### Usage Notes:

- This pipeline focuses on moms exposed to opioids from 30 to 2 days before giving birth, using outpatient prescription data
- Uses a list of opioid search terms to identify relevant drugs
- Filters out certain excluded drugs not considered opioids for this analysis
- Handles different cases in defining outpatient records based on occurrence IDs and visit start/end dates
- Output table contains one row per opioid prescription for a mom in the defined period before birth

In [0]:
%run "./project_modules"

##### MOM exposed to OPIOID DRUG (in table drug_exposure)
- Use x_doc_type (the drug document type) to define outpatient drug exposure
- Use Opioid Search Terms - phenotyping.mprint_mom_drug_search_term_v2
- Remove drugs: albuterol, ultramin, opium, oms
- Time limit: Maternal opioid prescriptions from 30 to 2 days before birth
- baby_birth_date - 30 <= drug_exposure_start_date & drug_exposure_start_date < baby_birth_date - 2
- Keep only records with an outpatient doc type (x_doc_type = 'RXSTAR' or 'EPIC OP ORDER' or 'EPIC HIST MED')
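
The time-window and document-type rules above can be written as two small predicates. The following is a minimal pure-Python sketch for checking edge cases, not the notebook's Spark implementation; the names `in_exposure_window` and `is_outpatient_doc_type` and the sample birth date are illustrative and do not come from the project modules.

```python
from datetime import date, timedelta

# Outpatient document types listed in the rule above
OUTPATIENT_DOC_TYPES = {"RXSTAR", "EPIC OP ORDER", "EPIC HIST MED"}

def in_exposure_window(drug_start: date, birth: date) -> bool:
    """birth - 30 days <= drug_start < birth - 2 days (upper bound exclusive)."""
    return birth - timedelta(days=30) <= drug_start < birth - timedelta(days=2)

def is_outpatient_doc_type(x_doc_type) -> bool:
    """True only for the three outpatient document types; None never matches."""
    return x_doc_type in OUTPATIENT_DOC_TYPES

birth = date(2015, 6, 30)  # made-up birth date for illustration
for days_before in (31, 30, 3, 2):
    start = birth - timedelta(days=days_before)
    print(days_before, in_exposure_window(start, birth))
```

Because the upper bound is exclusive, the latest qualifying start date is 3 days before birth when whole dates are compared.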

In [0]:
%sql
--- Default bin size for range join optimization for all datetime comparisons
SET spark.databricks.optimizer.rangeJoin.binSize=90

##### Focus on the cohort whose birth date is in 2010 or later (optional)

In [0]:
sql=f"""
       select b.person_id as mom_person_id,
       c.person_id as baby_person_id, 
       a.birth_datetime as baby_birth_datetime from
       (select distinct FACT_ID_1,FACT_ID_2,birth_datetime from global_temp.mom_baby_step1 where year(birth_datetime) >=2010) a
       inner join {person_table} b on a.FACT_ID_1 = b.person_id
       inner join {person_table} c on a.FACT_ID_2 = c.person_id;
      
    """
   
mom_baby_step1_2010_mombabypair = spark.sql(sql)
mom_baby_step1_2010_mombabypair.name='mom_baby_step1_2010_mombabypair'
register_parquet_global_view(mom_baby_step1_2010_mombabypair)

In [0]:
sql=f"""
     select drug_exposure_id,person_id,drug_exposure_start_date,drug_exposure_end_date,
     lower(trim(drug_source_value)) as drug_source_value,drug_source_concept_id,visit_occurrence_id,drug_concept_id,drug_exposure_start_datetime,drug_exposure_end_datetime,verbatim_end_date,drug_type_concept_id,stop_reason,refills,quantity,days_supply,sig,route_concept_id,lot_number,provider_id,route_source_value,dose_unit_source_value,x_drug_type_source_concept_id 
     from {drug_exp_table} where person_id in 
     (select mom_person_id from global_temp.mom_baby_step1_2010_mombabypair) 
    """

drugexp_mombabypair_df = spark.sql(sql)
drugexp_mombabypair_df.name='drug_exp_mom_df'
register_parquet_global_view(drugexp_mombabypair_df)

In [0]:
display(drugexp_mombabypair_df)

##### Search in table 'drug_exposure'
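
The search cell below matches the lowercased, trimmed `drug_source_value` against the `generic` search terms by substring, then drops rows containing `albuterol` or `ultramini`. A minimal pure-Python sketch of that logic follows; the function name, the sample terms, and the sample drug strings are made up for illustration, and the real terms come from the search-term table.

```python
EXCLUDED = ("albuterol", "ultramini")  # substrings excluded by the notebook's filter

def match_search_terms(drug_source_value, search_terms, excluded=EXCLUDED):
    """Return the search terms contained in the normalized drug name.

    Mirrors lower(trim(drug_source_value)) followed by a substring join;
    a drug name containing an excluded substring yields no matches.
    """
    value = drug_source_value.strip().lower()
    if any(word in value for word in excluded):
        return []
    return [term for term in search_terms if term in value]

terms = ["oxycodone", "morphine"]  # hypothetical sample of 'generic' terms
print(match_search_terms("  OxyCODONE 5 MG Oral Tablet ", terms))
print(match_search_terms("morphine albuterol made-up string", terms))
```

As in the join, a drug name that contains several search terms produces one match per term.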

In [0]:
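# Drug exposures for moms in the cohort, and the opioid search-term list
df1=spark.sql("select * from global_temp.drug_exp_mom_df")
df2=spark.sql(f"select * from {mom_drug_search_term_list}")
# Substring match: keep exposures whose drug_source_value contains a term from column 'generic'
merged_df = df1.join(F.broadcast(df2), df1.drug_source_value.contains(df2["generic"]), "inner")
# Drop excluded drugs that are not considered opioids for this analysis
mom_drug_info=merged_df.filter("drug_source_value not like '%albuterol%' and drug_source_value not like '%ultramini%'")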

mom_drug_info.name='mom_drug_info'
register_parquet_global_view(mom_drug_info)

##### Validation

In [0]:
display(mom_drug_info)

In [0]:
sql="""
    select count(*) as total, count(distinct person_id) as unique_patient from global_temp.mom_drug_info;
    """
inspect_df= spark.sql(sql)
inspect_df.display()

##### Time limit: Maternal opioid prescriptions from 30 to 2 days before birth

In [0]:
sql="""
       select a.*,b.mom_person_id,b.baby_person_id,b.baby_birth_datetime from global_temp.mom_drug_info a,
       global_temp.mom_baby_step1_2010_mombabypair b
       where a.person_id = b.mom_person_id
       and date_sub(baby_birth_datetime, 30)<= drug_exposure_start_date and 
       drug_exposure_start_date < date_sub(baby_birth_datetime,2);
    """

mom_opioid_within_dob_30days = spark.sql(sql)
mom_opioid_within_dob_30days.name='mom_opioid_within_dob_30days'
register_parquet_global_view(mom_opioid_within_dob_30days)

##### Validation

In [0]:
sql="""
     select count(*) as total,count(distinct person_id) as unique_mom from global_temp.mom_opioid_within_dob_30days;
    """
inspect_df= spark.sql(sql)
inspect_df.display()

##### Add the information from X_DRUG_EXPOSURE, especially the column 'x_doc_type' (VUMC only)

In [0]:
sql=f"""
     select * from global_temp.mom_opioid_within_dob_30days a
     inner join 
     {drug_exp_table_extra} b
     using (drug_exposure_id,person_id);
    """
mom_opioid_within_dob_30days_extra= spark.sql(sql)
mom_opioid_within_dob_30days_extra.createOrReplaceTempView("mom_opioid_within_dob_30days_extra")

In [0]:
sql=f"""
     select * from mom_opioid_within_dob_30days_extra a
     inner join 
     (select drug_exposure_id,person_id from {drug_exp_table}) b
     using (drug_exposure_id,person_id);
    """
mom_opioid_within_dob_30days_all= spark.sql(sql)
mom_opioid_within_dob_30days_all.createOrReplaceTempView("mom_opioid_within_dob_30days_all")

##### Outpatient records only
##### Record has an occurrence id and the visit concept id is '9202'
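
When a drug record carries a `visit_occurrence_id`, it counts as outpatient if the linked visit has visit concept ID 9202. A minimal sketch of this lookup on plain dictionaries, assuming each visit appears once per (`person_id`, `visit_occurrence_id`); the function name and the sample rows are illustrative only.

```python
OUTPATIENT_VISIT_CONCEPT_ID = 9202

def outpatient_by_visit_id(drug_rows, visit_rows):
    """Keep drug rows whose (person_id, visit_occurrence_id) points at an outpatient visit."""
    outpatient_keys = {
        (v["person_id"], v["visit_occurrence_id"])
        for v in visit_rows
        if v["visit_concept_id"] == OUTPATIENT_VISIT_CONCEPT_ID
    }
    return [
        d for d in drug_rows
        if d["visit_occurrence_id"] is not None
        and (d["person_id"], d["visit_occurrence_id"]) in outpatient_keys
    ]

# Made-up rows for illustration
visits = [
    {"person_id": 1, "visit_occurrence_id": 10, "visit_concept_id": 9202},
    {"person_id": 1, "visit_occurrence_id": 11, "visit_concept_id": 9201},
]
drugs = [
    {"drug_exposure_id": 100, "person_id": 1, "visit_occurrence_id": 10},
    {"drug_exposure_id": 101, "person_id": 1, "visit_occurrence_id": 11},
    {"drug_exposure_id": 102, "person_id": 1, "visit_occurrence_id": None},
]
print([d["drug_exposure_id"] for d in outpatient_by_visit_id(drugs, visits)])
```

Records without a `visit_occurrence_id` are not dropped from the pipeline at this point; they are handled by the date-based rules in the following cells.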

In [0]:
op1=spark.sql("SELECT * FROM mom_opioid_within_dob_30days_all WHERE VISIT_OCCURRENCE_ID IS NOT NULL")
op2=spark.sql(f"SELECT * FROM {visit_table} WHERE VISIT_CONCEPT_ID = '9202'")
op1.createOrReplaceTempView("op1")
op2.createOrReplaceTempView("op2")

cond = [op1.person_id == op2.person_id, op1.visit_occurrence_id == op2.visit_occurrence_id]

mom_drug_dob_30days_all_ocurrenceid_op = op1.join(op2, cond, 'inner').drop(op2.person_id)
mom_drug_dob_30days_all_ocurrenceid_op.createOrReplaceTempView("mom_drug_dob_30days_all_ocurrenceid_op")

In [0]:
sql="""
     SELECT * FROM mom_opioid_within_dob_30days_all WHERE VISIT_OCCURRENCE_ID IS NULL;
    """

spark.sql(sql).createOrReplaceTempView("mom_drug_dob_30days_all_no_occurrenceid")

##### Records do not have occurrence id, but visit_end_date is not null (any visit type)
- VISIT_START_DATE <= DRUG_EXPOSURE_START_DATE and
- DRUG_EXPOSURE_START_DATE <= VISIT_END_DATE


In [0]:
sql=f"""
     SELECT * FROM 
     (
      mom_drug_dob_30days_all_no_occurrenceid A 
      JOIN 
      (
      SELECT * FROM {visit_table} WHERE person_id in 
      (select person_id from mom_drug_dob_30days_all_no_occurrenceid) and VISIT_END_DATE IS NOT NULL
      ) B 
      USING (PERSON_ID)
     ) 

      WHERE ((B.VISIT_START_DATE <= A.DRUG_EXPOSURE_START_DATE) AND (A.DRUG_EXPOSURE_START_DATE <= B.VISIT_END_DATE)
    );
    """

spark.sql(sql).createOrReplaceTempView("mom_drug_dob_30days_all_no_occurrenceid_detaill1")

##### Records do not have occurrence id, but visit_end_date is not null
- VISIT_START_DATE <= DRUG_EXPOSURE_START_DATE and 
- DRUG_EXPOSURE_START_DATE <= VISIT_END_DATE and 
- visit concept id = '9202'
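
For records without an occurrence id, the rule above attributes a drug exposure to an outpatient visit by date containment. A minimal pure-Python sketch under the assumption that `visits` holds the visits of the same person; the function name and the sample visits are illustrative.

```python
from datetime import date

def outpatient_by_visit_dates(drug_start, visits, concept_id=9202):
    """True if any visit of the given concept has start <= drug_start <= end."""
    return any(
        v["visit_concept_id"] == concept_id
        and v["visit_end_date"] is not None
        and v["visit_start_date"] <= drug_start <= v["visit_end_date"]
        for v in visits
    )

# Made-up visits for one person
visits = [
    {"visit_concept_id": 9202, "visit_start_date": date(2015, 6, 1), "visit_end_date": date(2015, 6, 1)},
    {"visit_concept_id": 9201, "visit_start_date": date(2015, 6, 5), "visit_end_date": date(2015, 6, 9)},
]
print(outpatient_by_visit_dates(date(2015, 6, 1), visits))  # inside the 9202 visit
print(outpatient_by_visit_dates(date(2015, 6, 6), visits))  # inside a visit of another concept
print(outpatient_by_visit_dates(date(2015, 6, 2), visits))  # outside every visit
```

The SQL join returns one row per matching visit, whereas this sketch only answers whether at least one match exists.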

In [0]:
sql=f"""
    SELECT * FROM 
    (
     mom_drug_dob_30days_all_no_occurrenceid A 
     JOIN (SELECT * FROM {visit_table} WHERE person_id in (select person_id from mom_drug_dob_30days_all_no_occurrenceid) 
     and VISIT_END_DATE IS NOT NULL and VISIT_CONCEPT_ID = '9202') B 
     USING (PERSON_ID)
    ) 
    
    WHERE 
    (((B.VISIT_START_DATE <= A.DRUG_EXPOSURE_START_DATE) AND (A.DRUG_EXPOSURE_START_DATE <= B.VISIT_END_DATE)));
    """

spark.sql(sql).createOrReplaceTempView("mom_drug_dob_30days_all_no_occurrenceid_detaill1_op")

##### Records do not have occurrence id, but visit_end_date is null
- VISIT_START_DATE <= DRUG_EXPOSURE_START_DATE and
- DRUG_EXPOSURE_START_DATE <= VISIT_END_DATE + 3 days and
- visit concept id = '9202'
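
One point worth checking in this rule: in SQL, adding days to a NULL date gives NULL, and a comparison against NULL is never true, so an upper bound derived from `VISIT_END_DATE` cannot match when `VISIT_END_DATE` is null. The sketch below illustrates this with Python's built-in `sqlite3` rather than Spark (SQLite's `date(x, '+3 days')` stands in for `date_add(x, 3)`; the NULL handling is the same). The alternative that anchors the 3-day allowance on `VISIT_START_DATE` is an assumption about the intent, not something stated in the source, and the table and dates are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visit (visit_start_date TEXT, visit_end_date TEXT)")
con.execute("INSERT INTO visit VALUES ('2015-06-01', NULL)")  # visit with no end date

drug_start = "2015-06-02"

# Rule as written: the upper bound is computed from the NULL end date
as_written = con.execute(
    "SELECT COUNT(*) FROM visit "
    "WHERE visit_start_date <= ? AND ? <= date(visit_end_date, '+3 days')",
    (drug_start, drug_start),
).fetchone()[0]

# Assumed alternative: anchor the 3-day allowance on the visit start date
anchored_on_start = con.execute(
    "SELECT COUNT(*) FROM visit "
    "WHERE visit_start_date <= ? AND ? <= date(visit_start_date, '+3 days')",
    (drug_start, drug_start),
).fetchone()[0]

print(as_written, anchored_on_start)
```

A quick way to check the real pipeline is to count the rows in `mom_drug_dob_30days_all_no_occurrenceid_detaill2_op` after the cell below has run.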

In [0]:
sql=f""" 
    SELECT * FROM 
    (
     mom_drug_dob_30days_all_no_occurrenceid A 
     JOIN (SELECT * FROM {visit_table} WHERE person_id in (select person_id from mom_drug_dob_30days_all_no_occurrenceid) 
     and VISIT_END_DATE IS NULL and VISIT_CONCEPT_ID = '9202') B 
     USING (PERSON_ID)
    ) 
    WHERE 
    (
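     -- B.VISIT_END_DATE is NULL for every row of B here, so date_add(B.VISIT_END_DATE, 3) is NULL and this predicate matches no rows as written
     ((B.VISIT_START_DATE <= A.DRUG_EXPOSURE_START_DATE) AND (A.DRUG_EXPOSURE_START_DATE <= date_add(B.VISIT_END_DATE, 3)))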
    );

    """

spark.sql(sql).createOrReplaceTempView("mom_drug_dob_30days_all_no_occurrenceid_detaill2_op")

In [0]:
sql="""
    (
    (SELECT * FROM mom_drug_dob_30days_all_ocurrenceid_op) 
     UNION 
    (SELECT * FROM mom_drug_dob_30days_all_no_occurrenceid_detaill1_op)
    ) 
    UNION (SELECT * FROM mom_drug_dob_30days_all_no_occurrenceid_detaill2_op);
   """

spark.sql(sql).createOrReplaceTempView("mom_drug_dob_30days_op_all")
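
`UNION` pairs columns by position, not by name, so the three inputs above need the same column order. This is worth confirming here because `SELECT *` over a `USING (PERSON_ID)` join in Spark SQL lists `person_id` first, while the DataFrame join that builds `mom_drug_dob_30days_all_ocurrenceid_op` keeps the left table's column order. The sketch below shows the positional behaviour with Python's built-in `sqlite3` and made-up values; whether the orders actually differ in this pipeline depends on the table schemas, which can be checked by comparing `.columns` of the three views.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.execute(
    """
    SELECT 1 AS drug_exposure_id, 100 AS person_id
    UNION
    SELECT 200 AS person_id, 2 AS drug_exposure_id
    """
)
# Column names come from the first SELECT; values are matched by position
names = [col[0] for col in cur.description]
rows = sorted(cur.fetchall())
print(names)
print(rows)  # the second row carries person_id 200 in the drug_exposure_id column
```

Selecting an explicit, identically ordered column list from each input before the `UNION` avoids depending on `SELECT *` ordering.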

In [0]:

sql="""
       select drug_exposure_id,person_id,drug_concept_id,drug_exposure_start_date,drug_exposure_start_datetime,
       drug_exposure_end_date,drug_exposure_end_datetime,verbatim_end_date,drug_type_concept_id,stop_reason,
       refills, quantity,days_supply,sig,route_concept_id,lot_number,drug_source_value,drug_source_concept_id,route_source_value,dose_unit_source_value, x_drug_type_source_concept_id,generic,mom_person_id,baby_person_id,baby_birth_datetime,x_doc_type,x_doc_stype,x_dose,x_drug_form,x_strength,x_frequency 
       from mom_drug_dob_30days_op_all
       where x_doc_type = 'RXSTAR' or x_doc_type ='EPIC OP ORDER' or x_doc_type = 'EPIC HIST MED';
    """

mom_drug_dob30days_all_doctype_op=spark.sql(sql).distinct()

mom_drug_dob30days_all_doctype_op.name='mom_drug_dob30days_all_doctype_op'
register_parquet_global_view(mom_drug_dob30days_all_doctype_op)

In [0]:
df_inspection("global_temp.mom_drug_dob30days_all_doctype_op","mom")