# Project: Performance of phenotype algorithms for the identification of opioid-exposed infants, Andrew D. Wiese et al. Hospital Pediatrics 2024
# Title: Create final output table from phenotype algorithm 
# Summary: 
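Builds the final cohort table `phenotype_output.mprint_cohort` by left-joining the base set of mom-baby pairs (births in 2010 or later) to each indicator source and deriving one 0/1 indicator column per source.
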
##### Algorithm steps:
```
1. FUNCTION get_2010_mombabypair_df(source_df_name):
  GET mom-baby pairs from source dataframe where year >= 2010
  JOIN with person table to get extra columns
  RETURN dataframe

2. CREATE dataframe by calling get_2010_mombabypair_df on live birth code source dataframe

3. CREATE dataframe by calling get_2010_mombabypair_df on critical illness source dataframe 

4. CREATE dataframe by calling get_2010_mombabypair_df on respiratory procedure code source dataframe

5. CREATE dataframe by calling get_2010_mombabypair_df on fetal anomalies source dataframe

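6. DEFINE SQL query to join base dataframe with all other dataframes
    BASE dataframe is global_temp.mom_baby_step1_2010_mombabypair
    LEFT JOIN with each indicator dataframe created above and with the remaining indicator views in global_temp
    CASE statements to generate one 0/1 indicator column per join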

7. RUN SQL query to create final dataframe

8. SAVE final dataframe to table
```
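
##### Sketch: indicator columns from left joins

Steps 6 and 7 can be illustrated on toy data. The sketch below is a minimal, self-contained example that uses Python's built-in `sqlite3` as a stand-in for Spark SQL; the table names `base_pairs` and `birth_code_pairs` and the three sample pairs are hypothetical and are not part of the notebook.

```python
import sqlite3

# In-memory database standing in for the Spark session (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table base_pairs (mom_person_id integer, baby_person_id integer);
    create table birth_code_pairs (mom_person_id integer, baby_person_id integer);
    insert into base_pairs values (1, 101), (2, 102), (3, 103);
    insert into birth_code_pairs values (1, 101), (3, 103);
""")

# Same pattern as the final query: left join, then flag rows that found a match.
rows = conn.execute("""
    select a.mom_person_id,
           a.baby_person_id,
           case when b.baby_person_id is not null then 1 else 0 end as live_birth_code
    from base_pairs a
    left join birth_code_pairs b
      on a.mom_person_id = b.mom_person_id
     and a.baby_person_id = b.baby_person_id
    order by a.baby_person_id
""").fetchall()

print(rows)  # [(1, 101, 1), (2, 102, 0), (3, 103, 1)]
```

Pair (2, 102) has no match in `birth_code_pairs`, so the left join yields NULL on the right side and the `case` expression maps it to 0.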

##### Data Dictionaries

**mom_baby_step1_2010_mombabypair:** Base set of mom-baby pairs with births in 2010 or later, read from `global_temp`

**mom_baby_step1_birth_code_2010_mombabypair:** Mom-baby pairs with births in 2010 or later, filtered from the live birth code dataframe

**mom_baby_step3_critical4cpt_2010_mombabypair:** Mom-baby pairs with births in 2010 or later, filtered from the critical illness dataframe

**mom_baby_step3_respiratory_2010_mombabypair:** Mom-baby pairs with births in 2010 or later, filtered from the respiratory procedure code dataframe

**mom_baby_step3_fetal_anomalies_2010_mombabypair:** Mom-baby pairs with births in 2010 or later, filtered from the fetal anomalies dataframe

**phenotype_output.mprint_cohort:** Final output table containing all base mom-baby pairs with births in 2010 or later, plus one 0/1 indicator column per joined source

##### Usage Notes
```
- The `get_2010_mombabypair_df` function can be reused to filter any source dataframe to records with a birth year of 2010 or later
- The final dataframe left-joins the base pairs to each filtered dataframe and sets each indicator column to 1 when the mom-baby pair exists in that source, otherwise 0
- The output table contains the full set of base mom-baby pairs from 2010 onward with all associated indicators
```
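
##### Sketch: deriving the temp view name for each source

The notebook repeats the same two lines for each source: call `get_2010_mombabypair_df`, then register the result under `<source table>_2010_mombabypair`. A minimal sketch of that naming convention follows; the helper `mombabypair_view_name` is hypothetical, and the Spark calls appear only as comments because they need the notebook's Spark session.

```python
def mombabypair_view_name(source_df_name):
    """Build the temp view name used in this notebook for a filtered source.

    Hypothetical helper: drops the database prefix (for example `global_temp.`)
    and appends the `_2010_mombabypair` suffix.
    """
    table_name = source_df_name.split(".")[-1]
    return f"{table_name}_2010_mombabypair"


sources = [
    "global_temp.mom_baby_step1_birth_code",
    "global_temp.mom_baby_step3_critical4cpt",
    "global_temp.mom_baby_step3_respiratory",
    "global_temp.mom_baby_step3_fetal_anomalies",
]

view_names = [mombabypair_view_name(source) for source in sources]
for source, view_name in zip(sources, view_names):
    # In the notebook this would be:
    #   get_2010_mombabypair_df(source).createOrReplaceTempView(view_name)
    print(f"{source} -> {view_name}")
```

Keeping the cells separate, as the notebook does, makes it easier to inspect each intermediate view; a loop like this is only a more compact alternative.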

In [0]:
%run "./project_modules"

##### Mom-baby pair records from 2010 onward (optional)

In [0]:
def get_2010_mombabypair_df(source_df_name):
    """Return mom-baby pairs with a birth year of 2010 or later from `source_df_name`,
    joined to the person table for mom (b) and baby (c) demographics.

    `person_table` is not defined in this notebook; it is assumed to be set by
    the `%run "./project_modules"` cell above.
    """
    sql = f"""
        select mom_person_id, b.person_source_value as mom_person_source_value,
        -- calendar-year difference, not completed years
        year(a.birth_datetime) - year(b.birth_datetime) as age_at_delivery,
        b.birth_datetime as mom_birth_datetime,
        b.gender_source_value as mom_gender, b.race_source_value as mom_race,
        baby_person_id, baby_person_source_value, a.birth_datetime as baby_birth_datetime,
        c.gender_source_value as baby_gender, c.race_source_value as baby_race
        from
        (select distinct mom_person_id, baby_person_id, baby_person_source_value, birth_datetime
         from {source_df_name}
         where year(birth_datetime) >= 2010) a
        inner join {person_table} b on a.mom_person_id = b.person_id
        inner join {person_table} c on a.baby_person_id = c.person_id;
    """
    return spark.sql(sql)
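
##### Sketch: calendar-year difference vs. completed years

`age_at_delivery` above is `year(baby birth) - year(mom birth)`, which counts calendar years rather than completed years, so it can be one year higher than the mother's age on the delivery date. The plain-Python sketch below shows the difference; both helper functions and the sample dates are hypothetical. If completed years were wanted, one option in Spark SQL might be `floor(months_between(a.birth_datetime, b.birth_datetime) / 12)`, though whether that matches the study's definition is an assumption.

```python
from datetime import date


def age_by_year_difference(mom_birth, baby_birth):
    """Mirror the notebook's SQL: subtract the birth years only."""
    return baby_birth.year - mom_birth.year


def age_in_completed_years(mom_birth, baby_birth):
    """Subtract one year if the mother's birthday had not yet occurred at delivery."""
    had_birthday = (baby_birth.month, baby_birth.day) >= (mom_birth.month, mom_birth.day)
    return baby_birth.year - mom_birth.year - (0 if had_birthday else 1)


# Hypothetical dates: mother born in December, delivery in March.
mom_birth = date(1990, 12, 15)
baby_birth = date(2015, 3, 1)

print(age_by_year_difference(mom_birth, baby_birth))  # 25
print(age_in_completed_years(mom_birth, baby_birth))  # 24
```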

##### live_birth_code mom baby pair

In [0]:
mom_baby_step1_birth_code_2010_mombabypair=get_2010_mombabypair_df("global_temp.mom_baby_step1_birth_code")
mom_baby_step1_birth_code_2010_mombabypair.createOrReplaceTempView("mom_baby_step1_birth_code_2010_mombabypair") 

In [0]:
df_inspection("mom_baby_step1_birth_code_2010_mombabypair","all")

##### Critical illness 4CPT mom baby pair

In [0]:
mom_baby_step3_critical4cpt_2010_mombabypair=get_2010_mombabypair_df("global_temp.mom_baby_step3_critical4cpt")
mom_baby_step3_critical4cpt_2010_mombabypair.createOrReplaceTempView("mom_baby_step3_critical4cpt_2010_mombabypair")

In [0]:
df_inspection("mom_baby_step3_critical4cpt_2010_mombabypair","all")

##### Respiratory procedure code mom baby pair

In [0]:
mom_baby_step3_respiratory_2010_mombabypair=get_2010_mombabypair_df("global_temp.mom_baby_step3_respiratory")
mom_baby_step3_respiratory_2010_mombabypair.createOrReplaceTempView("mom_baby_step3_respiratory_2010_mombabypair")

In [0]:
df_inspection("mom_baby_step3_respiratory_2010_mombabypair","all")

##### Fetal anomalies at birth mom baby pair

In [0]:
mom_baby_step3_fetal_anomalies_2010_mombabypair=get_2010_mombabypair_df("global_temp.mom_baby_step3_fetal_anomalies")
mom_baby_step3_fetal_anomalies_2010_mombabypair.createOrReplaceTempView("mom_baby_step3_fetal_anomalies_2010_mombabypair")

In [0]:
df_inspection("mom_baby_step3_fetal_anomalies_2010_mombabypair","all")

##### Combine all stages' output


In [0]:
sql="""
select
  a.*,
  case
    when b.baby_person_id is not null then 1
    else 0
  end as live_birth_code,
  case
    when d.baby_person_id is not null then 1
    else 0
  end as gestational_age_w33_or_uncertain,
  case
    when e.baby_person_id is not null then 1
    else 0
  end as critical_illness_4cpt,
  case
    when f.baby_person_id is not null then 1
    else 0
  end as respiratory_procedure_code,
  case
    when g.baby_person_id is not null then 1
    else 0
  end as fetal_anomalies_code,
  case
    when i.baby_person_id is not null then 1
    else 0
  end as gestational_age_uncertain,
  case
    when j.baby_person_id is not null then 1
    else 0
  end as nows_baby_code,
  case
    when k.person_id is not null then 1
    else 0
  end as infant_tox_lab,
  case
    when l.baby_person_id is not null then 1
    else 0
  end as mom_oud,
  case
    when m.baby_person_id is not null then 1
    else 0
  end as mom_oud_inpatient,
  case
    when n.baby_person_id is not null then 1
    else 0
  end as mom_oud_outpatient,
  case
    when o.baby_person_id is not null then 1
    else 0
  end as mom_drug,
  case
    when p.fact_id_2 is not null then 1
    else 0
  end as mom_drug_in_note,
  case
    when q.baby_person_id is not null then 1
    else 0
  end as mom_opioid_tox,
  case
    when r.baby_person_id is not null then 1
    else 0
  end as baby_1st_visit_problem
from
  global_temp.mom_baby_step1_2010_mombabypair a
  left join mom_baby_step1_birth_code_2010_mombabypair b on a.mom_person_id = b.mom_person_id
  and a.baby_person_id = b.baby_person_id
  left join global_temp.ega_w33_or_uncertain_gestation_date d
  on a.mom_person_id = d.mom_person_id
  and a.baby_person_id = d.baby_person_id
  left join mom_baby_step3_critical4cpt_2010_mombabypair e on a.mom_person_id = e.mom_person_id
  and a.baby_person_id = e.baby_person_id
  left join mom_baby_step3_respiratory_2010_mombabypair f on a.mom_person_id = f.mom_person_id
  and a.baby_person_id = f.baby_person_id
  left join mom_baby_step3_fetal_anomalies_2010_mombabypair g on a.mom_person_id = g.mom_person_id
  and a.baby_person_id = g.baby_person_id
  left join (
    select
      *
    from
      global_temp.ega_w33_or_uncertain_gestation_date
    where
      weeks = 0
  ) i on a.mom_person_id = i.mom_person_id
  and a.baby_person_id = i.baby_person_id
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_baby_step4_nowsbaby
  ) j on a.baby_person_id = j.baby_person_id
  left join (
    select
      distinct person_id
    from
      global_temp.infant_tox_lab
  ) k on a.baby_person_id = k.person_id
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_baby_step4_mom_oud_update
  ) l on a.baby_person_id = l.baby_person_id
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_baby_step4_mom_oud_ip1_update
  ) m on a.baby_person_id = m.baby_person_id
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_baby_step4_mom_oud_op2_update
  ) n on a.baby_person_id = n.baby_person_id
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_drug_dob30days_all_doctype_op
  ) o on a.baby_person_id = o.baby_person_id
  left join (
    select
      distinct FACT_ID_2
    from
      global_temp.search_opioid_terms_cleaned_medicationlist
  ) p on a.baby_person_id = p.FACT_ID_2
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_baby_step1_matopioidtoxicology_all_birthhospital
  ) q --check later
  on a.baby_person_id = q.baby_person_id
  left join (
    select
      distinct baby_person_id
    from
      global_temp.mom_baby_step1_baby1stvisit_all
    where
      baby_1st_visit_problem != 'N'
  ) r on a.baby_person_id = r.baby_person_id;
"""
mom_baby_2010_mombabypair_latest_version = spark.sql(sql)
mom_baby_2010_mombabypair_latest_version.write.mode("overwrite").saveAsTable("phenotype_output.mprint_cohort")

In [0]:
df_inspection("phenotype_output.mprint_cohort","all")
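
##### Sketch: checking for duplicated rows after the joins

A left join repeats the base row once per matching row on the right side, so any joined source with more than one row per key would add duplicate rows to the output even though the indicator value stays the same. The sketch below reproduces this on toy data with Python's built-in `sqlite3` as a stand-in for Spark SQL; `base_pairs`, `oud_inpatient`, and the sample rows are hypothetical. A similar row-count comparison between the base view and `phenotype_output.mprint_cohort` could serve as a sanity check, assuming the output is meant to hold one row per mom-baby pair.

```python
import sqlite3

# In-memory database standing in for the Spark session (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table base_pairs (baby_person_id integer);
    create table oud_inpatient (baby_person_id integer);
    insert into base_pairs values (101), (102);
    -- baby 101 appears twice in the indicator source
    insert into oud_inpatient values (101), (101);
""")

indicator_sql = """
    select a.baby_person_id,
           case when m.baby_person_id is not null then 1 else 0 end as mom_oud_inpatient
    from base_pairs a
    left join {right_side} m on a.baby_person_id = m.baby_person_id
    order by a.baby_person_id
"""

# Joining the raw table repeats baby 101 once per matching row.
raw_join = conn.execute(indicator_sql.format(right_side="oud_inpatient")).fetchall()

# Joining a distinct subquery keeps one row per base record.
dedup_join = conn.execute(
    indicator_sql.format(right_side="(select distinct baby_person_id from oud_inpatient)")
).fetchall()

base_count = conn.execute("select count(*) from base_pairs").fetchone()[0]

print(raw_join)    # [(101, 1), (101, 1), (102, 0)]
print(dedup_join)  # [(101, 1), (102, 0)]
print(len(raw_join) == base_count, len(dedup_join) == base_count)  # False True
```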