# Project: Performance of phenotype algorithms for the identification of opioid-exposed infants, Andrew D. Wiese et al. Hospital Pediatrics 2024
# Title: Birthing Parent-Child Dyads with Defined Birth Hospitalization Stay
# Summary: 
## Select birthing parent-child dyads with evidence of live births (from 2_birthing_parent_child_dyads_with_live_births.ipynb/.html) that have a defined birth hospitalization stay, i.e. start and end dates around the child's date of birth

# Notes:
- If only partial birth hospitalization stay information is available, the missing information is imputed


##### Algorithm steps:
```
1. Get each baby's earliest inpatient visit:
   - Visit is inpatient (concept ID 9201)
   - Take the earliest visit start date per baby

2. For each baby's earliest inpatient visit:
   - Check if birth date is on or up to 3 days after the visit start date
   - Flag 'Y' if birth during visit, 'N' otherwise
   - Keep visits flagged 'Y' whose visit end date is not null

3. Get visit details for first inpatient visits
   - Join to visit table on matching person and visit dates

4. Explore patients without identified inpatient visit
   - Get list of babies missing from step 2

5. For patients without meaningful inpatient visit:
   - Handle different visit concepts and end dates
   - Get first visit start and end dates

6. For patients with no visit info around birth:
   - Return null for visit dates
   - Flag as 'no_visitinfo_during_birth'

7. Merge all cases:
   - Valid inpatient visit
   - No meaningful inpatient
   - No visit info
   - Not in visit table

8. Filter out problematic cases:
   - Keep only valid inpatient or no meaningful inpatient


```

##### Data Dictionary:

**mom_baby_step1**: Fact and dimension table linking mom and baby persons 

**person**: Person data including demographics

**visit**: Visit data including dates and concepts

**baby_df**: Baby data filtered for inpatient visits

**mom_baby_step1_baby1stvisit_noproblem_prepare**: Baby first inpatient visits

**mom_baby_step1_baby1stvisit_noproblem**: Clean info for first inpatient visits 

**mom_baby_step1_baby1stvisit_noproblem_detail**: Details for first inpatient visits

**mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo**: Babies missing from first inpatient visit data

**mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_prepare**: Visit data for babies missing inpatient info

**mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_1stvisit**: Constructed first visit data for these babies

**mom_baby_step1_baby1stvisit_novisitinfobirth**: Babies with no visit data around birth

**mom_baby_step1_baby1stvisit_novisitinfo**: Babies not in visit table 

**mom_baby_step1_baby1stvisit_all**: Merged data from all cases

**mom_baby_step1_baby1stvisit**: Final cohort filtering out problematic cases


##### Usage Notes:

```
- This extracts the first inpatient visit around birth for babies linked to mothers
- It handles several edge cases like missing visits or inconsistent data
- The final cohort contains only babies with a valid inpatient visit, or with no meaningful inpatient visit but other visit data around birth
- It makes assumptions about visit end dates for different visit concepts when the end date is missing
```

In [0]:
%run "./project_modules"

##### Get the baby birth inpatient visit
##### a. date(birth_datetime) >= visit_start_date and date(birth_datetime) <= visit_start_date + 3
##### b. inpatient (visit_concept_id = 9201)
##### c. visit_end_date is not null
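
The window in condition (a) can also be checked outside SQL. The following is a minimal sketch in plain Python, assuming the birth timestamp is a `datetime` and the visit start is a `date`; the function name `birth_during_visit` and the `window_days` parameter are illustrative and do not appear in the notebook.

```python
from datetime import date, datetime, timedelta


def birth_during_visit(birth_datetime, visit_start_date, window_days=3):
    """Return 'Y' if the birth date lies in
    [visit_start_date, visit_start_date + window_days], else 'N'.

    Hypothetical helper mirroring the CASE expression used for
    birthdate_during_visit; both bounds are inclusive.
    """
    birth_date = birth_datetime.date()  # same truncation as date(birth_datetime)
    window_end = visit_start_date + timedelta(days=window_days)
    return "Y" if visit_start_date <= birth_date <= window_end else "N"


# Born late on day 3 after admission: still inside the window.
flag_inside = birth_during_visit(datetime(2020, 1, 4, 23, 30), date(2020, 1, 1))
# Born one day before the visit started: outside the window.
flag_before = birth_during_visit(datetime(2019, 12, 31, 12, 0), date(2020, 1, 1))
```

Because the comparison is on calendar dates, the time of day of the birth does not affect the flag.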


In [0]:
sql=f"""
       select person_id as baby_person_id,birth_datetime,location_id,person_source_value as  baby_grid,gender_source_value,race_source_value,min(visit_start_date) as earliest_visit_start_date 
       from 
       (select * from {person_table} where person_id in (select fact_id_2 from global_temp.mom_baby_step1)) a 
       inner join 
       (select * from {visit_table} where visit_concept_id = '9201') b 
       using (person_id) 
       group by person_id,birth_datetime,location_id,person_source_value,gender_source_value,race_source_value;
    """
spark.sql(sql).createOrReplaceTempView("baby_df") 

sql=f"""
       select a.*,
       case 
       when date(birth_datetime) >= a.earliest_visit_start_date 
       and  date(birth_datetime) <= date_add(a.earliest_visit_start_date, 3) 
       then 'Y' 
       else 'N' 
       end as birthdate_during_visit, b.* from baby_df a, {visit_table} b 
       where a.baby_person_id = b.person_id and 
       a.earliest_visit_start_date = b.visit_start_date and 
       b.visit_concept_id = '9201'
    """

spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_noproblem_prepare")

#### If there are multiple inpatient visits during the birth hospitalization, use the earliest start and the latest end so the stay length covers all of them
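
As a rough illustration of this aggregation, the sketch below collapses several visit rows for one baby into a single stay, ignoring rows with a missing end date. It assumes visits are given as `(visit_start_date, visit_end_date)` tuples; the name `collapse_stay` is hypothetical and not part of the notebook.

```python
from datetime import date


def collapse_stay(visits):
    """Collapse (start, end) visit tuples into one (min start, max end) stay.

    Rows whose end date is None are dropped first, mirroring the
    'visit_end_date is not null' filter. Returns None when no row
    has an end date.
    """
    complete = [(start, end) for start, end in visits if end is not None]
    if not complete:
        return None
    first_start = min(start for start, _ in complete)
    last_end = max(end for _, end in complete)
    return first_start, last_end


stay = collapse_stay([
    (date(2020, 1, 1), date(2020, 1, 2)),
    (date(2020, 1, 1), date(2020, 1, 6)),
    (date(2020, 1, 1), None),  # incomplete row, ignored
])
```

In this sketch a baby whose rows all lack an end date yields `None`, which corresponds to that baby not appearing in the resulting table.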

In [0]:
sql="""
    select baby_person_id,baby_grid,birth_datetime,
    min(visit_start_date) as first_visit_start_date,min(visit_start_datetime) as first_visit_start_datetime,
    max(visit_end_date) as  first_visit_end_date, 'N' as baby_1st_visit_problem from mom_baby_step1_baby1stvisit_noproblem_prepare
    where visit_end_date is not null and birthdate_during_visit = 'Y'
    group by baby_person_id,baby_grid,birth_datetime;
    """
spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_noproblem")

##### Detailed visit information for the first inpatient visits

In [0]:
sql=f"""
     select a.*,b.visit_occurrence_id,b.visit_start_date,b.visit_end_date,b.visit_concept_id,b.visit_source_value 
     from mom_baby_step1_baby1stvisit_noproblem a 
     inner join 
     {visit_table} b
     on a.baby_person_id = b.person_id and a.first_visit_start_date = b.visit_start_date and 
     a.first_visit_end_date = b.visit_end_date
     where visit_concept_id = '9201'
    """
spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_noproblem_detail") 

##### Explore other patients' 1st visit information


In [0]:
sql="""
    select fact_id_2 as baby_person_id from global_temp.mom_baby_step1 where fact_id_2 
    not in (select baby_person_id from mom_baby_step1_baby1stvisit_noproblem);
    """
spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo") 

##### Explore the cohort that did not have meaningful inpatient (9201) visit information, but had visits of another type, or had an inpatient visit with a null visit_end_date
##### if visit_concept_id = 9201 (inpatient), visit_start_date + 21 as visit_end_date (if it was null in the visit_table)
##### if visit_concept_id = 9202 (outpatient), visit_start_date + 3 as visit_end_date (if it was null in the visit_table)
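
The imputation rule above can be sketched as a small Python function. This is an illustration under stated assumptions, not the notebook's implementation: the name `impute_visit_end_date` and the `DEFAULT_STAY_DAYS` mapping are hypothetical, while the concept IDs and day offsets come from the rule itself.

```python
from datetime import date, timedelta

INPATIENT = 9201
OUTPATIENT = 9202

# Days added to visit_start_date when visit_end_date is missing.
DEFAULT_STAY_DAYS = {INPATIENT: 21, OUTPATIENT: 3}


def impute_visit_end_date(visit_concept_id, visit_start_date, visit_end_date):
    """Return the recorded end date, or an imputed one when it is missing.

    Visit concepts other than inpatient/outpatient keep their original
    value, so a missing end date stays None for them.
    """
    if visit_end_date is not None:
        return visit_end_date
    days = DEFAULT_STAY_DAYS.get(visit_concept_id)
    if days is None:
        return None
    return visit_start_date + timedelta(days=days)


imputed_inpatient = impute_visit_end_date(INPATIENT, date(2020, 1, 1), None)
imputed_outpatient = impute_visit_end_date(OUTPATIENT, date(2020, 1, 1), None)
```

A recorded end date always takes precedence; the offsets apply only when the end date is null.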

In [0]:
sql=f"""
   select *,
    case 
    when visit_concept_id = 9201 and visit_end_date is not null then visit_end_date 
    when visit_concept_id = 9201 and visit_end_date is null then date_add(visit_start_date, 21)
    when visit_concept_id = 9202 and visit_end_date is not null then visit_end_date 
    when visit_concept_id = 9202 and visit_end_date is null then date_add(visit_start_date, 3)
    else visit_end_date end ifnull_first_visit_end_date
   from
    (select fact_id_2 as person_id,person_source_value as baby_grid,birth_datetime from global_temp.mom_baby_step1 where 
    fact_id_2 in (select baby_person_id from mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo)) a
    inner join {visit_table} b using(person_id)
    where date(birth_datetime) >= visit_start_date and  date(birth_datetime) <= date_add(visit_start_date, 3);
    """
    
spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_prepare")

##### mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_1stvisit

In [0]:
sql="""
    select person_id as baby_person_id,baby_grid,birth_datetime,min(visit_start_date) as first_visit_start_date,
    min(visit_start_datetime) as first_visit_start_datetime,max(ifnull_first_visit_end_date) as first_visit_end_date,
    'no_meaningfull_ip_visit_info' as baby_1st_visit_problem
    
    from mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_prepare 
    group by person_id,baby_grid,birth_datetime;
    """
    
spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_1stvisit")

##### mom_baby_step1_baby1stvisit_novisitinfobirth

In [0]:
sql=f"""
     select fact_id_2 as baby_person_id,person_source_value as baby_grid, birth_datetime,
     null as first_visit_start_date,null as first_visit_start_datetime,null as first_visit_end_date,
     'no_visitinfo_during_birth' as baby_1st_visit_problem
     from global_temp.mom_baby_step1 where fact_id_2 in 
     (select person_id from {visit_table})
     and fact_id_2 not in (select baby_person_id from mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_1stvisit)
     and fact_id_2 not in (select baby_person_id from mom_baby_step1_baby1stvisit_noproblem);
    """
    
spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_novisitinfobirth")

##### mom_baby_step1_baby1stvisit_novisitinfo

In [0]:
sql=f"""
    select fact_id_2 as baby_person_id,person_source_value as baby_grid, birth_datetime,
    null as first_visit_start_date,null as first_visit_start_datetime,null as first_visit_end_date,
    'noinfo_in_visit_table' as baby_1st_visit_problem
    from global_temp.mom_baby_step1 where fact_id_2 not in 
    (select distinct person_id from {visit_table});
    """


spark.sql(sql).createOrReplaceTempView("mom_baby_step1_baby1stvisit_novisitinfo") 

##### Merge all 4 conditions:
##### 1) had valid birth hospitalization inpatient information
##### 2) had birth visit, but did not have meaningful inpatient visit information
##### 3) did not have visit around birthdate
##### 4) did not exist in the visit_occurrence table
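
The four conditions are mutually exclusive and are checked in priority order. The sketch below shows that ordering in plain Python, assuming three boolean facts are already known per baby; the function `classify_baby` and its parameter names are hypothetical, while the label strings are the `baby_1st_visit_problem` values used in this notebook.

```python
def classify_baby(has_valid_inpatient, has_visit_near_birth, in_visit_table):
    """Assign a baby_1st_visit_problem label using the priority order
    of the four merged conditions."""
    if has_valid_inpatient:
        return "N"
    if has_visit_near_birth:
        return "no_meaningfull_ip_visit_info"
    if in_visit_table:
        return "no_visitinfo_during_birth"
    return "noinfo_in_visit_table"


# Labels retained in the final cohort after the merge.
KEPT_LABELS = {"N", "no_meaningfull_ip_visit_info"}

label = classify_baby(False, True, True)
keep = label in KEPT_LABELS
```

Only the first two labels carry usable start and end dates, which is why the last two are dropped from the final cohort.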

In [0]:
sql="""
    select * from mom_baby_step1_baby1stvisit_noproblem
    union 
    select * from mom_baby_step1_baby1stvisit_nomeaninfulipvisitinfo_1stvisit
    union 
    select * from mom_baby_step1_baby1stvisit_novisitinfobirth
    union 
    select * from mom_baby_step1_baby1stvisit_novisitinfo;
    """
mom_baby_step1_baby1stvisit_all = spark.sql(sql)
mom_baby_step1_baby1stvisit_all.name='mom_baby_step1_baby1stvisit_all'
register_parquet_global_view(mom_baby_step1_baby1stvisit_all)

In [0]:
df_inspection("global_temp.mom_baby_step1_baby1stvisit_all","baby")

In [0]:
sql="""
       select * from global_temp.mom_baby_step1_baby1stvisit_all where baby_1st_visit_problem = 'N' or 
       baby_1st_visit_problem = 'no_meaningfull_ip_visit_info';
    """
mom_baby_step1_baby1stvisit = spark.sql(sql)
mom_baby_step1_baby1stvisit.name='mom_baby_step1_baby1stvisit'
register_parquet_global_view(mom_baby_step1_baby1stvisit)

In [0]:
df_inspection("global_temp.mom_baby_step1_baby1stvisit","baby")

In [0]:
%sql
select * from global_temp.mom_baby_step1_baby1stvisit