## To Do

- ~~Finish transfer counts variable (after recoding suspect transfers)~~
- ~~Get rid of double admits w/ differing service codes~~
- ~~Rename all column variables to lower case~~
- ~~Expand out the final patient data frame with visit ids & rows per each day in the hospital~~
- ~~Get final table of RUIDs that remain in the dataset~~

- Clean up pipeline for creating final leftmost table
    - Import RUID as a string
    - Adjust column names on import
    - Restructure joins so there only needs to be one join->reset_index step


- Analyze how much of a difference recoded transfers make?
- Properly characterize missingness based on dropped discharge dates
- Properly characterize amount of data loss with each cohort change

Shape of final table:

| ruid | visit_id | admit_date | discharge_date | hospital_day | n_transfers | stay_length | readmit_time | readmit_30d |
|------|----------|------------|----------------|--------------|-------------|-------------|--------------|-------------|
| user id | hospital stay # | date admitted | date discharged | date in hospital | number of transfers | duration of stay | time from last discharge to this admission | was the patient a 30d readmit? |


## Nice-to-do

- Construct missing discharge/admit dates from CPT codes (as above) -- do not do this for events where both are missing as these may be ER visits w/o admit, but do check if they fall in the range of an existing stay
- Characterize the amount of missingness of entire hospital visits from CPT codes

## Loading data

In [2]:
%ls ../data

FONNESBECK_ADT_20151202.csv        [1m[31mFONNESBECK_LAB_20151202.csv[m[m*
[1m[31mFONNESBECK_BMI_20151202.csv[m[m*       [1m[31mFONNESBECK_MED_20151202.csv[m[m*
[1m[31mFONNESBECK_BP_20151202.csv[m[m*        [1m[31mFONNESBECK_phenotype_20151202.csv[m[m*
[1m[31mFONNESBECK_CPT_20151202.csv[m[m*       Fonnesbeck_DD_2014102014.xlsx
[1m[31mFONNESBECK_EGFR_20151202.csv[m[m*      adt_cms_final.pkl
[1m[31mFONNESBECK_ICD9_20151202.csv[m[m*


In [3]:
import pandas as pd
import datetime
import numpy as np

In [4]:
adt = pd.read_table('../data/FONNESBECK_ADT_20151202.csv', encoding='latin1', sep = ',', infer_datetime_format=True, parse_dates=['Admission_date','Event_Date','DISCHARGE_DATE'])
pheno = pd.read_table('../data/FONNESBECK_phenotype_20151202.csv', encoding='latin1', sep = ',', infer_datetime_format=True, parse_dates=['DOB','DOD'])
cpt = pd.read_table('../data/FONNESBECK_CPT_20151202.csv', encoding='latin1', sep = ',', infer_datetime_format=True, parse_dates=['Event_date'])
# import RUID as a string, rename columns

In [5]:
svc = pd.read_excel('../data/FONNESBECK_DD_2014102014.xlsx',sheet_name='Service code', sep = ',')
svc.rename(columns = {"Service Code":"SVC", "Service Code Desc":"Desc"}, inplace = True)

In [6]:
adt.Event = pd.Categorical(adt.Event,categories = ['Admit','Transfer','Discharge'])
adt = adt.sort_values(by = ['RUID','Admission_date','Event','Event_Date']).reset_index(drop = True)

In [7]:
adt.describe(include='all')

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE
count,121530.0,121530,119969,121530,121530,120603.0,119472
unique,,3,4192,4279,73,13118.0,4195
top,,Transfer,2013-03-14 00:00:00,2013-12-28 00:00:00,GMD,296.9,2010-12-23 00:00:00
freq,,61636,111,69,13062,2394.0,111
first,,,2004-01-28 00:00:00,2004-01-28 00:00:00,,,2004-02-11 00:00:00
last,,,2015-11-26 00:00:00,2015-11-26 00:00:00,,,2015-11-23 00:00:00
mean,53668610.0,,,,,,
std,462820.6,,,,,,
min,50135260.0,,,,,,
25%,53729800.0,,,,,,


## Looking at missingness

In [8]:
adt.isnull().sum()/adt.shape[0]

RUID               0.000000
Event              0.000000
Admission_date     0.012845
Event_Date         0.000000
SRV_CODE           0.000000
CHIEF_COMPLAINT    0.007628
DISCHARGE_DATE     0.016934
dtype: float64

In [9]:
adt[adt.Admission_date.isnull() & adt.DISCHARGE_DATE.isnull()].SRV_CODE.value_counts() # may correspond to ER visits without admission which we don't need to predict

CAR    295
GMD    269
PED    110
NEP     69
GER     55
PUL     51
GNS     46
OBS     45
ORT     43
INF     40
NES     38
ONC     37
PGS     37
HEM     34
PON     34
URO     32
NEU     31
GAS     23
EMR     22
OTO     20
TRA     20
HEP     17
PLS     16
EGS     15
PCC     14
PGA     14
LTS     14
PNP     11
BRN     10
PCA     10
PNE      9
PPU      9
RTS      8
CSX      8
VAS      7
PSY      6
PEN      5
GIL      4
GYN      4
OES      4
CLP      3
DIA      3
ADO      2
RAD      2
GEN      2
CTS      2
PTA      2
THS      2
NEO      2
RHM      1
Name: SRV_CODE, dtype: int64

In [10]:
adt[(adt.Admission_date.isnull() & adt.DISCHARGE_DATE.isnull())].shape # may correspond to ER visits without admission which we don't need to predict

(1557, 7)

In [11]:
adt[adt.Admission_date.isnull()].Event.value_counts()

Transfer     1559
Discharge       2
Admit           0
Name: Event, dtype: int64

In [12]:
adt[adt.DISCHARGE_DATE.isnull()].Event.value_counts()

Transfer     1860
Admit         198
Discharge       0
Name: Event, dtype: int64

In [13]:
adt[(adt.Admission_date.isnull()) & (adt.Event == 'Discharge')]

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE
76409,53733158,Discharge,NaT,2007-08-06,TRA,STAT,2007-08-06
76578,53733172,Discharge,NaT,2013-02-17,PED,SEPSIS,2013-02-17


In [14]:
adt[76400:76420]

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE
76400,53733157,Admit,2012-02-14,2012-02-14,PUL,HEMOPTYSIS,2012-02-15
76401,53733157,Transfer,2012-02-14,2012-02-14,PUL,HEMOPTYSIS,2012-02-15
76402,53733157,Discharge,2012-02-14,2012-02-15,PUL,HEMOPTYSIS,2012-02-15
76403,53733157,Admit,2012-04-14,2012-04-14,ONC,FAILURE TO THRIVE; DEHYDRATION; KIDNEY CA,2012-04-15
76404,53733157,Transfer,2012-04-14,2012-04-14,ONC,FAILURE TO THRIVE; DEHYDRATION; KIDNEY CA,2012-04-15
76405,53733157,Discharge,2012-04-14,2012-04-15,ONC,FAILURE TO THRIVE; DEHYDRATION; KIDNEY CA,2012-04-15
76406,53733157,Transfer,NaT,2011-11-21,ONC,HEMOPTYSIS,NaT
76407,53733157,Transfer,NaT,2011-11-21,HEM,HEMOPTYSIS,NaT
76408,53733158,Transfer,NaT,2007-08-05,TRA,STAT,2007-08-06
76409,53733158,Discharge,NaT,2007-08-06,TRA,STAT,2007-08-06


## Adding age data & removing pediatric patients

In [15]:
adt_age = pd.merge(adt,pheno)
# admits = adt_age.Admission_date.dt
events = adt_age.Event_Date.dt
birthdays = adt_age.DOB.dt

adt_age['age'] = events.year - birthdays.year + ((events.month < birthdays.month) & (events.day < birthdays.day))
# above from https://stackoverflow.com/questions/2217488/age-from-birthdate-in-python/9754466#9754466

In [16]:
# getting rid of peds & psychiatric patients...
# we're removing these because they aren't part of the CMS criteria so 30-day readmits for them don't lose the hospital money
ped_svc = '|'.join(svc.SVC[svc.Desc.str.contains("CHILD|PED")])
psych_svc = '|'.join(svc.SVC[svc.Desc.str.contains("PSYCH")])

# ped_filter = ((adt_age.age < 18) | (adt_age.SRV_CODE.str.contains(ped_svc)) & ~((adt_age.age > 35) & (adt_age.SRV_CODE.str.contains(ped_svc))))
# the ~ condition here contains a handful of rows that I think are coding errors -- very old patients admitted to pediatric services
# the cutoff is 35 because some pediatric cancer/cardiac/etc patients will continue with pediatric services for their original condition into adulthood

ped_filter = (adt_age.age < 18)
psych_filter = (adt_age.SRV_CODE.str.contains(psych_svc))

# doing this the simplest way possible
# the psych filter will remove patients with a primary psych admit, which removes them from our consideration, but NOT patients who have psych consults
# may want to explicitly do that 

In [17]:
adt_cms = adt[~(ped_filter | psych_filter)].copy()

## Filtering to admits & eliminating missing discharges

In [18]:
adt_cms['imputed_transfer'] = [0]*adt_cms.shape[0]
txmask = (adt_cms.Event == "Admit") & (adt_cms.Admission_date != adt_cms.Event_Date) # there are 431 of these

adt_cms.loc[txmask,'Event'] = "Transfer"
adt_cms.loc[txmask,'imputed_transfer'] = 1

In [19]:
adt_cms[adt_cms.imputed_transfer == 1]

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE,imputed_transfer
461,50135624,Transfer,2015-06-24,2015-06-25,CAR,SOB,2015-06-28,1
529,50135821,Transfer,2013-10-24,2013-10-27,GMD,LEG LAC,2013-10-29,1
1574,50139667,Transfer,2014-10-02,2014-10-03,CAR,CHF EXACERBATION,2014-10-22,1
1769,50141958,Transfer,2014-03-11,2014-03-12,PUL,TRAUMA,2014-03-13,1
1805,50142794,Transfer,2014-08-11,2014-08-12,CAR,CHF,2014-08-18,1
2490,53719335,Transfer,2014-04-24,2014-04-25,GMD,GI BLEED,2014-04-25,1
2519,53719335,Transfer,2015-03-01,2015-03-02,GMD,PNA,2015-03-04,1
2530,53719335,Transfer,2015-04-02,2015-04-04,GMD,N/V,2015-04-13,1
2551,53719857,Transfer,2014-08-25,2014-08-26,GMD,HYPERK,2014-08-27,1
2810,53725468,Transfer,2014-11-14,2014-11-15,PUL,S/P LIVER TXPLANT; RENAL FAILURE,2014-11-27,1


In [20]:
adt_cms_admits = adt_cms[(adt_cms.Event == 'Admit') & ~(adt_cms.DISCHARGE_DATE.isnull())].copy().reset_index(drop = True)
# removing missing discharge dates because I can't fix them right now
# adt_cms[(adt_cms.Event == 'Admit') & (adt_cms.DISCHARGE_DATE.isnull())]

# adt_cms_admits = adt_cms_admits[adt_cms_admits.Admission_date == adt_cms_admits.Event_Date].reset_index(drop = True)
# this removes admits that aren't the same day as the admit date
# i'm not sure what these actually are--they might be miscoded transfers or admissions to another department

## Constructing variables

In [21]:
adt_cms_admits['stay_length'] = adt_cms_admits.DISCHARGE_DATE - adt_cms_admits.Admission_date
adt_cms_admits['readmit_time'] = adt_cms_admits.Admission_date - adt_cms_admits.DISCHARGE_DATE.shift()

didx = ~(adt_cms_admits.RUID.shift() == adt_cms_admits.RUID)

adt_cms_admits['readmit_time'] = adt_cms_admits['readmit_time'].mask(didx)

adt_cms_admits['readmit_30d'] = np.where(adt_cms_admits.readmit_time <= datetime.timedelta(days=30),1,0)
adt_cms_admits = adt_cms_admits[~(adt_cms_admits.readmit_time < datetime.timedelta(days=0))] # get rid of double admits where we had a different
# chief complaint or svc code

In [22]:
adt_cms_admits.shape # we now have 21123 admissions to work with

(21123, 11)

In [23]:
event_counts = (adt_cms[~(adt_cms.DISCHARGE_DATE.isnull())].groupby(by=['RUID','Admission_date'])
                .Event
                .value_counts(sort=False)
                .unstack(fill_value = 0))

n_transfers = event_counts['Transfer'] # now pull the number of transfers and we're good
# merge this by multindex onto the other table once it's cleaned & ready to go

In [24]:
adt_cms_admits2 = (adt_cms_admits.drop(labels=['Event','Event_Date','SRV_CODE','imputed_transfer','CHIEF_COMPLAINT'], axis = 1)
                  .set_index(['RUID','Admission_date'])
                  .join(n_transfers)
                  .reset_index(drop = False)
                  .rename({'RUID': 'ruid', 'Admission_date': 'admit_date', 'DISCHARGE_DATE': 'discharge_date', 'Transfer': 'n_transfers'},axis = 1))

In [25]:
adt_cms_admits2['visit_id'] = adt_cms_admits2.groupby('ruid').cumcount()

In [26]:
def date_ranger(x):
    start = x.iloc[0]['admit_date']
    end = x.iloc[0]['discharge_date']
    return pd.DataFrame(pd.date_range(start=start, end=end).tolist())

In [27]:
hospital_day = (adt_cms_admits2.groupby(['ruid','visit_id'])
                .apply(date_ranger)
                .reset_index(drop = False)
                .drop('level_2',axis = 1)
                .set_index(['ruid','visit_id']))

# takes a bit to run

In [28]:
adt_cms_final = (adt_cms_admits2.set_index(['ruid','visit_id'])
                .join(hospital_day)
                .reset_index(drop = False)
                .rename({0:'hospital_day'},axis=1))[['ruid','visit_id','admit_date','discharge_date','hospital_day',
                                                     'stay_length','n_transfers','readmit_time','readmit_30d']]

In [29]:
adt_cms_final

Unnamed: 0,ruid,visit_id,admit_date,discharge_date,hospital_day,stay_length,n_transfers,readmit_time,readmit_30d
0,50135262,0,2007-02-08,2007-02-12,2007-02-08,4 days,2,NaT,0
1,50135262,0,2007-02-08,2007-02-12,2007-02-09,4 days,2,NaT,0
2,50135262,0,2007-02-08,2007-02-12,2007-02-10,4 days,2,NaT,0
3,50135262,0,2007-02-08,2007-02-12,2007-02-11,4 days,2,NaT,0
4,50135262,0,2007-02-08,2007-02-12,2007-02-12,4 days,2,NaT,0
5,50135262,1,2007-08-03,2007-08-06,2007-08-03,3 days,3,172 days,0
6,50135262,1,2007-08-03,2007-08-06,2007-08-04,3 days,3,172 days,0
7,50135262,1,2007-08-03,2007-08-06,2007-08-05,3 days,3,172 days,0
8,50135262,1,2007-08-03,2007-08-06,2007-08-06,3 days,3,172 days,0
9,50135262,2,2007-08-28,2007-08-29,2007-08-28,1 days,1,22 days,1


In [30]:
final_ruids = adt_cms_final.ruid.unique()

In [31]:
len(final_ruids) # from 8000 patients, we're down to 5664.

5664

In [32]:
adt_cms_admits2.describe()

Unnamed: 0,ruid,stay_length,readmit_time,readmit_30d,n_transfers,visit_id
count,21123.0,21123,15459,21123.0,21123.0,21123.0
mean,53655410.0,5 days 01:52:33.131657,222 days 06:58:03.330098,0.269327,2.035128,5.632912
std,511651.9,6 days 15:08:41.550742,408 days 05:35:22.089900,0.443621,1.895853,10.275376
min,50135260.0,0 days 00:00:00,0 days 00:00:00,0.0,0.0,0.0
25%,53729700.0,2 days 00:00:00,17 days 00:00:00,0.0,1.0,0.0
50%,53731950.0,3 days 00:00:00,58 days 00:00:00,0.0,2.0,2.0
75%,53734270.0,6 days 00:00:00,222 days 00:00:00,1.0,3.0,6.0
max,53736420.0,206 days 00:00:00,3775 days 00:00:00,1.0,36.0,104.0


In [35]:
adt_cms[adt_cms.RUID == 53736286]

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE,imputed_transfer
119081,53736286,Admit,2007-08-27,2007-08-27,HEM,PAIN CRISIS,2007-09-06,0
119082,53736286,Transfer,2007-08-27,2007-08-28,HEM,PAIN CRISIS,2007-09-06,0
119083,53736286,Transfer,2007-08-27,2007-09-02,HEM,PAIN CRISIS,2007-09-06,0
119084,53736286,Discharge,2007-08-27,2007-09-06,HEM,PAIN CRISIS,2007-09-06,0
119085,53736286,Admit,2007-09-25,2007-09-25,HEM,SICKLE CELL CRISIS,2007-09-30,0
119086,53736286,Transfer,2007-09-25,2007-09-25,HEM,SICKLE CELL CRISIS,2007-09-30,0
119087,53736286,Transfer,2007-09-25,2007-09-26,HEM,SICKLE CELL CRISIS,2007-09-30,0
119088,53736286,Discharge,2007-09-25,2007-09-30,HEM,SICKLE CELL CRISIS,2007-09-30,0
119089,53736286,Admit,2007-11-02,2007-11-02,HEM,SICKLE CELL CRISIS,2007-11-06,0
119090,53736286,Transfer,2007-11-02,2007-11-02,HEM,SICKLE CELL CRISIS,2007-11-06,0


In [36]:
adt_cms_admits.readmit_30d.sum() # number of events in our final cohort

5689

## (Attempting to) Impute missing discharge dates from CPT hospitalization & discharge codes

In [None]:
hosp_ed_cpts = ["99217", "99218", "99219", "99220", "99221", "99222", "99223", "99224", "99225", "99226", "99231", "99232", "99233", "99234", "99235", "99236", "99238", "99239", "99251", "99252", "99253", "99254", "99255", "99289","99290", "99291", "99292", "99293", "99294", "99295","99296", "99297", "99356", "99357", "99358", "99359", "99433", "99435", "99460", "99461", "99462", "99463", "99466", "99467", "99468", "99469","99471", "99472", "99475", "99476", "99477", "99478", "99479", "99480", "99485", "99486", "99281", "99282", "99283", "99284", "99285"]
cpt_pat = "|".join(hosp_ed_cpts)
disch_pat = "|".join(["99217", "99238", "99239"])

In [None]:
cpt_hosp = cpt[cpt.CPT_Code.str.match(cpt_pat)].sort_values(by=['RUID','Event_date','CPT_Code'])

In [None]:
missing_discharge = adt_cms[(adt_cms.Event == "Admit") & (adt_cms.DISCHARGE_DATE.isnull())].copy().reset_index()
missing_discharge['IMPUTED_DISCHARGE'] = missing_discharge.DISCHARGE_DATE
missing_discharge.head()

In [None]:
for idx, row in missing_discharge.iterrows():
    cpt_sub = cpt_hosp[(cpt_hosp.RUID == row.RUID) & (cpt_hosp.Event_date > row.Admission_date)]
    cpt_disch = cpt_sub[cpt_sub.CPT_Code.str.match(disch_pat)]
    orig_idx = row.index
    
    if cpt_disch.shape[0]:
        missing_discharge.IMPUTED_DISCHARGE[idx] = cpt_disch.iloc[0,2]
        # this will modify the original df
        # but we should probable be careful about that
        # so i'm commenting it out
        # adt_cms.DISCHARGE_DATE[orig_idx] = cpt_disch.iloc[0,2]

# find gaps in CPT codes -- use the last code before a non-contiguous gap and put in as discharge codes

In [None]:
missing_discharge
# this isn't a reliable way of doing this -- sometimes there aren't any discharge codes for a given admit so it picks one several months later

## Characterizing the population & missingness

In [24]:
adt.describe(include = "all")

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE
count,121530.0,121530,119969,121530,121530,120603.0,119472
unique,8000.0,3,4192,4279,73,13118.0,4195
top,53736286.0,Transfer,2013-03-14 00:00:00,2013-12-28 00:00:00,GMD,296.9,2010-12-23 00:00:00
freq,374.0,61636,111,69,13062,2394.0,111
first,,,2004-01-28 00:00:00,2004-01-28 00:00:00,,,2004-02-11 00:00:00
last,,,2015-11-26 00:00:00,2015-11-26 00:00:00,,,2015-11-23 00:00:00


In [27]:
adt_age[adt_age.Event == "Admit"].describe()
# this may not be totally meaningful since many patients are in here multiple times & have aged over the course of contact
# with the system

Unnamed: 0,age
count,30199.0
mean,39.94321
std,24.513748
min,0.0
25%,19.0
50%,41.0
75%,60.0
max,102.0


In [33]:
len(np.unique(adt_cms.RUID)) - len(np.unique(adt_cms_admits.RUID))
# 381 individuals either have no admit events or have admit events without a discharge date

381

In [42]:
peds = set(adt_age.RUID[ped_filter])
psych = set(adt_age.RUID[psych_filter])
final = set(final_ruids)

intpeds = peds.intersection(final)
intpsych = psych.intersection(final)

In [51]:
print("Starting from a cohort of 8000 patients, there are {fin} patients in the final cohort. ").format(fin=len(final),peds=len(peds),psych=len(psych),inpeds=len(intpeds),intpsych=len(intpsych))

5664

In [53]:
len(peds.intersection(final))

146

In [54]:
len(psych.intersection(final))

354

In [62]:
len(psych.intersection(peds))

156

In [63]:
len(psych)

765

In [64]:
len(peds)

2131

In [39]:
no_bd = set(pheno.RUID[pheno.DOB.isnull()])

In [65]:
len(no_bd)

43

In [63]:
adt_age[(adt_age.age.isnull()) & (adt_age.Event == "Admit") & ((adt_age.SRV_CODE.str.contains(ped_svc)) | (adt_age.SRV_CODE.str.contains("NUR")) | (adt_age.CHIEF_COMPLAINT == "NEWBORN"))].shape

(45, 12)

In [64]:
adt_age[(adt_age.age.isnull()) & (adt_age.Event == "Admit") & ((adt_age.SRV_CODE.str.contains(ped_svc)) | (adt_age.SRV_CODE.str.contains("NUR")) | (adt_age.CHIEF_COMPLAINT == "NEWBORN"))]

Unnamed: 0,RUID,Event,Admission_date,Event_Date,SRV_CODE,CHIEF_COMPLAINT,DISCHARGE_DATE,Sex,DOB,DOD,Race,age
6146,53728072,Admit,2014-11-17,2014-11-17,PGS,PYLORIC STENOSIS,2014-11-20,,NaT,NaT,U,
6147,53728072,Admit,2014-11-17,2014-11-18,PGS,PYLORIC STENOSIS,2014-11-20,,NaT,NaT,U,
9140,53728274,Admit,2014-10-10,2014-10-10,PCC,WOUND ISSUE,2014-10-14,,NaT,NaT,U,
9146,53728274,Admit,2015-07-12,2015-07-12,PED,SHUNT MALFUNCTION,2015-07-12,,NaT,NaT,U,
12215,53728472,Admit,2015-01-09,2015-01-09,NUR,,2015-01-10,,NaT,NaT,U,
14042,53728579,Admit,2014-11-19,2014-11-19,NUR,,2014-11-20,,NaT,NaT,U,
14390,53728602,Admit,2013-12-19,2013-12-19,NUR,NEWBORN,2013-12-30,,NaT,NaT,U,
15560,53728684,Admit,2014-06-21,2014-06-21,NUR,NEWBORN,2014-09-15,,NaT,NaT,U,
19987,53729008,Admit,2015-01-01,2015-01-01,NUR,NEWBORN,2015-01-03,,NaT,NaT,U,
37721,53730329,Admit,2014-06-07,2014-06-07,NUR,NEWBORN,2014-06-09,,NaT,NaT,U,


In [67]:
len(no_bd.intersection(final))

41

In [68]:
newborns = set(adt_age.RUID[adt_age.CHIEF_COMPLAINT == "NEWBORN"])

In [71]:
newborns.intersection(no_bd)

{53728602,
 53728684,
 53729008,
 53730329,
 53731021,
 53731561,
 53733438,
 53734654,
 53735626,
 53736032}

In [72]:
len(newborns)

248