# Querying Hospital Inpatient Discharges (SPARCS De-Identified) from Open Data NY - Department of Health
See the NY Gov overview [here](https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/tg3i-cinn)

Query syntax follows the Socrata API documentation for this dataset: [https://dev.socrata.com/foundry/health.data.ny.gov/tg3i-cinn](https://dev.socrata.com/foundry/health.data.ny.gov/tg3i-cinn)

## All discharge data (skip this)

In [5]:
import numpy as np

In [6]:
import pandas as pd
from sodapy import Socrata

client = Socrata("health.data.ny.gov", None)
results = client.get("tg3i-cinn", limit=2000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
results_df



Unnamed: 0,hospital_service_area,hospital_county,operating_certificate_number,permanent_facility_id,facility_name,age_group,zip_code_3_digits,gender,race,ethnicity,...,apr_severity_of_illness,apr_risk_of_mortality,apr_medical_surgical,payment_typology_1,payment_typology_2,emergency_department_indicator,total_charges,total_costs,birth_weight,payment_typology_3
0,New York City,Bronx,7000006,001169,Montefiore Medical Center - Henry & Lucy Moses...,70 or Older,104,M,Other Race,Spanish/Hispanic,...,Major,Extreme,Medical,Medicare,Medicaid,Y,320922.43,60241.34,,
1,New York City,Bronx,7000006,001169,Montefiore Medical Center - Henry & Lucy Moses...,50 to 69,104,F,White,Not Span/Hispanic,...,Moderate,Minor,Medical,Private Health Insurance,,Y,61665.22,9180.69,,
2,New York City,Bronx,7000006,001168,Montefiore Medical Center-Wakefield Hospital,18 to 29,104,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Surgical,Medicaid,,N,42705.34,11366.50,,
3,New York City,Bronx,7000006,003058,Montefiore Med Center - Jack D Weiler Hosp of ...,70 or Older,104,M,Other Race,Spanish/Hispanic,...,Major,Major,Medical,Medicare,Medicaid,Y,72700.17,12111.75,,
4,New York City,Bronx,7000006,001169,Montefiore Medical Center - Henry & Lucy Moses...,50 to 69,104,F,Black/African American,Not Span/Hispanic,...,Moderate,Minor,Medical,Medicare,Medicaid,Y,55562.51,8339.72,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,New York City,Manhattan,7002001,001438,Bellevue Hospital Center,70 or Older,112,M,Black/African American,Not Span/Hispanic,...,Moderate,Moderate,Surgical,Medicaid,,N,157971.13,90846.67,,
1996,New York City,Kings,7001009,001294,Coney Island Hospital,30 to 49,112,F,White,Not Span/Hispanic,...,Minor,Minor,Surgical,Medicaid,,N,39071.37,17854.80,,
1997,New York City,Manhattan,7002001,001438,Bellevue Hospital Center,0 to 17,113,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Medical,Medicaid,,N,14311.91,8230.55,02600,
1998,Hudson Valley,Rockland,4324000,000776,Montefiore Nyack,70 or Older,109,M,White,Not Span/Hispanic,...,Extreme,Extreme,Medical,Medicare,,Y,135524.70,26028.38,,


## All columns in this data

In [7]:
results_df.columns

Index(['hospital_service_area', 'hospital_county',
       'operating_certificate_number', 'permanent_facility_id',
       'facility_name', 'age_group', 'zip_code_3_digits', 'gender', 'race',
       'ethnicity', 'length_of_stay', 'type_of_admission',
       'patient_disposition', 'discharge_year', 'ccsr_diagnosis_code',
       'ccsr_diagnosis_description', 'ccsr_procedure_code',
       'ccsr_procedure_description', 'apr_drg_code', 'apr_drg_description',
       'apr_mdc_code', 'apr_mdc_description', 'apr_severity_of_illness_code',
       'apr_severity_of_illness', 'apr_risk_of_mortality',
       'apr_medical_surgical', 'payment_typology_1', 'payment_typology_2',
       'emergency_department_indicator', 'total_charges', 'total_costs',
       'birth_weight', 'payment_typology_3'],
      dtype='object')

In [8]:
results_df['ccsr_diagnosis_description'].unique()

array(['CORONAVIRUS DISEASE 2019 (COVID-19)', 'MULTIPLE SCLEROSIS',
       'PREVIOUS C-SECTION', 'URINARY TRACT INFECTIONS',
       'PARALYSIS (OTHER THAN CEREBRAL PALSY)',
       'COMPLICATION OF OTHER SURGICAL OR MEDICAL CARE, INJURY, INITIAL ENCOUNTER',
       'TRAUMATIC BRAIN INJURY (TBI); CONCUSSION, INITIAL ENCOUNTER',
       'LIVEBORN', 'NONINFECTIOUS GASTROENTERITIS', 'ASTHMA',
       'DIABETES MELLITUS WITH COMPLICATION', 'SICKLE CELL TRAIT/ANEMIA',
       'CHRONIC OBSTRUCTIVE PULMONARY DISEASE AND BRONCHIECTASIS',
       'OTHER SPECIFIED DISEASES OF VEINS AND LYMPHATICS',
       'STRESS FRACTURE, INITIAL ENCOUNTER',
       'ENCOUNTER FOR ANTINEOPLASTIC THERAPIES', 'BENIGN NEOPLASMS',
       'NERVE AND NERVE ROOT DISORDERS',
       'COMPLICATION OF TRANSPLANTED ORGANS OR TISSUE, INITIAL ENCOUNTER',
       'FLUID AND ELECTROLYTE DISORDERS',
       'INTESTINAL OBSTRUCTION AND ILEUS', 'EPILEPSY; CONVULSIONS',
       'MALE REPRODUCTIVE SYSTEM CANCERS - PROSTATE',
       'SCHIZOPHR

### NOTES:
My model can only take in data that will be available at the time of the prediction (e.g., I can't predict yesterday's stock price using today's stock price). This raises some questions for predicting length of stay:  
1. Would this prediction happen before the woman goes to the hospital? In that case the only predictive inputs would be where she lives, the hospital she is planning to go to, her age group, gender, race, and maybe an idea of the procedure to be done (but not the official diagnosis, which is what I have in the dataset, nor knowledge of any complications that will arise). This is tricky.
2. Would this prediction happen once the woman is in the hospital and has presumably given birth, so that we are estimating in-hospital recovery time for a known procedure? This would allow me to use all of the data I have (yay), but I'm not sure it would be as helpful in context. (Would a soon-to-be-again mother arrange guaranteed childcare for her kids for 2 days and then, once she has given birth, wonder how much longer to arrange it for, or is that superfluous information at that point?)
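
One way to make the two scenarios concrete is to write down which columns would be available in each. The sketch below is only an illustration: the grouping of columns into `PRE_ADMISSION_FEATURES` and `POST_DELIVERY_FEATURES`, and the `select_features` helper itself, are my assumptions and not something the dataset defines. Columns such as `total_charges`, `total_costs` and `patient_disposition` are left out of both lists because they are only known at discharge.

```python
import pandas as pd

# Hypothetical grouping of SPARCS columns by when they would be known.
# Scenario 1: before the patient goes to the hospital.
PRE_ADMISSION_FEATURES = [
    "hospital_service_area", "hospital_county", "facility_name",
    "age_group", "zip_code_3_digits", "gender", "race", "ethnicity",
    "payment_typology_1",
]
# Scenario 2: after delivery, when diagnosis and procedure are known.
POST_DELIVERY_FEATURES = PRE_ADMISSION_FEATURES + [
    "type_of_admission", "ccsr_diagnosis_code", "ccsr_procedure_code",
    "apr_severity_of_illness", "apr_risk_of_mortality",
]
TARGET = "length_of_stay"


def select_features(df, scenario):
    """Return (X, y) limited to the columns assumed available in `scenario`."""
    if scenario == "pre_admission":
        wanted = PRE_ADMISSION_FEATURES
    elif scenario == "post_delivery":
        wanted = POST_DELIVERY_FEATURES
    else:
        raise ValueError(f"unknown scenario: {scenario!r}")
    # Keep only the columns that are actually present in this frame.
    present = [col for col in wanted if col in df.columns]
    return df[present], df[TARGET]


# Tiny made-up frame, only to show the call.
demo = pd.DataFrame({
    "age_group": ["18 to 29", "30 to 49"],
    "hospital_county": ["Bronx", "Kings"],
    "apr_risk_of_mortality": ["Minor", "Minor"],
    "total_costs": ["11366.50", "15891.81"],
    "length_of_stay": ["2", "3"],
})
X_pre, y = select_features(demo, "pre_admission")
X_post, _ = select_features(demo, "post_delivery")
print(list(X_pre.columns))
print(list(X_post.columns))
```

Keeping the lists explicit makes it harder to leak discharge-time information into a model that is supposed to run earlier.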

In [9]:
# for col in results_df.columns:
#     print(col)
#     print('\t', results_df[col].unique())

In [10]:
import pandas as pd
from sodapy import Socrata

client = Socrata("health.data.ny.gov", None)
soql_query = "SELECT * WHERE ccsr_diagnosis_code LIKE '%INJ%' LIMIT 10000000"
injuries = client.get("tg3i-cinn", query=soql_query)

injuries_df = pd.DataFrame.from_records(injuries)
injuries_df



Unnamed: 0,hospital_service_area,hospital_county,operating_certificate_number,permanent_facility_id,facility_name,age_group,zip_code_3_digits,gender,race,ethnicity,...,apr_severity_of_illness,apr_risk_of_mortality,apr_medical_surgical,payment_typology_1,emergency_department_indicator,total_charges,total_costs,payment_typology_2,payment_typology_3,birth_weight
0,New York City,Bronx,7000006,003058,Montefiore Med Center - Jack D Weiler Hosp of ...,50 to 69,105,M,Other Race,Spanish/Hispanic,...,Major,Moderate,Medical,Private Health Insurance,Y,109269.27,18443.00,,,
1,New York City,Bronx,7000006,001169,Montefiore Medical Center - Henry & Lucy Moses...,50 to 69,104,F,Black/African American,Not Span/Hispanic,...,Minor,Minor,Medical,Medicare,Y,24437.19,3060.38,Medicaid,,
2,New York City,Bronx,7000006,001169,Montefiore Medical Center - Henry & Lucy Moses...,50 to 69,104,M,Other Race,Spanish/Hispanic,...,Moderate,Moderate,Medical,Medicare,Y,270656.16,48268.17,Medicaid,,
3,New York City,Bronx,7000006,001169,Montefiore Medical Center - Henry & Lucy Moses...,30 to 49,104,M,Other Race,Unknown,...,Extreme,Major,Medical,Private Health Insurance,N,88167.44,15579.66,,,
4,New York City,Manhattan,7002001,001438,Bellevue Hospital Center,50 to 69,114,F,Black/African American,Unknown,...,Major,Minor,Medical,Medicaid,Y,41548.87,23894.09,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154431,New York City,Bronx,7000002,001165,Jacobi Medical Center,18 to 29,104,M,Other Race,Unknown,...,Moderate,Minor,Surgical,Self-Pay,Y,179936.46,106799.13,,,
154432,New York City,Bronx,7000002,001165,Jacobi Medical Center,70 or Older,100,M,Black/African American,Not Span/Hispanic,...,Extreme,Extreme,Surgical,Miscellaneous/Other,Y,289988.58,172119.24,Medicare,,
154433,New York City,Queens,7003000,001626,Elmhurst Hospital Center,70 or Older,113,F,Other Race,Spanish/Hispanic,...,Moderate,Moderate,Surgical,Medicare,Y,119307.70,55453.74,Medicaid,,
154434,New York City,Bronx,7000002,001165,Jacobi Medical Center,0 to 17,104,F,Black/African American,Not Span/Hispanic,...,Moderate,Minor,Medical,Miscellaneous/Other,Y,6324.49,3753.83,Medicaid,,


In [11]:
injuries_df['ccsr_diagnosis_description'].unique()

array(['COMPLICATION OF OTHER SURGICAL OR MEDICAL CARE, INJURY, INITIAL ENCOUNTER',
       'TRAUMATIC BRAIN INJURY (TBI); CONCUSSION, INITIAL ENCOUNTER',
       'COMPLICATION OF TRANSPLANTED ORGANS OR TISSUE, INITIAL ENCOUNTER',
       'FRACTURE OF TORSO, INITIAL ENCOUNTER',
       'DISLOCATIONS, INITIAL ENCOUNTER',
       'POISONING BY DRUGS, INITIAL ENCOUNTER',
       'OPEN WOUNDS OF HEAD AND NECK, INITIAL ENCOUNTER',
       'FRACTURE OF THE UPPER LIMB, INITIAL ENCOUNTER',
       'OPEN WOUNDS TO LIMBS, INITIAL ENCOUNTER', 'ALLERGIC REACTIONS',
       'AMPUTATION OF A LIMB, INITIAL ENCOUNTER',
       'INTERNAL ORGAN INJURY, INITIAL ENCOUNTER',
       'FRACTURE OF THE NECK OF THE FEMUR (HIP), INITIAL ENCOUNTER',
       'FRACTURE OF THE LOWER LIMB (EXCEPT HIP), INITIAL ENCOUNTER',
       'TRAUMATIC BRAIN INJURY (TBI); CONCUSSION, SUBSEQUENT ENCOUNTER',
       'COMPLICATION OF INTERNAL ORTHOPEDIC DEVICE OR IMPLANT, INITIAL ENCOUNTER',
       'COMPLICATION OF GENITOURINARY DEVICE, IMPLANT

## All pregnancy-related visits

In [12]:
import pandas as pd
from sodapy import Socrata

client = Socrata("health.data.ny.gov", None)
soql_query = "SELECT * WHERE ccsr_diagnosis_code LIKE '%PRG%' LIMIT 10000000"
prg_results = client.get("tg3i-cinn", query=soql_query)

pregnancy_visits = pd.DataFrame.from_records(prg_results)
pregnancy_visits



Unnamed: 0,hospital_service_area,hospital_county,operating_certificate_number,permanent_facility_id,facility_name,age_group,zip_code_3_digits,gender,race,ethnicity,...,apr_severity_of_illness,apr_risk_of_mortality,apr_medical_surgical,payment_typology_1,emergency_department_indicator,total_charges,total_costs,payment_typology_2,payment_typology_3,birth_weight
0,New York City,Bronx,7000006,001168,Montefiore Medical Center-Wakefield Hospital,18 to 29,104,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Surgical,Medicaid,N,42705.34,11366.50,,,
1,New York City,Kings,7001009,001294,Coney Island Hospital,30 to 49,112,F,Other Race,Spanish/Hispanic,...,Moderate,Minor,Surgical,Medicaid,N,34775.79,15891.81,,,
2,New York City,Manhattan,7002001,001438,Bellevue Hospital Center,30 to 49,100,F,Black/African American,Not Span/Hispanic,...,Minor,Minor,Medical,Medicaid,N,24475.78,14075.63,,,
3,New York City,Manhattan,7002001,001438,Bellevue Hospital Center,30 to 49,113,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Surgical,Medicaid,N,31914.35,18353.43,,,
4,New York City,Kings,7001045,001692,Woodhull Medical & Mental Health Center,18 to 29,112,F,White,Unknown,...,Moderate,Minor,Medical,Medicaid,N,23311.38,15888.78,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
221015,Long Island,Nassau,7003004,001630,Long Island Jewish Medical Center,18 to 29,117,F,White,Not Span/Hispanic,...,Minor,Minor,Medical,Blue Cross/Blue Shield,N,42902.01,8440.02,Self-Pay,,
221016,Long Island,Nassau,7003004,001630,Long Island Jewish Medical Center,30 to 49,110,F,Other Race,Not Span/Hispanic,...,Moderate,Minor,Medical,Blue Cross/Blue Shield,N,16526.90,5337.24,Medicaid,Medicaid,
221017,Long Island,Nassau,7003004,001630,Long Island Jewish Medical Center,18 to 29,112,F,White,Not Span/Hispanic,...,Minor,Minor,Medical,Private Health Insurance,N,19741.34,7198.87,Medicaid,Self-Pay,
221018,New York City,Queens,7003007,001633,Queens Hospital Center,18 to 29,114,F,Black/African American,Unknown,...,Moderate,Minor,Medical,Medicaid,Y,49696.50,22213.54,,,


##### The following are the DIAGNOSIS codes in the pregnancy-related data

In [13]:
print(pregnancy_visits['ccsr_diagnosis_code'].unique())
print(pregnancy_visits['ccsr_diagnosis_description'].unique())

# notes: ~221k rows (see the row index above) is too many; need to cut down
# I'm not sure I want to include separate post-op visits such as previous C-section complications, but maybe they matter if I am predicting time in the hospital?

['PRG016' 'PRG020' 'PRG023' 'PRG024' 'PRG019' 'PRG013' 'PRG006' 'PRG018'
 'PRG022' 'PRG005' 'PRG009' 'PRG014' 'PRG011' 'PRG029' 'PRG012' 'PRG003'
 'PRG028' 'PRG027' 'PRG026' 'PRG017' 'PRG004' 'PRG021' 'PRG015' 'PRG010'
 'PRG025' 'PRG007' 'PRG001']
['PREVIOUS C-SECTION'
 'HYPERTENSION AND HYPERTENSIVE-RELATED CONDITIONS COMPLICATING PREGNANCY; CHILDBIRTH; AND THE PUERPERIUM'
 'COMPLICATIONS SPECIFIED DURING CHILDBIRTH'
 'MALPOSITION, DISPROPORTION OR OTHER LABOR COMPLICATIONS'
 'DIABETES OR ABNORMAL GLUCOSE TOLERANCE COMPLICATING PREGNANCY; CHILDBIRTH; OR THE PUERPERIUM'
 'MATERNAL CARE RELATED TO FETAL CONDITIONS'
 'MOLAR PREGNANCY AND OTHER ABNORMAL PRODUCTS OF CONCEPTION'
 'MATERNAL CARE RELATED TO DISORDERS OF THE PLACENTA AND PLACENTAL IMPLANTATION'
 'PROLONGED PREGNANCY'
 'ECTOPIC PREGNANCY AND COMPLICATIONS OF ECTOPIC PREGNANCY'
 'EARLY, FIRST OR UNSPECIFIED TRIMESTER HEMORRHAGE'
 'POLYHYDRAMNIOS AND OTHER PROBLEMS OF AMNIOTIC CAVITY'
 'EARLY OR THREATENED LABOR'
 'UNCOMPLICATED 

In [14]:
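print('Lengths of stay for ALL pregnancy-related visits:')
# NOTE: unique() keeps one entry per distinct value, so the statistics below
# describe the set of distinct stay lengths, not the per-visit distribution.
lens = list(pregnancy_visits['length_of_stay'].unique())
lens.remove('120 +')  # top-coded label, cannot be cast to int
lengths_of_stay = [int(i) for i in lens]
print(lengths_of_stay)
print('MIN:', min(lengths_of_stay))
print('MAX:', max(lengths_of_stay))
print('MEDIAN:', np.median(lengths_of_stay))
print('MEAN:', np.mean(lengths_of_stay))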

Lengths of stay for ALL pregnancy-related visits:
[2, 3, 1, 6, 5, 4, 31, 23, 9, 43, 10, 7, 11, 12, 14, 29, 8, 15, 18, 20, 27, 19, 13, 21, 30, 37, 16, 28, 51, 17, 26, 33, 24, 42, 38, 55, 22, 44, 45, 75, 32, 25, 50, 87, 35, 65, 36, 56, 52, 39, 34, 73, 48, 96, 59, 40, 46, 112, 57, 61, 53, 76, 66, 103, 67, 41, 49, 69, 64, 78, 70, 86, 92, 62, 47, 80, 63, 54, 85, 119, 58, 93, 68, 81, 74]
MIN: 1
MAX: 119
MEDIAN: 43.0
MEAN: 45.11764705882353
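
The median and mean above are taken over the distinct stay lengths, so a 2-day stay that occurs thousands of times counts the same as a 119-day stay that occurs once. A per-visit summary would convert the whole column instead. The sketch below uses a made-up series rather than the real data; the `length_of_stay_days` helper is hypothetical, and treating the top-coded `'120 +'` label as exactly 120 days is an assumption (dropping those rows is another option).

```python
import pandas as pd


def length_of_stay_days(series, cap=120):
    """Convert length_of_stay strings to numbers, one value per visit.

    The top-coded label '120 +' is mapped to `cap` (an assumption);
    anything else that is not numeric becomes NaN.
    """
    cleaned = series.astype(str).str.strip().replace({"120 +": str(cap)})
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up values, only to show the difference from the unique()-based summary.
demo = pd.Series(["2", "2", "3", "1", "120 +", "2"])
days = length_of_stay_days(demo)
print("MEDIAN per visit:", days.median())
print("MEAN per visit:", days.mean())
print("MEDIAN of distinct values:", pd.Series(days.unique()).median())
```

On the real frame the call would be `length_of_stay_days(pregnancy_visits['length_of_stay'])`.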


## Uncomplicated pregnancy, delivery or puerperium
##### So far, this is the only diagnosis category that I see to be strictly delivery related. If you see other categories that look related I would be happy to add them to the following.

In [15]:
uncomplicated_births = pregnancy_visits.loc[pregnancy_visits['ccsr_diagnosis_description'] == 'UNCOMPLICATED PREGNANCY, DELIVERY OR PUERPERIUM']
uncomplicated_births

Unnamed: 0,hospital_service_area,hospital_county,operating_certificate_number,permanent_facility_id,facility_name,age_group,zip_code_3_digits,gender,race,ethnicity,...,apr_severity_of_illness,apr_risk_of_mortality,apr_medical_surgical,payment_typology_1,emergency_department_indicator,total_charges,total_costs,payment_typology_2,payment_typology_3,birth_weight
42,New York City,Manhattan,7002024,001456,Mount Sinai Hospital,30 to 49,100,F,Other Race,Not Span/Hispanic,...,Moderate,Minor,Surgical,Private Health Insurance,N,43658.10,15155.30,Self-Pay,,
54,New York City,Queens,7003000,001626,Elmhurst Hospital Center,30 to 49,113,F,Other Race,Spanish/Hispanic,...,Moderate,Minor,Medical,Medicaid,N,25611.60,11904.17,,,
61,New York City,Bronx,7000008,001172,Lincoln Medical & Mental Health Center,18 to 29,100,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Medical,Medicaid,N,22641.71,14470.88,,,
67,New York City,Kings,7001009,001294,Coney Island Hospital,30 to 49,112,F,Other Race,Spanish/Hispanic,...,Moderate,Moderate,Medical,Medicaid,N,23060.37,10538.10,,,
78,New York City,Manhattan,7002009,001445,Harlem Hospital Center,18 to 29,100,F,Black/African American,Not Span/Hispanic,...,Minor,Minor,Medical,Medicaid,N,44202.07,33912.80,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
220985,New York City,Kings,7001020,001305,Maimonides Medical Center,30 to 49,112,F,White,Not Span/Hispanic,...,Minor,Minor,Surgical,Private Health Insurance,Y,43112.04,11321.96,Medicaid,Self-Pay,
220993,New York City,Bronx,7000002,001186,North Central Bronx Hospital,30 to 49,104,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Medical,Medicaid,N,15972.32,9480.18,,,
220994,New York City,Manhattan,7002024,001456,Mount Sinai Hospital,30 to 49,100,F,White,Not Span/Hispanic,...,Moderate,Minor,Surgical,Private Health Insurance,N,47626.98,15963.48,Self-Pay,,
221004,New York City,Manhattan,7002021,001454,Metropolitan Hospital Center,30 to 49,100,F,Other Race,Spanish/Hispanic,...,Minor,Minor,Medical,Medicaid,N,30544.21,27551.34,,,
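
The filter above runs in pandas after every pregnancy-related row has been downloaded. An alternative is to push the filter and a column selection into the SoQL query so the server returns only what is needed. The `build_soql` helper below is a hypothetical convenience, not part of sodapy; it only builds the query string, and I am assuming SoQL's usual rule that a single quote inside a string literal is escaped by doubling it.

```python
def build_soql(where_equals, columns=None, limit=50000):
    """Build a SoQL query string from equality filters.

    where_equals: dict mapping column name -> required string value.
    columns: optional list of columns to select (default: all).
    """
    select = ", ".join(columns) if columns else "*"
    clauses = []
    for column, value in where_equals.items():
        escaped = str(value).replace("'", "''")  # double any single quotes
        clauses.append(f"{column} = '{escaped}'")
    query = f"SELECT {select}"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return f"{query} LIMIT {limit}"


query = build_soql(
    {"ccsr_diagnosis_description": "UNCOMPLICATED PREGNANCY, DELIVERY OR PUERPERIUM"},
    columns=["facility_name", "age_group", "length_of_stay"],
)
print(query)
# With network access, the string can be passed to sodapy as in the cells above:
# client.get("tg3i-cinn", query=query)
```

Selecting only the needed columns also keeps the downloaded frame small.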


In [16]:
print(uncomplicated_births['ccsr_procedure_description'].unique())

['SPONTANEOUS VAGINAL DELIVERY' 'FETAL HEART RATE MONITORING'
 'PERINEAL MUSCLE LACERATION REPAIR (2ND DEGREE OBSTETRICAL AND OTHER)'
 'CESAREAN SECTION' 'ASSISTED VAGINAL DELIVERY'
 'PERINEAL SKIN REPAIR (1ST DEGREE OBSTETRICAL AND OTHER)' 'EPISIOTOMY'
 'CERVICAL RIPENING' 'INTRAUTERINE DEVICE (IUD) INSERTION'
 'INTRAVENOUS INDUCTION OF LABOR'
 'FEMALE GENITAL TRACT REPAIR (EXCLUDING VULVA)'
 'ADMINISTRATION OF THERAPEUTIC SUBSTANCES, NEC'
 'VULVAR LACERATION REPAIR' nan
 'ANORECTAL REPAIR (3RD AND 4TH DEGREE OBSTETRICAL REPAIRS AND OTHER)'
 'PREGNANCY AND FETAL PROCEDURES, NEC' 'SALPINGECTOMY'
 'TRANSFUSION OF BLOOD AND BLOOD PRODUCTS'
 'ENDOSCOPIC CONTROL OF BLEEDING'
 'REMOVAL OF PLACENTA AND OTHER RETAINED PRODUCTS OF CONCEPTION'
 'REGIONAL ANESTHESIA' 'PHERESIS THERAPY'
 'FALLOPIAN TUBE LIGATION AND EXCISION'
 'SUBCUTANEOUS CONTRACEPTIVE IMPLANT'
 'FEMALE LOWER GENITAL TRACT EXCISION'
 'CONTROL OF BLEEDING (NON-ENDOSCOPIC)' 'VACCINATIONS'
 'FEMALE REPRODUCTIVE SYSTEM PROCEDURES, 

In [17]:
print('Lengths of stay for uncomplicated-birth-related visits:')
# NOTE: as before, these statistics are over the distinct stay lengths,
# not weighted by how many visits had each length.
lens = list(uncomplicated_births['length_of_stay'].unique())
lengths_of_stay = [int(i) for i in lens]
print(lengths_of_stay)
print('MIN:', min(lengths_of_stay))
print('MAX:', max(lengths_of_stay))
print('MEDIAN:', np.median(lengths_of_stay))
print('MEAN:', np.mean(lengths_of_stay))

Lengths of stay for uncomplicated-birth-related visits:
[3, 2, 1, 4, 5, 6, 14, 11, 7, 10, 8, 9, 12, 17, 13]
MIN: 1
MAX: 17
MEDIAN: 8.0
MEAN: 8.133333333333333


In [18]:
uncomplicated_births.columns

Index(['hospital_service_area', 'hospital_county',
       'operating_certificate_number', 'permanent_facility_id',
       'facility_name', 'age_group', 'zip_code_3_digits', 'gender', 'race',
       'ethnicity', 'length_of_stay', 'type_of_admission',
       'patient_disposition', 'discharge_year', 'ccsr_diagnosis_code',
       'ccsr_diagnosis_description', 'ccsr_procedure_code',
       'ccsr_procedure_description', 'apr_drg_code', 'apr_drg_description',
       'apr_mdc_code', 'apr_mdc_description', 'apr_severity_of_illness_code',
       'apr_severity_of_illness', 'apr_risk_of_mortality',
       'apr_medical_surgical', 'payment_typology_1',
       'emergency_department_indicator', 'total_charges', 'total_costs',
       'payment_typology_2', 'payment_typology_3', 'birth_weight'],
      dtype='object')

In [19]:
uncomplicated_births['apr_risk_of_mortality'].value_counts()

apr_risk_of_mortality
Minor       18239
Moderate      495
Major          42
Extreme        10
Name: count, dtype: int64

In [20]:
# Columns that could plausibly be known before admission, plus the target.
planned_delivery = uncomplicated_births.loc[:, [
    'hospital_service_area', 'hospital_county',
    'operating_certificate_number', 'permanent_facility_id',
    'facility_name', 'age_group', 'zip_code_3_digits', 'gender', 'race',
    'ethnicity', 'payment_typology_1', 'payment_typology_2',
    'payment_typology_3', 'length_of_stay',
]]
#planned_delivery = planned_delivery.reset_index()

In [21]:
planned_delivery

Unnamed: 0,hospital_service_area,hospital_county,operating_certificate_number,permanent_facility_id,facility_name,age_group,zip_code_3_digits,gender,race,ethnicity,payment_typology_1,payment_typology_2,payment_typology_3,length_of_stay
42,New York City,Manhattan,7002024,001456,Mount Sinai Hospital,30 to 49,100,F,Other Race,Not Span/Hispanic,Private Health Insurance,Self-Pay,,3
54,New York City,Queens,7003000,001626,Elmhurst Hospital Center,30 to 49,113,F,Other Race,Spanish/Hispanic,Medicaid,,,3
61,New York City,Bronx,7000008,001172,Lincoln Medical & Mental Health Center,18 to 29,100,F,Other Race,Spanish/Hispanic,Medicaid,,,2
67,New York City,Kings,7001009,001294,Coney Island Hospital,30 to 49,112,F,Other Race,Spanish/Hispanic,Medicaid,,,2
78,New York City,Manhattan,7002009,001445,Harlem Hospital Center,18 to 29,100,F,Black/African American,Not Span/Hispanic,Medicaid,,,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
220985,New York City,Kings,7001020,001305,Maimonides Medical Center,30 to 49,112,F,White,Not Span/Hispanic,Private Health Insurance,Medicaid,Self-Pay,3
220993,New York City,Bronx,7000002,001186,North Central Bronx Hospital,30 to 49,104,F,Other Race,Spanish/Hispanic,Medicaid,,,1
220994,New York City,Manhattan,7002024,001456,Mount Sinai Hospital,30 to 49,100,F,White,Not Span/Hispanic,Private Health Insurance,Self-Pay,,3
221004,New York City,Manhattan,7002021,001454,Metropolitan Hospital Center,30 to 49,100,F,Other Race,Spanish/Hispanic,Medicaid,,,2


In [22]:
planned_delivery['length_of_stay'].unique()

array(['3', '2', '1', '4', '5', '6', '14', '11', '7', '10', '8', '9',
       '12', '17', '13'], dtype=object)

In [23]:
planned_delivery.dtypes

hospital_service_area           object
hospital_county                 object
operating_certificate_number    object
permanent_facility_id           object
facility_name                   object
age_group                       object
zip_code_3_digits               object
gender                          object
race                            object
ethnicity                       object
payment_typology_1              object
payment_typology_2              object
payment_typology_3              object
length_of_stay                  object
dtype: object
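
Every column comes back from the API as `object` (strings), including `length_of_stay`. Before modelling, the target would need to be numeric and the descriptive columns could be stored as categories. The `tidy_types` helper below is a hypothetical sketch on a made-up frame; mapping `'120 +'` to 120 is an assumption and only matters for frames that contain that label.

```python
import pandas as pd


def tidy_types(df, target="length_of_stay"):
    """Return a copy with a numeric target and categorical feature columns."""
    out = df.copy()
    # Top-coded stays are mapped to 120 (assumption); other bad values -> NaN.
    out[target] = pd.to_numeric(
        out[target].replace({"120 +": "120"}), errors="coerce"
    )
    # ID-like columns stay as string categories so leading zeros
    # (for example '001456') are preserved.
    for col in out.columns.drop(target):
        out[col] = out[col].astype("category")
    return out


# Made-up rows shaped like planned_delivery.
demo = pd.DataFrame({
    "permanent_facility_id": ["001456", "001626", "001172"],
    "age_group": ["30 to 49", "30 to 49", "18 to 29"],
    "payment_typology_2": ["Self-Pay", None, None],
    "length_of_stay": ["3", "2", "120 +"],
})
tidy = tidy_types(demo)
print(tidy.dtypes)
```

On the real data the call would be `tidy_types(planned_delivery)`.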

In [30]:
planned_delivery['operating_certificate_number'].unique()

array(['7002024', '7003000', '7000008', '7001009', '7002009', '7001016',
       '7000006', '7002053', '7001021', '2908000', '7003010', '5901000',
       nan, '7002021', '3824000', '7002032', '7002001', '7000024',
       '2950002', '7001037', '4324000', '7000002', '7001045', '7003007',
       '3429000', '2701005', '2701001', '5904001', '7001003', '7000001',
       '0101000', '3301008', '3523000', '5263000', '5601000', '1101000',
       '5001000', '0228000', '7001020', '4601001', '5932000', '3202003',
       '0301001', '5155000', '5154000', '2950001', '7003004', '0101004',
       '7002017', '2601001', '2951001', '7004003', '0602001', '3702000',
       '3950000', '2201000', '1327000', '2527000', '5922000', '5153000',
       '7004010', '5957001', '1624000', '5151001', '3301007', '5401001',
       '4429000', '2238700', '5501001', '7001024', '2801001', '7003001',
       '0601000', '0401001', '7003003', '3102000', '7001035', '4501000',
       '3522000', '6027000', '0701000', '3201002', '70010

## Export data to CSV files for later

In [24]:
#pregnancy_visits.to_csv('./data/pregnancy_related_visits.csv')

In [25]:
#uncomplicated_births.to_csv('./data/uncomplicated_delivery_visits.csv')

In [26]:
#planned_delivery.to_csv('./data/planned_deliveries.csv')
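
If the exports are re-enabled, two details are worth handling: `to_csv` does not create a missing `./data` directory, and writing the index produces an extra unnamed column when the file is read back. The `export_csv` helper below is a hypothetical sketch; it writes to a temporary directory so it can run anywhere, and dropping the index is a choice (keep it if the original row labels matter).

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_csv(df, path):
    """Write df to path as CSV, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)  # no index column in the file
    return path


# Made-up rows; the real call would pass planned_delivery and './data/...'.
demo = pd.DataFrame({
    "permanent_facility_id": ["001456", "001626"],
    "length_of_stay": ["3", "3"],
})
with tempfile.TemporaryDirectory() as tmp:
    written = export_csv(demo, Path(tmp) / "data" / "planned_deliveries.csv")
    # dtype=str keeps ID strings such as '001456' from being parsed as integers.
    reloaded = pd.read_csv(written, dtype=str)

print(list(reloaded.columns))
print(reloaded.values.tolist())
```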