# Teaching hospital information

In [21]:
import pandas as pd
import numpy as np

In [13]:
# load excel data from data/03
teaching = pd.read_excel(
    "../data/hospital_level_info/03_teaching_hospital_info.xlsx",
    index_col=0,
)

In [15]:
teaching.head()

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
7,10001,636004476,Southeast Health Medical Center,Houston County Healthcare Authority,1108 Ross Clark Circle,6987.0,Dothan,AL,36301,1108 Ross Clark Circle,,Dothan,AL,36301
8,10011,630578923,St. Vincents East,St. Vincent'S East,50 Medical Park Drive East,,Birmingham,AL,35235,50 Medical Park Dr E,"Building 46, Suite 300, Finance",Birmingham,AL,35235
9,10018,461468253,Callahan Eye Foundation Hosp,Uab Callahan Eye Hospital Authority,1720 University Boulevard,,Birmingham,AL,35233,1720 University Blvd,Suite 500,Birmingham,AL,35233
10,10023,203204949,Baptist Medical Center South,"The Health Care Authority For Baptist Health, ...",2105 East South Boulevard,11010.0,Montgomery,AL,36116,2105 E South Blvd,,Montgomery,AL,36116
11,10033,636005396,University Of Alabama Hospital,University Of Alabama At Birmingham,619 South 19th Street,,Birmingham,AL,35233,619 19th St S,,Birmingham,AL,35249


Description of the data:
- CCN: int, CMS Certification Number. CMS is the Centers for Medicare & Medicaid Services.
- TIN: int, taxpayer identification number
- Hospital Name: str, name of the hospital
- PECOS Legal Business Name: str, the hospital's legal business name as recorded in PECOS (CMS's Provider Enrollment, Chain, and Ownership System); this is the name registered with the IRS
- Street Address: str, address as it appears on the hospital cost report. This should be read as the address of the hospital's office. For a hospital that belongs to a chain, or whose office is not at the hospital's physical location, it may not match the hospital's business address.
- PO Box: int, PO box as it appears on the hospital cost report; missing for most rows
- City: str, city as it appears on the hospital cost report
- State: str, state as it appears on the hospital cost report
- Zip Code: int, zip code as it appears on the hospital cost report. Because it is stored as an int, leading zeros are dropped (for example, 06904 appears as 6904).
- Street Address 1/2, City.1, State.1, Zip Code.1: str, NPPES business address. NPPES, the National Plan and Provider Enumeration System, is maintained by CMS. I think this address better matches the actual location of a hospital: whenever the two addresses differ, this one matches the location shown on Google Maps.
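Since the zip codes are stored as integers, the leading zeros can be put back by converting to zero-padded strings. A minimal sketch on a small made-up frame (not the real file); the `zip_str` column name is only for illustration:

```python
import pandas as pd

# Toy data: 6904 and 680 stand in for zip codes that lost their leading zeros.
zips = pd.DataFrame({"Zip Code": [6904, 36301, 680]})

# Convert to string and left-pad with zeros to five characters.
zips["zip_str"] = zips["Zip Code"].astype(str).str.zfill(5)
print(zips)
```

Keeping the padded value in a separate string column leaves the original int column available for numeric checks such as `describe()`.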

In [55]:
# visualize describe teaching
teaching.describe()

Unnamed: 0,CCN,TIN,Zip Code,Zip Code.1
count,1344.0,1344.0,1344.0,1344.0
mean,256999.560268,495638500.0,47276.016369,47314.921875
std,148124.704453,268841400.0,29586.088855,29615.7686
min,10001.0,10211490.0,680.0,674.0
25%,110194.25,271398700.0,19567.25,19568.75
50%,260069.5,462036800.0,45407.5,45419.0
75%,371155.75,730619000.0,74109.75,74128.5
max,670034.0,990274000.0,99519.0,99508.0


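Zip codes appear to fall in the expected range. The small minimum values (680 and 674) are consistent with leading zeros having been dropped.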

In [52]:
# preprocess the missing rate of each column
percent_missing = teaching.isna().sum() / len(teaching)
missing_value_df = pd.DataFrame({'column_name': teaching.columns,
                                 'missing_rate': percent_missing})

In [53]:
# visualize missing_rate
missing_value_df

Unnamed: 0,column_name,missing_rate
CCN,CCN,0.0
TIN,TIN,0.0
Hospital Name,Hospital Name,0.0
PECOS Legal Business Name,PECOS Legal Business Name,0.0
Street Address,Street Address,0.0
PO Box,PO Box,0.890625
City,City,0.000744
State,State,0.0
Zip Code,Zip Code,0.0
Street Address 1,Street Address 1,0.0


Observation:
- CCN, TIN, Hospital Name and PECOS Legal Business Name do not have missing values.
- Zip Code and Street Address do not have missing values.
- PO Box is missing in about 89% of the rows.
- City has missing values, and the missing rate differs between the two addresses.
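The missing rate can also be computed in one step, since the mean of a boolean mask is the fraction of `True` values. A small sketch on made-up data (the `toy` frame is an assumption, not the real dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing city and three missing PO boxes out of four rows.
toy = pd.DataFrame({
    "City": ["Dothan", np.nan, "Mobile", "Kingman"],
    "PO Box": [6987.0, np.nan, np.nan, np.nan],
})

# isna() gives a boolean frame; mean() per column is the share of missing values.
missing_rate = toy.isna().mean().sort_values(ascending=False)
print(missing_rate)
```

Sorting in descending order puts the columns with the most missing values first.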

In [40]:
# visualize the rows that zip code != zip code.1.
teaching[teaching['Zip Code'] != teaching['Zip Code.1']].head()

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
11,10033,636005396,University Of Alabama Hospital,University Of Alabama At Birmingham,619 South 19th Street,,Birmingham,AL,35233,619 19th St S,,Birmingham,AL,35249
18,10104,203391873,Grandview Medical Center,Affinity Hospital Llc,800 Montclair Road,,Birmingham,AL,35213,3690 Grandview Pkwy,,Birmingham,AL,35243
19,10113,630288856,Mobile Infirmary Medical Center,Mobile Infirmary Association,Five Mobile Infirmary Circle,2144.0,Mobile,AL,36652,3 Mobile Infirmary Cir,,Mobile,AL,36607
26,20001,920016429,Providence Alaska Medical Center,Providence Health & Services Washington,3200 Providence Drive,196604.0,Anchorage,AK,99519,3200 Providence Dr,,Anchorage,AK,99508
37,30055,942916102,Kingman Regional Medical Center,Kingman Hospital Inc,3269 Stockton Hill Road,,Kingman,AZ,86401,3269 N Stockton Hill Rd,,Kingman,AZ,86409


In [44]:
# visualize the rows that city != city.1.
teaching[teaching['City'] != teaching['City.1']].head()

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
94,50139,941105628,Kfh - Downey,Kaiser Foundation Hospitals,9400 Rosecrans Avenue,,Bellflower,CA,90706,9333 Imperial Hwy,,Downey,CA,90242
171,50717,956000927,Rancho Los Amigos Natl.Rehab.Ctr.,County Of Los Angeles,7601 E. Imperial Highway,,Downey,CA,90242,7601 Imperial Hwy,Ssa Room 2208 Downey,,CA,90242
183,53302,951690977,Childrens Hospital Los Angeles,Children'S Hospital Los Angeles,4650 Sunset Boulevard,MAIL STOP #21 Los Angeles,,CA,90027,4650 W Sunset Blvd,,Los Angeles,CA,90027
192,60010,841262971,Poudre Valley Hospital,Poudre Valley Health Care Inc,1024 S. Lemay Avenue,,Ft. Collins,CO,80524,2127 E Harmony Rd Ste 100,,Fort Collins,CO,80528
218,70006,60646917,The Stamford Hospital,Stamford Hospital,One Hospital Plaza,9317,Stamford Ct 06904,CT,6904,30 Shelburne Rd,,Stamford,CT,6904


In [42]:
# visualize the rows that state != state.1.
teaching[teaching['State'] != teaching['State.1']]

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1


Observation: the state is the same in both addresses for every hospital.

In [48]:
# visualize the rows whose city is missing
teaching[teaching['City'].isna()]

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
183,53302,951690977,Childrens Hospital Los Angeles,Children'S Hospital Los Angeles,4650 Sunset Boulevard,MAIL STOP #21 Los Angeles,,CA,90027,4650 W Sunset Blvd,,Los Angeles,CA,90027


Observation: There is only one row where City is missing. In that row the city name ended up in the PO Box column.
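One way to catch this kind of column shift is to look for PO Box entries that are present but not numeric. A hedged sketch on a hand-made series that mimics the values seen above (the variable names are illustrative):

```python
import pandas as pd

# Toy PO Box values: numbers, a missing entry, and one shifted text entry.
po_box = pd.Series([6987.0, None, "MAIL STOP #21 Los Angeles", 9317])

# Non-numeric strings become NaN when coerced to numbers.
as_number = pd.to_numeric(po_box, errors="coerce")

# Suspicious rows: a value is present, but it is not a number.
suspicious = po_box[po_box.notna() & as_number.isna()]
print(suspicious)
```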

In [49]:
# visualize the rows whose city.1 is missing
teaching[teaching['City.1'].isna()]

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
171,50717,956000927,Rancho Los Amigos Natl.Rehab.Ctr.,County Of Los Angeles,7601 E. Imperial Highway,,Downey,CA,90242,7601 Imperial Hwy,Ssa Room 2208 Downey,,CA,90242
760,310074,222783298,Jersey City Medical Center,Jersey City Medical Center,355 Grand Street,,Jersey City,NJ,7302,355 Grand St,Executive Office Jersey City,,NJ,7302
1198,450133,751584559,Midland Memorial Hospital,Midland County Hospital District,400 Rosalind Redfern Grover Parkway,,Midland,TX,79701,2200 W Illinois Ave,Business Office Midland,,TX,79701
1335,520098,391835630,University Of Wi Hospitals & Clinics,University Of Wisconsin Hospitals And Clinics ...,600 Highland Ave,,Madison,WI,53792,600 Highland Ave,Compliance Mail Code 2433 Madison,,WI,53792


Observation: There are only four rows where City.1 is missing. In each of them the city name appears at the end of Street Address 2.
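A possible repair, sketched on toy rows copied from the output above: when City.1 is missing and Street Address 2 ends with the cost-report City, move that trailing city name into City.1. The helper `split_trailing_city` is hypothetical, and the approach assumes both addresses are in the same city, which holds for these four rows but not in general.

```python
import pandas as pd

toy = pd.DataFrame({
    "City": ["Downey", "Midland", "Dothan"],
    "Street Address 2": ["Ssa Room 2208 Downey", "Business Office Midland", None],
    "City.1": [None, None, "Dothan"],
})

def split_trailing_city(row):
    """Move a trailing city name from Street Address 2 into City.1."""
    row = row.copy()
    addr2, city = row["Street Address 2"], row["City"]
    if (pd.isna(row["City.1"]) and isinstance(addr2, str)
            and isinstance(city, str) and addr2.endswith(city)):
        row["Street Address 2"] = addr2[: -len(city)].strip()
        row["City.1"] = city
    return row

fixed = toy.apply(split_trailing_city, axis=1)
print(fixed)
```

Rows that already have a City.1 value are returned unchanged.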

In [64]:
# visualize find hospitals with the same name
teaching[teaching.duplicated(subset='Hospital Name', keep=False)].sort_values('Hospital Name').head()

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
249,100002,592447554,Bethesda Hospital,Bethesda Hospital Inc.,2815 S. Seacrest Blvd,,Boynton Beach,FL,33435,2815 S Seacrest Blvd,,Boynton Beach,FL,33435
981,360179,310537122,Bethesda Hospital,Bethesda Hospital Inc.,619 Oak Street,,Cincinnati,OH,45206,10500 Montgomery Rd,,Cincinnati,OH,45242
518,193300,720467503,Childrens Hospital,Childrens Hospital Inc,200 Henry Clay Avenue,,New Orleans,LA,70118,200 Henry Clay Ave,,New Orleans,LA,70118
246,93300,530196580,Childrens Hospital,Childrens Hospital,111 Michigan Ave Nw,,Washington,DC,20010,111 Michigan Ave Nw,,Washington,DC,20010
996,363303,340714357,Childrens Hospital Medical Center,Children'S Hospital Medical Center Of Akron,1 Perkins Square,,Akron,OH,44308,1 Perkins Sq,,Akron,OH,44308


In [68]:
# visualize show hospitals with the same TIN
teaching[teaching.duplicated(subset='TIN', keep=False)].sort_values('TIN').head()

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
520,200019,10238552,Southern Maine Health Care,Mainehealth,1 Medical Center Drive,626.0,Biddeford,ME,4005,One Medical Center Drive,,Biddeford,ME,4005
519,200009,10238552,Maine Medical Center,Mainehealth,22 Bramhall Street,,Portland,ME,4102,335 Brighton Ave,Suite 200,Portland,ME,4102
233,70036,66000798,John Dempsey Hospital,State Of Connecticut,263 Farmington Avenue,,Farmington,CT,6030,263 Farmington Avenue,,Farmington,CT,6030
235,74011,66000798,Connecticut Mental Health Center,State Of Connecticut,34 Park Street,351.0,New Haven,CT,6519,34 Park Street,,New Haven,CT,6790
803,330080,132655001,Lincoln Medical&mental Health Center,New York City Health And Hospitals Corporation,234 East 149th Street,,Bronx,NY,10451,234 E 149th St Rm 2a1,,Bronx,NY,10451


In [66]:
# visualize show hospitals with the same CCN
teaching[teaching.duplicated(subset='CCN', keep=False)].sort_values('CCN').head()

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1


Observation: CCN is different for each row.
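The same uniqueness check can be stated directly with `Series.is_unique`, which avoids reading an empty table. A small sketch with values taken from the rows shown earlier (the `ids` frame itself is made up):

```python
import pandas as pd

# Three distinct CCNs; two of the rows share a TIN.
ids = pd.DataFrame({
    "CCN": [100009, 100079, 10001],
    "TIN": [592616017, 592616017, 636004476],
})

ccn_unique = ids["CCN"].is_unique  # True: every CCN appears once
tin_unique = ids["TIN"].is_unique  # False: one TIN is repeated
print(ccn_unique, tin_unique)
```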

In [74]:
# visualize show hospitals with the same NPPES street address
teaching[teaching.duplicated(subset=['Street Address 1', 'Street Address 2'], keep=False)].sort_values(['Street Address 1', 'Street Address 2'])

Unnamed: 0_level_0,CCN,TIN,Hospital Name,PECOS Legal Business Name,Street Address,PO Box,City,State,Zip Code,Street Address 1,Street Address 2,City.1,State.1,Zip Code.1
ROW,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
970,360137,341567805,Uh Cleveland Medical Center,University Hospitals Cleveland Medical Center,11100 Euclid Avenue,,Cleveland,OH,44106,11100 Euclid Ave,,Cleveland,OH,44106
995,363302,341567805,Rainbow Babies & Childrens Hospital,University Hospitals Cleveland Medical Center,11100 Euclid Avenue,,Cleveland,OH,44106,11100 Euclid Ave,,Cleveland,OH,44106
401,140180,362235165,Presence Saints Mary & Elizabeth Med,Presence Chicago Hospitals Network,2233 West Division Street,,Chicago,IL,60622,1127 N Oakley Blvd,4th Floor,Chicago,IL,60622
409,140224,362235165,Presence Saint Joseph Hosp-Chicago,Presence Chicago Hospitals Network,2900 North Lake Shore Drive,,Chicago,IL,60657,1127 N Oakley Blvd,4th Floor,Chicago,IL,60622
252,100009,592616017,University Of Miami Hospital,University Of Miami,1400 Nw 12th Avenue,,Miami,FL,33136,1475 Nw 12th Ave,,Miami,FL,33136
269,100079,592616017,University Of Miami Hosp & Clinics,University Of Miami,1475 N.W. 12th Avenue,,Miami,FL,33136,1475 Nw 12th Ave,,Miami,FL,33136
299,100240,592616017,Anne Bates Leach Eye Hospital,University Of Miami,900 Nw 17th Street,,Miami,FL,33136,1475 Nw 12th Ave,,Miami,FL,33136
116,50277,330730569,Pacific Hospital Of Long Beach,Healthsmart Pacific Inc,2776 Pacific Avenue,,Long Beach,CA,90801,2776 Pacific Ave,,Long Beach,CA,90806
179,50776,462760935,College Medical Center,Chlb Llc,2776 Pacific Ave,,Long Beach,CA,90801,2776 Pacific Ave,,Long Beach,CA,90806
143,50485,953527031,Long Beach Memorial Medical Center,Long Beach Memorial Medical,2801 Atlantic Avenue,,Long Beach,CA,90806,2801 Atlantic Ave,,Long Beach,CA,90806


Observation: 19 hospitals share their NPPES business address with at least one other hospital. Looking at the results, these groups often combine a specialty hospital (for example a children's or eye hospital) with a general hospital.
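To see which hospitals sit together at each shared address, the rows can be grouped by the address columns. A sketch on a few rows copied from the output above; missing Street Address 2 values are filled with an empty string first so that they do not drop out of the grouping:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Hospital Name": [
        "Uh Cleveland Medical Center",
        "Rainbow Babies & Childrens Hospital",
        "Presence Saints Mary & Elizabeth Med",
        "Presence Saint Joseph Hosp-Chicago",
        "University Of Alabama Hospital",
    ],
    "Street Address 1": [
        "11100 Euclid Ave", "11100 Euclid Ave",
        "1127 N Oakley Blvd", "1127 N Oakley Blvd",
        "619 19th St S",
    ],
    "Street Address 2": [np.nan, np.nan, "4th Floor", "4th Floor", np.nan],
})

keys = ["Street Address 1", "Street Address 2"]
filled = toy.fillna({"Street Address 2": ""})

# Collect hospital names per address, then keep addresses with more than one.
names_by_address = filled.groupby(keys)["Hospital Name"].agg(list)
shared = names_by_address[names_by_address.map(len) > 1]
print(shared)
```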

In [75]:
# preprocess find the hospitals with the word "children" in the name
# (string methods live under the .str accessor; case=False also matches "Childrens")
children_hospital = teaching[teaching['Hospital Name'].str.contains('children', case=False)]