### About this file
This dataset describes every hospital registered with Medicare in the United States, including its address, phone number, hospital type, ownership, and quality ratings.

* Hospital Name: The name of the hospital. (String)
* Address: The address of the hospital. (String)
* City: The city in which the hospital is located. (String)
* State: The state in which the hospital is located. (String)
* ZIP Code: The ZIP code of the hospital. (Integer)
* County Name: The county in which the hospital is located. (String)
* Phone Number: The phone number of the hospital. (String)
* Hospital Type: The type of hospital. (String)
* Hospital Ownership: The ownership of the hospital. (String)
* Emergency Services: Whether or not the hospital has emergency services. (Boolean)
* Meets criteria for meaningful use of EHRs: Whether or not the hospital meets the criteria for meaningful use of electronic health records. (Boolean)
* Hospital overall rating: The hospital's overall rating. (Float)
* Hospital overall rating footnote: A footnote for the hospital's overall rating. (String)
* Mortality national comparison: The hospital's mortality national comparison. (String)
* Mortality national comparison footnote: A footnote for the hospital's mortality national comparison. (String)
* Safety of care national comparison: The hospital's safety of care national comparison. (String)
* Safety of care national comparison footnote: A footnote for the hospital's safety of care national comparison. (String)
* Readmission national comparison: The hospital's readmission national comparison. (String)
* Readmission national comparison footnote: A footnote for the hospital's readmission national comparison. (String)
* Patient experience national comparison: The hospital's patient experience national comparison. (String)
* Patient experience national comparison footnote: A footnote for the hospital's patient experience national comparison. (String)
* Effectiveness of care national comparison: The hospital's effectiveness of care national comparison. (String)
* Effectiveness of care national comparison footnote: A footnote for the hospital's effectiveness of care national comparison. (String)
* Timeliness of care national comparison: The hospital's timeliness of care national comparison. (String)
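
One caveat on the types listed above: a ZIP code read as an integer loses any leading zero, and ZIP codes in the northeastern United States start with 0. A minimal sketch of reading the column as a string instead, assuming a small in-memory CSV with made-up rows rather than the real file:

```python
import io
import pandas as pd

# Hypothetical two-row sample; the hospital names are invented for illustration.
sample_csv = io.StringIO(
    "Hospital Name,State,ZIP Code\n"
    "EXAMPLE HOSPITAL A,MA,02115\n"
    "EXAMPLE HOSPITAL B,AL,35957\n"
)

# dtype={"ZIP Code": str} keeps the leading zero that integer parsing would drop.
sample = pd.read_csv(sample_csv, dtype={"ZIP Code": str})
print(sample["ZIP Code"].tolist())
```

The same `dtype` argument could be passed when reading `Hospital_General_Information.csv`, if ZIP codes are later used as join keys or labels.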

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
hospital = pd.read_csv("Hospital_General_Information.csv")
loc = pd.read_csv("hospital_locations.csv")

In [3]:
hospital.head(3)

Unnamed: 0,index,Provider ID,Hospital Name,Address,City,State,ZIP Code,County Name,Phone Number,Hospital Type,...,Readmission national comparison footnote,Patient experience national comparison,Patient experience national comparison footnote,Effectiveness of care national comparison,Effectiveness of care national comparison footnote,Timeliness of care national comparison,Timeliness of care national comparison footnote,Efficient use of medical imaging national comparison,Efficient use of medical imaging national comparison footnote,Location
0,0,10005,MARSHALL MEDICAL CENTER SOUTH,2505 U S HIGHWAY 431 NORTH,BOAZ,AL,35957,MARSHALL,2565938310,Acute Care Hospitals,...,,Same as the National average,,Same as the National average,,Above the National average,,Below the National average,,"2505 U S HIGHWAY 431 NORTH\nBOAZ, AL 35957\n"
1,1,10032,WEDOWEE HOSPITAL,209 NORTH MAIN STREET,WEDOWEE,AL,36278,RANDOLPH,2563572111,Acute Care Hospitals,...,,Not Available,Results are not available for this reporting p...,Same as the National average,,Same as the National average,,Not Available,Results are not available for this reporting p...,"209 NORTH MAIN STREET\nWEDOWEE, AL 36278\n"
2,2,10131,CRESTWOOD MEDICAL CENTER,ONE HOSPITAL DR SE,HUNTSVILLE,AL,35801,MADISON,2568823100,Acute Care Hospitals,...,,Same as the National average,,Same as the National average,,Same as the National average,,Same as the National average,,"ONE HOSPITAL DR SE\nHUNTSVILLE, AL 35801\n"


In [4]:
loc.head()

Unnamed: 0,index,X,Y,OBJECTID,ID,NAME,ADDRESS,CITY,STATE,ZIP,...,VAL_DATE,WEBSITE,STATE_ID,ALT_NAME,ST_FIPS,OWNER,TTL_STAFF,BEDS,TRAUMA,HELIPAD
0,0,-13226510.0,4049626.0,2,53391362,LOS ROBLES HOSPITAL & MEDICAL CENTER - EAST CA...,150 VIA MERIDA,WESTLAKE VILAGE,CA,91362,...,2014/02/10 00:00:00+00,http://www.losrobleshospital.com,NOT AVAILABLE,NOT AVAILABLE,6,PROPRIETARY,-999,40,NOT AVAILABLE,N
1,1,-13156200.0,4031978.0,3,11190023,EAST LOS ANGELES DOCTORS HOSPITAL,4060 WHITTIER BOULEVARD,LOS ANGELES,CA,90023,...,2014/02/10 00:00:00+00,http://www.elalax.com,NOT AVAILABLE,NOT AVAILABLE,6,PROPRIETARY,-999,127,NOT AVAILABLE,N
2,2,-13171900.0,4041752.0,4,17090028,SOUTHERN CALIFORNIA HOSPITAL AT HOLLYWOOD,6245 DE LONGPRE AVENUE,HOLLYWOOD,CA,90028,...,2014/02/10 00:00:00+00,http://sch-hollywood.com/,NOT AVAILABLE,HOLLYWOOD COMMUNITY HOSPITAL OF HOLLYWOOD,6,PROPRIETARY,-999,100,NOT AVAILABLE,N
3,3,-13132080.0,4037270.0,5,23691706,KINDRED HOSPITAL BALDWIN PARK,14148 FRANCISQUITO AVENUE,BALDWIN PARK,CA,91706,...,2014/02/10 00:00:00+00,http://www.khbaldwinpark.com,NOT AVAILABLE,NOT AVAILABLE,6,PROPRIETARY,-999,95,NOT AVAILABLE,N
4,4,-13152220.0,4009980.0,6,25190712,LAKEWOOD REGIONAL MEDICAL CENTER,3700 EAST SOUTH STREET,LAKEWOOD,CA,90712,...,2014/02/10 00:00:00+00,http://www.lakewoodregional.com,NOT AVAILABLE,NOT AVAILABLE,6,PROPRIETARY,-999,172,NOT AVAILABLE,N


In [5]:
hospital.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4818 entries, 0 to 4817
Data columns (total 30 columns):
 #   Column                                                         Non-Null Count  Dtype  
---  ------                                                         --------------  -----  
 0   index                                                          4818 non-null   int64  
 1   Provider ID                                                    4818 non-null   int64  
 2   Hospital Name                                                  4818 non-null   object 
 3   Address                                                        4818 non-null   object 
 4   City                                                           4818 non-null   object 
 5   State                                                          4818 non-null   object 
 6   ZIP Code                                                       4818 non-null   int64  
 7   County Name                                                 

In [6]:
df = hospital.copy()
df.drop(["index", "Provider ID", "Address", "Phone Number", "Meets criteria for meaningful use of EHRs", 
         "Hospital overall rating footnote", "Mortality national comparison footnote", "Safety of care national comparison footnote",
         "Readmission national comparison footnote", "Patient experience national comparison footnote", 
         "Effectiveness of care national comparison footnote", "Timeliness of care national comparison footnote",
         "Efficient use of medical imaging national comparison footnote", "Location"],
         axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4818 entries, 0 to 4817
Data columns (total 16 columns):
 #   Column                                                Non-Null Count  Dtype  
---  ------                                                --------------  -----  
 0   Hospital Name                                         4818 non-null   object 
 1   City                                                  4818 non-null   object 
 2   State                                                 4818 non-null   object 
 3   ZIP Code                                              4818 non-null   int64  
 4   County Name                                           4803 non-null   object 
 5   Hospital Type                                         4818 non-null   object 
 6   Hospital Ownership                                    4818 non-null   object 
 7   Emergency Services                                    4818 non-null   bool   
 8   Hospital overall rating                               3648

In [7]:
loc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7623 entries, 0 to 7622
Data columns (total 35 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   index       7623 non-null   int64  
 1   X           7623 non-null   float64
 2   Y           7623 non-null   float64
 3   OBJECTID    7623 non-null   int64  
 4   ID          7623 non-null   int64  
 5   NAME        7623 non-null   object 
 6   ADDRESS     7623 non-null   object 
 7   CITY        7623 non-null   object 
 8   STATE       7623 non-null   object 
 9   ZIP         7623 non-null   int64  
 10  ZIP4        7623 non-null   object 
 11  TELEPHONE   7623 non-null   object 
 12  TYPE        7623 non-null   object 
 13  STATUS      7623 non-null   object 
 14  POPULATION  7623 non-null   int64  
 15  COUNTY      7623 non-null   object 
 16  COUNTYFIPS  7623 non-null   object 
 17  COUNTRY     7623 non-null   object 
 18  LATITUDE    7623 non-null   float64
 19  LONGITUDE   7623 non-null  

In [8]:
loc.drop(["index", "X", "Y", "OBJECTID", "ID", "ADDRESS", "TELEPHONE", "COUNTYFIPS", "COUNTRY", "LATITUDE", "LONGITUDE", "SOURCE", 
         "SOURCEDATE", "VAL_METHOD", "VAL_DATE","WEBSITE", "STATE_ID", "ALT_NAME"], axis=1, inplace=True)
loc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7623 entries, 0 to 7622
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   NAME        7623 non-null   object
 1   CITY        7623 non-null   object
 2   STATE       7623 non-null   object
 3   ZIP         7623 non-null   int64 
 4   ZIP4        7623 non-null   object
 5   TYPE        7623 non-null   object
 6   STATUS      7623 non-null   object
 7   POPULATION  7623 non-null   int64 
 8   COUNTY      7623 non-null   object
 9   NAICS_CODE  7623 non-null   int64 
 10  NAICS_DESC  7623 non-null   object
 11  ST_FIPS     7623 non-null   int64 
 12  OWNER       7623 non-null   object
 13  TTL_STAFF   7623 non-null   int64 
 14  BEDS        7623 non-null   int64 
 15  TRAUMA      7623 non-null   object
 16  HELIPAD     7623 non-null   object
dtypes: int64(6), object(11)
memory usage: 1012.6+ KB


In [9]:
df.head(3)

Unnamed: 0,Hospital Name,City,State,ZIP Code,County Name,Hospital Type,Hospital Ownership,Emergency Services,Hospital overall rating,Mortality national comparison,Safety of care national comparison,Readmission national comparison,Patient experience national comparison,Effectiveness of care national comparison,Timeliness of care national comparison,Efficient use of medical imaging national comparison
0,MARSHALL MEDICAL CENTER SOUTH,BOAZ,AL,35957,MARSHALL,Acute Care Hospitals,Government - Hospital District or Authority,True,3.0,Below the National average,Same as the National average,Above the National average,Same as the National average,Same as the National average,Above the National average,Below the National average
1,WEDOWEE HOSPITAL,WEDOWEE,AL,36278,RANDOLPH,Acute Care Hospitals,Government - Hospital District or Authority,True,4.0,Same as the National average,Not Available,Same as the National average,Not Available,Same as the National average,Same as the National average,Not Available
2,CRESTWOOD MEDICAL CENTER,HUNTSVILLE,AL,35801,MADISON,Acute Care Hospitals,Proprietary,True,3.0,Below the National average,Above the National average,Same as the National average,Same as the National average,Same as the National average,Same as the National average,Same as the National average


In [10]:
loc.head(3)

Unnamed: 0,NAME,CITY,STATE,ZIP,ZIP4,TYPE,STATUS,POPULATION,COUNTY,NAICS_CODE,NAICS_DESC,ST_FIPS,OWNER,TTL_STAFF,BEDS,TRAUMA,HELIPAD
0,LOS ROBLES HOSPITAL & MEDICAL CENTER - EAST CA...,WESTLAKE VILAGE,CA,91362,NOT AVAILABLE,GENERAL ACUTE CARE,OPEN,40,VENTURA,622110,GENERAL MEDICAL AND SURGICAL HOSPITALS,6,PROPRIETARY,-999,40,NOT AVAILABLE,N
1,EAST LOS ANGELES DOCTORS HOSPITAL,LOS ANGELES,CA,90023,NOT AVAILABLE,GENERAL ACUTE CARE,OPEN,127,LOS ANGELES,622110,GENERAL MEDICAL AND SURGICAL HOSPITALS,6,PROPRIETARY,-999,127,NOT AVAILABLE,N
2,SOUTHERN CALIFORNIA HOSPITAL AT HOLLYWOOD,HOLLYWOOD,CA,90028,NOT AVAILABLE,GENERAL ACUTE CARE,OPEN,100,LOS ANGELES,622110,GENERAL MEDICAL AND SURGICAL HOSPITALS,6,PROPRIETARY,-999,100,NOT AVAILABLE,N
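
Although `loc.info()` reports no nulls, the rows above contain placeholder values such as `-999` in `TTL_STAFF` and `NOT AVAILABLE` in `ZIP4` and `TRAUMA`. A minimal sketch of turning them into real missing values, assuming both placeholders mean "missing" and using a hypothetical miniature table in place of `loc`:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the location table, reusing the placeholders seen above.
loc_sample = pd.DataFrame({
    "NAME": ["HOSPITAL A", "HOSPITAL B", "HOSPITAL C"],
    "ZIP4": ["NOT AVAILABLE", "1234", "NOT AVAILABLE"],
    "TTL_STAFF": [-999, 250, -999],
    "BEDS": [40, 127, 100],
})

# Assumption: -999 and "NOT AVAILABLE" both stand for a missing value.
placeholders = [-999, "NOT AVAILABLE"]
loc_clean = loc_sample.mask(loc_sample.isin(placeholders))

# The placeholders are now counted as nulls.
print(loc_clean.isna().sum())
```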


In [12]:
# Checking for nulls
df.isnull().sum().sort_values(ascending=False)

Hospital overall rating                                 1170
County Name                                               15
Hospital Name                                              0
City                                                       0
State                                                      0
ZIP Code                                                   0
Hospital Type                                              0
Hospital Ownership                                         0
Emergency Services                                         0
Mortality national comparison                              0
Safety of care national comparison                         0
Readmission national comparison                            0
Patient experience national comparison                     0
Effectiveness of care national comparison                  0
Timeliness of care national comparison                     0
Efficient use of medical imaging national comparison       0
dtype: int64
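
Raw null counts are easier to judge as a share of the rows. A minimal sketch, using a hypothetical helper `missing_summary` and a toy frame standing in for `df`:

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the null count and null percentage per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (100 * counts / len(frame)).round(2),
    })
    return summary.sort_values("n_missing", ascending=False)

# Toy data for illustration only.
toy = pd.DataFrame({
    "Hospital overall rating": [3.0, np.nan, 4.0, np.nan],
    "County Name": ["MARSHALL", None, "MADISON", "BETHEL"],
    "State": ["AL", "AL", "AL", "AK"],
})
summary = missing_summary(toy)
print(summary)
```

On the real data, 1170 of 4818 ratings are missing, which is roughly 24.3% of the rows.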

In [11]:
# Number of distinct values per column (NaN is counted as a value)
for col in df.columns:
    print("{}: {}".format(col, df[col].nunique(dropna=False)))

Hospital Name: 4617
City: 2949
State: 56
ZIP Code: 4419
County Name: 1566
Hospital Type: 3
Hospital Ownership: 10
Emergency Services: 2
Hospital overall rating: 6
Mortality national comparison: 4
Safety of care national comparison: 4
Readmission national comparison: 4
Patient experience national comparison: 4
Effectiveness of care national comparison: 4
Timeliness of care national comparison: 4
Efficient use of medical imaging national comparison: 4


In [13]:
# Replace spaces in column names with underscores
df.columns = df.columns.str.replace(' ','_')
df.head()

Unnamed: 0,Hospital_Name,City,State,ZIP_Code,County_Name,Hospital_Type,Hospital_Ownership,Emergency_Services,Hospital_overall_rating,Mortality_national_comparison,Safety_of_care_national_comparison,Readmission_national_comparison,Patient_experience_national_comparison,Effectiveness_of_care_national_comparison,Timeliness_of_care_national_comparison,Efficient_use_of_medical_imaging_national_comparison
0,MARSHALL MEDICAL CENTER SOUTH,BOAZ,AL,35957,MARSHALL,Acute Care Hospitals,Government - Hospital District or Authority,True,3.0,Below the National average,Same as the National average,Above the National average,Same as the National average,Same as the National average,Above the National average,Below the National average
1,WEDOWEE HOSPITAL,WEDOWEE,AL,36278,RANDOLPH,Acute Care Hospitals,Government - Hospital District or Authority,True,4.0,Same as the National average,Not Available,Same as the National average,Not Available,Same as the National average,Same as the National average,Not Available
2,CRESTWOOD MEDICAL CENTER,HUNTSVILLE,AL,35801,MADISON,Acute Care Hospitals,Proprietary,True,3.0,Below the National average,Above the National average,Same as the National average,Same as the National average,Same as the National average,Same as the National average,Same as the National average
3,PROVIDENCE ALASKA MEDICAL CENTER,ANCHORAGE,AK,99508,ANCHORAGE,Acute Care Hospitals,Voluntary non-profit - Church,True,3.0,Same as the National average,Below the National average,Above the National average,Below the National average,Below the National average,Below the National average,Same as the National average
4,YUKON KUSKOKWIM DELTA REG HOSPITAL,BETHEL,AK,99559,BETHEL,Acute Care Hospitals,Tribal,True,3.0,Same as the National average,Not Available,Same as the National average,Below the National average,Below the National average,Not Available,Not Available


In [14]:
df_median = df.copy()

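# Assign the result back: fillna(inplace=True) on a selected column may not update df_median in newer pandas
rating_median = df_median['Hospital_overall_rating'].median()
df_median['Hospital_overall_rating'] = df_median['Hospital_overall_rating'].fillna(rating_median)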

In [15]:
df_median["Hospital_overall_rating"].value_counts()

3.0    2942
4.0     964
2.0     684
1.0     117
5.0     111
Name: Hospital_overall_rating, dtype: int64

In [16]:
df["Hospital_overall_rating"].value_counts()

3.0    1772
4.0     964
2.0     684
1.0     117
5.0     111
Name: Hospital_overall_rating, dtype: int64
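
Comparing the two outputs, median filling moves all 1170 missing ratings into the 3.0 bin (1772 + 1170 = 2942) and leaves the other bins unchanged. A minimal sketch that rebuilds the rating column from the counts printed above, rather than from the CSV, and checks this:

```python
import numpy as np
import pandas as pd

# Observed rating counts from the unfilled column, plus the 1170 missing ratings.
observed_counts = {3.0: 1772, 4.0: 964, 2.0: 684, 1.0: 117, 5.0: 111}
values = np.repeat(list(observed_counts.keys()), list(observed_counts.values()))
ratings = pd.Series(np.concatenate([values, np.full(1170, np.nan)]))

median = ratings.median()        # NaNs are skipped by default
filled = ratings.fillna(median)

print(median)
print(filled.value_counts())
print(ratings.std(), filled.std())  # spread before and after filling
```

Because every filled value sits at the center of the distribution, the filled column has a smaller standard deviation than the observed ratings, which is worth keeping in mind before using it in a model.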

In [17]:
df_linear = df.copy()

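# Interpolate only the numeric rating column; the text columns cannot be interpolated linearly
df_linear["Hospital_overall_rating"] = df_linear["Hospital_overall_rating"].interpolate(method="linear")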

In [18]:
# Checking for nulls
df_linear.isnull().sum().sort_values(ascending=False)

County_Name                                             15
Hospital_Name                                            0
City                                                     0
State                                                    0
ZIP_Code                                                 0
Hospital_Type                                            0
Hospital_Ownership                                       0
Emergency_Services                                       0
Hospital_overall_rating                                  0
Mortality_national_comparison                            0
Safety_of_care_national_comparison                       0
Readmission_national_comparison                          0
Patient_experience_national_comparison                   0
Effectiveness_of_care_national_comparison                0
Timeliness_of_care_national_comparison                   0
Efficient_use_of_medical_imaging_national_comparison     0
dtype: int64

In [19]:
df_linear["Hospital_overall_rating"].value_counts()

3.000000    2101
4.000000    1071
2.000000     789
3.500000     186
2.500000     135
1.000000     120
5.000000     113
3.333333      50
3.666667      47
2.666667      34
2.333333      32
3.250000      15
3.750000      14
4.500000      10
3.800000       8
3.200000       8
3.600000       8
3.400000       7
2.250000       5
2.750000       5
1.500000       5
2.666667       4
3.333333       4
1.666667       3
1.666667       3
2.600000       3
2.200000       3
4.333333       3
3.166667       2
2.400000       2
3.833333       2
2.800000       2
1.333333       2
1.750000       2
4.666667       1
1.333333       1
1.400000       1
1.800000       1
2.333333       1
2.166667       1
2.833333       1
2.545455       1
3.666667       1
2.636364       1
1.250000       1
2.090909       1
2.181818       1
2.272727       1
2.363636       1
2.454545       1
2.909091       1
2.818182       1
2.727273       1
3.400000       1
Name: Hospital_overall_rating, dtype: int64
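
The fractional ratings above (3.5, 3.333333, and so on) appear because linear interpolation fills each gap on a straight line between the nearest rated rows above and below it in file order. A minimal sketch on a toy series, not the real column:

```python
import numpy as np
import pandas as pd

# Toy ratings in file order; NaN marks a hospital without a rating.
toy = pd.Series([3.0, np.nan, 4.0, np.nan, np.nan, 1.0])

# One gap between 3 and 4 gives 3.5; two gaps between 4 and 1 give 3 and 2.
interpolated = toy.interpolate(method="linear")
print(interpolated.tolist())

# Rounding is one way to bring the values back onto the whole-star scale.
print(interpolated.round().tolist())
```

Note that the filled values depend entirely on row order. Adjacent rows in this file are not necessarily similar hospitals, so this method is probably better suited to time series than to this table.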

In [21]:
from sklearn.impute import KNNImputer

df_knn = df.copy(deep=True)
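# Only the rating column is passed, so the imputer has no other feature to measure neighbor distance on
knn_imputer = KNNImputer(n_neighbors=2, weights="uniform")
df_knn['Hospital_overall_rating'] = knn_imputer.fit_transform(df_knn[['Hospital_overall_rating']]).ravel()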

In [22]:
df_knn["Hospital_overall_rating"].value_counts()

3.000000    1772
3.073465    1170
4.000000     964
2.000000     684
1.000000     117
5.000000     111
Name: Hospital_overall_rating, dtype: int64
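
All 1170 imputed rows received the same value, 3.073465, which matches the mean of the observed ratings (11212 / 3648 ≈ 3.0735). With a single feature, a row whose rating is missing has no defined distance to any other row, so `KNNImputer` falls back to the column mean. A minimal sketch on a toy array illustrating that behavior:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy single-feature input; NaN marks a missing rating.
X = np.array([[3.0], [np.nan], [4.0], [2.0], [np.nan], [5.0]])

imputed = KNNImputer(n_neighbors=2, weights="uniform").fit_transform(X)

# With no other feature to compute distances on, missing rows get the column mean.
print(imputed.ravel())
print(np.nanmean(X))
```

To get neighbor-based values, the imputer would need additional numeric features, for example encoded versions of the national comparison columns. That is a possible next step rather than something this notebook does.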