In [None]:
import numpy as np
import pandas as pd


We can use the pd.read_csv function to load the csv file into a pandas dataframe.

In [None]:
df=pd.read_csv("hospital-data.csv")
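
As a small sketch of what pd.read_csv does, we can feed it a csv held in memory instead of a file on disk. The two rows and the hospital names below are made up for illustration and are not taken from hospital-data.csv.

```python
import io
import pandas as pd

# Hypothetical two-row csv standing in for hospital-data.csv.
csv_text = """Provider Number,Hospital Name,State
10001,EXAMPLE HOSPITAL A,AL
10005,EXAMPLE HOSPITAL B,AL
"""

# read_csv accepts a file path or any file-like object.
demo = pd.read_csv(io.StringIO(csv_text))

print(type(demo).__name__)  # DataFrame
print(demo.shape)           # (2, 3)
```

The first line of the csv becomes the column names, and each remaining line becomes one row.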

df.shape gives us the number of rows and columns in the dataframe.

In [None]:
df.shape

(4826, 13)

df.size gives the total number of cells in the dataframe. It equals the product of the number of rows and the number of columns.

In [None]:
df.size


62738
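
For this dataset that is 4826 × 13 = 62738. The same relation can be checked on any dataframe; the tiny frame below is a made-up example.

```python
import pandas as pd

# Made-up frame with 3 rows and 2 columns.
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = demo.shape
print(rows, cols, demo.size)  # 3 2 6

# size is always rows times columns.
assert demo.size == rows * cols
```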

df.columns gives all column names.

In [None]:
df.columns

Index(['Provider Number', 'Hospital Name', 'Address 1', 'Address 2',
       'Address 3', 'City', 'State', 'ZIP Code', 'County', 'Phone Number',
       'Hospital Type', 'Hospital Ownership', 'Emergency Services'],
      dtype='object')

df.City selects the column named "City", and df.City.value_counts() counts how many times each distinct value appears in that column, sorted from most to least frequent.

In [None]:
df.City.value_counts()


CHICAGO         27
HOUSTON         26
LOS ANGELES     21
DALLAS          19
PHILADELPHIA    17
                ..
BOGALUSA         1
THIBODAUX        1
NATCHITOCHES     1
MORGAN CITY      1
ADDISON          1
Name: City, Length: 3018, dtype: int64
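
The "Length" in the output is the number of distinct cities. Here is a minimal sketch of value_counts on a made-up list of cities, including the normalize=True option, which returns shares instead of raw counts.

```python
import pandas as pd

# Made-up city values for illustration.
cities = pd.Series(
    ["CHICAGO", "HOUSTON", "CHICAGO", "DALLAS", "CHICAGO", "HOUSTON"],
    name="City",
)

counts = cities.value_counts()                # occurrences of each distinct value
shares = cities.value_counts(normalize=True)  # same, as fractions of the total

print(counts["CHICAGO"])  # 3
print(shares["CHICAGO"])  # 0.5
print(len(counts))        # 3 distinct cities
```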

df.dtypes gives the data type of each column.

In [None]:
df.dtypes

Provider Number        object
Hospital Name          object
Address 1              object
Address 2             float64
Address 3             float64
City                   object
State                  object
ZIP Code                int64
County                 object
Phone Number            int64
Hospital Type          object
Hospital Ownership     object
Emergency Services     object
dtype: object
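
Note that "ZIP Code" was read as int64. One consequence is that a ZIP code with a leading zero would lose it. The sketch below uses a made-up one-row csv to show this, and shows one possible way around it: the dtype argument of pd.read_csv. We did not use that argument when loading hospital-data.csv above.

```python
import io
import pandas as pd

# Made-up row with a ZIP code that starts with zero.
csv_text = "Hospital Name,ZIP Code\nEXAMPLE HOSPITAL,02115\n"

# Default type inference turns the ZIP code into an integer.
inferred = pd.read_csv(io.StringIO(csv_text))

# Asking for str keeps the text exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"ZIP Code": str})

print(inferred["ZIP Code"].iloc[0])  # 2115
print(as_text["ZIP Code"].iloc[0])   # 02115
```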

In [None]:
df.sample(n=10)

Unnamed: 0,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
4795,670031,ST LUKE'S PATIENTS MEDICAL CENTER,4600 EAST SAM HOUSTON PARKWAY SOUTH,,,PASADENA,TX,77505,HARRIS,2814870700,Acute Care Hospitals,Proprietary,Yes
2671,290046,SPRING VALLEY HOSPITAL MEDICAL CENTER,5400 SOUTH RAINBOW BLVD,,,LAS VEGAS,NV,89118,CLARK,7028533000,Acute Care Hospitals,Proprietary,Yes
2815,321301,SOCORRO GENERAL HOSPITAL,1202 HIGHWAY 60 WEST,,,SOCORRO,NM,87801,SOCORRO,5758351140,Critical Access Hospitals,Voluntary non-profit - Private,Yes
2925,330199,METROPOLITAN HOSPITAL CENTER,1901 FIRST AVENUE,,,NEW YORK,NY,10029,NEW YORK,2124237554,Acute Care Hospitals,Government - Local,Yes
140,30055,KINGMAN REGIONAL MEDICAL CENTER,3269 STOCKTON HILL ROAD,,,KINGMAN,AZ,86409,MOHAVE,9287572101,Acute Care Hospitals,Government - Hospital District or Authority,Yes
134,30030,PHOENIX BAPTIST HOSPITAL,2000 WEST BETHANY HOME ROAD,,,PHOENIX,AZ,85015,MARICOPA,6022490212,Acute Care Hospitals,Proprietary,No
4166,450605,CARE REGIONAL MEDICAL CENTER,1711 W WHEELER AVENUE,,,ARANSAS PASS,TX,78336,SAN PATRICIO,3617588585,Acute Care Hospitals,Proprietary,Yes
4201,450688,DALLAS REGIONAL MEDICAL CENTER,1011 NORTH GALLOWAY AVENUE,,,MESQUITE,TX,75149,DALLAS,2143207000,Acute Care Hospitals,Proprietary,Yes
2894,330151,ST JAMES MERCY HOSPITAL,411 CANISTEO STREET,,,HORNELL,NY,14843,STEUBEN,6073248000,Acute Care Hospitals,Voluntary non-profit - Church,Yes
4485,500012,YAKIMA REGIONAL MEDICAL AND CARDIAC CENTER,110 SOUTH NINTH AVE,,,YAKIMA,WA,98902,YAKIMA,5095755102,Acute Care Hospitals,Proprietary,Yes


df.sample returns randomly selected rows of the dataframe; the n argument sets the sample size (10 above).
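
Because the rows are drawn at random, each run shows different hospitals. If a repeatable sample is wanted, the random_state argument can be set. This is a small sketch on a made-up frame, and the seed value 0 is arbitrary.

```python
import pandas as pd

# Made-up frame with 100 rows.
demo = pd.DataFrame({"x": range(100)})

# The same seed gives the same rows every time.
first = demo.sample(n=5, random_state=0)
second = demo.sample(n=5, random_state=0)

print(len(first))            # 5
print(first.equals(second))  # True
```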

In [None]:
df.tail()

Unnamed: 0,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
4821,670072,"WESTBURY COMMUNITY HOSPITAL, LLC",5556 GASMER,,,HOUSTON,TX,77035,HARRIS,7134222650,Acute Care Hospitals,Proprietary,Yes
4822,670073,METHODIST HOSPITAL FOR SURGERY,17101 DALLAS PARKWAY,,,ADDISON,TX,75001,DALLAS,4692483900,Acute Care Hospitals,Proprietary,Yes
4823,670075,"ST LUKE'S HOSPITAL AT THE VINTAGE, LLC",20171 CHASEWOOD PARK DRIVE,,,HOUSTON,TX,77070,HARRIS,8325345000,Acute Care Hospitals,Proprietary,Yes
4824,670076,HERITAGE PARK SURGICAL HOSPITAL,3601 CALAIS DRIVE,,,SHERMAN,TX,75090,GRAYSON,9038133728,Acute Care Hospitals,Proprietary,Yes
4825,670077,METHODIST WEST HOUSTON HOSPITAL,18500 KATY FREEWAY,,,HOUSTON,TX,77094,HARRIS,8325221000,Acute Care Hospitals,Voluntary non-profit - Private,Yes


df.tail() gives the last five rows.

df.State.shape gives the shape of the State column. A single column is one-dimensional, so the result is a tuple with one element.

In [None]:
df.State.shape


(4826,)

The square brackets are used here to select only the column named "Emergency Services". Selecting a single column this way returns a Series rather than a dataframe.

In [None]:
df['Emergency Services']

0       Yes
1       Yes
2       Yes
3       Yes
4       Yes
       ... 
4821    Yes
4822    Yes
4823    Yes
4824    Yes
4825    Yes
Name: Emergency Services, Length: 4826, dtype: object
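
To see the difference between a Series and a one-column dataframe, we can compare single and double square brackets on a made-up two-row frame.

```python
import pandas as pd

# Made-up frame for illustration.
demo = pd.DataFrame(
    {"State": ["AL", "NC"], "Emergency Services": ["Yes", "No"]}
)

as_series = demo["Emergency Services"]   # single brackets: Series
as_frame = demo[["Emergency Services"]]  # list of names: dataframe

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```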

In [None]:
df['Emergency Services'].value_counts()

Yes              4572
No                231
Not Available      23
Name: Emergency Services, dtype: int64

Below is an example of generating a new dataframe with label-based indexing (df.loc), where only the columns "State" and "Hospital Type" are kept and a random sample of 5 rows is displayed.

In [None]:
df.loc[:,["State","Hospital Type"]].sample(n=5)

Unnamed: 0,State,Hospital Type
1796,LA,Acute Care Hospitals
1954,MD,Acute Care Hospitals
4247,TX,Acute Care Hospitals
3039,NC,Acute Care Hospitals
3285,OH,Acute Care Hospitals


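This is an example of using position-based indexing (df.iloc). Positions start at 0 and the end of each slice is excluded, so 1:4 selects the second through fourth rows and 1:7 selects the second through seventh columns.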

In [None]:
df.iloc[1:4,1:7]

Unnamed: 0,Hospital Name,Address 1,Address 2,Address 3,City,State
1,MARSHALL MEDICAL CENTER SOUTH,2505 U S HIGHWAY 431 NORTH,,,BOAZ,AL
2,ELIZA COFFEE MEMORIAL HOSPITAL,205 MARENGO STREET,,,FLORENCE,AL
3,MIZELL MEMORIAL HOSPITAL,702 N MAIN ST,,,OPP,AL


In [None]:
df.loc[0:5,"State"]

0    AL
1    AL
2    AL
3    AL
4    AL
5    AL
Name: State, dtype: object
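
Notice that df.loc[0:5,"State"] returned six rows: with df.loc the end label of a slice is included, while with df.iloc the end position is excluded. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame with 8 rows and the default 0..7 index.
demo = pd.DataFrame({"State": list("ABCDEFGH")})

by_label = demo.loc[0:5, "State"]  # labels 0 through 5, end included
by_position = demo.iloc[0:5, 0]    # positions 0 through 4, end excluded

print(len(by_label))     # 6
print(len(by_position))  # 5
```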

Next we can filter the data to keep only the rows that meet the condition given in the square brackets. The double equal sign tests for equality. The resulting dataframe contains only the rows whose State value is "NC" (North Carolina).

In [None]:
df[df["State"]=="NC"]

Unnamed: 0,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
3011,340001,CAROLINAS MEDICAL CENTER-NORTHEAST,920 CHURCH ST N,,,CONCORD,NC,28025,CABARRUS,7047833000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3012,340002,MEMORIAL MISSION HOSPITAL AND ASHEVILLE SURGER...,509 BILTMORE AVE,,,ASHEVILLE,NC,28801,BUNCOMBE,8282131111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3013,340003,NORTHERN HOSPITAL OF SURRY COUNTY,830 ROCKFORD ST,,,MOUNT AIRY,NC,27030,SURRY,3367197000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3014,340004,HIGH POINT REGIONAL HOSPITAL,601 N ELM ST PO BOX HP-5,,,HIGH POINT,NC,27261,GUILFORD,3368786000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3015,340008,SCOTLAND MEMORIAL HOSPITAL,500 LAUCHWOOD DR,,,LAURINBURG,NC,28352,SCOTLAND,9102917000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3119,341323,CHARLES A CANNON JR MEMORIAL HOSPITAL,434 HOSPITAL DRIVE,,,LINVILLE,NC,28646,AVERY,8287377000,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3120,341324,"THE OUTER BANKS HOSPITAL, INC",4800 SOUTH CROATAN HIGHWAY,,,NAGS HEAD,NC,27959,DARE,2524494500,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3121,341325,ASHE MEMORIAL HOSPITAL,200 HOSPITAL AVE,,,JEFFERSON,NC,28640,ASHE,3362467101,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3122,341326,ANGEL MEDICAL CENTER,120 RIVERVIEW ST PO BOX 1209,,,FRANKLIN,NC,28734,MACON,8285248411,Critical Access Hospitals,Voluntary non-profit - Private,Yes
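
Under the hood, the comparison produces a column of True/False values (a boolean mask), and the square brackets keep the rows where the mask is True. The sketch below uses made-up hospital names and also shows isin, one possible way to keep several states in one step.

```python
import pandas as pd

# Made-up rows for illustration.
demo = pd.DataFrame(
    {
        "Hospital Name": ["HOSPITAL A", "HOSPITAL B", "HOSPITAL C", "HOSPITAL D"],
        "State": ["NC", "SC", "NC", "TX"],
    }
)

mask = demo["State"] == "NC"  # one True/False value per row
nc_rows = demo[mask]          # keeps rows where the mask is True

# isin tests membership in a list of values.
carolinas = demo[demo["State"].isin(["NC", "SC"])]

print(mask.tolist())   # [True, False, True, False]
print(len(nc_rows))    # 2
print(len(carolinas))  # 3
```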


We can also store the filtered rows in a separate dataframe. Calling .copy() makes it independent of the original dataframe df.

In [None]:
NC=df[df["State"]=="NC"].copy()
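
A small sketch of what "independent" means here: changing the copy leaves the original untouched. The frame and the "Beds" column are made up for illustration.

```python
import pandas as pd

# Made-up frame; "Beds" is a hypothetical column.
demo = pd.DataFrame({"State": ["NC", "SC", "NC"], "Beds": [10, 20, 30]})

subset = demo[demo["State"] == "NC"].copy()
subset["Beds"] = 0  # modify the copy only

print(subset["Beds"].tolist())  # [0, 0]
print(demo["Beds"].tolist())    # [10, 20, 30]
```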

In [None]:
NC

Unnamed: 0,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
3011,340001,CAROLINAS MEDICAL CENTER-NORTHEAST,920 CHURCH ST N,,,CONCORD,NC,28025,CABARRUS,7047833000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3012,340002,MEMORIAL MISSION HOSPITAL AND ASHEVILLE SURGER...,509 BILTMORE AVE,,,ASHEVILLE,NC,28801,BUNCOMBE,8282131111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3013,340003,NORTHERN HOSPITAL OF SURRY COUNTY,830 ROCKFORD ST,,,MOUNT AIRY,NC,27030,SURRY,3367197000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3014,340004,HIGH POINT REGIONAL HOSPITAL,601 N ELM ST PO BOX HP-5,,,HIGH POINT,NC,27261,GUILFORD,3368786000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3015,340008,SCOTLAND MEMORIAL HOSPITAL,500 LAUCHWOOD DR,,,LAURINBURG,NC,28352,SCOTLAND,9102917000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3119,341323,CHARLES A CANNON JR MEMORIAL HOSPITAL,434 HOSPITAL DRIVE,,,LINVILLE,NC,28646,AVERY,8287377000,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3120,341324,"THE OUTER BANKS HOSPITAL, INC",4800 SOUTH CROATAN HIGHWAY,,,NAGS HEAD,NC,27959,DARE,2524494500,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3121,341325,ASHE MEMORIAL HOSPITAL,200 HOSPITAL AVE,,,JEFFERSON,NC,28640,ASHE,3362467101,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3122,341326,ANGEL MEDICAL CENTER,120 RIVERVIEW ST PO BOX 1209,,,FRANKLIN,NC,28734,MACON,8285248411,Critical Access Hospitals,Voluntary non-profit - Private,Yes


Now let's generate the same data for the state of South Carolina and glue the two dataframes together.

In [None]:
SC=df[df["State"]=="SC"].copy()

In [None]:
SC

Unnamed: 0,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
3770,420002,PIEDMONT MEDICAL CENTER,222 S HERLONG AVE,,,ROCK HILL,SC,29730,YORK,8033291234,Acute Care Hospitals,Proprietary,Yes
3771,420004,MUSC MEDICAL CENTER,169 ASHLEY AVE,,,CHARLESTON,SC,29425,CHARLESTON,8437922300,Acute Care Hospitals,Government - State,Yes
3772,420005,MCLEOD MEDICAL CENTER - DILLON,301 E JACKSON ST,,,DILLON,SC,29536,DILLON,8437744111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3773,420007,SPARTANBURG REGIONAL MEDICAL CENTER,101 E WOOD ST,,,SPARTANBURG,SC,29303,SPARTANBURG,8645606000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3774,420009,OCONEE MEDICAL CENTER,298 MEMORIAL DRIVE,,,SENECA,SC,29672,OCONEE,8648823351,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3828,421300,ALLENDALE COUNTY HOSPITAL,1787 ALLENDALE FAIRFAX RD PO BOX 218,,,FAIRFAX,SC,29827,ALLENDALE,8036323311,Critical Access Hospitals,Government - Local,Yes
3829,421301,ABBEVILLE AREA MEDICAL CENTER,420 THOMSON CIRCLE,,,ABBEVILLE,SC,29620,ABBEVILLE,8643665011,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3830,421302,FAIRFIELD MEMORIAL HOSPITAL,321 BYPASS PO BOX 620,,,WINNSBORO,SC,29180,FAIRFIELD,8036350233,Critical Access Hospitals,Government - Local,Yes
3831,421303,WILLIAMSBURG REGIONAL HOSPITAL,500 NELSON BOULEVARD,,,KINGSTREE,SC,29556,WILLIAMSBURG,8433558888,Critical Access Hospitals,Voluntary non-profit - Private,Yes


In [None]:
Carolina_raw=pd.concat([NC,SC],axis=0,ignore_index=True,sort=False)
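
With axis=0 the rows of SC are stacked under the rows of NC, and ignore_index=True renumbers the result from 0. Here is a minimal sketch with made-up frames that reuse a few of the index labels seen above.

```python
import pandas as pd

# Made-up frames that keep their original row labels.
nc = pd.DataFrame({"State": ["NC", "NC"]}, index=[3011, 3012])
sc = pd.DataFrame({"State": ["SC"]}, index=[3770])

kept = pd.concat([nc, sc], axis=0)                          # original labels kept
renumbered = pd.concat([nc, sc], axis=0, ignore_index=True) # fresh 0..n-1 index

print(kept.index.tolist())        # [3011, 3012, 3770]
print(renumbered.index.tolist())  # [0, 1, 2]
```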

We next build a subset of the combined NC and SC hospital data that keeps only the attributes State, Hospital Type and Emergency Services.

In [None]:
Carolina=Carolina_raw.loc[:,["State","Hospital Type","Emergency Services"]]

In [None]:
Carolina

Unnamed: 0,State,Hospital Type,Emergency Services
0,NC,Acute Care Hospitals,Yes
1,NC,Acute Care Hospitals,Yes
2,NC,Acute Care Hospitals,Yes
3,NC,Acute Care Hospitals,Yes
4,NC,Acute Care Hospitals,Yes
...,...,...,...
171,SC,Critical Access Hospitals,Yes
172,SC,Critical Access Hospitals,Yes
173,SC,Critical Access Hospitals,Yes
174,SC,Critical Access Hospitals,Yes


Now we have a dataframe consisting only of the hospital data in NC and SC.

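Now let's try merging data. Despite their names, series1 and series2 below are dataframes. When no key is given, pd.merge joins them on the column they share, "Hospital Type".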

In [None]:
series1=df.loc[:,["State","Hospital Type"]].sample(n=10)

In [None]:
series2=df.loc[:,["Hospital Name","Hospital Type"]].sample(n=10)

In [None]:
Merge_data=pd.merge(series1,series2)

In [None]:
Merge_data

Unnamed: 0,State,Hospital Type,Hospital Name
0,NJ,Acute Care Hospitals,MOREHOUSE GENERAL HOSPITAL
1,NJ,Acute Care Hospitals,VALLEYCARE MEDICAL CENTER
2,NJ,Acute Care Hospitals,INOVA FAIRFAX HOSPITAL
3,NJ,Acute Care Hospitals,CYPRESS POINTE SURGICAL HOSPITAL
4,NJ,Acute Care Hospitals,CLAXTON-HEPBURN MEDICAL CENTER
...,...,...,...
57,NE,Critical Access Hospitals,ASPIRUS GRAND VIEW HOSPITAL
58,IL,Critical Access Hospitals,ABRAHAM LINCOLN MEMORIAL HOSPITAL
59,IL,Critical Access Hospitals,ASPIRUS GRAND VIEW HOSPITAL
60,KS,Critical Access Hospitals,ABRAHAM LINCOLN MEMORIAL HOSPITAL
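
The merged result has more rows than either 10-row input because "Hospital Type" is not unique: every row on one side is paired with every row on the other side that has the same type. The sketch below reproduces this on made-up frames (the hospital names and type labels are placeholders) and shows that naming the key with on= gives the same result.

```python
import pandas as pd

# Made-up frames that share only the "Hospital Type" column.
left = pd.DataFrame(
    {
        "State": ["NJ", "IL", "KS"],
        "Hospital Type": ["Acute", "Critical", "Critical"],
    }
)
right = pd.DataFrame(
    {
        "Hospital Name": ["HOSPITAL A", "HOSPITAL B", "HOSPITAL C"],
        "Hospital Type": ["Acute", "Critical", "Critical"],
    }
)

merged = pd.merge(left, right)  # inner join on the shared column
explicit = pd.merge(left, right, on="Hospital Type", how="inner")

# 1 "Acute" pair plus 2 x 2 "Critical" pairs.
print(len(merged))              # 5
print(merged.equals(explicit))  # True
```

Since the pairing only reflects a shared hospital type, a state in one row of Merge_data is not necessarily the state of the hospital named in that row.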


In [None]:
Merge_data.to_csv("Merge_data.csv", index=False)

In [None]:
Carolina.to_csv("Carolina.csv",index=False)
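
With index=False the row labels are left out of the file, so only the columns are written. As a rough sketch, we can write a made-up frame to an in-memory buffer instead of a file and look at the text that comes out.

```python
import io
import pandas as pd

# Made-up frame with non-default row labels.
demo = pd.DataFrame(
    {"State": ["NC", "SC"], "Emergency Services": ["Yes", "No"]},
    index=[3011, 3770],
)

buffer = io.StringIO()
demo.to_csv(buffer, index=False)  # row labels are not written
lines = buffer.getvalue().splitlines()

print(lines[0])  # State,Emergency Services
print(lines[1])  # NC,Yes

# Reading the text back gives a fresh 0..n-1 index.
restored = pd.read_csv(io.StringIO(buffer.getvalue()))
print(restored.index.tolist())  # [0, 1]
```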