In [None]:
import numpy as np
import pandas as pd


We can use the pd.read_csv function to load the CSV file into a DataFrame.

In [None]:
df=pd.read_csv("hospital-data.csv")

df.shape gives the number of rows and columns in the DataFrame as a (rows, columns) tuple.

In [None]:
df.shape

(4826, 13)

df.size gives the total number of cells in the DataFrame. Numerically, it equals the number of rows multiplied by the number of columns.

In [None]:
df.size


62738
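As a quick sanity check, the relationship between shape and size can be verified on a small made-up DataFrame (the values below are hypothetical, not taken from hospital-data.csv). The same arithmetic holds for the real file, since 4826 rows times 13 columns is 62738.

```python
import pandas as pd

# A small hypothetical DataFrame standing in for the hospital data
toy = pd.DataFrame({
    "State": ["NC", "SC", "NC"],
    "City": ["CONCORD", "DILLON", "ASHEVILLE"],
})

rows, cols = toy.shape            # (3, 2)
print(toy.size)                   # 6
print(rows * cols == toy.size)    # True

# The numbers reported for hospital-data.csv follow the same rule
print(4826 * 13)                  # 62738
```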

df.columns gives all column names.

In [None]:
df.columns

Index(['Provider Number', 'Hospital Name', 'Address 1', 'Address 2',
       'Address 3', 'City', 'State', 'ZIP Code', 'County', 'Phone Number',
       'Hospital Type', 'Hospital Ownership', 'Emergency Services'],
      dtype='object')

df.City selects the column named "City" (column names are case-sensitive, so df.city would fail), and df.City.value_counts() counts how many times each distinct value appears in that column, listing the most frequent values first.

In [None]:
df.City.value_counts()


CHICAGO         27
HOUSTON         26
LOS ANGELES     21
DALLAS          19
PHILADELPHIA    17
                ..
BOGALUSA         1
THIBODAUX        1
NATCHITOCHES     1
MORGAN CITY      1
ADDISON          1
Name: City, Length: 3018, dtype: int64
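The "Length: 3018" line is the number of distinct cities. A minimal sketch on a hypothetical city column shows two related options: normalize=True returns shares instead of counts, and nunique() returns just the number of distinct values (on the real data, df.City.nunique() should match the length reported above, since both skip missing values by default).

```python
import pandas as pd

# Hypothetical city column; the real one comes from hospital-data.csv
cities = pd.Series(["HOUSTON", "CHICAGO", "HOUSTON", "DALLAS", "HOUSTON"], name="City")

counts = cities.value_counts()                 # absolute counts, most frequent first
shares = cities.value_counts(normalize=True)   # fractions of all rows, summing to 1
print(counts)
print(shares)
print(counts.head(1))                          # the single most common city
print(cities.nunique())                        # number of distinct cities: 3
```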

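df.dtypes gives the data type of each column. Columns holding text (or a mix of text and numbers) are listed as object, while purely numeric columns are parsed as int64 or float64.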

In [None]:
df.dtypes

Provider Number        object
Hospital Name          object
Address 1              object
Address 2             float64
Address 3             float64
City                   object
State                  object
ZIP Code                int64
County                 object
Phone Number            int64
Hospital Type          object
Hospital Ownership     object
Emergency Services     object
dtype: object
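The dtypes above suggest that read_csv guessed ZIP Code and Phone Number as integers, which silently drops leading zeros (ZIP codes in parts of the Northeast start with 0). Address 2 and Address 3 appear to be float64 simply because they are empty, and missing values are stored as NaN, a float. A minimal sketch, using two hypothetical rows in the same layout, of how the dtype parameter of read_csv keeps identifier-like columns as text:

```python
import io
import pandas as pd

# Two hypothetical rows laid out like hospital-data.csv (values are made up)
csv_text = """Provider Number,ZIP Code,Phone Number
220001,02135,6177893000
340001,28025,7047833000
"""

default = pd.read_csv(io.StringIO(csv_text))
print(default.dtypes)            # all three columns parsed as integers
print(default["ZIP Code"][0])    # 2135: the leading zero is lost

# Ask read_csv to keep identifier-like columns as strings
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Provider Number": str, "ZIP Code": str, "Phone Number": str},
)
print(as_text["ZIP Code"][0])    # 02135
```

On the real file the same idea would be pd.read_csv("hospital-data.csv", dtype={"ZIP Code": str}); identifiers are never used for arithmetic, so storing them as text costs nothing.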

In [None]:
df.sample(n=10)

,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
2900,33015F,VA HUDSON VALLEY HEALTHCARE SYSTEM,2094 ALBANY POST ROAD,,,MONTROSE,NY,10548,WESTCHESTER,9147374400,ACUTE CARE - VETERANS ADMINISTRATION,Government Federal,Yes
2324,250044,BAPTIST MEMORIAL HOSPITAL BOONEVILLE,100 HOSPITAL DRIVE,,,BOONEVILLE,MS,38829,PRENTISS,6627205000,Acute Care Hospitals,Voluntary non-profit - Church,Yes
3435,370235,ST JOHN BROKEN ARROW,1000 WEST BOISE CIRCLE,,,BROKEN ARROW,OK,74012,TULSA,9189948199,Acute Care Hospitals,Proprietary,Yes
2899,330159,COMMUNITY-GENERAL HOSPITAL OF GREATER SYRACUSE,4900 BROAD ROAD,,,SYRACUSE,NY,13215,ONONDAGA,3154925011,Acute Care Hospitals,Voluntary non-profit-Private,Not Available
1623,171331,KIOWA DISTRICT HOSPITAL,810 DRUMM STREET,,,KIOWA,KS,67070,BARBER,6208254131,Critical Access Hospitals,Government - Hospital District or Authority,Yes
222,040047,FIVE RIVERS MEDICAL CENTER,2801 MEDICAL CENTER DRIVE,,,POCAHONTAS,AR,72455,RANDOLPH,8708926000,Acute Care Hospitals,Government - Local,Yes
3180,360017,GRANT MEDICAL CENTER,111 SOUTH GRANT AVENUE,,,COLUMBUS,OH,43215,FRANKLIN,6145669978,Acute Care Hospitals,Voluntary non-profit - Church,Yes
2882,330125,ROCHESTER GENERAL HOSPITAL,1425 PORTLAND AVENUE,,,ROCHESTER,NY,14621,MONROE,5859224000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3871,431326,MILBANK AREA HOSPITAL/AVERA HEALTH - CAH,901 E VIRGIL AVE,,,MILBANK,SD,57252,GRANT,6054324538,Critical Access Hospitals,Voluntary non-profit - Private,Yes
1954,210055,LAUREL REGIONAL MEDICAL CENTER,7300 VAN DUSEN ROAD,,,LAUREL,MD,20707,PRINCE GEORGES,3014977953,Acute Care Hospitals,Voluntary non-profit - Private,Yes


df.sample(n=10) returns 10 randomly chosen rows of the DataFrame; n sets the sample size. Because the rows are random, the output changes every time the cell is run.
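If the same sample is needed every time (for example, so a notebook's output stays stable), sample accepts a random_state seed, and frac= draws a fraction of the rows instead of a fixed count. A small sketch on a hypothetical DataFrame (the Beds column is made up for illustration):

```python
import pandas as pd

# Hypothetical data; "Beds" is not a column of hospital-data.csv
toy = pd.DataFrame({"State": ["NC", "SC", "GA", "TX", "OH", "IL"],
                    "Beds": [100, 80, 250, 300, 120, 90]})

# The same seed gives the same rows on every run
a = toy.sample(n=3, random_state=42)
b = toy.sample(n=3, random_state=42)
print(a.equals(b))   # True

# frac= samples a fraction of the rows instead of a fixed number
half = toy.sample(frac=0.5, random_state=0)
print(len(half))     # 3
```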

In [None]:
df.tail()

,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
4821,670072,"WESTBURY COMMUNITY HOSPITAL, LLC",5556 GASMER,,,HOUSTON,TX,77035,HARRIS,7134222650,Acute Care Hospitals,Proprietary,Yes
4822,670073,METHODIST HOSPITAL FOR SURGERY,17101 DALLAS PARKWAY,,,ADDISON,TX,75001,DALLAS,4692483900,Acute Care Hospitals,Proprietary,Yes
4823,670075,"ST LUKE'S HOSPITAL AT THE VINTAGE, LLC",20171 CHASEWOOD PARK DRIVE,,,HOUSTON,TX,77070,HARRIS,8325345000,Acute Care Hospitals,Proprietary,Yes
4824,670076,HERITAGE PARK SURGICAL HOSPITAL,3601 CALAIS DRIVE,,,SHERMAN,TX,75090,GRAYSON,9038133728,Acute Care Hospitals,Proprietary,Yes
4825,670077,METHODIST WEST HOUSTON HOSPITAL,18500 KATY FREEWAY,,,HOUSTON,TX,77094,HARRIS,8325221000,Acute Care Hospitals,Voluntary non-profit - Private,Yes


df.tail() gives the last five rows.
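Five is only the default: both head() and tail() take the number of rows as an argument. A minimal sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical frame with ten numbered rows
toy = pd.DataFrame({"Provider Number": list(range(10))})

print(toy.tail())     # last 5 rows (default n=5)
print(toy.head(3))    # first 3 rows
print(toy.tail(2))    # last 2 rows
```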

df.State selects the State column as a Series. Its shape is the one-element tuple (4826,), because a Series has only one dimension: one value per row.

In [None]:
df.State.shape


(4826,)

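Square brackets select a column by name. They are required here because the name "Emergency Services" contains a space, so it cannot be written as an attribute the way df.City can. Selecting a single column this way returns a Series, not a new DataFrame.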

In [None]:
df['Emergency Services']

0       Yes
1       Yes
2       Yes
3       Yes
4       Yes
       ... 
4821    Yes
4822    Yes
4823    Yes
4824    Yes
4825    Yes
Name: Emergency Services, Length: 4826, dtype: object
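The difference between single and double brackets is easy to miss: one column name gives a Series, while a list of names (even a list with one name) gives a DataFrame. A short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical rows with the same column names as the hospital data
toy = pd.DataFrame({
    "Hospital Name": ["A", "B", "C"],
    "Emergency Services": ["Yes", "No", "Yes"],
})

col = toy["Emergency Services"]        # single brackets -> Series
frame = toy[["Emergency Services"]]    # list inside brackets -> one-column DataFrame
print(type(col).__name__)              # Series
print(type(frame).__name__)            # DataFrame
print(col.shape, frame.shape)          # (3,) (3, 1)
```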

In [None]:
df['Emergency Services'].value_counts()

Yes              4572
No                231
Not Available      23
Name: Emergency Services, dtype: int64

df.loc selects data by label. In df.loc[:, ["State","Hospital Type"]], the colon keeps every row and the list keeps only the State and Hospital Type columns; .sample(n=5) then draws 5 random rows from that result.

In [None]:
df.loc[:,["State","Hospital Type"]].sample(n=5)

,State,Hospital Type
1291,IL,Critical Access Hospitals
2492,MO,Critical Access Hospitals
794,FL,Acute Care Hospitals
3292,OH,Acute Care Hospitals
1480,IA,Critical Access Hospitals


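df.iloc selects data by integer position, and like ordinary Python slices its ranges exclude the stop value. df.iloc[1:4,1:7] therefore returns the rows at positions 1 to 3 and the columns at positions 1 to 6 (Hospital Name through State), skipping the first row and the Provider Number column.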

In [None]:
df.iloc[1:4,1:7]

,Hospital Name,Address 1,Address 2,Address 3,City,State
1,MARSHALL MEDICAL CENTER SOUTH,2505 U S HIGHWAY 431 NORTH,,,BOAZ,AL
2,ELIZA COFFEE MEMORIAL HOSPITAL,205 MARENGO STREET,,,FLORENCE,AL
3,MIZELL MEMORIAL HOSPITAL,702 N MAIN ST,,,OPP,AL


In [None]:
df.loc[0:5,"State"]

0    AL
1    AL
2    AL
3    AL
4    AL
5    AL
Name: State, dtype: object
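Note that df.loc[0:5,"State"] returned six rows (labels 0 through 5): label slices with loc include the stop label, whereas position slices with iloc exclude it. A minimal sketch contrasting the two on a hypothetical frame:

```python
import pandas as pd

# Hypothetical State column with the default 0..6 index
toy = pd.DataFrame({"State": ["AL", "AL", "AK", "AZ", "AR", "CA", "CO"]})

by_label = toy.loc[0:5, "State"]      # label slice: includes label 5 -> 6 rows
by_position = toy.iloc[0:5, 0]        # position slice: stops before 5 -> 5 rows
print(len(by_label), len(by_position))  # 6 5
```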

Next we can filter the rows with a boolean condition. The expression df["State"]=="NC" uses the double equals sign to compare every value of the State column with "NC", producing a Series of True/False values; placing it inside df[...] keeps only the rows where it is True. The result contains only the hospitals in North Carolina.

In [None]:
df[df["State"]=="NC"]

,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
3011,340001,CAROLINAS MEDICAL CENTER-NORTHEAST,920 CHURCH ST N,,,CONCORD,NC,28025,CABARRUS,7047833000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3012,340002,MEMORIAL MISSION HOSPITAL AND ASHEVILLE SURGER...,509 BILTMORE AVE,,,ASHEVILLE,NC,28801,BUNCOMBE,8282131111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3013,340003,NORTHERN HOSPITAL OF SURRY COUNTY,830 ROCKFORD ST,,,MOUNT AIRY,NC,27030,SURRY,3367197000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3014,340004,HIGH POINT REGIONAL HOSPITAL,601 N ELM ST PO BOX HP-5,,,HIGH POINT,NC,27261,GUILFORD,3368786000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3015,340008,SCOTLAND MEMORIAL HOSPITAL,500 LAUCHWOOD DR,,,LAURINBURG,NC,28352,SCOTLAND,9102917000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3119,341323,CHARLES A CANNON JR MEMORIAL HOSPITAL,434 HOSPITAL DRIVE,,,LINVILLE,NC,28646,AVERY,8287377000,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3120,341324,"THE OUTER BANKS HOSPITAL, INC",4800 SOUTH CROATAN HIGHWAY,,,NAGS HEAD,NC,27959,DARE,2524494500,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3121,341325,ASHE MEMORIAL HOSPITAL,200 HOSPITAL AVE,,,JEFFERSON,NC,28640,ASHE,3362467101,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3122,341326,ANGEL MEDICAL CENTER,120 RIVERVIEW ST PO BOX 1209,,,FRANKLIN,NC,28734,MACON,8285248411,Critical Access Hospitals,Voluntary non-profit - Private,Yes

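Boolean conditions can also be combined. A sketch on hypothetical rows: & means "and", | means "or", each comparison must be wrapped in parentheses, and isin() tests membership in a list of values.

```python
import pandas as pd

# Hypothetical rows using the hospital data's column names
toy = pd.DataFrame({
    "State": ["NC", "SC", "NC", "GA"],
    "Hospital Type": ["Acute Care Hospitals", "Critical Access Hospitals",
                      "Critical Access Hospitals", "Acute Care Hospitals"],
})

# A comparison produces one True/False per row
mask = toy["State"] == "NC"
print(mask.tolist())   # [True, False, True, False]

# Both conditions must hold; the parentheses are required
nc_cah = toy[(toy["State"] == "NC") &
             (toy["Hospital Type"] == "Critical Access Hospitals")]

# isin() matches any of several values
carolinas = toy[toy["State"].isin(["NC", "SC"])]
print(len(nc_cah), len(carolinas))  # 1 3
```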

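We can also store the filtered rows in a separate DataFrame, NC. Calling .copy() makes it independent of df, so later changes to NC do not affect the original, and pandas does not raise a SettingWithCopyWarning when NC is modified.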

In [None]:
NC=df[df["State"]=="NC"].copy()

In [None]:
NC

,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
3011,340001,CAROLINAS MEDICAL CENTER-NORTHEAST,920 CHURCH ST N,,,CONCORD,NC,28025,CABARRUS,7047833000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3012,340002,MEMORIAL MISSION HOSPITAL AND ASHEVILLE SURGER...,509 BILTMORE AVE,,,ASHEVILLE,NC,28801,BUNCOMBE,8282131111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3013,340003,NORTHERN HOSPITAL OF SURRY COUNTY,830 ROCKFORD ST,,,MOUNT AIRY,NC,27030,SURRY,3367197000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3014,340004,HIGH POINT REGIONAL HOSPITAL,601 N ELM ST PO BOX HP-5,,,HIGH POINT,NC,27261,GUILFORD,3368786000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3015,340008,SCOTLAND MEMORIAL HOSPITAL,500 LAUCHWOOD DR,,,LAURINBURG,NC,28352,SCOTLAND,9102917000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3119,341323,CHARLES A CANNON JR MEMORIAL HOSPITAL,434 HOSPITAL DRIVE,,,LINVILLE,NC,28646,AVERY,8287377000,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3120,341324,"THE OUTER BANKS HOSPITAL, INC",4800 SOUTH CROATAN HIGHWAY,,,NAGS HEAD,NC,27959,DARE,2524494500,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3121,341325,ASHE MEMORIAL HOSPITAL,200 HOSPITAL AVE,,,JEFFERSON,NC,28640,ASHE,3362467101,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3122,341326,ANGEL MEDICAL CENTER,120 RIVERVIEW ST PO BOX 1209,,,FRANKLIN,NC,28734,MACON,8285248411,Critical Access Hospitals,Voluntary non-profit - Private,Yes

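To see what the copy buys us, here is a minimal sketch on hypothetical data: a column of the filtered copy is changed, and the original DataFrame is left untouched.

```python
import pandas as pd

# Hypothetical stand-in for the full hospital DataFrame
df = pd.DataFrame({"State": ["NC", "SC", "NC"],
                   "City": ["CONCORD", "DILLON", "ASHEVILLE"]})

nc = df[df["State"] == "NC"].copy()    # independent copy of the filtered rows
nc["City"] = nc["City"].str.title()    # modify the copy only

print(nc["City"].tolist())   # ['Concord', 'Asheville']
print(df["City"].tolist())   # ['CONCORD', 'DILLON', 'ASHEVILLE'] (unchanged)
```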

Now let's select the South Carolina hospitals the same way and then stack the two DataFrames on top of each other.

In [None]:
SC=df[df["State"]=="SC"].copy()

In [None]:
SC

,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
3770,420002,PIEDMONT MEDICAL CENTER,222 S HERLONG AVE,,,ROCK HILL,SC,29730,YORK,8033291234,Acute Care Hospitals,Proprietary,Yes
3771,420004,MUSC MEDICAL CENTER,169 ASHLEY AVE,,,CHARLESTON,SC,29425,CHARLESTON,8437922300,Acute Care Hospitals,Government - State,Yes
3772,420005,MCLEOD MEDICAL CENTER - DILLON,301 E JACKSON ST,,,DILLON,SC,29536,DILLON,8437744111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
3773,420007,SPARTANBURG REGIONAL MEDICAL CENTER,101 E WOOD ST,,,SPARTANBURG,SC,29303,SPARTANBURG,8645606000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3774,420009,OCONEE MEDICAL CENTER,298 MEMORIAL DRIVE,,,SENECA,SC,29672,OCONEE,8648823351,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3828,421300,ALLENDALE COUNTY HOSPITAL,1787 ALLENDALE FAIRFAX RD PO BOX 218,,,FAIRFAX,SC,29827,ALLENDALE,8036323311,Critical Access Hospitals,Government - Local,Yes
3829,421301,ABBEVILLE AREA MEDICAL CENTER,420 THOMSON CIRCLE,,,ABBEVILLE,SC,29620,ABBEVILLE,8643665011,Critical Access Hospitals,Voluntary non-profit - Private,Yes
3830,421302,FAIRFIELD MEMORIAL HOSPITAL,321 BYPASS PO BOX 620,,,WINNSBORO,SC,29180,FAIRFIELD,8036350233,Critical Access Hospitals,Government - Local,Yes
3831,421303,WILLIAMSBURG REGIONAL HOSPITAL,500 NELSON BOULEVARD,,,KINGSTREE,SC,29556,WILLIAMSBURG,8433558888,Critical Access Hospitals,Voluntary non-profit - Private,Yes


In [None]:
Carolina=pd.concat([NC,SC],axis=0,ignore_index=True,sort=False)

In [None]:
Carolina

,Provider Number,Hospital Name,Address 1,Address 2,Address 3,City,State,ZIP Code,County,Phone Number,Hospital Type,Hospital Ownership,Emergency Services
0,340001,CAROLINAS MEDICAL CENTER-NORTHEAST,920 CHURCH ST N,,,CONCORD,NC,28025,CABARRUS,7047833000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
1,340002,MEMORIAL MISSION HOSPITAL AND ASHEVILLE SURGER...,509 BILTMORE AVE,,,ASHEVILLE,NC,28801,BUNCOMBE,8282131111,Acute Care Hospitals,Voluntary non-profit - Private,Yes
2,340003,NORTHERN HOSPITAL OF SURRY COUNTY,830 ROCKFORD ST,,,MOUNT AIRY,NC,27030,SURRY,3367197000,Acute Care Hospitals,Government - Hospital District or Authority,Yes
3,340004,HIGH POINT REGIONAL HOSPITAL,601 N ELM ST PO BOX HP-5,,,HIGH POINT,NC,27261,GUILFORD,3368786000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
4,340008,SCOTLAND MEMORIAL HOSPITAL,500 LAUCHWOOD DR,,,LAURINBURG,NC,28352,SCOTLAND,9102917000,Acute Care Hospitals,Voluntary non-profit - Private,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
171,421300,ALLENDALE COUNTY HOSPITAL,1787 ALLENDALE FAIRFAX RD PO BOX 218,,,FAIRFAX,SC,29827,ALLENDALE,8036323311,Critical Access Hospitals,Government - Local,Yes
172,421301,ABBEVILLE AREA MEDICAL CENTER,420 THOMSON CIRCLE,,,ABBEVILLE,SC,29620,ABBEVILLE,8643665011,Critical Access Hospitals,Voluntary non-profit - Private,Yes
173,421302,FAIRFIELD MEMORIAL HOSPITAL,321 BYPASS PO BOX 620,,,WINNSBORO,SC,29180,FAIRFIELD,8036350233,Critical Access Hospitals,Government - Local,Yes
174,421303,WILLIAMSBURG REGIONAL HOSPITAL,500 NELSON BOULEVARD,,,KINGSTREE,SC,29556,WILLIAMSBURG,8433558888,Critical Access Hospitals,Voluntary non-profit - Private,Yes


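Now we have a DataFrame consisting only of the hospital data for NC and SC. In the pd.concat call, axis=0 places the rows of SC below those of NC, ignore_index=True discards the original row labels and renumbers the rows from 0, and sort=False leaves the column order as it is if the two frames' columns do not line up.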
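The effect of ignore_index can be seen on two tiny hypothetical frames that reuse a few of the original row labels from the output above:

```python
import pandas as pd

# Hypothetical two-row and one-row frames with labels like those in df
nc = pd.DataFrame({"State": ["NC", "NC"]}, index=[3011, 3012])
sc = pd.DataFrame({"State": ["SC"]}, index=[3770])

kept = pd.concat([nc, sc], axis=0)                           # original labels kept
renumbered = pd.concat([nc, sc], axis=0, ignore_index=True)  # new 0..n-1 labels
print(kept.index.tolist())        # [3011, 3012, 3770]
print(renumbered.index.tolist())  # [0, 1, 2]
```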

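Now let's try merging data. Despite their names, series1 and series2 below are DataFrames: each holds 10 random rows, one with the State and Hospital Type columns and the other with Hospital Name and Hospital Type. Without an on= argument, pd.merge joins on every column name the two frames share (here Hospital Type) and keeps only matching rows, an inner join. Because Hospital Type is not a unique key, every row of series1 is paired with every row of series2 of the same type, which is why a single GA row ends up next to several unrelated hospitals in the result.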

In [None]:
series1=df.loc[:,["State","Hospital Type"]].sample(n=10)

In [None]:
series2=df.loc[:,["Hospital Name","Hospital Type"]].sample(n=10)

In [None]:
Merge_data=pd.merge(series1,series2)

In [None]:
Merge_data

,State,Hospital Type,Hospital Name
0,GA,Acute Care Hospitals,SAINT JOSEPH HOSPITAL
1,GA,Acute Care Hospitals,FLORIDA HOSPITAL HEARTLAND MEDICAL CENTER
2,GA,Acute Care Hospitals,CONROE REGIONAL MEDICAL CENTER
3,GA,Acute Care Hospitals,RIVERSIDE SHORE MEMORIAL HOSPITAL
4,GA,Acute Care Hospitals,CENTENNIAL MEDICAL CENTER
5,GA,Acute Care Hospitals,MCDUFFIE REGIONAL MEDICAL CENTER
6,GA,Acute Care Hospitals,VOLUNTEER COMMUNITY HOSPITAL
7,PA,Acute Care Hospitals,SAINT JOSEPH HOSPITAL
8,PA,Acute Care Hospitals,FLORIDA HOSPITAL HEARTLAND MEDICAL CENTER
9,PA,Acute Care Hospitals,CONROE REGIONAL MEDICAL CENTER

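A merge is usually more meaningful when it joins on a column that uniquely identifies each row, such as Provider Number. A minimal sketch with hypothetical frames (the 670072 and 999999 rows are made up so that some keys do not match): how="outer" keeps unmatched rows, indicator=True adds a _merge column saying where each row came from, and validate="one_to_one" raises an error if the key turns out not to be unique.

```python
import pandas as pd

# Hypothetical frames sharing a unique key column
states = pd.DataFrame({"Provider Number": ["340001", "420002", "670072"],
                       "State": ["NC", "SC", "TX"]})
names = pd.DataFrame({"Provider Number": ["340001", "420002", "999999"],
                      "Hospital Name": ["CAROLINAS MEDICAL CENTER-NORTHEAST",
                                        "PIEDMONT MEDICAL CENTER",
                                        "HYPOTHETICAL HOSPITAL"]})

# Inner join on the key; fails loudly if a key is duplicated on either side
inner = pd.merge(states, names, on="Provider Number", validate="one_to_one")

# Outer join keeps unmatched keys and records where each row came from
outer = pd.merge(states, names, on="Provider Number", how="outer", indicator=True)
print(inner)
print(outer["_merge"].value_counts())   # both: 2, left_only: 1, right_only: 1
```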

In [None]:
Merge_data.to_csv("Merge_data.csv", index=False)

In [None]:
Carolina.to_csv("Carolina.csv",index=False)
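Passing index=False matters when the file is read back later: otherwise to_csv writes the row labels as an extra, unnamed first column, which read_csv then loads as a column called "Unnamed: 0". A small round-trip sketch using in-memory buffers instead of files (the frame is hypothetical):

```python
import io
import pandas as pd

# Hypothetical frame with non-default row labels
toy = pd.DataFrame({"State": ["NC", "SC"]}, index=[3011, 3770])

with_index = io.StringIO()
toy.to_csv(with_index)                  # row labels written as an unnamed first column
without_index = io.StringIO()
toy.to_csv(without_index, index=False)  # only the data columns are written

cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()
print(cols_with)      # ['Unnamed: 0', 'State']
print(cols_without)   # ['State']
```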