# Code to get the hospital closest to the centre of each area unit, and the area unit of each hospital

We have a hospital dataframe, but we don't know which area unit each hospital is in. In this notebook we take the hospital dataframe and the area unit dataframe and use the latitude and longitude coordinates in each to find the nearest hospital to each area unit, and the nearest area unit to each hospital.

In [2]:
# Installing and attaching packages
using Pkg
#Pkg.add("DataFrames")
#Pkg.add("CSV")

using CSV, DataFrames

First we need to load the datasets

In [3]:
area_units = CSV.read("areaXY.csv", DataFrame)
hospitals = CSV.read("CanterburyHospitalsLocations.df.csv", DataFrame)


Row,Premises.Name,Certification.Service.Type,Service.Types
,String,String,String
1,Ashburton Hospital,Public Hospital,"Medical, Maternity"
2,Burwood Hospital,Public Hospital,"Surgical, Geriatric, Psychogeriatric, Medical"
3,Chatham Island Health Centre,Public Hospital,Medical
4,Christchurch Hospital,Public Hospital,"Childrens health, Medical, Surgical, Maternity"
5,Darfield Hospital,Public Hospital,"Medical, Geriatric"
6,Ellesmere Hospital,Public Hospital,"Medical, Geriatric"
7,Hillmorton Hospital,Public Hospital,Mental health
8,Kaikoura Hospital,Public Hospital,"Medical, Maternity, Geriatric"
9,Lincoln Maternity Hospital,Public Hospital,Maternity
10,Oxford Hospital,Public Hospital,"Medical, Geriatric"


Note that the area units table has its latitude and longitude column names swapped, so we rename the columns to fix this.

In [17]:
# Fixing column names of area units table
area_units = select(area_units, "AU2017_NAME", "Longitude" => "Latitude", "Latitude" => "Longitude")

Row,AU2017_NAME,Latitude,Longitude
,String,Float64,Float64
1,Addington,-43.5435,172.62
2,Aidanfield,-43.5644,172.569
3,Akaroa,-43.8067,172.966
4,Akaroa Harbour,-43.7716,172.939
5,Allenton East,-43.8922,171.753
6,Allenton West,-43.8905,171.742
7,Amberley,-43.1558,172.73
8,Amuri,-42.5893,172.72
9,Aorangi,-43.4993,172.595
10,Aranui,-43.5107,172.703
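
A quick way to confirm that the swap was needed: a latitude must lie between -90 and 90 degrees, so a value around 172 cannot be a latitude. The sketch below illustrates that check; `is_valid_latlong` is a hypothetical helper that is not part of the original notebook.

```julia
# Hypothetical helper: true when the pair is a plausible (latitude, longitude) in degrees
is_valid_latlong(lat, long) = -90 <= lat <= 90 && -180 <= long <= 180

# Addington as it appears after the fix (values from the table above)
is_valid_latlong(-43.5435, 172.62)   # true

# The same two values in the swapped order fail the check
is_valid_latlong(172.62, -43.5435)   # false
```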


In [28]:
# Crossjoin hospitals and area units to get one large df with each area unit matched with each hospital
cross_df = crossjoin(hospitals, area_units, makeunique = true)
describe(cross_df)

Row,variable,mean,min
,Symbol,Union…,Any
1,Premises.Name,,Ashburton Hospital
2,Certification.Service.Type,,NGO Hospital
3,Service.Types,,"Childrens health, Maternity, Surgical, Medical, Mental health"
4,Total.Beds,87.0476,3
5,Premises.Address,,1 Lincoln Road
6,Premises.Address.Suburb.Road,,Allenton
7,Premises.Address.Town.City,,Ashburton
8,Premises.Address.Post.Code,7792.24,7300
9,DHB.Name,,Canterbury District Health Board
10,Latitude,-43.6188,-44.4082
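
To make the effect of `crossjoin` and `makeunique = true` concrete, here is a minimal sketch on two toy tables. The table contents and the names `h`, `a` and `toy` are made up for illustration. Every row of the first table is paired with every row of the second, and duplicated column names from the second table receive a `_1` suffix, which is where `Latitude_1` and `Longitude_1` come from.

```julia
using DataFrames

# Toy stand-ins for the real tables (values are made up for illustration)
h = DataFrame(Name = ["H1", "H2"], Latitude = [-43.5, -43.9], Longitude = [172.6, 171.7])
a = DataFrame(AU = ["A", "B", "C"], Latitude = [-43.5, -43.6, -43.8], Longitude = [172.6, 172.5, 172.9])

toy = crossjoin(h, a, makeunique = true)

nrow(toy)    # 6, i.e. 2 hospitals × 3 area units
names(toy)   # ["Name", "Latitude", "Longitude", "AU", "Latitude_1", "Longitude_1"]
```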


In [29]:
# Rename columns: replace dots with underscores and label hospital vs area unit coordinates
cross_df = select(cross_df, "Premises.Name" => "Hospital_Name",
    "Certification.Service.Type" => "Certification_Service_Type",
    "Service.Types" => "Service_Types",
    "Total.Beds" => "Total_Beds",
    "Premises.Address" => "Hospital_Address",
    "Premises.Address.Suburb.Road" => "Hospital_Suburb",
    "Premises.Address.Town.City" => "Hospital_City",
    "Premises.Address.Post.Code" => "Hospital_Postcode",
    "DHB.Name" => "DHB_Name",
    "Latitude" => "Hospital_Lat",
    "Longitude" => "Hospital_Long",
    "AU2017_NAME" => "Area_Unit",
    "Latitude_1" => "AU_Lat",
    "Longitude_1" => "AU_Long"
    )


Row,Hospital_Name,Certification_Service_Type,Service_Types,Total_Beds
,String,String,String,Int64
1,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
2,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
3,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
4,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
5,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
6,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
7,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
8,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
9,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
10,Ashburton Hospital,Public Hospital,"Medical, Maternity",56


Now we can write a function that takes the latitude and longitude of a hospital and of an area unit and calculates the distance between them in metres. We then apply the function row-wise to the latitude and longitude columns of the cross-joined dataframe.

In [30]:
# Convert degrees to radians, since the trigonometric functions below expect radians.
# Named degrees_to_radians to avoid clashing with Base.deg2rad.
function degrees_to_radians(x)
    return x * pi / 180
end

# Great-circle distance in metres between two points given as latitude and longitude in degrees
function spherical_distance(lat1, long1, lat2, long2)
    # Colatitudes (angle from the North Pole) of points 1 and 2, in radians
    x1 = 0.5*pi - degrees_to_radians(lat1)
    x2 = 0.5*pi - degrees_to_radians(lat2)

    r = 0.5*(6378137 + 6356752) # average of the Earth's equatorial and polar radii, in metres

    # Spherical law of cosines: t is the cosine of the central angle between the two points
    t = sin(x1)*sin(x2)*cos(degrees_to_radians(long1)-degrees_to_radians(long2)) + cos(x1)*cos(x2)

    # clamp guards against rounding pushing t just outside [-1, 1], where acos would throw
    return float(r * acos(clamp(t, -1, 1)))
end

# Testing function
spherical_distance(-43.57148, 172.61959, -43.5429307391304, 172.614197202899)

3202.3500502926418
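
The formula inside `spherical_distance` is the spherical law of cosines. As a cross-check, the sketch below computes the same great-circle distance with the haversine formula, which is often preferred for very short distances. `haversine_distance` is a hypothetical name that is not part of the original notebook; it assumes the same radius and the same degree inputs.

```julia
# Hypothetical cross-check: haversine form of the great-circle distance.
# Inputs are in degrees; sind and cosd are Base functions that take degrees.
function haversine_distance(lat1, long1, lat2, long2)
    r = 0.5 * (6378137 + 6356752)   # same radius as spherical_distance, in metres
    a = sind((lat2 - lat1) / 2)^2 +
        cosd(lat1) * cosd(lat2) * sind((long2 - long1) / 2)^2
    return 2 * r * asin(min(1.0, sqrt(a)))
end

d = haversine_distance(-43.57148, 172.61959, -43.5429307391304, 172.614197202899)
isapprox(d, 3202.35; atol = 0.01)   # true: matches the test value above to within a centimetre
```

Both formulas treat the Earth as a sphere, so they share the same modelling error; they differ only in numerical behaviour.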

In [31]:
# Apply function row-wise to 2 separate sets of lat and long columns in the dataframe
distance_df = transform!(cross_df, [:Hospital_Lat, :Hospital_Long, :AU_Lat, :AU_Long] => ByRow(spherical_distance) => :Distance )

Row,Hospital_Name,Certification_Service_Type,Service_Types,Total_Beds
,String,String,String,Int64
1,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
2,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
3,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
4,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
5,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
6,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
7,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
8,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
9,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
10,Ashburton Hospital,Public Hospital,"Medical, Maternity",56
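
The `[columns] => ByRow(f) => :new_column` pattern used above can be seen on a small scale in the sketch below. The toy dataframe and the use of `hypot` are illustrative choices only.

```julia
using DataFrames

toy = DataFrame(x = [3.0, 5.0], y = [4.0, 12.0])

# ByRow wraps a scalar function so it is called once per row with that row's x and y
result = transform(toy, [:x, :y] => ByRow(hypot) => :h)

result.h   # [5.0, 13.0]
```

Note that `transform` returns a new dataframe, whereas `transform!` (used in the notebook) adds the column to `cross_df` in place and returns that same object, so `distance_df` and `cross_df` refer to the same dataframe.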


In [32]:
# Show selected columns
distance_df[!, [:Hospital_Name, :Area_Unit, :Distance]]

Row,Hospital_Name,Area_Unit,Distance
,String,String,Float64
1,Ashburton Hospital,Addington,80339.5
2,Ashburton Hospital,Aidanfield,75600.5
3,Ashburton Hospital,Akaroa,98281.2
4,Ashburton Hospital,Akaroa Harbour,96644.8
5,Ashburton Hospital,Allenton East,617.26
6,Ashburton Hospital,Allenton West,554.11
7,Ashburton Hospital,Amberley,1.14134e5
8,Ashburton Hospital,Amuri,1.65098e5
9,Ashburton Hospital,Aorangi,81127.9
10,Ashburton Hospital,Aranui,87920.0


Now that we have the distance from each hospital to each area unit, we can group the dataframe by area unit and return the row with the minimum value of the Distance column, to see which hospital is closest to the centre point of that area unit. Credit to user Bogumił Kamiński on Stack Overflow (https://stackoverflow.com/questions/65024962/select-rows-of-a-dataframe-containing-minimum-of-grouping-variable-in-julia) for the following line of code.

In [36]:
# Grouping by area unit and returning the row with the minimum distance in each group
closest_hospital = combine(sdf -> filter(:Distance => ==(minimum(sdf.Distance)), sdf), groupby(distance_df, :Area_Unit))

Row,Area_Unit,Hospital_Name,Certification_Service_Type
,String,String,String
1,Addington,Christchurch Hospital,Public Hospital
2,Aidanfield,Hillmorton Hospital,Public Hospital
3,Akaroa,The Princess Margaret Hospital,Public Hospital
4,Akaroa Harbour,The Princess Margaret Hospital,Public Hospital
5,Allenton East,Ashburton Hospital,Public Hospital
6,Allenton West,Ashburton Hospital,Public Hospital
7,Amberley,Rangiora Hospital,Public Hospital
8,Amuri,Waikari Hospital,Public Hospital
9,Aorangi,St George's Hospital,NGO Hospital
10,Aranui,Burwood Hospital,Public Hospital
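
An alternative way to express "the row with the minimum distance per group" is to index each group with `argmin`. The sketch below shows this on a toy dataframe with made-up values; it is not the notebook's method. Unlike the `filter` approach, it returns exactly one row per group, keeping the first row when several share the minimum.

```julia
using DataFrames

# Toy distances (made-up values) in the same long format as distance_df
toy = DataFrame(
    Area_Unit     = ["A", "A", "B", "B"],
    Hospital_Name = ["H1", "H2", "H1", "H2"],
    Distance      = [500.0, 1200.0, 900.0, 300.0],
)

# For each area unit keep the single row with the smallest distance
nearest = combine(groupby(toy, :Area_Unit)) do sdf
    sdf[argmin(sdf.Distance), :]
end

nearest.Hospital_Name   # ["H1", "H2"]
```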


Now we can find which area unit a hospital is in by grouping by hospital and finding the smallest distance from the hospital to any area unit centre.
NB: this is not as precise as checking whether the hospital is inside an area unit's boundary, as it is possible for a hospital to be in one area unit and still be closer to the centre of a different area unit. However, with the tools at our disposal, we are happy with this result.

In [37]:
# Grouping by hospital name and returning the row with the minimum distance in each group
hospital_area_unit = combine(sdf -> filter(:Distance => ==(minimum(sdf.Distance)), sdf), groupby(distance_df, :Hospital_Name))

Row,Hospital_Name,Certification_Service_Type,Service_Types
,String,String,String
1,Ashburton Hospital,Public Hospital,"Medical, Maternity"
2,Burwood Hospital,Public Hospital,"Surgical, Geriatric, Psychogeriatric, Medical"
3,Chatham Island Health Centre,Public Hospital,Medical
4,Christchurch Hospital,Public Hospital,"Childrens health, Medical, Surgical, Maternity"
5,Darfield Hospital,Public Hospital,"Medical, Geriatric"
6,Ellesmere Hospital,Public Hospital,"Medical, Geriatric"
7,Hillmorton Hospital,Public Hospital,Mental health
8,Kaikoura Hospital,Public Hospital,"Medical, Maternity, Geriatric"
9,Lincoln Maternity Hospital,Public Hospital,Maternity
10,Oxford Hospital,Public Hospital,"Medical, Geriatric"


In [38]:
# Show selected columns 
hospital_area_unit[!, [:Hospital_Name, :Area_Unit, :Distance]]

Row,Hospital_Name,Area_Unit,Distance
,String,String,Float64
1,Ashburton Hospital,Allenton West,554.11
2,Burwood Hospital,Travis Wetland,586.853
3,Chatham Island Health Centre,Kaikoura Township,809089.0
4,Christchurch Hospital,Hagley Park,635.961
5,Darfield Hospital,Darfield,628.026
6,Ellesmere Hospital,Leeston,458.046
7,Hillmorton Hospital,Hillmorton,805.898
8,Kaikoura Hospital,Kaikoura Township,527.489
9,Lincoln Maternity Hospital,Lincoln,801.127
10,Oxford Hospital,Ashley Gorge,134.761
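
In the output above, Chatham Island Health Centre is matched to Kaikoura Township at a distance of roughly 809 km, which suggests that no area unit in the table lies near that hospital rather than indicating a genuine match. A simple way to surface such rows is to filter on a maximum plausible distance. The sketch below uses three rows copied from the output; the 50 km cut-off and the names `matches`, `max_plausible_m` and `suspect` are assumptions chosen for illustration.

```julia
using DataFrames

# Three rows copied from the output above
matches = DataFrame(
    Hospital_Name = ["Ashburton Hospital", "Chatham Island Health Centre", "Oxford Hospital"],
    Area_Unit     = ["Allenton West", "Kaikoura Township", "Ashley Gorge"],
    Distance      = [554.11, 809089.0, 134.761],
)

max_plausible_m = 50_000   # assumed cut-off in metres, for illustration only

# Keep only the rows whose nearest area unit centre is implausibly far away
suspect = filter(:Distance => >(max_plausible_m), matches)

suspect.Hospital_Name   # ["Chatham Island Health Centre"]
```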


In [42]:
CSV.write("Area Unit Hospitals", closest_hospital)
CSV.write("Hospitals", hospital_area_unit)

"Hospitals"