# Week 2: Identify Nearest Health Facilities

<span style="color:red">
**UPDATE**

Thank you for your analysis. Despite our warning efforts so far, the virus continues to spread rapidly. We want to get infected individuals treatment as quickly as possible, so we need your help to calculate which hospital or clinic is closest to each known infected individual in the population.
</span>

Your goal for this notebook will be to identify the nearest hospital or clinic for each infected person.

## Imports

In [1]:
import cudf
import cuml
import cupy as cp

## Load Population Data

Begin by loading the `lat`, `long` and `infected` columns from `'./data/week2.csv'` into a cuDF data frame called `gdf`.
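Without a GPU at hand, the effect of `usecols` can be sketched in plain Python: only the named columns are parsed and kept. This is a minimal illustration, not how cuDF reads files. The `age` and `sex` columns and the helper `read_columns` are hypothetical. The two data rows reuse the coordinates from the `gdf.head()` output in this notebook.

```python
import csv
import io

# Hypothetical CSV text: the extra `age` and `sex` columns are invented for illustration
SAMPLE_CSV = """age,sex,lat,long,infected
34,m,54.52251,-1.571896,0.0
71,f,54.55403,-1.524968,0.0
"""

def read_columns(text, usecols):
    """Parse CSV text and keep only the columns named in `usecols`, as floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in usecols}
    for row in reader:
        for name in usecols:
            columns[name].append(float(row[name]))
    return columns

cols = read_columns(SAMPLE_CSV, ["lat", "long", "infected"])
```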

In [2]:
gdf = cudf.read_csv('./data/week2.csv', usecols=['lat', 'long', 'infected'])
print(gdf.dtypes)
gdf.shape

lat         float64
long        float64
infected    float64
dtype: object


(58479894, 3)

In [3]:
gdf.head()

|   | lat | long | infected |
|---|---|---|---|
| 0 | 54.52251 | -1.571896 | 0.0 |
| 1 | 54.55403 | -1.524968 | 0.0 |
| 2 | 54.552486 | -1.435203 | 0.0 |
| 3 | 54.537189 | -1.566215 | 0.0 |
| 4 | 54.528212 | -1.588462 | 0.0 |


## Load Hospital and Clinics Data

For this step, your goal is to create an `all_med` cuDF data frame that contains the latitudes and longitudes of all the hospitals (data found at `'./data/hospitals.csv'`) and clinics (data found at `'./data/clinics.csv'`).
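Combining the two tables means stacking their rows and numbering the result from 0, which is what `ignore_index=True` does. Here is a minimal plain-Python sketch, assuming each facility is a `(lat, long)` tuple. `concat_reindexed` is a hypothetical helper, and the sample coordinates are copied from the `all_med` outputs further down.

```python
# Sample (lat, long) rows standing in for clinics.csv and hospitals.csv
clinics = [(51.804237, 1.186376), (51.815262, 1.154707)]
hospitals = [(53.986664, -1.051122)]

def concat_reindexed(*tables):
    """Stack the rows of several tables and label them 0..n-1, like ignore_index=True."""
    stacked = [row for table in tables for row in table]
    return dict(enumerate(stacked))

all_med = concat_reindexed(clinics, hospitals)
```

Clinics come first and hospitals follow, so the hospital rows end up at the end of the combined index.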

In [4]:
clinics = cudf.read_csv('./data/clinics.csv', usecols=['Latitude', 'Longitude'])
clinics.shape

(19082, 2)

In [5]:
clinics.head()
clinics.count()

Latitude     19075
Longitude    19075
dtype: int64

In [6]:
hospitals = cudf.read_csv('./data/hospitals.csv', usecols=['Latitude', 'Longitude'])
hospitals.shape

(1229, 2)

In [7]:
hospitals.head()
hospitals.count()

Latitude     1226
Longitude    1226
dtype: int64

In [8]:
all_med = cudf.concat([clinics, hospitals], ignore_index=True)
all_med.shape

(20311, 2)

In [9]:
all_med.count()

Latitude     20301
Longitude    20301
dtype: int64

Since we will be using the coordinates of these facilities, keep only the rows that have non-null values in both `Latitude` and `Longitude`.
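`dropna()` with its default `how='any'` removes a row if any of its values is null. Here it removes exactly 10 rows: `all_med` has 20311 rows, and afterwards its index runs from 0 to 20300. The following minimal plain-Python sketch applies the same rule and then renumbers the rows, as `reset_index(drop=True)` does. `dropna_any` is a hypothetical helper, and the rows are made up for illustration.

```python
import math

# Made-up facility rows; None and NaN both stand for a missing value
rows = [
    (51.804237, 1.186376),
    (float("nan"), -1.219792),  # missing latitude
    (51.575409, None),          # missing longitude
    (53.986664, -1.051122),
]

def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))

def dropna_any(table):
    """Keep rows with no missing field (how='any'), then renumber them from 0."""
    kept = [row for row in table if not any(is_missing(v) for v in row)]
    return dict(enumerate(kept))

clean = dropna_any(rows)
```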

In [10]:
all_med.dropna(inplace=True)
all_med.reset_index(drop=True, inplace=True)
all_med.head()
all_med.tail()

|   | Latitude | Longitude |
|---|---|---|
| 20296 | 51.763874 | -1.219792 |
| 20297 | 51.763874 | -1.219792 |
| 20298 | 53.986664 | -1.051122 |
| 20299 | 51.575409 | -0.322023 |
| 20300 | 51.763874 | -1.219792 |


In [11]:
all_med.count()

Latitude     20301
Longitude    20301
dtype: int64

## Make Grid Coordinates for Medical Facilities

Provided for you in the next cell is the lat/long to grid coordinates converter you used earlier in the workshop. Use this converter to create grid coordinate values and store them in `northing` and `easting` columns of the `all_med` data frame you created in the last step.

In [12]:
# https://www.ordnancesurvey.co.uk/docs/support/guide-coordinate-systems-great-britain.pdf

def latlong2osgbgrid_cupy(lat, long, input_degrees=True):
    '''
    Converts latitude and longitude (ellipsoidal) coordinates into northing and easting (grid) coordinates, using a Transverse Mercator projection.
    
    Inputs:
    lat: latitude coordinate (N)
    long: longitude coordinate (E)
    input_degrees: if True (default), interprets the coordinates as degrees; otherwise, interprets coordinates as radians
    
    Output:
    (northing, easting)
    '''
    
    if input_degrees:
        lat = lat * cp.pi/180
        long = long * cp.pi/180

    a = 6377563.396
    b = 6356256.909
    e2 = (a**2 - b**2) / a**2

    N0 = -100000 # northing of true origin
    E0 = 400000 # easting of true origin
    F0 = .9996012717 # scale factor on central meridian
    phi0 = 49 * cp.pi / 180 # latitude of true origin
    lambda0 = -2 * cp.pi / 180 # longitude of true origin and central meridian
    
    sinlat = cp.sin(lat)
    coslat = cp.cos(lat)
    tanlat = cp.tan(lat)
    
    latdiff = lat-phi0
    longdiff = long-lambda0

    n = (a-b) / (a+b)
    nu = a * F0 * (1 - e2 * sinlat ** 2) ** -.5
    rho = a * F0 * (1 - e2) * (1 - e2 * sinlat ** 2) ** -1.5
    eta2 = nu / rho - 1
    M = b * F0 * ((1 + n + 5/4 * (n**2 + n**3)) * latdiff - 
                  (3*(n+n**2) + 21/8 * n**3) * cp.sin(latdiff) * cp.cos(lat+phi0) +
                  15/8 * (n**2 + n**3) * cp.sin(2*(latdiff)) * cp.cos(2*(lat+phi0)) - 
                  35/24 * n**3 * cp.sin(3*(latdiff)) * cp.cos(3*(lat+phi0)))
    I = M + N0
    II = nu/2 * sinlat * coslat
    III = nu/24 * sinlat * coslat ** 3 * (5 - tanlat ** 2 + 9 * eta2)
    IIIA = nu/720 * sinlat * coslat ** 5 * (61-58 * tanlat**2 + tanlat**4)
    IV = nu * coslat
    V = nu / 6 * coslat**3 * (nu/rho - cp.tan(lat)**2)
    VI = nu / 120 * coslat ** 5 * (5 - 18 * tanlat**2 + tanlat**4 + 14 * eta2 - 58 * tanlat**2 * eta2)

    northing = I + II * longdiff**2 + III * longdiff**4 + IIIA * longdiff**6
    easting = E0 + IV * longdiff + V * longdiff**3 + VI * longdiff**5

    return(northing, easting)
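The converter above works element-wise on CuPy arrays. To spot-check a single point without a GPU, the following scalar port applies the same formulas using Python's `math` module. The name `latlong2osgbgrid` is hypothetical, and this sketch mirrors the cell above rather than replacing it. Given the first row of `all_med` (51.804237, 1.186376), it should reproduce the northing and easting shown in the output below, to within the rounding of the displayed latitude and longitude.

```python
import math

def latlong2osgbgrid(lat, long, input_degrees=True):
    """Scalar CPU port of latlong2osgbgrid_cupy; returns (northing, easting) in metres."""
    if input_degrees:
        lat = math.radians(lat)
        long = math.radians(long)

    # Airy 1830 ellipsoid and National Grid constants, as in the CuPy version
    a, b = 6377563.396, 6356256.909
    e2 = (a**2 - b**2) / a**2
    N0, E0, F0 = -100000, 400000, 0.9996012717
    phi0, lambda0 = math.radians(49), math.radians(-2)

    sinlat, coslat, tanlat = math.sin(lat), math.cos(lat), math.tan(lat)
    latdiff, longdiff = lat - phi0, long - lambda0

    n = (a - b) / (a + b)
    nu = a * F0 * (1 - e2 * sinlat**2) ** -0.5
    rho = a * F0 * (1 - e2) * (1 - e2 * sinlat**2) ** -1.5
    eta2 = nu / rho - 1
    # Meridional arc from the true origin's latitude to `lat`
    M = b * F0 * ((1 + n + 5/4 * (n**2 + n**3)) * latdiff
                  - (3 * (n + n**2) + 21/8 * n**3) * math.sin(latdiff) * math.cos(lat + phi0)
                  + 15/8 * (n**2 + n**3) * math.sin(2 * latdiff) * math.cos(2 * (lat + phi0))
                  - 35/24 * n**3 * math.sin(3 * latdiff) * math.cos(3 * (lat + phi0)))
    I = M + N0
    II = nu / 2 * sinlat * coslat
    III = nu / 24 * sinlat * coslat**3 * (5 - tanlat**2 + 9 * eta2)
    IIIA = nu / 720 * sinlat * coslat**5 * (61 - 58 * tanlat**2 + tanlat**4)
    IV = nu * coslat
    V = nu / 6 * coslat**3 * (nu / rho - tanlat**2)
    VI = nu / 120 * coslat**5 * (5 - 18 * tanlat**2 + tanlat**4 + 14 * eta2 - 58 * tanlat**2 * eta2)

    northing = I + II * longdiff**2 + III * longdiff**4 + IIIA * longdiff**6
    easting = E0 + IV * longdiff + V * longdiff**3 + VI * longdiff**5
    return northing, easting

# First facility row of all_med
northing, easting = latlong2osgbgrid(51.804237, 1.186376)
```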

In [13]:
clinics.dropna(inplace=True)  # does not affect all_med, which was already built and cleaned above

In [14]:
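# Move the coordinate columns into CuPy arrays so the converter runs on the GPU
cupy_lat = cp.asarray(all_med['Latitude'])
cupy_long = cp.asarray(all_med['Longitude'])
grid_n, grid_e = latlong2osgbgrid_cupy(cupy_lat, cupy_long)
# Wrap the results as cuDF Series and store them as new columns
n_series = cudf.Series(grid_n)
e_series = cudf.Series(grid_e)
all_med['northing'] = n_series
all_med['easting'] = e_series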
all_med.head()

|   | Latitude | Longitude | northing | easting |
|---|---|---|---|---|
| 0 | 51.804237 | 1.186376 | 216584.974494 | 619651.670335 |
| 1 | 51.815262 | 1.154707 | 217715.546537 | 617415.959247 |
| 2 | 51.780621 | 1.117907 | 213755.249447 | 615045.25966 |
| 3 | 53.482368 | -2.885404 | 398798.159295 | 341250.934895 |
| 4 | 53.41563 | -2.800874 | 391308.056122 | 346776.272538 |


## Find Closest Hospital or Clinic for Infected

Fit `cuml.NearestNeighbors` with `all_med`'s `northing` and `easting` values, using the named argument `n_neighbors` set to `1`, and save the model as `knn`.
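Conceptually, a 1-nearest-neighbour query compares each infected person's `(northing, easting)` with every facility. It keeps the closest one under Euclidean distance, which is cuML's default metric. Because the grid coordinates are in metres, the distances come out in metres. The exhaustive sketch below shows that logic in plain Python. It makes no claim about how cuML organizes the search internally, and `nearest_facility` is a hypothetical helper. The coordinates are copied from the `all_med` and `infected_gdf` outputs in this notebook.

```python
import math

def nearest_facility(points, facilities):
    """Brute-force 1-nearest-neighbour search on (northing, easting) pairs in metres.

    Returns two parallel lists (distances, indices), one entry per query point:
    the Euclidean distance to the closest facility and that facility's row position.
    """
    distances, indices = [], []
    for pn, pe in points:
        best_i, best_d = -1, math.inf
        for i, (fn, fe) in enumerate(facilities):
            d = math.hypot(pn - fn, pe - fe)
            if d < best_d:
                best_i, best_d = i, d
        distances.append(best_d)
        indices.append(best_i)
    return distances, indices

# Facility grid coordinates from rows 0-4 of all_med
facilities = [
    (216584.974494, 619651.670335),
    (217715.546537, 617415.959247),
    (213755.249447, 615045.25966),
    (398798.159295, 341250.934895),
    (391308.056122, 346776.272538),
]
# Grid coordinates of infected individuals 0 and 1 from infected_gdf
people = [(424489.783814, 371619.678741), (418820.687944, 371876.492369)]
distances, indices = nearest_facility(people, facilities)
```

Checking every facility for every person is quadratic in the worst case. That cost is why the notebook hands the work to the GPU.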

In [17]:
knn = cuml.NearestNeighbors(n_neighbors=1)

In [18]:
all_med.head()

|   | Latitude | Longitude | northing | easting |
|---|---|---|---|---|
| 0 | 51.804237 | 1.186376 | 216584.974494 | 619651.670335 |
| 1 | 51.815262 | 1.154707 | 217715.546537 | 617415.959247 |
| 2 | 51.780621 | 1.117907 | 213755.249447 | 615045.25966 |
| 3 | 53.482368 | -2.885404 | 398798.159295 | 341250.934895 |
| 4 | 53.41563 | -2.800874 | 391308.056122 | 346776.272538 |


In [19]:
knn.fit(all_med[['northing', 'easting']])

NearestNeighbors()

Save every infected member in `gdf` into a new dataframe called `infected_gdf`.
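Filtering with a boolean mask keeps the rows where the condition is true. `reset_index(drop=True)` then renumbers the surviving rows from 0, so row labels line up with positions. This matters later: the `northing`/`easting` Series built from CuPy arrays get a default 0-based index, and the `kneighbors` results are positional. The following plain-Python sketch shows both steps. The rows are a made-up mix of values from the `gdf` and `infected_gdf` outputs.

```python
# Made-up (lat, long, infected) rows; infected is stored as a float, as in gdf
population = [
    (54.52251, -1.571896, 0.0),
    (53.715826, -2.430079, 1.0),
    (54.55403, -1.524968, 0.0),
    (53.664881, -2.425673, 1.0),
]

# Equivalent of gdf[gdf['infected'] == 1].reset_index(drop=True):
# build a boolean mask, keep the matching rows, then relabel them from 0
mask = [row[2] == 1 for row in population]
infected = dict(enumerate(row for row, keep in zip(population, mask) if keep))
```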

In [20]:
infected_gdf = gdf[gdf['infected'] == 1].reset_index(drop=True)
print(infected_gdf.shape)
infected_gdf.head()

(70880, 3)


|   | lat | long | infected |
|---|---|---|---|
| 0 | 53.715826 | -2.430079 | 1.0 |
| 1 | 53.664881 | -2.425673 | 1.0 |
| 2 | 53.696765 | -2.48894 | 1.0 |
| 3 | 53.696966 | -2.488897 | 1.0 |
| 4 | 53.727804 | -2.392959 | 1.0 |


Create `northing` and `easting` values for `infected_gdf`.

In [21]:
cupy_lat_inf = cp.asarray(infected_gdf['lat'])
cupy_long_inf = cp.asarray(infected_gdf['long'])
grid_n_inf, grid_e_inf = latlong2osgbgrid_cupy(cupy_lat_inf, cupy_long_inf)
n_series_inf = cudf.Series(grid_n_inf)
e_series_inf = cudf.Series(grid_e_inf)
infected_gdf['northing'] = n_series_inf
infected_gdf['easting'] = e_series_inf
infected_gdf.head()

|   | lat | long | infected | northing | easting |
|---|---|---|---|---|---|
| 0 | 53.715826 | -2.430079 | 1.0 | 424489.783814 | 371619.678741 |
| 1 | 53.664881 | -2.425673 | 1.0 | 418820.687944 | 371876.492369 |
| 2 | 53.696765 | -2.48894 | 1.0 | 422394.39894 | 367721.000265 |
| 3 | 53.696966 | -2.488897 | 1.0 | 422416.821887 | 367723.973098 |
| 4 | 53.727804 | -2.392959 | 1.0 | 425808.109929 | 374076.557677 |


In [28]:
infected_gdf.tail()

|   | lat | long | infected | northing | easting |
|---|---|---|---|---|---|
| 70875 | 51.662717 | -2.92685 | 1.0 | 196451.131712 | 335900.971171 |
| 70876 | 51.59935 | -2.959175 | 1.0 | 189433.449408 | 333573.002864 |
| 70877 | 51.543825 | -2.822984 | 1.0 | 183143.984735 | 342935.130365 |
| 70878 | 51.562536 | -2.879492 | 1.0 | 185270.296666 | 339042.062616 |
| 70879 | 51.628748 | -2.837151 | 1.0 | 192598.790941 | 342060.980289 |


In [27]:
all_med.head()

|   | Latitude | Longitude | northing | easting |
|---|---|---|---|---|
| 0 |  | 0.0 | 216584.974494 | 8e-323 |
| 1 | 2.121996e-314 |  | 217715.546537 | 8e-323 |
| 2 | 0.0 |  | 213755.249447 | 8e-323 |
| 3 |  |  | 398798.159295 | 8e-323 |
| 4 |  | 2.121996e-314 | 391308.056122 | 8e-323 |


In [24]:
all_med.tail()

|   | Latitude | Longitude | northing | easting |
|---|---|---|---|---|
| 20296 | 51.763874 | -1.219792 | 207581.86969 | 453837.495458 |
| 20297 | 51.763874 | -1.219792 | 207581.86969 | 453837.495458 |
| 20298 | 53.986664 | -1.051122 | 454950.747087 | 462211.498826 |
| 20299 | 51.575409 | -0.322023 | 187669.267394 | 516265.752713 |
| 20300 | 51.763874 | -1.219792 | 207581.86969 | 453837.495458 |


Use `knn.kneighbors` with `n_neighbors=1` on `infected_gdf`'s `northing` and `easting` values. Save the return values in `distances` and `indices`.

In [25]:
distances, indices = knn.kneighbors(infected_gdf[['northing', 'easting']], n_neighbors=1)

In [26]:
indices.shape

(70880,)

### Check Your Solution

`indices`, returned from your use of `knn.kneighbors` immediately above, should map person indices to their closest clinic/hospital indices:

In [31]:
indices.head()

0    30
1    30
2    96
3    62
4    62
dtype: int64
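Position `i` of `indices` refers to row `i` of `infected_gdf`, and the value stored there is a row position in `all_med`. Looking up each value is therefore a positional gather. Here is a minimal plain-Python sketch, assuming facilities are `(lat, long)` tuples keyed by row position. The helper `facilities_for_people` and the index list are hypothetical, while the coordinates come from `all_med.head()`.

```python
# Facility (lat, long) keyed by all_med row position, copied from all_med.head()
facility_coords = {
    0: (51.804237, 1.186376),
    1: (51.815262, 1.154707),
    2: (51.780621, 1.117907),
    3: (53.482368, -2.885404),
    4: (53.41563, -2.800874),
}

# Hypothetical nearest-facility positions: element i belongs to infected person i
nearest = [3, 3, 4]

def facilities_for_people(nearest_indices, facilities):
    """Map each person's position to the (lat, long) of their nearest facility."""
    return {person: facilities[idx] for person, idx in enumerate(nearest_indices)}

assignments = facilities_for_people(nearest, facility_coords)
```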

In [32]:
infected_gdf.head()

|   | lat | long | infected | northing | easting |
|---|---|---|---|---|---|
| 0 |  | -2.430079 | 1.0 | 424489.783814 | 371619.678741 |
| 1 |  | -2.425673 | 1.0 | 418820.687944 | 371876.492369 |
| 2 | 2.121996e-314 | -2.48894 | 1.0 | 422394.39894 | 367721.000265 |
| 3 | 0.0 | -2.488897 | 1.0 | 422416.821887 | 367723.973098 |
| 4 |  | -2.392959 | 1.0 | 425808.109929 | 374076.557677 |


Here you can print an infected individual's coordinates from `infected_gdf`:

In [29]:
infected_gdf.iloc[0] # get the coords of an infected individual (in this case, individual 0)

lat          5.371583e+01
long        -2.430079e+00
infected    2.914987e-322
northing     4.244898e+05
easting      3.716197e+05
Name: 0, dtype: float64

Use the mapped index to look up the nearest facility and confirm that it lies at a nearby coordinate:

In [34]:
all_med.iloc[30] # print the facility mapped to individual 0 (replace 30 with the index identified as closest to your individual)

Latitude               NaN
Longitude     0.000000e+00
northing      4.158324e+05
easting      7.905050e-323
Name: 30, dtype: float64
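Assuming the default Euclidean metric, each value in `distances` is a straight-line distance in the units of the fitted columns. For the National Grid those units are metres. As a sanity check, the same figure can be recomputed by hand for any person/facility pair. In the small sketch below, `grid_distance_km` is a hypothetical helper. The pair (infected individual 0 and `all_med` row 3) is illustrative only and is not that individual's nearest match.

```python
import math

def grid_distance_km(point_a, point_b):
    """Straight-line distance in km between two (northing, easting) points given in metres."""
    return math.dist(point_a, point_b) / 1000.0

person_0 = (424489.783814, 371619.678741)    # infected_gdf row 0
facility_3 = (398798.159295, 341250.934895)  # all_med row 3 (illustrative, not the nearest)
km = grid_distance_km(person_0, facility_3)  # roughly 40 km
```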

<div align="center"><h2>Please Restart the Kernel</h2></div>

...before moving to the next notebook.

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)