# KNN

This notebook uses GPU-accelerated k-nearest neighbors to identify the road nodes nearest to each hospital.

## Imports

In [1]:
import cudf
import cuml

## Load Data

### Road Nodes

We begin by reading our road nodes data.

In [2]:
road_nodes = cudf.read_csv('./data/road_nodes_2-06.csv', dtype=['str', 'float32', 'float32', 'str'])

In [3]:
road_nodes.dtypes

node_id     object
east       float32
north      float32
type        object
dtype: object

In [4]:
road_nodes.shape

(3121148, 4)

In [5]:
road_nodes.head()

| | node_id | east | north | type |
|---|---|---|---|---|
| 0 | id02FE73D4-E88D-4119-8DC2-6E80DE6F6594 | 320608.09375 | 870994.0 | junction |
| 1 | id634D65C1-C38B-4868-9080-2E1E47F0935C | 320628.5 | 871103.8125 | road end |
| 2 | idDC14D4D1-774E-487D-8EDE-60B129E5482C | 320635.46875 | 870983.875 | junction |
| 3 | id51555819-1A39-4B41-B0C9-C6D2086D9921 | 320648.6875 | 871083.5625 | junction |
| 4 | id9E362428-79D7-4EE3-B015-0CE3F6A78A69 | 320658.1875 | 871162.375 | junction |


### Hospitals

Next we load the hospital data.

In [6]:
hospitals = cudf.read_csv('./data/hospitals_2-06.csv')

In [None]:
hospitals.dtypes

In [None]:
hospitals.shape

In [None]:
hospitals.head()

## K-Nearest Neighbors

We are going to use the [k-nearest neighbors](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) algorithm to find the *k* nearest road nodes for every hospital. We first fit a KNN model on the road node locations, and then pass the fitted model the hospital locations so that it can return the nearest road nodes for each one.
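To make the fit-then-query idea concrete, here is a minimal CPU-only sketch of a brute-force nearest-neighbor search in plain Python. It is not how cuML implements the search; the function name `brute_force_knn` and the toy coordinates are made up for illustration. It only shows the shape of the result: for each query point, a row of distances and a row of indices into the reference points, ordered from nearest to farthest.

```python
import math

def brute_force_knn(points, queries, k):
    """Return (distances, indices) of the k nearest points for each query."""
    all_distances, all_indices = [], []
    for qx, qy in queries:
        # Pair every reference point's Euclidean distance with its row index,
        # then sort so the closest points come first.
        scored = sorted(
            (math.hypot(px - qx, py - qy), i) for i, (px, py) in enumerate(points)
        )
        nearest = scored[:k]
        all_distances.append([d for d, _ in nearest])
        all_indices.append([i for _, i in nearest])
    return all_distances, all_indices

# Hypothetical (east, north) coordinates, not taken from the dataset.
toy_road_nodes = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (1.0, 1.0)]
toy_hospitals = [(0.0, 0.5), (9.0, 1.0)]

toy_distances, toy_indices = brute_force_knn(toy_road_nodes, toy_hospitals, k=2)
print(toy_indices)    # one row of reference-point indices per query
print(toy_distances)  # matching distances, nearest first
```

Each row of `toy_indices` refers to positions in `toy_road_nodes`, which mirrors how the indices returned later in this notebook refer to rows of `road_nodes`.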

## Exercise: Prep the KNN Model

Create a `cuml.NearestNeighbors` instance named `knn`, with `n_neighbors` set to 3.

In [7]:
knn = cuml.NearestNeighbors(n_neighbors = 3)

## Exercise: Fit the KNN Model

Create a new dataframe `road_locs` using the `road_nodes` columns `east` and `north`. The order of the columns doesn't matter, except that we will need them to remain consistent over multiple operations, so please use the ordering `['east', 'north']`.

Fit the `knn` model with `road_locs` using the `knn.fit` method.

In [9]:
road_locs = road_nodes[['east', 'north']]
knn.fit(road_locs)

NearestNeighbors(n_neighbors=3, verbose=4, handle=<cuml.common.handle.Handle object at 0x7f002d8ea710>, algorithm='brute', metric='euclidean', p=2, metric_params=None, output_type='cudf')

## Exercise: Road Nodes Closest to Each Hospital

Use the `knn.kneighbors` method to find the 3 closest road nodes to each hospital. Pass it two arguments: `X`, for which you should use the `easting` and `northing` columns of `hospitals` (in the same column order used when fitting the `knn` model above), and `n_neighbors`, the number of neighbors to search for, which in this case is 3.

`knn.kneighbors` will return two cuDF DataFrames, which you should name `distances` and `indices` respectively.

In [11]:
distances, indices = knn.kneighbors(hospitals[['easting', 'northing']], 3)
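The reason for keeping the column order consistent between fitting and querying can be shown with a small plain-Python sketch. The helper `nearest_index` and all coordinates here are hypothetical; the point is only that a query given as `(north, east)` against points stored as `(east, north)` silently returns a different neighbor instead of raising an error.

```python
import math

def nearest_index(points, query):
    """Index of the point closest to `query` by Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

# Hypothetical reference points stored as (east, north).
nodes_east_north = [(100.0, 900.0), (900.0, 100.0)]

# Hypothetical hospital location.
hospital_east, hospital_north = 120.0, 880.0

# Query in the same order as the reference points: (east, north).
correct = nearest_index(nodes_east_north, (hospital_east, hospital_north))

# Query with the columns accidentally swapped: (north, east).
swapped = nearest_index(nodes_east_north, (hospital_north, hospital_east))

print(correct, swapped)
```

The two calls pick different points, so a swapped column order would produce plausible-looking but wrong results.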

## Viewing a Specific Hospital

We can now use `indices`, `hospitals`, and `road_nodes` to derive information specific to a given hospital. Here we will examine the hospital at index `10`. First we view the hospital's grid coordinates:

In [12]:
SELECTED_RESULT = 10
print('hospital coordinates:\n', hospitals.loc[SELECTED_RESULT, ['easting', 'northing']], sep='')

hospital coordinates:
easting     260713.17190
northing     56303.21875
Name: 10, dtype: float64


Now we view the row indices of the 3 closest road nodes. Note that these are positions in `road_nodes`, not `node_id` values:

In [None]:
nearest_road_nodes = indices.iloc[SELECTED_RESULT, 0:3]
print('road node indices:\n', nearest_road_nodes, sep='')

And finally we view the grid coordinates of the 3 nearest road nodes, which we can confirm are listed in order of increasing distance from the hospital:

In [None]:
print('road_node coordinates:\n', road_nodes.loc[nearest_road_nodes, ['east', 'north']], sep='')
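One way to confirm the ordering is to recompute the Euclidean distances by hand. The sketch below uses the hospital coordinates printed above, but the three road node coordinates are hypothetical placeholders, since the actual output of the last cell is not shown here; substitute the values your notebook prints.

```python
import math

# Hospital 10's (easting, northing), as printed earlier in this notebook.
hospital = (260713.17190, 56303.21875)

# Hypothetical (east, north) values standing in for the three rows returned by
# road_nodes.loc[nearest_road_nodes, ['east', 'north']].
candidate_nodes = [(260720.0, 56310.0), (260700.0, 56290.0), (260750.0, 56330.0)]

# Straight-line distance from the hospital to each candidate node.
recomputed = [math.dist(hospital, node) for node in candidate_nodes]

# True when the nodes are listed from nearest to farthest.
is_sorted = all(a <= b for a, b in zip(recomputed, recomputed[1:]))

print(recomputed, is_sorted)
```

With real values, the recomputed distances can also be compared against row `SELECTED_RESULT` of `distances`.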