### Code for help in building a KNN regressor
#### Dr. Bruns

In this notebook I show the ideas you need to build a KNN regressor.  Read the notebook that gives help on KNN classification before reading this one.

In [1]:
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
from scipy.stats import zscore
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

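Read the college data and set up training and test sets.  Our goal is to predict out-of-state tuition from expenditure per student and 4-year graduation rate.  Note that the predictors are scaled with z-scores before the split, so both the training and test data are scaled.  The test set contains just 5 rows (test_size=5).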

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/grbruns/cst383/master/College.csv", index_col=0)

X = df[['Expend', 'Grad.Rate']].apply(zscore).values
y = df['Outstate'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5, random_state=42)

Create a distance matrix.
There is one row for every row of X_test.
There is one column for every row of X_train.

In [3]:
dm = distance_matrix(X_test, X_train)
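
As a quick sanity check on what `distance_matrix` returns, here is a minimal sketch on two small made-up arrays (`toy_test` and `toy_train` are illustrative names, not part of the college data). With its default settings the function computes Euclidean distance, and entry `[i, j]` is the distance from test row `i` to training row `j`.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Made-up points for illustration only (not the college data).
toy_test = np.array([[0.0, 0.0],
                     [3.0, 4.0]])
toy_train = np.array([[3.0, 4.0],
                      [0.0, 1.0],
                      [6.0, 8.0]])

toy_dm = distance_matrix(toy_test, toy_train)

# One row per test point, one column per training point.
print(toy_dm.shape)   # (2, 3)
print(toy_dm[0])      # distances from the first test point: 5, 1, 10
```

The same layout holds for `dm` above: it should have 5 rows, one for each test row.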

The first row of dm corresponds to the first row of the test data.

In [4]:
row = dm[0,:]

Use argsort to get the indexes of the training data rows that are closest to the first test row, in other words the indexes of the smallest distances in 'row'.

In [5]:
indexes_k_closest = np.argsort(row)[:5]
indexes_k_closest

array([694, 577,  66, 315, 606], dtype=int64)
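
If `argsort` is unfamiliar, this small sketch may help. It uses made-up distances (`toy_row` is a hypothetical array, not taken from `dm`). `np.argsort` returns the positions that would sort the array in ascending order, so the first k positions point at the k smallest distances.

```python
import numpy as np

# Made-up distances from one test row to six training rows.
toy_row = np.array([2.5, 0.3, 1.7, 0.9, 3.1, 0.1])

order = np.argsort(toy_row)   # positions ordered from smallest to largest distance
k_closest = order[:3]         # positions of the 3 smallest distances

print(order)       # [5 1 3 2 0 4]
print(k_closest)   # [5 1 3]
```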

The target values of the training data at these indexes give the tuitions of the 5 closest training rows.

In [6]:
tuitions_k_closest = y_train[indexes_k_closest]
tuitions_k_closest

array([ 6597,  6530,  8650, 11208,  6704], dtype=int64)

To get the predicted tuition, just take the average value of these tuitions.

In [7]:
np.mean(tuitions_k_closest)

7937.8

Let's compare our prediction to the actual value.  
(In other words, the actual tuition associated with the first row of test data.)

In [8]:
y_test[0]

6550
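
To put a number on the comparison, here is a short sketch that computes the absolute and relative error for this one prediction. The two values are copied from the outputs above; the variable names are only for illustration.

```python
predicted = 7937.8   # mean tuition of the 5 nearest neighbors, from above
actual = 6550        # y_test[0], from above

abs_error = abs(predicted - actual)
rel_error = abs_error / actual

print(round(abs_error, 1))         # 1387.8
print(round(100 * rel_error, 1))   # 21.2 (percent)
```

A single test row says little about overall accuracy; an error measure is more meaningful when averaged over all test rows.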

In this example I showed how to make a prediction for the 
first row of the test data.  The same idea can be used to make 
predictions for all rows of the test data set.
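
One possible way to extend the idea to every test row is sketched below. This is a sketch under assumptions, not the required solution: the function name `knn_predict` and the synthetic `*_demo` arrays are made up for illustration, and the random data stands in for the college data so that the block runs on its own. It applies argsort along each row of the distance matrix and averages the targets of the k nearest training rows.

```python
import numpy as np
from scipy.spatial import distance_matrix

def knn_predict(X_train, y_train, X_test, k=5):
    """Predict each test row's target as the mean target of its k nearest training rows."""
    dm = distance_matrix(X_test, X_train)      # shape (n_test, n_train)
    nearest = np.argsort(dm, axis=1)[:, :k]    # shape (n_test, k): indexes of k closest rows
    return y_train[nearest].mean(axis=1)       # shape (n_test,): one prediction per test row

# Synthetic data for illustration only (assumption: 2 scaled predictors, as above).
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(50, 2))
y_train_demo = rng.normal(size=50)
X_test_demo = rng.normal(size=(5, 2))

preds = knn_predict(X_train_demo, y_train_demo, X_test_demo, k=5)
print(preds.shape)   # (5,)
```

With the notebook's own variables, the call would look like `knn_predict(X_train, y_train, X_test, k=5)`, and its first element should match the prediction computed step by step above.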