# k-Nearest-Neighbors

## Success Criteria

 * Explain in one sentence the purpose of supervised machine learning.
 * Identify a few key assumptions that should be made before using machine learning.
 * Give an example of features and a target for a given dataset.
 * Describe the KNN model.
 * Explain what happens to our KNN model as K increases or decreases.
 * Identify the distance metrics available for KNN.
 * Explain the curse of dimensionality.
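
The last criterion, the curse of dimensionality, can be previewed numerically. The block below is a minimal sketch, assuming uniformly random points in a unit cube; the helper `distance_contrast` and its parameters are hypothetical names chosen for illustration. It compares the nearest and farthest distances from a query point as the number of features grows.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """Ratio of nearest to farthest Euclidean distance from one random query point.

    Hypothetical helper: a ratio near 0 means the nearest neighbor is much closer
    than the farthest point; a ratio near 1 means all points are about equally far.
    """
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))  # made-up data in the unit cube
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()

for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 3))
```

As the dimension grows, the ratio moves toward 1, which suggests that "nearest" carries less information when there are many features.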

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler as SS
from sklearn.preprocessing import MinMaxScaler as MMS

In [None]:
# Simulate 200 animals, each with a random weight and a random boolean species label
n = 200
weights = np.random.uniform(high=10, size=n)
species = np.random.uniform(high=10, size=n) > 5
df = pd.DataFrame({'weight': weights, 'species': species})

In [None]:
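# Height is a linear function of weight, with a different line for each species,
# plus uniform noise in [0, 7)
def height(row):
    return row.species*(30 + 3*row.weight) + (1-row.species)*(10 + 12*row.weight) + 7*np.random.random()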


df['height'] = df.apply(height, axis=1)

In [None]:
df.head()

## Plot unstandardized data

In [None]:
fig, ax = plt.subplots(1)
ax.scatter(df['height'][df['species']], df['weight'][df['species']])
ax.scatter(df['height'][~df['species']], df['weight'][~df['species']])
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
ax.set_title('Our Two Species');

## Standardize axes and Plot

In [None]:
# Normalizing or standardizing the features is extremely important with any distance metric,
# especially when the features have different units, such as inches and pounds.
X = SS().fit_transform(df[['height','weight']])

fig, ax = plt.subplots(1)
ax.scatter(*X[df['species']].T)
ax.scatter(*X[~df['species']].T)
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
ax.set_title('Our Two Species Standardized');
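
To see why scaling matters for a distance metric, here is a minimal sketch with made-up measurements in inches and pounds (the numbers and the helper `nearest_to_first` are assumptions for illustration, not part of the dataset above). It finds the nearest neighbor of the first row before and after standardizing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical measurements: column 0 is height in inches, column 1 is weight in pounds
data = np.array([[60.0, 100.0],
                 [61.0, 200.0],
                 [75.0, 105.0],
                 [70.0, 150.0]])

def nearest_to_first(points):
    """Index of the row closest (Euclidean distance) to row 0, excluding row 0 itself."""
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return int(np.argmin(dists)) + 1

raw_neighbor = nearest_to_first(data)
scaled_neighbor = nearest_to_first(StandardScaler().fit_transform(data))
print(raw_neighbor, scaled_neighbor)
```

With these made-up numbers the nearest neighbor of the first row changes from row 2 to row 3 once both columns are standardized, because the raw distance is dominated by the column with the larger spread (weight).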

## Build a KNN Model on standardized data

In [None]:
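# Build the model with k = 1: each prediction takes the label of the single nearest neighbor
model = KNeighborsClassifier(n_neighbors=1)

# Fit the KNN model (fitting mostly just stores the training data, ready to compare distances to each stored point)
model.fit(X, df['species'])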
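
Because fitting mostly stores the data, a 1-nearest-neighbor prediction can be reproduced by hand. The following is a minimal sketch on made-up points, assuming Euclidean distance; `predict_1nn`, `X_train`, `y_train`, and `query` are hypothetical names introduced here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))   # made-up training points
y_train = X_train[:, 0] > 0          # made-up boolean labels

def predict_1nn(X_train, y_train, point):
    """Return the label of the stored training point closest to `point`."""
    dists = np.linalg.norm(X_train - point, axis=1)
    return y_train[np.argmin(dists)]

query = np.array([0.3, -0.2])
manual = predict_1nn(X_train, y_train, query)
sk = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train).predict([query])[0]
print(manual, sk)
```

The hand-written lookup and the scikit-learn classifier should agree on the query point.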

## Predict over a grid of data

In [None]:
# Build a 100 x 100 grid covering the standardized data and predict a class at every grid point
xx = np.linspace(X[:,0].min(), X[:,0].max(), 100)
yy = np.linspace(X[:,1].min(), X[:,1].max(), 100)
# indexing='ij' keeps the same point order as looping over xx, then over yy
grid_x, grid_y = np.meshgrid(xx, yy, indexing='ij')
xs, ys = grid_x.ravel(), grid_y.ravel()
# One vectorized predict call instead of 10,000 single-point calls
predictions = model.predict(np.column_stack([xs, ys]))

In [None]:
plt.scatter(np.array(xs)[predictions], np.array(ys)[predictions])
plt.scatter(np.array(xs)[~predictions], np.array(ys)[~predictions])
plt.scatter(*X[df['species']].T)
plt.scatter(*X[~df['species']].T);
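
The regions plotted above come from k = 1. To get a feel for what changes as k increases, here is a minimal sketch on made-up, overlapping two-class data (`X_demo`, `y_demo`, and the chosen k values are assumptions for illustration). It reports accuracy on the training points themselves.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Made-up two-class data: two overlapping Gaussian clusters of 100 points each
X_demo = np.vstack([rng.normal(0, 1, size=(100, 2)),
                    rng.normal(1.5, 1, size=(100, 2))])
y_demo = np.array([False] * 100 + [True] * 100)

train_accuracy = {}
for k in (1, 5, 25, 101):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_demo, y_demo)
    # Accuracy measured on the same points the model stored
    train_accuracy[k] = knn.score(X_demo, y_demo)
    print(k, train_accuracy[k])
```

With k = 1 every training point is its own nearest neighbor, so the training accuracy is perfect and the decision boundary follows every individual point. Larger k averages over more neighbors, which smooths the boundary and typically lowers the training accuracy.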

### More Typical Example without the visual

The example above is a deliberately simple, synthetic KNN model paired with a visual to aid understanding. It is not, however, a typical use of KNN: usually we have more than two features to predict from. Here is another KNN classification example, using the Iris dataset that is built into scikit-learn.

In [None]:
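# Load the Iris data into a DataFrame, with the target stored in a 'class' column
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['class'])
# Shuffle the rows so that the split below is random
iris_df = df.copy().sample(frac = 1)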
iris_df.head()

In [None]:
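# Treat the first 120 shuffled rows as known data and hold out the last 30 as unknown data.
# .copy() avoids modifying a slice of iris_df when 'class' is popped later.
known_data = iris_df.iloc[:120, :].copy()
unknown_data = iris_df.iloc[120:, :].copy()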

In [None]:
# Separate the target from the features (pop removes 'class' from known_data)
target = known_data.pop('class')
features = known_data.values

In [None]:
# Create your model with all of its hyperparameters (here k = 3)
model = KNeighborsClassifier(n_neighbors=3)

#fit model to your specific data
model.fit(features, target)
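
Besides the number of neighbors, `KNeighborsClassifier` also accepts `metric`, `p`, and `weights` arguments. The block below is a minimal sketch of a few combinations on the Iris data; the even/odd row split and the `candidates` dictionary are assumptions made only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# A few hyperparameter combinations to compare (illustrative, not exhaustive)
candidates = {
    'euclidean, uniform votes': KNeighborsClassifier(n_neighbors=3, metric='euclidean'),
    'manhattan, uniform votes': KNeighborsClassifier(n_neighbors=3, metric='manhattan'),
    'minkowski with p=3': KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=3),
    'euclidean, distance-weighted votes': KNeighborsClassifier(n_neighbors=3, weights='distance'),
}

scores = {}
for name, knn in candidates.items():
    knn.fit(X_iris[::2], y_iris[::2])                        # fit on the even rows
    scores[name] = knn.score(X_iris[1::2], y_iris[1::2])     # score on the odd rows
    print(name, scores[name])
```

Which combination works best depends on the data, so these are usually treated as settings to tune rather than fixed choices.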

In [None]:
#predict from your model on unseen data... More about this soon
predictions = model.predict(unknown_data.iloc[:, :-1].values)

In [None]:
#take a look at the predicted classes
predictions

In [None]:
# The actual classes for the same rows; comparing them with the predictions shows how good our model is
unknown_data.iloc[:, -1].values
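
Rather than comparing the two arrays by eye, the comparison can be reduced to a single number. This is a minimal, self-contained sketch that repeats the 120/30 split on a freshly shuffled copy of Iris (the seed and variable names are assumptions) and computes accuracy, the fraction of held-out rows predicted correctly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# Shuffle the row indices, then keep 120 rows as known data and 30 as unknown data
rng = np.random.default_rng(0)
order = rng.permutation(len(X_iris))
train_idx, test_idx = order[:120], order[120:]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_iris[train_idx], y_iris[train_idx])
predicted = knn.predict(X_iris[test_idx])
actual = y_iris[test_idx]

# accuracy_score is the mean of the element-wise comparison
accuracy = accuracy_score(actual, predicted)
print(accuracy, np.mean(predicted == actual))
```

Both printed values should match, since accuracy is just the share of positions where the predicted and actual classes agree.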