![title](Header__0004_6.png "Header")
___
# Chapter 6 - Other Popular Machine Learning Models
## Segment 3 - Instance-based learning w/ k-Nearest Neighbor
#### Setting up for classification analysis

In [1]:
import numpy as np
import pandas as pd
import scipy

import matplotlib.pyplot as plt
from pylab import rcParams

import urllib

import sklearn
from sklearn.neighbors import KNeighborsClassifier
from sklearn import neighbors
from sklearn import preprocessing

# I'm going to show you how to use this to split your data into test and training sets. 
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [2]:
#Now let's set our plotting parameters for the Jupyter notebook.
np.set_printoptions(precision=4, suppress=True) 
%matplotlib inline
rcParams['figure.figsize'] = 7, 4
# Note: on Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'.
plt.style.use('seaborn-whitegrid')

## Importing your data
We're going to use our antique cars data set, so we'll load it like we have been throughout this course. To use k-nearest neighbor, you need a labeled data set.

We do. We're going to use the AM variable as our target. This variable labels a car as either having an automatic transmission or a manual transmission. 

For this analysis, we're going to use the variables MPG, displacement, HP, and weight as predictive features in our model.

We're going to build a model that predicts a car's transmission type based on the values in these four fields. I picked these variables because each one holds information that's relevant to whether a car has an automatic or a manual transmission, and because each one has distinguishable subgroups.

In [5]:
address = 'C:/Users/piers/Downloads/PFDS v.2 Code/Ch06/06_03/mtcars.csv'

cars = pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

X_prime =  cars[['mpg','disp','hp','wt']].values

# Select the target by name ('am' is the column at position 9).
y = cars['am'].values

In [6]:
# Before we can implement the k-nearest neighbor algorithm we need to scale our variables.
X = preprocessing.scale(X_prime)
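
Scaling matters here because k-nearest neighbor ranks points by distance, so a feature with a large numeric range (such as displacement) would otherwise dominate a feature with a small range (such as weight). As a minimal sketch, the standardization that `preprocessing.scale()` applies by default can be reproduced with NumPy. The small `raw` array below is made up for illustration and is not taken from the mtcars file.

```python
import numpy as np

# Hypothetical two-feature sample (not real mtcars rows): the second
# column has a much larger range than the first.
raw = np.array([[2.5, 160.0],
                [3.0, 110.0],
                [3.5, 360.0],
                [2.0, 225.0]])

# Standardize each column: subtract its mean, then divide by its
# population standard deviation (ddof=0).
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

After this step both columns contribute on a comparable scale when distances are computed.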

That scales our variables. Now I'm going to split the data into test and training sets.

We use the training set for training the model, and the test set for evaluating the model's performance. 

To do this, we'll use the train_test_split() function from scikit-learn's model_selection module. For each array you pass in, train_test_split() returns a training subset and a test subset, so passing X and y gives back four arrays.
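
To make the return value concrete, here is a small sketch on stand-in arrays with the same number of rows as the cars data (32). The names `X_demo` and `y_demo` are placeholders for illustration, not part of the course code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 32 rows, 4 features, and an alternating 0/1 label.
X_demo = np.arange(32 * 4).reshape(32, 4)
y_demo = np.arange(32) % 2

# For each array passed in, the function returns a train subset
# followed by a test subset, in that order.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=17
)

print(X_tr.shape, X_te.shape)  # (21, 4) (11, 4)
print(y_tr.shape, y_te.shape)  # (21,) (11,)
```

With 32 rows and `test_size=0.33`, 11 rows are held out for testing and 21 remain for training.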

In [7]:
# Since the function splits the data randomly, we need to set the seed by passing this argument in, 
# and that will allow you to reproduce the same results as you see here on my computer.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.33, random_state=17)

Now, let's build our model.
## Building and training your model with training data
The first thing we need to do is instantiate a k-nearest neighbor object.

In [8]:
clf = neighbors.KNeighborsClassifier()

clf.fit(X_train, y_train)
print(clf)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')


What we see here are the model's parameters, all printed out.
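
The defaults that matter most in that printout are `n_neighbors=5`, `metric='minkowski'` with `p=2` (that is, Euclidean distance), and `weights='uniform'` (each neighbor gets one equal vote). As a rough sketch of the idea, and not scikit-learn's actual implementation, a prediction can be made by measuring the distance from a new point to every training point and taking a majority vote among the closest ones. The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Predict a label by majority vote among the k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Uniform weights: every neighbor contributes one vote.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy, made-up training set: label 0 clusters near the origin, label 1 near (3, 3).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (3.0, 3.1), (2.9, 3.0), (3.2, 2.8)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (0.1, 0.1), k=3))  # 0
print(knn_predict(points, labels, (3.0, 3.0), k=3))  # 1
```

Because the vote depends entirely on distances, this also shows why the features were scaled before fitting.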

Now let's evaluate the model's predictions against the test data set.

## Evaluating your model's predictions

In [9]:
y_expect = y_test
y_pred = clf.predict(X_test)

# To score the model I'll use scikit-learn's classification_report() function, which is part of the metrics module.
print(metrics.classification_report(y_expect, y_pred))

              precision    recall  f1-score   support

           0       0.71      1.00      0.83         5
           1       1.00      0.67      0.80         6

   micro avg       0.82      0.82      0.82        11
   macro avg       0.86      0.83      0.82        11
weighted avg       0.87      0.82      0.82        11



There we have some model results. Now I'm going to take you into the other screen to show you what those mean.
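
As a quick check on how these columns are computed, the per-class counts implied by the report can be worked through by hand. The report is consistent with all 5 class-0 cars being predicted correctly and 2 of the 6 class-1 cars being predicted as class 0. Those counts are inferred from the printed precision, recall, and support values; they are an assumption, not read from the notebook's actual predictions.

```python
# Counts inferred from the report above (an assumption, not notebook output):
# outer key is the true class, inner key is the predicted class.
confusion = {
    0: {0: 5, 1: 0},  # 5 true class-0 cars, all predicted 0
    1: {0: 2, 1: 4},  # 6 true class-1 cars, 2 predicted 0
}

def precision(cls):
    # Of everything predicted as cls, the fraction that truly is cls.
    predicted = sum(row[cls] for row in confusion.values())
    return confusion[cls][cls] / predicted

def recall(cls):
    # Of everything that truly is cls, the fraction predicted as cls.
    return confusion[cls][cls] / sum(confusion[cls].values())

def f1(cls):
    # Harmonic mean of precision and recall.
    p, r = precision(cls), recall(cls)
    return 2 * p * r / (p + r)

for cls in (0, 1):
    print(cls, round(precision(cls), 2), round(recall(cls), 2), round(f1(cls), 2))
# 0 0.71 1.0 0.83
# 1 1.0 0.67 0.8

accuracy = (confusion[0][0] + confusion[1][1]) / 11
print(round(accuracy, 2))  # 0.82
```

Under these counts, 9 of the 11 test cars are classified correctly, which matches the 0.82 shown in the report.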
