# Other Popular Machine Learning Models
## Instance-based learning with k-Nearest Neighbors
## Setting up for classification analysis

In [1]:
import numpy as np
import pandas as pd
import scipy
import urllib
import sklearn

import matplotlib.pyplot as plt
from pylab import rcParams

from sklearn import neighbors
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [2]:
from sklearn.neighbors import KNeighborsClassifier

In [3]:
np.set_printoptions(precision=4, suppress=True) 
%matplotlib inline
rcParams['figure.figsize'] = 7, 4
# note: on Matplotlib 3.6+ this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')

## Importing your data

In [5]:
address = 'mtcars.csv'

cars = pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

X_prime = cars[['mpg', 'disp', 'hp', 'wt']].values

# target: column 9 is 'am' (transmission: 0 = automatic, 1 = manual)
y = cars.iloc[:,9].values

In [14]:
display(cars.head())
display(cars.shape)

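|   | car_names | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |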


(32, 12)

In [6]:
# standardize the features (zero mean, unit variance) so that no single
# feature dominates the distance calculation
X = preprocessing.scale(X_prime)
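
To make the scaling step concrete, here is a minimal sketch of the standardization that `preprocessing.scale` performs with its default settings: each column has its mean subtracted and is divided by its population standard deviation. The `toy` matrix is a hypothetical example, not data from `mtcars.csv`.

```python
import numpy as np

# Hypothetical toy matrix: 3 rows, 2 features on very different scales.
toy = np.array([[1.0, 100.0],
                [2.0, 200.0],
                [3.0, 300.0]])

# Standardize column by column: subtract the mean, then divide by the
# population standard deviation (ddof=0, NumPy's default for std).
toy_scaled = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(toy_scaled.mean(axis=0))  # each column mean is (approximately) 0
print(toy_scaled.std(axis=0))   # each column standard deviation is 1
```

After scaling, both columns contribute comparably to a Euclidean distance, which matters because k-nearest neighbors decides class membership purely from distances.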

In [7]:
# training set to fit the model & test set to evaluate the model's performance
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=17)

## Building and training your model with training data

In [8]:
# instantiate classifier
clf = neighbors.KNeighborsClassifier()

# train our model
clf.fit(X_train, y_train)
print(clf)

KNeighborsClassifier()
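
With no arguments, `KNeighborsClassifier` uses 5 neighbors, uniform weights, and Euclidean distance. The idea behind the prediction step can be sketched from scratch as a majority vote among the closest training points. This is an illustrative simplification, not scikit-learn's implementation: `knn_predict` and the toy arrays are hypothetical names, and tie-breaking may differ from the library.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Return the majority label among the k training points closest to x_query."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.1, 0.1]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([5.0, 5.1]), k=3))  # 1
```

Because the model simply stores the training data and defers all work to prediction time, `fit` is cheap and `predict` is where the distance computations happen.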


## Evaluating your model's predictions

In [9]:
y_pred = clf.predict(X_test)

# the expected values are the true labels of the test set
y_expect = y_test

# use classification report function to evaluate how well the model performed
print(metrics.classification_report(y_expect, y_pred))

              precision    recall  f1-score   support

           0       0.80      1.00      0.89         4
           1       1.00      0.67      0.80         3

    accuracy                           0.86         7
   macro avg       0.90      0.83      0.84         7
weighted avg       0.89      0.86      0.85         7
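
The numbers in the report can be reproduced by hand. The counts below are inferred from the report itself (support of 4 and 3, recall of 1.00 and 0.67), not read from the notebook's data, and the variable names are illustrative.

```python
# Counts inferred from the report: all 4 class-0 cars were predicted 0,
# and 2 of the 3 class-1 cars were predicted 1.
tp = 2  # actual 1, predicted 1
fn = 1  # actual 1, predicted 0
fp = 0  # actual 0, predicted 1
tn = 4  # actual 0, predicted 0

precision_1 = tp / (tp + fp)   # 2 / 2
recall_1 = tp / (tp + fn)      # 2 / 3
precision_0 = tn / (tn + fn)   # 4 / 5
recall_0 = tn / (tn + fp)      # 4 / 4
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 6 / 7
macro_recall = (recall_0 + recall_1) / 2     # unweighted mean over classes

print(round(precision_0, 2), round(recall_0, 2))  # 0.8 1.0
print(round(precision_1, 2), round(recall_1, 2))  # 1.0 0.67
print(round(accuracy, 2), round(macro_recall, 2)) # 0.86 0.83
```

So the single misclassified car is a class-1 car that the model predicted as class 0, which lowers recall for class 1 and precision for class 0.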



In [None]:
# recall is a measure of your model's completeness
# of all the points that are actually labeled 1, only 67% were correctly predicted as 1
# averaged over both classes without weighting (macro avg), recall is 83%

# high precision + low recall = few points are predicted as a class, but most of those predictions are correct
# in other words, high precision but low completeness

### Three common metrics for assessing the quality of a classification model

1. Precision: Percentage of correct positive predictions relative to total positive predictions.

2. Recall: Percentage of correct positive predictions relative to total actual positives.

3. F1 Score: The harmonic mean of precision and recall. The closer to 1, the better the model.

F1 Score: 2 * (Precision * Recall) / (Precision + Recall)

Precision: Out of all the cars that the model predicted would be labeled 1, 100% actually were.
Recall: Out of all the cars that are actually labeled 1, the model correctly identified only 67%.
    
F1 Score: This value is calculated as:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
F1 Score = 2 * (1 * .67) / (1 + .67)
F1 Score = 0.8
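
The same arithmetic as a short sketch, applied to both classes using the precision and recall values from the report. The helper name `f1_from` is hypothetical, and the recall of class 1 is entered as 2/3 rather than the rounded 0.67.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)

f1_class_1 = f1_from(1.0, 2 / 3)  # precision 1.00, recall 0.67
f1_class_0 = f1_from(0.8, 1.0)    # precision 0.80, recall 1.00

print(round(f1_class_1, 2))  # 0.8
print(round(f1_class_0, 2))  # 0.89
```

Both results match the f1-score column of the classification report.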
    
Since this value is reasonably close to 1, it suggests that the model does a good job of predicting whether or not a car is labeled 1, although with only 7 cars in the test set this estimate is rough.

Support: 
These values simply tell us how many cars belonged to each class in the test dataset. 
The test dataset holds 7 cars (0.2 * 32 = 6.4, rounded up to 7): 4 cars labeled 0 and 3 cars labeled 1.
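
The size of the test set follows from how a fractional `test_size` is handled: `train_test_split` rounds the test count up, and the remaining rows go to training. A minimal sketch of that arithmetic, using illustrative variable names:

```python
import math

n_samples = 32   # rows in the cars data
test_size = 0.2  # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # 6.4 rounds up
n_train = n_samples - n_test

print(n_test, n_train)  # 7 25
```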