# K Nearest Neighbors Project 

Welcome to the KNN Project! This project closely follows the lecture, but uses a different data set. Follow the directions below.
## Import Libraries
**Import pandas, seaborn, and the usual libraries.**

In [42]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Get the Data
**Read the 'KNN_Project_Data' csv file into a dataframe.**

In [22]:
df = pd.read_csv("KNN_Project_Data")

**Check the head of the dataframe.**

In [None]:
df.head()

# EDA

Since this data is artificial, we'll just do a large pairplot with seaborn.

**Use seaborn on the dataframe to create a pairplot with the hue indicated by the TARGET CLASS column.**

In [None]:
sns.pairplot(df, hue='TARGET CLASS', palette='coolwarm')

In [None]:
sns.jointplot(
    x="XVPM", 
    y="GWYH", 
    hue="TARGET CLASS", 
    data=df, 
    kind="scatter", 
    palette="Set1"
)

# Standardize the Variables

Time to standardize the variables.

**Import StandardScaler from scikit-learn.**

In [24]:
from sklearn.preprocessing import StandardScaler

**Create a StandardScaler() object called scaler.**

In [25]:
scaler = StandardScaler()

**Fit scaler to the features.**

In [None]:
scaler.fit(df.drop("TARGET CLASS", axis=1))

**Use the .transform() method to transform the features to a scaled version.**

In [27]:
scaled_features = scaler.transform(df.drop("TARGET CLASS", axis=1))

**Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.**

In [None]:
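# Reuse the original feature names so the scaled columns stay labeled
df_scaled = pd.DataFrame(scaled_features, columns=df.drop("TARGET CLASS", axis=1).columns)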
df_scaled.head()
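
To see what the scaling step does, here is a minimal sketch on a hypothetical toy matrix (`X_toy` is made up for illustration, not the project data). It compares `StandardScaler` against the manual formula: subtract each column's mean, then divide by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix (3 samples, 2 features); not the project data.
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])

# StandardScaler centers each column on its mean and divides by the
# population standard deviation (ddof=0), which is also NumPy's default.
scaled_sklearn = StandardScaler().fit_transform(X_toy)
scaled_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# Both approaches should agree up to floating-point error.
print(np.allclose(scaled_sklearn, scaled_manual))

# After scaling, every column has mean ~0 and standard deviation ~1.
print(scaled_sklearn.mean(axis=0))
print(scaled_sklearn.std(axis=0))
```

Scaling matters for KNN because the algorithm compares distances, so a feature with a large numeric range would otherwise dominate the result.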

# Train Test Split

**Use train_test_split to split your data into a training set and a testing set.**

In [29]:
from sklearn.model_selection import train_test_split
X = df_scaled
y = df["TARGET CLASS"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
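
The cells above fit the scaler on the full data set before splitting, which is fine for an artificial exercise like this one. A common alternative, sketched below on synthetic data (`X_demo`, `y_demo`, and the other names are hypothetical), is to split first and fit the scaler on the training rows only, so that no test-set statistics reach the model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project data: 100 rows, 3 features, binary target.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

# Split first: 30% of the rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler_demo = StandardScaler().fit(X_tr)
X_tr_scaled = scaler_demo.transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this ordering the training columns have mean 0 exactly, while the test columns are only approximately centered, which is expected.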

# Using KNN

**Import KNeighborsClassifier from scikit-learn.**

In [30]:
from sklearn.neighbors import KNeighborsClassifier

**Create a KNN model instance with n_neighbors=1.**

In [31]:
KNN = KNeighborsClassifier(n_neighbors=1)

**Fit this KNN model to the training data.**

In [None]:
KNN.fit(X_train, y_train)
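
For intuition about what a model with `n_neighbors=1` does at prediction time, here is a rough NumPy sketch. The helper `predict_1nn` and the toy points are hypothetical and assume plain Euclidean distance; scikit-learn's own implementation is more general and uses faster neighbor-search structures.

```python
import numpy as np

def predict_1nn(X_ref, y_ref, X_query):
    """Label each query row with the class of its closest reference row."""
    X_ref = np.asarray(X_ref, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_ref = np.asarray(y_ref)
    # Pairwise squared Euclidean distances, shape (n_query, n_ref).
    diffs = X_query[:, None, :] - X_ref[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Index of the nearest reference row for every query row.
    nearest = sq_dists.argmin(axis=1)
    return y_ref[nearest]

# Hypothetical toy data: two points of class 0 near the origin, one of class 1.
ref_points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
ref_labels = np.array([0, 0, 1])
queries = np.array([[0.2, 0.1], [4.0, 4.5]])

print(predict_1nn(ref_points, ref_labels, queries))  # prints [0 1]
```

Because each prediction depends on a single neighbor, a K of 1 tends to be sensitive to noisy points, which is why a larger K is explored later.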

# Predictions and Evaluations
Let's evaluate our KNN model!

**Use the predict method to predict values using your KNN model and X_test.**

In [33]:
prediction = KNN.predict(X_test)

**Create a confusion matrix and classification report.**

In [34]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
print(confusion_matrix(y_test, prediction))

In [None]:
print(classification_report(y_test, prediction))
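
As a reading aid for the two outputs, the sketch below uses a small hypothetical set of labels (`y_true_demo`, `y_pred_demo`) to show how the confusion matrix cells map to the accuracy, precision, and recall figures in the report. For binary labels 0 and 1, scikit-learn lays the matrix out as `[[TN, FP], [FN, TP]]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, for illustration only.
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_demo = np.array([0, 1, 0, 1, 0, 1, 1, 0])

cm = confusion_matrix(y_true_demo, y_pred_demo)
# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision_1 = tp / (tp + fp)      # of the rows predicted as 1, share truly 1
recall_1 = tp / (tp + fn)         # of the rows truly 1, share predicted as 1

print(cm)
print(accuracy, precision_1, recall_1)
```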

# Choosing a K Value
Let's go ahead and use the elbow method to pick a good K Value!

**Create a for loop that trains KNN models with different k values and records each model's performance in a list. The lecture tracks the error rate; the solution below tracks the accuracy rate instead, which is 1 minus the error rate. Refer to the lecture if this step is unclear.**

In [37]:
accuracy_rate = []

for i in range(1, 50):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    prediction_i = knn.predict(X_test)
    accuracy_rate.append(np.mean(prediction_i == y_test))
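
For comparison, the same loop can record the error rate directly. The sketch below is self-contained and runs on synthetic data from `make_classification` (the names `X_syn`, `y_syn`, `X_a`, `X_b` are hypothetical), so its numbers will not match the project data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data, used only to make the sketch runnable.
X_syn, y_syn = make_classification(n_samples=300, n_features=5, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(X_syn, y_syn, test_size=0.3, random_state=42)

k_values = range(1, 20)
error_rate = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_a, y_a)
    # Error rate: fraction of test rows whose prediction differs from the label.
    error_rate.append(np.mean(model.predict(X_b) != y_b))

# Accuracy and error rate always sum to 1 for the same model and test set.
accuracy_from_error = [1 - e for e in error_rate]
print(list(zip(k_values, error_rate))[:3])
```

Plotting either list tells the same story: a dip in the error curve corresponds to a peak in the accuracy curve.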

**Now create the following plot using the information from your for loop.**

In [None]:
plt.figure(figsize=(10,6))
plt.plot(range(1, 50), accuracy_rate, color="blue", linestyle="dashed", marker="o",
         markerfacecolor="red", markersize=10)

plt.title("Accuracy Rate vs K Value")
plt.xlabel('K')
plt.ylabel("Accuracy Rate")
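
Besides reading the plot by eye, the best-scoring K can be looked up programmatically. This is a small sketch with made-up accuracy values (`demo_accuracy` is hypothetical, not output from the project data).

```python
import numpy as np

# Hypothetical accuracy values for K = 1..7, for illustration only.
k_candidates = np.arange(1, 8)
demo_accuracy = np.array([0.72, 0.75, 0.80, 0.83, 0.83, 0.82, 0.81])

# np.argmax returns the first maximum, so ties resolve to the smallest K.
best_index = int(np.argmax(demo_accuracy))
best_k = int(k_candidates[best_index])

print(best_k, demo_accuracy[best_index])
```

In practice the curve is often noisy, so it can be reasonable to prefer a K in a stable, flat region of the plot over the single highest point.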

## Retrain with new K Value

**Retrain your model with the best K value (up to you to decide what you want) and re-do the classification report and the confusion matrix.**

In [39]:
KNN_refined = KNeighborsClassifier(n_neighbors=21)
KNN_refined.fit(X_train, y_train)
prediction = KNN_refined.predict(X_test)

In [None]:
print(confusion_matrix(y_test, prediction))

In [None]:
print(classification_report(y_test, prediction))

# Great Job!