# K Nearest Neighbors with Python

## Import Libraries



In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

## The Data

Let's start by reading the Churn_Modelling.csv file into a pandas DataFrame.

In [None]:
df = pd.read_csv('Churn_Modelling.csv')

In [None]:
df.head()

## Data Selection

In [None]:
# Keep the target (CreditScore) plus four numeric feature columns
df = df[['CreditScore', 'Age', 'Balance', 'EstimatedSalary', 'Tenure']]
df.head()

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()

In [None]:
scaler.fit(df.drop('CreditScore',axis=1))

In [None]:
scaled_features = scaler.transform(df.drop('CreditScore',axis=1))

In [None]:
df_feat = pd.DataFrame(scaled_features,columns=df.columns[1:])
df_feat.head()
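StandardScaler rescales each feature column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation (ddof=0). The sketch below checks that on a small made-up array; the values are arbitrary and are not taken from Churn_Modelling.csv.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up feature matrix: two columns on very different scales
X = np.array([[20.0, 1000.0],
              [35.0, 50000.0],
              [50.0, 120000.0],
              [65.0, 0.0]])

scaled = StandardScaler().fit_transform(X)

# Same transformation by hand: subtract column means, divide by population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.mean(axis=0))          # approximately 0 for each column
print(scaled.std(axis=0))           # approximately 1 for each column
```

Without this step, Balance and EstimatedSalary (tens of thousands) would dominate the distance calculation over Age and Tenure.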

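In [None]:
# Alternative scaling, not used below: MinMaxScaler maps each feature to [0, 1].
# It must run after the train/test split. Fit the scaler on the training set only
# and reuse it on the test set, so no test-set statistics leak into the scaling.
# from sklearn.preprocessing import MinMaxScaler
# mm_scaler = MinMaxScaler(feature_range=(0, 1))
# X_train = mm_scaler.fit_transform(X_train)
# X_test = mm_scaler.transform(X_test)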
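Calling fit_transform separately on the test set fits a second scaler to the test data, so training and test features end up scaled inconsistently. The same concern applies, more mildly, to the StandardScaler above, which is fitted on the full dataset before the split. One common way to avoid both is scikit-learn's Pipeline, which fits the scaler on the training split and reuses it at prediction time. This is a sketch only: make_regression generates synthetic stand-in data, and MinMaxScaler with n_neighbors=3 simply mirrors the notebook's choices.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the four feature columns and a numeric target
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=0)

# fit() learns the MinMaxScaler ranges from Xd_train only; predict() reuses
# that fitted scaling on Xd_test before querying the neighbors
model = make_pipeline(MinMaxScaler(feature_range=(0, 1)),
                      KNeighborsRegressor(n_neighbors=3, p=2))
model.fit(Xd_train, yd_train)
demo_pred = model.predict(Xd_test)

# The scaler's learned minima come from the training split alone
fitted_scaler = model.named_steps['minmaxscaler']
print((fitted_scaler.data_min_ == Xd_train.min(axis=0)).all())  # True
print(demo_pred.shape)                                            # (60,)
```

The demo variables use distinct names (Xd_train, demo_pred, fitted_scaler) so running the cell does not overwrite the notebook's X_train, pred, or scaler.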

## Train Test Split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['CreditScore'],
                                                    test_size=0.30)

In [None]:
# Classification variant: needs the 'Exited' column, which the column selection above removes
#X_train, X_test, y_train, y_test = train_test_split(df.drop('Exited',axis=1),df['Exited'],
#                                                    test_size=0.30)

In [None]:
X_train[0:5,0:3]

## Using KNN

Remember that we are building a regression model that predicts a customer's CreditScore from Age, Balance, EstimatedSalary and Tenure. We'll start with k=3.
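KNeighborsRegressor, with its default weights='uniform', predicts the plain average of the target values of the k closest training points. The sketch below reproduces that by hand on a tiny made-up one-feature dataset and compares the result with scikit-learn; the data and the names X_toy, y_toy, and toy_knn are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up training set: one feature, one numeric target
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_toy = np.array([100.0, 200.0, 300.0, 1000.0, 1100.0])
x_new = np.array([[2.2]])
k = 3

# Manual KNN regression: Euclidean distances, pick the k nearest, average their targets
dist = np.sqrt(((X_toy - x_new) ** 2).sum(axis=1))
nearest = np.argsort(dist)[:k]
manual_pred = y_toy[nearest].mean()

toy_knn = KNeighborsRegressor(n_neighbors=k, p=2).fit(X_toy, y_toy)
sk_pred = toy_knn.predict(x_new)[0]

print(manual_pred, sk_pred)  # 200.0 200.0
```

The three nearest points to 2.2 are 2.0, 3.0 and 1.0, so the prediction is (200 + 300 + 100) / 3 = 200.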

In [None]:
from sklearn.neighbors import KNeighborsRegressor

In [None]:
knn = KNeighborsRegressor(n_neighbors=3,p=2)
print(knn)
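The p argument sets the exponent of the Minkowski distance used to find neighbors, d(a, b) = (sum_i |a_i - b_i|^p)^(1/p): p=2 gives ordinary Euclidean distance and p=1 gives Manhattan distance. The helper minkowski_distance below is written for illustration only; it is not a scikit-learn function.

```python
import numpy as np

def minkowski_distance(u, v, p):
    """Illustrative Minkowski distance: (sum |u_i - v_i|^p)^(1/p)."""
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

print(minkowski_distance(a, b, 2))  # 5.0  (Euclidean: sqrt(3**2 + 4**2))
print(minkowski_distance(a, b, 1))  # 7.0  (Manhattan: |3| + |4|)
```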

In [None]:
knn.fit(X_train,y_train)

In [None]:
pred = knn.predict(X_test)

In [None]:
pred

In [None]:
y_test.iloc[0:10]

## Predictions and Evaluations

Let's evaluate our KNN model!

In [None]:
from sklearn.metrics import mean_squared_error 
from math import sqrt

In [None]:
print("RMSE with k=3 is ",sqrt(mean_squared_error(y_test,pred)))
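RMSE is the square root of the mean squared difference between true and predicted values, RMSE = sqrt((1/n) * sum_i (y_i - y_hat_i)^2), so it is expressed in the same units as the target (credit-score points here). A small check with made-up numbers shows the manual formula matches the sqrt(mean_squared_error(...)) call used above:

```python
import numpy as np
from math import sqrt
from sklearn.metrics import mean_squared_error

# Made-up true and predicted credit scores
y_true_demo = np.array([600.0, 650.0, 700.0])
y_pred_demo = np.array([610.0, 640.0, 720.0])

# Errors are -10, 10, -20 -> squared 100, 100, 400 -> mean 200
manual_rmse = np.sqrt(np.mean((y_true_demo - y_pred_demo) ** 2))
sk_rmse = sqrt(mean_squared_error(y_true_demo, y_pred_demo))

print(manual_rmse, sk_rmse)  # both approximately 14.142
```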

## Choosing a K Value

Let's use the elbow method to pick a good K value: compute the test-set RMSE for K = 1 to 40 and look for the point where the curve stops improving noticeably.

In [None]:
rmse_val = []  # RMSE for each K
for K in range(1, 41):
    knn = KNeighborsRegressor(n_neighbors=K, p=2)
    knn.fit(X_train, y_train)  # fit the model
    pred = knn.predict(X_test)  # predict on the test set
    error = sqrt(mean_squared_error(y_test, pred))  # compute RMSE
    rmse_val.append(error)  # store the RMSE
    print('RMSE value for k =', K, 'is:', error)

In [None]:
curve = pd.DataFrame({'RMSE': rmse_val}, index=range(1, 41))  # elbow curve, indexed by K
curve.plot(xlabel='K', ylabel='RMSE')

Here we can see that after around K=10 the RMSE levels off and changes little as K increases. Let's retrain the model with K=10 and compare its RMSE with the k=3 result.

In [None]:
# Retrain with K=10 and compare with the original k=3 model
K=10
knn = KNeighborsRegressor(n_neighbors=K,p=2)

knn.fit(X_train,y_train)
pred = knn.predict(X_test)

print('WITH K={} \n'.format(K))
print(sqrt(mean_squared_error(y_test,pred)))
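The elbow curve above is computed on the test set, so choosing K from it lets the test data influence model selection. An alternative, not used in the original notebook, is to choose K by cross-validation on the training data only, for example with scikit-learn's GridSearchCV, and touch the test split once at the end. The sketch uses synthetic make_regression data as a stand-in for the churn features; the K range 1 to 40 mirrors the loop above, and 5 folds is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (not the churn dataset)
X_cv, y_cv = make_regression(n_samples=300, n_features=4, noise=15.0, random_state=1)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_cv, y_cv, test_size=0.30, random_state=1)

# 5-fold CV over K = 1..40 on the training split; scikit-learn maximizes
# scores, so RMSE is exposed as a negated metric
search = GridSearchCV(KNeighborsRegressor(p=2),
                      param_grid={'n_neighbors': list(range(1, 41))},
                      scoring='neg_root_mean_squared_error',
                      cv=5)
search.fit(Xc_train, yc_train)

best_k = search.best_params_['n_neighbors']
cv_rmse = -search.best_score_
# The held-out test split is used once, after K has been chosen
test_rmse = np.sqrt(np.mean((yc_test - search.predict(Xc_test)) ** 2))
print('best K:', best_k, 'CV RMSE:', round(cv_rmse, 2), 'test RMSE:', round(test_rmse, 2))
```

With the default refit=True, search.predict uses a model refitted on the whole training split with the selected K.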