# Supervised Learning and K Nearest Neighbors Exercises

## Introduction

We will be using customer churn data from the telecom industry for the first week's exercises. The data file is called 
`Orange_Telecom_Churn_Data.csv`. We will load this data together, do some preprocessing, and use K-nearest neighbors to predict customer churn based on account characteristics.

In [2]:
from __future__ import print_function
import os
data_path = ['..', '..', 'data']

## Question 1

* Begin by importing the data. Examine the columns and data.
* Notice that the data contains a state, area code, and phone number. Do you think these are good features to use when building a machine learning model? Why or why not? 

We will not be using them, so they can be dropped from the data.

In [3]:
import pandas as pd

In [4]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
data = pd.read_csv(r"F:\Essential Files\Coding Projects\Machine Learning Class 1/Orange_Telecom_Churn_Data.csv")

In [5]:
data.head(1).T

,0
state,KS
account_length,128
area_code,415
phone_number,382-4657
intl_plan,no
voice_mail_plan,yes
number_vmail_messages,25
total_day_minutes,265.1
total_day_calls,110
total_day_charge,45.07


In [6]:
data.shape

(5000, 21)

In [7]:
data.dtypes

state                             object
account_length                     int64
area_code                          int64
phone_number                      object
intl_plan                         object
voice_mail_plan                   object
number_vmail_messages              int64
total_day_minutes                float64
total_day_calls                    int64
total_day_charge                 float64
total_eve_minutes                float64
total_eve_calls                    int64
total_eve_charge                 float64
total_night_minutes              float64
total_night_calls                  int64
total_night_charge               float64
total_intl_minutes               float64
total_intl_calls                   int64
total_intl_charge                float64
number_customer_service_calls      int64
churned                             bool
dtype: object

In [8]:
data.drop(['state', 'area_code', 'phone_number'], axis=1, inplace=True)
data.columns

Index(['account_length', 'intl_plan', 'voice_mail_plan',
       'number_vmail_messages', 'total_day_minutes', 'total_day_calls',
       'total_day_charge', 'total_eve_minutes', 'total_eve_calls',
       'total_eve_charge', 'total_night_minutes', 'total_night_calls',
       'total_night_charge', 'total_intl_minutes', 'total_intl_calls',
       'total_intl_charge', 'number_customer_service_calls', 'churned'],
      dtype='object')

In [9]:
#State, area code, and phone number are poor features for this model. A phone number is unique to each customer, so it
#acts as an identifier and carries no pattern that generalizes to new customers. Area code looks numeric but is really a
#categorical label: KNN would treat the gap between 408 and 415 as a meaningful distance when it is not. State is also
#categorical and would need its own encoding (for example one column per state), which adds many dimensions and would
#make the model far more complex than location is likely to justify.
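One quick way to probe the point about identifier-like columns is to count distinct values per column. The sketch below uses a tiny made-up table (the rows and the `unique_ratio` and `id_like` names are hypothetical, not taken from the churn file) and flags any column where every row has its own value.

```python
import pandas as pd

# Hypothetical toy rows, made up for illustration (not from the churn file)
toy = pd.DataFrame({
    'state': ['KS', 'OH', 'NJ', 'OH'],
    'area_code': [415, 415, 408, 510],
    'phone_number': ['555-0101', '555-0102', '555-0103', '555-0104'],
    'churned': [False, False, True, False],
})

# Fraction of rows that carry a distinct value in each column
unique_ratio = toy.nunique() / len(toy)
print(unique_ratio)

# A ratio of 1.0 means every row has its own value, so the column
# behaves like an ID rather than a feature
id_like = unique_ratio[unique_ratio == 1.0].index.tolist()
print(id_like)
```

On the real data the same check could be run on `data` before the columns are dropped; a high ratio is only a hint, not proof, that a column is an identifier.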

## Question 2

* Notice that some of the columns are categorical data and some are floats. These features will need to be numerically encoded using one of the methods from the lecture.
* Finally, remember from the lecture that K-nearest neighbors requires scaled data. Scale the data using one of the scaling methods discussed in the lecture.

In [10]:
from sklearn.preprocessing import LabelEncoder

# Encode each two-valued column as 0/1; fit_transform refits the encoder for every column
lb = LabelEncoder()
for col in ['intl_plan', 'voice_mail_plan', 'churned']:
    data[col] = lb.fit_transform(data[col])

data.head()

,account_length,intl_plan,voice_mail_plan,number_vmail_messages,total_day_minutes,total_day_calls,total_day_charge,total_eve_minutes,total_eve_calls,total_eve_charge,total_night_minutes,total_night_calls,total_night_charge,total_intl_minutes,total_intl_calls,total_intl_charge,number_customer_service_calls,churned
0,128,0,1,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,0
1,107,0,1,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,0
2,137,0,0,0,243.4,114,41.38,121.2,110,10.3,162.6,104,7.32,12.2,5,3.29,0,0
3,84,1,0,0,299.4,71,50.9,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,0
4,75,1,0,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,0


In [11]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = pd.DataFrame(scaler.fit_transform(data), index = data.index, columns=data.columns)
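To make the scaling step concrete, here is a minimal sketch of what min-max scaling computes, using a small made-up array (the `toy` values are hypothetical). It applies the formula (x - min) / (max - min) column by column and compares the result with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature columns on very different scales
toy = np.array([[100.0, 1.0],
                [200.0, 3.0],
                [400.0, 2.0]])

# Min-max scaling by hand: (x - min) / (max - min), column by column
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
manual = (toy - col_min) / (col_max - col_min)

# The scaler should produce the same numbers
scaled = MinMaxScaler().fit_transform(toy)
print(manual)
print(np.allclose(manual, scaled))
```

After scaling, every column spans 0 to 1, so no single feature dominates the distance calculation that KNN relies on.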

## Question 3

* Separate the feature columns (everything except `churned`) from the label (`churned`). This will create two tables.
* Fit a K-nearest neighbors model with a value of `k=3` to this data and predict the outcome on the same data.

In [12]:
y_data = scaled_data.churned
print(y_data)

0       0.0
1       0.0
2       0.0
3       0.0
4       0.0
5       0.0
6       0.0
7       0.0
8       0.0
9       0.0
10      1.0
11      0.0
12      0.0
13      0.0
14      0.0
15      1.0
16      0.0
17      0.0
18      0.0
19      0.0
20      0.0
21      1.0
22      0.0
23      0.0
24      0.0
25      0.0
26      0.0
27      0.0
28      0.0
29      0.0
       ... 
4970    0.0
4971    0.0
4972    0.0
4973    0.0
4974    0.0
4975    0.0
4976    0.0
4977    0.0
4978    0.0
4979    0.0
4980    1.0
4981    0.0
4982    0.0
4983    0.0
4984    0.0
4985    0.0
4986    0.0
4987    0.0
4988    0.0
4989    0.0
4990    1.0
4991    1.0
4992    0.0
4993    0.0
4994    0.0
4995    0.0
4996    1.0
4997    0.0
4998    0.0
4999    0.0
Name: churned, Length: 5000, dtype: float64


In [13]:
# Select features by name instead of a hard-coded column count
x_data = scaled_data.drop('churned', axis=1)
print(x_data)

      account_length  intl_plan  voice_mail_plan  number_vmail_messages  \
0           0.524793        0.0              1.0               0.480769   
1           0.438017        0.0              1.0               0.500000   
2           0.561983        0.0              0.0               0.000000   
3           0.342975        1.0              0.0               0.000000   
4           0.305785        1.0              0.0               0.000000   
5           0.483471        1.0              0.0               0.000000   
6           0.495868        0.0              1.0               0.461538   
7           0.603306        1.0              0.0               0.000000   
8           0.479339        0.0              0.0               0.000000   
9           0.578512        1.0              1.0               0.711538   
10          0.264463        0.0              0.0               0.000000   
11          0.301653        0.0              0.0               0.000000   
12          0.690083     

In [14]:
from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier(n_neighbors=3)
KNN = KNN.fit(x_data, y_data)
y_predict = KNN.predict(x_data)
print(y_predict)

[0. 0. 0. ... 0. 0. 0.]
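For intuition about what the classifier does when it predicts, the sketch below reproduces a single `k=3` prediction by hand on made-up 2-D points (all values and names here are hypothetical): compute the distance to every training point, take the three closest, and let them vote.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D training points and binary labels
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
              [1.0, 1.0], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([0.8, 0.9])

# Euclidean distance from the query to every training point
dists = np.sqrt(((X - query) ** 2).sum(axis=1))

# Indices of the 3 closest points, then a majority vote over their labels
nearest = np.argsort(dists)[:3]
manual_pred = np.bincount(y[nearest]).argmax()

# scikit-learn's classifier should agree on this toy example
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
sk_pred = knn.predict(query.reshape(1, -1))[0]
print(manual_pred, sk_pred)
```

This is only a sketch of the default uniform-weight, Euclidean case; the library also handles ties, other metrics, and faster neighbor searches.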


## Question 4

Ways to measure error haven't been discussed in class yet, but accuracy is an easy one to understand: it is simply the fraction of labels that were correctly predicted (either true or false).

* Write a function to calculate accuracy using the actual and predicted labels.
* Using the function, calculate the accuracy of this K-nearest neighbors model on the data.

In [38]:
import sys
import numpy as np

# Print full arrays instead of a truncated summary
np.set_printoptions(threshold=sys.maxsize)

def accuracy(actual_churned, predicted_churned):
    """Return the fraction of labels that were predicted correctly."""
    # Compare position by position; np.asarray accepts both Series and arrays
    matches = np.asarray(actual_churned) == np.asarray(predicted_churned)
    return float(matches.mean())


predicted_data = pd.Series(y_predict)
accuracy(y_data, predicted_data)

0.9422
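As a cross-check on a hand-written accuracy calculation, scikit-learn provides `accuracy_score` in `sklearn.metrics`. The sketch below uses made-up labels (hypothetical values, not the churn predictions) where four of five predictions match, and compares the manual fraction with the library result.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 4 of the 5 predictions match
actual = np.array([0.0, 1.0, 0.0, 0.0, 1.0])
predicted = np.array([0.0, 1.0, 0.0, 1.0, 1.0])

# Fraction of positions where the two arrays agree
manual_acc = (actual == predicted).mean()
sk_acc = accuracy_score(actual, predicted)
print(manual_acc, sk_acc)
```

Both values should be 0.8 here; on the exercise data, `accuracy_score(y_data, y_predict)` would be the analogous call.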

In [39]:
print(y_data)

0       0.0
1       0.0
2       0.0
3       0.0
4       0.0
5       0.0
6       0.0
7       0.0
8       0.0
9       0.0
10      1.0
11      0.0
12      0.0
13      0.0
14      0.0
15      1.0
16      0.0
17      0.0
18      0.0
19      0.0
20      0.0
21      1.0
22      0.0
23      0.0
24      0.0
25      0.0
26      0.0
27      0.0
28      0.0
29      0.0
       ... 
4970    0.0
4971    0.0
4972    0.0
4973    0.0
4974    0.0
4975    0.0
4976    0.0
4977    0.0
4978    0.0
4979    0.0
4980    1.0
4981    0.0
4982    0.0
4983    0.0
4984    0.0
4985    0.0
4986    0.0
4987    0.0
4988    0.0
4989    0.0
4990    1.0
4991    1.0
4992    0.0
4993    0.0
4994    0.0
4995    0.0
4996    1.0
4997    0.0
4998    0.0
4999    0.0
Name: churned, Length: 5000, dtype: float64


## Question 5

* Fit the K-nearest neighbors model again with `n_neighbors=3` but this time use distance for the weights. Calculate the accuracy using the function you created above. 
* Fit another K-nearest neighbors model. This time use uniform weights but set the power parameter for the Minkowski distance metric to be 1 (`p=1`) i.e. Manhattan Distance.

When weighted distances are used for part 1 of this question, a value of 1.0 should be returned for the accuracy. Why do you think this is? *Hint:* we are predicting on the data and with KNN the model *is* the data. We will learn how to avoid this pitfall in the next lecture.
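The role of the power parameter `p` can be seen by computing the Minkowski distance directly. The helper below is a hypothetical illustration (not part of scikit-learn) applied to two made-up points: `p=1` gives the Manhattan distance and `p=2` the Euclidean distance.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance of order p between two 1-D points."""
    return (np.abs(a - b) ** p).sum() ** (1.0 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

manhattan = minkowski(a, b, p=1)   # |3| + |4|
euclidean = minkowski(a, b, p=2)   # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```

Because the two metrics rank neighbors differently, switching `p` can change which points count as the nearest and therefore change the predictions.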

In [40]:
KNN = KNeighborsClassifier(n_neighbors=3, weights='distance')
KNN = KNN.fit(x_data, y_data)
y2_predict = KNN.predict(x_data)
print(y2_predict)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0. 1. 0. 0. 0. 0.
 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0.
 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 1. 0.
 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.

In [41]:
accuracy(y_data, y2_predict)

1.0

In [46]:
# Uniform weights with p=1, i.e. Manhattan distance
KNN = KNeighborsClassifier(n_neighbors=3, weights='uniform',
                           p=1, metric='minkowski')
KNN = KNN.fit(x_data, y_data)
y3_predict = KNN.predict(x_data)
print(y3_predict)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.
 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.

In [47]:
accuracy(y_data, y3_predict)

0.9456

In [None]:
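#With weights='distance', each neighbor's vote is weighted by the inverse of its distance. Because we predict on the
#same data the model was fit on, every point is its own nearest neighbor at distance 0, so its own label outweighs the
#other two neighbors and the prediction matches the true label (barring duplicate rows with different labels).
#That is why the accuracy is exactly 1.0: the model is being graded on data it has memorized.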
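The effect can be reproduced on data where the labels are pure noise, which suggests the perfect score says nothing about real predictive power. This sketch uses made-up random data (the `X`, `y`, and `accs` names are hypothetical) and scores both weighting schemes on the training set.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical random data: labels are noise, unrelated to the features
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = rng.randint(0, 2, size=200)

accs = {}
for weights in ('uniform', 'distance'):
    knn = KNeighborsClassifier(n_neighbors=3, weights=weights).fit(X, y)
    # score() reports accuracy; here it is measured on the training data
    accs[weights] = knn.score(X, y)

print(accs)
```

With distance weighting the training accuracy should come out as 1.0 even though there is nothing to learn, while uniform weighting lets the other two neighbors outvote the point itself.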

## Question 6

* Fit a K-nearest neighbors model using values of `k` (`n_neighbors`) ranging from 1 to 20. Use uniform weights (the default). The coefficient for the Minkowski distance (`p`) can be set to either 1 or 2--just be consistent. Store the accuracy and the value of `k` used from each of these fits in a list or dictionary.
* Plot (or view the table of) the `accuracy` vs `k`. What do you notice happens when `k=1`? Why do you think this is? *Hint:* it's for the same reason discussed above.

In [45]:
variable_k = []

for i in range(1, 21):
    KNN = KNeighborsClassifier(n_neighbors=i, weights='uniform', p=2, metric='minkowski')
    KNN = KNN.fit(x_data, y_data)
    y_variable_predict = KNN.predict(x_data)
    variable_to_series = pd.Series(y_variable_predict)
    value = accuracy(y_data, variable_to_series)
    # Accuracies are stored in order, so index i-1 holds the result for k=i
    variable_k.append(value)

    print(variable_k)

[1.0]
[1.0, 0.9292]
[1.0, 0.9292, 0.9422]
[1.0, 0.9292, 0.9422, 0.9154]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.9092]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.9092, 0.9158]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.9092, 0.9158, 0.9076]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.9092, 0.9158, 0.9076, 0.9148]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.9092, 0.9158, 0.9076, 0.9148, 0.905]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.9092, 0.9158, 0.9076, 0.9148, 0.905, 0.9098]
[1.0, 0.9292, 0.9422, 0.9154, 0.9284, 0.9156, 0.9254, 0.9122, 0.9224, 0.

In [59]:
variable_KNN_df = pd.DataFrame(variable_k, columns=['accuracy'])
variable_KNN_df['K-Value'] = np.arange(len(variable_KNN_df)) + 1
print(variable_KNN_df)

    accuracy  K-Value
0     1.0000        1
1     0.9292        2
2     0.9422        3
3     0.9154        4
4     0.9284        5
5     0.9156        6
6     0.9254        7
7     0.9122        8
8     0.9224        9
9     0.9092       10
10    0.9158       11
11    0.9076       12
12    0.9148       13
13    0.9050       14
14    0.9098       15
15    0.9044       16
16    0.9080       17
17    0.9028       18
18    0.9078       19
19    0.9020       20


In [None]:
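#When k=1 the accuracy is 1.0 for the same reason as in Question 5: the model is predicting on the data it was fit on.
#With a single neighbor, the nearest point to each training row is that row itself (distance 0), so the model simply
#returns the row's own label. For k>1 the other neighbors also get a vote and the training accuracy drops below 1.0.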
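The `k=1` behavior can be checked directly by asking a fitted model for the nearest neighbor of each training point. The sketch below uses made-up random data (hypothetical names and values) and the classifier's `kneighbors` method.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical random training data with no duplicate rows
rng = np.random.RandomState(1)
X = rng.rand(50, 3)
y = rng.randint(0, 2, size=50)

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Ask for the single nearest neighbor of every training point
distances, indices = knn.kneighbors(X, n_neighbors=1)

# Each point finds itself: index i maps back to row i at (near) zero distance
finds_itself = bool((indices.ravel() == np.arange(len(X))).all())
train_acc = knn.score(X, y)
print(finds_itself, distances.max(), train_acc)
```

Since every row retrieves itself, the training accuracy is perfect regardless of how noisy the labels are, which is why accuracy measured on the fitting data is not a reliable guide for choosing `k`.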