First: Develop a predictive model that predicts lawn mower ownership (use the RidingMowers.csv data) using SVM classification and the three kernels demonstrated in class today. (We have yet to cover things like cross-validation and grid search, so use the class material covered so far to address this problem.) Display the results of each of these models (accuracy, precision, recall, and F1) and save the 'winning' model to a pickle file. (NOTE: Since this is a 'real' model you plan to deploy, be sure to include an appropriate train/test split in your modeling process.)

Second: Create an application that asks the user for an income and lot size and reports whether the property is predicted to own a lawn mower (also include the probability of this prediction). You can choose a text-based application or Web 

# SVM Demonstration

In this tutorial we demonstrate how to use the `SVC` class in `scikit-learn` to fit support vector machine (SVM) classifiers to a dataset.
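
For reference, the three kernels used in this notebook (linear, RBF, and polynomial) can be written as follows. This is a sketch in scikit-learn's parameterization, where $\gamma$, $r$, and $d$ correspond to the `gamma`, `coef0`, and `degree` arguments of `SVC`; the symbols $x$ and $x'$ for two feature vectors are notation introduced here, not taken from the class material.

$$
\begin{aligned}
K_{\text{linear}}(x, x') &= x^\top x' \\
K_{\text{rbf}}(x, x') &= \exp\left(-\gamma \,\lVert x - x' \rVert^2\right) \\
K_{\text{poly}}(x, x') &= \left(\gamma \, x^\top x' + r\right)^{d}
\end{aligned}
$$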

NOTE: Because this model is meant to be deployed, we split the data into training and test sets, fit each model on the training set, and report its metrics on the held-out test set.

## 1. Setup

Import modules

In [17]:
import pandas as pd
from sklearn.svm import SVC
from matplotlib import pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

np.random.seed(25)

Load data (it's already cleaned and preprocessed)

In [2]:
# Uncomment the following snippet of code to debug problems with finding the .csv file path
# This snippet of code will exit the program and print the current working directory.
#import os
#print(os.getcwd())

In [3]:
df = pd.read_csv('C:/Users/simra/Downloads/RidingMowers.csv') 
df.head(3)

|   | Income | Lot_Size | Ownership |
|---|---|---|---|
| 0 | 60.0 | 18.4 | Owner |
| 1 | 85.5 | 16.8 | Owner |
| 2 | 64.8 | 21.6 | Owner |


In [19]:
# split the data into training and test sets
train_df, test_df = train_test_split(df, test_size=0.3)

# to reduce repetition in later code, create variables to represent the columns
# that are our predictors and target
target = 'Ownership'
predictors = list(df.columns)
predictors.remove(target)

In [20]:
train_X = train_df[predictors]
train_y = train_df[target] # train_y is now a Series object
test_X = test_df[predictors]
test_y = test_df[target] # test_y is now a Series object


## 3. Model the data

First, let's create a dataframe to load the model performance metrics into.

In [21]:
performance = pd.DataFrame({"model": [], "Accuracy": [], "Precision": [], "Recall": [], "F1": []})

### 3.1 Fit an SVM classification model using a linear kernel

In [33]:
svm_lin_model = SVC(kernel="linear", probability=True)
_ = svm_lin_model.fit(train_X, np.ravel(train_y))

In [34]:
model_preds = svm_lin_model.predict(test_X)
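c_matrix = confusion_matrix(test_y, model_preds)
# confusion_matrix sorts the class labels, so index 1 is the label that sorts last,
# which is treated here as the positive class
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]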
performance = pd.concat([performance, pd.DataFrame({'model':"linear svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
performance

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | linear svm | 0.75 | 0.8 | 0.8 | 0.8 |
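
The four metrics are computed the same way for every kernel, so the arithmetic can be factored into a small helper. The sketch below is one possible way to do that; the function name `classification_metrics` and the choice to return `0.0` when a denominator is zero are assumptions made for illustration. The example counts are not printed anywhere in the notebook; they are simply one set of counts consistent with the linear SVM row above.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Assumption: report 0.0 rather than raising when a denominator is zero
        return numerator / denominator if denominator else 0.0

    return {
        "Accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "Precision": safe_div(tp, tp + fp),
        "Recall": safe_div(tp, tp + fn),
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Illustrative counts (assumed), consistent with the linear SVM metrics
metrics = classification_metrics(tp=4, tn=2, fp=1, fn=1)
print(metrics)
```

The returned dictionary has the same keys as the metric columns of the `performance` dataframe, so it could be combined with a `model` name to build each new row.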


### 3.2 Fit an SVM classification model using an RBF kernel

In [24]:
svm_rbf_model = SVC(kernel="rbf", C=10, gamma='scale')
_ = svm_rbf_model.fit(train_X, np.ravel(train_y))

In [25]:
model_preds = svm_rbf_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"rbf svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
performance

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | linear svm | 0.75 | 0.8 | 0.8 | 0.8 |
| 0 | rbf svm | 0.5 | 0.666667 | 0.4 | 0.5 |


### 3.3 Fit an SVM classification model using a polynomial kernel

In [26]:
svm_poly_model = SVC(kernel="poly", degree=3, coef0=1, C=10)
_ = svm_poly_model.fit(train_X, np.ravel(train_y))

In [27]:
model_preds = svm_poly_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"poly svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])


## 4. Summary

In [29]:
performance

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | linear svm | 0.75 | 0.8 | 0.8 | 0.8 |
| 0 | rbf svm | 0.5 | 0.666667 | 0.4 | 0.5 |
| 0 | poly svm | 0.625 | 1.0 | 0.4 | 0.571429 |
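
The notebook does not state how the 'winning' model is chosen. One plausible rule, sketched below, is to take the highest F1 and break ties on accuracy; that rule, and the names `results` and `winner`, are assumptions for illustration. The numbers are copied from the summary table.

```python
# Metric values copied from the summary table
results = [
    {"model": "linear svm", "Accuracy": 0.75, "F1": 0.8},
    {"model": "rbf svm", "Accuracy": 0.5, "F1": 0.5},
    {"model": "poly svm", "Accuracy": 0.625, "F1": 0.571429},
]

# Assumed selection rule: highest F1 first, then highest accuracy as a tie-breaker
winner = max(results, key=lambda row: (row["F1"], row["Accuracy"]))
print(winner["model"])
```

Under this rule the linear kernel is selected, which matches the model saved in the next cell.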


In [35]:
import pickle

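# save the winning model (linear kernel: highest accuracy and F1 on the test set)
with open('C:/Users/simra/Downloads/svm_linear_ownership.pkl', "wb") as model_file:
    pickle.dump(svm_lin_model, model_file)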
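
The second task (an application that takes an income and a lot size and reports a prediction with its probability) could be sketched as the text-based program below. The function names `predict_ownership` and `main`, the prompts, and the default relative file name are hypothetical. The sketch assumes the pickled model was fitted with `probability=True` on the columns `Income` and `Lot_Size`, in that order, and that the user enters values in the same units as the training data.

```python
import pickle


def predict_ownership(model, income, lot_size):
    """Return (predicted_label, probability_of_that_label) for one property."""
    # Feature order must match training: Income first, then Lot_Size.
    # Passing a plain list to a model fitted on a DataFrame works, but
    # scikit-learn may warn about missing feature names.
    features = [[income, lot_size]]
    label = model.predict(features)[0]
    probabilities = model.predict_proba(features)[0]
    # Look up the probability of the predicted class. For SVC, predict() and
    # predict_proba() can occasionally disagree, so this may be below 0.5.
    class_index = list(model.classes_).index(label)
    return label, float(probabilities[class_index])


def main(model_path="svm_linear_ownership.pkl"):
    """Prompt for income and lot size, then print the prediction."""
    with open(model_path, "rb") as model_file:
        model = pickle.load(model_file)
    income = float(input("Income: "))
    lot_size = float(input("Lot size: "))
    label, probability = predict_ownership(model, income, lot_size)
    print(f"Predicted ownership: {label} (probability {probability:.1%})")
```

Calling `main()` with the path used when saving the model runs the prompt loop once. Only load pickle files you created yourself, since unpickling executes code stored in the file.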
