# In this assignment we are going to work with SVMs

*   Learn how to use SVMs for multi-class problems
*   Experiment with different kernels and see how they affect performance
*   Interpret the performance measures using confusion matrices






### Let's start by first importing packages





In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score




### Next load and format data

In [2]:
# Load data

df = pd.read_csv('loans.csv')
df.head()

Unnamed: 0,id_number,loan_amount,lender_count,status,funded_date,funded_amount,repayment_term,location_country_code,sector,description,use
0,736066,4825,60,funded,2014-08-03T17:51:50Z,4825,8,BJ,Retail,,
1,743090,975,34,funded,2014-08-18T09:10:54Z,975,12,BJ,Food,,
2,743120,950,25,funded,2014-08-09T17:46:35Z,950,14,BJ,Services,,
3,743121,825,28,funded,2014-08-24T17:00:38Z,825,14,BJ,Retail,,
4,743124,725,21,funded,2014-08-25T03:24:54Z,725,13,BJ,Retail,,


# Question 1 (3 points): Setting up X and y

We are going to build a classifier to predict the status of a loan, that is, whether it was funded or not. We are going to use the 'status' column as our y variable. In this data, 'status' takes three values (expired, funded, fundraising), so this is a multi-class problem.

The rest of the columns are going to be our explanatory variables.
However, we will drop some columns from the X dataframe because we don't want to deal with discrete (categorical or text) values in this assignment.

In [3]:
y_column = 'status'
y = df[y_column]


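# YOUR CODE TO SET UP THE X DATAFRAME GOES HERE. YOUR X SHOULD LOOK LIKE THE OUTPUT BELOW.

# Drop the target ('status'), the identifier, and the date, categorical,
# and free-text columns so that only numeric features remain.
X = df.drop(['id_number', 'status', 'funded_date', 'location_country_code',
             'sector', 'description', 'use'], axis=1)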
X.head()

Unnamed: 0,loan_amount,lender_count,funded_amount,repayment_term
0,4825,60,4825,8
1,975,34,975,12
2,950,25,950,14
3,825,28,825,14
4,725,21,725,13
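
As an alternative to listing every dropped column by name, the numeric columns could be selected by dtype. The sketch below is only an illustration: `toy` is a hypothetical frame built from the first three rows shown earlier (the empty `description` and `use` columns are left out), and `id_number` still has to be dropped by hand because it is numeric but is an identifier, not a feature.

```python
import pandas as pd

# Hypothetical stand-in for df, copied from the first rows of df.head() above.
toy = pd.DataFrame({
    "id_number": [736066, 743090, 743120],
    "loan_amount": [4825, 975, 950],
    "lender_count": [60, 34, 25],
    "status": ["funded", "funded", "funded"],
    "funded_date": ["2014-08-03T17:51:50Z", "2014-08-18T09:10:54Z",
                    "2014-08-09T17:46:35Z"],
    "funded_amount": [4825, 975, 950],
    "repayment_term": [8, 12, 14],
    "location_country_code": ["BJ", "BJ", "BJ"],
    "sector": ["Retail", "Food", "Services"],
})

# Keep numeric columns only, then remove the identifier column.
X_toy = toy.select_dtypes(include="number").drop(columns=["id_number"])
print(list(X_toy.columns))
```

On the real data it would be worth checking the result against the explicit drop list, since a column that happens to be entirely empty may be read in as numeric.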


In [4]:
# Split data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
y_test

2868    funded
5924    funded
3764    funded
4144    funded
2780    funded
         ...  
5926    funded
4216    funded
1351    funded
4603    funded
5668    funded
Name: status, Length: 1204, dtype: object
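
The truncated printout above shows only `funded` labels, which hints at how unbalanced the classes are. A quick way to check is `value_counts`. The sketch below does not read `loans.csv`; `y_toy` is a hypothetical series rebuilt from the per-class support counts (22, 1022, 160) reported in the classification reports in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for y_test, rebuilt from the support counts
# in the classification reports (expired=22, funded=1022, fundraising=160).
y_toy = pd.Series(
    ["expired"] * 22 + ["funded"] * 1022 + ["fundraising"] * 160,
    name="status",
)

counts = y_toy.value_counts()                         # absolute counts per class
shares = y_toy.value_counts(normalize=True).round(3)  # fraction of each class

print(counts)
print(shares)
```

With roughly 85% of the test labels in one class, accuracy alone can look high even when the minority classes are predicted poorly, which is why the per-class rows of the reports matter.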

# Question 2 (3 points): Running SVM with different kernels

Define and fit four classifiers using the `SVC` class from `sklearn.svm` on X_train and y_train:

1.   Linear kernel
2.   RBF
3.   Polynomial
4.   Sigmoid

Next, use each of the fitted models to predict on X_test.

And finally, output the accuracy of each model.
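
The three steps above (fit, predict, score) can be sketched as a single loop over the kernel names. This is a minimal illustration, not the required solution: it runs on synthetic data from `make_classification` instead of `loans.csv`, and the names `X_demo`, `y_demo`, and `accuracies` are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic three-class data with four numeric features (an assumption,
# chosen only to mirror the shape of the loans problem).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

accuracies = {}
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    model = SVC(kernel=kernel)           # default hyperparameters
    model.fit(X_tr, y_tr)                # fit on the training split
    y_hat = model.predict(X_te)          # predict on the held-out split
    accuracies[kernel] = accuracy_score(y_te, y_hat)

for kernel, acc in accuracies.items():
    print(f"{kernel}: {acc:.3f}")
```

The cells below do the same thing one kernel at a time, which makes it easier to inspect each confusion matrix separately.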









### Linear Kernel 

In [5]:


svclassifierlin = SVC(kernel='linear')
svclassifierlin.fit(X_train, y_train)




SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [6]:
y_pred1 = svclassifierlin.predict(X_test)

In [7]:
from sklearn.metrics import classification_report, confusion_matrix

# Rows of the confusion matrix are true classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred1))
print(classification_report(y_test, y_pred1))

[[   3    0   19]
 [   0 1022    0]
 [   1    0  159]]
              precision    recall  f1-score   support

     expired       0.75      0.14      0.23        22
      funded       1.00      1.00      1.00      1022
 fundraising       0.89      0.99      0.94       160

    accuracy                           0.98      1204
   macro avg       0.88      0.71      0.72      1204
weighted avg       0.98      0.98      0.98      1204
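
To see where the numbers in the report come from, the accuracy, precision, and recall can be recomputed directly from the confusion matrix printed above. The sketch below copies that matrix by hand (rows are true classes, columns are predicted classes, in the order expired, funded, fundraising); the variable names are chosen for the example.

```python
import numpy as np

labels = ["expired", "funded", "fundraising"]

# Confusion matrix of the linear-kernel model, copied from the output above.
cm = np.array([
    [3,    0,  19],
    [0, 1022,   0],
    [1,    0, 159],
])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # correct / true count, per row
precision = np.diag(cm) / cm.sum(axis=0)  # correct / predicted count, per column

print(f"accuracy: {accuracy:.3f}")
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

For example, only 3 of the 22 expired loans are recovered (recall 0.14), while 3 of the 4 loans predicted as expired are correct (precision 0.75).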



In [8]:
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred1))

Accuracy: 0.983


### RBF Kernel

In [9]:
svclassifierrbf = SVC(kernel='rbf')
svclassifierrbf.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [10]:
y_pred2 = svclassifierrbf.predict(X_test)

In [11]:
print(confusion_matrix(y_test, y_pred2))
print(classification_report(y_test, y_pred2))

[[   0    4   18]
 [   0 1022    0]
 [   0   22  138]]
              precision    recall  f1-score   support

     expired       0.00      0.00      0.00        22
      funded       0.98      1.00      0.99      1022
 fundraising       0.88      0.86      0.87       160

    accuracy                           0.96      1204
   macro avg       0.62      0.62      0.62      1204
weighted avg       0.95      0.96      0.95      1204



  _warn_prf(average, modifier, msg_start, len(result))
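
The `_warn_prf` line above appears to be the tail of a scikit-learn warning about an undefined metric: the RBF model never predicts `expired` (the first column of its confusion matrix is all zeros), so precision for that class is 0/0 and is reported as 0.00. One way to make that choice explicit is the `zero_division` argument of the metric functions. The sketch below uses four made-up labels, not the loan data.

```python
from sklearn.metrics import precision_score

labels = ["expired", "funded", "fundraising"]

# Made-up example in which "expired" is never predicted.
y_true_toy = ["expired", "funded", "fundraising", "funded"]
y_pred_toy = ["fundraising", "funded", "fundraising", "funded"]

# zero_division=0 reports 0.0 for a class with no predicted samples.
per_class = precision_score(
    y_true_toy, y_pred_toy, labels=labels, average=None, zero_division=0
)

for name, p in zip(labels, per_class):
    print(f"{name}: precision={p:.2f}")
```

`classification_report` accepts the same `zero_division` argument in recent scikit-learn versions.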


In [12]:
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred2))


Accuracy: 0.963


### Polynomial Kernel

In [13]:
svclassifierpoly = SVC(kernel='poly', degree=5)
svclassifierpoly.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=5, gamma='scale', kernel='poly',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [14]:
y_pred3 = svclassifierpoly.predict(X_test)

In [15]:
print(confusion_matrix(y_test, y_pred3))
print(classification_report(y_test, y_pred3))

[[   0    9   13]
 [   0 1022    0]
 [   1  137   22]]
              precision    recall  f1-score   support

     expired       0.00      0.00      0.00        22
      funded       0.88      1.00      0.93      1022
 fundraising       0.63      0.14      0.23       160

    accuracy                           0.87      1204
   macro avg       0.50      0.38      0.39      1204
weighted avg       0.83      0.87      0.82      1204



In [16]:
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred3))

Accuracy: 0.867


### Sigmoid Kernel

In [17]:
svclassifiersig = SVC(kernel='sigmoid')
svclassifiersig.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='sigmoid',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [18]:
y_pred4 = svclassifiersig.predict(X_test)

In [19]:
print(confusion_matrix(y_test, y_pred4))
print(classification_report(y_test, y_pred4))

[[  2  20   0]
 [ 50 921  51]
 [ 10 131  19]]
              precision    recall  f1-score   support

     expired       0.03      0.09      0.05        22
      funded       0.86      0.90      0.88      1022
 fundraising       0.27      0.12      0.17       160

    accuracy                           0.78      1204
   macro avg       0.39      0.37      0.36      1204
weighted avg       0.77      0.78      0.77      1204
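
The gap between the `macro avg` and `weighted avg` rows is easiest to understand by recomputing both from the confusion matrix. The sketch below copies the sigmoid-kernel matrix printed above by hand and assumes the usual definitions: macro F1 is the plain mean of the per-class F1 scores, and weighted F1 weights each class by its support.

```python
import numpy as np

# Confusion matrix of the sigmoid-kernel model, copied from the output above
# (rows: true class, columns: predicted class; order: expired, funded, fundraising).
cm = np.array([
    [2,   20,  0],
    [50, 921, 51],
    [10, 131, 19],
])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp   # predicted as the class but actually another class
fn = cm.sum(axis=1) - tp   # actually the class but predicted as another class
support = cm.sum(axis=1)   # number of true samples per class

f1 = 2 * tp / (2 * tp + fp + fn)
macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # large classes dominate

print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```

The two values agree with the 0.36 and 0.77 in the report: the weighted average is pulled up by the 1022 `funded` samples, while the macro average exposes the weak results on the two smaller classes.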



In [20]:
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred4))

Accuracy: 0.782
