**Links**:
* https://www.kaggle.com/warther/svm-logreg
* https://www.datacamp.com/community/tutorials/decision-tree-classification-python

## Importing required libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split # Import train_test_split function

In [3]:
bank = pd.read_csv("bank.csv", header=0)
submission = pd.read_csv("bank.csv", header=0)

In [4]:
bank.head()

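| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 3 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |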


In [5]:
print (bank.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 14 columns):
ID                    5000 non-null int64
Age                   5000 non-null int64
Experience            5000 non-null int64
Income                5000 non-null int64
ZIP Code              5000 non-null int64
Family                5000 non-null int64
CCAvg                 5000 non-null float64
Education             5000 non-null int64
Mortgage              5000 non-null int64
Personal Loan         5000 non-null int64
Securities Account    5000 non-null int64
CD Account            5000 non-null int64
Online                5000 non-null int64
CreditCard            5000 non-null int64
dtypes: float64(1), int64(13)
memory usage: 547.0 KB
None


## Splitting Data
To estimate how well the model performs on unseen data, we divide the dataset into a training set and a test set.

Let's split the dataset with the function train_test_split(). It takes three main arguments: the features, the target, and the test set size. Setting random_state makes the split reproducible.

In [7]:
##### Split between X and y #####

X=bank.drop('CreditCard',axis=1)
y=bank['CreditCard']

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test

print('train size is %i'%y_train.shape[0])
print('test size is %i'%y_test.shape[0])

train size is 3500
test size is 1500
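The split above is purely random. As a minimal sketch of a variant, the block below adds the `stratify` argument of train_test_split() so that both parts keep nearly the same class ratio. The synthetic arrays `X_demo` and `y_demo` are assumptions that stand in for the bank data; they are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data (assumption): 1000 rows, roughly 30% positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))
y_demo = (rng.random(1000) < 0.3).astype(int)

# stratify=y_demo keeps the share of each class nearly equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

print("train size is %i" % y_tr.shape[0])
print("test size is %i" % y_te.shape[0])
print("positive rate: all=%.3f train=%.3f test=%.3f"
      % (y_demo.mean(), y_tr.mean(), y_te.mean()))
```

Stratifying is a common choice when the target is imbalanced, because a purely random split can leave the test set with a noticeably different class mix.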


## Building SVM Model

SVM and Logistic Regression are meant to be tested in a pipeline with preprocessing steps: an imputer, a polynomial transformation, and feature scaling.

The two algorithms are run in a loop.

The results are analysed with a ROC curve, and a learning curve is used to look for ways to improve the score.

The cells in this section fit only the SVM, directly on the raw features.
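The plan described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the data comes from make_classification instead of bank.csv, and the helper `build_pipeline`, the median imputation strategy, the polynomial degree of 2, and the 5-fold cross-validation are all choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, learning_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic data standing in for X_train / y_train (assumption).
X_syn, y_syn = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)


def build_pipeline(model):
    """Hypothetical helper: imputer -> polynomial features -> scaler -> model."""
    return Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("scaler", StandardScaler()),
        ("model", model),
    ])


models = {
    "SVM": SVC(kernel="rbf", random_state=0),
    "LogReg": LogisticRegression(max_iter=1000),
}

# Loop over both algorithms and score each with cross-validated ROC AUC.
auc_by_model = {}
for name, model in models.items():
    scores = cross_val_score(
        build_pipeline(model), X_syn, y_syn, cv=5, scoring="roc_auc"
    )
    auc_by_model[name] = scores.mean()
    print("%s: mean ROC AUC = %.3f" % (name, auc_by_model[name]))

# Learning curve for the SVM pipeline: score as a function of training set size.
sizes, train_scores, val_scores = learning_curve(
    build_pipeline(models["SVM"]), X_syn, y_syn,
    cv=5, scoring="roc_auc", train_sizes=[0.3, 0.6, 1.0],
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print("n=%d  train AUC=%.3f  validation AUC=%.3f" % (n, tr, va))
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into preprocessing.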

In [8]:
from sklearn.svm import SVC
from sklearn import metrics
from sklearn import model_selection
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict

In [9]:
classifier = SVC(random_state=0, kernel='rbf',decision_function_shape='ovo')

classifier.fit(X_train,y_train)

# Predict the response for the test dataset
y_pred = classifier.predict(X_test)



## Evaluating Model
Let's estimate how accurately the classifier can predict whether a customer holds a credit card (the CreditCard column).

Accuracy can be computed by comparing actual test set values and predicted values.
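As a small worked example of that comparison, accuracy is the fraction of positions where the predicted label equals the actual label. The two arrays below are made up for illustration.

```python
import numpy as np
from sklearn import metrics

# Made-up labels for illustration: 6 of the 8 positions agree.
y_true_demo = np.array([0, 1, 0, 0, 1, 1, 0, 0])
y_pred_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0])

# Fraction of matching positions, computed by hand and with scikit-learn.
manual_accuracy = np.mean(y_true_demo == y_pred_demo)
library_accuracy = metrics.accuracy_score(y_true_demo, y_pred_demo)

print("manual: %.2f, accuracy_score: %.2f" % (manual_accuracy, library_accuracy))
```

Both values are 6/8 = 0.75 here.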

In [11]:
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

report=metrics.classification_report(y_test,y_pred)

clf_name='SVM Finance Loan'

print('Reporting for %s:'%clf_name)

print(report)

Accuracy: 0.714
Reporting for SVM Finance Loan:
              precision    recall  f1-score   support

           0       0.71      1.00      0.83      1071
           1       0.00      0.00      0.00       429

    accuracy                           0.71      1500
   macro avg       0.36      0.50      0.42      1500
weighted avg       0.51      0.71      0.59      1500
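The report is consistent with a classifier that predicts class 0 for every test row: recall is 1.00 for class 0 and 0.00 for class 1, and the accuracy equals the share of class 0 in the test set (1071 / 1500 = 0.714). The sketch below rebuilds that situation from the support counts in the report. The all-zero prediction vector is an assumption inferred from the report, not output taken from the notebook.

```python
import numpy as np
from sklearn import metrics

# Test labels rebuilt from the support column: 1071 zeros and 429 ones.
y_true_rep = np.array([0] * 1071 + [1] * 429)

# Assumption: the classifier predicts 0 for every row.
y_pred_rep = np.zeros_like(y_true_rep)

acc = metrics.accuracy_score(y_true_rep, y_pred_rep)
bal_acc = metrics.balanced_accuracy_score(y_true_rep, y_pred_rep)
cm = metrics.confusion_matrix(y_true_rep, y_pred_rep)

print("Accuracy:", acc)
print("Balanced accuracy:", bal_acc)
print(cm)
```

An accuracy of 0.714 therefore matches a majority-class baseline, and a metric such as balanced accuracy (0.5 in this reconstruction) makes that visible.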





## Create a copy of the original data so svm_label can later be used with a Decision Tree

In [12]:
original = bank.drop('CreditCard',axis=1)
original.head()

# Predict the response for the whole dataset to compare results
bank_pred = classifier.predict(original)
print(bank_pred)

[0 0 0 ... 0 0 1]


In [15]:
submission["svm_label"] = bank_pred

submission.head()

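| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard | svm_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 3 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |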
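One way to compare the new svm_label column with the actual CreditCard column is a cross-tabulation. The sketch below uses a small made-up frame named `demo` (an assumption) in place of the full `submission` DataFrame.

```python
import pandas as pd

# Made-up rows standing in for the submission DataFrame (assumption).
demo = pd.DataFrame({
    "CreditCard": [0, 0, 0, 0, 1, 1, 0, 1],
    "svm_label":  [0, 0, 0, 0, 1, 0, 0, 0],
})

# Rows are actual values, columns are SVM predictions.
table = pd.crosstab(demo["CreditCard"], demo["svm_label"])

# Share of rows where the prediction equals the actual value.
agreement = (demo["CreditCard"] == demo["svm_label"]).mean()

print(table)
print("agreement: %.2f" % agreement)
```

On the real data, 70% of the rows were used to fit the classifier, so agreement measured over the whole dataset is likely to look better than the test set score.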


### Export a new csv for this execution

In [16]:
submission.to_csv("bank_svm.csv", index=False)