# Binary Classification using Logistic Regression and SVMs

In this assignment, you will perform binary classification using Logistic Regression and SVMs on the provided dataset. Please note the following:
1. Use the provided dataset, which is already split into train, validation, and test sets (Dataset-Binary-Train.csv, Dataset-Binary-Validate.csv, Dataset-Binary-Test.csv).
2. You can use **LogisticRegression** and **SVC** from the `sklearn` package:
    - https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
    - https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
3. You might need to look into data balancing and the handling of categorical values.
4. For both the validation and test sets, report:
    - Confusion matrix
    - Accuracy, Precision, Recall, F1-Score
    - AUC
    
    The most important metrics in this problem are F1-Score and AUC.
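
One common way to address the data balancing mentioned in item 3 is to reweight the classes inside the classifier. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` (not the assignment dataset), and the names `X_demo`, `y_demo`, `plain`, and `balanced` are hypothetical. It compares the recall of a default `LogisticRegression` with one that uses `class_weight='balanced'`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: roughly 85% negatives, 15% positives)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Baseline: every sample contributes equally to the loss
plain = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Reweighted: each class is weighted inversely to its frequency
balanced = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000),
)

plain.fit(X_demo, y_demo)
balanced.fit(X_demo, y_demo)

# Recall on the positive (minority) class, measured on the training data for brevity
recall_plain = recall_score(y_demo, plain.predict(X_demo))
recall_balanced = recall_score(y_demo, balanced.predict(X_demo))
print("Recall without class weights:", recall_plain)
print("Recall with class_weight='balanced':", recall_balanced)
```

Reweighting usually trades some precision for recall, so F1-Score on the validation set is the number to watch when deciding whether it helps.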

In [2]:
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
import numpy as np


In [23]:
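# Load the test split; the first row is the header, which genfromtxt cannot parse as numbers
data = np.genfromtxt('Dataset-Binary-Test.csv', delimiter=',')
print(np.shape(data))
features = ['F1','F2','F3','F4','F5','F6','F7','F8']
print(features)
# Drop row 0 (header); columns 0-7 are the features, column 8 is the label
x_test = data[1:,0:8]
y_test = data[1:,8]
print(np.shape(x_test))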

(266, 9)
['F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7', 'F8']
(265, 8)


In [24]:
data = np.genfromtxt('Dataset-Binary-Train.csv', delimiter=',')
print(np.shape(data))
x_train = data[1:,0:8]
y_train = data[1:,8]
print(np.shape(x_train))

(1001, 9)
(1000, 8)


In [25]:
data = np.genfromtxt('Dataset-Binary-Validate.csv', delimiter=',')
print(np.shape(data))
x_val = data[1:,0:8]
y_val = data[1:,8]
print(np.shape(x_val))

(220, 9)
(219, 8)
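
The three loading cells above repeat the same steps: read the CSV, drop the header row, and split the columns into features and label. A minimal sketch of a shared helper follows. The function name `load_split`, the `label` column name, and the in-memory demo CSV are assumptions for illustration; `skip_header=1` makes `np.genfromtxt` skip the header instead of slicing it off afterwards.

```python
import io

import numpy as np


def load_split(source, n_features=8):
    """Load a CSV with one header row and return (features, labels)."""
    # skip_header=1 drops the header line before parsing
    data = np.genfromtxt(source, delimiter=',', skip_header=1)
    data = np.atleast_2d(data)  # keep a 2-D shape even for a single data row
    return data[:, :n_features], data[:, n_features]


# Tiny in-memory CSV standing in for a real file (hypothetical contents)
demo_csv = io.StringIO(
    "F1,F2,F3,F4,F5,F6,F7,F8,label\n"
    "1,2,3,4,5,6,7,8,0\n"
    "8,7,6,5,4,3,2,1,1\n"
)
x_demo, y_demo = load_split(demo_csv)
print(x_demo.shape, y_demo.shape)  # (2, 8) (2,)
```

With the assignment files, the same helper would be called with a file name, for example `load_split('Dataset-Binary-Train.csv')`.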


# Logistic Regression

In [48]:
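# Fit logistic regression with default settings on the raw (unscaled) features
regr = LogisticRegression()
regr.fit(x_train, y_train)
# Hard 0/1 predictions for the validation and test sets
y_pred = regr.predict(x_val)
y_pred_test = regr.predict(x_test)
# score() returns the mean accuracy
score = regr.score(x_val, y_val)
print("Validation set score: ",score)
print("Test set score: ", regr.score(x_test, y_test))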

Validation set score:  0.8949771689497716
Test set score:  0.8264150943396227


# Logistic Regression: Validation Set

In [64]:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()
print(confusion_matrix(y_val, y_pred))
print(tn,fp,fn,tp)

[[192   0]
 [ 23   4]]
192 0 23 4


In [66]:
from sklearn.metrics import roc_auc_score

# Derive each metric from the confusion-matrix counts
accuracy = (tp + tn) / (tn + fp + fn + tp)
print("Accuracy: %5.2f" % (accuracy*100), "%")
precision = tp / (tp + fp)
print("Precision: %5.2f" % (precision*100), "%")
recall = tp / (tp + fn)
print("Recall: %5.2f" % (recall*100), "%")
f1 = 2 * ((precision*recall)/(precision+recall))
print("F1-Score: %5.2f" % (f1))
# Note: AUC is computed here from hard 0/1 predictions, not from probabilities or decision scores
auc = roc_auc_score(y_val, y_pred)
print("AUC: %5.2f" % (auc))

Accuracy: 89.50 %
Precision: 100.00 %
Recall: 14.81 %
F1-Score:  0.26
AUC:  0.57
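
The numbers above can be cross-checked against the `sklearn.metrics` functions. The sketch below rebuilds label vectors from the validation confusion-matrix counts (192, 0, 23, 4) rather than loading the dataset; the names `y_true` and `y_hard` are hypothetical. It also shows why the AUC is so close to 0.5: when `roc_auc_score` receives hard 0/1 predictions, the ROC curve has a single operating point and the area reduces to the mean of the true positive rate and the true negative rate.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Counts taken from the validation confusion matrix above
tn, fp, fn, tp = 192, 0, 23, 4

# Rebuild label vectors that reproduce exactly those counts
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_hard = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

print("Precision: %5.2f" % precision_score(y_true, y_hard))
print("Recall: %5.2f" % recall_score(y_true, y_hard))
print("F1-Score: %5.2f" % f1_score(y_true, y_hard))

# With hard predictions, AUC equals (TPR + TNR) / 2
tpr = tp / (tp + fn)
tnr = tn / (tn + fp)
auc_hard = roc_auc_score(y_true, y_hard)
print("AUC from hard labels: %5.2f" % auc_hard)
print("(TPR + TNR) / 2:      %5.2f" % ((tpr + tnr) / 2))
```

A threshold-independent AUC would instead pass continuous scores, for example `regr.predict_proba(x_val)[:, 1]`, as the second argument to `roc_auc_score`.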


# Logistic Regression: Test Set

In [67]:
# confusion_matrix was already imported in the validation-set cell
tn, fp, fn, tp = confusion_matrix(y_test, y_pred_test).ravel()
print(confusion_matrix(y_test, y_pred_test))
print(tn,fp,fn,tp)

[[218   0]
 [ 46   1]]
218 0 46 1


In [68]:
accuracy = (tp + tn) / (tn + fp + fn + tp)
print("Accuracy: %5.2f" % (accuracy*100), "%")
precision = tp / (tp + fp)
print("Precision: %5.2f" % (precision*100), "%")
recall = tp / (tp + fn)
print("Recall: %5.2f" % (recall*100), "%")
f1 = 2 * ((precision*recall)/(precision+recall))
print("F1-Score: %5.2f" % (f1))
auc = roc_auc_score(y_test, y_pred_test)
print("AUC: %5.2f" % (auc))

Accuracy: 82.64 %
Precision: 100.00 %
Recall:  2.13 %
F1-Score:  0.04
AUC:  0.51


# SVM

In [69]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

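# Standardize the features, then fit an SVC (RBF kernel by default; gamma='auto' uses 1 / n_features)
regr = make_pipeline(StandardScaler(), SVC(gamma='auto'))
regr.fit(x_train, y_train)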
y_pred = regr.predict(x_val)
y_pred_test = regr.predict(x_test)
score = regr.score(x_val, y_val)
print("Validation set score: ",score)
print("Test set score: ", regr.score(x_test, y_test))

Validation set score:  0.9315068493150684
Test set score:  0.9207547169811321


# SVM: Validation Set

In [74]:
tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()
print(confusion_matrix(y_val, y_pred))
print(tn,fp,fn,tp)

[[186   6]
 [  9  18]]
186 6 9 18


In [75]:
accuracy = (tp + tn) / (tn + fp + fn + tp)
print("Accuracy: %5.2f" % (accuracy*100), "%")
precision = tp / (tp + fp)
print("Precision: %5.2f" % (precision*100), "%")
recall = tp / (tp + fn)
print("Recall: %5.2f" % (recall*100), "%")
f1 = 2 * ((precision*recall)/(precision+recall))
print("F1-Score: %5.2f" % (f1))
auc = roc_auc_score(y_val, y_pred)
print("AUC: %5.2f" % (auc))

Accuracy: 93.15 %
Precision: 75.00 %
Recall: 66.67 %
F1-Score:  0.71
AUC:  0.82


# SVM: Test Set

In [79]:
tn, fp, fn, tp = confusion_matrix(y_test, y_pred_test).ravel()
print(confusion_matrix(y_test, y_pred_test))
print(tn,fp,fn,tp)

[[216   2]
 [ 19  28]]
216 2 19 28


In [80]:
accuracy = (tp + tn) / (tn + fp + fn + tp)
print("Accuracy: %5.2f" % (accuracy*100), "%")
precision = tp / (tp + fp)
print("Precision: %5.2f" % (precision*100), "%")
recall = tp / (tp + fn)
print("Recall: %5.2f" % (recall*100), "%")
f1 = 2 * ((precision*recall)/(precision+recall))
print("F1-Score: %5.2f" % (f1))
auc = roc_auc_score(y_test, y_pred_test)
print("AUC: %5.2f" % (auc))

Accuracy: 92.08 %
Precision: 93.33 %
Recall: 59.57 %
F1-Score:  0.73
AUC:  0.79
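
The metric cells are repeated four times with only the inputs changing. One possible way to consolidate them is a single helper, sketched below on synthetic data. The name `evaluate` and the demo variables are hypothetical, and unlike the cells above this version computes AUC from the continuous output of `decision_function`, so its AUC values are not directly comparable to the ones printed in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(model, X, y):
    """Return the confusion matrix and summary metrics for a fitted binary classifier."""
    y_pred = model.predict(X)
    # Continuous scores (signed distance to the decision boundary) for a threshold-free AUC
    scores = model.decision_function(X)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y, y_pred, average='binary', zero_division=0
    )
    return {
        'confusion_matrix': confusion_matrix(y, y_pred),
        'accuracy': model.score(X, y),
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'auc': roc_auc_score(y, scores),
    }


# Synthetic stand-in data (assumption: 8 features, about 15% positives)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.85, 0.15], random_state=0
)
svm = make_pipeline(StandardScaler(), SVC(gamma='auto'))
svm.fit(X_demo, y_demo)

report = evaluate(svm, X_demo, y_demo)
for name, value in report.items():
    print(name, value, sep=": ")
```

Both `LogisticRegression` and the `SVC` pipeline expose `decision_function`, so the same helper could be applied to either model on the validation and test splits.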
