<h2>1. Importing the Dataset</h2>

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv('../../../data/clean/Social_Network_Ads.csv')
display(df.head())
x = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

   Age  EstimatedSalary  Purchased
0   19            19000          0
1   35            20000          0
2   26            43000          0
3   27            57000          0
4   19            76000          0
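The `iloc` slicing in the cell above uses every column except the last as a feature and the last column (`Purchased`) as the label. The sketch below rebuilds the five rows displayed above by hand, so it runs without the CSV file, and shows the resulting shapes:

```python
import pandas as pd

# The five rows shown by df.head(), typed in by hand so the sketch runs without the CSV file
df_head = pd.DataFrame({
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# All columns except the last are features; the last column is the label
x_demo = df_head.iloc[:, :-1].values
y_demo = df_head.iloc[:, -1].values

print(x_demo.shape)  # (5, 2): Age and EstimatedSalary
print(y_demo.shape)  # (5,): Purchased
```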


---
<h2>2. Splitting the Dataset</h2>

In [2]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print("train dataset size : {} observations\ntest dataset size : {} observations".format(x_train.shape[0], x_test.shape[0]))

train dataset size : 320 observations
test dataset size : 80 observations
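With `test_size=0.2`, scikit-learn puts ceil(0.2 * n) rows in the test set and the rest in the training set, which gives the 320/80 split above for 320 + 80 = 400 rows. The sketch below uses placeholder arrays of that length rather than the real data. It also passes the optional `stratify` argument, which the notebook does not use, to show how it keeps the class ratio equal in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the notebook's dataset (320 + 80 = 400)
x_dummy = np.arange(800).reshape(400, 2)
y_dummy = np.array([0, 1] * 200)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_dummy, y_dummy, test_size=0.2, random_state=0, stratify=y_dummy
)
print(x_tr.shape[0], x_te.shape[0])  # 320 80

# With stratify, both splits keep the 50/50 class ratio of y_dummy
print(y_tr.mean(), y_te.mean())      # 0.5 0.5
```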


---
<h2>3. Feature Scaling</h2>

In [3]:
from sklearn.preprocessing import StandardScaler

stand_x = StandardScaler().fit(x_train)
x_ss = stand_x.transform(x_train)
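`StandardScaler` rescales each column to z = (x - mean) / std, where the mean and the population standard deviation are learned from the training set only. The test set is later transformed with those same training statistics. The sketch below checks the formula against the scaler, using the first four rows of `df.head()` as training data and one made-up test row:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First four (Age, EstimatedSalary) rows from df.head(); the test row is made up for illustration
train = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float)
test = np.array([[30, 40000]], dtype=float)

scaler = StandardScaler().fit(train)

# Manual z-score with training statistics (np.std defaults to ddof=0, as StandardScaler does)
mu = train.mean(axis=0)
sigma = train.std(axis=0)
manual_train = (train - mu) / sigma
manual_test = (test - mu) / sigma  # test rows reuse the training mean and std

print(np.allclose(scaler.transform(train), manual_train))  # True
print(np.allclose(scaler.transform(test), manual_test))    # True
```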

---
<h2>4. Training the SVM Model on the Training Set</h2>

In [4]:
from sklearn.svm import SVC

svm = SVC(C=1.0, kernel='rbf', gamma='scale', random_state=42)
svm.fit(x_ss, y_train)

SVC(random_state=42)
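With `kernel='rbf'`, SVC compares two samples through K(x, x') = exp(-gamma * ||x - x'||^2). `gamma='scale'` sets gamma = 1 / (n_features * X.var()), where `X.var()` is the variance of all entries of the (scaled) training matrix. The sketch below uses random stand-in data, not the notebook's. It computes that gamma and checks a hand-written kernel against `sklearn.metrics.pairwise.rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))  # random stand-in for the scaled training features

# gamma='scale' rule: 1 / (n_features * variance of all entries of X)
gamma = 1.0 / (X.shape[1] * X.var())

# Pairwise squared Euclidean distances, then the RBF kernel
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.allclose(K_manual, K_sklearn))  # True
```

Larger gamma values make the kernel decay faster with distance, which gives a more wiggly decision boundary. `C` trades margin width against training errors.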

---
<h2>5. Predicting the Test Dataset and Displaying the Results</h2>

In [5]:
y_pred = svm.predict(stand_x.transform(x_test))

pd.DataFrame(data=np.stack((y_test, y_pred), axis=1),
             index=None, columns=['y actual', 'y prediction'],
             copy=False).head(10)

   y actual  y prediction
0         0             0
1         0             0
2         0             0
3         0             0
4         0             0
5         0             0
6         0             0
7         1             1
8         0             0
9         0             1
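Accuracy is the share of rows where `y actual` equals `y prediction`. The sketch below uses only the ten rows displayed above, so the result is not the full test-set accuracy. Nine of the ten rows match, and `accuracy_score` agrees with the element-wise comparison:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# The ten (actual, predicted) pairs shown in the table above
y_actual = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])
y_predicted = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])

print(np.mean(y_actual == y_predicted))       # 0.9
print(accuracy_score(y_actual, y_predicted))  # 0.9
```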


---
<h2>6. Making the Confusion Matrix</h2>

In [6]:
from sklearn.metrics import confusion_matrix

# scikit-learn layout: rows are actual classes, columns are predicted classes, i.e. [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, y_pred)
print(cm)
tn, fp, fn, tp = cm.ravel()
print("\nThe confusion matrix shows:"
      f"\n\t- True negatives = {tn}: correctly predicted as class 0 (didn't buy the product)"
      f"\n\t- False positives = {fp}: predicted as class 1 (buying), but actually class 0"
      f"\n\t- True positives = {tp}: correctly predicted as class 1 (bought the product)"
      f"\n\t- False negatives = {fn}: predicted as class 0 (not buying), but actually class 1")

[[55  3]
 [ 1 21]]

The confusion matrix shows:
	- True negatives = 55: correctly predicted as class 0 (didn't buy the product)
	- False positives = 3: predicted as class 1 (buying), but actually class 0
	- True positives = 21: correctly predicted as class 1 (bought the product)
	- False negatives = 1: predicted as class 0 (not buying), but actually class 1
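The four counts can be turned into the usual summary metrics. For the matrix above, [[TN, FP], [FN, TP]] = [[55, 3], [1, 21]], so accuracy = (55 + 21) / 80 = 0.95, precision = 21 / (21 + 3) = 0.875, and recall = 21 / (21 + 1), about 0.955. The sketch below builds label arrays that reproduce those counts and checks the hand calculation against scikit-learn. These arrays are a construction that matches the counts, not the actual test rows:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Label arrays constructed to reproduce TN=55, FP=3, FN=1, TP=21 (80 test rows in total)
y_true = np.array([0] * 55 + [0] * 3 + [1] * 1 + [1] * 21)
y_hat = np.array([0] * 55 + [1] * 3 + [0] * 1 + [1] * 21)

tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
print(tn, fp, fn, tp)  # 55 3 1 21

# Hand-computed metrics from the four counts
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 76 / 80
precision = tp / (tp + fp)                  # 21 / 24
recall = tp / (tp + fn)                     # 21 / 22

print(accuracy, precision, recall)
print(np.isclose(accuracy, accuracy_score(y_true, y_hat)),
      np.isclose(precision, precision_score(y_true, y_hat)),
      np.isclose(recall, recall_score(y_true, y_hat)))  # True True True
```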


---
<h2>7. Applying k-Fold Cross Validation</h2>

In [7]:
from sklearn.model_selection import cross_val_score

acc = cross_val_score(estimator=svm, X=x_ss, y=y_train, cv=10, n_jobs=-1)
print("Accuracy: {:.2f}%".format(acc.mean()*100))
print("Standard Deviation: {:.2f}%".format(acc.std()*100))
print("\nOne standard deviation around the mean gives a range of {:.2f}% to {:.2f}%, so accuracy varies by several percentage points from fold to fold.".format((acc.mean()-acc.std())*100, (acc.mean()+acc.std())*100))

Accuracy: 90.00%
Standard Deviation: 4.80%

One standard deviation around the mean gives a range of 85.20% to 94.80%, so accuracy varies by several percentage points from fold to fold.
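One caveat with the cell above: `x_ss` was scaled with statistics from the whole training set, so each validation fold has already influenced the scaler. A common alternative wraps `StandardScaler` and `SVC` in a `Pipeline`, so the scaler is re-fit on the training folds inside every split. The sketch below uses synthetic data from `make_classification` as a stand-in for the notebook's features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as x_train (320 rows, 2 features)
X_demo, y_demo = make_classification(
    n_samples=320, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# In each fold the scaler is fit on the training folds only, then applied to the held-out fold
model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf", gamma="scale"))
scores = cross_val_score(model, X_demo, y_demo, cv=10)

print("Accuracy: {:.2f}%".format(scores.mean() * 100))
print("Standard Deviation: {:.2f}%".format(scores.std() * 100))
```

In the notebook, the equivalent call would pass the unscaled features, as in `cross_val_score(model, x_train, y_train, cv=10, n_jobs=-1)`. The resulting numbers may differ slightly from the 90.00% reported above.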
