<a href="https://colab.research.google.com/github/aleksanderprofic/Machine-Learning/blob/master/Classification/ModelSelection/social_network_ads_model_selection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification model selection

### Selecting the best model for a particular problem from the classification models covered so far:
* Logistic Regression
* KNN
* SVM
* Naive Bayes
* Decision Trees
* Random Forests


## Data preprocessing

In [164]:
import pandas as pd
import numpy as np

dataset = pd.read_csv('Social_Network_Ads.csv')
dataset.head()

| | Age | EstimatedSalary | Purchased |
|---|---|---|---|
| 0 | 19 | 19000 | 0 |
| 1 | 35 | 20000 | 0 |
| 2 | 26 | 43000 | 0 |
| 3 | 27 | 57000 | 0 |
| 4 | 19 | 76000 | 0 |


### Extracting dependent and independent variables

In [165]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
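
To make the slicing concrete, here is a small sketch on a hypothetical three-row frame with the same column layout as `dataset.head()`. The name `toy` and the `_toy` variables are illustrative and not part of the notebook; the values are copied from the first rows shown above. `iloc[:, :-1]` keeps every column except the last, and `iloc[:, -1]` keeps only the last.

```python
import pandas as pd

# Hypothetical miniature frame mirroring the columns of Social_Network_Ads.csv
toy = pd.DataFrame({
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 0],
})

X_toy = toy.iloc[:, :-1].values  # every column except the last: feature matrix
y_toy = toy.iloc[:, -1].values   # last column only: target vector

print(X_toy.shape)  # (3, 2)
print(y_toy.shape)  # (3,)
```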

### Feature Scaling

In [166]:
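from sklearn.preprocessing import StandardScaler

# Note: the scaler is fit on the full dataset, before the train/test split,
# so test-set statistics leak into the scaling.
sc = StandardScaler()
X = sc.fit_transform(X)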

### Splitting the dataset into the training set and the test set

In [167]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
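
Since `sc.fit_transform(X)` runs before the split, the scaler's mean and standard deviation are computed from the test rows as well. A minimal sketch of one alternative: split first, then let a `Pipeline` fit the scaler on the training rows only. It assumes synthetic data from `make_classification` as a stand-in for the CSV (400 rows, so the test set has 80 rows like the one evaluated in this notebook). All `_demo`, `_tr`, and `_te` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features and the binary target (assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# Split on the raw features first
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# The pipeline fits the scaler on X_tr only, then reuses it for X_te
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_tr, y_tr)

# The fitted scaler's mean equals the training-set mean, not the full-data mean
scaler = pipe.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_tr.mean(axis=0)))  # True
print(pipe.score(X_te, y_te))  # test accuracy on the synthetic data
```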

## Training and predictions

### Logistic Regression

In [168]:
from sklearn.linear_model import LogisticRegression

log_regressor = LogisticRegression()
log_regressor.fit(X_train, y_train)
log_y_pred = log_regressor.predict(X_test)

### KNN

In [169]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
knn_y_pred = knn.predict(X_test)

### SVM

In [170]:
from sklearn.svm import SVC

# degree only applies to kernel='poly'; the default RBF kernel ignores it
svc = SVC(degree=4)
svc.fit(X_train, y_train)
svc_y_pred = svc.predict(X_test)
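
`SVC` uses the RBF kernel by default, and `degree` is only read by the polynomial kernel, so `degree=4` has no effect on the model above. The sketch below checks this on synthetic stand-in data (an assumption, as are all the variable names): two RBF models that differ only in `degree` produce the same decision values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption, not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# Same kernel, different degree
rbf_deg2 = SVC(kernel="rbf", degree=2).fit(X_demo, y_demo)
rbf_deg4 = SVC(kernel="rbf", degree=4).fit(X_demo, y_demo)

# Identical decision values show that degree is ignored under RBF
same_rbf = np.allclose(
    rbf_deg2.decision_function(X_demo),
    rbf_deg4.decision_function(X_demo),
)
print(same_rbf)  # True
```

To make `degree` matter, the kernel would have to be set explicitly, for example `SVC(kernel='poly', degree=4)`, which is a different model from the one trained in this notebook.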

### Naive Bayes

In [171]:
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)
nb_y_pred = nb.predict(X_test)

### Decision Tree

In [172]:
from sklearn.tree import DecisionTreeClassifier

# No random_state is set, so the fitted tree can differ between runs
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
tree_y_pred = tree.predict(X_test)

### Random Forest

In [173]:
from sklearn.ensemble import RandomForestClassifier

# No random_state is set, so the forest (and its score) can differ between runs
forest = RandomForestClassifier(n_estimators=10)
forest.fit(X_train, y_train)
forest_y_pred = forest.predict(X_test)
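
The six train-and-predict cells follow the same pattern, so they can be sketched as one loop over a dictionary of estimators. This is an illustration under assumptions, not the notebook's code: the data is a synthetic stand-in, `models` and `predictions` are hypothetical names, `random_state=0` is added to the tree-based models for reproducibility, and `degree=4` is omitted from `SVC` because the default RBF kernel does not use it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the scaled features and labels (assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Same estimators and hyperparameters as the cells above
models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=6),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=10, random_state=0),
}

# Fit each model and keep its test-set predictions under the model's name
predictions = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    predictions[name] = model.predict(X_te)
    print(f"{name}: {accuracy_score(y_te, predictions[name]):.4f}")
```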

## Evaluating model performance with accuracy score and confusion matrix

### Logistic Regression

In [174]:
from sklearn.metrics import accuracy_score, confusion_matrix

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, log_y_pred)

# scikit-learn's argument order is (y_true, y_pred)
print(f'Accuracy score: {accuracy_score(y_test, log_y_pred)}')
print(f'Confusion matrix: \n{cm}')

Accuracy score: 0.925
Confusion matrix: 
[[57  1]
 [ 5 17]]
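
The accuracy score can be recomputed by hand from the confusion matrix, which also gives recall and precision for the "purchased" class. The sketch below uses the matrix printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above for logistic regression:
# rows = true class (0, 1), columns = predicted class (0, 1)
cm = np.array([[57, 1],
               [5, 17]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # 74 correct out of 80
recall = tp / (tp + fn)          # 17 of the 22 actual purchasers found
precision = tp / (tp + fp)       # 17 of the 18 predicted purchasers correct

print(f"accuracy:  {accuracy:.3f}")   # accuracy:  0.925
print(f"recall:    {recall:.3f}")     # recall:    0.773
print(f"precision: {precision:.3f}")  # precision: 0.944
```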


### KNN

In [175]:
from sklearn.metrics import accuracy_score, confusion_matrix

cm = confusion_matrix(y_test, knn_y_pred)
print(f'Accuracy score: {accuracy_score(y_test, knn_y_pred)}')
print(f'Confusion matrix: \n{cm}')

Accuracy score: 0.95
Confusion matrix: 
[[55  3]
 [ 1 21]]


### SVM

In [176]:
from sklearn.metrics import accuracy_score, confusion_matrix

cm = confusion_matrix(y_test, svc_y_pred)

print(f'Accuracy score: {accuracy_score(y_test, svc_y_pred)}')
print(f'Confusion matrix: \n{cm}')

Accuracy score: 0.95
Confusion matrix: 
[[55  3]
 [ 1 21]]


### Naive Bayes

In [177]:
from sklearn.metrics import accuracy_score, confusion_matrix

cm = confusion_matrix(y_test, nb_y_pred)

print(f'Accuracy score: {accuracy_score(y_test, nb_y_pred)}')
print(f'Confusion matrix: \n{cm}')

Accuracy score: 0.9125
Confusion matrix: 
[[55  3]
 [ 4 18]]


### Decision Tree

In [178]:
from sklearn.metrics import accuracy_score, confusion_matrix

cm = confusion_matrix(y_test, tree_y_pred)

print(f'Accuracy score: {accuracy_score(y_test, tree_y_pred)}')
print(f'Confusion matrix: \n{cm}')

Accuracy score: 0.9125
Confusion matrix: 
[[54  4]
 [ 3 19]]


### Random Forest

In [179]:
from sklearn.metrics import accuracy_score, confusion_matrix

cm = confusion_matrix(y_test, forest_y_pred)

print(f'Accuracy score: {accuracy_score(y_test, forest_y_pred)}')
print(f'Confusion matrix: \n{cm}')

Accuracy score: 0.9125
Confusion matrix: 
[[56  2]
 [ 5 17]]
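
Collecting the six confusion matrices printed above into one table makes the comparison easier to read. The sketch below only re-derives numbers from those reported matrices; `reported` and `summary` are illustrative names.

```python
import numpy as np
import pandas as pd

# Confusion matrices exactly as printed in the evaluation cells above
reported = {
    "Logistic Regression": [[57, 1], [5, 17]],
    "KNN": [[55, 3], [1, 21]],
    "SVM": [[55, 3], [1, 21]],
    "Naive Bayes": [[55, 3], [4, 18]],
    "Decision Tree": [[54, 4], [3, 19]],
    "Random Forest": [[56, 2], [5, 17]],
}

rows = []
for name, matrix in reported.items():
    tn, fp, fn, tp = np.array(matrix).ravel()
    rows.append({
        "model": name,
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "false_positives": int(fp),
        "false_negatives": int(fn),
    })

# One row per model, best accuracy first
summary = (
    pd.DataFrame(rows)
    .set_index("model")
    .sort_values("accuracy", ascending=False)
)
print(summary)
```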


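K-Nearest Neighbors and Support Vector Machines perform best on this test split: both classify 76 of the 80 test samples correctly (accuracy 0.95) and produce identical confusion matrices, each missing only one actual purchaser. Logistic Regression follows at 0.925, while Naive Bayes, Decision Tree, and Random Forest all score 0.9125.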
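
With only 80 test rows, a single misclassified sample shifts accuracy by 0.0125, so a ranking based on one split is sensitive to which rows end up in the test set. One way to check such a comparison is k-fold cross-validation. This is a minimal sketch, assuming synthetic stand-in data and hypothetical names (`candidates`, `cv_scores`); it is not a result for the actual dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the features and labels (assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# Scaling sits inside each pipeline so it is refit on every training fold
candidates = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=6)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Five accuracy values per model, one for each held-out fold
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    for name, model in candidates.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Comparing the mean and spread across folds gives a steadier picture than one accuracy figure per model.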