# Scikit-learn classifier comparison

A comparison of scikit-learn classifiers, written for the challenge Siraj Raval posed in his video [Introduction - Learn Python for Data Science #1](https://youtu.be/T5pRlIbr6gg).

In [2]:
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
import numpy as np

### Dataset
We initialize a toy dataset for classification. The input attributes are height, weight, and shoe size, and the target to predict is the gender (male or female).

In [3]:
# [height, weight, shoe size]
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
     [190, 90, 47], [175, 64, 39], [177, 70, 40], [159, 55, 37], [171, 75, 42], 
     [181, 85, 43], [184, 82, 43], [176, 75, 42], [170, 65, 39], [183, 84, 44]]

Y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female',
     'female', 'male', 'male', 'female', 'male', 'female', 'male']
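
Before training, it can help to confirm the shape of the data and how the two labels are distributed. The sketch below repeats the dataset so that it runs on its own; the names `X_arr` and `class_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

import numpy as np

# Same toy dataset as above: [height, weight, shoe size]
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
     [190, 90, 47], [175, 64, 39], [177, 70, 40], [159, 55, 37], [171, 75, 42],
     [181, 85, 43], [184, 82, 43], [176, 75, 42], [170, 65, 39], [183, 84, 44]]

Y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female',
     'female', 'male', 'male', 'female', 'male', 'female', 'male']

# Illustrative names (not in the original notebook)
X_arr = np.array(X)
class_counts = Counter(Y)

print(X_arr.shape)    # (15, 3): 15 samples, 3 features
print(class_counts)   # Counter({'male': 8, 'female': 7})
```

With only 15 samples and a nearly even split, any accuracy figure computed on this data moves in steps of about 6.7 percentage points per sample.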

### Classifiers
We store the names of the classifiers in the `names` list. All classifiers are initialized and stored in the `classifiers` list, in the same order as their names.

In [4]:
names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "Quadratic Discriminant Analysis"]

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.0225),
    SVC(gamma=2, C=1),
    GaussianProcessClassifier(1.0 * RBF(1.0)),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1),
    AdaBoostClassifier(),
    GaussianNB(),
    QuadraticDiscriminantAnalysis()]
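
The two parallel lists rely on matching positions. One possible alternative, shown here only as a sketch with a subset of the classifiers, is a single dictionary that keeps each label next to its estimator. The name `classifiers_by_name` is hypothetical and does not appear in the original notebook.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical alternative layout: one dict instead of two parallel lists.
# Only three of the ten classifiers are repeated here to keep the sketch short.
classifiers_by_name = {
    "Nearest Neighbors": KNeighborsClassifier(3),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Naive Bayes": GaussianNB(),
}

# Dicts preserve insertion order, so iteration follows the order written above.
for name, clf in classifiers_by_name.items():
    print(name, '->', type(clf).__name__)
```

Iterating with `classifiers_by_name.items()` then plays the role that `zip(names, classifiers)` plays in the training loop.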

### Training and Testing
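Each classifier is trained with its `fit` method and then tested on the same data it was trained on, so the reported figures are training accuracies rather than estimates of performance on unseen data.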

In [5]:
accuracies = []

for name, clf in zip(names, classifiers):
    # Training
    clf.fit(X, Y)
    # Testing
    pred = clf.predict(X)
    accuracy = accuracy_score(Y, pred) * 100
    accuracies.append(accuracy)
    # Output format: [prediction for a new sample] Accuracy for [model]: [accuracy in %]
    print('{} Accuracy for {}: {}'
          .format(clf.predict([[176, 74, 42]]), name, accuracy))

['male'] Accuracy for Nearest Neighbors: 80.0
['male'] Accuracy for Linear SVM: 86.66666666666667
['male'] Accuracy for RBF SVM: 100.0
['male'] Accuracy for Gaussian Process: 100.0
['male'] Accuracy for Decision Tree: 100.0
['male'] Accuracy for Random Forest: 100.0
['male'] Accuracy for Neural Net: 53.333333333333336
['male'] Accuracy for AdaBoost: 100.0
['male'] Accuracy for Naive Bayes: 86.66666666666667
['male'] Accuracy for Quadratic Discriminant Analysis: 100.0
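
Because every classifier above is scored on the same samples it was fitted on, a flexible model can reach 100% simply by memorizing them. The sketch below is not part of the original challenge. It swaps in leave-one-out cross-validation, using scikit-learn's `LeaveOneOut` and `cross_val_score`, for two of the classifiers, so that each sample is predicted by a model that never saw it. The names `candidates` and `loo_accuracy` are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Same toy dataset as above: [height, weight, shoe size]
X = np.array([[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
              [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
              [159, 55, 37], [171, 75, 42], [181, 85, 43], [184, 82, 43],
              [176, 75, 42], [170, 65, 39], [183, 84, 44]])
Y = np.array(['male', 'male', 'female', 'female', 'male', 'male', 'female',
              'female', 'female', 'male', 'male', 'female', 'male', 'female',
              'male'])

# Illustrative subset of the classifiers (names are assumptions of this sketch)
candidates = {
    "Nearest Neighbors": KNeighborsClassifier(3),
    "Naive Bayes": GaussianNB(),
}

loo_accuracy = {}
for name, clf in candidates.items():
    # One fold per sample: fit on 14 samples, score on the held-out one.
    scores = cross_val_score(clf, X, Y, cv=LeaveOneOut())
    loo_accuracy[name] = scores.mean() * 100
    print('Leave-one-out accuracy for {}: {:.1f}'.format(name, loo_accuracy[name]))
```

Each fold scores a single sample as 0 or 1, so the mean is the fraction of samples classified correctly when held out. With 15 samples the estimate is still coarse, and it should be read as a rough indication only.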


### Best Classifier
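We select the classifier with the highest accuracy on the given data. Several classifiers tie at 100%, and `np.argmax` returns the index of the first maximum, so the first of them in the list is reported.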

In [6]:
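# Index of the first classifier with the highest accuracy
# (named best_idx to avoid shadowing the built-in id)
best_idx = np.argmax(accuracies)
print('The best gender classifier is {}'.format(names[best_idx]))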

The best gender classifier is RBF SVM
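
Since `np.argmax` reports only the first index at which the maximum occurs, a tie between classifiers is hidden. The sketch below lists every classifier that reaches the top score. The `accuracies` values are copied from the output printed above, and the names `best_accuracy` and `tied` are illustrative.

```python
import numpy as np

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "Quadratic Discriminant Analysis"]

# Accuracies copied from the run shown above
accuracies = [80.0, 86.66666666666667, 100.0, 100.0, 100.0, 100.0,
              53.333333333333336, 100.0, 86.66666666666667, 100.0]

best_accuracy = max(accuracies)

# Keep every classifier whose accuracy matches the maximum, not just the first
tied = [name for name, acc in zip(names, accuracies)
        if np.isclose(acc, best_accuracy)]

print('First maximum (np.argmax): {}'.format(names[int(np.argmax(accuracies))]))
print('All classifiers at {}%: {}'.format(best_accuracy, tied))
```

For this run, six classifiers share the top score, and RBF SVM is reported only because it comes first in the list.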
