## Validation Metrics

### Multiclass Classification

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=4)
# Use a deliberately weak multiclass classifier: a decision tree of depth 2
classifier = DecisionTreeClassifier(max_depth=2)
classifier.fit(X_train, Y_train)
Y_pred = classifier.predict(X_test)
iris.target_names

Measures commonly used in multiclass classification:
Confusion matrix: shows how the samples of each true class were classified. In a perfect classification, every cell off the diagonal is 0. In the following example, you will instead see that class 0 (Setosa) is never misclassified, class 1 (Versicolor) is misclassified three times as Virginica, and class 2 (Virginica) is misclassified twice as Versicolor.


In [None]:
from sklearn import metrics
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test, Y_pred)
cm
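
The row and column convention is easy to get backwards, so here is a minimal sketch on a small hand-made example. The labels `y_true` and `y_hat` are hypothetical values chosen for illustration; they are not the iris predictions above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, chosen only for illustration
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_hat = [0, 0, 1, 2, 1, 2, 1, 2]

# Rows are the true classes, columns are the predicted classes
toy_cm = confusion_matrix(y_true, y_hat)
print(toy_cm)

# Off-diagonal cells are the errors: zero the diagonal and sum each row
# to count how many samples of each true class were misclassified
errors = toy_cm.copy()
np.fill_diagonal(errors, 0)
errors_per_true_class = errors.sum(axis=1)
print(errors_per_true_class)  # [0 1 1]
```

Reading along a row answers "where did the samples of this class go?", while reading down a column answers "what was actually labelled as this class?".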

In [None]:
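import matplotlib.pyplot as plt
img = plt.matshow(cm, cmap=plt.cm.autumn)
plt.colorbar(img, fraction=0.045)
# matshow draws rows on the vertical axis and columns on the horizontal axis,
# so the count cm[row, col] belongs at x=col, y=row
for row in range(cm.shape[0]):
    for col in range(cm.shape[1]):
        plt.text(col, row, "%d" % cm[row, col],
                 size=12, color='black', ha="center", va="center")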

In [None]:
# Accuracy: the fraction of predicted labels that exactly match the real ones,
# that is, the overall share of correctly classified samples
print(metrics.accuracy_score(Y_test, Y_pred))
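
Accuracy can also be read off a confusion matrix, since the correct predictions are exactly the diagonal cells. A minimal sketch on hypothetical labels (`y_true` and `y_hat` are illustrative values, not the iris results):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels, chosen only for illustration
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_hat = [0, 0, 1, 2, 1, 2, 1, 2]

toy_cm = confusion_matrix(y_true, y_hat)
# Correct predictions sit on the diagonal; divide by the total sample count
acc_from_cm = np.trace(toy_cm) / toy_cm.sum()
acc_from_sklearn = accuracy_score(y_true, y_hat)
print(acc_from_cm, acc_from_sklearn)  # 0.75 0.75
```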

In [None]:
"""Precision: a measure borrowed from information retrieval, where it is the fraction of the
returned results that are relevant. In a classification task, it is, for each class, the fraction
of samples predicted as that class that really belong to it.

Recall: another concept borrowed from information retrieval, where it is the fraction of all the
relevant items in the dataset that appear in the result set. In a classification task, it is,
for each class, the number of correctly classified samples divided by the total number of
samples that belong to that class.

F1 score: the harmonic mean of precision and recall. It is mostly used with unbalanced
datasets, to reveal whether the classifier performs well on all the classes.

classification_report prints these measures for each class, together with their averages:"""
from sklearn.metrics import classification_report
print (classification_report(Y_test, Y_pred, 
                            target_names=iris.target_names))
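
To make the definitions concrete, the per-class values can be derived from a confusion matrix by hand and compared with scikit-learn. This is a sketch on hypothetical labels (`y_true` and `y_hat` are illustrative); it assumes every class is predicted at least once, otherwise the precision division would be 0/0.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels, chosen only for illustration
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_hat = [0, 0, 1, 2, 1, 2, 1, 2]

toy_cm = confusion_matrix(y_true, y_hat)
correct = np.diag(toy_cm)

# Precision: correct predictions divided by everything predicted as that class (column sums)
precision = correct / toy_cm.sum(axis=0)
# Recall: correct predictions divided by everything that truly is that class (row sums)
recall = correct / toy_cm.sum(axis=1)
# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# Per-class reference values from scikit-learn (average=None is the default)
ref_p, ref_r, ref_f1, support = precision_recall_fscore_support(y_true, y_hat)
print(precision, recall, f1)
```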

### Regression

In [None]:
# For regression (predicting real numbers), many error measures are derived from Euclidean algebra
# Mean absolute error (MAE): the L1 norm of the difference vector between the predicted
# and real values, divided by the number of samples
from sklearn.metrics import mean_absolute_error 
mean_absolute_error([1.0, 0.0, 0.0], [0.0, 0.0, -1.0])
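
As a check on the definition, the same number can be computed by hand with NumPy. A minimal sketch using the two vectors from the call above (the names `y_real` and `y_predicted` are introduced here for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_real = np.array([1.0, 0.0, 0.0])
y_predicted = np.array([0.0, 0.0, -1.0])

# L1 norm of the difference vector, divided by the number of samples
mae_by_hand = np.abs(y_real - y_predicted).sum() / len(y_real)
mae_sklearn = mean_absolute_error(y_real, y_predicted)
print(mae_by_hand, mae_sklearn)  # both are 2/3
```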

In [None]:
# Mean squared error (MSE): the squared L2 norm of the difference vector between the predicted
# and real values, divided by the number of samples
from sklearn.metrics import mean_squared_error 
mean_squared_error([-10.0, 0.0, 0.0], [0.0, 0.0, 0.0])
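
Because the differences are squared, a single large error dominates the MSE much more than it dominates the MAE. A minimal sketch on the vectors from the call above (the names `y_real` and `y_predicted` are introduced here for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_real = np.array([-10.0, 0.0, 0.0])
y_predicted = np.array([0.0, 0.0, 0.0])

diff = y_real - y_predicted
# Squared L2 norm of the difference vector, divided by the number of samples
mse_by_hand = np.sum(diff ** 2) / len(diff)
mse_sklearn = mean_squared_error(y_real, y_predicted)

# The same single error of 10 contributes 100/3 to the MSE but only 10/3 to the MAE
mae_sklearn = mean_absolute_error(y_real, y_predicted)
print(mse_by_hand, mse_sklearn, mae_sklearn)
```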

In [None]:
"""R2 score: R2 is also known as the coefficient of determination. It measures how well the
predictors explain the target variable, that is, how good the fit is. Its best value is 1;
a model that always predicts the mean of the target scores 0, and a model that does worse
than that scores below 0. The higher R2 is, the better the model."""
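
This cell has no code, so here is a small sketch of the score's range using `sklearn.metrics.r2_score`. The target and the three prediction vectors are hypothetical values chosen so the results can be checked by hand.

```python
import numpy as np
from sklearn.metrics import r2_score

y_real = np.array([1.0, 2.0, 3.0])

perfect = r2_score(y_real, [1.0, 2.0, 3.0])      # exact predictions
mean_only = r2_score(y_real, [2.0, 2.0, 2.0])    # always predicts the mean of y_real
reversed_ = r2_score(y_real, [3.0, 2.0, 1.0])    # worse than predicting the mean

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares)
y_bad = np.array([3.0, 2.0, 1.0])
ss_res = np.sum((y_real - y_bad) ** 2)           # 8.0
ss_tot = np.sum((y_real - y_real.mean()) ** 2)   # 2.0
r2_by_hand = 1 - ss_res / ss_tot

print(perfect, mean_only, reversed_, r2_by_hand)  # 1.0 0.0 -3.0 -3.0
```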

## Testing and validating

In [None]:
"""By observing a series of examples paired with their outcomes, a machine learning algorithm
extracts rules that generalize to new examples, so that it can correctly guess their outcomes.
This is the supervised learning approach: we apply specialized learning algorithms and expect
them to predict correctly on (that is, generalize to) new data."""
from sklearn.datasets import load_digits
digits = load_digits()
print(digits.DESCR)
X = digits.data
y = digits.target

In [None]:
# Each sample is an 8x8 image flattened into 64 numeric values ranging from 0 to 16
X[0]

In [None]:
# Use three different support vector machines for classification
from sklearn import svm
h1 = svm.LinearSVC(C=1.0)
# degree only applies to the polynomial kernel, so it is not set for the RBF kernel
h2 = svm.SVC(kernel="rbf", gamma=0.001, C=1.0)
h3 = svm.SVC(kernel="poly", degree=3, C=1.0)
In [None]:
# Fit and score on the same data: this in-sample accuracy is an optimistic estimate
h1.fit(X, y)
print(h1.score(X, y))

In [None]:
chosen_random_state = 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=chosen_random_state)
print("X train shape %s, X test shape %s, y train shape %s, y test shape %s" % (
    X_train.shape, X_test.shape, y_train.shape, y_test.shape))
# Fit on the training part only and score on the held-out test part
h1.fit(X_train, y_train)
print(h1.score(X_test, y_test))
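
The gap between in-sample and held-out accuracy is easiest to see with a model that can memorize its training data. The sketch below assumes a fully grown decision tree as a stand-in; it is not one of the SVMs used in this section, and the variable names are introduced here for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.30, random_state=1)

# An unpruned tree (assumption: used only because it can memorize the training set)
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_tr, y_tr)

train_acc = tree.score(X_tr, y_tr)  # scored on data the model has already seen
test_acc = tree.score(X_te, y_te)   # scored on data held out from training
print("train accuracy = %0.3f, test accuracy = %0.3f" % (train_acc, test_acc))
```

The training score flatters the model, which is why the held-out score is the one to report.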

In [None]:
# Use a validation set to compare the hypotheses, and keep the test set for the final estimate
chosen_random_state = 1
X_train, X_validation_test, y_train, y_validation_test = train_test_split(
    X, y, test_size=.40, random_state=chosen_random_state)
X_validation, X_test, y_validation, y_test = train_test_split(
    X_validation_test, y_validation_test, test_size=.50, random_state=chosen_random_state)
print("X train shape %s, X validation shape %s, X test shape %s,\ny train shape %s, y validation shape %s, y test shape %s\n"
      % (X_train.shape, X_validation.shape, X_test.shape,
         y_train.shape, y_validation.shape, y_test.shape))
for hypothesis in [h1, h2, h3]:
    hypothesis.fit(X_train, y_train)
    print("%s -> validation mean accuracy = %0.3f" % (
        hypothesis, hypothesis.score(X_validation, y_validation)))
# Finally, score h2 on the held-out test set
h2.fit(X_train, y_train)
print("\n%s -> test mean accuracy = %0.3f" % (h2, h2.score(X_test, y_test)))
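
The two chained calls to `train_test_split` can be wrapped in a small helper. This is a sketch only: `three_way_split` and its parameter names are hypothetical, and the demo data is synthetic so that the resulting sizes are easy to verify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(X, y, holdout_size=0.40, test_share=0.50, random_state=1):
    """Hypothetical helper: split the data into train, validation, and test parts."""
    # First cut: training part versus everything held out
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=holdout_size, random_state=random_state)
    # Second cut: divide the held-out part into validation and test
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=test_share, random_state=random_state)
    return X_train, X_val, X_test, y_train, y_val, y_test

# Synthetic data: 100 rows with unique values, so overlaps would be detectable
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100) % 2

parts = three_way_split(X_demo, y_demo)
sizes = [len(part) for part in parts[:3]]
print(sizes)  # [60, 20, 20]
```

With the defaults, 40% of the data is held out and then halved, which gives the 60/20/20 proportions used in the cell above.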

## Cross Validation

In [None]:
# The idea is to divide the training data into a number of partitions (called folds) and to train
# the model once per fold, each time on all the other folds.
# After each training, the model is tested on the fold that was left out and the score is stored.
from sklearn.model_selection import cross_val_score
import numpy as np
chosen_random_state = 1
cv_folds = 10  # Try 3, 5 or 20
eval_scoring = 'accuracy'  # Try also 'f1_macro' (plain 'f1' only works for binary targets)
workers = -1  # use all available CPU cores
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=chosen_random_state)
for hypothesis in [h1, h2, h3]:
    scores = cross_val_score(hypothesis, X_train, y_train,
                             cv=cv_folds, scoring=eval_scoring, n_jobs=workers)
    print("%s -> cross validation accuracy: mean = %0.3f std = %0.3f" % (
        hypothesis, np.mean(scores), np.std(scores)))
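
To see what `cross_val_score` does internally, the fold loop can be written out by hand. This sketch assumes the splitter is `StratifiedKFold` without shuffling, which is what scikit-learn uses for a classifier when `cv` is an integer; the variable names and the choice of 5 folds are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X_all, y_all = load_digits(return_X_y=True)
model = SVC(kernel="rbf", gamma=0.001, C=1.0)

folds = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, test_idx in folds.split(X_all, y_all):
    # Train a fresh copy of the model on every fold except the one left out
    fold_model = clone(model)
    fold_model.fit(X_all[train_idx], y_all[train_idx])
    # Test on the left-out fold and store the accuracy
    manual_scores.append(fold_model.score(X_all[test_idx], y_all[test_idx]))
manual_scores = np.array(manual_scores)

# The library call should reproduce the same per-fold scores
reference_scores = cross_val_score(model, X_all, y_all, cv=5, scoring="accuracy")
print("manual mean = %0.3f, cross_val_score mean = %0.3f" % (
    manual_scores.mean(), reference_scores.mean()))
```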