Examples

Here, some examples demonstrating different use cases of MultiScorer are shown.
For the exact documentation see the Documentation wiki page.

For all examples below, let's assume that the following code precedes them:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import *

digits = datasets.load_digits()
X = digits.data
y = digits.target

clf = BernoulliNB()

...

  1. Getting results from all folds for multiple metrics:

In this example, we will demonstrate how to process the results from all folds and all metrics.


First, we will start by defining a multiscorer instance:

...
from multiscorer import MultiScorer

scorer = MultiScorer({
	'accuracy' : (accuracy_score,  {}),
	'precision': (precision_score,{'average': 'macro'}),
	'recall'   : (recall_score,   {'average': 'macro'}),
	'AUC'      : (auc,             {'reorder': True}),
	'F-measure': (f1_score,        {'average': 'macro'})
})
...

What we pass as an argument to the __init__() method is actually a simple Python dict. Its keys should be human-readable names for the metrics to be used. The names are only for convenience and do not affect the behavior of the scorer.

For each key, a tuple value is expected. The first item should be the metric function itself, and the second item should be a dictionary containing any keyword arguments the function expects (to be passed as **kwargs). Use an empty dict if the function requires no extra arguments.

Note: The tuples actually mirror the signature of sklearn's make_scorer() function, which is to be expected, as this module acts as something like a decorator of that function.
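For comparison, here is a minimal sketch of that correspondence; the names entry and precision_scorer are introduced purely for illustration:

from sklearn.metrics import make_scorer, precision_score

# MultiScorer expects a (function, kwargs-dict) pair per metric ...
entry = (precision_score, {'average': 'macro'})

# ... which mirrors how the same metric would be wrapped with sklearn's make_scorer()
precision_scorer = make_scorer(precision_score, average='macro')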


Now let us perform cross validation:

...
cross_val_score(clf, X, y, scoring=scorer, cv=3)
...

You may notice that while cross_val_score typically returns an array of scores, that return value is no longer meaningful when using MultiScorer and should be ignored. Continue this example to see how to get and parse the actual results.


In order to get all scores and parse them:

...
results = scorer.get_results()

for metric_name in results.keys():
	average_metric_score = np.average(results[metric_name])
	if metric_name == 'AUC':
		print('Average AUC score : %d' % (average_metric_score))
	else:
		print('Average %s score :  %.3f' % (metric_name, average_metric_score))

The output will be the following:

Average recall score :  0.826
Average AUC score : 40
Average F-measure score :  0.824
Average precision score :  0.829
Average accuracy score :  0.826

Note: As you can see, we parse the result of the auc metric slightly differently, because its result is an integer rather than a float. This was done in order to demonstrate that MultiScorer is not affected by the return type of the metrics used. Rather, it is the client code's responsibility to parse the results accordingly.
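For instance, a minimal sketch of return-type-agnostic handling (assuming results comes from scorer.get_results() as above) could look like this:

...
import numbers

for metric_name, fold_scores in results.items():
	# Only average results that are plain numbers; print anything else as-is.
	if all(isinstance(score, numbers.Number) for score in fold_scores):
		print('Average %s score: %.3f' % (metric_name, np.average(fold_scores)))
	else:
		print('%s (per fold): %s' % (metric_name, fold_scores))
...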


  2. Handling different fold results separately:

In this example, we aim to print results for all metrics fold after fold. In order to do this, we make use of the fold argument of get_results() method. The first steps (creating the scorer and performing cross validation) are exactly as before.


In order to get the results per fold:

...
for fold_number in range(1, 3+1):
	fold_results = scorer.get_results(fold=fold_number)

	print('Fold %d:' %(fold_number))
	for metric_name in fold_results.keys():
		print(' %s : %f' % (metric_name, fold_results[metric_name]) )

The output now will be:

Fold 1:
 recall : 0.835591
 AUC : 38.000000
 F-measure : 0.834746
 precision : 0.837973
 accuracy : 0.835548
Fold 2:
 recall : 0.805459
 AUC : 43.000000
 F-measure : 0.800389
 precision : 0.806128
 accuracy : 0.804674
Fold 3:
 recall : 0.837087
 AUC : 41.500000
 F-measure : 0.836500
 precision : 0.842100
 accuracy : 0.837248

Note: Fold indices start from 1 rather than 0, as they are meant to be human readable.
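If you prefer not to hard-code the number of folds, a minimal sketch (assuming the scorer has already been populated by cross_val_score) is to recover it from the length of any metric's result list:

...
n_folds = len(scorer.get_results(metric='accuracy'))
for fold_number in range(1, n_folds + 1):
	fold_results = scorer.get_results(fold=fold_number)
	print('Fold %d: %s' % (fold_number, fold_results))
...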


  3. Handling specific metric results separately:

The reason this module asks for names when defining the metrics to be used is to make it easy to retrieve specific scores. This can be done by using the metric parameter of the get_results() method:

...
print('Average accuracy score: %.3f' % (np.average(scorer.get_results(metric='accuracy'))))

Note: The name 'accuracy' is the one we gave when initializing our scorer.


To illustrate how this feature can be of use, in the next example our goal is to print the confusion matrix of the classifier. If this were done inside the previous for loop, it would result in messy code. Instead, we can use the following approach:

scorer = MultiScorer({
	'accuracy'   : (accuracy_score,  {}),
	'precision'  : (precision_score,{'average': 'macro'}),
	'recall'     : (recall_score,   {'average': 'macro'}),
	'conf-matrix': (confusion_matrix, {})
})

...

def get_average_matrix(matrix_list):
	# Element-wise average of a list of equally-shaped numpy arrays.
	average_matrix = matrix_list[0]
	for i in range(1, len(matrix_list)):
		average_matrix = average_matrix + matrix_list[i]
	return average_matrix / len(matrix_list)

def pretty_print_confusion_matrix(matrix):
	# Print a 2D numpy array as an ASCII table.
	print('Confusion Matrix:\n+' + '--+' * matrix.shape[1])
	for i in range(matrix.shape[0]):
		printable = '|'
		for j in range(matrix.shape[1]):
			printable += '%2d' % (matrix[i, j]) + '|'
		print(printable)
		print('+' + '--+' * matrix.shape[1])

results = scorer.get_results()

all_conf_matrices = scorer.get_results('conf-matrix', fold='all')
average_matrix = get_average_matrix(all_conf_matrices)
pretty_print_confusion_matrix(average_matrix)


for metric_name in results.keys():
	if metric_name == 'conf-matrix':
		continue
	
	average_metric_score = np.average(results[metric_name])
	print('%s : %.2f' %(metric_name, average_metric_score))

This will output the following:

Confusion Matrix:
+--+--+--+--+--+--+--+--+--+--+
|58| 0| 0| 0| 1| 0| 0| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0|30| 8| 0| 1| 0| 0| 0|13| 5|
+--+--+--+--+--+--+--+--+--+--+
| 0| 2|48| 4| 0| 0| 0| 0| 3| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 1|51| 0| 0| 0| 0| 2| 3|
+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 0| 0|56| 0| 0| 2| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 0| 0| 1|48| 1| 0| 1| 7|
+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 0| 0| 0| 1|56| 0| 0| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 0| 0| 0| 2| 0| 0|56| 1| 0|
+--+--+--+--+--+--+--+--+--+--+
| 0| 6| 1| 0| 0| 2| 1| 0|42| 4|
+--+--+--+--+--+--+--+--+--+--+
| 0| 4| 0| 1| 0| 0| 0| 3| 2|47|
+--+--+--+--+--+--+--+--+--+--+
recall : 0.83
precision : 0.83
accuracy : 0.83

The key line in this code is all_conf_matrices = scorer.get_results('conf-matrix', fold='all'), where we query for the confusion matrix results separately.
The functions defined in the code above compute the element-wise average of multiple numpy arrays and print an array in that tabular format; they are not part of the API.
As you can see, the result of the confusion_matrix metric is an array rather than a single number, so it does not make sense to parse it along with the rest of the numeric results. That is why we made a separate call for it.
Note: The parameter fold='all' is redundant, since 'all' is the default value.
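Incidentally, the element-wise average could also be computed directly with numpy; a minimal sketch (assuming all folds produce matrices of the same shape) would be:

...
average_matrix = np.mean(np.array(all_conf_matrices), axis=0)
pretty_print_confusion_matrix(average_matrix)
...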


  4. Using custom metrics in MultiScorer

MultiScorer is built to support the use of sklearn's metric functions; however, any function with the signature metric(y, yPred, **kwargs) can be used, with any return value.
This way you can implement your own metrics and use them alongside the built-in ones:

...
def dummy_metric(y, yPred, useless_param="I won't be used."):
	return 42

scorer = MultiScorer({
	'dummy' : (dummy_metric, {'useless_param': "doesn't matter anyway"})
})

cross_val_score(clf, X, y, scoring=scorer, cv=3)

all_42s = scorer.get_results('dummy')
print(all_42s)

This will of course print:

[42, 42, 42]

A (nice) way of using that functionality is to get the actual predictions (and/or the actual labels) of the testing fold. To do so, simply have the metric function return the argument yPred (and/or y):

...
def results_getter(y, yPred):
	return (y, yPred)


scorer = MultiScorer({
	'results' : (results_getter, {}),
	'accuracy': (accuracy_score, {})
})

cross_val_score(clf, X, y, scoring=scorer, cv=3)

print('Accuracy of fold 1: %.3f' % (scorer.get_results('accuracy', fold=1)))

actual_labels, predictions = scorer.get_results('results', fold=1)
print('Predicted:')
print(predictions)
print('Actual:')
print(actual_labels)

which will print:

Accuracy of fold 1: 0.836
Predicted:
[0 1 1 3 4 3 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 4 8 3 0 9 5 5 6 5 0
 6 8 9 8 4 1 4 7 3 8 1 0 0 8 1 7 8 1 0 1 1 6 3 3 7 3 3 4 6 6 6 4 7 1 5 0 9
 5 1 8 1 0 0 1 7 6 3 2 1 7 1 6 3 1 3 3 1 7 1 8 4 3 1 4 0 5 8 6 9 1 1 7 5 4
 4 7 2 8 1 8 5 7 9 8 4 1 2 4 3 0 8 9 1 0 9 2 3 4 5 6 7 8 9 0 1 8 3 4 5 6 7
 8 9 0 1 2 3 4 5 6 7 8 9 0 3 5 5 6 5 0 9 8 9 1 4 4 7 7 3 5 9 0 0 2 2 7 8 2
 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8 1 0 0 4 7 6 3 2 1 7 3 9 3 9 1
 7 6 8 4 3 4 4 0 5 3 6 9 6 4 7 5 4 4 7 2 8 2 2 5 5 4 8 1 4 9 0 8 8 8 0 1 2
 3 4 9 6 7 8 7 0 1 2 3 4 5 6 7 8 7 0 4 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9
 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 1 1 5 0 7 5 2 8
 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 7 1 7 6 8 4 2 1 4 0 9 3 6 7 6 5 7 5 4 4 7 2
 8 2 2 5 7 9 5 4 8 8 4 4 0 8 7 3 0 9 2 3 4 5 6 7 8 9 0 1 2 3 4 5 8 7 8 9 0
 9 2 0 4 5 6 7 8 9 0 8 5 5 5 8 0 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 9 2 0 2 2
 6 3 9 4 3 3 4 6 6 6 4 9 1 9 0 9 5 2 8 2 0 0 1 4 6 3 2 1 7 4 6 3 1 3 9 1 7
 6 8 4 3 1 4 0 5 3 6 8 8 1 7 5 4 4 7 2 9 2 8 5 7 9 5 4 8 8 4 9 0 8 9 8 0 2
 2 9 4 4 6 7 1 9 0 2 8 3 7 9 6 7 8 9 0 2 2 2 4 9 6 7 8 9 0 8 5 9 6 9 0 9 8
 8 1 4 2 7 7 3 9 2 0 0 2 2 7 8 2 0 2 2 5 9 3 7 8 9 4 6 6 6 4 9 6 5 9 0 8 2
 7 4 2 7 4 5 9 4 4 4]
Actual:
[0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0
 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9
 5 2 8 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4
 4 7 2 8 2 2 5 7 9 5 4 8 8 4 9 0 8 9 8 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2
 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8 2 0 0 1 7 6 3 2 1 7 3 1 3 9 1
 7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2 8 2 2 5 5 4 8 8 4 9 0 8 9 8 0 1 2
 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9
 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8
 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2
 8 2 2 5 7 9 5 4 8 8 4 9 0 8 9 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2
 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7
 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2 8 2 2 5 7 9 5 4 8 8 4 9 0 8 9 8 0 1
 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8
 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 9 5 8 1
 7 6 1 7 4 6 9 4 4 4]
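
Once a fold's labels and predictions have been recovered this way, any sklearn metric can be applied to them directly. For instance, a minimal sketch printing a per-class report for fold 1 (not part of the original example) could be:

...
from sklearn.metrics import classification_report

actual_labels, predictions = scorer.get_results('results', fold=1)
print(classification_report(actual_labels, predictions))
...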