Friedman tests for comparing multiple methods across datasets in python

benjaminpatrickevans/MethodComparisonsInPython

Statistical Comparisons of Classifiers in Python

This repository summarizes the statistical tests currently recommended for comparing classification algorithms. Everything is implemented in Python.

A summary of the current literature is given in the slides below.

Slides here

Just tell me what to use (TLDR)

Flow chart

Statistical comparisons of 2 classifiers on a single dataset

The suggestion in Approximate statistical tests for comparing supervised classification learning algorithms (1998) is to perform a 5x2 cv paired t-test (5 repetitions of 2-fold cross-validation). This was extended in Combined 5 × 2 cv F Test for Comparing Supervised Classification Learning Algorithms to the 5x2 cv F-test, which the original authors above also recommend as the new standard.

These tests are implemented in mlxtend

5x2 t-test (note: prefer the F-test extension below)

5x2 F-test
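To make the two statistics concrete, here is a minimal standard-library sketch that computes them from the ten per-fold score differences. It is an illustration only, not the mlxtend implementation: the function name `five_by_two_cv_stats` and the example differences are made up, and instead of p-values it compares against tabulated 5% critical values (2.571 for a two-sided t with 5 degrees of freedom, 4.74 for F with 10 and 5 degrees of freedom).

```python
import math

# Tabulated 5% critical values (assumed significance level: 0.05).
T_CRIT_5DF = 2.571      # two-sided t, 5 degrees of freedom
F_CRIT_10_5DF = 4.74    # F, 10 and 5 degrees of freedom


def five_by_two_cv_stats(diffs):
    """Return (t, F) for 5 replications of 2-fold CV.

    diffs: five (fold_1, fold_2) pairs, each the score of classifier A
    minus the score of classifier B on that fold.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications of 2 folds")

    # Per-replication variance estimate s_i^2.
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    var_sum = sum(variances)
    if var_sum == 0:
        raise ValueError("all replications have zero variance")

    # t-test: only the very first difference appears in the numerator.
    t_stat = diffs[0][0] / math.sqrt(var_sum / 5.0)
    # F-test: all ten squared differences appear in the numerator.
    f_stat = sum(p ** 2 for pair in diffs for p in pair) / (2.0 * var_sum)
    return t_stat, f_stat


# Made-up accuracy differences, for illustration only.
example_diffs = [(0.02, 0.04), (0.03, 0.01), (0.05, 0.03),
                 (0.02, 0.02), (0.04, 0.02)]
t_stat, f_stat = five_by_two_cv_stats(example_diffs)
print(f"t = {t_stat:.3f}, reject at 0.05: {abs(t_stat) > T_CRIT_5DF}")
print(f"F = {f_stat:.3f}, reject at 0.05: {f_stat > F_CRIT_10_5DF}")
```

The two tests can disagree on the same data because the t statistic depends on a single fold difference, whereas the F statistic pools all ten.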

Alternatively, if computational resources only allow a method to be run a single time (unlike the 10 train/test runs required above), the only acceptable statistical test appears to be McNemar's test, which is also implemented in mlxtend.
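As a rough illustration of what McNemar's test looks at, the sketch below counts the test examples on which exactly one of the two classifiers is correct and computes an exact two-sided binomial p-value. It uses only the standard library; the function name `mcnemar_exact` and the labels are hypothetical, and it is not a substitute for the mlxtend implementation.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test on one shared test set.

    Returns (b, c, p_value), where b counts examples that only
    classifier A gets right and c counts examples that only B gets right.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (other == truth)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    n = b + c
    if n == 0:
        # The classifiers never disagree, so there is no evidence either way.
        return b, c, 1.0

    # Under the null hypothesis each disagreement favours A or B with
    # probability 0.5, so min(b, c) follows a Binomial(n, 0.5) tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
pred_b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"b = {b}, c = {c}, p = {p_value:.4f}")
```

Examples that both classifiers get right, or both get wrong, do not enter the statistic at all.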

Statistical comparisons of 2 classifiers on multiple datasets

If we are comparing 2 classifiers across several datasets, the suggestion in Statistical Comparisons of Classifiers over Multiple Data Sets (2006) is to perform a Wilcoxon signed-rank test on the paired per-dataset scores of the two classifiers.

This is shown in:
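Separately from the repository's own example, the following is a minimal standard-library sketch of the Wilcoxon signed-rank test with an exact p-value, suitable for a small number of datasets. The function names and accuracies are made up for illustration, and dropping zero differences is a simplifying assumption. In practice `scipy.stats.wilcoxon` provides this test.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(scores_a, scores_b):
    """Return (T, exact two-sided p) for paired per-dataset scores."""
    # Round to suppress floating-point noise, then drop zero differences
    # (a simplifying assumption of this sketch).
    diffs = [round(a - b, 10) for a, b in zip(scores_a, scores_b)]
    diffs = [d for d in diffs if d != 0]
    if not diffs:
        return 0.0, 1.0

    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_stat = min(w_plus, w_minus)

    # Exact null distribution of the positive rank sum: each rank is
    # included with probability 0.5. Doubled ranks keep the keys integral.
    counts = {0: 1}
    for r in ranks:
        step = int(round(2 * r))
        nxt = defaultdict(int)
        for total, ways in counts.items():
            nxt[total] += ways
            nxt[total + step] += ways
        counts = nxt
    lower = sum(ways for total, ways in counts.items() if total <= 2 * t_stat)
    p_value = min(1.0, 2.0 * lower / 2 ** len(ranks))
    return t_stat, p_value


# Made-up accuracies of two classifiers on 8 datasets.
acc_a = [0.85, 0.78, 0.92, 0.70, 0.88, 0.81, 0.76, 0.95]
acc_b = [0.80, 0.75, 0.91, 0.72, 0.80, 0.75, 0.72, 0.88]
t_stat, p_value = wilcoxon_signed_rank(acc_a, acc_b)
print(f"T = {t_stat}, p = {p_value:.4f}")
```

The exact enumeration is only practical for a modest number of datasets; for larger samples a normal approximation is normally used instead.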

Statistical comparisons of multiple classifiers over multiple data sets

If we are comparing multiple methods across multiple datasets, the suggestions in Statistical Comparisons of Classifiers over Multiple Data Sets (2006) are to perform a Friedman test, followed by either Nemenyi post hoc analysis (when comparing all methods with each other) or a family-wise error rate (FWER) correction when comparing against a control classifier (i.e. comparing several methods to one control method rather than making all pairwise comparisons).

Both tests are shown in the following Jupyter notebook
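Independently of the notebook, here is a minimal standard-library sketch of the Friedman statistic and the Nemenyi critical difference (CD). The function names and the score table are made up for illustration. The statistic uses the textbook formula without a tie correction, and the `Q_ALPHA_05` values are the tabulated 5% critical values for the Nemenyi test for 2 to 10 methods. For a p-value, `scipy.stats.friedmanchisquare` can be used.

```python
import math
from itertools import combinations

# Critical values q_alpha for the Nemenyi test at alpha = 0.05,
# indexed by the number of methods k.
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
              7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_nemenyi(scores, names):
    """scores: one row per dataset, one column per method (higher is better).

    Returns (mean_ranks, friedman_chi2, critical_difference, significant_pairs).
    """
    n, k = len(scores), len(names)
    rank_sums = [0.0] * k
    for row in scores:
        # Negate so that the best (highest) score gets rank 1.
        for j, r in enumerate(average_ranks([-s for s in row])):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]

    # Friedman chi-square statistic (no tie correction in this sketch).
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0)

    # Two methods differ significantly if their mean ranks differ by more
    # than the critical difference.
    cd = Q_ALPHA_05[k] * math.sqrt(k * (k + 1) / (6.0 * n))
    pairs = [(names[i], names[j]) for i, j in combinations(range(k), 2)
             if abs(mean_ranks[i] - mean_ranks[j]) > cd]
    return mean_ranks, chi2, cd, pairs


# Made-up accuracies: 6 datasets (rows) x 3 methods (columns).
names = ["A", "B", "C"]
scores = [[0.90, 0.85, 0.80],
          [0.88, 0.84, 0.79],
          [0.75, 0.77, 0.70],
          [0.93, 0.90, 0.86],
          [0.81, 0.78, 0.79],
          [0.86, 0.83, 0.80]]
mean_ranks, chi2, cd, pairs = friedman_nemenyi(scores, names)
print("mean ranks:", [round(r, 3) for r in mean_ranks])
print(f"Friedman chi2 = {chi2:.3f}, Nemenyi CD = {cd:.3f}")
print("significantly different pairs:", pairs)
```

The post hoc comparison is only meaningful once the Friedman test itself has rejected the null hypothesis that all methods perform equally.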


All credits to the original authors of the above articles
