Jason2Brownlee/SKLearnArena
Scikit-Learn Arena

Elo Ratings for classification and regression algorithms in scikit-learn.

This is just a proof of concept based on ideas from the LLM Chatbot Arena and AutoML Arena.

Elo Ratings

Elo ratings, originally developed for chess rankings, are a method for calculating the relative skill levels of players based on their head-to-head match outcomes. Here's how they work:

Each player starts with a base rating (e.g. 1200).

After each match, ratings are updated based on:

  1. The match outcome (win/loss/draw)
  2. The rating difference between players
  3. A K-factor that determines how much ratings can change per match

The core principle is that winning against a stronger opponent should increase your rating more than winning against a weaker one. Similarly, losing to a stronger opponent should decrease your rating less than losing to a weaker one.

The basic update formula works like this: After each match, the winner gains points while the loser loses points. The number of points exchanged depends on the expected probability of winning (calculated from rating difference) compared to the actual outcome.

If a heavily favored player wins, they gain few points while their opponent loses few points. But if an underdog wins, they gain many points while the favorite loses many points.
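The update described above can be sketched in a few lines of Python. This is a minimal illustration, not code from this repository; the base rating of 1200 and K-factor of 32 are common defaults, used here as assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one match."""
    p_win = expected_score(winner, loser)
    exchange = k * (1.0 - p_win)  # points transferred from loser to winner
    return winner + exchange, loser - exchange

# An underdog win moves ratings far more than a favorite's win:
print(update_elo(1200, 1400))  # underdog wins: large point exchange
print(update_elo(1400, 1200))  # favorite wins: small point exchange
```

Because the exchange is symmetric, ratings are zero-sum: whatever the winner gains, the loser gives up.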

Rate scikit-learn Algorithms

We can assign all scikit-learn algorithms an Elo rating, then rank algorithms by their rating.

  1. Select a collection of standard datasets (e.g., from scikit-learn's built-in datasets or UCI repository)

  2. For each dataset:

    • Evaluate each algorithm on the dataset (e.g., repeated k-fold cross-validation).
    • Record a performance metric (e.g., accuracy for classification, MSE for regression).
    • For each pair of algorithms, treat the one with the better performance as the "winner".
    • Update both algorithms' Elo ratings based on these pairwise "matches".
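The procedure above can be sketched with scikit-learn as follows. This is a hypothetical, stripped-down version (three classifiers, two built-in datasets, a single 5-fold cross-validation); the actual arena scripts may differ in models, datasets, and evaluation settings.

```python
from itertools import combinations

from sklearn.datasets import load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

K = 32  # assumed K-factor

def expected(a: float, b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((b - a) / 400))

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}
ratings = {name: 1200.0 for name in models}

for loader in (load_iris, load_wine):
    X, y = loader(return_X_y=True)
    # Score every algorithm on this dataset.
    scores = {name: cross_val_score(m, X, y, cv=5).mean()
              for name, m in models.items()}
    # Play one "match" per pair of algorithms.
    for a, b in combinations(models, 2):
        if scores[a] == scores[b]:
            continue  # treat exact ties as no match
        winner, loser = (a, b) if scores[a] > scores[b] else (b, a)
        gain = K * (1.0 - expected(ratings[winner], ratings[loser]))
        ratings[winner] += gain
        ratings[loser] -= gain

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {rating:8.2f}")
```

Because every match exchanges points symmetrically, the total of all ratings stays at the starting sum; only the relative ordering changes.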

For example, if you have 3 algorithms (RandomForest, SVM, LogisticRegression) and 5 datasets:

  • Start each algorithm at 1200 Elo
  • On Dataset1, if RandomForest gets 0.85 accuracy and SVM gets 0.80:
    • RandomForest "wins" vs SVM
    • Update both Elo scores accordingly
  • Continue this for all algorithm pairs on all datasets

The final Elo ratings reflect each algorithm's relative performance across all datasets, accounting for:

  • Consistency (performing well across many datasets)
  • Quality of competition (beating strong algorithms counts for more than beating weak ones)

Note that plain Elo scores each match as a simple win, loss, or draw, so the margin of victory (how much better one score is than another) is not captured unless the update rule is extended.

Classification Arena

See arena_classification.py.

Results:

Current Rankings:
                         Algorithm   Elo Rating
0                              SVC  1483.476429
1             KNeighborsClassifier  1436.259629
2             ExtraTreesClassifier  1418.984442
3           RandomForestClassifier  1399.931241
4                    MLPClassifier  1394.189639
5   HistGradientBoostingClassifier  1360.896926
6        RadiusNeighborsClassifier  1339.349720
7             LogisticRegressionCV  1333.265391
8               LogisticRegression  1317.042033
9                            NuSVC  1297.331826
10      GradientBoostingClassifier  1289.215738
11                       LinearSVC  1264.363682
12     PassiveAggressiveClassifier  1239.713321
13                   SGDClassifier  1236.419448
14                      Perceptron  1182.985894
15                 RidgeClassifier  1177.249455
16               RidgeClassifierCV  1149.653122
17                 NearestCentroid  1090.004834
18               BaggingClassifier  1076.131953
19                   MultinomialNB  1063.118807
20          DecisionTreeClassifier  1044.169226
21                      GaussianNB  1035.428305
22                     BernoulliNB  1027.888777
23                    ComplementNB  1011.798905
24             ExtraTreeClassifier  1007.745416
25                   CategoricalNB   986.312061
26       GaussianProcessClassifier   983.557891
27              AdaBoostClassifier   953.515889

Regression Arena

See arena_regression.py.

Results:

Current Rankings:
                        Algorithm   Elo Rating
0   HistGradientBoostingRegressor  1472.224472
1             ExtraTreesRegressor  1467.166315
2           RandomForestRegressor  1451.170396
3       GradientBoostingRegressor  1421.210202
4                BaggingRegressor  1413.318732
5                    MLPRegressor  1395.907021
6                           NuSVR  1369.294900
7                             SVR  1354.186504
8             KNeighborsRegressor  1318.997026
9           DecisionTreeRegressor  1315.919028
10                           Lars  1292.044124
11     TransformedTargetRegressor  1282.915341
12                        RidgeCV  1246.612762
13               LinearRegression  1221.887239
14                        LassoCV  1204.077560
15                          Ridge  1182.862180
16                   ElasticNetCV  1180.618743
17             ExtraTreeRegressor  1155.701573
18                         LarsCV  1148.494488
19              AdaBoostRegressor  1129.530651
20      OrthogonalMatchingPursuit  1114.509803
21               TweedieRegressor  1086.206851
22                     ElasticNet  1082.474490
23                 DummyRegressor  1070.439746
24                          Lasso  1038.544663
25                 HuberRegressor  1027.773147
26                      LinearSVR  1003.927533
27                RANSACRegressor   979.974378
28     PassiveAggressiveRegressor   951.083042
29              TheilSenRegressor   926.625594
30                   SGDRegressor   894.301497

Improvements

  • More datasets (e.g. openml).
  • More algorithms (e.g. xgboost, catboost, lightgbm, etc.)
  • Capture the number of matches, wins, losses, win rate, etc.
  • Confidence intervals.
