DataSystemsGroupUT/HyperParameterTunability

To tune or not to tune? A meta-learning approach for recommending important hyperparameters

This repository contains all materials for reproducing the paper "To tune or not to tune? A meta-learning approach for recommending important hyperparameters":

  • scripts for collecting performance data of 6 machine learning algorithms on 200 classification tasks from OpenML.

  • the collected performance data for the SVM, Decision Tree, Random Forest, AdaBoost, Gradient Boosting and Extra Trees classifiers.

  • several notebooks, each of which runs one experiment and presents its results.

  • new datasets derived from PerformanceData, stored in the output_csv folders.

  • tools for:

    • importing and modifying the collected data
    • finding correlations between the dataset metafeatures and classifier performance
    • conducting statistical tests to compare the performance of the classifiers over the tasks
    • computing the best value for each important hyperparameter
    • running the Wilcoxon test to verify the results
  • a script for extracting metafeatures of the datasets

  • a script for performing fANOVA on the performance data
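To illustrate the kind of cross-task comparison these tools support, here is a minimal sketch that ranks the six classifiers on each dataset and averages the ranks, which is the usual first step of a rank-based comparison such as the Friedman test. It is not the repository's code: the function names `rank_descending` and `average_ranks` are hypothetical, and the input is only the five rows shown in the `per_dataset_acc.head()` preview in this README.

```python
# Accuracy rows copied from the per_dataset_acc.head() preview in this README.
CLASSIFIERS = ["AB", "ET", "RF", "DT", "GB", "SVM"]
ACCURACIES = {
    "AP_Breast_Omentum.csv": [0.981060, 0.976235, 0.976462, 0.973912, 0.983555, 0.914538],
    "AP_Breast_Prostate.csv": [0.995238, 0.995238, 0.995238, 0.995238, 0.995238, 0.961498],
    "AP_Endometrium_Lung.csv": [0.968363, 0.958392, 0.957018, 0.929240, 0.968363, 0.894591],
    "AP_Endometrium_Prostate.csv": [0.992857, 0.992857, 0.992857, 1.000000, 1.000000, 0.984615],
    "AP_Endometrium_Uterus.csv": [0.854854, 0.837953, 0.859561, 0.827924, 0.860409, 0.758801],
}


def rank_descending(values):
    """Hypothetical helper: highest value gets rank 1, ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the group
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def average_ranks(table):
    """Hypothetical helper: mean rank of each classifier over all datasets."""
    totals = [0.0] * len(CLASSIFIERS)
    for row in table.values():
        for i, rank in enumerate(rank_descending(row)):
            totals[i] += rank
    return {name: total / len(table) for name, total in zip(CLASSIFIERS, totals)}


if __name__ == "__main__":
    # Print classifiers from best (lowest) to worst (highest) average rank.
    for name, rank in sorted(average_ranks(ACCURACIES).items(), key=lambda kv: kv[1]):
        print(f"{name}: {rank:.2f}")
```

On these five rows GB gets the lowest (best) average rank, 1.6, and SVM the highest, 6.0. Five datasets are far too few to draw conclusions from; the sketch only shows the mechanics.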

Collect data for a given classifier over all datasets

from DataCollection.functions import classification_per_algorithm

# Folder that holds the datasets
path_to_datasets = 'Datasets/'
classification_per_algorithm(path=path_to_datasets, algorithm='DecisionTree')

Conduct fANOVA on the data

from fANOVA.fanova_functions import do_fanova

# Run fANOVA on the collected AdaBoost results
do_fanova(dataset_name='PerformanceData/AB_results_total.csv', algorithm='AdaBoost')
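The internals of `do_fanova` are not shown here, so the following is only a rough sketch of the idea behind a functional ANOVA: split the variance in performance into the share explained by each hyperparameter on its own. It assumes a small, fully evaluated grid with equal weight per cell, whereas fANOVA tooling typically fits a surrogate model to scattered configurations. The hyperparameter names, the grid values and the helper `main_effect_fractions` are all invented for illustration and are not taken from PerformanceData.

```python
from statistics import fmean, pvariance

# Hypothetical toy grid: one accuracy per (learning_rate, n_estimators) pair.
# These numbers are made up for illustration only.
LEARNING_RATES = [0.01, 0.1, 1.0]
N_ESTIMATORS = [50, 100]
# Key order of AXES must match the position of each value in the config tuples.
AXES = {"learning_rate": LEARNING_RATES, "n_estimators": N_ESTIMATORS}
ACCURACY = {
    (0.01, 50): 0.70, (0.01, 100): 0.72,
    (0.1, 50): 0.85, (0.1, 100): 0.87,
    (1.0, 50): 0.78, (1.0, 100): 0.80,
}


def main_effect_fractions(grid, axes):
    """Fraction of the total variance explained by each hyperparameter alone."""
    total_var = pvariance(list(grid.values()))
    fractions = {}
    for dim, (name, levels) in enumerate(axes.items()):
        # Marginal: mean accuracy at each level, averaged over the other axis.
        marginal = [
            fmean([acc for cfg, acc in grid.items() if cfg[dim] == level])
            for level in levels
        ]
        fractions[name] = pvariance(marginal) / total_var
    return fractions


if __name__ == "__main__":
    for name, share in main_effect_fractions(ACCURACY, AXES).items():
        print(f"{name}: {share:.1%} of the variance")
```

In this toy grid the two effects are purely additive, so the two fractions sum to 1 and `learning_rate` accounts for almost all of the variance. With interactions between hyperparameters, the main effects would sum to less than 1.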

Extract Metafeatures

from tools.metafeatures import extract_for_all

path_to_datasets = 'Datasets/'
extract_for_all(path_to_datasets)

Create a Database object to load the collected data in the desired format

from Tools.database import Database
db = Database()
per_dataset_acc = db.get_per_dataset_accuracies()
per_dataset_acc.head()
   dataset                      AB        ET        RF        DT        GB        SVM
0  AP_Breast_Omentum.csv        0.981060  0.976235  0.976462  0.973912  0.983555  0.914538
1  AP_Breast_Prostate.csv       0.995238  0.995238  0.995238  0.995238  0.995238  0.961498
2  AP_Endometrium_Lung.csv      0.968363  0.958392  0.957018  0.929240  0.968363  0.894591
3  AP_Endometrium_Prostate.csv  0.992857  0.992857  0.992857  1.000000  1.000000  0.984615
4  AP_Endometrium_Uterus.csv    0.854854  0.837953  0.859561  0.827924  0.860409  0.758801

metafeatures = db.get_metafeatures()
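
Once per-dataset accuracies are loaded, a paired test such as the Wilcoxon signed-rank test can compare two classifiers across datasets. The sketch below is a plain-Python illustration under stated assumptions, not the repository's implementation: `rank_with_ties` and `wilcoxon_signed_rank` are hypothetical names, the inputs are the AB and SVM columns of the five-row preview above, and the exact p-value is found by enumerating all sign patterns, which is only practical for a handful of pairs. For the full set of tasks a library routine such as `scipy.stats.wilcoxon` would be the more sensible choice.

```python
from itertools import product

# AB and SVM columns copied from the per_dataset_acc.head() preview above.
AB = [0.981060, 0.995238, 0.968363, 0.992857, 0.854854]
SVM = [0.914538, 0.961498, 0.894591, 0.984615, 0.758801]


def rank_with_ties(values):
    """Hypothetical helper: 1-based ascending ranks, ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W, exact two-sided p-value) for the paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = rank_with_ties([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely, so count
    # the patterns whose statistic is at least as extreme as the observed one.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= w:
            hits += 1
    return w, hits / 2 ** len(ranks)


if __name__ == "__main__":
    w, p = wilcoxon_signed_rank(AB, SVM)
    print(f"W = {w}, exact two-sided p = {p}")
```

AB beats SVM on all five preview rows, so W is 0 and the exact two-sided p-value is 2/32 = 0.0625, the smallest value attainable with five pairs.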
