Open more ML model parameters to optimization #49

Closed
rhiever opened this issue Dec 7, 2015 · 2 comments
rhiever commented Dec 7, 2015

In #39 we discussed which parameters may be important to open up to the search for the various ML models in TPOT. The sklearn devs have a general sense of some of the important parameters, listed below, but this is not an exhaustive list.

I think it would be valuable at some point to explore what parameters are most important to optimize for the various models used in TPOT, as I discussed here.

# Suggested hyperparameter grids for each scikit-learn model.
_DEFAULT_PARAM_GRIDS = {'AdaBoostClassifier':
                        [{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'AdaBoostRegressor':
                        [{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'DecisionTreeClassifier':
                        [{'max_features': ["auto", None]}],
                        'DecisionTreeRegressor':
                        [{'max_features': ["auto", None]}],
                        'ElasticNet':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'GradientBoostingClassifier':
                        [{'max_depth': [1, 3, 5]}],
                        'GradientBoostingRegressor':
                        [{'max_depth': [1, 3, 5]}],
                        'KNeighborsClassifier':
                        [{'n_neighbors': [1, 5, 10, 100],
                          'weights': ['uniform', 'distance']}],
                        'KNeighborsRegressor':
                        [{'n_neighbors': [1, 5, 10, 100],
                          'weights': ['uniform', 'distance']}],
                        'Lasso':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'LinearRegression':
                        [{}],
                        'LinearSVC':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'LogisticRegression':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SVC': [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                                 'gamma': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'MultinomialNB':
                        [{'alpha': [0.1, 0.25, 0.5, 0.75, 1.0]}],
                        'RandomForestClassifier':
                        [{'max_depth': [1, 5, 10, None]}],
                        'RandomForestRegressor':
                        [{'max_depth': [1, 5, 10, None]}],
                        'Ridge':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SGDClassifier':
                        [{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
                          'penalty': ['l1', 'l2', 'elasticnet']}],
                        'SGDRegressor':
                        [{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
                          'penalty': ['l1', 'l2', 'elasticnet']}],
                        'LinearSVR':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SVR':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                          'gamma': [0.01, 0.1, 1.0, 10.0, 100.0]}]}
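For context, a minimal sketch (not TPOT code) of how one of these grids could be fed to scikit-learn's GridSearchCV to tune a single model. The dataset and cv value are illustrative, and older scikit-learn releases expose GridSearchCV from sklearn.grid_search rather than sklearn.model_selection:

# Sketch: tune RandomForestClassifier using its grid from
# _DEFAULT_PARAM_GRIDS above; dataset and cv choice are illustrative.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

param_grid = [{'max_depth': [1, 5, 10, None]}]  # copied verbatim from above
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)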
kadarakos (Contributor) commented:

I was thinking about adding a couple more classifiers, and I have a small suggestion for string-valued hyperparameters. For each string-valued hyperparameter we could simply pass an int value that indexes into the list of possible choices defined in the pipeline operator for that particular classifier, e.g. for RandomForest, something like:

max_features = ["auto", "log2", None][index1]
class_weight = ["balanced", "balanced_subsample"][index2]

The search space for a numeric max_features is 1 < x <= n_features, but by simply indexing into the predefined options we can at least try out reasonable choices. The same goes for class_weight.
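A minimal sketch of this indexing scheme (decode_categorical is a hypothetical helper, not existing TPOT code); taking the gene modulo the number of options keeps any integer the optimizer produces inside bounds:

def decode_categorical(options, gene):
    # Hypothetical helper: map an arbitrary integer gene onto a fixed
    # list of options; modulo keeps the index in range.
    return options[gene % len(options)]

max_features = decode_categorical(["auto", "log2", None], 7)              # "log2"
class_weight = decode_categorical(["balanced", "balanced_subsample"], 4)  # "balanced"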


rhiever commented Feb 5, 2016

What about hyperparameters that have a mixture of string and integer values? I think that's why I used if statements to handle string hyperparameters -- many of them have a mixture.
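For illustration, a hypothetical decoder (decode_max_features is not an existing TPOT function) showing how if statements can map a single integer gene onto a hyperparameter that mixes strings, ints, and None; the exact gene-to-value mapping here is an assumption:

def decode_max_features(gene, n_features):
    # Hypothetical mapping: reserve small gene values for the None and
    # string cases, and treat anything larger as an integer count.
    if gene <= 0:
        return None
    elif gene == 1:
        return "auto"
    elif gene == 2:
        return "log2"
    else:
        return min(gene, n_features)  # clip into the valid 1..n_features range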

> I was thinking about adding a couple more classifiers

What classifiers are you thinking of? :-)
