Open more ML model parameters to optimization #49

Closed
rhiever opened this issue Dec 7, 2015 · 2 comments
rhiever commented Dec 7, 2015

In #39 we discussed which parameters may be important to open up to the search for the various ML models in TPOT. The sklearn devs have a general sense of some of the important parameters, listed below, but this is not an exhaustive list.

I think it would be valuable at some point to explore what parameters are most important to optimize for the various models used in TPOT, as I discussed here.

# Suggested hyperparameter grids for each scikit-learn model.
_DEFAULT_PARAM_GRIDS = {'AdaBoostClassifier':
                        [{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'AdaBoostRegressor':
                        [{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'DecisionTreeClassifier':
                        [{'max_features': ["auto", None]}],
                        'DecisionTreeRegressor':
                        [{'max_features': ["auto", None]}],
                        'ElasticNet':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'GradientBoostingClassifier':
                        [{'max_depth': [1, 3, 5]}],
                        'GradientBoostingRegressor':
                        [{'max_depth': [1, 3, 5]}],
                        'KNeighborsClassifier':
                        [{'n_neighbors': [1, 5, 10, 100],
                          'weights': ['uniform', 'distance']}],
                        'KNeighborsRegressor':
                        [{'n_neighbors': [1, 5, 10, 100],
                          'weights': ['uniform', 'distance']}],
                        'Lasso':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'LinearRegression':
                        [{}],
                        'LinearSVC':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'LogisticRegression':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SVC': [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                                 'gamma': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'MultinomialNB':
                        [{'alpha': [0.1, 0.25, 0.5, 0.75, 1.0]}],
                        'RandomForestClassifier':
                        [{'max_depth': [1, 5, 10, None]}],
                        'RandomForestRegressor':
                        [{'max_depth': [1, 5, 10, None]}],
                        'Ridge':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SGDClassifier':
                        [{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
                          'penalty': ['l1', 'l2', 'elasticnet']}],
                        'SGDRegressor':
                        [{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
                          'penalty': ['l1', 'l2', 'elasticnet']}],
                        'LinearSVR':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SVR':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                          'gamma': [0.01, 0.1, 1.0, 10.0, 100.0]}]}
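For context, a minimal sketch (not TPOT code) of how one of these grids could be fed to scikit-learn's GridSearchCV to tune a single model. The dataset and cv value are illustrative, and older scikit-learn releases expose GridSearchCV from sklearn.grid_search rather than sklearn.model_selection:

# Sketch: tune RandomForestClassifier using its grid from
# _DEFAULT_PARAM_GRIDS above; dataset and cv choice are illustrative.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

param_grid = [{'max_depth': [1, 5, 10, None]}]  # copied verbatim from above
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)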
kadarakos (Contributor) commented:

I was thinking about adding a couple more classifiers, and I have a small suggestion for string-valued hyperparameters. For each string-valued hyperparameter we could simply pass an int value that indexes into the list of possible choices defined in the pipeline operator for that particular classifier, e.g. for RandomForest, something like:

max_features = ["auto", "log2", None][index1]
class_weight = ["balanced", "balanced_subsample"][index2]

The search space for a numeric max_features is 1 < x <= n_features, but by simply indexing into the predefined options we can at least try out reasonable choices. The same goes for class_weight.
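A minimal sketch of this indexing scheme (decode_categorical is a hypothetical helper, not existing TPOT code); taking the gene modulo the number of options keeps any integer the optimizer produces inside bounds:

def decode_categorical(options, gene):
    # Hypothetical helper: map an arbitrary integer gene onto a fixed
    # list of options; modulo keeps the index in range.
    return options[gene % len(options)]

max_features = decode_categorical(["auto", "log2", None], 7)              # "log2"
class_weight = decode_categorical(["balanced", "balanced_subsample"], 4)  # "balanced"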


rhiever commented Feb 5, 2016

What about hyperparameters that have a mixture of string and integer values? I think that's why I used if statements to handle string hyperparameters -- many of them have a mixture.
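For illustration, a hypothetical decoder (decode_max_features is not an existing TPOT function) showing how if statements can map a single integer gene onto a hyperparameter that mixes strings, ints, and None; the exact gene-to-value mapping here is an assumption:

def decode_max_features(gene, n_features):
    # Hypothetical mapping: reserve small gene values for the None and
    # string cases, and treat anything larger as an integer count.
    if gene <= 0:
        return None
    elif gene == 1:
        return "auto"
    elif gene == 2:
        return "log2"
    else:
        return min(gene, n_features)  # clip into the valid 1..n_features range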

> I was thinking about adding a couple more classifiers

What classifiers are you thinking of? :-)
