## Hyperparameters
https://machinelearningmastery.com/hyperparameters-for-classification-machine-learning-algorithms/

### Logistic Regression
Logistic regression does not really have any critical hyperparameters to tune.

Sometimes, you can see useful differences in performance or convergence with different solvers (solver).

* solver in ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']

Regularization (penalty) can sometimes be helpful.

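* penalty in [None, 'l1', 'l2', 'elasticnet'] (recent scikit-learn versions use None for no penalty; older versions accepted the string 'none')

Note: not all solvers support all regularization terms. 'liblinear' supports 'l1' and 'l2'; 'newton-cg', 'lbfgs' and 'sag' support only 'l2' or no penalty; 'saga' supports all of them, and 'elasticnet' additionally needs an l1_ratio value.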
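Because only some solver/penalty pairs are valid, one option is to pass GridSearchCV a list of sub-grids so each solver is only tried with the penalties it supports. This is a minimal sketch, assuming a recent scikit-learn; the small synthetic dataset and the l1_ratio values are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Small illustrative dataset (assumption: not the dataset used in the cell below)
X, y = make_blobs(n_samples=200, centers=2, n_features=10, cluster_std=5, random_state=1)
c_values = [100, 10, 1.0, 0.1, 0.01]

# One sub-grid per group of solvers, so no unsupported solver/penalty pair is tried
param_grid = [
    {"solver": ["newton-cg", "lbfgs", "sag"], "penalty": ["l2"], "C": c_values},
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": c_values},
    # elasticnet is only available with saga and needs a mixing ratio
    {"solver": ["saga"], "penalty": ["elasticnet"], "l1_ratio": [0.25, 0.5, 0.75], "C": c_values},
]

# error_score="raise" makes an invalid combination fail loudly instead of scoring 0
grid_search = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=3,
                           scoring="accuracy", error_score="raise")
grid_search.fit(X, y)
print(grid_search.best_params_)
```

Compared with a single flat grid plus `error_score=0`, this avoids wasting fits on combinations that can never succeed.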

The C parameter is the inverse of the regularization strength (smaller values mean a stronger penalty), and tuning it can also be effective.

* C in [100, 10, 1.0, 0.1, 0.01]
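To see that C works in the opposite direction to a penalty strength, a minimal sketch (assuming an illustrative synthetic dataset) can fit an L2-penalized model at each value in the list and measure the size of the learned weights; smaller C should give smaller coefficients.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Illustrative dataset (assumption)
X, y = make_blobs(n_samples=300, centers=2, n_features=5, cluster_std=5, random_state=1)

coef_norms = {}
for C in [100, 10, 1.0, 0.1, 0.01]:
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    # L2 norm of the weights; the intercept is not penalized, so it is left out
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} ||w||={coef_norms[C]:.4f}")
```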

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html


In [1]:
# example of grid searching key hyperparameters for logistic regression
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = LogisticRegression()
solvers = ['newton-cg', 'lbfgs', 'liblinear']
penalty = ['l2']
c_values = [100, 10, 1.0, 0.1, 0.01]
# define grid search
grid = dict(solver=solvers,penalty=penalty,C=c_values)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.960000 using {'C': 0.01, 'penalty': 'l2', 'solver': 'newton-cg'}
0.954333 (0.014761) with: {'C': 100, 'penalty': 'l2', 'solver': 'newton-cg'}
0.952000 (0.015578) with: {'C': 100, 'penalty': 'l2', 'solver': 'lbfgs'}
0.952333 (0.016265) with: {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
0.954667 (0.015434) with: {'C': 10, 'penalty': 'l2', 'solver': 'newton-cg'}
0.953000 (0.015308) with: {'C': 10, 'penalty': 'l2', 'solver': 'lbfgs'}
0.952667 (0.015691) with: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
0.955333 (0.014996) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'newton-cg'}
0.955000 (0.016073) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'lbfgs'}
0.954333 (0.014302) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'liblinear'}
0.956333 (0.016224) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}
0.956333 (0.016224) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'lbfgs'}
0.955667 (0.013828) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}
0.960000 (0.015492) wi

### Ridge Classifier
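Ridge regression is a penalized linear regression model for predicting a numerical value.

Nevertheless, it can be very effective when applied to classification. For a binary problem, RidgeClassifier encodes the class labels as -1 and 1, fits a ridge regression to them, and predicts the class from the sign of the output.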

Perhaps the most important parameter to tune is the regularization strength (alpha). A good starting point might be values between 0.1 and 1.0.

* alpha in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
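The regression-on-labels idea can be checked directly. This minimal sketch, assuming an illustrative binary dataset and a single alpha value, fits RidgeClassifier and a plain Ridge regression on the labels mapped to -1/+1, then compares their outputs.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import Ridge, RidgeClassifier

# Illustrative binary dataset (assumption)
X, y = make_blobs(n_samples=200, centers=2, n_features=10, cluster_std=5, random_state=1)

clf = RidgeClassifier(alpha=0.5).fit(X, y)
# Encode the classes as -1 / +1 (the larger label, classes_[1], becomes +1)
y_signed = np.where(y == clf.classes_[1], 1.0, -1.0)
reg = Ridge(alpha=0.5).fit(X, y_signed)

# The classifier's scores are the regression outputs; the predicted class follows the sign
print(np.allclose(clf.decision_function(X), reg.predict(X)))
```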

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html

In [2]:
# example of grid searching key hyperparameters for ridge classifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import RidgeClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = RidgeClassifier()
alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# define grid search
grid = dict(alpha=alpha)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.978667 using {'alpha': 0.1}
0.978667 (0.016275) with: {'alpha': 0.1}
0.978667 (0.016275) with: {'alpha': 0.2}
0.978667 (0.016275) with: {'alpha': 0.3}
0.978667 (0.016275) with: {'alpha': 0.4}
0.978667 (0.016275) with: {'alpha': 0.5}
0.978667 (0.016275) with: {'alpha': 0.6}
0.978667 (0.016275) with: {'alpha': 0.7}
0.978667 (0.016275) with: {'alpha': 0.8}
0.978667 (0.016275) with: {'alpha': 0.9}
0.978667 (0.016275) with: {'alpha': 1.0}


### K-Nearest Neighbors (KNN)
The most important hyperparameter for KNN is the number of neighbors (n_neighbors).

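Test values from 1 up to at least 21, perhaps only the odd numbers (with two classes, an odd number of neighbors avoids tied votes).

* n_neighbors in [1 to 21]

It may also be interesting to test different distance metrics (metric) for choosing the composition of the neighborhood.

* metric in ['euclidean', 'manhattan', 'minkowski']

Note that 'minkowski' with its default p=2 is the same as 'euclidean' (and p=1 gives 'manhattan'). For a fuller list, see: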

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
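The Minkowski distance is (sum_i |x_i - z_i|^p)^(1/p), so it contains the other two metrics as special cases. A minimal sketch, assuming random illustrative points, confirms this with scikit-learn's `pairwise_distances`:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # six illustrative points in 4 dimensions

d_mink_p2 = pairwise_distances(X, metric="minkowski", p=2)
d_mink_p1 = pairwise_distances(X, metric="minkowski", p=1)
d_euclidean = pairwise_distances(X, metric="euclidean")
d_manhattan = pairwise_distances(X, metric="manhattan")

# Small absolute tolerance: the euclidean path uses a dot-product formula
print(np.allclose(d_mink_p2, d_euclidean, atol=1e-6),
      np.allclose(d_mink_p1, d_manhattan, atol=1e-6))
```

In a grid search, listing both 'minkowski' (with the default p) and 'euclidean' therefore evaluates the same model twice.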

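It may also be interesting to test how much each member of the neighborhood contributes to the vote via different weightings (weights): 'uniform' gives every neighbor an equal vote, while 'distance' weights each vote by the inverse of its distance.

* weights in ['uniform', 'distance']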
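A tiny hand-built example (an illustrative assumption, not data from this notebook) shows when the two weightings disagree: one class-0 point sits very close to the query while two class-1 points are farther away. With n_neighbors=3, uniform voting gives class 1 two votes to one, while inverse-distance weights give class 0 a weight of 1/0.1 = 10 against about 1/0.9 + 1/1.0 ≈ 2.1 for class 1.

```python
from sklearn.neighbors import KNeighborsClassifier

# One class-0 point near the query, two class-1 points farther away
X_train = [[0.0], [1.0], [1.1]]
y_train = [0, 1, 1]
query = [[0.1]]  # distances to the training points: 0.1, 0.9, 1.0

predictions = {}
for weights in ["uniform", "distance"]:
    knn = KNeighborsClassifier(n_neighbors=3, weights=weights).fit(X_train, y_train)
    predictions[weights] = int(knn.predict(query)[0])

print(predictions)  # {'uniform': 1, 'distance': 0}
```

With n_neighbors=1 there is only one vote, so the two weightings always agree.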

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html



In [3]:

# example of grid searching key hyperparameters for KNeighborsClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = KNeighborsClassifier()
n_neighbors = range(1, 21, 2)  # odd values 1, 3, ..., 19 (the stop value is exclusive)
weights = ['uniform', 'distance']
metric = ['euclidean', 'manhattan', 'minkowski']
# define grid search
grid = dict(n_neighbors=n_neighbors,weights=weights,metric=metric)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.966667 using {'metric': 'euclidean', 'n_neighbors': 17, 'weights': 'uniform'}
0.868000 (0.029143) with: {'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'uniform'}
0.868000 (0.029143) with: {'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'distance'}
0.914667 (0.025131) with: {'metric': 'euclidean', 'n_neighbors': 3, 'weights': 'uniform'}
0.914667 (0.025131) with: {'metric': 'euclidean', 'n_neighbors': 3, 'weights': 'distance'}
0.940333 (0.016630) with: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.940333 (0.016630) with: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.951667 (0.017528) with: {'metric': 'euclidean', 'n_neighbors': 7, 'weights': 'uniform'}
0.951667 (0.017528) with: {'metric': 'euclidean', 'n_neighbors': 7, 'weights': 'distance'}
0.950333 (0.017026) with: {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'uniform'}
0.950333 (0.017026) with: {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'distance'}
0.958333 

### Support Vector Machine (SVM)
The SVM algorithm, like gradient boosting, is very popular, very effective, and provides a large number of hyperparameters to tune.

Perhaps the first important parameter is the choice of kernel, which controls how the input variables are (implicitly) mapped into a new feature space. There are many to choose from, but linear, polynomial, and RBF are the most common, and in practice perhaps just linear and RBF.

* kernel in ['linear', 'poly', 'rbf', 'sigmoid']

If the polynomial kernel works well, then it is a good idea to tune the degree hyperparameter (degree, 3 by default), which the other kernels ignore.

Another critical parameter is the regularization parameter (C), which can take on a range of values and has a dramatic effect on the shape of the resulting regions for each class: smaller values mean stronger regularization and a smoother decision boundary. A log scale might be a good starting point.

* C in [100, 10, 1.0, 0.1, 0.001]
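Since degree only matters for the 'poly' kernel, a minimal sketch (assuming an illustrative dataset and the degree values 2 to 4) can search it in a separate sub-grid instead of multiplying every kernel by every degree:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative dataset (assumption)
X, y = make_blobs(n_samples=200, centers=2, n_features=10, cluster_std=5, random_state=1)
C_values = [100, 10, 1.0, 0.1, 0.001]

param_grid = [
    # degree is searched only together with the polynomial kernel
    {"kernel": ["poly"], "degree": [2, 3, 4], "C": C_values},
    {"kernel": ["linear", "rbf", "sigmoid"], "C": C_values},
]
grid_search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
grid_search.fit(X, y)
print(grid_search.best_params_)
```

This gives 15 + 15 = 30 candidates, instead of the 60 a flat grid over all four kernels, three degrees and five C values would require.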

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

In [4]:
# example of grid searching key hyperparameters for SVC
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define model and parameters
model = SVC()
kernel = ['poly', 'rbf', 'sigmoid']
C = [50, 10, 1.0, 0.1, 0.01]
gamma = ['scale']
# define grid search
grid = dict(kernel=kernel,C=C,gamma=gamma)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.977667 using {'C': 0.1, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.972000 (0.015362) with: {'C': 50, 'gamma': 'scale', 'kernel': 'poly'}
0.975667 (0.015206) with: {'C': 50, 'gamma': 'scale', 'kernel': 'rbf'}
0.957000 (0.017916) with: {'C': 50, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.972000 (0.015362) with: {'C': 10, 'gamma': 'scale', 'kernel': 'poly'}
0.975667 (0.015206) with: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
0.967333 (0.018607) with: {'C': 10, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.971000 (0.016603) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'poly'}
0.977000 (0.015948) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}
0.974333 (0.013085) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.731333 (0.065966) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'poly'}
0.973667 (0.016017) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'rbf'}
0.977667 (0.012297) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.731333 (0.065966) with: {'C': 0.01, 'gamma': 'sca

### Bagged Decision Trees (Bagging)
The most important parameter for bagged decision trees is the number of trees (n_estimators).

Ideally, this should be increased until no further improvement is seen in the model.

Good values might be a log scale from 10 to 1,000.

* n_estimators in [10, 100, 1000]
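"Increase until no further improvement" can also be automated. The helper `grow_until_plateau` below is hypothetical, and the candidate sizes, the tolerance of 0.005 and the dataset are illustrative assumptions; it is a minimal sketch that scores increasing ensemble sizes with cross-validation and stops at the first size that no longer helps.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_blobs(n_samples=300, centers=2, n_features=20, cluster_std=10, random_state=1)
candidates = [10, 50, 100, 200]  # increasing ensemble sizes (illustrative)


def grow_until_plateau(candidates, tol=0.005):
    """Hypothetical helper: score increasing ensemble sizes and stop once the
    mean CV accuracy improves by less than `tol` over the best size so far."""
    history = {}
    best = -np.inf
    for n in candidates:
        model = BaggingClassifier(n_estimators=n, random_state=1)
        score = cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
        history[n] = score
        if score - best < tol:
            break  # no meaningful improvement: stop growing the ensemble
        best = score
    return history


history = grow_until_plateau(candidates)
print(history)
```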

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html


In [5]:
# example of grid searching key hyperparameters for BaggingClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = BaggingClassifier()
n_estimators = [10, 100, 1000]
# define grid search
grid = dict(n_estimators=n_estimators)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.930667 using {'n_estimators': 1000}
0.850667 (0.038981) with: {'n_estimators': 10}
0.915667 (0.030842) with: {'n_estimators': 100}
0.930667 (0.025682) with: {'n_estimators': 1000}


### Random Forest
The most important parameter is the number of random features to sample at each split point (max_features).

You could try a range of integer values, such as 1 to 20, or 1 to half the number of input features.

* max_features in [1 to 20]

Alternatively, you could try the built-in heuristics, which derive the value from the number of input features.

* max_features in ['sqrt', 'log2']

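To see what the string options resolve to, a minimal sketch (assuming the same 100-feature make_blobs shape used in this notebook) can fit small forests and read `max_features_`, the value each fitted tree actually used. For 100 features, 'sqrt' gives int(10.0) = 10, 'log2' gives int(6.64) = 6, and a float such as 0.5 is treated as a fraction of the features.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

X, y = make_blobs(n_samples=200, centers=2, n_features=100, cluster_std=20, random_state=1)

resolved = {}
for setting in ["sqrt", "log2", 20, 0.5]:
    rf = RandomForestClassifier(n_estimators=5, max_features=setting, random_state=1).fit(X, y)
    # Every tree in the forest receives the same max_features setting
    resolved[setting] = rf.estimators_[0].max_features_

print(resolved)  # {'sqrt': 10, 'log2': 6, 20: 20, 0.5: 50}
```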
Another important parameter for random forest is the number of trees (n_estimators).

Ideally, this should be increased until no further improvement is seen in the model.

Good values might be a log scale from 10 to 1,000.

* n_estimators in [10, 100, 1000]
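For random forest, a cheaper alternative to refitting from scratch for each n_estimators value is to grow one forest incrementally with `warm_start=True` and track the out-of-bag (OOB) accuracy. This is a minimal sketch; the dataset and the sizes 25 to 200 are illustrative assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

X, y = make_blobs(n_samples=300, centers=2, n_features=20, cluster_std=10, random_state=1)

# warm_start=True keeps the already-fitted trees and only adds new ones on each fit call
rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=1)
oob_by_size = {}
for n in [25, 50, 100, 200]:
    rf.set_params(n_estimators=n)
    rf.fit(X, y)
    oob_by_size[n] = rf.oob_score_  # accuracy on samples each tree did not see
    print(n, round(oob_by_size[n], 4))
```

Once the OOB score stops improving, adding more trees mostly adds cost.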

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html


In [None]:
# example of grid searching key hyperparameters for RandomForestClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = RandomForestClassifier()
n_estimators = [10, 100, 1000]
max_features = ['sqrt', 'log2']
# define grid search
grid = dict(n_estimators=n_estimators,max_features=max_features)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

### Stochastic Gradient Boosting
Also called a Gradient Boosting Machine (GBM), or named after a specific implementation such as XGBoost.

The gradient boosting algorithm has many parameters to tune.

There are some parameter pairings that are important to consider. The first is the learning rate, also called shrinkage or eta (learning_rate), and the number of trees in the model (n_estimators). Both could be considered on a log scale, although in opposite directions: a smaller learning rate usually needs more trees to reach the same fit.

* learning_rate in [0.001, 0.01, 0.1]
* n_estimators in [10, 100, 1000]
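Because a fitted GradientBoostingClassifier can report predictions after every stage, a single fit per learning rate is enough to see how many trees it needs. This minimal sketch assumes an illustrative dataset, a hold-out split, a cap of 500 trees, and validation log loss as the criterion; it records the number of trees with the lowest validation loss for each learning rate.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=400, centers=2, n_features=20, cluster_std=10, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, stratify=y, random_state=1)

best_n_trees = {}
for learning_rate in [0.001, 0.01, 0.1]:
    model = GradientBoostingClassifier(learning_rate=learning_rate, n_estimators=500, random_state=1)
    model.fit(X_train, y_train)
    # staged_predict_proba yields the predictions after 1, 2, ..., n_estimators trees
    val_loss = [log_loss(y_val, proba) for proba in model.staged_predict_proba(X_val)]
    best_n_trees[learning_rate] = int(np.argmin(val_loss)) + 1
    print(f"learning_rate={learning_rate}: lowest validation loss after {best_n_trees[learning_rate]} trees")
```

If the best count lands at the 500-tree cap, that learning rate probably needs a larger n_estimators.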

Another pairing is the fraction of rows (samples) used to fit each tree (subsample) and the depth of each tree (max_depth). These could be grid searched in steps of 0.1 and 1 respectively, although a few common values can be tested directly.

* subsample in [0.5, 0.7, 1.0]
* max_depth in [3, 7, 9]
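Every extra parameter multiplies the size of the grid. A quick count for the four lists above, using the same 10-fold, 3-repeat scheme as the examples in this notebook, shows how many model fits a full grid search performs (this only counts; it fits nothing):

```python
from sklearn.model_selection import ParameterGrid, RepeatedStratifiedKFold

grid = dict(learning_rate=[0.001, 0.01, 0.1], n_estimators=[10, 100, 1000],
            subsample=[0.5, 0.7, 1.0], max_depth=[3, 7, 9])
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

n_candidates = len(ParameterGrid(grid))    # 3 * 3 * 3 * 3 = 81
n_fits = n_candidates * cv.get_n_splits()  # each candidate is fit once per fold: 81 * 30
print(n_candidates, n_fits)  # 81 2430
```

With some candidates using 1,000 trees of depth 9, this search can take a long time.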

https://machinelearningmastery.com/configure-gradient-boosting-algorithm/

In [None]:
# example of grid searching key hyperparameters for GradientBoostingClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = GradientBoostingClassifier()
n_estimators = [10, 100, 1000]
learning_rate = [0.001, 0.01, 0.1]
subsample = [0.5, 0.7, 1.0]
max_depth = [3, 7, 9]
# define grid search
grid = dict(learning_rate=learning_rate, n_estimators=n_estimators, subsample=subsample, max_depth=max_depth)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))