# About
Hyperparameter optimization is required to get the most out of your machine learning models.

Hyperparameters are points of choice or configuration that allow a machine learning model to be customized for a specific task or dataset.

Parameters are different from hyperparameters. Parameters are learned automatically; hyperparameters are set manually to help guide the learning process.

Choosing a hyperparameter grid is probably the most difficult part of hyperparameter tuning: it's nearly impossible ahead of time to say which values of hyperparameters will work well and the optimal settings will depend on the dataset. Moreover, the hyperparameters have complex interactions with each other which means that just tuning one at a time doesn't work because when we start changing other hyperparameters that will affect the one we just tuned!

# Libraries

In [1]:
%run "/home/cesar/Python_NBs/HDL_Project/HDL_Project/global_fv.ipynb"

In [2]:
import os

# Save trained models
import joblib

# Data
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

# Hypertuning tools
from sklearn.model_selection import KFold
from sklearn.model_selection import RandomizedSearchCV

# Metrics
from sklearn.metrics import SCORERS

# Nonlinear models
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn import svm
from sklearn.gaussian_process import GaussianProcessRegressor

# Ensemble models
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor

# Clone of time class
s = t

# Random seed
import random
random.seed(101)

os.getcwd()

'/home/cesar/Python_NBs/HDL_Project/HDL_Project/2_Models/Multivariate/ML'

# User-Defined Functions

In [3]:
def hyper_tuning(model, space, X, y):
    # The searching algorithm includes a “cv” argument that allows:
    # a) An integer number of folds to be specified, e.g. 5
    #cross_val = 5
    # b) A configured cross-validation object.
    kfold = KFold(n_splits=3, shuffle=False)

    # The scoring metric must be maximizing, meaning better models result in larger scores.
    scoring_metric = 'neg_mean_squared_error'

    # Search for best hyperparameters
    grid = RandomizedSearchCV(estimator=model, 
                              param_distributions=search_space, 
                              cv=kfold, 
                              n_iter=50,
                              scoring=scoring_metric)

    result = grid.fit(X_test, y_test)
    return result

# Data

## Sample preparation

In [4]:
sql_table = "sima_station_CE"
target = "pm25"

# Define columns of interest from sql table
#     Select all columns:
column = "*"
#     Select specific columns:
#column = "datetime, prs, rainf, rh, sr, tout, wdr, wsr, " + str(target)

# Filter data with WHERE command
sql_where = "where datetime > \'2020-03-01\'"

# Initialize class to create multivariate samples:
multi_ts = multivariate_samples(sql_table, column, sql_where)

# Datasets can't be trained with sample batches by default. So parameter is 1.
X, y = multi_ts.samples_creation(1, target)

X_train, X_test, y_train, y_test = train_test_split(X[:,0,:], y, test_size = 0.30, shuffle= False)

In [5]:
type_of_target(y_train)

'continuous'

# Hyperparameter tuning

## Objective function

In [6]:
sorted(SCORERS.keys())

['accuracy',
 'adjusted_mutual_info_score',
 'adjusted_rand_score',
 'average_precision',
 'balanced_accuracy',
 'completeness_score',
 'explained_variance',
 'f1',
 'f1_macro',
 'f1_micro',
 'f1_samples',
 'f1_weighted',
 'fowlkes_mallows_score',
 'homogeneity_score',
 'jaccard',
 'jaccard_macro',
 'jaccard_micro',
 'jaccard_samples',
 'jaccard_weighted',
 'max_error',
 'mutual_info_score',
 'neg_brier_score',
 'neg_log_loss',
 'neg_mean_absolute_error',
 'neg_mean_absolute_percentage_error',
 'neg_mean_gamma_deviance',
 'neg_mean_poisson_deviance',
 'neg_mean_squared_error',
 'neg_mean_squared_log_error',
 'neg_median_absolute_error',
 'neg_root_mean_squared_error',
 'normalized_mutual_info_score',
 'precision',
 'precision_macro',
 'precision_micro',
 'precision_samples',
 'precision_weighted',
 'r2',
 'rand_score',
 'recall',
 'recall_macro',
 'recall_micro',
 'recall_samples',
 'recall_weighted',
 'roc_auc',
 'roc_auc_ovo',
 'roc_auc_ovo_weighted',
 'roc_auc_ovr',
 'roc_auc_ovr_we

# Random Search
RandomizedSearchCV for random search evaluates models for a given hyperparameter vector using cross-validation, hence the “CV” suffix of each class name.

It requires two arguments. 
1. The first is the model that you are optimizing. This is an instance of the model with values of hyperparameters set that you want to optimize. 
2. The second is the search space. This is defined as a dictionary where the names are the hyperparameter arguments to the model and the values are discrete values or a distribution of values to sample in the case of a random search.

## K-Nearest Neighbors
KNeighborsRegressor()

In [7]:
# Select an algorithm
model = KNeighborsRegressor()
model.get_params()

{'algorithm': 'auto',
 'leaf_size': 30,
 'metric': 'minkowski',
 'metric_params': None,
 'n_jobs': None,
 'n_neighbors': 5,
 'p': 2,
 'weights': 'uniform'}

In [8]:
# define search space
ss_dictionary = {
    'n_neighbors': list(range(1,10)),
    'weights': list(['uniform', 'distance']),
    'algorithm': list(['auto', 'ball_tree', 'kd_tree', 'brute']),
    'leaf_size': list(range(15, 45)),
    'p': list([1,2]),
    'metric': list(['euclidean', 'manhattan','chebyshev', 'minkowski']),
    # The search can be made parallel using various if not all of your CPU cores 
    # We can set it to -1 to automatically use all of the cores in the system.
    'n_jobs': list([-1])
}
    
search_space = [ss_dictionary] 

In [9]:
t.tic()
result_KNN = hyper_tuning(model, search_space, X_train, y_train)
t.toc(restart=True)
# Get the results
print(result_KNN.best_score_)
print("")
print(result_KNN.best_estimator_)
print("")
print(result_KNN.best_params_)

Elapsed time is 8.132417 seconds.
-133.50513381495009

KNeighborsRegressor(leaf_size=17, metric='manhattan', n_jobs=-1, n_neighbors=9,
                    p=1, weights='distance')

{'weights': 'distance', 'p': 1, 'n_neighbors': 9, 'n_jobs': -1, 'metric': 'manhattan', 'leaf_size': 17, 'algorithm': 'auto'}


## Classification and Regression Tree
DecisionTreeRegressor()

In [10]:
# Select an algorithm
model = DecisionTreeRegressor()
model.get_params()

{'ccp_alpha': 0.0,
 'criterion': 'squared_error',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'random_state': None,
 'splitter': 'best'}

In [11]:
# define search space
ss_dictionary = {
    'criterion': list(['squared_error', 'friedman_mse', 'absolute_error', 'poisson'])
    , 'splitter': list(['best', 'random'])
    , 'max_depth': list(range(1,10))
    , 'min_samples_split': list(range(2,10))
    , 'min_samples_leaf': list(range(1,10))
    , 'min_weight_fraction_leaf': list(np.linspace(0.0,0.5))
}

search_space = [ss_dictionary] 

In [12]:
t.tic()
result_DTR = hyper_tuning(model, search_space, X_train, y_train)
t.toc(restart=True)

# Get the results
print(result_DTR.best_score_)
print("")
print(result_DTR.best_estimator_)
print("")
print(result_DTR.best_params_)

Elapsed time is 5.114506 seconds.
-154.3718749133663

DecisionTreeRegressor(max_depth=3, min_samples_leaf=8, min_samples_split=8,
                      min_weight_fraction_leaf=0.01020408163265306)

{'splitter': 'best', 'min_weight_fraction_leaf': 0.01020408163265306, 'min_samples_split': 8, 'min_samples_leaf': 8, 'max_depth': 3, 'criterion': 'squared_error'}


## Support Vector Regression - Polynomial
svm.SVR(kernel='poly')

In [7]:
# Select an algorithm
model = svm.SVR()
model.get_params()

{'C': 1.0,
 'cache_size': 200,
 'coef0': 0.0,
 'degree': 3,
 'epsilon': 0.1,
 'gamma': 'scale',
 'kernel': 'rbf',
 'max_iter': -1,
 'shrinking': True,
 'tol': 0.001,
 'verbose': False}

In [8]:
# define search space
ss_dictionary = {
    'kernel': list(['poly'])
    # `degree` is a parameter used when kernel is set to ‘poly’.
    , 'degree': list([0, 2, 3, 4, 5, 6])
    # Gamma is a parameter for non linear hyperplanes. 
    # The higher the gamma value it tries to exactly fit the training data set
    , 'gamma' : list([0.1, 1, 10, 100])
    # C is the penalty parameter of the error term. 
    # It controls the trade off between smooth decision boundary and classifying the training points correctly.
    , 'C': list([0.1, 1, 10, 100, 1000])
}

search_space = [ss_dictionary] 

In [None]:
#if(True):
t.tic()
result_SVM_poly = hyper_tuning(model, search_space, X_train, y_train)
t.toc(restart=True)

# Get the results
print(result_SVM_poly.best_score_)
print("")
print(result_SVM_poly.best_estimator_)
print("")
print(result_SVM_poly.best_params_)

## Support Vector Regression - RBF
svm.SVR(kernel='rbf')

In [None]:
# Select an algorithm
model = svm.SVR()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    'kernel': list(['rbf'])
    # `degree` is a parameter used when kernel is set to ‘poly’.
    , 'degree': list([0, 2, 3, 4, 5, 6])
    # Gamma is a parameter for non linear hyperplanes. 
    # The higher the gamma value it tries to exactly fit the training data set
    , 'gamma' : list([0.1, 1, 10, 100])
    # C is the penalty parameter of the error term. 
    # It controls the trade off between smooth decision boundary and classifying the training points correctly.
    , 'C': list([0.1, 1, 10, 100, 1000])
}

search_space = [ss_dictionary] 

In [None]:
t.tic()
result_SVM_RBF = hyper_tuning(model, search_space, X_train, y_train)
t.toc(restart=True)

# Get the results
print(result_SVM_RBF.best_score_)
print("")
print(result_SVM_RBF.best_estimator_)
print("")
print(result_SVM_RBF.best_params_)

## Support Vector Regression - Linear
svm.SVR(kernel='linear')

In [None]:
# Select an algorithm
model = svm.SVR()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    'kernel': list(['linear'])
    # `degree` is a parameter used when kernel is set to ‘poly’.
    , 'degree': list([0, 2, 3, 4, 5, 6])
    # Gamma is a parameter for non linear hyperplanes. 
    # The higher the gamma value it tries to exactly fit the training data set
    , 'gamma' : list([0.1, 1, 10, 100])
    # C is the penalty parameter of the error term. 
    # It controls the trade off between smooth decision boundary and classifying the training points correctly.
    , 'C': list([0.1, 1, 10, 100, 1000])
}

search_space = [ss_dictionary] 

In [None]:
t.tic()
result_SVM_Linear = hyper_tuning(model, search_space, X_train, y_train)
t.toc(restart=True)

# Get the results
print(result_SVM_Linear.best_score_)
print("")
print(result_SVM_Linear.best_estimator_)
print("")
print(result_SVM_Linear.best_params_)

## Gaussian Naive Bayes
GaussianProcessRegressor()

In [None]:
# Select an algorithm
model = GaussianProcessRegressor()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    
}

search_space = [ss_dictionary] 

In [None]:
result_GNB = hyper_tuning(model, list(space), X_train, y_train)

# Get the results
print(result_GNB.best_score_)
print("")
print(result_GNB.best_estimator_)
print("")
print(result_GNB.best_params_)

## Bagging Regressor'
BaggingRegressor()

In [None]:
# Select an algorithm
model = BaggingRegressor()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    
}

search_space = [ss_dictionary] 

In [None]:
result_Bagging = hyper_tuning(model, list(space), X_train, y_train)

# Get the results
print(result_Bagging.best_score_)
print("")
print(result_Bagging.best_estimator_)
print("")
print(result_Bagging.best_params_)

## Random Forest
RandomForestRegressor()

In [None]:
# Select an algorithm
model = RandomForestRegressor()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    
}

search_space = [ss_dictionary] 

In [None]:
result_RF = hyper_tuning(model, list(space), X_train, y_train)

# Get the results
print(result_RF.best_score_)
print("")
print(result_RF.best_estimator_)
print("")
print(result_RF.best_params_)

## Extra-trees regressor
ExtraTreesRegressor()

In [None]:
# Select an algorithm
model = ExtraTreesRegressor()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    
}

search_space = [ss_dictionary] 

In [None]:
result_ETR = hyper_tuning(model, list(space), X_train, y_train)

# Get the results
print(result_ETR.best_score_)
print(result_ETR.best_estimator_)
print(result_ETR.best_params_)

## XG Boost 
XGBRegressor()

In [None]:
# Select an algorithm
model = XGBRegressor()
model.get_params()

In [None]:
# define search space
ss_dictionary = {
    
}

search_space = [ss_dictionary] 

In [None]:
result_XGB = hyper_tuning(model, list(space), X_train, y_train)

# Get the results
print(result_XGB.best_score_)
print("")
print(result_XGB.best_estimator_)
print("")
print(result_XGB.best_params_)

# Sources:
* sklearn.model_selection.RandomizedSearchCV <br>
    - https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html 
    - https://scikit-learn.org/stable/modules/grid_search.html?highlight=randomsearchcv
* sklearn.model_selection.KFold <br>
    - https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
    - https://machinelearningmastery.com/k-fold-cross-validation/


## Models
* DecisionTreeRegressor()
    - https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
* pmdarima
    - https://towardsdatascience.com/efficient-time-series-using-pythons-pmdarima-library-f6825407b7f0
* SVM
    - https://medium.com/all-things-ai/in-depth-parameter-tuning-for-svc-758215394769

## Metrics
* Metrics and scoring: quantifying the quality of predictions
    - https://scikit-learn.org/stable/modules/model_evaluation.html