### Hyperparameter Tuning
There are several main techniques that can be used for hyperparameter tuning:

### GridSearchCV
Performs an exhaustive search over a specified grid of hyperparameter values. For this method you need to list every value you want the model to try for each parameter, which can be tricky, especially for continuous parameters.

**The main disadvantages:**
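- If the search space is large, the search becomes very slow, because the number of candidates is the product of the number of values listed for each parameter
- Only a discrete set of values is tried (if the optimum value is 150 but the grid contains only `[100, 200]`, the optimum won't be found)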
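
Both disadvantages can be made concrete with a small sketch based on scikit-learn's `ParameterGrid`. The parameter names and values below are illustrative assumptions, not taken from a real experiment.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: the values are illustrative only.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1],
}

grid = ParameterGrid(param_grid)

# The number of candidates is the product of the list lengths: 2 * 4 * 3.
n_candidates = len(grid)
print(n_candidates)

# With 5-fold cross-validation every candidate is fitted 5 times.
n_fits = n_candidates * 5
print(n_fits)

# Only the listed values are ever tried, so n_estimators=150 is never evaluated.
tried_n_estimators = {candidate["n_estimators"] for candidate in grid}
print(150 in tried_n_estimators)
```

Adding one more parameter with four values would multiply the number of fits by four again, which is why large grids quickly become impractical.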

### RandomizedSearchCV
Doesn't set up a grid of hyperparameter values. Instead, you specify a distribution for each hyperparameter you want to tune. RandomizedSearchCV then samples a fixed number of random combinations from these distributions.

The optimum set of hyperparameters can still be missed, because the candidates are chosen at random.
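
As a rough illustration of sampling from distributions instead of a grid, the sketch below uses scikit-learn's `ParameterSampler` with `scipy.stats` distributions. The parameter names and ranges are assumptions chosen for the example.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

# Hypothetical distributions: one continuous, one integer-valued.
param_distributions = {
    "learning_rate": loguniform(1e-3, 0.5),  # continuous, sampled on a log scale
    "n_estimators": randint(50, 501),        # integers from 50 to 500 inclusive
}

# Draw 5 random combinations; random_state makes the draw reproducible.
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=23))

for params in samples:
    print(params)
```

Unlike a grid, any value inside the range can be drawn, but nothing guarantees that a draw lands close to the optimum.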

### Bayesian Methods
More advanced approaches use the history of past trials to select the hyperparameters for each new trial in an informed manner. This often makes the tuning process faster and the resulting models more accurate. Several libraries implement this approach:
- Hyperopt
- Optuna 
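
The idea of using past trials can be sketched with a toy loop. This is deliberately not TPE or any algorithm that Hyperopt or Optuna implement: it is a simplified, hypothetical search that explores at random first and then proposes new points near the best one seen so far. The objective and all names are assumptions made for the illustration.

```python
import random


def objective(x):
    """Toy loss with its minimum at x = 2 (stands in for a CV error)."""
    return (x - 2.0) ** 2


def history_guided_search(n_trials=40, n_startup=10, low=-5.0, high=5.0, seed=23):
    rng = random.Random(seed)
    history = []  # (x, loss) pairs from past trials
    for i in range(n_trials):
        if i < n_startup:
            # No useful history yet: explore uniformly at random.
            x = rng.uniform(low, high)
        else:
            # Use the history: propose a point near the best one found so far.
            best_x, _ = min(history, key=lambda t: t[1])
            x = rng.gauss(best_x, 0.5)
            x = min(max(x, low), high)  # keep the proposal inside the bounds
        history.append((x, objective(x)))
    return history


history = history_guided_search()
best_x, best_loss = min(history, key=lambda t: t[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

Real Bayesian optimizers replace the "sample near the best point" step with a probabilistic model of the objective, but the control flow (propose, evaluate, record, repeat) is the same.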

### Hyperopt Implementation 

In [2]:
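import numpy as np
from sklearn.model_selection import cross_val_score
from hyperopt import tpe, hp, fmin, space_eval, Trials
from hyperopt.pyll.base import scope as ho_scope  # used as ho_scope.int(...) in the search space
from hyperopt.pyll.stochastic import sample as ho_sample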

In [5]:
class ModelHyperparametersHyperopt:
    """
    Class for hyperparameter optimization using the Hyperopt library
    CV type: cross_val_score

    NOTE: fmin minimizes, while sklearn scorers are "higher is better"
    (error metrics are exposed as 'neg_*'), so _objective returns the negated mean score

    """
    
    def __init__(self, model, X_train, y_train, params_space, n_trials, cv_metric, cv_type, fit_params=None, opt_algo=tpe.suggest, seed=23):
        """
        model: estimator
            Model (or Pipeline) that supports set_params

        X_train/y_train: DataFrame

        params_space: dict
            Hyperparameter space defined according to the Hyperopt documentation

        n_trials: int
            Number of iterations to find optimal hyperparameters

        cv_metric: str
            Name of the scoring metric (sklearn metrics)

        cv_type: cross-validation splitter or int
            Cross validation type

        fit_params: dict
            Additional parameters passed to the model's fit method

        opt_algo: callable
            Algorithm that searches the hyperparameter space
            
        """
        self.model = model
        self.X_train = X_train
        self.y_train = y_train
        self.params_space = params_space
        self.n_trials = n_trials
        self.cv_metric = cv_metric
        self.cv_type = cv_type
        self.fit_params = fit_params
        self.opt_algo = opt_algo
        self.seed = seed
        self.trials = Trials()
        
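    def _objective(self, params):
        """
        Defines the objective function: fmin calls it with one set of
        hyperparameters sampled from params_space

        """

        self.model.set_params(**params)

        cv_score = cross_val_score(self.model,
                                   self.X_train,
                                   self.y_train,
                                   scoring=self.cv_metric,
                                   cv=self.cv_type,
                                   error_score='raise',
                                   fit_params=self.fit_params,  # renamed to `params` in scikit-learn >= 1.4
                                   n_jobs=-1)

        # sklearn scores are "higher is better", fmin minimizes
        return -cv_score.mean()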
        
    def optimize(self):
        """
        Find optimal hyperparameters by minimizing the objective function
        
        """
        
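        # hyperopt >= 0.2.7 expects a numpy Generator for rstate;
        # older versions need np.random.RandomState(self.seed) instead
        return fmin(fn=self._objective,
                    space=self.params_space,
                    algo=self.opt_algo,
                    max_evals=self.n_trials,
                    trials=self.trials,
                    rstate=np.random.default_rng(self.seed))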
    
# model_hyperparameters = ModelHyperparametersHyperopt(model=model_name, X_train=X_train, y_train=y_train,
#                                                      params_space=params_space, n_trials=50, cv_metric='roc_auc',
#                                                      cv_type=StratifiedKFold(shuffle=True, random_state=SEED))

# best_params = model_hyperparameters.optimize()

# fmin returns raw values (e.g. an index for hp.choice); use space_eval to map them back to actual parameters
# space_eval(params_space, best_params)
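
The minus sign in `_objective` follows from scikit-learn's scoring convention: every scorer is "higher is better", and error metrics are exposed in negated form. A minimal sketch on synthetic data (the dataset and model are assumptions used only to show the sign):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data, used only to illustrate the sign convention.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=23)

# Error metrics are exposed as negated scores so that higher is always better.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=5)
print(scores)  # every value is <= 0

# A minimizer such as Hyperopt's fmin therefore receives the negated mean,
# which here is the ordinary (non-negative) mean squared error.
loss = -scores.mean()
print(loss)
```

The same negation works for metrics such as `roc_auc`: minimizing the negated AUC is equivalent to maximizing the AUC.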

In [5]:
# Example of a hyperparameter space - GradientBoostingClassifier
# (the 'model__' prefix assumes a Pipeline step named 'model')
params_space = {
    'model__learning_rate': hp.loguniform('learning_rate', np.log(0.001), np.log(0.5)),
    'model__n_estimators': ho_scope.int(hp.quniform('n_estimators', 50, 500, 1)),
    'model__criterion': hp.choice('criterion', ['friedman_mse', 'squared_error']), # 'squared_error' was called 'mse' in scikit-learn < 1.0; don't use mae
    'model__min_samples_split': hp.loguniform('min_samples_split', np.log(0.1), np.log(1)),
    'model__max_depth':  ho_scope.int(hp.quniform('max_depth', 1, 8, 1)),
    'model__max_features': ho_scope.int(hp.quniform('max_features', 1, X_train.shape[1], 1)),
    'model__random_state': SEED
}

### Optuna 

In [33]:
import optuna 
from optuna import samplers

In [8]:
class ModelHyperparametersOptuna:
    """
    Class for hyperparameter optimization using the Optuna library
    CV type: cross_val_score

    NOTE: no sign flip is needed in _objective; use direction='maximize' with sklearn scorers

    """

    def __init__(self, model, X_train, y_train, params_space, n_trials, cv_metric, cv_type, opt_algo='tpe', direction='maximize', seed=23):
        """
        model: estimator
            Model (or Pipeline) that supports set_params

        X_train/y_train: DataFrame

        params_space: callable or dict
            Function that takes an Optuna trial and returns a dict of hyperparameters
            built with trial.suggest_* calls; a plain dict is applied as fixed parameters

        n_trials: int
            Number of iterations to find optimal hyperparameters

        cv_metric: str
            Name of the scoring metric (sklearn metrics)

        cv_type: cross-validation splitter or int
            Cross validation type

        opt_algo: str
            Type of the optimization algorithm (sampler); only 'tpe' is supported

        direction: str
            Whether to maximize or minimize the objective
        """
        self.model = model
        self.X_train = X_train
        self.y_train = y_train
        self.params_space = params_space
        self.n_trials = n_trials
        self.cv_metric = cv_metric
        self.cv_type = cv_type
        self.direction = direction
        self.seed = seed
        self.study = None

        if opt_algo == 'tpe':
            # seed must be passed by keyword: the first positional argument is not the seed
            self.opt_algo = samplers.TPESampler(seed=self.seed)
        else:
            raise ValueError(f"Unsupported opt_algo: {opt_algo!r}")


    def _objective(self, trial):
        """
        Defines the objective function

        """

        # Optuna is define-by-run: the values are suggested inside the objective
        params = self.params_space(trial) if callable(self.params_space) else self.params_space
        self.model.set_params(**params)

        cv_score = cross_val_score(self.model, self.X_train, self.y_train,
                                   scoring=self.cv_metric, cv=self.cv_type, n_jobs=-1)
        return cv_score.mean()
    
    def optimize(self):
        self.study = optuna.create_study(sampler=self.opt_algo, direction=self.direction)
        self.study.optimize(self._objective, n_trials=self.n_trials)
        
        
# model_hyperparameters = ModelHyperparametersOptuna(model=model_name, X_train=X_train, y_train=y_train, params_space=params_space,
#                                                    n_trials=50, cv_metric='roc_auc',
#                                                    cv_type=StratifiedKFold(shuffle=True, random_state=SEED))

# model_hyperparameters.optimize()
# model_hyperparameters.study.best_params

### GridSearchCV and RandomizedSearchCV

In [19]:
from sklearn.model_selection import GridSearchCV , RandomizedSearchCV

In [20]:
class ModelHyperparametersGridSearchCV:
    """
    Class for hyperparameter optimization using GridSearchCV or RandomizedSearchCV

    """

    def __init__(self, grid_type, model, X_train, y_train, params_space, cv_metric, cv_type, seed=23):
        """
        grid_type: callable
            GridSearchCV or RandomizedSearchCV
        
        model: callable
        
        X_train/y_train: DataFrame
        
        params_space: dict
            Grid of values (GridSearchCV) or distributions (RandomizedSearchCV),
            defined according to the scikit-learn documentation

        cv_metric: str or dict (google: "Running GridSearchCV using multiple evaluation metrics")
            Name of the scoring metric (sklearn metrics)
            
        cv_type: callable
            Cross validation type

        """
        self.grid_type = grid_type
        self.model = model
        self.X_train = X_train
        self.y_train = y_train
        self.params_space = params_space
        self.cv_metric = cv_metric
        self.cv_type = cv_type
        self.seed = seed
        
    def optimize(self, n_iter=10):
        # n_iter is only used by RandomizedSearchCV
        class_name = self.grid_type.__name__

        if class_name.lower().startswith('rand'):
            model = self.grid_type(estimator=self.model,
                                   param_distributions=self.params_space,
                                   scoring=self.cv_metric,
                                   cv=self.cv_type,
                                   n_iter=n_iter,
                                   n_jobs=-1,
                                   random_state=self.seed)
        else:
            model = self.grid_type(estimator=self.model,
                                   param_grid=self.params_space,
                                   scoring=self.cv_metric,
                                   cv=self.cv_type,
                                   n_jobs=-1)
        
        model.fit(self.X_train, self.y_train)
        return model.best_params_
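
The two branches in `optimize` differ mainly in how the search space is passed (`param_grid` versus `param_distributions` plus `n_iter`). A minimal standalone sketch of both calls, assuming a synthetic dataset and a `DecisionTreeClassifier` chosen only to keep the example fast:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=23)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=23)
model = DecisionTreeClassifier(random_state=23)

# Grid search: every combination of the listed values is evaluated (3 * 2 = 6).
grid_search = GridSearchCV(
    estimator=model,
    param_grid={"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]},
    scoring="roc_auc",
    cv=cv,
)
grid_search.fit(X, y)
print(grid_search.best_params_)

# Randomized search: n_iter combinations are sampled from the distributions.
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions={"max_depth": randint(2, 7), "min_samples_leaf": randint(1, 11)},
    n_iter=6,
    scoring="roc_auc",
    cv=cv,
    random_state=23,
)
random_search.fit(X, y)
print(random_search.best_params_)
```

Both searches evaluate six candidates here, so the cost is the same; the difference is that the randomized search can reach values that the grid does not list.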

Some approximate hyperparameter ranges for the models

### GradientBoostingClassifier

In [None]:
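# optuna
from sklearn.ensemble import GradientBoostingClassifier

def gbc_params(trial):
    # `trial` only exists inside an objective, so the space is built in a function;
    # every suggested parameter needs its own unique name
    return {
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.5),  # suggest_uniform is deprecated since Optuna 3.0
        'n_estimators': trial.suggest_int('n_estimators', 40, 1000),
        'criterion': trial.suggest_categorical('criterion', ['friedman_mse', 'squared_error']),  # 'mse' in scikit-learn < 1.0
        'min_samples_split': trial.suggest_float('min_samples_split', 0.1, 1),
        'min_samples_leaf': trial.suggest_float('min_samples_leaf', 0.1, 0.5),  # a fraction above 0.5 cannot hold for both children
        'max_depth': trial.suggest_int('max_depth', 1, 10),
        'max_features': trial.suggest_int('max_features', 1, X_train.shape[1])
    }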

### XGBoost

In [None]:
# optuna
from xgboost import XGBClassifier

def xgb_params(trial):
    # every suggested parameter needs its own unique name
    return {
        'n_estimators': trial.suggest_int('n_estimators', 40, 600),
        'max_depth': trial.suggest_int('max_depth', 2, 20),
        'min_child_weight': trial.suggest_int('min_child_weight', 2, 20),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.5),
        'base_score': trial.suggest_float('base_score', 0.01, 0.99),  # must lie strictly inside (0, 1) for binary:logistic
        'subsample': trial.suggest_float('subsample', 0.50, 1),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.50, 1),
        'colsample_bynode': trial.suggest_float('colsample_bynode', 0.50, 1),
        'colsample_bylevel': trial.suggest_float('colsample_bylevel', 0.50, 1),
        'gamma': trial.suggest_int('gamma', 0, 10),
        'tree_method': 'gpu_hist',  # GPU training; XGBoost >= 2.0 uses tree_method='hist' with device='cuda'
        'objective': 'binary:logistic'
    }

# inside an objective, once a trial is available:
# xgboost_model = XGBClassifier(**xgb_params(trial))