# in this notebook, we start doing some intelligent optimization:
namely: https://towardsdatascience.com/doing-xgboost-hyper-parameter-tuning-the-smart-way-part-1-of-2-f6d255a45dde

the article compares three different parameter tuning techniques: 
1. grid-search
2. coordinate descent  
3. genetic algorithms

"For practical reasons and to avoid the complexities involved in doing hybrid continuous-discrete optimization, most approaches to hyper-parameter tuning start off by discretizing the ranges of all hyper-parameters in question. For example, for our XGBoost experiments below we will fine-tune five hyperparameters. The ranges of possible values that we will consider for each are as follows:"

In [None]:
{"learning_rate"    : [0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ] , #personally, starting to get a bit of a feel for ranges. small datasets are going to have much smaller learning rates be more effective, while large dataset will typically benefit from more trees combined with a larger learning rate
 "max_depth"        : [ 3, 4, 5, 6, 8, 10, 12, 15],
 "min_child_weight" : [ 1, 3, 5, 7 ],
 "gamma"            : [ 0.0, 0.1, 0.2 , 0.3, 0.4 ],
 "colsample_bytree" : [ 0.3, 0.4, 0.5 , 0.7 ] }

## GRID SEARCH

Exhaustive brute force search sweeping all parameter combinations. The article rightly states that lexicogrphic-ordered parameter search is not recommended because it will just get stuck in random areas of the search space, this is not very efficient and is actually like a buzzhead way of doing it. Instead, the author says that randomly searching the whole grid-space is preferable. "With this type of search, it is likely that one encounters close-to-optimal regions of the hyper-param space early on. We show some evidence of this in the example below."

## COORDINATE DESCENT 
Simpler than gradient descent :D
"The basic idea is that, at each iteration, only one of the coordinate directions of our search vector h is altered. To pick which one, we examine each coordinate direction turn and minimize the objective function by varying that coordinate and leaving all the other constant. Then we pick the direction that yields the most improvement. The algorithm stops when none of the directions yields any improvement."
Very easy to understand

## GENETIC ALGO
Genetic algorithms (GAs) are a whole class of optimization algorithms of rather general applicability and are particularly well adapted for high-dimensional discrete search spaces. A genetic algorithm tries to mimic nature by simulating a population of feasible solutions to a(n optimization) problem as they evolve through several generations and survival of the fittest is enforced. There are two basic mechanisms for generating a new generation from the previous one. One is cross-breeding in which two individuals (feasible solutions) are combined to produce two offspring. The other one is mutation. With a given mutation probability, any individual can change any of their params to another valid value. Survival of the fittest is enforced by letting fitter individuals cross-breed with higher probability than less fit individuals. The fitness of an individual is of course the negative of the loss function.

In [1]:
#https://en.wikipedia.org/wiki/Coordinate_descent
#https://github.com/infoyuxiglobal/data-analytics/blob/master/hpar_opt_experiments/coordescent.py

all code at https://github.com/infoyuxiglobal/data-analytics/tree/master/hpar_opt_experiments

In [1]:
import os 
os.chdir('D:/Projects/Re-education/Optimization/hpar_opt_experiments/')

from collections import OrderedDict
from itertools import product
import random
import xgboost as xgb
import os
from importlib import reload

import pandas as pd
#import xgboost as xgb
#%%
import pandas as pd
import numpy as np

from memoize import Memoizer
from inst_func_eval import InstFunEvaluator
from hashlib import md5

import genetic
os.chdir('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/')

In [2]:
from sklearn.metrics import mean_squared_error as mse

In [3]:
X_train = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/X_train.csv').drop('Unnamed: 0',axis=1)
y_train = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/y_train.csv').drop('Unnamed: 0',axis=1).values
X_val = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/X_val.csv').drop('Unnamed: 0',axis=1)
y_val = pd.read_csv('D:/Projects/Prediction/Techniques Practice/Trees/XGBoost/Suicides/temp data/y_val.csv').drop('Unnamed: 0',axis=1).values


In [4]:
#%%
def main() :
    
    param_grid_dic = OrderedDict([
            ("learning_rate", [0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ] ),
            ("max_depth"    , [  3 , 4 , 5, 6,  8, 10, 12, 15 ] ),
            ("min_child_weight", [ 1, 3, 5, 7 ] ),
            ("gamma", [0.0, 0.1, 0.2, 0.3, 0.4 ]),
            ("colsample_bytree", [  0.3, 0.4, 0.5, 0.7 ] ),
            ])

    #%%
    train_fraction = 0.05
    test_fraction = 0.16
    #data = subsample( data0, train_fraction, test_fraction )
    data=None
    #%%
    seed=1359
    log_level=0
    #%%
    memoization_path = DATA_DIR + "/" + "xgboost_memo%g" % train_fraction
    print( "memoization_path= " + memoization_path)
    if not os.path.exists( memoization_path ) :
        os.mkdir( memoization_path )
    #%%
    fun = Memoizer( lambda param_dic : train_xgb( data, param_dic ),
                    memoization_path )
    #%%
    grid_search( param_grid_dic, fun )
    #%%


def run_trials( n_trials, param_grid_dic, fun, method, method_name ) :

    for i in range(n_trials) :
        print( "trial = %d" % i )
        fun_eval = method( param_grid_dic, fun, seed=i )

def grid_search( param_grid_dic, fun, seed, log_level=0 ) :
    #%%
    import inst_func_eval as ife
    from hashlib import md5

    param_combos_shuffled = make_param_combos( param_grid_dic, seed=seed)

    best_auc = 0
    #best_combo = None

    fun_eval = ife.InstFunEvaluator( fun, param_grid_dic )

    for i, param_dic  in enumerate( param_combos_shuffled ) :
        
        auc = fun_eval.eval_fun(fun, param_dic )
        if log_level > 0 :
            print( tuple( param_dic.items() )  )
            print( auc,  " best_auc: ", best_auc  )
        if auc > best_auc :
            best_auc = auc
            #best_combo = param_dic

    #%%
    return fun_eval


def coordinate_descent( param_grid_dic, fun, seed=1359 ) :
    #%%
    import coordescent as cd
    reload(cd)

    #param_grid = param_grid_dic.values()

    fun_min = lambda param_dic : -fun(param_dic)

    random.seed( seed  )

    best_val, best_idxs, fun_eval = cd.coordinate_descent( fun_min, param_grid_dic, x_idxs=None)
    #%%
    return fun_eval
    #%%
def genetic( param_grid_dic, fun, seed=1336 ) :
    #%%
    from importlib import reload
    import genetic as G
    reload( G )

    genes_grid = param_grid_dic

    best_val, best_idxs, fun_eval  = G.genetic_algorithm( fun,  genes_grid,
                                       init_pop = None, pop_size = 10, n_gen=30,
                                       mutation_prob=0.1,
                                       normalize = G.normalizer( 2.0, 0.01),
                                       seed=seed )

    # Params first batch
    #best_val, best_idxs, fun_eval  = G.genetic_algorithm( fun,  genes_grid,
    #                                   init_pop = None, pop_size = 30, n_gen=10,
    #                                   mutation_prob=0.1,
    #                                   normalize = G.normalizer( 2.0, 0.01),
    #                                   seed=seed )

    #print( best_val, fun_eval.eval_cnt() )
    #%%
    return fun_eval
    #%% 0.7407
    #_ = G.genetic_algorithm( fun, gene_names, genes_grid,
    #                                 init_pop = None, pop_size = 30, n_gen=10,
    #                                 mutation_prob=0.2,
    #                                 #normalize = g.normalizer( 1.0, 0.3),
    #                                 seed=1337 )
    #%%


def test( data, memoization_path ) :
    #%%
    param_dic = OrderedDict([('learning_rate', 0.2),
             ('max_depth', 5),
             ('min_child_weight', 1),
             ('gamma', 0.4),
             ('colsample_bytree', 0.7)])


    train_xgb( data, param_dic, memoization_path=None, model_type='xgb')

    #%%

def subsample( data0, train_fraction, test_fraction, seed=1337 ) :
    data = data0.copy()
    n_train = len(data0["x_train"])
    assert n_train== len(data0["y_train"])
    #%%
    np.random.seed( seed )
    r = np.random.rand(n_train) < train_fraction
    data["x_train"] = data0["x_train"].loc[r]
    data["y_train"] = data0["y_train"].loc[r]

    #%%
    np.random.seed( seed )
    n_test= len( data0["x_test"])
    assert n_test == len(data0["y_test"])

    r = np.random.rand(n_test) < test_fraction

    data["x_test"] = data0["x_test"].loc[r]
    data["y_test"] = data0["y_test"].loc[r]
    #%%
    return data


def make_param_combos( param_grid_dic, seed=1337 ) :
    #param_lens = [ len(v) for v  in param_grid_dic.values() ]
    param_names = list( param_grid_dic.keys() )

    param_idx_ranges = [ range(len(v)) for v in param_grid_dic.values() ]
    all_param_idx_combos = product( *param_idx_ranges  )
    all_param_combos = [  OrderedDict( ( name, param_grid_dic[name][idx])
                                for name, idx in  zip(param_names,idx_combo ) )
                          for idx_combo in all_param_idx_combos   ]
    #%%
    np.random.seed( seed )
    param_combos_shuffled = all_param_combos.copy()
    np.random.shuffle( param_combos_shuffled )
    #%%
    return param_combos_shuffled

#%%

def train_xgb(param_dic, model_type='xgb' ) :
    # fit model no training data

    if model_type == 'xgb' :
        model = xgb.XGBRegressor( learning_rate=param_dic["learning_rate"],
                                   max_depth=param_dic["max_depth"],
                                   min_child_weight=param_dic["min_child_weight"],
                                   colsample_bytree=param_dic["colsample_bytree"],
                                   gamma=param_dic["gamma"],
                                   tree_method = "gpu_hist",
                                   single_precision_histogram=True, 
                                   gpu_id=0 )
    else :
        model = RandomForestClassifier( n_estimators=100, min_samples_split=50, min_samples_leaf=10, max_depth=12)
    print( type(model) )
    model.fit(X_train, y_train )

    y_pred = model.predict_proba( X_val )[:,1]
    rmse = np.sqrt(mse( y_val, y_pred ))
    #
    return rmse
#%%


In [7]:
DATA_DIR = r"."
memoization_path = DATA_DIR + "/" + "xgboost_memo%g" 
#fun = Memoizer( lambda param_dic : train_xgb( param_dic ), memoization_path )
fun = Memoizer( train_xgb( param_grid_dic ), memoization_path )
param_grid_dic = OrderedDict([
            ("learning_rate", [0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ] ),
            ("max_depth"    , [  3 , 4 , 5, 6,  8, 10, 12, 15 ] ),
            ("min_child_weight", [ 1, 3, 5, 7 ] ),
            ("gamma", [0.0, 0.1, 0.2, 0.3, 0.4 ]),
            ("colsample_bytree", [  0.3, 0.4, 0.5, 0.7 ] ),
            ])
#n_trials=10000
n_trials=100
#method=genetic
method=grid_search

dfs = []
for i in range(n_trials) :
        if i % 100 == 0 :
            print( "trial = %d" % i )
        fun_eval = method( param_grid_dic, fun, seed=i )
        df = pd.DataFrame( fun_eval.eval_log() ) 
        del df["pars"]
        df["method"] = method.__name__
        df["trial"] = i
        dfs.append( df )

<class 'xgboost.sklearn.XGBRegressor'>


XGBoostError: Invalid Parameter format for colsample_bytree expect float but value='[0.3, 0.4, 0.5, 0.7]'

In [12]:
memoization_path

'./xgboost_memo%g'

In [41]:
pd.concat( dfs ).to_hdf( "df_gen_10.hdf", "a/") 

ValueError: No objects to concatenate

In [None]:
pd.DataFrame( fun_eval.eval_log() ) 

In [54]:
hashlib.md5( 'hello'.encode("utf8") ).hexdigest()

'5d41402abc4b2a76b9719d911017c592'

In [None]:
fun = Memoizer( lambda param_dic : train_xgb( data, param_dic ),
                memoization_path )


In [None]:
DATA_DIR = r"."

param_grid_dic = OrderedDict([
        ("learning_rate", [0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ] ),
        ("max_depth"    , [  3 , 4 , 5, 6,  8, 10, 12, 15 ] ),
        ("min_child_weight", [ 1, 3, 5, 7 ] ),
        ("gamma", [0.0, 0.1, 0.2, 0.3, 0.4 ]),
        ("colsample_bytree", [  0.3, 0.4, 0.5, 0.7 ] ),
        ])

#%%
train_fraction = 0.05
test_fraction  = 0.16
#data = subsample( data0, train_fraction, test_fraction )
data=None
#%%
seed=1359
log_level=0
#%%
memoization_path = DATA_DIR + "/" + "xgboost_memo%g" % train_fraction
print( "memoization_path= " + memoization_path)
if not os.path.exists( memoization_path ) :
    os.mkdir( memoization_path )
#%%
fun = Memoizer( lambda param_dic : train_xgb( data, param_dic ),
                memoization_path )
#%%
grid_search( param_grid_dic, fun )

In [21]:
grid_search( param_grid_dic, fun, 42 )

NameError: name 'md5' is not defined

NameError: name 'md5' is not defined