## Optuna Example: Hyperparameter Optimization for KNN, SVD, and NNMF

To use the code below with a different method or model, the model class must provide `get_hyper_params` and `set_hyper_params` member functions along these lines:

    def get_hyper_params(self):
        # Search space: one entry per hyperparameter, giving its sampling
        # type and the range of values to search.
        hparams = {'num_factors': {'type': 'integer', 'values': [2, 10]},
                   'rho_1': {'type': 'loguniform', 'values': [1e-3, 100]},
                   'rho_2': {'type': 'loguniform', 'values': [1e-3, 100]}}
        return hparams

    def set_hyper_params(self, **kwargs):
        # Apply the sampled values to the model before training.
        self.num_factors = kwargs['num_factors']
        self.rho1 = kwargs['rho_1']
        self.rho2 = kwargs['rho_2']
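
This notebook does not show how `hp_optimization` consumes these dictionaries. As a rough sketch only, a helper could walk the search space and ask an Optuna trial for one value per entry. The names `ToyModel`, `suggest_from_spec`, and `RandomTrial` below are hypothetical and not part of this repository. The sketch assumes that `values` holds a `[low, high]` pair, and `RandomTrial` is a stand-in that mimics `suggest_int` and `suggest_float` so the example runs without Optuna installed.

```python
import math
import random


class ToyModel:
    """Hypothetical model exposing the two member functions described above."""

    name = "ToyModel"

    def get_hyper_params(self):
        return {'num_factors': {'type': 'integer', 'values': [2, 10]},
                'rho_1': {'type': 'loguniform', 'values': [1e-3, 100]},
                'rho_2': {'type': 'loguniform', 'values': [1e-3, 100]}}

    def set_hyper_params(self, **kwargs):
        self.num_factors = kwargs['num_factors']
        self.rho1 = kwargs['rho_1']
        self.rho2 = kwargs['rho_2']


def suggest_from_spec(trial, hparams):
    """Sample one value per hyperparameter from a get_hyper_params() dict.

    `trial` only needs suggest_int and suggest_float, as an Optuna trial has.
    """
    params = {}
    for name, spec in hparams.items():
        low, high = spec['values']  # assumption: values == [low, high]
        if spec['type'] == 'integer':
            params[name] = trial.suggest_int(name, low, high)
        elif spec['type'] == 'loguniform':
            params[name] = trial.suggest_float(name, low, high, log=True)
        else:
            raise ValueError(f"unsupported hyperparameter type: {spec['type']!r}")
    return params


class RandomTrial:
    """Stand-in for an Optuna trial (hypothetical, for illustration only)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_int(self, name, low, high):
        # Both bounds are inclusive.
        return self._rng.randint(low, high)

    def suggest_float(self, name, low, high, log=False):
        if log:
            # Sample uniformly in log space, then clamp against rounding error.
            value = math.exp(self._rng.uniform(math.log(low), math.log(high)))
            return min(max(value, low), high)
        return self._rng.uniform(low, high)


model = ToyModel()
params = suggest_from_spec(RandomTrial(seed=0), model.get_hyper_params())
model.set_hyper_params(**params)
print(params)
```

Inside a real objective function, the `trial` argument that Optuna passes in would take the place of `RandomTrial`. Older Optuna releases expose `suggest_loguniform` for log-scaled sampling instead of `suggest_float(..., log=True)`.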

In [2]:
import os
import sys
sys.path.append('../')  # make the parent directory importable
import hp_optimization as hopt
from optuna.visualization import (plot_optimization_history,
                                  plot_intermediate_values,
                                  plot_contour)

from design import ModelTraining
from methods.matrix_factorization.MF_STL import MF_STL
from methods.matrix_factorization.MF import SVD_MF, NonNegative_MF
from methods.knn.KNN import KNN_Normalized
from shutil import copyfile
from UTILS.utils import datasetParams2str
from datasets import SyntheticData as SD

outdir = '../outputs/experiment_004x'  # must match the experiment's filename
os.makedirs(outdir, exist_ok=True)

In [3]:
dataset = SD.SyntheticDataCreator(num_tasks=3, cellsPerTask=400, drugsPerTask=10,
                                  function="cosine", normalize=False, noise=1,
                                  graph=False, test_split=0.3)
dataset.prepare_data()

In [6]:
methods = [KNN_Normalized(k=10), SVD_MF(n_factors=100), NonNegative_MF(n_factors=100)]
for i, method in enumerate(methods):
    # The first method (KNN) gets 5 trials; the remaining methods get 50 each.
    n_trials = 5 if i == 0 else 50
    study = hopt.optimize_hyper_params(method, dataset, n_trials=n_trials)
    plot_optimization_history(study)
    plot_intermediate_values(study)
    plot_contour(study)
    print("best params for " + method.name + " : ", study.best_params)
    # Copy the study database (i.e. the hyperparameter trials) to the output directory.
    dataset_str = datasetParams2str(dataset.__dict__)
    study_name = '{}_{}'.format(method.name, dataset_str)
    storage = 'hyperparam_experiments/{}.db'.format(study_name)
    copyfile(storage, os.path.join(outdir, study_name + '.db'))
    

[I 2020-07-08 00:19:31,265] Using an existing study with name 'KNN_Normalized_CCLE_GDSC_CTRP_NCI60_common_False_unseenCells_False_normalize_True_test_split_0.2_cat_point_10_drug_transform_type_pca_num_comp_10_cell_transform_type_pca_num_comp_10' instead of creating a new one.
[I 2020-07-08 00:23:17,687] Finished trial#46 with value: 0.7163296775771079 with parameters: {'k': 4}. Best is trial#40 with value: 0.7130136518528005.
[I 2020-07-08 00:27:04,747] Finished trial#47 with value: 0.7190425842743811 with parameters: {'k': 7}. Best is trial#40 with value: 0.7130136518528005.
[I 2020-07-08 00:30:52,346] Finished trial#48 with value: 0.714697045806538 with parameters: {'k': 9}. Best is trial#40 with value: 0.7130136518528005.
[I 2020-07-08 00:34:40,279] Finished trial#49 with value: 0.7183926645634353 with parameters: {'k': 6}. Best is trial#40 with value: 0.7130136518528005.
[I 2020-07-08 00:38:21,009] Finished trial#50 with value: 0.7156357278800352 with parameters: {'k': 5}. Best is 

FrozenTrial(number=40, value=0.7130136518528005, datetime_start=datetime.datetime(2020, 7, 7, 23, 57, 20, 863843), datetime_complete=datetime.datetime(2020, 7, 8, 0, 1, 21, 228128), params={'k': 6}, distributions={'k': IntUniformDistribution(high=13, low=2, step=1)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=41, state=TrialState.COMPLETE)


[W 2020-07-08 00:38:21,944] You need to set up the pruning feature to utilize `plot_intermediate_values()`


best params for KNN_Normalized :  {'k': 6}


[I 2020-07-08 00:38:22,653] Using an existing study with name 'SVD_MF_CCLE_GDSC_CTRP_NCI60_common_False_unseenCells_False_normalize_True_test_split_0.2_cat_point_10_drug_transform_type_pca_num_comp_10_cell_transform_type_pca_num_comp_10' instead of creating a new one.
[I 2020-07-08 00:43:54,707] Finished trial#40 with value: 0.7210341236900055 with parameters: {'n_factors': 119, 'n_epochs': 145}. Best is trial#34 with value: 0.7182385283967304.
[I 2020-07-08 00:49:27,288] Finished trial#41 with value: 0.717449173380104 with parameters: {'n_factors': 50, 'n_epochs': 77}. Best is trial#41 with value: 0.717449173380104.
[I 2020-07-08 00:54:59,584] Finished trial#42 with value: 0.7217463140413867 with parameters: {'n_factors': 49, 'n_epochs': 79}. Best is trial#41 with value: 0.717449173380104.
[I 2020-07-08 01:00:35,120] Finished trial#43 with value: 0.7206237196455089 with parameters: {'n_factors': 53, 'n_epochs': 48}. Best is trial#41 with value: 0.717449173380104.
[I 2020-07-08 01:06:3

FrozenTrial(number=60, value=0.717164392154884, datetime_start=datetime.datetime(2020, 7, 8, 2, 28, 21, 906913), datetime_complete=datetime.datetime(2020, 7, 8, 2, 33, 18, 682064), params={'n_epochs': 111, 'n_factors': 56}, distributions={'n_epochs': IntUniformDistribution(high=150, low=2, step=1), 'n_factors': IntUniformDistribution(high=150, low=2, step=1)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=61, state=TrialState.COMPLETE)


[W 2020-07-08 04:59:51,336] You need to set up the pruning feature to utilize `plot_intermediate_values()`


best params for SVD_MF :  {'n_epochs': 111, 'n_factors': 56}


[I 2020-07-08 04:59:52,149] A new study created with name: NonNegative_MF_CCLE_GDSC_CTRP_NCI60_common_False_unseenCells_False_normalize_True_test_split_0.2_cat_point_10_drug_transform_type_pca_num_comp_10_cell_transform_type_pca_num_comp_10
[I 2020-07-08 05:05:44,277] Finished trial#0 with value: 3.10387681152492 with parameters: {'n_factors': 20, 'n_epochs': 40}. Best is trial#0 with value: 3.10387681152492.
[I 2020-07-08 05:11:39,549] Finished trial#1 with value: 3.035816709366838 with parameters: {'n_factors': 130, 'n_epochs': 73}. Best is trial#1 with value: 3.035816709366838.
[I 2020-07-08 05:17:32,675] Finished trial#2 with value: 2.954475281596767 with parameters: {'n_factors': 2, 'n_epochs': 22}. Best is trial#2 with value: 2.954475281596767.
[I 2020-07-08 05:23:27,598] Finished trial#3 with value: 3.1041309838384596 with parameters: {'n_factors': 29, 'n_epochs': 138}. Best is trial#2 with value: 2.954475281596767.
[I 2020-07-08 05:29:20,392] Finished trial#4 with value: 3.1037

FrozenTrial(number=38, value=2.8832264680148425, datetime_start=datetime.datetime(2020, 7, 8, 8, 46, 31, 715196), datetime_complete=datetime.datetime(2020, 7, 8, 8, 52, 47, 907029), params={'n_epochs': 12, 'n_factors': 9}, distributions={'n_epochs': IntUniformDistribution(high=150, low=2, step=1), 'n_factors': IntUniformDistribution(high=150, low=2, step=1)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=39, state=TrialState.COMPLETE)


[W 2020-07-08 10:01:04,629] You need to set up the pruning feature to utilize `plot_intermediate_values()`


best params for NonNegative_MF :  {'n_epochs': 12, 'n_factors': 9}
