## dCMF + Bayesian Optimization (BO) based hyperparameter search
Example of running the "dcmf_bo" module with a mix of BO-searched and user-provided (hyper)parameters.

In [1]:
import sys
sys.path.append("..")

In [2]:
import pprint
import numpy as np
import pickle as pkl
import time
import itertools
import os

In [3]:
from src.dcmf_bo import dcmf_bo

## Loading the sample dataset

This directory contains a sample synthetic dataset generated for the augmented setting of Fig 1(c) in the [paper](https://arxiv.org/abs/1811.11427).
You can download the sample data from [here](https://drive.google.com/open?id=1EFF_kuOIg2aYyOGZY_peX3NziqCSxxP1) and unzip it to the data directory.

In [4]:
data_dir = "../data/sample_data/"

In [5]:
# Load the dataset into a dict.
# Note: the dataset contains 5 folds for the matrix X_12 (matrix R below);
# only the first num_folds folds are loaded here.
num_folds = 1

def load_pkl(fname):
    # Load one pickled object from data_dir and close the file afterwards
    with open(os.path.join(data_dir, fname), 'rb') as f:
        return pkl.load(f)

pp = pprint.PrettyPrinter()
print("Loading data from data_dir: ",data_dir)
U1 = load_pkl("X_13.pkl")
U2 = load_pkl("X_14.pkl")
V1 = load_pkl("X_26.pkl")
W1 = load_pkl("X_53.pkl")
R_temp_dict = {}
for fold_num in range(1, num_folds+1):
    Rtrain = load_pkl('X_12_train_fold_'+str(fold_num)+'.pkl')
    Rtrain_idx = load_pkl('X_12_train_idx_'+str(fold_num)+'.pkl')
    Rtest = load_pkl('X_12_test_fold_'+str(fold_num)+'.pkl')
    Rtest_idx = load_pkl('X_12_test_idx_'+str(fold_num)+'.pkl')
    Rdoublets = load_pkl('R_doublets_'+str(fold_num)+'.pkl')
    R_temp_dict[fold_num] = {"Rtrain":Rtrain,"Rtrain_idx":Rtrain_idx,"Rtest":Rtest,"Rtest_idx":Rtest_idx,"Rdoublets":Rdoublets}

# Collect the side matrices and the per-fold splits of R
data_dict = {"U1":U1,"U2":U2,"V1":V1,"W1":W1,"R":R_temp_dict}

Loading data from data_dir:  ../data/sample_data/


In [6]:
print("U1.shape: ",U1.shape)
print("U2.shape: ",U2.shape)
print("V1.shape: ",V1.shape)
print("W1.shape: ",W1.shape)
print("R.shape: ",data_dict['R'][1]['Rtrain'].shape)

U1.shape:  (1000, 20)
U2.shape:  (1000, 150)
V1.shape:  (2000, 250)
W1.shape:  (300, 20)
R.shape:  (1000, 2000)


## Building the required data structures

Here we construct the data structures required as input to the dCMF API.

#### *entity-matrix relationship graph*

- **G**: dict, keys are entity IDs and values are lists of associated matrix IDs

#### *training data*
- **X_data**: dict, keys are matrix IDs and values are either (1) an np.array, or (2) if the matrix is part of the validation set **X_val**, a dict with validation set IDs as keys and np.array as values
- **X_meta**: dict, keys are matrix IDs and values are lists of the 2 associated entity IDs
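
**G** and **X_meta** describe the same graph from two sides: one maps entities to matrices, the other maps matrices to entities. The following is a minimal sketch of a sanity check that derives the entity-to-matrix mapping from **X_meta** and compares it with **G**. The helper `graph_from_meta` is hypothetical and not part of the dcmf API; the example dicts repeat the values used in this notebook.

```python
from collections import defaultdict

def graph_from_meta(X_meta):
    """Return the entity -> matrix-ID mapping implied by X_meta."""
    graph = defaultdict(list)
    for matrix_id, entity_ids in X_meta.items():
        for entity_id in entity_ids:
            graph[entity_id].append(matrix_id)
    # Sort so the comparison does not depend on insertion order
    return {e: sorted(m) for e, m in graph.items()}

# Same values as the G and X_meta cells of this notebook
G_example = {
    "e1": ["X1", "X2", "X3"],
    "e2": ["X1", "X4"],
    "e3": ["X2", "X5"],
    "e4": ["X3"],
    "e5": ["X5"],
    "e6": ["X4"],
}
X_meta_example = {
    "X1": ["e1", "e2"],
    "X2": ["e1", "e3"],
    "X3": ["e1", "e4"],
    "X4": ["e2", "e6"],
    "X5": ["e5", "e3"],
}

derived = graph_from_meta(X_meta_example)
expected = {e: sorted(m) for e, m in G_example.items()}
print(derived == expected)  # True when G and X_meta describe the same graph
```

A mismatch here usually means a matrix ID was added to one structure but not the other.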

#### *validation data*
- **X_val**: dict, keys are the IDs of the matrices that are part of the validation set and values are dicts with validation set IDs as keys and, as values, either (1) a scipy.sparse matrix, or (2) a list of triplets corresponding to the validation entries (if you would like to perform classification and measure AUC)  
**Note**: To perform K-fold cross-validation, use K validation sets for the corresponding matrix/matrices. In the example below, we use a single validation set with ID "1" for the matrix with ID "X1".

In [7]:
G = {
    "e1":["X1","X2","X3"],\
    "e2":["X1","X4"],\
    "e3":["X2","X5"],\
    "e4":["X3"],\
    "e5":["X5"],\
    "e6":["X4"]}

In [8]:
X_data = {
    "X1":{"1":data_dict['R'][1]["Rtrain"]},\
    "X2":U1,\
    "X3":U2,\
    "X4":V1,\
    "X5":W1}

In [9]:
X_meta = {
    "X1":["e1","e2"],\
    "X2":["e1","e3"],\
    "X3":["e1","e4"],\
    "X4":["e2","e6"],\
    "X5":["e5","e3"]}

In [10]:
# Alternative validation format: a list of triplets (see X_val above). Not used in this example.
Rtest_triplets = [[1,1,1],[3,3,0],[1,2,0],[0,1,0],[0,2,0],[0,3,0]]

In [11]:
X_val = {
    "X1":{"1":Rtest}
}
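
Before moving on to the hyperparameters, it can help to confirm that every entity has the same size in all matrices it appears in. The sketch below is a hypothetical helper, not part of the dcmf API. It assumes that the rows of a matrix index the first entity listed in **X_meta** and the columns index the second, which is consistent with the shapes printed earlier in this notebook.

```python
def check_entity_sizes(shapes, X_meta):
    """Return {entity_id: size}; raise ValueError if two matrices disagree.

    Assumption: rows index the first entity listed in X_meta and
    columns index the second.
    """
    sizes = {}
    for matrix_id, (row_entity, col_entity) in X_meta.items():
        for entity_id, dim in zip((row_entity, col_entity), shapes[matrix_id]):
            # Record the size the first time an entity is seen, compare afterwards
            if sizes.setdefault(entity_id, dim) != dim:
                raise ValueError(
                    f"{entity_id}: size {sizes[entity_id]} conflicts with {dim} in {matrix_id}"
                )
    return sizes

# Shapes printed earlier in this notebook (X1 is the training fold of R)
shapes = {
    "X1": (1000, 2000),
    "X2": (1000, 20),
    "X3": (1000, 150),
    "X4": (2000, 250),
    "X5": (300, 20),
}
X_meta_example = {
    "X1": ["e1", "e2"],
    "X2": ["e1", "e3"],
    "X3": ["e1", "e4"],
    "X4": ["e2", "e6"],
    "X5": ["e5", "e3"],
}

entity_sizes = check_entity_sizes(shapes, X_meta_example)
print(entity_sizes)
```

In the notebook itself, the `shapes` dict could be collected from `X_data`, taking any one fold for matrices that are stored as dicts.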

#### *dCMF network construction - hyperparameters*

- **kf**: float, in the range (0,1) 
- **k**: int, entity representation or encoding size. Refer to Appendix A in the [paper](https://arxiv.org/abs/1811.11427) for details on how k and kf are used in the dCMF network construction. 
- **e_actf**: str, autoencoder's encoding activation function.
- **d_actf**: str, autoencoder's decoding activation function. Supported functions are "tanh","sigma","relu","lrelu"
- **is_linear_last_enc_layer**: bool, True to set linear activation for the bottleneck/encoding generation layer 
- **is_linear_last_dec_layer**: bool, True to set linear activation for the output/decoding generation layer 
- **num_chunks**: int, number of training batches to create.

In [12]:
kf = 0.5
k = 100
e_actf = "tanh"
d_actf = "tanh"
is_linear_last_enc_layer = False
is_linear_last_dec_layer = False
num_chunks = 2

#### *Optimization/training - hyperparameters*

- **learning_rate**: float, Adam optimizer's learning rate
- **weight_decay**: float, Adam optimizer's weight decay (L2 penalty)
- **max_epochs**: int, maximum number of training epochs
- **convg_thres**: float, convergence threshold 

In [13]:
learning_rate = 0.001
weight_decay = 0.05
max_epochs = 5
convg_thres = 0.1
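
The interplay of **max_epochs** and **convg_thres** can be sketched as below. This is an illustration only: the exact convergence test used inside dCMF is not described in this notebook, so the sketch assumes that training stops once the absolute change in loss between two consecutive epochs falls below **convg_thres**. The names `train_until_converged` and `run_epoch` are hypothetical.

```python
def train_until_converged(run_epoch, max_epochs, convg_thres):
    """Call run_epoch() until the loss change is below convg_thres
    or max_epochs epochs have been run. Returns the loss history."""
    losses = []
    for _ in range(max_epochs):
        losses.append(run_epoch())
        # Assumed criterion: absolute change between consecutive epochs
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < convg_thres:
            break
    return losses

# Hypothetical stand-in for one training epoch: replays a fixed loss sequence
fake_losses = iter([1.0, 0.6, 0.45, 0.40, 0.39])
history = train_until_converged(lambda: next(fake_losses), max_epochs=5, convg_thres=0.1)
print(history)  # [1.0, 0.6, 0.45, 0.4]
```

With the values used above, training stops after the fourth epoch because the loss changed by less than 0.1.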

#### *Hyperparameters related to pre-training*

- **is_pretrain**: bool, True to enable pre-training 
- **pretrain_thres**: float, pre-training convergence threshold
- **max_pretrain_epochs**: int, maximum number of pre-training epochs

In [14]:
is_pretrain=True
pretrain_thres= 0.1
max_pretrain_epochs = 2

#### *Parameters related to validation*

- **val_metric**: str, Validation performance metric. Supported metrics: ["rmse","r@k","p@k","auc"]. Where,  
     *rmse* - Root [mean square error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html)  
     *r@k* - Recall@k. Refer to the "Evaluation metric" sub-section of section 5.2 in the [paper](https://arxiv.org/abs/1811.11427)  
     *p@k* - Probability@k. Refer to the "Evaluation metric" sub-section of section 5.3 in the [paper](https://arxiv.org/abs/1811.11427)  
     *auc* - [Area under the curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html)
    
- **is_val_transpose**: bool, True if the reconstructed matrix has to be transposed before computing the validation performance
- **at_k**: int, the value of k if the **val_metric** is either "r@k" or "p@k"

In [15]:
val_metric = "rmse"
is_val_transpose = False
at_k = 10
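
As an illustration of the "rmse" metric and the **is_val_transpose** flag, here is a minimal NumPy sketch. It is not the dcmf implementation: it assumes dense arrays and computes the error over all entries, whereas the library may restrict the computation to the validation entries. The function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def validation_rmse(X_true, X_recon, is_val_transpose=False):
    """RMSE of a reconstruction, transposing it first if requested."""
    if is_val_transpose:
        X_recon = X_recon.T
    return rmse(X_true, X_recon)

# Toy example: the reconstruction is stored transposed and is off by 0.5 everywhere
X_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 1.0]])
X_recon_T = X_true.T + 0.5
score = validation_rmse(X_true, X_recon_T, is_val_transpose=True)
print(score)  # 0.5
```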

#### *GPU - parameters*

- **is_gpu**: bool, True if PyTorch tensor storage and operations have to be done on the GPU
- **gpu_ids**: str, comma-separated string of CUDA GPU IDs

In [16]:
is_gpu = False
gpu_ids = "1"

#### *BO hyperparameter search and related parameters*

The following hyperparameters can be searched through BO:
- "learning_rate"
- "convg_thres"
- "weight_decay"
- "kf"
- "k"
- "num_chunks"
- "pretrain_thres"

To enable BO-based search for any of the above hyperparameters, set it to '*None*' while initializing dcmf_bo. If a value other than None is given, the user-provided value is used. 

The config file *"bo_config.py"* controls the domain in which each of the above hyperparameters is searched. Below is an excerpt from the file. The *type* can be discrete/continuous/categorical and the *domain* has to be defined accordingly. Refer to the GPyOpt documentation [here](http://nbviewer.jupyter.org/github/SheffieldML/GPyOpt/blob/master/manual/GPyOpt_mixed_domain.ipynb) for more details. **Note**: *the order of the hyperparameters in the config file should not be changed. You may, however, modify the domain/type as required. Do not remove the specification of a hyperparameter even if you do not want to perform BO search for it.*
  
*{"name": "kf", "type": "continuous", "domain": (0.1,0.5)}*  
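
To make the domain format more concrete, the sketch below builds a small list of entries in the GPyOpt style and checks that each one is well formed. Only the "kf" entry comes from the excerpt above; the other two entries, their ranges, and the order of the list are hypothetical and do not reproduce the contents of *"bo_config.py"*. The `validate_domain` helper is also hypothetical.

```python
ALLOWED_TYPES = {"continuous", "discrete", "categorical"}

def validate_domain(domain):
    """Check every entry for a known type and a well-formed domain tuple.

    Returns the hyperparameter names in the order they are listed.
    """
    for spec in domain:
        name, kind, values = spec["name"], spec["type"], spec["domain"]
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"{name}: unknown type {kind!r}")
        if kind == "continuous":
            low, high = values  # continuous domains are (lower, upper) bounds
            if not low < high:
                raise ValueError(f"{name}: lower bound must be below upper bound")
        elif len(values) == 0:
            # discrete/categorical domains enumerate the allowed values
            raise ValueError(f"{name}: domain must list at least one value")
    return [spec["name"] for spec in domain]

example_domain = [
    {"name": "kf", "type": "continuous", "domain": (0.1, 0.5)},              # from the excerpt above
    {"name": "k", "type": "discrete", "domain": (50, 100, 200)},             # hypothetical values
    {"name": "learning_rate", "type": "continuous", "domain": (1e-6, 1e-3)}, # hypothetical range
]
print(validate_domain(example_domain))  # ['kf', 'k', 'learning_rate']
```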
  
Here are the parameters related to BO 
- **num_bo_steps**: int, number of BO steps to run
- **initial_design_size**: int, number of BO steps to run with random hyperparameter samples
- **best_criterion**: str, criterion for selecting the best hyperparameter set from the **num_bo_steps** runs. Supported criteria: ["loss","val"], where,  
    "loss" - select the hyperparameter set which resulted in minimum loss.  
    "val" - select the hyperparameter set which resulted in maximum validation performance.  

In [17]:
best_criterion = "loss" 
num_bo_steps = 5
initial_design_size = 5
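
The effect of **best_criterion** can be pictured with the following sketch. It is not the dcmf_bo implementation: the per-run summaries, the field names "loss" and "val", and all numbers are hypothetical, and the sketch simply follows the description above (minimum loss, or maximum validation performance).

```python
def select_best(runs, best_criterion):
    """Pick one run: lowest 'loss' or highest 'val', depending on best_criterion."""
    if best_criterion == "loss":
        return min(runs, key=lambda run: run["loss"])
    if best_criterion == "val":
        return max(runs, key=lambda run: run["val"])
    raise ValueError(f"unsupported best_criterion: {best_criterion!r}")

# Hypothetical summaries of three BO steps
runs = [
    {"step": 0, "loss": 0.92, "val": 0.61},
    {"step": 1, "loss": 0.35, "val": 0.58},
    {"step": 2, "loss": 0.47, "val": 0.70},
]
print(select_best(runs, "loss")["step"])  # 1
print(select_best(runs, "val")["step"])   # 2
```

The two criteria need not agree, as in this toy example, so the choice matters when the training loss and the validation metric rank the runs differently.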

#### *Instantiating the dCMF model...*
- Initializes dCMF after validating the input data and the (hyper)parameters
- Here we perform BO-based search for the hyperparameters "learning_rate", "weight_decay" and "convg_thres"; the rest are initialized with the provided values

In [18]:
dcmf_bo_model = dcmf_bo(G, X_data, X_meta,\
            num_chunks=num_chunks,k=k, kf=kf, e_actf=e_actf, d_actf=d_actf,\
            learning_rate=None, weight_decay=None, convg_thres=None, max_epochs=max_epochs,\
            is_gpu=is_gpu,gpu_ids=gpu_ids,is_pretrain=is_pretrain, pretrain_thres=pretrain_thres,\
            max_pretrain_epochs=max_pretrain_epochs,X_val=X_val,val_metric=val_metric,\
            is_val_transpose=is_val_transpose, at_k=at_k, best_criterion=best_criterion,\
            is_linear_last_enc_layer=is_linear_last_enc_layer,is_linear_last_dec_layer=is_linear_last_dec_layer,num_val_sets=num_folds)

dcmf_base.__init__ - start
dcmf_base.__init__ - end
dcmf_bo.__init__ - start
dCMF + BO:
---
Input Hyperparameters:-
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
d_actf:  tanh
d_actf:  tanh
---
Hyperparameters to be set using BO:-
learning_rate
convg_thres
weight_decay
---
val:-
num_val_sets:  1
X_val #matrices:  1
best_criterion:  loss
val_metric (used only if X_val #matrices > 0):  rmse
at_k (used only if X_val #matrices > 0 and val_metric is r@k or p@k):  10
is_val_transpose:  False
---
Others:-
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
is_linear_last_enc_layer:  False
is_linear_last_dec_layer:  False
---
dcmf_bo.__init__ - end


#### *Fitting...*
- For each step of BO
    - Performs the input transformation and network construction
    - (Pre-trains and) trains the model to obtain the entity representations
    - Reconstructs the input matrices using the entity representations obtained
- Refer to Algo 1 and Algo 2 in the [paper](https://arxiv.org/abs/1811.11427) for more details.


In [19]:
dcmf_bo_model.fit()

dcmf_bo.fit - start
#BO - max_iter: 2
Models printout after each iteration is only available for GP and GP_MCMC models
#BO - max_iter: 2
#BO - max_time: inf
dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF:
---
#
dCMF: 
#
learning_rate:  4.361453979173315e-05
weight_decay:  0.25995186097511425
convg_thres:  6.968342245572056e-05
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
num_val_sets:  1
X_val #matrices:  1
val_metric (used only if X_val #matrices > 0):  rmse
at_k (used only if X_val #matrices > 0 and val_metric is r@k or p@k):  10
is_val_transpose:  False
is_linear_last_enc_layer:  False
is_linear_last_dec_layer:  False
#
## fold_num:  1  ##
dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF: 
#
learning_rate:  4.361453979173315e-05
weight_decay:  0.25995186097511425
convg_thres:  6.968342245572056e-05
max_epochs:

dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF:
---
#
dCMF: 
#
learning_rate:  4.7153669528578657e-05
weight_decay:  0.26347917695876444
convg_thres:  1.968883558354767e-05
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
num_val_sets:  1
X_val #matrices:  1
val_metric (used only if X_val #matrices > 0):  rmse
at_k (used only if X_val #matrices > 0 and val_metric is r@k or p@k):  10
is_val_transpose:  False
is_linear_last_enc_layer:  False
is_linear_last_dec_layer:  False
#
## fold_num:  1  ##
dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF: 
#
learning_rate:  4.7153669528578657e-05
weight_decay:  0.26347917695876444
convg_thres:  1.968883558354767e-05
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
n

#### *Result attributes:*

- **out_dict_p_hash_info**: dict, keys are loss/validation performance attributes and values are corresponding results for the best parameter set
- **out_list_D**: list of dicts, info dicts for all the BO steps
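
Since the raw dicts are long, a compact per-step summary can be easier to read. The sketch below is a hypothetical helper that relies only on keys visible in the printed output ('list_bo_hyperparams', 'params', 'loss_all_folds_avg_tuple'). It assumes that every BO-searched hyperparameter appears under 'params' and reports the last value of the loss tuple without interpreting it further. The mock entry is trimmed and uses values from the fit() log and output of this notebook.

```python
def summarize_steps(out_list_D):
    """Return one small dict per BO step: searched values and last recorded loss."""
    rows = []
    for step, info in enumerate(out_list_D, start=1):
        # .get() keeps the sketch safe if a searched name is missing from params
        searched = {name: info["params"].get(name) for name in info["list_bo_hyperparams"]}
        rows.append({
            "step": step,
            "searched": searched,
            "last_loss": info["loss_all_folds_avg_tuple"][-1],
        })
    return rows

# Trimmed mock of a single entry, with values taken from this notebook's output
mock_out_list_D = [{
    "list_bo_hyperparams": ["learning_rate", "convg_thres", "weight_decay"],
    "loss_all_folds_avg_tuple": [0.6861676573753357, 0.7225450873374939, 0.008572279708459973],
    "params": {
        "learning_rate": 4.361453979173315e-05,
        "convg_thres": 6.968342245572056e-05,
        "weight_decay": 0.25995186097511425,  # from the fit() log; assumed to be stored in params
    },
}]

rows = summarize_steps(mock_out_list_D)
for row in rows:
    print(row)
```

With a fitted model, the same helper could be called as `summarize_steps(dcmf_bo_model.out_list_D)`.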

In [20]:
dcmf_bo_model.out_dict_p_hash_info

{'E': 6,
 'M': 5,
 'best_criterion': 'loss',
 'list_bo_hyperparams': ['learning_rate', 'convg_thres', 'weight_decay'],
 'loss_all_folds': {'1': [0.6861676573753357,
   0.7225450873374939,
   1.0877848267555237,
   0.9674782156944275,
   1.304173469543457,
   0.9413672089576721,
   0.05440543219447136,
   0.05987958237528801,
   0.04063842073082924,
   0.01538105309009552,
   0.008572279708459973]},
 'loss_all_folds_avg_sum': 5.888393233763054,
 'loss_all_folds_avg_tuple': [0.6861676573753357,
  0.7225450873374939,
  1.0877848267555237,
  0.9674782156944275,
  1.304173469543457,
  0.9413672089576721,
  0.05440543219447136,
  0.05987958237528801,
  0.04063842073082924,
  0.01538105309009552,
  0.008572279708459973],
 'num_val_sets': 1,
 'params': {'convg_thres': 6.968342245572056e-05,
  'd_actf': 'tanh',
  'e_actf': 'tanh',
  'is_linear_last_dec_layer': False,
  'is_linear_last_enc_layer': False,
  'is_pretrain': True,
  'k': 100,
  'kf': 0.5,
  'learning_rate': 4.361453979173315e-05,
  

In [21]:
dcmf_bo_model.out_list_D

[{'E': 6,
  'M': 5,
  'best_criterion': 'loss',
  'list_bo_hyperparams': ['learning_rate', 'convg_thres', 'weight_decay'],
  'loss_all_folds': {'1': [0.6861676573753357,
    0.7225450873374939,
    1.0877848267555237,
    0.9674782156944275,
    1.304173469543457,
    0.9413672089576721,
    0.05440543219447136,
    0.05987958237528801,
    0.04063842073082924,
    0.01538105309009552,
    0.008572279708459973]},
  'loss_all_folds_avg_sum': 5.888393233763054,
  'loss_all_folds_avg_tuple': [0.6861676573753357,
   0.7225450873374939,
   1.0877848267555237,
   0.9674782156944275,
   1.304173469543457,
   0.9413672089576721,
   0.05440543219447136,
   0.05987958237528801,
   0.04063842073082924,
   0.01538105309009552,
   0.008572279708459973],
  'num_val_sets': 1,
  'params': {'convg_thres': 6.968342245572056e-05,
   'd_actf': 'tanh',
   'e_actf': 'tanh',
   'is_linear_last_dec_layer': False,
   'is_linear_last_enc_layer': False,
   'is_pretrain': True,
   'k': 100,
   'kf': 0.5,
   'lear