## dCMF
Example of running the "dcmf" module with the best parameters found using the ax framework

In [28]:
import ax
from ax import RangeParameter, ChoiceParameter, FixedParameter
from ax import ParameterType, SearchSpace
from ax.service.managed_loop import optimize

In [2]:
import sys
sys.path.append("..")

In [33]:
import pprint
import numpy as np
import pickle as pkl
import time
import itertools
import os

In [4]:
from src.dcmf import dcmf

## Loading the sample dataset

This directory contains a sample synthetic dataset generated for the augmented setting of Fig 1(c) in the [paper](https://arxiv.org/abs/1811.11427).
You can download the sample data from [here](https://drive.google.com/open?id=1EFF_kuOIg2aYyOGZY_peX3NziqCSxxP1) and unzip it to the data directory.

In [5]:
data_dir = "../data/sample_data/"

In [6]:
#Loads the dataset into a dict
#Note: This dataset contains 5-folds for the matrix X_12 (matrix R below)
num_folds = 1
#
pp = pprint.PrettyPrinter()
print("Loading data from data_dir: ",data_dir)
U1 = pkl.load(open(data_dir+"X_13.pkl",'rb'))
U2 = pkl.load(open(data_dir+"X_14.pkl",'rb'))
V1 = pkl.load(open(data_dir+"X_26.pkl",'rb'))
W1 = pkl.load(open(data_dir+"X_53.pkl",'rb'))
R_temp_dict = {}
for fold_num in np.arange(1,num_folds+1):
    Rtrain = pkl.load(open(data_dir+'/X_12_train_fold_'+str(fold_num)+'.pkl','rb'))
    Rtrain_idx = pkl.load(open(data_dir+'/X_12_train_idx_'+str(fold_num)+'.pkl','rb')) 
    Rtest = pkl.load(open(data_dir+'/X_12_test_fold_'+str(fold_num)+'.pkl','rb'))
    Rtest_idx = pkl.load(open(data_dir+'/X_12_test_idx_'+str(fold_num)+'.pkl','rb'))
    Rdoublets = pkl.load(open(data_dir+'/R_doublets_'+str(fold_num)+'.pkl','rb'))
    R_temp_dict[fold_num] = {"Rtrain":Rtrain,"Rtrain_idx":Rtrain_idx,"Rtest":Rtest,"Rtest_idx":Rtest_idx,"Rdoublets":Rdoublets}
#
data_dict = {"U1":U1,"U2":U2,"V1":V1,"W1":W1,"R":R_temp_dict}

Loading data from data_dir:  ../data/sample_data/


In [7]:
print("U1.shape: ",U1.shape)
print("U2.shape: ",U2.shape)
print("V1.shape: ",V1.shape)
print("W1.shape: ",W1.shape)
print("R.shape: ",data_dict['R'][1]['Rtrain'].shape)

U1.shape:  (1000, 20)
U2.shape:  (1000, 150)
V1.shape:  (2000, 250)
W1.shape:  (300, 20)
R.shape:  (1000, 2000)


## Building the required data structures

Here we construct the data structures required as input to the dcmf API

#### *Entity-matrix relationship graph*

- **G**: dict, keys are entity IDs and values are lists of associated matrix IDs

#### *Training data*
- **X_data**: dict, keys are matrix IDs and values are (1) np.array, or (2) dict, (if this matrix is in validation set **X_val**) with validation set IDs as keys & values as np.array
- **X_meta**: dict, keys are matrix IDs and values are lists of the 2 associated entity IDs

#### *validation data*
- **X_val**: dict, keys are IDs of the matrices that are part of the validation set and values are dicts with validation set IDs as keys and, as values, either (1) a scipy.sparse matrix, or (2) a list of triplets corresponding to the validation entries (if you would like to perform classification and measure AUC)  
**Note**: To perform K-fold cross-validation, use K validation sets for the corresponding matrix/matrices. In the example below, we use a single validation set with ID "1" for each of the matrices with IDs "X1" and "X2".
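
Because **G** and **X_meta** describe the same graph from two directions, one can be derived from the other. The sketch below is not part of the dcmf API: `build_entity_graph` is a hypothetical helper that inverts an `X_meta`-style dict into a `G`-style dict, which can help catch mismatches between the two. It assumes each `X_meta` value is ordered as [row entity, column entity], an ordering inferred from the matrix shapes printed above.

```python
from collections import defaultdict


def build_entity_graph(X_meta):
    """Invert {matrix_id: [row_entity, col_entity]} into {entity_id: [matrix_ids]}.

    Hypothetical helper for illustration; not provided by dcmf.
    """
    graph = defaultdict(list)
    for matrix_id, entity_ids in X_meta.items():
        if len(entity_ids) != 2:
            raise ValueError(f"{matrix_id} must list exactly 2 entity IDs")
        for entity_id in entity_ids:
            # Avoid duplicates if a matrix relates an entity to itself
            if matrix_id not in graph[entity_id]:
                graph[entity_id].append(matrix_id)
    return dict(graph)


# Same layout as the X_meta used in this notebook
X_meta_example = {
    "X1": ["e1", "e2"],
    "X2": ["e1", "e3"],
    "X3": ["e1", "e4"],
    "X4": ["e2", "e6"],
    "X5": ["e5", "e3"],
}
G_example = build_entity_graph(X_meta_example)
```

Comparing `G_example` with a hand-written `G` is a cheap way to confirm that every matrix is attached to exactly the two entities it is supposed to relate.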

In [8]:
G = {
    "e1": ["X1", "X2", "X3"],
    "e2": ["X1", "X4"],
    "e3": ["X2", "X5"],
    "e4": ["X3"],
    "e5": ["X5"],
    "e6": ["X4"],
}

In [9]:
X_data = {
    "X1": {"1": data_dict['R'][1]["Rtrain"]},
    "X2": {"1": U1},
    "X3": U2,
    "X4": V1,
    "X5": W1,
}

In [10]:
X_meta = {
    "X1": ["e1", "e2"],
    "X2": ["e1", "e3"],
    "X3": ["e1", "e4"],
    "X4": ["e2", "e6"],
    "X5": ["e5", "e3"],
}

In [11]:
Rtest_triplets1 = [[1,1,1],[2,2,0]]
Rtest_triplets2 = [[1,1,1],[3,3,0],[1,2,0],[0,1,0],[0,2,0],[0,3,0]]

In [12]:
X_val = {
    "X1":{"1":Rtest_triplets1},
    "X2":{"1":Rtest_triplets2}
}
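
The description of **X_data** and **X_val** implies that every matrix listed in `X_val` should have a dict-valued entry in `X_data` keyed by the same validation set IDs. The following is a minimal sanity check under that reading; `check_validation_sets` is a hypothetical helper, not a dcmf function, and placeholder strings stand in for the real arrays.

```python
def check_validation_sets(X_data, X_val):
    """Return True if X_data and X_val agree on validation set IDs, else raise.

    Hypothetical helper for illustration; not provided by dcmf.
    """
    for matrix_id, val_sets in X_val.items():
        train_sets = X_data.get(matrix_id)
        if not isinstance(train_sets, dict):
            raise ValueError(
                f"{matrix_id}: X_data entry must be a dict keyed by validation set ID"
            )
        if set(train_sets) != set(val_sets):
            raise ValueError(
                f"{matrix_id}: validation set IDs differ between X_data and X_val"
            )
    return True


# Placeholder strings stand in for the real training matrices
X_data_example = {"X1": {"1": "Rtrain"}, "X2": {"1": "U1"}, "X3": "U2"}
X_val_example = {
    "X1": {"1": [[1, 1, 1], [2, 2, 0]]},
    "X2": {"1": [[1, 1, 1], [3, 3, 0]]},
}
is_consistent = check_validation_sets(X_data_example, X_val_example)
```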

#### *dCMF network construction - hyperparameters*

- **kf**: float, in the range (0,1) 
- **k**: int, entity representation or encoding size. Refer to Appendix A in the [paper](https://arxiv.org/abs/1811.11427) for how k and kf are used in the dCMF network construction. 
- **e_actf**: str, autoencoder's encoding activation function.
- **d_actf**: str, autoencoder's decoding activation function. Supported functions are "tanh","sigma","relu","lrelu"
- **is_linear_last_enc_layer**: bool, True to set linear activation for the bottleneck/encoding generation layer 
- **is_linear_last_dec_layer**: bool, True to set linear activation for the output/decoding generation layer 
- **num_chunks**: int, number of training batches to create.

In [13]:
kf = 0.5
k = 100
e_actf = "tanh"
d_actf = "tanh"
is_linear_last_enc_layer = False
is_linear_last_dec_layer = False
num_chunks = 2

#### *Optimization/training - hyperparameters*

- **learning_rate**: float, Adam optimizer's learning rate
- **weight_decay**: float, Adam optimizer's weight decay (L2 penalty)
- **max_epochs**: int, maximum number of training epochs at which the training stops 
- **convg_thres**: float, convergence threshold 

In [14]:
learning_rate = 0.001
weight_decay = 0.05
max_epochs = 5
convg_thres = 0.1

#### *Hyperparameters related to pre-training*

- **is_pretrain**: bool, True to enable pre-training 
- **pretrain_thres**: float, pre-training convergence threshold
- **max_pretrain_epochs**: int, maximum number of pre-training epochs at which the training stops

In [15]:
is_pretrain=True
pretrain_thres= 0.1
max_pretrain_epochs = 2

#### *Parameters related to validation*

- **val_metric**: str, validation performance metric. Supported metrics: ["rmse","r@k","p@k","auc"], where  
     *rmse* - Root [mean square error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html)  
     *r@k* - Recall@k. Refer to the "Evaluation metric" sub-section of Section 5.2 in the [paper](https://arxiv.org/abs/1811.11427)      
     *p@k* - Probability@k. Refer to the "Evaluation metric" sub-section of Section 5.3 in the [paper](https://arxiv.org/abs/1811.11427)      
     *auc* - [Area under the curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html)
    
- **is_val_transpose**: bool, True if the reconstructed matrix has to be transposed before computing the validation performance
- **at_k**: int, the value of k if the **val_metric** is either "r@k" or "p@k"
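
To make the interaction between triplet-style validation data, **is_val_transpose**, and the "auc" metric concrete, here is a small pure-Python sketch. It is not dcmf's implementation. It assumes each triplet is laid out as `[row, col, label]` with binary labels (an assumption based on the example triplets in this notebook), and it computes ROC AUC as the fraction of positive/negative pairs ranked correctly, with ties counting one half. `auc_from_triplets` is a hypothetical name.

```python
def auc_from_triplets(pred, triplets, transpose=False):
    """Pairwise ROC AUC over [row, col, label] triplets.

    pred is a 2-D nested list of predicted scores. With transpose=True the
    triplet indices are looked up in the transposed prediction matrix.
    Hypothetical helper for illustration; not provided by dcmf.
    """
    pos_scores, neg_scores = [], []
    for row, col, label in triplets:
        score = pred[col][row] if transpose else pred[row][col]
        (pos_scores if label == 1 else neg_scores).append(score)
    if not pos_scores or not neg_scores:
        raise ValueError("AUC needs at least one positive and one negative triplet")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


pred_example = [[0.9, 0.1],
                [0.2, 0.8]]
triplets_example = [[0, 1, 1], [1, 0, 0]]

auc_plain = auc_from_triplets(pred_example, triplets_example)
auc_transposed = auc_from_triplets(pred_example, triplets_example, transpose=True)
```

In this toy case the two settings give opposite results, which illustrates why the orientation of the reconstructed matrix has to match the orientation in which the triplets were indexed.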

In [16]:
val_metric = "auc"
is_val_transpose = True
at_k = 10

#### *GPU - parameters*

- **is_gpu**: bool, True if PyTorch tensor storage and operations should be done on the GPU
- **gpu_ids**: str, comma-separated string of CUDA GPU IDs

In [17]:
is_gpu = False
gpu_ids = "1"

# Hyperparameter selection using the ax framework

- Installation instructions can be found at: [https://ax.dev/](https://ax.dev/)
- The example below is based on the following API:
[https://ax.dev/tutorials/gpei_hartmann_loop.html](https://ax.dev/tutorials/gpei_hartmann_loop.html)
- And here is a high level intro to the library: 
[https://www.youtube.com/watch?v=2c8YX0E8Qhw](https://www.youtube.com/watch?v=2c8YX0E8Qhw)

In [26]:
# Create a wrapper function for DCMF to use with the ax framework.
# Here we perform the hyperparameter optimization based on the training loss,
# i.e. we look for the hyperparameters that result in the minimum loss.

def run_dcmf(parameterization):
    #hyper-parameters that are selected using ax
    learning_rate = parameterization["learning_rate"]
    weight_decay = parameterization["weight_decay"]
    #
    dcmf_model = dcmf(G, X_data, X_meta,\
                num_chunks=num_chunks,k=k, kf=kf, e_actf=e_actf, d_actf=d_actf,\
                learning_rate=learning_rate, weight_decay=weight_decay, convg_thres=convg_thres, max_epochs=max_epochs,\
                is_gpu=is_gpu,gpu_ids=gpu_ids,is_pretrain=is_pretrain, pretrain_thres=pretrain_thres,\
                max_pretrain_epochs=max_pretrain_epochs,X_val=X_val,val_metric=val_metric,\
                is_val_transpose=is_val_transpose, at_k=at_k,\
                is_linear_last_enc_layer=is_linear_last_enc_layer,is_linear_last_dec_layer=is_linear_last_dec_layer,num_val_sets=num_folds)
    #
    dcmf_model.fit()
    print("#")
    print("dcmf_model.out_dict_info: ")
    pp.pprint(dcmf_model.out_dict_info)
    print("#")
    #
    out_dict = {}
    out_dict["loss"] = (dcmf_model.out_dict_info["loss_all_folds_avg_sum"], 0.0)
    return out_dict
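
The wrapper above follows the contract used by Ax's managed loop: the evaluation function receives a dict of parameter values and returns a dict that maps each metric name to a `(mean, SEM)` tuple, where an SEM of `0.0` declares the observation noiseless. The sketch below shows the same contract with a cheap stand-in objective so the shape of the input and output can be inspected without training a model. `toy_evaluation` and its quadratic loss are hypothetical and unrelated to the dCMF loss.

```python
def toy_evaluation(parameterization):
    """Stand-in for run_dcmf: same input/output shape, trivial objective."""
    learning_rate = parameterization["learning_rate"]
    weight_decay = parameterization["weight_decay"]
    # Hypothetical objective with its minimum at (1e-3, 1e-2)
    loss = (learning_rate - 1e-3) ** 2 + (weight_decay - 1e-2) ** 2
    # Metric name -> (mean, standard error of the mean)
    return {"loss": (loss, 0.0)}


toy_result = toy_evaluation({"learning_rate": 1e-3, "weight_decay": 1e-2})
```

A stand-in like this can be swapped in for `run_dcmf` to check the search-space definition quickly before committing to full training runs.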

In [29]:
# The ax method that performs the hyperparameter optimization.
# Here we optimize only 2 hyperparameters: learning_rate and weight_decay.
# You can add more hyperparameters, as in the commented-out section.
# The optimization is performed with 2 DCMF executions.
# You can change this by setting "total_trials" as desired.
# Tip: Use at least total_trials=50 to find a near-optimal setting of the two hyperparameters.

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "weight_decay",
            "type": "range",
            "bounds": [1e-6, 1e-2],
            "value_type": "float",  # Optional, defaults to inference from type of "bounds".
            "log_scale": False,  # Optional, defaults to False.
        },
        {
            "name": "learning_rate",
            "type": "range",
            "bounds": [1e-7, 1e-5], #mortality1y
            #"bounds": [1e-5, 1e-4], #diag
            "value_type": "float",  # Optional, defaults to inference from type of "bounds".
            "log_scale": False,  # Optional, defaults to False.
        }
        # {
        #     "name": "convg_thres",
        #     "type": "range",
        #     "bounds": [1e-5, 1e-3], #diag
        #     "value_type": "float",  # Optional, defaults to inference from type of "bounds".
        #     "log_scale": False,  # Optional, defaults to False.
        # },
        # {
        #     "name": "num_layers",
        #     "type": "choice",
        #     #"values": [0, 1, 2],
        #     "values": [2,2],
        #     "value_type": "int"
        # },
        # {
        #     "name": "k",
        #     "type": "choice",
        #     #"values": [50, 100, 150, 200],
        #     #"values": [50,100,200],
        #     "value_type": "int"
        # }
        # {
        #     "name": "actf",
        #     "type": "choice",
        #     "values": ["tanh", "sigma"],
        #     "value_type": "str"
        # }
        # {
        #     "name": "num_layers",
        #     "type": "choice",
        #     "values": [1,2],
        #     "value_type": "int"
        # }
    ],
    experiment_name="dcmf_bo",
    objective_name="loss",
    evaluation_function=run_dcmf,
    minimize=True,  # Optional, defaults to False.
    #parameter_constraints=["k%2 <= 0"],  # Optional.
    #outcome_constraints=["loss >= 0"],  # Optional.
    total_trials=2, # Optional.
)


[INFO 07-16 22:02:56] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 trials, GPEI for subsequent trials]). Iterations after 5 will take longer to generate due to  model-fitting.
[INFO 07-16 22:02:56] ax.service.managed_loop: Started full optimization with 2 steps.
[INFO 07-16 22:02:56] ax.service.managed_loop: Running optimization trial 1...


dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF:
---
#
dCMF: 
#
learning_rate:  4.171199586614966e-06
weight_decay:  0.009874873130339199
convg_thres:  0.1
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
num_val_sets:  1
X_val #matrices:  2
val_metric (used only if X_val #matrices > 0):  auc
at_k (used only if X_val #matrices > 0 and val_metric is r@k or p@k):  10
is_val_transpose:  True
is_linear_last_enc_layer:  False
is_linear_last_dec_layer:  False
#
## fold_num:  1  ##
dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF: 
#
learning_rate:  4.171199586614966e-06
weight_decay:  0.009874873130339199
convg_thres:  0.1
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
num_v

[INFO 07-16 22:03:01] ax.service.managed_loop: Running optimization trial 2...


epoch:  5  total loss L:  8.336682319641113  Took  0.5  secs.
Computing AUC.
Rpred.shape:  (2000, 1000)
Rtest_triplets.shape:  (2, 3)
Computing AUC.
Rpred.shape:  (20, 1000)
Rtest_triplets.shape:  (6, 3)
#
dcmf.fit - end
#
dcmf_model.out_dict_info: 
{'E': 6,
 'M': 5,
 'loss_all_folds': {'1': [0.6903803944587708,
                          0.7196261584758759,
                          1.0797675251960754,
                          0.9683963656425476,
                          1.4825401306152344,
                          0.9440502226352692,
                          0.5293242186307907,
                          0.6657675802707672,
                          0.7025379836559296,
                          0.5187419801950455,
                          0.03554987534880638]},
 'loss_all_folds_avg_sum': 8.336682435125113,
 'loss_all_folds_avg_tuple': [0.6903803944587708,
                              0.7196261584758759,
                              1.0797675251960754,
                           

In [35]:
#Info about all the ax trials
print("experiment.trials: ")
pprint.pprint(experiment.trials)
print("#")

experiment.trials: 
{0: Trial(experiment_name='dcmf_bo', index=0, status=TrialStatus.COMPLETED, arm=Arm(name='0_0', parameters={'weight_decay': 0.009874873130339199, 'learning_rate': 4.171199586614966e-06})),
 1: Trial(experiment_name='dcmf_bo', index=1, status=TrialStatus.COMPLETED, arm=Arm(name='1_0', parameters={'weight_decay': 0.009241838757959194, 'learning_rate': 4.29610376060009e-07}))}
#


In [31]:
# The best hyper-parameters found using ax
print("best_parameters: ")
print(best_parameters)
print("#")

best_parameters: 
{'weight_decay': 0.009874873130339199, 'learning_rate': 4.171199586614966e-06}
#


In [32]:
#The loss corresponding to the best hyper-parameters
print("values[0]: ")
print(values[0])
print("#")

values[0]: 
{'loss': 8.336682435125113}
#


In [36]:
#The loss corresponding to all the hyper-parameters tried
for idx in experiment.trials.keys():
    trial =  experiment.trials[idx]
    print("obj: ",round(trial.objective_mean,4)," params: ",trial.arm.parameters)
print("#")

obj:  8.3367  params:  {'weight_decay': 0.009874873130339199, 'learning_rate': 4.171199586614966e-06}
obj:  10.2249  params:  {'weight_decay': 0.009241838757959194, 'learning_rate': 4.29610376060009e-07}
#
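
Since the experiment was set up with `minimize=True`, the best parameters should correspond to the trial with the smallest objective. The sketch below reproduces that selection by hand from the two trials printed above; the list `trials_example` is simply those printed values copied into plain Python, not an Ax data structure.

```python
# (objective, parameters) pairs copied from the trial listing above
trials_example = [
    (8.3367, {"weight_decay": 0.009874873130339199,
              "learning_rate": 4.171199586614966e-06}),
    (10.2249, {"weight_decay": 0.009241838757959194,
               "learning_rate": 4.29610376060009e-07}),
]

# With minimize=True, the best trial is the one with the smallest objective
best_objective, best_params_by_hand = min(trials_example, key=lambda t: t[0])
```

The result matches the `best_parameters` reported by `optimize` in this run.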


## Rerunning the DCMF with the best parameters found using the ax framework

### *Instantiating the dCMF model with the best hyper-parameters*

In [37]:
# load the best hyper-parameters
learning_rate = best_parameters["learning_rate"]
weight_decay = best_parameters["weight_decay"]

In [38]:
dcmf_model = dcmf(G, X_data, X_meta,\
            num_chunks=num_chunks,k=k, kf=kf, e_actf=e_actf, d_actf=d_actf,\
            learning_rate=learning_rate, weight_decay=weight_decay, convg_thres=convg_thres, max_epochs=max_epochs,\
            is_gpu=is_gpu,gpu_ids=gpu_ids,is_pretrain=is_pretrain, pretrain_thres=pretrain_thres,\
            max_pretrain_epochs=max_pretrain_epochs,X_val=X_val,val_metric=val_metric,\
            is_val_transpose=is_val_transpose, at_k=at_k,\
            is_linear_last_enc_layer=is_linear_last_enc_layer,is_linear_last_dec_layer=is_linear_last_dec_layer,num_val_sets=num_folds)

dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF:
---
#
dCMF: 
#
learning_rate:  4.171199586614966e-06
weight_decay:  0.009874873130339199
convg_thres:  0.1
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
num_val_sets:  1
X_val #matrices:  2
val_metric (used only if X_val #matrices > 0):  auc
at_k (used only if X_val #matrices > 0 and val_metric is r@k or p@k):  10
is_val_transpose:  True
is_linear_last_enc_layer:  False
is_linear_last_dec_layer:  False
#


#### *Fitting...*
- Performs the input transformation and network construction
- (Pre-trains and) trains the model to obtain the entity representations
- Reconstructs the input matrices using the entity representations obtained

In [40]:
dcmf_model.fit()

## fold_num:  1  ##
dcmf_base.__init__ - start
dcmf_base.__init__ - end
#
dCMF: 
#
learning_rate:  4.171199586614966e-06
weight_decay:  0.009874873130339199
convg_thres:  0.1
max_epochs:  5
isPretrain:  True
pretrain_thres:  0.1
max_pretrain_epochs:  2
num_chunks:  2
k:  100
kf:  0.5
e_actf:  tanh
d_actf:  tanh
is_gpu:  False
gpu_ids:  1
num entities:  6
num matrices:  5
num_val_sets:  1
X_val #matrices:  2
val_metric (used only if X_val #matrices > 0):  auc
at_k (used only if X_val #matrices > 0 and val_metric is r@k or p@k):  10
is_val_transpose:  True
is_linear_last_enc_layer:  False
is_linear_last_dec_layer:  False
#
dcmf - model construction - start
__input_transformation - start
#
concatenated-matrix construction...
e_id:  e1
X_id_list:  ['X1', 'X2', 'X3']
X_id:  X1
X[X_id].shape:  (1000, 2000)
X_id:  X2
X[X_id].shape:  (1000, 20)
X_id:  X3
X[X_id].shape:  (1000, 150)
C_dict[e].shape:  torch.Size([1000, 2170])
---
e_id:  e2
X_id_list:  ['X1', 'X4']
X_id:  X1
X[X_id].shape:  (1000

#### *Result attributes:*
- **out_dict_U**:  dict, keys are validation set IDs and values are dict with entity IDs as keys and np.array of entity representations/encodings as values
- **out_dict_X_prime**: dict, keys are matrix IDs and values are matrix reconstructions
- **out_dict_info**: dict, keys are loss/validation performance attributes and values are corresponding results.
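
In the `out_dict_info` printed in this notebook, `loss_all_folds_avg_sum` appears to equal the sum of the entries in `loss_all_folds_avg_tuple`. The check below uses the values copied from the final `out_dict_info` output. What each individual entry represents is not stated here, so the sketch only verifies the arithmetic.

```python
import math

# Values copied from the final out_dict_info output in this notebook
loss_all_folds_avg_tuple = [
    0.6874975860118866,
    0.7197298407554626,
    1.0806135535240173,
    0.9703676998615265,
    1.3286762833595276,
    0.9447701871395111,
    0.4794623553752899,
    0.7553596496582031,
    0.7198514938354492,
    0.46184852719306946,
    0.20644105970859528,
]
loss_all_folds_avg_sum = 8.354618236422539

# math.fsum gives an accurately rounded floating-point sum
recomputed_sum = math.fsum(loss_all_folds_avg_tuple)
sums_match = math.isclose(recomputed_sum, loss_all_folds_avg_sum, abs_tol=1e-9)
```

This scalar is the quantity the Ax wrapper reports as `"loss"`, so a quick check like this helps confirm which value is actually being minimized.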


In [41]:
dcmf_model.out_dict_U['1'].keys()

dict_keys(['e1', 'e2', 'e3', 'e4', 'e5', 'e6'])

In [42]:
dcmf_model.out_dict_X_prime['1'].keys()

dict_keys(['X1', 'X2', 'X3', 'X4', 'X5'])

In [43]:
dcmf_model.out_dict_info

{'params': {'learning_rate': 4.171199586614966e-06,
  'weight_decay': 0.009874873130339199,
  'convg_thres': 0.1,
  'max_epochs': 5,
  'is_pretrain': True,
  'pretrain_thres': 0.1,
  'max_pretrain_epochs': 2,
  'num_chunks': 2,
  'k': 100,
  'kf': 0.5,
  'e_actf': 'tanh',
  'd_actf': 'tanh',
  'is_linear_last_enc_layer': False,
  'is_linear_last_dec_layer': False},
 'num_val_sets': 1,
 'loss_all_folds': {'1': [0.6874975860118866,
   0.7197298407554626,
   1.0806135535240173,
   0.9703676998615265,
   1.3286762833595276,
   0.9447701871395111,
   0.4794623553752899,
   0.7553596496582031,
   0.7198514938354492,
   0.46184852719306946,
   0.20644105970859528]},
 'loss_all_folds_avg_tuple': [0.6874975860118866,
  0.7197298407554626,
  1.0806135535240173,
  0.9703676998615265,
  1.3286762833595276,
  0.9447701871395111,
  0.4794623553752899,
  0.7553596496582031,
  0.7198514938354492,
  0.46184852719306946,
  0.20644105970859528],
 'loss_all_folds_avg_sum': 8.354618236422539,
 'val_metric'