<center><h1>GDL - AutoSF</h1></center>

In our project, we try to achieve state-of-the-art results in the ogbl-biokg dataset (https://ogb.stanford.edu/docs/leader_linkprop/#ogbl-biokg).<br/>
To do so, we rely on AutoSF by Zhang et. al.<br/>
<br/>
OGB: https://ogb.stanford.edu/<br/>
AutoSF Paper: https://arxiv.org/pdf/1904.11682.pdf <br/>
AutoSF Code: https://github.com/AutoML-4Paradigm/AutoSF<br/>
<br/>
<b>Authors: </b> <br/>
Michael Mazourik (mazoum@usi.ch)<br/>
Brian Pulfer: (pulfeb@usi.ch)

# Installing OGB and cloning AutoSF repository

In [20]:
!pip install ogb
!git clone https://github.com/AutoML-4Paradigm/AutoSF.git
%cd /content/AutoSF/

Cloning into 'AutoSF'...
remote: Enumerating objects: 205, done.[K
remote: Counting objects: 100% (205/205), done.[K
remote: Compressing objects: 100% (155/155), done.[K
remote: Total 205 (delta 65), reused 172 (delta 44), pack-reused 0[K
Receiving objects: 100% (205/205), 43.81 MiB | 31.57 MiB/s, done.
Resolving deltas: 100% (65/65), done.
/content/AutoSF


# Imports

In [21]:
# CSV
import csv

# NumPy
import numpy as np

# Pandas
import pandas as pd

# AutoSF DataLoader
from read_data import DataLoader

# OGB
from ogb.linkproppred import Evaluator
from ogb.linkproppred.dataset import LinkPropPredDataset 

# Loading Dataset

We load the dataset as documented in the OGB website. We then use this dataset object to write the <i>entity2id.txt</i> and <i>relation2id.txt</i> files (which are standard in libraries like OpenKE). These files will then be read by the DataLoader implemented by Zhang et. al.

In [22]:
def type_and_id_2_new_id(typ, id):
  base = 0

  if 'dis' in typ:
    base += 0
  elif 'prot' in typ:
    base = 10_687
  elif 'drug' in typ:
    base = 10_687 + 17_499
  elif 'effe' in typ:
    base = 10_687 + 17_499 + 10_533
  elif 'fun' in typ:
    base = 10_687 + 17_499 + 10_533 + 9_969

  return base + id

def trainset_to_newset(train_set):
  tmp_set = []

  for ht, h, r, tt, t in zip(train_set['head_type'], train_set['head'], train_set['relation'], train_set['tail_type'], train_set['tail']):
    head_id = type_and_id_2_new_id(ht, h)
    tail_id = type_and_id_2_new_id(tt, t)
  
    tmp_set.append([head_id, r, tail_id])
  
  return pd.DataFrame(np.array(tmp_set), columns=['head', 'relation', 'tail'])

def valset_to_newset(val_set):
  tmp_set = []

  for ht, h, nhs, r, tt, t, nts in zip(val_set['head_type'], val_set['head'], val_set['head_neg'], val_set['relation'], val_set['tail_type'], val_set['tail'], val_set['tail_neg']):
    h_id = type_and_id_2_new_id(ht, h)
    t_id = type_and_id_2_new_id(tt, t)

    nh_ids, nt_ids = [], []
    for nh, nt in zip(nhs, nts):
      nh_id = type_and_id_2_new_id(ht, nh)
      nt_id = type_and_id_2_new_id(tt, nt)

      nh_ids.append(nh_id)
      nt_ids.append(nt_id)

    tmp_set.append([h_id, nh_ids, r, t_id, nt_ids])
  
  return pd.DataFrame(np.array(tmp_set), columns=['head', 'head_neg', 'relation', 'tail', 'tail_neg'])

In [23]:
# Loading OGB dataset
dataset = LinkPropPredDataset(name = 'ogbl-biokg') 

split_set = dataset.get_edge_split()
train_set, valid_set, test_set = split_set["train"], split_set["valid"], split_set["test"]

# Converting OGB sets to unique IDs sets
train_set = trainset_to_newset(train_set)
valid_set = valset_to_newset(valid_set)
test_set = valset_to_newset(test_set)



In [24]:
!mkdir dataset/ogbl_biokg/standard_naming

def write_dataset(name = "test2id.txt", dataset="test"):
  with open(f'/content/AutoSF/dataset/ogbl_biokg/standard_naming/{name}', 'w') as f:
      # create the csv writer
      # write a row to the csv file
      writer = csv.writer(f, delimiter='\t')
      writer.writerow([len(dataset['head'])])

      for i, _ in enumerate(dataset['head']):
        writer.writerow([dataset['head'][i], dataset['tail'][i], dataset['relation'][i]])

def create_dictionaries(dataset):
  with open(f'/content/AutoSF/dataset/ogbl_biokg/standard_naming/relation2id.txt', 'w') as f:
      writer = csv.writer(f, delimiter='\t')

      relation_set = set(dataset['relation'])
      writer.writerow([len(relation_set)])
      
      for i, _ in enumerate(relation_set):
        writer.writerow([i, i])

  with open(f'/content/AutoSF/dataset/ogbl_biokg/standard_naming/entity2id.txt', 'w') as f:
      writer = csv.writer(f, delimiter='\t')

      entity_set = set(np.append(dataset['head'], dataset['tail']))
      writer.writerow([len(entity_set)])
      
      for i, _ in enumerate(entity_set):
        writer.writerow([i, i])

create_dictionaries(train_set)
write_dataset("test2id.txt", test_set)
write_dataset("train2id.txt", train_set)
write_dataset("valid2id.txt", valid_set)

# Defining the Data Loader
loader = DataLoader('/content/AutoSF/dataset/ogbl_biokg/standard_naming/')

mkdir: cannot create directory ‘dataset/ogbl_biokg/standard_naming’: File exists


# Option 1: Running the authors command

The authors code does not work for our dataset.<br/>
Line 48 in train.py 
<code>
valid_head_filter, valid_tail_filter, test_head_filter, test_tail_filter = loader.get_filter()
</code>  fills the RAM.

In [25]:
# Doesn't work, kills the RAM
# !python train.py --task_dir /content/AutoSF/dataset/ogbl_biokg/standard_naming/ --n_dim 128 --lr 0.5 --n_epoch 100 --n_batch 2048 --filter 0

# Setting the option '--filter 0' would solve the problem, except the authors don't handle the option in their code

# Option 2: Using the authors source code

We modify the authors source code as to fit our dataset

In [26]:
# Operating System
import os

# Time
import time

# tqdm
from tqdm import tqdm

# PyTorch
import torch
import torch.nn as nn
from torch.optim import Adam, SGD, Adagrad
from torch.optim.lr_scheduler import ExponentialLR
import torch.multiprocessing as mp

# AutoSF
from select_gpu import select_gpu
from utils import batch_by_size, cal_ranks, cal_performance

# from base_model import BaseModel
from state import StateSpace
from predict import Predictor

In [27]:
class KGEModule(nn.Module):
    def __init__(self, n_ent, n_rel, args, struct):
        super(KGEModule, self).__init__()
        self.n_ent = n_ent
        self.n_rel = n_rel
        self.args = args
        self.struct = struct
        self.lamb = args.lamb
        self.loss = torch.nn.Softplus().cuda()
        self.ent_embed = nn.Embedding(n_ent, args.n_dim)
        self.rel_embed = nn.Embedding(n_rel, args.n_dim)
        self.init_weight()

    def init_weight(self):
        for param in self.parameters():
            nn.init.xavier_uniform_(param.data)

    def forward(self, head, tail, rela, dropout=True):
        head = head.view(-1)
        tail = tail.view(-1)
        rela = rela.view(-1)

        head_embed = self.ent_embed(head)
        tail_embed = self.ent_embed(tail)
        rela_embed = self.rel_embed(rela)

        # get f = h' R t

        pos_trip = self.test_trip(head_embed, rela_embed, tail_embed)

        neg_tail = self.test_tail(head_embed, rela_embed)
        neg_head = self.test_head(rela_embed, tail_embed)

        max_t = torch.max(neg_tail, 1, keepdim=True)[0]
        max_h = torch.max(neg_head, 1, keepdim=True)[0]

        # multi-class loss: negative loglikelihood
        loss = - 2 * pos_trip + max_t + torch.log(torch.sum(torch.exp(neg_tail - max_t), 1)) + \
               max_h + torch.log(torch.sum(torch.exp(neg_head - max_h), 1))
        self.regul = torch.sum(rela_embed ** 2)

        return torch.sum(loss)

    def forward_no_loss(self, head, tail, rela):
      head = head.view(-1)
      tail = tail.view(-1)
      rela = rela.view(-1)

      head_embed = self.ent_embed(head)
      tail_embed = self.ent_embed(tail)
      rela_embed = self.rel_embed(rela)

      return self.test_trip(head_embed, rela_embed, tail_embed)

    def test_trip(self, head, rela, tail):
        vec_hr = self.get_hr(head, rela)
        scores = torch.sum(vec_hr * tail, 1)
        return scores

    def test_tail(self, head, rela):
        vec_hr = self.get_hr(head, rela)
        tail_embed = self.ent_embed.weight
        scores = torch.mm(vec_hr, tail_embed.transpose(1, 0))
        return scores

    def test_head(self, rela, tail):
        vec_rt = self.get_rt(rela, tail)
        head_embed = self.ent_embed.weight
        scores = torch.mm(vec_rt, head_embed.transpose(1, 0))
        return scores

    def get_hr(self, head, rela):
        idx = tuple(self.struct)
        length = self.args.n_dim // 4
        h1 = head[:, :length]
        r1 = rela[:, :length]

        h2 = head[:, 1 * length:2 * length]
        r2 = rela[:, 1 * length:2 * length]

        h3 = head[:, 2 * length:3 * length]
        r3 = rela[:, 2 * length:3 * length]

        h4 = head[:, 3 * length:4 * length]
        r4 = rela[:, 3 * length:4 * length]

        hs = [h1, h2, h3, h4]
        rs = [r1, r2, r3, r4]

        vs = [0, 0, 0, 0]
        vs[idx[0]] = h1 * r1
        vs[idx[1]] = h2 * r2
        vs[idx[2]] = h3 * r3
        vs[idx[3]] = h4 * r4

        res_B = (len(idx) - 4) // 4
        for b_ in range(1, res_B + 1):
            base = 4 * b_
            vs[idx[base + 2]] += rs[idx[base + 0]] * hs[idx[base + 1]] * int(idx[base + 3])
        return torch.cat(vs, 1)

    def get_rt(self, rela, tail):
        idx = tuple(self.struct)
        length = self.args.n_dim // 4
        t1 = tail[:, :length]
        r1 = rela[:, :length]

        t2 = tail[:, 1 * length:2 * length]
        r2 = rela[:, 1 * length:2 * length]

        t3 = tail[:, 2 * length:3 * length]
        r3 = rela[:, 2 * length:3 * length]

        t4 = tail[:, 3 * length:4 * length]
        r4 = rela[:, 3 * length:4 * length]

        ts = [t1, t2, t3, t4]
        rs = [r1, r2, r3, r4]

        vs = [r1 * ts[idx[0]], r2 * ts[idx[1]], r3 * ts[idx[2]], r4 * ts[idx[3]]]

        res_B = (len(idx) - 4) // 4
        for b_ in range(1, res_B + 1):
            base = 4 * b_
            vs[idx[base + 1]] += rs[idx[base + 0]] * ts[idx[base + 2]] * int(idx[base + 3])
        return torch.cat(vs, 1)

In [28]:
class BaseModel(object):
    def __init__(self, n_ent, n_rel, args, struct):
        self.model = KGEModule(n_ent, n_rel, args, struct)
        self.model.cuda()

        self.n_ent = n_ent
        self.n_rel = n_rel
        self.time_tot = 0
        self.args = args

    def train(self, train_data, tester_val, tester_tst):
        head, tail, rela = train_data
        # useful information related to cache
        n_train = len(head)

        if self.args.optim=='adam' or self.args.optim=='Adam':
            self.optimizer = Adam(self.model.parameters(), lr=self.args.lr)
        elif self.args.optim=='adagrad' or self.args.optim=='Adagrad':
            self.optimizer = Adagrad(self.model.parameters(), lr=self.args.lr)
        else:
            self.optimizer = SGD(self.model.parameters(), lr=self.args.lr)

        scheduler = ExponentialLR(self.optimizer, self.args.decay_rate)

        n_epoch = self.args.n_epoch
        n_batch = self.args.n_batch
        best_mrr = 0

        # used for counting repeated triplets for margin based loss

        for epoch in range(n_epoch):
            start = time.time()

            self.epoch = epoch
            rand_idx = torch.randperm(n_train)
            head = head[rand_idx].cuda()
            tail = tail[rand_idx].cuda()
            rela = rela[rand_idx].cuda()

            epoch_loss = 0

            for h, t, r in batch_by_size(n_batch, head, tail, rela, n_sample=n_train):
                self.model.zero_grad()

                loss = self.model.forward(h, t, r)
                loss += self.args.lamb * self.model.regul
                loss.backward()
                self.optimizer.step()
                self.prox_operator()
                epoch_loss += loss.data.cpu().numpy()

            self.time_tot += time.time() - start
            scheduler.step()

            print(f"Loss at Epoch {epoch}:   {epoch_loss}")

            if (epoch+1) %  self.args.epoch_per_test == 0:
                # output performance 
                valid_mrr, valid_mr, valid_1, valid_10 = tester_val()
                test_mrr,  test_mr,  test_1,  test_10  = tester_tst()
                out_str = '$valid mrr:%.4f, H@1:%.4f, H@10:%.4f\t\t$test mrr:%.4f, H@1:%.4f, H@10:%.4f\n'%(valid_mrr, valid_1, valid_10, test_mrr, test_1, test_10)
                if not self.args.mode == 'search':
                    print(out_str)

                # output the best performance info
                if valid_mrr > best_mrr:
                    best_mrr = valid_mrr
                    best_str = out_str
                    torch.save(model.model.state_dict(), f'mrr_{best_mrr}_model.pth')
                    
                if best_mrr < self.args.thres:
                    print('\tearly stopped in Epoch:{}, best_mrr:{}'.format(epoch+1, best_mrr), self.model.struct)
                    return best_mrr, best_str
        return best_mrr, best_str

    def prox_operator(self,):
        for n, p in self.model.named_parameters():
            if 'ent' in n:
                X = p.data.clone()
                Z = torch.norm(X, p=2, dim=1, keepdim=True)
                Z[Z<1] = 1
                X = X/Z
                p.data.copy_(X.view(self.n_ent, -1))

    def test_link(self, test_data, head_filter, tail_filter):
        heads, tails, relas = test_data
        batch_size = self.args.test_batch_size
        num_batch = len(heads) // batch_size + int(len(heads)%batch_size>0)

        head_probs = []
        tail_probs = []
        for i in range(num_batch):
            start = i * batch_size
            end = min( (i+1)*batch_size, len(heads))
            batch_h = heads[start:end].cuda()
            batch_t = tails[start:end].cuda()
            batch_r = relas[start:end].cuda()

            h_embed = self.model.ent_embed(batch_h)
            r_embed = self.model.rel_embed(batch_r)
            t_embed = self.model.ent_embed(batch_t)

            head_scores = torch.sigmoid(self.model.test_head(r_embed, t_embed)).data
            tail_scores = torch.sigmoid(self.model.test_tail(h_embed, r_embed)).data

            head_probs.append(head_scores.data.cpu().numpy())
            tail_probs.append(tail_scores.data.cpu().numpy())

        head_probs = np.concatenate(head_probs) * head_filter
        tail_probs = np.concatenate(tail_probs) * tail_filter
        head_ranks = cal_ranks(head_probs, label=heads.data.numpy())
        tail_ranks = cal_ranks(tail_probs, label=tails.data.numpy())
        h_mrr, h_mr, h_h1, h_h10 = cal_performance(head_ranks)
        t_mrr, t_mr, t_h1, t_h10 = cal_performance(tail_ranks)
        mrr = (h_mrr + t_mrr) / 2
        mr = (h_mr + t_mr) / 2
        h1  = (h_h1  + t_h1 ) / 2
        h10 = (h_h10 + t_h10) / 2
        return mrr, mr, h1, h10

In [29]:
# Model arguments (dimension of embeddings)
class ProgramArguments:
  def __init__(self, optim='adagrad', lamb=0.2, decay_rate=1.0, n_dim=1000, parrel=1, lr=0.1, thres=0.0, n_epoch=100, n_batch=2048, epoch_per_test=10, test_batch_size=100, mode='search'):
    self.optim = optim
    self.lamb = lamb 
    self.decay_rate = decay_rate
    self.n_dim = n_dim
    self.parrel = parrel
    self.lr = lr
    self.thres = thres
    self.n_epoch = n_epoch
    self.n_batch = n_batch
    self.epoch_per_test = epoch_per_test
    self.test_batch_size = test_batch_size
    self.perf_file = None
    self.mode = mode


args = ProgramArguments()

# Function to get MRR
def evaluate(eval_set, model):
  """Returns the MRR on the given set plus 3 dummy variables"""
  # OGB Evaluator
  evaluator = Evaluator('ogbl-biokg')

  # Loading test / evaluation data as (h,t,r) and (h',t',r) triplets
  heads, tails, relations = eval_set['head'], eval_set['tail'], eval_set['relation']
  heads_n, tails_n = eval_set['head_neg'], eval_set['tail_neg']

  # Conversion to PyTorch tensors and in cuda memory
  # heads, tails, relations = torch.tensor(heads).cuda(), torch.tensor(tails).cuda(), torch.tensor(relations).cuda()
  # heads_n, tails_n = torch.tensor(heads_n).cuda(), torch.tensor(tails_n).cuda()

  # print(heads.shape, tails.shape, relations.shape, heads_n.shape, tails_n.shape)

  # Getting predictions on test / evaluation set
  # model(heads, tails, heads_n, tails_n, relations) # Is this allowed ???
  counter = 0
  y_pred_pos, y_pred_neg = [], []
  print("Model evaluation: Collecting scores")
  for h, t, r, hs_n, ts_n in tqdm(zip(heads, tails, relations, heads_n, tails_n), total=len(heads)):
    h = torch.tensor([h]).cuda()
    t = torch.tensor([t]).cuda()
    r = torch.tensor([r]).cuda()
    r_n = torch.tensor(500 * [r]).cuda()
    h_n = torch.tensor(hs_n).cuda().view(-1)
    t_n = torch.tensor(ts_n).cuda().view(-1)
    
    with torch.no_grad():
      y_pos = model.forward_no_loss(h, t, r)
      y_neg = model.forward_no_loss(h_n, t_n, r)
      y_pred_pos.append(y_pos.cpu().numpy()[0])
      y_pred_neg.append(y_neg.cpu().numpy())
      
  # Evaluating predictions
  print("Model evaluation: Getting MRR")
  metrics = evaluator.eval({
      'y_pred_pos': np.array(y_pred_pos), # Predictions on the actual facts
      'y_pred_neg': np.array(y_pred_neg),    # Predictions on the corrupted facts
  })

  mrr = metrics['mrr_list'].mean()
  print(f"Model evaluation: MRR is {mrr}")
  return mrr, 0, 0, 0


def get_eval_fn(eval_set, model):
  def f():
    return evaluate(eval_set, model)
  return f

In [30]:
# Finding current device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Getting infos from Data Loader
n_ent, n_rel = loader.graph_size()

train_data = loader.load_data('train') # [[heads], [tails], [relations]] (4'762'677 triplets)
valid_data = loader.load_data('valid') # [[heads], [tails], [relations]] (  162'886 triplets)
test_data = loader.load_data('test')   # [[heads], [tails], [relations]] (  162'870 triplets)

n_train = len(train_data[0])

# This command kills the RAM
# valid_head_filter, valid_tail_filter, test_head_filter, test_tail_filter = loader.get_filter()

train_data = [torch.LongTensor(vec).to(device) for vec in train_data]
valid_data = [torch.LongTensor(vec).to(device) for vec in valid_data]
test_data  = [torch.LongTensor(vec).to(device) for vec in test_data]

In [31]:
directory = 'results'
if not os.path.exists(directory):
    os.makedirs(directory)


def run_model(i, state):
    print('newID:', i, state, len(state))
    args.perf_file = os.path.join(directory, 'biokg_perf.txt')
    torch.cuda.empty_cache()
    # sleep to avoid multiple gpu occupy
    time.sleep(10 * (i % args.parrel) + 1)
    # torch.cuda.set_device(device)
    torch.cuda.set_device(0)

    model = BaseModel(n_ent, n_rel, args, state)

    # (delete these 2 lines)
    evaluate(valid_set, model.model)

    # This is the authors original code. Kills the RAM
    # tester_val = lambda: model.test_link(valid_data, valid_head_filter, valid_tail_filter)
    # tester_tst = lambda: model.test_link(test_data, test_head_filter, test_tail_filter)
    # best_mrr, best_str = model.train(train_data, tester_val, tester_tst)

    # Our adaptation
    validate_fn = get_eval_fn(valid_set, model.model)
    test_fn = get_eval_fn(test_set, model.model)
    best_mrr, best_str = model.train(train_data, validate_fn, test_fn)

    # Storing model
    

    with open(args.perf_file, 'a') as f:
        print('ID:', i, 'structure:%s' % (str(state)), '\tvalid mrr', best_mrr)
        for s in state:
            f.write(str(s) + ' ')
        f.write('\t\tbest_performance: ' + best_str)
    torch.cuda.empty_cache()

    print(f"Process {i} returned")
    return best_mrr

In [None]:
# Main Program
os.environ["OMP_NUM_THREADS"] = "5"
os.environ["MKL_NUM_THREADS"] = "5"

try:
    # mp.set_start_method('forkserver')
    pass
except RuntimeError:
    print("Multi-Processing context was already set")

state_obj = StateSpace()
T = 32  # train for 1000 iterations
N = 8  # number of states for train
NUM_STATES = 256  # number of states for predict
N_PREDS = 5

# config predictor
pred_obj = [Predictor() for i in range(N_PREDS)]

perf_file = os.path.join(directory, 'biokg_perf.txt')

for B in [4, 6, 8, 10, 12, 14, 16]:
    best_score = 0
    num_train = 0

    time_train = 0
    time_filt = 0
    time_pred = 0
    if B == 4:
        # only five candidates which worth evaluation in f^4
        TT = 1
    else:
        TT = T
    for t in range(TT):
        states_cand = []
        matrix_cand = []
        t_filt = time.time()
        counts = 0
        for i in range(NUM_STATES):
            state, matrix, count = state_obj.gen_new_state(B, matrix_cand)
            if state is not None:
                states_cand.append(state)
                matrix_cand.append(tuple(matrix))
            counts += count
        print('B=%d Iter %d\tsampled %d candidate state for evaluate' % (B, t + 1, len(states_cand)), counts)
        states_cand = np.array(states_cand)
        matrix_cand = np.array(matrix_cand)
        time_filt = time.time() - t_filt

        t_pred = time.time()
        if len(states_cand) < N:
            states_train = states_cand
            matrix_train = matrix_cand
        else:
            scores = []
            features = []
            for state in states_cand:
                features.append(state_obj.state2srf(state))
                # features.append(state_obj.state2onehot(state))
            features = torch.FloatTensor(np.array(features))
            for m in range(N_PREDS):
                scores.append(pred_obj[m].get_scores(features))
            scores = np.mean(np.array(scores), 0)
            top_k = scores.argsort()[-N:][::-1]
            states_train = np.array(states_cand[top_k])
            matrix_train = np.array(matrix_cand[top_k])
            print('top_k states selected', scores[top_k], time.time() - t_pred)
        time_pred += time.time() - t_pred
        # train the selected N models in parallel
        scores = []
        t_train = time.time()
        # pool = mp.Pool(processes=args.parrel)
        for i, state in enumerate(states_train):
            # score = pool.apply_async(run_model, (num_train, state,))
            score = run_model(num_train, state)
            num_train += 1
            scores.append(score)
        # pool.close()
        # pool.join()
        print('~~~~~~~~~~~~~~ parallelly train B=%d finished~~~~~~~~~~~~~~ ' % (B), t)
        time_train += time.time() - t_train

        for state, matrix, score in zip(states_train, matrix_train, scores):
            scor = score.get()
            if scor > best_score:
                best_score = scor
            state_obj.history_matrix[(B - 4) // 2].append(tuple(matrix))
            state_obj.state_and_score[(B - 4) // 2].append((tuple(state), scor))
    state_obj.update_good((B - 4) // 2)
    print('number of models trained:', num_train, 'best score:', best_score)

    t_pred = time.time()
    # train the predictor
    state_obj.update_train(perf_file)
    in_x = torch.from_numpy(np.array(state_obj.pred_x, dtype='float32'))
    in_y = torch.from_numpy(np.array(state_obj.pred_y, dtype='float32'))
    idx = np.random.choice(in_y.size(0), 16)
    batch_size = max(in_y.size(0) // 8, 1)
    print('------------ start training predictor ------------', in_x.size(), in_y.size())
    for m in range(N_PREDS):
        n = in_y.size(0)
        idx = np.random.choice(n, n * 4 // N_PREDS)
        pred_obj[m].train(in_x[idx], in_y[idx], batch_size, 0.3, n_iter=200 * (m + 1))
    print('\t............ train predictor finished ............')
    time_pred += time.time() - t_pred

    print('time used:', time_train, time_filt, time_pred, B, t)

B=4 Iter 1	sampled 5 candidate state for evaluate 12856
newID: 0 [0 1 2 3] 4


  0%|          | 1/162886 [00:00<5:22:08,  8.43it/s]

Model evaluation: Collecting scores


100%|██████████| 162886/162886 [17:44<00:00, 153.01it/s]


Model evaluation: Getting MRR
Model evaluation: MRR is 0.01333604846149683
Loss at Epoch 0:   183971255438.0
Loss at Epoch 1:   161477001748.0
Loss at Epoch 2:   153572131130.0


In [None]:
torch.save(model.model.state_dict(), f'mrr_{best_mrr}_model.pth')

In [None]:
import os

In [None]:
os.getcwd()

In [None]:
!pwd