# **Modeling Polypharmacy Side Effects with Graph Convolutional Networks**

Polypharmacy side effects arise from drug-drug interactions, in which the activity of one drug changes, favorably or unfavorably, when it is taken with another drug. Knowledge of drug interactions is often limited because these complex relationships are rare and usually go unobserved in relatively small clinical testing.

**Decagon** is an approach for modeling polypharmacy side effects. It constructs a multimodal graph of protein-protein interactions, drug-protein target interactions, and polypharmacy side effects. The side effects are represented as drug-drug interactions, where each side effect is an edge of a different type. Decagon is designed specifically to handle such multimodal graphs with a large number of edge types.

Decagon models polypharmacy side effects that have a strong molecular basis particularly well. On predominantly non-molecular side effects it still achieves good performance because model parameters are shared effectively across edge types.

Polypharmacy side effect prediction is modeled as a multirelational link prediction problem on a multimodal graph encoding drug, protein, and side effect relationships. More precisely, these relationships are represented by a graph G = (V, R) with N nodes (e.g., proteins, drugs) and labeled edges (relations) (v_i, r, v_j), where the edge type (relation type) r is one of:

(1) physical binding between two proteins,

(2) a target relationship between a drug and a protein, or

(3) a particular type of side effect between two drugs.
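
As an illustration of this graph structure, the sketch below builds a toy multimodal graph with SciPy sparse matrices. The layout (`adj_mats[(i, j)][k]` holding the k-th relation between node types i and j, with 0 for proteins and 1 for drugs) mirrors how the code later in this notebook indexes its adjacency matrices. The sizes and the edges themselves are invented for the example.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy graph: 4 proteins (node type 0) and 3 drugs (node type 1).

# (1) Protein-protein binding: symmetric 4 x 4 matrix.
pp = sp.csr_matrix(np.array([[0, 1, 0, 0],
                             [1, 0, 1, 0],
                             [0, 1, 0, 1],
                             [0, 0, 1, 0]]))

# (2) Drug-protein targets: 3 x 4 matrix (rows are drugs).
dp = sp.csr_matrix(np.array([[1, 0, 0, 0],
                             [0, 1, 1, 0],
                             [0, 0, 0, 1]]))

# (3) Two side effect types, each a symmetric 3 x 3 drug-drug matrix.
se_a = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
se_b = sp.csr_matrix(np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]]))

# One list of adjacency matrices per pair of node types.
adj_mats = {
    (0, 0): [pp],
    (0, 1): [dp.T.tocsr()],
    (1, 0): [dp],
    (1, 1): [se_a, se_b],
}

# Number of relation types for every pair of node types.
edge_types = {key: len(mats) for key, mats in adj_mats.items()}

# Labeled drug-drug edges (v_i, r, v_j), with r = (1, 1, k).
triples = [(int(i), (1, 1, k), int(j))
           for k, mat in enumerate(adj_mats[1, 1])
           for i, j in zip(*mat.nonzero())]
```

Each side effect type adds one more matrix to the `(1, 1)` list, which is why the number of edge types grows with the number of side effects.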

The original project can be found at: http://snap.stanford.edu/decagon/

# **Imports**

In [1]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [2]:
# __future__ imports have to come before any other statement.
from __future__ import print_function
from __future__ import division

import tensorflow as tf
import keras
import matplotlib.pyplot as plt
import random
import pandas as pd
import seaborn as sns
import numpy as np
import scipy.sparse as sp
import time
import os
import networkx as nx
import sys

from sklearn import metrics
from collections import defaultdict
from collections import Counter
from scipy.stats import ks_2samp
from operator import itemgetter
from itertools import combinations
%matplotlib inline

Using TensorFlow backend.


# **Functions**

**This section implements most of the helper functions used in the project.**

In [3]:
#Converts a SciPy sparse matrix to a (coords, values, shape) tuple, the format used to feed sparse tensors.

def sparse_to_tuple(sparse_mx):
    if not sp.isspmatrix_coo(sparse_mx):
        sparse_mx = sparse_mx.tocoo()
    coords = np.vstack((sparse_mx.row, sparse_mx.col)).transpose()
    values = sparse_mx.data
    shape = sparse_mx.shape
    return coords, values, shape
    
#Computes the average precision at k (AP@k) between a list of relevant items (actual) and a ranked list of predicted items.

def apk(actual, predicted, k=10):
    if len(predicted)>k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0

    for i, p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)

    if not actual:
        return 0.0

    return score / min(len(actual), k)

#Computes the mean average precision at k (MAP@k): the mean of AP@k over two lists of lists of items.

def mapk(actual, predicted, k=10):
    return np.mean([apk(a,p,k) for a, p in zip(actual, predicted)])

#Creating a weight variable with Glorot & Bengio (AISTATS 2010) initialization. Used in Graph Convolution Calculation.

def weight_variable_glorot(input_dim, output_dim, name=""):
    init_range = np.sqrt(6.0 / (input_dim + output_dim))
    initial = tf.random_uniform([input_dim, output_dim], minval=-init_range,
                                maxval=init_range, dtype=tf.float32)
    return tf.Variable(initial, name=name)

#Returns a variable initialized to zeros with shape (input_dim, output_dim).

def zeros(input_dim, output_dim, name=None):
    initial = tf.zeros((input_dim, output_dim), dtype=tf.float32)
    return tf.Variable(initial, name=name)

#Returns a variable initialized to ones with shape (input_dim, output_dim).

def ones(input_dim, output_dim, name=None):
    initial = tf.ones((input_dim, output_dim), dtype=tf.float32)
    return tf.Variable(initial, name=name)
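
To make the ranking metric concrete, here is a small worked example of average precision at k. The function bodies repeat the definitions from the cell above so that the snippet runs on its own; the item lists are made up for illustration.

```python
import numpy as np

def apk(actual, predicted, k=10):
    """Average precision at k, following the definition used in this notebook."""
    predicted = predicted[:k]
    score, num_hits = 0.0, 0.0
    for i, p in enumerate(predicted):
        # Count a hit only the first time a relevant item appears.
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    if not actual:
        return 0.0
    return score / min(len(actual), k)

def mapk(actual, predicted, k=10):
    """Mean of apk over several (actual, predicted) pairs."""
    return np.mean([apk(a, p, k) for a, p in zip(actual, predicted)])

# Relevant items are 1, 2, 3. The top-3 predictions are [1, 4, 2]:
# hits at ranks 1 and 3 give (1/1 + 2/3) / 3 = 5/9.
ap = apk([1, 2, 3], [1, 4, 2, 5], k=3)

# A perfect ranking scores 1.0.
perfect = apk([1, 2], [1, 2], k=2)

# Mean over the two queries above: (5/9 + 1) / 2 = 7/9.
mean_ap = mapk([[1, 2, 3], [1, 2]], [[1, 4, 2, 5], [1, 2]], k=3)
```

Note that the score is divided by `min(len(actual), k)`, so a query with fewer than k relevant items can still reach 1.0.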


In [4]:
#Flags hold the hyperparameters (e.g., FLAGS.hidden1, FLAGS.hidden2) that are passed to the neural network during training.

flags = tf.app.flags
FLAGS = flags.FLAGS

#Global unique layer ID dictionary for layer name assignment.

_LAYER_UIDS = {}

#Helper function, assigns unique layer IDs.

def get_layer_uid(layer_name=''):
    if layer_name not in _LAYER_UIDS:
        _LAYER_UIDS[layer_name] = 1
        return 1
    else:
        _LAYER_UIDS[layer_name] += 1
        return _LAYER_UIDS[layer_name]

# Wrapper for tf.matmul (sparse vs dense) tensor multiplication.

def dot(x, y, sparse=False):
    if sparse:
        res = tf.sparse_tensor_dense_matmul(x, y)
    else:
        res = tf.matmul(x, y)
    return res

#Dropout for sparse tensors: keeps each nonzero entry with probability keep_prob and rescales the kept entries by 1/keep_prob. Currently fails for very large sparse tensors (>1M elements).

def dropout_sparse(x, keep_prob, num_nonzero_elems):
    noise_shape = [num_nonzero_elems]
    random_tensor = keep_prob
    random_tensor += tf.random_uniform(noise_shape)
    dropout_mask = tf.cast(tf.floor(random_tensor), dtype=tf.bool)
    pre_out = tf.sparse_retain(x, dropout_mask)
    return pre_out * (1./keep_prob)
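
The masking trick in `dropout_sparse` can be checked outside TensorFlow. The sketch below is a NumPy analogue that works on a plain array of nonzero values rather than on a sparse tensor; the helper name `dropout_values` and the test sizes are assumptions made for this example.

```python
import numpy as np

def dropout_values(values, keep_prob, rng):
    """NumPy analogue of the sparse dropout mask (illustrative only)."""
    # keep_prob + U[0, 1) lies in [keep_prob, 1 + keep_prob), so its floor
    # is 1 with probability keep_prob and 0 otherwise.
    mask = np.floor(keep_prob + rng.uniform(size=values.shape)).astype(bool)
    # Kept entries are scaled by 1 / keep_prob so the expected sum is unchanged.
    return values[mask] / keep_prob, mask

rng = np.random.default_rng(0)
values = np.ones(100000)
kept, mask = dropout_values(values, keep_prob=0.8, rng=rng)

kept_fraction = mask.mean()   # close to 0.8
rescaled_sum = kept.sum()     # close to values.sum()
```

Because only the stored nonzero values are masked, the noise shape is the number of nonzero elements, which is why the layer needs `num_nonzero_elems`.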

# **Graph Layers**

The idea is that Decagon learns how to transform and propagate information, captured by node feature vectors, across the graph. Every node’s network neighborhood defines a different neural network information propagation architecture, but these architectures share the functions/parameters that define how information is shared and propagated. For a given node, Decagon performs transformation/aggregation operations on the feature vectors of its neighbors. A single layer therefore takes into account only the first-order neighborhood of a node and applies the same transformation at every location in the graph. Successive application of these operations effectively convolves information across the K-th order neighborhood (i.e., the embedding of a node depends on all the nodes that are at most K steps away), where K is the number of successive convolutional layers in the neural network model.

In each layer, Decagon propagates latent node feature information across edges of the graph, while taking into account the type (relation) of an edge.
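
The K-step neighborhood property can be seen in a few lines of NumPy. This is a simplified sketch, not the layer implemented below: it uses a single node type, dense matrices, identity weights, and no dropout or row normalization. The helper names `normalize_adj` and `gcn_layer` are chosen for this example.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    adj_ = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(adj_.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ adj_ @ d_inv_sqrt

def gcn_layer(adjs, x, weights):
    """One relation-aware layer: ReLU of the sum over relations of A_r X W_r."""
    total = sum(a @ x @ w for a, w in zip(adjs, weights))
    return np.maximum(total, 0.0)

# Path graph 0 - 1 - 2 - 3 with a single relation type.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_hat = normalize_adj(adj)

# Only node 3 carries a signal.
x = np.zeros((4, 2))
x[3] = [1.0, 1.0]
w = np.eye(2)

h1 = gcn_layer([a_hat], x, [w])    # reaches node 2 (1 step from node 3)
h2 = gcn_layer([a_hat], h1, [w])   # reaches node 1 (2 steps)
h3 = gcn_layer([a_hat], h2, [w])   # reaches node 0 (3 steps)
```

After one layer only the direct neighbor of node 3 sees its signal; each additional layer extends the reach by one step. With several relation types, `adjs` and `weights` would hold one matrix per relation.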

In [5]:
#Base layer class for multi-relational graphs; the layers below extend it. Defines the basic API for all layer objects.

class MultiLayer(object):
    def __init__(self, edge_type=(), num_types=-1, **kwargs):
        self.edge_type = edge_type
        self.num_types = num_types
        allowed_kwargs = {'name', 'logging'}
        for kwarg in kwargs.keys():
            assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
        name = kwargs.get('name')
        if not name:
            layer = self.__class__.__name__.lower()
            name = layer + '_' + str(get_layer_uid(layer))
        self.name = name
        self.vars = {}
        logging = kwargs.get('logging', False)
        self.logging = logging
        self.issparse = False

    def _call(self, inputs):
        return inputs

    #Makes the layer callable: runs _call on the inputs inside a TensorFlow name scope named after the layer.

    def __call__(self, inputs):
        with tf.name_scope(self.name):
            outputs = self._call(inputs)
            return outputs

#Graph convolution layer for sparse inputs.

class GraphConvolutionSparseMulti(MultiLayer):
    def __init__(self, input_dim, output_dim, adj_mats,
                 nonzero_feat, dropout=0., act=tf.nn.relu, support=None, **kwargs):
        super(GraphConvolutionSparseMulti, self).__init__(**kwargs)
        self.dropout = dropout
        self.adj_mats = adj_mats
        self.act = act
        self.issparse = True
        self.rank = 100

        if support is None:
            self.support = placeholders['support']
        else:
            self.support = support

        self.nonzero_feat = nonzero_feat
        with tf.variable_scope('%s_vars' % self.name):
            for k in range(self.num_types):
                self.vars['weights_%d' % k] = weight_variable_glorot(
                    input_dim[self.edge_type[1]], output_dim, name='weights_%d' % k)

    #Forward pass. It experiments with the sampled graph convolution (SampledGraphConv) from FastGCN instead of the original GCN;
    #the sampled indices are computed below, but they are not used in the returned output.

    def _call(self, inputs):
        outputs = []
        x = inputs
        for k in range(self.num_types):

            #Sparse dropout: each nonzero feature is kept with probability 1-self.dropout.
            x = dropout_sparse(inputs, 1-self.dropout, self.nonzero_feat[self.edge_type[1]])

            #Linear transformation: the node feature vectors are multiplied by the weights of relation k.
            x = tf.sparse_tensor_dense_matmul(x, self.vars['weights_%d' % k])

            #Support for relation k of this edge type: a sparse matrix, based on the adjacency matrix, that determines which neighbours contribute to each node.
            sup = self.support[self.edge_type][k]

            #L2-normalized input (computed, but not used below).
            norm_x = tf.nn.l2_normalize(x)
            #norm_sup = tf.nn.l2_normalize(tf.sparse.to_dense(sup))

            #The original FastGCN uses norm_sup, but in my case this did not work.

            #Aggregation: multiplying the support by the transformed input sums, for every node, the features of its neighbours.
            norm_mix = tf.sparse_tensor_dense_matmul(sup, x)
            #Scale the result by the sum of all its entries (tf.reduce_sum without an axis returns a scalar).
            norm_mix = norm_mix*tf.transpose(tf.reduce_sum(norm_mix))
            #Draw self.rank sample indices per row from a multinomial distribution with logits log(norm_mix).
            sampledIndex = tf.multinomial(tf.log(norm_mix), self.rank)
            #Since GCN is a full graph convolution, sampled graph convolution (sampling the neighbourhood) can be faster (the FastGCN paper gives some interesting results for this),
            #as it works with smaller tensors. Here sampledIndex is not used, so the full result is passed on.

            out = norm_mix
            #Apply the activation function and store the output for relation k.
            outputs.append(self.act(out))

        #Sum the per-relation outputs (one tensor per relation type), then L2-normalize each row.

        outputs = tf.add_n(outputs)
        outputs = tf.nn.l2_normalize(outputs, dim=1)
        return outputs

#Basic graph convolution layer for undirected graph without edge labels.

class GraphConvolutionMulti(MultiLayer):

    def __init__(self, input_dim, output_dim, adj_mats, dropout=0., act=tf.nn.relu, **kwargs):
        super(GraphConvolutionMulti, self).__init__(**kwargs)
        self.adj_mats = adj_mats
        self.dropout = dropout
        self.act = act
        with tf.variable_scope('%s_vars' % self.name):
            for k in range(self.num_types):
                self.vars['weights_%d' % k] = weight_variable_glorot(
                    input_dim, output_dim, name='weights_%d' % k)

    def _call(self, inputs):
        outputs = []
        x = inputs
        for k in range(self.num_types):
            x = tf.nn.dropout(inputs, 1-self.dropout)
            x = tf.matmul(x, self.vars['weights_%d' % k])
            x = tf.sparse_tensor_dense_matmul(self.adj_mats[self.edge_type][k], x)
            outputs.append(self.act(x))

        outputs = tf.add_n(outputs)
        outputs = tf.nn.l2_normalize(outputs, dim=1)
        return outputs

# **Decoder**

The goal of the decoder is to reconstruct labeled edges in G by relying on learned node embeddings and by treating each label (edge type) differently. In particular, the decoder scores a (v_i, r, v_j) triple through a function g whose goal is to assign a score g(v_i, r, v_j) representing how likely it is that nodes v_i and v_j interact through a relation/side effect type r.


Two cases have to be distinguished:

(1) When v_i and v_j are drug nodes, the decoder g assumes a global model of drug-drug interactions (i.e., R) whose variation and importance across polypharmacy side effects are described by side-effect-specific diagonal factors;

(2) When v_i and v_j are not both drug nodes, the decoder g employs a bilinear form to decode edges from node embeddings.
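
Written out, case (1) corresponds to g(v_i, r, v_j) = z_i^T D_r R D_r z_j and case (2) to g(v_i, r, v_j) = z_i^T M_r z_j, where z_i and z_j are node embeddings, D_r is the diagonal factor for side effect r, and M_r is a relation-specific matrix. The NumPy sketch below scores a single pair under both forms. The embeddings and parameters are random placeholders, and passing the score through a sigmoid follows the default activation of the decoder classes below rather than any particular training setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dedicom_score(z_i, z_j, d_r, R):
    """Case (1): z_i^T D_r R D_r z_j, with d_r holding the diagonal of D_r."""
    # Multiplying by a diagonal matrix is an elementwise product.
    return sigmoid((z_i * d_r) @ R @ (d_r * z_j))

def bilinear_score(z_i, z_j, M_r):
    """Case (2): z_i^T M_r z_j."""
    return sigmoid(z_i @ M_r @ z_j)

rng = np.random.default_rng(0)
dim = 4                                  # hypothetical embedding size
z_a, z_b = rng.normal(size=dim), rng.normal(size=dim)
R = rng.normal(size=(dim, dim))          # shared across all side effects
d_r = rng.normal(size=dim)               # specific to one side effect
M_r = rng.normal(size=(dim, dim))        # specific to one relation

p_drug = dedicom_score(z_a, z_b, d_r, R)
p_other = bilinear_score(z_a, z_b, M_r)

# The same drug-drug score with explicit diagonal matrices.
explicit = sigmoid(z_a @ np.diag(d_r) @ R @ np.diag(d_r) @ z_b)
```

Since R is shared, each additional side effect type costs only one vector of `dim` parameters, whereas the bilinear form needs a full `dim x dim` matrix per relation.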

In [6]:
#A different decoder is used for different types of relationships.

#DEDICOM tensor factorization decoder layer for link prediction, used in Decagon for drug-drug edges.
#It combines a global interaction matrix, shared by all relation types, with a diagonal local-variation matrix for each relation type.
#The diagonal factors do not remove values; they rescale them, so each relation type can weight the shared interactions differently.

class DEDICOMDecoder(MultiLayer):
    def __init__(self, input_dim, dropout=0., act=tf.nn.sigmoid, **kwargs):
        super(DEDICOMDecoder, self).__init__(**kwargs)
        self.dropout = dropout
        self.act = act
        with tf.variable_scope('%s_vars' % self.name):
            self.vars['global_interaction'] = weight_variable_glorot(
                input_dim, input_dim, name='global_interaction')
            for k in range(self.num_types):
                tmp = weight_variable_glorot(
                    input_dim, 1, name='local_variation_%d' % k)
                self.vars['local_variation_%d' % k] = tf.reshape(tmp, [-1])

    def _call(self, inputs):
        i, j = self.edge_type
        outputs = []
        for k in range(self.num_types):
            inputs_row = tf.nn.dropout(inputs[i], 1-self.dropout)
            inputs_col = tf.nn.dropout(inputs[j], 1-self.dropout)
            relation = tf.diag(self.vars['local_variation_%d' % k])
            product1 = tf.matmul(inputs_row, relation)
            product2 = tf.matmul(product1, self.vars['global_interaction'])
            product3 = tf.matmul(product2, relation)
            rec = tf.matmul(product3, tf.transpose(inputs_col))
            outputs.append(self.act(rec))
        return outputs

#DistMult decoder layer for link prediction. Like DEDICOM, it uses a diagonal matrix per relation type, but it has no shared global interaction matrix.

class DistMultDecoder(MultiLayer):
    def __init__(self, input_dim, dropout=0., act=tf.nn.sigmoid, **kwargs):
        super(DistMultDecoder, self).__init__(**kwargs)
        self.dropout = dropout
        self.act = act
        with tf.variable_scope('%s_vars' % self.name):
            for k in range(self.num_types):
                tmp = weight_variable_glorot(
                    input_dim, 1, name='relation_%d' % k)
                self.vars['relation_%d' % k] = tf.reshape(tmp, [-1])

    def _call(self, inputs):
        i, j = self.edge_type
        outputs = []
        for k in range(self.num_types):
            inputs_row = tf.nn.dropout(inputs[i], 1-self.dropout)
            inputs_col = tf.nn.dropout(inputs[j], 1-self.dropout)
            relation = tf.diag(self.vars['relation_%d' % k])
            intermediate_product = tf.matmul(inputs_row, relation)
            rec = tf.matmul(intermediate_product, tf.transpose(inputs_col))
            outputs.append(self.act(rec))
        return outputs

#Bilinear Decoder model layer for link prediction.

class BilinearDecoder(MultiLayer):
    def __init__(self, input_dim, dropout=0., act=tf.nn.sigmoid, **kwargs):
        super(BilinearDecoder, self).__init__(**kwargs)
        self.dropout = dropout
        self.act = act
        with tf.variable_scope('%s_vars' % self.name):
            for k in range(self.num_types):
                self.vars['relation_%d' % k] = weight_variable_glorot(
                    input_dim, input_dim, name='relation_%d' % k)

    def _call(self, inputs):
        i, j = self.edge_type
        outputs = []
        for k in range(self.num_types):
            inputs_row = tf.nn.dropout(inputs[i], 1-self.dropout)
            inputs_col = tf.nn.dropout(inputs[j], 1-self.dropout)
            intermediate_product = tf.matmul(inputs_row, self.vars['relation_%d' % k])
            rec = tf.matmul(intermediate_product, tf.transpose(inputs_col))
            outputs.append(self.act(rec))
        return outputs

#Inner Product Decoder for link prediction.

class InnerProductDecoder(MultiLayer):
    def __init__(self, input_dim, dropout=0., act=tf.nn.sigmoid, **kwargs):
        super(InnerProductDecoder, self).__init__(**kwargs)
        self.dropout = dropout
        self.act = act

    def _call(self, inputs):
        i, j = self.edge_type
        outputs = []
        for k in range(self.num_types):
            inputs_row = tf.nn.dropout(inputs[i], 1-self.dropout)
            inputs_col = tf.nn.dropout(inputs[j], 1-self.dropout)
            rec = tf.matmul(inputs_row, tf.transpose(inputs_col))
            outputs.append(self.act(rec))
        return outputs

# **Edge Minibatch Iterator**

This minibatch iterator iterates over batches of sampled edges or random pairs of co-occurring edges.
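
Before any batches are drawn, the iterator holds out validation and test edges for every edge type and pairs them with sampled non-edges. The sketch below shows that step for a single edge type. It is a simplified variant, not the notebook's method: it checks membership with a Python set instead of scanning arrays, it keeps validation and test negatives disjoint, and the function name and the `min_holdout` parameter are assumptions for this example.

```python
import numpy as np

def split_edges(edges, shape, holdout_frac, rng, min_holdout=1):
    """Hold out validation/test edges and sample matching non-edges."""
    n_hold = max(min_holdout, int(np.floor(len(edges) * holdout_frac)))
    order = rng.permutation(len(edges))
    val = edges[order[:n_hold]]
    test = edges[order[n_hold:2 * n_hold]]
    train = edges[order[2 * n_hold:]]

    # A set gives constant-time membership checks against known pairs.
    taken = {(int(a), int(b)) for a, b in edges}

    def sample_false(n):
        false = []
        while len(false) < n:
            cand = (int(rng.integers(shape[0])), int(rng.integers(shape[1])))
            if cand in taken:
                continue            # real edge or already sampled
            taken.add(cand)
            false.append(cand)
        return np.array(false)

    return train, val, test, sample_false(len(val)), sample_false(len(test))

# Toy 6 x 6 relation with 12 edges: (i, j) is an edge when (i + j) % 3 == 0.
rng = np.random.default_rng(123)
edges = np.array([(i, j) for i in range(6) for j in range(6)
                  if (i + j) % 3 == 0])
train, val, test, val_false, test_false = split_edges(edges, (6, 6), 0.25, rng)
```

The held-out positives and sampled negatives are what the model is later evaluated on, while only `train` is used to rebuild the adjacency matrix seen during training.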

In [7]:
np.random.seed(123)

#For training, the dataset has to be divided into batches; each batch holds edges of a single edge type and is fed to the model through the placeholders.

class EdgeMinibatchIterator(object):
    def __init__(self, adj_mats, feat, edge_types, batch_size=100, val_test_size=0.01):
        self.adj_mats = adj_mats
        self.feat = feat
        self.edge_types = edge_types
        self.batch_size = batch_size
        self.val_test_size = val_test_size
        self.num_edge_types = sum(self.edge_types.values())

        self.iter = 0
        self.freebatch_edge_types= list(range(self.num_edge_types))
        self.batch_num = [0]*self.num_edge_types
        self.current_edge_type_idx = 0
        self.edge_type2idx = {}
        self.idx2edge_type = {}
        r = 0

        #Assign a flat index r to every edge type (i, j, k) and store the mapping in both directions.
        #The per-edge-type lists created after the loop start out filled with None and are populated later by mask_test_edges.

        for i, j in self.edge_types:
            for k in range(self.edge_types[i,j]):
                self.edge_type2idx[i, j, k] = r
                self.idx2edge_type[r] = i, j, k
                r += 1

        self.train_edges = {edge_type: [None]*n for edge_type, n in self.edge_types.items()}
        self.val_edges = {edge_type: [None]*n for edge_type, n in self.edge_types.items()}
        self.test_edges = {edge_type: [None]*n for edge_type, n in self.edge_types.items()}
        self.test_edges_false = {edge_type: [None]*n for edge_type, n in self.edge_types.items()}
        self.val_edges_false = {edge_type: [None]*n for edge_type, n in self.edge_types.items()}

        #Build the test and validation sets: for every edge type, mask_test_edges holds out a val_test_size fraction of the positive links (e.g., an existing side effect between two drugs).

        self.adj_train = {edge_type: [None]*n for edge_type, n in self.edge_types.items()}
        for i, j in self.edge_types:
            for k in range(self.edge_types[i,j]):
                print("Minibatch edge type:", "(%d, %d, %d)" % (i, j, k))
                self.mask_test_edges((i, j), k)

                print("Train edges=", "%04d" % len(self.train_edges[i,j][k]))
                print("Val edges=", "%04d" % len(self.val_edges[i,j][k]))
                print("Test edges=", "%04d" % len(self.test_edges[i,j][k]))

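    #Normalizes the adjacency matrix and converts it to a tuple of coordinates, values at those coordinates, and shape (similar to a sparse tensor).
    #Square matrices get self-loops and symmetric degree normalization; rectangular matrices are scaled by row and column degrees.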

    def preprocess_graph(self, adj):
        adj = sp.coo_matrix(adj)
        if adj.shape[0] == adj.shape[1]:
            adj_ = adj + sp.eye(adj.shape[0])
            rowsum = np.array(adj_.sum(1))
            degree_mat_inv_sqrt = sp.diags(np.power(rowsum, -0.5).flatten())
            adj_normalized = adj_.dot(degree_mat_inv_sqrt).transpose().dot(degree_mat_inv_sqrt).tocoo()
        else:
            rowsum = np.array(adj.sum(1))
            colsum = np.array(adj.sum(0))
            rowdegree_mat_inv = sp.diags(np.nan_to_num(np.power(rowsum, -0.5)).flatten())
            coldegree_mat_inv = sp.diags(np.nan_to_num(np.power(colsum, -0.5)).flatten())
            adj_normalized = rowdegree_mat_inv.dot(adj).dot(coldegree_mat_inv).tocoo()
        return sparse_to_tuple(adj_normalized)

    def _ismember(self, a, b):
        a = np.array(a)
        b = np.array(b)
        rows_close = np.all(a - b == 0, axis=1)
        return np.any(rows_close)

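    #Splits the edges of one edge type into training, validation, and test sets, samples the same number of false (non-existent) edges
    #for validation and test, and rebuilds the training adjacency matrix from the training edges only.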

    def mask_test_edges(self, edge_type, type_idx):
        edges_all, _, _ = sparse_to_tuple(self.adj_mats[edge_type][type_idx])
        num_test = max(50, int(np.floor(edges_all.shape[0] * self.val_test_size)))
        num_val = max(50, int(np.floor(edges_all.shape[0] * self.val_test_size)))

        all_edge_idx = list(range(edges_all.shape[0]))
        np.random.shuffle(all_edge_idx)

        val_edge_idx = all_edge_idx[:num_val]
        val_edges = edges_all[val_edge_idx]

        test_edge_idx = all_edge_idx[num_val:(num_val + num_test)]
        test_edges = edges_all[test_edge_idx]

        train_edges = np.delete(edges_all, np.hstack([test_edge_idx, val_edge_idx]), axis=0)

        test_edges_false = []
        while len(test_edges_false) < len(test_edges):
            if len(test_edges_false) % 1000 == 0:
                print("Constructing test edges=", "%04d/%04d" % (len(test_edges_false), len(test_edges)))
            idx_i = np.random.randint(0, self.adj_mats[edge_type][type_idx].shape[0])
            idx_j = np.random.randint(0, self.adj_mats[edge_type][type_idx].shape[1])
            if self._ismember([idx_i, idx_j], edges_all):
                continue
            if test_edges_false:
                if self._ismember([idx_i, idx_j], test_edges_false):
                    continue
            test_edges_false.append([idx_i, idx_j])

        val_edges_false = []
        while len(val_edges_false) < len(val_edges):
            if len(val_edges_false) % 1000 == 0:
                print("Constructing val edges=", "%04d/%04d" % (len(val_edges_false), len(val_edges)))
            idx_i = np.random.randint(0, self.adj_mats[edge_type][type_idx].shape[0])
            idx_j = np.random.randint(0, self.adj_mats[edge_type][type_idx].shape[1])
            if self._ismember([idx_i, idx_j], edges_all):
                continue
            if val_edges_false:
                if self._ismember([idx_i, idx_j], val_edges_false):
                    continue
            val_edges_false.append([idx_i, idx_j])

        #Re-build adj matrices.

        data = np.ones(train_edges.shape[0])
        adj_train = sp.csr_matrix(
            (data, (train_edges[:, 0], train_edges[:, 1])),
            shape=self.adj_mats[edge_type][type_idx].shape)
        self.adj_train[edge_type][type_idx] = self.preprocess_graph(adj_train)

        self.train_edges[edge_type][type_idx] = train_edges
        self.val_edges[edge_type][type_idx] = val_edges
        self.val_edges_false[edge_type][type_idx] = np.array(val_edges_false)
        self.test_edges[edge_type][type_idx] = test_edges
        self.test_edges_false[edge_type][type_idx] = np.array(test_edges_false)

    def end(self):
        finished = len(self.freebatch_edge_types) == 0
        return finished

    #Adds the training adjacency matrices, the node features, and the dropout rate to the feed dictionary.

    def update_feed_dict(self, feed_dict, dropout, placeholders):

        # construct feed dictionary

        feed_dict.update({
            placeholders['adj_mats_%d,%d,%d' % (i,j,k)]: self.adj_train[i,j][k]
            for i, j in self.edge_types for k in range(self.edge_types[i,j])})
        feed_dict.update({placeholders['feat_%d' % i]: self.feat[i] for i, _ in self.edge_types})
        feed_dict.update({placeholders['dropout']: dropout})

        return feed_dict

    #Builds the feed dictionary for one batch: its edges, the flat edge type index, and the row and column node types.

    def batch_feed_dict(self, batch_edges, batch_edge_type, placeholders):
        feed_dict = dict()
        feed_dict.update({placeholders['batch']: batch_edges})
        feed_dict.update({placeholders['batch_edge_type_idx']: batch_edge_type})
        feed_dict.update({placeholders['batch_row_edge_type']: self.idx2edge_type[batch_edge_type][0]})
        feed_dict.update({placeholders['batch_col_edge_type']: self.idx2edge_type[batch_edge_type][1]})

        return feed_dict

    #Cycles through the gene-gene, gene-drug, and drug-gene relations and a randomly chosen side effect relation, and returns the next batch of edges of the selected type.

    def next_minibatch_feed_dict(self, placeholders):
        while True:
            if self.iter % 4 == 0:
                # gene-gene relation
                self.current_edge_type_idx = self.edge_type2idx[0, 0, 0]
            elif self.iter % 4 == 1:
                # gene-drug relation
                self.current_edge_type_idx = self.edge_type2idx[0, 1, 0]
            elif self.iter % 4 == 2:
                # drug-gene relation
                self.current_edge_type_idx = self.edge_type2idx[1, 0, 0]
            else:
                # random side effect relation
                if len(self.freebatch_edge_types) > 0:
                    self.current_edge_type_idx = np.random.choice(self.freebatch_edge_types)
                else:
                    self.current_edge_type_idx = self.edge_type2idx[0, 0, 0]
                    self.iter = 0

            i, j, k = self.idx2edge_type[self.current_edge_type_idx]
            if self.batch_num[self.current_edge_type_idx] * self.batch_size \
                   <= len(self.train_edges[i,j][k]) - self.batch_size + 1:
                break
            else:
                if self.iter % 4 in [0, 1, 2]:
                    self.batch_num[self.current_edge_type_idx] = 0
                else:
                    self.freebatch_edge_types.remove(self.current_edge_type_idx)

        self.iter += 1
        start = self.batch_num[self.current_edge_type_idx] * self.batch_size
        self.batch_num[self.current_edge_type_idx] += 1
        batch_edges = self.train_edges[i,j][k][start: start + self.batch_size]
        return self.batch_feed_dict(batch_edges, self.current_edge_type_idx, placeholders)

#Number of batches.

    def num_training_batches(self, edge_type, type_idx):
        return len(self.train_edges[edge_type][type_idx]) // self.batch_size + 1

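    #Builds the feed dictionary for the validation edges of one edge type, optionally for a random subset of the given size.

    def val_feed_dict(self, edge_type, type_idx, placeholders, size=None):
        edge_list = self.val_edges[edge_type][type_idx]
        #batch_feed_dict expects the flat edge type index, not the (i, j) tuple.
        batch_edge_type = self.edge_type2idx[edge_type[0], edge_type[1], type_idx]
        if size is None:
            return self.batch_feed_dict(edge_list, batch_edge_type, placeholders)
        else:
            ind = np.random.permutation(len(edge_list))
            val_edges = [edge_list[i] for i in ind[:min(size, len(ind))]]
            return self.batch_feed_dict(val_edges, batch_edge_type, placeholders)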

    #Reshuffles the training edges of every edge type and resets the batch counters, so that batches differ from one pass over the data to the next.

    def shuffle(self):
        for edge_type in self.edge_types:
            for k in range(self.edge_types[edge_type]):
                self.train_edges[edge_type][k] = np.random.permutation(self.train_edges[edge_type][k])
                self.batch_num[self.edge_type2idx[edge_type[0], edge_type[1], k]] = 0
        self.current_edge_type_idx = 0
        self.freebatch_edge_types = list(range(self.num_edge_types))
        self.freebatch_edge_types.remove(self.edge_type2idx[0, 0, 0])
        self.freebatch_edge_types.remove(self.edge_type2idx[0, 1, 0])
        self.freebatch_edge_types.remove(self.edge_type2idx[1, 0, 0])
        self.iter = 0
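
The square-matrix branch of `preprocess_graph` applies the symmetric normalization D^-1/2 (A + I) D^-1/2 and then flattens the result into the (coords, values, shape) tuple. The sketch below reproduces that for a small symmetric matrix; the function name `normalize_square` is an assumption for this example, and for a symmetric input it gives the same result as the transpose-based expression used above.

```python
import numpy as np
import scipy.sparse as sp

def normalize_square(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a square matrix."""
    adj_ = sp.coo_matrix(adj) + sp.eye(adj.shape[0])   # add self-loops
    deg = np.asarray(adj_.sum(axis=1)).flatten()       # degrees incl. self-loop
    d_inv_sqrt = sp.diags(np.power(deg, -0.5))
    return d_inv_sqrt.dot(adj_).dot(d_inv_sqrt).tocoo()

# Triangle graph: every node has degree 3 once the self-loop is added,
# so every entry of the normalized matrix equals 1 / 3.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]], dtype=float)
norm = normalize_square(triangle)

# The same tuple layout that sparse_to_tuple returns.
coords = np.vstack((norm.row, norm.col)).transpose()
values, shape = norm.data, norm.shape
```

Adding the identity before normalizing keeps each node's own features in the aggregation, and the degree scaling keeps high-degree nodes from dominating the sums.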

# **Model**

In [8]:
#Model is an abstract base class that contains a basic constructor and the functions build, fit, and predict.

class Model(object):
    def __init__(self, **kwargs):
        allowed_kwargs = {'name', 'logging'}
        for kwarg in kwargs.keys():
            assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg

        name = kwargs.get('name')
        if not name:
            name = self.__class__.__name__.lower()
        self.name = name

        logging = kwargs.get('logging', False)
        self.logging = logging

        self.vars = {}

    def _build(self):
        raise NotImplementedError

    def build(self):
        """ Wrapper for _build() """
        with tf.variable_scope(self.name):
            self._build()
        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
        self.vars = {var.name: var for var in variables}

    def fit(self):
        pass

    def predict(self):
        pass

In [9]:
class DecagonModel(Model):
    def __init__(self, placeholders, num_feat, nonzero_feat, edge_types, decoders, **kwargs):
        super(DecagonModel, self).__init__(**kwargs)
        self.edge_types = edge_types
        self.num_edge_types = sum(self.edge_types.values())
        self.num_obj_types = max([i for i, _ in self.edge_types]) + 1
        self.decoders = decoders
        self.inputs = {i: placeholders['feat_%d' % i] for i, _ in self.edge_types}
        self.input_dim = num_feat
        self.nonzero_feat = nonzero_feat
        self.placeholders = placeholders
        self.dropout = placeholders['dropout']
        self.support = placeholders['support']
        self.adj_mats = {et: [
            placeholders['adj_mats_%d,%d,%d' % (et[0], et[1], k)] for k in range(n)]
            for et, n in self.edge_types.items()}
        self.build()

#hidden1 is a dictionary that collects, for every node type i, the first-layer outputs of all edge types (i, j). This is the first part of the encoder; the input features are sparse,
#so GraphConvolutionSparseMulti (the sparse graph convolution) is used. Each layer uses the linear activation function f(x) = x; ReLU is applied after the per-edge-type outputs are summed.

    def _build(self):
        self.hidden1 = defaultdict(list)
        for i, j in self.edge_types:
            self.hidden1[i].append(GraphConvolutionSparseMulti(
                input_dim=self.input_dim, output_dim=FLAGS.hidden1,
                edge_type=(i,j), num_types=self.edge_types[i,j],
                adj_mats=self.adj_mats, nonzero_feat=self.nonzero_feat,
                act=lambda x: x, dropout=self.dropout, support=self.support,
                logging=self.logging)(self.inputs[j]))
        for i, hid1 in self.hidden1.items():
            self.hidden1[i] = tf.nn.relu(tf.add_n(hid1))

#The second graph convolution layer takes the dense hidden1 activations and produces the node embeddings, which are the input to the decoder.

        self.embeddings_reltyp = defaultdict(list)
        for i, j in self.edge_types:
            self.embeddings_reltyp[i].append(GraphConvolutionMulti(
                input_dim=FLAGS.hidden1, output_dim=FLAGS.hidden2,
                edge_type=(i,j), num_types=self.edge_types[i,j],
                adj_mats=self.adj_mats, act=lambda x: x,
                dropout=self.dropout, logging=self.logging)(self.hidden1[j]))
        self.embeddings = [None] * self.num_obj_types
        for i, embeds in self.embeddings_reltyp.items():
            # self.embeddings[i] = tf.nn.relu(tf.add_n(embeds))
            self.embeddings[i] = tf.add_n(embeds)

#Depending on the edge type, a specific decoder is used.

        self.edge_type2decoder = {}
        for i, j in self.edge_types:
            decoder = self.decoders[i, j]
            if decoder == 'innerproduct':
                self.edge_type2decoder[i, j] = InnerProductDecoder(
                    input_dim=FLAGS.hidden2, logging=self.logging,
                    edge_type=(i, j), num_types=self.edge_types[i, j],
                    act=lambda x: x, dropout=self.dropout)
            elif decoder == 'distmult':
                self.edge_type2decoder[i, j] = DistMultDecoder(
                    input_dim=FLAGS.hidden2, logging=self.logging,
                    edge_type=(i, j), num_types=self.edge_types[i, j],
                    act=lambda x: x, dropout=self.dropout)
            elif decoder == 'bilinear':
                self.edge_type2decoder[i, j] = BilinearDecoder(
                    input_dim=FLAGS.hidden2, logging=self.logging,
                    edge_type=(i, j), num_types=self.edge_types[i, j],
                    act=lambda x: x, dropout=self.dropout)
            elif decoder == 'dedicom':
                self.edge_type2decoder[i, j] = DEDICOMDecoder(
                    input_dim=FLAGS.hidden2, logging=self.logging,
                    edge_type=(i, j), num_types=self.edge_types[i, j],
                    act=lambda x: x, dropout=self.dropout)
            else:
                raise ValueError('Unknown decoder type')

# The goal of the decoder is to reconstruct labeled edges in G from the learned node embeddings, treating each label (edge type) differently. For every relation, the matrices glb and loc used to score a node pair are collected below.

        self.latent_inters = []
        self.latent_varies = []
        for edge_type in self.edge_types:
            decoder = self.decoders[edge_type]
            for k in range(self.edge_types[edge_type]):
                if decoder == 'innerproduct':
                    glb = tf.eye(FLAGS.hidden2, FLAGS.hidden2)
                    loc = tf.eye(FLAGS.hidden2, FLAGS.hidden2)
                elif decoder == 'distmult':
                    glb = tf.diag(self.edge_type2decoder[edge_type].vars['relation_%d' % k])
                    loc = tf.eye(FLAGS.hidden2, FLAGS.hidden2)
                elif decoder == 'bilinear':
                    glb = self.edge_type2decoder[edge_type].vars['relation_%d' % k]
                    loc = tf.eye(FLAGS.hidden2, FLAGS.hidden2)
                elif decoder == 'dedicom':
                    glb = self.edge_type2decoder[edge_type].vars['global_interaction']
                    loc = tf.diag(self.edge_type2decoder[edge_type].vars['local_variation_%d' % k])
                else:
                    raise ValueError('Unknown decoder type')

                self.latent_inters.append(glb)
                self.latent_varies.append(loc)
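
The `glb` and `loc` matrices collected above enter the score of a node pair as one chained product, `row_embeds · loc · glb · loc · col_embedsᵀ`, which is the same chain of matrix products that `batch_predict` evaluates in the optimizer. The following is a minimal NumPy sketch of that scoring rule, not the notebook's TensorFlow code. The helper name `score_pairs`, the tiny embedding size, and the random values are assumptions made purely for illustration.

```python
import numpy as np

def score_pairs(row_embeds, col_embeds, glb, loc):
    """Score every (row, col) pair as row_embeds @ loc @ glb @ loc @ col_embeds.T."""
    return row_embeds @ loc @ glb @ loc @ col_embeds.T

rng = np.random.default_rng(0)
hidden2 = 4                              # hypothetical embedding size, kept small
z_row = rng.normal(size=(3, hidden2))    # embeddings of 3 row nodes
z_col = rng.normal(size=(5, hidden2))    # embeddings of 5 column nodes

# 'dedicom' case: one dense matrix shared by all relations of the edge type,
# plus a diagonal matrix that is specific to relation k.
R = rng.normal(size=(hidden2, hidden2))
D = np.diag(rng.normal(size=hidden2))
dedicom_scores = score_pairs(z_row, z_col, R, D)

# 'innerproduct' case: both matrices are the identity, so the score
# reduces to a plain dot product between the two embeddings.
eye = np.eye(hidden2)
inner_scores = score_pairs(z_row, z_col, eye, eye)
```

Under this view the four decoder names differ only in which of the two matrices is trainable and whether it is dense or diagonal.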

# **Decagon Optimizer**

In [10]:
class DecagonOptimizer(object):
    def __init__(self, embeddings, latent_inters, latent_varies,
                 degrees, edge_types, edge_type2dim, placeholders,
                 margin=0.1, neg_sample_weights=1., batch_size=100):
        self.embeddings= embeddings
        #latent_inters and latent_varies hold the decoder parameters of every relation: the interaction matrix and the (diagonal) variation matrix used to score a pair of node embeddings.
        self.latent_inters = latent_inters
        self.latent_varies = latent_varies
        self.edge_types = edge_types
        self.degrees = degrees
        self.edge_type2dim = edge_type2dim
        self.obj_type2n = {i: self.edge_type2dim[i,j][0][0] for i, j in self.edge_types}
        self.margin = margin
        self.neg_sample_weights = neg_sample_weights
        self.batch_size = batch_size

        self.inputs = placeholders['batch']
        self.batch_edge_type_idx = placeholders['batch_edge_type_idx']
        self.batch_row_edge_type = placeholders['batch_row_edge_type']
        self.batch_col_edge_type = placeholders['batch_col_edge_type']

        self.row_inputs = tf.squeeze(gather_cols(self.inputs, [0]))
        self.col_inputs = tf.squeeze(gather_cols(self.inputs, [1]))

        obj_type_n = [self.obj_type2n[i] for i in range(len(self.embeddings))]
        self.obj_type_lookup_start = tf.cumsum([0] + obj_type_n[:-1])
        self.obj_type_lookup_end = tf.cumsum(obj_type_n)

        labels = tf.reshape(tf.cast(self.row_inputs, dtype=tf.int64), [self.batch_size, 1])
        neg_samples_list = []
        for i, j in self.edge_types:
            #For every relation, draw negative samples: candidate row nodes picked at random with probability proportional to degree^0.75.
            #The observed edges in the batch are the positives; the sampled nodes, paired with the batch's column nodes, are the negatives.
            for k in range(self.edge_types[i,j]):
                neg_samples, _, _ = tf.nn.fixed_unigram_candidate_sampler(
                    true_classes=labels,
                    num_true=1,
                    num_sampled=self.batch_size,
                    unique=False,
                    range_max=len(self.degrees[i][k]),
                    distortion=0.75,
                    unigrams=self.degrees[i][k].tolist())
                neg_samples_list.append(neg_samples)
        self.neg_samples = tf.gather(neg_samples_list, self.batch_edge_type_idx)

        self.preds = self.batch_predict(self.row_inputs, self.col_inputs)
        self.outputs = tf.diag_part(self.preds)
        self.outputs = tf.reshape(self.outputs, [-1])

        self.neg_preds = self.batch_predict(self.neg_samples, self.col_inputs)
        self.neg_outputs = tf.diag_part(self.neg_preds)
        self.neg_outputs = tf.reshape(self.neg_outputs, [-1])

        self.predict()

        self._build()
        
        #Prediction for a specific batch (decoding of embedding)

    def batch_predict(self, row_inputs, col_inputs):
        concatenated = tf.concat(self.embeddings, 0)

        ind_start = tf.gather(self.obj_type_lookup_start, self.batch_row_edge_type)
        ind_end = tf.gather(self.obj_type_lookup_end, self.batch_row_edge_type)
        indices = tf.range(ind_start, ind_end)
        row_embeds = tf.gather(concatenated, indices)
        row_embeds = tf.gather(row_embeds, row_inputs)

        ind_start = tf.gather(self.obj_type_lookup_start, self.batch_col_edge_type)
        ind_end = tf.gather(self.obj_type_lookup_end, self.batch_col_edge_type)
        indices = tf.range(ind_start, ind_end)
        col_embeds = tf.gather(concatenated, indices)
        col_embeds = tf.gather(col_embeds, col_inputs)

        latent_inter = tf.gather(self.latent_inters, self.batch_edge_type_idx)
        latent_var = tf.gather(self.latent_varies, self.batch_edge_type_idx)

        product1 = tf.matmul(row_embeds, latent_var)
        product2 = tf.matmul(product1, latent_inter)
        product3 = tf.matmul(product2, latent_var)
        preds = tf.matmul(product3, tf.transpose(col_embeds))
        return preds

    #Scores for all node pairs of the current edge type (used for evaluation)

    def predict(self):
        concatenated = tf.concat(self.embeddings, 0)

        ind_start = tf.gather(self.obj_type_lookup_start, self.batch_row_edge_type)
        ind_end = tf.gather(self.obj_type_lookup_end, self.batch_row_edge_type)
        indices = tf.range(ind_start, ind_end)
        row_embeds = tf.gather(concatenated, indices)

        ind_start = tf.gather(self.obj_type_lookup_start, self.batch_col_edge_type)
        ind_end = tf.gather(self.obj_type_lookup_end, self.batch_col_edge_type)
        indices = tf.range(ind_start, ind_end)
        col_embeds = tf.gather(concatenated, indices)

        latent_inter = tf.gather(self.latent_inters, self.batch_edge_type_idx)
        latent_var = tf.gather(self.latent_varies, self.batch_edge_type_idx)

        product1 = tf.matmul(row_embeds, latent_var)
        product2 = tf.matmul(product1, latent_inter)
        product3 = tf.matmul(product2, latent_var)
        self.predictions = tf.matmul(product3, tf.transpose(col_embeds))

    #Optimizer and loss function: hinge loss with plain gradient descent (the Adam and cross-entropy alternatives are left commented out).

    def _build(self):
        self.cost = self._hinge_loss(self.outputs, self.neg_outputs)
        # self.cost = self._xent_loss(self.outputs, self.neg_outputs)
        #self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
        self.optimizer = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
        self.opt_op = self.optimizer.minimize(self.cost)
        self.grads_vars = self.optimizer.compute_gradients(self.cost)


    def _hinge_loss(self, aff, neg_aff):
        #Maximum-margin optimization using the hinge loss.
        diff = tf.nn.relu(tf.subtract(neg_aff, tf.expand_dims(aff, 0) - self.margin), name='diff')
        loss = tf.reduce_sum(diff)
        return loss

    def _mse_loss(self, aff, neg_aff):
        #Squared one-sided difference (neg_aff - aff, clipped at zero), summed over the batch; no mean is taken.
        diff = tf.nn.relu((tf.subtract(neg_aff, tf.expand_dims(aff, 0))), name='diff')
        loss = tf.reduce_sum(tf.math.square(diff))
        return loss       

    def _xent_loss(self, aff, neg_aff):
        #Cross-entropy optimization.
        true_xent = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(aff), logits=aff)
        negative_xent = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.zeros_like(neg_aff), logits=neg_aff)
        loss = tf.reduce_sum(true_xent) + self.neg_sample_weights * tf.reduce_sum(negative_xent)
        return loss

#Gather columns of a 2D tensor. The optimizer uses it to split each batch of (row, column) index pairs into its row indices and its column indices.
def gather_cols(params, indices, name=None):
    with tf.op_scope([params, indices], name, "gather_cols") as scope:
        # Check input
        params = tf.convert_to_tensor(params, name="params")
        indices = tf.convert_to_tensor(indices, name="indices")
        try:
            params.get_shape().assert_has_rank(2)
        except ValueError:
            raise ValueError('\'params\' must be 2D.')
        try:
            indices.get_shape().assert_has_rank(1)
        except ValueError:
            raise ValueError('\'indices\' must be 1D.')

        # Define op
        p_shape = tf.shape(params)
        p_flat = tf.reshape(params, [-1])
        i_flat = tf.reshape(tf.reshape(tf.range(0, p_shape[0]) * p_shape[1],
                                       [-1, 1]) + indices, [-1])
        return tf.reshape(
            tf.gather(p_flat, i_flat), [p_shape[0], -1])
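
The `_hinge_loss` method above pairs each positive score with one negative score and penalizes the pair whenever the negative comes within `margin` of the positive, that is, it sums `max(0, neg - pos + margin)` over the batch. Below is a minimal NumPy sketch of the same arithmetic, written outside TensorFlow so it can be checked by hand. The function name `hinge_loss` and the example scores are hypothetical.

```python
import numpy as np

def hinge_loss(pos_scores, neg_scores, margin=0.1):
    """Sum of max(0, neg - pos + margin) over paired positive/negative scores."""
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)
    return np.maximum(0.0, neg_scores - pos_scores + margin).sum()

# Three (positive, negative) score pairs, invented for illustration.
pos = [2.0, 0.5, 1.0]
neg = [0.0, 0.7, 0.95]

# Pair 1 is separated by more than the margin and contributes 0.
# Pair 2 is ranked the wrong way round and contributes about 0.3.
# Pair 3 is ranked correctly but within the margin and contributes about 0.05.
loss = hinge_loss(pos, neg, margin=0.1)
```

A pair only stops contributing once the positive score exceeds the negative score by at least the margin, which is why the third pair is still penalized.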

# **Training**

In [11]:
# Train on CPU (hide GPU) due to memory constraints.

os.environ['CUDA_VISIBLE_DEVICES'] = ""

np.random.seed(0)

#Collect the adjacency-matrix placeholders of every edge type into a dict (later stored under the 'support' key).

def prepare_data(placeholders, edge_types):
    adj_mats = {et: [
            placeholders['adj_mats_%d,%d,%d' % (et[0], et[1], k)] for k in range(n)]
            for et, n in edge_types.items()}

    return adj_mats
    
#Compute AUROC, AUPRC and AP@50 for one edge type from the optimizer's predictions.

def get_accuracy_scores(edges_pos, edges_neg, edge_type):
    feed_dict.update({placeholders['dropout']: 0})
    feed_dict.update({placeholders['batch_edge_type_idx']: minibatch.edge_type2idx[edge_type]})
    feed_dict.update({placeholders['batch_row_edge_type']: edge_type[0]})
    feed_dict.update({placeholders['batch_col_edge_type']: edge_type[1]})
    rec = sess.run(opt.predictions, feed_dict=feed_dict)

    def sigmoid(x):
        return 1. / (1 + np.exp(-x))

    #Score the held-out (validation or test) edges of this type. Positive edges (present in the original graph) and negative edges (sampled non-edges) are scored in two separate passes so that each group can be labelled 1 or 0 for the metrics.
    preds = []
    actual = []
    predicted = []
    edge_ind = 0
    for u, v in edges_pos[edge_type[:2]][edge_type[2]]:
        score = sigmoid(rec[u, v])
        preds.append(score)
        assert adj_mats_orig[edge_type[:2]][edge_type[2]][u,v] == 1, 'Problem 1'

        actual.append(edge_ind)
        predicted.append((score, edge_ind))
        edge_ind += 1

    preds_neg = []
    for u, v in edges_neg[edge_type[:2]][edge_type[2]]:
        score = sigmoid(rec[u, v])
        preds_neg.append(score)
        assert adj_mats_orig[edge_type[:2]][edge_type[2]][u,v] == 0, 'Problem 0'

        predicted.append((score, edge_ind))
        edge_ind += 1
    preds_all = np.hstack([preds, preds_neg])
   
    preds_all = np.nan_to_num(preds_all)
    labels_all = np.hstack([np.ones(len(preds)), np.zeros(len(preds_neg))])
    predicted = list(zip(*sorted(predicted, reverse=True, key=itemgetter(0))))[1]
    roc_sc = metrics.roc_auc_score(labels_all, preds_all)
    aupr_sc = metrics.average_precision_score(labels_all, preds_all)
    apk_sc = apk(actual, predicted, k=50)

    return roc_sc, aupr_sc, apk_sc

#Construction and update of all placeholders.

def construct_placeholders(edge_types):
    placeholders = {
        'support': tf.sparse_placeholder(tf.float32),
        'batch': tf.placeholder(tf.int32, name='batch'),
        'batch_edge_type_idx': tf.placeholder(tf.int32, shape=(), name='batch_edge_type_idx'),
        'batch_row_edge_type': tf.placeholder(tf.int32, shape=(), name='batch_row_edge_type'),
        'batch_col_edge_type': tf.placeholder(tf.int32, shape=(), name='batch_col_edge_type'),
        'degrees': tf.placeholder(tf.int32),
        'dropout': tf.placeholder_with_default(0., shape=()),
    }
    
    placeholders.update({
        'adj_mats_%d,%d,%d' % (i, j, k): tf.sparse_placeholder(tf.float32)
        for i, j in edge_types for k in range(edge_types[i,j])})
    placeholders.update({
        'feat_%d' % i: tf.sparse_placeholder(tf.float32)
        for i, _ in edge_types})
    placeholders.update({'support': prepare_data(placeholders, edge_types)})
    return placeholders

# Create the dataset from the paper (the authors call it a dummy dataset, but say it is
# a realistic representation).

val_test_size = 0.05
n_genes = 500
n_drugs = 400
n_drugdrug_rel_types = 3
gene_net = nx.planted_partition_graph(50, 10, 0.2, 0.05, seed=42)

gene_adj = nx.adjacency_matrix(gene_net)
gene_degrees = np.array(gene_adj.sum(axis=0)).squeeze()

gene_drug_adj = sp.csr_matrix((10 * np.random.randn(n_genes, n_drugs) > 15).astype(int))
drug_gene_adj = gene_drug_adj.transpose(copy=True)

drug_drug_adj_list = []
tmp = np.dot(drug_gene_adj, gene_drug_adj)
for i in range(n_drugdrug_rel_types):
    mat = np.zeros((n_drugs, n_drugs))
    for d1, d2 in combinations(list(range(n_drugs)), 2):
        if tmp[d1, d2] == i + 4:
            mat[d1, d2] = mat[d2, d1] = 1.
    drug_drug_adj_list.append(sp.csr_matrix(mat))
drug_degrees_list = [np.array(drug_adj.sum(axis=0)).squeeze() for drug_adj in drug_drug_adj_list]


#Data representation
adj_mats_orig = {
    (0, 0): [gene_adj, gene_adj.transpose(copy=True)],
    (0, 1): [gene_drug_adj],
    (1, 0): [drug_gene_adj],
    (1, 1): drug_drug_adj_list + [x.transpose(copy=True) for x in drug_drug_adj_list],
}
degrees = {
    0: [gene_degrees, gene_degrees],
    1: drug_degrees_list + drug_degrees_list,
}

#Featureless (genes)
gene_feat = sp.identity(n_genes)
gene_nonzero_feat, gene_num_feat = gene_feat.shape
gene_feat = sparse_to_tuple(gene_feat.tocoo())

#Features (drugs): also an identity matrix, i.e. one-hot node features
drug_feat = sp.identity(n_drugs)
drug_nonzero_feat, drug_num_feat = drug_feat.shape
drug_feat = sparse_to_tuple(drug_feat.tocoo())

#Data representation
num_feat = {
    0: gene_num_feat,
    1: drug_num_feat,
}
nonzero_feat = {
    0: gene_nonzero_feat,
    1: drug_nonzero_feat,
}
feat = {
    0: gene_feat,
    1: drug_feat,
}

edge_type2dim = {k: [adj.shape for adj in adjs] for k, adjs in adj_mats_orig.items()}
edge_type2decoder = {
    (0, 0): 'bilinear',
    (0, 1): 'bilinear',
    (1, 0): 'bilinear',
    (1, 1): 'dedicom',
}

edge_types = {k: len(v) for k, v in adj_mats_orig.items()}
num_edge_types = sum(edge_types.values())
print("Edge types:", "%d" % num_edge_types)

def del_all_flags(FLAGS):
    flags_dict = FLAGS._flags()    
    keys_list = [keys for keys in flags_dict]    
    for keys in keys_list:
        FLAGS.__delattr__(keys)

del_all_flags(tf.flags.FLAGS)

tf.app.flags.DEFINE_string('f', '', 'kernel')
flags.DEFINE_integer('neg_sample_size', 1, 'Negative sample size.')
flags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')
flags.DEFINE_integer('epochs', 2, 'Number of epochs to train.')
flags.DEFINE_integer('hidden1', 64, 'Number of units in hidden layer 1.')
flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
flags.DEFINE_float('weight_decay', 0, 'Weight for L2 loss on embedding matrix.')
flags.DEFINE_float('dropout', 0.1, 'Dropout rate (1 - keep probability).')
flags.DEFINE_float('max_margin', 0.1, 'Max margin parameter in hinge loss')
flags.DEFINE_integer('batch_size', 100, 'minibatch size.')  # Changed from 512
flags.DEFINE_boolean('bias', True, 'Bias term.')

# Important -- Do not evaluate/print validation performance every iteration, as it can take a substantial amount of time!
PRINT_PROGRESS_EVERY = 150

print("Defining placeholders")
placeholders = construct_placeholders(edge_types)

print("Create minibatch iterator")
minibatch = EdgeMinibatchIterator(
    adj_mats=adj_mats_orig,
    feat=feat,
    edge_types=edge_types,
    batch_size=FLAGS.batch_size,
    val_test_size=val_test_size
)

print("Create model")
model = DecagonModel(
    placeholders=placeholders,
    num_feat=num_feat,
    nonzero_feat=nonzero_feat,
    edge_types=edge_types,
    decoders=edge_type2decoder
)

print("Create optimizer")
with tf.name_scope('optimizer'):
    opt = DecagonOptimizer(
        embeddings=model.embeddings,
        latent_inters=model.latent_inters,
        latent_varies=model.latent_varies,
        degrees=degrees,
        edge_types=edge_types,
        edge_type2dim=edge_type2dim,
        placeholders=placeholders,
        batch_size=FLAGS.batch_size,
        margin=FLAGS.max_margin
    )

print("Initialize session")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
feed_dict = {}

print("Train model")
for epoch in range(FLAGS.epochs):

    minibatch.shuffle()
    itr = 0
    while not minibatch.end():
        # Construct feed dictionary
        feed_dict = minibatch.next_minibatch_feed_dict(placeholders=placeholders)
        feed_dict = minibatch.update_feed_dict(
            feed_dict=feed_dict,
            dropout=FLAGS.dropout,
            placeholders=placeholders)

        t = time.time()

        # Training step: run single weight update
        outs = sess.run([opt.opt_op, opt.cost, opt.batch_edge_type_idx], feed_dict=feed_dict)
        train_cost = outs[1]
        batch_edge_type = outs[2]

        if itr % PRINT_PROGRESS_EVERY == 0:
            val_auc, val_auprc, val_apk = get_accuracy_scores(
                minibatch.val_edges, minibatch.val_edges_false,
                minibatch.idx2edge_type[minibatch.current_edge_type_idx])

            print("Epoch:", "%04d" % (epoch + 1), "Iter:", "%04d" % (itr + 1), "Edge:", "%04d" % batch_edge_type,
                  "train_loss=", "{:.5f}".format(train_cost),
                  "val_roc=", "{:.5f}".format(val_auc), "val_auprc=", "{:.5f}".format(val_auprc),
                  "val_apk=", "{:.5f}".format(val_apk), "time=", "{:.5f}".format(time.time() - t))

        itr += 1

print("Optimization finished!")

for et in range(num_edge_types):
    roc_score, auprc_score, apk_score = get_accuracy_scores(
        minibatch.test_edges, minibatch.test_edges_false, minibatch.idx2edge_type[et])
    print("Edge type=", "[%02d, %02d, %02d]" % minibatch.idx2edge_type[et])
    print("Edge type:", "%04d" % et, "Test AUROC score", "{:.5f}".format(roc_score))
    print("Edge type:", "%04d" % et, "Test AUPRC score", "{:.5f}".format(auprc_score))
    print("Edge type:", "%04d" % et, "Test AP@k score", "{:.5f}".format(apk_score))
    print()

Edge types: 10
Defining placeholders
Create minibatch iterator
Minibatch edge type: (0, 0, 0)
Constructing test edges= 0000/0663
Constructing test edges= 0000/0663
Constructing val edges= 0000/0663
Train edges= 11952
Val edges= 0663
Test edges= 0663
Minibatch edge type: (0, 0, 1)
Constructing test edges= 0000/0663
Constructing val edges= 0000/0663
Train edges= 11952
Val edges= 0663
Test edges= 0663
Minibatch edge type: (0, 1, 0)
Constructing test edges= 0000/0664
Constructing val edges= 0000/0664
Train edges= 11958
Val edges= 0664
Test edges= 0664
Minibatch edge type: (1, 0, 0)
Constructing test edges= 0000/0664
Constructing val edges= 0000/0664
Train edges= 11958
Val edges= 0664
Test edges= 0664
Minibatch edge type: (1, 1, 0)
Constructing test edges= 0000/0868
Constructing val edges= 0000/0868
Train edges= 15642
Val edges= 0868
Test edges= 0868
Minibatch edge type: (1, 1, 1)
Constructing test edges= 0000/0378
Constructing val edges= 0000/0378
Train edges= 6810
Val edges= 0378
Test edg

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Initialize session
Train model
Epoch: 0001 Iter: 0001 Edge: 0000 train_loss= 10.40034 val_roc= 0.49862 val_auprc= 0.49663 val_apk= 0.20263 time= 2.49267
Epoch: 0001 Iter: 0151 Edge: 0003 train_loss= 10.49539 val_roc= 0.47153 val_auprc= 0.47818 val_apk= 0.24230 time= 0.08874
Epoch: 0001 Iter: 0301 Edge: 0000 train_loss= 10.18957 val_roc= 0.49953 val_auprc= 0.48788 val_apk= 0.18642 time= 0.09074
Epoch: 0001 Iter: 0451 Edge: 0003 train_loss= 9.60027 val_roc= 0.47024 val_auprc= 0.47361 val_apk= 0.17834 time= 0.09631
Epoch: 0001 Iter: 0601 Edge: 0000 train_loss= 10.16814 val_roc= 0.49440 val_auprc= 0.48911 val_apk= 0.20636 time= 0.08721
Epoch: 0001 Iter: 0751 Edge: 0003 train_loss= 10.15314 val_roc= 0.46816 val_auprc= 0.47099 val_apk= 0.19757 time= 0.08959
Epoch: 0001 Iter: 0901 Edge: 0000 train_loss= 10.21339 val_roc= 0.49399 val_auprc= 0.48645 val_apk= 0.18608 time= 0.09092
Epoch: 0001 Iter: 1051 Edge: 0003 train_loss= 9.73838 val_roc= 0.46743 val_auprc= 0.46847 val_apk= 0.14683 time= 0.0