# node2vec
1. node2vec: 
*   https://github.com/aditya-grover/node2vec
*   https://github.com/eliorc/node2vec

2. node2vec uses gensim word2vec: https://radimrehurek.com/gensim/models/word2vec.html 
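Both packages generate second-order biased random walks, controlled by a return parameter p and an in-out parameter q. The walks are then fed to Word2Vec as "sentences" of node ids. The sketch below shows one such walk in plain Python. It assumes an unweighted, undirected graph stored as a dict of neighbor sets. The helper name `node2vec_walk` is hypothetical and is not part of either package.

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
  """One second-order random walk in the style of node2vec (a sketch).

  graph: dict mapping each node to a set of neighbors (unweighted, undirected).
  Having arrived at `cur` from `prev`, the unnormalized weight for moving to
  a neighbor x is 1/p if x == prev (return), 1 if x is also a neighbor of
  prev (stay close), and 1/q otherwise (move outward).
  """
  walk = [start]
  while len(walk) < length:
    cur = walk[-1]
    neighbors = sorted(graph[cur])
    if not neighbors:
      break  # isolated node: the walk stops early
    if len(walk) == 1:
      # the first step has no previous node, so it is uniform
      walk.append(rng.choice(neighbors))
      continue
    prev = walk[-2]
    weights = []
    for x in neighbors:
      if x == prev:
        weights.append(1.0 / p)
      elif x in graph[prev]:
        weights.append(1.0)
      else:
        weights.append(1.0 / q)
    walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
  return walk
```

With p = q = 1 every neighbor gets the same weight, so the walk becomes a uniform (DeepWalk-style) random walk. A small p makes the walk tend to step back, while a small q pushes it outward.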


In [None]:
pip install node2vec

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting node2vec
  Downloading node2vec-0.4.3.tar.gz (4.6 kB)
Building wheels for collected packages: node2vec
  Building wheel for node2vec (setup.py) ... done
  Created wheel for node2vec: filename=node2vec-0.4.3-py3-none-any.whl size=5980 sha256=048ab72029de252605a2afad267df90b47f556dda8cf02dfb6d3f80d91324f95
  Stored in directory: /root/.cache/pip/wheels/07/62/78/5202cb8c03cbf1593b48a8a442fca8ceec2a8c80e22318bae9
Successfully built node2vec
Installing collected packages: node2vec
Successfully installed node2vec-0.4.3


# Package Section

In [None]:
import sys
import numpy as np
import copy
from numpy import linalg as LA
from tensorflow import keras
from tensorflow.keras.utils import to_categorical
from sklearn.metrics.cluster import adjusted_rand_score
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn import metrics
import time
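# node2vec (installed above), used by node2vec_embed()
from node2vec import Node2Vec
import networkx as nx
# for sparse matrix
from scipy import sparse
# early stopping and checkpoints; import from tensorflow.keras to match the tensorflow.keras model
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint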

# Classes and functions

In [None]:
# Suppress divide-by-zero and invalid-value warnings;
# invalid divisions (e.g. 0/0) produce nan instead
np.seterr(divide='ignore', invalid='ignore')
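# With the settings above, dividing an all-zero row by its zero norm yields nan
# without a warning; graph_encoder_embed then maps such entries back to 0 with
# np.nan_to_num. A minimal standalone sketch of that row normalization
# (row_normalize_sketch is a hypothetical helper, not used by the pipeline):
import numpy as np

def row_normalize_sketch(Z):
  # divide each row by its 2-norm; an all-zero row (a node with no labeled
  # neighbor) first becomes nan and is then reset to 0
  with np.errstate(divide='ignore', invalid='ignore'):
    Z_normed = Z / np.linalg.norm(Z, axis=1, keepdims=True)
  return np.nan_to_num(Z_normed)

# e.g. row_normalize_sketch(np.array([[3.0, 4.0], [0.0, 0.0]])) gives [[0.6, 0.8], [0.0, 0.0]]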

############------------Auto_select_method_start-----------------###############
def Run(case, learn_opt, **kwargs):
  """
    input X can be a list of one of these format below:
    1. python list of n*n adjacency matrices.
    2. python list of s*2 edge lists. 
    3. python list of s*3 edge lists. 
    input Y can be these choices below:
    1. no Y input. The default will be [2,3,4,5] -- K range for clusters.
    2. n*1 class -- label vector. Positive labels are knwon labels and -1 indicate unknown labels.
    3. A range of potential number of clusters -- K (K clusters in total), i.e., [3, 4, 5].
    
    if input X is n*n adjacency =>  s*3 edg list
    if input X is s*2 => s*3 edg list
    
    Vertex size should be >10.

    Clustering / Classification
    The program automaticlly decide to run clustering or classification.
    1. If Y is a given cluster range, do clustering (case 1,3 for Y).
    2. If Y is a label vector (case 2 for Y), do classification.
    For classification: semi-supervised learning, supervised learning methods. 
                        see the "Learner" defined below. 

    
    Supervised learning "Learner":
      **Note the input trining set (X) need has fully known labels in Y.
      Learner = 1 run LDA, test on test set
      Learner = 0 run NN, test on test set

    Semi-supervised learning "Learner": 
       **Note the input trining set (X) need some unknown label(s) in Y.     
      Learner=0 means embedding via known label, do not learn the unknown labels. 
           Since only some nodes in the training set has known label, 
           the test set is the unknwon labeled set, which is compared with 
           the original labels of the unknown set
      Learner=1 means embedding via partial label, then learn unknown label via LDA.
        this runs semi-supervised learning with NN, 
        the test will be on the result labels with the original labels

      Learner=2 means embedding via partial label, then learn unknown label via two-layer NN.
        this runs semi-supervised learning with NN, 
        the test will be on the result labels with the original labels
    

  """
  defaultKwargs = {'Y':[2,3,4,5], 'DiagA': True,'Correlation': True,'Laplacian': False,
                  'Learner': 1, 'LearnerIter': 0, 'MaxIter': 50, 'MaxIterK': 5,
                  'Replicates': 3, 'Attributes': False, 'neuron': 20, 'activation': 'relu',
                   'emb_opt': 'AEE', 'sparse_opt': 'None', 'Batch_input': False} 
  kwargs = { **defaultKwargs, **kwargs }
  train_time = "no seperate training time for semisuperviised learning yet"
  total_begin = time.time()
  eval = Evaluation()
  kwargs_for_DataPreprocess =  {k: kwargs[k] for k in ['DiagA', 'Laplacian', 'Correlation', 'Attributes', 'emb_opt']}
  Dataset = DataPreprocess(case, **kwargs_for_DataPreprocess)
  
  Y = case.Y
  n = case.n

  # auto check block
  # if the option is not clustering, but the Y does not contain labels (known/unknwon) for n nodes. 
  if (learn_opt != "c") and (len(Y) != n):
    learn_opt = "c" # do clustering
    print("The given Y do not have the same size as the node.Y is assumed as cluster number range.",
    "Clustering will be performed.",
    "If you want to do classification, stop the current run, reimport the Y with the right format then run again.",
    sep = "\n")
 
  # clustering
  if learn_opt == 'c':  
    cluster = Clustering(Dataset)
    Z, Y, W, meanSS = cluster.cluster_main()
    ari = eval.clustering_test(Y, case.Y_ori)
    # clustering reports ARI; store it in acc so the return statement works
    acc = ari
    print("ARI: ", ari)

  # supervised learning
  if learn_opt == "su":
    Dataset = Dataset.supervise_preprocess()
    kwargs_for_learner = {k: kwargs[k] for k in ['Learner', 'LearnerIter', 'Batch_input']}
    train_start = time.time()
    if kwargs['Learner'] == 1:
      lda = LDA(Dataset, **kwargs_for_learner)
      lda_res = lda.LDA_Learner(lda.DataSets)
      acc = eval.LDA_supervise_test(lda_res, Dataset.z_test, Dataset.Y_test)
    if kwargs['Learner'] == 0:
      gnn = GNN(Dataset, **kwargs_for_learner)
      gnn_res = gnn.GNN_complete()
      acc = eval.GNN_supervise_test(gnn_res, Dataset.z_test, Dataset.Y_test)
    train_end = time.time()
    train_time = train_end - train_start
    print("acc: ", acc)
  
  # semisupervised learning
  if learn_opt == "se":
    Dataset = Dataset.semi_supervise_preprocess()
    kwargs_for_learner = {k: kwargs[k] for k in ['Learner', 'LearnerIter', 'Batch_input']}
    if kwargs['Learner'] == 2:
      gnn = GNN(Dataset, **kwargs_for_learner)
      gnn_res = gnn.GNN_complete()
      acc = eval.GNN_semi_supervised_learn_test(gnn_res.Y, case.Y_ori)      
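    if kwargs['Learner'] == 1:
      lda = LDA(Dataset, **kwargs_for_learner)
      # LDA_Iter only works with LearnerIter >= 1; otherwise fit once on the known labels
      if kwargs['LearnerIter'] < 1:
        lda_res = lda.LDA_Learner(lda.DataSets)
      else:
        lda_res = lda.LDA_Iter()
      acc = eval.GNN_semi_supervised_learn_test(lda_res.Y, case.Y_ori)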
    if kwargs['Learner'] == 0:
      gnn = GNN(Dataset, **kwargs_for_learner)
      gnn_res = gnn.GNN_complete()      
      acc = eval.GNN_semi_supervised_not_learn_test(gnn_res, Dataset, case)
    print("acc: ", acc)
  
  total_end = time.time()
  emb_time = Dataset.embed_time
  total_time = total_end - total_begin
  print("--- embed %s seconds ---" % emb_time)
  print("--- train %s seconds ---" % train_time)
  print("--- total %s seconds ---" % total_time)

  Z_ori = Dataset.Z
  W_ori = Dataset.W

  sparse_opt = kwargs['sparse_opt']
  Z = To_single_sparse_matrix(Dataset.Z, sparse_opt)
  W = To_multi_sparse_matrix(Dataset.W, sparse_opt)

  return acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori
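
# Run() only reads attributes from `case`: X, Y, n and Y_ori (plus test_idx and
# Y_test for supervised learning). A minimal sketch of preparing a
# semi-supervised case, assuming a SimpleNamespace is an acceptable stand-in for
# the notebook's own case object and that Y_ori is the n*1 array of true labels
# (make_semi_case is a hypothetical helper). Unknown labels are marked with -1,
# and Y keeps the n*1 column shape that split_data indexes as Y[idx, 0].
import numpy as np
from types import SimpleNamespace

def make_semi_case(A, labels, unknown_frac=0.5, seed=0):
  rng = np.random.default_rng(seed)
  n = A.shape[0]
  Y_ori = np.asarray(labels, dtype=int).reshape(n, 1)
  Y = Y_ori.copy()
  # hide a random subset of the labels
  hidden = rng.choice(n, size=int(round(unknown_frac * n)), replace=False)
  Y[hidden, 0] = -1
  return SimpleNamespace(X=A, Y=Y, n=n, Y_ori=Y_ori)

# possible usage (not executed here):
#   case = make_semi_case(A, labels, unknown_frac=0.3)
#   acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case, "se", Learner=2)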
 
############------------node2vec_embed_start--------------------################
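def node2vec_embed(X):
  # X: an n*n adjacency matrix, or a list of them (input_prep keeps a list for Node2Vec)
  if isinstance(X, np.ndarray):
    X = [X]
  Z_list = []
  for A in X:
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array also works in 2.x
    G = nx.from_numpy_array(A)
    # use default setting from https://github.com/eliorc/node2vec
    node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, workers=4)
    # Embed nodes, use default setting from https://github.com/eliorc/node2vec
    model = node2vec.fit(window=2, min_count=1, batch_words=4)
    # rows of model.wv.vectors follow the Word2Vec vocabulary order, not node order,
    # so look up each node by its string id to make row i belong to node i
    Z_list.append(np.array([model.wv[str(i)] for i in range(A.shape[0])]))
  # for multiple graphs, concatenate the per-graph embeddings side by side
  Z = np.concatenate(Z_list, axis=1)
  return Z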

############------------node2vec_embed_end----------------------################
############------------graph_encoder_embed_start----------------###############
def graph_encoder_embed(X,Y,n,**kwargs):
  """
    input X is s*3 edg list: nodei, nodej, connection weight(i,j)
    graph embedding function
  """
  defaultKwargs = {'Correlation': True}
  kwargs = { **defaultKwargs, **kwargs}

  # If Y has more than one column, it is the probability version:
  # Y is n*k and each row lists the probability of each class, e.g. [0.9, 0.1, 0, ...]
  possibility_detected = False
  if Y.shape[1] > 1:
    k = Y.shape[1]
    possibility_detected = True
  else:
    # labels start from 0 (-1 marks unknown labels), so the number of classes k is max + 1
    k = int(Y[:,0].max()) + 1

  # nk: 1*k array, the number of labeled nodes in each class
  # W: n*k encoder matrix. W[i,c] = 1/nk[c] if Y_i == c, otherwise 0
  nk = np.zeros((1,k))
  W = np.zeros((n,k))

  if possibility_detected:
    # nk is the column sum of Y (each row of Y holds class probabilities); divide each column of Y by it
    nk=np.sum(Y, axis=0)
    W=Y/nk
  else:
    for i in range(k):
      nk[0,i] = np.count_nonzero(Y[:,0]==i)

    for i in range(Y.shape[0]):
      k_i = Y[i,0]
      if k_i >=0:
        W[i,k_i] = 1/nk[0,k_i]
  

  # Edge List Version in O(s)
  Z = np.zeros((n,k))
  for row in X:
    [v_i, v_j, edg_i_j] = row
    v_i = int(v_i)
    v_j = int(v_j)
    if possibility_detected:
      for label_j in range(k):
        Z[v_i, label_j] = Z[v_i, label_j] + W[v_j, label_j]*edg_i_j
        if v_i != v_j:
          Z[v_j, label_j] = Z[v_j, label_j] + W[v_i, label_j]*edg_i_j
    else:
      label_i = Y[v_i][0] 
      label_j = Y[v_j][0]

      if label_j >= 0:
        Z[v_i, label_j] = Z[v_i, label_j] + W[v_j, label_j]*edg_i_j
      if (label_i >= 0) and (v_i != v_j):
        Z[v_j, label_i] = Z[v_j, label_i] + W[v_i, label_i]*edg_i_j
  
  # Correlation: divide each row by its 2-norm (Euclidean norm),
  # e.g. row [z_i, z_j, z_k] has norm2 = sqrt(z_i^2 + z_j^2 + z_k^2)
  # and becomes [z_i/norm2, z_j/norm2, z_k/norm2].
  # All-zero rows give 0/0 = nan, which np.nan_to_num resets to 0.
  if kwargs['Correlation']:
    row_norm = LA.norm(Z, axis = 1)
    reshape_row_norm = np.reshape(row_norm, (n,1))
    Z = np.nan_to_num(Z/reshape_row_norm)
  
  return Z, W
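
# For hard labels, the edge-list loop above computes Z = A @ W, where
# W[i, c] = 1/n_c if node i has label c (n_c = number of nodes labeled c) and 0
# for unlabeled nodes (label -1). Each undirected edge (i, j) is stored once and
# credited to both endpoints. The probability branch is the same product with
# W = Y / Y.sum(axis=0). A dense cross-check sketch, assuming a small
# symmetric adjacency matrix A (dense_encoder_embed is hypothetical and skips
# the Correlation normalization):
import numpy as np

def dense_encoder_embed(A, y, k):
  y = np.asarray(y)
  n = len(y)
  W = np.zeros((n, k))
  for c in range(k):
    members = (y == c)
    if members.any():
      # each labeled member of class c gets weight 1/n_c in column c
      W[members, c] = 1.0 / members.sum()
  Z = A @ W
  return Z, W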


def multi_graph_encoder_embed(DataSets, Y, **kwargs):
  """
    input X contains a list of s3 edge list
    get Z and W by using graph emcode embedding
    Z is the concatenated embedding matrix from multiple graphs
    if there are attirbutes provided, add attributes to Z
    W is a list of weight matrix Wi
  """
  kwargs_single = {**kwargs}

  X = DataSets.X
  n = DataSets.n
  U = DataSets.U
  Graph_count = DataSets.Graph_count
  attributes = DataSets.attributes
  kwargs = DataSets.kwargs

  W = []

  for i in range(Graph_count):
    if i == 0:
      [Z, Wi] = graph_encoder_embed(X[i],Y,n,**kwargs_single)
    else:
      [Z_new, Wi] = graph_encoder_embed(X[i],Y,n,**kwargs)
      Z = np.concatenate((Z, Z_new), axis=1)
    W.append(Wi)

  # if there is attributes matrix U provided, add U
  if attributes:
    # add U to Z side by side
    Z = np.concatenate((Z, U), axis=1)

  return Z, W

############------------graph_encoder_embed_end------------------###############

############------------DataPreprocess_start---------------------###############
class DataPreprocess:
  def __init__(self, Dataset_input, **kwargs):
    self.kwargs = self.kwargs_construct(**kwargs)
    # Note, since every element in multi-graph list X has the same size and 
    # node index, there will be only one column in Y for the node labels
    self.Y = Dataset_input.Y  
    self.n = Dataset_input.n
    (self.X, self.Graph_count, self.embed_time) = self.input_prep(Dataset_input.X)    
    (self.attributes, self.U) = self.check_attributes()
    self.Dataset_input = Dataset_input


  def kwargs_construct(self, **kwargs):
    defaultKwargs = {'DiagA': True,'Laplacian': False,  # input_prep
                     'Correlation': True,      # graph_encoder_embed
                     'Attributes': False,      # check_attributes
                     'emb_opt': 'AEE',         # input_prep: "AEE" or "Node2Vec"
                     }
    kwargs = { **defaultKwargs, **kwargs}  # update the args using input_args
    return kwargs      


  def check_attributes(self):
    """
      return attributes detected flag and attributes U
    """
    kwargs = self.kwargs
    
    Attributes_detected = False
    U = None

    if kwargs["Attributes"]:
      U = kwargs["Attributes"]
      if U.shape[0] == n:
        Attributes_detected = True
      else:
        print("Attributes need to have the same size as the nodes.\
        If n nodes, need n rows")    
    return Attributes_detected, U

  
  def test_edg_list_to_adj(self, n_test, n, edg_list):
    adj = np.zeros((n_test,n))

    for row in edg_list:
      [node_i, node_j, edge_i_j] = row
      adj[int(node_i), int(node_j)] = edge_i_j
    
    return adj


  def input_prep(self, X):
    kwargs = self.kwargs
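    # if X is a single numpy object, put it in a list; otherwise copy the list
    # so that replacing X[i] later does not modify the caller's list (e.g. case.X)
    if type(X) == np.ndarray:
      X = [X]
    else:
      X = list(X)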
    
    ## Now X is a list of numpy objects
    # each element can be a numpy object for adjacency matrix or edge list
    Graph_count = len(X)  

    # AEE needs X to be a list of edge list
    if kwargs["emb_opt"] == "AEE":
      X, embed_time = self.input_prep_AEE(X, Graph_count)
    # Node2Vec only needs a list of adjacency matrix 
    if kwargs["emb_opt"] == "Node2Vec":
      embed_time = 0
      pass

    return X, Graph_count, embed_time

  
  def input_prep_AEE(self, X, Graph_count):
    """
      X may be a single numpy object or a list of numpy objects
      The multi-graph input X is assumed has the same node numbers 
      for each element in X, and the node are indexed the same way 
      amonge the elements. e.g. node_0 in X[1] is the same node_0 in X[2]. 
      return X as a list of s*3 edge lists
      return n, which is the total number of nodes
    """

    # need total labeled number n 
    # if try to get from the edg list, it may miss the node that has no connection with others but has label 
    n = self.n

    embed_time = 0 
    for i in range(Graph_count):
      X_tmp = X[i]
      X_tmp = self.to_s3_list(X_tmp) 
     
      # time only the Laplacian and diagonal augmentation steps
      embed_begin = time.time() 
      X_tmp = self.single_X_prep(X_tmp, n)
      embed_end = time.time()
      embed_time += (embed_end - embed_begin)
      
      X[i] = X_tmp
    
    return X, embed_time


  def to_s3_list(self,X):
    """
      the input X is a signle graph, can be adjacency matrix or edgelist
      this function will return a s3 edge list
    """
    (s,t) = X.shape

    if s == t:
      # convert adjacency matrix to edgelist
      X = self.adj_to_edg(X);
    else:
      # for either s*2 or s*3 case, calculate n -- vertex number
      if t == 2: 
        # enlarge the edgelist to s*3 by adding 1 to the thrid position as adj(i,j)
        X = np.insert(X, 1, np.ones(s,1))

    return X


  def single_X_prep(self, X, n):
    """
      input X is a single S3 edge list
      this adds Diagnal augement and Laplacian normalization to the edge list
    """
    kwargs = self.kwargs

    X = X.astype(np.float32)

    # Diagnal augment
    if kwargs['DiagA']:
      # add self-loop to edg list -- add 1 connection for each (i,i)
      self_loops = np.column_stack((np.arange(n), np.arange(n), np.ones(n)))
      # faster than vstack --  adding the second to the bottom
      X = np.concatenate((X,self_loops), axis = 0)
    
    # Laplacian 
    s = X.shape[0] # get the row number of the edg list
    if kwargs["Laplacian"]:
      D = np.zeros((n,1))
      for row in X:
        [v_i, v_j, edg_i_j] = row
        v_i = int(v_i)
        v_j = int(v_j)
        D[v_i] = D[v_i] + edg_i_j
        if v_i != v_j:
          D[v_j] = D[v_j] + edg_i_j

      D = np.power(D, -0.5)
      
      for i in range(s):
        X[i,2] = X[i,2] * D[int(X[i,0])] * D[int(X[i,1])]
    
    return X

 
  def adj_to_edg(self,A):
    """
      input is the symmetric adjacency matrix: A
      other variables in this function:
      s: number of edges
      return edg_list -- matrix format with shape(edg_sum,3):
      example row in edg_list(matrix): [vertex1, vertex2, connection weight from Adj matrix]
    """
    # check the len of the second dimenson of A
    if A.shape[1] <= 3:
      edg = A
    else:
      n = A.shape[0]
      # construct the initial edgg_list matrix with the size of (edg_sum, 3)
      edg_list = []
      for i in range(n):
        for j in range(i, n):
          if A[i,j] > 0:
            row = [i, j, A[i,j]]
            edg_list.append(row)
      edg = np.array(edg_list)
    return edg

 
  def semi_supervise_preprocess(self):
    """
      get Z, W using multi_graph_encoder_embed()
      get training sets and testing sets for Z and Y by using split_data()
      
    """
    DataSets =  copy.deepcopy(self)
    Y = DataSets.Y
    kwargs = DataSets.kwargs
    Encoder_kwargs = {k: kwargs[k] for k in ['Correlation']}
    # semisupervise do embedding during the learning process
    # this timer is only for the first embedding for the normalized input X
    embed_time_main_begin = time.time()
    if kwargs["emb_opt"] == "AEE":
      (DataSets.Z, DataSets.W) = multi_graph_encoder_embed(DataSets, Y, **Encoder_kwargs)
    if kwargs["emb_opt"] == "Node2Vec":
      DataSets.Z = node2vec_embed(DataSets.X)  
    
    embed_time_main_end = time.time()
    embed_time_main = embed_time_main_end - embed_time_main_begin
    
    DataSets.k = DataSets.get_k() 
    DataSets = DataSets.split_data()
    DataSets.embed_time = DataSets.embed_time + embed_time_main
  
    return DataSets


  def get_k(self):
    Y = self.Y
    n = self.n
    # class number k: labels start from 0, so k is the max label + 1
    if len(Y) == n:
      k = np.amax(Y) + 1
    else:
      raise ValueError("Y must contain one label per node to compute the class number k")
    return k


  def split_data(self):
    split_Sets =  copy.deepcopy(self)
   
    Y = split_Sets.Y
    Z = split_Sets.Z
    
    ind_train = np.argwhere (Y >= 0)[:,0]
    ind_unlabel = np.argwhere (Y < 0)[:,0]   

    Y_train = Y[ind_train, 0]
    z_train = Z[ind_train]

    Y_unlabel = None
    z_unlabel = None

    if len(ind_unlabel) > 0:
      Y_unlabel = Y[ind_unlabel, 0]
      z_unlabel = Z[ind_unlabel] 

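    # Convert targets into one-hot encoded format; num_classes=k keeps k columns
    # even when the largest class has no labeled node
    Y_train_one_hot = to_categorical(Y_train, num_classes=split_Sets.k)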

    split_Sets.ind_unlabel = ind_unlabel
    split_Sets.ind_train = ind_train    
    split_Sets.Y_train = Y_train
    split_Sets.Y_unlabel = Y_unlabel
    split_Sets.z_train = z_train
    split_Sets.z_unlabel = z_unlabel
    split_Sets.Y_train_one_hot = Y_train_one_hot

    return split_Sets


  def DataSets_reset(self, option):
    """
      based on the information of the given new Y:
      1. reassign Z and W to the given DataSets, 
      2. update z_train, z_unlabel 
      Input Option:
      1. if the option is "y_temp", do graph encoder using y_temp
    """
    NewSets =  copy.deepcopy(self)
    kwargs = NewSets.kwargs
    ind_unlabel = NewSets.ind_unlabel
    ind_train = NewSets.ind_train   
    y_temp =  NewSets.y_temp
    Encoder_kwargs = {k: kwargs[k] for k in ['Correlation']}

    # different versions
    if option == "y_temp":
      [Z,W] = multi_graph_encoder_embed(NewSets, y_temp, **Encoder_kwargs)
    if option == "y_temp_one_hot":
      y_temp_one_hot = NewSets.y_temp_one_hot
      [Z,W] = multi_graph_encoder_embed(NewSets, y_temp_one_hot, **Encoder_kwargs)  
    # the attributes U (if any) are already appended to Z inside multi_graph_encoder_embed
    
    NewSets.Z = Z
    NewSets.W = W
    NewSets.z_train = Z[ind_train]
    NewSets.z_unlabel = Z[ind_unlabel]
    
    return NewSets

  
  def supervise_preprocess(self):
    """
      adding test sets for supervised learning    
      this function assumes only one test set
      if there is a list of test set, needs to modify this function
    """

    DataSets = self.semi_supervise_preprocess()
    Dataset_input = DataSets.Dataset_input

    DataSets.z_test = DataSets.Z[Dataset_input.test_idx]
    DataSets.Y_test = Dataset_input.Y_test.ravel()
    DataSets.z_unlabel = None
    DataSets.Y_unlabel = None

    return DataSets
############------------DataPreprocess_end-----------------------###############
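
# adj_to_edg and the Laplacian step of single_X_prep loop in Python. A
# vectorized sketch of both, assuming a symmetric adjacency matrix A and an
# s*3 edge list whose rows are [i, j, weight], with each undirected edge
# stored once (adj_to_edg_sketch and laplacian_normalize_sketch are
# hypothetical helpers, not used by the pipeline):
import numpy as np

def adj_to_edg_sketch(A):
  # upper triangle including the diagonal; np.nonzero returns the entries in
  # the same row-major order as the double loop in adj_to_edg
  i, j = np.nonzero(np.triu(A) > 0)
  return np.column_stack((i, j, A[i, j])).astype(float)

def laplacian_normalize_sketch(X, n):
  v_i = X[:, 0].astype(int)
  v_j = X[:, 1].astype(int)
  w = X[:, 2]
  # weighted degree: each edge counts at both endpoints, a self-loop only once
  D = np.zeros(n)
  np.add.at(D, v_i, w)
  off_diag = v_i != v_j
  np.add.at(D, v_j[off_diag], w[off_diag])
  with np.errstate(divide='ignore'):
    d_inv_sqrt = D ** -0.5
  # weight(i,j) * D_i^(-1/2) * D_j^(-1/2), i.e. entries of D^(-1/2) A D^(-1/2)
  X_norm = X.copy()
  X_norm[:, 2] = w * d_inv_sqrt[v_i] * d_inv_sqrt[v_j]
  return X_norm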

############-----------------GNN_start---------------------------###############
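def batch_generator(X, y, k, batch_size, shuffle):
    # endless generator of (X_batch, y_batch) for model.fit; k is unused.
    # ceil keeps the last partial batch and guarantees at least one batch
    # (with int(), fewer samples than batch_size would give 0 batches and then empty batches forever)
    number_of_batches = max(1, int(np.ceil(X.shape[0]/batch_size)))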
    counter = 0
    sample_index = np.arange(X.shape[0])
    if shuffle:
        np.random.shuffle(sample_index)
    while True:
        batch_index = sample_index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X[batch_index,:]
        y_batch = y[batch_index,:]
        counter += 1
        yield X_batch, y_batch
        if (counter == number_of_batches):
            if shuffle:
                np.random.shuffle(sample_index)
            counter = 0

class Hyperperameters:
  """
    define perameters for GNN.
    default values are for GNN learning -- "Leaner" ==2:
      embedding via partial label, then learn unknown label via two-layer NN

  """
  def __init__(self):
    # there is no scaled conjugate gradiant in keras optimiser, use defualt instead
    # use whatever default
    self.learning_rate = 0.01  # Initial learning rate.
    self.epochs = 100 #Number of epochs to train.
    self.hidden = 20 #Number of units in hidden layer 
    self.val_split = 0.1 #Split 10% of training data for validation
    self.loss = 'categorical_crossentropy' # loss function


class GNN:
  def __init__(self, DataSets, **kwargs):
    GNN.kwargs = self.kwargs_construct(**kwargs)
    GNN.DataSets = DataSets
    GNN.hyperM = Hyperperameters()
    GNN.model = self.GNN_model()  #model summary: GNN.model.summary()
    GNN.meanSS = 0  # initialize the self-defined criterion meanSS
    
  def kwargs_construct(self, **kwargs):
    defaultKwargs = {'Learner': 1,                    # GNN_Direct
                     'LearnerIter': 0,                # GNN_complete, GNN_Iter
                     "Replicates": 3,                 # GNN_Iter
                     "Batch_input": False             # if run in batches
                     }
    kwargs = { **defaultKwargs, **kwargs}  # update the args using input_args
    return kwargs      
 
  
  def GNN_model(self):
    """
      build GNN model
    """
    hyperM = self.hyperM
    DataSets = self.DataSets

    z_train = DataSets.z_train
    k = DataSets.k

    feature_num = z_train.shape[1]
    
    model = keras.Sequential([
    keras.layers.Flatten(input_shape = (feature_num,)),  # input layer 
    keras.layers.Dense(hyperM.hidden, activation='relu'),  # hidden layer -- MATLAB's patternnet uses tansig (equivalent to tanh); relu is used here instead
    keras.layers.Dense(k, activation='softmax') # output layer: one softmax unit per class, as in MATLAB's patternnet (open question from the port: max(opts.neuron, K)?)
    ])

    optimizer = keras.optimizers.Adam(learning_rate = hyperM.learning_rate)

    # pass the optimizer object so hyperM.learning_rate is actually used
    # (optimizer='adam' would fall back to Keras' default learning rate)
    model.compile(optimizer=optimizer,
                  loss=hyperM.loss,
                  metrics=['accuracy'])

    return model
  

  def GNN_run(self, k, z_train, y_train_one_hot, z_unlabel):
    """
      Train and test directly.
      Do not learn from the unknown labels.
    """
    gnn = copy.deepcopy(self)
    hyperM = gnn.hyperM
    model = gnn.model    
    batch_flag = self.kwargs["Batch_input"]
    
    if batch_flag:
      early_stopping_callback = EarlyStopping(monitor='loss', patience=5, verbose=0)
      checkpoint_callback = ModelCheckpoint('GNN.h5', monitor='loss', save_best_only=True, mode='min', verbose=0)
      
      batch_size = 32
      history = model.fit(batch_generator(z_train, y_train_one_hot, k, batch_size, True),
                      epochs=hyperM.epochs,
                      # one pass over the training samples per epoch
                      steps_per_epoch=max(1, int(np.ceil(z_train.shape[0]/batch_size))),
                      callbacks=[early_stopping_callback, checkpoint_callback],
                      verbose=0)
    else:
      history = model.fit(z_train, y_train_one_hot,
            validation_split=hyperM.val_split,
            epochs=hyperM.epochs,
            shuffle=True,
            verbose=0)
    
    train_acc = history.history['accuracy'][-1]
    
    predict_probs = None
    pred_class = None
    pred_class_prob = None
    if type(z_unlabel) == np.ndarray:
      # predict_probas include probabilities for all classes for each node
      predict_probs = model.predict(z_unlabel)
      # assign the classes with the highest probability
      pred_class = np.argmax(predict_probs, axis=1)
      # the corresponding probabilities of the predicted classes
      pred_class_prob = predict_probs[range(len(predict_probs)),pred_class]

    gnn.model = model
    gnn.train_acc = train_acc
    gnn.pred_probs = predict_probs  
    gnn.pred_class = pred_class
    gnn.pred_class_prob = pred_class_prob


    return gnn

  def GNN_Direct(self, DataSets, y_train_one_hot):
    """
      This function can run:
      1. by itself, when interation is set to False (<1)
      2. inside GNN_Iter, when interation is set to True (>=1)

      Learner == 0: GNN, but not learn from the known label
      Learner == 2: GNN, and learn unknown labels 
    """
    Learner = self.kwargs["Learner"]  

    k = DataSets.k 
    z_train = DataSets.z_train 
    Y = DataSets.Y
    z_unlabel = DataSets.z_unlabel
    ind_unlabel = DataSets.ind_unlabel

    gnn = self.GNN_run(k, z_train, y_train_one_hot, z_unlabel)

    if Learner == 0:
      # do not learn unknown label.
      pass
    
    if Learner == 2:
      # learn unknown label based on the known label
      # replace the unknown label in Y with predicted labels
      pred_class = gnn.pred_class
      Y[ind_unlabel, 0] = pred_class

    gnn.Y = Y

    return gnn


  def GNN_Iter(self, DataSets):
    """
      Run this function when interation is set, which is >=1.
      
      1. randomly assign labels to the unknown labels, get Y_temp
      2. get Y_one_hot for the Y_temp 
      3. get Z from graph_encod function with X and Y_temp 
      within each loop: 
        use meanSS as its criterion to decide if the update is needed        
	      update Y_one_hot for the unknown labels with predict probabilities of each classes
	      update Y with the highest possible predicted labels
	      update z_train and z_unlabel from graph encoder embedding using the updated Y
	      train the model with the updated z_train and Y_one_hot     
    """

    kwargs = self.kwargs
    meanSS = self.meanSS

    k = DataSets.k 
    Y = DataSets.Y
    ind_unlabel = DataSets.ind_unlabel


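    y_temp = np.copy(Y)


    for i in range(kwargs["Replicates"]):
      # assign random integers in [0, k-1] to the unlabeled nodes
      r = [c for c in range(k)]

      ran_int = np.random.choice(r, size=(len(ind_unlabel),1))

      y_temp[ind_unlabel] = ran_int
      # attach y_temp in every replicate: DataSets_reset returns a deep copy,
      # so a reference assigned once before the loop goes stale after the first replicate
      DataSets.y_temp = y_temp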

      for j in range(kwargs["LearnerIter"]):
        if j ==0:
          # first iteration need to split the y_temp for training etc.
          # use reset to add z_train, z_unlabel, y_temp_one_hot, to the dataset
          DataSets = DataSets.DataSets_reset("y_temp")  
          # Convert targets into one-hot encoded format      
          y_temp_one_hot = to_categorical(y_temp, num_classes=k)
          # initialize y_temp_one_hot in the first loop
          DataSets.y_temp_one_hot = y_temp_one_hot     
        if j > 0:
          # update z_train, z_unlabel, and y_temp_train_one_hot to the dataset
          DataSets = DataSets.DataSets_reset("y_temp")
        # all the gnn train on y_train_one_hot
        gnn = self.GNN_Direct(DataSets, DataSets.Y_train_one_hot)
        predict_probs = gnn.pred_probs
        pred_class = gnn.pred_class
        pred_class_prob = gnn.pred_class_prob

        # z_unlabel is initialized with None, so pred_class may be None.
        # This does not happen in the semi-supervised version,
        # since the unlabeled set is not empty there
        if type(pred_class) == np.ndarray:
          # if there are unknown labels and predicted labels are available,
          # check whether the predicted classes form the same partition as the random integers;
          # if so, stop the "LearnerIter" loop
          # shape (n,) is required for adjusted_rand_score()
          if adjusted_rand_score(ran_int.reshape((-1,)), pred_class) == 1:
            break
          # assign the probabilities for each class to the temp y_one_hot
          DataSets.y_temp_one_hot[ind_unlabel] = predict_probs
          # assign the predicted classes to the temp Y unknown labels
          DataSets.y_temp[ind_unlabel, 0] = pred_class
          # # assign the highest probability of the class to Y_temp
          # Y_temp[ind_unlabel, 0] = pred_class_prob
      minP = np.mean(pred_class_prob) - 3*np.std(pred_class_prob)
      if minP > meanSS:
        meanSS = minP
        Y = DataSets.y_temp   

    gnn.Y = Y
    gnn.meanSS = meanSS
    return gnn
  
        
  def GNN_complete(self):
    """
      if LearnerIter set to False(<1):
        run GNN_Direct() with no iteration
      if LearnerIter set to True(>=1):
        run GNN_Iter(), which starts with radomly assigned k to unknown labels
      
    """
    kwargs = self.kwargs

    DataSets = self.DataSets
    y_train = DataSets.Y_train


    if kwargs["LearnerIter"] < 1:
      # Convert targets into one-hot encoded format with k columns
      y_train_one_hot = to_categorical(y_train, num_classes=DataSets.k)
      gnn = self.GNN_Direct(DataSets, y_train_one_hot)
    else:
      gnn = self.GNN_Iter(DataSets)
    
    return gnn
############-----------------GNN_end-----------------------------###############
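
# GNN_Iter and LDA_Iter score each random-restart replicate with
#   minP = mean(p) - 3 * std(p),
# where p holds, for each unlabeled node, the probability of its predicted
# class. The intent is to keep the labels of the replicate with the largest
# minP (meanSS starts at 0). A standalone sketch of that selection rule
# (select_replicate is a hypothetical helper):
import numpy as np

def select_replicate(replicate_probs, start=0.0):
  """replicate_probs: list of 1-D arrays of predicted-class probabilities.
  Returns (index of the chosen replicate or None, best score)."""
  best_idx, best_score = None, start
  for idx, p in enumerate(replicate_probs):
    # high average confidence, penalized by its spread
    score = np.mean(p) - 3 * np.std(p)
    if score > best_score:
      best_idx, best_score = idx, score
  return best_idx, best_score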

############-----------------LDA_start---------------------------###############
class LDA:
  def __init__(self, DataSets, **kwargs):
    LDA.kwargs = self.kwargs_construct(**kwargs)
    LDA.DataSets = DataSets
    LDA.model = LinearDiscriminantAnalysis()  # default solver ('svd'); assumed to stand in for MATLAB's 'pseudolinear' discriminant
    LDA.meanSS = 0  # initialize the self-defined criterion meanSS
    
  def kwargs_construct(self, **kwargs):
    defaultKwargs = {'Learner': 1,                         # LDA_Learner
                     'LearnerIter': 0, "Replicates": 3     # LDA_Iter                           
                     }
    kwargs = { **defaultKwargs, **kwargs}  # update the args using input_args
    return kwargs   

  def LDA_Learner(self, DataSets):
    """
      run this function when Learner set to 1.
      embedding via partial label, then learn unknown label via LDA.
    """  
    lda = copy.deepcopy(self)
    z_train = DataSets.z_train
    y_train = DataSets.Y_train
    ind_unlabel = DataSets.ind_unlabel
    z_unlabel = DataSets.z_unlabel
    Y = DataSets.Y

    model = self.model
    model.fit(z_train,y_train)
    # train_acc = model.score(z_train,y_train)

    # for semi-supervised learning 
    if type(z_unlabel) == np.ndarray:
      # predict_probas include probabilities for all classes for each node
      pred_probs = model.predict_proba(z_unlabel)
      # assign the classes with the highest probability
      pred_class = model.predict(z_unlabel)
      # the corresponding probabilities of the predicted classes
      pred_class_prob = pred_probs[range(len(pred_probs)),pred_class] 
      # assign the predicted class to Y
      Y[ind_unlabel, 0] = pred_class
      lda.Y = Y
      lda.pred_class = pred_class
      lda.pred_class_prob = pred_class_prob

    lda.model = model
    return lda
    
  def LDA_Iter(self):
    """
      run this function when Learner set to 1, and LeanerIter is True(>=1)
      ramdonly assign labels to the unknownlabel.
      embedding via partial label, then learn unknown label via LDA. 
    """

    kwargs = self.kwargs
    meanSS = self.meanSS
    DataSets = self.DataSets

    k = DataSets.k 
    Y = DataSets.Y
    ind_unlabel = DataSets.ind_unlabel

    y_temp = np.copy(Y)

    for i in range(kwargs["Replicates"]):
      # assign random integers in [0, k-1] to the unlabeled nodes
      r = [c for c in range(k)]
      
      ran_int = np.random.choice(r, size=(len(ind_unlabel),1))

      y_temp[ind_unlabel] = ran_int
      
      DataSets.y_temp = y_temp
 
      for j in range(kwargs["LearnerIter"]):
        # use reset to add z_train, z_unlabel, to the dataset
        DataSets = DataSets.DataSets_reset("y_temp")
        # all train on y_train
        lda = self.LDA_Learner(DataSets)
        pred_class = lda.pred_class
        pred_class_prob = lda.pred_class_prob

        # z_unlabel is initialized with None, so pred_class may be None.
        # This does not happen in the semi-supervised version,
        # since the unknown set is never empty there.
        if type(pred_class) == np.ndarray:
          # if there are unknown labels and predicted labels are available,
          # check whether the predicted classes equal the current labels;
          # if so, the labels have converged, so stop the "LearnerIter" loop
          # shape (n,) is required for adjusted_rand_score()
          if adjusted_rand_score(DataSets.y_temp[ind_unlabel, 0], pred_class) == 1:
            break
          # assign the predicted classes to the unknown labels in y_temp
          DataSets.y_temp[ind_unlabel, 0] = pred_class
          # # assign the highest probability of the class to Y_temp
          # Y_temp[ind_unlabel, 0] = pred_class_prob
      minP = np.mean(pred_class_prob) - 3*np.std(pred_class_prob)
      if minP > meanSS:
        meanSS = minP
        # copy, since y_temp is overwritten by the next replicate
        Y = np.copy(DataSets.y_temp)

    # return the best labelling over all replicates
    lda.Y = Y
    lda.meanSS = meanSS
    return lda
  

############-----------------LDA_end-----------------------------###############

############------------Clustering_start-------------------------###############
class Clustering:
  """
    The input DataSets.X is the s*3 edge list.
    The input DataSets.Y can be:
    1. A single cluster size, e.g. [3], meaning exactly 3 clusters.
    2. A range of cluster sizes, e.g. [3, 4, 5], meaning there are possibly 3 to 5 clusters.

  """
  def __init__(self, DataSets, **kwargs):
    self.kwargs = self.kwargs_construct(**kwargs)
    self.DataSets = DataSets
    self.cluster_size_range = self.cluster_size_check() 
    self.K = DataSets.Y[0]
  

  def kwargs_construct(self, **kwargs):
    defaultKwargs = {'Correlation': True,'MaxIter': 50, 'MaxIterK': 5,'Replicates': 3}
    kwargs = { **defaultKwargs, **kwargs}
    return kwargs

  def cluster_size_check(self):
    DataSets = self.DataSets
    Y = DataSets.Y

    cluster_size_range = None # in case that Y is an empty array.

    if len(Y) == 1:
      cluster_size_range = False  # the cluster size is known, e.g. [3]
    if len(Y) > 1:
      cluster_size_range = True   # only a range of cluster sizes is known, e.g. [2, 3, 4, 5]
    
    return cluster_size_range
    
  def graph_encoder_cluster(self, K):
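    """
      cluster the graph into K groups: starting from random labels, alternate
      between encoder embedding and KMeans until the labels stop changing;
      repeat for each replicate and keep the result with the smallest score.
    """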
    DataSets = self.DataSets
    X = DataSets.X
    n = DataSets.n
    kwargs = self.kwargs
    Encoder_kwargs = {k: kwargs[k] for k in ['Correlation']}


    minSS=-1
    Z = None
    W = None

    for i in range(kwargs['Replicates']):
      Y_temp = np.random.randint(K,size=(n,1))
      for r in range(kwargs['MaxIter']):
        [Zt,Wt] = multi_graph_encoder_embed(DataSets, Y_temp, **Encoder_kwargs)  
        
        if DataSets.attributes:
          # add U to Z side by side
          Zt = np.concatenate((Zt, DataSets.U), axis=1)
        kmeans = KMeans(n_clusters=K, max_iter = kwargs['MaxIter']).fit(Zt)
        labels = kmeans.labels_ # shape(n,)
        # sum_in_cluster = kmeans.inertia_ # total squared distance to the closest centroids (a scalar)
        # distance from every node to every centroid, shape (n, K)
        dis_to_centors = kmeans.transform(Zt)
        # adjusted_rand_score() needs the shape (n,)
        if adjusted_rand_score(Y_temp.reshape(-1,), labels) == 1:
          break
        else:
          # reshape labels to (n, 1) to match Y_temp before assigning
          Y_temp = labels.reshape(-1,1)

      # calculate the score and compare it with minSS
      tmp = self.temp_score(dis_to_centors, K, labels, n)
      if (minSS == -1) or tmp < minSS:
        Z = Zt
        W = Wt
        minSS = tmp
        Y = labels
    return  Z, Y, W, minSS


  def temp_score(self, dis_to_centors, K, labels, n):
    """
      calculate:
      1. sum_in_cluster (length K): for each cluster, the square root of the summed
      squared distances from its member nodes to its centroid
      2. sum_in_cluster_norm (length K): sum_in_cluster divided by the
      corresponding label count (number of nodes in the cluster)
      3. sum_not_in_cluster (length K): for each cluster, the square root of the summed
      squared distances from the nodes outside the cluster to its centroid
      4. sum_not_in_cluster_norm (length K): sum_not_in_cluster divided by the
      number of nodes outside the cluster
      5. temp score (length K):
      (sum_in_cluster_norm / sum_not_in_cluster_norm) *
      (label count in cluster / total number of nodes)
      6. return the mean + 2 standard deviations of the temp scores (lower is better)
    """
    # minlength=K keeps one count per cluster even if a cluster is empty
    label_count = np.bincount(labels, minlength=K)
    sum_in_cluster_squre = np.zeros((K,))

    # KMeans.transform returns unsquared Euclidean distances, so they are
    # squared here and the square root is taken after summing
    dis_to_centors_squre = dis_to_centors**2

    for i in range(n):
      label = labels[i]
      sum_in_cluster_squre[label] += dis_to_centors_squre[i][label]

    sum_not_in_cluster = (np.sum(dis_to_centors_squre, axis=0) - sum_in_cluster_squre)**0.5

    sum_not_in_cluster_norm = sum_not_in_cluster/(n - label_count)
    sum_in_cluster_norm = sum_in_cluster_squre**0.5/label_count

    tmp = sum_in_cluster_norm / sum_not_in_cluster_norm * label_count / n
    tmp = np.mean(tmp) + 2*np.std(tmp)

    return tmp


  def cluster_main(self):
    K = self.K
    DataSets = self.DataSets    
    X = DataSets.X
    n = DataSets.n

    kmax = np.amax(K)
    if n/kmax < 30:
      print('Too many clusters at maximum; the result may be biased towards large K. Please make sure n/Kmax > 30.')
    # when the cluster size is specified
    if not self.cluster_size_range:
      [Z,Y,W,meanSS]= self.graph_encoder_cluster(K[0])
    # when a range of cluster sizes is provided:
    # require fewer candidate sizes than n/2, and kmax < max(n/2, 10)
    if self.cluster_size_range:
      k_range = len(K)
      if k_range < n/2 and kmax < max(n/2, 10):
        minSS = -1
        Z = 0
        W = 0
        # score of each candidate K
        meanSS = np.zeros((k_range,1))
        for i in range(k_range):
          [Zt,Yt,Wt,tmp]= self.graph_encoder_cluster(K[i])
          meanSS[i,0] = tmp
          if (minSS == -1) or tmp < minSS:
            minSS = tmp
            Y = Yt
            Z = Zt
            W = Wt
    return Z, Y, W, meanSS

############------------Clustering_end---------------------------###############
############------------Evaluation_start---------------------------#############
class Evaluation:
  def GNN_supervise_test(self, gnn, z_test, y_test):
    """
      test the accuracy for GNN_direct
    """
    y_test_one_hot = to_categorical(y_test) 
    # set verbose to 0 to silence the output
    test_loss, test_acc = gnn.model.evaluate(z_test,  y_test_one_hot, verbose=0) 

    return test_acc

  def LDA_supervise_test(self, lda, z_test, y_test):
    """
      test the accuracy for LDA_learner
    """
    test_acc = lda.model.score(z_test, y_test)

    return test_acc

  def GNN_semi_supervised_learn_test(self,Y_result, Y_original):
    """
      accuracy of the labels learned in semi-supervised mode against the original labels
    """
    test_acc = metrics.accuracy_score(Y_result, Y_original)   

    return test_acc

  def GNN_semi_supervised_not_learn_test(self, gnn, Dataset, case):
    """
      accuracy of the GNN on the unlabeled nodes, evaluated against the original labels
    """

    ind_unlabel = Dataset.ind_unlabel
    z_unlabel =  Dataset.z_unlabel 
    y_unlabel_ori = case.Y_ori[ind_unlabel, 0]
    y_unlabel_ori_one_hot = to_categorical(y_unlabel_ori) 
    test_loss, test_acc = gnn.model.evaluate(z_unlabel, y_unlabel_ori_one_hot, verbose=0)

    return test_acc


  def clustering_test(self, Y_result, Y_original):
    """
      adjusted Rand index (ARI) between the clustering result and the original labels;
      ARI does not depend on how the cluster IDs are numbered
    """
    ari = adjusted_rand_score(Y_result, Y_original.reshape(-1,))

    return ari

############-----------------Matrix_conversion-------------------###############
def To_multi_sparse_matrix(M_list,option):
  M_list_new = []
  for M in M_list:
    M_new = To_single_sparse_matrix(M,option)
    M_list_new.append(M_new)

  return M_list_new

def To_single_sparse_matrix(M,option):
  """
    coo_matrix is efficient and fast to construct;
      however, arithmetic operations are not efficient in this format.
      A coo_matrix can easily be converted to csc_matrix/csr_matrix.
    csc_matrix/csr_matrix support efficient column/row slicing respectively,
      as well as efficient arithmetic and matrix-vector products.
  """
  if option == "coo":
    M = sparse.coo_matrix(M)
  elif option == "csr":
    M = sparse.csr_matrix(M)
  elif option == "csc":
    M = sparse.csc_matrix(M)

  return M
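`To_single_sparse_matrix` returns `M` unchanged when `option` is not one of `"coo"`, `"csr"` or `"csc"`, so a typo in the option goes unnoticed. A minimal sketch of a stricter, table-driven variant; `to_sparse` and `_FORMATS` are hypothetical names, not part of the notebook:

```python
import numpy as np
from scipy import sparse

# Each supported option mapped to the scipy.sparse constructor it selects.
_FORMATS = {
    "coo": sparse.coo_matrix,  # cheap to build, e.g. from (row, col, value) triplets
    "csr": sparse.csr_matrix,  # fast row slicing and matrix-vector products
    "csc": sparse.csc_matrix,  # fast column slicing
}

def to_sparse(M, option="csr"):
    """Convert M (dense array or sparse matrix) to the requested sparse format."""
    if option not in _FORMATS:
        raise ValueError(f"unknown sparse format {option!r}; expected one of {sorted(_FORMATS)}")
    return _FORMATS[option](M)

# Build in COO, then convert to CSR before doing arithmetic.
A = np.array([[0, 1, 0], [1, 0, 2], [0, 2, 0]], dtype=float)
A_csr = to_sparse(to_sparse(A, "coo"), "csr")
x = np.ones(3)
print(A_csr @ x)  # [1. 3. 2.]
```

Raising on an unknown option trades the silent pass-through for an immediate, descriptive error.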

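The per-node loop in `temp_score` can be vectorized. The sketch below is a standalone re-implementation of the same formula, assuming (as in `graph_encoder_cluster`) that `dist` is the `(n, K)` output of `KMeans.transform`. It uses `np.bincount(labels, minlength=K)` so that the counts always have length `K`; without `minlength`, an empty last cluster would shorten the array and break the element-wise division (with it, an empty cluster yields `nan` instead). `cluster_score` is a hypothetical name.

```python
import numpy as np

def cluster_score(dist, labels, K):
    """Vectorized form of the temp_score formula (lower is better)."""
    n = len(labels)
    counts = np.bincount(labels, minlength=K)            # nodes per cluster
    sq = dist ** 2                                       # squared distances, shape (n, K)
    # per cluster: summed squared distance from its members to its own centroid
    in_sq = np.bincount(labels, weights=sq[np.arange(n), labels], minlength=K)
    # per cluster: root of summed squared distances from non-members to its centroid
    out = np.sqrt(sq.sum(axis=0) - in_sq)
    in_norm = np.sqrt(in_sq) / counts
    out_norm = out / (n - counts)
    per_cluster = in_norm / out_norm * counts / n
    return per_cluster.mean() + 2 * per_cluster.std()

# Worked example: two tight, well-separated clusters of two nodes each.
dist = np.array([[1.0, 3.0], [1.0, 3.0], [3.0, 1.0], [3.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(cluster_score(dist, labels, K=2))  # approximately 1/6
```

In the worked example each cluster's ratio is (√2/2)/(√18/2) = 1/3, weighted by 2/4 nodes, so both per-cluster scores are 1/6, their standard deviation is 0, and the score is 1/6.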

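`graph_encoder_cluster` alternates between embedding the graph with the current labels and re-labelling the nodes with KMeans, stopping once the labels no longer change (ARI of 1). The sketch below shows that loop on a dense adjacency matrix. It does not call the notebook's `multi_graph_encoder_embed`; instead it assumes the basic one-hot graph encoder embedding `Z = A W`, where `W[i, k] = 1 / n_k` if node `i` currently has label `k` and 0 otherwise, and it ignores the `Correlation`, `Laplacian` and attribute options. `encoder_embed` and `encoder_cluster` are hypothetical names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def encoder_embed(A, y, K):
    """One-hot encoder embedding: Z[i, k] = summed edge weight from node i to class k, divided by n_k."""
    n = A.shape[0]
    counts = np.bincount(y, minlength=K).astype(float)
    W = np.zeros((n, K))
    W[np.arange(n), y] = 1.0 / counts[y]     # one-hot by class, scaled by 1/n_k
    return A @ W, W

def encoder_cluster(A, K, max_iter=50, y_init=None, seed=0):
    """Alternate embedding and KMeans until the labels stop changing."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    y = rng.integers(K, size=n) if y_init is None else np.asarray(y_init)
    for _ in range(max_iter):
        Z, W = encoder_embed(A, y, K)
        new_y = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(Z).labels_
        if adjusted_rand_score(y, new_y) == 1:   # same partition up to relabelling
            break
        y = new_y
    return Z, new_y, W

# Usage: a two-block stochastic block model with 50 nodes per block.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 50)
P = np.where(truth[:, None] == truth[None, :], 0.5, 0.02)   # edge probabilities
A = (rng.random((100, 100)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                                  # symmetric, no self-loops
Z, labels, W = encoder_cluster(A, K=2, y_init=truth)         # warm start from the true labels
print(adjusted_rand_score(truth, labels))
```

As in the notebook, the default start is random labels (`y_init=None`); the usage above warm-starts from the true labels only to show that they form a fixed point of the loop on a well-separated graph.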

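`LDA_Iter` restarts from random labels for the unknown nodes, alternates between fitting LDA and re-predicting those labels, and keeps the replicate whose predicted-class probabilities have the largest `mean - 3*std`. The sketch below reproduces that idea on a fixed feature matrix, under assumptions that differ from the notebook: the embedding is not recomputed between iterations (the notebook re-embeds through `DataSets_reset`), LDA is fit on all nodes with their current labels, and the loop stops when the predictions equal the previous iteration's labels. `self_train_lda` is a hypothetical name.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def self_train_lda(Z, y, k, replicates=3, max_iter=10, seed=0):
    """y holds labels 0..k-1 for known nodes and -1 for unknown ones."""
    rng = np.random.default_rng(seed)
    unl = np.flatnonzero(y == -1)
    best_y, best_score = None, -np.inf
    for _ in range(replicates):
        y_tmp = y.copy()
        y_tmp[unl] = rng.integers(k, size=len(unl))        # random start for unknown labels
        for _ in range(max_iter):
            model = LinearDiscriminantAnalysis().fit(Z, y_tmp)
            probs = model.predict_proba(Z[unl])
            pred = model.classes_[probs.argmax(axis=1)]     # column index -> class label
            if np.array_equal(pred, y_tmp[unl]):            # labels stable: stop
                break
            y_tmp[unl] = pred
        conf = probs.max(axis=1)                            # probability of the predicted class
        score = conf.mean() - 3 * conf.std()
        if score > best_score:
            best_y, best_score = y_tmp.copy(), score
    return best_y, best_score

# Usage: two Gaussian blobs with 30 of 100 labels hidden.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
truth = np.repeat([0, 1], 50)
y = truth.copy()
y[rng.choice(100, size=30, replace=False)] = -1
y_hat, score = self_train_lda(Z, y, k=2)
print((y_hat == truth).mean())
```

Mapping the arg-max column through `model.classes_` keeps the label lookup correct even when the labels are not exactly `0..k-1`.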
#Packages for Drive Files

In [None]:
# import packages
## for mount drive purpose
import os
from google.colab import drive

#Mount Drive

In [None]:
# mount drive
drive.mount('/content/drive/', force_remount=True)
os.chdir('/content/drive/My Drive/Colab_Notebooks/Graph_ML/semi_dr.shen')

Mounted at /content/drive/


# import ipynb packages

In [None]:
!pip install import-ipynb

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting import-ipynb
  Downloading import_ipynb-0.1.4-py3-none-any.whl (4.1 kB)
Collecting jedi>=0.10
  Downloading jedi-0.18.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, import-ipynb
Successfully installed import-ipynb-0.1.4 jedi-0.18.1


In [None]:
import import_ipynb
from test_cases import Model, Case

importing Jupyter notebook from test_cases.ipynb
Mounted at /content/drive/


# Test Cases 

## Test Form

In [None]:
# importing the module
import pandas as pd
# use real booleans throughout: the string 'False' is truthy in Python
list_of_data = [
        [False, False, False, 0, 0.95],
        [False, False, True, 0, 0.95],
        [False, True, True, 0, 0.95],
        [False, True, False, 0, 0.95],
        [True, False, False, 0, 0.95],
        [True, False, True, 0, 0.95],
        [True, True, False, 0, 0.95],
        [True, True, True, 0, 0.95]
        ]
df = pd.DataFrame(list_of_data,
index=['set_01','set_02','set_03','set_04','set_05','set_06','set_07','set_08'],
columns=['Laplacian','DiagA', 'Correlation', 'Accuracy', 'Time(s)'])

# df.style.format returns a Styler, so keep it separate from the DataFrame
styled = df.style.format({
  'Time(s)': '{:0.2f}',
})

display(styled)

        Laplacian  DiagA  Correlation  Accuracy  Time(s)
set_01  False      False  False        0         0.95
set_02  False      False  True         0         0.95
set_03  False      True   True         0         0.95
set_04  False      True   False        0         0.95
set_05  True       False  False        0         0.95
set_06  True       False  True         0         0.95
set_07  True       True   False        0         0.95
set_08  True       True   True         0         0.95
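The eight rows above enumerate the 2³ combinations of the three encoder flags by hand. They can also be generated with `itertools.product`, which guarantees that each combination appears exactly once and keeps the flags as real booleans (the non-empty string `'False'` would be truthy). A sketch, assuming the same column names and the placeholder `Accuracy`/`Time(s)` values used above; `product` yields the rows in lexicographic order, so `set_03` and `set_04` are swapped relative to the hand-written table:

```python
import itertools
import pandas as pd

flags = ["Laplacian", "DiagA", "Correlation"]
# every True/False combination of the three flags, plus placeholder results
rows = [list(combo) + [0, 0.95] for combo in itertools.product([False, True], repeat=len(flags))]
df = pd.DataFrame(
    rows,
    index=[f"set_{i:02d}" for i in range(1, len(rows) + 1)],
    columns=flags + ["Accuracy", "Time(s)"],
)
print(df)
```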


## Real datasets and cases from input files

In [None]:
import math
import copy
import networkx as nx

class RealDataSet:
  def __init__(self, edg_file, node_file):
    self.X = None  # edg_list
    self.n = None
    self.Y = None
    self.edg_file = edg_file
    self.node_file = node_file
  
  def get_initial_values(self):
    realSet = copy.deepcopy(self)

    label_dict, map_new_old_keys = self.read_node_file(self.node_file)
    n = self.get_n(label_dict)
    
    if map_new_old_keys:
      X = self.read_edge_file_with_remap(self.edg_file, n, map_new_old_keys)
    else:
      X = self.read_edge_file(self.edg_file, n)
    
    realSet.X = X
    label_dict = self.check_class_values(label_dict)
    Y = self.get_labels(label_dict, n)
    realSet.Y = Y
    realSet.n = n
    realSet.k = self.get_k(label_dict)

    return realSet
  
  def check_class_values(self, label_dict):
    """
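      check whether class values start at 0; if not, shift them so they do
    """
    # the values are strings read from the file, so compare them as integers
    # (a string is never equal to 0, and min() on strings is lexicographic)
    min_label = min(int(v) for v in label_dict.values())
    if min_label != 0:
      for k, v in label_dict.items():
        label_dict[k] = str(int(v) - min_label)
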
    return label_dict

  def read_node_file(self, filename):
    """
      node IDs in the node file start at 1, not 0
    """
    re_map_node = False
    label_dict = {}
    line_count = 0
    map_new_old_keys = {}
    with open(filename, "r") as labels:
      for l in labels:
        line_count += 1
        (node_i, label_i) = l.strip().split(",")
        if (line_count) == 1 and (int(node_i) != 1):
          re_map_node = True
        label_dict[int(node_i)-1] = label_i
    # if the IDs do not start at 1, the nodes carry external IDs (for example PMIDs for PubMed data)
    # that must be mapped to a series of node IDs starting from 0
    if re_map_node:
      keys = sorted(list(label_dict.keys()))
      new_node_idx = [i for i in range(len(keys))]
      new_label_dict = {}
      for i in range(len(keys)):
        map_new_old_keys[keys[i]] = new_node_idx[i]
        new_label_dict[new_node_idx[i]] = label_dict[keys[i]]
      label_dict = new_label_dict
        
    return label_dict, map_new_old_keys
  
  def get_n(self, label_dict):
    """
      get the number of nodes: n
      the keys start with 0, so n is max + 1.
    """
    n = max(label_dict.keys()) + 1
    return n
  
  def get_k(self, label_dict):
    """
      get the number of classes: k
    """
    k = len(set(label_dict.values()))
    return k

  def read_edge_file(self, filename, n):
    """
      NOTE: node IDs in the edge file start at 1, not 0
    """
    edg_list = []
    with open(filename, "r") as edges:
      for l in edges:
        elements = l.strip().split(",")
        if len(elements) > 2:
          (node_i, node_j, w) = elements
          edg_list.append([int(node_i)-1, int(node_j)-1, float(w)])
        else:
          (node_i, node_j) = elements
          edg_list.append([int(node_i)-1, int(node_j)-1, 1])
    edg = np.array(edg_list)
    return edg  

  def read_edge_file_with_remap(self, filename, n, map_new_old_keys):
    """
      node IDs that were remapped while reading the node file
      must be remapped in the edge list as well
    """
    edg_list = []
    with open(filename, "r") as edges:
      for l in edges:
        elements = l.strip().split(",")
        if len(elements) > 2:
          (node_i, node_j, w) = elements
          new_idx_i = map_new_old_keys[int(node_i)-1]
          new_idx_j = map_new_old_keys[int(node_j)-1]
          edg_list.append([new_idx_i, new_idx_j, float(w)])
        else:
          (node_i, node_j) = elements
          new_idx_i = map_new_old_keys[int(node_i)-1]
          new_idx_j = map_new_old_keys[int(node_j)-1]
          edg_list.append([new_idx_i, new_idx_j, 1])
    edg = np.array(edg_list)
    return edg  

  def check_label(self, label_dict, n):
    """
      the keys of the input label_dict start at 0
    """
    check = True
    keys = set(label_dict.keys())  # a set makes each membership test O(1)
    unlabeld_node_idx = []
    for node_idx in range(n):
      if node_idx not in keys:
        unlabeld_node_idx.append(node_idx)
    if len(unlabeld_node_idx) > 0:
      print("There are node(s) not labeled")
      check = False
    return check, unlabeld_node_idx

  def get_labels(self, label_dict, n):
    check, unlabeld_node_idx = self.check_label(label_dict, n)
    keys = sorted(list(label_dict.keys()))
    Y = np.zeros((n,1), dtype=int)
    for node_idx in keys:    
      Y[node_idx][0] = int(label_dict[node_idx])
    if not check:
      for idx in unlabeld_node_idx:
        Y[idx][0] = -1

    return Y

  def split_sets(self, test_ratio):

    DataSet = copy.deepcopy(self)
    Y_ori = DataSet.Y
    Y = np.copy(Y_ori)

    t = test_ratio
    Y_1st_dim = Y.shape[0]

    np.random.seed(0)
    indices = np.random.permutation(Y_1st_dim)  #randomly permute the 1st indices

    # Generate indices for splits
    test_ind_split_point = math.floor(Y_1st_dim*t)
    test_idx, train_idx = indices[:test_ind_split_point], indices[test_ind_split_point:]

    
    # get the Y_test label
    Y_test = Y[test_idx]
    Y_train = Y[train_idx]
    # mark the test position as unknown: -1
    Y[test_idx, 0] = -1    


    DataSet.Y_ori = Y_ori
    DataSet.Y = Y
    DataSet.Y_train = Y_train.ravel()
    DataSet.Y_test = Y_test.ravel() 
    DataSet.test_idx = test_idx
    DataSet.train_idx = train_idx    
    return DataSet 

def edge_list_to_adjacency_matrix(edg_list, n):
  A = np.zeros((n,n))
  for [i, j, w] in edg_list:
    i = int(i)
    j = int(j)
    # symmetric: set both directions (a repeated edge keeps its last weight)
    A[i,j] = w
    A[j,i] = w
  return A
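`read_node_file` remaps arbitrary node IDs (for example PubMed IDs) to a contiguous range `0..n-1` by sorting the keys and building a dictionary, and `read_edge_file_with_remap` applies the same map to the edges. The same mapping can be expressed with `np.unique(..., return_inverse=True)`, which sorts the distinct IDs and returns each input ID's position in that sorted list. A minimal sketch, assuming integer IDs; `remap_ids` is a hypothetical helper:

```python
import numpy as np

def remap_ids(node_ids, edges):
    """Map sorted distinct node IDs to 0..n-1 and apply the same map to an (s, 2) edge array."""
    uniq, new_nodes = np.unique(node_ids, return_inverse=True)
    # position of each edge endpoint in the sorted unique IDs
    pos = np.searchsorted(uniq, edges)
    pos = np.minimum(pos, len(uniq) - 1)      # keep indices in range for the check below
    if not np.array_equal(uniq[pos], edges):
        raise KeyError("edge list references a node ID missing from the node file")
    return new_nodes, pos, uniq

node_ids = np.array([10, 42, 7])
edges = np.array([[10, 42], [7, 10]])
new_nodes, new_edges, uniq = remap_ids(node_ids, edges)
print(new_nodes)           # [1 2 0]
print(new_edges.tolist())  # [[1, 2], [0, 1]]
```

Unlike a dictionary lookup, the explicit check reports an unknown edge endpoint with a clear message instead of a bare `KeyError` on the first missing ID.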

In [None]:
import math
import copy
import networkx as nx

class RealDataSet:
  def __init__(self, edg_file, node_file):
    self.X = None  # edg_list
    self.n = None
    self.Y = None
    self.edg_file = edg_file
    self.node_file = node_file
  
  def get_initial_values(self):
    realSet = copy.deepcopy(self)

    label_dict, map_new_old_keys = self.read_node_file(self.node_file)
    n = self.get_n(label_dict)
    
    if map_new_old_keys:
      X = self.read_edge_file_with_remap(self.edg_file, n, map_new_old_keys)
    else:
      X = self.read_edge_file(self.edg_file, n)
    
    realSet.X = X
    label_dict = self.check_class_values(label_dict)
    Y = self.get_labels(label_dict, n)
    realSet.Y = Y
    realSet.n = n
    realSet.k = self.get_k(label_dict)

    return realSet
  
  def check_class_values(self, label_dict):
    """
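      check whether class values start at 0; if not, shift them so they do
    """
    # the values are strings read from the file, so compare them as integers
    # (a string is never equal to 0, and min() on strings is lexicographic)
    min_label = min(int(v) for v in label_dict.values())
    if min_label != 0:
      for k, v in label_dict.items():
        label_dict[k] = str(int(v) - min_label)
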
    return label_dict

  def find_split_point(self, firstline):
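    # detect the delimiter (comma or tab) from the first line of the file;
    # if neither is found, None makes str.split() split on any whitespace
    split_point_pos = [",", "\t"]
    split_point = None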
    for sp in split_point_pos:
      if sp in firstline:
        split_point = sp
        break
    return split_point

  def read_node_file(self, filename):
    """
      node IDs in the node file start at 1, not 0
    """
    re_map_node = False
    label_dict = {}
    line_count = 0
    map_new_old_keys = {}

    with open(filename, "r") as labels:
      for l in labels:
        line_count += 1
        if (line_count) == 1:
          split_point = self.find_split_point(l)
        (node_i, label_i) = l.strip().split(split_point)
        if (line_count) == 1 and (int(node_i) != 1):
          re_map_node = True
        label_dict[int(node_i)-1] = label_i
    # if the IDs do not start at 1, the nodes carry external IDs (for example PMIDs for PubMed data)
    # that must be mapped to a series of node IDs starting from 0
    if re_map_node:
      keys = sorted(list(label_dict.keys()))
      new_node_idx = [i for i in range(len(keys))]
      new_label_dict = {}
      for i in range(len(keys)):
        map_new_old_keys[keys[i]] = new_node_idx[i]
        new_label_dict[new_node_idx[i]] = label_dict[keys[i]]
      label_dict = new_label_dict
        
    return label_dict, map_new_old_keys
  
  def get_n(self, label_dict):
    """
      get the number of nodes: n
      the keys start with 0, so n is max + 1.
    """
    n = max(label_dict.keys()) + 1
    return n
  
  def get_k(self, label_dict):
    """
      get the number of classes: k
    """
    k = len(set(label_dict.values()))
    return k

  def read_edge_file(self, filename, n):
    """
      NOTE: node IDs in the edge file start at 1, not 0
    """
    edg_list = []
    line_count = 0
    with open(filename, "r") as edges:
      for l in edges:
        line_count += 1
        if (line_count) == 1:
          split_point = self.find_split_point(l)

        elements = l.strip().split(split_point)
        if len(elements) > 2:
          (node_i, node_j, w) = elements
          edg_list.append([int(node_i)-1, int(node_j)-1, float(w)])
        else:
          (node_i, node_j) = elements
          edg_list.append([int(node_i)-1, int(node_j)-1, 1])
    edg = np.array(edg_list)
    return edg  

  def read_edge_file_with_remap(self, filename, n, map_new_old_keys):
    """
      node IDs that were remapped while reading the node file
      must be remapped in the edge list as well
    """
    edg_list = []
    line_count = 0
    with open(filename, "r") as edges:
      for l in edges:
        line_count += 1
        if (line_count) == 1:
          split_point = self.find_split_point(l)
        elements = l.strip().split(split_point)
        if len(elements) > 2:
          (node_i, node_j, w) = elements
          new_idx_i = map_new_old_keys[int(node_i)-1]
          new_idx_j = map_new_old_keys[int(node_j)-1]
          edg_list.append([new_idx_i, new_idx_j, float(w)])
        else:
          (node_i, node_j) = elements
          new_idx_i = map_new_old_keys[int(node_i)-1]
          new_idx_j = map_new_old_keys[int(node_j)-1]
          edg_list.append([new_idx_i, new_idx_j, 1])
    edg = np.array(edg_list)
    return edg  

  def check_label(self, label_dict, n):
    """
      the keys of the input label_dict start at 0
    """
    check = True
    keys = set(label_dict.keys())  # a set makes each membership test O(1)
    unlabeld_node_idx = []
    for node_idx in range(n):
      if node_idx not in keys:
        unlabeld_node_idx.append(node_idx)
    if len(unlabeld_node_idx) > 0:
      print("There are node(s) not labeled")
      check = False
    return check, unlabeld_node_idx

  def get_labels(self, label_dict, n):
    check, unlabeld_node_idx = self.check_label(label_dict, n)
    keys = sorted(list(label_dict.keys()))
    Y = np.zeros((n,1), dtype=int)
    for node_idx in keys:    
      Y[node_idx][0] = int(label_dict[node_idx])
    if not check:
      for idx in unlabeld_node_idx:
        Y[idx][0] = -1

    return Y

  def split_sets(self, test_ratio):

    DataSet = copy.deepcopy(self)
    Y_ori = DataSet.Y
    Y = np.copy(Y_ori)

    t = test_ratio
    Y_1st_dim = Y.shape[0]

    np.random.seed(0)
    indices = np.random.permutation(Y_1st_dim)  #randomly permute the 1st indices

    # Generate indices for splits
    test_ind_split_point = math.floor(Y_1st_dim*t)
    test_idx, train_idx = indices[:test_ind_split_point], indices[test_ind_split_point:]

    
    # get the Y_test label
    Y_test = Y[test_idx]
    Y_train = Y[train_idx]
    # mark the test position as unknown: -1
    Y[test_idx, 0] = -1    


    DataSet.Y_ori = Y_ori
    DataSet.Y = Y
    DataSet.Y_train = Y_train.ravel()
    DataSet.Y_test = Y_test.ravel() 
    DataSet.test_idx = test_idx
    DataSet.train_idx = train_idx    
    return DataSet 

def edge_list_to_adjacency_matrix(edg_list, n):
  A = np.zeros((n,n))
  for [i, j, w] in edg_list:
    i = int(i)
    j = int(j)
    # symmetric: set both directions (a repeated edge keeps its last weight)
    A[i,j] = w
    A[j,i] = w
  return A
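`find_split_point` detects the delimiter from the first line, and the readers then split every line on it. A self-contained sketch of the same parsing, assuming 1-based node IDs, no header row, a comma or tab delimiter (comma preferred, as in `find_split_point`) and an optional weight column; it also skips blank lines, which the readers above would fail to unpack. `parse_edges` is a hypothetical name; it accepts any iterable of lines, so a file handle from `open(edg_file)` works as well as a string buffer:

```python
import io
import numpy as np

def parse_edges(lines):
    """Parse 'i<sep>j[<sep>w]' lines (1-based IDs) into an (s, 3) float array of 0-based edges."""
    rows, sep = [], None
    for line in lines:
        line = line.strip()
        if not line:                           # tolerate blank lines
            continue
        if sep is None:                        # detect the delimiter once, from the first data line
            sep = "," if "," in line else "\t"
        parts = line.split(sep)
        w = float(parts[2]) if len(parts) > 2 else 1.0   # unweighted edges get weight 1
        rows.append([int(parts[0]) - 1, int(parts[1]) - 1, w])
    return np.array(rows, dtype=float).reshape(-1, 3)

text = "1\t2\t0.5\n2\t3\n\n"
X = parse_edges(io.StringIO(text))
print(X)
```

With a real file the call would be `with open(edg_file) as f: X = parse_edges(f)`, which also closes the file afterwards.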

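`edge_list_to_adjacency_matrix` allocates a dense `n x n` array: for case10 (n = 3000) that is 9 million entries, while the edge list has 500,249 rows. A sparse sketch, assuming non-negative weights and that each ordered pair `(i, j)` appears at most once (SciPy sums duplicate COO entries, whereas the dense loop keeps the last one); `edge_list_to_sparse_adjacency` is a hypothetical name:

```python
import numpy as np
from scipy import sparse

def edge_list_to_sparse_adjacency(edg_list, n):
    """Symmetric CSR adjacency matrix from an (s, 3) array of [i, j, w] rows."""
    edg = np.asarray(edg_list, dtype=float)
    i, j, w = edg[:, 0].astype(int), edg[:, 1].astype(int), edg[:, 2]
    A = sparse.coo_matrix((w, (i, j)), shape=(n, n)).tocsr()
    # element-wise max with the transpose mirrors each edge; with non-negative
    # weights this matches setting both A[i, j] and A[j, i] to w
    return A.maximum(A.T)

edges = np.array([[0, 1, 1.0], [1, 2, 2.5]])
A = edge_list_to_sparse_adjacency(edges, 3)
print(A.toarray())
```

The result can be passed wherever a `csr_matrix` is accepted, which matches the CSR option of `To_single_sparse_matrix`.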

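`split_sets` calls `np.random.seed(0)`, which resets NumPy's global random state for every later call in the notebook (including the random labels drawn with `np.random` in `LDA_Iter` and `graph_encoder_cluster`). A sketch of the same split with a local generator from `np.random.default_rng` leaves the global state alone; note that it produces a different permutation than `np.random.seed(0)`, so the test nodes would differ from the ones used in the runs below. `split_labels` is a hypothetical helper that returns the masked label vector and the test/train indices:

```python
import math
import numpy as np

def split_labels(Y, test_ratio, seed=0):
    """Hide a random test_ratio fraction of the labels in Y (shape (n, 1)) by setting them to -1."""
    rng = np.random.default_rng(seed)          # local generator: global NumPy state untouched
    n = Y.shape[0]
    indices = rng.permutation(n)
    split = math.floor(n * test_ratio)
    test_idx, train_idx = indices[:split], indices[split:]
    Y_masked = Y.copy()
    Y_masked[test_idx, 0] = -1                 # -1 marks an unknown label, as in the notebook
    return Y_masked, test_idx, train_idx

Y = np.arange(10).reshape(-1, 1) % 3
Y_masked, test_idx, train_idx = split_labels(Y, 0.2)
print(test_idx, (Y_masked == -1).sum())
```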
### case10

In [None]:
edg_file = "case10.edges"
node_file = "case10.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
case10 = RlDataSet.get_initial_values()
test_case = case10.split_sets(0.2)

In [None]:
print(test_case.X)

[[0.000e+00 4.000e+00 1.000e+00]
 [0.000e+00 2.500e+01 1.000e+00]
 [0.000e+00 4.000e+01 1.000e+00]
 ...
 [2.992e+03 2.994e+03 1.000e+00]
 [2.993e+03 2.998e+03 1.000e+00]
 [2.998e+03 2.999e+03 1.000e+00]]


In [None]:
print(test_case.X.shape)

(500249, 3)


In [None]:
print(test_case.Y)

[[ 0]
 [-1]
 [ 1]
 ...
 [ 1]
 [ 1]
 [ 1]]


In [None]:
print(len(test_case.Y))

3000


In [None]:
test_case.n

3000

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8416666388511658
--- embed 1.7030916213989258 seconds ---
--- train 137.2607822418213 seconds ---
--- total 138.9784345626831 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8650000095367432
--- embed 1.6895387172698975 seconds ---
--- train 17.011085510253906 seconds ---
--- total 18.72270154953003 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.8483333587646484
--- embed 1.6930992603302002 seconds ---
--- train 142.4762580394745 seconds ---
--- total 144.19393873214722 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.8633333444595337
--- embed 1.7046051025390625 seconds ---
--- train 10.864747047424316 seconds ---
--- total 12.591570377349854 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8866666555404663
--- embed 9.62221646308899 seconds ---
--- train 322.52445101737976 seconds ---
--- total 332.1621344089508 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.6916666626930237
--- embed 7.123134613037109 seconds ---
--- train 10.095487594604492 seconds ---
--- total 17.236419916152954 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.8666666746139526
--- embed 1.7506904602050781 seconds ---
--- train 75.43104577064514 seconds ---
--- total 77.1966187953949 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.8766666650772095
--- embed 2.1593384742736816 seconds ---
--- train 11.031860828399658 seconds ---
--- total 13.204347372055054 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.875
--- embed 1.719573974609375 seconds ---
--- train 64.00433158874512 seconds ---
--- total 65.74850010871887 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.8816666603088379
--- embed 1.6475141048431396 seconds ---
--- train 9.924271583557129 seconds ---
--- total 11.587576389312744 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.775
--- embed 1.7442491054534912 seconds ---
--- train 0.0273897647857666 seconds ---
--- total 1.7843189239501953 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.775
--- embed 8.265424251556396 seconds ---
--- train 0.004114866256713867 seconds ---
--- total 8.284082651138306 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.775
--- embed 3.185542583465576 seconds ---
--- train 0.002935171127319336 seconds ---
--- total 3.2131803035736084 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.77
--- embed 1.7510716915130615 seconds ---
--- train 0.002561330795288086 seconds ---
--- total 1.7670342922210693 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7716666666666666
--- embed 1.7108738422393799 seconds ---
--- train 0.003490924835205078 seconds ---
--- total 1.728276252746582 seconds ---


### case11

In [None]:
edg_file = "case11.edges"
node_file = "case11.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
case11 = RlDataSet.get_initial_values()
test_case = case11.split_sets(0.2)

In [None]:
print(test_case.X)

[[0.000e+00 4.000e+00 1.000e+00]
 [0.000e+00 1.300e+01 1.000e+00]
 [0.000e+00 2.500e+01 1.000e+00]
 ...
 [2.992e+03 2.994e+03 1.000e+00]
 [2.993e+03 2.998e+03 1.000e+00]
 [2.998e+03 2.999e+03 1.000e+00]]


In [None]:
print(test_case.X.shape)

(539863, 3)


In [None]:
print(test_case.Y)

[[ 1]
 [-1]
 [ 1]
 ...
 [ 2]
 [ 2]
 [ 1]]


In [None]:
print(len(test_case.Y))

3000


In [None]:
test_case.n

3000

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8583333492279053
--- embed 1.897650957107544 seconds ---
--- train 323.10284447669983 seconds ---
--- total 325.0223937034607 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8316666483879089
--- embed 1.932950735092163 seconds ---
--- train 11.266501665115356 seconds ---
--- total 13.231640338897705 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.8766666650772095
--- embed 2.132197618484497 seconds ---
--- train 335.88620615005493 seconds ---
--- total 338.0512397289276 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.8366666436195374
--- embed 2.339596748352051 seconds ---
--- train 10.929938554763794 seconds ---
--- total 13.303427696228027 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8500000238418579
--- embed 8.627696514129639 seconds ---
--- train 382.879017829895 seconds ---
--- total 391.5228343009949 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.19499999284744263
--- embed 19.210325002670288 seconds ---
--- train 21.22362780570984 seconds ---
--- total 40.45474815368652 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.8299999833106995
--- embed 1.9737493991851807 seconds ---
--- train 322.5313820838928 seconds ---
--- total 324.5168182849884 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.8366666436195374
--- embed 1.994297981262207 seconds ---
--- train 10.238099575042725 seconds ---
--- total 12.252954959869385 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.8316666483879089
--- embed 1.8419044017791748 seconds ---
--- train 271.4477961063385 seconds ---
--- total 273.3167734146118 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.8399999737739563
--- embed 1.9245240688323975 seconds ---
--- train 10.860437631607056 seconds ---
--- total 12.811626672744751 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7883333333333333
--- embed 1.924760103225708 seconds ---
--- train 0.029677867889404297 seconds ---
--- total 1.9658112525939941 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.7883333333333333
--- embed 8.090712547302246 seconds ---
--- train 0.003103971481323242 seconds ---
--- total 8.10787034034729 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.7883333333333333
--- embed 1.9155547618865967 seconds ---
--- train 0.0030975341796875 seconds ---
--- total 1.9387617111206055 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.7883333333333333
--- embed 1.9196226596832275 seconds ---
--- train 0.0032722949981689453 seconds ---
--- total 1.9366436004638672 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7883333333333333
--- embed 1.864607334136963 seconds ---
--- train 0.00330352783203125 seconds ---
--- total 1.8862638473510742 seconds ---


### case20

In [None]:
edg_file = "case20.edges"
node_file = "case20.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
case20 = RlDataSet.get_initial_values()
test_case = case20.split_sets(0.2)

In [None]:
print(test_case.X)

[[0.000e+00 3.000e+00 1.000e+00]
 [0.000e+00 1.680e+02 1.000e+00]
 [0.000e+00 4.420e+02 1.000e+00]
 ...
 [2.952e+03 2.975e+03 1.000e+00]
 [2.952e+03 2.993e+03 1.000e+00]
 [2.975e+03 2.980e+03 1.000e+00]]


In [None]:
print(test_case.X.shape)

(539863, 3)
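Each row of `test_case.X` above looks like a `(source, target, weight)` triple, which is the s×3 edge-list input format. The following is a minimal sketch of turning such an edge list into a sparse adjacency matrix with `scipy.sparse`. It assumes 0-based integer node ids and an undirected graph; the notebook does not state either assumption here. `edges_to_adjacency` is a hypothetical helper and is not part of the notebook's `RealDataSet` class.

```python
import numpy as np
from scipy import sparse

def edges_to_adjacency(X, n, undirected=True):
    """Hypothetical helper: n x n sparse adjacency from an s x 3 edge list.

    Assumes columns are (source, target, weight) with 0-based node ids.
    """
    src = X[:, 0].astype(int)
    dst = X[:, 1].astype(int)
    w = X[:, 2].astype(float)
    A = sparse.coo_matrix((w, (src, dst)), shape=(n, n)).tocsr()
    if undirected:
        # Mirror every edge; entries at the same position are summed
        A = A + A.T
    return A.tocsr()

# First rows of the case20 edge list printed above
X = np.array([[0, 3, 1], [0, 168, 1], [2952, 2975, 1]], dtype=float)
A = edges_to_adjacency(X, n=3000)
print(A.shape, A.nnz)  # (3000, 3000) 6
```

If the file already lists both directions of each edge, pass `undirected=False`. Otherwise mirroring doubles those weights.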


In [None]:
print(test_case.Y)

[[ 0]
 [-1]
 [ 1]
 ...
 [ 1]
 [ 1]
 [ 1]]
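The `-1` entries in `test_case.Y` appear to mark nodes whose labels `split_sets(0.2)` has hidden. This is an inference from the output; the notebook does not state it. `Run` also returns a `W` matrix alongside the embedding `Z`. One common choice in one-hot graph encoder embedding is a one-hot label matrix whose columns are scaled by class size, so that W[i, k] = 1/n_k when node i has label k. The sketch below builds a matrix of that kind. It is not the notebook's own `W`, and `label_projection` is a hypothetical name.

```python
import numpy as np

def label_projection(Y, K=None):
    """Hypothetical W: W[i, k] = 1 / n_k if node i is labeled k, else 0.

    Nodes labeled -1 (assumed to be held out) get an all-zero row.
    """
    y = np.asarray(Y).ravel().astype(int)
    if K is None:
        K = y.max() + 1
    W = np.zeros((y.size, K))
    labeled = y >= 0
    W[np.flatnonzero(labeled), y[labeled]] = 1.0
    counts = W.sum(axis=0)
    # Scale each column by its class size; empty classes stay zero
    W = np.divide(W, counts, out=np.zeros_like(W), where=counts > 0)
    return W

Y = np.array([[0], [-1], [1], [1], [1]])
W = label_projection(Y)
print(W)
```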


In [None]:
print(len(test_case.Y))

3000


In [None]:
test_case.n

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8816666603088379
--- embed 0.1246945858001709 seconds ---
--- train 68.94957947731018 seconds ---
--- total 69.078449010849 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8983333110809326
--- embed 0.12955904006958008 seconds ---
--- train 21.617371320724487 seconds ---
--- total 21.75085997581482 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.8816666603088379
--- embed 0.12337827682495117 seconds ---
--- train 142.46699905395508 seconds ---
--- total 142.59352087974548 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.9016666412353516
--- embed 0.12606191635131836 seconds ---
--- train 9.873629808425903 seconds ---
--- total 10.004696607589722 seconds ---
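The `DiagA` and `Laplacian` flags presumably pre-process the adjacency matrix before encoding. Their definitions are not shown in this section, so the sketch below uses the usual meanings as an assumption. `DiagA` adds self-loops (A + I). `Laplacian` applies the symmetric normalization D^{-1/2} A D^{-1/2}, and isolated nodes are left at zero rather than divided by zero. `transform_adjacency` is a hypothetical helper.

```python
import numpy as np
from scipy import sparse

def transform_adjacency(A, diag_a=False, laplacian=False):
    """Hypothetical pre-processing mirroring the DiagA / Laplacian flags.

    diag_a:    A <- A + I                    (assumed meaning of DiagA)
    laplacian: A <- D^{-1/2} A D^{-1/2}      (assumed meaning of Laplacian)
    """
    A = sparse.csr_matrix(A, dtype=float)
    n = A.shape[0]
    if diag_a:
        A = A + sparse.identity(n, format="csr")
    if laplacian:
        deg = np.asarray(A.sum(axis=1)).ravel()
        inv_sqrt = np.zeros_like(deg)
        nz = deg > 0
        inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])  # zero-degree nodes stay 0
        D = sparse.diags(inv_sqrt)
        A = D @ A @ D
    return A.tocsr()

# Path graph 0 - 1 - 2 with degrees (1, 2, 1)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
L = transform_adjacency(A, laplacian=True)
print(L.toarray().round(3))
```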


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8949999809265137
--- embed 0.5198707580566406 seconds ---
--- train 143.59347558021545 seconds ---
--- total 144.11669301986694 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.6916666626930237
--- embed 0.5243124961853027 seconds ---
--- train 10.020005226135254 seconds ---
--- total 10.547384977340698 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.8816666603088379
--- embed 0.13627123832702637 seconds ---
--- train 82.49933934211731 seconds ---
--- total 82.63745927810669 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.8816666603088379
--- embed 0.11913657188415527 seconds ---
--- train 9.989803314208984 seconds ---
--- total 10.11062741279602 seconds ---
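The `Correlation` flag's implementation is also not shown here. One plausible reading, offered as an assumption, is that each node's embedding row is scaled to unit Euclidean norm. Downstream learners would then compare directions rather than magnitudes, much as a correlation or cosine similarity does. The sketch computes an encoder-style embedding Z = A W and optionally normalizes its rows. `encode` is a hypothetical helper, and the toy `W` uses class-size-scaled one-hot labels.

```python
import numpy as np
from scipy import sparse

def encode(A, W, correlation=False):
    """Hypothetical encoder: Z = A @ W, optionally with unit-norm rows."""
    Z = np.asarray(sparse.csr_matrix(A, dtype=float) @ W)
    if correlation:
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # Rows of isolated nodes (all zeros) are left as zeros
        Z = np.divide(Z, norms, out=np.zeros_like(Z), where=norms > 0)
    return Z

# Toy graph: edges 0-1, 1-2, 2-3 plus an isolated node 4
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]])
# Class-size-scaled one-hot labels; node 3 is unlabeled (all-zero row)
W = np.array([[0.5, 0.0],
              [0.5, 0.0],
              [0.0, 0.5],
              [0.0, 0.0],
              [0.0, 0.5]])
Z = encode(A, W, correlation=True)
print(Z.round(3))
```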


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.8383333086967468
--- embed 0.13864660263061523 seconds ---
--- train 23.59549832344055 seconds ---
--- total 23.737098693847656 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.8816666603088379
--- embed 0.12118363380432129 seconds ---
--- train 10.011458158493042 seconds ---
--- total 10.137312173843384 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.6733333333333333
--- embed 0.34978747367858887 seconds ---
--- train 0.008037567138671875 seconds ---
--- total 0.3704230785369873 seconds ---
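`Learner = 1` (the LDA cells) trains in milliseconds, compared with seconds or more for the GNN learner. The following is a minimal sketch of that step. It assumes the labeled rows of the embedding train scikit-learn's `LinearDiscriminantAnalysis` and the `-1` rows are then scored against their true labels. Synthetic two-class data stands in for a real embedding, and `lda_accuracy` is a hypothetical helper, not the notebook's code.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_accuracy(Z, y_masked, y_true):
    """Fit LDA on labeled rows (y_masked >= 0), score the held-out (-1) rows."""
    train = y_masked >= 0
    test = ~train
    clf = LinearDiscriminantAnalysis()
    clf.fit(Z[train], y_masked[train])
    pred = clf.predict(Z[test])
    return float(np.mean(pred == y_true[test]))

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters standing in for a node embedding
Z = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
y_true = np.repeat([0, 1], 100)
# Hide 20% of the labels (assumed to mimic split_sets(0.2))
y_masked = y_true.copy()
y_masked[rng.choice(200, size=40, replace=False)] = -1
acc = lda_accuracy(Z, y_masked, y_true)
print(acc)
```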


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.7116666666666667
--- embed 1.3298521041870117 seconds ---
--- train 0.0035698413848876953 seconds ---
--- total 1.3370723724365234 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.6766666666666666
--- embed 0.3089005947113037 seconds ---
--- train 0.0038299560546875 seconds ---
--- total 0.31716275215148926 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.71
--- embed 0.31122732162475586 seconds ---
--- train 0.010443925857543945 seconds ---
--- total 0.3301999568939209 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7033333333333334
--- embed 0.11613583564758301 seconds ---
--- train 0.0024139881134033203 seconds ---
--- total 0.1206817626953125 seconds ---


### case21

In [None]:
edg_file = "case21.edges"
node_file = "case21.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
case21 = RlDataSet.get_initial_values()
test_case = case21.split_sets(0.2)

In [None]:
print(test_case.X)

[[0.000e+00 3.000e+00 1.000e+00]
 [0.000e+00 1.680e+02 1.000e+00]
 [0.000e+00 5.510e+02 1.000e+00]
 ...
 [2.952e+03 2.993e+03 1.000e+00]
 [2.975e+03 2.980e+03 1.000e+00]
 [2.983e+03 2.987e+03 1.000e+00]]


In [None]:
print(test_case.X.shape)

(30487, 3)


In [None]:
print(test_case.Y)

[[ 3]
 [-1]
 [ 4]
 ...
 [ 5]
 [ 6]
 [ 4]]


In [None]:
print(len(test_case.Y))

3000


In [None]:
test_case.n

3000

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.7733333110809326
--- embed 0.11215424537658691 seconds ---
--- train 202.51082801818848 seconds ---
--- total 202.6280415058136 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7683333158493042
--- embed 0.12771868705749512 seconds ---
--- train 21.00049114227295 seconds ---
--- total 21.130443572998047 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.7633333206176758
--- embed 0.1423168182373047 seconds ---
--- train 382.49053978919983 seconds ---
--- total 382.6389887332916 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.7649999856948853
--- embed 0.15233659744262695 seconds ---
--- train 11.929754734039307 seconds ---
--- total 12.086615324020386 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.7316666841506958
--- embed 0.4661402702331543 seconds ---
--- train 264.93456053733826 seconds ---
--- total 265.40412735939026 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.41333332657814026
--- embed 0.47107362747192383 seconds ---
--- train 11.967174291610718 seconds ---
--- total 12.441295146942139 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.7599999904632568
--- embed 0.12299966812133789 seconds ---
--- train 382.50110054016113 seconds ---
--- total 382.62635135650635 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.7699999809265137
--- embed 0.1675739288330078 seconds ---
--- train 20.941564083099365 seconds ---
--- total 21.12047553062439 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.7766666412353516
--- embed 0.1356501579284668 seconds ---
--- train 338.26007175445557 seconds ---
--- total 338.3992009162903 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.778333306312561
--- embed 0.16256403923034668 seconds ---
--- train 20.945129871368408 seconds ---
--- total 21.112346649169922 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.755
--- embed 0.21677660942077637 seconds ---
--- train 0.007802724838256836 seconds ---
--- total 0.22705721855163574 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.7483333333333333
--- embed 0.5205044746398926 seconds ---
--- train 0.005011081695556641 seconds ---
--- total 0.5277643203735352 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.75
--- embed 0.17207908630371094 seconds ---
--- train 0.005054473876953125 seconds ---
--- total 0.1821455955505371 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.7433333333333333
--- embed 0.1678323745727539 seconds ---
--- train 0.0052165985107421875 seconds ---
--- total 0.1768050193786621 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.745
--- embed 0.28714823722839355 seconds ---
--- train 0.012469768524169922 seconds ---
--- total 0.3026864528656006 seconds ---
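To compare the case21 configurations at a glance, the accuracies logged above can be collected into one table and ranked. The values are copied from the cell outputs in this section and rounded to four decimals. The dictionary layout is only illustrative. LDA runs do not set `Batch_input`, so that field is `None` for them.

```python
# case21 accuracies copied from the outputs above (rounded to 4 decimals).
# Key: (learner, Batch_input, non-default flags)
results = {
    ("GNN", True, "none"): 0.7733,
    ("GNN", False, "none"): 0.7683,
    ("GNN", True, "DiagA"): 0.7633,
    ("GNN", False, "DiagA"): 0.7650,
    ("GNN", True, "Laplacian"): 0.7317,
    ("GNN", False, "Laplacian"): 0.4133,
    ("GNN", True, "Correlation"): 0.7600,
    ("GNN", False, "Correlation"): 0.7700,
    ("GNN", True, "DiagA+Correlation"): 0.7767,
    ("GNN", False, "DiagA+Correlation"): 0.7783,
    ("LDA", None, "none"): 0.7550,
    ("LDA", None, "Laplacian"): 0.7483,
    ("LDA", None, "DiagA"): 0.7500,
    ("LDA", None, "Correlation"): 0.7433,
    ("LDA", None, "DiagA+Correlation"): 0.7450,
}

ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for (learner, batch, flags), acc in ranked[:3]:
    print(f"{learner:4s} batch={batch!s:5s} {flags:18s} {acc:.4f}")
best = ranked[0][0]
```

Sorting in ascending order instead surfaces the outlier, which is the GNN run with `Laplacian = True` and no `Batch_input` (0.4133).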


### Citeseer

In [None]:
edg_file = "citeseer/citeseer.edges"
node_file = "citeseer/citeseer.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
citeseer = RlDataSet.get_initial_values()
test_case = citeseer.split_sets(0.2)

In [None]:
print(test_case.X)

[[0.000e+00 8.690e+02 1.000e+00]
 [1.000e+00 5.970e+02 1.000e+00]
 [1.000e+00 2.206e+03 1.000e+00]
 ...
 [3.196e+03 3.197e+03 1.000e+00]
 [3.227e+03 3.228e+03 1.000e+00]
 [3.242e+03 3.243e+03 1.000e+00]]


In [None]:
print(test_case.X.shape)

(4536, 3)


In [None]:
print(test_case.Y)

[[ 1]
 [-1]
 [ 4]
 ...
 [ 2]
 [ 3]
 [ 3]]


In [None]:
print(len(test_case.Y))

3264


In [None]:
test_case.n

3264

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.6947852969169617
--- embed 0.13004660606384277 seconds ---
--- train 383.3185830116272 seconds ---
--- total 383.46177434921265 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.6855828166007996
--- embed 0.07430148124694824 seconds ---
--- train 21.6912841796875 seconds ---
--- total 21.775913953781128 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.6855828166007996
--- embed 0.06679677963256836 seconds ---
--- train 442.5569746494293 seconds ---
--- total 442.62749457359314 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.6947852969169617
--- embed 0.03598427772521973 seconds ---
--- train 14.735898971557617 seconds ---
--- total 14.774861335754395 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.6702454090118408
--- embed 0.11111664772033691 seconds ---
--- train 397.0311920642853 seconds ---
--- total 397.14545702934265 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.3711656332015991
--- embed 0.07914352416992188 seconds ---
--- train 21.0663104057312 seconds ---
--- total 21.148303747177124 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.6978527903556824
--- embed 0.04891252517700195 seconds ---
--- train 202.53120923042297 seconds ---
--- total 202.5845911502838 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.696319043636322
--- embed 0.0296475887298584 seconds ---
--- train 14.110218048095703 seconds ---
--- total 14.14150357246399 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.6794478297233582
--- embed 0.0518341064453125 seconds ---
--- train 442.56538915634155 seconds ---
--- total 442.61939215660095 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.6779140830039978
--- embed 0.03942275047302246 seconds ---
--- train 21.05480670928955 seconds ---
--- total 21.097041845321655 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.661042944785276
--- embed 0.02965545654296875 seconds ---
--- train 0.005881786346435547 seconds ---
--- total 0.03696250915527344 seconds ---
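The LDA accuracies are exact fractions, which hints at how many nodes `split_sets(0.2)` holds out. Suppose it hides floor(0.2 n) nodes. That is an inference from these logs, not documented behaviour. Under that assumption, Citeseer's 0.661042944785276 corresponds to 431 correct out of 652 held-out nodes. The check below recovers the integer counts for a few logged LDA results. `held_out_count` is a hypothetical helper.

```python
import math

def held_out_count(n, acc, frac=0.2):
    """Return (n_test, correct), assuming floor(frac * n) held-out nodes."""
    n_test = math.floor(frac * n)
    correct = round(acc * n_test)
    return n_test, correct

# (dataset, n, logged LDA accuracy with default flags)
logged = [
    ("case20", 3000, 0.6733333333333333),
    ("Citeseer", 3264, 0.661042944785276),
    ("Cora", 2708, 0.6968576709796673),
]
for name, n, acc in logged:
    n_test, correct = held_out_count(n, acc)
    # If the assumption holds, correct / n_test reproduces the logged value
    print(name, n_test, correct, abs(correct / n_test - acc) < 1e-12)
```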


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.6779141104294478
--- embed 0.1141049861907959 seconds ---
--- train 0.0047147274017333984 seconds ---
--- total 0.12119841575622559 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.6211656441717791
--- embed 0.07757186889648438 seconds ---
--- train 0.006510019302368164 seconds ---
--- total 0.08569645881652832 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.6825153374233128
--- embed 0.03724026679992676 seconds ---
--- train 0.010840654373168945 seconds ---
--- total 0.055170297622680664 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.6656441717791411
--- embed 0.0825507640838623 seconds ---
--- train 0.0065419673919677734 seconds ---
--- total 0.09149980545043945 seconds ---


### Cora

In [None]:
edg_file = "cora/cora.edges"
node_file = "cora/cora.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
cora = RlDataSet.get_initial_values()
test_case = cora.split_sets(0.2)

In [None]:
print(test_case.X)

[[0.000e+00 8.000e+00 1.000e+00]
 [0.000e+00 4.350e+02 1.000e+00]
 [0.000e+00 5.440e+02 1.000e+00]
 ...
 [2.707e+03 7.740e+02 1.000e+00]
 [2.707e+03 1.389e+03 1.000e+00]
 [2.707e+03 2.344e+03 1.000e+00]]


In [None]:
print(test_case.X.shape)

(5429, 3)


In [None]:
print(test_case.Y)

[[2]
 [5]
 [4]
 ...
 [1]
 [0]
 [2]]


In [None]:
test_case.n

2708

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8243992328643799
--- embed 0.04742574691772461 seconds ---
--- train 382.8231840133667 seconds ---
--- total 382.87349224090576 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7227357029914856
--- embed 5.095839262008667 seconds ---
--- train 21.014777660369873 seconds ---
--- total 26.138925075531006 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.8280960917472839
--- embed 0.0472867488861084 seconds ---
--- train 353.3332750797272 seconds ---
--- total 353.3821804523468 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.6931608319282532
--- embed 5.053898572921753 seconds ---
--- train 14.894344329833984 seconds ---
--- total 19.971819639205933 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.8373382687568665
--- embed 0.09032297134399414 seconds ---
--- train 354.1913814544678 seconds ---
--- total 354.2829074859619 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.28096118569374084
--- embed 5.073204517364502 seconds ---
--- train 21.038694858551025 seconds ---
--- total 26.138411045074463 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.8262476921081543
--- embed 0.02802729606628418 seconds ---
--- train 321.1728835105896 seconds ---
--- total 321.2032253742218 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.8262476921081543
--- embed 5.07008957862854 seconds ---
--- train 21.029639959335327 seconds ---
--- total 26.1295166015625 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.7985212802886963
--- embed 0.03669428825378418 seconds ---
--- train 354.42542481422424 seconds ---
--- total 354.4661931991577 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.8262476921081543
--- embed 4.97986364364624 seconds ---
--- train 21.203248023986816 seconds ---
--- total 26.21301293373108 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.6968576709796673
--- embed 8.34671688079834 seconds ---
--- train 0.009981155395507812 seconds ---
--- total 8.38416838645935 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.8133086876155268
--- embed 9.472187519073486 seconds ---
--- train 0.005350351333618164 seconds ---
--- total 9.50841760635376 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.711645101663586
--- embed 5.006281852722168 seconds ---
--- train 0.00544285774230957 seconds ---
--- total 5.042108774185181 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.8114602587800369
--- embed 4.980281829833984 seconds ---
--- train 0.006389617919921875 seconds ---
--- total 5.016510009765625 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7929759704251387
--- embed 7.69186806678772 seconds ---
--- train 0.005058765411376953 seconds ---
--- total 7.725239038467407 seconds ---


### PubMed

In [None]:
edg_file = "PubMed/PubMed.edges"
node_file = "PubMed/PubMed.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
PbMed = RlDataSet.get_initial_values()
test_case = PbMed.split_sets(0.2)

In [None]:
print(test_case.X)

[[8964 2235    1]
 [8964 5975    1]
 [8964 1603    1]
 ...
 [8953  749    1]
 [8953 2175    1]
 [8953 5033    1]]


In [None]:
print(test_case.X.shape)

(44338, 3)


In [None]:
print(test_case.Y)

[[ 0]
 [ 0]
 [ 0]
 ...
 [ 1]
 [-1]
 [ 2]]


In [None]:
test_case.n

19717

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.7862033843994141
--- embed 0.1909477710723877 seconds ---
--- train 2046.0987310409546 seconds ---
--- total 2046.299381017685 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7316763997077942
--- embed 0.21355962753295898 seconds ---
--- train 65.60261607170105 seconds ---
--- total 65.83710551261902 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.7874714732170105
--- embed 0.24120068550109863 seconds ---
--- train 2302.5927753448486 seconds ---
--- total 2302.8419699668884 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.7841745018959045
--- embed 0.2671630382537842 seconds ---
--- train 66.28198838233948 seconds ---
--- total 66.55564451217651 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.7415673136711121
--- embed 0.7175610065460205 seconds ---
--- train 1402.647029876709 seconds ---
--- total 1403.3726933002472 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.43038296699523926
--- embed 0.7407631874084473 seconds ---
--- train 82.57444667816162 seconds ---
--- total 83.3216962814331 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.7877250909805298
--- embed 0.3924834728240967 seconds ---
--- train 442.83543515205383 seconds ---
--- total 443.2387979030609 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.7877250909805298
--- embed 0.2092146873474121 seconds ---
--- train 82.69088006019592 seconds ---
--- total 82.90701723098755 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.7877250909805298
--- embed 0.24066948890686035 seconds ---
--- train 1455.1146399974823 seconds ---
--- total 1455.3672857284546 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7869642376899719
--- embed 0.48926329612731934 seconds ---
--- train 82.9206895828247 seconds ---
--- total 83.41761589050293 seconds ---
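In these logs, `Batch_input = True` costs far more wall-clock training time than the default. The ratios below use the PubMed `--- train ... seconds ---` values logged above, rounded to two decimals. They describe this particular run only and would differ on other hardware.

```python
# PubMed GNN training times (seconds) copied from the outputs above:
# flags -> (Batch_input=True, default)
train_times = {
    "none": (2046.10, 65.60),
    "DiagA": (2302.59, 66.28),
    "Laplacian": (1402.65, 82.57),
    "Correlation": (442.84, 82.69),
    "DiagA+Correlation": (1455.11, 82.92),
}
ratios = {flags: batch / default for flags, (batch, default) in train_times.items()}
for flags, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{flags:18s} {r:5.1f}x slower with Batch_input=True")
```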


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7359878265280243
--- embed 0.7082791328430176 seconds ---
--- train 0.0830533504486084 seconds ---
--- total 0.824845552444458 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.7727618564544763
--- embed 2.7324724197387695 seconds ---
--- train 0.044559478759765625 seconds ---
--- total 2.7971391677856445 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.7362414405275172
--- embed 0.8585913181304932 seconds ---
--- train 0.019095659255981445 seconds ---
--- total 0.889228343963623 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.7823991884352016
--- embed 0.8611061573028564 seconds ---
--- train 0.05807352066040039 seconds ---
--- total 0.9404096603393555 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7674359624651281
--- embed 0.8924369812011719 seconds ---
--- train 0.051174163818359375 seconds ---
--- total 0.9651534557342529 seconds ---


### proteins-all

In [None]:
edg_file = "proteins-all.edges"
node_file = "proteins-all.node_labels"

In [None]:
RlDataSet = RealDataSet(edg_file, node_file)
proteins_all = RlDataSet.get_initial_values()
test_case = proteins_all.split_sets(0.2)

In [None]:
print(test_case.X)

[[   11     0     1]
 [   22     0     1]
 [   32     0     1]
 ...
 [43438 43470     1]
 [43468 43470     1]
 [43469 43470     1]]


In [None]:
print(test_case.X.shape)

(162088, 3)


In [None]:
print(test_case.Y)

[[0]
 [0]
 [0]
 ...
 [2]
 [2]
 [2]]


In [None]:
test_case.n

43471
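The proteins-all edge list printed above uses the s×3 layout (node_i, node_j, weight) with 0-based node ids, and `test_case.n` (43471) is one more than the largest id visible in the printout (43470). Below is a hedged sketch, separate from the notebook's `DataPreprocess` and `Edge_to_Sparse`, showing how an edge list of this shape can become a sparse CSR adjacency matrix in one vectorized call with `scipy.sparse.coo_matrix`. The helper name `edges_to_csr` is hypothetical. One design difference matters: the COO-to-CSR conversion sums duplicate (i, j) entries, whereas the dictionary-style assignment in `Edge_to_Sparse` keeps the last weight written.

```python
import numpy as np
from scipy import sparse

def edges_to_csr(edges, n=None):
    """Hypothetical helper: build an n x n CSR adjacency matrix from an
    s x 2 (node_i, node_j) or s x 3 (node_i, node_j, weight) edge list."""
    edges = np.asarray(edges)
    rows = edges[:, 0].astype(np.int64)
    cols = edges[:, 1].astype(np.int64)
    # S2 edge lists carry no weight column, so every edge gets weight 1
    if edges.shape[1] == 2:
        weights = np.ones(len(edges), dtype=np.float64)
    else:
        weights = edges[:, 2].astype(np.float64)
    # with 0-based ids, the node count defaults to the largest id + 1
    if n is None:
        n = int(max(rows.max(), cols.max())) + 1
    # duplicate (i, j) pairs are summed during the COO -> CSR conversion
    return sparse.coo_matrix((weights, (rows, cols)), shape=(n, n)).tocsr()

# tiny example in the same s x 3 format as test_case.X
toy = np.array([[1, 0, 1], [0, 1, 1], [2, 1, 1], [1, 2, 1]])
A = edges_to_csr(toy)
print(A.shape)   # (3, 3)
print(A.toarray())
```

For the real dataset, passing `n=test_case.n` explicitly keeps nodes that never appear in any edge.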

####GNN

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.7071543335914612
--- embed 0.682565450668335 seconds ---
--- train 3206.8173391819 seconds ---
--- total 3207.5229518413544 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7031285762786865
--- embed 0.6823170185089111 seconds ---
--- train 143.06379795074463 seconds ---
--- total 143.76848530769348 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False, Batch_input = True)

acc:  0.7048539519309998
--- embed 0.898064374923706 seconds ---
--- train 3249.4049878120422 seconds ---
--- total 3250.327650308609 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.704278826713562
--- embed 0.8133687973022461 seconds ---
--- train 134.41222095489502 seconds ---
--- total 135.2360656261444 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False, Batch_input = True)

acc:  0.6979526281356812
--- embed 2.513030767440796 seconds ---
--- train 3682.604898929596 seconds ---
--- total 3685.1357028484344 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)



acc:  0.6170922517776489
--- embed 2.544031858444214 seconds ---
--- train 134.22378778457642 seconds ---
--- total 136.7819242477417 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True, Batch_input = True)

acc:  0.7083045840263367
--- embed 1.8698246479034424 seconds ---
--- train 3188.9193069934845 seconds ---
--- total 3190.827320098877 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.7068092823028564
--- embed 0.6754560470581055 seconds ---
--- train 143.07830095291138 seconds ---
--- total 143.76667022705078 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True, Batch_input = True)

acc:  0.6624108552932739
--- embed 0.7641010284423828 seconds ---
--- train 5123.021935939789 seconds ---
--- total 5123.814966201782 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.6995629072189331
--- embed 0.8250021934509277 seconds ---
--- train 142.78672289848328 seconds ---
--- total 143.63335585594177 seconds ---


####LDA

#####Laplacian = False, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.7039337474120083
--- embed 0.8234868049621582 seconds ---
--- train 0.05249977111816406 seconds ---
--- total 0.9090898036956787 seconds ---


#####Laplacian = True, DiagA = False, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.6946169772256728
--- embed 2.8665685653686523 seconds ---
--- train 0.02514195442199707 seconds ---
--- total 2.9131128787994385 seconds ---


#####Laplacian = False, DiagA = True, Correlation = False

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.7041637911203129
--- embed 1.7608740329742432 seconds ---
--- train 0.047395944595336914 seconds ---
--- total 1.8384361267089844 seconds ---


#####Laplacian = False, DiagA = False, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.692661605705084
--- embed 1.6155939102172852 seconds ---
--- train 0.04204893112182617 seconds ---
--- total 1.7071013450622559 seconds ---


#####Laplacian = False, DiagA = True, Correlation = True

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.7011732229123534
--- embed 1.8151791095733643 seconds ---
--- train 0.04037809371948242 seconds ---
--- total 1.878368616104126 seconds ---


## Graph Encoder test case


In [None]:
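class Encoder_case:
  """Minimal container for a toy test case: adjacency matrix X, labels Y, and node count n."""
  def __init__(self, A, Y, n):
    # store on the instance, not the class, so different test cases do not share data
    self.X = A
    self.Y = Y
    self.n = n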

###Case 1

$$
A =
\begin{bmatrix}
0 & 1 & 1 & 1 & 0\\
1 & 0 & 1 & 1 & 1\\
1 & 1 & 0 & 1 & 1\\
1 & 1 & 1 & 0 & 1\\
0 & 1 & 1 & 1 & 0
\end{bmatrix}
$$

Labels = [0,0,0,1,1]


In [None]:
A = np.ones((5,5))
A[0,4] = 0
A[4,0] = 0
np.fill_diagonal(A, 0)

Y = np.array([[0,0,0,1,1]]).reshape((5,1))

print(A)
print(Y)

Encoder_case = Encoder_case(A,Y,5)

[[0. 1. 1. 1. 0.]
 [1. 0. 1. 1. 1.]
 [1. 1. 0. 1. 1.]
 [1. 1. 1. 0. 1.]
 [0. 1. 1. 1. 0.]]
[[0]
 [0]
 [0]
 [1]
 [1]]


#### Laplacian = False, Correlation = False, DiagA = False

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = False, DiagA = False)
print(Dataset.X)
print(Dataset.Y)
print(Dataset.n)

[array([[0., 1., 1.],
       [0., 2., 1.],
       [0., 3., 1.],
       [1., 0., 1.],
       [1., 2., 1.],
       [1., 3., 1.],
       [1., 4., 1.],
       [2., 0., 1.],
       [2., 1., 1.],
       [2., 3., 1.],
       [2., 4., 1.],
       [3., 0., 1.],
       [3., 1., 1.],
       [3., 2., 1.],
       [3., 4., 1.],
       [4., 1., 1.],
       [4., 2., 1.],
       [4., 3., 1.]])]
[[0]
 [0]
 [0]
 [1]
 [1]]
5


In [None]:
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = False)
print(Z)
print(W)

[[1.33333333 1.        ]
 [1.33333333 2.        ]
 [1.33333333 2.        ]
 [2.         1.        ]
 [1.33333333 1.        ]]
[[0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.         0.5       ]
 [0.         0.5       ]]
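To see what the encoder computes for Case 1, here is a small dense NumPy sketch of the one-hot encoder embedding Z = A W, where W[i, c] = 1/n_c if node i has label c and 0 otherwise (n_c is the size of class c). This mirrors the `Basic` method defined later in this notebook; it is plain NumPy, not the notebook's `graph_encoder_embed`. For this unnormalized case, the `graph_encoder_embed` output printed above is exactly twice A W row by row (row 0 is [4/3, 1] versus [2/3, 1/2]). The sketch does not reproduce that extra factor, whose source is not shown in this section.

```python
import numpy as np

def encoder_embed_dense(A, y, k=None):
    """Sketch of Z = A @ W with W[i, c] = 1 / n_c when y[i] == c.
    Nodes labelled -1 (unknown) get an all-zero row in W."""
    y = np.asarray(y).reshape(-1)
    if k is None:
        k = int(y.max()) + 1              # labels are 0-based
    n = len(y)
    W = np.zeros((n, k))
    for c in range(k):
        members = (y == c)
        if members.any():
            W[members, c] = 1.0 / members.sum()   # 1 / n_c
    return A @ W, W

# Case 1: complete graph on 5 nodes minus the edge (0, 4)
A = np.ones((5, 5))
A[0, 4] = A[4, 0] = 0
np.fill_diagonal(A, 0)
Z, W = encoder_embed_dense(A, [0, 0, 0, 1, 1])
print(W)   # rows 0-2: [1/3, 0]; rows 3-4: [0, 1/2]
print(Z)   # row 0 is [2/3, 1/2]
```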


#### Laplacian = False, Correlation = True, DiagA = False

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = False, DiagA = False)
print(Dataset.X)
print(Dataset.Y)
print(Dataset.n)

[array([[0., 1., 1.],
       [0., 2., 1.],
       [0., 3., 1.],
       [1., 0., 1.],
       [1., 2., 1.],
       [1., 3., 1.],
       [1., 4., 1.],
       [2., 0., 1.],
       [2., 1., 1.],
       [2., 3., 1.],
       [2., 4., 1.],
       [3., 0., 1.],
       [3., 1., 1.],
       [3., 2., 1.],
       [3., 4., 1.],
       [4., 1., 1.],
       [4., 2., 1.],
       [4., 3., 1.]])]
[[0]
 [0]
 [0]
 [1]
 [1]]
5


In [None]:
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = True)
print(Z)
print(W)

[[0.8        0.6       ]
 [0.5547002  0.83205029]
 [0.5547002  0.83205029]
 [0.89442719 0.4472136 ]
 [0.8        0.6       ]]
[[0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.         0.5       ]
 [0.         0.5       ]]
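With Correlation = True, each row of Z is scaled to unit Euclidean length. That is why row 0, [4/3, 1], becomes [0.8, 0.6] above: its norm is 5/3. Below is a minimal dense NumPy sketch of that step; the notebook's `Correlation` method does the same on sparse matrices. The sketch assumes all-zero rows should stay zero.

```python
import numpy as np

def row_normalize(Z):
    """Scale each row of Z to unit 2-norm; all-zero rows are left as zeros."""
    Z = np.asarray(Z, dtype=float)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # avoid dividing by zero: rows with norm 0 are divided by 1 instead
    safe = np.where(norms > 0, norms, 1.0)
    return Z / safe

Z = np.array([[4/3, 1], [4/3, 2], [4/3, 2], [2, 1], [4/3, 1]])
print(row_normalize(Z))
```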


#### Laplacian = True, Correlation = False, DiagA = False

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = True, DiagA = False)
print(Dataset.X)
print(Dataset.Y)
print(Dataset.n)

[array([[0.        , 1.        , 0.14433757],
       [0.        , 2.        , 0.14433757],
       [0.        , 3.        , 0.14433757],
       [1.        , 0.        , 0.14433757],
       [1.        , 2.        , 0.125     ],
       [1.        , 3.        , 0.125     ],
       [1.        , 4.        , 0.14433757],
       [2.        , 0.        , 0.14433757],
       [2.        , 1.        , 0.125     ],
       [2.        , 3.        , 0.125     ],
       [2.        , 4.        , 0.14433757],
       [3.        , 0.        , 0.14433757],
       [3.        , 1.        , 0.125     ],
       [3.        , 2.        , 0.125     ],
       [3.        , 4.        , 0.14433757],
       [4.        , 1.        , 0.14433757],
       [4.        , 2.        , 0.14433757],
       [4.        , 3.        , 0.14433757]])]
[[0]
 [0]
 [0]
 [1]
 [1]]
5


In [None]:
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = False)
print(Z)
print(W)

[[0.19245009 0.14433757]
 [0.17955838 0.26933757]
 [0.17955838 0.26933757]
 [0.26289171 0.14433757]
 [0.19245009 0.14433757]]
[[0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.         0.5       ]
 [0.         0.5       ]]


#### Laplacian = True, Correlation = True, DiagA = False

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = True, DiagA = False)
print(Dataset.X)
print(Dataset.Y)
print(Dataset.n)

[array([[0.        , 1.        , 0.14433757],
       [0.        , 2.        , 0.14433757],
       [0.        , 3.        , 0.14433757],
       [1.        , 0.        , 0.14433757],
       [1.        , 2.        , 0.125     ],
       [1.        , 3.        , 0.125     ],
       [1.        , 4.        , 0.14433757],
       [2.        , 0.        , 0.14433757],
       [2.        , 1.        , 0.125     ],
       [2.        , 3.        , 0.125     ],
       [2.        , 4.        , 0.14433757],
       [3.        , 0.        , 0.14433757],
       [3.        , 1.        , 0.125     ],
       [3.        , 2.        , 0.125     ],
       [3.        , 4.        , 0.14433757],
       [4.        , 1.        , 0.14433757],
       [4.        , 2.        , 0.14433757],
       [4.        , 3.        , 0.14433757]])]
[[0]
 [0]
 [0]
 [1]
 [1]]
5


In [None]:
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = False)
print(Z)
print(W)

[[0.19245009 0.14433757]
 [0.17955838 0.26933757]
 [0.17955838 0.26933757]
 [0.26289171 0.14433757]
 [0.19245009 0.14433757]]
[[0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.33333333 0.        ]
 [0.         0.5       ]
 [0.         0.5       ]]


### test encoder_1

In [None]:
A = np.array([
 [0, 0, 1, 0, 0, 0, 0, 0],
 [0, 0, 0, 1, 0, 0, 0, 0],
 [1, 0, 0, 1, 0, 0, 0, 0],
 [0, 1, 1, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0]])
print(A)

[[0 0 1 0 0 0 0 0]
 [0 0 0 1 0 0 0 0]
 [1 0 0 1 0 0 0 0]
 [0 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]]


In [None]:
Y = np.array([[1,1,0,1,2,1,1,1]]).reshape((8,1))
print(Y)

[[1]
 [1]
 [0]
 [1]
 [2]
 [1]
 [1]
 [1]]


In [None]:
Encoder_case = Encoder_case(A,Y,8)

#### Laplacian = False, Correlation = False, DiagA = False

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = False, DiagA = False)
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = False)
print(Z)
print(W)

[[2.         0.         0.        ]
 [0.         0.33333333 0.        ]
 [0.         0.66666667 0.        ]
 [2.         0.33333333 0.        ]
 [0.         0.         0.        ]
 [0.         0.         0.        ]
 [0.         0.         0.        ]
 [0.         0.         0.        ]]
[[0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [1.         0.         0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.         1.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]]


#### Laplacian = False, Correlation = False, DiagA = True

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = False, DiagA = True)
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = False)
print(Z)
print(W)

[[2.         0.16666667 0.        ]
 [0.         0.5        0.        ]
 [1.         0.66666667 0.        ]
 [2.         0.5        0.        ]
 [0.         0.         1.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]]
[[0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [1.         0.         0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.         1.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]]
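Nodes 4 to 7 are isolated in this graph. With DiagA = False, their rows of Z are all zero and the encoder says nothing about them. Adding the identity (the `Diagonal` method, A + I) lets each node also count its own label. That is why, above, node 4 (label 2) becomes [0, 0, 1] and nodes 5 to 7 become [0, 1/6, 0]. Below is a small dense sketch of that effect, independent of the notebook's helpers.

```python
import numpy as np

A = np.array([
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0]], dtype=float)
y = np.array([1, 1, 0, 1, 2, 1, 1, 1])
k = y.max() + 1

# one-hot encoder matrix scaled by class size: W[i, c] = 1 / n_c if y[i] == c
W = np.zeros((len(y), k))
W[np.arange(len(y)), y] = 1.0 / np.bincount(y)[y]

Z_plain = A @ W                      # isolated nodes get all-zero rows
Z_diag = (A + np.eye(len(y))) @ W    # self-loops add each node's own label
print(Z_plain[4:])
print(Z_diag[4:])
```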


#### Laplacian = False, Correlation = True, DiagA = True

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = False, DiagA = True)
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = True)
print(Z)
print(W)

[[0.99654576 0.08304548 0.        ]
 [0.         1.         0.        ]
 [0.83205029 0.5547002  0.        ]
 [0.9701425  0.24253563 0.        ]
 [0.         0.         1.        ]
 [0.         1.         0.        ]
 [0.         1.         0.        ]
 [0.         1.         0.        ]]
[[0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [1.         0.         0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.         1.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]]


#### Laplacian = True, Correlation = True, DiagA = True

In [None]:
Dataset = DataPreprocess(Encoder_case, Laplacian = True, DiagA = True)
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = True)
print(Z)
print(W)

[[0.99426272 0.10696564 0.        ]
 [0.         1.         0.        ]
 [0.79475691 0.60692789 0.        ]
 [0.95822122 0.28602815 0.        ]
 [0.         0.         1.        ]
 [0.         1.         0.        ]
 [0.         1.         0.        ]
 [0.         1.         0.        ]]
[[0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [1.         0.         0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.         1.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]
 [0.         0.16666667 0.        ]]


### check Laplacian option

In [None]:
from scipy import sparse
import pandas as pd

In [None]:
# suppress divide-by-zero warnings; invalid divisions give NaN
np.seterr(divide='ignore', invalid='ignore')

############------------graph_encoder_embed_start----------------###############
class GraphEncoderEmbed:
  def run(self, X, Y, n, **kwargs):
    defaultKwargs = {'EdgeList': False, 'DiagA': True, 'Laplacian': False, 'Correlation': True}
    kwargs = { **defaultKwargs, **kwargs}

    if kwargs['EdgeList']:
      # detect whether X is an S2 (i, j) or S3 (i, j, weight) edge list
      size_flag = self.edge_list_size(X)
      X = self.Edge_to_Sparse(X, n, size_flag)
    
    if kwargs['DiagA']:
      X = self.Diagonal(X, n)

    if kwargs['Laplacian']:
      X = self.Laplacian(X, n)
    
    Z, W = self.Basic(X, Y, n)

    if kwargs['Correlation']:
      Z = self.Correlation(Z)
    
    return Z, W

  def Basic(self, X, Y, n):
    """
      graph embedding basic function
      input X is a sparse csr adjacency matrix:
      -- if node i and node j are connected:
      ---- X(i,j) = 1 for an unweighted graph,
      ---- X(i,j) = the edge weight for a weighted graph.
      -- if node i and node j are not connected:
      ---- X(i,j) = 0;
      ---- the sparse matrix stores nothing for this entry,
      ---- since an unstored entry means 0.
      input Y is a numpy array of size (n,1):
      -- value -1 indicates no label
      -- value >= 0 indicates a real label
      input n is the number of nodes
      returns Z = X W (embedding) and W (encoder matrix)
    """
    # k is the number of classes.
    # Labels in Y start from 0, as does Python indexing, so k = max label + 1
    k = Y[:,0].max() + 1
    
    # nk: 1*k array, the number of observations in each class
    nk = np.zeros((1,k))
    for i in range(k):
      nk[0,i] = np.count_nonzero(Y[:,0]==i)
    
    #W: sparse matrix for encoder matrix. W[i,k] = {1/nk if Yi==k, otherwise 0}
    W = sparse.dok_matrix((n, k), dtype=np.float32)

    for i in range(n):
      k_i = Y[i,0]
      if k_i >=0:
        W[i,k_i] = 1/nk[0,k_i]
    
    W = sparse.csr_matrix(W)
    Z = X.dot(W)
    
    return Z, W

  def Diagonal(self, X, n):
    """
      input X is sparse csr matrix of adjacency matrix
      return a sparse csr matrix of X matrix with 1s on the diagonal
    """
    I = sparse.identity(n)
    X = X + I
    return X


  def Laplacian(self, X, n):
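    """
      input X is a sparse csr adjacency matrix
      return the sparse matrix D^-0.5 X D^-0.5, where D is the diagonal degree matrix
      (symmetric normalization of X; the commented-out lines give the normalized Laplacian I - D^-0.5 X D^-0.5)
    """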
    X_sparse = sparse.csr_matrix(X)
    # get an array of degrees
    dig = X_sparse.sum(axis=0).A1
    # diagonal sparse matrix of D
    D = sparse.diags(dig,0)
    _D = D.power(-0.5)
    # D^-0.5 x A x D^-0.5
    L = _D.dot(X_sparse.dot(_D)) 

    # _L = _D.dot(X_sparse.dot(_D))    
    # # L = I - D^-0.5 x A x D^-0.5
    # I = sparse.identity(n)
    # L = I - _L   

    return L
  
  def Correlation(self, Z):
    """
      input Z is the sparse csr embedding matrix from the Basic function
      return Z with every row scaled to unit 2-norm
      Calculation:
      compute each row's 2-norm (Euclidean length),
      e.g. row_x = [ele_i, ele_j, ele_k], norm2 = sqrt(ele_i^2 + ele_j^2 + ele_k^2),
      then divide each element by its row norm,
      e.g. [ele_i/norm2, ele_j/norm2, ele_k/norm2].
    """
    # row 2-norms
    row_norm = sparse.linalg.norm(Z, axis = 1)

    # row division to get the normalized Z
    diag = np.nan_to_num(1/row_norm)
    N = sparse.diags(diag,0)
    Z = N.dot(Z)

    return Z

  def edge_list_size(self, X):
    """
      Return "S2" if X has only 2 columns (node_i, node_j per row);
      otherwise return the default "S3" (node_i, node_j, weight per row).
    """
    if X.shape[1] == 2:
      return "S2"
    else:
      return "S3"
    
  def Edge_to_Sparse(self, X, n, size_flag):
    """
      input X is an edge list.
      For an S2 edge list (node_i, node_j per row), every edge gets weight 1;
      for an S3 edge list (node_i, node_j, weight per row), the given weight is used.
      return an n*n sparse csr adjacency matrix
    """
    #Build an empty sparse matrix. 
    X_new = sparse.dok_matrix((n, n), dtype=np.float32)

    for row in X:
      if size_flag == "S2":
        [node_i, node_j] = row
        X_new[node_i, node_j] = 1
      else:
        [node_i, node_j, weight] = row
        X_new[node_i, node_j] = weight
    
    X_new = sparse.csr_matrix(X_new)

    return X_new


############------------graph_encoder_embed_end------------------###############
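The `Laplacian` method above returns the symmetrically degree-normalized adjacency D^{-1/2} A D^{-1/2`}; the normalized Laplacian I - D^{-1/2} A D^{-1/2} is left commented out. Below is a sketch of that formula alone on the 5-node Case 1 graph. Each edge (i, j) gets weight 1/sqrt(d_i d_j): 1/sqrt(12) ≈ 0.2887 between a degree-3 and a degree-4 node, and 1/4 between two degree-4 nodes. These are exactly twice the edge weights printed earlier by `DataPreprocess(..., Laplacian = True)`, so that preprocessing path evidently applies an additional factor of 1/2 that this sketch does not include. As an assumption of the sketch, isolated nodes get a scale of 0 rather than inf.

```python
import numpy as np
from scipy import sparse

def sym_normalize(A):
    """Return D^{-1/2} A D^{-1/2} as a CSR matrix, where D = diag(column sums of A).
    Isolated nodes (degree 0) are given a scale of 0 instead of inf."""
    A = sparse.csr_matrix(A, dtype=float)
    deg = np.asarray(A.sum(axis=0)).ravel()
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    D_inv_sqrt = sparse.diags(inv_sqrt)
    return (D_inv_sqrt @ A @ D_inv_sqrt).tocsr()

A = np.ones((5, 5))
A[0, 4] = A[4, 0] = 0
np.fill_diagonal(A, 0)
L = sym_normalize(A)
print(L[0, 1], L[1, 2])   # 1/sqrt(12) and 1/4
```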

In [None]:
n = 3000
case = Case(n)
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[1]
 [0]
 [2]
 ...
 [2]
 [2]
 [2]]


In [None]:
X_sparse = sparse.csr_matrix(case_10.X)
GEE = GraphEncoderEmbed()
Z_sparse, W = GEE.run(X_sparse, case_10.Y, case_10.n, Laplacian = True, DiagA = False, Correlation = False)

In [None]:
Dataset = DataPreprocess(case_10, Laplacian = True, DiagA = False, emb_opt = "AEE")
Z, W = graph_encoder_embed(Dataset.X[0], Dataset.Y, Dataset.n, Correlation = False)

In [None]:
print(Dataset.X[0])
print(Dataset.Y)

[[1.         2.         0.28867513]
 [1.         4.         0.28867513]
 [1.         6.         0.20412415]
 [2.         1.         0.28867513]
 [4.         1.         0.28867513]
 [6.         1.         0.20412415]
 [6.         9.         0.35355338]
 [9.         6.         0.35355338]]
[[ 1]
 [ 0]
 [-1]
 [ 1]
 [ 1]
 [ 1]
 [ 1]
 [ 2]
 [-1]
 [ 1]]


In [None]:
print(Z_sparse)

  (1, 1)	0.16426643150443582
  (2, 0)	0.5773502691896257
  (4, 0)	0.5773502691896257
  (6, 1)	0.11785113370999531
  (6, 0)	0.408248290463863
  (9, 1)	0.11785113370999531


In [None]:
print(sparse.csr_matrix(Z))

  (1, 1)	0.16426642735799152
  (2, 0)	0.5773502588272095
  (4, 0)	0.5773502588272095
  (6, 0)	0.40824830532073975
  (6, 1)	0.11785112818082173
  (9, 1)	0.11785112818082173


In [None]:
for i in range(10):
  for j in range(3):
    if Z_sparse[i,j] != Z[i,j]:
      print(i, j, Z[i,j], Z_sparse[i,j])

1 1 0.16426642735799152 0.16426643150443582
2 0 0.5773502588272095 0.5773502691896257
4 0 0.5773502588272095 0.5773502691896257
6 0 0.40824830532073975 0.408248290463863
6 1 0.11785112818082173 0.11785113370999531
9 1 0.11785112818082173 0.11785113370999531
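The two embeddings above differ only around the eighth significant digit, which is consistent with single-precision rounding: 0.5773502588272095 matches 1/sqrt(3) rounded to float32, while `GraphEncoderEmbed` gives 0.5773502691896257. The encoder matrix `W` in `Basic` is float32, so where precision is lost depends on how each path mixes dtypes. A tolerance-based check is a more robust comparison than `!=`. The sketch below uses `np.allclose`; the helper name `embeddings_match` and its tolerances are assumptions.

```python
import numpy as np
from scipy import sparse

def embeddings_match(a, b, rtol=1e-6, atol=1e-9):
    """Compare two embeddings (dense arrays or SciPy sparse matrices) up to a tolerance."""
    a = a.toarray() if sparse.issparse(a) else np.asarray(a)
    b = b.toarray() if sparse.issparse(b) else np.asarray(b)
    return a.shape == b.shape and np.allclose(a, b, rtol=rtol, atol=atol)

exact = np.array([[1 / np.sqrt(3), 0.16426643150443582]])
single = exact.astype(np.float32).astype(np.float64)   # round-trip through float32

print(np.array_equal(exact, single))                      # False
print(embeddings_match(exact, single))                    # True
print(embeddings_match(sparse.csr_matrix(exact), single)) # True
```

In the notebook, `embeddings_match(Z_sparse, Z)` would replace the element-by-element loop, assuming both have shape (n, k).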


In [None]:
print(Z)

[[0.         0.         0.        ]
 [0.         0.16426643 0.        ]
 [0.57735026 0.         0.        ]
 [0.         0.         0.        ]
 [0.57735026 0.         0.        ]
 [0.         0.         0.        ]
 [0.40824831 0.11785113 0.        ]
 [0.         0.         0.        ]
 [0.         0.         0.        ]
 [0.         0.11785113 0.        ]]


In [None]:
Dataset = DataPreprocess(case_10, Laplacian = True, DiagA = False, Correlation = False, emb_opt = "AEE")

In [None]:
Dataset = Dataset.supervise_preprocess()

In [None]:
print(Dataset.Z)

[[0.00030292 0.00040615 0.00028555]
 [0.0003926  0.00035897 0.00031261]
 [0.00030511 0.00030078 0.00037801]
 ...
 [0.00026627 0.00030059 0.00041161]
 [0.00033237 0.00027925 0.00032617]
 [0.00032234 0.00036537 0.00036498]]


In [None]:
Z_multi, W = multi_graph_encoder_embed(Dataset,case_10.Y,Correlation = False)

In [None]:
Z_multi

array([[0.        , 0.        , 0.        ],
       [0.        , 0.16426643, 0.        ],
       [0.57735026, 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.57735026, 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.40824831, 0.11785113, 0.        ],
       [0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.        , 0.11785113, 0.        ]])

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.47999998927116394
--- embed 3.782085657119751 seconds ---
--- train 21.060994625091553 seconds ---
--- total 41.50991177558899 seconds ---


In [None]:
z_test = Z[case_10.test_idx]
z_train = Z[case_10.train_idx]
y_train = case_10.Y[case_10.train_idx]
y_test = case_10.Y_test

In [None]:
gnn = GNN(Dataset, Learner = 0, LearnerIter = 0)
k = Dataset.k 
z_unlabel = Dataset.z_unlabel
y_train_one_hot = to_categorical(y_train)
gnn_res = gnn.GNN_run(k, z_train, y_train_one_hot, z_unlabel)
eval = Evaluation()
acc = eval.GNN_supervise_test(gnn_res, z_test, y_test)

In [None]:
print(acc)

0.47999998927116394


In [None]:
print(acc)

0.9483333230018616


In [None]:
print(Z)

[[0.         0.         0.        ]
 [0.         0.16426643 0.        ]
 [0.57735026 0.         0.        ]
 [0.         0.         0.        ]
 [0.57735026 0.         0.        ]
 [0.         0.         0.        ]
 [0.40824831 0.11785113 0.        ]
 [0.         0.         0.        ]
 [0.         0.         0.        ]
 [0.         0.11785113 0.        ]]


In [None]:
def batch_generator(X, y, k, batch_size, shuffle):
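    # integer number of batches per pass (the last batch may be smaller);
    # a float here would never equal counter, so the generator would run past the data
    number_of_batches = int(np.ceil(X.shape[0] / batch_size))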
    counter = 0
    sample_index = np.arange(X.shape[0])
    if shuffle:
        np.random.shuffle(sample_index)
    while True:
        batch_index = sample_index[batch_size*counter:batch_size*(counter+1)]
        X_batch = X[batch_index,:]
        y_batch = to_categorical(y[batch_index], num_classes=k)
        counter += 1
        yield X_batch, y_batch
        if (counter == number_of_batches):
            if shuffle:
                np.random.shuffle(sample_index)
            counter = 0
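A generator like `batch_generator` should yield ceil(N / batch_size) batches per pass, with the last one possibly smaller, and then start over. That count is also the natural `steps_per_epoch` value for Keras. The sketch below mirrors that logic without Keras so the batch sizes can be checked directly. For self-containment, a NumPy identity-matrix lookup stands in for `to_categorical`; `batches_per_pass` and `simple_batch_generator` are hypothetical names.

```python
import math
import numpy as np

def batches_per_pass(n_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)

def simple_batch_generator(X, y, k, batch_size, shuffle=True, seed=0):
    """Endlessly yield (X_batch, one_hot_batch) pairs, reshuffling before each full pass."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).reshape(-1).astype(int)
    n_batches = batches_per_pass(X.shape[0], batch_size)
    order = np.arange(X.shape[0])
    while True:
        if shuffle:
            rng.shuffle(order)
        for b in range(n_batches):
            idx = order[b * batch_size:(b + 1) * batch_size]
            # np.eye(k)[labels] builds one-hot rows, like to_categorical(labels, k)
            yield X[idx], np.eye(k)[y[idx]]

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
gen = simple_batch_generator(X, y, k=3, batch_size=4)
sizes = [next(gen)[0].shape[0] for _ in range(4)]
print(batches_per_pass(10, 4), sizes)   # 3 [4, 4, 2, 4]
```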

In [None]:
############-----------------GNN_start---------------------------###############
class Hyperperameters:
  """
    Define hyperparameters for the GNN.
    The default values are for GNN learning ("Learner" == 2):
      embed via the partial labels, then learn the unknown labels via a two-layer NN.
  """
  def __init__(self):
    # Keras has no scaled conjugate gradient optimizer, so a standard Keras optimizer (Adam) is used instead
    self.learning_rate = 0.01  # Initial learning rate.
    self.epochs = 100  # Number of epochs to train.
    self.hidden = 20  # Number of units in the hidden layer.
    self.val_split = 0.1  # Split 10% of training data for validation.
    self.loss = 'categorical_crossentropy'  # Loss function.


class GNN:
  def __init__(self, DataSets, **kwargs):
    GNN.kwargs = self.kwargs_construct(**kwargs)
    GNN.DataSets = DataSets
    GNN.hyperM = Hyperperameters()
    GNN.model = self.GNN_model()  #model summary: GNN.model.summary()
    GNN.meanSS = 0  # initialize the self-defined criterion meanSS
    
  def kwargs_construct(self, **kwargs):
    defaultKwargs = {'Learner': 1,                    # GNN_Leaner
                     'LearnerIter': 0,                # GNN_complete, GNN_Iter
                     "Replicates": 3                  # GNN_Iter                           
                     }
    kwargs = { **defaultKwargs, **kwargs}  # update the args using input_args
    return kwargs      
 
  
  def GNN_model(self):
    """
      build GNN model
    """
    hyperM = self.hyperM
    DataSets = self.DataSets

    z_train = DataSets.z_train
    k = DataSets.k

    feature_num = z_train.shape[1]
    
    model = keras.Sequential([
    keras.layers.Flatten(input_shape = (feature_num,)),  # input layer
    keras.layers.Dense(hyperM.hidden, activation='relu'),  # hidden layer; MATLAB's tansig is equivalent to tanh, but relu is used here
    keras.layers.Dense(k, activation='softmax')  # output layer: softmax over the k classes, as in MATLAB patternnet
    ])

    optimizer = keras.optimizers.Adam(learning_rate = hyperM.learning_rate)

    # pass the configured optimizer so that hyperM.learning_rate takes effect
    model.compile(optimizer=optimizer,
                  loss=hyperM.loss,
                  metrics=['accuracy'])

    return model
  

  def GNN_run(self, k, z_train, y_train_one_hot, z_unlabel):
    """
      Train and test directly.
      Do not learn from the unknown labels.
    """
    gnn = copy.deepcopy(self)
    hyperM = gnn.hyperM
    model = gnn.model
    batch_size = 32

    # batch_generator takes integer labels, so recover them from the one-hot targets
    y_train_labels = np.argmax(y_train_one_hot, axis=1)
    # one epoch = one pass over the training set
    steps = int(np.ceil(z_train.shape[0] / batch_size))

    # validation_split=hyperM.val_split
    history = model.fit(batch_generator(z_train, y_train_labels, k, batch_size, True),
          epochs=hyperM.epochs,
          steps_per_epoch=steps,
          verbose=0)

    train_acc = history.history['accuracy'][-1]

    predict_probs = None
    pred_class = None
    pred_class_prob = None
    if type(z_unlabel) == np.ndarray:
      # predict_probs holds the probabilities of all classes for each node
      predict_probs = model.predict(z_unlabel)
      # assign the class with the highest probability
      pred_class = np.argmax(predict_probs, axis=1)
      # the corresponding probabilities of the predicted classes
      pred_class_prob = predict_probs[range(len(predict_probs)),pred_class]

    gnn.model = model
    gnn.train_acc = train_acc
    gnn.pred_probs = predict_probs  
    gnn.pred_class = pred_class
    gnn.pred_class_prob = pred_class_prob


    return gnn

  def GNN_Direct(self, DataSets, y_train_one_hot):
    """
      This function can run:
      1. by itself, when iteration is off (LearnerIter < 1)
      2. inside GNN_Iter, when iteration is on (LearnerIter >= 1)

      Learner == 0: GNN that does not learn the unknown labels
      Learner == 2: GNN that also learns the unknown labels
    """
    Learner = self.kwargs["Learner"]  

    k = DataSets.k 
    z_train = DataSets.z_train 
    Y = DataSets.Y
    z_unlabel = DataSets.z_unlabel
    ind_unlabel = DataSets.ind_unlabel

    gnn = self.GNN_run(k, z_train, y_train_one_hot, z_unlabel)

    if Learner == 0:
      # do not learn unknown label.
      pass
    
    if Learner == 2:
      # learn unknown label based on the known label
      # replace the unknown label in Y with predicted labels
      pred_class = gnn.pred_class
      Y[ind_unlabel, 0] = pred_class

    gnn.Y = Y

    return gnn


  def GNN_Iter(self, DataSets):
    """
      Run this function when iteration is on (LearnerIter >= 1).

      1. randomly assign labels to the unknown labels to get y_temp
      2. get the one-hot encoding of y_temp
      3. get Z from the graph encoder embedding with X and y_temp
      within each loop:
        use meanSS as the criterion to decide whether an update is needed
        update the one-hot rows of the unknown labels with the predicted class probabilities
        update y_temp with the most probable predicted labels
        update z_train and z_unlabel from the graph encoder embedding using the updated labels
        train the model with the updated z_train and one-hot labels
    """

    kwargs = self.kwargs
    meanSS = self.meanSS

    k = DataSets.k 
    Y = DataSets.Y
    ind_unlabel = DataSets.ind_unlabel


    y_temp = np.copy(Y)
    DataSets.y_temp = y_temp


    for i in range(kwargs["Replicates"]):
      # assign random integers in [0, k-1] to the unlabelled nodes
      r = list(range(k))
      
      ran_int = np.random.choice(r, size=(len(ind_unlabel),1))

      y_temp[ind_unlabel] = ran_int

      for j in range(kwargs["LearnerIter"]):
        if j ==0:
          # first iteration need to split the y_temp for training etc.
          # use reset to add z_train, z_unlabel, y_temp_one_hot, to the dataset
          DataSets = DataSets.DataSets_reset("y_temp")  
          # Convert targets into one-hot encoded format      
          y_temp_one_hot = to_categorical(y_temp) 
          # initialize y_temp_one_hot in the first loop
          DataSets.y_temp_one_hot = y_temp_one_hot     
        if j > 0:
          # update z_train, z_unlabel, and y_temp_train_one_hot to the dataset
          DataSets = DataSets.DataSets_reset("y_temp")
        # all the gnn train on y_train_one_hot
        gnn = self.GNN_Direct(DataSets, DataSets.Y_train_one_hot)
        predict_probs = gnn.pred_probs
        pred_class = gnn.pred_class
        pred_class_prob = gnn.pred_class_prob

        # z_unlabel is initialized to None, so pred_class may be None.
        # This does not happen in the semi-supervised version,
        # since the unlabelled set is never empty there.
        if type(pred_class) == np.ndarray:
          # if there are unknown labels and predicted labels are available,
          # check whether the predicted classes match the random integers;
          # if so, stop the "LearnerIter" loop
          # adjusted_rand_score() requires shape (n,)
          if adjusted_rand_score(ran_int.reshape((-1,)), pred_class) == 1:
            break
          # assign the probabilities of each class to the temporary one-hot labels
          DataSets.y_temp_one_hot[ind_unlabel] = predict_probs
          # assign the predicted classes to the unknown labels in y_temp
          DataSets.y_temp[ind_unlabel, 0] = pred_class
          # # assign the highest possibility of the class to Y_temp
          # Y_temp[ind_unlabel, 0] = pred_class_prob
      minP = np.mean(pred_class_prob) - 3*np.std(pred_class_prob)
      if minP > meanSS:
        meanSS = minP
        Y = DataSets.y_temp   

      gnn.Y = Y
      gnn.meanSS = meanSS
      return gnn  
  
        
  def GNN_complete(self):
    """
      If LearnerIter < 1:
        run GNN_Direct() once, without iteration.
      If LearnerIter >= 1:
        run GNN_Iter(), which starts by assigning random labels in [0, k-1]
        to the unlabeled nodes.
    """
    kwargs = self.kwargs

    DataSets = self.DataSets
    y_train = DataSets.Y_train


    if kwargs["LearnerIter"] < 1:
      # Convert targets into one-hot encoded format
      y_train_one_hot = to_categorical(y_train)
      gnn = self.GNN_Direct(DataSets, y_train_one_hot)
    else:
      gnn = self.GNN_Iter(DataSets)
    
    return gnn
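`GNN_Iter` stops its inner loop when `adjusted_rand_score` equals 1 and keeps the replicate with the largest `minP = mean - 3*std` of the predicted-class probabilities. The standalone sketch below (not part of the notebook's classes; the probability arrays are made up for illustration) shows two properties worth keeping in mind: ARI is 1.0 for any relabeling of the same partition, so the stop test also fires when the predictions are a permutation of the random labels, and `minP` favors replicates whose confidences are both high and consistent.

```python
import numpy as np
from sklearn.metrics.cluster import adjusted_rand_score

# ARI ignores label names: a pure relabeling of the same partition scores 1.0
random_labels = np.array([0, 0, 1, 1, 2, 2])
permuted_pred = np.array([2, 2, 0, 0, 1, 1])
ari = adjusted_rand_score(random_labels, permuted_pred)
print(ari)

def min_confidence(pred_class_prob):
  """Lower bound used to rank replicates: mean minus three standard deviations."""
  return np.mean(pred_class_prob) - 3 * np.std(pred_class_prob)

# hypothetical per-node max-class probabilities from two replicates
replicate_probs = [
  np.array([0.90, 0.95, 0.40, 0.92]),  # high mean, one unconfident node
  np.array([0.80, 0.82, 0.79, 0.81]),  # lower mean, very consistent
]

best_score, best_idx = -np.inf, None
for idx, probs in enumerate(replicate_probs):
  score = min_confidence(probs)
  if score > best_score:
    best_score, best_idx = score, idx
print(best_idx)
```

Here the second replicate wins (minP of about 0.77 vs 0.11) even though the first has some more confident nodes, because a single low-confidence node inflates the standard deviation.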

In [None]:
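gnn = GNN(Dataset, Learner = 0, LearnerIter = 0)
k = Dataset.k 
z_unlabel = Dataset.z_unlabel
# z_train, y_train, z_test and y_test are not set in this cell and must already exist
y_train_one_hot = to_categorical(y_train)
gnn_res = gnn.GNN_run(k, z_train, y_train_one_hot, z_unlabel)
# named evaluator to avoid shadowing the built-in eval()
evaluator = Evaluation()
acc = evaluator.GNN_supervise_test(gnn_res, z_test, y_test)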

## Supervised Learning, Clustering, Semi-supervised learning 

In [None]:
n = 3000
case = Case(n)

In [None]:
# get all combinations of different emb settings 

sets_no = 8
L_set = [True, False]
Diag_set = [True, False]
Corre_set = [True, False]
comb = [L_set, Diag_set, Corre_set]
comb_set = []

ele_list = [None, None, None]
for ele1 in comb[0]:
  ele_list[0] = ele1
  for ele2 in comb[1]:
    ele_list[1] = ele2
    for ele3 in comb[2]:
      ele_list[2] = ele3
      comb_set.append(ele_list.copy())

print(comb_set)
print(len(comb_set))

[[True, True, True], [True, True, False], [True, False, True], [True, False, False], [False, True, True], [False, True, False], [False, False, True], [False, False, False]]
8
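The nested loops above enumerate the Cartesian product of the three boolean options. A shorter equivalent with the standard library's `itertools.product`, which yields the same eight settings in the same order:

```python
from itertools import product

# the same 8 (Laplacian, DiagA, Correlation) settings, in the same order
comb_set_alt = [list(c) for c in product([True, False], repeat=3)]
print(comb_set_alt)
print(len(comb_set_alt))
```

`product` varies the last position fastest, which is why the order matches the nested `for` loops.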


In [None]:
import pandas as pd
from IPython.display import display

def average_restuls(case_num, comb_set, learner_no, n_runs=10):
  """Average accuracy and timings over n_runs runs for each setting in comb_set."""
  results = []
  for comb in comb_set:
    acc_final, train_time_final, emb_time_final, total_time_final = 0, 0, 0, 0
    for i in range(n_runs):
      # run on a fresh copy so every run starts from the same case
      test_case = copy.deepcopy(case_num)
      acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(test_case, "su", Learner = learner_no, Laplacian = comb[0], DiagA = comb[1], Correlation = comb[2])
      acc_final += acc
      train_time_final += train_time
      emb_time_final += emb_time
      total_time_final += total_time

    acc_final /= n_runs
    train_time_final /= n_runs
    emb_time_final /= n_runs
    total_time_final /= n_runs

    # one row: [Laplacian, DiagA, Correlation, mean accuracy, mean times]
    result = comb + [acc_final, train_time_final, emb_time_final, total_time_final]
    results.append(result)

  return results

def plot(results):
  # one row per setting, labelled set_01, set_02, ...
  df = pd.DataFrame(results,
    index=[f'set_{i:02d}' for i in range(1, len(results) + 1)],
    columns=['Laplacian', 'DiagA', 'Correlation', 'Accuracy', 'Train_Time(s)', 'Emb_Time(s)', 'Total_Time(s)'])

  df = df.style.format({
    'Emb_Time(s)': '{:0.2f}',
    'Train_Time(s)': '{:0.5f}',
    'Total_Time(s)': '{:0.2f}'
  })

  display(df)
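`average_restuls` keeps four running sums and divides each by the number of runs. The same per-setting means can be computed with NumPy by stacking the per-run metrics. In this sketch `run_once` is a hypothetical stand-in for a call to `Run(...)` that returns `(acc, train_time, emb_time, total_time)` first:

```python
import numpy as np

def average_metrics(run_once, n_runs=10):
  """Mean of (acc, train_time, emb_time, total_time) over n_runs calls.

  run_once is a hypothetical zero-argument callable whose first four
  return values are those metrics (extra values such as Z, W are ignored).
  """
  runs = np.array([run_once()[:4] for _ in range(n_runs)], dtype=float)
  return runs.mean(axis=0).tolist()

# toy stand-in with deterministic values instead of real Run() calls
values = iter([(0.5, 1.0, 2.0, 3.0), (1.0, 3.0, 4.0, 7.0)])
avg = average_metrics(lambda: next(values), n_runs=2)
print(avg)
```

Stacking the runs also makes it easy to report a standard deviation (`runs.std(axis=0)`) next to each mean.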

### Supervised

#### GNN

##### case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

Info:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[1]
 [0]
 [2]
 ...
 [2]
 [2]
 [2]]


In [None]:
case_10 = case_10.to_edge_list()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(500249, 3)
[[   0    4    1]
 [   0   25    1]
 [   0   40    1]
 ...
 [2992 2994    1]
 [2993 2998    1]
 [2998 2999    1]]
Y:
(3000, 1)
[[1]
 [0]
 [2]
 ...
 [2]
 [2]
 [2]]
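The edge-list output above has one row per undirected edge, with the smaller node index first and a weight of 1 in the third column. A minimal NumPy sketch of that conversion, assuming a symmetric 0/1 adjacency matrix without self-loops; `adjacency_to_edge_list` is a hypothetical helper, not the notebook's `to_edge_list()`:

```python
import numpy as np

def adjacency_to_edge_list(A):
  """Return an (s, 3) array of [i, j, weight] rows with i < j."""
  A = np.asarray(A)
  # the strict upper triangle holds each undirected edge exactly once
  rows, cols = np.nonzero(np.triu(A, k=1))
  weights = A[rows, cols]
  return np.column_stack([rows, cols, weights])

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]])
E = adjacency_to_edge_list(A)
print(E)
```

`np.nonzero` returns indices in row-major order, so the rows come out sorted by the first node, as in the printed `X` above.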


In [None]:
case_10.X.shape

(500249, 3)

In [None]:
print(case_10.bd)

0.13


######collect

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.9549999833106995
--- embed 2.020695924758911 seconds ---
--- train 21.399245023727417 seconds ---
--- total 23.436811208724976 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.9549999833106995
--- embed 1.8560810089111328 seconds ---
--- train 15.273542165756226 seconds ---
--- total 20.354602098464966 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.47999998927116394
--- embed 7.200726747512817 seconds ---
--- train 11.547739505767822 seconds ---
--- total 21.39046335220337 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.95333331823349
--- embed 2.035653829574585 seconds ---
--- train 14.474254608154297 seconds ---
--- total 19.48075819015503 seconds ---


######others

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = True, DiagA = True, Correlation = False)

acc:  0.47999998927116394
--- embed 6.551916599273682 seconds ---
--- train 12.9748694896698 seconds ---
--- total 21.97466778755188 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 1, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.9533333333333334
--- embed 7.2010579109191895 seconds ---
--- train 0.02668452262878418 seconds ---
--- total 9.962881326675415 seconds ---


In [None]:
results = average_restuls(case_10, comb_set, 0)

acc:  0.9516666531562805
--- embed 3.6638195514678955 seconds ---
--- train 20.943858861923218 seconds ---
--- total 39.67546010017395 seconds ---
acc:  0.9566666483879089
--- embed 5.7643585205078125 seconds ---
--- train 21.02521848678589 seconds ---
--- total 41.58613395690918 seconds ---
acc:  0.95333331823349
--- embed 3.614915370941162 seconds ---
--- train 20.95549726486206 seconds ---
--- total 39.63433289527893 seconds ---
acc:  0.9516666531562805
--- embed 3.704399824142456 seconds ---
--- train 11.752483129501343 seconds ---
--- total 30.275118350982666 seconds ---
acc:  0.95333331823349
--- embed 3.7135863304138184 seconds ---
--- train 11.800329446792603 seconds ---
--- total 30.627530336380005 seconds ---
acc:  0.9516666531562805
--- embed 3.642408609390259 seconds ---
--- train 11.687976598739624 seconds ---
--- total 30.079220056533813 seconds ---
acc:  0.9516666531562805
--- embed 3.7080516815185547 seconds ---
--- train 20.964645624160767 seconds ---
--- total 39.8272

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = True, DiagA = True, Correlation = False)

acc:  0.47999998927116394
--- embed 3.3850369453430176 seconds ---
--- train 19.963889122009277 seconds ---
--- total 37.73133563995361 seconds ---
[[0.00030246 0.00040755 0.00028513]
 [0.00039492 0.00035844 0.00031217]
 [0.00030466 0.00030033 0.00037867]
 ...
 [0.00026587 0.00030015 0.00041224]
 [0.00033185 0.00027881 0.00032703]
 [0.00032185 0.00036483 0.00036568]]
[[0.00033925 0.00037717 0.00030801]
 [0.00034452 0.00039691 0.00030621]
 [0.00028007 0.00031384 0.00035417]
 ...
 [0.00035683 0.00024665 0.00028031]
 [0.00031754 0.00026905 0.00038784]
 [0.00034029 0.0003661  0.00028912]]
[1 1 2 1 1 0 2 0 2 2 2 1 0 2 2 1 2 2 2 2 1 1 2 2 1 2 1 1 0 1 1 2 1 1 2 0 1
 2 0 2 1 0 2 2 2 0 1 1 1 0 0 2 1 0 0 2 2 1 2 2 2 0 2 2 2 2 2 2 0 1 1 1 0 0
 2 0 0 1 2 1 2 2 2 0 2 2 0 1 1 1 2 0 2 2 1 1 2 2 2 2 1 2 0 2 0 1 2 2 2 0 2
 2 2 1 0 0 1 1 1 2 1 0 0 2 1 2 0 2 0 0 2 2 0 0 0 0 0 1 2 1 2 2 2 1 2 0 0 0
 2 2 2 1 2 1 1 1 2 0 2 2 0 2 1 2 1 2 2 1 2 2 2 0 0 2 2 0 2 2 2 2 2 0 2 1 2
 0 0 0 2 1 2 1 0 1 2 0 2 1 1 2 1 

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = False)

acc:  0.9533333333333334
--- embed 3.450080394744873 seconds ---
--- train 0.035543203353881836 seconds ---
--- total 18.690415382385254 seconds ---
[[0.00030246 0.00040755 0.00028513]
 [0.00039492 0.00035844 0.00031217]
 [0.00030466 0.00030033 0.00037867]
 ...
 [0.00026587 0.00030015 0.00041224]
 [0.00033185 0.00027881 0.00032703]
 [0.00032185 0.00036483 0.00036568]]
[[0.00033925 0.00037717 0.00030801]
 [0.00034452 0.00039691 0.00030621]
 [0.00028007 0.00031384 0.00035417]
 ...
 [0.00035683 0.00024665 0.00028031]
 [0.00031754 0.00026905 0.00038784]
 [0.00034029 0.0003661  0.00028912]]
[1 1 2 1 1 0 2 0 2 2 2 1 0 2 2 1 2 2 2 2 1 1 2 2 1 2 1 1 0 1 1 2 1 1 2 0 1
 2 0 2 1 0 2 2 2 0 1 1 1 0 0 2 1 0 0 2 2 1 2 2 2 0 2 2 2 2 2 2 0 1 1 1 0 0
 2 0 0 1 2 1 2 2 2 0 2 2 0 1 1 1 2 0 2 2 1 1 2 2 2 2 1 2 0 2 0 1 2 2 2 0 2
 2 2 1 0 0 1 1 1 2 1 0 0 2 1 2 0 2 0 0 2 2 0 0 0 0 0 1 2 1 2 2 2 1 2 0 0 0
 2 2 2 1 2 1 1 1 2 0 2 2 0 2 1 2 1 2 2 1 2 2 2 0 0 2 2 0 2 2 2 2 2 0 2 1 2
 0 0 0 2 1 2 1 0 1 2 0 2 1 1 2 1

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = True, DiagA = True, Correlation = True)

acc:  0.9516666531562805
--- embed 3.454232692718506 seconds ---
--- train 12.05919098854065 seconds ---
--- total 30.275911569595337 seconds ---
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
[[0.57163251 0.63552442 0.51898457]
 [0.56639504 0.65251114 0.50341422]
 [0.50932978 0.57073679 0.64408283]
 ...
 [0.69090353 0.47757942 0.54274323]
 [0.55816818 0.47293868 0.68174577]
 [0.58932617 0.63402448 0.50070713]]
[1 1 2 1 1 0 2 0 2 2 2 1 0 2 2 1 2 2 2 2 1 1 2 2 1 2 1 1 0 1 1 2 1 1 2 0 1
 2 0 2 1 0 2 2 2 0 1 1 1 0 0 2 1 0 0 2 2 1 2 2 2 0 2 2 2 2 2 2 0 1 1 1 0 0
 2 0 0 1 2 1 2 2 2 0 2 2 0 1 1 1 2 0 2 2 1 1 2 2 2 2 1 2 0 2 0 1 2 2 2 0 2
 2 2 1 0 0 1 1 1 2 1 0 0 2 1 2 0 2 0 0 2 2 0 0 0 0 0 1 2 1 2 2 2 1 2 0 0 0
 2 2 2 1 2 1 1 1 2 0 2 2 0 2 1 2 1 2 2 1 2 2 2 0 0 2 2 0 2 2 2 2 2 0 2 1 2
 0 0 0 2 1 2 1 0 1 2 0 2 1 1 2 1 2 

##### case 11

In [None]:
case_11 = case.case_11_fully_known()
case_11.summary()

Info:

    SBM with 5 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[2]
 [0]
 [2]
 ...
 [3]
 [3]
 [2]]


In [None]:
print(case_11.bd)

0.2


######collect

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_11, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  1.0
--- embed 2.032904624938965 seconds ---
--- train 12.993172645568848 seconds ---
--- total 17.815649032592773 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori= Run(case_11, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  1.0
--- embed 1.9146091938018799 seconds ---
--- train 20.919159650802612 seconds ---
--- total 29.384531021118164 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_11, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.19499999284744263
--- embed 7.936537027359009 seconds ---
--- train 12.703314065933228 seconds ---
--- total 23.251346349716187 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_11, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  1.0
--- embed 2.014296531677246 seconds ---
--- train 12.738264560699463 seconds ---
--- total 17.434539794921875 seconds ---


######others

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori= Run(case_11, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  1.0
--- embed 2.028475284576416 seconds ---
--- train 14.862491607666016 seconds ---
--- total 19.912732124328613 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_11, "su", Learner = 0, Laplacian = True, DiagA = True, Correlation = True)

acc:  1.0
--- embed 11.287137508392334 seconds ---
--- train 20.30963706970215 seconds ---
--- total 34.50051546096802 seconds ---


In [None]:
results = average_restuls(case_11, comb_set, 0)

acc:  1.0
--- embed 3.597121477127075 seconds ---
--- train 21.016427040100098 seconds ---
--- total 39.95435047149658 seconds ---
acc:  1.0
--- embed 3.617098331451416 seconds ---
--- train 20.986443519592285 seconds ---
--- total 39.91535687446594 seconds ---
acc:  1.0
--- embed 3.751112699508667 seconds ---
--- train 21.053439140319824 seconds ---
--- total 40.036688804626465 seconds ---
acc:  1.0
--- embed 3.536910057067871 seconds ---
--- train 21.01297926902771 seconds ---
--- total 40.42873191833496 seconds ---
acc:  1.0
--- embed 3.733595848083496 seconds ---
--- train 13.186842441558838 seconds ---
--- total 32.31816530227661 seconds ---
acc:  0.9983333349227905
--- embed 3.6090376377105713 seconds ---
--- train 13.274563550949097 seconds ---
--- total 31.815383434295654 seconds ---
acc:  1.0
--- embed 3.5584394931793213 seconds ---
--- train 21.012768030166626 seconds ---
--- total 40.322922468185425 seconds ---
acc:  1.0
--- embed 3.559985876083374 seconds ---
--- train 20.9

In [None]:
plot(results)

| | Laplacian | DiagA | Correlation | Accuracy | Train_Time(s) | Emb_Time(s) | Total_Time(s) |
|---|---|---|---|---|---|---|---|
| set_01 | True | True | True | 0.999833 | 17.89955 | 3.61 | 36.88 |
| set_02 | True | True | False | 0.195 | 15.63108 | 3.56 | 34.67 |
| set_03 | True | False | True | 1.0 | 17.15214 | 3.71 | 37.69 |
| set_04 | True | False | False | 0.195 | 17.17114 | 3.7 | 37.76 |
| set_05 | False | True | True | 1.0 | 17.87204 | 3.55 | 26.43 |
| set_06 | False | True | False | 1.0 | 18.67732 | 3.6 | 27.32 |
| set_07 | False | False | True | 1.0 | 17.14811 | 3.74 | 25.89 |
| set_08 | False | False | False | 1.0 | 18.59717 | 3.71 | 27.33 |


##### case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

Info:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(3000, 1)
[[1]
 [0]
 [2]
 ...
 [2]
 [2]
 [2]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


######collect

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_20, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8633333444595337
--- embed 0.14469695091247559 seconds ---
--- train 21.005553007125854 seconds ---
--- total 22.735140800476074 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_20, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.8766666650772095
--- embed 0.14284586906433105 seconds ---
--- train 14.338932037353516 seconds ---
--- total 16.02400803565979 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_20, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.47999998927116394
--- embed 0.5435707569122314 seconds ---
--- train 21.139686107635498 seconds ---
--- total 23.43826651573181 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_20, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.903333306312561
--- embed 0.1229407787322998 seconds ---
--- train 21.041470527648926 seconds ---
--- total 22.686341285705566 seconds ---


######others

In [None]:
Run(case_20, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8799999952316284


In [None]:
Run(case_20, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.8949999809265137


In [None]:
Run(case_20, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.8899999856948853


In [None]:
Run(case_20, "su", Learner = 0, Laplacian = True, DiagA = True, Correlation = True)

acc:  0.8966666460037231


In [None]:
results = average_restuls(case_20, comb_set, 0)

acc:  0.8983333110809326
--- embed 0.2274940013885498 seconds ---
--- train 12.17214846611023 seconds ---
--- total 15.866555452346802 seconds ---
acc:  0.8999999761581421
--- embed 0.22590970993041992 seconds ---
--- train 12.092920541763306 seconds ---
--- total 15.996092319488525 seconds ---
acc:  0.8983333110809326
--- embed 0.23233628273010254 seconds ---
--- train 21.00863552093506 seconds ---
--- total 24.91910743713379 seconds ---
acc:  0.8983333110809326
--- embed 0.22621440887451172 seconds ---
--- train 11.722924947738647 seconds ---
--- total 15.418576955795288 seconds ---
acc:  0.8983333110809326
--- embed 0.22709226608276367 seconds ---
--- train 11.918452501296997 seconds ---
--- total 15.83203911781311 seconds ---
acc:  0.8866666555404663
--- embed 0.2376565933227539 seconds ---
--- train 12.06058382987976 seconds ---
--- total 15.942700147628784 seconds ---
acc:  0.8983333110809326
--- embed 0.23672151565551758 seconds ---
--- train 12.194794178009033 seconds ---
--- t



acc:  0.8949999809265137
--- embed 0.22490143775939941 seconds ---
--- train 11.953272581100464 seconds ---
--- total 15.710629940032959 seconds ---
acc:  0.8966666460037231
--- embed 0.23372340202331543 seconds ---
--- train 20.982940912246704 seconds ---
--- total 24.972780466079712 seconds ---
acc:  0.8949999809265137
--- embed 0.22850799560546875 seconds ---
--- train 21.039648294448853 seconds ---
--- total 25.013603448867798 seconds ---
acc:  0.8916666507720947
--- embed 0.22601842880249023 seconds ---
--- train 21.00651478767395 seconds ---
--- total 24.79054570198059 seconds ---
acc:  0.8949999809265137
--- embed 0.2262735366821289 seconds ---
--- train 12.087382793426514 seconds ---
--- total 16.05409049987793 seconds ---
acc:  0.8949999809265137
--- embed 0.2419745922088623 seconds ---
--- train 12.137262105941772 seconds ---
--- total 16.094321489334106 seconds ---
acc:  0.8983333110809326
--- embed 0.22763991355895996 seconds ---
--- train 20.979283332824707 seconds ---
---

In [None]:
plot(results)

| | Laplacian | DiagA | Correlation | Accuracy | Train_Time(s) | Emb_Time(s) | Total_Time(s) |
|---|---|---|---|---|---|---|---|
| set_01 | True | True | True | 0.896333 | 12.91241 | 0.23 | 16.71 |
| set_02 | True | True | False | 0.48 | 17.50515 | 0.23 | 21.3 |
| set_03 | True | False | True | 0.894667 | 15.72584 | 0.23 | 19.62 |
| set_04 | True | False | False | 0.48 | 15.69895 | 0.23 | 19.6 |
| set_05 | False | True | True | 0.902667 | 15.67053 | 0.23 | 18.83 |
| set_06 | False | True | False | 0.888167 | 18.38201 | 0.23 | 21.57 |
| set_07 | False | False | True | 0.904667 | 16.67513 | 0.23 | 19.9 |
| set_08 | False | False | False | 0.8835 | 17.52914 | 0.23 | 20.71 |


##### case 21

In [None]:
case_21 = case.case_21_fully_known()
case_21.summary()

Info:

    DC-SBM with 10 classes and defined probabilities with fully known labels.
    Edge list version. 
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(30487, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2952 2993    1]
 [2975 2980    1]
 [2983 2987    1]]
Y:
(3000, 1)
[[4]
 [0]
 [5]
 ...
 [6]
 [7]
 [5]]


In [None]:
print(case_21.bd)

0.9


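######collect

Note: the four cells below pass `case_10`, not `case_21`, to `Run`, so their outputs are case 10 results (their accuracies of about 0.95 match the case 10 runs above).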

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.9516666531562805
--- embed 1.8923213481903076 seconds ---
--- train 20.999051094055176 seconds ---
--- total 25.819838285446167 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.95333331823349
--- embed 1.7629752159118652 seconds ---
--- train 21.01236128807068 seconds ---
--- total 25.408438444137573 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.47999998927116394
--- embed 7.47571063041687 seconds ---
--- train 20.970436096191406 seconds ---
--- total 30.865493297576904 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.95333331823349
--- embed 1.9337806701660156 seconds ---
--- train 14.504675149917603 seconds ---
--- total 19.41373324394226 seconds ---


######others

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_21, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.82833331823349
--- embed 0.12119865417480469 seconds ---
--- train 12.262593507766724 seconds ---
--- total 12.387485980987549 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_21, "su", Learner = 0, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.82833331823349
--- embed 0.12146544456481934 seconds ---
--- train 21.002887964248657 seconds ---
--- total 21.129703998565674 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_21, "su", Learner = 0, Laplacian = True, DiagA = False, Correlation = False)

acc:  0.4116666615009308
--- embed 0.4442942142486572 seconds ---
--- train 20.957082986831665 seconds ---
--- total 21.404434204101562 seconds ---


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_21, "su", Learner = 0, Laplacian = False, DiagA = False, Correlation = True)

acc:  0.8333333134651184
--- embed 0.11559200286865234 seconds ---
--- train 20.991804361343384 seconds ---
--- total 21.110264539718628 seconds ---


In [None]:
Run(case_21, "su", Learner = 0, Laplacian = True, DiagA = True, Correlation = True)

acc:  0.8050000071525574


In [None]:
results = average_restuls(case_21, comb_set, 0)

acc:  0.8116666674613953
--- embed 0.23595499992370605 seconds ---
--- train 13.567328214645386 seconds ---
--- total 14.51444935798645 seconds ---
acc:  0.8083333373069763
--- embed 0.23884010314941406 seconds ---
--- train 21.0231511592865 seconds ---
--- total 21.92582631111145 seconds ---
acc:  0.8100000023841858
--- embed 0.24170565605163574 seconds ---
--- train 12.741525888442993 seconds ---
--- total 13.641494274139404 seconds ---
acc:  0.8100000023841858
--- embed 0.24010276794433594 seconds ---
--- train 13.241767883300781 seconds ---
--- total 14.156140089035034 seconds ---
acc:  0.8083333373069763
--- embed 0.23739051818847656 seconds ---
--- train 21.071983575820923 seconds ---
--- total 21.96921944618225 seconds ---
acc:  0.8083333373069763
--- embed 0.2401423454284668 seconds ---
--- train 21.014440774917603 seconds ---
--- total 21.928312301635742 seconds ---
acc:  0.8083333373069763
--- embed 0.22916007041931152 seconds ---
--- train 21.044551372528076 seconds ---
--- 



acc:  0.8149999976158142
--- embed 0.25891661643981934 seconds ---
--- train 21.061058282852173 seconds ---
--- total 22.04176950454712 seconds ---
acc:  0.8183333277702332
--- embed 0.2427513599395752 seconds ---
--- train 21.48850393295288 seconds ---
--- total 22.424398183822632 seconds ---
acc:  0.8149999976158142
--- embed 0.24790668487548828 seconds ---
--- train 12.67394733428955 seconds ---
--- total 13.605210781097412 seconds ---
acc:  0.8233333230018616
--- embed 0.23498082160949707 seconds ---
--- train 21.07823348045349 seconds ---
--- total 22.027421951293945 seconds ---
acc:  0.8149999976158142
--- embed 0.23657989501953125 seconds ---
--- train 12.787090063095093 seconds ---
--- total 13.766481161117554 seconds ---
acc:  0.8183333277702332
--- embed 0.23082852363586426 seconds ---
--- train 21.037189960479736 seconds ---
--- total 21.972176551818848 seconds ---
acc:  0.8149999976158142
--- embed 0.2336723804473877 seconds ---
--- train 12.981094360351562 seconds ---
--- 

In [None]:
plot(results)

| | Laplacian | DiagA | Correlation | Accuracy | Train_Time(s) | Emb_Time(s) | Total_Time(s) |
|---|---|---|---|---|---|---|---|
| set_01 | True | True | True | 0.809167 | 17.93527 | 0.24 | 18.85 |
| set_02 | True | True | False | 0.618 | 16.06782 | 0.24 | 16.97 |
| set_03 | True | False | True | 0.8185 | 16.16558 | 0.24 | 17.12 |
| set_04 | True | False | False | 0.476167 | 16.13896 | 0.24 | 17.1 |
| set_05 | False | True | True | 0.833667 | 17.0183 | 0.25 | 17.27 |
| set_06 | False | True | False | 0.828833 | 15.21822 | 0.25 | 15.47 |
| set_07 | False | False | True | 0.835333 | 16.95359 | 0.25 | 17.2 |
| set_08 | False | False | False | 0.8315 | 17.76558 | 0.25 | 18.02 |


#### LDA

##### case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[1]
 [0]
 [2]
 ...
 [2]
 [2]
 [2]]


In [None]:
Run(case_10, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.9583333333333334


In [None]:
Run(case_10, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.9566666666666667


In [None]:
Run(case_10, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.9533333333333334


In [None]:
Run(case_10, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True)

acc:  0.9533333333333334
--- embed 5.193510055541992 seconds ---
--- train 0.0058062076568603516 seconds ---
--- total 25.464478015899658 seconds ---


(0.9533333333333334,
 0.0058062076568603516,
 5.193510055541992,
 25.464478015899658)

In [None]:
results = average_restuls(case_10, comb_set, 1)

acc:  0.9533333333333334
--- embed 3.5747530460357666 seconds ---
--- train 0.005490779876708984 seconds ---
--- total 18.323492288589478 seconds ---
acc:  0.9533333333333334
--- embed 3.564149856567383 seconds ---
--- train 0.0036499500274658203 seconds ---
--- total 17.76686668395996 seconds ---
acc:  0.9533333333333334
--- embed 3.568498134613037 seconds ---
--- train 0.0036590099334716797 seconds ---
--- total 17.678396224975586 seconds ---
acc:  0.9533333333333334
--- embed 3.5280163288116455 seconds ---
--- train 0.0034914016723632812 seconds ---
--- total 18.08701181411743 seconds ---
acc:  0.9533333333333334
--- embed 3.5961172580718994 seconds ---
--- train 0.0034427642822265625 seconds ---
--- total 17.7871732711792 seconds ---
acc:  0.9533333333333334
--- embed 3.5459461212158203 seconds ---
--- train 0.005916595458984375 seconds ---
--- total 17.622737646102905 seconds ---
acc:  0.9533333333333334
--- embed 3.6501457691192627 seconds ---
--- train 0.0032329559326171875 seco

In [None]:
plot(results)

| | Laplacian | DiagA | Correlation | Accuracy | Train_Time(s) | Emb_Time(s) | Total_Time(s) |
|---|---|---|---|---|---|---|---|
| set_01 | True | True | True | 0.953333 | 0.00392 | 3.57 | 17.9 |
| set_02 | True | True | False | 0.953333 | 0.00356 | 3.58 | 17.87 |
| set_03 | True | False | True | 0.953333 | 0.00359 | 3.66 | 19.28 |
| set_04 | True | False | False | 0.953333 | 0.00368 | 3.66 | 19.23 |
| set_05 | False | True | True | 0.953333 | 0.00349 | 3.49 | 8.09 |
| set_06 | False | True | False | 0.956667 | 0.00351 | 3.47 | 8.07 |
| set_07 | False | False | True | 0.953333 | 0.00369 | 3.55 | 8.08 |
| set_08 | False | False | False | 0.958333 | 0.00346 | 3.57 | 8.12 |
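In this table LDA scores between 0.953 and 0.958 in every setting, including Laplacian = True with Correlation = False, where the GNN scored 0.48 on case 10, and it trains in a few milliseconds. One plausible reason (an interpretation, not something the notebook tests) is that the LDA decision rule does not depend on the overall scale of the features, whereas GNN training may be hurt by the very small embedding values (about 3e-4) printed for that setting. The sketch below uses synthetic Gaussian data, not the notebook's embeddings: it fits scikit-learn's `LinearDiscriminantAnalysis` on an 80/20 split and checks that shrinking the features by 1e-4 leaves the predictions essentially unchanged.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# synthetic stand-in for a 3-class, 3-dimensional embedding Z
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y = rng.integers(0, 3, size=3000)
Z = centers[y] + 0.3 * rng.standard_normal((3000, 3))

# 80% training, 20% testing, as in the fully known cases
Z_train, Z_test, y_train, y_test = train_test_split(Z, y, test_size=0.2, random_state=0)

lda = LinearDiscriminantAnalysis().fit(Z_train, y_train)
acc = lda.score(Z_test, y_test)

# same data scaled down, mimicking very small embedding values
scale = 1e-4
lda_small = LinearDiscriminantAnalysis().fit(scale * Z_train, y_train)
acc_small = lda_small.score(scale * Z_test, y_test)
agreement = np.mean(lda.predict(Z_test) == lda_small.predict(scale * Z_test))
print(acc, acc_small, agreement)
```

If this explanation holds, rescaling or row-normalizing the embedding before the GNN (which is what Correlation = True appears to do) should recover its accuracy, consistent with the 0.95 reported for Laplacian = True, Correlation = True.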


In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True, sparse_opt = 'csr')

acc:  0.9533333333333334
--- embed 3.5838661193847656 seconds ---
--- train 0.0035517215728759766 seconds ---
--- total 22.104680061340332 seconds ---
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
[[0.57163251 0.63552442 0.51898457]
 [0.56639504 0.65251114 0.50341422]
 [0.50932978 0.57073679 0.64408283]
 ...
 [0.69090353 0.47757942 0.54274323]
 [0.55816818 0.47293868 0.68174577]
 [0.58932617 0.63402448 0.50070713]]
[1 1 2 1 1 0 2 0 2 2 2 1 0 2 2 1 2 2 2 2 1 1 2 2 1 2 1 1 0 1 1 2 1 1 2 0 1
 2 0 2 1 0 2 2 2 0 1 1 1 0 0 2 1 0 0 2 2 1 2 2 2 0 2 2 2 2 2 2 0 1 1 1 0 0
 2 0 0 1 2 1 2 2 2 0 2 2 0 1 1 1 2 0 2 2 1 1 2 2 2 2 1 2 0 2 0 1 2 2 2 0 2
 2 2 1 0 0 1 1 1 2 1 0 0 2 1 2 0 2 0 0 2 2 0 0 0 0 0 1 2 1 2 2 2 1 2 0 0 0
 2 2 2 1 2 1 1 1 2 0 2 2 0 2 1 2 1 2 2 1 2 2 2 0 0 2 2 0 2 2 2 2 2 0 2 1 2
 0 0 0 2 1 2 1 0 1 2 0 2 1 1 2

In [None]:
print(Z.shape)
print(Z_ori)
print(Z)

(3000, 3)
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
  (0, 0)	0.5195715389378506
  (1, 0)	0.6390560884259104
  (2, 0)	0.5332472292758246
  (3, 0)	0.49725584103460374
  (4, 0)	0.4360105076065667
  (5, 0)	0.517217796861308
  (6, 0)	0.6374676270270658
  (7, 0)	0.45386221727566506
  (8, 0)	0.5916039088151546
  (9, 0)	0.5007366256669146
  (10, 0)	0.4351393827205337
  (11, 0)	0.5508805053320266
  (12, 0)	0.6655977323371083
  (13, 0)	0.4885214728176826
  (14, 0)	0.6748981332651581
  (15, 0)	0.5012390185849971
  (16, 0)	0.5175779379036337
  (17, 0)	0.5431564234339152
  (18, 0)	0.6601132250293184
  (19, 0)	0.7028375919709977
  (20, 0)	0.5196157471330455
  (21, 0)	0.5576492834988768
  (22, 0)	0.6133977145325848
  (23, 0)	0.6723062553102196
  (24, 0)	0.6193111528990982
  :	:
  (2975, 2)	0.49261966697922904
  (2976, 2)	0.

In [None]:
print(W[0].shape)
print(W_ori)
print(W[0])

(3000, 3)
[array([[0.        , 0.00134409, 0.        ],
       [0.00206186, 0.        , 0.        ],
       [0.        , 0.        , 0.00085397],
       ...,
       [0.        , 0.        , 0.00085397],
       [0.        , 0.        , 0.00085397],
       [0.        , 0.        , 0.00085397]])]
  (1, 0)	0.002061855670103093
  (12, 0)	0.002061855670103093
  (19, 0)	0.002061855670103093
  (21, 0)	0.002061855670103093
  (24, 0)	0.002061855670103093
  (27, 0)	0.002061855670103093
  (38, 0)	0.002061855670103093
  (51, 0)	0.002061855670103093
  (55, 0)	0.002061855670103093
  (57, 0)	0.002061855670103093
  (63, 0)	0.002061855670103093
  (67, 0)	0.002061855670103093
  (77, 0)	0.002061855670103093
  (81, 0)	0.002061855670103093
  (89, 0)	0.002061855670103093
  (91, 0)	0.002061855670103093
  (101, 0)	0.002061855670103093
  (123, 0)	0.002061855670103093
  (126, 0)	0.002061855670103093
  (129, 0)	0.002061855670103093
  (130, 0)	0.002061855670103093
  (147, 0)	0.002061855670103093
  (150, 0)	0.00206
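The printed `W_ori` is consistent with a one-hot label matrix whose columns are divided by the number of labeled nodes per class: 0.00206186 ≈ 1/485, 0.00134409 ≈ 1/744 and 0.00085397 ≈ 1/1171, and 485 + 744 + 1171 = 2400, the 80% training split of n = 3000. Each printed row of `Z_ori` with Correlation = True also has unit Euclidean norm (0.5196² + 0.7001² + 0.4898² ≈ 1). The sketch below reproduces that pipeline under these assumptions: Z = A W, the Laplacian option taken to be the symmetric normalization D^{-1/2} A D^{-1/2}, and Correlation taken to be row normalization. DiagA is left out because its exact effect is not shown here, and `encoder_embedding` is a hypothetical name, not the notebook's implementation.

```python
import numpy as np
from scipy import sparse

def encoder_embedding(A, y, k, laplacian=False, correlation=False):
  """Hypothetical sketch: Z = A @ W with a class-size-normalized one-hot W.

  y holds class labels in [0, k-1] for labeled nodes and -1 for unlabeled ones.
  """
  A = sparse.csr_matrix(A, dtype=float)
  n = A.shape[0]
  labeled = np.flatnonzero(y >= 0)
  # one-hot W, each entry divided by its class's number of labeled nodes
  counts = np.bincount(y[labeled], minlength=k)
  W = np.zeros((n, k))
  W[labeled, y[labeled]] = 1.0 / counts[y[labeled]]
  if laplacian:
    # symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows
    deg = np.asarray(A.sum(axis=1)).ravel()
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    D = sparse.diags(inv_sqrt)
    A = D @ A @ D
  Z = A @ W  # sparse @ dense gives a dense (n, k) array
  if correlation:
    # scale each row to unit Euclidean norm (zero rows stay zero)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = np.divide(Z, norms, out=np.zeros_like(Z), where=norms > 0)
  return Z, W

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([0, 0, 1, -1])  # node 3 is unlabeled
Z, W = encoder_embedding(A, y, k=2, laplacian=True, correlation=True)
print(W)
print(Z)
```

Because every column of W sums to 1, entry (i, c) of A W is the sum of the edges from node i to labeled class-c nodes, divided by the number of labeled class-c nodes.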

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True, sparse_opt = 'csc')

acc:  0.9533333333333334
--- embed 4.638524532318115 seconds ---
--- train 0.006206989288330078 seconds ---
--- total 23.42071294784546 seconds ---
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
[[0.57163251 0.63552442 0.51898457]
 [0.56639504 0.65251114 0.50341422]
 [0.50932978 0.57073679 0.64408283]
 ...
 [0.69090353 0.47757942 0.54274323]
 [0.55816818 0.47293868 0.68174577]
 [0.58932617 0.63402448 0.50070713]]
[1 1 2 1 1 0 2 0 2 2 2 1 0 2 2 1 2 2 2 2 1 1 2 2 1 2 1 1 0 1 1 2 1 1 2 0 1
 2 0 2 1 0 2 2 2 0 1 1 1 0 0 2 1 0 0 2 2 1 2 2 2 0 2 2 2 2 2 2 0 1 1 1 0 0
 2 0 0 1 2 1 2 2 2 0 2 2 0 1 1 1 2 0 2 2 1 1 2 2 2 2 1 2 0 2 0 1 2 2 2 0 2
 2 2 1 0 0 1 1 1 2 1 0 0 2 1 2 0 2 0 0 2 2 0 0 0 0 0 1 2 1 2 2 2 1 2 0 0 0
 2 2 2 1 2 1 1 1 2 0 2 2 0 2 1 2 1 2 2 1 2 2 2 0 0 2 2 0 2 2 2 2 2 0 2 1 2
 0 0 0 2 1 2 1 0 1 2 0 2 1 1 2 1 

In [None]:
print(Z.shape)
print(Z_ori)
print(Z)

(3000, 3)
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
  (0, 0)	0.5195715389378506
  (1, 0)	0.6390560884259104
  (2, 0)	0.5332472292758246
  (3, 0)	0.49725584103460374
  (4, 0)	0.4360105076065667
  (5, 0)	0.517217796861308
  (6, 0)	0.6374676270270658
  (7, 0)	0.45386221727566506
  (8, 0)	0.5916039088151546
  (9, 0)	0.5007366256669146
  (10, 0)	0.4351393827205337
  (11, 0)	0.5508805053320266
  (12, 0)	0.6655977323371083
  (13, 0)	0.4885214728176826
  (14, 0)	0.6748981332651581
  (15, 0)	0.5012390185849971
  (16, 0)	0.5175779379036337
  (17, 0)	0.5431564234339152
  (18, 0)	0.6601132250293184
  (19, 0)	0.7028375919709977
  (20, 0)	0.5196157471330455
  (21, 0)	0.5576492834988768
  (22, 0)	0.6133977145325848
  (23, 0)	0.6723062553102196
  (24, 0)	0.6193111528990982
  :	:
  (2975, 2)	0.49261966697922904
  (2976, 2)	0.

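Every printed row of Z has unit Euclidean length. For row 0, for example, 0.51957² + 0.70010² + 0.48980² ≈ 1.0000. That is consistent with `Correlation = True` rescaling each row of the embedding to norm 1. The following is a minimal sketch of the embedding step Z = A W followed by that row normalization. It assumes a plain adjacency matrix (the `Laplacian` and `DiagA` options are not modeled) and a column-normalized one-hot W. `encoder_embed` is a hypothetical name.

```python
import numpy as np
from scipy import sparse

def encoder_embed(A, W, normalize_rows=True):
    """Hypothetical sketch: Z = A @ W, then (optionally) scale every nonzero
    row of Z to unit Euclidean length."""
    Z = np.asarray(A @ W)             # sparse @ dense gives a dense array
    if normalize_rows:
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        norms[norms == 0] = 1.0       # isolated nodes keep an all-zero row
        Z = Z / norms
    return Z

# Toy graph on 5 nodes: edges 0-1, 0-2, 2-3; node 4 is isolated.
edges = np.array([[0, 1], [0, 2], [2, 3]])
A = sparse.coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(5, 5))
A = (A + A.T).tocsr()                 # undirected: store both directions

# Column-normalized one-hot W for labels [0, 0, 1, 1, unlabeled].
W = np.array([[0.5, 0.0],
              [0.5, 0.0],
              [0.0, 0.5],
              [0.0, 0.5],
              [0.0, 0.0]])

Z = encoder_embed(A, W)
print(np.round(Z, 4))
```

After normalization, only the direction of each row survives. Node 1, whose only neighbor is in class 0, maps to [1, 0]. Node 0, with one neighbor in each class, maps to the diagonal direction.
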
In [None]:
print(W[0].shape)
print(W_ori)
print(W[0])

(3000, 3)
[array([[0.        , 0.00134409, 0.        ],
       [0.00206186, 0.        , 0.        ],
       [0.        , 0.        , 0.00085397],
       ...,
       [0.        , 0.        , 0.00085397],
       [0.        , 0.        , 0.00085397],
       [0.        , 0.        , 0.00085397]])]
  (1, 0)	0.002061855670103093
  (12, 0)	0.002061855670103093
  (19, 0)	0.002061855670103093
  (21, 0)	0.002061855670103093
  (24, 0)	0.002061855670103093
  (27, 0)	0.002061855670103093
  (38, 0)	0.002061855670103093
  (51, 0)	0.002061855670103093
  (55, 0)	0.002061855670103093
  (57, 0)	0.002061855670103093
  (63, 0)	0.002061855670103093
  (67, 0)	0.002061855670103093
  (77, 0)	0.002061855670103093
  (81, 0)	0.002061855670103093
  (89, 0)	0.002061855670103093
  (91, 0)	0.002061855670103093
  (101, 0)	0.002061855670103093
  (123, 0)	0.002061855670103093
  (126, 0)	0.002061855670103093
  (129, 0)	0.002061855670103093
  (130, 0)	0.002061855670103093
  (147, 0)	0.002061855670103093
  (150, 0)	0.00206

In [None]:
acc, train_time, emb_time, total_time, Z, W, Z_ori, W_ori = Run(case_10, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True, sparse_opt = 'coo')

acc:  0.9533333333333334
--- embed 3.5411553382873535 seconds ---
--- train 0.003939628601074219 seconds ---
--- total 18.111839532852173 seconds ---
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
[[0.57163251 0.63552442 0.51898457]
 [0.56639504 0.65251114 0.50341422]
 [0.50932978 0.57073679 0.64408283]
 ...
 [0.69090353 0.47757942 0.54274323]
 [0.55816818 0.47293868 0.68174577]
 [0.58932617 0.63402448 0.50070713]]
[1 1 2 1 1 0 2 0 2 2 2 1 0 2 2 1 2 2 2 2 1 1 2 2 1 2 1 1 0 1 1 2 1 1 2 0 1
 2 0 2 1 0 2 2 2 0 1 1 1 0 0 2 1 0 0 2 2 1 2 2 2 0 2 2 2 2 2 2 0 1 1 1 0 0
 2 0 0 1 2 1 2 2 2 0 2 2 0 1 1 1 2 0 2 2 1 1 2 2 2 2 1 2 0 2 0 1 2 2 2 0 2
 2 2 1 0 0 1 1 1 2 1 0 0 2 1 2 0 2 0 0 2 2 0 0 0 0 0 1 2 1 2 2 2 1 2 0 0 0
 2 2 2 1 2 1 1 1 2 0 2 2 0 2 1 2 1 2 2 1 2 2 2 0 0 2 2 0 2 2 2 2 2 0 2 1 2
 0 0 0 2 1 2 1 0 1 2 0 2 1 1 2 

In [None]:
print(Z.shape)
print(Z_ori)
print(Z)

(3000, 3)
[[0.51957154 0.7001005  0.48980068]
 [0.63905609 0.58003098 0.50514491]
 [0.53324723 0.52568135 0.66280201]
 ...
 [0.46232283 0.52192124 0.71683738]
 [0.61117436 0.51350303 0.60231266]
 [0.52882802 0.59944564 0.60083763]]
  (0, 0)	0.5195715389378506
  (0, 1)	0.7001005028061394
  (0, 2)	0.4898006756797547
  (1, 0)	0.6390560884259104
  (1, 1)	0.5800309763510874
  (1, 2)	0.5051449121974598
  (2, 0)	0.5332472292758246
  (2, 1)	0.5256813506009793
  (2, 2)	0.6628020142546237
  (3, 0)	0.49725584103460374
  (3, 1)	0.7523076584193281
  (3, 2)	0.43216873514935844
  (4, 0)	0.4360105076065667
  (4, 1)	0.6760287880875464
  (4, 2)	0.5940369642821458
  (5, 0)	0.517217796861308
  (5, 1)	0.7005314852404515
  (5, 2)	0.491672033775302
  (6, 0)	0.6374676270270658
  (6, 1)	0.6155825875146564
  (6, 2)	0.46334987044483145
  (7, 0)	0.45386221727566506
  (7, 1)	0.5183226174209902
  (7, 2)	0.7248108387706896
  (8, 0)	0.5916039088151546
  :	:
  (2991, 2)	0.6019055237428302
  (2992, 0)	0.508884110800889

In [None]:
print(W[0].shape)
print(W_ori)
print(W[0])

(3000, 3)
[array([[0.        , 0.00134409, 0.        ],
       [0.00206186, 0.        , 0.        ],
       [0.        , 0.        , 0.00085397],
       ...,
       [0.        , 0.        , 0.00085397],
       [0.        , 0.        , 0.00085397],
       [0.        , 0.        , 0.00085397]])]
  (0, 1)	0.0013440860215053765
  (1, 0)	0.002061855670103093
  (2, 2)	0.0008539709649871904
  (3, 1)	0.0013440860215053765
  (4, 1)	0.0013440860215053765
  (5, 1)	0.0013440860215053765
  (6, 1)	0.0013440860215053765
  (7, 2)	0.0008539709649871904
  (8, 1)	0.0013440860215053765
  (11, 2)	0.0008539709649871904
  (12, 0)	0.002061855670103093
  (13, 2)	0.0008539709649871904
  (16, 2)	0.0008539709649871904
  (17, 1)	0.0013440860215053765
  (18, 2)	0.0008539709649871904
  (19, 0)	0.002061855670103093
  (20, 2)	0.0008539709649871904
  (21, 0)	0.002061855670103093
  (24, 0)	0.002061855670103093
  (25, 2)	0.0008539709649871904
  (26, 1)	0.0013440860215053765
  (27, 0)	0.002061855670103093
  (28, 1)	0.0013

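The `csc` and `coo` printouts of W[0] hold the same nonzeros in different orders. The `csc` run lists entries column by column: (1, 0), (12, 0), (19, 0), … The `coo` run lists them row by row: (0, 1), (1, 0), (2, 2), … Both runs report the same accuracy (0.9533…), which is what one would expect if `sparse_opt` only changes the storage format. The scipy check below illustrates this on a toy matrix. It does not reproduce the notebook's own conversion code.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for W: one nonzero per labeled row; row 3 is unlabeled.
dense = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
])

W_csc = sparse.csc_matrix(dense)
W_coo = sparse.coo_matrix(dense)

# Same matrix, same number of stored entries.
assert W_csc.nnz == W_coo.nnz == 3
assert np.array_equal(W_csc.toarray(), W_coo.toarray())

# coo built from a dense array keeps row-major order; csc stores column by column.
coo_order = list(zip(W_coo.row.tolist(), W_coo.col.tolist()))
csc_as_coo = W_csc.tocoo()
csc_order = list(zip(csc_as_coo.row.tolist(), csc_as_coo.col.tolist()))
print("coo:", coo_order)
print("csc:", csc_order)
```
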
##### case 11

In [None]:
case_11 = case.case_11_fully_known()
case_11.summary()

name:

    SBM with 5 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[2]
 [0]
 [2]
 ...
 [3]
 [3]
 [2]]


In [None]:
print(case_11.bd)

0.2


In [None]:
Run(case_11, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  1.0


In [None]:
Run(case_11, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  1.0


In [None]:
Run(case_11, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  1.0


In [None]:
Run(case_11, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True)

acc:  1.0


In [None]:
results = average_restuls(case_11, comb_set, 1)

acc:  1.0
--- embed 3.8129494190216064 seconds ---
--- train 0.006302356719970703 seconds ---
--- total 20.552429914474487 seconds ---
acc:  1.0
--- embed 3.821761131286621 seconds ---
--- train 0.004448890686035156 seconds ---
--- total 19.162704467773438 seconds ---
acc:  1.0
--- embed 4.31722617149353 seconds ---
--- train 0.0046977996826171875 seconds ---
--- total 19.075125694274902 seconds ---
acc:  1.0
--- embed 3.827867269515991 seconds ---
--- train 0.0067119598388671875 seconds ---
--- total 19.20094585418701 seconds ---
acc:  1.0
--- embed 3.84987735748291 seconds ---
--- train 0.004353761672973633 seconds ---
--- total 19.247668743133545 seconds ---
acc:  1.0
--- embed 3.8219213485717773 seconds ---
--- train 0.00437164306640625 seconds ---
--- total 19.1132493019104 seconds ---
acc:  1.0
--- embed 3.8054163455963135 seconds ---
--- train 0.004303455352783203 seconds ---
--- total 19.076492071151733 seconds ---
acc:  1.0
--- embed 3.694176435470581 seconds ---
--- train 0.0

In [None]:
plot(results)

Set,Laplacian,DiagA,Correlation,Accuracy,Train_Time(s),Emb_Time(s),Total_Time(s)
set_01,True,True,True,1.0,0.00476,3.86,19.25
set_02,True,True,False,1.0,0.00433,3.81,19.14
set_03,True,False,True,1.0,0.00422,3.93,20.66
set_04,True,False,False,1.0,0.0044,3.91,20.62
set_05,False,True,True,1.0,0.00424,3.81,8.76
set_06,False,True,False,1.0,0.00409,3.78,8.75
set_07,False,False,True,1.0,0.00551,4.16,9.14
set_08,False,False,False,1.0,0.00436,3.94,8.97

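The eight rows of the results table cover every True/False combination of the three preprocessing flags. They appear in the same order that `itertools.product([True, False], repeat=3)` produces. The notebook's own definition of `comb_set` is not shown in this section, so the sketch below is only one way to build an equivalent list. Its labels match the `set_XX` names in the table.

```python
from itertools import product

FLAGS = ("Laplacian", "DiagA", "Correlation")

# One dict of keyword arguments per combination, in the order set_01 ... set_08.
comb_set = [dict(zip(FLAGS, values)) for values in product([True, False], repeat=len(FLAGS))]

for i, kwargs in enumerate(comb_set, start=1):
    print(f"set_{i:02d}", kwargs)
```

Because `Run(case, learn_opt, **kwargs)` takes keyword options, each dict could be unpacked directly, e.g. `Run(case_11, "su", Learner=1, **comb_set[0])`.
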

##### case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(3000, 1)
[[1]
 [0]
 [2]
 ...
 [2]
 [2]
 [2]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]

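`case_20` is a degree-corrected SBM (DC-SBM). In a common Bernoulli formulation, nodes i and j are joined with probability θ_i θ_j B[y_i, y_j]. Here B is the block probability matrix and θ holds per-node degree parameters. The plain SBM of `case_10` and `case_11` corresponds to θ_i = 1 for every node. This section does not show how the printed `bd = [0.9, 0.5, 0.2]` enters the notebook's generator. The B and θ values below are therefore illustrative assumptions, and `sample_dcsbm` is a hypothetical helper.

```python
import numpy as np

def sample_dcsbm(y, B, theta, rng):
    """Hypothetical sketch: draw a symmetric 0/1 adjacency matrix with
    P(A[i, j] = 1) = theta[i] * theta[j] * B[y[i], y[j]] for i < j (no self-loops)."""
    y = np.asarray(y)
    P = np.outer(theta, theta) * B[np.ix_(y, y)]
    P = np.clip(P, 0.0, 1.0)
    upper = np.triu(rng.random(P.shape) < P, k=1)   # sample each pair once
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
n, K = 300, 3
y = rng.integers(0, K, size=n)
B = np.full((K, K), 0.1) + 0.2 * np.eye(K)   # denser within blocks (assumed values)
theta = rng.uniform(0.5, 1.0, size=n)        # degree heterogeneity (assumed)
A = sample_dcsbm(y, B, theta, rng)
print(A.shape, A.sum() // 2, "edges")

# Passing theta = np.ones(n) recovers a plain SBM.
```
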

In [None]:
Run(case_20, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8033333333333333


In [None]:
Run(case_20, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.81


In [None]:
Run(case_20, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.8683333333333333


In [None]:
Run(case_20, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True)

acc:  0.8616666666666667


In [None]:
results = average_restuls(case_20, comb_set, 1)

acc:  0.8616666666666667
--- embed 0.25098705291748047 seconds ---
--- train 0.003683328628540039 seconds ---
--- total 3.8276798725128174 seconds ---
acc:  0.8616666666666667
--- embed 0.23109841346740723 seconds ---
--- train 0.00341033935546875 seconds ---
--- total 3.7405240535736084 seconds ---
acc:  0.8616666666666667
--- embed 0.2346956729888916 seconds ---
--- train 0.003314495086669922 seconds ---
--- total 3.7338244915008545 seconds ---
acc:  0.8616666666666667
--- embed 0.2519969940185547 seconds ---
--- train 0.003350973129272461 seconds ---
--- total 3.932577610015869 seconds ---
acc:  0.8616666666666667
--- embed 0.23643922805786133 seconds ---
--- train 0.0036618709564208984 seconds ---
--- total 3.7266390323638916 seconds ---
acc:  0.8616666666666667
--- embed 0.23111629486083984 seconds ---
--- train 0.003446340560913086 seconds ---
--- total 3.907248020172119 seconds ---
acc:  0.8616666666666667
--- embed 0.24237775802612305 seconds ---
--- train 0.003309011459350586 

acc:  0.8783333333333333
--- embed 0.2513918876647949 seconds ---
--- train 0.0037262439727783203 seconds ---
--- total 3.841672897338867 seconds ---
acc:  0.8783333333333333
--- embed 0.24879097938537598 seconds ---
--- train 0.003383159637451172 seconds ---
--- total 3.9952142238616943 seconds ---
acc:  0.8783333333333333
--- embed 0.23460149765014648 seconds ---
--- train 0.0030879974365234375 seconds ---
--- total 3.8033370971679688 seconds ---
acc:  0.8783333333333333
--- embed 0.24135541915893555 seconds ---
--- train 0.003627300262451172 seconds ---
--- total 3.981414318084717 seconds ---
acc:  0.8783333333333333
--- embed 0.23695731163024902 seconds ---
--- train 0.003475666046142578 seconds ---
--- total 3.9886748790740967 seconds ---
acc:  0.8783333333333333
--- embed 0.24387550354003906 seconds ---
--- train 0.0033698081970214844 seconds ---
--- total 3.897451400756836 seconds ---
acc:  0.8783333333333333
--- embed 0.2349700927734375 seconds ---
--- train 0.0031473636627197266 seconds ---
--- total 3.960235118865967 seconds ---
acc:  0.8783333333333333
--- embed 0.2505607604980469 seconds ---
--- train 0.004873991012573242 seconds ---
--- total 3.8042216300964355 seconds ---
acc:  0.8783333333333333
--- embed 0.2467048168182373 seconds ---
--- train 0.003174304962158203 seconds ---
--- total 3.9922757148742676 seconds ---
acc:  0.8783333333333333
--- embed 0.23595142364501953 seconds ---
--- train 0.005025625228881836 seconds ---
--- total 4.026124715805054 seconds ---
acc:  0.87
--- embed 0.23995113372802734 seconds ---
--- train 0.0031998157501220703 seconds ---
--- total 3.768972396850586 seconds ---
acc:  0.87
--- embed 0.23319005966186523 seconds ---
--- train 0.004218578338623047 seconds ---
--- total 4.0052971839904785 seconds ---
acc:  0.87
--- embed 0.23841285705566406 seconds ---
--- train 0.0031232833862304688 seconds ---
--- total 3.8501181602478027 seconds ---
acc:  0.87
--- embed 0.2340550422668457 seconds ---
--- train 0.00396728515625 seconds ---
--- total 3.9421684741973877 seconds ---
acc:  0.87
--- embed 0.2538590431213379 seconds ---
--- train 0.003163576126098633 seconds ---
--- total 3.852815628051758 seconds ---
acc:  0.87
--- embed 0.23768281936645508 seconds ---
--- train 0.0034101009368896484 seconds ---
--- total 3.925396680831909 seconds ---
acc:  0.87
--- embed 0.23560214042663574 seconds ---
--- train 0.0031104087829589844 seconds ---
--- total 3.955116033554077 seconds ---
acc:  0.87
--- embed 0.2387380599975586 seconds ---
--- train 0.0034732818603515625 seconds ---
--- total 3.785287857055664 seconds ---
acc:  0.87
--- embed 0.24123930931091309 seconds ---
--- train 0.0031731128692626953 seconds ---
--- total 4.038876056671143 seconds ---
acc:  0.87
--- embed 0.2584724426269531 seconds ---
--- train 0.0033452510833740234 seconds ---
--- total 3.7856481075286865 seconds ---
acc:  0.8683333333333333
--- embed 0.23628616333007812 seconds ---
--- train 0.0032999515533447266 seconds ---
--- total 3.225003480911255 seconds ---
acc:  0.8683333333333333
--- embed 0.23377132415771484 seconds ---
--- train 0.00323486328125 seconds ---
--- total 3.0738112926483154 seconds ---
acc:  0.8683333333333333
--- embed 0.23107028007507324 seconds ---
--- train 0.0032494068145751953 seconds ---
--- total 3.225822925567627 seconds ---
acc:  0.8683333333333333
--- embed 0.23411989212036133 seconds ---
--- train 0.0032422542572021484 seconds ---
--- total 3.2224957942962646 seconds ---
acc:  0.8683333333333333
--- embed 0.24110102653503418 seconds ---
--- train 0.0032701492309570312 seconds ---
--- total 3.0780293941497803 seconds ---
acc:  0.8683333333333333
--- embed 0.23151779174804688 seconds ---
--- train 0.003389120101928711 seconds ---


In [None]:
plot(results)

Set,Laplacian,DiagA,Correlation,Accuracy,Train_Time(s),Emb_Time(s),Total_Time(s)
set_01,True,True,True,0.861667,0.0034,0.24,3.83
set_02,True,True,False,0.875,0.00365,0.24,3.85
set_03,True,False,True,0.878333,0.00369,0.24,3.93
set_04,True,False,False,0.87,0.00342,0.24,3.89
set_05,False,True,True,0.868333,0.0035,0.24,3.16
set_06,False,True,False,0.81,0.00331,0.24,3.18
set_07,False,False,True,0.88,0.00327,0.24,3.17
set_08,False,False,False,0.803333,0.0033,0.24,3.18


##### case 21

In [None]:
case_21 = case.case_21_fully_known()
case_21.summary()

name:

    DC-SBM with 10 classes and defined probabilities with fully known labels.
    Edge list version. 
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(60974, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2999 2577    1]
 [2999 2877    1]
 [2999 2951    1]]
Y:
(3000, 1)
[[4]
 [0]
 [5]
 ...
 [6]
 [7]
 [5]]

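For `case_21`, X arrives as an s × 3 edge list whose rows read [source, target, weight]. This supervised version has 60974 rows, exactly twice the 30487 rows of the clustering and semi-supervised versions of `case_21`. Its last rows ([2999 2577 1], …) also list the larger node id first, while the other versions end with pairs such as [2983 2987 1]. Both observations are consistent with this edge list storing each undirected edge in both directions. The minimal scipy sketch below turns such a list into a symmetric n × n adjacency matrix. It assumes 0-based node ids and edges given either once or in both directions; `edges_to_adjacency` is a hypothetical name.

```python
import numpy as np
from scipy import sparse

def edges_to_adjacency(X, n):
    """Hypothetical helper: build a symmetric CSR adjacency matrix from an
    s x 3 array of [source, target, weight] rows."""
    X = np.asarray(X)
    src = X[:, 0].astype(int)
    dst = X[:, 1].astype(int)
    w = X[:, 2].astype(float)
    # Note: repeated (source, target) rows are summed by the CSR conversion.
    A = sparse.coo_matrix((w, (src, dst)), shape=(n, n)).tocsr()
    # Take the elementwise max of A and A.T so one-directional and
    # two-directional edge lists yield the same symmetric matrix.
    return A.maximum(A.T).tocsr()

one_way = [[0, 1, 1], [1, 2, 1], [0, 3, 1]]
both_ways = one_way + [[1, 0, 1], [2, 1, 1], [3, 0, 1]]

A1 = edges_to_adjacency(one_way, 4)
A2 = edges_to_adjacency(both_ways, 4)
print(A1.toarray())
```
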

In [None]:
print(case_21.bd)

0.9


In [None]:
Run(case_21, "su", Learner = 1, Laplacian = False, DiagA = False, Correlation = False)

acc:  0.8383333333333334


In [None]:
Run(case_21, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = False)

acc:  0.8316666666666667


In [None]:
Run(case_21, "su", Learner = 1, Laplacian = False, DiagA = True, Correlation = True)

acc:  0.83


In [None]:
Run(case_21, "su", Learner = 1, Laplacian = True, DiagA = True, Correlation = True)

acc:  0.8216666666666667


In [None]:
results = average_restuls(case_21, comb_set, 1)

acc:  0.8216666666666667
--- embed 0.32279348373413086 seconds ---
--- train 0.012647628784179688 seconds ---
--- total 1.0624041557312012 seconds ---
acc:  0.8216666666666667
--- embed 0.47625303268432617 seconds ---
--- train 0.011063098907470703 seconds ---
--- total 1.2822730541229248 seconds ---
acc:  0.8216666666666667
--- embed 0.46976280212402344 seconds ---
--- train 0.013624906539916992 seconds ---
--- total 1.9463958740234375 seconds ---
acc:  0.8216666666666667
--- embed 0.5477137565612793 seconds ---
--- train 0.013518333435058594 seconds ---
--- total 2.0885672569274902 seconds ---
acc:  0.8216666666666667
--- embed 0.6068413257598877 seconds ---
--- train 0.014215469360351562 seconds ---
--- total 1.9182448387145996 seconds ---
acc:  0.8216666666666667
--- embed 0.5083954334259033 seconds ---
--- train 0.014401435852050781 seconds ---
--- total 1.8037171363830566 seconds ---
acc:  0.8216666666666667
--- embed 0.5935494899749756 seconds ---
--- train 0.013621807098388672 

acc:  0.8116666666666666
--- embed 0.2333080768585205 seconds ---
--- train 0.008192300796508789 seconds ---
--- total 0.9488792419433594 seconds ---
acc:  0.8116666666666666
--- embed 0.2377326488494873 seconds ---
--- train 0.0056002140045166016 seconds ---
--- total 0.9522075653076172 seconds ---
acc:  0.8116666666666666
--- embed 0.23929643630981445 seconds ---
--- train 0.008352279663085938 seconds ---
--- total 0.9656522274017334 seconds ---
acc:  0.8116666666666666
--- embed 0.23727035522460938 seconds ---
--- train 0.007306098937988281 seconds ---
--- total 0.9923994541168213 seconds ---
acc:  0.8116666666666666
--- embed 0.23441267013549805 seconds ---
--- train 0.005865812301635742 seconds ---
--- total 0.9540493488311768 seconds ---
acc:  0.8116666666666666
--- embed 0.23378658294677734 seconds ---
--- train 0.00877833366394043 seconds ---
--- total 0.9477188587188721 seconds ---
acc:  0.8116666666666666
--- embed 0.23687243461608887 seconds ---
--- train 0.005882978439331055 seconds ---
--- total 0.9971613883972168 seconds ---
acc:  0.8116666666666666
--- embed 0.2317650318145752 seconds ---
--- train 0.007254362106323242 seconds ---
--- total 0.9609098434448242 seconds ---
acc:  0.8116666666666666
--- embed 0.23242735862731934 seconds ---
--- train 0.006545305252075195 seconds ---
--- total 0.9564080238342285 seconds ---
acc:  0.8116666666666666
--- embed 0.2358226776123047 seconds ---
--- train 0.006914377212524414 seconds ---
--- total 0.9805798530578613 seconds ---
acc:  0.8183333333333334
--- embed 0.23067569732666016 seconds ---
--- train 0.005755186080932617 seconds ---
--- total 0.9641084671020508 seconds ---
acc:  0.8183333333333334
--- embed 0.21796107292175293 seconds ---
--- train 0.007372856140136719 seconds ---
--- total 0.9567809104919434 seconds ---
acc:  0.8183333333333334
--- embed 0.2169969081878662 seconds ---
--- train 0.008222818374633789 seconds ---
--- total 0.9605503082275391 seconds ---
acc:  0.8183333333333334
--- embed 0.2193007469177246 seconds ---
--- train 0.006380558013916016 seconds ---
--- total 0.9543807506561279 seconds ---
acc:  0.8183333333333334
--- embed 0.2274487018585205 seconds ---
--- train 0.008347511291503906 seconds ---
--- total 0.9534950256347656 seconds ---
acc:  0.8183333333333334
--- embed 0.22861576080322266 seconds ---
--- train 0.0076177120208740234 seconds ---
--- total 0.9679384231567383 seconds ---
acc:  0.8183333333333334
--- embed 0.22882413864135742 seconds ---
--- train 0.008661985397338867 seconds ---
--- total 0.9839673042297363 seconds ---
acc:  0.8183333333333334
--- embed 0.22600221633911133 seconds ---
--- train 0.008264780044555664 seconds ---
--- total 0.9906904697418213 seconds ---
acc:  0.8183333333333334
--- embed 0.23665904998779297 seconds ---
--- train 0.0062525272369384766 seconds ---
--- total 0.990703821182251 seconds ---
acc:  0.8183333333333334
--- embed 0.23306655883789062 seconds ---
--- train 0.008369207382202148 seconds ---
--- total 0.9781818389892578 seconds ---
acc:  0.83
--- embed 0.2812047004699707 seconds ---
--- train 0.005878448486328125 seconds ---
--- total 0.29306650161743164 seconds ---
acc:  0.83
--- embed 0.2775564193725586 seconds ---
--- train 0.006426811218261719 seconds ---
--- total 0.2889981269836426 seconds ---
acc:  0.83
--- embed 0.29416823387145996 seconds ---
--- train 0.008039236068725586 seconds ---
--- total 0.3073267936706543 seconds ---
acc:  0.83
--- embed 0.2799255847930908 seconds ---
--- train 0.008398056030273438 seconds ---
--- total 0.29305076599121094 seconds ---
acc:  0.83
--- embed 0.2787182331085205 seconds ---
--- train 0.005712270736694336 seconds ---
--- total 0.289111852645874 seconds ---
acc:  0.83
--- embed 0.2833421230316162 seconds ---
--- train 0.005767345428466797 seconds ---
--- total 0.29396510124206543 seconds ---
acc:  0.83
--- embed 0.2837936

In [None]:
plot(results)

Set,Laplacian,DiagA,Correlation,Accuracy,Train_Time(s),Emb_Time(s),Total_Time(s)
set_01,True,True,True,0.821667,0.01524,0.54,1.9
set_02,True,True,False,0.816667,0.014,0.44,1.73
set_03,True,False,True,0.811667,0.00707,0.24,0.97
set_04,True,False,False,0.818333,0.00752,0.23,0.97
set_05,False,True,True,0.83,0.0067,0.28,0.3
set_06,False,True,False,0.831667,0.00759,0.29,0.3
set_07,False,False,True,0.823333,0.00676,0.29,0.3
set_08,False,False,False,0.838333,0.00627,0.28,0.29


### Clustering

#### Case 10

In [None]:
case_10_cluster = case.case_10_cluster()
case_10_cluster.summary()

name:

    SBM with 3 classes for clustering
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(1, 1)
[[3]]


In [None]:
print(case_10_cluster.bd)

0.13


In [None]:
Run(case_10_cluster, "c")

ARI:  0.8036759717543803

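The clustering runs report the adjusted Rand index (ARI) instead of accuracy because cluster ids are arbitrary. ARI compares two partitions: it equals 1.0 for identical partitions up to relabeling and is near 0 for random assignments. Here Y holds only [[3]], which appears to be the number of clusters K; the `Run` docstring says Y may also be omitted, in which case K ranges over [2, 3, 4, 5]. The minimal sketch below uses the same two scikit-learn pieces the notebook imports, `KMeans` and `adjusted_rand_score`. It runs on a synthetic embedding, not the notebook's own Z.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.cluster import adjusted_rand_score

# ARI ignores how clusters are numbered: this permuted labeling scores 1.0.
print(adjusted_rand_score([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))

# Synthetic 3-cluster "embedding": points scattered around three corners.
rng = np.random.default_rng(0)
K = 3
truth = np.repeat(np.arange(K), 100)
Z = np.eye(K)[truth] + 0.1 * rng.standard_normal((truth.size, K))

labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(Z)
print("ARI:", adjusted_rand_score(truth, labels))
```
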

#### Case 11

In [None]:
case_11_cluster = case.case_11_cluster()
case_11_cluster.summary()

name:

    SBM with 5 classes for clustering
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(1, 1)
[[5]]


In [None]:
print(case_11_cluster.bd)

0.2


In [None]:
Run(case_11_cluster, "c")

ARI:  1.0


#### case 20

In [None]:
case_20_cluster = case.case_20_cluster()
case_20_cluster.summary()

name:

    DC-SBM with 3 classes for clustering
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(1, 1)
[[3]]


In [None]:
print(case_20_cluster.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20_cluster, "c")

ARI:  0.6710219123774864


#### Case 21

In [None]:
case_21_cluster = case.case_21_cluster()
case_21_cluster.summary()

name:

    DC-SBM with 10 classes for clustering.
    Edge list version. 
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(30487, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2952 2993    1]
 [2975 2980    1]
 [2983 2987    1]]
Y:
(1, 1)
[[10]]


In [None]:
print(case_21_cluster.bd)

0.9


In [None]:
Run(case_21_cluster, "c")

ARI:  0.43355806469613173


### Semi-GNN-learner 0

#### case 10

In [None]:
case_10 = case.case_10()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]

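In the semi-supervised cases, Y uses -1 for nodes whose label is hidden ("95% unknown labels"). The minimal sketch below splits nodes into labeled and unlabeled sets from such a vector. The random hiding at a 95% rate only mirrors the printed Y and is not the notebook's own generator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 3000, 3
y_true = rng.integers(0, K, size=n)

# Hide about 95% of the labels by setting them to -1.
Y = y_true.copy()
Y[rng.random(n) < 0.95] = -1
Y = Y.reshape(-1, 1)              # same (n, 1) shape as case_10.Y

known = Y[:, 0] >= 0              # boolean mask of labeled nodes
print("labeled:", known.sum(), "unlabeled:", (~known).sum())
print("fraction labeled: %.3f" % known.mean())
```
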

In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "se", Learner = 0, LearnerIter = 0)

acc:  0.5623025894165039


#### case 11

In [None]:
case_11 = case.case_11()
case_11.summary()

name:

    SBM with 5 classes and defined probabilities with 95% unknown labels.  
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_11.bd)

0.2


In [None]:
Run(case_11, "se", Learner = 0, LearnerIter = 0)

acc:  0.6410256624221802


#### case 20

In [None]:
case_20 = case.case_20()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "se", Learner = 0, LearnerIter = 0)

acc:  0.6356616616249084


#### case 21

In [None]:
case_21 = case.case_21()
case_21.summary()

name:

    DC-SBM with 10 classes and defined probabilities with 95% unknown labels.
    Edge list version.     
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(30487, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2952 2993    1]
 [2975 2980    1]
 [2983 2987    1]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [ 5]]


In [None]:
print(case_21.bd)

0.9


In [None]:
Run(case_21, "se", Learner = 0, LearnerIter = 0)

acc:  0.33778560161590576


### Semi-LDA-learner 1

#### case 10

In [None]:
case_10 = case.case_10()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "se", Learner = 1, LearnerIter = 10)

acc:  0.763

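`Learner = 1` uses linear discriminant analysis (`LinearDiscriminantAnalysis` is imported in the package section), and these runs pass `LearnerIter = 10`. One plausible reading of `LearnerIter`, suggested by the later "update using y_temp" headings, is self-training. Under that reading, the learner fits on the labeled nodes, predicts the rest, then refits on its own predictions for up to a fixed number of rounds. The sketch below implements that generic loop on a synthetic embedding. It is an assumption about the idea, not the notebook's `Run` code, and `self_train_lda` is a hypothetical name.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def self_train_lda(Z, y, n_iter=10):
    """Generic self-training loop (an assumed reading of LearnerIter).
    y holds class ids for labeled rows and -1 for unlabeled rows."""
    y = np.asarray(y).ravel()
    known = y >= 0
    # Round 0: fit on the labeled rows only, then label every row.
    y_temp = LinearDiscriminantAnalysis().fit(Z[known], y[known]).predict(Z)
    y_temp[known] = y[known]                  # given labels are never overwritten
    for _ in range(n_iter):
        new = LinearDiscriminantAnalysis().fit(Z, y_temp).predict(Z)
        new[known] = y[known]
        if np.array_equal(new, y_temp):
            break                             # labels stopped changing
        y_temp = new
    return y_temp

rng = np.random.default_rng(1)
K, n = 3, 600
truth = rng.integers(0, K, size=n)
Z = np.eye(K)[truth] + 0.35 * rng.standard_normal((n, K))
y = truth.copy()
y[rng.random(n) < 0.95] = -1                  # hide about 95% of the labels

pred = self_train_lda(Z, y, n_iter=10)
unlabeled = y < 0
print("accuracy on unlabeled rows: %.3f" % (pred[unlabeled] == truth[unlabeled]).mean())
```
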

#### case 11

In [None]:
case_11 = case.case_11()
case_11.summary()

name:

    SBM with 5 classes and defined probabilities with 95% unknown labels.  
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_11.bd)

0.2


In [None]:
Run(case_11, "se", Learner = 1, LearnerIter = 10)

acc:  1.0


#### case 20

In [None]:
case_20 = case.case_20()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "se", Learner = 1, LearnerIter = 10)

acc:  0.9073333333333333


#### case 21

In [None]:
case_21 = case.case_21()
case_21.summary()

name:

    DC-SBM with 10 classes and defined probabilities with 95% unknown labels.
    Edge list version.     
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(30487, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2952 2993    1]
 [2975 2980    1]
 [2983 2987    1]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [ 5]]


In [None]:
print(case_21.bd)

0.9


In [None]:
Run(case_21, "se", Learner = 1, LearnerIter = 10)

acc:  0.8416666666666667


### Semi-GNN-learner 2 - update using y_temp

#### case 10

In [None]:
case_10 = case.case_10()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "se", Learner = 2, LearnerIter = 10)

acc:  0.7223333333333334


#### case 11

In [None]:
case_11 = case.case_11()
case_11.summary()

name:

    SBM with 5 classes and defined probabilities with 95% unknown labels.  
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_11.bd)

0.2


In [None]:
Run(case_11, "se", Learner = 2, LearnerIter = 10)

acc:  1.0


#### Case 20

In [None]:
case_20 = case.case_20()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "se", Learner = 2, LearnerIter = 10)

acc:  0.9053333333333333


#### Case 21

In [None]:
case_21 = case.case_21()
case_21.summary()

name:

    DC-SBM with 10 classes and defined probabilities with 95% unknown labels.
    Edge list version.     
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(30487, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2952 2993    1]
 [2975 2980    1]
 [2983 2987    1]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [ 5]]


In [None]:
print(case_21.bd)

0.9


In [None]:
Run(case_21, "se", Learner = 2, LearnerIter = 10)

acc:  0.83


### Semi-GNN-learner 2 - update using y_temp_one_hot

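The two `Learner = 2` variants appear to differ in whether the intermediate labels `y_temp` are fed back as integer class ids or as one-hot rows. The conversion is a one-liner in either direction. The sketch below uses plain NumPy; for non-negative integer labels, the notebook's imported `to_categorical` produces the same one-hot layout. When training a Keras model, `sparse_categorical_crossentropy` expects integer ids, while `categorical_crossentropy` expects one-hot (or probability) rows.

```python
import numpy as np

K = 3
y_temp = np.array([2, 0, 1, 1, 2])

# Integer ids -> one-hot rows.
y_temp_one_hot = np.eye(K)[y_temp]

# One-hot rows -> integer ids.
y_back = y_temp_one_hot.argmax(axis=1)

# Soft predictions (e.g. softmax outputs) are turned into ids the same way.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
y_from_probs = probs.argmax(axis=1)

print(y_temp_one_hot)
print(y_back, y_from_probs)
```
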
#### case 10

In [None]:
case_10 = case.case_10()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "se", Learner = 2, LearnerIter = 10)

acc:  0.5123333333333333


#### case 11

In [None]:
case_11 = case.case_11()
case_11.summary()

name:

    SBM with 5 classes and defined probabilities with 95% unknown labels.  
    
n:
<class 'int'>
3000
d:
<class 'int'>
5
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_11.bd)

0.2


In [None]:
Run(case_11, "se", Learner = 2, LearnerIter = 10)

acc:  0.9656666666666667


#### Case 20

In [None]:
case_20 = case.case_20()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with 95% unknown labels.
    
n:
<class 'int'>
3000
d:
<class 'int'>
3
X:
(3000, 3000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "se", Learner = 2, LearnerIter = 10)

acc:  0.907


#### Case 21

In [None]:
case_21 = case.case_21()
case_21.summary()

name:

    DC-SBM with 10 classes and defined probabilities with 95% unknown labels.
    Edge list version.     
    
n:
<class 'int'>
3000
d:
<class 'int'>
10
X:
(30487, 3)
[[   0    3    1]
 [   0  168    1]
 [   0  551    1]
 ...
 [2952 2993    1]
 [2975 2980    1]
 [2983 2987    1]]
Y:
(3000, 1)
[[-1]
 [-1]
 [-1]
 ...
 [-1]
 [-1]
 [ 5]]


In [None]:
print(case_21.bd)

0.9


In [None]:
Run(case_21, "se", Learner = 2, LearnerIter = 10)

acc:  0.854


## Node2Vec vs AEE

In [None]:
n = 2000
case = Case(n)

### Node2Vec - Supervised

#### GNN

##### case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "su", Learner = 0, emb_opt = "Node2Vec")

acc:  0.39500001072883606
--- 1779.9130690097809 seconds ---

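The Node2Vec runs in this section take roughly 1040 to 1790 seconds each. The heart of node2vec (aditya-grover/node2vec, linked at the top of the notebook) is a second-order random walk. After stepping from node t to node v, the walk picks the next node x with weight 1/p if x = t, weight 1 if x is also a neighbor of t, and weight 1/q otherwise. Below is a minimal pure-Python sketch of one such walk on an adjacency-set graph. It computes the weights on the fly instead of precomputing them, and `node2vec_walk` is a hypothetical name. According to the notebook's package notes, node2vec then trains gensim's word2vec on many such walks; that step is not shown here.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One node2vec-style biased random walk.
    adj maps each node to the set of its neighbors (undirected, unweighted)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break                               # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))       # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)         # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)             # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)         # move outward
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
rng = random.Random(42)
walks = [node2vec_walk(adj, v, 8, p=1.0, q=0.5, rng=rng) for v in adj for _ in range(3)]
print(walks[0])
```

A small q (here 0.5) favors moving away from the previous node, giving more exploratory, DFS-like walks. A small p favors stepping back, keeping walks local.
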

##### Case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]
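
Case 20 is a degree-corrected SBM (DC-SBM). Each node i gets a degree parameter θ_i, and in one common Bernoulli form an edge between i and j appears with probability θ_i θ_j B[y_i, y_j], where B holds the block probabilities. With every θ_i = 1 this reduces to the plain SBM of case 10.

The notebook's own generator is not shown in this section. The B and θ below are therefore made-up illustrative values, and `sample_dcsbm` is a hypothetical helper.

```python
import numpy as np

def sample_dcsbm(y, B, theta, seed=0):
    """Undirected DC-SBM adjacency with P(A_ij = 1) = theta_i * theta_j * B[y_i, y_j], i != j."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=int)
    theta = np.asarray(theta, dtype=float)
    P = np.clip(np.outer(theta, theta) * np.asarray(B, dtype=float)[np.ix_(y, y)], 0.0, 1.0)
    upper = np.triu(rng.random(P.shape) < P, k=1)  # draw each pair i < j once
    return (upper | upper.T).astype(int)           # symmetric, zero diagonal

rng = np.random.default_rng(42)
y = rng.integers(0, 3, size=300)
B = [[0.20, 0.05, 0.05], [0.05, 0.20, 0.05], [0.05, 0.05, 0.20]]  # illustrative block probabilities
theta = rng.uniform(0.2, 1.0, size=300)                            # illustrative degree parameters
A = sample_dcsbm(y, B, theta)
print(A.shape, A.sum(axis=1).min(), A.sum(axis=1).max())
```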


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "su", Learner = 0, emb_opt = "Node2Vec")

acc:  0.4124999940395355
--- 1042.857544183731 seconds ---


#### LDA

##### Case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "su", Learner = 1, emb_opt = "Node2Vec")

acc:  0.43
--- 1791.2492997646332 seconds ---


##### Case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "su", Learner = 1, emb_opt = "Node2Vec")

acc:  0.4475
--- 1176.5556297302246 seconds ---


### AEE - Supervised

#### GNN

##### Case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "su", Learner = 0, emb_opt = "AEE")

acc:  0.8899999856948853
--- 26.105212450027466 seconds ---
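
The model built in `GNN_model` is a small feed-forward classifier on the embedding rows: a ReLU hidden layer of `hidden` units followed by a softmax over the k classes. A Keras `Dense` layer computes activation(x · kernel + bias). The minimal NumPy sketch below runs that forward pass with random, untrained weights to make the shapes explicit. The function name and weights are illustrative only.

```python
import numpy as np

def two_layer_forward(Z, W1, b1, W2, b2):
    """Class probabilities from embeddings Z (n x d): ReLU hidden layer, then softmax."""
    H = np.maximum(Z @ W1 + b1, 0.0)                      # hidden layer, n x hidden
    logits = H @ W2 + b2                                  # n x k
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise exp
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)               # each row sums to 1

rng = np.random.default_rng(0)
n, d, hidden, k = 5, 3, 20, 3   # AEE embeddings have d = k columns; hidden = 20 as in Hyperperameters
Z = rng.random((n, d))
P = two_layer_forward(Z, rng.normal(size=(d, hidden)), np.zeros(hidden),
                      rng.normal(size=(hidden, k)), np.zeros(k))
print(P.argmax(axis=1))  # predicted class per node
```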


##### Case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "su", Learner = 0, emb_opt = "AEE")

acc:  0.8274999856948853
--- 9.403448343276978 seconds ---


#### LDA

##### Case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_10.bd)

0.13


In [None]:
Run(case_10, "su", Learner = 1, emb_opt = "AEE")

acc:  0.8825
--- 3.7893357276916504 seconds ---


##### Case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_20.bd)

[0.9, 0.5, 0.2]


In [None]:
Run(case_20, "su", Learner = 1, emb_opt = "AEE")

acc:  0.8275
--- 1.3883047103881836 seconds ---


### Older encoder implementation: AEE and Node2Vec

In [None]:
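# Node2Vec embeddings come from the eliorc/node2vec package (pip install node2vec)
from node2vec import Node2Vec

class EncoderEmbedding: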
  def AEE(self,dataset):
    aee = copy.deepcopy(self)

    X = dataset.X
    Y = dataset.Y
    test_idx = dataset.test_idx
    train_idx = dataset.train_idx

    # Partition the data
    X_test, X_train = X[test_idx,:][:,train_idx], X[train_idx,:][:,train_idx]
    Y_train = Y[train_idx]

    Y_test = dataset.Y_test 
    k = dataset.d

    # nk: training nodes per class; w: n_train x k weights with w[i, y_i] = 1/n_{y_i}; Z = X w
    nk = np.zeros((1,k))
    for i in range(0,len(Y_train)):
        nk[0,int(Y_train[i])]=nk[0,int(Y_train[i])]+1
    w = np.zeros((int(np.size(Y_train)),k))
    for i in range(0,int(np.size(Y_train))):
        k_i=int(Y_train[i])
        # w[i][k_i]=1/nk[0,k_i]*2
        w[i][k_i]=1/nk[0,k_i]
    
    aee.z_train= np.matmul(X_train,w)
    aee.z_test = np.matmul(X_test,w)
    aee.y_train = Y_train.ravel() 
    aee.y_test = Y_test.ravel() 
    aee.k = k
    aee.nk = nk
    aee.w = w

    
    return aee

  def NodeToVec(self,dataset):
    n2v = copy.deepcopy(self)

    X = dataset.X
    Y = dataset.Y
    test_idx = dataset.test_idx
    train_idx = dataset.train_idx

    # Partition the data
    X_test, X_train = X[test_idx,:][:,train_idx], X[train_idx,:][:,train_idx]
    Y_train = Y[train_idx]

    Y_test = dataset.Y_test
    k = dataset.d

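    G = nx.from_numpy_array(X)  # nx.from_numpy_matrix was removed in NetworkX 3.0
    # parameters follow the usage example in https://github.com/eliorc/node2vec
    node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, workers=4)
    # train word2vec on the walks (same example settings)
    model = node2vec.fit(window=10, min_count=1, batch_words=4)
    # model.wv.vectors is in gensim vocabulary order, not node order;
    # look up each node by its string key so that row i belongs to node i
    Z = np.array([model.wv[str(i)] for i in range(X.shape[0])])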
    
    n2v.z_train= Z[train_idx]
    n2v.z_test = Z[test_idx]
    n2v.y_train = Y_train.ravel()
    n2v.y_test = Y_test.ravel() 
    n2v.k = k

    
    return n2v

class Hyperperameters:
  """
    define perameters for GNN.
    default values are for GNN learning -- "Leaner" ==2:
      embedding via partial label, then learn unknown label via two-layer NN

  """
  def __init__(self):
    # there is no scaled conjugate gradiant in keras optimiser, use defualt instead
    # use whatever default
    self.learning_rate = 0.01  # Initial learning rate.
    self.epochs = 100 #Number of epochs to train.
    self.hidden = 20 #Number of units in hidden layer 
    self.val_split = 0.1 #Split 10% of training data for validation
    self.loss = 'categorical_crossentropy' # loss function

class GNN:
  def __init__(self, DataSets):
    # keep data, hyperparameters and model on the instance (not the class),
    # so separate GNN objects do not share state
    self.DataSets = DataSets
    self.hyperM = Hyperperameters()
    self.model = self.GNN_model()  # model summary: self.model.summary()

  def GNN_model(self):
    """
      Build the network: embedding features -> hidden ReLU layer -> softmax over k classes.
    """
    hyperM = self.hyperM
    DataSets = self.DataSets

    z_train = DataSets.z_train
    k = DataSets.k

    feature_num = z_train.shape[1]

    model = keras.Sequential([
    keras.layers.Flatten(input_shape = (feature_num,)),  # input layer
    keras.layers.Dense(hyperM.hidden, activation='relu'),  # hidden layer; MATLAB's tansig equals tanh, relu is used here instead
    keras.layers.Dense(k, activation='softmax')  # output layer; softmax, as in MATLAB's patternnet
    ])

    optimizer = keras.optimizers.Adam(learning_rate = hyperM.learning_rate)

    # pass the configured optimizer; the string 'adam' would ignore hyperM.learning_rate
    model.compile(optimizer=optimizer,
                  loss=hyperM.loss,
                  metrics=['accuracy'])

    return model

  def GNN_run(self):
    """
      Train on the training embeddings and evaluate on the test embeddings.
      Do not learn from the unknown labels.
    """
    hyperM = self.hyperM
    DataSets = self.DataSets
    k = DataSets.k
    z_train = DataSets.z_train
    y_train = DataSets.y_train
    y_test = DataSets.y_test
    z_test = DataSets.z_test
    model = self.model

    # num_classes=k keeps the one-hot width equal to the k output units
    y_train_one_hot = to_categorical(y_train, num_classes=k)
    history = model.fit(z_train, y_train_one_hot,
          epochs=hyperM.epochs,
          validation_split=hyperM.val_split,
          verbose=0)

    y_test_one_hot = to_categorical(y_test, num_classes=k)
    # verbose=0 silences the output
    test_loss, test_acc = model.evaluate(z_test, y_test_one_hot, verbose=0)
    return test_acc
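
The two loops in `AEE` count the training nodes per class (n_k) and set w[i, y_i] = 1/n_{y_i}. Each column of Z = X w therefore averages a node's adjacency to one class's training nodes. Below is a minimal vectorized sketch of the same weight matrix, checked against a loop version on a toy example. It assumes training labels are integers in 0..k-1, and `aee_weights` is a hypothetical helper.

```python
import numpy as np

def aee_weights(Y_train, k):
    """n_train x k matrix with w[i, y_i] = 1 / n_{y_i} and zeros elsewhere."""
    y = np.asarray(Y_train).ravel().astype(int)
    nk = np.bincount(y, minlength=k)        # training nodes per class
    w = np.zeros((y.size, k))
    w[np.arange(y.size), y] = 1.0 / nk[y]
    return w

def aee_weights_loop(Y_train, k):
    """Loop version mirroring EncoderEmbedding.AEE, for comparison."""
    y = np.asarray(Y_train).ravel()
    nk = np.zeros((1, k))
    for i in range(len(y)):
        nk[0, int(y[i])] += 1
    w = np.zeros((len(y), k))
    for i in range(len(y)):
        w[i][int(y[i])] = 1 / nk[0, int(y[i])]
    return w

Y_train = np.array([[0], [2], [1], [0], [2], [2]])
X_train = np.random.default_rng(0).integers(0, 2, size=(6, 6))
w = aee_weights(Y_train, 3)
print(np.allclose(w, aee_weights_loop(Y_train, 3)))  # True
z_train = X_train @ w  # same product as np.matmul(X_train, w) in AEE
```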


#### Case 10

In [None]:
case_10 = case.case_10_fully_known()
case_10.summary()

name:

    SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


In [None]:
print(case_10.bd)

0.13


##### LDA

###### AEE

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
aee = Encod.AEE(case_10)
clf = LinearDiscriminantAnalysis()
clf.fit(aee.z_train, aee.y_train)
acc = clf.score(aee.z_test, aee.y_test)
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.9075
--- 0.10760641098022461 seconds ---
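
Each comparison cell repeats the same `begin = time.time()` ... `print("--- %s seconds ---" % (end - begin))` pattern. A small context manager could factor that out. `timer` is a hypothetical helper; it uses `time.perf_counter`, which is meant for measuring intervals, instead of `time.time`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label="", report=print):
    """Report the wall-clock time spent inside the `with` block."""
    begin = time.perf_counter()
    try:
        yield
    finally:
        prefix = label + ": " if label else ""
        report("--- %s%s seconds ---" % (prefix, time.perf_counter() - begin))

# Usage, mirroring the AEE + LDA cell above:
# with timer("AEE + LDA"):
#     aee = EncoderEmbedding().AEE(case_10)
#     clf = LinearDiscriminantAnalysis().fit(aee.z_train, aee.y_train)
#     print(clf.score(aee.z_test, aee.y_test))
with timer("sum"):
    total = sum(range(1_000_000))
```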


###### Node2Vec

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
n2v = Encod.NodeToVec(case_10)
clf = LinearDiscriminantAnalysis()
clf.fit(n2v.z_train, n2v.y_train)
acc = clf.score(n2v.z_test, n2v.y_test)
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.435
--- 2213.2506487369537 seconds ---
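
Node2Vec accuracy here stays near 0.4 with three classes. One thing worth checking in any pipeline that reads `model.wv.vectors` directly and then indexes it by node id: in gensim 4, those rows follow `model.wv.index_to_key`, which is sorted by frequency by default rather than by node id. The eliorc/node2vec package also stores nodes as string keys. `model.wv[str(i)]` returns node i's vector regardless of ordering.

The sketch below reproduces that lookup on plain arrays so it runs without gensim. `vectors_by_node_id` is a hypothetical helper.

```python
import numpy as np

def vectors_by_node_id(index_to_key, vectors, n):
    """Return an n x dim matrix whose row i is the vector stored under key str(i)."""
    row_of = {key: row for row, key in enumerate(index_to_key)}
    return np.stack([vectors[row_of[str(i)]] for i in range(n)])

# Toy vocabulary in frequency order: node "2" was seen most often
index_to_key = ["2", "0", "1"]
vectors = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 1.0]])
Z = vectors_by_node_id(index_to_key, vectors, 3)
print(Z[:, 0])  # [0. 1. 2.]
```

With a trained gensim 4 model, the call would be `vectors_by_node_id(model.wv.index_to_key, model.wv.vectors, n)`.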


##### GNN

###### AEE

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
aee = Encod.AEE(case_10)
gnn = GNN(aee)
acc = gnn.GNN_run()
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.9075
--- 21.70582938194275 seconds ---
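
`GNN_run` one-hot encodes the labels with Keras' `to_categorical`, whose `num_classes` argument, when left unset, is inferred as `max(y) + 1`. If a split happens to miss the highest class, the encoded labels come out narrower than the k-unit softmax layer, and training or evaluation fails with a shape mismatch.

Below is a NumPy sketch of encoding with the width fixed to k. `one_hot` is a hypothetical helper; with Keras, passing `num_classes=k` has the same effect.

```python
import numpy as np

def one_hot(y, k):
    """n x k one-hot matrix for integer labels y in 0..k-1."""
    y = np.asarray(y).ravel().astype(int)
    out = np.zeros((y.size, k))
    out[np.arange(y.size), y] = 1.0
    return out

y_test = np.array([0, 1, 1, 0])    # class 2 absent from this split
print(one_hot(y_test, k=3).shape)  # (4, 3), not (4, 2)
```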


###### Node2Vec

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
n2v = Encod.NodeToVec(case_10)
gnn = GNN(n2v)
acc = gnn.GNN_run()
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.3824999928474426
--- 2131.48596739769 seconds ---


#### Case 20

In [None]:
case_20 = case.case_20_fully_known()
case_20.summary()

name:

    DC-SBM with 3 classes and defined probabilities with fully known labels
    80% for training and 20% for testing
    
n:
<class 'int'>
2000
d:
<class 'int'>
3
X:
(2000, 2000)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Y:
(2000, 1)
[[ 1]
 [-1]
 [ 2]
 ...
 [ 2]
 [ 2]
 [-1]]


##### LDA

###### AEE

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
aee = Encod.AEE(case_20)
clf = LinearDiscriminantAnalysis()
clf.fit(aee.z_train, aee.y_train)
acc = clf.score(aee.z_test, aee.y_test)
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.8
--- 0.21722698211669922 seconds ---


###### Node2Vec

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
n2v = Encod.NodeToVec(case_20)
clf = LinearDiscriminantAnalysis()
clf.fit(n2v.z_train, n2v.y_train)
acc = clf.score(n2v.z_test, n2v.y_test)
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.435
--- 1268.2662296295166 seconds ---


##### GNN

###### AEE

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
aee = Encod.AEE(case_20)
gnn = GNN(aee)
acc = gnn.GNN_run()
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.8299999833106995
--- 9.782210350036621 seconds ---


###### Node2Vec

In [None]:
begin = time.time()
Encod = EncoderEmbedding()
n2v = Encod.NodeToVec(case_20)
gnn = GNN(n2v)
acc = gnn.GNN_run()
end = time.time()
print(acc)
print("--- %s seconds ---" % (end - begin))

0.4025000035762787
--- 1263.1786060333252 seconds ---
