# Supervised Graph Learning

**Supervised learning** represents the majority of practical **machine learning** tasks. Thanks to more active and effective data collection, it is imore common to deal with labeled datasets. This also applies to graphs, where labels can be assigned to nodes, communities and structures so that we learn some mapping function between the input and label.

Code available at: https://github.com/PacktPublishing/Graph-Machine-Learning/tree/main/Chapter04

In SL, a training set has a sequence of ordered pairs *(x, y)* where x is a set of input features and y is the output label assigned to it, we then want to learn the mapping function of each *x* value to each *y* value. In some situations we have a smaller dataset of labeled instances and a larger set of unlabeled instances. Here, **semi-SL (SSL)** is proposed, where algorithmns learn dependencies of available labels to learn prediting functions for unlabeled samples. There are various algorithm types:
-  feature-based methods
-  shallow embedding methods
-  regularisation methods
-  graph neural networks

![4_1](./figures/4_1.jpg)

## Feature-based methods

Simple and powerful method for ML on graphs: consider encoding function as a simple embedding lookup. One simple way to do this is to exploid graph properties, and we know that graphs can be described by (exploiting) structural properties so important information "encoding" from the graph itself. A shallow approach acts in two steps:
1.  Select a set of *good* descriptive graph properties (e.g. avg. degree length, global efficiency etc..)
2.  Use such properties as input or a traditional ML algorithm

Unfortunately, there is no general definition of *good* descriptive properties, and their choice strictly depends on the specific problem to solve. 

Steps: 
1.  Convert StellarGraph to numpy adj matrices (networkx) and convert labels from Pandas series to numpy array.
2.  Compute global metrics to describe each graph, e.g. num edges, avg. cluster coefficient, global efficiency (can compute graph metrics with networkx)
3.  Exploit sckkit-learn to create train and test sets
4.  Train a ML alg, choose support vector machine (SVM), trained to minimise the difference between the predicted labels and the actual labels

For StellarGraph PROTEINS dataset, achieve about 80% F1-score, quite good for naive task.

In [1]:
from stellargraph import datasets
from IPython.display import display, HTML

dataset = datasets.PROTEINS()
graphs, graph_labels = dataset.load()

In [2]:
# convert from StellarGraph to numpy adj matrices
adjs = [graph.to_adjacency_matrix().A for graph in graphs]

# convert labels from pd.series to np array
labels = graph_labels.to_numpy(dtype=int)

In [6]:
# compute global metrics to define each graph
import numpy as np
import networkx as nx

metrics = []
for adj in adjs:
    G = nx.from_numpy_matrix(adj)
    
    # basic properties
    num_edges = G.number_of_edges()
    
    # clustering measures
    cc = nx.average_clustering(G)
    
    # efficiency measure
    eff = nx.global_efficiency(G)
    
    metrics.append([num_edges, cc, eff])

In [7]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(metrics, labels, test_size=0.3, random_state=42)

In [9]:
X_train[0]

[116, 0.4690476190476191, 0.29735760384740045]

In [8]:
from sklearn import svm
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

clf = svm.SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Accuracy', accuracy_score(y_test, y_pred))
print('Precision', precision_score(y_test, y_pred))
print('Recall', recall_score(y_test, y_pred))
print('F1-score', f1_score(y_test, y_pred))

Accuracy 0.7514970059880239
Precision 0.7777777777777778
Recall 0.8413461538461539
F1-score 0.8083140877598153



## Shallow embedding methods

Subset of graph embedding methods that learn node, edge orgraph representation for a finite set of input data. Cannot be applied to other instances different from ones used to train the model. 

The main difference between unsupervised and supervised embedding models is the task they attempt to solve. If unsupervised shallow embedding algorithms try to learn a good graph/node/edge representation to build well defined clusters, supervised algorithms try to find the best solution for a prediction task, such as graph/node/edge classification.

We will see more supervised shallow embedding algorithms here.

## Label Propagation Algorithm
Used to solve node classification task, the algorithm propagates the label of a given node to its neighbours or to nodes having high probability of being reached from that node.

The only nonzero elements of the degree matrix are diagonal elements whose values represent the degree of the node represented by the row. Introduce transition matrix $L=$$D$<sup>$-1$</sup>=$A$ where l<sub>$ij$</sub>$\in L$ is the probability of reaching node $v_j$ from $v_i$. Probability of reaching an end node given a start node. 

Can see the probability of nodes being assigned labels. So we can perform *n* iterations, at each iteration *t*, the algorithm will compute the solution for that iteration:
$Y^t = LY$<sup>$t-1$</sup>
And stops when a certain condition is met.

However we can see the issues:
-  Possible to assigned only to nodes a probability associated with a label
-  The initial labels of values are different from the the one defined in $Y^0$ 
    -  Can solve by forcing labeled nodes to have initial class values instead of losing its own values
    
And the algorithm runs until we reach a certain number of iterations or hit a solution tolerance error. Here we may see error with fixing the value of Y0 for original, especially if there is a labelling error, that may propagate itself. So we change the algorithm to normalised Laplacian $L = D$<sup>$-1/2$</sup>$AD$<sup>$-1/2$</sup> and change our propagation agorithm to $Y^t = \alpha L Y$<sup>t-1</sup>$ + (1-\alpha)Y^0$ and stops when a certain conition is met. Here we have regulariser $\alpha \in [0,1]$ to weight influence of original solution at each iteration, imposing the "quality" of the original solution and its influence in the final solution.

## Graph regularisation methods
Topological information and relations between data points can be encoded and leveraged to build more robust classifiers. Using network information to constrain models and enforce smooth outputs within neighbouring nodes. This can also be used to regulate the learning phase to create more robust models that tend to generalise better to unseen examples. Both the label propagation and label spreading can be implemented as a cost function to be minimised with an added regularisation term.

A loss function that depends on labeled and unlabelled samples, with the second (on unlablled) term acting as a regularising term that depends on the topological information of graph *G*. This can be powerful as a tool to regularise the training of neural networks. 


## Manifold Regularisation and semi-supervised embedding
Manifold regularisation extends label propagation by parameterising the model function in reproducing kernel Hilbert space and using a supervised loss function, mean square error. So when training SVM or LSE, apply graph regularisation based on Laplacian matrix. Label propagation and label spreading can be seen as a special case of manifold regularisation. Besides, the algorithms can also be used in the case of no-label data. It can also be used for fully labeled datasets, or on unobserved samples making it an inductive model.

**Inductive model**: Can be used on unobserved samples and does not require test samples to belong to input graph.

**Manifold learning**: A shallow form of learning whereby the the parameterised function does not leverage on any form of intermediate embeddings

**Semi-supervised embedding**: Extends concepts of graph regularisation to deeper architectures by imposing the constraint and smoothness of function on intermediate layers of network

Depending on where the regularisation is imposed, we can have three different configurations:

-  Regularisation can be applied to final output of network.
-  Regularisation applied to inetermediate layers, regularising the embedding representation
-  Regularisation applied to an auxiliary network that shares first k-1 layers; corresponds to training an unsupervised embedding network while simultaneously training a supervised network. Imposes derived regularisation of first k-1 layers constrained by unsupervised network as well and simultaneously promotes an embedding of the network nodes.

We have loss functions that ensures embeddings of neighbouring nodes stay close. Non-neighbours are pulled apart to distance specified by threshold *m*. The best choice of the above depends on data. Be aware that embeddings in deeper layers are generally harder to be trained and require a careful tuning of learning rate and margins to be used. Also when using softmax (usually at output), hinge loss may not be appropriate or suited for log probabilities. In such a case, regularised embeddings and relative loss should instead be introduced at intermediate layers.

## Neural Graph Learning
Generalises previous formulations to make it possible to apply graph regularisation to any form of a NN. Can apply to any graph, natural or synthetic. We can also generate synthetic graphs with adversarial examples where samples are perturbed to maximise errors, allowing us to obtain models more robust against adversarially generated examples.

NGL extends regularisation by augmenting the tuning parameters for graph regularisation in NN's, decomposing the contribution of labeled-labeled, labeled-unlabeled and unlabeled-unlabeled relations with parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$.

Loosely speaking, NGL formulations can be seen as non-linear versions of label propagation and label spreading algorithms, or as a form of graph-regularised NN for which the manifold learning or semi-supervising embeddings can be obtained.

Here we will work on the **Cora** dataset, a labeled dataset of 2,708 scientific papers in comp-sci classified into seven classes. Each paper represents a node connected to other nodes based on citations. In total there are 5,429 links in the network. Furthermore, each node is described by a 1,433-long vector of binary values (0 or1) that represent a dicohotomic **bag-of-words (BOW)** representation of the paper; a one-hot encoding algorithm indicating the presence/absence of a word in a given vocabulary made up of 1,433 terms.

In [10]:
# download cora dataset
from stellargraph import datasets
dataset = datasets.Cora()
dataset.download()
# G is the citation network with network nodes, edges and features describing the BOW representation
# labels is a pd series providing the mapping between paper ID and one of the classes
G, labels = dataset.load()

In [14]:
import pandas as pd

# structure node features as dataframe
adjMatrix = pd.DataFrame.sparse.from_spmatrix(
    G.to_adjacency_matrix(),
    index=G.nodes(),
    columns=G.nodes()
)
# store node features as adjacency matrix
features = pd.DataFrame(G.node_features(), index=G.nodes())

In [18]:
adjMatrix.shape, features.shape

((2708, 2708), (2708, 1433))

In [23]:
def getNeighbours(idx, adjMatrix, topn=5):
    # helper fn to retrieve closest topn neighbours of a node
    weights = adjMatrix.loc[idx]
    neighbours = weights[weights>0].sort_values(ascending=False).head(topn)
    return [(k, v) for k, v in neighbours.items()]

In [24]:
topn = 5
label_index = {
      'Case_Based': 0,
      'Genetic_Algorithms': 1,
      'Neural_Networks': 2,
      'Probabilistic_Methods': 3,
      'Reinforcement_Learning': 4,
      'Rule_Learning': 5,
      'Theory': 6,
  }

# merge information into a single dataframe
dataset = {
    index: {
        'id': index,
        'words': [float(x) for x in features.loc[index].values],
        'label': label_index[label],
        'neighbours': getNeighbours(index, adjMatrix, topn)
    }
    for index, label in labels.items()
}
df = pd.DataFrame.from_dict(dataset, orient='index')

AttributeError: type object 'DataFrame' has no attribute 'frm_dict'

In [27]:
df.head(1)

Unnamed: 0,id,words,label,neighbours
31336,31336,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",2,"[(1129442, 1.0), (686532, 1.0), (10531, 1.0), ..."


In [31]:
GRAPH_PREFIX="NL_nbr"

def getFeatureOrDefault(ith, row):
    # define a function to retrieve and join the neighbourhood information
    # to join preceeding dataframe with information from neighbourhood
    try:
        nodeId, value = row["neighbours"][ith]
        return {
            f"{GRAPH_PREFIX}_{ith}_weight": value,
            f"{GRAPH_PREFIX}_{ith}_words": df.loc[nodeId]["words"]
        }
    except:
        # when neighboursare less than topn, set weight and one-hot encoding to 0
        return {
            f"{GRAPH_PREFIX}_{ith}_weight": 0.0,
            f"{GRAPH_PREFIX}_{ith}_words": [float(x) for x in np.zeros(1433)]
        }

In [29]:
def neighboursFeatures(row):
    featureList = [getFeatureOrDefault(ith, row) for ith in range(topn)]
    return pd.Series(
        {k: v for feat in featureList for k, v in feat.items()}
    )

In [32]:
neighbours = df.apply(neighboursFeatures, axis=1)
allFeatures = pd.concat([df, neighbours], axis=1)

In [34]:
allFeatures.head(1)

Unnamed: 0,id,words,label,neighbours,NL_nbr_0_weight,NL_nbr_0_words,NL_nbr_1_weight,NL_nbr_1_words,NL_nbr_2_weight,NL_nbr_2_words,NL_nbr_3_weight,NL_nbr_3_words,NL_nbr_4_weight,NL_nbr_4_words
31336,31336,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",2,"[(1129442, 1.0), (686532, 1.0), (10531, 1.0), ...",1.0,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."


In [37]:
from sklearn import model_selection

# split to train and test dataset
ratio = 0.2 # change amount of labeled vs unlabeled data pts

n = int(np.round(len(labels) * ratio))
labelled, unlabelled = model_selection.train_test_split(
    allFeatures, train_size=n, test_size=None, stratify=labels
)

In [40]:
import tensorflow as tf

train_base = {
    "words": tf.constant([
        tuple(x) for x in labelled["words"].values
    ]),
    "label": tf.constant([
        x for x in labelled["label"].values
    ])
}
train_neighbour_words = {
    k: tf.constant([tuple(x) for x in labelled[k].values]) for k in neighbours if "words" in k
}
train_neighbour_weights = {
    k: tf.constant([tuple(x) for x in labelled[k].values]) for k in neighbours if "weigt" in k
}

In [45]:
# train_base

In [46]:
# merge all information in tfds
trainSet = tf.data.Dataset.from_tensor_slices({
    k:v for feature in [train_base, train_neighbour_words, train_neighbour_weights] for k, v in feature.items()
})

In [49]:
validSet = tf.data.Dataset.from_tensor_slices({
    "words": tf.constant([tuple(x) for x in unlabelled["words"].values]),
    "label": tf.constant([x for x in unlabelled["label"].values])
})

In [51]:
def split(features):
    labels=features.pop("label")
    return features, labels

trainSet = trainSet.map(split)
validSet = validSet.map(split)

In [52]:
for features, labels in trainSet.batch(2).take(2):
    print(features)
    print(labels)

{'words': <tf.Tensor: shape=(2, 1433), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, 'NL_nbr_0_words': <tf.Tensor: shape=(2, 1433), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, 'NL_nbr_1_words': <tf.Tensor: shape=(2, 1433), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, 'NL_nbr_2_words': <tf.Tensor: shape=(2, 1433), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, 'NL_nbr_3_words': <tf.Tensor: shape=(2, 1433), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, 'NL_nbr_4_words': <tf.Tensor: shape=(2, 1433), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>}
tf.Tensor([2 2], shape=(2,), d

In [55]:
vocabularySize = 1433

inputs = tf.keras.Input(shape=(vocabularySize,), dtype='float32', name='words')
cur_layer = inputs
for num_units in [50, 50]:
    cur_layer = tf.keras.layers.Dense(num_units, activation='relu')(cur_layer)
    cur_layer = tf.keras.layers.Dropout(0.8)(cur_layer)
outputs = tf.keras.layers.Dense(len(label_index), activation='softmax', name='label')(cur_layer)
model = tf.keras.Model(inputs, outputs=outputs)

In [56]:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

In [58]:
model.summary()

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
words (InputLayer)           [(None, 1433)]            0         
_________________________________________________________________
dense_2 (Dense)              (None, 50)                71700     
_________________________________________________________________
dropout_2 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dropout_3 (Dropout)          (None, 50)                0         
_________________________________________________________________
label (Dense)                (None, 7)                 357       
Total params: 74,607
Trainable params: 74,607
Non-trainable params: 0
__________________________________________________

In [57]:
from tensorflow.keras.callbacks import TensorBoard
model.fit(
    trainSet.batch(128), epochs=200, verbose=1,
    validation_data=validSet.batch(128),
    callbacks=[TensorBoard(log_dir='/tmp/base')]
)

Epoch 1/200


  [n for n in tensors.keys() if n not in ref_input_names])


Instructions for updating:
use `tf.profiler.experimental.stop` instead.
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200


<tensorflow.python.keras.callbacks.History at 0x7fa818e133d0>

In [62]:
# now create a graph regularised version
import neural_structured_learning as nsl
graph_reg_config=nsl.configs.make_graph_reg_config(
    max_neighbors=2, # num neighbours to compute regularisation loss for each node
    multiplier=0.1, # coefficients that tune importance of regularisation loss
    distance_type=nsl.configs.DistanceType.L2, # pairwise distance d
    sum_over_axis=-1 # whether weighted average sum should be calculated WRT features or to samples
)
graph_reg = nsl.keras.GraphRegularization(model, graph_reg_config)

In [63]:
graph_reg.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
model.fit(
    trainSet.batch(128),
    epochs=200,
    verbose=1,
    validation_data=validSet.batch(128),
    callbacks=[TensorBoard(log_dir='/tmp/nsl')]
)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7fa81f8a6110>

Accuracy is better for regularised graph and we expect this to outperform vanilla graphs for a large number of epochs. Also note that performance increases as ratio of labeled data increases (more training data)/ ratio of supervised/unsupervised parts of our graph.

## Planetoid
Extend graph regularisation in order to account for higher-order proximities, we have Planetoid (Predicting labels and neighbours with emebddings trasductively or inductively from data), extends skip-gram for compute node embeddings to incorporate node-label information. Skip-gram methods are based on generating random walks through a graph then using the generated sequences to learn embeddings via a skip-gram model, we modify for supervised loss; where embeddings are fed to the following:
-  Softmax layer to predict graph context of sampled random-walk sequences
-  Set of hidden layers that combine together with hidden layers derived from node features to predict class labels

The cost function to be minimised is composed of supervised and unsupervised loss $L_s$, $L_u$ respectively. The unsupervised loss is analgous to the one used with skip-gram with negative sampling where the supervised loss minimises the conditional probability.

$L_s = -\sum\limits_{i \in L} logP(y_i | x_i, e_i)$

However this formulation is transductive as it requires samples belonging to the graph to be applied; in semi-supervised task this can efficiently be used to predict labels for unlabeled examples. However, cannot be used for unobserved samples. There is an inductive version of Planetoid by parameterising the embeddings as a function of the node features, via dedicated connected layers.

## Graph CNNs
Learn graph/node representations that can accurately predict node/graph labels. Note that the encoding function remains the same, what we change is the objective in a supervised setting!

In [23]:
import pandas as pd
from stellargraph import datasets # uses tf.Keras in backend
from IPython.display import display, HTML

dataset = datasets.PROTEINS()
display(HTML(dataset.description))
graphs, graph_labels = dataset.load()

labels = graph_labels.to_numpy(dtype=int)

# necessary for converting default string labels to int
graph_labels = pd.get_dummies(graph_labels, drop_first=True)

In [24]:
from stellargraph.mapper import PaddedGraphGenerator
# PaddedGraphGenerator automatically resolves differences in number of nodes by using padding
generator = PaddedGraphGenerator(graphs=graphs)

In [25]:
from stellargraph.layer import DeepGraphCNN
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Conv1D, MaxPool1D, Dropout, Flatten
from tensorflow.keras.losses import binary_crossentropy
import tensorflow as tf

nrows = 35  # the number of rows for the output tensor
layer_dims = [32, 32, 32, 1]

dgcnn_model = DeepGraphCNN(
    layer_sizes=layer_dims,
    activations=["tanh", "tanh", "tanh", "tanh"],
    k=nrows,
    bias=False,
    generator=generator,
)
gnn_inp, gnn_out = dgcnn_model.in_out_tensors()


x_out = Conv1D(filters=16, kernel_size=sum(layer_dims), strides=sum(layer_dims))(gnn_out)
x_out = MaxPool1D(pool_size=2)(x_out)

x_out = Conv1D(filters=32, kernel_size=5, strides=1)(x_out)

x_out = Flatten()(x_out)

x_out = Dense(units=128, activation="relu")(x_out)
x_out = Dropout(rate=0.5)(x_out)

predictions = Dense(units=1, activation="sigmoid")(x_out)

In [26]:
# concat backbone to one-dimensional (1D) convolutional layers

# necessary to connect backbone to head
gnn_inp, gnn_out = dgcnn_model.in_out_tensors()

# head part of model(classification)
x_out = Conv1D(filters=16, kernel_size=sum(layer_dims), strides=sum(layer_dims))(gnn_out)
x_out = MaxPool1D(pool_size=2)(x_out)
x_out = Conv1D(filters=32, kernel_size=5, strides=1)(x_out)
x_out = Flatten()(x_out)
x_out = Dense(units=128, activation="relu")(x_out)
x_out = Dropout(rate=0.5)(x_out)
predictions = Dense(units=1, activation="sigmoid")(x_out)

In [27]:
model = Model(inputs=gnn_inp, outputs=predictions)
model.compile(optimizer=Adam(lr=0.0001), loss=binary_crossentropy, metrics=["acc"])

In [28]:
model.summary()

Model: "functional_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_16 (InputLayer)           [(None, None, 4)]    0                                            
__________________________________________________________________________________________________
dropout_16 (Dropout)            (None, None, 4)      0           input_16[0][0]                   
__________________________________________________________________________________________________
input_18 (InputLayer)           [(None, None, None)] 0                                            
__________________________________________________________________________________________________
graph_convolution_12 (GraphConv (None, None, 32)     128         dropout_16[1][0]                 
                                                                 input_18[0][0]        

In [29]:
from sklearn import model_selection
train_graphs, test_graphs = model_selection.train_test_split(
    graph_labels, test_size=.3, stratify=labels,
)

gen = PaddedGraphGenerator(graphs=graphs)

train_gen = gen.flow(
    list(train_graphs.index - 1),
    targets=train_graphs.values,
    symmetric_normalization=False,
    batch_size=50,
)

test_gen = gen.flow(
    list(test_graphs.index - 1),
    targets=test_graphs.values,
    symmetric_normalization=False,
    batch_size=1,
)

In [31]:
epochs = 100
history = model.fit(train_gen, epochs=epochs, verbose=1, validation_data=test_gen, shuffle=True)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100

KeyboardInterrupt: 

## Node classification with GraphSage

In [32]:
dataset = datasets.Cora()
G, nodes = dataset.load()

In [33]:
train_nodes, test_nodes = train_test_split(nodes, train_size=0.1, test_size=None, stratify=nodes)

In [34]:
from sklearn import preprocessing

# convert using one-hot representation; often used for classification tasks and usually leads to better performance
label_encoding = preprocessing.LabelBinarizer()
train_labels = label_encoding.fit_transform(train_nodes)
test_labels = label_encoding.transform(test_nodes)

In [36]:
from stellargraph.mapper import GraphSAGENodeGenerator

batchsize = 50
n_samples = [10, 5, 7]
generator = GraphSAGENodeGenerator(G, batchsize, n_samples)

In [37]:
from stellargraph.layer import GraphSAGE
from tensorflow.keras.layers import Dense

graphsage_model = GraphSAGE(
    layer_sizes=[32, 32, 16], generator=generator, bias=True, dropout=0.6
)

In [38]:
gnn_inp, gnn_out = graphsage_model.in_out_tensors()
outputs = Dense(units=train_labels.shape[1], activation="softmax")(gnn_out)

In [40]:
from tensorflow.keras.losses import categorical_crossentropy
from keras.models import Model
from keras.optimizers import Adam

model = Model(inputs=gnn_inp, outputs=outputs)
model.compile(optimizer=Adam(lr=0.003), loss=categorical_crossentropy, metrics=["acc"])

Using TensorFlow backend.


In [41]:
# use flow function of generator for feeding model with train and test set
train_gen = generator.flow(train_nodes.index, train_labels, shuffle=True)
test_gen = generator.flow(test_nodes.index, test_labels)

In [42]:
history = model.fit(train_gen, epochs=20, validation_data=test_gen, verbose=2, shuffle=False)

Epoch 1/20
6/6 - 31s - loss: 1.9507 - acc: 0.2037 - val_loss: 1.8206 - val_acc: 0.3089
Epoch 2/20
6/6 - 30s - loss: 1.8350 - acc: 0.3111 - val_loss: 1.7468 - val_acc: 0.3175
Epoch 3/20
6/6 - 29s - loss: 1.8004 - acc: 0.3444 - val_loss: 1.6477 - val_acc: 0.4241
Epoch 4/20
6/6 - 29s - loss: 1.7028 - acc: 0.4333 - val_loss: 1.5628 - val_acc: 0.5509
Epoch 5/20
6/6 - 29s - loss: 1.6363 - acc: 0.5407 - val_loss: 1.4928 - val_acc: 0.5956
Epoch 6/20
6/6 - 30s - loss: 1.5909 - acc: 0.5556 - val_loss: 1.4140 - val_acc: 0.6341
Epoch 7/20
6/6 - 29s - loss: 1.5096 - acc: 0.5815 - val_loss: 1.3472 - val_acc: 0.6645
Epoch 8/20
6/6 - 30s - loss: 1.4286 - acc: 0.6481 - val_loss: 1.2907 - val_acc: 0.6743
Epoch 9/20
6/6 - 30s - loss: 1.3877 - acc: 0.6741 - val_loss: 1.2615 - val_acc: 0.6661
Epoch 10/20
6/6 - 29s - loss: 1.3119 - acc: 0.7111 - val_loss: 1.1991 - val_acc: 0.6973
Epoch 11/20
6/6 - 29s - loss: 1.2665 - acc: 0.7074 - val_loss: 1.1573 - val_acc: 0.7112
Epoch 12/20
6/6 - 29s - loss: 1.1916 - ac