# Supervised graph representation learning using Graph ConvNet

In this notebook we perform supervised graph representation learning, using a Deep Graph ConvNet (DGCNN) as the encoder.

The model embeds a graph using stacked Graph ConvNet layers.

In this demo we use the PROTEINS dataset, which is already integrated in StellarGraph.

In [None]:
import pandas as pd
from stellargraph import datasets
from IPython.display import display, HTML

dataset = datasets.PROTEINS()
display(HTML(dataset.description))
graphs, graph_labels = dataset.load()

# convert the default string labels to a single 0/1 integer column
graph_labels = pd.get_dummies(graph_labels, drop_first=True, dtype=int)

StellarGraph, the library we use to build the model, uses tf.Keras as its backend. Keras models are fed through data generators, so for supervised graph classification we create an instance of StellarGraph's PaddedGraphGenerator class. This generator supplies the feature arrays and the adjacency matrices to a mini-batch Keras graph classification model. Differences in the number of nodes are resolved by padding each batch of features and adjacency matrices, and supplying a boolean mask indicating which entries are valid and which are padding.

In [None]:
from stellargraph.mapper import PaddedGraphGenerator
generator = PaddedGraphGenerator(graphs=graphs)
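
The padding and masking described above can be illustrated with a small NumPy sketch. It is not StellarGraph's implementation: the function name `pad_batch` and the toy graphs are hypothetical and only show the idea of padding every graph in a batch to the largest node count while recording which rows are real.

```python
import numpy as np


def pad_batch(features, adjacencies):
    """Pad per-graph arrays to the size of the largest graph in the batch.

    features: list of (n_i, d) arrays, adjacencies: list of (n_i, n_i) arrays.
    Returns padded features (B, n_max, d), a boolean mask (B, n_max)
    and padded adjacency matrices (B, n_max, n_max).
    """
    n_max = max(f.shape[0] for f in features)
    d = features[0].shape[1]
    batch = len(features)

    x = np.zeros((batch, n_max, d), dtype=float)
    a = np.zeros((batch, n_max, n_max), dtype=float)
    mask = np.zeros((batch, n_max), dtype=bool)

    for i, (f, adj) in enumerate(zip(features, adjacencies)):
        n = f.shape[0]
        x[i, :n, :] = f      # real node features, zeros elsewhere
        a[i, :n, :n] = adj   # real edges, zeros elsewhere
        mask[i, :n] = True   # True marks valid (non-padding) nodes
    return x, mask, a


# Two toy graphs with 2 and 3 nodes and 4 features per node.
feats = [np.ones((2, 4)), np.ones((3, 4))]
adjs = [np.array([[0, 1], [1, 0]]), np.ones((3, 3)) - np.eye(3)]
x, mask, a = pad_batch(feats, adjs)
print(x.shape, a.shape)
print(mask)
```

The mask lets downstream layers ignore the zero rows that were added only to make the batch rectangular.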

Now we are ready to create the model. The GCN layers are created and stacked together through StellarGraph's DeepGraphCNN utility. This _backbone_ is then connected to 1D convolutional layers and fully connected layers using tf.Keras.

In [None]:
from stellargraph.layer import DeepGraphCNN
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Conv1D, MaxPool1D, Dropout, Flatten
from tensorflow.keras.losses import binary_crossentropy
import tensorflow as tf

k = 35  # the number of rows for the output tensor
layer_sizes = [32, 32, 32, 1]

dgcnn_model = DeepGraphCNN(
    layer_sizes=layer_sizes,
    activations=["tanh", "tanh", "tanh", "tanh"],
    k=k,
    bias=False,
    generator=generator,
)
x_inp, x_out = dgcnn_model.in_out_tensors()


x_out = Conv1D(filters=16, kernel_size=sum(layer_sizes), strides=sum(layer_sizes))(x_out)
x_out = MaxPool1D(pool_size=2)(x_out)

x_out = Conv1D(filters=32, kernel_size=5, strides=1)(x_out)

x_out = Flatten()(x_out)

x_out = Dense(units=128, activation="relu")(x_out)
x_out = Dropout(rate=0.5)(x_out)

predictions = Dense(units=1, activation="sigmoid")(x_out)
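
To see why the first `Conv1D` uses `kernel_size=sum(layer_sizes)` with the same stride, it helps to trace the tensor lengths through the head of the network. The sketch below is plain arithmetic, not a call into Keras. It assumes that the backbone output is a flattened sequence of `k * sum(layer_sizes)` values (the outputs of all GCN layers concatenated for each of the `k` retained nodes), which is what the kernel size in the cell above suggests. The helper name `conv1d_out_len` is hypothetical.

```python
def conv1d_out_len(length, kernel_size, strides):
    """Output length of a 1D convolution or pooling with 'valid' padding."""
    return (length - kernel_size) // strides + 1


k = 35
layer_sizes = [32, 32, 32, 1]
channels = sum(layer_sizes)  # 97 values per retained node

length = k * channels  # assumed flattened backbone output: 3395
after_conv1 = conv1d_out_len(length, channels, channels)  # one step per node
after_pool = conv1d_out_len(after_conv1, 2, 2)  # MaxPool1D(pool_size=2)
after_conv2 = conv1d_out_len(after_pool, 5, 1)  # Conv1D(kernel_size=5)
flattened = after_conv2 * 32  # Flatten over the 32 filters

print(after_conv1, after_pool, after_conv2, flattened)  # 35 17 13 416
```

Under this assumption the first convolution reads exactly one node's features per step, so its output has one row per retained node.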

Let's now compile the model.

In [None]:
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(learning_rate=0.0001), loss=binary_crossentropy, metrics=["acc"])

We use 70% of the dataset for training and the remaining 30% for testing.

In [None]:
from sklearn import model_selection

# Split the label table; its index identifies the graphs belonging to each split.
train_graphs, test_graphs = model_selection.train_test_split(
    graph_labels, test_size=0.3, stratify=graph_labels,
)
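
The `stratify` argument keeps the class proportions roughly equal in both splits. The following NumPy sketch shows the idea on toy labels. It is a simplified illustration, not scikit-learn's algorithm, and `stratified_split` is a hypothetical helper.

```python
import numpy as np


def stratified_split(labels, test_size, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in both."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)


# Toy labels: 60 graphs of class 0 and 40 graphs of class 1.
labels = np.array([0] * 60 + [1] * 40)
train_idx, test_idx = stratified_split(labels, test_size=0.3)
print(len(train_idx), len(test_idx))
print(labels[train_idx].mean(), labels[test_idx].mean())
```

Both splits keep the 60/40 class ratio of the full label set, which matters when the classes are imbalanced.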

In [None]:
gen = PaddedGraphGenerator(graphs=graphs)

# The label index is 1-based, so subtract 1 to get 0-based positions in `graphs`.
train_gen = gen.flow(
    list(train_graphs.index - 1),
    targets=train_graphs.values,
    batch_size=50,
    symmetric_normalization=False,
)

test_gen = gen.flow(
    list(test_graphs.index - 1),
    targets=test_graphs.values,
    batch_size=1,
    symmetric_normalization=False,
)
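
The `symmetric_normalization` flag selects how each adjacency matrix is normalized before it reaches the GCN layers. The sketch below contrasts the two options in NumPy, assuming that `False` means left-multiplying by the inverse degree matrix and `True` means multiplying on both sides by its inverse square root. Adding self-loops is also an assumption of this sketch, and `normalize_adjacency` is a hypothetical helper, not a StellarGraph function.

```python
import numpy as np


def normalize_adjacency(adj, symmetric):
    """Normalize an adjacency matrix by node degree."""
    adj = np.asarray(adj, dtype=float)
    adj = adj + np.eye(adj.shape[0])  # self-loops: an assumption of this sketch
    degree = adj.sum(axis=1)
    if symmetric:
        d_inv_sqrt = 1.0 / np.sqrt(degree)
        # D^{-1/2} A D^{-1/2}
        return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # D^{-1} A: every row sums to 1
    return adj / degree[:, None]


# Path graph 0 - 1 - 2
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
row_norm = normalize_adjacency(adj, symmetric=False)
sym_norm = normalize_adjacency(adj, symmetric=True)
print(row_norm)
print(sym_norm)
```

With row normalization each node averages over itself and its neighbours, so the result is generally not symmetric.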

It's now time for training!

In [None]:
epochs = 100
history = model.fit(
    train_gen, epochs=epochs, verbose=1, validation_data=test_gen, shuffle=True,
)
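
After training, `history.history` is a dictionary that maps each metric name to a list with one value per epoch. A small helper such as the hypothetical `best_epoch` below can locate the best epoch. The numbers in `example_history` are made up purely to show the shape of the data and are not results from this notebook.

```python
def best_epoch(history_dict, metric="val_acc", higher_is_better=True):
    """Return (1-based epoch, value) of the best entry for `metric`."""
    values = history_dict[metric]
    pick = max if higher_is_better else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up values with the same layout as `history.history` after three epochs.
example_history = {
    "loss": [0.69, 0.64, 0.61],
    "acc": [0.55, 0.63, 0.68],
    "val_loss": [0.68, 0.62, 0.63],
    "val_acc": [0.62, 0.71, 0.69],
}

print(best_epoch(example_history, "val_acc"))
print(best_epoch(example_history, "val_loss", higher_is_better=False))
```

In the notebook the same helper could be called as `best_epoch(history.history)`.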

## Supervised node representation learning using GraphSAGE

In [None]:
from stellargraph import datasets
from IPython.display import display, HTML

dataset = datasets.Cora()
display(HTML(dataset.description))
G, node_subjects = dataset.load()

Let's split the dataset into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split
train_subjects, test_subjects = train_test_split(
    node_subjects, train_size=0.1, test_size=None, stratify=node_subjects
)

Since this is a multi-class classification task, we represent each categorical label with its one-hot encoding.

In [None]:
from sklearn import preprocessing
target_encoding = preprocessing.LabelBinarizer()
train_targets = target_encoding.fit_transform(train_subjects)
test_targets = target_encoding.transform(test_subjects)
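
As a rough picture of what the encoder produces for a multi-class problem, the NumPy sketch below builds one-hot rows by hand. The helper `one_hot` and the toy labels are hypothetical. The important point is that the class order is learned from the training labels and then reused for the test labels.

```python
import numpy as np


def one_hot(labels, classes=None):
    """One-hot encode `labels`; learn the class order if none is given."""
    if classes is None:
        classes = sorted(set(labels))
    position = {c: i for i, c in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=int)
    for row, label in enumerate(labels):
        encoded[row, position[label]] = 1
    return encoded, classes


# Toy labels for illustration only.
train_labels = ["Theory", "Neural_Networks", "Theory", "Case_Based"]
test_labels = ["Case_Based", "Theory"]

train_enc, classes = one_hot(train_labels)       # "fit" on the training labels
test_enc, _ = one_hot(test_labels, classes)      # reuse the same column order
print(classes)
print(train_enc)
print(test_enc)
```

Reusing the training class order keeps each output column tied to the same class in both sets.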

It's now time to create the model. It is composed of three GraphSAGE layers, one for each entry of `num_samples` and `layer_sizes` below, followed by a Dense layer with softmax activation for classification.

In [None]:
from stellargraph.mapper import GraphSAGENodeGenerator
batch_size = 50
num_samples = [10, 5, 7]
generator = GraphSAGENodeGenerator(G, batch_size, num_samples)
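
Each entry of `num_samples` is the number of neighbours sampled per node at one hop, so the sampled neighbourhood grows multiplicatively with depth. The pure-Python sketch below illustrates this on a toy adjacency list. Sampling with replacement is an assumption made here to keep the layer sizes fixed, and `sample_neighbourhood` is a hypothetical helper rather than the generator's actual code.

```python
import random


def sample_neighbourhood(adjacency, head_node, num_samples, rng):
    """Return one list of sampled node ids per hop, starting with [head_node]."""
    layers = [[head_node]]
    for n in num_samples:
        next_layer = []
        for node in layers[-1]:
            # Sample n neighbours with replacement so every node contributes n ids.
            next_layer.extend(rng.choices(adjacency[node], k=n))
        layers.append(next_layer)
    return layers


# Toy graph as an adjacency list (every node has at least one neighbour).
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
layers = sample_neighbourhood(adjacency, 0, [10, 5, 7], random.Random(0))
print([len(layer) for layer in layers])  # [1, 10, 50, 350]
```

With `num_samples = [10, 5, 7]`, a single head node therefore touches 10, then 50, then 350 sampled nodes across the three hops.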

In [None]:
from stellargraph.layer import GraphSAGE
from tensorflow.keras.layers import Dense

graphsage_model = GraphSAGE(
    layer_sizes=[32, 32, 16], generator=generator, bias=True, dropout=0.6,
)
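
Each GraphSAGE layer combines a node's own features with an aggregate of its sampled neighbours' features. The NumPy sketch below shows a mean aggregator in which the two transformed parts are concatenated. This concatenating variant, the ReLU activation and the random weights are assumptions for illustration. Bias and dropout are omitted, and `mean_aggregate` is a hypothetical helper.

```python
import numpy as np


def mean_aggregate(h_self, h_neigh, w_self, w_neigh):
    """One mean-aggregator step for a single node.

    h_self: (d_in,) features of the node itself.
    h_neigh: (n, d_in) features of its n sampled neighbours.
    w_self, w_neigh: (d_in, d_out // 2) weight matrices.
    """
    neigh_mean = h_neigh.mean(axis=0)  # average over the sampled neighbours
    out = np.concatenate([h_self @ w_self, neigh_mean @ w_neigh])
    return np.maximum(out, 0.0)  # ReLU


rng = np.random.default_rng(0)
d_in, d_out, n_neigh = 4, 32, 10
h_self = rng.normal(size=d_in)
h_neigh = rng.normal(size=(n_neigh, d_in))
w_self = rng.normal(size=(d_in, d_out // 2))
w_neigh = rng.normal(size=(d_in, d_out // 2))

h_new = mean_aggregate(h_self, h_neigh, w_self, w_neigh)
print(h_new.shape)
```

Stacking three such steps, one per hop, lets information from nodes up to three hops away reach the head node's final embedding.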

In [None]:
x_inp, x_out = graphsage_model.in_out_tensors()
prediction = Dense(units=train_targets.shape[1], activation="softmax")(x_out)

In [None]:
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam

model = Model(inputs=x_inp, outputs=prediction)
model.compile(
    optimizer=Adam(learning_rate=0.003), loss=categorical_crossentropy, metrics=["acc"],
)
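
The softmax output and the categorical cross-entropy loss fit together as follows. This is a minimal NumPy sketch of the standard formulas, not the Keras implementation. The function names and the clipping constant `eps` are choices made for this example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


# Two nodes, seven classes (Cora has seven subjects).
logits = np.zeros((2, 7))
logits[1, 3] = 5.0  # the second node strongly favours class 3
probs = softmax(logits)

targets = np.zeros((2, 7))
targets[0, 0] = 1.0
targets[1, 3] = 1.0

print(probs.sum(axis=1))
print(categorical_crossentropy_np(targets, probs))
```

A uniform prediction over seven classes costs log(7) per node, while a confident correct prediction costs close to zero.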

We use the generator's flow function to feed the model with the training and test sets.

In [None]:
train_gen = generator.flow(train_subjects.index, train_targets, shuffle=True)
test_gen = generator.flow(test_subjects.index, test_targets)

Finally, let's train the model!

In [None]:
history = model.fit(train_gen, epochs=20, validation_data=test_gen, verbose=2, shuffle=False)