------

<div> 
    <center><h5>Higher Order Tutorial on Deep Learning</h5></center>
    <center><strong><h2>Graph Convolution Networks</h2></strong></center>
    <center><strong><h3>1.0.0 - Graph Classification</h3></strong></center> 
</div>

------

### Keras DGL - Graph Classification:
##  `tl;dr:  MultiGraphCNN(output_dim, num_filters)([X,Adj])`

Importing: 
```python
from keras_dgl.layers import MultiGraphCNN
```

Used just like any Keras layer:
```python
output = MultiGraphCNN(100, num_filters, activation='elu')([X, Adj])
output = MultiGraphCNN(100, num_filters, activation='elu')([output, Adj])
output = Lambda(lambda x: K.mean(x, axis=1))(output)  
```

------

# Graph Neural Networks


Mathematically, the GCN model follows this formula:

$H^{(l+1)} = \sigma(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)})$

Here, $H^{(l)}$ denotes the node representations at the $l^{th}$ layer of the network,
$\sigma$ is the non-linearity, and $W^{(l)}$ is the weight matrix for
this layer. $D$ and $A$ are, as usual, the degree
matrix and the adjacency matrix. The tilde denotes a renormalization trick:
a self-connection is added to each node of the graph, so $\tilde{A} = A + I$,
and $\tilde{D}$ is the degree matrix of $\tilde{A}$. The input
$H^{(0)}$ has shape $N \times D$, where $N$ is the number of nodes
and $D$ (overloaded here, not the degree matrix) is the number of input features.
Chaining several such layers produces a node-level representation with shape
$N \times F$, where $F$ is the dimension of the output node
feature vector.
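
The propagation rule can be sketched in a few lines of NumPy. This is a minimal dense illustration, not the code used by Keras DGL; the function name `gcn_layer`, the toy path graph, and the choice of `tanh` as the non-linearity are assumptions made for the example.

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN propagation step: sigma(D~^-1/2 A~ D~^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])       # add a self-connection to every node
    deg = A_tilde.sum(axis=1)              # degrees of the graph with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)        # every degree is >= 1, so this is safe
    # Symmetric normalization: scale row i and column j by deg^-1/2.
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return activation(A_hat @ H @ W)

# Toy path graph 0 - 1 - 2: N = 3 nodes, 4 input features, F = 2 output features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(0)
H0 = rng.normal(size=(3, 4))
W0 = rng.normal(size=(4, 2))

H1 = gcn_layer(A, H0, W0)
print(H1.shape)  # (3, 2)
```

Stacking two calls to `gcn_layer` with compatible weight shapes gives the chained, multi-layer form described above.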

The equation can be implemented efficiently with sparse matrix
multiplication kernels (as in Kipf's
`https://github.com/tkipf/pygcn`). The DGL implementation
already uses this trick through its built-in functions. To
understand what happens under the hood, please read the PageRank tutorial in this repository.

__References__: <br />
[1] Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016). <br />
[2] Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. "Convolutional neural networks on graphs with fast localized spectral filtering." In Advances in Neural Information Processing Systems, pp. 3844-3852. 2016. <br />
[3] Simonovsky, Martin, and Nikos Komodakis. "Dynamic edge-conditioned filters in convolutional neural networks on graphs." In Proc. CVPR. 2017. <br />

In [1]:
%%bash
if [ ! -d "keras-deep-graph-learning" ] ; then git clone https://github.com/ypeleg/keras-deep-graph-learning; fi

In [2]:
from tachles import fix_gcn_paths, load_mutag

Using TensorFlow backend.


In [3]:
fix_gcn_paths()
import keras_dgl
from keras_dgl.layers import MultiGraphCNN, MultiGraphAttentionCNN
from examples.utils import normalize_adj_numpy, evaluate_preds, preprocess_edge_adj_tensor

## The MUTAG Dataset

The MUTAG dataset is a publicly distributed baseline dataset for graph learning. It contains information about 340 complex molecules that are potentially carcinogenic, as indicated by the isMutagenic property. Note that the version loaded below contains 188 graphs, as the printed shapes show.

The molecules can be classified as “mutagenic” or “not mutagenic”.

In [4]:
A, A_orig, X, Y, num_edge_features, num_graph_nodes, num_graphs, orig_num_graph_nodes, orig_num_graphs = load_mutag()
print(X.shape, Y.shape, A.shape)

(188, 28, 7) (188, 2) (188, 308, 28)


In [5]:
import keras.backend as K
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

from keras.layers import Dense, Activation, Dropout, Input, Lambda
from keras.models import Model, Sequential
from keras.callbacks import Callback
from keras.regularizers import l2
from keras.optimizers import Adam

In [6]:
def plot_graph(adjacency_matrix):
    """Draw a single graph given its 2D adjacency matrix."""
    # Every entry equal to 1 marks an edge between node `row` and node `col`.
    rows, cols = np.where(adjacency_matrix == 1)
    edges = zip(rows.tolist(), cols.tolist())
    gr = nx.Graph()
    gr.add_edges_from(edges)
    fig, ax = plt.subplots(1, 1, figsize=(6, 6))
    nx.draw_networkx(gr, ax=ax, with_labels=False, node_size=5, width=.5)
    ax.set_axis_off()
    plt.show()
    plt.close()

In [7]:
print(X[0])
# plot_graph(A_orig[0])  # plot_graph expects the 2D adjacency matrix of one graph

[[0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 0 0 1 0]
 [0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]]
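
The printed matrix can be read as 13 one-hot rows followed by all-zero rows. The sketch below rebuilds it and separates the two groups. Reading the zero rows as padding up to 28 nodes and the one-hot column as a node label is an assumption based on the printout, and the names `label_index` and `is_real_node` are introduced only for this example.

```python
import numpy as np

# Rebuild the 28 x 7 matrix printed above: 13 one-hot rows, then zero rows.
label_index = [2] * 10 + [5, 6, 6]       # column holding the 1 in each non-zero row
X0 = np.zeros((28, 7), dtype=int)
X0[np.arange(len(label_index)), label_index] = 1

is_real_node = X0.sum(axis=1) > 0        # all-zero rows are assumed to be padding
num_real_nodes = int(is_real_node.sum())
node_labels = X0[is_real_node].argmax(axis=1)

print(num_real_nodes)                    # 13
print(node_labels.tolist())              # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 6, 6]
```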


----

<span style="float:right;">[[source]](https://github.com/vermaMachineLearning/keras-deep-graph-learning/blob/master/keras_dgl/layers/multi_graph_cnn_layer.py#L9)</span>
## MultiGraphCNN

```python
MultiGraphCNN(output_dim, num_filters, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

MultiGraphCNN assumes that every graph in the dataset has the same number of nodes. For graphs of arbitrary size, append zero rows and columns to each adjacency matrix (and zero rows to each node feature matrix) up to the maximum graph size in the dataset to achieve this uniformity.
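
The zero-padding described above might look like the following NumPy sketch. The helper `pad_graph` and the two toy graphs are hypothetical and are not part of Keras DGL.

```python
import numpy as np

def pad_graph(adj, feats, max_nodes):
    """Zero-pad one graph's adjacency (n x n) and features (n x d) to max_nodes."""
    extra = max_nodes - adj.shape[0]
    adj_padded = np.pad(adj, ((0, extra), (0, extra)), mode="constant")
    feats_padded = np.pad(feats, ((0, extra), (0, 0)), mode="constant")
    return adj_padded, feats_padded

# Two toy graphs with 2 and 3 nodes, each node carrying 2 features.
adj_a, feats_a = np.array([[0, 1], [1, 0]]), np.eye(2)
adj_b, feats_b = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]), np.ones((3, 2))

graphs = [(adj_a, feats_a), (adj_b, feats_b)]
max_nodes = max(adj.shape[0] for adj, _ in graphs)
padded = [pad_graph(adj, feats, max_nodes) for adj, feats in graphs]

A_batch = np.stack([p[0] for p in padded])   # (num_graphs, max_nodes, max_nodes)
X_batch = np.stack([p[1] for p in padded])   # (num_graphs, max_nodes, input_dim)
print(A_batch.shape, X_batch.shape)          # (2, 3, 3) (2, 3, 2)
```

Padded nodes have no edges and all-zero features, which is how they appear in the `X[0]` printout earlier.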

__Arguments__

- __output_dim__: Positive integer, dimensionality of each graph node's output feature space (also referred to as the dimension of the graph node embedding).
- __num_filters__: Positive integer, number of graph filters used for constructing the __graph_conv_filters__ input.
- __activation__: Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: `a(x) = x`).
- __use_bias__: Boolean, whether the layer uses a bias vector.
- __kernel_initializer__: Initializer for the `kernel` weights matrix.
- __bias_initializer__: Initializer for the bias vector.
- __kernel_regularizer__: Regularizer function applied to the `kernel` weights matrix.
- __bias_regularizer__: Regularizer function applied to the bias vector.
- __activity_regularizer__: Regularizer function applied to the output of the layer (its "activation").
- __kernel_constraint__: Constraint function applied to the kernel matrix.
- __bias_constraint__: Constraint function applied to the bias vector.

__Input shapes__

* __graph node feature matrix__ input as a 3D tensor with shape: `(batch_size, num_graph_nodes, input_dim)`, holding the node input feature matrix of each graph.<br />
* __graph_conv_filters__ input as a 3D tensor with shape: `(batch_size, num_filters*num_graph_nodes, num_graph_nodes)` <br />
`num_filters` is the number of different graph convolution filters applied to each graph. For instance, the filters could be powers of the graph Laplacian.<br />

__Output shape__

* 3D tensor with shape: `(batch_size, num_graph_nodes, output_dim)` representing the convolved output node embedding matrix for each graph in the batch.<br />
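
To make the shapes concrete, here is a dense NumPy sketch that stacks powers of a graph Laplacian into a `(num_filters*num_graph_nodes, num_graph_nodes)` filter matrix and then applies it. The way the filter outputs are combined (concatenated per node, then projected by a kernel of shape `(num_filters*input_dim, output_dim)`) is an assumption chosen to match the documented shapes, not a copy of the library code; `stack_filters` and `multi_graph_conv` are hypothetical names.

```python
import numpy as np

def stack_filters(L, num_filters):
    """Stack L^0, L^1, ..., L^(num_filters-1) into a (num_filters * N, N) matrix."""
    powers = [np.linalg.matrix_power(L, k) for k in range(num_filters)]
    return np.concatenate(powers, axis=0)

def multi_graph_conv(X, filters, kernel, num_filters):
    """Apply every filter, concatenate the results per node, then project."""
    batch, n, d = X.shape
    conv = filters @ X                                # (batch, F * N, D)
    conv = conv.reshape(batch, num_filters, n, d)     # split into the F filter blocks
    conv = conv.transpose(0, 2, 1, 3).reshape(batch, n, num_filters * d)
    return conv @ kernel                              # (batch, N, output_dim)

# Toy path graph 0 - 1 - 2 and its unnormalized Laplacian L = D - A.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A

num_filters, input_dim, output_dim = 2, 4, 5
filters = np.stack([stack_filters(L, num_filters)] * 2)   # batch of 2 identical graphs
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, input_dim))
kernel = rng.normal(size=(num_filters * input_dim, output_dim))

out = multi_graph_conv(X, filters, kernel, num_filters)
print(filters.shape, out.shape)   # (2, 6, 3) (2, 3, 5)
```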



<span style="float:right;">[[source]](https://github.com/vermaMachineLearning/keras-deep-graph-learning/blob/master/examples/multi_gcnn_graph_classification_example.py)</span>

## The model itself

In [8]:
num_filters = num_edge_features
graph_conv_filters = preprocess_edge_adj_tensor(A, symmetric=True)

In [9]:
X_input = Input(shape=(X.shape[1], X.shape[2]))
graph_conv_filters_input = Input(shape=(graph_conv_filters.shape[1], graph_conv_filters.shape[2]))

output = MultiGraphCNN(100, num_filters, activation='elu')([X_input, graph_conv_filters_input])
output = Dropout(0.2)(output)
output = MultiGraphCNN(100, num_filters, activation='elu')([output, graph_conv_filters_input])
output = Dropout(0.2)(output)
output = Lambda(lambda x: K.mean(x, axis=1))(output)  
output = Dense(Y.shape[1])(output)
output = Activation('softmax')(output)

nb_epochs = 200
batch_size = 169

model = Model(inputs=[X_input, graph_conv_filters_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 28, 7)        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 308, 28)      0                                            
__________________________________________________________________________________________________
multi_graph_cnn_1 (MultiGraphCN (None, 28, 100)      7800        input_1[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 28, 100)      0           multi_graph_cnn_1[0][0]          
__________
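
The parameter count of the first `MultiGraphCNN` layer in the summary above can be reproduced by hand. The sketch assumes the layer holds one kernel of shape `(num_filters * input_dim, output_dim)` plus a bias vector; that shape is an inference from the reported number, not something stated in the layer documentation.

```python
input_dim = 7                      # from input_1: (None, 28, 7)
num_nodes = 28
num_filters = 308 // num_nodes     # input_2 is (None, 308, 28), i.e. 11 stacked filters
output_dim = 100

# Assumed kernel shape: (num_filters * input_dim, output_dim), plus one bias per output.
kernel_params = num_filters * input_dim * output_dim
bias_params = output_dim
total = kernel_params + bias_params

print(num_filters, total)          # 11 7800
```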

In [None]:
model.fit([X, graph_conv_filters], Y, batch_size=batch_size, validation_split=0.1, epochs=nb_epochs, shuffle=True, verbose=1)

Train on 169 samples, validate on 19 samples
Epoch 1/200
...
Epoch 104/200
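
The "169 samples / 19 samples" split reported at the start of the training log follows from `validation_split=0.1` on 188 graphs. The arithmetic below is a sketch that assumes Keras truncates the split point to an integer and holds out the last samples of the arrays for validation.

```python
num_graphs = 188
validation_split = 0.1

# Assumed split rule: the first `split_at` samples train, the rest validate.
split_at = int(num_graphs * (1.0 - validation_split))
num_train = split_at
num_val = num_graphs - split_at
print(num_train, num_val)          # 169 19

batch_size = 169
steps_per_epoch = -(-num_train // batch_size)   # ceiling division
print(steps_per_epoch)             # 1
```

With `batch_size = 169`, each epoch is therefore a single full-batch gradient step over the training graphs.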


## Your Turn!
### Run the same experiment, this time with the attention graph CNN!

## MultiGraphAttentionCNN

```python
MultiGraphCNN(output_dim, num_filters, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

The signature above is that of MultiGraphCNN, whose arguments are repeated here for MultiGraphAttentionCNN; in the example below the attention layer additionally takes `num_attention_heads`, `attention_combine`, and `attention_dropout`. Like MultiGraphCNN, it assumes that every graph in the dataset has the same number of nodes. For graphs of arbitrary size, append zero rows and columns to each adjacency matrix (and zero rows to each node feature matrix) up to the maximum graph size in the dataset to achieve this uniformity.

__Arguments__

- __output_dim__: Positive integer, dimensionality of each graph node's output feature space (also referred to as the dimension of the graph node embedding).
- __num_filters__: Positive integer, number of graph filters used for constructing the __graph_conv_filters__ input.
- __activation__: Activation function to use. If you don't specify anything, no activation is applied.
- __use_bias__: Boolean, whether the layer uses a bias vector.
- __kernel_initializer__: Initializer for the `kernel` weights matrix.
- __bias_initializer__: Initializer for the bias vector.
- __kernel_regularizer__: Regularizer function applied to the `kernel` weights matrix.
- __bias_regularizer__: Regularizer function applied to the bias vector.
- __activity_regularizer__: Regularizer function applied to the output of the layer (its "activation").
- __kernel_constraint__: Constraint function applied to the kernel matrix.
- __bias_constraint__: Constraint function applied to the bias vector.

__Input shapes__

* __graph node feature matrix__ input as a 3D tensor with shape: `(batch_size, num_graph_nodes, input_dim)`, holding the node input feature matrix of each graph.<br />
* __adjacency matrix__ input as a 3D tensor with shape: `(batch_size, num_graph_nodes, num_graph_nodes)`, passed as the second input in the example below.<br />
* __graph_conv_filters__ input as a 3D tensor with shape: `(batch_size, num_filters*num_graph_nodes, num_graph_nodes)` <br />
`num_filters` is the number of different graph convolution filters applied to each graph. For instance, the filters could be powers of the graph Laplacian.<br />

__Output shape__

* 3D tensor with shape: `(batch_size, num_graph_nodes, output_dim)` representing the convolved output node embedding matrix for each graph in the batch.<br />



<span style="float:right;">[[source]](https://github.com/vermaMachineLearning/keras-deep-graph-learning/blob/master/examples/multi_graph_attention_cnn_graph_classification_example.py)</span>

In [None]:
num_filters = 2
print(A.shape)

In [None]:
A_eye_tensor = []
for _ in range(orig_num_graphs):
    Identity_matrix = np.eye(orig_num_graph_nodes)
    A_eye_tensor.append(Identity_matrix)

A_eye_tensor = np.array(A_eye_tensor)
A_orig = np.add(A_orig, A_eye_tensor)
graph_conv_filters = preprocess_edge_adj_tensor(A_orig, symmetric=True)
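
The loop that builds `A_eye_tensor` adds a self-loop to every node of every graph. As a side note, NumPy broadcasting gives the same result without the explicit tensor of identity matrices. The toy tensor `A_toy` below is a stand-in for `A_orig` and is generated randomly only to check the equivalence.

```python
import numpy as np

num_graphs, num_nodes = 4, 3
rng = np.random.default_rng(0)
A_toy = rng.integers(0, 2, size=(num_graphs, num_nodes, num_nodes)).astype(float)

# Loop-based version, as in the cell above.
eye_tensor = np.array([np.eye(num_nodes) for _ in range(num_graphs)])
with_loops_a = np.add(A_toy, eye_tensor)

# Broadcasting version: the (N, N) identity is added to each graph in the batch.
with_loops_b = A_toy + np.eye(num_nodes)

print(np.array_equal(with_loops_a, with_loops_b))   # True
```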

In [None]:
# build model
X_input = Input(shape=(X.shape[1], X.shape[2]))
A_input = Input(shape=(A_orig.shape[1], A_orig.shape[2]))
graph_conv_filters_input = Input(shape=(graph_conv_filters.shape[1], graph_conv_filters.shape[2]))

output = MultiGraphAttentionCNN(100, num_filters=num_filters, num_attention_heads=2, attention_combine='concat', attention_dropout=0.5, activation='elu', kernel_regularizer=l2(5e-4))([X_input, A_input, graph_conv_filters_input])
output = Dropout(0.2)(output)
output = MultiGraphAttentionCNN(100, num_filters=num_filters, num_attention_heads=1, attention_combine='average', attention_dropout=0.5, activation='elu', kernel_regularizer=l2(5e-4))([output, A_input, graph_conv_filters_input])
output = Dropout(0.2)(output)
output = Lambda(lambda x: K.mean(x, axis=1))(output)  # node-invariant pooling: the output does not depend on the node order within a graph
output = Dense(Y.shape[1], activation='elu')(output)
output = Activation('softmax')(output)

nb_epochs = 500
batch_size = 169

model = Model(inputs=[X_input, A_input, graph_conv_filters_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit([X, A_orig, graph_conv_filters], Y, batch_size=batch_size, validation_split=0.1, epochs=nb_epochs, shuffle=True, verbose=1)
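
The `Lambda` layer in both models averages the node embeddings of each graph. The NumPy sketch below, using random stand-in embeddings, illustrates why this makes the graph-level output independent of node order: permuting the node axis leaves the mean unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
node_embeddings = rng.normal(size=(2, 5, 3))     # (batch, nodes, features)

perm = rng.permutation(5)                        # an arbitrary reordering of the nodes
shuffled = node_embeddings[:, perm, :]

pooled = node_embeddings.mean(axis=1)            # same reduction as K.mean(x, axis=1)
pooled_shuffled = shuffled.mean(axis=1)

print(pooled.shape)                              # (2, 3)
print(np.allclose(pooled, pooled_shuffled))      # True
```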