# GNN Presentation
------------------------------------------------------
# Authors
    - Selim Lakhdar
        - selim.lakhdar.etu@univ-lille.fr
    - Josue Happe
        - josue.hape.etu@univ-lille.fr
------------------------------------------------------

# Part I
- The first example is inspired by the Graph Random Neural Network (GRAND) example provided with the DGL library.
    - https://github.com/dmlc/dgl/tree/master/examples/pytorch/grand
    - https://arxiv.org/abs/2005.11079
    
### Abstract
We study the problem of semi-supervised learning on graphs, for which graph neural networks (GNNs) have been extensively explored. However, most existing GNNs inherently suffer from the limitations of over-smoothing, non-robustness, and weak-generalization when labeled nodes are scarce. In this paper, we propose a simple yet effective framework -- GRAPH RANDOM NEURAL NETWORKS (GRAND) -- to address these issues. In GRAND, we first design a random propagation strategy to perform graph data augmentation. Then we leverage consistency regularization to optimize the prediction consistency of unlabeled nodes across different data augmentations. Extensive experiments on graph benchmark datasets suggest that GRAND significantly outperforms state-of-the-art GNN baselines on semi-supervised node classification. Finally, we show that GRAND mitigates the issues of over-smoothing and non-robustness, exhibiting better generalization behavior than existing GNNs. The source code of GRAND is publicly available at https://github.com/THUDM/GRAND 

----------------------------------------------------------


# Part II
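- In the second part, we tried a GraphConv model on CoauthorCSDataset. The goal is to predict the category of each node given labels for only some of the nodes. As the dataset description below states, each node is an author, its features are a word-count vector built from the keywords of that author's papers, and its label is the author's most active field of study.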
```
Coauthor CS and Coauthor Physics are co-authorship graphs based on the Microsoft Academic Graph from the KDD Cup 2016 challenge. Here, nodes are authors, that are connected by an edge if they co-authored a paper; node features represent paper keywords for each author’s papers, and class labels indicate most active fields of study for each author.
```
    - Statistics:
        - Nodes: 18,333
        - Edges: 327,576
        - Number of classes: 15
        - Node feature size: 6,805
----------------------------------------------------------


# Part III
- In the third part, we performed link prediction on CoraGraphDataset.
```
Cora citation network dataset : Nodes mean paper and edges mean citation relationships. Each node has a predefined feature with 1433 dimensions. The dataset is designed for the node classification task. The task is to predict the category of certain paper.
```
    - Statistics:
        - Nodes: 2708
        - Edges: 10556
        - Number of Classes: 7
        
----------------------------------------------------------
----------------------------------------------------------

# Imports

In [1]:
import dgl
from dgl.data import CoraGraphDataset, CiteseerGraphDataset, PubmedGraphDataset
import dgl.function as fn

import torch as th
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import numpy as np
import itertools
import scipy.sparse as sp

if th.cuda.is_available():
    device = 'cuda:0'
else:
    device = 'cpu'
print("device:", device)
# Force CPU for the rest of the notebook, even when CUDA was detected above
device = "cpu"

Using backend: pytorch


device: cuda:0


In [2]:
dgl.__version__=="0.7.2"

True

# Datasets

In [3]:
datasets = {
    'CoraGraphDataset': CoraGraphDataset(),
    'CiteseerGraphDataset': CiteseerGraphDataset(),
    'PubmedGraphDataset': PubmedGraphDataset()
}

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
  NumNodes: 3327
  NumEdges: 9228
  NumFeats: 3703
  NumClasses: 6
  NumTrainingSamples: 120
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
  NumNodes: 19717
  NumEdges: 88651
  NumFeats: 500
  NumClasses: 3
  NumTrainingSamples: 60
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.


# Part I
    - https://arxiv.org/pdf/2005.11079.pdf 

```
To effectively augment graph data, we propose random propagation in GRAND, wherein each node’s
features can be randomly dropped either partially (dropout) or entirely, after which the perturbed
feature matrix is propagated over the graph. As a result, each node is enabled to be insensitive
to specific neighborhoods, increasing the robustness of GRAND. Further, the design of random
propagation can naturally separate feature propagation and transformation, which are commonly
coupled with each other in most GNNs. This empowers GRAND to safely perform higher-order feature
propagation without increasing the complexity, reducing the risk of over-smoothing for GRAND. More
importantly, random propagation enables each node to randomly pass messages to its neighborhoods.
Under the assumption of homophily of graph data [30], we are able to stochastically generate different
augmented representations for each node. We then utilize consistency regularization to enforce the
prediction model, e.g., a simple Multilayer Perceptron (MLP), to output similar predictions on
different augmentations of the same unlabeled data, improving GRAND’s generalization behavior
under the semi-supervised setting.
Finally, we theoretically illustrate that random propagation and consistency regularization can enforce
the consistency of classification confidence between each node and its multi-hop neighborhoods.
Empirically, we also show both strategies can improve the generalization of GRAND, and mitigate the
issues of non-robustness and over-smoothing that are commonly faced by existing GNNs. Altogether,
extensive experiments demonstrate that GRAND achieves state-of-the-art semi-supervised learning
results on GNN benchmark datasets.
```
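
The DropNode step described above can be sketched in a few lines of NumPy. This is only an illustration, not the DGL implementation: the function name `drop_node_np` and the toy feature matrix are assumptions, and the rescaling at inference time follows the convention of the `drop_node` function defined in this notebook.

```python
import numpy as np

def drop_node_np(feats, drop_rate, rng, training=True):
    """Zero whole rows of `feats` at random (training) or rescale them (inference)."""
    if training:
        # One keep/drop draw per node: 1 keeps the row, 0 zeroes it entirely
        keep = rng.binomial(1, 1.0 - drop_rate, size=(feats.shape[0], 1))
        return feats * keep
    # Inference: multiply by the keep probability instead of sampling
    return feats * (1.0 - drop_rate)

rng = np.random.default_rng(0)
feats = np.arange(1.0, 13.0).reshape(6, 2)   # hypothetical data: 6 nodes, 2 features
# Four independent augmentations of the same feature matrix
augmented = [drop_node_np(feats, 0.5, rng) for _ in range(4)]
# Deterministic version used at inference time
expected = drop_node_np(feats, 0.5, rng, training=False)
```

Each augmented matrix keeps a different random subset of node rows, which is what lets the consistency loss compare several predictions for the same node.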

![](grand_model.png)

# Model

In [4]:
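def drop_node(feats, drop_rate, training):
    """DropNode: randomly zero out whole node feature rows during training."""
    n = feats.shape[0]
    drop_rates = th.FloatTensor(np.ones(n) * drop_rate)

    if training:
        # One Bernoulli keep/drop decision per node, broadcast over all of its features
        masks = th.bernoulli(1. - drop_rates).unsqueeze(1)
        feats = masks.to(feats.device) * feats
    else:
        # At inference, scale by the keep probability to match the training-time expectation
        feats = feats * (1. - drop_rate)

    return feats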

class MLP(nn.Module):
    def __init__(self, nfeat, nhid, nclass, input_droprate, hidden_droprate, use_bn =False):
        super(MLP, self).__init__()
        
        self.layer1 = nn.Linear(nfeat, nhid, bias = True)
        self.layer2 = nn.Linear(nhid, nclass, bias = True)

        self.input_dropout = nn.Dropout(input_droprate)
        self.hidden_dropout = nn.Dropout(hidden_droprate)
        self.bn1 = nn.BatchNorm1d(nfeat)
        self.bn2 = nn.BatchNorm1d(nhid)
        self.use_bn = use_bn
    
    def reset_parameters(self):
        self.layer1.reset_parameters()
        self.layer2.reset_parameters()
        
    def forward(self, x):
         
        if self.use_bn: 
            x = self.bn1(x)
        x = self.input_dropout(x)
        x = F.relu(self.layer1(x))
        
        if self.use_bn:
            x = self.bn2(x)
        x = self.hidden_dropout(x)
        x = self.layer2(x)

        return x   
        

def GRANDConv(graph, feats, order):
    '''
    Parameters
    -----------
    graph: dgl.Graph
        The input graph
    feats: Tensor (n_nodes * feat_dim)
        Node features
    order: int 
        Propagation Steps
    '''
    with graph.local_scope():
        
        # Compute the edge weights of the symmetrically normalized adjacency matrix \hat{A}
        degs = graph.in_degrees().float().clamp(min=1)
        norm = th.pow(degs, -0.5).to(feats.device).unsqueeze(1)

        graph.ndata['norm'] = norm
        graph.apply_edges(fn.u_mul_v('norm', 'norm', 'weight')) 
        
        ''' Graph Conv '''
        x = feats
        y = 0 + feats  # copy of feats, so the in-place add_ below does not modify the input

        for i in range(order):
            graph.ndata['h'] = x
            graph.update_all(fn.u_mul_e('h', 'weight', 'm'), fn.sum('m', 'h'))
            x = graph.ndata.pop('h')
            y.add_(x)

    return y /(order + 1)

class GRAND(nn.Module):
    r"""
    Parameters
    -----------
    in_dim: int
        Input feature size, i.e. the number of dimensions of :math:`H^{(i)}`.
    hid_dim: int
        Hidden feature size.
    n_class: int
        Number of classes.
    S: int
        Number of Augmentation samples
    K: int
        Number of Propagation Steps
    node_dropout: float
        Dropout rate on node features.
    input_droprate: float
        Dropout rate of the input layer of the MLP.
    hidden_droprate: float
        Dropout rate of the hidden layer of the MLP.
    batchnorm: bool, optional
        If True, use batch normalization.
    """
    def __init__(self,
                 in_dim,
                 hid_dim,
                 n_class,
                 S = 1,
                 K = 3,
                 node_dropout=0.0,
                 input_droprate = 0.0, 
                 hidden_droprate = 0.0,
                 batchnorm=False):

        super(GRAND, self).__init__()
        self.in_dim = in_dim
        self.hid_dim = hid_dim
        self.S = S
        self.K = K
        self.n_class = n_class
        
        self.mlp = MLP(in_dim, hid_dim, n_class, input_droprate, hidden_droprate, batchnorm)
        
        self.dropout = node_dropout
        self.node_dropout = nn.Dropout(node_dropout)

    def forward(self, graph, feats, training = True):
        
        X = feats
        S = self.S
        
        if training: # Training Mode
            output_list = []
            for s in range(S):
                drop_feat = drop_node(X, self.dropout, True)  # Drop node
                feat = GRANDConv(graph, drop_feat, self.K)    # Graph Convolution
                output_list.append(th.log_softmax(self.mlp(feat), dim=-1))  # Prediction
        
            return output_list
        else:   # Inference Mode
            drop_feat = drop_node(X, self.dropout, False) 
            X =  GRANDConv(graph, drop_feat, self.K)

            return th.log_softmax(self.mlp(X), dim = -1)
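
The propagation performed by `GRANDConv` can be read as averaging the powers of the normalized adjacency matrix applied to the features, from order 0 to order K. The dense NumPy sketch below illustrates that reading on a toy graph; it assumes a symmetric adjacency matrix that already contains self-loops, and the helper names are hypothetical.

```python
import numpy as np

def sym_norm_adj(adj):
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix with self-loops."""
    deg = np.clip(adj.sum(axis=1), 1, None)      # same role as clamp(min=1) in GRANDConv
    inv_sqrt = deg ** -0.5
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def mixed_order_propagate(adj_hat, feats, order):
    """Average of adj_hat^k @ feats for k = 0..order."""
    x = feats
    total = feats.copy()
    for _ in range(order):
        x = adj_hat @ x          # one more propagation step
        total += x
    return total / (order + 1)

# Hypothetical toy graph: path 0 - 1 - 2 with self-loops
adj = np.array([[1., 1., 0.],
                [1., 1., 1.],
                [0., 1., 1.]])
adj_hat = sym_norm_adj(adj)
feats = np.eye(3)
smoothed = mixed_order_propagate(adj_hat, feats, order=2)
```

No learnable weight appears in this step, which is why the propagation order can be raised without adding parameters to the model.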

# Parameters

### Training parameters

In [5]:
# Training epochs.
nb_epochs = 200
# Patience: epochs without improvement before early stopping (equal to nb_epochs here, so it never triggers).
early_stopping = 200
# Learning rate
lr = 0.01
# L2 reg.
weight_decay = 5e-4

### Model parameters

In [6]:
# Hidden layer dimensionalities
hid_dim = 32
# Dropnode rate (1 - keep probability).
dropnode_rate = 0.5
# Dropout rate of input layer
input_droprate = 0.0
# Dropout rate of hidden layer
hidden_droprate = 0.0
# Propagation step
order = 8
# Sampling times of dropnode
sample = 4
# Sharpening temperature
tem = 0.5
# Coefficient of consistency regularization
lam = 1.0
# Using Batch Normalization
use_bn = False

# Utils

In [7]:
def consis_loss(logps, temp, lam):
    ps = [th.exp(p) for p in logps]
    ps = th.stack(ps, dim = 2)
    
    avg_p = th.mean(ps, dim = 2)
    sharp_p = (th.pow(avg_p, 1./temp) / th.sum(th.pow(avg_p, 1./temp), dim=1, keepdim=True)).detach()

    sharp_p = sharp_p.unsqueeze(2)
    loss = th.mean(th.sum(th.pow(ps - sharp_p, 2), dim = 1, keepdim=True))

    loss = lam * loss
    return loss
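
To make the sharpening step in `consis_loss` concrete, here is a NumPy sketch of the same computation on toy probabilities. It is an illustration under assumptions: the array layout `(S, N, C)` and the function names are choices made for this example, not part of the notebook.

```python
import numpy as np

def sharpen(avg_p, temp):
    """Raise probabilities to the power 1/temp and renormalize each row."""
    powered = avg_p ** (1.0 / temp)
    return powered / powered.sum(axis=-1, keepdims=True)

def consistency_loss_np(ps, temp, lam):
    """ps has shape (S, N, C): S augmentations, N nodes, C classes."""
    avg_p = ps.mean(axis=0)                      # average prediction per node
    target = sharpen(avg_p, temp)                # sharpened target, treated as a constant
    sq_dist = ((ps - target[None]) ** 2).sum(axis=-1)   # shape (S, N)
    return lam * sq_dist.mean()

avg = np.array([[0.5, 0.3, 0.2]])
# With temp = 0.5 each entry is squared, then the row is renormalized:
# [0.25, 0.09, 0.04] / 0.38, roughly [0.658, 0.237, 0.105]
sharp = sharpen(avg, temp=0.5)
```

A temperature below 1 pushes the averaged distribution toward its largest class, so the loss rewards augmentations that agree on a confident prediction.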

In [8]:
def data_split(dataset):
    graph = dataset[0]
    graph = dgl.add_self_loop(graph)
    
    # retrieve the number of classes
    n_classes = dataset.num_classes

    # retrieve labels of ground truth
    labels = graph.ndata.pop('label').to(device).long()
    
    # Extract node features
    feats = graph.ndata.pop('feat').to(device)
    n_features = feats.shape[-1]
    
    # retrieve masks for train/validation/test
    train_mask = graph.ndata.pop('train_mask')
    val_mask = graph.ndata.pop('val_mask')
    test_mask = graph.ndata.pop('test_mask')
    
    train_idx = th.nonzero(train_mask, as_tuple=False).squeeze().to(device)
    val_idx = th.nonzero(val_mask, as_tuple=False).squeeze().to(device)
    test_idx = th.nonzero(test_mask, as_tuple=False).squeeze().to(device)
    
    return graph, n_classes, labels, feats, n_features, train_idx, val_idx, test_idx

-----------------------------------

In [9]:
dataname = 'CoraGraphDataset'
graph, n_classes, labels, feats, n_features, train_idx, val_idx, test_idx = data_split(datasets[dataname])

In [10]:
# Create model
model = GRAND(n_features, hid_dim, n_classes, sample, order,
                  dropnode_rate, input_droprate, 
                  hidden_droprate, use_bn)

model = model.to(device)
graph = graph.to(device)

In [11]:
# Create training components
loss_fn = nn.NLLLoss()
opt = optim.Adam(model.parameters(), lr = lr, weight_decay = weight_decay)

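loss_best = np.inf
acc_best = 0
# Early-stopping state, initialised here so the training loop never reads an undefined name
best_epoch = 0
no_improvement = 0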

In [12]:
# Training epochs
for epoch in range(nb_epochs):

    ''' Training '''
    model.train()

    loss_sup = 0
    logits = model(graph, feats, True)

    # calculate supervised loss
    for k in range(sample):
        loss_sup += F.nll_loss(logits[k][train_idx], labels[train_idx])

    loss_sup = loss_sup / sample

    # calculate consistency loss
    loss_consis = consis_loss(logits, tem, lam)

    loss_train = loss_sup + loss_consis
    acc_train = th.sum(logits[0][train_idx].argmax(dim=1) == labels[train_idx]).item() / len(train_idx)

    # backward
    opt.zero_grad()
    loss_train.backward()
    opt.step()

    ''' Validating '''
    model.eval()
    with th.no_grad():

        val_logits = model(graph, feats, False)

        loss_val = F.nll_loss(val_logits[val_idx], labels[val_idx])
        acc_val = th.sum(val_logits[val_idx].argmax(dim=1) == labels[val_idx]).item() / len(val_idx)

        # Print out performance
        print("In epoch {}, Train Acc: {:.4f} | Train Loss: {:.4f} ,Val Acc: {:.4f} | Val Loss: {:.4f}".
              format(epoch, acc_train, loss_train.item(), acc_val, loss_val.item()))

        # set early stopping counter
        if loss_val < loss_best or acc_val > acc_best:
            if loss_val < loss_best:
                best_epoch = epoch
                th.save(model.state_dict(), dataname + '.pkl')
            no_improvement = 0
            loss_best = min(loss_val, loss_best)
            acc_best = max(acc_val, acc_best)
        else:
            no_improvement += 1
            if no_improvement == early_stopping:
                print('Early stopping.')
                break

print("Optimization Finished!")

print('Loading {}th epoch'.format(best_epoch))
model.load_state_dict(th.load(dataname + '.pkl'))

''' Testing '''
model.eval()

test_logits = model(graph, feats, False)
test_acc = th.sum(test_logits[test_idx].argmax(dim=1) == labels[test_idx]).item() / len(test_idx)

print("Test Acc: {:.4f}".format(test_acc))


In epoch 0, Train Acc: 0.1500 | Train Loss: 1.9560 ,Val Acc: 0.1240 | Val Loss: 1.9221
In epoch 1, Train Acc: 0.1571 | Train Loss: 1.9529 ,Val Acc: 0.3660 | Val Loss: 1.9244
In epoch 2, Train Acc: 0.2786 | Train Loss: 1.9499 ,Val Acc: 0.3560 | Val Loss: 1.9265
In epoch 3, Train Acc: 0.2214 | Train Loss: 1.9467 ,Val Acc: 0.3420 | Val Loss: 1.9289
In epoch 4, Train Acc: 0.2000 | Train Loss: 1.9436 ,Val Acc: 0.3400 | Val Loss: 1.9309
In epoch 5, Train Acc: 0.1929 | Train Loss: 1.9398 ,Val Acc: 0.3340 | Val Loss: 1.9329
In epoch 6, Train Acc: 0.1857 | Train Loss: 1.9367 ,Val Acc: 0.3400 | Val Loss: 1.9351
In epoch 7, Train Acc: 0.2857 | Train Loss: 1.9328 ,Val Acc: 0.3160 | Val Loss: 1.9372
In epoch 8, Train Acc: 0.4143 | Train Loss: 1.9294 ,Val Acc: 0.1640 | Val Loss: 1.9389
In epoch 9, Train Acc: 0.3714 | Train Loss: 1.9249 ,Val Acc: 0.1180 | Val Loss: 1.9399
In epoch 10, Train Acc: 0.2071 | Train Loss: 1.9229 ,Val Acc: 0.1160 | Val Loss: 1.9399
In epoch 11, Train Acc: 0.2357 | Train Los

In epoch 94, Train Acc: 0.9143 | Train Loss: 0.8763 ,Val Acc: 0.8140 | Val Loss: 1.0345
In epoch 95, Train Acc: 0.9143 | Train Loss: 0.8494 ,Val Acc: 0.8060 | Val Loss: 1.0308
In epoch 96, Train Acc: 0.9214 | Train Loss: 0.9062 ,Val Acc: 0.8040 | Val Loss: 1.0296
In epoch 97, Train Acc: 0.9143 | Train Loss: 0.8921 ,Val Acc: 0.8040 | Val Loss: 1.0215
In epoch 98, Train Acc: 0.9000 | Train Loss: 0.8479 ,Val Acc: 0.8040 | Val Loss: 1.0118
In epoch 99, Train Acc: 0.8714 | Train Loss: 0.8712 ,Val Acc: 0.8080 | Val Loss: 1.0036
In epoch 100, Train Acc: 0.8929 | Train Loss: 0.8407 ,Val Acc: 0.8080 | Val Loss: 0.9947
In epoch 101, Train Acc: 0.8857 | Train Loss: 0.8721 ,Val Acc: 0.8080 | Val Loss: 0.9863
In epoch 102, Train Acc: 0.8714 | Train Loss: 0.8258 ,Val Acc: 0.8040 | Val Loss: 0.9816
In epoch 103, Train Acc: 0.8643 | Train Loss: 0.8406 ,Val Acc: 0.8100 | Val Loss: 0.9811
In epoch 104, Train Acc: 0.9357 | Train Loss: 0.7904 ,Val Acc: 0.8060 | Val Loss: 0.9721
In epoch 105, Train Acc: 0.

In epoch 187, Train Acc: 0.9071 | Train Loss: 0.6089 ,Val Acc: 0.8160 | Val Loss: 0.7538
In epoch 188, Train Acc: 0.9143 | Train Loss: 0.6054 ,Val Acc: 0.8160 | Val Loss: 0.7547
In epoch 189, Train Acc: 0.9500 | Train Loss: 0.5848 ,Val Acc: 0.8120 | Val Loss: 0.7589
In epoch 190, Train Acc: 0.9429 | Train Loss: 0.6204 ,Val Acc: 0.8140 | Val Loss: 0.7643
In epoch 191, Train Acc: 0.9286 | Train Loss: 0.6115 ,Val Acc: 0.8120 | Val Loss: 0.7717
In epoch 192, Train Acc: 0.9357 | Train Loss: 0.6028 ,Val Acc: 0.8100 | Val Loss: 0.7753
In epoch 193, Train Acc: 0.9429 | Train Loss: 0.5918 ,Val Acc: 0.8080 | Val Loss: 0.7719
In epoch 194, Train Acc: 0.9000 | Train Loss: 0.6188 ,Val Acc: 0.8100 | Val Loss: 0.7651
In epoch 195, Train Acc: 0.9357 | Train Loss: 0.6104 ,Val Acc: 0.8100 | Val Loss: 0.7569
In epoch 196, Train Acc: 0.9286 | Train Loss: 0.5956 ,Val Acc: 0.8100 | Val Loss: 0.7498
In epoch 197, Train Acc: 0.9214 | Train Loss: 0.5876 ,Val Acc: 0.8160 | Val Loss: 0.7522
In epoch 198, Train A
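
The early-stopping rule in the training loop above resets its counter whenever the validation loss or the validation accuracy improves, but saves a checkpoint only when the loss improves. A framework-free sketch of that rule follows; the `EarlyStopper` class and its method names are hypothetical, and the validation history is made-up data.

```python
class EarlyStopper:
    """Track validation loss and accuracy, mirroring the rule used in the loop above."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_acc = 0.0
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss, val_acc):
        """Return (should_checkpoint, should_stop) for this epoch."""
        improved_loss = val_loss < self.best_loss
        improved_acc = val_acc > self.best_acc
        if improved_loss or improved_acc:
            if improved_loss:
                self.best_epoch = epoch          # checkpoint only on a better loss
            self.best_loss = min(val_loss, self.best_loss)
            self.best_acc = max(val_acc, self.best_acc)
            self.bad_epochs = 0
            return improved_loss, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2)
history = [(1.0, 0.5), (1.1, 0.6), (1.2, 0.6), (1.2, 0.6)]  # (val_loss, val_acc)
for epoch, (val_loss, val_acc) in enumerate(history):
    save, stop = stopper.step(epoch, val_loss, val_acc)
    if stop:
        break
```

In this toy history the loss never improves after epoch 0, so the restored checkpoint would be the one from epoch 0 even though the accuracy improved once afterwards.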

--------------------------------------
# Part II : GraphCONV
    - Dataset: CoauthorCSDataset

In [13]:
dataset2 = dgl.data.CoauthorCSDataset()
graph2 = dataset2[0]
print('Number of categories:', dataset2.num_classes)

Number of categories: 15


In [14]:
from dgl.nn import GraphConv

# The graph convolutional network model used here
class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h
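
As a rough picture of what the two `GraphConv` layers compute, the forward pass can be written with dense matrices as a normalized adjacency product followed by a linear map, with a ReLU in between. The NumPy sketch below assumes symmetric normalization, omits the bias terms, and uses random weights and a toy complete graph; all names and sizes are hypothetical.

```python
import numpy as np

def gcn_forward(adj_hat, feats, w1, w2):
    """Two-layer GCN forward pass: adj_hat @ relu(adj_hat @ X @ W1) @ W2 (biases omitted)."""
    h = np.maximum(adj_hat @ feats @ w1, 0.0)    # first layer + ReLU
    return adj_hat @ h @ w2                      # second layer outputs class logits

rng = np.random.default_rng(0)
n_nodes, in_feats, h_feats, num_classes = 5, 8, 16, 3   # hypothetical sizes
adj = np.ones((n_nodes, n_nodes))                # toy complete graph with self-loops
deg = adj.sum(axis=1)
adj_hat = adj / np.sqrt(np.outer(deg, deg))      # D^{-1/2} A D^{-1/2}
feats = rng.normal(size=(n_nodes, in_feats))
w1 = rng.normal(size=(in_feats, h_feats))
w2 = rng.normal(size=(h_feats, num_classes))
logits = gcn_forward(adj_hat, feats, w1, w2)
pred = logits.argmax(axis=1)                     # predicted class per node
```

On a complete graph every node aggregates the same neighbourhood, so all rows of `logits` coincide; on a real graph the rows differ according to each node's neighbours.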

In [15]:
def train(g, model):
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    # Build the masks that define the train, validation and test sets
    nb_nodes = len(g.ndata['feat'])
    mask1 = int(nb_nodes*0.2)
    mask2 = int(nb_nodes*0.7)
    features = g.ndata['feat']
    labels = g.ndata['label']
    
    train_mask = np.zeros(nb_nodes,dtype=bool)
    train_mask[:mask1] = 1
    train_mask = th.from_numpy(train_mask)
    
    val_mask = np.zeros(nb_nodes,dtype=bool)
    val_mask[mask1:mask2] = 1
    val_mask = th.from_numpy(val_mask)
    
    test_mask = np.zeros(nb_nodes,dtype=bool)
    test_mask[mask2:] = 1
    test_mask = th.from_numpy(test_mask)
    
    # train loop
    for e in range(nb_epochs):
        logits = model(g, features)

        pred = logits.argmax(1)

        # Compute the loss
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute the accuracy on each set
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Keep the best validation accuracy and the test accuracy at that epoch
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('Epoque {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc))

In [16]:
model2 = GCN(graph2.ndata['feat'].shape[1], 16, dataset2.num_classes)
train(graph2, model2)

Epoque 0, loss: 2.727, val acc: 0.120 (best 0.120), test acc: 0.056 (best 0.056)
Epoque 5, loss: 0.824, val acc: 0.570 (best 0.570), test acc: 0.294 (best 0.294)
Epoque 10, loss: 0.471, val acc: 0.732 (best 0.732), test acc: 0.459 (best 0.459)
Epoque 15, loss: 0.278, val acc: 0.818 (best 0.818), test acc: 0.609 (best 0.609)
Epoque 20, loss: 0.197, val acc: 0.843 (best 0.843), test acc: 0.608 (best 0.608)
Epoque 25, loss: 0.148, val acc: 0.874 (best 0.874), test acc: 0.680 (best 0.680)
Epoque 30, loss: 0.115, val acc: 0.891 (best 0.891), test acc: 0.726 (best 0.726)
Epoque 35, loss: 0.094, val acc: 0.901 (best 0.901), test acc: 0.727 (best 0.727)
Epoque 40, loss: 0.080, val acc: 0.898 (best 0.901), test acc: 0.723 (best 0.727)
Epoque 45, loss: 0.067, val acc: 0.900 (best 0.901), test acc: 0.739 (best 0.727)
Epoque 50, loss: 0.058, val acc: 0.899 (best 0.901), test acc: 0.745 (best 0.727)
Epoque 55, loss: 0.051, val acc: 0.897 (best 0.901), test acc: 0.755 (best 0.727)
Epoque 60, loss: 0
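
The `train` function above assigns nodes to the three sets in index order (first 20% train, next 50% validation, last 30% test). If the node order happened to correlate with the labels, a shuffled split with the same proportions might be preferable. The sketch below is one possible way to build such masks; the function name and the seed are assumptions, and it was not used to produce the results shown here.

```python
import numpy as np

def random_split_masks(n_nodes, train_end=0.2, val_end=0.7, seed=0):
    """Return boolean train/val/test masks over a random permutation of the nodes."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    cut1 = int(n_nodes * train_end)
    cut2 = int(n_nodes * val_end)

    train_mask = np.zeros(n_nodes, dtype=bool)
    val_mask = np.zeros(n_nodes, dtype=bool)
    test_mask = np.zeros(n_nodes, dtype=bool)
    train_mask[perm[:cut1]] = True       # first 20% of the shuffled nodes
    val_mask[perm[cut1:cut2]] = True     # next 50%
    test_mask[perm[cut2:]] = True        # remaining 30%
    return train_mask, val_mask, test_mask

train_mask, val_mask, test_mask = random_split_masks(100)
```

The resulting NumPy masks can be converted with `th.from_numpy`, as the original code already does for its own masks.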

# Part III: Link Prediction
   - Link prediction with: CoraGraphDataset
       - Predict whether an edge exists between two arbitrary nodes.

In [17]:
dataset3 = dgl.data.CoraGraphDataset()
g3 = dataset3[0]

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.


In [18]:
# Split the edges into train and test sets (20% of the edges go to the test set here)
u, v = g3.edges()

eids = np.arange(g3.number_of_edges())
eids = np.random.permutation(eids)
test_size = int(len(eids) * 0.2)
train_size = g3.number_of_edges() - test_size
test_pos_u, test_pos_v = u[eids[:test_size]], v[eids[:test_size]]
train_pos_u, train_pos_v = u[eids[test_size:]], v[eids[test_size:]]

adj = sp.coo_matrix((np.ones(len(u)), (u.numpy(), v.numpy())))
adj_neg = 1 - adj.todense() - np.eye(g3.number_of_nodes())
neg_u, neg_v = np.where(adj_neg != 0)

# Sample as many negative (non-existent) edges as there are positive edges, then split them the same way
neg_eids = np.random.choice(len(neg_u), g3.number_of_edges())
test_neg_u, test_neg_v = neg_u[neg_eids[:test_size]], neg_v[neg_eids[:test_size]]
train_neg_u, train_neg_v = neg_u[neg_eids[test_size:]], neg_v[neg_eids[test_size:]]

# Also build the graph that contains only the training edges (the full graph minus the test edges)
train_g3 = dgl.remove_edges(g3, eids[:test_size])
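
The negative sampling above works on the dense complement of the adjacency matrix, which is practical for a graph of Cora's size. The same idea on a toy graph, written with NumPy only, looks like the sketch below. The function name is hypothetical, and unlike the cell above it samples without replacement, which is an assumption made for this example.

```python
import numpy as np

def sample_negative_edges(src, dst, n_nodes, n_samples, seed=0):
    """Sample node pairs that are neither existing edges nor self-loops."""
    adj = np.zeros((n_nodes, n_nodes), dtype=bool)
    adj[src, dst] = True
    # Candidate negatives: every pair that is not an edge and not on the diagonal
    candidates = ~adj & ~np.eye(n_nodes, dtype=bool)
    neg_u, neg_v = np.nonzero(candidates)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(neg_u), size=n_samples, replace=False)
    return neg_u[idx], neg_v[idx]

# Hypothetical toy graph with 4 nodes and edges 0<->1 and 1<->2
src = np.array([0, 1, 1, 2])
dst = np.array([1, 0, 2, 1])
neg_u, neg_v = sample_negative_edges(src, dst, n_nodes=4, n_samples=4)
```

For much larger graphs the dense matrix would not fit in memory, and a sampler that rejects existing edges on the fly would be the usual alternative.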

In [19]:
from dgl.nn import SAGEConv

# Define a GraphSAGE model
class GraphSAGE(nn.Module):
    def __init__(self, in_feats, h_feats):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_feats, h_feats, 'mean')
        self.conv2 = SAGEConv(h_feats, h_feats, 'mean')

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h
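
With the `'mean'` aggregator, each GraphSAGE layer combines a node's own features with the average of its neighbours' features, each through its own linear map. A dense NumPy sketch of one such layer is given below; biases are omitted, and the function name, weights and toy graph are assumptions made for illustration.

```python
import numpy as np

def sage_mean_layer(adj, h, w_self, w_neigh):
    """One mean-aggregator layer: h @ W_self + mean(neighbour h) @ W_neigh (biases omitted)."""
    deg = adj.sum(axis=1, keepdims=True)             # number of neighbours per node
    neigh_mean = (adj @ h) / np.clip(deg, 1, None)   # guard against isolated nodes
    return h @ w_self + neigh_mean @ w_neigh

# Hypothetical toy graph: path 0 - 1 - 2, one feature per node
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
h = np.array([[1.0], [2.0], [3.0]])
w_self = np.array([[1.0]])
w_neigh = np.array([[1.0]])
# Node 0: 1 + 2, node 1: 2 + (1 + 3) / 2, node 2: 3 + 2
out = sage_mean_layer(adj, h, w_self, w_neigh)
```

Keeping separate weights for the node itself and for its neighbourhood lets the layer learn how much each source should contribute.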

In [20]:
# Build one positive and one negative graph for each of the train and test sets, since link prediction scores pairs of nodes
train_pos_g3 = dgl.graph((train_pos_u, train_pos_v), num_nodes=g3.number_of_nodes())
train_neg_g3 = dgl.graph((train_neg_u, train_neg_v), num_nodes=g3.number_of_nodes())

test_pos_g3 = dgl.graph((test_pos_u, test_pos_v), num_nodes=g3.number_of_nodes())
test_neg_g3 = dgl.graph((test_neg_u, test_neg_v), num_nodes=g3.number_of_nodes())

In [21]:
import dgl.function as fn

class DotPredictor(nn.Module):
    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            # Score each edge by the dot product of the 'h' features of its two endpoints
            g.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            return g.edata['score'][:, 0]

In [22]:
from sklearn.metrics import roc_auc_score  # used by compute_auc below

model = GraphSAGE(train_g3.ndata['feat'].shape[1], 16)
pred = DotPredictor()

# Binary cross-entropy over positive (label 1) and negative (label 0) edge scores
def compute_loss(pos_score, neg_score):
    scores = th.cat([pos_score, neg_score])
    labels = th.cat([th.ones(pos_score.shape[0]), th.zeros(neg_score.shape[0])])
    return F.binary_cross_entropy_with_logits(scores, labels)

def compute_auc(pos_score, neg_score):
    scores = th.cat([pos_score, neg_score]).numpy()
    labels = th.cat(
        [th.ones(pos_score.shape[0]), th.zeros(neg_score.shape[0])]).numpy()
    return roc_auc_score(labels, scores)

optimizer = th.optim.Adam(itertools.chain(model.parameters(), pred.parameters()), lr=0.01)
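
The scoring, loss and metric used for link prediction can be checked by hand on a tiny example. The NumPy sketch below mirrors the dot-product score, the binary cross-entropy on logits, and an AUC computed by comparing every positive score with every negative score. The function names and toy embeddings are assumptions, and the pairwise AUC is a simple stand-in for `roc_auc_score`, not its implementation.

```python
import numpy as np

def dot_scores(h, src, dst):
    """Score each (src, dst) pair by the dot product of the two node embeddings."""
    return (h[src] * h[dst]).sum(axis=1)

def bce_with_logits(scores, labels):
    """Numerically stable binary cross-entropy on raw scores (logits)."""
    return np.mean(np.maximum(scores, 0) - scores * labels
                   + np.log1p(np.exp(-np.abs(scores))))

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly, ties counting one half."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

h = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])   # hypothetical embeddings
pos = dot_scores(h, np.array([0]), np.array([1]))      # aligned embeddings, score 1
neg = dot_scores(h, np.array([0]), np.array([2]))      # opposite embeddings, score -1
labels = np.array([1.0, 0.0])
loss = bce_with_logits(np.concatenate([pos, neg]), labels)
auc = pairwise_auc(pos, neg)
untrained_loss = bce_with_logits(np.zeros(2), labels)  # equals ln 2 when all scores are 0
```

When every score is zero the loss equals ln 2, about 0.693, which is close to the first loss value logged by the training loop in this part.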

In [23]:
all_logits = []
for e in range(100):
    # forward
    h = model(train_g3, train_g3.ndata['feat'])
    pos_score = pred(train_pos_g3, h)
    neg_score = pred(train_neg_g3, h)
    loss = compute_loss(pos_score, neg_score)

    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if e % 5 == 0:
        print('Epoque {}, loss: {}'.format(e, loss))

# Evaluate on the held-out test edges and print the AUC
from sklearn.metrics import roc_auc_score
with th.no_grad():
    pos_score = pred(test_pos_g3, h)
    neg_score = pred(test_neg_g3, h)
    print('AUC', compute_auc(pos_score, neg_score))

Epoque 0, loss: 0.6929873824119568
Epoque 5, loss: 0.6595125794410706
Epoque 10, loss: 0.5768710374832153
Epoque 15, loss: 0.5419908165931702
Epoque 20, loss: 0.5046783089637756
Epoque 25, loss: 0.48118430376052856
Epoque 30, loss: 0.45551249384880066
Epoque 35, loss: 0.4335471987724304
Epoque 40, loss: 0.4114689826965332
Epoque 45, loss: 0.38669711351394653
Epoque 50, loss: 0.36170870065689087
Epoque 55, loss: 0.3375073969364166
Epoque 60, loss: 0.31328362226486206
Epoque 65, loss: 0.2881085276603699
Epoque 70, loss: 0.2633311450481415
Epoque 75, loss: 0.23842160403728485
Epoque 80, loss: 0.21374496817588806
Epoque 85, loss: 0.1896471530199051
Epoque 90, loss: 0.1660909354686737
Epoque 95, loss: 0.1437922567129135
AUC 0.8452577810260975
