# 第五课 图上的其他深度学习模型

前面的课程中我们介绍了许多图神经网络模型。除了图神经网络，针对于图数据的深度学习模型还有很多，比如图上的自编码器、变分自编码器、循环神经网络和对抗生成网络等。在这一课中，我们对自编码器和变分自编码器进行代码实践。这其中包括了对模型细节和它们的应用的讲解。

## 0. 链接预测数据集

链接预测（link prediction）是常见的与图有关的任务。该任务旨在预测两个节点之间是否存在链接（link），即是否存在边。

关于链接预测的数据集，我们可以从节点分类任务的数据集直接构造。比如我们之前常用的Cora数据集，就可以无视掉它的节点标签，把Cora图里面的边当成训练/测试数据。下面我们具体来实践一下。

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.data import CoraGraphDataset

# set device to GPU:
device = torch.device('cuda:0')
# load dataset:
dataset = CoraGraphDataset('./data') # 将数据保存在data文件夹下
g = dataset[0]

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.


In [2]:
# 构造一个负采样的函数，获取包含负采样的边的一个新图
def construct_negative_graph(graph, k):
    """ construct k negative samples
    """
    src, dst = graph.edges()

    neg_src = src.repeat_interleave(k)
    neg_dst = torch.randint(0, graph.num_nodes(), (len(src) * k,))
    
    return dgl.graph((neg_src, neg_dst), num_nodes=graph.num_nodes())

In [3]:
neg_g = construct_negative_graph(g, 2)
neg_g

Graph(num_nodes=2708, num_edges=21112,
      ndata_schemes={}
      edata_schemes={})

这个负采样的图的一些内容：
* 点的数量和原图相同，点的特征就可以复用原图的点特征。
* 边的数量是原图的k倍，上面例子里面k=2。
* 边是通过对原图里的源节点随机采样目标节点生成的，所以有很小的概率会出现：和原图相同的边以及重复的边。
* 由于是负采样的图，所以所有的边的标签都是0。

In [4]:
import numpy as np

def split_edges(graph, train_ratio=0.8, val_ratio=0.1):
    """ train-validaion-test split of graph dataset
    """
    all_edge_idx = np.arange(graph.num_edges())
    np.random.shuffle(all_edge_idx)
    
    train_idx_num = int(graph.num_edges() * train_ratio)
    val_idx_num = int(graph.num_edges() * val_ratio)
    
    train_idx = all_edge_idx[: train_idx_num]
    val_idx = all_edge_idx[train_idx_num: (train_idx_num + val_idx_num)]
    test_idx = all_edge_idx[(train_idx_num + val_idx_num):]
    
    return train_idx, val_idx, test_idx

下面我们按照 85:5:10 的比例把原图和负采样图的边划分成训练/验证/测试的集合

In [5]:
train_pos_edge_idx, val_pos_edge_idx, test_pos_edge_idx = split_edges(g, train_ratio=0.85, val_ratio=0.05)
train_neg_edge_idx, val_neg_edge_idx, test_neg_edge_idx = split_edges(neg_g, train_ratio=0.85, val_ratio=0.05)

In [6]:
train_pos_edge_idx

array([2505, 9744, 8864, ..., 9848, 5367, 9568])

值得注意的是：
* 由于我们的负采样图是可以随时构建的，因此负样本的训练/验证和测试是可以在训练的循环里随时生成。
* 通过变量名称，我们就可以设定标签为1还是0.

## 1. 自编码器

针对于图数据的自编码器我们称之为GAE (Graph AutoEncoder)。其包含两个组成部分，编码器（encoder）和解码器（decoder）。图上的编码器常用的就是GCN了；而解码器呢通常用一个内积来表示。具体地，给定两个节点的节点表示，解码器将计算二者的内积，其结果作为两个节点之间存在边的概率。

In [7]:
from dgl.nn import GraphConv

首先构造编码器，由两层GCN组成。

In [8]:
from dgl.nn import GraphConv

class GCNEncoder(nn.Module):
    """ deep GCN based encoder
    """
    def __init__(self, in_channels, out_channels):
        super(GCNEncoder, self).__init__()
        
        # GCN:
        self.conv1 = GraphConv(
            in_feats=in_channels, out_feats=2*out_channels, 
            bias=True, 
            activation=F.relu, 
            allow_zero_in_degree=True
        )
        
        # GCN:
        self.conv2 = GraphConv(
            in_feats=2*out_channels, out_feats=out_channels, 
            bias=True, 
            allow_zero_in_degree=True
        )

    def forward(self, g, features):
        h = self.conv1(g, features)
        h = self.conv2(g, h)
        return h

然后构建解码器，将给定的节点对映射到[0，1]之间。

In [9]:
import dgl.function as fn

# 定义内积预测Decoder
class InnerProductDecoder(nn.Module):
    """ simple inner product decoder
    """
    def forward(self, graph, h, sigmoid=True):
        graph.ndata['h'] = h
        graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
        value = graph.edata['score'].sum(dim=1) 
        return torch.sigmoid(value) if sigmoid else value

In [10]:
class GAE(torch.nn.Module):
    """ graph autoencoder
    """
    EPSILON = 1e-15 # EPS是一个很小的值，防止取对数的时候出现0值
    
    def __init__(self, encoder, decoder=None):
        super().__init__()
        self.encoder = encoder
        self.decoder = InnerProductDecoder()

    def encode(self, *args, **kwargs): 
        """编码功能"""
        return self.encoder(*args, **kwargs)

    def decode(self, *args, **kwargs):
        """解码功能"""
        return self.decoder(*args, **kwargs)

    def recon_loss(self, g, neg_g, h, pos_edge_index, neg_edge_index):
        """计算正边和负边的二值交叉熵
        
        参数说明
        ----
        g: 原始图，即包含正边的图
        neg_g：负采样图，即包含负边的图
        h: 编码器的输出
        pos_edge_index: 正边的边索引
        neg_edge_index: 负边的边索引
        """
        # encourage correct prediction of actual edge:
        pos_loss = -torch.log(
            self.decoder(g, h)[pos_edge_index] + GAE.EPSILON
        ).mean()
        
        # penalize indication of fabricated edge:
        neg_loss = -torch.log(
            1 - self.decoder(neg_g, h)[neg_edge_index] + GAE.EPSILON
        ).mean()

        return pos_loss + neg_loss

In [11]:
# init model:
in_feats, out_feats = g.ndata['feat'].shape[1], 16
model = GAE(GCNEncoder(in_feats, out_feats))

In [12]:
# verify forward propagation, encoder:
latent = model.encode(g, g.ndata['feat'])
latent, latent.shape

(tensor([[ 0.0062,  0.0005,  0.0062,  ...,  0.0014, -0.0021,  0.0010],
         [ 0.0046, -0.0008,  0.0048,  ...,  0.0022, -0.0022, -0.0005],
         [ 0.0046, -0.0008,  0.0048,  ...,  0.0022, -0.0022, -0.0005],
         ...,
         [ 0.0009, -0.0005, -0.0013,  ...,  0.0069,  0.0050,  0.0017],
         [ 0.0066, -0.0092,  0.0010,  ...,  0.0056,  0.0011, -0.0016],
         [ 0.0064, -0.0039,  0.0001,  ...,  0.0070, -0.0030, -0.0099]],
        grad_fn=<AddBackward0>),
 torch.Size([2708, 16]))

In [13]:
# verify forward propagation, decoder:
model.decode(g, latent)

tensor([0.5000, 0.5001, 0.5001,  ..., 0.5000, 0.5000, 0.5000],
       grad_fn=<SigmoidBackward0>)

## 2. 变分自编码器

变分自编码器和自编码器基本结构相同，都是一个编码器加一个解码器。它们的主要区别是，变分自编码器编码后的隐层表示不再是连续的向量表示，而是通过一个高斯分布来表示。具体地，变分自编码器学习的是这个高斯分布的均值（下面用变量`mu`来表示）和标准差（下面用变量`std`来表示）。

In [14]:
class VariationalGCNEncoder(nn.Module):
    MAX_LOGSTD = 10

    def __init__(self, in_channels, out_channels):
        super().__init__()
        
        # GCN:
        self.conv1 = GraphConv(
            in_feats=in_channels, out_feats=2 * out_channels, 
            bias=True, 
            activation=F.relu, 
            allow_zero_in_degree=True
        )
        
        # encoded mu,  
        self.conv_mu = GraphConv(
            in_feats=2 * out_channels, out_feats=out_channels, 
            allow_zero_in_degree=True
        ) 
        # encoded log(std):
        self.conv_logstd = GraphConv(
            in_feats=2 * out_channels, out_feats=out_channels, 
            allow_zero_in_degree=True
        )

    def forward(self, g, features):
        h = self.conv1(g, features)
        
        # get encoded Gaussian distribution:
        mu = self.conv_mu(g, h)
        logstd = self.conv_logstd(g, h)
        
        return mu, logstd
    
class VGAE(GAE): 
    def __init__(self, encoder, decoder=None):
        super().__init__(encoder, decoder)

    def reparametrize(self, mu, logstd):
        if self.training:
            # get encoding as [mu - std, mu + std]:
            return mu + (2*torch.randn_like(logstd) - 1) * torch.exp(logstd)
        else:
            return mu

    def encode(self, *args, **kwargs):
        """ Gaussian distribution encoding
        """
        # get mu and log(std), params of encoded Gaussian:
        self.__mu__, self.__logstd__ = self.encoder(*args, **kwargs)
        # clamp log(std):
        self.__logstd__ = self.__logstd__.clamp(max=VariationalGCNEncoder.MAX_LOGSTD)
        # sample from encoded Gaussian:
        h = self.reparametrize(self.__mu__, self.__logstd__)
        
        return h

    def kl_loss(self, mu=None, logstd=None):
        """
        """ 
        mu = self.__mu__ if mu is None else mu
        logstd = self.__logstd__ if logstd is None else logstd.clamp(max=VariationalGCNEncoder.MAX_LOGSTD)
        
        # the KL divergence from prior, N(0, I):
        return -0.5 * torch.mean(
            torch.sum(1 + 2 * logstd - mu**2 - logstd.exp()**2, dim=1)
        )
    

（两个高斯分布的kl loss的公式可以参考该[链接](https://stats.stackexchange.com/questions/234757/how-to-use-kullback-leibler-divergence-if-mean-and-standard-deviation-of-of-two)）

In [15]:
model = VGAE(
    encoder=VariationalGCNEncoder(in_feats, out_feats),
    decoder=None
)

In [16]:
latent = model.encode(g, g.ndata['feat'])
latent, latent.shape

(tensor([[-0.8699,  1.4980, -0.2904,  ..., -1.2899, -4.7583, -3.3002],
         [-1.5015,  0.9344, -2.7489,  ..., -3.6698,  1.5656, -0.8233],
         [-1.0845, -1.4393, -2.0605,  ..., -2.1850,  0.0485,  1.8520],
         ...,
         [-2.4298,  0.3927, -0.9932,  ..., -3.0765, -1.0976, -2.3869],
         [-3.5497, -0.9164, -0.9434,  ..., -4.3243,  0.3257,  2.5688],
         [ 0.6291, -1.2572, -2.2819,  ...,  0.1744, -0.9206, -1.1180]],
        grad_fn=<AddBackward0>),
 torch.Size([2708, 16]))

In [17]:
model.decode(g, latent)

tensor([1.0000e+00, 1.0000e+00, 9.6508e-09,  ..., 1.0000e+00, 9.5799e-02,
        9.5799e-02], grad_fn=<SigmoidBackward0>)

## 3. 训练自编码器和变分自编码器

接下来我们展示自编码器和变分自编码器的训练。

In [18]:
def train_gae(model, g, neg_g, pos_edge_idx, neg_edge_idx):
    """训练GAE模型"""
    model.train()
    optimizer.zero_grad()
    h = model.encode(g, g.ndata['feat'])
    loss = model.recon_loss(g, neg_g, h, train_pos_edge_idx, train_neg_edge_idx)
    loss.backward()
    optimizer.step()
    return loss.item()

def train_vgae(model, g, neg_g, pos_edge_idx, neg_edge_idx):
    """训练VGAE模型，损失函数由重建损失和kl损失组成"""
    model.train()
    optimizer.zero_grad()
    h = model.encode(g, g.ndata['feat'])
    loss = model.recon_loss(g, neg_g, h, train_pos_edge_idx, train_neg_edge_idx)
    loss = loss + (1 / g.num_nodes()) * model.kl_loss() # 加上kl loss
    loss.backward()
    optimizer.step()
    return loss.item()

In [19]:
@torch.no_grad()
def test(model, g, neg_g, pos_edge_idx, neg_edge_idx):
    """测试模型"""
    from sklearn.metrics import roc_auc_score, average_precision_score
    model.eval()
    
    pos_h = model.encode(g, g.ndata['feat'])
    neg_h = model.encode(neg_g, g.ndata['feat'])
    pos_y = pos_h.new_ones(pos_edge_idx.size) # 正样本标签
    neg_y = neg_h.new_zeros(neg_edge_idx.size) # 负样本标签
    y = torch.cat([pos_y, neg_y], dim=0)

    pos_pred = model.decoder(g, pos_h)[pos_edge_idx]
    neg_pred = model.decoder(neg_g, neg_h)[neg_edge_idx]
    pred = torch.cat([pos_pred, neg_pred], dim=0)

    y, pred = y.detach().cpu().numpy(), pred.detach().cpu().numpy()

    return roc_auc_score(y, pred), average_precision_score(y, pred) # 计算AUC和AP

In [23]:
from sklearn.preprocessing import normalize
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.metrics import classification_report, accuracy_score

def evaluate_node_classification(
    embeddings, labels, 
    train_mask, test_mask, 
    normalize_embedding=True, 
    max_iter=1000
):
    """ use single-layer MLP for node label prediction using (variational) graph auto-encoder embeddings
    """
    # normalize:
    X = embeddings
    if normalize_embedding:
        X = normalize(embeddings)
    
    # split train-test sets:
    X_train, y_train = X[train_mask, :], labels[train_mask]
    X_test, y_test = X[test_mask, :], labels[test_mask]
    
    # build classifier:
    clf = MLPClassifier(
        random_state=42,
        hidden_layer_sizes=[32],
        max_iter=max_iter
    ).fit(X_train, y_train)
    
    # make prediction:
    preds = clf.predict(X_test)
    
    # get classification report:
    print(
        classification_report(
            y_true=y_test, y_pred=preds
        )
    )
    # get accuracy score:
    test_acc = accuracy_score(y_true=y_test, y_pred=preds)
    
    return preds, test_acc

训练GAE：

In [29]:
model = GAE(GCNEncoder(in_feats, out_feats))
model = model.to(device)

g = g.to(device)
neg_g = neg_g.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01) 
epochs = 2000

for epoch in range(1, epochs + 1):
    loss = train_gae(model, g, neg_g, train_pos_edge_idx, train_neg_edge_idx)
    if epoch % 100 == 0:
        auc, ap = test(model, g, neg_g, test_pos_edge_idx, test_neg_edge_idx)
        print('Epoch: {:03d}, Loss_train: {:.4f}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, loss, auc, ap))

Epoch: 100, Loss_train: 0.8407, AUC: 0.9954, AP: 0.9964
Epoch: 200, Loss_train: 0.6873, AUC: 0.9981, AP: 0.9977
Epoch: 300, Loss_train: 0.5649, AUC: 0.9969, AP: 0.9969
Epoch: 400, Loss_train: 0.4703, AUC: 0.9952, AP: 0.9944
Epoch: 500, Loss_train: 0.3934, AUC: 0.9917, AP: 0.9897
Epoch: 600, Loss_train: 0.3307, AUC: 0.9885, AP: 0.9848
Epoch: 700, Loss_train: 0.2777, AUC: 0.9835, AP: 0.9776
Epoch: 800, Loss_train: 0.2348, AUC: 0.9779, AP: 0.9692
Epoch: 900, Loss_train: 0.1995, AUC: 0.9748, AP: 0.9638
Epoch: 1000, Loss_train: 0.1676, AUC: 0.9682, AP: 0.9539
Epoch: 1100, Loss_train: 0.1412, AUC: 0.9617, AP: 0.9445
Epoch: 1200, Loss_train: 0.1277, AUC: 0.9597, AP: 0.9433
Epoch: 1300, Loss_train: 0.1071, AUC: 0.9587, AP: 0.9406
Epoch: 1400, Loss_train: 0.0954, AUC: 0.9568, AP: 0.9373
Epoch: 1500, Loss_train: 0.0845, AUC: 0.9553, AP: 0.9352
Epoch: 1600, Loss_train: 0.0746, AUC: 0.9539, AP: 0.9333
Epoch: 1700, Loss_train: 0.0658, AUC: 0.9537, AP: 0.9325
Epoch: 1800, Loss_train: 0.0758, AUC: 0.

In [31]:
embedding_gae = model.encode(g, g.ndata['feat']).cpu().detach().numpy()

preds, test_acc = evaluate_node_classification(
    embedding_gae, g.ndata['label'].cpu().detach().numpy(), 
    g.ndata['train_mask'].cpu().detach().numpy(), g.ndata['test_mask'].cpu().detach().numpy()
)

print('GAE Test Accuracy: %.4f' % test_acc)

              precision    recall  f1-score   support

           0       0.35      0.28      0.31       130
           1       0.24      0.44      0.31        91
           2       0.33      0.34      0.34       144
           3       0.52      0.15      0.23       319
           4       0.26      0.30      0.28       149
           5       0.26      0.43      0.33       103
           6       0.14      0.33      0.19        64

    accuracy                           0.28      1000
   macro avg       0.30      0.32      0.28      1000
weighted avg       0.36      0.28      0.28      1000

GAE Test Accuracy: 0.2830




训练VGAE：

In [27]:
model = VGAE(VariationalGCNEncoder(in_feats, out_feats))
model = model.to(device)

g = g.to(device)
neg_g = neg_g.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01) 
epochs = 2000

for epoch in range(1, epochs + 1):
    loss = train_vgae(model, g, neg_g, train_pos_edge_idx, train_neg_edge_idx)
    if epoch % 100 == 0:
        auc, ap = test(model, g, neg_g, test_pos_edge_idx, test_neg_edge_idx)
        print('Epoch: {:03d}, Loss_train: {:.4f}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, loss, auc, ap))

Epoch: 100, Loss_train: 1.3937, AUC: 0.6216, AP: 0.5946
Epoch: 200, Loss_train: 1.3818, AUC: 0.6626, AP: 0.6815
Epoch: 300, Loss_train: 1.2023, AUC: 0.9176, AP: 0.9239
Epoch: 400, Loss_train: 1.1174, AUC: 0.9672, AP: 0.9733
Epoch: 500, Loss_train: 1.0175, AUC: 0.9838, AP: 0.9878
Epoch: 600, Loss_train: 0.9029, AUC: 0.9916, AP: 0.9931
Epoch: 700, Loss_train: 0.8507, AUC: 0.9931, AP: 0.9939
Epoch: 800, Loss_train: 0.8155, AUC: 0.9930, AP: 0.9931
Epoch: 900, Loss_train: 0.7855, AUC: 0.9914, AP: 0.9925
Epoch: 1000, Loss_train: 0.7592, AUC: 0.9899, AP: 0.9904
Epoch: 1100, Loss_train: 0.7351, AUC: 0.9900, AP: 0.9897
Epoch: 1200, Loss_train: 0.7069, AUC: 0.9923, AP: 0.9912
Epoch: 1300, Loss_train: 0.6690, AUC: 0.9934, AP: 0.9924
Epoch: 1400, Loss_train: 0.6310, AUC: 0.9930, AP: 0.9924
Epoch: 1500, Loss_train: 0.5940, AUC: 0.9927, AP: 0.9917
Epoch: 1600, Loss_train: 0.5567, AUC: 0.9917, AP: 0.9897
Epoch: 1700, Loss_train: 0.5204, AUC: 0.9903, AP: 0.9878
Epoch: 1800, Loss_train: 0.4831, AUC: 0.

In [28]:
embedding_gae = model.encode(g, g.ndata['feat']).cpu().detach().numpy()

preds, test_acc = evaluate_node_classification(
    embedding_gae, g.ndata['label'].cpu().detach().numpy(), 
    g.ndata['train_mask'].cpu().detach().numpy(), g.ndata['test_mask'].cpu().detach().numpy()
)

print('GAE Test Accuracy: %.4f' % test_acc)

              precision    recall  f1-score   support

           0       0.42      0.28      0.34       130
           1       0.36      0.52      0.42        91
           2       0.38      0.44      0.41       144
           3       0.52      0.35      0.42       319
           4       0.39      0.34      0.36       149
           5       0.43      0.58      0.50       103
           6       0.14      0.30      0.19        64

    accuracy                           0.39      1000
   macro avg       0.38      0.40      0.38      1000
weighted avg       0.42      0.39      0.39      1000

GAE Test Accuracy: 0.3870


