# 第五课 图上的其他深度学习模型

前面的课程中我们介绍了许多图神经网络模型。除了图神经网络，针对于图数据的深度学习模型还有很多，比如图上的自编码器、变分自编码器、循环神经网络和对抗生成网络等。在这一课中，我们对自编码器和变分自编码器进行代码实践。这其中包括了对模型细节和它们的应用的讲解。

## 0. 链接预测数据集

链接预测（link prediction）是常见的与图有关的任务。该任务旨在预测两个节点之间是否存在链接（link），即是否存在边。

关于链接预测的数据集，我们可以从节点分类任务的数据集直接构造。比如我们之前常用的Cora数据集，就可以无视掉它的节点标签，把Cora图里面的边当成训练/测试数据。下面我们具体来实践一下。

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.data import CoraGraphDataset

# set device to GPU:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# load dataset:
dataset = CoraGraphDataset('./data') # 将数据保存在data文件夹下
g = dataset[0]

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.


In [3]:
# 构造一个负采样的函数，获取包含负采样的边的一个新图
def construct_negative_graph(graph, k):
    """ construct k negative samples
    """
    src, dst = graph.edges()

    neg_src = src.repeat_interleave(k)
    neg_dst = torch.randint(0, graph.num_nodes(), (len(src) * k,))
    
    return dgl.graph((neg_src, neg_dst), num_nodes=graph.num_nodes())

In [4]:
neg_g = construct_negative_graph(g, 2)
neg_g

Graph(num_nodes=2708, num_edges=21112,
      ndata_schemes={}
      edata_schemes={})

这个负采样的图的一些内容：
* 点的数量和原图相同，点的特征就可以复用原图的点特征。
* 边的数量是原图的k倍，上面例子里面k=2。
* 边是通过对原图里的源节点随机采样目标节点生成的，所以有很小的概率会出现：和原图相同的边以及重复的边。
* 由于是负采样的图，所以所有的边的标签都是0。

In [5]:
import numpy as np

def split_edges(graph, train_ratio=0.8, val_ratio=0.1):
    """ train-validaion-test split of graph dataset
    """
    all_edge_idx = np.arange(graph.num_edges())
    np.random.shuffle(all_edge_idx)
    
    train_idx_num = int(graph.num_edges() * train_ratio)
    val_idx_num = int(graph.num_edges() * val_ratio)
    
    train_idx = all_edge_idx[: train_idx_num]
    val_idx = all_edge_idx[train_idx_num: (train_idx_num + val_idx_num)]
    test_idx = all_edge_idx[(train_idx_num + val_idx_num):]
    
    return train_idx, val_idx, test_idx

下面我们按照 85:5:10 的比例把原图和负采样图的边划分成训练/验证/测试的集合

In [6]:
train_pos_edge_idx, val_pos_edge_idx, test_pos_edge_idx = split_edges(g, train_ratio=0.85, val_ratio=0.05)
train_neg_edge_idx, val_neg_edge_idx, test_neg_edge_idx = split_edges(neg_g, train_ratio=0.85, val_ratio=0.05)

In [7]:
train_pos_edge_idx

array([9233, 7324,  228, ...,  926, 1506, 8939])

值得注意的是：
* 由于我们的负采样图是可以随时构建的，因此负样本的训练/验证和测试是可以在训练的循环里随时生成。
* 通过变量名称，我们就可以设定标签为1还是0.

## 1. 自编码器

针对于图数据的自编码器我们称之为GAE (Graph AutoEncoder)。其包含两个组成部分，编码器（encoder）和解码器（decoder）。图上的编码器常用的就是GCN了；而解码器呢通常用一个内积来表示。具体地，给定两个节点的节点表示，解码器将计算二者的内积，其结果作为两个节点之间存在边的概率。

In [10]:
from dgl.nn import GraphConv

首先构造编码器，由两层GCN组成。

In [11]:
from dgl.nn import GraphConv

class GCNEncoder(nn.Module):
    """ deep GCN based encoder
    """
    def __init__(self, in_channels, out_channels):
        super(GCNEncoder, self).__init__()
        
        # GCN:
        self.conv1 = GraphConv(
            in_feats=in_channels, out_feats=2*out_channels, 
            bias=True, 
            activation=F.relu, 
            allow_zero_in_degree=True
        )
        
        # GCN:
        self.conv2 = GraphConv(
            in_feats=2*out_channels, out_feats=out_channels, 
            bias=True, 
            allow_zero_in_degree=True
        )

    def forward(self, g, features):
        h = self.conv1(g, features)
        h = self.conv2(g, h)
        return h

然后构建解码器，将给定的节点对映射到[0，1]之间。

In [12]:
import dgl.function as fn

# 定义内积预测Decoder
class InnerProductDecoder(nn.Module):
    """ simple inner product decoder
    """
    def forward(self, graph, h, sigmoid=True):
        graph.ndata['h'] = h
        graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
        value = graph.edata['score'].sum(dim=1) 
        return torch.sigmoid(value) if sigmoid else value

In [13]:
class GAE(torch.nn.Module):
    """ graph autoencoder
    """
    EPSILON = 1e-15 # EPS是一个很小的值，防止取对数的时候出现0值
    
    def __init__(self, encoder, decoder=None):
        super().__init__()
        self.encoder = encoder
        self.decoder = InnerProductDecoder()

    def encode(self, *args, **kwargs): 
        """编码功能"""
        return self.encoder(*args, **kwargs)

    def decode(self, *args, **kwargs):
        """解码功能"""
        return self.decoder(*args, **kwargs)

    def recon_loss(self, g, neg_g, h, pos_edge_index, neg_edge_index):
        """计算正边和负边的二值交叉熵
        
        参数说明
        ----
        g: 原始图，即包含正边的图
        neg_g：负采样图，即包含负边的图
        h: 编码器的输出
        pos_edge_index: 正边的边索引
        neg_edge_index: 负边的边索引
        """
        # encourage correct prediction of actual edge:
        pos_loss = -torch.log(
            self.decoder(g, h)[pos_edge_index] + GAE.EPSILON
        ).mean()
        
        # penalize indication of fabricated edge:
        neg_loss = -torch.log(
            1 - self.decoder(neg_g, h)[neg_edge_index] + GAE.EPSILON
        ).mean()

        return pos_loss + neg_loss

In [15]:
# init model:
in_feats, out_feats = g.ndata['feat'].shape[1], 16
model = GAE(GCNEncoder(in_feats, out_feats))

In [16]:
# verify forward propagation, encoder:
latent = model.encode(g, g.ndata['feat'])
latent, latent.shape

(tensor([[-0.0004, -0.0065, -0.0013,  ..., -0.0011, -0.0008, -0.0009],
         [ 0.0013, -0.0061, -0.0015,  ..., -0.0011,  0.0014,  0.0005],
         [ 0.0013, -0.0061, -0.0015,  ..., -0.0011,  0.0014,  0.0005],
         ...,
         [-0.0037, -0.0017, -0.0040,  ..., -0.0014, -0.0021, -0.0034],
         [-0.0154, -0.0038, -0.0116,  ..., -0.0047, -0.0047, -0.0010],
         [ 0.0012, -0.0008, -0.0014,  ...,  0.0028,  0.0041,  0.0047]],
        grad_fn=<AddBackward0>),
 torch.Size([2708, 16]))

In [18]:
# verify forward propagation, decoder:
model.decode(g, latent)

tensor([0.5000, 0.5000, 0.5000,  ..., 0.5001, 0.5000, 0.5000],
       grad_fn=<SigmoidBackward0>)

## 2. 变分自编码器

变分自编码器和自编码器基本结构相同，都是一个编码器加一个解码器。它们的主要区别是，变分自编码器编码后的隐层表示不再是连续的向量表示，而是通过一个高斯分布来表示。具体地，变分自编码器学习的是这个高斯分布的均值（下面用变量`mu`来表示）和标准差（下面用变量`std`来表示）。

In [20]:
class VariationalGCNEncoder(nn.Module):
    MAX_LOGSTD = 10

    def __init__(self, in_channels, out_channels):
        super().__init__()
        
        # GCN:
        self.conv1 = GraphConv(
            in_feats=in_channels, out_feats=2 * out_channels, 
            bias=True, 
            activation=F.relu, 
            allow_zero_in_degree=True
        )
        
        # encoded mu,  
        self.conv_mu = GraphConv(
            in_feats=2 * out_channels, out_feats=out_channels, 
            allow_zero_in_degree=True
        ) 
        # encoded log(std):
        self.conv_logstd = GraphConv(
            in_feats=2 * out_channels, out_feats=out_channels, 
            allow_zero_in_degree=True
        )

    def forward(self, g, features):
        h = self.conv1(g, features)
        
        # get encoded Gaussian distribution:
        mu = self.conv_mu(g, h)
        logstd = self.conv_logstd(g, h)
        
        return mu, logstd
    
class VGAE(GAE): 
    def __init__(self, encoder, decoder=None):
        super().__init__(encoder, decoder)

    def reparametrize(self, mu, logstd):
        if self.training:
            # get encoding as [mu - std, mu + std]:
            return mu + (2*torch.randn_like(logstd) - 1) * torch.exp(logstd)
        else:
            return mu

    def encode(self, *args, **kwargs):
        """ Gaussian distribution encoding
        """
        # get mu and log(std), params of encoded Gaussian:
        self.__mu__, self.__logstd__ = self.encoder(*args, **kwargs)
        # clamp log(std):
        self.__logstd__ = self.__logstd__.clamp(max=VariationalGCNEncoder.MAX_LOGSTD)
        # sample from encoded Gaussian:
        h = self.reparametrize(self.__mu__, self.__logstd__)
        
        return h

    def kl_loss(self, mu=None, logstd=None):
        """
        """ 
        mu = self.__mu__ if mu is None else mu
        logstd = self.__logstd__ if logstd is None else logstd.clamp(max=VariationalGCNEncoder.MAX_LOGSTD)
        
        # the KL divergence from prior, N(0, I):
        return -0.5 * torch.mean(
            torch.sum(1 + 2 * logstd - mu**2 - logstd.exp()**2, dim=1)
        )
    

（两个高斯分布的kl loss的公式可以参考该[链接](https://stats.stackexchange.com/questions/234757/how-to-use-kullback-leibler-divergence-if-mean-and-standard-deviation-of-of-two)）

In [21]:
model = VGAE(
    encoder=VariationalGCNEncoder(in_channels, out_channels),
    decoder=None
)
model = model.to(device)

In [22]:
latent = model.encode(g, g.ndata['feat'])
latent, latent.shape

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_mm)

In [None]:
model.decode(g, latent)

## 3. 训练自编码器和变分自编码器

接下来我们展示自编码器和变分自编码器的训练。

In [29]:
def train_gae(model, g, neg_g, pos_edge_idx, neg_edge_idx):
    """训练GAE模型"""
    model.train()
    optimizer.zero_grad()
    h = model.encode(g, g.ndata['feat'])
    loss = model.recon_loss(g, neg_g, h, train_pos_edge_idx, train_neg_edge_idx)
    loss.backward()
    optimizer.step()
    return loss.item()

def train_vgae(model, g, neg_g, pos_edge_idx, neg_edge_idx):
    """训练VGAE模型，损失函数由重建损失和kl损失组成"""
    model.train()
    optimizer.zero_grad()
    h = model.encode(g, g.ndata['feat'])
    loss = model.recon_loss(g, neg_g, h, train_pos_edge_idx, train_neg_edge_idx)
    loss = loss + (1 / g.num_nodes()) * model.kl_loss() # 加上kl loss
    loss.backward()
    optimizer.step()
    return loss.item()

In [30]:
@torch.no_grad()
def test(model, g, neg_g, pos_edge_idx, neg_edge_idx):
    """测试模型"""
    from sklearn.metrics import roc_auc_score, average_precision_score
    model.eval()
    
    pos_h = model.encode(g, g.ndata['feat'])
    neg_h = model.encode(neg_g, g.ndata['feat'])
    pos_y = pos_h.new_ones(pos_edge_idx.size) # 正样本标签
    neg_y = neg_h.new_zeros(neg_edge_idx.size) # 负样本标签
    y = torch.cat([pos_y, neg_y], dim=0)

    pos_pred = model.decoder(g, pos_h)[pos_edge_idx]
    neg_pred = model.decoder(neg_g, neg_h)[neg_edge_idx]
    pred = torch.cat([pos_pred, neg_pred], dim=0)

    y, pred = y.detach().cpu().numpy(), pred.detach().cpu().numpy()

    return roc_auc_score(y, pred), average_precision_score(y, pred) # 计算AUC和AP

训练GAE：

In [27]:
model = GAE(GCNEncoder(in_channels, out_channels))
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
 

epochs = 2000
for epoch in range(1, epochs + 1):
    loss = train_gae(model, g, neg_g, train_pos_edge_idx, train_neg_edge_idx)
    if epoch % 100 == 0:
        auc, ap = test(model, g, neg_g, test_pos_edge_idx, test_neg_edge_idx)
        print('Epoch: {:03d}, Loss_train: {:.4f}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, loss, auc, ap))

Epoch: 100, Loss_train: 0.8731, AUC: 0.9938, AP: 0.9953
Epoch: 200, Loss_train: 0.7226, AUC: 0.9960, AP: 0.9964
Epoch: 300, Loss_train: 0.6000, AUC: 0.9938, AP: 0.9947
Epoch: 400, Loss_train: 0.4872, AUC: 0.9905, AP: 0.9901
Epoch: 500, Loss_train: 0.3914, AUC: 0.9863, AP: 0.9851
Epoch: 600, Loss_train: 0.3177, AUC: 0.9791, AP: 0.9757
Epoch: 700, Loss_train: 0.2567, AUC: 0.9722, AP: 0.9660
Epoch: 800, Loss_train: 0.2110, AUC: 0.9659, AP: 0.9574
Epoch: 900, Loss_train: 0.1707, AUC: 0.9608, AP: 0.9492
Epoch: 1000, Loss_train: 0.1385, AUC: 0.9534, AP: 0.9388
Epoch: 1100, Loss_train: 0.1276, AUC: 0.9499, AP: 0.9353
Epoch: 1200, Loss_train: 0.1160, AUC: 0.9499, AP: 0.9343
Epoch: 1300, Loss_train: 0.1068, AUC: 0.9483, AP: 0.9312
Epoch: 1400, Loss_train: 0.0981, AUC: 0.9468, AP: 0.9290
Epoch: 1500, Loss_train: 0.0897, AUC: 0.9450, AP: 0.9259
Epoch: 1600, Loss_train: 0.0815, AUC: 0.9432, AP: 0.9230
Epoch: 1700, Loss_train: 0.0738, AUC: 0.9413, AP: 0.9200
Epoch: 1800, Loss_train: 0.0666, AUC: 0.

训练VGAE：

In [31]:
model = VGAE(VariationalGCNEncoder(in_channels, out_channels))
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
 
epochs = 2000
for epoch in range(1, epochs + 1):
    loss = train_vgae(model, g, neg_g, train_pos_edge_idx, train_neg_edge_idx)
    if epoch % 100 == 0:
        auc, ap = test(model, g, neg_g, test_pos_edge_idx, test_neg_edge_idx)
        print('Epoch: {:03d}, Loss_train: {:.4f}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, loss, auc, ap))

Epoch: 100, Loss_train: 1.1905, AUC: 0.9409, AP: 0.9442
Epoch: 200, Loss_train: 0.9519, AUC: 0.9952, AP: 0.9964
Epoch: 300, Loss_train: 0.8732, AUC: 0.9965, AP: 0.9970
Epoch: 400, Loss_train: 0.7789, AUC: 0.9956, AP: 0.9959
Epoch: 500, Loss_train: 0.7062, AUC: 0.9949, AP: 0.9956
Epoch: 600, Loss_train: 0.6481, AUC: 0.9918, AP: 0.9934
Epoch: 700, Loss_train: 0.5987, AUC: 0.9927, AP: 0.9933
Epoch: 800, Loss_train: 0.5571, AUC: 0.9927, AP: 0.9922
Epoch: 900, Loss_train: 0.5202, AUC: 0.9936, AP: 0.9919
Epoch: 1000, Loss_train: 0.4823, AUC: 0.9923, AP: 0.9900
Epoch: 1100, Loss_train: 0.4491, AUC: 0.9903, AP: 0.9877
Epoch: 1200, Loss_train: 0.4156, AUC: 0.9894, AP: 0.9863
Epoch: 1300, Loss_train: 0.3848, AUC: 0.9884, AP: 0.9841
Epoch: 1400, Loss_train: 0.3548, AUC: 0.9877, AP: 0.9826
Epoch: 1500, Loss_train: 0.3292, AUC: 0.9854, AP: 0.9797
Epoch: 1600, Loss_train: 0.3029, AUC: 0.9836, AP: 0.9765
Epoch: 1700, Loss_train: 0.2785, AUC: 0.9831, AP: 0.9752
Epoch: 1800, Loss_train: 0.2566, AUC: 0.