# Problem setting

In this tutorial, we demonstrate how graph neural networks can be used for recommendation. We focus on an item-based recommendation model, which recommends items that are similar to the ones a user has already purchased. We demonstrate the recommendation model on the Retail Rocket dataset.

# Get started

DGL can be used with different deep learning frameworks; currently it supports PyTorch and MXNet. Here, we show how DGL works with PyTorch.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

When we load DGL, we need to set its backend to one of the deep learning frameworks. Because this tutorial develops models in PyTorch, we set the DGL backend to PyTorch.

In [None]:
import dgl
from dgl import DGLGraph

# Load Pytorch as backend
dgl.load_backend('pytorch')

Load the rest of the necessary libraries.

In [None]:
import numpy as np
import pandas as pd
from scipy import stats
from scipy import sparse as spsp

## Load the Retail Rocket dataset

First, load the session-item interactions for training and testing.

In [None]:
import pandas as pd
train_data = pd.read_csv('retail/train_uv.csv', sep='\t')
test_data = pd.read_csv('retail/test_uv.csv', sep='\t')
user_item_spm = spsp.coo_matrix((np.ones(len(train_data)),
                                 (np.array(train_data['visitorid']),
                                  np.array(train_data['itemid']))))
num_items = user_item_spm.shape[1]
print(user_item_spm.shape)
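To make the shape of this matrix concrete, here is a minimal sketch on hypothetical toy interactions (the ids below are invented for illustration and are not from the dataset). Rows are visitors, columns are items, and repeated (visitor, item) pairs are summed when the COO matrix is converted to a dense array.

```python
import numpy as np
from scipy import sparse as spsp

# Hypothetical toy interactions: parallel arrays of visitor ids and item ids.
visitors = np.array([0, 0, 1, 2, 2, 2])
items = np.array([1, 3, 3, 0, 1, 1])  # visitor 2 interacted with item 1 twice

toy_spm = spsp.coo_matrix((np.ones(len(visitors)), (visitors, items)))
toy_dense = toy_spm.toarray()  # duplicate entries are summed here

print(toy_spm.shape)
print(toy_dense)
```

The matrix shape is inferred from the largest visitor id and item id, so `num_items` above is one more than the largest item id in the training data.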

For evaluation, we use the previous item in a session to predict the next item. Here we construct a list of query items (the previous items in a session) and a list of truth items (the items that follow the query items in the session).

In [None]:
def construct_edges(data):
    data = data.sort_values(by=['visitorid', 'timestamp'])
    spm = spsp.coo_matrix((np.ones((len(data))), (data['visitorid'], data['itemid']))).tocsr()
    print(spm.nnz)
    query = []
    truth = []
    for i in range(spm.shape[0]):
        row = spm[i]
        num_items = spm[i].nnz
        for t in range(num_items - 1):
            query.append(row.indices[t])
            truth.append(row.indices[t+1])
    query = np.array(query, dtype=np.int64)
    truth = np.array(truth, dtype=np.int64)
    return query, truth

test_query, test_truth = construct_edges(test_data)
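One caveat worth checking: converting a COO matrix to CSR may reorder the column indices within a row (SciPy sorts them when it merges duplicates), so `row.indices` may follow item-id order rather than timestamp order. The sketch below is an alternative under that assumption, not the notebook's method. The helper name `consecutive_pairs` is hypothetical, and unlike `construct_edges` it does not merge repeated items within a session.

```python
import numpy as np

def consecutive_pairs(visitor, item, timestamp):
    """Return (query, truth) item pairs that are adjacent in time within a session."""
    visitor = np.asarray(visitor)
    item = np.asarray(item)
    timestamp = np.asarray(timestamp)
    # Sort by visitor first, then by timestamp within each visitor.
    order = np.lexsort((timestamp, visitor))
    visitor, item = visitor[order], item[order]
    # Keep positions whose successor belongs to the same visitor.
    same_session = visitor[:-1] == visitor[1:]
    query = item[:-1][same_session].astype(np.int64)
    truth = item[1:][same_session].astype(np.int64)
    return query, truth

# Toy data: visitor 0 views items 2, 9, 5 in time order; visitor 1 views 7, 3.
query, truth = consecutive_pairs(
    visitor=[0, 1, 0, 0, 1],
    item=[5, 7, 2, 9, 3],
    timestamp=[30, 10, 10, 20, 20],
)
print(query, truth)
```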

Load item features from a file.

In [None]:
import pickle
features = pickle.load(open('retail/retail_item_feats_100.pkl', 'rb'))
assert features.shape[0] == num_items

The session-item graph carries a strong signal for prediction. Here we use SVD for dimension reduction to generate additional item features from the session-item matrix and concatenate them with the loaded features.

In [None]:
u, s, vt = spsp.linalg.svds(user_item_spm, k=500)
v = vt.transpose() * np.sqrt(s).transpose()
features = np.concatenate((features, v), 1)
features = torch.tensor(features)
in_feats = features.shape[1]
print('#feats:', in_feats)
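The scaling by `np.sqrt(s)` splits each singular value evenly between the session factors and the item factors. The following is a small sketch on a hypothetical dense matrix, using `np.linalg.svd` instead of `svds` so that it runs on tiny inputs; the matrix and `k` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy matrix: 6 sessions x 4 items with binary interactions.
toy = (rng.random((6, 4)) < 0.5).astype(np.float64)

k = 2
u, s, vt = np.linalg.svd(toy, full_matrices=False)
# Keep the k largest singular values and scale each factor by sqrt(s).
item_factors = vt[:k].T * np.sqrt(s[:k])
session_factors = u[:, :k] * np.sqrt(s[:k])
# The product of the two factor matrices is the rank-k approximation of toy.
approx = session_factors @ item_factors.T

print(item_factors.shape)
print(np.linalg.norm(toy - approx))
```

Each row of `item_factors` plays the role of `v` in the cell above: a `k`-dimensional feature vector for one item.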

# The recommendation model

At a high level, the model first learns item embeddings from the user-item interaction dataset and then uses the item embeddings to recommend items similar to the ones a user has purchased. To learn item embeddings, we first construct an item similarity graph and then train a GNN on that graph.

There are many ways of constructing the item similarity graph. Here we use the [SLIM model](https://dl.acm.org/citation.cfm?id=2118303) to learn item similarity and use the learned result to construct the item graph. The resulting graph will have an edge between two items if they are similar and the edge has a weight that represents the similarity score.

## Construct the item similarity graph with SLIM
SLIM is an item-based recommendation model. When training SLIM on a user-item dataset, it learns an item similarity graph. This similarity graph is the item graph we construct for the GNN model.

Please follow the instructions in the [SLIM GitHub repo](https://github.com/KarypisLab/SLIM) to install SLIM.

To use SLIM to generate an item similarity graph, there are two hyperparameters we can tune: `l1r` is the coefficient of the L1 regularization and `l2r` is the coefficient of the L2 regularization. Increasing `l1r` generates a sparser similarity graph, and increasing `l2r` leads to a denser similarity graph.
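For reference, the SLIM paper formulates item similarity learning as the regularized problem below, where $A$ is the user-item matrix and $W$ is the item-item similarity matrix. Mapping `l1r` to $\lambda$ and `l2r` to $\beta$ is an assumption made here for illustration; the notebook does not show how `create_SLIM_graph` passes these values on.

$$\min_{W} \; \frac{1}{2}\lVert A - AW \rVert_F^2 + \frac{\beta}{2}\lVert W \rVert_F^2 + \lambda \lVert W \rVert_1 \quad \text{subject to } W \ge 0,\ \mathrm{diag}(W) = 0$$

The L1 term pushes entries of $W$ to exactly zero, which is why a larger L1 coefficient yields fewer edges in the resulting graph.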

In [None]:
from graph_construct import create_SLIM_graph
conv_spm = create_SLIM_graph(user_item_spm, l1r=0.8, l2r=1, test=False)
num_items = user_item_spm.shape[1]
deg = conv_spm.dot(np.ones((num_items)))
print(np.sum(deg == 0))
print(conv_spm.sum(0))
print(conv_spm.nnz)

Once we have constructed the item similarity matrix, we load it into a DGL graph.

In [None]:
conv_g = dgl.DGLGraph(conv_spm, readonly=True)
conv_g.edata['similarity'] = torch.tensor(conv_spm.data, dtype=torch.float32)
conv_g.ndata['feats'] = torch.tensor(features, dtype=torch.float32)
#g.ndata['id'] = torch.arange(num_items, dtype=torch.int64)
print('#nodes:', conv_g.number_of_nodes())
print('#edges:', conv_g.number_of_edges())

We need a signal to train the GNN model. Because our final task is to predict the next item based on the item accessed previously in a session, we construct a loss graph to store the training signal. In this loss graph, two nodes are connected if they are next to each other in a session. Then we train the GNN model on the loss graph in a way similar to the link prediction task in [the previous section](https://github.com/zheng-da/DGL_devday_tutorial/blob/master/BasicTasks_pytorch.ipynb).

In [None]:
train_query, train_truth = construct_edges(train_data)
loss_spm = spsp.coo_matrix((np.ones(len(train_query)), (train_query, train_truth)))
print('train graph has #edges:', loss_spm.nnz)

We may want to remove the test samples that have appeared in the training set.

In [None]:
loss_spm = loss_spm.tocsr()
not_in_train = np.array((loss_spm[test_query, test_truth] == 0).transpose()).squeeze()
print('#tests in train:', len(test_query) - np.sum(not_in_train))
test_query = test_query[not_in_train]
test_truth = test_truth[not_in_train]
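The same filtering can be expressed without a sparse matrix lookup. The sketch below encodes each (query, truth) pair as a single integer and keeps the test pairs whose key does not occur in training. The helper name `unseen_in_train` and the toy ids are hypothetical.

```python
import numpy as np

def unseen_in_train(train_query, train_truth, test_query, test_truth, num_items):
    """Boolean mask over test pairs: True where the pair is absent from training."""
    # Encode each (query, truth) pair as one integer so pairs can be compared directly.
    train_keys = (np.asarray(train_query, dtype=np.int64) * num_items
                  + np.asarray(train_truth, dtype=np.int64))
    test_keys = (np.asarray(test_query, dtype=np.int64) * num_items
                 + np.asarray(test_truth, dtype=np.int64))
    return ~np.isin(test_keys, train_keys)

# Toy example: the test pair (0, 1) also appears in training and is dropped.
keep = unseen_in_train([0, 1, 2], [1, 2, 3], [0, 2, 3], [1, 0, 1], num_items=4)
print(keep)
```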

Load the loss graph to DGL.

In [None]:
loss_g = dgl.DGLGraph(loss_spm, readonly=True)
loss_g.edata['similarity'] = torch.tensor(loss_spm.data, dtype=torch.float32)
#g.ndata['id'] = torch.arange(num_items, dtype=torch.int64)
print('#nodes:', loss_g.number_of_nodes())
print('#edges:', loss_g.number_of_edges())

## GNN models

We run GNN on the item graph to compute item embeddings. In this tutorial, we use a customized [GraphSage](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) model to compute node embeddings. The original GraphSage performs the following computation on every node $v$ in the graph:

$$h_{N(v)}^{(l)} \gets \mathrm{AGGREGATE}_l(\{h_u^{(l-1)}, \forall u \in N(v)\})$$
$$h_v^{(l)} \gets \sigma(W^{(l)} \cdot \mathrm{CONCAT}(h_v^{(l-1)}, h_{N(v)}^{(l)})),$$

where $N(v)$ is the neighborhood of node $v$, $l$ is the layer index, $W^{(l)}$ is the weight matrix of layer $l$, and $\sigma$ is a nonlinearity.

The original GraphSage model treats each neighbor equally. However, the SLIM model learns the item similarity from the user-item interactions, and the GNN model should take this similarity into account. Thus, we customize the GraphSage model as follows: instead of aggregating all neighbors equally, we aggregate neighbor embeddings rescaled by the similarity on the edges. The aggregation step becomes:

$$h_{N(v)}^{(l)} \gets \sum_{u \in N(v)} h_u^{(l-1)} \cdot s_{uv},$$

where $s_{uv}$ is the similarity score between two vertices $u$ and $v$.
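As a minimal sketch of this weighted sum, the aggregation over all nodes can be written as one matrix product. The dense similarity matrix, the toy embeddings, and the edge direction convention (an edge $u \to v$ contributes to $v$) are assumptions for illustration; `SAGEConv` operates on the DGL graph rather than on dense matrices.

```python
import numpy as np

# sim[u, v] holds s_uv for an edge u -> v; zero means no edge (toy values).
sim = np.array([[0.0, 0.5, 0.0],
                [0.2, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
# h[u] is the embedding of node u from the previous layer (toy values).
h = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])

# h_neigh[v] = sum over in-neighbors u of h[u] * s_uv
h_neigh = sim.T @ h
print(h_neigh)
```

Node 2 has a single in-neighbor (node 1) with similarity 1.0, so its aggregated vector equals `h[1]`.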

The GNN model has multiple layers. In each layer, a vertex accesses its direct neighbors. When we stack $k$ layers in a model, a node $v$ accesses neighbors within $k$ hops. The output of the GNN model is a set of node embeddings, each of which represents a node together with the information in its $k$-hop neighborhood.
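The growth of the receptive field with depth can be checked on a small graph. The sketch below is an illustration only: it uses a hypothetical path graph 0 - 1 - 2 - 3 and a helper named `k_hop_reach` to compute which nodes can influence a given node after $k$ rounds of neighbor aggregation.

```python
import numpy as np

def k_hop_reach(adj, k):
    """reach[v, u] is True if u is within k hops of v (v itself included)."""
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool)
    step = adj.astype(np.int64)
    for _ in range(k):
        # Extend every reachable set by one more hop along the edges.
        reach = reach | ((reach.astype(np.int64) @ step) > 0)
    return reach

# Hypothetical undirected path graph: 0 - 1 - 2 - 3.
adj = np.zeros((4, 4), dtype=bool)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = True

one_hop = k_hop_reach(adj, 1)
two_hop = k_hop_reach(adj, 2)
print(one_hop[0], two_hop[0])
```

With one layer, node 0 only sees node 1; with two layers it also sees node 2, but still not node 3.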

<img src="https://github.com/zheng-da/DGL_devday_tutorial/raw/master/GNN.png" alt="drawing" width="600"/>

We implement the computation in each layer of the customized GraphSage model in `SAGEConv` and implement the multi-layer model in `GraphSAGEModel`.

In [None]:
from sageconv import SAGEConv

class GraphSAGEModel(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 out_dim,
                 n_layers,
                 activation,
                 dropout,
                 aggregator_type):
        super(GraphSAGEModel, self).__init__()
        self.layers = nn.ModuleList()
        self.proj = nn.Sequential(nn.ReLU(),
                                  nn.Linear(out_dim, out_dim),
                                  nn.LayerNorm((out_dim,)),
                                  )
        if n_layers == 1:
            self.layers.append(SAGEConv(in_feats, out_dim, aggregator_type,
                                        feat_drop=dropout, activation=None))
        elif n_layers > 1:
            # input layer
            self.layers.append(SAGEConv(in_feats, n_hidden, aggregator_type,
                                        feat_drop=dropout, activation=activation))
            # hidden layer
            for i in range(n_layers - 2):
                self.layers.append(SAGEConv(n_hidden, n_hidden, aggregator_type,
                                            feat_drop=dropout, activation=activation))
            # output layer
            self.layers.append(SAGEConv(n_hidden, out_dim, aggregator_type,
                                        feat_drop=dropout, activation=None))

    def forward(self, g, features):
        h = features
        for layer in self.layers:
            h = layer(g, h, g.edata['similarity'])
        h = self.proj(h)
        # normalize embeddings.
        #h = F.normalize(h, p=2, dim=1)
        return h

## Train Item Embeddings

We train the item embeddings with the edges in the loss graph as the training signal. This step is very similar to the link prediction task in the [basic applications](https://github.com/zheng-da/DGL_devday_tutorial/blob/master/BasicTasks_pytorch.ipynb).

The item features have many dimensions. To run the GNN on the item features, we first create an encoding layer that projects them to a lower dimension.

In [None]:
class EncodeLayer(nn.Module):
    def __init__(self, in_feats, num_hidden):
        super(EncodeLayer, self).__init__()
        self.proj = nn.Sequential(nn.Linear(in_feats, num_hidden),
                                  nn.ReLU(),
                                  nn.Linear(num_hidden, num_hidden),
                                  )
        #self.emb = nn.Embedding(19174 + 1, num_hidden)
        #self.nid = torch.arange(19174).to(device)
        
    def forward(self, feats):
        return self.proj(feats)
        #return self.proj(feats) + self.emb(self.nid)

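Here is the code to verify that the `false_neg` flags returned by the sampler agree with the loss graph: edges flagged as true negatives must not exist in the loss graph, and edges flagged as false negatives must exist in it.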

In [None]:
loss_spm = loss_spm.tocsr()
def verify_neg(neg_g):
    false_neg = neg_g.edata['false_neg'].detach().cpu().numpy()
    src_nid, dst_nid = neg_g.all_edges(order='eid')
    src_nid = neg_g.parent_nid[src_nid].detach().cpu().numpy()
    dst_nid = neg_g.parent_nid[dst_nid].detach().cpu().numpy()
    true_neg_src = src_nid[false_neg == 0]
    true_neg_dst = dst_nid[false_neg == 0]
    false_neg_src = src_nid[false_neg == 1]
    false_neg_dst = dst_nid[false_neg == 1]
    assert np.sum(loss_spm[true_neg_src, true_neg_dst] == 0) == len(true_neg_src)
    assert np.sum(loss_spm[false_neg_src, false_neg_dst] != 0) == len(false_neg_src)

We simply use node connectivity as the training signal: nodes connected by edges are similar, while nodes not connected by edges are dissimilar.

To train such a model, we use negative sampling to construct negative samples. A positive sample is an edge that exists in the loss graph, while a negative sample is a pair of nodes that do not have an edge between them in the graph. We usually train on each positive sample together with multiple negative samples.
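The idea behind tail-corrupting negative sampling can be sketched as follows. This is an illustration of the concept under simple assumptions (uniform sampling over all nodes), not the implementation of DGL's `EdgeSampler`; the helper name `sample_tail_negatives` and the toy edges are hypothetical.

```python
import numpy as np

def sample_tail_negatives(pos_src, pos_dst, num_nodes, neg_sample_size, rng):
    """For every positive edge, keep the source and draw random destination nodes."""
    pos_src = np.asarray(pos_src, dtype=np.int64)
    pos_dst = np.asarray(pos_dst, dtype=np.int64)
    existing = set(zip(pos_src.tolist(), pos_dst.tolist()))
    neg_src = np.repeat(pos_src, neg_sample_size)
    neg_dst = rng.integers(0, num_nodes, size=len(neg_src))
    # A sampled pair that is actually a positive edge is a "false negative".
    false_neg = np.array(
        [(s, d) in existing for s, d in zip(neg_src.tolist(), neg_dst.tolist())],
        dtype=np.int64,
    )
    return neg_src, neg_dst, false_neg

pos_src, pos_dst = [0, 1, 2], [1, 2, 0]
rng = np.random.default_rng(0)
neg_src, neg_dst, false_neg = sample_tail_negatives(
    pos_src, pos_dst, num_nodes=3, neg_sample_size=4, rng=rng)
print(neg_src, neg_dst, false_neg)
```

The `false_neg` flags make it possible to mask out accidental positives when computing the loss.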

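Once we have the node embeddings, we compute the similarity scores on positive samples and negative samples. We construct the following ranking loss function on a positive sample and each corresponding negative sample:

$$L = (1 - (r_{ij} - \tilde{r}_{ij}))^2,$$

where $r_{ij}$ is the score of the positive sample and $\tilde{r}_{ij}$ is the score of a corresponding negative sample. With this loss, training should increase the difference between the scores of positive samples and negative samples. In the code below, `rank_loss` implements this squared form (up to a factor of 1/2), while the model itself calls `rank_loss2`, which applies a log-sigmoid to the same score difference.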
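To see how the two ranking losses behave, here is a NumPy sketch that mirrors the arithmetic of `rank_loss` and `rank_loss2` on hypothetical scores. The function names and the toy numbers are made up for illustration; the mask plays the role of `1 - false_neg`.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def squared_margin_loss(pos, neg, mask):
    # Penalizes score differences that deviate from a margin of 1.
    diff = 1.0 - (pos[:, None] - neg)
    return np.sum(diff * diff * mask / 2)

def log_sigmoid_rank_loss(pos, neg, mask):
    # Penalizes negatives that score close to or above the positive.
    return -np.sum(log_sigmoid(pos[:, None] - neg) * mask)

pos = np.array([2.0, 0.5])                    # one score per positive sample
neg = np.array([[1.0, 3.0], [0.5, -0.5]])     # two negatives per positive
mask = np.array([[1.0, 1.0], [0.0, 1.0]])     # 0 marks a false negative

squared = squared_margin_loss(pos, neg, mask)
logsig = log_sigmoid_rank_loss(pos, neg, mask)
print(squared, logsig)
```

In this example the only contribution to the squared loss comes from the negative that outscores its positive (3.0 versus 2.0); the masked entry is ignored by both losses.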

In [None]:
# NCE loss
def NCE_loss(pos_score, neg_score, neg_sample_size):
    pos_score = F.logsigmoid(pos_score)
    neg_score = F.logsigmoid(-neg_score).reshape(-1, neg_sample_size)
    return -pos_score - torch.sum(neg_score, dim=1)

def rank_loss(pos_score, neg_score, neg_sample_size, mask):
    diff = 1 - (pos_score.unsqueeze(1) - neg_score.reshape(-1, neg_sample_size))
    mask = mask.reshape(-1, neg_sample_size)
    return torch.sum(torch.mul(diff, diff) * mask/2)

def rank_loss2(pos_score, neg_score, neg_sample_size, mask):
    pos_score = pos_score.unsqueeze(1)
    neg_score = neg_score.reshape(-1, neg_sample_size)
    mask = mask.reshape(-1, neg_sample_size)
    return torch.sum(F.logsigmoid(pos_score - neg_score) * mask) * (-1.0)

class GNNRec(nn.Module):
    def __init__(self, gconv_model):
        super(GNNRec, self).__init__()
        self.encode = EncodeLayer(in_feats, n_hidden)
        self.gconv_model = gconv_model

    def forward(self, conv_g, loss_g, pos_g, neg_g, features, neg_sample_size):
        emb = self.encode(features)
        emb = self.gconv_model(conv_g, emb)
        pos_score = score_func(pos_g, emb)
        neg_score = score_func(neg_g, emb)
        verify_neg(neg_g)
        mask = (1- neg_g.edata['false_neg']).to(emb.device).float()
        return torch.mean(rank_loss2(pos_score, neg_score, neg_sample_size, mask))

In this tutorial, we use dot-product similarity to measure the similarity between two nodes.

In [None]:
def score_func(g, emb):
    src_nid, dst_nid = g.all_edges(order='eid')
    # Get the node Ids in the parent graph.
    src_nid = g.parent_nid[src_nid]
    dst_nid = g.parent_nid[dst_nid]
    # Read the node embeddings of the source nodes and destination nodes.
    pos_heads = emb[src_nid]
    pos_tails = emb[dst_nid]
    # dot-product similarity
    return torch.sum(pos_heads * pos_tails, dim=1)
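The dot product differs from cosine similarity in that it is sensitive to the norms of the embeddings. A small NumPy sketch with made-up vectors shows both scores side by side; the variable names are for illustration only.

```python
import numpy as np

# Hypothetical embeddings of source (head) and destination (tail) nodes, one pair per row.
heads = np.array([[1.0, 2.0], [3.0, 0.0]])
tails = np.array([[2.0, 4.0], [0.0, 5.0]])

# Dot-product score per pair, computed the same way as in score_func.
dot = np.sum(heads * tails, axis=1)
# Cosine similarity divides the dot product by the two vector norms.
cosine = dot / (np.linalg.norm(heads, axis=1) * np.linalg.norm(tails, axis=1))
print(dot, cosine)
```

The first pair points in the same direction, so its cosine is 1 while its dot product grows with the vector lengths; the second pair is orthogonal and scores 0 under both.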

We evaluate the performance of the trained item embeddings on the item-based recommendation task. We use the last item that a user purchased to represent the user, compute the similarity between that item and all other items, and select the most similar items. We then compute the rank of the item that is purchased next among these items.

In [None]:
def dot_score(data):
    return np.dot(data, data.transpose())

def knnEvaluate(data, query_eval, item_eval, score_fn):
    scores = score_fn(data) - np.diag(np.ones(data.shape[0]))
    query_scores = scores[query_eval]
    truth_scores = np.expand_dims(scores[query_eval,item_eval], 1)
    return np.mean(np.sum(query_scores >= truth_scores, 1) < 3)

def RecEvaluate(model, g, features, query_eval, item_eval):
    model.eval()
    with torch.no_grad():
        emb = model.encode(features)
        emb = model.gconv_model(g, emb)
        return knnEvaluate(emb.cpu().numpy(), query_eval, item_eval, dot_score)
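The ranking metric can be traced by hand on a tiny example. The sketch below is a variant written for illustration, not a copy of `knnEvaluate`: it excludes the query item by setting the diagonal to negative infinity and counts a hit when the rank of the true item is at most `k`. The helper name `hits_at_k` and the toy embeddings are hypothetical.

```python
import numpy as np

def hits_at_k(emb, query, truth, k):
    """Fraction of queries whose true next item ranks within the top k."""
    scores = emb @ emb.T
    np.fill_diagonal(scores, -np.inf)  # never recommend the query item itself
    query_scores = scores[query]
    truth_scores = scores[query, truth][:, None]
    # Rank of the true item = number of items scoring at least as high.
    rank = np.sum(query_scores >= truth_scores, axis=1)
    return np.mean(rank <= k)

emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [0.5, 0.5]])
query = np.array([0, 0, 2])
truth = np.array([1, 3, 0])

for k in (1, 2, 3):
    print(k, hits_at_k(emb, query, truth, k))
```

For query item 0, item 1 scores highest (rank 1) and item 3 comes second (rank 2); for query item 2, item 0 scores lowest among the other items (rank 3).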

Now we put everything in the training loop.

In [None]:
if torch.cuda.is_available():
    device = torch.device('cuda:0')
else:
    device = torch.device('cpu')

#Model hyperparameters
n_hidden = 256
n_layers = 2
dropout = 0.6
aggregator_type = 'sum'

# create GraphSAGE model
gconv_model = GraphSAGEModel(n_hidden,
                             n_hidden,
                             n_hidden,
                             n_layers,
                             F.leaky_relu,
                             dropout,
                             aggregator_type)
    
# Model for link prediction
model = GNNRec(gconv_model).to(device)
conv_g.to(device)
loss_g.to(device)
features = torch.tensor(features, dtype=torch.float32)
features = features.to(device)

# Training hyperparameters
weight_decay = 1e-3
n_epochs = 100
lr = 1e-3
neg_sample_size = 10

# use optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

# initialize graph
dur = []
prev_acc = 0
for epoch in range(n_epochs):
    losses = []
    model.train()
    for pos_subg, neg_subg in dgl.contrib.sampling.EdgeSampler(loss_g, batch_size=1024,
                                               seed_edges=None,
                                               neg_sample_size=neg_sample_size,
                                               negative_mode='tail',
                                               shuffle=True,
                                               return_false_neg=True):
        loss = model(conv_g, loss_g, pos_subg, neg_subg, features, neg_sample_size)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.detach().item())
    acc = RecEvaluate(model, conv_g, features, test_query, test_truth)
    print('Epoch:{}, loss:{:.4f}, HITS@3:{:.4f}'.format(epoch, np.mean(losses), acc))

print()
# Evaluate the final model on the test set.
RecEvaluate(model, conv_g, features, test_query, test_truth)

Here is a baseline item-based recommendation model. We can treat the set of sessions in which an item has been viewed or purchased as the item's feature vector and recommend the most similar items. Here we use cosine similarity.

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
hits_3 = knnEvaluate(user_item_spm.transpose().tocsr(), test_query, test_truth, cosine_similarity)
print(hits_3)

In the previous method, each item is represented by a very high-dimensional vector. We can first reduce the dimension with SVD and then run KNN on the low-dimensional embeddings.

In [None]:
def dot_score(data):
    return np.dot(data, data.transpose())

for d in (1, 10, 20, 40, 80, 160, 320, 640, 1000, 10000):
    u, s, vt = spsp.linalg.svds(user_item_spm, k=d)
    v = vt.transpose() * np.sqrt(s).transpose()
    hits_3 = knnEvaluate(v, test_query, test_truth, dot_score)
    print('d={}, hits3={}'.format(d, hits_3))

In [None]:
# Baseline: score items by the dot product of the raw input features (no GNN).
# `dot_score` is defined in an earlier cell.
hits_3 = knnEvaluate(features.cpu().numpy(), test_query, test_truth, dot_score)
print(hits_3)