# Problem setting

In this tutorial, we demonstrate how graph neural networks can be used for recommendation. We focus on an item-based recommendation model: the method recommends items that are similar to the ones a user has already purchased. We demonstrate the recommendation model on the Bio-Techne dataset.

# Get started

DGL can be used with different deep learning frameworks. Currently, DGL supports PyTorch and MXNet. Here, we show how DGL works with PyTorch.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

When we load DGL, we need to set the DGL backend to one of the deep learning frameworks. Because this tutorial develops models in PyTorch, we set the DGL backend to PyTorch.

In [2]:
import dgl
from dgl import DGLGraph

# Load Pytorch as backend
dgl.load_backend('pytorch')

Load the rest of the necessary libraries.

In [3]:
import numpy as np
import pandas as pd
from scipy import stats
from scipy import sparse as spsp

## Bio-Techne data

In [4]:
import pickle
item_data = pickle.load(open('BioTechne/item_data.pkl', 'rb'))
ratings = pickle.load(open('BioTechne/ratings.pkl', 'rb'))
item_id_map = pickle.load(open('BioTechne/item_id_map.pkl', 'rb'))
print(len(ratings))

627444


### Trim data (optional)

The original dataset has many long sessions, and these sessions contain orders from different people. The original dataset is therefore very noisy, which makes it hard for the FISM model to make the right recommendations.

One way to reduce the noise is to trim the dataset and keep only the orders placed in the last hour of each session, which may make the sessions less noisy.

In [None]:
import datetime

ratings = ratings.sort_values(['user_idx', 'timestamp'])
def trim_on_time(df):
    oldest_time = np.max(df['timestamp']) - datetime.timedelta(seconds=3600)
    return df[df['timestamp'] > oldest_time]
ratings = ratings.groupby('user_idx').apply(trim_on_time)
ratings = ratings.drop_duplicates(['user_idx', 'item_idx'])
ratings['sess_id'] = ratings['user_idx']

### Split long sessions (optional)

Another option is to split the long sessions into short ones. The splitting criterion is: if two consecutive orders are placed within 10 minutes of each other, we consider them part of the same session.

In [5]:
import datetime

ratings = ratings.sort_values(['user_idx', 'timestamp'])
def split_on_time(df):
    time1 = np.array(df['timestamp'][1:], dtype=np.int64)
    time2 = np.array(df['timestamp'][:-1], dtype=np.int64)
    assignment = np.cumsum((time1 - time2)/1000000000 > 600)
    assignment = np.concatenate([np.array([0], dtype=np.int64), assignment])
    assert len(df) == len(assignment)
    df['assign'] = assignment
    return df
# Split the orders within a session into subsessions based on the timestamp.
# If two consecutive orders are placed within 10 minutes (600 seconds), they belong to the same subsession.
ratings = ratings.groupby('user_idx').apply(split_on_time)
# drop the duplicated items within a subsession.
ratings = ratings.drop_duplicates(['user_idx', 'assign', 'item_idx'])
# Remove the subsessions that have fewer than 3 orders (one each for training, validation and testing).
ratings = ratings.groupby(['user_idx', 'assign']).filter(lambda df: len(df) > 2)
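
The gap-based splitting rule can be illustrated on a toy series. This is a minimal sketch, assuming timestamps are already expressed in seconds; the values are made up and are not taken from the dataset.

```python
import numpy as np

# Hypothetical order timestamps for a single user, in seconds.
timestamps = np.array([0, 120, 500, 2000, 2300, 9000])

# Time elapsed between consecutive orders.
gaps = np.diff(timestamps)

# A gap longer than 600 seconds starts a new subsession; the cumulative
# sum of these "breaks" gives each order its subsession index.
assignment = np.concatenate([[0], np.cumsum(gaps > 600)])
print(assignment)
```

The first order always gets index 0, which is why a zero is prepended, just as in `split_on_time`.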

In [6]:
print('#original users:', np.max(ratings['user_idx']) + 1)
# a combination of 'user_idx' and 'assign' defines a new user
num_users = len(ratings.groupby(['user_idx', 'assign']))
print('#new users:', num_users)
user_idx = ratings['user_idx']
assign = ratings['assign']
num_groups = np.max(assign) + 1
# The session id identifies a unique subsession.
ratings['sess_id'] = user_idx * num_groups + assign
assert num_users == len(np.unique(ratings['sess_id']))

#original users: 14624
#new users: 31132


In [7]:
uniq_sess = np.unique(ratings['sess_id'])
sess_map = {sess_id: i for i, sess_id in enumerate(uniq_sess)}
ratings['sess_id'] = ratings['sess_id'].map(sess_map)
print(np.max(ratings['sess_id']))

31131
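
The two steps above (building a composite session key, then remapping it to contiguous ids) can be sketched with NumPy alone. The arrays here are illustrative, and `np.unique(..., return_inverse=True)` is used as a compact alternative to the dictionary-based mapping in the notebook.

```python
import numpy as np

# Illustrative (user, subsession) pairs; not taken from the dataset.
user_idx = np.array([0, 0, 0, 2, 2, 5])
assign = np.array([0, 0, 1, 0, 3, 1])

# Multiplying by the number of subsession slots makes the key unique
# for every (user, subsession) pair.
num_groups = assign.max() + 1
sess_key = user_idx * num_groups + assign

# Remap the sparse keys to contiguous ids 0..num_sessions-1.
uniq_keys, sess_id = np.unique(sess_key, return_inverse=True)
print(sess_key, sess_id)
```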


### Prepare data to run models

Here we split orders and place them in three sets: training, validation and testing.

In [8]:
user_idx = ratings['sess_id']
user_item_spm = spsp.coo_matrix((np.ones(len(ratings)), (user_idx, ratings['item_idx'])))
features = np.concatenate([item_data['title'], item_data['item_types']], 1)

user_span = np.zeros(user_item_spm.shape[0])
for i in range(user_item_spm.shape[0]):
    user_rating = ratings[user_idx == i]
    if len(user_rating) == 0:
        continue
    timestamp = np.array(user_rating['timestamp'])
    start = np.min(timestamp)
    end = np.max(timestamp)
    span = int(end - start)/1000000000
    user_span[i] = span

We can potentially use the embeddings computed from the knowledge graph. For now, let's not use the knowledge graph embeddings.

In [None]:
'''
kge_feats = pickle.load(open('BioTechne/bio-techne_entity_embed_features.pkl', 'rb'))
item_kge_map = pd.read_csv('BioTechne/_matched_item2feaidx', sep='\t', header=None)
item_kge_map = {item_name: kge_idx for kge_idx, item_name in zip(item_kge_map[0], item_kge_map[1])}
print(len(item_kge_map))

orig_idx2_kge_idx = {}
for item_id, item_idx in zip(ratings['item_id'], ratings['item_idx']):
    if item_id in item_kge_map:
        orig_idx2_kge_idx[item_idx] = item_kge_map[item_id]

exist_mask = np.zeros((len(ratings['item_id'])))
for i, item_idx in enumerate(ratings['item_idx']):
    if item_idx in orig_idx2_kge_idx:
        exist_mask[i] = 1
exist_mask = (exist_mask == 1)
ratings = ratings[exist_mask]

item_feats = np.zeros((len(orig_idx2_kge_idx), features.shape[1]), dtype=np.float32)
for orig_idx, kge_idx in orig_idx2_kge_idx.items():
    item_feats[kge_idx] = features[orig_idx]

#features = kge_feats
print(kge_feats.dtype)
print(item_feats.dtype)
features = item_feats
print(features.shape)

user_item_spm = spsp.coo_matrix((np.ones(len(ratings)), (ratings['user_idx'], ratings['item_idx'])))
user_idx = user_item_spm.row
item_idx = user_item_spm.col
for i, orig_idx in enumerate(item_idx):
    if orig_idx in orig_idx2_kge_idx:
        item_idx[i] = orig_idx2_kge_idx[orig_idx]
    else:
        item_idx[i] = -1
user_idx = user_idx[item_idx >= 0]
item_idx = item_idx[item_idx >= 0]
user_item_spm = spsp.coo_matrix((np.ones(len(user_idx)), (user_idx, item_idx))).tocsr()
print(user_item_spm.shape)
'''

In [9]:
print(user_item_spm.shape)
num_items = user_item_spm.shape[1]
user_deg = user_item_spm.dot(np.ones((num_items)))
print(np.sum(user_deg == 1))
print(np.sum(user_deg == 2))
user_item_spm = user_item_spm.tocsr()
user_item_spm = user_item_spm[np.nonzero(user_deg > 2)]
user_span = user_span[np.nonzero(user_deg > 2)]
print(user_item_spm.shape)
num_users = user_item_spm.shape[0]
num_items = user_item_spm.shape[1]

(31132, 3256)
0
0
(31132, 3256)


Split the dataset into training, validation and testing. For each user/session, we randomly pick an item from this user as the validation item. Similarly, we randomly pick an item as the test item. We use the remaining items as the training set for the user.

In [10]:
def pick_test(user_item_spm):
    user_item_spm = user_item_spm.tocoo()
    users = user_item_spm.row
    items = user_item_spm.col
    picks = np.zeros(shape=(len(users)))
    user_item_spm = user_item_spm.tocsr()
    indptr = user_item_spm.indptr
    valid_set = np.zeros(shape=(num_users))
    test_set = np.zeros(shape=(num_users))
    for i in range(user_item_spm.shape[0]):
        start_idx = indptr[i]
        end_idx = indptr[i+1]
        idx = np.random.choice(np.arange(start_idx, end_idx), 2, replace=False)
        valid_set[i] = items[idx[0]]
        picks[idx[0]] = 1
        test_set[i] = items[idx[1]]
        picks[idx[1]] = 1
    users = users[picks == 0]
    items = items[picks == 0]
    return spsp.coo_matrix((np.ones((len(users),)), (users, items))), valid_set, test_set

np.random.seed(0)
orig_user_item_spm = user_item_spm.tocsr()
user_item_spm, valid_set, test_set = pick_test(user_item_spm)
print('#training size:', user_item_spm.nnz)
users_valid = np.arange(num_users)
items_valid = valid_set
users_test = np.arange(num_users)
items_test = test_set
valid_size = len(users_valid)
test_size = len(users_test)
num_users = user_item_spm.shape[0]
num_items = user_item_spm.shape[1]
print('valid set:', valid_size)
print('test set:', test_size)

#training size: 117267
valid set: 31132
test set: 31132
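
The leave-two-out split can be sketched on a toy example. This is a simplified illustration using plain Python containers; `user_items` and its contents are hypothetical, and the sparse-matrix bookkeeping of `pick_test` is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sessions: each has at least 3 items, so at least one
# item remains for training after holding out two.
user_items = {0: [3, 7, 9], 1: [1, 2, 4, 8]}

train, valid, test = {}, {}, {}
for user, items in user_items.items():
    # Pick two distinct positions: one for validation, one for testing.
    held = rng.choice(len(items), size=2, replace=False).tolist()
    valid[user] = items[held[0]]
    test[user] = items[held[1]]
    train[user] = [it for pos, it in enumerate(items) if pos not in held]
```

This is why subsessions with fewer than 3 orders are filtered out earlier: with only two items, nothing would be left for training.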


We compute a truncated SVD of the training user-item matrix to generate additional item embeddings. Here we compute 100 singular vectors, which give each item a 100-dimensional embedding, and concatenate them with the items' original features (titles and item types).

In [11]:
u, s, vt = spsp.linalg.svds(user_item_spm, k=100)
v = vt.transpose() * np.sqrt(s).transpose()
features = np.concatenate((item_data['title'], item_data['item_types'], v), 1).astype(np.float32)
in_feats = features.shape[1]
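
To see why the right singular vectors are scaled by `np.sqrt(s)`, here is a small sketch. It uses the dense `np.linalg.svd` on a made-up matrix rather than `svds`, purely for illustration: splitting the singular values evenly between the user and item factors means their product gives the rank-`k` approximation of the interaction matrix.

```python
import numpy as np

# Toy user-item matrix (3 users, 4 items); values are illustrative.
A = np.array([[1., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 1., 1.]])

u, s, vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of singular vectors to keep
u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k]

# Split the singular values evenly between the two factors.
item_emb = vt_k.T * np.sqrt(s_k)   # shape (num_items, k)
user_emb = u_k * np.sqrt(s_k)      # shape (num_users, k)

# Their product is the best rank-k approximation of A.
approx = user_emb @ item_emb.T
```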

In [12]:
user_deg = user_item_spm.dot(np.ones((num_items)))
print(user_item_spm.shape)
print(np.sum(user_deg < 3))

(31132, 3256)
18086


Save the data to a file so that we can make a fair comparison with other recommendation models.

In [None]:
orig_user_item_spm = orig_user_item_spm.tocoo()
orig_users = orig_user_item_spm.row
orig_items = orig_user_item_spm.col
valid_set = set()
test_set = set()
for user, item in zip(users_valid, items_valid):
    valid_set.add((user, item))
for user, item in zip(users_test, items_test):
    test_set.add((user, item))
valid_mask = np.zeros((len(orig_users)), dtype=np.int64)
test_mask = np.zeros((len(orig_users)), dtype=np.int64)
for i in range(len(valid_mask)):
    user = orig_users[i]
    item = orig_items[i]
    if (user, item) in valid_set:
        valid_mask[i] = 1
    elif (user, item) in test_set:
        test_mask[i] = 1
assert np.sum(valid_mask) == num_users
assert np.sum(test_mask) == num_users
full_data = np.concatenate([np.expand_dims(orig_users, 1),
                            np.expand_dims(orig_items, 1),
                            np.expand_dims(valid_mask, 1),
                            np.expand_dims(test_mask, 1)], 1)
np.savetxt('bio-techne-split-sess.csv', full_data, fmt='%d', delimiter=',')

# verify
test_spm = user_item_spm.tocsr()
for i in range(len(full_data)):
    user = full_data[i, 0]
    item = full_data[i, 1]
    if full_data[i, 2] == 1:
        assert item == items_valid[user]
    elif full_data[i, 3] == 1:
        assert item == items_test[user]
    else:
        assert test_spm[user, item] != 0


# The recommendation model

At a high level, the model first learns item embeddings from the user-item interaction dataset and then uses the item embeddings to recommend items similar to those a user has purchased. To learn item embeddings, we first construct an item similarity graph and then train a GNN on it.

There are many ways of constructing the item similarity graph. Here we use the [SLIM model](https://dl.acm.org/citation.cfm?id=2118303) to learn item similarity and use the learned result to construct the item graph. The resulting graph will have an edge between two items if they are similar and the edge has a weight that represents the similarity score.

After the item similarity graph is constructed, we run a GNN model on it and use the vertex connectivity as the training signal to train the GNN model. The GNN training procedure is very similar to the link prediction task in [the previous section](https://github.com/zheng-da/DGL_devday_tutorial/blob/master/BasicTasks_pytorch.ipynb).

## Construct the item graph with SLIM
SLIM is an item-based recommendation model. When training SLIM on a user-item dataset, it learns an item similarity graph. This similarity graph is the item graph we construct for the GNN model.

Please follow the instructions in the [SLIM GitHub repo](https://github.com/KarypisLab/SLIM) to install SLIM.

To use SLIM to generate an item similarity graph, there are two hyperparameters we can tune. `l1r` is the coefficient for the L1 regularization and `l2r` is the coefficient for the L2 regularization. Increasing `l1r` generates a sparser similarity graph and increasing `l2r` leads to a denser similarity graph.

In [None]:
from graph_construct import create_SLIM_graph
item_spm = create_SLIM_graph(user_item_spm, l1r=1, l2r=1, test=False)
use_edge_similarity = True

In [None]:
topk=30
dense_item = np.sort(item_spm.todense())
topk_item = dense_item[:,-topk]
topk_item_spm = item_spm > topk_item
topk_item_spm = spsp.csr_matrix(topk_item_spm)
item_spm = item_spm.multiply(topk_item_spm)
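
The idea of keeping only the strongest similarities per item can be sketched on a small dense matrix. This is an illustrative variant that keeps exactly the `k` largest entries in each row via `np.argsort`; it is not the same code path as the cell above, and the similarity values are made up.

```python
import numpy as np

# Toy item-item similarity matrix; values are illustrative.
sim = np.array([[0.0, 0.9, 0.1, 0.4],
                [0.9, 0.0, 0.3, 0.2],
                [0.1, 0.3, 0.0, 0.8],
                [0.4, 0.2, 0.8, 0.0]])
k = 2

# Column indices of the k largest similarities in every row.
top_idx = np.argsort(sim, axis=1)[:, -k:]

# Build a boolean mask that is True only at those positions.
mask = np.zeros_like(sim, dtype=bool)
np.put_along_axis(mask, top_idx, True, axis=1)

# Zero out everything outside the top-k of each row.
sparse_sim = np.where(mask, sim, 0.0)
print(sparse_sim)
```

Note that row-wise top-k selection generally produces an asymmetric matrix, so the resulting graph is directed.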

In [None]:
deg = item_spm.dot(np.ones((num_items)))
print(item_spm.nnz)
print(np.sum(deg == 0))
print(len(deg))
print(item_spm.sum(0))

## Construct the co-occurrence graph
Alternatively, we can simply construct a co-occurrence graph. That is, if two items are used by the same user, we draw an edge between these two items.

When using this method for graph construction, there are also two hyperparameters to tune. `downsample_factor` controls how strongly user-item pairs are downsampled based on item frequency; a larger `downsample_factor` leads to more downsampling. `topk` controls how many items each item connects to. If it is `None`, an item connects to every item it co-occurs with; otherwise, it connects only to the items it co-occurs with most frequently.

In [None]:
from graph_construct import create_cooccur_graph
item_spm = create_cooccur_graph(user_item_spm, downsample_factor=1e-5, topk=50)
use_edge_similarity = False
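
The core of co-occurrence counting can be sketched as a matrix product. This is only an illustration of the idea on a made-up matrix; it does not reproduce `create_cooccur_graph`, whose downsampling and top-k steps are not shown in this notebook.

```python
import numpy as np

# Toy binary user-item matrix (3 users, 3 items); values are illustrative.
user_item = np.array([[1, 1, 0],
                      [1, 0, 1],
                      [1, 1, 0]])

# Entry (i, j) counts the users who interacted with both item i and item j.
cooccur = user_item.T @ user_item

# An item trivially co-occurs with itself, so drop the diagonal.
np.fill_diagonal(cooccur, 0)
print(cooccur)
```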

## Construct the cosine-similarity graph
We can also use cosine similarity to build a graph. We compute the cosine similarity of the neighborhoods of every pair of items. This is quite similar to the co-occurrence graph, except that we use cosine similarity instead of the co-occurrence count to measure the similarity of two items.

In this case, there is one hyperparameter, `topk`. If it is specified, an item connects to the top K most similar items in terms of the cosine similarity of their neighborhoods.

In [13]:
from graph_construct import create_cosine_graph
item_spm = create_cosine_graph(user_item_spm, topk=10)
use_edge_similarity = True
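
As a rough sketch of what "cosine similarity of the neighborhoods" means, each item can be represented by its column in the user-item matrix. The example below uses a made-up dense matrix and is not the implementation of `create_cosine_graph`.

```python
import numpy as np

# Toy binary user-item matrix (3 users, 3 items); values are illustrative.
user_item = np.array([[1., 1., 0.],
                      [1., 0., 1.],
                      [1., 1., 0.]])

# Each item's neighborhood is its column; normalize by the column norms.
norms = np.linalg.norm(user_item, axis=0)
cosine = (user_item.T @ user_item) / np.outer(norms, norms)

# Ignore self-similarity.
np.fill_diagonal(cosine, 0.0)
print(cosine)
```

Compared with raw co-occurrence counts, the normalization reduces the advantage of very popular items.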

Once the graph is constructed, we load it into a DGL graph.

In [14]:
g = dgl.DGLGraph(item_spm, readonly=True)
g.edata['similarity'] = torch.tensor(item_spm.data, dtype=torch.float32)
g.ndata['feats'] = torch.tensor(features)
#g.ndata['id'] = torch.arange(num_items, dtype=torch.int64)
print('#nodes:', g.number_of_nodes())
print('#edges:', g.number_of_edges())

#nodes: 3256
#edges: 27759


## GNN models

We run GNN on the item graph to compute item embeddings. In this tutorial, we use a customized [GraphSage](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) model to compute node embeddings. The original GraphSage performs the following computation on every node $v$ in the graph:

$$h_{N(v)}^{(l)} \gets \mathrm{AGGREGATE}_l(\{h_u^{(l-1)}, \forall u \in N(v)\})$$
$$h_v^{(l)} \gets \sigma(W^{(l)} \cdot \mathrm{CONCAT}(h_v^{(l-1)}, h_{N(v)}^{(l)})),$$

where $N(v)$ is the neighborhood of node $v$ and $l$ is the layer index.

The original GraphSage model treats each neighbor equally. However, the SLIM model learns the item similarity from the user-item interactions, and the GNN model should take this similarity into account. We therefore customize the GraphSage model as follows: instead of aggregating all neighbors equally, we aggregate the neighbor embeddings rescaled by the similarity on the edges. The aggregation step becomes:

$$h_{N(v)}^{(l)} \gets \sum_{u \in N(v)} h_u^{(l-1)} \cdot s_{uv},$$

where $s_{uv}$ is the similarity score between two vertices $u$ and $v$.
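
The similarity-weighted aggregation can be written as a single matrix product. The sketch below uses NumPy with made-up embeddings and a random weight matrix `W`; it illustrates the formula only and is not the `SAGEConv` implementation used later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy node embeddings from the previous layer (3 nodes, 2 dims).
h = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])

# sim[v, u] holds s_uv for an edge u -> v; zero means no edge.
sim = np.array([[0.0, 0.5, 0.2],
                [0.5, 0.0, 0.0],
                [0.2, 0.0, 0.0]])

# Weighted sum over neighbors: h_N(v) = sum_u s_uv * h_u.
h_neigh = sim @ h

# Concatenate with the node's own embedding, project, and apply ReLU.
W = rng.normal(size=(4, 2))  # hypothetical layer weights
h_new = np.maximum(np.concatenate([h, h_neigh], axis=1) @ W, 0.0)
```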

The GNN model has multiple layers. In each layer, a vertex accesses its direct neighbors. When we stack $k$ layers in a model, a node $v$ accesses neighbors within $k$ hops. The output of the GNN model is a set of node embeddings, each of which represents a node together with the information in its $k$-hop neighborhood.

<img src="https://github.com/zheng-da/DGL_devday_tutorial/raw/master/GNN.png" alt="drawing" width="600"/>

We implement the computation in each layer of the customized GraphSage model in `SAGEConv` and implement the multi-layer model in `GraphSAGEModel`.

In [15]:
if use_edge_similarity:
    from sageconv import SAGEConv
else:
    from dgl.nn.pytorch.conv import SAGEConv

class GraphSAGEModel(nn.Module):
    def __init__(self,
                 in_feats,
                 n_hidden,
                 out_dim,
                 n_layers,
                 activation,
                 dropout,
                 aggregator_type):
        super(GraphSAGEModel, self).__init__()
        self.norm = nn.LayerNorm((out_dim,))
        self.layers = nn.ModuleList()
        if n_layers == 1:
            self.layers.append(SAGEConv(in_feats, out_dim, aggregator_type,
                                        feat_drop=dropout, activation=None))
        elif n_layers > 1:
            # input layer
            self.layers.append(SAGEConv(in_feats, n_hidden, aggregator_type,
                                        feat_drop=dropout, activation=activation))
            # hidden layer
            for i in range(n_layers - 2):
                self.layers.append(SAGEConv(n_hidden, n_hidden, aggregator_type,
                                            feat_drop=dropout, activation=activation))
            # output layer
            self.layers.append(SAGEConv(n_hidden, out_dim, aggregator_type,
                                        feat_drop=dropout, activation=None))

    def forward(self, g, features):
        h = features
        for layer in self.layers:
            if use_edge_similarity:
                h = layer(g, h, g.edata['similarity'])
            else:
                h = layer(g, h)
            #h = tmp + prev_h
            #prev_h = h
        h = self.norm(h)
        return h

## Train Item Embeddings

We train the item embeddings with the edges in the item graph as the training signal. This step is very similar to the link prediction task in the [basic applications](https://github.com/zheng-da/DGL_devday_tutorial/blob/master/BasicTasks_pytorch.ipynb).

The item features in this dataset are sparse: both the title and the item types are stored as multi-hot encodings, so they have many dimensions. To run a GNN on the item features, we first create an encoding layer that projects the sparse features to a lower dimension.

In [16]:
def mix_embeddings(h, ndata, emb, proj):
    '''Combine node-specific trainable embedding ``h`` with categorical inputs
    (projected by ``emb``) and numeric inputs (projected by ``proj``).
    '''
    e = []
    for key, value in ndata.items():
        if value.dtype == torch.int64:
            e.append(emb[key](value))
        elif value.dtype == torch.float32:
            e.append(proj[key](value))
    if len(e) == 0:
        return h
    else:
        return h + torch.stack(e, 0).sum(0)
    
class EncodeLayer(nn.Module):
    def __init__(self, ndata, num_hidden, device):
        super(EncodeLayer, self).__init__()
        self.proj = nn.ModuleDict()
        self.emb = nn.ModuleDict()
        for key in ndata.keys():
            vals = ndata[key]
            if vals.dtype == torch.float32:
                self.proj[key] = nn.Linear(ndata[key].shape[1], num_hidden)
                #self.proj[key] = nn.Sequential(
                #                    nn.Linear(ndata[key].shape[1], num_hidden),
                #                    nn.LeakyReLU(),
                #                    )
            elif vals.dtype == torch.int64:
                self.emb[key] = nn.Embedding(
                            vals.max().item() + 1,
                            num_hidden,
                            padding_idx=0)
                
    def forward(self, ndata):
        return mix_embeddings(0, ndata, self.emb, self.proj)

In [17]:
class FISMrating(nn.Module):
    r"""
    PinSAGE + FISM for item-based recommender systems
    The formulation of FISM goes as
    .. math::
       r_{ui} = b_u + b_i + \left(n_u^+\right)^{-\alpha}
       \sum_{j \in R_u^+} p_j q_i^\top
    In FISM, both :math:`p_j` and :math:`q_i` are trainable parameters.  Here
    we replace them as outputs from two PinSAGE models ``P`` and
    ``Q``.
    """
    def __init__(self, P, Q, num_users, num_movies, alpha=0):
        super().__init__()

        self.P = P
        self.Q = Q
        self.b_u = nn.Parameter(torch.zeros(num_users))
        self.b_i = nn.Parameter(torch.zeros(num_movies))
        self.alpha = alpha

    
    def forward(self, I, U, I_neg, I_U, N_U, test):
        '''
        I: 1D LongTensor
        U: 1D LongTensor
        I_neg: 2D LongTensor (batch_size, n_negs)
        '''
        batch_size = I.shape[0]
        device = I.device
        I_U = I_U.to(device)
        # number of interacted items
        N_U = N_U.to(device)
        U_idx = torch.arange(U.shape[0], device=device).repeat_interleave(N_U)

        q = self.Q(I)
        p = self.P(I_U)
        # If this is training, we need to subtract the embedding of the self node from the context embedding
        if not test:
            p_self = self.P(I)
        p_sum = torch.zeros_like(q)
        p_sum = p_sum.scatter_add(0, U_idx[:, None].expand_as(p), p)    # batch_size, n_dims
        if test:
            p_ctx = p_sum
            pq = (p_ctx * q).sum(1) / (N_U.float() ** self.alpha)
        else:
            p_ctx = p_sum - p_self
            pq = (p_ctx * q).sum(1) / ((N_U.float() - 1).clamp(min=1) ** self.alpha)
        r = self.b_u[U] + self.b_i[I] + pq

        if I_neg is not None:
            n_negs = I_neg.shape[1]
            I_neg_flat = I_neg.view(-1)
            q_neg = self.Q(I_neg_flat)
            q_neg = q_neg.view(batch_size, n_negs, -1)  # batch_size, n_negs, n_dims
            if test:
                pq_neg = (p_ctx.unsqueeze(1) * q_neg).sum(2) / (N_U.float().unsqueeze(1) ** self.alpha)
            else:
                pq_neg = (p_ctx.unsqueeze(1) * q_neg).sum(2) / ((N_U.float() - 1).clamp(min=1).unsqueeze(1) ** self.alpha)
            r_neg = self.b_u[U].unsqueeze(1) + self.b_i[I_neg] + pq_neg
            return r, r_neg
        else:
            return r
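
The FISM score from the docstring can be checked with a small NumPy sketch. The embeddings `P` and `Q` below are random placeholders standing in for the outputs of the two GNNs, and `fism_score` is a hypothetical helper written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, dim = 5, 3

# Random placeholders for the context (P) and target (Q) item embeddings.
P = rng.normal(size=(num_items, dim))
Q = rng.normal(size=(num_items, dim))
b_u, b_i = 0.0, np.zeros(num_items)
alpha = 1.0

def fism_score(history, target, training):
    # During training the target item is part of the user's history,
    # so it is removed from the context before summing.
    ctx = np.array([j for j in history if not (training and j == target)],
                   dtype=np.int64)
    n = max(len(ctx), 1)
    p_ctx = P[ctx].sum(axis=0)
    return b_u + b_i[target] + (n ** -alpha) * (p_ctx @ Q[target])

# Training: item 2 is in the history and is excluded from the context.
train_score = fism_score([0, 1, 2], target=2, training=True)
# Evaluation: the held-out item is not in the history to begin with.
eval_score = fism_score([0, 1], target=2, training=False)
```

Both calls end up with the same context, which mirrors how `FISMrating.forward` subtracts `p_self` and uses `N_U - 1` only when `test` is false.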

We use the FISM model to train.

In [18]:
beta = 0
gamma = 0

def rank_loss2(pos_score, neg_score, true_neg):
    pos_score = torch.unsqueeze(pos_score, 1)
    return torch.sum(torch.mul(F.logsigmoid(pos_score - neg_score), true_neg)) * (-1.0)
    #return torch.sum(F.logsigmoid(pos_score - neg_score)) * (-1.0)


class FISM(nn.Module):
    def __init__(self, user_item_spm, gconv_p, gconv_q, g, num_hidden, device):
        super(FISM, self).__init__()
        num_users = user_item_spm.shape[0]
        num_movies = user_item_spm.shape[1]
        self.encode_p = EncodeLayer(g.ndata, num_hidden, device)
        self.encode_q = EncodeLayer(g.ndata, num_hidden, device)
        self.gconv_p = gconv_p
        self.gconv_q = gconv_q
        P = lambda I: self.gconv_p(g, self.encode_p(g.ndata))[I]
        Q = lambda I: self.gconv_q(g, self.encode_q(g.ndata))[I]
        self.fism_rating = FISMrating(P, Q, num_users, num_movies, 1)

    def est_rating(self, I, U, I_neg, I_U, N_U):
        r, r_neg = self.fism_rating(I, U, I_neg, I_U, N_U, True)
        neg_sample_size = int(len(r_neg) / len(r))
        return torch.unsqueeze(r, 1), r_neg.reshape((-1, neg_sample_size))

    def loss(self, r_ui, neg_r_ui, true_neg):
        return rank_loss2(r_ui, neg_r_ui, true_neg)
        #diff = 1 - (r_ui - neg_r_ui)
        #return torch.sum(torch.mul(diff, diff)/2)# \
        #    + beta/2 * torch.sum(torch.mul(P, P) + torch.mul(Q, Q)) \
        #    + gamma/2 * (torch.sum(torch.mul(self.fism_rating.b_u, self.fism_rating.b_u)) \
        #                 + torch.sum(torch.mul(self.fism_rating.b_i, self.fism_rating.b_i)))

    def forward(self, I, U, I_neg, true_neg, I_U, N_U):
        r, r_neg = self.fism_rating(I, U, I_neg, I_U, N_U, False)
        #neg_sample_size = int(len(r_neg) / len(r))
        #r_neg = r_neg.reshape((-1, neg_sample_size))
        return self.loss(r, r_neg, true_neg)
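
The masked ranking loss in `rank_loss2` can be reproduced in NumPy on a tiny example. The scores and mask below are made up; the point is that sampled "negatives" the user has actually interacted with contribute nothing to the loss.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def rank_loss(pos_score, neg_score, true_neg):
    # Compare each positive score against its sampled negatives and
    # keep only the pairs whose negative is a true negative.
    diff = pos_score[:, None] - neg_score
    return -np.sum(log_sigmoid(diff) * true_neg)

pos = np.array([2.0, 0.0])
neg = np.array([[0.0, 3.0],
                [0.0, 1.0]])
# The last sampled item is not a true negative, so it is masked out.
mask = np.array([[1.0, 1.0],
                 [1.0, 0.0]])

loss = rank_loss(pos, neg, mask)
```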

In [19]:
class EdgeSampler:
    def __init__(self, user_item_spm, batch_size, neg_sample_size):
        edge_ids = np.random.permutation(user_item_spm.nnz)
        self.batches = np.split(edge_ids, np.arange(batch_size, len(edge_ids), batch_size))
        self.idx = 0
        user_item_spm = user_item_spm.tocoo()
        self.users = user_item_spm.row
        self.movies = user_item_spm.col
        self.user_item_spm = user_item_spm.tocsr()
        self.num_movies = user_item_spm.shape[1]
        self.num_users = user_item_spm.shape[0]
        self.neg_sample_size = neg_sample_size
        
    def __next__(self):
        if self.idx == len(self.batches):
            raise StopIteration
        batch = self.batches[self.idx]
        self.idx += 1
        I = self.movies[batch]
        U = self.users[batch]
        neighbors = self.user_item_spm[U]
        I_neg = np.random.choice(num_items, self.neg_sample_size * len(batch)).reshape(-1, self.neg_sample_size)
        true_neg = np.zeros_like(I_neg)
        for i in range(self.neg_sample_size):
            true_neg[:,i] = self.user_item_spm[U, I_neg[:,i]] == 0
        I = torch.LongTensor(I).to(device)
        U = torch.LongTensor(U).to(device)
        I_neg = torch.LongTensor(I_neg).to(device)
        I_U = torch.LongTensor(neighbors.indices).to(device)
        N_U = torch.LongTensor(neighbors.indptr[1:] - neighbors.indptr[:-1]).to(device)
        true_neg = torch.FloatTensor(true_neg).to(device)
        return I, U, I_neg, true_neg, I_U, N_U
    
    def __iter__(self):
        return self
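
The negative sampling step in `EdgeSampler` draws items uniformly at random and then records which of them are true negatives. A minimal dense sketch of that idea, with a made-up interaction matrix, looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy boolean interaction matrix (2 users, 4 items); values are illustrative.
interactions = np.array([[True, True, False, False],
                         [False, False, True, True]])

# Users of the positive edges in the current batch.
users = np.array([0, 1, 1])
n_negs = 5

# Sample candidate negatives uniformly over all items.
neg_items = rng.integers(0, interactions.shape[1], size=(len(users), n_negs))

# A sampled item is a true negative only if the user never interacted with it.
true_neg = ~interactions[users[:, None], neg_items]
```

Uniform sampling is cheap but can pick items the user has interacted with; the `true_neg` mask lets the loss ignore those pairs instead of resampling.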

We evaluate the performance of the trained item embeddings in the item-based recommendation task. We use the last item that a user purchased to represent the user and compute the similarity between the last item and a list of items (an item the user will purchase and a set of randomly sampled items). We calculate the ranking of the item that will be purchased among the list of items.

In [20]:
def RecEval(model, user_item_spm, k, users_eval, items_eval, neg_eval):
    model.eval()
    with torch.no_grad():
        neg_items_eval = neg_eval[users_eval]
        neighbors = user_item_spm.tocsr()[users_eval]
        I_U = torch.LongTensor(neighbors.indices)
        N_U = torch.LongTensor(neighbors.indptr[1:] - neighbors.indptr[:-1])
        r, neg_r = model.est_rating(torch.LongTensor(items_eval).to(device),
                                    torch.LongTensor(users_eval).to(device),
                                    torch.LongTensor(neg_items_eval).to(device),
                                    I_U.to(device),
                                    N_U.to(device))
        neg_sample_size = int(len(neg_r) / len(r))
        neg_r = neg_r.reshape((-1, neg_sample_size))
        hits = (torch.sum(neg_r >= r, 1) <= k).cpu().numpy()
        return np.mean(hits)
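
The HITS@k metric can be sketched in a few lines. The helper `hits_at_k` below is a hypothetical name and uses a strict comparison, as `RecEvalAll` does; the scores are made up.

```python
import numpy as np

def hits_at_k(pos_scores, neg_scores, k):
    # Rank of the positive item = number of negatives scored above it.
    rank = np.sum(neg_scores > pos_scores[:, None], axis=1)
    # A hit means the positive item is among the top k.
    return np.mean(rank < k)

pos = np.array([0.9, 0.2])
neg = np.array([[0.1, 0.5, 0.3],
                [0.4, 0.6, 0.1]])

hits1 = hits_at_k(pos, neg, 1)
hits3 = hits_at_k(pos, neg, 3)
```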

In [25]:
user_item_spm = user_item_spm.tocoo()
all_users = []
all_items = []
all_users.append(user_item_spm.row)
all_items.append(user_item_spm.col)
all_users.append(users_valid)
all_items.append(items_valid)
all_users = np.concatenate(all_users).astype(np.int64)
all_items = np.concatenate(all_items).astype(np.int64)
all_spm1 = spsp.coo_matrix((np.ones((len(all_users))), (all_users, all_items)))

all_users = []
all_items = []
all_users.append(user_item_spm.row)
all_items.append(user_item_spm.col)
all_users.append(users_valid)
all_items.append(items_valid)
all_users.append(users_test)
all_items.append(items_test)
all_users = np.concatenate(all_users).astype(np.int64)
all_items = np.concatenate(all_items).astype(np.int64)
all_spm2 = spsp.coo_matrix((np.ones((len(all_users))), (all_users, all_items)))
        
def RecEvalAll(model, user_item_spm, k, eval_type):
    model.eval()
    if eval_type == 'valid':
        ctx_spm = user_item_spm.tocsr()
        pos_spm = all_spm1.tocsr()
        users_eval = users_valid
        items_eval = items_valid
    elif eval_type == 'test':
        ctx_spm = all_spm1.tocsr()
        pos_spm = all_spm2.tocsr()
        users_eval = users_test
        items_eval = items_test
    else:
        raise Exception()

    batch_size = 1024
    batches = np.split(np.arange(len(users_eval)), np.arange(batch_size, len(users_eval), batch_size))
    all_hits = np.zeros((user_item_spm.shape[0],), dtype=np.int64)
    with torch.no_grad():
        hits_list = []
        for idx in batches:
            users = users_eval[idx]
            items = items_eval[idx]
            neg_items_eval = np.tile(np.arange(num_items), len(users)).reshape(len(users), num_items)
            neigh_ctx = ctx_spm[users]
            pos_neighbors = pos_spm[users]
            assert neigh_ctx.nnz > 0
            I_U = torch.LongTensor(neigh_ctx.indices)
            N_U = torch.LongTensor(neigh_ctx.indptr[1:] - neigh_ctx.indptr[:-1])
            r, neg_r = model.est_rating(torch.LongTensor(items).to(device),
                                        torch.LongTensor(users).to(device),
                                        torch.LongTensor(neg_items_eval).to(device),
                                        I_U.to(device),
                                        N_U.to(device))
            neg_sample_size = num_items
            # Here neg_r includes the scores on the positive edges. let's make the scores
            # on the positive edges very small. This is equivalent to exclude positive edges
            # from negative edges.
            neg_r = neg_r.reshape((-1, neg_sample_size)).cpu().numpy() - pos_neighbors * 10
            hits = (np.sum(neg_r > r.cpu().numpy(), 1) < k)
            all_hits[idx] = np.squeeze(hits)
            #hits_list.append(hits)
    #ranges = [0, 60, 600, 1800, 3600, 3600*2, 3600*5, 3600*24]
    #for i in range(len(ranges) - 1):
    #    user_in_range = np.logical_and(user_span >= ranges[i], user_span < ranges[i+1])
    #    if np.sum(user_in_range) > 0:
    #        print('[{}-{})({})'.format(ranges[i], ranges[i+1], np.sum(user_in_range)),
    #              np.mean(all_hits[user_in_range]))
    return np.mean(all_hits)
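
When ranking against all items, known positives must not count as negatives. The sketch below illustrates the masking trick used in `RecEvalAll` with made-up scores: subtracting a large constant from the known positives pushes them below the target item.

```python
import numpy as np

# Scores for every item (2 users, 4 items); values are illustrative.
scores = np.array([[0.9, 0.8, 0.3, 0.7],
                   [0.2, 0.6, 0.5, 0.1]])
# Known positives per user, including the held-out target items.
known_pos = np.array([[1, 0, 0, 1],
                      [0, 1, 1, 0]])
targets = np.array([3, 2])

target_scores = scores[np.arange(len(targets)), targets]

# Push known positives far down so they never outrank the target.
masked = scores - known_pos * 10
rank = np.sum(masked > target_scores[:, None], axis=1)

# For comparison: ranks if known positives were left in.
naive_rank = np.sum(scores > target_scores[:, None], axis=1)
```

The constant only needs to exceed the spread of the scores; 10 is the value used in the cell above.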

Now we put everything in the training loop.

In [26]:
g.ndata

{'feats': tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -8.5190e-03,
          7.1877e-03, -8.0836e-04],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ..., -9.3564e-03,
          9.2015e-03, -8.4092e-04],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  4.1970e-04,
          4.8619e-03, -1.1785e-03],
        ...,
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  1.1188e-01,
          1.2709e-01, -2.5882e-03],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  4.1160e-02,
          6.2134e-02, -9.3057e-04],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  1.0177e+00,
          1.0163e+00, -1.0442e-02]], device='cuda:0')}

In [27]:
import time

if torch.cuda.is_available():
    device = torch.device('cuda:0')
else:
    device = torch.device('cpu')

#Model hyperparameters
n_hidden = 256
n_layers = 1
dropout = 0.4
aggregator_type = 'sum' if use_edge_similarity else 'gcn'

# create GraphSAGE model
gconv_p = GraphSAGEModel(n_hidden,
                         n_hidden,
                         n_hidden,
                         n_layers,
                         F.relu,
                         dropout,
                         aggregator_type)

gconv_q = GraphSAGEModel(n_hidden,
                         n_hidden,
                         n_hidden,
                         n_layers,
                         F.relu,
                         dropout,
                         aggregator_type)

model = FISM(user_item_spm, gconv_p, gconv_q, g, n_hidden, device).to(device)
g.to(device)

# Training hyperparameters
weight_decay = 1e-5
n_epochs = 200
lr = 1e-3
neg_sample_size = 20

# use optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

batch_size = 1024
print('#edges:', user_item_spm.nnz)
print('#batch/epoch:', user_item_spm.nnz/batch_size)

# training loop
dur = []
best_acc = 0
for epoch in range(n_epochs):
    model.train()
    losses = []
    start = time.time()
    negs = []
    for I, U, I_neg, true_neg, I_U, N_U in EdgeSampler(user_item_spm, batch_size, neg_sample_size):
        loss = model(I, U, I_neg, true_neg, I_U, N_U)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.detach().item())
    train_time = time.time() - start
    
    start = time.time()
    hits10_sub = 0
    hits_all = RecEvalAll(model, user_item_spm, 10, 'valid')
    eval_time = time.time() - start
    print("Epoch {:05d} | train {:.4f} | eval {:.4f} | Loss {:.4f} | HITS@10 sub:{:.4f} | HITS@10 all:{:.4f}".format(
        epoch, train_time, eval_time, np.mean(losses), hits10_sub, hits_all))
    if best_acc < hits_all:
        best_acc = hits_all
        test_hits_all = RecEvalAll(model, user_item_spm, 10, 'test')
        print('test acc:{:.4f}'.format(test_hits_all))


#edges: 117267
#batch/epoch: 114.5185546875
Epoch 00000 | train 2.3398 | eval 2.1600 | Loss 20555.3052 | HITS@10 sub:0.0000 | HITS@10 all:0.0947
test acc:0.1012
Epoch 00001 | train 2.3359 | eval 2.1433 | Loss 10008.6790 | HITS@10 sub:0.0000 | HITS@10 all:0.1020
test acc:0.1100
Epoch 00002 | train 2.3366 | eval 2.1486 | Loss 9272.1274 | HITS@10 sub:0.0000 | HITS@10 all:0.1034
test acc:0.1119
Epoch 00003 | train 2.3346 | eval 2.1427 | Loss 8843.9131 | HITS@10 sub:0.0000 | HITS@10 all:0.1082
test acc:0.1158
Epoch 00004 | train 2.3393 | eval 2.1462 | Loss 8555.6296 | HITS@10 sub:0.0000 | HITS@10 all:0.1115
test acc:0.1189
Epoch 00005 | train 2.3384 | eval 2.1438 | Loss 8272.8355 | HITS@10 sub:0.0000 | HITS@10 all:0.1151
test acc:0.1274
Epoch 00006 | train 2.3377 | eval 2.1417 | Loss 8061.9247 | HITS@10 sub:0.0000 | HITS@10 all:0.1210
test acc:0.1312
Epoch 00007 | train 2.3380 | eval 2.1570 | Loss 7841.2546 | HITS@10 sub:0.0000 | HITS@10 all:0.1235
test acc:0.1354
Epoch 00008 | train 2.3383

Epoch 00077 | train 2.4123 | eval 1.7187 | Loss 4489.7421 | HITS@10 sub:0.0000 | HITS@10 all:0.1551
Epoch 00078 | train 2.4064 | eval 1.7177 | Loss 4479.5447 | HITS@10 sub:0.0000 | HITS@10 all:0.1549
Epoch 00079 | train 2.4100 | eval 1.7192 | Loss 4463.2181 | HITS@10 sub:0.0000 | HITS@10 all:0.1538
Epoch 00080 | train 2.4073 | eval 1.7190 | Loss 4446.2377 | HITS@10 sub:0.0000 | HITS@10 all:0.1545
Epoch 00081 | train 2.4098 | eval 1.7212 | Loss 4433.9150 | HITS@10 sub:0.0000 | HITS@10 all:0.1548
Epoch 00082 | train 2.4092 | eval 1.7176 | Loss 4436.8742 | HITS@10 sub:0.0000 | HITS@10 all:0.1558
Epoch 00083 | train 2.4110 | eval 1.7162 | Loss 4397.6203 | HITS@10 sub:0.0000 | HITS@10 all:0.1554
Epoch 00084 | train 2.4090 | eval 1.7173 | Loss 4403.2628 | HITS@10 sub:0.0000 | HITS@10 all:0.1548
Epoch 00085 | train 2.4083 | eval 1.7181 | Loss 4394.7864 | HITS@10 sub:0.0000 | HITS@10 all:0.1557
Epoch 00086 | train 2.4065 | eval 1.7210 | Loss 4398.4028 | HITS@10 sub:0.0000 | HITS@10 all:0.1533


KeyboardInterrupt: 