# Indexable Representation Learning

As mentioned in the presentation, an indexable representation refers to a recommendation algorithm whose latent vector representations are immediately searchable in sublinear time, for example with locality-sensitive hashing (LSH). In this tutorial, we experiment with one such model: Indexable Bayesian Personalized Ranking, or IBPR for short.
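
To make "sublinear searchable" concrete, here is a minimal sketch of an LSH index built from random hyperplanes. It is not the code in `utils.lsh`; the names `hash_key`, `tables`, and `touched` are hypothetical, and we assume that "touched" means the fraction of items retrieved as candidates for a query.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, n_items, n_bits, n_tables = 16, 1000, 4, 10

items = rng.normal(size=(n_items, dim))  # stand-in for item latent vectors
query = rng.normal(size=dim)             # stand-in for one user latent vector

def hash_key(planes, x):
    """One bit per hyperplane: which side of the plane x falls on."""
    return tuple((planes @ x >= 0).astype(int))

# Build n_tables hash tables, each with its own n_bits random hyperplanes
tables = []
for _ in range(n_tables):
    planes = rng.normal(size=(n_bits, dim))
    buckets = defaultdict(list)
    for idx, item in enumerate(items):
        buckets[hash_key(planes, item)].append(idx)
    tables.append((planes, buckets))

# A query only inspects the items that share its bucket in at least one table
candidates = set()
for planes, buckets in tables:
    candidates.update(buckets.get(hash_key(planes, query), []))

touched = len(candidates) / n_items
print('fraction of items touched:', touched)
```

Only the candidates need to be scored exactly, which is where the saving over a full scan comes from. More bits per table shrink the buckets, and more tables recover items that a single table would miss.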

In [1]:
%matplotlib inline
import numpy as np
import time
import pickle
import matplotlib.pyplot as plt
import scipy.sparse as ss

from cornac.eval_methods import BaseMethod
from cornac.models import BPR, IBPR
from utils.lsh import *
from utils.load_data import *
from utils.pmf import *
from utils.evaluation import *

## Load the train/test data

In [3]:
with open('train_data', 'rb') as f:
    train = pickle.load(f)
with open('test_data', 'rb') as f:
    test = pickle.load(f)

In [4]:
eval_method = BaseMethod.from_provided(train_data=train, test_data=test,
                                       exclude_unknowns=False, verbose=True)

rating_threshold = 1.0
exclude_unknowns = False
Building training set
Number of training users = 6040
Number of training items = 3653
Max rating = 5.0
Min rating = 1.0
Global mean = 3.6
Building test set
Number of tested users = 6040
Number of unknown users = 0
Number of unknown items = 53


## Bayesian Personalized Ranking - BPR

We first measure the performance of the BPR model both right after learning and after indexing its latent vectors with LSH. Details can be found in the paper: [BPR-paper](https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf).
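
As a reminder of what BPR optimizes, the sketch below computes the pairwise ranking loss for a handful of (user, preferred item, other item) triples. It is a plain NumPy illustration, not Cornac's implementation; the function name `bpr_loss` and the toy data are hypothetical, and the regularization form is one common choice.

```python
import numpy as np

def bpr_loss(U, V, triples, lamda=0.001):
    """Negative BPR objective over (user, preferred item, other item) triples."""
    u, i, j = triples[:, 0], triples[:, 1], triples[:, 2]
    # Score difference x_uij = <u, v_i> - <u, v_j> for each triple
    x_uij = np.sum(U[u] * (V[i] - V[j]), axis=1)
    # -log(sigmoid(x)) written as log(1 + exp(-x)) for numerical stability
    ranking_term = np.logaddexp(0.0, -x_uij).sum()
    # L2 penalty on the vectors involved in the triples
    reg_term = lamda * (np.sum(U[u] ** 2) + np.sum(V[i] ** 2) + np.sum(V[j] ** 2))
    return ranking_term + reg_term

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(5, 8))   # toy user factors
V = rng.normal(scale=0.1, size=(7, 8))   # toy item factors
triples = np.array([[0, 1, 2], [3, 4, 5], [4, 0, 6]])

loss = bpr_loss(U, V, triples)
print('BPR loss on toy triples:', loss)
```

The loss decreases as each user's preferred item scores higher than the other item under the inner product, which is why the learned vectors are later searched with maximum inner product search.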


<img src='resources/images/bpr.png' width = 700>



In [None]:
rec_bpr = BPR(k = 100, max_iter=50, learning_rate=0.01, lamda=0.001, batch_size=5000, init_params={'U':None, 'V':None})
rec_bpr.fit(eval_method.train_set)

pickle.dump(rec_bpr, open('bpr.model', 'wb'))
#rec_bpr = pickle.load(open('bpr.model', 'rb'))

Shafling the data
epoch: 0 loss: tensor(26494.1270, grad_fn=<ThSubBackward>)


In [None]:
#number of recommendations
topK = 10

queries = rec_bpr.U[1:1000,:]
data    = rec_bpr.V

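# Reduce maximum inner product search to cosine similarity search:
# append one extra dimension so that all item vectors share the same norm.
M = np.linalg.norm(data, axis=1)  # item vector norms
max_norm = M.max()
q_norm = np.linalg.norm(queries, axis=1)
n_queries = queries / q_norm.reshape(-1, 1)  # unit-norm query vectors

n_data = np.concatenate((data, np.sqrt(max_norm**2 - M**2).reshape(-1, 1)), axis=1)
n_data = n_data / max_norm  # normalized data vectors
# queries get a zero in the extra dimension, so inner products are unchanged
n_queries = np.concatenate((n_queries, np.zeros((n_queries.shape[0], 1))), axis=1)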
        
bpr_prec, bpr_recall = evaluate_topK(test, data, queries, topK)
print('bpr_prec@{0} \t bpr_recall@{0}'.format(topK))
print('{0}\t{1}'.format(bpr_prec, bpr_recall))
print('-----------------------------------------------------------------------')

topK = 10
b_vals = [4]
L_vals = [10]

print('#table\t #bit \t relative_prec@{0} \t relative_recall@{0} \t touched'.format(topK))
for nt in L_vals:
    print('-----------------------------------------------------------------------')
    for b in b_vals: 
        prec, recall, touched = evaluate_LSHTopK(test, n_data, - n_queries, CosineHashFamily, nt, b, dot, topK)
        print("{0}\t{1}\t{2}\t{3}\t{4}".format(nt, b, prec/bpr_prec, recall/bpr_recall, touched)) 
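
The cell above augments the BPR vectors before hashing because cosine-based LSH ignores vector norms, while BPR ranks items by inner product. The following sketch checks, on random stand-in data rather than the trained model, that the augmentation makes cosine ranking agree with inner product ranking. All variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(50, 8))  # stand-in for rec_bpr.V
query = rng.normal(size=8)        # stand-in for one row of rec_bpr.U

norms = np.linalg.norm(items, axis=1)
max_norm = norms.max()

# Append sqrt(max_norm^2 - ||v||^2) so every augmented item has norm max_norm,
# then divide by max_norm to place all items on the unit sphere
extra = np.sqrt(np.maximum(max_norm**2 - norms**2, 0.0))
aug_items = np.column_stack([items, extra]) / max_norm

# The query gets a zero in the extra dimension, so dot products only rescale
aug_query = np.append(query / np.linalg.norm(query), 0.0)

inner_rank = np.argsort(-(items @ query))
cosine = (aug_items @ aug_query) / (
    np.linalg.norm(aug_items, axis=1) * np.linalg.norm(aug_query))
cosine_rank = np.argsort(-cosine)

print('same ranking:', np.array_equal(inner_rank, cosine_rank))
```

The extra dimension preserves the ranking, but it also changes the angles between vectors, which is one reason why hashing transformed BPR vectors can lose accuracy.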


## Indexable Bayesian Personalized Ranking - IBPR

Next, we test the effectiveness of the indexable model. In this experiment, we train IBPR on the same data used for BPR above. Details can be found in the paper: [IBPR-paper](https://ink.library.smu.edu.sg/sis_research/3884/)
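
Assuming, as we read the IBPR paper, that IBPR ranks items by the angle between user and item vectors, its output matches what random-hyperplane hashing preserves. The sketch below illustrates that property on random vectors: two vectors receive the same hash bit with probability 1 - θ/π, where θ is the angle between them. This is a standalone check, not part of `utils.lsh` or Cornac, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_planes = 16, 20000

u = rng.normal(size=dim)  # stand-in for a user vector
v = rng.normal(size=dim)  # stand-in for an item vector

# Angle between the two vectors
cos_uv = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(np.clip(cos_uv, -1.0, 1.0))

# One hash bit per random hyperplane: the sign of the projection
planes = rng.normal(size=(n_planes, dim))
bits_u = planes @ u >= 0
bits_v = planes @ v >= 0

empirical = np.mean(bits_u == bits_v)  # observed fraction of matching bits
expected = 1.0 - theta / np.pi         # collision probability from the angle

print('empirical:', empirical, 'expected:', expected)
```

Because the collision probability depends only on the angle, vectors trained under an angular objective can be hashed directly, without the norm augmentation needed for BPR.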

<img src='resources/images/ibpr.png' width=700>



In [None]:
rec_ibpr = IBPR(k = 100, max_iter=50, learning_rate=0.01, lamda=0.001, batch_size=5000, init_params={'U':None, 'V':None})
rec_ibpr.fit(eval_method.train_set)

pickle.dump(rec_ibpr, open('ibpr.model', 'wb'))
#rec_ibpr = pickle.load(open('ibpr.model', 'rb'))

In [None]:
#number of recommendations
topK = 10

queries = rec_ibpr.U[1:1000, :]
data    = rec_ibpr.V

ibpr_prec, ibpr_recall = evaluate_topK(test, data, queries, topK)
print('ibpr_prec@{0} \t ibpr_recall@{0}'.format(topK))
print('{0}\t{1}'.format(ibpr_prec, ibpr_recall))
print('-----------------------------------------------------------------------')

topK = 10
b_vals = [4]
L_vals = [10]

print('#table\t #bit \t relative_prec@{0} \t relative_recall@{0} \t touched'.format(topK))
for nt in L_vals:
    print('-----------------------------------------------------------------------')
    for b in b_vals: 
        prec, recall, touched = evaluate_LSHTopK(test, data, -queries, CosineHashFamily, nt, b, dot, topK)
        print("{0}\t{1}\t{2}\t{3}\t{4}".format(nt, b, prec/ibpr_prec, recall/ibpr_recall, touched)) 