# Advanced BPR

## Load Data

In [1]:
from corpus import Corpus

corpus = Corpus()
reviews_path = "data/amzn/reviews_Women_ALL_scraped.csv"
images_path = "data/amzn/image_features_Women.b"
corpus.load_data(reviews_path, images_path, 5, 0);

Loading dataset from:  data/amzn/reviews_Women_ALL_scraped.csv
generating stats...
84090 302230 714381
Loading image features from:  data/amzn/image_features_Women.b
0
100000
200000
300000
400000
500000
600000
700000
800000
extracted image feature count:  300401


## Setup Testbench

We need to import TensorFlow and choose a BPR sampling method (a fresh session is created for each model below):

In [2]:
import tensorflow as tf
import sampling

sampler = sampling.Uniform()
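
The internals of `sampling.Uniform` are not shown in this notebook. As a rough illustration only, uniform BPR sampling usually means drawing a user, one item that user interacted with, and one item they did not, all uniformly at random. The function name, arguments, and data layout below are hypothetical and are not the `sampling` module's API:

```python
import random


def sample_uniform_triples(user_items, n_items, batch_size, rng=None):
    """Draw (user, positive item, negative item) triples uniformly at random.

    user_items: dict mapping user id -> set of item ids the user interacted with
                (hypothetical layout, assumed for this sketch).
    n_items:    total number of items, with ids 0 .. n_items - 1.
    """
    rng = rng or random.Random()
    # Only users with at least one positive and at least one unobserved item
    # can yield a valid triple.
    users = [u for u, items in user_items.items() if 0 < len(items) < n_items]
    triples = []
    for _ in range(batch_size):
        u = rng.choice(users)
        i = rng.choice(sorted(user_items[u]))  # positive item
        j = rng.randrange(n_items)             # candidate negative item
        while j in user_items[u]:              # reject items the user has seen
            j = rng.randrange(n_items)
        triples.append((u, i, j))
    return triples
```

Each triple `(u, i, j)` expresses the assumption that user `u` prefers item `i` over item `j`, which is the pairwise signal BPR optimizes.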

### BPR

First let's test BPR:

In [3]:
#clear the graph if you rerun this cell
tf.reset_default_graph()
session = tf.Session()

from models.bpr import BPR
K=20
reg=10.0
bias_reg=0.01
bpr = BPR(session, corpus, sampler, K, reg, bias_reg)

BPR - K=20, reg_lf: 10.00, reg_bias=0.01


In [4]:
batch_size=128
batch_count=400
iterations=10
for iteration, duration, train_loss in bpr.train(iterations, batch_size, batch_count):
    print iteration, duration, train_loss

max_iterations: 10, batch_size: 128, batch_count: 400
1 27.5626740456 733.75
2 27.186975956 646.483
3 27.8556120396 562.475
4 26.7566859722 489.321
5 25.9751479626 427.084
6 25.9668660164 370.275
7 25.9990429878 323.807
8 26.5436167717 283.306
9 26.1647830009 247.311
10 26.4177498817 216.746


#### Evaluation

Let's check the performance of our BPR model using AUC on the validation set, the test set, and the cold-start subset of the test set:

In [9]:
bpr.evaluate(bpr.val_ratings, sample_size=1000)

(0.5975979, 1772.3512)

In [10]:
bpr.evaluate(bpr.test_ratings, sample_size=1000)

(0.61772799, 1781.6512)

In [11]:
bpr.evaluate(bpr.test_ratings, sample_size=1000, cold_start=True)

(0.47666404, 2134.5886)
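
The body of `evaluate` is not shown here. As a rough illustration of the metric, a per-user AUC in BPR-style evaluation is commonly the fraction of unobserved items that the model scores below the user's held-out item, estimated on a random sample of unobserved items. The sketch below assumes a hypothetical `score(user, item)` callable and a non-empty list of unobserved items; it is not the notebook's implementation:

```python
import random


def sampled_auc(score, user, held_out_item, unobserved_items,
                sample_size=1000, seed=0):
    """Estimate one user's AUC from a sample of unobserved items.

    score: hypothetical callable (user, item) -> float, higher means preferred.
    unobserved_items: non-empty iterable of items the user never interacted with.
    """
    rng = random.Random(seed)
    negatives = list(unobserved_items)
    if len(negatives) > sample_size:
        # Subsample to keep evaluation cheap on large catalogs.
        negatives = rng.sample(negatives, sample_size)
    pos_score = score(user, held_out_item)
    # Count how often the held-out item outranks an unobserved item.
    wins = sum(1 for j in negatives if pos_score > score(user, j))
    return wins / float(len(negatives))
```

Averaging this value over users gives an overall AUC. A cold-start variant would typically restrict the average to held-out items with few training interactions; the exact threshold used by `cold_start=True` is not stated in this notebook.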

### VBPR

Now let's check VBPR:

In [16]:
#clear the graph if you rerun this cell
tf.reset_default_graph()
session = tf.Session()

from models.vbpr import VBPR
K=10
K2=10
reg=10.0
bias_reg=0.01
vbpr = VBPR(session, corpus, sampler, K, K2, reg, bias_reg)

VBPR - K=10, K2=10, reg_lf=10.00, reg_bias=0.01


In [17]:
for iteration, duration, train_loss in vbpr.train(iterations, batch_size, batch_count):
    print iteration, duration, train_loss

1 26.6737940311 363.681
2 25.7442569733 319.802
3 26.2017970085 278.399
4 25.1992931366 241.885
5 26.0139992237 211.026
6 25.1577601433 184.373
7 24.7614409924 160.398
8 25.1349849701 140.487
9 25.5298659801 123.637


#### Evaluation

Let's check overall validation AUC, test AUC, and cold-start AUC:

In [18]:
vbpr.evaluate(vbpr.val_ratings, sample_size=1000)

(0.65634727, 998.43842)

In [19]:
vbpr.evaluate(vbpr.test_ratings, sample_size=1000)

(0.65708756, 989.82263)

In [20]:
vbpr.evaluate(vbpr.test_ratings, sample_size=1000, cold_start=True)

(0.55550295, 1212.0905)

#### VBPR Comments

It's best to train the model up to `max_iterations` while tracking which iteration has the highest validation AUC, then save that model and evaluate it. The demo above was configured for only 10 iterations; in practice, 50 or more iterations are usually required to converge. Even so, VBPR improved on BPR in both overall test AUC (0.657 vs. 0.618) and cold-start AUC (0.556 vs. 0.477).
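
A minimal sketch of that "keep the best validation iteration" loop follows. The helper and its three arguments are hypothetical; it only assumes that the training iterator yields tuples whose first element is the iteration number, as `train` does in the cells above:

```python
def train_with_best_checkpoint(train_iter, evaluate_val, save_checkpoint):
    """Run training and checkpoint whenever validation AUC improves.

    train_iter:      iterable yielding (iteration, ...) tuples, one per iteration.
    evaluate_val:    hypothetical callable returning the current validation AUC.
    save_checkpoint: hypothetical callable taking the iteration number.
    """
    best_auc = float("-inf")
    best_iteration = None
    for step in train_iter:
        iteration = step[0]
        val_auc = evaluate_val()
        if val_auc > best_auc:
            # New best model so far: remember it and persist the weights.
            best_auc, best_iteration = val_auc, iteration
            save_checkpoint(iteration)
    return best_iteration, best_auc
```

With the objects from this notebook, `train_iter` could be `vbpr.train(iterations, batch_size, batch_count)` and `evaluate_val` could wrap `vbpr.evaluate(vbpr.val_ratings, sample_size=1000)`, assuming the first element of the returned tuple is the AUC. The checkpoint callback could wrap a `tf.train.Saver`, but how the models expose their variables for saving is not shown here.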