# Bayesian Personalized Ranking
[BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf)

The paper puts forth something they call “BPR-Opt”, a generic optimization criterion for optimal personalized ranking. “Generic” means it can be applied to different types of recommendation models, such as Matrix Factorization or k-Nearest-Neighbour, and its goal is to rank a set of items for a given user.

Whereas most other approaches only look at whether a user interacted with an item, BPR looks at the user, one item the user interacted with and one item the user did not interact with (the unknown item). This gives us a triplet **(u, i, j)** of a user **(u)**, one known item **(i)** and one unknown item **(j)**.
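To make the triplet idea concrete, here is a minimal sketch of drawing one (u, i, j) triplet from implicit feedback. The function `sample_triplet` and the toy `interactions` dictionary are hypothetical illustrations, not code from the paper or from any library. The sketch picks a user, one item that user interacted with, and then resamples random items until it finds one the user has not interacted with. This is only one simple scheme; the paper's own training procedure samples triplets uniformly from the whole training set.

```python
import random

def sample_triplet(interactions, n_items, rng):
    """Draw one (u, i, j) triplet: a user, a known item and an unknown item.

    interactions: dict mapping each user to the set of item ids they interacted with
    n_items: total number of items; item ids are 0 .. n_items - 1
    rng: a random.Random instance
    """
    # Only users with at least one known and at least one unknown item qualify
    users = sorted(u for u, items in interactions.items() if 0 < len(items) < n_items)
    u = rng.choice(users)
    i = rng.choice(sorted(interactions[u]))   # known item
    j = rng.randrange(n_items)
    while j in interactions[u]:               # resample until j is unknown
        j = rng.randrange(n_items)
    return u, i, j

# Hypothetical toy data: user -> items they interacted with
interactions = {0: {1, 3}, 1: {0}, 2: {2, 3, 4}}
rng = random.Random(42)
triplets = [sample_triplet(interactions, n_items=5, rng=rng) for _ in range(5)]
print(triplets)
```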

We can express the relationship between known and unknown items like this:
$$\hat{x}_{ui} - \hat{x}_{uj} > 0$$

Here $\hat{x}_{ui}$ denotes the score for user 'u' and a known item 'i', and $\hat{x}_{uj}$ the score for user 'u' and an unknown item 'j'. The above condition is satisfied if the score for the known item is larger than that of the unknown one.

### Bayesian Formulation

Posterior probability is proportional to the likelihood multiplied by the prior probability:
<i>Posterior probability ∝ Likelihood x Prior probability</i>

In other words, Bayes' rule lets us update the probability of a hypothesis, here a particular set of model parameters, as more events or information become available. Training then means finding the hypothesis with the highest posterior probability.
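A toy discrete example can make the proportionality concrete. The sketch below (all numbers are made up for illustration) scores three candidate values of a single parameter $\theta$, multiplies each likelihood by its prior, and normalizes. With 7 successes out of 10 trials the likelihood alone favours $\theta = 0.8$, but the prior puts twice as much mass on 0.5, which moves the most probable value to 0.5.

```python
import numpy as np

# Three hypothetical candidate values for a single parameter theta
thetas = np.array([0.2, 0.5, 0.8])
prior = np.array([0.25, 0.50, 0.25])        # p(theta), sums to 1

# Hypothetical data: 7 successes in 10 Bernoulli trials with success probability theta
successes, trials = 7, 10
likelihood = thetas**successes * (1 - thetas)**(trials - successes)   # p(data | theta)

unnormalized = likelihood * prior            # likelihood x prior
posterior = unnormalized / unnormalized.sum()

print("posterior:", posterior)
print("most likely under likelihood alone:", thetas[np.argmax(likelihood)])
print("most probable under posterior (MAP):", thetas[np.argmax(posterior)])
```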

### A Closer Look At The Math

Using the formula <i>Posterior probability ∝ Likelihood x Prior probability</i>, the posterior probability we want to maximize in this case is:

$$p(\theta | >_u) \propto p(>_u | \theta) \, p(\theta)$$
Here $>_u$ is a latent preference structure for user **u** and $\theta$ represents the parameters of some kind of model, like matrix factorization or kNN.

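So we want to find the parameters $\theta$ that are most probable given the latent preference structure $>_u$. The paper assumes that all users act independently of each other and that the ordering of each pair of items for a given user is independent of the ordering of every other pair. Under these assumptions, the likelihood over all users, $\prod_{u} p(>_u | \theta)$, equals the product of $p(i >_u j | \theta)$ over all training triplets $(u, i, j)$: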
![form1](https://cdn-images-1.medium.com/max/1200/1*8ZHgR8pozAH2AK6apzD90A.png)
Next, we define the likelihood that a given user actually prefers the known item i over unknown item j as the following:
![form2](https://cdn-images-1.medium.com/max/1200/1*xPh5Vdd3p5YeNLOOUqVgWA.png)
Here $\sigma$ is the logistic sigmoid function and $\hat{x}_{uij}(\theta)$ is a real-valued function of the model parameters that captures the relationship between a user, a known item and an unknown item. BPR does not dictate what this function is and can, therefore, be used with a number of different model classes. In our case the function is the score of (u, j) subtracted from the score of (u, i), that is $\hat{x}_{uij} = \hat{x}_{ui} - \hat{x}_{uj}$:
![form3](https://cdn-images-1.medium.com/max/1000/1*FTTWeuM33N6GG83JDHBVmg.png)
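For the matrix factorization case, a minimal NumPy sketch of these quantities follows. It assumes, as in the paper's MF example, that $\hat{x}_{ui}$ is the dot product of a user factor vector and an item factor vector; the random factors and the helper names `x_hat` and `x_uij` are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 3
W = rng.normal(scale=0.1, size=(n_users, k))   # user factor vectors w_u (random, illustrative)
H = rng.normal(scale=0.1, size=(n_items, k))   # item factor vectors h_i (random, illustrative)

def x_hat(u, i):
    """MF score for user u and item i: dot product of their factor vectors."""
    return W[u] @ H[i]

def x_uij(u, i, j):
    """Score of (u, j) subtracted from the score of (u, i)."""
    return x_hat(u, i) - x_hat(u, j)

u, i, j = 0, 2, 5
p = sigmoid(x_uij(u, i, j))   # p(i >_u j | theta) under the sigmoid model
print(x_uij(u, i, j), p)
```

Because $\sigma(-z) = 1 - \sigma(z)$, swapping i and j turns the probability $p$ into $1 - p$, which is the behaviour we want from a pairwise preference.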
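We can then substitute the sigmoid into the likelihood. Following the paper, we maximize the natural logarithm of the posterior, which is why ln σ appears instead of σ; since ln is monotonic, this gives the same parameters as maximizing the posterior itself. The paper describes the result as a maximum posterior (MAP) estimator rather than plain maximum likelihood estimation (MLE), because the prior $p(\theta)$ stays in the objective: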
![form4](https://cdn-images-1.medium.com/max/1200/1*oS4Ouq1bU7fVKTNOkCynhQ.png)
Remembering the logarithm product rule, $\ln(a \cdot b) = \ln(a) + \ln(b)$, we can now rewrite the whole equation as:
![form5](https://cdn-images-1.medium.com/max/1200/1*oLeyiJCV1bkaOUBEw3_q6A.png)
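Turning the product into a sum is not only algebraically convenient. The sketch below (illustrative values only) checks that the log of a product of sigmoids equals the sum of the ln σ terms, and shows why the sum is preferred in practice: with many triplets the raw product underflows to zero in floating point, while the sum of logs stays finite. The helper `log_sigmoid` is a hypothetical name; it uses `np.logaddexp` to compute $\ln \sigma(z) = -\ln(1 + e^{-z})$ without overflow.

```python
import numpy as np

def log_sigmoid(z):
    """ln(sigma(z)) = -ln(1 + e^{-z}), computed without overflow via logaddexp."""
    return -np.logaddexp(0.0, -z)

# A few illustrative score differences x_uij
x = np.array([2.0, -0.5, 0.1, 40.0, -40.0])

product_form = np.log(np.prod(1.0 / (1.0 + np.exp(-x))))   # ln of a product
sum_form = np.sum(log_sigmoid(x))                           # sum of logs
print(product_form, sum_form)

# With many triplets the raw product underflows, but the sum of logs does not
many = np.full(10_000, -1.0)
print(np.prod(1.0 / (1.0 + np.exp(-many))))   # 0.0 (underflow)
print(np.sum(log_sigmoid(many)))              # finite
```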
Based on this we then arrive at the final optimization criterion for our model:
![form6](https://cdn-images-1.medium.com/max/1200/1*ZWcjPtba24-UFqvHveh9uQ.png)
* $\theta$ : Our model parameter vector.
* $\hat{x}_{uij}$ : Relationship between (u, i, j). Here the score of (u, j) subtracted from the score of (u, i).
* **ln** : The natural logarithm.
* $\sigma$ : The logistic sigmoid.
* $\lambda$ : Lambda, the model-specific regularization hyperparameter.
* $||\theta||^2$ : The squared L2 norm of our model parameters. This term comes from the Gaussian prior $p(\theta)$.
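The paper optimizes this criterion with stochastic gradient ascent over randomly sampled triplets (its LearnBPR algorithm). Below is a minimal NumPy sketch for a matrix factorization model, where $\hat{x}_{uij} = w_u \cdot (h_i - h_j)$, so the partial derivatives are $h_i - h_j$ for $w_u$, $w_u$ for $h_i$ and $-w_u$ for $h_j$. It is a simplified illustration under stated assumptions: a single λ for all parameters (the paper allows separate regularization constants for different parameter groups), a fixed learning rate, and toy data with two obvious user groups. The names `bpr_opt` and `learn_bpr` are hypothetical.

```python
import numpy as np

def bpr_opt(W, H, triplets, lam):
    """BPR-Opt for an MF model: sum of ln sigma(x_uij) minus lam * ||theta||^2."""
    u, i, j = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    x_uij = np.sum(W[u] * (H[i] - H[j]), axis=1)       # x_ui - x_uj per triplet
    log_likelihood = -np.logaddexp(0.0, -x_uij).sum()   # sum of ln sigma(x_uij)
    return log_likelihood - lam * (np.sum(W ** 2) + np.sum(H ** 2))

def learn_bpr(positives, n_users, n_items, k=8, lr=0.05, lam=0.01,
              n_steps=20_000, seed=0):
    """Stochastic gradient ascent on BPR-Opt with randomly sampled triplets."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_users, k))   # user factors w_u
    H = rng.normal(scale=0.1, size=(n_items, k))   # item factors h_i
    pos_lists = {u: sorted(items) for u, items in positives.items()}
    users = [u for u, items in pos_lists.items() if 0 < len(items) < n_items]
    for _ in range(n_steps):
        u = users[rng.integers(len(users))]
        i = pos_lists[u][rng.integers(len(pos_lists[u]))]   # known item
        j = int(rng.integers(n_items))
        while j in positives[u]:                            # unknown item
            j = int(rng.integers(n_items))
        w_u, diff = W[u].copy(), H[i] - H[j]
        g = 1.0 / (1.0 + np.exp(w_u @ diff))   # e^{-x} / (1 + e^{-x}) with x = x_uij
        # Gradient ascent step; the constant factor 2 from the derivative of
        # ||theta||^2 is folded into lam.
        W[u] += lr * (g * diff - lam * w_u)
        H[i] += lr * (g * w_u - lam * H[i])
        H[j] += lr * (-g * w_u - lam * H[j])
    return W, H

# Toy data: users 0-3 interacted with items 0-2, users 4-7 with items 3-5
positives = {u: ({0, 1, 2} if u < 4 else {3, 4, 5}) for u in range(8)}
all_triplets = np.array([(u, i, j) for u in range(8) for i in sorted(positives[u])
                         for j in range(6) if j not in positives[u]])

W0, H0 = learn_bpr(positives, n_users=8, n_items=6, n_steps=0)   # initial factors
W, H = learn_bpr(positives, n_users=8, n_items=6)
print("BPR-Opt before training:", bpr_opt(W0, H0, all_triplets, lam=0.01))
print("BPR-Opt after training: ", bpr_opt(W, H, all_triplets, lam=0.01))
```

The learning rate, λ and number of steps here are arbitrary choices for the toy data, not values recommended by the paper.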



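**Important Note:**
The following is an implementation of BPR using the *implicit* library. At the time of writing, implicit's BPR implementation ignores the values stored in the matrix: it treats every non-zero entry as a binary signal that the user liked the item. The code uses the API of implicit versions before 0.5, where `fit()` expects an item-user matrix and `recommend()` returns a list of `(item_id, score)` tuples; from version 0.5 onward, `fit()` takes a user-item matrix instead and `recommend()` returns arrays of ids and scores.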

```python
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix
from implicit.bpr import BayesianPersonalizedRanking
```

```python
# Colab only: upload events.csv from the local machine
from google.colab import files
uploaded = files.upload()
```

Saving events.csv to events.csv


```python
events_df = pd.read_csv('events.csv')
```

```python
events_df.head()
```

|   | timestamp | visitorid | event | itemid | transactionid |
|---|---|---|---|---|---|
| 0 | 1433221332117 | 257597 | view | 355908 |  |
| 1 | 1433224214164 | 992329 | view | 248676 |  |
| 2 | 1433221999827 | 111016 | view | 318965 |  |
| 3 | 1433221955914 | 483717 | view | 253185 |  |
| 4 | 1433221337106 | 951259 | view | 367447 |  |


```python
all_customers = events_df.visitorid.unique()
len(all_customers)
```

1407580

```python
customer_purchased = events_df[events_df.transactionid.notnull()].visitorid.unique()
len(customer_purchased)
```

11719

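```python
# Visitors without any transaction. Testing membership against the NumPy array
# scans all 11719 purchasers for every visitor; a set makes each test O(1).
purchased = set(customer_purchased)
customer_browsed = [x for x in all_customers if x not in purchased]
len(customer_browsed)
```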

1395861

```python
# Replace raw item ids with contiguous integer codes 0 .. n_items - 1
events_df['item_id'] = events_df['itemid'].astype('category').cat.codes
item_lookup = events_df[['item_id', 'itemid']].drop_duplicates()   # code -> original id
events_df.drop(['itemid'], axis=1, inplace=True)
```

```python
# Same for visitors: contiguous codes 0 .. n_visitors - 1
events_df['visitor_id'] = events_df['visitorid'].astype('category').cat.codes
visitor_lookup = events_df[['visitor_id', 'visitorid']].drop_duplicates()
events_df.drop(['visitorid'], axis=1, inplace=True)
```

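```python
import numpy as np

# One entry per event: (item code, visitor code) with value 1
item = events_df['item_id'].to_numpy()
users = events_df['visitor_id'].to_numpy()
data = np.ones(len(events_df), dtype=int)

# The codes are contiguous, so the number of unique codes gives the matrix size
n_items = events_df.item_id.nunique()
n_visitors = events_df.visitor_id.nunique()

# item x user matrix for fit() (pre-0.5 implicit API); duplicate
# (item, user) pairs are summed when the matrix is converted to CSR
data_coo = coo_matrix((data, (item, users)), shape=(n_items, n_visitors))

# user x item matrix for recommend()
data_csr = csr_matrix((data, (users, item)), shape=(n_visitors, n_items))
```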
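Because `data` holds a 1 for every event, repeated events for the same (visitor, item) pair are summed once the matrix is in CSR form. Visitor 2, shown further down, viewed item 163762 three times, so that cell would hold 3. Per the note above, implicit's BPR only looks at whether an entry is non-zero, so this does not change the BPR results, but a model that uses the weights would see counts. A toy sketch (hypothetical ids, with `toy_` names so it does not overwrite the notebook's variables) of the summing and of an explicit binarization:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical event log: one (visitor code, item code) pair per event
toy_users = np.array([0, 0, 0, 1, 1])
toy_items = np.array([2, 2, 1, 0, 2])
toy_data = np.ones(len(toy_users))

# Duplicate (visitor, item) pairs are summed: visitor 0 has two events for item 2
counts = csr_matrix((toy_data, (toy_users, toy_items)), shape=(2, 3))
print(counts.toarray())

# Keep only "interacted or not", the binary view the note above describes
binary = (counts > 0).astype(np.float64)
print(binary.toarray())
```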

```python
model = BayesianPersonalizedRanking(learning_rate=0.005, iterations=50)
```

GPU training requires factor size to be a multiple of 32 - 1. Increasing factors from 100 to 127.


```python
# Pre-0.5 implicit API: fit() expects an item x user matrix, hence data_coo
model.fit(data_coo, show_progress=True)
```

100%|██████████| 50/50 [00:04<00:00, 11.88it/s, correct=52.45%, skipped=0.11%]
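The `correct` figure in the progress bar appears to be the share of sampled training triplets for which the model currently scores the known item above the unknown one, i.e. $\hat{x}_{ui} > \hat{x}_{uj}$. If so, 52.45% is only slightly above the 50% that random scores would give, which suggests the model separated known from unknown items only weakly at these settings. The sketch below computes an analogous sampled estimate for fixed user and item factor matrices; `sampled_ordering_accuracy` is a hypothetical helper, not part of implicit, and it is not claimed to reproduce implicit's exact bookkeeping.

```python
import numpy as np

def sampled_ordering_accuracy(user_factors, item_factors, positives, n_samples, seed=0):
    """Share of randomly sampled (u, i, j) triplets with x_ui > x_uj (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_items = item_factors.shape[0]
    users = [u for u, items in positives.items() if 0 < len(items) < n_items]
    pos_lists = {u: sorted(positives[u]) for u in users}
    correct = 0
    for _ in range(n_samples):
        u = users[rng.integers(len(users))]
        i = pos_lists[u][rng.integers(len(pos_lists[u]))]   # known item
        j = int(rng.integers(n_items))
        while j in positives[u]:                            # unknown item
            j = int(rng.integers(n_items))
        x_uij = user_factors[u] @ (item_factors[i] - item_factors[j])
        correct += int(x_uij > 0)
    return correct / n_samples

# Aligned toy factors: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3
positives = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(sampled_ordering_accuracy(aligned, aligned, positives, 1000))   # 1.0

# Random factors carry no preference signal, so the estimate lands near 0.5
rng = np.random.default_rng(1)
random_positives = {u: set(rng.choice(100, size=10, replace=False).tolist())
                    for u in range(200)}
random_acc = sampled_ordering_accuracy(rng.normal(size=(200, 16)),
                                       rng.normal(size=(100, 16)),
                                       random_positives, 5000)
print(random_acc)
```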


```python
events_df.head()
```

|   | timestamp | event | transactionid | item_id | visitor_id |
|---|---|---|---|---|---|
| 0 | 1433221332117 | view |  | 179333 | 257597 |
| 1 | 1433224214164 | view |  | 125263 | 992329 |
| 2 | 1433221999827 | view |  | 160653 | 111016 |
| 3 | 1433221955914 | view |  | 127563 | 483717 |
| 4 | 1433221337106 | view |  | 185159 | 951259 |


```python
events_df[events_df.visitor_id == 2]
```

|   | timestamp | event | transactionid | item_id | visitor_id |
|---|---|---|---|---|---|
| 726292 | 1438970468920 | view |  | 108948 | 2 |
| 728288 | 1438971657845 | view |  | 163762 | 2 |
| 735202 | 1438971444375 | view |  | 172691 | 2 |
| 735273 | 1438970013790 | view |  | 163762 | 2 |
| 737615 | 1438970905669 | view |  | 172691 | 2 |
| 737711 | 1438970212664 | view |  | 130961 | 2 |
| 742485 | 1438971463170 | view |  | 108948 | 2 |
| 742616 | 1438969904567 | view |  | 163762 | 2 |


```python
model.recommend(userid=2, user_items=data_csr, N=10,
                filter_already_liked_items=True, filter_items=None,
                recalculate_user=False)
```

```
[(23529, 0.25551754),
 (87396, 0.2513472),
 (78691, 0.24466138),
 (61939, 0.24268672),
 (104609, 0.24058555),
 (73316, 0.23925929),
 (210249, 0.23906568),
 (100149, 0.23842876),
 (28612, 0.23562463),
 (191081, 0.23550995)]
```
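The ids in this output are the category codes stored in `item_id`, not the original `itemid` values, so they need to be mapped back through `item_lookup` before they mean anything outside the notebook. The sketch below illustrates both steps on small hypothetical arrays: it scores every item with the dot-product model from the math section, masks items the user has already interacted with (what `filter_already_liked_items=True` asks for), keeps the top N, and maps codes back with a lookup table shaped like `item_lookup`. It shows the idea only and is not implicit's internal implementation; `recommend_top_n` and `toy_item_lookup` are hypothetical names.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def recommend_top_n(user_factors, item_factors, user_items, userid, n=10):
    """Rank items for one user by dot-product score, skipping items already seen."""
    scores = item_factors @ user_factors[userid]
    scores[user_items[userid].indices] = -np.inf     # filter already-liked items
    n = min(n, int(np.isfinite(scores).sum()))
    top = np.argpartition(-scores, n - 1)[:n]        # unordered top n
    top = top[np.argsort(-scores[top])]              # sort them by score
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical toy factors and interactions (item codes 0-3)
user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])
item_factors = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
user_items = csr_matrix(np.array([[1, 0, 0, 0], [0, 1, 0, 0]]))

# Hypothetical lookup with the same columns as item_lookup: code -> original id
toy_item_lookup = pd.DataFrame({"item_id": [0, 1, 2, 3],
                                "itemid": [5001, 5002, 5003, 5004]})

recs = recommend_top_n(user_factors, item_factors, user_items, userid=0, n=2)
codes = [code for code, _ in recs]
original_ids = toy_item_lookup.set_index("item_id").loc[codes, "itemid"].tolist()
print(recs)           # [(2, 0.7), (3, 0.4)]
print(original_ids)   # [5003, 5004]
```

In the notebook, the same `set_index('item_id').loc[...]` lookup applied to `item_lookup` would translate the codes returned by `model.recommend` into original item ids.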