# Personalized Ranking for Recommender Systems

- Previous sections
  - used only **explicit** feedback
  - trained and tested models on **observed** ratings
- **Non-observed** user-item pairs
  - are ignored by matrix factorization and AutoRec
  - are central to **implicit** feedback settings
  - are a mixture of
    - real negative feedback *(items the user is not interested in)*
    - missing values *(items the user is not aware of yet and might interact with in the future)*
- Personalized ranking models can be optimized with one of three approaches
  - **pointwise**
    - considers **each** user-item pair as an independent instance
    - uses a single interaction/rating at a time
      - e.g., to predict the rating of a user-item pair
    - examples: matrix factorization and AutoRec
  - **pairwise** *(this section introduces two losses of this kind)*
    - considers **a pair of** items for each user
    - aims to **rank** the positive item higher than the negative item *(approximating the optimal ordering for each user)*
    - better suited to the ranking task, since predicting a relative order matches the nature of ranking
  - **listwise**
    - approximates the ordering of **the entire list** of items
    - e.g., by directly optimizing ranking measures such as Normalized Discounted Cumulative Gain *(NDCG)*
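Listwise methods judge a whole ranked list at once. The sketch below shows how NDCG scores one list, assuming the common definition with gain `rel_k` and a `log2(k + 1)` position discount (some variants use `2^rel - 1` as the gain). `dcg` and `ndcg` are illustrative helper names, not a library API.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: the item at 1-based rank k is discounted by log2(k + 1)."""
    return sum(rel / math.log2(k + 1) for k, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the model's ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance labels of items listed in the order a model ranked them (1 = relevant).
print(round(ndcg([1, 0, 1, 0]), 4))  # 0.9197
print(ndcg([1, 1, 0, 0]))            # 1.0: already the ideal ordering
```

Because the score depends on every position in the list, moving one relevant item up or down changes the result, which is what distinguishes a listwise objective from the per-pair losses in this section.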

## Bayesian Personalized Ranking Loss and its Implementation

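- **BPR** (Bayesian Personalized Ranking)
  - training data consists of both **positive** and **negative** pairs (the negatives are missing values)
  - assumes that the user prefers each positive item over all other non-observed items
  - aims to **maximize** the **posterior probability** of the model parameters given the desired ranking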
$$
p(\Theta \mid >_u )  \propto  p(>_u \mid \Theta) p(\Theta)
$$
where
- $\Theta$ represents the model parameters
- $>_u$ represents the desired personalized ranking of all items for user $u$
$$
\begin{split}\begin{aligned}
\textrm{BPR-OPT} &:= \ln p(\Theta \mid >_u) \\
         & \propto \ln \left[ p(>_u \mid \Theta) \, p(\Theta) \right] \\
         &= \ln \Big[ p(\Theta) \prod_{(u, i, j) \in D} \sigma(\hat{y}_{ui} - \hat{y}_{uj}) \Big] \\
         &= \sum_{(u, i, j) \in D} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \ln p(\Theta) \\
         &= \sum_{(u, i, j) \in D} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) - \lambda_\Theta \|\Theta \|^2
\end{aligned}\end{split}
$$
where
- $D \stackrel{\textrm{def}}{=} \{(u, i, j) \mid i \in I^+_u \wedge j \in I \backslash I^+_u \}$ is the training set
  - $I^+_u$ is the set of items that user $u$ prefers *(positive feedback)*
  - $I$ is the set of all items
  - $I \backslash I^+_u$ is the set of all other, non-observed items *(treated as negative, although it mixes real negatives and missing values)*
- $\sigma$ is the logistic sigmoid function
- $\hat{y}_{ui}$ and $\hat{y}_{uj}$ are the predicted scores of user $u$ for items $i$ and $j$
- $p(\Theta)$ is a normal distribution
  - with zero mean
  - and variance-covariance matrix $\lambda_\Theta I$ (here $I$ is the identity matrix)
  - up to an additive constant, $\ln p(\Theta)$ is a negative multiple of $\|\Theta\|^2$, written as $-\lambda_\Theta \|\Theta\|^2$ in the last step

![ranking](images/rec-ranking.svg)
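To make the definition of $D$ and BPR-OPT concrete, here is a minimal pure-Python sketch. It assumes a matrix factorization scorer $\hat{y}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$, takes $\Theta$ to be every entry of the hypothetical embedding tables `P` and `Q`, and enumerates all of $D$, which is only feasible on toy data. All names and numbers are illustrative.

```python
import math
from itertools import product


def log_sigmoid(x):
    """Numerically stable ln(sigma(x))."""
    # ln sigma(x) = -ln(1 + e^{-x}) for x >= 0 and x - ln(1 + e^{x}) for x < 0.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def training_triples(positives, all_items):
    """Enumerate D = {(u, i, j) | i in I_u^+ and j in I minus I_u^+}."""
    for u, pos in positives.items():
        for i, j in product(sorted(pos), sorted(all_items - pos)):
            yield (u, i, j)


def score(P, Q, u, i):
    """Hypothetical matrix factorization scorer: y_ui = <p_u, q_i>."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def bpr_opt(P, Q, positives, all_items, lam):
    """BPR-OPT = sum over D of ln sigma(y_ui - y_uj) - lam * ||Theta||^2."""
    fit = sum(log_sigmoid(score(P, Q, u, i) - score(P, Q, u, j))
              for u, i, j in training_triples(positives, all_items))
    # Theta is every entry of the user table P and the item table Q.
    reg = lam * sum(x * x for table in (P, Q) for vec in table.values() for x in vec)
    return fit - reg


# Toy data: 2 users, 3 items; positives maps each user u to I_u^+.
positives = {0: {1}, 1: {0, 2}}
all_items = {0, 1, 2}
P = {0: [1.0, 0.0], 1: [0.0, 1.0]}
Q = {0: [0.2, 1.0], 1: [1.5, -0.5], 2: [-0.3, 0.8]}

print(list(training_triples(positives, all_items)))  # [(0, 1, 0), (0, 1, 2), (1, 0, 1), (1, 2, 1)]
print(bpr_opt(P, Q, positives, all_items, lam=0.01))
```

Training maximizes this value; the `BPRLoss` module in this section minimizes the negated data term per mini-batch instead.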

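```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BPRLoss(nn.Module):
    """BPR loss: mean of -ln sigma(y_ui - y_uj) over a batch of (u, i, j) triples."""

    def __init__(self):
        super().__init__()

    def forward(self, pos, neg):
        # pos: predicted scores y_ui of positive items; neg: scores y_uj of negatives.
        # logsigmoid is the numerically stable form of ln(sigmoid(x)); taking the mean
        # instead of the sum only rescales the objective. The -lambda * ||Theta||^2 term
        # is not included here and is usually handled separately (e.g., weight_decay).
        loss = -F.logsigmoid(pos - neg)
        return loss.mean()
```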
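`BPRLoss` relies on autograd for gradients. To see what those gradients do, the sketch below writes one stochastic gradient step by hand for a single triple $(u, i, j)$, assuming a matrix factorization scorer $\hat{y}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i$. It uses $\frac{d}{dx} \ln \sigma(x) = \sigma(-x)$ and applies the gradient $-2\lambda_\Theta \Theta$ of the regularizer at every triple, a common simplification. The function names and the `lr`/`lam` defaults are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function sigma(x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_gap(p_u, q_i, q_j):
    """x_uij = y_ui - y_uj = <p_u, q_i - q_j> under the matrix factorization scorer."""
    return sum(a * (b - c) for a, b, c in zip(p_u, q_i, q_j))


def bpr_sgd_step(p_u, q_i, q_j, lr=0.1, lam=0.01):
    """One gradient-ascent step on ln sigma(x_uij) - lam * ||theta||^2 for one triple.

    Returns updated copies of (p_u, q_i, q_j); the inputs are not modified.
    """
    # g approaches 1 when j is scored far above i and 0 when i is scored far above j.
    g = sigmoid(-score_gap(p_u, q_i, q_j))
    new_p = [a + lr * (g * (b - c) - 2 * lam * a) for a, b, c in zip(p_u, q_i, q_j)]
    new_qi = [b + lr * (g * a - 2 * lam * b) for a, b in zip(p_u, q_i)]
    new_qj = [c + lr * (-g * a - 2 * lam * c) for a, c in zip(p_u, q_j)]
    return new_p, new_qi, new_qj


p, qi, qj = [0.1, 0.1], [0.1, -0.1], [0.2, 0.0]
before = score_gap(p, qi, qj)
p, qi, qj = bpr_sgd_step(p, qi, qj)
print(before, score_gap(p, qi, qj))  # the gap y_ui - y_uj grows after the step
```

The factor $\sigma(-x_{uij})$ is why BPR focuses on mis-ordered pairs: triples that are already well ordered produce almost no update.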

## Hinge Loss and its Implementation

$$
\sum_{(u, i, j) \in D} \max( m - \hat{y}_{ui} + \hat{y}_{uj}, 0)
$$
where
- $m$ is the safety margin
  - pushes negative items further away from positive items
  - makes the loss **optimize the relative distance** between positive and negative items
    - instead of their absolute outputs
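The effect of the margin is easiest to see on a few hand-picked score pairs. This minimal pure-Python sketch uses the same mean reduction as the `HingeLoss` module in this section; the scores are made up for illustration, and `hinge_loss` is a hypothetical helper that expects non-empty, equal-length inputs.

```python
def hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Mean of max(m - y_ui + y_uj, 0) over paired positive/negative scores."""
    losses = [max(margin - p + n, 0.0) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# Three (positive, negative) score pairs for the same user:
#   (3.0, 1.5): ordered correctly and separated by more than m -> 0.0
#   (1.0, 0.5): ordered correctly but inside the margin        -> 0.5
#   (0.0, 1.0): ordered incorrectly                            -> 2.0
print(hinge_loss([3.0, 1.0, 0.0], [1.5, 0.5, 1.0]))
```

Unlike BPR, whose $-\ln \sigma(\cdot)$ term is positive for every pair, the hinge loss is exactly zero once a pair is separated by at least $m$, so such pairs stop contributing gradient.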

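```python
import torch.nn as nn
import torch.nn.functional as F

class HingeLoss(nn.Module):
    """Hinge loss: mean of max(m - y_ui + y_uj, 0) over a batch of (u, i, j) triples."""
    def __init__(self):
        super().__init__()

    def forward(self, pos, neg, margin=1.0):
        # pos/neg: scores y_ui / y_uj; relu(x) = max(x, 0), so pairs separated by >= margin give 0.
        loss = F.relu(neg - pos + margin)
        return loss.mean()
```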
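Both losses take batches of `pos` and `neg` scores, but $D$ contains $|I^+_u| \times |I \backslash I^+_u|$ triples per user, far too many to enumerate for a realistic catalog, so triples are usually sampled. The sketch below shows one common heuristic under stated assumptions: draw a user uniformly, then one of that user's positive items, then a non-observed item by rejection sampling. This is not uniform over $D$, and `sample_triples` is a hypothetical name. The scores $\hat{y}_{ui}$ and $\hat{y}_{uj}$ of the sampled triples would then be passed as `pos` and `neg`.

```python
import random


def sample_triples(positives, all_items, num_samples, rng=None):
    """Sample (u, i, j) triples with i in I_u^+ and j in I minus I_u^+.

    Assumes each positives[u] is a subset of all_items. Not uniform over D.
    """
    if rng is None:
        rng = random.Random()
    # Skip users with no positives or with every item positive (no valid j).
    users = [u for u, pos in positives.items() if pos and len(pos) < len(all_items)]
    pos_lists = {u: sorted(positives[u]) for u in users}
    items = sorted(all_items)
    triples = []
    for _ in range(num_samples):
        u = rng.choice(users)
        i = rng.choice(pos_lists[u])
        j = rng.choice(items)
        while j in positives[u]:  # redraw until j is a non-observed item
            j = rng.choice(items)
        triples.append((u, i, j))
    return triples


positives = {0: {1}, 1: {0, 2}, 2: {0, 1, 2}}  # user 2 has no non-observed items
all_items = {0, 1, 2}
print(sample_triples(positives, all_items, num_samples=5, rng=random.Random(0)))
```

Rejection sampling is cheap when each user has observed only a small fraction of the catalog, which is the typical implicit-feedback situation.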