<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/PreferredAI/tutorials/blob/master/multimodal-recsys/02_multimodality.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/PreferredAI/tutorials/blob/master/multimodal-recsys/02_multimodality.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

# Multimodality (Part 2)

While preference data in the form of user-item interactions is the backbone of many recommender systems, such data tends to be sparse. One way to address this sparsity is to look beyond the interaction data to additional information associated with users or items. The intuition is that items with similar "content profiles" tend to attract similar preferences. Multimodality deals with how to model preference data (one modality) jointly with content data on the user or item side (other modalities). In this tutorial, we look at three additional modalities, namely graph, text, and image, and investigate whether they add value to the resulting recommendations.

## 1. Setup

In [1]:
!pip install --quiet cornac==1.14.0

In [2]:
import os
import sys

import cornac
from cornac.utils import cache
from cornac.datasets import filmtrust, amazon_clothing
from cornac.eval_methods import RatioSplit
from cornac.models import PMF, SoRec, WMF, CTR, BPR, VBPR
from cornac.data import GraphModality, TextModality, ImageModality
from cornac.data.text import BaseTokenizer

%tensorflow_version 1.x
import tensorflow as tf

print(f"System version: {sys.version}")
print(f"Cornac version: {cornac.__version__}")
print(f"Tensorflow version: {tf.__version__}")

SEED = 42
VERBOSE = True

TensorFlow 1.x selected.
System version: 3.7.12 (default, Sep 10 2021, 00:21:48) 
[GCC 7.5.0]
Cornac version: 1.14.0
Tensorflow version: 1.15.2


## 2. Graph Modality

In recommender systems, a graph can represent a user social network or item contexts (e.g., co-views, co-purchases). In this tutorial, we take the former as an example and discuss SoRec [1], a representative model for this class of algorithms.


### Social Recommendation (SoRec)

The SoRec model builds on the matrix factorization framework. The idea is to fuse the user-item rating matrix with the users' social network: the *user-item rating matrix* ($R$) and the *user-user graph adjacency matrix* ($G$) are factorized jointly, sharing the user latent factors. The user latent vectors in $\mathbf{U}$ therefore capture both user preferences and social connections. The rating prediction is obtained as $\hat{r}_{i,j} = \mathbf{u}_i^T \mathbf{v}_j$, as in the PMF model.

To learn the model parameters, we minimize the following loss function:

$$ \mathcal{L}(\mathbf{U,V,Z}|\lambda,\lambda_C) = \frac{1}{2} \sum_{r_{i,j} \in \mathcal{R}} (r_{i,j} - \mathbf{u}_i^T \mathbf{v}_j)^2 + \frac{\lambda_C}{2} \sum_{g_{i,h} \in \mathcal{G}} (g_{i,h} - \mathbf{u}_i^T \mathbf{z}_h)^2 + \frac{\lambda}{2} \sum_{i=1}^{N} ||\mathbf{u}_i||^2 + \frac{\lambda}{2} \sum_{j=1}^{M} ||\mathbf{v}_j||^2 + \frac{\lambda}{2} \sum_{h=1}^{N} ||\mathbf{z}_h||^2 $$

where $\mathbf{z}_h$ is the latent vector of user $h$ in the role of a social connection, $\lambda_C$ controls the relative importance of the social network factorization, and $\lambda$ is the regularization weight.
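
To make the objective concrete, here is a minimal, dependency-free sketch that evaluates the loss exactly as written above on toy data. The function names, the `(user, item, value)` triple format, and all numbers are assumptions made for illustration; this is not Cornac's implementation of SoRec.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sorec_loss(ratings, trust, U, V, Z, lam, lam_c):
    """Evaluate the SoRec objective as written in the text.

    ratings: iterable of (i, j, r_ij) observed ratings
    trust:   iterable of (i, h, g_ih) observed social links
    U, V, Z: lists of latent vectors (users, items, social factors)
    """
    rating_err = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in ratings)
    social_err = sum((g - dot(U[i], Z[h])) ** 2 for i, h, g in trust)
    # Squared L2 norms of every latent vector in U, V, and Z.
    reg = sum(dot(x, x) for M in (U, V, Z) for x in M)
    return 0.5 * rating_err + 0.5 * lam_c * social_err + 0.5 * lam * reg


# Toy data (made up): 2 users, 2 items, K = 2.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 3.0]]
Z = [[1.0, 0.0], [0.0, 1.0]]
toy_ratings = [(0, 0, 3.0), (1, 1, 3.0)]  # (user, item, rating)
toy_trust = [(0, 1, 1.0)]                 # user 0 trusts user 1

loss = sorec_loss(toy_ratings, toy_trust, U, V, Z, lam=0.1, lam_c=1.0)
print(loss)
```

Because $\mathbf{u}_i$ appears in both the rating term and the social term, a gradient step on this loss pulls each user vector toward explaining that user's ratings and trust links at the same time.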

Let's compare SoRec with its base model PMF on the [FilmTrust dataset](http://konect.cc/networks/librec-filmtrust-trust/).

In [3]:
K = 20
sorec = SoRec(k=K, max_iter=50, learning_rate=0.001, verbose=VERBOSE, seed=SEED)
pmf = PMF(k=K, max_iter=50, learning_rate=0.001, lambda_reg=0.01, verbose=VERBOSE, seed=SEED)

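# Ratings are (user, item, rating) triples; trust links are (user, user, trust) triples.
ratings = filmtrust.load_feedback()
trust = filmtrust.load_trust()

# Wrap the trust links as a user graph so that SoRec can use them during training.
user_graph_modality = GraphModality(data=trust)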

ratio_split = RatioSplit(
    data=ratings,
    test_size=0.2,
    rating_threshold=2.5,
    exclude_unknowns=True,
    user_graph=user_graph_modality,
    verbose=VERBOSE,
    seed=SEED,
)

mae = cornac.metrics.MAE()

cornac.Experiment(eval_method=ratio_split, models=[sorec, pmf], metrics=[mae]).run()

Data from https://static.preferred.ai/cornac/datasets/filmtrust/filmtrust.zip
will be cached into /root/.cornac/filmtrust/ratings.txt


Unzipping ...
File cached!





TEST:
...
      |    MAE | Train (s) | Test (s)
----- + ------ + --------- + --------
SoRec | 0.6644 |    1.5542 |   0.6820
PMF   | 0.7018 |    1.1516 |   0.6841



From the experiment, we see that SoRec achieves a lower (better) MAE than PMF (0.6644 vs. 0.7018). This improvement is plausibly explained by the useful information from the user social network that the shared user factors capture.
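
For reference, MAE is the average absolute difference between true and predicted ratings, so lower is better. The sketch below uses made-up numbers, and the helper name is hypothetical rather than part of Cornac.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all test ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Three made-up test ratings and their predictions.
toy_mae = mean_absolute_error([4.0, 2.0, 3.5], [3.5, 2.5, 3.5])
print(round(toy_mae, 4))
```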

## 3. Text Modality

Often, we are interested in building a recommender system for textual items (e.g., news, scientific papers) or items associated with text (e.g., titles, descriptions, reviews). Text is informative and descriptive, so exploiting textual information for better recommendations is an important topic in recommender systems. In this tutorial, we introduce CTR [2], a recommendation model that combines matrix factorization and probabilistic topic modeling.



### Collaborative Topic Regression (CTR)

Under the factorization framework, the adoption prediction takes the form $\hat{r}_{i,j} = \mathbf{u}_i^T \mathbf{v}_j$. The intuition behind CTR is that two items with similar topics would be adopted similarly. Thus, the item latent vector $\mathbf{v}_j$ is assumed to be drawn from a Normal distribution:

$$
\mathbf{v}_j \sim \mathcal{N}(\mathbf{\theta}_j, \lambda^{-1} \mathbf{I})
$$

where the mean $\mathbf{\theta}_j$ is a vector indicating topic proportions of the item $j$. It is equivalent to:

\begin{align}
\mathbf{v}_j &= \mathbf{\theta}_j + \mathbf{\epsilon}_j \\
\mathbf{\epsilon}_j &\sim \mathcal{N}(\mathbf{0}, \lambda^{-1} \mathbf{I})
\end{align}
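
The role of $\lambda$ can be illustrated by sampling: a large $\lambda$ keeps $\mathbf{v}_j$ close to the topic proportions $\mathbf{\theta}_j$, while a small $\lambda$ lets it drift. This is a minimal sketch using only the standard library; the function name and the values of `theta_j` are hypothetical.

```python
import random


def sample_item_vector(theta_j, lam, rng):
    """Draw v_j = theta_j + eps_j with eps_j ~ N(0, (1 / lam) * I)."""
    std = (1.0 / lam) ** 0.5
    return [t + rng.gauss(0.0, std) for t in theta_j]


rng = random.Random(42)
theta_j = [0.7, 0.2, 0.1]  # hypothetical topic proportions, K = 3

v_tight = sample_item_vector(theta_j, lam=1e4, rng=rng)  # std = 0.01
v_loose = sample_item_vector(theta_j, lam=1.0, rng=rng)  # std = 1.0
print(v_tight)
print(v_loose)
```

The offset $\mathbf{\epsilon}_j$ is what allows the interaction data to move an item away from what its text alone suggests.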

Please refer to the paper [2] for the generative process of the CTR model.


CTR also extends matrix factorization; its base model is WMF under the implicit feedback setting. The adoption $p_{i,j}$ and confidence $c_{i,j}$ are defined as follows:

\begin{equation}
p_{i,j} = 
\begin{cases} 
r_{i, j} &\mbox{if } r_{i,j} > 0 \\
0 & \mbox{otherwise} 
\end{cases}
\end{equation}


\begin{equation}
c_{i,j} = 
\begin{cases} 
a & \mbox{if } r_{i,j} > 0 \\
b & \mbox{otherwise }
\end{cases}
\end{equation}
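
The two case definitions translate directly into code. The sketch below also evaluates the confidence-weighted squared error $\frac{1}{2} \sum_{i,j} c_{i,j} (p_{i,j} - \mathbf{u}_i^T \mathbf{v}_j)^2$ over a small dense matrix. The function names and toy matrices are assumptions for illustration, and the defaults `a=1.0`, `b=0.01` mirror the values used in the experiment cell of this tutorial.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adoption_and_confidence(r_ij, a=1.0, b=0.01):
    """Return (p_ij, c_ij) following the two case definitions."""
    if r_ij > 0:
        return r_ij, a
    return 0.0, b


def weighted_squared_error(R, U, V, a=1.0, b=0.01):
    """0.5 * sum over all (i, j) of c_ij * (p_ij - u_i^T v_j)^2."""
    total = 0.0
    for i, row in enumerate(R):
        for j, r_ij in enumerate(row):
            p_ij, c_ij = adoption_and_confidence(r_ij, a, b)
            total += c_ij * (p_ij - dot(U[i], V[j])) ** 2
    return 0.5 * total


# Toy implicit-feedback matrix (made up): 1 = adopted, 0 = unobserved.
R = [[1, 0], [0, 1]]
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[0.5, 0.5], [0.0, 1.0]]

wse = weighted_squared_error(R, U, V)
print(wse)
```

With $b \ll a$, errors on unobserved entries are penalized far less than errors on observed adoptions, which reflects the uncertainty about whether an unobserved entry is a true negative.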

The learning of CTR model is done via minimizing the following negative log-likelihood:

$$ \mathcal{L}(\mathbf{U,V,\theta, \beta}|\lambda) = \frac{1}{2} \sum_{i,j} c_{i,j} (p_{i,j} - \mathbf{u}_i^T \mathbf{v}_j)^2 - \sum_{j}\sum_{n} \log \big( \sum_{k=1}^K \mathbf{\theta}_{j,k} \mathbf{\beta}_{k,w_{jn}} \big) + \frac{\lambda}{2} \sum_{i=1}^{N} ||\mathbf{u}_i||^2 + \frac{\lambda}{2} \sum_{j=1}^{M} (\mathbf{v}_j - \mathbf{\theta}_j)^T (\mathbf{v}_j - \mathbf{\theta}_j) $$

It is an iterative procedure of alternating between three steps:
- Optimize for user and item latent vectors, $\mathbf{u}_i$ and $\mathbf{v}_j$, based on the current topic proportions $\mathbf{\theta}_j$.  
- Optimize for topic proportions $\mathbf{\theta}_j$ based on the current vectors $\mathbf{u}_i$ and $\mathbf{v}_j$ and topic words $\mathbf{\beta}_k$.
- Optimize for topic words $\mathbf{\beta}_k$ based on the current topic proportions $\mathbf{\theta}_j$.
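
The text term that the last two steps work on is the per-item word log-likelihood $\sum_{n} \log \big( \sum_{k} \theta_{j,k} \beta_{k,w_{jn}} \big)$. Here is a minimal sketch of that quantity for a single item, with a made-up two-topic, two-word vocabulary; the function name and all values are hypothetical.

```python
import math


def doc_log_likelihood(words, theta_j, beta):
    """sum_n log(sum_k theta_j[k] * beta[k][w]) over one item's word ids."""
    return sum(
        math.log(sum(t * beta_k[w] for t, beta_k in zip(theta_j, beta)))
        for w in words
    )


# beta[k][w]: probability of word w under topic k (each row sums to 1).
beta = [[0.9, 0.1],
        [0.1, 0.9]]
words = [0, 0, 1]  # the item's text as word ids; mostly word 0

ll_match = doc_log_likelihood(words, [0.8, 0.2], beta)     # mostly topic 0
ll_mismatch = doc_log_likelihood(words, [0.2, 0.8], beta)  # mostly topic 1
print(ll_match, ll_mismatch)
```

Topic proportions that match the item's words give a higher log-likelihood, and hence a lower loss, which is what ties $\mathbf{\theta}_j$ (and through it $\mathbf{v}_j$) to the item text.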

Let's experiment with two models, CTR and WMF, on a dataset from the Amazon Clothing category. On this dataset, CTR learns topics from the item descriptions.

In [4]:
K = 20
ctr = CTR(k=K, max_iter=50, a=1.0, b=0.01, lambda_u=0.01, lambda_v=0.01, verbose=VERBOSE, seed=SEED)
wmf = WMF(k=K, max_iter=50, a=1.0, b=0.01, learning_rate=0.005, lambda_u=0.01, lambda_v=0.01, 
          verbose=VERBOSE, seed=SEED)

ratings = amazon_clothing.load_feedback()
docs, item_ids = amazon_clothing.load_text()

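item_text_modality = TextModality(
    corpus=docs,
    ids=item_ids,
    tokenizer=BaseTokenizer(sep=" ", stop_words="english"),  # split on spaces, drop English stop words
    max_vocab=5000,    # cap the vocabulary size at 5,000 terms
    max_doc_freq=0.5,  # ignore terms that occur in more than half of the documents
)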

ratio_split = RatioSplit(
    data=ratings,
    test_size=0.2,
    rating_threshold=4.0,
    exclude_unknowns=True,
    item_text=item_text_modality,
    verbose=VERBOSE,
    seed=SEED,
)

rec_50 = cornac.metrics.Recall(50)

cornac.Experiment(eval_method=ratio_split, models=[ctr, wmf], metrics=[rec_50]).run()

Data from https://static.preferred.ai/cornac/datasets/amazon_clothing/rating.zip
will be cached into /root/.cornac/amazon_clothing/rating.txt


Unzipping ...
File cached!
Data from https://static.preferred.ai/cornac/datasets/amazon_clothing/text.zip
will be cached into /root/.cornac/amazon_clothing/text.txt


Unzipping ...
File cached!

TEST:
...
    | Recall@50 | Train (s) | Test (s)
--- + --------- + --------- + --------
CTR |    0.2239 |  136.8923 |   1.3036
WMF |    0.1582 |   20.6115 |   1.3306



The results show that CTR achieves a substantially higher Recall@50 than WMF (0.2239 vs. 0.1582), at the cost of a longer training time (about 137 s vs. 21 s). The gain is most plausibly due to the contribution of the items' textual information.
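
As a reminder of what Recall@k measures, the sketch below computes it for a single user as the fraction of that user's relevant test items that appear among the top-k recommendations. This is one common definition with a hypothetical helper name; Cornac's own metric may differ in details such as how it averages across users.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


# Made-up ranking for one user; "z" is relevant but never recommended.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}

toy_recall = recall_at_k(ranked, relevant, k=3)
print(round(toy_recall, 4))
```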

## 4. Image Modality

In some contexts, item images are informative (e.g., fashion). With effective methods for learning image representations now available, using item images in recommender systems is gaining popularity. In this tutorial, we present VBPR [3], a recommendation model that makes use of item image features extracted from a pre-trained Convolutional Neural Network (CNN).

### Visual Bayesian Personalized Ranking (VBPR)

VBPR, which is also based on matrix factorization, is an extension of the BPR model. The novelty of VBPR lies in how item visual features are incorporated into the matrix factorization framework. The preference score that user $i$ gives to item $j$ is predicted as follows:

$$
\hat{r}_{i,j} = \alpha + b_i + b_j + \mathbf{u}_i^T \mathbf{v}_j + \mathbf{p}_{i}^T(\mathbf{E} \times \mathbf{f}_j) + \mathbf{\Theta}^T \mathbf{f}_j
$$

where:
- $\alpha, b_i, b_j$ are global bias, user bias, and item bias, respectively
- $\mathbf{u}_i \in \mathbb{R}^K$ and $\mathbf{v}_j \in \mathbb{R}^K$ are user and item latent vectors, respectively
- $\mathbf{f}_j \in \mathbb{R}^D$ is the item image feature vector
- $\mathbf{p}_i \in \mathbb{R}^Q$ is the user visual preference vector, and $(\mathbf{E} \times \mathbf{f}_j) \in \mathbb{R}^Q$ is the item visual representation, where $\mathbf{E} \in \mathbb{R}^{Q \times D}$ is the projection from the visual feature space into the preference space
- $\mathbf{\Theta} \in \mathbb{R}^D$ is the global visual bias vector
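
The scoring function can be sketched term by term as follows. All names and numbers are made up for illustration (here $K = Q = 2$ and $D = 3$), and the code is not taken from Cornac's VBPR implementation.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(E, f):
    """Matrix-vector product; E is given as a list of rows."""
    return [dot(row, f) for row in E]


def vbpr_score(alpha, b_i, b_j, u_i, v_j, p_i, E, theta, f_j):
    """alpha + b_i + b_j + u_i^T v_j + p_i^T (E f_j) + Theta^T f_j."""
    item_visual = matvec(E, f_j)  # project the D-dim feature into Q dims
    return (alpha + b_i + b_j
            + dot(u_i, v_j)          # latent (non-visual) preference
            + dot(p_i, item_visual)  # user-specific visual preference
            + dot(theta, f_j))       # global visual bias


# Toy values (made up): K = Q = 2, D = 3.
u_i, v_j = [1.0, 2.0], [0.5, 0.5]
p_i = [0.5, 0.25]
E = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]     # Q x D projection
theta = [0.1, 0.2, 0.3]
f_j = [1.0, 0.0, 2.0]     # e.g., a (tiny) CNN feature vector

score = vbpr_score(0.0, 0.0, 0.3, u_i, v_j, p_i, E, theta, f_j)
print(score)
```

Since $\mathbf{f}_j$ is available for every item with an image, the two visual terms can still contribute a meaningful score for items with very few interactions.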

The parameters of the VBPR model can be learned, similarly to BPR, by minimizing the following negative log-likelihood:

$$ \mathcal{L}(\mathbf{U,V,b,E,\Theta, P}|\lambda) = \sum_{(j >_i l) \in \mathbf{S}} \ln (1 + \exp\{-(\hat{r}_{i,j} - \hat{r}_{i,l})\}) + \frac{\lambda}{2} \sum_{i=1}^{N} (||\mathbf{u}_i||^2 + ||\mathbf{p}_i||^2) + \frac{\lambda}{2} \sum_{j=1}^{M} (b_j^2 + ||\mathbf{v}_j||^2) + \frac{\lambda}{2} ||\mathbf{\Theta}||^2 + \frac{\lambda}{2} ||\mathbf{E}||^2_F $$

Note that the global bias $\alpha$ and the user bias $b_i$ do not affect the ranking of items for a given user, so they are redundant and are removed from the model parameters.
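
The redundancy is easy to check numerically: the pairwise term depends only on the difference $\hat{r}_{i,j} - \hat{r}_{i,l}$, so adding the same $\alpha + b_i$ to both scores cancels out. A minimal sketch with made-up scores and a hypothetical helper name:

```python
import math


def pairwise_loss(score_pos, score_neg):
    """ln(1 + exp(-(r_ij - r_il))) for one (user, item j, item l) triple."""
    return math.log1p(math.exp(-(score_pos - score_neg)))


base = pairwise_loss(2.0, 0.5)

# Add the same user-level offset (alpha + b_i = 0.7) to both items.
shifted = pairwise_loss(2.0 + 0.7, 0.5 + 0.7)

print(base, shifted)
```

The two values agree up to floating-point rounding, whereas the item bias $b_j$ differs between the two items of a pair and therefore does matter.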

Let's compare the VBPR and BPR models in an experiment on the Amazon Clothing dataset.


In [5]:
K = 10
vbpr = VBPR(k=K, k2=K, n_epochs=50, batch_size=100, learning_rate=0.001,
            lambda_w=1, lambda_b=0.01, lambda_e=0.0, use_gpu=True, verbose=VERBOSE, seed=SEED)
bpr = BPR(k=(K * 2), max_iter=50, learning_rate=0.001, lambda_reg=0.001, verbose=VERBOSE, seed=SEED)

ratings = amazon_clothing.load_feedback()
img_features, item_ids = amazon_clothing.load_visual_feature()

item_image_modality = ImageModality(features=img_features, ids=item_ids, normalized=True)

ratio_split = RatioSplit(
    data=ratings,
    test_size=0.2,
    rating_threshold=4.0,
    exclude_unknowns=True,
    item_image=item_image_modality,
    verbose=VERBOSE,
    seed=SEED,
)

auc = cornac.metrics.AUC()

cornac.Experiment(eval_method=ratio_split, models=[vbpr, bpr], metrics=[auc]).run()

Data from https://static.preferred.ai/cornac/datasets/amazon_clothing/image.zip
will be cached into /root/.cornac/amazon_clothing/image_features.npy


Unzipping ...
File cached!
Data from https://static.preferred.ai/cornac/datasets/amazon_clothing/item_ids.zip
will be cached into /root/.cornac/amazon_clothing/item_ids.txt


Unzipping ...
File cached!
Optimization finished!

TEST:
...
     |    AUC | Train (s) | Test (s)
---- + ------ + --------- + --------
VBPR | 0.7053 |   54.9758 |   1.5157
BPR  | 0.5992 |    0.0911 |   1.3775



The results show that VBPR obtains a higher AUC than BPR (0.7053 vs. 0.5992), which can be attributed to the use of item visual features. VBPR also takes considerably longer to train (about 55 s vs. 0.1 s).
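
To help interpret these numbers, AUC can be read as the probability that a randomly chosen positive item is scored above a randomly chosen negative one, so 0.5 corresponds to a random ranking and 1.0 to a perfect one. The sketch below computes it by brute force for one user with made-up scores; the helper name is hypothetical, and Cornac's implementation may differ in details.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Made-up scores for one user's positive and negative test items.
toy_auc = pairwise_auc([0.9, 0.4], [0.5, 0.3, 0.1])
print(round(toy_auc, 4))
```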

## References

1.   Ma, H., Yang, H., Lyu, M. R., & King, I. (2008, October). SoRec: Social recommendation using probabilistic matrix factorization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 931-940).
2.   Wang, C., & Blei, D. M. (2011, August). Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 448-456).
3.   He, R., & McAuley, J. (2016, February). VBPR: Visual Bayesian personalized ranking from implicit feedback. In Thirtieth AAAI Conference on Artificial Intelligence.
4.   Salah, A., Truong, Q. T., & Lauw, H. W. (2020). Cornac: A comparative framework for multimodal recommender systems. Journal of Machine Learning Research, 21(95), 1-5. https://cornac.preferred.ai
