# Learning Robust Recommender from Noisy Implicit Feedback

## Executive summary

| | |
| --- | --- |
| Problem | Implicit feedback is inevitably noisy, yet little work on recommendation has taken this noise into account. |
| Problem statement | We formulate denoising recommender training as $\Theta^* = \text{argmin}_\Theta \mathcal{L}(\textit{denoise}(\bar{D}))$, which aims to learn a reliable recommender model with parameters $\Theta^*$ by denoising the implicit feedback $\bar{D}$. Formally, assuming that the observed feedback $\bar{y}_{ui}$ can be inconsistent with the true preference $y_{ui}^*$, we define noisy interactions (a.k.a. false-positive interactions) as $\{(u, i) \mid y_{ui}^* = 0 \wedge \bar{y}_{ui} = 1\}$. |
| Solution | Adaptive Denoising Training (ADT), which adaptively prunes noisy interactions through two paradigms: Truncated Loss and Reweighted Loss. We further treat extra feedback (e.g., ratings) as an auxiliary signal and employ three strategies to incorporate it into ADT: fine-tuning, warm-up training, and colliding inference. |
| Dataset | Adressa, Amazon-books, Yelp2018. |
| Preprocessing | We split each dataset into training, validation, and testing sets, and explore two experimental settings: 1) Extra feedback is unavailable during training. To evaluate denoising performance, we keep all interactions, including the false-positive ones, in training and validation, and test the models only on true-positive interactions. 2) Sparse extra feedback is available during training. We assume that a subset of the true-positive interactions is already known, which is used to evaluate the three proposed strategies: fine-tuning, warm-up training, and colliding inference. |
| Metrics | Recall, NDCG |
| Hyperparams | For GMF and NeuMF, the factor numbers of users and items are both 32. For CDAE, the hidden size of the MLP is set to 200. Adam is used to optimize all parameters, with the learning rate initialized to 0.001 and the batch size set to 1,024. The ADT strategies have three hyper-parameters in total: $\alpha$ and $\epsilon_{max}$ in the T-CE loss, and $\beta$ in the R-CE loss. $\epsilon_{max}$ is searched in {0.05, 0.1, 0.2} and $\beta$ is tuned in {0.05, 0.1, ..., 0.25, 0.5, 1.0}. The range of $\alpha$ is controlled by adjusting the number of iterations $N$ needed to reach the maximum drop rate $\epsilon_{max}$, with $N$ adjusted in {1k, 5k, 10k, 20k, 30k}. In colliding inference, the number of neighbors $N_u$ is tuned in {1, 3, 5, 10, 20, 50, 100}, $w_j$ is set to $1/\lvert N_u \rvert$, and $\lambda$ is searched in {0, 0.1, 0.2, ..., 1}. We used the validation set to tune the hyper-parameters and report performance on the testing set. |
| Models | GMF, NeuMF, CDAE, {GMF, NeuMF, CDAE}+T-CE, {GMF, NeuMF, CDAE}+R-CE |
| Cluster | Python 3.6+, PyTorch |
| Tags | `LossReweighting`, `TruncatedLoss`, `MatrixFactorization`, `Denoising` |
| Credits | Wenjie Wang |
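
For reference, the two metrics listed in the table can be sketched for a single user's ranked list as follows. This is a minimal illustration using the common binary-relevance definitions of Recall@K and NDCG@K; the function names `recall_at_k` and `ndcg_at_k` are hypothetical, and the evaluation code in the referenced repositories may differ in details such as normalization.

```python
import math

def recall_at_k(ranked_items, relevant, k):
    """Fraction of the user's relevant (true-positive) items found in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k list divided by the ideal DCG."""
    # Position `rank` is 0-based, so the discount is log2(rank + 2).
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (made-up item ids): only item 1 is hit within the top 3.
ranked = [3, 1, 7, 5]
true_positives = {1, 5, 9}
print(recall_at_k(ranked, true_positives, k=3))
print(ndcg_at_k(ranked, true_positives, k=3))
```

In the first experimental setting, `relevant` would hold only the true-positive interactions of the test set, since false positives are excluded from testing.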

## Methods

![The comparison between normal training (a); two prior solutions (b) and (c); and the proposed denoising training without additional data (d) and with extra feedback (e). Note that red lines in the user-item graph denote false-positive interactions, and extra feedback usually cannot identify all false-positive ones due to the sparsity issue.](https://github.com/RecoHut-Stanzas/S063707/raw/main/images/img1.png)

The comparison between normal training (a); two prior solutions (b) and (c); and the proposed denoising training without additional data (d) and with extra feedback (e). Note that red lines in the user-item graph denote false-positive interactions, and extra feedback usually cannot identify all false-positive ones due to the sparsity issue.

## Model

ADT either discards or reweights interactions with large loss values to reduce their influence on recommender training. To this end, we devise two paradigms for formulating denoising loss functions without using extra feedback:

- Truncated Loss. It truncates the loss values of hard interactions to 0 using a dynamic threshold function.
- Reweighted Loss. It adaptively assigns smaller weights to hard interactions during training.

These two paradigms can be applied to various recommendation loss functions, e.g., CE loss, square loss, and BPR loss. We take CE loss as an example to elaborate on them.

### Truncated Cross-Entropy Loss

Functionally, the Truncated Cross-Entropy (T-CE) loss discards positive interactions whose CE loss exceeds a threshold $\tau$. Formally, we define it as:

$$\mathcal{L}_{T-CE}(u,i) = \begin{cases} 0, & \mathcal{L}_{CE}(u,i)>\tau \wedge\bar{y}_{ui}=1 \\ \mathcal{L}_{CE}(u,i), & \text{otherwise}, \end{cases}$$
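
The case distinction above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the function name `truncated_ce` is hypothetical, predictions are assumed to be probabilities in (0, 1), and $\tau$ is passed in as a fixed number rather than produced by a dynamic threshold function.

```python
import torch
import torch.nn.functional as F

def truncated_ce(pred, label, tau):
    """Batch T-CE sketch: zero the CE loss of positives whose loss exceeds tau.

    pred  : predicted probabilities in (0, 1), shape (batch,)
    label : observed implicit feedback (1.0 or 0.0), shape (batch,)
    tau   : loss threshold (assumed fixed here for simplicity)
    """
    ce = F.binary_cross_entropy(pred, label, reduction="none")
    # Only observed positives with a large loss are treated as likely noise.
    discard = (ce > tau) & (label == 1)
    truncated = torch.where(discard, torch.zeros_like(ce), ce)
    # Assumption: average over the whole batch, discarded samples count as 0.
    return truncated.mean()

# Toy batch: the second sample is a positive that the model scores very low.
pred = torch.tensor([0.9, 0.1, 0.8, 0.2])
label = torch.tensor([1.0, 1.0, 0.0, 0.0])
loss = truncated_ce(pred, label, tau=1.0)
print(loss.item())
```

Note that the third sample also has a loss above `tau`, but it is kept because it is a negative; the truncation only applies when $\bar{y}_{ui}=1$.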

### Re-weighted Cross-Entropy Loss

The Re-weighted Cross-Entropy (R-CE) loss down-weights hard positive interactions, i.e., those with large loss values. Formally,

$$\mathcal{L}_{R-CE}(u,i) = \omega(u,i)\mathcal{L}_{CE}(u,i)$$

where $\omega(u, i)$ is a weight function that adjusts the contribution of an interaction to the training objective.
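
The text above does not fix a particular form for $\omega(u, i)$. The PyTorch sketch below assumes one plausible choice controlled by the hyper-parameter $\beta$: the weight is $\hat{y}_{ui}^{\beta}$ for observed positives and $(1-\hat{y}_{ui})^{\beta}$ for negatives, where $\hat{y}_{ui}$ is the predicted probability. The function name `reweighted_ce` and this weight form are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def reweighted_ce(pred, label, beta):
    """Batch R-CE sketch: weight each CE term by the model's confidence^beta.

    pred  : predicted probabilities in (0, 1), shape (batch,)
    label : observed implicit feedback (1.0 or 0.0), shape (batch,)
    beta  : exponent controlling how strongly hard samples are down-weighted
    """
    ce = F.binary_cross_entropy(pred, label, reduction="none")
    # Assumed weight: confidence in the observed label, treated as a constant
    # (detached) so that gradients flow only through the CE term.
    confidence = torch.where(label == 1, pred, 1.0 - pred).detach()
    weight = confidence.pow(beta)
    return (weight * ce).mean()

# Toy batch: the second sample is a hard positive (low predicted score).
pred = torch.tensor([0.9, 0.1, 0.8, 0.2])
label = torch.tensor([1.0, 1.0, 0.0, 0.0])
print(reweighted_ce(pred, label, beta=1.0).item())
```

With `beta=0` every weight is 1 and the sketch reduces to the plain CE loss; larger values of `beta` shrink the contribution of interactions that the model finds hard to fit.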

### ADT with Extra Feedback

Although extra feedback (e.g., ratings) is usually sparse, it reliably reflects actual user satisfaction, i.e., it indicates the true-positive interactions. We therefore use extra feedback in ADT when it is available. Based on the order of training with implicit feedback and extra feedback, we introduce two training strategies: fine-tuning and warm-up training.

![Illustration of fine-tuning and warm-up training with extra feedback.](https://github.com/RecoHut-Stanzas/S063707/raw/main/images/img2.png)

Illustration of fine-tuning and warm-up training with extra feedback.
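
The difference between the two strategies is the order of the training phases, which can be sketched as below. The ordering is an assumption inferred from the strategy names: fine-tuning is taken to mean ADT on implicit feedback first and the reliable extra feedback second, and warm-up training the reverse. The function `run_strategy` and the two phase callables are hypothetical placeholders, not part of any referenced code.

```python
def run_strategy(strategy, train_adt_on_implicit, train_on_extra):
    """Run the two training phases in the order assumed for each strategy."""
    if strategy == "fine-tuning":
        # Assumed order: denoising training first, then refine on extra feedback.
        phases = [train_adt_on_implicit, train_on_extra]
    elif strategy == "warm-up":
        # Assumed order: start from the reliable extra feedback, then run ADT.
        phases = [train_on_extra, train_adt_on_implicit]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for phase in phases:
        phase()

# Placeholder phases that only record the order in which they are called.
log = []
run_strategy("warm-up",
             train_adt_on_implicit=lambda: log.append("adt_implicit"),
             train_on_extra=lambda: log.append("extra_feedback"))
print(log)
```

In a real pipeline each callable would wrap a training loop over the corresponding data with a shared model, so that the second phase continues from the parameters learned in the first.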

## Tutorials

### Training GMF with Truncated and Reweighted Denoising Losses on Yelp Dataset

[direct link to notebook →](https://github.com/RecoHut-Stanzas/S063707/blob/main/nbs/P254192_Training_GMF_with_Truncated_and_Reweighted_Denoising_Losses_on_Yelp_Dataset.ipynb)

![Process flow](https://github.com/RecoHut-Stanzas/S063707/raw/main/images/process_flow.svg)

## References

1. [https://github.com/RecoHut-Stanzas/S063707](https://github.com/RecoHut-Stanzas/S063707)
2. [https://arxiv.org/pdf/2112.01160v1.pdf](https://arxiv.org/pdf/2112.01160v1.pdf)
3. [https://github.com/WenjieWWJ/DenoisingRec](https://github.com/WenjieWWJ/DenoisingRec)