# Recipe Recommendation System
### Could it recommend a subtitle too?

## Part 1 - Research

### 0. TensorFlow Recommenders
https://www.tensorflow.org/recommenders/examples/quickstart

### 1. Papers with code

https://paperswithcode.com/task/recommendation-systems

The Recommendation Systems task is to produce a list of recommendations for a user. The most common methods used in recommender systems are factor models (Koren et al., 2009; Weimer et al., 2007; Hidasi & Tikk, 2012) and neighborhood methods (Sarwar et al., 2001; Koren, 2008). Factor models work by decomposing the sparse user-item interaction matrix into a set of d-dimensional vectors, one for each item and user in the dataset. Factor models are hard to apply in session-based recommendations due to the absence of a user profile. Neighborhood methods, on the other hand, rely on computing similarities between items (or users) and are based on co-occurrences of items in sessions (or user profiles). Neighborhood methods have been used extensively in session-based recommendations.
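
To make the factor-model idea concrete, here is a minimal sketch in plain Python. It assumes explicit ratings, plain SGD, and L2 regularization; the toy data, the hyperparameters, and the names `factorize` and `predict` are illustrative assumptions, not taken from the papers cited above.

```python
import random

def factorize(ratings, n_users, n_items, d=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Fit d-dimensional user and item vectors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(users[u], items[i]))
            err = r - pred
            for f in range(d):
                pu, qi = users[u][f], items[i][f]
                # Gradient step on the squared error with L2 regularization.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items

def predict(users, items, u, i):
    """Predicted rating is the inner product of the user and item vectors."""
    return sum(pu * qi for pu, qi in zip(users[u], items[i]))

# Toy data: 3 users, 3 recipes, only 6 of the 9 pairs observed (a sparse matrix).
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 5.0), (2, 2, 4.0)]
users, items = factorize(ratings, n_users=3, n_items=3)
train_error = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings) / len(ratings)
unseen_score = predict(users, items, 0, 2)  # a pair that was never observed
```

The unobserved pair gets a score purely from the learned vectors, which is why this approach needs a persistent user profile and struggles in session-based settings.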


+ datasets, benchmarks, models, top implemented papers

### 2. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)

[papers with code link](https://paperswithcode.com/paper/recommendation-as-language-processing-rlp-a)


[arXiv link](https://arxiv.org/pdf/2203.13366v7.pdf)

###### ABSTRACT

- In P5, all data such as user-item interactions, user descriptions, item metadata, and user reviews are converted to a common format — natural language sequences

- learns different tasks with the same language modeling objective during pretraining. Thus, it serves as the foundation model for various downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation based on prompts

- make predictions in a zero-shot or few-shot manner

- also hosted on Hugging Face at https://huggingface.co/makitanikaze/P5

###### 2.1 INTRODUCTION

In retrospect, we can summarize the development trend of modern recommender systems – towards a more
comprehensive system that accommodates diverse features and a
wide spectrum of application scenarios.
On one hand, feature engineering and learning in recommender
systems has evolved greatly from simple to complex. In early ages,
recommender systems typically adopt __logistic regression or collaborative filtering__ [25, 35, 50, 52] which utilize user-item interaction
records to model users’ behavioral patterns. Later on, the contextual
features such as user profile and item metadata are further integrated into the system through more sophisticated models such as
__factorization machines__ [48] and __GBDT__ [20]. Recently, deep neural
network models [3, 5, 19, 74] facilitate crossing and combination
among even more diverse and sophisticated features. As a result,
these models gain better representation ability compared with traditional feature engineering based approaches.
On the other hand, more recommendation tasks have emerged.
Except for classical rating prediction and direct user-item matching-based recommendation tasks, recent works are broadening the spectrum to new tasks and scenarios such as __sequential recommendation__
[21, 60, 63, 80], __conversational recommendation__ [8, 61, 76], __explainable recommendation__ [17, 31, 62, 70, 75, 77] and so on. While the
approaches to the aforementioned recommendation tasks are often
proposed separately, there is an evident trend of utilizing multiple
recommendation tasks to jointly learn the transferable representations [31, 56, 57, 72]. Although existing recommender systems
achieved great success, there is still a considerable gap between
current solutions and the foreseeable intersection of the aforementioned trends – a comprehensive recommender system that can
accommodate diverse features and different types of tasks. Since
recommendation tasks usually share a common user–item pool and
have overlapping contextual features, we believe it is promising to
merge even more recommendation tasks into a unified framework
so that they can implicitly transfer knowledge to benefit each other
and enable generalization to other unseen tasks.
Inspired by the recent progress in __multitask prompt-based training__ [1, 51, 67], in this work, we propose a unified “Pretrain, Personalized Prompt & Predict Paradigm” (denoted as P5). We show that P5
is possible to learn multiple recommendation related tasks together
through a unified sequence-to-sequence framework by formulating these problems as prompt-based natural language tasks, where
user–item information and corresponding features are integrated
with personalized prompt templates as model inputs.
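
The idea of turning user-item records into prompt-based text pairs can be sketched as below. The two templates and the helper `to_text_pair` are hypothetical illustrations; the P5 paper defines its own prompt collection, which is not reproduced here.

```python
# Hypothetical templates: (input text, target text) per task family.
TEMPLATES = {
    "rating": (
        "What star rating do you think user_{user_id} will give item_{item_id}?",
        "{rating}",
    ),
    "sequential": (
        "user_{user_id} has interacted with items {history}. What should be recommended next?",
        "{next_item}",
    ),
}

def to_text_pair(task, **fields):
    """Fill one template and return (input_text, target_text) for a seq2seq model."""
    source, target = TEMPLATES[task]
    return source.format(**fields), target.format(**fields)

rating_example = to_text_pair("rating", user_id=23, item_id=7391, rating=5)
seq_example = to_text_pair("sequential", user_id=23, history="7391, 1012, 88", next_item=4410)
```

Because every task ends up as a pair of strings, one sequence-to-sequence model with one language modeling loss can be trained on all of them at once.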

##### 2.2 RELATED WORK

- Unified Frameworks
- Prompt Learning
- NLP for Recommendation
- Zero-shot and Cold Start Recommendation

##### Dataset

From the Amazon dataset (Sports, Beauty, Toys) and Yelp

##### Into the rabbit hole

+ [48] Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference
on data mining. IEEE, 995–1000. 
[factorization machines abstract](https://ieeexplore.ieee.org/document/5694074)   
_Punchline_ - __Factorization machines__ are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real-valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail.
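
A small sketch of the second-order FM prediction may help here. It computes the bias, the linear terms, and the factorized pairwise interactions, first directly and then with the linear-time reformulation. The feature vector and parameter values are made up for illustration.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM: bias + linear terms + sum over i<j of <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """Same value in O(k * n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y

# Illustrative input: one-hot user (2 slots), one-hot item (1 slot), one context feature.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3]
V = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1], [-0.2, 0.5]]  # one k=2 factor vector per feature

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
```

Each pairwise weight is an inner product of two factor vectors rather than a free parameter, so an interaction can be estimated even if that exact feature pair never co-occurs in the training data.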

+ [20] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine
Atallah, Ralf Herbrich, Stuart Bowers, et al. 2014. Practical lessons from predicting
clicks on ads at facebook. In Proceedings of the Eighth International Workshop on
Data Mining for Online Advertising. 1–9.                            
_Punchline_ - __GBDT (gradient-boosted decision trees)__: a model that combines decision
trees with logistic regression to predict clicks on ads at Facebook

+ [3] Hanxiong Chen, Shaoyun Shi, Yunqi Li, and Yongfeng Zhang. 2021. Neural
collaborative reasoning. In Proceedings of the Web Conference 2021. 1516–1527.



### 3. Neural Collaborative Reasoning

Hanxiong Chen, Shaoyun Shi, Yunqi Li, and Yongfeng Zhang. 2021. Neural
collaborative reasoning. In Proceedings of the Web Conference 2021. 1516–1527.

[arXiv link](https://arxiv.org/pdf/2005.08129.pdf)



#### Abstract
Existing Collaborative Filtering (CF) methods are mostly designed
based on the idea of matching, i.e., by learning user and item embeddings from data using shallow or deep models, they try to capture
the associative relevance patterns in data, so that a user embedding
can be matched with relevant item embeddings using designed or
learned similarity functions. However, as a cognition rather than a
perception intelligent task, recommendation requires not only the
ability of pattern recognition and matching from data, but also the
ability of cognitive reasoning in data.

To solve the problem, we propose
a modularized reasoning architecture, which learns logical operations such as AND (∧), OR (∨) and NOT (¬) as neural modules for
implication reasoning (→).



#### 3.1 INTRODUCTION

Collaborative Filtering (CF) is an important approach to recommender systems [12, 45]. By leveraging the wisdom of crowd, CF methods predict a user’s future preferences based on his or her
previous records. Many existing CF methods are designed based on
the fundamental idea of similarity matching, with either designed
or learned matching functions, as illustrated in Figure 1(a). For
example, early CF algorithms, such as User-based CF [44] and Item-based CF [49], consider the row and column vectors in the original
user-item rating matrix as the user and item representations (i.e.,
embedding), and a manually designed weighted average function is
used as the matching function 𝑓 (·) to calculate the relevance score
between each user 𝑢 and a candidate item 𝑣. The advance of machine
learning has further extended CF methods for improved accuracy.
One prominent example is Matrix Factorization (MF) techniques
for CF [30], which takes inner product as the matching function
𝑓 (·), and learns the user and item embeddings in the inner product
space to fit ground-truth user-item interactions.
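
As a rough illustration of the matching view described above, the sketch below treats item columns of the rating matrix as item representations and uses a similarity-weighted average as the matching function. Cosine similarity, the convention that 0 means "not rated", and the toy matrix are assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def column(R, j):
    """Item j's representation: its column in the user-item rating matrix."""
    return [row[j] for row in R]

def item_based_score(R, u, v):
    """Predict user u's rating of item v as a similarity-weighted average of u's known ratings."""
    num = den = 0.0
    for j, r in enumerate(R[u]):
        if j == v or r == 0:  # skip the target item and unrated items
            continue
        sim = cosine(column(R, v), column(R, j))
        num += sim * r
        den += abs(sim)
    return num / den if den else 0.0

# Rows are users, columns are items; 0 marks a missing rating.
R = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
]
score = item_based_score(R, u=3, v=0)
```

User 3 rated item 1 low and item 2 high. Item 0 is far more similar to item 1 than to item 2, so the predicted score is pulled toward the low rating.
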
Researchers have further explored CF under the similarity matching framework. One approach is to learn better embeddings. For
example, context-aware CF integrates context information such as
time and location to learn informative embeddings [1, 25, 29], and
heterogeneous information sources can be used to enrich the embeddings [54], such as text [58], image [19], and knowledge graphs
[2, 52]. We can also explicitly consider a user’s behavior history
to learn better embeddings (Figure 1(b)), such as in sequential recommendation [6, 21, 24, 32]. Another approach is to learn better
matching functions. For example, using vector translation instead
of inner product for matching [18], or learning the matching function based on metric learning [22] and neural networks [7, 20, 51].
However, whether complex neural matching functions are better
than simple matching functions is controversial [8, 9, 14, 43].
Similarity matching-based CF methods have been adopted in
many real-world recommender systems. However, as a cognition
rather than a perception task, recommendation requires not only
the ability of pattern learning and matching, but also the ability of
cognitive reasoning, because a user’s future behavior may not be
simply driven by its similarity with the user’s previous behaviors,
but instead by the user’s cognitive reasoning procedure about what
to do next. For example, if a user has purchased a laptop before, this
does not lead to the user purchasing similar laptops in the future,
rather, one would expect the user to purchase further equipment
such as a laptop bag. Such a reasoning procedure may exhibit certain
logical structures, such as (𝑎 ∨ 𝑏) ∧ ¬𝑐 → 𝑣, as shown in Figure 1(c),
which means that if the user likes 𝑎 or 𝑏, and does not like 𝑐, then
he/she would probably like 𝑣. In a broader sense, the community
has realized the importance of advancing AI from perception to
cognition tasks [4, 34, 50]. As a representative cognitive reasoning
task, we hope an intelligent recommendation system would be able
to conduct logical reasoning over the data to predict user’s future
behaviors for personalized recommendation.
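
The shape of a rule like (𝑎 ∨ 𝑏) ∧ ¬𝑐 → 𝑣 can be shown with plain Boolean evaluation. This is only an illustration of the logical structure: the paper learns AND, OR, and NOT as neural modules over embeddings, which this sketch does not attempt. The rules and item names are hypothetical.

```python
def premise_holds(likes, a, b, c):
    """Evaluate (a OR b) AND NOT c, where each variable means 'the user likes this item'."""
    return (a in likes or b in likes) and c not in likes

def fire_rules(likes, rules):
    """Return each consequent v whose premise holds and that the user does not already like."""
    return [v for a, b, c, v in rules if premise_holds(likes, a, b, c) and v not in likes]

# Each rule is (a, b, c, v), read as: (a OR b) AND NOT c -> v.
rules = [
    ("laptop", "tablet", "desktop_pc", "laptop_bag"),
    ("tent", "sleeping_bag", "hotel_voucher", "camping_stove"),
]
likes = {"laptop", "hotel_voucher", "tent"}
suggestions = fire_rules(likes, rules)
```

The first rule fires because the user likes a laptop and not a desktop PC. The second does not, because the negated item is in the user's liked set.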


![image.png](attachment:image.png)

#### 3.2 RELATED WORK

Collaborative Filtering (CF) has been an important approach to
recommender systems. Due to its long-time research history and the wide scope of literature, it is hardly possible to cover all CF algorithms, so we review some representative methods in this section,
and a more comprehensive review can be seen in [12, 53, 55].
Early approaches to CF consider the user-item rating matrix and
conduct rating prediction with user-based [27, 44] or item-based
[33, 49] collaborative filtering methods. With the development of
dimension reduction methods, latent factor models such as matrix
factorization are later widely adopted in recommender systems,
such as singular value decomposition [30], non-negative matrix
factorization [31], and probabilistic matrix factorization [38]. In
these approaches, each user and item is learned as a latent vector
to calculate the matching score of the user-item pairs.
Recently, the development of deep learning and neural network
models has further extended collaborative filtering methods for
recommendation. The relevant methods can be broadly classified
into two sub-categories: similarity learning approach, and representation learning approach. The similarity learning approach adopts
simple user/item representations (such as one-hot) and learns a complex matching function (such as a prediction network) to calculate
user-item matching scores [7, 18, 20, 22, 51], while the representation learning approach learns rich user/item representations and
adopts a simple matching function (e.g., inner product) for efficient
matching score calculation [2, 35, 52, 54, 58]. However, there exist
debates over whether complex matching functions are better than
simple functions [8, 9, 14, 43]. Another important direction is learning to rank for recommendation, which learns the relative ordering
of items instead of the absolute preference scores. A representative
method is Bayesian personalized ranking (BPR) [42], which is a
pair-wise learning to rank method. It is also further generalized to
take other information sources such as images [19].
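
To illustrate the pairwise idea behind BPR, the sketch below computes the per-triple loss -ln σ(x_ui - x_uj) for a user u, an observed item i, and an unobserved item j. Regularization and the sampling of triples are left out, and the function name `bpr_loss` is an assumption for this example.

```python
import math

def bpr_loss(score_pos, score_neg):
    """Pairwise loss for one (user, observed item, unobserved item) triple: -ln(sigmoid(diff))."""
    diff = score_pos - score_neg
    # -ln(sigmoid(diff)) equals ln(1 + exp(-diff)); branch to avoid overflow in exp().
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

loss_correct_order = bpr_loss(3.0, 0.0)  # observed item already scored higher
loss_wrong_order = bpr_loss(0.0, 3.0)    # observed item scored lower
loss_tie = bpr_loss(1.0, 1.0)
```

Only the difference between the two scores matters, which is what makes this a ranking objective rather than a rating prediction objective.
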
Although many CF approaches have been developed for recommendation tasks, existing methods mostly model recommendation
as a perception task based on similarity matching instead of a cognition task based on cognitive/logical reasoning. However, users’
future behaviors may not be simply driven by the similarity with
their previous behavior, but a concrete reasoning procedure about
what to do next. Integrating logical reasoning and neural networks
has been considered in several research contexts. According to [5],
connectionism in AI can date back to 1943 [36], which is arguably
the first neural-symbolic system for Boolean logic. More recently,
it is shown that argumentation frameworks, abductive reasoning,
and normative multi-agent systems can also be represented by
neural symbolic frameworks [5, 10, 11, 15, 23]. Another approach
to integrating machine learning and logical reasoning is Markov
logic networks [41, 46, 56], which combines probabilistic graphical
models with first-order logic. It leverages domain knowledge and
logic rules to learn graph structure for inference, which is effective
for reasoning on knowledge graphs [41].
The most related work to ours is neural logic reasoning [50],
which adopts neural logic modules for solving logical equations
and (non-personalized) recommendation. However, our work is
different on three aspects: we build neural models for logical reasoning based on the implication form of Horn clauses, which is a
more natural way of making logical predictions in recommendation
tasks; we develop a personalized recommendation model while the
model in [50] can only conduct non-personalized recommendation;

### 4. RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System?
ALI PESARANGHADER∗ and TOUQIR SAJED, LG Electronics, Toronto AI Lab, Canada

from DS

[arXiv link](https://arxiv.org/pdf/2308.04579.pdf)

Over the past two decades, recommendation systems (RSs) have used machine learning (ML) solutions to recommend items, e.g.,
movies, books, and restaurants, to clients of a business or an online platform. Recipe recommendation, however, has not yet received
much attention compared to those applications. We introduce RECipe as a multi-purpose recipe recommendation framework with
a multi-modal knowledge graph (MMKG) backbone. The motivation behind RECipe is to go beyond (deep) neural collaborative
filtering (NCF) by recommending recipes to users when they query in natural language or by providing an image. RECipe consists of 3
subsystems: (1) behavior-based recommender, (2) review-based recommender, and (3) image-based recommender. Each subsystem
relies on the embedding representations of entities and relations in the graph. We first obtain (pre-trained) embedding representations
of textual entities, such as reviews or ingredients, from a fine-tuned model of Microsoft’s MPNet. We initialize the weights of the
entities with these embeddings to train our knowledge graph embedding (KGE) model. For the visual component, i.e., recipe images,
we develop a KGE-Guided variational autoencoder (KG-VAE) to learn the distribution of images and their latent representations.
Once KGE and KG-VAE models are fully trained, we use them as a multi-purpose recommendation framework. For benchmarking, we
created two knowledge graphs (KGs) from public datasets on Kaggle for recipe recommendation. Our experiments show that the KGE
models have comparable performance to the neural solutions. We also present pre-trained NLP embeddings to address important
applications such as zero-shot inference for new users (or the cold start problem) and conditional recommendation with respect to recipe
categories. We eventually demonstrate the application of RECipe in a multi-purpose recommendation setting.

#### 4.1. Introduction

Online and e-commerce platforms, e.g., Netflix, Amazon and Tripadvisor, have benefited from recommendation systems
(RSs) to recommend items, e.g., movies, books, clothes, or restaurants, to their clients for the past two decades, if not
longer. Recipe recommendation, however, has not yet received much attention compared to those domains by the
research community, despite the fact that there are uncountable resources for various foods, recipes, and cuisines on
social media such as Instagram, Pinterest, and other platforms like Food.com and Allrecipes.com. Such abundance
is implicitly indicative of the interest in as well as the need for recipe recommendation systems. Predictably, food
recommendation systems will eventually become an inseparable part of our daily life. In other words, RSs will help us
search for recipes, walk us through their instructions, and prepare a meal while they consider our personal preferences,
diet and health. They can even potentially assist professional chefs in serving their clients better than ever. Therefore,
the importance of developing recipe recommendation solutions is undebatable.

We focus on three RS settings in this work: (1) behavior-based recommendation, (2) text-based recommendation,
and (3) image-based recommendation. The behavior-based algorithms learn (implicit or explicit) patterns from users’
past behaviours, e.g., viewed, purchased, or liked, to recommend new items. Content-based and collaborative filtering, as
examples of behavior-based solutions, are commonly used in the RS applications [3, 13, 14, 16, 25, 46, 53, 57]. A text-based
recommender retrieves relevant items by matching an input query, or a profile, with the textual information of items,
e.g., news titles or movie descriptions [6, 19, 22, 30, 54]. Review-based solutions, as a subdiscipline of text-based RS,
benefit from valuable information in (textual) reviews for personalization and recommendation [1, 2, 10, 32, 38, 39, 55].
For image-based recommendation, a user may upload an image with the intention of item recognition and retrieval
of similar items, potentially along with their (descriptive) information [5, 26, 27, 31, 47, 56]. Recall that our primary
interest is recipe recommendation; therefore, we explain the foundation of our work for that domain hereafter.
To the best of our knowledge, there has been no solution that handles the aforementioned recommendation settings
holistically in one place, particularly for recommending recipes. Knowledge graphs (KGs) are a potential solution to
connect all these tasks with different modalities (e.g., structured data, texts, and images) for multi-purpose recommendation. We consider an RS as a multi-purpose solution if it accomplishes behavior-based, text-based, and image-based
recommendations together. In this work, we introduce RECipe as a multi-purpose recipe recommendation system that
benefits from KGs with multi-modalities. We summarize our key contributions below.

1. We introduce RECipe as a multi-purpose recipe recommendation solution.
2. We present two knowledge graphs (KGs) for recipe recommendation or retrieval. The KGs can be considered as benchmarks for future research work.
3. We address zero-shot inference for new users (or the cold start problem) by employing and aligning pre-trained NLP embeddings to obtain their initial embedding representations.
4. We introduce conditional recommendation with respect to categories of recipes to improve overall accuracy considering various ranking measures.
5. We present the RECipe applications in behavior-based, review-based, and image-based recommendations.

The remainder of this paper is organized as follows. We study related works to the recipe recommendation or retrieval
tasks in Section 2. We present our RECipe framework and its components in Section 3. In Section 4, we define our
research questions, then conduct extensive experiments for the multi-purpose recommendations. Finally, we conclude
the paper and discuss potential future work in Section 5.
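
The text-based setting from the introduction above (matching a query against the textual information of items) can be sketched as follows. This is a deliberately simplified stand-in: bag-of-words counts replace the MPNet embeddings the paper uses, no knowledge graph is involved, and the recipe catalogue is made up.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase word counts (an assumption, not the paper's MPNet model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(query, recipes, k=2):
    """Rank recipes by similarity between the query and each recipe description."""
    q = embed(query)
    ranked = sorted(recipes, key=lambda name: cosine(q, embed(recipes[name])), reverse=True)
    return ranked[:k]

recipes = {  # hypothetical toy catalogue
    "tomato_soup": "creamy tomato soup with basil",
    "beef_stew": "slow cooked beef stew with carrots",
    "caprese_salad": "fresh tomato mozzarella and basil salad",
}
top = recommend("tomato basil", recipes, k=2)
```

Swapping `embed` for a sentence-embedding model would keep the retrieval logic unchanged, which is roughly the role the pre-trained text embeddings play for new users without interaction history.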