# <font color=red>Content</font>

THEORY

1. [Recommender systems](#1)
    - 1.1 [Introduction](#1.1)
    - 1.2 [Types of classic recommender systems](#1.2)
        - 1.2.1 [Collaborative recommendation systems](#1.2.1)
            - 1.2.1.1 [Memory-based recommender systems](#1.2.1.1)
            - 1.2.1.2 [Model-based recommender systems](#1.2.1.2)
                - 1.2.1.2.1 [Singular Value decomposition (SVD) in recommender systems](#1.2.1.2.1)
        - 1.2.2 [Demographic recommender systems](#1.2.2)
        - 1.2.3 [Content-based recommender systems](#1.2.3)
        - 1.2.4 [Utility-based recommender systems](#1.2.4)
        - 1.2.5 [Knowledge-based recommender systems](#1.2.5)
    - 1.3 [Comparing recommendation techniques](#1.3)
    - 1.4 [Hybrid recommender systems](#1.4)
    - 1.5 [Evaluating recommender systems](#1.5)
    - 1.6 [Recall and Precision at k for Recommender Systems](#1.6)
    
    ********
    
    - 1.7 [Novelty in Recommender System](#1.7)
        

<a id=1.1></a>

****

# 1. Recommender systems

## 1.1 Introduction

Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine. They have become fundamental applications in electronic commerce and information access, providing suggestions that effectively prune large information spaces so that users are directed toward those items that best meet their needs and preferences. A variety of techniques have been proposed for performing recommendation, including content-based, collaborative, knowledge-based and other techniques.

<a id=1.2></a>

## 1.2 Types of classic recommender systems

<a id=1.2.1></a>

### 1.2.1 Collaborative

**Collaborative** recommendation is probably the most familiar, most widely implemented and most mature of the technologies. Collaborative recommender systems aggregate ratings or recommendations of objects, recognize commonalities between users on the basis of their ratings, and generate new recommendations based on inter-user comparisons. A typical user profile in a collaborative system consists of a vector of items and their ratings, continuously augmented as the user interacts with the system over time. Some systems use time-based discounting of ratings to account for drift in user interests (Billsus & Pazzani, 2000; Schwab, et al. 2001).

In some cases, ratings may be binary (like/dislike) or real-valued indicating degree of preference. These systems can be either **memory-based**, comparing users against each other directly using correlation or other measures, or model-based, in which a model is derived from the historical rating data and used to make predictions. **Model-based** recommenders have used a variety of learning techniques including neural networks (Jennings & Higuchi, 1993), latent semantic indexing (Foltz, 1990), and Bayesian networks (Condliff, et al. 1999).

Unlike a **content-based recommendation system**, which requires a feature vector to be defined before training, collaborative filtering (CF) computes the similarities between items (or users) directly from the ratings and then gives recommendations based on them. All we need is the ratings that users gave to products, and we are good to go.

![](../pics/33.png)

* Before considering the next types, let's go deeper into what *memory-based* and *model-based* algorithms are.
    

![](../pics/34.png)


<a id=1.2.1.1></a>

### 1.2.1.1 Memory-Based Collaborative Filtering
> **Memory-Based Collaborative Filtering** approaches can be divided into two main sections: user-item filtering and item-item filtering. User-item filtering takes a particular user, finds users that are similar to that user based on similarity of ratings, and recommends items that those similar users liked. In contrast, item-item filtering takes an item, finds users who liked that item, and finds other items that those users or similar users also liked. It takes items and outputs other items as recommendations.
>
> * Item-Item Collaborative Filtering: “Users who liked this item also liked ...”
> * User-Item Collaborative Filtering: “Users who are similar to you also liked ...”

The key difference between the memory-based approach and the model-based techniques (hang on, they are discussed in the next section) is that we are not learning any parameters using gradient descent (or any other optimization algorithm). The closest users or items are found using only cosine similarity or Pearson correlation coefficients, which are based purely on arithmetic operations.

As stated in the paragraph above, techniques that do not use a parametric machine learning approach are classified as memory-based. Therefore, non-parametric ML approaches such as k-nearest neighbours (KNN) also come under the memory-based approach.

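> A common similarity measure is cosine similarity. The measure can be thought of geometrically if one treats a given user’s (item’s) row (column) of the ratings matrix as a vector. For user-based collaborative filtering, two users’ similarity is measured as the cosine of the angle between the two users’ vectors. For users $u$ and $u'$ with rating vectors $\mathbf{r}_u$ and $\mathbf{r}_{u'}$, the cosine similarity is:
>
> $$sim(u, u') = \cos(\theta) = \frac{\mathbf{r}_u \cdot \mathbf{r}_{u'}}{\lVert \mathbf{r}_u \rVert \, \lVert \mathbf{r}_{u'} \rVert}$$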

As no training or optimization is involved, it is an easy-to-use approach. But its performance decreases when the data is sparse, which hinders the scalability of this approach for most real-world problems.
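To make the memory-based idea concrete, here is a minimal sketch of user-based collaborative filtering with cosine similarity. The ratings matrix is made-up toy data, the helper names `cosine_similarity` and `predict_rating` are hypothetical, and treating a `0` as "not rated" inside the similarity computation is a simplifying assumption, not the only way to handle missing ratings.

```python
import numpy as np

# Toy ratings matrix (hypothetical data): rows = users, columns = items, 0 = not rated.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and users who have not rated the item.
        if other == user or ratings[other, item] == 0:
            continue
        sim = cosine_similarity(ratings[user], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den if den > 0 else 0.0

sim_01 = cosine_similarity(ratings[0], ratings[1])  # users 0 and 1 have similar taste
sim_02 = cosine_similarity(ratings[0], ratings[2])  # users 0 and 2 do not
pred = predict_rating(ratings, user=0, item=2)      # item 2 is unrated by user 0
print(sim_01, sim_02, pred)
```

No parameters are learned here: the prediction is pure arithmetic over the stored ratings, which is why every request has to touch the ratings of all candidate neighbours.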


<a id=1.2.1.2></a>

### 1.2.1.2 Model-Based Collaborative Filtering

In this approach, CF models are developed using machine learning algorithms to predict a user’s ratings of unrated items.

![](../pics/36.png)

* **Clustering**-based algorithm (KNN): The idea of clustering is the same as that of memory-based recommendation systems. In memory-based algorithms, we use the similarities between users and/or items as weights to predict a rating for a user and an item. The difference is that the similarities in this approach are calculated based on an unsupervised learning model, rather than Pearson correlation or cosine similarity. In this approach, we also limit the number of similar users to k, which makes the system more scalable.

* **Matrix Factorization** (MF): The idea behind such models is that the attitudes or preferences of a user can be determined by a small number of hidden factors. We can call these factors embeddings.

> Matrix decomposition can be reformulated as an optimization problem with loss functions and constraints. The constraints are chosen based on the properties of our model. For example, for non-negative matrix decomposition, we want non-negative elements in the resulting matrices.

![](../pics/37.png)

#### Embeddings
Intuitively, we can understand embeddings as low-dimensional hidden factors for items and users. For example, say we have 5-dimensional embeddings (i.e. D or n_factors = 5 in the figure above) for both items and users (the number 5 is chosen arbitrarily). Then for user-X and movie-A, we can say that those 5 numbers might represent 5 different characteristics of the movie, like (i) how sci-fi intense movie-A is, (ii) how recent the movie is, (iii) how many special effects are in movie-A, (iv) how dialogue-driven the movie is, and (v) how CGI-driven the movie is. Likewise, the 5 numbers in the user embedding matrix might represent (i) how much user-X likes sci-fi movies, (ii) how much user-X likes recent movies, and so on. In the figure above, a higher number from the dot product of the user-X and movie-A embeddings means that movie-A is a good recommendation for user-X.
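The dot-product scoring described above can be sketched in a few lines. All embedding values below are invented for illustration, and in a real model the dimensions are learned and rarely have such clean interpretations; the labels on the five dimensions are just the hypothetical ones from the text.

```python
import numpy as np

# Hypothetical 5-dimensional embeddings (values made up for illustration).
# Dimensions: sci-fi, recency, special effects, dialogue, CGI
movie_a = np.array([0.9, 0.8, 0.7, 0.1, 0.9])  # recent, effects-heavy sci-fi movie
movie_b = np.array([0.1, 0.2, 0.0, 0.9, 0.0])  # older, dialogue-driven movie
user_x = np.array([1.0, 0.5, 0.8, 0.2, 0.7])   # user who likes sci-fi and effects

# The score of a (user, movie) pair is the dot product of their embeddings.
score_a = float(user_x @ movie_a)
score_b = float(user_x @ movie_b)
print(score_a, score_b)  # the higher score marks the better recommendation
```

Here movie-A scores higher than movie-B for user-X because their embeddings point in a similar direction.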

****

Matrix factorization can be done by various methods, and there are several research papers on the topic. In the next section, there is a Python implementation of orthogonal factorization (SVD), probabilistic factorization (PMF) and non-negative factorization (NMF).

* **Neural Nets / Deep Learning**: There is a ton of research material on collaborative filtering using matrix factorization or a similarity matrix, but there is a lack of online material on how to use deep learning models for collaborative filtering. This is something that I learnt in fast.ai deep learning part 1 v2.

<a id=1.2.1.2.1></a>

#### 1.2.1.2.1 Singular Value decomposition (SVD) in recommender systems

SVD is a matrix factorization technique that is usually used to reduce the number of features of a data set by reducing the space dimensions from N to K, where K < N. For the purpose of recommendation systems, however, we are only interested in the matrix factorization part, keeping the same dimensionality. The matrix factorization is done on the user-item ratings matrix. From a high level, matrix factorization can be thought of as finding 2 matrices whose product approximates the original matrix.

Each item can be represented by a vector `qi`. Similarly, each user can be represented by a vector `pu`, such that the dot product of those 2 vectors is the expected rating.
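In symbols, the expected rating is $\hat{r}_{ui} = q_i^T p_u$. The sketch below illustrates this with a plain truncated SVD from NumPy on a small, fully observed toy matrix (hypothetical data). This is an assumption made for simplicity: real rating matrices are mostly empty, and the "SVD" models used in recommender libraries are typically fitted only on the observed ratings rather than computed with `np.linalg.svd`.

```python
import numpy as np

# Fully observed toy ratings matrix (hypothetical data): rows = users, columns = items.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Split the singular values evenly between the user and item factors.
P = U[:, :k] * np.sqrt(s[:k])     # row u is the user vector p_u
Q = Vt[:k, :].T * np.sqrt(s[:k])  # row i is the item vector q_i

r_hat = P @ Q.T                   # rank-k approximation of the ratings matrix
pred_00 = float(Q[0] @ P[0])      # expected rating of item 0 by user 0
print(np.round(r_hat, 2))
```

Each entry of `r_hat` is exactly the dot product of one user vector and one item vector, so the two small factor matrices replace the full ratings matrix.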



<a id=1.2.2></a>

***

### 1.2.2 Demographic

Demographic recommender systems aim to categorize the user based on personal attributes and make recommendations based on demographic classes. For example, such a system may use demographic groups from marketing research to suggest a range of products and services. The benefit of a demographic approach is that it may not require a history of user ratings of the type needed by collaborative and content-based techniques.

<a id=1.2.3></a>

***

### 1.2.3 Content-based
Content-based recommendation is an outgrowth and continuation of information filtering research (Belkin & Croft
1992). In a content-based system, the objects of interest are defined by their associated features. For example, text
recommendation systems like the newsgroup filtering system NewsWeeder (Lang 1995) use the words of their texts
as features. A content-based recommender learns a profile of the user’s interests based on the features present in
objects the user has rated. Schafer, Konstan & Riedl call this “item-to-item correlation.” The type of user profile
derived by a content-based recommender depends on the learning method employed. Decision trees, neural nets, and
vector-based representations have all been used. As in the collaborative case, content-based user profiles are
long-term models and updated as more evidence about user preferences is observed.

<a id=1.2.4></a>

***

### 1.2.4 Utility-based
Utility-based and knowledge-based recommenders do not attempt to build long-term generalizations about their
users, but rather base their advice on an evaluation of the match between a user’s need and the set of options
available. Utility-based recommenders make suggestions based on a computation of the utility of each object for the
user. Of course, the central problem is how to create a utility function for each user. Tête-à-Tête and the e-commerce
site PersonaLogic each have different techniques for arriving at a user-specific utility function and applying it to the
objects under consideration (Guttman 1998). The user profile therefore is the utility function that the system has
derived for the user, and the system employs constraint satisfaction techniques to locate the best match. The benefit
of utility-based recommendation is that it can factor non-product attributes, such as vendor reliability and product
availability, into the utility computation, making it possible for example to trade off price against delivery schedule
for a user who has an immediate need.

<a id=1.2.5></a>

***

### 1.2.5 Knowledge-based

Knowledge-based recommendation attempts to suggest objects based on inferences about a user’s needs and
preferences. In some sense, all recommendation techniques could be described as doing some kind of inference.
Knowledge-based approaches are distinguished in that they have functional knowledge: they have knowledge about
how a particular item meets a particular user need, and can therefore reason about the relationship between a need
and a possible recommendation. The user profile can be any knowledge structure that supports this inference. In the
simplest case, as in Google, it may simply be the query that the user has formulated. In others, it may be a more
detailed representation of the user’s needs (Towle & Quinn, 2000). The Entree system (described below) and several
other recent systems (for example, [Schmitt & Bergmann, 1999]) employ techniques from case-based reasoning for
knowledge-based recommendation. Schafer, Konstan & Riedl call knowledge-based recommendation the “Editor’s
choice” method.

The knowledge used by a knowledge-based recommender can also take many forms. Google uses information
about the links between web pages to infer popularity and authoritative value (Brin and Page, 1998). Entree uses
knowledge of cuisines to infer similarity between restaurants. Utility-based approaches calculate a utility value for
objects to be recommended, and in principle, such calculations could be based on functional knowledge. However,
existing systems do not use such inference, requiring users to do their own mapping between their needs and the
features of products, either in the form of preference functions for each feature in the case of Tête-à-Tête or answers
to a detailed questionnaire in the case of PersonaLogic.

<a id=1.3></a>

## 1.3 Comparing recommendation techniques

All recommendation techniques have strengths and weaknesses discussed below and summarized in Table II.

![](../pics/38.png)


Perhaps the best-known weakness is the **“ramp-up”** problem (Konstan, et al. 1998). This term actually refers to two distinct but
related problems.

**New User**: Because recommendations follow from a comparison between the target user and other users based solely
on the accumulation of ratings, a user with few ratings becomes difficult to categorize.

**New Item**: Similarly, a new item that has not had many ratings also cannot be easily recommended: the “new item”
problem. This problem shows up in domains such as news articles where there is a constant stream of new items and
each user only rates a few. It is also known as the “early rater” problem, since the first person to rate an item gets
little benefit from doing so: such early ratings do not improve a user’s ability to match against others (Avery and
Zeckhauser, 1997). This makes it necessary for recommender systems to provide other incentives to encourage users
to provide ratings.

Collaborative recommender systems depend on overlap in ratings across users and have difficulty when the space
of ratings is sparse: few users have rated the same items. The sparsity problem is somewhat reduced in model-based
approaches, such as singular value decomposition, which can reduce the dimensionality of the space.

These three problems suggest that pure collaborative techniques are best suited to problems where the density of
user interest is relatively high across a small and static universe of items. If the set of items changes too rapidly, old
ratings will be of little value to new users who will not be able to have their ratings compared to those of the existing
users. If the set of items is large and user interest thinly spread, then the probability of overlap with other users will
be small.

Collaborative recommenders work best for a user who fits into a niche with many neighbors of similar taste. The
technique does not work well for so-called **“gray sheep”** (Claypool, et al. 1999), who fall on a border between
existing cliques of users. This is also a problem for demographic systems that attempt to categorize users on personal
characteristics. On the other hand, demographic recommenders do not have the “new user” problem, because they do
not require a list of ratings from the user. Instead they have the problem of gathering the requisite demographic
information. With sensitivity to on-line privacy increasing, especially in electronic commerce contexts (USITIC,
1997), demographic recommenders are likely to remain rare: the data most predictive of user preference is likely to
be information that users are reluctant to disclose.

Content-based techniques also have a start-up problem in that they must accumulate enough ratings to build a
reliable classifier. Relative to collaborative filtering, content-based techniques also have the problem that they are
limited by the features that are explicitly associated with the objects that they recommend. For example,
content-based movie recommendation can only be based on written materials about a movie: actors’ names, plot summaries,
etc. because the movie itself is opaque to the system. This puts these techniques at the mercy of the descriptive data
available. Collaborative systems rely only on user ratings and can be used to recommend items without any
descriptive data. Even in the presence of descriptive data, some experiments have found that collaborative
recommender systems can be more accurate than content-based ones.

The great power of the collaborative approach relative to content-based ones is its cross-genre or “outside the
box” recommendation ability. It may be that listeners who enjoy free jazz also enjoy avant-garde classical music, but
a content-based recommender trained on the preferences of a free jazz aficionado would not be able to suggest items
in the classical realm since none of the features (performers, instruments, repertoire) associated with items in the
different categories would be shared. Only by looking outside the preferences of the individual can such suggestions
be made.

Both content-based and collaborative techniques suffer from the **“portfolio effect.”** An ideal recommender would not suggest a stock that the user already owns or a movie she has already seen. The problem becomes quite tricky in domains such as news filtering, since stories that look quite similar to those already read may in fact present some new facts or new perspectives that would be valuable to the user. At the same time, many different presentations of the same wire-service story from different newspapers would not be useful. The DailyLearner system (Billsus & Pazzani, 2000) uses an upper bound of similarity in its content-based recommender to filter out news items too similar to those already seen by the user.

> Utility-based and knowledge-based recommenders do not have ramp-up or sparsity problems, since they do not
base their recommendations on accumulated statistical evidence. Utility-based techniques require that the system
build a complete utility function across all features of the objects under consideration. One benefit of this approach is
that it can incorporate many different factors that contribute to the value of a product, such as delivery schedule,
warranty terms or conceivably the user’s existing portfolio, rather than just product-specific features. In addition,
these non-product features may have extremely idiosyncratic utility: how soon something can be delivered may
matter very much to a user facing a deadline. A utility-based framework thereby lets the user express all of the considerations that need to go into a recommendation.

For this reason, Guttman (1999) describes Tête-à-Tête as a
“product and merchant brokering” system rather than a recommender system. However, under the definition given
above, Tête-à-Tête does fit since its main output is a recommendation (a top-ranked item) that is generated on a
personalized basis.

The flexibility of utility-based systems is also to some degree a failing. The user must construct a complete
preference function, and must therefore weigh the significance of each possible feature. Often this creates a
significant burden of interaction. Tête-à-Tête uses a small number of “stereotype” preference functions to get the
user started, but ultimately the user needs to look at, weigh, and select a preference function for each feature that
describes an item of interest. 

This might be feasible for items with only a few characteristics, such as price, quality
and delivery date, but not for more complex and subjective domains like movies or news articles. PersonaLogic does
not require the user to input a utility function, but instead derives the function through an interactive questionnaire.
While the complete explicit utility function might be a boon to some users, for example, technical users with specific
purchasing requirements, it is likely to overwhelm a more casual user with a less-detailed knowledge. Large moves in
the product space, for example, from “sports cars” to “family cars” require a complete re-tooling of the preference
function, including everything from interior space to fuel economy. This makes a utility-based system less
appropriate for the casual browser.

Knowledge-based recommender systems are prone to the drawback of all knowledge-based systems: the need for
knowledge acquisition. There are three types of knowledge that are involved in such a system:
* Catalog knowledge: Knowledge about the objects being recommended and their features. For example, the Entree
recommender should know that “Thai” cuisine is a kind of “Asian” cuisine.
* Functional knowledge: The system must be able to map between the user’s needs and the object that might satisfy
those needs. For example, Entree knows that a need for a romantic dinner spot could be met by a restaurant that is
“quiet with an ocean view.”
* User knowledge: To provide good recommendations, the system must have some knowledge about the user. This
might take the form of general demographic information or specific information about the need for which a
recommendation is sought.

Of these knowledge types, the last is the most challenging, as it is, in the worst case, an
instance of the general user-modeling problem (Towle & Quinn, 2000).

All of the learning-based techniques (collaborative, content-based and demographic) suffer from the ramp-up
problem in one form or another. The converse of this problem is the stability vs. plasticity problem for such learners.
Once a user’s profile has been established in the system, it is difficult to change one’s preferences. A steak-eater who
becomes a vegetarian will continue to get steakhouse recommendations from a content-based or collaborative
recommender for some time, until newer ratings have the chance to tip the scales. Many adaptive systems include
some sort of temporal discount to cause older ratings to have less influence, but they do so at the risk of losing
information about interests that are long-term but sporadically exercised (Billsus & Pazzani, 2000; Schwab, et al.
2001). For example, a user might like to read about major earthquakes when they happen, but such occurrences are
sufficiently rare that the ratings associated with last year’s earthquake are gone by the time the next big one hits.
Knowledge- and utility-based recommenders respond to the user’s immediate need and do not need any kind of
retraining when preferences change.

<a id=1.4></a>

## 1.4 Hybrid recommender systems

Hybrid recommender systems combine two or more recommendation techniques to gain better performance with
fewer of the drawbacks of any individual one. Most commonly, collaborative filtering is combined with some other
technique in an attempt to avoid the ramp-up problem. Table III shows some of the combination methods that have
been employed.

![](../pics/39.png)

![](../pics/40.png)

<a id=1.5></a>

## 1.5 Evaluating recommender systems: choosing the best recommender system for a business

First of all, to select the best RS, we should consider the two worlds in which an RS is evaluated:
* **Offline evaluation in the academic world** (plus the Netflix Prize) - searching for low prediction errors (RMSE/MAE) and high recall/catalog coverage. TL;DR: just know these measures exist and that you probably don’t want to use them. But I still give a brief summary of them in case you are interested.
* **Online evaluation in the business world** - searching for high Customer Lifetime Value (CLV), going through A/B testing, CTR, CR, ROI, and QA. You should read this section if you are seriously considering using recommendations to boost your business.

### The Offline World = How Academics Do It?

RSs have been investigated for decades in academic research. There are a lot of research papers introducing different algorithms, and to make the algorithms comparable, they use academic measures. We call these measures the offline measures. You don’t put anything into production, you just play with the algorithms in your sandbox and fine-tune them according to these measures. Even back in the middle ages of 2006, in the famous Netflix Prize, a purely academic measure called the RMSE (root mean squared error) was used.

Just to briefly explain how it works: it supposes your users explicitly rate your products with, say, a number of stars (1 = strong dislike, 5 = strong like), and you have a bunch of such ratings (records saying that user A rated item X with Y stars) from the past. A technique called split validation is used: you take only a subset of these ratings, say 80% (called the train set), build the RS on them, and then ask the RS to predict the ratings on the 20% you’ve hidden (the test set). And so it may happen that a test user rated some item with 4 stars, but your model predicts 3.5, hence it has an error of 0.5 on that rating, and that’s exactly where RMSE comes from. You square these errors, average them over the whole test set, take the square root, and get a final result of, say, 0.71623. BINGO! That’s how good (or, more precisely, bad) your RS is. Or you may use a different formula and get the MAE (mean absolute error), the plain average of the absolute errors, which does not penalize huge errors (true 4 stars, predicted 1 star) that much, so you might only get 0.6134.
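As a quick illustration of the two error measures, here is a minimal sketch in plain Python. The ratings are made-up test-set values, and the function names `rmse` and `mae` are just illustrative helpers.

```python
import math

# Hypothetical hidden test ratings and the RS predictions for them.
true_ratings = [4.0, 3.0, 5.0, 4.0]
predicted = [3.5, 3.0, 4.0, 1.0]  # the last prediction is a huge miss

def rmse(y_true, y_pred):
    """Root mean squared error: squaring makes large errors count more."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: every star of error counts the same."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

rmse_value = rmse(true_ratings, predicted)
mae_value = mae(true_ratings, predicted)
print(rmse_value, mae_value)
```

On this toy data the single 3-star miss dominates the RMSE, while the MAE stays noticeably lower, which is the difference in penalization described above.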

One tiny drawback here is that such data almost doesn’t exist in the real world, or at least there is too little of it.

Users are too lazy and they won’t rate anything. They just open a web page and if they like what they see, they might buy it/consume it; if it sucks, they leave as fast as they came. And so you only have so-called implicit ratings in your web-server log or a database of purchases, and you can’t measure the number-of-stars error on them, simply because there are no stars. You only have +1 = user viewed a detail or purchased a product, and, typically, nothing else. Sometimes these are called unary ratings, which you know from Facebook’s “Like” button: the rating is either positive or unknown (the user just might not know the content exists).

You can still use split validation on such data, even for your own offline comparison of SaaS recommenders. Say you take, for example, your purchases database, submit the history of 80% of the users to the RS, and then, for each test user, submit only a few purchases and ask the RS to predict the rest. You may have hidden 4 purchased items and ask the RS for 10 items. You may get 0%, 25%, 50%, 75%, or 100% accuracy for that user, depending on how many of the hidden 4 appeared in the recommended 10. This accuracy is called the recall. You may average it over your whole test set and TADAAA! Your result is 31.4159%, and that’s how good your RS is.

Now honestly, even though recall is much more sane than RMSE, it still brings a lot of pain. Say a test user watched 20 episodes of the same TV series, and you measure recall on her. So you hide episodes #18–20 and ask the RS to predict them from #1–17. It is quite an easy task as the episodes are strongly connected, so you get a recall of 100%. Now, did your user discover something new? Do you want to recommend such content to her at all? And what brings the highest business value to you anyway? Say, in an online store, do you wish to recommend alternatives, or accessories? You should feel you’re getting on very thin ice with recall.

Another offline measure is the catalog coverage: the share of items in your catalog that the RS ever recommends. Now consider a bestseller model. It might have quite good recall, but almost zero coverage (5 constant items?). And take a random recommender. It has almost zero recall and 100% coverage. You might feel you’d like some compromise.
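The trade-off between recall and coverage can be sketched on toy data. Everything below is hypothetical: the catalog, the hidden purchases and the "bestseller" recommendation lists are invented, and `recall` and `catalog_coverage` are illustrative helper names rather than functions from any library.

```python
def recall(recommended, hidden):
    """Share of a user's hidden items that appear in the recommended list."""
    hidden = set(hidden)
    return len(hidden & set(recommended)) / len(hidden) if hidden else 0.0

def catalog_coverage(recommendation_lists, catalog):
    """Share of catalog items that appear in at least one recommendation list."""
    shown = set()
    for recs in recommendation_lists:
        shown.update(recs)
    return len(shown & set(catalog)) / len(catalog)

# Hypothetical catalog of 20 items and two test users with hidden purchases.
catalog = [f"item{i}" for i in range(20)]
hidden = {"u1": ["item0", "item7"], "u2": ["item1", "item9"]}

# A bestseller model recommends the same few popular items to everybody.
bestseller_recs = {user: ["item0", "item1", "item2"] for user in hidden}

mean_recall = sum(recall(bestseller_recs[u], hidden[u]) for u in hidden) / len(hidden)
coverage = catalog_coverage(bestseller_recs.values(), catalog)
print(mean_recall, coverage)
```

In this toy setup the bestseller lists recover half of the hidden purchases while showing only 3 of the 20 catalog items.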

Now you might say: **What about my business?** Measuring recall and coverage might be fine, but how are they related to my KPIs?
And you are right. To put the SaaS RS on the X-axis and $$$ on the Y-axis, we have to leave the offline world and go into production!


### The Online World: Follow the examples of smart CTOs

The section above was about measuring the quality of the RS before it goes into production. Now it’s time to talk about business KPIs.

While in offline evaluation we typically use split validation, in online evaluation A/B testing (or multivariate testing) is today’s most prominent approach. You may integrate a few different RSs, divide your users into groups and let the RSs fight it out. This is a bit costly, because it consumes your development resources, so you can use the estimated difficulty of integration and the cost of future customizations/adjustments as one of your measures, which might a priori reduce the pool of candidates.

Now let’s say you have the integration ready and are able to divide your online users into A/B-test groups. You may either use your own hashing of their UID cookies, or use some tool for that (for example, VWO, Optimizely, or even Google Analytics, though the last option is a little bit painful). To do the experiment, you should determine one good place on your website/application where to test the recommendations, because you sure don’t want to do the full integration of all the candidate RSs early in the pilot stage, right? If you have little traffic, keep in mind that the selected place must be visible enough to collect significant results. In the opposite case, if you have huge traffic, you may choose a conservative strategy and, for example, release only 20% of your traffic to the testing, keeping yourself and the remaining 80% of users safe in case some of the candidate RSs are completely broken and recommend odd stuff.
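The "own hashing of UID cookies" option might look roughly like the sketch below. It is only one possible scheme under stated assumptions: the function name `assign_group`, the `salt` value, the group labels and the 20% test share are all hypothetical choices, not something prescribed by any A/B-testing tool.

```python
import hashlib

def assign_group(user_id, variants=("A", "B"), test_share=0.2, salt="rs-pilot"):
    """Deterministically map a user ID to 'control' or one of the test variants."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # The first 8 hex digits give a number in [0, 1) that decides test vs. control.
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket >= test_share:
        return "control"  # user keeps seeing the current, safe behaviour
    # A different part of the digest picks the candidate RS for test users.
    return variants[int(digest[8:16], 16) % len(variants)]

groups = [assign_group(f"user-{i}") for i in range(10000)]
test_fraction = sum(g != "control" for g in groups) / len(groups)
print(test_fraction)  # close to the configured test share
```

Because the assignment depends only on the user ID and the salt, a returning user always lands in the same group, and changing the salt reshuffles users for the next experiment.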

Suppose the whole thing is up and running. What to measure? The easiest measures are the **Click-Through Rate (CTR)** and the **Conversion Rate (CR)** of the recommendations.

Say you displayed a set of N recommendations 20 times, and 3 of those times a user clicked on at least one of the recommended items. Then your CTR is 15%. Indeed, clicking is nice, but it probably led the user to a detail page and you might want to know what happened next. Did the user really find the content interesting? Did she watch the whole video, listen to the whole song, read the whole article, answer the job offer, put the product into the cart and actually order it? This is the conversion rate: the share of recommendations that made both you and your user happy.
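The arithmetic behind both rates is simple enough to write down. In the sketch below the impression and click counts are the ones from the example above, while the single conversion is an invented number; defining CR per displayed recommendation set is also an assumption, since some teams compute it per click instead.

```python
def click_through_rate(impressions, clicked_impressions):
    """Share of displayed recommendation sets that received at least one click."""
    return clicked_impressions / impressions if impressions else 0.0

def conversion_rate(impressions, conversions):
    """Share of displayed recommendation sets that led to a conversion
    (an order, a fully watched video, ...). Assumed per-impression definition."""
    return conversions / impressions if impressions else 0.0

ctr = click_through_rate(impressions=20, clicked_impressions=3)
cr = conversion_rate(impressions=20, conversions=1)  # hypothetical: 1 order
print(f"CTR = {ctr:.0%}, CR = {cr:.0%}")
```

Whatever definition you pick, keep it identical across all candidate RSs in the A/B test, otherwise the numbers are not comparable.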

CTR and CR may give you a good estimate of the recommender’s performance, but you should stay careful and keep thinking about your product. You may be running a news portal, putting the breaking news on the homepage. This might not bring you the highest possible CTR, but it maintains the quality and the feeling you and your users have about your service. Now you may put an RS there and it might start showing different content, such as yellow journalism articles or funny articles about “very fast dogs running at incredible high speeds”. This may increase your immediate CTR by 5 times, but it will damage your image and you may lose your users in the long term.

Here comes the empirical evaluation of the RSs. Just start a new session with empty cookies, simulate the behavior of a user and check whether the recommendations are sane. If you have a QA team, get them on the job! Empirical evaluation is both complicated and easy at once. It’s complicated, because it does not produce any numbers you could present on the product board. But it’s also easy, because, thanks to your human intuition, you will simply recognize which recommendations are good and which are bad. If you choose an oddly working recommender, you’re putting yourself into a lot of future trouble even if the CTR/CR are high at the moment.

<a id=1.6></a>

## 1.6 Recall and Precision at k for Recommender Systems

Precision and recall are classical evaluation metrics for binary classification algorithms and for document retrieval tasks. These metrics have been “translated” to help us evaluate recommendation systems.
To understand how these metrics work, we first need to understand the workflow of recommendation systems and then how to evaluate them.

Precision and recall are binary metrics used to evaluate models with binary output. Thus we need a way to translate our numerical problem (ratings, usually from 1 to 5) into a binary problem (relevant and not relevant items).


To do the translation, we will assume that any true rating of 3.5 or above corresponds to a relevant item and any true rating below 3.5 is irrelevant. A relevant item for a specific user-item pair means that this item is a good recommendation for the user in question.

#### Translating to binary

3.5 is just a threshold value I chose. There are multiple ways to set this threshold, such as taking into consideration the history of ratings given by the user. For the sake of simplicity, we will stick to the 3.5 threshold.
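
As a rough illustration, the translation to binary can be sketched as follows. The names `is_relevant` and `user_threshold` are hypothetical, treating a rating of exactly 3.5 as relevant is an assumption, and using the mean of a user's past ratings as a personal threshold is only one possible reading of "taking the history of ratings into consideration", not a method prescribed by the text.

```python
from statistics import mean

GLOBAL_THRESHOLD = 3.5  # the fixed threshold used in this article


def is_relevant(rating, threshold=GLOBAL_THRESHOLD):
    """Binary relevance: True when the true rating reaches the threshold."""
    return rating >= threshold


def user_threshold(past_ratings, fallback=GLOBAL_THRESHOLD):
    """Hypothetical per-user threshold: the mean of the user's past ratings.

    Falls back to the global threshold when the user has no history.
    """
    return mean(past_ratings) if past_ratings else fallback


# With the global threshold, a rating of 4 counts as relevant.
print(is_relevant(4))

# A user who rates almost everything 4 or 5 gets a stricter personal
# threshold, so the same rating of 4 is no longer counted as relevant.
generous_user_history = [5, 5, 4, 5, 4]
print(is_relevant(4, user_threshold(generous_user_history)))
```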

#### Setting 'k'

In the context of recommendation systems, we are most likely interested in recommending the top-N items to the user. So it makes more sense to compute precision and recall over the first N items instead of over all the items. Hence the notion of precision and recall at k, where k is an integer chosen to match the top-N recommendation objective.

#### Relevant vs. Recommended
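We have already seen the definition of a relevant item: an item whose actual rating is at least 3.5. A recommended item @k is one of the k items with the highest predicted ratings. We will use relevant and recommended items frequently in the rest of the article, so this is a good point to pause and grasp their exact definitions.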

#### Precision and recall at k: Definition
Precision at k is the proportion of recommended items in the top-k set that are relevant.
Its interpretation is as follows. Suppose that my precision at 10 in a top-10 recommendation problem is 80%. This means that 80% of the recommendations I make are relevant to the user.
Mathematically, precision@k is defined as follows:

> **Precision@k = (# of recommended items @k that are relevant) / (# of recommended items @k)**

Recall at k is the proportion of relevant items found in the top-k recommendations.

Suppose that we computed recall at 10 and found it to be 40% in our top-10 recommendation system. This means that 40% of all the relevant items appear in the top-10 results.
Mathematically, recall@k is defined as follows:

> **Recall@k = (# of recommended items @k that are relevant) / (total # of relevant items)**
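
The two definitions can be written as small Python functions. This is a minimal sketch under stated assumptions: the function names and the sample data are hypothetical, `recommended` is assumed to be already sorted by predicted rating in descending order, and returning 0.0 when a metric is undefined (no recommendations or no relevant items) is a convention chosen here, not something the definitions prescribe.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0  # assumption: undefined precision is reported as 0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0  # assumption: undefined recall is reported as 0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical data: a ranked recommendation list and the set of relevant items.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "c", "f", "g"}

print(precision_at_k(recommended, relevant, k=3))  # 2 hits out of 3 recommended
print(recall_at_k(recommended, relevant, k=3))     # 2 hits out of 4 relevant
```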

#### An Illustrative example

In this example we will illustrate the method to calculate precision@k and recall@k metrics

![](../pics/41.png)

To start, we ignore all the items whose actual rating is not known, because they cannot be used in the evaluation. We then sort the remaining items by predicted rating in descending order. The results are as follows:

| item | actual | predicted |
|---|---|---|
| item7 | 2 | 4.9 |
| item5 | 5 | 4.5 |
| item10 | 4 | 4.3 |
| item2 | 2 | 3.6 |
| item2 | 3 | 3.4 |
| item1 | 4 | 2.3 |

The relevant items are those with an actual rating greater than or equal to 3.5.

`Relevant items: item5, item10 and item1
total # of relevant items = 3`

The recommended items @3 are item7, item5 and item10.

The items that are both recommended @3 and relevant are item5 and item10.

##### Precision@3

`Precision@3 = (# of recommended items @3 that are relevant) / (# of recommended items @3) = 2/3 = 66.67%`


##### Recall@3

`Recall@3 = (# of recommended items @3 that are relevant) / (total # of relevant items) = 2/3 = 66.67%`
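
The calculation above can be reproduced with a short script, which also shows how both metrics change as k grows. This is a sketch only: the rows are copied from the sorted listing in this example (item labels as they appear there), while the names `rows`, `relevant_flags` and `precision_recall_at_k` are hypothetical.

```python
THRESHOLD = 3.5  # relevance cut-off used in this article

# (item, actual rating, predicted rating), labels copied from the listing above
rows = [
    ("item7", 2, 4.9),
    ("item5", 5, 4.5),
    ("item10", 4, 4.3),
    ("item2", 2, 3.6),
    ("item2", 3, 3.4),
    ("item1", 4, 2.3),
]

# Rank by predicted rating, highest first, then flag each row as relevant or not.
ranked = sorted(rows, key=lambda row: row[2], reverse=True)
relevant_flags = [actual >= THRESHOLD for _, actual, _ in ranked]
total_relevant = sum(relevant_flags)


def precision_recall_at_k(k):
    """Return (precision@k, recall@k) for the ranked rows above."""
    hits = sum(relevant_flags[:k])
    return hits / k, hits / total_relevant


for k in range(1, len(ranked) + 1):
    precision, recall = precision_recall_at_k(k)
    print(f"k={k}: precision={precision:.2%}, recall={recall:.2%}")
```

For k=3 this gives the 2/3 values computed by hand. At k=6 every relevant item has been retrieved, so recall reaches 100% while precision is 3/6 = 50%, which illustrates why the choice of k matters.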

<a id=1.7></a>

## 1.7 More jargon in Rec Sys

* Relevance 
Recommended items only make sense if they are relevant to the user. Users are more likely to buy or consume items they find interesting.

* Novelty 
Along with relevance, novelty is another vital factor. Recommended items make more sense if they are something that the user has not seen or consumed before.

* Serendipity
Sometimes recommending items that are somewhat unexpected can also boost sales. Serendipity is, however, different from novelty. In the author’s words:
>“if a new Indian restaurant opens in a neighbourhood, then the recommendation of that restaurant to a user who normally eats Indian food is novel but not necessarily serendipitous. On the other hand, when the same user is recommended Ethiopian food, and it was unknown to the user that such food might appeal to her, then the recommendation is serendipitous”.

* Diversity
Increasing diversity in recommendations is equally important. Simply recommending items that are similar to each other isn’t of much use.
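
Two of these notions, novelty and diversity, can be turned into rough numbers. The sketch below is only one of several possible formulations and is not taken from the sources of this notebook: novelty is assumed to be the share of recommended items the user has not seen, and diversity is assumed to be the average pairwise Jaccard distance between items' tag sets. All names and data are hypothetical.

```python
from itertools import combinations


def novelty(recommended, seen):
    """Share of recommended items the user has not seen or consumed before."""
    if not recommended:
        return 0.0
    unseen = sum(1 for item in recommended if item not in seen)
    return unseen / len(recommended)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|: 0 for identical tag sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def intra_list_diversity(recommended, tags):
    """Average pairwise distance between the recommended items' tag sets."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(tags[x], tags[y]) for x, y in pairs) / len(pairs)


# Hypothetical catalogue: two near-identical movies and one different one.
tags = {
    "m1": {"action", "thriller"},
    "m2": {"action", "thriller"},
    "m3": {"documentary"},
}
recommended = ["m1", "m2", "m3"]

print(novelty(recommended, seen={"m1"}))
print(intra_list_diversity(recommended, tags))
```

A list made up only of `m1` and `m2` would score 0 on this diversity measure, which matches the intuition that recommending near-duplicates is of little use.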

# References

1. [Various implementations of collaborative filtering](https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0)
2. [Hybrid Recommender Systems: Survey and Experiments](https://www.researchgate.net/publication/263377228)
3. [Evaluating recommender systems: choosing the best one for your business](https://medium.com/recombee-blog/evaluating-recommender-systems-choosing-the-best-one-for-your-business-c688ab781a35)
4. [Recall and Precision at k for Recommender Systems](https://medium.com/@m_n_malaeb/recall-and-precision-at-k-for-recommender-systems-618483226c54)

Backlog:
* Customer Lifetime Values
* A/B testing
* CR
* CTR
* ROI
* QA
* Coverage (Catalog coverage)
* Empirical evaluation
* Individual threshold for users

#  TO DO:

1. [Developing a prescriptive recommender system through Matrix Factorization](https://towardsdatascience.com/developing-a-prescriptive-recommender-system-through-matrix-factorization-8b0c69cce611)
2. [Building a Recommender System Using Embeddings; Theory](https://drop.engineering/building-a-recommender-system-using-embeddings-de5a30e655aa)

