## Recommender systems

Recommender systems are ***active information filtering systems*** that personalize the information reaching a user based on their interests, the relevance of the information, and similar signals.

There are typically two ways to build these systems.

1. **Content based filtering**

![The algorithm infers you like the brand duff!](https://miro.medium.com/max/1334/1*oYpMnPQFZaiZQizgVWBpoA.png)

  * Users are recommended items similar to those they have already consumed. For example, if you bought a blue pen, Amazon may recommend a red pen because the two items are similar. Content-based filtering is outside the scope of this discussion.
  * Problems include the creation of a filter bubble and recommendations for items you have already purchased.


2. **Collaborative filtering**

![](https://miro.medium.com/max/1374/1*-Jr1l2rlj9SBcCzlDHtN5g.jpeg)

  * CF accumulates customer product ratings, identifies customers with similar ratings, and offers recommendations based on inter-customer comparisons. It is based on the idea that people who agreed in their evaluations of certain items in the past are likely to agree again in the future.


## Collaborative Filtering

There are two ways to build collaborative filtering systems.

### Memory based CF

* User-based:
  * User 1 rated movie A highly.
  * User 2 rated movie A highly -> User 1 and User 2 are similar.
  * Hence, if User 1 rates movie B low, User 2 is likely to rate movie B low as well.

* Item-based:
  * Instead of measuring the similarity between users, the item-based CF recommends items based on their similarity with the items that the target user rated

  ![](https://miro.medium.com/max/1400/1*7bFc9R97Z4jKK6J2jaSUlw.jpeg)

* Rating matrix - Here, entry (i, j) represents the rating user i gives movie j. Similarities are computed between the rows (users) or the columns (items) of this matrix.
![](https://hackernoon.com/hn-images/1*9TC6BrfxYttJwiATFAIFBg.png)

* Similarity between users x and y is calculated using cosine similarity, where `x_i` is the rating user x gives movie i and `y_i` is the rating user y gives movie i. If user x has rated a movie and user y has not, you can treat the missing rating as 0, substitute a constant, or skip that movie.

![](https://i.imgur.com/I9T81nG.png)
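As a minimal sketch of the calculation described above, the snippet below computes cosine similarity between two users, assuming the standard formula (dot product divided by the product of the vector norms) and the "skip it" option for movies that only one user has rated. The function name `cosine_similarity` and the sample ratings are hypothetical, chosen only for illustration.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two users over the movies both have rated.

    x and y are dicts mapping movie id -> rating. Movies rated by only
    one of the two users are skipped.
    """
    common = set(x) & set(y)
    if not common:
        return 0.0  # no overlap: treat the users as unrelated
    dot = sum(x[i] * y[i] for i in common)
    norm_x = math.sqrt(sum(x[i] ** 2 for i in common))
    norm_y = math.sqrt(sum(y[i] ** 2 for i in common))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # guard against all-zero ratings
    return dot / (norm_x * norm_y)

# Hypothetical ratings: movie id -> rating
user_x = {1: 5, 2: 3, 3: 4}
user_y = {1: 4, 2: 2, 4: 5}

# Only movies 1 and 2 are rated by both users, so only they contribute
sim_xy = cosine_similarity(user_x, user_y)
print(round(sim_xy, 4))
```

Treating missing ratings as 0 instead would pull the similarity down for users with little overlap, so the choice matters on sparse data.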



### Code Example - User based CF


We use the [Surprise](http://surpriselib.com/) (Simple Python RecommendatIon System Engine) library, a scikit-style collection of recommendation algorithms that is installed as `scikit-surprise`.

```python
%pip install scikit-surprise
from surprise import Dataset

# Load the built-in MovieLens 100k data
movielens = Dataset.load_builtin('ml-100k')
```


```python
import pandas as pd
from surprise import Reader
from surprise import KNNWithMeans

# Toy ratings from six users (A-F) on six items;
# user "E" has rated only item 6
ratings_dict = {
    "item": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6],
    "user": ['A', 'B', 'F', 'D', 'F', 'A', 'B', 'C', 'F', 'A', 'D', 'F', 'A', 'C', 'B', 'D', 'E'],
    "rating": [2, 5, 4, 1, 5, 2, 4, 5, 4, 4, 5, 1, 5, 2, 1, 4, 2],
}

df = pd.DataFrame(ratings_dict)
reader = Reader(rating_scale=(1, 5))

# Load the pandas DataFrame (column order must be user, item, rating)
data = Dataset.load_from_df(df[["user", "item", "rating"]], reader)

# Use user-based cosine similarity
sim_options = {
    "name": "cosine",
    "user_based": True,
}
algo = KNNWithMeans(sim_options=sim_options)
```


```python
# Train on the full dataset
trainingSet = data.build_full_trainset()
algo.fit(trainingSet)

# Predict the rating user E would give item 1
prediction = algo.predict('E', 1)
print(prediction.est)
```

```
Computing the cosine similarity matrix...
Done computing similarity matrix.
3.6666666666666665
```
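To see where the printed estimate comes from, here is a hand-rolled sketch of mean-centred, user-based prediction on the same toy ratings. It assumes the usual formulation (the target user's mean rating plus the similarity-weighted average of each neighbour's deviation from their own mean, with cosine similarity taken over co-rated items and only positively similar neighbours counted). It is not Surprise's actual implementation, and the helper names `mean`, `cosine` and `predict` are hypothetical.

```python
import math
from collections import defaultdict

# Same toy ratings as in the Surprise example
items = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6]
users = ['A', 'B', 'F', 'D', 'F', 'A', 'B', 'C', 'F', 'A', 'D', 'F', 'A', 'C', 'B', 'D', 'E']
values = [2, 5, 4, 1, 5, 2, 4, 5, 4, 4, 5, 1, 5, 2, 1, 4, 2]

ratings = defaultdict(dict)  # user -> {item: rating}
for item, user, value in zip(items, users, values):
    ratings[user][item] = value

def mean(user):
    """Average of all ratings given by this user."""
    return sum(ratings[user].values()) / len(ratings[user])

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """User's mean plus the weighted mean deviation of similar raters."""
    others = [v for v in list(ratings) if v != user and item in ratings[v]]
    neighbours = [(cosine(user, v), v) for v in others]
    neighbours = [(s, v) for s, v in neighbours if s > 0]
    if not neighbours:
        return mean(user)  # nothing to go on: fall back to the user's mean
    num = sum(s * (ratings[v][item] - mean(v)) for s, v in neighbours)
    den = sum(s for s, _ in neighbours)
    return mean(user) + num / den

manual_estimate = predict('E', 1)
print(manual_estimate)
```

Under these assumptions only user B counts as a neighbour, because B is the only user who rated item 1 and also shares an item (item 6) with E. B rated item 1 about 1.67 above their own average, so E's estimate is E's mean of 2 plus that deviation, which matches the value printed above.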


### Model based CF


* Typically uses matrix factorization, which is useful for drawing meaningful inferences from sparse matrices, primarily through latent features.
* Matrix factorization can be seen as breaking down a large matrix into a product of smaller ones. A matrix A with dimensions `m x n` can be reduced to a product of two matrices X and Y with dimensions `m x p` and `p x n` respectively.
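The shapes involved can be sketched in a few lines of plain Python. The sizes (`m = 4`, `p = 2`, `n = 5`), the factor values and the `matmul` helper are made up for illustration; the point is only that an `m x p` matrix times a `p x n` matrix gives back an `m x n` matrix while storing fewer numbers.

```python
def matmul(X, Y):
    """Multiply an m x p matrix by a p x n matrix (both given as lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

m, p, n = 4, 2, 5  # hypothetical sizes: 4 users, 2 latent features, 5 movies

X = [[1.0, 0.5] for _ in range(m)]  # m x p: one row of latent features per user
Y = [[2.0] * n, [1.0] * n]          # p x n: one column of latent features per movie

A = matmul(X, Y)

print(len(A), len(A[0]))   # dimensions of the reconstructed matrix
print(m * n, p * (m + n))  # entries in A versus entries in X and Y combined
```

The saving grows with the size of the matrix: for thousands of users and movies and a small `p`, `p * (m + n)` is far smaller than `m * n`.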

#### Concrete Example 

![](https://files.realpython.com/media/dimensionality-reduction.f8686dd52b9c.jpg)

Here, the left matrix maps `m` users to 2 latent features, and the top matrix maps `n` movies to the same 2 latent features. Here is one interpretation:

* Assume that in a user vector (u, v), u represents how much a user likes the Horror genre, and v represents how much they like the Romance genre.

  * The user vector (2, -1) thus represents a user who likes horror movies and rates them positively, but dislikes romance movies and rates them negatively.

* Assume that in an item vector (i, j), i represents how much a movie belongs to the Horror genre, and j represents how much that movie belongs to the Romance genre.

  * The movie (2.5, 1) has a Horror rating of 2.5 and a Romance rating of 1. Multiplying it by the user vector using matrix multiplication rules gives you (2 * 2.5) + (-1 * 1) = 4.

* **So, the movie belonged to the Horror genre, and the user could have rated it 5, but the slight inclusion of Romance caused the final rating to drop to 4.**
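The arithmetic in this example is just a dot product between the user vector and the movie vector. A minimal sketch, using the numbers from the text (the variable names are illustrative):

```python
user = (2, -1)    # (how much the user likes Horror, how much they like Romance)
movie = (2.5, 1)  # (how much the movie is Horror, how much it is Romance)

# Contribution of each latent feature to the predicted rating
contributions = [u * f for u, f in zip(user, movie)]
predicted_rating = sum(contributions)

print(contributions)     # [5.0, -1]
print(predicted_rating)  # 4.0
```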


#### **Singular Value Decomposition**

* A way to factorize a matrix that is widely used for collaborative filtering and recommendation algorithms.
* Uses RMSE as the metric to find the closest approximation of the known ratings. This essentially turns the recommendation problem into an optimization problem.
* The canonical equation is shown below. Inferences are derived just as in the example above.

![](https://hackernoon.com/hn-images/1*haUDjEiQmG0RapR0SHos6Q.png)
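To make the "optimization problem" framing concrete, here is a rough sketch of one common way such a factorization is fitted: stochastic gradient descent on the squared error of the known ratings, with a small regularization penalty. This is an assumption about the general technique, not a reproduction of the equation in the image or of any library's internals. It leaves out the bias terms that practical implementations usually add, and the function names, hyperparameters and toy data are all hypothetical.

```python
import math
import random

def factorize(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit user and item factor vectors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    # Small random starting values for every latent feature
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in sorted(users)}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in sorted(items)}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(n_factors):
                pf, qf = P[u][f], Q[i][f]
                # Step along the negative gradient of the regularized squared error
                P[u][f] += lr * (err * qf - reg * pf)
                Q[i][f] += lr * (err * pf - reg * qf)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given ratings."""
    sq = [(r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical observed ratings; only these entries of the matrix are known
toy_ratings = [
    ('A', 1, 5), ('A', 2, 4), ('A', 3, 1),
    ('B', 1, 4), ('B', 3, 2),
    ('C', 2, 5), ('C', 3, 1), ('C', 4, 4),
]

P0, Q0 = factorize(toy_ratings, n_epochs=0)  # untrained factors
P, Q = factorize(toy_ratings)                # trained factors

rmse_before = rmse(toy_ratings, P0, Q0)
rmse_after = rmse(toy_ratings, P, Q)
print(rmse_before, rmse_after)
```

Only the observed ratings enter the loss, so the unknown entries of the matrix are never touched during training. They are filled in afterwards by taking the dot product of the learned user and item vectors.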

<br>

### Code Example - SVD based CF

We use `GridSearchCV` in conjunction with the `SVD` algorithm to find the hyperparameters that minimize RMSE.

```python
from surprise import SVD
from surprise.model_selection import GridSearchCV

# lr_all is the learning rate for all parameters
# (how much the parameters are adjusted in each iteration).
# reg_all is the regularization term for all parameters,
# a penalty added to prevent overfitting.
param_grid = {
    "n_epochs": [5, 10],
    "lr_all": [0.002, 0.005],
    "reg_all": [0.4, 0.6]
}

# Get the best params using GridSearchCV (3-fold cross-validation)
gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=3)
gs.fit(data)
best_params = gs.best_params["rmse"]

# Train a model on the full training set with the best params
svd_algo = SVD(n_epochs=best_params['n_epochs'],
               lr_all=best_params['lr_all'],
               reg_all=best_params['reg_all'])
svd_algo.fit(trainingSet)

# Predict (the estimate varies between runs because SVD is randomly initialized)
prediction = svd_algo.predict('E', 1)
print(prediction.est)
```

```
3.449428534520331
```


## Drawbacks

* Visibility - The main drawback of SVD is that it offers little to no explanation of why an item is recommended to a user. This can be a serious problem if users want to know why a specific item was recommended to them.
* Missing values - Classical SVD is not defined for a matrix with missing entries, so the missing ratings must be imputed first. This imputation can introduce unnecessary noise into the model.
* Recommending new items - Collaborative filtering suffers from the cold start problem for newly added items: until someone rates them, they are not recommended.



## References
* [Building collaborative filtering systems](https://realpython.com/build-recommendation-engine-collaborative-filtering/#model-based)
* [Collaborative filtering using KNN](https://heartbeat.fritz.ai/recommender-systems-with-python-part-ii-collaborative-filtering-k-nearest-neighbors-algorithm-c8dcd5fd89b2)
* [Collaborative filtering using SVD](https://heartbeat.fritz.ai/recommender-systems-with-python-part-iii-collaborative-filtering-singular-value-decomposition-5b5dcb3f242b)
* [Beginner's guide to creating an SVD Recommender System](https://towardsdatascience.com/beginners-guide-to-creating-an-svd-recommender-system-1fd7326d1f65)