# BLU10 - Part 3 of 3 - Non-personalized recommenders

```python
import os
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
```

It is finally time to implement our first basic recommender system! We will use a data set of user ratings of movies.

## 1. Non-personalized RS

The core function of any RS is to identify useful items for the user. Going back to our framework, non-personalized RS are typically built on the base model, exploiting just the interactions between users and items - the ratings.

![Recommender Systems Framework](./media/recommender_systems_framework_base.png)

However, we treat users only as providers of ratings and ignore the individual preferences those ratings reveal. **The rationale is that a generic user also likes something that is liked by many users.**

This approach is particularly relevant when we lack information about user preferences, for example for users who haven't provided any ratings, for new users, or when we simply did not collect any user information. If we are unable to predict the utility of an item for a particular user, then we recommend an item with high utility for many users.

We will use non-personalized algorithms here to give you an idea of how to build an RS.

## 2. Loading the data

First, we read the movie ratings data into a NumPy array.

```python
def read_data():
    """Read the ratings data from a CSV file into a NumPy array."""
    path = os.path.join('data', 'ml-latest-small', 'ratings.csv')
    # Skip the header row and keep userId, movieId and rating (drop the timestamp)
    data = np.genfromtxt(path, delimiter=',', skip_header=1, usecols=[0, 1, 2])
    return data


data = read_data()
```

Let's look at the data. To make the visualization easier, we show the data in a dataframe:

```python
pd.DataFrame(data).head(5)
```

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 31.0 | 2.5 |
| 1 | 1.0 | 1029.0 | 3.0 |
| 2 | 1.0 | 1061.0 | 3.0 |
| 3 | 1.0 | 1129.0 | 2.0 |
| 4 | 1.0 | 1172.0 | 4.0 |


What we have here is:
* User identification (userId) in the first column
* Item identification (movieId) in the second column
* Rating in the third column.

This is a memory-efficient way to store the data, but it's not convenient for building an RS. The next step is to use this data to construct a ratings matrix.

The dataset also contains a timestamp that we don't need at the moment, so we didn't load it. We also skipped the column names, which makes further array manipulation easier.

We have 100004 movie ratings from 671 users and 9066 movies.

```python
data.shape[0], pd.DataFrame(data)[0].nunique(), pd.DataFrame(data)[1].nunique()
```

```
(100004, 671, 9066)
```

## 3. Building the ratings matrix

The second step is to transform this data representation into a ratings matrix, with:
* Users as rows
* Items as columns
* Ratings as values.

We have 671 user ids (1-671) and 9066 movie ids (ranging from 1 to 163949). As the ids are not necessarily consecutive, we keep track of the mapping between the ids and the matrix indices, so that we can go back from a matrix row or column to the original user or movie.

Then we create a matrix filled with zeros with the size we want:
* The number of unique users is the number of rows
* The number of unique items is the number of columns.

Finally, we fill in the rating values using the stored indices. We'll do this in a vectorized way, which is one of the great advantages of using NumPy.

### 3.1 Building the ratings matrix with NumPy
We're going to use the [unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html#numpy.unique) function to obtain the sorted unique user and item ids. With `return_inverse=True`, it also returns, for every rating, the position of its user id and item id inside those sorted arrays, which is exactly the row and column index of that rating in the ratings matrix.

Then we construct a matrix of zeros and fill in the rating values using the previously obtained indices.
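To see what `return_inverse` gives us, here is a toy run on three hand-made (user, item, rating) triplets with non-consecutive ids. The data and the names `toy` and `R_toy` are illustrative only; the steps mirror the ones used on the real data.

```python
import numpy as np

# Toy (user_id, item_id, rating) triplets with non-consecutive ids
toy = np.array([
    [1.0, 10.0, 4.0],
    [1.0, 30.0, 2.5],
    [7.0, 10.0, 5.0],
])

# Sorted unique ids, plus the position of each row's id in that sorted array
users, user_pos = np.unique(toy[:, 0], return_inverse=True)
items, item_pos = np.unique(toy[:, 1], return_inverse=True)

print(users, user_pos)  # [1. 7.] [0 0 1]
print(items, item_pos)  # [10. 30.] [0 1 0]

# Fancy indexing writes every rating into its (row, column) cell at once
R_toy = np.zeros((len(users), len(items)))
R_toy[user_pos, item_pos] = toy[:, 2]
print(R_toy)  # user 1 rated items 10 and 30; user 7 rated only item 10

# The same sorted arrays map matrix indices back to the original ids
print(users[1], items[0])  # 7.0 10.0
```

Because `users` and `items` are sorted, matrix row `k` always corresponds to `users[k]` and column `k` to `items[k]`.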

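```python
def make_ratings(data):
    # users/items: sorted unique ids (matrix index -> original id)
    # user_pos/item_pos: row/column index of every rating in the matrix
    users, user_pos = np.unique(data[:, 0], return_inverse=True)
    items, item_pos = np.unique(data[:, 1], return_inverse=True)

    # One row per user, one column per item, zeros where there is no rating
    R = np.zeros((len(users), len(items)))
    # Vectorized fill: R[user_pos[k], item_pos[k]] = data[k, 2] for every k
    R[user_pos, item_pos] = data[:, 2]

    return R


R = make_ratings(data)
R
```

```
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [4., 0., 0., ..., 0., 0., 0.],
       [5., 0., 0., ..., 0., 0., 0.]], shape=(671, 9066))
```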

How many values do you think will be non-zero in this matrix?

```python
len(R[R>0])
```

```
100004
```

Exactly the number of rows we had in our tabular format, since every (user, movie) pair is rated only once and every rating is greater than zero!

### 3.2 Building the ratings matrix with Pandas

Another convenient tool is the pandas `pivot` method (which you've already met when learning about tidy data). It can be handy to keep the user and item ids as the index and column labels of a dataframe, instead of losing them in a NumPy array, and pandas does exactly that.

The dataframe pivot method takes as arguments: 
- index: The row index (the user id)
- columns: The column names (the item id)
- values: The values of the matrix (the ratings)
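As a small illustration of those three arguments, the sketch below pivots the same toy triplets used earlier, this time in a dataframe with descriptive column names. The names `toy`, `userId`, `movieId`, `rating` and `R_df` are chosen for the example only.

```python
import pandas as pd

# Toy ratings in tidy format, with (illustrative) named columns
toy = pd.DataFrame({
    "userId": [1, 1, 7],
    "movieId": [10, 30, 10],
    "rating": [4.0, 2.5, 5.0],
})

# Users become the row index, movies the columns, ratings the cell values;
# (user, movie) pairs without a rating come out as NaN, so we fill them with 0
R_df = toy.pivot(index="userId", columns="movieId", values="rating").fillna(0)
print(R_df)

# Unlike the NumPy array, the original ids survive as labels
print(R_df.index.tolist(), R_df.columns.tolist())  # [1, 7] [10, 30]
print(R_df.loc[7, 10])  # 5.0
```

`pivot` expects each (index, column) pair to appear only once and raises a `ValueError` on duplicates; `pivot_table` with an aggregation function is the usual fallback in that case.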

```python
pd.DataFrame(data).pivot(index=0, columns=1, values=2).head(5)
```

| 0 \ 1 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 | ... | 161084.0 | 161155.0 | 161594.0 | 161830.0 | 161918.0 | 161944.0 | 162376.0 | 162542.0 | 162672.0 | 163949.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5.0 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


Nice! We can fill the missing ratings with zeros using `fillna`.

```python
pd.DataFrame(data).pivot(index=0, columns=1, values=2).fillna(0).head(5)
```

| 0 \ 1 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 | 10.0 | ... | 161084.0 | 161155.0 | 161594.0 | 161830.0 | 161918.0 | 161944.0 | 162376.0 | 162542.0 | 162672.0 | 163949.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


A thing of beauty. One line of code! But remember that with great power comes great responsibility: this dense matrix is really hungry and eats a lot of memory, because it stores a value for every one of the 671 × 9066 user-item pairs. (That's why we won't save it.)
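To put a number on that appetite, here is a back-of-the-envelope comparison, assuming 8-byte `float64` values as produced by `np.genfromtxt` and `np.zeros`. It contrasts the dense matrix with the three-column tabular format we started from; `R.nbytes` on the matrix built in section 3.1 should report the same dense figure.

```python
import numpy as np

n_users, n_items, n_ratings = 671, 9066, 100004
bytes_per_value = np.dtype(np.float64).itemsize  # 8

# Dense ratings matrix: one value per (user, item) cell, rated or not
dense_bytes = n_users * n_items * bytes_per_value
# Tabular format: three values (user, item, rating) per actual rating
triplet_bytes = n_ratings * 3 * bytes_per_value

print(dense_bytes)                          # 48666288 (about 48.7 MB)
print(triplet_bytes)                        # 2400096 (about 2.4 MB)
print(round(dense_bytes / triplet_bytes, 1))  # 20.3
```

The dense matrix is roughly 20 times larger, almost entirely because it stores the zeros of missing ratings.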

## 4. Density and sparsity scores

Density and sparsity scores tell us how sparse our ratings matrix is.

### 4.1 Density 

Density is the fraction of the non-zero elements of the matrix. We will use the [nonzero](https://numpy.org/doc/stable/reference/generated/numpy.nonzero.html#numpy-nonzero) method, which returns the indices of the non-zero elements: one array of row indices and one array of column indices.

This is another way to get the non-zero elements of a matrix. Unlike `R[R>0]`, it also catches negative elements, so it is the safer choice whenever the matrix can contain negative values.
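A tiny example makes the difference visible. The matrix `M` below is made up, with one negative entry such as could appear after subtracting each user's mean rating.

```python
import numpy as np

# Made-up matrix with a negative entry
M = np.array([
    [0.0, 1.5, 0.0],
    [-0.5, 0.0, 2.0],
])

# The boolean mask M > 0 silently drops the negative value (keeps 1.5 and 2.0)
print(M[M > 0])

# nonzero() returns one index array per dimension: (rows, columns)
rows, cols = M.nonzero()
print(rows, cols)  # [0 1 1] [1 0 2]

# Indexing with those arrays keeps every non-zero value, negative ones included
print(M[M.nonzero()])
```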

```python
R.nonzero()
```

```
(array([  0,   0,   0, ..., 670, 670, 670], shape=(100004,)),
 array([  30,  833,  859, ..., 4597, 4610, 4696], shape=(100004,)))
```

We compute the density score as

$$Density = \frac{|R'|}{|R|}$$

where $|R'|$ is the number of non-zero elements of the matrix and $|R|$ is its total number of elements (number of users times number of items).
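The same score can be written with `np.count_nonzero`, which counts $|R'|$ directly without materializing the index arrays. A minimal sketch on a made-up 2 × 4 matrix; the function name `density` is ours.

```python
import numpy as np

def density(R):
    """Fraction of matrix cells that hold a rating (non-zero entries)."""
    # np.count_nonzero(R) is |R'|, the same as R[R.nonzero()].size; R.size is |R|
    return np.count_nonzero(R) / R.size

# Made-up ratings matrix: 2 ratings out of 8 cells
R_toy = np.array([
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.5, 0.0],
])
print(density(R_toy))  # 0.25
```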

```python
R[R.nonzero()].size / R.size
```

```
0.016439141608663475
```

Holy moly, at least now we know what we are up against! Only 1.6% of the matrix has values that are not zero.

### 4.2 Sparsity

Sparsity is the opposite of density: the fraction of matrix elements that are zero. Simply put:

$$Sparsity = 1- \frac{|R'|}{|R|}$$

```python
1 - R[R.nonzero()].size / R.size
```

```
0.9835608583913366
```

These two attributes complement each other, and they are important when dealing with ratings matrices. In this case, **our ratings matrix is ~1.6% dense and ~98.4% sparse!** With our sparse ratings matrix ready, we can set up our first non-personalized recommender system.

## 5. Aggregated opinion

The defining aspect of non-personalized recommenders is that the predicted utility of an item is the same for every user. We compute it by looking at the opinions of all the users in the community.

Perhaps the oldest RS is aggregated opinion, i.e. most popular/most hated (think Billboard or [IMDb Bottom 100](https://www.imdb.com/chart/bottom)). Another simple option is recommending the most-rated items. The most-rated items tend to be those that get the most interaction from users or that elicit the strongest opinions.

### 5.1 Most-rated

One way to find the most popular items is to select the ones with most ratings.

We start by selecting the elements of the ratings matrix that are not zero.

```python
def is_rating(R):
    return np.greater(R, 0)

is_rating(R)
```

```
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False]], shape=(671, 9066))
```

Recall that each row corresponds to a user and each column to an item, so we can sum each column to know how many times each item was rated.

```python
def count_ratings(R):
    R_ = is_rating(R)
    # Column sums of the boolean mask: number of ratings per item
    return R_.sum(axis=0)

count_ratings(R)
```

```
array([247, 107,  59, ...,   1,   1,   1], shape=(9066,))
```

Now we can write a function that retrieves the N most-rated items.

```python
def most_rated(R, n):
    R_ = count_ratings(R)
    # argsort sorts ascending, so negating the counts puts the largest first
    return np.negative(R_).argsort()[:n]


most_rated(R, 3)
```

```
array([321, 266, 284])
```
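Note that 321, 266 and 284 are column positions in `R`, not movieIds. Since `make_ratings` ordered the columns by the sorted unique movie ids, the same `np.unique` call recovers the ids. A minimal sketch; the helper name `indices_to_ids` is hypothetical, and it is shown on toy data.

```python
import numpy as np

def indices_to_ids(data, col_indices):
    """Map ratings-matrix column indices back to the original movieIds.

    np.unique returns the sorted unique ids, the same order make_ratings
    used for the columns, so position k holds the id of column k.
    """
    items = np.unique(data[:, 1])
    return items[col_indices]

# Toy (user, movie, rating) triplets: columns are movie 10 (index 0) and 30 (index 1)
toy = np.array([[1.0, 10.0, 4.0], [1.0, 30.0, 2.5], [7.0, 10.0, 5.0]])
print(indices_to_ids(toy, np.array([1, 0])))  # [30. 10.]
```

On the real data, `indices_to_ids(data, most_rated(R, 3))` would return the movieIds of the three most-rated movies.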

### 5.2 Yeah, but what if most ratings are negative?

The most-rated items are certainly the most interesting, or maybe the most controversial, but are they also good? We can extend the function above to mimic another popular algorithm, "Highest % of Top Ratings". As a first step, we count only the positive ratings.

Let's say a positive rating is anything strictly above 3 (i.e. more than 3 stars).

```python
def is_above_threshold(R, threshold):
    # True where the rating is strictly greater than the threshold
    return np.greater(R, threshold)


def count_positive_ratings(R, threshold):
    R_ = is_above_threshold(R, threshold)
    # Column sums: number of positive ratings per item
    return R_.sum(axis=0)


count_positive_ratings(R, 3)
```

```
array([182,  51,  24, ...,   1,   0,   1], shape=(9066,))
```

Now, we just need to count the number of positive ratings and sort the resulting array.

```python
def most_positive_ratings(R, threshold, n):
    R_ = count_positive_ratings(R, threshold)
    return np.negative(R_).argsort()[:n]


most_positive_ratings(R, 3, 3)
```

```
array([284, 321, 266])
```
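Counting positive ratings still rewards sheer volume. The "%" in "Highest % of Top Ratings" suggests dividing by the item's total number of ratings instead. Here is a minimal sketch of that variant; the function names and the toy matrix are ours, not part of the course code.

```python
import numpy as np

def share_positive_ratings(R, threshold):
    """Fraction of each item's ratings that are strictly above `threshold`."""
    n_ratings = (R > 0).sum(axis=0)
    n_positive = (R > threshold).sum(axis=0)
    # Items without any rating get 0 instead of a division by zero
    return np.divide(n_positive, n_ratings,
                     out=np.zeros(R.shape[1]), where=n_ratings > 0)


def highest_share_positive(R, threshold, n):
    return np.negative(share_positive_ratings(R, threshold)).argsort()[:n]


# Toy matrix: item 0 rated (5, 4), item 1 rated (2), item 2 rated (4, 1, 5)
R_toy = np.array([
    [5.0, 2.0, 4.0],
    [4.0, 0.0, 1.0],
    [0.0, 0.0, 5.0],
])
print(share_positive_ratings(R_toy, 3))     # item shares: 1.0, 0.0 and 2/3
print(highest_share_positive(R_toy, 3, 3))  # [0 2 1]
```

Like the plain mean discussed next, a share computed from one or two ratings is fragile: a single positive rating already gives 100%.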

## 6. Powering up with summary statistics

Until now we have used only the counts of the ratings as measures of popularity. But we can rely on good old statistics to help us out here for better recommendations.

Probably the most popular non-personalized algorithm is the average rating. Popularized first by Amazon and eBay and then by IMDb and Netflix, it is a basic yet widely used algorithm.

The first step is to remove the zeros so that they don't affect our average.

```python
def remove_zeros(R):
    # Work on a copy so that the original ratings matrix stays untouched
    R_ = R.copy()
    # Missing ratings become NaN, so nan-aware functions will ignore them
    R_[R_ == 0] = np.nan

    return R_


remove_zeros(R)
```

```
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [ 4., nan, nan, ..., nan, nan, nan],
       [ 5., nan, nan, ..., nan, nan, nan]], shape=(671, 9066))
```

And now we can safely compute the average rating per item and sort the items by it. We use the [np.nanmean](https://numpy.org/doc/stable/reference/generated/numpy.nanmean.html#numpy-nanmean) function, which ignores NaNs when computing the mean.

```python
def mean_ratings(R):
    R_ = remove_zeros(R)
    return np.nanmean(R_, axis=0)


mean_ratings(R)
```

```
array([3.87246964, 3.40186916, 3.16101695, ..., 5.        , 3.        ,
       5.        ], shape=(9066,))
```

```python
def best_mean_rating(R, n):
    R_ = mean_ratings(R)
    return np.negative(R_).argsort()[:n]


best_mean_rating(R, 3)
```

```
array([9065, 8119, 8125])
```

An alternative to the plain average is to compute the mean rating only over the users who liked the item, essentially ignoring the less favorable ratings.
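A minimal sketch of that idea, reusing the "strictly above 3" definition of a positive rating from section 5.2. The function name and the toy matrix are ours; items without any positive rating get NaN, which `argsort` places last.

```python
import numpy as np

def mean_positive_ratings(R, threshold):
    """Mean of each item's ratings, counting only ratings above `threshold`."""
    positive = R > threshold
    sums = np.where(positive, R, 0.0).sum(axis=0)
    counts = positive.sum(axis=0)
    # Items with no positive rating get NaN instead of a division by zero
    return np.divide(sums, counts,
                     out=np.full(R.shape[1], np.nan), where=counts > 0)


# Toy matrix: item 0 rated (5, 4), item 1 rated (2), item 2 rated (4, 1, 5)
R_toy = np.array([
    [5.0, 2.0, 4.0],
    [4.0, 0.0, 1.0],
    [0.0, 0.0, 5.0],
])
print(mean_positive_ratings(R_toy, 3))  # item means: 4.5, NaN and 4.5
```

Item 2's low rating of 1 no longer drags its mean down, which is exactly the point, and also the reason to read this score together with the number of ratings behind it.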

It's also increasingly popular to show a histogram alongside the mean rating, to give an idea of how the ratings are distributed, or to correct the mean for the number of ratings, so that items with very few ratings do not get an unfair advantage. Our top result above is a case in point: item 9065 has a single rating, a 5.
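One common way to correct the mean is the *damped mean* (also called a Bayesian average): each item gets `k` extra pseudo-ratings equal to the global mean rating, so items with few ratings are pulled towards the global mean while items with many ratings keep almost their own mean. This is a technique we are swapping in for illustration, not the method used in the rest of this BLU; the function names and the default `k=5` are arbitrary assumptions.

```python
import numpy as np

def damped_mean_ratings(R, k=5):
    """Per-item mean shrunk towards the global mean by k pseudo-ratings.

    damped_mean = (sum of ratings + k * global_mean) / (number of ratings + k)
    """
    is_rated = R > 0
    sums = R.sum(axis=0)              # zeros (missing ratings) add nothing
    counts = is_rated.sum(axis=0)
    global_mean = R[is_rated].mean()  # mean over all actual ratings
    return (sums + k * global_mean) / (counts + k)


def best_damped_mean(R, n, k=5):
    return np.negative(damped_mean_ratings(R, k)).argsort()[:n]


# Toy matrix: item 0 has one 5.0; item 1 has four 4.5s; item 2 has four 2.0s
R_toy = np.array([
    [5.0, 4.5, 2.0],
    [0.0, 4.5, 2.0],
    [0.0, 4.5, 2.0],
    [0.0, 4.5, 2.0],
])
print(best_damped_mean(R_toy, 3, k=2))  # [1 0 2]
```

The plain mean ranks item 0 first (5.0 versus 4.5), while the damped mean prefers item 1, whose 4.5 is backed by four ratings. A simpler alternative is to ignore items below a minimum number of ratings before sorting by the mean.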

## 7. Association rules

Perhaps one of the most interesting (and also very popular) non-personalized algorithms is "people who buy X also buy Y".

These kinds of algorithms are called **association rules**. The [mlxtend](http://rasbt.github.io/mlxtend/) library has the implementation of some of them.

Before looking at the association rules, we need to define a helper concept, **support**, and look at different metrics that measure the strength of the association rules.

### 7.1 Support

Support measures the frequency of a set of items in a database of transactions. Let's say we have a database of purchases in a supermarket - a table where rows are purchases and columns are items. The support of a set of purchased items is the fraction of purchases that contains that specific item set. For instance, in a pandemic, everybody buys toilet paper - the support of toilet paper is 1 because it is contained in every purchase made in the supermarket. If every tenth purchase contains toilet paper and bananas, the support of the item set {toilet paper, bananas} is 0.1.
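The definition translates directly into code for a one-hot table. The purchase table below is made up to echo the example, and the `support` helper is ours.

```python
import pandas as pd

# Made-up one-hot purchase table: rows are purchases, columns are items
purchases = pd.DataFrame({
    "toilet paper": [True, True, True, True],
    "bananas":      [True, False, True, False],
    "milk":         [False, True, True, False],
})

def support(df, itemset):
    """Fraction of rows (purchases) that contain every item in `itemset`."""
    return df[list(itemset)].all(axis=1).mean()

print(support(purchases, {"toilet paper"}))             # 1.0
print(support(purchases, {"toilet paper", "bananas"}))  # 0.5
print(support(purchases, {"bananas", "milk"}))          # 0.25
```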

Now onto the association rules.

### 7.2 Apriori

[Apriori](http://rasbt.github.io/mlxtend/api_subpackages/mlxtend.frequent_patterns/#apriori) is a function used to identify common item sets, i.e. stuff that usually goes together. Usually, we are interested only in item sets that are frequent, i.e. that have some minimal support (occurrence threshold). This is how the algorithm goes:

* We identify individual items that satisfy a minimum occurrence threshold (minimum support)
* We extend the item sets, adding one item at a time
* Every time we check if the resulting item set satisfies the specified threshold
* The algorithm stops when there are no more items to add that meet the threshold.

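Clearly, if a certain set meets the minimum occurrence threshold, all its subsets also meet it, because every purchase that contains the set also contains each of its subsets. This is what lets Apriori prune its search: a candidate item set with an infrequent subset can be discarded without counting it.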
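To make the steps above concrete, here is a plain-Python sketch of the level-wise search, including the subset-based pruning. It is a teaching sketch, not `mlxtend`'s implementation; the name `apriori_sketch` and the toy transactions are ours.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Level-wise frequent item set search.

    transactions: list of sets of items. Returns {frozenset: support}.
    """
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: individual items that meet the minimum support
    items = {item for t in transactions for item in t}
    frequent = {frozenset([i]): support(frozenset([i])) for i in items}
    frequent = {s: sup for s, sup in frequent.items() if sup >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Extend: union of two frequent (k-1)-sets that yields exactly k items
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent k-set must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Check: keep only the candidates that meet the threshold
        frequent = {c: support(c) for c in candidates}
        frequent = {s: sup for s, sup in frequent.items() if sup >= min_support}
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"toilet paper", "bananas"},
    {"toilet paper", "milk"},
    {"toilet paper", "bananas", "milk"},
    {"toilet paper"},
]
for itemset, sup in apriori_sketch(transactions, min_support=0.5).items():
    print(sorted(itemset), sup)
```

With `min_support=0.5`, the three single items and the pairs {toilet paper, bananas} and {toilet paper, milk} survive; {bananas, milk} (support 0.25) fails the check, so the only three-item candidate is pruned without being counted.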

For our database of purchases (rows are purchases, columns are items), `mlxtend` expects a one-hot input, so 0/1 or True/False to indicate which items belong to which purchase. The `min_support` parameter indicates the minimum occurrence threshold (fraction of purchases containing the given item set).

Unfortunately, `mlxtend` only supports dataframes at this point. The items in the selected item sets are returned as respective column indices. Let's write a function that takes the ratings matrix and gets us item sets with the given minimal support.

```python
def get_apriori_itemsets(R, min_support=0.3):
    # One-hot input: True where the user rated the movie
    R_ = pd.DataFrame(R > 0)
    return apriori(R_, min_support=min_support)


get_apriori_itemsets(R)
```

|   | support | itemsets |
|---|---|---|
| 0 | 0.368107 | (0) |
| 1 | 0.339791 | (100) |
| 2 | 0.433681 | (232) |
| 3 | 0.482861 | (266) |
| 4 | 0.463487 | (284) |
| 5 | 0.508197 | (321) |
| 6 | 0.317437 | (406) |
| 7 | 0.408346 | (427) |
| 8 | 0.363636 | (472) |
| 9 | 0.320417 | (521) |


After identifying the item sets with the `apriori` function, we can use the `association_rules` function to find out how strong the association between the items in each item set is.

The strength of association rules is measured with different metrics; we will look at two of them. Notice that association rules have a direction, because we may be interested in implications: do people who buy toilet paper also buy bananas? Do people who buy bananas also buy toilet paper? That's why we talk about antecedent and consequent items or item sets.

### 7.3 Confidence

Given two item sets $i$, called *antecedent*, and $j$, called *consequent*, the confidence is how frequently the item set $j$ is purchased given that the item set $i$ was purchased (what fraction of the people who buy toilet paper also buy bananas?):

$$Confidence\{i \to j \} = \frac{Support\{i, j\}}{Support\{i\}}$$

Or, in a more familiar way, confidence is the conditional probability of $j$ given $i$:

$$P(j|i) = \frac{P(i \cap j)}{P(i)}$$

However, do $i$ and $j$ occur for the same users for a reason, or is it random? What if $j$ is a trendy item?

### 7.4 Lift

Meet the toilet paper trap: just because people buy toilet paper all the time, it doesn't mean toilet paper goes well with bananas. Fortunately, there is a better way. 

Lift takes the popularity of the consequent into account by dividing the confidence by the support of $j$:

$$Lift\{i \to j\}\ =\ \frac{Confidence\{i \to j \}}{Support\{j\}}\ =\ \frac{Support\{i, j\}}{Support\{i\} * Support\{j\}}$$

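The denominator is the probability that $i$ and $j$ appear together by chance, i.e. if they were independent. A lift above 1 means that $i$ makes $j$ more likely, a lift of 1 means they are independent, and a lift below 1 means that $i$ makes $j$ less likely.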
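As a worked check of both formulas, take movies 232 and 953, whose supports appear in the rules table computed in section 7.5 (0.433681, 0.348733 and 0.302534 for the pair). Plugging them in reproduces the table's confidence and lift up to rounding of the inputs.

```python
# Supports for the movie pair (232, 953), as listed in the rules table
support_232 = 0.433681
support_953 = 0.348733
support_both = 0.302534

# Confidence depends on the direction of the rule
confidence_232_to_953 = support_both / support_232
confidence_953_to_232 = support_both / support_953

# Lift is symmetric: Support{i, j} / (Support{i} * Support{j})
lift = support_both / (support_232 * support_953)

print(round(confidence_232_to_953, 4))  # 0.6976
print(round(confidence_953_to_232, 4))  # 0.8675
print(round(lift, 4))                   # 2.0004
```

A lift of about 2 means that users who rated movie 232 are roughly twice as likely as an average user to have rated movie 953, and vice versa.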

`mlxtend` implements several more metrics, such as *leverage* and *conviction*, which measure the independence of the item sets. Check out the definitions [here](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/#metrics).

### 7.5 Association rules
Let's look at the association rules between the items of two-item item sets from our ratings matrix (antecedent and consequent).

The [association_rules](http://rasbt.github.io/mlxtend/api_subpackages/mlxtend.frequent_patterns/#association_rules) method returns a table where each row is an association rule with its strength indicated by different metrics:

```python
def get_rules(R, min_support=0.3, min_threshold=1.2):
    # Frequent item sets with at least `min_support` support
    itemsets = get_apriori_itemsets(R, min_support=min_support)
    # Keep rules with lift >= min_threshold; num_itemsets is documented by
    # mlxtend as the number of transactions (here, users) in the input data
    return association_rules(itemsets, num_itemsets=R.shape[0],
                             metric="lift", min_threshold=min_threshold)


get_rules(R)
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (232) | (953) | 0.433681 | 0.348733 | 0.302534 | 0.697595 | 2.000367 | 1.0 | 0.151295 | 2.153621 | 0.883057 | 0.630435 | 0.535666 | 0.782558 |
| 1 | (953) | (232) | 0.348733 | 0.433681 | 0.302534 | 0.867521 | 2.000367 | 1.0 | 0.151295 | 4.274794 | 0.767875 | 0.630435 | 0.766071 | 0.782558 |
| 2 | (266) | (284) | 0.482861 | 0.463487 | 0.326379 | 0.675926 | 1.458348 | 1.0 | 0.102578 | 1.655525 | 0.607753 | 0.526442 | 0.395962 | 0.690053 |
| 3 | (284) | (266) | 0.463487 | 0.482861 | 0.326379 | 0.70418 | 1.458348 | 1.0 | 0.102578 | 1.748153 | 0.585807 | 0.526442 | 0.427968 | 0.690053 |
| 4 | (321) | (266) | 0.508197 | 0.482861 | 0.344262 | 0.677419 | 1.402927 | 1.0 | 0.098874 | 1.60313 | 0.583983 | 0.532258 | 0.37622 | 0.695191 |
| 5 | (266) | (321) | 0.482861 | 0.508197 | 0.344262 | 0.712963 | 1.402927 | 1.0 | 0.098874 | 1.713379 | 0.555373 | 0.532258 | 0.416358 | 0.695191 |
| 6 | (266) | (525) | 0.482861 | 0.453055 | 0.33383 | 0.691358 | 1.525991 | 1.0 | 0.115067 | 1.772101 | 0.666529 | 0.554455 | 0.435698 | 0.7141 |
| 7 | (525) | (266) | 0.453055 | 0.482861 | 0.33383 | 0.736842 | 1.525991 | 1.0 | 0.115067 | 1.965127 | 0.630206 | 0.554455 | 0.491127 | 0.7141 |
| 8 | (321) | (284) | 0.508197 | 0.463487 | 0.321908 | 0.633431 | 1.366663 | 1.0 | 0.086365 | 1.463607 | 0.545525 | 0.495413 | 0.316756 | 0.663982 |
| 9 | (284) | (321) | 0.463487 | 0.508197 | 0.321908 | 0.694534 | 1.366663 | 1.0 | 0.086365 | 1.610009 | 0.500064 | 0.495413 | 0.378885 | 0.663982 |
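To turn rules like these into "people who watched X also watched Y" recommendations, one option is to keep the rules whose antecedents the user has already rated and rank their consequents by lift. The sketch below copies a few rows of the table above into a small dataframe (antecedents and consequents as frozensets, as `association_rules` returns them); the helper `recommend_from_rules` is hypothetical, and the items are still matrix column indices rather than movieIds.

```python
import pandas as pd

# A few rules copied from the table above
rules = pd.DataFrame({
    "antecedents": [frozenset({232}), frozenset({953}), frozenset({266}),
                    frozenset({266}), frozenset({266})],
    "consequents": [frozenset({953}), frozenset({232}), frozenset({284}),
                    frozenset({321}), frozenset({525})],
    "lift": [2.000367, 2.000367, 1.458348, 1.402927, 1.525991],
})

def recommend_from_rules(rules, seen, n=3):
    """Consequents of rules whose antecedents were all seen, ranked by lift."""
    seen = set(seen)
    applicable = rules[rules["antecedents"].apply(lambda a: a <= seen)]
    ranked = applicable.sort_values("lift", ascending=False)
    recs = []
    for consequent in ranked["consequents"]:
        for item in consequent:
            # Skip items the user already rated and duplicates
            if item not in seen and item not in recs:
                recs.append(item)
    return recs[:n]

print(recommend_from_rules(rules, seen={266}))  # [525, 284, 321]
```

Note that these recommendations are still non-personalized in spirit: every user who rated item 266 gets the same list.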


Wrapping up: 
- Non-personalized recommenders do not take into account specific user preferences or characteristics.
- Non-personalized approaches are the simplest way to design recommendation engines.
- It's really important to know how to handle matrix sparsity, as it will affect your workflow all the way through.

Now we have the foundations to tackle more complex recommendation approaches, coming up in the next BLUs.

Time to practice!