*Copyright (c) Cornac Authors. All rights reserved.*

*Licensed under the Apache 2.0 License.*

# Model Ensembling

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

This notebook provides an example of how to ensemble multiple recommendation models in Cornac.

Ensemble models is a technique that combines the predictions of multiple models to produce a single prediction. The idea is that by combining the predictions of multiple models, we can improve the overall performance of the recommendation system.

We will use the MovieLens 100K dataset and ensemble 2 models.

** Note: ** Part of this notebook (in Section 4) uses the `scikit-learn` package. 

## 1. Setup

### 1.1 Install required dependencies

In [None]:
! pip install seaborn scikit-learn cornac==2.2.2 tensorflow==2.12.0

In [38]:
from IPython.display import display
import numpy as np
import pandas as pd
from tqdm import tqdm

from cornac.datasets import movielens
from cornac.models import BPR, WMF
from cornac.eval_methods import RatioSplit
from cornac.metrics import Precision, Recall
from cornac.utils import cache
from cornac import Experiment

from sklearn import linear_model
from sklearn.ensemble import RandomForestRegressor

## 2. Prepare Experiment

### 2.1 Loading Dataset

First, we load the MovieLens 100K dataset.

In [13]:
data = movielens.load_feedback(variant="100K") # Load MovieLens Dataset

rs = RatioSplit(data, test_size=0.2, seed=42, verbose=True) # Split to train-test set to 80-20
train_set, test_set = rs.train_set, rs.test_set

rating_threshold = 1.0
exclude_unknowns = True
---
Training data:
Number of users = 943
Number of items = 1651
Number of ratings = 80000
Max rating = 5.0
Min rating = 1.0
Global mean = 3.5
---
Test data:
Number of users = 943
Number of items = 1651
Number of ratings = 19964
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 943
Total items = 1651


### 2.2 Training BPR and WMF models

We will train two models: 

1. BPR (Bayesian Personalized Ranking)
2. WMF (Weighted Matrix Factorization)

In [14]:
bpr_model = BPR(k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001) # Initialize BPR model
wmf_model = WMF(k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01,) # Initialize WMF model

models = [bpr_model, wmf_model]
metrics = [Precision(k=50), Recall(k=50)] # Set metrics for experiment

experiment = Experiment(rs, models, metrics, user_based=True).run() # Run Experiment to compare BPR model to WMF model individually


[BPR] Training started!

[BPR] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


[WMF] Training started!


  0%|          | 0/300 [00:00<?, ?it/s]

Learning completed!

[WMF] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


TEST:
...
    | Precision@50 | Recall@50 | Train (s) | Test (s)
--- + ------------ + --------- + --------- + --------
BPR |       0.1801 |    0.5060 |    0.1678 |   0.7775
WMF |       0.1764 |    0.5131 |   22.1048 |   0.4298



Comparing Precision and Recall, both BPR and WMF are providing comparable results.

Let's move on to try to interpret these results by using the genres of movies that were recommended to us.

Generally, we could assume that if an individual likes a particular film genre like 'Romance', the recommender system should provide more of such 'Romance' films.

### 2.3 Interpreting Results

##### Creating a Movie Genre Dataframe

In [15]:
# Creating a dataframe of movies with its corresponding genres

# Download some information of MovieLens 100K dataset
item_df = pd.read_csv(
  cache("http://files.grouplens.org/datasets/movielens/ml-100k/u.item"), 
  sep="|", encoding="ISO-8859-1",
  names=["ItemID", "Title", "Release Date", "Video Release Date", "IMDb URL", 
         "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", 
         "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", 
         "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
).set_index("ItemID").drop(columns=["Video Release Date", "IMDb URL", "unknown"])

item_idx2id = train_set.item_ids # create a item index to film ID mapping
user_idx2id = train_set.user_ids

# Let's take a look at an example of this dataframe
display(item_df.head(3))


Data from http://files.grouplens.org/datasets/movielens/ml-100k/u.item
will be cached into /home/ubuntu/.cornac/u.item


0.00B [00:00, ?B/s]

File cached!


Unnamed: 0_level_0,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
ItemID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
1,Toy Story (1995),01-Jan-1995,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,GoldenEye (1995),01-Jan-1995,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,Four Rooms (1995),01-Jan-1995,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


The `item_df` dataframe consists of all movie items with its corresponding genre attributes.

Further down below, we are going to filter this table with the recommendations that we get from the recommender system models we created to get a better sense of its performance.

##### Creating Training Data Dataframe

To get a sense of what data has been inserted into our model for training, let's count the genres of the training data used to train the model.

But first, let's create a `training_data_df` dataframe with all training data.

The training data consists of 80000 triplets of **User Index**, **Item Index** and **Rating** rows as seen in the dataset summary in Section 2.1.

In [16]:
# Let's view a sample of the training data dataframe
print("Sample row of record:")
print("(user_index, item_index, rating):", list(zip(*train_set.uir_tuple))[0])

# Create a training data dataframe
training_data_df = pd.DataFrame(zip(*train_set.uir_tuple)) # adding all training data into dataframe
training_data_df.columns = ['user_idx', 'item_idx', 'rating'] # adding column names to the data

# Add new column, 'item_id', for further filtering in later sections
training_data_df['item_id'] = training_data_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # converted from the item index field

Sample row of record:
(user_index, item_index, rating): (0, 0, 4.0)


##### Filtering Training Data

Let's filter based on a particular user to learn more about the user.

We set ``UIDX`` to user index **3**, and ``TOPK`` to **50**, to get the top 50 recommendations in each model for comparison.

In [17]:
UIDX = 3
TOPK = 50

In [18]:
# Filter training data (rating = 5.0 and user index = UIDX)
filter_df = training_data_df[(training_data_df['rating'] == 5.0) & (training_data_df['user_idx'] == UIDX)]
filter_df = item_df.loc[[int(item_id) for item_id in filter_df["item_id"]]] # get genres of movie items

print("Number of movies:", len(filter_df))

# Group by Movie Genre and Sum by genres
filter_df = filter_df.select_dtypes(np.number).sum() 
filter_df = filter_df.to_frame("Sum") # Let's call that column 'Sum'

# Add a new column '%' for the percentage of individual genre sum compared to total sum
filter_df["%"] = filter_df["Sum"] / filter_df["Sum"].sum() * 100
filter_df["%"] = filter_df["%"].round(1)

# Let's see the training data genres, sums and percentages
print("Movies rated 5.0 by user index 3 in training data")
display(filter_df.sort_values("Sum", ascending=False))

Number of movies: 138
Movies rated 5.0 by user index 3 in training data


Unnamed: 0,Sum,%
Drama,71,24.9
Comedy,39,13.7
Romance,32,11.2
Action,30,10.5
Thriller,29,10.2
Adventure,19,6.7
War,15,5.3
Crime,12,4.2
Sci-Fi,9,3.2
Mystery,8,2.8


As shown above in the training data, the top genres for user index 3 with movies rated 5.0 include 'Drama', 'Comedy', 'Romance', 'Action' and 'Thriller'.

Let's now compare them to the recommendations of the BPR and WMF models respectively.

##### Interpreting Recommendations of BPR, WMF Models

In [19]:
# Get the Top 5 Genres in filtered training data for user index 3
top_genres = filter_df.sort_values("Sum", ascending=False).head(5).index.tolist()
print("\nTop 5 Genres in training data:", top_genres)

# Get top K recommendations for BPR and put them into the genre dataframe
bpr_recommendations, bpr_scores = bpr_model.rank(UIDX, k=TOPK) # rank recommendations by score, limit to top K
print(len(bpr_recommendations))
bpr_topk = [item_idx2id[iidx] for iidx in bpr_recommendations[:TOPK]] # convert item indexes into item ids
bpr_df = item_df.loc[[int(iid) for iid in bpr_topk]] # filter the movie genre dataframe by item ids

# Let's view the top recommendations for BPR by top genres
display("BPR: Top recommendations", bpr_df[["Title"] + top_genres].head(10))

# Now, let's do likewise for WMF - get top K recommendations and put them into the genre dataframe
wmf_recommendations, wmf_scores = wmf_model.rank(UIDX, k=TOPK) # rank recommendations by score
wmf_topk = [item_idx2id[iidx] for iidx in wmf_recommendations[:TOPK]] # convert item indexes into item ids
wmf_df = item_df.loc[[int(iid) for iid in wmf_topk]] # filter the movie genre dataframe by item ids

# View the top recommendations for WMF
display("WMF: Top recommendations", wmf_df[["Title"] + top_genres].head(10))


Top 5 Genres in training data: ['Drama', 'Comedy', 'Romance', 'Action', 'Thriller']
1651


'BPR: Top recommendations'

Unnamed: 0_level_0,Title,Drama,Comedy,Romance,Action,Thriller
ItemID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
402,Ghost (1990),0,1,1,0,1
215,Field of Dreams (1989),1,0,0,0,0
245,"Devil's Own, The (1997)",1,0,0,1,1
655,Stand by Me (1986),1,1,0,0,0
77,"Firm, The (1993)",1,0,0,0,1
82,Jurassic Park (1993),0,0,0,1,0
318,Schindler's List (1993),1,0,0,0,0
69,Forrest Gump (1994),0,1,1,0,0
97,Dances with Wolves (1990),1,0,0,0,0
125,Phenomenon (1996),1,0,1,0,0


'WMF: Top recommendations'

Unnamed: 0_level_0,Title,Drama,Comedy,Romance,Action,Thriller
ItemID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
313,Titanic (1997),1,0,1,1,0
272,Good Will Hunting (1997),1,0,0,0,0
318,Schindler's List (1993),1,0,0,0,0
98,"Silence of the Lambs, The (1991)",1,0,0,0,1
50,Star Wars (1977),0,0,1,1,0
402,Ghost (1990),0,1,1,0,1
66,While You Were Sleeping (1995),0,1,1,0,0
79,"Fugitive, The (1993)",0,0,0,1,1
181,Return of the Jedi (1983),0,0,1,1,0
8,Babe (1995),1,1,0,0,0


Now that we have seen the top recommendations of the BPR and WMF models, let's do a comparison by taking a look at the genre distribution.

##### Comparing Models by Genre Distribution

In [20]:
# Let's introduce `combined_df` for comparison.
# This dataframe will be used to compare models by summing up genres from recommendations of different models
combined_df = pd.DataFrame({
    "Train Data %": filter_df["%"],
    "BPR Sum": bpr_df.select_dtypes(np.number).sum(), # group by genres, then get sum of each genre
    "WMF Sum": wmf_df.select_dtypes(np.number).sum() # likewise for WMF
})

# Get percentages of movie genre sums
combined_df['BPR %'] = combined_df['BPR Sum'] / combined_df['BPR Sum'].sum() * 100 
combined_df["WMF %"] = combined_df["WMF Sum"] / combined_df["WMF Sum"].sum() * 100

combined_df = combined_df.round(1) # round all 
combined_df = combined_df.sort_values("Train Data %", ascending=False)

# Let's take a look at the genre distribution by percentages
display("Train Data to Recommended % Distribution", combined_df[['Train Data %', 'BPR %', 'WMF %']])

'Train Data to Recommended % Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %
Drama,24.9,22.8,22.1
Comedy,13.7,8.9,10.7
Romance,11.2,17.1,17.2
Action,10.5,11.4,10.7
Thriller,10.2,7.3,6.6
Adventure,6.7,6.5,5.7
War,5.3,8.1,8.2
Crime,4.2,0.0,2.5
Sci-Fi,3.2,6.5,5.7
Mystery,2.8,1.6,1.6


Now that we have seen the distribution of individual models, we are curious about what kind of distribution we will get from ensembling these models.

Let's see what happens when we ensemble these two models. 

## 2. Simple model ensembling by Borda Count

We will ensemble the two models using the Borda Count method. The Borda Count method is a simple voting method that ranks the items based on the sum of their ranks from each model.

Assuming that we have a list of 5 items, the Borda Count method works as follows:

1. For each model, rank the items from 1 to 5 based on the predicted scores.
2. Sum the ranks of each item across all models.
3. Sort the items based on the sum of their ranks.
4. The top-ranked item is the final recommendation.
5. Repeat the process for the next user.

Given the below example for a random user 123:

| Rank | Model 1 | Model 2 | Model 3 | Allocated Points (N - rank) |
|------|---------|---------|---------|-----------------------------|
| 1    | A       | D       | E       | 5 - 1 = 4                   |
| 2    | B       | C       | A       | 5 - 2 = 3                   |
| 3    | C       | A       | B       | 5 - 3 = 2                   |
| 4    | D       | B       | D       | 5 - 4 = 1                   |
| 5    | E       | E       | C       | 5 - 5 = 0                   |

Based on the allocated points for each of the items, we sum the points up to get our Borda Count.

| Item | Model 1 | Model 2 | Model 3 | Borda Count  |
|------|---------|---------|---------|--------------|
| A    | 4       | 2       | 3       | 9            |
| B    | 3       | 1       | 2       | 6            |
| C    | 2       | 3       | 0       | 5            |
| D    | 1       | 4       | 1       | 6            |
| E    | 0       | 0       | 4       | 4            |

New ranking: A > B, D > C > E


Lets implement this method below.

In [21]:
# Let's create a new dataframe to calculate ranking and borda count
rank_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

total_items = len(rank_df) # 1651 items

# Obtain points (inverse of rank) of the items based on the BPR score
rank_df["BPR Score"] = bpr_scores
rank_df["BPR Rank"] = rank_df["BPR Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
rank_df["BPR Points"] = total_items - rank_df["BPR Rank"] # Get points by calculating ('Total Item count' - 'Rank')

# Do likewise for WMF
rank_df["WMF Score"] = wmf_scores
rank_df["WMF Rank"] = rank_df["WMF Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
rank_df["WMF Points"] = total_items - rank_df["WMF Rank"] # Get points by calculating ('Total Item count' - 'Rank')

# Get Borda Count by summing up points of BPR and WMF
rank_df["Borda Count"] = rank_df["BPR Points"] + rank_df["WMF Points"]

# Round decimal places for readability purposes
rank_df = rank_df.round(3)

# Now let's take a look at the table with Borda Count 
display(rank_df.head(5))

Unnamed: 0,ItemID,BPR Score,BPR Rank,BPR Points,WMF Score,WMF Rank,WMF Points,Borda Count
0,381,1.449,417,1234,3.501,276,1375,2609
1,602,-1.344,1100,551,2.105,712,939,1490
2,431,1.107,483,1168,3.547,264,1387,2555
3,875,1.08,492,1159,2.224,669,982,2141
4,182,2.22,253,1398,3.208,360,1291,2689


Now that we have Borda Count, let's rerank this list and to provide the ensembled model's recommendation.

In [22]:
# Introduce reranked dataframe for borda count
reranked_df = rank_df.sort_values("Borda Count", ascending=False)

# Let's take a look at the ensembled top 5 recommendations and their respective ranks
display("Re-ranked Top K Item recommendations", reranked_df.head(5))

'Re-ranked Top K Item recommendations'

Unnamed: 0,ItemID,BPR Score,BPR Rank,BPR Points,WMF Score,WMF Rank,WMF Points,Borda Count
279,402,6.134,1,1650,5.281,6,1645,3295
37,318,4.95,7,1644,5.616,3,1648,3292
92,50,4.424,21,1630,5.411,5,1646,3276
305,181,4.464,20,1631,5.13,9,1642,3273
97,97,4.839,9,1642,4.915,20,1631,3273


The top recommendation ItemID **181**, was ranked **7** on BPR and **12** on WMF.

By doing Borda Count, we are able to aggregate model recommendations.

Next, let's add the recommendations into the genre distribution dataframe to compare its results to the base models.

In [23]:
borda_count_topk = reranked_df["ItemID"].values[:TOPK] # Get top K (50) Item IDs

borda_df = item_df.loc[[int(i) for i in borda_count_topk]] # Filter genre data frame by the top item IDs

# Add Borda Count results into 'combined_df' dataframe for comparison
combined_df["Borda Count Sum"] = borda_df.select_dtypes(np.number).sum() # group by genre, and calculate sum of each genre
combined_df["Borda Count %"] = combined_df["Borda Count Sum"] / combined_df["Borda Count Sum"].sum() * 100 # Calculate percentage of sum to total
combined_df["Borda Count %"] = combined_df["Borda Count %"].round(1) # rounding for readability purposes

# Let's take a look at the genre distribution of train data, BPR, WMF and the newly added Borda Count
display("Borda Count Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %"]])

'Borda Count Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %
Drama,24.9,22.8,22.1,18.5
Comedy,13.7,8.9,10.7,11.5
Romance,11.2,17.1,17.2,16.9
Action,10.5,11.4,10.7,13.1
Thriller,10.2,7.3,6.6,7.7
Adventure,6.7,6.5,5.7,8.5
War,5.3,8.1,8.2,6.9
Crime,4.2,0.0,2.5,0.0
Sci-Fi,3.2,6.5,5.7,6.2
Mystery,2.8,1.6,1.6,1.5


In the next section, we will further add more models to the ensemble.

## 3. Adding more models to the Borda Count ensemble

We can easily add more models to the ensemble by training them and adding them. One approach is to train a model with different initializations using different random seeds. By adding multiple similar models of different random seeds (`seed=123`), some models could perform better for a set of users, while other models could perform better for another set of users.

Another way is to change the number of latent factors `k`. By achieving better performance on a subset of data for each of the models, we are able to improve the performance of the ensembled model as a whole.

By ensembling these models, we could potentially achieve better performance when combined.

Let's try adding a few more similar models with different random seed initializations.

In [24]:
# WMF models with different seeds
wmf_model_123 = WMF(name="WMF_123", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_456 = WMF(name="WMF_456", k=20, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=456)
wmf_model_789 = WMF(name="WMF_789", k=30, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=789)
wmf_model_888 = WMF(name="WMF_888", k=40, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=888)
wmf_model_999 = WMF(name="WMF_999", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=999)

models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999]

# Let's run an experiment to take a look at how different these models are, with just different random seeds!
experiment = Experiment(rs, models, metrics, user_based=True).run()


[WMF_123] Training started!


  0%|          | 0/300 [00:00<?, ?it/s]

Learning completed!

[WMF_123] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


[WMF_456] Training started!


  0%|          | 0/300 [00:00<?, ?it/s]

Learning completed!

[WMF_456] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


[WMF_789] Training started!


  0%|          | 0/300 [00:00<?, ?it/s]

Learning completed!

[WMF_789] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


[WMF_888] Training started!


  0%|          | 0/300 [00:00<?, ?it/s]

Learning completed!

[WMF_888] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


[WMF_999] Training started!


  0%|          | 0/300 [00:00<?, ?it/s]

Learning completed!

[WMF_999] Evaluation started!


Ranking:   0%|          | 0/940 [00:00<?, ?it/s]


TEST:
...
        | Precision@50 | Recall@50 | Train (s) | Test (s)
------- + ------------ + --------- + --------- + --------
WMF_123 |       0.1750 |    0.5103 |   15.7512 |   0.4023
WMF_456 |       0.1779 |    0.5217 |   16.6532 |   0.3236
WMF_789 |       0.1737 |    0.5130 |   17.0526 |   0.3801
WMF_888 |       0.1705 |    0.5032 |   18.1542 |   0.3958
WMF_999 |       0.1656 |    0.4938 |   17.8338 |   0.3317



Based on the results, we can see that even within the same model, the results can vary. 

Let's try ensembling all these models together into 1 single model by Borda Count, and look at its recommendations.

In [25]:
# Let's create a different dataframe to calculate ranking and borda count
rank_2_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

# Add a column named 'Ensembled WMF Model'
rank_2_df["Ensembled WMF Model"] = 0

# Calculate the points (inverse of rank) for each of the models and accumulate them into the 'Enhanced Borda Count' column
for model in models:
    name = model.name
    recommendations, scores = model.rank(UIDX)
    rank_2_df[name + "_rating"] = scores
    rank_2_df[name + "_rank"] = rank_2_df[name + "_rating"].rank(ascending=False).astype(int)
    rank_2_df[name + "_points"] = total_items - rank_2_df[name + "_rank"]
    rank_2_df["Ensembled WMF Model"] = rank_2_df["Ensembled WMF Model"] + rank_2_df[name + "_points"]

# Round results for readability
rank_2_df = rank_2_df.round(3)

# Let's sort and view the top recommendations!
print("Model score calculation:")
display(rank_2_df[["Ensembled WMF Model"]].sort_values("Ensembled WMF Model", ascending=False).head(10))

Model score calculation:


Unnamed: 0,Ensembled WMF Model
37,8229
197,8179
152,8151
279,8127
532,8118
156,8117
147,8109
522,8085
61,8070
670,8065


In [26]:
# Now, let's add them to the combined dataframe for comparison with earlier models
enhanced_borda_count_topk = list(rank_2_df.sort_values("Ensembled WMF Model", ascending=False)["ItemID"].values[:TOPK])
enhanced_borda_df = item_df.loc[[int(i) for i in enhanced_borda_count_topk]]

combined_df["Ensembled WMF Model Sum"] = enhanced_borda_df.select_dtypes(np.number).sum()
combined_df["Ensembled WMF Model %"] = combined_df["Ensembled WMF Model Sum"] / combined_df["Ensembled WMF Model Sum"].sum() * 100
combined_df["Ensembled WMF Model %"] = combined_df["Ensembled WMF Model %"].round(1)

# Let's compare the recommendation distribution
display("Combined Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %", "Ensembled WMF Model %"]])

'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %,Ensembled WMF Model %
Drama,24.9,22.8,22.1,18.5,23.9
Comedy,13.7,8.9,10.7,11.5,13.7
Romance,11.2,17.1,17.2,16.9,16.2
Action,10.5,11.4,10.7,13.1,10.3
Thriller,10.2,7.3,6.6,7.7,6.8
Adventure,6.7,6.5,5.7,8.5,7.7
War,5.3,8.1,8.2,6.9,5.1
Crime,4.2,0.0,2.5,0.0,5.1
Sci-Fi,3.2,6.5,5.7,6.2,5.1
Mystery,2.8,1.6,1.6,1.5,3.4


Now that we have touched on simple borda count, let's see how we could use other methods and popular packages such as **scikit-learn** to do advanced model ensembling.

## 4. Model Ensembling via Regression Models

We could continue by thinking of this as a meta-learning problem. We could treat recommendations of each base model as features and train a meta-learner to predict the final recommendation.

This could be any ML model such as a Linear Regression, Random Forest, Gradient Boosting, or even a Neural Network.

In this example, we will use a simple Linear Regression model to predict the final recommendation.

We will teach a model to learn from the different outputs of the base models of WMF.

##### 4.1 Prepare Data

In [27]:
# First, lets create training and test data dataframes
training_df = pd.DataFrame(zip(*train_set.uir_tuple)) # Add 'User Index', 'Item Index', 'Rating' triples as records in dataframe
training_df.columns = ['user_idx', 'item_idx', 'ground_score'] # Set column names

# Get all possible user_index, item_index combinations, add them into dataframe for inference
all_df = pd.DataFrame({
    "user_idx": [user_idx for user_idx in range(train_set.num_users) for _ in range(train_set.num_items)],
    "item_idx": [item_idx for _ in range(train_set.num_users) for item_idx in range(train_set.num_items)],
})
all_df['item_id'] = all_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # Add 'Item ID' column into dataframe by converting 'Item Index' to 'Item ID'

# Lets get all the scores for the models trained in Part 3.

# For each model, we add individual predicted ratings by individual models to training and test dataframes
for model in models:
    name = model.name
    training_df[name + "_score"] = training_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1) # for training
    all_df[name + "_score"] = all_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1) # for inference

# Let's pick out the 5 features - predicted ratings from the 5 models trained
X_train = training_df[['WMF_123_score', 'WMF_456_score', 'WMF_789_score', 'WMF_888_score', 'WMF_999_score']]
y_train = training_df['ground_score'] # use ground truth to train this linear regression model
X_inference = all_df[['WMF_123_score', 'WMF_456_score', 'WMF_789_score', 'WMF_888_score', 'WMF_999_score']] # all data, used to predict values for ranking

display("Training features", X_train.head(3)) # predicting ratings as features
display("Target values", y_train.head(3)) # ground truth ratings
display("Inference Data", X_inference.head(3)) # all inference data

'Training features'

Unnamed: 0,WMF_123_score,WMF_456_score,WMF_789_score,WMF_888_score,WMF_999_score
0,2.110029,3.753359,3.692867,3.765342,3.971787
1,2.791619,2.339686,2.41794,2.736782,3.102277
2,3.751,3.568215,3.424445,3.411396,3.831695


'Target values'

0    4.0
1    3.0
2    4.0
Name: ground_score, dtype: float64

'Inference Data'

Unnamed: 0,WMF_123_score,WMF_456_score,WMF_789_score,WMF_888_score,WMF_999_score
0,2.110029,3.753359,3.692867,3.765342,3.971787
1,0.807322,-0.628235,-0.064411,-0.290574,-0.586965
2,1.648435,1.846543,1.81588,1.876733,1.527234


Now that we have already prepared the data for fitting into a **scikit-learn** model, let's first try to train a Linear Regression model

##### 4.2 Fitting Linear Regression Model

In [28]:
# Let's now fit into a Linear Regression model
regr = linear_model.LinearRegression(fit_intercept=False) # force model to only use predictions from WMF models
regr.fit(X_train, y_train) # train the model

# Input: 5 base model predicted ratings. Output: final predicted rating based on linear regression
y_pred = regr.predict(X_inference) # Get predictions based on trained model

all_df["Linear Regression Prediction"] = y_pred # create a column in `test_df` for the predictions

# Get Top K ratings from predictions
all_df = all_df.sort_values("Linear Regression Prediction", ascending=False) # sort by predicted ratings
top_item_ids = all_df[all_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # filter top K (50 as set in Section 2.3)

# Place them into the comparison distribution dataframe
linear_regression_df = item_df.loc[[int(i) for i in top_item_ids]] # Get genres of ratings
combined_df["Linear Regression Sum"] = linear_regression_df.select_dtypes(np.number).sum() # group by genre and sum them up
combined_df["Linear Regression %"] = combined_df["Linear Regression Sum"] / combined_df["Linear Regression Sum"].sum() * 100 # get percentages of (genre sum / whole sum)

combined_df["Linear Regression %"] = combined_df["Linear Regression %"].round(1) # round values for readability

print(regr.coef_)
print(regr.intercept_)

# Now let's take a look at how the genre distribution is
display("Combined Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %", "Ensembled WMF Model %", "Linear Regression %"]])

[-0.18551166 -0.13749951  0.1434352   0.43128878  0.86336553]
0.0


'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %,Ensembled WMF Model %,Linear Regression %
Drama,24.9,22.8,22.1,18.5,23.9,26.4
Comedy,13.7,8.9,10.7,11.5,13.7,7.3
Romance,11.2,17.1,17.2,16.9,16.2,14.5
Action,10.5,11.4,10.7,13.1,10.3,7.3
Thriller,10.2,7.3,6.6,7.7,6.8,9.1
Adventure,6.7,6.5,5.7,8.5,7.7,8.2
War,5.3,8.1,8.2,6.9,5.1,4.5
Crime,4.2,0.0,2.5,0.0,5.1,4.5
Sci-Fi,3.2,6.5,5.7,6.2,5.1,6.4
Mystery,2.8,1.6,1.6,1.5,3.4,1.8


Explanation, interpretation. Last model contributes the most >>

We have successfully fit into a Linear Regression model using the 5 WMF base models built on Cornac.

You could continue using different regression models on **scikit-learn** to further predict ratings.

In the section below, we will train a Random Forest model.

##### 4.3 Fitting Random Forest Model

We reuse the same training data. >> Technically can use any machine learning model.


In [29]:
# Let's now train a Random Forest model
randomforest_model = RandomForestRegressor(n_estimators=50, random_state=42) 
randomforest_model.fit(X_train, y_train) # Train the model

# Input: 5 base model predicted ratings. Output: final predicted rating based on random forest
y_pred = randomforest_model.predict(X_inference)

all_df["Random Forest Score"] = y_pred # create a column in `all_df` for the predictions

# Get Top K ratings from predictions
all_df = all_df.sort_values("Random Forest Score", ascending=False) # sort by predicted ratings
top_item_ids = all_df[all_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # filter top K (50 as set in Section 2.3)

# Place them into the comparison distribution dataframe
random_forest_df = item_df.loc[[int(i) for i in top_item_ids]] # Get genres of ratings
combined_df["Random Forest Sum"] = random_forest_df.select_dtypes(np.number).sum() # group by genre and sum them up
combined_df["Random Forest %"] = combined_df["Random Forest Sum"] / combined_df["Random Forest Sum"].sum() * 100 # get percentages of (genre sum / whole sum)

combined_df["Random Forest %"] = combined_df["Random Forest %"].round(1) # round values for readability

# Now let's take a look at how the genre distribution is
display("Combined Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %", "Ensembled WMF Model %", "Linear Regression %", "Random Forest %"]])

'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %,Ensembled WMF Model %,Linear Regression %,Random Forest %
Drama,24.9,22.8,22.1,18.5,23.9,26.4,27.9
Comedy,13.7,8.9,10.7,11.5,13.7,7.3,17.4
Romance,11.2,17.1,17.2,16.9,16.2,14.5,9.3
Action,10.5,11.4,10.7,13.1,10.3,7.3,5.8
Thriller,10.2,7.3,6.6,7.7,6.8,9.1,9.3
Adventure,6.7,6.5,5.7,8.5,7.7,8.2,2.3
War,5.3,8.1,8.2,6.9,5.1,4.5,2.3
Crime,4.2,0.0,2.5,0.0,5.1,4.5,4.7
Sci-Fi,3.2,6.5,5.7,6.2,5.1,6.4,3.5
Mystery,2.8,1.6,1.6,1.5,3.4,1.8,2.3


What happened in Section 4 was that we trained a linear regression model and random forest model to learn from the different base models, adapting to the changes in features based on how these base models are configured to be.

---

Ensemble learning doesn't stop here. You could continue on by: 
- Trying different Cornac base models
- Trying different ensemble methods
- Tweaking around base models and different parameters within them

By having different base models with different specializations, model ensembling could leverage on the strengths of different models. Effectiveness of these models could depend on many factors such as diversity and quality of base models, dataset size and quality.

It is also important to note that while model ensembling theoretically provides superior performance, there could be instances where base models outperforms ensembled models. Model ensembling also requires more computational resources as well. Therefore, we should consider striking a balance between performanace and computational costs.

---

So what constitutes to a good ensembled model? Which base models and configurations are ideal? These are topics that require further experimentation and discussion.

## 5. Experimental Comparison

Let's add the base models (BPR, WMF), Borda count, Enhanced WMF model into `all_df` so that we can compare them with the ranking metrics set.

In [32]:
additional_models = [bpr_model, wmf_model]

for model in additional_models:
    name = model.name
    all_df[name + "_score"] = all_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1) # for inference

# TODO: Add borda acount, enhanced WMF. 
# Will need to calculate scores of all users, group users then calculate for borda count. Likewise for enhanced WMF.

# Let's create a new dataframe to calculate ranking and borda count
# rank_df = pd.DataFrame({
#     "ItemID": item_idx2id,
# })

# total_items = len(rank_df) # 1651 items

# # Obtain points (inverse of rank) of the items based on the BPR score
# rank_df["bpr_rank"] = rank_df["bpr_score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
# rank_df["bpr_points"] = total_items - rank_df["bpr_rank"] # Get points by calculating ('Total Item count' - 'Rank')

# # Do likewise for WMF
# rank_df["WMF Score"] = wmf_scores
# rank_df["WMF Rank"] = rank_df["WMF Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
# rank_df["WMF Points"] = total_items - rank_df["WMF Rank"] # Get points by calculating ('Total Item count' - 'Rank')

# # Get Borda Count by summing up points of BPR and WMF
# rank_df["Borda Count"] = rank_df["BPR Points"] + rank_df["WMF Points"]

In [33]:
display(all_df)

Unnamed: 0,user_idx,item_idx,item_id,WMF_123_score,WMF_456_score,WMF_789_score,WMF_888_score,WMF_999_score,Linear Regression Prediction,Random Forest Score,BPR_score,WMF_score
643913,390,23,483,2.208240,2.517696,1.391984,3.053278,0.169986,0.907428,5.0,2.366010,2.411263
543197,329,18,476,1.387807,1.666423,0.679431,0.793690,2.098384,1.764850,5.0,2.266626,1.648090
1282216,776,1040,1124,0.184390,0.188445,-0.223202,0.487751,0.107284,0.210854,5.0,-3.095628,-0.108440
890212,539,323,685,1.775543,2.051972,1.685233,1.718006,-0.277654,0.131432,5.0,0.950145,2.458714
76270,46,324,114,3.297584,3.950196,4.424212,4.411396,4.487567,5.256694,5.0,2.188005,3.244331
...,...,...,...,...,...,...,...,...,...,...,...,...
740687,448,1039,1151,0.059736,0.315577,0.080908,0.112570,0.250354,0.221829,1.0,-0.606589,0.125592
1400741,848,693,1419,-0.234532,-0.238371,-0.003141,-0.294826,0.316377,0.221828,1.0,-1.835539,-0.180037
1325261,802,1159,36,-0.177930,-0.211046,-0.127962,0.125551,-0.293786,-0.155824,1.0,-1.160817,-0.174078
392605,237,1318,1270,0.270837,0.025521,0.042530,0.104559,-0.177521,-0.155822,1.0,-1.078269,0.043804


Now, let's calculate the metrics and commpare them!

We have compared models in part 3. Let's 

In [41]:
models = ["WMF_123_score", "WMF_456_score", "WMF_789_score", "WMF_888_score", "WMF_999_score", "Linear Regression Prediction", "Random Forest Score"]
result_df = {
    "Model": models,
    f"Precision@{TOPK}": [],
    f"Recall@{TOPK}": []
}
test_users = set(test_set.uir_tuple[0])
for model in tqdm(models):
    all_df = all_df.sort_values(model_col, ascending=False) # sort by predicted ratings
    predicted_ids = [all_df[all_df['user_idx'] == uidx]['item_idx'].values[:TOPK].astype(int) for uidx in range(train_set.num_users)]
    precisions, recalls = [], []
    
    for uidx in test_users:
        true_top_k = test_set.user_data[uidx][0] # ground truth data
        predicted_top_k = predicted_ids[uidx].tolist() # predicted ranking data
        # precision@K
        precision = len(set(true_top_k) & set(predicted_top_k)) / len(predicted_top_k)
        precisions.append(precision)
        #recall@K
        recall = len(set(true_top_k) & set(predicted_top_k)) / len(true_top_k)
        recalls.append(recall)

    result_df[f"Precision@{TOPK}"].append(np.mean(precisions))
    result_df[f"Recall@{TOPK}"].append(np.mean(recalls))

display(pd.DataFrame(result_df))

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:12<00:00,  1.82s/it]


Unnamed: 0,Model,Precision@50,Recall@50
0,WMF_123_score,0.020638,0.04921
1,WMF_456_score,0.020702,0.049416
2,WMF_789_score,0.020638,0.04921
3,WMF_888_score,0.020702,0.049416
4,WMF_999_score,0.020638,0.04921
5,Linear Regression Prediction,0.020702,0.049416
6,Random Forest Score,0.020638,0.04921


In [40]:
# TODO: Let's add some words to describe the results and conclude the tutorial