*Copyright (c) Cornac Authors. All rights reserved.*

*Licensed under the Apache 2.0 License.*

# Model Ensembling

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

This notebook provides an example of how to ensemble multiple recommendation models in Cornac.

Model ensembling is a technique that combines the predictions of multiple models into a single prediction. The idea is that the combined prediction can improve the overall performance of the recommender system.

We will use the MovieLens 100K dataset and ensemble 2 models.

**Note:** This notebook requires the `scikit-learn` package.

## 1. Setup

### Install required dependencies

In [1]:
! pip install seaborn scikit-learn

Collecting seaborn
  Downloading seaborn-0.13.2-py3-none-any.whl (294 kB)
Installing collected packages: seaborn
Successfully installed seaborn-0.13.2


In [4]:
import numpy as np
import pandas as pd
import seaborn as sns

from cornac.datasets import movielens
from cornac.data import Dataset
from cornac.models import BPR, WMF
from cornac.eval_methods import RatioSplit
from cornac.metrics import Precision, Recall
from cornac.utils import cache
from cornac import Experiment

from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

## 2. Prepare Experiment

### Loading Dataset

First, we load the MovieLens 100K dataset.

In [5]:
data = movielens.load_feedback(variant="100K") # load dataset
# dataset = Dataset.from_uir(data) # convert to Dataset object

rs = RatioSplit(data, test_size=0.2, seed=42, verbose=True)
train_set, test_set = rs.train_set, rs.test_set

rating_threshold = 1.0
exclude_unknowns = True
---
Training data:
Number of users = 943
Number of items = 1651
Number of ratings = 80000
Max rating = 5.0
Min rating = 1.0
Global mean = 3.5
---
Test data:
Number of users = 943
Number of items = 1651
Number of ratings = 19964
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 943
Total items = 1651


### Training BPR and WMF models

We will train two models: 

1. BPR (Bayesian Personalized Ranking)
2. WMF (Weighted Matrix Factorization)

In [6]:
# Instantiate the BPR model
bpr_model = BPR(k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001)
# Instantiate the WMF model
wmf_model = WMF(k=10, max_iter=100, a=1.0, b=0.01, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01)

models = [bpr_model, wmf_model]
metrics = [Precision(k=50), Recall(k=50)]

# Running the experiment trains each model on the training set and evaluates it on the test set
experiment = Experiment(rs, models, metrics, user_based=True)
experiment.run()


[BPR] Training started!

[BPR] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 989.73it/s] 



[WMF] Training started!


100%|██████████| 100/100 [00:06<00:00, 15.91it/s, loss=96.1]


Learning completed!

[WMF] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1350.61it/s]


TEST:
...
    | Precision@50 | Recall@50 | Train (s) | Test (s)
--- + ------------ + --------- + --------- + --------
BPR |       0.1803 |    0.5048 |    4.3437 |   0.9610
WMF |       0.0825 |    0.2055 |  220.7847 |   0.7019






From these preliminary results, we can see that the BPR model outperforms the WMF model on both Precision@50 and Recall@50.
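To make the two metrics concrete, here is a minimal sketch of how Precision@k and Recall@k can be computed for a single user. The helper name `precision_recall_at_k` and the toy item IDs are made up for illustration; Cornac's own `Precision` and `Recall` classes may differ in details such as how users without test items are handled.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Return (precision@k, recall@k) for a single user.

    ranked_items: item IDs ordered from most to least recommended.
    relevant_items: item IDs the user interacted with in the test set.
    """
    top_k = ranked_items[:k]
    relevant = set(relevant_items)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example (hypothetical IDs): 2 of the top 5 items are relevant,
# and the user has 4 relevant items in total.
ranked = ["a", "b", "c", "d", "e", "f", "g"]
relevant = ["b", "e", "g", "z"]
precision, recall = precision_recall_at_k(ranked, relevant, k=5)
print(precision, recall)
```

With `user_based=True`, the reported numbers are averages of such per-user values.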

In [54]:
# X_train, y_train = list(zip(train_set.uir_tuple[0], train_set.uir_tuple[1])), train_set.uir_tuple[2]
# X_test, y_test = list(zip(test_set.uir_tuple[0], test_set.uir_tuple[1])), test_set.uir_tuple[2]

# bpr_scores = [bpr_model.score(uidx)[iidx] for uidx, iidx in list(X_test)]
# wmf_scores = [wmf_model.score(uidx)[iidx] for uidx, iidx in X_test]

# df = pd.DataFrame({'user': test_set.uir_tuple[0], 'item': test_set.uir_tuple[1], 'groundtruth rating': test_set.uir_tuple[2], 'bpr_rating': bpr_scores, 'wmf_rating': wmf_scores})
# df.head()


### Interpreting Results

In [7]:
# Download item metadata (titles and genres) for the MovieLens 100K dataset
item_df = pd.read_csv(
  cache("http://files.grouplens.org/datasets/movielens/ml-100k/u.item"), 
  sep="|", encoding="ISO-8859-1",
  names=["ItemID", "Title", "Release Date", "Video Release Date", "IMDb URL", 
         "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", 
         "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", 
         "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
).set_index("ItemID").drop(columns=["Video Release Date", "IMDb URL", "unknown"])
     

In [8]:
from IPython.display import display
UIDX = 0
TOPK = 10

item_idx2id = list(train_set.item_ids)

bpr_recommendations, bpr_scores = bpr_model.rank(UIDX)
wmf_recommendations, wmf_scores = wmf_model.rank(UIDX)

# Top K recommended items for each model
bpr_topk = [item_idx2id[iidx] for iidx in bpr_recommendations[:TOPK]]
wmf_topk = [item_idx2id[iidx] for iidx in wmf_recommendations[:TOPK]]

print("BPR Top K Item IDs:", bpr_topk)
print("WMF Top K Item IDs:", wmf_topk)

print("\nTop K BPR Item recommendations with Movie Genre")
display(item_df.loc[[int(iid) for iid in bpr_topk]])

print("\nTop K WMF Item recommendations with Movie Genre")
display(item_df.loc[[int(iid) for iid in wmf_topk]])

BPR Top K Item IDs: ['269', '50', '258', '313', '300', '174', '172', '79', '181', '97']
WMF Top K Item IDs: ['272', '12', '515', '408', '174', '302', '357', '169', '603', '427']

Top K BPR Item recommendations with Movie Genre


ItemID,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
269,"Full Monty, The (1997)",01-Jan-1997,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
50,Star Wars (1977),01-Jan-1977,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
258,Contact (1997),11-Jul-1997,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
313,Titanic (1997),01-Jan-1997,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0
300,Air Force One (1997),01-Jan-1997,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
174,Raiders of the Lost Ark (1981),01-Jan-1981,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
172,"Empire Strikes Back, The (1980)",01-Jan-1980,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,1,0
79,"Fugitive, The (1993)",01-Jan-1993,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
181,Return of the Jedi (1983),14-Mar-1997,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
97,Dances with Wolves (1990),01-Jan-1990,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1



Top K WMF Item recommendations with Movie Genre


ItemID,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
272,Good Will Hunting (1997),01-Jan-1997,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
12,"Usual Suspects, The (1995)",14-Aug-1995,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
515,"Boot, Das (1981)",04-Apr-1997,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0
408,"Close Shave, A (1995)",28-Apr-1996,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0
174,Raiders of the Lost Ark (1981),01-Jan-1981,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
302,L.A. Confidential (1997),01-Jan-1997,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0
357,One Flew Over the Cuckoo's Nest (1975),01-Jan-1975,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
169,"Wrong Trousers, The (1993)",01-Jan-1993,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
603,Rear Window (1954),01-Jan-1954,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0
427,To Kill a Mockingbird (1962),01-Jan-1962,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


For this user, the BPR model's top 10 leans towards Action and Adventure titles, with some Drama and Romance, while the WMF model's top 10 consists mainly of dramas and thrillers.

Let's see what happens when we ensemble these two models. 

## 2. Simple model ensembling by Borda Count

We will ensemble the two models using the Borda Count method, a simple voting scheme that ranks items by the total number of points they receive from the individual models' rankings.

Assuming that we have a list of 5 items, the Borda Count method works as follows:

1. For each model, rank the items from 1 to 5 based on the predicted scores.
2. Convert each rank into points: an item at rank `r` out of `N` items receives `N - r` points.
3. Sum the points of each item across all models.
4. Sort the items by total points in descending order. The top-ranked items form the final recommendation.
5. Repeat the process for the next user.

Consider the following example for a hypothetical user 123:

| Rank | Model 1 | Model 2 | Model 3 | Allocated Points (N - rank) |
|------|---------|---------|---------|-----------------------------|
| 1    | A       | D       | E       | 5 - 1 = 4                   |
| 2    | B       | C       | A       | 5 - 2 = 3                   |
| 3    | C       | A       | B       | 5 - 3 = 2                   |
| 4    | D       | B       | D       | 5 - 4 = 1                   |
| 5    | E       | E       | C       | 5 - 5 = 0                   |

Summing the points of each item across the three models gives the following totals:

| Item | Total Points     |
|------|------------------|
| A    | 4 + 2 + 3 = 9    |
| B    | 3 + 1 + 2 = 6    |
| D    | 1 + 4 + 1 = 6    |
| C    | 2 + 3 + 0 = 5    |
| E    | 0 + 0 + 4 = 4    |

New ranking: A > B = D > C > E (B and D are tied with 6 points each)
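The worked example above can be reproduced in a few lines of plain Python. This is only a sketch: the function name `borda_points` is made up for illustration, and breaking ties alphabetically is an arbitrary choice, not part of the Borda Count method itself.

```python
from collections import defaultdict

# Rankings from the worked example above, best item first.
rankings = {
    "Model 1": ["A", "B", "C", "D", "E"],
    "Model 2": ["D", "C", "A", "B", "E"],
    "Model 3": ["E", "A", "B", "D", "C"],
}


def borda_points(rankings):
    """Sum (N - rank) points per item, with ranks starting at 1."""
    totals = defaultdict(int)
    for ranked_items in rankings.values():
        n_items = len(ranked_items)
        for rank, item in enumerate(ranked_items, start=1):
            totals[item] += n_items - rank
    return dict(totals)


totals = borda_points(rankings)
# Sort by total points, highest first; tied items (B and D) are ordered alphabetically.
final_ranking = sorted(totals, key=lambda item: (-totals[item], item))
print(totals)
print(final_ranking)
```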


Let's implement this method below using a pandas `DataFrame` for data manipulation.

In [9]:
rank_df = pd.DataFrame({
    "ItemID": item_idx2id,
    "BPR Score": bpr_scores,
    "WMF Score": wmf_scores
})

# Obtain ranks of the items based on the scores
rank_df["BPR Rank"] = rank_df["BPR Score"].rank(ascending=False)
rank_df["WMF Rank"] = rank_df["WMF Score"].rank(ascending=False)

total_items = len(rank_df) # 1651 items

# Get Borda Points for each of the models. Borda points are calculated as (total items - rank)
rank_df["BPR Borda Points"] = total_items - rank_df["BPR Rank"]
rank_df["WMF Borda Points"] = total_items - rank_df["WMF Rank"]

rank_df["Total Points"] = rank_df["BPR Borda Points"] + rank_df["WMF Borda Points"]

display(rank_df)

Unnamed: 0,ItemID,BPR Score,WMF Score,BPR Rank,WMF Rank,BPR Borda Points,WMF Borda Points,Total Points
0,381,2.716305,3.143413,68.0,336.0,1583.0,1315.0,2898.0
1,602,0.227643,2.960208,686.0,425.0,965.0,1226.0,2191.0
2,431,1.543726,3.211110,274.0,299.0,1377.0,1352.0,2729.0
3,875,0.849149,2.327704,457.0,734.0,1194.0,917.0,2111.0
4,182,2.316206,3.674445,129.0,81.0,1522.0,1570.0,3092.0
...,...,...,...,...,...,...,...,...
1646,1635,-1.732677,0.199373,1415.0,1559.0,236.0,92.0,328.0
1647,1650,-1.801109,0.287944,1451.0,1523.0,200.0,128.0,328.0
1648,1647,-1.691469,0.194078,1402.0,1564.0,249.0,87.0,336.0
1649,1663,-2.168174,0.041730,1573.0,1601.0,78.0,50.0,128.0


Now that we have a joint score, let's re-rank the list to obtain the ensembled model's recommendations.

In [10]:
reranked_df = rank_df.sort_values("Total Points", ascending=False)

print("Re-ranked Top K Item recommendations")
display(reranked_df)

borda_count_topk = reranked_df["ItemID"].values[:TOPK]
print("\nTop K Ensembled Item recommendations")
print(borda_count_topk)

print("\nBorda Count recommendations with Movie Genre")
item_df.loc[[int(i) for i in borda_count_topk]]

Re-ranked Top K Item recommendations


Unnamed: 0,ItemID,BPR Score,WMF Score,BPR Rank,WMF Rank,BPR Borda Points,WMF Borda Points,Total Points
386,174,3.679183,4.035673,6.0,5.0,1645.0,1646.0,3291.0
83,269,4.178540,3.935441,1.0,13.0,1650.0,1638.0,3288.0
298,302,3.467365,4.032831,13.0,6.0,1638.0,1645.0,3283.0
152,313,3.912704,3.895014,4.0,21.0,1647.0,1630.0,3277.0
267,127,3.417382,3.935476,14.0,12.0,1637.0,1639.0,3276.0
...,...,...,...,...,...,...,...,...
669,1320,-2.456063,-0.104311,1634.0,1649.0,17.0,2.0,19.0
1166,1352,-2.524746,-0.092680,1646.0,1646.0,5.0,5.0,10.0
1620,1363,-2.490123,-0.105886,1642.0,1651.0,9.0,0.0,9.0
1347,1349,-2.509151,-0.097618,1645.0,1648.0,6.0,3.0,9.0



Top K Ensembled Item recommendations
['174' '269' '302' '313' '127' '272' '300' '275' '258' '56']

Borda Count recommendations with Movie Genre


ItemID,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
174,Raiders of the Lost Ark (1981),01-Jan-1981,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
269,"Full Monty, The (1997)",01-Jan-1997,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
302,L.A. Confidential (1997),01-Jan-1997,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0
313,Titanic (1997),01-Jan-1997,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0
127,"Godfather, The (1972)",01-Jan-1972,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0
272,Good Will Hunting (1997),01-Jan-1997,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
300,Air Force One (1997),01-Jan-1997,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
275,Sense and Sensibility (1995),01-Jan-1995,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0
258,Contact (1997),11-Jul-1997,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
56,Pulp Fiction (1994),01-Jan-1994,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0


Now, it seems that the ensembled model is able to provide a more diverse set of recommendations.
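One rough way to examine this is to check where the ensemble's top 10 comes from. The sketch below copies the three top-10 lists from the outputs above and counts how many ensemble items were already in each base model's top 10. The helper name `overlap` is made up for illustration, and overlap counts are only a crude proxy for diversity.

```python
# Top-10 item IDs copied from the outputs above (user index 0).
bpr_top = ["269", "50", "258", "313", "300", "174", "172", "79", "181", "97"]
wmf_top = ["272", "12", "515", "408", "174", "302", "357", "169", "603", "427"]
borda_top = ["174", "269", "302", "313", "127", "272", "300", "275", "258", "56"]


def overlap(a, b):
    """Item IDs present in both lists, sorted numerically."""
    return sorted(set(a) & set(b), key=int)


shared_with_bpr = overlap(borda_top, bpr_top)
shared_with_wmf = overlap(borda_top, wmf_top)
# Items that appear in neither base model's top 10.
new_items = sorted(set(borda_top) - set(bpr_top) - set(wmf_top), key=int)

print("Shared with BPR top 10:", shared_with_bpr)
print("Shared with WMF top 10:", shared_with_wmf)
print("In neither top 10:", new_items)
```

Items that are in neither base top 10 can still reach the ensemble's top 10 when both models rank them consistently high.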

In the next section, we will see how we could further add more models to the ensemble.

## 3. Adding more models to the Borda Count ensemble

We can easily add more models to the ensemble by training them and including their rankings in the Borda Count. Another approach is to train the same model with a different set of hyperparameters, or simply with a different random seed (e.g. `seed=123`). Because the model parameters are randomly initialized before training, some of these runs will outperform others.

By ensembling these models, we could potentially achieve better performance.

Let's try adding a few more similar models with different random seed initializations.

In [None]:
# BPR models with different seeds
bpr_seed_123 = BPR(name="BPR_123", k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=123)
bpr_seed_456 = BPR(name="BPR_456", k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=456)
bpr_seed_789 = BPR(name="BPR_789", k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=789)
bpr_seed_888 = BPR(name="BPR_888", k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=888)
bpr_seed_999 = BPR(name="BPR_999", k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=999)

# WMF models with different seeds
wmf_model_123 = WMF(name="WMF_123", k=10, max_iter=100, a=1.0, b=0.01, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_456 = WMF(name="WMF_456", k=10, max_iter=100, a=1.0, b=0.01, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=456)
wmf_model_789 = WMF(name="WMF_789", k=10, max_iter=100, a=1.0, b=0.01, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=789)
wmf_model_888 = WMF(name="WMF_888", k=10, max_iter=100, a=1.0, b=0.01, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=888)
wmf_model_999 = WMF(name="WMF_999", k=10, max_iter=100, a=1.0, b=0.01, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=999)

models = [bpr_seed_123, bpr_seed_456, bpr_seed_789, bpr_seed_888, bpr_seed_999, wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999]

experiment = Experiment(rs, models, metrics, user_based=True)
experiment.run()


[BPR_123] Training started!

[BPR_123] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 979.75it/s] 



[BPR_456] Training started!

[BPR_456] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 954.29it/s] 



[BPR_789] Training started!

[BPR_789] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1520.04it/s]



[BPR_888] Training started!

[BPR_888] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1407.69it/s]



[BPR_999] Training started!

[BPR_999] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1452.92it/s]



[WMF_123] Training started!


100%|██████████| 100/100 [00:05<00:00, 19.63it/s, loss=95.8]


Learning completed!

[WMF_123] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1329.26it/s]



[WMF_456] Training started!


100%|██████████| 100/100 [00:06<00:00, 15.42it/s, loss=96.1]


Learning completed!

[WMF_456] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:01<00:00, 665.47it/s]



[WMF_789] Training started!


100%|██████████| 100/100 [00:06<00:00, 15.69it/s, loss=93.3]


Learning completed!

[WMF_789] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2082.75it/s]



[WMF_888] Training started!


100%|██████████| 100/100 [00:05<00:00, 17.35it/s, loss=95] 


Learning completed!

[WMF_888] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2146.91it/s]



[WMF_999] Training started!


100%|██████████| 100/100 [00:06<00:00, 15.68it/s, loss=90.9]


Learning completed!

[WMF_999] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1378.48it/s]


TEST:
...
        | Precision@50 | Recall@50 | Train (s) | Test (s)
------- + ------------ + --------- + --------- + --------
BPR_123 |       0.1824 |    0.5099 |    1.4093 |   0.9642
BPR_456 |       0.1811 |    0.5078 |    1.3285 |   0.9907
BPR_789 |       0.1822 |    0.5120 |    1.0710 |   0.6231
BPR_888 |       0.1777 |    0.4954 |    1.1011 |   0.6719
BPR_999 |       0.1805 |    0.5056 |    1.1037 |   0.6513
WMF_123 |       0.0840 |    0.2010 |    5.2778 |   0.7110
WMF_456 |       0.0836 |    0.2062 |    6.6307 |   1.4185
WMF_789 |       0.0937 |    0.2619 |    6.6275 |   0.4563
WMF_888 |       0.0875 |    0.2236 |    5.9015 |   0.4419
WMF_999 |       0.1089 |    0.3129 |    6.5083 |   0.6875






Based on the results, we can see that even within the same model, the results can vary. 
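To quantify this variation, the short sketch below summarizes the Recall@50 values copied from the table above. The variable names are chosen for illustration only.

```python
from statistics import mean

# Recall@50 per seed (123, 456, 789, 888, 999), copied from the results table above.
recall_at_50 = {
    "BPR": [0.5099, 0.5078, 0.5120, 0.4954, 0.5056],
    "WMF": [0.2010, 0.2062, 0.2619, 0.2236, 0.3129],
}

summary = {}
for name, values in recall_at_50.items():
    summary[name] = {
        "mean": mean(values),
        "range": max(values) - min(values),  # gap between best and worst seed
    }
    print(f"{name}: mean={summary[name]['mean']:.4f}, range={summary[name]['range']:.4f}")
```

In this run, the gap between the best and worst seed is much larger for WMF than for BPR.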

Let's try ensembling all these models into a single model by Borda Count, and look at its recommendations.

In [None]:
rank_2_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

rank_2_df["Total Points"] = 0

for model in models:
    name = model.name
    recommendations, scores = model.rank(UIDX)
    rank_2_df[name + " Rating"] = scores
    rank_2_df[name + " Rank"] = rank_2_df[name + " Rating"].rank(ascending=False)
    rank_2_df[name + " Points"] = total_items - rank_2_df[name + " Rank"]
    rank_2_df["Total Points"] = rank_2_df["Total Points"] + rank_2_df[name + " Points"]
    
print("Model score calculation:")
display(rank_2_df.head(10))

print("\nRe-ranked Top K Item recommendations")
borda_count_2_topk = list(rank_2_df.sort_values("Total Points", ascending=False)["ItemID"].values[:TOPK])
print(borda_count_2_topk)

print("\nBorda Count recommendations with Movie Genre")
display(item_df.loc[[int(i) for i in borda_count_2_topk]])

Model score calculation:


Unnamed: 0,ItemID,Total Points,BPR_123 Rating,BPR_123 Rank,BPR_123 Points,BPR_456 Rating,BPR_456 Rank,BPR_456 Points,BPR_789 Rating,BPR_789 Rank,...,WMF_456 Points,WMF_789 Rating,WMF_789 Rank,WMF_789 Points,WMF_888 Rating,WMF_888 Rank,WMF_888 Points,WMF_999 Rating,WMF_999 Rank,WMF_999 Points
0,381,14350.0,2.265398,157.0,1494.0,2.652233,118.0,1533.0,2.544486,101.0,...,1316.0,3.117082,336.0,1315.0,3.155842,334.0,1317.0,3.148458,315.0,1336.0
1,602,11388.0,0.368651,648.0,1003.0,0.982906,462.0,1189.0,0.632636,552.0,...,1226.0,2.935721,426.0,1225.0,2.982432,419.0,1232.0,2.841727,485.0,1166.0
2,431,13195.0,1.420758,337.0,1314.0,1.455322,344.0,1307.0,1.390834,333.0,...,1354.0,3.209204,292.0,1359.0,3.230835,296.0,1355.0,3.2214,283.0,1368.0
3,875,10447.0,0.448797,617.0,1034.0,1.163482,418.0,1233.0,1.298365,355.0,...,912.0,2.292274,741.0,910.0,2.332968,738.0,913.0,2.303686,737.0,914.0
4,182,15399.0,2.548217,106.0,1545.0,2.676836,112.0,1539.0,2.43076,121.0,...,1562.0,3.640818,84.0,1567.0,3.653655,96.0,1555.0,3.623356,71.0,1580.0
5,1074,11865.0,1.949682,221.0,1430.0,1.648239,298.0,1353.0,1.629954,280.0,...,986.0,2.477426,655.0,996.0,2.515252,663.0,988.0,2.589502,614.0,1037.0
6,286,16131.0,3.61146,11.0,1640.0,4.383797,1.0,1650.0,3.709771,7.0,...,1615.0,3.756265,50.0,1601.0,3.869004,36.0,1615.0,3.488996,142.0,1509.0
7,496,15756.0,2.387177,131.0,1520.0,2.664005,114.0,1537.0,2.302066,144.0,...,1617.0,3.821728,34.0,1617.0,3.804962,45.0,1606.0,3.787492,26.0,1625.0
8,15,15225.0,2.023902,204.0,1447.0,2.344309,166.0,1485.0,1.937232,212.0,...,1556.0,3.641594,83.0,1568.0,3.659708,93.0,1558.0,3.66781,57.0,1594.0
9,184,11553.0,0.979784,439.0,1212.0,0.486121,625.0,1026.0,0.826728,491.0,...,1266.0,3.009456,385.0,1266.0,3.041547,387.0,1264.0,3.040125,378.0,1273.0



Re-ranked Top K Item recommendations
['50', '174', '100', '56', '172', '313', '275', '191', '98', '173']

Borda Count recommendations with Movie Genre


ItemID,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
50,Star Wars (1977),01-Jan-1977,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0
174,Raiders of the Lost Ark (1981),01-Jan-1981,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
100,Fargo (1996),14-Feb-1997,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0
56,Pulp Fiction (1994),01-Jan-1994,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0
172,"Empire Strikes Back, The (1980)",01-Jan-1980,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,1,0
313,Titanic (1997),01-Jan-1997,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0
275,Sense and Sensibility (1995),01-Jan-1995,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0
191,Amadeus (1984),01-Jan-1984,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0
98,"Silence of the Lambs, The (1991)",01-Jan-1991,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0
173,"Princess Bride, The (1987)",01-Jan-1987,1,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0


It seems that the ensemble model now recommends movies relating to drama, action and romance genres. Let's move on to see more advanced techniques for model ensembling.

## 4. Advanced Model Ensembling

We could continue by thinking of this as a meta-learning problem. We could treat recommendations of each base model as features and train a meta-learner to predict the final recommendation.

This could be any ML model such as a Linear Regression, Random Forest, Gradient Boosting, or even a Neural Network.
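As a toy illustration of the meta-learning idea with the simplest of these options, the sketch below fits a linear blend of two base models' scores by least squares. Everything here is hypothetical: the function `fit_linear_blend`, the scores, and the targets are invented for the example, and it is not the Random Forest setup used in this notebook.

```python
def fit_linear_blend(scores_a, scores_b, targets):
    """Least-squares weights (w_a, w_b) such that targets ~ w_a * a + w_b * b.

    Solves the 2x2 normal equations directly (no intercept term).
    """
    s_aa = sum(a * a for a in scores_a)
    s_bb = sum(b * b for b in scores_b)
    s_ab = sum(a * b for a, b in zip(scores_a, scores_b))
    s_ay = sum(a * y for a, y in zip(scores_a, targets))
    s_by = sum(b * y for b, y in zip(scores_b, targets))
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        raise ValueError("base model scores are collinear; weights are not unique")
    w_a = (s_ay * s_bb - s_by * s_ab) / det
    w_b = (s_by * s_aa - s_ay * s_ab) / det
    return w_a, w_b


# Hypothetical base model scores (features) for five user-item pairs.
scores_a = [4.0, 3.0, 1.0, 2.0, 5.0]
scores_b = [3.5, 1.0, 2.0, 4.0, 4.5]
# Hypothetical targets, constructed as 0.7 * a + 0.3 * b so the fit is easy to check.
targets = [0.7 * a + 0.3 * b for a, b in zip(scores_a, scores_b)]

w_a, w_b = fit_linear_blend(scores_a, scores_b, targets)

# Use the learned weights to blend scores for unseen candidate items and rank them.
candidate_scores = {"item_x": (4.2, 1.0), "item_y": (3.0, 4.0), "item_z": (1.5, 2.0)}
blended = {item: w_a * a + w_b * b for item, (a, b) in candidate_scores.items()}
ranking = sorted(blended, key=blended.get, reverse=True)
print(w_a, w_b)
print(ranking)
```

In practice the targets would be observed feedback rather than a constructed combination, and a non-linear meta-learner can capture interactions that a linear blend cannot.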


In this notebook, we will train a simple Random Forest model to produce the final recommendation.

In [11]:
# We continue using UIDX = 0 and TOPK = 10

# Let's get the scores from the base models trained in Part 2.

# All (user index, item index) pairs in the train set
train_set_user_item = list(zip(train_set.uir_tuple[0], train_set.uir_tuple[1]))

# Scores of the trained base models (BPR, WMF) for every train-set pair
bpr_scores = [bpr_model.score(uidx)[iidx] for uidx, iidx in train_set_user_item]
wmf_scores = [wmf_model.score(uidx)[iidx] for uidx, iidx in train_set_user_item]

# Features: the (user index, item index) pairs, repeated once per base model
X_train = np.array(2 * train_set_user_item)
# Targets: the BPR scores followed by the WMF scores, aligned with X_train
y_train = np.concatenate((bpr_scores, wmf_scores))

# Train a Random Forest (a bagging ensemble of decision trees) on the base model scores
randomforest_model = RandomForestRegressor(n_estimators=200, random_state=42) # sklearn Random Forest model
randomforest_model.fit(X_train, y_train)


### Reviewing Random Forest Results

Let's review the results of the Random Forest model. We predict a score for every item for the user at index `UIDX` and look up the top-ranked items in the genre table again.

In [15]:
UIDX = 0
TOPK = 10

# Predicted score for every item for this user
randomforest_scores = randomforest_model.predict([(UIDX, iidx) for iidx in range(test_set.num_items)])

item_ids = [item_idx2id[iidx] for iidx in range(test_set.num_items)]

rank_3_df = pd.DataFrame({
    "ItemID": item_ids,
    "Random Forest Score": randomforest_scores
})

# Sort the items based on the score
rank_3_df = rank_3_df.sort_values("Random Forest Score", ascending=False)

display(rank_3_df.head(10))

print("Top K Item recommendations using Random Forest model")
print(list(rank_3_df["ItemID"].values[:TOPK]))

display(item_df.loc[[int(i) for i in rank_3_df["ItemID"].values[:TOPK]]])

Unnamed: 0,ItemID,Random Forest Score
386,174,5.12343
387,315,4.863465
598,168,4.803308
322,210,4.746296
156,64,4.700769
132,272,4.69929
61,204,4.669151
405,22,4.553309
197,191,4.552111
126,69,4.534729


Top K Item recommendations using Random Forest model
['174', '315', '168', '210', '64', '272', '204', '22', '191', '69']


ItemID,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
174,Raiders of the Lost Ark (1981),01-Jan-1981,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
315,Apt Pupil (1998),23-Oct-1998,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0
168,Monty Python and the Holy Grail (1974),01-Jan-1974,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
210,Indiana Jones and the Last Crusade (1989),01-Jan-1989,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
64,"Shawshank Redemption, The (1994)",01-Jan-1994,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
272,Good Will Hunting (1997),01-Jan-1997,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
204,Back to the Future (1985),01-Jan-1985,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0
22,Braveheart (1995),16-Feb-1996,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0
191,Amadeus (1984),01-Jan-1984,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0
69,Forrest Gump (1994),01-Jan-1994,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0
