*Copyright (c) Cornac Authors. All rights reserved.*

*Licensed under the Apache 2.0 License.*

# Model Ensembling

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

This Jupyter Notebook demonstrates the process of ensembling multiple recommendation models using the Cornac library.

Model ensembling is a technique that combines the predictions of multiple models to produce a single, more accurate prediction. By leveraging the strengths of different models, we can improve the overall performance of the recommendation system.

There are 5 main parts to this tutorial:
1. [**Introduction**](#introduction). We will first get you started by running a simple experiment with both BPR and WMF models. We will also take a look at what the dataset distribution is like.
2. [**Simple Model Ensembling**](#ensemble-models). We will ensemble the predictions of the BPR and WMF models using a simple technique called Borda Count.
3. [**Further Ensembling**](#further-ensembling). We will create variations of the WMF models, and further ensemble them using the same technique.
4. [**Ensembling with Regression Models**](#ensembling-with-regression-models). We will utilize the same WMF models and ensemble them by linear regression and random forest regression using the `scikit-learn` package.
5. [**Further Evaluation**](#evaluation). We will evaluate the performance of the ensemble models.

**Note:** Part of this notebook (in Section 4) uses the `scikit-learn` package. 

## 1. Introduction
<a id='introduction'></a>

We will first run a simple experiment with both BPR and WMF models. We will also take a look at what the dataset distribution is like.


### 1.1 Install required dependencies

In [1]:
! pip install scikit-learn cornac==2.2.2 tensorflow==2.12.0



In [2]:
from IPython.display import display
import numpy as np
import pandas as pd
from tqdm import tqdm

from cornac.datasets import movielens
from cornac.models import BPR, WMF
from cornac.eval_methods import RatioSplit
from cornac.metrics import Precision, Recall
from cornac.utils import cache
from cornac import Experiment

from sklearn import linear_model
from sklearn.ensemble import RandomForestRegressor

### 1.2 Loading Dataset

First, we load the MovieLens 100K dataset.

In [3]:
data = movielens.load_feedback(variant="100K") # Load MovieLens Dataset

rs = RatioSplit(data, test_size=0.2, rating_threshold=4.0, seed=42, verbose=True) # Split to train-test set to 80-20
train_set, test_set = rs.train_set, rs.test_set

rating_threshold = 4.0
exclude_unknowns = True
---
Training data:
Number of users = 943
Number of items = 1651
Number of ratings = 80000
Max rating = 5.0
Min rating = 1.0
Global mean = 3.5
---
Test data:
Number of users = 943
Number of items = 1651
Number of ratings = 19964
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 943
Total items = 1651


### 1.3 Training BPR and WMF models

We will train two models: 

1. **BPR (Bayesian Personalized Ranking)**
2. **WMF (Weighted Matrix Factorization)**

In [4]:
bpr_model = BPR(k=10, max_iter=100, learning_rate=0.01, lambda_reg=0.001, seed=123) # Initialize BPR model
wmf_model = WMF(k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123) # Initialize WMF model

models = [bpr_model, wmf_model]
metrics = [Precision(k=50), Recall(k=50)] # Set metrics for experiment

experiment = Experiment(rs, models, metrics, user_based=True).run() # Run Experiment to compare BPR model to WMF model individually


[BPR] Training started!

[BPR] Evaluation started!

[WMF] Training started!

Learning completed!

[WMF] Evaluation started!


TEST:
...
    | Precision@50 | Recall@50 | Train (s) | Test (s)
--- + ------------ + --------- + --------- + --------
BPR |       0.0985 |    0.4922 |    0.9723 |   0.7340
WMF |       0.1133 |    0.5584 |   22.2400 |   0.2534



On both metrics WMF scores somewhat higher than BPR (Precision@50 of 0.1133 vs. 0.0985, Recall@50 of 0.5584 vs. 0.4922), although the two models remain in a comparable range.
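
As a reminder of what these two metrics measure, here is a minimal sketch that computes Precision@k and Recall@k for a single user. The helper name `precision_recall_at_k` and the toy data are assumptions made for illustration; this is not Cornac's implementation.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Return (precision@k, recall@k) for one user."""
    top_k = ranked_items[:k]
    # A "hit" is a recommended item that the user actually liked in the test set
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall


ranked = ["A", "B", "C", "D", "E"]  # hypothetical ranking, best first
relevant = {"B", "E", "F"}          # hypothetical positively rated test items
p, r = precision_recall_at_k(ranked, relevant, k=4)
print(p, r)
```

With `k=50`, a user who has only five relevant test items can reach at most 5/50 = 0.1 precision, which is one reason Precision@50 looks low next to Recall@50.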

Let's move on to try to interpret these results by using the genres of movies that were recommended to us.

### 1.4 Interpreting Results

##### 1.4.1 Creating a Movie Genre Dataframe

In [5]:
# Creating a dataframe of movies with its corresponding genres

# Download some information of MovieLens 100K dataset
item_df = pd.read_csv(
  cache("http://files.grouplens.org/datasets/movielens/ml-100k/u.item"), 
  sep="|", encoding="ISO-8859-1",
  names=["ItemID", "Title", "Release Date", "Video Release Date", "IMDb URL", 
         "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", 
         "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", 
         "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
).set_index("ItemID").drop(columns=["Video Release Date", "IMDb URL", "unknown"])

item_idx2id = train_set.item_ids # mapping between item index and original film ID
user_idx2id = train_set.user_ids # mapping between user index and original user ID

# Let's take a look at an example of this dataframe
display(item_df.head(3))

| ItemID | Title | Release Date | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|--------|-------|--------------|--------|-----------|-----------|------------|--------|-------|-------------|-------|---------|-----------|--------|---------|---------|---------|--------|----------|-----|---------|
| 1 | Toy Story (1995) | 01-Jan-1995 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | GoldenEye (1995) | 01-Jan-1995 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Four Rooms (1995) | 01-Jan-1995 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


The `item_df` dataframe lists every movie together with its genre attributes.

Further below, we will filter this table by the recommendations produced by our models to get a better sense of how they perform.

##### 1.4.2 Creating Training Data Dataframe

To get a sense of what data has been inserted into our model for training, let's count the genres of the training data used to train the model.

But first, let's create a `training_data_df` dataframe with all training data.

The training data consists of 80000 rows, each a triplet of **User Index**, **Item Index** and **Rating**, as seen in the dataset summary in Section 1.2.

In [6]:
# Let's view a sample of the training data dataframe
print("Sample row of record:")
print("(user_index, item_index, rating):", list(zip(*train_set.uir_tuple))[0])

# Create a training data dataframe
training_data_df = pd.DataFrame(zip(*train_set.uir_tuple)) # adding all training data into dataframe
training_data_df.columns = ['user_idx', 'item_idx', 'rating'] # adding column names to the data

# Add new column, 'item_id', for further filtering in later sections
training_data_df['item_id'] = training_data_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # converted from the item index field

Sample row of record:
(user_index, item_index, rating): (0, 0, 4.0)
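
The row-wise `apply` above is fine at this size, but the same index-to-ID lookup can also be done in one vectorized step. The following is a minimal sketch with made-up IDs, assuming the mapping is a plain list indexed by item index (which is how `item_idx2id` is used in this notebook).

```python
import numpy as np

# Hypothetical mapping: position = item index, value = original item ID
idx2id = ["381", "602", "431", "875"]

# Item indices as they would appear in the training triplets
item_idx = np.array([0, 2, 2, 3, 1])

# NumPy fancy indexing looks up every index at once
item_id = np.asarray(idx2id)[item_idx]
print(item_id.tolist())
```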


##### 1.4.3 Filtering Training Data

Let's filter based on a particular user to learn more about the user.

We set ``UIDX`` to user index **3**, and ``TOPK`` to **50**, to get the top 50 recommendations in each model for comparison.

In [7]:
# Let's define the user index and top-k movies to be recommended
UIDX = 3
TOPK = 50

# Positively rated items by a user (rating >= 4.0 as rating_threshold used earlier, and user index = UIDX)
positively_rated_items = training_data_df[
    (training_data_df['rating'] >= 4.0) & (training_data_df['user_idx'] == UIDX)
]['item_id'].unique()
filter_df = item_df.loc[[int(item_id) for item_id in positively_rated_items]] # get genres of movie items

print("Number of movies:", len(filter_df)) # Number of movies positively rated by user index 3 in training data

# Group by Movie Genre and Sum by genres
filter_df = filter_df.select_dtypes(np.number).sum() 
filter_df = filter_df.to_frame("Sum") # Let's call that column 'Sum'

# Add a new column '%' for the percentage of individual genre sum compared to total sum
filter_df["%"] = filter_df["Sum"] / filter_df["Sum"].sum() * 100
filter_df["%"] = filter_df["%"].round(1)

# Let's see the training data genres, sums and percentages
print("Positively rated movies by user index 3 in training data")
display(filter_df.sort_values("Sum", ascending=False)[:10])

Number of movies: 250
Positively rated movies by user index 3 in training data


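| Genre      | Sum | %    |
|------------|-----|------|
| Drama      | 117 | 22.6 |
| Comedy     | 72  | 13.9 |
| Romance    | 56  | 10.8 |
| Action     | 55  | 10.6 |
| Thriller   | 50  | 9.7  |
| Adventure  | 36  | 6.9  |
| Children's | 23  | 4.4  |
| War        | 20  | 3.9  |
| Crime      | 20  | 3.9  |
| Sci-Fi     | 18  | 3.5  |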


As shown above in the training data, the top genres for user index 3 with positively rated movies include 'Drama', 'Comedy', 'Romance', 'Action' and 'Thriller'.

Let's now compare them to the recommendations of the BPR and WMF models respectively.

##### 1.4.4 Interpreting Recommendations of BPR, WMF Models

In [8]:
# Get the Top 5 Genres in filtered training data for user index 3
top_genres = filter_df.sort_values("Sum", ascending=False).head(5).index.tolist()
print("\nTop 5 Genres in training data:", top_genres)

# Get top K recommendations for BPR and put them into the genre dataframe
bpr_recommendations, bpr_scores = bpr_model.rank(UIDX) # rank recommendations by score
bpr_recommendations = bpr_recommendations[:TOPK] # limit to top K
bpr_topk = [item_idx2id[iidx] for iidx in bpr_recommendations] # convert item indexes into item ids
bpr_df = item_df.loc[[int(iid) for iid in bpr_topk]] # filter the movie genre dataframe by item ids

# Let's view the top recommendations for BPR by top genres
display("BPR: Top recommendations", bpr_df[["Title"] + top_genres].head(10))

# Now, let's do likewise for WMF - get top K recommendations and put them into the genre dataframe
wmf_recommendations, wmf_scores = wmf_model.rank(UIDX) # rank recommendations by score
wmf_recommendations = wmf_recommendations[:TOPK] # limit to top K
wmf_topk = [item_idx2id[iidx] for iidx in wmf_recommendations] # convert item indexes into item ids
wmf_df = item_df.loc[[int(iid) for iid in wmf_topk]] # filter the movie genre dataframe by item ids

# View the top recommendations for WMF
display("WMF: Top recommendations", wmf_df[["Title"] + top_genres].head(10))


Top 5 Genres in training data: ['Drama', 'Comedy', 'Romance', 'Action', 'Thriller']


'BPR: Top recommendations'

| ItemID | Title                         | Drama | Comedy | Romance | Action | Thriller |
|--------|-------------------------------|-------|--------|---------|--------|----------|
| 781    | French Kiss (1995)            | 0     | 1      | 1       | 0      | 0        |
| 294    | Liar Liar (1997)              | 0     | 1      | 0       | 0      | 0        |
| 1      | Toy Story (1995)              | 0     | 1      | 0       | 0      | 0        |
| 181    | Return of the Jedi (1983)     | 0     | 0      | 1       | 1      | 0        |
| 121    | Independence Day (ID4) (1996) | 0     | 0      | 0       | 1      | 0        |
| 100    | Fargo (1996)                  | 1     | 0      | 0       | 0      | 1        |
| 739    | Pretty Woman (1990)           | 0     | 1      | 1       | 0      | 0        |
| 313    | Titanic (1997)                | 1     | 0      | 1       | 1      | 0        |
| 402    | Ghost (1990)                  | 0     | 1      | 1       | 0      | 1        |
| 471    | Courage Under Fire (1996)     | 1     | 0      | 0       | 0      | 0        |


'WMF: Top recommendations'

| ItemID | Title                            | Drama | Comedy | Romance | Action | Thriller |
|--------|----------------------------------|-------|--------|---------|--------|----------|
| 313    | Titanic (1997)                   | 1     | 0      | 1       | 1      | 0        |
| 204    | Back to the Future (1985)        | 0     | 1      | 0       | 0      | 0        |
| 8      | Babe (1995)                      | 1     | 1      | 0       | 0      | 0        |
| 125    | Phenomenon (1996)                | 1     | 0      | 1       | 0      | 0        |
| 318    | Schindler's List (1993)          | 1     | 0      | 0       | 0      | 0        |
| 15     | Mr. Holland's Opus (1995)        | 1     | 0      | 0       | 0      | 0        |
| 64     | Shawshank Redemption, The (1994) | 1     | 0      | 0       | 0      | 0        |
| 655    | Stand by Me (1986)               | 1     | 1      | 0       | 0      | 0        |
| 692    | American President, The (1995)   | 1     | 1      | 1       | 0      | 0        |
| 732    | Dave (1993)                      | 0     | 1      | 1       | 0      | 0        |
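
One quick way to quantify how different the two lists are is to measure their overlap. The sketch below uses the item IDs copied from the two top-10 tables above; the Jaccard index is our own choice of measure for illustration and is not part of the original analysis.

```python
# Item IDs copied from the two top-10 tables above
bpr_top10 = [781, 294, 1, 181, 121, 100, 739, 313, 402, 471]
wmf_top10 = [313, 204, 8, 125, 318, 15, 64, 655, 692, 732]

shared = set(bpr_top10) & set(wmf_top10)
union = set(bpr_top10) | set(wmf_top10)

# Jaccard index: size of the intersection divided by size of the union
jaccard = len(shared) / len(union)
print(shared, round(jaccard, 3))
```

Only *Titanic (1997)* (ItemID 313) appears in both top-10 lists, so the two models recommend largely different items at the very top even though their aggregate metrics are close.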


Now that we have seen the top recommendations of the BPR and WMF models, let's do a comparison by taking a look at the genre distribution.

##### 1.4.5 Comparing Models by Genre Distribution

In [9]:
# Let's introduce `combined_df` for comparison.
# This dataframe will be used to compare models by summing up genres from recommendations of different models
combined_df = pd.DataFrame({
    "Train Data %": filter_df["%"],
    "BPR Sum": bpr_df.select_dtypes(np.number).sum(), # group by genres, then get sum of each genre
    "WMF Sum": wmf_df.select_dtypes(np.number).sum() # likewise for WMF
})

# Get percentages of movie genre sums
combined_df['BPR %'] = combined_df['BPR Sum'] / combined_df['BPR Sum'].sum() * 100 
combined_df["WMF %"] = combined_df["WMF Sum"] / combined_df["WMF Sum"].sum() * 100

combined_df = combined_df.round(1) # round all 
combined_df = combined_df.sort_values("Train Data %", ascending=False)

# Let's take a look at the genre distribution by percentages
display("Train Data to Recommended % Distribution", combined_df[['Train Data %', 'BPR %', 'WMF %']][:10])

'Train Data to Recommended % Distribution'

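| Genre      | Train Data % | BPR % | WMF % |
|------------|--------------|-------|-------|
| Drama      | 22.6         | 17.1  | 26.0  |
| Comedy     | 13.9         | 13.7  | 21.0  |
| Romance    | 10.8         | 17.9  | 18.0  |
| Action     | 10.6         | 13.7  | 7.0   |
| Thriller   | 9.7          | 12.8  | 6.0   |
| Adventure  | 6.9          | 6.8   | 5.0   |
| Children's | 4.4          | 1.7   | 2.0   |
| War        | 3.9          | 5.1   | 5.0   |
| Crime      | 3.9          | 2.6   | 1.0   |
| Sci-Fi     | 3.5          | 3.4   | 4.0   |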


Note that many movies belong to multiple genres, so the sum of the genre counts may exceed the total number of recommendations.
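
To make this concrete, here is a small sketch with three hypothetical movies and three genres (all names and flags are made up). It mirrors the calculation used above: sum the one-hot genre flags, then divide each genre sum by the total of all genre sums.

```python
# Toy one-hot genre flags for three hypothetical movies
movies = {
    "Movie X": {"Drama": 1, "Romance": 1, "Action": 0},
    "Movie Y": {"Drama": 1, "Romance": 0, "Action": 0},
    "Movie Z": {"Drama": 0, "Romance": 1, "Action": 1},
}

# Sum the flags per genre, like `select_dtypes(np.number).sum()` does per column
genre_sum = {}
for flags in movies.values():
    for genre, flag in flags.items():
        genre_sum[genre] = genre_sum.get(genre, 0) + flag

# The denominator is the number of genre tags, not the number of movies
total_tags = sum(genre_sum.values())
genre_pct = {g: round(100 * s / total_tags, 1) for g, s in genre_sum.items()}
print(len(movies), total_tags, genre_pct)
```

The same effect explains the figures in Section 1.4.3: 117 of the user's 250 positively rated movies are tagged Drama, yet Drama accounts for 22.6% because the percentage is taken over all genre tags rather than over movies.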

-------

Now that we have seen the distribution of individual models, we are curious about what kind of distribution we will get from ensembling these models.

Let's see what happens when we ensemble these two models. 

## 2. Simple Model Ensembling by Borda Count
<a id='ensemble-models'></a>

We will ensemble the two models using the Borda Count method. Borda Count is a simple voting method: each model awards points to every item according to its rank, and the items are then re-ranked by their total points.

Assuming that we have a list of **5 items**, the Borda Count method works as follows:

1. For each model, rank the items from 1 to 5 based on the predicted scores.
2. Convert each rank into points: with N items, the item at rank r receives N - r points.
3. Sum the points of each item across all models to get its Borda Count.
4. Sort the items by Borda Count in descending order; the item with the highest count is the top recommendation.
5. Repeat the process for the next user.

Consider the following example for an arbitrary user **123**:

| Rank | Model 1 | Model 2 | Model 3 | Allocated Points (N - rank) |
|------|---------|---------|---------|-----------------------------|
| 1    | A       | D       | E       | 5 - 1 = 4                   |
| 2    | B       | C       | A       | 5 - 2 = 3                   |
| 3    | C       | A       | B       | 5 - 3 = 2                   |
| 4    | D       | B       | D       | 5 - 4 = 1                   |
| 5    | E       | E       | C       | 5 - 5 = 0                   |

Based on the allocated points for each of the items, we sum the points up to get our Borda Count.

| Item | Model 1 | Model 2 | Model 3 | Borda Count  |
|------|---------|---------|---------|--------------|
| A    | 4       | 2       | 3       | 9            |
| B    | 3       | 1       | 2       | 6            |
| C    | 2       | 3       | 0       | 5            |
| D    | 1       | 4       | 1       | 6            |
| E    | 0       | 0       | 4       | 4            |

New ranking: A > B = D > C > E (B and D are tied with 6 points each)
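
The worked example can be reproduced in a few lines of plain Python. This is only a sketch of the calculation in the two tables; the alphabetical tie-break between B and D is an arbitrary choice made here for a deterministic output.

```python
# Rankings from the example table, best item first
rankings = {
    "Model 1": ["A", "B", "C", "D", "E"],
    "Model 2": ["D", "C", "A", "B", "E"],
    "Model 3": ["E", "A", "B", "D", "C"],
}

n_items = 5
borda = {}
for ranked in rankings.values():
    for rank, item in enumerate(ranked, start=1):
        # An item at rank r earns N - r points
        borda[item] = borda.get(item, 0) + (n_items - rank)

# Sort by points (highest first); ties are broken alphabetically
final = sorted(borda.items(), key=lambda kv: (-kv[1], kv[0]))
print(final)
```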


Let's implement this method below.

In [10]:
# Let's create a new dataframe to calculate ranking and borda count
rank_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

total_items = len(rank_df) # 1651 items

# Obtain points (inverse of rank) of the items based on the BPR score
rank_df["BPR Score"] = bpr_scores
rank_df["BPR Rank"] = rank_df["BPR Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
rank_df["BPR Points"] = total_items - rank_df["BPR Rank"] # Get points by calculating ('Total Item count' - 'Rank')

# Do likewise for WMF
rank_df["WMF Score"] = wmf_scores
rank_df["WMF Rank"] = rank_df["WMF Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
rank_df["WMF Points"] = total_items - rank_df["WMF Rank"] # Get points by calculating ('Total Item count' - 'Rank')

# Get Borda Count by summing up points of BPR and WMF
rank_df["Borda Count"] = rank_df["BPR Points"] + rank_df["WMF Points"]

# Round decimal places for readability purposes
rank_df = rank_df.round(3)

# Now let's take a look at the table with Borda Count 
display(rank_df.head(5))

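| Index | ItemID | BPR Score | BPR Rank | BPR Points | WMF Score | WMF Rank | WMF Points | Borda Count |
|-------|--------|-----------|----------|------------|-----------|----------|------------|-------------|
| 0     | 381    | 1.148     | 463      | 1188       | 4.442     | 83       | 1568       | 2756        |
| 1     | 602    | -0.769    | 980      | 671        | 1.889     | 746      | 905        | 1576        |
| 2     | 431    | 1.972     | 286      | 1365       | 2.891     | 455      | 1196       | 2561        |
| 3     | 875    | 0.724     | 558      | 1093       | 2.767     | 487      | 1164       | 2257        |
| 4     | 182    | 3.444     | 60       | 1591       | 3.927     | 194      | 1457       | 3048        |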
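
A note on ties: pandas' `Series.rank` uses `method="average"` by default, so two items tied for ranks 2 and 3 both receive 2.5, which `.astype(int)` then truncates to 2. With floating-point model scores exact ties are rare, but the sketch below (toy scores, not model output) shows how the tie-handling methods differ in case you want to make the choice explicit.

```python
import pandas as pd

scores = pd.Series([0.9, 0.7, 0.7, 0.1])  # two items share the score 0.7

avg_rank = scores.rank(ascending=False)                    # default method="average"
min_rank = scores.rank(ascending=False, method="min")      # tied items share the best rank
first_rank = scores.rank(ascending=False, method="first")  # ties broken by order of appearance

print(avg_rank.tolist())              # fractional ranks for the tie
print(avg_rank.astype(int).tolist())  # truncation, as in the cell above
print(min_rank.astype(int).tolist())
print(first_rank.astype(int).tolist())
```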


Now that we have the Borda Count, let's re-rank this list to obtain the ensembled model's recommendations.

In [11]:
# Introduce reranked dataframe for borda count
reranked_df = rank_df.sort_values("Borda Count", ascending=False)

# Let's take a look at the ensembled top 5 recommendations and their respective ranks
display("Re-ranked Top Item recommendations", reranked_df.head(5))

'Re-ranked Top Item recommendations'

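| Index | ItemID | BPR Score | BPR Rank | BPR Points | WMF Score | WMF Rank | WMF Points | Borda Count |
|-------|--------|-----------|----------|------------|-----------|----------|------------|-------------|
| 152   | 313    | 4.494     | 8        | 1643       | 6.066     | 1        | 1650       | 3293        |
| 194   | 739    | 4.515     | 7        | 1644       | 5.057     | 11       | 1640       | 3284        |
| 425   | 237    | 4.196     | 15       | 1636       | 4.968     | 18       | 1633       | 3269        |
| 310   | 692    | 3.989     | 26       | 1625       | 5.142     | 9        | 1642       | 3267        |
| 382   | 655    | 3.979     | 27       | 1624       | 5.165     | 8        | 1643       | 3267        |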


The top recommendation, ItemID **313**, was ranked **8** by BPR and **1** by WMF, while the second recommendation, ItemID **739**, was ranked **7** by BPR and **11** by WMF.

This shows how we can combine the recommendations of multiple models.

-------

Next, let's add the recommendations into the genre distribution dataframe to compare its results to the base models.

In [12]:
UIDX = 3
TOPK = 50

borda_count_topk = reranked_df["ItemID"].values[:TOPK] # Get top K (50) Item IDs

borda_df = item_df.loc[[int(i) for i in borda_count_topk]] # Filter genre data frame by the top item IDs

# Add Borda Count results into 'combined_df' dataframe for comparison
combined_df["Borda Count Sum"] = borda_df.select_dtypes(np.number).sum() # group by genre, and calculate sum of each genre
combined_df["Borda Count %"] = combined_df["Borda Count Sum"] / combined_df["Borda Count Sum"].sum() * 100 # Calculate percentage of sum to total
combined_df["Borda Count %"] = combined_df["Borda Count %"].round(1) # rounding for readability purposes

# Let's take a look at the genre distribution of train data, BPR, WMF and the newly added Borda Count
display("Borda Count Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %"]][:10])

'Borda Count Recommendations Distribution'

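| Genre      | Train Data % | BPR % | WMF % | Borda Count % |
|------------|--------------|-------|-------|---------------|
| Drama      | 22.6         | 17.1  | 26.0  | 17.2          |
| Comedy     | 13.9         | 13.7  | 21.0  | 19.0          |
| Romance    | 10.8         | 17.9  | 18.0  | 19.0          |
| Action     | 10.6         | 13.7  | 7.0   | 10.3          |
| Thriller   | 9.7          | 12.8  | 6.0   | 7.8           |
| Adventure  | 6.9          | 6.8   | 5.0   | 6.9           |
| Children's | 4.4          | 1.7   | 2.0   | 1.7           |
| War        | 3.9          | 5.1   | 5.0   | 6.0           |
| Crime      | 3.9          | 2.6   | 1.0   | 1.7           |
| Sci-Fi     | 3.5          | 3.4   | 4.0   | 6.9           |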


As Borda Count is a combination of both BPR and WMF models, the distributions are expected to be influenced by both models.

In the next section, we will further add more models to the ensemble.

## 3. Further Ensembling by Adding More Models
<a id='further-ensembling'></a>

We can easily add more models to the ensemble by training them and including their rankings. One approach is to train the same model with different initializations by changing the random seed (e.g. `seed=123`). Models that differ only in their seed can still disagree: some perform better for one set of users, while others perform better for another.

Another way is to change the number of latent factors `k`, which again yields models that each do better on a different subset of the data.

By ensembling such models, we could potentially achieve better performance than any of them achieves individually.
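
The intuition can be illustrated with a toy simulation. This is an idealized sketch, not a claim about WMF: each hypothetical "model" predicts the true score plus independent Gaussian noise, and we average the predictions instead of using Borda Count, simply because averaging is the easiest combination to analyse.

```python
import random

random.seed(0)
true_scores = [random.uniform(1, 5) for _ in range(200)]


def noisy_model(seed):
    """A hypothetical model: true score plus independent noise tied to the seed."""
    rng = random.Random(seed)
    return [s + rng.gauss(0, 1.0) for s in true_scores]


def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, true_scores)) / len(true_scores)


preds = [noisy_model(seed) for seed in (123, 456, 789, 888, 999)]
avg_pred = [sum(col) / len(col) for col in zip(*preds)]  # simple average ensemble

individual_mse = [mse(p) for p in preds]
ensemble_mse = mse(avg_pred)
print([round(m, 3) for m in individual_mse], round(ensemble_mse, 3))
```

Real models trained on the same data make correlated errors, so the gain is usually smaller than in this idealized case.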

Let's try adding a few more similar models with different random seed initializations.

In [13]:
# WMF models with different seeds
wmf_model_123 = WMF(name="WMF_123", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_456 = WMF(name="WMF_456", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=456)
wmf_model_789 = WMF(name="WMF_789", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=789)
wmf_model_888 = WMF(name="WMF_888", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=888)
wmf_model_999 = WMF(name="WMF_999", k=10, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=999)
# WMF models with different number of latent factors
wmf_model_k20 = WMF(name="WMF_k20", k=20, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_k30 = WMF(name="WMF_k30", k=30, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_k40 = WMF(name="WMF_k40", k=40, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_k50 = WMF(name="WMF_k50", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)

models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]

# Let's run an experiment to see how much these models differ when only the random seed or the number of latent factors changes
experiment = Experiment(rs, models, metrics, user_based=True).run()


[WMF_123] Training started!

Learning completed!

[WMF_123] Evaluation started!

[WMF_456] Training started!

Learning completed!

[WMF_456] Evaluation started!

[WMF_789] Training started!

Learning completed!

[WMF_789] Evaluation started!

[WMF_888] Training started!

Learning completed!

[WMF_888] Evaluation started!

[WMF_999] Training started!

Learning completed!

[WMF_999] Evaluation started!

[WMF_k20] Training started!

Learning completed!

[WMF_k20] Evaluation started!

[WMF_k30] Training started!

Learning completed!

[WMF_k30] Evaluation started!

[WMF_k40] Training started!

Learning completed!

[WMF_k40] Evaluation started!

[WMF_k50] Training started!

Learning completed!

[WMF_k50] Evaluation started!


TEST:
...
        | Precision@50 | Recall@50 | Train (s) | Test (s)
------- + ------------ + --------- + --------- + --------
WMF_123 |       0.1133 |    0.5584 |   16.6684 |   0.3683
WMF_456 |       0.1099 |    0.5500 |   18.3544 |   0.3176
WMF_789 |       0.1133 |    0.5606 |   17.3889 |   0.3177
WMF_888 |       0.1126 |    0.5525 |   18.9417 |   0.3430
WMF_999 |       0.1133 |    0.5596 |   17.0095 |   0.2932
WMF_k20 |       0.1152 |    0.5741 |   17.6763 |   0.2648
WMF_k30 |       0.1108 |    0.5518 |   18.5937 |   0.3195
WMF_k40 |       0.1075 |    0.5426 |   19.3286 |   0.2625
WMF_k50 |       0.1045 |    0.5294 |   18.2090 |   0.3540



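Based on the results, we can see that even for the same model class, the results vary with the random seed and the number of latent factors: Recall@50 ranges from 0.5294 (`WMF_k50`) to 0.5741 (`WMF_k20`).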

Let's try ensembling all these models together into 1 single model by Borda Count, and look at its recommendations.

In [14]:
# Let's create a different dataframe to calculate ranking and borda count
rank_2_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

# Add a column named 'WMF Borda Count' to accumulate the points
rank_2_df["WMF Borda Count"] = 0

# Calculate the points (inverse of rank) for each of the models and accumulate them into the 'WMF Borda Count' column
# We use the same formula as the 'Borda Count' calculation
for model in models:
    name = model.name
    recommendations, scores = model.rank(UIDX)
    rank_2_df[name + "_score"] = scores
    rank_2_df[name + "_rank"] = rank_2_df[name + "_score"].rank(ascending=False).astype(int)
    rank_2_df[name + "_points"] = total_items - rank_2_df[name + "_rank"]
    rank_2_df["WMF Borda Count"] = rank_2_df["WMF Borda Count"] + rank_2_df[name + "_points"]

# Let's sort and view the top recommendations!
display("Top 10 Recommendations for WMF Borda Count", rank_2_df[["ItemID", "WMF Borda Count"]].sort_values("WMF Borda Count", ascending=False).head(10))

'Top 10 Recommendations for WMF Borda Count'

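| Index | ItemID | WMF Borda Count |
|-------|--------|-----------------|
| 37    | 318    | 14757           |
| 152   | 313    | 14710           |
| 197   | 191    | 14663           |
| 132   | 272    | 14633           |
| 156   | 64     | 14629           |
| 61    | 204    | 14605           |
| 279   | 402    | 14604           |
| 305   | 181    | 14585           |
| 405   | 22     | 14581           |
| 604   | 215    | 14542           |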


In [15]:
# Now, let's add them to the combined dataframe for comparison with earlier models
wmf_borda_count_topk = rank_2_df.sort_values("WMF Borda Count", ascending=False)["ItemID"].values[:TOPK]
wmf_borda_df = item_df.loc[[int(i) for i in wmf_borda_count_topk]]

combined_df["WMF Borda Count Sum"] = wmf_borda_df.select_dtypes(np.number).sum()
combined_df["WMF Borda Count %"] = combined_df["WMF Borda Count Sum"] / combined_df["WMF Borda Count Sum"].sum() * 100
combined_df["WMF Borda Count %"] = combined_df["WMF Borda Count %"].round(1)

# Let's compare the recommendation distribution
display("Combined Recommendations Distribution", combined_df[["Train Data %", "WMF %", "WMF Borda Count %"]][:10])

'Combined Recommendations Distribution'

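| Genre      | Train Data % | WMF % | WMF Borda Count % |
|------------|--------------|-------|-------------------|
| Drama      | 22.6         | 26.0  | 22.7              |
| Comedy     | 13.9         | 21.0  | 14.3              |
| Romance    | 10.8         | 18.0  | 16.8              |
| Action     | 10.6         | 7.0   | 9.2               |
| Thriller   | 9.7          | 6.0   | 5.9               |
| Adventure  | 6.9          | 5.0   | 5.9               |
| Children's | 4.4          | 2.0   | 3.4               |
| War        | 3.9          | 5.0   | 6.7               |
| Crime      | 3.9          | 1.0   | 3.4               |
| Sci-Fi     | 3.5          | 4.0   | 5.0               |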


Comparing the results of the WMF Borda Count model, we can see that the different random seed initializations, along with the different number of latent factors, have influenced the recommendations.

-------

Now that we have covered the Borda Count method, let's see how we could use other methods and popular packages such as **scikit-learn** for more advanced model ensembling.

## 4. Ensembling with Regression Models
<a id='ensembling-with-regression-models'></a>

We could continue by thinking of this as a meta-learning problem. We could treat recommendations of each base model as features and train a meta-learner to predict the final recommendation.

This could be any ML model such as a Linear Regression, Random Forest, Gradient Boosting, or even a Neural Network.

In this example, we will first use a simple Linear Regression model to predict the final rating, and then a Random Forest Regressor.

The meta-learner is trained on the scores produced by the WMF base models from Section 3.
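
Before working with the real scores, here is a minimal sketch of the idea on synthetic data. Two hypothetical base models (one accurate, one noisy) produce scores, and a linear meta-learner without an intercept learns how much to trust each. The sketch uses NumPy's `lstsq` only to stay dependency-light; it solves the same ordinary least-squares problem as `LinearRegression(fit_intercept=False)`.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rating = rng.uniform(1, 5, size=500)

# Two hypothetical base models: one accurate, one noisy
base_a = true_rating + rng.normal(0, 0.2, size=500)
base_b = true_rating + rng.normal(0, 1.0, size=500)

# Each column is one base model's score, i.e. one feature for the meta-learner
X = np.column_stack([base_a, base_b])

# Least squares without an intercept: minimize ||X w - y||^2
w, *_ = np.linalg.lstsq(X, true_rating, rcond=None)
print(w)
```

The learned weights favour the more accurate base model, which is the behaviour we hope for when stacking real recommenders.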

##### 4.1 Prepare Data

In [16]:
# First, let's create the training and inference dataframes
training_df = pd.DataFrame(zip(*train_set.uir_tuple)) # Add 'User Index', 'Item Index', 'Rating' triples as records in dataframe
training_df.columns = ['user_idx', 'item_idx', 'rating'] # Set column names

# Get all possible user_index, item_index combinations, add them into dataframe for inference
all_df = pd.DataFrame({
    "user_idx": [user_idx for user_idx in range(train_set.num_users) for _ in range(train_set.num_items)],
    "item_idx": [item_idx for _ in range(train_set.num_users) for item_idx in range(train_set.num_items)],
})
all_df['item_id'] = all_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # Add 'Item ID' column into dataframe by converting 'Item Index' to 'Item ID'

# Let's get all the scores for the models trained in Section 3.
models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]

# For each model, we add its predicted ratings to the training and inference dataframes
for model in tqdm(models):
    name = model.name
    training_df[name + "_score"] = training_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1) # for training
    all_df[name + "_score"] = all_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1) # for inference

# Let's pick out the 9 features - predicted ratings from the 9 models trained
X_train = training_df[['WMF_123_score', 'WMF_456_score', 'WMF_789_score', 'WMF_888_score', 'WMF_999_score', 'WMF_k20_score', 'WMF_k30_score', 'WMF_k40_score', 'WMF_k50_score']] # use these predicted ratings as features
y_train = training_df['rating'] # use ground truth to train this linear regression model
X_inference = all_df[['WMF_123_score', 'WMF_456_score', 'WMF_789_score', 'WMF_888_score', 'WMF_999_score', 'WMF_k20_score', 'WMF_k30_score', 'WMF_k40_score', 'WMF_k50_score']] # all data, used to predict values for ranking

display("Training features", X_train.head(3)) # predicted ratings as features
display("Target values", y_train.head(3)) # ground truth ratings
display("Inference Data", X_inference.head(3)) # all inference data 

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [02:13<00:00, 14.81s/it]


'Training features'

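| Index | WMF_123_score | WMF_456_score | WMF_789_score | WMF_888_score | WMF_999_score | WMF_k20_score | WMF_k30_score | WMF_k40_score | WMF_k50_score |
|-------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|
| 0     | 2.110029      | 2.071523      | 1.903551      | 2.302063      | 3.117306      | 2.806793      | 3.367846      | 4.248022      | 3.843781      |
| 1     | 2.791619      | 2.692337      | 2.421384      | 2.478971      | 2.736729      | 2.779977      | 2.639981      | 2.263646      | 2.319107      |
| 2     | 3.751         | 3.385016      | 3.54209       | 3.759172      | 3.727951      | 4.115041      | 3.44295       | 3.486503      | 3.089492      |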


'Target values'

0    4.0
1    3.0
2    4.0
Name: rating, dtype: float64

'Inference Data'

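| Index | WMF_123_score | WMF_456_score | WMF_789_score | WMF_888_score | WMF_999_score | WMF_k20_score | WMF_k30_score | WMF_k40_score | WMF_k50_score |
|-------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|
| 0     | 2.110029      | 2.071523      | 1.903551      | 2.302063      | 3.117306      | 2.806793      | 3.367846      | 4.248022      | 3.843781      |
| 1     | 0.807322      | 1.29558       | 0.918263      | 0.553786      | 0.588544      | -0.075428     | -0.366332     | -0.918094     | 0.795499      |
| 2     | 1.648435      | 1.456549      | 1.591913      | 1.271597      | 1.67771       | 2.486352      | 2.044948      | 2.325095      | 0.936491      |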
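
The inference dataframe holds every user-item pair, which for this dataset is 943 × 1651 = 1,556,893 rows. The nested list comprehensions above build these pairs in pure Python; a sketch of an equivalent construction with NumPy is shown below on toy sizes. The variable names are ours, and the pair order matches the comprehension-based version.

```python
import numpy as np

num_users, num_items = 3, 4  # toy sizes; the notebook has 943 users and 1651 items

# Each user index is repeated once per item: 0,0,0,0,1,1,1,1,...
user_idx = np.repeat(np.arange(num_users), num_items)
# The full range of item indices is repeated once per user: 0,1,2,3,0,1,2,3,...
item_idx = np.tile(np.arange(num_items), num_users)

pairs = list(zip(user_idx.tolist(), item_idx.tolist()))
print(len(pairs), pairs[:5])
```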


Now that we have prepared the data for a **scikit-learn** model, let's first train a Linear Regression model.

##### 4.2 Fitting Linear Regression Model

In [17]:
UIDX = 3
TOPK = 50

# Let's now fit into a Linear Regression model
regr = linear_model.LinearRegression(fit_intercept=False) # force model to only use predictions from WMF models
regr.fit(X_train, y_train) # train the model

# Input: 9 base model predicted ratings. Output: final predicted rating based on linear regression
y_pred = regr.predict(X_inference) # Get predictions based on trained model

all_df["WMF Linear Regression"] = y_pred # create a column in `all_df` for the predictions

# Get Top K ratings from predictions
sorted_df = all_df.sort_values("WMF Linear Regression", ascending=False) # sort by predicted ratings
top_item_ids = sorted_df[sorted_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # keep the top K for this user (K = 50, as set in Section 1.4.3)

# Place them into the comparison distribution dataframe
linear_regression_df = item_df.loc[[int(i) for i in top_item_ids]] # Get genres of ratings
combined_df["WMF Linear Regression Sum"] = linear_regression_df.select_dtypes(np.number).sum() # group by genre and sum them up
combined_df["WMF Linear Regression %"] = combined_df["WMF Linear Regression Sum"] / combined_df["WMF Linear Regression Sum"].sum() * 100 # get percentages of (genre sum / whole sum)

combined_df["WMF Linear Regression %"] = combined_df["WMF Linear Regression %"].round(1) # round values for readability

print("Coefficients of the linear regression model")
print(regr.coef_) # coefficients of the linear regression model
print(regr.intercept_) # intercept of the linear regression model

# Now let's take a look at how the genre distribution is
display("Combined Recommendations Distribution", combined_df[["Train Data %", "WMF %", "WMF Borda Count %", "WMF Linear Regression %"]][:10])

Coefficients of the linear regression model
[-0.03553173  0.05021945 -0.09029353 -0.052304   -0.12588827 -0.10598847
  0.13995151  0.48300126  0.84912807]
0.0


'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,WMF %,WMF Borda Count %,WMF Linear Regression %
Drama,22.6,26.0,22.7,26.0
Comedy,13.9,21.0,14.3,11.5
Romance,10.8,18.0,16.8,11.5
Action,10.6,7.0,9.2,12.5
Thriller,9.7,6.0,5.9,11.5
Adventure,6.9,5.0,5.9,8.7
Children's,4.4,2.0,3.4,1.9
War,3.9,5.0,6.7,3.8
Crime,3.9,1.0,3.4,3.8
Sci-Fi,3.5,4.0,5.0,2.9


We have successfully fitted a Linear Regression model on the predictions of the 9 WMF base models, which differ in random seed and number of latent factors.

As the coefficients show, the linear regression model gives the most weight to the `WMF_k50` model, which has the highest coefficient (about 0.85), followed by `WMF_k40` (about 0.48). The remaining models receive coefficients close to zero.
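To make the role of the coefficients concrete, the sketch below recomputes one prediction by hand. It assumes only the rounded coefficients printed above and the first row of the inference data; because `fit_intercept=False`, the prediction is simply the weighted sum of the nine base model scores. The variable names `coef`, `scores` and `manual_pred` are illustrative and not part of the notebook.

```python
import numpy as np

# Coefficients as printed by `regr.coef_` above
# (column order: 123, 456, 789, 888, 999, k20, k30, k40, k50)
coef = np.array([-0.03553173, 0.05021945, -0.09029353, -0.052304, -0.12588827,
                 -0.10598847, 0.13995151, 0.48300126, 0.84912807])

# Base model scores from the first row of the inference data (user 0, item 0)
scores = np.array([2.110029, 2.071523, 1.903551, 2.302063, 3.117306,
                   2.806793, 3.367846, 4.248022, 3.843781])

# Without an intercept, the ensemble prediction is the dot product of the two
manual_pred = float(coef @ scores)
print(round(manual_pred, 2))
```

The result is approximately 4.83, in line with the `WMF Linear Regression` value reported for this row in the `all_df` table in Section 5. Most of that value comes from the `WMF_k40` and `WMF_k50` terms.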

Let's now fit a Random Forest Regressor model.

##### 4.3 Fitting Random Forest Model

We reuse the same training data and train a Random Forest Regressor model.

Although this example uses a Random Forest model, other regressors such as Gradient Boosting could be tried in the same way.
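As a rough sketch of that alternative, the block below fits scikit-learn's `GradientBoostingRegressor` on synthetic data. The arrays `X_toy` and `y_toy` are made-up stand-ins for `X_train` and `y_train` (nine base model scores per row and a rating between 1 and 5), so the numbers carry no meaning; only the fit and predict pattern is the point.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# Hypothetical stand-in for X_train: 200 rows of 9 base model scores
X_toy = rng.uniform(0, 5, size=(200, 9))
# Hypothetical stand-in for y_train: noisy average of the scores, clipped to 1..5
y_toy = np.clip(X_toy.mean(axis=1) + rng.normal(0, 0.3, size=200), 1, 5)

# Same fit/predict interface as the Random Forest model used in this section
gb_model = GradientBoostingRegressor(n_estimators=50, random_state=42)
gb_model.fit(X_toy, y_toy)

gb_pred = gb_model.predict(X_toy[:5])  # predictions for the first 5 rows
print(gb_pred.shape)
```

In the notebook itself, `X_toy` and `y_toy` would be replaced by `X_train` and `y_train`, and the predictions on `X_inference` would be stored in a new column of `all_df`.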


In [18]:
UIDX = 3
TOPK = 50

# Let's now train a Random Forest model
randomforest_model = RandomForestRegressor(n_estimators=50, random_state=42) 
randomforest_model.fit(X_train, y_train) # Train the model

# Input: 9 base model predicted ratings. Output: final predicted rating based on random forest
y_pred = randomforest_model.predict(X_inference)

all_df["WMF Random Forest"] = y_pred # create a column in `all_df` for the predictions

# Get Top K ratings from predictions
sorted_df = all_df.sort_values("WMF Random Forest", ascending=False) # sort by predicted ratings
top_item_ids = sorted_df[sorted_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # filter top K (50 as set in Section 2.3)

# Place them into the comparison distribution dataframe
random_forest_df = item_df.loc[[int(i) for i in top_item_ids]] # Get genres of ratings
combined_df["WMF Random Forest Sum"] = random_forest_df.select_dtypes(np.number).sum() # group by genre and sum them up
combined_df["WMF Random Forest %"] = combined_df["WMF Random Forest Sum"] / combined_df["WMF Random Forest Sum"].sum() * 100 # get percentages of (genre sum / whole sum)

combined_df["WMF Random Forest %"] = combined_df["WMF Random Forest %"].round(1) # round values for readability

# Now let's take a look at how the genre distribution is
display("Combined Recommendations Distribution", combined_df[["Train Data %", "WMF %", "WMF Borda Count %", "WMF Linear Regression %", "WMF Random Forest %"]][:10])

'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,WMF %,WMF Borda Count %,WMF Linear Regression %,WMF Random Forest %
Drama,22.6,26.0,22.7,26.0,26.4
Comedy,13.9,21.0,14.3,11.5,12.3
Romance,10.8,18.0,16.8,11.5,13.2
Action,10.6,7.0,9.2,12.5,10.4
Thriller,9.7,6.0,5.9,11.5,9.4
Adventure,6.9,5.0,5.9,8.7,7.5
Children's,4.4,2.0,3.4,1.9,0.9
War,3.9,5.0,6.7,3.8,3.8
Crime,3.9,1.0,3.4,3.8,5.7
Sci-Fi,3.5,4.0,5.0,2.9,3.8


We have successfully fitted a Random Forest Regressor model on the predictions of the 9 WMF base models, which differ in random seed and number of latent factors.

From the distribution, we can tell that these different ensemble models utilized the base model predictions differently to come up with the final prediction.
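One way to inspect how a tree-based ensemble uses its inputs is the `feature_importances_` attribute of a fitted `RandomForestRegressor`, which plays a role loosely comparable to the coefficients of the linear model. The sketch below uses synthetic data in which the third input dominates the target by construction; the names `model_a`, `model_b` and `model_c` are hypothetical and do not refer to the WMF models in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical scores from three base models
X_toy = rng.uniform(0, 5, size=(300, 3))
# Target built mostly from the third column, with a little noise
y_toy = 0.9 * X_toy[:, 2] + 0.1 * X_toy[:, 0] + rng.normal(0, 0.1, size=300)

rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(X_toy, y_toy)

# Impurity-based importances, normalized to sum to 1
importances = dict(zip(["model_a", "model_b", "model_c"], rf.feature_importances_))
for name, value in importances.items():
    print(f"{name}: {value:.3f}")
```

Applied to `randomforest_model`, the same attribute would give one importance value per WMF score column of `X_train`.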

------

In the next section, we will further compare the results of the different models.

## 5. Further Evaluation
<a id='evaluation'></a>

In Section 4, we were only able to view the recommendation distribution and compare them visually based on a **single user**.

What if we want to compare the models based on **multiple users**?

> We can do so by calculating the score of every user and item combination, then computing the **Precision** and **Recall** of the predictions for each model.

We will also create a dataframe `rank_df` to store the results of the models given **all users and items**.

In [19]:
rank_df = pd.DataFrame({
    "user_idx": all_df["user_idx"],
    "item_idx": all_df["item_idx"],
})

total_items = train_set.num_items # 1651 items

models_to_calculate = [bpr_model, wmf_model, wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]

# Calculate points for each model using the Borda count process.
# Take note that points should be calculated on a per user basis.
for model in tqdm(models_to_calculate):
    name = model.name
    rank_df[name + "_score"] = rank_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1)
    point_list = [] # This list will be added into the dataframe later

    # Calculate points for each user
    for user_idx in range(train_set.num_users):
        sub_rank_df = rank_df[rank_df["user_idx"] == user_idx].copy() # get all items for a user
        sub_rank_df.loc[:, name + "_rank"] = sub_rank_df[name + "_score"].rank(ascending=False).astype(int)  # Rank items
        sub_rank_df.loc[:, name + "_points"] = total_items - sub_rank_df[name + "_rank"]  # Calculate points
        point_list.extend(sub_rank_df[name + "_points"].values.tolist())  # Add points to the list
    
    rank_df[name + "_points"] = point_list


display(rank_df.head(5)) # Now `rank_df` contains the points for each model

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [03:02<00:00, 16.62s/it]


Unnamed: 0,user_idx,item_idx,BPR_score,BPR_points,WMF_score,WMF_points,WMF_123_score,WMF_123_points,WMF_456_score,WMF_456_points,...,WMF_999_score,WMF_999_points,WMF_k20_score,WMF_k20_points,WMF_k30_score,WMF_k30_points,WMF_k40_score,WMF_k40_points,WMF_k50_score,WMF_k50_points
0,0,0,2.265398,1494,2.110029,1495,2.110029,1495,2.071523,1488,...,3.117306,1629,2.806793,1608,3.367846,1626,4.248022,1646,3.843781,1637
1,0,1,0.36865,1003,0.807322,1026,0.807322,1026,1.29558,1259,...,0.588544,1007,-0.075428,330,-0.366332,236,-0.918094,44,0.795499,1260
2,0,2,1.420759,1314,1.648435,1378,1.648435,1378,1.456549,1313,...,1.67771,1405,2.486352,1571,2.044948,1519,2.325095,1559,0.936491,1315
3,0,3,0.448797,1034,0.4497,776,0.4497,776,-0.102504,232,...,-0.210225,151,-0.114164,262,-0.598675,130,-0.331122,243,0.106251,805
4,0,4,2.548217,1545,1.759084,1407,1.759084,1407,2.151837,1502,...,1.539425,1375,1.157579,1271,0.386267,993,1.672385,1454,0.668341,1197


You may be wondering why we are applying the Borda Count method again. We want to compare the different models on **Precision** and **Recall**, which requires a score for every user and item combination.
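The per-user loop above takes a few minutes to run. As a possible alternative, the per-user ranking can be expressed with a pandas `groupby`. The sketch below uses a toy dataframe with one hypothetical model `M`; note that it breaks ties with `method="min"`, which is a choice made here and differs slightly from the default average ranking followed by `astype(int)` used in the cell above.

```python
import pandas as pd

# Toy scores: 2 users x 3 items from one hypothetical model "M"
toy_df = pd.DataFrame({
    "user_idx": [0, 0, 0, 1, 1, 1],
    "item_idx": [0, 1, 2, 0, 1, 2],
    "M_score": [2.5, 0.4, 1.4, 0.1, 3.0, 2.0],
})
total_items = 3

# Rank items within each user (rank 1 = highest score)
toy_df["M_rank"] = (
    toy_df.groupby("user_idx")["M_score"]
    .rank(ascending=False, method="min")
    .astype(int)
)

# Borda points: the better the rank, the more points
toy_df["M_points"] = total_items - toy_df["M_rank"]
print(toy_df)
```

Each user's best item receives `total_items - 1` points and the worst receives 0, matching the points scheme used in the loop.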

Once the points are calculated, let's add them up following the Borda Count formula shown in Sections 2 and 3 above.

To reiterate, our simple `Borda Count` model consists of the **BPR Model** and **WMF Model**.

While the `WMF Borda Count` model consists of multiple models:
- Different random seed initialization: **wmf_model_123**, **wmf_model_456**, **wmf_model_789**, **wmf_model_888**, **wmf_model_999**, and
- Different latent factors: **wmf_model_k20**, **wmf_model_k30**, **wmf_model_k40**, **wmf_model_k50**.
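The summation step can be illustrated on a tiny example. The sketch below assumes three hypothetical models `A`, `B` and `C` that have already assigned points to three items for a single user; the ensemble score of an item is the sum of its points across the models.

```python
import pandas as pd

# Points given to 3 items by 3 hypothetical models, for a single user
points_df = pd.DataFrame({
    "item_idx": [0, 1, 2],
    "A_points": [2, 0, 1],
    "B_points": [1, 2, 0],
    "C_points": [2, 1, 0],
})

# Borda Count: sum the points of all models for each item
point_cols = ["A_points", "B_points", "C_points"]
points_df["Borda Count"] = points_df[point_cols].sum(axis=1)

# Items ordered by their combined points, best first
ranking = points_df.sort_values("Borda Count", ascending=False)["item_idx"].tolist()
print(points_df)
print(ranking)
```

Item 0 collects 5 points, item 1 collects 3 and item 2 collects 1, so the combined ranking is `[0, 1, 2]` even though model `B` alone would have placed item 1 first.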

In [20]:
borda_count_models = [bpr_model, wmf_model]
rank_df["Borda Count"] = rank_df[[model.name + "_points" for model in borda_count_models]].sum(axis=1) # Sum up points of BPR and WMF

wmf_borda_count_models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999, wmf_model_k20, wmf_model_k30, wmf_model_k40, wmf_model_k50]
rank_df["WMF Borda Count"] = rank_df[[model.name + "_points" for model in wmf_borda_count_models]].sum(axis=1) # Sum up points of all WMF models

display(rank_df.head(5))

# Now, let's add them into the `all_df` dataframe for comparison
all_df.sort_values(by=["user_idx", "item_idx"], inplace=True) # ensure that the dataframe is sorted by user index and item index

all_df["BPR_score"] = rank_df["BPR_score"].values
all_df["WMF_score"] = rank_df["WMF_score"].values

all_df["Borda Count"] = rank_df["Borda Count"].values
all_df["WMF Borda Count"] = rank_df["WMF Borda Count"].values

display("`all_df` - Comparison of all Models", all_df.head(5))


Unnamed: 0,user_idx,item_idx,BPR_score,BPR_points,WMF_score,WMF_points,WMF_123_score,WMF_123_points,WMF_456_score,WMF_456_points,...,WMF_k20_score,WMF_k20_points,WMF_k30_score,WMF_k30_points,WMF_k40_score,WMF_k40_points,WMF_k50_score,WMF_k50_points,Borda Count,WMF Borda Count
0,0,0,2.265398,1494,2.110029,1495,2.110029,1495,2.071523,1488,...,2.806793,1608,3.367846,1626,4.248022,1646,3.843781,1637,2989,14111
1,0,1,0.36865,1003,0.807322,1026,0.807322,1026,1.29558,1259,...,-0.075428,330,-0.366332,236,-0.918094,44,0.795499,1260,2029,7150
2,0,2,1.420759,1314,1.648435,1378,1.648435,1378,1.456549,1313,...,2.486352,1571,2.044948,1519,2.325095,1559,0.936491,1315,2692,12639
3,0,3,0.448797,1034,0.4497,776,0.4497,776,-0.102504,232,...,-0.114164,262,-0.598675,130,-0.331122,243,0.106251,805,1810,3917
4,0,4,2.548217,1545,1.759084,1407,1.759084,1407,2.151837,1502,...,1.157579,1271,0.386267,993,1.672385,1454,0.668341,1197,2952,12121


'`all_df` - Comparison of all Models'

Unnamed: 0,user_idx,item_idx,item_id,WMF_123_score,WMF_456_score,WMF_789_score,WMF_888_score,WMF_999_score,WMF_k20_score,WMF_k30_score,WMF_k40_score,WMF_k50_score,WMF Linear Regression,WMF Random Forest,BPR_score,WMF_score,Borda Count,WMF Borda Count
0,0,0,381,2.110029,2.071523,1.903551,2.302063,3.117306,2.806793,3.367846,4.248022,3.843781,4.83385,4.34,2.265398,2.110029,2989,14111
1,0,1,602,0.807322,1.29558,0.918263,0.553786,0.588544,-0.075428,-0.366332,-0.918094,0.795499,0.039174,1.48,0.36865,0.807322,2029,7150
2,0,2,431,1.648435,1.456549,1.591913,1.271597,1.67771,2.486352,2.044948,2.325095,0.936491,1.534016,2.1,1.420759,1.648435,2692,12639
3,0,3,875,0.4497,-0.102504,0.439818,0.164991,-0.210225,-0.114164,-0.598675,-0.331122,0.106251,-0.184401,1.08,0.448797,0.4497,1810,3917
4,0,4,182,1.759084,2.151837,2.65951,1.573485,1.539425,1.157579,0.386267,1.672385,0.668341,0.835969,1.5,2.548217,1.759084,2952,12121


Now that we have all model scores in the same table, let's calculate the same **Precision@K** and **Recall@K** values as in the experiments.

We do this by manually calculating precision and recall with their respective formulas.
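For a single user, the two formulas can be sketched as a small helper. Precision@K is the share of the top K recommended items that appear in the user's ground truth, and Recall@K is the share of ground truth items that appear in the top K. The function name `precision_recall_at_k` and the toy item ids are illustrative only.

```python
def precision_recall_at_k(true_items, ranked_items, k):
    """Return (precision@k, recall@k) for one user.

    true_items: items the user interacted with in the test data
    ranked_items: recommended items, best first
    """
    top_k = list(ranked_items)[:k]
    hits = len(set(true_items) & set(top_k))  # relevant items in the top k
    precision = hits / len(top_k) if len(top_k) > 0 else 0.0
    recall = hits / len(true_items) if len(true_items) > 0 else 0.0
    return precision, recall


# Toy example: 3 relevant items, 5 ranked recommendations, cut off at k=4
p, r = precision_recall_at_k(true_items=[1, 4, 7], ranked_items=[4, 2, 7, 9, 1], k=4)
print(p, r)
```

In this toy case two of the four recommended items are relevant (precision 0.5) and two of the three relevant items are retrieved (recall about 0.667). The cell below applies the same two ratios to every test user and averages them per model.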

In [21]:
models = ["BPR_score", "WMF_score", "Borda Count", "WMF Borda Count", "WMF Linear Regression", "WMF Random Forest"]

result_data = {
    "Metrics": ["Precision@50", "Recall@50"],
}

test_users = set(test_set.uir_tuple[0])
for model in tqdm(models):
    sorted_df = all_df.sort_values(model, ascending=False) # sort by predicted ratings
    precisions, recalls = [], []
    
    for uidx in test_users:
        true_top_k = test_set.user_data[uidx][0] # ground truth data
        predicted_top_k = sorted_df[sorted_df['user_idx'] == uidx]['item_idx'].values[:TOPK].astype(int)
        # Precision@K
        precision = len(set(true_top_k) & set(predicted_top_k)) / len(predicted_top_k)
        precisions.append(precision)
        # Recall@K
        recall = len(set(true_top_k) & set(predicted_top_k)) / len(true_top_k)
        recalls.append(recall)
        
    result_data[model] = [np.mean(precisions), np.mean(recalls)]

# Now let's take a look at the results
result_df = pd.DataFrame(result_data)

display("Base BPR, WMF comparison with Borda Count (BPR + WMF)", result_df[["Metrics", "BPR_score", "WMF_score"]])

display("WMF Models Comparison", result_df[["Metrics", "WMF Borda Count", "WMF Linear Regression", "WMF Random Forest"]])

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:15<00:00,  2.63s/it]


'Base BPR, WMF comparison with Borda Count (BPR + WMF)'

Unnamed: 0,Metrics,BPR_score,WMF_score
0,Precision@50,0.099574,0.103213
1,Recall@50,0.36385,0.372287


'WMF Models Comparison'

Unnamed: 0,Metrics,WMF Borda Count,WMF Linear Regression,WMF Random Forest
0,Precision@50,0.099745,0.075809,0.069128
1,Recall@50,0.379883,0.312031,0.283792


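Based on the results, Precision@50 and Recall@50 vary across models: `WMF Borda Count` achieves the highest Recall@50 (0.380), the single WMF model achieves the highest Precision@50 (0.103), and the two regression-based ensembles score lower on both metrics.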

However, there is no one model that is the 'ideal' model for all datasets and scenarios.
Depending on the dataset, different models may perform better than others.

---

Therefore, it is important to experiment with different models and ensemble methods to find the best model for a specific dataset and scenario.

Ensemble learning doesn't stop here. You could continue on by: 
- Trying different Cornac base models
- Trying different ensemble methods
- Tweaking around base models and different parameters within them

By combining base models with different specializations, model ensembling can leverage the strengths of each. The effectiveness of an ensemble depends on many factors, such as the diversity and quality of the base models and the size and quality of the dataset.

It is also important to note that while model ensembling often improves performance, there can be instances where a base model outperforms the ensembled models, as the regression-based ensembles above illustrate. Model ensembling also requires more computational resources. Therefore, we should strike a balance between performance and computational cost.

---

So what constitutes a good ensembled model? Which base models and configurations are ideal? These are topics that require further experimentation and discussion.