*Copyright (c) Cornac Authors. All rights reserved.*

*Licensed under the Apache 2.0 License.*

# Model Ensembling

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/PreferredAI/cornac/blob/master/tutorials/model_ensembling.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

This notebook provides an example of how to ensemble multiple recommendation models in Cornac.

Ensemble models is a technique that combines the predictions of multiple models to produce a single prediction. The idea is that by combining the predictions of multiple models, we can improve the overall performance of the recommendation system.

We will use the MovieLens 100K dataset and ensemble 2 models.

** Note: ** This notebook requires the `scikit-learn` package. 

## 1. Setup

### 1.1 Install required dependencies

In [3]:
! pip install seaborn scikit-learn

[0m

In [4]:
from IPython.display import display
import numpy as np
import pandas as pd

from cornac.datasets import movielens
from cornac.models import BPR, WMF
from cornac.eval_methods import RatioSplit
from cornac.metrics import Precision, Recall
from cornac.utils import cache
from cornac import Experiment

from sklearn import linear_model
from sklearn.ensemble import RandomForestRegressor

## 2. Prepare Experiment

### 2.1 Loading Dataset

First, we load the MovieLens 100K dataset.

In [5]:
data = movielens.load_feedback(variant="100K") # Load MovieLens Dataset

rs = RatioSplit(data, test_size=0.2, seed=42, verbose=True) # Split to train-test set to 80-20
train_set, test_set = rs.train_set, rs.test_set

rating_threshold = 1.0
exclude_unknowns = True
---
Training data:
Number of users = 943
Number of items = 1651
Number of ratings = 80000
Max rating = 5.0
Min rating = 1.0
Global mean = 3.5
---
Test data:
Number of users = 943
Number of items = 1651
Number of ratings = 19964
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 943
Total items = 1651


### 2.2 Training BPR and WMF models

We will train two models: 

1. BPR (Bayesian Personalized Ranking)
2. WMF (Weighted Matrix Factorization)

In [6]:
bpr_model = BPR(k=50, max_iter=100, learning_rate=0.01, lambda_reg=0.001) # Initialize BPR model
wmf_model = WMF(k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01,) # Initialize WMF model

models = [bpr_model, wmf_model]
metrics = [Precision(k=50), Recall(k=50)] # Set metrics for experiment

experiment = Experiment(rs, models, metrics, user_based=True).run() # Run Experiment to compare BPR model to WMF model individually


[BPR] Training started!

[BPR] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1408.83it/s]



[WMF] Training started!


100%|██████████| 300/300 [00:24<00:00, 12.02it/s, loss=102] 


Learning completed!

[WMF] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2018.96it/s]


TEST:
...
    | Precision@50 | Recall@50 | Train (s) | Test (s)
--- + ------------ + --------- + --------- + --------
BPR |       0.1861 |    0.5144 |    1.9351 |   0.6744
WMF |       0.1647 |    0.4895 |   85.5716 |   0.4705






Comparing Precision and Recall, both BPR and WMF are providing comparable results.

Let's move on to try to interpret these results by using the genres of movies that were recommended to us.

Generally, we could assume that if an individual likes a particular film genre like 'Romance', the recommender system should provide more of such 'Romance' films.

### 2.3 Interpreting Results

##### Creating a Movie Genre Dataframe

In [7]:
# Creating a dataframe of movies with its corresponding genres

# Download some information of MovieLens 100K dataset
item_df = pd.read_csv(
  cache("http://files.grouplens.org/datasets/movielens/ml-100k/u.item"), 
  sep="|", encoding="ISO-8859-1",
  names=["ItemID", "Title", "Release Date", "Video Release Date", "IMDb URL", 
         "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", 
         "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", 
         "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
).set_index("ItemID").drop(columns=["Video Release Date", "IMDb URL", "unknown"])

item_idx2id = list(train_set.item_ids) # create a item index to film ID mapping

# Let's take a look at an example of this dataframe
display(item_df.head(3))
     

Unnamed: 0_level_0,Title,Release Date,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
ItemID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
1,Toy Story (1995),01-Jan-1995,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,GoldenEye (1995),01-Jan-1995,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,Four Rooms (1995),01-Jan-1995,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


The `item_df` dataframe consists of all movie items with its corresponding genre attributes.

Further down below, we are going to filter this table with the recommendations that we get from the recommender system models we created to get a better sense of its performance.

##### Creating Training Data Dataframe

To get a sense of what data has been inserted into our model for training, let's count the genres of the training data used to train the model.

But first, let's create a `training_data_df` dataframe with all training data.

The training data consists of 80000 triplets of **User Index**, **Item Index** and **Rating** rows as seen in the dataset summary in Section 2.1.

In [8]:
# Create a training data dataframe
training_data_df = pd.DataFrame(zip(*train_set.uir_tuple)) # adding all training data into dataframe
training_data_df.columns = ['user_idx', 'item_idx', 'rating'] # adding column names to the data
training_data_df['item_id'] = training_data_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # adding item ids to the data by converting item index to item ids

# Let's view a sample of the training data dataframe
display(training_data_df.head(3))

Unnamed: 0,user_idx,item_idx,rating,item_id
0,0,0,4.0,381
1,1,1,3.0,602
2,2,2,4.0,431


##### Filtering Training Data

Let's filter based on a particular user to learn more about the user.

We set ``UIDX`` to user index **3**, and ``TOPK`` to **50**, to get the top 50 recommendations in each model for comparison.

In [9]:
UIDX = 3
TOPK = 50

In [10]:
# Filter training data (rating = 5.0 and user index = UIDX)
filter_df = training_data_df[(training_data_df['rating'] == 5.0) & (training_data_df['user_idx'] == UIDX)]
filter_df = item_df.loc[[int(item_id) for item_id in filter_df["item_id"]]] # get genres of movie items

# Group by Movie Genre and Sum by genres
filter_df = filter_df.select_dtypes(np.number).sum() 
filter_df = filter_df.to_frame("Sum") # Let's call that column 'Sum'

# Add a new column '%' for the percentage of individual genre sum compared to total sum
filter_df["%"] = filter_df["Sum"] / filter_df["Sum"].sum() * 100
filter_df["%"] = filter_df["%"].round(1)

# Let's see the training data genres, sums and percentages
print("Movies rated 5.0 by user index 3 in training data")
display(filter_df.sort_values("Sum", ascending=False))

Movies rated 5.0 by user index 3 in training data


Unnamed: 0,Sum,%
Drama,71,24.9
Comedy,39,13.7
Romance,32,11.2
Action,30,10.5
Thriller,29,10.2
Adventure,19,6.7
War,15,5.3
Crime,12,4.2
Sci-Fi,9,3.2
Mystery,8,2.8


As shown above in the training data, the top genres for user index 3 with movies rated 5.0 include 'Drama', 'Comedy', 'Romance', 'Action' and 'Thriller'.

Let's now compare them to the recommendations of the BPR and WMF models respectively.

##### Interpreting Recommendations of BPR, WMF Models

In [11]:
# Get the Top 5 Genres in filtered training data for user index 3
top_genres = filter_df.sort_values("Sum", ascending=False).head(5).index.tolist()
print("\nTop 5 Genres in training data:", top_genres)

# Get top K recommendations for BPR
bpr_recommendations, bpr_scores = bpr_model.rank(UIDX) # rank recommendations by score
bpr_topk = [item_idx2id[iidx] for iidx in bpr_recommendations[:TOPK]] # convert item indexes into item ids
bpr_df = item_df.loc[[int(iid) for iid in bpr_topk]] # filter the movie genre dataframe by item ids

# Let's view the top recommendations for BPR by top genres
display("BPR: Top recommendations", bpr_df[["Title"] + top_genres].head(10))

# Now, let's do likewise for WMF
wmf_recommendations, wmf_scores = wmf_model.rank(UIDX) # rank recommendations by score
wmf_topk = [item_idx2id[iidx] for iidx in wmf_recommendations[:TOPK]] # convert item indexes into item ids
wmf_df = item_df.loc[[int(iid) for iid in wmf_topk]] # filter the movie genre dataframe by item ids

# View the top recommendations for WMF
display("WMF: Top recommendations", wmf_df[["Title"] + top_genres].head(10))


Top 5 Genres in training data: ['Drama', 'Comedy', 'Romance', 'Action', 'Thriller']


'BPR: Top recommendations'

Unnamed: 0_level_0,Title,Drama,Comedy,Romance,Action,Thriller
ItemID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
402,Ghost (1990),0,1,1,0,1
313,Titanic (1997),1,0,1,1,0
215,Field of Dreams (1989),1,0,0,0,0
237,Jerry Maguire (1996),1,0,1,0,0
655,Stand by Me (1986),1,1,0,0,0
245,"Devil's Own, The (1997)",1,0,0,1,1
318,Schindler's List (1993),1,0,0,0,0
216,When Harry Met Sally... (1989),0,1,1,0,0
69,Forrest Gump (1994),0,1,1,0,0
328,Conspiracy Theory (1997),0,0,1,1,1


'WMF: Top recommendations'

Unnamed: 0_level_0,Title,Drama,Comedy,Romance,Action,Thriller
ItemID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
272,Good Will Hunting (1997),1,0,0,0,0
318,Schindler's List (1993),1,0,0,0,0
485,My Fair Lady (1964),0,0,1,0,0
313,Titanic (1997),1,0,1,1,0
357,One Flew Over the Cuckoo's Nest (1975),1,0,0,0,0
1119,Some Kind of Wonderful (1987),1,0,1,0,0
194,"Sting, The (1973)",0,1,0,0,0
651,Glory (1989),1,0,0,1,0
181,Return of the Jedi (1983),0,0,1,1,0
12,"Usual Suspects, The (1995)",0,0,0,0,1


Now that we have seen the top recommendations of the BPR and WMF models, let's do a comparison by taking a look at the genre distribution.

##### Comparing Models by Genre Distribution

In [12]:
# Let's introduce `combined_df` for comparison.
# This dataframe will be used to compare models by summing up genres from recommendations of different models
combined_df = pd.DataFrame({
    "Train Data %": filter_df["%"],
    "BPR Sum": bpr_df.select_dtypes(np.number).sum(), # group by genres, then get sum of each genre
    "WMF Sum": wmf_df.select_dtypes(np.number).sum() # likewise for WMF
})

# Get percentages of movie genre sums
combined_df['BPR %'] = combined_df['BPR Sum'] / combined_df['BPR Sum'].sum() * 100 
combined_df["WMF %"] = combined_df["WMF Sum"] / combined_df["WMF Sum"].sum() * 100

combined_df = combined_df.round(1) # round all 
combined_df = combined_df.sort_values("Train Data %", ascending=False)

# Let's take a look at the genre distribution by percentages
display("Train Data to Recommended % Distribution", combined_df[['Train Data %', 'BPR %', 'WMF %']])

'Train Data to Recommended % Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %
Drama,24.9,21.2,22.4
Comedy,13.7,14.4,12.1
Romance,11.2,18.6,12.1
Action,10.5,11.0,10.3
Thriller,10.2,8.5,11.2
Adventure,6.7,7.6,4.7
War,5.3,6.8,5.6
Crime,4.2,0.0,5.6
Sci-Fi,3.2,5.1,4.7
Mystery,2.8,1.7,3.7


Now that we have seen the distribution of individual models, we are curious about what kind of distribution we will get from ensembling these models.

Let's see what happens when we ensemble these two models. 

## 2. Simple model ensembling by Borda Count

We will ensemble the two models using the Borda Count method. The Borda Count method is a simple voting method that ranks the items based on the sum of their ranks from each model.

Assuming that we have a list of 5 items, the Borda Count method works as follows:

1. For each model, rank the items from 1 to 5 based on the predicted scores.
2. Sum the ranks of each item across all models.
3. Sort the items based on the sum of their ranks.
4. The top-ranked item is the final recommendation.
5. Repeat the process for the next user.

Given the below example for a random user 123:

| Rank | Model 1 | Model 2 | Model 3 | Allocated Points (N - rank) |
|------|---------|---------|---------|-----------------------------|
| 1    | A       | D       | E       | 5 - 1 = 4                   |
| 2    | B       | C       | A       | 5 - 2 = 3                   |
| 3    | C       | A       | B       | 5 - 3 = 2                   |
| 4    | D       | B       | D       | 5 - 4 = 1                   |
| 5    | E       | E       | C       | 5 - 5 = 0                   |

The final ranking is as follows after summing them all up

| Item | Total Points     |
|------|------------------|
| A    | 4 + 2 + 3 = 9    |
| B    | 3 + 1 + 2 = 6    |
| D    | 1 + 4 + 1 = 6    |
| C    | 2 + 3 + 0 = 5    |
| E    | 0 + 0 + 4 = 4    |

New ranking: A > B, D > C > E


Lets implement this method below by using pandas `DataFrame` for data manipulation.

In [13]:
# Let's create a new dataframe to calculate ranking and borda count
rank_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

total_items = len(rank_df) # 1651 items

# Obtain inverse ranks of the items based on the BPR score
rank_df["BPR Score"] = bpr_scores
rank_df["BPR Rank"] = rank_df["BPR Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
rank_df["BPR Inverse Rank"] = total_items - rank_df["BPR Rank"] # Get inverse rank by calculating ('Total Item count' - 'Rank')

# Do likewise for WMF
rank_df["WMF Score"] = wmf_scores
rank_df["WMF Rank"] = rank_df["WMF Score"].rank(ascending=False).astype(int) # Get Rank where 1 = Top recommendation
rank_df["WMF Inverse Rank"] = total_items - rank_df["WMF Rank"] # Get inverse rank by calculating ('Total Item count' - 'Rank')

# Get Borda Count by summing up inverse ranks of BPR and WMF
rank_df["Borda Count"] = rank_df["BPR Inverse Rank"] + rank_df["WMF Inverse Rank"]

# Round decimal places for readability purposes
rank_df = rank_df.round(3)

# Now let's take a look at the table with Borda Count 
display(rank_df)

Unnamed: 0,ItemID,BPR Score,BPR Rank,BPR Inverse Rank,WMF Score,WMF Rank,WMF Inverse Rank,Borda Count
0,381,1.659,362,1289,2.863,370,1281,2570
1,602,-0.290,828,823,0.726,921,730,1553
2,431,0.746,569,1082,3.335,292,1359,2441
3,875,1.197,473,1178,1.825,584,1067,2245
4,182,2.207,239,1412,3.707,216,1435,2847
...,...,...,...,...,...,...,...,...
1646,1635,-2.666,1560,91,0.402,1068,583,674
1647,1650,-2.671,1562,89,0.410,1062,589,678
1648,1647,-2.731,1584,67,0.308,1129,522,589
1649,1663,-2.575,1525,126,0.040,1337,314,440


Now that we have Borda Count, let's rerank this list and to provide the ensembled model's recommendation.

In [14]:
# Introduce reranked dataframe for borda count
reranked_df = rank_df.sort_values("Borda Count", ascending=False)

# Let's take a look at the ensembled top 5 recommendations and their respective ranks
display("Re-ranked Top K Item recommendations", reranked_df.head(5))



'Re-ranked Top K Item recommendations'

Unnamed: 0,ItemID,BPR Score,BPR Rank,BPR Inverse Rank,WMF Score,WMF Rank,WMF Inverse Rank,Borda Count
152,313,5.029,2,1649,5.989,4,1647,3296
37,318,4.625,7,1644,6.084,2,1649,3293
279,402,5.682,1,1650,5.507,18,1633,3283
305,181,4.252,18,1633,5.766,9,1642,3275
382,655,4.673,5,1646,5.336,27,1624,3270


The top recommendation ItemID **181**, was ranked **7** on BPR and **12** on WMF.

By doing Borda Count, we are able to aggregate model recommendations.

Next, let's add the recommendations into the genre distribution dataframe to compare its results to the base models.

In [15]:
borda_count_topk = reranked_df["ItemID"].values[:TOPK] # Get top K (50) Item IDs

borda_df = item_df.loc[[int(i) for i in borda_count_topk]] # Filter genre data frame by the top item IDs

# Add Borda Count results into 'combined_df' dataframe for comparison
combined_df["Borda Count Sum"] = borda_df.select_dtypes(np.number).sum() # group by genre, and calculate sum of each genre
combined_df["Borda Count %"] = combined_df["Borda Count Sum"] / combined_df["Borda Count Sum"].sum() * 100 # Calculate percentage of sum to total
combined_df["Borda Count %"] = combined_df["Borda Count %"].round(1) # rounding for readability purposes

# Let's take a look at the genre distribution of train data, BPR, WMF and the newly added Borda Count
display("Borda Count Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %"]])

'Borda Count Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %
Drama,24.9,21.2,22.4,21.0
Comedy,13.7,14.4,12.1,10.9
Romance,11.2,18.6,12.1,15.1
Action,10.5,11.0,10.3,10.9
Thriller,10.2,8.5,11.2,6.7
Adventure,6.7,7.6,4.7,7.6
War,5.3,6.8,5.6,7.6
Crime,4.2,0.0,5.6,2.5
Sci-Fi,3.2,5.1,4.7,5.0
Mystery,2.8,1.7,3.7,3.4


In the next section, we will further add more models to the ensemble.

## 3. Adding more models to the Borda Count ensemble

We can easily add more models to the ensemble by training them and adding them. One approach is to train a model with different initializations using different random seeds. By adding multiple similar models of different random seeds (`seed=123`), some models could perform better for a set of users, while other models could perform better for another set of users.

By ensembling these models, we could potentially achieve better performance when combined.

Let's try adding a few more similar models with different random seed initializations.

In [16]:
# WMF models with different seeds
wmf_model_123 = WMF(name="WMF_123", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=123)
wmf_model_456 = WMF(name="WMF_456", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=456)
wmf_model_789 = WMF(name="WMF_789", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=789)
wmf_model_888 = WMF(name="WMF_888", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=888)
wmf_model_999 = WMF(name="WMF_999", k=50, max_iter=300, a=1.0, b=0.1, learning_rate=0.001, lambda_u=0.01, lambda_v=0.01, seed=999)

models = [wmf_model_123, wmf_model_456, wmf_model_789, wmf_model_888, wmf_model_999]

# Let's run an experiment to take a look at how different these models are, with just different random seeds!
experiment = Experiment(rs, models, metrics, user_based=True).run()


[WMF_123] Training started!


100%|██████████| 300/300 [00:24<00:00, 12.05it/s, loss=102] 


Learning completed!

[WMF_123] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2100.41it/s]



[WMF_456] Training started!


100%|██████████| 300/300 [00:24<00:00, 12.29it/s, loss=103] 


Learning completed!

[WMF_456] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2111.88it/s]



[WMF_789] Training started!


100%|██████████| 300/300 [00:25<00:00, 11.70it/s, loss=102] 


Learning completed!

[WMF_789] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2217.32it/s]



[WMF_888] Training started!


100%|██████████| 300/300 [00:25<00:00, 11.57it/s, loss=103] 


Learning completed!

[WMF_888] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 1633.41it/s]



[WMF_999] Training started!


100%|██████████| 300/300 [00:22<00:00, 13.56it/s, loss=102] 


Learning completed!

[WMF_999] Evaluation started!


Ranking: 100%|██████████| 940/940 [00:00<00:00, 2188.91it/s]


TEST:
...
        | Precision@50 | Recall@50 | Train (s) | Test (s)
------- + ------------ + --------- + --------- + --------
WMF_123 |       0.1666 |    0.4968 |   25.1076 |   0.4529
WMF_456 |       0.1651 |    0.4894 |   24.5397 |   0.4487
WMF_789 |       0.1668 |    0.4945 |   25.8583 |   0.4287
WMF_888 |       0.1668 |    0.4963 |   26.0542 |   0.5804
WMF_999 |       0.1656 |    0.4937 |   22.2591 |   0.4341






Based on the results, we can see that even within the same model, the results can vary. 

Let's try ensembling all these models together into 1 single model by Borda Count, and look at its recommendations.

In [17]:
# Let's create a different dataframe to calculate ranking and borda count
rank_2_df = pd.DataFrame({
    "ItemID": item_idx2id,
})

# Add a column named 'Enhanced Borda Count'
rank_2_df["Enhanced Borda Count"] = 0

# Calculate the inverse rank for each of the models and accumulate them into the 'Enhanced Borda Count' column
for model in models:
    name = model.name
    recommendations, scores = model.rank(UIDX)
    rank_2_df[name + "_rating"] = scores
    rank_2_df[name + "_rank"] = rank_2_df[name + "_rating"].rank(ascending=False).astype(int)
    rank_2_df[name + "_inverse_rank"] = total_items - rank_2_df[name + "_rank"]
    rank_2_df["Enhanced Borda Count"] = rank_2_df["Enhanced Borda Count"] + rank_2_df[name + "_inverse_rank"]

# Round results for readability
rank_2_df = rank_2_df.round(3)

# Let's sort and view the top recommendations!
print("Model score calculation:")
display(rank_2_df[["WMF_123_inverse_rank", "WMF_456_inverse_rank", "WMF_789_inverse_rank", "WMF_888_inverse_rank", "WMF_999_inverse_rank", "Enhanced Borda Count"]].sort_values("Enhanced Borda Count", ascending=False).head(10))

Model score calculation:


Unnamed: 0,WMF_123_inverse_rank,WMF_456_inverse_rank,WMF_789_inverse_rank,WMF_888_inverse_rank,WMF_999_inverse_rank,Enhanced Borda Count
37,1649,1641,1649,1650,1649,8238
670,1642,1636,1631,1638,1642,8189
197,1620,1635,1630,1623,1650,8158
156,1630,1611,1642,1619,1613,8115
527,1644,1623,1633,1568,1645,8113
305,1576,1645,1638,1649,1604,8112
246,1648,1643,1626,1639,1538,8094
386,1625,1620,1611,1626,1611,8093
122,1619,1603,1615,1644,1603,8084
322,1635,1640,1614,1606,1586,8081


In [18]:
# Now, let's add them to the combined dataframe for comparison with earlier models
enhanced_borda_count_topk = list(rank_2_df.sort_values("Enhanced Borda Count", ascending=False)["ItemID"].values[:TOPK])
enhanced_borda_df = item_df.loc[[int(i) for i in enhanced_borda_count_topk]]

combined_df["Enhanced Borda Count Sum"] = enhanced_borda_df.select_dtypes(np.number).sum()
combined_df["Enhanced Borda Count %"] = combined_df["Enhanced Borda Count Sum"] / combined_df["Enhanced Borda Count Sum"].sum() * 100
combined_df["Enhanced Borda Count %"] = combined_df["Enhanced Borda Count %"].round(1)

# Let's compare the recommendation distribution
display("Combined Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %", "Enhanced Borda Count %"]])

'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %,Enhanced Borda Count %
Drama,24.9,21.2,22.4,21.0,28.3
Comedy,13.7,14.4,12.1,10.9,7.5
Romance,11.2,18.6,12.1,15.1,10.4
Action,10.5,11.0,10.3,10.9,12.3
Thriller,10.2,8.5,11.2,6.7,9.4
Adventure,6.7,7.6,4.7,7.6,10.4
War,5.3,6.8,5.6,7.6,5.7
Crime,4.2,0.0,5.6,2.5,3.8
Sci-Fi,3.2,5.1,4.7,5.0,4.7
Mystery,2.8,1.7,3.7,3.4,2.8


Now that we have touched on simple borda count, let's see how we could use other methods and popular packages such as **scikit-learn** to do advanced model ensembling.

## 4. Advanced Model Ensembling

We could continue by thinking of this as a meta-learning problem. We could treat recommendations of each base model as features and train a meta-learner to predict the final recommendation.

**Meta-learning**, also called 'learning to learn', is a method to teach models to learn and adapt to new tasks.
This could be any ML model such as a Linear Regression, Random Forest, Gradient Boosting, or even a Neural Network.

In this example, we will use a simple Linear Regression model to predict the final recommendation.

We will teach a model to learn from the different outputs of the base models of WMF.

##### 4.1 Prepare Data

In [19]:
# First, lets create training and test data dataframes
training_df = pd.DataFrame(zip(*train_set.uir_tuple)) # Add 'User Index', 'Item Index', 'Rating' triples as records in dataframe
training_df.columns = ['user_idx', 'item_idx', 'ground_rating'] # Set column names

# Similar to train dataframe
test_df = pd.DataFrame(zip(*test_set.uir_tuple))
test_df.columns = ['user_idx', 'item_idx', 'ground_rating']
test_df['item_id'] = test_df.apply(lambda row: item_idx2id[int(row['item_idx'])], axis=1) # Add 'Item ID' column into dataframe by converting 'Item Index' to 'Item ID'

# Lets get all the scores for the models trained in Part 3.

# For each model, we add individual predicted ratings by individual models to training and test dataframes
for model in models:
    name = model.name
    training_df[name + "_rating"] = training_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1)
    test_df[name + "_rating"] = test_df.apply(lambda row: model.score(int(row['user_idx']), int(row['item_idx'])), axis=1)

# Let's pick out the 5 features - predicted ratings from the 5 models trained
X_train = training_df[['WMF_123_rating', 'WMF_456_rating', 'WMF_789_rating', 'WMF_888_rating', 'WMF_999_rating']]
y_train = training_df['ground_rating'] # use ground truth to train this linear regression model
X_test = test_df[['WMF_123_rating', 'WMF_456_rating', 'WMF_789_rating', 'WMF_888_rating', 'WMF_999_rating']] # test data, used to predict values for comparison

display("X_train", X_train.head(3)) # predicting ratings as features
display("y_train", y_train.head(3)) # ground truth ratings
display("X_test", X_test.head(3)) # test set

'X_train'

Unnamed: 0,WMF_123_rating,WMF_456_rating,WMF_789_rating,WMF_888_rating,WMF_999_rating
0,3.840917,4.167051,4.315118,4.39922,3.9744
1,2.303503,2.667568,2.825883,2.917715,3.102975
2,3.115823,3.776977,3.64992,3.028431,3.86547


'y_train'

0    4.0
1    3.0
2    4.0
Name: ground_rating, dtype: float64

'X_test'

Unnamed: 0,WMF_123_rating,WMF_456_rating,WMF_789_rating,WMF_888_rating,WMF_999_rating
0,1.193508,4.461263,1.642062,3.039706,2.17284
1,3.526569,3.566906,4.210879,3.870379,3.917837
2,1.772423,3.50966,2.643438,2.83223,1.559444


Now that we have already prepared the data for fitting into a scikit-learn model, let's first try to train a Linear Regression model

##### 5.2 Fitting Linear Regression Model

In [20]:
# Let's now fit into a Linear Regression model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train) # train the model

# Input: 5 base model predicted ratings. Output: final predicted rating based on linear regression
y_pred = regr.predict(X_test) # Get predictions based on trained model

test_df["Linear Regression Rating"] = y_pred # create a column in `test_df` for the predictions

# Get Top K ratings from predictions
test_df = test_df.sort_values("Linear Regression Rating", ascending=False) # sort by predicted ratings
top_item_ids = test_df[test_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # filter top K (50 as set in Section 2.3)

# Place them into the comparison distribution dataframe
linear_regression_df = item_df.loc[[int(i) for i in top_item_ids]] # Get genres of ratings
combined_df["Linear Regression Sum"] = linear_regression_df.select_dtypes(np.number).sum() # group by genre and sum them up
combined_df["Linear Regression %"] = combined_df["Linear Regression Sum"] / combined_df["Linear Regression Sum"].sum() * 100 # get percentages of (genre sum / whole sum)

combined_df["Linear Regression %"] = combined_df["Linear Regression %"].round(1) # round values for readability

# Now let's take a look at how the genre distribution is
display("Combined Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %", "Enhanced Borda Count %", "Linear Regression %"]])

'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %,Enhanced Borda Count %,Linear Regression %
Drama,24.9,21.2,22.4,21.0,28.3,18.2
Comedy,13.7,14.4,12.1,10.9,7.5,15.5
Romance,11.2,18.6,12.1,15.1,10.4,13.6
Action,10.5,11.0,10.3,10.9,12.3,11.8
Thriller,10.2,8.5,11.2,6.7,9.4,10.9
Adventure,6.7,7.6,4.7,7.6,10.4,5.5
War,5.3,6.8,5.6,7.6,5.7,2.7
Crime,4.2,0.0,5.6,2.5,3.8,1.8
Sci-Fi,3.2,5.1,4.7,5.0,4.7,2.7
Mystery,2.8,1.7,3.7,3.4,2.8,0.9


We have successfully fit into a Linear Regression model using the 5 WMF base models built on Cornac.

You could continue using different regression models on **scikit-learn** to further predict ratings.

In the section below, we will train a Random Forest model.

##### 5.3 Fitting Random Forest Model



In [21]:
# Let's now train a Random Forest model
randomforest_model = RandomForestRegressor(n_estimators=300, random_state=42) 
randomforest_model.fit(X_train, y_train) # Train the model

# Input: 5 base model predicted ratings. Output: final predicted rating based on random forest
y_pred = randomforest_model.predict(X_test)

test_df["Random Forest Rating"] = y_pred # create a column in `test_df` for the predictions

# Get Top K ratings from predictions
test_df = test_df.sort_values("Random Forest Rating", ascending=False) # sort by predicted ratings
top_item_ids = test_df[test_df['user_idx'] == UIDX]['item_id'].values[:TOPK] # filter top K (50 as set in Section 2.3)

# Place them into the comparison distribution dataframe
random_forest_df = item_df.loc[[int(i) for i in top_item_ids]] # Get genres of ratings
combined_df["Random Forest Sum"] = random_forest_df.select_dtypes(np.number).sum() # group by genre and sum them up
combined_df["Random Forest %"] = combined_df["Random Forest Sum"] / combined_df["Random Forest Sum"].sum() * 100 # get percentages of (genre sum / whole sum)

combined_df["Random Forest %"] = combined_df["Random Forest %"].round(1) # round values for readability

# Now let's take a look at how the genre distribution is
display("Combined Recommendations Distribution", combined_df[["Train Data %", "BPR %", "WMF %", "Borda Count %", "Enhanced Borda Count %", "Linear Regression %", "Random Forest %"]])

'Combined Recommendations Distribution'

Unnamed: 0,Train Data %,BPR %,WMF %,Borda Count %,Enhanced Borda Count %,Linear Regression %,Random Forest %
Drama,24.9,21.2,22.4,21.0,28.3,18.2,21.6
Comedy,13.7,14.4,12.1,10.9,7.5,15.5,14.7
Romance,11.2,18.6,12.1,15.1,10.4,13.6,15.7
Action,10.5,11.0,10.3,10.9,12.3,11.8,11.8
Thriller,10.2,8.5,11.2,6.7,9.4,10.9,12.7
Adventure,6.7,7.6,4.7,7.6,10.4,5.5,5.9
War,5.3,6.8,5.6,7.6,5.7,2.7,2.9
Crime,4.2,0.0,5.6,2.5,3.8,1.8,0.0
Sci-Fi,3.2,5.1,4.7,5.0,4.7,2.7,2.9
Mystery,2.8,1.7,3.7,3.4,2.8,0.9,2.0


What happened in Section 5 was that we trained a linear regression model and random forest model to learn from the different base models, and adapting to the different changes of results.

Ensemble learning doesn't stop here. You could continue on to try different Cornac base models and ensemble methods, which could potentially improve the overall performance of your experiments and model deployments. 