# Re-Ranking for Topic Diversification

Recommender systems (RecSys) are built to surface the items most relevant to a user: an ad the customer is most likely to click on, a product the customer is most likely to buy, and so on. To keep its suggestions relevant, a RecSys often exploits the known preferences of its users. This can create an echo chamber of sorts. For example, if you purchased a summer dress from a retail website, you might be shown other summer dresses every time you visit, because the RecSys has learned that summer dresses are relevant to you.

However, this leads to issues such as:
- a bad user experience, because users keep seeing products that may no longer be relevant to them
- lost revenue for the company, because other products in the catalog that the user might have bought are never shown

There are several ways to resolve these issues:
- Changing the model's features or the type of model so that its output already covers a diverse list of topics
- Re-ranking the items recommended by the RecSys so that the final list is diverse

For this exercise, we will look at the latter: the problem of re-ranking. We use a dataset of movie ratings ([MovieLens Dataset](https://grouplens.org/datasets/movielens/1m/ "MovieLens 1M dataset")), a stable benchmark dataset for recommender systems. It has about 1 million ratings from about 6,000 users on about 4,000 movies, so only about 4.2% of all possible user-movie pairs have a rating.
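The coverage figure is just the number of ratings divided by the number of possible user-movie pairs. A minimal sketch of that arithmetic, using the rounded counts quoted above rather than exact counts from the files (the variable names are illustrative):

```python
# Rounded counts as quoted in the text; the exact counts in the files differ slightly.
n_ratings = 1_000_000
n_users = 6_000
n_movies = 4_000

# Density: the share of all possible (user, movie) pairs that have a rating.
density = n_ratings / (n_users * n_movies)
sparsity = 1 - density

print(f"density:  {density:.2%}")   # density:  4.17%
print(f"sparsity: {sparsity:.2%}")  # sparsity: 95.83%
```

In other words, roughly 96% of the user-movie matrix is empty, which is typical for rating data.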

Our **baseline** RecSys will recommend 10 top-rated movies from this dataset. Our **improved** RecSys will try to bring diversity to this recommended list.
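To make the two variants concrete, here is a minimal sketch on hypothetical toy data (not MovieLens). It assumes the baseline ranks movies by mean rating and that the improved list is built greedily, subtracting a fixed penalty for each genre already present in the list. The function names, the `penalty` value, and the scoring rule are illustrative assumptions, not necessarily the method used later in this notebook.

```python
from collections import defaultdict

# Hypothetical toy data: (movie_id, rating) pairs and pipe-separated genres,
# mimicking the "Comedy|Romance" genre format of the movies table.
toy_ratings = [(1, 5), (1, 5), (2, 5), (2, 4), (3, 4), (3, 4),
               (4, 4), (4, 3), (5, 3), (5, 3)]
toy_genres = {1: "Comedy", 2: "Comedy", 3: "Comedy|Romance",
              4: "Drama", 5: "Thriller"}


def top_rated(ratings, n):
    """Baseline: the n movies with the highest mean rating, as (movie_id, mean)."""
    totals = defaultdict(lambda: [0, 0])  # movie_id -> [sum of ratings, count]
    for movie_id, rating in ratings:
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1
    means = [(movie_id, s / c) for movie_id, (s, c) in totals.items()]
    # Sort by mean rating (descending), then by movie_id for a stable order.
    means.sort(key=lambda item: (-item[1], item[0]))
    return means[:n]


def diversify(candidates, genres, n, penalty=2.0):
    """Greedy re-rank: penalise a candidate for each genre already in the list."""
    remaining = list(candidates)
    picked, seen_genres = [], set()
    while remaining and len(picked) < n:
        def adjusted(item):
            movie_id, score = item
            overlap = len(set(genres[movie_id].split("|")) & seen_genres)
            return score - penalty * overlap
        best = max(remaining, key=adjusted)
        remaining.remove(best)
        picked.append(best)
        seen_genres.update(genres[best[0]].split("|"))
    return picked


baseline = [m for m, _ in top_rated(toy_ratings, n=3)]
# Re-rank a larger candidate pool (here all 5 movies) down to 3 items.
reranked = [m for m, _ in diversify(top_rated(toy_ratings, n=5), toy_genres, n=3)]

print(baseline)  # [1, 2, 3]
print(reranked)  # [1, 4, 5]
```

On this toy data the baseline returns three comedies, while the re-ranked list keeps the top comedy and then pulls in a drama and a thriller. The penalty controls the trade-off: at 0 the re-ranked list equals the baseline, and larger values favour unseen genres more strongly.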

Reference: https://nbviewer.jupyter.org/github/david-cortes/datascienceprojects/blob/master/machine_learning/topic_diversification.ipynb

In [1]:
import pandas as pd

In [2]:
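# The files use '::' as the field separator. A multi-character separator is
# treated as a regular expression, which requires the slower Python engine.
movies = pd.read_csv('data/ml-1m/movies.dat', sep='::', engine='python')
ratings = pd.read_csv('data/ml-1m/ratings.dat', sep='::', engine='python')
users = pd.read_csv('data/ml-1m/users.dat', sep='::', engine='python')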

In [3]:
users

Unnamed: 0,UserID,Gender,Age,Occupation,Zip-code
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,02460
4,5,M,25,20,55455
...,...,...,...,...,...
6035,6036,F,25,15,32603
6036,6037,F,45,1,76006
6037,6038,F,56,1,14706
6038,6039,F,45,0,01060


In [4]:
movies

Unnamed: 0,MovieID,Title,Genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
3878,3948,Meet the Parents (2000),Comedy
3879,3949,Requiem for a Dream (2000),Drama
3880,3950,Tigerland (2000),Drama
3881,3951,Two Family House (2000),Drama


In [5]:
ratings

Unnamed: 0,UserID,MovieID,Rating,Timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
...,...,...,...,...
1000204,6040,1091,1,956716541
1000205,6040,1094,5,956704887
1000206,6040,562,5,956704746
1000207,6040,1096,4,956715648
