# Mitigating bias in recommender systems

This is an introduction to fairness in recommender systems. A recommender system aims to recommend the items that best match each user's preferences.

A recommender system can be biased in multiple ways. For example, we may be concerned that the items in our database will not get equal representation (item fairness). Alternatively, our main concern may be that different groups of users (e.g. male/female users) will get different item recommendations (user fairness). In the following, we will show how to mitigate item unfairness.

### 0 - Importing modules and loading the data

We will start by importing the example [dataset](https://www.openml.org/search?type=data&sort=runs&id=45050&status=active), which we host on OpenML. This dataset contains user-item interactions: the users are Amazon users and the items are electronics products from the Amazon website. An interaction happens when a user rates a given item on a scale of 1 to 5. We sample 10K rows from this dataset, because recommender models are notoriously slow to train!

In [None]:
!pip install holisticai

In [1]:
# base imports
import numpy as np
import pandas as pd
from collections import defaultdict

In [1]:
# fetch data from openml
from sklearn.datasets import fetch_openml
bunch = fetch_openml(data_id='45050')
df = bunch['frame']
df

,User,Item,Rating
0,AKM1MP6P0OYPR,0132793040,5.0
1,A2CX7LUOHB2NDG,0321732944,5.0
2,A2NWSAGRHCP8N5,0439886341,1.0
3,A2WNBOD3WNDNKT,0439886341,3.0
4,A1GI0U4ZRJA8WN,0439886341,1.0
...,...,...,...
9995,A1YY6103EIE3H4,B00000J1F3,5.0
9996,A1NDLPA3KGGPSM,B00000J1F3,5.0
9997,A3QP5ASI0AJJU1,B00000J1F3,4.0
9998,A2AKNCSNUEQKSZ,B00000J1F3,5.0


In [3]:
# HELPER TOOLS -- NOTHING TO DO
def explode(arr, num_items):
    out = np.zeros(num_items)
    out[arr] = 1
    return out

def recommended_items(model_pred, data_matrix, k):
    recommended_items_mask = data_matrix>0
    candidate_index = ~recommended_items_mask
    candidate_rating = model_pred*candidate_index
    return np.argsort(-candidate_rating,axis=1)[:,:k]

def recommender_rmse(mat_pred, mat_true):
    mask = mat_true>0
    rmse = np.sqrt(np.sum(np.power(mat_pred-mat_true,2)*mask)/np.sum(mask))
    return rmse

def recommender_mae(mat_pred, mat_true):
    mask = mat_true>0
    mae = np.sum(np.abs(mat_pred-mat_true)*mask)/np.sum(mask)
    return mae

def get_top_n(predictions, n=10):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
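
To see what the two error helpers measure, here is a small sketch on a toy 2x2 matrix (the toy values are made up for illustration). Both helpers only score cells where a true rating exists, so predictions for unrated cells do not affect the result.

```python
import numpy as np

# Same definitions as the helper cell above, repeated so the sketch is self-contained.
def recommender_rmse(mat_pred, mat_true):
    mask = mat_true > 0
    return np.sqrt(np.sum(np.power(mat_pred - mat_true, 2) * mask) / np.sum(mask))

def recommender_mae(mat_pred, mat_true):
    mask = mat_true > 0
    return np.sum(np.abs(mat_pred - mat_true) * mask) / np.sum(mask)

# Toy data (illustrative only): 0 marks a missing rating.
mat_true = np.array([[5.0, 0.0],
                     [0.0, 3.0]])
mat_pred = np.array([[4.0, 2.0],
                     [1.0, 5.0]])

# Only the two observed cells count: errors are (4 - 5) = -1 and (5 - 3) = 2.
toy_mae = recommender_mae(mat_pred, mat_true)    # (1 + 2) / 2 = 1.5
toy_rmse = recommender_rmse(mat_pred, mat_true)  # sqrt((1 + 4) / 2)
print(toy_mae, toy_rmse)
```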

### 1 - Pre-processing the data
When we have interaction data in a column format, it is useful to 'pivot' it into an interaction matrix. The rows of this matrix represent the users, the columns represent the items, and each interaction results in a non-NaN entry containing the rating. These matrices are usually very sparse, so the vast majority of entries are NaNs. For simplicity, we replace the NaN values with 0 in the data matrix.

In [5]:
df_pivot = df.pivot_table(index='User', columns='Item', values='Rating', aggfunc='mean')
user_dict = dict(zip(df_pivot.index, range(len(df_pivot.index))))
item_dict = dict(zip(df_pivot.columns, range(len(df_pivot.columns))))
data_matrix = np.nan_to_num(df_pivot.to_numpy(), nan=0)
data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
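
Because the real matrix is so sparse, the printed output above shows only zeros. The sketch below applies the same pivot to a toy frame of four made-up interactions, so the non-zero entries are visible.

```python
import numpy as np
import pandas as pd

# Toy interactions (illustrative only), in the same column format as df.
toy = pd.DataFrame({
    "User": ["u1", "u1", "u2", "u3"],
    "Item": ["a", "b", "b", "c"],
    "Rating": [5.0, 3.0, 4.0, 1.0],
})

# Same pivot as in the notebook: users as rows, items as columns.
toy_pivot = toy.pivot_table(index="User", columns="Item", values="Rating", aggfunc="mean")
toy_matrix = np.nan_to_num(toy_pivot.to_numpy(), nan=0)

# Fraction of cells without a rating.
toy_sparsity = 1 - np.count_nonzero(toy_matrix) / toy_matrix.size
print(toy_matrix)
print(toy_sparsity)
```

Here 4 of the 9 cells hold a rating; in the real data the observed fraction is far smaller.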

### 2 - Train a Baseline model (Do Not Rerun this section)
We use an out-of-the-box NMF model from the `surprise` library. NMF (non-negative matrix factorization) is a common approach to collaborative filtering. Documentation can be found here: https://surprise.readthedocs.io/en/stable/matrix_factorization.html.
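
To give an intuition for what NMF does, here is a minimal NumPy sketch on a toy rating matrix. It is not the `surprise` implementation: it uses masked multiplicative updates, a standard technique for NMF with missing entries, and the toy ratings, `k`, and iteration count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix (illustrative only): 0 marks a missing rating.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
M = (R > 0).astype(float)  # mask of observed ratings
k = 2                      # number of latent factors (the notebook uses 40)
eps = 1e-9                 # guards against division by zero

# Non-negative random initialisation of user factors P and item factors Q.
P = rng.random((R.shape[0], k)) + 0.1
Q = rng.random((R.shape[1], k)) + 0.1

def masked_rmse(P, Q):
    # Error on observed ratings only.
    err = M * (R - P @ Q.T)
    return np.sqrt((err ** 2).sum() / M.sum())

rmse_before = masked_rmse(P, Q)

for _ in range(200):
    # Multiplicative updates: each factor is scaled by a non-negative ratio,
    # so P and Q never become negative.
    P *= ((M * R) @ Q) / ((M * (P @ Q.T)) @ Q + eps)
    Q *= ((M * R).T @ P) / ((M * (P @ Q.T)).T @ P + eps)

rmse_after = masked_rmse(P, Q)
R_hat = P @ Q.T  # dense predictions, including the unobserved cells
print(rmse_before, rmse_after)
```

The predicted rating for user `u` and item `i` is the dot product of row `u` of `P` and row `i` of `Q`, which is how the model scores items the user has never rated.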

Import and train a model

In [117]:
# imports and load data into surprise
from surprise import Reader, Dataset, NMF, accuracy
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[["User", "Item", "Rating"]], reader=reader)

In [118]:
# train and testsets
trainset = data.build_full_trainset()
testset = trainset.build_anti_testset()

In [119]:
# define model and train it
mf = NMF(n_factors = 40, biased=False)
mf.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.NMF at 0x7fbc5988d040>

In [120]:
# predict the unknown values
predictions = mf.test(testset)

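Computing the accuracy metrics! Note that `build_anti_testset` fills each unobserved rating with the global mean rating by default, so the MAE and RMSE below measure the distance between the predictions and that mean, not held-out true ratings.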

In [121]:
accuracy.mae(predictions)

MAE:  1.0346


1.0346441237569046

In [122]:
accuracy.rmse(predictions)

RMSE: 1.3645


1.3644577073657274

We get the following results:

| Metric | Value | Reference |
| --- | --- | --- |
| RMSE | 1.36| 0 |
| MAE | 1.04 | 0 |




Once we have the scores, we need to predict the items our system will recommend. We choose a top K approach (with K=50). For each user, we recommend the top K highest scoring items (that are not in the training data).

In [123]:
top_n = get_top_n(predictions, n=50)

In [124]:
# predictions matrix (top 50 for each user)
mat = np.zeros(data_matrix.shape)

for key, el in top_n.items():
    key_index = user_dict[key]
    item_indices = [item_dict[code[0]] for code in el]
    mat[key_index,:] = explode(item_indices, data_matrix.shape[1])
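
The two cells above can be traced on a toy example. The sketch below uses plain tuples in the `(uid, iid, true_r, est, details)` layout that `get_top_n` unpacks; the users, items, and scores are made up for illustration.

```python
from collections import defaultdict
import numpy as np

# Same helpers as in the helper cell, repeated so the sketch is self-contained.
def explode(arr, num_items):
    out = np.zeros(num_items)
    out[arr] = 1
    return out

def get_top_n(predictions, n=10):
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n

# Toy predictions (illustrative only).
toy_predictions = [
    ("u1", "a", 3.0, 4.2, None),
    ("u1", "b", 3.0, 2.1, None),
    ("u1", "c", 3.0, 4.8, None),
    ("u2", "a", 3.0, 1.0, None),
    ("u2", "c", 3.0, 3.3, None),
]
toy_user_dict = {"u1": 0, "u2": 1}
toy_item_dict = {"a": 0, "b": 1, "c": 2}

# Keep the 2 highest-scoring items per user.
toy_top = get_top_n(toy_predictions, n=2)

# Turn the top-n lists into a binary user x item recommendation matrix.
toy_mat = np.zeros((2, 3))
for uid, recs in toy_top.items():
    idx = [toy_item_dict[iid] for iid, _ in recs]
    toy_mat[toy_user_dict[uid], :] = explode(idx, 3)
print(toy_mat)
```

In this toy case item `b` is never recommended to anyone, which is the kind of unequal item exposure the bias metrics are meant to capture.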

Computing the bias metrics!

In [125]:
# compute bias metrics
from holisticai.bias.metrics import recommender_bias_metrics
recommender_bias_metrics(mat_pred=mat, metric_type='item_based')



Metric,Value,Reference
Aggregate Diversity,0.499058,1
GINI index,0.906853,0
Exposure Distribution Entropy,6.762744,-
Average Recommendation Popularity,1236.795483,-


We get the following results:

| Metric | Value | Reference |
| --- | --- | --- |
| Aggregate Diversity | 0.5 | 1 |
| Gini Index   | 0.91 | 0 |
| Avg Recommendation Pop   | 1237 | - |
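
To make the first two metrics concrete, here is a hedged sketch that computes them by hand on a toy recommendation matrix. It assumes the common textbook definitions (aggregate diversity as the share of catalogue items recommended at least once, and the Gini index of per-item exposure counts); the function names are hypothetical, and `holisticai` may normalise these quantities differently.

```python
import numpy as np

def aggregate_diversity(mat):
    # Share of items that are recommended to at least one user (1 is ideal).
    exposure = np.nansum(mat, axis=0)
    return np.mean(exposure > 0)

def gini_index(counts):
    # Gini index of the exposure counts: 0 means every item is shown equally often.
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return 2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n

# Toy binary recommendation matrix (illustrative only): 3 users x 4 items.
toy_mat = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

toy_exposure = toy_mat.sum(axis=0)        # how often each item is recommended
toy_agg_div = aggregate_diversity(toy_mat)
toy_gini = gini_index(toy_exposure)
print(toy_exposure, toy_agg_div, toy_gini)
```

Under these definitions, a baseline with aggregate diversity near 0.5 and a Gini index near 0.91 recommends only about half of the catalogue and concentrates exposure on a small set of items.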




### 3 - Mitigating bias (In-processing)
We will now show how we can mitigate bias using the holisticai library. More specifically, we will focus on item fairness and use Blind Spot Aware Matrix Factorization.

Reference: Sun, Wenlong, et al. "Debiasing the human-recommender system feedback loop in collaborative filtering." Companion Proceedings of The 2019 World Wide Web Conference. 2019.

In [6]:
# Imports
from holisticai.bias.mitigation import BlindSpotAwareMF

<font color='red'> **Task 1**
- **Train a BlindSpotAwareMF model with parameters K=40, beta=30, steps=150, alpha=0.001, lamda=3, verbose=1**
</font>

In [7]:
# TODO


# predictions
mat_pred = mf.pred

100%|██████████| 150/150 [00:34<00:00,  4.29it/s]


In [9]:
# Efficacy metric
# TODO
recommender_mae(mf.pred, data_matrix)

1.067001936688743

In [10]:
recommender_rmse(mf.pred, data_matrix)

1.1768928675991843

<font color='red'> **Task 2**
- **Evaluate your model's efficacy**
</font>

Use the provided `recommender_mae` and `recommender_rmse` functions.

Recommend the 50 highest-scoring items for each user according to our model (excluding items already in the training set), and format them into an interaction matrix.

In [8]:
new_items = recommended_items(mat_pred, data_matrix, 50)
new_recs = [explode(new_items[u], len(df_pivot.columns)) for u in range(df_pivot.shape[0])]
new_df_pivot_db = pd.DataFrame(new_recs, columns = df_pivot.columns)
mat = new_df_pivot_db.replace(0,np.nan).to_numpy()
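
To see what the cell above does, the sketch below runs the same two helpers on a toy 2x4 case with `k=2`. The prediction scores and ratings are made up for illustration.

```python
import numpy as np

# Same helpers as in the helper cell, repeated so the sketch is self-contained.
def explode(arr, num_items):
    out = np.zeros(num_items)
    out[arr] = 1
    return out

def recommended_items(model_pred, data_matrix, k):
    recommended_items_mask = data_matrix > 0
    candidate_index = ~recommended_items_mask
    candidate_rating = model_pred * candidate_index
    return np.argsort(-candidate_rating, axis=1)[:, :k]

# Toy data (illustrative only): user 0 already rated item 0, user 1 rated item 1.
toy_data = np.array([[5.0, 0.0, 0.0, 0.0],
                     [0.0, 4.0, 0.0, 0.0]])
toy_pred = np.array([[4.0, 3.5, 1.0, 2.0],
                     [0.5, 4.5, 3.0, 2.5]])

# Already-rated items have their score set to 0 before ranking.
toy_items = recommended_items(toy_pred, toy_data, 2)

# One binary row per user: 1 marks a recommended item.
toy_recs = np.array([explode(toy_items[u], toy_data.shape[1]) for u in range(toy_data.shape[0])])
print(toy_items)
print(toy_recs)
```

Note that already-rated items are excluded by zeroing their score, so this helper relies on the model's predictions for unseen items being positive; a candidate with a negative score would rank below an already-rated item.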

<font color='red'> **Task 3**
- **Evaluate your model's bias. Remember to do the analysis on the recommended items (mat).**
</font>

In [None]:
# Bias metrics
from holisticai.bias.metrics import recommender_bias_metrics
# TODO



# TODO

You should get the following results:

| Metric | Value | Reference |
| --- | --- | --- |
| Aggregate Diversity | 0.5 | 1 |
| Gini Index   | 0.82 | 0 |
| Avg Recommendation Pop   | 500| - |




<font color='red'> **Questions**
- **Has efficacy increased or decreased when training with bias mitigation?**
- **Has bias increased or decreased when training with bias mitigation?**
- Note: Average Recommendation Popularity is the average over users of the average over items of the number of times that item appears in the interaction matrix.
</font>
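
The definition of Average Recommendation Popularity in the note can be worked through on a toy example. This sketch assumes that "interaction matrix" refers to the recommendation matrix itself, since that is the only matrix passed to `recommender_bias_metrics` in this notebook, and that each user's inner average runs over the items recommended to that user. The `holisticai` implementation may differ in detail.

```python
import numpy as np

# Toy binary recommendation matrix (illustrative only): 3 users x 4 items.
toy_mat = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

# Popularity of each item: how many times it appears in the matrix.
item_pop = toy_mat.sum(axis=0)  # [3, 2, 1, 0]

# For each user, average the popularity of the items recommended to them.
per_user_pop = (toy_mat * item_pop).sum(axis=1) / toy_mat.sum(axis=1)  # [2.5, 2.0, 2.5]

# Average over users.
toy_arp = per_user_pop.mean()
print(per_user_pop, toy_arp)
```

A lower value means users are, on average, shown less popular items, which is why a drop in this metric after mitigation points to more even item exposure.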