# Provider Fairness Methods

```python
import pandas as pd
```

## Approach 1: re-ranking results based on item exposure

Given a ranked list of recommended POIs $P_i = [p_1, p_2, ..., p_n]$ for a user $i$, and the corresponding monotonically decreasing relevance scores $R_i = [r_1, r_2, ..., r_n]$ used to produce that ranking, the idea is to re-scale these scores based on how much exposure each POI receives across all generated recommendation lists.

Let $e_j$ be the number of times POI $j$ appears in the recommendation lists of all tested users. Over all POIs that appear in at least one list, we compute the mean $\mu_e$ and standard deviation $\sigma_e$ of these counts and use them to standardize the exposure. The relevance score of a POI is then re-scaled as follows:

$$r'_j = r_j \cdot \exp\left( -1 \cdot \frac{e_j - \mu_e}{\sigma_e} \cdot \frac{1}{\alpha} \right)$$

A POI with average exposure (a standardized value of 0) keeps its relevance score, since $e^0 = 1$. A POI with above-average exposure (a positive standardized value) has its score reduced, since $x < 0 \Rightarrow 0 < e^x < 1$, and a POI with below-average exposure has its score increased. The standardized exposure is also divided by a damping factor $\alpha$ (10 in our case) so that the re-scaling does not overpower the original relevance scores.
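The re-scaling above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used to produce the results below: the function name `rerank_by_exposure`, the dictionary-based input format, and the use of the population standard deviation for $\sigma_e$ are all assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def rerank_by_exposure(rec_lists, scores, alpha=10.0):
    """Re-scale and re-sort each user's recommendations by POI exposure.

    rec_lists: dict mapping user -> ranked list of POI ids
    scores:    dict mapping user -> relevance scores aligned with rec_lists
    alpha:     damping factor applied to the standardized exposure
    """
    # e_j: number of times POI j appears across all users' lists
    exposure = Counter(p for pois in rec_lists.values() for p in pois)
    counts = list(exposure.values())
    mu_e = mean(counts)
    # Assumption: population standard deviation; fall back to 1.0 if all
    # POIs have identical exposure, to avoid dividing by zero.
    sigma_e = pstdev(counts) or 1.0

    reranked = {}
    for user, pois in rec_lists.items():
        # r'_j = r_j * exp(-(e_j - mu_e) / sigma_e / alpha)
        rescaled = [
            (p, r * math.exp(-(exposure[p] - mu_e) / (sigma_e * alpha)))
            for p, r in zip(pois, scores[user])
        ]
        # Re-rank by the re-scaled score, highest first
        rescaled.sort(key=lambda pair: pair[1], reverse=True)
        reranked[user] = rescaled
    return reranked


# Toy example: POI "a" is in every list, while "d" and "e" appear only once.
rec_lists = {"u1": ["a", "b", "c"], "u2": ["a", "b", "d"], "u3": ["a", "e", "c"]}
scores = {u: [1.0, 0.9, 0.8] for u in rec_lists}
result = rerank_by_exposure(rec_lists, scores)
for user, ranking in result.items():
    print(user, [p for p, _ in ranking])
```

Even with $\alpha = 10$, the small score gaps in this toy example are enough for the rarely exposed POIs "d" and "e" to move above "a", which appears in every list.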

## Approach 2: using a power-law model of the POI popularity as a context fusion factor

Because the CAPRI framework allows modular integration of different contextual factors, we can, in principle, include a POI's popularity as a "context". POI popularity measured by check-in counts can be modeled as a power-law (Pareto) distribution: the majority of POIs have very few check-ins, while a small number of POIs have a large number of check-ins.

We can design a model that takes a POI's number of check-ins in the training set and outputs a score that decreases as its popularity increases. When the scores are fused, this compensates for a POI's lack of popularity and gives it a chance to appear in the recommendation lists. Given a list of POIs $P$ and their check-in counts $C$, the scores are computed element-wise as:

$$I = \alpha \cdot C^\beta$$

Here, the parameters $\alpha$ and $\beta$ (unrelated to the $\alpha$ in Approach 1) are learned from the training set with a log-linear regression, i.e. by fitting $\log I = \log \alpha + \beta \log C$. In our case, we use ridge regression with an L2 regularization factor of 10. The regularization keeps the scores from growing too large, especially for POIs with single-digit check-in counts, where the compensating score is largest.
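A minimal sketch of the log-log ridge fit follows. The text does not state what the regression target is, so this sketch assumes that, for each distinct check-in count $c$, the target is the number of POIs with exactly $c$ check-ins; with a negative fitted $\beta$, rarely visited POIs then receive the largest scores. The names `fit_power_law_ridge` and `popularity_score` are hypothetical, and the check-in counts are made-up data for illustration. The closed-form solution, which penalizes only the slope and not the intercept, should agree with `sklearn.linear_model.Ridge(alpha=10.0)` fitted on a single `log C` feature.

```python
import math
from collections import Counter


def fit_power_law_ridge(checkins, l2=10.0):
    """Fit I = alpha * C**beta by ridge regression in log-log space.

    Assumption: for each distinct check-in count c, the regression target
    is the number of POIs that have exactly c check-ins.
    """
    # Empirical frequency of each positive check-in count (log needs c > 0)
    freq = Counter(c for c in checkins if c > 0)
    xs = [math.log(c) for c in freq]
    ys = [math.log(freq[c]) for c in freq]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Closed-form ridge for one feature: center the data and penalize only
    # the slope, so beta = Sxy / (Sxx + l2); the intercept is unpenalized.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / (sxx + l2)
    log_alpha = y_bar - beta * x_bar
    return math.exp(log_alpha), beta


def popularity_score(c, alpha, beta):
    """Context score I = alpha * C**beta for a POI with c check-ins."""
    return alpha * c ** beta


# Made-up, heavy-tailed check-in counts for illustration only
checkins = [1] * 50 + [2] * 20 + [3] * 10 + [5] * 4 + [10] * 2 + [100]
alpha, beta = fit_power_law_ridge(checkins)
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
for c in (1, 10, 100):
    print(c, round(popularity_score(c, alpha, beta), 3))
```

The L2 penalty shrinks $|\beta|$ toward zero, which flattens the curve and limits how large the score can become for POIs with very few check-ins.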

## Results

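```python
def get_results(modelName, reranker):
    # Each evaluation CSV holds a single row of metrics for one model/reranker pair
    output = pd.read_csv(f'../Outputs/Eval_{modelName}_{reranker}_Yelp_Sum_7135user_top10_limit15.csv')
    # Label the row so the combined table shows which run it came from
    output.index = [f'{modelName}_{reranker}']
    return output

# Baseline USG, USG re-ranked by item exposure (Approach 1),
# and USG with the popularity context factor (Approach 2)
_dfs = [
    get_results(x, y)
    for (x, y) in [
        ('USG', 'TopK'),
        ('USG', 'ItemExposure'),
        ('USGI', 'TopK')
    ]
]

# Stack the one-row results into a single comparison table
df = pd.concat(_dfs, axis=0)
df
```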

| Model | precision | recall | ndcg | map | gce_users | gce_items | hit_ratio |
|---|---|---|---|---|---|---|---|
| USG_TopK | 0.02968 | 0.04478 | 0.03981 | 0.01991 | -0.00064 | -235.5631 | 0.23616 |
| USG_ItemExposure | 0.02615 | 0.03885 | 0.03249 | 0.01531 | -0.05564 | -143.41169 | 0.24275 |
| USGI_TopK | 0.02914 | 0.04354 | 0.03945 | 0.01984 | -0.001 | -5.93957 | 0.23196 |


```
USGI_TopK
[[ User precision ratios ]]
                 rg_u  rg_fair
repeat_user
False        0.477633      0.5
True         0.522367      0.5
[[ Item coverage ratios ]]
                rg_i   rg_fair
short_head
False       0.049485  0.796982
True        0.950515  0.203018


USG_TopK
[[ User precision ratios ]]
                 rg_u  rg_fair
repeat_user
False        0.482059      0.5
True         0.517941      0.5
[[ Item coverage ratios ]]
                rg_i   rg_fair
short_head
False       0.001345  0.796982
True        0.998655  0.203018


USG_ItemExposure
[[ User precision ratios ]]
                 rg_u  rg_fair
repeat_user
False        0.341778      0.5
True         0.658222      0.5
[[ Item coverage ratios ]]
                rg_i   rg_fair
short_head
False       0.002207  0.796982
True        0.997793  0.203018
```