# Ranking metrics

1. [Online metrics](#online-metrics)
   1. [Hit Ratio](#hit-ratio)
2. [Offline metrics](#offline-metrics)
   1. [Reciprocal Rank and Mean Reciprocal Rank](#reciprocal-rank-and-mean-reciprocal-rank)
   2. [Mean Average Precision](#mean-average-precision-map)
   3. [Normalized Discounted Cumulative Gain](#normalized-discounted-cumulative-gain)
3. [Examples](#examples)
4. [References](#references)

## Online metrics

### Hit Ratio

[Source 1](#reference_1)

> the fraction of users for which the correct answer is included in the recommendation list of length $L$.

$D = \text{the superset containing every set of recommendations }d\text{ served to a user, such that }|d| = L$

$y = \text{the correct answer}$

$$HR_L = \frac{|\{d \in D : y \in d\}|}{|D|}$$

**NOTE**: $L$ is a parameter.
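A minimal sketch of the hit ratio, assuming one correct answer per served list. The function name `hit_ratio` and the sample data are illustrative; lists longer than $L$ are truncated to their first $L$ items, which is an assumption beyond the definition above.

```python
from typing import Hashable, Sequence


def hit_ratio(recommendations: Sequence[Sequence[Hashable]],
              correct_answers: Sequence[Hashable],
              L: int) -> float:
    """Fraction of users whose correct answer appears in the first L recommended items."""
    if len(recommendations) != len(correct_answers):
        raise ValueError("need exactly one correct answer per recommendation list")
    if not recommendations:
        raise ValueError("at least one recommendation list is required")
    # Assumption: lists longer than L are truncated to their top L items.
    hits = sum(1 for d, y in zip(recommendations, correct_answers) if y in d[:L])
    return hits / len(recommendations)


served = [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j", "k", "l"]]
answers = ["c", "h", "i"]
print(hit_ratio(served, answers, L=3))  # 2 of 3 users are hits: prints 0.6666666666666666
```

Raising $L$ to 4 brings the second user's answer (`"h"`) into the list, so the hit ratio grows to 1.0: larger $L$ can only keep or increase $HR_L$.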

## Offline metrics

### Reciprocal Rank and Mean Reciprocal Rank

[Source 1](#reference_1), [Back to top](#ranking-metrics)

$D = \text{the superset containing every ranked set of recommendations }d\text{ served to a user, such that }|d| = L$

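$$RR(d) = \begin{cases} \frac{1}{rank_d} & \text{if } d \text{ contains a relevant item} \\ 0 & \text{otherwise} \end{cases}$$

where $rank_d$ is the 1-based position of the first relevant item in $d$.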

$$MRR(D) = \frac{ \sum\limits_{i = 1}^{|D|} RR(D_i) }{|D|} $$

**NOTE**: _one could argue that hit ratio is actually a special case of MRR in which RR(d) is binary, as it becomes 1 if there is a relevant item in the list, 0 otherwise._
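A minimal sketch of RR and MRR with binary relevance, where each served list comes with the set of items that are relevant for that user. The function names and sample data are illustrative.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / (1-based position of the first relevant item); 0.0 if no item is relevant."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[Hashable]],
                         relevant_sets: Sequence[Set[Hashable]]) -> float:
    """Average reciprocal rank over all served lists."""
    if not ranked_lists or len(ranked_lists) != len(relevant_sets):
        raise ValueError("need at least one list and exactly one relevant set per list")
    total = sum(reciprocal_rank(d, r) for d, r in zip(ranked_lists, relevant_sets))
    return total / len(ranked_lists)


served = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"b"}, {"d", "f"}, {"z"}]
print(mean_reciprocal_rank(served, relevant))  # (1/2 + 1/1 + 0) / 3: prints 0.5
```

Replacing `1.0 / position` with `1.0` turns `reciprocal_rank` into the binary hit indicator described in the note above.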

### Mean Average Precision (MAP)

[Source 1](#reference_1), [Back to top](#ranking-metrics)

$K = \text{the maximum number of top elements we want to consider}$

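$k = \text{a cutoff position in the ranked list, such that } 1 ≤ k ≤ K$

$D = \text{the superset containing every ranked list of recommendations }d\text{ served to a user, such that }|d| \geq K$

$R_i = \text{the number of relevant items for the user who received } D_i$

$$\text{Precision@}k(D_i) = \frac{|\{\text{relevant items among the top } k \text{ of } D_i\}|}{k}$$

$$AP(D_i) = \frac{1}{\min(K, R_i)} \sum\limits_{k = 1}^{K} \text{Precision@}k(D_i) \times rel_i(k)$$

where $rel_i(k)$ (the relevance mask) is 1 if the item at position $k$ of $D_i$ is relevant and 0 otherwise. Dividing by $\min(K, R_i)$ keeps AP in $[0, 1]$; some sources divide by $R_i$ instead.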

$$MAP(D) = \frac{\sum\limits_{i = 1}^{|D|} AP(D_i)}{|D|}$$
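A minimal sketch of AP@K and MAP@K with binary relevance. The function names and sample data are illustrative, and the sketch assumes the $\min(K, R_i)$ normalization used by many AP@K implementations; a user with no relevant items is assumed to score 0.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], K: int) -> float:
    """Sum of Precision@k over relevant positions k <= K, divided by min(K, |relevant|)."""
    if not relevant:
        return 0.0  # assumption: no relevant items means AP = 0
    total = 0.0
    for k in range(1, min(K, len(ranked)) + 1):
        if ranked[k - 1] in relevant:  # relevance mask rel(k) = 1
            total += precision_at_k(ranked, relevant, k)
    return total / min(K, len(relevant))


def mean_average_precision_at_k(ranked_lists: Sequence[Sequence[Hashable]],
                                relevant_sets: Sequence[Set[Hashable]],
                                K: int) -> float:
    """Average of AP@K over all users."""
    if not ranked_lists or len(ranked_lists) != len(relevant_sets):
        raise ValueError("need at least one list and exactly one relevant set per list")
    scores = [average_precision_at_k(d, r, K) for d, r in zip(ranked_lists, relevant_sets)]
    return sum(scores) / len(scores)


lists = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
relevant = [{"a", "c"}, {"h"}]
print(mean_average_precision_at_k(lists, relevant, K=4))
```

For the first user, the relevant items sit at positions 1 and 3, so AP = (1/1 + 2/3) / 2 ≈ 0.833; the second user's only relevant item is at position 4, so AP = 1/4. MAP@4 is their mean, 13/24 ≈ 0.542.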

### Normalized Discounted Cumulative Gain

[Source](https://towardsdatascience.com/normalized-discounted-cumulative-gain-ndcg-the-ultimate-ranking-metric-437b03529f75), [Back to top](#ranking-metrics)

#### Cumulative Gain (CG)

> The **cumulative gain** is the sum of the relevance scores of items in the list.

$$CG@K = \sum\limits_{i = 1}^{K} relevance_i$$

where $relevance_i$ is the relevance score of the item at position $i$.

From the source's worked example, in which two models (A and B) rank the same items:

> If you’re computing NDCG@10, CG@10 will be 12 for both lists.
> 
> If you’re computing NDCG@5, CG@5 for Model A is 7, and for Model B is 10.

#### Discount Factor (DF)

> The **discount factor** involves using a logarithmic discounting factor to perform a weighted sum of the relevance scores of items in the list. The discounting factor is weighted based on the item’s position in the list.

It is based on the same intuition as the reciprocal rank, but smoothed by the logarithm: for the item in the 10th position, instead of scoring it as $1 / 10 = 0.1$ (the reciprocal rank), we score it as $1 / \log_2(1 + 10) \approx 0.29$. The denominator is smaller, and the result therefore higher. So, while we still penalize items further down the list, we penalize them less than the reciprocal rank does, probably reflecting the intuition that _there is not a single correct answer_, and also smoothing out potential anomalies in the scoring function.

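$$DF_i = \frac{1}{\log_2(1 + i)}$$

where $i$ is the 1-based position of the item. Weighting each relevance score by its discount factor gives the **discounted cumulative gain**:

$$DCG@K = \sum\limits_{i = 1}^{K} \frac{relevance_i}{\log_2(1 + i)}$$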

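A small sketch comparing the weight the reciprocal rank gives to position $i$ ($1/i$) with the logarithmic discount factor ($1/\log_2(1+i)$). The helper names are illustrative.

```python
import math


def reciprocal_discount(position: int) -> float:
    """Weight implied by the reciprocal rank: 1 / i."""
    return 1.0 / position


def log_discount(position: int) -> float:
    """Logarithmic discount factor: 1 / log2(1 + i), for a 1-based position i."""
    return 1.0 / math.log2(1 + position)


for position in (1, 2, 5, 10):
    print(f"rank {position:>2}: 1/i = {reciprocal_discount(position):.3f}, "
          f"1/log2(1+i) = {log_discount(position):.3f}")
# At rank 10: 1/i = 0.100 while 1/log2(1+i) = 0.289
```

Both weights equal 1 at the top position, and since $\log_2(1+i) \le i$ for every $i \ge 1$, the logarithmic discount is never harsher than the reciprocal one.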

#### Normalization constant

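> We want to normalize the model’s DCG by dividing it by the DCG obtained by an ideal ranker.
> 
> An ideal ranker ranks the items in descending order of relevance scores.

$$NDCG@K = \frac{DCG@K}{IDCG@K}$$

where $IDCG@K$ is the DCG@K of the ideal ranking, so a perfect ranking scores 1.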

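A minimal NDCG@k sketch with the $\log_2$ discount, taking the relevance scores in the order the model ranked the items. The function names are illustrative; the ideal ranking is assumed to be these same items sorted by descending relevance, and a list with no relevant items is assumed to score 0.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Sum of relevance_i / log2(1 + i) over the first k (1-based) positions."""
    return sum(rel / math.log2(1 + i) for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k of the model's ranking divided by DCG@k of the ideal ranking."""
    ideal = sorted(relevances, reverse=True)  # ideal ranker: descending relevance
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # assumption: nothing relevant to rank, score 0
    return dcg_at_k(relevances, k) / ideal_dcg


model_ranking = [3, 2, 3, 0, 1, 2]  # relevance of each item, in the model's order
print(round(ndcg_at_k(model_ranking, k=6), 3))  # prints 0.961
```

Because NDCG is a ratio of two DCGs computed with the same discount, changing the logarithm's base rescales numerator and denominator by the same constant and leaves NDCG unchanged.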
#### Example

```python
from math import log
import random
from typing import Iterable

import pandas as pd
```

```python
def cumulative_gain(relevances: pd.Series, k: int) -> float:
    """Sum of the relevance scores of the first k items."""
    if not isinstance(relevances, pd.Series):
        raise TypeError(f"expected pd.Series, got {type(relevances)}")
    return relevances.head(k).sum()


def discount_factor(iterable: pd.Series) -> Iterable[float]:
    """Discount factor for each 1-based position i of the series.

    NOTE: this uses the natural log, 1 / ln(1 + i), instead of log2 as in the
    formula above. The two differ by the constant factor ln(2), which cancels in NDCG.
    """
    if not isinstance(iterable, pd.Series):
        raise TypeError(f"expected pd.Series, got {type(iterable)}")
    return [1 / log(1 + position) for position in range(1, len(iterable) + 1)]


def discounted_cumulative_gain(relevances: pd.Series, k: int) -> float:
    """Sum of relevance_i / ln(1 + i) over the first k positions (natural log, see above)."""
    if not isinstance(relevances, pd.Series):
        raise TypeError(f"expected pd.Series, got {type(relevances)}")
    dcg = 0.0
    for position, relevance in enumerate(relevances.head(k), start=1):
        dcg += relevance / log(1 + position)
    return dcg
```

```python
# Map each user action to a graded relevance score.
relevance_by_action = {
    "Viewed": 0,
    "Clicked": 1,
    "Shared": 2,
    "AddedToCart": 3,
    "Ordered": 4,
}

actions = list(relevance_by_action.keys())
item_ids = list(range(10))

# 100 random (item, action) events.
event_items = [random.choice(item_ids) for _ in range(100)]
event_actions = [random.choice(actions) for _ in event_items]

events = []
for item, action in zip(event_items, event_actions):
    relevance = relevance_by_action[action]
    # Model A: 80% of the time the score is 0.5 / (relevance + 1), so a *lower* score
    # means a *more* relevant item; the other 20% of the time it is uniform in [0, 1].
    model_a_score = round(((5 / (relevance + 1)) / 10) if random.random() >= 0.2 else random.uniform(0, 1.0), 2)
    # Model B: a purely random score.
    model_b_score = round(random.uniform(0, 1.0), 2)
    events.append((item, action, relevance, model_a_score, model_b_score))

df = pd.DataFrame(events, columns=["item", "action", "relevance", "model_a", "model_b"])
# Both models rank by ascending score (lower score = ranked higher).
df_model_a = df.drop("model_b", axis=1).sort_values("model_a", ascending=True)
df_model_b = df.drop("model_a", axis=1).sort_values("model_b", ascending=True)
# The ideal ranker sorts by true relevance, highest first.
df_ideal = df.drop(["model_a", "model_b"], axis=1).sort_values("relevance", ascending=False)
```

```python
df_model_a["factor"] = discount_factor(df_model_a["item"])
df_model_a.head()
```

| | item | action | relevance | model_a | factor |
|---:|---:|:---|---:|---:|---:|
| 41 | 1 | Ordered | 4 | 0.06 | 1.442695 |
| 9 | 0 | Ordered | 4 | 0.1 | 0.910239 |
| 17 | 9 | Ordered | 4 | 0.1 | 0.721348 |
| 15 | 2 | Ordered | 4 | 0.1 | 0.621335 |
| 79 | 6 | Ordered | 4 | 0.1 | 0.558111 |


```python
df_model_b["factor"] = discount_factor(df_model_b["item"])
df_model_b.head()
```

| | item | action | relevance | model_b | factor |
|---:|---:|:---|---:|---:|---:|
| 66 | 0 | AddedToCart | 3 | 0.02 | 1.442695 |
| 18 | 1 | Clicked | 1 | 0.02 | 0.910239 |
| 90 | 3 | Shared | 2 | 0.04 | 0.721348 |
| 96 | 8 | Viewed | 0 | 0.05 | 0.621335 |
| 17 | 9 | Ordered | 4 | 0.06 | 0.558111 |


```python
# k = 10000 exceeds the 100 events, so the first line covers the whole list.
print(cumulative_gain(df_model_a.relevance, 10000), cumulative_gain(df_model_b.relevance, 10000), cumulative_gain(df_ideal.relevance, 10000))
print(cumulative_gain(df_model_a.relevance, 10), cumulative_gain(df_model_b.relevance, 10), cumulative_gain(df_ideal.relevance, 10))
```

```text
191 191 191
40 24 40
```


```python
print(
    discounted_cumulative_gain(df_model_a.relevance, 10000),
    discounted_cumulative_gain(df_model_b.relevance, 10000),
    discounted_cumulative_gain(df_ideal.relevance, 10000)
)
print(
    discounted_cumulative_gain(df_model_a.relevance, 10),
    discounted_cumulative_gain(df_model_b.relevance, 10),
    discounted_cumulative_gain(df_ideal.relevance, 10)
)
```

```text
69.12476793202073 57.920281164326546 70.50809485793498
26.21988210017919 15.412238716986126 26.21988210017919
```

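The cells above stop at DCG and never divide by the ideal ranker's DCG. A small sketch that does so with the DCG values copied from the printed output; these numbers come from one unseeded random run, so they will differ if the cells are re-executed. The natural-log discount used in the notebook does not matter here, because the log base cancels in the ratio.

```python
# DCG values copied from the printed output above (natural-log discount).
dcg_full = {"model_a": 69.12476793202073, "model_b": 57.920281164326546, "ideal": 70.50809485793498}
dcg_at_10 = {"model_a": 26.21988210017919, "model_b": 15.412238716986126, "ideal": 26.21988210017919}

# NDCG = model DCG / ideal DCG, at the same cutoff.
ndcg_full = {m: dcg_full[m] / dcg_full["ideal"] for m in ("model_a", "model_b")}
ndcg_at_10 = {m: dcg_at_10[m] / dcg_at_10["ideal"] for m in ("model_a", "model_b")}

for model in ("model_a", "model_b"):
    print(f"{model}: NDCG@10 = {ndcg_at_10[model]:.3f}, "
          f"NDCG over the full list = {ndcg_full[model]:.3f}")
```

In this run, model A's top 10 is as good as the ideal ranker's (NDCG@10 = 1.0), while model B's random scores give an NDCG@10 of roughly 0.59.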

```python
df_ideal["factor"] = discount_factor(df_ideal["item"])
df_ideal.head()
```

| | item | action | relevance | factor |
|---:|---:|:---|---:|---:|
| 45 | 9 | Ordered | 4 | 1.442695 |
| 79 | 6 | Ordered | 4 | 0.910239 |
| 48 | 3 | Ordered | 4 | 0.721348 |
| 67 | 8 | Ordered | 4 | 0.621335 |
| 69 | 6 | Ordered | 4 | 0.558111 |


## Examples

[Back to top](#ranking-metrics)

### MAP @ k

```python
import random

movies = list(range(10))
n_users = 10
n_relevant = 5  # each user's top-5 preferred movies count as relevant


def get_preferences(user):
    """Random preference order over all movies (the user argument is unused)."""
    return random.sample(movies, len(movies))


movie_preferences = [get_preferences(u) for u in range(n_users)]

# Relevance mask per user: 1 for the first n_relevant preferred movies, 0 otherwise.
relevance_masks = []
for mvps in movie_preferences:
    relevance_mask = {mvp: 1 if idx < n_relevant else 0 for idx, mvp in enumerate(mvps)}
    relevance_masks.append(relevance_mask)

# Simulated recommender: with probability `accuracy` it returns the user's exact
# preference order, otherwise a random shuffle of it.
accuracy = 0.8
movie_recommendations = [
    preferences if random.random() < accuracy
    else random.sample(preferences, len(preferences))
    for preferences in movie_preferences
]

for rm, mv, mr in zip(relevance_masks, movie_preferences, movie_recommendations):
    print(mv)
    print(mr)
    print(rm)
    print()
```

```text
[5, 0, 6, 4, 7, 2, 9, 8, 1, 3]
[7, 6, 4, 2, 9, 1, 3, 0, 5, 8]
{5: 1, 0: 1, 6: 1, 4: 1, 7: 1, 2: 0, 9: 0, 8: 0, 1: 0, 3: 0}

[4, 8, 6, 5, 3, 0, 2, 9, 1, 7]
[4, 8, 6, 5, 3, 0, 2, 9, 1, 7]
{4: 1, 8: 1, 6: 1, 5: 1, 3: 1, 0: 0, 2: 0, 9: 0, 1: 0, 7: 0}

[9, 4, 3, 5, 8, 7, 2, 0, 6, 1]
[9, 4, 3, 5, 8, 7, 2, 0, 6, 1]
{9: 1, 4: 1, 3: 1, 5: 1, 8: 1, 7: 0, 2: 0, 0: 0, 6: 0, 1: 0}

[2, 5, 9, 1, 3, 0, 8, 6, 7, 4]
[2, 5, 9, 1, 3, 0, 8, 6, 7, 4]
{2: 1, 5: 1, 9: 1, 1: 1, 3: 1, 0: 0, 8: 0, 6: 0, 7: 0, 4: 0}

[2, 6, 3, 1, 9, 4, 5, 8, 0, 7]
[2, 6, 3, 1, 9, 4, 5, 8, 0, 7]
{2: 1, 6: 1, 3: 1, 1: 1, 9: 1, 4: 0, 5: 0, 8: 0, 0: 0, 7: 0}

[0, 3, 1, 5, 7, 2, 6, 4, 9, 8]
[5, 7, 9, 4, 8, 0, 1, 3, 6, 2]
{0: 1, 3: 1, 1: 1, 5: 1, 7: 1, 2: 0, 6: 0, 4: 0, 9: 0, 8: 0}

[0, 1, 3, 5, 2, 4, 7, 8, 6, 9]
[0, 2, 4, 5, 7, 6, 8, 1, 3, 9]
{0: 1, 1: 1, 3: 1, 5: 1, 2: 1, 4: 0, 7: 0, 8: 0, 6: 0, 9: 0}

[2, 5, 1, 4, 9, 7, 3, 6, 8, 0]
[2, 5, 1, 4, 9, 7, 3, 6, 8, 0]
{2: 1, 5: 1, 1: 1, 4: 1, 9: 1, 7: 0, 3: 0, 6: 0, 8: 0, 0: 0}

[7, 5, 8
```

```python
def metric_at_k(denom, y_true, y_pred, k=2, relevance_masks=None, rounding=4):
    """Pooled precision@k (denom='precision') or recall@k (denom='recall') over all users.

    Without masks, every item in a user's top-k preferences counts as relevant and
    the recall denominator is k, so precision@k and recall@k coincide. With masks,
    a hit must also be marked relevant, and recall divides by all relevant items.
    """
    tp = 0  # true positives
    p = 0   # number of predictions made (k per user)
    t = 0   # number of relevant items (recall denominator)
    if relevance_masks:
        for preferences, recommendations, relevance_mask in zip(y_true, y_pred, relevance_masks):
            expected = {mv for mv in preferences[:k] if relevance_mask[mv]}
            predicted = {mv for mv in recommendations[:k] if relevance_mask[mv]}
            tp += len(expected & predicted)
            p += k
            t += sum(relevance_mask.values())
    else:
        for preferences, recommendations in zip(y_true, y_pred):
            expected = set(preferences[:k])
            predicted = set(recommendations[:k])
            tp += len(expected & predicted)
            p += k
            t += len(expected)
    return round(tp / p if denom == 'precision' else tp / t, rounding)


def precision_at_k(y_true, y_pred, k=2, relevance_masks=None, rounding=4):
    return metric_at_k('precision', y_true, y_pred, k=k, relevance_masks=relevance_masks, rounding=rounding)


def recall_at_k(y_true, y_pred, k=2, relevance_masks=None, rounding=4):
    return metric_at_k('recall', y_true, y_pred, k=k, relevance_masks=relevance_masks, rounding=rounding)
```


```python
print(precision_at_k(movie_preferences, movie_recommendations, 4))
print(recall_at_k(movie_preferences, movie_recommendations, 4))
print(recall_at_k(movie_preferences, movie_recommendations, 4, relevance_masks))
```

```text
0.775
0.775
0.62
```

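The cells above compute precision@k and recall@k but not MAP@k itself. A minimal sketch in the same setting, assuming a user's first `n_relevant` preferences are the relevant items and normalizing each AP@k by min(k, number of relevant items), one common convention. The function name `map_at_k` is illustrative; the two users below are the first two printed above.

```python
from typing import Hashable, List, Sequence


def map_at_k(y_true: Sequence[Sequence[Hashable]],
             y_pred: Sequence[Sequence[Hashable]],
             k: int,
             n_relevant: int) -> float:
    """Mean AP@k over users; relevant items are each user's first n_relevant preferences."""
    if not y_true:
        raise ValueError("at least one user is required")
    average_precisions: List[float] = []
    for preferences, recommendations in zip(y_true, y_pred):
        relevant = set(preferences[:n_relevant])
        if not relevant:
            average_precisions.append(0.0)  # assumption: nothing relevant scores 0
            continue
        hits = 0
        precision_sum = 0.0
        for position, item in enumerate(recommendations[:k], start=1):
            if item in relevant:
                hits += 1
                precision_sum += hits / position  # Precision@position at a relevant slot
        average_precisions.append(precision_sum / min(k, len(relevant)))
    return sum(average_precisions) / len(average_precisions)


users_true = [[5, 0, 6, 4, 7, 2, 9, 8, 1, 3], [4, 8, 6, 5, 3, 0, 2, 9, 1, 7]]
users_pred = [[7, 6, 4, 2, 9, 1, 3, 0, 5, 8], [4, 8, 6, 5, 3, 0, 2, 9, 1, 7]]
print(map_at_k(users_true, users_pred, k=4, n_relevant=5))  # (0.75 + 1.0) / 2: prints 0.875
```

For the first user, the top 4 recommendations hold 3 relevant movies at positions 1 to 3, giving AP@4 = 3/4; the second user's recommendations match the preferences exactly, giving AP@4 = 1.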

## References

[Back to top](#ranking-metrics)

1. <a id="reference_1"></a>[Ranking Evaluation Metrics for Recommender Systems](https://towardsdatascience.com/ranking-evaluation-metrics-for-recommender-systems-263d0a66ef54)
2. <a id="reference_2"></a>[Demystifying NDCG](https://towardsdatascience.com/demystifying-ndcg-bee3be58cfe0)