# Recall@k

In [1]:
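def recall(actual, predicted, k):
    """Return the fraction of relevant items found in the top-k predictions."""
    actual_set = set(actual)
    if not actual_set:
        return 0.0  # no relevant items, so recall is undefined; report 0
    predicted_set = set(predicted[:k])
    intersection = actual_set.intersection(predicted_set)
    return len(intersection) / len(actual_set)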


In [4]:
actual = [2, 4, 5, 7, 9, 3]
predicted = [1, 2, 3, 4, 5, 6, 8, 9, 10]
# k is how many top elements to consider from predicted list
k = 9
for i in range(1, k+1):
    rec = recall(actual, predicted, i)
    print(f"Recall at {i}: {rec:.2f}")

Recall at 1: 0.00
Recall at 2: 0.17
Recall at 3: 0.33
Recall at 4: 0.50
Recall at 5: 0.67
Recall at 6: 0.67
Recall at 7: 0.67
Recall at 8: 0.83
Recall at 9: 0.83


# MRR (Mean Reciprocal Rank)
Mean Reciprocal Rank (MRR) is a metric used to evaluate ranking systems, particularly in information retrieval. Here are 5 key points about MRR:

1. **Definition**: MRR is the average, over a set of queries, of the reciprocal rank of the first relevant item returned for each query.

2. **Relevance Focus**: Unlike Recall@k which considers all relevant items, MRR emphasizes finding the first relevant result quickly.

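3. **Range**: MRR values range from 0 to 1. A value of 1 means the top-ranked item is relevant for every query; values near 0 mean the first relevant item tends to appear far down the ranking. By convention, a query with no relevant item in its results contributes 0.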

4. **Use Cases**: MRR is particularly useful for search engines and question-answering systems where users typically focus on the top result.

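5. **Sensitivity to Position**: The reciprocal falls off quickly with rank: a first relevant item at rank 1 scores 1, at rank 2 scores 0.5, and at rank 10 scores only 0.1. MRR therefore heavily penalizes systems whose first relevant item appears far down the ranking.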

**Formula:**
$$MRR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{rank_q}$$

Where:
- |Q| = number of queries
- rank_q = the rank of the first relevant item for query q
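
The formula can also be computed directly from ranked predictions. The following is a minimal sketch, not a reference implementation: the helper names `reciprocal_rank` and `mean_reciprocal_rank` and the sample data are hypothetical, and it assumes a query with no relevant item in its predictions contributes 0.

```python
def reciprocal_rank(actual, predicted):
    """Return 1/rank of the first relevant item in predicted, or 0.0 if none."""
    relevant = set(actual)
    for rank, item in enumerate(predicted, start=1):  # ranks are 1-based
        if item in relevant:
            return 1.0 / rank
    return 0.0  # assumption: no relevant item retrieved counts as 0


def mean_reciprocal_rank(actual_lists, predicted_lists):
    """Average the reciprocal ranks over all queries."""
    if not actual_lists:
        return 0.0
    total = sum(reciprocal_rank(a, p) for a, p in zip(actual_lists, predicted_lists))
    return total / len(actual_lists)


# Hypothetical data: three queries, each with its own ranked predictions
actual_lists = [[2, 4], [7], [5]]
predicted_lists = [[2, 1, 3], [1, 7, 3], [1, 2, 3]]
print(f"MRR: {mean_reciprocal_rank(actual_lists, predicted_lists):.2f}")
```

In this sample the first relevant items sit at rank 1, rank 2, and nowhere, giving reciprocal ranks of 1, 0.5, and 0.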

In [5]:
# Relevant results for three queries. In this toy example each item ID doubles
# as its 1-based rank, as if the system returned items in the order 1, 2, 3, ...
actual_results = [
    [2, 4, 5, 7, 9, 3],
    [1, 3, 6, 8],
    [10, 11, 12, 13, 14]
]
# Number of queries
Q = len(actual_results)

reciprocal_sum = 0
for i in range(Q):
    # Rank of the first (best-ranked) relevant result for this query
    first_relevant_rank = min(actual_results[i])
    reciprocal_sum += 1 / first_relevant_rank
    print(f"Query {i+1}:  Reciprocal: {1 / first_relevant_rank:.2f}")

mrr = reciprocal_sum / Q
print(f"Mean Reciprocal Rank (MRR): {mrr:.2f}")

Query 1:  Reciprocal: 0.50
Query 2:  Reciprocal: 1.00
Query 3:  Reciprocal: 0.10
Mean Reciprocal Rank (MRR): 0.53


# MAP (Mean Average Precision)
Mean Average Precision (MAP) is a metric that evaluates both the relevance and ranking quality of results. Here are 3 key points:

1. **Definition**: MAP is the mean, over all queries, of Average Precision (AP). AP for a single query averages the precision measured at each position where a relevant item appears.

2. **Combines Precision and Recall**: Unlike MRR which only considers the first relevant item, MAP considers all relevant items and their positions in the ranking.

3. **Range and Interpretation**: MAP values range from 0 to 1, where higher values indicate better ranking quality with relevant items appearing earlier.

**Formula:**
$$MAP = \frac{1}{|Q|} \sum_{q=1}^{|Q|} AP(q)$$

$$AP(q) = \frac{1}{|R_q|} \sum_{k=1}^{n} P(k) \times rel(k)$$

Where:
- |Q| = number of queries
- AP(q) = Average Precision for query q
- |R_q| = number of relevant items for query q
- P(k) = precision at position k
- rel(k) = 1 if item at position k is relevant, 0 otherwise

In [11]:
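# Relevant items for three queries; the same predicted ranking is scored against each
actual = [
    [2, 4, 5, 7, 9, 3],
    [1, 3, 6, 8],
    [4, 3, 1, 10, 11]
]
predicted = [1, 2, 3, 4, 5, 6, 8]
k = 6  # only the top k predictions are scored
Q = len(actual)
AP = []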

In [12]:
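AP = []  # start fresh so re-running this cell does not accumulate old values
for q in range(Q):
    actual_set = set(actual[q])
    ap_num = 0  # running sum of P(i) * rel(i)
    for i in range(1, k+1):
        # Precision at position i: relevant items in the top i, divided by i
        predicted_set = set(predicted[:i])
        intersection = actual_set.intersection(predicted_set)
        precision_at_i = len(intersection) / i
        # rel(i) is 1 only when the item at position i is itself relevant
        rel_k = 1 if predicted[i-1] in actual_set else 0
        ap_num += precision_at_i * rel_k
    # Normalize by the number of relevant items for this query
    ap = ap_num / len(actual_set)
    print(f"Query {q+1}:  Average Precision (AP): {ap:.2f}")
    AP.append(ap)
MAP = sum(AP) / Q
print(f"Mean Average Precision (MAP): {MAP:.2f}")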

Query 1:  Average Precision (AP): 0.45
Query 2:  Average Precision (AP): 0.54
Query 3:  Average Precision (AP): 0.48
Mean Average Precision (MAP): 0.49
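
As a hand check on the Query 1 value, the relevant items within the top 6 predictions sit at positions 2, 3, 4, and 5, so only those positions contribute to the sum:

$$AP(1) = \frac{1}{6}\left(\frac{1}{2} + \frac{2}{3} + \frac{3}{4} + \frac{4}{5}\right) \approx \frac{2.7167}{6} \approx 0.45$$

The divisor is 6 because Query 1 has six relevant items in total, even though only four of them are retrieved in the top 6.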


# NDCG (Normalized Discounted Cumulative Gain)
Normalized Discounted Cumulative Gain (NDCG) is a ranking metric that considers both the relevance of items and their positions. Here are the key concepts:

1. **Relevance Scores**: Unlike previous metrics that use binary relevance (relevant/not relevant), NDCG can work with graded relevance scores (e.g., 0-5 scale).

2. **Position Matters**: Items appearing earlier in the ranking contribute more to the score, with a logarithmic discount applied to later positions.

3. **Normalization**: NDCG normalizes the score by comparing it to the ideal ranking, producing values between 0 and 1.

**Formulas:**

**Cumulative Gain (CG):**
$$CG@k = \sum_{i=1}^{k} rel_i$$

**Discounted Cumulative Gain (DCG):**
$$DCG@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$$

**Ideal DCG (IDCG):**
- Sort all relevant items by their relevance scores in descending order
- Calculate DCG for this ideal ranking
$$IDCG@k = \sum_{i=1}^{k} \frac{rel_i^{ideal}}{\log_2(i+1)}$$

**Normalized DCG (NDCG):**
$$NDCG@k = \frac{DCG@k}{IDCG@k}$$

Where:
- k = number of top results to consider
- rel_i = relevance score of item at position i
- NDCG ranges from 0 to 1, where 1 means perfect ranking
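
The formulas above can be wrapped in two small functions. This is a sketch under stated assumptions, not a definitive implementation: the names `dcg_at_k` and `ndcg_at_k` and the sample scores are hypothetical, and the ideal ranking is built only from the scores passed in (it assumes no relevant item is missing from the list).

```python
from math import log2


def dcg_at_k(relevance, k):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum(rel / log2(i + 1) for i, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k):
    """DCG@k divided by the DCG@k of the same scores sorted in descending order."""
    ideal = sorted(relevance, reverse=True)
    idcg = dcg_at_k(ideal, k)
    # If every score is 0 there is nothing to rank; report 0 instead of dividing by 0
    return dcg_at_k(relevance, k) / idcg if idcg > 0 else 0.0


# Hypothetical graded relevance scores, listed in ranked order
scores = [3, 0, 2]
print(f"NDCG@3: {ndcg_at_k(scores, 3):.2f}")
```

A list that is already sorted from most to least relevant scores exactly 1, which is a quick way to sanity-check the normalization.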

In [17]:
from math import log2

# Example with graded relevance scores (0-5 scale), listed in ranked order
relevance_scores = [2, 4, 3, 0, 1, 2]
K = 6
dcg = 0
idcg = 0
# Ideal ranking: the same scores sorted from most to least relevant
ideal_relevance = sorted(relevance_scores, reverse=True)

for k in range(1, K+1):
    rel_k = relevance_scores[k-1]
    ideal_rel_k = ideal_relevance[k-1]

    # Add position k's discounted gain to the running DCG@k and IDCG@k
    dcg = dcg + rel_k / log2(k+1)
    idcg = idcg + ideal_rel_k / log2(k+1)

    ndcg = dcg / idcg if idcg > 0 else 0
    print(f"NDCG at {k}: {ndcg:.2f}")

NDCG at 1: 0.50
NDCG at 2: 0.77
NDCG at 3: 0.87
NDCG at 4: 0.78
NDCG at 5: 0.79
NDCG at 6: 0.87
