# Recommender Metrics

In [1]:
import numpy as np
from collections import defaultdict

### Precision and Recall

Precision measures the proportion of recommended items that are relevant to the user.

$$\text{Precision} = \frac{\text{Number of relevant items recommended}}{\text{Total recommended items}}$$

Recall measures the proportion of relevant items that are successfully recommended. 

$$\text{Recall} = \frac{\text{Number of relevant items recommended}}{\text{Total relevant items}}$$
    
These metrics are particularly useful for binary recommendation tasks, such as predicting whether a user will like, click on, or purchase an item. However, they can be adapted to other types of recommendation tasks as well.

For example, with rating predictions (explicit feedback), precision and recall can be computed by thresholding the ratings. Ratings above a chosen threshold are treated as positive interactions and ratings below it as negative interactions. Precision and recall are then calculated from whether each predicted rating exceeds the threshold.
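
The thresholding step can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper name `binarize`, the threshold of 3.5, and the sample ratings are all assumptions made for the example, and ratings equal to the threshold are treated as positive.

```python
def binarize(ratings, threshold=3.5):
    # Map each rating to 1 (positive) if it meets the threshold, else 0.
    # Treating ratings equal to the threshold as positive is an assumption.
    return [1 if r >= threshold else 0 for r in ratings]

# Hypothetical true and predicted ratings for five items
true_ratings = [4.0, 2.0, 5.0, 3.0, 4.5]
pred_ratings = [3.8, 3.6, 4.9, 2.5, 3.0]

actual_labels = binarize(true_ratings)     # [1, 0, 1, 0, 1]
predicted_labels = binarize(pred_ratings)  # [1, 1, 1, 0, 0]

# Count items that are positive in both the true and the predicted labels
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual_labels, predicted_labels))
thresholded_precision = true_positives / sum(predicted_labels)  # 2 / 3
thresholded_recall = true_positives / sum(actual_labels)        # 2 / 3
```

Once the ratings are converted to 0/1 labels, any precision or recall function that expects binary inputs can be applied to them unchanged.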

In [2]:
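def precision(actual, predicted):
    # Fraction of predicted positives (p == 1) that are actual positives (a == 1).
    # Both inputs are expected to be equal-length lists of binary 0/1 labels.
    true_positives = sum((a == 1 and p == 1) for a, p in zip(actual, predicted))
    predicted_positives = sum(predicted)
    # Return 0 when nothing was predicted positive, to avoid division by zero
    return true_positives / predicted_positives if predicted_positives > 0 else 0

def recall(actual, predicted):
    # Fraction of actual positives (a == 1) that were also predicted positive
    true_positives = sum((a == 1 and p == 1) for a, p in zip(actual, predicted))
    actual_positives = sum(actual)
    # Return 0 when there are no actual positives, to avoid division by zero
    return true_positives / actual_positives if actual_positives > 0 else 0

def f1_score(actual, predicted):
    # Harmonic mean of precision and recall; 0 when both are 0
    prec = precision(actual, predicted)
    rec = recall(actual, predicted)
    return 2 * (prec * rec) / (prec + rec) if (prec + rec) > 0 else 0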

# Example usage:
actual = [1, 0, 1, 1, 0]
predicted = [1, 1, 0, 1, 0]
print("Precision:", precision(actual, predicted))
print("Recall:", recall(actual, predicted))
print("F1 Score:", f1_score(actual, predicted))


Precision: 0.6666666666666666
Recall: 0.6666666666666666
F1 Score: 0.6666666666666666
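
As a check on this output, the example can be worked by hand. The first and fourth positions are the only ones where both `actual` and `predicted` are 1, giving 2 true positives. `predicted` contains three 1s and `actual` contains three 1s, so:

$$\text{Precision} = \frac{2}{3} \approx 0.667, \qquad \text{Recall} = \frac{2}{3} \approx 0.667, \qquad F_1 = \frac{2 \cdot \frac{2}{3} \cdot \frac{2}{3}}{\frac{2}{3} + \frac{2}{3}} = \frac{2}{3} \approx 0.667$$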


### RMSE and MAE

For recommendation systems with explicit ratings, we can calculate RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error):
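
Written out, with $r_i$ the actual rating, $\hat{r}_i$ the predicted rating, and $n$ the number of rated items (this notation is introduced here for illustration), the two metrics are:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_i - \hat{r}_i\right)^2}$$

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert r_i - \hat{r}_i \right\rvert$$

Because the errors are squared before averaging, RMSE weights large errors more heavily than MAE does.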

In [3]:
def rmse(actual, predicted):
    # Compute RMSE
    return np.sqrt(np.mean((np.array(actual) - np.array(predicted))**2))

def mae(actual, predicted):
    # Compute MAE
    return np.mean(np.abs(np.array(actual) - np.array(predicted)))

# Example usage:
actual_ratings = [4, 3, 5, 2, 1]
predicted_ratings = [3.5, 2.8, 4.5, 2.1, 1.2]
print("RMSE:", rmse(actual_ratings, predicted_ratings))
print("MAE:", mae(actual_ratings, predicted_ratings))


RMSE: 0.3435112807463534
MAE: 0.30000000000000004


### Coverage and Hit Rate

Coverage is often used to measure the diversity of recommendations, while hit rate evaluates the accuracy of the recommendations in terms of user interactions. 

- Coverage calculates the proportion of unique items in the recommendation list over the total number of items in the catalog.
- Hit rate, as implemented below, calculates the proportion of items that were both recommended and actually interacted with (hits) out of all the items evaluated for the user.

In [1]:
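def coverage(recommendations, total_items):
    # Share of the catalog that appears in the recommendation list;
    # duplicates are counted once by converting the list to a set
    recommended_items = set(recommendations)
    return len(recommended_items) / total_items

def hit_rate(actual, predicted):
    # Count positions that are both recommended (p == 1) and interacted with (a == 1)
    hits = sum((a == 1 and p == 1) for a, p in zip(actual, predicted))
    # Normalize by the number of items evaluated, not by the number of actual interactions
    return hits / len(actual)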

# Example usage:
total_items = 1000  # Total number of items in the catalog
recommendations = [1, 2, 3, 4, 5]  # Recommended items for a user
actual_interactions = [1, 0, 1, 0, 0]  # Actual interactions for the user (binary)
predicted_interactions = [1, 1, 0, 0, 0]  # Predicted interactions (binary)

cov = coverage(recommendations, total_items)
print("Coverage:", cov)

hit = hit_rate(actual_interactions, predicted_interactions)
print("Hit Rate:", hit)

Coverage: 0.005
Hit Rate: 0.2
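
Both metrics are usually more informative when aggregated over many users than when computed for a single list. The sketch below shows one common way to do that, under stated assumptions: `catalog_coverage` and `hit_rate_at_k` are hypothetical helper names, the user and item ids are made-up data, and the hit rate here is the per-user variant (a user counts as a hit if at least one relevant item appears in their top-k list), which differs from the item-level `hit_rate` defined above.

```python
def catalog_coverage(recs_by_user, total_items):
    # Union of every item recommended to any user, divided by catalog size.
    recommended = set()
    for items in recs_by_user.values():
        recommended.update(items)
    return len(recommended) / total_items

def hit_rate_at_k(recs_by_user, relevant_by_user, k):
    # Fraction of users with at least one relevant item in their top-k list.
    if not recs_by_user:
        return 0.0
    hits = 0
    for user, items in recs_by_user.items():
        relevant = relevant_by_user.get(user, set())
        if any(item in relevant for item in items[:k]):
            hits += 1
    return hits / len(recs_by_user)

# Hypothetical data: ranked recommendations and held-out relevant items per user
recs_by_user = {"u1": [1, 2, 3], "u2": [2, 3, 4], "u3": [5, 6, 7]}
relevant_by_user = {"u1": {3, 9}, "u2": {8}, "u3": {5}}

print("Catalog coverage:", catalog_coverage(recs_by_user, total_items=10))
print("Hit rate@2:", hit_rate_at_k(recs_by_user, relevant_by_user, k=2))
```

In this made-up data, seven distinct items are recommended out of a catalog of ten, and only `u3` has a relevant item within the top two positions.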
