# Evaluation

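The evaluation metrics fall into three groups: prediction, hit, and ranking metrics.

### Prediction Metrics (Similar to Regression Metrics)

**Prediction** - prediction metrics measure how close the predicted ratings are to the true ratings, as in a regression problem.

- RMSE
- R2
- MAE
- Explained Variance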
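
As a minimal sketch of the four prediction metrics, assume the true and predicted ratings have already been aligned into two equal-length NumPy arrays. The values and the helper names (`rmse`, `mae`, `r2`, `explained_variance`) are made up for illustration and are not part of the `reco` package.

```python
import numpy as np

# Hypothetical true and predicted ratings, one entry per (user, item) pair
y_true = np.array([5.0, 4.0, 3.0, 5.0, 1.0])
y_pred = np.array([4.5, 3.5, 3.0, 4.0, 2.0])


def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more heavily
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    # Mean absolute error: average size of the error
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)


def explained_variance(y_true, y_pred):
    # 1 - variance of the errors / variance of the true ratings
    return float(1 - np.var(y_true - y_pred) / np.var(y_true))


print(rmse(y_true, y_pred), mae(y_true, y_pred))
print(r2(y_true, y_pred), explained_variance(y_true, y_pred))
```

Explained variance and R2 differ only when the mean error is non-zero, which is the case for this toy data.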


### Hit Metrics (Similar to Classification Metrics)
**Hit** - a hit is defined by relevance: a recommended item counts as a hit when it is one of the items that are relevant to the user. For example, a user may have clicked, viewed, or purchased an item many times; if that item appears among the top "k" recommendations, the recommender is performing well for that user. Metrics such as "precision" and "recall" measure this hit accuracy.

- Precision@k
- Recall@k
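
For a single user, the two hit metrics can be sketched as follows. The item labels and the helper names (`hits_at_k`, `precision_at_k_single`, `recall_at_k_single`) are hypothetical and only illustrate the definitions: precision@k divides the number of hits by k, recall@k divides it by the number of relevant items.

```python
def hits_at_k(recommended, relevant, k):
    # Number of the top-k recommended items that are relevant to the user
    return len(set(recommended[:k]) & set(relevant))


def precision_at_k_single(recommended, relevant, k):
    # Share of the k recommendations that are hits
    return hits_at_k(recommended, relevant, k) / k


def recall_at_k_single(recommended, relevant, k):
    # Share of the relevant items that were recovered in the top k
    return hits_at_k(recommended, relevant, k) / len(relevant)


recommended = ["a", "b", "c", "d"]  # hypothetical ranking, best first
relevant = ["b", "d", "e", "f"]     # hypothetical relevant items

print(precision_at_k_single(recommended, relevant, 2))  # one hit ("b") in the top 2
print(recall_at_k_single(recommended, relevant, 2))     # one of four relevant items found
```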
  

### Ranking Metrics

**Ranking** - ranking metrics go one step further: for the items that were hit, they measure whether those items are ordered the way the user would prefer. Metrics such as "mean average precision" and "NDCG" evaluate whether relevant items are ranked above less relevant or irrelevant ones.
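
To illustrate why ordering matters, here is a minimal sketch of average precision at k for one user (mean average precision is the mean of this value over all users). The function name `average_precision_at_k` and the item labels are hypothetical, and the normalization by `min(len(relevant), k)` is one common convention that is assumed here; other definitions normalize differently.

```python
def average_precision_at_k(recommended, relevant, k):
    # Average of precision@i over every position i (<= k) that holds a relevant item
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    # Assumed normalization: the best achievable number of hits in the top k
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0


relevant = ["b", "d"]                 # hypothetical relevant items
hits_first = ["b", "d", "a"]          # both hits ranked at the top
hits_last = ["a", "b", "d"]           # same hits, ranked lower

print(average_precision_at_k(hits_first, relevant, 3))
print(average_precision_at_k(hits_last, relevant, 3))
```

Both rankings contain the same two hits, so precision@3 and recall@3 are identical, but the second ranking scores lower because its hits appear further down the list.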


In [22]:
# set the environment path to find reco
import sys
sys.path.append("../")

In [23]:
import keras

In [24]:
import numpy as np
import pandas as pd

In [37]:
df_true = pd.DataFrame(
    {
        "USER": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
        "ITEM": [1, 2, 3, 1, 4, 5, 6, 7, 2, 5, 6, 8, 9, 10, 11, 12, 13, 14],
        "RATING": [5, 4, 3, 5, 5, 3, 3, 1, 5, 5, 5, 4, 4, 3, 3, 3, 2, 1],
    }
)

df_pred = pd.DataFrame(
    {
        "USER": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
        "ITEM": [3, 10, 12, 10, 3, 5, 11, 13, 4, 10, 7, 13, 1, 3, 5, 2, 11, 14],
        "RATING_PRED": [14, 13, 12, 14, 13, 12, 11, 10, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5]
    }
)

In [38]:
df_true.head()

|   | USER | ITEM | RATING |
|---|------|------|--------|
| 0 | 1    | 1    | 5      |
| 1 | 1    | 2    | 4      |
| 2 | 1    | 3    | 3      |
| 3 | 2    | 1    | 5      |
| 4 | 2    | 4    | 5      |


In [39]:
common_users = set(df_true.USER).intersection(set(df_pred.USER))


In [40]:
from reco.evaluate import get_top_k_items
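
`get_top_k_items` comes from the local `reco` package, and its source is not shown in this notebook. Judging only from how it is called in `get_hit_df` and from the `rank` column selected there, it appears to keep each user's k highest-scored rows and attach a 1-based rank. The function below, `top_k_items_sketch`, is a hypothetical stand-in written under that assumption; it is not the package's actual implementation.

```python
import pandas as pd


def top_k_items_sketch(df, col_user, col_rating, k):
    # Hypothetical stand-in for reco.evaluate.get_top_k_items (assumed behavior)
    # Sort by user, then by descending score within each user
    ranked = df.sort_values([col_user, col_rating], ascending=[True, False])
    # Keep the first k rows of every user
    top_k = ranked.groupby(col_user).head(k).reset_index(drop=True)
    # Attach a 1-based rank within each user
    top_k["rank"] = top_k.groupby(col_user).cumcount() + 1
    return top_k


# Made-up scores for two users
scores = pd.DataFrame(
    {
        "USER": [1, 1, 1, 2, 2],
        "ITEM": [10, 11, 12, 10, 13],
        "RATING_PRED": [0.2, 0.9, 0.5, 0.7, 0.1],
    }
)
top2 = top_k_items_sketch(scores, "USER", "RATING_PRED", 2)
print(top2)
```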

In [52]:
def get_hit_df(rating_true, rating_pred, k):
    
    # Make sure the prediction and true data frames have the same set of users
    common_users = set(rating_true["USER"]).intersection(set(rating_pred["USER"]))
    rating_true_common = rating_true[rating_true["USER"].isin(common_users)]
    rating_pred_common = rating_pred[rating_pred["USER"].isin(common_users)]
    n_users = len(common_users)

    df_hit = get_top_k_items(rating_pred_common, "USER", "RATING_PRED", k)
    df_hit = pd.merge(df_hit, rating_true_common, on=["USER", "ITEM"])[
        ["USER", "ITEM", "rank"]
    ]

    # count the number of hits vs actual relevant items per user
    # (size() + reset_index is used because renaming with a dict in
    # SeriesGroupBy.agg is rejected by pandas >= 1.0)
    df_hit_count = pd.merge(
        df_hit.groupby("USER").size().reset_index(name="hit"),
        rating_true_common.groupby("USER").size().reset_index(name="actual"),
        on="USER",
    )
    
    return df_hit, df_hit_count, n_users

In [66]:
def precision_at_k(rating_true, rating_pred, k):
    
    df_hit, df_hit_count, n_users = get_hit_df(rating_true, rating_pred, k)
    
    if df_hit.shape[0] == 0:
        return 0.0

    return (df_hit_count["hit"] / k).sum() / n_users

In [67]:
def recall_at_k(rating_true, rating_pred, k):

    df_hit, df_hit_count, n_users = get_hit_df(rating_true, rating_pred, k)

    if df_hit.shape[0] == 0:
        return 0.0

    return (df_hit_count["hit"] / df_hit_count["actual"]).sum() / n_users

In [68]:
precision_at_k(df_true, df_pred, 3)

0.3333333333333333

In [69]:
recall_at_k(df_true, df_pred, 3)

0.2111111111111111

In [64]:
def ndcg_at_k(rating_true, rating_pred, k):

    df_hit, df_hit_count, n_users = get_hit_df(rating_true, rating_pred, k)
    
    if df_hit.shape[0] == 0:
        return 0.0

    # calculate discounted gain for hit items
    df_dcg = df_hit.copy()
    # relevance is binary here, so every hit has a gain of 1; the natural log
    # is used instead of log2, which cancels out in the DCG / IDCG ratio
    df_dcg["dcg"] = 1 / np.log1p(df_dcg["rank"])
    # sum the discounted gains per user to get the discounted cumulative gain
    df_dcg = df_dcg.groupby("USER", as_index=False, sort=False).agg({"dcg": "sum"})
    # calculate the ideal DCG: the first min(actual, k) positions are all hits
    df_ndcg = pd.merge(df_dcg, df_hit_count, on=["USER"])
    df_ndcg["idcg"] = df_ndcg["actual"].apply(
        lambda x: sum(1 / np.log1p(range(1, min(x, k) + 1)))
    )

    # NDCG is DCG over IDCG, averaged over all users
    return (df_ndcg["dcg"] / df_ndcg["idcg"]).sum() / n_users

In [65]:
ndcg_at_k(df_true, df_pred, 3)

0.33333333333333326