# Recipe Recommender Results

This notebook runs the recommender model on the cleaned user-recipe interactions dataset and applies several evaluation metrics to assess the resulting recommendations.

### Start Spark Session

In [12]:
# Code from https://spark.apache.org/docs/2.2.0/ml-collaborative-filtering.html
import pyspark
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row
from pyspark.sql import SparkSession
from pyspark.sql.functions import col


conf = pyspark.SparkConf().setAll([('spark.master', 'local[2]'),
                                   ('spark.app.name', 'Recommender Results')])
spark = SparkSession.builder.config(conf=conf).getOrCreate()

### Read in cleaned user-recipes interactions data frame

In [18]:
file_path = 'file:///home/work/data/interactions_train_cleaned.csv'
ratings = spark.read.csv(file_path, inferSchema = True, header = True)
ratings.show()

+-------+---------+------+
|user_id|recipe_id|rating|
+-------+---------+------+
|  38094|    40893|     4|
|1293707|    40893|     5|
| 190375|   134728|     5|
|1171894|   134728|     5|
| 217118|   200236|     5|
| 202555|   225241|     5|
| 684460|   225241|     5|
| 135017|   254596|     5|
| 224088|   254596|     4|
| 582223|   254596|     5|
| 935485|   321038|     5|
| 102602|    20930|     5|
| 172467|    29093|     5|
|  58332|    41090|     4|
| 160497|    41090|     5|
| 183565|    79222|     5|
| 226989|    79222|     4|
| 868654|    79222|     5|
| 302867|    79222|     5|
| 930021|    79222|     5|
+-------+---------+------+
only showing top 20 rows



### Randomly split into training and test sets and normalize ratings

In [19]:
(unnorm_training, unnorm_test) = ratings.randomSplit([0.8, 0.2])
mean = unnorm_training.agg({'rating': 'mean'}).collect()[0][0]
std = unnorm_training.agg({'rating': 'std'}).collect()[0][0]
print(mean, std)
training = unnorm_training.withColumn("rating", (col("rating") - mean) / std)
test = unnorm_test.withColumn("rating", (col("rating") - mean) / std)

                                                                                

4.646661593045566 0.7330842167730709
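
The normalization above is a z-score transform whose mean and standard deviation come from the training split only; the same statistics are reused to map ratings and predictions back to the original scale before evaluation. A minimal NumPy sketch of that round trip is shown below. The arrays and variable names are made up for illustration, and `ddof=1` assumes that Spark's `std` aggregate is the sample standard deviation.

```python
import numpy as np

# Hypothetical ratings standing in for the training and test splits
train_ratings = np.array([5.0, 4.0, 5.0, 3.0, 5.0])
test_ratings = np.array([4.0, 5.0])

# Fit the statistics on the training split only
mean = train_ratings.mean()
std = train_ratings.std(ddof=1)  # sample standard deviation (assumed to match Spark's 'std')

# Forward transform, applied to both splits with the training statistics
train_norm = (train_ratings - mean) / std
test_norm = (test_ratings - mean) / std

# Inverse transform, as applied to ratings and predictions before computing MSE
test_restored = test_norm * std + mean
print(test_restored)
```

Reusing the training statistics for the test split keeps information about the held-out ratings out of the fitted model.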


## Modeling

### Generate recipe recommendations with the collaborative filtering model and evaluate with MSE

Fit collaborative filtering model

In [20]:
# Setting cold start strategy to 'drop' to ensure we don't get NaN evaluation metrics
als = ALS(rank=200, maxIter=20, regParam=0.125, userCol="user_id", itemCol="recipe_id", ratingCol="rating",
          coldStartStrategy="drop")
model = als.fit(training)

2022-05-25 18:39:04,510 WARN netlib.InstanceBuilder$NativeBLAS: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
2022-05-25 18:39:04,514 WARN netlib.InstanceBuilder$NativeBLAS: Failed to load implementation from:dev.ludovic.netlib.blas.ForeignLinkerBLAS
2022-05-25 18:39:05,072 WARN netlib.InstanceBuilder$NativeLAPACK: Failed to load implementation from:dev.ludovic.netlib.lapack.JNILAPACK
                                                                                

Evaluate model with MSE

In [21]:
normalized_predictions = model.transform(test)
predictions = normalized_predictions.withColumn(
    "rating",col("rating") * std + mean
).withColumn(
    "prediction",col("prediction") * std + mean
)
evaluator = RegressionEvaluator(metricName="mse", labelCol="rating",
                                predictionCol="prediction")

mse = evaluator.evaluate(predictions)
print("The MSE of the recommender model is", mse)
predictions.show()

                                                                                

The MSE of the recommender model is 0.4976449210815721


+-------+---------+------+------------------+
|user_id|recipe_id|rating|        prediction|
+-------+---------+------+------------------+
| 132353|       40|   5.0| 4.655222910317792|
| 194829|       91|   5.0| 4.732509125103519|
| 709476|      142|   5.0| 4.628799670807802|
|  99979|      142|   5.0| 4.673064912995164|
|  10649|      190|   3.0|4.9240342017110486|
|  91326|      192|   4.0| 4.691164935972212|
| 143387|      192|   5.0|4.6850852097682845|
| 145599|      192|   5.0|  4.67063368067291|
| 151679|      192|   5.0| 4.689854071049422|
| 675287|      231|   5.0| 4.777932053142876|
| 156951|      232|   2.0| 4.724549941677335|
| 625864|      232|   5.0| 4.526215903805111|
| 983634|      232|   5.0| 4.651344504796338|
|  84361|      251|   3.0|4.7239619842006375|
| 111075|      271|   5.0| 4.715486405151405|
| 120448|      280|   5.0| 4.681396177642237|
| 354878|      310|   5.0| 4.714933458473203|
| 249059|      324|   5.0|  4.64725394995715|
| 275316|      324|   5.0| 4.65753
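
For context, the MSE can be converted to an RMSE on the original rating scale and compared with a trivial baseline that predicts the training mean for every user-recipe pair. The sketch below uses only the two numbers printed in this notebook. The baseline figure is an approximation, since it substitutes the training variance for the test-set error of a constant predictor.

```python
import math

mse = 0.4976449210815721        # MSE printed by the evaluation cell
train_std = 0.7330842167730709  # training standard deviation printed after the split

# RMSE is expressed in the same units as the ratings
rmse = math.sqrt(mse)

# A constant predictor that always outputs the training mean has an MSE of
# roughly the rating variance (approximate: variance taken from the training split)
baseline_mse = train_std ** 2

print("RMSE:", rmse)
print("Approximate mean-baseline MSE:", baseline_mse)
print("Relative improvement over baseline:", 1 - mse / baseline_mse)
```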

                                                                                

## Evaluation Metrics

We define the RecEvalMetrics class and use it to evaluate the recipe recommender system with several accuracy and ranking metrics.

In [67]:
# !pip install rbo
# !pip install recmetrics

from functools import reduce
from scipy.stats import kendalltau
from sklearn.metrics import mean_squared_error, ndcg_score
import pandas as pd
import numpy as np
import recmetrics
import rbo

class RecEvalMetrics(object):


    # Takes user-recipe rating predictions dataframe, returns mean squared error over the
    # top k predicted ratings of each user
    """ Parameters:
        predictions: Dataframe of true and predicted ratings
        k: Top k predicted ratings to evaluate with MSE, default 20
    """
    @staticmethod
    def top_k_evaluator(predictions, k = 20):
        users = list(predictions.drop_duplicates(subset = ['user_id'])['user_id'])
        top_k_predictions = []
        
        for user in users:
            user_ratings = predictions[(predictions['user_id'] == user)]
            top_k_user_ratings =  user_ratings.sort_values(by = ['prediction'], ascending = False).head(k)
            top_k_predictions.append(top_k_user_ratings)
        top_k_predictions_df = pd.concat(top_k_predictions, ignore_index = True)
        
        k_mse = mean_squared_error(list(top_k_predictions_df['rating']), list(top_k_predictions_df['prediction']))
        
        return(k_mse) 


    # Takes in user-recipe rating predictions dataframe, returns percent of recipes that ended
    # up in someone's top k.  Larger value means more personalization
    """ Parameters:
        predictions: Dataframe of true and predicted ratings
        k: Top k recipes to count in percentage, default 20
    """
    @staticmethod
    def percent_in_top_ratings(predictions, k = 20):
        total_recipes = len(predictions.drop_duplicates(subset = ['recipe_id']))
        users = list(predictions.drop_duplicates(subset = ['user_id'])['user_id'])

        top_k_predictions = set()
        for user in users:
            user_ratings = predictions[(predictions['user_id'] == user)]
            user_pred_ordered = list(user_ratings.sort_values(by = ['prediction'], ascending = False)['recipe_id'])
            top_k_user_recipes = user_pred_ordered[:k]
            top_k_predictions.update(top_k_user_recipes)

        top_recipes_count = len(top_k_predictions)

        return(top_recipes_count/total_recipes)


    # Takes in user-recipe rating predictions dataframe, returns mean and median rank-biased
    # overlap between top k predicted ratings and top k actual ratings
    # Refer to: https://github.com/changyaochen/rbo
    """ Parameters:
        predictions: Dataframe of true and predicted ratings
        k: Number of k recipes in the ranked list to evaluate with RBO, default 20
    """
    @staticmethod
    def rbo_evaluation(predictions, k = 20):
        users = list(predictions.drop_duplicates(subset = ['user_id'])['user_id'])

        rbos = []
        for user in users:
            user_ratings = predictions[(predictions['user_id'] == user)]
            user_actual_ordered = list(user_ratings.sort_values(by = ['rating'], ascending = False)['recipe_id'])
            user_pred_ordered = list(user_ratings.sort_values(by = ['prediction'], ascending = False)['recipe_id'])
            top_k_user_actual = user_actual_ordered[:k]
            top_k_user_pred = user_pred_ordered[:k]
            user_rbo = rbo.RankingSimilarity(top_k_user_actual, top_k_user_pred).rbo()
            rbos.append(user_rbo)
            
        return(np.mean(rbos), np.median(rbos))


    # Takes in user-recipe rating predictions dataframe, returns the mean and median Kendall's Tau
    # between actual ratings and predicted ratings
    """ Parameters:
        predictions: Dataframe of true and predicted ratings
    """
    @staticmethod
    def kendalls_tau(predictions):
        users = list(predictions.drop_duplicates(subset = ['user_id'])['user_id'])

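        tau = []
        for user in users:
            user_ratings = predictions[(predictions['user_id'] == user)]
            # Kendall's Tau will not work with list of size 1
            if (len(user_ratings) > 1):
                user_actual_ordered = list(user_ratings.sort_values(by = ['rating'], ascending = False)['recipe_id'])
                user_pred_ordered = list(user_ratings.sort_values(by = ['prediction'], ascending = False)['recipe_id'])
                # NOTE: this correlates the recipe_id values found at matching positions of the two
                # orderings, so the result depends on the numeric IDs and on how tied ratings are sorted;
                # passing the rating and prediction columns to kendalltau compares the rankings directly
                user_tau, user_p_value = kendalltau(user_actual_ordered, user_pred_ordered)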
                tau.append(user_tau)

        return(np.mean(tau), np.median(tau))


    # Takes in user-recipe rating predictions dataframe, returns the mean and median normalized
    # discounted cumulative gain between actual ratings and predicted ratings
    """ Parameters:
        predictions: Dataframe of true and predicted ratings
        k: Number of k recipes in the ranked list to evaluate, default None
    """
    @staticmethod
    def nDCG_evaluation(predictions, k = None):
        users = list(predictions.drop_duplicates(subset = ['user_id'])['user_id'])

        ndcg = []
        for user in users:
            user_ratings = predictions[(predictions['user_id'] == user)]
            if (len(user_ratings) > 1):
                relevance = np.asarray([list(user_ratings['rating'])])
                preds = np.asarray([list(user_ratings['prediction'])])
                score = ndcg_score(relevance, preds, k=k)
                ndcg.append(score)

        return(np.mean(ndcg), np.median(ndcg))

Convert the predictions PySpark dataframe to a pandas dataframe to feed into the metric evaluations. NOTE: We do this to keep the evaluation metrics class general, so that it can also accept Dask results after conversion.

In [23]:
predictions_df = predictions.toPandas()

                                                                                

### Top 10 MSE

In [42]:
k_mse = RecEvalMetrics.top_k_evaluator(predictions_df, 10)
print("The top 10 MSE:", k_mse)

The top 10 MSE: 0.4973753393177582
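
The per-user loop in `top_k_evaluator` filters the whole dataframe once per user, which can be slow with many users. The same selection can be sketched with a pandas `groupby`. The helper name `top_k_mse` and the toy data are hypothetical, and this version computes the squared error directly instead of calling scikit-learn. It should pick the same rows as the loop whenever a user's predictions contain no ties.

```python
import pandas as pd

def top_k_mse(predictions, k=20):
    """MSE over each user's k highest-predicted rows (hypothetical helper)."""
    top_k = (predictions
             .sort_values("prediction", ascending=False)
             .groupby("user_id")
             .head(k))
    return ((top_k["rating"] - top_k["prediction"]) ** 2).mean()

# Toy predictions for two users
toy = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "recipe_id":  [10, 20, 30, 10, 40],
    "rating":     [5.0, 4.0, 3.0, 4.0, 5.0],
    "prediction": [4.9, 4.5, 4.1, 4.8, 4.2],
})
result = top_k_mse(toy, k=2)
print(result)
```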


### Percent in top 10 (Personalization Assessment)

In [43]:
percent_in_top = RecEvalMetrics.percent_in_top_ratings(predictions_df, 10)
print("The percent of recipes that are in some users top 10:", percent_in_top)

The percent of recipes that are in some users top 10: 0.98972338785647
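
A value this close to 1 should be read with care. The metric is computed over held-out interactions only, so a user with 10 or fewer test ratings contributes every one of their recipes to the top-10 set. Whether that explains the value above depends on how many test ratings each user has, which this notebook does not report. The toy sketch below (hypothetical helper `fraction_in_top_k` and made-up data) shows how the fraction saturates once k reaches the number of ratings per user.

```python
import pandas as pd

def fraction_in_top_k(predictions, k=20):
    """Share of distinct recipes that appear in at least one user's top k (hypothetical helper)."""
    top_k = (predictions
             .sort_values("prediction", ascending=False)
             .groupby("user_id")
             .head(k))
    return top_k["recipe_id"].nunique() / predictions["recipe_id"].nunique()

# Toy predictions: user 1 has three test ratings, user 2 has two
toy = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "recipe_id":  [10, 20, 30, 10, 40],
    "prediction": [4.9, 4.5, 4.1, 4.8, 4.2],
})

frac_1 = fraction_in_top_k(toy, k=1)    # both users' top recipe is recipe 10
frac_2 = fraction_in_top_k(toy, k=2)
frac_10 = fraction_in_top_k(toy, k=10)  # k exceeds every user's rating count
print(frac_1, frac_2, frac_10)
```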


### Rank-Biased Overlap, top 10

In [44]:
rbo_mean, rbo_median = RecEvalMetrics.rbo_evaluation(predictions_df, 10)
print("The RBO mean for all user recommendations:", rbo_mean)
print("The RBO median for all user recommendations:", rbo_median)

The RBO mean for all user recommendations: 0.8288452577530352
The RBO median for all user recommendations: 1.0
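
To make the metric concrete, the sketch below computes the average overlap of two ranked lists. At each depth d it measures the fraction of items shared by the two top-d prefixes, then averages over all depths. We believe this is what the `rbo` package returns with its default `p=1.0`, but that equivalence is an assumption. The function `average_overlap` is illustrative and not part of the package.

```python
def average_overlap(actual, predicted):
    """Mean over depths d of (size of shared top-d items) / d. Illustrative only."""
    depth = min(len(actual), len(predicted))
    if depth == 0:
        return 0.0
    agreements = []
    for d in range(1, depth + 1):
        shared = len(set(actual[:d]) & set(predicted[:d]))
        agreements.append(shared / d)
    return sum(agreements) / depth

identical = average_overlap([10, 20, 30], [10, 20, 30])
swapped = average_overlap([10, 20, 30], [10, 30, 20])  # last two items swapped
single = average_overlap([10], [10])                   # user with one test rating
print(identical, swapped, single)
```

Under this definition a user with a single test rating always scores 1.0, because both lists contain the same lone recipe. Such users may contribute to the median of 1.0 reported above.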


### Kendall's Tau

In [51]:
tau_mean, tau_median = RecEvalMetrics.kendalls_tau(predictions_df)
print("The Kendall's Tau mean for all user recommendations:", tau_mean)
print("The Kendall's Tau median for all user recommendations:", tau_median)

The Kendall's Tau mean for all user recommendations: 0.015742770758176938
The Kendall's Tau median for all user recommendations: 0.0
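
Kendall's tau is conventionally computed on paired score values rather than on item identifiers. As a point of comparison, the sketch below passes a toy user's actual and predicted ratings straight to `scipy.stats.kendalltau`. The data is made up, and this is an alternative formulation, not the computation that produced the numbers above.

```python
from scipy.stats import kendalltau

# Toy data for one user: actual and predicted ratings for the same four recipes
ratings = [5, 3, 4, 2]
agreeing_predictions = [4.9, 4.1, 4.5, 3.8]  # same ordering as the ratings
reversed_predictions = [3.8, 4.5, 4.1, 4.9]  # opposite ordering

tau_agree, _ = kendalltau(ratings, agreeing_predictions)
tau_reverse, _ = kendalltau(ratings, reversed_predictions)
print(tau_agree, tau_reverse)
```

Under this formulation a user whose actual ratings are all tied yields NaN, so an aggregate over users would need to skip those entries, for example with `np.nanmean`.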


### Normalized Discounted Cumulative Gain

In [61]:
ndcg_mean, ndcg_median = RecEvalMetrics.nDCG_evaluation(predictions_df, 10)
print("The nDCG mean for all user recommendations:", ndcg_mean)
print("the nDCG median for all user recommendations:", ndcg_median)

The nDCG mean for all user recommendations: 0.9820059066538368
the nDCG median for all user recommendations: 1.0
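
The NumPy sketch below spells out nDCG with a linear gain and a log2 discount. We believe this matches the defaults of scikit-learn's `ndcg_score` when the predictions contain no ties, but that is an assumption, and the function name `ndcg` is illustrative. The example also shows that a user whose true ratings are all equal scores 1.0 whatever the predicted order, which may partly explain the high values above given how many ratings are 5.

```python
import numpy as np

def ndcg(relevance, scores, k=None):
    """nDCG with linear gain and log2 discount (illustrative, assumes no tied scores)."""
    relevance = np.asarray(relevance, dtype=float)
    scores = np.asarray(scores, dtype=float)
    k = len(relevance) if k is None else min(k, len(relevance))
    # Discount for rank positions 1..k is 1 / log2(position + 1)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    # DCG: true relevance taken in order of decreasing predicted score
    predicted_order = np.argsort(-scores)[:k]
    dcg = np.sum(relevance[predicted_order] * discounts)
    # Ideal DCG: true relevance sorted in decreasing order
    ideal = np.sort(relevance)[::-1][:k]
    idcg = np.sum(ideal * discounts)
    return dcg / idcg if idcg > 0 else 0.0

all_fives = ndcg([5, 5, 5], [4.1, 4.9, 4.5])  # every ordering is ideal
misordered = ndcg([3, 5], [4.9, 4.7])         # lower-rated recipe ranked first
print(all_fives, misordered)
```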
