## Meal Recommendations Based on Ratings

The goal is to recommend new meals to customers of a food delivery company, based on how other customers have rated those meals.

The final result should be a function that takes a Spark DataFrame of a single customer's ratings for various meals and returns their top 3 suggested meals.

In [42]:
# Initialize pyspark
import findspark
findspark.init()
import pyspark

In [43]:
# Initialize and create a Spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('meals').getOrCreate()

In [44]:
# Using Spark to read the meals data set.
data = spark.read.csv('Meal_Info.csv', header=True, inferSchema=True)

In [45]:
# Printing the first few rows of the dataframe
data.show(5, truncate=False)

+-------+------+------+--------+---------------------+
|movieId|rating|userId|mealskew|meal_name            |
+-------+------+------+--------+---------------------+
|2      |3.0   |0     |2.0     |Chicken Curry        |
|3      |1.0   |0     |3.0     |Spicy Chicken Nuggest|
|5      |2.0   |0     |5.0     |Hamburger            |
|9      |4.0   |0     |9.0     |Taco Surprise        |
|11     |1.0   |0     |11.0    |Meatloaf             |
+-------+------+------+--------+---------------------+
only showing top 5 rows



In [46]:
# Printing the schema of the dataframe
data.printSchema()

root
 |-- movieId: integer (nullable = true)
 |-- rating: double (nullable = true)
 |-- userId: integer (nullable = true)
 |-- mealskew: double (nullable = true)
 |-- meal_name: string (nullable = true)



In [47]:
data.count()

1501

In [48]:
# Dropping every row that contains a null value in any column
filtered_data = data.na.drop()

In [49]:
filtered_data.count()

486
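
Dropping nulls removes 1,015 of the 1,501 rows, so it is worth checking which columns hold the missing values before discarding them. The helper below is a minimal sketch of that check; `null_counts` is a hypothetical name, not part of the notebook, and it relies only on the standard DataFrame methods `filter`, `count`, and `Column.isNull`.

```python
def null_counts(df):
    """Return {column name: number of null values} for a Spark DataFrame.

    Runs one filter-and-count job per column, which is fine for a dataset
    of this size but would be slow on a very wide or very large table.
    """
    return {c: df.filter(df[c].isNull()).count() for c in df.columns}

# Usage in the notebook (assumes `data` from the cell that reads the CSV):
# null_counts(data)
```

If most of the nulls turn out to sit in a column the model does not use, such as `meal_name`, dropping only on the columns passed to ALS would keep more rows.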

In [50]:
# Since the dataset is small, use an 80/20 train-test split
train_data, test_data = filtered_data.randomSplit([0.8, 0.2])

In [51]:
train_data.describe().show()

+-------+-----------------+----------------+-----------------+-----------------+-------------------+
|summary|          movieId|          rating|           userId|         mealskew|          meal_name|
+-------+-----------------+----------------+-----------------+-----------------+-------------------+
|  count|              400|             400|              400|              400|                400|
|   mean|           15.415|          1.7325|          14.3125|           15.415|               null|
| stddev|9.262681478407114|1.17233941298931|8.552350583392446|9.262681478407114|               null|
|    min|                0|             1.0|                0|              0.0|           BBQ Ribs|
|    max|               31|             5.0|               29|             31.0|Vietnamese Sandwich|
+-------+-----------------+----------------+-----------------+-----------------+-------------------+



In [52]:
test_data.describe().show()

+-------+------------------+------------------+------------------+------------------+-------------------+
|summary|           movieId|            rating|            userId|          mealskew|          meal_name|
+-------+------------------+------------------+------------------+------------------+-------------------+
|  count|                86|                86|                86|                86|                 86|
|   mean|15.906976744186046| 1.755813953488372|15.186046511627907|15.906976744186046|               null|
| stddev| 9.237554943849009|1.2265307028278907| 8.612592187375789| 9.237554943849009|               null|
|    min|                 0|               1.0|                 0|               0.0|           BBQ Ribs|
|    max|                31|               5.0|                29|              31.0|Vietnamese Sandwich|
+-------+------------------+------------------+------------------+------------------+-------------------+



In [53]:
# Setting up Recommender system
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

In [54]:
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol='userId', itemCol='mealskew', ratingCol='rating')

In [55]:
model = als.fit(train_data)

Now let's see how the model performed!

In [56]:
# Getting the predictions on test data
predictions = model.transform(test_data)

In [57]:
predictions.show(4)

+-------+------+------+--------+-------------------+----------+
|movieId|rating|userId|mealskew|          meal_name|prediction|
+-------+------+------+--------+-------------------+----------+
|     31|   1.0|    27|    31.0|       Chicken Wrap|-1.0234901|
|     31|   1.0|    13|    31.0|       Chicken Wrap|0.68203694|
|     28|   1.0|    23|    28.0|Penne Tomatoe Pasta| 2.9350235|
|     26|   1.0|    22|    26.0|   Spicy Beef Plate|-0.5881107|
+-------+------+------+--------+-------------------+----------+
only showing top 4 rows
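
Several of the predictions shown here fall below 1.0, the lowest rating in the data, because ALS does not constrain its output to the rating scale. One possible post-processing step, which is an assumption and not something this notebook does, is to clamp each prediction to the observed rating range. The sketch below applies that idea in plain Python to the four predictions printed above; `clamp_rating` is a hypothetical helper.

```python
# Rating range reported by describe() on the training and test data.
RATING_MIN, RATING_MAX = 1.0, 5.0

def clamp_rating(prediction, low=RATING_MIN, high=RATING_MAX):
    """Force a predicted rating into the closed interval [low, high]."""
    return max(low, min(high, prediction))

# The four predictions from predictions.show(4).
sample_predictions = [-1.0234901, 0.68203694, 2.9350235, -0.5881107]
clamped = [clamp_rating(p) for p in sample_predictions]
print(clamped)  # [1.0, 1.0, 2.9350235, 1.0]
```

Clamping does not change the ranking of meals for a user unless several predictions fall outside the range, so it matters more for error metrics than for top-N suggestions.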



Evaluate the model by computing the root mean squared error (RMSE) on the test data.

In [58]:
evaluator = RegressionEvaluator(predictionCol='prediction', labelCol='rating', metricName='rmse')

In [59]:
rmse = evaluator.evaluate(predictions)

In [60]:
print("Root Mean Squared Error:", rmse)

Root Mean Squared Error: 2.309644960164883
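
To make the metric concrete: RMSE is the square root of the mean squared difference between the actual rating and the predicted rating. The sketch below recomputes it in plain Python for only the four rows printed by `predictions.show(4)`, so its result differs from the full test-set figure above. The names `rmse_of`, `sample_ratings`, and `sample_predictions` are illustrative and do not appear elsewhere in the notebook.

```python
import math

def rmse_of(ratings, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(r - p) ** 2 for r, p in zip(ratings, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Ratings and predictions for the four rows shown by predictions.show(4).
sample_ratings = [1.0, 1.0, 1.0, 1.0]
sample_predictions = [-1.0234901, 0.68203694, 2.9350235, -0.5881107]

sample_rmse = rmse_of(sample_ratings, sample_predictions)
print(sample_rmse)
```

On a rating scale that runs from 1 to 5, an RMSE of about 2.3 means typical errors on the order of two rating points, so the suggestions from this model should be treated with caution.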


## Recommendation for a particular user

In [61]:
single_user = test_data.select('userId','mealskew')

In [62]:
single_user.groupBy('userId').count().show()

+------+-----+
|userId|count|
+------+-----+
|    28|    2|
|    26|    4|
|    27|    2|
|    12|    4|
|    22|    2|
|     1|    2|
|    13|    3|
|     6|    3|
|    16|    4|
|     3|    3|
|    20|    2|
|     5|    2|
|    19|    2|
|    15|    1|
|    17|    3|
|     9|    5|
|     4|    5|
|     8|    1|
|    23|    3|
|     7|    1|
+------+-----+
only showing top 20 rows



In [63]:
single_user = single_user.filter("userId == 26")

In [64]:
recommendations = model.transform(single_user)

In [65]:
recommendations.orderBy('prediction', ascending=False).show(3)

+------+--------+----------+
|userId|mealskew|prediction|
+------+--------+----------+
|    26|    24.0| 2.0780482|
|    26|     3.0| 1.6575247|
|    26|    18.0| 1.3842776|
+------+--------+----------+
only showing top 3 rows
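
The cells above score only the meals that user 26 already rated in the test set, so they rank known meals rather than suggest new ones. The sketch below is one possible shape for the function described in the introduction: it scores the meals the customer has not rated yet and keeps the best three. The name `recommend_top_meals` and its parameters are hypothetical, and the code assumes the column names used in this notebook (`userId`, `mealskew`, `meal_name`).

```python
def recommend_top_meals(model, all_ratings, customer_ratings, n=3):
    """Return a DataFrame with the n unrated meals that have the highest predicted rating.

    model            -- a fitted ALS model, e.g. `model` from the cells above
    all_ratings      -- DataFrame with 'mealskew' and 'meal_name', e.g. `filtered_data`
    customer_ratings -- DataFrame of ONE customer's ratings ('userId', 'mealskew', ...)
    """
    # The customer's id as a one-row DataFrame.
    user = customer_ratings.select("userId").distinct()

    # Every known meal, minus the meals this customer has already rated.
    meals = all_ratings.select("mealskew", "meal_name").distinct()
    unrated = meals.join(
        customer_ratings.select("mealskew"), on="mealskew", how="left_anti"
    )

    # Pair the customer with each unrated meal and let the model score the pairs.
    candidates = unrated.crossJoin(user)
    scored = model.transform(candidates)

    # Highest predicted rating first, keep the top n.
    return scored.orderBy("prediction", ascending=False).limit(n)

# Usage in the notebook (assumes `model` and `filtered_data` from above):
# recommend_top_meals(model, filtered_data, filtered_data.filter("userId == 26")).show()
```

If the customer did not appear in the training data, ALS returns NaN predictions under its default cold-start strategy, so the output of this function is only meaningful for customers the model has seen.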



In [None]:
# Closing the Spark session
spark.stop()