# Consulting Project 
## Recommender Systems - Solutions

Word seems to be spreading about your ability to analyze big data and build useful systems! You've just taken on a contract with a new online food delivery company. The company is trying to differentiate itself by recommending new meals to customers based on other customers' likings.

Can you build them a recommendation system?

Your final result should be in the form of a function that can take in a Spark DataFrame of a single customer's ratings for various meals and output their top 3 suggested meals.

Best of luck!

***Note from Jose: I completely made up this food data, so it's likely that the actual recommendations themselves won't make any sense. But you should get a similar output to what I did given the example customer dataframe.***

In [78]:
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import StringIndexer, IndexToString

In [2]:
spark = SparkSession.builder.appName('recommendation_consult').getOrCreate()
spark

In [3]:
df = spark.read.csv('meal_info.csv',inferSchema=True,header=True)
df.printSchema()

root
 |-- movieId: integer (nullable = true)
 |-- rating: double (nullable = true)
 |-- userId: integer (nullable = true)
 |-- mealskew: double (nullable = true)
 |-- meal_name: string (nullable = true)



In [4]:
df.count()

1501

In [5]:
df.show(5)

+-------+------+------+--------+--------------------+
|movieId|rating|userId|mealskew|           meal_name|
+-------+------+------+--------+--------------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|      5|   2.0|     0|     5.0|           Hamburger|
|      9|   4.0|     0|     9.0|       Taco Surprise|
|     11|   1.0|     0|    11.0|            Meatloaf|
+-------+------+------+--------+--------------------+
only showing top 5 rows



In [6]:
df.describe().show()

+-------+------------------+------------------+------------------+------------------+-------------------+
|summary|           movieId|            rating|            userId|          mealskew|          meal_name|
+-------+------------------+------------------+------------------+------------------+-------------------+
|  count|              1501|              1501|              1501|               486|                486|
|   mean| 49.40572951365756|1.7741505662891406|14.383744170552964|15.502057613168724|               NULL|
| stddev|28.937034065088994| 1.187276166124803| 8.591040424293272| 9.250633630277568|               NULL|
|    min|                 0|               1.0|                 0|               0.0|           BBQ Ribs|
|    max|                99|               5.0|                29|              31.0|Vietnamese Sandwich|
+-------+------------------+------------------+------------------+------------------+-------------------+



In [41]:
indexer = StringIndexer(inputCol='meal_name',outputCol='meal_name_index',handleInvalid='skip')
indexer

StringIndexer_a34eb663bcfd

In [43]:
indexer_model = indexer.fit(df)
output = indexer_model.transform(df)
output.printSchema()

root
 |-- movieId: integer (nullable = true)
 |-- rating: double (nullable = true)
 |-- userId: integer (nullable = true)
 |-- mealskew: double (nullable = true)
 |-- meal_name: string (nullable = true)
 |-- meal_name_index: double (nullable = false)



In [44]:
output.show(5)

+-------+------+------+--------+--------------------+---------------+
|movieId|rating|userId|mealskew|           meal_name|meal_name_index|
+-------+------+------+--------+--------------------+---------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|            4.0|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|           26.0|
|      5|   2.0|     0|     5.0|           Hamburger|           24.0|
|      9|   4.0|     0|     9.0|       Taco Surprise|           13.0|
|     11|   1.0|     0|    11.0|            Meatloaf|           28.0|
+-------+------+------+--------+--------------------+---------------+
only showing top 5 rows



In [48]:
output.groupBy('meal_name').count().show()

+--------------------+-----+
|           meal_name|count|
+--------------------+-----+
| Penne Tomatoe Pasta|   12|
|              Nachos|   16|
|   Pulled Pork Plate|   20|
|    Spicy Beef Plate|   14|
|   Roasted Eggplant |   15|
|           Hamburger|   13|
|        Ceaser Salad|   17|
|       Chicken Curry|   19|
|      Orange Chicken|   15|
|    Fried Rice Plate|   11|
|   Chicken Chow Mein|    7|
|     Southwest Salad|   13|
|         Sushi Plate|   13|
|Cheesesteak Sandw...|   17|
|               Chili|   14|
|    Kung Pao Chicken|   18|
|             Burrito|   13|
| Roast Beef Sandwich|   14|
|            Meatloaf|   12|
|       Taco Surprise|   16|
+--------------------+-----+
only showing top 20 rows



In [50]:
output.groupBy('meal_name_index').count().show()

+---------------+-----+
|meal_name_index|count|
+---------------+-----+
|            8.0|   17|
|            0.0|   20|
|            7.0|   17|
|           29.0|   12|
|           18.0|   15|
|            1.0|   20|
|           25.0|   13|
|            4.0|   19|
|           23.0|   13|
|           31.0|    7|
|           11.0|   16|
|           21.0|   14|
|           14.0|   15|
|           22.0|   14|
|            3.0|   19|
|           19.0|   14|
|           28.0|   12|
|            2.0|   20|
|           17.0|   15|
|           27.0|   13|
+---------------+-----+
only showing top 20 rows



In [51]:
output.columns

['movieId', 'rating', 'userId', 'mealskew', 'meal_name', 'meal_name_index']

In [52]:
output.count()

486
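
The drop from 1501 rows to 486 is consistent with `handleInvalid='skip'`: `describe()` reported only 486 non-null `meal_name` values, and rows the indexer cannot label are filtered out. The pure-Python sketch below imitates that behavior, together with what I understand to be `StringIndexer`'s default ordering (the most frequent label gets index 0, with ties broken alphabetically in recent Spark versions). The helper `index_labels` and the toy meal list are hypothetical and not part of the notebook.

```python
from collections import Counter

def index_labels(values):
    """Imitate StringIndexer(handleInvalid='skip') on a plain Python list.

    Assumption: labels are ordered by descending frequency, ties broken
    alphabetically; None values are dropped rather than indexed.
    """
    kept = [v for v in values if v is not None]  # 'skip' drops nulls
    counts = Counter(kept)
    labels = sorted(counts, key=lambda name: (-counts[name], name))
    mapping = {name: float(i) for i, name in enumerate(labels)}
    return labels, [(v, mapping[v]) for v in kept]

# Toy data, made up for illustration
meals = ['Chili', None, 'Nachos', 'Chili', None, 'Burrito', 'Nachos', 'Chili']
labels, indexed = index_labels(meals)
print(labels)        # label order, index 0 first
print(len(indexed))  # rows that survive the null filter
```

The `labels` list plays the role of `indexer_model.labels`, which the notebook later passes to `IndexToString` to recover meal names.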

In [53]:
train, test = output.randomSplit([0.7,0.3],seed=36)

In [55]:
train.count(), test.count()

(340, 146)

In [56]:
als = ALS(userCol='userId',itemCol='meal_name_index',ratingCol='rating')
als

ALS_bae8a72677a4

In [57]:
als_model = als.fit(train)
als_model

ALSModel: uid=ALS_bae8a72677a4, rank=10
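
The fitted model reports `rank=10`, meaning each user and each meal is represented by a vector of 10 latent factors, and a predicted rating is the dot product of the two vectors. As a rough illustration with made-up rank-3 vectors (the numbers and the `predict_rating` helper are hypothetical, not taken from `als_model`):

```python
def predict_rating(user_factors, item_factors):
    """Dot product of a user's and an item's latent-factor vectors."""
    if len(user_factors) != len(item_factors):
        raise ValueError('factor vectors must have the same rank')
    return sum(u * i for u, i in zip(user_factors, item_factors))

# Made-up rank-3 factors, purely for illustration
user_vec = [0.5, 1.0, 0.25]
meal_vec = [2.0, 1.0, 0.0]
print(predict_rating(user_vec, meal_vec))
```

In Spark, the learned vectors are exposed as the `als_model.userFactors` and `als_model.itemFactors` DataFrames.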

In [58]:
recommendations = als_model.transform(test)
recommendations.show()

+-------+------+------+--------+--------------------+---------------+----------+
|movieId|rating|userId|mealskew|           meal_name|meal_name_index|prediction|
+-------+------+------+--------+--------------------+---------------+----------+
|      1|   1.0|    26|     1.0|             Burrito|           23.0| 1.7785635|
|      3|   2.0|    22|     3.0|Spicy Chicken Nug...|           26.0| 0.8549367|
|      3|   1.0|     1|     3.0|Spicy Chicken Nug...|           26.0| 0.5430573|
|      6|   1.0|     1|     6.0|  Spicy Pork Sliders|            2.0| 1.4198838|
|      1|   1.0|     3|     1.0|             Burrito|           23.0| 1.3585114|
|      1|   1.0|    20|     1.0|             Burrito|           23.0| 0.9114408|
|      0|   1.0|     5|     0.0|        Cheese Pizza|           10.0|0.82214355|
|      4|   1.0|    19|     4.0|Pretzels and Chee...|            9.0| 1.6613698|
|      5|   1.0|     9|     5.0|           Hamburger|           24.0| 1.1191124|
|      0|   1.0|     8|     

In [59]:
evaluator = RegressionEvaluator(labelCol='rating')
rmse = evaluator.evaluate(recommendations)
print("RMSE of ALS recommendation model:", rmse)

RMSE of ALS recommendation model: 1.1470106210145947
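
`RegressionEvaluator` defaults to RMSE, the square root of the mean squared difference between `rating` and `prediction`. The sketch below recomputes it in plain Python on the first three rows of the prediction table above. It also drops NaN predictions, which is roughly what setting `coldStartStrategy='drop'` on `ALS` does for users or meals unseen during training. The `rmse` helper is hypothetical, and three rows will not reproduce the figure reported for the full test set.

```python
import math

def rmse(pairs, drop_nan=True):
    """Root-mean-square error over (rating, prediction) pairs.

    With drop_nan=True, pairs whose prediction is NaN are ignored,
    mimicking ALS(coldStartStrategy='drop'); otherwise a single NaN
    makes the whole result NaN.
    """
    if drop_nan:
        pairs = [(r, p) for r, p in pairs if not math.isnan(p)]
    if not pairs:
        raise ValueError('no valid predictions to evaluate')
    squared_errors = [(r - p) ** 2 for r, p in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# First three (rating, prediction) rows of the test-set output
rows = [(1.0, 1.7785635), (2.0, 0.8549367), (1.0, 0.5430573)]
print(rmse(rows))
# A cold-start row (NaN prediction) poisons the metric unless dropped
print(rmse(rows + [(3.0, float('nan'))], drop_nan=False))
```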


In [61]:
recommendations.orderBy('prediction',ascending=False).show()

+-------+------+------+--------+--------------------+---------------+----------+
|movieId|rating|userId|mealskew|           meal_name|meal_name_index|prediction|
+-------+------+------+--------+--------------------+---------------+----------+
|     18|   3.0|    26|    18.0|     Pepperoni Pizza|           16.0| 3.6457257|
|     29|   4.0|     7|    29.0|        Pork Sliders|            0.0| 3.3155763|
|     23|   4.0|    23|    23.0|      Orange Chicken|           15.0| 3.3064756|
|     30|   4.0|    23|    30.0| Vietnamese Sandwich|           22.0| 3.1595979|
|      2|   2.0|     7|     2.0|       Chicken Curry|            4.0| 3.1249466|
|     29|   1.0|    28|    29.0|        Pork Sliders|            0.0| 2.8809025|
|     18|   4.0|     3|    18.0|     Pepperoni Pizza|           16.0| 2.7740045|
|     19|   4.0|    11|    19.0|Cheesesteak Sandw...|            7.0| 2.7025313|
|     18|   4.0|    27|    18.0|     Pepperoni Pizza|           16.0|  2.696047|
|     18|   1.0|     8|    1

In [62]:
single_user = test.filter(test['userId'] == 22).select('meal_name_index','userId')
single_user.show()

+---------------+------+
|meal_name_index|userId|
+---------------+------+
|           26.0|    22|
|           13.0|    22|
|            5.0|    22|
|            7.0|    22|
|            1.0|    22|
|           22.0|    22|
+---------------+------+



In [64]:
recommendations = als_model.transform(single_user)
recommendations.show()

+---------------+------+----------+
|meal_name_index|userId|prediction|
+---------------+------+----------+
|           26.0|    22| 0.8549367|
|           22.0|    22| 2.3131099|
|            1.0|    22| 1.5381551|
|           13.0|    22|0.62651986|
|            5.0|    22| 1.6579856|
|            7.0|    22| 0.9563426|
+---------------+------+----------+



In [65]:
recommendations = recommendations.orderBy('prediction',ascending=False)
recommendations.show()

+---------------+------+----------+
|meal_name_index|userId|prediction|
+---------------+------+----------+
|           22.0|    22| 2.3131099|
|            5.0|    22| 1.6579856|
|            1.0|    22| 1.5381551|
|            7.0|    22| 0.9563426|
|           26.0|    22| 0.8549367|
|           13.0|    22|0.62651986|
+---------------+------+----------+



In [83]:
results = recommendations.select('meal_name_index')
results.show()

+---------------+
|meal_name_index|
+---------------+
|           22.0|
|            5.0|
|            1.0|
|            7.0|
|           26.0|
|           13.0|
+---------------+



In [81]:
index_to_string = IndexToString(inputCol='meal_name_index',outputCol='meal_name',labels=indexer_model.labels)
index_to_string.transform(recommendations.select('meal_name_index')).show()

+---------------+--------------------+
|meal_name_index|           meal_name|
+---------------+--------------------+
|           22.0| Vietnamese Sandwich|
|            5.0|    Kung Pao Chicken|
|            1.0|   Pulled Pork Plate|
|            7.0|Cheesesteak Sandw...|
|           26.0|Spicy Chicken Nug...|
|           13.0|       Taco Surprise|
+---------------+--------------------+



In [84]:
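def recommend_meals(test_user_ratings):
    """Return the user's meal names, ordered from highest to lowest predicted rating."""
    # Score every (user, meal) pair in the input with the fitted ALS model
    recommendations = als_model.transform(test_user_ratings)
    recommendations = recommendations.orderBy('prediction',ascending=False)
    # Map the numeric meal indices back to their original names
    named = index_to_string.transform(recommendations.select('meal_name_index'))
    return [row['meal_name'] for row in named.select('meal_name').collect()]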
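
The brief asks for the top 3 suggestions, while `recommend_meals` returns every meal it is given, ordered by predicted rating. One possible variant, assuming the same kind of fitted ALS model and `IndexToString` transformer used in this notebook, adds a `limit` before converting indices to names. The function name `recommend_top_meals` and its parameters are hypothetical; the objects are passed in explicitly so the function does not rely on notebook globals.

```python
def recommend_top_meals(model, label_converter, user_ratings, n=3):
    """Return the names of the n meals with the highest predicted rating.

    model           -- a fitted ALS model (als_model in this notebook)
    label_converter -- the IndexToString transformer (index_to_string)
    user_ratings    -- DataFrame with 'meal_name_index' and 'userId' columns
    """
    predictions = model.transform(user_ratings)
    # Keep only the n best-scoring rows before mapping indices to names
    top_n = predictions.orderBy('prediction', ascending=False).limit(n)
    named = label_converter.transform(top_n.select('meal_name_index'))
    return [row['meal_name'] for row in named.select('meal_name').collect()]
```

It could then be called as `recommend_top_meals(als_model, index_to_string, test.filter(test['userId'] == 9).select('meal_name_index', 'userId'))`. Like the original, it only ranks meals already present in the input DataFrame.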

In [86]:
recommended_meals = recommend_meals(test.filter(test['userId'] == 9).select(['meal_name_index','userId']))
recommended_meals

['Roast Beef Sandwich',
 'Taco Surprise',
 'Spicy Beef Plate',
 'Hamburger',
 'Pulled Pork Plate',
 'Nachos',
 'Lasagna',
 'Cheesesteak Sandwhich',
 'Chili']

In [87]:
recommended_meals = recommend_meals(test.filter(test['userId'] == 16).select(['meal_name_index','userId']))
recommended_meals

['Nachos', 'Spicy Pork Sliders', 'Pork Sliders']

In [88]:
recommended_meals = recommend_meals(test.filter(test['userId'] == 27).select(['meal_name_index','userId']))
recommended_meals

['Pepperoni Pizza', 'Pork Sliders']

In [89]:
recommended_meals = recommend_meals(test.filter(test['userId'] == 24).select(['meal_name_index','userId']))
recommended_meals

['Chicken Wrap', 'Spicy Pork Sliders', 'Ceaser Salad']