# Consulting Project 
## Recommender Systems - Solutions

The whole world seems to be hearing about your amazing new abilities to analyze big data and build useful systems from it! You've just taken on a contract with a new online food delivery company. The company is trying to differentiate itself by recommending new meals to customers based on what other customers like.

Can you build them a recommendation system?

Your final result should be a function that takes a Spark DataFrame of a single customer's ratings for various meals and returns their top 3 suggested meals.

Best of luck!

***Note from Jose: I completely made up this food data, so it's likely that the actual recommendations themselves won't make any sense. But you should get output similar to mine given the example customer DataFrame.***
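
The function described above ends with a ranking step: given predicted ratings for meals, return the three highest. A minimal plain-Python sketch of just that step follows. The helper `top_meals` and the sample predictions are hypothetical and made up for illustration; in the Spark version the predictions would come from the fitted model rather than a hand-written dict.

```python
import heapq


def top_meals(predicted, n=3):
    """Return the n meal names with the highest predicted rating.

    `predicted` maps meal name -> predicted rating (a hypothetical input shape).
    """
    # nlargest avoids sorting every item when only a few are needed
    best = heapq.nlargest(n, predicted.items(), key=lambda kv: kv[1])
    return [name for name, _ in best]


# Made-up predictions for one customer, for illustration only
sample = {"Lasagna": 3.1, "Chili": 1.2, "Nachos": 4.0, "Burrito": 2.5}
print(top_meals(sample))
```

Keeping the ranking separate from the prediction step makes it easy to change the number of suggestions later without touching the model code.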

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('movielens_ratings.csv')

In [3]:
df.describe().transpose()

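| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| movieId | 1501.0 | 49.40573 | 28.937034 | 0.0 | 24.0 | 50.0 | 74.0 | 99.0 |
| rating | 1501.0 | 1.774151 | 1.187276 | 1.0 | 1.0 | 1.0 | 2.0 | 5.0 |
| userId | 1501.0 | 14.383744 | 8.59104 | 0.0 | 7.0 | 14.0 | 22.0 | 29.0 |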


In [4]:
df.corr()

| | movieId | rating | userId |
|---|---|---|---|
| movieId | 1.0 | 0.036569 | 0.003267 |
| rating | 0.036569 | 1.0 | 0.056411 |
| userId | 0.003267 | 0.056411 | 1.0 |


In [5]:
import numpy as np
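# Cap the ID at 32 so every movieId above 31 shares one meal ID (33 IDs in total: 0-32)
df['mealskew'] = df['movieId'].apply(lambda movie_id: 32 if movie_id > 31 else movie_id)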
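
The cell above caps `movieId` at 32, so every ID from 32 through 99 lands in one bucket. A small standalone sketch of the same rule shows how much of the data such a cap can absorb. The helper `cap_id` and the evenly spread list of IDs are assumptions for illustration, not the notebook's data.

```python
def cap_id(movie_id, cap=32):
    """Map every ID above cap - 1 to cap; leave smaller IDs unchanged."""
    return cap if movie_id > cap - 1 else movie_id


# Illustrative IDs only: one of each value from 0 to 99
ids = list(range(100))
capped = [cap_id(i) for i in ids]

# Fraction of IDs that were collapsed into the single capped bucket
share_in_bucket = capped.count(32) / len(capped)
print(sorted(set(capped))[-3:], share_in_bucket)
```

In the notebook's own data the median of `mealskew` is 32 (see the `describe()` output that follows), so at least half of the ratings end up on the meal with ID 32.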

In [6]:
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| movieId | 1501.0 | 49.40573 | 28.937034 | 0.0 | 24.0 | 50.0 | 74.0 | 99.0 |
| rating | 1501.0 | 1.774151 | 1.187276 | 1.0 | 1.0 | 1.0 | 2.0 | 5.0 |
| userId | 1501.0 | 14.383744 | 8.59104 | 0.0 | 7.0 | 14.0 | 22.0 | 29.0 |
| mealskew | 1501.0 | 26.658228 | 9.343578 | 0.0 | 24.0 | 32.0 | 32.0 | 32.0 |


In [7]:
# Meal names keyed by integer meal ID (0-32); ID 32 also covers every movieId above 31
mealmap = {
    0: "Cheese Pizza",
    1: "Burrito",
    2: "Chicken Curry",
    3: "Spicy Chicken Nuggest",
    4: "Pretzels and Cheese Plate",
    5: "Hamburger",
    6: "Spicy Pork Sliders",
    7: "Nachos",
    8: "Chicken Chow Mein",
    9: "Taco Surprise",
    10: "Roasted Eggplant",
    11: "Meatloaf",
    12: "Ceaser Salad",
    13: "Mandarin Chicken PLate",
    14: "Kung Pao Chicken",
    15: "BBQ Ribs",
    16: "Fried Rice Plate",
    17: "Sushi Plate",
    18: "Pepperoni Pizza",
    19: "Cheesesteak Sandwhich",
    20: "Southwest Salad",
    21: "Lasagna",
    22: "Pulled Pork Plate",
    23: "Orange Chicken",
    24: "Chili",
    25: "Roast Beef Sandwich",
    26: "Spicy Beef Plate",
    27: "Salmon with Mashed Potatoes",
    28: "Penne Tomatoe Pasta",
    29: "Pork Sliders",
    30: "Vietnamese Sandwich",
    31: "Chicken Wrap",
    32: "Cowboy Burger",
}

In [8]:
df['meal_name'] = df['mealskew'].map(mealmap)

In [9]:
df.to_csv('Meal_Info.csv',index=False)

In [10]:
import findspark
findspark.init()
from pyspark.sql import SparkSession

In [11]:
spark = SparkSession.builder.appName('recconsulting').getOrCreate()

In [12]:
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

In [13]:
data = spark.read.csv('Meal_Info.csv',inferSchema=True,header=True)

In [14]:
data.head()

Row(movieId=2, rating=3.0, userId=0, mealskew=2, meal_name='Chicken Curry')

In [15]:
data.describe().show()

+-------+------------------+------------------+------------------+------------------+-------------------+
|summary|           movieId|            rating|            userId|          mealskew|          meal_name|
+-------+------------------+------------------+------------------+------------------+-------------------+
|  count|              1501|              1501|              1501|              1501|               1501|
|   mean| 49.40572951365756|1.7741505662891406|14.383744170552964|26.658227848101266|               null|
| stddev|28.937034065088994| 1.187276166124803| 8.591040424293272| 9.343577861685747|               null|
|    min|                 0|               1.0|                 0|                 0|           BBQ Ribs|
|    max|                99|               5.0|                29|                32|Vietnamese Sandwich|
+-------+------------------+------------------+------------------+------------------+-------------------+



In [16]:
(training, test) = data.randomSplit([0.8, 0.2])

In [17]:
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="mealskew", ratingCol="rating")

In [18]:
model = als.fit(training)
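
The fitted model is scored next with `RegressionEvaluator(metricName="rmse")`. As a plain-Python sketch of what that metric measures, root-mean-square error is the square root of the mean squared difference between actual and predicted ratings. The helper `rmse` and the sample numbers are hypothetical, and this is not Spark's implementation.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each error so large misses weigh more, then average and take the root
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings and predictions, for illustration only
actual = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 5.0]
print(rmse(actual, predicted))
```

One caveat worth checking in your Spark version: with a random split, a user or meal in the test set may never appear in training, in which case ALS predicts NaN by default and the RMSE comes out as NaN. Spark 2.2 and later accept `coldStartStrategy="drop"` on `ALS` to exclude such rows from the predictions.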

In [19]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)

predictions.show()

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

+-------+------+------+--------+--------------------+------------+
|movieId|rating|userId|mealskew|           meal_name|  prediction|
+-------+------+------+--------+--------------------+------------+
|     31|   1.0|     5|      31|        Chicken Wrap|   1.8392534|
|     31|   1.0|     4|      31|        Chicken Wrap|   2.4786062|
|     28|   1.0|     5|      28| Penne Tomatoe Pasta|    1.050807|
|     28|   2.0|    15|      28| Penne Tomatoe Pasta|-0.024453297|
|     26|   1.0|    22|      26|    Spicy Beef Plate|    2.404933|
|     26|   1.0|    13|      26|    Spicy Beef Plate|   0.4080942|
|     26|   1.0|     7|      26|    Spicy Beef Plate|   2.0481837|
|     26|   2.0|    25|      26|    Spicy Beef Plate| -0.10954741|
|     26|   1.0|    18|      26|    Spicy Beef Plate| -0.42588186|
|     27|   3.0|    27|      27|Salmon with Mashe...|   3.6852567|
|     27|   1.0|     1|      27|Salmon with Mashe...| -0.23731628|
|     27|   1.0|    13|      27|Salmon with Mashe...|   2.1944