# Meal Recommendations Based on Ratings

The goal is to recommend new meals to the customers of a food delivery company, based on how other customers have rated those meals.

The final result should be a function that takes a Spark DataFrame of a single customer's ratings for various meals and outputs that customer's top 3 suggested meals.

### Creating a Spark Session

In [1]:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("meals").getOrCreate()

Intitializing Scala interpreter ...

Spark Web UI available at http://Varun-CK:4040
SparkContext available as 'sc' (version = 2.3.0, master = local[*], app id = local-1577814168238)
SparkSession available as 'spark'


2019-12-31 23:12:44 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-12-31 23:13:03 WARN  SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.


import org.apache.spark.sql.SparkSession
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@20766ee


### Initializing Logger

In [2]:
import org.apache.log4j._
Logger.getLogger("org").setLevel(Level.ERROR)

import org.apache.log4j._


### Using Spark to read the meal ratings data set

In [3]:
val data = spark.read.options(Map(("header","true"),("inferSchema","true"))).csv("Meal_Info.csv")

data: org.apache.spark.sql.DataFrame = [movieId: int, rating: double ... 3 more fields]
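
Schema inference needs an extra pass over the file and may guess a different type if the data changes. As a minimal sketch, assuming the CSV columns appear in the order `movieId`, `rating`, `userId`, `mealskew`, `meal_name`, the same types could be declared explicitly. The names `mealSchema` and `readMeals` are illustrative and not part of the original notebook.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Assumed column order and types, matching what inferSchema reports for this file.
val mealSchema = StructType(Seq(
  StructField("movieId", IntegerType, nullable = true),
  StructField("rating", DoubleType, nullable = true),
  StructField("userId", IntegerType, nullable = true),
  StructField("mealskew", DoubleType, nullable = true),
  StructField("meal_name", StringType, nullable = true)
))

// Hypothetical helper: reads the CSV with the declared schema instead of inferSchema.
def readMeals(spark: SparkSession, path: String): DataFrame =
  spark.read.option("header", "true").schema(mealSchema).csv(path)
```

In this notebook it could be called as `readMeals(spark, "Meal_Info.csv")`.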


### Printing the first few rows of the dataframe

In [4]:
data.show(5)

+-------+------+------+--------+--------------------+
|movieId|rating|userId|mealskew|           meal_name|
+-------+------+------+--------+--------------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|      5|   2.0|     0|     5.0|           Hamburger|
|      9|   4.0|     0|     9.0|       Taco Surprise|
|     11|   1.0|     0|    11.0|            Meatloaf|
+-------+------+------+--------+--------------------+
only showing top 5 rows



### Describe

In [5]:
data.describe().show()

+-------+------------------+------------------+------------------+------------------+-------------------+
|summary|           movieId|            rating|            userId|          mealskew|          meal_name|
+-------+------------------+------------------+------------------+------------------+-------------------+
|  count|              1501|              1501|              1501|               486|               1501|
|   mean| 49.40572951365756|1.7741505662891406|14.383744170552964|15.502057613168724|               null|
| stddev|28.937034065088994| 1.187276166124803| 8.591040424293272| 9.250633630277568|               null|
|    min|                 0|               1.0|                 0|               0.0|           BBQ Ribs|
|    max|                99|               5.0|                29|              31.0|Vietnamese Sandwich|
+-------+------------------+------------------+------------------+------------------+-------------------+



### Count

In [6]:
data.count()

res3: Long = 1501


### Schema

In [7]:
data.printSchema()

root
 |-- movieId: integer (nullable = true)
 |-- rating: double (nullable = true)
 |-- userId: integer (nullable = true)
 |-- mealskew: double (nullable = true)
 |-- meal_name: string (nullable = true)



### Count after dropping rows with null values

In [9]:
data.na.drop().count()

res6: Long = 486
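
The drop from 1501 to 486 rows is explained by the `mealskew` column, the only column whose `describe()` count is below 1501. Before dropping rows it can help to confirm this per column. The sketch below assumes a hypothetical helper named `nullCounts`, which is not part of the original notebook.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, when}

// Hypothetical helper: returns a single row with one column per input column,
// each holding the number of null values in that column.
// `when` without `otherwise` yields null for non-matching rows, and `count`
// skips nulls, so only the null cells are counted.
def nullCounts(df: DataFrame): DataFrame =
  df.select(df.columns.map(c => count(when(col(c).isNull, 1)).alias(c)): _*)
```

Calling `nullCounts(data).show()` would print the null count of every column in one row.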


### Dropping the null values

In [10]:
val filtered_data = data.na.drop()

filtered_data: org.apache.spark.sql.DataFrame = [movieId: int, rating: double ... 3 more fields]


In [11]:
filtered_data.show(5)

+-------+------+------+--------+--------------------+
|movieId|rating|userId|mealskew|           meal_name|
+-------+------+------+--------+--------------------+
|      2|   3.0|     0|     2.0|       Chicken Curry|
|      3|   1.0|     0|     3.0|Spicy Chicken Nug...|
|      5|   2.0|     0|     5.0|           Hamburger|
|      9|   4.0|     0|     9.0|       Taco Surprise|
|     11|   1.0|     0|    11.0|            Meatloaf|
+-------+------+------+--------+--------------------+
only showing top 5 rows



### Since the dataset is small, we use an 80%/20% train-test split

In [18]:
val Array(train_data,test_data) = filtered_data.randomSplit(Array(0.8,0.2))

train_data: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [movieId: int, rating: double ... 3 more fields]
test_data: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [movieId: int, rating: double ... 3 more fields]
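
`randomSplit` draws a different partition on each run unless a seed is supplied, so the split sizes and the RMSE reported later will change when the notebook is rerun. A minimal sketch of a seeded split follows. The helper name `splitRatings` and the seed value 42 are arbitrary choices, not taken from the original.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: 80/20 split with a fixed seed, so that rerunning it
// on the same input should produce the same two subsets.
def splitRatings(df: DataFrame, seed: Long = 42L): (DataFrame, DataFrame) = {
  val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed)
  (train, test)
}
```

Usage in this notebook would be `val (train_data, test_data) = splitRatings(filtered_data)`.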


In [20]:
filtered_data.describe().show()

+-------+------------------+------------------+-----------------+------------------+-------------------+
|summary|           movieId|            rating|           userId|          mealskew|          meal_name|
+-------+------------------+------------------+-----------------+------------------+-------------------+
|  count|               486|               486|              486|               486|                486|
|   mean|15.502057613168724|1.7366255144032923|14.46707818930041|15.502057613168724|               null|
| stddev| 9.250633630277568|1.1808507031723887| 8.56063554474528| 9.250633630277568|               null|
|    min|                 0|               1.0|                0|               0.0|           BBQ Ribs|
|    max|                31|               5.0|               29|              31.0|Vietnamese Sandwich|
+-------+------------------+------------------+-----------------+------------------+-------------------+



In [21]:
train_data.describe().show()

+-------+------------------+------------------+------------------+------------------+-------------------+
|summary|           movieId|            rating|            userId|          mealskew|          meal_name|
+-------+------------------+------------------+------------------+------------------+-------------------+
|  count|               399|               399|               399|               399|                399|
|   mean|15.401002506265664| 1.744360902255639|14.431077694235588|15.401002506265664|               null|
| stddev| 9.254068723387551|1.1861055017923565| 8.363274381687813| 9.254068723387551|               null|
|    min|                 0|               1.0|                 0|               0.0|           BBQ Ribs|
|    max|                31|               5.0|                29|              31.0|Vietnamese Sandwich|
+-------+------------------+------------------+------------------+------------------+-------------------+



In [22]:
test_data.describe().show()

+-------+-----------------+------------------+------------------+-----------------+-------------------+
|summary|          movieId|            rating|            userId|         mealskew|          meal_name|
+-------+-----------------+------------------+------------------+-----------------+-------------------+
|  count|               87|                87|                87|               87|                 87|
|   mean|15.96551724137931|1.7011494252873562|14.632183908045977|15.96551724137931|               null|
| stddev|9.274180557874002|1.1625447481133553| 9.463657465110797|9.274180557874002|               null|
|    min|                0|               1.0|                 0|              0.0|           BBQ Ribs|
|    max|               31|               5.0|                29|             31.0|Vietnamese Sandwich|
+-------+-----------------+------------------+------------------+-----------------+-------------------+



### Setting up the recommender system

In [23]:
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS

import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS


#### Build the recommendation model using ALS on the training data

In [24]:
val als = new ALS().setUserCol("userId").setItemCol("mealskew").setRatingCol("rating").setMaxIter(5).setRegParam(0.01)

als: org.apache.spark.ml.recommendation.ALS = als_9fa1c17eed52


In [25]:
val model = als.fit(train_data)

2019-12-31 23:20:40 WARN  BLAS:61 - Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
2019-12-31 23:20:40 WARN  BLAS:61 - Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
2019-12-31 23:20:41 WARN  LAPACK:61 - Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
2019-12-31 23:20:41 WARN  LAPACK:61 - Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK


model: org.apache.spark.ml.recommendation.ALSModel = als_9fa1c17eed52
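
With a random split, a user or meal in the test set may be absent from the training set, in which case ALS returns `NaN` for that row and the RMSE becomes `NaN` as well. That did not happen in this run, but it can on another split. The sketch below keeps the same columns and hyperparameters and adds `setColdStartStrategy("drop")` (available since Spark 2.2) plus a fixed seed. The helper name `buildAls` and the seed value are illustrative assumptions.

```scala
import org.apache.spark.ml.recommendation.ALS

// Hypothetical helper: the notebook's ALS configuration with two additions.
def buildAls(seed: Long = 42L): ALS = {
  new ALS()
    .setUserCol("userId")
    .setItemCol("mealskew")
    .setRatingCol("rating")
    .setMaxIter(5)
    .setRegParam(0.01)
    .setColdStartStrategy("drop") // drop rows whose prediction would be NaN
    .setSeed(seed)                // fixed seed for the random factor initialization
}
```

The model would then be fitted with `buildAls().fit(train_data)`.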


#### Now let's see how the model performed!

In [26]:
// Get the predictions on the test data
val predictions = model.transform(test_data)

predictions: org.apache.spark.sql.DataFrame = [movieId: int, rating: double ... 4 more fields]


In [27]:
predictions.show(5)

+-------+------+------+--------+------------+----------+
|movieId|rating|userId|mealskew|   meal_name|prediction|
+-------+------+------+--------+------------+----------+
|     31|   1.0|    27|    31.0|Chicken Wrap| 1.5365268|
|     31|   4.0|    12|    31.0|Chicken Wrap|-1.5856699|
|     31|   1.0|     5|    31.0|Chicken Wrap| 3.2605164|
|     31|   3.0|     8|    31.0|Chicken Wrap| 1.8514191|
|     31|   1.0|    24|    31.0|Chicken Wrap| 2.6180766|
+-------+------+------+--------+------------+----------+
only showing top 5 rows
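
Some of these predictions fall outside the 1 to 5 rating scale (for example -1.5856699 for a meal rated 4.0), because ALS does not bound its output. One common post-processing step, shown here only as a sketch and not something the original notebook does, is to clip predictions to the valid range before evaluating or ranking them. The helper name `clipPredictions` and its default bounds are assumptions.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, greatest, isnan, least, lit}

// Hypothetical helper: clips the `prediction` column into [lo, hi].
// NaN predictions are removed first so that they are not turned into a
// valid-looking rating by the clipping step.
def clipPredictions(df: DataFrame, lo: Double = 1.0, hi: Double = 5.0): DataFrame = {
  df.filter(!isnan(col("prediction")))
    .withColumn("prediction", least(greatest(col("prediction"), lit(lo)), lit(hi)))
}
```

For instance, `clipPredictions(predictions)` could be passed to the evaluator instead of `predictions`.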



### Evaluating the model by computing the RMSE on the test data

In [28]:
// Setting evaluator to evaluate Root Mean Squared Error
val evaluator = new RegressionEvaluator().setLabelCol("rating").setPredictionCol("prediction").setMetricName("rmse")

evaluator: org.apache.spark.ml.evaluation.RegressionEvaluator = regEval_e9472da42d92


In [29]:
val rmse = evaluator.evaluate(predictions)

rmse: Double = 1.9823851621928843


In [30]:
println(f"Root Mean Squared Error: ${rmse}%1.2f")

Root Mean Squared Error: 1.98


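**The RMSE of 1.98 is expressed in the same units as the rating column (a 1 to 5 scale), so a typical prediction is off by about two rating points. That is higher than the standard deviation of the test ratings (about 1.16), which means that on this split the model does worse than simply predicting the mean rating for every meal.**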
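
For reference, RMSE is the square root of the mean squared difference between actual and predicted ratings. The plain-Scala sketch below (no Spark required; the names `rmse` and `shownRows` are illustrative) applies that definition to the five prediction rows displayed earlier. Because it covers only those five rows, its result differs from the 1.98 computed over the whole test set.

```scala
// Root mean squared error over (actual, predicted) pairs.
def rmse(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "need at least one (actual, predicted) pair")
  val meanSquaredError = pairs.map { case (actual, predicted) =>
    val diff = actual - predicted
    diff * diff
  }.sum / pairs.size
  math.sqrt(meanSquaredError)
}

// (rating, prediction) pairs copied from the first five rows of `predictions`.
val shownRows = Seq(
  (1.0, 1.5365268),
  (4.0, -1.5856699),
  (1.0, 3.2605164),
  (3.0, 1.8514191),
  (1.0, 2.6180766)
)

println(f"RMSE over the five displayed rows: ${rmse(shownRows)}%1.2f")
```

The second row dominates the result: a prediction of -1.59 against a rating of 4.0 contributes a squared error of about 31.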

### Recommendation for a particular user

In [31]:
var single_user = test_data.select("userId","mealskew")

single_user: org.apache.spark.sql.DataFrame = [userId: int, mealskew: double]


In [32]:
single_user.groupBy("userId").count().show()

+------+-----+
|userId|count|
+------+-----+
|    28|    4|
|    27|    4|
|    26|    3|
|    12|    3|
|    22|    2|
|     1|    6|
|    13|    2|
|    16|    1|
|     3|    2|
|    20|    2|
|     5|    4|
|    19|    4|
|    15|    1|
|     9|    3|
|    17|    1|
|     4|    2|
|     8|    2|
|     7|    4|
|    10|    6|
|    25|    3|
+------+-----+
only showing top 20 rows



In [33]:
single_user = single_user.filter("userId == 26")

single_user: org.apache.spark.sql.DataFrame = [userId: int, mealskew: double]


In [34]:
val recommendations = model.transform(single_user)

recommendations: org.apache.spark.sql.DataFrame = [userId: int, mealskew: double ... 1 more field]


In [35]:
recommendations.orderBy(recommendations("prediction").desc).show()

+------+--------+----------+
|userId|mealskew|prediction|
+------+--------+----------+
|    26|     1.0| 1.8736483|
|    26|    27.0| 1.8605978|
|    26|    23.0|0.77597076|
+------+--------+----------+
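
The cells above score only the three meals that user 26 had already rated in the test set, whereas the stated goal is a function that returns a customer's top 3 suggested meals. One possible shape for that function is sketched below. It is an assumption about how the pieces could fit together, not the author's implementation: the name `topMeals` and its parameters are hypothetical, and it presumes the `userId`, `mealskew`, and `meal_name` columns used throughout this notebook.

```scala
import org.apache.spark.ml.recommendation.ALSModel
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: scores every meal the customer has not rated yet
// and returns the n meals with the highest predicted rating.
//   userRatings: the ratings of a single customer
//   allMeals:    any DataFrame listing the known meals
def topMeals(model: ALSModel, userRatings: DataFrame, allMeals: DataFrame, n: Int = 3): DataFrame = {
  // Meals the customer has not rated (anti-join removes the rated ones).
  val unseen = allMeals.select("mealskew", "meal_name").distinct()
    .join(userRatings.select("mealskew"), Seq("mealskew"), "left_anti")
  // Pair each unseen meal with the customer's id so the model can score it.
  val candidates = userRatings.select("userId").distinct().crossJoin(unseen)
  model.transform(candidates)
    .na.drop(Seq("prediction")) // meals unknown to the model get NaN; skip them
    .orderBy(col("prediction").desc)
    .limit(n)
}
```

For example, `topMeals(model, filtered_data.filter("userId == 26"), filtered_data).show()` would list up to three unrated meals for user 26. It has to run before the Spark session is stopped.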



### Closing the Spark session

In [36]:
spark.stop()

## Thank You!