# Template for Model Evaluation

Template written by Nisa Ulumuddin

Evaluation done by ...

In [1]:
import os
import sys
from collections import Counter

import advertools as adv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.metrics as met
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from num2words import num2words
from scipy.spatial.distance import euclidean
from sklearn.feature_extraction.text import CountVectorizer

import func_similarityfunc as sim
import func_tfidf as tfidf
import func_datapreprocessing as pp
import func_predict as pred 
import func_eval as eve

In [2]:
rd_recipe = pd.read_csv('clean_recipe_data.csv')   # full cleaned recipe dataset
eval_recipe = pd.read_csv('model_dataset.csv')     # hand-labeled model dataset for precision/recall analysis

## Evaluation Functions 

To evaluate the models, we have created two evaluation functions.

**************

### 1. Effectivity Function
The first evaluation function is called the **Effectivity Function**. For each recipe recommended by the model, it calculates the percentage of the queried ingredients that actually appear in that recipe's ingredients.

The syntax to use this function is shown below:

`percent_table = eve.effectivity_eval(Q, query,raw_recipe)`

where:
* `Q` is the list of recommended recipe indices (into `rd_recipe`) output by the model
* `query` is the user query string
* `raw_recipe` is the recipe dataframe (e.g. `rd_recipe`)

and `percent_table` is a dataframe that shows, for each recommended recipe, the fraction of query ingredients that actually appear in that recipe (1.0 means all of them do).
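The idea behind the Effectivity Function can be sketched as follows. This is a hypothetical re-implementation for illustration, not the code in func_eval.py: it assumes the query is a comma-separated string of single-word ingredients and that each recipe's ingredients are already tokenized (as in the `ingredient` column of the output later in this notebook). The function name `effectivity_fraction` is made up.

```python
def effectivity_fraction(query, ingredient_tokens):
    """Hypothetical: fraction of comma-separated query ingredients found in a recipe's ingredient tokens."""
    # "chicken, lemon" -> ["chicken", "lemon"], lower-cased and stripped
    terms = [t.strip().lower() for t in query.split(",") if t.strip()]
    if not terms:
        return 0.0
    tokens = {tok.lower() for tok in ingredient_tokens}
    # Count how many query terms occur among the recipe's ingredient tokens
    hits = sum(1 for t in terms if t in tokens)
    return hits / len(terms)


recipe = ["skinless", "boneless", "chicken", "breast", "halves", "lemon", "juice"]
print(effectivity_fraction("chicken, lemon", recipe))  # 1.0
print(effectivity_fraction("chicken, cream", recipe))  # 0.5
```

Exact token matching misses plural or stemmed forms (e.g. `breasts` vs `breast`); the real function may handle this differently.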

***********

### 2. Precision/Recall Analysis Based on Model Dataset 

Unfortunately, it was very difficult to obtain datasets that connect a query of ingredients to the click rate of recipes. Ideally, our team would conduct a survey of sufficient scale to gather this data. However, due to time limitations, we were advised to generate a model dataset that is connected to a set of model queries.

The dataset is loaded into the variable `eval_recipe` and consists of 57 data points. For each model query, every recipe was labeled either 1 (relevant) or 0 (irrelevant) to indicate whether it is relevant to that query. We can then run statistical analyses on these labels (e.g. precision/recall analysis).


#### Query List:

* Query 1:  beef, salt, pepper   
* Query 2: chicken, cream      
* Query 3: noodles, chicken   
* Query 4: beef, potatoes      

#### Score metric:
* 1 = fulfils query
* 0 = does not fulfil query

Here, we preview the first rows of `eval_recipe`, i.e. the model dataset:

In [3]:
eval_recipe.head(3)

|   | index | recipe_id | recipe_name | user_id | rating | Query 1 | Query 2 | Query 3 | Query 4 | ingredients | cooking_directions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24 | 56648 | Gravy potatoes |  | 3 | 0 | 0 | 0 | 1 | baking potatoes ground beef small onion conden... | {'directions': u'Prep\n5 m\nCook\n1 h 15 m\nRe... |
| 1 | 25 | 74224 | meat loaf pie |  | 3 | 0 | 0 | 0 | 1 | ground beef small onion egg ketchup worcesters... | {'directions': u'Prep\n15 m\nCook\n40 m\nReady... |
| 2 | 26 | 70589 | Serbian ground beef veggie potato bake |  | 3 | 1 | 0 | 0 | 1 | ground beef olive oil green bell pepper onion ... | {'directions': u'Prep\n25 m\nCook\n1 h\nReady ... |


As you can see, each recipe has a score of 1 or 0 in each of the columns Query 1 to Query 4, depending on whether it is relevant to that query.
It is important to note that these scores were assigned subjectively, based on the judgment of a single DS member. We therefore advise taking the results of this evaluation with a grain of salt.

In this evaluation method, the recommendation model vectorizes the recipes in the model dataset and matches each model query against them. The evaluation function is called as follows:

`[compiled_relevancy_score, rec_df] = eve.PrecRec_eval(model_df, threshold, prediction_model)`

where:
* `model_df` is the model dataset, i.e. `eval_recipe`
* `threshold` is the number of top-ranked recipes to evaluate in the precision/recall analysis
* `prediction_model` is the name of the prediction model as a string (e.g. `"cosine_similarity"`); the available prediction models are defined in **func_predict.py**

An example of the usage of this function is:

`[compiled_relevancy_score, rec_df] = eve.PrecRec_eval(eval_recipe, 7, "cosine_similarity")`

`compiled_relevancy_score` is a list of four arrays of 1s and 0s, one per query, e.g.
 
 `[array([0, 0, 1, 0, 0, 0, 1], dtype=int64),
 array([1, 1, 1, 0, 1, 1, 0], dtype=int64),
 array([1, 0, 1, 0, 0, 0, 0], dtype=int64),
 array([1, 1, 0, 1, 0, 0, 1], dtype=int64)]`
 
Each array indicates whether each of the top *n* recommended recipes (*n* = `threshold`) is relevant (1) or irrelevant (0) to the corresponding query. The first array holds the results for Query 1, the second for Query 2, and so on.

For example, `array([1, 1, 0, 1, 0, 0, 1], dtype=int64)` shows that the 1st, 2nd, 4th and 7th recommended recipes are relevant, while the rest are irrelevant.

`rec_df` is a table listing, for each query, the names of the recommended recipes and the similarity scores computed by the model.
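The core lookup that produces each relevancy array can be sketched as below. This is a hypothetical illustration, not the code in func_eval.py: it assumes one similarity score per recipe in the model dataset plus the matching 0/1 label column for a query (e.g. `Query 1`), ranks recipes by score and returns the labels of the top `k`. The function name and the `higher_is_better` flag are made up; the flag covers distance-based models, where smaller is better.

```python
def relevancy_at_k(scores, labels, k, higher_is_better=True):
    """Hypothetical: 0/1 relevance labels of the k best-ranked recipes, best first."""
    # Rank recipe positions by score; for distances, pass higher_is_better=False
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    return [labels[i] for i in order[:k]]


scores = [0.44, 0.10, 0.41, 0.30, 0.05]   # made-up similarity scores for 5 recipes
labels = [1, 0, 1, 0, 1]                  # made-up 0/1 labels for one query column
print(relevancy_at_k(scores, labels, 3))  # [1, 1, 0]
```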


In our model dataset, only **7 recipes** are relevant to each query. Therefore, we evaluate at most the top 7 recommended recipes (i.e. `threshold` is at most 7).
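One common way to summarise each relevancy array is precision@k (relevant recipes in the top k, divided by k) and recall@k (relevant recipes in the top k, divided by all relevant recipes for that query, i.e. 7 in the model dataset). A minimal sketch in plain Python; the function name is hypothetical and it is not part of func_eval.py:

```python
def precision_recall_at_k(relevancy, n_relevant):
    """Precision@k and recall@k for one query, given its top-k 0/1 relevancy list."""
    k = len(relevancy)
    hits = sum(relevancy)  # relevant recipes among the top k
    precision = hits / k if k else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


# The example array from above, with 7 relevant recipes per query
p, r = precision_recall_at_k([1, 1, 0, 1, 0, 0, 1], n_relevant=7)
print(p, r)  # 0.5714285714285714 0.5714285714285714
```

With k = 7 and exactly 7 relevant recipes per query, precision@7 and recall@7 coincide.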

***********************************

# Evaluation of Model 1
Details of Model 1: 
- The model vectorizes the 'cooking directions' text to represent the vector of the recipe
- The cosine similarity function in func_predict.py (`pred.cosine_similarity`) is used to rank the recipes

In [4]:
processed_text = []    #processed text of the whole dataset
for text in rd_recipe['cooking_directions']:
    processed_text.append(word_tokenize(pp.preprocess(text)))

In [5]:
[D, DF, N] = tfidf.vectorize_corpus(processed_text)
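`tfidf.vectorize_corpus` returns three objects. Judging by the names only, `D` likely holds per-document term weights, `DF` the document frequency of each term and `N` the number of documents; this is an assumption, since func_tfidf.py is not shown here. The hypothetical sketch below builds objects with those roles using the common weighting tf × log(N / df):

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Hypothetical TF-IDF over tokenized documents (each a list of strings).

    Returns (vectors, df, n_docs), loosely mirroring the assumed roles of D, DF and N.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Weight = (term count / document length) * log(N / df)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors, df, n_docs


docs = [
    ["chicken", "lemon", "bake"],
    ["beef", "potato", "bake"],
    ["chicken", "noodle", "soup"],
]
vectors, df, n = tfidf_vectors(docs)
print(df["chicken"], n)               # 2 3
print(round(vectors[0]["lemon"], 4))  # 0.3662
```

Terms that appear in every document get weight 0, so very common words contribute nothing to the similarity.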

In [6]:
user_query = "chicken, lemon" # try a variety of queries yourself

In [7]:
[ result_index , result_score] = pred.cosine_similarity(7, user_query, D , DF, N)
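A minimal sketch of top-k cosine ranking, for illustration only: it assumes the query and recipes are already weighted vectors stored as `{term: weight}` dicts, and it does not reproduce how `pred.cosine_similarity` builds the query vector from `D`, `DF` and `N`. The helper names `cosine` and `top_k_cosine` are hypothetical; like the call above, the helper returns the indices and scores of the best matches.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_cosine(k, query_vec, doc_vecs):
    """Hypothetical: (indices, scores) of the k documents most similar to the query, best first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    best = scored[:k]
    return [i for i, _ in best], [s for _, s in best]


query = {"chicken": 1.0, "lemon": 1.0}
recipes = [{"chicken": 1.0, "lemon": 1.0}, {"beef": 1.0}, {"chicken": 1.0}]
idx, scores = top_k_cosine(2, query, recipes)
print(idx)  # [0, 2]
```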

The results are summarized below.

In [8]:
# Tabulate the top-7 results; similarity scores are multiplied by 100 (shown as percentages)
res_df = pd.DataFrame({
    'recipe_index': result_index,
    'similarity_score': np.asarray(result_score) * 100,
})
res_df

|   | recipe_index | similarity_score |
|---|---|---|
| 0 | 8608 | 53.026522 |
| 1 | 10741 | 52.885447 |
| 2 | 5545 | 52.307209 |
| 3 | 12051 | 51.746265 |
| 4 | 11254 | 50.056272 |
| 5 | 4057 | 48.897255 |
| 6 | 8303 | 48.633147 |


Applying the first evaluation function (Effectivity Function) to these results:

In [30]:
import importlib
# Reload func_eval so that edits made to the module after In [1] are picked up
eve = importlib.reload(eve)

In [27]:
eve.effectivity_eval( result_index, user_query,rd_recipe)

|   | id | ingredient | percent |
|---|---|---|---|
| 0 | 8608 | [olive, oil, fresh, lemon, juice, fresh, lemon... | 1.0 |
| 1 | 10741 | [lemon, juice, salt, rosemary, garlic, seasoni... | 1.0 |
| 2 | 5545 | [chicken, butter, large, lemon, halved, head, ... | 1.0 |
| 3 | 12051 | [skinless, boneless, chicken, breast, halves, ... | 1.0 |
| 4 | 11254 | [skinless, boneless, chicken, breasts, egg, le... | 1.0 |
| 5 | 4057 | [skinless, boneless, chicken, breast, halves, ... | 1.0 |
| 6 | 8303 | [cream, chicken, soup, water, lemon, juice, bu... | 1.0 |


Applying the second evaluation function (precision/recall analysis on the model dataset):

In [31]:
[compiled_relevancy_score, rec_df] = eve.PrecRec_eval(eval_recipe, 7, "cosine_similarity")
compiled_relevancy_score

Performing 4 queries on the model dataset...
Query 1 : beef, salt, pepper 
Query 2 : chicken, cream 
Query 3 : noodles, chicken 
Query 4 : beef, potatoes 


[array([0, 0, 1, 0, 0, 0, 1], dtype=int64),
 array([1, 1, 1, 0, 1, 1, 0], dtype=int64),
 array([1, 0, 1, 0, 0, 0, 0], dtype=int64),
 array([1, 0, 1, 1, 0, 0, 1], dtype=int64)]

In [32]:
rec_df

|   | Query1 | sim_score_1 | Query2 | sim_score_2 | Query3 | sim_score_3 | Query4 | sim_score_4 |
|---|---|---|---|---|---|---|---|---|
| 0 | meat loaf pie | 0.435458 | chicken la charlie | 0.545421 | spicy chicken noodle soup | 0.384131 | Gravy potatoes | 0.447061 |
| 1 | Nana's Beef Stroganoff | 0.39837 | Easy Cheesy Chicken I | 0.448053 | reuben casserole egg noodles | 0.321552 | better help hamburger | 0.436586 |
| 2 | Serbian ground beef veggie potato bake | 0.359882 | Easy Creamy Chicken Casserole | 0.408082 | white cheese chicken lasagna | 0.304852 | meat loaf pie | 0.435911 |
| 3 | Special Irish Beef Stew | 0.29227 | Easy Devonshire Cream | 0.35867 | Mike's Maple Chicken | 0.285885 | Serbian ground beef veggie potato bake | 0.406761 |
| 4 | Steff's Shepherd Pie | 0.281428 | Lizzy's Creamy Chicken Bake | 0.350126 | top ramen | 0.281275 | spiced potatoes | 0.404776 |
| 5 | better help hamburger | 0.196813 | Swiss Sherry Chicken | 0.328089 | Garlic Chicken, Vegetable and Rice Skillet | 0.262133 | Nana's Beef Stroganoff | 0.326788 |
| 6 | Savory Vegetable Beef Stew | 0.193243 | Mike's Maple Chicken | 0.295445 | Simple and Easy Coq au Vin | 0.262111 | Steff's Shepherd Pie | 0.313023 |


For the first analysis, we compute precision and recall scores for the 4 queries. We first need an array of predicted relevancy scores. Since every recipe in the model's top 7 is, by construction, a recipe the model recommends as a match, we set all predicted relevancy scores to 1.

In [11]:
y_pred = np.ones((4,7))
y_pred

array([[1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1.]])

In [18]:
np.shape(compiled_relevancy_score[0])

(7,)

In [25]:
precision = []
recall = []
for f in range(0,len(compiled_relevancy_score)):
    precision.append(met.precision_score(compiled_relevancy_score[f], y_pred[f], average='weighted', zero_division = 0))
    recall.append(met.recall_score(compiled_relevancy_score[f], y_pred[f], average='weighted' , zero_division = 0 ))

print("The precision score of the model is {}".format(precision))
print("The recall score of the model is {}".format(recall))

The precision score of the model is [0.08163265306122448, 0.5102040816326531, 0.08163265306122448, 0.32653061224489793]
The recall score of the model is [0.2857142857142857, 0.7142857142857143, 0.2857142857142857, 0.5714285714285714]
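These numbers follow directly from `y_pred` being all ones. Let $k$ be the number of relevant recipes among the top 7 for a query. With `average='weighted'`, scikit-learn averages the per-class scores weighted by each class's support in `y_true` ($k$ for class 1, $7-k$ for class 0). Class 0 is never predicted, so its precision is 0 under `zero_division=0` and its recall is 0:

$$
P_1 = \frac{k}{7}, \quad P_0 = 0, \qquad R_1 = 1, \quad R_0 = 0
$$

$$
P_{\text{weighted}} = \frac{k \cdot \frac{k}{7} + (7-k) \cdot 0}{7} = \left(\frac{k}{7}\right)^2, \qquad
R_{\text{weighted}} = \frac{k \cdot 1 + (7-k) \cdot 0}{7} = \frac{k}{7}
$$

For Query 1 ($k = 2$) this gives $4/49 \approx 0.0816$ and $2/7 \approx 0.2857$, matching the output above. The weighted recall $k/7$ is therefore the same quantity as precision@7 and, because each query has exactly 7 relevant recipes in the model dataset, also recall@7.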


I've started the analysis for you; now it's your turn to continue! You can use both evaluation functions or just one, whichever you are more comfortable with.

# Evaluation of Model 2

Details of Model 2: 
- The model vectorizes both the 'cooking directions' + 'ingredients' text to represent the vector of the recipe
- The cosine similarity function in func_predict.py (`pred.cosine_similarity`) is used to rank the recipes

Building the model:

In [None]:
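# Join with a space so the last word of the directions does not fuse with the first ingredient
rd_recipe['dir_and_ingredients'] = rd_recipe['cooking_directions'] + ' ' + rd_recipe['ingredients']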

In [None]:
processed_text_2 = []    #processed text of the whole dataset
for text in rd_recipe['dir_and_ingredients']:
    processed_text_2.append(word_tokenize(pp.preprocess(text)))

In [None]:
[D_2, DF_2, N_2] = tfidf.vectorize_corpus(processed_text_2)

In [None]:
[ result_index_2 , result_score_2] = pred.cosine_similarity(7, user_query, D_2 , DF_2, N_2)

In [None]:
result_index_2

Do an evaluation of Model 2!

# Evaluation of Model 3

Details of Model 3: 
- The model vectorizes the 'cooking directions' text to represent the vector of the recipe
- Euclidean distance (`pred.euclidean_similarity` in func_predict.py) is used to measure the distance between the query and recipe vectors

In [None]:
from scipy.spatial.distance import euclidean

In [None]:
[ result_index_3 , result_score_3] = pred.euclidean_similarity(7, user_query, D , DF, N)
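Model 3 ranks recipes by Euclidean distance instead of cosine similarity. Below is a hypothetical sketch of distance-based top-k ranking, assuming sparse `{term: weight}` vectors; it is not the code in func_predict.py, and whether `pred.euclidean_similarity` converts distances into similarity scores is not shown here. The ordering is reversed compared with cosine similarity: a smaller distance means a closer match. For dense arrays, the `scipy.spatial.distance.euclidean` function imported above computes the same distance.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two sparse vectors stored as {term: weight} dicts."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


def top_k_euclidean(k, query_vec, doc_vecs):
    """Hypothetical: (indices, distances) of the k documents closest to the query, closest first."""
    scored = [(i, euclidean_distance(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = better match
    best = scored[:k]
    return [i for i, _ in best], [dist for _, dist in best]


query = {"chicken": 1.0, "lemon": 1.0}
recipes = [{"beef": 1.0}, {"chicken": 1.0, "lemon": 1.0}, {"chicken": 1.0}]
idx, dists = top_k_euclidean(2, query, recipes)
print(idx, dists)  # [1, 2] [0.0, 1.0]
```

Unlike cosine similarity, Euclidean distance is sensitive to vector length, so a long recipe with many weighted terms can end up far from a short query even when it contains the queried ingredients. If all vectors are first normalised to unit length, the squared distance equals 2 - 2 × cosine similarity, and the Euclidean ranking becomes identical to the cosine ranking.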

In [None]:
result_index_3

Do an evaluation of Model 3!