# Do chefs have lucky food?
### Question: Is there an ingredient that brings chefs good luck (a win)?
I wondered whether a chef might have a favorite food (ingredient). Let's call it his or her lucky food.

In [1]:
import copy
import pickle
from collections import Counter, defaultdict

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from helper import *

Using TensorFlow backend.


# Step 1: Load & Prepare dataset

In [2]:
with open("data/_feature_match_data.pkl", "rb") as fp:
    match_data = pickle.load(fp)
    
with open("data/_nID_ingredients_dict.pkl", "rb") as fp:
    nID_ingredients_dict = pickle.load(fp)
    
lemm = pd.read_csv("data/ing.txt", encoding = 'utf-8', sep = ',')

In [3]:
ingredients = []
for k in list(nID_ingredients_dict.keys()):
    ing = nID_ingredients_dict.get(k)
    for i in ing:
        ingredients.append(i)

ing_freq = Counter(ingredients)

In [4]:
ing_df = pd.DataFrame.from_dict(ing_freq, orient = 'index').reset_index()
ing_df = ing_df.rename(columns = {'index': 'ingredient', 0:'cnt'})
ing_df = ing_df.sort_values(by="cnt", ascending = False).reset_index()

### Lemmatize ingredients
Borrowing a concept from NLP, I grouped similar ingredients together; call it [lemmatization](https://en.wikipedia.org/wiki/Lemmatisation) of ingredients. For example, I merged "minced garlic" into "garlic", and so on.

In [5]:
ing_df["lemm"] = ing_df["ingredient"]
for row in lemm.itertuples():
    row_idx = row.row_idx
    new_name = row.ingb
    ing_df.at[row_idx,"lemm"] = new_name
ing_df[:3]

| | index | ingredient | cnt | lemm |
|---|---|---|---|---|
| 0 | 229.0 | 소금 | 415.0 | 깨소금후추 |
| 1 | 447.0 | 양파 | 342.0 | 양파 |
| 2 | 706.0 | 후추 | 336.0 | 깨소금후추 |


Easy enough? Lemmatization could be more precise (meaning a lot more manual work), but I am satisfied with what I have so far.
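To make the idea concrete, here is a minimal sketch of ingredient lemmatization on toy data. The `lemma_map` dictionary and the English ingredient names are made up for illustration; in the notebook the mapping comes from the hand-made `data/ing.txt` file.

```python
from collections import Counter

# Hypothetical mapping from raw ingredient names to a canonical ("lemma") name.
# In the notebook this role is played by the hand-made file data/ing.txt.
lemma_map = {
    "minced garlic": "garlic",
    "garlic clove": "garlic",
    "red onion": "onion",
}

def lemmatize(ingredient):
    """Return the canonical name, or the ingredient itself if no rule applies."""
    return lemma_map.get(ingredient, ingredient)

raw = ["minced garlic", "garlic", "red onion", "onion", "garlic clove", "tofu"]

# Count ingredients after merging variants into their canonical name
lemma_counts = Counter(lemmatize(i) for i in raw)
print(lemma_counts.most_common())
```

Ingredients without a rule simply map to themselves, which is why a partial, hand-written mapping is already usable.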

In [6]:
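# Renumber the "index" column 0..n-1 (the frame is already sorted by count)
ing_df["index"] = np.arange(len(ing_df))
# Use bracket access: ing_df.index is the DataFrame's own index, not the "index" column
ing_lemm_dict = dict(zip(ing_df["ingredient"], ing_df["lemm"]))
ing_index_dict = dict(zip(ing_df["ingredient"], ing_df["index"]))
index_ing_dict = dict(zip(ing_df["index"], ing_df["ingredient"]))
ing_count_dict = dict(zip(ing_df["ingredient"], ing_df["cnt"]))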

In [7]:
lemm_ing_df = ing_df[["lemm", "cnt"]]
# Total count per lemmatized ingredient
lemm_ing_df = lemm_ing_df.groupby("lemm")["cnt"].sum().reset_index()
lemm_ing_df = lemm_ing_df.sort_values(by="cnt", ascending = False).reset_index(drop = True)
lemm_ing_df["lem_ing_id"] = np.arange(len(lemm_ing_df))

### TF-IDF on lucky food
In the spirit of TF-IDF, I am throwing away the most frequently occurring ingredients (salt, soy sauce, etc.), since they are too common to carry any significance. The cutoff is arbitrary: any lemmatized ingredient that occurs more than 100 times is dropped.

In [8]:
# Get rid of those that occur more than 100 times
mask = lemm_ing_df["cnt"] <= 100
key_ing_df = lemm_ing_df.loc[mask]
key_leming_lemid_dict = dict(zip(key_ing_df.lemm, key_ing_df.lem_ing_id))
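The hard cutoff above is a blunt stand-in for the "IDF" half of TF-IDF. As a sketch of what a softer weighting could look like (this is not what the notebook does), the toy example below computes an inverse document frequency per ingredient, treating each recipe as a document. The recipes and ingredient names are invented for illustration.

```python
import math

# Toy data: each recipe is the set of (lemmatized) ingredients it uses.
recipes = [
    {"salt", "garlic", "tofu"},
    {"salt", "onion"},
    {"salt", "garlic", "chives"},
    {"salt", "canned ham"},
]

def idf(ingredient, recipes):
    """Inverse document frequency: log(N / number of recipes containing it)."""
    df = sum(1 for r in recipes if ingredient in r)
    return math.log(len(recipes) / df)

vocab = sorted(set().union(*recipes))
weights = {ing: idf(ing, recipes) for ing in vocab}

# Ingredients used everywhere get weight 0; rare ones get the largest weight
for ing, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{ing}: {w:.3f}")
```

An ingredient that appears in every recipe gets a weight of zero, which has the same effect as dropping it, while moderately common ingredients are down-weighted instead of removed.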

In [9]:
# Make new ingredients dictionary
nID_ingredients_new_dict = defaultdict(list)

for k, ing in nID_ingredients_dict.items():
    for i in ing:
        # See if each ingredient can be lemmatized
        lem_ing = ing_lemm_dict.get(i)
        # Keep only lemmatized ingredients that survived the frequency cutoff
        if lem_ing is not None and lem_ing in key_leming_lemid_dict:
            nID_ingredients_new_dict[k].append(lem_ing)

---------------------------------------
# Step 2: Predict
### Can ingredients be used to predict a win?

In [10]:
def lucky_ingredients_model(chef_name, total_df, nID_ingredients_new_dict):
    # Prepare X, Y
    updated_matrix, matrixID_ing_dict, X, Y = prepare_XY(chef_name, total_df,nID_ingredients_new_dict)
    
    # Train
    accuracy = train_model(X,Y)
    
    return accuracy, updated_matrix, matrixID_ing_dict
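`prepare_XY` and `train_model` live in `helper` and are not shown here. As a rough idea of the kind of input such a model needs, the sketch below builds a multi-hot ingredient matrix and a win/loss label vector from toy data. The function `build_multi_hot` and the data are hypothetical and are not the notebook's actual helper.

```python
import numpy as np

def build_multi_hot(match_ingredients, outcomes):
    """Turn per-match ingredient lists into a 0/1 matrix X and a label vector Y.

    match_ingredients: dict of match ID -> list of ingredient names
    outcomes:          dict of match ID -> 1 for a win, 0 for a loss
    """
    vocab = sorted({ing for ings in match_ingredients.values() for ing in ings})
    col = {ing: j for j, ing in enumerate(vocab)}
    match_ids = sorted(match_ingredients)

    X = np.zeros((len(match_ids), len(vocab)), dtype=np.float32)
    for i, m in enumerate(match_ids):
        for ing in match_ingredients[m]:
            X[i, col[ing]] = 1.0  # mark the ingredient as used in this match
    Y = np.array([outcomes[m] for m in match_ids], dtype=np.int64)
    return X, Y, vocab

# Made-up matches for one chef
match_ingredients = {1: ["chives", "tofu"], 2: ["canned ham"], 3: ["chives", "curry"]}
outcomes = {1: 1, 2: 0, 3: 1}

X, Y, vocab = build_multi_hot(match_ingredients, outcomes)
print(vocab)
print(X)
print(Y)
```

Each row is one match and each column one ingredient, so a classifier trained on `X` and `Y` can only learn from which ingredients were present, not from quantities or cooking method.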

In [11]:
accuracy, updated_matrix, matrixID_ing_dict = lucky_ingredients_model("김풍", match_data, nID_ingredients_new_dict)

In [12]:
accuracy

0.7142857142857143

### It looks like 김풍 has lucky food: from ingredients alone, the model predicts whether 김풍 wins about 71% of the time.
Now, this is probably wrong, because my test set is far too small. As evidence, if you run this several times, you get an unstable series of accuracies. However, for 김풍 the result was always above 50%, whereas other chefs' values varied from 20% to the high 60s. Coincidence? I don't know (I am not joking; I really don't know).
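To put a rough number on "way too small", the sketch below computes an approximate 95% Wilson score interval for an accuracy measured on a handful of test matches. The test-set size is an assumption: 0.714... equals 5/7, so the example uses 5 correct out of 7, but the notebook does not state the actual split.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion correct/n."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumption: 5 correct predictions out of 7 test matches (5/7 = 0.714...)
low, high = wilson_interval(5, 7)
print(f"{low:.2f} to {high:.2f}")
```

With so few test matches the interval is very wide and comfortably includes 50%, which is consistent with the run-to-run instability described above.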
#### -- It does not mean 김풍 only wins with 통조림 (canned food), 과자 (snacks), and 인스턴트 재료 (instant ingredients), as he is portrayed in the show
It probably has to do with the fact that he loses a lot. His win rate is in the thirties (percent), so a model that always predicts a loss would already be right well over half the time. Plus, as I will show in a later section, not all of his lucky foods are instant ingredients (although a good portion are).
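A quick way to see why a low win rate inflates accuracy is to compute the majority-class baseline: the accuracy of a "model" that always predicts the more common outcome. The record below (7 wins, 13 losses) is made up to stand in for a win rate in the thirties.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Made-up record: 7 wins (1) and 13 losses (0), a 35% win rate
labels = [1] * 7 + [0] * 13
baseline = majority_baseline_accuracy(labels)
print(f"Always predicting a loss is right {baseline:.0%} of the time")
```

Any accuracy reported for a chef is only interesting to the extent that it beats this baseline for that chef.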
#### -- One more time (because it is important): this does not mean that the model is accurate in any sense
I need more data. If the NBH show (냉장고를 부탁해, "Please Take Care of My Refrigerator") goes on for the next 10 years, I will tell you about the model accuracy then.
<br>
<br>
### But surely not bad for the proof of concept!

---------------------------------------------
# Step 3: Validation
Let's manually calculate which ingredients brought the chef good luck in the cooking contest.

### Among the winning matches, see which ingredients brought the most wins (out of all the times each was used).

In [14]:
lucky_food_matrix = lucky_ingredients(updated_matrix, matrixID_ing_dict)

----------------
Lucky food:  부추
 Used:  3.0  times.
 Won:  3.0 times.
Lucky food:  두부
 Used:  2.0  times.
 Won:  2.0 times.
Lucky food:  통조림햄
 Used:  2.0  times.
 Won:  2.0 times.
Lucky food:  카레가루
 Used:  2.0  times.
 Won:  2.0 times.
Lucky food:  고체카레
 Used:  2.0  times.
 Won:  2.0 times.


The luckiest ingredients for 김풍 are actually 부추 (chives), 두부 (tofu), 통조림햄 (canned ham, *ah yes*), and 카레 (curry, as both powder and solid blocks). So we have a mixture of vegetable and soy ingredients (chives, tofu) and some that are considered instant (canned ham, curry powder).
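The `lucky_ingredients` helper is not shown, so here is a hedged sketch of how a used/won tally like the one above could be computed from a 0/1 ingredient matrix and a win/loss vector. The function name `ingredient_win_table`, the `min_used` threshold, and the data are all hypothetical.

```python
import numpy as np

def ingredient_win_table(X, Y, vocab, min_used=2):
    """Rank ingredients by wins / uses, breaking ties by number of uses."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    used = X.sum(axis=0)         # matches in which each ingredient appears
    won = X[Y == 1].sum(axis=0)  # ... of which the chef won
    rows = [
        (vocab[j], int(used[j]), int(won[j]), won[j] / used[j])
        for j in range(len(vocab))
        if used[j] >= min_used   # skip ingredients used too rarely to compare
    ]
    return sorted(rows, key=lambda r: (-r[3], -r[1]))

# Made-up data: rows are matches, columns follow the order of vocab
vocab = ["chives", "tofu", "canned ham", "onion"]
X = [[1, 1, 0, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 1]]
Y = [1, 1, 1, 0]

table = ingredient_win_table(X, Y, vocab)
for name, used, won, rate in table:
    print(f"{name}: used {used}, won {won}, rate {rate:.2f}")
```

Note that an ingredient used only two or three times can reach a perfect win rate by chance, so a tally like this is descriptive rather than evidence of a real effect.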

--------------------------------
# Step 4: What about the others? 

Averaged over 4 iterations, it looks like 레이먼 킴, 김풍, and 박준우 are the ones with lucky food syndrome. Okay, just kidding. It is probably premature to say anything conclusive. 레이먼 킴 has been on a winning streak for the past several weeks, and 박준우's data is too small to compare. But with enough data, maybe we can expect more!

In [789]:
iter_num = 4
chef_for_model = ["이재훈","이원일","레이먼 킴","미카엘","김풍","정호영","이연복","박준우","오세득","홍석천","최현석","유현수","이찬오","정창욱","샘킴"]

In [790]:
def ingredients_accuracy_itr(iter_num):
    # Sum each chef's accuracy over iter_num runs, then average
    chef_accu_dict = defaultdict(float)
    for q in range(iter_num):
        print(q)
        for chef in chef_for_model:
            # lucky_ingredients_model takes the filtered ingredient dictionary as its third argument
            accuracy, updated_matrix, matrixID_ing_dict = lucky_ingredients_model(
                chef, match_data, nID_ingredients_new_dict)
            chef_accu_dict[chef] += accuracy

    chef_list = list(chef_accu_dict.keys())
    accuracy_list = np.array(list(chef_accu_dict.values())) / iter_num

    for chef, acc in zip(chef_list, accuracy_list):
        print("Chef: ", chef)
        print("Accuracy: ", acc)
    return chef_list, accuracy_list

chef_list, accuracy_list = ingredients_accuracy_itr(iter_num)

0
1
2
3
Chef:  이재훈
Accuracy:  0.3333333333333333
Chef:  이원일
Accuracy:  0.45833333333333337
Chef:  레이먼 킴
Accuracy:  0.75
Chef:  미카엘
Accuracy:  0.45454545454545453
Chef:  김풍
Accuracy:  0.6071428571428571
Chef:  정호영
Accuracy:  0.45
Chef:  이연복
Accuracy:  0.5576923076923077
Chef:  박준우
Accuracy:  0.75
Chef:  오세득
Accuracy:  0.41666666666666663
Chef:  홍석천
Accuracy:  0.45
Chef:  최현석
Accuracy:  0.44999999999999996
Chef:  유현수
Accuracy:  0.44999999999999996
Chef:  이찬오
Accuracy:  0.45
Chef:  정창욱
Accuracy:  0.375
Chef:  샘킴
Accuracy:  0.3846153846153846
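Since the averages above hide how much each chef's accuracy moves between runs, it may help to report the spread as well. The sketch below keeps every run's accuracy and prints the mean and sample standard deviation per chef. The chef names and accuracy values are placeholders, not results from the notebook.

```python
import numpy as np

def summarize_runs(run_accuracies):
    """Mean and sample standard deviation of per-run accuracies for each chef."""
    summary = {}
    for chef, accs in run_accuracies.items():
        accs = np.asarray(accs, dtype=float)
        summary[chef] = (accs.mean(), accs.std(ddof=1))
    return summary

# Made-up accuracies from 4 runs; the chef names are placeholders
run_accuracies = {
    "chef_a": [0.75, 0.50, 1.00, 0.75],
    "chef_b": [0.40, 0.45, 0.50, 0.45],
}

summary = summarize_runs(run_accuracies)
for chef, (mean, std) in summary.items():
    print(f"{chef}: mean={mean:.3f}, std={std:.3f}")
```

A chef with a high mean but a large standard deviation is weaker evidence of "lucky food" than one with a modest mean that barely moves between runs.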
