# Challenge 2 - Exploration

<p>Matheus Schmitz<br>
<a href="https://www.linkedin.com/in/matheusschmitz/">LinkedIn</a><br>
<a href="https://matheus-schmitz.github.io/">GitHub Portfolio</a></p>

## Workflow

1. Parse description / title and generate encoded labels for the desired categories
2. Clean ingredients list & create lexicon of ingredients
3. Treat the ingredient array as an unordered set: it is not sequential, so order doesn't matter
4. Model  
    1. Use ingredient array to predict categories
    2. Split array and map each word to ingredients, look at correlation
    3. Use BERT/GloVe/GPT to get word embeddings for our lexicon and train with those
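
Steps 2 to 4.1 could be sketched as a bag-of-ingredients encoding. This is only a minimal illustration under assumptions, not the notebook's actual pipeline: `build_lexicon` and `multi_hot` are hypothetical helper names, and the three recipes are shortened samples of the `parsed_ingredients` column.

```python
def build_lexicon(recipes):
    """Map each distinct ingredient to a column index (sorted for reproducibility)."""
    vocabulary = sorted({ingredient for recipe in recipes for ingredient in recipe})
    return {ingredient: idx for idx, ingredient in enumerate(vocabulary)}


def multi_hot(recipe, lexicon):
    """Encode one recipe as a 0/1 vector; ingredient order is ignored."""
    vector = [0] * len(lexicon)
    for ingredient in recipe:
        if ingredient in lexicon:  # skip ingredients unseen when the lexicon was built
            vector[lexicon[ingredient]] = 1
    return vector


# Shortened, illustrative samples of parsed ingredient lists
recipes = [
    ["vegetable oil", "chicken breast", "onion", "ginger"],
    ["butter", "brandy", "saffron", "golden caster sugar"],
    ["baking potato", "olive oil", "sea salt"],
]

lexicon = build_lexicon(recipes)
encoded = [multi_hot(recipe, lexicon) for recipe in recipes]

# Reversing the ingredient order yields the same vector
shuffled = multi_hot(list(reversed(recipes[0])), lexicon)
```

Because each recipe becomes a fixed-length vector regardless of ingredient order, the encoding matches the "order doesn't matter" assumption in step 3.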

## Challenges

1. Does salt make a food meaty? I.e., certain ingredients are highly correlated
2. Metric? 
    1. Binary Cross Entropy
    2. Can the same ingredient have more than one sensorial property?
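
To make the metric question concrete, here is a small sketch of binary cross entropy computed independently per label, which allows one recipe to carry several sensorial properties at once. The function name, the example labels, and the predicted probabilities are all made up for illustration; the label order is assumed to be creamy, sour, spicy, zesty, sweet, meaty.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy over independent 0/1 labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical recipe labelled both 'creamy' and 'sweet'
y_true = [1, 0, 0, 0, 1, 0]
# Hypothetical model probabilities, one per category
y_prob = [0.8, 0.1, 0.2, 0.1, 0.6, 0.05]

loss = binary_cross_entropy(y_true, y_prob)
print(f"{loss:.4f}")
```

Each label contributes its own term, so the loss does not force the categories to be mutually exclusive.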

## Imports

In [15]:
# Data Manipulation
import json
import string
from collections import defaultdict, Counter

import numpy as np
import pandas as pd
pd.options.display.float_format = "{:,.3f}".format

# Auxiliary
from tqdm import tqdm

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Load Data

In [25]:
sensorial_categories = ['creamy', 'sour', 'spicy', 'zesty', 'sweet', 'meaty']

In [11]:
df = pd.read_json('dataset_A.json')
df.head(3)

Unnamed: 0,categories,description,ingredients,instructions,keywords,nutritional,parsed_ingredients,title,total_time
0,"[Dinner, Main course]",A Bombay potato-topped coconut curry bake that...,"[2 tbsp vegetable oil, 4 skinless chicken brea...",Heat a deep frying pan or flameproof casserole...,"[Curry, Curry pie, Potato pie, Mash topped, Ma...","{'calories': '414', 'carbohydrateContent': '53...","[vegetable oil, chicken breast, onion, ginger,...",Masala chicken pie,75.0
1,"[Dessert, Treat]",Has all the flavours of Christmas without bein...,"[225g butter, 2 tbsp brandy, pinch saffron, 22...",Heat oven to 160C/fan 140C/gas 3. Butter and l...,"[Almond, Almonds, Baking powder, Brandy, Chris...","{'calories': '613', 'carbohydrateContent': '82...","[butter, brandy, saffron, golden caster sugar,...",Honey saffron Christmas cake,150.0
2,[],,"[Leftover roast lamb, thinly sliced and roughl...",,[More effort],{},,Iskender Kebab From Leftover Roast Lamb,35.0


In [17]:
# Tokenize the description
df['tokens'] = df['description'].apply(lambda text: str(text).lower().translate(str.maketrans('', '', string.punctuation)).split())
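
The tokenization step above can be tried on a single string to see its side effects. The sketch below wraps the same expression in a hypothetical `tokenize` helper (the name is not from the notebook) and runs it on the start of the first description.

```python
import string


def tokenize(text):
    """Lowercase, delete ASCII punctuation, then split on whitespace."""
    return str(text).lower().translate(str.maketrans("", "", string.punctuation)).split()


tokens = tokenize("A Bombay potato-topped coconut curry bake")
# Hyphenated words are fused because the hyphen is deleted, not replaced by a space
print(tokens)

# str() turns a missing value into a literal word, e.g. None becomes 'none'
missing = tokenize(None)
print(missing)
```

Two things are worth keeping in mind: hyphenated words collapse into one token (`potatotopped`), and a description stored as `None` or `NaN` would produce the token `none` or `nan` rather than an empty list. Empty-string descriptions yield an empty list.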

In [20]:
# Get word counts
df['word_counts'] = df['tokens'].apply(Counter)

In [56]:
# Keep only the counts of the sensorial categories
def get_category_count(counter_obj):
    # .get() falls back to 0 for categories that never appear in the description
    return {category: counter_obj.get(category, 0) for category in sensorial_categories}

In [57]:
df['category_counts'] = df['word_counts'].apply(get_category_count)

In [58]:
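# Copy each category count into its own column
# (values are word counts and can exceed 1, so this is not strictly one-hot)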
def extract_label_counts(df_row):
    index = df_row.name
    for category, count in df_row['category_counts'].items():
        df.at[index, category] = count

In [59]:
df[sensorial_categories] = 0
_ = df.apply(extract_label_counts, axis='columns')
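
The category columns hold word counts, and the value counts printed further down show a few 2s and 3s. If strictly binary labels are wanted (for example as targets for binary cross entropy), one option is to reduce each count to a presence indicator. This is a sketch under that assumption; `to_binary_labels` and the example counts are hypothetical.

```python
sensorial_categories = ['creamy', 'sour', 'spicy', 'zesty', 'sweet', 'meaty']


def to_binary_labels(category_counts):
    """Turn per-category word counts into 0/1 indicators in a fixed category order."""
    return [int(category_counts.get(category, 0) > 0) for category in sensorial_categories]


# Hypothetical description that mentions 'sweet' twice and 'sour' once
counts = {'creamy': 0, 'sour': 1, 'spicy': 0, 'zesty': 0, 'sweet': 2, 'meaty': 0}
labels = to_binary_labels(counts)
print(labels)
```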

In [64]:
for category in sensorial_categories:
    print(df[category].value_counts())
    print()

0    10615
1      480
2        1
Name: creamy, dtype: int64

0    11061
1       34
2        1
Name: sour, dtype: int64

0    10816
1      280
Name: spicy, dtype: int64

0    11012
1       84
Name: zesty, dtype: int64

0    10534
1      559
2        3
Name: sweet, dtype: int64

0    11051
1       45
Name: meaty, dtype: int64
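
The counts above imply strongly imbalanced labels. The short calculation below copies the printed value counts by hand and derives the share of recipes whose description mentions each category at least once; the variable names are illustrative.

```python
# Value counts copied from the output above: {category: {count_value: n_recipes}}
value_counts = {
    'creamy': {0: 10615, 1: 480, 2: 1},
    'sour':   {0: 11061, 1: 34, 2: 1},
    'spicy':  {0: 10816, 1: 280},
    'zesty':  {0: 11012, 1: 84},
    'sweet':  {0: 10534, 1: 559, 2: 3},
    'meaty':  {0: 11051, 1: 45},
}

totals = {}
positive_rate = {}
for category, counts in value_counts.items():
    totals[category] = sum(counts.values())
    positives = totals[category] - counts[0]  # recipes with at least one mention
    positive_rate[category] = positives / totals[category]

for category, rate in positive_rate.items():
    print(f"{category:>6}: {rate:.2%}")
```

Every category is positive in roughly 5% of recipes or fewer, so plain accuracy would look high even for a model that always predicts 0. Per-label precision and recall, or some form of class weighting, may be more informative here.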



In [61]:
df

Unnamed: 0,categories,description,ingredients,instructions,keywords,nutritional,parsed_ingredients,title,total_time,tokens,word_counts,creamy,sour,spicy,zesty,sweet,meaty,category_counts
0,"[Dinner, Main course]",A Bombay potato-topped coconut curry bake that...,"[2 tbsp vegetable oil, 4 skinless chicken brea...",Heat a deep frying pan or flameproof casserole...,"[Curry, Curry pie, Potato pie, Mash topped, Ma...","{'calories': '414', 'carbohydrateContent': '53...","[vegetable oil, chicken breast, onion, ginger,...",Masala chicken pie,75.000,"[a, bombay, potatotopped, coconut, curry, bake...","{'a': 1, 'bombay': 1, 'potatotopped': 1, 'coco...",0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
1,"[Dessert, Treat]",Has all the flavours of Christmas without bein...,"[225g butter, 2 tbsp brandy, pinch saffron, 22...",Heat oven to 160C/fan 140C/gas 3. Butter and l...,"[Almond, Almonds, Baking powder, Brandy, Chris...","{'calories': '613', 'carbohydrateContent': '82...","[butter, brandy, saffron, golden caster sugar,...",Honey saffron Christmas cake,150.000,"[has, all, the, flavours, of, christmas, witho...","{'has': 1, 'all': 1, 'the': 2, 'flavours': 1, ...",0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
2,[],,"[Leftover roast lamb, thinly sliced and roughl...",,[More effort],{},,Iskender Kebab From Leftover Roast Lamb,35.000,[],{},0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
3,[Drink],"A simple, sweet cordial you can make from left...","[150g carrots, juice 5 lemons, juice 1 orange,...","In a bowl, stir together the carrot, lemon jui...","[Carrot, Drink, Lemonade, Cordial, Quick, Easy...","{'calories': '40', 'carbohydrateContent': '10g...","[carrot, lemon, orange, golden caster sugar, i...",Carrot lemonade,0.000,"[a, simple, sweet, cordial, you, can, make, fr...","{'a': 2, 'simple': 1, 'sweet': 1, 'cordial': 1...",0,0,0,0,1,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
4,[],,"[175g Butter, 150g Golden caster sugar, 3 Larg...",,[Easy],{},,Gluten Free Blackberry & Apple Crumble Cake,60.000,[],{},0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11091,"[Side dish, Lunch]",Make the dressing for this super healthy salad...,"[200g bag mixed crispy salad leaves, 2 heads p...",Arrange the salad ingredients on a large platt...,"[Apple, Apples, Chicory, Poppy seed, Poppy see...","{'calories': '138', 'carbohydrateContent': '9g...","[salad leaf, chicory, apple, walnut, cherry to...","Lettuce, chicory & apple salad with poppy seed...",0.000,"[make, the, dressing, for, this, super, health...","{'make': 1, 'the': 3, 'dressing': 1, 'for': 2,...",0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
11092,[],,"[1/2 cup Thai coconut meat, 1/2 teaspoon mince...",,[A challenge],{},,Rawmunchies - Raw Vegan Tamago Nigiri,165.000,[],{},0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
11093,[],Bake 20 mins,"[100g/3½oz butter, 75ml/2½fl oz golden syrup, ...",,[Easy],{},,Breakfast bar,35.000,"[bake, 20, mins]","{'bake': 1, '20': 1, 'mins': 1}",0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
11094,"[Side dish, Dinner, Lunch]","Crispy, moreish baked potatoes cooked with jus...","[6 medium baking potatoes, olive oilolive oil,...",Heat oven to 200C/fan 180C/gas 6. Scrub the ba...,"[Alternative roast potato dish, Roast potatoes...","{'calories': '187', 'carbohydrateContent': '26...","[baking potato, olive oil, sea salt]",Olive oil-baked potatoes,50.000,"[crispy, moreish, baked, potatoes, cooked, wit...","{'crispy': 1, 'moreish': 1, 'baked': 1, 'potat...",0,0,0,0,0,0,"{'creamy': 0, 'sour': 0, 'spicy': 0, 'zesty': ..."
