This notebook explains how the "highlighting the overlapped ingredients" feature works. The figure below shows how we extract the ingredient root nouns from (a) ingredients and (b) instructions.

For every ingredient entry, we extract the root noun and derive its lemmatized form. For each paragraph of instructions, we additionally use a dictionary-based approach to filter out root nouns that are not ingredients.
The ingredient dictionary is saved in database.pickle.
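
The dictionary-based filter described above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual implementation: the contents of `ingredient_dictionary`, the list `candidate_roots`, and the helper `filter_ingredient_roots` are hypothetical, and we assume the dictionary behaves like a set of lemmatized ingredient nouns.

```python
# Hypothetical stand-in for the ingredient dictionary stored in database.pickle
# (assumed here to behave like a set of lemmatized ingredient nouns).
ingredient_dictionary = {"oil", "chicken", "garlic", "onion"}

# Hypothetical lemmatized root nouns taken from one paragraph of instructions.
candidate_roots = ["oil", "skillet", "heat", "chicken", "garlic", "chicken"]


def filter_ingredient_roots(roots, dictionary):
    """Keep only the root nouns that appear in the ingredient dictionary."""
    return [root for root in roots if root in dictionary]


instruction_roots = filter_ingredient_roots(candidate_roots, ingredient_dictionary)
print(instruction_roots)
```

Non-ingredient nouns such as "skillet" and "heat" are dropped, while repeated mentions of the same ingredient are kept so that each occurrence can still be highlighted.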

<img src="../data/img/spacy.png" alt="spacy" style="width: 600px;"/>

After extracting the two sets of root nouns, we calculate the set recall and set precision and present them on the web page.
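
As a rough sketch of that last step, set recall and set precision can be computed from the overlap of the two root-noun sets. The helper name `set_recall_precision` and the example values are assumptions made for illustration; the division-by-zero guards mirror the ones used in `highlight` below.

```python
def set_recall_precision(ingredient_roots, instruction_roots):
    """Return (recall, precision) of instruction roots against ingredient roots."""
    ingredient_roots = set(ingredient_roots)
    instruction_roots = set(instruction_roots)
    overlap = len(ingredient_roots & instruction_roots)
    # Recall: share of listed ingredients that are mentioned in the instructions.
    recall = overlap / len(ingredient_roots) if ingredient_roots else 0
    # Precision: share of ingredients in the instructions that are actually listed.
    precision = overlap / len(instruction_roots) if instruction_roots else 0
    return recall, precision


# Hypothetical root nouns; duplicates collapse once the lists become sets.
recall, precision = set_recall_precision(
    ["garlic", "breast", "oil"],
    ["oil", "chicken", "garlic", "chicken"],
)
print(recall, precision)
```

Here two of the three listed ingredients appear in the instructions, and two of the three distinct ingredients in the instructions are listed, so both scores are 2/3.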

In [1]:
from dependency import parent_dir
from common.basics import *
from common.save import load_pickle
import spacy

In [2]:
!python -m spacy download en_core_web_lg

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_lg')


In [3]:
from utils.spacy_func import spacy_extension

In [4]:
def highlight(ingredients, directions, generate = 'directions'):
    
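    '''Args:
    ingredients: A list of ingredient strings; it should not contain duplicates.
    directions: A str or a list of str containing a paragraph of cooking instructions.
    generate: 'directions' (default) or 'ingredients'; with 'ingredients',
        recall and precision are swapped before being returned.
    Returns:
    A dict with the highlighted ingredients and directions, plus recall and precision.
    '''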
    
    # check the inputs
    assert generate in ['directions', 'ingredients']
    directions = ' '.join(directions) if isinstance(directions, list) else directions
    
    # send to spacy (`sp` and `database` are globals created in the usage cell below)
    root_ingr, hl_ingr = sp.ingr(ingredients)
    instr, hl_instr = sp.instr(directions)
    
    # highlighting
    root_instr = []
    for chunk in instr.noun_chunks:
        idx_rootnoun = chunk.end - 1
        str_rootnoun = instr[idx_rootnoun].lemma_
        if str_rootnoun in root_ingr:
            root_instr.append(str_rootnoun)
            hl_instr[idx_rootnoun]['highlight'] = 'correct'
            
            for idx, root in enumerate(root_ingr):
                if root == str_rootnoun:
                    for j, word in enumerate(hl_ingr[idx]):
                        if word['highlight'] == 'wrong':
                            hl_ingr[idx][j]['highlight'] = 'correct'
            
        elif str_rootnoun in database:
            root_instr.append(str_rootnoun)
            hl_instr[idx_rootnoun]['highlight'] = 'wrong'
            
    # delimit the sentences
    def parse_instr(hl_instr):
        par_hl, sent = [], []
        for word in hl_instr:
            if word['text'] != '.':
                sent.append(word)
            else:
                sent.append(word)
                par_hl.append(sent)
                sent = []
        if sent:
            par_hl.append(sent)
        return par_hl
    
    hl_instr = parse_instr(hl_instr)
   
    # calculate precision and recall
    root_ingr, root_instr = set(root_ingr), set(root_instr)
    TP = len(root_ingr & root_instr)
    recall = TP / len(root_ingr) if len(root_ingr) > 0 else 0
    precision = TP / len(root_instr) if len(root_instr) > 0 else 0
    
    # if this is ingredients generation
    if generate == 'ingredients':
        recall, precision = precision, recall
    return {'ingredients': hl_ingr, 'directions': hl_instr, 'recall': recall, 'precision': precision}

Example of usage

In [5]:
database = load_pickle('../big_data/database.pickle')
sp = spacy_extension()

In [6]:
ingredients = ['garlic', 'breasts', 'bsp. oil']
directions = 'heat oil in large skillet on medium - high heat. \
Add chicken and garlic . or until chicken is done'
output = highlight(ingredients, directions)

In [7]:
output

{'ingredients': [[{'text': 'garlic', 'highlight': 'correct'}],
  [{'text': 'breasts', 'highlight': 'wrong'}],
  [{'text': 'bsp.', 'highlight': None},
   {'text': 'oil', 'highlight': 'correct'}]],
 'directions': [[{'text': 'heat', 'highlight': None},
   {'text': 'oil', 'highlight': 'correct'},
   {'text': 'in', 'highlight': None},
   {'text': 'large', 'highlight': None},
   {'text': 'skillet', 'highlight': None},
   {'text': 'on', 'highlight': None},
   {'text': 'medium', 'highlight': None},
   {'text': '-', 'highlight': None},
   {'text': 'high', 'highlight': None},
   {'text': 'heat', 'highlight': None},
   {'text': '.', 'highlight': None}],
  [{'text': 'Add', 'highlight': None},
   {'text': 'chicken', 'highlight': 'wrong'},
   {'text': 'and', 'highlight': None},
   {'text': 'garlic', 'highlight': 'correct'},
   {'text': '.', 'highlight': None}],
  [{'text': 'or', 'highlight': None},
   {'text': 'until', 'highlight': None},
   {'text': 'chicken', 'highlight': 'wrong'},
   {'text': 'is