# Data Acquisition : World recipes #

## Preliminary Explanations ##


The idea behind this project is to collect recipe data from different regions around the world and to exploit it later. This Jupyter notebook contains the code for the preliminary data acquisition step. Functions are used throughout the notebook to make the code reusable and save time.

## Datasets ##

The main dataset is not exactly the one provided in the cluster. After careful analysis of the HTML scrapes in the recipe dataset, we found it much more convenient to access the websites directly and scrape the data ourselves. The chosen website was www.allrecipes.com. It is very structured, easy to navigate, and above all, every recipe page follows the same HTML layout: a scraper's paradise.

The site has a few interesting details and perks that we exploited for our data collection. First, appending 'print' to the URL requests a much cleaner HTML page, which makes scraping immensely easier. Second, the URL can specify the number of servings the recipe is made for and the measuring system (metric or imperial) in which the ingredient quantities are given. Recipes also come with a rating and the number of people who pressed the 'made it' button, which helps assess a recipe's popularity. With a small change to the URL, it is possible to request an HTML page containing the nutrition facts for the recipe (per serving). Finally, the site is organized as a tree, so it is easy to access all recipes in the same category (Greek recipes, for example).

We will comment on the different scraping and parsing functions as we go!

In [1]:
import requests, re
import pandas as pd
from bs4 import BeautifulSoup
import json, csv

In [2]:
#The header describes who is visiting the website

headers = requests.utils.default_headers()
#'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

## Scraping functions ##

url_gen: 

This function generates the correct URL as a function of the unit system, the number of servings, the recipe reference and the type of data we want to scrape. The URL differs depending on whether we want the nutrition facts, the recipe's popularity, or the ingredients.

get_ingredients:

This function detects the ingredient list in the HTML (laid out in two columns) and appends the items to a list for further processing.

get_popularity:

This function extracts the 'made it' count and the rating of a recipe from the HTML page.

get_nutrition:

This function retrieves all the nutrition facts of the recipe, as well as its official name.



In [3]:
def url_gen(ref,servings=1, metric = 'true', typeSearch = 'ingredients'):
    if typeSearch == 'none':
        tail = ''
    else :
        if typeSearch == 'nutrition':
            tail = 'fullrecipenutrition/'
        else:  
            tail = 'print/?recipeType=Recipe&servings={}&isMetric={}'.format(servings,metric)

    return ('https://www.allrecipes.com/recipe/' + ref+tail)
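
To make the three URL shapes concrete, here is a minimal standalone sketch that rebuilds them with the standard library. `build_recipe_url` is a hypothetical helper written for this illustration, not part of the notebook; the path pieces and query parameters are simply those already used in url_gen.

```python
from urllib.parse import urlencode

BASE = 'https://www.allrecipes.com/recipe/'

def build_recipe_url(ref, servings=1, metric=True, kind='ingredients'):
    """Hypothetical counterpart of url_gen: kind is 'none', 'nutrition' or 'ingredients'."""
    if kind == 'none':
        return BASE + ref                       # plain recipe page (popularity)
    if kind == 'nutrition':
        return BASE + ref + 'fullrecipenutrition/'
    # Print view, with the servings and unit system passed as query parameters
    query = urlencode({'recipeType': 'Recipe',
                       'servings': servings,
                       'isMetric': 'true' if metric else 'false'})
    return BASE + ref + 'print/?' + query

ref = '45736/chicken-tikka-masala/'
for kind in ('none', 'nutrition', 'ingredients'):
    print(kind, '->', build_recipe_url(ref, servings=4, kind=kind))
```

Building the query string with `urlencode` avoids formatting mistakes when more parameters are added later.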

In [4]:
def get_ingredients(ref): 
    
    url = url_gen(ref)
    r=requests.get(url)
    
    ingredients = []
    soup = BeautifulSoup(r.text.encode('utf-8'),'lxml')    
    col1 = soup.find('h2').next_sibling.next_sibling
    col2 = col1.next_sibling.next_sibling

    for i in col1.findAll('li'):
        last=str(i.contents).rfind('r')
        first=str(i.contents).find('n')
        ingredients.append((str(i.contents)[(first+1):(last-1)].lstrip()))
    for i in col2.findAll('li'):
        last=str(i.contents).rfind('r')
        first=str(i.contents).find('n')
        ingredients.append((str(i.contents)[(first+1):(last-1)].lstrip()))
    return ingredients

In [5]:
def get_popularity(ref):    
    url = url_gen(ref,typeSearch = 'none')
    r=requests.get(url)
    soup = BeautifulSoup(r.text.encode('utf-8'),'lxml')    
    step1 =str(soup.findAll('div', class_="total-made-it"))
    step2 = str(soup.findAll('div', class_="rating-stars"))
    begin = '"made-it-count"></span><span>'
    end = '\xa0made it'
    step1 = step1[step1.find(begin)+len(begin):step1.find(end)]   
    begin = '<div class="rating-stars" data-ratingstars="'
    end= '" onclick'
    step2 = step2[step2.find(begin)+len(begin):step2.find(end)]  
    
    return step1,step2
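
The slicing in get_popularity relies on `str.find`, which returns -1 when a marker is missing and then silently produces a wrong substring. As a possible alternative, the sketch below extracts the same two fields with regular expressions and returns None when a field is absent. The HTML snippet is hand-written to imitate the markers used in get_popularity (an assumption about the page layout), and `parse_popularity` is a hypothetical name.

```python
import re

# Hand-written snippet imitating the markers that get_popularity looks for (an assumption)
html = ('<div class="total-made-it"><span class="made-it-count"></span>'
        '<span>3k\xa0made it</span></div>'
        '<div class="rating-stars" data-ratingstars="4.3879280090332" onclick="x()"></div>')

def parse_popularity(html):
    """Return (made_it, rating) as strings, or None for a field that is missing."""
    made = re.search(r'"made-it-count"></span><span>(.*?)\xa0made it', html)
    rating = re.search(r'data-ratingstars="([\d.]+)"', html)
    return (made.group(1) if made else None,
            rating.group(1) if rating else None)

print(parse_popularity(html))             # ('3k', '4.3879280090332')
print(parse_popularity('<html></html>'))  # (None, None)
```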

In [6]:
def get_nutrition(ref):
    #Returns two dictionaries: general information (title, reference) and nutritional values

    url = url_gen(ref, typeSearch = 'nutrition')
    r=requests.get(url)
    soup = BeautifulSoup(r.text.encode('utf-8'),'lxml') 
    info = soup.findAll('div', class_='nutrition-row')
    general = {}
    general['Title']  = str(soup.find('h2').contents[0])
    general['Reference'] = ref
    nutrition = {}
    getCal = str(soup.find('div', class_="nutrition-top light-underline"))
    key = 'Calories:</span> '
    nutrition['Calories'] = (getCal[getCal.find(key) + len(key):getCal.rfind('<br/>')])
    
    for i in info:
        a=i.find(class_='nutrient-name')
        b = re.findall('>(.*?):', str(a))[0]
        c = re.findall('value">(.*?)<', str(a))[0]
       # print(str(b) + ' : ' + str(c))
        nutrition[b] = c
    return general, nutrition

## Forming a class ##

To help with program structure, we define a Recipe class. Its constructor calls the functions explained above, so a recipe can be fully retrieved simply by creating a new Recipe object. Additional methods display the object and export it as a dictionary; a separate helper, toDf, converts it to a dataframe.

In [7]:
class Recipe:
    def __init__(self, name):
        self.name = name
        self.ingredients = get_ingredients(name)
        self.general, self.nutrition = get_nutrition(name)
        s1,s2 = get_popularity(name)
        self.general['Popularity'] = s1
        self.general['Rating']=s2
    def display(self):
        print('\n ----------------- \n')
        print(self.general)
        print('\n')
        print(self.ingredients)
        print('\n')
        print(self.nutrition)
    def toDict(self):
        dictio = {}
        dictio['General'] = self.general
        dictio['Nutrition'] = self.nutrition
        dictio['Ingredients'] = self.ingredients
        return dictio

In [8]:
def toDf(a, Region = 'Unknown'):
    dic = a.general
    tmp = {}
    tmp['Region'] = Region
    dic.update(tmp)
    dic.update(a.nutrition)
    RecipeDf = pd.DataFrame.from_dict(dic,orient = 'index')
    RecipeDf.columns = [str(a.general['Title'])]
    return RecipeDf

## Time for an example ##

Instantiating an object for this particular tikka masala recipe, then exporting it as a dictionary, yields the result below. The object's attributes are of course accessible individually.


In [457]:
#ref = '246179/black-chana-with-potato/'
ref = '45736/chicken-tikka-masala/'
Recipe(ref).toDict()

4.3879280090332


{'General': {'Title': 'Chicken Tikka Masala',
  'Reference': '45736/chicken-tikka-masala/',
  'Popularity': '3k',
  'Rating': '4.3879280090332'},
 'Nutrition': {'Calories': '404',
  'Total Fat': '28.9g',
  'Saturated Fat': '17.0g',
  'Cholesterol': '143mg',
  'Sodium': '1592mg',
  'Potassium': '660mg',
  'Total Carbohydrates': '13.3g',
  'Dietary Fiber': '2.5g',
  'Protein': '24.6g',
  'Sugars': '7g',
  'Vitamin A': '2432IU',
  'Vitamin C': '11mg',
  'Calcium': '206mg',
  'Iron': '3mg',
  'Thiamin': '0mg',
  'Niacin': '13mg',
  'Vitamin B6': '1mg',
  'Magnesium': '57mg',
  'Folate': '24mcg'},
 'Ingredients': ['60 ml yogurt',
  '4 ml lemon juice',
  '1 g fresh ground cumin',
  '0.6 g ground cinnamon',
  '0.9 g cayenne pepper',
  '1 g freshly ground black pepper',
  '1 g minced fresh ginger',
  '2 g salt, or to taste',
  '3/4 boneless skinless chicken breasts, cut into bite-size pieces',
  '1 long skewers',
  '4 g butter',
  '1/4 clove garlic, minced',
  '1/4 jalapeno pepper, finely chop

## Large Scale Generalization ## 

In data analysis, what we really need is data. The next few functions generalize the scraping to a larger scale so that the data can be processed later.

get_recipes:

This function is given the URL of a theme page (Greek recipes, for example). It then searches the page for links to recipes belonging to that theme. By default, it returns the first 20 recipes the site shows. These recipes have the advantage of being diverse and fairly representative of the country's culinary culture. We considered retrieving more recipes and keeping only the 20 most popular, but a recipe's popularity depends mainly on the website's American visitors, whose tastes may be too biased for this technique to be representative.

getThemeRecipes:

The magic function! For each of the links generated by get_recipes(), this function creates a Recipe object. The objects are then processed to return up to three things. First, a list of all Recipe objects; this is the least useful of the three, and it is currently commented out in the code. Second, a Pandas dataframe containing all of the information of all the Recipe objects, except for the ingredients, whose number varies between recipes. Finally, a dictionary containing all the information of all the Recipe objects, arranged hierarchically so that it can easily be exported as a JSON file.

getInfo:

This function calls getThemeRecipes, then exports the information. It saves the dataframe as a CSV file and the dictionary as a JSON file. This allows the scraped data to be stored and accessed much more quickly, without depending on website changes.

In [9]:
def get_recipes(themeUrl, number = 20):
    
    r=requests.get(themeUrl) #add page 1,2,etc
    
    baseUrl = 'https://www.allrecipes.com/recipe/'
    soup = BeautifulSoup(r.text.encode('utf-8'),'lxml') 
    results = soup.findAll('h3', class_='fixed-recipe-card__h3')
    
    theme = []
    iteration = 0
    for i in results :
        
        if iteration  == number:
            break
            
        iteration = iteration + 1
        
        link = str(i.find('a').get('href'))
        first=link.find(baseUrl)
        theme.append(link[(first+len(baseUrl)):])
        
    return theme
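
The reference extraction at the end of get_recipes can be illustrated in isolation. The sketch below assumes that recipe links start with the base URL and skips any other link explicitly, which is slightly stricter than the `find`-based slicing above; `recipe_ref` is a hypothetical helper and the second link is made up for the example.

```python
BASE_URL = 'https://www.allrecipes.com/recipe/'

def recipe_ref(link, base_url=BASE_URL):
    """Return the part of a recipe link after base_url, or None for any other link."""
    if not link.startswith(base_url):
        return None
    return link[len(base_url):]

links = ['https://www.allrecipes.com/recipe/45736/chicken-tikka-masala/',
         'https://www.allrecipes.com/video/1234/some-video/']   # made-up non-recipe link
refs = [r for r in map(recipe_ref, links) if r is not None]
print(refs)   # ['45736/chicken-tikka-masala/']
```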

In [10]:
def getThemeRecipes(themeName, number = 20, dec =1):
    #Build the theme URL from the category number and the intermediate sub-categories
    tail = ''
    for i in range(1,len(themeName)-1):
        tail = tail + themeName[i]+'/'
    themeUrl = 'https://www.allrecipes.com/recipes/'+themeName[0]+'/world-cuisine/'+tail
    themeLinks = get_recipes(themeUrl, number = number)
    themeRecipes = []
    themeDico = {}
    joined = None   #stays None until a first recipe has been parsed successfully
    region = themeName[len(themeName)-dec]

    from tqdm import tqdm_notebook as tqdm

    for i in tqdm(range(len(themeLinks))):
        recipe = themeLinks[i]

        try:
            current = Recipe(recipe)
            if joined is None:
                joined = toDf(current, region)
            else:
                joined = pd.merge(joined,toDf(current,region), left_index=True, right_index=True, how = 'outer')
                    
            #themeRecipes.append(current)
            themeDico[current.general['Reference']] = current.toDict()
            
        except AttributeError:
            #The page does not have the expected layout: skip this recipe
            print('Error')
            continue
    return themeDico, joined
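
The role of the theme list and of the `dec` argument is easier to see on its own. The sketch below mirrors the URL and region logic of getThemeRecipes without any network access; `theme_url_and_region` is a hypothetical helper introduced only for this illustration.

```python
def theme_url_and_region(theme, dec=1):
    """theme = [category number, sub-category, ..., last element].

    The URL uses every element except the last one; the region name is the
    element located `dec` positions from the end of the list.
    """
    tail = ''.join(part + '/' for part in theme[1:-1])
    url = 'https://www.allrecipes.com/recipes/' + theme[0] + '/world-cuisine/' + tail
    region = theme[len(theme) - dec]
    return url, region

print(theme_url_and_region(['17136', 'asian', 'indian', 'main-dishes'], dec=2))
print(theme_url_and_region(['721', 'european', 'french']))
```

With `dec = 2`, a theme ending in 'main-dishes' is labelled with its country ('indian') instead of the dish type.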

In [11]:
def getInfo(themeName, dec=1):
    themeDico, themeDf = getThemeRecipes(themeName,dec = dec)
    
    data_path = 'data/'
    #File name without extension: data folder followed by the region name
    base = data_path + themeName[len(themeName)-dec]
    themeDf.transpose().dropna().to_csv(base+'.csv')
   
    with open(base+'.json', 'w') as f:
        json.dump(themeDico, f)
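
As a small check of the JSON export, the sketch below writes a toy dictionary with the same nesting as Recipe.toDict() and reads it back. It uses a temporary folder instead of 'data/' so that it can run anywhere; the values are copied from the tikka masala example and shortened.

```python
import json, os, tempfile

# Toy dictionary with the same nesting as Recipe.toDict()
themeDico = {'45736/chicken-tikka-masala/': {
    'General': {'Title': 'Chicken Tikka Masala', 'Popularity': '3k'},
    'Nutrition': {'Calories': '404'},
    'Ingredients': ['60 ml yogurt', '4 ml lemon juice']}}

with tempfile.TemporaryDirectory() as data_path:
    # The path is the folder, then the region name, then the extension
    path = os.path.join(data_path, 'indian' + '.json')
    with open(path, 'w') as f:
        json.dump(themeDico, f)
    with open(path) as f:
        restored = json.load(f)

print(restored == themeDico)   # True
```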
    

## Let's run everything ##

Each region is associated with a number and with several categories and sub-categories. Since the site times out after a certain number of 'suspicious' connections, it is more convenient to scrape each category one by one (each takes about a minute; a progress bar is shown with tqdm). 



In [None]:
themeIndian = ['17136','asian','indian','main-dishes']
themeJapanese = ['17491','asian','japanese','main-dishes']
themeMexican = ['17504','latin-american','mexican','main-dishes']
themeItalian = ['16767','european','italian','main-dishes']
themeLebanese = ['1824','middle-eastern','lebanese']
themeFrench = ['721','european','french']
themeNorthAfrican = ['17582','african','north-african']

getInfo(themeNorthAfrican)
getInfo(themeItalian, dec = 2)
getInfo(themeIndian, dec = 2)
getInfo(themeJapanese, dec = 2)
getInfo(themeMexican, dec = 2)
getInfo(themeLebanese)
getInfo(themeFrench)

themeChinese = ['17135','asian','chinese','main-dishes']
themeThai = ['702','asian','thai']
themeSA = ['730','latin-american','south-american']
themeGreek = ['17152','european','greek','main-dishes']

getInfo(themeChinese, dec = 2)
getInfo(themeSA)
getInfo(themeThai)
getInfo(themeGreek, dec = 2)

themeScand = ['725','european','scandinavian']
themeUK = ['704','european','uk-and-ireland']
themeEastEurope = ['712','european','eastern-european']
getInfo(themeScand)
getInfo(themeUK)
getInfo(themeEastEurope)

themeGerman = ['722','european','german']
themeSpanish = ['726','european','spanish']
getInfo(themeGerman)
getInfo(themeSpanish)

themeEastAfrican =  ['17845','african','east-african']
getInfo(themeEastAfrican)

In [16]:
themeCanadian =  ['733','canadian']
getInfo(themeCanadian)
themeAustralian = ['228','australian-and-new-zealander']
getInfo(themeAustralian)

ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')",))
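
One possible way to cope with such dropped connections is to retry after a pause. The sketch below is not part of the original notebook: `with_retries` is a hypothetical wrapper, and the flaky function only simulates a failing request. It catches OSError, from which both Python's built-in ConnectionError and the exceptions raised by requests derive.

```python
import time

def with_retries(fetch, attempts=3, delay=1.0):
    """Call fetch() until it succeeds, sleeping longer after each network failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as err:          # ConnectionError is a subclass of OSError
            if attempt == attempts:
                raise                   # give up after the last attempt
            print('attempt {} failed ({}), retrying'.format(attempt, err))
            time.sleep(delay * attempt)

# Stand-in for a flaky request: fails twice, then succeeds
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('Connection aborted.')
    return 'ok'

result = with_retries(flaky, delay=0.01)
print(result, calls['n'])   # ok 3
```

In the notebook, a call could then look like `with_retries(lambda: getInfo(themeCanadian), delay=30)`, with a delay long enough for the site's timeout to expire.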

In [59]:
Mexican = pd.read_csv('data/mexican.csv',index_col = 0).dropna()
Indian  = pd.read_csv('data/indian.csv',index_col = 0)
Italian = pd.read_csv('data/italian.csv',index_col = 0)
Lebanese = pd.read_csv('data/lebanese.csv',index_col = 0)
French  = pd.read_csv('data/french.csv',index_col = 0)
Japanese = pd.read_csv('data/japanese.csv',index_col = 0)

NorthAfrican = pd.read_csv('data/north-african.csv',index_col = 0)
Chinese = pd.read_csv('data/chinese.csv',index_col = 0)
SA = pd.read_csv('data/south-american.csv',index_col = 0)
Thai = pd.read_csv('data/thai.csv',index_col = 0)
Greek = pd.read_csv('data/greek.csv',index_col = 0)


Scand = pd.read_csv('data/scandinavian.csv',index_col = 0)
UK = pd.read_csv('data/uk-and-ireland.csv',index_col = 0)
EastEurope = pd.read_csv('data/eastern-european.csv',index_col = 0)
German = pd.read_csv('data/german.csv',index_col = 0)
Spanish = pd.read_csv('data/spanish.csv',index_col = 0)
EastAfrican = pd.read_csv('data/east-african.csv',index_col = 0)

#Concatenate all regions (Canadian and Australian are not included)
Full = pd.concat([Mexican, Indian, Italian, Lebanese, French, Japanese,
                  NorthAfrican, Chinese, SA, Thai, Greek,
                  Scand, UK, EastEurope, German, Spanish,
                  EastAfrican], sort=True)

Full.dropna().to_csv('data/fullData.csv')

len(Full)

327

## Data Parsing ##

Now that we have data, let's clean it! We associate each nutrient with its unit and strip that unit to get a decimal number for each quantity. If the code runs without error, the data contains no irregularities.

In [88]:
dicUnits = {}
dicUnits['Iron']='mg'
dicUnits['Niacin']='mg'
dicUnits['Magnesium']='mg'
dicUnits['Calcium']='mg'
dicUnits['Cholesterol']='mg'
dicUnits['Sodium']='mg'
dicUnits['Potassium']='mg'
dicUnits['Vitamin C']='mg'
dicUnits['Thiamin']='mg'
dicUnits['Vitamin B6']='mg'


dicUnits['Total Fat']='g'
dicUnits['Saturated Fat']='g'
dicUnits['Sugars']='g'

dicUnits['Folate']='mcg'

dicUnits['Popularity']='k'

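def replaceUnits(x,elem):
    #Strip the unit of the nutrient `elem` from the string x and return a number
    if elem == 'Popularity':
        return int(x.replace(dicUnits[str(elem)],'000'))
    try:
        return float(x.replace(dicUnits[str(elem)],''))
    except (ValueError, AttributeError):
        #Values such as '&lt; 1mg' cannot be converted (the 3 cases):
        #they are reported and left missing (None)
        print('weird format'+str(x))
        return None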

                         
    

def treatUnits(Original):
    Treated = Original.copy()
    for elem in dicUnits:
        if elem == 'Popularity':
            Treated[str(elem)]=Original[str(elem)].apply(lambda x : float(str(x).replace(dicUnits[str(elem)],'000')))
        else:
            Treated[str(elem)]=Original[str(elem)].apply(lambda x : replaceUnits(x,elem))
    return Treated
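
The unit handling can also be sketched without a per-nutrient lookup table, by splitting each string into a number and a unit with a regular expression. This is only an illustration under the assumption that values look like '28.9g' or '&lt; 1mg'; `split_unit` is a hypothetical helper and it flags 'less than' values as upper bounds instead of rejecting them.

```python
import re

def split_unit(value):
    """Return (number, unit, is_upper_bound), or None if the value is not recognised."""
    text = str(value).replace('&lt;', '<').strip()
    m = re.fullmatch(r'(<)?\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)', text)
    if m is None:
        return None
    return float(m.group(2)), m.group(3), m.group(1) is not None

for raw in ['28.9g', '143mg', '24mcg', '2432IU', '&lt; 1mg', 'n/a']:
    print(repr(raw), '->', split_unit(raw))
```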

In [92]:
treatUnits(Full).head()

weird format&lt; 1mg
weird format&lt; 1mg
weird format&lt; 1mg


Unnamed: 0,Calcium,Calories,Cholesterol,Dietary Fiber,Folate,Iron,Magnesium,Niacin,Popularity,Potassium,...,Saturated Fat,Sodium,Sugars,Thiamin,Title,Total Carbohydrates,Total Fat,Vitamin A,Vitamin B6,Vitamin C
Low-Carb Jicama Tostadas,18.0,57,0.0,7.4g,18.0,1.0,18.0,0.0,3.0,225.0,...,0.0,6.0,3.0,0.0,Low-Carb Jicama Tostadas,13.2g,0.1,32IU,0.0,30.0
Best Fish Tacos,80.0,322,35.0,3.8g,88.0,3.0,48.0,6.0,3.0,591.0,...,3.0,833.0,2.0,0.0,Best Fish Tacos,34.6g,9.0,1043IU,0.0,18.0
Crispy Pork Carnitas,35.0,317,89.0,0.4g,5.0,2.0,23.0,10.0,267.0,371.0,...,6.0,1036.0,0.0,1.0,Crispy Pork Carnitas,2.1g,22.6,17IU,0.0,2.0
Jeannie's Vegetarian Enchiladas,376.0,626,59.0,9.6g,193.0,6.0,77.0,8.0,1.0,818.0,...,15.0,1138.0,5.0,1.0,Jeannie's Vegetarian Enchiladas,67.2g,31.2,2569IU,0.0,49.0
Mexican Corn Bread Casserole,168.0,304,101.0,1.9g,58.0,2.0,28.0,6.0,21.0,372.0,...,8.0,708.0,4.0,0.0,Mexican Corn Bread Casserole,21.5g,16.3,840IU,0.0,12.0


In [93]:
#Example of what we can get with this dataset

treatUnits(French)['Saturated Fat'].describe()

count    19.000000
mean     11.684211
std      10.397931
min       0.000000
25%       4.000000
50%       9.000000
75%      18.500000
max      33.000000
Name: Saturated Fat, dtype: float64

## Working with ingredients ##

We want to be able to exploit the ingredient list. To that end, we have to separate each ingredient from its quantity, which is the goal of the function ingr2dic. For each item, it retrieves the name, unit and quantity. It also takes care of several formatting styles, such as '1/2 can (16 ounces)', which is converted to grams, and '1/8 lettuce', which is converted to a quantity of 0.125, the name 'lettuce' and no unit.

However, we are interested in the quantity in grams of the fruits and vegetables for our future nutriscore calculations. We therefore scraped two other webpages listing fruits and vegetables, exported the list as a CSV file, and added the singular form of each plural name.

With that list, we can go through the ingredients of each recipe (loaded from the JSON file as a dictionary). When an ingredient matches, we retrieve the weight of the given fruit or vegetable from the Veg2QuantDic lookup table and multiply it by the quantity. The sum over all vegetable-based ingredients is computed in the getVegGrams() function.

In [94]:
def ingr2dic(ingredients):
    ingredientDic = {}
    itera = 0
    for i in ingredients:
        itera = itera+1
        #Leading quantity (integer, decimal or fraction), then an optional unit
        parsed = re.match(r'(\d+([\.,\/]\d+)?)\s+((g|ml|pinch|clove)\s)?', i)   #needs improvement
        if parsed is not None:
            dic = {}

            #The name stops at the first comma ('salt, or to taste' -> 'salt')
            if i.find(',')>0:
                name = i[len(parsed[0]):i.find(',')]
            else:
                name = i[len(parsed[0]):]
            dic['name']=name
            dic['unit']     = parsed[4]
            
            #Converting the '1/8 lettuce' to decimal form '0.125 lettuce'
            #(\d+ on both sides so that '1/16' is not read as '1/1')
            p = re.match(r'(\d+)\/(\d+)$',parsed[1])
            if p is not None:
                dic['quantity']     = float(p[1])/int(p[2])
            else:    
                dic['quantity'] = parsed[1]
                
            #conversion from ounces
            parsed = re.match(r'\((\d+(\.\d+?)?)\s+(ounce)\)', name)
            if parsed is not None:
                name = name[len(parsed[0]):]
                dic['name']=name
                dic['quantity'] = int(28.3*float(parsed[1]))
                dic['unit']     = 'g'
            
            ingredientDic[str(itera)]=dic
    return ingredientDic
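
The conversion of fractional quantities can also be sketched with the standard library's `fractions.Fraction`, which parses strings such as '1/8' directly. `parse_quantity` is a hypothetical helper, not part of ingr2dic; it additionally assumes that a decimal comma should be read as a decimal point.

```python
from fractions import Fraction

def parse_quantity(text):
    """Convert '1/8', '0.6', '60' or '1,5' to a float."""
    return float(Fraction(text.replace(',', '.')))

for q in ['1/8', '3/4', '1/16', '0.6', '60']:
    print(q, '->', parse_quantity(q))
```

Unlike the string quantities kept by ingr2dic for non-fractional values, this always returns a float, which makes later arithmetic simpler.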

In [95]:
#Getting a list of vegetables and fruits to compare through web scraping

url = 'http://vegetablesfruitsgrains.com/list-of-vegetables/'
r=requests.get(url)
soup = BeautifulSoup(r.text.encode('utf-8'),'lxml')    
a = soup.findAll('li')
url = 'http://vegetablesfruitsgrains.com/list-of-fruits/'
r=requests.get(url)
soup = BeautifulSoup(r.text.encode('utf-8'),'lxml')    
b = soup.findAll('li')
vegies = []
for i in a:
    if '>' not in str(i.contents[0]):
        c=str(i.contents[0]).split('/')
        for j in c:
            d=j.split('-')[0]
            vegies.append(d)
for i in b:
    if '>' not in str(i.contents[0]):
        c=str(i.contents[0]).split('/')
        for j in c:
            d=j.split('-')[0]
            vegies.append(d)
            
with open('data/vegies.csv', 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(vegies)


In [96]:
#reading vegies.csv
with open('data/vegies.csv', 'r') as myfile:
    r = csv.reader(myfile)
    veg = list(r)[0]

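#Adding non plural forms
#(iterate over a copy: appending to a list while looping over it would also re-visit the new items)
for i in list(veg):
    if i.endswith('s'):
        veg.append(i[:-1])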

In [97]:
#Vegetable Grammage dictionary: for milestone 2

#Would be better to have everything in the Veg list and have it as a JSON!

Veg2QuantDic={}
Veg2QuantDic['tomato']=100
Veg2QuantDic['lime']= 50
Veg2QuantDic['lemon']= 100
Veg2QuantDic['lettuce']= 700
Veg2QuantDic['orange']= 130
Veg2QuantDic['onion']= 100
Veg2QuantDic['pepper']= 160
Veg2QuantDic['grapefruit']= 400
Veg2QuantDic['avocado']= 220
Veg2QuantDic['banana']= 120
Veg2QuantDic['potato']= 140
Veg2QuantDic['peach']= 120
Veg2QuantDic['carrot']= 130
Veg2QuantDic['peppers']= 160
Veg2QuantDic['bell pepper']= 160

In [98]:
def getVegGrams(a, ref):
    #Iterate through the ingredient list, parse quantities, and detect vegetables
    #If an ingredient matches a word of the veg list, add its weight to the total
    
    recipe = ingr2dic(a[ref]['Ingredients'])

    quantity = 0
    for ingredient in recipe:
        name = recipe[ingredient]['name'].lower()
       # print(name)
        for word in veg:
            if word.lower() in name:
                tmpQuantity = 0
                if recipe[ingredient]['unit'] == 'g':
                    tmpQuantity = float(recipe[ingredient]['quantity'])
                elif recipe[ingredient]['unit'] is None:
                    if word.lower() in Veg2QuantDic:
                        #the quantity may still be a string ('2'), hence the float()
                        tmpQuantity = float(Veg2QuantDic[word.lower()])*float(recipe[ingredient]['quantity'])
                    else:
                        print("Add to Dic:" + word.lower())
                        tmpQuantity = 60 #estimated guess
                #other units (ml, pinch, clove) contribute 0 g
                quantity = tmpQuantity + quantity          
                print("Detected: \t\t" + str(word.lower()) + ' : ' + str(tmpQuantity) + ' g')
                break  #avoid multiple detection
    return int(quantity)
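
The weight computation can be reproduced by hand for the vegetarian enchiladas example used in this notebook. The sketch below hard-codes the matched words, units and quantities taken from that example and the per-piece weights taken from Veg2QuantDic; `grams` is a hypothetical helper that follows the same three cases as getVegGrams.

```python
# Per-piece weights in grams, copied from Veg2QuantDic
weights = {'onion': 100, 'bell pepper': 160, 'tomato': 100, 'peppers': 160, 'lime': 50}

# (matched word, unit, quantity) for the enchilada ingredients that are detected
detected = [('onion', None, 0.25), ('bell pepper', None, 0.125), ('garlic', 'clove', 0.375),
            ('tomato', None, 0.125), ('olive', 'g', 9), ('peppers', None, 0.25),
            ('beans', 'g', 60), ('lime', None, 0.125)]

def grams(word, unit, quantity, default=60):
    if unit == 'g':
        return float(quantity)               # already a weight
    if unit is None:
        # number of pieces times the weight of one piece, or a default guess
        return weights[word] * quantity if word in weights else default
    return 0                                 # ml, pinch, clove: ignored

total = sum(grams(*item) for item in detected)
print(total, int(total))   # 172.75 172
```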

In [99]:
with open('data/mexican'+'.json', 'rb') as f:
        a=json.load(f)
ingr2dic(a['258250/jeannies-vegetarian-enchiladas/']['Ingredients'])
#dico compare

{'1': {'name': 'vegetable oil', 'unit': 'ml', 'quantity': '4'},
 '2': {'name': 'bunch green onions', 'unit': None, 'quantity': 0.25},
 '3': {'name': 'large red bell pepper', 'unit': None, 'quantity': 0.125},
 '4': {'name': 'garlic', 'unit': 'clove', 'quantity': 0.375},
 '5': {'name': 'large tomato', 'unit': None, 'quantity': 0.125},
 '6': {'name': 'sliced black olives', 'unit': 'g', 'quantity': '9'},
 '7': {'name': 'jalapeno peppers', 'unit': None, 'quantity': 0.25},
 '8': {'name': 'drained canned black beans', 'unit': 'g', 'quantity': '60'},
 '9': {'name': 'cooked white rice', 'unit': 'g', 'quantity': '30'},
 '10': {'name': ' package cream cheese', 'unit': 'g', 'quantity': 226},
 '11': {'name': 'shredded Cheddar cheese', 'unit': 'g', 'quantity': '20'},
 '12': {'name': 'chopped fresh cilantro', 'unit': 'g', 'quantity': '3'},
 '13': {'name': 'lime', 'unit': None, 'quantity': 0.125},
 '14': {'name': 'dash hot sauce (such as Tabasco(R))',
  'unit': None,
  'quantity': 0.125},
 '15': {'nam

In [100]:
with open('data/mexican'+'.json', 'rb') as f:
        a=json.load(f)
                    
ref = '258250/jeannies-vegetarian-enchiladas/'

print('\n Grams of vegetables detected: ' + str(getVegGrams(a,ref)) + ' g .')

Detected: 		onion : 25.0 g
Detected: 		bell pepper : 20.0 g
Detected: 		garlic : 0 g
Detected: 		tomato : 12.5 g
Detected: 		olive : 9.0 g
Detected: 		peppers : 40.0 g
Detected: 		beans : 60.0 g
Detected: 		lime : 6.25 g

 Grams of vegetables detected: 172 g .


In [102]:
Full[Full['Region']=='mexican'].head()

Unnamed: 0,Calcium,Calories,Cholesterol,Dietary Fiber,Folate,Iron,Magnesium,Niacin,Popularity,Potassium,...,Saturated Fat,Sodium,Sugars,Thiamin,Title,Total Carbohydrates,Total Fat,Vitamin A,Vitamin B6,Vitamin C
Low-Carb Jicama Tostadas,18mg,57,0mg,7.4g,18mcg,1mg,18mg,0mg,3,225mg,...,0.0g,6mg,3g,0mg,Low-Carb Jicama Tostadas,13.2g,0.1g,32IU,0mg,30mg
Best Fish Tacos,80mg,322,35mg,3.8g,88mcg,3mg,48mg,6mg,3,591mg,...,3.0g,833mg,2g,0mg,Best Fish Tacos,34.6g,9g,1043IU,0mg,18mg
Crispy Pork Carnitas,35mg,317,89mg,0.4g,5mcg,2mg,23mg,10mg,267,371mg,...,6.0g,1036mg,0g,1mg,Crispy Pork Carnitas,2.1g,22.6g,17IU,0mg,2mg
Jeannie's Vegetarian Enchiladas,376mg,626,59mg,9.6g,193mcg,6mg,77mg,8mg,1,818mg,...,15.0g,1138mg,5g,1mg,Jeannie's Vegetarian Enchiladas,67.2g,31.2g,2569IU,0mg,49mg
Mexican Corn Bread Casserole,168mg,304,101mg,1.9g,58mcg,2mg,28mg,6mg,21,372mg,...,8.0g,708mg,4g,0mg,Mexican Corn Bread Casserole,21.5g,16.3g,840IU,0mg,12mg


## Further work can begin ##