# Meet the meat

## Abstract

With climate change forecasts growing increasingly dire, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, in particular beef, is one of the highest-impact actions an individual can take. To examine this topic, we will explore the popularity and prevalence of meat in recipes. Specifically, we plan to extract the ingredients from a recipe database and calculate the carbon footprint of the recipes.

Finally, we hope to directly relate this data to the issue of climate change by estimating a rating reflecting the carbon footprint of meat in recipes and the environmental impact of consumers' diets.

### Imports and libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup
import os, os.path as osp
import re  # needed by check_quantity below

In [14]:
DATA_FOLDER='data'
#SAMPLE_DATA_FOLDER = DATA_FOLDER + '/htmlSample/'
SAMPLE_DATA_FOLDER = DATA_FOLDER + '/sample_400/'

## Data extraction and cleaning

Our recipe dataset comes from [From Cookies to Cooks](http://infolab.stanford.edu/~west1/from-cookies-to-cooks/), which combines recipes from 14 high-traffic websites. We start by extracting the information we need from the HTML files (title, ingredients, meat or animal-protein ingredients, tags and ratings) in order to explore the recipes in more detail.


#### Recipe webpage scraping
The websites' HTML sources are rich in information, but we only need a small part of it. We extract that information from the pages, clean and pre-process it, and save it as a CSV file for easy retrieval in further processing.
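
As a rough illustration of the CSV step, the sketch below writes a few recipe rows to CSV and reads them back. It is not the notebook's own export code: the column names (`title`, `ingredients`, `tags`), the sample rows and the helper names are assumptions made for the example, and list-valued fields are JSON-encoded so that they survive the round trip.

```python
import csv
import io
import json

# Hypothetical rows: column names and values are illustrative only.
rows = [
    {"title": "Example Stew", "ingredients": ["1 lb beef", "2 carrots"], "tags": ["dinner"]},
    {"title": "Example Salad", "ingredients": ["1 head lettuce"], "tags": []},
]

def write_recipes_csv(rows, handle):
    """Write recipe dicts to CSV, JSON-encoding the list-valued columns."""
    writer = csv.DictWriter(handle, fieldnames=["title", "ingredients", "tags"])
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "title": row["title"],
            "ingredients": json.dumps(row["ingredients"]),
            "tags": json.dumps(row["tags"]),
        })

def read_recipes_csv(handle):
    """Read the CSV back and decode the JSON columns into Python lists."""
    recipes = []
    for row in csv.DictReader(handle):
        row["ingredients"] = json.loads(row["ingredients"])
        row["tags"] = json.loads(row["tags"])
        recipes.append(row)
    return recipes

buffer = io.StringIO()  # stands in for an open file on disk
write_recipes_csv(rows, buffer)
buffer.seek(0)
restored = read_recipes_csv(buffer)
print(restored[0]["ingredients"])
```

Encoding the lists as JSON keeps commas inside ingredient strings from being confused with column separators; a plain separator-joined string would also work if the separator never appears in the data.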

In [17]:
def analyse_recipe(soup, page):
    """
    Input:
    soup
    page
    
    Output:
    ing_wrap =  wrapper for ingredients
    is_recipe = boolean (True if website is found in title)
    ings = list of ingredients
    ing_amnts = list of ingredient quantities if directly available through the soup (ingredients and quantities separated)
    tags = list of tags assigned to the recipe
    """
    ings = []
    ing_amnts = []
    tags = []
    
    if 'Allrecipes' in soup.title.string:
        print('This recipe is from Allrecipes')
        is_recipe = True
        
        # Extract tags
        tag_wrappers = soup.find_all(itemprop="recipeCategory")
        print('Total number of tags: {0}'.format(len(tag_wrappers)))
        for tag in tag_wrappers:
            tags.append(tag['content'])
        
        # Extract ingredients
        ing_wrap=soup.find_all(itemprop="recipeIngredient")
        if not ing_wrap:
            print('alternative format used')
            ing_wrap=soup.find_all('li', class_="plaincharacterwrap ingredient")
            if not ing_wrap:
                print('alternative format 2 needed for Allrecipes')
        # collect the ingredient text whichever format matched
        for ing in ing_wrap:
            ings.append(ing.text)
            #for ing in ing_wrap:
            #    ing_amnts.append(ing.find(itemprop='amount').text)
            #    ings.append(ing.find(itemprop='name').text)
              
    elif 'Epicurious' in soup.title.string:
        print('This recipe is from Epicurious')
        is_recipe=True
        ing_wrap=soup.find('div', id="ingredients")
        # find() returns None when the div is missing, so guard the loop
        for ing in (ing_wrap or []):
            ings.append(ing.string)
    
    elif 'Food Network' in soup.title.string:
        print('This recipe is from FoodNetwork')
        is_recipe=True
        ing_wrap=soup.find_all('li',class_='ingredient')
        for ing in ing_wrap:
            ings.append(ing.text)
        
    elif 'Food.com' in soup.title.string:
        print('This recipe is from Food.com')
        is_recipe=True
        ing_wrap=soup.find_all('li', class_="ingredient")
        for ing in ing_wrap:
            ing_amnts.append((ing.find('span',class_='value').text+ ' '+ing.find('span',class_='type').text))
            ings.append(ing.find('span', class_='name').text)
    
    elif 'Betty Crocker' in soup.title.string:
        print('This recipe is from Betty Crocker')
        is_recipe = True
        ing_wrap=soup.find_all('dl', class_='ingredient')
        for ing in ing_wrap:
            ings.append(ing.getText())
            
    elif 'MyRecipes' in soup.title.string:
        print('This recipe is from MyRecipes')
        is_recipe=True
        ing_wrap=soup.find_all(itemprop="ingredient")
        for ing in ing_wrap:
            ings.append(ing.getText())

    else:
        is_recipe = False
        ing_wrap=None
        
    if not ing_wrap:  #return warning if website is recognized but format/data extraction is not successful
        print('*******')
        print('NEED NEW FORMAT')  
        print('*******')


    return ing_wrap, is_recipe, ings, ing_amnts, tags


#### Quantity extraction and conversion
The amount of each ingredient is expressed in many different units (imperial or metric) depending on the website, and even on the recipe. Once we have extracted the ingredients and amounts, we convert all quantities to a single weight unit (kilograms) so that we can compute the carbon footprint of selected ingredients.
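
To make the target of this step concrete, here is a minimal, self-contained sketch that turns an ingredient string into kilograms. It is independent of the notebook's own functions and uses a different technique: `fractions.Fraction` parses mixed numbers instead of string replacement. The names `KG_PER_UNIT`, `parse_quantity` and `ingredient_to_kg`, and the set of unit spellings covered, are assumptions made for this example.

```python
import re
from fractions import Fraction

# Kilograms per unit; which spellings to support is an assumption of this sketch.
KG_PER_UNIT = {
    "kg": 1.0, "kilogram": 1.0, "kilograms": 1.0,
    "g": 0.001, "gram": 0.001, "grams": 0.001,
    "lb": 0.45359237, "lbs": 0.45359237, "pound": 0.45359237, "pounds": 0.45359237,
    "oz": 0.028349523125, "ounce": 0.028349523125, "ounces": 0.028349523125,
}

# A quantity is a mixed number ("1 1/2"), a fraction ("3/4") or a decimal ("0.5", "2"),
# followed by a unit written as a whole word.
QUANTITY_UNIT = re.compile(
    r"(?P<qty>\d+\s+\d+/\d+|\d+/\d+|\d*\.\d+|\d+)\s*(?P<unit>[a-zA-Z]+)\b"
)

def parse_quantity(text):
    """Turn '1 1/2', '3/4' or '0.5' into a float."""
    return float(sum(Fraction(part) for part in text.split()))

def ingredient_to_kg(ingredient):
    """Weight in kg of the first quantity that carries a known weight unit, else 0.0."""
    for match in QUANTITY_UNIT.finditer(ingredient.casefold()):
        unit = match.group("unit")
        if unit in KG_PER_UNIT:
            return parse_quantity(match.group("qty")) * KG_PER_UNIT[unit]
    return 0.0

print(ingredient_to_kg("1 1/2 pounds ground beef"))
print(ingredient_to_kg("2 cups flour"))  # volume unit, not convertible to weight here
```

Matching the unit as a whole word right after the number avoids treating a stray letter such as the "g" in "ground" as grams. Volume units such as cups are deliberately left at 0.0, since converting them to weight would need a density per ingredient.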

In [4]:
def check_quantity(quant_str):
    """
    Cleans input string and extracts numerical values
    Outputs cleaned string, array of numerical values and sum of numerical values
    """
    quant_str=quant_str.replace("½",".5")
    quant_str=quant_str.replace("1/2",".5")
    quant_str=quant_str.replace("1/3", '.33')
    quant_str=quant_str.replace('1/4','.25')
    quant_str=quant_str.replace('3/4','.75')
    quant_vals=re.findall(r"[-+]?\d*\.\d+|\d+", quant_str)
    total_quant=np.sum([float(i) for i in quant_vals])
    
    return quant_str, quant_vals, total_quant


def convert_to_kg(quant, unit):
    """
    Converts any input unit (kg, lb, grams, ounces) to kilograms
    Returns 0 if the unit is not recognized
    """
    
    if unit in ('kilogram', 'kilograms', 'kg'):
        amnt_kg=quant
    elif unit in ('pound', 'pounds', 'lb', 'lbs'):
        amnt_kg=quant/2.205
    elif unit in ('g', 'gram', 'grams'):
        amnt_kg=quant/1000
    elif unit in ('oz', 'ounce', 'ounces'):
        amnt_kg=quant/35.274
    else:
        amnt_kg=0  # unknown unit: avoids returning an unassigned variable
        
    return amnt_kg


def extract_amount(ings_in, ing_amnt_in,meat_products_in):
    """
    Inputs: 
    ings_in= list of ingredients (and quantities)
    ing_amnt_in =  list of quantities corresponding to ingredients (empty if quantities are included in ings_in)
    meat_products_in = list of products that we are searching for
    
    Outputs:
    meat_ingredients_full = list of meat ingredients (full string)
    meat_ingredients_base = list of meat ingredients (from base string meat_products)
    ing_amnt_out = list of corresponding quantities of meat ingredients in kg (=0 if unit not recognized)
    contains_meat = boolean (True if 1+ ingredients are recognized from meat_products list)
    
    """

    meat_ingredients_full = []
    meat_ingredients_base = []
    ing_amnt_out=[]
    contains_meat=False
    
    #Find meat products present in the ingredients (ignoring capitals with casefold)
    for i in ings_in:
        for meat_product in meat_products_in:
            if i != None:
                if meat_product in i.casefold(): 
                    contains_meat=True
                    meat_ingredients_full.append(i.casefold()) 
                    meat_ingredients_base.append(meat_product)

    #extract amount from string if amount is not directly available in ing_amnt_in and convert to kg
    if not ing_amnt_in:
        for meat_i in meat_ingredients_full:
            meat_i_quant_kg=0
            meat_i, quantity_vals, total_quantity=check_quantity(meat_i) #pass string, return cleaned string and total quantity

            for u in units: 
                if u in meat_i:
                    meat_i_quant_kg = convert_to_kg(total_quantity,u)
            ing_amnt_out.append(meat_i_quant_kg)
            #if meat_i_quant_kg==0:
            #    print('Units not recognized for: '+meat_i)
                
    else:# if amount is directly available through ing_amnt, units converted to kg
        # meat_ingredients_full holds casefolded strings, so look them up in a casefolded copy of ings_in
        ings_folded = [i.casefold() if i is not None else None for i in ings_in]
        for meat_i in meat_ingredients_full:
            meat_i_quant_kg=0

            #get index of ingredient in the ingredient list
            meat_amount=ing_amnt_in[ings_folded.index(meat_i)]
            meat_amount, quantity_vals, total_quantity=check_quantity(meat_amount)
            
            for u in units: 
                if u in meat_amount.casefold():
                    meat_i_quant_kg = convert_to_kg(total_quantity,u)
            ing_amnt_out.append(meat_i_quant_kg)
            #if meat_i_quant_kg==0:
                #print('Units not recognized for: '+meat_amount+meat_i)
                
    return meat_ingredients_full, meat_ingredients_base, ing_amnt_out, contains_meat

#### Data cleaning functions

In [16]:
#cleaning function to remove unnecessary cells in ingredient list
def remove_spaces(l):
    """
    cleaning function to remove unnecessary cells in ingredient list
    """
    while '' in l:
        l.remove('')
    while ' ' in l:
        l.remove(' ')
    return l

#idea: calculate carbon footprint in this function by summing contributions in meat_ingredients
#return a list of all meat ingredients and their amount and a is_true=True if the recipe contains meat
def analyse_meat(ingredient_list, s):
    """
    takes as argument the ingredient_list and the spacer
    returns a list of all meat ingredients in the recipe, and a boolean contains_meat
    """
    meat_ingredients = []
    contains_meat=False
    for ingredient in ingredient_list:
        for meat_product in meat_products:
            if meat_product in ingredient.getText():
                contains_meat=True
                l=ingredient.getText().split(s)
                l=remove_spaces(l)
                l.append(meat_product)
                meat_ingredients.append(l)
    return meat_ingredients, contains_meat


def get_ingredients(ingredient_wrappers):
    """
    returns a list of all ingredients in the recipe
    """
    ingredients = []
    for ingredient in ingredient_wrappers:
        ingredients.append(ingredient.getText())    
    return ingredients
#not sure whether we can give soup and page as arguments..


#### Define carbon footprint of meat ingredients
Animal products are among the largest contributors to a recipe's carbon footprint. We start by assigning a carbon footprint to each meat ingredient; this could later be extended to other animal products.
The function below is a first placeholder for this step.
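
As a hedged sketch of what such an assignment could look like, the snippet below multiplies each meat ingredient's weight by an emission factor and sums the results. The factor values are placeholders chosen for illustration, not figures from the data source, and the names `EMISSION_FACTORS` and `recipe_footprint` are hypothetical.

```python
# Placeholder emission factors in kg CO2e per kg of product.
# These numbers are illustrative assumptions, NOT values from the data source.
EMISSION_FACTORS = {"beef": 30.0, "pork": 10.0, "chicken": 5.0}

def recipe_footprint(meat_ingredients, amounts_kg, factors=EMISSION_FACTORS):
    """Sum amount_kg * factor over the meat ingredients.

    Ingredients without a known factor contribute 0.
    """
    return sum(
        factors.get(name, 0.0) * kg
        for name, kg in zip(meat_ingredients, amounts_kg)
    )

# Example: 0.5 kg of beef and 0.2 kg of chicken.
print(recipe_footprint(["beef", "chicken"], [0.5, 0.2]))
```

The two input lists mirror the shape of `meat_ingredients_base` and `ing_amnt_out` returned by `extract_amount`, so a function along these lines could consume that output directly.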

In [6]:
# calculate carbon footprint
#input ingredients
#identify ingredient amount
#output carbon footprint
def carbon_fp (l):
    """
    takes a list of ingredients contributing to co2 and returns carbon footprint
    placeholder: currently returns the number of ingredients in the list
    """
    c=len(l)
    return c

### Data extraction and cleaning loop
Below we extract the data from the recipes in our HTML dataset and save it in dataframes. Our goal here is to extract the ingredients and assign a carbon-impact rating to the highest-impact ingredients (meat or animal protein) in the recipes.

To identify the animal-source, protein-rich ingredients that dominate a recipe's carbon footprint, we use an additional database listing the main protein sources and their carbon impact. Source of data: [GreenEatz](https://www.greeneatz.com/foods-carbon-footprint.html)
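
One detail worth illustrating is how ingredient strings are matched against the list of protein sources. `extract_amount` uses a plain substring test, under which a keyword such as "ham" also matches "graham". The sketch below shows a whole-word alternative; the keyword list and the function name `find_meat_keywords` are hypothetical, since the notebook's own `meat_products` list is not shown here.

```python
import re

# Hypothetical keyword list, for illustration only.
MEAT_KEYWORDS = ["beef", "ham", "chicken", "pork"]

def find_meat_keywords(ingredient, keywords=MEAT_KEYWORDS):
    """Return the keywords that appear as whole words (optionally plural) in the ingredient."""
    text = ingredient.casefold()
    return [kw for kw in keywords if re.search(rf"\b{re.escape(kw)}s?\b", text)]

print(find_meat_keywords("1 lb ground Beef"))
print(find_meat_keywords("2 cups graham cracker crumbs"))
```

Whole-word matching trades recall for precision: it avoids false positives like "graham", but compound spellings written as one word would need their own keyword.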

In [15]:
#Nadine's loop for all recipes in folder
# data has following row structure
# RecipeName as Identifier - bool contains_meat - list of co2 ingredients - carbonFootprint - ingredients
data=[]
step=0
count_exceptions=0  # counted across all files, so initialised once outside the loop
for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(SAMPLE_DATA_FOLDER+filename) as f:
        isTrue=False
        print('-------------------------------------')
        
        # wrap in try/except so the loop no longer stops on an unrecognised character
        try:
            page = f.read()
            soup = BeautifulSoup(page, 'html.parser')
            print('Recipe Analysed: '+soup.title.string)
            print('filename: ', filename)
            
            # If recognised as recipe page, returns the ingredients and amounts
            ingredient_wrappers, is_recipe, ingredients, ingredient_amounts, tags = analyse_recipe(soup,page)
            
            print('is recipe: ',is_recipe)
            
            #print(ingredients)
            #print(ingredient_amounts)
            
            #different tag for ingredient and different separators in ingredient list between the webpages
            if is_recipe:
                #Extract all ingredients (and ingredient amounts if available)
                #Extract meat ingredients and quantities in kg
                
                # extract meat ingredients, amounts, tags and store in dataframe
                #extract amounts only for meat ingredients
                
                #below: from Alex
                print('ingredient amount = ', ingredient_amounts)
                meat_ingredients_full, meat_ingredients_base, ingredient_quant,contains_meat=extract_amount(ingredients, ingredient_amounts, meat_products)
                print('does this recipe contain meat? ', contains_meat)
                print('ingredients = ',ingredients)
                print('meat ingredients=', meat_ingredients_base)
                print('ingredient_quantity (kg)= ',ingredient_quant)
               
                
                #data.append
                #add row to dataset only if recipe contains meat
                #data.append([soup.title.string, contains_meat, meatlist, carbon_fp(meatlist), ingredients])
            else:
                print('not a recipe')
        except Exception:  # skip files that cannot be read or parsed, but let KeyboardInterrupt through
            count_exceptions=count_exceptions+1
            #print('exception')
    step=step+1
    if step>=100:
        break
#print(data)
# save data as csv 

-------------------------------------
Recipe Analysed: Money-Saving Meals : Recipes and Cooking : Food Network
filename:  0a6650d159357825c5ae98eb323dc9f3.html
This recipe is from FoodNetwork
*******
NEED NEW FORMAT
*******
is recipe:  True
ingredient amount =  []
-------------------------------------
-------------------------------------
Recipe Analysed: 
	Cheese and Onion Pie Recipe - Allrecipes.com

filename:  00ee3c71a71548d65f4c5a1dd573fbc6.html
This recipe is from Allrecipes
Total number of tags: 0
alternative format used
is recipe:  True
ingredient amount =  []
-------------------------------------
Recipe Analysed: 
	Wonton Wrappers Recipe - Allrecipes.com

filename:  0a33c71ae50009983ff80e13f41f02db.html
This recipe is from Allrecipes
Total number of tags: 0
alternative format used
is recipe:  True
ingredient amount =  []
-------------------------------------
Recipe Analysed: Cinnamon Bun Scones Recipe #68967 from CDKitchen.com
filename:  0a7349f0a41597144eecac064221457d.html
*******
NEED NEW FORMAT
*******
is recipe:  False
not a recipe
-------------------------------------
Recipe Analysed: 
	Cranberry Sauce I Recipe - Allrecipes.com

filename:  00dab4137d18d79c978a9673661c6202.html
This recipe is from Allrecipes
Total number of tags: 0
alternative format used
is recipe:  True
ingredient amount =  []
-------------------------------------
Recipe Analysed: Calories in Simple Barley Pudding - Calorie, Fat, Carb, Fiber, and Protein Info
filename:  0a19be4788f6179bea224cd176224722.html
*******
NEED NEW FORMAT
*******
is recipe:  False
not a recipe
-------------------------------------
Recipe Analysed: Black Bean, Corn, And Salsa Dip-Weight Watchers Recipe - Food.com - 256832
filename:  0a2cb17f182c6be115054c8eb1846fc8.html
This recipe is from Food.com
is recipe:  True
ingredient amount =  ['1 (15 1/2 ounce)', '1  cup', '1 (14 1/2 ounce)', '1/2 cup']
-------------------------------------
Recipe Analysed: Can people with wheat allergies eat quinoa - CookEatShare
filename:  0a56ccd728ec1cd8839ed87eec5726f9.html
*******
NEED NEW FORMAT
*******
is recipe:  False
not a recipe
-------------------------------------
Recipe Analysed: 
	Nancy's Chicken in Puff Pastry Recipe - Allrecipes.com

filename:  0a6f0810ee4f1b190063b0f43efdb869.html
This recipe is from Allrecipes
Total number of tags: 0
alternative format used
is recipe:  True
ingredient amount =  []
-------------------------------------
Recipe Analysed: Pork Cacciatore Recipe - Food.com - 14013
filename:  0a6b0e6270730a09c69b802c72af2e9f.html
This recipe is from Food.com
-------------------------------------
-------------------------------------
Recipe Analysed: 
	Fruit Dip Recipes - Allrecipes.com

filename:  00a7450a98369de9097e042172e59fa4.html
This recipe is from Allrecipes
Total number of tags: 0
alternative format used
alternative format 2 needed for Allrecipes
*******
NEED NEW FORMAT
*******
is recipe:  True
ingredient amount =  []
-------------------------------------
Recipe Analysed: 









Fast White-Bean Stew Recipe
 at Epicurious.com
filename:  0aae245a26439c51260bfaf6ab01dcb5.html
This recipe is from Epicurious

In [None]:
   
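#below: from Nadine
# alternative meat detection, originally indented to sit inside the `if is_recipe:` branch of the loop above
# `s` is the separator passed to analyse_meat and must be defined before running this cell
meatlist, contains_meat = analyse_meat(ingredient_wrappers,s)
ingredients = get_ingredients(ingredient_wrappers)
print('Ingredient list: ')
print(ingredients)
if meatlist:
    print('Has meat:')
    print(meatlist)
else:
    print('No meat detected')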