# Meet the meat

## Abstract

With increasingly dire climate change forecasts, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, in particular beef, is one of the highest impact actions an individual can take. To examine this topic, we will explore the popularity and prevalence of meat in recipes. Specifically, we plan to extract the ingredients from a recipe database and calculate the carbon footprint of recipes

Finally, we hope to directly relate this data to the issue of climate change by estimating a rating reflecting the carbon footprint of meat in recipes and the environmental impact of consumers' diets.

### Imports and libraries

In [228]:
# Import libraries
import re
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup
import os, os.path as osp

In [156]:
DATA_FOLDER='data'
SAMPLE_DATA_FOLDER = DATA_FOLDER + '/sample_400/'

## Data extraction and cleaning

Our recipe dataset contains recipes from the [From Cookies to Cooks](http://infolab.stanford.edu/~west1/from-cookies-to-cooks/), combining recipes from 14 high-traffic websites. We start by extracting all the information we want from the HTML files, that is: title, ingredients and meat or animal protein ingredients, tags, ratings in order to explore the recipes in more detail.


#### Recipe webpage scraping
The websites' HTML sources are rich in information. However, the information we wantfrom these pages is rather limited. We extract the information we need from the websites, clean and pre-process the data and save it as a CSV file for easy retrieval in further processing.

In [221]:
def analyse_page(soup, page):
    """
    Input: 
        soup
        page: 'allrecipes', 'epicurious', 'food_network', 'food_com', 'betty_crocker', 'my_recipes' , others not implemented yet
    
    Output:
        tags = list of tags assigned to the recipe
        ings = list of ingredients
        ing_amounts = amounts of ingredients
    """
    ings = []
    ing_amnts = []
    tags = []
        
    if page == 'allrecipes':
        # Extract tags
        tag_wrappers = soup.find_all(itemprop="recipeCategory")
        for tag in tag_wrappers:
            tags.append(tag['content'])           
        # Extract ingredients
        ing_wrap=soup.find_all('li', class_="plaincharacterwrap ingredient")
        if ing_wrap:
            for ing in ing_wrap:
                ings.append(ing.getText())
        else:
            ing_wrap=soup.find_all(itemprop="recipeIngredient")
            for ing in ing_wrap:
                ings.append(ing.getText())
            if not ing_wrap:
                print('alternative format needed for Allrecipes')
            #for ing in ing_wrap:
            #    ing_amnts.append(ing.find(itemprop='amount').text)
            #    ings.append(ing.find(itemprop='name').text)
        
        
    elif page == 'epicurious':       
        # Extract tags
        tag_wrappers = soup.find_all(itemprop="recipeCuisine")
        for tag in tag_wrappers:
            tags.append(tag.getText())    
        tag_wrappers = soup.find_all(itemprop="recipeCategory")
        for tag in tag_wrappers:
            tags.append(tag.getText())        
        # Extract ingredients
        ing_wrap=soup.find('div', id="ingredients")
        for ing in ing_wrap:
            ings.append(ing.string)
      
    
    elif page == 'food_network':  
        # Extract tags
        tag_wrappers = soup.find_all(class_="btn grey-tags")        
        for tag in tag_wrappers:
            tags.append(tag.getText())      
        # Extract ingredients
        ing_wrap=soup.find_all('li',class_='ingredient')
        for ing in ing_wrap:
            ings.append(ing.text)

    elif page == 'food_com':      
        # Extract tags
            #not found          
        # Extract ingredients
        ing_wrap=soup.find_all('li', class_="ingredient")
        if ing_wrap:
            for ing in ing_wrap:
                ing_amnts.append((ing.find('span',class_='value').text+ ' '+ing.find('span',class_='type').text))
                #ings.append(ing.find('span', class_='name').text)
                #ings.append((ing.find('span',class_='value').text+ ' '+ing.find('span',class_='type').text + ' ' + ing.find('span', class_='name').text)
        else:
            ing_wrap=soup.find_all(class_="name")
            for ing in ing_wrap:
                ings.append(ing.getText())
    
    elif page == 'betty_crocker':   
        # Extract tags
            #not found    
        # Extract ingredients
        ing_wrap=soup.find_all('dl', class_='ingredient')
        for ing in ing_wrap:
            ings.append(ing.getText())
    
    
    elif page == 'my_recipes':
        # Extract tags
        tag_wrappers = soup.find_all(itemprop="recipeType")
        for tag in tag_wrappers:
            tags.append(tag.getText())  
        # Extract ingredients
        ing_wrap=soup.find_all(itemprop="ingredient")
        for ing in ing_wrap:
            ings.append(ing.text)
        
    #other websites    
        # Extract tags   
        # Extract ingredients 
        
    if not ing_wrap:  #return warning if website is recognized but format/data extraction is not successful
        print('*******')
        print('NEED NEW FORMAT')  
        print('*******')
    
    #if not tags:
        #print('no tags found :( ')
        
    return tags, ings, ing_amnts

    

In [89]:
def find_website(soup):
    """
    Finds if the page is a recipe and which website it comes from
    """
    is_recipe = True
    
    if 'Allrecipes' in soup.title.string:
        website = 'allrecipes'               
              
    elif 'Epicurious' in soup.title.string:
        website = 'epicurious'
    
    elif 'Food Network' in soup.title.string:
        website = 'food_network'
        
    elif 'Food.com' in soup.title.string:
        website == 'food_com'
    
    elif 'Betty Crocker' in soup.title.string:
        website = 'betty_crocker'
               
    elif 'MyRecipes' in soup.title.string:
        website = 'my_recipes'

    else:
        website = 'not found'
        is_recipe = False
        
    return is_recipe, website

#### Quantity extraction and conversion
The amounts of each ingredients are expressed in many different units (imperial or metric) depending on the websites, and even on the recipes. Once we have extracted the ingredients and amounts, we need to convert all different quantities to one single weight unit (fixed to kilograms) in order to process the carbon footprint of selected ingredients.

In [204]:
def check_quantity(quant_str):
    """
    Cleans input string and extracts numerical values
    Outputs cleaned string, array of numerical values and sum of numerical values
    """
    quant_str=quant_str.replace("½",".5")
    quant_str=quant_str.replace("1/2",".5")
    quant_str=quant_str.replace("1/3", '.33')
    quant_str=quant_str.replace('1/4','.25')
    quant_str=quant_str.replace('3/4','.75')
    quant_vals=re.findall(r"[-+]?\d*\.\d+|\d+", quant_str)
    total_quant=np.sum([float(i) for i in quant_vals])
    
    return quant_str, quant_vals, total_quant


def convert_to_kg(quant, unit):
    """
    Converts any input unit (kg, lb, grams, ounces) to kilograms
    """
    
    if (unit=='kilogram') or (unit=='kg'):
        amnt_kg=quant
        #print(quant,'kg')
    elif (unit=='pound') or (unit=='lb') or (unit=='lbs') or (unit=='pounds'):
        amnt_kg=quant/2.205
        #print(amnt_kg,'kg')
    elif(unit=='g') or (unit=='gram') or (unit =='grams'):
        amnt_kg=quant/1000
        #print(amnt_kg,'kg')     
    elif(unit=='oz') or (unit=='ounce'):
        amnt_kg=quant/35.274
        #print(amnt_kg, 'kg')
        
    return(amnt_kg)

def contains_meat_ingredients(ings_in, meat_products_in):
    contains_meat=False
    meat_ingredients=[]
    #meat_ingredients=[False]*len(meat_products_in)
    j=0
    for i in ings_in:
        for meat_product in meat_products_in:
            if i != None:
                if meat_product in i.casefold(): 
                    contains_meat=True
                    meat_ingredients.append(meat_product)        
                    #meat_ingredients[j]=True
        j = j+1
                    
    return contains_meat, meat_ingredients

def extract_meat(ings_in, meat_products_in):
    """
    Inputs: 
    ings_in= list of ingredients (and quantities)
    meat_products_in = list of products that we are searching for
    
    Outputs:
    meat_ingredients_full = list of meat ingredients (full string)
    meat_ingredients_base = list of meat ingredients (from base string meat_products)
    ing_amnt_out = list of corresponding quanities of meat ingredients in kg (=0 if unit not recognized)
    contains_meat = boolean (True if 1+ ingredients are recognized from meat_products list)
    
    """

    meat_ingredients_full = []
    meat_ingredients_base = []
    ing_amnt_out=[]
    contains_meat=False
    
    #Find meat products present in the ingredients (ignoring capitals with casefold)
    for i in ings_in:
        for meat_product in meat_products_in:
            if i != None:
                if meat_product in i.casefold(): 
                    contains_meat=True
                    meat_ingredients_full.append(i.casefold()) 
                    meat_ingredients_base.append(meat_product)

    #extract amount from string and convert to kg
    for meat_i in meat_ingredients_full:
        meat_i_quant_kg=0
        meat_i, quantity_vals, total_quantity=check_quantity(meat_i) #pass string, return cleaned string and total quantity

        for u in units: 
            if u in meat_i:
                meat_i_quant_kg = convert_to_kg(total_quantity,u)
        ing_amnt_out.append(meat_i_quant_kg)
        #if meat_i_quant_kg==0:
        #    print('Units not recognized for: '+meat_i)
                    
    
    return meat_ingredients_full, meat_ingredients_base, ing_amnt_out, contains_meat

In [229]:
def extract_amount(ings_in, ing_amnt_in,meat_products_in):
    """
    Inputs: 
    ings_in= list of ingredients (and quantities)
    ing_amnt_in =  list of quantities corresponding to ingredients (empty if quanitites are included in ings_in)
    meat_products_in = list of products that we are searching for
    
    Outputs:
    meat_ingredients_full = list of meat ingredients (full string)
    meat_ingredients_base = list of meat ingredients (from base string meat_products)
    ing_amnt_out = list of corresponding quanities of meat ingredients in kg (=0 if unit not recognized)
    contains_meat = boolean (True if 1+ ingredients are recognized from meat_products list)
    
    """

    meat_ingredients_full = []
    meat_ingredients_base = []
    ing_amnt_out=[]
    contains_meat=False
    
    #Find meat ingredients
    for i in ings_in:
        for meat_product in meat_products_in:
            if i != None:
                #are any of the meat products present in the ingredients? (ignoring capitals with casefold)
                if meat_product in i.casefold(): 
                    contains_meat=True
                    meat_ingredients_full.append(i.casefold()) 
                    meat_ingredients_base.append(meat_product)

    #extract amount from string if amount is not directly available in ing_amnt_in
    if not ing_amnt_in: #if amount is empty 
        for meat_i in meat_ingredients_full:
            meat_i_quant_kg=0
            
            #print('------------')
            #print('Ingredient: ', meat_i) 
            meat_i, quantity_vals, total_quantity=check_quantity(meat_i) #pass string, return cleaned string and total quantity

            
            #find units and convert to kg
            for u in units: 
                if u in meat_i:
            #        print('Quantity in '+ u + ' converted to ')
                    meat_i_quant_kg = convert_to_kg(total_quantity,u)
            ing_amnt_out.append(meat_i_quant_kg)
            #if meat_i_quant_kg==0:
            #    print('Units not recognized for: '+meat_i)
                
                
    else:# if amount is directly available through ing_amnt
        #print('Amount available')
        for meat_i in meat_ingredients_full:
            meat_i_quant_kg=0
            #print('------------')
            #print('Ingredient: ', meat_i)
            
            #get index of ingredient in meat_ingredients_full
            meat_amount=ing_amnt_in[ings_in.index(meat_i)]
            meat_amount, quantity_vals, total_quantity=check_quantity(meat_amount)
            
            #find units and convert to kg
            for u in units: 
                if u in meat_amount.casefold():
                    meat_i_quant_kg = convert_to_kg(total_quantity,u)
                    #print('Quantity in '+ u + ' converted to ', meat_i_quant_kg, 'kg')
            ing_amnt_out.append(meat_i_quant_kg)
            #if meat_i_quant_kg==0:
                #print('Units not recognized for: '+meat_amount+meat_i)
    return meat_ingredients_full, meat_ingredients_base, ing_amnt_out, contains_meat


    
extract_amount(ingredients,ingredient_amounts, meat_products)


UnboundLocalError: local variable 'amnt_kg' referenced before assignment

#### Define carbon footprint of meat ingredients
Animal agriculture is one of the leading sources of the carbon-impact of a recipe. We start by assigning a carbon footprint to each meat ingredient and could later on extend it to other animal products. 
The functions below assign a carbon footprint to each meat ingredient of the recipes.

Source of data: [GreenEatz](https://www.greeneatz.com/foods-carbon-footprint.html)

In [172]:
#Load data from xls file
carbon_footprint = pd.read_excel('data/carbon_footprint_protein.xls', sheet_name='meat_dairy_eggs', index_col=0)
carbon_footprint 

Unnamed: 0_level_0,Food,CO2 Kilos Equivalent,Car Miles Equivalent
Rank,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Lamb,39.2,91
2,Beef,27.0,63
3,Cheese,13.5,31
4,Pork,12.1,28
5,Turkey,10.9,25
6,Chicken,6.9,16
7,Tuna,6.1,14
8,Eggs,4.8,11


In [223]:
#List of meat ingredients
meat_products = carbon_footprint['Food'].tolist()
#same list copied without caps
meat_products = ['lamb', 'beef', 'cheese', 'pork', 'turkey', 'chicken', 'tuna', 'egg']

In [224]:
units = ['pounds','grams','oz','ounces','kg','kilograms','lbs' ]

In [23]:
# calculate carbon footprint
#input ingredients and amounts
#output carbon footprint
def carbon_fp (l):
    """
    takes a list of ingredients contributing to co2 and returns carbon footprint
    """
    c=len(l)
    return c

### Data extraction and cleaning loop
Below we extract the data from the recipes of our html dataset and save it in dataframes. Our goal here is to extract the ingredients and assign a carbon-impact rating to the highest impact ingredients (meat or animal protein) in the recipes.

To extract protein-rich ingredients from animal source in order to calculate the main carbon footprint of the recipe, we use an extra database listing the main protein sources and carbon impact. Source of data: [GreenEatz](https://www.greeneatz.com/foods-carbon-footprint.html)

In [226]:
#Loop for all recipes in folder
# data has following row structure
# RecipeName as Identifier - bool contains_meat - list of co2 ingredients - carbonFootprint - ingredients
data=[]
step=0

verbose = 1 #verbose outputs

for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(SAMPLE_DATA_FOLDER+filename) as f:
        isTrue=False
        count_exceptions=0
        
        try:
            page = f.read()
            soup = BeautifulSoup(page, 'html.parser')
            
            #check webpage and extract ingredients if recognised as recipe
            is_recipe, website = find_website(soup)
            print('This recipe is from: '+website)            
            
            
            if is_recipe:
                
                #tags, ingredients = analyse_page(soup, website)
                tags, ingredients, ingredient_amounts = analyse_page(soup, website)

                if ingredients:
                    
                    
                    has_meat, meat_ingredients = contains_meat_ingredients(ingredients, meat_products)
                    
                    if has_meat:

                        #Extract all ingredients (and ingredient amounts if available)
                        #Extract meat ingredients and quantities in kg
                        print('ingredient ammount = ', ingredient_amounts)
                        meat_ingredients_full, meat_ingredients_base, ingredient_quant,contains_meat=\
                                                    extract_amount(ingredients, ingredient_amounts, meat_products)
                        print('does this recipe contain meat? ', contains_meat)
                        print('ingredients = ',ingredients)
                        print('meat ingredients=', meat_ingredients_base)
                        print('ingredient_quantity (kg)= ',ingredient_quant)
                        

                    if verbose: 
                        print('Recipe Analysed: '+soup.title.string)
                        
                        print('contains meat:'+str(has_meat))
                        print(meat_ingredients)
                        
                        print('{0} Ingredients: '.format(len(ingredients)))
                        print(ingredients)

                        print('{0} tags:'.format(len(tags)))
                        print(tags)
                            
                data.append([soup.title.string, has_meat, meat_ingredients, tags])

            #else:
                #print('not a recipe')
        except:
            count_exceptions=count_exceptions+1
            #print('Exception')
    step=step+1
    
    if verbose: 
        print('-------------------------------------')
        
    if step>=100:
        break

column_labels=['Recipe Title', 'Has meat', 'Meat types', 'Tags']#missing: 'Carbon footprint', 'Rating', 'Tags'
recipes_df = pd.DataFrame(data, columns = column_labels)

#save the data as csv for in depth analysis
#recipes_df.to_csv(DATA_FOLDER+'/recipes_data')

recipes_df

This recipe is from: food_network
*******
NEED NEW FORMAT
*******
-------------------------------------
-------------------------------------
This recipe is from: allrecipes
-------------------------------------
This recipe is from: allrecipes
-------------------------------------
This recipe is from: not found
-------------------------------------
This recipe is from: allrecipes
-------------------------------------
-------------------------------------
This recipe is from: epicurious
-------------------------------------
This recipe is from: allrecipes
-------------------------------------
-------------------------------------
This recipe is from: allrecipes
-------------------------------------
This recipe is from: epicurious
Recipe Analysed: 









Harissa-Crusted Tri-Tip Roast Recipe
 at Epicurious.com
contains meat:False
[]
15 Ingredients: 
['\n', None, '\n', None, '\n', None, '\n', None, '\n', 'print a shopping list for this recipe', '\n', None, '\n', None, '\n']
0 tags:
[]
-

-------------------------------------
This recipe is from: allrecipes
-------------------------------------
-------------------------------------
This recipe is from: not found
-------------------------------------
This recipe is from: betty_crocker
-------------------------------------
This recipe is from: not found
-------------------------------------
This recipe is from: not found
-------------------------------------
This recipe is from: not found
-------------------------------------
-------------------------------------
This recipe is from: allrecipes
-------------------------------------
-------------------------------------
-------------------------------------
This recipe is from: allrecipes
alternative format needed for Allrecipes
*******
NEED NEW FORMAT
*******
-------------------------------------
This recipe is from: epicurious
Recipe Analysed: 









Fast White-Bean Stew Recipe
 at Epicurious.com
contains meat:False
[]
11 Ingredients: 
['\n', None, '\n', None, '\n', '

Unnamed: 0,Recipe Title,Has meat,Meat types,Tags
0,Money-Saving Meals : Recipes and Cooking : Foo...,False,[],[]
1,Harissa-Crusted Tri-Tip Roast Recipe...,False,[],[]
2,Savory Grilled Chicken Sauce Recipe : Paula De...,False,[],[]
3,Tisane Recipe | MyRecipes.com,False,[],"[Beverages, Beverages, Nonalcoholic, Quick/Eas..."
4,Baked Asparagus with Balsamic Butter Sauce R...,False,[],[]
5,Oyster Stew Recipe - Allrecipes.com,False,[],[]
6,Inside Out Stuffed Peppers Recipe - Allrecip...,False,[],[]
7,Progresso® Bread Crumb Recipes - Betty Crocker,False,[],[]
8,Simple Collard Greens Recipe | MyRecipes.com,False,[],"[Side Dishes/Vegetables, Entertaining, Make-Ah..."
9,Cranberry Sauce I Recipe - Allrecipes.com,False,[],[]


#### Data cleaning functions and other unused lines of code 

In [None]:
#cleaning function to remove unnecessary cells in ingredient list
def remove_spaces(l):
    """
    cleaning function to remove unnecessary cells in ingredient list
    """
    while '' in l:
                l.remove('')
    while ' ' in l:
                l.remove(' ')
    return l 

#idea: calculate carbon foodprint in this function by summing contributions in meat_ingredients
#return a list of all meat ingredients and their amount and a is_true=True if the recipe contains meat
def analyse_meat(ingredient_list, s):
    """
    takes as argument the ingredient_list and the spacer
    returns a list of all meat ingredients in the recipe, and a boolean contains_meat
    """
    meat_ingredients = []
    contains_meat=False
    for ingredient in ingredient_list:
        for meat_product in meat_products:
            if meat_product in ingredient.getText():
                contains_meat=True
                l=ingredient.getText().split(s)
                l=remove_spaces(l)
                l.append(meat_product)
                meat_ingredients.append(l)
    return meat_ingredients, contains_meat


def get_ingredients(ingredient_wrappers):
    """
    returns a list of all ingredients in the recipe
    """
    ingredients = []
    for ingredient in ingredient_wrappers:
        ingredients.append(ingredient.getText())    
    return ingredients
#not sure whether we can give soup and page as arguments..
