# Meet the meat

## Abstract

With increasingly dire climate change forecasts, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, in particular beef, is one of the highest-impact actions an individual can take. To examine this topic, we will explore the popularity and prevalence of meat in recipes. Specifically, we plan to extract the ingredients from a recipe database and calculate the carbon footprint of the recipes.

Finally, we hope to directly relate this data to the issue of climate change by estimating a rating reflecting the carbon footprint of meat in recipes and the environmental impact of consumers' diets.

### Imports and libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup
import os, os.path as osp
import re  # regular expressions, used when extracting ingredient amounts

In [20]:
DATA_FOLDER='data'
#SAMPLE_DATA_FOLDER = DATA_FOLDER + '/htmlSample/'
SAMPLE_DATA_FOLDER = DATA_FOLDER + '/sample_400/'

## Data cleaning

Our recipe dataset comes from [From Cookies to Cooks](http://infolab.stanford.edu/~west1/from-cookies-to-cooks/), which combines recipes from 14 high-traffic websites. We start by extracting the information we need from the HTML files (title, ingredients, meat or animal-protein ingredients, tags, and ratings) so that we can explore the recipes in more detail.


#### Data cleaning functions

In [446]:
def extract_amount(ings, ing_amnt):
    """
    takes as argument the ingredients and ingredient amounts (if available)
    returns the full text of all meat ingredients in the recipe, the matching base meat products,
    the amount of each meat ingredient in kg (0 if no amount or unit is recognized) and a boolean contains_meat
    relies on the global lists meat_products and units
    """
    meat_ingredients_full = []
    meat_ingredients_base = []
    ing_quant = []
    contains_meat = False

    # Find meat ingredients
    for i in ings:
        if i is None:
            continue
        for meat_product in meat_products:
            if meat_product in i:
                contains_meat = True
                meat_ingredients_full.append(i)
                meat_ingredients_base.append(meat_product)

    for meat_i in meat_ingredients_full:
        # Use the separate amount string if the soup provided one, otherwise the ingredient text itself
        amount = ing_amnt[ings.index(meat_i)] if ing_amnt else meat_i
        meat_i_quant_kg = 0
        # First number in the text (integer or decimal); fractions such as 1/2 are not handled
        match = re.search(r'\d+(?:\.\d+)?', amount)
        if match:
            for u in units:
                if u in amount:
                    meat_i_quant_kg = convert_to_kg(float(match.group()), u)
        ing_quant.append(meat_i_quant_kg)

    return meat_ingredients_full, meat_ingredients_base, ing_quant, contains_meat


    
extract_amount(ingredients,ingredient_amounts)


([], [], [], False)

In [447]:
measurements = ['pound','pounds','grams','gram','oz','kg','kilogram']

def convert_to_kg(quant, unit):
    """
    converts a quantity given in kilograms, pounds, grams or ounces to kg
    returns 0 if the unit is not recognized
    """
    if unit in ('kilogram', 'kg'):
        amnt_kg = quant
    elif unit in ('pound', 'pounds', 'lb', 'lbs'):
        amnt_kg = quant/2.205
    elif unit in ('g', 'gram', 'grams'):
        amnt_kg = quant/1000
    elif unit in ('oz', 'ounce'):
        amnt_kg = quant/35.274
    else:
        # unknown unit: fall back to 0, the value extract_amount uses for unrecognized units
        amnt_kg = 0
    return amnt_kg
    
convert_to_kg(1,'oz')

0.02834949254408346
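
Recipe quantities are often written as fractions or mixed numbers ("1 1/2 pounds"), which a plain digit search does not capture. The following is a minimal sketch of a more tolerant parser, assuming quantities appear as integers, decimals, simple fractions or mixed numbers. The names `QUANTITY_RE` and `parse_quantity` are hypothetical and not part of the notebook.

```python
import re

# Alternatives are tried in this order at each position:
# mixed number ("1 1/2"), fraction ("3/4"), decimal or integer ("0.5", "12").
QUANTITY_RE = re.compile(r'(\d+)\s+(\d+)/(\d+)|(\d+)/(\d+)|(\d+(?:\.\d+)?)')

def parse_quantity(text):
    """Return the first quantity found in text as a float, or None if there is none."""
    match = QUANTITY_RE.search(text)
    if match is None:
        return None
    whole, num, den, frac_num, frac_den, number = match.groups()
    if whole is not None:
        # mixed number such as "1 1/2"
        if int(den) == 0:
            return None
        return int(whole) + int(num) / int(den)
    if frac_num is not None:
        # simple fraction such as "3/4"
        if int(frac_den) == 0:
            return None
        return int(frac_num) / int(frac_den)
    # plain integer or decimal
    return float(number)
```

The returned value could then be passed to `convert_to_kg` together with the detected unit, in place of the raw regex match.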

In [1]:
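def analyse_recipe(soup, page):
    """
    takes soup, page as argument
    returns the ingredient wrapper, a boolean is_recipe, the list of ingredients
    and the list of ingredient amounts (empty if the site does not provide them separately)
    """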
    ings = []
    ing_amnts = []
    
    if 'Allrecipes' in soup.title.string:
        print('This recipe is from Allrecipes')
        is_recipe = True
        ing_wrap=soup.find_all(itemprop="recipeIngredient")
        if not ing_wrap:
            print('alternative format used')
            ing_wrap=soup.find_all('li', class_="plaincharacterwrap ingredient")
            for ing in ing_wrap:
                ings.append(ing.text)

        else:
            print('alternative format 2 needed')
            #for ing in ing_wrap:
            #    ing_amnts.append(ing.find(itemprop='amount').text)
            #    ings.append(ing.find(itemprop='name').text)
              
    elif 'Epicurious' in soup.title.string:
        print('This recipe is from Epicurious')
        is_recipe = True
        ing_wrap=soup.find('div', id="ingredients")
        # find returns None when the page has no ingredients div
        if ing_wrap is not None:
            for ing in ing_wrap:
                ings.append(ing.string)

    elif 'Food Network' in soup.title.string:
        print('This recipe is from FoodNetwork')
        is_recipe = True
        ing_wrap=soup.find_all('li',class_='ingredient')
        for ing in ing_wrap:
            ings.append(ing.text)
        
    elif 'Food.com' in soup.title.string:
        ing_wrap=soup.find_all('li', class_="ingredient")
        is_recipe = True
        for ing in ing_wrap:
            ing_amnts.append((ing.find('span',class_='value').text+ ' '+ing.find('span',class_='type').text))
            ings.append(ing.find('span', class_='name').text)
    
    elif 'Betty Crocker' in soup.title.string:
        is_recipe = True
        ing_wrap=soup.find_all('dl', class_='ingredient')
        for ing in ing_wrap:
            ings.append(ing.getText())
            
    elif 'MyRecipes' in soup.title.string:
        print('This recipe is from MyRecipes')
        ing_wrap=soup.find_all(itemprop="ingredient")
        is_recipe = True
        for ing in ing_wrap:
            ings.append(ing.getText())
    # this really has to be analyzed more closely!!
    else:
        is_recipe = False
        ing_wrap=None

    return ing_wrap, is_recipe, ings, ing_amnts


In [54]:
#cleaning function to remove unnecessary cells in ingredient list
def remove_spaces(l):
    """
    cleaning function to remove unnecessary cells in ingredient list
    """
    while '' in l:
        l.remove('')
    while ' ' in l:
        l.remove(' ')
    return l

#idea: calculate carbon footprint in this function by summing contributions in meat_ingredients
#return a list of all meat ingredients and their amount and contains_meat=True if the recipe contains meat
def analyse_meat(ingredient_list, s):
    """
    takes as argument the ingredient_list and the spacer
    returns a list of all meat ingredients in the recipe, and a boolean contains_meat
    """
    meat_ingredients = []
    contains_meat=False
    for ingredient in ingredient_list:
        for meat_product in meat_products:
            if meat_product in ingredient.getText():
                contains_meat=True
                l=ingredient.getText().split(s)
                l=remove_spaces(l)
                l.append(meat_product)
                meat_ingredients.append(l)
    return meat_ingredients, contains_meat

# calculate carbon footprint
def carbon_fp (l):
    """
    takes a list of ingredients contributing to co2
    placeholder: currently returns the number of such ingredients rather than an actual carbon footprint
    """
    c=len(l)
    return c

def analyze_recipe(soup, page):
    """
    takes soup, page as argument
    returns ingredientwrapper, s=spacer and a boolean is_recipe
    """
    if 'Allrecipes' in soup.title.string:
        ingredient_wrappers=soup.find_all(itemprop="recipeIngredient")
        s=','
        is_recipe = True
    elif 'MyRecipes' in soup.title.string:
        ingredient_wrappers=soup.find_all(itemprop="ingredient")
        s='\n'
        is_recipe = True
    # this really has to be analyzed more closely!!
    else:
        # unknown site: return empty values so the return statement below does not fail
        ingredient_wrappers=[]
        s=None
        is_recipe = False

    #return tags
    return ingredient_wrappers, s, is_recipe


def get_ingredients(ingredient_wrappers):
    """
    returns a list of all ingredients in the recipe
    """
    ingredients = []
    for ingredient in ingredient_wrappers:
        ingredients.append(ingredient.getText())    
    return ingredients
#not sure whether we can give soup and page as arguments..


### Data extraction and cleaning loop
Below we extract the data from the recipes in our HTML dataset and save it in dataframes. Our goal is to extract the ingredients and assign a carbon-impact rating to the highest-impact ingredients (meat or animal protein) in each recipe.
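
The meat lookup in the cleaning functions relies on substring tests such as `meat_product in i`, so a short product name can match inside an unrelated word (for instance, `'ham' in 'graham crackers'` is true). A minimal sketch of a whole-word alternative follows, assuming product names are plain words and that a trailing "s" is the only plural form that matters. The function name `find_meat_products` and the example product list are hypothetical.

```python
import re

def find_meat_products(ingredient, meat_products):
    """Return the entries of meat_products that occur as whole words in ingredient."""
    if ingredient is None:
        return []
    text = ingredient.lower()
    found = []
    for product in meat_products:
        # \b anchors the match at word boundaries; s? tolerates a simple plural
        pattern = r'\b' + re.escape(product.lower()) + r's?\b'
        if re.search(pattern, text):
            found.append(product)
    return found

# Hypothetical product list, for illustration only
example_products = ['ham', 'beef', 'chicken']
print(find_meat_products('1 cup graham cracker crumbs', example_products))
print(find_meat_products('2 pounds ground Beef', example_products))
```

Lower-casing both sides also catches capitalized ingredient names, which a case-sensitive substring test would miss.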

To extract the protein-rich ingredients from animal sources and calculate the main carbon footprint of each recipe, we use an additional database listing the main protein sources and their carbon impact. Source of data: [GreenEatz](https://www.greeneatz.com/foods-carbon-footprint.html)
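
One way the footprint calculation could look is sketched below, assuming the carbon-impact database has been loaded into a dictionary that maps each base meat product to an emission factor in kg CO2e per kg of food. The function name `recipe_footprint`, the dictionary layout, and the numbers in the example are assumptions for illustration; the factors are placeholders, not GreenEatz figures.

```python
def recipe_footprint(meat_ingredients_base, ing_quant, emission_factors):
    """Sum quantity (kg) times emission factor (kg CO2e per kg) over the meat ingredients."""
    total = 0.0
    for product, quantity_kg in zip(meat_ingredients_base, ing_quant):
        factor = emission_factors.get(product)
        if factor is None:
            # no factor known for this product: it contributes nothing to the sum
            continue
        total += quantity_kg * factor
    return total

# PLACEHOLDER factors for illustration only (not taken from the GreenEatz table)
placeholder_factors = {'beef': 10.0, 'chicken': 2.0}
print(recipe_footprint(['beef', 'chicken'], [0.5, 1.0], placeholder_factors))  # 7.0
```

The two list arguments mirror the second and third values returned by `extract_amount`, so its output could be passed in directly once real factors are available.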