# Who are the carnivores?

### Abstract

With increasingly dire climate change forecasts, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, in particular beef, is one of the highest-impact actions an individual can take. To examine this topic, we will explore trends in meat consumption in the U.S. by analyzing the prevalence of meat in recipes viewed online. Specifically, we plan to extract the ingredients, and the time and location of clicks, from a recipe database. Using this information, we will explore the link between meat consumption and key factors such as time of year, rural versus urban location, average regional income, and historic events (e.g., the Paris Climate Agreement or a mad cow disease outbreak). Finally, we hope to relate this data directly to climate change by estimating a rating that reflects the carbon footprint of the meat in recipes and the environmental impact of consumers' diets.


### Imports and libraries

In [150]:
# Import libraries
from bs4 import BeautifulSoup
import requests  # used further down to download sample pages
import os, os.path as osp

In [337]:
#DATA_FOLDER='data'
#SAMPLE_DATA_FOLDER = DATA_FOLDER + '/htmlSample/'
SAMPLE_DATA_FOLDER='htmlSample/'
SAMPLE_FILE='data/htmlSample/ff6da2b8d426c56ae77beda595bdcfea.html' #Recipe
SAMPLE_FILE_2='data/htmlSample/ff727d984c9c0048def173c4c97ab52e.html' #Allrecipes chocolate cheesecake; ingredients are not extracted yet

In [372]:
def remove_spaces(l):
    """
    cleaning function to remove unnecessary cells ('' and ' ') in ingredient list
    """
    # slice assignment modifies the list in place, like the remove() calls it replaces
    l[:] = [cell for cell in l if cell not in ('', ' ')]
    return l

#idea: calculate carbon footprint in this function by summing contributions in meat_ingredients
#return a list of all meat ingredients and their amount, and contains_meat=True if the recipe contains meat
def analyse_meat(ingredient_list, s):
    """
    takes as argument the ingredient_list (BeautifulSoup tags) and the separator s
    returns a list of all meat ingredients in the recipe, and a boolean contains_meat
    relies on the global list meat_products, which must be defined before the call
    """
    meat_ingredients = []
    contains_meat = False
    for ingredient in ingredient_list:
        text = ingredient.getText()
        for meat_product in meat_products:
            if meat_product in text:
                contains_meat = True
                meat_ingredients.append(remove_spaces(text.split(s)))
                break  # count each ingredient once, even if several keywords match
    return meat_ingredients, contains_meat

# calculate carbon footprint
def carbon_fp(l):
    """
    takes a list of ingredients contributing to CO2 and returns the carbon footprint
    placeholder for now: simply counts the ingredients in the list
    """
    return len(l)

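def analyze_recipe(soup, page):
    """
    takes soup, page as argument (page is currently unused)
    returns ingredient_wrappers, s (separator inside the ingredient text), is_recipe
    """
    # defaults, so that all three return values exist for pages that are not recipes
    ingredient_wrappers, s, is_recipe = [], None, False
    title = soup.title.string if soup.title and soup.title.string else ''
    if 'Allrecipes' in title:
        ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")
        s = ','
        is_recipe = True
    # elif, so that a match on Allrecipes is not overwritten by a later branch
    elif 'MyRecipes' in title:
        ingredient_wrappers = soup.find_all(itemprop="ingredient")
        s = '\n'
        is_recipe = True
    # this really has to be analyzed more closely!!
    return ingredient_wrappers, s, is_recipe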

def get_ingredients(ingredient_wrappers):
    """
    returns a list of all ingredients in the recipe
    """
    ingredients = []
    for ingredient in ingredient_wrappers:
        ingredients.append(ingredient.getText())    
    return ingredients
#not sure whether we can give soup and page as arguments..
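
The `carbon_fp` placeholder above only counts meat ingredients. A minimal sketch of the "sum the contributions" idea could look like the following. It assumes ingredient lines are plain strings (not the split lists that `analyse_meat` returns), and the function name `carbon_score` and the weights in the usage line are hypothetical placeholders, not measured emission factors.

```python
def carbon_score(meat_ingredients, weights, default_weight=1.0):
    """Sum one relative weight per meat ingredient line.

    meat_ingredients: list of ingredient strings, e.g. '1 pound ground beef'
    weights: dict mapping a keyword to a relative weight (caller-supplied)
    default_weight: used when no keyword of `weights` occurs in the line
    """
    total = 0.0
    for text in meat_ingredients:
        lowered = text.lower()
        matched = [w for keyword, w in weights.items() if keyword in lowered]
        # if several keywords match, take the largest weight for that line
        total += max(matched) if matched else default_weight
    return total


# Placeholder weights for illustration only (NOT real emission data)
placeholder_weights = {"beef": 3.0, "chicken": 1.0}
print(carbon_score(["1 pound ground beef", "2 chicken breasts"], placeholder_weights))
```

Real weights would have to come from a published emissions dataset, and the amounts in each ingredient line would need to be taken into account as well.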


In [373]:
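# keywords are matched as substrings of the ingredient text
# note: 'vanilla' is not a meat product; it appears to serve as a test keyword for the sample pages
meat_products = ['meat', 'chicken', 'fish', 'vanilla','lamb','Lamb']
data_list=[] #title, carbon footprint, counts, location etc.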

In [374]:
#REALDEAL - LOOP - TO WRITE DATAFRAME
# data has following row structure
# RecipeName as Identifier - bool contains_meat - list of co2 ingredients - carbonFootprint - ingredients
data=[]
count_exceptions=0  # initialised once, outside the loop, so that it counts across all files
for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(osp.join(SAMPLE_DATA_FOLDER, filename)) as f:
        # try/except so that the loop no longer stops when a file contains an unrecognized character
        try:
            page = f.read()
            soup = BeautifulSoup(page, 'html.parser')
            #different tag for ingredient and different separators in ingredient list between the webpages
            ingredient_wrappers, s, is_recipe = analyze_recipe(soup,page)
            if is_recipe:
                meatlist, contains_meat = analyse_meat(ingredient_wrappers,s)
                ingredients = get_ingredients(ingredient_wrappers)
                #add a row for every recognized recipe; contains_meat flags the ones with meat
                data.append([soup.title.string, contains_meat, meatlist, carbon_fp(meatlist), ingredients])
        except Exception:
            count_exceptions += 1
print(data)
# save data as csv file once everything is working fine

[['Red Raspberry Velvet Cake Recipe | MyRecipes.com', True, [['1 teaspoon', ' vanilla extract'], ['1 teaspoon', ' vanilla extract']], 2, ['\n \n Cooking spray\n \n', '\n3 cups\n sifted cake flour\n \n', '\n2 tablespoons\n unsweetened cocoa\n \n', '\n1 teaspoon\n baking soda\n \n', '\n1 teaspoon\n baking powder\n \n', '\n1/2 teaspoon\n salt\n \n', '\n1 2/3 cups\n granulated sugar\n \n', '\n1/2 cup\n butter, softened\n \n', '\n4 \n large egg whites\n \n', '\n2 cups\n fat-free buttermilk\n \n', '\n1 \n (1-ounce) bottle red food coloring\n \n', '\n1 teaspoon\n vanilla extract\n \n', '\n7 ounces\n 1/3-less-fat cream cheese\n \n', '\n1 teaspoon\n vanilla extract\n \n', '\n2 3/4 cups\n powdered sugar\n \n', '\n1/2 cup\n seedless raspberry jam\n \n']]]
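
For the "save data as csv" step, here is a minimal sketch using only the standard library. It assumes rows shaped like the `data` list above; the column names and the helpers `save_rows` and `load_rows` are our own hypothetical choices. The nested lists are stored as JSON strings so that they survive the round trip.

```python
import csv
import json

# Column names are an assumption, following the row structure described above
COLUMNS = ["title", "contains_meat", "meat_ingredients", "carbon_footprint", "ingredients"]


def save_rows(rows, path):
    """Write rows shaped like the `data` list to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for title, contains_meat, meat, footprint, ingredients in rows:
            # lists are serialized as JSON so that they fit into a single cell
            writer.writerow([title, contains_meat, json.dumps(meat),
                             footprint, json.dumps(ingredients)])


def load_rows(path):
    """Read the CSV back into the same list-of-lists structure."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for rec in csv.DictReader(f):
            rows.append([rec["title"],
                         rec["contains_meat"] == "True",
                         json.loads(rec["meat_ingredients"]),
                         int(rec["carbon_footprint"]),
                         json.loads(rec["ingredients"])])
    return rows
```

Calling `save_rows(data, 'recipes.csv')` after the loop would write one line per recognized recipe; the file name is only an example.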


In [375]:
#PLAYGROUND WITH THE SAME LOOP, BUT PRINTS AND STUFF FOR DEBUGGING AND EXPERIMENTING
# data has following row structure
# RecipeName as Identifier - bool contains_meat - list of co2 ingredients - carbonFootprint - ingredients
data=[]
count_exceptions=0  # initialised once, outside the loop, so that it counts across all files
for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(osp.join(SAMPLE_DATA_FOLDER, filename)) as f:
        # try/except so that the loop no longer stops when a file contains an unrecognized character
        try:
            page = f.read()
            soup = BeautifulSoup(page, 'html.parser')
            #different tag for ingredient and different separators in ingredient list between the webpages
            ingredient_wrappers, s, is_recipe = analyze_recipe(soup,page)
            print('is recipe: ',is_recipe)
            if is_recipe:
                print('enter')
                meatlist, contains_meat = analyse_meat(ingredient_wrappers,s)
                ingredients = get_ingredients(ingredient_wrappers)
                print(ingredients)
                print('Title: ', soup.title.string)
                print('filename: ', filename)
                #add a row for every recognized recipe; contains_meat flags the ones with meat
                data.append([soup.title.string, contains_meat, meatlist, carbon_fp(meatlist), ingredients])
        except Exception:
            count_exceptions += 1
print(data)
# save data as csv

is recipe:  False
is recipe:  False
is recipe:  True
enter
['\n \n Cooking spray\n \n', '\n3 cups\n sifted cake flour\n \n', '\n2 tablespoons\n unsweetened cocoa\n \n', '\n1 teaspoon\n baking soda\n \n', '\n1 teaspoon\n baking powder\n \n', '\n1/2 teaspoon\n salt\n \n', '\n1 2/3 cups\n granulated sugar\n \n', '\n1/2 cup\n butter, softened\n \n', '\n4 \n large egg whites\n \n', '\n2 cups\n fat-free buttermilk\n \n', '\n1 \n (1-ounce) bottle red food coloring\n \n', '\n1 teaspoon\n vanilla extract\n \n', '\n7 ounces\n 1/3-less-fat cream cheese\n \n', '\n1 teaspoon\n vanilla extract\n \n', '\n2 3/4 cups\n powdered sugar\n \n', '\n1/2 cup\n seedless raspberry jam\n \n']
Title:  Red Raspberry Velvet Cake Recipe | MyRecipes.com
filename:  ff6da2b8d426c56ae77beda595bdcfea.html
is recipe:  False
is recipe:  False
[['Red Raspberry Velvet Cake Recipe | MyRecipes.com', True, [['1 teaspoon', ' vanilla extract'], ['1 teaspoon', ' vanilla extract']], 2, ['\n \n Cooking spray\n \n', '\n3 cups\n sifte

In [351]:
# Sample file 2 is the Allrecipes chocolate cheesecake: its ingredients don't get extracted, although it contains vanilla!
with open(SAMPLE_FILE_2) as f:
    page = f.read()
    soup = BeautifulSoup(page, 'html.parser')
    
soup

<!-- ff727d984c9c0048def173c4c97ab52e.html http://allrecipes.com/Recipe/chocolate-cheesecake/detail.aspx //-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!--[if lt IE 7 ]> <html class="ie6" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if IE 7 ]>    <html class="ie7" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if IE 8 ]>    <html class="ie8" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if IE 9 ]>    <html class="ie9" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html xmlns="http://www.w3.org/1999/xhtml"> <!--<![endif]-->
<!-- ARLOG SERVER:WEB704 LOCAL_IP: 192.168.5.174 REMOTE_IP:131.107.192.193 TYPESPECIFICID: 91716 MERCH_KEY: MerchData_4_2_49_0_***_10_16_18_34_35_36_38_43_47_48_49_50_51_59 -->
<head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content='(pics-1.1 "http://www.icra.org/pics/vocabularyv03/" l ge

In [370]:
# Same Allrecipes cheesecake page: searching for itemprop="ingredient" returns no ingredients
with open(SAMPLE_FILE_2) as f:
    page = f.read()
soup = BeautifulSoup(page, 'html.parser')
ingredient_wrappers = soup.find_all(itemprop="ingredient")
ingredient_wrappers

[]

## First step: data loading and cleaning

Goal: end up with a dataframe containing ingredients, clicks, and other features for each recipe.

Start with one HTML file, then scale up to 10-100 files, then the whole folder.
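
One way to scale up gradually is to cap the number of files that the loop sees. The sketch below is only an illustration: the helper name `iter_html_files` and its `limit` parameter are hypothetical, and it assumes the sample files end in `.html`, as the two sample files above do.

```python
import os
from itertools import islice


def iter_html_files(folder, limit=None):
    """Yield paths of the .html files in `folder`, in sorted order.

    limit: maximum number of files to yield; None means the whole folder.
    """
    names = sorted(name for name in os.listdir(folder) if name.endswith(".html"))
    for name in islice(names, limit):
        yield os.path.join(folder, name)
```

Starting with `iter_html_files(SAMPLE_DATA_FOLDER, limit=1)`, then `limit=100`, then no limit would follow the plan above without changing the rest of the loop.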

In [7]:
r.text

''

Below, we use BeautifulSoup to extract features from the HTML file. I took a sample page from the web as I couldn't easily access local files.
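
Reading the local sample files can fail on characters that do not decode with the default encoding, which seems to be what the try/except in the loops above guards against. A minimal sketch of a more tolerant reader follows; the helper name `read_html` and the choice of fallback encodings are assumptions, not something the dataset documents.

```python
def read_html(path, encodings=("utf-8", "latin-1")):
    """Return the text of an HTML file, trying several encodings in turn."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # last resort: keep going and mark undecodable bytes with U+FFFD
    return raw.decode("utf-8", errors="replace")
```

The returned string can be passed to `BeautifulSoup(..., 'html.parser')` exactly like `r.text`.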

In [363]:
URL = 'https://www.allrecipes.com/recipe/234502/vegan-waffles/'

In [364]:
r = requests.get(URL)
page_body = r.text

In [365]:
#This is how we get a BeautifulSoup object from the HTML of our recipe web page!
soup = BeautifulSoup(page_body, 'html.parser')

In [366]:
#And here is how we read the title!
soup.title.string

'Vegan Waffles Recipe - Allrecipes.com'

In [12]:
soup


<!DOCTYPE html>

<html lang="en-us">
<head>
<title>Vegan Waffles Recipe - Allrecipes.com</title>
<script async="true" src="https://secureimages.allrecipes.com/assets/deployables/v-1.161.0.4957/karma.bundled.js"></script>
<!--Make our website baseUrl available to the client-side code-->
<script type="text/javascript">
        var AR = AR || {};
        AR.baseWebsiteUrl = 'https://www.allrecipes.com';
    </script>
<script type="text/javascript">
        //Remove Ref_Hub from session after first recipe visited
        var hubId = window.sessionStorage["Ref_Hub_Id"];
        var count = window.sessionStorage["Ref_Hub_Recipe_Count"];
        if (hubId && count) {
            if (count > 0) {
                window.sessionStorage.removeItem("Ref_Hub_Id");
                window.sessionStorage.removeItem("Ref_Hub_Recipe_Count");
            }
        }
    </script>
<meta content="Vegan Waffles Recipe" property="og:title"/>
<meta content="Allrecipes" property="og:site_name"/>
<meta charset

In [367]:
#Now let's try to extract the ingredients. This vegan recipe won't contain meat!

#example from the tutorial
#publications_wrappers = soup.find_all('li', class_='entry')


#ingredient_wrappers = soup.find_all(class_="recipe-ingred_txt added")

ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")

print('Total number of ingredients: {0}'.format(len(ingredient_wrappers)))

Total number of ingredients: 11


In [368]:
ingredient_wrappers

[<span class="recipe-ingred_txt added" data-id="2496" data-nameid="2496" itemprop="recipeIngredient">6 tablespoons water</span>,
 <span class="recipe-ingred_txt added" data-id="21042" data-nameid="21042" itemprop="recipeIngredient">2 tablespoons flax seed meal</span>,
 <span class="recipe-ingred_txt added" data-id="6057" data-nameid="6057" itemprop="recipeIngredient">1 cup rolled oats</span>,
 <span class="recipe-ingred_txt added" data-id="2879" data-nameid="2879" itemprop="recipeIngredient">1 3/4 cups soy milk</span>,
 <span class="recipe-ingred_txt added" data-id="1684" data-nameid="1684" itemprop="recipeIngredient">1/2 cup all-purpose flour</span>,
 <span class="recipe-ingred_txt added" data-id="1683" data-nameid="1683" itemprop="recipeIngredient">1/2 cup whole wheat flour</span>,
 <span class="recipe-ingred_txt added" data-id="6420" data-nameid="6420" itemprop="recipeIngredient">2 tablespoons canola oil</span>,
 <span class="recipe-ingred_txt added" data-id="2356" data-nameid="2356

In [35]:
# And here is a list of the ingredients!
ingredients = []

for ingredient in ingredient_wrappers:
    ingredients.append(ingredient.getText())

print(ingredients)

['6 tablespoons water', '2 tablespoons flax seed meal', '1 cup rolled oats', '1 3/4 cups soy milk', '1/2 cup all-purpose flour', '1/2 cup whole wheat flour', '2 tablespoons canola oil', '4 teaspoons baking powder', '1 teaspoon vanilla extract', '1 tablespoon agave nectar', '1/2 teaspoon salt']
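
To turn such strings into amounts that a carbon footprint estimate could use, the leading quantity has to be separated from the rest. Here is a minimal sketch, assuming quantities are written as integers, fractions or mixed numbers as in the list above; the helper name `split_quantity` is hypothetical, and units are left unparsed.

```python
import re
from fractions import Fraction

# mixed number ("1 3/4"), plain fraction ("1/2") or integer ("6"), in that order
_QUANTITY = re.compile(r"^(\d+\s+\d+/\d+|\d+/\d+|\d+)\s*(.*)$")


def split_quantity(text):
    """Split an ingredient string into (quantity as float or None, remaining text)."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    match = _QUANTITY.match(text)
    if not match:
        return None, text
    quantity = sum(Fraction(part) for part in match.group(1).split())
    return float(quantity), match.group(2)


print(split_quantity("1 3/4 cups soy milk"))
print(split_quantity("1/2 cup all-purpose flour"))
```

Because whitespace is collapsed first, the same helper also handles the newline-padded strings from MyRecipes pages, and ingredients without a leading number (such as cooking spray) come back with `None` as quantity.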


In [36]:
# Function working on AllRecipes.com
# I guess it'll be different for other websites...

def extract_ingredients(URL):
    """Download an Allrecipes page and return its ingredients as a list of strings."""
    # get the recipe web page; raise on HTTP errors instead of parsing an error page
    r = requests.get(URL, timeout=10)
    r.raise_for_status()

    soup = BeautifulSoup(r.text, 'html.parser')
    ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")

    # And here is a list of the ingredients!
    return [ingredient.getText() for ingredient in ingredient_wrappers]
    

In [38]:
# Let's try it on another recipe
URL_nachos = 'https://www.allrecipes.com/recipe/51147/super-nachos/?internalSource=hub%20recipe&referringContentType=Search'

ingredients_nachos = extract_ingredients(URL_nachos)
ingredients_nachos

['1 pound ground beef',
 '1 (1.25 ounce) package taco seasoning mix',
 '3/4 cup water',
 '1 (18 ounce) package restaurant-style tortilla chips',
 '1 cup shredded sharp Cheddar cheese, or more to taste',
 '1 (15.5 ounce) can refried beans',
 '1 cup salsa',
 '1 cup sour cream, or more to taste',
 '1 (10 ounce) can pitted black olives, drained and chopped',
 '4 green onions, diced',
 '1 (4 ounce) can sliced jalapeno peppers, drained']
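
With plain ingredient strings like these, meat detection could match whole words and ignore case, instead of the substring test used in `analyse_meat`. The sketch below is one possible approach; the helper name `find_meat` and the keyword list in the usage line are assumptions.

```python
import re


def find_meat(ingredients, keywords):
    """Return the ingredient strings that contain one of `keywords` as a whole word."""
    if not keywords:
        return []
    # \b anchors the match at word boundaries; s? also accepts a simple plural
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")s?\b", re.IGNORECASE)
    return [text for text in ingredients if pattern.search(text)]


print(find_meat(["1 pound ground beef", "1 cup salsa", "Lamb chops"], ["beef", "lamb"]))
```

Matching whole words avoids hits inside longer words such as "meatless", and ignoring case makes separate entries like 'lamb' and 'Lamb' unnecessary.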