# Who are the carnivores?

### Abstract

With increasingly dire climate change forecasts, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, in particular beef, is one of the highest impact actions an individual can take. To examine this topic, we will explore the trends in meat consumption in the U.S. by analyzing the prevalence of meat in recipes frequented online. Specifically, we plan to extract the ingredients, time and location of clicks from a recipe database. Using this information, we will explore the link between meat consumption and various key factors such as time of year, rural and urban locations, average regional income or historic events (i.e. the Paris Climate Agreement, mad cow disease outbreak). Finally, we hope to directly relate this data to the issue of climate change by estimating a rating reflecting the carbon footprint of meat in recipes and the environmental impact of consumers' diets.


### Imports and libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup
import os, os.path as osp

In [20]:
DATA_FOLDER='data'
#SAMPLE_DATA_FOLDER = DATA_FOLDER + '/htmlSample/'
SAMPLE_DATA_FOLDER = DATA_FOLDER + '/sample_400/'
#SAMPLE_DATA_FOLDER='htmlSample/'
SAMPLE_FILE='data/htmlSample/ff6da2b8d426c56ae77beda595bdcfea.html' #Recipe
SAMPLE_FILE_2='data/htmlSample/ff727d984c9c0048def173c4c97ab52e.html' #No Recipe

## Data cleaning

Our recipe dataset contains recipes from the [From Cookies to Cooks](http://infolab.stanford.edu/~west1/from-cookies-to-cooks/), combining recipes from 14 high-trafficwebsites. We start by extracting all the information we want from the HTML files, that is: title, ingredients and meat or animal protein ingredients, tags, ratings in order to explore the recipes in more detail.


### Data cleaning functions

In [54]:
#cleaning function to remove unnecessary cells in ingredient list
def remove_spaces(l):
    """
    cleaning function to remove unnecessary cells in ingredient list
    """
    while '' in l:
                l.remove('')
    while ' ' in l:
                l.remove(' ')
    return l 

#idea: calculate carbon foodprint in this function by summing contributions in meat_ingredients
#return a list of all meat ingredients and their amount and a is_true=True if the recipe contains meat
def analyse_meat(ingredient_list, s):
    """
    takes as argument the ingredient_list and the spacer
    returns a list of all meat ingredients in the recipe, and a boolean contains_meat
    """
    meat_ingredients = []
    contains_meat=False
    for ingredient in ingredient_list:
        for meat_product in meat_products:
            if meat_product in ingredient.getText():
                contains_meat=True
                l=ingredient.getText().split(s)
                l=remove_spaces(l)
                l.append(meat_product)
                meat_ingredients.append(l)
    return meat_ingredients, contains_meat

# calculate carbon footprint
def carbon_fp (l):
    """
    takes a list of ingredients contributing to co2 and returns carbon footprint
    """
    c=len(l)
    return c

def analyze_recipe(soup, page):
    """
    takes soup, page as argument
    returns ingredientwrapper, s=spacer
    """
    if 'Allrecipes' in soup.title.string:
        ingredient_wrappers=soup.find_all(itemprop="recipeIngredient")
        s=','
        is_recipe = True
    elif 'MyRecipes' in soup.title.string:
        ingredient_wrappers=soup.find_all(itemprop="ingredient")
        s='\n'
        is_recipe = True
    # this really has to be analyzed more closely!!
    else:
        is_recipe = False
        
    #return tags
    return ingredient_wrappers, s, is_recipe


def get_ingredients(ingredient_wrappers):
    """
    returns a list of all ingredients in the recipe
    """
    ingredients = []
    for ingredient in ingredient_wrappers:
        ingredients.append(ingredient.getText())    
    return ingredients
#not sure whether we can give soup and page as arguments..


### Data extraction and cleaning loop
Below we extract the data from the recipes of our html dataset and save it in dataframes. Our goal here is to extract the ingredients and assign a carbon-impact rating to the highest impact ingredients (meat or animal protein) in the recipes.

To extract protein-rich ingredients from animal source in order to calculate the main carbon footprint of the recipe, we use an extra database listing the main protein sources and carbon impact. Source of data: [GreenEatz](https://www.greeneatz.com/foods-carbon-footprint.html)

In [4]:
#Load carbon impact data from xls file
carbon_footprint = pd.read_excel('data/carbon_footprint_protein.xls', sheet_name='meat', index_col=0)
carbon_footprint 

Unnamed: 0_level_0,Food,CO2 Kilos Equivalent,Car Miles Equivalent
Rank,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Lamb,39.2,91
2,Beef,27.0,63
4,Pork,12.1,28
5,Turkey,10.9,25
6,Chicken,6.9,16
7,Tuna,6.1,14


In [33]:
#List of meat ingredients
meat_products = carbon_footprint['Food'].tolist()

#Add lower case manually, use regex for case insensitivity?
meat_products = meat_products + ['lamb', 'beef', 'pork', 'turkey', 'chicken', 'tuna']

In [6]:
data_list=[] #title, carbon foodprint, counts, location etc.

In [None]:
#REALDEAL - LOOP - TO WRITE DATAFRAME
# data has following row structure
# RecipeName as Identifier - bool contains_meat - list of co2 ingredients - carbonFootprint - ingredients
data=[]
for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(SAMPLE_DATA_FOLDER+filename) as f:
        isTrue=False
        count_exceptions=0
        # introduce try/catch such that it does no longer stop, when not recoginzeing letter
        try:
            page = f.read()
            soup = BeautifulSoup(page, 'html.parser')
            ingredient_wrappers, s, is_recipe = analyze_recipe(soup,page)
            #different tag for ingredient and different separators in ingredient list between the webpages
            if is_recipe:
                meatlist, contains_meat = analyse_meat(ingredient_wrappers,s)
                ingredients = get_ingredients(ingredient_wrappers)
                #add row to dataset only if recipe contains meat
                data.append([soup.title.string, contains_meat, meatlist, carbon_fp(meatlist), ingredients])
        except:
            count_exceptions=count_exceptions+1
print(data)
# save data as csv file once everything is working fine

In [58]:
#PLAYGROUND WITH THE SAME LOOP, BUT PRINTS AND STUFF FOR DEBUGGING AND EXPERIMENTING
# data has following row structure
# RecipeName as Identifier - bool contains_meat - list of co2 ingredients - carbonFootprint - ingredients
data=[]
for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(SAMPLE_DATA_FOLDER+filename) as f:
        isTrue=False
        count_exceptions=0
        # introduce try/catch such that it does no longer stop, when not recoginzing letter
        try:
            page = f.read()
            soup = BeautifulSoup(page, 'html.parser')
            print('Recipe Analysed: '+soup.title.string)
            ingredient_wrappers, s, is_recipe = analyze_recipe(soup,page)

            print('is recipe: ',is_recipe)
            #different tag for ingredient and different separators in ingredient list between the webpages
            if is_recipe:
                #print('Title: ', soup.title.string)
                #print('filename: ', filename)     
                #print('ingredient wrappers: '+ingredient_wrappers)

                meatlist, contains_meat = analyse_meat(ingredient_wrappers,s)
                ingredients = get_ingredients(ingredient_wrappers)
                print('Ingredient list: ')
                print(ingredients)
                if meatlist:
                    print('Has meat:')
                    print(meatlist)
                else:
                    print('No meat detected')
                
                #print(ingredients)
                #data.append
                #add row to dataset only if recipe contains meat
                data.append([soup.title.string, contains_meat, meatlist, carbon_fp(meatlist), ingredients])
        except:
            count_exceptions=count_exceptions+1
    print('------------------------')
print(data)
# save data as csv 

Recipe Analysed: Money-Saving Meals : Recipes and Cooking : Food Network
------------------------
------------------------
Recipe Analysed: 
	Cheese and Onion Pie Recipe - Allrecipes.com

is recipe:  True
filename:  00ee3c71a71548d65f4c5a1dd573fbc6.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Wonton Wrappers Recipe - Allrecipes.com

is recipe:  True
filename:  0a33c71ae50009983ff80e13f41f02db.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Cinnamon Bun Scones Recipe #68967 from CDKitchen.com
------------------------
Recipe Analysed: 
	Twice Baked Cheesy Potatoes Recipe - Allrecipes.com

is recipe:  True
filename:  0a86d840f8aa2314221c19e9e0b1a712.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Oreo Cookies And Cream Cupcakes Recipe - Food.com - 258537
------------------------
Recipe Analysed: 









Vegetable Platter with Cannellini "Hummus" Recipe Reviews by Epicuriou

------------------------
------------------------
Recipe Analysed: Spiced Chicken Tenders with Dipping Sauces Recipe : Sandra Lee : Recipes : Food Network
------------------------
Recipe Analysed: 
	Chicken Corn Chowder Recipe - Allrecipes.com

is recipe:  True
filename:  00a58140b06973b9ac0da04b4b10f67e.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Pineapple Angel Food Cake I Recipe - Allrecipes.com

is recipe:  True
filename:  0a864ecb0977d6e06c8764142c7bfe86.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Asian Egg Drop Soup Recipe : Tyler Florence : Recipes : Food Network
------------------------
Recipe Analysed: 
	Asian Ground Beef and Pepper Saute Recipe - Allrecipes.com

is recipe:  True
filename:  0a333842c2e21b5a71844ec1e76f9ddb.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Chocolate Raspberry Ruffle Cake Recipe - Food.com - 156874
------------------------
Recip

Recipe Analysed: 
	Spicy Bok Choy in Garlic Sauce Recipe - Allrecipes.com

is recipe:  True
filename:  0a955a19da9d6c0236186a18490d2c8d.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Perfect Baked Potato Recipe - Allrecipes.com

is recipe:  True
filename:  00a405ea8c3d491b677a995cb558b99f.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Pork Baby Back Rib Recipes - Allrecipes.com

is recipe:  True
filename:  00ef721d1c594ce09abea90254d2fcb9.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Chicken Recipes - Allrecipes.com

is recipe:  True
filename:  0a98f1e41b6c606c78dcf9ce080845f8.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Marie Callender's Recipes - CDKitchen
------------------------
Recipe Analysed: 
	Bruschetta Chicken Bake Recipe - Allrecipes.com

is recipe:  True
filename:  0a22bb25faa31175ccb6c6c745809128.html
Ingredient li

Recipe Analysed: 
	Spinach Cheese Squares Recipe - Allrecipes.com

is recipe:  True
filename:  0a8b3b41973595bad9cef18bb6146c26.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Bread Machine Bagels Recipe - Allrecipes.com

is recipe:  True
filename:  0a16879774efcfdcd589a437d627b70d.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Creamed Spinach Recipe | MyRecipes.com
is recipe:  True
filename:  0abfa7383987abb026970554893b2351.html
Ingredient list: 
['\n1/2 cup\n fat-free milk\n \n', '\n2 teaspoons\n all-purpose flour\n \n', '\n \n Cooking spray\n \n', '\n1 cup\n thinly sliced leek (about 1 large)\n \n', '\n2 \n garlic cloves, minced\n \n', '\n1 \n (10-ounce) package frozen chopped spinach, thawed, drained, and squeezed dry\n \n', '\n1/4 cup\n (2 ounces) 1/3-less-fat cream cheese\n \n', '\n1/4 teaspoon\n salt\n \n', '\n1/4 teaspoon\n freshly ground black pepper\n \n']
No meat detected
------------------------
-

Recipe Analysed: Grilled Lamb Chops Recipe : Giada De Laurentiis : Recipes : Food Network
------------------------
Recipe Analysed: World's Best Sloppy Joes Recipe - Food.com - 67493
------------------------
Recipe Analysed: Shrimp Salad Recipe - Recipe for Shrimp Salad with Celery and Mayonnaise
------------------------
Recipe Analysed: Paula's Southern Thanksgiving : Special : Food Network
------------------------
Recipe Analysed: Bottom Round Roast, Beef - Definition and Cooking Information - RecipeTips.com
------------------------
Recipe Analysed: Cornbread & Apple Stuffing Recipe | Eating Well
------------------------
Recipe Analysed: Chef Stuart's Maryland Crab Soup Recipe :   : Recipes : Food Network
------------------------
Recipe Analysed: Al Bussell Peach Cobbler Recipe - Food.com - 251235
------------------------
Recipe Analysed: Lasagna Rolls Recipe : Giada De Laurentiis : Recipes : Food Network
------------------------
Recipe Analysed: Moist And Rich Homemade Chocolate Cak

Recipe Analysed: 
	Pumpkin Oatmeal Recipe - Allrecipes.com

is recipe:  True
filename:  00a50320c5b99874292e760601993c21.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Jerky Lover's Jerky - Sweet, Hot and Spicy! Recipe - Allrecipes.com

is recipe:  True
filename:  00e96740b957f8a99db4c2ccf7ab400e.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Gluten-Free Cheese and Herb Pizza Crust Recipe - Allrecipes.com

is recipe:  True
filename:  00ab5078eaeeb0bcec9284b8dad89ea7.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Blackened Tilapia with Secret Hobo Spices Recipe - Allrecipes.com

is recipe:  True
filename:  0a52ac5a2f50521e8ad6ab8ea2ab7eed.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Raspberry Sauce Recipe : Ina Garten : Recipes : Food Network
------------------------
Recipe Analysed: 
	Flourless Chocolate Cake I Recipe - Allrecip

------------------------
Recipe Analysed: Bruschetta with Fresh Monterey Sardines Recipe : Review :   : Recipes : Food Network
------------------------
Recipe Analysed: 
	Hot Bean Dip Recipe - Allrecipes.com

is recipe:  True
filename:  0aa111ce432464355cd57b763f48ff5c.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: Sweet Potato Souffle Recipes - CDKitchen
------------------------
Recipe Analysed: 
	Monkfish Provincial Recipe - Allrecipes.com

is recipe:  True
filename:  00b416deeb3463c68d37a0ae7804fffa.html
Ingredient list: 
[]
No meat detected
------------------------
Recipe Analysed: 
	Pepperoni Pasta Recipe | Taste of Home Recipes

------------------------
Recipe Analysed: Low-Carb Peanut Butter Cookies Recipe from CDKitchen.com
------------------------
Recipe Analysed: 
	Peach Crisp  Recipe from Betty Crocker

------------------------
------------------------
Recipe Analysed: Quick, Healthy Vegetable Recipes | Eating Well
----------------------

In [9]:
# i put in the allrecipes cheesecake, the ingredients don't get extracted, contains vanilla!!!
with open(SAMPLE_FILE_2) as f:
    page = f.read()
    soup = BeautifulSoup(page, 'html.parser')
    
soup

<!-- ff727d984c9c0048def173c4c97ab52e.html http://allrecipes.com/Recipe/chocolate-cheesecake/detail.aspx //-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!--[if lt IE 7 ]> <html class="ie6" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if IE 7 ]>    <html class="ie7" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if IE 8 ]>    <html class="ie8" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if IE 9 ]>    <html class="ie9" xmlns="http://www.w3.org/1999/xhtml"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html xmlns="http://www.w3.org/1999/xhtml"> <!--<![endif]-->
<!-- ARLOG SERVER:WEB704 LOCAL_IP: 192.168.5.174 REMOTE_IP:131.107.192.193 TYPESPECIFICID: 91716 MERCH_KEY: MerchData_4_2_49_0_***_10_16_18_34_35_36_38_43_47_48_49_50_51_59 -->
<head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content='(pics-1.1 "http://www.icra.org/pics/vocabularyv03/" l ge

In [10]:
# i put in the allrecipes cheesecake, the ingredients don't get extracted, contains vanilla!!!
with open(SAMPLE_FILE_2) as f:
    page = f.read()
soup = BeautifulSoup(page, 'html.parser')
ingredient_wrappers = soup.find_all(itemprop="ingredient")
ingredient_wrappers

[]

## First step: data loading and cleaning

Goal: end up with a dataframe containing ingredients, clicks, and other features for each recipe

Start with one HTML file then scale up to 10-100 then the whole folder

In [11]:
r.text

NameError: name 'r' is not defined

Below, we use beautifulSoup to extract features from the HTML file. I took a sample page from the web as I couldn't easily access local files.

In [None]:
URL = 'https://www.allrecipes.com/recipe/234502/vegan-waffles/'

In [None]:
r = requests.get(URL)
page_body = r.text

In [None]:
#This is how we get a beatiful soup of HTML for our recipe web page!
soup = BeautifulSoup(page_body, 'html.parser')

In [None]:
#And here is how we read the title!
soup.title.string

In [None]:
soup

In [None]:
#Now lets try to extract the ingredients. This vegan recipe won't contain meat!

#example from tuto
#publications_wrappers = soup.find_all('li', class_='entry')


#ingredient_wrappers = soup.find_all(class_="recipe-ingred_txt added")

ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")

print('Total number of ingredients: {0}'.format(len(ingredient_wrappers)))

In [None]:
ingredient_wrappers

In [None]:
# And here is a list of the ingredients!
ingredients = []

for ingredient in ingredient_wrappers:
    ingredients.append(ingredient.getText())

print (ingredients)

In [None]:
# Function working on AllRecipes.com
# I guess it'll be different for other websites...

def extract_ingredients(URL):
    
    #get the recipe web page
    r = requests.get(URL)
    page_body = r.text
    
    soup = BeautifulSoup(page_body, 'html.parser')
    ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")
    
    # And here is a list of the ingredients!
    ingredients = []

    for ingredient in ingredient_wrappers:
        ingredients.append(ingredient.getText())
        
    return ingredients
    

In [None]:
# Let's try it on another recipe
URL_nachos = 'https://www.allrecipes.com/recipe/51147/super-nachos/?internalSource=hub%20recipe&referringContentType=Search'

ingredients_nachos = extract_ingredients(URL_nachos)
ingredients_nachos