# Who are the carnivores?

### Abstract

With increasingly dire climate change forecasts, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, in particular beef, is one of the highest-impact actions an individual can take. To examine this topic, we will explore trends in meat consumption in the U.S. by analyzing the prevalence of meat in recipes viewed online. Specifically, we plan to extract the ingredients of each recipe, together with the time and location of clicks, from a recipe database. Using this information, we will explore the link between meat consumption and key factors such as time of year, rural versus urban location, average regional income, and historic events (e.g. the Paris Climate Agreement or a mad cow disease outbreak). Finally, we hope to relate these data directly to climate change by estimating a rating that reflects the carbon footprint of the meat in each recipe and the environmental impact of consumers' diets.


### Imports and libraries

In [150]:
# Import libraries
from bs4 import BeautifulSoup
import requests  # needed by the requests.get calls further down
import os, os.path as osp

In [193]:
#DATA_FOLDER='data'
#SAMPLE_DATA_FOLDER = DATA_FOLDER + '/htmlSample/'
SAMPLE_DATA_FOLDER='htmlSample/'
SAMPLE_FILE='data/htmlSample/ff6da2b8d426c56ae77beda595bdcfea.html' #Recipe
SAMPLE_FILE_2='data/htmlSample/ff70d7922e18782f3bae04ac405508f4.html' #No Recipe

In [174]:
def remove_spaces(l):
    # Drop empty strings and single spaces left over from splitting (in place)
    while '' in l:
        l.remove('')
    while ' ' in l:
        l.remove(' ')
    return l

# Idea: calculate the carbon footprint in this function by summing contributions in meat_ingredients
# Works for MyRecipes.com
def find_amount(data, s):
    meat_ingredients = []
    for ingredient in data:
        text = ingredient.getText()
        for meat_product in meat_products:
            if meat_product in text:
                # Split the ingredient text on the site-specific separator s
                l = remove_spaces(text.split(s))
                meat_ingredients.append(l)
                break  # count each ingredient once, even if several keywords match
    return meat_ingredients

# Calculate carbon footprint
# Placeholder: for now this only counts the matched meat ingredients
def carbon_fp(l):
    return len(l)
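The idea noted in the comments above, summing a contribution per matched meat ingredient, could look roughly like the sketch below. The function name `weighted_footprint` and the `weights` table are hypothetical, and the numbers are placeholders for illustration only, not real emission factors.

```python
def weighted_footprint(meat_ingredients, weights, default=1.0):
    """Sum a per-keyword weight over the matched ingredients.

    meat_ingredients: list of lists of strings, in the shape returned by find_amount.
    weights: dict mapping a keyword to a relative weight (hypothetical values).
    """
    total = 0.0
    for parts in meat_ingredients:
        text = ' '.join(parts).lower()
        # Use the weight of the first keyword found in the ingredient text.
        for keyword, weight in weights.items():
            if keyword in text:
                total += weight
                break
        else:
            # No keyword from the table matched: fall back to the default weight.
            total += default
    return total


# Placeholder weights, NOT real emission factors.
example_weights = {'beef': 3.0, 'chicken': 1.0}
example = [['1 pound', 'ground beef'], ['2', 'chicken breasts'], ['1 can', 'tuna']]
score = weighted_footprint(example, example_weights)
```

Keeping the weights in a separate table means the scoring logic would not need to change once real per-ingredient factors are available.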

In [155]:
# NOTE: 'vanilla' is not a meat product; presumably it is listed only to test the matching
meat_products = ['meat', 'chicken', 'fish', 'vanilla']
data_list=[] #title, carbon footprint, counts, location etc.
is_true=False
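Plain substring matching, as used in `find_amount`, also fires inside longer words: a keyword such as 'ham' (a hypothetical addition to the list) would match 'graham'. One possible alternative is whole-word matching with a regular expression. The helper name `contains_keyword` is an assumption made for this sketch.

```python
import re


def contains_keyword(text, keywords):
    """Return the first keyword that occurs in `text` as a whole word, else None."""
    for keyword in keywords:
        # \b marks a word boundary; re.escape guards against special characters.
        pattern = r'\b' + re.escape(keyword) + r'\b'
        if re.search(pattern, text, flags=re.IGNORECASE):
            return keyword
    return None
```

The trade-off is that plurals and compounds no longer match automatically ('chickens' does not match 'chicken'), so the keyword list would have to spell those out.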

In [171]:
# Because the files are already on disk, we do not need to make a request and can open them directly.
with open(SAMPLE_FILE_2) as f:
    page = f.read()

# We need to extract the site name from the page title and pick the matching tags accordingly.
# This sample is an Allrecipes page without a recipe.
soup = BeautifulSoup(page, 'html.parser')
print(soup.title.string)



	Cuban Recipes - Allrecipes.com



In [210]:
# analyze_recipe() returns the carbon footprint of one parsed page.
# If the carbon footprint != 0, the caller can append the recipe title and footprint to a list.
def analyze_recipe(soup, page):
    # page (the raw HTML) is currently unused; soup alone is enough
    title = (soup.title.string if soup.title else None) or ''
    if 'Allrecipes' in title:
        ingredient_wrappers=soup.find_all(itemprop="recipeIngredient")
        s=','
    elif 'MyRecipes' in title:
        ingredient_wrappers=soup.find_all(itemprop="ingredient")
        s='\n'
    else:
        # Unknown site: no ingredient markup we know how to read
        return 0
    #write a list with meat, quantity in the first column, type in the second
    l=find_amount(ingredient_wrappers,s)
    return carbon_fp(l)

# Passing soup and page as arguments works: both are ordinary Python objects.
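As more recipe sites are added, the per-site branches in `analyze_recipe` could be kept in a lookup table instead. The sketch below is one way to do that; `SITE_RULES` and `rules_for_title` are hypothetical names, and the two entries simply mirror the `itemprop` values and separators already used above.

```python
# Site name as it appears in the page title -> (itemprop value, separator).
SITE_RULES = {
    'Allrecipes': ('recipeIngredient', ','),
    'MyRecipes': ('ingredient', '\n'),
}


def rules_for_title(title):
    """Return (itemprop, separator) for a known site, or None otherwise."""
    if not title:
        # Pages without a <title> give None here.
        return None
    for site, rules in SITE_RULES.items():
        if site in title:
            return rules
    return None
```

A caller would then do something like `rules = rules_for_title(soup.title.string)` and skip the page when the result is `None`.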

In [205]:

# Same steps as analyze_recipe, run directly on the soup loaded above
if 'Allrecipes' in soup.title.string:
    ingredient_wrappers=soup.find_all(itemprop="recipeIngredient")
    s=','
elif 'MyRecipes' in soup.title.string:
    ingredient_wrappers=soup.find_all(itemprop="ingredient")
    s='\n'
#write a list with meat, quantity in the first column, type in the second
l=find_amount(ingredient_wrappers,s)


In [206]:
data=[]

for filename in os.listdir(SAMPLE_DATA_FOLDER):
    with open(SAMPLE_DATA_FOLDER+filename) as f:
        # TODO: add try/except so the loop no longer stops when a file contains bytes that cannot be decoded
        page = f.read()
        soup = BeautifulSoup(page, 'html.parser')
        if 'Allrecipes' in soup.title.string:
            ingredient_wrappers=soup.find_all(itemprop="recipeIngredient")
            s=','
        if 'MyRecipes' in soup.title.string:
            ingredient_wrappers=soup.find_all(itemprop="ingredient")
            s='\n'
        #write a list with meat, quantity in the first column, type in the second
        l=find_amount(ingredient_wrappers,s)
        print(l)

[]
[]
[]
[['1 teaspoon', ' vanilla extract'], ['1 teaspoon', ' vanilla extract']]
[['1 teaspoon', ' vanilla extract'], ['1 teaspoon', ' vanilla extract']]
[['1 teaspoon', ' vanilla extract'], ['1 teaspoon', ' vanilla extract']]
[]
[]


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 7572: invalid continuation byte
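The `UnicodeDecodeError` above suggests that at least one sample file is not UTF-8 encoded. Byte `0xe9` is 'é' in Latin-1, so that encoding is a plausible (but unverified) guess. A minimal sketch of a more tolerant reader, with `read_html` as a hypothetical helper name:

```python
import os
import tempfile


def read_html(path):
    """Read an HTML file as text, trying UTF-8 first and Latin-1 second."""
    with open(path, 'rb') as f:
        raw = f.read()
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but it may mis-render pages that use yet another encoding.
        return raw.decode('latin-1')


# Quick check with a file that contains a Latin-1 encoded 'é' (byte 0xe9).
with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as tmp:
    tmp.write(b'<title>Caf\xe9</title>')
text = read_html(tmp.name)
os.remove(tmp.name)
```

In the loop over `SAMPLE_DATA_FOLDER`, `page = read_html(SAMPLE_DATA_FOLDER+filename)` could replace the `open`/`read` pair. Another option is `open(path, errors='replace')`, which keeps going but substitutes a replacement character for undecodable bytes.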

## First step: data loading and cleaning

Goal: end up with a dataframe containing ingredients, clicks, and other features for each recipe.

Start with one HTML file, then scale up to 10-100 files, then the whole folder.
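One possible shape for the rows of that dataframe is a flat dictionary per recipe. The sketch below uses hypothetical field names and made-up example strings; clicks, time and location are left out because their storage format is not shown here. A list of such dictionaries can be passed to `pandas.DataFrame` to build the table.

```python
def make_record(title, ingredients, meat_keywords):
    """Build one row: title, ingredient list, and the meat ingredients found."""
    meat = [
        item for item in ingredients
        if any(keyword in item.lower() for keyword in meat_keywords)
    ]
    return {
        'title': title,
        'ingredients': ingredients,
        'meat_ingredients': meat,
        'n_meat': len(meat),
    }


# Made-up example input, for illustration only.
record = make_record(
    'Example recipe',
    ['1 pound ground beef', '3/4 cup water', '1 cup salsa'],
    ['beef', 'chicken', 'fish'],
)
```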

In [7]:
r.text

''

Below, we use BeautifulSoup to extract features from an HTML file. I took a sample page from the web, as I couldn't easily access the local files.

In [8]:
URL = 'https://www.allrecipes.com/recipe/234502/vegan-waffles/'

In [9]:
r = requests.get(URL)
page_body = r.text

In [10]:
#This is how we get a beautiful soup of HTML for our recipe web page!
soup = BeautifulSoup(page_body, 'html.parser')

In [11]:
#And here is how we read the title!
soup.title.string

'Vegan Waffles Recipe - Allrecipes.com'

In [12]:
soup


<!DOCTYPE html>

<html lang="en-us">
<head>
<title>Vegan Waffles Recipe - Allrecipes.com</title>
<script async="true" src="https://secureimages.allrecipes.com/assets/deployables/v-1.161.0.4957/karma.bundled.js"></script>
<!--Make our website baseUrl available to the client-side code-->
<script type="text/javascript">
        var AR = AR || {};
        AR.baseWebsiteUrl = 'https://www.allrecipes.com';
    </script>
<script type="text/javascript">
        //Remove Ref_Hub from session after first recipe visited
        var hubId = window.sessionStorage["Ref_Hub_Id"];
        var count = window.sessionStorage["Ref_Hub_Recipe_Count"];
        if (hubId && count) {
            if (count > 0) {
                window.sessionStorage.removeItem("Ref_Hub_Id");
                window.sessionStorage.removeItem("Ref_Hub_Recipe_Count");
            }
        }
    </script>
<meta content="Vegan Waffles Recipe" property="og:title"/>
<meta content="Allrecipes" property="og:site_name"/>
<meta charset

In [13]:
#Now let's try to extract the ingredients. This vegan recipe won't contain meat!

#example from the tutorial
#publications_wrappers = soup.find_all('li', class_='entry')


#ingredient_wrappers = soup.find_all(class_="recipe-ingred_txt added")

ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")

print('Total number of ingredients: {0}'.format(len(ingredient_wrappers)))

Total number of ingredients: 11


In [17]:
ingredient_wrappers

[<span class="recipe-ingred_txt added" data-id="2496" data-nameid="2496" itemprop="recipeIngredient">6 tablespoons water</span>,
 <span class="recipe-ingred_txt added" data-id="21042" data-nameid="21042" itemprop="recipeIngredient">2 tablespoons flax seed meal</span>,
 <span class="recipe-ingred_txt added" data-id="6057" data-nameid="6057" itemprop="recipeIngredient">1 cup rolled oats</span>,
 <span class="recipe-ingred_txt added" data-id="2879" data-nameid="2879" itemprop="recipeIngredient">1 3/4 cups soy milk</span>,
 <span class="recipe-ingred_txt added" data-id="1684" data-nameid="1684" itemprop="recipeIngredient">1/2 cup all-purpose flour</span>,
 <span class="recipe-ingred_txt added" data-id="1683" data-nameid="1683" itemprop="recipeIngredient">1/2 cup whole wheat flour</span>,
 <span class="recipe-ingred_txt added" data-id="6420" data-nameid="6420" itemprop="recipeIngredient">2 tablespoons canola oil</span>,
 <span class="recipe-ingred_txt added" data-id="2356" data-nameid="2356

In [35]:
# And here is a list of the ingredients!
ingredients = [ingredient.getText() for ingredient in ingredient_wrappers]

print(ingredients)

['6 tablespoons water', '2 tablespoons flax seed meal', '1 cup rolled oats', '1 3/4 cups soy milk', '1/2 cup all-purpose flour', '1/2 cup whole wheat flour', '2 tablespoons canola oil', '4 teaspoons baking powder', '1 teaspoon vanilla extract', '1 tablespoon agave nectar', '1/2 teaspoon salt']


In [36]:
# Function working on AllRecipes.com
# Other websites need a different selector (e.g. MyRecipes uses itemprop="ingredient")

def extract_ingredients(URL):
    # Get the recipe web page
    r = requests.get(URL)
    page_body = r.text

    soup = BeautifulSoup(page_body, 'html.parser')
    ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")

    # Return the text of each ingredient as a list
    return [ingredient.getText() for ingredient in ingredient_wrappers]
    

In [38]:
# Let's try it on another recipe
URL_nachos = 'https://www.allrecipes.com/recipe/51147/super-nachos/?internalSource=hub%20recipe&referringContentType=Search'

ingredients_nachos = extract_ingredients(URL_nachos)
ingredients_nachos

['1 pound ground beef',
 '1 (1.25 ounce) package taco seasoning mix',
 '3/4 cup water',
 '1 (18 ounce) package restaurant-style tortilla chips',
 '1 cup shredded sharp Cheddar cheese, or more to taste',
 '1 (15.5 ounce) can refried beans',
 '1 cup salsa',
 '1 cup sour cream, or more to taste',
 '1 (10 ounce) can pitted black olives, drained and chopped',
 '4 green onions, diced',
 '1 (4 ounce) can sliced jalapeno peppers, drained']