# Who are the carnivores?

### Abstract

With climate change forecasts growing increasingly dire, concerned individuals are asking how they can minimize their carbon footprint. Recent research suggests that reducing one's consumption of meat, beef in particular, is one of the highest-impact actions an individual can take. To examine this topic, we will explore trends in meat consumption in the U.S. by analyzing the prevalence of meat in recipes viewed online. Specifically, we plan to extract the ingredients, along with the time and location of clicks, from a recipe database. Using this information, we will explore the link between meat consumption and key factors such as time of year, rural versus urban location, average regional income, and historic events (e.g. the Paris Climate Agreement or a mad cow disease outbreak). Finally, we hope to relate this data directly to climate change by estimating a rating that reflects the carbon footprint of the meat in each recipe and the environmental impact of consumers' diets.


### Imports and libraries

In [20]:
# Import libraries
import requests
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup


In [21]:
DATA_FOLDER = 'data'
SAMPLE_DATA_FOLDER = DATA_FOLDER + '/htmlSample'

## First step: data loading and cleaning

Goal: end up with a dataframe containing the ingredients, clicks, and other features for each recipe.

Plan: start with one HTML file, scale up to 10-100 files, then process the whole folder.

Below, we use BeautifulSoup to extract features from an HTML page. For now the page is fetched from the web rather than read from disk, because the local files were not easily accessible.
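Once the local files are accessible, the "one file, then 10-100, then everything" plan could be sketched as below. This is only an illustration: the helper name `iter_html_files`, the `.html` extension and the UTF-8 encoding are assumptions, not something the dataset guarantees.

```python
from pathlib import Path

def iter_html_files(folder, limit=None):
    """Yield (file name, HTML text) for each .html file in `folder`, in sorted order.

    `limit` caps the number of files, so the same loop can be tried on
    1 file, then 10-100 files, then the whole folder (limit=None).
    """
    paths = sorted(Path(folder).glob('*.html'))  # assumes a '.html' extension
    if limit is not None:
        paths = paths[:limit]
    for path in paths:
        # errors='replace' avoids crashing on pages that are not valid UTF-8
        yield path.name, path.read_text(encoding='utf-8', errors='replace')
```

Each yielded HTML string can then be passed to `BeautifulSoup(html, 'html.parser')` in place of `r.text`, for example by looping over `iter_html_files(SAMPLE_DATA_FOLDER, limit=1)`.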

In [44]:
# Function working on AllRecipes.com
# I guess it'll be different for other websites...

def extract_ingredients(URL):
    """
    Extract the list of ingredients from the recipe at the given URL.
    Currently only works for AllRecipes.com recipes.
    Returns a list of strings, one per ingredient, including quantities.
    """

    # Get the recipe web page; fail early on HTTP errors
    r = requests.get(URL, timeout=10)
    r.raise_for_status()
    page_body = r.text

    # Parse the page into a soup
    soup = BeautifulSoup(page_body, 'html.parser')
    print('Recipe analysed: ' + soup.title.string)
    ingredient_wrappers = soup.find_all(itemprop="recipeIngredient")
    print('Total number of ingredients: {0}'.format(len(ingredient_wrappers)))

    # Collect the text of each ingredient element
    ingredients = [ingredient.getText() for ingredient in ingredient_wrappers]

    return ingredients
    

In [45]:
# Let's try it on a nachos recipe!
URL_nachos = 'https://www.allrecipes.com/recipe/51147/super-nachos/?internalSource=hub%20recipe&referringContentType=Search'

ingredients_nachos = extract_ingredients(URL_nachos)
ingredients_nachos

Recipe analysed: Super Nachos Recipe - Allrecipes.com
Total number of ingredients: 11


['1 pound ground beef',
 '1 (1.25 ounce) package taco seasoning mix',
 '3/4 cup water',
 '1 (18 ounce) package restaurant-style tortilla chips',
 '1 cup shredded sharp Cheddar cheese, or more to taste',
 '1 (15.5 ounce) can refried beans',
 '1 cup salsa',
 '1 cup sour cream, or more to taste',
 '1 (10 ounce) can pitted black olives, drained and chopped',
 '4 green onions, diced',
 '1 (4 ounce) can sliced jalapeno peppers, drained']

In [33]:
type(ingredients_nachos)

list

## Second step: carbon footprint

We extract the protein-rich ingredients of animal origin, since they account for the main share of a recipe's carbon footprint. Data source: [GreenEatz](https://www.greeneatz.com/foods-carbon-footprint.html)

In [59]:
# Load the carbon footprint data from the xls file
carbon_footprint = pd.read_excel(DATA_FOLDER + '/carbon_footprint_protein.xls', sheet_name='meat', index_col=0)
carbon_footprint

Rank,Food,CO2 Kilos Equivalent,Car Miles Equivalent
1,Lamb,39.2,91
2,Beef,27.0,63
4,Pork,12.1,28
5,Turkey,10.9,25
6,Chicken,6.9,16
7,Tuna,6.1,14
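
As a rough illustration of how these figures could turn into a per-recipe estimate, the sketch below multiplies an emission factor by the weight of an ingredient. It assumes the "CO2 Kilos Equivalent" column means kg CO2e per kg of food, and it only parses quantities written as a plain number followed by "pound" or "ounce". The names `CO2_PER_KG`, `KG_PER_UNIT` and `estimate_co2` are hypothetical.

```python
import re

# Emission factors copied from the table above
# (assumed unit: kg CO2e per kg of food)
CO2_PER_KG = {'Lamb': 39.2, 'Beef': 27.0, 'Pork': 12.1,
              'Turkey': 10.9, 'Chicken': 6.9, 'Tuna': 6.1}

# Conversion factors to kilograms for the two units handled in this sketch
KG_PER_UNIT = {'pound': 0.45359237, 'ounce': 0.028349523125}

def estimate_co2(ingredient, food):
    """Return the estimated kg CO2e for an ingredient string such as
    '1 pound ground beef', or None if the quantity cannot be parsed."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)\s+(pound|ounce)s?\b', ingredient)
    if match is None:
        return None
    weight_kg = float(match.group(1)) * KG_PER_UNIT[match.group(2)]
    return CO2_PER_KG[food] * weight_kg
```

Under these assumptions, `estimate_co2('1 pound ground beef', 'Beef')` is 27.0 × 0.4536 kg, i.e. roughly 12.2 kg CO2e. Fractions, cups and packaged quantities such as "1 (10 ounce) can" return `None` and would need their own handling.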


In [60]:
#List of meat ingredients
protein = carbon_footprint['Food'].tolist()
protein

['Lamb', 'Beef', 'Pork', 'Turkey', 'Chicken', 'Tuna']

In [57]:
ingredients = ingredients_nachos
df = pd.DataFrame({'ingredients': ingredients})
df

,ingredients
0,1 pound ground beef
1,1 (1.25 ounce) package taco seasoning mix
2,3/4 cup water
3,1 (18 ounce) package restaurant-style tortilla...
4,"1 cup shredded sharp Cheddar cheese, or more t..."
5,1 (15.5 ounce) can refried beans
6,1 cup salsa
7,"1 cup sour cream, or more to taste"
8,"1 (10 ounce) can pitted black olives, drained ..."
9,"4 green onions, diced"


In [None]:
# Next step: extract the animal-protein ingredients by comparing the ingredients dataframe with the carbon footprint table
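
One possible way to approach this next step is a case-insensitive whole-word match of each food name against each ingredient string. This is a sketch rather than the final method: `find_animal_proteins` is a hypothetical helper, and the sample lists are copied from the outputs above so that the block runs on its own.

```python
import re

def find_animal_proteins(ingredients, proteins):
    """Return (ingredient, food) pairs where `food` appears as a whole word
    in the ingredient text, ignoring case."""
    matches = []
    for ingredient in ingredients:
        for food in proteins:
            pattern = r'\b' + re.escape(food) + r'\b'
            if re.search(pattern, ingredient, flags=re.IGNORECASE):
                matches.append((ingredient, food))
    return matches

# Small sample taken from the nachos recipe and the GreenEatz food list
sample_ingredients = ['1 pound ground beef', '3/4 cup water',
                      '1 (15.5 ounce) can refried beans']
sample_proteins = ['Lamb', 'Beef', 'Pork', 'Turkey', 'Chicken', 'Tuna']

print(find_animal_proteins(sample_ingredients, sample_proteins))
# prints [('1 pound ground beef', 'Beef')]
```

In the notebook the same call could be made with `ingredients_nachos` and `protein`. Exact word matching misses ingredients named by cut or product (bacon, steak, ham), so a synonym table would probably be needed later.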

## Third step: Tags and Ratings

In [124]:
def extract_tags(URL):
    """
    Extract the list of tags (categories, cuisines) from the recipe at the given URL.
    Handles Allrecipes, Epicurious, Food Network and (tentatively) MyRecipes pages.
    Returns a list of tag strings, empty if none were found.
    """
    
    #get the recipe web page
    r = requests.get(URL)
    page_body = r.text
    
    #extract a nice soup from our recipe page
    soup = BeautifulSoup(page_body, 'html.parser')
    print('Recipe analysed: '+soup.title.string)
    
    
    tags = []
    
    #OK
    if 'Allrecipes' in soup.title.string: 
        print('This recipe is from Allrecipes')
        is_recipe = True
        tag_wrappers = soup.find_all(itemprop="recipeCategory")
        print('Total number of tags: {0}'.format(len(tag_wrappers)))
        for tag in tag_wrappers:
            tags.append(tag['content'])
        if not tags:
            print('alternative format used')
        else:
            print('alternative format 2 needed')
    
    #OK
    elif 'Epicurious' in soup.title.string:
        print('This recipe is from Epicurious')
        tag_wrappers = soup.find_all(itemprop="recipeCuisine")
        print('Total number of tags: {0}'.format(len(tag_wrappers)))
        for tag in tag_wrappers:
            tags.append(tag.getText())
            
        #These Categories are more main ingredients and might not be considered as tags
        tag_wrappers = soup.find_all(itemprop="recipeCategory")
        for tag in tag_wrappers:
            tags.append(tag.getText())
        
        if not tags:
            print('no tags found :( ')            
    
    #OK
    elif 'Food Network' in soup.title.string:
        print('This recipe is from FoodNetwork')
        tag_wrappers = soup.find_all(class_="btn grey-tags")
        print('Total number of tags: {0}'.format(len(tag_wrappers)))
        
        for tag in tag_wrappers:
            tags.append(tag.getText())

        if not tags:
            print('no tags found :( ')  

    #not tried yet, unavailable web pages    
    elif 'Food.com' in soup.title.string:
        print('This recipe is from Food.com')
    
    #no tags found in source
    elif 'BettyCrocker' in soup.title.string:
        print('This recipe is from Betty Crocker')
        print('no tags found :( ') 
    
    #weird tag structure, not done yet
    elif 'MyRecipes' in soup.title.string:
        print('This recipe is from MyRecipes')
        
        tag_wrappers = soup.find_all(id="karma-loader")
        print('Total number of tags: {0}'.format(len(tag_wrappers)))
        
        for tag in tag_wrappers:
            tags.append(tag.getText())

        if not tags:
            print('no tags found :( ')
    
    return tags

In [126]:
# Let's try it on a recipe!
URL = 'https://www.allrecipes.com/recipe/234502/vegan-waffles/'

tags = extract_tags(URL)
tags

Recipe analysed: Vegan Waffles Recipe - Allrecipes.com
This recipe is from Allrecipes
Total number of tags: 2
alternative format 2 needed


['Breakfast and Brunch', 'Waffles']
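
As the number of supported websites grows, the `if`/`elif` chain on the page title in `extract_tags` could be replaced by a lookup table. The sketch below shows only the site-detection part. `SITE_MARKERS` and `detect_site` are hypothetical names, and the markers are the title substrings already used in `extract_tags`.

```python
# Title substrings used in extract_tags, mapped to a site name (checked in order)
SITE_MARKERS = [
    ('Allrecipes', 'Allrecipes'),
    ('Epicurious', 'Epicurious'),
    ('Food Network', 'FoodNetwork'),
    ('Food.com', 'Food.com'),
    ('BettyCrocker', 'Betty Crocker'),
    ('MyRecipes', 'MyRecipes'),
]

def detect_site(title):
    """Return the site name whose marker occurs in the page title, or None."""
    if not title:
        # Pages without a <title> would otherwise raise a TypeError on `in`
        return None
    for marker, site in SITE_MARKERS:
        if marker in title:
            return site
    return None
```

With this helper, `detect_site('Vegan Waffles Recipe - Allrecipes.com')` returns `'Allrecipes'`, and each site name could then be mapped to its own tag-extraction function.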