## In Class Assignment 1

**Goal:** Get a list of recipe names from www.allrecipes.com

https://www.allrecipes.com/search/results/?search=cheese

1. Write a function `crawl_recipes(query)` that:
    * takes the search phrase (the ingredient) as its input argument
    * builds the URL that leads directly to the page listing the matching recipes
    * uses `requests` to fetch that page and returns its HTML text
1. Write `extract_recipes(text)` which:
    * takes the text returned by `crawl_recipes` as argument
    * builds a BeautifulSoup object out of that text 
    * finds names of all recipes
        - to identify which tags / classes to pass to `find_all()`, open the page in your browser and use the developer tools ("Inspect") on a recipe
        - starting from each recipe tag found above, call `find()` / `find_all()` again to zoom in on the recipe name itself
    * returns the list of recipe names
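
Pasting the query into an f-string works for a single word, but a phrase such as `blue cheese` contains characters that must be encoded before they can appear in a URL. A minimal sketch using the standard library's `urllib.parse.urlencode`, assuming the search page shown above; the helper name `build_search_url` is hypothetical:

```python
from urllib.parse import urlencode

# base url of the allrecipes.com search page (from the example url above)
SEARCH_URL = 'https://www.allrecipes.com/search/results/'

def build_search_url(query):
    """Hypothetical helper: return the search-results url for query.

    urlencode escapes characters that are not url-safe,
    e.g. spaces become '+' and '&' becomes '%26'.
    """
    return SEARCH_URL + '?' + urlencode({'search': query})

print(build_search_url('cheese'))
# https://www.allrecipes.com/search/results/?search=cheese
print(build_search_url('blue cheese'))
# https://www.allrecipes.com/search/results/?search=blue+cheese
```

Passing `params={'search': query}` to `requests.get` performs the same encoding for you, so the helper is only needed when you want the URL string itself.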
    
(++) Also get the name of the recipe submitter. How might your output data type change to accommodate this? (Previously it was a list of strings; what data type might you choose now?)
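
One possible answer (a list of dicts, a list of tuples, or a DataFrame would also work) is a list of named tuples, which keeps each name paired with its submitter while staying easy to turn into a table. A minimal sketch of the data type only; the `Recipe` type and its sample values are hypothetical placeholders, not scraped data:

```python
from typing import NamedTuple

class Recipe(NamedTuple):
    """Hypothetical record type: one search result."""
    name: str
    submitter: str

# placeholder values for illustration only (not scraped from the site)
recipes = [
    Recipe(name='Example Recipe A', submitter='user_a'),
    Recipe(name='Example Recipe B', submitter='user_b'),
]

# fields are readable by name instead of by position
for recipe in recipes:
    print(f'{recipe.name} (submitted by {recipe.submitter})')

# _asdict() gives one dict per recipe, the row format pd.DataFrame(...) accepts
rows = [recipe._asdict() for recipe in recipes]
```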
    

In [31]:
from bs4 import BeautifulSoup
import requests

def crawl_recipes(query):
    """ returns the html text of the allrecipes.com search results for query """
    url = 'https://www.allrecipes.com/search/results/'
    # let requests url-encode the query (e.g. the space in 'blue cheese')
    response = requests.get(url, params={'search': query})
    response.raise_for_status()

    return response.text

In [63]:
def extract_recipe(html):
    """ builds list of recipe names from allrecipes html

    Args:
        html (str): html text from allrecipes.com, see crawl_recipes()

    Returns:
        recipe_list (list): list of recipe names (str)
    """
    soup = BeautifulSoup(html, 'html.parser')

    recipe_list = []
    for recipe in soup.find_all(class_='card__recipe'):
        # the first link inside each recipe card holds the recipe name
        # (list.append modifies in place and returns None, so don't reassign)
        recipe_list.append(recipe.a.text.strip())

    return recipe_list

In [68]:
import pandas as pd

def extract_recipes(text):
    """ builds a dataframe of recipe names and links from allrecipes html

    Args:
        text (str): html text from allrecipes.com, see crawl_recipes()

    Returns:
        df_recipe (pd.DataFrame): one row per recipe, columns 'name' and 'href'
    """

    # build soup object from text
    soup = BeautifulSoup(text, 'html.parser')

    recipe_rows = []
    for recipe in soup.find_all(class_='card__recipe'):
        # extract / store recipe name
        recipe_name = recipe.a.text.strip()

        # search within this recipe for the title link
        a = recipe.find('a', class_='card__titleLink')
        recipe_href = a['href']

        # bundle as a dictionary (one dict per row)
        recipe_rows.append({'name': recipe_name, 'href': recipe_href})

    # DataFrame.append was removed in pandas 2.0; build the frame once from the list
    df_recipe = pd.DataFrame(recipe_rows, columns=['name', 'href'])

    return df_recipe

In [69]:
cheese_html = crawl_recipes('cheese')
recipe_list = extract_recipes(cheese_html)

In [70]:
recipe_list

|   | name | href |
|---|------|------|
| 0 | Mac and Cheese with Cottage Cheese | https://www.allrecipes.com/recipe/270056/mac-a... |
| 1 | Mac and Cheese in a Cheese Waffle Cone | https://www.allrecipes.com/recipe/244357/mac-a... |
| 2 | Ethiopian Cheese | https://www.allrecipes.com/recipe/246339/ethio... |
| 3 | Cheese Ball with Cream Cheese | https://www.allrecipes.com/recipe/14980/cheese... |
| 4 | Cheese Squares | https://www.allrecipes.com/recipe/17204/cheese... |
| 5 | Three-Cheese and Basil Grilled Cheese Sandwich | https://www.allrecipes.com/recipe/275612/three... |
| 6 | French Onion Mac and Cheese | https://www.allrecipes.com/recipe/280022/frenc... |
| 7 | Four-Cheese Grilled Cheese Sandwich | https://www.allrecipes.com/recipe/281212/four-... |
| 8 | Pimento Cheese without Cream Cheese | https://www.allrecipes.com/recipe/262855/pimen... |
| 9 | Broccoli and Cheese Mashed Please | https://www.allrecipes.com/recipe/234782/brocc... |
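
Because the live page can change or be unavailable, it helps to check the parsing logic against a small hand-written HTML snippet. The markup below is a made-up stand-in that only reuses the `card__recipe` and `card__titleLink` class names from the code above; real allrecipes.com pages are more complex, and their class names may have changed since this notebook was run. The helper `parse_cards` is hypothetical and, unlike `extract_recipes()`, takes both the name and the href from the title link:

```python
from bs4 import BeautifulSoup

# hand-written stand-in for a search page (not real allrecipes.com markup)
SAMPLE_HTML = """
<div class="card__recipe">
  <a class="card__titleLink" href="https://example.com/recipe/1/first">
    First Recipe
  </a>
</div>
<div class="card__recipe">
  <a class="card__titleLink" href="https://example.com/recipe/2/second">
    Second Recipe
  </a>
</div>
"""

def parse_cards(html):
    """Hypothetical helper: [{'name': ..., 'href': ...}] per card__recipe element."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for card in soup.find_all(class_='card__recipe'):
        # zoom in from the card to its title link
        link = card.find('a', class_='card__titleLink')
        rows.append({'name': link.text.strip(), 'href': link['href']})
    return rows

print(parse_cards(SAMPLE_HTML))
```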


## In Class Assignment 2 - Getting Nutritional Information
Write an `extract_nutrition()` function that accepts the url of a single recipe (such as one of the `href` values listed above) and returns a dictionary of nutritional information:

```python
url = 'https://www.allrecipes.com/recipe/270056/mac-and-cheese-with-cottage-cheese'
extract_nutrition(url)

```

yields:

```python
{'calories': '620',
 'protein': '31.6g',
 'carbohydrates': '41g',
 'fat': '36.6g',
 'cholesterol': '121.2mg',
 'sodium': '801.1mg'}

```
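
The core of the task is turning the page's nutrition sentence into that dictionary. A minimal, network-free sketch of just that step using regular expressions; the sample string is an assumption modelled on the expected values above and on the `Per Serving:` prefix and `;` separators the solution below handles, so the real page text may differ. `parse_nutrition` is a hypothetical helper name:

```python
import re

# assumed shape of the nutrition text (not copied from the live page)
SAMPLE = ('Per Serving: 620 calories; protein 31.6g; carbohydrates 41g; '
          'fat 36.6g; cholesterol 121.2mg; sodium 801.1mg.')

def parse_nutrition(text):
    """Hypothetical helper: map nutrient name -> quantity string."""
    nutrition = {}
    # calories are written value-first ('620 calories')
    calories = re.search(r'([\d.]+)\s+calories', text)
    if calories:
        nutrition['calories'] = calories.group(1)
    # other nutrients are written name-first ('protein 31.6g'), unit g or mg
    for name, qty in re.findall(r'([a-z]+)\s+([\d.]+m?g)\b', text):
        nutrition[name] = qty
    return nutrition
```

Called on `SAMPLE`, `parse_nutrition` returns the same dictionary as the expected output above; matching on patterns rather than splitting on single spaces tolerates extra whitespace and trailing text.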


In [72]:
def extract_nutrition(url):
    """ returns a dictionary of nutrition info

    Args:
        url (str): url of a recipe page on allrecipes.com

    Returns:
        nutrition_dict (dict): keys are nutrient names ('fat'),
            vals are str of quantity ('24g')
    """

    # get soup from url
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # extract nutrition info
    str_nutrit = soup.find(class_='recipeNutritionSectionBlock').text

    # discard unneeded str
    str_nutrit = str_nutrit.replace('Per Serving:', '')
    str_nutrit = str_nutrit.replace('Full Nutrition:', '')

    # remove period at end of mg
    str_nutrit = str_nutrit.replace('mg.', 'mg')

    # split into 'name value' chunks, e.g. 'protein 31.6g'
    nutrition_dict = dict()
    for line in str_nutrit.split(';'):
        tokens = line.split()

        # skip empty chunks; keep only the first two tokens of each chunk
        if len(tokens) < 2:
            continue
        feat_name, feat_val = tokens[:2]

        # calories are listed value-first ('620 calories'), so swap
        if 'calories' in line:
            feat_name, feat_val = feat_val, feat_name

        # store (inside the loop, so every nutrient is kept)
        nutrition_dict[feat_name] = feat_val

    return nutrition_dict
    