# Recipes Amore
---
What are you in the mood to cook today? 
Maybe you have a specific craving or even just an ingredient lying around you're trying to use.

Thousands of recipes have been collected from allrecipes.com and compiled to help. With this recipe recommender system, all you need to do is enter the ingredients you're in the mood for, then choose something delicious to make from the results!

In [62]:
from recipe_scrapers import scrape_me
import pandas as pd
import numpy as np
import time

In [63]:
# # Install the package. GitHub no longer serves the unauthenticated git:// protocol, so the https form is used here.
# # !pip install git+https://github.com/hhursev/recipe-scrapers.git

# # To test the scraper, a single scrape is performed. There may be recipes with lower numbers, but this is the lowest yet found.
# scraper = scrape_me('https://www.allrecipes.com/recipe/6663/')

# # Different methods become available with the scraper modules. To test them, uncomment and run the code on scraper.
# # scraper.title()
# # scraper.total_time()
# # scraper.ingredients()
# # scraper.instructions()
# # scraper.links()

# # To run a scrape across allrecipes.com, a function is needed. 
# scrape_url_nude = 'https://www.allrecipes.com/recipe/' # a baseline URL for the function
# recipe_numbers = list(range(18661, 300000)) # Recipe numbers. Thus far, only recipes between 6663 and 18660, inclusive, have been collected. 300000 is an arbitrary endpoint.
# len(recipe_numbers) # Checking how many recipes we are attempting to scrape. This is a test, not a function component.
# recipe_df_columns = ['Recipe Number', 'Title', 'Total Time', 'Ingredients', 'Instructions'] # Intended columns for the scrape's output dataframe. 
# recipe_df = pd.DataFrame(columns = recipe_df_columns) # Instantiating the dataframe. 

# def the_big_scrape(): # a function that starts the scrape. 
#     for rec in recipe_numbers:
#         try: 
#             scraped = scrape_me(scrape_url_nude+str(rec))
#             recipe_df.loc[rec]= [rec, scraped.title(), scraped.total_time(), scraped.ingredients(), scraped.instructions()]
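# # # The following two lines were intended to skip 404ed pages, but resulted in no scraping taking place. A likely cause: the check referred to `scrape`, which is undefined (the variable is `scraped`), so the NameError was swallowed by the bare `except` below.
# # # To actually skip a page, the check would also need to run before the row is written to recipe_df. For now, 404 rows are cleaned on the back end.
# #             if scraped.title() == 'Bummer.': # 'Bummer.' is the title of URLs which have no recipe, their 404 message.
# #                 continue
#             if rec % 10 == 0: # save a checkpoint whenever the recipe number is a multiple of 10 and the scrape did not fail, so earlier results are kept if something goes wrong later.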
#                 recipe_df.to_csv('./recipe_df3.csv')
#         except:
#             continue
#         time.sleep(1) # pauses one second between successful scrapes.

# the_big_scrape() # runs the scrape

In [64]:
recipe_df = pd.read_csv('./recipe_df.csv')
recipe_df.head(5)

Unnamed: 0.1,Unnamed: 0,Recipe Number,Title,Total Time,Ingredients,Instructions
0,0,0,Bummer.,0,[],
1,1,1,Bummer.,0,[],
2,2,2,Bummer.,0,[],
3,3,3,Mexican Strawberry Water (Agua de Fresa),265,"['4 cups strawberries, sliced', '1 cup white s...",
4,4,4,Bummer.,0,[],


### Defining a recipe

---
Each recipe needs a set of characteristics, stored in variables. These variables will be stored in dictionaries.  
More simply, we need a cookbook (a dictionary), in which we have recipes (dictionaries), in which we store variables. 

The recipe variables are:

    Recipe ID: The unique number appended to the recipe URL on allrecipes.com. (integer)
    Title: The name of the recipe. (string)
    Total time: The number of minutes the recipe should take to prepare and cook. (integer)
    Ingredients: The ingredients that go into the recipe. (list of strings)
    Instructions: Step-by-step cooking instructions. (list)
    
These variables are not always ideal: many cooking times are missing, recipes are user-submitted, and this product, Recipes Amore, does not currently take ratings into account.
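
One practical wrinkle with the `Ingredients` variable: the head of `recipe_df` shown earlier suggests that, after the round trip through CSV, each ingredient list comes back as a string such as `"['4 cups strawberries, sliced', ...]"` rather than as a real list. Below is a minimal sketch of turning such a cell back into a list of strings, assuming every cell holds a Python list literal. The helper name `parse_list_cell` and the sample strings are illustrative and not part of the original notebook.

```python
import ast


def parse_list_cell(cell):
    """Turn a string such as "['a', 'b']" back into a Python list.

    Hypothetical helper: anything that is missing or cannot be parsed
    is treated as an empty list.
    """
    if isinstance(cell, list):
        return cell  # already a list, nothing to do
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []  # NaN, free text, or otherwise malformed cell
    return parsed if isinstance(parsed, list) else []


# Illustrative cells, written the way list columns look after pd.read_csv.
raw_cells = ["['4 cups strawberries, sliced', '2 cups water']", '[]', 'not a list']
parsed_cells = [parse_list_cell(cell) for cell in raw_cells]
print(parsed_cells)
```

On the real data this could be applied column-wide with `recipe_df['Ingredients'].apply(parse_list_cell)`.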

<img src="https://imgur.com/cfQTzEB.png" style="float: left; margin: 15px; height: 50px"> 
#### 'Bummer.'
 
Not every URL has a recipe attached. Many recipes have been removed, either by the user or by Allrecipes itself, for any number of reasons. Allrecipes.com uses this eggy image on its 404 page to let you know there is no recipe at that address. Before building the recommender system, it was important to have these cleared out, to better guarantee that every recipe you see will be a winner!
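
The clean-up described above could look roughly like this. It is a minimal sketch using a small stand-in DataFrame instead of the real `recipe_df`; the names `sample_df`, `is_missing`, and `cleaned_df` and the example rows are illustrative assumptions.

```python
import pandas as pd

# Illustrative stand-in for recipe_df; the real data comes from recipe_df.csv.
sample_df = pd.DataFrame({
    'Recipe Number': [1, 2, 3],
    'Title': ['Example Bread', 'Bummer.', 'Example Soup'],
    'Total Time': [60, 0, 45],
})

# Flag the rows whose title is the 404 placeholder.
is_missing = sample_df['Title'] == 'Bummer.'

# Keep everything else and renumber the rows from zero.
cleaned_df = sample_df.loc[~is_missing].reset_index(drop=True)

print(f"Dropped {is_missing.sum()} placeholder rows; {len(cleaned_df)} recipes remain.")
```

Filtering into a new DataFrame, rather than dropping rows in place, keeps the raw scrape available in case the placeholder rows need a second look.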


In [65]:
# First, check the size of the DataFrame
recipe_df.shape # (17856, 6), matching the info() output below

# In case anything needs changing, it's important to know about the data
recipe_df.info()

recipe_df.head(3) # Column 'Unnamed: 0' seems to be a copy of the recipe number.
(recipe_df['Unnamed: 0'] == recipe_df['Recipe Number']).value_counts() # demonstrates them to be exact copies. 'Unnamed: 0' will need to be removed.

# Where 'Bummer.' appears as a title, scraper.title(), the recipe is absent. 
# Checking how many rows have a title of 'Bummer.'
recipe_df.loc[recipe_df['Title'] == 'Bummer.'].count() # 5178 rows with the title 'Bummer.' (see the output below). Those will need to be removed.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17856 entries, 0 to 17855
Data columns (total 6 columns):
Unnamed: 0       17856 non-null int64
Recipe Number    17856 non-null int64
Title            17856 non-null object
Total Time       17856 non-null int64
Ingredients      17856 non-null object
Instructions     12677 non-null object
dtypes: int64(3), object(3)
memory usage: 837.1+ KB


Unnamed: 0       5178
Recipe Number    5178
Title            5178
Total Time       5178
Ingredients      5178
Instructions        0
dtype: int64

In [66]:
recipe_df.head(3)

Unnamed: 0.1,Unnamed: 0,Recipe Number,Title,Total Time,Ingredients,Instructions
0,0,0,Bummer.,0,[],
1,1,1,Bummer.,0,[],
2,2,2,Bummer.,0,[],


### Example recipe
Again, each recipe (a dictionary) will contain the following:

    Recipe ID: The unique number appended to the recipe URL on allrecipes.com. (integer)
    Title: The name of the recipe. (string)
    Total time: The number of minutes the recipe should take to prepare and cook. (integer)
    Ingredients: The ingredients that go into the recipe. (list of strings)
    Instructions: Step-by-step cooking instructions. (list)

In [67]:
recipe_ex = {
    'Recipe ID': 1,
    'Recipe Name': "Charlie's Best Bunt Cakes",
    'Total Time (min)': 90,
    'Ingredients': [],
    'Instructions': []
}

### Creating a Cookbook, the dictionary to house recipes
The recipes need to be stored. 
The keys of this dictionary will be each recipe's `Recipe ID`, which is also stored inside the recipe dictionary itself. A necessary redundancy.

In [68]:
cookbook_ex = {recipe_ex['Recipe ID']: recipe_ex}
display(cookbook_ex)

{1: {'Recipe ID': 1,
  'Recipe Name': "Charlie's Best Bunt Cakes",
  'Total Time (min)': 90,
  'Ingredients': [],
  'Instructions': []}}

In [72]:
### We can manipulate the recipe if we want!
cookbook_ex[1]['Recipe Name'] = "Isaiah's Best Bunt Cakes"
display(cookbook_ex)

{1: {'Recipe ID': 1,
  'Recipe Name': "Isaiah's Best Bunt Cakes",
  'Total Time (min)': 90,
  'Ingredients': [],
  'Instructions': []}}

In [70]:
### Creating a function that adds a recipe to the cookbook
def add_recipe(cookbook, recipe_id, recipe_name, total_time, ingredients, instructions):
    cookbook[recipe_id] = {
        'Recipe ID': recipe_id,
        'Recipe Name': recipe_name,
        'Total Time (min)': total_time,
        'Ingredients': ingredients,
        'Instructions': instructions
    }
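
As a usage example, `add_recipe` can be called once per DataFrame row to fill a cookbook. This is only a sketch: it repeats the function so that it runs on its own, and the sample rows, `sample_df`, and `cookbook` are illustrative stand-ins for the real `recipe_df` data.

```python
import pandas as pd


def add_recipe(cookbook, recipe_id, recipe_name, total_time, ingredients, instructions):
    """Same function as in the cell above, repeated so this sketch is self-contained."""
    cookbook[recipe_id] = {
        'Recipe ID': recipe_id,
        'Recipe Name': recipe_name,
        'Total Time (min)': total_time,
        'Ingredients': ingredients,
        'Instructions': instructions
    }


# Illustrative rows using the same column names as recipe_df.
sample_df = pd.DataFrame({
    'Recipe Number': [101, 102],
    'Title': ['Example Omelette', 'Example Rice'],
    'Total Time': [10, 25],
    'Ingredients': [['2 eggs'], ['1 cup rice']],
    'Instructions': [['Whisk.', 'Fry.'], ['Rinse.', 'Boil.']],
})

cookbook = {}
for _, row in sample_df.iterrows():
    add_recipe(
        cookbook,
        int(row['Recipe Number']),  # plain int keys rather than numpy integers
        row['Title'],
        int(row['Total Time']),
        row['Ingredients'],
        row['Instructions'],
    )

print(sorted(cookbook))
```

Casting the recipe number to a plain `int` keeps the cookbook keys consistent with hand-written lookups such as `cookbook[101]`.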

In [76]:
recipe_df.drop(columns=['Unnamed: 0'], inplace=True)

In [74]:
recipe_df.set_index('Recipe Number', drop=False, inplace=True)

In [None]:
# Webscrape one by one using the webscraper 
# I need to drop the scraped information into a database

# 1 Scrape + Apply info to scraper object. 

# 2 Append dataframe with scraper.title(), scraper.total_time(), and scraper.ingredients()

# 3 Scrape next page, repeat steps 2 and 3 for x iterations. Make sure it skips over pages with no recipe.

# Apply phrase tagger
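
The numbered steps above can be sketched as a small loop. This is a hedged outline rather than the notebook's actual scraper: `collect_recipes`, its `fetch` parameter, and the offline `fake_fetch` stand-in are all hypothetical names introduced here so the example runs without network access.

```python
import time


def collect_recipes(recipe_numbers, fetch, delay_seconds=0.0):
    """Fetch each recipe number in turn, skipping pages that have no recipe.

    `fetch` is a hypothetical callable that takes a recipe number and returns
    a dict with 'Title', 'Total Time', and 'Ingredients'. It stands in for
    the real scraper call.
    """
    rows = []
    for number in recipe_numbers:
        try:
            page = fetch(number)
        except Exception:
            page = None  # network or parsing failure: treat as no recipe
        time.sleep(delay_seconds)  # pause after every request, successful or not
        if page is None or page['Title'] == 'Bummer.':
            continue  # skip failures and the 404 placeholder
        rows.append({'Recipe Number': number, **page})
    return rows


def fake_fetch(number):
    """Offline stand-in for the scraper: even numbers pretend to be 404 pages."""
    if number % 2 == 0:
        return {'Title': 'Bummer.', 'Total Time': 0, 'Ingredients': []}
    return {'Title': f'Example Recipe {number}', 'Total Time': 30,
            'Ingredients': ['1 cup flour']}


rows = collect_recipes(range(1, 6), fake_fetch)
print([row['Recipe Number'] for row in rows])
```

With the real scraper, `fetch` could wrap `scrape_me` and return the results of `title()`, `total_time()`, and `ingredients()`; the collected rows can then be handed to `pd.DataFrame(rows)`.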

In [None]:
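# Placeholder loop, not yet implemented: `y` stands for the collection to iterate over.
# for x in y:
#     ...  # per-item work goes here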