# Introduction

This notebook sets up the `recipes1M.db` database, which will contain over one million recipes along with their ingredients and instructions. The steps below will help you get started.

1. To obtain the data, go to http://pic2recipe.csail.mit.edu/, and follow the instructions to download the dataset. This will involve creating an account.
2. After that, go to http://im2recipe.csail.mit.edu/dataset/download/ and download from the link labeled "Layers". Place this file in the working directory and make sure it is named "recipe1M_layers.tar.gz" (this is the default name). 
3. Create a ".gitignore" file and add both "recipe1M_layers.tar.gz" and "recipes1M.db" to it, one per line. This keeps Git from tracking these very large files, so they are never committed or pushed to GitHub along with your other changes.
4. Continue with this file!
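
Step 3 can also be done from Python. The following is a minimal sketch, not part of the original setup: `add_to_gitignore` is a hypothetical helper that appends the two file names to the ignore file only if they are not already listed.

```python
from pathlib import Path


def add_to_gitignore(entries, path=".gitignore"):
    """Append entries to the ignore file, skipping ones already listed.

    Hypothetical helper for illustration; returns the entries that were added.
    """
    ignore_file = Path(path)
    text = ignore_file.read_text() if ignore_file.exists() else ""
    existing = text.splitlines()
    missing = [entry for entry in entries if entry not in existing]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        ignore_file.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing
```

Calling `add_to_gitignore(["recipe1M_layers.tar.gz", "recipes1M.db"])` from the working directory covers both files, and running it a second time changes nothing.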

In [1]:
import tarfile
import sqlite3
import json
import pandas as pd

In [None]:
# open tarfile and explore what is there
tar = tarfile.open("recipe1M_layers.tar.gz")
files = tar.getmembers()
files

In [None]:
# extract recipe data into a string
f = tar.extractfile(files[0]).read()

# convert to a python list
temp = json.loads(f)

# what information do we have
temp[0].keys()

In [None]:
# create the DataFrame
df = pd.DataFrame()

for key in temp[0].keys():
    tempList = [recipe[key] for recipe in temp]  # every recipe, including the last one
    df[key] = tempList
    
df.head(5)

In [None]:
# unpack the ingredients and instructions columns
ingr_unpacked = []
istr_unpacked = []
for i in range(0, df.shape[0]): # loop over each row
    ingr_list = []
    istr_list = []
    for ingr_dict in df['ingredients'][i]: # loop over each ingredient
        ingr_list.append(ingr_dict['text']) # add to new list
    for istr_dict in df['instructions'][i]: # repeat for instructions
        istr_list.append(istr_dict['text'])
    ingr_str = json.dumps(ingr_list) # convert to JSON string format
    istr_str = json.dumps(istr_list)
    ingr_unpacked.append(ingr_str) # add the string just constructed to another list
    istr_unpacked.append(istr_str)

df["ingredients"] = ingr_unpacked # replace columns
df["instructions"] = istr_unpacked
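
Because the two columns now hold JSON strings rather than Python lists, anything that reads them back out of the database has to decode them with `json.loads`. The sketch below shows that round trip on a hypothetical one-row table in an in-memory SQLite database; the sample recipe and the `demo_conn` name are made up for illustration and are not part of the dataset.

```python
import json
import sqlite3

# Hypothetical sample ingredient list, shaped like one unpacked row.
ingredients = ["2 slices bread", "1 tbsp butter"]
encoded = json.dumps(ingredients)  # list -> JSON string, as in the cell above

demo_conn = sqlite3.connect(":memory:")  # throwaway database, not recipes1M.db
demo_conn.execute("CREATE TABLE recipes (title TEXT, ingredients TEXT)")
demo_conn.execute("INSERT INTO recipes VALUES (?, ?)", ("Toast", encoded))

row = demo_conn.execute(
    "SELECT ingredients FROM recipes WHERE title = ?", ("Toast",)
).fetchone()
demo_conn.close()

stored = row[0]               # comes back as a plain string
decoded = json.loads(stored)  # JSON string -> list again
```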

In [None]:
# check data
df.head(5)

In [None]:
conn = sqlite3.connect("recipes1M.db")

In [None]:
df.to_sql("recipes", conn, if_exists = "replace", index = False, chunksize = 20000)

In [None]:
# verify it worked
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cursor.fetchall())

In [None]:
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table';")

for result in cursor.fetchall():
    print(result[0])

In [None]:
conn.close()

# Querying (Preliminary)

In [2]:
def title_keyword_search(keyword):
    ''' Searches the recipe database for entries with the keyword in the title. '''
    
    conn = sqlite3.connect("recipes1M.db")
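    # bind the keyword as a parameter instead of formatting it into the SQL,
    # so quotes in the keyword cannot break (or alter) the query
    cmd = \
    """
    SELECT R.title, R.url
    FROM recipes R
    WHERE R.title LIKE ?
    """
    
    df = pd.read_sql_query(cmd, conn, params=(f"%{keyword}%",))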
    conn.close()
    return df
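
To try the title search without the full database, the following minimal sketch runs the same kind of `LIKE` query against a small in-memory table. The table contents, the example URLs, and the `demo_title_search` helper are all hypothetical. The keyword is passed as a bound parameter, so a keyword containing quotes is treated as plain text.

```python
import sqlite3


def demo_title_search(conn, keyword):
    """Return (title, url) rows whose title contains the keyword (hypothetical helper)."""
    query = "SELECT title, url FROM recipes WHERE title LIKE ?"
    # The wildcards belong to the bound value, not to the SQL text.
    return conn.execute(query, (f"%{keyword}%",)).fetchall()


demo_conn = sqlite3.connect(":memory:")  # throwaway database with made-up rows
demo_conn.execute("CREATE TABLE recipes (title TEXT, url TEXT)")
demo_conn.executemany(
    "INSERT INTO recipes VALUES (?, ?)",
    [
        ("Lemon Butter for Steak", "http://example.com/a"),
        ('Mom\'s "Best" Pie', "http://example.com/b"),
    ],
)

steak_rows = demo_title_search(demo_conn, "steak")   # LIKE ignores ASCII case by default
quote_rows = demo_title_search(demo_conn, '"Best"')  # quotes in the keyword are harmless
demo_conn.close()
```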

In [4]:
df = title_keyword_search("steak")

In [5]:
df

| | title | url |
|---|---|---|
| 0 | Steak & Asparagus Wraps | http://www.kraftrecipes.com/recipes/steak-aspa... |
| 1 | Mom's Swiss Steak Recipe | http://cookeatshare.com/recipes/mom-s-swiss-st... |
| 2 | The British Bulldog! Traditional Layered Beef ... | http://www.food.com/recipe/the-british-bulldog... |
| 3 | BBq Steak Sandwiches With a Rainbow of Peppers | http://www.food.com/recipe/bbq-steak-sandwiche... |
| 4 | Lemon Butter for Steak | http://www.epicurious.com/recipes/food/views/l... |
| ... | ... | ... |
| 11882 | Tuna Steaks, Seared with Salsa | http://www.kraftrecipes.com/recipes/tuna-steak... |
| 11883 | Simply Scrumptious Stilton Steak | http://www.food.com/recipe/simply-scrumptious-... |
| 11884 | Marvelous Marinated Steak Recipe | http://cookeatshare.com/recipes/marvelous-mari... |
| 11885 | Venison Steaks with Chestnuts | https://recipeland.com/recipe/v/venison-steaks... |
