<a href="https://colab.research.google.com/github/davidyu8/gouda-group-project/blob/main/find_recipe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recipe Recommender ™

## Preparing the Data

In [1]:
import json
import pandas as pd
import sqlite3
import numpy as np

In [7]:
# set up data set (this is the smaller one, with about 40,000 recipes)
# the path is relative to the notebook, so adjust it if the file lives elsewhere;
# the earlier Windows-style path failed because backslash sequences such as "\r"
# are treated as escape characters in a normal (non-raw) string
with open("recipes_raw/recipes_raw_nosource_ar.json") as f:
    data = json.load(f)
df = pd.DataFrame(data)
df = df.T

# we will use the larger data set later

In [None]:
df.head()

_note from Colby_:

To avoid writing a ton of for loops, I reshaped the ingredients column a bit with the `join` string method. It combines all the elements of a list into a single string, separated by a comma and a space, so each recipe's ingredients can be searched as one string instead of looping over nested iterables.
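
As a rough illustration of what the join step does, here is a minimal plain-Python sketch. The ingredient lists are made-up sample data, not rows from the dataset; `", ".join(...)` applied to one list mirrors what `.str.join(', ')` does for every row of the column.

```python
# Made-up sample data: each recipe's ingredients start out as a list of strings.
ingredient_lists = [
    ["flour", "eggs", "milk"],
    ["tomato"],
]

# Glue the items of each list into one string, separated by ", ".
joined = [", ".join(items) for items in ingredient_lists]

# A single substring check now replaces a loop over the inner list.
has_eggs = ["eggs" in text for text in joined]

print(joined)
print(has_eggs)
```

Once every recipe is a single string, checking for an ingredient is one `in` test per recipe rather than a nested loop.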

In [None]:
# cleaning up and preparing the data

# create the Score column to track matching recipes
df["Score"] = 0

# reshape ingredients column from a list into a single string, then replace unneeded words
df["ingredients"] = df["ingredients"].str.join(', ')
df["ingredients"] = df["ingredients"].str.replace("ADVERTISEMENT", "")

# clean up row names, drop NaN values
df = df.reset_index(drop = True)
df = df.dropna()

In [None]:
df.head()

In [None]:
df["ingredients"].iloc[1]

## Implementing the Function

_note from Colby_:

I already worked out a recipe function for the smaller dataset, so I started a brand new one for the bigger dataset that queries the database instead. I'm keeping both for documentation, and so we can see how to combine the best aspects of each. The second function is still a bit broken at the moment though :/
I'm not sure how to use the LIKE keyword in a query to grab matches for several different ingredients, so at the moment the second function can only look for 1-ingredient matches.
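
One possible way to match several ingredients with LIKE is to write one `LIKE ?` condition per ingredient and join them with OR, passing the search patterns as query parameters. The sketch below is only an illustration under assumptions: `build_like_query` is a hypothetical helper, and the in-memory table with its three rows is invented to stand in for the real database.

```python
import sqlite3

def build_like_query(ingredients):
    """Hypothetical helper: return (sql, params) matching ANY of the ingredients."""
    if not ingredients:
        raise ValueError("need at least one ingredient")
    # One "LIKE ?" condition per ingredient, joined with OR.
    conditions = " OR ".join(["ingredients LIKE ?"] * len(ingredients))
    sql = f"SELECT title, ingredients FROM recipes WHERE {conditions}"
    # Wrap each ingredient in % wildcards so it matches anywhere in the text.
    params = [f"%{name}%" for name in ingredients]
    return sql, params

# Tiny in-memory table standing in for the real database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (title TEXT, ingredients TEXT)")
conn.executemany(
    "INSERT INTO recipes VALUES (?, ?)",
    [
        ("Tzatziki", "yogurt, cucumber, dill"),
        ("Pita pockets", "pita, beef, yogurt"),
        ("Plain rice", "rice, water"),
    ],
)

sql, params = build_like_query(["pita", "dill"])
titles = sorted(row[0] for row in conn.execute(sql, params))
conn.close()

print(sql)
print(titles)
```

Passing the patterns as parameters instead of formatting them into the SQL text also means an ingredient containing a quote character cannot break the query.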

In [None]:
def find_recipe_1(ingredients, min_score = 1):
    """
    A function that recommends a recipe to cook based on the user's available ingredients. Uses the smaller dataset.
    
    ingredients: a list of ingredients supplied as strings
    min_score: the minimum number of ingredient matches a recipe has to satisfy in order to be recommended
    returns: a portion of the original dataframe only containing recipes that the user may want to cook
    """
    
    # reset the Score column every time the function is called
    df["Score"] = 0
    
    # iterate through list of input ingredients
    for ingr in ingredients:
      
        # increment score by 1 every time the matching ingredient name is found in a recipe
        df["Score"] += df['ingredients'].apply(lambda x: ingr in x)
  
    # return recipes in which the minimum score is satisfied
    return df[df["Score"] >= min_score]


In [None]:
def find_recipe_2(ingredients, min_score = 1):
    """
    Same as find_recipe_1, but uses the bigger dataset.
    """
    
    # ensure that the ingredients are passed as a non-empty list
    if not isinstance(ingredients, list):
        raise TypeError("Ingredients must be contained in a list.")
    if not ingredients:
        raise ValueError("Ingredients must contain at least one item.")
     
    # build one LIKE condition per ingredient for the WHERE statement, joined by OR;
    # the ? placeholders keep the ingredient text out of the SQL string itself
    where_statement = " OR ".join(["R.ingredients LIKE ?"] * len(ingredients))
    params = [f"%{i}%" for i in ingredients]
    
    # grab ingredient matches
    query = f"""
    SELECT R.title, R.ingredients, R.url
    FROM recipes R
    WHERE {where_statement}
    """
    
    # open up dataset; a sqlite3 "with" block only manages the transaction,
    # so close the connection explicitly
    conn = sqlite3.connect("recipes1M.db")
    try:
        # query database
        df = pd.read_sql_query(query, conn, params = params)
    finally:
        conn.close()
        
    # create the Score column to track matching recipes
    df["Score"] = 0
    
    # iterate through list of input ingredients
    for ingr in ingredients: 
        # increment score by 1 every time the matching ingredient name is found in a recipe
        df["Score"] += df['ingredients'].apply(lambda x: ingr in x)
    
    # return matching recipes
    return df[df["Score"] >= min_score]
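
One thing that may be worth checking in `find_recipe_2`: SQLite's LIKE ignores ASCII case by default, while Python's `in` is case-sensitive, so a row can be returned by the query and still get a Score of 0. The sketch below shows the mismatch on a single invented row in an in-memory table; lowercasing before the `in` test is just one possible fix, not something the notebook currently does.

```python
import sqlite3

# One made-up row with a capitalised ingredient name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (title TEXT, ingredients TEXT)")
conn.execute("INSERT INTO recipes VALUES ('Roast', 'Pork shoulder, salt')")

# LIKE matches regardless of ASCII case, so this lowercase pattern finds the row.
rows = conn.execute(
    "SELECT ingredients FROM recipes WHERE ingredients LIKE ?", ("%pork%",)
).fetchall()
conn.close()

text = rows[0][0]

# The Python-side check used for scoring is case-sensitive...
case_sensitive_hit = "pork" in text
# ...unless the text is lowercased first.
case_insensitive_hit = "pork" in text.lower()

print(case_sensitive_hit, case_insensitive_hit)
```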

## Testing it Out

In [None]:
find_recipe_1(["chicken"]).head() # all recipes where chicken is used

In [None]:
# recipes that use 4 or more of the ingredients below
find_recipe_1(["chicken", "pesto", "pork", "linguine", "tomato", "mushroom"], min_score = 4).head()

In [None]:
# recipes that use 4 or more of the ingredients below
find_recipe_1(["oysters", "clam", "tomato", "lemon", "scallop", "fish", "squid"], min_score = 4)

In [None]:
# recipes that use 5 or more of the ingredients below
find_recipe_1(["pita", "beef", "yogurt", "cucumber", "dill"], min_score = 5)

In [None]:
# only able to match 1 ingredient
test1 = find_recipe_2(["pork"])

In [None]:
test1

In [None]:
test2 = find_recipe_2(["pita","dill"])

In [None]:
test2

In [None]:
test2['Score'].unique()

In [None]:
test3 = find_recipe_2(["pita", "beef", "yogurt", "cucumber", "dill"], min_score = 5)

In [None]:
test3