**Unfortunately, due to time constraints, machine learning for recipe creation was not explored. It was not critical to the project, but it would have been very interesting to explore!**

This notebook explores ways to suggest needed ingredients. Machine learning isn't much of a requirement for this project, as there is no set prediction being made. We can, however, make predictions about how ingredients relate to each other. Ingredients that appear in similar recipes should cluster in the same areas. Ingredients that are close to each other, that is, those that share many recipe items, should be paired. This approach will use unsupervised learning, since the ingredients carry no labels to predict. Another thing to explore is finding ingredients that are too common, which should not be suggested over more important and more unique ingredients.
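As a rough illustration of the clustering idea described above, here is a minimal sketch on invented toy data. The recipe ids and ingredient lists are hypothetical, and the choice of two clusters is an assumption made only so the result is easy to inspect; it is not the notebook's actual pipeline. Each ingredient is represented by the set of recipes it appears in, and `KMeans` groups ingredients with similar recipe membership.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical (recipe id, ingredient) pairs, invented for illustration only.
pairs = [
    (0, "flour"), (0, "sugar"), (0, "eggs"),
    (1, "flour"), (1, "sugar"), (1, "butter"),
    (2, "sugar"), (2, "eggs"), (2, "butter"),
    (3, "garlic"), (3, "onion"), (3, "olive oil"),
    (4, "garlic"), (4, "onion"), (4, "tomato"),
    (5, "onion"), (5, "olive oil"), (5, "tomato"),
]
toy = pd.DataFrame(pairs, columns=["id", "ingredient"])

# Rows are ingredients, columns are recipes, values are 0/1 membership.
membership = pd.crosstab(toy["ingredient"], toy["id"])

# Assumption: two clusters, chosen by hand for this toy data set.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = model.fit_predict(membership.values)

# Map each ingredient name to its cluster id.
labels = dict(zip(membership.index, cluster_ids))
for name, cluster in sorted(labels.items(), key=lambda item: item[1]):
    print(cluster, name)
```

On real data the number of clusters would need tuning, and very common ingredients (such as salt or water) would likely need to be down-weighted or removed first so they do not dominate the distances.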

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
import random
import copy
import collections
import itertools
import pickle
from ipywidgets import interact
from sklearn.cluster import KMeans

In [2]:
def load_pickle(name):
    # Open with a context manager so the file handle is closed after loading.
    with open('/data/BobbyDobo/' + name, 'rb') as f:
        return pickle.load(f)

data_combined = load_pickle('data_combined.p')
flattened_rec_item = load_pickle('flattened_rec_item.p')
ingredient_relations = load_pickle('ingredient_relations.p')
vec_ingredient_relations = load_pickle('df_ingredient_relations.p')
ingredient_counts = load_pickle('ingredient_counts.p')

I will vectorize the DataFrame below, setting all integers to 1 and all NaNs to 0. Using sklearn, I will then try to draw relationships between ingredients. Ingredients that lie just one recipe away from another ingredient's recipes may turn out to be very important pairings.

In [3]:
vec_ingredient_relations

Unnamed: 0,( oz.) tomato sauce,( oz.) tomato paste,(10 oz.) frozen chopped spinach,"(10 oz.) frozen chopped spinach, thawed and squeezed dry",(14 oz.) sweetened condensed milk,(14.5 oz.) diced tomatoes,(15 oz.) refried beans,1% low-fat buttermilk,1% low-fat chocolate milk,1% low-fat cottage cheese,...,yuzu,yuzu juice,za'atar,zabaglione,zest,zesty italian dressing,zinfandel,ziti,zucchini,zucchini blossoms
( oz.) tomato sauce,,1,1,,,,,,,,...,,,,,,,,,,
( oz.) tomato paste,1,,,,,1,,,,,...,,,,,,,,,,
(10 oz.) frozen chopped spinach,1,,,,,,,,,,...,,,,,,,,,,
"(10 oz.) frozen chopped spinach, thawed and squeezed dry",,,,,,,,,,,...,,,,,,,,,,
(14 oz.) sweetened condensed milk,,,,,,,,,,,...,,,,,,,,,,
(14.5 oz.) diced tomatoes,,1,,,,,,,,,...,,,,,,,,,,
(15 oz.) refried beans,,,,,,,,,,,...,,,,,,,,,,
1% low-fat buttermilk,,,,,,,,,,,...,,,,,,,,,,
1% low-fat chocolate milk,,,,,,,,,,,...,,,,,,,,,,
1% low-fat cottage cheese,,,,,,,,,,,...,,,,,,,,,1,
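The vectorization step described above could look something like the following sketch. It uses a small hand-built stand-in for `vec_ingredient_relations` (the ingredient names and counts here are made up), and it takes one possible reading of "one recipe away": two ingredients that are never paired directly but share a common partner.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the relations table: counts where two
# ingredients co-occur, NaN where they never do.
names = ["tomato sauce", "tomato paste", "spinach", "diced tomatoes"]
values = [
    [np.nan, 3,      1,      np.nan],
    [3,      np.nan, np.nan, 2],
    [1,      np.nan, np.nan, np.nan],
    [np.nan, 2,      np.nan, np.nan],
]
relations = pd.DataFrame(values, index=names, columns=names)

# Vectorize: any positive count becomes 1, NaN becomes 0.
binary = relations.fillna(0).gt(0).astype(int)

# Count shared partners: entry (a, b) is the number of ingredients
# that are paired with both a and b.
shared = binary.dot(binary)

# Candidate pairings: not directly related, but with at least one shared partner.
candidates = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if binary.loc[a, b] == 0 and shared.loc[a, b] > 0
]
print(candidates)
```

Here tomato sauce and diced tomatoes never appear together directly, but both pair with tomato paste, so they surface as a candidate pairing.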


In [15]:
df_ingredient_relations = pd.DataFrame(flattened_rec_item, columns=['id','ingredient'])

['baking powder',
 'eggs',
 'all-purpose flour',
 'raisins',
 'milk',
 'white sugar',
 'sugar',
 'egg yolks',
 'corn starch',
 'cream of tartar',
 'bananas',
 'vanilla wafers',
 'milk',
 'vanilla extract',
 'toasted pecans',
 'egg whites',
 'light rum',
 'sausage links',
 'fennel bulb',
 'fronds',
 'olive oil',
 'cuban peppers',
 'onions',
 'meat cuts',
 'file powder',
 'smoked sausage',
 'okra',
 'shrimp',
 'andouille sausage',
 'water',
 'paprika',
 'hot sauce',
 'garlic cloves',
 'browning',
 'lump crab meat',
 'vegetable oil',
 'all-purpose flour',
 'freshly ground pepper',
 'flat leaf parsley',
 'boneless chicken skinless thigh',
 'dried thyme',
 'white rice',
 'yellow onion',
 'ham',
 'ground black pepper',
 'salt',
 'sausage casings',
 'leeks',
 'parmigiano reggiano cheese',
 'cornmeal',
 'water',
 'extra-virgin olive oil',
 'baking powder',
 'all-purpose flour',
 'peach slices',
 'corn starch',
 'heavy cream',
 'lemon juice',
 'unsalted butter',
 'salt',
 'white sugar',
 'grape