# Predicting Recipes Using Machine Learning Recommendation Models


Project Description:

In this project, I focused on various related datasets on customer interaction and recipes. The data content is from Food.com, and was amassed by Maunder et al.. I retrieved the data content from Kaggle (https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions?select=ingr_map.pkl). Specifically I utilized the PP_recipes, PP_users, raw_rec, and raw_int datasets.

To build a content recommender system, I first created a new dataset with meaningful tokens. Many cells of the database were lists formatted as strings, so I reformatted them as lists. I also separated the nutrition column into decipherable information, since nutritional content is important when comparing recipes. To better compare these factors, I added new columns to classify the calorie, fat, sugar, sodium, and protein content as low, medium, or high respective to the other recipes. I then merged additional columns from PP_recipes and raw_rec to allow more recipe factors to be identified with the ID and name.

A combined_features column was then created to collect all the desired identity tokens for each recipe in a single column. A smaller sample was used due to computing limitations. A count vectorizer was used to make a vector for each row of the data frame based on combined features so that a cosine similarity matrix could be applied.

Next, I built a collaborative filter for the dataset. I created a new data frame with just the user ID, recipe ID and recipe rating information. Again, a smaller sample was used to speed up the time to run the code. I then created a user-item matrix with the new data frame. Ager running singular value decomposition, I reconstructed the predicted ratings. And got the specified number of recipe names from the top value ratings and IDs.

# Importing Data

Data set: https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions?select=ingr_map.pkl

(only need PP_recipes, raw_rec, and PP_users)


In [1]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
data_path = "/content/drive/MyDrive/DSC 101/Data/"

In [4]:
PP_recipes= pd.read_csv(data_path +'PP_recipes.csv')
PP_users= pd.read_csv(data_path +'PP_users.csv')
raw_int= pd.read_csv(data_path +'RAW_interactions.csv')
raw_rec= pd.read_csv(data_path +'RAW_recipes.csv')

#Building Content-Based Reccomender System Dataset (with meaningful features)

In [6]:
PP_recipes.head()


Unnamed: 0,id,i,name_tokens,ingredient_tokens,steps_tokens,techniques,calorie_level,ingredient_ids
0,424415,23,"[40480, 37229, 2911, 1019, 249, 6878, 6878, 28...","[[2911, 1019, 249, 6878], [1353], [6953], [153...","[40480, 40482, 21662, 481, 6878, 500, 246, 161...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...",0,"[389, 7655, 6270, 1527, 3406]"
1,146223,96900,"[40480, 18376, 7056, 246, 1531, 2032, 40481]","[[17918], [25916], [2507, 6444], [8467, 1179],...","[40480, 40482, 729, 2525, 10906, 485, 43, 8393...","[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...",0,"[2683, 4969, 800, 5298, 840, 2499, 6632, 7022,..."
2,312329,120056,"[40480, 21044, 16954, 8294, 556, 10837, 40481]","[[5867, 24176], [1353], [6953], [1301, 11332],...","[40480, 40482, 8240, 481, 24176, 296, 1353, 66...","[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...",1,"[1257, 7655, 6270, 590, 5024, 1119, 4883, 6696..."
3,74301,168258,"[40480, 10025, 31156, 40481]","[[1270, 1645, 28447], [21601], [27952, 29471, ...","[40480, 40482, 5539, 21601, 1073, 903, 2324, 4...","[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",0,"[7940, 3609, 7060, 6265, 1170, 6654, 5003, 3561]"
4,76272,109030,"[40480, 17841, 252, 782, 2373, 1641, 2373, 252...","[[1430, 11434], [1430, 17027], [1615, 23, 695,...","[40480, 40482, 14046, 1430, 11434, 488, 17027,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...",0,"[3484, 6324, 7594, 243]"


In [7]:
raw_rec.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
0,arriba baked winter squash mexican style,137739,55,47892,2005-09-16,"['60-minutes-or-less', 'time-to-make', 'course...","[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",11,"['make a choice and proceed with recipe', 'dep...",autumn is my favorite time of year to cook! th...,"['winter squash', 'mexican seasoning', 'mixed ...",7
1,a bit different breakfast pizza,31490,30,26278,2002-06-17,"['30-minutes-or-less', 'time-to-make', 'course...","[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]",9,"['preheat oven to 425 degrees f', 'press dough...",this recipe calls for the crust to be prebaked...,"['prepared pizza crust', 'sausage patty', 'egg...",6
2,all in the kitchen chili,112140,130,196586,2005-02-25,"['time-to-make', 'course', 'preparation', 'mai...","[269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]",6,"['brown ground beef in large pot', 'add choppe...",this modified version of 'mom's' chili was a h...,"['ground beef', 'yellow onions', 'diced tomato...",13
3,alouette potatoes,59389,45,68585,2003-04-14,"['60-minutes-or-less', 'time-to-make', 'course...","[368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]",11,['place potatoes in a large pot of lightly sal...,"this is a super easy, great tasting, make ahea...","['spreadable cheese with garlic and herbs', 'n...",11
4,amish tomato ketchup for canning,44061,190,41706,2002-10-25,"['weeknight', 'time-to-make', 'course', 'main-...","[352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]",5,['mix all ingredients& boil for 2 1 / 2 hours ...,my dh's amish mother raised him on this recipe...,"['tomato juice', 'apple cider vinegar', 'sugar...",8


In [8]:
#ingredients formatting
  # make all ingredients 1 word to avoid token overlap
  # format it to be more readable (some undesired tokens [, ], \ etc.)

for i, row in enumerate(raw_rec['ingredients']):

  row = row.split(',')
  new_row = []

  for ingredient in row:
    ingredient = ingredient.replace(']','')
    ingredient = ingredient.replace('[','')
    ingredient = ingredient.replace('\'','')
    ingredient = ingredient.replace(' ','')
    ingredient = ingredient.replace('-','')
    new_row.append(ingredient)

  raw_rec.at[i, 'ingredients'] = new_row


In [9]:
#split nutrition information into its actual info

raw_rec['num_calories'] = np.nan
raw_rec['g_fat'] = np.nan
raw_rec['g_sugar'] = np.nan
raw_rec['mg_sodium'] = np.nan
raw_rec['g_protein'] = np.nan

for i, row in enumerate(raw_rec['nutrition']):

  row = row.split(',')
  row[0] = row[0].replace('[', '')
  raw_rec.at[i, 'num_calories'] = float(row[0])
  raw_rec.at[i, 'g_fat'] = float(row[1])
  raw_rec.at[i, 'g_sugar'] = float(row[2])
  raw_rec.at[i, 'mg_sodium'] = float(row[3])
  raw_rec.at[i, 'g_protein'] = float(row[4])

raw_rec.drop('nutrition', axis = 1, inplace=True)

In [10]:
#categorize numerical value to be used to make it easier to compare
cal_thirds = raw_rec['num_calories'].quantile([1/3, 2/3])
fat_thirds = raw_rec['g_fat'].quantile([1/3, 2/3])
sugar_thirds = raw_rec['g_sugar'].quantile([1/3, 2/3])
sodium_thirds = raw_rec['mg_sodium'].quantile([1/3, 2/3])
protein_thirds = raw_rec['g_protein'].quantile([1/3, 2/3])
steps_thirds = raw_rec['n_steps'].quantile([1/3, 2/3])
time_thirds = raw_rec['minutes'].quantile([1/3, 2/3])
ingredient_thirds = raw_rec['n_ingredients'].quantile([1/3, 2/3])


raw_rec['cal_lev'] = raw_rec['num_calories'].apply(lambda x: 'low_cal' if x <= cal_thirds[1/3] else ('med_cal' if x <= cal_thirds[2/3] else 'high_cal'))

raw_rec['fat_lev'] = raw_rec['g_fat'].apply(lambda x: 'low_fat' if x <= fat_thirds[1/3] else ('med_fat' if x <= fat_thirds[2/3] else 'high_fat'))

raw_rec['sugar_lev'] = raw_rec['g_sugar'].apply(lambda x: 'low_sugar' if x <= sugar_thirds[1/3] else ('med_sugar' if x <= sugar_thirds[2/3] else 'high_sugar'))

raw_rec['sodium_lev'] = raw_rec['mg_sodium'].apply(lambda x: 'low_sodium' if x <= sodium_thirds[1/3] else ('med_sodium' if x <= sodium_thirds[2/3] else 'high_sodium'))

raw_rec['protein_lev'] = raw_rec['g_protein'].apply(lambda x: 'low_protein' if x <= protein_thirds[1/3] else ('med_protein' if x <= protein_thirds[2/3] else 'high_protein'))

raw_rec['relative_nsteps'] = raw_rec['n_steps'].apply(lambda x: 'few_steps' if x <= steps_thirds[1/3] else ('avg_steps' if x <= steps_thirds[2/3] else 'lots_steps'))

raw_rec['relative_time'] = raw_rec['minutes'].apply(lambda x: 'slow_time' if x <= time_thirds[1/3] else ('avg_time' if x <= time_thirds[2/3] else 'long_time'))

raw_rec['relative_n_ingredients'] = raw_rec['n_ingredients'].apply(lambda x: 'low_n_ingredients' if x <= ingredient_thirds[1/3] else ('avg_n_ingredients' if x <= ingredient_thirds[2/3] else 'big_n_ingredients'))


In [11]:
PP_recipes.dtypes

id                    int64
i                     int64
name_tokens          object
ingredient_tokens    object
steps_tokens         object
techniques           object
calorie_level         int64
ingredient_ids       object
dtype: object

In [12]:
raw_rec.dtypes

name                       object
id                          int64
minutes                     int64
contributor_id              int64
submitted                  object
tags                       object
n_steps                     int64
steps                      object
description                object
ingredients                object
n_ingredients               int64
num_calories              float64
g_fat                     float64
g_sugar                   float64
mg_sodium                 float64
g_protein                 float64
cal_lev                    object
fat_lev                    object
sugar_lev                  object
sodium_lev                 object
protein_lev                object
relative_nsteps            object
relative_time              object
relative_n_ingredients     object
dtype: object

In [13]:
#use tokenized steps from PP_recipeis instead of the recipes - make it easier to compare
raw_rec = raw_rec.merge(PP_recipes[['id', 'steps_tokens']], how='left', on = 'id')

In [14]:
raw_rec

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,n_steps,steps,description,ingredients,...,g_protein,cal_lev,fat_lev,sugar_lev,sodium_lev,protein_lev,relative_nsteps,relative_time,relative_n_ingredients,steps_tokens
0,arriba baked winter squash mexican style,137739,55,47892,2005-09-16,"['60-minutes-or-less', 'time-to-make', 'course...",11,"['make a choice and proceed with recipe', 'dep...",autumn is my favorite time of year to cook! th...,"[wintersquash, mexicanseasoning, mixedspice, h...",...,2.0,low_cal,low_fat,low_sugar,low_sodium,low_protein,avg_steps,avg_time,low_n_ingredients,"[40480, 40482, 925, 246, 2650, 488, 10744, 556..."
1,a bit different breakfast pizza,31490,30,26278,2002-06-17,"['30-minutes-or-less', 'time-to-make', 'course...",9,"['preheat oven to 425 degrees f', 'press dough...",this recipe calls for the crust to be prebaked...,"[preparedpizzacrust, sausagepatty, eggs, milk,...",...,22.0,low_cal,med_fat,low_sugar,med_sodium,med_protein,avg_steps,avg_time,low_n_ingredients,"[40480, 40482, 729, 2525, 10906, 485, 44, 1035..."
2,all in the kitchen chili,112140,130,196586,2005-02-25,"['time-to-make', 'course', 'preparation', 'mai...",6,"['brown ground beef in large pot', 'add choppe...",this modified version of 'mom's' chili was a h...,"[groundbeef, yellowonions, dicedtomatoes, toma...",...,39.0,med_cal,med_fat,med_sugar,high_sodium,high_protein,few_steps,long_time,big_n_ingredients,
3,alouette potatoes,59389,45,68585,2003-04-14,"['60-minutes-or-less', 'time-to-make', 'course...",11,['place potatoes in a large pot of lightly sal...,"this is a super easy, great tasting, make ahea...","[spreadablecheesewithgarlicandherbs, newpotato...",...,14.0,med_cal,med_fat,low_sugar,low_sodium,med_protein,avg_steps,avg_time,big_n_ingredients,"[40480, 40482, 1082, 10837, 500, 246, 1719, 50..."
4,amish tomato ketchup for canning,44061,190,41706,2002-10-25,"['weeknight', 'time-to-make', 'course', 'main-...",5,['mix all ingredients& boil for 2 1 / 2 hours ...,my dh's amish mother raised him on this recipe...,"[tomatojuice, applecidervinegar, sugar, salt, ...",...,3.0,med_cal,low_fat,high_sugar,med_sodium,low_protein,few_steps,long_time,avg_n_ingredients,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
231632,zydeco soup,486161,60,227978,2012-08-29,"['ham', '60-minutes-or-less', 'time-to-make', ...",7,"['heat oil in a 4-quart dutch oven', 'add cele...",this is a delicious soup that i originally fou...,"[celery, onion, greensweetpepper, garliccloves...",...,44.0,med_cal,med_fat,med_sugar,high_sodium,high_protein,few_steps,long_time,big_n_ingredients,
231633,zydeco spice mix,493372,5,1500678,2013-01-09,"['15-minutes-or-less', 'time-to-make', 'course...",1,['mix all ingredients together thoroughly'],this spice mix will make your taste buds dance!,"[paprika, salt, garlicpowder, onionpowder, dri...",...,1.0,low_cal,low_fat,low_sugar,high_sodium,low_protein,few_steps,slow_time,big_n_ingredients,
231634,zydeco ya ya deviled eggs,308080,40,37779,2008-06-07,"['60-minutes-or-less', 'time-to-make', 'course...",7,"['in a bowl , combine the mashed yolks and may...","deviled eggs, cajun-style","[hardcookedeggs, mayonnaise, dijonmustard, sal...",...,6.0,low_cal,low_fat,low_sugar,low_sodium,low_protein,few_steps,avg_time,avg_n_ingredients,"[40480, 40482, 500, 246, 5024, 240, 23667, 481..."
231635,cookies by design cookies on a stick,298512,29,506822,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...",9,['place melted butter in a large mixing bowl a...,"i've heard of the 'cookies by design' company,...","[butter, eaglebrandcondensedmilk, lightbrownsu...",...,7.0,low_cal,low_fat,high_sugar,med_sodium,low_protein,avg_steps,avg_time,avg_n_ingredients,


In [15]:
# since we are using step tokens and do not want it to get confused with creator we
#can drop contributor ID if too many features
for i, val in enumerate(raw_rec['contributor_id']):
  raw_rec.at[i, 'contributor_id'] = 'creator{}'.format(val)


In [16]:
raw_rec.columns

Index(['name', 'id', 'minutes', 'contributor_id', 'submitted', 'tags',
       'n_steps', 'steps', 'description', 'ingredients', 'n_ingredients',
       'num_calories', 'g_fat', 'g_sugar', 'mg_sodium', 'g_protein', 'cal_lev',
       'fat_lev', 'sugar_lev', 'sodium_lev', 'protein_lev', 'relative_nsteps',
       'relative_time', 'relative_n_ingredients', 'steps_tokens'],
      dtype='object')

In [17]:
raw_rec[raw_rec['id'] == 298509]['name'].values

array(['cookies by design   sugar shortbread cookies'], dtype=object)

In [18]:
#all features we want to use in the reccomendation system
features = ['name', 'id','contributor_id', 'tags','description',
            'ingredients','steps_tokens', 'cal_lev','fat_lev',
            'sugar_lev', 'sodium_lev', 'protein_lev', 'relative_nsteps','relative_time',
            'relative_n_ingredients']

In [19]:
df = raw_rec[features]
for feature in features:
    df[feature] = df[feature].fillna('')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[feature] = df[feature].fillna('')


In [20]:
df.head()

Unnamed: 0,name,id,contributor_id,tags,description,ingredients,steps_tokens,cal_lev,fat_lev,sugar_lev,sodium_lev,protein_lev,relative_nsteps,relative_time,relative_n_ingredients
0,arriba baked winter squash mexican style,137739,creator47892,"['60-minutes-or-less', 'time-to-make', 'course...",autumn is my favorite time of year to cook! th...,"[wintersquash, mexicanseasoning, mixedspice, h...","[40480, 40482, 925, 246, 2650, 488, 10744, 556...",low_cal,low_fat,low_sugar,low_sodium,low_protein,avg_steps,avg_time,low_n_ingredients
1,a bit different breakfast pizza,31490,creator26278,"['30-minutes-or-less', 'time-to-make', 'course...",this recipe calls for the crust to be prebaked...,"[preparedpizzacrust, sausagepatty, eggs, milk,...","[40480, 40482, 729, 2525, 10906, 485, 44, 1035...",low_cal,med_fat,low_sugar,med_sodium,med_protein,avg_steps,avg_time,low_n_ingredients
2,all in the kitchen chili,112140,creator196586,"['time-to-make', 'course', 'preparation', 'mai...",this modified version of 'mom's' chili was a h...,"[groundbeef, yellowonions, dicedtomatoes, toma...",,med_cal,med_fat,med_sugar,high_sodium,high_protein,few_steps,long_time,big_n_ingredients
3,alouette potatoes,59389,creator68585,"['60-minutes-or-less', 'time-to-make', 'course...","this is a super easy, great tasting, make ahea...","[spreadablecheesewithgarlicandherbs, newpotato...","[40480, 40482, 1082, 10837, 500, 246, 1719, 50...",med_cal,med_fat,low_sugar,low_sodium,med_protein,avg_steps,avg_time,big_n_ingredients
4,amish tomato ketchup for canning,44061,creator41706,"['weeknight', 'time-to-make', 'course', 'main-...",my dh's amish mother raised him on this recipe...,"[tomatojuice, applecidervinegar, sugar, salt, ...",,med_cal,low_fat,high_sugar,med_sodium,low_protein,few_steps,long_time,avg_n_ingredients


#Content Based Reccomendation System

In [21]:
# concatenate all relevant features
def combined_features(row):
    return (row['contributor_id'] +
            " " + str(row['tags'][1:-1]) +
            " " + row['description'] +
            " " + str(row['ingredients'][1:-1])+
            " " +  row['cal_lev'] +
            " " + row['fat_lev'] +
            " " + row['sugar_lev'] +
            " " + row['sodium_lev'] +
            " " + row['protein_lev'] +
            " " + row['relative_nsteps'] +
            " " + row['relative_time'] +
            " " + row['relative_n_ingredients']+
            " " + str(row['steps_tokens'][1:-1]))

In [22]:
# add new column of combined features to df
df["combined_features"] = df.apply(combined_features, axis =1)
df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["combined_features"] = df.apply(combined_features, axis =1)


Unnamed: 0,name,id,contributor_id,tags,description,ingredients,steps_tokens,cal_lev,fat_lev,sugar_lev,sodium_lev,protein_lev,relative_nsteps,relative_time,relative_n_ingredients,combined_features
0,arriba baked winter squash mexican style,137739,creator47892,"['60-minutes-or-less', 'time-to-make', 'course...",autumn is my favorite time of year to cook! th...,"[wintersquash, mexicanseasoning, mixedspice, h...","[40480, 40482, 925, 246, 2650, 488, 10744, 556...",low_cal,low_fat,low_sugar,low_sodium,low_protein,avg_steps,avg_time,low_n_ingredients,"creator47892 '60-minutes-or-less', 'time-to-ma..."
1,a bit different breakfast pizza,31490,creator26278,"['30-minutes-or-less', 'time-to-make', 'course...",this recipe calls for the crust to be prebaked...,"[preparedpizzacrust, sausagepatty, eggs, milk,...","[40480, 40482, 729, 2525, 10906, 485, 44, 1035...",low_cal,med_fat,low_sugar,med_sodium,med_protein,avg_steps,avg_time,low_n_ingredients,"creator26278 '30-minutes-or-less', 'time-to-ma..."
2,all in the kitchen chili,112140,creator196586,"['time-to-make', 'course', 'preparation', 'mai...",this modified version of 'mom's' chili was a h...,"[groundbeef, yellowonions, dicedtomatoes, toma...",,med_cal,med_fat,med_sugar,high_sodium,high_protein,few_steps,long_time,big_n_ingredients,"creator196586 'time-to-make', 'course', 'prepa..."
3,alouette potatoes,59389,creator68585,"['60-minutes-or-less', 'time-to-make', 'course...","this is a super easy, great tasting, make ahea...","[spreadablecheesewithgarlicandherbs, newpotato...","[40480, 40482, 1082, 10837, 500, 246, 1719, 50...",med_cal,med_fat,low_sugar,low_sodium,med_protein,avg_steps,avg_time,big_n_ingredients,"creator68585 '60-minutes-or-less', 'time-to-ma..."
4,amish tomato ketchup for canning,44061,creator41706,"['weeknight', 'time-to-make', 'course', 'main-...",my dh's amish mother raised him on this recipe...,"[tomatojuice, applecidervinegar, sugar, salt, ...",,med_cal,low_fat,high_sugar,med_sodium,low_protein,few_steps,long_time,avg_n_ingredients,"creator41706 'weeknight', 'time-to-make', 'cou..."


In [23]:
df['combined_features'].head(10)

0    creator47892 '60-minutes-or-less', 'time-to-ma...
1    creator26278 '30-minutes-or-less', 'time-to-ma...
2    creator196586 'time-to-make', 'course', 'prepa...
3    creator68585 '60-minutes-or-less', 'time-to-ma...
4    creator41706 'weeknight', 'time-to-make', 'cou...
5    creator1533 '15-minutes-or-less', 'time-to-mak...
6    creator21730 '15-minutes-or-less', 'time-to-ma...
7    creator10404 'weeknight', 'time-to-make', 'cou...
8    creator102353 'weeknight', 'time-to-make', 'co...
9    creator15892 'weeknight', 'time-to-make', 'cou...
Name: combined_features, dtype: object

In [24]:
# must use a smaller df bc of computational limitation
df_small=df.sample(n=10000)

In [25]:
df_small = df_small.reset_index(drop = True)

In [26]:
df_small.head(30)

Unnamed: 0,name,id,contributor_id,tags,description,ingredients,steps_tokens,cal_lev,fat_lev,sugar_lev,sodium_lev,protein_lev,relative_nsteps,relative_time,relative_n_ingredients,combined_features
0,sardines melt,181469,creator322326,"['15-minutes-or-less', 'time-to-make', 'course...",my dad was away last night and my mum told me ...,"[bread, sardinesintomatosauce, cheese]",,high_cal,high_fat,low_sugar,high_sodium,high_protein,few_steps,slow_time,low_n_ingredients,"creator322326 '15-minutes-or-less', 'time-to-m..."
1,sauteed chicken with roasted lemons,309942,creator83093,"['60-minutes-or-less', 'time-to-make', 'course...",a chicken picatta-like dish jazzed up by the ...,"[extravirginoliveoil, lemons, koshersalt, fres...","[40480, 40482, 729, 2525, 10906, 485, 43, 50, ...",high_cal,high_fat,low_sugar,high_sodium,high_protein,lots_steps,avg_time,big_n_ingredients,"creator83093 '60-minutes-or-less', 'time-to-ma..."
2,danish cream cheese filling,54153,creator154044,"['15-minutes-or-less', 'time-to-make', 'course...",,"[creamcheese, sugar, eggyolk]",,low_cal,med_fat,med_sugar,low_sodium,low_protein,few_steps,slow_time,low_n_ingredients,"creator154044 '15-minutes-or-less', 'time-to-m..."
3,leeks with olives and feta,138277,creator197023,"['30-minutes-or-less', 'time-to-make', 'course...",a highly flavoursome low calorie vegetable dis...,"[romatomatoes, leeks, garliccloves, oliveoil, ...",,med_cal,med_fat,med_sugar,med_sodium,med_protein,lots_steps,slow_time,big_n_ingredients,"creator197023 '30-minutes-or-less', 'time-to-m..."
4,white bean pancetta salad,65318,creator24386,"['30-minutes-or-less', 'time-to-make', 'course...",a wonderful side dish accompaniment to any mea...,"[oliveoil, pancetta, onion, freshrosemary, red...","[40480, 40482, 2525, 10444, 6020, 500, 246, 48...",high_cal,med_fat,low_sugar,high_sodium,high_protein,avg_steps,avg_time,big_n_ingredients,"creator24386 '30-minutes-or-less', 'time-to-ma..."
5,mushroom meatloaf,240005,creator535440,"['time-to-make', 'course', 'main-ingredient', ...",trust me this isn't an ordinary meat loaf. my ...,"[groundbeef, milk, eggs, bread, parsleyflakes,...",,high_cal,high_fat,low_sugar,med_sodium,high_protein,avg_steps,long_time,big_n_ingredients,"creator535440 'time-to-make', 'course', 'main-..."
6,chipotle chicken tostadas,471858,creator1742729,"['weeknight', '30-minutes-or-less', 'time-to-m...","recipe by julie gutierrez, as published in feb...","[bonelessskinlesschickenthighs, vegetableoil, ...","[40480, 40482, 1082, 5867, 500, 246, 2433, 855...",high_cal,high_fat,med_sugar,low_sodium,high_protein,avg_steps,avg_time,big_n_ingredients,"creator1742729 'weeknight', '30-minutes-or-les..."
7,tiramisu martini,263628,creator131674,"['15-minutes-or-less', 'time-to-make', 'course...",love the taste of tiramisu? this is a drink t...,"[amaretto, cremedenoyaux, chocolatevodka, vodka]","[40480, 40482, 8240, 589, 16126, 666, 13019, 3...",low_cal,low_fat,low_sugar,low_sodium,low_protein,few_steps,slow_time,low_n_ingredients,"creator131674 '15-minutes-or-less', 'time-to-m..."
8,sun dried tomato pesto,19462,creator27643,"['30-minutes-or-less', 'time-to-make', 'course...",choose your pasta and toss with this teriffic ...,"[sundriedtomato, water, oliveoil, pinenuts, th...","[40480, 40482, 15207, 19811, 500, 2107, 1353, ...",med_cal,high_fat,low_sugar,med_sodium,low_protein,few_steps,slow_time,avg_n_ingredients,"creator27643 '30-minutes-or-less', 'time-to-ma..."
9,christmas eve punch,30215,creator37779,"['15-minutes-or-less', 'time-to-make', 'course...",southern living; a tradition to serve this dur...,"[cranberryjuice, unsweetenedpineapplejuice, or...","[40480, 40482, 23667, 929, 287, 16126, 40478, ...",high_cal,low_fat,high_sugar,low_sodium,low_protein,few_steps,slow_time,low_n_ingredients,"creator37779 '15-minutes-or-less', 'time-to-ma..."


In [27]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [28]:
#use countvectorizer to make vector of each row  of df based on combined features
count_vector = CountVectorizer()
count_matrix = count_vector.fit_transform(df_small['combined_features'])

In [29]:
count_vector.get_feature_names_out()

array(['00', '000', '01', ..., 'zwt7', 'zwt8', 'zwt9'], dtype=object)

In [30]:
count_vector.get_feature_names_out().shape

(29609,)

In [54]:
# determines how similar different vectors are to one another
cosine_sim = cosine_similarity(count_matrix)
cosine_sim

array([[1.        , 0.07227473, 0.37463432, ..., 0.11588695, 0.35083436,
        0.1633808 ],
       [0.07227473, 1.        , 0.06092234, ..., 0.66868398, 0.11194301,
        0.63788146],
       [0.37463432, 0.06092234, 1.        , ..., 0.14147376, 0.34501581,
        0.14375839],
       ...,
       [0.11588695, 0.66868398, 0.14147376, ..., 1.        , 0.21077341,
        0.58594827],
       [0.35083436, 0.11194301, 0.34501581, ..., 0.21077341, 1.        ,
        0.14614441],
       [0.1633808 , 0.63788146, 0.14375839, ..., 0.58594827, 0.14614441,
        1.        ]])

In [94]:
# must change every time to reflect new randomly generated df_small
recipe_used = "sauteed chicken with roasted lemons"

In [95]:
df_small.head(20)

Unnamed: 0,name,id,contributor_id,tags,description,ingredients,steps_tokens,cal_lev,fat_lev,sugar_lev,sodium_lev,protein_lev,relative_nsteps,relative_time,relative_n_ingredients,combined_features
0,sardines melt,181469,creator322326,"['15-minutes-or-less', 'time-to-make', 'course...",my dad was away last night and my mum told me ...,"[bread, sardinesintomatosauce, cheese]",,high_cal,high_fat,low_sugar,high_sodium,high_protein,few_steps,slow_time,low_n_ingredients,"creator322326 '15-minutes-or-less', 'time-to-m..."
1,sauteed chicken with roasted lemons,309942,creator83093,"['60-minutes-or-less', 'time-to-make', 'course...",a chicken picatta-like dish jazzed up by the ...,"[extravirginoliveoil, lemons, koshersalt, fres...","[40480, 40482, 729, 2525, 10906, 485, 43, 50, ...",high_cal,high_fat,low_sugar,high_sodium,high_protein,lots_steps,avg_time,big_n_ingredients,"creator83093 '60-minutes-or-less', 'time-to-ma..."
2,danish cream cheese filling,54153,creator154044,"['15-minutes-or-less', 'time-to-make', 'course...",,"[creamcheese, sugar, eggyolk]",,low_cal,med_fat,med_sugar,low_sodium,low_protein,few_steps,slow_time,low_n_ingredients,"creator154044 '15-minutes-or-less', 'time-to-m..."
3,leeks with olives and feta,138277,creator197023,"['30-minutes-or-less', 'time-to-make', 'course...",a highly flavoursome low calorie vegetable dis...,"[romatomatoes, leeks, garliccloves, oliveoil, ...",,med_cal,med_fat,med_sugar,med_sodium,med_protein,lots_steps,slow_time,big_n_ingredients,"creator197023 '30-minutes-or-less', 'time-to-m..."
4,white bean pancetta salad,65318,creator24386,"['30-minutes-or-less', 'time-to-make', 'course...",a wonderful side dish accompaniment to any mea...,"[oliveoil, pancetta, onion, freshrosemary, red...","[40480, 40482, 2525, 10444, 6020, 500, 246, 48...",high_cal,med_fat,low_sugar,high_sodium,high_protein,avg_steps,avg_time,big_n_ingredients,"creator24386 '30-minutes-or-less', 'time-to-ma..."
5,mushroom meatloaf,240005,creator535440,"['time-to-make', 'course', 'main-ingredient', ...",trust me this isn't an ordinary meat loaf. my ...,"[groundbeef, milk, eggs, bread, parsleyflakes,...",,high_cal,high_fat,low_sugar,med_sodium,high_protein,avg_steps,long_time,big_n_ingredients,"creator535440 'time-to-make', 'course', 'main-..."
6,chipotle chicken tostadas,471858,creator1742729,"['weeknight', '30-minutes-or-less', 'time-to-m...","recipe by julie gutierrez, as published in feb...","[bonelessskinlesschickenthighs, vegetableoil, ...","[40480, 40482, 1082, 5867, 500, 246, 2433, 855...",high_cal,high_fat,med_sugar,low_sodium,high_protein,avg_steps,avg_time,big_n_ingredients,"creator1742729 'weeknight', '30-minutes-or-les..."
7,tiramisu martini,263628,creator131674,"['15-minutes-or-less', 'time-to-make', 'course...",love the taste of tiramisu? this is a drink t...,"[amaretto, cremedenoyaux, chocolatevodka, vodka]","[40480, 40482, 8240, 589, 16126, 666, 13019, 3...",low_cal,low_fat,low_sugar,low_sodium,low_protein,few_steps,slow_time,low_n_ingredients,"creator131674 '15-minutes-or-less', 'time-to-m..."
8,sun dried tomato pesto,19462,creator27643,"['30-minutes-or-less', 'time-to-make', 'course...",choose your pasta and toss with this teriffic ...,"[sundriedtomato, water, oliveoil, pinenuts, th...","[40480, 40482, 15207, 19811, 500, 2107, 1353, ...",med_cal,high_fat,low_sugar,med_sodium,low_protein,few_steps,slow_time,avg_n_ingredients,"creator27643 '30-minutes-or-less', 'time-to-ma..."
9,christmas eve punch,30215,creator37779,"['15-minutes-or-less', 'time-to-make', 'course...",southern living; a tradition to serve this dur...,"[cranberryjuice, unsweetenedpineapplejuice, or...","[40480, 40482, 23667, 929, 287, 16126, 40478, ...",high_cal,low_fat,high_sugar,low_sodium,low_protein,few_steps,slow_time,low_n_ingredients,"creator37779 '15-minutes-or-less', 'time-to-ma..."


In [96]:
def get_index_from_name(name):
    return df_small[df_small.name == name].index.values[0]

# getting index for the given recipe name
recipe_index = get_index_from_name(recipe_used)

In [97]:
recipe_index

1

In [98]:
# generating a list of similar recipes to input recipe
similar_recipes = list(enumerate(cosine_sim[recipe_index]))

In [99]:
# displays the first 10 most similar recipes
similar_recipes[0:10]

[(0, 0.07227473249505302),
 (1, 1.0000000000000016),
 (2, 0.06092234009165616),
 (3, 0.08382697714836389),
 (4, 0.6624923285833131),
 (5, 0.08743195506609813),
 (6, 0.7689098008974675),
 (7, 0.391074015766424),
 (8, 0.53611404605378),
 (9, 0.30150772112239604)]

In [100]:
#sorted these recipes from most similar to least similar
sorted_similar_recipes = sorted(similar_recipes,
                               key=lambda x:x[1],
                               reverse=True)

In [101]:
sorted_similar_recipes[0:10]

[(1, 1.0000000000000016),
 (8732, 0.8833885093886866),
 (1201, 0.8761154604569866),
 (6869, 0.8737498854492612),
 (6197, 0.8594991779308698),
 (6083, 0.845970882004788),
 (2030, 0.843965269398392),
 (110, 0.8437620005623595),
 (6463, 0.843631752136883),
 (5058, 0.8435033466333468)]

In [102]:
# function to get the name of the recipe when we display the output
def get_name_from_index(index):
    return df[df.index == index]["name"].values[0]

In [103]:
# function to print a list of names of similar recipes going from most similar to least similar
print("Your next recipe based on what you chose ("+ recipe_used+ ") may be")
print("**********************************************************")
i=0
for recipe in sorted_similar_recipes[1:]:

    print(get_name_from_index(recipe[0]))
    i=i+1
    if i>10:
        break
print("************************************************************")

Your next recipe based on what you chose (sauteed chicken with roasted lemons) may be
**********************************************************
artichoke and broccoli frittata   crustless quiche
3 bean casserole  bourbon beans
apple juice roast
apple and pomegranate tart tartin
apple  custard  dessert
a pie to try
1 asian noodle salad
apple cider
ampiainen
5 things  hot mexican green chile dip
lite  tortillas
************************************************************


#Colaborative Filter

In [105]:
raw_int.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,38094,40893,2003-02-17,4,Great with a salad. Cooked on top of stove for...
1,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
2,8937,44394,2002-12-01,4,This worked very well and is EASY. I used not...
3,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
4,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."


In [106]:
df = raw_int[['user_id','recipe_id','rating']]
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1132367 entries, 0 to 1132366
Data columns (total 3 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   user_id    1132367 non-null  int64
 1   recipe_id  1132367 non-null  int64
 2   rating     1132367 non-null  int64
dtypes: int64(3)
memory usage: 25.9 MB


In [107]:
df.head()

Unnamed: 0,user_id,recipe_id,rating
0,38094,40893,4
1,1293707,40893,5
2,8937,44394,4
3,126440,85009,5
4,57222,85009,5


In [108]:
df_small = df.sample(n=20000)

In [109]:
#creating a user-item rating matrix ***
user_ratings = df_small.pivot(index='user_id',
                              columns='recipe_id',
                              values='rating').fillna(0)
user_ratings

recipe_id,59,63,120,136,170,172,203,210,232,246,...,527834,529720,531255,531537,534264,535831,536384,536465,536568,536575
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1533,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1535,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1634,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2178,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2310,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2002364298,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2002368926,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2002369315,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2002369480,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [110]:
#get individual mean rating
indiv_rating_mean = user_ratings.mean(axis=1)

# get normalized by subtracting individual mean rating
# indiv_rating_norm = user_ratings.sub(indiv_rating_mean, axis=0)

# convert dataframe to numpy array
# indiv_rating_array = indiv_rating_norm.values

indiv_rating_array= user_ratings.values

In [111]:
from scipy.sparse.linalg import svds

In [112]:
# perform singular value decomposition (SVD)
# U represents the user-feature matrix. Σ represents a diagonal matrix with singular values. V represents the item-feature matrix
# k = determines model complexity and degree of dimensionality reduction
U, sigma, Vt = svds(indiv_rating_array, k=50)

In [113]:
# convert sigma to a diagonal matrix
sigma_diag_matrix = np.diag(sigma)

In [114]:
# reconstruct the predicted ratings
predicted_ratings = np.dot(np.dot(U, sigma_diag_matrix), Vt) + indiv_rating_mean.values.reshape(-1, 1)
predicted_ratings

array([[ 4.42561801e-04,  4.42561801e-04,  4.42561801e-04, ...,
         4.42561801e-04,  4.42561801e-04,  4.42561801e-04],
       [ 5.18429538e-03,  5.18429538e-03,  5.18429538e-03, ...,
         5.18429538e-03,  5.18429538e-03,  5.18429538e-03],
       [ 3.16115572e-04,  3.16115572e-04,  3.16115572e-04, ...,
         3.16115572e-04,  3.16115572e-04,  3.16115572e-04],
       ...,
       [ 3.16115572e-04,  3.16115572e-04,  3.16115572e-04, ...,
         3.16115572e-04,  3.16115572e-04,  3.16115572e-04],
       [ 5.34616811e-33, -7.13651548e-34,  1.23075985e-33, ...,
        -5.64762872e-34,  1.75184340e-33,  1.76088941e-33],
       [ 3.16115572e-04,  3.16115572e-04,  3.16115572e-04, ...,
         3.16115572e-04,  3.16115572e-04,  3.16115572e-04]])

In [115]:
# make df out of calculated pred ratings
predicted_ratings_df = pd.DataFrame(predicted_ratings,
                                    columns=user_ratings.columns,
                                    index=user_ratings.index)
predicted_ratings_df

recipe_id,59,63,120,136,170,172,203,210,232,246,...,527834,529720,531255,531537,534264,535831,536384,536465,536568,536575
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1533,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,...,4.425618e-04,4.425618e-04,0.000443,4.425618e-04,0.000443,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04,4.425618e-04
1535,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.102859e-03,...,5.184295e-03,5.184295e-03,0.005184,5.184295e-03,0.005184,5.184316e-03,5.184295e-03,5.184295e-03,5.184295e-03,5.184295e-03
1634,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,...,3.161156e-04,3.161156e-04,0.000316,3.161156e-04,0.000316,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04
2178,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,...,2.528925e-04,2.528925e-04,0.000253,2.528925e-04,0.000253,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04,2.528925e-04
2310,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.338619e-03,...,2.339255e-03,2.339255e-03,0.002339,2.339255e-03,0.002339,2.339252e-03,2.339255e-03,2.339255e-03,2.339255e-03,2.339255e-03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2002364298,1.587526e-34,-2.538887e-35,4.713274e-35,2.395872e-35,-5.726014e-35,-1.012388e-35,9.417584e-36,-3.895514e-35,6.351309e-36,-2.121952e-19,...,4.785354e-19,-5.439267e-35,0.000000,4.269288e-35,0.000000,-3.961039e-23,4.785354e-19,-1.722273e-35,5.313073e-35,5.356415e-35
2002368926,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,...,3.161156e-04,3.161156e-04,0.000316,3.161156e-04,0.000316,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04
2002369315,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,...,3.161156e-04,3.161156e-04,0.000316,3.161156e-04,0.000316,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04,3.161156e-04
2002369480,5.346168e-33,-7.136515e-34,1.230760e-33,7.830194e-34,-2.077742e-33,-3.327674e-34,3.816570e-34,-1.375221e-33,2.185266e-34,-6.963702e-18,...,1.780187e-17,-1.866933e-33,0.000000,1.665881e-33,0.000000,-1.220140e-21,1.780187e-17,-5.647629e-34,1.751843e-33,1.760889e-33


In [116]:
user_id = 2310 #can choose whatever
#get the predicted ratings for a given user
user_predicted_ratings = predicted_ratings_df.loc[user_id]
user_predicted_ratings

recipe_id
59        0.002339
63        0.002339
120       0.002339
136       0.002339
170       0.002339
            ...   
535831    0.002339
536384    0.002339
536465    0.002339
536568    0.002339
536575    0.002339
Name: 2310, Length: 15817, dtype: float64

In [117]:
# get unrated recipes from the user_ratings

unrated_recipes = user_ratings.loc[user_id][user_ratings.loc[user_id] == 0].index
unrated_recipes

Int64Index([    59,     63,    120,    136,    170,    172,    203,    210,
               232,    246,
            ...
            527834, 529720, 531255, 531537, 534264, 535831, 536384, 536465,
            536568, 536575],
           dtype='int64', name='recipe_id', length=15809)

In [118]:
# organize the user predicted ratings high to low

predict_top = user_predicted_ratings[unrated_recipes].sort_values(ascending=False).index
print(predict_top)

Int64Index([124908, 453523, 250284, 423995, 109716, 359058,  71192, 276788,
            277218, 102531,
            ...
            423997,  95007, 294908, 166903, 468456, 335597, 144224, 241322,
            307974, 171623],
           dtype='int64', name='recipe_id', length=15809)


In [119]:
predicted_ratings_df.iloc[3]

recipe_id
59        0.000253
63        0.000253
120       0.000253
136       0.000253
170       0.000253
            ...   
535831    0.000253
536384    0.000253
536465    0.000253
536568    0.000253
536575    0.000253
Name: 2178, Length: 15817, dtype: float64

In [120]:
unrated_recipes=user_ratings.iloc[3][user_ratings.loc[user_id] == 0].index

In [121]:
top_rated_recipes=user_predicted_ratings[unrated_recipes].sort_values(ascending=False).index

In [122]:
recommended_recipes = top_rated_recipes[:5]

In [123]:
for id in recommended_recipes:
      recipe_name = raw_rec.loc[(raw_rec['id'] == id)]
      print(recipe_name)

                                          name      id  minutes  \
106322  homemade baked chips  tortilla or pita  124908       11   

       contributor_id   submitted  \
106322  creator203823  2005-06-06   

                                                     tags  n_steps  \
106322  ['15-minutes-or-less', 'time-to-make', 'course...        5   

                                                    steps  \
106322  ['cut tortillas into 8 wedges', 'brush or spra...   

                                              description  \
106322  great with salsa, guacamole, hummus, whatever ...   

                                   ingredients  ...  g_protein  cal_lev  \
106322  [flourtortillas, vegetableoil, spices]  ...        0.0  low_cal   

        fat_lev  sugar_lev  sodium_lev  protein_lev relative_nsteps  \
106322  low_fat  low_sugar  low_sodium  low_protein       few_steps   

       relative_time relative_n_ingredients steps_tokens  
106322     slow_time      low_n_ingredients       

In [127]:
##PUTTING IT ALL IN A SUBROUTINE
def get_recipe_recommendations_svd(user_id, num_recommendations=5):

    #get the user's predicted ratings
    user_predicted_ratings = predicted_ratings_df.iloc[user_id]

    #find recipes index that the user has not already rated(give recipe id)
    unrated_recipes = user_ratings.iloc[user_id][user_ratings.loc[user_id] == 0].index


    #get the predicted ratings for those unrated recipe and sort them in descending order
    #specifically recipe ids
    top_rated_recipes = user_predicted_ratings[unrated_recipes].sort_values(ascending=False).index

    #choose the top k recipe id for recommendation indices)
    recommended_recipes = top_rated_recipes[:num_recommendations]

    print(f"Recommended recipe for user {user_id}:")

    for id in recommended_recipes:
      recipe_name = raw_rec.loc[raw_rec['id'] == id]['name'].values

      print(f"Recipe Name: {recipe_name}")

    return print(f"\n Enjoy eating your recommended recipes!!!")



In [128]:
#testing 5 recipe recommendations on user 1533
get_recipe_recommendations_svd(1533, num_recommendations=5)

Recommended recipe for user 1533:
Recipe Name: ['taco cheesecake']
Recipe Name: ['amish apple crisp']
Recipe Name: ['onion parmesan roasted red potatoes']
Recipe Name: ['fruit salad  the healthy summer dessert']
Recipe Name: ['no bake rice krispies peanut butter granola bars  lower fat']

 Enjoy eating your recommended recipes!!!


In [129]:
#testing 10 recommendations on user 2310
get_recipe_recommendations_svd(2310, num_recommendations=10)

Recommended recipe for user 2310:
Recipe Name: ['to die for crock pot roast']
Recipe Name: ['healthy cucumber tomato salad']
Recipe Name: ['instant fruit ice cream']
Recipe Name: ['broccoli salad with gouda']
Recipe Name: ['best ever banana cake with cream cheese frosting']
Recipe Name: ['charred heirloom tomatoes with fresh herbs']
Recipe Name: ['taste of summer salad']
Recipe Name: ['vanilla moomoo mocha']
Recipe Name: ['seckel pear open faced sandwich w bleu cheese and walnuts for 1']
Recipe Name: ['stuffed picnic loaf']

 Enjoy eating your recommended recipes!!!
