# Index

## [Preparation](#Preparation)
## [Section 1: Exploring and Cleaning](#Section1)
## [Section 2: Search](#Section2)
## [Section 3: Vectorize / Tokenize](#Section3)
## [Section 4: Recommender](#Section4)
## [Section 5: Conclusion](#Section5)


------------------

## Intro

In this Jupyter notebook, I build a content-based recommendation system for recipes: given the name of a recipe as input, it returns similar recipes, where similarity is based on content such as ingredients.


#### Why is it important?

Content-based recommendations rely on the characteristics of the items themselves, which gives them some advantages over user-based recommendations. Every user has different tastes and preferences, so a new user may not like what everyone else likes. If such a user is only shown what is popular, the recommendation may or may not work. By also offering recommendations based on content, we have a better chance of suggesting something that a new user with particular tastes actually wants.

Content-based recommendations can also introduce users to new and diverse recipes. By analyzing the attributes of recipes, the system can suggest recipes that are similar in ingredients or cooking methods but that the user (or users in general) may not have discovered. This could also address the problem that many recipes are never explored: the exploration later in this notebook shows that around 250,000 recipes were never reviewed, so adding content-based recommendations alongside user-based ones can help these unexplored recipes get viewed.

#### Credits:

The data comes from Kaggle and is a dataset of Food.com recipes and reviews. Credit goes to the person who created the dataset: https://www.kaggle.com/datasets/irkaal/foodcom-recipes-and-reviews

# Section1
### Exploring and Cleaning Data

First, import the required libraries to explore the data:

In [1]:
import pandas as pd
import numpy as np

Loading the dataset:

In [2]:
recipes = pd.read_csv('foodcom_with_reviews/recipes.csv', index_col = 'RecipeId')

Checking the first few rows:

In [3]:
recipes.head(5)

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
38,Low-Fat Berry Blue Frozen Dessert,1533,Dancer,PT24H,PT45M,PT24H45M,1999-08-09T21:46:00Z,Make and share this Low-Fat Berry Blue Frozen ...,"c(""https://img.sndimg.com/food/image/upload/w_...",Frozen Desserts,...,1.3,8.0,29.8,37.1,3.6,30.2,3.2,4.0,,"c(""Toss 2 cups berries with sugar."", ""Let stan..."
39,Biryani,1567,elly9812,PT25M,PT4H,PT4H25M,1999-08-29T13:12:00Z,Make and share this Biryani recipe from Food.com.,"c(""https://img.sndimg.com/food/image/upload/w_...",Chicken Breast,...,16.6,372.8,368.4,84.4,9.0,20.4,63.4,6.0,,"c(""Soak saffron in warm milk for 5 minutes and..."
40,Best Lemonade,1566,Stephen Little,PT5M,PT30M,PT35M,1999-09-05T19:52:00Z,This is from one of my first Good House Keepi...,"c(""https://img.sndimg.com/food/image/upload/w_...",Beverages,...,0.0,0.0,1.8,81.5,0.4,77.2,0.3,4.0,,"c(""Into a 1 quart Jar with tight fitting lid, ..."
41,Carina's Tofu-Vegetable Kebabs,1586,Cyclopz,PT20M,PT24H,PT24H20M,1999-09-03T14:54:00Z,This dish is best prepared a day in advance to...,"c(""https://img.sndimg.com/food/image/upload/w_...",Soy/Tofu,...,3.8,0.0,1558.6,64.2,17.3,32.1,29.3,2.0,4 kebabs,"c(""Drain the tofu, carefully squeezing out exc..."
42,Cabbage Soup,1538,Duckie067,PT30M,PT20M,PT50M,1999-09-19T06:19:00Z,Make and share this Cabbage Soup recipe from F...,"""https://img.sndimg.com/food/image/upload/w_55...",Vegetable,...,0.1,0.0,959.3,25.1,4.8,17.7,4.3,4.0,,"c(""Mix everything together and bring to a boil..."


Checking for null values in the dataset:

In [4]:
recipes.isnull().sum()

Name                               0
AuthorId                           0
AuthorName                         0
CookTime                       82545
PrepTime                           0
TotalTime                          0
DatePublished                      0
Description                        5
Images                             1
RecipeCategory                   751
Keywords                       17237
RecipeIngredientQuantities         3
RecipeIngredientParts              0
AggregatedRating              253223
ReviewCount                   247489
Calories                           0
FatContent                         0
SaturatedFatContent                0
CholesterolContent                 0
SodiumContent                      0
CarbohydrateContent                0
FiberContent                       0
SugarContent                       0
ProteinContent                     0
RecipeServings                182911
RecipeYield                   348071
RecipeInstructions                 0
dtype: int64

Checking the shape and columns of the DataFrame

In [5]:
recipes.shape

(522517, 27)

In [6]:
recipes.columns

Index(['Name', 'AuthorId', 'AuthorName', 'CookTime', 'PrepTime', 'TotalTime',
       'DatePublished', 'Description', 'Images', 'RecipeCategory', 'Keywords',
       'RecipeIngredientQuantities', 'RecipeIngredientParts',
       'AggregatedRating', 'ReviewCount', 'Calories', 'FatContent',
       'SaturatedFatContent', 'CholesterolContent', 'SodiumContent',
       'CarbohydrateContent', 'FiberContent', 'SugarContent', 'ProteinContent',
       'RecipeServings', 'RecipeYield', 'RecipeInstructions'],
      dtype='object')

Inspecting the rows where `AggregatedRating` is null:

In [7]:
recipes.loc[recipes['AggregatedRating'].isnull()]

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
61,Brownie Heart Cake,1555,Cindy Hartlin,PT42M,PT35M,PT1H17M,1999-09-07T14:15:00Z,Make and share this Brownie Heart Cake recipe ...,character(0),Dessert,...,144.2,1097.5,2157.8,509.9,29.0,392.5,71.7,,1 Large cake,"c(""CAKE: Grease 5 cup heart shaped pan; dust ..."
68,Chicago Style Pizza,174711,Queen Dragon Mom,PT2H38M,PT35M,PT3H13M,1999-08-22T04:56:00Z,Make and share this Chicago Style Pizza recipe...,character(0),Weeknight,...,10.1,55.0,990.0,41.1,1.9,2.5,20.9,8.0,8 slices,"c(""For crust, dissolve yeast in water."", ""Add ..."
69,Chicha Peruana,1595,Enrique1,PT1H50M,PT2H45M,PT4H35M,1999-08-14T06:20:00Z,Chicha (corn beer). Chicha is made in South an...,character(0),Beverages,...,0.0,0.0,3.6,2.7,1.5,0.0,2.7,,1 batch,"c(""Procedure: Mash for 90 minutes at 160°F."", ..."
74,Brownie Cheesecake Torte,67395,Dannygirl,PT55M,PT35M,PT1H30M,1999-08-22T04:49:00Z,Make and share this Brownie Cheesecake Torte r...,"""https://img.sndimg.com/food/image/upload/w_55...",Cheesecake,...,3.0,15.9,194.7,14.4,0.5,10.9,5.7,12.0,12,"c(""Preheat oven to 425 degrees F."", ""Combine f..."
75,California Chilled Salsa,1551,Sean Coate,,PT25M,PT25M,1999-09-06T05:11:00Z,Make and share this California Chilled Salsa r...,character(0),Sauces,...,0.0,0.0,354.9,4.9,0.9,3.3,0.6,10.0,,"c(""* Also delicious made with red sweet pepper..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
541379,Meg's Fresh Ginger Gingerbread,2002090414,rdsxc,PT35M,PT1H,PT1H35M,2020-12-22T15:27:00Z,Make and share this Meg's Fresh Ginger Gingerb...,character(0),Dessert,...,7.6,54.4,278.2,48.5,0.8,22.8,3.9,8.0,1 8x8 cake pan,"c(""Preheat oven to 350&deg;F Grease an 8x8 cak..."
541380,Roast Prime Rib au Poivre with Mixed Peppercorns,211566,Denver cooks,PT3H,PT30M,PT3H30M,2020-12-22T15:32:00Z,"White, black, green, and pink peppercorns add ...","""https://img.sndimg.com/food/image/upload/w_55...",Very Low Carbs,...,71.4,433.8,766.3,3.2,0.7,0.1,117.0,8.0,1 Roast,"c(""Position rack in center of oven and preheat..."
541381,Kirshwasser Ice Cream,2001131545,Jonathan F.,PT3H,PT1H,PT4H,2020-12-22T15:33:00Z,Make and share this Kirshwasser Ice Cream reci...,character(0),Ice Cream,...,72.6,470.9,192.5,33.9,0.0,17.3,12.8,6.0,,"c(""heat half and half and heavy cream to a sim..."
541382,Quick & Easy Asian Cucumber Salmon Rolls,2001004241,CLUBFOODY,,PT15M,PT15M,2020-12-22T22:11:00Z,"Extremely quick and easy to make, these are gr...","""https://img.sndimg.com/food/image/upload/w_55...",Canadian,...,0.1,2.9,100.5,0.3,0.0,0.1,2.4,,20 rolls,"c(""In a small bowl, combine mayo and wasabi pa..."


In [8]:
recipes.dropna(subset=['AggregatedRating'], inplace=True)

In [9]:
recipes.shape

(269294, 27)
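
As a sanity check, the row count after dropping should equal the original row count minus the number of null ratings. A minimal sketch of that arithmetic, using only the figures printed in the outputs above (the variable names are my own):

```python
# Figures copied from the notebook outputs above.
rows_before = 522_517    # recipes.shape[0] before dropna
null_ratings = 253_223   # null count reported for AggregatedRating
rows_after = 269_294     # recipes.shape[0] after dropna

# Every dropped row should be one with a missing rating.
expected_after = rows_before - null_ratings
share_dropped = null_ratings / rows_before

print(expected_after == rows_after)
print(f"{share_dropped:.1%} of the recipes were dropped")
```

The numbers agree, so `dropna` removed exactly the unrated recipes and nothing else.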

The remaining columns that contain null values will not be used for this recommender system, so I will leave them as they are.

In [10]:
recipes.isnull().sum()

Name                               0
AuthorId                           0
AuthorName                         0
CookTime                       42784
PrepTime                           0
TotalTime                          0
DatePublished                      0
Description                        3
Images                             1
RecipeCategory                   245
Keywords                        6335
RecipeIngredientQuantities         2
RecipeIngredientParts              0
AggregatedRating                   0
ReviewCount                        0
Calories                           0
FatContent                         0
SaturatedFatContent                0
CholesterolContent                 0
SodiumContent                      0
CarbohydrateContent                0
FiberContent                       0
SugarContent                       0
ProteinContent                     0
RecipeServings                 96924
RecipeYield                   186471
RecipeInstructions                 0
dtype: int64

### Choosing columns for vectorizer
Below I explore the columns in more depth to see what they contain and decide which one is best suited for a vectorizer and the content-based recommendation system. In the example below I select just a few of them to look at potential candidates.

In [11]:
# Sort by rating first, then by number of reviews, both descending
top_rated = recipes.sort_values(by=['AggregatedRating', 'ReviewCount'], ascending=False)
# Candidate columns to inspect
core_columns = ['Name', 'Keywords', 'RecipeCategory', 'AggregatedRating', 'ReviewCount', 'Description', 'RecipeInstructions']
top_rated[core_columns].head(10)

RecipeId,Name,Keywords,RecipeCategory,AggregatedRating,ReviewCount,Description,RecipeInstructions
45809,Bourbon Chicken,"c(""Chicken"", ""Poultry"", ""Meat"", ""Chinese"", ""As...",Chicken Breast,5.0,3063.0,I searched and finally found this recipe on th...,"c(""Editor's Note: Named Bourbon Chicken becau..."
2886,Best Banana Bread,"c(""Breads"", ""Fruit"", ""Oven"", ""< 4 Hours"")",Quick Breads,5.0,2273.0,Make and share this Best Banana Bread recipe f...,"c(""Remove odd pots and pans from oven."", ""Preh..."
27208,To Die for Crock Pot Roast,"c(""Roast Beef"", ""Meat"", ""Kid Friendly"", ""Potlu...",One Dish Meal,5.0,1692.0,"Amazing flavor, and so simple! No salt needed ...","c(""Place beef roast in crock pot."", ""Mix the d..."
39087,Creamy Cajun Chicken Pasta,"c(""Chicken"", ""Poultry"", ""Meat"", ""Cajun"", ""Kid ...",Chicken Breast,5.0,1586.0,Make and share this Creamy Cajun Chicken Pasta...,"c(""Place chicken and Cajun seasoning in a bowl..."
35813,Oatmeal Raisin Cookies,"c(""Dessert"", ""Lunch/Snacks"", ""Cookie & Brownie...",Drop Cookies,5.0,1410.0,"You've made oatmeal-raisin cookies before, so ...","c(""Preheat oven to 350°."", ""Whisk dry ingredie..."
67256,Best Ever Banana Cake With Cream Cheese Frosting,"c(""Tropical Fruits"", ""Fruit"", ""Weeknight"", ""Fo...",Dessert,5.0,1409.0,This is one of (if not) the BEST banana cake I...,"c(""Preheat oven to 275°F (135C)."", ""Grease and..."
54257,"Yes, Virginia There is a Great Meatloaf","c(""Meat"", ""Roast"", ""Oven"", ""< 4 Hours"")",Meatloaf,5.0,1384.0,Absolutely delicious meatloaf and sauce! Those...,"c(""Meatloaf: Combine meat loaf ingredients and..."
22782,Jo Mama's World Famous Spaghetti,"c(""Pork"", ""Meat"", ""European"", ""Kid Friendly"", ...",Spaghetti,5.0,1326.0,My kids will give up a steak dinner for this s...,"c(""In large, heavy stockpot, brown Italian sau..."
32204,&quot;Whatever Floats Your Boat&quot; Brownies!,"c(""Dessert"", ""Lunch/Snacks"", ""Cookie & Brownie...",Bar Cookie,5.0,1284.0,"These are absolutely the chewiest, moistest, f...","c(""Preheat oven to 350°F."", ""Grease an 8 inch ..."
25690,Pancakes,"c(""< 15 Mins"", ""Easy"", ""Inexpensive"")",Breakfast,5.0,1150.0,"This is really a great recipe! It is fast, sim...","c(""Beat egg until fluffy."", ""Add milk and melt..."


The table above shows the top-rated recipes (highest `AggregatedRating`) with the most reviews (`ReviewCount`). Next, the same columns for recipes with fewer than 10 reviews:

In [12]:
top_rated.loc[top_rated['ReviewCount'] < 10, core_columns]

RecipeId,Name,Keywords,RecipeCategory,AggregatedRating,ReviewCount,Description,RecipeInstructions
186,Coca-Cola Cake,"c(""Weeknight"", ""Oven"", ""< 4 Hours"")",Dessert,5.0,9.0,Make and share this Coca-Cola Cake recipe from...,"c(""Combine flour, sugar, salt, cocoa and bakin..."
210,Christmas Snow Punch,"c(""Beverages"", ""Winter"", ""Christmas"", ""< 15 Mi...",Punch Beverage,5.0,9.0,Here is a punch recipe that we used at my in-l...,"c(""In punch bowl, combine Hi-C Hula Punch, Spr..."
221,Chocolate Chip Muffins,"c(""Breads"", ""Weeknight"", ""Oven"", ""< 4 Hours"")",Quick Breads,5.0,9.0,When I find my bananas getting too ripe I free...,"c(""Preheat oven to 400 degrees Fahrenheit."", ""..."
270,Spiced Pear Butter,"c(""Fruit"", ""Weeknight"", ""Stove Top"", ""< 4 Hours"")",Pears,5.0,9.0,Make and share this Spiced Pear Butter recipe ...,"c(""Combine pears and apple juice in a large Du..."
355,Apple Crisp,"c(""Apple"", ""Fruit"", ""Low Protein"", ""< 60 Mins""...",Dessert,5.0,9.0,Make and share this Apple Crisp recipe from Fo...,"c(""Fill a 6\"" x 10\"" (I use a 9\"" x 13\"") baki..."
...,...,...,...,...,...,...,...
534266,White Chocolate Peppermint Candies,"""< 4 Hours""",Dessert,1.0,1.0,Make and share this White Chocolate Peppermint...,"c(""Spray the ice cube tray with cooking spray...."
534306,Crack Green Beans,"c(""Low Protein"", ""< 60 Mins"")",Vegetable,1.0,1.0,Make and share this Crack Green Beans recipe f...,"c(""Preheat oven to 350°."", ""Drain green beans ..."
535846,Creole Gumbo,"c(""Chicken"", ""Poultry"", ""Meat"", ""Very Low Carb...",Gumbo,1.0,1.0,Make and share this Creole Gumbo recipe from F...,"c(""Bring water and bay leaves to a boil."", ""In..."
536377,Carrabba's Marsala Sauce,"c(""European"", ""< 15 Mins"", ""Easy"")",Sauces,1.0,1.0,"Carrabba's delicious marsala sauce recipe, str...","c(""Melt 1 tablespoon of the butter in a large ..."


# Section2
## Search

In this section I work on my search bar. Since I want to be able to use all this code on my presentation day, searching will be important. Because it is almost impossible for a user to know the exact name of a recipe, I will first give them a tool to filter by ingredients, so they get recipe names that they can later use to search for similar recipes.

The limitation of search is that its results are exact: a recipe is returned only if it contains every ingredient entered, so entering a few ingredients is enough to narrow things down. A recommendation is not exact; it has more freedom to suggest recipes with a few different ingredients, which can spark the user's curiosity.
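
To make the difference between exact search and similarity concrete, here is a minimal sketch in plain Python. It is a hand-rolled illustration, not the method used later in this notebook: the helper `cosine` and the three toy recipes are assumptions made up for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of ingredients (lists of strings)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy recipes, not rows from the dataset.
query = ["chicken", "cheddar", "garlic", "spaghetti"]
candidates = {
    "exact match": ["chicken", "cheddar", "garlic", "spaghetti"],
    "one swap": ["chicken", "mozzarella", "garlic", "spaghetti"],
    "unrelated": ["sugar", "flour", "butter"],
}

# An exact search keeps only recipes that contain every ingredient...
exact_hits = [name for name, ing in candidates.items() if set(query) <= set(ing)]

# ...while a similarity score ranks near misses instead of discarding them.
scores = {name: cosine(query, ing) for name, ing in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(exact_hits)
print(ranked)
```

The exact search returns only the first recipe, while the similarity ranking still places the recipe with one swapped ingredient ahead of the unrelated one.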

In [13]:
from sklearn.metrics.pairwise import cosine_similarity



In the code below I look for recipes that have 'cheddar' in their ingredients.

In [15]:
recipes.loc[recipes['RecipeIngredientParts'].str.contains('cheddar')].head(5)

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
102,Cheesy Scalloped Potato Side Dish,125579,GrandmaIsCooking,PT1H30M,PT25M,PT1H55M,1999-08-14T20:12:00Z,--Adopted Recipe--\r\nThis is a simple scallop...,"c(""https://img.sndimg.com/food/image/upload/w_...",Potato,...,6.5,32.3,260.7,46.0,5.2,4.3,14.2,,,"c(""Layer potatoes, flour, milk, and salt and p..."
104,Cheeseburger Casserole,1535,Marg CaymanDesigns,PT25M,PT15M,PT40M,1999-08-19T05:30:00Z,This is popular with the kids especially. My h...,"c(""https://img.sndimg.com/food/image/upload/w_...",Cheese,...,11.3,71.9,1795.1,44.8,1.5,12.6,25.5,6.0,,"c(""Combine ground beef and flour in skillet. A..."
108,Buttermilk Pie in Cornmeal Pastry,1535,Marg CaymanDesigns,PT40M,PT1H,PT1H40M,1999-08-06T00:41:00Z,Make and share this Buttermilk Pie in Cornmeal...,character(0),Pie,...,6.6,83.4,323.9,61.8,1.1,40.8,10.6,8.0,,"c(""For Pastry: Sift together flour and salt; s..."
162,Chicken Lasagna,1562,Libby1,PT2H,PT30M,PT2H30M,1999-09-20T19:42:00Z,Make and share this Chicken Lasagna recipe fro...,character(0),Chicken,...,4.7,29.4,693.5,10.9,0.8,3.1,18.1,8.0,,"c(""Mix filling ingredients in large bowl."", ""S..."
208,Chunky Tomato Cheese Pie,1559,Will Parkinson,PT40M,PT35M,PT1H15M,1999-09-16T06:01:00Z,Make and share this Chunky Tomato Cheese Pie r...,character(0),Cheese,...,14.0,56.1,1340.5,29.4,2.2,5.7,17.6,6.0,,"c(""Preheat oven to 375 degrees Fahrenheit."", ""..."


Next I make the search a bit harder by looking for two ingredients. This approach would not scale well to many ingredients, though, because each one needs its own chained condition.

In [16]:

recipes.loc[recipes['RecipeIngredientParts'].str.contains('cheddar') & recipes['RecipeIngredientParts'].str.contains('chicken')].head(5)   # looking for rows that contain both ingredients

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
162,Chicken Lasagna,1562,Libby1,PT2H,PT30M,PT2H30M,1999-09-20T19:42:00Z,Make and share this Chicken Lasagna recipe fro...,character(0),Chicken,...,4.7,29.4,693.5,10.9,0.8,3.1,18.1,8.0,,"c(""Mix filling ingredients in large bowl."", ""S..."
339,Spinach Phyllo Casserole,39547,Julesong,PT45M,PT1H,PT1H45M,1999-09-13T03:48:00Z,The layers of this low-fat phyllo pastry are c...,"c(""https://img.sndimg.com/food/image/upload/w_...",Vegetable,...,1.9,6.3,319.9,8.7,1.1,0.8,12.7,8.0,,"c(""Preheat oven to 375 F (190 C). Saute onions..."
506,Yummy and Comforting Chicken Tetrazzini,51011,Loves2Teach,PT40M,PT35M,PT1H15M,1999-08-26T03:56:00Z,"Though very easy, this is surprisingly good. ...",character(0),Chicken,...,12.9,56.4,746.8,26.7,1.4,3.0,16.2,10.0,,"c(""Cook the spaghetti according to package dir..."
558,German Potato-Cheese Soup,1549,Dave5003,PT40M,PT30M,PT1H10M,1999-08-17T04:36:00Z,Make and share this German Potato-Cheese Soup ...,"""https://img.sndimg.com/food/image/upload/w_55...",Potato,...,9.7,45.8,775.8,24.2,2.7,5.6,14.3,12.0,,"c(""Peel and prepare vegetables."", ""In a large ..."
602,Enchiladas Verdes Suizas,174711,Queen Dragon Mom,PT15M,PT1H20M,PT1H35M,1999-08-25T06:25:00Z,"For authenticity, substitute crema for the whi...",character(0),Mexican,...,29.9,152.0,325.8,66.5,9.6,10.3,13.6,4.0,12 enchiladas,"c(""Remove papery husks from tomatillos."", ""Was..."


In the next two code blocks I manually look up the ingredients of recipes with the same name to explore them and see whether they are at least similar.

In [17]:
recipes['RecipeIngredientParts'].loc[recipes['Name'] == 'Chicken Lasagna']

RecipeId
162       c("onion", "green pepper", "skim milk", "fresh...
76476     c("chicken broth", "salt", "cottage cheese", "...
87901     c("butter", "milk", "parsley", "parmesan chees...
134726    c("butter", "onions", "skinless chicken breast...
165926    c("mushrooms", "onion", "oregano", "basil", "m...
171715    c("cottage cheese", "cheddar cheese", "parmesa...
210101    c("butter", "olive oil", "mushroom", "garlic c...
310552    c("butter", "onion", "garlic clove", "all-purp...
311199    c("boneless chicken breasts", "cottage cheese"...
349042    c("boneless skinless chicken breast", "diced t...
357970    c("butter", "flour", "salt", "basil", "chicken...
387872    c("chicken breasts", "onion", "bell pepper", "...
Name: RecipeIngredientParts, dtype: object

In [18]:
# Exploring RecipeIngredientParts

# Show the full column contents without truncation
pd.set_option('display.max_colwidth', None)

# Access the recipe I want to see
print(recipes['RecipeIngredientParts'].loc[recipes['Name'] == 'Chicken Lasagna'])

# Restore the default display width
pd.reset_option('display.max_colwidth')


RecipeId
162                                                                                             c("onion", "green pepper", "skim milk", "fresh mushrooms", "pimiento", "dried basil", "nonfat cottage cheese", "mozzarella cheese", "parmesan cheese", "chicken", "cheddar cheese")
76476                                                                                                                                  c("chicken broth", "salt", "cottage cheese", "cream cheese", "sour cream", "mayonnaise", "onion", "green pepper", "fresh parsley", "butter")
87901                                                                                                                        c("butter", "milk", "parsley", "parmesan cheese", "onion", "cottage cheese", "cheddar cheese", "cream cheese", "mozzarella cheese", "frozen broccoli")
134726                                                                                                                                     c("butter", "onions", "s
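
The values printed above are stored as R-style `c("...")` strings rather than Python lists. A minimal sketch of turning one such string into a list, assuming a hypothetical helper `parse_r_vector` (it is not part of the original notebook, and the rest of this section keeps working on the raw strings):

```python
import re

def parse_r_vector(text):
    """Turn an R-style string such as 'c("a", "b")' into ['a', 'b']."""
    if not isinstance(text, str):
        return []  # e.g. NaN for a missing value
    # Capture every double-quoted item, allowing escaped quotes inside it.
    items = re.findall(r'"((?:[^"\\]|\\.)*)"', text)
    return [item.replace('\\"', '"') for item in items]

sample = 'c("onion", "green pepper", "cheddar cheese")'
print(parse_r_vector(sample))

# Other shapes that appear in the tables above.
print(parse_r_vector('"< 4 Hours"'))    # a single bare quoted value
print(parse_r_vector('character(0)'))   # R's empty character vector
```

If the parsed form turns out to be useful, it could be applied to the whole column with `recipes['RecipeIngredientParts'].apply(parse_r_vector)`.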

Now that the manual search works reasonably well, I will refine it.

I will try out a way to search recipes based on ingredients. Right now I do it with a loop; later on I will try turning it into a function. The reason is that it will be difficult for people to know the exact name of the recipe they are looking for, so to make their life easier I want them to be able to search by keywords (in this case ingredients), get a list of recipes that contain these ingredients, and then copy a name from that list into the `Content Recommender` function, which will recommend similar recipes.

In [19]:
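# Searching for ingredients in the 'RecipeIngredientParts' column
ingredients = ['cheddar', 'chicken', 'garlic', 'spaghetti']

# Start from True so the first `&` keeps the first ingredient's matches
mask = True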

for ingredient in ingredients:
    mask = mask & recipes['RecipeIngredientParts'].str.contains(ingredient)

result = recipes.loc[mask]

result

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
23700,Baked Spaghetti With Chicken and Spinach,5060,Derf2440,PT30M,PT15M,PT45M,2002-03-28T17:10:00Z,"One of our favourite ways to serve spaghetti, ...","""https://img.sndimg.com/food/image/upload/w_55...",One Dish Meal,...,14.3,138.2,1469.3,65.6,5.7,11.5,57.9,2.0,,"c(""Cook spaghetti, al dente, drain and set asi..."
52978,Spaghetti Squash Ole,54716,Mimi Bobeck,PT1H10M,PT10M,PT1H20M,2003-02-02T20:11:00Z,Make and share this Spaghetti Squash Ole recip...,character(0),Low Cholesterol,...,3.2,71.0,799.6,32.1,7.0,6.2,26.4,6.0,,"c(""Halve the spaghetti squash lengthwise and r..."
82638,Mexican Spaghetti,67573,TPubmgjbd,PT40M,PT15M,PT55M,2004-01-31T20:01:00Z,A good way to use up leftover rotisserie chick...,"""https://img.sndimg.com/food/image/upload/w_55...",Chicken,...,9.2,40.1,801.5,47.6,4.5,7.9,14.9,6.0,,"c(""In a large skillet, saute onion and garlic ..."
116815,Kittencal's Tuna-Spaghetti Casserole,89831,Kittencalrecipezazz,PT30M,PT30M,PT1H,2005-04-12T16:47:00Z,This is a recipe I developed years ago; it als...,"c(""https://img.sndimg.com/food/image/upload/w_...",Tuna,...,14.2,136.9,724.6,40.8,2.5,3.7,31.2,,,"c(""Butter a 13 x 9-inch baking dish."", ""Chop t..."
137226,Chicken Spaghetti Casserole Bake,89831,Kittencalrecipezazz,PT40M,PT30M,PT1H10M,2005-09-13T11:52:00Z,Make and share this Chicken Spaghetti Casserol...,character(0),Spaghetti,...,4.5,18.0,606.9,33.0,3.8,6.2,11.6,6.0,,"c(""Set oven to 350 degrees F."", ""Butter a 3-qu..."
154183,Creamy Turkey (Or Chicken) Spaghetti Bake,89831,Kittencalrecipezazz,PT30M,PT30M,PT1H,2006-02-01T14:04:00Z,This is a wonderful dish to use up any leftove...,character(0),Spaghetti,...,38.9,248.7,1601.4,109.6,8.4,10.3,75.7,,,"c(""In a large heavy saucepan cook the fresh mu..."
184052,Rotisserie Chicken Spaghetti Casserole,27783,HeatherFeather,PT45M,PT15M,PT1H,2006-08-31T13:26:00Z,This is one of those casseroles with many vers...,character(0),Chicken,...,7.9,76.1,708.3,35.6,1.6,4.8,25.5,12.0,,"c(""Remove skin and bones from your rotisserie ..."
184922,Stuffed Spaghetti Squash,123118,momjan,PT45M,PT30M,PT1H15M,2006-09-06T21:28:00Z,"A really good recipe, low cal, lots of veggies...",character(0),One Dish Meal,...,12.8,106.8,682.2,16.0,3.3,8.7,30.1,4.0,2 boats,"c(""Pierce the squash along one side in a strai..."
231284,Chicken Tetrazzini,384737,Salt Lake Meal Swap,PT30M,PT30M,PT1H,2007-05-30T17:03:00Z,Make and share this Chicken Tetrazzini recipe ...,character(0),One Dish Meal,...,11.2,104.5,1043.6,34.1,1.8,4.0,31.1,10.0,,"c(""Assembly Directions:"", ""Break the noodles i..."
330414,Cheesy Chicken Tetrazzini,226863,breezermom,PT1H,PT10M,PT1H10M,2008-10-13T23:57:00Z,"This recipe came from Southern Living, way bac...","c(""https://img.sndimg.com/food/image/upload/w_...",Spaghetti,...,22.0,147.6,1228.4,37.5,2.8,3.2,46.1,,,"c(""Combine first 3 ingredients in a saucepan;b..."
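
The loop above can be wrapped into a reusable function. This is a minimal sketch, assuming a hypothetical name `search_by_ingredients` and a toy DataFrame made up for the example. Unlike the loop above, it matches case-insensitively and treats missing text as "no match", which is my own choice and not what the notebook does.

```python
import pandas as pd

def search_by_ingredients(df, ingredients, column="RecipeIngredientParts"):
    """Return the rows of df whose `column` mentions every ingredient."""
    mask = pd.Series(True, index=df.index)
    for ingredient in ingredients:
        # regex=False treats the term literally; na=False counts missing text as no match.
        mask &= df[column].str.contains(ingredient, case=False, regex=False, na=False)
    return df.loc[mask]

# Hypothetical toy data, not rows from the dataset.
toy = pd.DataFrame(
    {
        "Name": ["Chicken Bake", "Cheese Toast", "Plain Rice"],
        "RecipeIngredientParts": [
            'c("chicken", "cheddar cheese", "garlic")',
            'c("bread", "Cheddar cheese")',
            None,
        ],
    },
    index=pd.Index([1, 2, 3], name="RecipeId"),
)

hits = search_by_ingredients(toy, ["cheddar", "chicken"])
print(hits["Name"].tolist())
```

Because this is substring matching, a term like 'spaghetti' also matches 'spaghetti squash', which is why squash recipes show up in the results above.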


`RecipeIngredientParts` seems like a good candidate to make the vectorizer.

Next I will analyze `RecipeInstructions` to see what kind of data it contains.

In [20]:
# Searching for ingredients using the 'RecipeInstructions' column
ingredients = ['cheddar', 'chicken', 'garlic', 'spaghetti']  

mask = True  

for ingredient in ingredients:
    mask = mask & recipes['RecipeInstructions'].str.contains(ingredient)

result = recipes.loc[mask]

result

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
94519,Creole Spaghetti Bake,89831,Kittencalrecipezazz,PT25M,PT30M,PT55M,2004-06-28T20:00:00Z,Make and share this Creole Spaghetti Bake reci...,character(0),Spaghetti,...,11.5,50.0,1047.1,123.2,7.2,10.8,32.2,6.0,,"c(""Set oven to 350 degrees."", ""Grease a 2-quar..."
116815,Kittencal's Tuna-Spaghetti Casserole,89831,Kittencalrecipezazz,PT30M,PT30M,PT1H,2005-04-12T16:47:00Z,This is a recipe I developed years ago; it als...,"c(""https://img.sndimg.com/food/image/upload/w_...",Tuna,...,14.2,136.9,724.6,40.8,2.5,3.7,31.2,,,"c(""Butter a 13 x 9-inch baking dish."", ""Chop t..."
133250,Ospidillo Cafe Cincinnati Chili No. 2,196369,Bone Man,PT6H,PT1H,PT7H,2005-08-11T19:54:00Z,If you've ever eaten &quot;Cincinnati Chili&qu...,"c(""https://img.sndimg.com/food/image/upload/w_...",Beans,...,2.6,25.0,301.8,11.8,3.6,1.9,10.8,25.0,,"c(""In a large cooking pot, over medium-high h..."
137226,Chicken Spaghetti Casserole Bake,89831,Kittencalrecipezazz,PT40M,PT30M,PT1H10M,2005-09-13T11:52:00Z,Make and share this Chicken Spaghetti Casserol...,character(0),Spaghetti,...,4.5,18.0,606.9,33.0,3.8,6.2,11.6,6.0,,"c(""Set oven to 350 degrees F."", ""Butter a 3-qu..."
214724,Cincinnati Chili,398160,GREG IN SAN DIEGO,PT20M,PT20M,PT40M,2007-03-02T22:23:00Z,This recipe came from a recent edition of &quo...,"c(""https://img.sndimg.com/food/image/upload/w_...",Meat,...,3.1,70.3,1214.7,15.1,3.5,7.6,28.0,,,"c(""Heat oil in Dutch oven over medium-high hea..."
218790,Cowboy Spaghetti With Cheese Sauce - Rachael Ray,170628,LizP5885,,PT45M,PT45M,2007-03-25T20:40:00Z,I saw Rachael Ray make this on her show and I ...,"c(""https://img.sndimg.com/food/image/upload/w_...",Stew,...,13.3,88.8,731.7,54.7,3.8,5.8,31.4,8.0,,"c(""Bring a pot of water to a boil. Add a gener..."
219206,Creamy Chicken Spaghetti,88099,Nimz_,PT45M,PT40M,PT1H25M,2007-03-27T22:20:00Z,This is so rich and creamy with a hint of spic...,"c(""https://img.sndimg.com/food/image/upload/w_...",One Dish Meal,...,13.5,103.4,1702.3,41.0,2.9,5.5,35.7,,,"c(""Saute onion, peppers and garlic in 1 tables..."
231284,Chicken Tetrazzini,384737,Salt Lake Meal Swap,PT30M,PT30M,PT1H,2007-05-30T17:03:00Z,Make and share this Chicken Tetrazzini recipe ...,character(0),One Dish Meal,...,11.2,104.5,1043.6,34.1,1.8,4.0,31.1,10.0,,"c(""Assembly Directions:"", ""Break the noodles i..."
277393,Nimz's Creamy Chicken Spaghetti (Lite-Bleu),452940,2Bleu,PT1H30M,PT40M,PT2H10M,2008-01-08T00:19:00Z,This is a healthier version of Chef #88099's R...,"""https://img.sndimg.com/food/image/upload/w_55...",Chicken,...,5.4,57.9,939.6,33.2,2.2,5.8,23.9,6.0,,"c(""In a large skillet over medium heat, melt t..."
316545,"Chili Sauce for Hot Dogs, Fries and Hamburgers",795588,Brandess,PT2H,PT5M,PT2H5M,2008-07-30T19:11:00Z,This is 1 of my 2 staple chili sauces. This is...,"c(""https://img.sndimg.com/food/image/upload/w_...",Sauces,...,2.5,38.1,449.8,5.9,1.2,2.8,13.3,,,"c(""With your clean hands start by adding 4 cup..."


I was checking whether searching `RecipeInstructions` would return more rows, but it returned the same number of rows as `RecipeIngredientParts`.

In [21]:
# searching for the ingredients in the Description column
ingredients = ['cheddar', 'chicken', 'garlic', 'spaghetti']  

mask = True  

for ingredient in ingredients:
    mask = mask & recipes['Description'].str.contains(ingredient)

result = recipes.loc[mask]

result

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
73376,Very Cheesy Garlic Bread,63098,Shawn C,PT15M,PT5M,PT20M,2003-10-15T20:00:00Z,I love this with my homemade lasagna or spaghe...,character(0),Breads,...,16.7,68.3,982.1,60.0,3.5,0.4,13.6,4.0,,"c(""Preheat oven to 375°F."", ""Cut slices down t..."


The `Description` column seems a bit less accurate for this purpose. In the example where I searched for 'cheddar', 'chicken', 'garlic', 'spaghetti', the `Description` column only gave me 1 result, so I will not use it for my search bar. I will use the ingredients column instead, since it has the words that I need to make a good content-based recommendation system.

Below I will just be exploring the data.

In [22]:
recipes['RecipeInstructions'].isna().sum()

0

In [23]:
recipes.shape

(269294, 27)

In [24]:
recipes['ReviewCount'].value_counts().head(10)

1.0     101665
2.0      52906
3.0      29880
4.0      18691
5.0      12814
6.0       9449
7.0       7156
8.0       5336
9.0       4156
10.0      3222
Name: ReviewCount, dtype: int64

There are many recipes that have fewer than 10 reviews. Ratings based on so few reviews can be biased, so I will get rid of these recipes. Warning: I only do this because I need to make my dataset a lot smaller, since my computer can't handle the whole dataset and this is a standalone project. In a real-life scenario, keeping them could help recipes with few or no reviews get more views and visits.


In [25]:
recipes = recipes[recipes['ReviewCount'] >= 10]

In [26]:
recipes.shape

(27241, 27)

In [27]:
recipes.head(5)

RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
40,Best Lemonade,1566,Stephen Little,PT5M,PT30M,PT35M,1999-09-05T19:52:00Z,This is from one of my first Good House Keepi...,"c(""https://img.sndimg.com/food/image/upload/w_...",Beverages,...,0.0,0.0,1.8,81.5,0.4,77.2,0.3,4.0,,"c(""Into a 1 quart Jar with tight fitting lid, ..."
42,Cabbage Soup,1538,Duckie067,PT30M,PT20M,PT50M,1999-09-19T06:19:00Z,Make and share this Cabbage Soup recipe from F...,"""https://img.sndimg.com/food/image/upload/w_55...",Vegetable,...,0.1,0.0,959.3,25.1,4.8,17.7,4.3,4.0,,"c(""Mix everything together and bring to a boil..."
44,Warm Chicken A La King,1596,Joan Edington,PT3M,PT35M,PT38M,1999-09-17T04:47:00Z,I copied this one out of a friend's book so ma...,"""https://img.sndimg.com/food/image/upload/w_55...",Chicken,...,31.9,405.8,557.2,29.1,3.1,5.0,45.3,2.0,,"c(""Melt 1 1/2 ozs butter, add the flour and co..."
49,Chicken Breasts Lombardi,174711,Queen Dragon Mom,PT30M,PT45M,PT1H15M,1999-08-14T19:58:00Z,Make and share this Chicken Breasts Lombardi r...,"c(""https://img.sndimg.com/food/image/upload/w_...",Chicken Breast,...,13.0,203.0,848.9,13.7,0.6,2.0,57.9,6.0,,"c(""Cook mushrooms in 2 tbsp butter in a large ..."
54,Carrot Cake,1535,Marg CaymanDesigns,PT50M,PT45M,PT1H35M,1999-09-13T15:20:00Z,This is one of the few recipes my husband ever...,"c(""https://img.sndimg.com/food/image/upload/w_...",Dessert,...,4.9,69.8,534.8,67.0,1.6,47.9,5.0,12.0,1 bundt,"c(""Beat together the eggs, oil, and white suga..."


In [28]:
recipes['AggregatedRating'].value_counts()

5.0    21779
4.5     4805
4.0      553
3.5       78
3.0       15
2.5        7
2.0        2
1.5        2
Name: AggregatedRating, dtype: int64

Again, this is just for demonstration purposes and to make my DataFrame a little bit smaller (and to avoid having a bunch of low-rated recipes in my sample).

In [29]:
recipes = recipes[recipes['AggregatedRating']> 3.5]

In [30]:
recipes.shape

(27137, 27)

I only keep recipes rated above 3.5, so that only the best ones get recommended. Since only 104 recipes are rated 3.5 or below (27241 rows before the filter, 27137 after), dropping them is not a problem.

# Function for Search Bar

In the examples below I will search for 'cheddar, spaghetti, chicken, garlic'. I built the search so I can use it later on, and I have been using it a lot to check that the code is correct, by searching for certain ingredients and seeing whether most of those recipes come up in the recommendations.

In [31]:
input_ingredients = input('Enter your ingredients, separated by commas: ')
ingredients = input_ingredients.split(',')

mask = True

for ingredient in ingredients:
    mask = mask & recipes['RecipeIngredientParts'].str.contains(ingredient.strip(), case=False)

result = recipes.loc[mask].head(3)

display(result)


Enter your ingredients, separated by commas: cheddar, spaghetti, chicken, garlic
                                              Name  AuthorId  \
RecipeId                                                       
23700     Baked Spaghetti With Chicken and Spinach      5060   
116815        Kittencal's Tuna-Spaghetti Casserole     89831   

                   AuthorName CookTime PrepTime TotalTime  \
RecipeId                                                    
23700                Derf2440    PT30M    PT15M     PT45M   
116815    Kittencalrecipezazz    PT30M    PT30M      PT1H   

                 DatePublished  \
RecipeId                         
23700     2002-03-28T17:10:00Z   
116815    2005-04-12T16:47:00Z   

                                                Description  \
RecipeId                                                      
23700     One of our favourite ways to serve spaghetti, ...   
116815    This is a recipe I developed years ago; it als...   

                           

Since the code above works, I wrapped it in a function so I can reuse it later on presentation day.

In [32]:
def ingredient_search(recipes):
    input_ingredients = input('Enter your ingredients, separated by commas: ')
    ingredients = input_ingredients.split(',')

    mask = True

    for ingredient in ingredients:
        mask = mask & recipes['RecipeIngredientParts'].str.contains(ingredient.strip(), case=False)

    result = recipes.loc[mask]
    
    return result['Name'].head(10)


In [33]:
search_result = ingredient_search(recipes)

# Print the search result
display(search_result) 
 

Enter your ingredients, separated by commas: cheddar, spaghetti, chicken, garlic
RecipeId
23700     Baked Spaghetti With Chicken and Spinach
116815        Kittencal's Tuna-Spaghetti Casserole
Name: Name, dtype: object


The search bar is working! It will be useful for the presentation, and in this notebook it was very useful for debugging (I used it many times to check that the recommender at the end works correctly).
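
The search above can also be sketched as a non-interactive function, which is easier to test than one that calls `input()`. This is only a sketch under assumptions: the function name `search_by_ingredients` and the toy DataFrame are hypothetical, and `re.escape` and `na=False` are additions that are not in the original function (they treat the typed text literally and count missing values as no match).

```python
import re
import pandas as pd

def search_by_ingredients(recipes, query, column='RecipeIngredientParts', limit=10):
    """Return names of recipes whose `column` contains every comma-separated term."""
    terms = [t.strip() for t in query.split(',') if t.strip()]
    # Start with an all-True mask aligned to the DataFrame index
    mask = pd.Series(True, index=recipes.index)
    for term in terms:
        # re.escape treats the term as literal text; na=False counts missing values as no match
        mask &= recipes[column].str.contains(re.escape(term), case=False, na=False)
    return recipes.loc[mask, 'Name'].head(limit)

# Toy data (hypothetical), using the same c("...") format as the dataset
toy = pd.DataFrame({
    'Name': ['Chicken Spaghetti', 'Garlic Bread', 'No Ingredients Listed'],
    'RecipeIngredientParts': [
        'c("chicken", "spaghetti", "garlic", "cheddar cheese")',
        'c("bread", "garlic", "butter")',
        None,
    ],
})

print(search_by_ingredients(toy, 'Cheddar, chicken , garlic').tolist())
# → ['Chicken Spaghetti']
```

Passing the query as an argument keeps the matching logic separate from the user prompt, so the same function could be called from a notebook cell or from a small UI.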

# Section3
## Vectorize for recommendation

In this part I will vectorize the ingredients with TfidfVectorizer to make recommendations based on content. First I have to reset my index, so that the DataFrame's row labels (0, 1, 2, ...) line up with the row positions of the TF-IDF matrix when I use `.index` later. I tested without resetting the index and it gave me a bunch of problems, so resetting is the way.

In [35]:
recipes.reset_index(inplace=True)
recipes.head(3)

Unnamed: 0,RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,...,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
0,40,Best Lemonade,1566,Stephen Little,PT5M,PT30M,PT35M,1999-09-05T19:52:00Z,This is from one of my first Good House Keepi...,"c(""https://img.sndimg.com/food/image/upload/w_...",...,0.0,0.0,1.8,81.5,0.4,77.2,0.3,4.0,,"c(""Into a 1 quart Jar with tight fitting lid, ..."
1,42,Cabbage Soup,1538,Duckie067,PT30M,PT20M,PT50M,1999-09-19T06:19:00Z,Make and share this Cabbage Soup recipe from F...,"""https://img.sndimg.com/food/image/upload/w_55...",...,0.1,0.0,959.3,25.1,4.8,17.7,4.3,4.0,,"c(""Mix everything together and bring to a boil..."
2,44,Warm Chicken A La King,1596,Joan Edington,PT3M,PT35M,PT38M,1999-09-17T04:47:00Z,I copied this one out of a friend's book so ma...,"""https://img.sndimg.com/food/image/upload/w_55...",...,31.9,405.8,557.2,29.1,3.1,5.0,45.3,2.0,,"c(""Melt 1 1/2 ozs butter, add the flour and co..."
3,49,Chicken Breasts Lombardi,174711,Queen Dragon Mom,PT30M,PT45M,PT1H15M,1999-08-14T19:58:00Z,Make and share this Chicken Breasts Lombardi r...,"c(""https://img.sndimg.com/food/image/upload/w_...",...,13.0,203.0,848.9,13.7,0.6,2.0,57.9,6.0,,"c(""Cook mushrooms in 2 tbsp butter in a large ..."
4,54,Carrot Cake,1535,Marg CaymanDesigns,PT50M,PT45M,PT1H35M,1999-09-13T15:20:00Z,This is one of the few recipes my husband ever...,"c(""https://img.sndimg.com/food/image/upload/w_...",...,4.9,69.8,534.8,67.0,1.6,47.9,5.0,12.0,1 bundt,"c(""Beat together the eggs, oil, and white suga..."


I used a TfidfVectorizer. I set `min_df=20`, so a term has to appear in at least 20 recipes to be kept, because I want to avoid typos or words that are barely used, since these could mess up my model. I also set `max_df=0.7`, which drops any term that appears in more than 70% of the recipes, just in case some word is used way too much.

In [36]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words = "english", min_df=20, max_df = 0.7)
recipes['RecipeIngredientParts'] = recipes['RecipeIngredientParts'].fillna("")

TF_IDF_matrix = vectorizer.fit_transform(recipes['RecipeIngredientParts'])

In [37]:
TF_IDF_matrix.shape

(27137, 660)
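
To illustrate how `min_df` and `max_df` prune the vocabulary, here is a minimal sketch on a toy corpus. The four documents are hypothetical, and the thresholds are scaled down (`min_df=2` instead of 20) so the effect is visible on such a small sample.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical): 4 tiny "recipes" in the dataset's c("...") format
docs = [
    'c("salt", "butter", "chicken")',
    'c("salt", "butter", "garlic")',
    'c("salt", "flour", "chicken")',
    'c("salt", "saffron")',
]

# min_df=2: keep terms found in at least 2 documents
# max_df=0.7: drop terms found in more than 70% of the documents
vec = TfidfVectorizer(min_df=2, max_df=0.7)
matrix = vec.fit_transform(docs)

print(sorted(vec.vocabulary_))  # → ['butter', 'chicken']
print(matrix.shape)             # → (4, 2)
```

Here "salt" is dropped because it appears in every document, while "garlic", "flour" and "saffron" are dropped because each appears only once. The leading `c` is ignored because the default tokenizer only keeps tokens of two or more characters.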

# Section4
## Recommender

Now comes the part where I put it all together to create the recommender. First I have to import cosine_similarity, and then I will test 2 recipes that I got from one of my searches and that SHOULD be similar.

In [38]:
from sklearn.metrics.pairwise import cosine_similarity

In [39]:
# checking the ingredients of 2 recipes
# show the full column content instead of a truncated preview
pd.set_option('display.max_colwidth', None)

print(recipes['RecipeIngredientParts'][recipes['Name'] == 'Warm Chicken A La King'])
print(recipes['RecipeIngredientParts'][recipes['Name'] == 'Chicken Breasts Lombardi'])


# setting the display option back to its default
pd.reset_option('display.max_colwidth')

2    c("chicken", "butter", "flour", "milk", "celery", "button mushrooms", "green pepper", "canned pimiento", "salt", "black pepper", "Worcestershire sauce", "parsley")
Name: RecipeIngredientParts, dtype: object
3    c("fresh mushrooms", "butter", "boneless skinless chicken breast halves", "flour", "butter", "marsala", "chicken broth", "salt", "mozzarella cheese", "parmesan cheese", "green onion")
Name: RecipeIngredientParts, dtype: object


The 2 tested recipes seem kind of similar, but when checking the ingredients, they only have chicken, butter, flour and salt in common (plus mushrooms, in different forms), and the rest are pretty much different. Both had the ingredients I searched for in one of my searches, but... are they really the closest in similarity?

I computed the cosine similarity to check it out.

In [40]:
# comparing cosine_similarity of the 2 recipes
recipe_1_index = recipes[recipes['Name'] == 'Warm Chicken A La King'].index[0]  
recipe_2_index = recipes[recipes['Name'] == 'Chicken Breasts Lombardi'].index[0]  

recipe_1 = TF_IDF_matrix[recipe_1_index, :]
recipe_2 = TF_IDF_matrix[recipe_2_index, :]

similarity = cosine_similarity(recipe_1, recipe_2)
print("Similarity:", similarity[0][0])


Similarity: 0.22913068557345473


The similarity is about 0.23. Is that high? Low? Is it the closest recipe to it? We don't know unless I compare the recipe against all the other recipes to see which one has the highest similarity score.
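
As a reminder of what the score means, cosine similarity is the dot product of two vectors divided by the product of their lengths. Since TF-IDF weights are never negative, the score falls between 0 (no shared terms) and 1 (same direction). The sketch below computes it by hand with NumPy; the helper name `cosine` and the three toy vectors are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical term weights for three recipes over the same 3-term vocabulary
recipe_a = [1.0, 1.0, 0.0]
recipe_b = [1.0, 0.0, 1.0]   # shares one term with recipe_a
recipe_c = [0.0, 0.0, 2.0]   # shares no terms with recipe_a

print(round(cosine(recipe_a, recipe_a), 3))  # identical vectors
print(round(cosine(recipe_a, recipe_b), 3))  # partial overlap
print(round(cosine(recipe_a, recipe_c), 3))  # no overlap
```

A single score is hard to interpret on its own, which is why it has to be ranked against the scores of all other recipes.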

In [41]:
similarities = cosine_similarity(TF_IDF_matrix, dense_output=False)
similarities.shape

(27137, 27137)
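
A matrix of 27137 x 27137 scores has about 736 million entries, which would take close to 6 GB as a dense float64 array; that is why `dense_output=False` is used here. If memory is still tight, one alternative is to compare a single recipe against all the others on demand. The following is a minimal sketch on a hypothetical toy corpus, not the approach used in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus (hypothetical) standing in for recipes['RecipeIngredientParts']
docs = [
    'c("chicken", "butter", "mushrooms")',
    'c("chicken", "butter", "flour")',
    'c("lemon", "sugar", "water")',
]
tfidf = TfidfVectorizer().fit_transform(docs)

query_row = 0
# Compare one row against every row: the result has shape (1, n_recipes)
scores = cosine_similarity(tfidf[query_row], tfidf).ravel()

print(scores.shape)
print(scores.argsort()[::-1])  # row positions from most to least similar
```

This keeps only one row of scores in memory at a time, at the cost of recomputing the scores for every new query.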

In [42]:
recipes[recipes['Name'] == 'Warm Chicken A La King'].index

Int64Index([2], dtype='int64')

Next I will build a DataFrame with the recipe names and similarity scores to see if I get recipes that are actually similar. This is the moment of truth!

In [43]:
# Get the row label of the recipe (equal to its row position after reset_index)
recipe_index = recipes[recipes['Name'] == 'Warm Chicken A La King'].index

# Create a dataframe with the recipe names, similarity scores, and ingredients
sim_df = pd.DataFrame({'RecipeName':recipes['Name'], 
                       'similarity': np.array(similarities[recipe_index, :].todense()).squeeze(),
                      'Ingredients': recipes['RecipeIngredientParts']})

In [44]:
sim_df.sort_values(by='similarity', ascending=False).head(10)

Unnamed: 0,RecipeName,similarity,Ingredients
2,Warm Chicken A La King,1.0,"c(""chicken"", ""butter"", ""flour"", ""milk"", ""celer..."
16149,Classic Chicken Ala King,0.624533,"c(""butter"", ""flour"", ""salt"", ""pepper"", ""chicke..."
16772,Garlic Mushrooms With Basil,0.536586,"c(""butter"", ""garlic cloves"", ""button mushrooms..."
21231,Quickie Tom Yum Soup,0.516537,"c(""garlic"", ""celery"", ""tomatoes"", ""button mush..."
6694,Cajun Glazed Mushrooms,0.508901,"c(""button mushrooms"", ""unsalted butter"", ""marg..."
22999,Chicken With Mushrooms,0.498053,"c(""boneless skinless chicken thighs"", ""white b..."
22271,Kittencal's Turkey or Chicken a La King,0.458257,"c(""butter"", ""fresh mushrooms"", ""fresh garlic"",..."
19011,Champignons &agrave; L'ail (Garlic Mushrooms),0.455482,"c(""button mushrooms"", ""butter"", ""garlic cloves"")"
7226,Garlic Stuffed Mushrooms,0.453184,"c(""white button mushrooms"", ""garlic"", ""butter"")"
515,Corn and Pea Salad,0.445987,"c(""white corn"", ""tiny peas"", ""onion"", ""pimient..."


By checking the names I can see that they are actually similar. The closest one to 'Warm Chicken A La King' is 'Classic Chicken Ala King', which makes a lot of sense because it is a variation of it. Most of the recipe names include chicken, mushrooms or garlic. Below I check the 3rd one, 'Garlic Mushrooms With Basil', since it has a high similarity score without the word 'Chicken' in its name. I assume it will still be somewhat similar because of the words 'Mushrooms' and 'Garlic'.

So there are indeed a lot of recipes that are closer to 'Warm Chicken A La King' than 'Chicken Breasts Lombardi', which only scored about 0.23. This shows why recommendation systems can be very powerful and useful!

In [46]:
# checking the ingredients of 2 recipes
# show the full column content instead of a truncated preview
pd.set_option('display.max_colwidth', None)

print(recipes['RecipeIngredientParts'][recipes['Name'] == 'Warm Chicken A La King'])
print('-------')
print(recipes['RecipeIngredientParts'][recipes['Name'] == 'Garlic Mushrooms With Basil'])


# setting the display option back to its default
pd.reset_option('display.max_colwidth')

2    c("chicken", "butter", "flour", "milk", "celery", "button mushrooms", "green pepper", "canned pimiento", "salt", "black pepper", "Worcestershire sauce", "parsley")
Name: RecipeIngredientParts, dtype: object
-------
16772    c("butter", "garlic cloves", "button mushrooms", "salt", "black pepper", "cayenne pepper", "parsley", "basil")
Name: RecipeIngredientParts, dtype: object


Indeed, the recipes have quite similar ingredients. They both have butter, button mushrooms, salt, black pepper and parsley. The main differences are that the first one also has chicken, flour, milk, celery, green pepper, pimiento and Worcestershire sauce, while the second adds garlic cloves, cayenne pepper and basil, but the recommendation seems on point.
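
The manual comparison above can be double-checked with a small sketch that parses the R-style `c("...")` strings into sets. The helper name `parse_ingredients` is hypothetical; the two strings are copied from the output above.

```python
import re

def parse_ingredients(raw):
    """Extract the quoted items from an R-style c("a", "b") string as a set."""
    return set(re.findall(r'"([^"]*)"', raw))

a_la_king = ('c("chicken", "butter", "flour", "milk", "celery", "button mushrooms", '
             '"green pepper", "canned pimiento", "salt", "black pepper", '
             '"Worcestershire sauce", "parsley")')
garlic_mushrooms = ('c("butter", "garlic cloves", "button mushrooms", "salt", '
                    '"black pepper", "cayenne pepper", "parsley", "basil")')

first = parse_ingredients(a_la_king)
second = parse_ingredients(garlic_mushrooms)

print(sorted(first & second))
# → ['black pepper', 'butter', 'button mushrooms', 'parsley', 'salt']
print(sorted(second - first))
# → ['basil', 'cayenne pepper', 'garlic cloves']
```

Note that this compares whole ingredient strings, while the TF-IDF vectors compare individual words, so "green pepper" and "cayenne pepper" still share the token "pepper" in the similarity score.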

# Function for Content Recommender

Now that I have tested that the cosine similarity works with `RecipeIngredientParts`, I will write a function that does everything I did above in separate steps.

In [53]:
def get_similar_recipes(recipe_name, recipes, similarities):
    # Find the row label(s) of the recipe; labels equal row positions after reset_index
    matches = recipes[recipes['Name'] == recipe_name].index
    if len(matches) == 0:
        raise ValueError(f"Recipe not found: {recipe_name!r}")
    # Names are not guaranteed to be unique, so use the first match
    recipe_index = matches[0]

    # Create a dataframe with the recipe names, similarity scores, and ingredients
    sim_df = pd.DataFrame({
        'RecipeName': recipes['Name'],
        'SimilarityScore': similarities[recipe_index, :].toarray().ravel(),
        'Ingredients': recipes['RecipeIngredientParts']
    })

    # Sort the dataframe by similarity score in descending order
    sim_df = sim_df.sort_values(by='SimilarityScore', ascending=False)

    return sim_df.head(10)

In [55]:
similar_recipes = get_similar_recipes('Warm Chicken A La King', recipes, similarities)
display(similar_recipes)

Unnamed: 0,RecipeName,SimilarityScore,Ingredients
2,Warm Chicken A La King,1.0,"c(""chicken"", ""butter"", ""flour"", ""milk"", ""celer..."
16149,Classic Chicken Ala King,0.624533,"c(""butter"", ""flour"", ""salt"", ""pepper"", ""chicke..."
16772,Garlic Mushrooms With Basil,0.536586,"c(""butter"", ""garlic cloves"", ""button mushrooms..."
21231,Quickie Tom Yum Soup,0.516537,"c(""garlic"", ""celery"", ""tomatoes"", ""button mush..."
6694,Cajun Glazed Mushrooms,0.508901,"c(""button mushrooms"", ""unsalted butter"", ""marg..."
22999,Chicken With Mushrooms,0.498053,"c(""boneless skinless chicken thighs"", ""white b..."
22271,Kittencal's Turkey or Chicken a La King,0.458257,"c(""butter"", ""fresh mushrooms"", ""fresh garlic"",..."
19011,Champignons &agrave; L'ail (Garlic Mushrooms),0.455482,"c(""button mushrooms"", ""butter"", ""garlic cloves"")"
7226,Garlic Stuffed Mushrooms,0.453184,"c(""white button mushrooms"", ""garlic"", ""butter"")"
515,Corn and Pea Salad,0.445987,"c(""white corn"", ""tiny peas"", ""onion"", ""pimient..."


Everything works as planned! So bon appetit with the recipes that you can get from this!
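
One possible refinement: the first row of the result is always the query recipe itself, with a score of 1.0. The sketch below is a hypothetical variant (the name `recommend`, the `top_n` parameter and the toy data are assumptions, not part of the notebook) that leaves the query recipe out and assumes the default 0..n-1 index produced by `reset_index`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data (hypothetical) with the same column names as the notebook
toy = pd.DataFrame({
    'Name': ['Chicken A La King', 'Classic Chicken', 'Lemonade'],
    'RecipeIngredientParts': [
        'c("chicken", "butter", "mushrooms")',
        'c("chicken", "butter", "flour")',
        'c("lemon", "sugar", "water")',
    ],
})
toy_matrix = TfidfVectorizer().fit_transform(toy['RecipeIngredientParts'])
toy_sims = cosine_similarity(toy_matrix)  # small enough to keep dense

def recommend(recipe_name, recipes, similarities, top_n=10):
    """Return the top_n most similar recipes, leaving out the query recipe."""
    idx = recipes.index[recipes['Name'] == recipe_name][0]
    row = similarities[idx]
    # Works for both dense arrays and SciPy sparse matrices
    row = row.toarray() if hasattr(row, 'toarray') else np.asarray(row)
    scores = pd.Series(row.ravel(), index=recipes.index)
    # Drop the query recipe, then rank the rest from most to least similar
    scores = scores.drop(idx).sort_values(ascending=False).head(top_n)
    return pd.DataFrame({'RecipeName': recipes.loc[scores.index, 'Name'],
                         'SimilarityScore': scores})

result = recommend('Chicken A La King', toy, toy_sims, top_n=2)
print(result)
```

With the toy data, 'Classic Chicken' ranks first because it shares two ingredients with the query, and 'Lemonade' ranks last with a score of 0.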

# Section5
## Conclusion

In this notebook we explored searching, filtering, tokenizing and building a recommendation system based on ingredients. This can be very useful, especially for new customers/visitors or people who have never left a review on the site, because the system doesn't have enough information about what they have liked before; with the content-based approach, they can still receive recommendations based on what they search for. Also, by considering the content characteristics, the system can suggest items that share similar attributes but that the user has not directly interacted with.