<a href="https://colab.research.google.com/github/dinabahar/recipe-recommender/blob/master/code/2_recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recipe Recommender
In this notebook I build the recommender system using the pre-processed data, `ready.csv`.

Calculating pairwise distances for 30,000 recipes was too demanding for Jupyter, so I ran the code on Google Colab Pro with the `GPU` and `High-RAM` runtime settings.

If you are running this notebook with limited RAM, I recommend sampling the data down to a smaller subset.
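
To gauge how far to sample down, it can help to estimate the size of the dense distance matrix first. The sketch below assumes the distances are stored as 8-byte floats. `dense_distance_matrix_gib` and `demo_df` are hypothetical names used only for illustration, with `demo_df` standing in for `ready.csv`.

```python
import numpy as np
import pandas as pd


def dense_distance_matrix_gib(n_rows, dtype=np.float64):
    """Approximate size in GiB of a dense n_rows x n_rows distance matrix."""
    return n_rows * n_rows * np.dtype(dtype).itemsize / 1024**3


# Rough memory needed for all pairwise distances at different sample sizes
for n in (30_000, 10_000, 5_000):
    print(f"{n:>6} recipes: about {dense_distance_matrix_gib(n):.2f} GiB")

# Hypothetical stand-in for ready.csv: 100 recipes with one binary tag
demo_df = pd.DataFrame({
    "name": [f"recipe {i}" for i in range(100)],
    "vegan": np.arange(100) % 2,
})

# Reproducible random subset; choose n to fit the RAM you have available
small_df = demo_df.sample(n=10, random_state=42)
print(small_df.shape)
```

At 30,000 recipes the matrix alone comes to roughly 6.7 GiB under this assumption, which is why halving the number of rows (and so quartering the matrix) makes a large difference.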

---
## DIRECTORY
1. [Import Libraries](#import)
2. [Read Data](#read)
3. [Preprocessing](#preprocessing)
4. [Calculate Pairwise Distance](#pairwise)
5. [RECiPE ROULETTE: Recommender System](#system)

---
<a id='import'></a>
## 1. Import Libraries
Along with Pandas, NumPy and Matplotlib, we are going to import:
- `sparse` from SciPy to convert our data into a CSR matrix, which is a more memory-efficient way to store a matrix that is mostly zeros.
- the `pairwise_distances` function from Scikit-learn to calculate pairwise distances.
- `random` to pick 1 of the 5 recommendations with the lowest pairwise distances.
- the `colored` function from [termcolor](https://pypi.org/project/termcolor/), a Python module for ANSI color formatting of terminal output.

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances
import random
from termcolor import colored

# Using Google Colab
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


<a id='read'></a>
## 2. Read Data
I'm reading `cleaned_recipes.csv` to pull recipe information when outputting a recommendation, and `ready.csv` to calculate the pairwise distances.

**Read `cleaned_recipes.csv`**

In [2]:
path = "/content/drive/My Drive/recipe-recommender/cleaned_recipes.csv"
recipe_df = pd.read_csv(path)
recipe_df.head()

Unnamed: 0,name,minutes,steps,ingredients
0,crab filled crescent snacks,70,"['heat over to 375 degrees', 'spray large cook...","['crabmeat', 'cream cheese', 'green onions', '..."
1,curried bean salad,20,"['drain & rinse beans', 'stir all ingredients ...","['garbanzo beans', 'black beans', 'onion', 'gi..."
2,delicious steak with onion marinade,25,['heat the oil in a heavy-based pan and cook t...,"['olive oil', 'red onion', 'light brown sugar'..."
3,pork tenderloin with hoisin,15,"['cut pork into 1 / 4-inch slices', 'in a larg...","['pork tenderloin', 'soy sauce', 'hoisin sauce..."
4,mixed baby greens with oranges grapefruit and...,15,['in a salad bowl combine the lettuce with the...,"['mixed baby greens', 'oranges', 'grapefruit',..."


**Read `ready.csv`**

In [3]:
#df = pd.read_csv('../data/ready.csv')
#df.head(1)

# Using Google Colab

path = "/content/drive/My Drive/recipe-recommender/ready.csv"
df = pd.read_csv(path)
df.head(1)

Unnamed: 0,name,1-day-or-more,15-minutes-or-less,3-steps-or-less,30-minutes-or-less,4-hours-or-less,5-ingredients-or-less,60-minutes-or-less,african,american,amish-mennonite,appetizers,apples,argentine,asian,asparagus,australian,austrian,bacon,baja,baking,bananas,bar-cookies,barbecue,beans,beef,beef-organ-meats,beef-ribs,beef-sausage,beginner-cook,belgian,berries,beverages,birthday,biscotti,bisques-cream-soups,black-beans,blueberries,brazilian,bread-machine,...,superbowl,swedish,sweet,sweet-sauces,swiss,tarts,taste-mood,technique,tex-mex,thai,thanksgiving,tilapia,to-go,toddler-friendly,tomatoes,tropical-fruit,tuna,turkey,turkey-breasts,turkish,valentines-day,veal,vegan,vegetables,vegetarian,very-low-carbs,vietnamese,water-bath,wedding,weeknight,welsh,white-rice,whole-chicken,whole-turkey,wild-game,wings,winter,yams-sweet-potatoes,yeast,zucchini
0,crab filled crescent snacks,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<a id='preprocessing'></a>
## 3. Preprocessing
Before calculating the pairwise distances of the 30,000 recipes, I need to set the `name` column as the index and convert the data into a CSR matrix.

Note that I did not scale my data because every feature is binary. If you are building a recommender system on non-binary values, however, you should scale the data first so that differences in units of measurement do not dominate the distances.

In [4]:
name_as_index = df.set_index('name')

df_sparse = sparse.csr_matrix(name_as_index.fillna(0))

df_sparse.shape

(30000, 388)
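
To illustrate why CSR suits a mostly-zero tag matrix, here is a minimal sketch on a toy array (the `dense` values are made up, not taken from `ready.csv`). CSR stores only the non-zero values, their column positions, and one row-pointer array.

```python
import numpy as np
from scipy import sparse

# Toy binary matrix: 4 "recipes" x 4 "tags", mostly zeros
dense = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
])

csr = sparse.csr_matrix(dense)

print(csr.nnz)      # number of stored non-zero entries
print(csr.data)     # the non-zero values, row by row
print(csr.indices)  # column index of each stored value
print(csr.indptr)   # where each row starts and ends inside data/indices

# Converting back gives the original matrix
print(np.array_equal(csr.toarray(), dense))
```

Only 4 of the 16 cells are stored here. The saving grows with the share of zeros in the matrix.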

<a id='pairwise'></a>
## 4. Calculate Pairwise Distance
I use Scikit-learn's `pairwise_distances` function with the metric set to `cosine` to compute the distance between every pair of recipes.

The resulting distance matrix is then stored in a `DataFrame`, with the recipe names as both the index and the column labels.

In [0]:
recommender = pairwise_distances(df_sparse, metric='cosine')

In [6]:
%%time

recommender_df = pd.DataFrame(recommender, 
                              columns=name_as_index.index,
                              index=name_as_index.index)
recommender_df.head()

CPU times: user 2.35 ms, sys: 0 ns, total: 2.35 ms
Wall time: 1.61 ms
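
To make the cosine metric concrete, here is a minimal sketch on a toy binary tag matrix (the `tags` array and the recipe names are made up for illustration). Cosine distance is 1 minus the cosine similarity, so recipes with identical tags get a distance of 0 and recipes with no tags in common get a distance of 1.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances

# Toy data: 4 hypothetical recipes described by 4 binary tags
names = ["recipe a", "recipe b", "recipe c", "recipe d"]
tags = np.array([
    [1, 1, 0, 0],  # recipe a
    [1, 1, 0, 0],  # recipe b: same tags as a
    [1, 0, 1, 0],  # recipe c: shares one of two tags with a
    [0, 0, 0, 1],  # recipe d: shares no tags with a
])

# Same call as in the notebook, on a CSR matrix
dist = pairwise_distances(sparse.csr_matrix(tags), metric="cosine")
dist_df = pd.DataFrame(dist, index=names, columns=names)

# Distances from recipe a, smallest first
print(dist_df["recipe a"].sort_values().round(2))
```

Here recipes a and b are at distance 0, a and c at 0.5, and a and d at 1. Note that two different recipes with identical tags tie with the query recipe itself at distance 0.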


<a id='system'></a>
## 5. RECiPE ROULETTE: Recommender System
Now, it's time to try out the recommender system!

Since you must input the exact name of a recipe that's included in the recommender, here are 10 example recipes you can choose from.

In [7]:
df.sample(10, random_state=55)[['name']]

Unnamed: 0,name
24525,pizza swirls
21205,almost unsweetened applesauce homemade
29812,cauliflower and coriander soup
3932,mango pineapple cobbler
19204,sangria chicken
9581,lamb stew with tomatoes chickpeas and spices
19906,traditional tourtiere
13239,spicy peanut yogurt dip
1364,orange soda ice cream electric ice cream maker
4317,thai corn chowder
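
Because the lookup needs an exact name, a small helper that suggests close matches for a mistyped name may be useful. The sketch below uses `difflib` from the standard library. `suggest_names` and the `known` list are hypothetical and not part of the original notebook.

```python
from difflib import get_close_matches


def suggest_names(query, known_names, n=3, cutoff=0.6):
    """Return up to n known recipe names that closely resemble the query."""
    return get_close_matches(query.lower().strip(), list(known_names),
                             n=n, cutoff=cutoff)


# Hypothetical list of names; a few are borrowed from the sample above
known = ["pizza swirls", "sangria chicken", "thai corn chowder"]

# A misspelled query still finds the intended recipe as the best match
suggestions = suggest_names("Thai corn chowdr", known)
print(suggestions[0])
```

In the notebook, this could be called with `recommender_df.columns` as `known_names` before indexing into the distance table, assuming the column labels are the recipe names as set up in section 4.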


**Try RECiPE ROULETTE!**

In [10]:
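# The input must exactly match a recipe name in the recommender
x = input("Name the recipe you've rated highly: ").lower().strip()

# Pick one of the 5 closest recipes at random (randint is inclusive on both ends)
i = random.randint(0, 4)

# Drop the query recipe by name rather than by position (ties at distance 0
# mean it is not guaranteed to sort first), then keep the 5 smallest distances
find = recommender_df[x].drop(x).sort_values()[:5].index[i]

# Look up the full details of the recommended recipe
recommendation = recipe_df[recipe_df['name']==find].reset_index(drop=True)

recipe = recommendation.loc[0, 'name']
minutes = recommendation.loc[0, 'minutes']
ingredients = recommendation.loc[0, 'ingredients']
steps = recommendation.loc[0, 'steps']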

print()
print(colored("RECiPE ROULETTE recommends:", attrs=['bold']))
print(colored(recipe.title(), 'white', 'on_magenta'))
print()
print("It takes "+colored(minutes, 'magenta', attrs=['bold'])+" minute(s).")
print()
print(colored("You will need:", attrs=['bold']))
print(colored("- "+ingredients[2:-2].replace("', '", "\n- "), 'magenta'))
print()
print(colored("And this is how:", attrs=['bold']))
print(colored("- "+steps[2:-2].replace("', '", "\n- "), 'magenta'))

Name the recipe you've rated highly: thai corn chowder

RECiPE ROULETTE recommends:
Strawberries Jubilee   Fat Free

It takes 20 minute(s).

You will need:
- orange rind
- orange juice
- granulated sugar
- real vanilla
- cornstarch
- fresh strawberries
- brandy

And this is how:
- in large skillet over medium heat , stir together orange rind strips , orange juice , sugar and vanilla until sugar is dissolved , about 4 minutes
- bring to boil over medium-high heat
- whisk cornstarch with 3 tbsp water
- whisk into pan and boil , whisking constantly , until slightly thickened and glossy , about 1 minute
- discard orange rind strips
- add strawberries
- cook just until softened , about 2 minutes
- in small saucepan , heat brandy over medium heat
- remove from heat
- with long match , ignite brandy and pour , still flaming , over warm berries
- when flame dies , serve immediately
- if you are serving berries over ice cream or c