## Recommender doc


Strategies considered:

+ Recommendation based on ingredients distance
+ Recommendation based on type clustering
+ Recommendation based on user - recipe rating (collaborative filtering)

For this test, only the strategy based on ingredients distance has been implemented.

### Recommendation based on ingredients distance

This recommender relies solely on how similar the ingredients of two recipes are.

For this strategy to work, we need a notion of distance that measures the similarity between two recipes.

The distance is defined as the normalized scalar product of the one-hot encoded ingredient vectors of the two recipes.
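
Written out, with $x_r \in \{0,1\}^K$ the one-hot ingredient vector of recipe $r$ over $K$ ingredients (the symbols $x_r$, $K$ and $s$ are notation introduced here for illustration, not taken from the original notebook), the score computed by `distance_recipes` in the demo is:

$$s(r_1, r_2) = \frac{x_{r_1} \cdot x_{r_2}}{\sum_{i=1}^{K} x_{r_1,i}}$$

This is the fraction of the ingredients of $r_1$ that also appear in $r_2$, so $s(r, r) = 1$. Because only the first recipe's ingredient count appears in the denominator, $s(r_1, r_2)$ and $s(r_2, r_1)$ can differ.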

Once the distance is defined, given a recipe, its similarity to every other recipe is computed and the N most similar recipes are returned, sorted by similarity.
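
The ranking step can be sketched on a toy table as follows. This is a minimal illustration under assumptions, not the notebook's implementation: the data, the function name `top_n_similar` and the parameter `n` are made up, and the scores are computed with a single matrix-vector product rather than a loop.

```python
import numpy as np
import pandas as pd

# Toy one-hot table (hypothetical data): rows are recipes, columns are ingredients.
toy = pd.DataFrame(
    [[1, 1, 1, 0],
     [1, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 1]],
    index=["noodles", "fried rice", "salad", "soup"],
    columns=["soy", "garlic", "prawn", "lime"],
)

def top_n_similar(recipe, table, n=2):
    # Shared-ingredient counts with every recipe, in one matrix-vector product.
    shared = table.values @ table.loc[recipe].values
    # Normalize by the number of ingredients in the query recipe.
    scores = pd.Series(shared / table.loc[recipe].sum(), index=table.index)
    # Drop the query itself and return the n highest scores, best first.
    return scores.drop(recipe).sort_values(ascending=False).head(n)

print(top_n_similar("noodles", toy))
```

Here "fried rice" contains every ingredient of "noodles" and therefore ranks first, followed by "soup", which shares two of the three.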

Other aspects that should be taken into account (allergies, whether the user has already rated the proposed recipes, ...) are not covered in this notebook.

### Recommendation based on type clustering

This strategy is based on the similarity between recipe types (for example, the similarity between Chinese and Indian food).

Again, for this strategy to work, we need a notion of distance that measures the similarity between two types.

To define the distance, the $M$ different types are grouped into $k$ clusters $(k<M)$.

Once the distance is defined, given a recipe, the similarity of its type to the other types is computed and the N most similar are returned, sorted by similarity.

Other aspects that should be taken into account (allergies, whether the user has already rated the proposed recipes, ...) are not covered in this notebook.

Work in progress.

### Recommendation based on user - rating (collaborative filtering)

The idea behind this strategy is to implement a recommender using collaborative filtering techniques.

These techniques use the ratings given by different users to measure how similar the users are, and then output recommendations such as: "Other similar users also liked this."
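
Since this part is not implemented yet, the following is only one possible sketch of user-based collaborative filtering, under stated assumptions: the ratings table is invented, `recommend_for` is a hypothetical helper, users are compared with cosine similarity, and unrated recipes are stored as 0 and simply count as 0 in the weighted average (a simplification).

```python
import numpy as np
import pandas as pd

# Hypothetical ratings (1-5); 0 means "not rated". Rows are users, columns are recipes.
ratings = pd.DataFrame(
    [[5, 4, 0, 0],
     [5, 5, 4, 0],
     [1, 0, 0, 5]],
    index=["ana", "ben", "cleo"],
    columns=["paella", "risotto", "gazpacho", "pad thai"],
)

def recommend_for(user, ratings, n=1):
    target = ratings.loc[user].values.astype(float)
    others = ratings.drop(user)
    # Cosine similarity between the target user and every other user.
    sims = others.values @ target / (
        np.linalg.norm(others.values, axis=1) * np.linalg.norm(target)
    )
    # Predicted score per recipe: similarity-weighted average of the others' ratings.
    predicted = pd.Series(sims @ others.values / sims.sum(), index=ratings.columns)
    # Only suggest recipes the target user has not rated yet.
    unrated = ratings.columns[target == 0]
    return predicted[unrated].sort_values(ascending=False).head(n)

print(recommend_for("ana", ratings))
```

In this toy example "ben" rates paella and risotto much like "ana" does, so the recipe "ben" liked and "ana" has not tried (gazpacho) comes out on top.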

Work in progress

## Recommendation based on ingredients demo

In [1]:
import pandas as pd
import numpy as np



In [2]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame(...) wrapper is needed
db_recipes_ingredients = pd.read_csv("one_hot_ingredients.csv")
db_recipes_ingredients = db_recipes_ingredients.set_index('Unnamed: 0')
db_recipes_ingredients.head()

| Unnamed: 0 | 69 | Cos | Doves | English | Farm | Greek | Jarlsberg | Kalamata | Kashmiri | Parmesan | ... | yogurt | yolk | yolks | zeera | zest | ¼ | ½ | ½red | ¾ | ⅓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 'Brandade' | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 'Dragon prawn' noodles | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 'Firecracker' prawns | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 'Guilt-free gourmet' sticky toffee pudding | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 'Lion head' meatballs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |


In [5]:
def distance_recipes(recipe1, recipe2, db_recipes_ingredients):
    # Share of recipe1's ingredients that also appear in recipe2 (in [0, 1] for 0/1 vectors).
    # Not symmetric: the score is normalized by recipe1's ingredient count only.
    v1 = db_recipes_ingredients.loc[recipe1].values
    v2 = db_recipes_ingredients.loc[recipe2].values
    return np.dot(v1, v2) / np.sum(v1)

def distance_recipes_norm(recipe1, recipe2, db_recipes_ingredients):
    # Same score, rescaled linearly from [0, 1] to [-1, 1].
    return 2 * (distance_recipes(recipe1, recipe2, db_recipes_ingredients) - 0.5)

In [5]:
distance_recipes("'Brandade'","'Brandade'",db_recipes_ingredients)

1.0

In [6]:
distance_recipes_norm("'Dragon prawn' noodles","'Dragon prawn' noodles",db_recipes_ingredients)

1.0

In [7]:
distance_recipes("'Brandade'","'Dragon prawn' noodles",db_recipes_ingredients)

0.22222222222222221

In [8]:
distance_recipes_norm("'Brandade'","'Dragon prawn' noodles",db_recipes_ingredients)

-0.55555555555555558

In [8]:
distance_recipes("'Guilt-free gourmet' sticky toffee pudding","'Dragon prawn' noodles",db_recipes_ingredients)

0.19047619047619047

In [9]:
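def recommend_me(recipe1):
    M = 7  # maximum number of recipes returned
    df_aux = pd.DataFrame(index=db_recipes_ingredients.index)
    df_aux["dist"] = 0.0
    for rec in db_recipes_ingredients.index:
        # .loc[row, col] writes in place; the chained form df.loc[rec]['dist'] may assign to a copy
        df_aux.loc[rec, "dist"] = distance_recipes(recipe1, rec, db_recipes_ingredients)
    # Keep scores above 0.6 and drop exact matches (score 1, e.g. the recipe itself).
    # Note: the ascending sort returns the M lowest scores above the threshold;
    # pass ascending=False to sort_values to get the most similar recipes first.
    df_return = df_aux[(df_aux["dist"] > 0.6) & (df_aux["dist"] != 1)].sort_values(by=["dist"])[0:M]
    return df_return.index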

In [10]:
test = recommend_me("'Dragon prawn' noodles")

In [11]:
test

Index(['Pork chow mein', 'Delicious fried rice',
       'Fish in a hot and sour sauce', 'Galinha à Portuguesa',
       'Garlic chicken with cucumber',
       'Stir-fried lobster with black bean sauce', 'Gong bao haddock goujons'],
      dtype='object', name='Unnamed: 0')

## Recommendation based on Types

Each value in the table below is the "probability" of an ingredient appearing in a given type of food. It is computed as the frequency of appearance of the ingredient for that type.
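
The notebook loads this table from a CSV file and does not show how it was built. One way such frequencies could be derived from a one-hot recipe table is a group-wise mean, sketched here on invented data (the `recipes` and `types` objects are hypothetical):

```python
import pandas as pd

# Hypothetical one-hot recipe table plus a cuisine type for each recipe.
recipes = pd.DataFrame(
    {"garlic": [1, 1, 0, 1], "lime": [0, 1, 1, 1], "basil": [1, 0, 0, 0]},
    index=["pasta", "pizza", "green curry", "pad thai"],
)
types = pd.Series(
    ["Italian", "Italian", "Thai", "Thai"], index=recipes.index, name="type"
)

# The mean of a 0/1 column within a group is the share of that type's
# recipes that use the ingredient.
type_frequencies = recipes.groupby(types).mean()
print(type_frequencies)
```

With a single recipe per type, every frequency is either 0 or 1, which would explain rows such as "Albanian" in the table.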

In [16]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame(...) wrapper is needed
db_types_ingredients = pd.read_csv("one_hot_types_ingredients.csv")
db_types_ingredients = db_types_ingredients.set_index('type')
db_types_ingredients.head()

| type | 69 | Cos | Doves | English | Farm | Greek | Jarlsberg | Kalamata | Kashmiri | Parmesan | ... | yogurt | yolk | yolks | zeera | zest | ¼ | ½ | ½red | ¾ | ⅓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Albanian | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| American | 0.0 | 0.005747 | 0.0 | 0.034483 | 0.0 | 0.045977 | 0.0 | 0.005747 | 0.0 | 0.011494 | ... | 0.0 | 0.074713 | 0.04023 | 0.0 | 0.103448 | 0.218391 | 0.793103 | 0.0 | 0.270115 | 0.0 |
| Arabian | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.333333 | 0.0 |
| Argentinian | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.333333 | 1.0 | 0.0 | 0.333333 | 0.0 |
| Armenian | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 |


In [57]:
def distance_types(type1, type2, db_types_ingredients):
    # Dot product of the two frequency vectors, divided by the larger squared norm.
    # Symmetric in its arguments, and equal to 1 when both types are the same.
    v1 = db_types_ingredients.loc[type1].values
    v2 = db_types_ingredients.loc[type2].values
    return np.dot(v1, v2) / max(np.sum(v1 * v1), np.sum(v2 * v2))

def distance_types_norm(type1, type2, db_types_ingredients):
    # Same score, rescaled linearly from [0, 1] to [-1, 1].
    return 2 * (distance_types(type1, type2, db_types_ingredients) - 0.5)

Some examples:

In [35]:
distance_types("Albanian","American",db_types_ingredients)

0.28297755883962783

In [36]:
distance_types_norm("Albanian","American",db_types_ingredients)

-0.43404488232074434

In [58]:
distance_types_norm("American","American",db_types_ingredients)

1.0

In [59]:
distance_types("American","American",db_types_ingredients)

1.0

Is the measure commutative? Swapping the two types gives the same value:

In [69]:
distance_types_norm("Thai","Chinese",db_types_ingredients)

0.70386480094369541

In [70]:
distance_types_norm("Chinese","Thai",db_types_ingredients)

0.70386480094369541

In [71]:
distance_types("Spanish","Thai",db_types_ingredients)

0.64818223703396927

In [72]:
distance_types("Thai","Spanish",db_types_ingredients)

0.64818223703396927

In [73]:
distance_types_norm("Thai","Spanish",db_types_ingredients)

0.29636447406793853

In [74]:
distance_types_norm("Spanish","Thai",db_types_ingredients)

0.29636447406793853
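
The symmetry follows from the formula: both the dot product and the `max` of the two squared norms are unchanged when the arguments are swapped. The recipe score from the ingredients demo, by contrast, divides by the first recipe's ingredient count only, so it is not symmetric in general. A small check on made-up vectors (`type_score` and `recipe_score` are hypothetical stand-ins that apply the two formulas directly to arrays):

```python
import numpy as np

def type_score(v1, v2):
    # Same formula as distance_types, applied directly to vectors.
    return np.dot(v1, v2) / max(np.sum(v1 * v1), np.sum(v2 * v2))

def recipe_score(v1, v2):
    # Same formula as distance_recipes, applied directly to vectors.
    return np.dot(v1, v2) / np.sum(v1)

a = np.array([1.0, 1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 0.0, 0.0])

print(type_score(a, b), type_score(b, a))      # equal in both orders
print(recipe_score(a, b), recipe_score(b, a))  # differ: 1/3 versus 1
```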

Are the results sensible? (Spanish and Italian are more similar than Italian and Thai.)

In [75]:
distance_types_norm("Spanish","Italian",db_types_ingredients)

0.75766186115969636

In [76]:
distance_types_norm("Italian","Thai",db_types_ingredients)

0.17957446372424779

In [77]:
distance_types_norm("Japanese","Thai",db_types_ingredients)

0.58509647778345419

In [78]:
distance_types_norm("Japanese","Spanish",db_types_ingredients)

0.39594140181898174