# Food Recommendation Systems 
Almost all of us have heard about the phrase "prevention is better than cure" and there is no better prevention than the right type of foods. The ingredients in food have medicinal properties and a recommendation system based on those ingredients would be useful for many like me


In [1]:
%env PYDEVD_DISABLE_FILE_VALIDATION=1


env: PYDEVD_DISABLE_FILE_VALIDATION=1


In [2]:
import pandas as pd
import re

## The Interactions between Users and the recipes

The raw_interaction.csv has the user_id and recipe_id and the ratings given by the users.

In [3]:
df=pd.read_csv("RAW_interactions.csv")
df.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,38094,40893,2003-02-17,4,Great with a salad. Cooked on top of stove for...
1,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
2,8937,44394,2002-12-01,4,This worked very well and is EASY. I used not...
3,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
4,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."


## Recipes dataset
<p>The recipes dataset has attributes such as ingredients, steps which can be used for developing a content-based collaborative filtering recommendation system.</p>

In [4]:
recipes_df=pd.read_csv("RAW_recipes.csv")
recipes_df.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
0,arriba baked winter squash mexican style,137739,55,47892,2005-09-16,"['60-minutes-or-less', 'time-to-make', 'course...","[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",11,"['make a choice and proceed with recipe', 'dep...",autumn is my favorite time of year to cook! th...,"['winter squash', 'mexican seasoning', 'mixed ...",7
1,a bit different breakfast pizza,31490,30,26278,2002-06-17,"['30-minutes-or-less', 'time-to-make', 'course...","[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]",9,"['preheat oven to 425 degrees f', 'press dough...",this recipe calls for the crust to be prebaked...,"['prepared pizza crust', 'sausage patty', 'egg...",6
2,all in the kitchen chili,112140,130,196586,2005-02-25,"['time-to-make', 'course', 'preparation', 'mai...","[269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]",6,"['brown ground beef in large pot', 'add choppe...",this modified version of 'mom's' chili was a h...,"['ground beef', 'yellow onions', 'diced tomato...",13
3,alouette potatoes,59389,45,68585,2003-04-14,"['60-minutes-or-less', 'time-to-make', 'course...","[368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]",11,['place potatoes in a large pot of lightly sal...,"this is a super easy, great tasting, make ahea...","['spreadable cheese with garlic and herbs', 'n...",11
4,amish tomato ketchup for canning,44061,190,41706,2002-10-25,"['weeknight', 'time-to-make', 'course', 'main-...","[352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]",5,['mix all ingredients& boil for 2 1 / 2 hours ...,my dh's amish mother raised him on this recipe...,"['tomato juice', 'apple cider vinegar', 'sugar...",8


In [5]:
recipes_df.describe()

Unnamed: 0,id,minutes,contributor_id,n_steps,n_ingredients
count,231637.0,231637.0,231637.0,231637.0,231637.0
mean,222014.708984,9398.546,5534885.0,9.765499,9.051153
std,141206.635626,4461963.0,99791410.0,5.995128,3.734796
min,38.0,0.0,27.0,0.0,1.0
25%,99944.0,20.0,56905.0,6.0,6.0
50%,207249.0,40.0,173614.0,9.0,9.0
75%,333816.0,65.0,398275.0,12.0,11.0
max,537716.0,2147484000.0,2002290000.0,145.0,43.0


## Medicines dataset
The medicine dataset has the medicinal benefits for common ingredients like Garlic, ginger etc

In [6]:
medicinal_df=pd.read_csv("medicines.csv").dropna()
medicinal_df.head(10)

Unnamed: 0,Ingredients,benefits
0,Sandy everlasting,Gastrointestinal disorders
1,Diuretic herbal tea combinations,Urinary tract and genital disorders
2,Milkthistle Fruit,Gastrointestinal disorders
3,Garlic,Cough and cold
4,Garlic,Circulatory disorders
5,"Mastic (Mastix, Pistaciae lentisci resina)",Skin disorders and minor wounds
6,"Mastic (Mastix, Pistaciae lentisci resina)",Gastrointestinal disorders
7,Wild Strawberry Leaf,Urinary tract and genital disorders
8,Wild Strawberry Leaf,Gastrointestinal disorders
9,Mallow leaf,Cough and cold


In [7]:
medicinal_df.describe()  # There are 280 ingredients in the dataset

Unnamed: 0,Ingredients,benefits
count,280,280
unique,174,14
top,Yarrow,Gastrointestinal disorders
freq,4,55


## Matching the medicinal properties
Using the medicinal benefits of ingredients in the medicinal dataset, we match it to the recipes dataset by tokenizing and matching the words and saving it to a separate csv

In [8]:
"""
from fuzzywuzzy import fuzz

# Create a new column for medicinal properties
recipes_df['Medicinal_Properties'] = ''

# Create an empty list to store the medicinal properties
medicinal_properties = []

# Convert the 'ingredients' column from string to list
recipes_df['ingredients'] = recipes_df['ingredients'].str.split(',')
print("starting")
# Iterate through the recipe ingredients
for ingredients in recipes_df['ingredients']:
    found_ingredients = []  # List to store found ingredients in the current recipe

    # Check if ingredients is not NaN or None
    if isinstance(ingredients, list):
        # Iterate through each ingredient in the recipe
        for ingredient in ingredients:
            best_match_score = 0
            best_match_ingredient = None

            # Find the best matching ingredient in the medicinal DataFrame
            for medicinal_ingredient in medicinal_df['Ingredients']:
                match_score = fuzz.token_set_ratio(ingredient.strip().lower(), medicinal_ingredient.strip().lower())
                if match_score > best_match_score:
                    best_match_score = match_score
                    best_match_ingredient = medicinal_ingredient

            # Check if a matching ingredient with sufficient similarity score was found
            if best_match_ingredient and best_match_score >= 80:
           
                found_ingredients.append(best_match_ingredient)

    # Get the medicinal properties for the found ingredients
    properties = []
    for found_ingredient in found_ingredients:
    
        properties.extend(medicinal_df.loc[medicinal_df['Ingredients'] == found_ingredient, 'benefits'].values)
    medicinal_properties.append(properties)

# Assign the medicinal properties to the new column
recipes_df['Medicinal_Properties'] = medicinal_properties
print("done ")

recipes_df.to_csv('recipes_with_medicinal_properties.csv', index=False)
"""

'\nfrom fuzzywuzzy import fuzz\n\n# Create a new column for medicinal properties\nrecipes_df[\'Medicinal_Properties\'] = \'\'\n\n# Create an empty list to store the medicinal properties\nmedicinal_properties = []\n\n# Convert the \'ingredients\' column from string to list\nrecipes_df[\'ingredients\'] = recipes_df[\'ingredients\'].str.split(\',\')\nprint("starting")\n# Iterate through the recipe ingredients\nfor ingredients in recipes_df[\'ingredients\']:\n    found_ingredients = []  # List to store found ingredients in the current recipe\n\n    # Check if ingredients is not NaN or None\n    if isinstance(ingredients, list):\n        # Iterate through each ingredient in the recipe\n        for ingredient in ingredients:\n            best_match_score = 0\n            best_match_ingredient = None\n\n            # Find the best matching ingredient in the medicinal DataFrame\n            for medicinal_ingredient in medicinal_df[\'Ingredients\']:\n                match_score = fuzz.token_set

In [9]:
recipes_df=pd.read_csv("recipes_with_medicinal_properties.csv")
recipes_df.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,Medicinal_Properties
0,arriba baked winter squash mexican style,137739,55,47892,2005-09-16,"['60-minutes-or-less', 'time-to-make', 'course...","[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",11,"['make a choice and proceed with recipe', 'dep...",autumn is my favorite time of year to cook! th...,"[""['winter squash'"", "" 'mexican seasoning'"", ""...",7,[]
1,a bit different breakfast pizza,31490,30,26278,2002-06-17,"['30-minutes-or-less', 'time-to-make', 'course...","[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]",9,"['preheat oven to 425 degrees f', 'press dough...",this recipe calls for the crust to be prebaked...,"[""['prepared pizza crust'"", "" 'sausage patty'""...",6,[]
2,all in the kitchen chili,112140,130,196586,2005-02-25,"['time-to-make', 'course', 'preparation', 'mai...","[269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]",6,"['brown ground beef in large pot', 'add choppe...",this modified version of 'mom's' chili was a h...,"[""['ground beef'"", "" 'yellow onions'"", "" 'dice...",13,[]
3,alouette potatoes,59389,45,68585,2003-04-14,"['60-minutes-or-less', 'time-to-make', 'course...","[368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]",11,['place potatoes in a large pot of lightly sal...,"this is a super easy, great tasting, make ahea...","[""['spreadable cheese with garlic and herbs'"",...",11,"['Cough and cold', 'Circulatory disorders']"
4,amish tomato ketchup for canning,44061,190,41706,2002-10-25,"['weeknight', 'time-to-make', 'course', 'main-...","[352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]",5,['mix all ingredients& boil for 2 1 / 2 hours ...,my dh's amish mother raised him on this recipe...,"[""['tomato juice'"", "" 'apple cider vinegar'"", ...",8,"['Mouth and throat disorders', 'Gastrointestin..."


In [10]:
recipes_df=recipes_df[recipes_df.astype(str)['Medicinal_Properties'] != '[]'] #drop empty lists row in medicinal_properties
recipes_df=recipes_df.head(10000)
recipes_df["name"]

3                                       alouette  potatoes
4                       amish  tomato ketchup  for canning
6                                    aww  marinated olives
7                           backyard style  barbecued ribs
10                             berry  good sandwich spread
                               ...                        
19914                         beef  california roll  salad
19916                                      beef  a la mode
19917    beef  n bean burrito stack  crock pot  slow co...
19918                                         beef  n beer
19919                             beef  n noodle casserole
Name: name, Length: 10000, dtype: object

# Content Based Recommendation System
We will use TF-IDF (Term Frequency - Inverse Document Frequency) to see how important the words are in a recipe and then we will use cosine-similarity to compare the similarity between teh recipes

In [11]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import scipy.sparse

# Combine ingredients, steps, and medicinal properties into a single text column
recipes_df['text'] = recipes_df['ingredients'] + ' ' + recipes_df['steps'] + ' ' + recipes_df['Medicinal_Properties'].apply(lambda x: ' '.join(x))
# Check for missing values in the 'text' column
missing_values = recipes_df['text'].isnull().sum()
print("Missing values:", missing_values)

# Drop rows with missing values in the 'text' column
recipes_df = recipes_df.dropna(subset=['text'])

# Check for empty strings in the 'text' column
empty_strings = (recipes_df['text'] == '').sum()
print("Empty strings:", empty_strings)
# Initialize the TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# Compute the TF-IDF matrix
tfidf_matrix = vectorizer.fit_transform(recipes_df['text'])

# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix)

# Print the cosine similarity matrix
print(cosine_sim)
print(len(cosine_sim))

Missing values: 0
Empty strings: 0
[[1.         0.05970882 0.08043023 ... 0.15645319 0.21571308 0.11820604]
 [0.05970882 1.         0.03994626 ... 0.06687041 0.05860836 0.05838822]
 [0.08043023 0.03994626 1.         ... 0.1058544  0.06561466 0.0764586 ]
 ...
 [0.15645319 0.06687041 0.1058544  ... 1.         0.20212531 0.21898725]
 [0.21571308 0.05860836 0.06561466 ... 0.20212531 1.         0.25331932]
 [0.11820604 0.05838822 0.0764586  ... 0.21898725 0.25331932 1.        ]]
10000


In [12]:
recipes_df.reset_index(drop=True, inplace=True)


## Getting recommendations
Now we are recommending the top 5 recipes which are similar when we pass in a recipe title

In [13]:

# Function to get recipe recommendations based on cosine similarity
def get_recipe_recommendations(recipe_title, cosine_sim_matrix, recipes_df, top_n=5):
    # Get the index of the recipe title
    recipe_index = recipes_df[recipes_df['name'] == recipe_title].index[0]

    # Get the cosine similarity scores of the recipe index
    similarity_scores = cosine_sim_matrix[recipe_index]


    # Sort the recipes based on similarity scores
    top_indices = np.argsort(similarity_scores)[::-1][1:top_n+1]

    # Get the recipe IDs of top similar recipes
    top_recipe_ids = recipes_df['id'].iloc[top_indices]

    # Return the top recommended recipe IDs
    #return top_recipe_ids
    return [(recipes_df.loc[index, 'id'], recipes_df.loc[index, 'name']) for index in top_indices]

# Example usage
recipe_title = 'amish  tomato ketchup  for canning'
recommendations = get_recipe_recommendations(recipe_title, cosine_sim, recipes_df)
print(f"Recommended recipes for '{recipe_title}':")
_ = [print(f"{i}: {recipe_name}") for i, (_, recipe_name) in enumerate(recommendations, 1)]


Recommended recipes for 'amish  tomato ketchup  for canning':
1: apple barbecue sauce
2: baking spice   copycat pampered chef cinnamon plus mix
3: banana peppers stuffed with vienna sausages  or hot dogs
4: apple honey vinaigrette
5: banana bread in a jar


In [14]:
row_1 = recipes_df.loc[recipes_df['id'] == 273460]
medicinal_properties = row_1['Medicinal_Properties'].values[0]
print(medicinal_properties)


['Cough and cold', 'Circulatory disorders', 'Gastrointestinal disorders']


## Recommending based on medicinal properties

In [15]:
def search_recipes_by_medicinal_properties(medicinal_property, recipes_df, top_n=5):
    # Define the regular expression pattern for matching
    pattern = r'\b{}\b'.format(medicinal_property)

    # Filter recipes by the given medicinal property using regular expression matching
    filtered_recipes = recipes_df[recipes_df['Medicinal_Properties'].str.contains(pattern, case=False, regex=True)]

    # If there are no recipes with the given medicinal property, return an empty list
    if len(filtered_recipes) == 0:
        return []

    # Get a random recipe from the filtered recipes
    random_recipe = filtered_recipes.sample(n=1)

    # Get recommendations based on the random recipe
    recommendations = get_recipe_recommendations(random_recipe['name'].values[0], cosine_sim, recipes_df, top_n)

    # Return the recommended recipe IDs and titles
    return [(recipe_id, recipe_title) for recipe_id, recipe_title in recommendations]

# Example usage
medicinal_property = 'Circulatory disorders'
recommendations = search_recipes_by_medicinal_properties(medicinal_property, recipes_df)
print(f"Recommended recipes for '{medicinal_property}':")
_ = [print(f"{i}: {recipe_title}") for i, (_, recipe_title) in enumerate(recommendations, 1)]



Recommended recipes for 'Circulatory disorders':
1: baked lemon chicken breast
2: baked chicken in herbed gravy
3: 15 minute garlic lemon chicken
4: authentic chicken marsala
5: baked garlic chicken pieces


# User-based Colloborative System

In [16]:
filtered_user_interactions = df[df['recipe_id'].isin(recipes_df['id'])]
filtered_user_interactions['user_id'].nunique()
#filtered_user_interactions=filtered_user_interactions.head(10000)


21832

In [17]:
filtered_user_interactions.head(5)

Unnamed: 0,user_id,recipe_id,date,rating,review
26,135017,254596,2007-09-29,5,I am extremely picky about my crockpot recipes...
27,224088,254596,2007-10-21,4,Pork + fruit + crockpot = awesome.\n\nA very g...
28,582223,254596,2007-12-09,5,really good my 18 month old loved it and she's...
29,1413963,254596,2009-10-15,5,One word says it all . Delicious!!!
35,6258,20930,2002-07-09,5,"Jan, what an interesting combination of flavor..."


## Filtering recipes based on ratings
We only keep the recipes that has atleast more than 5 ratings

In [18]:
agg_ratings = filtered_user_interactions.groupby('recipe_id').agg(mean_rating = ('rating', 'mean'),
                                                number_of_ratings = ('rating', 'count')).reset_index()
# Keep the recipes with over 5 ratings
agg_ratings_GT100 = agg_ratings[agg_ratings['number_of_ratings']>5]
agg_ratings_GT100.head()

Unnamed: 0,recipe_id,mean_rating,number_of_ratings
3,150,4.583333,12
4,153,3.791667,48
10,356,4.444444,18
13,436,5.0,6
16,520,3.666667,12


In [19]:
# Check popular movies
agg_ratings_GT100.sort_values(by='number_of_ratings', ascending=False).head()

Unnamed: 0,recipe_id,mean_rating,number_of_ratings
784,32204,4.52541,1220
666,28768,4.316071,560
2362,92095,4.308793,489
2777,107997,4.623053,321
691,29598,4.621622,296


In [20]:
df_GT100 = pd.merge(filtered_user_interactions, agg_ratings_GT100[['recipe_id']], on='recipe_id', how='inner')
df_GT100.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,522099,424415,2010-05-21,5,I really didn't expect to like this rice as mu...
1,171790,424415,2010-05-22,4,What a wonderful aroma while cooking. Dinner g...
2,491979,424415,2010-05-24,5,Perfect rice. We love Basmati RiceMade for ZWT...
3,724516,424415,2010-06-22,5,"Well I got perfect rice also, I served this wi..."
4,480195,424415,2010-06-25,5,The rice turned out perfectly and had good fla...


In [21]:
# Number of users
print('The ratings dataset has', df_GT100['user_id'].nunique(), 'unique users')
# Number of movies
print('The ratings dataset has', df_GT100['recipe_id'].nunique(), 'unique recipes')
# Number of ratings
print('The ratings dataset has', df_GT100['rating'].nunique(), 'unique ratings')
# List of unique ratings
print('The unique ratings are', sorted(df_GT100['rating'].unique()))


The ratings dataset has 17403 unique users
The ratings dataset has 1964 unique recipes
The ratings dataset has 6 unique ratings
The unique ratings are [0, 1, 2, 3, 4, 5]


In [22]:
# Create the user-item rating matrix
# Set 'user_id' as the index

user_ratings = df_GT100.pivot(index='user_id', columns='recipe_id', values='rating')
user_ratings.fillna(0, inplace=True)

user_ratings.head(5)



recipe_id,150,153,356,436,520,638,647,825,931,1042,...,498564,499804,500602,506787,510905,515833,517511,518145,518151,523538
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1533,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1535,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1581,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1634,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1676,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [23]:
user_ratings.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17403 entries, 1533 to 2002360398
Columns: 1964 entries, 150 to 523538
dtypes: float64(1964)
memory usage: 260.9 MB


## Recommendations based on cosine similarity
We are creating a user matrix using cosine similarity to find out similar users and get the most similar users based on the weighted ratings 

In [30]:
user_sim = cosine_similarity(user_ratings)
user_sim

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [31]:
def get_top_similar_users(user_id, user_sim_matrix, user_ratings, top_n=5):
    # Get the index of the user_id in the user_ratings DataFrame
    user_index = user_ratings.index.get_loc(user_id)

    # Get the cosine similarity scores of the user index
    similarity_scores = user_sim_matrix[user_index]

    # Sort the users based on similarity scores
    user_scores = sorted(enumerate(similarity_scores), key=lambda x: x[1], reverse=True)

    # Get the indices of top similar users
    top_user_indices = [i[0] for i in user_scores[1:top_n+1]]

    # Get the user ids corresponding to the top similar user indices
    top_user_ids = [user_ratings.index[i] for i in top_user_indices]

    return top_user_ids
users=get_top_similar_users(3288, user_sim, user_ratings, top_n=5)
users

[12301, 68613, 107144, 117396, 125019]

In [32]:
def get_user_recipe_recommendations(user_id, user_ratings, user_sim_matrix, recipes_df, top_n=5):
    # Create a dictionary to map user IDs to indices
    user_id_to_index = {user_id: index for index, user_id in enumerate(user_ratings.index)}
    # Get the index of the user_id in the user_ratings DataFrame
    user_index = user_id_to_index[user_id]
    # Get the top similar users for the given user
    top_similar_users = get_top_similar_users(user_id, user_sim_matrix, user_ratings, top_n)
    # Get the recipes rated by the top similar users
    similar_user_ratings = user_ratings.loc[top_similar_users]
    top_similar_user_indices = [user_id_to_index[user_id] for user_id in users]
    # Compute the weighted average ratings for the recipes
    weighted_ratings = np.dot(similar_user_ratings.values.T, user_sim_matrix[user_index, top_similar_user_indices]) / np.sum(user_sim_matrix[user_index, top_similar_user_indices])
    # Get the top 5 recipe IDs
    top_recipe_indices = np.argsort(weighted_ratings)[::-1][:top_n]  
    recommendations = recipes_df.iloc[top_recipe_indices]
    return recommendations

recommendations=get_user_recipe_recommendations(11882, user_ratings, user_sim, recipes_df, top_n=5)
#recommendations=get_user_recipe_recommendations(24386, user_ratings, user_sim_matrix, recipes_df, top_n=5)

# Print the top 5 recipe names with numbering
print("The top suggestions for you are:")
for i, recipe_name in enumerate(recommendations['name'], 1):
    print(f"{i}: {recipe_name}")


The top suggestions for you are:
1: amazin raisin cake
2: 30 minute pasta sauce
3: 30 minute turkey chili
4: 31st year meatloaf
5: 35 calorie pumpkin cookies


  weighted_ratings = np.dot(similar_user_ratings.values.T, user_sim_matrix[user_index, top_similar_user_indices]) / np.sum(user_sim_matrix[user_index, top_similar_user_indices])


The problem with many user based recommendation system is that they often suggest the most rated items on top but a user normally wants to find new suggestions that he doesn't know about so we need to randomly suggest some recipes

In [34]:
def get_user_recipe_recommendations(user_id, user_ratings, user_sim_matrix, recipes_df, top_n=2, random_n=3):
    # Create a dictionary to map user IDs to indices
    user_id_to_index = {user_id: index for index, user_id in enumerate(user_ratings.index)}

    # Get the index of the user_id in the user_ratings DataFrame
    user_index = user_id_to_index[user_id]

    # Get the top similar users for the given user
    top_similar_users = get_top_similar_users(user_id, user_sim_matrix, user_ratings,top_n)

    # Get the recipes rated by the top similar users
    similar_user_ratings = user_ratings.loc[user_ratings.index.isin(top_similar_users)]

    # Get the indices of the top similar users
    top_similar_user_indices = [user_id_to_index[user_id] for user_id in top_similar_users]

    if len(top_similar_users) >= top_n:
        # Compute the weighted average ratings for the recipes
        weighted_ratings = np.dot(similar_user_ratings.values.T, user_sim_matrix[user_index, top_similar_user_indices]) / np.sum(user_sim_matrix[user_index, top_similar_user_indices])
        # Sort the recipes based on the weighted ratings
        sorted_recipe_indices = np.argsort(weighted_ratings)[::-1][:top_n]
    else:
        # If there are fewer top similar users than top_n, return all available recommendations
        sorted_recipe_indices = np.arange(len(recipes_df))

    # Get the top rated recommendations
    top_recommendations = recipes_df.iloc[sorted_recipe_indices]

    # Get some random recommendations
    random_recommendations = recipes_df.sample(n=random_n)

    # Concatenate the top rated and random recommendations
    recommendations = pd.concat([top_recommendations, random_recommendations])

    return recommendations


recommendations=get_user_recipe_recommendations(11882, user_ratings, user_sim, recipes_df, top_n=2, random_n=3)
print("The top suggestions for you are:")
for i, recipe_name in enumerate(recommendations['name'], 1):
    print(f"{i}: {recipe_name}")

The top suggestions for you are:
1: some like it hot
2: amazin raisin cake
3: 2bleu s wet sauce for ribs and more
4: artichoke tomato bruschetta
5: baked eggplant with feta cheese


In [28]:
def get_combined_recommendations(user_id, recipe_title, user_ratings, user_sim_matrix, cosine_sim_matrix, recipes_df, top_n_collab=3, top_n_content=2):
    # Step 1: Get user-based collaborative filtering recommendations
    collab_recommendations = get_user_recipe_recommendations(user_id, user_ratings, user_sim_matrix, recipes_df, top_n=top_n_collab)

    # Step 2: Get content-based filtering recommendations
    content_recommendations = get_recipe_recommendations(recipe_title, cosine_sim_matrix, recipes_df, top_n=top_n_content)

    # Step 3: Combine the recommendations
    combined_recommendations = []

    for recommendation in collab_recommendations:
        recipe_id = recommendation[0]
        recipe_name = recipes_df.loc[recipes_df['id'] == recipe_id, 'name'].values
        if len(recipe_name) > 0:
            combined_recommendations.append((recipe_id, recipe_name[0]))

    for recommendation in content_recommendations:
        recipe_id = recommendation[0]
        recipe_name = recommendation[1]
        combined_recommendations.append((recipe_id, recipe_name))

    return combined_recommendations


# Example usage
user_id = 3288
recipe_title = 'amish  tomato ketchup  for canning'
recommendations = get_combined_recommendations(user_id, recipe_title, user_ratings, user_sim, cosine_sim, recipes_df, top_n_collab=4, top_n_content=1)
print(f"Combined recommendations for user {user_id} and recipe '{recipe_title}':")
print(recommendations)


Combined recommendations for user 3288 and recipe 'amish  tomato ketchup  for canning':
[(236350, 'apple barbecue sauce')]


#Hybrid Colloborative System
It is often better to go with the best of both worlds and a hybrid collobarativse system with both user based and item to item collobarative system would be ideal for a food recommendation system

In [36]:
def get_combined_recommendations(user_id, recipe_title, user_ratings, user_sim_matrix, cosine_sim_matrix, recipes_df, top_n_collab=1, top_n_content=1):
    # Step 1: Get user-based collaborative filtering recommendations
    collab_recommendations = get_user_recipe_recommendations(user_id, user_ratings, user_sim_matrix, recipes_df, top_n=top_n_collab,random_n=0)

    # Step 2: Get content-based filtering recommendations
    content_recommendations = get_recipe_recommendations(recipe_title, cosine_sim_matrix, recipes_df, top_n=top_n_content)

    # Step 3: Combine the recommendations
    combined_recommendations = []

    for recommendation in collab_recommendations.itertuples():
        recipe_id = recommendation.id
        recipe_name = recommendation.name
        combined_recommendations.append((recipe_id, recipe_name))

    for recommendation in content_recommendations:
        recipe_id = recommendation[0]
        recipe_name = recommendation[1]
        combined_recommendations.append((recipe_id, recipe_name))

    return combined_recommendations


# Example usage
user_id = 3288
recipe_title = 'amish  tomato ketchup  for canning'
recommendations = get_combined_recommendations(user_id, recipe_title, user_ratings, user_sim, cosine_sim, recipes_df, top_n_collab=2, top_n_content=3)
print(f"Combined recommendations for user {user_id} and recipe '{recipe_title}':")
for i, (recipe_id, recipe_name) in enumerate(recommendations, 1):
    print(f"{i}: {recipe_name}")

Combined recommendations for user 3288 and recipe 'amish  tomato ketchup  for canning':
1: kinda sorta  hungarian goulash
2: amazin raisin cake
3: apple barbecue sauce
4: baking spice   copycat pampered chef cinnamon plus mix
5: banana peppers stuffed with vienna sausages  or hot dogs
