<a href="https://www.kaggle.com/code/scipioaemilianvs/food-recommendation?scriptVersionId=186206586" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

This notebook builds a food recommendation system that suggests recipes based on the user's preferences, available ingredients, and time constraints, using content-based filtering and TF-IDF vectorization.
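To make the idea concrete, here is a minimal sketch of content-based matching with TF-IDF on three made-up recipe strings. The toy texts and the variable names (`recipes`, `recipe_matrix`, `query_vec`) are assumptions for illustration and are not taken from the dataset. It relies on `TfidfVectorizer` L2-normalising each row by default, so a plain dot product gives the cosine similarity.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Toy recipe descriptions (hypothetical, not taken from the dataset)
recipes = [
    "spaghetti carbonara spaghetti eggs bacon parmesan cheese",
    "black bean salad black beans corn tomato olive oil",
    "banana cake banana sugar eggs flour",
]

# Learn the vocabulary and IDF weights from the recipes
vectorizer = TfidfVectorizer(stop_words="english")
recipe_matrix = vectorizer.fit_transform(recipes)

# The query is projected into the same TF-IDF space as the recipes
query_vec = vectorizer.transform(["spaghetti eggs cheese bacon"])

# Rows are L2-normalised by default, so the dot product (linear_kernel)
# matches the cosine similarity
scores = linear_kernel(query_vec, recipe_matrix)[0]
assert np.allclose(scores, cosine_similarity(query_vec, recipe_matrix)[0])

# Index of the most similar recipe
best = int(scores.argmax())
print(recipes[best])
```

The second recipe shares no terms with the query, so its score is exactly zero, while the third gets a small score from the shared word "eggs".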

Sources and reference material that helped in understanding the relevant concepts:

1. https://towardsdatascience.com/building-a-recipe-recommendation-system-297c229dda7b

2. https://medium.com/@honeyhulya16/recipe-recommendation-based-on-ingredients-ad43833cc5bd

3. https://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting


Research papers that show alternative approaches to the problem:
1. https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2021.778417/full

2. https://dl.acm.org/doi/10.1145/3552485.3554941

3. https://arxiv.org/abs/2205.14005

4. https://arxiv.org/abs/1111.3919


Dataset used for this model:

https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("/kaggle/input/final-data/final_data.csv")
df.head()


Unnamed: 0,id,name,minutes,tags,n_steps,steps,ingredients,n_ingredients,user_id,rating,review,rating_category,calories,total fat (PDV),sugar (PDV),sodium (PDV),protein (PDV),saturated fat (PDV),carbohydrates (PDV),calorie_status
0,49,chicken breasts lombardi,75,"['weeknight', 'time-to-make', 'course', 'main-...",22,['cook mushrooms in 2 tbsp butter in a large s...,"['fresh mushrooms', 'butter', 'boneless skinle...",10,19893,4,I give this recipe a good to high rating becau...,tasty,627.7,38.0,8.0,35.0,115.0,64.0,4.0,High Calory
1,62,black bean corn and tomato salad,25,"['30-minutes-or-less', 'time-to-make', 'course...",4,"['in a bowl whisk together lemon juice , oil ,...","['fresh lemon juice', 'olive oil', 'black bean...",9,16408,5,This recipe was awesome! I took it to a family...,tasty,407.8,23.0,17.0,0.0,34.0,11.0,18.0,Low Calory
2,66,black coffee barbecue sauce,30,"['lactose', '30-minutes-or-less', 'time-to-mak...",3,['combine all ingredients in a saucepan and si...,"['brewed coffee', 'ketchup', 'red wine vinegar...",11,42938,4,This was an excellent sauce! I did cut it in h...,tasty,772.0,6.0,657.0,93.0,13.0,2.0,63.0,High Calory
3,142,almond fudge banana cake,110,"['weeknight', 'time-to-make', 'course', 'prepa...",13,"['mash bananas and set aside', 'beat sugar and...","['dole banana', 'sugar', 'margarine', 'eggs', ...",11,914114,4,Came out great for a chocolate Easter cake. I ...,tasty,224.8,14.0,87.0,10.0,7.0,9.0,11.0,Low Calory
4,150,all purpose crock pot chicken,540,"['weeknight', 'time-to-make', 'course', 'main-...",21,"['if using thigh portions or a cut up fryer , ...","['chicken', 'cajun-louisiana seasoning blend',...",10,111342,5,This was great. I tweaked it a little and her...,tasty,247.2,26.0,0.0,4.0,42.0,24.0,0.0,Low Calory


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21272 entries, 0 to 21271
Data columns (total 20 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   21272 non-null  int64  
 1   name                 21272 non-null  object 
 2   minutes              21272 non-null  int64  
 3   tags                 21272 non-null  object 
 4   n_steps              21272 non-null  int64  
 5   steps                21272 non-null  object 
 6   ingredients          21272 non-null  object 
 7   n_ingredients        21272 non-null  int64  
 8   user_id              21272 non-null  int64  
 9   rating               21272 non-null  int64  
 10  review               21272 non-null  object 
 11  rating_category      21272 non-null  object 
 12  calories             21272 non-null  float64
 13  total fat (PDV)      21272 non-null  float64
 14  sugar (PDV)          21272 non-null  float64
 15  sodium (PDV)         21272 non-null 

In [4]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

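# Preprocessing: normalise case and strip stray wrapping quotes from the list-like string columns
df['ingredients'] = df['ingredients'].str.lower().str.strip('"').str.strip("'")
df['name'] = df['name'].str.lower()
df['steps'] = df['steps'].str.strip('"').str.strip("'")
# One text field per recipe: the name followed by its ingredient list
df['combined_features'] = df['name'] + ' ' + df['ingredients']

# Feature extraction: one TF-IDF row per recipe, ignoring English stop words
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined_features'])

# Compute the recipe-to-recipe similarity matrix. TfidfVectorizer L2-normalises rows by
# default, so linear_kernel (a plain dot product) equals cosine similarity here.
# Note: suggest_recipes below does not use this matrix, and for 21272 recipes it is a
# dense 21272 x 21272 float64 array (about 3.6 GB).
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)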

# Function to suggest recipes based on user input dish name, ingredients, and preferred cooking time
def suggest_recipes(dish_name, ingredients, max_cooking_time, top_n=5):
    """Return the top_n recipes most similar to the query that take at most max_cooking_time minutes."""
    # Build a single query string from the dish name and the ingredient list
    user_input = f"{dish_name.lower()} {' '.join(ingredients).lower()}"
    # Project the query into the TF-IDF space fitted on the recipes
    user_input_vec = tfidf.transform([user_input])
    # Similarity between the query and every recipe (shape: 1 x n_recipes)
    cosine_sim_user = linear_kernel(user_input_vec, tfidf_matrix)
    # Rank recipe positions from most to least similar
    sim_scores_user = sorted(enumerate(cosine_sim_user[0]), key=lambda x: x[1], reverse=True)

    # Keep only recipes within the time limit, then take the best top_n
    recipe_indices_user = [idx for idx, _ in sim_scores_user if df.iloc[idx]['minutes'] <= max_cooking_time][:top_n]
    return df.iloc[recipe_indices_user]



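# Function to clean and format the ingredients for display.
# The column stores a stringified list, e.g. "['spaghetti', 'eggs']".
def clean_ingredients(ingredients):
    # Drop the brackets and quotes, then split on the ", " between items
    cleaned_ingredients = ingredients.strip("[]").replace("'", "").replace('"', '').split(', ')
    cleaned_ingredients = [ingredient.capitalize() for ingredient in cleaned_ingredients]
    return ', '.join(cleaned_ingredients)

# Similar function for cleaning the preparation steps.
# Caveat: splitting on ", " also breaks a step at any comma inside it, and the pieces are
# rejoined with ". ", which is why fragments such as "boiling . Salted water" appear in the output.
def clean_steps(steps):
    cleaned_steps = steps.strip("[]").replace("'", "").replace('"', '').split(', ')
    cleaned_steps = [step.capitalize() for step in cleaned_steps]
    return '. '.join(cleaned_steps)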


# Function to format the output
def format_output(recipes):
    for index, row in recipes.iterrows():
        print(f"Recipe Name: {row['name'].title()}\n")
        print(f"Ingredients: {clean_ingredients(row['ingredients'])}\n")
        print(f"Steps: {clean_steps(row['steps'])}\n")
        print(f"Cooking Time: {row['minutes']} minutes\n")
        print("-" * 80 + "\n")

# Function to format the output into a table
def format_output_table(recipes):
    table_data = []
    for index, row in recipes.iterrows():
        table_data.append([
            row['name'].title(),
            clean_ingredients(row['ingredients']),
            clean_steps(row['steps']),
            f"{row['minutes']} minutes"
        ])
    
    # Create a DataFrame for the table
    table_df = pd.DataFrame(table_data, columns=['Recipe Name', 'Ingredients', 'Steps', 'Cooking Time'])
    return table_df

# Example usage
dish_name = "Spaghetti Carbonara"
ingredients = ["spaghetti", "eggs", "cheese", "bacon"]
max_cooking_time = 30  # User's preferred maximum cooking time in minutes
suggested_recipes = suggest_recipes(dish_name, ingredients, max_cooking_time)

# Output without tables:
format_output(suggested_recipes)

# Format and print the output table
output_table = format_output_table(suggested_recipes)
output_table.head()


Recipe Name: Spaghetti Carbonara For One

Ingredients: Spaghetti, Garlic clove, Bacon, Egg, Fresh parmesan cheese

Steps: Cook the bacon in a cast iron skillet. Remove and drain on paper towel before crumbling. Add the minced garlic to the bacon grease and cook until fragrant. Remove from heat. Cook the pasta in boiling . Salted water. Drain and immediately place back in the pot with the bacon . Bacon grease and garlic. Place pot back on the stove over low heat . Add the beaten egg and whisk continuously for three minutes . Or until the egg cooks and thickens . Creating a silky sauce over the noodles. Be careful not to stop whisking or the egg will scramble !. Season with salt and pepper and serve immediately

Cooking Time: 20 minutes

--------------------------------------------------------------------------------

Recipe Name: Spaghetti Alla Carbonara

Ingredients: Spaghetti, Olive oil, Onion, Bacon, Garlic, Eggs, Heavy cream, Parmesan cheese, Salt and pepper

Steps: Cook the pasta f

Unnamed: 0,Recipe Name,Ingredients,Steps,Cooking Time
0,Spaghetti Carbonara For One,"Spaghetti, Garlic clove, Bacon, Egg, Fresh par...",Cook the bacon in a cast iron skillet. Remove ...,20 minutes
1,Spaghetti Alla Carbonara,"Spaghetti, Olive oil, Onion, Bacon, Garlic, Eg...",Cook the pasta following the instructions on t...,15 minutes
2,White Spaghetti,"Spaghetti, Olive oil, Garlic powder, Black pep...",Boil the pasta al-dente . Drain and place in a...,15 minutes
3,One Pot Spaghetti With Meat Sauce,"Ground turkey, Garlic cloves, Green pepper, On...",Brown meat with garlic . Green pepper and onio...,30 minutes
4,Fried Spaghetti,"Butter, Cooked spaghetti, Eggs, Milk, Parmesan...",Melt the butter in a heavy nonstick skillet. H...,20 minutes
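
The stray " . " breaks in the steps above (for example "Boil the pasta al-dente . Drain") come from splitting the stringified list on ", ". One possible alternative, sketched here under the assumption that each field is a valid Python list literal, is to parse it with `ast.literal_eval` and tidy the punctuation spacing afterwards. The helper names `parse_list_field` and `tidy_step` and the sample string are hypothetical and not part of the notebook.

```python
import ast

def parse_list_field(raw):
    """Parse a stringified Python list such as "['a', 'b , c']" into a list of strings."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a valid literal: fall back to treating the field as one item
        return [raw]
    if not isinstance(items, (list, tuple)):
        return [str(items)]
    return [str(item) for item in items]

def tidy_step(step):
    # Collapse repeated whitespace, then remove the space before punctuation
    # ("boiling , salted water" -> "boiling, salted water")
    step = " ".join(step.split())
    for mark in (",", ".", "!", ";", ":"):
        step = step.replace(f" {mark}", mark)
    return step.capitalize()

# Hypothetical sample modelled on the format of the 'steps' column
raw_steps = "['cook the pasta in boiling , salted water', 'drain and serve immediately']"
steps = [tidy_step(s) for s in parse_list_field(raw_steps)]
print(". ".join(steps))
```

Because the list items are recovered as whole strings, a comma inside a step no longer starts a new sentence.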

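
The cooking-time filter can also be applied as a boolean mask before ranking, which avoids one `df.iloc` lookup per recipe. This is only a sketch on a toy frame: `top_within_time`, the toy data, and the scores are made up for illustration, and a stable sort is used so that ties keep their original order.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): four recipes with cooking times and similarity scores
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "minutes": [20, 45, 15, 30],
})
scores = np.array([0.9, 0.8, 0.1, 0.5])

def top_within_time(frame, scores, max_minutes, top_n=5):
    """Return the top_n rows by score among those within max_minutes."""
    # Positions of the recipes that satisfy the time constraint
    mask = (frame["minutes"] <= max_minutes).to_numpy()
    candidates = np.flatnonzero(mask)
    # Sort the remaining candidates by descending score (stable for ties)
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    return frame.iloc[order[:top_n]]

result = top_within_time(toy, scores, max_minutes=30, top_n=2)
print(result["name"].tolist())
```

Recipe "b" has the second-highest score but takes 45 minutes, so it is excluded before the ranking step.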