# Content-based recipe recommender system using the TF-IDF (Term Frequency-Inverse Document Frequency) technique
Building a Recipe Recommender System involves using techniques such as collaborative filtering, content-based filtering, or hybrid methods. Let's explore a simple content-based filtering approach where recommendations are made based on the content of recipes and user preferences.

Here's a basic example using TF-IDF (Term Frequency-Inverse Document Frequency) to represent the recipes and cosine similarity to find similar recipes. For demonstration purposes, let's consider the 'tags' and 'ingredients' columns for content-based recommendations:

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Load your dataset
recipes_df = pd.read_csv('recipes_w_search_terms.csv')
recipes_df_short = recipes_df.head(500)

# Combine relevant text features (tags, ingredients, cuisine, difficulty level, prep time)
recipes_df_short['content'] = (
    recipes_df_short['tags'] + ' ' +
    recipes_df_short['ingredients'].apply(', '.join) + ' ' +
    recipes_df_short['steps'].astype(str) + ' ' +
    recipes_df_short['description'].astype(str) + ' ' +
    recipes_df_short['search_terms'].astype(str)
)

# Initialize the TF-IDF vectorizer
print("Fitting TF-IDF vectorizer...")
tfidf_vectorizer = TfidfVectorizer(stop_words='english')

# Fit and transform the TF-IDF matrix
print("Transforming TF-IDF matrix...")
tfidf_matrix = tfidf_vectorizer.fit_transform(recipes_df_short['content'])

# Calculate cosine similarity between recipes
print("Calculating cosine similarity...")
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Function to get recipe recommendations
def get_recommendations(recipe_name, cosine_sim=cosine_sim):
    idx = recipes_df.index[recipes_df['name'] == recipe_name].tolist()[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]  # Get the top 11 similar recipes (excluding itself)
    recipe_indices = [score[0] for score in sim_scores]
    return recipes_df['name'].iloc[recipe_indices]

# Example: Get recommendations for a specific recipe
recipe_name = 'Grilled Garlic Cheese Grits'
print(f"Getting recommendations for '{recipe_name}'...")
recommendations = get_recommendations(recipe_name)

print(f"Recommendations for '{recipe_name}':")
print(recommendations)


Fitting TF-IDF vectorizer...
Transforming TF-IDF matrix...
Calculating cosine similarity...
Getting recommendations for 'Grilled Garlic Cheese Grits'...
Recommendations for 'Grilled Garlic Cheese Grits':
348                Tomato and Mushroom Omelette
233                     Sweet and Tangy Chicken
172                  Buttery Apples and Cabbage
418                       Butternut Squash Soup
8                           Potato-Crab Chowder
130             Quick &quot;beef&quot; Stir Fry
105                   Italian Chicken and Pasta
126    Hamburger-Vegetable Soup With Tortellini
306                             Orange Lemonade
168                  Chicken Breasts With Herbs
Name: name, dtype: object


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = (


In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Load your dataset
recipes_df = pd.read_csv('recipes_w_search_terms.csv')
recipes_df_short = recipes_df.head(500)

# Combine relevant text features
recipes_df_short['content'] = (
    recipes_df_short['tags'] + ' ' +
    recipes_df_short['ingredients'].apply(', '.join) + ' ' +
    recipes_df_short['steps'].astype(str) + ' ' +
    recipes_df_short['description'].astype(str) + ' ' +
    recipes_df_short['search_terms'].astype(str)
)

# Preprocess text data
recipes_df_short['content'] = recipes_df_short['content'].apply(lambda x: x.lower())  # Convert to lowercase, for example

# Initialize the TF-IDF vectorizer with adjusted parameters
tfidf_vectorizer = TfidfVectorizer(stop_words='english', min_df=0.1, max_df=0.9)

# Fit and transform the TF-IDF matrix
tfidf_matrix = tfidf_vectorizer.fit_transform(recipes_df_short['content'])

# Calculate cosine similarity between recipes
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Function to get recipe recommendations
def get_recommendations(recipe_name, cosine_sim=cosine_sim):
    idx = recipes_df.index[recipes_df['name'] == recipe_name].tolist()[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]  # Get the top 11 similar recipes (excluding itself)
    recipe_indices = [score[0] for score in sim_scores]
    return recipes_df.iloc[recipe_indices][['name', 'description', 'ingredients', 'steps', 'tags', 'search_terms']]

# Example: Get recommendations for a specific recipe
recipe_name = 'Grilled Garlic Cheese Grits'
print(f"Getting recommendations for '{recipe_name}'...")

# Get the recommendations DataFrame
recommendations_df = get_recommendations(recipe_name)

# Print the result in the specified format
print(f"\nRecipe: {recipe_name}")
print("Recommended Recipes list:")
recommendations_df




Getting recommendations for 'Grilled Garlic Cheese Grits'...

Recipe: Grilled Garlic Cheese Grits
Recommended Recipes list:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = (
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = recipes_df_short['content'].apply(lambda x: x.lower())  # Convert to lowercase, for example


Unnamed: 0,name,description,ingredients,steps,tags,search_terms
33,Pinto Bean Soup With Fresh Salsa,"Despite its creamy taste, this simply prepared...","['dried pinto beans', 'water', 'vegetable oil'...",['Sort through the beans and discard any missh...,"['time-to-make', 'course', 'main-ingredient', ...","{'low-calorie', 'low-sodium', 'low-carb', 'mex..."
279,Pumpkin Soup (Served in the Shell),"Great autumm recipe, served in the pumpkin she...","['pumpkin', 'unsalted butter', 'onion', 'carro...","['Slice off stem of pumpkin, 2-1/2 inches from...","['time-to-make', 'course', 'main-ingredient', ...","{'low-calorie', 'soup', 'low-sodium', 'low-carb'}"
44,Kickin' Shrimp Dip,"I've tried a lot of shrimp dip recipes, and I ...","['cream cheese', 'sour cream', 'old bay season...","['Mix together, adjusting seasonings to taste....","['time-to-make', 'course', 'main-ingredient', ...","{'low-calorie', 'low-sodium', 'low-carb', 'app..."
348,Tomato and Mushroom Omelette,"Makes a hearty breakfast or brunch for one, or...","['eggs', 'potato', 'cheddar cheese', 'dried ch...","['Whisk eggs in a bowl and stir in potato, che...","['15-minutes-or-less', 'time-to-make', 'course...","{'low-sodium', 'low-calorie', 'vegetarian', 'l..."
304,Aunt Paulette's Chili,This is a recipe from my husband's aunt and is...,"['ground beef', 'onion', 'garlic clove', 'toma...",['Brown burger with onion and garlic. Drain fa...,"['60-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'dinner', 'low-sodium', 'low-c..."
154,Cucumber Yogurt Dip,I can not get enough of this recipe for this c...,"['plain yogurt', 'lemon%2c juice of', 'garlic ...","['to drain the yogurt, line a mesh colandar wi...","['time-to-make', 'course', 'cuisine', 'prepara...","{'low-sodium', 'low-calorie', 'vegetarian', 'l..."
119,Garden Fresh Eggplant Parmesan,This year I finally mastered the art of growin...,"['eggplants', 'flour', 'egg', 'water', 'parmes...",['I slice my eggplants the night before (into ...,"['weeknight', '60-minutes-or-less', 'time-to-m...","{'low-calorie', 'vegetarian', 'italian', 'low-..."
126,Hamburger-Vegetable Soup With Tortellini,"Easy, tasty, and left-overs freeze well.","['lean ground beef', 'yellow onion', 'vegetabl...","['In a large soup pot over medium heat, brown ...","['time-to-make', 'course', 'main-ingredient', ...","{'dinner', 'low-calorie', 'low-carb', 'soup'}"
418,Butternut Squash Soup,This basic soup recipe came from Good Things U...,"['water', 'chicken bouillon cubes', 'butternut...","['Combine water, chicken boullion cubes and sq...","['60-minutes-or-less', 'time-to-make', 'course...","{'low-fat', 'low-calorie', 'low-carb', 'soup'}"
135,Broccoli Salad,This is s treat! Great salad that's pretty hea...,"['broccoli', 'avocado', 'olive oil', 'lemon ju...","['Cut broccoli into bite size pieces.', 'Steam...","['30-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'vegetarian', 's..."


## Hybrid recommendation model: TF-IDF and cosine similarity for content-based filtering and item-item collaborative filtering

In [3]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Load your dataset
recipes_df = pd.read_csv('recipes_w_search_terms.csv')
recipes_df_short = recipes_df.head(500)

# Combine relevant text features
recipes_df_short['content'] = (
    recipes_df_short['tags'] + ' ' +
    recipes_df_short['ingredients'].apply(', '.join) + ' ' +
    recipes_df_short['steps'].astype(str) + ' ' +
    recipes_df_short['description'].astype(str) + ' ' +
    recipes_df_short['search_terms'].astype(str)
)

# Preprocess text data
recipes_df_short['content'] = recipes_df_short['content'].apply(lambda x: x.lower())  # Convert to lowercase, for example

# Initialize the TF-IDF vectorizer with adjusted parameters
tfidf_vectorizer = TfidfVectorizer(stop_words='english', min_df=0.1, max_df=0.9)

# Fit and transform the TF-IDF matrix
tfidf_matrix = tfidf_vectorizer.fit_transform(recipes_df_short['content'])

# Calculate cosine similarity between recipes for content-based filtering
cosine_sim_content = linear_kernel(tfidf_matrix, tfidf_matrix)

# Item-Item Collaborative Filtering (item-based similarity)
item_item_sim = cosine_sim_content.T

# Function to get hybrid recipe recommendations
def get_hybrid_recommendations(recipe_name, item_item_sim, recipes_df):
    idx = recipes_df.index[recipes_df['name'] == recipe_name].tolist()[0]

    # Get item-item similarity scores for the given recipe
    item_sim_scores = list(enumerate(item_item_sim[idx]))
    item_sim_scores = sorted(item_sim_scores, key=lambda x: x[1], reverse=True)
    item_sim_scores = item_sim_scores[1:11]  # Get top 10 similar recipes

    # Get top 10 recommended recipes
    recommended_recipes = recipes_df.iloc[[score[0] for score in item_sim_scores[:10]]][['name', 'description', 'ingredients', 'steps', 'tags', 'search_terms']]

    return recommended_recipes

# Example: Get hybrid recommendations for a specific recipe
recipe_name = 'Cucumber Yogurt Dip'
print(f"Getting hybrid recommendations for '{recipe_name}'...")

# Get the hybrid recommendations DataFrame
hybrid_recommendations_df = get_hybrid_recommendations(recipe_name, item_item_sim, recipes_df_short)

# Print the result in the specified format
print(f"\nRecipe: {recipe_name}")
print("Hybrid Recommended Recipes list:")
hybrid_recommendations_df




Getting hybrid recommendations for 'Cucumber Yogurt Dip'...

Recipe: Cucumber Yogurt Dip
Hybrid Recommended Recipes list:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = (
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = recipes_df_short['content'].apply(lambda x: x.lower())  # Convert to lowercase, for example


Unnamed: 0,name,description,ingredients,steps,tags,search_terms
135,Broccoli Salad,This is s treat! Great salad that's pretty hea...,"['broccoli', 'avocado', 'olive oil', 'lemon ju...","['Cut broccoli into bite size pieces.', 'Steam...","['30-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'vegetarian', 's..."
426,Banana-Rama Mocktail,Found at the lcbo.ca site.,"['ice', 'banana', 'strawberries', 'pineapple j...","['In a blender, combine all ingredients. Blend...","['15-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'vegetarian', 'l..."
304,Aunt Paulette's Chili,This is a recipe from my husband's aunt and is...,"['ground beef', 'onion', 'garlic clove', 'toma...",['Brown burger with onion and garlic. Drain fa...,"['60-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'dinner', 'low-sodium', 'low-c..."
243,Cilantro Sparkling Wine Cooler,The perfect anecdote to a scorching summer's d...,"['fresh cilantro', 'sparkling white wine', 'li...",['Tie the cilantro to secure in place before b...,"['15-minutes-or-less', 'time-to-make', 'course...","{'low-sodium', 'low-calorie', 'vegetarian', 'i..."
52,Beat and Bake Sponge Cake,I have tried other sponge cake recipes and the...,"['margarine', 'sugar', 'flour', 'baking powder...","['Heat the oven to 180 deg Celsius.', 'Grease ...","['60-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'dessert', 'low-..."
223,Strawberry Swirl,Great for afternoon energy!,"['strawberry', 'low-fat milk', 'sugar', 'vanil...","['Clean berries.', 'Combine all ingredients in...","['15-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'vegetarian', 'h..."
307,The Mad Hungarian,Guess someone messed with his goulash!\n\nCour...,"['rum', 'root beer']","['Pour rum over ice in a frosted beer mug.', '...","['15-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'healthy', 'low-..."
44,Kickin' Shrimp Dip,"I've tried a lot of shrimp dip recipes, and I ...","['cream cheese', 'sour cream', 'old bay season...","['Mix together, adjusting seasonings to taste....","['time-to-make', 'course', 'main-ingredient', ...","{'low-calorie', 'low-sodium', 'low-carb', 'app..."
97,Caramel Apple Nibbles,Amazingly simple and who doesn't like a carame...,"['caramel sauce', 'apple', 'pecans']",['I like to heat the caramel just a bit in the...,"['15-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'low-carb', 'app..."
311,Butternut Squash Apple Soup,From Simply Recipes. I bet this would freeze o...,"['yellow onion', 'celery rib', 'carrot', 'butt...","['Combine butter, onion, celery, and carrot in...","['30-minutes-or-less', 'time-to-make', 'course...","{'low-calorie', 'low-sodium', 'healthy', 'low-..."


In [7]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
from fuzzywuzzy import process

# Load your dataset
recipes_df = pd.read_csv('recipes_w_search_terms.csv')
recipes_df_short = recipes_df.head(200)

# Combine relevant text features
recipes_df_short['content'] = (
    recipes_df_short['tags'] + ' ' +
    recipes_df_short['ingredients'].apply(', '.join) + ' ' +
    recipes_df_short['steps'].astype(str) + ' ' +
    recipes_df_short['description'].astype(str) + ' ' +
    recipes_df_short['search_terms'].astype(str)
)

# Preprocess text data
recipes_df_short['content'] = recipes_df_short['content'].apply(lambda x: x.lower())  # Convert to lowercase, for example

# Initialize the TF-IDF vectorizer with adjusted parameters
tfidf_vectorizer = TfidfVectorizer(stop_words='english', min_df=0.1, max_df=0.9)

# Fit and transform the TF-IDF matrix
tfidf_matrix = tfidf_vectorizer.fit_transform(recipes_df_short['content'])

# Calculate cosine similarity between recipes for content-based filtering
cosine_sim_content = linear_kernel(tfidf_matrix, tfidf_matrix)

# Item-Item Collaborative Filtering (item-based similarity)
item_item_sim = cosine_sim_content.T

# Function to get the closest matching recipe name based on user input
def get_closest_recipe_name(user_input, recipe_names):
    match, score = process.extractOne(user_input, recipe_names.tolist())
    return match, score

# Function to get hybrid recipe recommendations based on user input
def get_hybrid_recommendations_by_name(user_input, item_item_sim, recipes_df):
    # Find the closest matching recipe name in the dataset
    closest_recipe_name, score = get_closest_recipe_name(user_input, recipes_df['name'])

    # Get recommendations for the closest matching recipe name
    hybrid_recommendations_df = get_hybrid_recommendations(closest_recipe_name, item_item_sim, recipes_df)

    return hybrid_recommendations_df

# Function to get hybrid recipe recommendations
def get_hybrid_recommendations(recipe_name, item_item_sim, recipes_df):
    idx = recipes_df.index[recipes_df['name'] == recipe_name].tolist()[0]

    # Get item-item similarity scores for the given recipe
    item_sim_scores = list(enumerate(item_item_sim[idx]))
    item_sim_scores = sorted(item_sim_scores, key=lambda x: x[1], reverse=True)
    item_sim_scores = item_sim_scores[1:11]  # Get top 10 similar recipes

    # Get top 10 recommended recipes
    recommended_recipes = recipes_df.iloc[[score[0] for score in item_sim_scores[:10]]][['name', 'description', 'ingredients', 'steps', 'tags', 'search_terms']]

    return recommended_recipes

# Example: Get hybrid recommendations for a user-inputted recipe name
user_input_recipe_name = 'chicken sandwich'
print(f"Getting hybrid recommendations for '{user_input_recipe_name}'...")

# Get the hybrid recommendations DataFrame based on user input
hybrid_recommendations_by_name_df = get_hybrid_recommendations_by_name(user_input_recipe_name, item_item_sim, recipes_df_short)

# Print the result in the specified format
print(f"\nUser Input Recipe Name: {user_input_recipe_name}")
print("Hybrid Recommended Recipes list:")
print(hybrid_recommendations_by_name_df[['name', 'description', 'ingredients', 'steps', 'tags', 'search_terms']])


Getting hybrid recommendations for 'chicken sandwich'...

User Input Recipe Name: chicken sandwich
Hybrid Recommended Recipes list:
                                              name  \
158        Spinach and Rice Stuffed Chicken Breast   
105                      Italian Chicken and Pasta   
199  Sun-Dried Tomato Chicken Pesto Couscous Salad   
73                                         Biryani   
147               Chicken in Orange-Riesling Sauce   
110         Chicken Thighs With Lemon &amp; Garlic   
168                     Chicken Breasts With Herbs   
50                      Chicken Parm Meatball Subs   
15           Santa Fe-Tastic Chicken Tortilla Soup   
146                            Sweet Asian Chicken   

                                           description  \
158  A VERY good, and VERY easy dinner that's guara...   
105  I love italian food but I am a diabetic, so I ...   
199  This is such a quick recipe to make and has a ...   
73                                       

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = (
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipes_df_short['content'] = recipes_df_short['content'].apply(lambda x: x.lower())  # Convert to lowercase, for example
