# Recipe Recommendation System 
~ TASTY DISHES ~

- Group 3
- Group Members.
    - Cindy Tumaini
    - Margret Namunyak
    - Faith Wafula
    - Martin Waweru
    - Matthew Karani


## Table Of Contents

- Business Understanding
- Data Understanding
- Data Preparation
- Modelling 
- Evaluation 
  

## Business Understanding

### Business Description 
Tasty Dishes is a web-based culinary platform dedicated to sharing authentic African recipes with the world. Our mission is to enhance the cooking experience of home chefs by providing them with a diverse collection of recipes rooted in African culinary traditions, while also incorporating global influences. Whether you're an experienced cook or just starting, Tasty Dishes offers a wide variety of recipes that empower users to create delicious, flavorful meals from the comfort of their homes.


## Business Goal 
### Objective
The main objective of this project is to develop an item-based recipe recommendation system that suggests recipes to users based on the ingredients they have available. By analyzing the ingredients present in various recipes, the system aims to provide relevant and appealing recommendations that encourage users to explore and cook diverse dishes rooted in African culinary traditions, while also incorporating global flavors.

### Scope

1. Ingredient-Based Recommendations: Develop an algorithm that analyzes user-provided ingredients to recommend recipes based on ingredient similarity, leveraging a diverse dataset that includes recipe_Title, Ingredients, and Instructions for authentic African and global dishes.

2. User-Friendly Interface: Design an intuitive web interface that enables users to input their available ingredients and view tailored recipe recommendations, along with detailed cooking instructions and a feedback mechanism to enhance recommendation accuracy.


### Success Criteria
1. Accuracy:
Achieve at least 80% accuracy in recommending relevant recipes based on user-provided ingredients.

2. Precision:
Ensure that at least 75% of recommended recipes correspond to the user’s input ingredients.

3. Recall:
Aim for a recall rate of at least 70%, indicating the system identifies a significant portion of relevant recipes.

4. F1 Score:
Target an F1 score of 0.75 or higher, balancing precision and recall for comprehensive recommendations.
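
As a rough illustration of how the precision, recall and F1 targets above could be checked for a single query, the sketch below compares a set of recommended recipe titles with a set of titles judged relevant. The titles and the relevance judgements are made up for illustration; how relevance is actually labelled is an open assumption here.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, f1) for one query, given two collections of recipe titles."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # recommended recipes that are also relevant
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical query result and hypothetical relevance labels
recommended = {"white chapati", "brown chapati", "mahamri", "kaimati"}
relevant = {"white chapati", "brown chapati", "roti", "mahamri"}

precision, recall, f1 = precision_recall_f1(recommended, relevant)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Averaging these values over many test queries would give figures that can be compared with the 75%, 70% and 0.75 thresholds.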






## Data Understanding

### Data Source:




In [1]:
# Necessary Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import string
import re
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


### Data Frame One

In [2]:
# Load the dataframe

df = pd.read_csv("Food Ingredients and Recipe Dataset with Image Name Mapping.csv", index_col=0)


# Display the first 10 rows
display(df.head(10))

#show the shape
print(df.shape)

Unnamed: 0,Title,Ingredients,Instructions,Image_Name,Cleaned_Ingredients
0,Miso-Butter Roast Chicken With Acorn Squash Pa...,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher...","Pat chicken dry with paper towels, season all ...",miso-butter-roast-chicken-acorn-squash-panzanella,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher..."
1,Crispy Salt and Pepper Potatoes,"['2 large egg whites', '1 pound new potatoes (...",Preheat oven to 400°F and line a rimmed baking...,crispy-salt-and-pepper-potatoes-dan-kluger,"['2 large egg whites', '1 pound new potatoes (..."
2,Thanksgiving Mac and Cheese,"['1 cup evaporated milk', '1 cup whole milk', ...",Place a rack in middle of oven; preheat to 400...,thanksgiving-mac-and-cheese-erick-williams,"['1 cup evaporated milk', '1 cup whole milk', ..."
3,Italian Sausage and Bread Stuffing,"['1 (¾- to 1-pound) round Italian loaf, cut in...",Preheat oven to 350°F with rack in middle. Gen...,italian-sausage-and-bread-stuffing-240559,"['1 (¾- to 1-pound) round Italian loaf, cut in..."
4,Newton's Law,"['1 teaspoon dark brown sugar', '1 teaspoon ho...",Stir together brown sugar and hot water in a c...,newtons-law-apple-bourbon-cocktail,"['1 teaspoon dark brown sugar', '1 teaspoon ho..."
5,Warm Comfort,"['2 chamomile tea bags', '1½ oz. reposado tequ...",Place 2 chamomile tea bags in a heatsafe vesse...,warm-comfort-tequila-chamomile-toddy,"['2 chamomile tea bags', '1½ oz. reposado tequ..."
6,Apples and Oranges,"['3 oz. Grand Marnier', '1 oz. Amaro Averna', ...","Add 3 oz. Grand Marnier, 1 oz. Amaro Averna, a...",apples-and-oranges-spiked-cider,"['3 oz. Grand Marnier', '1 oz. Amaro Averna', ..."
7,Turmeric Hot Toddy,"['¼ cup granulated sugar', '¾ tsp. ground turm...","For the turmeric syrup, combine ½ cup hot wate...",turmeric-hot-toddy-claire-sprouse,"['¼ cup granulated sugar', '¾ tsp. ground turm..."
8,Instant Pot Lamb Haleem,"['¾ cup assorted dals (such as chana dal, moon...","Combine dals, rice, and barley in a medium bow...",instant-pot-lamb-haleem,"['¾ cup assorted dals (such as chana dal, moon..."
9,Spiced Lentil and Caramelized Onion Baked Eggs,"['1 (14.5-ounce) can basic lentil soup, like A...","Place an oven rack in the center of the oven, ...",spiced-lentil-and-caramelized-onion-baked-eggs,"['1 (14.5-ounce) can basic lentil soup, like A..."


(13501, 5)


- Check for duplicates


In [3]:
print(f'Number of duplicates: {df.duplicated().sum()}')

Number of duplicates: 0


In [4]:
#drop duplicates
df.drop_duplicates(inplace=True)
print(f'Number of duplicates after dropping: {df.duplicated().sum()}')

Number of duplicates after dropping: 0


- Check for missing values

In [5]:
df.isnull().sum().sort_values(ascending=False)

Instructions           8
Title                  5
Ingredients            0
Image_Name             0
Cleaned_Ingredients    0
dtype: int64

In [6]:
#drop rows with missing values
df.dropna(inplace=True)
print(f'Number of missing values after dropping: {df.isnull().sum().sum()}')

Number of missing values after dropping: 0


- The dataset has both an Ingredients and a Cleaned_Ingredients column. Check whether there is any difference between the two.

In [7]:
df['Ingredients'][5]

"['2 chamomile tea bags', '1½ oz. reposado tequila', '¾ oz. fresh lemon juice', '1 Tbsp. agave nectar']"

In [8]:
df['Cleaned_Ingredients'][5]

"['2 chamomile tea bags', '1½ oz. reposado tequila', '¾ oz. fresh lemon juice', '1 Tbsp. agave nectar']"

- For this row the two columns are identical, so there is no significant difference between Ingredients and Cleaned_Ingredients. Thus, we drop the Ingredients column and rename Cleaned_Ingredients to Ingredients.
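
The check above looks at a single row. A sketch of how the comparison could be run over every row is shown below, using a small made-up DataFrame (`toy`) in place of `df`; the column names match the dataset, but the values are illustrative only.

```python
import pandas as pd

# Toy stand-in for df: the third row differs between the two columns
toy = pd.DataFrame({
    "Ingredients": ["['2 eggs', '1 cup milk']", "['1 onion']", "['salt']"],
    "Cleaned_Ingredients": ["['2 eggs', '1 cup milk']", "['1 onion']", "['salt', 'pepper']"],
})

# Boolean mask of rows where the two columns are not identical
differs = toy["Ingredients"] != toy["Cleaned_Ingredients"]
n_different = int(differs.sum())

print(f"Rows that differ: {n_different} of {len(toy)}")
print(toy[differs])
```

Running the same comparison on `df` would show how many recipes, if any, actually differ before the Ingredients column is dropped.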

In [9]:
#reorder the columns so that Cleaned_Ingredients comes second
df = df[['Title', 'Cleaned_Ingredients', 'Ingredients', 'Instructions', 'Image_Name']]

#drop the original Ingredients column and the Image_Name column
df = df.drop(columns=['Ingredients','Image_Name'])


In [10]:
# rename the cleaned ingredients column
df = df.rename(columns={'Cleaned_Ingredients':'Ingredients'})
df.head()


Unnamed: 0,Title,Ingredients,Instructions
0,Miso-Butter Roast Chicken With Acorn Squash Pa...,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher...","Pat chicken dry with paper towels, season all ..."
1,Crispy Salt and Pepper Potatoes,"['2 large egg whites', '1 pound new potatoes (...",Preheat oven to 400°F and line a rimmed baking...
2,Thanksgiving Mac and Cheese,"['1 cup evaporated milk', '1 cup whole milk', ...",Place a rack in middle of oven; preheat to 400...
3,Italian Sausage and Bread Stuffing,"['1 (¾- to 1-pound) round Italian loaf, cut in...",Preheat oven to 350°F with rack in middle. Gen...
4,Newton's Law,"['1 teaspoon dark brown sugar', '1 teaspoon ho...",Stir together brown sugar and hot water in a c...


In [11]:
df['Ingredients'][5]

"['2 chamomile tea bags', '1½ oz. reposado tequila', '¾ oz. fresh lemon juice', '1 Tbsp. agave nectar']"

### DataFrame two 

In [12]:
#explore the RecipesImp.csv file
df2 = pd.read_csv("RecipesImp.csv")
display(df2.head())

#display the shape
print(df2.shape)

Unnamed: 0,title,index,page,about,ingridients,preparation,nutrition per 100g of recipe,energy(kcal),fat(g),carbohydrates(g),proteins(g),fibre(g),vitamin A(mcg),iron(mg),zinc(mg),F_factor_est
0,Kaimati(Fried Dumplings),15003,24,Kaimatis get their unique flavour from the sty...,"wheat flour, refined\r\nwater, vanilla essenc...",Put yeast in a small container.\r\n Add 50ml ...,"Energy 1,795 kJ/ 429 kcal | Fat 21.8 g | Carbo...",429.0,21.8,52.8,4.6,1.6,30,2.1,0.45,0.4
1,Mahamri\r\n(Swahili Doughnut),15004,26,This is a typical traditional recipe among the...,"wheat flour,\r\ncoconut milk\r\nwhite sugar\r\...","Break the coconut shell, drain the water and...","Energy 1,728 kJ/ 413 kcal | Fat 22.1 g | Carbo...",413.0,22.1,46.6,6.0,2.1,41,2.8,0.56,0.4
2,"Enriched Mandazi \r\n(East African Doughnuts, ...",15124,28,A popular snack among urban dwellers across th...,self-raising wheat flour\r\neggs\r\nmargarine\...,"? Put flour, salt, sugar and lemon rind into ...","Energy 1,590 kJ/ 379 kcal | Fat 16.1 g | Carbo...",379.0,16.1,49.9,7.6,2.2,90,3.3,0.66,0.4
3,"Basic Mandazi \r\n(East African Doughnuts, Basic)",15125,30,You will find this recipe in any home across K...,all-purpose wheat flour\r\nbaking powder\r\nsu...,"? Put the wheat flour into a bowl, add baking...","Energy 1,430kJ/ 340 kcal | Fat 12.9 g | Carboh...",340.0,12.9,48.7,6.4,2.1,48,3.5,0.52,0.4
4,Meat Samosa\r\n(Sambusa ya Nyama),15025,32,Nothing more delicious like the Kenyan meaty s...,"minced beef\r\ncoriander, fresh\r\nleek\r\ngar...",? Put the meat in a pan over a fire. Stir con...,"Energy 1,854 kJ/ 443 kcal | Fat 22.2 g | Carbo...",443.0,22.2,40.5,18.8,3.1,66,11.5,2.99,0.4


(142, 16)


- Since we want only a few columns to recommend the possible recipes, we need to drop some columns.

In [13]:
df2.columns


Index(['title', 'index', 'page', 'about', 'ingridients', 'preparation',
       'nutrition per 100g of recipe', 'energy(kcal)', 'fat(g)',
       'carbohydrates(g)', 'proteins(g)', 'fibre(g)', 'vitamin A(mcg)',
       'iron(mg)', 'zinc(mg)', 'F_factor_est'],
      dtype='object')

In [14]:
columns_to_keep = ['title','ingridients','preparation']

df2 = df2[columns_to_keep]
df2.head()

Unnamed: 0,title,ingridients,preparation
0,Kaimati(Fried Dumplings),"wheat flour, refined\r\nwater, vanilla essenc...",Put yeast in a small container.\r\n Add 50ml ...
1,Mahamri\r\n(Swahili Doughnut),"wheat flour,\r\ncoconut milk\r\nwhite sugar\r\...","Break the coconut shell, drain the water and..."
2,"Enriched Mandazi \r\n(East African Doughnuts, ...",self-raising wheat flour\r\neggs\r\nmargarine\...,"? Put flour, salt, sugar and lemon rind into ..."
3,"Basic Mandazi \r\n(East African Doughnuts, Basic)",all-purpose wheat flour\r\nbaking powder\r\nsu...,"? Put the wheat flour into a bowl, add baking..."
4,Meat Samosa\r\n(Sambusa ya Nyama),"minced beef\r\ncoriander, fresh\r\nleek\r\ngar...",? Put the meat in a pan over a fire. Stir con...


In [15]:
#clean the column names
#rename the misspelled 'ingridients' column to 'ingredients' and 'preparation' to 'instructions'
df2 = df2.rename(columns={'ingridients':'ingredients','preparation':'instructions'})

#capitalize the column names
df2.columns = df2.columns.str.capitalize()

# Function to process the Ingredients column
def process_ingredients(ingredients):
    # Remove brackets and quotes, treat newlines as separators, then split by commas
    return [ingredient.strip() for ingredient in ingredients.replace('[','').replace(']','').replace("'", "").replace('\n', ',').split(',')]

# Split the Ingredients strings of each DataFrame into lists
# df: plain comma split (brackets and quotes stay in place here and are removed by the later cleaning and tokenization steps)
df['Ingredients'] = df['Ingredients'].apply(lambda x: [ingredient.strip() for ingredient in x.split(',')])
# df2: use process_ingredients, which also handles the newline-separated entries
df2['Ingredients'] = df2['Ingredients'].apply(process_ingredients)




In [16]:
# Check the processed DataFrames
df[['Title', 'Ingredients']]


Unnamed: 0,Title,Ingredients
0,Miso-Butter Roast Chicken With Acorn Squash Pa...,"[['1 (3½–4-lb.) whole chicken', '2¾ tsp. koshe..."
1,Crispy Salt and Pepper Potatoes,"[['2 large egg whites', '1 pound new potatoes ..."
2,Thanksgiving Mac and Cheese,"[['1 cup evaporated milk', '1 cup whole milk',..."
3,Italian Sausage and Bread Stuffing,"[['1 (¾- to 1-pound) round Italian loaf, cut i..."
4,Newton's Law,"[['1 teaspoon dark brown sugar', '1 teaspoon h..."
...,...,...
13496,Brownie Pudding Cake,"[['1 cup all-purpose flour', '2/3 cup unsweete..."
13497,Israeli Couscous with Roasted Butternut Squash...,"[['1 preserved lemon', '1 1/2 pound butternut ..."
13498,Rice with Soy-Glazed Bonito Flakes and Sesame ...,[['Leftover katsuo bushi (dried bonito flakes)...
13499,Spanakopita,[['1 stick (1/2 cup) plus 1 tablespoon unsalte...
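
The output above still carries bracket and quote residue (`[['1 (3½–4-lb.) ...`) because the stringified lists were split on commas. One possible alternative, sketched below, is to parse each string with `ast.literal_eval`. This assumes the cells are valid Python list literals, which is how the sample row shown earlier looks; `parse_ingredient_list` is a hypothetical helper name, and the comma-split fallback is an assumption for cells that are plain text.

```python
import ast


def parse_ingredient_list(raw):
    """Parse a stringified Python list; fall back to a comma split for plain text."""
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        parsed = raw.split(',')
    if not isinstance(parsed, (list, tuple)):
        parsed = [parsed]
    # Strip whitespace and drop empty entries
    return [str(item).strip() for item in parsed if str(item).strip()]


raw = "['2 chamomile tea bags', '1½ oz. reposado tequila', '¾ oz. fresh lemon juice', '1 Tbsp. agave nectar']"
parsed = parse_ingredient_list(raw)
print(parsed)
```

With pandas this could be applied as `df['Ingredients'].apply(parse_ingredient_list)`, which keeps an ingredient such as `'1 pound new potatoes (about 1 inch in diameter)'` in one piece even when it contains commas.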


In [17]:
df2[['Title', 'Ingredients']]

Unnamed: 0,Title,Ingredients
0,Kaimati(Fried Dumplings),"[wheat flour, refined, water, vanilla essence,..."
1,Mahamri\r\n(Swahili Doughnut),"[wheat flour, , coconut milk, white sugar, dry..."
2,"Enriched Mandazi \r\n(East African Doughnuts, ...","[self-raising wheat flour, eggs, margarine, Ri..."
3,"Basic Mandazi \r\n(East African Doughnuts, Basic)","[all-purpose wheat flour, baking powder, sugar..."
4,Meat Samosa\r\n(Sambusa ya Nyama),"[minced beef, coriander, fresh, leek, garlic, ..."
...,...,...
137,Bhature\r\n (Fried Indian Bread),"[wheat flour, salt, sugar, ghee, cooking oil, ..."
138,Vimumunya vya \r\nSukari\r\n (Sweetened Pumpki...,"[pumpkin, cardamon, sugar, coconut milk, water]"
139,Siro\r\n (Semolina & Nuts),"[semolina flour, cow ghee, cow milk, sugar, pi..."
140,Chaas\r\n(Diluted Yoghurt),"[natural yoghurt, water, salt, ]"


- Check for missing values

In [18]:
print(f'number of missing values: {df2.isnull().sum().sum()}')

number of missing values: 0


- Since neither DataFrame has missing values or duplicates, we can now combine them.

In [19]:
#concatenate the two dataframes (the rows of df2 are stacked below the rows of df)
combined_df = pd.concat([df,df2])
#check the shapes of the three dfs
print(f'Dataframe 1 has a shape of: {df.shape}')
print(f'Dataframe 2 has a shape of: {df2.shape}')
print(f'Combined dataframe has a shape of: {combined_df.shape}')

#reset the index
combined_df = combined_df.reset_index(drop=True)



Dataframe 1 has a shape of: (13493, 3)
Dataframe 2 has a shape of: (142, 3)
Combined dataframe has a shape of: (13635, 3)


### Clean the Combined DataFrame

In [20]:
combined_df.head()

Unnamed: 0,Title,Ingredients,Instructions
0,Miso-Butter Roast Chicken With Acorn Squash Pa...,"[['1 (3½–4-lb.) whole chicken', '2¾ tsp. koshe...","Pat chicken dry with paper towels, season all ..."
1,Crispy Salt and Pepper Potatoes,"[['2 large egg whites', '1 pound new potatoes ...",Preheat oven to 400°F and line a rimmed baking...
2,Thanksgiving Mac and Cheese,"[['1 cup evaporated milk', '1 cup whole milk',...",Place a rack in middle of oven; preheat to 400...
3,Italian Sausage and Bread Stuffing,"[['1 (¾- to 1-pound) round Italian loaf, cut i...",Preheat oven to 350°F with rack in middle. Gen...
4,Newton's Law,"[['1 teaspoon dark brown sugar', '1 teaspoon h...",Stir together brown sugar and hot water in a c...


In [21]:
combined_df[-10:]

Unnamed: 0,Title,Ingredients,Instructions
13625,Vinolo\r\n(Banana and Maize Flour Ugali),"[banana green, maize flour, water]",Preparation 5 minutes | Cooking 40 minutes | \...
13626,Finger Millet \r\nFlour Ugali,"[finger millet, water]",Preparation time 5 minutes | Cooking time 15 m...
13627,White Chapati,"[wheat flour, water, sugar, salt, cooking oil]",Preparation 30 minutes | Cooking 30 minutes | ...
13628,Brown Chapati,"[wheat flour, water, sugar, , salt, cooking oil]",Preparation 30 minutes | Cooking 30 minutes | ...
13629,Roti \r\n(Indian Chapati),"[wheat flour, salt, water, cooking oil, cow ghee]",Preparation 3 hours | Cooking 21 minutes | Ser...
13630,Bhature\r\n (Fried Indian Bread),"[wheat flour, salt, sugar, ghee, cooking oil, ...",Preparation 1 hour 15 minutes | Cooking 30 min...
13631,Vimumunya vya \r\nSukari\r\n (Sweetened Pumpki...,"[pumpkin, cardamon, sugar, coconut milk, water]",Preparation 5 minutes | Cooking 45 minutes | \...
13632,Siro\r\n (Semolina & Nuts),"[semolina flour, cow ghee, cow milk, sugar, pi...",Preparation 15 minutes | Cooking 30 minutes | ...
13633,Chaas\r\n(Diluted Yoghurt),"[natural yoghurt, water, salt, ]",Preparation 5 minutes | Serves 2\r\n?Add natur...
13634,Groundnut Sauce,"[groundnut, salt, sour milk, water]",Preparation 5 minutes | Cooking 1 hour 40 minu...


In [22]:
#check for missing values
print(combined_df.isnull().sum().sort_values(ascending=False))


Title           0
Ingredients     0
Instructions    0
dtype: int64


- Clean the columns of the combined_df

In [23]:
#write a function to clean the columns
"""
This function should:
1. Clean the Title column: remove newlines and extra spaces.
2. Clean the Ingredients column: the ingredients were already converted to lists above, so here we strip extra spaces
   and ensure there are no empty strings or duplicates within each list of ingredients.
3. Clean the Instructions column: as with the Title, ensure that the instructions are clean and properly formatted.
"""

def clean_combined_df(df):
    # Title column
    # Remove newlines and extra spaces
    # Note: only '\n' is removed here, so '\r' characters remain in some titles (see the output below)
    df['Title'] = df['Title'].str.replace('\n', '').str.strip()
    # Ingredients column
    # Strip each ingredient and drop empty strings from each list of ingredients
    df['Ingredients'] = df['Ingredients'].apply(lambda x: [ingredient.strip() for ingredient in x if ingredient.strip() != ''])

    # Remove duplicate ingredients from each list (set() does not preserve the original order)
    df['Ingredients'] = df['Ingredients'].apply(lambda x: list(set(x)))

    # Clean the Instructions column
    df['Instructions'] = df['Instructions'].str.replace('\n', '') \
                                       .str.replace('?', '.') \
                                       .str.replace('|', ',') \
                                       .str.replace('\r', '') \
                                       .str.strip()
    # Remove special characters such as \x02 from words like gradu\x02ally
    # Note: the \x7f-\xff range also strips characters such as ° and é (400°F becomes 400F)
    df['Instructions'] = df['Instructions'].apply(lambda x: re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff]', '', x))
    # Collapse runs of two or more full stops into a single full stop
    df['Instructions'] = df['Instructions'].apply(lambda x: re.sub(r'\.{2,}', '.', x))
    return df

# Use the function on your DataFrame
combined_cleaned = clean_combined_df(combined_df)
combined_cleaned

Unnamed: 0,Title,Ingredients,Instructions
0,Miso-Butter Roast Chicken With Acorn Squash Pa...,"['2 Tbsp. finely chopped sage', '¼ cup all-pur...","Pat chicken dry with paper towels, season all ..."
1,Crispy Salt and Pepper Potatoes,['1 pound new potatoes (about 1 inch in diamet...,Preheat oven to 400F and line a rimmed baking ...
2,Thanksgiving Mac and Cheese,"['1 lb. elbow macaroni'], plus more', '2 lb. e...",Place a rack in middle of oven; preheat to 400...
3,Italian Sausage and Bread Stuffing,"['1 stick unsalted butter, '5 garlic cloves, [...",Preheat oven to 350F with rack in middle. Gene...
4,Newton's Law,"[['1 teaspoon dark brown sugar', '1 ½ oz. bour...",Stir together brown sugar and hot water in a c...
...,...,...,...
13630,Bhature\r (Fried Indian Bread),"[coriander, fenugreek leaves, sugar, cooking o...","Preparation 1 hour 15 minutes , Cooking 30 min..."
13631,Vimumunya vya \rSukari\r (Sweetened Pumpkin & ...,"[sugar, cardamon, coconut milk, water, pumpkin]","Preparation 5 minutes , Cooking 45 minutes , S..."
13632,Siro\r (Semolina & Nuts),"[cow milk, sugar, cardamon, semolina flour, pi...","Preparation 15 minutes , Cooking 30 minutes , ..."
13633,Chaas\r(Diluted Yoghurt),"[salt, water, natural yoghurt]","Preparation 5 minutes , Serves 2.Add natural y..."


In [24]:
combined_cleaned['Instructions'][13634]

'Preparation 5 minutes , Cooking 1 hour 40 minutes , Serves 4.Place a saucepan over fire and let it preheat.Add the groundnuts, salt and 1/2 a cup of water oras desired.Cook until the water evaporates as you stir gradually. When ready, the nuts produce a pop sound.Once they pop, turn down the heat and continue stirring until the groundnuts are dry (about 13minutes).Remove from heat and allow it to cool down.Using a blender, blend the nuts into a paste. Apestle and mortar can be used in the absence of ablender.Put the groundnut paste into a bowl, add sour milkand stir into thick paste. Water or fresh milk can beused in place of the sour milk.Once ready, put another pan on the heat, add thepeanut paste and stir.Stir until it is smooth but not too thick.Serve hot with green leafy vegetables of yourchoice, fish, sweet potatoes, green bananas, ugali,etc'
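
In the output above some words are run together ("oras", "Apestle", "ablender"), which is what happens when a line break is replaced with an empty string. The sketch below shows one possible variant that replaces whitespace runs with a single space and strips only control characters, so that symbols such as ° survive. `tidy_instructions` and the sample text are hypothetical and are not part of the notebook's pipeline.

```python
import re

# Control characters only (C0 without tab/newline/carriage return, DEL, and C1);
# printable Latin-1 characters such as ° ½ é are left alone
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]')


def tidy_instructions(text):
    text = CONTROL_CHARS.sub('', text)   # drop stray control characters
    text = re.sub(r'\s+', ' ', text)     # newlines, tabs and space runs become one space
    return text.strip()


sample = "Preheat oven to 400°F.\r\nStir gradu\x02ally or\nas desired."
cleaned = tidy_instructions(sample)
print(cleaned)
```

Whether to keep characters such as ° in the instructions is a design choice; the original range is the safer option if only plain ASCII output is wanted.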

### Function to get Recipes based on Ingredients

- Let's see if we can tokenize our Ingredients and see if the system recommends recipes based on the ingredients

In [25]:

def clean_ingredients(ingredients_list):
    # Remove any extra single quotes and fix formatting for each ingredient
    cleaned_list = [re.sub(r"['\"]", "", ingredient) for ingredient in ingredients_list]  # Remove quotes
    cleaned_list = [re.sub(r'\s+', ' ', ingredient) for ingredient in cleaned_list]  # Normalize spaces
    return cleaned_list

# Apply the cleaning function
combined_df['Ingredients'] = combined_df['Ingredients'].apply(clean_ingredients)


# Function to tokenize and normalize ingredients
def tokenize_and_normalize(ingredients_list):
    tokens = []
    for ingredient in ingredients_list:
        # Split ingredient string by commas and strip whitespace
        split_ingredients = [i.strip().lower() for i in ingredient.split(',')]
        
        # Further clean each token: remove unwanted characters
        split_ingredients = [re.sub(r'[^\w\s]', '', i) for i in split_ingredients]  # Remove punctuation
        split_ingredients = [re.sub(r'\s+', ' ', i) for i in split_ingredients]  # Normalize whitespace
        
        # Extend the tokens list with cleaned ingredients
        tokens.extend(split_ingredients)
    
    return tokens


# Apply the function to the Ingredients column
combined_df['Ingredients'] = combined_df['Ingredients'].apply(tokenize_and_normalize)


In [26]:
def recommend_recipes(input_ingredients, combined_df):
    # Normalize user input by stripping whitespace and converting to lower case
    input_ingredients = [ingredient.strip().lower() for ingredient in input_ingredients.split(',')]
    
    # Find matching recipes
    matched_recipes = combined_df[combined_df['Ingredients'].apply(lambda x: any(ingredient in x for ingredient in input_ingredients))]
    
    # Check if any recipes were found
    if matched_recipes.empty:
        return "No recipe found. Try again."
    
    return matched_recipes[['Title','Ingredients', 'Instructions']]


- When the user inputs an ingredient:

In [27]:
# Example user input ingredients
user_input = "thyme"

# Get recommendations
recommended_recipes = recommend_recipes(user_input, combined_df)

# Display recommendations
recommended_recipes


Unnamed: 0,Title,Ingredients,Instructions
273,Braised Celery With Lentils and Garlic,"[andor sage optional, a small handful hardy he...",Heat cup oil in a large high-sided skillet ov...
1479,Kale or Chard Pie,"[1 onion, sliced, about 8 large kale or chard ...",Heat the oven to 375F. Melt the butter in a la...
1586,Make-Ahead Gravy,[6 cups or more thanksgiving stock or lowsodiu...,Heat oil in a large saucepan over medium-high....
1897,3-Ingredient Garlic-Herb Grilled Chicken Wings,"[12 teaspoon freshly ground black pepper, 2 po...",Pat wings very dry with paper towels. Whisk ga...
2766,Herby Ricotta-Poblano Tacos,"[for garnish, thinly sliced or cut into matchs...","On an ungreased griddle or small, heavy skille..."
5510,Grilled Fish,"[crumbled, 12 cup extravirgin olive oil, cilan...",Combine all the ingredients except fish and le...
5988,Gazpacho,"[peeled and finely chopped, to garnish, 2 clov...","1. In a bowl, reserve 2 tablespoons each of th..."
6253,Duck Breast & Zucchini Tournedos,"[salt, thyme, rosemary, oregano, 1 34 oz dried...","For the mushroom powder, process the dried cep..."
6851,Chicken & Rice Soup,"[skinless, sliced, peeled and diced, frozen pe...","Combine the broth, rice, and 2 cups (16 fl oz/..."
7203,Grilled Chicken with Almond and Garlic Sauce,"[12 teaspoon salt, 1 onion, salt and pepper, 1...","1. For the sauce, soak the beans in a generous..."


- The function returns the Title, Ingredients and Instructions of every recipe that has "thyme" as one of its ingredient tokens.
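
`recommend_recipes` returns a recipe as soon as any one input ingredient matches, in the original row order. A possible refinement, sketched here on a toy DataFrame, is to count how many of the requested ingredients each recipe contains and sort by that count. `rank_by_matches`, the `Matches` column and the `top_n` parameter are hypothetical names introduced for illustration.

```python
import pandas as pd


def rank_by_matches(input_ingredients, recipes, top_n=5):
    # Normalize the comma-separated user input into a set of tokens
    wanted = {item.strip().lower() for item in input_ingredients.split(',') if item.strip()}
    # Count how many requested ingredients appear in each recipe's token list
    counts = recipes['Ingredients'].apply(lambda tokens: len(wanted & set(tokens)))
    ranked = recipes.assign(Matches=counts)
    ranked = ranked[ranked['Matches'] > 0]
    # A stable sort keeps the original order among recipes with equal counts
    return ranked.sort_values('Matches', ascending=False, kind='mergesort').head(top_n)


# Toy stand-in for combined_df after tokenization
toy = pd.DataFrame({
    'Title': ['groundnut sauce', 'white chapati', 'chaas'],
    'Ingredients': [
        ['groundnut', 'salt', 'sour milk', 'water'],
        ['wheat flour', 'water', 'sugar', 'salt', 'cooking oil'],
        ['natural yoghurt', 'water', 'salt'],
    ],
})

result = rank_by_matches('salt, sugar, wheat flour', toy)
print(result[['Title', 'Matches']])
```

Like the original function, this relies on exact token matches, so "thyme" would not match a token such as "1 tsp thyme".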

## Implement Content Based Recommendation
- Recommend a recipe based on the similarities between ingredients in the recipes.

## TF-IDF Vectorization

In [347]:
# Perform TF-IDF on the Ingredients column so that ingredient similarities can be computed
# Join each recipe's list of ingredient tokens into a single comma-separated string
combined_df['Ingredients'] = combined_df['Ingredients'].apply(lambda x: ', '.join(x))

# Initialize the TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')

# Fit and transform the ingredients
tfidf_matrix = tfidf.fit_transform(combined_df['Ingredients'])

# Check the shape of the matrix
print(tfidf_matrix.shape)


(13635, 9357)


- The cosine similarities are computed further below. First, check what happens when the user enters a title instead of an ingredient:

In [28]:
# Example user input Title
user_input = "Gazpacho"

# Get recommendations
recommended_recipes = recommend_recipes(user_input, combined_df)

# Display recommendations
recommended_recipes


'No recipe found. Try again.'

In [29]:
#check if there is a recipe with the title 'Gazpacho'
combined_df[combined_df['Title'].str.contains('Gazpacho', case=False)]

Unnamed: 0,Title,Ingredients,Instructions
133,Speedy Summer Gazpacho,"[deseeded and roughly chopped, plus extra for ...",Put all the ingredients in a food processor an...
1771,Sippin’ Green Gazpacho,"[chopped, smashed, 2 cups coarsely chopped aru...","Pure cucumbers, garlic, and 1/2 cup water in a..."
3894,Gochujang Gazpacho,"[chopped, 2 tablespoons chopped mint, 14 cup e...","Pulse 1 1/2 cups tomato, 1/4 cup cucumber, ora..."
4089,Wee Gazpacho,"[1 large garlic clove, chopped, 2 celery stalk...",Place all the ingredients in a food processor ...
4476,Watermelon Gazpacho with Feta Crema,"[2 ounces feta, kosher salt, 14 cup almonds, 1...","Pure watermelon, tomato, cucumber, jalapeo, oi..."
5297,Creamy Green Gazpacho,"[12 jalapeño optional, 1 tablespoon honey, 34 ...","Reserve one-quarter of the tomato, two cucumbe..."
5988,Gazpacho,"[peeled and finely chopped, to garnish, 2 clov...","1. In a bowl, reserve 2 tablespoons each of th..."
6163,Mixed Berry Gazpacho with Basil,"[torn into pieces, 1 teaspoon fresh lime juice...",Combine first 6 ingredients and basil sprig in...
6260,Stone Fruit Gazpacho with Scallops,"[1 clove garlic, 4 jumbo diver scallops, pitte...","Combine the peaches, plums, watermelon, garlic..."
6564,Fiery Grilled Shrimp with Honeydew Gazpacho,"[and chopped, stemmed, 12 english cucumber, 12...",Cut away the honeydew rind and the dark green ...


- The function returns "No recipe found. Try again." because it only searches the Ingredients column, yet the check above shows that a recipe titled Gazpacho does exist.

### Function to get Recipes based on Titles
- Function to check if a title exists.
- First, write a function to clean the title column.

In [30]:
# clean title column
def clean_title(title):
    # Normalize titles by removing punctuation, converting to lowercase, and stripping extra spaces
    title = re.sub(r'[^\w\s]', '', title)
    title = re.sub(r'\s+', ' ', title).strip().lower()
    return title
combined_df['Title'] = combined_df['Title'].apply(clean_title)


- Function to return the results based on the title entered.

In [31]:
def get_ingredients_by_title(title_input, combined_df):
    # Normalize the title input the same way the Title column was cleaned
    title_input = clean_title(title_input)
    # Find recipes whose title contains the input as a literal substring
    # (regex=False prevents the input from being interpreted as a regular expression)
    matched_title = combined_df[combined_df['Title'].str.lower().str.contains(title_input, regex=False)]
    # If there's a match by title, return the matching recipes
    if not matched_title.empty:
        return matched_title[['Title','Instructions', 'Ingredients']]
    
    return "No recipe found with that title. Try again."


In [32]:
# Example user input Title
user_input = "fennelrubbed chicketta"

# Get recommendations
recommended_recipes = get_ingredients_by_title(user_input, combined_df)

# Display recommendations
recommended_recipes


Unnamed: 0,Title,Instructions,Ingredients
2521,fennelrubbed chicketta,Toast fennel seeds in a dry small skillet over...,"[finely grated, 2 teaspoons fennel seeds, 1 te..."


- The functions above return recipes whether the user enters an ingredient or a title, but keeping a separate function for each kind of input is a lot of work. It is simpler to do some feature engineering and create a new column that combines the Title and the Ingredients.
- After creating the column, we can vectorize it and calculate the cosine similarities, so that a user gets results whether they enter a title or one or more ingredients.
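
A minimal sketch of the idea described above: vectorize one combined text per recipe, transform the user's query with the same vectorizer, and rank recipes by cosine similarity. The three texts are a toy stand-in for the combined title and ingredients column, and `search` and `top_n` are hypothetical names, not functions defined in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: each text is a recipe title followed by its ingredients
titles = ['white chapati', 'groundnut sauce', 'chaas diluted yoghurt']
texts = [
    'white chapati wheat flour water sugar salt cooking oil',
    'groundnut sauce groundnut salt sour milk water',
    'chaas diluted yoghurt natural yoghurt water salt',
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(texts)


def search(query, top_n=2):
    # Project the query into the same TF-IDF space as the recipes
    query_vec = vectorizer.transform([query.lower()])
    scores = cosine_similarity(query_vec, matrix).ravel()
    # Highest scores first; recipes with a score of zero are dropped
    order = scores.argsort()[::-1][:top_n]
    return [(titles[i], float(scores[i])) for i in order if scores[i] > 0]


by_title = search('chapati')        # query that looks like a title
by_ingredient = search('yoghurt')   # query that looks like an ingredient
print(by_title)
print(by_ingredient)
```

Because the title and the ingredients share one vector space, the same function handles both kinds of input without needing to know which one the user typed.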

### Feature Engineering

- Create the column containing combined title and ingredient.

In [33]:
# Combining title and ingredients
"""
This code combines the title and the ingredients into a
single string for each recipe. This allows the system to search both
the ingredients and the title, since the user can input either of them.
"""
combined_df['Title_Ingredients'] = combined_df['Title'] + ' ' + combined_df['Ingredients'].apply(lambda x: ' '.join(x))
combined_df.head()

Unnamed: 0,Title,Ingredients,Instructions,Title_Ingredients
0,misobutter roast chicken with acorn squash pan...,"[2 tbsp finely chopped sage, ¼ cup allpurpose ...","Pat chicken dry with paper towels, season all ...",misobutter roast chicken with acorn squash pan...
1,crispy salt and pepper potatoes,[1 pound new potatoes about 1 inch in diameter...,Preheat oven to 400F and line a rimmed baking ...,crispy salt and pepper potatoes 1 pound new po...
2,thanksgiving mac and cheese,"[1 lb elbow macaroni, plus more, 2 lb extrasha...",Place a rack in middle of oven; preheat to 400...,thanksgiving mac and cheese 1 lb elbow macaron...
3,italian sausage and bread stuffing,"[1 stick unsalted butter, 5 garlic cloves, 1 ¾...",Preheat oven to 350F with rack in middle. Gene...,italian sausage and bread stuffing 1 stick uns...
4,newtons law,"[1 teaspoon dark brown sugar, 1 ½ oz bourbon, ...",Stir together brown sugar and hot water in a c...,newtons law 1 teaspoon dark brown sugar 1 ½ oz...


In [34]:
# preview combined column of title together with ingredients
combined_df['Title_Ingredients'][0]

'misobutter roast chicken with acorn squash panzanella 2 tbsp finely chopped sage ¼ cup allpurpose flour 2¾ tsp kosher salt cut into 1 pieces 2 medium apples such as gala or pink lady about 14 oz total 2 tbsp extravirgin olive oil kosher salt 2 tsp white miso thinly sliced plus more melted ¼ cup dry white wine 1 3½4lb whole chicken plus 3 tbsp room temperature freshly ground black pepper 1 tbsp finely chopped rosemary 2 tbsp unsalted butter torn into 1 pieces about 2½ cups pinch of crushed red pepper flakes divided freshly ground pepper ½ small red onion ⅓ loaf goodquality sturdy white bread 1 tbsp white miso 2 small acorn squash about 3 lb total 6 tbsp unsalted butter room temperature 2 cups unsalted chicken broth ¼ tsp ground allspice 3 tbsp apple cider vinegar cored'

### TF-IDF Vectorizer
- The TF-IDF vectorizer then creates a vector for each recipe based on both its title and its ingredients. This gives a more comprehensive representation of the recipe.

In [35]:
# initializing the TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer()
# fit the vectorizer
tfidf_matrix = tfidf_vectorizer.fit_transform(combined_df['Title_Ingredients'])


- Check the shape of the matrix and see the features to know how the vectorization has taken place.

In [36]:
# Check the shape of the TF-IDF matrix (rows x features)
print(f'Shape of matrix:{tfidf_matrix.shape}')

# Get the feature names (terms)
tfidf_feature_names = tfidf_vectorizer.get_feature_names_out()
print(f'Feature Names:{tfidf_feature_names[:10]}')  # Show first 10 features (terms)

Shape of matrix:(13635, 13690)
Feature Names:['00' '000mg' '011' '018' '018oz' '02' '025' '028' '035' '037']


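- After vectorization the matrix has 13,635 rows (one per recipe) and 13,690 columns (one per term). The first ten feature names are numeric tokens such as '00' and '000mg', which likely come from quantities in the ingredient text.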

- Next, compute the cosine similarity between every pair of recipes.

In [37]:
#cosine similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

#print the cosine similarity matrix shape
print(f'Cosine Similarity Matrix Shape: {cosine_sim.shape}')


Cosine Similarity Matrix Shape: (13635, 13635)
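
A full 13,635 by 13,635 similarity matrix holds about 186 million entries, which is roughly 1.5 GB if each entry is stored as a 64-bit float. When only one recipe's neighbours are needed, a single row can be computed instead. The sketch below uses a hypothetical toy corpus and relies on the fact that `TfidfVectorizer` L2-normalizes each row by default, so the dot product from `linear_kernel` equals the cosine similarity.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy corpus standing in for combined_df['Title_Ingredients']
toy_docs = [
    "ugali maize flour water",
    "githeri maize beans onion",
    "chapati wheat flour water oil",
]
toy_matrix = TfidfVectorizer().fit_transform(toy_docs)

# Full pairwise matrix, as in the cell above
full_sim = cosine_similarity(toy_matrix, toy_matrix)

# Similarities of document 0 against all documents, without building the full matrix
row_sim = linear_kernel(toy_matrix[0], toy_matrix).ravel()

print(full_sim.shape)                     # (3, 3)
print(np.allclose(full_sim[0], row_sim))  # True
```

The equality holds only while the vectorizer keeps its default `norm='l2'`; with normalization turned off, `cosine_similarity` would still be correct but `linear_kernel` would not.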


In [38]:
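# Example: get the 5 recipes most similar to the first recipe.
# argsort sorts ascending, so the last index is normally the recipe itself (similarity 1.0);
# [-6:-1] skips it and keeps the next five, ordered from least to most similar.
similar_recipes = cosine_sim[0].argsort()[-6:-1]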


In [39]:
#display the first recipe
combined_df.iloc[0][['Title', 'Ingredients', 'Instructions']]

Title           misobutter roast chicken with acorn squash pan...
Ingredients     [2 tbsp finely chopped sage, ¼ cup allpurpose ...
Instructions    Pat chicken dry with paper towels, season all ...
Name: 0, dtype: object

In [40]:
# Display the similar recipes
combined_df.iloc[similar_recipes][['Title', 'Ingredients', 'Instructions']]

Unnamed: 0,Title,Ingredients,Instructions
1232,shrimp empanadas,"[4 garlic cloves, 4 tbsp extravirgin olive oil...","Mix warm lard, salt, vinegar, and 2 cups lukew..."
1028,chopped salad,"[1 small kabocha or acorn squash 23 lb, 1 garl...",Place a rack in the middle of oven and preheat...
590,grilled whole cauliflower with miso mayo,"[leaves removed, 1 tbsp soy sauce, 12 tsp or m...",Prepare a grill for medium-high heat. Sprinkle...
331,risotto with mushrooms and thyme,"[1 tbsp kosher salt, 1 lb mushrooms such as sh...",Combine 1 Tbsp. salt and 10 cups water in a me...
4237,baconcheddar muffins,"[34 cup allpurpose flour, 6 tbsp unsalted butt...",Preheat oven to 400 with rack near top. Line t...


- The table above shows the five recipes with the highest cosine similarity to the first recipe. Because `argsort` sorts in ascending order, they are listed from least to most similar.
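
The slice `[-6:-1]` assumes the query recipe always sorts last, which may not hold if another recipe has identical text. A minimal sketch of a more explicit selection follows; the helper name `top_k_similar` and the score values are hypothetical, chosen only to illustrate the ordering.

```python
import numpy as np

def top_k_similar(sim_row, self_index, k=5):
    """Return the indices of the k highest scores, best first, excluding self_index."""
    order = np.argsort(sim_row)[::-1]    # indices sorted by descending similarity
    order = order[order != self_index]   # drop the query recipe by index, not by position
    return order[:k]

# Hypothetical similarity row for recipe 0 (its own score is 1.0)
scores = np.array([1.0, 0.2, 0.9, 0.0, 0.5])
print(top_k_similar(scores, self_index=0, k=3))  # [2 4 1]
```

Applied to the notebook's data, `top_k_similar(cosine_sim[0], 0)` would return the same five recipes with the most similar one first, assuming no other recipe ties with the query at a similarity of 1.0.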

In [41]:
def clean_title(title):
    # Normalize text by removing punctuation, collapsing extra whitespace, and lowercasing
    title = re.sub(r'[^\w\s]', '', title)  # Remove punctuation (anything that is not a word character or whitespace)
    title = re.sub(r'\s+', ' ', title).strip().lower()  # Collapse whitespace, trim, and convert to lowercase
    return title

# Example user input (title or ingredients)
user_input = "githeri"
user_input_cleaned = clean_title(user_input)  # Clean the user input


- Create a function to recommend the recipes.

In [42]:
#function to recommend recipes
def recommend_recipe(user_input, combined_df, tfidf_vectorizer, tfidf_matrix, n=10):
    # Clean the user input
    user_input_cleaned = clean_title(user_input)

    # Find recipes whose title contains the cleaned input (plain substring match, not a regex)
    matching_title = combined_df[
        combined_df['Title'].str.lower().str.contains(user_input_cleaned, regex=False, na=False)
    ]

    # If any title matches, return those recipes directly
    if not matching_title.empty:
        return matching_title[['Title', 'Ingredients', 'Instructions']]

    # If no title contains the input, fall back to the TF-IDF based recommendation
    # Transform the user input with the already fitted vectorizer
    user_input_transformed = tfidf_vectorizer.transform([user_input_cleaned])

    # Compute the cosine similarity between the user input and every recipe
    cosine_sim = cosine_similarity(user_input_transformed, tfidf_matrix)

    # Get the indices of the top n most similar recipes, most similar first
    similar_recipes = cosine_sim[0].argsort()[-n:][::-1]

    # Return the top n most similar recipes
    return combined_df.iloc[similar_recipes][['Title', 'Ingredients', 'Instructions']]
    

In [43]:
#Example usage
user_input = "githeri"
recommendations = recommend_recipe(user_input, combined_df, tfidf_vectorizer, tfidf_matrix)

# View recommendations
recommendations

Unnamed: 0,Title,Ingredients,Instructions
13545,githeri fresh beans and maize,"[bean pods, spring onion, cooking fat, green m...","Preparation 2 hours , Cooking 20 minutes , Ser..."
13546,githeri stewed maize beans,"[onion, cooking oil, salt, water, maize and be...","Preparation 10 minutes , Cooking 15 minutes , ..."
13547,githeri sauted fresh maize beans,"[onion, kidney beans, cooking fat, green maize...","Preparation 5 hours 30 minutes , Cooking 10 mi..."


- The system works in two stages. If the cleaned input appears in one or more recipe titles (for example "githeri" or "mandazi"), it returns every recipe whose title contains it. Otherwise, for example when the user enters a list of ingredients, it converts the input to a TF-IDF vector, computes its cosine similarity with every recipe, and returns the recipes with the highest scores.

- For the "githeri" query above, the title match returns three githeri recipes.
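
One edge case in the ingredient path: if the input shares no terms with the fitted vocabulary, its TF-IDF vector is all zeros, every cosine similarity is 0, and `argsort` still returns n recipes that are effectively arbitrary. The sketch below shows one possible guard. The function `recommend_by_ingredients`, the `min_score` parameter, and the toy data are all assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for combined_df
toy_df = pd.DataFrame({
    "Title": ["ugali", "githeri", "chapati"],
    "Title_Ingredients": [
        "ugali maize flour water",
        "githeri maize beans onion",
        "chapati wheat flour water oil",
    ],
})
toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_df["Title_Ingredients"])

def recommend_by_ingredients(query, df, vectorizer, matrix, n=2, min_score=0.0):
    """Return up to n rows whose similarity to the query is above min_score."""
    scores = cosine_similarity(vectorizer.transform([query]), matrix).ravel()
    order = np.argsort(scores)[::-1][:n]      # top n indices, best first
    keep = order[scores[order] > min_score]   # drop recipes with no real overlap
    return df.iloc[keep].assign(score=scores[keep])

hits = recommend_by_ingredients("beans onion", toy_df, toy_vectorizer, toy_matrix)
misses = recommend_by_ingredients("saffron truffle", toy_df, toy_vectorizer, toy_matrix)
print(hits[["Title", "score"]])
print(len(misses))  # 0
```

With this guard, an unrecognised query returns an empty result that the interface can report to the user, rather than unrelated recipes.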
