# Food Recommender System

# Introduction
The increasing awareness of health and nutrition has led to a growing demand for personalized dietary recommendations. With the vast array of food options and complex nutritional guidelines, individuals often find it challenging to make informed choices that align with their health goals and dietary preferences. A food recommender system emerges as a powerful tool to address this challenge, leveraging data and machine learning to provide personalized food and meal suggestions. Such systems can help users navigate the complexities of nutrition and diet planning, encouraging healthier eating habits and improving overall wellness.

In our fast-paced digital age, we're inundated with vast amounts of data, offering a wealth of information at our fingertips. Yet, sifting through this sea of data to find relevant information tailored to our specific interests and needs can be daunting. This is where recommendation systems come into play, serving as a navigational tool to guide users through the information overload towards their desired outcomes.

Recommendation systems are designed to establish a connection between users and items by identifying similarities among them. This connection is then leveraged to predict and suggest items a user might find appealing.


Aim
The aim of this project is to develop a personalized food recommender system that assists users in making healthier food choices, based on their individual dietary needs, preferences, health objectives, and restrictions. The system will offer tailored recommendations for meals and foods, helping users achieve a balanced diet that supports their overall health and wellness goals.

Objectives
To achieve this aim, the project will focus on several key objectives:

Develop a User Profile Module:
To collect and manage detailed user information, including dietary preferences, health goals, and nutritional requirements.
Integrate a Comprehensive Nutritional Database:
To utilize a rich database of foods and their nutritional content, serving as the foundation for generating accurate recommendations.
Implement a Sophisticated Recommendation Engine:
To employ machine learning algorithms for analyzing user profiles and matching them with suitable food options, ensuring recommendations are personalized and relevant.
Create an Interactive User Interface:
To design an engaging and intuitive interface that allows users to easily input their preferences, interact with recommendations, and track their dietary habits.
Incorporate a Feedback Mechanism:
To enable users to rate recommendations and provide feedback, facilitating continuous improvement of the recommendation engine through adaptive learning.
Ensure Compliance with Dietary Guidelines:
To align the recommendations with established dietary guidelines and nutritional best practices, ensuring healthful advice.
Foster Educational Engagement:
To offer educational resources and insights about nutrition and healthy eating, empowering users with knowledge to make informed dietary decisions.

In today's digital age, we find ourselves submerged in an ocean of data, which, while rich in information, presents a daunting challenge for users attempting to sift through it to find what truly matters to them. Recommendation systems emerged as a beacon of guidance in this complex landscape, ingeniously navigating the user through the clutter to discover products and information aligned with their interests.

Recommendation systems ingeniously bridge the gap between users and products by identifying and leveraging similarities in user preferences and item characteristics. These systems serve a multitude of purposes:

They streamline the search process, enabling users to discover the right products efficiently.
They enhance user engagement, as evidenced by a 40% increase in clicks on Google News, fueled by tailored recommendations.
They support item providers by accurately targeting and delivering products to the appropriate audience, a strategy that accounts for 35% of Amazon's product sales.
They offer personalized content, significantly contributing to user satisfaction on platforms like Netflix, where recommendations drive the majority of movie rentals.
In the realm of nutrition and dietary advice, the scenario is no less complex. The internet is awash with apps, videos, and articles prescribing what to eat and what to avoid. However, the abundance of this information rather than clarifying, often muddies the waters for the public. With conflicting advice and dubious sources, users find themselves at a crossroads, unsure of whom to trust or what guidelines to follow. This confusion, exacerbated by sometimes misleading information, can have direct consequences on an individual's health and well-being.

Recognizing the critical need for reliable and personalized dietary recommendations, our project aims to cut through this noise, providing clear, trustworthy, and scientifically-backed advice tailored to individual dietary needs and preferences. By developing a sophisticated food recommender system, we seek to eliminate the confusion and risk of misinformation, guiding users towards healthier eating habits and, ultimately, fostering better health outcomes. This initiative not only aims to empower users with accurate and personalized dietary insights but also to set a new standard in dietary guidance, making it more accessible, reliable, and conducive to a healthy lifestyle.


How Can a Food Recommender System Make a Difference?
Facilitate Nutritional Choices: Just as recommendation systems help users find the right products, a food recommender system can guide individuals towards healthier eating choices, making it simpler to identify foods that align with their dietary goals and preferences.
Enhance User Engagement: Similar to the 40% increase in clicks observed on Google News due to recommendations, a food recommender system can significantly boost user engagement by providing personalized food suggestions, leading to a more active and involved user base.
Match Dietary Needs with Suitable Foods: As Amazon sees a 35% increase in product sales through recommendations, a food recommender system can adeptly connect users with the right dietary options. This ensures that individuals receive tailored food suggestions that not only cater to their nutritional requirements but also support their overall well-being.
Personalization of Dietary Plans: Echoing Netflix's success, where most rented movies stem from recommendations, a food recommender system can transform the way users approach their diet. By offering personalized meal and food suggestions, it brings a new level of customization to dietary planning, making it easier for users to follow a balanced and nutritious diet tailored specifically to their needs.
In essence, a food recommender system transcends traditional information retrieval methods by providing a curated, personalized experience that aids users in navigating the complexities of nutrition and dietary choices. This not only enhances user satisfaction and engagement but also promotes healthier lifestyle choices, thereby contributing to the well-being of individuals in a meaningful way.

Recommendation systems can be classified into six main types. Popularity-based systems recommend items that are popular among most users. Classification model-based systems use user characteristics and classification algorithms to predict interest in products. Content-based recommendations suggest items similar to those a user has previously liked, focusing on item content. Collaborative filtering assumes users will like items similar to what they've liked before or items liked by users with similar tastes. Hybrid approaches combine multiple recommendation strategies. Association rule mining identifies relationships between items based on their co-occurrence in transactions.

Here I am considering the hybrid approach might be the most effective, as the food recommender system focused on nutritional advice and diet balancing,. This method combines content-based recommendations, which can suggest foods based on their nutritional content and similarity to user preferences, with collaborative filtering, to leverage patterns in user behavior and preferences. Incorporating classification models could help predict whether a user would like a suggested item based on their dietary goals. This multifaceted strategy allows for personalized, accurate, and comprehensive food recommendations.

In [4]:
!pip3 install Flask
!pip3 install spacy


Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [17]:
import numpy as np
import pandas as pd
import os
import tensorflow as tf
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

import math
import json
import time
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
import scipy.sparse
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
import warnings; warnings.simplefilter('ignore')
%matplotlib inline

import spacy
import string
import unicodedata

In [18]:
file_path = 'FoodIngredientsandRecipe.csv'
print(f'File is {os.path.getsize(file_path)/1_000_000:.2f} MB')

File is 20.89 MB


In [84]:
df = pd.read_csv(file_path, index_col=0)

df.head()

Unnamed: 0,Title,Ingredients,Instructions,Image_Name
0,Miso-Butter Roast Chicken With Acorn Squash Panzanella,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher salt, divided, plus more', '2 small acorn squash ...","Pat chicken dry with paper towels, season all over with 2 tsp. salt, and tie legs together with ...",miso-butter-roast-chicken-acorn-squash-panzanella
1,Crispy Salt and Pepper Potatoes,"['2 large egg whites', '1 pound new potatoes (about 1 inch in diameter)', '2 teaspoons kosher sa...","Preheat oven to 400°F and line a rimmed baking sheet with parchment. In a large bowl, whisk the ...",crispy-salt-and-pepper-potatoes-dan-kluger
2,Thanksgiving Mac and Cheese,"['1 cup evaporated milk', '1 cup whole milk', '1 tsp. garlic powder', '1 tsp. onion powder', '1 ...",Place a rack in middle of oven; preheat to 400°. Bring evaporated milk and whole milk to a bare ...,thanksgiving-mac-and-cheese-erick-williams
3,Italian Sausage and Bread Stuffing,"['1 (¾- to 1-pound) round Italian loaf, cut into 1-inch cubes (8 cups)', '2 tablespoons olive oi...",Preheat oven to 350°F with rack in middle. Generously butter baking dish.\nPut bread in 2 shallo...,italian-sausage-and-bread-stuffing-240559
4,Newton's Law,"['1 teaspoon dark brown sugar', '1 teaspoon hot water', '1 ½ oz. bourbon', '½ oz. fresh lemon ju...","Stir together brown sugar and hot water in a cocktail shaker to dissolve. Let cool, then add bou...",newtons-law-apple-bourbon-cocktail


In [22]:
# Rename the index to 'recipe_id'
df.index.name = 'Recipe_id'

# Verify the changes
df.head()

Unnamed: 0_level_0,Title,Ingredients,Instructions,Image_Name
Recipe_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,Miso-Butter Roast Chicken With Acorn Squash Panzanella,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher salt, divided, plus more', '2 small acorn squash ...","Pat chicken dry with paper towels, season all over with 2 tsp. salt, and tie legs together with ...",miso-butter-roast-chicken-acorn-squash-panzanella
1,Crispy Salt and Pepper Potatoes,"['2 large egg whites', '1 pound new potatoes (about 1 inch in diameter)', '2 teaspoons kosher sa...","Preheat oven to 400°F and line a rimmed baking sheet with parchment. In a large bowl, whisk the ...",crispy-salt-and-pepper-potatoes-dan-kluger
2,Thanksgiving Mac and Cheese,"['1 cup evaporated milk', '1 cup whole milk', '1 tsp. garlic powder', '1 tsp. onion powder', '1 ...",Place a rack in middle of oven; preheat to 400°. Bring evaporated milk and whole milk to a bare ...,thanksgiving-mac-and-cheese-erick-williams
3,Italian Sausage and Bread Stuffing,"['1 (¾- to 1-pound) round Italian loaf, cut into 1-inch cubes (8 cups)', '2 tablespoons olive oi...",Preheat oven to 350°F with rack in middle. Generously butter baking dish.\nPut bread in 2 shallo...,italian-sausage-and-bread-stuffing-240559
4,Newton's Law,"['1 teaspoon dark brown sugar', '1 teaspoon hot water', '1 ½ oz. bourbon', '½ oz. fresh lemon ju...","Stir together brown sugar and hot water in a cocktail shaker to dissolve. Let cool, then add bou...",newtons-law-apple-bourbon-cocktail


In [23]:
# Ensure you have NLTK data downloaded (do this once)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\mjoth\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\mjoth\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [25]:
titles = df['Title']
titles.head(10)

Recipe_id
0    Miso-Butter Roast Chicken With Acorn Squash Panzanella
1                           Crispy Salt and Pepper Potatoes
2                               Thanksgiving Mac and Cheese
3                        Italian Sausage and Bread Stuffing
4                                              Newton's Law
5                                              Warm Comfort
6                                        Apples and Oranges
7                                        Turmeric Hot Toddy
8                                   Instant Pot Lamb Haleem
9            Spiced Lentil and Caramelized Onion Baked Eggs
Name: Title, dtype: object

In [26]:
print(f'Count including NaNs: {len(titles)}')
print(titles.describe())

Count including NaNs: 13501
count             13496
unique            13305
top       Potato Latkes
freq                  5
Name: Title, dtype: object


In [27]:
print("Title lengths:")
lengths = titles.str.len()
print(lengths.describe())

Title lengths:
count    13496.000000
mean        32.761633
std         14.756405
min          3.000000
25%         21.000000
50%         31.000000
75%         43.000000
max        112.000000
Name: Title, dtype: float64


In [28]:
# Check null values
print(f'Null value count: {titles.isnull().sum()}')

# Drop null values
titles.dropna(inplace=True)

Null value count: 5


In [103]:
#check null values after dropping
print(f'Null value count: {titles.isnull().sum()}')

df['Title'].describe()

Null value count: 0


count             13496
unique            13305
top       Potato Latkes
freq                  5
Name: Title, dtype: object

In [29]:
# Enumerate characters
chars = set()

for title in titles:
    for char in title:
        chars.add(char)

chars = sorted(chars)
print(f'Unique characters: {len(chars)}')
print(''.join(chars))

Unique characters: 136

 !"#%&'()+,-./012345679:;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ®ÁÉàáâãäçèéêëìíîïñòóôöøùúûüōờ́̃̉Сикнры –—‘’“”강개닭된장전정찌파


In [30]:
# Describe unexpected characters
pd.options.display.max_colwidth = 100
expected_chars = string.ascii_letters + string.digits + ' ' + '!"#%&\'(),-.:? ®ÁÉàáâãäçèéêëìíîïñòóôöøùúûüōờ́̃̉Сикнры –—‘’“”강개닭된장전정찌파'
unexpected_chars = [char for char in chars if char not in expected_chars]

for char in unexpected_chars:
    char_name = ('<control>' if (unicodedata.category(char) == 'Cc') else
        unicodedata.name(char))
    print(f'Unexpected character: {char_name}')
    titles_containing_char = titles[titles.str.contains(char, regex=False)]
    print(f'Titles containing char: {len(titles_containing_char)}')
    titles_containing_char.head()

Unexpected character: <control>
Titles containing char: 3
Unexpected character: PLUS SIGN
Titles containing char: 2
Unexpected character: SOLIDUS
Titles containing char: 6
Unexpected character: SEMICOLON
Titles containing char: 2
Unexpected character: NO-BREAK SPACE
Titles containing char: 2


In [85]:
def clean_ingredients(text):
    # Define a pattern to match common measurement units
    pattern_units = r'\b(?:tsp|tbsp|cup|pound|lb)\.?\b'
    
    # Remove common measurement units
    cleaned_text = re.sub(pattern_units, '', text, flags=re.IGNORECASE)
    
    return cleaned_text.strip()

# Apply the cleaning function to the Ingredients column
df['Cleaned_Ingredients'] = df['Ingredients'].apply(clean_ingredients)

# Tokenize the cleaned ingredients to extract main ingredients
main_ingredients = df['Cleaned_Ingredients'].str.split(',').explode().str.strip().value_counts()

print("Main ingredients:")
main_ingredients.head(20)

Main ingredients:


Cleaned_Ingredients
divided'                         3817
chopped'                         1887
peeled                           1541
finely chopped'                  1414
thinly sliced'                   1331
'Kosher salt'                    1008
minced'                           808
room temperature'                 806
'Kosher salt                      707
coarsely chopped'                 670
'1/2 teaspoon salt'               645
'2 garlic cloves                  616
freshly ground pepper'            595
'1/4 teaspoon salt'               591
'Freshly ground black pepper'     484
plus more'                        478
'1 garlic clove                   470
halved'                           456
peeled'                           434
trimmed                           409
Name: count, dtype: int64

In [89]:
df

Unnamed: 0,Title,Ingredients,Instructions,Image_Name,Cleaned_Ingredients
0,Miso-Butter Roast Chicken With Acorn Squash Panzanella,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher salt, divided, plus more', '2 small acorn squash ...","Pat chicken dry with paper towels, season all over with 2 tsp. salt, and tie legs together with ...",miso-butter-roast-chicken-acorn-squash-panzanella,"[ ' 1 ( 3½–4- . ) chicken ' , ' 2¾ . salt , , plus ' , ' 2 squash ( about 3 . ) ' , ' 2 . finely..."
1,Crispy Salt and Pepper Potatoes,"['2 large egg whites', '1 pound new potatoes (about 1 inch in diameter)', '2 teaspoons kosher sa...","Preheat oven to 400°F and line a rimmed baking sheet with parchment. In a large bowl, whisk the ...",crispy-salt-and-pepper-potatoes-dan-kluger,"[ ' 2 egg whites ' , ' 1 potatoes ( about 1 inch in diameter ) ' , ' 2 salt ' , ' ¾ teaspoon f..."
2,Thanksgiving Mac and Cheese,"['1 cup evaporated milk', '1 cup whole milk', '1 tsp. garlic powder', '1 tsp. onion powder', '1 ...",Place a rack in middle of oven; preheat to 400°. Bring evaporated milk and whole milk to a bare ...,thanksgiving-mac-and-cheese-erick-williams,"[ ' 1 milk ' , ' 1 milk ' , ' 1 . powder ' , ' 1 . onion powder ' , ' 1 . paprika ' , ' ½ . ..."
3,Italian Sausage and Bread Stuffing,"['1 (¾- to 1-pound) round Italian loaf, cut into 1-inch cubes (8 cups)', '2 tablespoons olive oi...",Preheat oven to 350°F with rack in middle. Generously butter baking dish.\nPut bread in 2 shallo...,italian-sausage-and-bread-stuffing-240559,"[ ' 1 ( to 1- ) loaf , into 1 - inch cubes ( 8 cups ) ' , ' 2 tablespoons olive oil , ' , ' 2 po..."
4,Newton's Law,"['1 teaspoon dark brown sugar', '1 teaspoon hot water', '1 ½ oz. bourbon', '½ oz. fresh lemon ju...","Stir together brown sugar and hot water in a cocktail shaker to dissolve. Let cool, then add bou...",newtons-law-apple-bourbon-cocktail,"[ ' 1 teaspoon sugar ' , ' 1 teaspoon water ' , ' 1 ½ oz . bourbon ' , ' ½ oz . lemon juice ' , ..."
...,...,...,...,...,...
13496,Brownie Pudding Cake,"['1 cup all-purpose flour', '2/3 cup unsweetened cocoa powder', '3/4 teaspoon double-acting baki...","Preheat the oven to 350°F. Into a bowl sift together the flour, 1/3 cup of the cocoa powder, the...",brownie-pudding-cake-14408,"[ ' 1 all - purpose flour ' , ' 2/3 cocoa powder ' , ' 3/4 teaspoon - baking powder ' , ' 3/..."
13497,Israeli Couscous with Roasted Butternut Squash and Preserved Lemon,"['1 preserved lemon', '1 1/2 pound butternut squash, peeled and seeded, and cut into 1/4-inch di...","Preheat oven to 475°F.\nHalve lemons and scoop out flesh, keeping both flesh and peel. Cut enoug...",israeli-couscous-with-roasted-butternut-squash-and-preserved-lemon-102250,"[ ' 1 lemon ' , ' 1 1/2 butternut squash , and , and into 1/4 - inch dice ' , ' 3 tablespoons ..."
13498,Rice with Soy-Glazed Bonito Flakes and Sesame Seeds,['Leftover katsuo bushi (dried bonito flakes) from making dashi or 1 cup katsuo bushi fresh from...,"If using katsuo bushi flakes from package, moisten with a few drops of sake or water. Finely cho...",rice-with-soy-glazed-bonito-flakes-and-sesame-seeds-103400,"[ ' Leftover katsuo bushi ( bonito flakes ) from dashi or 1 katsuo bushi from package ' , ' 1 ..."
13499,Spanakopita,"['1 stick (1/2 cup) plus 1 tablespoon unsalted butter', '1 lb baby spinach', '1/2 lb feta, crumb...","Melt 1 tablespoon butter in a 12-inch heavy skillet over moderate heat, then cook spinach, stirr...",spanakopita-107344,"[ ' 1 stick ( 1/2 ) plus 1 tablespoon butter ' , ' 1 baby spinach ' , ' 1/2 feta , ( 2 cups ..."


To extract the main ingredients using NLP, we have to follow these steps:

**Clean the text:** Initially, I'll preprocess the text by removing any irrelevant information such as special characters, numbers, and measurements, ensuring that only the text relevant to ingredients remains.

**Tokenize the text:** Following cleaning, I'll tokenize the preprocessed text, splitting it into individual words or tokens. This step is essential for further analysis as it breaks down the text into manageable units.

**Identify ingredient tokens:** Leveraging NLP techniques like part-of-speech tagging and named entity recognition, I'll identify tokens within the text that likely represent ingredients. This process helps distinguish ingredient-related terms from other text components.

**Filter out non-ingredient tokens:** Once we've identified potential ingredient tokens, I'll filter out any non-ingredient tokens, including measurements, descriptors, and punctuation. This refinement ensures that we're left with only the terms directly related to ingredients.

**Determine main ingredients:** With the remaining tokens, I'll analyze their frequency or relevance within the context of the text to determine the main ingredients. By examining factors like token frequency or association with specific recipes, we can identify the key components of the dish.

After doing all these, I will be creating a separate column named Cleaned_Ingredients and will be storing these new ingredient's list.

In [90]:
import spacy
import re
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.util import compile_prefix_regex, compile_infix_regex, compile_suffix_regex

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")

# Multi-word ingredients to be identified
multi_word_ingredients = [
    "olive oil", "all purpose flour", "chana dal", "egg white", "unsalted butter", "Red pepper flakes",
    "lemon juice", "cocoa powder", "baking powder", "apple cider vinegar", "allpurpose flour",
    "lime juice", "freshly",
    "vanilla extract", "pepper flakes", "unsalted butter","black pepper", "Extravirgin olive oil"
]

# Initialize PhraseMatcher with the shared vocab
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
# Create pattern Doc objects and add them to the matcher
patterns = [nlp(text) for text in multi_word_ingredients]
matcher.add("MULTI_WORD_INGREDIENTS", patterns)

def clean_ingredients_nlp(ingredient_text):
    # Regex to remove numbers (including fractions) and common measurements
    # Improved regex to include Unicode fractions
    ingredient_text = re.sub(r'\b\d+/?\d*\s*|\u00BC|\u00BD|\u00BE|\u2153|\u2154|\u215B', '', ingredient_text)  # Remove numbers, simple fractions, and Unicode fractions
    ingredient_text = re.sub(r'\b(?:tsp|tbsp|cup|cups|pounds?|oz|ounces?|lbs?|g|grams?|mgs?|kgs?|ls?|mls?|quarts?|pints?|gallons?)\b\.?', '', ingredient_text, flags=re.I)

    doc = nlp(ingredient_text.lower())
    matches = matcher(doc)
    spans = []  # To store the matched spans

    # Merge multi-word ingredients into single tokens
    for match_id, start, end in matches:
        span = Span(doc, start, end, label=match_id)
        spans.append(span)
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)

    # Filtering out unwanted descriptors and common measurements
    unwanted_pos = {"VERB", "ADJ"}  # Define the part-of-speech tags for verbs
    common_words = ["and", "or", "with", "plus", "as well as", "including", "such as", "of", "for", "to", "a", "an", "the", "and/or"]

    measurements_descriptors = {
        "cup", "cups", "tsp", "kg", "kgs", "teaspoon", "tablespoon", "divided", "chopped",
        "oz", "ounce", "ounces", "pound", "pounds", "g", "tbsp","total", "like","teaspoons",
        "gram", "grams", "ml", "liter", "slice", "slices", "pinch", "powder","tablespoons",
        "dash", "quarter", "half", "temperature", "lb", "tmp", "room", "white",
        "inch", "diameter", "large", "small", "medium", "hot", "cold", "warm",
        "new", "old", "ripe", "fresh", "dry", "sweet", "melted", "finely", "chopped"
    }
    cleaned_ingredients = [token.text for token in doc 
                           if token.pos_ not in unwanted_pos
                           and token.text not in unwanted_pos
                           and token.pos not in common_words
                           and token.text not in measurements_descriptors
                           and not token.is_punct
                           and not token.is_stop]

    return ", ".join(cleaned_ingredients)

In [91]:
df['Cleaned_Ingredients'] = df['Cleaned_Ingredients'].apply(clean_ingredients_nlp)
df['Cleaned_Ingredients'].head(20)

0     chicken,  , salt, plus, butter, plus,  , ground, allspice, pepper flakes, freshly, pepper,  , lo...
1                                                                   egg, whites, potatoes, salt,  , thyme
2     milk, milk, onion, paprika,  , freshly, salt, plus, cheddar, coarsely,  , cream, cheese, elbow, ...
3     loaf, cubes,  , tablespoons, olive oil,  , sausage, casings, butter, pieces, onions, celery, rib...
4     sugar, water,   , bourbon,   , lemon juice, apple, butter, storebought, orange, twist, freshly, ...
5                                               tea, bags,   , reposado, tequila,   , lemon juice, nectar
6      , grand,  , amaro, averna, pat, butter,  , apple, cider,  , lemon juice, sweetness, cider, fres...
7        , sugar,  , ground, turmeric,   , amontillado, sherry,  , bourbon, rum, scotch, gin,  , turme...
8       , dals, chana dal, moong, dal, masoor, dal, and/or, urad, dal,    , jasmine, rice, grain, rice...
9      , lentil, soup, onion, thinly,  , turme

In [104]:
import spacy

# Load English tokenizer, tagger, parser, and NER
nlp = spacy.load("en_core_web_sm")

# Function to count adjectives and verbs
def count_adjectives_verbs(text):
    # Process the text with spaCy
    doc = nlp(text)
    # Initialize counts
    adjective_count = 0
    verb_count = 0
    # Iterate through tokens in the document
    for token in doc:
        # Check if the token is an adjective
        if token.pos_ == "ADJ":
            adjective_count += 1
        # Check if the token is a verb
        elif token.pos_ == "VERB":
            verb_count += 1
    return adjective_count, verb_count

# Split the text into chunks of manageable size
chunk_size = 100000  # Adjust as needed
chunks = [df['Cleaned_Ingredients'].str.cat(sep=', ').encode('utf-8')[i:i+chunk_size].decode('utf-8') for i in range(0, len(df['Cleaned_Ingredients']), chunk_size)]

# Initialize total counts
total_adjective_count = 0
total_verb_count = 0

# Process each chunk and accumulate counts
for chunk in chunks:
    adjective_count, verb_count = count_adjectives_verbs(chunk)
    total_adjective_count += adjective_count
    total_verb_count += verb_count

print("Total Adjective Count:", total_adjective_count)
print("Total Verb Count:", total_verb_count)


Total Adjective Count: 125
Total Verb Count: 46


In [96]:
# Function to remove adjectives and verbs from cleaned ingredients
def remove_adjectives_verbs(text, max_length=1000000):
    # Check the length of the text
    if len(text) > max_length:
        print("Text length exceeds maximum limit. Skipping removal.")
        return text

    # Process the text with spaCy
    doc = nlp(text)
    # Initialize list to store tokens to keep
    cleaned_tokens = []
    # Iterate through tokens in the document
    for token in doc:
        # Check if the token is not an adjective or a verb
        if token.pos_ not in ["ADJ", "VERB"]:
            cleaned_tokens.append(token.text)
    # Join the cleaned tokens into a single string
    cleaned_text = " ".join(cleaned_tokens)
    return cleaned_text

# Apply the function to remove adjectives and verbs from each cleaned ingredient
df['Cleaned_Ingredients'] = df['Cleaned_Ingredients'].apply(remove_adjectives_verbs)

In [105]:
# Display the DataFrame with cleaned ingredients
df.tail(10)

Unnamed: 0,Title,Ingredients,Instructions,Image_Name,Cleaned_Ingredients
13491,"Tomato, Garlic, and Potato Frittata","['6 whole large eggs', '2 large egg whites', '1/2 cup finely grated parmesan (2 ounces)', '1/3 c...","Whisk together whole eggs, whites, 1/4 cup parmesan, basil, 1/2 teaspoon salt, and 1/4 teaspoon ...",tomato-garlic-and-potato-frittata-105017,"eggs , egg , whites , , thinly , basil , salt , pepper , cloves , thinly , tablespoons , oil ,..."
13492,Cornmeal Pancakes with Honey-Pecan Butter,"['1/2 cup (1 stick) unsalted European-style butter, room temperature', '2 tablespoons honey', 'G...","Using electric mixer, beat 1/2 cup butter, honey, and cinnamon in small bowl until fluffy. Stir ...",cornmeal-pancakes-with-honey-pecan-butter-108554,"stick , style , butter , tablespoons , honey , ground , cinnamon , pecans , plus , tablespoons ,..."
13493,Chocolate Heart Layer Cake with Chocolate-Cinnamon Mousse,"['4 tablespoons unsalted butter, melted', '1/3 cup all purpose flour', '1/3 cup unsweetened coco...",Preheat oven to 400°F. Place 8x8x2-inch heart-shaped cake ring on sheet of foil. Wrap foil up si...,chocolate-heart-layer-cake-with-chocolate-cinnamon-mousse-107716,"tablespoons , butter , all purpose flour , cocoa powder , salt , eggs , sugar , cream , sticks ,..."
13494,Ginger-Pecan Roulade with Honey-Glazed Pecans,"['1/2 stick (1/4 cup) unsalted butter, melted, plus additional for brushing pan', '3/4 cup pecan...","Preheat oven to 350°F. Line bottom and sides of a 15- by 10- by 1-inch jelly-roll pan with foil,...",ginger-pecan-roulade-with-honey-glazed-pecans-104254,"stick , butter , plus , pan , pecans , cake , flour , self , plus , cocoa powder , plus , ground..."
13495,Dandelion Salad with Warm Bacon Dressing,"['1 lb tender dandelion greens, tough stems removed', '5 bacon slices', '1 1/2 tablespoons finel...",Cut greens into 1 1/2-inch lengths and transfer to a large bowl.\nCook bacon in a large heavy sk...,dandelion-salad-with-warm-bacon-dressing-106375,"dandelion , greens , bacon , tablespoons , tablespoons , vinegar , salt , pepper"
13496,Brownie Pudding Cake,"['1 cup all-purpose flour', '2/3 cup unsweetened cocoa powder', '3/4 teaspoon double-acting baki...","Preheat the oven to 350°F. Into a bowl sift together the flour, 1/3 cup of the cocoa powder, the...",brownie-pudding-cake-14408,"purpose , flour , cocoa powder , powder , salt , eggs , sugar , stick , tablespoons , butter , m..."
13497,Israeli Couscous with Roasted Butternut Squash and Preserved Lemon,"['1 preserved lemon', '1 1/2 pound butternut squash, peeled and seeded, and cut into 1/4-inch di...","Preheat oven to 475°F.\nHalve lemons and scoop out flesh, keeping both flesh and peel. Cut enoug...",israeli-couscous-with-roasted-butternut-squash-and-preserved-lemon-102250,"lemon , butternut , squash , dice , tablespoons , oil , onion , , acini , di , pepe , pasta , ..."
13498,Rice with Soy-Glazed Bonito Flakes and Sesame Seeds,['Leftover katsuo bushi (dried bonito flakes) from making dashi or 1 cup katsuo bushi fresh from...,"If using katsuo bushi flakes from package, moisten with a few drops of sake or water. Finely cho...",rice-with-soy-glazed-bonito-flakes-and-sesame-seeds-103400,"katsuo , bushi , bonito , flakes , dashi , katsuo , bushi , package , sake , sugar , tablespoons..."
13499,Spanakopita,"['1 stick (1/2 cup) plus 1 tablespoon unsalted butter', '1 lb baby spinach', '1/2 lb feta, crumb...","Melt 1 tablespoon butter in a 12-inch heavy skillet over moderate heat, then cook spinach, stirr...",spanakopita-107344,"stick , plus , butter , baby , spinach , feta , , freshly , nutmeg , sheets"
13500,"Mexican Poblano, Spinach, and Black Bean ""Lasagne"" with Goat Cheese","['12 medium to large fresh poblano chiles (2 1/4 lb)', '1 (14- to 16-oz) can whole tomatoes incl...","Lay 4 chiles on their sides on racks of gas burners and turn flames on high. Roast chiles, turni...",mexican-poblano-spinach-and-black-bean-lasagne-with-goat-cheese-107413,"chiles , , cloves , cilantro , sugar , salt , tablespoons , oil , cream , epazote , , goat ,..."


In [107]:
import pandas as pd

# Split each row on the comma to get individual ingredients and then use explode to create a row for each ingredient.
ingredients_series = df['Cleaned_Ingredients'].str.split(', ')

# Aggregate ingredients into a list for each recipe
unique_ingredients_per_recipe = ingredients_series.apply(lambda x: ', '.join(set(x)))

# Create a DataFrame with recipe IDs and unique ingredients
unique_ingredients_df = pd.DataFrame({'Title': df['Title'], 'Unique_Ingredients': unique_ingredients_per_recipe})

# Display the DataFrame
unique_ingredients_df.head()


Unnamed: 0,Title,Unique_Ingredients
0,Miso-Butter Roast Chicken With Acorn Squash Panzanella,"loaf , onion , lady , butter , thinly , , pepper, purpose , freshly , pepper flakes , allspi..."
1,Crispy Salt and Pepper Potatoes,"whites , potatoes , egg , salt , , thyme"
2,Thanksgiving Mac and Cheese,"onion , cheese , coarsely , milk , plus , macaroni, freshly , paprika , cheddar , cream , elbow ..."
3,Italian Sausage and Bread Stuffing,"loaf , , butter , , sausage , stock , giblet , coarsely , tablespoons , parmigiano , sodium ..."
4,Newton's Law,", sugar , butter , bourbon , water , lemon juice , storebought , freshly , twist , cinnamon, ..."


Cosine similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. In the context of text analysis and recommender systems, these vectors often represent documents or items (like movie descriptions) in terms of their TF-IDF scores, which indicate how important a word is to a document in a collection of documents.

Value Range: The cosine similarity score ranges from -1 to 1, where 1 means the vectors are identical, 0 means they're orthogonal (no similarity), and -1 means they are diametrically opposed (but this is rare in text analysis as term frequencies cannot be negative).

Why Cosine?: Cosine similarity is particularly useful in text analysis because it is unaffected by the magnitude of the vectors. This means it measures similarity more in terms of orientation and less in terms of size (magnitude), making it ideal for situations where the length of documents varies.

Applying Cosine Similarity
When you compute cosine similarity in the context of TF-IDF vectors from text data:

TF-IDF Vectors: First, each item's description is transformed into a TF-IDF vector. In these vectors, each dimension corresponds to a unique word (or n-gram) in the entire dataset, and the value in each dimension is the TF-IDF score of that word for the given item.

Cosine Similarity Matrix: By calculating the cosine similarity between all pairs of TF-IDF vectors, you obtain a similarity matrix. In this matrix, each row and column represent an item, and each cell (i, j) contains the cosine similarity score between the TF-IDF vectors of items i and j.

Interpreting the Cosine Similarity Matrix
Diagonal: The diagonal of the matrix (where i = j) will always be 1, because the cosine similarity between any vector and itself is 1.

Off-Diagonal Values: The off-diagonal values give you the similarity scores between different items. A higher score indicates a greater similarity between the items in terms of their textual content.

Using the Cosine Similarity Matrix for Recommendations
To use this matrix for making recommendations:

For a Given Item: To find items similar to a given item, look at the row or column corresponding to that item in the similarity matrix. The highest values in this row/column (excluding the diagonal 1) point to the items most similar to the given item.

In [None]:
similarities = cosine_sim[0]  # Get similarity scores for the first item with all items
recommended_item_indices = np.argsort(-similarities)[1:4]  # Indices of top 3 similar items, excluding the item itself

print(f"Recommended items for item 0: {recommended_item_indices}")

Recommended items for item 0: [1283  920 6224]


If we're looking for items similar to item 0, we would examine the first row of the cosine similarity matrix. Let's say the values are [202  920 600], excluding the diagonal, the highest value here is 920, which indicates that item 202 is the most similar to item 920.

In [None]:
df['Cleaned_Ingredients'].iloc[202] # Details of the most similar item

'honey, rosemary, kosher, salt, apple cider vinegar, red, onion, thinly, sliced'

In [None]:
df['Cleaned_Ingredients'].iloc[920]  # Details of the second most similar item


'loaf, countrystyle, bread, torn, extravirgin, olive oil, onions, fennel, bulb, celery, stalks, wine, unsalted butter, parsley, sage, rosemary, thyme, kosher, salt, freshly, ground, black pepper, eggs, lowsodium, chicken, broth'

In [None]:
df['Cleaned_Ingredients'].iloc[600] # Details of the third most similar item

'allpurpose, flour, dusting, kosher, salt, chilled, unsalted butter, cut, apple cider vinegar, heirloom, tomatoes, sliced, thick, garlic, thinly, sliced, firm, cheese, asiago, cheddar, gouda, grated, egg, beaten, blend, flaky, sea, salt, freshly, ground, black pepper, lemon, chives'

This process leverages the computed cosine similarities to guide recommendations, ensuring that recommended items are textually similar to the item of interest, based on their descriptions or other textual features processed through TF-IDF and n-grams.

In [142]:
import requests

api_key = "pMLeIW9BhKSckb6M7KvdgqGkhn7anXQ7YoWS8fwP"

def fetch_nutritional_info(ingredient):
    search_url = "https://api.nal.usda.gov/fdc/v1/foods/search"
    api_key = "pMLeIW9BhKSckb6M7KvdgqGkhn7anXQ7YoWS8fwP"

    params = {
        "query": ingredient,
        "api_key": api_key,
        "pageSize": 2  # Adjust based on how broad you want the search to be
    }
    response = requests.get(search_url, params=params)
    data = response.json()

   # Initialize default nutritional info; adjust as per your requirements
    nutrition_info = {"Recipe":0, "calories": 0, "protein": 0, "carbohydrates": 0, "fiber": 0, "sugar": 0, "sodium": 0}

    if 'foods' in data and len(data['foods']) > 0:
        fdc_id = data['foods'][0]['fdcId']
        
        # Fetching detailed food information
        details_url = f"https://api.nal.usda.gov/fdc/v1/food/{fdc_id}"
        details_response = requests.get(details_url, {"api_key": api_key})
        details_data = details_response.json()

        # Extracting nutritional information
        for nutrient in details_data['foodNutrients']:
            if nutrient['nutrient']['name'] == 'Energy':
                nutrition_info["calories"] = nutrient.get('amount', 0)  # Use 'amount' instead of 'value'
            elif nutrient['nutrient']['name'] == 'Protein':
                nutrition_info["protein"] = nutrient.get('amount', 0)  # Use 'amount' instead of 'value'
            elif nutrient['nutrient']['name'] == 'Carbohydrate, by difference':
                nutrition_info["carbohydrates"] = nutrient.get('amount', 0)
            elif nutrient['nutrient']['name'] == 'Fiber, total dietary':
                nutrition_info["fiber"] = nutrient.get('amount', 0)
            elif nutrient['nutrient']['name'] == 'Sugars, total including NLEA':
                nutrition_info["sugar"] = nutrient.get('amount', 0)
            elif nutrient['nutrient']['name'] == 'Sodium, Na':
                nutrition_info["sodium"] = nutrient.get('amount', 0)   
    return nutrition_info



In [156]:
import pandas as pd

# Initialize an empty list to store the data
nutrition_data = []

# Iterate over the first 50 rows in the DataFrame
for index, row in df.head(5).iterrows():
    # Fetch the title and cleaned ingredient for the current row
    title = row['Title']
    ingredient = row['Cleaned_Ingredients']
    
    # Call fetch_nutritional_info() for the ingredient
    nutritional_info = fetch_nutritional_info(ingredient)
    
    # Preprocess the nutritional information dictionary
    processed_info = {
        'Recipe': title,
        'Calories': nutritional_info.get('calories', 0),
        'Protein': nutritional_info.get('protein', 0),
        'Carbohydrates': nutritional_info.get('carbohydrates', 0),
        'Fiber': nutritional_info.get('fiber', 0),
        'Sugar': nutritional_info.get('sugar', 0),
        'Sodium': nutritional_info.get('sodium', 0)  # Add sodium if available
    }
    
    # Append the processed nutritional information to the list
    nutrition_data.append(processed_info)

# Create a DataFrame from the list of nutritional data
nutrition_df = pd.DataFrame(nutrition_data)

# Now nutrition_df contains the nutritional information for the first 50 items with each nutrient in a separate column



In [157]:
nutrition_df

Unnamed: 0,Recipe,Calories,Protein,Carbohydrates,Fiber,Sugar,Sodium
0,Miso-Butter Roast Chicken With Acorn Squash Panzanella,1099.0,6.09,72.12,21.6,0.0,77.0
1,Crispy Salt and Pepper Potatoes,423.0,5.56,24.45,14.0,0.0,9.0
2,Thanksgiving Mac and Cheese,125.0,6.0,13.5,1.0,0.0,420.0
3,Italian Sausage and Bread Stuffing,230.0,20.0,13.3,0.0,0.76,417.0
4,Newton's Law,22.0,0.35,6.9,0.3,2.52,1.0


In [None]:
nutritional_df.head(10)

In [158]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(df['Cleaned_Ingredients'])

In [159]:
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)


In [161]:
def get_similar_recipes(recipe_id, cosine_sim_matrix, top_n=5):
    recipe_idx = df.index[df['Title'] == recipe_id].tolist()[0]  # Find the index of the given recipe ID
    similarity_scores = list(enumerate(cosine_sim_matrix[recipe_idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[1:top_n+1]  # Exclude the first item (self) and get top N scores
    
    recipe_indices = [i[0] for i in similarity_scores]
    return df['Title'].iloc[recipe_indices]

# Example usage: Get 5 recipes similar to recipe with ID 'XYZ'
similar_recipes = get_similar_recipes(df['Title'].iloc[0], cosine_sim, 5)


This process enables our recommender system to suggest recipes with similar ingredients to any given recipe, enhancing discovery and personalization based on users' tastes or dietary preferences. Adjustments may be needed based on the actual structure and content of our dataset.

In [163]:
def recommend_recipes(title, cosine_sim=cosine_sim):
    # Check if the title exists in the dataset
    if title not in df['Title'].values:
        print("Recipe title not found in the dataset.")
        return []
    
    # Get the index of the recipe that matches the title
    idx = df.index[df['Title'] == title].tolist()[0]

    # Get the pairwise similarity scores of all recipes with that recipe
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the recipes based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar recipes
    sim_scores = sim_scores[1:11]

    # Get the recipe indices
    recipe_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar recipes
    return df['Title'].iloc[recipe_indices]


In [164]:
# Display some recipe titles from the dataset
print(df['Title'].sample(10))


1149               Grilled Rosemary Lamb with Juicy Tomatoes
132      Pork Shoulder Steaks With Horseradish-Mustard Sauce
11171                                Rigatoni with Duck Ragù
6306                       Smoked Summer Tomato Basil Butter
4901                     Celery-Spiked Guacamole with Chiles
3038                                Golden Milk Turmeric Tea
5940                                  Day-After Turkey Stock
2720                                     Spicy Confit Chiles
2411                                  Duchess Baked Potatoes
3921                                      English Pea Hummus
Name: Title, dtype: object


In [165]:
import ast

good_ingredients = ['kale', 'quinoa', 'blueberries']  # Example good ingredients
bad_ingredients = ['sugar', 'butter']  # Example bad ingredients

def filter_recipes(ingredients_str):
    # Convert stringified list back to list
    try:
        ingredients = ast.literal_eval(ingredients_str)
    except:
        # Handle potential error if conversion fails
        return False
    
    # Check for good and bad ingredients
    has_good = any(good in ingredients_str for good in good_ingredients)
    has_bad = any(bad in ingredients_str for bad in bad_ingredients)
    return has_good and not has_bad

# Apply the filter to get a subset of recipes
filtered_df = df[df['Ingredients'].apply(filter_recipes)]

# If filtered_df is empty, handle accordingly
if filtered_df.empty:
    print("No recommendations found based on filtering criteria.")
else:
    # Continue with your recommendation logic here
    # For example, displaying the filtered_df or further processing
    print(filtered_df[['Title', 'Cleaned_Ingredients']])


                                                    Title  \
20     Kale and Pumpkin Falafels With Pickled Carrot Slaw   
53                        Coconut-Creamed Corn and Grains   
255                                    Seed and Nut Bread   
263                         Beans and Greens Polenta Bake   
294             Scallop Rice Bowls With Crunchy Spice Oil   
...                                                   ...   
12969                              Kale and Chickpea Soup   
13015                                           Miso Stew   
13123                    Kale and Potato Spanish Tortilla   
13320                           Cavolo Nero with Cilantro   
13332                                          Minestrone   

                                                                                       Cleaned_Ingredients  
20        ,   ,    , pepitas , seeds ,   ,   ,   , kale , leaves , clove , ,   ,   , chickpeas , garban...  
53     olive oil , chile , jalapeño , thinly , pi

In [166]:
df.head()

Unnamed: 0,Title,Ingredients,Instructions,Image_Name,Cleaned_Ingredients
0,Miso-Butter Roast Chicken With Acorn Squash Panzanella,"['1 (3½–4-lb.) whole chicken', '2¾ tsp. kosher salt, divided, plus more', '2 small acorn squash ...","Pat chicken dry with paper towels, season all over with 2 tsp. salt, and tie legs together with ...",miso-butter-roast-chicken-acorn-squash-panzanella,"chicken , , salt , plus , butter , plus , , ground , allspice , pepper flakes , freshly , pe..."
1,Crispy Salt and Pepper Potatoes,"['2 large egg whites', '1 pound new potatoes (about 1 inch in diameter)', '2 teaspoons kosher sa...","Preheat oven to 400°F and line a rimmed baking sheet with parchment. In a large bowl, whisk the ...",crispy-salt-and-pepper-potatoes-dan-kluger,"egg , whites , potatoes , salt , , thyme"
2,Thanksgiving Mac and Cheese,"['1 cup evaporated milk', '1 cup whole milk', '1 tsp. garlic powder', '1 tsp. onion powder', '1 ...",Place a rack in middle of oven; preheat to 400°. Bring evaporated milk and whole milk to a bare ...,thanksgiving-mac-and-cheese-erick-williams,"milk , milk , onion , paprika , , freshly , salt , plus , cheddar , coarsely , , cream , che..."
3,Italian Sausage and Bread Stuffing,"['1 (¾- to 1-pound) round Italian loaf, cut into 1-inch cubes (8 cups)', '2 tablespoons olive oi...",Preheat oven to 350°F with rack in middle. Generously butter baking dish.\nPut bread in 2 shallo...,italian-sausage-and-bread-stuffing-240559,"loaf , cubes , , tablespoons , oil , , sausage , casings , butter , pieces , onions , celery..."
4,Newton's Law,"['1 teaspoon dark brown sugar', '1 teaspoon hot water', '1 ½ oz. bourbon', '½ oz. fresh lemon ju...","Stir together brown sugar and hot water in a cocktail shaker to dissolve. Let cool, then add bou...",newtons-law-apple-bourbon-cocktail,"sugar , water , , bourbon , , lemon juice , apple , butter , storebought , , twist , fresh..."


In [None]:
# Split the tuple stored in 'Nutrition_Info' into two separate columns
df[['Total_Calories', 'Total_Protein']] = df['Nutrition_Info'].apply(pd.Series)


In [168]:
def filter_recipes_by_ingredient(preferences):
    # Filter recipes based on user preferences
    # preferences is a dict with keys 'include' and 'exclude' pointing to lists of ingredients
    filtered_df = df[df['Cleaned_Ingredients'].apply(lambda x: all(item in x for item in preferences['include']) and not any(item in x for item in preferences['exclude']))]
    return filtered_df

# Example usage:
user_preferences = {
    'include': ['kale', 'quinoa'],
    'exclude': ['sugar', 'butter']
}
filtered_recipes = filter_recipes_by_ingredient(user_preferences)
print(filtered_recipes['Title'])


1454                                    Grain Bowl Soup
3786               Superfood Coconut Curry Salmon Salad
5926    Quinoa Salad with Kale, Pine Nuts, and Parmesan
Name: Title, dtype: object


In [169]:
from IPython.display import HTML, display
import pandas as pd
from itertools import cycle

# Assuming `df` is your dataset with recipes, ingredients, and potentially user preferences

# Example function to categorize recipes based on predefined nutritional guidelines or preferences
def categorize_recipes(recipe):
    if 'kale' in recipe['Cleaned_Ingredients']:
        return 'Foods to Eat'
    elif 'sugar' in recipe['Cleaned_Ingredients']:
        return 'Foods to Avoid'
    else:
        return 'Foods to Eat Occasionally'

# Adding a new column for category based on the categorize_recipes function
df['category'] = df.apply(categorize_recipes, axis=1)

# Function to display recommendations
def display_food_recommendations(df, num_categories=3, num_recommendations=5):
    color_palette = ["#FFB6C1", "#ADD8E6", "#90EE90", "#FFA07A", "#20B2AA", "#778899", "#DAA520"]
    color_cycle = cycle(color_palette)
    
    html_str = "<div style='width: 100%;'>"
    categories = df['category'].unique()
    
    for category in categories:
        category_color = next(color_cycle)  # Cycle through colors for each category
        recommendations = df[df['category'] == category].sample(n=num_recommendations)['Title'].tolist()
        
        html_str += f"<h2>{category}</h2><div style='display: flex; flex-wrap: wrap; gap: 10px;'>"
        for recommendation in recommendations:
            # Display each recommendation in a 'tile' with the category color
            html_str += f"<div style='min-width: 150px; padding: 10px; background-color: {category_color}; color: #000; text-align: center; border-radius: 10px;'>{recommendation}</div>"
        html_str += "</div><br>"
    
    html_str += "</div>"
    display(HTML(html_str))

# Displaying the categorized food recommendations
display_food_recommendations(df)


In [170]:
import requests

def get_nutritional_info(ingredient):
    api_url = "https://api.nal.usda.gov/fdc/v1/foods/search"
    params = {
        "query": ingredient,
        "api_key": "pMLeIW9BhKSckb6M7KvdgqGkhn7anXQ7YoWS8fwP"
    }
    response = requests.get(api_url, params=params)
    data = response.json()
    # Parse and return the desired nutritional information
    return data


In [172]:
from IPython.display import HTML, display
import pandas as pd
from itertools import cycle

# Sample DataFrame initialization (replace with your actual DataFrame loading code)
# df = pd.read_csv('path_to_your_dataset.csv')

# Expanded example function to categorize recipes based on a more comprehensive set of criteria
def categorize_recipes(row):
    categories = []
    ingredients = row['Cleaned_Ingredients']  # Assuming this is a string of comma-separated ingredients

    # Example checks for different categories
    if any(ingredient in ingredients for ingredient in ['kale', 'quinoa', 'blueberries']):
        categories.append('Foods to Eat')
    if any(ingredient in ingredients for ingredient in ['sugar', 'butter']):
        categories.append('Foods to Avoid')
    # Add more conditions for other categories as needed

    return ', '.join(categories) if categories else 'Foods to Eat Occasionally'

# Apply the categorization function
df['category'] = df.apply(categorize_recipes, axis=1)

def display_food_recommendations(df, num_categories=3, num_recommendations=5):
    color_palette = ["#FFB6C1", "#ADD8E6", "#90EE90", "#FFA07A", "#20B2AA", "#778899", "#DAA520"]
    color_cycle = cycle(color_palette)
    
    html_str = "<div style='width: 100%;'>"
    # Splitting categories for each recipe into a list, then flattening the list and getting unique categories
    all_categories = set(category for sublist in df['category'].str.split(', ').tolist() for category in sublist)
    
    for category in all_categories:
        category_color = next(color_cycle)  # Cycle through colors for each category
        # Filtering recipes that contain the current category in their category list
        recommendations = df[df['category'].str.contains(category)].sample(n=min(num_recommendations, len(df))).get('Title', []).tolist()
        
        html_str += f"<h2>{category}</h2><div style='display: flex; flex-wrap: wrap; gap: 10px;'>"
        for recommendation in recommendations:
            # Display each recommendation in a 'tile' with the category color
            html_str += f"<div style='min-width: 150px; padding: 10px; background-color: {category_color}; color: #000; text-align: center; border-radius: 10px;'>{recommendation}</div>"
        html_str += "</div><br>"
    
    html_str += "</div>"
    display(HTML(html_str))

# Ensure your DataFrame 'df' is correctly loaded and structured before calling this function
# display_food_recommendations(df)


In [173]:
from IPython.display import HTML, display
import pandas as pd
from itertools import cycle
import requests
import ast

# Mock function to simulate fetching nutritional info from an API
def fetch_nutritional_info(ingredient):
    # Placeholder for API call to fetch nutritional data
    # Example response structure based on assumed API response
    nutritional_data = {
        'protein': 5,  # grams per 100g
        'carbs': 20,   # grams per 100g
        'fat': 1,      # grams per 100g
        'fiber': 2,    # grams per 100g
        'sodium': 50   # mg per 100g
    }
    return nutritional_data

# Expanded categorization function considering more detailed criteria
def categorize_recipes(row):
    categories = []
    
    try:
        ingredients = ast.literal_eval(row['Cleaned_Ingredients'])
    except ValueError:
        ingredients = [row['Cleaned_Ingredients']]
    
    # Example categorization logic
    for ingredient in ingredients:
        nutrition = fetch_nutritional_info(ingredient)
        
        if nutrition['fiber'] > 5:
            categories.append('High Fiber')
        if nutrition['protein'] > 10:
            categories.append('High Protein')
        if nutrition['sodium'] < 40:
            categories.append('Low Sodium')
        # Extend with more conditions based on nutritional data

    # Example static categorizations based on ingredient names
    if any('kale' in ing for ing in ingredients):
        categories.append('Foods to Eat')
    if any('sugar' in ing for ing in ingredients):
        categories.append('Foods to Avoid')
    # Add more conditions as needed

    return ', '.join(categories) if categories else 'General'

# Assuming df is your DataFrame and it has a 'Cleaned_Ingredients' column
# df['categories'] = df.apply(categorize_recipes, axis=1)

def display_food_recommendations(df, num_recommendations=5):
    color_palette = cycle(["#FFB6C1", "#ADD8E6", "#90EE90", "#FFA07A", "#20B2AA", "#778899", "#DAA520", "#F08080"])
    
    html_str = "<div style='width: 100%;'>"
    all_categories = set(cat.strip() for sublist in df['categories'].str.split(',').tolist() for cat in sublist)
    
    for category in all_categories:
        category_color = next(color_palette)
        recommendations = df[df['categories'].str.contains(category, na=False)].sample(n=num_recommendations).get('Title', []).tolist()
        
        html_str += f"<h2>{category}</h2><div style='display: flex; flex-wrap: wrap; gap: 10px;'>"
        for recommendation in recommendations:
            html_str += f"<div style='min-width: 150px; padding: 10px; background-color: {category_color}; color: #000; text-align: center; border-radius: 10px;'>{recommendation}</div>"
        html_str += "</div><br>"
    
    html_str += "</div>"
    display(HTML(html_str))

# Example usage; ensure your DataFrame 'df' is defined and structured correctly before using it
display_food_recommendations(df)


TypeError: 'float' object is not iterable

In [None]:
from IPython.display import HTML, display
import pandas as pd
from itertools import cycle
import requests
import ast

# Mock function to simulate fetching nutritional info from an API
def fetch_nutritional_info(ingredient):
    # Placeholder for API call to fetch nutritional data
    # Example response structure based on assumed API response
    nutritional_data = {
        'protein': 5,  # grams per 100g
        'carbs': 20,   # grams per 100g
        'fat': 1,      # grams per 100g
        'fiber': 2,    # grams per 100g
        'sodium': 50   # mg per 100g
    }
    return nutritional_data

# Expanded categorization function considering more detailed criteria
def categorize_recipes(row):
    categories = []
    
    try:
        ingredients = ast.literal_eval(row['Cleaned_Ingredients'])
    except ValueError:
        ingredients = [row['Cleaned_Ingredients']]
    
    # Example categorization logic
    for ingredient in ingredients:
        nutrition = fetch_nutritional_info(ingredient)
        
        if nutrition['fiber'] > 5:
            categories.append('High Fiber')
        if nutrition['protein'] > 10:
            categories.append('High Protein')
        if nutrition['sodium'] < 40:
            categories.append('Low Sodium')
        # Extend with more conditions based on nutritional data

    # Example static categorizations based on ingredient names
    if any('kale' in ing for ing in ingredients):
        categories.append('Foods to Eat')
    if any('sugar' in ing for ing in ingredients):
        categories.append('Foods to Avoid')
    # Add more conditions as needed

    return ', '.join(categories) if categories else 'General'

# Assuming df is your DataFrame and it has a 'Cleaned_Ingredients' column
# df['categories'] = df.apply(categorize_recipes, axis=1)

def display_food_recommendations(df, num_recommendations=5):
    color_palette = cycle(["#FFB6C1", "#ADD8E6", "#90EE90", "#FFA07A", "#20B2AA", "#778899", "#DAA520", "#F08080"])
    
    html_str = "<div style='width: 100%;'>"
    all_categories = set(cat.strip() for sublist in df['categories'].str.split(',').tolist() for cat in sublist)
    
    for category in all_categories:
        category_color = next(color_palette)
        recommendations = df[df['categories'].str.contains(category, na=False)].sample(n=num_recommendations).get('Title', []).tolist()
        
        html_str += f"<h2>{category}</h2><div style='display: flex; flex-wrap: wrap; gap: 10px;'>"
        for recommendation in recommendations:
            html_str += f"<div style='min-width: 150px; padding: 10px; background-color: {category_color}; color: #000; text-align: center; border-radius: 10px;'>{recommendation}</div>"
        html_str += "</div><br>"
    
    html_str += "</div>"
    display(HTML(html_str))

# Example usage; ensure your DataFrame 'df' is defined and structured correctly before using it
display_food_recommendations(df)


Conclusion
In conclusion, the personalized food recommender system represents a convergence of technology, nutrition, and user-centric design, aiming to revolutionize the way individuals approach diet and health. Through its implementation, the project seeks to make personalized dietary advice accessible and actionable, supporting a healthier and more informed society.

This project plan sets the stage for detailed planning, development, and execution phases, where each objective will be systematically addressed to realize the vision of a comprehensive and user-friendly food recommender system.

In [174]:
def extract_nutrients(json_data):
    nutrients = {}
    try:
        foods = json_data.get('foods', [])
        for food in foods:
            description = food.get('description')
            food_nutrients = food.get('foodNutrients', [])
            nutrient_info = {nutrient['nutrientName']: nutrient['value'] for nutrient in food_nutrients}
            nutrients[description] = nutrient_info
    except Exception as e:
        print(f"Error extracting nutrients: {e}")
    return nutrients


In [175]:
def calculate_nutritional_score(nutrients):
    score = 0
    # Example scoring logic
    if 'Protein' in nutrients:
        score += nutrients['Protein'] * 4  # Assuming 4 points per gram of protein
    if 'Total lipid (fat)' in nutrients:
        score -= nutrients['Total lipid (fat)']  # Subtract points for fats
    return score


In [177]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Extract features from text descriptions
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(df['Cleaned_Ingredients'])

In [178]:
from flask import Flask, jsonify, request

app = Flask(__name__)

# Mock data for demonstration purposes
recipes = [
    {"id": 1, "title": "Spaghetti Carbonara", "category": "Italian"},
    {"id": 2, "title": "Avocado Toast", "category": "American"},
    {"id": 3, "title": "Masala Dosa", "category": "Indian"}
]

# Route to get all recipes
@app.route('/recipes', methods=['GET'])
def get_recipes():
    return jsonify({"recipes": recipes})

# Route to get a recipe by ID
@app.route('/recipes/<int:recipe_id>', methods=['GET'])
def get_recipe(recipe_id):
    recipe = next((recipe for recipe in recipes if recipe['id'] == recipe_id), None)
    if recipe:
        return jsonify(recipe)
    else:
        return jsonify({"message": "Recipe not found"}), 404

# Route to add a new recipe
@app.route('/recipes', methods=['POST'])
def add_recipe():
    if request.is_json:
        recipe = request.get_json()
        recipes.append(recipe)
        return jsonify(recipe), 201
    else:
        return jsonify({"message": "Request must be JSON"}), 400

if __name__ == '__main__':
    app.run(debug=True)


 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat


SystemExit: 1