<center><h1>PantryPalette: Your Ingredient-Inspired Recipe Guide</h1></center>

# Problem Statement

Every day, households struggle with unused ingredients in their kitchens, often leading to food waste due
to a lack of inspiration or knowledge on how to use them effectively. Many individuals resort to takeout
or repetitive meals, simply because they don’t know what to cook with the ingredients they already have.


Meanwhile, traditional recipe search methods require manual browsing through extensive databases,
leading to time-consuming and frustrating experiences. Users with dietary restrictions, allergies, or
cultural food preferences face an additional challenge in finding recipes that align with their needs.

PantryPalette is designed to solve this problem by offering a seamless, ingredient-based recipe discovery
experience that helps users maximize their groceries, reduce food waste, and discover new meal ideas
with ease.

# Purpose

The purpose of the MLOps project is to develop an end-to-end data science solution that is usable by someone with no technical knowledge. 

It must include the following components
- A process map built which documents the process
- Data ingested from an online source
- A data repository and model repository
- A predictive model built off the data
- The model predictions

  a. Put into a Streamlit application

  b. Via a Docker deployment

  c. Which is accessible to users
  
- A model monitoring dashboard built
- Documentation for the model process and risks with the production

The purpose is not to have a complex model. It is meant to have a simpler model that works! All of this is to be done in teams of 2, along with individual users. Both the modeling teams and users will take part in the presentations

Online Data Ingestion - ReceipeNLG and PinchOfYum (Web Scraped Data)
->
Data Repo - SQL Lite
Model Repo - MLFLOW
Predictive Model - (TDIDF + )


# Workflow

```mermaid
graph TD;
    A[User Inputs Ingredients] --> B[Preprocessing]
    B --> C[Tokenization & Standardization]
    C --> D[Feature Engineering]
    D --> E[TF-IDF Vectorization]
    E --> F[Cosine Similarity Calculation]
    F --> G[Retrieve Relevant Recipes]
    G --> H[Rank Recipes Based on Similarity]
    H --> I[Provide Alternative Ingredient Suggestions]
    I --> J[Display Recipes with Ingredients & Instructions]

# Implementation

In [None]:
import os
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
import joblib
from sklearn.metrics.pairwise import cosine_similarity

## Data Collection and Ingestion

- Get RecipeNLG data (static data) + Spoonacular API (dynamic data)

In [1]:
import pandas as pd

### Static Data Collection (RecipeNLG Dataset)

- Get RecipeNLG data (static data)

In [2]:
receipenlg = pd.read_csv("../dataset/RecipeNLG_dataset.csv")

In [3]:
receipenlg.shape

(2231142, 7)

In [4]:
receipenlg.head()

Unnamed: 0.1,Unnamed: 0,title,ingredients,directions,link,source,NER
0,0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. eva...","[""In a heavy 2-quart saucepan, mix brown sugar...",www.cookbooks.com/Recipe-Details.aspx?id=44874,Gathered,"[""brown sugar"", ""milk"", ""vanilla"", ""nuts"", ""bu..."
1,1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned ...","[""Place chipped beef on bottom of baking dish....",www.cookbooks.com/Recipe-Details.aspx?id=699419,Gathered,"[""beef"", ""chicken breasts"", ""cream of mushroom..."
2,2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg...","[""In a slow cooker, combine all ingredients. C...",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""gar..."
3,3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans...","[""Boil and debone chicken."", ""Put bite size pi...",www.cookbooks.com/Recipe-Details.aspx?id=897570,Gathered,"[""chicken"", ""chicken gravy"", ""cream of mushroo..."
4,4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker ...","[""Combine first four ingredients and press in ...",www.cookbooks.com/Recipe-Details.aspx?id=659239,Gathered,"[""peanut butter"", ""graham cracker crumbs"", ""bu..."


In [5]:
# receipenlg = receipenlg.sample(n=10000, random_state=42).reset_index(drop=True)
# receipenlg.shape

In [6]:
receipenlg.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2231142 entries, 0 to 2231141
Data columns (total 7 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   Unnamed: 0   int64 
 1   title        object
 2   ingredients  object
 3   directions   object
 4   link         object
 5   source       object
 6   NER          object
dtypes: int64(1), object(6)
memory usage: 119.2+ MB


In [7]:
# Drop unnecessary columns
receipenlg.drop(columns=["Unnamed: 0", "link", "source", "NER"], inplace=True)
receipenlg.head()

Unnamed: 0,title,ingredients,directions
0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. eva...","[""In a heavy 2-quart saucepan, mix brown sugar..."
1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned ...","[""Place chipped beef on bottom of baking dish...."
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg...","[""In a slow cooker, combine all ingredients. C..."
3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans...","[""Boil and debone chicken."", ""Put bite size pi..."
4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker ...","[""Combine first four ingredients and press in ..."


In [8]:
df_static = receipenlg.copy()

### Data Ingestion

-  (dynamic data)

In [9]:
df_dynamic = pd.read_csv("../dataset/recipes.csv")
df_dynamic.head()

Unnamed: 0,image,title,description,total time,ingredients,instructions
0,https://pinchofyum.com/wp-content/uploads/Ital...,Big Yummy Italian Salad,"I love a big Italian salad! Crisp romaine, sal...",15 minutes,pepperoncini italian season romain lettuc pank...,['Prep Salad Stuff: Chop all your salad veggie...
1,https://pinchofyum.com/wp-content/uploads/Cris...,Crispy Rice Salad with Cucumbers and Herbs,"Paper-thin veggies, a shower of herbs, a pile ...",30 minutes,cornstarch cucumb jasmin rice ginger garlic br...,['Dressing: Blitz everything up in a blender o...
2,https://pinchofyum.com/wp-content/uploads/Goch...,Incredible Gochujang Sauce,"This delightful, creamy, silky, incredible goc...",5 minutes,rice vinegar garlic soy sauc mayo gochujang sauc,['Mix all ingredients together in a small bowl...
3,https://pinchofyum.com/wp-content/uploads/Air-...,Ridiculously Good Air Fryer Salmon,This air fryer salmon is TOO GOOD. Crisped and...,13 minutes,cornstarch chili powder brown sugar onion powd...,['Prep the salmon: Remove the skin from your s...
4,https://pinchofyum.com/wp-content/uploads/Two-...,Two Huge Chocolate Chip Cookies,Just two chocolate chip cookies – lightly cris...,15 minutes,cornstarch allpurpos flour brown sugar white s...,"['Preheat the oven to 350 degrees.', 'Mix butt..."


In [10]:
# Drop unnecessary columns
df_dynamic.drop(columns=["image", "total time", "description"], inplace=True)
df_dynamic.head()

Unnamed: 0,title,ingredients,instructions
0,Big Yummy Italian Salad,pepperoncini italian season romain lettuc pank...,['Prep Salad Stuff: Chop all your salad veggie...
1,Crispy Rice Salad with Cucumbers and Herbs,cornstarch cucumb jasmin rice ginger garlic br...,['Dressing: Blitz everything up in a blender o...
2,Incredible Gochujang Sauce,rice vinegar garlic soy sauc mayo gochujang sauc,['Mix all ingredients together in a small bowl...
3,Ridiculously Good Air Fryer Salmon,cornstarch chili powder brown sugar onion powd...,['Prep the salmon: Remove the skin from your s...
4,Two Huge Chocolate Chip Cookies,cornstarch allpurpos flour brown sugar white s...,"['Preheat the oven to 350 degrees.', 'Mix butt..."


### Data Preprocessing

- Clean & standardize ingredients

Used regex and string methods to clean and standardize:
- Ingredients: lowercase, alphanumeric filtering, list normalization.
- Instructions: sentence splitting, stripping, and cleaning.


In [11]:
import pandas as pd
import ast
import re

# -----------------------
# Step 1: Standardize Columns
# -----------------------
df_static.rename(columns={'directions': 'instructions'}, inplace=True)
df_dynamic.rename(columns={'Recipe_name': 'title',
                           'Recipe_ingredients': 'ingredients',
                           'Recipe_instructions': 'instructions'}, inplace=True)

# -----------------------
# Step 2: Clean Ingredients
# -----------------------
def process_ingredients(ing):
    if isinstance(ing, list):
        return [re.sub(r'[^a-z0-9\s.]', '', i.lower().strip()) for i in ing]
    elif isinstance(ing, str):
        tokens = ing.split()
        return [re.sub(r'[^a-z0-9\s.]', '', i.lower().strip()) for i in tokens if len(i) > 1]
    else:
        return []

df_static['ingredients'] = df_static['ingredients'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
df_dynamic['ingredients'] = df_dynamic['ingredients'].apply(process_ingredients)

# -----------------------
# Step 3: Clean Instructions
# -----------------------
def process_instructions(instr):
    if isinstance(instr, list):
        return [i.strip() for i in instr if len(i.strip()) > 0]
    elif isinstance(instr, str):
        return [i.strip() for i in instr.split('.') if len(i.strip()) > 0]
    else:
        return []

df_static['instructions'] = df_static['instructions'].apply(process_instructions)
df_dynamic['instructions'] = df_dynamic['instructions'].apply(process_instructions)

# -----------------------
# Step 4: Merge Datasets
# -----------------------
df_combined = pd.concat([df_static, df_dynamic], ignore_index=True)
df_combined.dropna(subset=['ingredients', 'instructions'], inplace=True)

df_combined.head()

Unnamed: 0,title,ingredients,instructions
0,No-Bake Nut Cookies,"[1 c. firmly packed brown sugar, 1/2 c. evapor...","[[""In a heavy 2-quart saucepan, mix brown suga..."
1,Jewell Ball'S Chicken,"[1 small jar chipped beef, cut up, 4 boned chi...","[[""Place chipped beef on bottom of baking dish..."
2,Creamy Corn,"[2 (16 oz.) pkg. frozen corn, 1 (8 oz.) pkg. c...","[[""In a slow cooker, combine all ingredients, ..."
3,Chicken Funny,"[1 large whole chicken, 2 (10 1/2 oz.) cans ch...","[[""Boil and debone chicken, "", ""Put bite size ..."
4,Reeses Cups(Candy),"[1 c. peanut butter, 3/4 c. graham cracker cru...","[[""Combine first four ingredients and press in..."


In [12]:
import os

os.makedirs("../processed_dataset", exist_ok=True)

# Save clean dataset
df_combined.to_csv("../processed_dataset/combined_recipes_cleaned.csv", index=False)
print("Data integration complete. Ready for feature engineering!")

Data integration complete. Ready for feature engineering!


In [13]:
df_combined.dtypes

title           object
ingredients     object
instructions    object
dtype: object

In [14]:
import pandas as pd
import ast
import re
from collections import Counter
import matplotlib.pyplot as plt

# Load your combined dataset
df = pd.read_csv("../processed_dataset/combined_recipes_cleaned.csv") 

# STEP 1: Parse ingredients column into token list
def flatten_and_tokenize(ingredient_col):
    all_tokens = []
    for entry in ingredient_col:
        try:
            ingredients = ast.literal_eval(entry) if isinstance(entry, str) and entry.startswith("[") else [entry]
            for line in ingredients:
                # Lowercase, remove punctuation/numbers
                line = line.lower()
                line = re.sub(r'[^a-z\s]', '', line)
                tokens = line.split()
                all_tokens.extend(tokens)
        except:
            continue
    return all_tokens

# STEP 2: Generate token frequency
tokens = flatten_and_tokenize(df['ingredients'])
token_freq = Counter(tokens)

# STEP 3: Convert to DataFrame for inspection
stopword_df = pd.DataFrame(token_freq.items(), columns=["token", "count"]).sort_values(by="count", ascending=False)

# STEP 4: Show top 100 most common tokens
print(stopword_df.head(100))

# Optional: Save all tokens with counts
stopword_df.to_csv("../processed_dataset/ingredient_token_counts.csv", index=False)

         token    count
730        cup  2474893
0            c  2452925
1188  teaspoon  1425910
78     chopped  1343576
7          tsp  1302747
...        ...      ...
178      about   148721
92      beaten   145966
36        sour   145442
245      beans   144328
40        corn   143472

[100 rows x 2 columns]


In [15]:
import pandas as pd
import re
import ast
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

# Define custom stopwords 
custom_stopwords = stop_words.union({
    # Measurement units
    'c', 'cup', 'cups', 'tsp', 'teaspoon', 'teaspoons', 'tbsp', 'tablespoon', 'tablespoons',
    'oz', 'ounce', 'ounces', 'lb', 'lbs', 'pound', 'pounds', 'g', 'gram', 'grams', 'kg',
    'ml', 'milliliter', 'milliliters', 'l', 'liter', 'liters', 'qt', 'quart', 'quarts',
    'pt', 'pint', 'pints', 'gal', 'gallon', 'gallons', 'pkg', 'pkgs', 'package', 'packages',
    'stick', 'sticks', 'dash', 'pinch', 'can', 'cans', 'fluid', 'fl', 'jar', 'jars',
    'box', 'boxes', 'bottle', 'bottles', 't', 'tbs', 'tbls', 'qt.', 'pt.', 'oz.', 'lb.', 'g.', 'ml.', 'kg.', 'l.', 'pkg.', 'pkt',

    # Preparation and cooking descriptors
    'chopped', 'minced', 'diced', 'sliced', 'grated', 'crushed', 'shredded', 'cut',
    'peeled', 'optional', 'seeded', 'halved', 'coarsely', 'finely', 'thinly', 'roughly',
    'cubed', 'crumbled', 'ground', 'trimmed', 'boneless', 'skinless', 'melted', 'softened',
    'cooled', 'boiled', 'cooked', 'uncooked', 'raw', 'drained', 'rinsed', 'beaten', 'size'

    # Quantity and portion descriptors
    'small', 'medium', 'large', 'extra', 'light', 'dark', 'best', 'fresh', 'freshly',
    'ripe', 'mini', 'whole', 'big', 'room', 'temperature', 'zero', 'one', 'two', 'three',
    'four', 'five', 'six', 'eight', 'ten', 'twelve', 'half', 'third', 'quarter', 'dozen',
    'thousand', 'bite'

    # Filler or generic stopwords
    'plus', 'with', 'without', 'into', 'about', 'of', 'the', 'to', 'for', 'in', 'from',
    'as', 'and', 'or', 'on', 'your', 'if', 'such', 'you', 'use', 'may'
})

def preprocess_ingredients(ingredients):
    try:
        if isinstance(ingredients, str):
            ingredients_list = ast.literal_eval(ingredients) if ingredients.startswith("[") else [ingredients]
        elif isinstance(ingredients, list):
            ingredients_list = ingredients
        else:
            return ""

        cleaned_ingredients = set()

        for ing in ingredients_list:
            ing = re.sub(r'\(.*?\)', '', str(ing)).lower() 
            ing = re.sub(r'[^a-z\s]', '', ing)  
            tokens = word_tokenize(ing)
            tokens = [lemmatizer.lemmatize(token) for token in tokens if token not in custom_stopwords and len(token) > 1]
            if tokens:
                phrase = " ".join(tokens)
                if "oil" not in phrase and "salt" not in phrase and "water" not in phrase:
                    cleaned_ingredients.add(phrase)

        return ", ".join(sorted(cleaned_ingredients))

    except Exception as e:
        print(f"Preprocessing error: {e}")
        return ""

def preprocess_user_ingredients(user_input):
    ingredients = user_input.split(',')
    ingredients_str = str([ing.strip() for ing in ingredients])
    return preprocess_ingredients(ingredients_str)

df_combined = pd.read_csv("../processed_dataset/combined_recipes_cleaned.csv") 

df_combined['ingredients_clean'] = df_combined['ingredients'].apply(preprocess_ingredients)

print(df_combined[['ingredients', 'ingredients_clean']].head())

df_combined.to_csv('../processed_dataset/clean_receipe_dataset.csv', index=False)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/sandhyakilari/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/sandhyakilari/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/sandhyakilari/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


                                         ingredients  \
0  ['1 c. firmly packed brown sugar', '1/2 c. eva...   
1  ['1 small jar chipped beef, cut up', '4 boned ...   
2  ['2 (16 oz.) pkg. frozen corn', '1 (8 oz.) pkg...   
3  ['1 large whole chicken', '2 (10 1/2 oz.) cans...   
4  ['1 c. peanut butter', '3/4 c. graham cracker ...   

                                   ingredients_clean  
0  bite size rice biscuit, broken nut, butter mar...  
1  boned chicken breast, carton sour cream, cream...  
2  butter, cream cheese, frozen corn, garlic powd...  
3  cheese, chicken, chicken gravy, cream mushroom...  
4  butter, chocolate chip, graham cracker crumb, ...  


In [16]:
df_combined

Unnamed: 0,title,ingredients,instructions,ingredients_clean
0,No-Bake Nut Cookies,"['1 c. firmly packed brown sugar', '1/2 c. eva...","['[""In a heavy 2-quart saucepan, mix brown sug...","bite size rice biscuit, broken nut, butter mar..."
1,Jewell Ball'S Chicken,"['1 small jar chipped beef, cut up', '4 boned ...","['[""Place chipped beef on bottom of baking dis...","boned chicken breast, carton sour cream, cream..."
2,Creamy Corn,"['2 (16 oz.) pkg. frozen corn', '1 (8 oz.) pkg...","['[""In a slow cooker, combine all ingredients'...","butter, cream cheese, frozen corn, garlic powd..."
3,Chicken Funny,"['1 large whole chicken', '2 (10 1/2 oz.) cans...","['[""Boil and debone chicken', '"", ""Put bite si...","cheese, chicken, chicken gravy, cream mushroom..."
4,Reeses Cups(Candy),"['1 c. peanut butter', '3/4 c. graham cracker ...","['[""Combine first four ingredients and press i...","butter, chocolate chip, graham cracker crumb, ..."
...,...,...,...,...
2231904,Crockpot Sweet Potato Lentils,"['coconut', 'milk', 'red', 'lentil', 'garlic',...","[""['Place the sweet potatoes, vegetable broth,...","broth, coconut, coriand, garlic, lentil, milk,..."
2231905,Healthy Maple Glazed Pumpkin Muffins,"['purpos', 'flour', 'bake', 'powder', 'whole',...","[""['Preheat the oven to 350 F"", 'Mix the dry i...","bake, butter, egg, flour, granul, mapl, milk, ..."
2231906,Filipino Humba,"['oyster', 'sauc', 'bonein', 'pork', 'belli', ...","[""['Separate the fat from the lean meat by cut...","bay, bean, belli, black, bonein, brown, garlic..."
2231907,Deep Dish Cinnamon Streusel Dessert Pizza,"['cream', 'dough', 'ani', 'pizza', 'dough', 'c...","[""['Crust: Preheat the oven to 425"", 'Generous...","ani, brown, butter, cinnamon, cold, cream, dou..."


## Data Storage

In [17]:
import pickle

with open('../models/recipes.pkl', 'wb') as file:
    pickle.dump(df_combined, file)

In [18]:
import sqlite3
import os
import ast

receipe_df = df_combined[['title', 'ingredients', 'instructions']] 

# Ensure that the database folder exists
os.makedirs("../database", exist_ok=True)

# Connect to (or create) the SQLite database
conn = sqlite3.connect("../database/pantrypalette.db")

# Create a table named 'recipes' and populate it with the DataFrame data
receipe_df.to_sql("recipes", conn, if_exists="replace", index=False)

conn.close()

print("SQLite database 'pantrypalette.db' has been created and populated.")

SQLite database 'pantrypalette.db' has been created and populated.


In [19]:
df_combined

Unnamed: 0,title,ingredients,instructions,ingredients_clean
0,No-Bake Nut Cookies,"['1 c. firmly packed brown sugar', '1/2 c. eva...","['[""In a heavy 2-quart saucepan, mix brown sug...","bite size rice biscuit, broken nut, butter mar..."
1,Jewell Ball'S Chicken,"['1 small jar chipped beef, cut up', '4 boned ...","['[""Place chipped beef on bottom of baking dis...","boned chicken breast, carton sour cream, cream..."
2,Creamy Corn,"['2 (16 oz.) pkg. frozen corn', '1 (8 oz.) pkg...","['[""In a slow cooker, combine all ingredients'...","butter, cream cheese, frozen corn, garlic powd..."
3,Chicken Funny,"['1 large whole chicken', '2 (10 1/2 oz.) cans...","['[""Boil and debone chicken', '"", ""Put bite si...","cheese, chicken, chicken gravy, cream mushroom..."
4,Reeses Cups(Candy),"['1 c. peanut butter', '3/4 c. graham cracker ...","['[""Combine first four ingredients and press i...","butter, chocolate chip, graham cracker crumb, ..."
...,...,...,...,...
2231904,Crockpot Sweet Potato Lentils,"['coconut', 'milk', 'red', 'lentil', 'garlic',...","[""['Place the sweet potatoes, vegetable broth,...","broth, coconut, coriand, garlic, lentil, milk,..."
2231905,Healthy Maple Glazed Pumpkin Muffins,"['purpos', 'flour', 'bake', 'powder', 'whole',...","[""['Preheat the oven to 350 F"", 'Mix the dry i...","bake, butter, egg, flour, granul, mapl, milk, ..."
2231906,Filipino Humba,"['oyster', 'sauc', 'bonein', 'pork', 'belli', ...","[""['Separate the fat from the lean meat by cut...","bay, bean, belli, black, bonein, brown, garlic..."
2231907,Deep Dish Cinnamon Streusel Dessert Pizza,"['cream', 'dough', 'ani', 'pizza', 'dough', 'c...","[""['Crust: Preheat the oven to 425"", 'Generous...","ani, brown, butter, cinnamon, cold, cream, dou..."


## Feature Engineering

- TF-IDF + Cosine Similarity

In [None]:
mlflow.set_tracking_uri("http://localhost:5000")    
mlflow.set_experiment("PantryPalette_Recipe_Search")

In [20]:
from sklearn.model_selection import train_test_split

# 1. Train-Test Split
train_df, test_df = train_test_split(df_combined, test_size=0.3, random_state=42)

print(train_df.shape), print(test_df.shape)

(1562336, 4)
(669573, 4)


(None, None)

In [21]:
train_df

Unnamed: 0,title,ingredients,instructions,ingredients_clean
2098391,Stays Crispy! Our Family's Tempura,"[""1 rice bowl's worth Cake flour"", '4 Ice cube...","['[""Prepare the ingredients: Prepare ingredien...","ice cube, main ingredient, rice bowl worth cak..."
487096,Summer Squash With Taste,"['3 slices bacon, diced', '1 large onion, chop...","['[""Place bacon and onion in medium saucepan o...","onion, pepper taste, slice bacon, small summer..."
529457,Raw Apple Cake,"['1/2 tsp. salt', '2 cups sugar', '3 cup flour...","['[""Mix ingredients by hand DO NOT USE MIXER',...","apple, cinnamon, egg slightly, flour, pecan, s..."
1314823,Corpse Reviver 3000,['3/4 ounce Tenneyson Absinthe Royale or other...,"['[""Combine the absinthe, St-Germain, orange l...","orange coin garnish, orange liqueuer, squeezed..."
1049118,Quail In The Limelight,"['1 medium lime', '6 quail, cleaned and halved...","['[""About 1 hour before serving: Separately, g...","allpurpose flour, avocado garnish, dried mint,..."
...,...,...,...,...
732180,Fruit Dip,"['2 c. light brown sugar', '1 c. sour cream']","['[""Mix well and chill thoroughly', 'Serve wit...","brown sugar, sour cream"
110268,Baked Crab Cakes,"['1/4 c. butter or margarine', '1/2 c. chopped...","['[""Saute onion and pepper in margarine or but...","butter margarine, crab meat, cracker crumb, dr..."
1692743,Cappuccino Creme Caramel,"['120 g sugar', '12 cup water', '12 cup milk, ...","['[""Boil sugar and water until golden', '"", ""P...","egg, instant coffee, milk warmed, sour cream, ..."
2229084,Creamy Roasted Garlic Hummus,"['5 cloves (large) Garlic, Peeled', '1 Tablesp...","['[""Preheat oven to 400 degrees F', 'Line a ba...","chickpea skin removed, clove garlic, lemon jui..."


In [None]:
# 2) Start an MLflow run
with mlflow.start_run(run_name="tfidf_knn_training") as run:
    # Log parameters
    params = {
        "tfidf_max_features": 500,
        "tfidf_ngram_range": (1, 2),
        "nn_n_neighbors": 10,
        "nn_metric": "cosine",
    }
    for k, v in params.items():
        mlflow.log_param(k, v)

    # 3) Vectorize
    vectorizer = TfidfVectorizer(max_features=params["tfidf_max_features"],
                                 ngram_range=params["tfidf_ngram_range"])
    train_vectors = vectorizer.fit_transform(train_df["ingredients"])

    # 4) Fit NearestNeighbors
    nn_model = NearestNeighbors(n_neighbors=params["nn_n_neighbors"],
                                metric=params["nn_metric"])
    nn_model.fit(train_vectors)

    # 5) (Optional) Evaluate on test set: example average similarity
    test_vecs = vectorizer.transform(test_df["ingredients"])
    sims = cosine_similarity(test_vecs, train_vectors)
    avg_sim = sims.max(axis=1).mean()
    mlflow.log_metric("test_avg_max_similarity", float(avg_sim))

    # 6) Log models to MLflow Model Registry
    #    - vectorizer
    mlflow.sklearn.log_model(
        sk_model=vectorizer,
        artifact_path="vectorizer",
        registered_model_name="PantryPaletteVectorizer"
    )
    #    - nearest‑neighbors (as a sklearn model)
    mlflow.sklearn.log_model(
        sk_model=nn_model,
        artifact_path="nn_model",
        registered_model_name="PantryPaletteNNModel"
    )

    # 7) Save and log your training DataFrame
    os.makedirs("processed_dataset", exist_ok=True)
    train_csv = "processed_dataset/train_data.csv"
    train_df.to_csv(train_csv, index=False)
    mlflow.log_artifact(train_csv, artifact_path="training_data")

print("✅ Training run logged in MLflow!")

## Model Development

- Train NLP-based model

In [23]:
def find_similar_recipes(user_input, vectorizer, nn_model, train_df):
    """
    Finds the top 10 similar recipes based on user's ingredients input and shows similarity score.
    """
    user_cleaned = preprocess_user_ingredients(user_input)
    print(f"\n🔍 Processed Input: {user_cleaned}")
    
    # Vectorize the cleaned user input using the improved TF-IDF vectorizer
    user_vector = vectorizer.transform([user_cleaned])
    distances, indices = nn_model.kneighbors(user_vector)

    print("\n🍽️ Top 10 similar recipes:\n")
    for i, (idx, dist) in enumerate(zip(indices[0], distances[0])):
        similarity_score = round((1 - dist) * 100, 2)  # Convert cosine distance to percentage similarity
        title = train_df.iloc[idx]['title'] if 'title' in train_df.columns else 'N/A'
        ingredients = train_df.iloc[idx]['ingredients']
        ingredients_clean = train_df.iloc[idx]['ingredients_clean']

        print(f"{i+1}. Title: {title}")
        print(f"   Similarity: {similarity_score}%")
        print(f"   Ingredients: {ingredients}")
        print(f"   Cleaned Ingredients: {ingredients_clean}\n")

In [24]:
user_input = "milk, eggs, sugar, pasta, maida"
find_similar_recipes(user_input, vectorizer, nn, train_df)


🔍 Processed Input: egg, maida, milk, pasta, sugar

🍽️ Top 10 similar recipes:

1. Title: For Character Bento: Locks of Hair for a Wiener Sausage
   Similarity: 73.81%
   Ingredients: ['1 Wiener sausages', '1 few Pasta']
   Cleaned Ingredients: pasta, wiener sausage

2. Title: Fresh Lasagna & Cannelloni
   Similarity: 73.81%
   Ingredients: ['1 Pasta dough']
   Cleaned Ingredients: pasta dough

3. Title: Pasta Baszul Recipe
   Similarity: 73.81%
   Ingredients: ['pasta', 'bazul']
   Cleaned Ingredients: bazul, pasta

4. Title: Pasta With Pesto 
   Similarity: 73.81%
   Ingredients: ['pesto', 'Pasta']
   Cleaned Ingredients: pasta, pesto

5. Title: Lasagna
   Similarity: 73.81%
   Ingredients: ['lasagna', 'pasta']
   Cleaned Ingredients: lasagna, pasta

6. Title: Hot Dog Octipi 
   Similarity: 68.1%
   Ingredients: ['hot dogs', 'Spinach pasta or other veggie pasta', 'velvita', 'milk', 'salsa']
   Cleaned Ingredients: hot dog, milk, salsa, spinach pasta veggie pasta, velvita

7. Title: E

In [25]:
user_input = "onion, tomato, garlic"
find_similar_recipes(user_input, vectorizer, nn, train_df)


🔍 Processed Input: garlic, onion, tomato

🍽️ Top 10 similar recipes:

1. Title: Three Cheese Baked Ziti
   Similarity: 76.22%
   Ingredients: ['ziti', 'mozzarella', 'chees', 'garlic', 'crush', 'red', 'pepper', 'flake', 'provolon', 'chees', 'crush', 'tomato', 'pancetta', 'onion', 'mascarpon', 'chees', 'tomato', 'sauc', 'tomato', 'past']
   Cleaned Ingredients: chees, crush, flake, garlic, mascarpon, mozzarella, onion, pancetta, past, pepper, provolon, red, sauc, tomato, ziti

2. Title: Punjabi Sookhi Urad Daal
   Similarity: 73.09%
   Ingredients: ['tomato', 'asafoetida']
   Cleaned Ingredients: asafoetida, tomato

3. Title: Loaded Caprese Grilled Cheese
   Similarity: 72.75%
   Ingredients: ['mozzarella', 'chees', 'cherri', 'tomato', 'pesto', 'balsam', 'vinegar', 'garlic', 'garlic', 'butter', 'tomato', 'sauc']
   Cleaned Ingredients: balsam, butter, chees, cherri, garlic, mozzarella, pesto, sauc, tomato, vinegar

4. Title: Roasted Tomato Puttanesca
   Similarity: 69.07%
   Ingredients

In [26]:
import joblib

os.makedirs("../models", exist_ok=True)

# Save vectorizer and nearest neighbors model
joblib.dump(vectorizer, "../models/tfidf_vectorizer.pkl")
joblib.dump(nn, "../models/nearest_neighbors_model.pkl")

# Optionally save the training DataFrame (with titles + ingredients)
train_df.to_csv("../processed_dataset/trained_data.csv", index=False)

## Docker Containers

## UI Development

- Streamlit

## Model Monitoring

- PowerBI

<center><h4>Enjoy Your Perfect Recipe! 🍽️</h4></center>
<center>Designed By: Madhurya Shankar & Sandhya Kilari</center>