# FridgeForage - Got Ingredients? We’ve Got Ideas.

FridgeForage is the machine-learning-powered meal planner that helps university students turn whatever ingredients they have into quick, budget-friendly meals, with no extra grocery trips needed. Just enter what’s in your fridge, and our smart recipe engine will suggest easy, affordable dishes using what you already own.

Unlike other recipe apps, FridgeForage is built for students who need cheap, no-fuss meals with minimal prep and cleanup. Whether you’re working with leftovers, pantry staples, or mystery ingredients, FridgeForage finds a way to make it work.

Save money. Waste less. Eat better.

# Basic Code - Laying Down the Base Idea

We use a supervised learning method called k-nearest neighbours (KNN). KNN classifies a new data point by comparing it with the closest data points (its neighbours) in a labelled training dataset, and predicts the class that is most common among those *k* nearest neighbours.
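The majority-vote rule can be sketched from scratch in plain Python. This is an illustration only: the toy points, the labels and the `knn_predict` helper are hypothetical, the sketch uses Euclidean distance, and ties are broken in favour of the closer neighbour's label because `Counter.most_common` lists equal counts in first-seen order.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training points by distance to the query and keep the k closest
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    # Counter keeps first-seen order on ties, so a tie goes to the closer neighbour's label
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D toy data: two clusters labelled "sweet" and "savoury"
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["sweet", "sweet", "sweet", "savoury", "savoury", "savoury"]

print(knn_predict(points, labels, (2, 2), k=3))  # sweet
print(knn_predict(points, labels, (7, 8), k=3))  # savoury
```

The first example further down uses scikit-learn's `KNeighborsClassifier` for the same idea, with TF-IDF ingredient vectors in place of these toy points.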

For dish recommendation, KNN does not learn an explicit model. Instead, it compares what the user has in their fridge (or prefers) with every recipe in the dataset and recommends the recipes that are most similar.

Each recipe in the dataset is represented by a set of features, such as ingredients, cooking time, cuisine or dietary restrictions; the examples below use ingredients only. Ingredients can be represented in a few ways:

*   Ingredient-based (binary): each recipe is a vector with one entry per ingredient (e.g., [1, 0, 1, 0], where "1" means the ingredient is present and "0" means it’s absent). This is one-hot encoding applied to a whole ingredient list, often called multi-hot encoding.
*   Weighted (TF-IDF): if you want to account for ingredient importance, the ingredients can instead be vectorised with TF-IDF (Term Frequency-Inverse Document Frequency), which gives ingredients that appear in fewer recipes a larger weight.
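Both representations can be sketched without any library. The toy recipes and helper names below are hypothetical. The TF-IDF weighting assumes scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation), but unlike `TfidfVectorizer` it treats each whole ingredient as one term instead of splitting it into words. Each ingredient appears once per recipe, so its term count equals its presence (0 or 1).

```python
import math

# Hypothetical toy recipes
recipes = {
    "Omelette": ["eggs", "cheese", "milk", "butter"],
    "Sandwich": ["bread", "cheese", "ham", "butter"],
    "Salad": ["lettuce", "tomato", "cucumber", "olive oil"],
}

# Shared vocabulary: every distinct ingredient, in a fixed (sorted) order
vocabulary = sorted({item for items in recipes.values() for item in items})


def binary_vector(ingredients, vocabulary):
    """Multi-hot encoding: 1 if the ingredient is present, 0 if absent."""
    present = set(ingredients)
    return [1 if term in present else 0 for term in vocabulary]


def idf_weights(recipes, vocabulary):
    """Smoothed IDF: ingredients found in fewer recipes get larger weights."""
    n = len(recipes)
    weights = []
    for term in vocabulary:
        df = sum(1 for items in recipes.values() if term in items)
        weights.append(math.log((1 + n) / (1 + df)) + 1)
    return weights


def tfidf_vector(ingredients, vocabulary, idf):
    """Presence times IDF, then scaled to unit (L2) length."""
    raw = [b * w for b, w in zip(binary_vector(ingredients, vocabulary), idf)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw] if norm else raw


idf = idf_weights(recipes, vocabulary)
print(vocabulary)
print(binary_vector(recipes["Omelette"], vocabulary))
print([round(v, 2) for v in tfidf_vector(recipes["Omelette"], vocabulary, idf)])
```

In the TF-IDF vector, "eggs" and "milk" (only in the omelette) end up weighted above "cheese" and "butter" (shared with the sandwich).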

When a user inputs the ingredients available in their fridge, those ingredients are encoded in the same way, and over the same ingredient vocabulary, as the recipes in the dataset, so the user's vector and each recipe vector can be compared entry by entry.
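A minimal sketch of encoding the user's fridge against a fixed recipe vocabulary, assuming binary vectors; `encode_fridge` and the vocabulary are hypothetical. Ingredients that never appear in any recipe have no column, so they cannot affect the match. This mirrors how `tfidf.transform` ignores words that were not seen when the vectoriser was fitted; the sketch reports them separately so they are not lost silently.

```python
def encode_fridge(fridge_items, vocabulary):
    """Encode the user's ingredients as a 0/1 vector over the recipe vocabulary.

    Returns the vector plus any ingredients that are not in the vocabulary.
    """
    # Normalise case and whitespace so "Eggs" and " eggs" match "eggs"
    normalised = {item.strip().lower() for item in fridge_items if item.strip()}
    vector = [1 if term in normalised else 0 for term in vocabulary]
    unknown = sorted(normalised - set(vocabulary))
    return vector, unknown


# Hypothetical vocabulary built from the recipe dataset (same order as the recipe vectors)
vocabulary = ["bread", "butter", "cheese", "eggs", "ham", "milk"]

vector, unknown = encode_fridge(["Eggs", " milk", "chicken", "cheese"], vocabulary)
print(vector)   # [0, 0, 1, 1, 0, 1]
print(unknown)  # ['chicken']
```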

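The KNN algorithm calculates the distance between the user’s fridge vector and the vector of every recipe in the dataset, then returns the *k* recipes with the smallest distance. Common choices are Euclidean distance and cosine distance. Cosine similarity itself grows as vectors become more alike, so nearest-neighbour search uses cosine distance, defined as 1 minus the cosine similarity. The distance function therefore determines how similar the user's available ingredients are considered to be to those in each recipe.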
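The two metrics are closely related. For vectors scaled to unit length, the squared Euclidean distance equals 2 × (1 − cosine similarity), so both metrics rank neighbours identically. `TfidfVectorizer` L2-normalises its output by default, which is why a Euclidean-based `KNeighborsClassifier` and a cosine-based search give the same nearest dish on TF-IDF vectors. The plain-Python sketch below checks this on hypothetical binary vectors; the helper names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Distance form used for nearest-neighbour search: 1 - cosine similarity."""
    return 1 - cosine_similarity(a, b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def unit(v):
    """Scale v to length 1 (L2 normalisation)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


# Hypothetical binary ingredient vectors over a 4-ingredient vocabulary
fridge = [1, 1, 0, 1]
recipe_a = [1, 1, 1, 1]
recipe_b = [0, 1, 1, 0]

for name, recipe in [("A", recipe_a), ("B", recipe_b)]:
    # For unit vectors, squared Euclidean distance / 2 equals cosine distance
    half_sq_euclid = euclidean_distance(unit(fridge), unit(recipe)) ** 2 / 2
    print(name, round(cosine_distance(fridge, recipe), 3), round(half_sq_euclid, 3))
# A 0.134 0.134
# B 0.592 0.592
```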

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset: each dish maps to a comma-separated ingredient string
dishes = {
    "Pasta": {"ingredients": "pasta, tomato sauce, cheese, garlic"},
    "Omelette": {"ingredients": "eggs, cheese, milk, butter"},
    "Salad": {"ingredients": "lettuce, tomato, cucumber, olive oil"},
    "Sandwich": {"ingredients": "bread, cheese, ham, butter"},
    "Fried Rice": {"ingredients": "rice, egg, soy sauce, vegetables"}
}

# Convert data to a DataFrame with one row per dish (dish names become the index)
df = pd.DataFrame(dishes).T

# Represent each dish as a TF-IDF vector over ingredient words
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df['ingredients'])
y = df.index

# Train a KNN model; with k=1 the prediction is the single closest dish
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

def recommend_dish(available_ingredients):
    # Vectorise the user's ingredients with the vocabulary fitted above
    X_input = tfidf.transform([" ".join(available_ingredients)])
    prediction = model.predict(X_input)
    return f"You can make: {prediction[0]}"

if __name__ == "__main__":
    raw = input("Enter available ingredients (comma separated): ")
    # Split on commas and trim spaces, so "egg,rice" and "egg, rice" both work
    user_ingredients = [item.strip().lower() for item in raw.split(",") if item.strip()]
    print(recommend_dish(user_ingredients))
```

```text
Enter available ingredients (comma separated): egg, soy sauce, rice, milk, chicken
You can make: Fried Rice
```
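`TfidfVectorizer` works on words, not whole ingredients: by default it lowercases the text and extracts tokens with the `token_pattern` `(?u)\b\w\w+\b`. So "soy sauce" becomes the two features "soy" and "sauce" (sharing "sauce" with "tomato sauce"), and "egg" and "eggs" are different features. The sketch below reproduces that default split with `re` and contrasts it with a hypothetical comma-based split that keeps each ingredient whole. Passing such a callable as `tokenizer` (with `token_pattern=None`) is one possible way to get ingredient-level features from scikit-learn, though it is not what the code above does.

```python
import re

# scikit-learn's default token_pattern for TfidfVectorizer / CountVectorizer
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def word_tokens(text):
    """Mimic the default word-level tokenisation (lowercase, 2+ word characters)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())


def ingredient_tokens(text):
    """Hypothetical alternative: keep each comma-separated ingredient whole."""
    return [item.strip().lower() for item in text.split(",") if item.strip()]


fried_rice = "rice, egg, soy sauce, vegetables"
print(word_tokens(fried_rice))        # ['rice', 'egg', 'soy', 'sauce', 'vegetables']
print(ingredient_tokens(fried_rice))  # ['rice', 'egg', 'soy sauce', 'vegetables']
```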


# Trying with a Larger Dataset

We use the RecipeNLG dataset (https://www.kaggle.com/code/paultimothymooney/explore-recipe-nlg-dataset?select=RecipeNLG_dataset.csv) to test our idea on real recipes.

```python
!pip install gdown
```

```python
import gdown
import pandas as pd

# Google Drive copy of RecipeNLG_dataset.csv
url = 'https://drive.google.com/uc?id=1fdZuadYNRGukySisNqT9kcivU2OREb_P'
output = 'RecipeNLG_dataset.csv'

# Download the file
gdown.download(url, output, quiet=False)

# Load the dataset (the Python parser engine is slower than pandas' default C engine)
df = pd.read_csv(output, engine='python')

# Display the first few rows of the dataset to check it loaded correctly
df.head()
```


```text
Downloading...
From (original): https://drive.google.com/uc?id=1fdZuadYNRGukySisNqT9kcivU2OREb_P
From (redirected): https://drive.google.com/uc?id=1fdZuadYNRGukySisNqT9kcivU2OREb_P&confirm=t&uuid=da7290b3-11e1-4f8e-874e-264fe9d05e59
To: /content/RecipeNLG_dataset.csv
100%|██████████| 2.29G/2.29G [00:39<00:00, 57.6MB/s]
```


| Unnamed: 0.1 | Unnamed: 0 | title | ingredients | directions | link | source | NER |
|---|---|---|---|---|---|---|---|
| 0 | 0 | No-Bake Nut Cookies | ["1 c. firmly packed brown sugar", "1/2 c. eva... | ["In a heavy 2-quart saucepan, mix brown sugar... | www.cookbooks.com/Recipe-Details.aspx?id=44874 | Gathered | ["brown sugar", "milk", "vanilla", "nuts", "bu... |
| 1 | 1 | Jewell Ball'S Chicken | ["1 small jar chipped beef, cut up", "4 boned ... | ["Place chipped beef on bottom of baking dish.... | www.cookbooks.com/Recipe-Details.aspx?id=699419 | Gathered | ["beef", "chicken breasts", "cream of mushroom... |
| 2 | 2 | Creamy Corn | ["2 (16 oz.) pkg. frozen corn", "1 (8 oz.) pkg... | ["In a slow cooker, combine all ingredients. C... | www.cookbooks.com/Recipe-Details.aspx?id=10570 | Gathered | ["frozen corn", "cream cheese", "butter", "gar... |
| 3 | 3 | Chicken Funny | ["1 large whole chicken", "2 (10 1/2 oz.) cans... | ["Boil and debone chicken.", "Put bite size pi... | www.cookbooks.com/Recipe-Details.aspx?id=897570 | Gathered | ["chicken", "chicken gravy", "cream of mushroo... |
| 4 | 4 | Reeses Cups(Candy) | ["1 c. peanut butter", "3/4 c. graham cracker ... | ["Combine first four ingredients and press in ... | www.cookbooks.com/Recipe-Details.aspx?id=659239 | Gathered | ["peanut butter", "graham cracker crumbs", "bu... |


For now, we use the NER column, which lists each recipe's ingredients in a clean form. We first drop the columns we do not need and take a random sample of 10,000 recipes, because the full dataset is very large: running the KNN code on all of it had still not finished after more than 2.5 hours.
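Loading the whole 2.29 GB CSV before sampling is itself costly. One alternative, assuming a uniform random sample is all that is needed, is reservoir sampling (Algorithm R): stream the rows with Python's `csv` module and keep a fixed-size sample without holding the whole file in memory. The sketch below is illustrative. `reservoir_sample` is a hypothetical helper, the rows it picks will not match those chosen by `df.sample(n=10000, random_state=42)`, and the small in-memory CSV only stands in for `RecipeNLG_dataset.csv`.

```python
import csv
import io
import random


def reservoir_sample(rows, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows
            reservoir.append(row)
        else:
            # Keep row i with probability k / (i + 1), replacing a random slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Toy stand-in for the large CSV; in practice, pass csv.DictReader(open(...)) instead
toy_csv = io.StringIO(
    "title,NER\n"
    + "".join(f'Recipe {n},"[""item{n}""]"\n' for n in range(100))
)
# Raise the csv module's per-field size limit in case some fields are very long
csv.field_size_limit(10_000_000)

reader = csv.DictReader(toy_csv)
sample = reservoir_sample(reader, k=5, seed=42)
print([row["title"] for row in sample])
```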


```python
# Select only relevant columns
df_filtered = df[['NER', 'title', 'directions']]

# Drop rows with missing values in these columns
df_filtered = df_filtered.dropna(subset=['NER', 'title', 'directions'])

# Sample 10,000 random rows for training
df_sampled = df_filtered.sample(n=10000, random_state=42)
```

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def ner_to_text(ner):
    """Convert a NER entry into a comma-separated ingredient string.

    In the CSV, NER is stored as a JSON-encoded list string such as
    '["brown sugar", "milk"]', so parse it before joining when possible.
    """
    if isinstance(ner, list):
        return ', '.join(ner)
    try:
        return ', '.join(json.loads(ner))
    except (TypeError, ValueError):
        return str(ner)

# Keep the original NER column for display and vectorise a cleaned-up copy
df_sampled['NER_text'] = df_sampled['NER'].apply(ner_to_text)

# Initialise TF-IDF vectorizer; features are the ingredient lists
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df_sampled['NER_text'])

# Unsupervised nearest-neighbour search (no labels needed), using cosine distance
knn = NearestNeighbors(n_neighbors=3, metric='cosine')
knn.fit(X)

def recommend_recipes(ingredients_list, num_recommendations=3):
    """Return (title, ingredients, directions) for the closest recipes."""
    input_vector = tfidf.transform([', '.join(ingredients_list)])

    # Find the nearest recipes; n_neighbors here overrides the default of 3 set above
    distances, indices = knn.kneighbors(input_vector, n_neighbors=num_recommendations)

    recommendations = []
    for idx in indices[0]:
        # Look recipes up by row position, not by title: titles are not unique,
        # so a title-keyed dict could return another recipe's details
        row = df_sampled.iloc[idx]
        recommendations.append((row['title'], row['NER'], row['directions']))

    return recommendations

# Example usage
user_ingredients = ["chicken", "butter", "garlic", "rice", "egg", "onion", "beef"]
recommended_recipes = recommend_recipes(user_ingredients, num_recommendations=3)

# Print results
for i, (recipe, ingredients, steps) in enumerate(recommended_recipes, 1):
    print(f"Recommendation {i}: {recipe}")
    print(f"Ingredients: {ingredients}")
    print(f"How to Cook:\n{steps}\n")
```

```text
Recommendation 1: Hamburger Stroganoff - Delicious
Ingredients: ["onion", "garlic", "butter", "ground beef", "salt", "pepper", "mushrooms", "cream of chicken soup", "rice"]
How to Cook:
["Saute onion and garlic in butter over medium heat.", "Stir in ground beef and brown.", "Stir in flour, salt, and pepper -- let cook for about 3 minutes, stir in mushrooms.", "Cook 5 minutes.", "Stir in soup.", "Simmer uncovered 10 minutes.", "Stir in sour cream and heat through (don't allow it to come to a boil).", "Serve in a ring of rice, or buttered noodles."]

Recommendation 2: Kyra'S Mild Or Hot Chicken
Ingredients: ["chicken", "salt", "onion", "garlic"]
How to Cook:
["Preheat oven to 350\u00b0.", "Set the 12 pieces of chicken on a greased pan.", "Shake on the seasoning salt, granulated onion and granulated garlic.", "The more you add, the spicier it tastes.", "Put it in the oven and bake for one hour or until slightly browned and it doesn't bleed.", "Turn off the oven and let cool 5 to 10 minute
```

This version returns the three recipes that best match the provided ingredients, together with the cooking instructions for each.
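The recommender above ignores the `distances` returned by `kneighbors`. Because the model uses `metric='cosine'`, each distance could be turned into a similarity score with `1 - distance`, which may help users see how close a match is. The brute-force sketch below shows the same retrieve-and-score idea in plain Python on binary ingredient sets rather than TF-IDF vectors; the toy recipes (including "Garlic Butter Rice") and the helper names are hypothetical.

```python
import heapq
import math


def cosine_distance_sets(a, b):
    """Cosine distance between two ingredient sets treated as binary vectors."""
    if not a or not b:
        return 1.0
    return 1 - len(a & b) / math.sqrt(len(a) * len(b))


def top_k_recipes(fridge, recipes, k=3):
    """Brute-force k nearest recipes, each returned with a 0-1 similarity score."""
    fridge = {item.lower() for item in fridge}
    # Compare the fridge with every recipe and keep the k smallest distances
    nearest = heapq.nsmallest(
        k, recipes.items(), key=lambda kv: cosine_distance_sets(fridge, set(kv[1]))
    )
    return [(title, round(1 - cosine_distance_sets(fridge, set(items)), 3))
            for title, items in nearest]


recipes = {
    "Fried Rice": ["rice", "egg", "soy sauce", "vegetables"],
    "Omelette": ["eggs", "cheese", "milk", "butter"],
    "Garlic Butter Rice": ["rice", "butter", "garlic"],
}
print(top_k_recipes(["rice", "egg", "garlic", "butter"], recipes, k=2))
# [('Garlic Butter Rice', 0.866), ('Fried Rice', 0.5)]
```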

# Limitations


*   The algorithm does not take into account how much of each ingredient the user has, or how much each recipe needs.
*   The algorithm and dataset do not distinguish between main and supporting ingredients. As a result, the model may recommend a dish for which the user lacks the main ingredient but has most of the optional or supporting ones.
*   The dataset does not include cooking or preparation time, or cuisine.
*   The dataset does not include very basic recipes, so our model currently cannot recommend the simpler dishes that may be easiest for users to make.
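The second limitation can be seen in a small worked example. The sketch below uses binary ingredient sets and cosine similarity in plain Python rather than the TF-IDF model above, and the fridge and recipes are hypothetical. A recipe missing its main ingredient still outranks one the user could cook completely, because cosine similarity only counts overlap and cannot tell which ingredient is essential. Binary sets also discard quantities, which is the first limitation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two ingredient sets treated as binary vectors."""
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical fridge: supporting ingredients only, no chicken
fridge = {"garlic", "onion", "butter", "salt"}

candidates = {
    # Main ingredient (chicken) is missing, but every supporting ingredient matches
    "Garlic Butter Chicken": {"chicken", "garlic", "butter", "onion", "salt"},
    # The user has everything needed for this simpler dish
    "Fried Onions": {"onion", "butter", "salt"},
}

for title, ingredients in candidates.items():
    missing = sorted(ingredients - fridge)
    print(f"{title}: similarity={cosine(fridge, ingredients):.3f}, missing={missing}")
# Garlic Butter Chicken: similarity=0.894, missing=['chicken']
# Fried Onions: similarity=0.866, missing=[]
```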



