# Team Shallot ML4VA Project: Personalized Meal Planning on a Budget

*In today’s fast-paced world, many people struggle to maintain balanced, healthy diets that fit their preferences, budgets, and dietary needs. This is especially true for UVA students and Charlottesville residents, who often face constraints such as limited time and budgets. Our goal is to automate meal planning based on user-specific inputs, making it easier for users to create economical and nutritious meal plans.*

Our project addresses the challenges faced by UVA students and Charlottesville residents in maintaining a healthy, budget-conscious diet. We are developing a personalized meal-planning application that recommends meal plans based on dietary preferences, budget, ingredient availability, and location. The application aims to promote healthier and more affordable eating habits while being accessible to low-income communities.

## 1. Data Preprocessing

### Nutritional Information
https://fdc.nal.usda.gov/download-datasets


| **Feature**          | **Description**                             | **Example**             |
|-----------------------|---------------------------------------------|-------------------------|
| `fdc_id`             | Unique food identifier                      | 320020                 |
| `description`        | Food name                                   | "Hummus"               |
| `protein`            | Protein content per serving (g)             | 3.47                   |
| `fat`                | Fat content per serving (g)                 | 8.37                   |
| `carbohydrates`      | Carbohydrate content per serving (g)        | 4.07                   |
| `calories`           | Total calories per serving (KCAL)           | 56                     |
| `food_category`      | Category of food                            | "Legumes and Products" |
| `serving_size`       | Serving size in grams                       | 35.8                   |
| `serving_unit`       | Serving measurement unit                    | "tablespoon"           |
| `price_per_serving`  | Cost per serving (in currency)               | 1.50                   |
| `available_in_location` | Availability by location (region/flag)    | "USA, EU"              |


In [1]:
import pandas as pd

data_folder = "../DATA/FoodData_Central_foundation_food_csv_2024-10-31/FoodData_Central_foundation_food_csv_2024-10-31/"
output_file = data_folder + "nutritional_information.csv"

food = pd.read_csv(data_folder + "food.csv")
food_nutrient = pd.read_csv(data_folder + "food_nutrient.csv")
nutrient = pd.read_csv(data_folder + "nutrient.csv")
food_category = pd.read_csv(data_folder + "food_category.csv")
food_portion = pd.read_csv(data_folder + "food_portion.csv")

# From food.csv
food = food[['fdc_id', 'description', 'food_category_id']]

# From food_nutrient.csv
food_nutrient = food_nutrient[['fdc_id', 'nutrient_id', 'amount']]

# From nutrient.csv
nutrient = nutrient[['id', 'name', 'unit_name']]
nutrient.columns = ['nutrient_id', 'nutrient_name', 'unit_name']

# From food_category.csv
food_category = food_category[['id', 'description']]
food_category.columns = ['food_category_id', 'food_category']

# From food_portion.csv
food_portion = food_portion[['fdc_id', 'gram_weight']]

# 1. Add nutrient names and units to food_nutrient
food_nutrient = food_nutrient.merge(nutrient, on='nutrient_id', how='inner')

# 2. Pivot food_nutrient to have one row per food with macronutrient columns
macronutrients = food_nutrient.pivot_table(index='fdc_id', 
                                           columns='nutrient_name', 
                                           values='amount', 
                                           aggfunc='first').reset_index()

# 3. Merge macronutrients with food
nutritional_data = food.merge(macronutrients, on='fdc_id', how='inner')

# 4. Merge with food_category
nutritional_data = nutritional_data.merge(food_category, on='food_category_id', how='inner')

# 5. Add serving size information
nutritional_data = nutritional_data.merge(food_portion, on='fdc_id', how='left')
nutritional_data.rename(columns={'gram_weight': 'serving_size'}, inplace=True)

# Rename columns for clarity
nutritional_data = nutritional_data.rename(columns={
    'description': 'food_name',
    'Protein': 'protein',
    'Total lipid (fat)': 'fat',
    'Carbohydrate, by difference': 'carbohydrates',
    'Energy': 'calories'
})

# Keep only relevant features
final_columns = ['fdc_id', 'food_name', 'protein', 'fat', 'carbohydrates', 'calories', 
                 'food_category', 'serving_size']
nutritional_data = nutritional_data[final_columns]

# Save to CSV
nutritional_data.to_csv(output_file, index=False)

print(f"nutritional_information.csv saved in {data_folder}")


  food_nutrient = pd.read_csv(data_folder + "food_nutrient.csv")


nutritional_information.csv saved in ../DATA/FoodData_Central_foundation_food_csv_2024-10-31/FoodData_Central_foundation_food_csv_2024-10-31/


In [2]:
import pandas as pd

input_file = "../DATA/nutritional_information.csv"
data = pd.read_csv(input_file)

print("Dataset Head:")
print(data.head())

print("\nMissing Values Per Feature:")
print(data.isnull().sum())

categorical_variables = data.select_dtypes(include=['object', 'category']).columns
print("\nCategorical Variables:")
print(categorical_variables)

for cat_var in categorical_variables:
    print(f"\nUnique values in '{cat_var}':")
    print(data[cat_var].unique())

print("\nMissing Values in Categorical Variables:")
print(data[categorical_variables].isnull().sum())


Dataset Head:
   fdc_id food_name  protein   fat  carbohydrates  calories  \
0  319877    Hummus      NaN  19.0            NaN       NaN   
1  319878    Hummus      NaN   NaN            NaN       NaN   
2  319882    Hummus      NaN  18.7            NaN       NaN   
3  319883    Hummus      NaN   NaN            NaN       NaN   
4  319884    Hummus      NaN   NaN            NaN       NaN   

                 food_category  serving_size  \
0  Legumes and Legume Products           NaN   
1  Legumes and Legume Products           NaN   
2  Legumes and Legume Products           NaN   
3  Legumes and Legume Products           NaN   
4  Legumes and Legume Products           NaN   

                                 dietary_preferences  
0  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
1  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
2  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
3  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
4  ['vegetarian', 'vegan', 'gluten-free', '

In [None]:
from sklearn.impute import SimpleImputer


input_file = "../DATA/nutritional_information.csv"  
data = pd.read_csv(input_file)

# Impute missing numerical values
numerical_columns = ['protein', 'fat', 'carbohydrates', 'calories', 'serving_size']
numerical_imputer = SimpleImputer(strategy='mean')

data[numerical_columns] = numerical_imputer.fit_transform(data[numerical_columns])

# Drop rows with missing categorical values
categorical_columns = ['food_name']
data = data.dropna(subset=categorical_columns)

# Check the updated dataset for missing values
missing_values_after = data.isnull().sum()

# Display the head of the cleaned dataset and the missing values summary
data_head_cleaned = data.head()

data_head_cleaned, missing_values_after


(   fdc_id food_name    protein        fat  carbohydrates    calories  \
 0  319877    Hummus  19.434893  19.000000      18.060414  437.578883   
 1  319878    Hummus  19.434893   8.644976      18.060414  437.578883   
 2  319882    Hummus  19.434893  18.700000      18.060414  437.578883   
 3  319883    Hummus  19.434893   8.644976      18.060414  437.578883   
 4  319884    Hummus  19.434893   8.644976      18.060414  437.578883   
 
                  food_category  serving_size  \
 0  Legumes and Legume Products    180.880749   
 1  Legumes and Legume Products    180.880749   
 2  Legumes and Legume Products    180.880749   
 3  Legumes and Legume Products    180.880749   
 4  Legumes and Legume Products    180.880749   
 
                                  dietary_preferences  
 0  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
 1  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
 2  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
 3  ['vegetarian', 'vegan', 'gluten-free

In [4]:
import pandas as pd

# Load the nutritional information dataset
input_file = "../DATA/nutritional_information.csv"
data = pd.read_csv(input_file)

# Define the extended dietary preferences mapping
dietary_mapping = {
    "Legumes and Legume Products": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo"],
    "Dairy and Egg Products": ["vegetarian", "kosher"],
    "Beef Products": ["keto", "paleo", "halal", "kosher"],
    "Vegetables and Vegetable Products": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo", "keto"],
    "Spices and Herbs": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo", "keto"],
    "Sausages and Luncheon Meats": ["keto", "halal", "kosher"],
    "Nut and Seed Products": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo", "keto"],
    "Soups, Sauces, and Gravies": ["vegetarian", "vegan", "gluten-free", "dairy-free"],
    "Fruits and Fruit Juices": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo"],
    "Baked Products": ["vegetarian", "kosher"],
    "Fats and Oils": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo", "keto"],
    "Poultry Products": ["keto", "paleo", "halal", "kosher"],
    "Finfish and Shellfish Products": ["keto", "paleo", "gluten-free", "dairy-free", "pescatarian"],
    "Sweets": ["vegetarian", "kosher", "gluten-free"],
    "Restaurant Foods": ["vegetarian", "vegan", "gluten-free", "dairy-free"],
    "Pork Products": [],
    "Cereal Grains and Pasta": ["vegetarian", "vegan"],
    "Beverages": ["vegetarian", "vegan", "gluten-free", "dairy-free", "paleo", "keto"]
}

# Function to map food category to dietary preferences
def map_dietary_preferences(category):
    return dietary_mapping.get(category, [])

# Apply mapping to create a new column
data['dietary_preferences'] = data['food_category'].apply(map_dietary_preferences)

# Show the first few rows with the new column
print(data[['food_name', 'food_category', 'dietary_preferences']].head())

# Example: Filter for keto-compatible ingredients
keto_ingredients = data[data['dietary_preferences'].apply(lambda x: 'keto' in x)]
print("Keto Ingredients:")
print(keto_ingredients[['food_name', 'food_category']])

# Save the updated dataset
output_file = "../DATA/nutritional_information.csv"
data.to_csv(output_file, index=False)
print(f"Updated dataset saved to '{output_file}'.")



  food_name                food_category  \
0    Hummus  Legumes and Legume Products   
1    Hummus  Legumes and Legume Products   
2    Hummus  Legumes and Legume Products   
3    Hummus  Legumes and Legume Products   
4    Hummus  Legumes and Legume Products   

                                 dietary_preferences  
0  [vegetarian, vegan, gluten-free, dairy-free, p...  
1  [vegetarian, vegan, gluten-free, dairy-free, p...  
2  [vegetarian, vegan, gluten-free, dairy-free, p...  
3  [vegetarian, vegan, gluten-free, dairy-free, p...  
4  [vegetarian, vegan, gluten-free, dairy-free, p...  
Keto Ingredients:
                                               food_name  \
373    Proximates, Beef, Eye of Round roast/steak, le...   
374    Se - Beef,  Eye of Round roast/steak, select, ...   
375    Minerals - Beef,  Eye of Round roast/steak, se...   
376    Proximates, Beef, Eye of Round roast/steak, le...   
377    B12, B6, B3, B2 - Beef, Eye of Round roast/ste...   
...                        

In [None]:
import pandas as pd
import json


nutritional_file = "../DATA/nutritional_information.csv"
train_file = "../DATA/Recipe/train.json"
test_file = "../DATA/Recipe/test.json"


ingredients_data = pd.read_csv(nutritional_file)


with open(train_file, 'r') as f:
    train_recipes = json.load(f)

with open(test_file, 'r') as f:
    test_recipes = json.load(f)


print("Nutritional Data Sample:")
print(ingredients_data.head())

print("\nTrain Recipe Sample:")
print(train_recipes[0])


Nutritional Data Sample:
   fdc_id food_name  protein   fat  carbohydrates  calories  \
0  319877    Hummus      NaN  19.0            NaN       NaN   
1  319878    Hummus      NaN   NaN            NaN       NaN   
2  319882    Hummus      NaN  18.7            NaN       NaN   
3  319883    Hummus      NaN   NaN            NaN       NaN   
4  319884    Hummus      NaN   NaN            NaN       NaN   

                 food_category  serving_size  \
0  Legumes and Legume Products           NaN   
1  Legumes and Legume Products           NaN   
2  Legumes and Legume Products           NaN   
3  Legumes and Legume Products           NaN   
4  Legumes and Legume Products           NaN   

                                 dietary_preferences  
0  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
1  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
2  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
3  ['vegetarian', 'vegan', 'gluten-free', 'dairy-...  
4  ['vegetarian', 'vegan', 'glut

In [None]:
# Function to map dietary preferences
def map_recipe_dietary_preferences(recipe_ingredients, ingredients_df):
    preferences = set()
    for ingredient in recipe_ingredients:
        matches = ingredients_df[ingredients_df['food_name'].str.contains(ingredient, case=False, na=False)]
        for pref in matches['dietary_preferences']:
            if isinstance(pref, str):
                preferences.update(eval(pref))
    return list(preferences)

# Function to compute macronutrients
def map_recipe_macros(recipe_ingredients, ingredients_df):
    macros = {"protein": 0, "fat": 0, "carbohydrates": 0, "calories": 0}
    for ingredient in recipe_ingredients:
        matches = ingredients_df[ingredients_df['food_name'].str.contains(ingredient, case=False, na=False)]
        if not matches.empty:
            macros['protein'] += matches['protein'].mean(skipna=True)
            macros['fat'] += matches['fat'].mean(skipna=True)
            macros['carbohydrates'] += matches['carbohydrates'].mean(skipna=True)
            macros['calories'] += matches['calories'].mean(skipna=True)
    return macros

# Map both to recipes with progress tracking
total_recipes = len(train_recipes)

for i, recipe in enumerate(train_recipes):
    recipe['dietary_preferences'] = map_recipe_dietary_preferences(
        recipe['ingredients'], ingredients_data
    )
    macros = map_recipe_macros(recipe['ingredients'], ingredients_data)
    recipe.update(macros)
    

    if i % 10 == 0 or i == total_recipes - 1:  
        print(f"Processed {i + 1}/{total_recipes} recipes...")

print("Updated Recipe with Macro and Dietary Information:")
print(train_recipes[0])


print("Updated Recipe with Macro and Dietary Information:")
print(train_recipes[0])


Processed 1/39774 recipes...
Processed 11/39774 recipes...
Processed 21/39774 recipes...
Processed 31/39774 recipes...
Processed 41/39774 recipes...
Processed 51/39774 recipes...
Processed 61/39774 recipes...
Processed 71/39774 recipes...
Processed 81/39774 recipes...
Processed 91/39774 recipes...
Processed 101/39774 recipes...
Processed 111/39774 recipes...
Processed 121/39774 recipes...
Processed 131/39774 recipes...
Processed 141/39774 recipes...
Processed 151/39774 recipes...
Processed 161/39774 recipes...
Processed 171/39774 recipes...
Processed 181/39774 recipes...
Processed 191/39774 recipes...
Processed 201/39774 recipes...
Processed 211/39774 recipes...
Processed 221/39774 recipes...
Processed 231/39774 recipes...
Processed 241/39774 recipes...
Processed 251/39774 recipes...
Processed 261/39774 recipes...
Processed 271/39774 recipes...
Processed 281/39774 recipes...


  matches = ingredients_df[ingredients_df['food_name'].str.contains(ingredient, case=False, na=False)]
  matches = ingredients_df[ingredients_df['food_name'].str.contains(ingredient, case=False, na=False)]


Processed 291/39774 recipes...
Processed 301/39774 recipes...
Processed 311/39774 recipes...
Processed 321/39774 recipes...
Processed 331/39774 recipes...
Processed 341/39774 recipes...
Processed 351/39774 recipes...
Processed 361/39774 recipes...
Processed 371/39774 recipes...
Processed 381/39774 recipes...
Processed 391/39774 recipes...
Processed 401/39774 recipes...
Processed 411/39774 recipes...
Processed 421/39774 recipes...
Processed 431/39774 recipes...
Processed 441/39774 recipes...
Processed 451/39774 recipes...
Processed 461/39774 recipes...
Processed 471/39774 recipes...
Processed 481/39774 recipes...
Processed 491/39774 recipes...
Processed 501/39774 recipes...
Processed 511/39774 recipes...
Processed 521/39774 recipes...
Processed 531/39774 recipes...
Processed 541/39774 recipes...
Processed 551/39774 recipes...
Processed 561/39774 recipes...
Processed 571/39774 recipes...
Processed 581/39774 recipes...
Processed 591/39774 recipes...
Processed 601/39774 recipes...
Processe

In [None]:
# User preferences including dietary and macro ranges
user_preferences = [
    {"user_id": 1, "dietary": "vegan", "cuisine": "greek", "protein_range": (10, 30), "carb_range": (10, 50)},
    {"user_id": 2, "dietary": "vegetarian", "cuisine": "southern_us", "protein_range": (5, 20), "carb_range": (20, 60)},
]

# Filter recipes
def filter_recipes(user, recipes):
    filtered_recipes = []
    for recipe in recipes:
        if (
            user['dietary'] in recipe['dietary_preferences'] and
            user['cuisine'] == recipe['cuisine'] and
            user['protein_range'][0] <= recipe['protein'] <= user['protein_range'][1] and
            user['carb_range'][0] <= recipe['carbohydrates'] <= user['carb_range'][1]
        ):
            filtered_recipes.append(recipe['id'])
    return filtered_recipes

# Generate user-recipe interactions
user_recipe_interactions = []
for user in user_preferences:
    matching_recipes = filter_recipes(user, train_recipes)
    for recipe_id in matching_recipes:
        user_recipe_interactions.append({"user_id": user['user_id'], "recipe_id": recipe_id, "interaction": 1})

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Create interaction matrix
interaction_df = pd.DataFrame(user_recipe_interactions)
interaction_matrix = interaction_df.pivot_table(
    index="user_id", columns="recipe_id", values="interaction", fill_value=0
)

# Compute similarity
recipe_similarity = cosine_similarity(interaction_matrix.T)

# Recommend recipes
def recommend_recipes(user_id, n=3):
    user_vector = interaction_matrix.loc[user_id].values
    scores = recipe_similarity.dot(user_vector) / recipe_similarity.sum(axis=1)
    recommended_indices = scores.argsort()[::-1][:n]
    recommended_recipe_ids = interaction_matrix.columns[recommended_indices]
    return recommended_recipe_ids


print("Recommendations for User 1:")
print(recommend_recipes(user_id=1))

print("Recommendations for User 2:")
print(recommend_recipes(user_id=2))

Recommendations for User 1:
Index([49584, 28232, 27242], dtype='int64', name='recipe_id')
Recommendations for User 2:
Index([149, 28044, 28006], dtype='int64', name='recipe_id')
