## Optimizing High-Endurance Performance: Budget Analysis for Vegans and Non-Vegans Aged 19-30
# Team Sylvia Lane, Spring 2025

Our project focused on minimum-cost diets for individuals aged 19-30, broken down by diet (vegan and non-vegan), sex (male and female), and athlete status (athlete and non-athlete). We used U.S. food price data to understand how people in each category could minimize the cost of their diet while still meeting nutritional requirements.

We began by identifying how the nutritional requirements of athletes (male and female) differ from those of non-athletes. We also filtered all recipes and ingredients containing animal products out of the dataset to create options for vegan individuals. We then generated diets for individuals in each category and used the Consumer Price Index (CPI) to approximate the actual cost of each diet in different U.S. regions.
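
The regional adjustment described above can be sketched as a simple index ratio. This is only an illustration under stated assumptions: the `scale_cost_by_cpi` helper, the region names, and every index and cost value below are hypothetical placeholders, not figures from our data.

```python
# Hypothetical CPI values for illustration only (not real published figures).
NATIONAL_CPI = 100.0
REGIONAL_CPI = {"Northeast": 104.0, "Midwest": 96.0, "South": 98.0, "West": 106.0}

def scale_cost_by_cpi(national_cost, regional_cpi, national_cpi=NATIONAL_CPI):
    """Scale a cost at national prices by the ratio of regional to national CPI."""
    return national_cost * regional_cpi / national_cpi

daily_cost = 5.00  # hypothetical daily diet cost at national average prices, in USD
regional_costs = {region: round(scale_cost_by_cpi(daily_cost, cpi), 2)
                  for region, cpi in REGIONAL_CPI.items()}
print(regional_costs)
```

A ratio like this assumes that every food in the diet moves with the regional index, which is a simplification.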
# Goal:
Identify minimum-cost diets that meet nutritional needs for individuals aged 19-30, considering dietary preference, sex, and athlete status. Using U.S. Consumer Price Index (CPI) data, our project aims to provide region-specific insights into affordable, nutritionally adequate diets.
- Identify Nutritional Differences: Analyze how athletes' nutritional requirements differ from those of non-athletes for each sex.
- Categorize Dietary Preferences: Sort and classify available food options to create vegan and non-vegan diet plans.
- Optimize for Cost: Construct the lowest-cost meal plans that meet the recommended dietary allowances (RDA).
- Regional Cost Variation: Use CPI data to approximate actual food costs in different U.S. regions.
- Ensure Nutritional Adequacy: Validate that the generated meal plans meet the macronutrient and micronutrient needs of vegans and non-vegans of each sex.
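
The cost-optimization goal above can be written as a small linear program: choose food quantities that minimize total cost subject to minimum nutrient requirements. The sketch below uses `scipy.optimize.linprog` on toy data. The foods, prices, nutrient values, and minimums are all hypothetical and chosen only to show the setup; it is not necessarily the solver configuration used later in this project.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data: three foods, two nutrients (values are illustrative only).
foods = ["beans", "rice", "kale"]
price = np.array([0.50, 0.30, 0.80])   # cost per 100 g, USD
A = np.array([[340.0, 360.0, 50.0],    # energy (kcal) per 100 g
              [21.0, 7.0, 4.0]])       # protein (g) per 100 g
b_min = np.array([2000.0, 50.0])       # daily minimum for each nutrient

# linprog minimizes price @ x subject to A_ub @ x <= b_ub,
# so the requirement A @ x >= b_min is written as -A @ x <= -b_min.
result = linprog(c=price, A_ub=-A, b_ub=-b_min,
                 bounds=[(0, None)] * len(foods), method="highs")

if result.success:
    diet = dict(zip(foods, result.x.round(2)))
    print(f"Daily cost: ${result.fun:.2f}", diet)
```

Upper limits on a nutrient would enter the same way, as additional rows of `A_ub` without the sign flip.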


# Purpose: 
This project connects affordability and nutrition, providing cost-effective diets for diverse groups. It aids individuals in dietary choices, informs policy makers on food access, and supports economic and nutritional sustainability.



In [None]:
# Install basic packages

%pip install eep153_tools
%pip install python_gnupg
%pip install -U gspread_pandas

import pandas as pd
import re 
from eep153_tools.sheets import read_sheets

# Vegan
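
The vegan subset in this section is built by matching keywords against recipe and ingredient names. As a minimal sketch of how the matching mode changes the result, the snippet below compares a plain substring pattern with a whole-word pattern built with `\b`. The three-item keyword list and the food names are hypothetical examples, not the list used in the notebook.

```python
import re

# Short hypothetical keyword list, only to show the matching behaviour.
keywords = ["milk", "egg", "ham"]

# Substring pattern versus a whole-word pattern that uses \b word boundaries.
substring_pattern = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)
whole_word_pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
)

for name in ["eggplant parmesan", "graham cracker", "egg salad", "soy milk"]:
    print(name,
          bool(substring_pattern.search(name)),
          bool(whole_word_pattern.search(name)))
```

Whole-word matching no longer flags "eggplant" or "graham", but it also stops catching compounds such as "milkshake" and plurals that are not listed explicitly. Neither mode distinguishes "soy milk" from dairy milk.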

In [None]:
#load in file from class
def format_id(id,zeropadding=0):
    """Nice string format for any id, string or numeric.

    The optional zeropadding parameter takes an integer z;
    the result is left-padded with zeros to width z, like {id:0z}.
    """
    if pd.isnull(id) or id in ['','.']: return None

    try:  # If numeric, return as string int
        return ('%d' % id).zfill(zeropadding)
    except TypeError:  # Not numeric
        return id.split('.')[0].strip().zfill(zeropadding)
    except ValueError:
        return None

data_url = "https://docs.google.com/spreadsheets/d/1GTo423_gUJe1Von9jypWAbC0zSQ7WGegAWPuRi7eJAI/edit?gid=1410082681#gid=1410082681"

#create recipes df
recipes = read_sheets(data_url, sheet="recipes")
recipes = (recipes
           .assign(parent_foodcode = lambda df: df["parent_foodcode"].apply(format_id),
                   ingred_code = lambda df: df["ingred_code"].apply(format_id))
           .rename(columns={"parent_desc": "recipe"}))

#List of non-vegan keywords AND non-natural foods keywords (including frozen, processed, etc).
NON_VEGAN_KEYWORDS = [
    "beef", "pork", "chicken", "turkey", "fish", "seafood", "shellfish", "shrimp", "crab","crabs",
    "lamb", "goat", "duck", "goose", "tuna", "salmon", "cod", "bacon", "ham",
    "shellfish", "lobster", "mussels", "oysters", "scallops", "octopus", "eel",
    "organ meat", "milk", "Eggnog", "cheese", "butter", "cream", "ice cream", "yogurt", "whey",
    "casein", "lactose", "ghee", "buttermilk", "egg", "eggs", "mayo", "mayonnaise", "albumen",
    "albumin", "lysozyme", "ovomucoid", "ovomucin", "ovovitellin", "honey",
    "bee pollen", "royal jelly", "propolis", "shellac", "confectioner’s glaze",
    "carmine", "cochineal", "lard", "tallow", "suet", "gelatin", "collagen",
    "isinglass", "bone broth", "bone stock", "fish sauce", "oyster sauce",
    "shrimp paste", "worcestershire sauce", "anchovies", "rennet", "pepsin",
    "bone char", "vitamin d3", "lanolin", "omega-3 fish oil", "caseinate",
    "lecithin (egg)", "cysteine", "l-cysteine", "glycerin (animal)",
    "glycerol (animal)", "stearic acid (animal)", "tallowate", "sodium tallowate",
    "capric acid", "caprylic acid", "cheese", "pudding", "processed", "veal",'sirloin', "steak", "animal",
    "Custard", "Mousse", "chocolate", "Meatballs", "meat", "Gravy", "poultry","baby", "frozen", 'dairy', 'lump',"peas","school"
]

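# Note: this is a substring (partial) match, so "milkshake" or "eggroll" is flagged
# because "milk" or "egg" is in the keyword list. It also flags vegan foods such as
# "eggplant", or "graham" (which contains "ham").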
NON_VEGAN_PATTERN = re.compile(
    '|'.join(map(re.escape, NON_VEGAN_KEYWORDS)),
    re.IGNORECASE
)

def filter_vegan_ingredients(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose recipe and ingredient names contain no flagged keyword."""
    df = df.copy()  # avoid mutating the caller's DataFrame

    # 1) Fill missing values first (astype(str) would turn NaN into "nan"),
    #    then convert to lowercase strings and remove punctuation
    for col in ["recipe", "ingred_desc"]:
        df[col] = (df[col].fillna("").astype(str).str.lower()
                   .str.replace(r"[^\w\s]", "", regex=True))

    # 2) Create a mask for rows that do NOT contain non-vegan keywords
    mask = ~(df["recipe"].str.contains(NON_VEGAN_PATTERN, na=False, regex=True) |
             df["ingred_desc"].str.contains(NON_VEGAN_PATTERN, na=False, regex=True))

    # Return a copy so later column assignments do not raise SettingWithCopyWarning
    return df[mask].copy()

vegan_recipes = filter_vegan_ingredients(recipes)

# The steps below adapt code from the class mini lecture, applied to the vegan subset

#create nutrition df
nutrition = (read_sheets(data_url, sheet="nutrients")
             .assign(ingred_code = lambda df: df["ingred_code"].apply(format_id)))

display(nutrition.head())
nutrition.columns
nutrition.shape

# normalize ingredient weights to shares of each recipe's total weight (fractions that sum to 1)
vegan_recipes['ingred_wt'] = vegan_recipes['ingred_wt']/vegan_recipes.groupby(['parent_foodcode'])['ingred_wt'].transform("sum")

# extend the recipes data frame to include the nutrient profiles of its ingredients (per 100 g)
df_vegan = vegan_recipes.merge(nutrition, how="left", on="ingred_code")

# multiply all nutrients per 100g of an ingredient by the weight of that ingredient in a recipe.
numeric_cols = list(df_vegan.select_dtypes(include=["number"]).columns)
numeric_cols.remove("ingred_wt")
df_vegan[numeric_cols] = df_vegan[numeric_cols].mul(df_vegan["ingred_wt"], axis=0)

# sum nutrients of food codes (over the multiple ingredients)
# python tip: one can merge dictionaries dict1 dict2 using **, that is: dict_merge = {**dict1, **dict2}. The ** effectively "unpacks" the key value pairs in each dictionary
df_vegan = df_vegan.groupby('parent_foodcode').agg({**{col: "sum" for col in numeric_cols},
                                        "recipe": "first"})

df_vegan.index.name = "recipe_id"

food_names = df_vegan["recipe"]

prices = read_sheets(data_url, sheet="prices")[["food_code", "year", "price"]]

prices["food_code"] = prices["food_code"].apply(format_id)

prices = prices.set_index(["year", "food_code"])
print(prices.index.levels[0])

# we'll focus on the latest price data
prices = prices.xs("2017/2018", level="year")

# drop rows of prices where the price is "NA"
prices = prices.dropna(subset="price")

print(f"We have prices for {prices.shape[0]} unique recipes (FNDDS food codes)")

rda = read_sheets(data_url, sheet="rda")

rda = rda.set_index("Nutrient")
rda_min = rda[rda["Constraint Type"].isin(["RDA", "AI"])].copy()

common_recipes = df_vegan.index.intersection(prices.index)

# python tip: given a list of indices, "loc" both subsets and sorts. 
df_vegan = df_vegan.loc[common_recipes]
prices = prices.loc[common_recipes]

# let's remap the price dataframe index to be the actual food names.
prices.index = prices.index.map(food_names)

A_all = df_vegan.T
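
One point worth checking in the filter above: the mask in `filter_vegan_ingredients` is applied row by row, so a recipe whose name passes the check keeps its remaining ingredients when only some of its ingredient rows are flagged. If the intent is to drop such recipes entirely, one option is to collect the codes of flagged recipes and exclude all of their rows. The sketch below shows this on a hypothetical toy table that reuses the notebook's column names. The data and the short keyword pattern are made up for illustration.

```python
import pandas as pd

# Hypothetical toy recipes table with the same column names as the notebook.
toy = pd.DataFrame({
    "parent_foodcode": ["1", "1", "2", "2"],
    "recipe": ["caesar salad", "caesar salad", "bean chili", "bean chili"],
    "ingred_desc": ["lettuce", "anchovies", "beans", "tomato"],
})

# Flag individual ingredient rows with an illustrative keyword pattern.
row_flagged = toy["ingred_desc"].str.contains("anchovies|egg|milk", case=False, regex=True)

# Collect the codes of recipes with at least one flagged ingredient,
# then drop every row that belongs to those recipes.
flagged_codes = toy.loc[row_flagged, "parent_foodcode"].unique()
vegan_only = toy[~toy["parent_foodcode"].isin(flagged_codes)].copy()

print(vegan_only["recipe"].unique())
```

With row-level filtering, "caesar salad" would survive as a lettuce-only recipe. With recipe-level filtering it is removed altogether.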

