# Final Project 2: Diet Subsistence of Different Athlete Types
**Group:** Ancel Keys  
**Authors:** Colby, Mario, Oliver, Avani, Reed, Dahalan  

## Table of Contents
1. [Introduction & Imports](#introduction--imports)
   - **Deliverable 1[A]** - Description of Population of Interest
2. [Data Collection & Mapping](#data-collection--mapping)
   - **Deliverable 3[A]** - Nutritional Content of Different Foods
   - **Deliverable 3[A]** - Data on Prices for Different Foods
3. [Dietary Reference Intakes](#dietary-reference-intakes)
   - **Deliverable 2[A]** - Dietary Reference Intake without Age
   - **Deliverable 2[A]** - Dietary Reference Intake with Age
4. [Minimum Cost Diet for Different Athlete and Intensity Types](#minimum-cost-diet-for-athlete-types)
   - **Deliverable 5[A]** - Solution (without Age)
   - **Deliverable 5[A]** - Solution (with Age)
5. [Sensitivity of Solution](#sensitivity-of-solution)
   - **Deliverable 8[C]** - Sensitivity of Solution
6. [Total Cost of Populations of Interest](#total-cost-of-populations-of-interest)
   - **Deliverable 9[B]** - Total Cost for Population of Interest
   - **Deliverable 9[B]** - Total Cost for Population of the Boston Marathon
   - **Deliverable 9[B]** - Total Cost for Population of US Olympic Lifters
7. [Meal Review: Is Your Solution Edible](#meal-review-is-your-solution-edible)
   - **Deliverable 6[B]** - Is Your Solution Edible?
   - **Deliverable 7[B]** - Meal Reviews

# **Introduction & Imports**

In [1]:
%pip install eep153_tools
%pip install python_gnupg
%pip install -U gspread_pandas

# === Preprocess common data (run this once) ===
import pandas as pd
from eep153_tools.sheets import read_sheets
import re
from scipy.optimize import linprog as lp
import numpy as np

Collecting eep153_tools
  Using cached eep153_tools-0.12.4-py2.py3-none-any.whl.metadata (363 bytes)
Using cached eep153_tools-0.12.4-py2.py3-none-any.whl (4.9 kB)
Installing collected packages: eep153_tools
Successfully installed eep153_tools-0.12.4
Note: you may need to restart the kernel to use updated packages.
Collecting python_gnupg
  Using cached python_gnupg-0.5.4-py2.py3-none-any.whl.metadata (2.0 kB)
Using cached python_gnupg-0.5.4-py2.py3-none-any.whl (21 kB)
Installing collected packages: python_gnupg
Successfully installed python_gnupg-0.5.4
Note: you may need to restart the kernel to use updated packages.
Collecting gspread_pandas
  Using cached gspread_pandas-3.3.0-py2.py3-none-any.whl.metadata (10 kB)
Using cached gspread_pandas-3.3.0-py2.py3-none-any.whl (27 kB)
Installing collected packages: gspread_pandas
  Attempting uninstall: gspread_pandas
    Found existing installation: gspread-pandas 2.2.3
    Uninstalling gspread-pandas-2.2.3:
      Successfully uninstalled g

## [A]Description of Population of Interest

Athletes typically customize their diets to meet the physical demands of their sport. Endurance athletes, for example, prioritize high carbohydrate intake to sustain prolonged activities. This not only helps maintain stable blood glucose levels but also replenishes muscle glycogen. Conversely, strength and bodybuilding athletes usually consume higher amounts of protein, which is crucial for muscle repair and growth. Fat intake, which is essential for hormone production and providing concentrated energy, varies depending on the specific energy demands of the sport and the athlete's goals. Overall, it is vital for athletes to adapt their diet to the requirements of their sport to maximize its utility.

# **Data Collection & Mapping**

#### Helper Function for Formatting IDs

This code defines a helper function `format_id` which takes an ID and an optional zero-padding parameter. It returns a formatted string version of the ID if possible. The function handles cases where the ID might be null, empty, or in a non-standard format. The code also sets a data URL for reference.

In [2]:
# Helper function
def format_id(id, zeropadding=0):
    if pd.isnull(id) or id in ['', '.']:
        return None
    try:
        return ('%d' % id).zfill(zeropadding)
    except TypeError:
        return id.split('.')[0].strip().zfill(zeropadding)
    except ValueError:
        return None

data_url = "https://docs.google.com/spreadsheets/d/1l0Xl1NwSRN0dPwjHRWDEnChBWqTx7VPXOnLnJKI2lAY/edit?gid=415594035#gid=415594035"

#### Load and Clean Data

This code loads the original recipes data from a Google Sheet, applies formatting to specific columns using the `format_id` helper function, and renames one of the columns for clarity.


In [3]:
# Load the original recipes data from the specified Google Sheet (sheet named "recipes")
og_recipes = read_sheets(data_url, sheet="recipes")

# Clean and transform the data:
# - Apply the format_id function to 'parent_foodcode' and 'ingred_code' columns to standardize their format
# - Rename the 'parent_desc' column to 'recipe' for better clarity
og_recipes = (og_recipes
              .assign(
                  parent_foodcode=lambda df: df["parent_foodcode"].apply(format_id),
                  ingred_code=lambda df: df["ingred_code"].apply(format_id)
              )
              .rename(columns={"parent_desc": "recipe"}))

#### Filter Data By Athlete Diets

This section of code defines lists of ingredients that are beneficial (key foods) and those that should be avoided (excludable items) for athletes. It then creates regular expression patterns to filter recipes based on their ingredient descriptions and recipe names, ensuring that each recipe includes at least one beneficial ingredient while excluding those with undesirable terms. Finally, it applies these filters to the original recipes dataset to retain only those meals that meet the inclusion criteria and do not contain any excluded ingredients, thereby optimizing the meal selection for athlete-specific dietary benefits.

In [4]:
# Define a list of key food items to INCLUDE in ingredient descriptions
key_foods = [
    "Yogurt, Greek", "Cheese, Cottage", "lowfat, 1% Milk", "milk, lowfat" "Cheese, Parmesan", "Banana", "Apple", "Orange",
    "Avocado", "figs", "dates", "raisins", "apricots, dried", "Grapefruit", "Grapes",
    "pear", "Peach", "watermelon", "Oats", "Bread, rye", "Brown Rice", "Pasta",
    "Quinoa", "Rolled Oats", "Rice Cakes", "Whole Grain Cereal", "Special K", "Bread, whole-wheat", 
    "Bread, whole-mutligrain", "Popcorn, Air-popped",
    "whole grain pasta", "Almonds", "Peanut Butter", "Chicken", "Egg", "Tofu",
    "Lentils", "Beans, Black", "Tuna", "Salmon", "Soup, Bean ",
    "Steak", "Tilapia", "Pork", "Venison", "Cod", "Ground turkey", "turkey, Ground",
    "beef, ground", "Ground beef", "Tempeh", "chickpea", "beans, kidney", "Sweet Potato", "Potato",
    "Spinach", "Broccoli", "Bell Pepper", "Carrot", "Beets", "peas", "Tomato", "Millet",
    "Creatine", "Omega-3", "BCAAs", "Blueberries", "Strawberries", "juice, raw",
    "Garlic", "Lemon", "Onion", "Asparagus", "kale", "collards", "chard, swiss", "protein powder", "brussel sprouts", "oat",
    "sunflower", "salmon", "tuna", "mackerel", "fish", "chip"
    
]

# Define a list of foods or terms to EXCLUDE
key_excludable = [
    "sugar", "syrup", "soda", "candy", "artificial", "processed", "preservative",
    "yolk", "Fruit juice", "juice drink", "Sunny D", "sweetened", "added sugar", "liver", "babyfood", "baby food", "carp", "chip"
]

# Escape the items so that parentheses and other special characters are treated literally
escaped_key_foods = [re.escape(food) for food in key_foods]
escaped_excludable = [re.escape(term) for term in key_excludable]

# Wrap each escaped term in a non-capturing group '(?: ... )' before joining with '|'
include_pattern = '|'.join(f"(?:{term})" for term in escaped_key_foods)
exclude_pattern = '|'.join(f"(?:{term})" for term in escaped_excludable)

# 1) Include mask: meals that have at least one ingredient containing a key food
meal_mask_include = og_recipes.groupby('parent_foodcode')['ingred_desc'] \
    .transform(lambda x: x.str.contains(include_pattern, case=False, na=False).any())

# 2a) Exclude mask for INGREDIENTS: meals that have any ingredient containing an excludable term
meal_mask_exclude_ingredients = og_recipes.groupby('parent_foodcode')['ingred_desc'] \
    .transform(lambda x: x.str.contains(exclude_pattern, case=False, na=False).any())

# 2b) Exclude mask for RECIPE NAMES: meals whose recipe name contains an excludable term
meal_mask_exclude_names = og_recipes.groupby('parent_foodcode')['recipe'] \
    .transform(lambda x: x.str.contains(exclude_pattern, case=False, na=False).any())

# Combine both ingredient and recipe-name exclusions
meal_mask_exclude_total = meal_mask_exclude_ingredients | meal_mask_exclude_names

# 3) Final mask: include meals that pass the "include" filter AND do not match the exclusion filter
final_mask = meal_mask_include & (~meal_mask_exclude_total)

# Filter the original recipes dataset
recipes = og_recipes[final_mask]

## [A] Nutritional Content of Different Foods

This section makes a copy of the filtered recipes, normalizes ingredient weights to percentages, and merges nutrient information. Then, it scales nutrient values by their ingredient's normalized weight and aggregates the nutrient profile by meal. Finally, the code extracts recipe names for further use.

In [5]:
# Load nutrition data and merge
nutrition = read_sheets(data_url, sheet="nutrients") \
            .assign(ingred_code=lambda df: df["ingred_code"].apply(format_id))

In [6]:
# Make an explicit copy of recipes before modifying
recipes = recipes.copy()

# Normalize ingredient weights to percentages by dividing by the total weight per meal.
# Using .loc for assignment ensures we're modifying the DataFrame in place.
recipes.loc[:, 'ingred_wt'] = recipes['ingred_wt'] / recipes.groupby('parent_foodcode')['ingred_wt'].transform("sum")

# Merge nutrient information into recipes on the 'ingred_code' column.
# This performs a left join, ensuring all recipes are kept.
df = recipes.merge(nutrition, how="left", on="ingred_code")

# Identify numeric columns (e.g., nutrient values) in the merged DataFrame.
numeric_cols = list(df.select_dtypes(include=["number"]).columns)

# Remove 'ingred_wt' from the list as we don't want to scale it.
numeric_cols.remove("ingred_wt")

# Multiply each nutrient value by the normalized ingredient weight to get weighted nutrient values.
df[numeric_cols] = df[numeric_cols].mul(df["ingred_wt"], axis=0)

# Aggregate nutrient profiles by meal (identified by 'parent_foodcode').
# For nutrient columns, sum their weighted values; for the recipe name, take the first occurrence.
df = df.groupby('parent_foodcode').agg({
    **{col: "sum" for col in numeric_cols},
    "recipe": "first"
})

# Rename the index to 'recipe_id' for clarity.
df.index.name = "recipe_id"

# Extract recipe names for further use.
food_names = df["recipe"]
df.head()

Unnamed: 0_level_0,Capric acid,Lauric acid,Myristic acid,Palmitic acid,Palmitoleic acid,Stearic acid,Oleic acid,Linoleic Acid,Linolenic Acid,Stearidonic acid,...,"Vitamin B-12, added",Vitamin B6,Vitamin C,Vitamin D,Vitamin E,"Vitamin E, added",Vitamin K,Water,Zinc,recipe
recipe_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11116000,0.26,0.124,0.325,0.911,0.082,0.441,0.977,0.109,0.04,0.0,...,0.0,0.046,1.3,1.3,0.07,0.0,0.3,87.03,0.3,"Goat's milk, whole"
11400010,0.067,0.062,0.194,0.57,0.031,0.195,0.427,0.065,0.007,0.0,...,0.0,0.055,0.8,0.0,0.04,0.0,0.2,83.56,0.6,"Yogurt, Greek, NS as to type of milk or flavor"
11411390,0.067,0.062,0.194,0.57,0.031,0.195,0.427,0.065,0.007,0.0,...,0.0,0.055,0.8,0.0,0.04,0.0,0.2,83.56,0.6,"Yogurt, Greek, NS as to type of milk, plain"
11411400,0.264,0.127,0.316,1.0,0.049,0.422,0.993,0.209,0.023,0.0,...,0.0,0.063,0.0,0.0,0.01,0.0,0.0,81.3,0.52,"Yogurt, Greek, whole milk, plain"
11411410,0.067,0.062,0.194,0.57,0.031,0.195,0.427,0.065,0.007,0.0,...,0.0,0.055,0.8,0.0,0.04,0.0,0.2,83.56,0.6,"Yogurt, Greek, low fat milk, plain"


## [A] Data on Prices for Different Foods

This code loads pricing data from a Google Sheet, applies ID formatting, and filters prices for a specific year. It then matches the price data with the corresponding recipes based on common food codes, maps the prices to food names, and prepares a transposed version of the nutrient data for further analysis.

In [7]:
# --- After merging and aggregating recipes ---

# Extract recipe names for further use.
food_names = df["recipe"]

# Load prices data from the "prices" sheet, selecting only the necessary columns.
prices = read_sheets(data_url, sheet="prices")[["food_code", "year", "price"]]

# Format the 'food_code' column using the helper function 'format_id'
prices["food_code"] = prices["food_code"].apply(format_id)

# Set a multi-index using 'year' and 'food_code' for easier slicing and alignment.
prices = prices.set_index(["year", "food_code"])

# Filter the prices data to include only records for the year "2017/2018".
prices = prices.xs("2017/2018", level="year")

# Remove rows where the price is missing.
prices = prices.dropna(subset="price")

# Find the intersection of food codes that are common between our aggregated recipes (df) and the prices data.
common_recipes = df.index.intersection(prices.index)

# Subset both the recipes and prices data to only include common recipes.
df = df.loc[common_recipes]
prices = prices.loc[common_recipes]

# --- Now update the identifiers to be food names ---

# Instead of mapping prices.index to food names, update A_all’s columns.
# First, transpose the nutrient data:
A_all = df.T

# Now, update A_all's columns using the food_names Series.
# food_names is indexed by the recipe IDs (food codes), so this maps each code to its corresponding name.
A_all.columns = food_names.loc[A_all.columns]

# (Optional) If you want your prices DataFrame to use food names too, then do:
prices.index = prices.index.map(food_names)

# Now A_all's columns and prices' index should match (both being food names).
prices.head()

Unnamed: 0,price
"Yogurt, Greek, NS as to type of milk or flavor",0.583603
"Yogurt, Greek, NS as to type of milk, plain",0.605436
"Yogurt, Greek, whole milk, plain",0.605436
"Yogurt, Greek, low fat milk, plain",0.605436
"Yogurt, Greek, nonfat milk, plain",0.605436


# **Dietary Reference Intakes**

In [8]:
# Load RDA data (nutrient constraints)
rda = read_sheets(data_url, sheet="rda")
rda = rda.set_index("Nutrient")
rda_df = rda

## [A]Dietary Reference Intake (without Age)

This function retrieves the Dietary Reference Intakes (DRIs) for a specific group by constructing a column name from the provided sex, athlete type, and training type, and then extracting the corresponding column from the rda_df DataFrame. If the full column with training type isn’t available, it falls back to using just the sex and athlete type. It then cleans the data by removing any commas and converting all values to numeric types before returning the resulting pandas Series indexed by nutrient names.

In [9]:
def get_dri_from_rda_df(sex, athlete_type, training_type):
    """
    Returns a pandas.Series of Dietary Reference Intakes (DRIs)/Recommended Daily Allowances (RDAs)
    for the given group, using data from the rda_df DataFrame.
    
    Parameters
    ----------
    sex : str
        "Male" or "Female".
    athlete_type : str
        For example, "Endurance", "Strength", or "Bodybuilding".
    training_type : str, optional
        For example, "Normal" or "Intense". Defaults to "Normal".
        
    Returns
    -------
    pd.Series
        Series indexed by nutrient names containing the recommended intake values.
    
    Raises
    ------
    ValueError
        If no matching column is found in rda_df.
    """
    # First try full column name with training type.
    group_col = f"{sex}_{athlete_type}_{training_type}"
    if group_col not in rda_df.columns:
        # Fallback: try without training type.
        group_col = f"{sex}_{athlete_type}"
        if group_col not in rda_df.columns:
            raise ValueError(f"Column for group '{sex}_{athlete_type}_{training_type}' not found in rda_df.")
    
    # Extract the column.
    dri_series = rda_df[group_col].copy()
    
    # Convert values to numeric: remove commas if present, and coerce errors.
    dri_series = dri_series.apply(lambda x: float(str(x).replace(',', '')) if isinstance(x, str) else x)
    dri_series = pd.to_numeric(dri_series, errors='coerce')
    
    return dri_series

In [10]:
dri_male_endurance = get_dri_from_rda_df("Male", "Endurance", "Intense")
dri_male_endurance

Nutrient
Energy            4700.0
Protein            140.0
Carbohydrate       810.0
Dietary Fiber       34.0
Linoleic Acid       17.0
Linolenic Acid       1.6
Calcium           1000.0
Iron                10.0
Magnesium          460.0
Phosphorus         700.0
Potassium         3750.0
Sodium            3000.0
Zinc                11.0
Copper               0.9
Selenium            55.0
Vitamin A          900.0
Vitamin E           15.0
Vitamin D           15.0
Vitamin C          110.0
Thiamin              1.2
Riboflavin           1.3
Niacin              16.0
Vitamin B6           1.9
Vitamin B12          2.4
Choline            550.0
Vitamin K          120.0
Folate             400.0
Name: Male_Endurance_Intense, dtype: float64

## [A]Dietary Reference Intake (with Age)

This function begins by retrieving the base Dietary Reference Intakes (DRIs) for a given group using the get_dri_from_rda_df function, and then makes personalized adjustments to key macronutrients—Energy, Protein, and Carbohydrate—based on the individual's age. For children under 18, the requirements are scaled linearly (age/18) to reflect the gradual increase in metabolic demands until reaching full adult capacity, while adults aged 18 to 60 use the base values because their metabolic rate is generally stable. For individuals over 60, the function reduces these nutrient values by 0.7% for each year beyond 60, accounting for the natural decline in metabolism observed in older adults, while maintaining consistency across the stable adult period.

In [11]:
def get_personalized_dri(sex, athlete_type, training_type, age):
    """
    Returns a personalized pandas.Series of Dietary Reference Intakes (DRIs)/Recommended Daily Allowances (RDAs)
    for an individual based on sex, athlete type, training type, and age.
    
    The function uses the base DRIs from the rda_df DataFrame (via get_dri_from_rda_df) and then applies an age-based
    adjustment to key nutrients (Energy, Protein, Carbohydrate). For ages:
      - Under 18: The requirements are scaled by (age/18).
      - Between 18 and 60: The base values are used (metabolic rate is assumed consistent).
      - Above 60: The key nutrient values are decreased by 0.7% for each year beyond 60.
    
    Parameters
    ----------
    sex : str
        "Male" or "Female".
    athlete_type : str
        For example, "Endurance", "Strength", or "Bodybuilding".
    training_type : str
        For example, "Normal" or "Intense".
    age : int or float
        Age of the individual.
        
    Returns
    -------
    pd.Series
        Series indexed by nutrient names containing the personalized DRI values.
    """
    # Retrieve the base DRIs from rda_df using your provided function.
    base_dri = get_dri_from_rda_df(sex, athlete_type, training_type)
    personalized = base_dri.copy()
    
    # Define the key macronutrients that we want to adjust.
    key_nutrients = ["Energy", "Protein", "Carbohydrate"]
    
    # Apply an age adjustment:
    if age < 18:
        # For children, scale linearly by age/18.
        factor = age / 18.0
    elif age <= 60:
        # For adults aged 18-60, assume no change in metabolic rate.
        factor = 1.0
    else:
        # For adults older than 60, decrease key nutrients by 0.7% per year above 60.
        factor = 1 - 0.007 * (age - 60)
    
    # Adjust only the key macronutrients.
    for nutrient in key_nutrients:
        # Note: This assumes that the index names in the base_dri match exactly.
        if nutrient in personalized.index and pd.notnull(personalized[nutrient]):
            personalized[nutrient] *= factor
    
    return personalized

In [12]:
dri_10 = get_personalized_dri("Male", "Endurance", "Normal", age=10)
dri_30 = get_personalized_dri("Male", "Endurance", "Intense", age=30)
dri_65 = get_personalized_dri("Male", "Endurance", "Intense", age=65)

print(dri_10)
print(dri_30)
print(dri_65)

Nutrient
Energy            2361.111111
Protein             66.666667
Carbohydrate       388.888889
Dietary Fiber       34.000000
Linoleic Acid       17.000000
Linolenic Acid       1.600000
Calcium           1000.000000
Iron                10.000000
Magnesium          420.000000
Phosphorus         700.000000
Potassium         3400.000000
Sodium            2300.000000
Zinc                11.000000
Copper               0.900000
Selenium            55.000000
Vitamin A          900.000000
Vitamin E           15.000000
Vitamin D           15.000000
Vitamin C           90.000000
Thiamin              1.200000
Riboflavin           1.300000
Niacin              16.000000
Vitamin B6           1.700000
Vitamin B12          2.400000
Choline            550.000000
Vitamin K          120.000000
Folate             400.000000
Name: Male_Endurance_Normal, dtype: float64
Nutrient
Energy            4700.0
Protein            140.0
Carbohydrate       810.0
Dietary Fiber       34.0
Linoleic Acid       17.0
Lin

# **Minimum Cost Diet for Different Athlete and Intensity Types**
## [A] Solution (without Age)
This function, `diet_minimizer`, uses linear programming to optimize a daily diet based on nutrient constraints for a given sex, athlete type, and training type. It constructs nutrient constraints from recommended dietary allowances (RDA) and upper limits (UL) constructed in the google sheet we created, then minimizes cost while meeting these constraints.

In [13]:
def diet_minimizer(sex, athlete_type, training_type):
    import numpy as np  
    group = f"{sex}_{athlete_type}_{training_type}"
    
    # Create nutrient constraints based on the chosen demographic
    bmin = pd.to_numeric(rda.loc[rda['Constraint Type'].isin(['RDA', 'AI']), group], errors='coerce')
    bmax = pd.to_numeric(rda.loc[rda['Constraint Type'].isin(['UL']), group], errors='coerce')

    # Remove non-finite values
    bmin = bmin[np.isfinite(bmin)]
    bmax = bmax[np.isfinite(bmax)]

    # Filter constraints to only include nutrients available in A_all.
    bmin = bmin[bmin.index.isin(A_all.index)]
    bmax = bmax[bmax.index.isin(A_all.index)]

    # Remove excluded foods from A_all and prices before optimization
    filtered_A_all = A_all.loc[~A_all.index.isin(key_excludable)]
    filtered_prices = prices.loc[~prices.index.isin(key_excludable)]

    # Ensure reindexing aligns with filtered food data
    Amin = filtered_A_all.reindex(bmin.index).dropna(how='all')
    Amax = filtered_A_all.reindex(bmax.index).dropna(how='all')

    # Combine constraints
    b = pd.concat([bmin, -bmax]).dropna()
    A = pd.concat([Amin, -Amax])

    # Convert to NumPy arrays
    b = b.to_numpy().flatten()  
    A = A.to_numpy()
    
    # Prepare cost vector (filtered)
    p = filtered_prices["price"].to_numpy()

    # Tolerance for negligible quantities
    tol = 1e-6

    # Import linear programming solver
    from scipy.optimize import linprog as lp

    # Check that b contains only finite values
    if not np.all(np.isfinite(b)):
        raise ValueError("The constraint vector b contains non-finite values!")

    # Solve the linear programming problem
    result = lp(p, -A, -b, method='highs')
    
    # Check if optimization succeeded
    if not result.success:
        raise ValueError("Optimization failed: " + result.message)

    # Extract optimized diet quantities and total cost
    diet_quantities = pd.Series(result.x, index=filtered_prices.index)
    total_cost = result.fun 
    print(diet_quantities[diet_quantities > 0]*100)

    # Select foods with quantities above tolerance threshold
    selected_foods = diet_quantities[diet_quantities >= tol]

    # Create DataFrame listing foods and their cost per 100g
    df_foods = pd.DataFrame({
        "Food": selected_foods.index,
        "Cost per 100g": [float(filtered_prices.loc[food, 'price']) for food in selected_foods.index]
    })

    print(f"Your daily diet is ${total_cost:.2f}")
    return df_foods

In [14]:
female_strength = diet_minimizer("Female", "Strength", "Intense")
female_strength

Mackerel, canned                                                           78.693954
Peanut butter, lower sodium                                               293.351337
Pasta, gluten free                                                         15.386474
Oatmeal, regular or quick, made with milk, no added fat                   303.288884
Cereal (Post Honey Bunches of Oats Honey Roasted)                           7.859574
Beans and rice, with tomatoes                                             230.504443
Orange juice, 100%, with calcium added, canned, bottled or in a carton    141.077423
Banana, raw                                                               309.366928
Potato, boiled, from fresh, peel eaten, made with margarine               258.570936
dtype: float64
Your daily diet is $4.44


Unnamed: 0,Food,Cost per 100g
0,"Mackerel, canned",0.594039
1,"Peanut butter, lower sodium",0.51391
2,"Pasta, gluten free",0.114248
3,"Oatmeal, regular or quick, made with milk, no ...",0.174142
4,Cereal (Post Honey Bunches of Oats Honey Roasted),0.623858
5,"Beans and rice, with tomatoes",0.178637
6,"Orange juice, 100%, with calcium added, canned...",0.181187
7,"Banana, raw",0.189998
8,"Potato, boiled, from fresh, peel eaten, made w...",0.236255


## [A] Solution (with Age)
This function computes a minimum-cost daily diet by first retrieving personalized DRIs based on an individual's sex, athlete type, training type, and age, and then filters the nutrient composition (A_all) and price data to include only non-excluded foods. It sets up a linear programming problem where the nutrient contributions from the selected foods must meet or exceed the personalized targets, and solves for the combination that minimizes cost. Finally, it prints the optimized food quantities in grams (multiplying the 100‑gram unit by the optimizer’s output for foods with non-negligible quantities), displays the total cost, and returns a DataFrame summarizing the selected foods and their cost per 100 g.

In [15]:
def diet_minimizer_personalized(sex, athlete_type, training_type, age):
    """
    Calculates the minimum-cost daily diet for an individual with personalized nutrient requirements.
    It obtains personalized DRIs using get_personalized_dri, then solves a linear programming problem
    to determine the combination of foods (from A_all and prices) that meets those nutrient targets.
    
    Parameters:
      sex : str
          "Male" or "Female"
      athlete_type : str
          For example, "Endurance", "Strength", or "Bodybuilding"
      training_type : str
          For example, "Normal" or "Intense"
      age : int or float
          Age of the individual
      
    Assumes:
      - get_personalized_dri(sex, athlete_type, training_type, age) is defined.
      - A_all is a DataFrame with nutrients as its index and foods as its columns,
        where values represent nutrient content per unit (e.g., per 100g).
      - prices is a DataFrame with food names as its index and a numeric "price" column.
      - key_excludable is a list of food names to exclude.
      
    Returns:
      pd.DataFrame: A DataFrame listing the selected foods (with non-negligible quantities)
                    and their cost per 100g, and prints the total daily cost.
    """
    import numpy as np
    from scipy.optimize import linprog

    
    # Step 1: Get personalized nutrient targets.
    personalized_dri = get_personalized_dri(sex, athlete_type, training_type, age)
    
    # Step 2: Restrict the nutrient targets to only those nutrients available in A_all.
    nutrients = [nutrient for nutrient in personalized_dri.index if nutrient in A_all.index]
    b = np.array([personalized_dri[nutrient] for nutrient in nutrients], dtype=float)
    
    # Step 3: Get the nutrient composition matrix for these nutrients.
    filtered_A_all = A_all.loc[nutrients]
    
    # Step 4: Exclude foods that are in key_excludable from A_all and prices.
    filtered_A_all = filtered_A_all.loc[:, ~filtered_A_all.columns.isin(key_excludable)]
    filtered_prices = prices.loc[~prices.index.isin(key_excludable)]
    
    # Step 5: Align the food lists.
    food_list = filtered_A_all.columns.intersection(filtered_prices.index)
    filtered_A_all = filtered_A_all[food_list]
    filtered_prices = filtered_prices.loc[food_list]
    
    # Step 6: Convert nutrient matrix to NumPy array.
    A = filtered_A_all.to_numpy(dtype=float)  # Shape: (m, n)
    
    # Step 7: Prepare the cost vector and ensure it is 1-D.
    p = np.array(filtered_prices["price"].values, dtype=float).squeeze()
    if p.ndim != 1:
        p = p.ravel()
    
    # Step 8: Set bounds (quantities >= 0).
    bounds = [(0, None)] * p.shape[0]
    
    # Step 9: Set up the LP constraint: A x >= b  -->  -A x <= -b.
    res = linprog(c=p, A_ub=-A, b_ub=-b, bounds=bounds, method='highs')
    
    if not res.success:
        raise ValueError("Optimization failed: " + res.message)
    
    # Step 10: Retrieve optimized quantities and total cost.
    diet_quantities = pd.Series(res.x, index=filtered_prices.index)
    total_cost = res.fun
    print(diet_quantities[diet_quantities > 0]*100)
    
    # Filter out negligible quantities.
    tol = 1e-6
    selected_foods = diet_quantities[diet_quantities >= tol]
    
    df_foods = pd.DataFrame({
        "Food": selected_foods.index,
        "Cost per 100g": [float(filtered_prices.loc[food, "price"]) for food in selected_foods.index]
    })
    
    print(f"\nYour daily personalized diet is ${total_cost:.2f}")
    return df_foods

In [16]:
result = diet_minimizer_personalized("Male", "Endurance", "Normal", age=25)
result

Mackerel, canned                                                            98.224983
Egg, whole, fried no added fat                                              24.137694
Split peas, from dried, fat added                                           68.562762
Millet, no added fat                                                      2563.407080
Oatmeal, regular or quick, made with milk, no added fat                    276.226907
Orange juice, 100%, with calcium added, canned, bottled or in a carton     231.426696
Potato, boiled, from fresh, peel eaten, made with margarine                160.736224
Greens, NS as to form, cooked                                                4.570805
dtype: float64

Your daily personalized diet is $3.65


Unnamed: 0,Food,Cost per 100g
0,"Mackerel, canned",0.594039
1,"Egg, whole, fried no added fat",0.396952
2,"Split peas, from dried, fat added",0.140336
3,"Millet, no added fat",0.061534
4,"Oatmeal, regular or quick, made with milk, no ...",0.174142
5,"Orange juice, 100%, with calcium added, canned...",0.181187
6,"Potato, boiled, from fresh, peel eaten, made w...",0.236255
7,"Greens, NS as to form, cooked",0.368179


# **Sensitivity Solution**
## [C] Sensitivity of Solution

The sensitivity analysis code in this function shows how the overall daily diet cost increases when supplements are added, reflecting their impact on the optimized diet. The function searches for recipe matches for each custom ingredient, and if none are found, it uses manually provided supplement data (including price per 100 g and serving size) to compute an additional cost. For example, if the original minimum cost diet is $4.44 and the addition of supplements raises the cost to $5.51, this represents an approximate 24% increase in cost, clearly quantifying how much the supplement recommendations contribute to the overall price volatility in an athlete’s diet. This function additionally can be used to factor in other foods that an athlete may feel necessary in their day to day meals.

In [21]:
def add_custom_ingredients(diet_df, custom_food_servings, supplement_data=None):
    """
    Adds the cheapest meal(s) matching each custom food to the diet.
    If no recipe match is found, it checks a supplement dictionary (if provided)
    for manual data.
    
    Parameters
    ----------
    diet_df : pd.DataFrame
        The current daily diet DataFrame (output of diet_minimizer),
        with columns ["Food", "Cost per 100g"].
    custom_food_servings : dict
        A dictionary mapping custom food strings to their desired serving sizes (in grams).
        Example: {"rice cake": 50, "Apple": 80, "creatine": 5, "BCAAs": 10}
    supplement_data : dict, optional
        A dictionary mapping supplement names (in lower case) to their details.
        Example:
            {
                "creatine": {"price": 20.0, "serving": 5},
                "bcaas": {"price": 30.0, "serving": 10}
            }
        If a custom food is not found in recipes, the function will check here.
    
    Returns
    -------
    pd.DataFrame
        Updated daily diet DataFrame with new rows added for each matched custom food.
        Each new row includes "Food", "Cost per 100g", "Serving (g)", and "Cost Contribution".
    """
    import pandas as pd
    
    # If no supplement data provided, use an empty dictionary.
    if supplement_data is None:
        supplement_data = {}
    
    # Make a copy so we don't modify the original diet_df
    updated_diet = diet_df.copy()
    
    # Save the original cost (here we sum cost per 100g, though you might want a more sophisticated cost sum)
    original_cost = updated_diet["Cost per 100g"].sum()
    
    custom_rows = []
    
    for food, serving_size in custom_food_servings.items():
        # Look for a match in the global 'recipes' DataFrame (assumes recipes is defined)
        mask = recipes['recipe'].str.contains(food, case=False, na=False)
        matching_meals = recipes[mask]['parent_foodcode'].unique()
        
        # If no match is found in recipes, check if it's a supplement.
        if len(matching_meals) == 0:
            food_lower = food.lower()
            if food_lower in supplement_data:
                data = supplement_data[food_lower]
                # Use the manually provided cost from supplement_data.
                price = data["price"]
                adjusted_cost = price * (serving_size / 100.0)
                new_row = pd.DataFrame({
                    "Food": [food],
                    "Cost per 100g": [price],
                    "Serving (g)": [serving_size],
                    "Cost Contribution": [adjusted_cost]
                })
                custom_rows.append(new_row)
            else:
                print(f"No match found for '{food}' in recipes and no supplement data provided.")
            continue
        
        cheapest_price = float('inf')
        cheapest_meal_name = None
        
        # Loop through each matching meal to find the cheapest option.
        for meal_code in matching_meals:
            # Skip if meal_code is not present in the aggregated nutrient DataFrame 'df'
            if meal_code not in df.index:
                continue
            meal_name = df.loc[meal_code, 'recipe']
            # Look up the price using the meal name. If multiple rows match, take the first one.
            meal_price_info = prices.loc[meal_name, 'price']
            if isinstance(meal_price_info, pd.Series):
                meal_price_info = meal_price_info.iloc[0]
            meal_cost = float(meal_price_info)
            if meal_cost < cheapest_price:
                cheapest_price = meal_cost
                cheapest_meal_name = meal_name
        
        if cheapest_meal_name is None:
            print(f"No priced meal found for '{food}' among matches.")
            continue
        
        serving_fraction = serving_size / 100.0
        adjusted_cost = serving_fraction * cheapest_price
        
        new_row = pd.DataFrame({
            "Food": [cheapest_meal_name],
            "Cost per 100g": [cheapest_price],
            "Serving (g)": [serving_size],
            "Cost Contribution": [adjusted_cost]
        })
        custom_rows.append(new_row)
    
    if custom_rows:
        custom_df = pd.concat(custom_rows, ignore_index=True)
        # Append the custom rows to the original diet
        updated_diet = pd.concat([updated_diet, custom_df], ignore_index=True)
    else:
        custom_df = pd.DataFrame()
    
    # Calculate total custom cost as the sum of "Cost Contribution"
    custom_cost = custom_df["Cost Contribution"].sum() if not custom_df.empty else 0.0
    
    # Total cost is the sum of the original diet cost and the custom cost.
    total_cost = original_cost + custom_cost
    
    print(f"Updated daily diet total cost is: ${total_cost:.2f}")
    
    # Optionally, remove extra columns before returning:
    updated_diet = updated_diet.drop(columns=['Serving (g)', 'Cost Contribution'], errors='ignore')
    
    return updated_diet

In [22]:
supplement_data = {
    "creatine": {"price": 14.0, "serving": 5},  # $14 per 100g; 5g per serving
    "bcaa": {"price": 10.0, "serving": 2}       # $10 per 100g; 10g per serving
}

# Example custom foods dictionary: keys can be meal names or supplement names.
custom_food_servings = {
    "Creatine": 5,  # This will be matched in supplement_data if no recipe match is found.
    "BCAA": 20,
}

updated_diet_df = add_custom_ingredients(female_strength, custom_food_servings, supplement_data=supplement_data)
updated_diet_df

Updated daily diet total cost is: $5.51


Unnamed: 0,Food,Cost per 100g
0,"Mackerel, canned",0.594039
1,"Peanut butter, lower sodium",0.51391
2,"Pasta, gluten free",0.114248
3,"Oatmeal, regular or quick, made with milk, no ...",0.174142
4,Cereal (Post Honey Bunches of Oats Honey Roasted),0.623858
5,"Beans and rice, with tomatoes",0.178637
6,"Orange juice, 100%, with calcium added, canned...",0.181187
7,"Banana, raw",0.189998
8,"Potato, boiled, from fresh, peel eaten, made w...",0.236255
9,Creatine,14.0


# **Total Cost of Populations of Interest**
## [B] Total Cost for Population of Interest

## [B] Total Cost for Population of Boston Marathon Runners

## [B] Total Cost for Population of US Olympic Lifters

# **Meal Review: Is Your Solution Edible**
## [B] Is Your Solution Edible?

## [B] Meal Reviews