# **Project 2 Code and Visualizations**

The following code provides the workflow, functions, analysis, and insights for Project 2 by group Justus von Liebig.

Project Members: Allison Nguyen, Emily Wu, Wendy Peng, Emma Azhan, Magaly Santos, Noah Mujica

# Table of Contents

- **Data Setup**

- **Deliverable [A] - Population of Interest**

- **Deliverable [A] - Dietary Reference Intakes**

- **Deliverable [A] - Food Prices**

- **Deliverable [A] - Nutritional Content**

- **Deliverable [A] - Solution**

- **Deliverable [B] - Solution Sensitivity**

- **Deliverable [B] - Solution Total Cost**

- **Unit Tests**

## Data Setup

In [7]:
%pip install eep153_tools
%pip install python_gnupg
%pip install -U gspread_pandas


In [8]:
import numpy as np
import pandas as pd
from eep153_tools.sheets import read_sheets

In [14]:
def format_id(id,zeropadding=0):
    """Nice string format for any id, string or numeric.

    Optional zeropadding parameter takes an integer z and
    left-pads the result with zeros to width z, as in {id:0z}.
    """
    if pd.isnull(id) or id in ['','.']: return None

    try:  # If numeric, return as string int
        return ('%d' % id).zfill(zeropadding)
    except TypeError:  # Not numeric
        return id.split('.')[0].strip().zfill(zeropadding)
    except ValueError:
        return None

data_url = "https://docs.google.com/spreadsheets/d/1qCxS3mh29miTIFQJ9IDs4cKUjgepZU37SbJO9v0_fOE/edit?usp=sharing"
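
As a quick illustration of what `format_id` does to different kinds of codes, the sketch below repeats the helper so it runs on its own and applies it to a few made-up inputs. The sample values are hypothetical and were chosen only to exercise each branch (numeric, string with a decimal suffix, padded string, and missing values).

```python
import numpy as np
import pandas as pd

def format_id(id, zeropadding=0):
    """Copy of the notebook's helper, repeated so this snippet is self-contained."""
    if pd.isnull(id) or id in ['', '.']:
        return None
    try:  # If numeric, return as string int
        return ('%d' % id).zfill(zeropadding)
    except TypeError:  # Not numeric
        return id.split('.')[0].strip().zfill(zeropadding)
    except ValueError:
        return None

# Hypothetical inputs (not taken from the spreadsheet).
samples = [2010.0, "2010.0", " 42 ", np.nan, "", "."]
formatted = [format_id(s) for s in samples]

# With zeropadding, short codes are left-padded to a fixed width.
padded = format_id(42, zeropadding=8)

print(formatted)
print(padded)
```

Numeric and string versions of the same code end up as the same string, which is what lets `ingred_code` be compared consistently between the `recipes` and `nutrients` sheets.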

In [15]:
recipes = read_sheets(data_url, sheet="recipes")
recipes = (recipes
           .assign(parent_foodcode = lambda df: df["parent_foodcode"].apply(format_id),
                   ingred_code = lambda df: df["ingred_code"].apply(format_id))
           .rename(columns={"parent_desc": "recipe"}))

nutrition = (read_sheets(data_url, sheet="nutrients")
             .assign(ingred_code = lambda df: df["ingred_code"].apply(format_id)))


In [16]:
# Let's see an example of a recipe.
recipes[recipes["recipe"].str.contains("Pho", case=False)].head()

| | parent_foodcode | recipe | ingred_code | ingred_desc | ingred_wt |
|---|---|---|---|---|---|
| 18779 | 28310330 | Pho | 2010 | Spices, cinnamon, ground | 2.6 |
| 18780 | 28310330 | Pho | 2030 | Spices, pepper, black | 0.533 |
| 18781 | 28310330 | Pho | 2044 | Basil, fresh | 2.0 |
| 18782 | 28310330 | Pho | 4322 | Vegetable oil, averaged | 14.0 |
| 18783 | 28310330 | Pho | 6008 | Soup, beef broth or bouillon canned, ready-to-... | 1200.0 |


In [17]:
display(nutrition.head())

| | ingred_code | Ingredient description | Capric acid | Lauric acid | Myristic acid | Palmitic acid | Palmitoleic acid | Stearic acid | Oleic acid | Linoleic Acid | ... | Vitamin B12 | Vitamin B-12, added | Vitamin B6 | Vitamin C | Vitamin D | Vitamin E | Vitamin E, added | Vitamin K | Water | Zinc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | Butter, salted | 2.529 | 2.587 | 7.436 | 21.697 | 0.961 | 9.999 | 19.961 | 2.728 | ... | 0.17 | 0.0 | 0.003 | 0.0 | 0.0 | 2.32 | 0.0 | 7.0 | 15.87 | 0.09 |
| 1 | 1002 | Butter, whipped, with salt | 2.039 | 2.354 | 7.515 | 20.531 | 1.417 | 7.649 | 17.37 | 2.713 | ... | 0.07 | 0.0 | 0.008 | 0.0 | 0.0 | 1.37 | 0.0 | 4.6 | 16.72 | 0.05 |
| 2 | 1003 | Butter oil, anhydrous | 2.495 | 2.793 | 10.005 | 26.166 | 2.228 | 12.056 | 25.026 | 2.247 | ... | 0.01 | 0.0 | 0.001 | 0.0 | 0.0 | 2.8 | 0.0 | 8.6 | 0.24 | 0.01 |
| 3 | 1004 | Cheese, blue | 0.601 | 0.491 | 3.301 | 9.153 | 0.816 | 3.235 | 6.622 | 0.536 | ... | 1.22 | 0.0 | 0.166 | 0.0 | 0.5 | 0.25 | 0.0 | 2.4 | 42.41 | 2.66 |
| 4 | 1005 | Cheese, brick | 0.585 | 0.482 | 3.227 | 8.655 | 0.817 | 3.455 | 7.401 | 0.491 | ... | 1.26 | 0.0 | 0.065 | 0.0 | 0.5 | 0.26 | 0.0 | 2.5 | 41.11 | 2.6 |


## Deliverable [A] - Dietary Reference Intakes

Write a function that takes as arguments the characteristics of a person (e.g., age, sex) and returns a `pandas.Series` of Dietary Reference Intakes (DRIs) or "Recommended Daily Allowances" (RDAs) of a variety of nutrients appropriate for your population of interest.

In [39]:
rda = read_sheets(data_url, sheet="rda")
rda = rda.set_index("Nutrient")
#rda.columns, rda.head()

In [40]:
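def diet_ref(sex, cancer_group='control', age_group="51U"):
    """Return the recommended intakes for one population group.

    The column of `rda` is selected by name, built as
    "{sex}_{age_group}_{cancer_group}" (e.g., "Female_51U_control").
    """
    col_name = f"{sex}_{age_group}_{cancer_group}"

    if col_name not in rda.columns:
        raise ValueError(f"Column '{col_name}' not found in the dataset.")

    return rda[col_name]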

In [42]:
diet_ref("Female", cancer_group="control").head()

Nutrient
Energy           1600.0
Protein            46.0
Carbohydrate      130.0
Dietary Fiber      22.4
Linoleic Acid      11.0
Name: Female_51U_control, dtype: float64
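
The lookup in `diet_ref` depends entirely on the column name matching exactly, so it may help to see both the success path and the error path. The sketch below is a self-contained stand-in, not the notebook's actual data: `rda_demo` and `diet_ref_demo` are hypothetical names, and the toy table holds only the one column and five values shown in the output above.

```python
import pandas as pd

# Toy stand-in for the `rda` sheet: a single column whose values are
# copied from the output shown above. The real sheet has more columns.
rda_demo = pd.DataFrame(
    {"Female_51U_control": [1600.0, 46.0, 130.0, 22.4, 11.0]},
    index=pd.Index(
        ["Energy", "Protein", "Carbohydrate", "Dietary Fiber", "Linoleic Acid"],
        name="Nutrient",
    ),
)

def diet_ref_demo(sex, cancer_group="control", age_group="51U", table=rda_demo):
    """Same lookup as diet_ref, with the table passed in explicitly."""
    col_name = f"{sex}_{age_group}_{cancer_group}"
    if col_name not in table.columns:
        raise ValueError(f"Column '{col_name}' not found in the dataset.")
    return table[col_name]

# Success path: the name matches a column exactly.
female = diet_ref_demo("Female")
print(female["Protein"])

# Error path: column names are case-sensitive, so "female" does not match.
try:
    diet_ref_demo("female")
except ValueError as err:
    message = str(err)
    print(message)
```

Passing the table as an argument is a design choice made only for this sketch; it keeps the example independent of the global `rda` loaded from the spreadsheet.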