The purpose of this project is to answer the following question: what combination of foods has the fewest calories while still satisfying the recommended daily value (RDV) of every essential micronutrient?

My inspiration for this project is a desire to create a healthy diet plan that's as easy to follow as possible. As I understand it, a diet can be considered "healthy" if it satisfies two conditions: first, it must include an adequate amount of necessary nutrients (vitamins and minerals, as well as macronutrients); and second, it must limit certain groups of foods (like processed foods, refined carbohydrates, etc.). I, along with much of the developed world, likely don't get enough of the vitamins and minerals I need from the food I eat. If I can find combinations of foods that give me adequate micronutrients for few calories, then I can fill in the rest of my macronutrient profile relatively easily.

This is a linear programming problem. A linear programming (LP) problem has a linear objective function that you are trying to minimize or maximize, subject to a list of linear constraints that must be satisfied. In this project, the objective is to minimize the number of calories in a list of foods, and the constraints are that the list must provide at least the RDV of every micronutrient. Inside of pulp_helpers.py, there is a dictionary containing each relevant micronutrient and its RDV.
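
Written out, the problem described above has the following shape. The symbols are my own notation for this sketch and do not appear in the helper modules: $x_i$ is the amount of food $i$ in units of 100 g (the tables list values per 100 g), $c_i$ is its calories per 100 g, $a_{ij}$ is the amount of micronutrient $j$ in 100 g of food $i$, and $r_j$ is the RDV of micronutrient $j$.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i} c_i \, x_i \\
\text{subject to} \quad & \sum_{i} a_{ij} \, x_i \ge r_j \quad \text{for every micronutrient } j, \\
& x_i \ge 0 \quad \text{for every food } i.
\end{aligned}
```

Both the objective and the constraints are linear in the $x_i$, which is what makes this an LP problem.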

For this project I am using food nutrition data from the USDA (source at the end of this file). The databases that I will be using are the Foundation Foods, SR Legacy, FNDDS, and All Data Types. The details of what these databases are can be found on the USDA's website, but for the sake of this project all you need to know is that Foundation Foods is the only database with data about all 27 vitamins and minerals that we are interested in. Because of this, most of the analysis will be done by combining the data from the Foundation Foods database with data from the All Data Types Database.

In this project I use the PuLP library, which is used to model and solve linear programming problems. I also use pandas and numpy for loading and manipulating data. I created two custom modules to keep the code in this file clean: one module contains the functions that extract and clean the data, and the other contains the functions that create and solve the LP problem. These modules can be found in this repository.

In [1]:
import pandas as pd
import os
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpConstraint, LpStatus
import math
import numpy as np
import pulp_helpers as ph
import get_data_helpers as gdh
import warnings

# Disable all warnings
warnings.filterwarnings("ignore")

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

First, I'll show you what the Foundation Foods table looks like. The get_data_helpers.py file contains functions that extract the data from the USDA .csv files in this repository and reshape it into a form that's readable by the PuLP functions. If you want to see exactly how this is done, look at the get_data_helpers.py file.

In [2]:
df_ff = gdh.get_foundation_foods_table()
df_ff.head()

Unnamed: 0,fdc_id,description,"Ergosta-5,7-dienol (MG)","Ergosta-7,22-dienol (MG)",10-Formyl folic acid (10HCOFA) (UG),25-hydroxycholecalciferol (UG),5-Formyltetrahydrofolic acid (5-HCOH4 (UG),5-methyl tetrahydrofolate (5-MTHF) (UG),Alanine (G),Arginine (G),Ash (G),Aspartic acid (G),Beta-glucan (G),Beta-sitostanol (MG),Beta-sitosterol (MG),Betaine (MG),Biotin (UG),"Boron, B (UG)",Brassicasterol (MG),"Calcium, Ca (MG)",Campestanol (MG),Campesterol (MG),"Carbohydrate, by difference (G)","Carbohydrate, by summation (G)","Carotene, alpha (UG)","Carotene, beta (UG)","Carotene, gamma (UG)",Cholesterol (MG),"Choline, free (MG)","Choline, from glycerophosphocholine (MG)","Choline, from phosphocholine (MG)","Choline, from phosphotidyl choline (MG)","Choline, from sphingomyelin (MG)","Choline, total (MG)",Citric acid (MG),"Cobalt, Co (UG)","Copper, Cu (MG)","Cryptoxanthin, alpha (UG)","Cryptoxanthin, beta (UG)",Cysteine (G),Cystine (G),Daidzein (MG),Daidzin (MG),Delta-5-avenasterol (MG),Delta-7-Stigmastenol (MG),Energy (Atwater General Factors) (KCAL),Energy (Atwater Specific Factors) (KCAL),Energy (KCAL),Energy (kJ),Ergosta-7-enol (MG),Ergosterol (MG),Ergothioneine (MG),"Fatty acids, total monounsaturated (G)","Fatty acids, total polyunsaturated (G)","Fatty acids, total saturated (G)","Fatty acids, total trans (G)","Fatty acids, total trans-dienoic (G)","Fatty acids, total trans-monoenoic (G)","Fatty acids, total trans-polyenoic (G)","Fiber, insoluble (G)","Fiber, soluble (G)","Fiber, total dietary (G)","Folate, total (UG)",Fructose (G),Galactose (G),Genistein (MG),Genistin (MG),Glucose (G),Glutamic acid (G),Glycine (G),Glycitin (MG),High Molecular Weight Dietary Fiber (HMWDF) (G),Histidine (G),Hydroxyproline (G),"Iodine, I (UG)","Iron, Fe (MG)",Isoleucine (G),Lactose (G),Leucine (G),Low Molecular Weight Dietary Fiber (LMWDF) (G),Lutein (UG),Lutein + zeaxanthin (UG),Lycopene (UG),Lysine (G),MUFA 12:1 (G),MUFA 14:1 c (G),MUFA 15:1 (G),MUFA 16:1 c (G),MUFA 17:1 (G),MUFA 17:1 c (G),MUFA 18:1 (G),MUFA 18:1 c (G),MUFA 20:1 (G),MUFA 20:1 c (G),MUFA 22:1 (G),MUFA 22:1 c (G),MUFA 22:1 n-11 (G),MUFA 22:1 n-9 (G),MUFA 24:1 c (G),"Magnesium, Mg (MG)",Malic acid (MG),Maltose (G),"Manganese, Mn (MG)",Methionine (G),"Molybdenum, Mo (UG)",Niacin (MG),"Nickel, Ni (UG)",Nitrogen (G),Oxalic acid (MG),PUFA 18:2 (G),PUFA 18:2 CLAs (G),PUFA 18:2 c (G),"PUFA 18:2 n-6 c,c (G)",PUFA 18:3 (G),PUFA 18:3 c (G),"PUFA 18:3 n-3 c,c,c (ALA) (G)","PUFA 18:3 n-6 c,c,c (G)",PUFA 18:3i (G),PUFA 18:4 (G),PUFA 20:2 c (G),"PUFA 20:2 n-6 c,c (G)",PUFA 20:3 (G),PUFA 20:3 c (G),PUFA 20:3 n-3 (G),PUFA 20:3 n-9 (G),PUFA 20:4 (G),PUFA 20:4 n-6 (G),PUFA 20:4c (G),PUFA 20:5 n-3 (EPA) (G),PUFA 20:5c (G),PUFA 22:2 (G),PUFA 22:3 (G),PUFA 22:4 (G),PUFA 22:5 c (G),PUFA 22:5 n-3 (DPA) (G),PUFA 22:6 c (G),PUFA 22:6 n-3 (DHA) (G),Pantothenic acid (MG),Phenylalanine (G),"Phosphorus, P (MG)",Phytoene (UG),Phytofluene (UG),"Phytosterols, other (MG)","Potassium, K (MG)",Proline (G),Protein (G),Pyruvic acid (MG),Quinic acid (MG),Raffinose (G),Retinol (UG),Riboflavin (MG),SFA 10:0 (G),SFA 11:0 (G),SFA 12:0 (G),SFA 14:0 (G),SFA 15:0 (G),SFA 16:0 (G),SFA 17:0 (G),SFA 18:0 (G),SFA 20:0 (G),SFA 21:0 (G),SFA 22:0 (G),SFA 23:0 (G),SFA 24:0 (G),SFA 4:0 (G),SFA 5:0 (G),SFA 6:0 (G),SFA 7:0 (G),SFA 8:0 (G),SFA 9:0 (G),"Selenium, Se (UG)",Serine (G),"Sodium, Na (MG)",Specific Gravity (SP_GR),Stachyose (G),Starch (G),Stigmastadiene (MG),Stigmasterol (MG),Sucrose (G),"Sugars, Total (G)","Sugars, total including NLEA (G)","Sulfur, S (MG)",TFA 14:1 t (G),TFA 16:1 t (G),TFA 18:1 t (G),TFA 18:2 t (G),TFA 18:2 t not further defined (G),TFA 18:3 t (G),TFA 20:1 t (G),TFA 22:1 t (G),Thiamin (MG),Threonine (G),"Tocopherol, beta (MG)","Tocopherol, delta (MG)","Tocopherol, gamma (MG)","Tocotrienol, alpha (MG)","Tocotrienol, beta (MG)","Tocotrienol, delta (MG)","Tocotrienol, gamma (MG)",Total dietary fiber (AOAC 2011.25) (G),Total fat (NLEA) (G),Total lipid (fat) (G),Tryptophan (G),Tyrosine (G),Valine (G),Verbascose (G),"Vitamin A, RAE (UG)",Vitamin B-12 (UG),Vitamin B-6 (MG),"Vitamin C, total ascorbic acid (MG)",Vitamin D (D2 + D3) (UG),"Vitamin D (D2 + D3), International Units (IU)",Vitamin D2 (ergocalciferol) (UG),Vitamin D3 (cholecalciferol) (UG),Vitamin D4 (UG),Vitamin E (alpha-tocopherol) (MG),Vitamin K (Dihydrophylloquinone) (UG),Vitamin K (Menaquinone-4) (UG),Vitamin K (phylloquinone) (UG),Water (G),Zeaxanthin (UG),"Zinc, Zn (MG)",cis-Lutein/Zeaxanthin (UG),cis-Lycopene (UG),cis-beta-Carotene (UG),trans-Lycopene (UG),trans-beta-Carotene (UG),mass (G)
0,321358,"Hummus, commercial",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.97,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,41.0,0.0,0.0,14.9,13.9,0.0,12.0,0.0,0.0,22.3,1.1,23.0,0.2,0.0,46.6,0.0,0.0,0.348,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,243.0,229.0,229.0,960.0,0.0,0.0,0.0,6.37,7.48,2.22,0.018,0.012,0.006,0.0,0.0,0.0,5.4,36.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.41,0.0,0.0,0.0,0.0,0.0,258.0,0.0,0.0,0.0,0.0,0.0,0.021,0.007,0.007,0.0,6.25,0.0,0.084,0.0,0.001,0.0,0.001,0.005,71.1,0.0,0.0,1.06,0.0,0.0,0.948,0.0,1.18,0.0,0.0,0.002,6.81,6.81,0.0,0.656,0.637,0.02,0.0,0.0,0.005,0.005,0.0,0.0,0.0,0.0,0.005,0.0,0.005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.318,0.0,166.0,0.0,0.0,0.0,289.0,0.0,7.35,0.0,0.0,0.0,0.0,0.115,0.0,0.0,0.0,0.009,0.004,1.41,0.01,0.634,0.079,0.0,0.044,0.0,0.027,0.0,0.0,0.0,0.0,0.0,0.0,16.2,0.0,438.0,0.0,0.0,8.12,0.0,0.0,0.18,0.34,0.0,0.0,0.0,0.0,0.006,0.0,0.012,0.0,0.0,0.0,0.15,0.0,0.31,1.3,9.47,0.0,0.0,0.0,0.0,0.0,16.1,17.1,0.0,0.0,0.0,0.0,1.0,0.0,0.143,0.0,0.0,0.0,0.0,0.0,0.0,1.74,0.0,0.0,17.2,58.7,0.0,1.38,0.0,0.0,0.0,0.0,0.0,100
1,321360,"Tomatoes, grape, raw",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.56,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0,0.0,5.51,0.0,0.0,0.0,0.0,0.0,8.0,0.0,0.6,1.2,0.0,9.8,0.0,0.0,0.058,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,31.0,27.0,27.0,113.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.1,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.33,0.0,0.0,0.0,0.0,95.0,0.0,4100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.9,0.0,0.0,0.121,0.0,0.0,0.805,0.0,0.13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.0,0.0,0.0,0.0,260.0,0.0,0.83,0.0,0.0,0.0,0.0,0.065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075,0.0,0.02,0.12,0.7,0.0,0.0,0.0,0.0,0.0,0.0,0.63,0.0,0.0,0.0,0.0,0.0,0.0,0.06,27.2,0.0,0.0,0.0,0.0,0.0,0.98,0.0,0.0,4.2,92.5,9.0,0.2,12.0,554.0,49.0,0.0,393.0,100
2,321611,"Beans, snap, green, canned, regular pack, drai...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.89,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,36.0,0.0,0.0,4.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,24.0,20.0,21.0,86.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.64,0.0,0.0,0.0,0.65,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.78,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12.7,0.0,0.0,0.176,0.0,0.0,0.0,0.0,0.17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.0,0.0,0.0,0.0,97.0,0.0,1.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,282.0,0.0,0.0,0.0,0.0,0.0,0.0,1.29,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.39,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,93.6,0.0,0.19,0.0,0.0,0.0,0.0,0.0,100
3,323121,"Frankfurter, beef, unheated",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.74,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.0,0.0,0.0,2.89,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,310.0,314.0,314.0,1310.0,0.0,0.0,0.0,12.1,0.954,11.4,1.59,0.131,1.46,0.001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.267,0.0,0.985,0.257,0.257,0.0,10.4,0.0,0.117,0.0,0.064,0.0,0.064,0.005,11.5,0.0,0.09,0.031,0.0,0.0,2.25,0.0,1.87,0.0,0.0,0.169,0.794,0.625,0.0,0.084,0.078,0.005,0.0,0.003,0.008,0.008,0.0,0.022,0.001,0.0,0.029,0.021,0.029,0.003,0.003,0.001,0.0,0.008,0.013,0.013,0.0,0.0,0.263,0.0,128.0,0.0,0.0,0.0,343.0,0.0,11.7,0.0,0.0,0.0,3.0,0.154,0.015,0.0,0.019,0.84,0.138,6.33,0.355,3.66,0.029,0.0,0.008,0.0,0.002,0.004,0.0,0.0,0.0,0.005,0.0,0.0,0.0,872.0,0.0,0.0,0.0,0.0,0.0,0.0,1.26,0.0,0.0,0.0,0.087,1.38,0.0,0.131,0.001,0.0,0.0,0.033,0.0,0.0,0.0,0.17,0.0,0.0,0.0,0.0,0.0,26.0,28.0,0.0,0.0,0.0,0.0,3.0,0.97,0.13,0.0,0.0,0.0,0.0,0.0,0.0,0.51,0.0,0.0,0.0,54.6,0.0,2.06,0.0,0.0,0.0,0.0,0.0,100
4,323294,"Nuts, almonds, dry roasted, with salt added",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.47,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,273.0,0.0,0.0,16.2,0.0,0.0,17.0,0.0,0.0,4.3,0.4,56.1,0.0,0.0,60.8,0.0,0.0,0.87,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,667.0,621.0,620.0,2590.0,0.0,0.0,0.0,34.2,14.5,4.56,0.032,0.016,0.016,0.0,0.0,0.0,11.0,35.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.17,0.0,0.0,0.0,0.0,0.0,25.0,0.0,0.0,0.0,0.0,0.0,0.259,0.061,0.061,0.0,33.8,0.0,0.076,0.0,0.001,0.0,0.001,0.001,258.0,0.0,0.0,2.02,0.0,0.0,3.1,0.0,3.94,0.0,0.0,0.006,14.5,14.5,0.0,0.052,0.05,0.002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005,0.0,0.005,0.002,0.002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.237,0.0,456.0,0.0,0.0,0.0,684.0,0.0,20.4,0.0,0.0,0.0,0.0,1.57,0.0,0.0,0.0,0.027,0.008,3.54,0.028,0.828,0.059,0.0,0.042,0.0,0.024,0.001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,256.0,0.0,0.0,0.0,0.0,0.0,4.17,4.17,0.0,0.0,0.0,0.0,0.016,0.0,0.016,0.0,0.0,0.0,0.079,0.0,0.18,0.0,0.92,0.28,0.0,0.0,0.0,0.0,53.4,57.8,0.0,0.0,0.0,0.0,2.0,0.0,0.075,0.0,0.0,0.0,0.0,0.0,0.0,19.0,0.0,0.0,0.0,2.2,0.0,2.8,0.0,0.0,0.0,0.0,0.0,100


As you can see above, each row contains information about one specific food, and the columns contain data about how much of each nutrient is in that row's food. The majority of the nutrient columns aren't used by the PuLP functions, so we can ignore all of them except the relevant micronutrients as well as the calories column.

NOTE: many of the foods don't have information about some specific nutrients. In these cases I treat the food as if it contains none of that nutrient. This is likely untrue, but for the sake of this project it doesn't matter. The one exception is calories: if calorie information is missing for a food, we calculate it explicitly.
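
The two rules in the note above can be sketched as follows. This is only an illustration, not the code in get_data_helpers.py: the function name `fill_missing` is hypothetical, and the way calories are estimated here (the Atwater general factors of 4, 4 and 9 kcal per gram of protein, carbohydrate and fat) is an assumption, since the note does not say how the real helper calculates them.

```python
# Atwater general factors in kcal per gram (an assumption for this sketch).
KCAL_PER_GRAM = {
    "Protein (G)": 4.0,
    "Carbohydrate, by difference (G)": 4.0,
    "Total lipid (fat) (G)": 9.0,
}


def fill_missing(row, nutrient_cols):
    """Return a copy of `row` with missing nutrients set to 0 and
    missing calories estimated from the macronutrients."""
    filled = dict(row)

    # Rule 1: a missing nutrient value is treated as "none of that nutrient".
    for col in nutrient_cols:
        if filled.get(col) is None:
            filled[col] = 0.0

    # Rule 2: missing calories are computed instead of being set to 0.
    if filled.get("Energy (KCAL)") is None:
        filled["Energy (KCAL)"] = sum(
            factor * (filled.get(col) or 0.0)
            for col, factor in KCAL_PER_GRAM.items()
        )
    return filled


# Macronutrient values taken from the hummus row in the table above.
hummus = {
    "description": "Hummus, commercial",
    "Protein (G)": 7.35,
    "Carbohydrate, by difference (G)": 14.9,
    "Total lipid (fat) (G)": 17.1,
    "Iodine, I (UG)": None,   # pretend this value is missing
    "Energy (KCAL)": None,    # pretend this value is missing
}
filled = fill_missing(hummus, ["Iodine, I (UG)"])
print(filled["Iodine, I (UG)"], round(filled["Energy (KCAL)"], 1))
```

For the hummus row this estimate comes to about 242.9 kcal, which is close to the 243.0 listed in the table's "Energy (Atwater General Factors) (KCAL)" column.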

Now that we have the Foundation Foods table, we can input that table into our PuLP helper function to solve the problem for us:

In [3]:
ff_ans = ph.default_problem(df_ff, [])
print(ff_ans)

-Beef, round, top round roast, boneless, separable lean only, trimmed to 0" fat, select, raw: 52.42 g
-Lettuce, cos or romaine, raw: 525.88 g
-Salt, table, iodized: 1.51 g
-Milk, nonfat, fluid, with added vitamin A and vitamin D (fat free or skim): 579.73 g
-Beans, Dry, Red (0% moisture): 12.59 g
-Broccoli, raw: 294.13 g
-Eggs, Grade A, Large, egg whole: 122.65 g
-Almond milk, unsweetened, plain, shelf stable: 412.65 g
-Spinach, baby: 122.85 g
-Mushroom, maitake: 111.28 g
Total Calories: 777.7
Total Mass: 2236 g (4.9 lbs)


As we can see, using only the Foundation Foods table, the optimal list of foods is 52.4 g of lean beef, 525.9 g of romaine lettuce, etc., and this list satisfies your necessary micronutrient profile at 777.7 calories. This list and the rest of the lists are repeated in the results.txt file.
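
The totals at the bottom of the list can be checked by hand. The sketch below parses the printed lines back into (food, grams) pairs and sums them. The parser is a hypothetical stand-in written for this example, it is not the `gdh.extract_tuples` helper used later in this file, and it assumes every food line has the form `-<name>: <grams> g`.

```python
import re

GRAMS_PER_POUND = 453.59237

# One food per line, e.g. "-Broccoli, raw: 294.13 g" (assumed format).
LINE_PATTERN = re.compile(r"^-(?P<food>.+): (?P<grams>\d+(?:\.\d+)?) g$")


def parse_food_list(text):
    """Return a list of (food, grams) pairs; non-food lines are skipped."""
    pairs = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            pairs.append((match.group("food"), float(match.group("grams"))))
    return pairs


result_text = """\
-Beef, round, top round roast, boneless, separable lean only, trimmed to 0" fat, select, raw: 52.42 g
-Lettuce, cos or romaine, raw: 525.88 g
-Salt, table, iodized: 1.51 g
-Milk, nonfat, fluid, with added vitamin A and vitamin D (fat free or skim): 579.73 g
-Beans, Dry, Red (0% moisture): 12.59 g
-Broccoli, raw: 294.13 g
-Eggs, Grade A, Large, egg whole: 122.65 g
-Almond milk, unsweetened, plain, shelf stable: 412.65 g
-Spinach, baby: 122.85 g
-Mushroom, maitake: 111.28 g
Total Calories: 777.7
Total Mass: 2236 g (4.9 lbs)
"""

pairs = parse_food_list(result_text)
total_g = sum(grams for _, grams in pairs)
print(f"{len(pairs)} foods, {total_g:.0f} g ({total_g / GRAMS_PER_POUND:.1f} lbs)")
# prints: 10 foods, 2236 g (4.9 lbs)
```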

You may notice that this list of foods weighs 4.9 lbs. We may want to come up with another list that satisfies the same micronutrient requirements but weighs less. The following call adds the constraint that the total mass must be at most 1.5 kg:

In [5]:
print(ph.default_problem(df_ff, [('mass (G)', -1, 1500)]))

-Kale, raw: 216.74 g
-Seeds, sunflower seed kernels, dry roasted, with salt added: 40.87 g
-Beef, round, top round roast, boneless, separable lean only, trimmed to 0" fat, select, raw: 36.44 g
-Salt, table, iodized: 1.49 g
-Beans, Dry, Brown (0% moisture): 13.67 g
-Broccoli, raw: 609.83 g
-Eggs, Grade A, Large, egg whole: 150.86 g
-Mushroom, lion's mane: 41.9 g
-Almond milk, unsweetened, plain, shelf stable: 137.45 g
-Mushroom, maitake: 52.93 g
-Soy milk, unsweetened, plain, refrigerated: 197.81 g
Total Calories: 952.7
Total Mass: 1500 g (3.3 lbs)
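
The run above costs more calories than the unconstrained one (952.7 vs. 777.7) because a mass cap pushes the solver away from bulky, low-calorie foods. Here is a minimal PuLP sketch of that trade-off on a made-up two-food problem. The foods, their values, and the function `min_calories` are all hypothetical and are not taken from pulp_helpers.py or from the USDA data.

```python
from pulp import (LpMinimize, LpProblem, LpStatus, LpVariable,
                  PULP_CBC_CMD, lpSum, value)

# Made-up values per 100 g for two hypothetical foods.
foods = {
    "leafy": {"kcal": 20.0, "nutrient": 10.0},    # bulky, few calories
    "dense": {"kcal": 300.0, "nutrient": 100.0},  # compact, many calories
}
REQUIRED_NUTRIENT = 100.0  # hypothetical daily requirement


def min_calories(max_mass_g=None):
    """Minimize calories subject to the nutrient floor and an optional mass cap."""
    prob = LpProblem("toy_diet", LpMinimize)
    # x[f] is the amount of food f in units of 100 g.
    x = {f: LpVariable(f"x_{f}", lowBound=0) for f in foods}

    # Objective: total calories.
    prob += lpSum(foods[f]["kcal"] * x[f] for f in foods)
    # Constraint: meet the nutrient requirement.
    prob += lpSum(foods[f]["nutrient"] * x[f] for f in foods) >= REQUIRED_NUTRIENT
    # Optional constraint: total mass in grams must not exceed the cap.
    if max_mass_g is not None:
        prob += lpSum(100 * x[f] for f in foods) <= max_mass_g

    prob.solve(PULP_CBC_CMD(msg=False))
    if LpStatus[prob.status] != "Optimal":
        raise RuntimeError(f"solver status: {LpStatus[prob.status]}")
    return value(prob.objective)


uncapped = min_calories()
capped = min_calories(max_mass_g=500)
print(round(uncapped, 1), round(capped, 1))
```

Without the cap, the cheapest plan is 1000 g of the leafy food (200 kcal). With a 500 g cap, part of it has to be swapped for the dense food, and the minimum rises to about 255.6 kcal.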


The mass-constrained list weighs 3.3 lbs and contains 952.7 calories. Before we move on, I want to remove processed foods: if I want my diet to be healthy, it should contain as few of them as possible. In order to get an answer with no ultra-processed foods (NOVA group 4 foods), I need to remove almond milk and soy milk. Note that the following run does not apply the mass constraint.

In [6]:
remove_foods = [
    'Almond milk, unsweetened, plain, shelf stable',
    'Soy milk, unsweetened, plain, refrigerated'
]

temp_df = df_ff[~df_ff['description'].isin(remove_foods)].reset_index(drop=True)

string = ph.default_problem(temp_df, [])
print(string)

-Tomatoes, grape, raw: 31.23 g
-Kale, frozen, cooked, boiled, drained, without salt: 906.49 g
-Beef, round, top round roast, boneless, separable lean only, trimmed to 0" fat, select, raw: 38.11 g
-Salt, table, iodized: 2.17 g
-Milk, nonfat, fluid, with added vitamin A and vitamin D (fat free or skim): 855.15 g
-Beans, Dry, Medium Red (0% moisture): 14.46 g
-Eggs, Grade A, Large, egg whole: 109.52 g
-Oil, sunflower: 0.03 g
-Mushroom, maitake: 181.21 g
Total Calories: 929.9
Total Mass: 2138 g (4.7 lbs)


Next we will look at the All Data Types database. NOTE: the get_all_databases_table() function in get_data_helpers.py only includes foods from a select group of categories (namely fruits, vegetables, meats, and other clearly unprocessed or minimally processed foods).

In [7]:
df_ad = gdh.get_all_databases_table()


In [None]:
ad_ans = ph.default_problem(df_ad, [])
print(ad_ans)

If we only use the All Data Types table, then we don't have adequate information about 6 of the nutrients. So we'll combine it with the Foundation Foods table, keeping the Foundation Foods row whenever the same description appears in both:

In [None]:
mask = df_ad['description'].isin(df_ff['description'])
df = pd.concat([df_ff, df_ad[~mask]], ignore_index=True, sort=False).fillna(0)
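
To make the effect of the mask in the cell above concrete, here is the same pattern on two tiny made-up tables. The column values are invented for illustration, and `primary` and `secondary` are hypothetical names standing in for df_ff and df_ad.

```python
import pandas as pd

# Made-up stand-ins for the two tables (values are not USDA data).
primary = pd.DataFrame({
    "description": ["Broccoli, raw", "Kale, raw"],
    "Energy (KCAL)": [30.0, 40.0],
    "Iodine, I (UG)": [1.0, 2.0],
})
secondary = pd.DataFrame({
    "description": ["Kale, raw", "Kidney"],
    "Energy (KCAL)": [45.0, 100.0],
})

# True for secondary rows whose description already exists in primary.
mask = secondary["description"].isin(primary["description"])

# Keep all primary rows, append only the secondary rows that are new,
# and treat nutrients that a table does not report as 0.
combined = pd.concat(
    [primary, secondary[~mask]], ignore_index=True, sort=False
).fillna(0)
print(combined)
```

"Kale, raw" appears once, with the values from the first table, and "Kidney" gets 0 for the iodine column that its source table does not have.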

In [None]:
string = ph.default_problem(df, [])
print(string)

In [None]:
string = ph.default_problem(df, [('mass (G)', -1, 1500), ('Sodium, Na (MG)', -1, 5000)])
print(string)

In [None]:
df_thing = gdh.nutrition_of_list(df, string)

#display(temp_df[temp_df['description'].isin(df_thing['description'])])
print(gdh.find_PDV(df_thing))


In [None]:
remove_foods = [
    'Almond milk, unsweetened, plain, shelf stable',
    'Soy milk, unsweetened, plain, refrigerated',
    'Kidney',
    'Scallops, steamed or boiled',
    'Scallops, fried',
    'Rice milk',
    'Coconut milk'
]

tdf = df[~df['description'].isin(remove_foods)].reset_index(drop=True)

ans = ph.default_problem(tdf, [])
print(ans)

In [None]:
df_thing = gdh.nutrition_of_list(df, ans)
#display(df_thing)
#display(tdf[tdf['description'].isin(df_thing['description'])])
print(gdh.find_PDV(df_thing))


In [None]:
def filter_dataframe(df, tuples):
    # Extract the food names from the (food, grams) tuples
    foods = [t[0] for t in tuples]

    # Keep only the rows whose "description" matches one of those foods
    return df[df['description'].isin(foods)]

def nutrition_of_list(df, input_string):
    tuples = gdh.extract_tuples(input_string)

    # Work on a copy so the scaling below never touches the original dataframe
    new_df = filter_dataframe(df, tuples).copy()

    keep_cols = ['Energy (KCAL)', 'mass (G)', 'description']
    numeric_cols = new_df.select_dtypes(include='number').columns

    for food, factor in tuples:
        # Table values are per 100 g, so scale the numeric columns
        # to `factor` grams of the matching food
        mask = new_df['description'] == food
        new_df.loc[mask, numeric_cols] *= (factor / 100)

    # Drop every column that is neither a kept column nor a tracked micronutrient
    for col in new_df.columns.tolist():
        if (col not in keep_cols) and (col not in gdh.nutrients_dict.keys()):
            gdh.drop_cols(new_df, col)

    return new_df

In [None]:
string = ph.default_problem(df1, [('mass (G)', -1, 1500)])

lst = gdh.extract_tuples(string)
for (food, number) in lst:
    print(f'{food}: {number} g')

In [None]:
df_thing = gdh.nutrition_of_list(df1, string)
display(df_thing)

In [None]:
print(gdh.find_PDV(df_thing))

In [None]:
display(df_thing)

In [None]:
df = gdh.get_foundation_foods_table()

string = ph.default_problem(df, [])

df2 = gdh.nutrition_of_list(df, string)

string2 = gdh.find_PDV(df2)

print(string)

In [None]:
print(gdh.find_PDV(df2))

In [None]:
df1 = gdh.get_all_databases_table()
df = gdh.get_foundation_foods_table()
df = pd.concat([df, df1], ignore_index=True, sort=False).fillna(0)




In [None]:
ans = ph.default_problem(df, [('mass (G)', -1, 1500)])

df_thing = gdh.nutrition_of_list(df, ans)

#display(df_thing)
print(gdh.find_PDV(df_thing))

In [None]:
print(ans)

Sources

I used this source to decide which nutrients are necessary:
https://www.hsph.harvard.edu/nutritionsource/vitamins/

and this source for all food data:
https://fdc.nal.usda.gov/download-datasets.html

Very few foods in the USDA database have a value listed for chromium, so I've left it out of consideration; the NIH says that "Chromium deficiency has not been reported in healthy populations".
https://ods.od.nih.gov/factsheets/Chromium-HealthProfessional/#:~:text=Many%20whole%20grains%2C%20fruits%2C%20and,poultry%2C%20and%20eggs%20contain%20chromium.