# Team Shallot ML4VA Project: Personalized Meal Planning on a Budget

*Many people struggle to maintain balanced, healthy diets that fit their preferences, budgets, and dietary needs. This is especially true for UVA students and Charlottesville residents, who often have limited time and money. Our goal is to automate meal planning based on user-specific inputs, making it easier for users to create economical and nutritious meal plans.*

Our project addresses the challenges faced by UVA students and Charlottesville residents in maintaining a healthy, budget-conscious diet. We are developing a personalized meal-planning application that recommends meal plans based on dietary preferences, budget, ingredient availability, and location. The application aims to promote healthier and more affordable eating habits while being accessible to low-income communities.

## 1. Data Preprocessing

### Nutritional Information
Source: USDA FoodData Central, Foundation Foods CSV download (https://fdc.nal.usda.gov/download-datasets)


| **Feature**             | **Description**                          | **Example**            |
|-------------------------|------------------------------------------|------------------------|
| `fdc_id`                | Unique food identifier                   | 320020                 |
| `description`           | Food name                                | "Hummus"               |
| `protein`               | Protein content per serving (g)          | 3.47                   |
| `fat`                   | Fat content per serving (g)              | 8.37                   |
| `carbohydrates`         | Carbohydrate content per serving (g)     | 4.07                   |
| `calories`              | Total calories per serving (kcal)        | 56                     |
| `food_category`         | Category of food                         | "Legumes and Products" |
| `serving_size`          | Serving size in grams                    | 35.8                   |
| `serving_unit`          | Serving measurement unit                 | "tablespoon"           |
| `price_per_serving`     | Cost per serving (in currency)           | 1.50                   |
| `available_in_location` | Availability by location (region/flag)   | "USA, EU"              |
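
The table describes per-serving values with `serving_size` in grams, while nutrient databases commonly report amounts per 100 g (the Open Food Facts columns used later carry a `_100g` suffix). A minimal sketch of that conversion, assuming per-100 g inputs; the function name `per_serving` and the sample numbers are hypothetical and not taken from either dataset:

```python
def per_serving(amount_per_100g: float, serving_size_g: float) -> float:
    """Scale a per-100 g nutrient amount to one serving of the given weight in grams."""
    if serving_size_g < 0:
        raise ValueError("serving_size_g must be non-negative")
    return amount_per_100g * serving_size_g / 100.0


# Hypothetical food: 8.0 g protein per 100 g, eaten in a 50 g serving
protein = per_serving(8.0, 50.0)
print(protein)  # 4.0
```

The same scaling would apply to fat, carbohydrates, and calories, provided the source values share the per-100 g basis.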


In [6]:
import pandas as pd

# Define file paths
data_folder = "../DATA/FoodData_Central_foundation_food_csv_2024-10-31/FoodData_Central_foundation_food_csv_2024-10-31/"
output_file = data_folder + "nutritional_information.csv"

# Load relevant datasets
food = pd.read_csv(data_folder + "food.csv")
# low_memory=False avoids the warning pandas emits when a column has mixed types
food_nutrient = pd.read_csv(data_folder + "food_nutrient.csv", low_memory=False)
nutrient = pd.read_csv(data_folder + "nutrient.csv")
food_category = pd.read_csv(data_folder + "food_category.csv")
food_portion = pd.read_csv(data_folder + "food_portion.csv")

# Select required features
# From food.csv
food = food[['fdc_id', 'description', 'food_category_id']]

# From food_nutrient.csv
food_nutrient = food_nutrient[['fdc_id', 'nutrient_id', 'amount']]

# From nutrient.csv
nutrient = nutrient[['id', 'name', 'unit_name']]
nutrient.columns = ['nutrient_id', 'nutrient_name', 'unit_name']

# From food_category.csv
food_category = food_category[['id', 'description']]
food_category.columns = ['food_category_id', 'food_category']

# From food_portion.csv
food_portion = food_portion[['fdc_id', 'gram_weight']]

# Merge datasets
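# 1. Add nutrient names and units to food_nutrient
food_nutrient = food_nutrient.merge(nutrient, on='nutrient_id', how='inner')

# "Energy" can appear in both kcal and kJ under the same name; drop the kJ rows
# so the pivot below does not mix units in the calories column
is_kj_energy = (food_nutrient['nutrient_name'] == 'Energy') & (food_nutrient['unit_name'].str.upper() == 'KJ')
food_nutrient = food_nutrient[~is_kj_energy]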

# 2. Pivot food_nutrient to have one row per food with macronutrient columns
macronutrients = food_nutrient.pivot_table(index='fdc_id', 
                                           columns='nutrient_name', 
                                           values='amount', 
                                           aggfunc='first').reset_index()

# 3. Merge macronutrients with food
nutritional_data = food.merge(macronutrients, on='fdc_id', how='inner')

# 4. Merge with food_category
nutritional_data = nutritional_data.merge(food_category, on='food_category_id', how='inner')

# 5. Add serving size information
nutritional_data = nutritional_data.merge(food_portion, on='fdc_id', how='left')

# Rename columns for clarity
nutritional_data = nutritional_data.rename(columns={
    'gram_weight': 'serving_size',
    'description': 'food_name',
    'Protein': 'protein',
    'Total lipid (fat)': 'fat',
    'Carbohydrate, by difference': 'carbohydrates',
    'Energy': 'calories'
})

# Keep only relevant features
final_columns = ['fdc_id', 'food_name', 'protein', 'fat', 'carbohydrates', 'calories', 
                 'food_category', 'serving_size']
nutritional_data = nutritional_data[final_columns]

# Save to CSV
nutritional_data.to_csv(output_file, index=False)

print(f"nutritional_information.csv saved in {data_folder}")




nutritional_information.csv saved in ../DATA/FoodData_Central_foundation_food_csv_2024-10-31/FoodData_Central_foundation_food_csv_2024-10-31/
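
One thing worth checking in the merge above: a left join against `food_portion` repeats a food once per matching portion row, so a food with several portions appears several times in the output. A minimal sketch on toy data (the IDs and weights below are invented for illustration) that keeps one portion per `fdc_id` before merging. Keeping the first portion is an assumption; another rule, such as the median weight, may suit the project better:

```python
import pandas as pd

# Toy stand-ins for food.csv and food_portion.csv (values are invented)
food = pd.DataFrame({"fdc_id": [1, 2], "description": ["Hummus", "Rice"]})
food_portion = pd.DataFrame({
    "fdc_id": [1, 1, 2],
    "gram_weight": [15.0, 30.0, 158.0],
})

# A plain left merge repeats food 1 once per portion row
duplicated = food.merge(food_portion, on="fdc_id", how="left")

# Keep a single portion per food (here: the first one listed) before merging
one_portion = food_portion.drop_duplicates(subset="fdc_id", keep="first")
deduplicated = food.merge(one_portion, on="fdc_id", how="left")
deduplicated = deduplicated.rename(columns={"gram_weight": "serving_size"})

print(len(duplicated), len(deduplicated))  # 3 2
```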


In [1]:
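import pandas as pd

# First attempt: default comma separator. The file is tab-separated, so the
# whole header is parsed as a single column name (fixed in In [5])
chunk = pd.read_csv('../DATA/en.openfoodfacts.org.products.csv', nrows=5, low_memory=False)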
print(chunk.columns)


Index(['code\turl\tcreator\tcreated_t\tcreated_datetime\tlast_modified_t\tlast_modified_datetime\tlast_modified_by\tlast_updated_t\tlast_updated_datetime\tproduct_name\tabbreviated_product_name\tgeneric_name\tquantity\tpackaging\tpackaging_tags\tpackaging_en\tpackaging_text\tbrands\tbrands_tags\tcategories\tcategories_tags\tcategories_en\torigins\torigins_tags\torigins_en\tmanufacturing_places\tmanufacturing_places_tags\tlabels\tlabels_tags\tlabels_en\temb_codes\temb_codes_tags\tfirst_packaging_code_geo\tcities\tcities_tags\tpurchase_places\tstores\tcountries\tcountries_tags\tcountries_en\tingredients_text\tingredients_tags\tingredients_analysis_tags\tallergens\tallergens_en\ttraces\ttraces_tags\ttraces_en\tserving_size\tserving_quantity\tno_nutrition_data\tadditives_n\tadditives\tadditives_tags\tadditives_en\tnutriscore_score\tnutriscore_grade\tnova_group\tpnns_groups_1\tpnns_groups_2\tfood_groups\tfood_groups_tags\tfood_groups_en\tstates\tstates_tags\tstates_en\tbrand_owner\tecoscore

In [2]:
def clean_columns(columns):
    return columns.str.strip().str.lower().str.replace('-', '_')

chunk = pd.read_csv('../DATA/en.openfoodfacts.org.products.csv', nrows=5, low_memory=False)
print(clean_columns(chunk.columns))


Index(['code\turl\tcreator\tcreated_t\tcreated_datetime\tlast_modified_t\tlast_modified_datetime\tlast_modified_by\tlast_updated_t\tlast_updated_datetime\tproduct_name\tabbreviated_product_name\tgeneric_name\tquantity\tpackaging\tpackaging_tags\tpackaging_en\tpackaging_text\tbrands\tbrands_tags\tcategories\tcategories_tags\tcategories_en\torigins\torigins_tags\torigins_en\tmanufacturing_places\tmanufacturing_places_tags\tlabels\tlabels_tags\tlabels_en\temb_codes\temb_codes_tags\tfirst_packaging_code_geo\tcities\tcities_tags\tpurchase_places\tstores\tcountries\tcountries_tags\tcountries_en\tingredients_text\tingredients_tags\tingredients_analysis_tags\tallergens\tallergens_en\ttraces\ttraces_tags\ttraces_en\tserving_size\tserving_quantity\tno_nutrition_data\tadditives_n\tadditives\tadditives_tags\tadditives_en\tnutriscore_score\tnutriscore_grade\tnova_group\tpnns_groups_1\tpnns_groups_2\tfood_groups\tfood_groups_tags\tfood_groups_en\tstates\tstates_tags\tstates_en\tbrand_owner\tecoscore

In [5]:
# Read only the header row; the file is tab-separated despite its .csv extension
header = pd.read_csv('../DATA/en.openfoodfacts.org.products.csv', sep='\t', nrows=0)
print(header.columns)


Index(['code', 'url', 'creator', 'created_t', 'created_datetime',
       'last_modified_t', 'last_modified_datetime', 'last_modified_by',
       'last_updated_t', 'last_updated_datetime',
       ...
       'glycemic-index_100g', 'water-hardness_100g', 'choline_100g',
       'phylloquinone_100g', 'beta-glucan_100g', 'inositol_100g',
       'carnitine_100g', 'sulphate_100g', 'nitrate_100g', 'acidity_100g'],
      dtype='object', length=206)
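
The first two attempts above return a single column whose name contains `\t` characters because `pd.read_csv` defaults to a comma separator, while this file is tab-separated. A small self-contained sketch of the difference, using an in-memory sample (the three column names appear in the output above; the row values are invented) together with the `clean_columns` helper from earlier:

```python
import io
import pandas as pd

def clean_columns(columns):
    return columns.str.strip().str.lower().str.replace('-', '_')

# Invented two-row sample in the same tab-separated layout
sample = "code\tproduct_name\tenergy-kcal_100g\n1\tTea\t0\n2\tSnack\t450\n"

# Default separator (comma): the whole header is read as one column name
wrong = pd.read_csv(io.StringIO(sample))

# Tab separator: one column per field
right = pd.read_csv(io.StringIO(sample), sep="\t")
right.columns = clean_columns(right.columns)

print(len(wrong.columns), list(right.columns))
# 1 ['code', 'product_name', 'energy_kcal_100g']
```

Note that cleaning the names replaces hyphens with underscores, so any later column list (for example `energy-kcal_100g`) would need the same treatment to match.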


In [9]:
import pandas as pd

# List of important features to retain
important_features = [
    "product_name",
    "abbreviated_product_name",
    "generic_name",
    "categories",
    "labels",
    "brands",
    "energy-kcal_100g",
    "fat_100g",
    "saturated-fat_100g",
    "carbohydrates_100g",
    "sugars_100g",
    "fiber_100g",
    "proteins_100g",
    "salt_100g",
    "nutriscore_grade",
    "nova_group"
]

# File paths
input_file = "../DATA/en.openfoodfacts.org.products.csv"
output_file = "../DATA/nutritional_information.csv"

# Collect processed chunks in a list and concatenate once at the end
filtered_chunks = []

# Read the dataset in chunks with tab-delimited format
chunk_size = 100000
for i, chunk in enumerate(pd.read_csv(input_file, chunksize=chunk_size, sep='\t', on_bad_lines="skip", low_memory=False), start=1):
    print("starting chunk ", i)
    # Filter the chunk for the important features
    filtered_chunk = chunk[important_features].dropna(how="all")  # Remove rows with all NaN values
    filtered_chunks.append(filtered_chunk)

# Combine all chunks into a single DataFrame
filtered_data = pd.concat(filtered_chunks, ignore_index=True)

# Save the filtered dataset
filtered_data.to_csv(output_file, index=False)

# Display the first few rows of the filtered DataFrame
print(filtered_data.head())


starting chunk  1
starting chunk  2
...
starting chunk  36
                        product_name abbreviated_product_name generic_name  \
0  Purée Mix Tropical Harmony + Aloe                      NaN          NaN   
1  Matcha organic Japanese green tea                      NaN          NaN   
2           Slim Jim snack size mild                      NaN          NaN   
3            
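
Since only 16 of the 206 columns are kept, one option is to have `pd.read_csv` parse just those columns via `usecols`, which should lower memory use per chunk. A sketch on an in-memory sample, assuming the same tab-separated layout; the sample rows and the shortened `wanted` list are invented for illustration:

```python
import io
import pandas as pd

# Invented sample: four columns, of which only two are wanted
sample = (
    "code\tproduct_name\tcreator\tfat_100g\n"
    "1\tTea\tuser_a\t0.0\n"
    "2\t\tuser_b\t\n"
    "3\tSnack\tuser_c\t30.0\n"
)
wanted = ["product_name", "fat_100g"]

chunks = []
# usecols limits parsing to the wanted columns; chunksize streams the file
for chunk in pd.read_csv(io.StringIO(sample), sep="\t", usecols=wanted, chunksize=2):
    chunks.append(chunk.dropna(how="all"))  # drop rows where every kept column is NaN

filtered = pd.concat(chunks, ignore_index=True)
print(filtered)
```

In this sample the second row has no value in either kept column, so it is dropped and two rows remain.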

In [11]:
import os
print(os.getcwd())


c:\Users\Colette D'Costa\CS4774\TeamShallotML4VA\SCRIPTS
