# Fast Food Nutrition Project

**Team:** Lori Viaioli, Jennifer White, Jenna Johnshoy-Aarestad, Heather Shoberg, Eric Johnson, Chad Fletcher

---

## Load Fast Food Data from Kaggle

**Overview:** This dataset provides information on common nutrients in various fast food items.

**Disclosure:** Please note that this dataset may not be exhaustive or up to date. It is used here solely for educational purposes. For accurate nutritional advice, consult with a qualified expert.

**Sources:**
- Nutrition Data: [Fast Food Nutrition Dataset](https://www.kaggle.com/datasets/ulrikthygepedersen/fastfood-nutrition)
- Map Data: Map data ©2024 Google
  - [Google Maps API Documentation](https://developers.google.com/maps/documentation)

**Additional Note:** Additional data was reviewed during the project but was not included in the final analysis.


## Import and clean data

In [10]:
# Required Libraries
import pandas as pd
import sqlite3


In [11]:
# Load CSV for nutritional data. 
menu_df = pd.read_csv('../data/fastfood.csv')
menu_df.head(3)



,restaurant,item,calories,cal_fat,total_fat,sat_fat,trans_fat,cholesterol,sodium,total_carb,fiber,sugar,protein,vit_a,vit_c,calcium,salad
0,Mcdonalds,Artisan Grilled Chicken Sandwich,380,60,7,2.0,0.0,95,1110,44,3.0,11,37.0,4.0,20.0,20.0,Other
1,Mcdonalds,Single Bacon Smokehouse Burger,840,410,45,17.0,1.5,130,1580,62,2.0,18,46.0,6.0,20.0,20.0,Other
2,Mcdonalds,Double Bacon Smokehouse Burger,1130,600,67,27.0,3.0,220,1920,63,3.0,18,70.0,10.0,20.0,50.0,Other


In [12]:
# load the location data
location_df = pd.read_csv('../data/places_rating.csv')
location_df.head(3)

,Unnamed: 0,Rating,Total User Ratings,Address,City,State,Lat,Long_,Restaurant
0,0,4.3,2167,"8020 MN-7, St Louis Park","8020 MN-7, St Louis Park",MN,44.937051,-93.381334,Chick-fil-A Knollwood
1,1,4.4,1346,"2090 Snelling Ave N, Roseville","2090 Snelling Ave N, Roseville",MN,45.004271,-93.165906,Chick-fil-A
2,2,4.5,4137,"2500 W 79th St, Bloomington","2500 W 79th St, Bloomington",MN,44.861015,-93.310701,Chick-fil-A
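The `Unnamed: 0` column in `location_df` appears to be a saved index: pandas gives that name to a first column whose header cell is empty, which is what `to_csv` writes when a DataFrame is saved with its index. A minimal sketch, assuming `places_rating.csv` has that layout, of loading it with `index_col=0` so the column becomes the index instead of data (the inline CSV copies two rows from the preview above):

```python
import io

import pandas as pd

# Two rows copied from the places_rating.csv preview; the first header cell is empty
csv_text = """,Rating,Total User Ratings,Address,City,State,Lat,Long_,Restaurant
0,4.3,2167,"8020 MN-7, St Louis Park","8020 MN-7, St Louis Park",MN,44.937051,-93.381334,Chick-fil-A Knollwood
1,4.4,1346,"2090 Snelling Ave N, Roseville","2090 Snelling Ave N, Roseville",MN,45.004271,-93.165906,Chick-fil-A
"""

# index_col=0 uses the unnamed first column as the index instead of loading it as data
location_preview = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(location_preview.columns.tolist())
```

With this approach the later `drop(columns=['Unnamed: 0'])` step would no longer be needed.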


In [13]:
# Get information on df
menu_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 515 entries, 0 to 514
Data columns (total 17 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   restaurant   515 non-null    object 
 1   item         515 non-null    object 
 2   calories     515 non-null    int64  
 3   cal_fat      515 non-null    int64  
 4   total_fat    515 non-null    int64  
 5   sat_fat      515 non-null    float64
 6   trans_fat    515 non-null    float64
 7   cholesterol  515 non-null    int64  
 8   sodium       515 non-null    int64  
 9   total_carb   515 non-null    int64  
 10  fiber        503 non-null    float64
 11  sugar        515 non-null    int64  
 12  protein      514 non-null    float64
 13  vit_a        301 non-null    float64
 14  vit_c        305 non-null    float64
 15  calcium      305 non-null    float64
 16  salad        515 non-null    object 
dtypes: float64(7), int64(7), object(3)
memory usage: 68.5+ KB


In [14]:
location_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145 entries, 0 to 144
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Unnamed: 0          145 non-null    int64  
 1   Rating              145 non-null    float64
 2   Total User Ratings  145 non-null    int64  
 3   Address             145 non-null    object 
 4   City                145 non-null    object 
 5   State               145 non-null    object 
 6   Lat                 145 non-null    float64
 7   Long_               145 non-null    float64
 8   Restaurant          145 non-null    object 
dtypes: float64(3), int64(2), object(4)
memory usage: 10.3+ KB


In [15]:
# drop the unnamed column 
location_df.drop(columns=['Unnamed: 0'], inplace=True)
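Because `inplace=True` mutates `location_df`, running this cell a second time raises a `KeyError` once the column is gone. A sketch of a re-runnable variant using the `errors='ignore'` option of `DataFrame.drop`, which skips labels that are not present (the two-row frame is a stand-in for `location_df`):

```python
import pandas as pd

# Tiny stand-in for location_df with the leftover index column
location_df = pd.DataFrame({"Unnamed: 0": [0, 1], "Rating": [4.3, 4.4]})

# errors="ignore" turns the drop into a no-op if the column was already removed
location_df = location_df.drop(columns=["Unnamed: 0"], errors="ignore")
location_df = location_df.drop(columns=["Unnamed: 0"], errors="ignore")  # safe to re-run
print(location_df.columns.tolist())
```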


In [16]:
location_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145 entries, 0 to 144
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rating              145 non-null    float64
 1   Total User Ratings  145 non-null    int64  
 2   Address             145 non-null    object 
 3   City                145 non-null    object 
 4   State               145 non-null    object 
 5   Lat                 145 non-null    float64
 6   Long_               145 non-null    float64
 7   Restaurant          145 non-null    object 
dtypes: float64(3), int64(1), object(4)
memory usage: 9.2+ KB


In [17]:
# Add a 1-based id column so the location table has a primary key in the database
location_df.index.name = 'id'
location_df.reset_index(inplace=True)
location_df['id'] = location_df.index + 1
location_df.head(3)

,id,Rating,Total User Ratings,Address,City,State,Lat,Long_,Restaurant
0,1,4.3,2167,"8020 MN-7, St Louis Park","8020 MN-7, St Louis Park",MN,44.937051,-93.381334,Chick-fil-A Knollwood
1,2,4.4,1346,"2090 Snelling Ave N, Roseville","2090 Snelling Ave N, Roseville",MN,45.004271,-93.165906,Chick-fil-A
2,3,4.5,4137,"2500 W 79th St, Bloomington","2500 W 79th St, Bloomington",MN,44.861015,-93.310701,Chick-fil-A
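Both `location_df` here and `menu_df` later need a 1-based integer `id` to act as the SQLite primary key. A minimal sketch of doing that in one step with `DataFrame.insert`; the helper name `add_id_column` is hypothetical and not part of the project:

```python
import pandas as pd


def add_id_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 1-based 'id' column in position 0 (hypothetical helper)."""
    # reset_index(drop=True) returns a new frame with a clean 0..n-1 index,
    # so the ids stay consecutive even after rows were dropped
    out = df.reset_index(drop=True)
    out.insert(0, "id", range(1, len(out) + 1))
    return out


# Toy frame whose index has a gap, as menu_df's does after drop_duplicates
frame = pd.DataFrame({"Restaurant": ["Mcdonalds", "Taco Bell", "Taco Bell"]}, index=[0, 1, 3])
with_ids = add_id_column(frame)
print(with_ids)
```

Returning a copy rather than mutating in place keeps the cell safe to re-run.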


In [18]:
# Drop the vitamin A, vitamin C, and calcium columns, which are missing many values
# Drop salad because it contains no useful data
menu_df = menu_df.drop(labels=['vit_a', 'vit_c', 'calcium', 'salad'], axis=1)

# Check to ensure it worked
menu_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 515 entries, 0 to 514
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   restaurant   515 non-null    object 
 1   item         515 non-null    object 
 2   calories     515 non-null    int64  
 3   cal_fat      515 non-null    int64  
 4   total_fat    515 non-null    int64  
 5   sat_fat      515 non-null    float64
 6   trans_fat    515 non-null    float64
 7   cholesterol  515 non-null    int64  
 8   sodium       515 non-null    int64  
 9   total_carb   515 non-null    int64  
 10  fiber        503 non-null    float64
 11  sugar        515 non-null    int64  
 12  protein      514 non-null    float64
dtypes: float64(4), int64(7), object(2)
memory usage: 52.4+ KB
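The columns above are dropped by name. A sketch of making the rule explicit: measure each column's share of missing values with `isna().mean()`, then keep only columns that meet a completeness cutoff with `dropna(axis=1, thresh=...)`. The 70% cutoff is an assumption for illustration; by the `info()` counts above, `vit_a` is 301/515 (about 58%) complete, `vit_c` and `calcium` 305/515 (about 59%), and `fiber` 503/515 (about 98%), so this cutoff would drop the same three nutrient columns. `salad` is complete and was dropped for a different reason, so it would still be dropped by name.

```python
import math

import pandas as pd

# Toy frame: vit_a is mostly missing, fiber has one gap
menu = pd.DataFrame({
    "calories": [380, 840, 1130, 380],
    "fiber": [3.0, 2.0, None, 3.0],
    "vit_a": [4.0, None, None, None],
})

# Fraction of missing values in each column
missing_share = menu.isna().mean()
print(missing_share)

# Keep columns with at least 70% non-null values (cutoff is an assumption)
min_non_null = math.ceil(0.7 * len(menu))
kept = menu.dropna(axis=1, thresh=min_non_null)
print(kept.columns.tolist())
```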


In [19]:
# Clean Fiber and Protein missing values
# Calculate column averages
fiber_avg = menu_df['fiber'].mean()
protein_avg = menu_df['protein'].mean()

# Fill missing values with averages (assign the result back instead of using inplace on a column selection)
menu_df['fiber'] = menu_df['fiber'].fillna(fiber_avg)
menu_df['protein'] = menu_df['protein'].fillna(protein_avg)

# Verify the changes
menu_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 515 entries, 0 to 514
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   restaurant   515 non-null    object 
 1   item         515 non-null    object 
 2   calories     515 non-null    int64  
 3   cal_fat      515 non-null    int64  
 4   total_fat    515 non-null    int64  
 5   sat_fat      515 non-null    float64
 6   trans_fat    515 non-null    float64
 7   cholesterol  515 non-null    int64  
 8   sodium       515 non-null    int64  
 9   total_carb   515 non-null    int64  
 10  fiber        515 non-null    float64
 11  sugar        515 non-null    int64  
 12  protein      515 non-null    float64
dtypes: float64(4), int64(7), object(2)
memory usage: 52.4+ KB
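Calling `fillna(..., inplace=True)` on a selection such as `menu_df['fiber']` is chained assignment: recent pandas versions warn about it, and under copy-on-write it no longer updates `menu_df`. A sketch of assigning the filled column back instead, plus, as a possible refinement that this project does not use, filling each gap with that restaurant's own mean via `groupby(...).transform('mean')`:

```python
import pandas as pd

menu = pd.DataFrame({
    "restaurant": ["Subway", "Subway", "Taco Bell", "Taco Bell"],
    "fiber": [6.0, None, 4.0, 2.0],
})

# Overall-mean imputation, assigned back rather than modified in place
overall = menu["fiber"].fillna(menu["fiber"].mean())

# Alternative: per-restaurant mean; transform returns a Series aligned to the original rows
per_chain = menu["fiber"].fillna(menu.groupby("restaurant")["fiber"].transform("mean"))
print(overall.tolist(), per_chain.tolist())
```

Per-chain means keep a Subway item from inheriting an average pulled down by lower-fiber chains, at the cost of noisier estimates for chains with few items.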


In [20]:
# Convert all integer columns to floats to make it easier to compare
int_columns = menu_df.select_dtypes(include='int64').columns
menu_df[int_columns] = menu_df[int_columns].astype(float)

# Add a column showing each item followed by its chain name, which makes the drop-down menus easier to read
menu_df.insert(2, 'item_with_chain', menu_df['item'] + ' (' + menu_df['restaurant'] + ')')

# Verify the changes
menu_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 515 entries, 0 to 514
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   restaurant       515 non-null    object 
 1   item             515 non-null    object 
 2   item_with_chain  515 non-null    object 
 3   calories         515 non-null    float64
 4   cal_fat          515 non-null    float64
 5   total_fat        515 non-null    float64
 6   sat_fat          515 non-null    float64
 7   trans_fat        515 non-null    float64
 8   cholesterol      515 non-null    float64
 9   sodium           515 non-null    float64
 10  total_carb       515 non-null    float64
 11  fiber            515 non-null    float64
 12  sugar            515 non-null    float64
 13  protein          515 non-null    float64
dtypes: float64(11), object(3)
memory usage: 56.5+ KB


In [21]:
# Find duplicate items in 'item_with_chain'
print(menu_df['item_with_chain'].value_counts())

duplicate_items = menu_df[menu_df.duplicated(subset='item_with_chain', keep=False)]

# Display duplicate items
print(duplicate_items[['item_with_chain', 'calories', 'cal_fat', 'sat_fat',
                       'trans_fat', 'cholesterol', 'sodium', 'total_carb', 'fiber', 'sugar', 'protein']])

# Remove one of the duplicates (you can choose 'first' or 'last')
menu_df = menu_df.drop_duplicates(subset='item_with_chain', keep='first')

Express Taco Salad w/ Chips (Taco Bell)         2
Chili Cheese Burrito (Taco Bell)                2
Artisan Grilled Chicken Sandwich (Mcdonalds)    1
6" Sweet Onion Chicken Teriyaki (Subway)        1
Footlong Subway Seafood Sensation (Subway)      1
                                               ..
Roast Turkey & Swiss Sandwich (Arbys)           1
Roast Beef Gyro (Arbys)                         1
Reuben Sandwich (Arbys)                         1
5 piece Prime-Cut Chicken Tenders (Arbys)       1
Fiesta Taco Salad-Steak (Taco Bell)             1
Name: item_with_chain, Length: 513, dtype: int64
                             item_with_chain  calories  cal_fat  sat_fat  \
414         Chili Cheese Burrito (Taco Bell)     380.0    150.0      8.0   
492         Chili Cheese Burrito (Taco Bell)     380.0    150.0      8.0   
497  Express Taco Salad w/ Chips (Taco Bell)     580.0    260.0      9.0   
511  Express Taco Salad w/ Chips (Taco Bell)     580.0    260.0      9.0   

     trans_fat  chol

In [22]:
# Check again
duplicate_items2 = menu_df[menu_df.duplicated(subset='item_with_chain', keep=False)]
print(duplicate_items2)

Empty DataFrame
Columns: [restaurant, item, item_with_chain, calories, cal_fat, total_fat, sat_fat, trans_fat, cholesterol, sodium, total_carb, fiber, sugar, protein]
Index: []
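`drop_duplicates(subset='item_with_chain')` keeps the first row for each label even if the copies disagree on some value, and the printout above shows only some of the columns. A sketch of confirming that rows sharing a label are exact copies across every column before dropping them; the toy rows reuse items and values shown above:

```python
import pandas as pd

menu = pd.DataFrame({
    "item_with_chain": [
        "Chili Cheese Burrito (Taco Bell)",
        "Chili Cheese Burrito (Taco Bell)",
        "Express Taco Salad w/ Chips (Taco Bell)",
        "Express Taco Salad w/ Chips (Taco Bell)",
        "Artisan Grilled Chicken Sandwich (Mcdonalds)",
    ],
    "calories": [380.0, 380.0, 580.0, 580.0, 380.0],
    "cal_fat": [150.0, 150.0, 260.0, 260.0, 60.0],
})

# Rows that share a label vs. rows that are identical in every column
label_dupes = menu.duplicated(subset="item_with_chain", keep=False)
full_dupes = menu.duplicated(keep=False)

# If the two masks agree, dropping by label loses no information
print(label_dupes.equals(full_dupes))

deduped = menu.drop_duplicates(subset="item_with_chain", keep="first")
print(len(deduped))
```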


In [23]:
# Rename columns to match the names used in the PyGWalker code
menu_df = menu_df.rename(columns={
    'restaurant': 'Restaurant',
    'item': 'Item',
    'calories': 'Calories',
    'cal_fat': 'Calories From Fat',
    'total_fat': 'Total Fat',
    'sat_fat': 'Saturated Fat',
    'trans_fat': 'Trans Fat',
    'cholesterol': 'Cholesterol',
    'sodium': 'Sodium',
    'total_carb': 'Total Carbohydrates',
    'fiber': 'Fiber',
    'sugar': 'Sugar',
    'protein': 'Protein',
})

In [24]:
# Look at shape
menu_df.shape

(513, 14)

In [25]:
# Look at columns
menu_df.columns

Index(['Restaurant', 'Item', 'item_with_chain', 'Calories',
       'Calories From Fat', 'Total Fat', 'Saturated Fat', 'Trans Fat',
       'Cholesterol', 'Sodium', 'Total Carbohydrates', 'Fiber', 'Sugar',
       'Protein'],
      dtype='object')

In [26]:
# Checking info again
menu_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 513 entries, 0 to 514
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Restaurant           513 non-null    object 
 1   Item                 513 non-null    object 
 2   item_with_chain      513 non-null    object 
 3   Calories             513 non-null    float64
 4   Calories From Fat    513 non-null    float64
 5   Total Fat            513 non-null    float64
 6   Saturated Fat        513 non-null    float64
 7   Trans Fat            513 non-null    float64
 8   Cholesterol          513 non-null    float64
 9   Sodium               513 non-null    float64
 10  Total Carbohydrates  513 non-null    float64
 11  Fiber                513 non-null    float64
 12  Sugar                513 non-null    float64
 13  Protein              513 non-null    float64
dtypes: float64(11), object(3)
memory usage: 60.1+ KB


In [27]:
# Need an id field for the database 
menu_df['id'] = range(1, len(menu_df) + 1)

menu_df = menu_df[['id'] + [col for col in menu_df.columns if col != 'id']]


In [28]:
# Look at columns again to ensure the id column was created
menu_df.columns

Index(['id', 'Restaurant', 'Item', 'item_with_chain', 'Calories',
       'Calories From Fat', 'Total Fat', 'Saturated Fat', 'Trans Fat',
       'Cholesterol', 'Sodium', 'Total Carbohydrates', 'Fiber', 'Sugar',
       'Protein'],
      dtype='object')

In [29]:
menu_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 513 entries, 0 to 514
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   513 non-null    int64  
 1   Restaurant           513 non-null    object 
 2   Item                 513 non-null    object 
 3   item_with_chain      513 non-null    object 
 4   Calories             513 non-null    float64
 5   Calories From Fat    513 non-null    float64
 6   Total Fat            513 non-null    float64
 7   Saturated Fat        513 non-null    float64
 8   Trans Fat            513 non-null    float64
 9   Cholesterol          513 non-null    float64
 10  Sodium               513 non-null    float64
 11  Total Carbohydrates  513 non-null    float64
 12  Fiber                513 non-null    float64
 13  Sugar                513 non-null    float64
 14  Protein              513 non-null    float64
dtypes: float64(11), int64(1), object(3)
memo

In [30]:
# Last check before saving
menu_df.head(2)

,id,Restaurant,Item,item_with_chain,Calories,Calories From Fat,Total Fat,Saturated Fat,Trans Fat,Cholesterol,Sodium,Total Carbohydrates,Fiber,Sugar,Protein
0,1,Mcdonalds,Artisan Grilled Chicken Sandwich,Artisan Grilled Chicken Sandwich (Mcdonalds),380.0,60.0,7.0,2.0,0.0,95.0,1110.0,44.0,3.0,11.0,37.0
1,2,Mcdonalds,Single Bacon Smokehouse Burger,Single Bacon Smokehouse Burger (Mcdonalds),840.0,410.0,45.0,17.0,1.5,130.0,1580.0,62.0,2.0,18.0,46.0


### Save to clean CSV and save some summary stats

In [31]:
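# Save to cleaned CSVs. index=False because each frame already has an id column;
# writing the index would add an extra 'Unnamed: 0' column when the files are re-read
menu_df.to_csv('../data/fastfood_cleaned.csv', index=False)
location_df.to_csv('../data/location_cleaned.csv', index=False)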
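Whether the index is written matters for the database step: `to_csv(index=True)` adds the index as an unnamed first column, which comes back as `Unnamed: 0` when the CSV is re-read and would then be copied into the SQLite tables. A small round-trip sketch using in-memory buffers instead of the project's files:

```python
import io

import pandas as pd

# Index with a gap, like menu_df after drop_duplicates
frame = pd.DataFrame({"id": [1, 2], "Restaurant": ["Mcdonalds", "Mcdonalds"]}, index=[0, 2])

with_index = io.StringIO()
frame.to_csv(with_index, index=True)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```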


In [32]:
# Create the summary table
# numeric_only=True averages only the numeric columns; without it, pandas warns (and newer versions raise an error)
# because the text columns cannot be averaged. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html
summary_table = menu_df.groupby('Restaurant').mean(numeric_only=True)
summary_table

Restaurant,id,Calories,Calories From Fat,Total Fat,Saturated Fat,Trans Fat,Cholesterol,Sodium,Total Carbohydrates,Fiber,Sugar,Protein
Arbys,165.0,532.727273,237.836364,26.981818,7.972727,0.418182,70.454545,1515.272727,44.872727,2.709091,7.563636,29.254545
Burger King,227.5,608.571429,333.757143,36.814286,11.15,0.864286,100.857143,1223.571429,39.314286,2.633882,8.185714,29.984158
Chick Fil-A,71.0,384.444444,145.37037,16.148148,4.111111,0.037037,79.074074,1151.481481,28.62963,2.454606,4.148148,31.703704
Dairy Queen,283.5,520.238095,260.47619,28.857143,10.440476,0.678571,71.547619,1181.785714,38.690476,2.833333,6.357143,24.833333
Mcdonalds,29.0,640.350877,285.614035,31.807018,8.289474,0.464912,109.736842,1437.894737,48.789474,3.22807,11.070175,40.298246
Sonic,111.0,631.698113,338.301887,37.641509,11.415094,0.933962,86.981132,1350.754717,47.207547,2.660377,6.528302,29.188679
Subway,352.5,503.020833,165.104167,18.479167,6.197917,0.21875,61.302083,1272.96875,54.71875,6.5625,10.09375,30.3125
Taco Bell,457.0,443.00885,187.699115,20.858407,6.557522,0.243363,38.893805,1012.389381,46.575221,5.699115,3.690265,17.380531
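The `id` column in this table is the average of the item ids for each chain, which carries no meaning and is overwritten in the next cell. A sketch of dropping it before aggregating, plus an optional item count per chain using named aggregation in `groupby(...).agg`; the count is an extra not in the project's summary table, and the toy values are illustrative:

```python
import pandas as pd

menu = pd.DataFrame({
    "id": [1, 2, 3],
    "Restaurant": ["Mcdonalds", "Mcdonalds", "Subway"],
    "Calories": [380.0, 840.0, 500.0],
    "Protein": [37.0, 46.0, 30.0],
})

# Drop the surrogate key so it is not averaged along with the nutrients
summary = menu.drop(columns="id").groupby("Restaurant").mean(numeric_only=True)

# Optional: number of items per chain, via named aggregation
counts = menu.groupby("Restaurant").agg(Items=("id", "count"))

summary_with_counts = summary.join(counts)
print(summary_with_counts)
```

An item count also shows how many menu items each average is based on, which helps when comparing chains of very different menu sizes.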


In [33]:
summary_table['id'] = range(1, len(summary_table) + 1)

summary_table = summary_table[['id'] + [col for col in summary_table.columns if col != 'id']]


In [34]:
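# reset_index would turn Restaurant into a regular column; it is left disabled because
# the CSV is saved with index=True, which already writes Restaurant as a column
# summary_table.reset_index(inplace=True)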

summary_table.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, Arbys to Taco Bell
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   8 non-null      int64  
 1   Calories             8 non-null      float64
 2   Calories From Fat    8 non-null      float64
 3   Total Fat            8 non-null      float64
 4   Saturated Fat        8 non-null      float64
 5   Trans Fat            8 non-null      float64
 6   Cholesterol          8 non-null      float64
 7   Sodium               8 non-null      float64
 8   Total Carbohydrates  8 non-null      float64
 9   Fiber                8 non-null      float64
 10  Sugar                8 non-null      float64
 11  Protein              8 non-null      float64
dtypes: float64(11), int64(1)
memory usage: 832.0+ bytes


In [35]:
summary_table.head()

Restaurant,id,Calories,Calories From Fat,Total Fat,Saturated Fat,Trans Fat,Cholesterol,Sodium,Total Carbohydrates,Fiber,Sugar,Protein
Arbys,1,532.727273,237.836364,26.981818,7.972727,0.418182,70.454545,1515.272727,44.872727,2.709091,7.563636,29.254545
Burger King,2,608.571429,333.757143,36.814286,11.15,0.864286,100.857143,1223.571429,39.314286,2.633882,8.185714,29.984158
Chick Fil-A,3,384.444444,145.37037,16.148148,4.111111,0.037037,79.074074,1151.481481,28.62963,2.454606,4.148148,31.703704
Dairy Queen,4,520.238095,260.47619,28.857143,10.440476,0.678571,71.547619,1181.785714,38.690476,2.833333,6.357143,24.833333
Mcdonalds,5,640.350877,285.614035,31.807018,8.289474,0.464912,109.736842,1437.894737,48.789474,3.22807,11.070175,40.298246


In [None]:
# Save the summary table to a new CSV; index=True writes the Restaurant index as the first column
summary_table.to_csv('../data/summary_table.csv', index=True)


## Create DB

#### Create the DB:

In [None]:
# Create a database
conn = sqlite3.connect('../database/db.sqlite')
conn.close()
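`sqlite3.connect` creates the database file if it does not exist, which is all this cell relies on (the `../database` directory must already exist). A sketch of the same kind of step that also lists the tables through SQLite's `sqlite_master` catalog and closes the connection even if an error occurs, using `contextlib.closing`; it uses an in-memory database and a hypothetical `demo` table so it has no side effects:

```python
import sqlite3
from contextlib import closing

# ":memory:" stands in for '../database/db.sqlite'
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    ]
print(tables)
```

Note that `with sqlite3.connect(...) as conn:` on its own only commits or rolls back a transaction; it does not close the connection, which is why `closing` is used here.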

#### Add Tables

In [None]:
# connect to db
conn = sqlite3.connect('../database/db.sqlite')

# read the csv data into a dataframe
df = pd.read_csv('../data/location_cleaned.csv')
df2 = pd.read_csv('../data/fastfood_cleaned.csv')
df3 = pd.read_csv('../data/summary_table.csv')

# Write each DataFrame to its own table, declaring 'id' as the INTEGER PRIMARY KEY
df.to_sql('location', conn, index=False, if_exists='replace', dtype={'id': 'INTEGER PRIMARY KEY'})
df2.to_sql('nutrition', conn, index=False, if_exists='replace', dtype={'id': 'INTEGER PRIMARY KEY'})
df3.to_sql('summary', conn, index=False, if_exists='replace', dtype={'id': 'INTEGER PRIMARY KEY'})

conn.close()
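A sketch of checking that `to_sql` applied the `dtype={'id': 'INTEGER PRIMARY KEY'}` declaration, by reading the schema back with SQLite's `PRAGMA table_info` (its sixth field, `pk`, is nonzero for primary-key columns) and then querying the table with `pd.read_sql_query`. It uses an in-memory database and a two-row stand-in for the nutrition table:

```python
import sqlite3
from contextlib import closing

import pandas as pd

nutrition = pd.DataFrame({
    "id": [1, 2],
    "Restaurant": ["Mcdonalds", "Mcdonalds"],
    "Calories": [380.0, 840.0],
})

with closing(sqlite3.connect(":memory:")) as conn:
    nutrition.to_sql("nutrition", conn, index=False, if_exists="replace",
                     dtype={"id": "INTEGER PRIMARY KEY"})
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    schema = conn.execute("PRAGMA table_info(nutrition)").fetchall()
    pk_columns = [row[1] for row in schema if row[5] > 0]
    # Parameterized query instead of formatting values into the SQL string
    high_cal = pd.read_sql_query(
        "SELECT Restaurant, Calories FROM nutrition WHERE Calories > ?", conn, params=(500,)
    )

print(pk_columns)
print(high_cal)
```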
