## **A comparison of nutrition information from two popular quick-service restaurants: Starbucks & McDonald's**


## Abstract

This notebook analyzes nutrition facts for two popular quick-service restaurants, **Starbucks (SBUX)** and **McDonald’s (MCD)**, to compare their *macronutrients (macros)*: fats, carbohydrates, and proteins.








## Problem Overview

**Problem statement:**

Ever see these logos while driving to work?
![Image](https://cdn.apartmenttherapy.info/image/upload/f_auto,q_auto:eco,c_fit,w_730,h_488/at%2Farchive%2F2a2e07e47ac11c41e3ef81fb9fb76a7ee9f1634b)

![Image](https://www.designyourway.net/blog/wp-content/uploads/2023/06/Featured-1-14.jpg)


Despite becoming increasingly health conscious, Americans continue to frequent quick-service restaurants on a regular basis. Nutrition facts are widely available, but customers may not review them often.

Restaurants are commonly categorized as fast food, fast casual, or coffee shops. This comparative analysis assesses how the nutritional makeup of menu items varies between two common quick-service restaurants. Is one more nutritious than the other?

**Objectives**: 

- To determine whether the macronutrient makeup (macros) of items differs between SBUX and MCD.
- To identify patterns in macros by (1) item type and (2) restaurant.

**Data sources**: 

Following an extensive topic review, two datasets were identified by searching Kaggle for "nutrition": [Starbucks Nutrition Facts](https://www.kaggle.com/datasets/starbucks/starbucks-menu) and [McDonald's Nutrition Facts](https://www.kaggle.com/datasets/mcdonalds/nutrition-facts).




## Advisory Statement

To help users follow along, additional markdown headers have been included for clarity. 

*Advanced users may wish to skip these additions and can proceed directly to review of code if preferred*.

## Data Libraries, Collection, and Loading 

### Data Libraries

**Installing libraries**: 

1. The following libraries were installed to the virtual environment using the *pip install* command.

2. The *pip list* command can be used to verify that these libraries were correctly installed in the virtual environment.

The above steps are important to ensure that these libraries will be available for the following lines of code.
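As an alternative to reading through the *pip list* output, the check can also be run from Python. The sketch below is a minimal example that uses the standard-library `importlib.metadata` module. The package list and the variable names `required` and `installed` are assumptions for illustration, based on the imports used in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages imported later in this notebook (assumed list)
required = ["pandas", "matplotlib", "numpy"]

installed = {}
for name in required:
    try:
        installed[name] = version(name)  # installed version string
    except PackageNotFoundError:
        installed[name] = None  # not installed in the active environment

for name, ver in installed.items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as not installed would need a *pip install* before the import cell below can run.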

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

### Data Collection & Loading

**Loading data**: 

Pandas was utilized to load the CSV data files.

In [51]:
# loading dataset files to an original dataframe (helpful in case backup needed)
sbux = pd.read_csv("SBUXmenu.csv", index_col = "Beverage")
mcd  = pd.read_csv("mcdmenu.csv", index_col = "Item")
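To illustrate what `index_col` does, the sketch below reads a two-row excerpt from an in-memory string instead of `SBUXmenu.csv`. The excerpt and the name `demo` are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for SBUXmenu.csv
csv_text = """Beverage,Beverage_category,Beverage_prep,Calories
Brewed Coffee,Coffee,Short,3
Brewed Coffee,Coffee,Tall,4
"""

demo = pd.read_csv(io.StringIO(csv_text), index_col="Beverage")

print(demo.index.name)       # the "Beverage" column became the row index
print(demo.shape)            # the index is not counted as a column
print(demo.index.is_unique)  # index labels are allowed to repeat
```

Two consequences matter later in the notebook. The column used as the index is excluded from the column count reported by `.shape`, and the same beverage name can label several rows.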

**Initial data check**: 

Display the first and last few rows along with basic information about each dataset, noting column names, data types, and missing values.

Display the number of rows and columns in both abbreviated (tuple) and sentence form.

In [52]:
# displaying the dimensions of the SBUX DataFrame (ie # of rows & # of columns in the dataset)
sbux.shape

(242, 17)

In [53]:
# converting SBUX dataframe dimension info into a sentence format
df_shape = sbux.shape
print(f"The SBUX DataFrame has {df_shape[0]} rows and {df_shape[1]} columns.")

The SBUX DataFrame has 242 rows and 17 columns.


In [54]:
# displaying the dimensions of the MCD DataFrame (ie # of rows & # of columns in the dataset)
mcd.shape

(260, 23)

In [55]:
# converting MCD dataframe dimension info into a sentence format
df_shape = mcd.shape
print(f"The MCD DataFrame has {df_shape[0]} rows and {df_shape[1]} columns!")

The MCD DataFrame has 260 rows and 23 columns!


Displaying heads (beginning) of dataframe.

The number passed in parentheses sets how many rows are displayed (the default is 5).

In [56]:
# displaying the first 15 rows of the dataframe
sbux.head(15)

Beverage,Beverage_category,Beverage_prep,Calories,Total Fat (g),Trans Fat (g),Saturated Fat (g),Sodium (mg),Total Carbohydrates (g),Cholesterol (mg),Dietary Fibre (g),Sugars (g),Protein (g),Vitamin A (% DV),Vitamin C (% DV),Calcium (% DV),Iron (% DV),Caffeine (mg)
Brewed Coffee,Coffee,Short,3,0.1,0.0,0.0,0,5,0,0,0,0.3,0%,0%,0%,0%,175
Brewed Coffee,Coffee,Tall,4,0.1,0.0,0.0,0,10,0,0,0,0.5,0%,0%,0%,0%,260
Brewed Coffee,Coffee,Grande,5,0.1,0.0,0.0,0,10,0,0,0,1.0,0%,0%,0%,0%,330
Brewed Coffee,Coffee,Venti,5,0.1,0.0,0.0,0,10,0,0,0,1.0,0%,0%,2%,0%,410
Caffè Latte,Classic Espresso Drinks,Short Nonfat Milk,70,0.1,0.1,0.0,5,75,10,0,9,6.0,10%,0%,20%,0%,75
Caffè Latte,Classic Espresso Drinks,2% Milk,100,3.5,2.0,0.1,15,85,10,0,9,6.0,10%,0%,20%,0%,75
Caffè Latte,Classic Espresso Drinks,Soymilk,70,2.5,0.4,0.0,0,65,6,1,4,5.0,6%,0%,20%,8%,75
Caffè Latte,Classic Espresso Drinks,Tall Nonfat Milk,100,0.2,0.2,0.0,5,120,15,0,14,10.0,15%,0%,30%,0%,75
Caffè Latte,Classic Espresso Drinks,2% Milk,150,6.0,3.0,0.2,25,135,15,0,14,10.0,15%,0%,30%,0%,75
Caffè Latte,Classic Espresso Drinks,Soymilk,110,4.5,0.5,0.0,0,105,10,1,6,8.0,10%,0%,30%,15%,75


In [57]:
mcd.head(15)

Item,Category,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,Cholesterol,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
Egg McMuffin,Breakfast,4.8 oz (136 g),300,120,13.0,20,5.0,25,0.0,260,...,31,10,4,17,3,17,10,0,25,15
Egg White Delight,Breakfast,4.8 oz (135 g),250,70,8.0,12,3.0,15,0.0,25,...,30,10,4,17,3,18,6,0,25,8
Sausage McMuffin,Breakfast,3.9 oz (111 g),370,200,23.0,35,8.0,42,0.0,45,...,29,10,4,17,2,14,8,0,25,10
Sausage McMuffin with Egg,Breakfast,5.7 oz (161 g),450,250,28.0,43,10.0,52,0.0,285,...,30,10,4,17,2,21,15,0,30,15
Sausage McMuffin with Egg Whites,Breakfast,5.7 oz (161 g),400,210,23.0,35,8.0,42,0.0,50,...,30,10,4,17,2,21,6,0,25,10
Steak & Egg McMuffin,Breakfast,6.5 oz (185 g),430,210,23.0,36,9.0,46,1.0,300,...,31,10,4,18,3,26,15,2,30,20
"Bacon, Egg & Cheese Biscuit (Regular Biscuit)",Breakfast,5.3 oz (150 g),460,230,26.0,40,13.0,65,0.0,250,...,38,13,2,7,3,19,10,8,15,15
"Bacon, Egg & Cheese Biscuit (Large Biscuit)",Breakfast,5.8 oz (164 g),520,270,30.0,47,14.0,68,0.0,250,...,43,14,3,12,4,19,15,8,20,20
"Bacon, Egg & Cheese Biscuit with Egg Whites (Regular Biscuit)",Breakfast,5.4 oz (153 g),410,180,20.0,32,11.0,56,0.0,35,...,36,12,2,7,3,20,2,8,15,10
"Bacon, Egg & Cheese Biscuit with Egg Whites (Large Biscuit)",Breakfast,5.9 oz (167 g),470,220,25.0,38,12.0,59,0.0,35,...,42,14,3,12,4,20,6,8,15,15


Displaying formatted quick overviews of the dataframes.

In [58]:
# displaying data in a formatted quick overview of the DataFrame structure
# a bare variable name only renders in a Jupyter notebook; otherwise use print(sbux)
sbux

Beverage,Beverage_category,Beverage_prep,Calories,Total Fat (g),Trans Fat (g),Saturated Fat (g),Sodium (mg),Total Carbohydrates (g),Cholesterol (mg),Dietary Fibre (g),Sugars (g),Protein (g),Vitamin A (% DV),Vitamin C (% DV),Calcium (% DV),Iron (% DV),Caffeine (mg)
Brewed Coffee,Coffee,Short,3,0.1,0.0,0.0,0,5,0,0,0,0.3,0%,0%,0%,0%,175
Brewed Coffee,Coffee,Tall,4,0.1,0.0,0.0,0,10,0,0,0,0.5,0%,0%,0%,0%,260
Brewed Coffee,Coffee,Grande,5,0.1,0.0,0.0,0,10,0,0,0,1.0,0%,0%,0%,0%,330
Brewed Coffee,Coffee,Venti,5,0.1,0.0,0.0,0,10,0,0,0,1.0,0%,0%,2%,0%,410
Caffè Latte,Classic Espresso Drinks,Short Nonfat Milk,70,0.1,0.1,0.0,5,75,10,0,9,6.0,10%,0%,20%,0%,75
Caffè Latte,Classic Espresso Drinks,2% Milk,100,3.5,2.0,0.1,15,85,10,0,9,6.0,10%,0%,20%,0%,75
Caffè Latte,Classic Espresso Drinks,Soymilk,70,2.5,0.4,0.0,0,65,6,1,4,5.0,6%,0%,20%,8%,75
Caffè Latte,Classic Espresso Drinks,Tall Nonfat Milk,100,0.2,0.2,0.0,5,120,15,0,14,10.0,15%,0%,30%,0%,75
Caffè Latte,Classic Espresso Drinks,2% Milk,150,6,3.0,0.2,25,135,15,0,14,10.0,15%,0%,30%,0%,75
Caffè Latte,Classic Espresso Drinks,Soymilk,110,4.5,0.5,0.0,0,105,10,1,6,8.0,10%,0%,30%,15%,75


In [59]:
# displaying data in a formatted quick overview of the DataFrame structure
# a bare variable name only renders in a Jupyter notebook; otherwise use print(mcd)
mcd

Item,Category,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,Cholesterol,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
Egg McMuffin,Breakfast,4.8 oz (136 g),300,120,13.0,20,5.0,25,0.0,260,...,31,10,4,17,3,17,10,0,25,15
Egg White Delight,Breakfast,4.8 oz (135 g),250,70,8.0,12,3.0,15,0.0,25,...,30,10,4,17,3,18,6,0,25,8
Sausage McMuffin,Breakfast,3.9 oz (111 g),370,200,23.0,35,8.0,42,0.0,45,...,29,10,4,17,2,14,8,0,25,10
Sausage McMuffin with Egg,Breakfast,5.7 oz (161 g),450,250,28.0,43,10.0,52,0.0,285,...,30,10,4,17,2,21,15,0,30,15
Sausage McMuffin with Egg Whites,Breakfast,5.7 oz (161 g),400,210,23.0,35,8.0,42,0.0,50,...,30,10,4,17,2,21,6,0,25,10
Steak & Egg McMuffin,Breakfast,6.5 oz (185 g),430,210,23.0,36,9.0,46,1.0,300,...,31,10,4,18,3,26,15,2,30,20
"Bacon, Egg & Cheese Biscuit (Regular Biscuit)",Breakfast,5.3 oz (150 g),460,230,26.0,40,13.0,65,0.0,250,...,38,13,2,7,3,19,10,8,15,15
"Bacon, Egg & Cheese Biscuit (Large Biscuit)",Breakfast,5.8 oz (164 g),520,270,30.0,47,14.0,68,0.0,250,...,43,14,3,12,4,19,15,8,20,20
"Bacon, Egg & Cheese Biscuit with Egg Whites (Regular Biscuit)",Breakfast,5.4 oz (153 g),410,180,20.0,32,11.0,56,0.0,35,...,36,12,2,7,3,20,2,8,15,10
"Bacon, Egg & Cheese Biscuit with Egg Whites (Large Biscuit)",Breakfast,5.9 oz (167 g),470,220,25.0,38,12.0,59,0.0,35,...,42,14,3,12,4,20,6,8,15,15


Displaying tails (end) of dataframe.

The number passed in parentheses sets how many rows are displayed (the default is 5).

In [60]:
# displaying the last 20 rows of the dataframe
sbux.tail(20)

Beverage,Beverage_category,Beverage_prep,Calories,Total Fat (g),Trans Fat (g),Saturated Fat (g),Sodium (mg),Total Carbohydrates (g),Cholesterol (mg),Dietary Fibre (g),Sugars (g),Protein (g),Vitamin A (% DV),Vitamin C (% DV),Calcium (% DV),Iron (% DV),Caffeine (mg)
Mocha,Frappuccino® Light Blended Coffee,Venti Nonfat Milk,210,1,0.5,0.0,5,280,46,1,42,6.0,8%,0%,15%,10%,130
Caramel,Frappuccino® Light Blended Coffee,Tall Nonfat Milk,100,0.1,0.0,0.0,0,140,23,0,23,3.0,4%,0%,8%,0%,65
Caramel,Frappuccino® Light Blended Coffee,Grande Nonfat Milk,150,0.1,0.1,0.0,0,200,33,0,32,3.0,6%,0%,10%,0%,90
Caramel,Frappuccino® Light Blended Coffee,Venti Nonfat Milk,200,0.1,0.1,0.0,5,270,44,0,43,5.0,8%,0%,15%,2%,120
Java Chip,Frappuccino® Light Blended Coffee,Tall Nonfat Milk,150,3,2.0,0.0,0,170,30,1,27,4.0,4%,0%,10%,20%,70
Java Chip,Frappuccino® Light Blended Coffee,Grande Nonfat Milk,220,4,3.0,0.0,0,240,43,2,39,5.0,6%,0%,10%,25%,105
Java Chip,Frappuccino® Light Blended Coffee,Venti Nonfat Milk,290,5,4.0,0.0,5,320,58,2,52,7.0,8%,0%,15%,35%,165
Strawberries & Crème (Without Whipped Cream),Frappuccino® Blended Crème,Tall Nonfat Milk,170,0.1,0.1,0.0,0,140,39,0,38,3.0,6%,6%,10%,2%,0
Strawberries & Crème (Without Whipped Cream),Frappuccino® Blended Crème,Whole Milk,190,3,1.5,0.1,10,140,38,0,37,3.0,4%,6%,10%,2%,0
Strawberries & Crème (Without Whipped Cream),Frappuccino® Blended Crème,Soymilk,170,1.5,0.2,0.0,0,135,37,1,35,3.0,4%,6%,10%,6%,0


In [61]:
# displaying the last 20 rows of the dataframe
mcd.tail(20)

Item,Category,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,Cholesterol,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
Mango Pineapple Smoothie (Large),Smoothies & Shakes,22 fl oz cup,340,10,1.0,2,0.5,3,0.0,5,...,78,26,2,6,72,4,50,30,10,2
Vanilla Shake (Small),Smoothies & Shakes,12 fl oz cup,530,140,15.0,24,10.0,49,1.0,60,...,86,29,0,0,63,11,20,0,40,0
Vanilla Shake (Medium),Smoothies & Shakes,16 fl oz cup,660,170,19.0,29,12.0,61,1.0,75,...,109,36,0,0,81,14,25,0,50,0
Vanilla Shake (Large),Smoothies & Shakes,22 fl oz cup,820,210,23.0,35,15.0,73,1.0,90,...,135,45,0,0,101,18,30,0,60,0
Strawberry Shake (Small),Smoothies & Shakes,12 fl oz cup,550,150,16.0,25,10.0,52,1.0,60,...,90,30,0,0,79,12,20,0,40,0
Strawberry Shake (Medium),Smoothies & Shakes,16 fl oz cup,690,180,20.0,30,13.0,63,1.0,75,...,114,38,0,0,100,15,25,0,50,0
Strawberry Shake (Large),Smoothies & Shakes,22 fl oz cup,850,210,24.0,36,15.0,75,1.0,90,...,140,47,0,0,123,18,30,0,70,0
Chocolate Shake (Small),Smoothies & Shakes,12 fl oz cup,560,150,16.0,25,10.0,51,1.0,60,...,91,30,1,5,77,12,20,0,40,8
Chocolate Shake (Medium),Smoothies & Shakes,16 fl oz cup,700,180,20.0,30,12.0,62,1.0,75,...,114,38,2,6,97,15,25,0,50,10
Chocolate Shake (Large),Smoothies & Shakes,22 fl oz cup,850,210,23.0,36,15.0,74,1.0,85,...,141,47,2,8,120,19,30,0,60,15


**Displaying column names in list form**:

In [62]:
# displaying column names
sbux.columns
# displaying column names in list form
list(sbux.columns)

['Beverage_category',
 'Beverage_prep',
 'Calories',
 ' Total Fat (g)',
 'Trans Fat (g) ',
 'Saturated Fat (g)',
 ' Sodium (mg)',
 ' Total Carbohydrates (g) ',
 'Cholesterol (mg)',
 ' Dietary Fibre (g)',
 ' Sugars (g)',
 ' Protein (g) ',
 'Vitamin A (% DV) ',
 'Vitamin C (% DV)',
 ' Calcium (% DV) ',
 'Iron (% DV) ',
 'Caffeine (mg)']
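Several SBUX labels in the list above carry leading or trailing spaces (for example `' Total Fat (g)'` and `'Trans Fat (g) '`), so selecting a column by its visually obvious name would fail. The sketch below shows one common way to tidy such labels on a small stand-in frame. The frame `demo` and its single row are assumptions for illustration only.

```python
import pandas as pd

# A few column labels copied from the SBUX list, padding included
raw_columns = ["Calories", " Total Fat (g)", "Trans Fat (g) ", " Protein (g) "]
demo = pd.DataFrame([[3, "0.1", 0.0, 0.3]], columns=raw_columns)

# The padded label is the real one, so the unpadded name is not found
print("Total Fat (g)" in demo.columns)

# Strip surrounding whitespace from every column label
demo.columns = demo.columns.str.strip()
print(list(demo.columns))
```

Applying the same `.str.strip()` call to `sbux.columns` would be one option before any column is referenced by name.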

In [63]:
# displaying column names
mcd.columns
# displaying column names in list form
list(mcd.columns)

['Category',
 'Serving Size',
 'Calories',
 'Calories from Fat',
 'Total Fat',
 'Total Fat (% Daily Value)',
 'Saturated Fat',
 'Saturated Fat (% Daily Value)',
 'Trans Fat',
 'Cholesterol',
 'Cholesterol (% Daily Value)',
 'Sodium',
 'Sodium (% Daily Value)',
 'Carbohydrates',
 'Carbohydrates (% Daily Value)',
 'Dietary Fiber',
 'Dietary Fiber (% Daily Value)',
 'Sugars',
 'Protein',
 'Vitamin A (% Daily Value)',
 'Vitamin C (% Daily Value)',
 'Calcium (% Daily Value)',
 'Iron (% Daily Value)']

Noting *similar* columns that will be interesting for comparison:
- Calories
- Fat: 1- Total Fat, 2- Trans Fat, 3- Saturated Fat 
- Carbs: 1- Total Carbohydrates / Carbohydrates, 2- Sugars, 3- Dietary Fiber
- Protein
- Sodium
- Cholesterol
- Vitamin A 
- Vitamin C
- Calcium
- Iron
- Beverage Category / Category 

Noting *distinct* columns that will not allow for comparison:
- Beverage prep (only on SBUX)
- Caffeine (only on SBUX)
- Serving size (only on MCD)
- Calories from Fat (only on MCD)
- % Daily Value of: Total Fat, Saturated Fat, Cholesterol, Sodium, Carbohydrates, Dietary Fiber (only on MCD)
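The two lists above can also be derived programmatically with set operations once the labels are brought to a common spelling. The sketch below is one possible approach. The `normalize` helper and its renaming rules (dropping `(g)`/`(mg)` suffixes, mapping `Fibre` to `Fiber`, and so on) are assumptions made for this illustration, not steps from the original notebook.

```python
import re

sbux_cols = [
    "Beverage_category", "Beverage_prep", "Calories", " Total Fat (g)",
    "Trans Fat (g) ", "Saturated Fat (g)", " Sodium (mg)",
    " Total Carbohydrates (g) ", "Cholesterol (mg)", " Dietary Fibre (g)",
    " Sugars (g)", " Protein (g) ", "Vitamin A (% DV) ", "Vitamin C (% DV)",
    " Calcium (% DV) ", "Iron (% DV) ", "Caffeine (mg)",
]
mcd_cols = [
    "Category", "Serving Size", "Calories", "Calories from Fat", "Total Fat",
    "Total Fat (% Daily Value)", "Saturated Fat",
    "Saturated Fat (% Daily Value)", "Trans Fat", "Cholesterol",
    "Cholesterol (% Daily Value)", "Sodium", "Sodium (% Daily Value)",
    "Carbohydrates", "Carbohydrates (% Daily Value)", "Dietary Fiber",
    "Dietary Fiber (% Daily Value)", "Sugars", "Protein",
    "Vitamin A (% Daily Value)", "Vitamin C (% Daily Value)",
    "Calcium (% Daily Value)", "Iron (% Daily Value)",
]


def normalize(name: str) -> str:
    """Map a raw label to a shared spelling (hypothetical rules)."""
    name = name.strip()
    name = re.sub(r"\s*\((?:g|mg)\)$", "", name)  # drop unit suffix
    name = name.replace("(% DV)", "(% Daily Value)")
    name = name.replace("Fibre", "Fiber")
    name = name.replace("Total Carbohydrates", "Carbohydrates")
    name = name.replace("Beverage_category", "Category")
    return name


sbux_norm = {normalize(c) for c in sbux_cols}
mcd_norm = {normalize(c) for c in mcd_cols}

shared = sbux_norm & mcd_norm     # comparable columns
only_sbux = sbux_norm - mcd_norm  # SBUX-only columns
only_mcd = mcd_norm - sbux_norm   # MCD-only columns

print(sorted(shared))
print(sorted(only_sbux))
print(sorted(only_mcd))
```

Under these rules the shared set contains the 15 comparable columns, while the SBUX-only and MCD-only sets match the distinct columns listed above. The sketch assumes that the unlabelled MCD quantities use the same units as their SBUX counterparts, which should be checked against the dataset documentation.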

Analysis of distinct columns

In [64]:
# analyzing value counts of specific columns 
sbux["Beverage_prep"].value_counts()

Beverage_prep
Soymilk               66
2% Milk               50
Grande Nonfat Milk    26
Tall Nonfat Milk      23
Venti Nonfat Milk     22
Whole Milk            16
Short Nonfat Milk     12
Tall                   7
Grande                 7
Venti                  7
Short                  4
Solo                   1
Doppio                 1
Name: count, dtype: int64

In [66]:
# analyzing value counts of specific columns 
sbux["Caffeine (mg)"].value_counts()

Caffeine (mg)
75        37
0         35
150       34
70        14
varies    12
95        11
Varies    10
110        9
130        7
25         6
120        6
90         4
175        4
20         3
125        3
10         3
145        3
50         3
100        3
140        3
55         3
80         3
180        3
85         3
30         3
15         3
170        3
165        2
410        1
235        1
330        1
225        1
260        1
300        1
65         1
105        1
Name: count, dtype: int64
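The counts above include the text entries `varies` and `Varies` alongside numbers, which suggests the column is stored as text rather than as a numeric type. If numeric caffeine values were needed, one option is `pd.to_numeric` with `errors="coerce"`. The sketch below uses a small hypothetical sample (`caffeine`) that mirrors the kinds of values shown, not the real column.

```python
import pandas as pd

# Hypothetical sample mirroring the kinds of values in the counts above
caffeine = pd.Series(["75", "0", "varies", "Varies", None, "410"],
                     name="Caffeine (mg)")

# Entries that cannot be parsed as numbers become NaN instead of raising
caffeine_num = pd.to_numeric(caffeine, errors="coerce")

print(caffeine_num)
print(caffeine_num.isna().sum())  # "varies", "Varies", and the missing entry
```

Coercion turns the text entries into missing values, so the null count after conversion is higher than the null count of the raw column.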

In [67]:
# analyzing value counts of specific columns 
mcd["Serving Size"].value_counts()

Serving Size
16 fl oz cup         45
12 fl oz cup         38
22 fl oz cup         20
20 fl oz cup         16
30 fl oz cup          7
21 fl oz cup          7
32 fl oz cup          5
5.7 oz (161 g)        5
3.9 oz (111 g)        3
7.1 oz (201 g)        3
7.1 oz (202 g)        2
9.6 oz (251 g)        2
9.5 oz (270 g)        2
6.7 oz (190 g)        2
10.9 oz (310 g)       2
4.3 oz (123 g)        2
1 cookie (33 g)       2
10 oz (283 g)         2
8.1 oz (230 g)        2
8.5 oz (241 g)        2
8.3 oz (235 g)        2
5.9 oz (167 g)        2
1 carton (236 ml)     2
6.3 oz (178 g)        2
10.3 oz (291 g)       1
4.6 oz (130 g)        1
11.1 oz (314 g)       1
11.8 oz (335 g)       1
11.2 oz (318 g)       1
10.7 oz (304 g)       1
12.3 oz (348 g)       1
5.7 oz (162 g)        1
9 oz (255 g)          1
7.9 oz (223 g)        1
5 oz (142 g)          1
22.8 oz (646 g)       1
2.3 oz (65 g)         1
11.4 oz (323 g)       1
3.4 oz (97 g)         1
4.8 oz (136 g)        1
3.1 oz (87 g)         1
4.1

In [68]:
# analyzing value counts of specific columns 
mcd["Calories from Fat"].value_counts()

Calories from Fat
0       54
80      16
150     11
120     10
200     10
210      9
180      8
280      8
100      7
40       7
170      7
70       7
30       6
60       6
240      6
90       6
140      6
250      5
10       5
5        5
190      5
35       4
45       4
220      4
290      4
230      4
50       4
110      4
130      3
160      3
270      2
20       2
330      2
300      2
1060     1
530      1
540      1
410      1
360      1
310      1
380      1
260      1
430      1
470      1
370      1
510      1
450      1
340      1
Name: count, dtype: int64

Upon initial analysis:

Consider *merging*: 
- "Beverage Prep (SBUX)" column with the "Beverage" column 
- "Serving Size (MCD)" column with the "Item" column 

Consider *deleting*:
- Caffeine (SBUX)
- Calories from Fat (MCD)
- % Daily Value of: Total Fat, Saturated Fat, Cholesterol, Sodium, Carbohydrates, Dietary Fiber (MCD)
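The merge and delete considerations above could be sketched as follows on a two-row stand-in for the SBUX data. The frame `demo`, the label format `"Beverage (prep)"`, and the name `cleaned` are assumptions for illustration, not the notebook's eventual approach.

```python
import pandas as pd

# Two-row stand-in for the SBUX dataframe (values from the head output)
demo = pd.DataFrame(
    {
        "Beverage_prep": ["Short", "Tall"],
        "Calories": [3, 4],
        "Caffeine (mg)": ["175", "260"],
    },
    index=pd.Index(["Brewed Coffee", "Brewed Coffee"], name="Beverage"),
)

# Merge: fold the preparation into the row label
labels = [f"{bev} ({prep})" for bev, prep in zip(demo.index, demo["Beverage_prep"])]

# Delete: drop the columns that are no longer needed
cleaned = demo.drop(columns=["Beverage_prep", "Caffeine (mg)"])
cleaned.index = pd.Index(labels, name="Beverage")

print(cleaned)
```

Note that in the head output some `Beverage_prep` values such as `2% Milk` omit the cup size, so merged labels built this way may still repeat across rows.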



### Data Cleaning and Operations

**Assessing for missing values**: 

It is important to identify if there are any null values in the dataframes.

In [28]:
# displaying a summary of the dataframe that includes: column names, data types, non-null value counts
# check for: # of non-null values per column, fit of assigned data types
sbux.info()

<class 'pandas.core.frame.DataFrame'>
Index: 242 entries, Brewed Coffee to Vanilla Bean (Without Whipped Cream)
Data columns (total 17 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Beverage_category          242 non-null    object 
 1   Beverage_prep              242 non-null    object 
 2   Calories                   242 non-null    int64  
 3    Total Fat (g)             242 non-null    object 
 4   Trans Fat (g)              242 non-null    float64
 5   Saturated Fat (g)          242 non-null    float64
 6    Sodium (mg)               242 non-null    int64  
 7    Total Carbohydrates (g)   242 non-null    int64  
 8   Cholesterol (mg)           242 non-null    int64  
 9    Dietary Fibre (g)         242 non-null    int64  
 10   Sugars (g)                242 non-null    int64  
 11   Protein (g)               242 non-null    float64
 12  Vitamin A (% DV)           242 non-null    object 
 13  Vitamin C 

In [29]:
# displaying a summary of the dataframe that includes: column names, data types, non-null value counts
# check for: # of non-null values per column, fit of assigned data types
mcd.info()

<class 'pandas.core.frame.DataFrame'>
Index: 260 entries, Egg McMuffin to McFlurry with Reese's Peanut Butter Cups (Snack)
Data columns (total 23 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Category                       260 non-null    object 
 1   Serving Size                   260 non-null    object 
 2   Calories                       260 non-null    int64  
 3   Calories from Fat              260 non-null    int64  
 4   Total Fat                      260 non-null    float64
 5   Total Fat (% Daily Value)      260 non-null    int64  
 6   Saturated Fat                  260 non-null    float64
 7   Saturated Fat (% Daily Value)  260 non-null    int64  
 8   Trans Fat                      260 non-null    float64
 9   Cholesterol                    260 non-null    int64  
 10  Cholesterol (% Daily Value)    260 non-null    int64  
 11  Sodium                         260 non-null    int64  
 12 

Based on the dataframe summaries above, there appear to be:
- one null value in the "Caffeine" column of the SBUX dataset.
- no null values in the MCD dataset 

The following code formally assesses this by checking for missing values.

In [30]:
# checking for missing values
# True = missing values, False = no missing values

print(sbux.isnull().any().any())

True


In [32]:
# checking for missing values
# True = missing values, False = no missing values

print(mcd.isnull().any().any())

False
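The chained `.any().any()` call used in the two checks above reduces the frame in two steps: first to one Boolean per column, then to a single Boolean for the whole frame. The sketch below shows each step on a two-row stand-in frame. The frame `demo` and the names `per_column` and `overall` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in frame with one missing value
demo = pd.DataFrame({"Calories": [90, 3], "Caffeine (mg)": [np.nan, "175"]})

mask = demo.isnull()             # True wherever a cell is missing
per_column = mask.any()          # one Boolean per column
overall = per_column.any()       # single Boolean for the whole frame
total_missing = mask.sum().sum() # total number of missing cells

print(per_column)
print(overall)
print(total_missing)
```

The final Boolean only says whether anything is missing; the total count or the per-column result is needed to see how much is missing and where.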


Checking for null values by column.

In [33]:
# counting the missing values per column
sbux.isnull().sum()

Beverage_category            0
Beverage_prep                0
Calories                     0
 Total Fat (g)               0
Trans Fat (g)                0
Saturated Fat (g)            0
 Sodium (mg)                 0
 Total Carbohydrates (g)     0
Cholesterol (mg)             0
 Dietary Fibre (g)           0
 Sugars (g)                  0
 Protein (g)                 0
Vitamin A (% DV)             0
Vitamin C (% DV)             0
 Calcium (% DV)              0
Iron (% DV)                  0
Caffeine (mg)                1
dtype: int64

In [34]:
# counting the missing values per column
mcd.isnull().sum()

Category                         0
Serving Size                     0
Calories                         0
Calories from Fat                0
Total Fat                        0
Total Fat (% Daily Value)        0
Saturated Fat                    0
Saturated Fat (% Daily Value)    0
Trans Fat                        0
Cholesterol                      0
Cholesterol (% Daily Value)      0
Sodium                           0
Sodium (% Daily Value)           0
Carbohydrates                    0
Carbohydrates (% Daily Value)    0
Dietary Fiber                    0
Dietary Fiber (% Daily Value)    0
Sugars                           0
Protein                          0
Vitamin A (% Daily Value)        0
Vitamin C (% Daily Value)        0
Calcium (% Daily Value)          0
Iron (% Daily Value)             0
dtype: int64

Checking for null values by row.

In [37]:
# checking for rows with missing values
sbux.isnull().any(axis=1)

Beverage
Brewed Coffee                                   False
Brewed Coffee                                   False
Brewed Coffee                                   False
Brewed Coffee                                   False
Caffè Latte                                     False
                                                ...  
Strawberries & Crème (Without Whipped Cream)    False
Vanilla Bean (Without Whipped Cream)            False
Vanilla Bean (Without Whipped Cream)            False
Vanilla Bean (Without Whipped Cream)            False
Vanilla Bean (Without Whipped Cream)            False
Length: 242, dtype: bool

In [38]:
# checking for rows with missing values
mcd.isnull().any(axis=1)

Item
Egg McMuffin                                         False
Egg White Delight                                    False
Sausage McMuffin                                     False
Sausage McMuffin with Egg                            False
Sausage McMuffin with Egg Whites                     False
                                                     ...  
McFlurry with Oreo Cookies (Small)                   False
McFlurry with Oreo Cookies (Medium)                  False
McFlurry with Oreo Cookies (Snack)                   False
McFlurry with Reese's Peanut Butter Cups (Medium)    False
McFlurry with Reese's Peanut Butter Cups (Snack)     False
Length: 260, dtype: bool

The per-column counts confirm the earlier observations from the dataframe summaries:
- one null value in the "Caffeine" column of the SBUX dataset.
- no null values in the MCD dataset 

The following code can be used to identify which row contains the null value in the "Caffeine" column.

In [None]:
# selecting rows that contain at least one null value, using a Boolean Series as a mask
null_caffeine = sbux[sbux.isnull().any(axis=1)]
print(null_caffeine)

                                                    Beverage_category  \
Beverage                                                                
Iced Brewed Coffee (With Milk & Classic Syrup)  Shaken Iced Beverages   

                                               Beverage_prep  Calories  \
Beverage                                                                 
Iced Brewed Coffee (With Milk & Classic Syrup)       2% Milk        90   

                                                Total Fat (g)  Trans Fat (g)   \
Beverage                                                                        
Iced Brewed Coffee (With Milk & Classic Syrup)              1             0.5   

                                                Saturated Fat (g)  \
Beverage                                                            
Iced Brewed Coffee (With Milk & Classic Syrup)                0.0   

                                                 Sodium (mg)  \
Beverage                                 

The code above shows that the row for the "Beverage" *Iced Brewed Coffee (With Milk & Classic Syrup)* contains a null value in the "Caffeine" column.

Alternatively, the code below returns the index label(s) of any row(s) containing a null value.

In [None]:
# listing the index labels of rows that contain at least one null value
null_index = sbux.index[sbux.isnull().any(axis=1)].tolist()
print(null_index)

['Iced Brewed Coffee (With Milk & Classic Syrup)']


This matches the above finding that the row for the "Beverage" *Iced Brewed Coffee (With Milk & Classic Syrup)* contains a null value in the "Caffeine" column.
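Because the null is already known to sit in the "Caffeine" column, the lookup can also be narrowed to that column, which avoids printing every column of the matching row. The sketch below demonstrates this on a two-row stand-in frame. The frame `demo` and the name `missing_caffeine` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Two-row stand-in for the SBUX dataframe
demo = pd.DataFrame(
    {"Calories": [5, 90], "Caffeine (mg)": ["330", np.nan]},
    index=pd.Index(
        ["Brewed Coffee", "Iced Brewed Coffee (With Milk & Classic Syrup)"],
        name="Beverage",
    ),
)

# Keep only rows where the caffeine value is missing, and only two columns
missing_caffeine = demo.loc[
    demo["Caffeine (mg)"].isnull(), ["Calories", "Caffeine (mg)"]
]

print(missing_caffeine)
```

Selecting a short column list keeps the printed result on a single block instead of wrapping across several.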

**Addressing missing values**: 

It is important to determine how to address any null values in the dataframes.