# Starbucks - Nutrition Facts and Analysis☕

# **Introduction**

## Starbucks 
Starbucks opened its first coffee shop in Seattle, WA, on March 30, 1971. Founded by three partners, it has since become the biggest coffee chain in the world, with over 31,000 stores, including Starbucks Reserve stores and pickup-only stores. It seems like there is a Starbucks on almost every corner in every town (especially in big cities). Starbucks has a varied drink menu, including signature drinks and seasonal drinks during the fall and winter seasons. Customers can also create customized drinks by adding extra espresso shots, syrups, milks, etc. Since I really like drinking coffee, I thought it would be interesting to explore and analyze the nutrition facts of each Starbucks drink in this dataset.


## Breakdown of this notebook:
1. **Importing Libraries**
2. **Loading the Dataset**
    * Load data into a Pandas DataFrame
    * Print the datatypes of the dataset
    * Remove the duplicates if any
    * Print column names
3. **Data Cleaning**
    * Check for the null values in each column 
4. **Questions and Data Visualizations:** Using plots to find relations between the features.
    * Plot Beverage_category 
    * WordCloud 
    * Q1. Which Starbucks drink has the highest calories from the dataset? 
    * Q2. Which drink has the highest calories from the Starbucks classic espresso drinks?
    * Q3. Highest Sugar Drink at Starbucks?
    * Q4. Highest Sugar Drink from Signature Espresso Drinks?
    * Q5. Which drink has the most calories among the Starbucks Tazo® Tea Drinks?
    * Q6. Signature Espresso Drinks vs Tazo® Tea Drinks Calories
    * Q7. Plot Histograms
    * Q8. Get Correlation between different variables
        * Calories vs Sugars
        * Calories vs Total Carbohydrates
        * Calories vs Total Fat
    * Q9. Map of Correlation between Different Variables
5. **Conclusion**

### Acknowledgements:

This public dataset is provided by Starbucks, and the original source can be found on this [website](https://www.kaggle.com/starbucks/starbucks-menu).


![](https://stories.starbucks.com/uploads/2019/01/1981-Pike-Place_Exterior_Photo-1-1440x700.jpg)
(📷Photo: [Starbucks](https://stories.starbucks.com/stories/2015/store-tour-inside-1912-pike-place-seattle-usa/))

# 1. Importing Libraries

In [None]:
%matplotlib inline
import numpy as np # linear algebra
import pandas as pd # data processing
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress deprecation and incorrect usage warnings 
import warnings
warnings.filterwarnings('ignore')

# 2. Loading the Dataset

### * Load data into a Pandas DataFrame

In [None]:
df = pd.read_csv('/kaggle/input/starbucks-menu/starbucks_drinkMenu_expanded.csv', encoding="ISO-8859-1", low_memory=False)
df
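
The file is read with `encoding="ISO-8859-1"`, which is likely why category names such as `TazoÂ® Tea Drinks` contain a stray `Â`. A minimal sketch, assuming the raw bytes are UTF-8: a UTF-8 `®` is the two bytes `0xC2 0xAE`, and decoding them as ISO-8859-1 yields the two characters `Â®`. The helper `fix_mojibake` is hypothetical and written only for this illustration.

```python
# A UTF-8 encoded "®" is two bytes: 0xC2 0xAE.
utf8_bytes = "Tazo® Tea Drinks".encode("utf-8")

# Decoding those bytes as ISO-8859-1 maps each byte to one character,
# so 0xC2 becomes "Â" and 0xAE becomes "®".
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # TazoÂ® Tea Drinks


def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo a UTF-8-read-as-Latin-1 mix-up.

    Re-encodes the text as ISO-8859-1 and decodes it as UTF-8; if either
    step fails, the text was not garbled this way and is returned unchanged.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_mojibake(garbled))  # Tazo® Tea Drinks
```

If applied to the data, e.g. `df['Beverage_category'].map(fix_mojibake)`, later filters would need to compare against `'Tazo® Tea Drinks'` instead of `'TazoÂ® Tea Drinks'`. Reading the file with `encoding="utf-8"` may avoid the issue altogether, provided the file is valid UTF-8.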

### * Print the Datatypes of the dataset

In [None]:
df.dtypes

In [None]:
df.info()

### * Removing the Duplicates if any

In [None]:
# print the count, since only the last expression of a cell is displayed
print("Duplicate rows:", df.duplicated().sum())
df.drop_duplicates(inplace=True)
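
`df.duplicated()` flags each row that exactly repeats an earlier row (the first occurrence stays unflagged), and `drop_duplicates()` removes the flagged rows. A tiny sketch on made-up rows, not taken from the dataset:

```python
import pandas as pd

# Toy rows (values made up); the second and third rows are identical.
toy = pd.DataFrame({
    "Beverage": ["Caffè Latte", "Caffè Mocha", "Caffè Mocha"],
    "Beverage_prep": ["Tall", "Grande", "Grande"],
    "Calories": [150, 290, 290],
})

n_dupes = toy.duplicated().sum()        # rows identical to an earlier row
deduped = toy.drop_duplicates()         # returns a new frame; toy is untouched
print(n_dupes, len(toy), len(deduped))  # 1 3 2
```

Passing `subset=['Beverage', 'Beverage_prep']` would restrict the comparison to those columns, treating two rows with the same drink and preparation as duplicates even if their nutrition values differed.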

In [None]:
df.head()

### * Print column names

In [None]:
df.columns

# 3. Data Cleaning

In [None]:
#check for the null values in each column
df.isnull().sum()

* Very clean data: there is only 1 null value, in Caffeine (mg).
* If rows with missing values needed to be removed, the same steps would apply to `df`: 
    * df.isnull().sum()
    * df.dropna(how='any', inplace=True)
    * df.info()
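
`isnull()` only counts real missing values (`NaN`/`None`). If a numeric column was read as text (worth checking in `df.dtypes`), a placeholder string is not counted as missing until the column is converted to numbers. A minimal sketch on made-up values, assuming a text placeholder such as `"varies"`, that also contrasts dropping incomplete rows with filling them:

```python
import pandas as pd

# Toy data (made up): numbers stored as text, a text placeholder, and a real gap.
toy = pd.DataFrame({
    "Beverage": ["Brewed Coffee", "Caffè Latte", "Iced Tea", "Smoothie"],
    "Caffeine (mg)": ["260", "75", "varies", None],
})

print(toy.isnull().sum())  # Caffeine (mg) shows 1: only the real gap counts

# Coerce to numbers: unparseable text also becomes NaN.
toy["Caffeine (mg)"] = pd.to_numeric(toy["Caffeine (mg)"], errors="coerce")
print(toy["Caffeine (mg)"].isnull().sum())  # 2

# Option 1: drop every row that has any missing value.
complete = toy.dropna(how="any")

# Option 2: keep every row and fill the gaps, here with the column median.
filled = toy.fillna({"Caffeine (mg)": toy["Caffeine (mg)"].median()})
print(len(complete), filled["Caffeine (mg)"].tolist())
```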

# 4. Questions and Data Visualizations


In [None]:
df['Beverage_category'].unique()

### * Plot Beverage_category

In [None]:
#Pull two columns "Beverage_category" and "Calories" from df
#Rename these two columns to "DrinkMenu" and "Calories"

df.calories = df[['Beverage_category','Calories']]
df.calories.columns = ['DrinkMenu', 'Calories']
df.calories

In [None]:
sns.countplot(y='DrinkMenu', data=df.calories)

### * See number of drinks from Beverage_category

In [None]:
#Create a DataFrame with the number of drinks in each Beverage_category
#("BeverageCategory" is a regular column; reset_index(name=...) works the same in pandas 1.x and 2.x)
df_beverage = (df['Beverage_category'].value_counts()
               .rename_axis('BeverageCategory')
               .reset_index(name='NumberofDrinks'))
df_beverage = df_beverage[['NumberofDrinks', 'BeverageCategory']]
df_beverage.head()

### * WordCloud 

In [None]:
from wordcloud import WordCloud

In [None]:
plt.subplots(figsize=(25,15))
# build a word cloud from all Beverage_category labels joined into one string
wordcloud = WordCloud(
    background_color='white',
    width=1920,
    height=1080
).generate(" ".join(df.Beverage_category))
plt.imshow(wordcloud)
plt.axis('off')
plt.savefig('beverage_category_wordcloud.png')
plt.show()
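
A word cloud scales each word by how often it appears in the text passed to `generate()`. A rough sketch of that counting step with `collections.Counter`, on made-up category labels; WordCloud's own tokenizer also removes stopwords and, with its default `collocations=True`, can count two-word phrases, so its numbers may differ:

```python
from collections import Counter

# Made-up labels standing in for df.Beverage_category (one entry per drink).
categories = [
    "Coffee",
    "Classic Espresso Drinks",
    "Classic Espresso Drinks",
    "Signature Espresso Drinks",
    "Smoothies",
]

# Split the joined text on whitespace and count each word.
counts = Counter(" ".join(categories).split())
print(counts.most_common(3))
```

Because the category is repeated once per drink, words shared by several categories (such as "Drinks" or "Espresso") will tend to dominate the cloud.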

## Q1. Which Starbucks drink has the highest calories from the dataset?

In [None]:
#Sort all drinks by calories, keeping the drink name and preparation visible
df.sort_values("Calories", ascending=False)[['Beverage_category', 'Beverage', 'Beverage_prep', 'Calories']]

In [None]:
# data visualization to see which Starbucks drink has the highest calories
plt.figure(figsize=(15, 8))
sns.barplot(x="Beverage", y="Calories", data=df)
plt.xticks(rotation=40, ha='right')
plt.title("Starbucks Beverage Calories")
plt.show()

* Overall, White Chocolate Mocha (Without Whipped Cream) has the highest calories of all Starbucks drinks, followed by Java Chip (Without Whipped Cream).
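
`sns.barplot` draws the mean of each `Beverage` across all its sizes and milk options (with an error bar), so the tallest bar compares averages rather than individual menu items. A minimal sketch, on made-up rows using the dataset's column names, of reading the top items directly with `nlargest` and `idxmax`:

```python
import pandas as pd

# Toy menu rows (values made up).
menu = pd.DataFrame({
    "Beverage": ["Caffè Latte", "Caffè Mocha", "Java Chip", "Brewed Coffee"],
    "Beverage_prep": ["Venti Nonfat Milk", "Venti Soymilk", "Venti Whole Milk", "Venti"],
    "Calories": [170, 350, 460, 5],
})

# nlargest returns the n rows with the largest values, already sorted,
# and keeps every column so the drink name stays visible.
top2 = menu.nlargest(2, "Calories")
print(top2[["Beverage", "Beverage_prep", "Calories"]])

# idxmax gives the index label of the single highest row.
print(menu.loc[menu["Calories"].idxmax(), "Beverage"])  # Java Chip
```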

## Q2. Which drink has the highest calories from the Starbucks classic espresso drinks?

In [None]:
#extract classic espresso drinks from Beverage_category
classic = df.loc[(df['Beverage_category'] == 'Classic Espresso Drinks')]
classic.head()

In [None]:
# data visualization to see which classic espresso drink has the highest calories
sns.barplot(x="Beverage", y="Calories", data=classic)
plt.xticks(rotation=30, ha='right')
plt.title("Starbucks Classic Drinks Calories")
plt.show()

* Caffè Mocha (Without Whipped Cream) has the most calories among the Starbucks classic espresso drinks, followed by the Vanilla Latte (or another flavored latte).
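
To answer the same question for every category at once, instead of filtering one category at a time, one option is `groupby(...).idxmax()`. A sketch on made-up values:

```python
import pandas as pd

# Toy rows (calorie values made up).
menu = pd.DataFrame({
    "Beverage_category": ["Classic Espresso Drinks", "Classic Espresso Drinks",
                          "Signature Espresso Drinks", "Signature Espresso Drinks"],
    "Beverage": ["Caffè Latte", "Caffè Mocha (Without Whipped Cream)",
                 "Caramel Macchiato", "White Chocolate Mocha (Without Whipped Cream)"],
    "Calories": [190, 260, 240, 450],
})

# For each category, idxmax returns the row label of its highest-calorie drink;
# .loc then pulls those full rows back out.
best = menu.loc[menu.groupby("Beverage_category")["Calories"].idxmax()]
print(best[["Beverage_category", "Beverage", "Calories"]])
```

On the notebook's data the equivalent line would be `df.loc[df.groupby('Beverage_category')['Calories'].idxmax()]`. Since it picks single rows, each drink is represented by its highest-calorie size and milk option.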

## Q3. Highest Sugar Drink at Starbucks?

In [None]:
#To see which Starbucks drink has the highest sugars
df.sort_values(" Sugars (g)", ascending=False)

In [None]:
# data visualization to see which Starbucks drink has the highest sugars
plt.figure(figsize=(15, 8))
sns.barplot(x="Beverage", y=" Sugars (g)", data=df)
plt.xticks(rotation=40, ha='right')
plt.title("Starbucks Beverage Sugar(g)")
plt.show()

* Frappuccino® Blended Coffee drinks in venti size have the highest sugar content, higher than the Classic Espresso Drinks, Tazo® Tea Drinks, and Coffee categories.

## Q4. Highest Sugar Drink from Signature Espresso Drinks?

In [None]:
#extract signature espresso drinks from Beverage_category
signature = df.loc[(df['Beverage_category'] == 'Signature Espresso Drinks')]
print(len(signature), "signature espresso drinks")
signature.tail()

In [None]:
# data visualization to see which signature espresso drink has the highest sugars
sns.barplot(x='Beverage', y=' Sugars (g)', data=signature)
plt.xticks(rotation=30, ha='right')
plt.title("Starbucks Signature Espresso Drinks Sugars(g)")
plt.show()

* I pulled the data for the Signature Espresso Drinks category. As the plot shows, Caramel Apple Spice (Without Whipped Cream) has the most sugar, followed by White Chocolate Mocha (Without Whipped Cream).

## Q5. Which drink has the most calories among the Starbucks Tazo® Tea Drinks?


In [None]:
tea = df.loc[(df['Beverage_category'] == 'TazoÂ® Tea Drinks')]
tea.head()

In [None]:
# data visualization to see which Tazo® tea drink has the highest calories
sns.barplot(x='Beverage', y= 'Calories', data=tea)
plt.xticks(rotation=30, ha='right')
plt.title("Tazo® Tea Drinks Calories")
plt.show()

* Tazo® Green Tea Latte has the highest calories among the Tazo® Tea Drinks!

## Q6. Signature Espresso Drinks vs Tazo® Tea Drinks Calories

In [None]:
# Compare the calorie distributions of Signature Espresso Drinks and Tazo® Tea Drinks
# (sns.distplot is deprecated; histplot with kde=True and stat="density" draws a similar plot)

figure, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12,5))

sns.histplot(df['Calories'].loc[df['Beverage_category'] == 'Signature Espresso Drinks'].dropna(),
             kde=True, stat="density", color=sns.color_palette("Paired")[3], ax=ax1)
ax1.set_title('Signature Espresso Drinks')

sns.histplot(df['Calories'].loc[df['Beverage_category'] == 'TazoÂ® Tea Drinks'].dropna(),
             kde=True, stat="density", color=sns.color_palette("Paired")[2], ax=ax2)
ax2.set_title('Tazo® Tea Drinks')
plt.show()

In [None]:
df.groupby('Beverage_prep').count()
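
The two Q6 histograms sit on separate axes with independent scales (`plt.subplots` does not share axes by default), which can make a visual comparison misleading. A numeric summary per category is one complement; a sketch on made-up values:

```python
import pandas as pd

# Made-up calorie values for two categories.
toy = pd.DataFrame({
    "Beverage_category": ["Signature Espresso Drinks"] * 4 + ["Tazo® Tea Drinks"] * 4,
    "Calories": [200, 300, 400, 500, 0, 100, 150, 250],
})

# One row of summary statistics per category: count, mean, std, min, quartiles, max.
summary = toy.groupby("Beverage_category")["Calories"].describe()
print(summary[["count", "mean", "50%"]])
```

Passing `sharex=True` to `plt.subplots` would also put both histograms on the same calorie scale.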

## Q7. Plot Histograms 

In [None]:
#plot a histogram of calories data
calories = df["Calories"]
plt.hist(calories, bins=9, edgecolor = "black", color = "lightgreen")
plt.title("Calories in Starbucks Menu Items")
plt.xlabel("Calories in kcal")
plt.ylabel("Count")
plt.show()

In [None]:
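#plot a histogram of total fat (g) data
#coerce to numbers in case the column was read as text (see df.dtypes)
fat = pd.to_numeric(df[" Total Fat (g)"], errors="coerce")
plt.hist(fat.dropna(), bins=9, edgecolor = "black", color = "green")
plt.title("Fat in Starbucks Menu Items")
plt.xlabel("Total fat in g")
plt.ylabel("Count")
plt.show()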

In [None]:
#plot a histogram of sugar data
sugars = df[" Sugars (g)"]
plt.hist(sugars, bins=9, edgecolor = "black", color = "lightgreen")
plt.title("Sugars in Starbucks Menu Items")
plt.xlabel("Sugar in g")
plt.ylabel("Count")
plt.show()

In [None]:
# plot a histogram of cholesterol data
# (the previous version plotted the Total Fat column here by mistake)
cholesterol = df["Cholesterol (mg)"]
plt.hist(cholesterol, bins=9, edgecolor = "black", color = "green")
plt.title("Cholesterol in Starbucks Menu Items") # add a title
plt.xlabel("Cholesterol in mg")
plt.ylabel("Count")
plt.show()
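
With `bins=9`, `plt.hist` splits the range from the minimum to the maximum value into 9 equal-width bins (10 edges), using the same binning as `np.histogram`. A small sketch on made-up values shows the edges and the per-bin counts:

```python
import numpy as np

# Made-up calorie values.
calories = np.array([0, 5, 80, 130, 190, 240, 300, 360, 450, 510])

# 9 equal-width bins spanning min..max, so 10 edges.
counts, edges = np.histogram(calories, bins=9)
print(edges)   # bin boundaries from 0 to 510
print(counts)  # values per bin; the counts sum to len(calories)
```

Both functions also accept a rule such as `bins="auto"`, which may suit skewed data like calories better than a fixed bin count.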

## Q8. Correlation between Different Variables

### * Correlation between Calories vs Sugars (g)

In [None]:
# Calories vs Sugars (g)
sns.scatterplot(data=df, x="Calories", y=" Sugars (g)", hue="Beverage_category")
# "left" is not a valid legend location; anchor the legend outside the axes instead
plt.legend(loc="upper left", bbox_to_anchor=(1.02, 1), ncol=2)
plt.show()

* There is a clear positive correlation between calories and sugars. 

### * Correlation between Calories vs Total Carbohydrates (g)

In [None]:
# Calories vs Total Carbohydrates
sns.scatterplot(data=df, x="Calories", y=" Total Carbohydrates (g) ",hue="Beverage_category")
plt.legend(loc="upper left", bbox_to_anchor=(1.02, 1), ncol=2)
plt.show()

* There is also a positive correlation between calories and total carbohydrates.

### * Correlation between Calories vs Total Fat (g)

In [None]:
# Calories vs Total Fat (g)
plt.figure(figsize=(8,6))
sns.scatterplot(data=df, x="Calories", y=" Total Fat (g)", hue="Beverage_category")
plt.legend(loc="upper left", bbox_to_anchor=(1.02, 1), ncol=2)
plt.show()

* Interestingly, the relationship between calories and total fat (g) looks much weaker than those between calories and sugars (g) and between calories and cholesterol.
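
Judging correlation strength from a scatter plot is subjective; the coefficient can be computed directly. A sketch on made-up values comparing Pearson (linear) with Kendall's tau (rank-based, the method used for the heatmap in Q9), plus `scipy.stats.kendalltau` for a p-value (the notebook already imports `scipy.stats`):

```python
import pandas as pd
from scipy import stats

# Made-up values in which sugar rises steadily with calories.
toy = pd.DataFrame({
    "Calories": [5, 70, 130, 190, 250, 320, 400, 460],
    "Sugars (g)": [0, 9, 17, 26, 35, 44, 58, 70],
})

pearson = toy["Calories"].corr(toy["Sugars (g)"])                    # linear association
kendall = toy["Calories"].corr(toy["Sugars (g)"], method="kendall")  # rank agreement
tau, p_value = stats.kendalltau(toy["Calories"], toy["Sugars (g)"])  # same tau, with p-value

print(round(pearson, 3), kendall, round(tau, 3), p_value)
```

On the notebook's data the equivalent call would be `df['Calories'].corr(df[' Sugars (g)'], method='kendall')`; note the leading space in that column name.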

In [None]:
sns.barplot(data=df, x="Calories", y=" Total Fat (g)")

In [None]:
sns.countplot(y="Beverage_prep", data=df)

### * Drop Columns 

In [None]:
dfcopy = df.copy()
dfcopy.drop(['Beverage_prep','Trans Fat (g) ', 'Saturated Fat (g)', ' Sodium (mg)',' Total Carbohydrates (g) ', ' Dietary Fibre (g)', ' Protein (g) ', 'Vitamin A (% DV) ', 'Vitamin C (% DV)',
       ' Calcium (% DV) ', 'Caffeine (mg)'], axis=1, inplace=True)
# examine the changes
dfcopy.head()

### * Encode the input variables

In [None]:
#Encode the input variables
def Encode(dfcopy):
    for column in dfcopy.columns[dfcopy.columns.isin(['Beverage_category', 'Beverage'])]:
        dfcopy[column] = dfcopy[column].factorize()[0]
    return dfcopy

df_en = Encode(dfcopy.copy())

In [None]:
df_en.head(15)
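
`Encode` relies on `factorize`, which replaces each distinct label with an integer code in order of first appearance and returns the distinct labels so the codes can be mapped back. A small sketch on made-up labels:

```python
import pandas as pd

drinks = pd.Series(["Coffee", "Smoothies", "Coffee", "Classic Espresso Drinks"])

# codes: one integer per row; uniques: the distinct labels in first-seen order.
codes, uniques = pd.factorize(drinks)
print(codes)          # [0 1 0 2]
print(list(uniques))  # ['Coffee', 'Smoothies', 'Classic Espresso Drinks']

# Decoding: index the uniques with the codes to recover the labels.
print(list(uniques[codes]))
```

The codes are arbitrary labels rather than a ranking, so correlations computed on them are generally not meaningful.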

## Q9. Map of Correlation between Different Variables

In [None]:
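#Get correlation between different numeric variables
#(numeric_only=True skips text columns; pandas 2.x raises an error on them otherwise)
corr = df.corr(method='kendall', numeric_only=True)
plt.figure(figsize=(15,8))
sns.heatmap(corr, annot=True, cmap="Greens")
plt.show()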
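
The conclusion lists the strongest pairs as read off the heatmap. One way to rank every pair programmatically, sketched on made-up values: walk the upper triangle of the correlation matrix so each pair appears once, then sort.

```python
import pandas as pd

# Made-up nutrition values; only the ranking logic matters here.
toy = pd.DataFrame({
    "Calories": [5, 70, 130, 190, 250, 320],
    "Sugars (g)": [0, 9, 17, 26, 35, 44],
    "Saturated Fat (g)": [0.0, 1.5, 0.5, 3.0, 1.0, 2.0],
    "Beverage": list("abcdef"),  # non-numeric column, skipped by numeric_only
})

corr = toy.corr(method="kendall", numeric_only=True)

# Upper triangle only (j > i): each pair once, no self-correlations.
cols = corr.columns
pairs = pd.Series({
    (cols[i], cols[j]): corr.iloc[i, j]
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
}).sort_values(ascending=False)
print(pairs)
```

On the notebook's data the same dictionary comprehension could be applied to the `corr` matrix from the heatmap cell.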

# 5. Conclusion
* The heatmap analysis shows three **strong correlations:** 
1. cholesterol and sugars: 0.92
2. calories and cholesterol: 0.81
3. calories and sugars: 0.76
* My initial thought was that there could be a strong correlation between calories and fat (g), but it is much weaker than the three correlations listed above: **the correlation coefficient is 0.46 for calories vs trans fat (g) and 0.22 for calories vs saturated fat (g)** 
* The correlation between calories and cholesterol is likely due to the amount of milk in each drink and its cholesterol content.

![](https://negativespace.co/wp-content/uploads/2017/04/neagtive-space-i-love-coffee-message-Custom-1.jpg)
(📷Photo by Stokpic in [Food](https://negativespace.co/category/food/))