In [1]:
import pandas as pd
import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
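from plotly.subplots import make_subplots
from IPython.display import display  # display() is used below; Jupyter also provides it as a built-in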

# Reading in the Data

In [2]:
df = pd.read_csv('https://github.com/brookelafferty/bdi-475-final-project/raw/main/paleo.csv')

In [3]:
df.head()

Unnamed: 0,Diet_type,Recipe_name,Cuisine_type,Protein(g),Carbs(g),Fat(g),Extraction_day,Extraction_time
0,paleo,Bone Broth From 'Nom Nom Paleo',american,5.22,1.29,3.2,2022-10-16,17:20:09
1,paleo,"Paleo Effect Asian-Glazed Pork Sides, A Sweet ...",south east asian,181.55,28.62,146.14,2022-10-16,17:20:09
2,paleo,Paleo Pumpkin Pie,american,30.91,302.59,96.76,2022-10-16,17:20:09
3,paleo,Strawberry Guacamole recipes,mexican,9.62,75.78,59.89,2022-10-16,17:20:09
4,paleo,"Asian Cauliflower Fried ""Rice"" From 'Nom Nom P...",chinese,39.84,54.08,71.55,2022-10-16,17:20:09


# Background on This Dataset

**Why I Picked this Dataset**

I have a personal interest in nutrition, healthy eating, and cooking. I first heard about the paleo diet from my mom, who reads widely about eating whole foods and eating holistically for her own health. I wanted to explore the nutrition information for recipes categorized as paleo, both to analyze these recipes and to guide decisions in my own life.

**Questions to Explore**

1. How much protein do paleo recipes include?
2. How many carbs are in paleo recipes on average?
3. How much fat is in paleo recipes? 
4. How do the macronutrients compare in paleo recipes?
5. What types of cuisine can easily be made paleo?
6. What is the most common cuisine type in the dataset of paleo recipes?
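Questions 1 to 3 ask about amounts across all paleo recipes, while the analysis below averages by cuisine. A minimal sketch of an overall summary, assuming the column names from the CSV; the three-row `recipes` frame is made-up data standing in for `df`:

```python
import pandas as pd

# Toy stand-in for the paleo DataFrame (values are made up for illustration)
recipes = pd.DataFrame({
    "Protein(g)": [10.0, 20.0, 30.0],
    "Carbs(g)": [5.0, 15.0, 25.0],
    "Fat(g)": [8.0, 12.0, 16.0],
})

macro_cols = ["Protein(g)", "Carbs(g)", "Fat(g)"]

# Mean and median of each macronutrient across all recipes
summary = recipes[macro_cols].agg(["mean", "median"])
print(summary)
```

The median is included because the recipe values are spread widely (carbs range from 1.29 g to 302.59 g in just the first five rows), so a few very large recipes can pull the mean upward.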

# Dataset Info

In [4]:
df.shape

(1274, 8)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1274 entries, 0 to 1273
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Diet_type        1274 non-null   object 
 1   Recipe_name      1274 non-null   object 
 2   Cuisine_type     1274 non-null   object 
 3   Protein(g)       1274 non-null   float64
 4   Carbs(g)         1274 non-null   float64
 5   Fat(g)           1274 non-null   float64
 6   Extraction_day   1274 non-null   object 
 7   Extraction_time  1274 non-null   object 
dtypes: float64(3), object(5)
memory usage: 79.8+ KB


# Updating Datetime Information

In [6]:
df['Extraction_day']=pd.to_datetime(df['Extraction_day'])

In [7]:
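# Note: a bare time string such as '17:20:09' has no date, so pandas fills one in
# (the 2022-12-06 seen in later outputs) instead of using the Extraction_day value.
df['Extraction_time'] = pd.to_datetime(df['Extraction_time'])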

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1274 entries, 0 to 1273
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Diet_type        1274 non-null   object        
 1   Recipe_name      1274 non-null   object        
 2   Cuisine_type     1274 non-null   object        
 3   Protein(g)       1274 non-null   float64       
 4   Carbs(g)         1274 non-null   float64       
 5   Fat(g)           1274 non-null   float64       
 6   Extraction_day   1274 non-null   datetime64[ns]
 7   Extraction_time  1274 non-null   datetime64[ns]
dtypes: datetime64[ns](2), float64(3), object(3)
memory usage: 79.8+ KB
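Because `Extraction_time` is parsed without a date, the converted column carries a date (2022-12-06) that differs from `Extraction_day` (2022-10-16). A sketch of two alternatives, assuming the raw string columns shown in the first `df.head()`: join day and time into one timestamp, or keep only the time of day. The two-row `raw` frame and the new column names `Extraction_datetime` and `Extraction_clock` are hypothetical.

```python
import pandas as pd

# Two rows shaped like the raw CSV columns (before any conversion)
raw = pd.DataFrame({
    "Extraction_day": ["2022-10-16", "2022-10-16"],
    "Extraction_time": ["17:20:09", "17:31:24"],
})

# Join the date and time strings first, so the timestamp falls on the
# actual extraction day rather than on a date pandas supplies
raw["Extraction_datetime"] = pd.to_datetime(
    raw["Extraction_day"] + " " + raw["Extraction_time"],
    format="%Y-%m-%d %H:%M:%S",
)

# If only the time of day matters, keep it as datetime.time objects
raw["Extraction_clock"] = pd.to_datetime(
    raw["Extraction_time"], format="%H:%M:%S"
).dt.time

print(raw)
```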


# Browsing the Data


In [9]:
df.head()

Unnamed: 0,Diet_type,Recipe_name,Cuisine_type,Protein(g),Carbs(g),Fat(g),Extraction_day,Extraction_time
0,paleo,Bone Broth From 'Nom Nom Paleo',american,5.22,1.29,3.2,2022-10-16,2022-12-06 17:20:09
1,paleo,"Paleo Effect Asian-Glazed Pork Sides, A Sweet ...",south east asian,181.55,28.62,146.14,2022-10-16,2022-12-06 17:20:09
2,paleo,Paleo Pumpkin Pie,american,30.91,302.59,96.76,2022-10-16,2022-12-06 17:20:09
3,paleo,Strawberry Guacamole recipes,mexican,9.62,75.78,59.89,2022-10-16,2022-12-06 17:20:09
4,paleo,"Asian Cauliflower Fried ""Rice"" From 'Nom Nom P...",chinese,39.84,54.08,71.55,2022-10-16,2022-12-06 17:20:09


In [10]:
df.tail()

Unnamed: 0,Diet_type,Recipe_name,Cuisine_type,Protein(g),Carbs(g),Fat(g),Extraction_day,Extraction_time
1269,paleo,Sunday Slow Cooker: Thai Curry Ground Beef rec...,south east asian,104.82,34.94,32.89,2022-10-16,2022-12-06 17:31:24
1270,paleo,Paleo Chicken Nuggets recipes,american,211.64,47.69,100.54,2022-10-16,2022-12-06 17:31:24
1271,paleo,Paleo Bacon-Wrapped Cauliflower,american,70.23,224.84,424.27,2022-10-16,2022-12-06 17:31:24
1272,paleo,Green Eggs and Bacon,american,28.37,13.96,101.73,2022-10-16,2022-12-06 17:31:24
1273,paleo,Kitchen Sink Paleo Lunch,american,94.21,25.93,36.16,2022-10-16,2022-12-06 17:31:24


In [11]:
df.sample(5)

Unnamed: 0,Diet_type,Recipe_name,Cuisine_type,Protein(g),Carbs(g),Fat(g),Extraction_day,Extraction_time
952,paleo,Beef Curry (Paleo),south east asian,112.87,76.07,111.59,2022-10-16,2022-12-06 17:28:33
781,paleo,Paleo Pumpkin Pie,american,76.56,180.96,184.22,2022-10-16,2022-12-06 17:26:55
677,paleo,Paleo Pizza Crust,italian,34.22,27.58,82.33,2022-10-16,2022-12-06 17:25:47
948,paleo,Paleo Banana Loaf,british,65.05,169.97,121.5,2022-10-16,2022-12-06 17:28:33
332,paleo,Banana Cocoa Snack Cake (Paleo) recipes,american,68.94,227.74,168.98,2022-10-16,2022-12-06 17:22:58


# Filtering, Aggregating, and Sorting the Dataset

In [12]:
df_cuisine_macros = df.groupby('Cuisine_type', as_index = False).agg({
    'Protein(g)': 'mean',
    'Carbs(g)' : 'mean',
    'Fat(g)' : 'mean'

})

display(df_cuisine_macros)

Unnamed: 0,Cuisine_type,Protein(g),Carbs(g),Fat(g)
0,american,90.51615,147.465981,144.378486
1,asian,78.385,95.13,62.726667
2,british,66.732407,185.514259,138.021111
3,caribbean,82.741667,70.495,50.53
4,central europe,42.848889,169.602222,129.656667
5,chinese,107.998462,70.411154,81.695385
6,eastern europe,95.701111,106.002222,137.704074
7,french,74.361494,138.918571,143.159026
8,indian,155.762222,86.123333,161.175556
9,italian,94.097719,126.05117,138.518012


In [13]:
df_cuisine_type = df.groupby('Cuisine_type', as_index = False).agg({
    'Diet_type':'count'
})

df_cuisine_type.rename(columns={'Diet_type':'num_recipes'}, inplace=True)

display(df_cuisine_type)

Unnamed: 0,Cuisine_type,num_recipes
0,american,535
1,asian,12
2,british,54
3,caribbean,6
4,central europe,9
5,chinese,26
6,eastern europe,27
7,french,154
8,indian,9
9,italian,171


In [14]:
merged_df = pd.merge(
    left = df_cuisine_type,
    right = df_cuisine_macros,
    on = 'Cuisine_type',
    how = 'left'
)

display(merged_df)

Unnamed: 0,Cuisine_type,num_recipes,Protein(g),Carbs(g),Fat(g)
0,american,535,90.51615,147.465981,144.378486
1,asian,12,78.385,95.13,62.726667
2,british,54,66.732407,185.514259,138.021111
3,caribbean,6,82.741667,70.495,50.53
4,central europe,9,42.848889,169.602222,129.656667
5,chinese,26,107.998462,70.411154,81.695385
6,eastern europe,27,95.701111,106.002222,137.704074
7,french,154,74.361494,138.918571,143.159026
8,indian,9,155.762222,86.123333,161.175556
9,italian,171,94.097719,126.05117,138.518012


In [15]:
updated_merged_df = merged_df[merged_df['num_recipes']>=30]

In [16]:
display(updated_merged_df)

Unnamed: 0,Cuisine_type,num_recipes,Protein(g),Carbs(g),Fat(g)
0,american,535,90.51615,147.465981,144.378486
2,british,54,66.732407,185.514259,138.021111
7,french,154,74.361494,138.918571,143.159026
9,italian,171,94.097719,126.05117,138.518012
12,mediterranean,106,81.221226,84.421038,140.818208
13,mexican,48,93.372708,86.832708,110.452083
15,nordic,45,103.178444,51.230667,116.785556


# Data Visualizations

In [17]:
fig = px.bar(
    df_cuisine_type,
    title = 'Number of Paleo Recipes per Cuisine Type',
    x = 'num_recipes',
    y = 'Cuisine_type',
    color = 'num_recipes',
    color_continuous_scale= 'magenta',
    text = 'num_recipes',
    template = 'plotly_white',
    height = 600
)

# Show exact counts; the '.2s' format rounded labels to two significant figures (535 was shown as 540)
fig.update_traces(texttemplate='%{text}', textposition='outside')
fig.update_yaxes(categoryorder='total ascending')
fig.show()

**Explanation of Findings**

After assessing this dataset, American dishes make up by far the largest share of recipes: 535 of the 1,274 recipes. The next three cuisines are much smaller but closer to one another: Italian (171), French (154), and Mediterranean (106). Several cuisines have very few recipes, which matters for the analysis that follows, because an average computed from many recipes is more reliable than one computed from only a handful. With this in mind, most of the following visualizations include only cuisines with 30 or more recipes. I chose this cut-off because 30 is a common rule of thumb in statistics for the smallest sample on which an average can reasonably be trusted.
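The claim that averages from few recipes are less reliable can be quantified with the standard error of the mean (sample standard deviation divided by the square root of the number of recipes). A minimal sketch on made-up data, assuming the raw per-recipe `Protein(g)` column; pandas `GroupBy.sem` computes this per group:

```python
import pandas as pd

# Toy data: one cuisine with several recipes, one with only two (values are made up)
recipes = pd.DataFrame({
    "Cuisine_type": ["american"] * 6 + ["caribbean"] * 2,
    "Protein(g)": [80.0, 90.0, 100.0, 85.0, 95.0, 90.0, 60.0, 110.0],
})

# size = number of recipes, mean = average protein,
# sem = standard error of that average (shrinks as the recipe count grows)
stats = recipes.groupby("Cuisine_type")["Protein(g)"].agg(["size", "mean", "sem"])
print(stats)
```

Running the same aggregation on the real `df` would show which cuisine averages carry the most uncertainty, rather than relying on a fixed cut-off alone.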

In [18]:
fig = px.bar(
    updated_merged_df,
    title = 'Average amount of Fat in Paleo Recipes based on Cuisine Type',
    x = 'Cuisine_type',
    y = 'Fat(g)',
)

fig.show()

**Explanation of Visualization**

This bar chart shows the average fat content for each cuisine with 30 or more recipes in the dataset. Fat is a commonly misunderstood macronutrient: although some types of fat are healthier than others, fat is an important part of the diet. Interestingly, paleo recipes in most of these cuisines contain roughly the same amount of fat, about 138 to 144 g per recipe on average. Mexican (about 110 g) and Nordic (about 117 g) recipes are the exceptions, with noticeably less fat.
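To check which cuisines stand out, each cuisine's average can be compared with the mean of all the cuisine averages. A minimal sketch using the fat values from the filtered table above; note that this is an unweighted mean (each cuisine counts once, regardless of its recipe count):

```python
import pandas as pd

# Average fat per recipe (g), copied from the filtered table above
avg_fat = pd.Series({
    "american": 144.378486,
    "british": 138.021111,
    "french": 143.159026,
    "italian": 138.518012,
    "mediterranean": 140.818208,
    "mexican": 110.452083,
    "nordic": 116.785556,
})

# Unweighted mean of the cuisine averages
overall = avg_fat.mean()

# Distance of each cuisine from that mean, lowest first
diff = (avg_fat - overall).sort_values()
print(round(overall, 1))
print(diff.round(1))
```

Only Mexican and Nordic fall below the mean; the other five sit within a few grams of one another above it.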

In [19]:
fig = px.bar(
    updated_merged_df,
    x = 'Cuisine_type',
    y = ['Protein(g)', 'Fat(g)', 'Carbs(g)'],
    title = 'Macro Nutrient Breakdown Based on Cuisine Type'
)
fig.show()

**Explanation of Visualization**

This stacked bar chart displays the cuisine types with 30 or more recipes in the dataset. For each cuisine, it shows the average amount of protein, fat, and carbs per paleo recipe in that cuisine. This makes it easy to compare and contrast macronutrient values across cuisines when deciding what to cook, whether for pleasure or to reach certain nutrition goals.
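Because the stacked bars mix differences in recipe size with differences in macronutrient mix, it can help to convert each cuisine's averages to shares of its total grams. A minimal sketch using two rows from the filtered table above; the gram-share approach is an illustrative choice, not part of the original analysis:

```python
import pandas as pd

# Average grams per recipe for two cuisines, copied from the filtered table above
macros = pd.DataFrame(
    {"Protein(g)": [90.51615, 103.178444],
     "Carbs(g)": [147.465981, 51.230667],
     "Fat(g)": [144.378486, 116.785556]},
    index=["american", "nordic"],
)

# Divide each row by its own total so every cuisine sums to 1.0,
# which compares the macronutrient mix independently of recipe size
shares = macros.div(macros.sum(axis=1), axis=0)
print(shares.round(2))
```

To compare the absolute values side by side instead of stacked, `px.bar` also accepts `barmode='group'`.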

In [20]:
fig = px.treemap(updated_merged_df,
                 title = 'Carbohydrate Breakdown Based on Cuisine Type',
                 path = ['Cuisine_type'],
                 values = 'Carbs(g)',
                 height = 600)

fig.show()

**Explanation of Visualization**

This treemap compares the average amount of carbohydrates across the cuisines with 30 or more recipes in the dataset. People trying to improve their nutrition often treat carbohydrates as something to limit; however, carbohydrates are still a valuable macronutrient for brain function and performance. Interestingly, the lowest average carbohydrate amounts come from Nordic (about 51 g), Mediterranean (about 84 g), and Mexican (about 87 g) paleo recipes.
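The ranking read from the treemap can be confirmed directly. A minimal sketch using the carbohydrate averages from the filtered table above:

```python
import pandas as pd

# Average carbs per recipe (g), copied from the filtered table above
avg_carbs = pd.Series({
    "american": 147.465981, "british": 185.514259, "french": 138.918571,
    "italian": 126.05117, "mediterranean": 84.421038, "mexican": 86.832708,
    "nordic": 51.230667,
})

# Three cuisines with the lowest average carbohydrates, smallest first
lowest = avg_carbs.nsmallest(3)
print(lowest.round(1))
```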

In [21]:
fig = px.pie(df_cuisine_type, 
             values = 'num_recipes',
             names = 'Cuisine_type',
             title = 'Number of Recipes per Cuisine Type')
fig.show()

**Explanation of Visualization**

This pie chart shows the proportion of recipes for each type of cuisine, across all cuisines in the dataset. It is useful because it shows that the cuisines are not equally represented, which users should keep in mind when drawing conclusions. Additionally, users who want to explore the recipes can use this chart to find the cuisines with the most recipe ideas.
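The same proportions can be computed as numbers with `value_counts(normalize=True)`. A minimal sketch on a made-up column of labels; on the real data the input would be `df['Cuisine_type']`:

```python
import pandas as pd

# Toy column of cuisine labels (made up for illustration)
cuisines = pd.Series(["american", "american", "american", "italian", "french"])

# normalize=True returns each category's share of all rows instead of raw counts
shares = cuisines.value_counts(normalize=True)
print(shares)
```

On the full dataset, American recipes account for 535 of 1,274 rows, or about 42%.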

In [22]:
fig = px.sunburst(merged_df,
                  title = 'Protein Break Down and Recipe Entries by Cuisine Type',
                  path = ['Cuisine_type', 'num_recipes'],
                  values = 'Protein(g)',
                  width = 800,
                  height = 800)
fig.show()

**Explanation of Visualization**

This sunburst chart shows the average protein per paleo recipe for each cuisine, with each sector sized in proportion to that average. It uses the unfiltered data; to keep this transparent, the number of recipes behind each average is shown in the outer ring. For example, Indian recipes have the highest average protein of the cuisines shown in the tables above (about 156 g), but that average comes from only 9 recipes. I think this chart is useful when looking to this dataset for protein intake and new recipe ideas. Protein is a macronutrient many people want to increase, and this chart shows which cuisine types look promising for that goal.
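Since this chart uses unfiltered data, it helps to keep each average next to its recipe count and flag the thinly supported ones. A minimal sketch using four rows from the merged table above; the `few_recipes` column is a hypothetical addition that reuses the notebook's cut-off of 30:

```python
import pandas as pd

# Recipe counts and average protein (g), copied from the merged table above
protein = pd.DataFrame(
    {"num_recipes": [535, 26, 9, 171],
     "Protein(g)": [90.51615, 107.998462, 155.762222, 94.097719]},
    index=["american", "chinese", "indian", "italian"],
)

# Rank by average protein and flag averages based on fewer than 30 recipes
ranked = protein.sort_values("Protein(g)", ascending=False)
ranked["few_recipes"] = ranked["num_recipes"] < 30
print(ranked)
```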

In [23]:
fig = px.scatter(updated_merged_df,
                 title = 'Fat vs Protein Amount in Paleo Recipes',
                 x = 'Protein(g)',
                 y = 'Fat(g)',
                 text = 'Cuisine_type',
                 template = 'plotly_white',
                 width = 800,
                 height = 800)
fig.show()

**Explanation of Visualization**

This visualization compares average protein with average fat for cuisines with 30 or more recipes in the dataset. I found this comparison interesting because both macronutrients are important to one's diet, but some protein sources are fattier than others. I wanted to assess how these cuisines vary in the amounts of both nutrients. A natural next step would be to determine which types of fat are consumed most often in the higher-fat and lower-fat cuisines. I would also like to learn more about common protein sources in these regions to determine whether they contribute to the fat content.
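One way to summarize the scatter plot in a single number per cuisine is grams of fat per gram of protein. A minimal sketch using the averages from the filtered table above; note this is a ratio of averages, which can differ from the average of per-recipe ratios:

```python
import pandas as pd

# Average protein and fat per recipe (g), copied from the filtered table above
avg = pd.DataFrame(
    {"Protein(g)": [90.51615, 66.732407, 74.361494, 94.097719,
                    81.221226, 93.372708, 103.178444],
     "Fat(g)": [144.378486, 138.021111, 143.159026, 138.518012,
                140.818208, 110.452083, 116.785556]},
    index=["american", "british", "french", "italian",
           "mediterranean", "mexican", "nordic"],
)

# Grams of fat per gram of protein; a higher value hints at fattier
# protein sources or more added fat in that cuisine's recipes
avg["fat_per_protein"] = avg["Fat(g)"] / avg["Protein(g)"]
print(avg["fat_per_protein"].sort_values(ascending=False).round(2))
```

By this measure British recipes carry the most fat per gram of protein and Nordic recipes the least, which would be a useful starting point for the protein-source question above.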