# FOOD RECIPES DATA SCIENCE PROJECT
This project is focused on the [food_recipes](https://www.kaggle.com/datasets/sarthak71/food-recipes) dataset.
The main goal of this project is to find meaningful insights in the data and present them in a clear and understandable way.

### Before We Start
Importing pandas and loading the dataset.

In [200]:
! pip install pandas 

import pandas as pd







In [212]:
path = "food_recipes.csv"
df = pd.read_csv(path)
df.head()

Unnamed: 0,recipe_title,url,record_health,vote_count,rating,description,cuisine,course,diet,prep_time,cook_time,ingredients,instructions,author,tags,category
0,Roasted Peppers And Mushroom Tortilla Pizza Re...,https://www.archanaskitchen.com/roasted-pepper...,good,434,4.958525,is a quicker version pizza to satisfy your cr...,Mexican,Dinner,Vegetarian,15 M,15 M,Tortillas|Extra Virgin Olive Oil|Garlic|Mozzar...,To begin making the Roasted Peppers And Mushro...,Divya Shivaraman,Party Food Recipes|Tea Party Recipes|Mushroom ...,Pizza Recipes
1,Thakkali Gotsu Recipe | Thakkali Curry | Spicy...,https://www.archanaskitchen.com/tomato-gotsu-r...,good,3423,4.932223,also known as the is a quick and easy to ma...,South Indian Recipes,Lunch,Vegetarian,10 M,20 M,Sesame (Gingelly) Oil|Mustard seeds (Rai/ Kadu...,To begin making Tomato Gotsu Recipe/ Thakkali ...,Archana Doshi,Vegetarian Recipes|Tomato Recipes|South Indian...,Indian Curry Recipes
2,Spicy Grilled Pineapple Salsa Recipe,https://www.archanaskitchen.com/spicy-grilled-...,good,2091,4.945959,Spicy Grilled Pineapple Salsa is a simple reci...,Mexican,Side Dish,Vegetarian,10 M,0 M,Extra Virgin Olive Oil|Pineapple|White onion|R...,To begin making the Spicy Grilled Pineapple Sa...,Archana's Kitchen,Party Starter & Appetizer Recipes|Pineapple Re...,Mexican Recipes
3,Karwar Style Dali Thoy Recipe - Toor dal Curry,https://www.archanaskitchen.com/dali-thoy-reci...,good,990,4.888889,The is a quintessential of Konkani dish whic...,Coastal Karnataka,Side Dish,High Protein Vegetarian,5 M,20 M,Arhar dal (Split Toor Dal)|Turmeric powder (Ha...,To prepare Karwar Style Dali Thoy Recipe (Toor...,Jyothi Rajesh,Side Dish Recipes|South Indian Recipes|Indian ...,Indian Curry Recipes
4,Rajma Kofta In Milk And Poppy Seed Gravy Recipe,https://www.archanaskitchen.com/rajma-kofta-in...,good,345,4.828986,Koftas are traditional Indian recipes mostly w...,North Indian Recipes,Side Dish,High Protein Vegetarian,20 M,30 M,Rajma (Large Kidney Beans)|Cashew nuts|Sultana...,To begin making Rajma Kofta In Milk And Poppy ...,RUBY PATHAK,Side Dish Recipes|Indian Lunch Recipes|Office ...,Kofta Recipes


## Table of Contents
1. [Data Cleaning](#data_cleaning)
2. [Data Analysis and Visualization](#data_analysis_and_visualization)
3. [Conclusion](#conclusion)

### Data Cleaning <a name="data_cleaning"></a>
First, we need to clean the data. We will fill the missing time values with the mean of their columns, fix some of the data types, and remove columns that are not needed for the analysis.
We will also store the cleaned data in a new CSV file.
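As a minimal sketch of the parsing and mean-fill idea described above, the snippet below uses a small made-up DataFrame in the same `"<number> M"` format as `prep_time` and `cook_time`. The `toy` data and the `minutes` helper are assumptions for illustration. Note that this variation fills missing values directly with the column mean, whereas the notebook cell below first fills them with 0 and then replaces zeros.

```python
import pandas as pd

# Made-up rows in the same "<number> M" format as the time columns
toy = pd.DataFrame({
    "prep_time": ["15 M", "10 M", None, "20 M"],
    "cook_time": ["15 M", "20 M", "30 M", None],
})

def minutes(col: pd.Series) -> pd.Series:
    """Extract the number of minutes from each string; missing values stay NaN."""
    return col.str.extract(r"(\d+)", expand=False).astype(float)

# Parse both columns, then fill each column's gaps with that column's mean
parsed = toy.apply(minutes)
filled = parsed.fillna(parsed.mean())

# Total time is the sum of preparation and cooking time
filled["total_time"] = filled["prep_time"] + filled["cook_time"]
print(filled)
```

Keeping missing values as NaN until the fill step means they do not pull the column mean down the way placeholder zeros would.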

In [213]:
# Drop the columns that are not required
# Urls, authors, tags and categories are not required for the analysis

# Let's see the distinct values of the record_health column and how often each occurs
df['record_health'].value_counts()
# All of them are good, so we can drop this column

# Prep time and cook time are more useful combined into a single total time, so let's drop them here as well

new_df = df.drop(['url', 'record_health', 'author','tags','category','prep_time','cook_time'], axis='columns')
new_df.head()

Unnamed: 0,recipe_title,vote_count,rating,description,cuisine,course,diet,ingredients,instructions
0,Roasted Peppers And Mushroom Tortilla Pizza Re...,434,4.958525,is a quicker version pizza to satisfy your cr...,Mexican,Dinner,Vegetarian,Tortillas|Extra Virgin Olive Oil|Garlic|Mozzar...,To begin making the Roasted Peppers And Mushro...
1,Thakkali Gotsu Recipe | Thakkali Curry | Spicy...,3423,4.932223,also known as the is a quick and easy to ma...,South Indian Recipes,Lunch,Vegetarian,Sesame (Gingelly) Oil|Mustard seeds (Rai/ Kadu...,To begin making Tomato Gotsu Recipe/ Thakkali ...
2,Spicy Grilled Pineapple Salsa Recipe,2091,4.945959,Spicy Grilled Pineapple Salsa is a simple reci...,Mexican,Side Dish,Vegetarian,Extra Virgin Olive Oil|Pineapple|White onion|R...,To begin making the Spicy Grilled Pineapple Sa...
3,Karwar Style Dali Thoy Recipe - Toor dal Curry,990,4.888889,The is a quintessential of Konkani dish whic...,Coastal Karnataka,Side Dish,High Protein Vegetarian,Arhar dal (Split Toor Dal)|Turmeric powder (Ha...,To prepare Karwar Style Dali Thoy Recipe (Toor...
4,Rajma Kofta In Milk And Poppy Seed Gravy Recipe,345,4.828986,Koftas are traditional Indian recipes mostly w...,North Indian Recipes,Side Dish,High Protein Vegetarian,Rajma (Large Kidney Beans)|Cashew nuts|Sultana...,To begin making Rajma Kofta In Milk And Poppy ...


In [214]:
# 1. count the number of NaN values in time columns
num_nan = df['prep_time'].isna().sum()
print(num_nan)

# 2. for now replace the NaN values in time columns with 0
df['prep_time'] = df['prep_time'].fillna(0)
df['cook_time'] = df['cook_time'].fillna(0)

print(df['prep_time'].dtypes)
print(df['cook_time'].dtypes)

# 3. convert the datatype of prep_time and cook_time from object to str
df['prep_time'] = df['prep_time'].astype(str)
df['cook_time'] = df['cook_time'].astype(str)

# 4. extract the numbers from the strings and convert them to int
#    (raw string avoids an invalid-escape warning; expand=False returns a Series)
new_df['prep_time'] = df['prep_time'].str.extract(r'(\d+)', expand=False).astype(int)
new_df['cook_time'] = df['cook_time'].str.extract(r'(\d+)', expand=False).astype(int)

print(new_df['prep_time'].mean())

# 5. change the 0 values in prep_time and cook_time to mean of the column
new_df['prep_time'] = new_df['prep_time'].replace(0, new_df['prep_time'].mean())
new_df['cook_time'] = new_df['cook_time'].replace(0, new_df['cook_time'].mean())

# 6. add a new column total_time
new_df['total_time'] = (new_df['prep_time'] + new_df['cook_time']).astype(int)

# 7. drop the prep_time and cook_time columns
new_df = new_df.drop(['prep_time', 'cook_time'], axis='columns')

new_df.head()


30
object
object
28.82394805843426


Unnamed: 0,recipe_title,vote_count,rating,description,cuisine,course,diet,ingredients,instructions,total_time
0,Roasted Peppers And Mushroom Tortilla Pizza Re...,434,4.958525,is a quicker version pizza to satisfy your cr...,Mexican,Dinner,Vegetarian,Tortillas|Extra Virgin Olive Oil|Garlic|Mozzar...,To begin making the Roasted Peppers And Mushro...,30
1,Thakkali Gotsu Recipe | Thakkali Curry | Spicy...,3423,4.932223,also known as the is a quick and easy to ma...,South Indian Recipes,Lunch,Vegetarian,Sesame (Gingelly) Oil|Mustard seeds (Rai/ Kadu...,To begin making Tomato Gotsu Recipe/ Thakkali ...,30
2,Spicy Grilled Pineapple Salsa Recipe,2091,4.945959,Spicy Grilled Pineapple Salsa is a simple reci...,Mexican,Side Dish,Vegetarian,Extra Virgin Olive Oil|Pineapple|White onion|R...,To begin making the Spicy Grilled Pineapple Sa...,41
3,Karwar Style Dali Thoy Recipe - Toor dal Curry,990,4.888889,The is a quintessential of Konkani dish whic...,Coastal Karnataka,Side Dish,High Protein Vegetarian,Arhar dal (Split Toor Dal)|Turmeric powder (Ha...,To prepare Karwar Style Dali Thoy Recipe (Toor...,25
4,Rajma Kofta In Milk And Poppy Seed Gravy Recipe,345,4.828986,Koftas are traditional Indian recipes mostly w...,North Indian Recipes,Side Dish,High Protein Vegetarian,Rajma (Large Kidney Beans)|Cashew nuts|Sultana...,To begin making Rajma Kofta In Milk And Poppy ...,50


In [241]:
# Let's check the total_time column for any absurd values
new_df['total_time'].describe()

# Let's find the super high and low values
new_df.loc[new_df['total_time'] > 800]
new_df.loc[new_df['total_time'] < 5]

# Let's see some of the instructions for the super high and low values
#new_df.loc[6207, 'instructions']

# Row 6207 is the only Swedish dish and its total_time distorts the data, so let's drop it
new_df = new_df.drop([6207], axis='rows')

Most of the extreme time values are not errors: some foods simply take a long time to dry, soak, marinate, etc.
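One way to shortlist rows for this kind of manual check is an interquartile-range (IQR) rule. The sketch below is not what the notebook does (it uses fixed thresholds of 800 and 5 minutes), and the values in `times` are made up. Flagged rows are meant to be read, for example via their instructions, not dropped automatically.

```python
import pandas as pd

# Hypothetical total_time values in minutes, including a few extreme ones
times = pd.Series([30, 30, 41, 25, 50, 480, 2925, 2], name="total_time")

# Classic IQR rule: values above Q3 + 1.5 * IQR are candidates for review
q1, q3 = times.quantile([0.25, 0.75])
iqr = q3 - q1
upper = q3 + 1.5 * iqr

suspects = times[times > upper]
print(suspects)
```

A long soak or marinade would be flagged here too, which is why each flagged row still needs a look before deciding whether it is an error.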

In [242]:

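# The total_time of this Nepalese dish (row 4917) is an error, so replace it with the column mean
# (cast to int so the assignment matches the integer dtype of the column)
new_df.loc[4917, 'total_time'] = int(new_df['total_time'].mean())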
new_df.loc[new_df['cuisine'] == 'Nepalese']

# Convert the total_time column to int
new_df['total_time'] = new_df['total_time'].astype(int)


# Save the cleaned data to a new csv file
new_df.to_csv('cleaned_food_recipes.csv', index=False)

### Data Analysis and Visualization <a name="data_analysis_and_visualization"></a>
Now that we have cleaned the data, we can start analyzing it. We will start by looking at the total time to cook certain cuisines and diets and then we will look at the most popular cuisines and diets.

#### Before We Start
Importing bokeh and loading the cleaned dataset.

In [243]:
! pip install bokeh

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category20c
from bokeh.transform import factor_cmap
from bokeh.palettes import Turbo256

current_df = pd.read_csv('cleaned_food_recipes.csv')






Let's find out the average cooking time of every cuisine

In [244]:
# 1. group the dataframe by cuisine
grouped_df = current_df.groupby('cuisine')
grouped_df.head()

# 2. find the average of total_time for every cuisine
avg_time = grouped_df['total_time'].mean()
print(avg_time)

# 3. sort the values in descending order
avg_time = avg_time.sort_values(ascending=False)

output_notebook()

# Visualize the data using Bokeh
# create a ColumnDataSource object
source = ColumnDataSource(avg_time.to_frame())

# create a figure object
fig_cuisine_time = figure(x_range=avg_time.index.values, height=500, width=1500, title="Average time taken to cook every cuisine")

# add axis labels
fig_cuisine_time.xaxis.axis_label = "Cuisine"
fig_cuisine_time.yaxis.axis_label = "Average time taken to cook (in minutes)"

# create a color palette; with this many cuisines we take every third color of Turbo256
my_palette = Turbo256[::3]

# create a bar chart
fig_cuisine_time.vbar(x='cuisine', top='total_time', width=0.9, source=source,
         line_color='white', fill_color=factor_cmap('cuisine', palette=my_palette, factors=avg_time.index.values))

# set the y axis range
fig_cuisine_time.y_range.start = 0

# rotate the x axis labels by 1 radian (about 57 degrees)
fig_cuisine_time.xaxis.major_label_orientation = 1

# show the plot
show(fig_cuisine_time)

cuisine
Afghan                       45.250000
African                      52.380952
American                     77.500000
Andhra                       48.637037
Arab                         55.000000
                               ...    
Thai                         43.835616
Udupi                        47.000000
Uttar Pradesh                37.666667
Uttarakhand-North Kumaon     92.500000
Vietnamese                   77.076923
Name: total_time, Length: 76, dtype: float64
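Some cuisines may be represented by very few recipes, so a mean alone can be misleading. As a hedged sketch on made-up data, the snippet below reports the recipe count next to the mean; the `toy` rows and the threshold of 2 recipes are arbitrary assumptions for illustration.

```python
import pandas as pd

# Made-up rows; the single "Swedish" entry mimics a cuisine with one recipe
toy = pd.DataFrame({
    "cuisine": ["Mexican", "Mexican", "Thai", "Thai", "Thai", "Swedish"],
    "total_time": [30, 41, 40, 45, 50, 300],
})

# Named aggregation: mean time and number of recipes per cuisine
summary = (
    toy.groupby("cuisine")["total_time"]
       .agg(mean_time="mean", n_recipes="count")
       .sort_values("mean_time", ascending=False)
)
print(summary)

# Keep only cuisines backed by at least 2 recipes (arbitrary threshold)
reliable = summary[summary["n_recipes"] >= 2]
print(reliable)
```

The same `summary` frame could be passed to `ColumnDataSource` in place of `avg_time.to_frame()` if the counts should appear in the chart's data.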


Let's find out the average cooking time of every diet

In [256]:

# 1. group the dataframe by diet
grouped_df_diet = current_df.groupby('diet')
grouped_df_diet.head()

# 2. find the average of total_time for every diet
avg_time_diet = grouped_df_diet['total_time'].mean()
print(avg_time_diet)

# 3. sort the values in descending order
avg_time_diet = avg_time_diet.sort_values(ascending=False)

output_notebook()

# create a ColumnDataSource object
source = ColumnDataSource(avg_time_diet.to_frame())

# create a figure object
fig_diet_time = figure(x_range=avg_time_diet.index.values, height=500, width=750, title="Average time taken to cook every diet")

# add axis labels
fig_diet_time.xaxis.axis_label = "Diet"
fig_diet_time.yaxis.axis_label = "Average time taken to cook (in minutes)"

# create a color palette; with fewer than 20 diets, Category20c is enough
my_palette = Category20c[20]

# create a bar chart
fig_diet_time.vbar(x='diet', top='total_time', width=0.9, source=source,
            line_color='white', fill_color=factor_cmap('diet', palette=my_palette, factors=avg_time_diet.index.values))

# set the y axis range
fig_diet_time.y_range.start = 0

# rotate the x axis labels by 1 radian (about 57 degrees)
fig_diet_time.xaxis.major_label_orientation = 1

# show the plot
show(fig_diet_time)

diet
Diabetic Friendly               67.942568
Eggetarian                      70.074359
Gluten Free                     94.818182
High Protein Non Vegetarian     52.328889
High Protein Vegetarian         72.079545
No Onion No Garlic (Sattvic)    44.103896
Non Vegeterian                  70.791946
Sugar Free Diet                 47.733333
Vegan                           73.140845
Vegetarian                      58.755568
Name: total_time, dtype: float64


Let's find out the most liked cuisine

In [254]:

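# 0. multiply ratings by 20 to scale them from the 0-5 range to 0-100;
#    the guard keeps the scaling from being applied twice if the cell is re-run
if current_df['rating'].max() <= 5:
    current_df['rating'] = current_df['rating'] * 20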

# 1. group the dataframe by cuisine
grouped_df_likes = current_df.groupby('cuisine')
grouped_df_likes.head()

# 2. find the average rating for every cuisine
avg_likes = grouped_df_likes['rating'].mean()
print(avg_likes)

# 3. sort the values in descending order
avg_likes = avg_likes.sort_values(ascending=False)

output_notebook()

# create a ColumnDataSource object
source = ColumnDataSource(avg_likes.to_frame())

# create a figure object
fig_cuisine_rating = figure(x_range=avg_likes.index.values, height=500, width=1500, title="Average rating of every cuisine")

# add axis labels
fig_cuisine_rating.xaxis.axis_label = "Cuisine"
fig_cuisine_rating.yaxis.axis_label = "Average rating"

# create a color palette; with this many cuisines we take every third color of Turbo256
my_palette = Turbo256[::3]

# create a bar chart
fig_cuisine_rating.vbar(x='cuisine', top='rating', width=0.9, source=source, 
            line_color='white', fill_color=factor_cmap('cuisine', palette=my_palette, factors=avg_likes.index.values))

# set the y axis range
fig_cuisine_rating.y_range.start = 90

# rotate the x axis labels by 1 radian (about 57 degrees)
fig_cuisine_rating.xaxis.major_label_orientation = 1

# show the plot
show(fig_cuisine_rating)

cuisine
Afghan                       93.470089
African                      98.326757
American                     98.386360
Andhra                       97.940153
Arab                         97.644716
                               ...    
Thai                         98.213016
Udupi                        97.871597
Uttar Pradesh                97.612161
Uttarakhand-North Kumaon     98.906171
Vietnamese                   98.623637
Name: rating, Length: 76, dtype: float64
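The plain mean above treats a recipe with a handful of votes the same as one with thousands. As an alternative sketch (not the notebook's method), the snippet below weights each recipe's rating by its `vote_count`; the `toy` rows are made up for illustration.

```python
import pandas as pd

# Made-up recipes: one Mexican recipe has few votes but a perfect rating
toy = pd.DataFrame({
    "cuisine": ["Mexican", "Mexican", "Thai", "Thai"],
    "rating": [5.0, 4.0, 4.5, 4.5],
    "vote_count": [10, 90, 50, 50],
})

# Weighted mean per cuisine = sum(rating * votes) / sum(votes)
toy["weighted"] = toy["rating"] * toy["vote_count"]
sums = toy.groupby("cuisine")[["weighted", "vote_count"]].sum()
weighted_avg = sums["weighted"] / sums["vote_count"]

# Plain mean for comparison
plain_avg = toy.groupby("cuisine")["rating"].mean()

print(pd.DataFrame({"plain": plain_avg, "weighted": weighted_avg}))
```

In this toy data the two Mexican recipes average 4.5 unweighted, but the weighted figure drops toward the rating of the recipe that most people voted on.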


Let's find out the most liked diet

In [257]:
# 1. group the dataframe by diet
grouped_df_likes_diet = current_df.groupby('diet')
grouped_df_likes_diet.head()

# 2. find the average rating for every diet
avg_likes_diet = grouped_df_likes_diet['rating'].mean()
print(avg_likes_diet)

# 3. sort the values in descending order
avg_likes_diet = avg_likes_diet.sort_values(ascending=False)

output_notebook()

# create a ColumnDataSource object
source = ColumnDataSource(avg_likes_diet.to_frame())

# create a figure object   
fig_diet_rating = figure(x_range=avg_likes_diet.index.values, height=500, width=750, title="Average rating of every diet")

# add axis labels
fig_diet_rating.xaxis.axis_label = "Diet"
fig_diet_rating.yaxis.axis_label = "Average rating"

# create a color palette; with fewer than 20 diets, Category20c is enough
my_palette = Category20c[20]

# create a bar chart
fig_diet_rating.vbar(x='diet', top='rating', width=0.9, source=source,
            line_color='white', fill_color=factor_cmap('diet', palette=my_palette, factors=avg_likes_diet.index.values))

# set the y axis range
fig_diet_rating.y_range.start = 97

# rotate the x axis labels by 1 radian (about 57 degrees)
fig_diet_rating.xaxis.major_label_orientation = 1

# show the plot
show(fig_diet_rating)

diet
Diabetic Friendly               97.816502
Eggetarian                      97.725072
Gluten Free                     97.460345
High Protein Non Vegetarian     98.117434
High Protein Vegetarian         97.824175
No Onion No Garlic (Sattvic)    97.515257
Non Vegeterian                  98.022794
Sugar Free Diet                 97.722362
Vegan                           97.443330
Vegetarian                      97.750647
Name: rating, dtype: float64


Let's see all of the above in one graph

In [258]:
# Let's create a dashboard that combines the four plots using Bokeh layouts
# import the necessary libraries
from bokeh.io import show
from bokeh.layouts import row, column
from bokeh.io import curdoc

# create a dashboard layout
layout = row(column(fig_cuisine_time, fig_cuisine_rating, row(fig_diet_time, fig_diet_rating)))

# add the layout to curdoc
curdoc().add_root(layout)

# Show the dashboard
show(layout)

### Conclusion <a name="conclusion"></a>
Let's summarize what we have found out.

#### Why are these average preparation times important?

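The data on the average preparation times of different cuisines and diets is a valuable resource for anyone who is interested in exploring new flavors, managing their time effectively, and making healthy and delicious meals. For example, in this dataset Gluten Free recipes have the longest average total time (about 95 minutes), while No Onion No Garlic (Sattvic) recipes have the shortest (about 44 minutes). This data can help you plan your meals with ease, and ensure that you have enough time to prepare your food while adhering to your dietary needs.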

One of the most significant advantages of this data is that it gives a rough indication of the effort involved in preparing dishes from different cuisines and diets. Total time is only a proxy for difficulty, since long soaking or marinating inflates it without adding work, but it can still be helpful for those who are new to a particular cuisine or diet: they can start with dishes that are quicker to prepare and gradually work their way up to more involved recipes. Additionally, knowing the average preparation time of different cuisines and diets can help you identify recipes that can be prepared quickly, without compromising on flavor or nutrition.

Another important benefit of this data is that it can help you explore new cuisines and flavors with ease. By identifying recipes with shorter preparation times, you can start experimenting with new ingredients and techniques, without feeling overwhelmed or intimidated by the process. This can be particularly useful for those who are looking to expand their culinary horizons, but may not have a lot of time to spend in the kitchen.

In conclusion, the data on the average preparation times of different cuisines and diets is a valuable tool for anyone who is interested in making healthy and delicious meals while managing their time effectively. This data can help you plan your meals, explore new cuisines and flavors, and ensure that you are making the most of your time in the kitchen.

#### Why is this average rating important?

The average user rating of certain cuisines or diets is another important piece of data that can be useful for those interested in exploring new flavors and cuisines. By knowing the average rating of different cuisines or diets, you can get a sense of which dishes are particularly popular and highly regarded by others. Keep in mind that the averages in this dataset are very close to each other: every diet scores between about 97.4 and 98.1 out of 100, so the differences between groups are small. Within that limit, the ratings can help you identify recipes that are likely to be delicious, and that will appeal to your own personal tastes and preferences.

Additionally, the average user rating of certain cuisines or diets can help you discover new recipes and dishes that you may not have otherwise considered. By looking for highly rated recipes in a particular cuisine or diet, you can expand your culinary horizons and explore new flavors and ingredients that you may not have tried before.

Another important benefit of knowing the average user rating of certain cuisines or diets is that it can help you avoid recipes that are likely to be disappointing or not worth your time and effort. By looking for recipes with high ratings, you can be more confident that you are investing your time and resources into a dish that is likely to be delicious and well-received by your family and friends.

In conclusion, the average user rating of certain cuisines or diets is a valuable piece of data that can help you discover new recipes, explore new flavors, and avoid recipes that are likely to be disappointing. Whether you are a seasoned cook or a novice in the kitchen, this data can help you make the most of your culinary adventures and ensure that every meal you prepare is delicious and satisfying.