# Data Wrangling: AI-Powered Recipe Recommender

* **Group 1:** Aktham Almomani, Victor Hsu and Yunus Tezcan
* **Course:** Introduction to Artificial Intelligence (MS-AAI-501) / University Of San Diego
* **Semester:** Summer 2024

<center>
    <img src="https://github.com/akthammomani/AI_powered_heart_disease_risk_assessment_app/assets/67468718/2cab2215-ce7f-4951-a43a-02b88a5b9fa9" alt="wrangling">
</center>

## **Contents**<a id='Contents'></a>
* [Introduction](#Introduction)
* [Dataset](#Dataset)
* [Setup and preliminaries](#Setup_and_preliminaries)
  * [Import Libraries](#Import_libraries)
  * [Necessary Functions](#Necessary_Functions)
* [Importing dataset](#Importing_dataset)
* [Dataset Cleaning](#Dataset_Cleaning)
  * [Removing Columns with High Missing Data Percentage](#Removing_Columns_with_High_Missing_Data_Percentage)
  * [Eliminating Duplicate Recipe Names](#Eliminating_Duplicate_Recipe_Names)
  * [Evaluating and Cleaning Category Column](#Evaluating_and_Cleaning_Category_Column)
  * [Eliminating Missing Data in Cook, Prep, and Calories Columns](#Eliminating_Missing_Data_in_Cook_Prep_and_Calories_Columns)
  * [Handling Missing Values in Nutritional Columns by Replacing Nulls with Zero](#Handling_Missing_Values_in_Nutritional_Columns_by_Replacing_Nulls_with_Zero)
* [Feature Engineering](#Features_Engineering)
  * [Evaluating and Cleaning Prep, Cook and Total Columns](#Evaluating_and_Cleaning_Prep_Cook_and_Total_Columns)
  * [Cleaning and Parsing Ingredients for Standardized Analysis](#Cleaning_and_Parsing_Ingredients_for_Standardized_Analysis)
  * [Utilizing spaCy (NLP) for Ingredient Extraction and Cleaning](#Utilizing_spaCy_for_Ingredient_Extraction_and_Cleaning)
  * [Developing Diet Type Feature](#Developing_Diet_Type_Feature)
  * [Developing Recommended daily values Features](#Developing_Recommended_daily_values_Features)
  * [Developing Recipe Length Feature](#Developing_Recipe_Length_Feature)
* [Saving the cleaned dataframe](#Saving_the_cleaned_dataframe)

## **Introduction**<a id='Introduction'></a>
[Contents](#Contents)

This project involves developing a comprehensive data wrangling and pre-processing pipeline for a recipe dataset. The objective is to clean, parse, and enhance the dataset to facilitate further analysis and model building for an AI-powered recipe recommender system. Key steps include handling missing values, standardizing ingredient formats using NLP techniques, verifying cooking times, and creating new features such as diet type and recipe length.

* **Data Cleaning:**
  * Handled Missing Values: Filled missing nutritional values with zeros to maintain data consistency.
  * Removed High Missing Data Percentage Columns: Eliminated columns with a high percentage of missing values to ensure data quality.
  * Eliminated Duplicates: Removed duplicate entries in recipe names to prevent redundancy.
* **Feature Engineering:**
  * Parsed and Cleaned Ingredients: Utilized NLP with spaCy to parse and standardize the 'ingredients' column, ensuring uniform ingredient formats.
  * Verified Total Cooking Times: Ensured accuracy by summing preparation and cooking times and comparing them with the total time provided.
  * Created 'Diet Type' Column: Categorized recipes based on their nutritional content, such as low carb, low fat, high protein, low sodium, and low sugar.
  * Added 'Recipe Length' Feature: Calculated the number of words in the directions column to analyze recipe complexity and verbosity.
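
As a quick illustration of the 'Recipe Length' idea described above, here is a minimal sketch on made-up data. The column name `recipe_length` and the toy rows are assumptions for illustration only; the notebook's own implementation may differ.

```python
import pandas as pd

# Toy data (made up) standing in for the recipe dataset:
toy = pd.DataFrame({
    "name": ["Dessert Crepes", "Pork Steaks"],
    "directions": [
        "Whisk together eggs and milk. Cook over medium heat.",
        "Melt butter in a skillet.",
    ],
})

# Split on whitespace and count the tokens;
# 'recipe_length' is a hypothetical column name.
toy["recipe_length"] = toy["directions"].fillna("").str.split().str.len()
print(toy[["name", "recipe_length"]])
```

Counting whitespace-separated tokens is a simple proxy for verbosity; punctuation stays attached to words, which does not change the count.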

## **Dataset**<a id='Dataset'></a>
[Contents](#Contents)

**[All recipes website](https://www.allrecipes.com/)** is a popular online platform known for its extensive collection of user-generated recipes. It is a go-to resource for home cooks and culinary enthusiasts, offering a diverse range of recipes across various cuisines and dietary preferences. The website features detailed recipe information, including ingredients, instructions, user ratings, and reviews, making it a comprehensive resource for anyone looking to explore new dishes or improve their cooking skills.

For the AI-Powered Recipe Recommender project, we will be using a dataset scraped from the **[All recipes website](https://www.allrecipes.com/)**. This [dataset](https://github.com/shaansubbaiah/allrecipes-scraper/blob/main/export/scraped-07-05-21.csv) provides a wealth of information about a wide variety of recipes, which will be essential for building an effective recommendation system.

**Key Features of the Dataset:**
* Recipe Titles
* Ingredients
* Instructions
* Ratings
* Reviews
* Preparation and Cooking Times
* Nutritional information

The dataset from the **[All recipes website](https://www.allrecipes.com/)** is a rich resource for the AI-Powered Recipe Recommender project. It contains comprehensive details about each recipe, including titles, ingredients, instructions, ratings, reviews, and nutritional information. Leveraging this data, the project can deliver personalized, relevant, and appealing recipe recommendations to users, enhancing their cooking experience and meeting their dietary preferences.

## **Setup and preliminaries**<a id='Setup_and_preliminaries'></a>
[Contents](#Contents)

### **Import libraries**<a id='Import_libraries'></a>
[Contents](#Contents)

In [1]:
#Let's import the necessary packages:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
import scipy.stats as stats
from scipy.stats import gamma, linregress
from bs4 import BeautifulSoup
import re

# let's run below to customize notebook display:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 4000) # show up to 4000 characters per column

# format floating-point numbers to 2 decimal places (we'll adjust this as needed):
pd.set_option('float_format', '{:.2f}'.format)

### **Necessary functions**<a id='Necessary_Functions'></a>
[Contents](#Contents)

In [2]:
def summarize(df):
    """
    Generate a summary DataFrame for an input DataFrame.   
    Parameters:
    df (pd.DataFrame): The DataFrame to summarize.
    Returns:
    pd.DataFrame: A summary containing the following columns:
              - 'unique_count': No. unique values in each column.
              - 'data_types': Data types of each column.
              - 'missing_counts': No. of missing (NaN) values in each column.
              - 'missing_percentage': Percentage of missing values in each column.
    """
    # No. of unique values for each column:
    unique_counts = df.nunique()    
    # Data types of each column:
    data_types = df.dtypes    
    # No. of missing (NaN) values in each column:
    missing_counts = df.isnull().sum()    
    # Percentage of missing values in each column:
    missing_percentage = 100 * df.isnull().mean()    
    # Concatenate the above metrics:
    summary_df = pd.concat([unique_counts, data_types, missing_counts, missing_percentage], axis=1)    
    # Rename the columns for better readability
    summary_df.columns = ['unique_count', 'data_types', 'missing_counts', 'missing_percentage']   
    # Return summary df
    return summary_df
#-----------------------------------------------------------------------------------------------------------------#
# Function to clean and format the label
def clean_label(label):
    # Remove every character that is not a letter, a digit, or whitespace
    label = re.sub(r'[^a-zA-Z0-9\s]', '', label)
    # Replace spaces with underscores
    label = re.sub(r'\s+', '_', label)
    return label
#-----------------------------------------------------------------------------------------------------------------#

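# Function to impute missing values based on distribution
# Note: this helper expects a global `value_counts` Series of normalized
# frequencies (index = categories, values = probabilities summing to 1).
# The column 'Are_you_male_or_female_3' is not one of this dataset's columns.
def impute_missing(row):
    if pd.isna(row['Are_you_male_or_female_3']):
        return np.random.choice(value_counts.index, p=value_counts.values)
    else:
        return row['Are_you_male_or_female_3']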


#-----------------------------------------------------------------------------------------------------------------#
def value_counts_with_percentage(df, column_name):
    # Calculate value counts
    counts = df[column_name].value_counts(dropna=False)
    
    # Calculate percentages
    percentages = df[column_name].value_counts(dropna=False, normalize=True) * 100
    
    # Combine counts and percentages into a DataFrame
    result = pd.DataFrame({
        'Count': counts,
        'Percentage': percentages
    })
    
    return result

#-----------------------------------------------------------------------------------------------------------------#
def add_char_count_column(df, source_column, new_column_name):
    """
    Adds a new column to the DataFrame that contains the count of characters 
    in the specified source column and places it next to the source column.
    
    Parameters:
    df (pd.DataFrame): The input DataFrame.
    source_column (str): The column name for which the character count is calculated.
    new_column_name (str): The name of the new column to be added.
    
    Returns:
    pd.DataFrame: The DataFrame with the new character count column placed next to the source column.
    """
    # Check if the source column exists in the DataFrame
    if source_column not in df.columns:
        raise ValueError(f"Column '{source_column}' does not exist in the DataFrame.")
    
    # Calculate the character count for each row in the source column
    df[new_column_name] = df[source_column].astype(str).apply(len)
    
    # Get the position of the source column
    col_index = df.columns.get_loc(source_column) + 1
    
    # Move the new column to the position next to the source column
    cols = list(df.columns)
    cols.insert(col_index, cols.pop(cols.index(new_column_name)))
    df = df[cols]
    
    return df

## **Importing dataset**<a id='Importing_dataset'></a>
[Contents](#Contents)

In [3]:
#First, let's load the main dataset Allrecipes dataset 2021: (https://github.com/shaansubbaiah/allrecipes-scraper/blob/main/export/scraped-07-05-21.csv)
df = pd.read_csv('scraped-07-05-21.csv')

In [4]:
#now, let's look at the shape of df:
shape = df.shape
print("Number of rows:", shape[0], "\nNumber of columns:", shape[1])

Number of rows: 35516 
Number of columns: 47


In [5]:
# Now, let's look at the top 5 rows of the df:
df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,total,servings,yield,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,zinc_mg,phosphorus_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_b6_mg,vitamin_c_mg,folate_mcg,thiamin_mg,riboflavin_mg,vitamin_e_iu_IU,vitamin_k_mcg,biotin_mcg,vitamin_b12_mcg,mono_fat_g,poly_fat_g,trans_fatty_acid_g,omega_3_fatty_acid_g,omega_6_fatty_acid_g
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,30 mins,4,4 servings,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,,,1152.0,10.1,,0.3,165.6,0.7,,,,,,,,,,
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. Season with salt and pepper to taste.",20 mins,30 mins,50 mins,6,6 servings,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,,,520.3,7.5,,3.8,36.9,0.1,,,,,,,,,,
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,20 mins,8,8 crepes,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,,,347.8,2.3,,0.1,43.5,0.2,,,,,,,,,,
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,45 mins,6,6 servings,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,,,618.3,9.0,,7.4,25.8,0.7,,,,,,,,,,
4,Quick and Easy Pizza Crust,https://www.allrecipes.com/recipe/20171/quick-and-easy-pizza-crust/,bread,CHEF RIDER,"This is a great recipe when you don't want to wait for the dough to rise. You just mix it and allow it to rest for 5 minutes and then it's ready to go!! It yields a soft, chewy crust. For a real treat, I recommend you use bread flour and bake it on a pizza stone, but all-purpose flour works well too. Enjoy!",4.7,3741,2794,1 (.25 ounce) package active dry yeast ; 1 teaspoon white sugar ; 1 cup warm water (110 degrees F/45 degrees C) ; 2 ½ cups bread flour ; 2 tablespoons olive oil ; 1 teaspoon salt,"Preheat oven to 450 degrees F (230 degrees C). In a medium bowl, dissolve yeast and sugar in warm water. Let stand until creamy, about 10 minutes. Stir in flour, salt and oil. Beat until smooth. Let rest for 5 minutes. Turn dough out onto a lightly floured surface and pat or roll into a round. Transfer crust to a lightly greased pizza pan or baker's peel dusted with cornmeal. Spread with desired toppings and bake in preheated oven for 15 to 20 minutes, or until golden brown. Let baked pizza cool for 5 minutes before serving.",,,,8,1 12-inch pizza crust,169.8,28.1,0.6,4.0,0.6,,4.8,1.1,292.8,36.3,7.3,1.8,10.5,55.4,,,0.8,4.1,,,89.1,0.3,,,,,,,,,,


## **Dataset Cleaning**<a id='Dataset_Cleaning'></a>
[Contents](#Contents)

In [6]:
# First, let's replace any empty strings in the dataset with NaN so they are counted as missing:
df.replace("", np.nan, inplace=True)
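
The call above matches only the exact empty string. If cells containing nothing but spaces should also count as missing, a regex pattern is one option. Below is a minimal sketch on made-up data (the toy frame is an assumption, not part of the recipe dataset):

```python
import numpy as np
import pandas as pd

# Toy data (made up): one empty string and one whitespace-only string in 'prep'.
toy = pd.DataFrame({
    "prep": ["10 mins", "", "   "],
    "cook": ["20 mins", "5 mins", ""],
})

# Exact-match replacement: only the truly empty string becomes NaN.
exact = toy.replace("", np.nan)

# Regex replacement: empty and whitespace-only strings both become NaN.
blank = toy.replace(r"^\s*$", np.nan, regex=True)

print(exact.isna().sum().to_dict())
print(blank.isna().sum().to_dict())
```

Whether whitespace-only cells occur in the scraped file is not shown here, so treat the regex variant as a precaution rather than a required step.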

In [7]:
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,35502,object,0,0.0
url,35513,object,0,0.0
category,22,object,0,0.0
author,20780,object,41,0.12
summary,35488,object,0,0.0
rating,259,float64,0,0.0
rating_count,1638,int64,0,0.0
review_count,1400,int64,0,0.0
ingredients,35497,object,0,0.0
directions,35480,object,0,0.0


### **Removing Columns with High Missing Data Percentage**<a id='Removing_Columns_with_High_Missing_Data_Percentage'></a>
[Contents](#Contents)

In [8]:
# Filter columns where 'missing_percentage' is 99% or higher:
high_missing_columns = summary_df[summary_df['missing_percentage'] >= 99]
high_missing_column_names = high_missing_columns.index.tolist()
high_missing_column_names

['zinc_mg',
 'phosphorus_mg',
 'vitamin_b6_mg',
 'riboflavin_mg',
 'vitamin_e_iu_IU',
 'vitamin_k_mcg',
 'biotin_mcg',
 'vitamin_b12_mcg',
 'mono_fat_g',
 'poly_fat_g',
 'trans_fatty_acid_g',
 'omega_3_fatty_acid_g',
 'omega_6_fatty_acid_g']

**The columns listed above will be dropped from the dataset**

In [9]:
# Drop the columns where 'missing_percentage' is 99% or higher:
df = df.drop(columns=high_missing_column_names)
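
The same threshold-based selection can be sketched without the `summarize` helper, directly from the per-column share of missing values. The toy frame and the names `missing_pct`, `to_drop`, and `trimmed` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Toy data (made up): 'zinc_mg' is entirely missing, 'iron_mg' only partly.
toy = pd.DataFrame({
    "calories": [630.2, 430.6, 163.8, 353.1],
    "zinc_mg": [np.nan, np.nan, np.nan, np.nan],
    "iron_mg": [2.7, np.nan, 1.2, 2.5],
})

# Percentage of missing values per column:
missing_pct = toy.isna().mean() * 100

# Keep the same 99% cut-off used above:
threshold = 99
to_drop = missing_pct[missing_pct >= threshold].index.tolist()

trimmed = toy.drop(columns=to_drop)
print(to_drop)
print(trimmed.columns.tolist())
```

A partly missing column such as `iron_mg` (25% missing here) survives the 99% cut-off and is handled later by other means.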


In [10]:
#now, let's look at the shape of df:
shape = df.shape
print("Number of rows:", shape[0], "\nNumber of columns:", shape[1])

Number of rows: 35516 
Number of columns: 34


### **Eliminating Duplicate Recipe Names**<a id='Eliminating_Duplicate_Recipe_Names'></a>
[Contents](#Contents)

In [11]:
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,35502,object,0,0.0
url,35513,object,0,0.0
category,22,object,0,0.0
author,20780,object,41,0.12
summary,35488,object,0,0.0
rating,259,float64,0,0.0
rating_count,1638,int64,0,0.0
review_count,1400,int64,0,0.0
ingredients,35497,object,0,0.0
directions,35480,object,0,0.0


**Based on the above, it looks like we have duplicates based on recipe name and url**

In [12]:
# Drop duplicates based on recipe name:
df = df.drop_duplicates(subset=['name'])
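
Before dropping, it can help to look at the rows that share a name. Here is a minimal sketch on made-up data (the URLs and ratings are placeholders, not real records):

```python
import pandas as pd

# Toy data (made up): 'Pork Steaks' appears twice under different URLs.
toy = pd.DataFrame({
    "name": ["Pork Steaks", "Dessert Crepes", "Pork Steaks"],
    "url": ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    "rating": [4.57, 4.80, 4.10],
})

# keep=False flags every member of a duplicate group, so all copies can be inspected:
dupes = toy[toy.duplicated(subset=["name"], keep=False)]

# drop_duplicates keeps the first occurrence by default (keep="first"):
deduped = toy.drop_duplicates(subset=["name"], keep="first")

print(dupes)
print(deduped)
```

Because the first occurrence wins, the row order at the time of the call decides which copy of a duplicated recipe is kept.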

In [13]:
#now, let's look at the shape of df:
shape = df.shape
print("Number of rows:", shape[0], "\nNumber of columns:", shape[1])

Number of rows: 35502 
Number of columns: 34


In [14]:
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,35502,object,0,0.0
url,35502,object,0,0.0
category,22,object,0,0.0
author,20777,object,41,0.12
summary,35477,object,0,0.0
rating,259,float64,0,0.0
rating_count,1637,int64,0,0.0
review_count,1400,int64,0,0.0
ingredients,35486,object,0,0.0
directions,35469,object,0,0.0


### **Evaluating and Cleaning Category Column**<a id='Evaluating_and_Cleaning_Category_Column'></a>
[Contents](#Contents)

In [15]:
# Now, let's evaluate category column: 
value_counts_with_percentage(df, 'category')

category,Count,Percentage
appetizers-and-snacks,5717,16.1
desserts,3959,11.15
side-dish,3227,9.09
world-cuisine,3117,8.78
main-dish,2948,8.3
salad,2809,7.91
bread,2736,7.71
soups-stews-and-chili,2631,7.41
meat-and-poultry,1909,5.38
trusted-brands-recipes-and-tips,1619,4.56


**Based on above, let's drop: '515',  '251', 'ingredients' and 'uncategorized'**

In [16]:
# Filter out rows where the column contains the specific values
values_to_drop = ['515', '251', 'ingredients','uncategorized']
df = df[~df['category'].isin(values_to_drop)]

In [17]:
# Now, let's evaluate category column: 
value_counts_with_percentage(df, 'category')

category,Count,Percentage
appetizers-and-snacks,5717,16.45
desserts,3959,11.39
side-dish,3227,9.29
world-cuisine,3117,8.97
main-dish,2948,8.48
salad,2809,8.08
bread,2736,7.87
soups-stews-and-chili,2631,7.57
meat-and-poultry,1909,5.49
trusted-brands-recipes-and-tips,1619,4.66


### **Eliminating Missing Data in Cook, Prep, and Calories Columns**<a id='Eliminating_Missing_Data_in_Cook_Prep_and_Calories_Columns'></a>
[Contents](#Contents)

**Below, let's drop rows with nulls in the Prep and Cook columns**

In [18]:
# Next, let's drop rows where both 'prep' and 'cook' are null (how='all'):
df = df.dropna(subset=['prep', 'cook'], how='all')
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,32796,object,0,0.0
url,32796,object,0,0.0
category,18,object,0,0.0
author,19890,object,33,0.1
summary,32783,object,0,0.0
rating,255,float64,0,0.0
rating_count,1596,int64,0,0.0
review_count,1358,int64,0,0.0
ingredients,32787,object,0,0.0
directions,32766,object,0,0.0


In [19]:
df = df.dropna(subset=['cook'], how='all')
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,27296,object,0,0.0
url,27296,object,0,0.0
category,18,object,0,0.0
author,17090,object,26,0.1
summary,27286,object,0,0.0
rating,253,float64,0,0.0
rating_count,1548,int64,0,0.0
review_count,1322,int64,0,0.0
ingredients,27289,object,0,0.0
directions,27284,object,0,0.0


In [20]:
df = df.dropna(subset=['prep'], how='all')
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,27162,object,0,0.0
url,27162,object,0,0.0
category,18,object,0,0.0
author,17021,object,26,0.1
summary,27152,object,0,0.0
rating,253,float64,0,0.0
rating_count,1546,int64,0,0.0
review_count,1321,int64,0,0.0
ingredients,27155,object,0,0.0
directions,27150,object,0,0.0
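
The three steps above rely on the difference between `how='all'` and `how='any'` in `dropna`. A minimal sketch on made-up data shows the distinction, and that dropping nulls in `cook` and then in `prep` ends up equivalent to a single `how='any'` call:

```python
import numpy as np
import pandas as pd

# Toy data (made up): every combination of missing prep/cook values.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "prep": ["10 mins", np.nan, "5 mins", np.nan],
    "cook": ["20 mins", "30 mins", np.nan, np.nan],
})

# how="all": drop a row only when BOTH columns are null (removes 'd').
both_missing_removed = toy.dropna(subset=["prep", "cook"], how="all")

# how="any": drop a row when EITHER column is null (keeps only 'a').
either_missing_removed = toy.dropna(subset=["prep", "cook"], how="any")

# Dropping column by column gives the same result as how="any":
sequential = toy.dropna(subset=["cook"]).dropna(subset=["prep"])

print(both_missing_removed["name"].tolist())
print(either_missing_removed["name"].tolist())
```

Running the steps separately, as done above, has the advantage of showing how many rows each condition removes.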


**Below, let's drop rows with nulls in the calories column**

In [21]:
df = df.dropna(subset=['calories'], how='all')
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 

Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,27109,object,0,0.0
url,27109,object,0,0.0
category,18,object,0,0.0
author,17017,object,26,0.1
summary,27099,object,0,0.0
rating,253,float64,0,0.0
rating_count,1546,int64,0,0.0
review_count,1320,int64,0,0.0
ingredients,27102,object,0,0.0
directions,27097,object,0,0.0


In [22]:
print(df.columns)

Index(['name', 'url', 'category', 'author', 'summary', 'rating',
       'rating_count', 'review_count', 'ingredients', 'directions', 'prep',
       'cook', 'total', 'servings', 'yield', 'calories', 'carbohydrates_g',
       'sugars_g', 'fat_g', 'saturated_fat_g', 'cholesterol_mg', 'protein_g',
       'dietary_fiber_g', 'sodium_mg', 'calories_from_fat', 'calcium_mg',
       'iron_mg', 'magnesium_mg', 'potassium_mg', 'vitamin_a_iu_IU',
       'niacin_equivalents_mg', 'vitamin_c_mg', 'folate_mcg', 'thiamin_mg'],
      dtype='object')


### **Handling Missing Values in Nutritional Columns by Replacing Nulls with Zero**<a id='Handling_Missing_Values_in_Nutritional_Columns_by_Replacing_Nulls_with_Zero'></a>
[Contents](#Contents)

**To handle missing values (nulls) in the nutritional columns of our dataset, we'll replace nulls with zero.**

In [23]:
nutrition_columns = ['carbohydrates_g',
                     'sugars_g', 'fat_g', 'saturated_fat_g', 'cholesterol_mg', 'protein_g',
                     'dietary_fiber_g', 'sodium_mg', 'calories_from_fat', 'calcium_mg',
                     'iron_mg', 'magnesium_mg', 'potassium_mg', 'vitamin_a_iu_IU',
                     'niacin_equivalents_mg', 'vitamin_c_mg', 'folate_mcg', 'thiamin_mg']
df[nutrition_columns] = df[nutrition_columns].fillna(0)
# Then, let's review the contents of the dataset:
summary_df = summarize(df)
summary_df 


Unnamed: 0,unique_count,data_types,missing_counts,missing_percentage
name,27109,object,0,0.0
url,27109,object,0,0.0
category,18,object,0,0.0
author,17017,object,26,0.1
summary,27099,object,0,0.0
rating,253,float64,0,0.0
rating_count,1546,int64,0,0.0
review_count,1320,int64,0,0.0
ingredients,27102,object,0,0.0
directions,27097,object,0,0.0


## **Feature Engineering**<a id='Features_Engineering'></a>
[Contents](#Contents)

### **Evaluating and Cleaning Prep, Cook and Total Columns**<a id='Evaluating_and_Cleaning_Prep_Cook_and_Total_Columns'></a>
[Contents](#Contents)

**Clean up Prep and Cook Columns and verify the total column**

In [24]:
def parse_time(time_str):
    """
    Parse a time string like '1 hr 25 mins' into total minutes.
    """
    if pd.isnull(time_str):
        return 0
    
    # Normalize and clean the time string
    time_str = time_str.strip().lower()
    
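    # Strip every character except digits, spaces, and the letters used in 'hrs'/'mins'
    # (so the letters of units such as 'day' or 'week' are discarded, not converted)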
    time_str = re.sub(r'[^0-9hrsmins ]', '', time_str)
    
    # Replace common misformats
    time_str = time_str.replace('s ', 's').replace(' min', ' mins').replace(' hr', ' hrs')
    time_str = time_str.replace('minss', 'mins')  # Handle specific typo
    time_str = time_str.replace('hrss', 'hrs')    # Handle another specific typo

    hours = 0
    minutes = 0
    
    try:
        if 'hrs' in time_str:
            parts = time_str.split('hrs')
            hours_part = parts[0].strip()
            hours = int(hours_part) if hours_part else 0
            if len(parts) > 1 and 'mins' in parts[1]:
                minutes_part = parts[1].replace('mins', '').strip()
                minutes = int(minutes_part) if minutes_part else 0
        elif 'mins' in time_str:
            minutes = int(time_str.replace('mins', '').strip())
        return hours * 60 + minutes
    except ValueError as e:
        print(f"Error parsing time: {time_str} -> {e}")
        return 0
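
As a point of comparison, the same parsing can be sketched with a single regular expression that sums every "number + unit" token. This is a hypothetical alternative, not the function used in this notebook: the name `parse_time_regex`, the unit spellings, and the decision to convert days into minutes are all assumptions.

```python
import re
import pandas as pd

# Minutes per unit; the unit spellings are assumptions about the data.
_UNIT_MINUTES = {"day": 1440, "hr": 60, "min": 1}

# Matches tokens such as '1 day', '2 hrs', '25 mins' (optional plural 's').
_TIME_TOKEN = re.compile(r"(\d+)\s*(day|hr|min)s?")

def parse_time_regex(time_str):
    """Hypothetical parser: sum every '<number> <unit>' token, in minutes."""
    if pd.isnull(time_str):
        return 0
    total = 0
    for amount, unit in _TIME_TOKEN.findall(str(time_str).lower()):
        total += int(amount) * _UNIT_MINUTES[unit]
    return total

print(parse_time_regex("1 hr 25 mins"))
print(parse_time_regex("1 day 2 hrs"))
```

Unlike the character-stripping approach, this version counts a leading "1 day" as 1440 minutes instead of discarding it. Strings with units outside the assumed spellings contribute nothing to the total.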

In [25]:
def verify_total_times(df):
    """
    Verify if the total times in the 'total' column are correct based on 'prep' and 'cook' columns.
    """
    df['prep_mins'] = df['prep'].apply(parse_time)
    df['cook_mins'] = df['cook'].apply(parse_time)
    df['total_mins'] = df['total'].apply(parse_time)
    
    df['calculated_total_mins'] = df['prep_mins'] + df['cook_mins']
    df['is_correct'] = df['total_mins'] == df['calculated_total_mins']
    
    # Drop the intermediate columns used for calculation
    df.drop(columns=['prep_mins', 'cook_mins', 'total_mins', 'calculated_total_mins'], inplace=True)
    
    return df

In [26]:
# Verify the total times
df = verify_total_times(df)
# Now, let's evaluate the is_correct column:
value_counts_with_percentage(df, 'is_correct')

is_correct,Count,Percentage
True,18297,67.49
False,8812,32.51
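
To see how such a mismatch percentage comes about, here is a minimal sketch on made-up minute values. The numbers and the column name `diff_mins` are assumptions for illustration:

```python
import pandas as pd

# Toy data (made up): times already parsed into minutes.
toy = pd.DataFrame({
    "prep_mins": [10, 15, 20],
    "cook_mins": [20, 30, 30],
    "total_mins": [30, 45, 110],
})

# Gap between the listed total and prep + cook; 0 means the row is consistent.
toy["diff_mins"] = toy["total_mins"] - (toy["prep_mins"] + toy["cook_mins"])

# Share of rows whose listed total disagrees with prep + cook:
mismatch_share = (toy["diff_mins"] != 0).mean() * 100

print(toy)
print(round(mismatch_share, 2))
```

Keeping the signed difference, rather than only a True/False flag, makes it possible to check whether listed totals tend to be larger or smaller than prep + cook.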


**The "total" column disagrees with prep + cook in about 32.5% of the recipes, so let's drop this column and create a new one**

In [27]:
df = df.drop(columns=['total'])

In [28]:
def format_time(minutes):
    """
    Format minutes into a string like '1 hr 25 mins'.
    """
    if minutes < 60:
        return f"{minutes} mins"
    else:
        hrs = minutes // 60
        mins = minutes % 60
        if mins == 0:
            return f"{hrs} hr" if hrs == 1 else f"{hrs} hrs"
        else:
            return f"{hrs} hr {mins} mins" if hrs == 1 else f"{hrs} hrs {mins} mins"
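
The formatting rule can also be sketched with `divmod`, which returns hours and leftover minutes in one step. The function name `format_minutes` is hypothetical; it is meant to mirror the behaviour of `format_time` above for whole-number inputs:

```python
def format_minutes(minutes):
    """Hypothetical equivalent of format_time, written with divmod."""
    hrs, mins = divmod(int(minutes), 60)
    if hrs == 0:
        return f"{mins} mins"
    hr_label = "hr" if hrs == 1 else "hrs"
    if mins == 0:
        return f"{hrs} {hr_label}"
    return f"{hrs} {hr_label} {mins} mins"

for value in (45, 60, 85, 120, 150):
    print(value, "->", format_minutes(value))
```

For the sample inputs this prints `45 mins`, `1 hr`, `1 hr 25 mins`, `2 hrs`, and `2 hrs 30 mins`, matching the format of the original `prep` and `cook` strings.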

In [29]:
# Parse the 'prep' and 'cook' columns into minutes
df['prep_mins'] = df['prep'].apply(parse_time)
df['cook_mins'] = df['cook'].apply(parse_time)

# Sum the 'prep' and 'cook' times
df['total_mins'] = df['prep_mins'] + df['cook_mins']

# Format the total minutes back into the original format
df['total'] = df['total_mins'].apply(format_time)

# Drop the intermediate columns
df.drop(columns=['prep_mins', 'cook_mins', 'total_mins'], inplace=True)

# Reorder the columns to place 'total' after 'cook'
cols = list(df.columns)
total_index = cols.index('total')
cook_index = cols.index('cook')
cols.insert(cook_index + 1, cols.pop(total_index))
df = df[cols]

df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,total,servings,yield,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,is_correct
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,30 mins,4,4 servings,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,True
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. Season with salt and pepper to taste.",20 mins,30 mins,50 mins,6,6 servings,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,True
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,20 mins,8,8 crepes,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,True
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,45 mins,6,6 servings,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,True
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,Chef John,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,45 mins,4,4 servings,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,False


In [30]:
def parse_time(time_str):
    """
    Parse a time string like '1 hr 25 mins' into total minutes.
    """
    if pd.isnull(time_str):
        return 0
    
    # Normalize and clean the time string
    time_str = time_str.strip().lower()
    
    # Normalize unit spelling and spacing
    time_str = re.sub(r'(\d)\s+hr', r'\1 hr', time_str)  # Collapse repeated whitespace before 'hr'
    time_str = re.sub(r'(\d)\s+min', r'\1 min', time_str)  # Collapse repeated whitespace before 'min'
    time_str = time_str.replace('minss', 'mins')  # Handle specific typo
    time_str = time_str.replace('hrss', 'hrs')    # Handle another specific typo

    # Keep only digits, spaces, and the letters used in 'hrs'/'mins';
    # this discards other characters, e.g. those from 'day' or 'week'
    time_str = re.sub(r'[^0-9hrsmins ]', '', time_str)
    
    hours = 0
    minutes = 0
    
    try:
        if 'hrs' in time_str or 'hr' in time_str:
            parts = re.split(r'hrs|hr', time_str)
            hours_part = parts[0].strip()
            hours = int(hours_part) if hours_part else 0
            # Check for 'min' so that singular values such as '1 hr 1 min' are also parsed
            if len(parts) > 1 and 'min' in parts[1]:
                minutes_part = parts[1].replace('mins', '').replace('min', '').strip()
                minutes = int(minutes_part) if minutes_part else 0
        elif 'mins' in time_str or 'min' in time_str:
            minutes = int(time_str.replace('mins', '').replace('min', '').strip())
        return hours * 60 + minutes
    except ValueError as e:
        print(f"Error parsing time: {time_str} -> {e}")
        return 0

def format_time(minutes):
    """
    Format minutes into a string like '1 hr 25 mins'.
    """
    if minutes < 60:
        return f"{minutes} mins"
    else:
        hrs = minutes // 60
        mins = minutes % 60
        if mins == 0:
            return f"{hrs} hr" if hrs == 1 else f"{hrs} hrs"
        else:
            return f"{hrs} hr {mins} mins" if hrs == 1 else f"{hrs} hrs {mins} mins"

def fix_time_format(df, column):
    """
    Fix the time format for a given column in the DataFrame.
    """
    df[column] = df[column].apply(parse_time).apply(format_time)
    return df
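
As a quick sanity check of the parsing idea, the sketch below converts a few time strings into minutes with two regular expressions. It is a simplified alternative for illustration, not the `parse_time` function defined above: the helper name `to_minutes` and the sample strings are assumptions, and it ignores the typo handling that `parse_time` performs.

```python
import re

def to_minutes(text):
    """Hypothetical helper: convert strings like '1 hr 25 mins' to total minutes."""
    hours = re.search(r'(\d+)\s*hrs?\b', text)   # optional hours part
    mins = re.search(r'(\d+)\s*mins?\b', text)   # optional minutes part
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if mins:
        total += int(mins.group(1))
    return total

# Sample strings chosen for illustration only
samples = ['10 mins', '1 hr', '1 hr 25 mins', '2 hrs 5 mins']
parsed = {s: to_minutes(s) for s in samples}
print(parsed)
```

Searching for each unit independently avoids the branching on `'hr'` versus `'min'`, at the cost of silently returning 0 for strings that contain neither unit.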

In [31]:
# Fix the time format for 'prep' and 'cook' columns
df = fix_time_format(df, 'prep')
df = fix_time_format(df, 'cook')

df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,total,servings,yield,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,is_correct
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,30 mins,4,4 servings,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,True
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. Season with salt and pepper to taste.",20 mins,30 mins,50 mins,6,6 servings,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,True
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,20 mins,8,8 crepes,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,True
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,45 mins,6,6 servings,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,True
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,Chef John,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,45 mins,4,4 servings,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,False


In [32]:
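# Drop recipes whose cook time came out as '0 mins' (zero or unparsable);
# .copy() avoids a SettingWithCopyWarning when new columns are added later
df = df[df['cook'] != '0 mins'].copy()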

In [33]:
# Drop the original 'total' column; it is recomputed from 'prep' and 'cook'
df = df.drop(columns=['total'])

In [34]:
def add_total_time_column(df, prep_col='prep', cook_col='cook', total_col='total'):
    """
    Add a new column to the DataFrame that sums the prep and cook times.
    """
    # Parse the 'prep' and 'cook' columns into minutes
    df['prep_mins'] = df[prep_col].apply(parse_time)
    df['cook_mins'] = df[cook_col].apply(parse_time)

    # Sum the 'prep' and 'cook' times
    df[total_col] = df['prep_mins'] + df['cook_mins']

    # Format the total minutes back into the original format
    df[total_col] = df[total_col].apply(format_time)

    # Drop the intermediate columns
    df.drop(columns=['prep_mins', 'cook_mins'], inplace=True)

    return df

In [35]:
# Add the total time column
df = add_total_time_column(df)
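
The recomputed total is simply prep plus cook in minutes, formatted back into text. The sketch below shows that arithmetic on plain integers. The name `minutes_to_text` and the sample values are hypothetical; the formatter is written to follow the same 'hr'/'hrs' convention as `format_time`, but it is an illustration rather than the function used in this notebook.

```python
def minutes_to_text(minutes):
    """Hypothetical formatter: 75 -> '1 hr 15 mins', 120 -> '2 hrs', 45 -> '45 mins'."""
    hrs, mins = divmod(minutes, 60)
    parts = []
    if hrs:
        parts.append(f"{hrs} hr" if hrs == 1 else f"{hrs} hrs")
    if mins or not hrs:
        # Show minutes when non-zero, or when there are no hours at all
        parts.append(f"{mins} mins")
    return ' '.join(parts)

# Illustrative values: 25 minutes of prep plus 50 minutes of cooking
prep_minutes, cook_minutes = 25, 50
total_text = minutes_to_text(prep_minutes + cook_minutes)
print(total_text)
```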

In [36]:
df = df.drop(columns=['yield'])

In [37]:
df = df.drop(columns=['is_correct'])

### **Cleaning and Parsing Ingredients for Standardized Analysis**<a id='Cleaning_and_Parsing_Ingredients_for_Standardized_Analysis'></a>
[Contents](#Contents)

The **parse_ingredients** function below cleans and parses the ingredient strings by:

* Splitting: It divides the ingredient string into a list using semicolons (;) as delimiters.
* Trimming: It removes any leading or trailing whitespace from each ingredient in the list.

This function is applied to the 'ingredients' column, converting it into a list of clean ingredient strings for each recipe. This standardizes the data for easier analysis.

In [38]:
# Function to clean and parse ingredients
def parse_ingredients(ingredient_str):
    # Split the string by the delimiter ';'
    ingredients_list = ingredient_str.split(';')
    # Clean up any leading/trailing whitespace and return the list
    return [ingredient.strip() for ingredient in ingredients_list]

# Apply the function to the 'ingredients' column
df['parsed_ingredients'] = df['ingredients'].apply(parse_ingredients)
df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,servings,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,total,parsed_ingredients
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,4,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,30 mins,"[1 (8 ounce) box elbow macaroni, ¼ cup butter, ¼ cup all-purpose flour, ½ teaspoon salt, ground black pepper to taste, 2 cups milk, 2 cups shredded Cheddar cheese]"
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. 
Season with salt and pepper to taste.",20 mins,30 mins,6,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,50 mins,"[6 cups chicken broth, divided, 3 tablespoons olive oil, divided, 1 pound portobello mushrooms, thinly sliced, 1 pound white mushrooms, thinly sliced, 2 shallots, diced, 1 ½ cups Arborio rice, ½ cup dry white wine, sea salt to taste, freshly ground black pepper to taste, 3 tablespoons finely chopped chives, 4 tablespoons butter, ⅓ cup freshly grated Parmesan cheese]"
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,8,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,20 mins,"[4 eggs, lightly beaten, 1 ⅓ cups milk, 2 tablespoons butter, melted, 1 cup all-purpose flour, 2 tablespoons white sugar, ½ teaspoon salt]"
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,6,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,45 mins,"[¼ cup butter, ¼ cup soy sauce, 1 bunch green onions, 2 cloves garlic, minced, 6 pork butt steaks]"
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,Chef John,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,4,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,45 mins,"[4 skinless, boneless chicken breast halves, salt and freshly ground black pepper to taste, 2 eggs, 1 cup panko bread crumbs, or more as needed, ½ cup grated Parmesan cheese, 2 tablespoons all-purpose flour, or more if needed, 1 cup olive oil for frying, ½ cup prepared tomato sauce, ¼ cup fresh mozzarella, cut into small cubes, ¼ cup chopped fresh basil, ½ cup grated provolone cheese, ¼ cup grated Parmesan cheese, 1 tablespoon olive oil]"
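
The split-and-trim step can be checked on a single string without a DataFrame. The sketch below uses a hypothetical variant, `split_ingredients`, which additionally skips empty items (for example, after a trailing semicolon). That extra filter is an assumption added for illustration and is not part of `parse_ingredients` above.

```python
def split_ingredients(raw):
    """Hypothetical variant of the parsing step that also drops empty items."""
    return [item.strip() for item in raw.split(';') if item.strip()]

# Sample string with a trailing delimiter, for illustration only
sample = "¼ cup butter ; ¼ cup soy sauce ; 2 cloves garlic, minced ; "
items = split_ingredients(sample)
print(items)
```

Because only `;` is treated as a delimiter, commas inside an item (such as `2 cloves garlic, minced`) stay within that item.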


### **Utilizing spaCy (NLP) for Ingredient Extraction and Cleaning**<a id='Utilizing_spaCy_for_Ingredient_Extraction_and_Cleaning'></a>
[Contents](#Contents)

In this section, we use the **spaCy** library for natural language processing to clean the ingredient text and extract high-level ingredient names. The code first removes unwanted characters and patterns from each ingredient description, then strips numbers, fractions, measurement units, and preparation words (such as "cup", "diced", or "to taste"). Finally, it keeps the noun chunks that spaCy finds in the cleaned text as the high-level ingredient names. This standardizes the ingredient data for analysis and modeling.

The final dataset includes a new column, **high_level_ingredients**, which contains cleaned and high-level ingredient names, and an **ingredient_count** column, which represents the number of high-level ingredients in each recipe. This approach enhances the quality and consistency of ingredient data, making it more useful for downstream analysis and machine learning tasks.
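
The regex part of this cleaning can be tried in isolation, before any spaCy processing. The sketch below is a reduced illustration under stated assumptions: `strip_quantities` is a hypothetical name, `MEASURE_WORDS` is a small subset of the stop words chosen for the demo, and noun-chunk extraction is deliberately left out.

```python
import re

# Small illustrative subset of measurement and preparation words (assumption)
MEASURE_WORDS = {'cup', 'cups', 'teaspoon', 'teaspoons', 'tablespoon',
                 'tablespoons', 'minced', 'taste', 'to'}

# Digits and common Unicode fraction characters
NUMBER_RE = re.compile(r'[\d¼½¾⅓⅔⅛⅜⅝⅞]+')
# Whole-word match on any measurement word, case-insensitive
WORD_RE = re.compile(r'\b(?:' + '|'.join(sorted(MEASURE_WORDS)) + r')\b',
                     flags=re.IGNORECASE)

def strip_quantities(ingredient):
    """Hypothetical helper: remove quantities and measurement words from one ingredient."""
    text = NUMBER_RE.sub(' ', ingredient)
    text = WORD_RE.sub(' ', text)
    text = text.replace(',', ' ')
    return ' '.join(text.split())  # collapse the leftover whitespace

print(strip_quantities('¼ cup soy sauce'))
print(strip_quantities('ground black pepper to taste'))
```

The `\b` word boundaries matter here: without them, a short word such as `to` would also be removed from inside longer words like `tomato`.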

In [39]:
import spacy
nlp = spacy.load("en_core_web_sm-3.1.0")




In [49]:
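# List of unwanted characters and patterns to exclude.
# 'inch' is matched only when it is not preceded by a letter, so 'pinch' is left intact
# for the stop-word filter; the duplicate '®' entry is removed.
unwanted_patterns = [
    r'\u2009', r'/', r'(?<![A-Za-z])inch(?:es)?\b', r'â…›', r'â…”', r'Â®"', r'V8®', r'V8',
    r'®', r'™', r'\)', r'\(', r'%', r'\'', r'"'
]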

# Function to clean and trim ingredient text
def clean_ingredient_text(text):
    text = text.strip()
    for pattern in unwanted_patterns:
        text = re.sub(pattern, '', text)
    return text

# Function to extract high-level ingredients using spaCy:
def extract_high_level_ingredients(parsed_ingredients):
    high_level_ingredients = []
    
    # Custom stop words list to filter out non-ingredient words:
    stop_words = set([
        'cup', 'cups', 'teaspoon', 'teaspoons', 'tablespoon', 'tablespoons', 'ounce', 'ounces',
        'pound', 'pounds', 'quart', 'quarts', 'pinch', 'dash', 'taste', 'large', 'small', 'medium',
        'divided', 'minced', 'sliced', 'diced', 'chopped', 'ground', 'freshly', 'prepared', 'cut',
        'into', 'strips', 'halves', 'cubes', 'to', 'box', 'spoon', 'spoons', 'optional'
    ])
    
    for ingredient in parsed_ingredients:
        # Clean and trim ingredient text
        ingredient = clean_ingredient_text(ingredient)
        
        # Remove numbers, fractional numbers, and measurement words
        ingredient = re.sub(r'\d*\s*[\d¼½¾⅓⅔⅛⅜⅝⅞]+\s*', '', ingredient)  # Remove any numbers or fractional numbers
        ingredient = re.sub(r'\b(?:' + '|'.join(stop_words) + r')\b', '', ingredient, flags=re.IGNORECASE)  # Remove stop words

        # Remove any remaining unwanted characters and patterns
        ingredient = clean_ingredient_text(ingredient)
        
        doc = nlp(ingredient)
        for chunk in doc.noun_chunks:
            filtered_words = [token.text for token in chunk if token.text.lower() not in stop_words and not token.is_digit]
            if filtered_words:
                high_level_ingredients.append(' '.join(filtered_words).strip())
    
    return list(set(high_level_ingredients))  # Remove duplicates

# Apply the function to the 'parsed_ingredients' column:
df['high_level_ingredients'] = df['parsed_ingredients'].apply(extract_high_level_ingredients)

# Create the ingredient_count column:
df['ingredient_count'] = df['high_level_ingredients'].apply(len)

df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,servings,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,total,parsed_ingredients,high_level_ingredients,ingredient_count
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,4,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,30 mins,"[1 (8 ounce) box elbow macaroni, ¼ cup butter, ¼ cup all-purpose flour, ½ teaspoon salt, ground black pepper to taste, 2 cups milk, 2 cups shredded Cheddar cheese]","[all - purpose flour, salt, milk, elbow macaroni, shredded Cheddar cheese, butter, black pepper]",7
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. 
Season with salt and pepper to taste.",20 mins,30 mins,6,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,50 mins,"[6 cups chicken broth, divided, 3 tablespoons olive oil, divided, 1 pound portobello mushrooms, thinly sliced, 1 pound white mushrooms, thinly sliced, 2 shallots, diced, 1 ½ cups Arborio rice, ½ cup dry white wine, sea salt to taste, freshly ground black pepper to taste, 3 tablespoons finely chopped chives, 4 tablespoons butter, ⅓ cup freshly grated Parmesan cheese]","[chicken broth, shallots, Arborio rice, butter, sea salt, Parmesan cheese, finely chives, olive oil, white mushrooms, portobello mushrooms, black pepper, dry white wine]",12
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,8,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,20 mins,"[4 eggs, lightly beaten, 1 ⅓ cups milk, 2 tablespoons butter, melted, 1 cup all-purpose flour, 2 tablespoons white sugar, ½ teaspoon salt]","[all - purpose flour, milk, white sugar, salt, eggs, butter]",6
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,6,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,45 mins,"[¼ cup butter, ¼ cup soy sauce, 1 bunch green onions, 2 cloves garlic, minced, 6 pork butt steaks]","[soy sauce, pork butt, bunch green onions, cloves, butter]",5
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,Chef John,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,4,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,45 mins,"[4 skinless, boneless chicken breast halves, salt and freshly ground black pepper to taste, 2 eggs, 1 cup panko bread crumbs, or more as needed, ½ cup grated Parmesan cheese, 2 tablespoons all-purpose flour, or more if needed, 1 cup olive oil for frying, ½ cup prepared tomato sauce, ¼ cup fresh mozzarella, cut into small cubes, ¼ cup chopped fresh basil, ½ cup grated provolone cheese, ¼ cup grated Parmesan cheese, 1 tablespoon olive oil]","[tomato sauce, fresh mozzarella, skinless, all - purpose flour, salt, panko bread crumbs, Parmesan cheese, eggs, olive oil, boneless chicken breast, provolone cheese, fresh basil, black pepper]",13


In [51]:
# Save DataFrame to a CSV file
df.to_csv('all_recipes_final_df.csv', index=False)

In [52]:
df[df['url'] == 'https://www.allrecipes.com/recipe/238014/memes-pasta-fagioli/']


Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,servings,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,total,parsed_ingredients,high_level_ingredients,ingredient_count
394,MeMe's Pasta Fagioli,https://www.allrecipes.com/recipe/238014/memes-pasta-fagioli/,soups-stews-and-chili,Karyn Osborne,"White cannellini beans, ditalini pasta with vegetables, lean hamburger, and herbs are simmered in vegetable juice and chicken broth. It is like an Italian chili like Olive Garden®'s Pasta Fagioli and even better the second day.",4.76,119,79,"1 pound lean ground beef ; 1 tablespoon olive oil ; 1 carrot, diced ; 1 stalk celery, diced ; 1 thin slice onion, diced ; 1 teaspoon minced garlic ; 1 (32 ounce) bottle tomato-vegetable juice cocktail (such as V8®) ; 1 (14 ounce) can chicken broth ; 1 tablespoon dried parsley ; 1 tablespoon dried basil ; 1 teaspoon dried oregano ; freshly ground black pepper to taste ; 1 ½ cups ditalini pasta ; 1 (15 ounce) can cannellini beans, drained and rinsed","Heat a large skillet over medium-high heat. Cook and stir beef in the hot skillet until browned and crumbly, 5 to 7 minutes; drain and discard grease. Heat olive oil in a large saucepan over medium-high heat; saute carrot, celery, and onion until softened, 5 to 10 minutes. Add garlic and saute until fragrant, 1 to 2 minutes. Stir vegetable juice cocktail, chicken broth, parsley, basil, oregano, and black pepper into vegetable mixture; bring to a boil. Reduce heat and simmer soup for 20 minutes. Bring a large pot of lightly salted water to a boil. Cook ditalini pasta in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Stir cannellini beans and ground beef into soup; cook and stir until soup is heated through, about 10 minutes. 
Spoon about 1/3 cup pasta into each serving bowl; ladle soup over pasta.",15 mins,50 mins,8,298.8,33.2,4.7,10.1,3.4,35.5,17.9,4.7,566.0,91.0,84.9,4.0,48.9,710.7,2282.9,6.9,28.8,91.9,0.1,1 hr 5 mins,"[1 pound lean ground beef, 1 tablespoon olive oil, 1 carrot, diced, 1 stalk celery, diced, 1 thin slice onion, diced, 1 teaspoon minced garlic, 1 (32 ounce) bottle tomato-vegetable juice cocktail (such as V8®), 1 (14 ounce) can chicken broth, 1 tablespoon dried parsley, 1 tablespoon dried basil, 1 teaspoon dried oregano, freshly ground black pepper to taste, 1 ½ cups ditalini pasta, 1 (15 ounce) can cannellini beans, drained and rinsed]","[stalk celery, oregano, dried parsley, dried basil, pasta, beef, carrot, broth, olive oil, beans, bottle tomato - vegetable juice cocktail, ditalini, thin slice onion, garlic, black pepper]",15


In [53]:
print(df.columns)

Index(['name', 'url', 'category', 'author', 'summary', 'rating',
       'rating_count', 'review_count', 'ingredients', 'directions', 'prep',
       'cook', 'servings', 'calories', 'carbohydrates_g', 'sugars_g', 'fat_g',
       'saturated_fat_g', 'cholesterol_mg', 'protein_g', 'dietary_fiber_g',
       'sodium_mg', 'calories_from_fat', 'calcium_mg', 'iron_mg',
       'magnesium_mg', 'potassium_mg', 'vitamin_a_iu_IU',
       'niacin_equivalents_mg', 'vitamin_c_mg', 'folate_mcg', 'thiamin_mg',
       'total', 'parsed_ingredients', 'high_level_ingredients',
       'ingredient_count'],
      dtype='object')


### **Developing Diet Type Feature**<a id='Developing_Diet_Type_Feature'></a>
[Contents](#Contents)

**Diet Type:** The function below categorizes recipes based on their nutritional content by:

* Conditions: Checking whether specific nutritional values meet predefined thresholds: carbohydrates < 20 g, fat < 10 g, protein > 20 g, sodium < 140 mg, and sugars < 5 g.
* Categorization: Assigning diet types (Low Carb, Low Fat, High Protein, Low Sodium, Low Sugar) based on these conditions.
* Result: Returning a comma-separated string of applicable diet types, or 'General' if none match.

This function is applied to each row of the DataFrame to create a new column, 'diet_type', that categorizes each recipe based on its nutritional profile.

In [54]:
# Function to determine diet type:
def determine_diet_type(row):
    diet_types = []
    
    if row['carbohydrates_g'] < 20:
        diet_types.append('Low Carb')
    if row['fat_g'] < 10:
        diet_types.append('Low Fat')
    if row['protein_g'] > 20:
        diet_types.append('High Protein')
    if row['sodium_mg'] < 140:
        diet_types.append('Low Sodium')
    if row['sugars_g'] < 5:
        diet_types.append('Low Sugar')
    
    return ', '.join(diet_types) if diet_types else 'General'

# Apply the function to create the diet_type column
df['diet_type'] = df.apply(determine_diet_type, axis=1)
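
As a quick sanity check, the sketch below re-declares the same function and applies it to two recipes whose nutrient values are copied from the table outputs in this notebook (Pork Steaks and Dessert Crepes). Plain dictionaries stand in for DataFrame rows, and the `all_missing` row is a made-up case; both are assumptions for illustration only.

```python
# Same thresholds as the notebook function, repeated so this block runs on its own
def determine_diet_type(row):
    diet_types = []
    if row['carbohydrates_g'] < 20:
        diet_types.append('Low Carb')
    if row['fat_g'] < 10:
        diet_types.append('Low Fat')
    if row['protein_g'] > 20:
        diet_types.append('High Protein')
    if row['sodium_mg'] < 140:
        diet_types.append('Low Sodium')
    if row['sugars_g'] < 5:
        diet_types.append('Low Sugar')
    return ', '.join(diet_types) if diet_types else 'General'

# Nutrient values copied from the notebook's output rows
pork_steaks = {'carbohydrates_g': 3.9, 'fat_g': 25.4, 'protein_g': 26.5,
               'sodium_mg': 719.7, 'sugars_g': 1.1}
dessert_crepes = {'carbohydrates_g': 17.2, 'fat_g': 7.7, 'protein_g': 6.4,
                  'sodium_mg': 234.5, 'sugars_g': 5.3}

# Hypothetical row with every nutrient missing: comparisons with NaN are False
all_missing = {key: float('nan') for key in pork_steaks}

print(determine_diet_type(pork_steaks))     # Low Carb, High Protein, Low Sugar
print(determine_diet_type(dessert_crepes))  # Low Carb, Low Fat
print(determine_diet_type(all_missing))     # General
```

Two behaviors are worth keeping in mind: a missing value never triggers a label, because every comparison with NaN is False, while a nutrient stored as 0 satisfies every "Low" threshold.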

In [55]:
# Now, let's evaluate diet_type column: 
value_counts_with_percentage(df, 'diet_type')

diet_type,Count,Percentage
General,4072,15.03
High Protein,2746,10.13
"Low Carb, Low Sugar",2265,8.36
"Low Carb, High Protein, Low Sugar",2003,7.39
"High Protein, Low Sugar",1989,7.34
Low Fat,1900,7.01
Low Sugar,1844,6.8
"Low Carb, Low Fat, Low Sugar",1790,6.61
"Low Fat, Low Sugar",1300,4.8
"Low Carb, Low Fat, Low Sodium, Low Sugar",1046,3.86

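Because `diet_type` stores combined labels as a single comma-separated string, the table above counts each combination separately. If per-label totals are of interest, one possible approach is to split each string and count the parts. This is only a sketch: the helper name `count_diet_labels` and the sample values are hypothetical and are not part of the notebook.

```python
from collections import Counter

def count_diet_labels(diet_types):
    """Count individual labels across comma-separated diet_type strings."""
    counts = Counter()
    for entry in diet_types:
        # Skip anything that is not a string (for example, missing values)
        if isinstance(entry, str):
            counts.update(label.strip() for label in entry.split(','))
    return counts

# Made-up sample values in the same format as the diet_type column
sample = [
    'General',
    'High Protein',
    'Low Carb, Low Sugar',
    'Low Carb, High Protein, Low Sugar',
]

label_counts = count_diet_labels(sample)
for label, count in label_counts.most_common():
    print(label, count)
```

The helper accepts any iterable of strings, so with the notebook's DataFrame it could be called as `count_diet_labels(df['diet_type'])`.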

### **Developing Recommended daily values Features**<a id='Developing_Recommended_daily_values_Features'></a>
[Contents](#Contents)

Here, we'll create columns that represent the percentage of the daily value for each nutritional column, based on a 2,000-calorie diet. To do this, we need the recommended daily value for each nutrient. The values used in the calculations are:

* Carbohydrates: 275g
* Sugars: 50g
* Fat: 78g
* Saturated Fat: 20g
* Cholesterol: 300mg
* Protein: 50g
* Dietary Fiber: 28g
* Sodium: 2300mg
* Calories from Fat: This will be calculated as (Fat in grams * 9) / 2000 * 100
* Calcium: 1300mg
* Iron: 18mg
* Magnesium: 420mg
* Potassium: 4700mg
* Vitamin A: 5000 IU
* Niacin Equivalents: 16mg
* Vitamin C: 90mg
* Folate: 400mcg
* Thiamin: 1.2mg

These values are based on the FDA's guidelines for daily values on nutrition and supplement facts labels. For more detailed information on daily values, refer to the FDA's resources [here](https://www.fda.gov/media/135301/download).
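
To make the arithmetic concrete, the sketch below computes three daily-value percentages by hand, using values taken from the Simple Macaroni and Cheese row in this notebook's output (carbohydrates 55.0 g, saturated fat 20.9 g, fat 33.6 g). The helper name `dv_percent` is hypothetical and is used only for this illustration.

```python
def dv_percent(amount, daily_value):
    """Percentage of the daily value, rounded to 2 decimal places."""
    return round(amount / daily_value * 100, 2)

# Values from the Simple Macaroni and Cheese row
carbs_dv = dv_percent(55.0, 275)    # carbohydrates against 275 g
sat_fat_dv = dv_percent(20.9, 20)   # saturated fat against 20 g

# Special case: calories from fat as a share of a 2000-calorie diet,
# using 9 calories per gram of fat
fat_calories_dv = round(33.6 * 9 / 2000 * 100, 2)

print(carbs_dv, sat_fat_dv, fat_calories_dv)  # 20.0 104.5 15.12
```

A value above 100, such as the saturated fat figure here, means a single serving exceeds the recommended daily value for that nutrient.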

In [58]:
# Recommended daily values
daily_values = {
    'carbohydrates_g': 275,
    'sugars_g': 50,
    'fat_g': 78,
    'saturated_fat_g': 20,
    'cholesterol_mg': 300,
    'protein_g': 50,
    'dietary_fiber_g': 28,
    'sodium_mg': 2300,
    'calcium_mg': 1300,
    'iron_mg': 18,
    'magnesium_mg': 420,
    'potassium_mg': 4700,
    'vitamin_a_iu_IU': 5000,
    'niacin_equivalents_mg': 16,
    'vitamin_c_mg': 90,
    'folate_mcg': 400,
    'thiamin_mg': 1.2
}

# Calculate daily value percentage for each nutrient
for nutrient, daily_value in daily_values.items():
    df[f'{nutrient}_dv_perc'] = (df[nutrient] / daily_value) * 100

# Special case for calories from fat
df['calories_from_fat_dv_perc'] = (df['fat_g'] * 9 / 2000) * 100

# Round all daily value percentage columns to 2 decimal places
dv_columns = [col for col in df.columns if col.endswith('_dv_perc')]
df[dv_columns] = df[dv_columns].round(2)

df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,servings,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,total,parsed_ingredients,high_level_ingredients,ingredient_count,diet_type,carbohydrates_g_dv_perc,sugars_g_dv_perc,fat_g_dv_perc,saturated_fat_g_dv_perc,cholesterol_mg_dv_perc,protein_g_dv_perc,dietary_fiber_g_dv_perc,sodium_mg_dv_perc,calcium_mg_dv_perc,iron_mg_dv_perc,magnesium_mg_dv_perc,potassium_mg_dv_perc,vitamin_a_iu_IU_dv_perc,niacin_equivalents_mg_dv_perc,vitamin_c_mg_dv_perc,folate_mcg_dv_perc,thiamin_mg_dv_perc,calories_from_fat_dv_perc
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,4,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,30 mins,"[1 (8 ounce) box elbow macaroni, ¼ cup butter, ¼ cup all-purpose flour, ½ teaspoon salt, ground black pepper to taste, 2 cups milk, 2 cups shredded Cheddar cheese]","[all - purpose flour, salt, milk, elbow macaroni, shredded Cheddar cheese, butter, black pepper]",7,High Protein,20.0,15.2,43.08,104.5,33.2,53.0,7.5,33.78,43.68,15.0,14.71,8.09,23.04,63.12,0.33,41.4,58.33,15.12
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. 
Season with salt and pepper to taste.",20 mins,30 mins,6,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,50 mins,"[6 cups chicken broth, divided, 3 tablespoons olive oil, divided, 1 pound portobello mushrooms, thinly sliced, 1 pound white mushrooms, thinly sliced, 2 shallots, diced, 1 ½ cups Arborio rice, ½ cup dry white wine, sea salt to taste, freshly ground black pepper to taste, 3 tablespoons finely chopped chives, 4 tablespoons butter, ⅓ cup freshly grated Parmesan cheese]","[chicken broth, shallots, Arborio rice, butter, sea salt, Parmesan cheese, finely chives, olive oil, white mushrooms, portobello mushrooms, black pepper, dry white wine]",12,Low Sugar,20.58,8.8,21.28,33.0,9.77,22.6,9.64,49.17,5.39,11.67,5.74,14.72,10.41,46.88,4.22,9.22,8.33,7.47
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,8,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,20 mins,"[4 eggs, lightly beaten, 1 ⅓ cups milk, 2 tablespoons butter, melted, 1 cup all-purpose flour, 2 tablespoons white sugar, ½ teaspoon salt]","[all - purpose flour, milk, white sugar, salt, eggs, butter]",6,"Low Carb, Low Fat",6.25,10.6,9.87,17.0,37.03,12.8,1.43,10.2,5.05,6.67,2.67,2.46,6.96,14.37,0.11,10.88,16.67,3.46
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,6,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,45 mins,"[¼ cup butter, ¼ cup soy sauce, 1 bunch green onions, 2 cloves garlic, minced, 6 pork butt steaks]","[soy sauce, pork butt, bunch green onions, cloves, butter]",5,"Low Carb, High Protein, Low Sugar",1.42,2.2,32.56,57.0,39.33,53.0,3.93,31.29,4.54,13.89,8.43,9.3,12.37,56.25,8.22,6.45,58.33,11.43
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,Chef John,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,4,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,45 mins,"[4 skinless, boneless chicken breast halves, salt and freshly ground black pepper to taste, 2 eggs, 1 cup panko bread crumbs, or more as needed, ½ cup grated Parmesan cheese, 2 tablespoons all-purpose flour, or more if needed, 1 cup olive oil for frying, ½ cup prepared tomato sauce, ¼ cup fresh mozzarella, cut into small cubes, ¼ cup chopped fresh basil, ½ cup grated provolone cheese, ¼ cup grated Parmesan cheese, 1 tablespoon olive oil]","[tomato sauce, fresh mozzarella, skinless, all - purpose flour, salt, panko bread crumbs, Parmesan cheese, eggs, olive oil, boneless chicken breast, provolone cheese, fresh basil, black pepper]",13,"High Protein, Low Sugar",9.02,3.4,31.92,45.5,62.23,84.2,2.14,36.53,29.25,11.67,10.57,8.27,12.57,118.12,2.89,7.72,8.33,11.2


In [60]:
# Save DataFrame to a CSV file
df.to_csv('all_recipes_final_df.csv', index=False)

In [61]:
print(df.columns)

Index(['name', 'url', 'category', 'author', 'summary', 'rating',
       'rating_count', 'review_count', 'ingredients', 'directions', 'prep',
       'cook', 'servings', 'calories', 'carbohydrates_g', 'sugars_g', 'fat_g',
       'saturated_fat_g', 'cholesterol_mg', 'protein_g', 'dietary_fiber_g',
       'sodium_mg', 'calories_from_fat', 'calcium_mg', 'iron_mg',
       'magnesium_mg', 'potassium_mg', 'vitamin_a_iu_IU',
       'niacin_equivalents_mg', 'vitamin_c_mg', 'folate_mcg', 'thiamin_mg',
       'total', 'parsed_ingredients', 'high_level_ingredients',
       'ingredient_count', 'diet_type', 'carbohydrates_g_dv_perc',
       'sugars_g_dv_perc', 'fat_g_dv_perc', 'saturated_fat_g_dv_perc',
       'cholesterol_mg_dv_perc', 'protein_g_dv_perc',
       'dietary_fiber_g_dv_perc', 'sodium_mg_dv_perc', 'calcium_mg_dv_perc',
       'iron_mg_dv_perc', 'magnesium_mg_dv_perc', 'potassium_mg_dv_perc',
       'vitamin_a_iu_IU_dv_perc', 'niacin_equivalents_mg_dv_perc',
       'vitamin_c_mg_dv_perc',

### **Developing Recipe Length Feature**<a id='Developing_Recipe_Length_Feature'></a>
[Contents](#Contents)

* Extracting Length: Calculating the number of whitespace-separated words in the directions column.
* Handling Non-Strings: Using a lambda function to check whether each entry in directions is a string before splitting it into words; non-string entries (such as missing values) are assigned a length of 0.
* Applying the Function: Applying this calculation to each row of the directions column to populate the recipe_length column.
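
The handling of non-string entries described above can be illustrated on a few standalone values. The helper name `count_words` is hypothetical; it mirrors the logic of the lambda used in the next cell but is not the notebook's own code.

```python
def count_words(text):
    """Number of whitespace-separated words, or 0 for non-string input."""
    return len(text.split()) if isinstance(text, str) else 0

# A short instruction, an empty string, and two kinds of missing value
examples = ["Melt butter in a skillet.", "", None, float('nan')]
lengths = [count_words(value) for value in examples]

print(lengths)  # [5, 0, 0, 0]
```

Since `str.split()` without arguments splits on any run of whitespace, line breaks inside the directions text do not produce empty tokens, and punctuation stays attached to the neighboring word.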

In [62]:
# Add the 'recipe_length' feature using the 'directions' column:
df['recipe_length'] = df['directions'].apply(lambda x: len(x.split()) if isinstance(x, str) else 0)
df.head()

Unnamed: 0,name,url,category,author,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,servings,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,total,parsed_ingredients,high_level_ingredients,ingredient_count,diet_type,carbohydrates_g_dv_perc,sugars_g_dv_perc,fat_g_dv_perc,saturated_fat_g_dv_perc,cholesterol_mg_dv_perc,protein_g_dv_perc,dietary_fiber_g_dv_perc,sodium_mg_dv_perc,calcium_mg_dv_perc,iron_mg_dv_perc,magnesium_mg_dv_perc,potassium_mg_dv_perc,vitamin_a_iu_IU_dv_perc,niacin_equivalents_mg_dv_perc,vitamin_c_mg_dv_perc,folate_mcg_dv_perc,thiamin_mg_dv_perc,calories_from_fat_dv_perc,recipe_length
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,g0dluvsugly,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,4,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,30 mins,"[1 (8 ounce) box elbow macaroni, ¼ cup butter, ¼ cup all-purpose flour, ½ teaspoon salt, ground black pepper to taste, 2 cups milk, 2 cups shredded Cheddar cheese]","[all - purpose flour, salt, milk, elbow macaroni, shredded Cheddar cheese, butter, black pepper]",7,High Protein,20.0,15.2,43.08,104.5,33.2,53.0,7.5,33.78,43.68,15.0,14.71,8.09,23.04,63.12,0.33,41.4,58.33,15.12,91
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,Myleen Sagrado Sjödin,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. 
Season with salt and pepper to taste.",20 mins,30 mins,6,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,50 mins,"[6 cups chicken broth, divided, 3 tablespoons olive oil, divided, 1 pound portobello mushrooms, thinly sliced, 1 pound white mushrooms, thinly sliced, 2 shallots, diced, 1 ½ cups Arborio rice, ½ cup dry white wine, sea salt to taste, freshly ground black pepper to taste, 3 tablespoons finely chopped chives, 4 tablespoons butter, ⅓ cup freshly grated Parmesan cheese]","[chicken broth, shallots, Arborio rice, butter, sea salt, Parmesan cheese, finely chives, olive oil, white mushrooms, portobello mushrooms, black pepper, dry white wine]",12,Low Sugar,20.58,8.8,21.28,33.0,9.77,22.6,9.64,49.17,5.39,11.67,5.74,14.72,10.41,46.88,4.22,9.22,8.33,7.47,147
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,ANN57,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,8,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,20 mins,"[4 eggs, lightly beaten, 1 ⅓ cups milk, 2 tablespoons butter, melted, 1 cup all-purpose flour, 2 tablespoons white sugar, ½ teaspoon salt]","[all - purpose flour, milk, white sugar, salt, eggs, butter]",6,"Low Carb, Low Fat",6.25,10.6,9.87,17.0,37.03,12.8,1.43,10.2,5.05,6.67,2.67,2.46,6.96,14.37,0.11,10.88,16.67,3.46,85
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,BABYLOVE1222,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,6,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,45 mins,"[¼ cup butter, ¼ cup soy sauce, 1 bunch green onions, 2 cloves garlic, minced, 6 pork butt steaks]","[soy sauce, pork butt, bunch green onions, cloves, butter]",5,"Low Carb, High Protein, Low Sugar",1.42,2.2,32.56,57.0,39.33,53.0,3.93,31.29,4.54,13.89,8.43,9.3,12.37,56.25,8.22,6.45,58.33,11.43,56
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,Chef John,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,4,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,45 mins,"[4 skinless, boneless chicken breast halves, salt and freshly ground black pepper to taste, 2 eggs, 1 cup panko bread crumbs, or more as needed, ½ cup grated Parmesan cheese, 2 tablespoons all-purpose flour, or more if needed, 1 cup olive oil for frying, ½ cup prepared tomato sauce, ¼ cup fresh mozzarella, cut into small cubes, ¼ cup chopped fresh basil, ½ cup grated provolone cheese, ¼ cup grated Parmesan cheese, 1 tablespoon olive oil]","[tomato sauce, fresh mozzarella, skinless, all - purpose flour, salt, panko bread crumbs, Parmesan cheese, eggs, olive oil, boneless chicken breast, provolone cheese, fresh basil, black pepper]",13,"High Protein, Low Sugar",9.02,3.4,31.92,45.5,62.23,84.2,2.14,36.53,29.25,11.67,10.57,8.27,12.57,118.12,2.89,7.72,8.33,11.2,248


In [64]:
# Save DataFrame to a CSV file
df.to_csv('all_recipes_final_df.csv', index=False)

In [66]:
# Check the shape of df before dropping any columns:
shape = df.shape
print("Number of rows:", shape[0], "\nNumber of columns:", shape[1])

Number of rows: 27099 
Number of columns: 56


In [67]:
# Drop the 'author' column and preview the result
df = df.drop(columns='author')
df.head()

Unnamed: 0,name,url,category,summary,rating,rating_count,review_count,ingredients,directions,prep,cook,servings,calories,carbohydrates_g,sugars_g,fat_g,saturated_fat_g,cholesterol_mg,protein_g,dietary_fiber_g,sodium_mg,calories_from_fat,calcium_mg,iron_mg,magnesium_mg,potassium_mg,vitamin_a_iu_IU,niacin_equivalents_mg,vitamin_c_mg,folate_mcg,thiamin_mg,total,parsed_ingredients,high_level_ingredients,ingredient_count,diet_type,carbohydrates_g_dv_perc,sugars_g_dv_perc,fat_g_dv_perc,saturated_fat_g_dv_perc,cholesterol_mg_dv_perc,protein_g_dv_perc,dietary_fiber_g_dv_perc,sodium_mg_dv_perc,calcium_mg_dv_perc,iron_mg_dv_perc,magnesium_mg_dv_perc,potassium_mg_dv_perc,vitamin_a_iu_IU_dv_perc,niacin_equivalents_mg_dv_perc,vitamin_c_mg_dv_perc,folate_mcg_dv_perc,thiamin_mg_dv_perc,calories_from_fat_dv_perc,recipe_length
0,Simple Macaroni and Cheese,https://www.allrecipes.com/recipe/238691/simple-macaroni-and-cheese/,main-dish,"A very quick and easy fix to a tasty side-dish. Fancy, designer mac and cheese often costs forty or fifty dollars to prepare when you have so many exotic and expensive cheeses, but they aren't always the best tasting. This recipe is cheap and tasty.",4.42,834,575,1 (8 ounce) box elbow macaroni ; ¼ cup butter ; ¼ cup all-purpose flour ; ½ teaspoon salt ; ground black pepper to taste ; 2 cups milk ; 2 cups shredded Cheddar cheese,"Bring a large pot of lightly salted water to a boil. Cook elbow macaroni in the boiling water, stirring occasionally until cooked through but firm to the bite, 8 minutes. Drain. Melt butter in a saucepan over medium heat; stir in flour, salt, and pepper until smooth, about 5 minutes. Slowly pour milk into butter-flour mixture while continuously stirring until mixture is smooth and bubbling, about 5 minutes. Add Cheddar cheese to milk mixture and stir until cheese is melted, 2 to 4 minutes. Fold macaroni into cheese sauce until coated.",10 mins,20 mins,4,630.2,55.0,7.6,33.6,20.9,99.6,26.5,2.1,777.0,302.2,567.9,2.7,61.8,380.0,1152.0,10.1,0.3,165.6,0.7,30 mins,"[1 (8 ounce) box elbow macaroni, ¼ cup butter, ¼ cup all-purpose flour, ½ teaspoon salt, ground black pepper to taste, 2 cups milk, 2 cups shredded Cheddar cheese]","[all - purpose flour, salt, milk, elbow macaroni, shredded Cheddar cheese, butter, black pepper]",7,High Protein,20.0,15.2,43.08,104.5,33.2,53.0,7.5,33.78,43.68,15.0,14.71,8.09,23.04,63.12,0.33,41.4,58.33,15.12,91
1,Gourmet Mushroom Risotto,https://www.allrecipes.com/recipe/85389/gourmet-mushroom-risotto/,main-dish,"Authentic Italian-style risotto cooked the slow and painful way, but oh so worth it. Complements grilled meats and chicken dishes very well. Check the rice by biting into it. It should be slightly al dente (or resist slightly to the tooth but not be hard in the center).",4.8,3388,2245,"6 cups chicken broth, divided ; 3 tablespoons olive oil, divided ; 1 pound portobello mushrooms, thinly sliced ; 1 pound white mushrooms, thinly sliced ; 2 shallots, diced ; 1 ½ cups Arborio rice ; ½ cup dry white wine ; sea salt to taste ; freshly ground black pepper to taste ; 3 tablespoons finely chopped chives ; 4 tablespoons butter ; ⅓ cup freshly grated Parmesan cheese","In a saucepan, warm the broth over low heat. Warm 2 tablespoons olive oil in a large saucepan over medium-high heat. Stir in the mushrooms, and cook until soft, about 3 minutes. Remove mushrooms and their liquid, and set aside. Add 1 tablespoon olive oil to skillet, and stir in the shallots. Cook 1 minute. Add rice, stirring to coat with oil, about 2 minutes. When the rice has taken on a pale, golden color, pour in wine, stirring constantly until the wine is fully absorbed. Add 1/2 cup broth to the rice, and stir until the broth is absorbed. Continue adding broth 1/2 cup at a time, stirring continuously, until the liquid is absorbed and the rice is al dente, about 15 to 20 minutes. Remove from heat, and stir in mushrooms with their liquid, butter, chives, and parmesan. 
Season with salt and pepper to taste.",20 mins,30 mins,6,430.6,56.6,4.4,16.6,6.6,29.3,11.3,2.7,1130.8,149.8,70.1,2.1,24.1,692.0,520.3,7.5,3.8,36.9,0.1,50 mins,"[6 cups chicken broth, divided, 3 tablespoons olive oil, divided, 1 pound portobello mushrooms, thinly sliced, 1 pound white mushrooms, thinly sliced, 2 shallots, diced, 1 ½ cups Arborio rice, ½ cup dry white wine, sea salt to taste, freshly ground black pepper to taste, 3 tablespoons finely chopped chives, 4 tablespoons butter, ⅓ cup freshly grated Parmesan cheese]","[chicken broth, shallots, Arborio rice, butter, sea salt, Parmesan cheese, finely chives, olive oil, white mushrooms, portobello mushrooms, black pepper, dry white wine]",12,Low Sugar,20.58,8.8,21.28,33.0,9.77,22.6,9.64,49.17,5.39,11.67,5.74,14.72,10.41,46.88,4.22,9.22,8.33,7.47,147
2,Dessert Crepes,https://www.allrecipes.com/recipe/19037/dessert-crepes/,breakfast-and-brunch,"Essential crepe recipe. Sprinkle warm crepes with sugar and lemon, or serve with cream or ice cream and fruit.",4.8,1156,794,"4 eggs, lightly beaten ; 1 ⅓ cups milk ; 2 tablespoons butter, melted ; 1 cup all-purpose flour ; 2 tablespoons white sugar ; ½ teaspoon salt","In large bowl, whisk together eggs, milk, melted butter, flour sugar and salt until smooth. Heat a medium-sized skillet or crepe pan over medium heat. Grease pan with a small amount of butter or oil applied with a brush or paper towel. Using a serving spoon or small ladle, spoon about 3 tablespoons crepe batter into hot pan, tilting the pan so that bottom surface is evenly coated. Cook over medium heat, 1 to 2 minutes on a side, or until golden brown. Serve immediately.",10 mins,10 mins,8,163.8,17.2,5.3,7.7,3.4,111.1,6.4,0.4,234.5,69.0,65.6,1.2,11.2,115.4,347.8,2.3,0.1,43.5,0.2,20 mins,"[4 eggs, lightly beaten, 1 ⅓ cups milk, 2 tablespoons butter, melted, 1 cup all-purpose flour, 2 tablespoons white sugar, ½ teaspoon salt]","[all - purpose flour, milk, white sugar, salt, eggs, butter]",6,"Low Carb, Low Fat",6.25,10.6,9.87,17.0,37.03,12.8,1.43,10.2,5.05,6.67,2.67,2.46,6.96,14.37,0.11,10.88,16.67,3.46,85
3,Pork Steaks,https://www.allrecipes.com/recipe/70463/pork-steaks/,meat-and-poultry,My mom came up with this recipe when I was a child. It is the ONLY way I will eat green onions.,4.57,689,539,"¼ cup butter ; ¼ cup soy sauce ; 1 bunch green onions ; 2 cloves garlic, minced ; 6 pork butt steaks","Melt butter in a skillet, and mix in the soy sauce. Saute the green onions and garlic until lightly browned. Place the pork steaks in the skillet, cover, and cook 8 to 10 minutes on each side, Remove cover, and continue cooking 10 minutes, or to an internal temperature of 145 degrees F (63 degrees C).",15 mins,30 mins,6,353.1,3.9,1.1,25.4,11.4,118.0,26.5,1.1,719.7,228.4,59.0,2.5,35.4,436.9,618.3,9.0,7.4,25.8,0.7,45 mins,"[¼ cup butter, ¼ cup soy sauce, 1 bunch green onions, 2 cloves garlic, minced, 6 pork butt steaks]","[soy sauce, pork butt, bunch green onions, cloves, butter]",5,"Low Carb, High Protein, Low Sugar",1.42,2.2,32.56,57.0,39.33,53.0,3.93,31.29,4.54,13.89,8.43,9.3,12.37,56.25,8.22,6.45,58.33,11.43,56
5,Chicken Parmesan,https://www.allrecipes.com/recipe/223042/chicken-parmesan/,world-cuisine,"My version of chicken parmesan is a little different than what they do in the restaurants, with less sauce and a crispier crust.",4.83,4245,2662,"4 skinless, boneless chicken breast halves ; salt and freshly ground black pepper to taste ; 2 eggs ; 1 cup panko bread crumbs, or more as needed ; ½ cup grated Parmesan cheese ; 2 tablespoons all-purpose flour, or more if needed ; 1 cup olive oil for frying ; ½ cup prepared tomato sauce ; ¼ cup fresh mozzarella, cut into small cubes ; ¼ cup chopped fresh basil ; ½ cup grated provolone cheese ; ¼ cup grated Parmesan cheese ; 1 tablespoon olive oil","Preheat an oven to 450 degrees F (230 degrees C). Place chicken breasts between two sheets of heavy plastic (resealable freezer bags work well) on a solid, level surface. Firmly pound chicken with the smooth side of a meat mallet to a thickness of 1/2-inch. Season chicken thoroughly with salt and pepper. Beat eggs in a shallow bowl and set aside. Mix bread crumbs and 1/2 cup Parmesan cheese in a separate bowl, set aside. Place flour in a sifter or strainer; sprinkle over chicken breasts, evenly coating both sides. Dip flour coated chicken breast in beaten eggs. Transfer breast to breadcrumb mixture, pressing the crumbs into both sides. Repeat for each breast. Set aside breaded chicken breasts for about 15 minutes. Heat 1 cup olive oil in a large skillet on medium-high heat until it begins to shimmer. Cook chicken until golden, about 2 minutes on each side. The chicken will finish cooking in the oven. Place chicken in a baking dish and top each breast with about 1/3 cup of tomato sauce. Layer each chicken breast with equal amounts of mozzarella cheese, fresh basil, and provolone cheese. Sprinkle 1 to 2 tablespoons of Parmesan cheese on top and drizzle with 1 tablespoon olive oil. 
Bake in the preheated oven until cheese is browned and bubbly, and chicken breasts are no longer pink in the center, 15 to 20 minutes. An instant-read thermometer inserted into the center should read at least 165 degrees F (74 degrees C).",25 mins,20 mins,4,470.8,24.8,1.7,24.9,9.1,186.7,42.1,0.6,840.3,223.8,380.2,2.1,44.4,388.9,628.4,18.9,2.6,30.9,0.1,45 mins,"[4 skinless, boneless chicken breast halves, salt and freshly ground black pepper to taste, 2 eggs, 1 cup panko bread crumbs, or more as needed, ½ cup grated Parmesan cheese, 2 tablespoons all-purpose flour, or more if needed, 1 cup olive oil for frying, ½ cup prepared tomato sauce, ¼ cup fresh mozzarella, cut into small cubes, ¼ cup chopped fresh basil, ½ cup grated provolone cheese, ¼ cup grated Parmesan cheese, 1 tablespoon olive oil]","[tomato sauce, fresh mozzarella, skinless, all - purpose flour, salt, panko bread crumbs, Parmesan cheese, eggs, olive oil, boneless chicken breast, provolone cheese, fresh basil, black pepper]",13,"High Protein, Low Sugar",9.02,3.4,31.92,45.5,62.23,84.2,2.14,36.53,29.25,11.67,10.57,8.27,12.57,118.12,2.89,7.72,8.33,11.2,248


In [68]:
# Check the shape again to confirm that one column was removed:
shape = df.shape
print("Number of rows:", shape[0], "\nNumber of columns:", shape[1])

Number of rows: 27099 
Number of columns: 55
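
The drop above reduces the column count from 56 to 55. The preview still shows an `Unnamed: 0` column, which usually appears when a CSV that was saved with its index is read back in. The following is a minimal sketch of dropping such columns defensively. It runs on a toy frame rather than the real recipe data, and `toy_df` and `drop_columns_safely` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy stand-in for the recipe DataFrame (a few values borrowed from the preview above)
toy_df = pd.DataFrame({
    "Unnamed: 0": [3, 5],
    "name": ["Pork Steaks", "Chicken Parmesan"],
    "author": ["BABYLOVE1222", "Chef John"],
    "rating": [4.57, 4.83],
})


def drop_columns_safely(frame, columns):
    """Return a new frame without the given columns; columns that are absent are skipped."""
    return frame.drop(columns=columns, errors="ignore")


cols_before = toy_df.shape[1]
toy_clean = drop_columns_safely(toy_df, ["author", "Unnamed: 0"])
print("Columns before:", cols_before, "\nColumns after:", toy_clean.shape[1])
```

Passing `errors="ignore"` makes the cell safe to re-run: a second call with the same column names leaves the frame unchanged instead of raising a `KeyError`.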


## **Saving the cleaned dataframe**<a id='Saving_the_cleaned_dataframe'></a>
[Contents](#Contents)

In [69]:
# Save DataFrame to a CSV file
df.to_csv('all_recipes_final_df.csv', index=False)
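
A quick round-trip check can confirm that the file was written as expected. The sketch below uses a toy frame and a temporary directory instead of `all_recipes_final_df.csv`. It assumes that list-valued columns such as `high_level_ingredients` hold real Python lists, in which case they come back from `read_csv` as strings and need to be parsed again. The toy values and the `toy_recipes.csv` file name are hypothetical.

```python
import ast
import os
import tempfile

import pandas as pd

# Toy frame with one list-valued column (hypothetical values, not the real data)
toy_df = pd.DataFrame({
    "name": ["Toy Recipe"],
    "high_level_ingredients": [["milk", "eggs", "butter"]],
    "ingredient_count": [3],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "toy_recipes.csv")
    # index=False keeps the index out of the file, so no 'Unnamed: 0' column on reload
    toy_df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

# The shape and column names should survive the round trip
same_shape = reloaded.shape == toy_df.shape
same_columns = list(reloaded.columns) == list(toy_df.columns)

# The list column is stored as text in the CSV, so it is read back as a string
raw_value = reloaded.loc[0, "high_level_ingredients"]
parsed_value = ast.literal_eval(raw_value)  # convert the string back into a list

print("Same shape:", same_shape, "| Same columns:", same_columns)
print("Type after reload:", type(raw_value).__name__)
print("Type after parsing:", type(parsed_value).__name__)
```

If the list columns are needed as lists downstream, an alternative under the same assumption is to pass a `converters` mapping to `pd.read_csv` so the parsing happens at load time.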