# Recipe Recommendation System: What's On The Menu? 🍽️👩‍🍳
# Data Cleaning and Preprocessing Steps

## Table of contents
* [1. Introduction](#introduction)
* [2. Dataset](#dataset)
    * [2.1 Data Import](#import)
    * [2.2 Data Understanding](#understanding)
    * [2.3 Data Dictionary](#dictionary)
* [3. Data Cleaning](#cleaning)
    * [3.1 Missing Data](#missing)
    * [3.2 Duplicated Data](#duplicated)
* [4. Data Preprocessing](#preprocessing)
    * [4.1 Feature Engineering and Filtering](#featureeng)
        * [4.1.1 Serving Size](#servings)
            * [4.1.1.1 Measure Conversion: dozen to single units](#dozen)
        * [4.1.3 Ingredient Counter](#counter)
        * [4.1.4 Meal Types](#mealtypes)
            * [4.1.4.1 First Round of Recipe Labelling](#firstlabel)
            * [4.1.4.2 Second Round of Recipe Labelling](#secondlabel)
* [5. Saving the Data](#saving)
* [6. Conclusion](#conclusion)

---

## 1. Introduction <a name="introduction"></a>

Households often face the challenge of managing food resources efficiently. A significant amount of food is wasted due to overpurchasing, improper storage, and the inability to use ingredients before they spoil. Individuals often struggle to plan meals that make the best use of what they already have at home, which leads to unnecessary spending on groceries. To address this issue, I present a recipe recommendation system that suggests relevant recipes tailored to the ingredients users already have.

The core idea behind an ingredient-based recipe recommendation system is to empower users with the ability to make the most of what is already in their kitchen. This approach can help in reducing expenses and food waste. Additionally, this system can simplify the meal planning process while inspiring culinary creativity and experimentation.

## 2. Dataset <a name="dataset"></a>


This dataset consists of cooking recipes from RecipeNLG, an expanded version of Recipe1M+. RecipeNLG adds over 1 million new, preprocessed and deduplicated recipes on top of Recipe1M+, for a total of approximately 2.2 million recipes. It emphasizes recipe text, structure and logic, rather than linking recipes to corresponding images.

### 2.1 Data Import <a name="import"></a>

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import ast

Load the recipe dataset into a dataframe.

In [9]:
recipe_df = pd.read_csv(r"C:\Users\cryst\Desktop\BrainStation - Data Science\Capstone\Capstone Files\data\recipe_dataset.csv", index_col=0)

Set the maximum column display width to unlimited so that long text fields are shown in full rather than truncated.

In [11]:
pd.set_option('display.max_colwidth', None)

In [12]:
#pd.reset_option('display.max_colwidth')

In [13]:
#pd.reset_option('display.max_rows')
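As an alternative to setting the option globally and resetting it later (the commented-out calls above), the setting can be scoped to a single block with `pd.option_context`. The sketch below uses a small made-up frame, `demo_df`, rather than `recipe_df`, and assumes the global option is still at its default when the block runs.

```python
import pandas as pd

# Hypothetical stand-in for one long `directions` value.
long_text = " ".join(["Stir well before serving."] * 5)
demo_df = pd.DataFrame({"directions": [long_text]})

# Inside the block, long strings are rendered in full.
with pd.option_context("display.max_colwidth", None):
    full_repr = repr(demo_df)

# Outside the block, the previous setting applies again.
truncated_repr = repr(demo_df)

print(full_repr)
print(truncated_repr)
```

This keeps wide output limited to the cells that need it, without having to remember a matching `pd.reset_option` call.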

### 2.2 Data Understanding <a name="understanding"></a>

#### 2.2.1 Shape and First Look of the Dataframe

In [16]:
print(f'This dataset contains {recipe_df.shape[0]} rows and {recipe_df.shape[1]} columns')

This dataset contains 2231142 rows and 6 columns


Peep into the dataframe.

In [18]:
recipe_df.head()

Unnamed: 0,title,ingredients,directions,link,source,NER
0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. evaporated milk"", ""1/2 tsp. vanilla"", ""1/2 c. broken nuts (pecans)"", ""2 Tbsp. butter or margarine"", ""3 1/2 c. bite size shredded rice biscuits""]","[""In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine."", ""Stir over medium heat until mixture bubbles all over top."", ""Boil and stir 5 minutes more. Take off heat."", ""Stir in vanilla and cereal; mix well."", ""Using 2 teaspoons, drop and shape into 30 clusters on wax paper."", ""Let stand until firm, about 30 minutes.""]",www.cookbooks.com/Recipe-Details.aspx?id=44874,Gathered,"[""brown sugar"", ""milk"", ""vanilla"", ""nuts"", ""butter"", ""bite size shredded rice biscuits""]"
1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned chicken breasts"", ""1 can cream of mushroom soup"", ""1 carton sour cream""]","[""Place chipped beef on bottom of baking dish."", ""Place chicken on top of beef."", ""Mix soup and cream together; pour over chicken. Bake, uncovered, at 275\u00b0 for 3 hours.""]",www.cookbooks.com/Recipe-Details.aspx?id=699419,Gathered,"[""beef"", ""chicken breasts"", ""cream of mushroom soup"", ""sour cream""]"
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]"
3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans chicken gravy"", ""1 (10 1/2 oz.) can cream of mushroom soup"", ""1 (6 oz.) box Stove Top stuffing"", ""4 oz. shredded cheese""]","[""Boil and debone chicken."", ""Put bite size pieces in average size square casserole dish."", ""Pour gravy and cream of mushroom soup over chicken; level."", ""Make stuffing according to instructions on box (do not make too moist)."", ""Put stuffing on top of chicken and gravy; level."", ""Sprinkle shredded cheese on top and bake at 350\u00b0 for approximately 20 minutes or until golden and bubbly.""]",www.cookbooks.com/Recipe-Details.aspx?id=897570,Gathered,"[""chicken"", ""chicken gravy"", ""cream of mushroom soup"", ""shredded cheese""]"
4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker crumbs"", ""1 c. melted butter"", ""1 lb. (3 1/2 c.) powdered sugar"", ""1 large pkg. chocolate chips""]","[""Combine first four ingredients and press in 13 x 9-inch ungreased pan."", ""Melt chocolate chips and spread over mixture. Refrigerate for about 20 minutes and cut into pieces before chocolate gets hard."", ""Keep in refrigerator.""]",www.cookbooks.com/Recipe-Details.aspx?id=659239,Gathered,"[""peanut butter"", ""graham cracker crumbs"", ""butter"", ""powdered sugar"", ""chocolate chips""]"


At first glance, this dataset contains several variables that will be helpful for building an ingredient-based recipe recommendation tool. The most useful are 'title', 'ingredients', 'directions' and 'NER' (Named Entity Recognition, i.e. the ingredient names extracted from each recipe). The 'link' variable points to the source web page, which shows the same directions along with extra features such as printing, e-mailing and saving the recipe. Lastly, 'source' tells us how the recipe was gathered. One drawback of this dataset is that it contains no calorie or other nutritional information.

#### 2.2.2 Data Types

Next, we will take a look at `recipe_df.info()` to print information on column names, data types, and total number of observations.

In [22]:
recipe_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2231142 entries, 0 to 2231141
Data columns (total 6 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   title        object
 1   ingredients  object
 2   directions   object
 3   link         object
 4   source       object
 5   NER          object
dtypes: object(6)
memory usage: 119.2+ MB


In this section we can see that all six columns have the `object` dtype, meaning they hold text strings rather than numeric values.
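Because every column is `object`, the memory figure reported by `info()` carries a `+` sign: it counts only the references stored in each column, not the strings they point to. A minimal sketch of the difference, using a small made-up frame (`demo_df`) rather than the full dataset:

```python
import pandas as pd

# Tiny stand-in for recipe_df; dtype=object mirrors the dtypes shown by info().
demo_df = pd.DataFrame(
    {
        "title": ["Creamy Corn", "Chicken Funny"],
        "source": ["Gathered", "Gathered"],
    },
    dtype=object,
)

# Shallow estimate: counts the references held in each object column.
shallow_bytes = demo_df.memory_usage(deep=False).sum()

# Deep estimate: also measures the Python strings those references point to.
deep_bytes = demo_df.memory_usage(deep=True).sum()

print(f"shallow: {shallow_bytes} bytes, deep: {deep_bytes} bytes")
```

On the real dataframe, `recipe_df.info(memory_usage='deep')` would report the deep figure directly, at the cost of a slower call.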

In [24]:
recipe_df.select_dtypes(include = 'object').head()

Unnamed: 0,title,ingredients,directions,link,source,NER
0,No-Bake Nut Cookies,"[""1 c. firmly packed brown sugar"", ""1/2 c. evaporated milk"", ""1/2 tsp. vanilla"", ""1/2 c. broken nuts (pecans)"", ""2 Tbsp. butter or margarine"", ""3 1/2 c. bite size shredded rice biscuits""]","[""In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine."", ""Stir over medium heat until mixture bubbles all over top."", ""Boil and stir 5 minutes more. Take off heat."", ""Stir in vanilla and cereal; mix well."", ""Using 2 teaspoons, drop and shape into 30 clusters on wax paper."", ""Let stand until firm, about 30 minutes.""]",www.cookbooks.com/Recipe-Details.aspx?id=44874,Gathered,"[""brown sugar"", ""milk"", ""vanilla"", ""nuts"", ""butter"", ""bite size shredded rice biscuits""]"
1,Jewell Ball'S Chicken,"[""1 small jar chipped beef, cut up"", ""4 boned chicken breasts"", ""1 can cream of mushroom soup"", ""1 carton sour cream""]","[""Place chipped beef on bottom of baking dish."", ""Place chicken on top of beef."", ""Mix soup and cream together; pour over chicken. Bake, uncovered, at 275\u00b0 for 3 hours.""]",www.cookbooks.com/Recipe-Details.aspx?id=699419,Gathered,"[""beef"", ""chicken breasts"", ""cream of mushroom soup"", ""sour cream""]"
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]"
3,Chicken Funny,"[""1 large whole chicken"", ""2 (10 1/2 oz.) cans chicken gravy"", ""1 (10 1/2 oz.) can cream of mushroom soup"", ""1 (6 oz.) box Stove Top stuffing"", ""4 oz. shredded cheese""]","[""Boil and debone chicken."", ""Put bite size pieces in average size square casserole dish."", ""Pour gravy and cream of mushroom soup over chicken; level."", ""Make stuffing according to instructions on box (do not make too moist)."", ""Put stuffing on top of chicken and gravy; level."", ""Sprinkle shredded cheese on top and bake at 350\u00b0 for approximately 20 minutes or until golden and bubbly.""]",www.cookbooks.com/Recipe-Details.aspx?id=897570,Gathered,"[""chicken"", ""chicken gravy"", ""cream of mushroom soup"", ""shredded cheese""]"
4,Reeses Cups(Candy),"[""1 c. peanut butter"", ""3/4 c. graham cracker crumbs"", ""1 c. melted butter"", ""1 lb. (3 1/2 c.) powdered sugar"", ""1 large pkg. chocolate chips""]","[""Combine first four ingredients and press in 13 x 9-inch ungreased pan."", ""Melt chocolate chips and spread over mixture. Refrigerate for about 20 minutes and cut into pieces before chocolate gets hard."", ""Keep in refrigerator.""]",www.cookbooks.com/Recipe-Details.aspx?id=659239,Gathered,"[""peanut butter"", ""graham cracker crumbs"", ""butter"", ""powdered sugar"", ""chocolate chips""]"


Again, we see that all our columns are text. Focusing on the `directions` column, we see that some recipes state the serving size while others do not. I plan to extract this information and keep only the recipes that state a serving size, for two reasons. First, the serving size indicates how much food a recipe yields, which helps reduce waste: the user can scale the recipe down if it makes more than needed, or up if it makes too little. Second, this field would be useful for calculating the amount of calories per meal.

### 2.3 Data Dictionary <a name="dictionary"></a>

Now that we have taken a look at the dataframe, let's start putting our learnings together in a data dictionary.

|Column Name|Meaning|Data Type|Notes|
|---|---|---|---|
|title|Name of the recipe|object| |
|ingredients| List of measures, units and ingredients that form the recipe|object| |
|directions| Steps to prepare the recipe|object| |
|link| URL link to the recipe|object| |
|source| Label showing how the recipe was obtained |object| `Gathered`: recipes gathered from multiple cooking web pages via web scraping; `Recipes1M`: recipes from the Recipe1M+ dataset |
|NER| List of ingredient names in the recipe|object| Named Entity Recognition|
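Although `ingredients`, `directions` and `NER` look like lists, they are read from the CSV as plain strings. One way to turn them into real Python lists is `ast.literal_eval`. This is a sketch on a made-up one-row frame; the `NER_list` column name is hypothetical, and whether the rest of the notebook parses the columns this way is an assumption.

```python
import ast

import pandas as pd

# One-row stand-in using a value shaped like the NER column.
demo_df = pd.DataFrame(
    {
        "title": ["Creamy Corn"],
        "NER": ['["frozen corn", "cream cheese", "butter"]'],
    }
)

# Before parsing, the cell is a string that merely looks like a list.
raw_type = type(demo_df.loc[0, "NER"]).__name__

# literal_eval evaluates literals only, so it is safer than eval().
demo_df["NER_list"] = demo_df["NER"].apply(ast.literal_eval)

parsed = demo_df.loc[0, "NER_list"]
print(raw_type, type(parsed).__name__, len(parsed))
```

Once parsed, operations such as counting ingredients per recipe become a simple `len` call on each list.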

## 3. Data Cleaning <a name="cleaning"></a>

### 3.1 Missing Data <a name="missing"></a>

First, let's see if there is any missing data in our dataset.

In [32]:
recipe_df.isna().sum()

title          1
ingredients    0
directions     0
link           0
source         0
NER            0
dtype: int64

There is one missing value in the `title` column. Let's examine this row to decide how to handle it.

In [34]:
recipe_df[recipe_df['title'].isna()]

Unnamed: 0,title,ingredients,directions,link,source,NER
1394448,,"[""2 pieces bacon""]","[""Slice bacon into lardons, place in nonstick skillet and cook on medium heat until crisp and fat is rendered- about 7 minutes."", ""Meanwhile, in a small bowl add: fermented bean paste, gochujang, soy sauce, honey, coarse black pepper and kosher salt. Stir to combine, set aside. Cut onion into a small dice, slice garlic, and cube tofu- set aside."", ""Once the bacon is cooked, drain on a paper towel and drain all but 1 tablespoon of bacon fat from the pan. On medium high heat, add the onion and garlic and sweat until translucent."", ""Add tofu to the pan and turn heat up to high, lightly frying the tofu in the bacon fat. Toss and brown until heated through- about 3 minutes. Add the spicy sauce and 1/3 cup of water to the tofu, stirring gently to prevent breaking up the tofu. Cook on high for 4-6 minutes until sauce has thickened and coated the tofu. Turn off heat and drizzle tofu with sesame oil."", ""Slice green onions on a bias and place in a small bowl. Dress the green onions with a pinch of gochugaru (red pepper flakes) and 2 teaspoons of rice wine vinegar."", ""Spoon tofu into a shallow bowl and garnish with dressed green onions and crisp bacon. Serve with white rice and devour immediately.""]",food52.com/recipes/57431-none,Gathered,"[""bacon""]"


After examining this row, it is clear that the directions don't match the ingredients column: the steps call for tofu, gochujang, onion and garlic, while the only listed ingredient is bacon. Since only this one row is affected, I will drop it to keep the rest of the dataset consistent.

In [36]:
recipe_df.dropna(subset = ['title'], inplace = True)

In [37]:
# Sanity Check
recipe_df.isna().sum()

title          0
ingredients    0
directions     0
link           0
source         0
NER            0
dtype: int64

Here we can see that the row was successfully dropped from our dataset.
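A complementary sanity check is to compare row counts before and after the drop, which confirms that exactly one row was removed. The sketch below uses a made-up three-row frame and the non-inplace form of `dropna`; the names `demo_df` and `demo_clean` are illustrative only.

```python
import numpy as np
import pandas as pd

# Three-row stand-in with one missing title.
demo_df = pd.DataFrame(
    {
        "title": ["Creamy Corn", np.nan, "Chicken Funny"],
        "NER": ['["corn"]', '["bacon"]', '["chicken"]'],
    }
)

rows_before = len(demo_df)

# Drop only rows whose title is missing; other columns are not inspected.
demo_clean = demo_df.dropna(subset=["title"])

rows_dropped = rows_before - len(demo_clean)
print(f"dropped {rows_dropped} row(s)")
```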

### 3.2 Duplicated Data <a name="duplicated"></a>

Let's see if our dataset contains any duplicated values.

In [41]:
print("duplicated rows in recipes dataset:", recipe_df.duplicated().sum())

duplicated rows in recipes dataset: 0


After checking for duplicated rows, we can see that there are no exact duplicates in our dataset!
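Note that `duplicated()` with no arguments only flags rows that are identical in every column, including `link`. If the same recipe had been collected from two different pages, it would not be caught. A sketch of a content-only check on made-up data (the example links are placeholders, and whether such cases exist in the real dataset is not verified here):

```python
import pandas as pd

# Made-up rows: the first two share title and ingredients but not link.
demo_df = pd.DataFrame(
    {
        "title": ["Creamy Corn", "Creamy Corn", "Chicken Funny"],
        "ingredients": ['["corn", "butter"]', '["corn", "butter"]', '["chicken"]'],
        "link": ["site-a.example/1", "site-b.example/9", "site-a.example/2"],
    }
)

# Whole-row check: the differing links mean nothing is flagged.
full_row_dupes = int(demo_df.duplicated().sum())

# Content-only check: compare just the columns that describe the recipe itself.
content_dupes = int(demo_df.duplicated(subset=["title", "ingredients"]).sum())

print(full_row_dupes, content_dupes)
```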

## 4. Data Preprocessing <a name="preprocessing"></a>

### 4.1 Feature Engineering and Filtering <a name="featureeng"></a>

#### 4.1.1 Serving Size <a name="servings"></a>

Here I filter for recipes whose directions contain the terms 'serving' or 'servings' preceded by a number, or the terms 'serves' or 'serve' followed by a number.

In [47]:
# Non-capturing groups (?:...) match the same text without triggering pandas' "has match groups" warning
recipe_df_filtered = recipe_df[recipe_df['directions'].str.contains(pat = r'\b\d+\s*(?:serving|servings)\b|\b(?:serves|serve)\s*\d+\b', case=False, na=False, regex = True)]


In [48]:
recipe_df_filtered.shape

(105543, 6)
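To see what the serving-size filter does and does not catch, the same pattern can be tried on a few short strings with the standard `re` module. Most of the sample sentences are taken from directions shown in this notebook; "Serves four." is a made-up counterexample.

```python
import re

# Same pattern as the filter, written with non-capturing groups.
SERVING_PATTERN = re.compile(
    r"\b\d+\s*(?:serving|servings)\b|\b(?:serves|serve)\s*\d+\b",
    flags=re.IGNORECASE,
)

samples = [
    "Yields 6 servings.",         # number followed by "servings"
    "Serves 8.",                  # "serves" followed by a number
    "Serves 2 dozen.",            # matches "Serves 2"; "dozen" is ignored
    "Yields 6 to 8 servings.",    # matches "8 servings" only
    "Stir well before serving.",  # no number next to the term
    "Serves four.",               # spelled-out numbers are not covered
]

# search() mirrors str.contains: a match anywhere in the string counts.
results = {text: bool(SERVING_PATTERN.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{matched!s:<5} {text}")
```

Two things stand out: a range such as "6 to 8 servings" is reduced to its upper bound, and "Serves 2 dozen" passes the filter even though the real yield is 24, so units like "dozen" need separate handling.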

In [49]:
recipe_df_filtered

Unnamed: 0,title,ingredients,directions,link,source,NER
2,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]"
17,Broccoli Salad,"[""1 large head broccoli (about 1 1/2 lb.)"", ""10 slices bacon, cooked and crumbled"", ""5 green onions, sliced or 1/4 c. chopped red onion"", ""1/2 c. raisins"", ""1 c. mayonnaise"", ""2 Tbsp. vinegar"", ""1/4 c. sugar""]","[""Trim off large leaves of broccoli and remove the tough ends of lower stalks. Wash the broccoli thoroughly. Cut the florets and stems into bite-size pieces. Place in a large bowl. Add bacon, onions and raisins. Combine remaining ingredients, stirring well. Add dressing to broccoli mixture and toss gently. Cover and refrigerate 2 to 3 hours. Makes about 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=50992,Gathered,"[""broccoli"", ""bacon"", ""green onions"", ""raisins"", ""mayonnaise"", ""vinegar"", ""sugar""]"
24,Prize-Winning Meat Loaf,"[""1 1/2 lb. ground beef"", ""1 c. tomato juice"", ""3/4 c. oats (uncooked)"", ""1 egg, beaten"", ""1/4 c. chopped onion"", ""1/4 tsp. pepper"", ""1 1/2 tsp. salt""]","[""Mix well."", ""Press firmly into an 8 1/2 x 4 1/2 x 2 1/2-inch loaf pan."", ""Bake in preheated moderate oven."", ""Bake at 350\u00b0 for 1 hour."", ""Let stand 5 minutes before slicing."", ""Makes 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=923674,Gathered,"[""ground beef"", ""tomato juice"", ""oats"", ""egg"", ""onion"", ""pepper"", ""salt""]"
26,Corral Barbecued Beef Steak Strips,"[""2 lb. round steak 1/2 to 3/4-inch thick, sliced in strips 1/8-inch thick (or thinner) and 3 1/2 to 4-inches long (easily sliced if partially frozen)"", ""2 Tbsp. cooking oil"", ""1 can (15 oz.) tomato sauce"", ""1/3 c. water"", ""2 Tbsp. brown sugar"", ""1 Tbsp. prepared mustard"", ""1 tbsp. Worcestershire sauce"", ""1 medium sized onion, thinly sliced""]","[""Brown strips in cooking oil."", ""Pour off drippings."", ""Combine tomato sauce, water, brown sugar, mustard and Worcestershire sauce."", ""Add sauce and onion to meat slices."", ""Cover and cook slowly, stirring occasionally 30 minutes or until meat is tender. Serve over rice or buttered noodles."", ""Yields 6 to 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=420402,Gathered,"[""long"", ""cooking oil"", ""tomato sauce"", ""water"", ""brown sugar"", ""mustard"", ""Worcestershire sauce"", ""onion""]"
48,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]"
...,...,...,...,...,...,...
2231009,Chicken Stuffing Mix Recipe,"[""1 (8 ounce.) stuffing mix & 4 slices bread"", ""1/2 c. butter, melted"", ""1 c. chicken broth"", ""2 1/2 c. chicken, diced"", ""1 c. onion, minced"", ""1/2 c. celery, minced"", ""1/2 c. salad dressing"", ""3/4 teaspoon salt"", ""2 Large eggs"", ""1 1/2 c. lowfat milk"", ""1 can cream of mushroom soup"", ""1 c. Cheddar cheese, shredded""]","[""Mix first 3 ingredients."", ""Put in 9 x 13 inch pan."", ""Combine chicken, onion, celery, salad dressing and salt."", ""Spread on top of dressing."", ""Save a little for top."", ""Mix Large eggs and lowfat milk, pour over."", ""Cover with foil and chill at least 6 hrs."", ""Before baking, take the cream of mushroom soup and spread over stuffing."", ""Bake at 325 degrees for 40 min."", ""Sprinkle with 1 c. Cheddar cheese."", ""Bake 10 min more."", ""Cut into squares."", ""Serves 8.""]",cookeatshare.com/recipes/chicken-stuffing-mix-38487,Recipes1M,"[""stuffing mix"", ""butter"", ""chicken broth"", ""chicken"", ""onion"", ""celery"", ""salad dressing"", ""salt"", ""eggs"", ""milk"", ""cream of mushroom soup"", ""Cheddar cheese""]"
2231028,Blackberry Upside Down Cake Recipe,"[""1/2 stk margarine or possibly butter"", ""1/4 c. Sugar"", ""1 1/2 c. Blackberries"", ""2 Tbsp. Sliced almonds"", ""1 1/2 c. Bisquick original baking mix"", ""1/2 c. Sugar"", ""1/2 c. Lowfat milk or possibly water"", ""2 Tbsp. Vegetable oil"", ""1/2 tsp Vanilla"", ""1/2 tsp Almond extract"", ""1 x Egg Sweetened whipped cream or possibly ice cream, if you like""]","[""HEAT oven to 350 degrees."", ""Heat margarine in round pan, 9x1-1/2 inches, or possibly square pan, 8x8x2 inches, in oven till melted."", ""Sprinkle 1/4 c. sugar proportionately over melted margarine."", ""Arrange Blackberries with open ends up over sugar mix; sprinkle with almonds."", ""BEAT remaining ingredients except whipped cream in medium bowl on low speed 30 seconds, scraping bowl constantly."", ""Beat on medium speed 4 min, scraping bowl occasionally."", ""Pour batter over Blackberries."", ""BAKE 35 to 40 min or possibly till toothpick inserted in center comes out clean."", ""Immediately invert pan onto heatproof serving plate; leave pan over cake a few min."", ""Remove pan."", ""Let cake stand at least 10 min before serving."", ""Serve hot with whipped cream."", ""9 servings."", ""Pear Upside-down Cake: Substitute packed brown sugar for the sugar and 1 large pear, thinly sliced, for the Blackberries."", ""Substitute minced pecans for the almonds."", ""Increase vanilla to 1 tsp."", ""; omit almond extract."", ""Add in 1/2 tsp."", ""grnd mace or possibly cinnamon with the vanilla.""]",cookeatshare.com/recipes/blackberry-upside-down-cake-86757,Recipes1M,"[""margarine"", ""Sugar"", ""Blackberries"", ""almonds"", ""Bisquick original baking mix"", ""Sugar"", ""milk"", ""Vegetable oil"", ""Vanilla"", ""Egg""]"
2231063,Broken Wheat Pudding ( Lapsi Kheer ) Recipe,"[""150 gm broken wheat"", ""300 ml water"", ""4 Tbsp. jaggery grated"", ""300 ml coconut lowfat milk"", ""1 tsp cardamom pwdr""]","[""(To make coconut lowfat milk finely grate a fresh coconut add in 150ml of warm water and squeeze to extract the thick lowfat milk."", ""You should get roughly 300ml of coconut lowfat milk from a coconut.)"", ""Cook the broken wheat in the water till soft."", ""Stir in the jaggery and cook till blended."", ""Add in the coconut lowfat milk and cardamom pwdr."", ""Bring to the boil once and remove from the heat."", ""Serve hot."", ""This is a pudding from south India and is made on festive occasions."", ""Serves 4""]",cookeatshare.com/recipes/broken-wheat-pudding-lapsi-kheer-93485,Recipes1M,"[""broken wheat"", ""water"", ""jaggery grated"", ""coconut lowfat milk"", ""pwdr""]"
2231076,Boysenberry Tiramisu Recipe,"[""1 pkt Frzn red Boysenberries in, light syrup (10 ounce)"", ""2 sqr semisweet chocolate, (1 ounce)"", ""1 ct whipped cream, cheese (8 ounce)"", ""3 Tbsp. Coffee-flavor liqueur"", ""1 Tbsp. Lowfat milk"", ""1 tsp Vanilla extract"", ""1 1/2 c. Heavy or possibly whipping cream"", ""2/3 c. Vanilla wafers, coarsely Crumble (about 40 cookies) Fresh Boysenberries, garnish""]","[""About 3 hrs before serving or possibly early in day: Thaw frzn Boysenberries as label directs."", ""Meanwhile, grate semi-sweet chocolate."", ""Reserve 1/4 c. grated chocolate for garnish."", ""In large bowl, with wire whisk or possibly fork, beat cream cheese, coffee flavor liqueur, lowfat milk, vanilla extract, and remaining grated chocolate till well blended."", ""In small bowl, with mixer at medium speed, beat heavy or possibly whipping cream and confectioners' sugar till stiff peaks forms."", ""Reserve 2 c. mix for topping."", ""With rubber spatula or possibly wire whisk, fold remaining 1 c. whipped cream mix."", ""Into 8 dessert glasses, place half of crumbled vanilla wafers; top with half of cream mix."", ""Spoon half of thawed Boysenberries with their syrup over cheese mix; top with remaining vanilla wafers, remaining thawed raspberies, then with remaining cheese mix."", ""Spoon reserved whipped cream mix into decorating bag with small rosette tube."", ""Pipe whipped cream around edge of each dessert glass."", ""Sprinkle reserved grated chocolate in center of each dessert."", ""Garnish with fresh raspberies."", ""Chill at least 2 hrs to blend flavor."", ""Makes 8 servings.""]",cookeatshare.com/recipes/boysenberry-tiramisu-90221,Recipes1M,"[""red Boysenberries"", ""chocolate"", ""whipped cream"", ""Coffee-flavor"", ""milk"", ""Vanilla"", ""whipping cream"", ""Vanilla wafers""]"


After filtering the dataset, there are 105,543 recipes left that include the serving size terms. Let's reset the index.

In [51]:
recipe_df_filtered.reset_index(drop=True, inplace=True)

In [52]:
recipe_df_filtered

Unnamed: 0,title,ingredients,directions,link,source,NER
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]"
1,Broccoli Salad,"[""1 large head broccoli (about 1 1/2 lb.)"", ""10 slices bacon, cooked and crumbled"", ""5 green onions, sliced or 1/4 c. chopped red onion"", ""1/2 c. raisins"", ""1 c. mayonnaise"", ""2 Tbsp. vinegar"", ""1/4 c. sugar""]","[""Trim off large leaves of broccoli and remove the tough ends of lower stalks. Wash the broccoli thoroughly. Cut the florets and stems into bite-size pieces. Place in a large bowl. Add bacon, onions and raisins. Combine remaining ingredients, stirring well. Add dressing to broccoli mixture and toss gently. Cover and refrigerate 2 to 3 hours. Makes about 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=50992,Gathered,"[""broccoli"", ""bacon"", ""green onions"", ""raisins"", ""mayonnaise"", ""vinegar"", ""sugar""]"
2,Prize-Winning Meat Loaf,"[""1 1/2 lb. ground beef"", ""1 c. tomato juice"", ""3/4 c. oats (uncooked)"", ""1 egg, beaten"", ""1/4 c. chopped onion"", ""1/4 tsp. pepper"", ""1 1/2 tsp. salt""]","[""Mix well."", ""Press firmly into an 8 1/2 x 4 1/2 x 2 1/2-inch loaf pan."", ""Bake in preheated moderate oven."", ""Bake at 350\u00b0 for 1 hour."", ""Let stand 5 minutes before slicing."", ""Makes 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=923674,Gathered,"[""ground beef"", ""tomato juice"", ""oats"", ""egg"", ""onion"", ""pepper"", ""salt""]"
3,Corral Barbecued Beef Steak Strips,"[""2 lb. round steak 1/2 to 3/4-inch thick, sliced in strips 1/8-inch thick (or thinner) and 3 1/2 to 4-inches long (easily sliced if partially frozen)"", ""2 Tbsp. cooking oil"", ""1 can (15 oz.) tomato sauce"", ""1/3 c. water"", ""2 Tbsp. brown sugar"", ""1 Tbsp. prepared mustard"", ""1 tbsp. Worcestershire sauce"", ""1 medium sized onion, thinly sliced""]","[""Brown strips in cooking oil."", ""Pour off drippings."", ""Combine tomato sauce, water, brown sugar, mustard and Worcestershire sauce."", ""Add sauce and onion to meat slices."", ""Cover and cook slowly, stirring occasionally 30 minutes or until meat is tender. Serve over rice or buttered noodles."", ""Yields 6 to 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=420402,Gathered,"[""long"", ""cooking oil"", ""tomato sauce"", ""water"", ""brown sugar"", ""mustard"", ""Worcestershire sauce"", ""onion""]"
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]"
...,...,...,...,...,...,...
105538,Chicken Stuffing Mix Recipe,"[""1 (8 ounce.) stuffing mix & 4 slices bread"", ""1/2 c. butter, melted"", ""1 c. chicken broth"", ""2 1/2 c. chicken, diced"", ""1 c. onion, minced"", ""1/2 c. celery, minced"", ""1/2 c. salad dressing"", ""3/4 teaspoon salt"", ""2 Large eggs"", ""1 1/2 c. lowfat milk"", ""1 can cream of mushroom soup"", ""1 c. Cheddar cheese, shredded""]","[""Mix first 3 ingredients."", ""Put in 9 x 13 inch pan."", ""Combine chicken, onion, celery, salad dressing and salt."", ""Spread on top of dressing."", ""Save a little for top."", ""Mix Large eggs and lowfat milk, pour over."", ""Cover with foil and chill at least 6 hrs."", ""Before baking, take the cream of mushroom soup and spread over stuffing."", ""Bake at 325 degrees for 40 min."", ""Sprinkle with 1 c. Cheddar cheese."", ""Bake 10 min more."", ""Cut into squares."", ""Serves 8.""]",cookeatshare.com/recipes/chicken-stuffing-mix-38487,Recipes1M,"[""stuffing mix"", ""butter"", ""chicken broth"", ""chicken"", ""onion"", ""celery"", ""salad dressing"", ""salt"", ""eggs"", ""milk"", ""cream of mushroom soup"", ""Cheddar cheese""]"
105539,Blackberry Upside Down Cake Recipe,"[""1/2 stk margarine or possibly butter"", ""1/4 c. Sugar"", ""1 1/2 c. Blackberries"", ""2 Tbsp. Sliced almonds"", ""1 1/2 c. Bisquick original baking mix"", ""1/2 c. Sugar"", ""1/2 c. Lowfat milk or possibly water"", ""2 Tbsp. Vegetable oil"", ""1/2 tsp Vanilla"", ""1/2 tsp Almond extract"", ""1 x Egg Sweetened whipped cream or possibly ice cream, if you like""]","[""HEAT oven to 350 degrees."", ""Heat margarine in round pan, 9x1-1/2 inches, or possibly square pan, 8x8x2 inches, in oven till melted."", ""Sprinkle 1/4 c. sugar proportionately over melted margarine."", ""Arrange Blackberries with open ends up over sugar mix; sprinkle with almonds."", ""BEAT remaining ingredients except whipped cream in medium bowl on low speed 30 seconds, scraping bowl constantly."", ""Beat on medium speed 4 min, scraping bowl occasionally."", ""Pour batter over Blackberries."", ""BAKE 35 to 40 min or possibly till toothpick inserted in center comes out clean."", ""Immediately invert pan onto heatproof serving plate; leave pan over cake a few min."", ""Remove pan."", ""Let cake stand at least 10 min before serving."", ""Serve hot with whipped cream."", ""9 servings."", ""Pear Upside-down Cake: Substitute packed brown sugar for the sugar and 1 large pear, thinly sliced, for the Blackberries."", ""Substitute minced pecans for the almonds."", ""Increase vanilla to 1 tsp."", ""; omit almond extract."", ""Add in 1/2 tsp."", ""grnd mace or possibly cinnamon with the vanilla.""]",cookeatshare.com/recipes/blackberry-upside-down-cake-86757,Recipes1M,"[""margarine"", ""Sugar"", ""Blackberries"", ""almonds"", ""Bisquick original baking mix"", ""Sugar"", ""milk"", ""Vegetable oil"", ""Vanilla"", ""Egg""]"
105540,Broken Wheat Pudding ( Lapsi Kheer ) Recipe,"[""150 gm broken wheat"", ""300 ml water"", ""4 Tbsp. jaggery grated"", ""300 ml coconut lowfat milk"", ""1 tsp cardamom pwdr""]","[""(To make coconut lowfat milk finely grate a fresh coconut add in 150ml of warm water and squeeze to extract the thick lowfat milk."", ""You should get roughly 300ml of coconut lowfat milk from a coconut.)"", ""Cook the broken wheat in the water till soft."", ""Stir in the jaggery and cook till blended."", ""Add in the coconut lowfat milk and cardamom pwdr."", ""Bring to the boil once and remove from the heat."", ""Serve hot."", ""This is a pudding from south India and is made on festive occasions."", ""Serves 4""]",cookeatshare.com/recipes/broken-wheat-pudding-lapsi-kheer-93485,Recipes1M,"[""broken wheat"", ""water"", ""jaggery grated"", ""coconut lowfat milk"", ""pwdr""]"
105541,Boysenberry Tiramisu Recipe,"[""1 pkt Frzn red Boysenberries in, light syrup (10 ounce)"", ""2 sqr semisweet chocolate, (1 ounce)"", ""1 ct whipped cream, cheese (8 ounce)"", ""3 Tbsp. Coffee-flavor liqueur"", ""1 Tbsp. Lowfat milk"", ""1 tsp Vanilla extract"", ""1 1/2 c. Heavy or possibly whipping cream"", ""2/3 c. Vanilla wafers, coarsely Crumble (about 40 cookies) Fresh Boysenberries, garnish""]","[""About 3 hrs before serving or possibly early in day: Thaw frzn Boysenberries as label directs."", ""Meanwhile, grate semi-sweet chocolate."", ""Reserve 1/4 c. grated chocolate for garnish."", ""In large bowl, with wire whisk or possibly fork, beat cream cheese, coffee flavor liqueur, lowfat milk, vanilla extract, and remaining grated chocolate till well blended."", ""In small bowl, with mixer at medium speed, beat heavy or possibly whipping cream and confectioners' sugar till stiff peaks forms."", ""Reserve 2 c. mix for topping."", ""With rubber spatula or possibly wire whisk, fold remaining 1 c. whipped cream mix."", ""Into 8 dessert glasses, place half of crumbled vanilla wafers; top with half of cream mix."", ""Spoon half of thawed Boysenberries with their syrup over cheese mix; top with remaining vanilla wafers, remaining thawed raspberies, then with remaining cheese mix."", ""Spoon reserved whipped cream mix into decorating bag with small rosette tube."", ""Pipe whipped cream around edge of each dessert glass."", ""Sprinkle reserved grated chocolate in center of each dessert."", ""Garnish with fresh raspberies."", ""Chill at least 2 hrs to blend flavor."", ""Makes 8 servings.""]",cookeatshare.com/recipes/boysenberry-tiramisu-90221,Recipes1M,"[""red Boysenberries"", ""chocolate"", ""whipped cream"", ""Coffee-flavor"", ""milk"", ""Vanilla"", ""whipping cream"", ""Vanilla wafers""]"


Next, I test different regex patterns to see whether they extract serving sizes. The pattern below captures phrases where the number comes first, such as '3 servings' or '1 serving'. Matching is case-insensitive (`re.IGNORECASE`) so that capitalized variants are also caught.

In [54]:
pattern = r'\b(\d+)\s*(serving|servings|serve|serves)\b'

In [55]:
recipe_df_extract_serving = recipe_df_filtered['directions'].str.extract(pattern, flags=re.IGNORECASE, expand=False)  # `re` must be imported for the IGNORECASE flag
recipe_df_extract_serving

Unnamed: 0,0,1
0,6,servings
1,6,servings
2,8,servings
3,8,servings
4,,
...,...,...
105538,,
105539,9,servings
105540,,
105541,8,servings
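
To make the behaviour of this number-first pattern easier to see, here is a minimal sketch on a few direction strings. The sample strings are invented for illustration and are not rows from the dataset. Because `str.extract` keeps only the first match in each string, a range such as '6 to 8 servings' yields 8: the '6' is not directly followed by a serving term.

```python
import re
import pandas as pd

# Hypothetical sample directions, invented for illustration only.
sample = pd.Series([
    "Stir well before serving. Yields 6 servings.",
    "Yields 6 to 8 servings.",
    "Serves 4",
])

# Same number-first pattern as in the cell above.
number_first = r'\b(\d+)\s*(serving|servings|serve|serves)\b'

# Two capture groups, so the result is a DataFrame with columns 0 and 1.
extracted = sample.str.extract(number_first, flags=re.IGNORECASE)
print(extracted)
```

The third string has the term before the number, so this pattern leaves it empty.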


The pattern below captures phrases where the term comes first, such as 'serves 4' or 'serve 1'. Matching is again case-insensitive.

In [57]:
pattern = r'\b(serves|serve|serving|servings)\s*(\d+)\b'

In [58]:
recipe_df_extract_serve = recipe_df_filtered['directions'].str.extract(pattern, flags=re.IGNORECASE, expand=True)
recipe_df_extract_serve

Unnamed: 0,0,1
0,,
1,,
2,,
3,,
4,Serves,2
...,...,...
105538,Serves,8
105539,,
105540,Serves,4
105541,,


Now that I have tested both patterns separately, I combine them into a single pattern to extract serving sizes and assign the results to new columns in the dataframe.

In [60]:
pattern = r'\b(\d+)\s*(serving|servings|serve|serves)\b|\b(serves|serve|serving|servings)\s*(\d+)\b'

In [61]:
recipe_df_extract = recipe_df_filtered['directions'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
recipe_df_extract

Unnamed: 0,0,1,2,3
0,6,servings,,
1,6,servings,,
2,8,servings,,
3,8,servings,,
4,,,Serves,2
...,...,...,...,...
105538,,,Serves,8
105539,9,servings,,
105540,,,Serves,4
105541,8,servings,,


In [62]:
recipe_df_filtered.loc[:, 'serving_size'] = recipe_df_extract[0].combine_first(recipe_df_extract[3])
recipe_df_filtered.loc[:, 'serving_term'] = recipe_df_extract[1].combine_first(recipe_df_extract[2])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipe_df_filtered.loc[:, 'serving_size'] = recipe_df_extract[0].combine_first(recipe_df_extract[3])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipe_df_filtered.loc[:, 'serving_term'] = recipe_df_extract[1].combine_first(recipe_df_extract[2])


In [63]:
recipe_df_filtered.head(1)

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,serving_term
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]",6,servings


In [64]:
recipe_df_filtered.shape

(105543, 8)

The regex patterns successfully extracted the serving sizes.
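
The combination step can be sketched on a toy dataframe as follows. The sample rows and the names `source_df` and `subset` are hypothetical. The message printed after cell 62 is pandas' `SettingWithCopyWarning`, which typically appears when a dataframe was created by slicing or filtering another one. How `recipe_df_filtered` was built is not shown in this section, so the `.copy()` call below is only one possible way to avoid the warning, under that assumption.

```python
import re
import pandas as pd

# Hypothetical stand-in for the recipe data (values invented for illustration).
source_df = pd.DataFrame({
    "directions": [
        "Yields 6 servings.",
        "Cut into squares. Serves 8.",
        "Serve hot.",
    ]
})

# An explicit copy of a filtered subset makes later column assignments
# unambiguous, which is a common way to avoid SettingWithCopyWarning.
subset = source_df[source_df["directions"].str.len() > 0].copy()

# Number-first alternative | term-first alternative (four groups in total).
combined = (
    r'\b(\d+)\s*(serving|servings|serve|serves)\b'
    r'|\b(serves|serve|serving|servings)\s*(\d+)\b'
)
groups = subset["directions"].str.extract(combined, flags=re.IGNORECASE)

# Only one side of the alternation can match per row, so combine_first
# fills the missing value of one group with its counterpart.
subset["serving_size"] = groups[0].combine_first(groups[3])
subset["serving_term"] = groups[1].combine_first(groups[2])
print(subset)
```

Rows where neither alternative matches (here 'Serve hot.') keep missing values in both new columns.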

##### 4.1.1.1 Measure Conversion: dozen to single units <a name="dozen"></a>

Next, I handle the recipes in the `recipe_df_filtered` dataframe whose serving size is expressed in dozens, so that every serving size uses the same unit of measure.

In [68]:
pattern = r'\b(serve|serves)\s*(\d+)\s*(dozen|dozens)\b'

In [69]:
recipe_df_extract_dozen = recipe_df_filtered['directions'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
recipe_df_extract_dozen

Unnamed: 0,0,1,2
0,,,
1,,,
2,,,
3,,,
4,Serves,2,dozen
...,...,...,...
105538,,,
105539,,,
105540,,,
105541,,,


In [70]:
recipe_df_filtered.loc[:, 'dozen'] = recipe_df_extract_dozen[2]
recipe_df_filtered

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipe_df_filtered.loc[:, 'dozen'] = recipe_df_extract_dozen[2]


Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,serving_term,dozen
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]",6,servings,
1,Broccoli Salad,"[""1 large head broccoli (about 1 1/2 lb.)"", ""10 slices bacon, cooked and crumbled"", ""5 green onions, sliced or 1/4 c. chopped red onion"", ""1/2 c. raisins"", ""1 c. mayonnaise"", ""2 Tbsp. vinegar"", ""1/4 c. sugar""]","[""Trim off large leaves of broccoli and remove the tough ends of lower stalks. Wash the broccoli thoroughly. Cut the florets and stems into bite-size pieces. Place in a large bowl. Add bacon, onions and raisins. Combine remaining ingredients, stirring well. Add dressing to broccoli mixture and toss gently. Cover and refrigerate 2 to 3 hours. Makes about 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=50992,Gathered,"[""broccoli"", ""bacon"", ""green onions"", ""raisins"", ""mayonnaise"", ""vinegar"", ""sugar""]",6,servings,
2,Prize-Winning Meat Loaf,"[""1 1/2 lb. ground beef"", ""1 c. tomato juice"", ""3/4 c. oats (uncooked)"", ""1 egg, beaten"", ""1/4 c. chopped onion"", ""1/4 tsp. pepper"", ""1 1/2 tsp. salt""]","[""Mix well."", ""Press firmly into an 8 1/2 x 4 1/2 x 2 1/2-inch loaf pan."", ""Bake in preheated moderate oven."", ""Bake at 350\u00b0 for 1 hour."", ""Let stand 5 minutes before slicing."", ""Makes 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=923674,Gathered,"[""ground beef"", ""tomato juice"", ""oats"", ""egg"", ""onion"", ""pepper"", ""salt""]",8,servings,
3,Corral Barbecued Beef Steak Strips,"[""2 lb. round steak 1/2 to 3/4-inch thick, sliced in strips 1/8-inch thick (or thinner) and 3 1/2 to 4-inches long (easily sliced if partially frozen)"", ""2 Tbsp. cooking oil"", ""1 can (15 oz.) tomato sauce"", ""1/3 c. water"", ""2 Tbsp. brown sugar"", ""1 Tbsp. prepared mustard"", ""1 tbsp. Worcestershire sauce"", ""1 medium sized onion, thinly sliced""]","[""Brown strips in cooking oil."", ""Pour off drippings."", ""Combine tomato sauce, water, brown sugar, mustard and Worcestershire sauce."", ""Add sauce and onion to meat slices."", ""Cover and cook slowly, stirring occasionally 30 minutes or until meat is tender. Serve over rice or buttered noodles."", ""Yields 6 to 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=420402,Gathered,"[""long"", ""cooking oil"", ""tomato sauce"", ""water"", ""brown sugar"", ""mustard"", ""Worcestershire sauce"", ""onion""]",8,servings,
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]",2,Serves,dozen
...,...,...,...,...,...,...,...,...,...
105538,Chicken Stuffing Mix Recipe,"[""1 (8 ounce.) stuffing mix & 4 slices bread"", ""1/2 c. butter, melted"", ""1 c. chicken broth"", ""2 1/2 c. chicken, diced"", ""1 c. onion, minced"", ""1/2 c. celery, minced"", ""1/2 c. salad dressing"", ""3/4 teaspoon salt"", ""2 Large eggs"", ""1 1/2 c. lowfat milk"", ""1 can cream of mushroom soup"", ""1 c. Cheddar cheese, shredded""]","[""Mix first 3 ingredients."", ""Put in 9 x 13 inch pan."", ""Combine chicken, onion, celery, salad dressing and salt."", ""Spread on top of dressing."", ""Save a little for top."", ""Mix Large eggs and lowfat milk, pour over."", ""Cover with foil and chill at least 6 hrs."", ""Before baking, take the cream of mushroom soup and spread over stuffing."", ""Bake at 325 degrees for 40 min."", ""Sprinkle with 1 c. Cheddar cheese."", ""Bake 10 min more."", ""Cut into squares."", ""Serves 8.""]",cookeatshare.com/recipes/chicken-stuffing-mix-38487,Recipes1M,"[""stuffing mix"", ""butter"", ""chicken broth"", ""chicken"", ""onion"", ""celery"", ""salad dressing"", ""salt"", ""eggs"", ""milk"", ""cream of mushroom soup"", ""Cheddar cheese""]",8,Serves,
105539,Blackberry Upside Down Cake Recipe,"[""1/2 stk margarine or possibly butter"", ""1/4 c. Sugar"", ""1 1/2 c. Blackberries"", ""2 Tbsp. Sliced almonds"", ""1 1/2 c. Bisquick original baking mix"", ""1/2 c. Sugar"", ""1/2 c. Lowfat milk or possibly water"", ""2 Tbsp. Vegetable oil"", ""1/2 tsp Vanilla"", ""1/2 tsp Almond extract"", ""1 x Egg Sweetened whipped cream or possibly ice cream, if you like""]","[""HEAT oven to 350 degrees."", ""Heat margarine in round pan, 9x1-1/2 inches, or possibly square pan, 8x8x2 inches, in oven till melted."", ""Sprinkle 1/4 c. sugar proportionately over melted margarine."", ""Arrange Blackberries with open ends up over sugar mix; sprinkle with almonds."", ""BEAT remaining ingredients except whipped cream in medium bowl on low speed 30 seconds, scraping bowl constantly."", ""Beat on medium speed 4 min, scraping bowl occasionally."", ""Pour batter over Blackberries."", ""BAKE 35 to 40 min or possibly till toothpick inserted in center comes out clean."", ""Immediately invert pan onto heatproof serving plate; leave pan over cake a few min."", ""Remove pan."", ""Let cake stand at least 10 min before serving."", ""Serve hot with whipped cream."", ""9 servings."", ""Pear Upside-down Cake: Substitute packed brown sugar for the sugar and 1 large pear, thinly sliced, for the Blackberries."", ""Substitute minced pecans for the almonds."", ""Increase vanilla to 1 tsp."", ""; omit almond extract."", ""Add in 1/2 tsp."", ""grnd mace or possibly cinnamon with the vanilla.""]",cookeatshare.com/recipes/blackberry-upside-down-cake-86757,Recipes1M,"[""margarine"", ""Sugar"", ""Blackberries"", ""almonds"", ""Bisquick original baking mix"", ""Sugar"", ""milk"", ""Vegetable oil"", ""Vanilla"", ""Egg""]",9,servings,
105540,Broken Wheat Pudding ( Lapsi Kheer ) Recipe,"[""150 gm broken wheat"", ""300 ml water"", ""4 Tbsp. jaggery grated"", ""300 ml coconut lowfat milk"", ""1 tsp cardamom pwdr""]","[""(To make coconut lowfat milk finely grate a fresh coconut add in 150ml of warm water and squeeze to extract the thick lowfat milk."", ""You should get roughly 300ml of coconut lowfat milk from a coconut.)"", ""Cook the broken wheat in the water till soft."", ""Stir in the jaggery and cook till blended."", ""Add in the coconut lowfat milk and cardamom pwdr."", ""Bring to the boil once and remove from the heat."", ""Serve hot."", ""This is a pudding from south India and is made on festive occasions."", ""Serves 4""]",cookeatshare.com/recipes/broken-wheat-pudding-lapsi-kheer-93485,Recipes1M,"[""broken wheat"", ""water"", ""jaggery grated"", ""coconut lowfat milk"", ""pwdr""]",4,Serves,
105541,Boysenberry Tiramisu Recipe,"[""1 pkt Frzn red Boysenberries in, light syrup (10 ounce)"", ""2 sqr semisweet chocolate, (1 ounce)"", ""1 ct whipped cream, cheese (8 ounce)"", ""3 Tbsp. Coffee-flavor liqueur"", ""1 Tbsp. Lowfat milk"", ""1 tsp Vanilla extract"", ""1 1/2 c. Heavy or possibly whipping cream"", ""2/3 c. Vanilla wafers, coarsely Crumble (about 40 cookies) Fresh Boysenberries, garnish""]","[""About 3 hrs before serving or possibly early in day: Thaw frzn Boysenberries as label directs."", ""Meanwhile, grate semi-sweet chocolate."", ""Reserve 1/4 c. grated chocolate for garnish."", ""In large bowl, with wire whisk or possibly fork, beat cream cheese, coffee flavor liqueur, lowfat milk, vanilla extract, and remaining grated chocolate till well blended."", ""In small bowl, with mixer at medium speed, beat heavy or possibly whipping cream and confectioners' sugar till stiff peaks forms."", ""Reserve 2 c. mix for topping."", ""With rubber spatula or possibly wire whisk, fold remaining 1 c. whipped cream mix."", ""Into 8 dessert glasses, place half of crumbled vanilla wafers; top with half of cream mix."", ""Spoon half of thawed Boysenberries with their syrup over cheese mix; top with remaining vanilla wafers, remaining thawed raspberies, then with remaining cheese mix."", ""Spoon reserved whipped cream mix into decorating bag with small rosette tube."", ""Pipe whipped cream around edge of each dessert glass."", ""Sprinkle reserved grated chocolate in center of each dessert."", ""Garnish with fresh raspberies."", ""Chill at least 2 hrs to blend flavor."", ""Makes 8 servings.""]",cookeatshare.com/recipes/boysenberry-tiramisu-90221,Recipes1M,"[""red Boysenberries"", ""chocolate"", ""whipped cream"", ""Coffee-flavor"", ""milk"", ""Vanilla"", ""whipping cream"", ""Vanilla wafers""]",8,servings,


Here I check how many recipes actually have 'dozen' as their serving size measure. Ultimately, I want to convert the serving size of these recipes to single units so that it matches the measure used by the rest of the recipes.

In [72]:
recipe_df_filtered[recipe_df_filtered['dozen'] == 'dozen']

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,serving_term,dozen
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]",2,Serves,dozen
4770,Simplified Oat Bran Muffins,"[""2 1/2 c. oat bran"", ""1 Tbsp. baking powder"", ""1/4 c. sugar or maple syrup"", ""2 Tbsp. almonds"", ""1/2 c. raisins or blueberries"", ""1/4 c. coconut"", ""1 1/4 c. nonfat milk"", ""2 egg whites"", ""2 large overripe bananas or 1 c. pineapple""]","[""Mix dry ingredients with wet."", ""Bake at 450\u00b0 for 15 minutes. Serves 1 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=741372,Gathered,"[""bran"", ""baking powder"", ""sugar"", ""almonds"", ""raisins"", ""coconut"", ""nonfat milk"", ""egg whites"", ""overripe bananas""]",1,Serves,dozen
20235,Refrigerator Cookies,"[""1/2 c. shortening (part butter)"", ""1 c. brown sugar"", ""1 egg"", ""1/2 tsp. vanilla"", ""1 3/4 c. flour"", ""1/2 tsp. soda"", ""1/4 tsp. salt""]","[""Mix shortening, sugar, eggs and vanilla."", ""Blend in flour, soda and salt."", ""Add nuts, if desired."", ""Mix well."", ""Form into rolls and refrigerate for 3 to 6 hours."", ""Slice and bake at 400\u00b0 for 8 to 10 minutes."", ""Serves 4 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=716034,Gathered,"[""shortening"", ""brown sugar"", ""egg"", ""vanilla"", ""flour"", ""soda"", ""salt""]",4,Serves,dozen
21941,Snickerdoodles,"[""1 c. soft shortening"", ""1 1/2 c. sugar"", ""2 eggs"", ""2 3/4 c. flour"", ""2 tsp. cream of tartar"", ""1 tsp. soda"", ""1/2 tsp. salt"", ""2 Tbsp. cinnamon""]","[""Cream shortening, sugar and eggs."", ""Sift together flour, cream of tartar, soda and salt."", ""Stir into creamed mixture."", ""Roll into balls the size of small walnuts."", ""Roll in mixture of 2 tablespoons sugar and 2 tablespoons cinnamon."", ""Place about 2 inches apart on ungreased cookie sheet."", ""Bake 8 to 10 minutes at 325\u00b0 until lightly browned but soft."", ""These cookies puff up at first, then flatten out with crinkled tops."", ""Serves 5 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=463255,Gathered,"[""shortening"", ""sugar"", ""eggs"", ""flour"", ""cream of tartar"", ""soda"", ""salt"", ""cinnamon""]",5,Serves,dozen
23048,Oatmeal Raisin Chocolate Chip Cookies,"[""3/4 c. oil"", ""1 c. granulated sugar"", ""1 c. brown sugar"", ""2 eggs"", ""1 tsp. vanilla"", ""2 1/3 c. flour"", ""1 tsp. baking soda"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""2 c. oats (instant or old fashioned)"", ""1/2 c. chocolate chips"", ""1/2 c. raisins""]","[""Cream oil and sugars."", ""Add eggs and vanilla."", ""Beat in flour, baking soda, baking powder and salt."", ""Mix in oats, chocolate chips and raisins."", ""Drop by teaspoons onto a greased cookie sheet."", ""Bake at 350\u00b0 for 8 to 10 minutes."", ""Serves 5 dozen."", ""Kids love em!""]",www.cookbooks.com/Recipe-Details.aspx?id=770459,Gathered,"[""oil"", ""sugar"", ""brown sugar"", ""eggs"", ""vanilla"", ""flour"", ""baking soda"", ""baking powder"", ""salt"", ""oats"", ""chocolate chips"", ""raisins""]",5,Serves,dozen
34627,Yeast Roll Recipe,"[""2 c. warm water"", ""2 pkg. yeast"", ""3/4 c. sugar"", ""1 tsp. salt"", ""3/4 c. liquid shortening"", ""1 egg, slightly beaten"", ""6 c. flour, sifted""]","[""Mix water and yeast."", ""Add sugar, salt, shortening and egg. Add flour; let rise until double and stir down."", ""Make into rolls. Let rise until double."", ""Bake at 375\u00b0 for 10 to 12 minutes."", ""Serves 2 dozen rolls.""]",www.cookbooks.com/Recipe-Details.aspx?id=496935,Gathered,"[""water"", ""yeast"", ""sugar"", ""salt"", ""liquid shortening"", ""egg"", ""flour""]",2,Serves,dozen
35293,Chocolate Oatmeal Cookies,"[""1 c. sugar"", ""1/2 c. butter"", ""1/2 c. milk"", ""4 tsp. cocoa"", ""1 c. peanut butter"", ""3 c. oatmeal"", ""2 tsp. vanilla""]","[""Mix sugar, butter, milk and cocoa into a pot."", ""Bring to a boil for one minute."", ""Add peanut butter; stir until melted."", ""Add oats and vanilla."", ""Mix well."", ""Spoon droplets onto wax paper."", ""Let harden and serve."", ""Serves 3 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=745276,Gathered,"[""sugar"", ""butter"", ""milk"", ""cocoa"", ""peanut butter"", ""oatmeal"", ""vanilla""]",3,Serves,dozen
40413,Blonde Brownies,"[""2/3 c. shortening"", ""1 lb. brown sugar (2 1/4 c.)"", ""3 eggs"", ""2 3/4 c. sifted flour"", ""2 1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1 c. nutmeats"", ""1 pkg. chocolate chips""]","[""Melt the shortening."", ""Add brown sugar; cool slightly."", ""Add the eggs, one at a time, beating well after each addition."", ""Add flour, baking powder and salt, then add nutmeats and chocolate chips. Bake 20 minutes at 350\u00b0."", ""Serves 3 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=362310,Gathered,"[""shortening"", ""brown sugar"", ""eggs"", ""flour"", ""baking powder"", ""salt"", ""nutmeats"", ""chocolate chips""]",3,Serves,dozen
42829,Angel Biscuits,"[""1 pkg. yeast"", ""1/8 c. warm water"", ""5 c. self-rising flour"", ""1/4 c. sugar"", ""1 c. shortening"", ""2 c. buttermilk""]","[""Dissolve yeast in water."", ""Sift flour and sugar together."", ""Cut in shortening."", ""Stir in yeast mixture and buttermilk."", ""Roll out on floured board and cut into biscuits."", ""Brush with melted butter or cooking oil, if desired."", ""Dough will keep in refrigerator for few days."", ""Serves 4 dozen biscuits.""]",www.cookbooks.com/Recipe-Details.aspx?id=553630,Gathered,"[""yeast"", ""water"", ""flour"", ""sugar"", ""shortening"", ""buttermilk""]",4,Serves,dozen
44790,Sugar Cookies,"[""3/4 c. shortening (part butter)"", ""1 c. sugar"", ""2 eggs"", ""1 tsp. vanilla"", ""2 1/2 c. flour"", ""1 tsp. baking powder"", ""1 tsp. salt""]","[""Preheat oven to 400\u00b0."", ""Mix thoroughly the softened shortening, sugar, eggs and vanilla. Blend in flour, baking powder and salt. Cover and chill 1 hour."", ""Roll dough 1/4-inch thick and cut."", ""Bake on ungreased cookie sheet for 6 to 8 minutes at 400\u00b0."", ""Serves 3 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=646737,Gathered,"[""shortening"", ""sugar"", ""eggs"", ""vanilla"", ""flour"", ""baking powder"", ""salt""]",3,Serves,dozen


Before converting the recipes with a 'dozen' unit of measure, I want to make sure that the `serving_size` column has an integer data type so that I can perform calculations with it.

In [74]:
recipe_df_filtered.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105543 entries, 0 to 105542
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   title         105543 non-null  object
 1   ingredients   105543 non-null  object
 2   directions    105543 non-null  object
 3   link          105543 non-null  object
 4   source        105543 non-null  object
 5   NER           105543 non-null  object
 6   serving_size  105543 non-null  object
 7   serving_term  105543 non-null  object
 8   dozen         19 non-null      object
dtypes: object(9)
memory usage: 7.2+ MB


In [75]:
recipe_df_filtered['serving_size'] = pd.to_numeric(recipe_df_filtered['serving_size'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recipe_df_filtered['serving_size'] = pd.to_numeric(recipe_df_filtered['serving_size'])


In [76]:
recipe_df_filtered.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105543 entries, 0 to 105542
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   title         105543 non-null  object
 1   ingredients   105543 non-null  object
 2   directions    105543 non-null  object
 3   link          105543 non-null  object
 4   source        105543 non-null  object
 5   NER           105543 non-null  object
 6   serving_size  105543 non-null  int64 
 7   serving_term  105543 non-null  object
 8   dozen         19 non-null      object
dtypes: int64(1), object(8)
memory usage: 7.2+ MB
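
As a small aside on the conversion step, the sketch below shows how `pd.to_numeric` behaves on string values like the ones the regex captures. The series are invented for illustration. The second half uses `errors="coerce"`, which is not used in this notebook; it is shown only as an option for data where some values are missing or not numeric.

```python
import pandas as pd

# Hypothetical extracted values; dtype=object mirrors the `object` dtype
# that info() reports for serving_size before the conversion.
sizes = pd.Series(["6", "8", "2"], dtype=object)
as_int = pd.to_numeric(sizes)
print(as_int.dtype)

# With missing or non-numeric entries, errors="coerce" turns them into NaN,
# and the result is a float column rather than an integer one.
with_gaps = pd.Series(["6", None, "n/a"], dtype=object)
coerced = pd.to_numeric(with_gaps, errors="coerce")
print(coerced)
```

In the notebook every row has a non-null `serving_size`, which is consistent with the column converting cleanly to an integer type.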


I have converted the data type of the `serving_size` column to integer, so I can now update it by multiplying the serving size by 12 for recipes that use 'dozen' as their unit of measure.

In [78]:
recipe_df_filtered.loc[recipe_df_filtered['dozen'] == 'dozen', 'serving_size'] *= 12

In [79]:
recipe_df_filtered[recipe_df_filtered['dozen'] == 'dozen']

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,serving_term,dozen
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]",24,Serves,dozen
4770,Simplified Oat Bran Muffins,"[""2 1/2 c. oat bran"", ""1 Tbsp. baking powder"", ""1/4 c. sugar or maple syrup"", ""2 Tbsp. almonds"", ""1/2 c. raisins or blueberries"", ""1/4 c. coconut"", ""1 1/4 c. nonfat milk"", ""2 egg whites"", ""2 large overripe bananas or 1 c. pineapple""]","[""Mix dry ingredients with wet."", ""Bake at 450\u00b0 for 15 minutes. Serves 1 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=741372,Gathered,"[""bran"", ""baking powder"", ""sugar"", ""almonds"", ""raisins"", ""coconut"", ""nonfat milk"", ""egg whites"", ""overripe bananas""]",12,Serves,dozen
20235,Refrigerator Cookies,"[""1/2 c. shortening (part butter)"", ""1 c. brown sugar"", ""1 egg"", ""1/2 tsp. vanilla"", ""1 3/4 c. flour"", ""1/2 tsp. soda"", ""1/4 tsp. salt""]","[""Mix shortening, sugar, eggs and vanilla."", ""Blend in flour, soda and salt."", ""Add nuts, if desired."", ""Mix well."", ""Form into rolls and refrigerate for 3 to 6 hours."", ""Slice and bake at 400\u00b0 for 8 to 10 minutes."", ""Serves 4 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=716034,Gathered,"[""shortening"", ""brown sugar"", ""egg"", ""vanilla"", ""flour"", ""soda"", ""salt""]",48,Serves,dozen
21941,Snickerdoodles,"[""1 c. soft shortening"", ""1 1/2 c. sugar"", ""2 eggs"", ""2 3/4 c. flour"", ""2 tsp. cream of tartar"", ""1 tsp. soda"", ""1/2 tsp. salt"", ""2 Tbsp. cinnamon""]","[""Cream shortening, sugar and eggs."", ""Sift together flour, cream of tartar, soda and salt."", ""Stir into creamed mixture."", ""Roll into balls the size of small walnuts."", ""Roll in mixture of 2 tablespoons sugar and 2 tablespoons cinnamon."", ""Place about 2 inches apart on ungreased cookie sheet."", ""Bake 8 to 10 minutes at 325\u00b0 until lightly browned but soft."", ""These cookies puff up at first, then flatten out with crinkled tops."", ""Serves 5 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=463255,Gathered,"[""shortening"", ""sugar"", ""eggs"", ""flour"", ""cream of tartar"", ""soda"", ""salt"", ""cinnamon""]",60,Serves,dozen
23048,Oatmeal Raisin Chocolate Chip Cookies,"[""3/4 c. oil"", ""1 c. granulated sugar"", ""1 c. brown sugar"", ""2 eggs"", ""1 tsp. vanilla"", ""2 1/3 c. flour"", ""1 tsp. baking soda"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""2 c. oats (instant or old fashioned)"", ""1/2 c. chocolate chips"", ""1/2 c. raisins""]","[""Cream oil and sugars."", ""Add eggs and vanilla."", ""Beat in flour, baking soda, baking powder and salt."", ""Mix in oats, chocolate chips and raisins."", ""Drop by teaspoons onto a greased cookie sheet."", ""Bake at 350\u00b0 for 8 to 10 minutes."", ""Serves 5 dozen."", ""Kids love em!""]",www.cookbooks.com/Recipe-Details.aspx?id=770459,Gathered,"[""oil"", ""sugar"", ""brown sugar"", ""eggs"", ""vanilla"", ""flour"", ""baking soda"", ""baking powder"", ""salt"", ""oats"", ""chocolate chips"", ""raisins""]",60,Serves,dozen
34627,Yeast Roll Recipe,"[""2 c. warm water"", ""2 pkg. yeast"", ""3/4 c. sugar"", ""1 tsp. salt"", ""3/4 c. liquid shortening"", ""1 egg, slightly beaten"", ""6 c. flour, sifted""]","[""Mix water and yeast."", ""Add sugar, salt, shortening and egg. Add flour; let rise until double and stir down."", ""Make into rolls. Let rise until double."", ""Bake at 375\u00b0 for 10 to 12 minutes."", ""Serves 2 dozen rolls.""]",www.cookbooks.com/Recipe-Details.aspx?id=496935,Gathered,"[""water"", ""yeast"", ""sugar"", ""salt"", ""liquid shortening"", ""egg"", ""flour""]",24,Serves,dozen
35293,Chocolate Oatmeal Cookies,"[""1 c. sugar"", ""1/2 c. butter"", ""1/2 c. milk"", ""4 tsp. cocoa"", ""1 c. peanut butter"", ""3 c. oatmeal"", ""2 tsp. vanilla""]","[""Mix sugar, butter, milk and cocoa into a pot."", ""Bring to a boil for one minute."", ""Add peanut butter; stir until melted."", ""Add oats and vanilla."", ""Mix well."", ""Spoon droplets onto wax paper."", ""Let harden and serve."", ""Serves 3 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=745276,Gathered,"[""sugar"", ""butter"", ""milk"", ""cocoa"", ""peanut butter"", ""oatmeal"", ""vanilla""]",36,Serves,dozen
40413,Blonde Brownies,"[""2/3 c. shortening"", ""1 lb. brown sugar (2 1/4 c.)"", ""3 eggs"", ""2 3/4 c. sifted flour"", ""2 1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1 c. nutmeats"", ""1 pkg. chocolate chips""]","[""Melt the shortening."", ""Add brown sugar; cool slightly."", ""Add the eggs, one at a time, beating well after each addition."", ""Add flour, baking powder and salt, then add nutmeats and chocolate chips. Bake 20 minutes at 350\u00b0."", ""Serves 3 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=362310,Gathered,"[""shortening"", ""brown sugar"", ""eggs"", ""flour"", ""baking powder"", ""salt"", ""nutmeats"", ""chocolate chips""]",36,Serves,dozen
42829,Angel Biscuits,"[""1 pkg. yeast"", ""1/8 c. warm water"", ""5 c. self-rising flour"", ""1/4 c. sugar"", ""1 c. shortening"", ""2 c. buttermilk""]","[""Dissolve yeast in water."", ""Sift flour and sugar together."", ""Cut in shortening."", ""Stir in yeast mixture and buttermilk."", ""Roll out on floured board and cut into biscuits."", ""Brush with melted butter or cooking oil, if desired."", ""Dough will keep in refrigerator for few days."", ""Serves 4 dozen biscuits.""]",www.cookbooks.com/Recipe-Details.aspx?id=553630,Gathered,"[""yeast"", ""water"", ""flour"", ""sugar"", ""shortening"", ""buttermilk""]",48,Serves,dozen
44790,Sugar Cookies,"[""3/4 c. shortening (part butter)"", ""1 c. sugar"", ""2 eggs"", ""1 tsp. vanilla"", ""2 1/2 c. flour"", ""1 tsp. baking powder"", ""1 tsp. salt""]","[""Preheat oven to 400\u00b0."", ""Mix thoroughly the softened shortening, sugar, eggs and vanilla. Blend in flour, baking powder and salt. Cover and chill 1 hour."", ""Roll dough 1/4-inch thick and cut."", ""Bake on ungreased cookie sheet for 6 to 8 minutes at 400\u00b0."", ""Serves 3 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=646737,Gathered,"[""shortening"", ""sugar"", ""eggs"", ""vanilla"", ""flour"", ""baking powder"", ""salt""]",36,Serves,dozen


Here we can see that `serving_size` was successfully updated for the recipes whose unit of measure was a dozen.

Now that `serving_size` is expressed in the same unit of measure for every recipe, the `serving_term` and `dozen` columns are no longer needed, so I will drop them.

In [82]:
recipe_df_filtered = recipe_df_filtered.drop(columns=['serving_term', 'dozen'])

In [83]:
recipe_df_filtered

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]",6
1,Broccoli Salad,"[""1 large head broccoli (about 1 1/2 lb.)"", ""10 slices bacon, cooked and crumbled"", ""5 green onions, sliced or 1/4 c. chopped red onion"", ""1/2 c. raisins"", ""1 c. mayonnaise"", ""2 Tbsp. vinegar"", ""1/4 c. sugar""]","[""Trim off large leaves of broccoli and remove the tough ends of lower stalks. Wash the broccoli thoroughly. Cut the florets and stems into bite-size pieces. Place in a large bowl. Add bacon, onions and raisins. Combine remaining ingredients, stirring well. Add dressing to broccoli mixture and toss gently. Cover and refrigerate 2 to 3 hours. Makes about 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=50992,Gathered,"[""broccoli"", ""bacon"", ""green onions"", ""raisins"", ""mayonnaise"", ""vinegar"", ""sugar""]",6
2,Prize-Winning Meat Loaf,"[""1 1/2 lb. ground beef"", ""1 c. tomato juice"", ""3/4 c. oats (uncooked)"", ""1 egg, beaten"", ""1/4 c. chopped onion"", ""1/4 tsp. pepper"", ""1 1/2 tsp. salt""]","[""Mix well."", ""Press firmly into an 8 1/2 x 4 1/2 x 2 1/2-inch loaf pan."", ""Bake in preheated moderate oven."", ""Bake at 350\u00b0 for 1 hour."", ""Let stand 5 minutes before slicing."", ""Makes 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=923674,Gathered,"[""ground beef"", ""tomato juice"", ""oats"", ""egg"", ""onion"", ""pepper"", ""salt""]",8
3,Corral Barbecued Beef Steak Strips,"[""2 lb. round steak 1/2 to 3/4-inch thick, sliced in strips 1/8-inch thick (or thinner) and 3 1/2 to 4-inches long (easily sliced if partially frozen)"", ""2 Tbsp. cooking oil"", ""1 can (15 oz.) tomato sauce"", ""1/3 c. water"", ""2 Tbsp. brown sugar"", ""1 Tbsp. prepared mustard"", ""1 tbsp. Worcestershire sauce"", ""1 medium sized onion, thinly sliced""]","[""Brown strips in cooking oil."", ""Pour off drippings."", ""Combine tomato sauce, water, brown sugar, mustard and Worcestershire sauce."", ""Add sauce and onion to meat slices."", ""Cover and cook slowly, stirring occasionally 30 minutes or until meat is tender. Serve over rice or buttered noodles."", ""Yields 6 to 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=420402,Gathered,"[""long"", ""cooking oil"", ""tomato sauce"", ""water"", ""brown sugar"", ""mustard"", ""Worcestershire sauce"", ""onion""]",8
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]",24
...,...,...,...,...,...,...,...
105538,Chicken Stuffing Mix Recipe,"[""1 (8 ounce.) stuffing mix & 4 slices bread"", ""1/2 c. butter, melted"", ""1 c. chicken broth"", ""2 1/2 c. chicken, diced"", ""1 c. onion, minced"", ""1/2 c. celery, minced"", ""1/2 c. salad dressing"", ""3/4 teaspoon salt"", ""2 Large eggs"", ""1 1/2 c. lowfat milk"", ""1 can cream of mushroom soup"", ""1 c. Cheddar cheese, shredded""]","[""Mix first 3 ingredients."", ""Put in 9 x 13 inch pan."", ""Combine chicken, onion, celery, salad dressing and salt."", ""Spread on top of dressing."", ""Save a little for top."", ""Mix Large eggs and lowfat milk, pour over."", ""Cover with foil and chill at least 6 hrs."", ""Before baking, take the cream of mushroom soup and spread over stuffing."", ""Bake at 325 degrees for 40 min."", ""Sprinkle with 1 c. Cheddar cheese."", ""Bake 10 min more."", ""Cut into squares."", ""Serves 8.""]",cookeatshare.com/recipes/chicken-stuffing-mix-38487,Recipes1M,"[""stuffing mix"", ""butter"", ""chicken broth"", ""chicken"", ""onion"", ""celery"", ""salad dressing"", ""salt"", ""eggs"", ""milk"", ""cream of mushroom soup"", ""Cheddar cheese""]",8
105539,Blackberry Upside Down Cake Recipe,"[""1/2 stk margarine or possibly butter"", ""1/4 c. Sugar"", ""1 1/2 c. Blackberries"", ""2 Tbsp. Sliced almonds"", ""1 1/2 c. Bisquick original baking mix"", ""1/2 c. Sugar"", ""1/2 c. Lowfat milk or possibly water"", ""2 Tbsp. Vegetable oil"", ""1/2 tsp Vanilla"", ""1/2 tsp Almond extract"", ""1 x Egg Sweetened whipped cream or possibly ice cream, if you like""]","[""HEAT oven to 350 degrees."", ""Heat margarine in round pan, 9x1-1/2 inches, or possibly square pan, 8x8x2 inches, in oven till melted."", ""Sprinkle 1/4 c. sugar proportionately over melted margarine."", ""Arrange Blackberries with open ends up over sugar mix; sprinkle with almonds."", ""BEAT remaining ingredients except whipped cream in medium bowl on low speed 30 seconds, scraping bowl constantly."", ""Beat on medium speed 4 min, scraping bowl occasionally."", ""Pour batter over Blackberries."", ""BAKE 35 to 40 min or possibly till toothpick inserted in center comes out clean."", ""Immediately invert pan onto heatproof serving plate; leave pan over cake a few min."", ""Remove pan."", ""Let cake stand at least 10 min before serving."", ""Serve hot with whipped cream."", ""9 servings."", ""Pear Upside-down Cake: Substitute packed brown sugar for the sugar and 1 large pear, thinly sliced, for the Blackberries."", ""Substitute minced pecans for the almonds."", ""Increase vanilla to 1 tsp."", ""; omit almond extract."", ""Add in 1/2 tsp."", ""grnd mace or possibly cinnamon with the vanilla.""]",cookeatshare.com/recipes/blackberry-upside-down-cake-86757,Recipes1M,"[""margarine"", ""Sugar"", ""Blackberries"", ""almonds"", ""Bisquick original baking mix"", ""Sugar"", ""milk"", ""Vegetable oil"", ""Vanilla"", ""Egg""]",9
105540,Broken Wheat Pudding ( Lapsi Kheer ) Recipe,"[""150 gm broken wheat"", ""300 ml water"", ""4 Tbsp. jaggery grated"", ""300 ml coconut lowfat milk"", ""1 tsp cardamom pwdr""]","[""(To make coconut lowfat milk finely grate a fresh coconut add in 150ml of warm water and squeeze to extract the thick lowfat milk."", ""You should get roughly 300ml of coconut lowfat milk from a coconut.)"", ""Cook the broken wheat in the water till soft."", ""Stir in the jaggery and cook till blended."", ""Add in the coconut lowfat milk and cardamom pwdr."", ""Bring to the boil once and remove from the heat."", ""Serve hot."", ""This is a pudding from south India and is made on festive occasions."", ""Serves 4""]",cookeatshare.com/recipes/broken-wheat-pudding-lapsi-kheer-93485,Recipes1M,"[""broken wheat"", ""water"", ""jaggery grated"", ""coconut lowfat milk"", ""pwdr""]",4
105541,Boysenberry Tiramisu Recipe,"[""1 pkt Frzn red Boysenberries in, light syrup (10 ounce)"", ""2 sqr semisweet chocolate, (1 ounce)"", ""1 ct whipped cream, cheese (8 ounce)"", ""3 Tbsp. Coffee-flavor liqueur"", ""1 Tbsp. Lowfat milk"", ""1 tsp Vanilla extract"", ""1 1/2 c. Heavy or possibly whipping cream"", ""2/3 c. Vanilla wafers, coarsely Crumble (about 40 cookies) Fresh Boysenberries, garnish""]","[""About 3 hrs before serving or possibly early in day: Thaw frzn Boysenberries as label directs."", ""Meanwhile, grate semi-sweet chocolate."", ""Reserve 1/4 c. grated chocolate for garnish."", ""In large bowl, with wire whisk or possibly fork, beat cream cheese, coffee flavor liqueur, lowfat milk, vanilla extract, and remaining grated chocolate till well blended."", ""In small bowl, with mixer at medium speed, beat heavy or possibly whipping cream and confectioners' sugar till stiff peaks forms."", ""Reserve 2 c. mix for topping."", ""With rubber spatula or possibly wire whisk, fold remaining 1 c. whipped cream mix."", ""Into 8 dessert glasses, place half of crumbled vanilla wafers; top with half of cream mix."", ""Spoon half of thawed Boysenberries with their syrup over cheese mix; top with remaining vanilla wafers, remaining thawed raspberies, then with remaining cheese mix."", ""Spoon reserved whipped cream mix into decorating bag with small rosette tube."", ""Pipe whipped cream around edge of each dessert glass."", ""Sprinkle reserved grated chocolate in center of each dessert."", ""Garnish with fresh raspberies."", ""Chill at least 2 hrs to blend flavor."", ""Makes 8 servings.""]",cookeatshare.com/recipes/boysenberry-tiramisu-90221,Recipes1M,"[""red Boysenberries"", ""chocolate"", ""whipped cream"", ""Coffee-flavor"", ""milk"", ""Vanilla"", ""whipping cream"", ""Vanilla wafers""]",8


In [84]:
recipe_df_filtered.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105543 entries, 0 to 105542
Data columns (total 7 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   title         105543 non-null  object
 1   ingredients   105543 non-null  object
 2   directions    105543 non-null  object
 3   link          105543 non-null  object
 4   source        105543 non-null  object
 5   NER           105543 non-null  object
 6   serving_size  105543 non-null  int64 
dtypes: int64(1), object(6)
memory usage: 5.6+ MB


At this moment, no further data type conversions are needed.

#### 4.1.3 Ingredient Counter <a name="counter"></a>

The next thing I want to do is extract the unique list of ingredients.

First, I convert each string in the `NER` column into a Python list by capturing the ingredient names that appear within quotation marks.

In [89]:
def ingredient_parse_list(NER):
    return re.findall(r'"([^"]*)"',NER) # "([^"]*)" extracts the words within quotation marks

recipe_df_filtered['NER_list'] = recipe_df_filtered['NER'].apply(ingredient_parse_list)
print(recipe_df_filtered['NER_list'])

0                                                                                [frozen corn, cream cheese, butter, garlic powder, salt, pepper]
1                                                                            [broccoli, bacon, green onions, raisins, mayonnaise, vinegar, sugar]
2                                                                                     [ground beef, tomato juice, oats, egg, onion, pepper, salt]
3                                                     [long, cooking oil, tomato sauce, water, brown sugar, mustard, Worcestershire sauce, onion]
4                                                          [flour, baking powder, salt, butter, sugar, egg yolks, vanilla, multi-colored candies]
                                                                           ...                                                                   
105538    [stuffing mix, butter, chicken broth, chicken, onion, celery, salad dressing, salt, eggs, milk, cream of mushroom 

Here we can see that the resulting dataframe has a new column named `NER_list`, which lists the ingredient names without the quotation marks.
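Since each `NER` value shown above has the shape of a JSON array of strings, a parser-based alternative to the regular expression is also possible. The sketch below is an assumption on my part rather than the approach used in this notebook: `parse_ner` is a hypothetical helper that tries `json.loads` first and falls back to the same regex when the string is not valid JSON.

```python
import json
import re


def parse_ner(ner):
    """Hypothetical helper: parse a stringified list of ingredient names."""
    try:
        parsed = json.loads(ner)
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    except (TypeError, ValueError):
        pass  # not valid JSON, so fall back to the regex used above
    return re.findall(r'"([^"]*)"', str(ner))


# Made-up values standing in for entries of the NER column
sample_ner = ['["frozen corn", "cream cheese"]', '["salt"]']
sample_lists = [parse_ner(value) for value in sample_ner]
print(sample_lists)
```

On a dataframe this could be applied in the same way as the regex version, with `recipe_df_filtered['NER'].apply(parse_ner)`.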

In [91]:
recipe_df_filtered.head()

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,NER_list
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]",6,"[frozen corn, cream cheese, butter, garlic powder, salt, pepper]"
1,Broccoli Salad,"[""1 large head broccoli (about 1 1/2 lb.)"", ""10 slices bacon, cooked and crumbled"", ""5 green onions, sliced or 1/4 c. chopped red onion"", ""1/2 c. raisins"", ""1 c. mayonnaise"", ""2 Tbsp. vinegar"", ""1/4 c. sugar""]","[""Trim off large leaves of broccoli and remove the tough ends of lower stalks. Wash the broccoli thoroughly. Cut the florets and stems into bite-size pieces. Place in a large bowl. Add bacon, onions and raisins. Combine remaining ingredients, stirring well. Add dressing to broccoli mixture and toss gently. Cover and refrigerate 2 to 3 hours. Makes about 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=50992,Gathered,"[""broccoli"", ""bacon"", ""green onions"", ""raisins"", ""mayonnaise"", ""vinegar"", ""sugar""]",6,"[broccoli, bacon, green onions, raisins, mayonnaise, vinegar, sugar]"
2,Prize-Winning Meat Loaf,"[""1 1/2 lb. ground beef"", ""1 c. tomato juice"", ""3/4 c. oats (uncooked)"", ""1 egg, beaten"", ""1/4 c. chopped onion"", ""1/4 tsp. pepper"", ""1 1/2 tsp. salt""]","[""Mix well."", ""Press firmly into an 8 1/2 x 4 1/2 x 2 1/2-inch loaf pan."", ""Bake in preheated moderate oven."", ""Bake at 350\u00b0 for 1 hour."", ""Let stand 5 minutes before slicing."", ""Makes 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=923674,Gathered,"[""ground beef"", ""tomato juice"", ""oats"", ""egg"", ""onion"", ""pepper"", ""salt""]",8,"[ground beef, tomato juice, oats, egg, onion, pepper, salt]"
3,Corral Barbecued Beef Steak Strips,"[""2 lb. round steak 1/2 to 3/4-inch thick, sliced in strips 1/8-inch thick (or thinner) and 3 1/2 to 4-inches long (easily sliced if partially frozen)"", ""2 Tbsp. cooking oil"", ""1 can (15 oz.) tomato sauce"", ""1/3 c. water"", ""2 Tbsp. brown sugar"", ""1 Tbsp. prepared mustard"", ""1 tbsp. Worcestershire sauce"", ""1 medium sized onion, thinly sliced""]","[""Brown strips in cooking oil."", ""Pour off drippings."", ""Combine tomato sauce, water, brown sugar, mustard and Worcestershire sauce."", ""Add sauce and onion to meat slices."", ""Cover and cook slowly, stirring occasionally 30 minutes or until meat is tender. Serve over rice or buttered noodles."", ""Yields 6 to 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=420402,Gathered,"[""long"", ""cooking oil"", ""tomato sauce"", ""water"", ""brown sugar"", ""mustard"", ""Worcestershire sauce"", ""onion""]",8,"[long, cooking oil, tomato sauce, water, brown sugar, mustard, Worcestershire sauce, onion]"
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]",24,"[flour, baking powder, salt, butter, sugar, egg yolks, vanilla, multi-colored candies]"


Now I am going to extract the unique ingredients from the `NER_list` column.

In [93]:
all_ingredients = [ingredient for sublist in recipe_df_filtered['NER_list'] for ingredient in sublist]
unique_ingredients = list(set(all_ingredients))
size_unique_ingredients = len(unique_ingredients)
print(size_unique_ingredients)
#print('Unique Ingredients:', unique_ingredients) # Leaving as a note here in case I wanted to print the entire list of unique ingredients.

28027


From the output above, we can see there are 28,027 unique ingredients. However, this way of listing unique ingredients likely overstates the true number, because different forms of the same ingredient are counted separately when they differ in case, punctuation, spelling or plural form.

As part of the preprocessing, I am going to calculate an `ingredient_counter` to understand the distribution of the number of ingredients per recipe in this dataset and to explore some relationships later on.
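To get a sense of how much case and whitespace variation can inflate such a count, the names could be normalized before deduplicating. The following is a minimal sketch under my own assumptions: `normalize_ingredient` is a hypothetical helper, the sample names are made up, and it only handles case, extra whitespace and surrounding punctuation, not typos or plural forms.

```python
import string


def normalize_ingredient(name):
    """Hypothetical helper: lowercase, collapse whitespace, trim punctuation."""
    collapsed = ' '.join(name.lower().split())
    return collapsed.strip(string.punctuation + ' ')


# Made-up sample standing in for all_ingredients
sample = ['Sugar', 'sugar', ' sugar ', 'Vanilla', 'vanilla.', 'egg', 'eggs']
unique_raw = set(sample)
unique_normalized = {normalize_ingredient(name) for name in sample}

print(len(unique_raw), len(unique_normalized))
```

In this toy sample `egg` and `eggs` remain distinct, which illustrates why a simple normalization step reduces the count but does not fully resolve it.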

In [96]:
# Count the ingredients in each recipe's NER_list
ingredient_counter = recipe_df_filtered['NER_list'].apply(len)

In [97]:
recipe_df_filtered['ingredient_counter'] = ingredient_counter
recipe_df_filtered.head()

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,NER_list,ingredient_counter
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]",6,"[frozen corn, cream cheese, butter, garlic powder, salt, pepper]",6
1,Broccoli Salad,"[""1 large head broccoli (about 1 1/2 lb.)"", ""10 slices bacon, cooked and crumbled"", ""5 green onions, sliced or 1/4 c. chopped red onion"", ""1/2 c. raisins"", ""1 c. mayonnaise"", ""2 Tbsp. vinegar"", ""1/4 c. sugar""]","[""Trim off large leaves of broccoli and remove the tough ends of lower stalks. Wash the broccoli thoroughly. Cut the florets and stems into bite-size pieces. Place in a large bowl. Add bacon, onions and raisins. Combine remaining ingredients, stirring well. Add dressing to broccoli mixture and toss gently. Cover and refrigerate 2 to 3 hours. Makes about 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=50992,Gathered,"[""broccoli"", ""bacon"", ""green onions"", ""raisins"", ""mayonnaise"", ""vinegar"", ""sugar""]",6,"[broccoli, bacon, green onions, raisins, mayonnaise, vinegar, sugar]",7
2,Prize-Winning Meat Loaf,"[""1 1/2 lb. ground beef"", ""1 c. tomato juice"", ""3/4 c. oats (uncooked)"", ""1 egg, beaten"", ""1/4 c. chopped onion"", ""1/4 tsp. pepper"", ""1 1/2 tsp. salt""]","[""Mix well."", ""Press firmly into an 8 1/2 x 4 1/2 x 2 1/2-inch loaf pan."", ""Bake in preheated moderate oven."", ""Bake at 350\u00b0 for 1 hour."", ""Let stand 5 minutes before slicing."", ""Makes 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=923674,Gathered,"[""ground beef"", ""tomato juice"", ""oats"", ""egg"", ""onion"", ""pepper"", ""salt""]",8,"[ground beef, tomato juice, oats, egg, onion, pepper, salt]",7
3,Corral Barbecued Beef Steak Strips,"[""2 lb. round steak 1/2 to 3/4-inch thick, sliced in strips 1/8-inch thick (or thinner) and 3 1/2 to 4-inches long (easily sliced if partially frozen)"", ""2 Tbsp. cooking oil"", ""1 can (15 oz.) tomato sauce"", ""1/3 c. water"", ""2 Tbsp. brown sugar"", ""1 Tbsp. prepared mustard"", ""1 tbsp. Worcestershire sauce"", ""1 medium sized onion, thinly sliced""]","[""Brown strips in cooking oil."", ""Pour off drippings."", ""Combine tomato sauce, water, brown sugar, mustard and Worcestershire sauce."", ""Add sauce and onion to meat slices."", ""Cover and cook slowly, stirring occasionally 30 minutes or until meat is tender. Serve over rice or buttered noodles."", ""Yields 6 to 8 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=420402,Gathered,"[""long"", ""cooking oil"", ""tomato sauce"", ""water"", ""brown sugar"", ""mustard"", ""Worcestershire sauce"", ""onion""]",8,"[long, cooking oil, tomato sauce, water, brown sugar, mustard, Worcestershire sauce, onion]",8
4,Mexican Cookie Rings,"[""1 1/2 c. sifted flour"", ""1/2 tsp. baking powder"", ""1/2 tsp. salt"", ""1/2 c. butter"", ""2/3 c. sugar"", ""3 egg yolks"", ""1 tsp. vanilla"", ""multi-colored candies""]","[""Sift flour, baking powder and salt together."", ""Cream together butter and sugar."", ""Add egg yolks and vanilla."", ""Beat until light and fluffy."", ""Mix in sifted dry ingredients."", ""Shape into 1-inch balls."", ""Push wooden spoon handle through center (twist)."", ""Shape into rings."", ""Dip each cookie into candies."", ""Place on lightly greased baking sheets."", ""Bake in 375\u00b0 oven for 10 to 12 minutes or until golden brown."", ""Cool on racks."", ""Serves 2 dozen.""]",www.cookbooks.com/Recipe-Details.aspx?id=364136,Gathered,"[""flour"", ""baking powder"", ""salt"", ""butter"", ""sugar"", ""egg yolks"", ""vanilla"", ""multi-colored candies""]",24,"[flour, baking powder, salt, butter, sugar, egg yolks, vanilla, multi-colored candies]",8


Now we can see that the `ingredient_counter` column has been added into our dataframe!

#### 4.1.4 Meal Types <a name="mealtypes"></a>

I am going to label each recipe with a meal type (breakfast, lunch, dinner, salad, dessert or drinks) so that users can specify the type of meal they would like to prepare with the ingredients they have on hand. This adds a layer of refinement, so that users get options that are more tailored to their preferences and needs.

##### 4.1.4.1 First Round of Recipe Labelling <a name="firstlabel"></a>

In [102]:
# Create a new column called 'Meal_Type' to label recipes

In [103]:
# Cast the ingredient lists to strings so that .str.contains can be used on NER_list
recipe_df_filtered['NER_list'] = recipe_df_filtered['NER_list'].astype(str)

In [104]:
recipe_df_filtered['Meal_Type'] = None

In [105]:
# Check-in on results from initial round of recipe labelling

In [106]:
recipe_df_filtered['Meal_Type'].value_counts()

Series([], Name: count, dtype: int64)
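`value_counts()` skips missing values by default, which is why the output above is empty while every row is still unlabelled. A small sketch on toy data (the Series below is my own illustration, not data from this notebook) shows how passing `dropna=False` also counts the unlabelled recipes:

```python
import pandas as pd

# Toy stand-in for recipe_df_filtered['Meal_Type'] after some labelling
meal_type = pd.Series(['breakfast', None, 'dessert', None, None], dtype='object')

labelled_only = meal_type.value_counts()              # ignores missing labels
with_missing = meal_type.value_counts(dropna=False)   # counts them as well

unlabelled = int(meal_type.isnull().sum())
print(with_missing)
print('Unlabelled recipes:', unlabelled)
```

This makes it easier to track how many recipes remain unlabelled after each round.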

###### Breakfast Recipes

**Filters on `title`**

In [109]:
patterns = [
    r'\b(breakfast)\b',
    r'\b(scrambled)\b',
    r'\b(egg|eggs)\b',
    r'\b(oats)\b',
    r'\b(hash brown|hash browns|hashbrown|hashbrowns)\b',
    r'\b(omelet|omelets|omelette|omelettes)\b',
    r'\b(toast|toasts)\b',
    r'\b(waffle|waffles)\b',
    r'\b(pancake|pancakes|hotcake|hotcakes)\b',
    r'\b(huevo|huevos|hueros|huero|huervo|huervos)\b',
    r'\b(parfait|parfaits)\b',
    r'\b(smoothie|smoothies|smoothy)\b',
    r'\b(bagel|bagels)\b',
    r'\b(quiche|quiches)\b',
    r'\b(frittata|fritata|frittatas|fritatas)\b',
    r'\b(muffin|muffins|mcmuffin)\b',
    r'\b(tostada|tostadas)\b',
    r'\b(quesadilla|quesadillas)\b',
    r'\b(morning)\b',
    r'\b(ham)\s*(and)\s*(cheese)\b',
    r'\b(raisin|raisins)\s*(bread|breads)\b',
    r'\b(peanut)\s*(butter)\s*(sandwich|sandwiches)\b',
    r'\b(croissant|croissants)\b',
    r'\b(club|clubhouse)\s*(sandwich|sandwiches)\b',
    r'\b(fruit|fruits)\s*(cup|cups|bowl|bowls|medley|medleys)\b',
    r'\b(bacon|bacons|sausage|sausages)\s*(tortilla|tortillas|sandwich|sandwiches)\b',
    r'\b(bacon|bacons|sausage|sausages)\s*(cheese|cheeses)\b',
    r'\b(strata)\b',
    r'\b(grit|grits)\b',
    r'\b(banana|bananas)\s*(bread|breads)\b',
    r'\b(bacon|bacons)\s*(and)\s*(cheese|cheeses)\b',
    r'\b(benedict|benedicts)\b',
    r'\b(granola|granolas)\b',
    r'\b(home)\s*(fries)\b',
    r'\b(hash)\b',
    r'\b(yogurt|yogurts)\b',
    r'\b(bread|breads)\b',
    r'\b(sausage|sausages)\s*(patties)\b',
    r'\b(sausage|sausages)\s*(link|links)\b',
    r'\b(biscuit|biscuits)\b',
    r'\b(brunch|brunches)\b'
]

for pattern in patterns:
    recipe_df_filtered.loc[
        recipe_df_filtered['title'].str.contains(pattern, case=False, na=False, regex=True) &
        recipe_df_filtered['Meal_Type'].isnull(),
        'Meal_Type'
    ] = 'breakfast'
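pandas emits a `UserWarning` when a pattern passed to `str.contains` includes capturing groups such as `(egg|eggs)`. The sketch below is one possible way to express the same kind of title filter with non-capturing groups `(?:...)` and a single combined pattern. It is an illustration under my own assumptions: `label_by_title` is a hypothetical helper, the keyword list is a small subset of the one above, and the toy dataframe is made up.

```python
import pandas as pd

# A few of the title keywords from the list above, written without groups
breakfast_keywords = ['breakfast', 'scrambled', 'eggs?', 'omelettes?', 'pancakes?']

# One pattern for all keywords; (?:...) is a non-capturing group
combined = r'\b(?:' + '|'.join(breakfast_keywords) + r')\b'


def label_by_title(df, pattern, label):
    """Hypothetical helper: label still-unlabelled rows whose title matches."""
    mask = df['title'].str.contains(pattern, case=False, na=False, regex=True)
    df.loc[mask & df['Meal_Type'].isnull(), 'Meal_Type'] = label
    return df


toy = pd.DataFrame({
    'title': ['Scrambled Eggs', 'Beef Stew', 'Blueberry Pancakes', 'Eggplant Dip'],
    'Meal_Type': [None, None, 'dessert', None],
})
toy = label_by_title(toy, combined, 'breakfast')
print(toy)
```

Because every pattern in the loop assigns the same label, joining them with `|` should select the same rows as looping over them one by one, while scanning the column only once. Rows that already carry a label are left unchanged.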


**Other filters on `directions` and `NER_list`**

In [111]:
recipe_df_filtered.loc[
    recipe_df_filtered['directions'].str.contains(pat = r'\b(breakfast)\b', case = False, na=False, regex= True) &
    recipe_df_filtered['Meal_Type'].isnull(),
    'Meal_Type'] = 'breakfast'



In [112]:
recipe_df_filtered.loc[
            recipe_df_filtered['NER_list'].str.contains(r'\b(potato|potatoes)\b', case = False, na=False) &
            recipe_df_filtered['NER_list'].str.contains(r'\b(egg|eggs)\b', case = False, na=False) &
            recipe_df_filtered['NER_list'].str.contains(r'\b(cheese)\b', case = False, na=False) &
            ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|pork|lamb)\b', case = False, na=False) &
            ~recipe_df_filtered['title'].str.contains(r'\b(meat loaf|meat loafs)\b', case = False, na=False) &
            ~recipe_df_filtered['title'].str.contains(r'\b(salad|salads)\b', case = False, na=False) &
            ~recipe_df_filtered['title'].str.contains(r'\b(lasagna|lasagnas)\b', case = False, na=False) &      
    recipe_df_filtered['Meal_Type'].isnull(),
    'Meal_Type'] = 'breakfast'



In [113]:
recipe_df_filtered.loc[
            recipe_df_filtered['NER_list'].str.contains(r'\b(bacon|bacons)\b', case = False, na=False) &
            recipe_df_filtered['NER_list'].str.contains(r'\b(egg|eggs)\b', case = False, na=False) &
            recipe_df_filtered['NER_list'].str.contains(r'\b(cheese)\b', case = False, na=False) &
            ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|pork|lamb)\b', case = False, na=False) &
            recipe_df_filtered['title'].str.contains(r'\b(bacon|bacons)\b', case = False, na=False) &
            ~recipe_df_filtered['title'].str.contains(r'\b(meat loaf|meat loafs)\b', case = False, na=False) &
            ~recipe_df_filtered['title'].str.contains(r'\b(salad|salads)\b', case = False, na=False) &
            ~recipe_df_filtered['title'].str.contains(r'\b(lasagna|lasagnas)\b', case = False, na=False) &
    recipe_df_filtered['Meal_Type'].isnull(),
    'Meal_Type'] = 'breakfast'
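The two ingredient-based filters above share the same structure: several conditions that must match, several that must not, and a final check that the recipe is still unlabelled. A minimal sketch of how that structure could be factored out is shown below. `contains` and `rule_mask` are hypothetical helpers of my own, and the toy dataframe is made up for illustration.

```python
import pandas as pd


def contains(series, pattern):
    """Case-insensitive regex match that treats missing values as no match."""
    return series.str.contains(pattern, case=False, na=False, regex=True)


def rule_mask(df, include, exclude):
    """Hypothetical helper: rows matching every include rule and no exclude rule.

    include and exclude are lists of (column, pattern) pairs.
    """
    mask = pd.Series(True, index=df.index)
    for column, pattern in include:
        mask &= contains(df[column], pattern)
    for column, pattern in exclude:
        mask &= ~contains(df[column], pattern)
    return mask


toy = pd.DataFrame({
    'title': ['Bacon Egg Cups', 'Bacon Chicken Salad'],
    'NER_list': ["['bacon', 'eggs', 'cheese']",
                 "['bacon', 'eggs', 'cheese', 'chicken']"],
    'Meal_Type': [None, None],
})
mask = rule_mask(
    toy,
    include=[('NER_list', r'\bbacons?\b'), ('NER_list', r'\beggs?\b'),
             ('NER_list', r'\bcheese\b'), ('title', r'\bbacons?\b')],
    exclude=[('NER_list', r'\b(?:salmon|fish|beef|chicken|steak|pork|lamb)\b'),
             ('title', r'\bsalads?\b')],
)
toy.loc[mask & toy['Meal_Type'].isnull(), 'Meal_Type'] = 'breakfast'
print(toy)
```

Keeping each rule as a list of `(column, pattern)` pairs makes it easier to compare rules side by side and to adjust the exclusion lists in one place.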



###### Dessert Recipes

**Filters on `title`**

In [116]:
patterns = [
    r'\b(dessert|desserts)\b',
    r'\b(cookie|cookies)\b',
    r'\b(custard|custards)\b',
    r'\b(mousse|mousses)\b',
    r'\b(panna cotta)\b',
    r'\b(cheesecake|cheesecakes|cheese cake|cheese cakes)\b',
    r'\b(tiramisu|tiarmisu)\b',
    r'\b(sorbet|sorbets)\b',
    r'\b(icecream|ice cream|icecreams|ice creams)\b',
    r'\b(brownie|brownies)\b',
    r'\b(doughnut|doughnuts|donut|donuts)\b',
    r'\b(biscotti)\b',
    r'\b(fudge|fudges)\b',
    r'\b(sundae|sundaes)\b',
    r'\b(ambrosia|ambrosias)\b',
    r'\b(chocolate chip|chocolate chips|choco chip|choco chips)\b',
    r'\b(crumble|crumbles)\b',
    r'\b(eclair|eclairs)\b',
    r'\b(gelato|gelatos)\b',
    r'\b(trifle|trifles)\b',
    r'\b(creme brulee)\b',
    r'\b(shortcake|shortcakes)\b',
    r'\b(tarte tatin)\b',
    r'\b(cannoli)\b',
    r'\b(zabaglione|zabagliones)\b',
    r'\b(churro|churros)\b',
    r'\b(arroz con leche)\b',
    r'\b(tres leches)\b',
    r'\b(mochi)\b',
    r'\b(baklava|baklavas)\b',
    r'\b(lemonade dessert|lemonade desserts|lemonade pie|lemonade pies)\b',
    r'\b(torte|tortes)\b'
]

for pattern in patterns:
    recipe_df_filtered.loc[
        recipe_df_filtered['title'].str.contains(pattern, case=False, na=False, regex=True) &
        recipe_df_filtered['Meal_Type'].isnull(),
        'Meal_Type'
    ] = 'dessert'


**Other filters on `title` with `NER_list` ingredients**

In [118]:
recipe_df_filtered.loc[
            recipe_df_filtered['NER_list'].str.contains(r'(sugar|juice|vanilla)', case = False, na=False) &
            recipe_df_filtered['title'].str.contains(r'(souffle|souffles)', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dessert'



In [119]:
recipe_df_filtered.loc[
            recipe_df_filtered['NER_list'].str.contains(r'(sugar|vanilla)', case = False, na=False) &
            recipe_df_filtered['title'].str.contains(r'(cornbread|corn bread)', case = False, na=False) &
            ~recipe_df_filtered['NER_list'].str.contains(r'(turkey|chicken|beef|onion|cheese)', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dessert'


In [120]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(r'(pudding|puddings)', case = False, na=False) &
            ~recipe_df_filtered['NER_list'].str.contains(r'(turkey|chicken|beef|onion|cheese|bacon)', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dessert'


In [121]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(pie|pies)\b', case = False, na=False, regex= True) &
            ~recipe_df_filtered['NER_list'].str.contains(r'(turkey|chicken|beef|onion|cheese|bacon)', case = False, na=False) &           
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dessert'


In [122]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(scone|scones)\b', case = False, na=False, regex= True) &
            ~recipe_df_filtered['NER_list'].str.contains(r'\b(ham)\b', case = False, na=False) &           
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dessert'


In [123]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(fondue|fondues)\b', case = False, na=False, regex= True) &
            ~recipe_df_filtered['NER_list'].str.contains(r'\b(cheese|garlic|salt|pepper|nutmeg|beef|fish|mustard)\b', case = False, na=False) &           
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dessert'


In [124]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(cupcake|cup cake|cupcakes|cup cakes)\b', case = False, na=False, regex= True) &
                ~recipe_df_filtered['NER_list'].str.contains(pat = r'\b(beef)\b', case = False, na=False, regex= True),
                'Meal_Type'
] = 'dessert'


In [125]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(flan|flans)\b', case = False, na=False, regex= True) &
                recipe_df_filtered['NER_list'].str.contains(pat = r'\b(sugar|vanilla|juice)\b', case = False, na=False, regex= True),
                'Meal_Type'
] = 'dessert'


In [126]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(cake|cakes|red velvet)\b', case = False, na=False, regex= True) &
                recipe_df_filtered['NER_list'].str.contains(r'\b(sugar|egg|flour|choco|chocolate|vanilla|red velvet)\b', case = False, na=False) &                
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [127]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(log|logs)\b', case = False, na=False, regex= True) &         
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [128]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(cobbler|cobblers)\b', case = False, na=False, regex= True) &         
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [129]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(tart|tarts)\b', case = False, na=False, regex= True) &         
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [130]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(bar|bars)\b', case = False, na=False, regex= True) &         
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [131]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(crisp|crisps)\b', case = False, na=False, regex= True) &         
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato)\b', case = False, na=False) &   
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [132]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(bar|bars)\b', case = False, na=False, regex= True) &
                recipe_df_filtered['NER_list'].str.contains(r'\b(peanut|peanut butter|chocolate|sugar)\b', case = False, na=False) &
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato)\b', case = False, na=False) &   
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [133]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(bar|bars)\b', case = False, na=False, regex= True) &
                recipe_df_filtered['NER_list'].str.contains(r'\b(peanut|peanut butter|chocolate|sugar)\b', case = False, na=False) &                
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [134]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(crepe|crepes)\b', case = False, na=False, regex= True) &
                recipe_df_filtered['NER_list'].str.contains(r'\b(peanut|peanut butter|chocolate|sugar)\b', case = False, na=False) &                
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato|broccoli|mushroom)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [135]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(cream puff|cream puffs|creampuff|creampuffs)\b', case = False, na=False, regex= True) &
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato|tuna)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [136]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(gingerbread|gingerbreads|ginger bread|ginger breads)\b', case = False, na=False, regex= True) &
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato|tuna)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'


In [137]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(candy)\b', case = False, na=False, regex= True) &
                ~recipe_df_filtered['NER_list'].str.contains(r'\b(salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat|potatoes|potato|tuna)\b', case = False, na=False) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'dessert'
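
The cells above all follow one template: a keyword match on `title`, an optional required or excluded ingredient match on `NER_list`, and a guard so that only unlabeled recipes are touched. As a rough sketch, that template could be wrapped in a small helper. The function `label_by_rule`, the constant `MEATS`, and the toy dataframe are hypothetical names introduced here, not part of the original notebook.

```python
import pandas as pd

def label_by_rule(df, label, title_pat, require_pat=None, exclude_pat=None):
    """Assign `label` to still-unlabeled rows whose title matches `title_pat`.

    `require_pat` and `exclude_pat` are optional regexes checked against NER_list.
    Returns the number of rows that were labeled by this rule.
    """
    mask = df['title'].str.contains(title_pat, case=False, na=False, regex=True)
    if require_pat is not None:
        mask &= df['NER_list'].str.contains(require_pat, case=False, na=False, regex=True)
    if exclude_pat is not None:
        mask &= ~df['NER_list'].str.contains(exclude_pat, case=False, na=False, regex=True)
    mask &= df['Meal_Type'].isnull()  # never overwrite an earlier label
    df.loc[mask, 'Meal_Type'] = label
    return int(mask.sum())

# Hypothetical toy data; NER_list is stored as a string, as in the notebook
toy_df = pd.DataFrame({
    'title': ['Peach Cobbler', 'Chicken Cobbler', 'Lemon Tart'],
    'NER_list': ["['peaches', 'sugar', 'flour']",
                 "['chicken', 'flour', 'milk']",
                 "['lemon', 'sugar', 'eggs']"],
    'Meal_Type': [None, None, None],
})

# Shared exclusion list, written with a non-capturing group
MEATS = r'\b(?:salmon|fish|beef|chicken|steak|bacon|turkey|pork|meat)\b'

n_cobbler = label_by_rule(toy_df, 'dessert', r'\b(?:cobbler|cobblers)\b', exclude_pat=MEATS)
n_tart = label_by_rule(toy_df, 'dessert', r'\b(?:tart|tarts)\b', exclude_pat=MEATS)
print(n_cobbler, n_tart)
print(toy_df[['title', 'Meal_Type']])
```

Returning the number of labeled rows makes it easy to spot a rule that matches nothing, for example because an earlier, broader rule already claimed the same recipes.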


###### Drink Recipes

**Filters on `Title`**

In [140]:
patterns = [
    r'\b(drink|drinks)\b',
    r'\b(beverage|beverages)\b',
    r'\b(hot)\s*(tea)\b',
    r'\b(ice|iced)\s*(tea|teas)\b',
    r'\b(spritzer|spritzers)\b',
    r'\b(hot)\s*(chocolate|chocolates)\b',
    r'\b(chocolate)\s*(milk)\b',
    r'\b(milk)\s*(shake)\b',
    r'\b(juice|juices)\b',
    r'\b(sparkler)\b',
    r'\b(punch)\b',
    r'\b(lemonade)\b',
    r'\b(float|floats)\b',
    r'\b(shake|shakes)\b',
    r'\b(cream)\s*(soda|sodas)\b',
    r'\b(slush|slushie)\b',
    r'\b(agua)\b',
    r'\b(cappuccino|chocolate)\s*(mix)\b',
    r'\b(coffee)\s*(mix)\b',
    r'\b(water)\s*(ice)\b',
    r'\b(chai)\s*(tea)\b',
    r'\b(chai)\s*(mix)\b',
    r'\b(latte)\b',
    r'\b(sherbet|sherbets)\b',
    r'\b(cappuccino|cappuccinos)\b'
]

for pattern in patterns:
    recipe_df_filtered.loc[
        recipe_df_filtered['title'].str.contains(pattern, case=False, na=False, regex=True) &
        recipe_df_filtered['Meal_Type'].isnull(),
        'Meal_Type'
    ] = 'drink'

**Other filters on `directions` and `NER_list`**

In [142]:
recipe_df_filtered.loc[
            recipe_df_filtered['NER_list'].str.contains('orange juice', case = False, na=False) &
            recipe_df_filtered['NER_list'].str.contains('champagne', case = False, na=False),
            'Meal_Type'] = 'drink'

###### Dinner Recipes

**Filters on `Title`**

In [145]:
patterns = [
    r'\b(pie|pies)\b',
    r'\b(dinner|dinners|supper|suppers|feast)\b',
    r'\b(main|banquet|banquets)\b',
    r'\b(steak|steaks)\b',
    r'\b(pasta|pastas)\b',
    r'\b(carbonara|carbonaras)\b',
    r'\b(alfredo)\b',
    r'\b(fettuccine|fettuccines|fettucine|fettucines)\b',
    r'\b(spaghetti|spaghettis)\b',
    r'\b(lasagna|lasagnas)\b',
    r'\b(stew|stews|stewed)\b',
    r'\b(roast|roasted|roasts)\b',
    r'\b(paella|risotto)\b',
    r'\b(stir fry|stir-fry|stirfry)\b',
    r'\b(scalloped)\b'
]

for pattern in patterns:
    recipe_df_filtered.loc[
        recipe_df_filtered['title'].str.contains(pattern, case=False, na=False, regex=True) &
        recipe_df_filtered['Meal_Type'].isnull(),
        'Meal_Type'
    ] = 'dinner'

**Other filters on `directions` and on `title` with `NER_list` ingredients**

In [147]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(grill|grilled)\b', case = False, na=False, regex= True) &
            ~recipe_df_filtered['title'].str.contains(r'\b(chicken|chickens)\b', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dinner'


In [148]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(fondue|fondues)\b', case = False, na=False, regex= True) &
            recipe_df_filtered['NER_list'].str.contains(r'\b(cheese|garlic|salt|pepper|nutmeg|beef|fish|mustard)\b', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dinner'


In [149]:
recipe_df_filtered.loc[
            recipe_df_filtered['directions'].str.contains(pat = r'\b(dinner|dinners|suppers|supper)\b', case = False, na=False, regex= True) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dinner'


In [150]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(taco|tacos)\b', case = False, na=False, regex= True) &
            ~recipe_df_filtered['NER_list'].str.contains(r'\b(chorizo|chorizos)\b', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'dinner'


###### Lunch Recipes

**Filters on `Title`**

In [153]:
patterns = [
    r'\b(macaroni and cheese|mac and cheese|mac n cheese)\b',
    r'\b(sandwich|sandwiches)\b',
    r'\b(wrap|wraps)\b',
    r'\b(caesar salad|caesar salads|greek salad|greek salads|chicken salad|chicken salads)\b',
    r'\b(panini|paninis)\b',
    r'\b(bowl|bowls)\b',
    r'\b(lunch|lunches)\b',
    r'\b(bento|bentos)\b',
    r'\b(caprese)\b',
    r'\b(aglio)\b',
    r'\b(pita|pitas)\b',
    r'\b(grilled chicken|grilled chickens)\b',
    r'\b(shredded)\b',
    r'\b(burger|burgers)\b',
    r'\b(crostini|crostinis)\b',
    r'\b(gourmet)\b',
    r'\b(fried rice)\b',
    r'\b(blt)\b',
    r'\b(roll|rolls)\b',
    r'\b(rice)\b',
    r'\b(leftover|leftovers)\b',
    r'\b(noodle|noodles|noodle soup|noodle soups)\b',
    r'\b(fried chicken|fried chickens)\b',
    r'\b(meat loaf|meatloaf)\b',
    r'\b(tater|taters)\b',
    r'\b(barbecue rib|barbecue ribs|barbecued rib|barbecued ribs)\b'
]

for pattern in patterns:
    recipe_df_filtered.loc[
        recipe_df_filtered['title'].str.contains(pattern, case=False, na=False, regex=True) &
        recipe_df_filtered['Meal_Type'].isnull(),
        'Meal_Type'
    ] = 'lunch'

**Other filters on `directions` and on `title` with `NER_list` ingredients**

In [155]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(pie|pies)\b', case = False, na=False, regex= True) &
            recipe_df_filtered['NER_list'].str.contains(pat = r'\b(spinach|feta|ham|cheese|chicken|beef|mushroom)\b', case = False, na=False, regex= True) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'lunch'


In [156]:
recipe_df_filtered.loc[
            recipe_df_filtered['directions'].str.contains(pat = r'\b(lunch|lunches)\b', case = False, na=False, regex= True),
            'Meal_Type'
] = 'lunch'


In [157]:
recipe_df_filtered.loc[
            recipe_df_filtered['title'].str.contains(pat = r'\b(taco|tacos)\b', case = False, na=False, regex= True) &
            recipe_df_filtered['NER_list'].str.contains(r'\b(chorizo|chorizos)\b', case = False, na=False) &
            recipe_df_filtered['Meal_Type'].isnull(),
            'Meal_Type'
] = 'lunch'


###### Salad Recipes

**Filters on `Title`**

In [160]:
recipe_df_filtered.loc[
                recipe_df_filtered['title'].str.contains(pat = r'\b(salad|salads)\b', case = False, na=False, regex= True) &
                recipe_df_filtered['Meal_Type'].isnull(),
                'Meal_Type'
] = 'salad'


##### 4.1.4.2 Second Round of Recipe Labelling with Supervised Learning <a name="secondlabel"></a>

To begin the second round of labelling, let's first create a new dataframe containing the recipes that were labeled in the first round. This subset of the data is used to train a random forest model that learns which ingredients are typical of each meal type. A random forest is an ensemble of decision trees, each trained on a different random subset of the data. Ensemble methods generally perform well because the final prediction aggregates the results of many models, in this case decision trees.

In [163]:
test_df = recipe_df_filtered[recipe_df_filtered['Meal_Type'].notna()]

In [164]:
print(f'As a result of the initial labeling, we now have a total of {test_df.shape[0]} labeled recipes to train a random forest model.')

As a result of the initial labeling, we now have a total of 51205 labeled recipes to train a random forest model.


For this use case, the **features X** will include the **list of ingredients**, and the **target variable Y** is the **meal type** that the specific recipe corresponds to.

In [166]:
# Step 1. Specify X and Y

X = test_df['NER_list']
y = test_df['Meal_Type']

In [167]:
# Step 2. Split the data into training and testing sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 99)
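
The meal types are not equally frequent in the labeled data, so a purely random split can shift the class proportions between the training and testing sets. One option, shown here only as a sketch on hypothetical toy data (`toy_X`, `toy_y`), is to pass `stratify` to `train_test_split` so that each split keeps the same class ratio. The notebook itself does not stratify.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 80 'dinner' recipes and 20 'drink' recipes
toy_X = pd.Series([f"['ingredient {i}']" for i in range(100)])
toy_y = pd.Series(['dinner'] * 80 + ['drink'] * 20)

# stratify=toy_y keeps the 80/20 class ratio in both the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.20, random_state=99, stratify=toy_y
)

print(y_tr.value_counts())
print(y_te.value_counts())
```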

In [168]:
# Step 3. Vectorize list of ingredients 

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)
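
Since `NER_list` holds each ingredient list as a string, the default `TfidfVectorizer` splits it into single words, so a multi-word ingredient such as "brown sugar" becomes two separate features. The sketch below, on hypothetical toy strings, compares the default behaviour with a custom `token_pattern` that keeps each quoted ingredient together. It assumes ingredient names contain no apostrophes and is an illustration only, not the setup used in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents in the same string format as NER_list
toy_docs = [
    "['brown sugar', 'butter', 'eggs']",
    "['sour cream', 'onion', 'beef']",
]

# Default settings: lowercase text and keep single-word tokens of 2+ characters
word_vec = TfidfVectorizer()
word_vec.fit(toy_docs)
print(sorted(word_vec.vocabulary_))

# Alternative: capture everything between a pair of single quotes as one token
ingredient_vec = TfidfVectorizer(token_pattern=r"'([^']+)'")
ingredient_vec.fit(toy_docs)
print(sorted(ingredient_vec.vocabulary_))
```

Word-level tokens let "brown sugar" and "sugar" share a feature, while ingredient-level tokens keep "sour cream" distinct from "cream". Which one works better for meal-type prediction would need to be checked on the real data.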

In [169]:
# Step 4. Instantiate random forest model

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier()
classifier.fit(X_train_vectorized, y_train)

In [170]:
# Step 5. Calculate predictions

y_pred = classifier.predict(X_test_vectorized)

In [171]:
results_df = pd.DataFrame({'Recipe_Name': X_test,
                           'Actual_Label': y_test,
                           'Predicted_Label': y_pred
                          })

In [172]:
results_df

| | Recipe_Name | Actual_Label | Predicted_Label |
|---|---|---|---|
| 67476 | ['pineapple', 'fruit cocktail', 'O', 'sour cream'] | salad | salad |
| 29739 | ['brown sugar', 'butter', 'eggs', 'flour', 'baking powder', 'nuts'] | dessert | dessert |
| 40130 | ['celery', 'mushrooms', 'sour cream', 'onion', 'bouillon', 'beef'] | dinner | lunch |
| 4195 | ['white rice', 'pork chops', 'shortening', 'salt', 'red pepper', 'black pepper', 'chicken broth', 'thyme', 'green pepper', 'onion'] | lunch | lunch |
| 425 | ['onion soup', 'white potatoes', 'olive'] | dinner | dinner |
| ... | ... | ... | ... |
| 8643 | ['cantaloupe chunks', 'strawberry halves', 'sugar', 'cinnamon'] | breakfast | breakfast |
| 20109 | ['butter', 'vegetable oil', 'onions', 'potatoes', 'crust', 'salt', 'pepper'] | dinner | dinner |
| 35708 | ['fruit cocktail', 'mandarin oranges', 'cherries', 'flaked coconut'] | breakfast | dessert |
| 66340 | ['Ann cherries', 'dark sweet cherries', 'peaches', 'marshmallows', 'sour cream', 'whipping cream'] | salad | salad |


This gives a quick look at how the model's predicted labels compare with the actual labels for some of the recipes. Let's investigate the model's performance further by analyzing the classification report.

In [174]:
from sklearn.metrics import classification_report, confusion_matrix

In [175]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

   breakfast       0.82      0.65      0.73      1562
     dessert       0.82      0.90      0.86      2088
      dinner       0.76      0.85      0.80      2718
       drink       0.89      0.84      0.87       444
       lunch       0.82      0.74      0.78      1831
       salad       0.83      0.85      0.84      1598

    accuracy                           0.81     10241
   macro avg       0.83      0.80      0.81     10241
weighted avg       0.81      0.81      0.81     10241



**Precision** is the share of recipes predicted as a given meal type that actually belong to that type. In this case, the model is fairly precise when predicting breakfast, dessert, drink, lunch and salad recipes (0.82 to 0.89). The precision score for dinner is lower (0.76), which could be explained by the wide variety of ingredients found in dinner recipes, making the model less precise when it predicts dinner.

**Recall**, on the other hand, is the share of recipes of a given meal type that the model correctly identifies. For example, breakfast has a recall of 0.65, which means that the model correctly identifies 65% of all actual breakfast recipes and misses the remaining 35% (false negatives). There is usually a trade-off between these two scores: raising precision tends to lower recall, and vice versa.
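
To make the two definitions concrete, the sketch below computes precision and recall for one class by hand from a confusion matrix and compares the result with scikit-learn's own functions. The labels in `y_true` and `y_hat` are made-up toy values, not results from the model above.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical toy labels: three breakfast recipes, two desserts, one dinner
y_true = ['breakfast', 'breakfast', 'breakfast', 'dessert', 'dessert', 'dinner']
y_hat = ['breakfast', 'dessert', 'dinner', 'dessert', 'dessert', 'dinner']
labels = ['breakfast', 'dessert', 'dinner']

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat, labels=labels)

# Counts for the 'breakfast' class (index 0)
tp = cm[0, 0]                 # breakfast recipes predicted as breakfast
fn = cm[0, :].sum() - tp      # breakfast recipes predicted as something else
fp = cm[:, 0].sum() - tp      # other recipes predicted as breakfast

precision_breakfast = tp / (tp + fp)
recall_breakfast = tp / (tp + fn)

# The same values from scikit-learn, restricted to the 'breakfast' label
sk_precision = precision_score(y_true, y_hat, labels=['breakfast'], average=None)[0]
sk_recall = recall_score(y_true, y_hat, labels=['breakfast'], average=None)[0]

print(precision_breakfast, recall_breakfast)
print(sk_precision, sk_recall)
```

In this toy case every breakfast prediction is correct (high precision), yet two of the three actual breakfast recipes are missed (low recall). This mirrors the pattern the classification report shows for the breakfast class.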


Let's now use this trained model to label the remaining recipes.

In [178]:
# Take an explicit copy so that a new column can be added later without a SettingWithCopyWarning
unlabeled_test_df = recipe_df_filtered[recipe_df_filtered['Meal_Type'].isna()].copy()

In [179]:
X_unlabeled = unlabeled_test_df['NER_list']
y_unlabeled = unlabeled_test_df['Meal_Type']

In [180]:
# Reuse the vectorizer already fitted on X_train so the features match those the classifier was trained on
X_unlabeled_vectorized = vectorizer.transform(X_unlabeled)

In [181]:
predicted_labels = classifier.predict(X_unlabeled_vectorized)

In [182]:
unlabeled_test_df['Predicted_Label'] = predicted_labels


In [183]:
unlabeled_test_df.head()

Unnamed: 0,title,ingredients,directions,link,source,NER,serving_size,NER_list,ingredient_counter,Meal_Type,Predicted_Label
0,Creamy Corn,"[""2 (16 oz.) pkg. frozen corn"", ""1 (8 oz.) pkg. cream cheese, cubed"", ""1/3 c. butter, cubed"", ""1/2 tsp. garlic powder"", ""1/2 tsp. salt"", ""1/4 tsp. pepper""]","[""In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=10570,Gathered,"[""frozen corn"", ""cream cheese"", ""butter"", ""garlic powder"", ""salt"", ""pepper""]",6,"['frozen corn', 'cream cheese', 'butter', 'garlic powder', 'salt', 'pepper']",6,,dinner
5,Creole Flounder,"[""2 lb. flounder or pollack fillets"", ""1 1/2 c. chopped tomatoes"", ""1/2 c. chopped green pepper"", ""1/3 c. lemon juice"", ""1 Tbsp. salad oil"", ""2 tsp. salt"", ""2 tsp. minced onion"", ""1 tsp. basil leaves"", ""1/4 tsp. coarsely ground black pepper"", ""4 drops red pepper sauce"", ""green pepper rings""]","[""Heat oven to 500\u00b0."", ""Place fillets in single layer in baking dish, 13 1/2 x 9 x 2-inch."", ""Stir together remaining ingredients except pepper rings. Spoon over fillets."", ""Bake 5 to 8 minutes or until fish flakes easily with fork."", ""Remove fillets to warm platter. Garnish with green pepper rings. Makes 4 to 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=580768,Gathered,"[""flounder"", ""tomatoes"", ""green pepper"", ""lemon juice"", ""salad oil"", ""salt"", ""onion"", ""basil"", ""ground black pepper"", ""drops red pepper sauce"", ""green pepper""]",6,"['flounder', 'tomatoes', 'green pepper', 'lemon juice', 'salad oil', 'salt', 'onion', 'basil', 'ground black pepper', 'drops red pepper sauce', 'green pepper']",11,,dinner
8,Sesame Ginger Chicken,"[""1 Tbsp. sesame seed, toasted"", ""2 tsp. grated ginger"", ""2 Tbsp. honey"", ""2 Tbsp. reduced-sodium soy sauce"", ""4 (4 oz.) skinned chicken breast halves"", ""vegetable cooking spray"", ""thin green onion strips""]","[""Combine first 4 ingredients in a small bowl, stir well and set aside."", ""Place chicken between 2 sheets of waxed paper or heavy duty plastic wrap, and flatten to 1/4-inch thickness, using meat mallet or rolling pin."", ""Coat grill rack with cooking spray; place on grill over coals."", ""Place chicken on rack and cook 4 minutes on each side, basting frequently with soy sauce mixture."", ""Transfer chicken to a serving platter and garnish with green onion, if desired."", ""Yields 4 servings, about 200 calories.""]",www.cookbooks.com/Recipe-Details.aspx?id=352931,Gathered,"[""sesame seed"", ""grated ginger"", ""honey"", ""soy sauce"", ""chicken"", ""vegetable cooking spray"", ""thin green onion strips""]",4,"['sesame seed', 'grated ginger', 'honey', 'soy sauce', 'chicken', 'vegetable cooking spray', 'thin green onion strips']",7,,dinner
10,Sweet-N-Sour Chicken,"[""2 c. diced cooked chicken"", ""2 Tbsp. shortening"", ""1/2 c. onion (large chunks)"", ""2 c. carrot chunks"", ""1 1/4 c. water"", ""3 chicken bouillon cubes"", ""1/4 c. packed brown sugar"", ""2 Tbsp. cornstarch"", ""1/4 tsp. ginger"", ""1/4 c. catsup"", ""2 Tbsp. vinegar"", ""1 Tbsp. soy sauce"", ""1 c. green pepper (large chunks)"", ""8 oz. pineapple chunks""]","[""Saute onion in 2 tablespoons shortening."", ""Add carrots, water and bouillon cubes."", ""Simmer 5 minutes."", ""Combine next 6 ingredients."", ""Add to vegetable mixture and cook until clear."", ""Add pepper, pineapple and chicken."", ""Cover and simmer 5 minutes or until heated through."", ""Serve with rice or angel hair noodles. Serves 6 people.""]",www.cookbooks.com/Recipe-Details.aspx?id=228506,Gathered,"[""chicken"", ""shortening"", ""onion"", ""carrot chunks"", ""water"", ""chicken"", ""brown sugar"", ""cornstarch"", ""ginger"", ""catsup"", ""vinegar"", ""soy sauce"", ""green pepper"", ""pineapple""]",6,"['chicken', 'shortening', 'onion', 'carrot chunks', 'water', 'chicken', 'brown sugar', 'cornstarch', 'ginger', 'catsup', 'vinegar', 'soy sauce', 'green pepper', 'pineapple']",14,,dinner
13,Zucchini-Artichoke Continental,"[""1 (9 oz.) pkg. frozen artichoke hearts"", ""2 Tbsp. water"", ""3 medium zucchini (1 lb.), sliced 1/4-inch thick (4 c.)"", ""2 c. fresh mushrooms, halved"", ""2 Tbsp. finely chopped green onion"", ""2 cloves garlic, minced"", ""1 Tbsp. margarine or butter"", ""2 medium tomatoes, cut into wedges and seeded"", ""1/4 c. grated Parmesan cheese""]","[""In 2-quart microwave-safe casserole, microcook artichokes and water, covered, on 100% power (High) for 3 to 4 minutes or until thawed."", ""Stir."", ""Add next 3 ingredients."", ""Cover."", ""Cook on High for 9 to 11 minutes (low-wattage oven for 12 to 14 minutes) or just until tender; stir once."", ""Drain well."", ""Stir in garlic, 1/2 teaspoon salt and 1/4 teaspoon pepper."", ""Dot with margarine or butter."", ""Cover; cook on High for 1 minute."", ""Stir in tomatoes; sprinkle with cheese."", ""Let stand 2 minutes."", ""Makes 6 servings.""]",www.cookbooks.com/Recipe-Details.aspx?id=669859,Gathered,"[""frozen artichoke"", ""water"", ""zucchini"", ""fresh mushrooms"", ""green onion"", ""garlic"", ""margarine"", ""tomatoes"", ""Parmesan cheese""]",6,"['frozen artichoke', 'water', 'zucchini', 'fresh mushrooms', 'green onion', 'garlic', 'margarine', 'tomatoes', 'Parmesan cheese']",9,,dinner


**Let's put our labels together in our original dataframe**

In [185]:
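# Fill the missing Meal_Type values with the predicted labels (fillna aligns the two Series on the row index)
recipe_df_filtered['Meal_Type'] = recipe_df_filtered['Meal_Type'].fillna(unlabeled_test_df['Predicted_Label'])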

In [186]:
recipe_df_filtered.isna().sum()

title                 0
ingredients           0
directions            0
link                  0
source                0
NER                   0
serving_size          0
NER_list              0
ingredient_counter    0
Meal_Type             0
dtype: int64

All of our recipes have now been labeled!
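
The fill step works because `fillna` matches a Series argument to the target column by index label, not by position. The snippet below is a minimal sketch of that behaviour on toy data; the `recipes` and `unlabeled` frames are hypothetical stand-ins for `recipe_df_filtered` and `unlabeled_test_df`, and the predicted labels are made up for illustration.

```python
import pandas as pd

# Toy stand-in for recipe_df_filtered: rows 1 and 3 have no Meal_Type yet.
recipes = pd.DataFrame(
    {
        "title": ["Pancakes", "Creamy Corn", "Lemonade", "Creole Flounder"],
        "Meal_Type": ["breakfast", None, "drinks", None],
    }
)

# Toy stand-in for unlabeled_test_df: only the unlabeled rows, original index kept.
# Taking an explicit copy means the new column is added to an independent frame.
unlabeled = recipes[recipes["Meal_Type"].isna()].copy()
unlabeled["Predicted_Label"] = ["dinner", "dinner"]  # made-up predictions

# fillna with a Series matches on index labels, so each prediction
# lands on the row it was made for and labeled rows are left untouched.
recipes["Meal_Type"] = recipes["Meal_Type"].fillna(unlabeled["Predicted_Label"])

print(recipes["Meal_Type"].tolist())
```

Because the match is by index, this only behaves as intended while the unlabeled subset keeps the original index; resetting the index on either frame before the fill would pair predictions with the wrong recipes.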

## 5. Saving the Data <a name="saving"></a>

Let's save the cleaned and preprocessed dataset to disk for use in the next steps. This saves time later on: we won't have to repeat all of these steps, and the cleaned data will be readily available for the EDA and modelling phases.

In [190]:
# Save the clean dataset
recipe_df_filtered.to_pickle('recipe_df_filtered.pkl')

In [310]:
# Also save a CSV copy of the clean dataset (without the index column)
recipe_df_filtered.to_csv('recipe_df_filtered.csv', index=False, sep=',', lineterminator='\n')
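
The two formats behave differently when the data is read back. The sketch below illustrates this on a toy frame, assuming `NER_list` holds Python lists: pickle restores them as lists, while CSV stores their text representation, which has to be parsed again (here with `ast.literal_eval`). The `demo.pkl` and `demo.csv` file names and the temporary directory are placeholders, not the notebook's actual files.

```python
import ast
import os
import tempfile

import pandas as pd

# Toy stand-in for recipe_df_filtered with one list-valued column.
df = pd.DataFrame(
    {
        "title": ["Creamy Corn"],
        "NER_list": [["frozen corn", "cream cheese"]],
        "ingredient_counter": [2],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "demo.pkl")  # placeholder file name
    csv_path = os.path.join(tmp, "demo.csv")  # placeholder file name

    df.to_pickle(pkl_path)
    df.to_csv(csv_path, index=False)

    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# Pickle keeps the Python objects; CSV keeps only their text form.
print(type(from_pickle.loc[0, "NER_list"]))
print(type(from_csv.loc[0, "NER_list"]))

# Parse the text back into real lists before using the CSV copy.
from_csv["NER_list"] = from_csv["NER_list"].apply(ast.literal_eval)
```

Loading the pickle is therefore the simpler choice for the next notebook, while the CSV copy is convenient for inspecting the data outside Python.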

## 6. Conclusion <a name="conclusion"></a>

In summary, this notebook covers loading the dataset, gaining an initial understanding of its fields and data types, and cleaning and preprocessing the data.

Here is the list of the changes and actions we performed:

* Cleaned up missing values and checked for duplicates.
* Extracted the serving size and converted it to a single unit of measure (e.g. dozens to single units).
* Created a dataframe containing only the subset of recipes that state a serving size, as required for this project.
* Added a variable containing the number of ingredients in each recipe.
* Labeled each recipe by meal type: breakfast, lunch, dinner, drinks, dessert, and salad.

With the dataset now cleaned and preprocessed, we can move on to the next phase of this project: EDA and modelling.