### Part 0: Basic Data Cleaning
The first step is to do some basic data cleaning and rid of all the columns that won't be of any use acrross any of the projects going forward, and add some useful columns to the dataset based on the existing ones that will come handy in both Data Analysis and ML/NLP.

* **Drop:** 
['Name', 'AuthorName', 'CookTime', 'PrepTime', 'TotalTime', 'DatePublished', 'Description', 'Images', 'ReviewCount']

* **Add:**
['TotalMinutes', 'YearPublished', 'MonthPublished', 'DayPublished', 'HourPublished']

* **Replace:**
['RecipeIngredientQuantities', 'RecipeIngredientParts'] with ones scraped from food.com froms scratch.

**Save:**
BasicCleanData.parquet 

We can perform classical data analysis on BasicCleanData.parquet


#### Imports and sanity checks

In [1]:
import sys
sys.executable

'C:\\Users\\mathe\\anaconda3\\envs\\deepchef\\python.exe'

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

In [3]:
# This allows scrolling through all the columns. Useful for dataframes with too many columns.
pd.set_option('display.max_columns', 100)

In [39]:
recipes = pd.read_parquet('../recipes.parquet')

In [40]:
recipes.sample(2)

Unnamed: 0,RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions
374495,388030.0,Brodie's Spiced Rum Ice Cream Floats,570444,Chef Lisa S,,PT3M,PT3M,2009-09-01 11:02:00+00:00,Make and share this Brodie's Spiced Rum Ice Cr...,[],Beverages,"[Low Protein, Low Cholesterol, Healthy, < 15 M...","[2, 10, None, 2]","[spiced rum, lime wedge, vanilla ice cream]",,,390.1,8.0,4.9,31.7,83.7,45.7,0.5,40.7,2.7,1.0,1 drink,[Add ice cream to your cup pour Rum and Soda o...
326836,339187.0,Ribboned Zucchini Salad,283251,dicentra,,PT15M,PT15M,2008-11-24 01:03:00+00:00,Make and share this Ribboned Zucchini Salad re...,[],Vegetable,"[Summer, < 15 Mins]","[2, 1, 2, 2, 2, 2, 1, 1⁄4, 3, 1⁄2, 1⁄2, 1⁄2, 1...","[zucchini, salt, extra virgin olive oil, fresh...",,,90.6,6.0,0.8,1.1,551.5,8.6,3.2,3.3,2.9,6.0,,[Cut zucchini lengthwise into 1/8-inch-thick s...


In [41]:
recipes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 522517 entries, 0 to 522516
Data columns (total 28 columns):
 #   Column                      Non-Null Count   Dtype              
---  ------                      --------------   -----              
 0   RecipeId                    522517 non-null  float64            
 1   Name                        522517 non-null  object             
 2   AuthorId                    522517 non-null  int32              
 3   AuthorName                  522517 non-null  object             
 4   CookTime                    439972 non-null  object             
 5   PrepTime                    522517 non-null  object             
 6   TotalTime                   522517 non-null  object             
 7   DatePublished               522517 non-null  datetime64[ns, UTC]
 8   Description                 522512 non-null  object             
 9   Images                      522516 non-null  object             
 10  RecipeCategory              521766 non-null 

In [42]:
recipes.describe()

Unnamed: 0,RecipeId,AuthorId,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings
count,522517.0,522517.0,269294.0,275028.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,339606.0
mean,271821.43697,45725850.0,4.632014,5.227784,484.43858,24.614922,9.559457,86.487003,767.2639,49.089092,3.843242,21.878254,17.46951,8.606191
std,155495.878422,292971400.0,0.641934,20.381347,1397.116649,111.485798,46.622621,301.987009,4203.621,180.822062,8.603163,142.620191,40.128837,114.319809
min,38.0,27.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
25%,137206.0,69474.0,4.5,1.0,174.2,5.6,1.5,3.8,123.3,12.8,0.8,2.5,3.5,4.0
50%,271758.0,238937.0,5.0,2.0,317.1,13.8,4.7,42.6,353.3,28.2,2.2,6.4,9.1,6.0
75%,406145.0,565828.0,5.0,4.0,529.1,27.4,10.8,107.9,792.2,51.1,4.6,17.9,25.0,8.0
max,541383.0,2002886000.0,5.0,3063.0,612854.6,64368.1,26740.6,130456.4,1246921.0,108294.6,3012.0,90682.3,18396.2,32767.0


#### Adding recipe urls to the dataframe
We will first reconstruct the recipe urls from the original recipes dataset. 
* We can use these urls to check recipe data recorded in the dataset and the actual info on the respective recipe webpages.
* We also use these links to scrape food.com in order to upgrade the ingredients (currently ongoing in another notebook).

In [43]:
recipes['url']= recipes['Name'].apply(lambda x: x.replace(' ','-')+'-')
recipes['url']

0                        Low-Fat-Berry-Blue-Frozen-Dessert-
1                                                  Biryani-
2                                            Best-Lemonade-
3                           Carina's-Tofu-Vegetable-Kebabs-
4                                             Cabbage-Soup-
                                ...                        
522512                      Meg's-Fresh-Ginger-Gingerbread-
522513    Roast-Prime-Rib-au-Poivre-with-Mixed-Peppercorns-
522514                               Kirshwasser-Ice-Cream-
522515            Quick-&-Easy-Asian-Cucumber-Salmon-Rolls-
522516                             Spicy-Baked-Scotch-Eggs-
Name: url, Length: 522517, dtype: object

In [44]:
recipes['url'] = recipes[['url', 'RecipeId']].apply(lambda x: 'https://www.food.com/recipe/' + x['url'] + str(int(x['RecipeId'])), axis=1)
recipes['url']

0         https://www.food.com/recipe/Low-Fat-Berry-Blue...
1                    https://www.food.com/recipe/Biryani-39
2              https://www.food.com/recipe/Best-Lemonade-40
3         https://www.food.com/recipe/Carina's-Tofu-Vege...
4               https://www.food.com/recipe/Cabbage-Soup-42
                                ...                        
522512    https://www.food.com/recipe/Meg's-Fresh-Ginger...
522513    https://www.food.com/recipe/Roast-Prime-Rib-au...
522514    https://www.food.com/recipe/Kirshwasser-Ice-Cr...
522515    https://www.food.com/recipe/Quick-&-Easy-Asian...
522516    https://www.food.com/recipe/Spicy-Baked-Scotch...
Name: url, Length: 522517, dtype: object

In [45]:
#recipes.to_csv('recipes_with_urls.pkl')

In [46]:
#recipes = pd.read_parquet('../recipes_with_urls.parquet')

In [47]:
recipes.sample(2)

Unnamed: 0,RecipeId,Name,AuthorId,AuthorName,CookTime,PrepTime,TotalTime,DatePublished,Description,Images,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url
299215,310841.0,Mediterranean Baked Halibut,586469,Borealis Beegirl,PT25M,PT10M,PT35M,2008-06-30 00:59:00+00:00,Make and share this Mediterranean Baked Halibu...,[],Halibut,"[Healthy, < 60 Mins]","[2, 4, 1, 1⁄3, 2, 2 2⁄3]","[plum tomatoes, halibut steaks, onion, capers,...",,,627.5,11.8,1.7,131.8,624.4,35.7,3.2,2.9,88.6,4.0,,"[Preheat oven to 350°F., Arrange half of the t...",https://www.food.com/recipe/Mediterranean-Bake...
126357,132792.0,Ensalada De Noche Buena - Christmas Eve Salad,120566,mariposa13,,PT20M,PT20M,2005-08-08 23:33:00+00:00,Make and share this Ensalada De Noche Buena - ...,[],Pineapple,"[Apple, Tropical Fruits, Fruit, Mexican, Low P...","[1, 2, 2, 1, 1, 1, 1, None, 1⁄2, None]","[fresh pineapple, pineapple chunks, oranges, b...",,,199.9,6.4,0.9,0.0,23.0,35.5,6.0,24.0,5.1,,,"[Remove crown of fresh pineapple., Peel pineap...",https://www.food.com/recipe/Ensalada-De-Noche-...


In [48]:
recipes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 522517 entries, 0 to 522516
Data columns (total 29 columns):
 #   Column                      Non-Null Count   Dtype              
---  ------                      --------------   -----              
 0   RecipeId                    522517 non-null  float64            
 1   Name                        522517 non-null  object             
 2   AuthorId                    522517 non-null  int32              
 3   AuthorName                  522517 non-null  object             
 4   CookTime                    439972 non-null  object             
 5   PrepTime                    522517 non-null  object             
 6   TotalTime                   522517 non-null  object             
 7   DatePublished               522517 non-null  datetime64[ns, UTC]
 8   Description                 522512 non-null  object             
 9   Images                      522516 non-null  object             
 10  RecipeCategory              521766 non-null 

In [49]:
recipes.describe()

Unnamed: 0,RecipeId,AuthorId,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings
count,522517.0,522517.0,269294.0,275028.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,522517.0,339606.0
mean,271821.43697,45725850.0,4.632014,5.227784,484.43858,24.614922,9.559457,86.487003,767.2639,49.089092,3.843242,21.878254,17.46951,8.606191
std,155495.878422,292971400.0,0.641934,20.381347,1397.116649,111.485798,46.622621,301.987009,4203.621,180.822062,8.603163,142.620191,40.128837,114.319809
min,38.0,27.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
25%,137206.0,69474.0,4.5,1.0,174.2,5.6,1.5,3.8,123.3,12.8,0.8,2.5,3.5,4.0
50%,271758.0,238937.0,5.0,2.0,317.1,13.8,4.7,42.6,353.3,28.2,2.2,6.4,9.1,6.0
75%,406145.0,565828.0,5.0,4.0,529.1,27.4,10.8,107.9,792.2,51.1,4.6,17.9,25.0,8.0
max,541383.0,2002886000.0,5.0,3063.0,612854.6,64368.1,26740.6,130456.4,1246921.0,108294.6,3012.0,90682.3,18396.2,32767.0


In [50]:
recipes.isna().sum()

RecipeId                           0
Name                               0
AuthorId                           0
AuthorName                         0
CookTime                       82545
PrepTime                           0
TotalTime                          0
DatePublished                      0
Description                        5
Images                             1
RecipeCategory                   751
Keywords                           0
RecipeIngredientQuantities         0
RecipeIngredientParts              0
AggregatedRating              253223
ReviewCount                   247489
Calories                           0
FatContent                         0
SaturatedFatContent                0
CholesterolContent                 0
SodiumContent                      0
CarbohydrateContent                0
FiberContent                       0
SugarContent                       0
ProteinContent                     0
RecipeServings                182911
RecipeYield                   348071
R

#### Dropping Reduntant Columns <a class ='author' id='part-0'></a>
`TotalTime` is the sum of `CookTime` and `PrepTime`. Plus, the latter two seem to be missing from the recipes on the webpages. I'll just drop `CookTime` and `PrepTime`.

In [51]:
recipes.drop(['CookTime', 'PrepTime'], axis=1,inplace=True)

In [52]:
recipes.sample(2)

Unnamed: 0,RecipeId,Name,AuthorId,AuthorName,TotalTime,DatePublished,Description,Images,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url
349123,362032.0,Chocolate Chip Cookie and Cream Tart,353579,pattikay in L.A.,PT37M,2009-03-21 02:36:00+00:00,The Essential Chocolate Chip Cookbook is a fab...,[https://img.sndimg.com/food/image/upload/w_55...,Tarts,"[Dessert, Cookie & Brownie, < 60 Mins, Easy]","[1, 2, 1⁄2, 1⁄2, 1⁄2, 1⁄2, 6, 1, 1, 1, 1⁄4, 1,...","[flour, flour, baking soda, salt, butter, brow...",,,463.8,30.4,18.6,89.4,278.8,48.5,2.0,34.4,4.0,,1 tart,[For the Crust: Preheat the oven to 350. But...,https://www.food.com/recipe/Chocolate-Chip-Coo...
409504,424460.0,Alice in Wonderland (Non-Alcoholic),56003,Darkhunter,PT5M,2010-05-10 15:48:00+00:00,Make and share this Alice in Wonderland (Non-A...,[],Beverages,"[Moroccan, African, < 15 Mins, Easy]","[3 1⁄2, 1, 3⁄4, 1⁄2, None, None]","[grapefruit juice, lemon juice, soda water, wh...",,,47.5,0.1,0.0,0.0,1.3,11.8,0.2,10.3,0.6,1.0,1 drink,[Combine all ingredients. Pour into a pretty ...,https://www.food.com/recipe/Alice-in-Wonderlan...


`AuthorName` has the numeric equivalent of `AuthorId`, so we drop it. Similar for `Name`, which has the equivalent of `RecipeId`. We will eventually also drop `url` but for now we keep it as it serves us.

In [53]:
recipes.drop(['Name', 'AuthorName'], axis=1,inplace=True)

In [54]:
recipes.sample(3)

Unnamed: 0,RecipeId,AuthorId,TotalTime,DatePublished,Description,Images,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url
351388,364351.0,885416,PT16M,2009-04-03 17:22:00+00:00,Quick and easy to put together and tastes grea...,[https://img.sndimg.com/food/image/upload/w_55...,< 30 Mins,"[Beginner Cook, Easy]","[6 -8, 7, 7, 3, None]","[mayonnaise, parmesan cheese, seasoning salt]",5.0,6.0,318.0,14.3,3.9,108.1,414.2,6.5,0.0,1.7,41.2,4.0,,[lay all the fillets on a baking sheet sprayed...,https://www.food.com/recipe/Tilapia-With-Mayon...
57564,61865.0,6357,PT2H45M,2003-05-09 20:01:00+00:00,This is from today's Thursday magazine and is ...,[],Asian,"[Indian, Spicy, Weeknight, Stove Top, Small Ap...","[1, 3⁄4, 2, 2, 2, 1⁄2, None, 1, 1, 1, None, 6,...","[onion, garlic, red chili powder, ginger, vine...",,,381.4,3.6,0.8,92.5,504.5,33.9,6.0,14.4,53.2,4.0,,"[Cut the fish into equal sized square pieces.,...",https://www.food.com/recipe/Red-Indian-Fish-Ma...
371311,384765.0,62264,PT15M,2009-08-09 21:38:00+00:00,This sounds so elegant &amp; simple - such a l...,[https://img.sndimg.com/food/image/upload/w_55...,Lunch/Snacks,"[Chard, Greens, Vegetable, European, Low Prote...","[1, 3, 1, 1, 1⁄3, 2, None, None]","[swiss chard, butter, olive oil, fresh rosemar...",5.0,2.0,189.5,15.2,6.2,22.9,267.4,13.8,2.2,8.4,2.8,4.0,,[Remove the chard stems and the thick central ...,https://www.food.com/recipe/French-Swiss-Chard...


`DatePublished` has too much info in it. Instead we turn it into `YearPublished`, `MonthPublished` and `DayPublished`. 

We can later on use these to derive insights on what days, months and years havae the highest rate of published recipes, and so on.

In [55]:
recipes['DatePublished'].apply(lambda x: x.hour)

0         21
1         13
2         19
3         14
4          6
          ..
522512    15
522513    15
522514    15
522515    22
522516    22
Name: DatePublished, Length: 522517, dtype: int64

In [56]:
recipes['YearPublished'] = recipes['DatePublished'].apply(lambda x: x.year)
recipes['MonthPublished'] = recipes['DatePublished'].apply(lambda x: x.month)
recipes['DayPublished'] = recipes['DatePublished'].apply(lambda x: x.day)
recipes['HourPublished'] = recipes['DatePublished'].apply(lambda x: x.hour)

In [57]:
recipes.drop(['DatePublished'],axis=1,inplace=True)

In [58]:
recipes.sample(3)

Unnamed: 0,RecipeId,AuthorId,TotalTime,Description,Images,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url,YearPublished,MonthPublished,DayPublished,HourPublished
101479,107070.0,154044,PT30M,This dessert tastes like chocolate truffle can...,[],< 30 Mins,"[Oven, Refrigerator]","[5, 10, 2, 8, 1⁄4, 2, 3⁄4, 2, None]","[eggs, butter, Grand Marnier, frozen unsweeten...",,,361.9,27.3,16.1,128.8,142.4,30.8,5.9,23.0,5.1,12.0,,"[Preheat oven to 400 degrees., Butter the bott...",https://www.food.com/recipe/Chocolate-Grand-Ma...,2004,12,28,20
479346,497001.0,2727281,PT1H,5-Ingredient Fix Contest Entry.\r\nIt is suita...,[],Potato,"[Vegetable, < 60 Mins, Easy]","[5, 3⁄4, 1 1⁄2, 12, 2]","[butter, half-and-half cream, swiss cheese, cr...",,,298.8,27.5,17.4,81.4,201.9,3.4,0.0,0.5,10.4,,,[Preheat oven to 350. ** After potatoes have ...,https://www.food.com/recipe/Mashed-Potato-Cass...,2013,3,9,11
489278,507319.0,1020526,PT50M,Simple but delicious. A great way to use up t...,[],Vegetable,"[Low Cholesterol, Weeknight, < 60 Mins]","[2, 1⁄4, 1 1⁄2, 8, 1⁄4, 4, 1⁄4, 1⁄4, None]","[coarse salt, coarse salt, tomatoes, Italian p...",5.0,1.0,366.9,14.8,2.1,0.0,3646.4,50.3,4.0,6.0,9.2,4.0,,"[In a tall stockpot, bring 3 quarts of water a...",https://www.food.com/recipe/Spaghetti-With-Tom...,2013,9,28,16


Now let's turn the `TotalTime` to numbers (in minutes). At the moment the values of this column look like one of the following: 'PT3H30M', 'PT3H', 'PT20M'

In [59]:
re.findall('\dH|\d*M','PT3H30M')

['3H', '30M']

In [60]:
[string.replace('H','') for string in re.findall('\dH|\d*M','PT3H30M')]

['3', '30M']

In [61]:
result = [int(x.replace('H', '')) * 60 if 'H' in x else int(x.replace('M', '')) for x in re.findall('\d+H|\d+M', 'PT3H30M')]
result

[180, 30]

In [62]:
recipes['TotalMinutes'] = recipes['TotalTime'].apply(lambda string: re.findall('\dH|\d*M', string))
recipes['TotalMinutes'] = recipes['TotalMinutes'].apply(lambda timelist: [int(x.replace('H', '')) * 60 if 'H' in x else int(x.replace('M', '')) for x in timelist])
recipes['TotalMinutes'] = recipes['TotalMinutes'].apply(lambda timelist: sum(timelist))
recipes['TotalMinutes']

0         285
1         265
2          35
3         260
4          50
         ... 
522512     95
522513    210
522514    240
522515     15
522516     40
Name: TotalMinutes, Length: 522517, dtype: int64

In [63]:
recipes.drop(['TotalTime'],axis=1,inplace=True)

In [64]:
recipes.sample(2)

Unnamed: 0,RecipeId,AuthorId,Description,Images,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url,YearPublished,MonthPublished,DayPublished,HourPublished,TotalMinutes
80038,85070.0,119422,Make and share this My Mashed Potatoes recipe ...,[https://img.sndimg.com/food/image/upload/w_55...,Low Protein,"[Low Cholesterol, Healthy, < 30 Mins, Easy]","[4, 3⁄4, 1, 1⁄4, 3]","[russet potatoes, buttermilk, salt, pepper, ma...",5.0,3.0,411.3,4.5,0.9,4.7,735.3,84.2,10.0,6.5,10.8,,,[Place a few (small handful) of clean potato p...,https://www.food.com/recipe/My-Mashed-Potatoes...,2004,2,26,20,30
443874,460250.0,37779,Make and share this Catfish Ceviche recipe fro...,[],Catfish,[None],"[1, 1, 1⁄2, 1, 1⁄2, 2, 1, 1⁄2, 2, 2, 1, 1, 1, ...","[catfish fillet, lemon zest, fresh lemon juice...",,,207.7,14.1,2.4,41.5,467.1,9.4,3.2,2.9,12.8,6.0,,"[In a large resealable plastic bag, combine th...",https://www.food.com/recipe/Catfish-Ceviche-46...,2011,7,12,20,300


We won't be using `Images` anywhere in our projects, so I'll remove the column. (For now I'll keep `url` because it helps double checking recipe entries using the actual recipe url; I'll later drop that column too when we get to do ML.)

In [65]:
recipes.drop(['Images'],axis=1,inplace=True)

In [66]:
recipes.sample(2)

Unnamed: 0,RecipeId,AuthorId,Description,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url,YearPublished,MonthPublished,DayPublished,HourPublished,TotalMinutes
444180,460562.0,1072593,The original indigenous cheese of New Orleans ...,Dessert,"[Creole, Easy, From Scratch]","[2, 1⁄4, 12]","[skim milk, buttermilk]",,,102.9,0.7,0.4,5.2,151.7,13.9,0.0,0.4,9.9,8.0,,"[Pour milk into a completely sanitized, 3 to 4...",https://www.food.com/recipe/Creole-Cream-Chees...,2011,7,19,11,240
516503,535145.0,2001984495,Garlic Chili Spicy Edamame are an easy to prep...,Asian,"[< 15 Mins, Easy]","[10, 4, 1⁄2, 1, 1, 2, 1]","[edamame, garlic cloves, mayonnaise, soy sauce...",,,230.9,10.8,1.3,0.0,357.4,18.0,6.1,0.2,19.4,2.0,,[1. In a small fry pan on medium low heat add ...,https://www.food.com/recipe/Garlic-Chili-Edama...,2018,2,12,22,10


In [67]:
recipes.isna().sum()

RecipeId                           0
AuthorId                           0
Description                        5
RecipeCategory                   751
Keywords                           0
RecipeIngredientQuantities         0
RecipeIngredientParts              0
AggregatedRating              253223
ReviewCount                   247489
Calories                           0
FatContent                         0
SaturatedFatContent                0
CholesterolContent                 0
SodiumContent                      0
CarbohydrateContent                0
FiberContent                       0
SugarContent                       0
ProteinContent                     0
RecipeServings                182911
RecipeYield                   348071
RecipeInstructions                 0
url                                0
YearPublished                      0
MonthPublished                     0
DayPublished                       0
HourPublished                      0
TotalMinutes                       0
d

### Dealing with categories

In [68]:
recipes['RecipeCategory'].unique(), recipes['RecipeCategory'].nunique()

(array(['Frozen Desserts', 'Chicken Breast', 'Beverages', 'Soy/Tofu',
        'Vegetable', 'Pie', 'Chicken', 'Dessert', 'Southwestern U.S.',
        'Sauces', 'Stew', 'Black Beans', '< 60 Mins', 'Lactose Free',
        'Weeknight', 'Yeast Breads', 'Whole Chicken', 'High Protein',
        'Cheesecake', 'Free Of...', 'High In...', 'Brazilian', 'Breakfast',
        'Breads', 'Bar Cookie', 'Brown Rice', 'Oranges', 'Pork',
        'Low Protein', 'Asian', 'Potato', 'Cheese', 'Halibut', 'Meat',
        'Lamb/Sheep', 'Very Low Carbs', 'Spaghetti', 'Scones',
        'Drop Cookies', 'Lunch/Snacks', 'Beans', 'Punch Beverage',
        'Pineapple', 'Low Cholesterol', '< 30 Mins', 'Quick Breads',
        'Sourdough Breads', 'Curries', 'Chicken Livers', 'Coconut',
        'Savory Pies', 'Poultry', 'Steak', 'Healthy', 'Lobster', 'Rice',
        'Apple', 'Broil/Grill', 'Spreads', 'Crab', 'Jellies', 'Pears',
        'Chowders', 'Cauliflower', 'Candy', 'Chutneys', 'White Rice',
        'Tex Mex', 'Bass',

We have 311 categories. Turning these into numerical values will add many dimensions to our dataframe. We can reduce these catgeories into some more major categories. Here's a suggestion:


**Desserts**: Frozen Desserts, Cheesecake, Pie, Dessert, Cheesecake, Gelatin, Candy, Jellies, Tarts, Sweet, Chocolate Chip Cookies, Bread Pudding, Lemon Cake, Key Lime Pie, Coconut Cream Pie, Ice Cream, Fruit Desserts, Apple Pie, Pumpkin, Coconut Cream Pie.

**Chicken**: Chicken Breast, Chicken, Chicken Thigh & Leg, Chicken Livers, Whole Chicken, Roast Chicken, Chicken Crock Pot.

**Beverages**: Beverages, Punch Beverage, Smoothies, Shakes.

**Vegetarian/Vegan**: Soy/Tofu, Vegetable, Vegan.

**Sauces/Condiments**: Sauces, Salad Dressings, Spreads, Chutneys.

**Meat**: Pork, Lamb/Sheep, Meat, Meatballs, Beef Organ Meats, Steak, Ground Meat, Roast Beef, Ham, Ground Beef, Ground Turkey.

**Seafood**: Halibut, Lobster, Crab, Crawfish, Bass, Tuna, Trout, Catfish, Squid, Mahi Mahi, Oysters, Salmon.

**International Cuisines**: Asian, Brazilian, Greek, German, Hungarian, Indonesian, Mexican, Dutch, Spanish, Russian, Thai, Cajun, Chinese, Turkish, Vietnamese, Lebanese, Moroccan, Korean, Polish, Scandinavian, African, Norwegian, Belgian, Australian, Scottish, Cuban, Portuguese, Hawaiian, Austrian, Egyptian, Filipino, Welsh, Czech, Iraqi, Pakistani, Chilean, Puerto Rican, Ecuadorean, Sudanese, Mongolian, Peruvian, Cambodian, Honduran, Sudanese, Mongolian, Peruvian.

**Side Dishes**: Potatoes, Rice, Grains, Pasta, Breads, Corn, Lentil, Yam/Sweet Potato, Greens, Collard Greens, Spinach, Chard, Artichoke, Mashed Potatoes.

**Breakfast/Brunch**: Breakfast, Breakfast Eggs, Brunch.

In [69]:
category_mapping = {
    'Frozen Desserts': 'Desserts',
    'Chicken Breast': 'Chicken',
    'Beverages': 'Beverages',
    'Soy/Tofu': 'Vegetarian/Vegan',
    'Vegetable': 'Vegetables',
    'Pie': 'Desserts',
    'Chicken': 'Chicken',
    'Dessert': 'Desserts',
    'Southwestern U.S.': 'Regional',
    'Sauces': 'Sauces/Condiments',
    'Stew': 'Main Dish',
    'Black Beans': 'Beans/Legumes',
    '< 60 Mins': 'Quick and Easy',
    'Lactose Free': 'Special Dietary Needs',
    'Weeknight': 'Quick and Easy',
    'Yeast Breads': 'Baked Goods',
    'Whole Chicken': 'Chicken',
    'High Protein': 'Healthy',
    'Cheesecake': 'Desserts',
    'Free Of...': 'Special Dietary Needs',
    'High In...': 'Healthy',
    'Brazilian': 'International',
    'Breakfast': 'Breakfast/Brunch',
    'Breads': 'Baked Goods',
    'Bar Cookie': 'Desserts',
    'Brown Rice': 'Nuts/Seeds/Grains',
    'Oranges': 'Fruit',
    'Pork': 'Meat',
    'Low Protein': 'Special Dietary Needs',
    'Asian': 'International',
    'Potato': 'Side Dishes',
    'Cheese': 'Dairy',
    'Halibut': 'Seafood',
    'Meat': 'Meat',
    'Lamb/Sheep': 'Meat',
    'Very Low Carbs': 'Healthy',
    'Spaghetti': 'Pasta',
    'Scones': 'Breads',
    'Drop Cookies': 'Desserts',
    'Lunch/Snacks': 'Lunch',
    'Beans': 'Beans/Legumes',
    'Punch Beverage': 'Beverages',
    'Pineapple': 'Fruit',
    'Low Cholesterol': 'Healthy',
    '< 30 Mins': 'Quick and Easy',
    'Quick Breads': 'Baked Goods',
    'Sourdough Breads': 'Baked Goods',
    'Curries': 'International',
    'Chicken Livers': 'Chicken',
    'Coconut': 'Fruit',
    'Savory Pies': 'Main Dish',
    'Poultry': 'Chicken',
    'Steak': 'Meat',
    'Healthy': 'Healthy',
    'Lobster': 'Seafood',
    'Rice': 'Nuts/Seeds/Grains',
    'Apple': 'Fruit',
    'Broil/Grill': 'Cooking Methods',
    'Spreads': 'Sauces/Condiments',
    'Crab': 'Seafood',
    'Jellies': 'Sauces/Condiments',
    'Pears': 'Fruit',
    'Chowders': 'Soups',
    'Cauliflower': 'Vegetables',
    'Candy': 'Desserts',
    'Chutneys': 'Sauces/Condiments',
    'White Rice': 'Nuts/Seeds/Grains',
    'Tex Mex': 'Regional',
    'Bass': 'Seafood',
    'German': 'International',
    'Fruit': 'Fruit',
    'European': 'International',
    'Smoothies': 'Beverages',
    'Hungarian': 'International',
    'Manicotti': 'Pasta',
    'Onions': 'Vegetables',
    'New Zealand': 'International',
    'Chicken Thigh & Leg': 'Chicken',
    'Indonesian': 'International',
    'Greek': 'International',
    'Corn': 'Vegetables',
    'Lentil': 'Beans/Legumes',
    'Summer': 'Seasonal',
    'Long Grain Rice': 'Nuts/Seeds/Grains',
    'Southwest Asia (middle East)': 'International',
    'Spanish': 'International',
    'Dutch': 'International',
    'Gelatin': 'Desserts',
    'Tuna': 'Seafood',
    'Citrus': 'Fruit',
    'Berries': 'Fruit',
    'Peppers': 'Vegetables',
    'Salad Dressings': 'Sauces/Condiments',
    'Clear Soup': 'Soups',
    'Mexican': 'International',
    'Raspberries': 'Fruit',
    'Crawfish': 'Seafood',
    'Beef Organ Meats': 'Meat',
    'Strawberry': 'Fruit',
    'Shakes': 'Beverages',
    'Short Grain Rice': 'Nuts/Seeds/Grains',
    '< 15 Mins': 'Quick and Easy',
    'One Dish Meal': 'Main Dish',
    'Spicy': 'Flavor Profiles',
    'Thai': 'International',
    'Cajun': 'Regional',
    'Oven': 'Cooking Methods',
    'Microwave': 'Cooking Methods',
    'Russian': 'International',
    'Melons': 'Fruit',
    'Papaya': 'Fruit',
    'Veal': 'Meat',
    'No Cook': 'Quick and Easy',
    '< 4 Hours': 'Quick and Easy',
    None: 'Uncategorized',
    'Roast': 'Cooking Methods',
    'Potluck': 'Occasions',
    'Orange Roughy': 'Seafood',
    'Canadian': 'International',
    'Caribbean': 'International',
    'Mussels': 'Seafood',
    'Medium Grain Rice': 'Nuts/Seeds/Grains',
    'Japanese': 'International',
    'Penne': 'Pasta',
    'Easy': 'Quick and Easy',
    'Elk': 'Meat',
    'Colombian': 'International',
    'Gumbo': 'Soups',
    'Roast Beef': 'Meat',
    'Perch': 'Seafood',
    'Vietnamese': 'International',
    'Rabbit': 'Meat',
    'Christmas': 'Occasions',
    'Lebanese': 'International',
    'Turkish': 'International',
    'Kid Friendly': 'Family-Friendly',
    'Vegan': 'Vegetarian/Vegan',
    'For Large Groups': 'Occasions',
    'Whole Turkey': 'Poultry',
    'Chinese': 'International',
    'Grains': 'Nuts/Seeds/Grains',
    'Yam/Sweet Potato': 'Side Dishes',
    'Native American': 'Regional',
    'Meatloaf': 'Meat',
    'Winter': 'Seasonal',
    'Trout': 'Seafood',
    'African': 'International',
    'Ham': 'Meat',
    'Goose': 'Poultry',
    'Pasta Shells': 'Pasta',
    'Stocks': 'Soups',
    "St. Patrick's Day": 'Occasions',
    'Meatballs': 'Meat',
    'Whole Duck': 'Poultry',
    'Scandinavian': 'International',
    'Greens': 'Vegetables',
    'Catfish': 'Seafood',
    'Dehydrator': 'Cooking Methods',
    'Duck Breasts': 'Poultry',
    'Savory': 'Flavor Profiles',
    'Stir Fry': 'Main Dish',
    'Polish': 'International',
    'Spring': 'Seasonal',
    'Deer': 'Meat',
    'Wild Game': 'Meat',
    'Pheasant': 'Meat',
    'No Shell Fish': 'Seafood',
    'Collard Greens': 'Vegetables',
    'Tilapia': 'Seafood',
    'Quail': 'Poultry',
    'Refrigerator': 'Preservation',
    'Canning': 'Preservation',
    'Moroccan': 'International',
    'Pressure Cooker': 'Cooking Methods',
    'Squid': 'Seafood',
    'Korean': 'International',
    'Plums': 'Fruit',
    'Danish': 'International',
    'Creole': 'Regional',
    'Mahi Mahi': 'Seafood',
    'Tarts': 'Desserts',
    'Spinach': 'Vegetables',
    'Hawaiian': 'Regional',
    'Homeopathy/Remedies': 'Healthy',
    'Austrian': 'International',
    'Thanksgiving': 'Occasions',
    'Moose': 'Meat',
    'Bath/Beauty': 'Healthy',
    'Swedish': 'International',
    'High Fiber': 'Healthy',
    'Kosher': 'Special Dietary Needs',
    'Norwegian': 'International',
    'Household Cleaner': 'Household',
    'Ethiopian': 'International',
    'Belgian': 'International',
    'Australian': 'International',
    'Pennsylvania Dutch': 'Regional',
    'Bear': 'Meat',
    'Scottish': 'International',
    'Tempeh': 'Vegetarian/Vegan',
    'Cuban': 'International',
    'Turkey Breasts': 'Poultry',
    'Cantonese': 'International',
    'Tropical Fruits': 'Fruit',
    'Peanut Butter': 'Sauces/Condiments',
    'Szechuan': 'International',
    'Portuguese': 'International',
    'Summer Dip': 'Appetizers',
    'Costa Rican': 'International',
    'Duck': 'Poultry',
    'Sweet': 'Flavor Profiles',
    'Nuts': 'Nuts/Seeds/Grains',
    'Filipino': 'International',
    'Welsh': 'International',
    'Camping': 'Outdoor Cooking',
    'Pot Pie': 'Main Dish',
    'Polynesian': 'International',
    'Mango': 'Fruit',
    'Cherries': 'Fruit',
    'Egyptian': 'International',
    'Chard': 'Vegetables',
    'Lime': 'Flavor Profiles',
    'Lemon': 'Flavor Profiles',
    'Brunch': 'Breakfast/Brunch',
    'Toddler Friendly': 'Family-Friendly',
    'Kiwifruit': 'Fruit',
    'Whitefish': 'Seafood',
    'South American': 'International',
    'Malaysian': 'International',
    'Octopus': 'Seafood',
    'Nigerian': 'International',
    'Mixer': 'Cooking Methods',
    'Venezuelan': 'International',
    'Halloween': 'Occasions',
    'Stove Top': 'Cooking Methods',
    'Bread Machine': 'Baked Goods',
    'French Toast': 'Breakfast/Brunch',
    'French Canadian': 'Regional',
    'Sauerkraut': 'Vegetables',
    'West Virginia': 'Regional',
    'Cooker': 'Cooking Methods',
    'Jewish': 'International',
    'Leek': 'Vegetables',
    'Asian Greens': 'Vegetables',
    'Buffalo': 'Meat',
    'Smoothie': 'Beverages',
    'Indian': 'International',
    'Cooking For One': 'Quick and Easy',
    'Kansas': 'Regional',
    'Carrot': 'Vegetables',
    'Australian And New Zealand': 'International',
    'Canadian Bacon': 'Meat',
    'Zucchini': 'Vegetables',
    'Flounder': 'Seafood',
    'Fijian': 'International',
    'Winter Squash': 'Vegetables',
    'Israeli': 'International',
    'Ethnic': 'International',
    'Eggplant': 'Vegetables',
    'Afghan': 'International',
    'Barbecue': 'Cooking Methods',
    'Vegetarian': 'Vegetarian/Vegan',
    'Main Dish': 'Main Dish',
    'Missouri': 'Regional',
    'Salmon': 'Seafood',
    'Pesto': 'Sauces/Condiments',
    'Braised': 'Cooking Methods',
    'Czech': 'International',
    'Salads': 'Salads',
    'Soul Food': 'Regional',
    'Swiss': 'International',
    'Jamaican': 'International',
    'Easter': 'Occasions',
    'Tex-Mex': 'Regional',
    'Northeastern United States': 'Regional',
    'Swiss Cheese': 'Dairy',
    'Pacific Northwestern': 'Regional',
    'Czechoslovakian': 'International',
    'Meals': 'Main Dish',
    'Microwave Appetizers': 'Appetizers',
    'Northwestern United States': 'Regional',
    'Moravian': 'International',
    'Special Occasion': 'Occasions',
    'California': 'Regional',
    'Mandarin Oranges': 'Fruit',
    'Pennsylvania': 'Regional',
    'Brazil': 'International',
    'Thai Sweet Rice': 'Nuts/Seeds/Grains',
    'Freezer': 'Preservation',
    'Cornish Hens': 'Poultry',
    'Arizona': 'Regional',
    'Pacific Islands': 'International',
    'Rhode Island': 'Regional',
    'Georgian': 'International',
    'Pork Tenderloin': 'Meat',
    'No-Cook': 'Quick and Easy',
    'Basque': 'International',
    'Thanksgiving Leftovers': 'Occasions',
    'Avocado': 'Fruit',
    'Alcoholic': 'Beverages',
    'Hamburger': 'Meat',
    'Michigan': 'Regional',
    'Red Beans And Rice': 'Beans/Legumes',
    'Pan Grilling': 'Cooking Methods',
    'Deep Fryer': 'Cooking Methods',
    'Muffins': 'Baked Goods',
    'Pan Frying': 'Cooking Methods',
    'English': 'International',
    'Pressure Cookers': 'Cooking Methods',
    'High Calcium': 'Healthy',
    'Low Saturated Fat': 'Healthy',
    'Game': 'Meat',
    'Gluten-Free': 'Special Dietary Needs',
    'Wheat': 'Nuts/Seeds/Grains',
    'Finnish': 'International',
    'New England': 'Regional',
    'Swedish Meatballs': 'Meat',
    'Algerian': 'International',
    'Pacific Rim': 'International',
    'Thermomix': 'Cooking Methods',
    'Nuts/Seeds': 'Nuts/Seeds/Grains',
    'Vegetables': 'Vegetables',
    'Apple Pie': 'Desserts',
    'Jerky': 'Meat',
    'Condiments, Etc.': 'Sauces/Condiments',
    'New York': 'Regional',
    'Colombia': 'International',
    'Chicago Style': 'Regional',
    'Mediterranean': 'International',
    'Irish': 'International',
    'Pressure Canning': 'Preservation',
    'Middle Eastern': 'International',
    'Plants': 'Vegetarian/Vegan',
    'Southwestern': 'Regional',
    'Jam': 'Sauces/Condiments',
    'Peaches': 'Fruit',
    'Egg-Free': 'Special Dietary Needs',
    'Eastern European': 'International',
    'Soft Drinks': 'Beverages',
    'Picnics': 'Outdoor Cooking',
    'Kiwi': 'Fruit',
    'Ice Cream': 'Desserts',
    'Turkey': 'Poultry',
    'Cherry': 'Fruit',
    'Vegetable Casserole': 'Vegetables',
    'Goat': 'Meat',
    'Dressings': 'Sauces/Condiments',
    'Cabbage': 'Vegetables',
    'Romaine': 'Vegetables',
    'Low Fat': 'Healthy',
    'Sausage': 'Meat',
    'Roasts': 'Meat',
    'Casseroles': 'Main Dish',
    'North American': 'International',
    'High Potassium': 'Healthy',
    'Soups': 'Soups',
    'Main Dishes': 'Main Dish',
    'Crisps': 'Desserts',
    'French Canadian Tourtiere': 'Regional',
    'Irish Soda Bread': 'Baked Goods',
    'Loaves': 'Baked Goods',
    'Crepes': 'Breakfast/Brunch',
    'Potatoes': 'Vegetables',
    'Rhubarb': 'Vegetables',
    'Salmon Lox': 'Seafood',
    'Apricot': 'Fruit',
    'Bbq': 'Cooking Methods',
    'Herb And Spice Mixes': 'Sauces/Condiments',
    'Low Calorie': 'Healthy',
    'Salmon Fillets': 'Seafood',
    'Apricots': 'Fruit',
    'South Carolina': 'Regional',
    'Shrimp': 'Seafood',
    'Chinese Five-Spice': 'Spices/Seasonings',
    'Grains/Cereals': 'Nuts/Seeds/Grains',
    'Honduran': 'International',
    'Chilean': 'International',
    'Flat Shell Fish': 'Seafood',
    'Portuguese Sausage': 'Meat',
    'Cinnamon': 'Spices/Seasonings',
    'Swiss Chard': 'Vegetables',
    'Bulgarian': 'International',
    'Champagne': 'Beverages',
    'Mashed Potatoes': 'Side Dishes',
    'Vermont': 'Regional',
    'Finger Food': 'Appetizers',
    'Side Dish': 'Side Dishes',
    'Steamed': 'Cooking Methods',
    'Raspberry': 'Fruit',
    'Berries And Currants': 'Fruit',
    'Kentucky': 'Regional',
    'Ethnic Foods': 'International',
    'New Hampshire': 'Regional',
    'Alfredo': 'Pasta',
    'Whole Chicken': 'Poultry',
    'North Dakota': 'Regional',
    'Gelatin Desserts': 'Desserts',
    'Iowa': 'Regional',
    'Spreads': 'Sauces/Condiments',
    'Dried Beans': 'Beans/Legumes',
    'Fruit': 'Fruit',
    'Oklahoma': 'Regional',
    'Pennsylvania Dutch Cooking': 'Regional',
    'Broccoli': 'Vegetables',
    'California Style': 'Regional',
    'Fish': 'Seafood',
    'Crab': 'Seafood',
    'Vegetarian/Vegan': 'Vegetarian/Vegan',
    'Brisket': 'Meat',
    'Jewish Holidays': 'Occasions',
    'Mussels/Squid': 'Seafood',
    'Wok': 'Cooking Methods',
    'St. Louis': 'Regional',
    'Breads': 'Baked Goods',
    'Polenta': 'Nuts/Seeds/Grains',
    'Rice Cooker': 'Cooking Methods',
    'Arizona Style': 'Regional',
    'Cucumber': 'Vegetables',
    'Pineapple': 'Fruit',
    'Cheese': 'Dairy',
    'Omelets': 'Breakfast/Brunch',
    'Cantaloupe': 'Fruit',
    'Pancakes And Waffles': 'Breakfast/Brunch',
    'Danish Pastry': 'Baked Goods',
    'Cherry Tomatoes': 'Vegetables',
    'Freshwater Fish': 'Seafood',
    'Lunch/Snacks': 'Lunch/Snacks',
    'Cornmeal': 'Nuts/Seeds/Grains',
    'Squash': 'Vegetables',
    'Meat': 'Meat',
    'Polynesian/Hawaiian': 'Regional',
    'High Protein': 'Healthy',
    'Chutneys': 'Sauces/Condiments',
    'Southwestern United States': 'Regional',
    'Wine': 'Beverages',
    'Smoothies': 'Beverages',
    'South Dakota': 'Regional',
    'High Fiber Cereals': 'Nuts/Seeds/Grains',
    'Chowders': 'Soups',
    'Chiles': 'Spices/Seasonings',
    'Lamb': 'Meat',
    'Mangoes': 'Fruit',
    'Belgian Waffle': 'Breakfast/Brunch',
    'Jamaican Patties': 'International',
    'Mozzarella': 'Dairy',
    'Fish Fry': 'Main Dish',
    'Swiss Fondue': 'International',
    'Jellies': 'Sauces/Condiments',
    'Southwest': 'Regional',
    'Lettuce': 'Vegetables',
    'Poppy Seeds': 'Nuts/Seeds/Grains',
    'Hummus': 'Sauces/Condiments',
    'Icing/Frosting': 'Desserts',
    'Lobster': 'Seafood',
    'St. Patrick\'s Day': 'Occasions',
    'Food Processor/Blender': 'Cooking Methods',
    'Hamburgers': 'Meat',
    'Lemon Juice': 'Flavor Profiles',
    'Valentine\'s Day': 'Occasions',
    'Cranberries': 'Fruit',
    'North Carolina': 'Regional',
    'Baked Goods': 'Baked Goods',
    'Poultry': 'Poultry',
    'Root Vegetables': 'Vegetables',
    'Tamales': 'International',
    'Vegetarian And Vegan': 'Vegetarian/Vegan',
    'Oats': 'Nuts/Seeds/Grains',
    'Brazilian': 'International',
    'High Vitamin C': 'Healthy',
    'Southern': 'Regional',
    'Hawaiian': 'International',
    'Kiwi Fruit': 'Fruit',
    'Ice Cream Maker': 'Cooking Methods',
    'South': 'Regional',
    'Creole/Cajun': 'Regional',
    'Pork': 'Meat',
    'American': 'International',
    'Moroccan Chicken': 'International',
    'Chicken Breasts': 'Poultry',
    'Austrian/German/Swiss': 'International',
    'Baked Potato': 'Side Dishes',
    'Pineapple Juice': 'Flavor Profiles',
    'Lunch': 'Lunch/Snacks',
    'Peanuts': 'Nuts/Seeds/Grains',
    'Mushrooms': 'Vegetables',
    'Smoker': 'Cooking Methods',
    'Stir-Fry': 'Main Dish',
    'Northwest': 'Regional',
    'Breakfast/Brunch': 'Breakfast/Brunch',
    'Chinese': 'International',
    'Hot Dogs/Poultry': 'Poultry',
    'Mixed Drinks': 'Beverages',
    'Grilled Cheese': 'Sandwiches',
    'South African': 'International',
    'Pakistani': 'International',
    'Pakistani And Indian': 'International',
    'Oranges': 'Fruit',
    'Jewish Cuisine': 'International',
    'Peppers': 'Vegetables',
    'Alaska': 'Regional',
    'Jewish Holidays And Events': 'Occasions',
    'Baked Beans': 'Beans/Legumes',
    'Low Sodium': 'Healthy',
    'Smoothie Bowl': 'Beverages',
    'Southern United States': 'Regional',
    'Alaskan King Crab': 'Seafood',
    'Diabetic': 'Special Dietary Needs',
    'Mideast': 'International',
    'Crock Pot': 'Cooking Methods',
    'Sourdough': 'Baked Goods',
    'German': 'International',
    'West Virginia Style': 'Regional',
    'Fish And Seafood': 'Seafood',
    'Puerto Rican': 'International',
    'Minnesota': 'Regional',
    'Okra': 'Vegetables',
    'Bass': 'Seafood',
    'Panfish': 'Seafood',
    'West': 'Regional',
    'Pumpkin': 'Vegetables',
    'Cajun/Creole': 'Regional',
    'Bundt Cake': 'Desserts',
    'Mexican': 'International',
    'Northwest Usa': 'Regional',
    'Congo': 'International',
    'Alcohol': 'Beverages',
    'Christmas': 'Occasions',
    'Czech Republic': 'International',
    'Vinegar': 'Sauces/Condiments',
    'Soy': 'Vegetarian/Vegan',
    'Sushi': 'International',
    'Crockpot': 'Cooking Methods',
    'California/Mexican': 'Regional',
    'Coffee': 'Beverages',
    'Jerk': 'International',
    'Cheddar Cheese': 'Dairy',
    'Minnesota Style': 'Regional',
    'Ranch Dressing': 'Sauces/Condiments',
    'West Coast': 'Regional',
    'Bavarian': 'International',
    'Spanish': 'International',
    'Middle East': 'International',
    'Southeast Asian': 'International',
    'Cheese Balls': 'Appetizers',
    'Bar Cookies': 'Desserts',
    'Zucchini And Yellow Squash': 'Vegetables',
    'Thai': 'International',
    'Latin American': 'International',
    'Peruvian': 'International',
    'Chocolate': 'Desserts',
    'Corn': 'Vegetables',
    'Seafood': 'Seafood',
    'Cucumber Salad': 'Salads',
    'Greek': 'International',
    'Veal': 'Meat',
    'Beef': 'Meat',
    'Southern Us': 'Regional',
    'Central American': 'International',
    'Scones': 'Baked Goods',
    'Beverages': 'Beverages',
    'Pumpkin Seeds': 'Nuts/Seeds/Grains',
    'Indian Subcontinent': 'International',
    'Italian': 'International',
    'Pork Chops': 'Meat',
    'Curry': 'International',
    'Caribbean': 'International',
    'Caribbean And West Indian': 'International',
    'Chinese Regional': 'International',
    'Hawaiian And Pacific Islands': 'International',
    'Canning/Preserving': 'Preservation',
    'Cookies': 'Desserts',
    'Cookies And Brownies': 'Desserts',
    'Hamburger Patties': 'Meat',
    'Sugar-Free': 'Special Dietary Needs',
    'Grapes': 'Fruit',
    'Meatloaf': 'Meat',
    'Greek Style': 'International',
    'Duck': 'Poultry',
    'Egg Nog': 'Beverages',
    'Bhutan': 'International',
    'Spice Blends': 'Spices/Seasonings',
    'Raisins': 'Fruit',
    'Rye': 'Nuts/Seeds/Grains',
    'Omelet/Frittatas': 'Breakfast/Brunch',
    'Canadian': 'International',
    'Ground Beef': 'Meat',
    'Turkey Leftovers': 'Meat',
    'Hummus And Pita': 'Sauces/Condiments',
    'Broccoli Rabe': 'Vegetables',
    'Polish': 'International',
    'Beans And Peas': 'Beans/Legumes',
    'Butternut Squash': 'Vegetables',
    'Cheddar': 'Dairy',
    'Butter': 'Dairy',
    'Sweet Potatoes/Yams': 'Vegetables',
    'Sesame': 'Nuts/Seeds/Grains',
    'Fish Fillets': 'Seafood',
    'New Mexico': 'Regional',
    'Broth': 'Soups',
    'Crock Pot/Slow Cooker': 'Cooking Methods',
    'Russian': 'International',
    'Tuna': 'Seafood',
    'Artichokes': 'Vegetables',
    'Finnish/Nordic': 'International',
    'Low Cholesterol': 'Healthy',
    'Irish Soda Bread Ii': 'Baked Goods',
    'Salsa': 'Sauces/Condiments',
    'North Carolina Style': 'Regional',
    'Nebraska': 'Regional',
    'Creole': 'Regional',
    'Iced/Cold Beverages': 'Beverages',
    'Southern Style': 'Regional',
    'Iowa Style': 'Regional',
    'Low Carbohydrate': 'Healthy',
    'Creole/Creole And Cajun': 'Regional',
    'Brazilian Favourites': 'International',
    'Asian': 'International',
    'Yogurt': 'Dairy',
    'Oregon': 'Regional',
    'Hamburgers/Hot Dogs': 'Meat',
    'Dairy': 'Dairy',
    'Low Protein': 'Healthy',
    'Freezer': 'Preservation',
    'Buttermilk': 'Dairy',
    'Jam/Jelly': 'Sauces/Condiments',
    'Candy': 'Desserts',
    'Main Dish': 'Main Dish',
    'Easy': 'Easy',
    'Korean': 'International',
    'Oktoberfest': 'Occasions',
    'Lobster/Crab/Shrimp': 'Seafood',
    'English': 'International',
    'Belizean': 'International',
    'Californian': 'Regional',
    'Lebanese': 'International',
    'South American': 'International',
    'Thanksgiving': 'Occasions',
    'Indian': 'International',
    'Fish And Chips': 'Seafood',
    'Vegetables/Fruits': 'Vegetables',
    'Vegetarian': 'Vegetarian/Vegan',
    'Kansas': 'Regional',
    'Salsa/Hot Sauces': 'Sauces/Condiments',
    'Salads': 'Salads',
    'Poultry And Game Birds': 'Poultry',
    'Sauces/Condiments': 'Sauces/Condiments',
    'Thanksgiving Leftovers': 'Meat',
    'Juices': 'Beverages',
    'Chile': 'International',
    'Garlic': 'Flavor Profiles',
    'Candy/Candy Making': 'Desserts',
    'Dutch Oven': 'Cooking Methods',
    'Condiments': 'Sauces/Condiments',
    'Main Course': 'Main Dish',
    'South American And Central American': 'International',
    'English/Irish/Scottish': 'International',
    'No-Cook': 'No-Cook',
    'Maryland': 'Regional',
    'Preservation': 'Preservation',
    'Greece': 'International',
    'Nut-Free': 'Special Dietary Needs',
    'Asian/Asian And Indian': 'International',
    'Jamaican': 'International',
    'German Regional': 'International',
    'French': 'International',
    'Yeast Breads': 'Baked Goods',
    'Scandinavian': 'International',
    'Minnesota Recipes': 'Regional',
    'Cake Mixes': 'Baked Goods',
    'Pacific Northwest': 'Regional',
    'Sweet Corn': 'Vegetables',
    'Cake Decorating': 'Desserts',
    'Moroccan': 'International',
    'Dairy-Free': 'Special Dietary Needs',
    'Icelandic': 'International',
    'European': 'International',
    'Meringue': 'Desserts',
    'Low Carb': 'Healthy',
    'Chickpeas': 'Beans/Legumes',
    'Low Sodium Main Dishes': 'Healthy',
    'Potato Salad': 'Salads',
    'Tarts': 'Desserts',
    'Low Sodium Desserts': 'Healthy',
    'New York Style': 'Regional',
    'Cheesecake': 'Desserts',
    'Candy Bars': 'Desserts',
    'North Carolina Style Bbq Sauce': 'Sauces/Condiments',
    'Condiment': 'Sauces/Condiments',
    'Creole And Cajun': 'Regional',
    'Illinois': 'Regional',
    'South African Cuisine': 'International',
    'Mexican/Southwestern': 'Regional',
    'Pacific Rim/Asian': 'International',
    'African': 'International',
    'Shellfish': 'Seafood',
    'English And Irish': 'International',
    'Lentils': 'Beans/Legumes',
    'Ethiopian': 'International',
    'East Indian': 'International',
    'African American': 'International',
    'German And Austrian': 'International',
    'Microwave': 'Cooking Methods',
    'Hawaiian Regional': 'Regional',
    'Mediterranean': 'International',
    'Quick Breads': 'Baked Goods',
    'Honduran': 'International',
    'Snacks': 'Lunch/Snacks',
    'Swiss': 'International',
    'Caribbean And Jamaican': 'International',
    'East Coast': 'Regional',
    'Chinese Regional And Chinese': 'International',
    'Bakery': 'Baked Goods',
    'Kansas City': 'Regional',
    'Party': 'Occasions',
    'Asian/Asian And Pacific Rim': 'International',
    'Southern/Cajun And Creole': 'Regional',
    'Greek Regional': 'International',
    'Valentine\'s Day And Romantic': 'Occasions',
    'Indian And South Asian': 'International',
    'Seafood/Fish': 'Seafood',
    'Caribbean And Latin American': 'International',
    'Beef Roast': 'Meat',
    'German And Austrian And Swiss': 'International',
    'Pasta': 'Pasta',
    'Baking': 'Baked Goods',
    'Potato': 'Vegetables',
    'Pork Loin': 'Meat',
    'Cajun': 'Regional',
    'Peruvian And Bolivian': 'International',
    'Turkey': 'Meat',
    'Ireland': 'International',
    'High Protein Low Carb': 'Healthy',
    'Indian And South African': 'International',
    'Asian/Indian': 'International',
    'Indian Subcontinent And Pakistan': 'International',
    'Potatoes': 'Vegetables',
    'Special Diets': 'Special Dietary Needs',
    'International': 'International',
    'Cabbage': 'Vegetables',
    'Stir-Fries': 'Main Dish',
    'Czechoslovakian': 'International',
    'New England': 'Regional',
    'Asian/Chinese': 'International',
    'Szechuan/Sichuan': 'International',
    'Czech': 'International',
    'Chile Pepper': 'Spices/Seasonings',
    'Microwave Cooking': 'Cooking Methods',
    'Mid-Atlantic': 'Regional',
    'Pizza': 'Main Dish',
    'Caribbean And Puerto Rican': 'International',
    'Pennsylvania': 'Regional',
    'Soups': 'Soups',
    'Iceland': 'International',
    'Low Cholesterol Desserts': 'Healthy',
    'Cocktails': 'Beverages',
    'Easy Main Dish': 'Easy',
    'Sauce': 'Sauces/Condiments',
    'German And Austrian': 'International',
    'Peruvian And Ecuadorian': 'International',
    'Nuts/Seeds': 'Nuts/Seeds/Grains',
    'Kentucky': 'Regional',
    'Colorado': 'Regional',
    'Asian/Japanese': 'International',
    'Japanese': 'International',
    'Jewish': 'International',
    'Middle Eastern': 'International',
    'Baking Mixes': 'Baked Goods',
    'Low Fat': 'Healthy',
    'Alabama': 'Regional',
    'Cheese Appetizers': 'Appetizers',
    'Jewish And Kosher': 'International',
    'Cakes': 'Desserts',
    'Southwestern': 'Regional',
    'Appetizers': 'Appetizers',
    'Alcoholic': 'Beverages',
    'Czechoslovakian And German': 'International',
    'Desserts': 'Desserts',
    'Maryland Regional': 'Regional',
    'Deli': 'Sandwiches',
    'Chile Pepper And Chile Pepper Sauce': 'Spices/Seasonings',
    'Seafood/Fish And Seafood': 'Seafood',
    'Oklahoma': 'Regional',
    'Salads/Salads And Dressings': 'Salads',
    'New England And Mid-Atlantic': 'Regional',
    'Dairy And Eggs': 'Dairy',
    'Soul Food': 'Regional',
    'Swedish': 'International',
    'Alcoholic Beverages': 'Beverages',
    'Eggs': 'Dairy',
    'Iowa': 'Regional',
    'Arizona': 'Regional',
    'Brazilian And South American': 'International',
    'Lunch/Snacks': 'Lunch/Snacks',
    'Noodles': 'Pasta',
    'Hot Drinks': 'Beverages',
    'Texas': 'Regional',
    'Maryland And Virginia': 'Regional',
    'Pacific Northwest And Western': 'Regional',
    'Poultry': 'Poultry',
    'British Isles': 'International',
    'Polish And Eastern European': 'International',
    'Apples': 'Fruit',
    'Italian Regional': 'International',
    'Gluten-Free': 'Special Dietary Needs',
    'Oregon Regional': 'Regional',
    'Mexican Regional': 'Regional',
    'Austrian': 'International',
    'Southwest': 'Regional',
    'Low Fat Main Dishes': 'Healthy',
    'Casserole': 'Main Dish',
    'Southern/Cajun And Creole And Cajun': 'Regional',
    'Eastern European': 'International',
    'Asian/Indian And South Asian': 'International',
    'Casseroles': 'Main Dish',
    'Noodles And Pasta': 'Pasta',
    'Breads': 'Baked Goods',
    'Sauces': 'Sauces/Condiments',
    'Quick': 'Quick',
    'Southwestern And Mexican': 'Regional',
    'New England And Eastern European': 'Regional',
    'Appetizer': 'Appetizers',
    'California': 'Regional',
    'Curries': 'International',
    'Baking Soda': 'Baking Ingredients',
    'Southwestern And Mexican And Tex-Mex': 'Regional',
    'Pennsylvania Dutch': 'International',
    'Southwestern And Mexican And Southwestern': 'Regional',
    'Caribbean And Central American': 'International',
    'Southern/Cajun And Creole And Southern': 'Regional',
    'Arizona And New Mexican': 'Regional',
    'Midwestern': 'Regional',
    'Middle Eastern And Israeli': 'International',
    'Southwestern And Mexican And Mexican': 'Regional',
    'German And Eastern European': 'International',
    'Dairy And Poultry': 'Dairy',
    'Eggs And Dairy': 'Dairy',
    'German And Polish': 'International',
    'British': 'International',
    'Pasta And Noodles': 'Pasta',
    'Irish': 'International',
    'Chinese': 'International',
    'Muffins': 'Baked Goods',
    'Southwestern And Tex-Mex': 'Regional',
    'Eastern European And Russian': 'International',
    'Dessert Sauces': 'Sauces/Condiments',
    'Jewish And Passover': 'International',
    'Northwestern': 'Regional',
    'Northern Italian': 'International',
    'Taco': 'Main Dish',
    'Italian Regional And Italian': 'International',
    'Crock-Pot': 'Cooking Methods',
    'Breads/Bread Machine': 'Baked Goods',
    'Salads/Salads And Vegetables': 'Salads',
    'Cabbage And Corned Beef': 'Meat',
    'Crepes': 'Breakfast/Brunch',
    'Southern/Cajun And Creole And Cajun And Creole': 'Regional',
    'Chocolate Chip': 'Desserts',
    'Sour Cream': 'Dairy',
    'Caribbean And Cuban': 'International',
    'Mexican And Southwestern': 'Regional',
    'Eastern European And German': 'International',
    'German': 'International',
    'Condiments And Sauces': 'Sauces/Condiments',
    'Southern/Cajun And Creole And Southern And Cajun And Creole': 'Regional',
    'Pies': 'Desserts',
    'German And Austrian And German And Austrian': 'International',
    'Healthy': 'Healthy',
    'Low Sodium': 'Healthy',
    'Scandinavian And Swedish': 'International',
    'Eastern European And Hungarian': 'International',
    'German And Austrian And Polish': 'International',
    'German And Austrian And Swiss And Swiss': 'International',
    'Middle Eastern And Jewish': 'International',
    'Peanut Butter': 'Nuts/Seeds/Grains',
    'Southern/Cajun And Creole And Southern And Creole': 'Regional',
    'Fruit': 'Fruit',
    'Southern/Cajun And Creole And Cajun And Creole And Southern': 'Regional',
    'Dips': 'Appetizers',
    'Thai And Southeast Asian': 'International',
    'South American And Mexican': 'International',
    'Quick And Easy': 'Quick',
    'Low Sodium Main Dishes And Healthy': 'Healthy',
    'Canning': 'Preservation',
    'Mexican And South American': 'International',
    'California And Southwestern': 'Regional',
    'Czech And Eastern European': 'International',
    'California And American': 'Regional',
    'Southern/Cajun And Creole And Southern And Creole And Cajun': 'Regional',
    'Greek And Italian': 'International',
    'Low Fat Desserts': 'Healthy',
    'North Dakota': 'Regional',
    'German And Polish And Eastern European': 'International',
    'Jewish And Hanukkah': 'International',
    'Artichoke': 'Vegetables',
    'Bean Soup': 'Soups',
    'Beef Liver': 'Meat',
    'Beginner Cook': 'Uncategorized',
    'Birthday': 'Occasions',
    'Black Bean Soup': 'Soups',
    'Bread Pudding': 'Desserts',
    'Breakfast Casseroles': 'Breakfast/Brunch',
    'Breakfast Eggs': 'Breakfast/Brunch',
    'Broccoli Soup': 'Soups',
    'Buttermilk Biscuits': 'Baked Goods',
    'Cambodian': 'International',
    'Chicken Crock Pot': 'Chicken',
    'Chocolate Chip Cookies': 'Desserts',
    'Coconut Cream Pie': 'Desserts',
    'Dairy Free Foods': 'Special Dietary Needs',
    'Deep Fried': 'Cooking Methods',
    'Desserts Fruit': 'Desserts',
    'Ecuadorean': 'International',
    'Egg Free': 'Special Dietary Needs',
    'Fish Salmon': 'Seafood',
    'Fish Tuna': 'Seafood',
    'From Scratch': 'Cooking Methods',
    'Guatemalan': 'International',
    'Ham And Bean Soup': 'Soups',
    'Hanukkah': 'Occasions',
    'Hunan': 'International',
    'Inexpensive': 'Budget',
    'Iraqi': 'International',
    'Key Lime Pie': 'Desserts',
    'Labor Day': 'Occasions',
    'Lemon Cake': 'Desserts',
    'Macaroni And Cheese': 'Pasta',
    'Main Dish Casseroles': 'Main Dish',
    'Margarita': 'Beverages',
    'Memorial Day': 'Occasions',
    'Mongolian': 'International',
    'Mushroom Soup': 'Soups',
    'Nepalese': 'International',
    'Oatmeal': 'Breakfast/Brunch',
    'Oysters': 'Seafood',
    'Palestinian': 'International',
    'Peanut Butter Pie': 'Desserts',
    'Pot Roast': 'Meat',
    'Potato Soup': 'Soups',
    'Roast Beef Crock Pot': 'Meat',
    'Small Appliance': 'Cooking Methods',
    'Snacks Sweet': 'Desserts',
    'Somalian': 'International',
    'Soups Crock Pot': 'Soups',
    'Spaghetti Sauce': 'Sauces/Condiments',
    'Steam': 'Cooking Methods',
    'Sudanese': 'International',
    'Turkey Gravy': 'Poultry',
    'Wheat Bread': 'Baked Goods',
    'Appetizers, Dietary Restrictions': 'Special Dietary Needs',
    'Beans/Legumes': 'Beans/Legumes',
    'Beef, Cooking Methods': 'Meat',
    'Beverage': 'Beverages',
    'Cake, Dessert': 'Desserts',
    'Casseroles, Main Dish': 'Main Dish',
    'Chicken, Cooking Methods': 'Chicken',
    'Cookies, Dessert': 'Desserts',
    'Cooking Methods': 'Cooking Methods',
    'Cooking Skill Level': 'Uncategorized',
    'Cooking Times': 'Cooking Times',
    'Cuisine': 'International',
    'Cost': 'Budget',
    'Dessert, Fruit': 'Desserts',
    'Dietary Restrictions': 'Special Dietary Needs',
    'Family-Friendly': 'Occasions',
    'Flavor Profiles': 'Flavor Profiles',
    'Gravy, Turkey': 'Poultry',
    'Health/Wellness': 'Healthy',
    'Household': 'Uncategorized',
    'Occasion': 'Occasions',
    'Occasions': 'Occasions',
    'Outdoor Cooking': 'Occasions',
    'Pasta, Cheese, Main Dish': 'Pasta',
    'Pie, Dessert': 'Desserts',
    'Quick and Easy': 'Quick and Easy',
    'Regional': 'Regional',
    'Sauce, Pasta': 'Pasta',
    'Seasonal':'Seasonal',
    'Side Dishes': 'Side Dishes',
    'Snacks, Dessert': 'Desserts',
    'Soup': 'Soups',
    'Soup, Cooking Methods' : 'Soups',
    'Special Dietary Needs': 'Special Dietary Needs',
    'Uncategorized': 'Uncategorized',
    'Gluten Free Appetizers': 'Special Dietary Needs',
    'Easy': 'Quick and Easy',
    'Family-Friendly': 'Occasions',
    'Outdoor Cooking': 'Occasions',
    

}

Checking if we didn't cover anything:

In [70]:
set(recipes['RecipeCategory'].unique()) - set(category_mapping.keys()) 

set()

In [71]:
recipes['RecipeCategory'] = recipes['RecipeCategory'].map(category_mapping)

In [72]:
recipes['RecipeCategory'].unique(), recipes['RecipeCategory'].nunique()

(array(['Desserts', 'Chicken', 'Beverages', 'Vegetarian/Vegan',
        'Vegetables', 'Regional', 'Sauces/Condiments', 'Main Dish',
        'Beans/Legumes', 'Quick and Easy', 'Special Dietary Needs',
        'Baked Goods', 'Poultry', 'Healthy', 'International',
        'Breakfast/Brunch', 'Nuts/Seeds/Grains', 'Fruit', 'Meat', 'Dairy',
        'Seafood', 'Pasta', 'Lunch/Snacks', 'Cooking Methods', 'Soups',
        'Seasonal', 'Flavor Profiles', 'Uncategorized', 'Occasions',
        'Family-Friendly', 'Side Dishes', 'Preservation', 'Household',
        'Appetizers', 'Outdoor Cooking', 'Budget'], dtype=object),
 36)

We now have 36 categories!

We can still see that some categories barfely have any memebers. I'll merge them into other categories:

In [74]:
recipes['RecipeCategory'].value_counts()

Desserts                 100616
Vegetables                50318
Main Dish                 40235
Meat                      36461
Lunch/Snacks              32586
Quick and Easy            32452
Baked Goods               30131
Chicken                   26383
Beverages                 22822
Sauces/Condiments         22812
Breakfast/Brunch          21913
Healthy                   18419
International             17495
Nuts/Seeds/Grains          8719
Fruit                      8567
Dairy                      8462
Beans/Legumes              7894
Seafood                    6584
Poultry                    6525
Soups                      5208
Pasta                      3962
Side Dishes                2624
Vegetarian/Vegan           1844
Occasions                  1809
Flavor Profiles            1392
Regional                   1319
Family-Friendly            1221
Special Dietary Needs      1142
Uncategorized               942
Seasonal                    817
Cooking Methods             472
Househol

In [76]:
new_cat_list = ['Desserts', 'Chicken', 'Beverages', 'Vegetarian/Vegan',
        'Vegetables', 'Regional', 'Sauces/Condiments', 'Main Dish',
        'Beans/Legumes', 'Quick and Easy', 'Special Dietary Needs',
        'Baked Goods', 'Poultry', 'Healthy', 'International',
        'Breakfast/Brunch', 'Nuts/Seeds/Grains', 'Fruit', 'Meat', 'Dairy',
        'Seafood', 'Pasta', 'Lunch/Snacks', 'Cooking Methods', 'Soups',
        'Seasonal', 'Flavor Profiles', 'Uncategorized', 'Occasions',
        'Family-Friendly', 'Side Dishes', 'Preservation', 'Household',
        'Appetizers', 'Outdoor Cooking', 'Budget']

In [77]:
new_cat_dict = {x:x for x in new_cat_list}

In [78]:
new_cat_dict

{'Desserts': 'Desserts',
 'Chicken': 'Chicken',
 'Beverages': 'Beverages',
 'Vegetarian/Vegan': 'Vegetarian/Vegan',
 'Vegetables': 'Vegetables',
 'Regional': 'Regional',
 'Sauces/Condiments': 'Sauces/Condiments',
 'Main Dish': 'Main Dish',
 'Beans/Legumes': 'Beans/Legumes',
 'Quick and Easy': 'Quick and Easy',
 'Special Dietary Needs': 'Special Dietary Needs',
 'Baked Goods': 'Baked Goods',
 'Poultry': 'Poultry',
 'Healthy': 'Healthy',
 'International': 'International',
 'Breakfast/Brunch': 'Breakfast/Brunch',
 'Nuts/Seeds/Grains': 'Nuts/Seeds/Grains',
 'Fruit': 'Fruit',
 'Meat': 'Meat',
 'Dairy': 'Dairy',
 'Seafood': 'Seafood',
 'Pasta': 'Pasta',
 'Lunch/Snacks': 'Lunch/Snacks',
 'Cooking Methods': 'Cooking Methods',
 'Soups': 'Soups',
 'Seasonal': 'Seasonal',
 'Flavor Profiles': 'Flavor Profiles',
 'Uncategorized': 'Uncategorized',
 'Occasions': 'Occasions',
 'Family-Friendly': 'Family-Friendly',
 'Side Dishes': 'Side Dishes',
 'Preservation': 'Preservation',
 'Household': 'Household

In [79]:
new_dict = {'Desserts': 'Desserts',
 'Chicken': 'Chicken',
 'Beverages': 'Beverages',
 'Vegetarian/Vegan': 'Vegetarian/Vegan',
 'Vegetables': 'Vegetables',
 'Regional': 'Regional',
 'Sauces/Condiments': 'Sauces/Condiments',
 'Main Dish': 'Main Dish',
 'Beans/Legumes': 'Beans/Legumes',
 'Quick and Easy': 'Quick and Easy',
 'Special Dietary Needs': 'Special Dietary Needs',
 'Baked Goods': 'Baked Goods',
 'Poultry': 'Poultry',
 'Healthy': 'Healthy',
 'International': 'International',
 'Breakfast/Brunch': 'Breakfast/Brunch',
 'Nuts/Seeds/Grains': 'Nuts/Seeds/Grains',
 'Fruit': 'Fruit',
 'Meat': 'Meat',
 'Dairy': 'Dairy',
 'Seafood': 'Seafood',
 'Pasta': 'Pasta',
 'Lunch/Snacks': 'Lunch/Snacks',
 'Cooking Methods': 'Cooking Methods',
 'Soups': 'Soups',
 'Seasonal': 'Seasonal',
 'Flavor Profiles': 'Flavor Profiles',
 'Uncategorized': 'Uncategorized',
 'Occasions': 'Occasions',
 'Family-Friendly': 'Family-Friendly',
 'Side Dishes': 'Side Dishes',
 'Preservation': 'Uncategorized',
 'Household': 'Uncategorized',
 'Appetizers': 'Uncategorized',
 'Outdoor Cooking': 'Occasions',
 'Budget': 'Uncategorized'}

In [81]:
recipes['RecipeCategory'] = recipes['RecipeCategory'].map(new_dict)

In [83]:
recipes['RecipeCategory'].value_counts(), recipes['RecipeCategory'].nunique()

(Desserts                 100616
 Vegetables                50318
 Main Dish                 40235
 Meat                      36461
 Lunch/Snacks              32586
 Quick and Easy            32452
 Baked Goods               30131
 Chicken                   26383
 Beverages                 22822
 Sauces/Condiments         22812
 Breakfast/Brunch          21913
 Healthy                   18419
 International             17495
 Nuts/Seeds/Grains          8719
 Fruit                      8567
 Dairy                      8462
 Beans/Legumes              7894
 Seafood                    6584
 Poultry                    6525
 Soups                      5208
 Pasta                      3962
 Side Dishes                2624
 Occasions                  1852
 Vegetarian/Vegan           1844
 Flavor Profiles            1392
 Regional                   1319
 Uncategorized              1270
 Family-Friendly            1221
 Special Dietary Needs      1142
 Seasonal                    817
 Cooking M

Good! Now we have 31 major categories..

In [88]:
recipes.sample(2)

Unnamed: 0,RecipeId,AuthorId,Description,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url,YearPublished,MonthPublished,DayPublished,HourPublished,TotalMinutes
182943,191226.0,365320,My mom makes this for Christmas morning. It's ...,Baked Goods,"[Breakfast, < 30 Mins, Easy]","[1, 3⁄4, 1⁄4, 1, 1, 4, 1, 1, 1⁄2, 2, 1⁄2]","[boiling water, vanilla, vanilla pudding mix, ...",5.0,1.0,547.1,24.3,7.5,123.5,576.5,76.2,1.1,50.5,7.0,8.0,,"[Mix all ingredients together., Poor half of c...",https://www.food.com/recipe/Early--Morning-Cof...,2006,10,20,14,30
281537,292632.0,671810,Winter and Easter will soon be behind us. If y...,Beverages,"[Low Protein, Low Cholesterol, Healthy, Sweet,...","[6, 1, 1⁄2, 2, None, None]","[brewed coffee, milk, sugar, chocolate]",4.0,1.0,192.4,5.3,3.3,17.3,164.7,30.9,0.7,22.9,5.5,1.0,,"[Combine hot coffee, hot chocolate mix and sug...",https://www.food.com/recipe/Easy-Iced-Mocha-29...,2008,3,19,2,10


In [90]:
recipes.isna().sum()

RecipeId                           0
AuthorId                           0
Description                        5
RecipeCategory                     0
Keywords                           0
RecipeIngredientQuantities         0
RecipeIngredientParts              0
AggregatedRating              253223
ReviewCount                   247489
Calories                           0
FatContent                         0
SaturatedFatContent                0
CholesterolContent                 0
SodiumContent                      0
CarbohydrateContent                0
FiberContent                       0
SugarContent                       0
ProteinContent                     0
RecipeServings                182911
RecipeYield                   348071
RecipeInstructions                 0
url                                0
YearPublished                      0
MonthPublished                     0
DayPublished                       0
HourPublished                      0
TotalMinutes                       0
d

Note that we have also eliminated the null values for `RecipeCategory`. We also have 5 nul values for `Description`; let's drop them too:

In [97]:
recipes['Description'].dropna(inplace=True)

In [99]:
recipes['Description'].isna().sum()

5

In [101]:
recipes[recipes['Description'].isna()]

Unnamed: 0,RecipeId,AuthorId,Description,RecipeCategory,Keywords,RecipeIngredientQuantities,RecipeIngredientParts,AggregatedRating,ReviewCount,Calories,FatContent,SaturatedFatContent,CholesterolContent,SodiumContent,CarbohydrateContent,FiberContent,SugarContent,ProteinContent,RecipeServings,RecipeYield,RecipeInstructions,url,YearPublished,MonthPublished,DayPublished,HourPublished,TotalMinutes
3416,5177.0,1552,,Baked Goods,"[Breakfast, < 15 Mins, For Large Groups, Oven]","[1 1⁄2, 1⁄4, 1, 1, 1, 1, 1, 1⁄4]","[butter, margarine, parmesan cheese, rosemary,...",5.0,4.0,35.5,3.1,1.9,8.8,80.7,0.3,0.1,0.0,1.6,24.0,,"[Grease a fluted tube Bundt pan., combine chee...",https://www.food.com/recipe/Herb-Pull-Aparts-5177,1999,11,30,23,0
3526,5300.0,1992,,Chicken,"[Chicken, Beef Organ Meats, Beef Liver, Poultr...","[2, 1 1⁄2, 900, 9, 8, 1, None]","[sweet sherry, chicken livers, eggs, nutmeg]",,,4650.2,391.1,208.1,7517.1,1606.2,30.1,0.5,5.8,243.4,1.0,,[Bring cream to simmering point. Puree all oth...,https://www.food.com/recipe/Chicken-Liver-Parf...,1999,12,5,13,0
3645,5428.0,1534,,International,"[European, Very Low Carbs, < 15 Mins]","[1, 1⁄3, 10 -12, 1⁄4, 1⁄4, None, 2, 3]","[garlic, fresh swiss chard, red wine vinegar, ...",5.0,5.0,928.6,93.0,15.9,344.6,1172.6,7.2,2.1,2.5,16.4,1.0,,"[Marinate garlic clove in oil for 1 hour., Rem...",https://www.food.com/recipe/Hot-Swiss-Chard-Sa...,1999,12,15,23,0
4590,7426.0,1534,,Sauces/Condiments,[< 15 Mins],"[2, 1⁄2, 1, 1, 1⁄2, 1, 1]","[salt, garlic powder, parsley flakes, mayonnai...",,,119.9,2.3,1.4,9.8,1829.2,16.4,0.8,12.1,9.0,,1 batch,"[Mix instant onion mix, salt, garlic powder, p...",https://www.food.com/recipe/Hidden-Valley-Mix-...,1999,12,15,23,0
4591,7427.0,1534,,Fruit,"[Meat, < 15 Mins, Oven]","[2, 1, 2, 1⁄2, 1⁄3, 3, 2, 1⁄4, 1⁄4, 1, 12, 1, 1]","[beef, eggs, parsley, ketchup, onions, soy sau...",4.5,5.0,1264.1,109.3,45.1,211.8,1364.7,51.9,4.6,40.9,17.5,6.0,,"[In a large bowl, combine ground beef, cornfla...",https://www.food.com/recipe/Cranberry-Cocktail...,1999,12,15,23,0


In [104]:
recipes.drop([3416,3526,3645,4591,4590],axis=0,inplace=True)

In [105]:
recipes.isna().sum()

RecipeId                           0
AuthorId                           0
Description                        0
RecipeCategory                     0
Keywords                           0
RecipeIngredientQuantities         0
RecipeIngredientParts              0
AggregatedRating              253221
ReviewCount                   247487
Calories                           0
FatContent                         0
SaturatedFatContent                0
CholesterolContent                 0
SodiumContent                      0
CarbohydrateContent                0
FiberContent                       0
SugarContent                       0
ProteinContent                     0
RecipeServings                182910
RecipeYield                   348067
RecipeInstructions                 0
url                                0
YearPublished                      0
MonthPublished                     0
DayPublished                       0
HourPublished                      0
TotalMinutes                       0
d

Okay! We now have done some major cleaning on the original Kaggle dataset. I'm going to save what we have so far in a separate file so we can use this dataset in our ML and NLP projects down the road. 

**NOTE:** We didn't clean or transform some of the categorical columns. We will take appropriate action in later notebooks, depending on the specific need of the notebook (some of them need the categorical data for NLP tasks, while some don't need them at all). 

In [107]:
recipes.to_parquet('BasicCleanData.parquet') 