# Recipe Cleaner (Ingredients)

This notebook reads in the core recipe data, cleans the ingredients column, and saves the result for later analyses.

In [5]:
# These are the possible input files
!ls ../data/RecSys

core-data-images           core-data_recipe.csv
core-data-test_rating.csv  raw-data-images
core-data-train_rating.csv raw-data_interaction.csv
core-data-valid_rating.csv raw-data_recipe.csv


In [25]:
# Load in packages to use throughout
import pandas as pd
import json

## Read in Recipes

We can see the following columns:

- recipe_id: This will be relevant when linking this table with the user data
- recipe_name: The name of the food
- image_url: The path to find the food image
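- ingredients: The ingredient names, separated by `^`
- cooking_directions: The prep/cook times and the steps, stored as a dictionary-like string
- nutritions: The nutrition facts, stored as a dictionary-like string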

In [17]:
recipes = pd.read_csv('../data/RecSys/core-data_recipe.csv')
recipes.head()

Unnamed: 0,recipe_id,recipe_name,image_url,ingredients,cooking_directions,nutritions
0,240488,"Pork Loin, Apples, and Sauerkraut",https://images.media-allrecipes.com/userphotos...,sauerkraut drained^Granny Smith apples sliced^...,{'directions': u'Prep\n15 m\nCook\n2 h 30 m\nR...,"{u'niacin': {u'hasCompleteData': False, u'name..."
1,218939,Foolproof Rosemary Chicken Wings,https://images.media-allrecipes.com/userphotos...,chicken wings^sprigs rosemary^head garlic^oliv...,"{'directions': u""Prep\n20 m\nCook\n40 m\nReady...","{u'niacin': {u'hasCompleteData': True, u'name'..."
2,87211,Chicken Pesto Paninis,https://images.media-allrecipes.com/userphotos...,focaccia bread quartered^prepared basil pesto^...,{'directions': u'Prep\n15 m\nCook\n5 m\nReady ...,"{u'niacin': {u'hasCompleteData': True, u'name'..."
3,245714,Potato Bacon Pizza,https://images.media-allrecipes.com/userphotos...,red potatoes^strips bacon^Sauce:^heavy whippin...,{'directions': u'Prep\n20 m\nCook\n45 m\nReady...,"{u'niacin': {u'hasCompleteData': True, u'name'..."
4,218545,Latin-Inspired Spicy Cream Chicken Stew,https://images.media-allrecipes.com/userphotos...,skinless boneless chicken breast halves^diced ...,{'directions': u'Prep\n10 m\nCook\n8 h 15 m\nR...,"{u'niacin': {u'hasCompleteData': False, u'name..."


## Play with columns

The ingredients and cooking directions will need to be transformed a bit, as well as the nutrition information. Here I show how to do that transformation for one row.

**Ingredients:** I'll mostly keep this as is and just remove the `Rub:` label.

**Recipe:** I might format the cooking directions better. For instance, `Prep` and its time `15 m` are on separate lines but could share one line. I can revisit this later and keep the column out of the saved data frame for now.

**Nutritional Facts:** Since everything is saved back into a single data frame, it makes the most sense to exclude the nutrition information: for each row it is a data frame of its own, and it is not needed for now.


In [90]:
# The ^ separates the different items
# Rub is more of a direction that can be removed for the ingredients
out = recipes.ingredients[0].replace('^Rub:','').split('^')
out

['sauerkraut drained',
 'Granny Smith apples sliced',
 'large onion',
 'caraway seeds',
 'apple cider divided',
 'brown sugar',
 'Thai seasoning',
 'salt',
 'garlic powder',
 'ground black pepper',
 'boneless pork loin roast']
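
The same split can be wrapped in a small helper. The sketch below is a possible generalisation, not what this notebook does: besides `Rub:`, the preview above shows a `Sauce:` item in the Potato Bacon Pizza row, so the hypothetical `split_ingredients` assumes that any item ending in a colon is a section label and drops it. The example string is made up for illustration.

```python
def split_ingredients(raw):
    """Split a '^'-separated ingredient string into a list of items.

    Assumption: any item ending in ':' (e.g. 'Rub:', 'Sauce:') is a
    section label rather than an ingredient, so it is dropped.
    """
    items = [item.strip() for item in raw.split('^')]
    return [item for item in items if item and not item.endswith(':')]


# Made-up example string, not a row from the data
example = 'brown sugar^Rub:^salt^Sauce:^garlic powder'
print(split_ingredients(example))
```

Whether every colon-terminated item really is a label would need checking against the full column before relying on this.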

In [89]:
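# Format cooking directions
# The cell holds a Python-2-style dict string, so swap the quotes and drop the
# u prefix to get valid JSON (this breaks if the text itself has an apostrophe)
out = recipes.cooking_directions[0].replace("'", '"').replace('u"', '"')
out = json.loads(out)['directions']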
print(out)

Prep
15 m
Cook
2 h 30 m
Ready In
2 h 45 m
Preheat oven to 325 degrees F (165 degrees C).
Mix sauerkraut, apples, onion, and caraway seeds in a large roasting pan. Stir 1/4 cup apple cider and brown sugar together in a separate bowl; pour over sauerkraut mixture.
Stir Thai seasoning, salt, garlic powder, and black pepper together in a small bowl; rub onto the top and bottom of the roast.
Make an indentation in the center of the sauerkraut mixture and place the seasoned roast in the indentation. Pour the remaining apple cider around the roast.
Bake in the preheated oven for 1 hour; baste roast with juices. Continue baking roast, basting every 30 minutes, until cooked through, 2 1/2 to 3 hours. An instant-read thermometer inserted into the center should read at least 145 degrees F (63 degrees C).
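
The idea of putting `Prep` and `15 m` on one line could be sketched as below. This assumes the timing labels are exactly `Prep`, `Cook`, and `Ready In`, each followed by its duration on the next line, as in the single example printed above; other rows may differ. `tidy_directions` and `TIMING_LABELS` are hypothetical names.

```python
TIMING_LABELS = {'Prep', 'Cook', 'Ready In'}  # assumed from the one example above


def tidy_directions(text):
    """Return (timings, steps) from a newline-separated directions string."""
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    timings, steps = {}, []
    i = 0
    while i < len(lines):
        if lines[i] in TIMING_LABELS and i + 1 < len(lines):
            timings[lines[i]] = lines[i + 1]  # pair the label with the line after it
            i += 2
        else:
            steps.append(lines[i])  # anything else is treated as a cooking step
            i += 1
    return timings, steps


sample = ('Prep\n15 m\nCook\n2 h 30 m\nReady In\n2 h 45 m\n'
          'Preheat oven to 325 degrees F (165 degrees C).')
timings, steps = tidy_directions(sample)
print(timings)
print(steps)
```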


In [88]:
# I'm not sure what 'hasCompleteData' is all about
# Same quote and u-prefix swap as for the directions, plus Python booleans -> JSON booleans
out = recipes.nutritions[0].replace("'", '"').replace('u"', '"').replace('False', 'false').replace('True', 'true')
out = json.loads(out)
out = pd.DataFrame(out)
out

Unnamed: 0,niacin,sugars,sodium,carbohydrates,vitaminB6,calories,thiamin,fat,folate,caloriesFromFat,calcium,fiber,magnesium,iron,cholesterol,protein,vitaminA,potassium,saturatedFat,vitaminC
hasCompleteData,False,False,False,True,False,True,False,True,False,True,True,False,False,True,False,True,False,False,False,False
name,Niacin Equivalents,Sugars,Sodium,Carbohydrates,Vitamin B6,Calories,Thiamin,Fat,Folate,Calories from Fat,Calcium,Dietary Fiber,Magnesium,Iron,Cholesterol,Protein,Vitamin A - IU,Potassium,Saturated Fat,Vitamin C
amount,15.6016,19.8415,2606.76,32.0818,1.32863,371.722,0.842312,11.6752,83.7392,105.077,135.454,10.2234,80.7371,6.62225,99.2,36.3988,73.1779,1088.92,3.64647,52.7685
percentDailyValue,120,0,104,10,83,19,84,18,47,-,17,41,29,66,33,73,1,31,18,88
displayValue,16,19.8,2607,32.1,1,372,< 1,11.7,84,105,135,10.2,81,7,99,36.4,73,1089,3.6,53
unit,mg,g,mg,g,mg,kcal,mg,g,mcg,kcal,mg,g,mg,mg,mg,g,IU,mg,g,mg
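
Both parsing cells above turn a Python-literal string into JSON with chained `replace` calls, which can fail when the text itself contains an apostrophe. One possible alternative, not what this notebook uses, is `ast.literal_eval` from the standard library: in Python 3 it reads `u'...'` prefixes and `True`/`False` directly. The two strings below are shortened stand-ins, not real rows from the data.

```python
import ast

# Shortened stand-ins for one cell of each column (not real rows from the data)
directions_cell = "{'directions': u\"Prep\\n20 m\\nDon't overcook.\"}"
nutritions_cell = (
    "{u'niacin': {u'hasCompleteData': False, u'name': u'Niacin Equivalents'}, "
    "u'sugars': {u'hasCompleteData': True, u'name': u'Sugars'}}"
)

# literal_eval only evaluates Python literals, so no quote rewriting is needed
directions = ast.literal_eval(directions_cell)['directions']
nutrition = ast.literal_eval(nutritions_cell)

print(directions)
print(nutrition['niacin'])
```

The resulting `nutrition` dictionary can be passed to `pd.DataFrame` in the same way as the `json.loads` result above.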


## Format Data

In [118]:
recipes2 = recipes[['recipe_id', 'recipe_name', 'ingredients']]
recipes2.head()

Unnamed: 0,recipe_id,recipe_name,ingredients
0,240488,"Pork Loin, Apples, and Sauerkraut",sauerkraut drained^Granny Smith apples sliced^...
1,218939,Foolproof Rosemary Chicken Wings,chicken wings^sprigs rosemary^head garlic^oliv...
2,87211,Chicken Pesto Paninis,focaccia bread quartered^prepared basil pesto^...
3,245714,Potato Bacon Pizza,red potatoes^strips bacon^Sauce:^heavy whippin...
4,218545,Latin-Inspired Spicy Cream Chicken Stew,skinless boneless chicken breast halves^diced ...


In [119]:
# Sometimes 'Rub:' shows up. It's more like a cooking instruction than an ingredient, so we remove it.
# regex=False matters: read as a regular expression, ^ would anchor the match to the start of the string.
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning that pandas raises
# when writing into recipes2, since it is a slice of recipes.
recipes2 = recipes2.assign(ingredients=recipes2['ingredients'].str.replace('^Rub:', '', regex=False))
recipes2.head()

Unnamed: 0,recipe_id,recipe_name,ingredients
0,240488,"Pork Loin, Apples, and Sauerkraut",sauerkraut drained^Granny Smith apples sliced^...
1,218939,Foolproof Rosemary Chicken Wings,chicken wings^sprigs rosemary^head garlic^oliv...
2,87211,Chicken Pesto Paninis,focaccia bread quartered^prepared basil pesto^...
3,245714,Potato Bacon Pizza,red potatoes^strips bacon^Sauce:^heavy whippin...
4,218545,Latin-Inspired Spicy Cream Chicken Stew,skinless boneless chicken breast halves^diced ...


## Save out

In [122]:
recipes2.to_csv('../data/20_ingredients.csv')
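
For the later analyses, the saved file could be read back and the ingredients split into lists. The sketch below is only an illustration: it uses a tiny stand-in for `recipes2` and an in-memory buffer instead of the real path, and the `ingredient_list` column name is hypothetical.

```python
import io

import pandas as pd

# Tiny stand-in for recipes2 (ingredient strings shortened for illustration)
recipes2 = pd.DataFrame({
    'recipe_id': [240488, 218939],
    'recipe_name': ['Pork Loin, Apples, and Sauerkraut', 'Foolproof Rosemary Chicken Wings'],
    'ingredients': ['sauerkraut drained^large onion', 'chicken wings^sprigs rosemary'],
})

buffer = io.StringIO()   # in-memory stand-in for '../data/20_ingredients.csv'
recipes2.to_csv(buffer)  # same call as above, so the row index is written too
buffer.seek(0)

# index_col=0 reads the saved index back instead of creating an 'Unnamed: 0' column
loaded = pd.read_csv(buffer, index_col=0)
loaded['ingredient_list'] = loaded['ingredients'].str.split('^')
print(loaded[['recipe_id', 'ingredient_list']])
```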