## Problem Statement
Many weight-loss diets are available today, and the keto diet is one of the more popular ones. It consists primarily of high fat, moderate protein, and very low carbohydrates.<br>
Finding a meal recipe that is consistent with the keto diet and also matches personal preferences can be time-consuming. Therefore, I created a recommender system that takes user inputs, processes them, and outputs the best matches. It is based on a dataset of 1722 meal recipes. The system considers the user's preferred meat type, such as fish, beef, or chicken, and how the user wants the results sorted.<br>
The dataset can be found here: https://www.kaggle.com/hawkash/spoonacular-food-dataset

## Importing the dataset and exploring it

In [1]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

sns.set(font_scale=1.5)
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [2]:
df= pd.read_csv('newfood.csv')

In [259]:
df.shape

(1722, 62)

In [260]:
df.head()

Unnamed: 0,id,title,pricePerServing,weightPerServing,vegetarian,vegan,glutenFree,dairyFree,sustainable,veryHealthy,...,Fiber/g,Vitamin A/IU,Vitamin D/µg,Vitamin K/µg,Vitamin C/mg,Alcohol/g,Caffeine/g,meat,meat_category,keto
0,1,fried anchovies with sage,5.6051,226,False,False,False,True,False,False,...,1.16,154.8,0.29,0.0,0.0,0.0,0.0,anchovies,fish,False
1,2,anchovies appetizer with breadcrumbs & scallions,0.8206,33,False,False,False,True,False,False,...,0.38,0.0,0.0,7.18,0.0,0.0,0.0,anchovies,fish,False
2,3,"carrots, cauliflower and anchovies",4.38,364,False,False,False,True,False,True,...,9.99,21572.42,0.0,104.27,32.6,0.0,0.0,anchovies,fish,False
3,4,bap story: stir fried anchovies (myulchi bokkeum),8.1122,711,False,False,True,True,False,True,...,2.1,180.52,6.24,16.02,3.88,0.0,0.0,anchovies,fish,True
4,5,"bread, butter and anchovies",0.2557,36,False,False,False,False,False,False,...,1.24,128.27,0.0,2.21,0.0,0.0,0.0,anchovies,fish,False


In [4]:
df.columns

Index(['id', 'title', 'pricePerServing', 'weightPerServing', 'vegetarian',
       'vegan', 'glutenFree', 'dairyFree', 'sustainable', 'veryHealthy',
       'veryPopular', 'gaps', 'lowFodmap', 'ketogenic', 'whole30',
       'readyInMinutes', 'spoonacularSourceUrl', 'image', 'aggregateLikes',
       'spoonacularScore', 'healthScore', 'percentProtein', 'percentFat',
       'percentCarbs', 'dishTypes', 'ingredients', 'cuisines', 'calories',
       'Fat/g', 'Saturated Fat/g', 'Carbohydrates/g', 'Sugar/g',
       'Cholesterol/mg', 'Sodium/mg', 'Protein/g', 'Vitamin B3/mg',
       'Selenium/µg', 'Phosphorus/mg', 'Iron/mg', 'Vitamin B2/mg',
       'Calcium/mg', 'Vitamin B1/mg', 'Folate/µg', 'Potassium/mg', 'Copper/mg',
       'Zinc/mg', 'Manganese/mg', 'Magnesium/mg', 'Vitamin B12/µg',
       'Vitamin B5/mg', 'Vitamin B6/mg', 'Vitamin E/mg', 'Fiber/g',
       'Vitamin A/IU', 'Vitamin D/µg', 'Vitamin K/µg', 'Vitamin C/mg',
       'Alcohol/g', 'Caffeine/g'],
      dtype='object')

In [6]:
df.isnull().sum()

id                         0
title                      0
pricePerServing            0
weightPerServing           0
vegetarian                 0
vegan                      0
glutenFree                 0
dairyFree                  0
sustainable                0
veryHealthy                0
veryPopular                0
gaps                       0
lowFodmap                  0
ketogenic                  0
whole30                    0
readyInMinutes             0
spoonacularSourceUrl       0
image                      5
aggregateLikes             0
spoonacularScore           0
healthScore                0
percentProtein             0
percentFat                 0
percentCarbs               0
dishTypes                 78
ingredients                0
cuisines                1378
calories                   0
Fat/g                      0
Saturated Fat/g            0
Carbohydrates/g            0
Sugar/g                    0
Cholesterol/mg             0
Sodium/mg                  0
Protein/g     

### Data Dictionary
| Feature | Type | Description | Unit |
| :- | :- | -------------: | :-: |
|id| int | a unique number for each recipe  | NA
|title| object | recipe name  | NA
|pricePerServing| float | cost per serving  | USD
|weightPerServing| int | weight per serving  | Gram
|vegetarian| bool | vegetarian recipe or not  | NA
|vegan| bool | vegan recipe or not | NA
|glutenFree| bool | gluten free or not | NA
|dairyFree| bool | dairy free or not  | NA
|sustainable| bool | is it sustainable? for more [click here](https://www.sustainweb.org/sustainablefood/what_is_sustainable_food/) | NA
|veryHealthy| bool | is it very healthy?   | NA
|veryPopular| bool | is it very popular?  | NA
|gaps| object | gaps diet, for more [click here](http://www.gapsdiet.com/) | NA
|lowFodmap| bool | low Fodmap diet or not, for more [click here](https://www.ibsdiets.org/fodmap-diet/fodmap-food-list/)  | NA
|ketogenic| bool | ketogenic diet or not, for more [click here](https://www.healthline.com/nutrition/ketogenic-diet-101)   | NA
|whole30| bool | whole30 diet or not, for more [click here](https://www.thekitchn.com/what-is-whole30-and-why-were-talking-about-it-this-month-239308)| NA
|readyInMinutes| int | cooking time  | minute
|spoonacularSourceUrl| object | recipe source link   | NA
|image| object | recipe image link  | NA
|aggregateLikes| int | number of likes  | NA
|spoonacularScore| float | spoonacular score, which compares a recipe with all the other recipes using an undisclosed formula  | NA
|healthScore| float | how healthy the recipe is   | NA
|percentProtein| float | Protein percentage in a recipe| NA
|percentFat| float | Fat percentage in a recipe  | NA
|percentCarbs| float | Carbs percentage in a recipe  | NA
|dishTypes| object | recipe dish type | NA
|ingredients| object | ingredients to make a specific recipe  | NA
|cuisines| object |  Regional recipe preparation traditions | NA
|calories| float |  calories in one recipe | NA
|other features| NA |  nutrition facts | different units



## Feature Engineering

In this section, I created three new columns: 'meat', 'meat_category', and 'keto'. The 'meat' column determines the type of meat contained in the meal, such as salmon, sausage, ribs, or tuna. The 'meat_category' column determines the category of that meat: fish, beef, chicken, or pork (or 'not_meat' when no meat is found). The 'keto' column determines whether or not the meal is keto.

To create the 'meat' column, I gathered all the types of meat that appear in the meals of this dataset. They are listed in the 'meats' list below.<br>
Then I split that list into four lists based on the category of the meat. These lists are used to create the second column, 'meat_category'.

In [129]:
meats=['beef','salmon','chicken','fish','shrimp','shrimps','bacon','pork','sausage','anchovies',
       'hotdog','boquerones','anchovy','halibut','snapper','cod','sole','prawn','turkey','steak','sirloin',
       'bluefish','lamb','tuna','veal','crab','oysters','sea bass','bass','sea-bass','catfish','trout',
       'haddock','monkfish','walleye','swordfish','flounder','ham','tilapia','marlin','hake','sardines','meat','duck',
       'eel','scallop','calamari','ribs','seafood','mahimahi','bream','burgers','burger','mackerel','grouper','smelt',
      'skate']
fish=['salmon','fish','shrimp','shrimps','anchovies','boquerones','anchovy','halibut','snapper','cod','sole','prawn',
     'bluefish','tuna','crab','oysters','sea bass','bass','sea-bass','catfish','trout','haddock','monkfish',
      'walleye','swordfish','flounder','tilapia','marlin','hake','sardines','eel','scallop','calamari','seafood',
     'mahimahi','bream','mackerel','grouper','smelt','skate']
beef=['beef','turkey','steak','sirloin','lamb','veal','meat','duck','ribs','burgers','burger']
chicken=['chicken']
pork=['bacon','pork','sausage','hotdog','ham']

Here, I created a pattern that contains all the names of the meats.

In [130]:
mm= '(beef|salmon|chicken|fish|shrimp|shrimps|bacon|pork|sausage|anchovies|hotdog|boquerones|anchovy|halibut|snapper|cod|sole|prawn|turkey|steak|sirloin|bluefish|lamb|tuna|veal|crab|oysters|sea bass|bass|sea-bass|catfish|trout|haddock|monkfish|walleye|swordfish|flounder|ham|tilapia|marlin|hake|sardines|meat|duck|eel|scallop|calamari|ribs|seafood|mahimahi|bream|burgers|burger|mackerel|grouper|smelt|skate)'
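
The hand-typed pattern above tries its alternatives from left to right, so 'shrimp' always wins over 'shrimps', and a short name such as 'ham' can match inside a longer word. As a possible alternative (not the method used in this notebook, and one that would change the counts reported below), the pattern could be built from the list itself. The sketch below assumes a shortened `meats_sample` list and a hypothetical helper `build_meat_pattern`. Note that strict word boundaries would also miss plurals such as 'steaks' unless they are added to the list.

```python
import re
import pandas as pd

# A short sample of the notebook's 'meats' list; the full list would be used in practice
meats_sample = ['shrimp', 'shrimps', 'ham', 'sea bass', 'bass', 'cod']

def build_meat_pattern(names):
    """Return a regex with one capture group matching any whole-word name."""
    # Longest names first so 'shrimps' is tried before 'shrimp'
    ordered = sorted(set(names), key=lambda n: (-len(n), n))
    alternatives = '|'.join(re.escape(name) for name in ordered)
    # \b keeps 'ham' from matching inside words such as 'champagne'
    return r'\b(' + alternatives + r')\b'

pattern = build_meat_pattern(meats_sample)

# Made-up titles for illustration
titles = pd.Series(['garlic shrimps', 'champagne risotto', 'grilled sea bass'])
extracted = titles.str.extract(pattern, expand=False)
print(extracted.tolist())
```

The second title yields a missing value because no whole-word meat name appears in it, which is the case the notebook later fills with 'not_meat'.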
    

Converting the text of the 'title' column to lowercase in order to extract the pattern.

In [62]:
df['title']=df['title'].str.lower()

Extracting the meat name from the 'title' and assigning it to the new column 'meat'. This is done by passing the pattern to the str.extract() function.

In [131]:

df['meat']=df['title'].str.extract(mm)

If there is no meat extracted from the pattern, fill it with 'not_meat' indicating that the meal is not meat-based.

In [141]:
df['meat']=df['meat'].fillna('not_meat')

Creating the 'meat_category' column by checking the list that 'meat' belongs to. The category lists are created above. 

In [133]:
df['meat_category']=df['meat'].apply(lambda x: 'fish' if x in fish else 'beef' if x in beef else 'chicken' if x in chicken else 'pork' if x in pork else 'not_meat')


In [139]:
df['meat_category'].value_counts()

fish        1123
not_meat     470
beef          65
chicken       39
pork          25
Name: meat_category, dtype: int64
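
The chained conditional above works, but it grows with every new category. A minimal sketch of the same idea with a lookup table follows. It assumes shortened category lists for illustration, and the `category_of` dictionary is a name introduced here rather than something from the notebook.

```python
import pandas as pd

# Shortened category lists for illustration; the notebook's full lists would be used
fish = ['salmon', 'tuna', 'anchovies']
beef = ['beef', 'steak', 'ribs']
chicken = ['chicken']
pork = ['bacon', 'ham']

# Build one lookup table: meat name -> category
category_of = {}
for category, names in [('fish', fish), ('beef', beef), ('chicken', chicken), ('pork', pork)]:
    for name in names:
        category_of[name] = category

meat = pd.Series(['tuna', 'ribs', 'not_meat', 'ham', 'chicken'])
# Names missing from the lookup (including 'not_meat') fall back to 'not_meat'
meat_category = meat.map(category_of).fillna('not_meat')
print(meat_category.tolist())  # ['fish', 'beef', 'not_meat', 'pork', 'chicken']
```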

In [142]:
df.columns

Index(['id', 'title', 'pricePerServing', 'weightPerServing', 'vegetarian',
       'vegan', 'glutenFree', 'dairyFree', 'sustainable', 'veryHealthy',
       'veryPopular', 'gaps', 'lowFodmap', 'ketogenic', 'whole30',
       'readyInMinutes', 'spoonacularSourceUrl', 'image', 'aggregateLikes',
       'spoonacularScore', 'healthScore', 'percentProtein', 'percentFat',
       'percentCarbs', 'dishTypes', 'ingredients', 'cuisines', 'calories',
       'Fat/g', 'Saturated Fat/g', 'Carbohydrates/g', 'Sugar/g',
       'Cholesterol/mg', 'Sodium/mg', 'Protein/g', 'Vitamin B3/mg',
       'Selenium/µg', 'Phosphorus/mg', 'Iron/mg', 'Vitamin B2/mg',
       'Calcium/mg', 'Vitamin B1/mg', 'Folate/µg', 'Potassium/mg', 'Copper/mg',
       'Zinc/mg', 'Manganese/mg', 'Magnesium/mg', 'Vitamin B12/µg',
       'Vitamin B5/mg', 'Vitamin B6/mg', 'Vitamin E/mg', 'Fiber/g',
       'Vitamin A/IU', 'Vitamin D/µg', 'Vitamin K/µg', 'Vitamin C/mg',
       'Alcohol/g', 'Caffeine/g', 'meat', 'meat_category'],
      dtype='

Creating the 'keto' column from the percentages of fat, carbs, and protein in the meal. If the fat percentage is above 45% and at most 75%, the carbs percentage is above 3% and at most 15%, and the protein percentage is above 20% and at most 40%, then the meal is marked as keto.

In [173]:
keto=[]

for row in df.itertuples():
    if (row.percentFat)<=75.00 and (row.percentFat)>45.00 and (row.percentCarbs)<=15.00 and (row.percentCarbs)>3.00 and (row.percentProtein)<=40.00 and (row.percentProtein)>20.00:
        keto.append(True)
    else:
        keto.append(False)
            

In [174]:
df['keto']=keto

In [175]:
df['keto'].value_counts()

False    1506
True      216
Name: keto, dtype: int64
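
The row-by-row loop above can also be written with vectorized comparisons. The sketch below uses a small made-up frame and a hypothetical helper `in_range`, and it keeps the same bounds as the loop (lower bound excluded, upper bound included).

```python
import pandas as pd

# Hypothetical macro percentages for four recipes
sample = pd.DataFrame({
    'percentFat':     [55.79, 80.00, 68.15, 45.00],
    'percentCarbs':   [7.47, 5.00, 3.63, 10.00],
    'percentProtein': [36.74, 15.00, 28.22, 30.00],
})

def in_range(series, low, high):
    """True where low < value <= high, the same bounds as the loop above."""
    return (series > low) & (series <= high)

# A recipe is keto only when all three macros fall inside their ranges
sample['keto'] = (
    in_range(sample['percentFat'], 45.00, 75.00)
    & in_range(sample['percentCarbs'], 3.00, 15.00)
    & in_range(sample['percentProtein'], 20.00, 40.00)
)
print(sample['keto'].tolist())  # [True, False, True, False]
```

The last row shows the boundary behaviour: a fat percentage of exactly 45.00 is not counted as keto.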

### New Columns Data Dictionary

| Feature | Type | Description | Unit |
| :- | :- | -------------: | :-: |
|meat| object | The exact type of meat contained in the recipe  | NA
|meat_category| object | The category of the meat: fish, beef, chicken, pork, or not_meat | NA
|keto| bool | Whether or not the recipe is a keto diet meal  | NA

## Creating the Keto Diet Recommender System

### User input
The system asks for user inputs. It prompts the user to select:
- The type of meat they prefer
- How to sort the results
- The number of results to be shown

In [267]:
print('This is the Keto diet recommender system\n')
meat=input('Please input the letter of your favorite meat category out of these options:\na: Fish\nb: Beef\nc: Chicken\nd: Pork\ne: no meat\n')
sort=input('How would you like to sort the results? Please input the letter of your choice\na: By health score\nb: by spoonacular score\n')
result=input('Choose the number of results. Please input the letter of your choice\na: Top 3\nb: Top 5\nc: Show all\n')


This is the Keto diet recommender system

Please input the letter of your favorite meat category out of these options:
a: Fish
b: Beef
c: Chicken
d: Pork
e: no meat
b
How would you like to sort the results? Please input the letter of your choice
a: By health score
b: by spoonacular score
b
Choose the number of results. Please input the letter of your choice
a: Top 3
b: Top 5
c: Show all
a
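
The prompts above accept any text, and invalid letters are only reported later during processing. One possible refinement is to re-ask until the reply is valid. The sketch below assumes a hypothetical helper `ask_choice`; its `ask` parameter defaults to the built-in `input` and is replaced with scripted replies here so the example runs without a live prompt.

```python
def ask_choice(prompt, valid_letters, ask=input):
    """Keep asking until the reply is one of valid_letters, then return it."""
    while True:
        # Normalise the reply so ' B ' and 'b' are treated the same
        reply = ask(prompt).strip().lower()
        if reply in valid_letters:
            return reply
        print('Please enter one of: ' + ', '.join(valid_letters))

# Example with scripted replies instead of a live prompt
replies = iter(['x', ' B '])
choice = ask_choice('Meat category (a-e): ', ['a', 'b', 'c', 'd', 'e'],
                    ask=lambda _: next(replies))
print(choice)  # b
```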


### Processing
The following code processes the user inputs.

In [268]:
# the i variable is an indicator that the input is valid
i=0

#if meat input is a, assign all keto meals containing fish to df_new dataframe
if meat == 'a':
    df_new=df[(df['keto']==True)& (df['meat_category']=='fish')]
    i=1

#if meat input is b, assign all keto meals containing beef to df_new dataframe
elif meat== 'b':
    df_new=df[(df['keto']==True)& (df['meat_category']=='beef')]
    i=1

#if meat input is c, assign all keto meals containing chicken to df_new dataframe
elif meat== 'c':
    df_new=df[(df['keto']==True)& (df['meat_category']=='chicken')]
    i=1

#if meat input is d, assign all keto meals containing pork to df_new dataframe
elif meat== 'd':
    df_new=df[(df['keto']==True)& (df['meat_category']=='pork')]
    i=1

#if meat input is e, assign all keto meals containing no meat to df_new dataframe
elif meat== 'e':
    df_new=df[(df['keto']==True)& (df['meat']=='not_meat')]
    i=1

#if the input is none of the options, alert the user
else:
    print('No correct input for meat category')
    i=0

    
if i==1:
    #Selecting the columns that the user cares about the most
    df_new=df_new[['title','spoonacularScore','healthScore','percentProtein','percentFat','percentCarbs','meat','meat_category','spoonacularSourceUrl']]

    #if sort input is a, sort result by health score
    #(reassigning instead of inplace=True avoids modifying a slice of df)
    if sort== 'a':
        df_new=df_new.sort_values(by=['healthScore'],ascending=False)

    #if sort input is b, sort result by spoonacular score
    elif sort== 'b':
        df_new=df_new.sort_values(by=['spoonacularScore'],ascending=False)

    #if the input is none of the options, alert the user
    else:
        print('No correct input for sorting option')

    #if result input is a, show the first 3 meals
    if result == 'a':
        df_new=df_new.head(3)

    #if result input is b, show the first 5 meals
    elif result== 'b':
        df_new=df_new.head(5)

    #if result input is c, show all meals (df_new is left unchanged)
    elif result== 'c':
        pass

    #if the input is none of the options, alert the user
    else:
        print('No correct input for number of results')
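
The branching above could also be driven by lookup tables, which keeps the menu letters and their meanings in one place. The following is a sketch under stated assumptions rather than the notebook's implementation: the `recommend` function and the three option dictionaries are names introduced here, the demo frame is made up, and an invalid letter raises a `ValueError` instead of printing a message and continuing.

```python
import pandas as pd

# Menu letter -> value used for filtering, sorting, or limiting
MEAT_OPTIONS = {'a': 'fish', 'b': 'beef', 'c': 'chicken', 'd': 'pork', 'e': 'not_meat'}
SORT_OPTIONS = {'a': 'healthScore', 'b': 'spoonacularScore'}
RESULT_OPTIONS = {'a': 3, 'b': 5, 'c': None}  # None means show all

def recommend(df, meat, sort, result):
    """Return keto meals for the chosen letters; raise ValueError on a bad letter."""
    for name, value, options in [('meat', meat, MEAT_OPTIONS),
                                 ('sort', sort, SORT_OPTIONS),
                                 ('result', result, RESULT_OPTIONS)]:
        if value not in options:
            raise ValueError(f'No correct input for {name}: {value!r}')
    # Keep only keto meals in the chosen category, best score first
    mask = df['keto'] & (df['meat_category'] == MEAT_OPTIONS[meat])
    ranked = df[mask].sort_values(by=SORT_OPTIONS[sort], ascending=False)
    limit = RESULT_OPTIONS[result]
    return ranked if limit is None else ranked.head(limit)

# Tiny made-up frame with only the columns the function needs
demo = pd.DataFrame({
    'title': ['meal 1', 'meal 2', 'meal 3', 'meal 4'],
    'keto': [True, True, False, True],
    'meat_category': ['beef', 'beef', 'beef', 'fish'],
    'healthScore': [22.0, 53.0, 90.0, 10.0],
    'spoonacularScore': [83.0, 79.0, 95.0, 60.0],
})
top = recommend(demo, meat='b', sort='a', result='a')
print(top['title'].tolist())  # ['meal 2', 'meal 1']
```

Because the function returns a new frame and never assigns `df_new` on invalid input, the output step cannot show stale results from an earlier run.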
    
    
    
    

### Output

In [269]:
print('These are the best meals for you')
df_new

These are the best meals for you


Unnamed: 0,title,spoonacularScore,healthScore,percentProtein,percentFat,percentCarbs,meat,meat_category,spoonacularSourceUrl
661,tuscan braised short ribs,87.0,35.0,36.74,55.79,7.47,ribs,beef,https://spoonacular.com/tuscan-braised-short-r...
261,pan-seared steak with caper-anchovy butter,83.0,22.0,28.22,68.15,3.63,steak,beef,https://spoonacular.com/pan-seared-steak-with-...
730,grilled rib-eye steaks with roasted peppers,79.0,53.0,26.92,67.75,5.33,steak,beef,https://spoonacular.com/grilled-rib-eye-steaks...
