# Captain Cook: the fabulous recipe explorer



Objectives:

- [X] Create our own JSON map to plot information about the recipes at the region level
- [Pending] Make the map more interactive and correct the colormap issue
- [Pending] Finish the ingredients list cleaning
- [X] Use statistical properties of the English language or Levenshtein distance
- [Pending] Create a user-friendly recipe finder


Bonus:

- Try to compute missing nutritional information
- Find meaningful substitutions for ingredients

In [1]:
# Basic imports
import re
import os.path
import numpy as np
import scipy as sp
import pandas as pd

# Map-related imports
import json
import branca
import folium
from IPython.display import display, HTML

# Plot-related imports
import seaborn as sns
import ipywidgets as widgets
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed, interact_manual
from ipywidgets.embed import embed_minimal_html

from PIL import Image
# Word clouds are drawn below: requires the `wordcloud` package (pip install wordcloud)
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

In [None]:
# General parameters
%matplotlib inline
plt.style.use('seaborn')  # switch to seaborn style (renamed 'seaborn-v0_8' since Matplotlib 3.6)
plt.rcParams["figure.figsize"] = [16,10]

DATA_FOLDER = './data/'

# 1. Data Loading
  
The data was fetched and cleaned with `bash` scripts; see the *dataCleaning* section to understand how this was achieved.

**Home-made dataset (fetched by us):**

In [None]:
# Import the scraped recipes into a pandas DataFrame
allrecipes_df = pd.read_csv(DATA_FOLDER + 'allrecipes.csv', sep='\t', header=None, encoding='utf-8')
allrecipes_df.columns = ['ID', 'Region', 'Title', 'Ingredients', 'kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol']

# Some nutritional fields are read as strings: convert all of them at once,
# malformed values become NaN
nutri_cols = ['kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol']
allrecipes_df[nutri_cols] = allrecipes_df[nutri_cols].apply(pd.to_numeric, errors='coerce')

# carb, fat and sodium are divided by 1000 to convert them to g
allrecipes_df[['carb', 'fat', 'sodium']] = allrecipes_df[['carb', 'fat', 'sodium']] / 1000.0

# Remove rows that are not properly formatted
allrecipes_df = allrecipes_df.dropna()

# Remove duplicated rows
allrecipes_df = allrecipes_df.drop_duplicates().set_index('ID')

allrecipes_df.head(5)

In [None]:
# Importing descriptions to Pandas DF
allrecipes_desc_df = pd.read_csv(DATA_FOLDER + 'allrecipes_desc.csv', sep='£', header=None, encoding='utf-8', engine='python')
allrecipes_desc_df.columns = ['ID', 'Description']

# Remove any duplicated lines
allrecipes_desc_df = allrecipes_desc_df.drop_duplicates().set_index('ID')

allrecipes_desc_df.head(5)

In [None]:
print("Number of recipes:", len(allrecipes_df.index.unique()))

**Provided Dataset**

This dataset was provided with the assignment and cleaned with the provided `Perl` scripts. 

Thanks to the scripts, we obtain two datasets:

1. `cleaned_ing.csv` contains the list of ingredients for each recipe,
2. `cleaned_nutri.csv` contains the corresponding nutritional values.

Our objective is to merge these two sets into a single dataset with all the useful information.

In [None]:
# Importing ingredients to Pandas DF
ing_df = pd.read_csv(DATA_FOLDER + 'cleaned_ing.csv', sep='\t',  header=None, encoding = "utf-8")
ing_df.columns = ['ID', 'Title', 'Ingredients']

# Importing nutritional values to Pandas DF
nutri_df = pd.read_csv(DATA_FOLDER + 'cleaned_nutri.csv', sep='\t',  header=None, encoding = "utf-8")
nutri_df.columns = ['ID', 'kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol']

# Merging
ing_df = ing_df.set_index('ID')
nutri_df = nutri_df.set_index('ID')
provided_df = ing_df.merge(nutri_df, on='ID', how='inner')

# Drop NaNs and duplicate lines
provided_df = provided_df.dropna().drop_duplicates()

provided_df.head()
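
A toy illustration of the inner merge, with hypothetical IDs and values: a recipe survives only if its ID appears in both files. Because both frames are indexed by `ID`, `on='ID'` refers to the index level, and pandas restores it as the index of the result.

```python
import pandas as pd

# Hypothetical miniature versions of cleaned_ing.csv and cleaned_nutri.csv
ing = pd.DataFrame({'ID': [1, 2, 3],
                    'Title': ['Soup', 'Cake', 'Salad'],
                    'Ingredients': ['leek|potato', 'flour|egg', 'lettuce']}).set_index('ID')
nutri = pd.DataFrame({'ID': [1, 3, 4],
                      'kcal': [120.0, 45.0, 300.0]}).set_index('ID')

# how='inner' keeps the IDs present in both tables
# (ID 2 has no nutritional values, ID 4 has no ingredients)
merged = ing.merge(nutri, on='ID', how='inner')
print(merged)
```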

We can observe that some nutritional values are missing; this can be handled either by removing those rows or by computing the values from the listed ingredients.

Computing the values from ingredients expressed in different units (e.g. grams, cups, tbsp) requires information that we do not have, so we leave these rows as they are for now.

In [None]:
# Some nutritional fields are read as strings: convert all of them at once,
# malformed values become NaN
nutri_cols = ['kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol']
provided_df[nutri_cols] = provided_df[nutri_cols].apply(pd.to_numeric, errors='coerce')

# Insert a Region column (unknown for this dataset) at the same position as in the other DataFrame
provided_df.insert(loc=0, column='Region', value=np.nan)
provided_df.head(5)

In [None]:
print("Number of recipes:", len(provided_df.index.unique()))

In [None]:
# Concatenate the two DataFrames and drop duplicated rows (some recipes come from the same website)
# DataFrame.append was removed in pandas 2.0: pd.concat is the replacement
recipes_df = pd.concat([allrecipes_df, provided_df], sort=False).drop_duplicates()
recipes_df['Region'] = recipes_df['Region'].astype('category')

recipes_df.head()
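
One caveat worth checking, sketched here on hypothetical rows: `drop_duplicates()` compares column values only and ignores the `ID` index. Since the provided recipes have no `Region`, a recipe present in both sources differs at least in that column and is kept twice. Counting `index.unique()`, as in the next cell, or keeping the first row per ID deduplicates across sources instead.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: recipe 10 appears in both sources, but only the scraped copy has a region
scraped = pd.DataFrame({'Region': ['italian', 'french'],
                        'Title': ['Risotto', 'Quiche'],
                        'kcal': [450.0, 380.0]},
                       index=pd.Index([10, 11], name='ID'))
provided = pd.DataFrame({'Region': [np.nan, np.nan],
                         'Title': ['Risotto', 'Brownies'],
                         'kcal': [450.0, 500.0]},
                        index=pd.Index([10, 12], name='ID'))

combined = pd.concat([scraped, provided], sort=False)

# Compares column values only: both 'Risotto' rows survive because their Region differs
by_values = combined.drop_duplicates()

# Keeps the first row of each ID (here the scraped one), regardless of the other columns
by_id = combined[~combined.index.duplicated(keep='first')]

print(len(by_values), len(by_id))
```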

In [None]:
print("Number of total recipes:", len(recipes_df.index.unique()))

In [None]:
len(recipes_df[recipes_df['Region']=='italian'])/365

There are enough Italian recipes to eat a different one every day for almost 7 years!  
We save this DataFrame to use it later on.

In [None]:
recipes_df.to_csv(DATA_FOLDER + 'recipes_df.csv', sep='\t', encoding='utf-8')

# 2. Ingredient parsing
The cleaning is presented in `DataCleaning.ipynb`; here we directly use its result: a list of valid ingredients that can be matched in the recipes and are relevant for statistical analysis.

In [None]:
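# Load the hand-cleaned list of ingredients
with open(DATA_FOLDER + 'cleaned_list') as f:
    ing_list = f.read().splitlines()

# Load the dictionary that corrects misspelled ingredient names
# (saved as a pickled object, so allow_pickle=True is required since NumPy 1.16.3)
ing_dict = np.load(DATA_FOLDER + 'ing_dict.npy', allow_pickle=True).item()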
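
Why `allow_pickle` matters: a dictionary saved with `np.save` is stored as a pickled 0-d object array, and since NumPy 1.16.3 `np.load` refuses to unpickle by default (it raises a `ValueError`). A round-trip sketch with a hypothetical three-entry dictionary written to a temporary file:

```python
import os
import tempfile
import numpy as np

# Hypothetical excerpt of the misspelling-correction dictionary
ing_dict = {'tomatos': 'tomato', 'tomatoes': 'tomato', 'onions': 'onion'}

path = os.path.join(tempfile.mkdtemp(), 'ing_dict.npy')
np.save(path, ing_dict)  # stored as a 0-d object array (pickled)

# allow_pickle=True lets np.load unpickle it; .item() unwraps the 0-d array into the dict
loaded = np.load(path, allow_pickle=True).item()
print(loaded['tomatos'])
```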

In [None]:
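# Work on a lower-cased copy of the recipes
recipes_copy = recipes_df.copy()
recipes_copy['Ingredients'] = recipes_copy['Ingredients'].str.lower()

# Replace every non-alphabetic character (including the '|' separator) with a space
# regex=True is required: since pandas 2.0, str.replace matches literally by default
recipes_copy['Ingredients'] = recipes_copy['Ingredients'].str.replace('[^a-zA-Z ]+', ' ', regex=True)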

In [None]:
# This step is needed to clean the ingredients of the dataset!
# Apply the cleaning dictionary to every word of the Ingredients column of each recipe
def matcher(k):
    words = set(k.split())  # split once per recipe, not once per dictionary key
    x = (i for i in ing_dict if i in words)
    return '|'.join(map(ing_dict.get, x))

# Cleaned ingredients, separated by '|'
recipes_copy['Cleaned_Ing'] = recipes_copy['Ingredients'].map(matcher)
recipes_copy
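
A toy run of the two cleaning steps (lower-casing with non-letter removal, then dictionary matching), using a hypothetical four-entry `ing_dict` rather than the real one loaded from `ing_dict.npy`:

```python
import pandas as pd

# Hypothetical excerpt of the correction dictionary (raw word -> cleaned ingredient)
ing_dict = {'tomatoes': 'tomato', 'tomatos': 'tomato', 'basil': 'basil', 'garlic': 'garlic'}

def matcher(k):
    # Keep the dictionary keys that appear as whole words, mapped to their cleaned name
    words = set(k.split())
    return '|'.join(ing_dict[w] for w in ing_dict if w in words)

raw = pd.Series(['2 Tomatos, chopped|1 clove Garlic', 'fresh basil'])

# Lower-case, then replace anything that is not a letter or a space (including '|')
cleaned = raw.str.lower().str.replace('[^a-zA-Z ]+', ' ', regex=True)
result = cleaned.map(matcher).tolist()
print(result)
```

Words that are not dictionary keys (`chopped`, `clove`, `fresh`) are silently dropped, which is what makes the resulting lists comparable across recipes.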

----
**The ingredient list is now cleaned: we can run some neat analysis on it**

In [None]:
# One-hot encode the cleaned ingredients: one row per recipe, one 0/1 column per ingredient
matrix = recipes_copy['Cleaned_Ing'].str.get_dummies('|')
# matrix.head()

In [None]:
# Convert to a NumPy array for matrix operations
X = matrix.values

# Compute the adjacency matrix of the ingredient graph.
# X is a (recipes x ingredients) matrix with values in {0, 1}, so
# adjacency[i, j] = sum over recipes r of X[r, i] * X[r, j]
# is the number of recipes containing both ingredient i and ingredient j
# (the diagonal holds the number of recipes containing each ingredient).
adjacency = X.T @ X
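
A toy check of the adjacency computation on three hypothetical recipes: each entry of $X^\top X$ sums $X_{r,i} X_{r,j}$ over recipes $r$, and the product is 1 only when recipe $r$ contains both ingredients.

```python
import numpy as np
import pandas as pd

# Three hypothetical recipes, ingredients separated by '|'
cleaned_ing = pd.Series(['egg|flour|sugar', 'egg|flour', 'egg|tomato'])

# One row per recipe, one 0/1 column per ingredient (columns sorted alphabetically)
matrix = cleaned_ing.str.get_dummies('|')
X = matrix.values

# Co-occurrence counts, labelled by ingredient
adjacency = X.T @ X
cooc = pd.DataFrame(adjacency, index=matrix.columns, columns=matrix.columns)
print(cooc)
```

`egg` and `flour` co-occur in two recipes, `egg` alone appears in all three (diagonal), and `sugar` never meets `tomato`; the matrix is symmetric by construction.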

In [None]:
# Graph Plotting

# TODO: that would be nice https://www.curiousgnu.com/reddit-comments



-----
**Test with Holoviews**

In [None]:
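# Co-occurrence counts as a DataFrame labelled by ingredient
df = pd.DataFrame(data=adjacency,
                  index=matrix.columns,
                  columns=matrix.columns)

# The matrix is symmetric and its diagonal counts recipes per ingredient (not a link):
# keep the strict upper triangle so that each pair of ingredients appears once
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)
df = df.where(upper).stack().dropna().reset_index()
df.columns = ['src', 'trg', 'number']
df['number'] = df['number'].astype(int)
df = df.nlargest(200, columns='number')
df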

In [None]:
import holoviews as hv
hv.extension('bokeh')
%output fig='html' size=300

%opts Chord [label_index='index' color_index='src' edge_color_index='src'] 
%opts Chord (cmap='Category20' edge_cmap='Category20')
hv.Chord(df)

---

In [None]:
recipes_copy.head()

In [None]:
# Count the occurrences of each cleaned ingredient
ing_ds = recipes_copy['Ingredients'].str.split(" ", expand=True) \
                                    .stack() \
                                    .map(ing_dict) \
                                    .value_counts()  \
                                    .to_frame()  \
                                    .reset_index()

# TODO: move this below, it belongs to the analysis part
ing_ds.head()

**Try visualization of WordCloud**

In [None]:
ing_ds_region = recipes_copy

regions = ing_ds_region['Region'].unique().dropna()

# Number of recipes per region, in decreasing order
recipes_regions = ing_ds_region.groupby('Region')['Title'].count().sort_values(ascending=False)

top_5_regions = recipes_regions.head(5).index.tolist()
top_10_regions = recipes_regions.head(10).index.tolist()
top_15_regions = recipes_regions.head(15).index.tolist()

recipes_regions = recipes_regions.to_frame()
recipes_regions.columns = ['Recipe Count']

recipes_regions.plot.bar()

plt.savefig("./website/freelancer-theme/img/recipe_count_per_region.png", format='png')

plt.show()



In [None]:
df = recipes_copy

# Join all recipe titles with a space, so that words of consecutive titles do not merge
titles = ' '.join(df.Title.tolist())

wordcloud = WordCloud(background_color="white", mode="RGBA", max_words=1000, width=800, height=400)\
.generate(titles)

plt.figure(figsize=[10,10])
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")

# store to file
plt.savefig("./website/freelancer-theme/img/Flags/titles_wordcloud.svg", format="svg")

plt.show()

In [None]:
ingredients_per_region_list = []
for region in regions:
    ing_ds_region_2 = ing_ds_region[ing_ds_region['Region'] == region]['Ingredients'].str.split(" ", expand=True) \
                                                                    .stack() \
                                                                    .map(ing_dict) \
                                                                    .value_counts()  \
                                                                    .to_frame()  \
                                                                    .reset_index()
    ing_ds_region_2['Region'] = region
    ingredients_per_region_list.append(ing_ds_region_2)

ingredients_per_region = pd.concat(ingredients_per_region_list)
ingredients_per_region = ingredients_per_region.set_index('Region')
ingredients_per_region.columns = ['Ingredient', 'Count']

In [None]:
def get_region_ing(region):
    ingredients = ingredients_per_region[ingredients_per_region.index == region]
    return dict(zip(ingredients['Ingredient'], ingredients['Count']))


for region in top_5_regions:
    # Ingredient frequencies of the region (new name, so the ing_list loaded earlier is not overwritten)
    ing_freq = get_region_ing(region)
    # Generate a word cloud shaped by the region's mask image
    mask = np.array(Image.open("Masks/"+region+".png"))

    wordcloud = WordCloud(background_color="white", mode="RGBA", max_words=1000, mask=mask, width=800, height=400)\
    .generate_from_frequencies(ing_freq)

    print(mask.shape)
    # create coloring from image
    image_colors = ImageColorGenerator(mask)
    plt.figure(figsize=[10,10])
    plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation="bilinear")
    plt.axis("off")

    # store to file
    plt.savefig("./website/freelancer-theme/img/Flags/"+region+"_ing.svg", format="svg")

    plt.show()


**Try visualization with holoviews**

In [None]:
df = ingredients_per_region.reset_index()
top_ingredients_list = []

for region in regions:
    # Select the 50 most used ingredients (rows are already sorted by count)
    top_ingredients = df[df['Region'] == region].head(50)
    top_ingredients_list.append(top_ingredients)

top_ingredients = pd.concat(top_ingredients_list)


def compare_lists_ing(list1, list2):
    '''This function finds how many ingredients are in both lists'''
    return len(set(list1) & set(list2))

coeff = []
region_1 = []
region_2 = []

# Create a list of coefficients between different regions
for region1 in top_15_regions:
    list1 = top_ingredients[top_ingredients['Region'] == region1]['Ingredient'].tolist()
    for region2 in top_15_regions:
        if region1 != region2:
            list2 = top_ingredients[top_ingredients['Region'] == region2]['Ingredient'].tolist()

            region_1.append(region1)
            region_2.append(region2)
            coeff.append(compare_lists_ing(list1, list2))

In [None]:
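# Rescale the overlaps before drawing the chord diagram
# (a quartic stretch, not a logarithm; the result is negative for overlaps of 25 or fewer)
coeff_log = [int(0.001*(x)**4+0.001*(x)**2-400) for x in coeff]

df = pd.DataFrame({'Region 1': region_1, 'Region 2': region_2, 'Coeff': coeff_log})

top_regions_df = pd.DataFrame({'Region': top_15_regions})

top_regions_df, df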
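
A small worked check of the `coeff_log` transform. The expression is copied from the cell above; the function name `rescale` is ours. Since each region contributes its top-50 ingredients, overlaps range from 0 to 50.

```python
def rescale(x):
    # Same expression as coeff_log: quartic stretch, truncated toward zero by int()
    return int(0.001 * x**4 + 0.001 * x**2 - 400)

for overlap in [20, 25, 26, 30, 40, 50]:
    print(overlap, rescale(overlap))
```

The stretch makes large overlaps dominate (50 shared ingredients gives a weight about 14 times larger than 30), but pairs sharing 25 or fewer ingredients get a negative weight. If that is unwanted in the chord diagram, clipping with `max(0, ...)` would be one option.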

In [None]:
import holoviews as hv
hv.extension('bokeh')
%output fig='html' size=200

%opts Chord [label_index='Region' color_index='Region' edge_color_index='Region 2'] 
%opts Chord (cmap='Category20' edge_cmap='Category20')
nodes = hv.Dataset(top_regions_df, 'Region')

chord = hv.Chord((df, nodes), ['Region 1', 'Region 2'], ['Coeff'])

renderer = hv.renderer('bokeh')

plot = renderer.get_plot(chord).state

from bokeh.io import output_file, save, show
save(plot, './website/freelancer-theme/img/ingredients_chord.html')
show(plot)

In [None]:
# Copy of the DataFrame with all the ingredients per region
ingredients_per_region_copy = ingredients_per_region.reset_index()

# Total number of ingredient occurrences per region
total_count_per_region = ingredients_per_region_copy.groupby('Region')['Count'].sum()

# Dictionary {region: total count} used to divide later
dic_prop = total_count_per_region.to_dict()

In [None]:
def prop_ing(df, dic):
    # Divide each ingredient count by the total count of its region
    df['Proportion'] = df['Count'] / df['Region'].map(dic)
    return df
    

In [None]:
# Proportion of each ingredient per region
proportion_df = prop_ing(ingredients_per_region_copy,dic_prop)
proportion_df.head(12)
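
The proportion is each ingredient's count divided by the total count of its region, so the proportions of a region sum to 1. A vectorized sketch on hypothetical counts, using `Series.map` with the per-region totals instead of a Python loop over rows:

```python
import pandas as pd

# Hypothetical per-region ingredient counts (same layout as ingredients_per_region_copy)
counts = pd.DataFrame({'Region': ['italian', 'italian', 'korean', 'korean'],
                       'Ingredient': ['tomato', 'basil', 'garlic', 'soy'],
                       'Count': [30, 10, 5, 15]})

# Total ingredient occurrences per region, broadcast back to every row with map()
totals = counts.groupby('Region')['Count'].sum()
counts['Proportion'] = counts['Count'] / counts['Region'].map(totals)
print(counts)
```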



In [None]:
#Ingredient frequency in the world
world_prop=ingredients_per_region.groupby('Ingredient').agg('sum').reset_index()

#Compute the total count of the ingredients
total_count = sum(world_prop.Count)

#Compute the frequency of each ingredients
world_prop['Frequency']=world_prop['Count']/total_count

world_prop.sort_values('Frequency', ascending=False).head()

# 3. Cooking time case study

In this part we would like to analyze the cooking time of the recipes to be able to classify which regions have the highest and lowest cooking time.

In [None]:
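# Extract all durations from the descriptions: group 1 = minutes, group 2 = hours
# ('s?' also accepts the singular forms "1 minute" and "1 hour")
timing_df = allrecipes_desc_df['Description'].str.extractall(r'(\d+) minutes?|(\d+) hours?')
timing_df.columns = ['minutes', 'hours']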

#Replace Nan by 0 and switch to int type
timing_df = timing_df.fillna(0).astype(int)

#Sum the number of minutes to get the recipe time
timing_df['Time (min)'] = timing_df['minutes']+timing_df['hours']*60

timing_df.head()

In [None]:
# Sum the total amount of time for each recipe
time_recipe = timing_df.groupby('ID').agg('sum')
time_recipe = time_recipe.drop(['minutes','hours'], axis=1)

time_recipe.head()
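
The extraction and aggregation steps can be checked on a few hypothetical descriptions. This sketch uses a variant of the pattern that accepts singular and plural units; `extractall` returns one row per match, indexed by `(ID, match)`, and descriptions without any number-plus-unit match are absent from the result.

```python
import pandas as pd

# Hypothetical descriptions indexed by recipe ID
desc = pd.Series(['Ready in 1 hour 20 minutes.', 'Bake for 45 minutes.', 'Chill for several hours.'],
                 index=pd.Index([1, 2, 3], name='ID'))

# Group 1 captures minutes, group 2 hours; one row per match
timing = desc.str.extractall(r'(\d+) minutes?|(\d+) hours?')
timing.columns = ['minutes', 'hours']

# Strings -> numbers, unmatched group -> 0
timing = timing.apply(pd.to_numeric).fillna(0).astype(int)
timing['Time (min)'] = timing['minutes'] + timing['hours'] * 60

# Sum all the matches of each recipe
time_recipe = timing.groupby(level='ID')['Time (min)'].sum()
print(time_recipe)
```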

# 4. Merging
Finally, we merge everything into a single DataFrame for the visualizations.

In [None]:
# Merging Cooking Time
cleaned_df = recipes_df.merge(time_recipe, on='ID', how='left')

# TODO: ingredient cleaning and ingredient substitution
# (not yet implemented, but we are close to achieving it)

cleaned_df.sample(5)

# 5. Analysis

This part presents some basic statistical analysis of the data.

First we analyse the data by region and observe *mean*, *median*, *min* and *max* for each nutritional value.

In [None]:
# Some classic statistics per region
stats = ['mean', 'median', 'min', 'max']
columns = ['kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol', 'Time (min)']
stats_regions = cleaned_df.groupby('Region').agg({col: stats for col in columns})
stats_regions.sort_values([('kcal', 'mean')], ascending=False).head()

# 6. Visualization

In this part we present visualizations of the information retrieved from the dataset.

###  Plots

In [None]:
# Interactive scatter plot between two nutritional values
nutri_cols = ['kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol']

def f(nutritive1, nutritive2):
    sns.set_context("notebook", font_scale=1.5)
    # Recent seaborn versions require x= and y= as keyword arguments
    sns.scatterplot(x=cleaned_df[nutritive1], y=cleaned_df[nutritive2])
    plt.show()

# Interact
interact(f, nutritive1=nutri_cols, nutritive2=nutri_cols);

The scatter plot above shows the relationship between pairs of nutritional values.

For example, in many recipes high carbs and fat go together with high calories, which is less the case for high protein. Fat and cholesterol also seem less correlated than one might expect.

Below is a plot that shows the correlation coefficient for pairs of nutritional values by region. 
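
For reference, pandas' `DataFrame.corr()` computes the Pearson correlation coefficient by default, pairwise and excluding missing values. For two nutritional columns $x$ and $y$ over the $n$ recipes of a region:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

It ranges from $-1$ to $1$ and only measures linear association, so a weak coefficient does not rule out a non-linear relationship.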

In [None]:
# Correlation between nutritional values shown per region
def f(region):
    sns.set_context("notebook", font_scale=1.5)

    # Only the nutritional columns are used: the Time column is left out
    # (it could be interesting to check whether it correlates too)
    nutri_cols = ['kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol']
    corr = cleaned_df.loc[cleaned_df['Region'] == region, nutri_cols].corr()
    sns.heatmap(corr,
                xticklabels=corr.columns.values,
                yticklabels=corr.columns.values)
    plt.show()

# Interact
interact(f, region=cleaned_df.Region.unique().dropna());

The box plots below show the distribution of each nutritional value (or of the cooking time) for the recipes of each region. Regions are sorted by decreasing median, so the top box belongs to the region with the highest median for the selected item.

In [None]:
# Item value statistics by region
def f(item):
    # Regions sorted by decreasing median of the selected item
    recipe_sorted = stats_regions.sort_values([(item, 'median')], ascending=False)

    sns.set_context("notebook", font_scale=1.5)
    sns.boxplot(x=cleaned_df[item], y=cleaned_df['Region'], order=recipe_sorted.index)

    # There is a big outlier for Sodium & Time, we will handle it later
    if item == 'sodium':
        plt.xlim(-0.5, 10)

    if item == 'Time (min)':
        plt.xlim(-50, 1500)

    plt.show()

# Interact
plt_interact = interact(f, item=['kcal', 'carb', 'fat', 'protein', 'sodium', 'cholesterol', 'Time (min)'])
# interact() returns the function; the widget itself is stored in its .widget attribute
embed_minimal_html('export.html', views=[plt_interact.widget], title='Widgets export')

Comparing medians, the most caloric, fat-rich and protein-rich recipes are Malaysian, while Korean recipes have the highest sodium content. French recipes seem to be the ones to watch for cholesterol intake.

Persian recipes have the longest total preparation and cooking time, whereas Japanese recipes are the quickest.

### Maps

In [None]:
# Load the JSON world map (the file is closed automatically)
with open(DATA_FOLDER + 'recipes_map.json') as fp:
    map_recipes_json = json.load(fp)

In [None]:
def layer_colormap(topojson, df, column, colorscale):

    # Create a layer
    feature_map = folium.FeatureGroup(name=column, overlay=False)

    def style_function(feature):
        # Mean of the selected column for the region of this feature
        value = df[df['Region'] == feature['properties']['Region']][column].mean()
        return {
            'color': 'black',
            'weight': 1,
            'fillOpacity': 0.5,
            # If the value is NaN (no recipe for this region), the color is dark grey
            'fillColor': '#444444' if np.isnan(value) else colorscale(value)
        }

    # Draw the map, coloring each region with style_function
    folium.GeoJson(topojson, style_function=style_function).add_to(feature_map)

    return feature_map

In [None]:
# Create a new empty map
map_info = folium.Map([30,0], tiles='cartodbpositron', zoom_start=2)

# Add one layer per nutritional value, each with its own color scale
for category in ['kcal','carb','fat','protein','sodium','cholesterol', 'Time (min)']:
    colorscale = branca.colormap.linear.YlOrRd_09.scale(min(stats_regions[category]['mean']), max(stats_regions[category]['mean']))
    layer_colormap(map_recipes_json, cleaned_df, category, colorscale).add_to(map_info)

# Add a legend to the base layer
# Note: after the loop, colorscale is the scale of the last category ('Time (min)') only
colorscale.caption = 'Mean of the nutritive value selected'
map_info.add_child(colorscale)

# Adding the tile Layer thus it is prettier
folium.TileLayer(tiles='cartodbpositron', overlay=True).add_to(map_info)

# Layer Control to select the different layer created before
folium.LayerControl(collapsed=False, position='bottomleft').add_to(map_info);

# Save/Display
map_info.save('map_info.html')
#map_info

In [None]:
%%HTML
<iframe src="map_info.html" width=100% height=700></iframe>

On the previous map, we can see how the nutritional properties of the recipes vary across regions. Some correlations stand out: for instance, kcal and fat are both high in the same regions.

**Note:** there is currently a small issue with the colormap; we plan to fix it by using a different kind of interactive map that shows more interesting information (ingredient distribution, min/max or median nutritional values).

In [None]:
## FINAL MAP
import geopandas as gpd

# Load the map as a GeoDataFrame
df_geo = gpd.read_file(DATA_FOLDER + 'recipes_map.json')

# Merge the polygons of each region
region_geo = df_geo.dissolve(by='Region')

# Add two columns with the coordinates of the polygon centroids
region_geo['x'] = region_geo.centroid.map(lambda p: p.x)
region_geo['y'] = region_geo.centroid.map(lambda p: p.y)
region_geo.head()

In [None]:
# Now we assign the popup text to the corresponding region
stats_regions_copy = stats_regions.reset_index()

# First we flatten the two-level column index: ('kcal', 'median') -> 'kcal_median'
stats_regions_copy.columns = stats_regions_copy.columns.map('_'.join)

stats_regions_copy = stats_regions_copy.astype(str)

# Build the text from the median of each nutritional value
stats_regions_copy['text'] = stats_regions_copy['Region_'] + '<br>' +\
    'Kcal '+stats_regions_copy['kcal_median']+' Carbs '+stats_regions_copy['carb_median']+'<br>'+\
    ' Fat '+stats_regions_copy['fat_median']+' Protein ' + stats_regions_copy['protein_median']+'<br>'+\
    ' Sodium '+stats_regions_copy['sodium_median']+' Cholesterol '+stats_regions_copy['cholesterol_median']

# Add the text to the GeoDataFrame, aligned on the region name rather than on row order
region_geo['Text'] = stats_regions_copy.set_index('Region_')['text']
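
Why the assignment order matters, sketched on hypothetical data: assigning a plain list fills rows by position, which is only correct if both tables list the regions in the same order, whereas assigning a `Series` aligns on the index labels (regions missing from the `Series` get NaN).

```python
import pandas as pd

# Hypothetical table indexed by region, like region_geo after dissolve(by='Region')
geo = pd.DataFrame({'x': [12.5, 127.8, 2.3]},
                   index=pd.Index(['italian', 'korean', 'french'], name='Region'))

# Hypothetical popup texts for the same regions, in alphabetical order
text = pd.Series(['french text', 'italian text', 'korean text'],
                 index=['french', 'italian', 'korean'])

geo['by_position'] = text.tolist()  # a list is assigned row by row, labels are ignored
geo['by_label'] = text              # a Series is aligned on the index labels
print(geo)
```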

In [None]:
# Create a map
stats_map = folium.Map([30,0], tiles='cartodbpositron', zoom_start=2)

# Create an empty marker cluster
from folium.plugins import MarkerCluster
marker_cluster = MarkerCluster().add_to(stats_map)

# Add a hexagonal marker at the centroid of each region
for idx, row in region_geo.iterrows():
    # Get lat and lon of points
    lon = row['x']
    lat = row['y']

    # Popup text with the nutritional medians
    text = row['Text']
    # Add marker to the map
    folium.RegularPolygonMarker(location=[lat, lon], popup=text, fill_color='#2b8cbe', number_of_sides=6, radius=8).add_to(marker_cluster)

# Save the map (displayed in the next cell)
stats_map.save('stats_map.html')

In [None]:
%%HTML
<iframe src="stats_map.html" width=100% height=700></iframe>

In [None]:
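## Map with the 10 most popular ingredients per region
# group_keys=False avoids adding Region as an extra index level (it is already a column),
# which would make the next groupby('Region') ambiguous
ten_imp_ing = proportion_df.groupby('Region', group_keys=False).apply(lambda x: x.nlargest(10, 'Proportion'))

# One comma-separated string of ingredients per region
ten_imp_df = ten_imp_ing.groupby('Region').agg({'Ingredient': lambda x: ', '.join(x)})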
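
An alternative way to take the top-k rows of each group, sketched on hypothetical proportions: sort once, then `groupby().head(k)`, which keeps the original row index and never adds the group key as an index level.

```python
import pandas as pd

# Hypothetical proportions (same layout as proportion_df)
prop = pd.DataFrame({'Region': ['italian'] * 3 + ['korean'] * 3,
                     'Ingredient': ['tomato', 'basil', 'salt', 'soy', 'garlic', 'sugar'],
                     'Proportion': [0.5, 0.3, 0.2, 0.6, 0.3, 0.1]})

k = 2  # the notebook keeps the 10 largest

# Sort once by decreasing proportion, then keep the first k rows of each region
top_k = prop.sort_values('Proportion', ascending=False).groupby('Region').head(k)

# One comma-separated string of ingredients per region
top_k_text = top_k.groupby('Region')['Ingredient'].agg(', '.join)
print(top_k_text)
```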



In [None]:
region_geo_ing = region_geo.copy()

# Aligned on the region name (both tables are indexed by Region)
region_geo_ing['Ingredient'] = ten_imp_df['Ingredient']
region_geo_ing.head()

In [None]:
#Create a map
ing_map = folium.Map([30,0], tiles='cartodbpositron', zoom_start=2)

# Create an empty marker cluster
from folium.plugins import MarkerCluster
marker_cluster2 = MarkerCluster().add_to(ing_map)

# Add a hexagonal marker at the centroid of each region
for idx, row in region_geo_ing.iterrows():
    # Get lat and lon of points
    lon = row['x']
    lat = row['y']

    # Popup text with the 10 most used ingredients
    text = row['Ingredient']
    # Add marker to the map
    folium.RegularPolygonMarker(location=[lat, lon], popup=text, fill_color='#2b8cbe', number_of_sides=6, radius=8).add_to(marker_cluster2)
    

ing_map.save('ing_map.html')


In [None]:
%%HTML
<iframe src="ing_map.html" width=100% height=700></iframe>

In [None]:
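## SEARCH FOR A SPECIFIC RECIPE GIVEN A LIST OF INGREDIENTS
def search(df, words):
    """
    Return a sub-DataFrame of the rows whose Ingredients column contains all the words.
    """
    if not words:
        return df.iloc[0:0]  # no ingredient given: empty result
    # regex=False: the words typed by the user are matched literally
    return df[np.logical_and.reduce([df['Ingredients'].str.contains(word, regex=False) for word in words])]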
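
A self-contained sketch of the same search on a hypothetical three-recipe table, made case-insensitive with `case=False`. Note that `str.contains` matches substrings, so for instance `egg` would also match `eggplant`.

```python
import numpy as np
import pandas as pd

def search(df, words):
    """Rows whose Ingredients column contains every word (case-insensitive, literal match)."""
    if not words:
        return df.iloc[0:0]  # no ingredient given: empty selection with the same columns
    masks = [df['Ingredients'].str.contains(word, case=False, regex=False) for word in words]
    # A row is kept only if every mask is True for it
    return df[np.logical_and.reduce(masks)]

recipes = pd.DataFrame({'Title': ['Pesto', 'Caprese', 'Omelette'],
                        'Ingredients': ['basil garlic pine nuts', 'tomato basil mozzarella', 'egg butter']})
print(search(recipes, ['basil', 'tomato'])['Title'].tolist())
```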

In [None]:
print('Enter the list of ingredients and let Captain Cook look for a special recipe!')
# Ingredients in recipes_copy are lower-cased, so lower-case the input too
answer = input().lower().split()

recipe_list = search(recipes_copy, answer)
if recipe_list.empty:
    print('Sorry, we have no recipe with all these ingredients')
else:
    print('Here are your recipes!')
recipe_list