Recipe Analysis

These scripts require the following dependencies:

numpy
scipy
random (standard library)
matplotlib
plotly
colorama
pickle (standard library)
pandas
requests
bs4
tqdm
networkx

They have been tested with Python 3.7.

Data download

https://cosylab.iiitd.edu.in/recipedb/ can be scraped using the get_dataRDB.py script. The URL indices start at 2610 and end at 149191. Within this range, some URLs do not contain information. These are filtered out later.

To use the script:

Change the start and end variables such that: 2610 <= start <= end <= 149191
Run the script

Copies of this script can be made and started independently to scrape the website faster. If the URL is found, the time taken by the request and processing of the data is printed. Otherwise, only the URL index is printed.
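When running several copies of the script, the index range has to be divided so that the copies do not overlap. A minimal sketch of one way to compute non-overlapping start and end values is shown here; the helper name split_index_range is hypothetical and is not part of get_dataRDB.py.

```python
FIRST_INDEX = 2610    # first URL index stated above
LAST_INDEX = 149191   # last URL index stated above


def split_index_range(start, end, n_copies):
    """Split the inclusive range [start, end] into contiguous chunks.

    Returns a list of (start, end) pairs, one per script copy.
    Hypothetical helper, not part of get_dataRDB.py.
    """
    if not (FIRST_INDEX <= start <= end <= LAST_INDEX):
        raise ValueError("expected 2610 <= start <= end <= 149191")
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    total = end - start + 1
    n_copies = min(n_copies, total)        # never create empty chunks
    base, extra = divmod(total, n_copies)
    chunks = []
    lo = start
    for i in range(n_copies):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append((lo, lo + size - 1))
        lo += size
    return chunks


if __name__ == "__main__":
    for lo, hi in split_index_range(FIRST_INDEX, LAST_INDEX, 4):
        print(f"start = {lo}, end = {hi}")
```

Each printed pair can then be copied into the start and end variables of one copy of the script.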

Combining the data

Once the data is downloaded, the combine_pkl_files.py script can be customized and run to combine all data into a single pandas dataframe.

To customize the script:

Edit the read_pickle statements to add all generated dataframes into a single list.
Run the script

This will merge all dataframes into one and remove URLs that could not be reached. The resulting dataframe is saved in the current directory.
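The merge step can be sketched roughly as follows. This is an illustration under assumptions, not the contents of combine_pkl_files.py: the column names url_index and title are hypothetical, and unreachable URLs are assumed to show up as rows with a missing value in one column.

```python
import pandas as pd


def combine_frames(frames, required_column="title"):
    """Concatenate dataframes and drop rows for unreachable URLs.

    Assumption: an unreachable URL is stored as a row whose
    `required_column` value is missing. The column name is hypothetical.
    """
    combined = pd.concat(frames, ignore_index=True)
    return combined.dropna(subset=[required_column]).reset_index(drop=True)


if __name__ == "__main__":
    # In practice each frame would come from pd.read_pickle(<file>).
    part_a = pd.DataFrame({"url_index": [2610, 2611], "title": ["Soup", None]})
    part_b = pd.DataFrame({"url_index": [2612], "title": ["Stew"]})
    print(combine_frames([part_a, part_b]))
```

The result could then be written out with the dataframe's to_pickle method.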

Processing the data

The process_data.py script provides:

examples of reading the data and compiling some results
data processing and filtering needed to build a graph
saving of data for specific locations that have a large enough number of recipes

The full filtered data is saved to the same directory. Country, region, and continent data are saved to the country_data, region_data, and continent_data directories, respectively, and the world data is saved to the root directory. The script can be executed without any modifications.
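Selecting locations with enough recipes can be sketched as below. The column name country and the threshold of 100 recipes are assumptions made for illustration; the actual names and cutoff used in process_data.py may differ.

```python
import pandas as pd


def split_by_location(df, column="country", min_recipes=100):
    """Return {location: sub-dataframe} for locations with enough recipes.

    `column` and `min_recipes` are hypothetical defaults.
    """
    counts = df[column].value_counts()
    keep = counts[counts >= min_recipes].index
    return {loc: df[df[column] == loc] for loc in keep}


if __name__ == "__main__":
    demo = pd.DataFrame({
        "recipe": ["a", "b", "c", "d"],
        "country": ["Italy", "Italy", "Italy", "Peru"],
    })
    for location, subset in split_by_location(demo, min_recipes=2).items():
        print(location, len(subset))
```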

Building the graphs

The construct_graphs.py script is used to build two graphs:

An unweighted undirected graph between recipes and ingredients with _ingredients appended to the original file name.
A weighted undirected graph between recipes and nutrients (normalized by energy) with _nutrients appended to the original file name.

Two .gml files are saved to the same directory where the .pkl file resides. The script can be executed without any modifications.
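The two graph types described above can be sketched with NetworkX as follows. The input structures (plain dictionaries) and function names are assumptions chosen for illustration and do not reflect the internals of construct_graphs.py.

```python
import networkx as nx


def build_ingredient_graph(recipes):
    """Unweighted, undirected recipe-ingredient graph.

    recipes: dict mapping a recipe name to an iterable of ingredient names.
    Assumes recipe names and ingredient names never collide.
    """
    g = nx.Graph()
    for recipe, ingredients in recipes.items():
        g.add_node(recipe, bipartite=0)
        for ingredient in ingredients:
            g.add_node(ingredient, bipartite=1)
            g.add_edge(recipe, ingredient)
    return g


def build_nutrient_graph(nutrients, energy):
    """Weighted, undirected recipe-nutrient graph, normalized by energy.

    nutrients: dict mapping recipe -> {nutrient name: amount}
    energy:    dict mapping recipe -> energy (assumed non-zero)
    """
    g = nx.Graph()
    for recipe, amounts in nutrients.items():
        g.add_node(recipe, bipartite=0)
        for nutrient, amount in amounts.items():
            g.add_node(nutrient, bipartite=1)
            g.add_edge(recipe, nutrient, weight=amount / energy[recipe])
    return g
```

Either graph can be saved with nx.write_gml(g, path), where path would carry the _ingredients or _nutrients suffix.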

Processing the graphs

Ingredients Graphs

The ingredients_graph_processing.py script processes the graph by

Removing recipes with a small number of ingredients
Removing ingredients that are used only a few times (and those whose names are shorter than 2 characters)
Creating a bipartite graph for the remaining recipes and ingredients
Creating the 1-mode projections onto the recipes and ingredients

The script prints out some data (number of nodes/edges before/after the removal of nodes).

The script saves three .gml files: one for the reduced graph, one for the projection on ingredients, and one for the projection on the recipes. Simply update the list of locations and run the script.
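The steps above can be sketched with NetworkX's bipartite module. The thresholds min_ingredients and min_uses are hypothetical values, and this sketch applies each filter only once, so a recipe may end up with fewer ingredients than the threshold after rare ingredients are dropped.

```python
import networkx as nx
from networkx.algorithms import bipartite


def reduce_and_project(g, recipes, min_ingredients=2, min_uses=2):
    """Filter a recipe-ingredient graph and build both 1-mode projections.

    g:       bipartite graph with recipe and ingredient nodes
    recipes: iterable of the recipe nodes in g
    Thresholds are illustrative assumptions.
    """
    g = g.copy()
    recipes = set(recipes)

    # 1. Remove recipes with a small number of ingredients.
    small = [r for r in recipes if g.degree(r) < min_ingredients]
    g.remove_nodes_from(small)
    recipes -= set(small)

    # 2. Remove rarely used ingredients and very short names.
    ingredients = set(g) - recipes
    rare = [i for i in ingredients
            if g.degree(i) < min_uses or len(str(i)) < 2]
    g.remove_nodes_from(rare)
    ingredients -= set(rare)

    # 3. 1-mode projections onto each node set.
    recipe_projection = bipartite.projected_graph(g, recipes)
    ingredient_projection = bipartite.projected_graph(g, ingredients)
    return g, recipe_projection, ingredient_projection
```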

Nutrients Graphs

The nutrients_graph_processing.py script processes the graph by

Using the recipes from the "Ingredients Graphs" section
Only keeping five nutrients: Fats, Protein, Carbs, Sugars, Fiber
Reweighting the edges as percentages of the total weight of the five nutrients
Removing edges with low weight
Creating a bipartite graph for the remaining nutrients and recipes
Creating the 1-mode projections onto the nutrients and recipes

The script prints out some data (number of nodes/edges before/after the removal of nodes).

The script saves three .gml files: one for the reduced graph, one for the projection on nutrients, and one for the projection on the recipes. Simply update the list of locations and run the script.
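The reweighting and pruning steps can be sketched as follows. The cutoff min_share is a hypothetical value, shares are expressed as fractions in the range 0 to 1 rather than percentages, and the function is an illustration rather than the code in nutrients_graph_processing.py.

```python
import networkx as nx

KEPT_NUTRIENTS = ("Fats", "Protein", "Carbs", "Sugars", "Fiber")


def reweight_and_prune(g, recipes, min_share=0.1):
    """Keep five nutrients, convert weights to shares, drop weak edges.

    g:         weighted recipe-nutrient graph (edge attribute "weight")
    recipes:   iterable of recipe nodes present in g
    min_share: hypothetical cutoff below which an edge is removed
    """
    reduced = nx.Graph()
    for recipe in recipes:
        weights = {n: g[recipe][n]["weight"]
                   for n in g[recipe] if n in KEPT_NUTRIENTS}
        total = sum(weights.values())
        if total <= 0:
            continue  # recipe has none of the five nutrients
        for nutrient, weight in weights.items():
            share = weight / total
            if share >= min_share:
                reduced.add_edge(recipe, nutrient, weight=share)
    return reduced
```

The 1-mode projections of the reduced graph can then be obtained with networkx.algorithms.bipartite.projected_graph, as for the ingredients graphs.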

Analyzing the graphs

`ingredients_graphs_analysis.py`

Analyzes the 1-mode projection of the ingredients graphs onto the ingredients, generating details about:

graph diameters
degree centrality
betweenness centrality
degree distribution
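The quantities listed above are all available through NetworkX. The sketch below shows one way to collect them; the function name summarize_graph is hypothetical, and computing the diameter on the largest connected component is an assumption made here because the diameter is undefined for a disconnected graph.

```python
from collections import Counter

import networkx as nx


def summarize_graph(g):
    """Collect basic statistics for a non-empty undirected graph."""
    # Assumption: use the largest connected component for the diameter.
    largest = g.subgraph(max(nx.connected_components(g), key=len))
    return {
        "diameter": nx.diameter(largest),
        "degree_centrality": nx.degree_centrality(g),
        "betweenness_centrality": nx.betweenness_centrality(g),
        # Maps each degree value to the number of nodes with that degree.
        "degree_distribution": dict(Counter(d for _, d in g.degree())),
    }


if __name__ == "__main__":
    stats = summarize_graph(nx.path_graph(4))
    print(stats["diameter"], stats["degree_distribution"])
```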

`ingredients_graphs_analysis_result_plots_and_tabels.py`

Plots graphs and saves tables for results from ingredients_graphs_analysis.py

`nutrients_graphs_analysis.py`

Analyzes the 1-mode projection of the nutrients graphs onto the nutrients, generating details about:

recipes that have only one main nutrient
recipes that connect two nutrients
radar and matrix graphs of the nutrition content of each graph (normalized)

`assortativity.py`

Looks at the assortativity within the 1-mode projection of the ingredients graphs onto the ingredients. Each ingredient is labeled with its dominant macronutrient (fat, protein, or carb). Ingredients that cannot be labeled with one of these three are removed from the graph. The NetworkX modularity function is used to compute the modularity with respect to these three classes. We also randomly sample nodes and compute the modularity of the induced subgraph.
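A minimal sketch of this computation is given below. The function names, the labels dictionary, and the sampling parameters are assumptions for illustration and are not taken from assortativity.py.

```python
import random

import networkx as nx
from networkx.algorithms.community import modularity


def class_modularity(g, labels):
    """Modularity of g with respect to the classes given in `labels`.

    labels: dict mapping ingredient -> "fat", "protein", or "carb".
    Ingredients missing from `labels` are dropped first.
    Returns None when the remaining graph has no edges.
    """
    labeled = g.subgraph([n for n in g if n in labels])
    if labeled.number_of_edges() == 0:
        return None  # modularity is undefined without edges
    classes = {}
    for node in labeled:
        classes.setdefault(labels[node], set()).add(node)
    return modularity(labeled, list(classes.values()))


def sampled_modularity(g, labels, sample_size, seed=None):
    """Modularity of the subgraph induced by a random node sample."""
    rng = random.Random(seed)
    candidates = [n for n in g if n in labels]
    nodes = rng.sample(candidates, min(sample_size, len(candidates)))
    return class_modularity(g.subgraph(nodes), labels)
```

Comparing the full-graph value with values from repeated random samples gives a rough sense of how much the result depends on which ingredients are included.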

The-SS/recipe_analysis

Folders and files

Latest commit

History

Repository files navigation

Recipe Analysis

Data download

Combining the data

Processing the data

Building the graphs

Processing the graphs

Ingredients Graphs

Nutrients Graphs

Analyzing the graphs

ingredients_graphs_analysis.py

ingredients_graphs_analysis_result_plots_and_tabels.py

nutrients_graphs_analysis.py

assortativity.py

About

Resources

Stars

Watchers

Forks

Languages

`ingredients_graphs_analysis.py`

`ingredients_graphs_analysis_result_plots_and_tabels.py`

`nutrients_graphs_analysis.py`

`assortativity.py`