# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil.

It's no small feat that most of these ingredients are readily available in grocery stores. In this project, you'll conduct a supply chain analysis of the ingredients used in an avocado toast, using the [Open Food Facts database](https://world.openfoodfacts.org/). This open database contains extensive information on food products, including their origins. Through this analysis, you'll see how many countries are involved in supplying the ingredients for a single dish. The data is provided as `.csv` files in the `data/` folder; despite the extension, the files are tab-separated, so they are read with `sep='\t'`.

After completing this project, you'll have a list of the ingredients and, for products sold in the UK, their most common countries of origin. That list is a starting point for further analyses, such as estimating how long, on average, these ingredients spend at sea.

![](avocado_wallpaper.jpeg)

In [69]:
import pandas as pd

In [70]:
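def read_and_filter_data(file: str, sep: str, columns: list,
                         categories_list: list,
                         categories_col: str = 'categories_tags',
                         country_col: str = 'countries',
                         country: str = 'United Kingdom',
                         origin_col: str = 'origins_tags',
                         ingredient: str = 'avocadoes'):
    """Load one ingredient file, keep rows in the relevant categories, and
    report the most common origin among products sold in `country`.

    Returns (filtered_df, origin, origin_count); origin and origin_count are
    empty strings when no origin data is available.
    """
    full_df = pd.read_csv(file, sep=sep, usecols=columns)
    # Drop rows without category tags, then cast everything to str so that
    # missing values become the literal string 'nan'.
    full_df = full_df[full_df[categories_col].notna()].astype(str)
    # Collect the rows matching each category (literal substring match), then
    # concatenate once and drop rows matched by more than one category.
    matches = [full_df[full_df[categories_col].str.contains(category, regex=False)]
               for category in categories_list]
    filtered_df = pd.concat(matches, axis=0).drop_duplicates()
    # Exact match: products that list several countries of sale are excluded.
    country_df = filtered_df[filtered_df[country_col] == country]
    try:
        origin_df = country_df[country_df[origin_col] != 'nan']
        origin_counts = origin_df[origin_col].value_counts()
        # IndexError is raised if there are no origins or the tag has no 'xx:' prefix.
        origin = origin_counts.index[0].split(':')[1].title().replace('-', ' ')
        # Use .iloc for positional access; [0] would be treated as a label lookup.
        origin_count = origin_counts.iloc[0]
        if origin_count == 1:
            print(f'''There is no top country {ingredient} in the {country} come from.\nAll {ingredient} products come from different countries.''')
        else:
            print(f'''The top country {ingredient} in the {country} come from is {origin}, supplying {origin_count} products''')
    except IndexError:
        origin, origin_count = '', ''
        print(f'There is no data about the origin of {ingredient} in the {country}')
    return filtered_df, origin, origin_count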
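
To make the steps inside `read_and_filter_data` easier to follow, here is a minimal sketch of the same pipeline on a toy table. The rows are invented for illustration and are not taken from the Open Food Facts files; only the column names match the ones used in this notebook.

```python
import pandas as pd

# Toy rows standing in for Open Food Facts data; every value is invented.
toy = pd.DataFrame({
    "categories_tags": ["en:avocados,en:fruits", "en:avocados",
                        "en:avocados", "en:lemons"],
    "countries": ["United Kingdom", "United Kingdom",
                  "United Kingdom", "France"],
    "origins_tags": ["en:peru", "en:peru", "en:south-africa", "en:spain"],
})

# Step 1: keep rows whose category string mentions a relevant tag.
in_category = toy[toy["categories_tags"].str.contains("en:avocados", regex=False)]

# Step 2: keep rows sold in exactly the target country.
in_country = in_category[in_category["countries"] == "United Kingdom"]

# Step 3: count origin tags; value_counts sorts by frequency, descending.
counts = in_country["origins_tags"].value_counts()
top_tag = counts.index[0]
top_count = int(counts.iloc[0])

# Step 4: turn the tag ('en:peru') into a display name ('Peru').
top_origin = top_tag.split(":")[1].title().replace("-", " ")
print(top_origin, top_count)  # prints: Peru 2
```

When the highest count is 1, every product has a different origin, which is why the function reports that there is no top country in that case.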

In [71]:
data_cols = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']
avocado_tag_list = ['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods','fr:hass-avocados']

avocado, avocado_origin, avocado_origin_count = read_and_filter_data(
    file='data/avocado.csv', sep='\t', columns=data_cols,
    categories_list=avocado_tag_list)

The top country avocadoes in the United Kingdom come from is Peru, supplying 2 products
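
The category filter matches each tag as a substring of the `categories_tags` string, so a broad tag such as `en:fruits` also matches longer tags that start the same way. The sketch below contrasts substring matching with exact tag matching. It assumes the tags are comma-separated within the field; the two category strings and the helper names `substring_match` and `exact_match` are hypothetical, made up for illustration.

```python
# Hypothetical category strings, assuming a comma-separated tag format.
rows = [
    "en:plant-based-foods,en:fruits,en:avocados",
    "en:plant-based-foods,en:fruits-based-foods,en:jams",
]
wanted = {"en:fruits", "en:avocados"}


def substring_match(tags: str) -> bool:
    """True if any wanted tag appears anywhere inside the string."""
    return any(tag in tags for tag in wanted)


def exact_match(tags: str) -> bool:
    """True if any wanted tag equals one of the comma-separated tags."""
    return not wanted.isdisjoint(tags.split(","))


print([substring_match(r) for r in rows])  # prints: [True, True]
print([exact_match(r) for r in rows])      # prints: [True, False]
```

Which behaviour is preferable depends on the analysis: substring matching casts a wider net, while exact matching keeps only products carrying the listed tags.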


In [72]:
lemon_tag_list = ['en:aromatic-plants', 'en:citron', 'en:citrus', 'en:fresh-fruits', 'en:fresh-lemons', 'en:fruits', 'en:lemons', 'en:unwaxed-lemons']

lemon, lemon_origin, lemon_origin_count = read_and_filter_data(
    file='data/lemon.csv', sep='\t', columns=data_cols,
    categories_list=lemon_tag_list, ingredient='lemons')

There is no top country lemons in the United Kingdom come from.
All lemons products come from different countries.


In [73]:
with open("data/relevant_olive_oil_categories.txt", "r") as file:
    # The with block closes the file automatically; no explicit close() is needed.
    olive_oil_categories = file.read().splitlines()

olive_oil, olive_oil_origin, olive_oil_origin_count = read_and_filter_data(
    file='data/olive_oil.csv', sep='\t', columns=data_cols,
    categories_list=olive_oil_categories, ingredient='olive oils')

The top country olive oils in the United Kingdom come from is Greece, supplying 6 products


In [74]:
with open("data/relevant_sourdough_categories.txt", "r") as file:
    # The with block closes the file automatically; no explicit close() is needed.
    sourdough_categories = file.read().splitlines()

sourdough, sourdough_origin, sourdough_origin_count = read_and_filter_data(
    file='data/sourdough.csv', sep='\t', columns=data_cols,
    categories_list=sourdough_categories, ingredient='sourdoughs')

The top country sourdoughs in the United Kingdom come from is United Kingdom, supplying 3 products


In [75]:
salt_tag_list = ['en:edible-common-salt', 'en:salts', 'en:sea-salts']

saltflakes, saltflakes_origin, saltflakes_origin_count = read_and_filter_data(
    file='data/salt_flakes.csv', sep='\t', columns=data_cols,
    categories_list=salt_tag_list, ingredient='salt flakes')

There is no data about the origin of salt flakes in the United Kingdom
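
One way to collect the findings into the list of ingredients and origins described in the introduction is a small summary table. The sketch below transcribes the values from the printed results above rather than reading the returned variables, so that it runs on its own; the column names `ingredient`, `top_origin`, and `products` are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Values transcribed from the printed results in this notebook.
# None marks "no single top origin" (lemons) or "no origin data" (salt flakes).
results = [
    {"ingredient": "avocado", "top_origin": "Peru", "products": 2},
    {"ingredient": "lemon", "top_origin": None, "products": None},
    {"ingredient": "olive oil", "top_origin": "Greece", "products": 6},
    {"ingredient": "sourdough", "top_origin": "United Kingdom", "products": 3},
    {"ingredient": "salt flakes", "top_origin": None, "products": None},
]

summary = pd.DataFrame(results)
# Use the nullable integer dtype so counts stay integers next to missing values.
summary["products"] = summary["products"].astype("Int64")
print(summary.to_string(index=False))
```

In the notebook itself, the same table could be built from the `*_origin` and `*_origin_count` variables returned by `read_and_filter_data`, keeping in mind that the function returns empty strings when no origin data is found.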
