# What's in an Avocado Toast: A Supply Chain Analysis

![](avocado_wallpaper.jpeg)

You find yourself in London, crafting a delectable avocado toast, a dish that has risen dramatically in popularity on breakfast menus since the 2010s. This straightforward recipe requires just five ingredients: a ripe avocado, half a lemon, a generous pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. Most of these ingredients are now staples in grocery stores, and as you will find with this project, that is no small feat!

In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows of data in each of the three CSV files do not contain relevant data for your investigation. In each dataset, you will need to filter out rows with irrelevant data, based on values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where `categories_tags` contains one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple category tags in the `categories_tags` column.
- There is a column in each CSV file called `origins_tags`, which contains strings for the country of origin of each item.
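
The category filter described above can be sketched on a toy DataFrame. This is a minimal illustration, not the project's solution: the rows, the `relevant` set, and the helper name `has_relevant_tag` are all made up for the example, and it assumes `categories_tags` holds comma-separated tags.

```python
import pandas as pd

# Toy data standing in for one ingredient's CSV (values are illustrative only)
df = pd.DataFrame({
    "code": ["001", "002", "003", "004"],
    "categories_tags": [
        "en:fruits,en:avocados",
        "en:dips,en:guacamole",
        None,
        "en:tropical-fruits",
    ],
})

# Hypothetical set of relevant tags, as it might be read from the TXT file
relevant = {"en:avocados", "en:tropical-fruits"}

def has_relevant_tag(tags, relevant_tags):
    """Return True if a comma-separated tag string shares any tag with relevant_tags."""
    if not isinstance(tags, str):  # missing values (NaN/None) never match
        return False
    return any(tag in relevant_tags for tag in tags.split(","))

# Keep only rows with at least one relevant category tag
filtered = df[df["categories_tags"].apply(lambda t: has_relevant_tag(t, relevant))]
print(filtered["code"].tolist())
```

Using a set for the relevant tags keeps each membership check cheap, which matters little here but is a reasonable habit for larger tag lists.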

After completing this project, you'll be armed with a list of ingredients and their countries of origin and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

[Open Food Facts database](https://world.openfoodfacts.org/)

Apply your data manipulation and analysis skills to investigate the supply chain of ingredients for making avocado toast in the U.K. Your task is to determine the following information:

The name of the most common country of origin for each of the three key ingredients: avocados, olive oil, and sourdough.
Store the most common country of origin for each ingredient in the following variables: `top_avocado_origin`, `top_olive_oil_origin`, and `top_sourdough_origin`. Ensure that the country names contain only letters (A-Z) and spaces, with no hyphens or other characters.
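
One way to meet the "letters and spaces only" requirement is sketched below. The helper name `clean_origin_tag` is hypothetical, and the sketch assumes origin tags look like `en:united-kingdom`, that is, a two-letter language prefix followed by hyphen-separated words. Note that `str.lstrip("en:")` is not a safe substitute for removing the prefix: it strips any leading run of the characters `e`, `n`, and `:`, so `"en:netherlands".lstrip("en:")` yields `"therlands"`.

```python
import re

def clean_origin_tag(tag):
    """Turn a tag such as 'en:united-kingdom' into 'united kingdom'."""
    # Drop a leading two-letter language prefix like 'en:' or 'fr:' if present
    name = re.sub(r"^[a-z]{2}:", "", tag)
    # Hyphens separate words in tags; replace them with spaces
    name = name.replace("-", " ")
    return name.strip()

print(clean_origin_tag("en:united-kingdom"))
print(clean_origin_tag("en:netherlands"))
```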

To focus your analysis, subset each of the DataFrames to include only these relevant columns: `code`, `lc`, `product_name_en`, `quantity`, `serving_size`, `packaging_tags`, `brands`, `brands_tags`, `categories_tags`, `labels_tags`, `countries`, `countries_tags`, `origins`, `origins_tags`.

After completing this project, feel free to explore other questions using the food data!

In [214]:
import pandas as pd

In [215]:
# The data files are tab-separated despite the .csv extension
avocado = pd.read_csv('data/avocado.csv', sep='\t')
lemon = pd.read_csv('data/lemon.csv', sep='\t')
olive_oil = pd.read_csv('data/olive_oil.csv', sep='\t')
salt_flakes = pd.read_csv('data/salt_flakes.csv', sep='\t')
sourdough = pd.read_csv('data/sourdough.csv', sep='\t')



In [216]:
for col in avocado.columns:
    print(col)


code
lc
product_name_de
product_name_el
product_name_en
product_name_es
product_name_fi
product_name_fr
product_name_id
product_name_it
product_name_lt
product_name_lv
product_name_nb
product_name_nl
product_name_pl
product_name_ro
product_name_sl
product_name_sv
generic_name_de
generic_name_en
generic_name_es
generic_name_fr
generic_name_sv
quantity
serving_size
packaging
packaging_tags
brands
brands_tags
brand_owner
categories
categories_tags
labels
labels_tags
countries
countries_tags
stores
stores_tags
obsolete
obsolete_since_date
origins
origins_tags
origin_en
origin_fr
manufacturing_places
manufacturing_places_tags
emb_codes
emb_codes_tags
ingredients_text_de
ingredients_text_en
ingredients_text_es
ingredients_text_fi
ingredients_text_fr
ingredients_text_id
ingredients_text_it
ingredients_text_lt
ingredients_text_lv
ingredients_text_nb
ingredients_text_nl
ingredients_text_pl
ingredients_text_ro
ingredients_text_sv
allergens
allergens_tags
traces
traces_tags
no_nutrition_data
nutr

In [217]:


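# Columns relevant to the origin analysis
avo_subset_cat = [
    'code', 'lc', 'product_name_en', 'quantity', 'serving_size',
    'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
    'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags',
]

# Copy the subset so that adding columns later does not raise SettingWithCopyWarning
avocado = avocado[avo_subset_cat].copy()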

print(avocado)

               code  lc                                    product_name_en  \
0     0059749979702  fr                                                NaN   
1     7610095131409  en                                                NaN   
2     4005514005578  en                  Gelbe Linse Avocado Brotaufstrich   
3     0879890002513  en                           Avocado toast chili lime   
4     0223086613685  en                                            Avocado   
...             ...  ..                                                ...   
1780  0819573012712  en         Organic Baby Food, Apples, Kale & Avocados   
1781  0052200072097  en                     Just Pineapple, Pear & Avocado   
1782  0793613300000  en                                Spinach Avocado Dip   
1783       05252428  en  Organic Just Apple, Raspberry & Avocado, Apple...   
1784  0052200072141  en       Baby Food Puree, Just Mango, Apple & Avocado   

     quantity     serving_size packaging_tags  \
0         NaN 

In [218]:
with open("data/relevant_avocado_categories.txt", "r") as file:
    relevant_avocado_categories = file.read().splitlines()


print(relevant_avocado_categories)

['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods', 'fr:hass-avocados']


In [219]:
avocado['categories_list'] = avocado['categories_tags'].str.split(',')

print(avocado)


               code  lc                                    product_name_en  \
0     0059749979702  fr                                                NaN   
1     7610095131409  en                                                NaN   
2     4005514005578  en                  Gelbe Linse Avocado Brotaufstrich   
3     0879890002513  en                           Avocado toast chili lime   
4     0223086613685  en                                            Avocado   
...             ...  ..                                                ...   
1780  0819573012712  en         Organic Baby Food, Apples, Kale & Avocados   
1781  0052200072097  en                     Just Pineapple, Pear & Avocado   
1782  0793613300000  en                                Spinach Avocado Dip   
1783       05252428  en  Organic Just Apple, Raspberry & Avocado, Apple...   
1784  0052200072141  en       Baby Food Puree, Just Mango, Apple & Avocado   

     quantity     serving_size packaging_tags  \
0         NaN 

In [220]:
# Drop rows with no category tags; a list for `subset` works across pandas versions
avocado = avocado.dropna(subset=['categories_list'])

print(avocado)

               code  lc                    product_name_en   quantity  \
0     0059749979702  fr                                NaN        NaN   
1     7610095131409  en                                NaN        NaN   
2     4005514005578  en  Gelbe Linse Avocado Brotaufstrich        NaN   
5     3662994002063  fr                                NaN   3 fruits   
6     8437013031011  fr                                NaN       1 kg   
...             ...  ..                                ...        ...   
1754       00193696  en                        Egg avocado        NaN   
1756  4311527575718  de                                NaN        NaN   
1757  4311527571499  en                                NaN        NaN   
1769  3439496511399  fr                                NaN        NaN   
1771  5010251741985  en                Extra large avocado  1 avocado   

     serving_size packaging_tags                        brands  \
0             NaN            NaN                     Natu

In [221]:
# Keep rows that have at least one relevant category tag
avocado = avocado[avocado['categories_list'].apply(
    lambda tags: any(tag in relevant_avocado_categories for tag in tags)
)]

print(avocado)

               code  lc      product_name_en   quantity serving_size  \
5     3662994002063  fr                  NaN   3 fruits          NaN   
6     8437013031011  fr                  NaN       1 kg          NaN   
14    4016249238155  de                  NaN       135g         100g   
17    8718963381532  de                  NaN        NaN          NaN   
23    8436002746707  es                  NaN        NaN          NaN   
...             ...  ..                  ...        ...          ...   
1751  3700353611218  fr                  NaN        NaN          NaN   
1756  4311527575718  de                  NaN        NaN          NaN   
1757  4311527571499  en                  NaN        NaN          NaN   
1769  3439496511399  fr                  NaN        NaN          NaN   
1771  5010251741985  en  Extra large avocado  1 avocado          NaN   

     packaging_tags                        brands  \
5               NaN  la compagnie des fruits mûrs   
6               NaN          

In [222]:
avocados_uk = avocado[(avocado['countries']=='United Kingdom')]

print(avocados_uk)

               code  lc               product_name_en   quantity serving_size  \
361        00985833  en                       Avacado      650 g          NaN   
381        00040464  en                       Avocado        NaN          NaN   
414   4088600100173  en                       Avocado      100 g          NaN   
468        01307351  en              Avacados organic        NaN          NaN   
508   5057172125395  en      Just Essentials Avocados      4pack          NaN   
510        23066755  en         Ready to Eat Avocados          2          NaN   
708        03201985  en                       Avocado          2          80g   
781        10096369  en                       Avocado        NaN        100 g   
850        00184915  en        Rich & creamy avocados        NaN          NaN   
1190       01600322  en  Ripe & ready medium avocados          2          NaN   
1301  5000128606387  en                      Avocados        NaN          NaN   
1413  2322725400995  en     

In [223]:
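# Most frequent origin tag among the UK rows
avocado_origin = avocados_uk['origins_tags'].value_counts().index[0]
# Remove the 'en:' prefix; lstrip("en:") would strip any leading 'e', 'n', or ':'
# characters rather than the prefix itself (removeprefix needs Python 3.9+)
avocado_origin = avocado_origin.removeprefix("en:")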

print(avocado_origin)

peru


In [224]:
def read_and_filter_data(filename, relevant_categories):
    """Return the most common origin among rows whose `countries` is 'United Kingdom'."""
    df = pd.read_csv('data/' + filename, sep='\t')

    # Subset large DataFrame to include only relevant columns
    subset_columns = [
        'code', 'lc', 'product_name_en', 'quantity', 'serving_size',
        'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
        'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags',
    ]
    df = df[subset_columns].copy()

    # Split tags into lists
    df['categories_list'] = df['categories_tags'].str.split(',')

    # Drop rows with null categories data
    df = df.dropna(subset=['categories_list'])

    # Keep rows that have at least one relevant category tag
    df = df[df['categories_list'].apply(
        lambda tags: any(tag in relevant_categories for tag in tags)
    )]

    # Filter data for the UK
    df_uk = df[df['countries'] == 'United Kingdom']

    # Find the origin tag with the highest count
    top_origin_string = df_uk['origins_tags'].value_counts().index[0]

    # Remove the 'en:' prefix (lstrip would strip characters, not a prefix)
    # and replace hyphens with spaces; removeprefix needs Python 3.9+
    top_origin_country = top_origin_string.removeprefix('en:')
    top_origin_country = top_origin_country.replace('-', ' ')

    print(f'**{filename[:-4]} origins**', '\n', top_origin_country, '\n')

    print("Top origin country: ", top_origin_country)
    print("\n")

    # End of function - return top origin country for this ingredient
    return top_origin_country
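
The function above takes `value_counts().index[0]`, which raises an `IndexError` if no row in the UK subset has an origin tag. A defensive variant might look like the sketch below; the helper name `most_common_or_none` is hypothetical and not part of the project, and it only illustrates the idea on small made-up Series.

```python
import pandas as pd

def most_common_or_none(series):
    """Return the most frequent non-null value in a Series, or None if there is none."""
    counts = series.value_counts()  # missing values are excluded by default
    if counts.empty:
        return None
    return counts.index[0]

# Made-up examples: one Series with a clear winner, one with no usable values
print(most_common_or_none(pd.Series(["en:peru", "en:spain", "en:peru", None])))
print(most_common_or_none(pd.Series([None, None], dtype="object")))
```

Returning `None` lets the caller decide how to report an ingredient with no origin data instead of stopping the whole analysis.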

In [225]:
# Analyze avocado origins again, this time by calling function
top_avocado_origin = read_and_filter_data('avocado.csv',relevant_avocado_categories)

**avocado origins** 
 peru 

Top origin country:  peru




In [226]:
### Repeat process above with new function for the other 2 ingredients

# Gather relevant categories data for olive oil
# The with statement closes the file automatically, so no explicit close() is needed
with open("data/relevant_olive_oil_categories.txt", "r") as file:
    relevant_olive_oil_categories = file.read().splitlines()


In [228]:
# Call user-defined function on olive_oil.csv
top_olive_oil_origin = read_and_filter_data('olive_oil.csv',relevant_olive_oil_categories)

print(top_olive_oil_origin)

**olive_oil origins** 
 greece 

Top origin country:  greece


greece




In [229]:
# Gather relevant categories data for sourdough
# The with statement closes the file automatically, so no explicit close() is needed
with open("data/relevant_sourdough_categories.txt", "r") as file:
    relevant_sourdough_categories = file.read().splitlines()

# Call user-defined function on sourdough.csv
top_sourdough_origin = read_and_filter_data('sourdough.csv',relevant_sourdough_categories)

**sourdough origins** 
 united kingdom 

Top origin country:  united kingdom


