# What's in an Avocado Toast: A Supply Chain Analysis

![](avocado_wallpaper.jpeg)

You find yourself in London, crafting a delectable avocado toast, a dish that has risen dramatically in popularity on breakfast menus since the 2010s. This straightforward recipe requires just five ingredients: a ripe avocado, half a lemon, a generous pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. Most of these ingredients are now staples in grocery stores, and as you will find with this project, that is no small feat!

In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast using the Open Food Facts database. This database contains extensive, openly-sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows of data in each of the three CSV files do not contain relevant data for your investigation. In each dataset, you will need to filter out rows with irrelevant data, based on values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where `categories_tags` contains one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple category tags in the `categories_tags` column.
- Each CSV file has a column called `origins_tags`, which contains strings for the country of origin of each item.
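The category filter described in the list above can be sketched as follows. This is a minimal illustration, not the project's reference solution: the sample rows, the tag values in `relevant_tags`, and the helper name `has_relevant_tag` are all hypothetical, and in the real project the relevant tags would be read from the TXT file for each ingredient.

```python
import pandas as pd

# Hypothetical sample rows standing in for one ingredient's CSV file.
sample = pd.DataFrame({
    "product_name_en": ["Avocado", "Avocado Oil", "Guacamole chips"],
    "categories_tags": [
        "en:plant-based-foods,en:fruits,en:avocados",
        "en:fats,en:vegetable-oils,en:avocado-oils",
        None,  # some rows have no category tags at all
    ],
    "origins_tags": ["en:peru", "en:mexico", "en:spain"],
})

# Hypothetical relevant tags; the real ones come from the TXT file.
relevant_tags = {"en:avocados", "en:fresh-avocados"}


def has_relevant_tag(tags, relevant):
    """Return True if any comma-separated tag in `tags` is in `relevant`."""
    if not isinstance(tags, str):
        return False  # missing values (None/NaN) never match
    return any(tag.strip() in relevant for tag in tags.split(","))


# Build a boolean mask and keep only the matching rows.
mask = sample["categories_tags"].apply(lambda tags: has_relevant_tag(tags, relevant_tags))
filtered = sample[mask]
print(filtered["product_name_en"].tolist())
```

Matching whole tags after splitting on commas, rather than using a substring search, avoids accidental hits such as `en:avocado-oils` matching a search for `avocado`.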

After completing this project, you'll be armed with a list of ingredients and their countries of origin and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

[Open Food Facts database](https://world.openfoodfacts.org/)

In [2]:
import pandas as pd
import os


In [3]:
# List the files provided in the data folder
os.listdir("data")

['relevant_avocado_categories.txt',
 'olive_oil.csv',
 'sourdough.csv',
 'lemon.csv',
 'relevant_olive_oil_categories.txt',
 'salt_flakes.csv',
 'relevant_sourdough_categories.txt',
 'avocado.csv']

In [4]:
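# The files use a .csv extension but are tab-separated, hence sep="\t"
df1 = pd.read_csv("data/avocado.csv", sep="\t")    # avocado
df2 = pd.read_csv("data/olive_oil.csv", sep="\t")  # olive oil
df3 = pd.read_csv("data/sourdough.csv", sep="\t")  # sourdough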



In [5]:
df1.head()

Unnamed: 0,code,lc,product_name_de,product_name_el,product_name_en,product_name_es,product_name_fi,product_name_fr,product_name_id,product_name_it,...,off:ecoscore_data.adjustments.packaging.non_recyclable_and_non_biodegradable_materials,off:ecoscore_data.adjustments.production_system.value,off:ecoscore_data.adjustments.threatened_species.value,sources_fields:org-database-usda:available_date,sources_fields:org-database-usda:fdc_category,sources_fields:org-database-usda:fdc_data_source,sources_fields:org-database-usda:fdc_id,sources_fields:org-database-usda:modified_date,sources_fields:org-database-usda:publication_date,data_sources
0,59749979702,fr,,,,,,Naturalia Avocado Oil,,,...,1.0,0.0,,,,,,,,"App - yuka, Apps"
1,7610095131409,en,,,,,,Avocado Bowl chips,,,...,1.0,0.0,,,,,,,,"App - Yuka, Apps, Producers, Producer - zweifel"
2,4005514005578,en,,,Gelbe Linse Avocado Brotaufstrich,,,,,,...,1.0,15.0,,,,,,,,"App - yuka, Apps, App - smoothie-openfoodfacts"
3,879890002513,en,,,Avocado toast chili lime,,,,,,...,1.0,0.0,,,,,,,,"App - Yuka, Apps, App - InFood"
4,223086613685,en,,,Avocado,,,,,,...,1.0,0.0,,,,,,,,"App - Yuka, Apps"


In [11]:
print(f"df1 {df1.isnull().sum()}")
print(f"df2 {df2.isnull().sum()}")
print(f"df3 {df3.isnull().sum()}")


df1 code                                                    0
lc                                                      0
product_name_de                                      1684
product_name_el                                      1784
product_name_en                                       512
                                                     ... 
sources_fields:org-database-usda:fdc_data_source     1469
sources_fields:org-database-usda:fdc_id              1469
sources_fields:org-database-usda:modified_date       1469
sources_fields:org-database-usda:publication_date    1469
data_sources                                          105
Length: 184, dtype: int64
df2 code                                                      0
producer_product_id                                    8280
producer_version_id                                    8296
lc                                                        0
product_name_ar                                        8290
                            

In [76]:
df1 = df1.dropna(subset=['origins_tags'])
print(df1['origins_tags'])


14                    en:european-union
65                             en:spain
102                            en:chile
117                           en:to-317
146                           en:turkey
256                           en:mexico
283                           en:mexico
288                             en:peru
361                             en:peru
410                           en:mexico
490     en:dairy-from-the-united-states
508                             en:peru
510                    en:spain,en:peru
586                       en:etats-unis
619                           en:mexico
626                         en:zimbabwe
702                    en:united-states
708                    en:chile,en:peru
737                             en:peru
750                             en:peru
759                        en:australia
812                             en:peru
828                            en:spain
829                         en:colombia
838                             en:peru
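Some values in the output above hold more than one origin, such as `en:spain,en:peru`. One possible way to count each origin tag individually is sketched below. It is an alternative to counting whole `origins_tags` strings, not the approach used in this notebook, and the short series reuses only a handful of values from the output above for illustration.

```python
import pandas as pd

# A few origins_tags values taken from the output above, for illustration only.
origins = pd.Series(
    ["en:spain", "en:peru", "en:spain,en:peru", "en:chile,en:peru", "en:mexico"],
    name="origins_tags",
)

# Split each string on commas, give every tag its own row, then count the tags.
tag_counts = origins.str.split(",").explode().value_counts()
print(tag_counts)

# The most frequent individual tag
print(tag_counts.idxmax())
```

Counting this way credits a product listing two origins to both countries, which can change the ranking compared with treating each combined string as a single value.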


In [77]:
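def top_origin(df, relevant_cols, categories_path):
    """Return the most common origin of UK products in the relevant categories."""
    df = df[relevant_cols]

    # Keep only products sold in the United Kingdom
    df = df.dropna(subset=['countries_tags'])
    df = df[df['countries'].str.contains("United Kingdom", na=False)]

    # Load the relevant category tags, one per line, skipping blank lines
    with open(categories_path, "r") as f:
        cat_tags = {line.strip() for line in f if line.strip()}

    # Keep rows whose comma-separated category tags include a relevant tag
    df = df.dropna(subset=["categories_tags"])
    df = df[df["categories_tags"].apply(
        lambda tags: any(tag in cat_tags for tag in tags.split(','))
    )]

    # Find the most frequent origins_tags value and return its readable name
    df = df.dropna(subset=["origins", "origins_tags"])
    top_tag = df['origins_tags'].mode()[0]
    top_rows = df[df['origins_tags'] == top_tag]
    return top_rows['origins'].iloc[0]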
    
    
relevant_cols = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size',
                 'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
                 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']

top_avocado_origin = top_origin(df1, relevant_cols, 'data/relevant_avocado_categories.txt')

top_olive_oil_origin = top_origin(df2, relevant_cols, 'data/relevant_olive_oil_categories.txt')

top_sourdough_origin = top_origin(df3, relevant_cols, 'data/relevant_sourdough_categories.txt')

print(f"The top avocado origin is {top_avocado_origin}\n"
      f"The top olive oil origin is {top_olive_oil_origin}\n"
      f"The top sourdough origin is {top_sourdough_origin}")
    
    

The top avocado origin is Peru
The top olive oil origin is Greece
The top sourdough origin is United Kingdom
