# What's in an Avocado Toast: A Supply Chain Analysis

![](avocado_wallpaper.jpeg)

You find yourself in London, crafting a delectable avocado toast, a dish that has risen dramatically in popularity on breakfast menus since the 2010s. This straightforward recipe requires just five ingredients: a ripe avocado, half a lemon, a generous pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. Most of these ingredients are now staples in grocery stores, and as you will find with this project, that is no small feat!

In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast using the Open Food Facts database. This database contains extensive, openly-sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows of data in each of the three CSV files do not contain relevant data for your investigation. In each dataset, you will need to filter out rows with irrelevant data, based on values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where `categories_tags` contains one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple category tags in the `categories_tags` column.
- There is a column in each CSV file called `origins_tags`, which contains strings for the country of origin of each item.

After completing this project, you'll be armed with a list of ingredients and their countries of origin and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

[Open Food Facts database](https://world.openfoodfacts.org/)

In [126]:
```python
import pandas as pd
```

In [127]:
```python
filename_avocado = 'data/avocado.csv'
avocado_data = pd.read_csv(filename_avocado, sep='\t')
row_count = len(avocado_data)
print(f"Initial row count: {row_count}")

# Keep only the columns needed; .copy() avoids a SettingWithCopyWarning when columns are modified later
avocado_data = avocado_data[['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']].copy()
avocado_data.head()
```

Initial row count: 1785


|  | code | lc | product_name_en | quantity | serving_size | packaging_tags | brands | brands_tags | categories_tags | labels_tags | countries | countries_tags | origins | origins_tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59749979702 | fr |  |  |  |  | Naturalia | naturalia | en:plant-based-foods-and-beverages,en:plant-ba... |  | Canada | en:canada |  |  |
| 1 | 7610095131409 | en |  |  |  |  | Zweifel | zweifel | en:snacks,en:salty-snacks,en:appetizers,en:chi... | en:vegetarian,en:vegan | Switzerland, World | en:switzerland,en:world |  |  |
| 2 | 4005514005578 | en | Gelbe Linse Avocado Brotaufstrich |  |  |  | Tartex | tartex | de:abendbrotsufstrich | en:organic,en:eu-organic,en:eg-oko-verordnung | Germany | en:germany |  |  |
| 3 | 879890002513 | en | Avocado toast chili lime |  |  |  |  |  |  |  | United States | en:united-states |  |  |
| 4 | 223086613685 | en | Avocado |  |  |  |  |  |  |  | United States | en:united-states |  |  |


In [128]:
```python
filename_categories = 'data/relevant_avocado_categories.txt'
with open(filename_categories, 'r') as file:
    # One tag per line; skip any blank lines
    relevant_avocado_categories = [line.strip() for line in file if line.strip()]

# Display the list of relevant category tags
print(relevant_avocado_categories)
```

['en:avocadoes', 'en:avocados', 'en:fresh-foods', 'en:fresh-vegetables', 'en:fruchte', 'en:fruits', 'en:raw-green-avocados', 'en:tropical-fruits', 'en:tropische-fruchte', 'en:vegetables-based-foods', 'fr:hass-avocados']


In [129]:
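```python
# Replace missing categories with an empty string first, then make sure every value is a string
# (calling astype(str) first would turn NaN into the literal string 'nan', leaving nothing for fillna to fill)
avocado_data['categories_tags'] = avocado_data['categories_tags'].fillna('').astype(str)
```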
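The order of `fillna` and `astype(str)` matters. The short sketch below uses a two-value toy Series (invented for illustration) to show what happens to a missing value under each order. The annotated "wrong" result is what pandas 2.x and earlier produce; newer pandas versions with a dedicated string dtype may keep the value missing instead.

```python
import numpy as np
import pandas as pd

# Toy column: one product with categories, one with none (NaN, as read_csv produces)
tags = pd.Series(["en:fruits,en:avocados", np.nan])

# astype(str) first: in pandas 2.x NaN becomes the string 'nan', so fillna has nothing left to fill
wrong = tags.astype(str).fillna("")
print(wrong.tolist())  # pandas 2.x: ['en:fruits,en:avocados', 'nan']

# fillna first: the missing value becomes '' before the string conversion
right = tags.fillna("").astype(str)
print(right.tolist())  # ['en:fruits,en:avocados', '']
```

With the first order, a later `split(',')` would yield the tag `'nan'` rather than an empty string; it never matches a relevant category, but it silently hides which rows were actually missing.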


In [130]:
```python
# Flag rows where at least one comma-separated tag is a relevant avocado category
avocado_data['exists_in_other'] = avocado_data['categories_tags'].apply(
    lambda x: any(item.strip() in relevant_avocado_categories for item in x.split(','))
)

filtered_data = avocado_data[avocado_data['exists_in_other']]

# Display the filtered DataFrame
print(f"\nNumber of rows in the relevant categories: {len(filtered_data)}")
filtered_data
```

Number of rows in the relevant categories: 182


|  | code | lc | product_name_en | quantity | serving_size | packaging_tags | brands | brands_tags | categories_tags | labels_tags | countries | countries_tags | origins | origins_tags | exists_in_other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 3662994002063 | fr |  | 3 fruits |  |  | la compagnie des fruits mûrs | la-compagnie-des-fruits-murs | en:plant-based-foods-and-beverages,en:plant-ba... |  | France | en:france |  |  | True |
| 6 | 8437013031011 | fr |  | 1 kg |  |  |  |  | en:plant-based-foods-and-beverages,en:plant-ba... |  | France | en:france |  |  | True |
| 14 | 4016249238155 | de |  | 135g | 100g | de:gläschen | Allos | allos | en:plant-based-foods-and-beverages,en:plant-ba... | en:organic,en:vegetarian,en:eu-organic,en:no-g... | Deutschland | en:germany | Europäische Union | en:european-union | True |
| 17 | 8718963381532 | de |  |  |  |  |  |  | en:plant-based-foods-and-beverages,en:plant-ba... |  | Deutschland | en:germany |  |  | True |
| 23 | 8436002746707 | es |  |  |  |  |  |  | en:plant-based-foods-and-beverages,en:plant-ba... |  | España | en:spain |  |  | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1751 | 3700353611218 | fr |  |  |  |  |  |  | en:plant-based-foods-and-beverages,en:plant-ba... |  | France | en:france |  |  | True |
| 1756 | 4311527575718 | de |  |  |  | de:aufkleber | Edeka | edeka | en:plant-based-foods-and-beverages,en:plant-ba... |  | Deutschland | en:germany | Peru | en:peru | True |
| 1757 | 4311527571499 | en |  |  |  | en:aufkleber | Edeka | edeka | en:plant-based-foods-and-beverages,en:plant-ba... |  | Germany | en:germany |  |  | True |
| 1769 | 3439496511399 | fr |  |  |  |  |  |  | en:plant-based-foods-and-beverages,en:plant-ba... |  | France | en:france |  |  | True |
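The `apply` call above runs a Python lambda on every row. As an alternative (a swapped-in technique, not the approach used in this notebook), the same filter can be expressed with pandas vectorized operations: split the tags, `explode` them into one row per tag, test membership with `isin`, and collapse back to one flag per product with `groupby(level=0).any()`. The toy rows and the short relevant-tag list below are invented for illustration.

```python
import pandas as pd

# Toy products shaped like the Open Food Facts export (values invented)
products = pd.DataFrame({
    "code": [1, 2, 3],
    "categories_tags": [
        "en:plant-based-foods-and-beverages,en:fruits,en:avocados",
        "en:snacks,en:salty-snacks",
        None,  # a product with no categories
    ],
})
relevant = ["en:avocados", "en:fruits", "en:tropical-fruits"]

# One row per (product, tag) pair; explode repeats the original index for each tag
tags = products["categories_tags"].fillna("").str.split(",").explode().str.strip()

# Collapse back to one boolean per product: True if any of its tags is relevant
mask = tags.isin(relevant).groupby(level=0).any()

print(products.loc[mask, "code"].tolist())  # [1]
```

Because `explode` keeps the original index, grouping on `level=0` lines each flag back up with its product, so the resulting mask can index the original DataFrame directly.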


In [131]:
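```python
# Top avocado origin among products listed for the United Kingdom
uk_avocadoes = filtered_data[filtered_data['countries'] == 'United Kingdom']
# removeprefix drops the literal "en:" prefix (lstrip("en:") would strip any leading 'e', 'n' or ':' characters)
top_avocado_origin = uk_avocadoes['origins_tags'].value_counts().index[0].removeprefix("en:")
print(top_avocado_origin)
```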


peru
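Stripping the `en:` prefix with `str.lstrip("en:")` happens to work for `en:peru`, but `lstrip` treats its argument as a set of characters rather than a prefix. It keeps removing any leading `e`, `n` or `:`, which corrupts other country tags. The sample tags below are illustrative; `str.removeprefix` requires Python 3.9 or later.

```python
# lstrip removes every leading character found in the set {'e', 'n', ':'};
# removeprefix removes the exact string "en:" once, if present
for tag in ["en:peru", "en:ecuador", "en:netherlands"]:
    print(tag.lstrip("en:"), "|", tag.removeprefix("en:"))
# peru | peru
# cuador | ecuador
# therlands | netherlands
```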


Next, wrap these steps in a parametrized function so the same analysis can be repeated for any ingredient.

In [132]:
```python
def get_top_ingredient_origin(ingredient_filename, relevant_categories_filename):
    """Return the most common origin of UK-listed products in the ingredient's relevant categories."""

    # Get ingredient data (the files are tab-separated) and keep the columns of interest
    ingredient_data = pd.read_csv(ingredient_filename, sep='\t')
    relevant_ingredient_data = ingredient_data[['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']]

    # Get relevant categories, one tag per line
    with open(relevant_categories_filename, 'r') as file:
        relevant_ingredient_categories = {line.strip() for line in file if line.strip()}

    # Drop rows without categories, then split the comma-separated tags into lists
    # (dropna must come before astype(str), which would turn NaN into the string 'nan')
    relevant_ingredient_data = (
        relevant_ingredient_data
        .dropna(subset=['categories_tags'])
        .assign(categories_tags=lambda d: d['categories_tags'].astype(str).str.split(','))
    )

    # Keep only rows with at least one relevant category tag
    relevant_ingredient_data = relevant_ingredient_data[
        relevant_ingredient_data['categories_tags'].apply(
            lambda tags: any(tag.strip() in relevant_ingredient_categories for tag in tags)
        )
    ]

    # Top origin among products listed for the United Kingdom
    filtered_data_uk = relevant_ingredient_data[relevant_ingredient_data['countries'] == 'United Kingdom']
    origin_counts = filtered_data_uk['origins_tags'].value_counts()
    if origin_counts.empty:
        raise ValueError(f"No UK products with a known origin in {ingredient_filename}")
    # removeprefix drops the literal "en:" prefix; lstrip("en:") would strip any leading 'e', 'n' or ':'
    return origin_counts.index[0].removeprefix("en:").replace('-', ' ')
```
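The function selects UK products with an exact match on the free-text `countries` column. The outputs above show that this column varies by language ("Deutschland" vs "Germany") and can list several countries ("Switzerland, World"), while `countries_tags` holds normalised `en:` tags. The sketch below, on invented toy rows, compares the exact match with a tag-based match. It is an alternative, not the method used in this notebook, and switching to it could change the counts behind the results reported here.

```python
import pandas as pd

# Toy rows (invented) mirroring the shapes seen in the export
products = pd.DataFrame({
    "countries": ["United Kingdom", "France, United Kingdom", "Switzerland, World"],
    "countries_tags": ["en:united-kingdom", "en:france,en:united-kingdom", "en:switzerland,en:world"],
})

# Exact match on the free-text column misses products listed for several countries
exact = products["countries"] == "United Kingdom"

# Matching a whole tag in the normalised list catches every product listed for the UK
tagged = products["countries_tags"].fillna("").str.split(",").apply(
    lambda tags: "en:united-kingdom" in tags
)

print(exact.tolist())   # [True, False, False]
print(tagged.tolist())  # [True, True, False]
```

Splitting into whole tags, rather than using a substring search, avoids accidental matches against longer tags that happen to contain the same text.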

In [133]:
```python
# INGREDIENT: OLIVE OIL
top_olive_oil_origin = get_top_ingredient_origin('data/olive_oil.csv', 'data/relevant_olive_oil_categories.txt')
print(f"The most common country of origin for olive oil is: {top_olive_oil_origin}")

# INGREDIENT: SOURDOUGH
top_sourdough_origin = get_top_ingredient_origin('data/sourdough.csv', 'data/relevant_sourdough_categories.txt')
print(f"The most common country of origin for sourdough is: {top_sourdough_origin}")
```

The most common country of origin for olive oil is: greece
The most common country of origin for sourdough is: united kingdom
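`value_counts()` on `origins_tags` treats each full string as one category. If a product lists several origins as comma-separated tags (an assumption here, by analogy with `countries_tags` values such as `en:switzerland,en:world`), a combined string like `en:peru,en:chile` is counted separately from `en:peru`. The sketch below, on invented values, credits each listed country individually by splitting and exploding the tags before counting.

```python
import pandas as pd

# Toy origins_tags values (invented); one product lists two origins, one has none
origins = pd.Series(["en:peru", "en:peru,en:chile", "en:chile", "en:peru", None])

# Counting whole strings: the combined 'en:peru,en:chile' is its own entry
print(origins.value_counts().to_dict())

# Counting per country: split, explode to one tag per row, drop the "en:" prefix, then count
per_country = (
    origins.dropna()
    .str.split(",")
    .explode()
    .str.removeprefix("en:")
    .value_counts()
)
print(per_country.to_dict())  # {'peru': 3, 'chile': 2}
```

Whether a multi-origin product should count towards each country, or be left as a distinct combination, is an analysis choice; the per-country version is one reasonable option for tracing where ingredients come from.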
