# What's in an Avocado Toast: A Supply Chain Analysis

![](avocado_wallpaper.jpeg)

You find yourself in London, crafting a delectable avocado toast, a dish that has risen dramatically in popularity on breakfast menus since the 2010s. This straightforward recipe requires just five ingredients: a ripe avocado, half a lemon, a generous pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. Most of these ingredients are now staples in grocery stores, and as you will find with this project, that is no small feat!

In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast using the Open Food Facts database. This database contains extensive, openly-sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows of data in each of the three CSV files do not contain relevant data for your investigation. In each dataset, you will need to filter out rows with irrelevant data, based on values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where `categories_tags` contains one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple category tags in the `categories_tags` column.
There is a column in each CSV file called `origins_tags`, which contains strings for the country of origin of each item.

After completing this project, you'll be armed with a list of ingredients and their countries of origin and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

[Open Food Facts database](https://world.openfoodfacts.org/)

In [58]:
# Importing libraries

import pandas as pd

## FOR AVOCADOS:

In [59]:
# Importing the data with error handling for inconsistent rows
pd_avocado = pd.read_csv('data/avocado.csv', sep='\t', on_bad_lines='skip')

# Looking at the data
print(pd_avocado.head())
print(pd_avocado.describe())

            code  lc product_name_de product_name_el  \
0  0059749979702  fr             NaN             NaN   
1  7610095131409  en             NaN             NaN   
2  4005514005578  en             NaN             NaN   
3  0879890002513  en             NaN             NaN   
4  0223086613685  en             NaN             NaN   

                     product_name_en product_name_es product_name_fi  \
0                                NaN             NaN             NaN   
1                                NaN             NaN             NaN   
2  Gelbe Linse Avocado Brotaufstrich             NaN             NaN   
3           Avocado toast chili lime             NaN             NaN   
4                            Avocado             NaN             NaN   

         product_name_fr product_name_id product_name_it  ...  \
0  Naturalia Avocado Oil             NaN             NaN  ...   
1     Avocado Bowl chips             NaN             NaN  ...   
2                    NaN           

In [60]:
# Subsetting useful columns

useful_columns = ['code', 'lc', 'productnameen', 'quantity', 'servingsize', 'packagingtags', 'brands', 'brandstags', 'categoriestags', 'labelstags', 'countries', 'countriestags', 'origins', 'origins_tags']
existing_useful_columns = [col for col in useful_columns if col in pd_avocado]

pd_avocado_subset = pd_avocado[existing_useful_columns]
print(pd_avocado_subset)

               code  lc quantity                                   brands  \
0     0059749979702  fr      NaN                                Naturalia   
1     7610095131409  en      NaN                                  Zweifel   
2     4005514005578  en      NaN                                   Tartex   
3     0879890002513  en      NaN                                      NaN   
4     0223086613685  en      NaN                                      NaN   
...             ...  ..      ...                                      ...   
1780  0819573012712  en      NaN                                Happybaby   
1781  0052200072097  en      NaN  Beech-Nut,  Beech-Nut Nutrition Company   
1782  0793613300000  en      NaN                           Classy Delites   
1783       05252428  en      NaN                                Beech-Nut   
1784  0052200072141  en      NaN                                Beech-Nut   

               countries origins origins_tags  
0                 Canada   

In [61]:
# Sorting and subsetting for country of highest origin for avocado:

# Filter for which country's data:
pd_avocado_subset_UK = pd_avocado_subset[pd_avocado_subset['countries'] == "United Kingdom"]

# Group by 'countries' and count the occurrences
country_counts = pd_avocado_subset_UK['origins_tags'].value_counts()

# Sort the counts in descending order
top_avocado_origin = country_counts.sort_values(ascending=False)

print(top_avocado_origin.head())

origins_tags
en:peru             2
en:spain,en:peru    1
en:chile,en:peru    1
en:israel           1
en:south-africa     1
Name: count, dtype: int64


In [62]:
# Submitting the answer
top_avocado_origin = top_avocado_origin.index[0]
top_avocado_origin = top_avocado_origin.replace('en:', '').replace('-', ' ').title()
print(top_avocado_origin)

Peru


In [63]:
# Function to call for each ingredient
def common_origin(df):
    """
    Analyzes the origins of products filtered by country.

    Parameters:
    df (DataFrame): A pandas DataFrame containing product metadata.

    Returns:
    Series: A sorted count of origin tags from products associated with the United Kingdom.

    Notes:
    - Only includes relevant columns if present.
    - Filters data where 'countries' is 'United Kingdom'.
    - Returns a Series where the index contains origin tags and the values are their respective counts.
    """
    useful_columns = ['code', 'lc', 'productnameen', 'quantity', 'servingsize', 'packagingtags', 'brands',
                      'brandstags', 'categoriestags', 'labelstags', 'countries', 'countriestags', 'origins', 
                      'origins_tags']
    existing_useful_columns = [col for col in useful_columns if col in df.columns]
    pd_subset = df[existing_useful_columns]
    
    # Filter for which country's data:
    pd_subset = pd_subset[pd_subset['countries'] == "United Kingdom"]
    
    # Group by 'countries' and count the occurrences
    country_counts = pd_subset['origins_tags'].value_counts()

    # Sort the counts in descending order
    country_counts_sorted = country_counts.sort_values(ascending=False)
    
    return country_counts_sorted

## FOR OLIVE OIL:

In [64]:
# Importing the data with error handling for inconsistent rows
pd_olive_oil = pd.read_csv('data/olive_oil.csv', sep='\t', on_bad_lines='skip')

# Finding the most common country of origin for each ingredient:
top_olive_oil_origin = common_origin(pd_olive_oil)
print(top_olive_oil_origin.head)

<bound method NDFrame.head of origins_tags
en:greece                                             6
en:spain                                              4
en:italy                                              4
en:greece,en:italy,en:portugal,en:spain,en:tunisia    2
en:produce-of-italy                                   1
en:morocco                                            1
en:european-union-and-non-european-union              1
en:maldives                                           1
en:united-kingdom                                     1
en:southwest-atlantic                                 1
en:produced-in-italy                                  1
en:european-union                                     1
en:india                                              1
Name: count, dtype: int64>


  pd_olive_oil = pd.read_csv('data/olive_oil.csv', sep='\t', on_bad_lines='skip')


In [65]:
# Submitting the answer
top_olive_oil_origin = top_olive_oil_origin.index[0]
top_olive_oil_origin = top_olive_oil_origin.replace('en:', '').replace('-', ' ').title()
print(top_olive_oil_origin)

Greece


## FOR SOURDOUGH

In [66]:
# Importing the data with error handling for inconsistent rows
pd_sourdough = pd.read_csv('data/sourdough.csv', sep='\t', on_bad_lines='skip')

# Finding the most common country of origin for each ingredient:
top_sourdough_origin = common_origin(pd_sourdough)
print(top_sourdough_origin)

origins_tags
en:united-kingdom    5
en:italy             2
en:france            1
Name: count, dtype: int64


In [67]:
# Submitting the answer
top_sourdough_origin = top_sourdough_origin.index[0]
top_sourdough_origin = top_sourdough_origin.replace('en:', '').replace('-', ' ').title()
print(top_sourdough_origin)

United Kingdom
