# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil. It's no small feat that most of these ingredients are readily available in grocery stores. 

In this project, you'll conduct a supply chain analysis of three of these ingredients (avocado, olive oil, and sourdough) using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows in each of the three CSV files are not relevant to your investigation. In each dataset, you will need to filter out these rows based on the values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter the DataFrame to include only rows where `categories_tags` contains at least one of the tags in the relevant categories for that ingredient.
- Each row of data usually has multiple categories tags in the `categories_tags` column.
- There is a column in each CSV file called `origins_tags` with strings for country of origin of that item.
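
The filtering rule described above can be sketched on a toy DataFrame. This is a minimal illustration, not the project data: the rows, the `relevant_categories` set, and the helper name `has_relevant_tag` are all made up for the example.

```python
import pandas as pd

# Toy data standing in for one of the ingredient CSV files (values are made up)
toy = pd.DataFrame({
    "product_name_en": ["Avocado", "Avocado crisps", "Unknown"],
    "categories_tags": [
        "en:plant-based-foods,en:fruits,en:avocados",
        "en:snacks,en:salty-snacks",
        None,
    ],
    "origins_tags": ["en:peru", "en:spain", None],
})

# Hypothetical stand-in for the contents of a relevant-categories TXT file
relevant_categories = {"en:avocados", "en:fruits"}

def has_relevant_tag(tags, relevant):
    """Return True if a comma-separated tag string shares a tag with `relevant`."""
    if not isinstance(tags, str):  # rows without tags hold a missing value
        return False
    return not relevant.isdisjoint(tags.split(","))

# Boolean mask: True for rows with at least one relevant category tag
mask = toy["categories_tags"].apply(has_relevant_tag, relevant=relevant_categories)
filtered = toy[mask]
print(filtered["product_name_en"].tolist())  # ['Avocado']
```

Only the first row survives: the second has no relevant tag, and the third has no tags at all.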

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

![](avocado_wallpaper.jpeg)

In [None]:
# Ignoring the warning outputs when reading some of the files
import warnings
warnings.filterwarnings('ignore')

In [16]:
import pandas as pd

avocado = pd.read_csv('data/avocado.csv', delimiter = '\t')

# Subsetting with the relevant columns
relevant_columns = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']

avocado = avocado[relevant_columns]

avocado.head()


Unnamed: 0,code,lc,product_name_en,quantity,serving_size,packaging_tags,brands,brands_tags,categories_tags,labels_tags,countries,countries_tags,origins,origins_tags
0,59749979702,fr,,,,,Naturalia,naturalia,"en:plant-based-foods-and-beverages,en:plant-ba...",,Canada,en:canada,,
1,7610095131409,en,,,,,Zweifel,zweifel,"en:snacks,en:salty-snacks,en:appetizers,en:chi...","en:vegetarian,en:vegan","Switzerland, World","en:switzerland,en:world",,
2,4005514005578,en,Gelbe Linse Avocado Brotaufstrich,,,,Tartex,tartex,de:abendbrotsufstrich,"en:organic,en:eu-organic,en:eg-oko-verordnung",Germany,en:germany,,
3,879890002513,en,Avocado toast chili lime,,,,,,,,United States,en:united-states,,
4,223086613685,en,Avocado,,,,,,,,United States,en:united-states,,


In [15]:
# Reading the relevant category tags from the txt file, one tag per line
# (the with block closes the file automatically)
with open('data/relevant_avocado_categories.txt', "r") as file:
    relevant_avocado_categories = file.read().splitlines()
    
relevant_avocado_categories

['en:avocadoes',
 'en:avocados',
 'en:fresh-foods',
 'en:fresh-vegetables',
 'en:fruchte',
 'en:fruits',
 'en:raw-green-avocados',
 'en:tropical-fruits',
 'en:tropische-fruchte',
 'en:vegetables-based-foods',
 'fr:hass-avocados']


In [21]:
# Split the comma-separated tags into a list; rows without tags stay NaN and are dropped
avocado['categories_list'] = avocado['categories_tags'].str.split(',')
avocado = avocado.dropna(subset=['categories_list'])
# Keep only rows that have at least one tag listed in the txt file
avocado = avocado[avocado['categories_list'].apply(lambda tags: any(tag in relevant_avocado_categories for tag in tags))]

avocado

Unnamed: 0,code,lc,product_name_en,quantity,serving_size,packaging_tags,brands,brands_tags,categories_tags,labels_tags,countries,countries_tags,origins,origins_tags,categories_list
5,3662994002063,fr,,3 fruits,,,la compagnie des fruits mûrs,la-compagnie-des-fruits-murs,"en:plant-based-foods-and-beverages,en:plant-ba...",,France,en:france,,,"[en:plant-based-foods-and-beverages, en:plant-..."
6,8437013031011,fr,,1 kg,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,France,en:france,,,"[en:plant-based-foods-and-beverages, en:plant-..."
14,4016249238155,de,,135g,100g,de:gläschen,Allos,allos,"en:plant-based-foods-and-beverages,en:plant-ba...","en:organic,en:vegetarian,en:eu-organic,en:no-g...",Deutschland,en:germany,Europäische Union,en:european-union,"[en:plant-based-foods-and-beverages, en:plant-..."
17,8718963381532,de,,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,Deutschland,en:germany,,,"[en:plant-based-foods-and-beverages, en:plant-..."
23,8436002746707,es,,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,España,en:spain,,,"[en:plant-based-foods-and-beverages, en:plant-..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1751,3700353611218,fr,,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,France,en:france,,,"[en:plant-based-foods-and-beverages, en:plant-..."
1756,4311527575718,de,,,,de:aufkleber,Edeka,edeka,"en:plant-based-foods-and-beverages,en:plant-ba...",,Deutschland,en:germany,Peru,en:peru,"[en:plant-based-foods-and-beverages, en:plant-..."
1757,4311527571499,en,,,,en:aufkleber,Edeka,edeka,"en:plant-based-foods-and-beverages,en:plant-ba...",,Germany,en:germany,,,"[en:plant-based-foods-and-beverages, en:plant-..."
1769,3439496511399,fr,,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,France,en:france,,,"[en:plant-based-foods-and-beverages, en:plant-..."


In [22]:
# Keep only products sold in the United Kingdom
top_avocado = avocado[avocado['countries'] == 'United Kingdom']
# Take the most frequent origin tag (value_counts sorts in descending order)
top_avocado_origin = top_avocado['origins_tags'].value_counts().index[0]

# Remove the literal 'en:' prefix; lstrip('en:') would strip any leading 'e', 'n' or ':' characters
top_avocado_origin = top_avocado_origin.removeprefix('en:')  # requires Python 3.9+

top_avocado_origin


'peru'
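
A note on removing the `en:` prefix from a tag: `str.lstrip` treats its argument as a set of characters rather than a literal prefix, so it keeps stripping as long as the next character is `e`, `n` or `:`. That happens to be harmless for `en:peru`, but not for every country name. The sketch below shows the difference; the tag is illustrative and the helper `strip_language_prefix` is a hypothetical name, not part of the project.

```python
def strip_language_prefix(tag, prefix="en:"):
    """Remove `prefix` from the start of `tag` only if it is really there."""
    return tag[len(prefix):] if tag.startswith(prefix) else tag

tag = "en:netherlands"  # illustrative tag, not taken from the project output

print(tag.lstrip("en:"))           # therlands
print(strip_language_prefix(tag))  # netherlands
```

On Python 3.9 and later, `tag.removeprefix("en:")` gives the same result as the helper.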

In [12]:
def read_and_filter_data(filename, txt_file):
    '''
    Takes a CSV file of products and a text file listing the relevant category tags,
    and returns the most common country of origin among the products sold in the UK.

    Args:
        filename (str): Path to the tab-separated CSV file with the product data.
        txt_file (str): Path to the text file with one relevant category tag per line.

    Returns:
        top_origin (str): The most common country of origin for the product in the UK.
    '''
    # Read the tab-separated file
    df = pd.read_csv(filename, delimiter='\t')
    
    # Read the relevant category tags, one per line (the with block closes the file)
    with open(txt_file, "r") as file:
        relevant_categories = file.read().splitlines()
    
    # Split the comma-separated tags into lists and drop rows without any tags
    df['categories_list'] = df['categories_tags'].str.split(',')
    df = df.dropna(subset=['categories_list'])
    # Keep only rows with at least one tag from the text file
    df = df[df['categories_list'].apply(lambda tags: any(tag in relevant_categories for tag in tags))]
    
    # Keep only products sold in the United Kingdom
    top = df[df['countries'] == 'United Kingdom']
    # Take the most frequent origin tag (value_counts sorts in descending order)
    top_origin = top['origins_tags'].value_counts().index[0]

    # Remove the literal 'en:' prefix; lstrip('en:') would strip any leading 'e', 'n' or ':' characters
    top_origin = top_origin.removeprefix('en:')  # requires Python 3.9+
    top_origin = top_origin.replace('-', ' ')

    return top_origin



In [37]:
# Making sure the function works, the expected output is 'peru'
print(f"The top origin for Avocado is {read_and_filter_data('data/avocado.csv', 'data/relevant_avocado_categories.txt')}")

The top origin for Avocado is peru


In [36]:
# Executing for oil and sourdough
top_olive_oil_origin = read_and_filter_data('data/olive_oil.csv', 'data/relevant_olive_oil_categories.txt')

top_sourdough_origin = read_and_filter_data('data/sourdough.csv', 'data/relevant_sourdough_categories.txt')

results = {'Olive oil': top_olive_oil_origin, 'Sourdough': top_sourdough_origin}

for key, value in results.items():
    print(f"The top origin for {key} is {value}")

The top origin for Olive oil is greece
The top origin for Sourdough is united kingdom


## Conclusion

By combining the relevant category tags with the product data, we were able to automate the retrieval of the most common country of origin for each product sold in the UK: Peru for avocado, Greece for olive oil, and the United Kingdom for sourdough.