# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread and a good drizzle of extra virgin olive oil. It's no small feat that most of these ingredients are readily available in grocery stores. 

In this project, you'll conduct a supply chain analysis of three of the ingredients in avocado toast, using the Open Food Facts database. This database contains extensive, openly sourced information on various foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the data folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item and countries of origin
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some rows in each of the three CSV files are not relevant to your investigation, so you will need to filter them out based on the values in the `categories_tags` column. Example categories include fruits, vegetables, and fruit-based oils. Keep only the rows where `categories_tags` contains at least one of the tags listed in the relevant categories file for that ingredient.
- Each row of data usually has multiple categories tags in the `categories_tags` column.
- There is a column in each CSV file called `origins_tags` with strings for country of origin of that item.

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.
![](avocado_wallpaper.jpeg)

In [1]:
import pandas as pd

# Begin coding here ...

Let's import our data.

In [2]:
df = pd.read_csv('data/avocado.csv', sep='\t')

We want to understand the structure of our data.

In [3]:
df.head()

Unnamed: 0,code,lc,product_name_de,product_name_el,product_name_en,product_name_es,product_name_fi,product_name_fr,product_name_id,product_name_it,...,off:ecoscore_data.adjustments.packaging.non_recyclable_and_non_biodegradable_materials,off:ecoscore_data.adjustments.production_system.value,off:ecoscore_data.adjustments.threatened_species.value,sources_fields:org-database-usda:available_date,sources_fields:org-database-usda:fdc_category,sources_fields:org-database-usda:fdc_data_source,sources_fields:org-database-usda:fdc_id,sources_fields:org-database-usda:modified_date,sources_fields:org-database-usda:publication_date,data_sources
0,59749979702,fr,,,,,,Naturalia Avocado Oil,,,...,1.0,0.0,,,,,,,,"App - yuka, Apps"
1,7610095131409,en,,,,,,Avocado Bowl chips,,,...,1.0,0.0,,,,,,,,"App - Yuka, Apps, Producers, Producer - zweifel"
2,4005514005578,en,,,Gelbe Linse Avocado Brotaufstrich,,,,,,...,1.0,15.0,,,,,,,,"App - yuka, Apps, App - smoothie-openfoodfacts"
3,879890002513,en,,,Avocado toast chili lime,,,,,,...,1.0,0.0,,,,,,,,"App - Yuka, Apps, App - InFood"
4,223086613685,en,,,Avocado,,,,,,...,1.0,0.0,,,,,,,,"App - Yuka, Apps"


Select the columns relevant to the analysis

In [4]:
relevant_columns = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']

# Copy the subset so that adding columns later does not trigger a SettingWithCopyWarning
df = df[relevant_columns].copy()

df.head()

Unnamed: 0,code,lc,product_name_en,quantity,serving_size,packaging_tags,brands,brands_tags,categories_tags,labels_tags,countries,countries_tags,origins,origins_tags
0,59749979702,fr,,,,,Naturalia,naturalia,"en:plant-based-foods-and-beverages,en:plant-ba...",,Canada,en:canada,,
1,7610095131409,en,,,,,Zweifel,zweifel,"en:snacks,en:salty-snacks,en:appetizers,en:chi...","en:vegetarian,en:vegan","Switzerland, World","en:switzerland,en:world",,
2,4005514005578,en,Gelbe Linse Avocado Brotaufstrich,,,,Tartex,tartex,de:abendbrotsufstrich,"en:organic,en:eu-organic,en:eg-oko-verordnung",Germany,en:germany,,
3,879890002513,en,Avocado toast chili lime,,,,,,,,United States,en:united-states,,
4,223086613685,en,Avocado,,,,,,,,United States,en:united-states,,


Gather relevant categories data for avocado

In [5]:
with open("data/relevant_avocado_categories.txt", "r") as file:
    relevant_avocado_categories = file.read().splitlines()

Filter the DataFrame by `categories_tags`

In [6]:
# Turn a column of comma-separated tags into a column of lists
df['categories_list'] = df['categories_tags'].str.split(',')

df.head()

Unnamed: 0,code,lc,product_name_en,quantity,serving_size,packaging_tags,brands,brands_tags,categories_tags,labels_tags,countries,countries_tags,origins,origins_tags,categories_list
0,59749979702,fr,,,,,Naturalia,naturalia,"en:plant-based-foods-and-beverages,en:plant-ba...",,Canada,en:canada,,,"[en:plant-based-foods-and-beverages, en:plant-..."
1,7610095131409,en,,,,,Zweifel,zweifel,"en:snacks,en:salty-snacks,en:appetizers,en:chi...","en:vegetarian,en:vegan","Switzerland, World","en:switzerland,en:world",,,"[en:snacks, en:salty-snacks, en:appetizers, en..."
2,4005514005578,en,Gelbe Linse Avocado Brotaufstrich,,,,Tartex,tartex,de:abendbrotsufstrich,"en:organic,en:eu-organic,en:eg-oko-verordnung",Germany,en:germany,,,[de:abendbrotsufstrich]
3,879890002513,en,Avocado toast chili lime,,,,,,,,United States,en:united-states,,,
4,223086613685,en,Avocado,,,,,,,,United States,en:united-states,,,


In [7]:
# Drop rows with missing values in categories_list
df = df.dropna(subset=['categories_list'])

In [8]:
# Keep rows where at least one tag is in the relevant categories
df = df[df['categories_list'].apply(lambda tags: any(tag in relevant_avocado_categories for tag in tags))]
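
The filter keeps a row when at least one of its tags appears in the relevant list. Here is a minimal sketch of the same idea on made-up data. The tags and the `toy` DataFrame are illustrative assumptions, not values from the project files, and a set is used instead of a list for faster membership tests.

```python
import pandas as pd

# Hypothetical tags for illustration only; the real ones come from the TXT files
relevant_tags = {"en:avocados", "en:fresh-avocados"}

toy = pd.DataFrame({
    "product": ["avocado", "avocado chips", "untagged item"],
    "categories_tags": ["en:fruits,en:avocados", "en:snacks,en:chips", None],
})

# Split the comma-separated string into a list; missing values stay missing
toy["categories_list"] = toy["categories_tags"].str.split(",")

# Keep rows whose tag list shares at least one tag with relevant_tags
has_relevant_tag = toy["categories_list"].apply(
    lambda tags: isinstance(tags, list) and not relevant_tags.isdisjoint(tags)
)
filtered = toy[has_relevant_tag]
print(filtered["product"].tolist())
```

The `isinstance` check makes the missing-value handling explicit, so this variant does not need a separate `dropna` step.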

In [9]:
df.head()

Unnamed: 0,code,lc,product_name_en,quantity,serving_size,packaging_tags,brands,brands_tags,categories_tags,labels_tags,countries,countries_tags,origins,origins_tags,categories_list
5,3662994002063,fr,,3 fruits,,,la compagnie des fruits mûrs,la-compagnie-des-fruits-murs,"en:plant-based-foods-and-beverages,en:plant-ba...",,France,en:france,,,"[en:plant-based-foods-and-beverages, en:plant-..."
6,8437013031011,fr,,1 kg,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,France,en:france,,,"[en:plant-based-foods-and-beverages, en:plant-..."
14,4016249238155,de,,135g,100g,de:gläschen,Allos,allos,"en:plant-based-foods-and-beverages,en:plant-ba...","en:organic,en:vegetarian,en:eu-organic,en:no-g...",Deutschland,en:germany,Europäische Union,en:european-union,"[en:plant-based-foods-and-beverages, en:plant-..."
17,8718963381532,de,,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,Deutschland,en:germany,,,"[en:plant-based-foods-and-beverages, en:plant-..."
23,8436002746707,es,,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,España,en:spain,,,"[en:plant-based-foods-and-beverages, en:plant-..."


Filter the data for products sold in the UK

In [10]:
df_uk = df[(df['countries']=='United Kingdom')]
df_uk

Unnamed: 0,code,lc,product_name_en,quantity,serving_size,packaging_tags,brands,brands_tags,categories_tags,labels_tags,countries,countries_tags,origins,origins_tags,categories_list
361,985833,en,Avacado,650 g,,,Marks & Spencer,marks-spencer,"en:plant-based-foods-and-beverages,en:plant-ba...",,United Kingdom,en:united-kingdom,Peru,en:peru,"[en:plant-based-foods-and-beverages, en:plant-..."
381,40464,en,Avocado,,,,,,"en:plant-based-foods-and-beverages,en:plant-ba...",,United Kingdom,en:united-kingdom,,,"[en:plant-based-foods-and-beverages, en:plant-..."
414,4088600100173,en,Avocado,100 g,,en:mixed-plastic-unknown,Aldi,aldi,"en:plant-based-foods-and-beverages,en:plant-ba...",,United Kingdom,en:united-kingdom,,,"[en:plant-based-foods-and-beverages, en:plant-..."
468,1307351,en,Avacados organic,,,"en:card-tray,en:ldpe-bag",Sainsbury’s SO organic,sainsbury-s-so-organic,"en:plant-based-foods-and-beverages,en:plant-ba...","en:organic,en:eu-organic,en:non-eu-agriculture...",United Kingdom,en:united-kingdom,,,"[en:plant-based-foods-and-beverages, en:plant-..."
508,5057172125395,en,Just Essentials Avocados,4pack,,en:mixed-plastic-film-packet-to-recycle,Asda,asda,"en:plant-based-foods-and-beverages,en:plant-ba...","en:class-i,en:contains-stones",United Kingdom,en:united-kingdom,Peru,en:peru,"[en:plant-based-foods-and-beverages, en:plant-..."
510,23066755,en,Ready to Eat Avocados,2,,en:mixed-plastic-bag,Asda,asda,"en:plant-based-foods-and-beverages,en:plant-ba...",,United Kingdom,en:united-kingdom,"Spain, Peru","en:spain,en:peru","[en:plant-based-foods-and-beverages, en:plant-..."
708,3201985,en,Avocado,2,80g,"en:mixed-plastic-packet,en:plastic-film",Tesco,tesco,"en:plant-based-foods-and-beverages,en:plant-ba...","en:tesco-nurture,en:vitamin-e-source",United Kingdom,en:united-kingdom,"Chile, Peru","en:chile,en:peru","[en:plant-based-foods-and-beverages, en:plant-..."
781,10096369,en,Avocado,,100 g,,Tesco,tesco,"en:plant-based-foods-and-beverages,en:plant-ba...",,United Kingdom,en:united-kingdom,,,"[en:plant-based-foods-and-beverages, en:plant-..."
850,184915,en,Rich & creamy avocados,,,"en:card-tray,en:mixed-plastic-sleeve",By Sainsbury's,by-sainsbury-s,"en:plant-based-foods-and-beverages,en:plant-ba...",,United Kingdom,en:united-kingdom,,,"[en:plant-based-foods-and-beverages, en:plant-..."
1190,1600322,en,Ripe & ready medium avocados,2,,"en:card-tray,en:ldpe-bag",Sainsbury's,sainsbury-s,"en:plant-based-foods-and-beverages,en:plant-ba...",en:class-i,United Kingdom,en:united-kingdom,Israel,en:israel,"[en:plant-based-foods-and-beverages, en:plant-..."
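
The exact match on `countries` keeps only products listed for the UK alone. A product sold in several countries has a combined string (for example `Switzerland, World` in the earlier output), so a product listed for the UK and another country would be excluded. If such rows should count, one possible alternative is to look for the `en:united-kingdom` tag in `countries_tags`. The sketch below uses made-up rows, and applying it to the real data may change the results shown in this notebook.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "product": ["a", "b", "c", "d"],
    "countries": ["United Kingdom", "France, United Kingdom", "France", None],
    "countries_tags": [
        "en:united-kingdom",
        "en:france,en:united-kingdom",
        "en:france",
        None,
    ],
})

# Exact string match, as used in this notebook
exact = toy[toy["countries"] == "United Kingdom"]

# Tag-based match: split the tags and test for membership
tag_lists = toy["countries_tags"].str.split(",")
is_uk = tag_lists.apply(
    lambda tags: isinstance(tags, list) and "en:united-kingdom" in tags
)
by_tag = toy[is_uk]

print(exact["product"].tolist())   # products matched exactly
print(by_tag["product"].tolist())  # products matched by tag
```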


Find the most common country of origin

In [11]:
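# removeprefix (Python 3.9+) drops the exact 'en:' prefix; lstrip('en:') would
# instead strip any leading 'e', 'n' or ':' characters
common_origin_uk = df_uk['origins_tags'].value_counts().index[0].removeprefix('en:')
common_origin_uk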

'peru'

The most common country of origin for avocados sold in the UK is Peru.
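
Note that `value_counts` treats each `origins_tags` string as a single value, so a row such as `en:spain,en:peru` is counted separately from `en:peru`. The sketch below assumes you would rather count every country in a multi-origin row once. The toy series copies the origins shown in the UK output above, plus one missing value.

```python
import pandas as pd

# Origins copied from the UK rows displayed above; None stands for a missing value
origins = pd.Series([
    "en:peru",
    None,
    "en:peru",
    "en:spain,en:peru",
    "en:chile,en:peru",
    "en:israel",
])

per_country = (
    origins.dropna()                          # ignore rows without origin data
    .str.split(",")                           # one list of tags per row
    .explode()                                # one row per individual tag
    .str.replace(r"^en:", "", regex=True)     # drop the leading language prefix
    .value_counts()
)
print(per_country)
```

For these rows Peru is still the top origin, but the count now includes the rows where Peru is listed alongside another country.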

Create a user-defined function to call for each ingredient

In [12]:
def read_and_filter_data(filename, relevant_categories):
    """Return the cleaned-up top origin tag for UK products in the given file."""

    df = pd.read_csv(f'data/{filename}', sep='\t', low_memory=False)

    # Subset large DataFrame to include only relevant columns
    subset_columns = [ 'code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']
    df = df[subset_columns].copy()

    # Split tags into lists
    df['categories_list'] = df['categories_tags'].str.split(',')

    # Drop rows with null categories data
    df = df.dropna(subset=['categories_list'])

    # Filter data for relevant categories
    df = df[df['categories_list'].apply(lambda tags: any(tag in relevant_categories for tag in tags))]

    # Filter data for the UK
    df_uk = df[(df['countries']=='United Kingdom')]

    # Find top origin country string with the highest count
    top_origin_string = df_uk['origins_tags'].value_counts().index[0]

    # Clean up top origin country string; removeprefix (Python 3.9+) drops the
    # exact 'en:' prefix, whereas lstrip('en:') strips any leading e, n or : characters
    top_origin_country = top_origin_string.removeprefix('en:')
    top_origin_country = top_origin_country.replace('-', ' ')

    print(f'**{filename[:-4]} origins**\n\t top_origin_country \n')

    print(f"Top origin country: {top_origin_country}")

    return top_origin_country
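
A note on cleaning tags such as `en:peru`: `str.lstrip` takes a set of characters to strip, not a prefix, so it can remove more than intended when the country name itself starts with `e` or `n`. The short sketch below uses an illustrative tag that does not come from the notebook's output, and compares it with `str.removeprefix`, which is available from Python 3.9.

```python
# Illustrative tag, chosen because the country name starts with 'n' and 'e'
tag = "en:netherlands"

# lstrip removes any leading 'e', 'n' or ':' characters, not the prefix "en:"
stripped = tag.lstrip("en:")

# removeprefix removes the exact prefix only (Python 3.9+)
cleaned = tag.removeprefix("en:")

print(stripped)  # therlands
print(cleaned)   # netherlands
```

For `en:peru`, `en:greece` and `en:united-kingdom` both approaches give the same result, because none of those names start with `e` or `n`.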

Analyze avocado origins again, this time by calling function

In [13]:
top_avocado_origin = read_and_filter_data('avocado.csv', relevant_avocado_categories)

**avocado origins**
	 top_origin_country 

Top origin country: peru


Let's check the function on another dataset

In [14]:
with open('data/relevant_olive_oil_categories.txt', 'r') as f:
    relevant_olive_categories = f.read().splitlines()

In [15]:
top_olive_oil_origin = read_and_filter_data('olive_oil.csv', relevant_olive_categories)

**olive_oil origins**
	 top_origin_country 

Top origin country: greece


The top origin country for olive oil sold in the UK is Greece.

In [16]:
with open('data/relevant_sourdough_categories.txt', 'r') as f:
    relevant_sourdough_categories = f.read().splitlines()

In [17]:
top_sourdough_origin = read_and_filter_data('sourdough.csv', relevant_sourdough_categories)

**sourdough origins**
	 top_origin_country 

Top origin country: united kingdom


The top origin country for sourdough sold in the UK is the United Kingdom.