# What's in an Avocado Toast: A Supply Chain Analysis

You're in London, making an avocado toast, a quick-to-make dish that has soared in popularity on breakfast menus since the 2010s. A simple smashed avocado toast can be made with five ingredients: one ripe avocado, half a lemon, a big pinch of salt flakes, two slices of sourdough bread, and a good drizzle of extra virgin olive oil. That most of these ingredients are readily available in grocery stores is no small feat.

In this project, you'll conduct a supply chain analysis of three of the ingredients in an avocado toast, using the Open Food Facts database. This database contains extensive, openly sourced information on a wide range of foods, including their origins. Through this analysis, you will gain an in-depth understanding of the complex supply chain involved in producing a single dish.

Three pairs of files are provided in the `data` folder:
- A CSV file for each ingredient, such as `avocado.csv`, with data about each food item, including its countries of origin.
- A TXT file for each ingredient, such as `relevant_avocado_categories.txt`, containing only the category tags of interest for that food.

Here are some other key points about these files:
- Some of the rows in each of the three CSV files do not contain data relevant to your investigation. In each dataset, you will need to filter out these rows based on the values in the `categories_tags` column. Examples of categories are fruits, vegetables, and fruit-based oils. Filter each DataFrame to include only rows where `categories_tags` contains at least one of the tags in the relevant categories file for that ingredient.
- Each row usually has multiple comma-separated category tags in the `categories_tags` column.
- Each CSV file has a column called `origins_tags` containing the country (or countries) of origin for that item.
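
The filtering rule above (keep a row if any of its comma-separated `categories_tags` appears among the relevant tags) can also be written without a Python-level loop over rows. The following is a minimal sketch that works on the raw comma-separated strings as they come from the CSV. It assumes a unique row index, and `filter_by_categories` is a hypothetical helper. The toy tag values are illustrative, not taken from the real files.

```python
import pandas as pd


def filter_by_categories(df: pd.DataFrame, relevant: set) -> pd.DataFrame:
    """Keep rows whose comma-separated categories_tags contain any relevant tag."""
    # One row per (original row, tag) pair; the original index label is repeated
    tags = df['categories_tags'].str.split(',').explode()
    # True where a single tag is relevant, then collapse back to one flag per row.
    # sort=False keeps the flags in the DataFrame's own row order.
    mask = tags.isin(relevant).groupby(level=0, sort=False).any()
    return df[mask]


# Toy data for illustration only
toy = pd.DataFrame({
    'product_name_en': ['Hass avocados', 'Guacamole dip', 'Avocado oil'],
    'categories_tags': [
        'en:plant-based-foods,en:avocados',
        'en:dips,en:sauces',
        'en:fats,en:avocado-oils',
    ],
})
print(filter_by_categories(toy, {'en:avocados', 'en:fresh-foods'}))
```

Rows whose `categories_tags` is missing end up with a `False` flag, so they are dropped along with the irrelevant rows.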

After completing this project, you'll be armed with a list of ingredients and their countries of origin, and be well-positioned to launch into other analyses that explore how long, on average, these ingredients spend at sea.

![](avocado_wallpaper.jpeg)

You will apply your data manipulation and analysis skills to the supply chain of ingredients for making an avocado toast in the U.K. You need to determine the name of the most common country (or countries) of origin for three key ingredients:

* avocados,
* olive oil, and
* sourdough.

For the solution, store the most common country of origin for each ingredient as a string (one string per country) in the appropriate variable:

* `top_avocado_origin`,
* `top_olive_oil_origin`,
* `top_sourdough_origin`.
    
If the country names contain hyphens or other non-letter characters, clean them up so that each name contains only A-Z letters and, where needed, spaces.
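
One way to meet this cleaning requirement is sketched below. It assumes, based on the values shown later in this notebook, that each `origins_tags` entry is a comma-separated list of tags such as `en:united-kingdom`, each with a two-letter language prefix. `clean_origins` is a hypothetical helper name.

```python
import re


def clean_origins(origins_tags: str) -> list[str]:
    """Split an origins_tags string into cleaned country names (letters and spaces only)."""
    names = []
    for tag in origins_tags.split(','):
        tag = re.sub(r'^[a-z]{2}:', '', tag.strip())  # drop a language prefix such as "en:"
        tag = tag.replace('-', ' ')                   # hyphens become spaces
        tag = re.sub(r'[^A-Za-z ]', '', tag)          # remove anything that is not A-Z or a space
        tag = ' '.join(tag.split())                   # collapse repeated spaces
        if tag:
            names.append(tag)
    return names


print(clean_origins('en:spain,en:peru'))   # ['spain', 'peru']
print(clean_origins('en:united-kingdom'))  # ['united kingdom']
```

The `[^A-Za-z ]` pattern also removes accented letters. This satisfies the A-Z requirement, but it can mangle names that contain them.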

Note: Because the CSV data files are quite large and have numerous unused columns, you should subset each DataFrame to include only these relevant columns: `'code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags'`.
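
Instead of loading every column and subsetting afterwards, `pandas.read_csv` can read only the listed columns through its `usecols` parameter, which lowers memory use on large files. The following sketch assumes the files are tab-separated, as the notebook's `sep='\t'` calls indicate. `load_subset` is a hypothetical helper, and the tiny in-memory file stands in for a real file such as `data/avocado.csv`.

```python
import io

import pandas as pd

RELEVANT_COLUMNS = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size',
                    'packaging_tags', 'brands', 'brands_tags', 'categories_tags',
                    'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']


def load_subset(source, columns=None) -> pd.DataFrame:
    """Read a tab-separated export, keeping only the requested columns."""
    # Fall back to the project's list of relevant columns when none are given
    return pd.read_csv(source, sep='\t', usecols=columns or RELEVANT_COLUMNS)


# Tiny in-memory stand-in: two wanted columns plus one we skip
fake_file = io.StringIO('code\tcategories_tags\tunused\n1\ten:avocados\tx\n')
print(load_subset(fake_file, columns=['code', 'categories_tags']))
```

On the real data this would be called as `load_subset('data/avocado.csv')`. The columns come back in file order rather than list order. `read_csv` raises a `ValueError` if a listed column is missing from the file.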

After you complete this project, feel free to analyze this food data for other questions you might be interested in exploring!



```python
# Find our file names.
!ls data
```

```
avocado.csv			 relevant_olive_oil_categories.txt
olive_oil.csv			 relevant_sourdough_categories.txt
relevant_avocado_categories.txt  sourdough.csv
```


```python
import pandas as pd

# The data files are read as tab-separated (sep='\t')
avocado = pd.read_csv('data/avocado.csv', sep='\t')
olive = pd.read_csv('data/olive_oil.csv', sep='\t')
sour = pd.read_csv('data/sourdough.csv', sep='\t')

# Make lists of relevant categories (one tag per line, no header row)
avocado_cat = pd.read_csv('data/relevant_avocado_categories.txt', header=None)[0].tolist()
olive_cat = pd.read_csv('data/relevant_olive_oil_categories.txt', header=None)[0].tolist()
sour_cat = pd.read_csv('data/relevant_sourdough_categories.txt', header=None)[0].tolist()

subset = ['code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins', 'origins_tags']

# Keep only the relevant columns; .copy() gives independent DataFrames that are safe to modify
avocado, olive, sour = avocado[subset].copy(), olive[subset].copy(), sour[subset].copy()

for df in [avocado, olive, sour]:
    # Rows without categories or origins can't be used; inplace=True modifies each DataFrame itself
    df.dropna(subset=['categories_tags', 'origins_tags'], inplace=True)
    # Make the comma-separated categories_tags column into lists
    df['categories_tags'] = df['categories_tags'].str.split(',')

display(avocado_cat[:5], olive_cat[:5], sour_cat[:5])
print("\n")
display(avocado['categories_tags'].head(), olive['categories_tags'].head(), sour['categories_tags'].head())
```

```
['en:avocadoes',
 'en:avocados',
 'en:fresh-foods',
 'en:fresh-vegetables',
 'en:fruchte']

['ar:huile-d-olive',
 'ar:oil',
 'bg:green-olive-paste',
 'de:ol',
 'en:aceites-de-oliva']

['en:bagel-breads',
 'en:baguettes',
 'en:bakery-products',
 'en:bran-bread',
 'en:breads']


14     [en:plant-based-foods-and-beverages, en:plant-...
65     [en:plant-based-foods-and-beverages, en:plant-...
102    [en:plant-based-foods-and-beverages, en:plant-...
117    [en:plant-based-foods-and-beverages, en:plant-...
146    [en:plant-based-foods-and-beverages, en:plant-...
Name: categories_tags, dtype: object

0    [en:plant-based-foods-and-beverages, en:plant-...
1    [en:plant-based-foods-and-beverages, en:plant-...
2    [en:plant-based-foods-and-beverages, en:plant-...
3    [en:plant-based-foods-and-beverages, en:plant-...
4    [en:seafood, en:canned-foods, en:fishes, en:fa...
Name: categories_tags, dtype: object

32     [en:meats-and-their-products, en:meals, en:piz...
159    [en:plant-based-foods-and-beverages, en:plant-...
185    [en:plant-based-foods-and-beverages, en:plant-...
243    [en:snacks, en:salty-snacks, en:appetizers, en...
342                                 [en:sourdough-bread]
Name: categories_tags, dtype: object
```
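
The `contains_target_elements` helper in the next cell checks membership against a Python list, which scans the whole list for every tag. Storing the relevant categories in a `set` makes each lookup constant-time on average. The following is a minimal sketch that reads a category file into a `frozenset`. Based on the outputs above, it assumes one tag per line. `read_categories` is a hypothetical helper, and the temporary file stands in for `data/relevant_avocado_categories.txt`.

```python
import tempfile
from pathlib import Path


def read_categories(path) -> frozenset:
    """Read one category tag per line into a frozenset, skipping blank lines."""
    lines = Path(path).read_text(encoding='utf-8').splitlines()
    return frozenset(line.strip() for line in lines if line.strip())


# Stand-in for data/relevant_avocado_categories.txt
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'relevant_avocado_categories.txt'
    demo.write_text('en:avocadoes\nen:avocados\n\nen:fresh-foods\n', encoding='utf-8')
    avocado_tags = read_categories(demo)

print('en:avocados' in avocado_tags)  # True
```

The resulting set can be passed as `target_list` wherever the list was used, because `in` works the same way on both.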

```python
print(avocado.shape, olive.shape, sour.shape, "\n")

def contains_target_elements(row_list, target_list):
    """Return True if any tag in row_list is one of the relevant tags in target_list."""
    return any(elem in target_list for elem in row_list)

# A `for df in [...]` loop can't apply this filter: assigning the result to `df`
# only rebinds the loop variable. Unpacking a list comprehension updates all three names.
avocado, olive, sour = [
    df[df['categories_tags'].apply(contains_target_elements, target_list=cats)].copy()
    for df, cats in [(avocado, avocado_cat), (olive, olive_cat), (sour, sour_cat)]
]
print(avocado.shape, olive.shape, sour.shape, "\n")
```

```
(54, 14) (355, 14) (18, 14) 

(44, 14) (298, 14) (14, 14) 
```
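
The toy example below shows why a loop cannot filter the three DataFrames but can add or change columns. The statement `df = df[mask]` inside a `for df in [...]` loop rebinds the loop variable `df` to a new, filtered DataFrame and leaves the original names untouched. Column assignment (`df['col'] = ...`) changes the existing object, so it does take effect. The sketch also shows one common alternative for filtering, keeping the frames in a dict. The names `a`, `b` and `frames` are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2, 3]})
b = pd.DataFrame({'x': [4, 5, 6]})

# Rebinding inside the loop: a and b keep all their rows
for df in [a, b]:
    df = df[df['x'] > 1]
print(len(a), len(b))  # 3 3

# Mutating inside the loop: the new column appears on a and b
for df in [a, b]:
    df['y'] = df['x'] * 2
print(list(a.columns))  # ['x', 'y']

# Filtering several frames at once: build new objects and store them by name
frames = {'a': a, 'b': b}
frames = {name: df[df['x'] > 1].copy() for name, df in frames.items()}
print(len(frames['a']), len(frames['b']))  # 2 3
```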



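```python
for df in [avocado, olive, sour]:
    # Remove the leading language prefix (e.g. "en:") and turn hyphens into spaces.
    # Only the first prefix is stripped, so multi-origin values such as
    # "spain,en:peru" keep their later prefixes.
    df['origins_tags'] = df['origins_tags'].str.replace(r'^[a-z]{2}:', '', regex=True)
    df['origins_tags'] = df['origins_tags'].str.replace('-', ' ', regex=False)

# You're in London! Keep only products sold in the United Kingdom.
# Filtering creates new DataFrames, so unpack the results back into the three names
# (reassigning `df` inside a loop would not change avocado, olive or sour).
avocado, olive, sour = [df[df['countries'] == 'United Kingdom'] for df in (avocado, olive, sour)]

display(avocado[['categories_tags', 'countries', 'origins_tags']])
display(olive[['categories_tags', 'countries', 'origins_tags']])
display(sour[['categories_tags', 'countries', 'origins_tags']])
```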

| | categories_tags | countries | origins_tags |
|---|---|---|---|
| 361 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | peru |
| 508 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | peru |
| 510 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | spain,en:peru |
| 708 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | chile,en:peru |
| 1190 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | israel |


| | categories_tags | countries | origins_tags |
|---|---|---|---|
| 164 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | spain |
| 240 | [en:olive-oil-from-kolymvari] | United Kingdom | greece |
| 296 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | italy |
| 304 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | italy |
| 491 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | produce of italy |
| 707 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | greece |
| 1175 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | spain |
| 1300 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | european union and non european union |
| 1315 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | italy |
| 1365 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | greece,en:italy,en:portugal,en:spain,en:tunisia |


| | categories_tags | countries | origins_tags |
|---|---|---|---|
| 159 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | united kingdom |
| 185 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | united kingdom |
| 634 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | france |
| 776 | [en:plant-based-foods-and-beverages, en:plant-... | United Kingdom | united kingdom |
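
The filter `countries == 'United Kingdom'` keeps only rows whose `countries` value is exactly that string. Any product whose `countries` value lists several countries is therefore dropped, even if one of them is the U.K. A more inclusive variant checks the comma-separated `countries_tags` column instead. This sketch assumes the United Kingdom is tagged `en:united-kingdom`, following the `en:` tag style seen elsewhere in the data, so check the real values before relying on it. `sold_in` is a hypothetical helper, and the toy rows are made up.

```python
import pandas as pd


def sold_in(df: pd.DataFrame, country_tag: str = 'en:united-kingdom') -> pd.DataFrame:
    """Keep rows whose comma-separated countries_tags include country_tag."""
    # Assumed tag format, e.g. 'en:france,en:united-kingdom'; missing values count as "no tags"
    has_tag = df['countries_tags'].fillna('').str.split(',').apply(lambda tags: country_tag in tags)
    return df[has_tag]


# Toy values for illustration only
toy = pd.DataFrame({
    'countries': ['United Kingdom', 'France,United Kingdom', 'Spain'],
    'countries_tags': ['en:united-kingdom', 'en:france,en:united-kingdom', 'en:spain'],
})
print(sold_in(toy)['countries'].tolist())  # ['United Kingdom', 'France,United Kingdom']
```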


```python
# value_counts() sorts counts in descending order, so index[0] is the most frequent origin
top_avocado_origin = avocado['origins_tags'].value_counts().index[0]
top_olive_oil_origin = olive['origins_tags'].value_counts().index[0]
top_sourdough_origin = sour['origins_tags'].value_counts().index[0]

print(top_avocado_origin, top_olive_oil_origin, top_sourdough_origin)
```

```
peru greece united kingdom
```
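
`value_counts().index[0]` has two limitations here. It returns a single label even when several origins share the top count. It also counts a multi-country entry such as `spain,en:peru` as one distinct value. The task asks for the most common country (or countries), so the sketch below splits multi-country entries and returns every origin tied for first place. `top_origins` is a hypothetical helper, and the sample series is made up rather than taken from the data.

```python
import pandas as pd


def top_origins(origins: pd.Series) -> list[str]:
    """Return every country tied for the highest count after splitting multi-country entries."""
    countries = (
        origins.dropna()
        .str.split(',')
        .explode()
        .str.strip()
        .str.replace(r'^[a-z]{2}:', '', regex=True)  # drop any remaining "en:"-style prefixes
    )
    counts = countries.value_counts()
    return sorted(counts[counts == counts.max()].index)


sample = pd.Series(['peru', 'peru', 'spain,en:peru', 'chile,en:greece', 'greece', 'greece'])
print(top_origins(sample))  # ['greece', 'peru']
```

On the notebook's data this would be called as, for example, `top_origins(olive['origins_tags'])`. Because it splits multi-country entries, its counts can differ from the single-label `value_counts()` approach above.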
