In this project, you'll conduct a supply chain analysis of three ingredients used in avocado toast (avocado, olive oil, and sourdough bread) using the Open Food Facts database. This database contains extensive, openly sourced information on foods, including their countries of origin. For each ingredient, you'll identify the most common country of origin among products sold in the United Kingdom, and in doing so gain a clearer picture of how complex the supply chain behind a single dish can be.

In [2]:
import pandas as pd

In [1]:
# General function: read and filter the data for one ingredient,
# and return the most common origin country for that food item
def read_and_filter_data(filename, relevant_categories):
  df = pd.read_csv(filename, sep='\t')

  # Keep only the columns needed for the analysis; .copy() avoids pandas'
  # SettingWithCopyWarning when a new column is added below
  subset_columns = [ 'code', 'lc', 'product_name_en', 'quantity', 'serving_size', 'packaging_tags', 'brands', 'brands_tags', 'categories_tags', 'labels_tags', 'countries', 'countries_tags', 'origins','origins_tags']
  df = df[subset_columns].copy()

  # Split tags into lists
  df['categories_list'] = df['categories_tags'].str.split(',')

  # Drop rows with no category data
  df = df.dropna(subset=['categories_list'])

  # Keep rows tagged with at least one relevant category
  relevant = set(relevant_categories)
  df = df[df['categories_list'].apply(lambda tags: any(tag in relevant for tag in tags))]

  # Filter data for the UK
  df_uk = df[(df['countries']=='United Kingdom')]

  # Find the origin tag string with the highest count
  top_origin_string = df_uk['origins_tags'].value_counts().index[0]

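  # Clean up the tag: drop the "en:" language prefix and turn hyphens into spaces.
  # str.lstrip("en:") would strip any leading 'e', 'n' or ':' characters
  # (e.g. "en:netherlands" -> "therlands"), so removeprefix (Python 3.9+) is used
  top_origin_country = top_origin_string.removeprefix("en:")
  top_origin_country = top_origin_country.replace('-', ' ')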

  print(f'**{filename[:-4]} origins**','\n', top_origin_country, '\n')

  print("Top origin country: ", top_origin_country)
  print("\n")

  # Return the top origin country for this ingredient
  return top_origin_country
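Open Food Facts tag columns such as `origins_tags` store values with a language prefix, for example `en:united-kingdom`. Cleaning such a tag with `str.lstrip("en:")` is risky, because `lstrip` removes any leading characters from the set {`e`, `n`, `:`} rather than an exact prefix. That happens to work for `peru`, `greece`, and `united kingdom`, but would mangle a country such as the Netherlands. A minimal sketch of a hypothetical helper, `clean_origin_tag`, that uses `str.removeprefix` (Python 3.9+) instead:

```python
# Hypothetical helper (not part of the original notebook): turn an Open Food
# Facts tag such as "en:united-kingdom" into a readable name.
def clean_origin_tag(tag: str) -> str:
    # removeprefix drops the exact "en:" prefix once; lstrip("en:") would keep
    # stripping any leading 'e', 'n' or ':' characters.
    return tag.removeprefix("en:").replace("-", " ")


print("en:netherlands".lstrip("en:"))         # therlands (the pitfall)
print(clean_origin_tag("en:netherlands"))     # netherlands
print(clean_origin_tag("en:united-kingdom"))  # united kingdom
```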

In [3]:
# Avocado supply chain
path_avocado = '/content/drive/MyDrive/dataset/avocado.csv'
# Load the relevant avocado categories (one per line); the with-block closes the file
with open('/content/relevant_avocado_categories.txt','r') as file:
  relevant_avocado = file.read().splitlines()

top_avocado = read_and_filter_data(path_avocado, relevant_avocado)

**/content/drive/MyDrive/dataset/avocado origins** 
 peru 

Top origin country:  peru
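To see what each filtering step in `read_and_filter_data` does without loading a full export, the same steps can be run on a small in-memory DataFrame. The rows and the `relevant` category set below are made-up illustrations, not Open Food Facts data:

```python
import pandas as pd

# Hypothetical toy rows standing in for an Open Food Facts export.
toy = pd.DataFrame({
    "categories_tags": ["en:avocados,en:fruits", "en:avocados", "en:breads", None, "en:avocados"],
    "countries": ["United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", "France"],
    "origins_tags": ["en:peru", "en:peru", "en:united-kingdom", "en:chile", "en:spain"],
})
relevant = {"en:avocados"}  # hypothetical list of relevant categories

# Split comma-separated tags into lists; the None row becomes NaN.
toy["categories_list"] = toy["categories_tags"].str.split(",")
# Drop the row with no categories.
toy = toy.dropna(subset=["categories_list"])
# Keep rows with at least one relevant category (drops the bread row).
toy = toy[toy["categories_list"].apply(lambda tags: any(t in relevant for t in tags))]
# Keep UK products only (drops the French row).
uk = toy[toy["countries"] == "United Kingdom"]

# The most frequent origin tag among the remaining rows.
top_tag = uk["origins_tags"].value_counts().index[0]
print(top_tag.removeprefix("en:").replace("-", " "))  # peru
```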




In [4]:
# Olive oil supply chain
path_olive = '/content/drive/MyDrive/dataset/olive_oil.csv'

# Load the relevant olive oil categories (one per line); the with-block closes the file
with open('/content/relevant_olive_oil_categories.txt','r') as file:
  relevant_olive = file.read().splitlines()

# Find the top origin country for olive oil
top_olive = read_and_filter_data(path_olive, relevant_olive)

**/content/drive/MyDrive/dataset/olive_oil origins** 
 greece 

Top origin country:  greece
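Reading a large Open Food Facts export in full can make pandas emit a `DtypeWarning` when a column holds mixed types. One option, sketched below under the assumption that the files keep the same tab-separated layout, is to pass `usecols` so only the needed columns are parsed and `dtype=str` so every column is read as text. The tiny sample is hypothetical:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for a real export.
sample = io.StringIO(
    "code\tcountries\torigins_tags\tunused\n"
    "001\tUnited Kingdom\ten:greece\tx\n"
    "002\tUnited Kingdom\ten:spain\ty\n"
)

# usecols skips unneeded columns at parse time; dtype=str stops pandas from
# inferring a type per column (so "001" stays a string rather than 1).
df = pd.read_csv(sample, sep="\t", usecols=["code", "countries", "origins_tags"], dtype=str)
print(df.shape)  # (2, 3)
```

In `read_and_filter_data`, the same idea would mean passing `usecols=subset_columns` to `pd.read_csv`; whether `dtype=str` suits every column depends on how the data is used afterwards.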




In [5]:
# Sourdough supply chain
path_sour = '/content/drive/MyDrive/dataset/sourdough.csv'

# Load the relevant sourdough categories (one per line); the with-block closes the file
with open('/content/relevant_sourdough_categories.txt','r') as file:
  relevant_sour = file.read().splitlines()

top_sour = read_and_filter_data(path_sour, relevant_sour)

**/content/drive/MyDrive/dataset/sourdough origins** 
 united kingdom 

Top origin country:  united kingdom
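The three cells above repeat the same pattern: open a category file, split it into lines, and pass the result to `read_and_filter_data`. A minimal sketch of a hypothetical helper, `load_categories`, that factors this out. It also drops blank lines and surrounding whitespace, and returns a set, which still works with the `in` checks in `read_and_filter_data` while making each lookup constant-time on average:

```python
import os
import tempfile


# Hypothetical helper (not part of the original notebook).
def load_categories(path):
    # The with-block closes the file automatically; no close() call is needed.
    with open(path, "r", encoding="utf-8") as f:
        # Strip whitespace and skip blank lines so they never match a tag.
        return {line.strip() for line in f if line.strip()}


# Usage with a throwaway file; the real category files live under /content/.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("en:avocados\n\nen:fresh-avocados  \n")
    tmp_path = tmp.name

categories = load_categories(tmp_path)
os.remove(tmp_path)
print(sorted(categories))  # ['en:avocados', 'en:fresh-avocados']
```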


