This notebook is used for previewing and processing the raw data.

In [36]:
import pandas as pd
import numpy as np

In [37]:
# Loading food waste database: Joakim_Arvidsson_Food Waste, updated 2023
# Source: https://www.kaggle.com/datasets/joebeachcapital/food-waste

# Load the dataframe
surplus_raw_df = pd.read_csv('raw_data/Joakim_Arvidsson_Food Waste data and research - by country.csv')

# Some column headers in this CSV are padded with trailing spaces, so strip them
surplus_raw_df.columns = surplus_raw_df.columns.str.strip()

# Combine the household and retail food waste estimates into one total
surplus_raw_df['Total Food Waste (tonnes/year)'] = surplus_raw_df['Household estimate (tonnes/year)'] + surplus_raw_df['Retail estimate (tonnes/year)']

# Keep only the columns we need
surplus_filtered_df = surplus_raw_df.filter(items=['Country', 'Total Food Waste (tonnes/year)'])

surplus_filtered_df.head()

|   | Country | Total Food Waste (tonnes/year) |
|---|---|---|
| 0 | Afghanistan | 3704135 |
| 1 | Albania | 283550 |
| 2 | Algeria | 4591889 |
| 3 | Andorra | 7485 |
| 4 | Angola | 3667278 |


### Problem: Even the poorest countries have lots of food waste according to this database
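For example, Afghanistan is listed with 3,704,135 tonnes of food waste per year. To see where food is actually scarce, let's load a food scarcity database.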

In [45]:
# Loading food scarcity database: Maryam Sikander_Zero_Hunger
# https://www.kaggle.com/datasets/maryamsikander/sdg-2-zero-hunger

# Load the dataframe
scarcity_raw_df = pd.read_excel('raw_data/Maryam Sikander_Zero_Hunger.xlsx', sheet_name="Prevalence-of-food-insecurity")

# The data covers 2015 to 2020. Since our other dataset is recent, keep only the most recent year (2020)
scarcity_2020_df = scarcity_raw_df[scarcity_raw_df["Year"] == 2020]

# Year is now always 2020, so the column can be dropped
scarcity_2020_df = scarcity_2020_df.drop(columns=["Year"])
scarcity_2020_df.head()

|   | Entity | Code | Prevalence of moderate or severe food insecurity in the total population (%age) |
|---|---|---|---|
| 5 | Afghanistan | AFG | 70.0 |
| 11 | Africa (FAO) |  | 55.5 |
| 17 | Albania | ALB | 30.9 |
| 23 | Algeria | DZA | 19.0 |
| 27 | Angola | AGO | 77.7 |


The food scarcity database gives, for each entity, the percentage of the total population experiencing moderate or severe food insecurity.

It also includes aggregate regions such as Northern Europe and Africa (FAO). These are easy to identify because their "Code" column is empty.
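
A minimal sketch of separating countries from regions on that basis. It uses a small hand-made frame whose values are copied from the preview above, assuming that empty "Code" cells are read in as NaN (pandas' default for blank cells); `scarcity_sample` and `countries_only` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Small stand-in for scarcity_2020_df, values copied from the preview above
scarcity_sample = pd.DataFrame({
    "Entity": ["Afghanistan", "Africa (FAO)", "Albania"],
    "Code": ["AFG", np.nan, "ALB"],
    "Prevalence of moderate or severe food insecurity in the total population (%age)": [70.0, 55.5, 30.9],
})

# Regions have no country code, so keeping rows with a non-missing Code keeps only countries
countries_only = scarcity_sample[scarcity_sample["Code"].notna()]
print(countries_only["Entity"].tolist())  # ['Afghanistan', 'Albania']
```

The notebook reaches the same result later by dropping rows whose population lookup fails, which also removes any country code missing from the population dataset.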

We can combine this data with a population database to estimate the number of people experiencing food scarcity per country in 2020: 
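
The estimate is simply prevalence / 100 × population. As a worked check using figures that appear in this notebook's previews, Afghanistan's 70.0% of 38,972,230 people gives 27,280,561. A minimal sketch of the arithmetic (the helper `people_in_scarcity` is a hypothetical name):

```python
# Estimated people in food scarcity = prevalence (%) / 100 * population
# Values copied from the previews in this notebook (2020 data)
examples = {
    # country: (prevalence %, 2020 population)
    "Afghanistan": (70.0, 38972230),
    "Albania": (30.9, 2866849),
    "Algeria": (19.0, 43451666),
}

def people_in_scarcity(prevalence_pct, population):
    """Convert a prevalence percentage into an estimated head count, rounded to a whole person."""
    return round(prevalence_pct / 100 * population)

for country, (pct, pop) in examples.items():
    print(country, people_in_scarcity(pct, pop))
# Afghanistan 27280561
# Albania 885856
# Algeria 8255817
```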

In [43]:
# Loading population statistics database: 
# https://www.kaggle.com/datasets/iamsouravbanerjee/world-population-dataset

population_raw_df = pd.read_csv('raw_data/Sourav Banerjee_world_population.csv')
population_filtered_df = population_raw_df.filter(items=["CCA3", "2020 Population"])

population_filtered_df.head()

|   | CCA3 | 2020 Population |
|---|---|---|
| 0 | AFG | 38972230 |
| 1 | ALB | 2866849 |
| 2 | DZA | 43451666 |
| 3 | ASM | 46189 |
| 4 | AND | 77700 |


Now we can join the two tables on the three-letter country code (the scarcity table's "Code" column against the population table's "CCA3" column) and estimate the number of people experiencing food scarcity in each country.
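
To see what the join in the next cell does, and which rows it will drop, here is a minimal sketch on a few rows copied from the previews above. It reproduces the notebook's `join(..., on='Code')` and adds a `merge` with `indicator=True` (not used in the notebook) to list unmatched rows before they are dropped. The frame names and the shortened "Prevalence (%)" column are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Small stand-ins for scarcity_2020_df and population_filtered_df,
# values copied from the previews above (prevalence column name shortened)
scarcity = pd.DataFrame({
    "Entity": ["Afghanistan", "Africa (FAO)", "Albania"],
    "Code": ["AFG", np.nan, "ALB"],
    "Prevalence (%)": [70.0, 55.5, 30.9],
})
population = pd.DataFrame({
    "CCA3": ["AFG", "ALB", "DZA"],
    "2020 Population": [38972230, 2866849, 43451666],
})

# Diagnostic: indicator=True adds a "_merge" column ("both", "left_only" or "right_only")
check = scarcity.merge(population, how="left", left_on="Code",
                       right_on="CCA3", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "Entity"].tolist()
print(unmatched)  # ['Africa (FAO)']

# Same left join as the notebook: look each Code up in the CCA3 index,
# then drop rows that found no population figure
joined = scarcity.join(population.set_index("CCA3"), on="Code")
joined = joined.dropna(subset=["2020 Population"])
print(joined["Entity"].tolist())  # ['Afghanistan', 'Albania']
```

Because missing matches become NaN, the population column turns into floats, which is why the notebook's later results print with a trailing `.0`.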

In [66]:
# Add the 2020 Population statistics aligned by country code
scarcity_population_df = scarcity_2020_df.join(population_filtered_df.set_index('CCA3'), on='Code')

# Drop rows with no 2020 population: regions (which have no Code) and any codes missing from the population dataset
scarcity_population_df = scarcity_population_df.dropna(subset=['2020 Population'])

# Create a new column with our estimated number of people experiencing food scarcity
scarcity_population_df['People experiencing scarcity'] = scarcity_population_df['Prevalence of moderate or severe food insecurity in the total population (%age) '] / 100 * scarcity_population_df['2020 Population']

# Round to the nearest person
scarcity_population_df = scarcity_population_df.round(0)

# Keep just the country and the scarcity estimate
filtered_scarcity_df = scarcity_population_df.filter(items=["Entity", "People experiencing scarcity"])

filtered_scarcity_df

|   | Entity | People experiencing scarcity |
|---|---|---|
| 5 | Afghanistan | 27280561.0 |
| 17 | Albania | 885856.0 |
| 23 | Algeria | 8255817.0 |
| 27 | Angola | 25973933.0 |
| 28 | Antigua and Barbuda | 30579.0 |
| ... | ... | ... |
| 900 | Uzbekistan | 7878764.0 |
| 903 | Vanuatu | 72623.0 |
| 907 | Vietnam | 7345300.0 |
| 937 | Zambia | 13154762.0 |


In [67]:
surplus_filtered_df

|   | Country | Total Food Waste (tonnes/year) |
|---|---|---|
| 0 | Afghanistan | 3704135 |
| 1 | Albania | 283550 |
| 2 | Algeria | 4591889 |
| 3 | Andorra | 7485 |
| 4 | Angola | 3667278 |
| ... | ... | ... |
| 209 | Venezuela (Boliv. Rep. of) | 2511455 |
| 210 | Viet Nam | 8855406 |
| 211 | Yemen | 3483045 |
| 212 | Zambia | 1671079 |
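
The surplus table is keyed by country name, and its spellings do not always match the scarcity table: the previews show "Viet Nam" in `surplus_filtered_df` but "Vietnam" in `filtered_scarcity_df`. A join on names would silently miss such rows. A minimal sketch of one way to find and fix them, assuming a hand-maintained mapping; `NAME_FIXES` and `unmatched_names` are hypothetical, and the mapping only covers the one mismatch visible above.

```python
import pandas as pd

# Small stand-ins built from the previews above
surplus = pd.DataFrame({
    "Country": ["Afghanistan", "Viet Nam", "Zambia"],
    "Total Food Waste (tonnes/year)": [3704135, 8855406, 1671079],
})
scarcity = pd.DataFrame({
    "Entity": ["Afghanistan", "Vietnam", "Zambia"],
    "People experiencing scarcity": [27280561.0, 7345300.0, 13154762.0],
})

# Hypothetical mapping from surplus-table spellings to scarcity-table spellings
NAME_FIXES = {"Viet Nam": "Vietnam"}

def unmatched_names(surplus_df, scarcity_df):
    """Return entity names that appear in only one of the two tables."""
    merged = surplus_df.merge(scarcity_df, on="Entity", how="outer", indicator=True)
    return sorted(merged.loc[merged["_merge"] != "both", "Entity"])

# Without a mapping, the two spellings of Vietnam fail to match
print(unmatched_names(surplus.assign(Entity=surplus["Country"]), scarcity))
# ['Viet Nam', 'Vietnam']

# Apply the mapping, then check again
surplus["Entity"] = surplus["Country"].replace(NAME_FIXES)
print(unmatched_names(surplus, scarcity))  # []
```

On the full tables, running `unmatched_names` before any mapping gives the list of spellings that still need an entry.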
