<a href="https://colab.research.google.com/github/dindararas/Data-Science-Portfolio/blob/main/Global_Food_Production_and_Population.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. INTRODUCTION**

## **1.1 Problem Identification**

Over the last 50 years, the world population has doubled to 7.5 billion people. According to the Food and Agriculture Organization (FAO), the world will need **60% more food** to feed a population of 9.3 billion **by 2050**.

To meet food demand, crop production doubled between 1960 and 2000. In fact, the world already produces more than enough food. However, **829 million people** still experience hunger.



## **1.2 Objectives**

This project has several objectives:
1. Give an overview of changes in food supply and food production
2. Look into the relationship between population growth and hunger
3. Find out why hunger still happens


# **2. IMPORTING LIBRARY AND DATA**

## **2.1 Importing Library**

In [None]:
!pip install raceplotly



In [None]:
!pip install pycountry-convert


In [None]:
# Standard libraries
import pandas as pd
import numpy as np

# Visualization libraries
import plotly
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pycountry_convert as pc
from raceplotly.plots import barplot
import plotly.offline as iplo

# Useful libraries
import warnings
warnings.filterwarnings('ignore')

# Settings: maximum number of rows/columns to display
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

## **2.2 Importing Data**

In [None]:
# population dataset
population = pd.read_csv('/content/population_total_long.csv')

# food supply dataset
food_supply = pd.read_csv('/content/food-supply-per-person-per-day-calories-1961–2013.csv')

# global hunger index 
ghi = pd.read_csv('/content/global-hunger-index-scale-1-100-1992–2016.csv')

# food production dataset
food_prod = pd.read_csv('/content/food-production-index-relative-to-2004-2006-1961–2020.csv')

# **3. DATA CLEANING AND DATA TRANSFORMATION**

## **3.1 Food Supply Dataset**

In [None]:
# Check number of rows and columns in dataset
print(f'Shape of dataframe : {food_supply.shape[0]} rows and {food_supply.shape[1]} columns' )

Shape of dataframe : 178 rows and 54 columns


In [None]:
# Check the first 5 rows of the dataset
food_supply.head()

Unnamed: 0,Country,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013
0,Afghanistan,2999.0,2917.0,2698.0,2953.0,2956.0,2737.0,2971.0,2918.0,2935.0,2534.0,2512.0,2658.0,2721.0,2713.0,2752.0,2824.0,2489.0,2621.0,2621.0,2478.0,2484.0,2480.0,2524.0,2452.0,2403.0,2397.0,2727.0,2518.0,2462.0,2314.0,2044.0,1891.0,1910.0,1820.0,1844.0,1843.0,1874.0,1903.0,1852.0,1790.0,1737.0,1826.0,1892.0,1967.0,1948.0,1966.0,2046.0,2041.0,2081.0,2104.0,2107.0,2100.0,2090.0
1,Albania,2223.0,2242.0,2156.0,2270.0,2254.0,2254.0,2262.0,2343.0,2404.0,2415.0,2360.0,2388.0,2432.0,2494.0,2494.0,2680.0,2776.0,2689.0,2607.0,2596.0,2676.0,2664.0,2798.0,2721.0,2565.0,2690.0,2497.0,2594.0,2569.0,2568.0,2572.0,2654.0,2795.0,2877.0,2717.0,2843.0,2725.0,2725.0,2797.0,2734.0,2803.0,2864.0,2772.0,2792.0,2874.0,2855.0,2860.0,2947.0,2993.0,3076.0,3132.0,3184.0,3193.0
2,Algeria,1619.0,1569.0,1528.0,1540.0,1591.0,1571.0,1647.0,1706.0,1705.0,1675.0,1720.0,1849.0,1851.0,1984.0,2058.0,2047.0,2209.0,2344.0,2445.0,2566.0,2597.0,2570.0,2553.0,2500.0,2613.0,2627.0,2631.0,2696.0,2760.0,2754.0,2733.0,2865.0,2865.0,2763.0,2785.0,2784.0,2733.0,2792.0,2843.0,2812.0,2886.0,2925.0,2970.0,2987.0,2958.0,3047.0,3041.0,3048.0,3110.0,3142.0,3217.0,3272.0,3296.0
3,Angola,1798.0,1819.0,1853.0,1862.0,1877.0,1890.0,1921.0,1856.0,1946.0,1965.0,2002.0,1958.0,1915.0,1876.0,1858.0,1887.0,1952.0,1991.0,1966.0,1967.0,1886.0,1758.0,1729.0,1673.0,1687.0,1644.0,1632.0,1605.0,1586.0,1641.0,1611.0,1625.0,1564.0,1644.0,1687.0,1731.0,1732.0,1752.0,1805.0,1792.0,1833.0,1915.0,1983.0,2030.0,2077.0,2119.0,2173.0,2245.0,2303.0,2345.0,2407.0,2384.0,2473.0
4,Antigua and Barbuda,2008.0,2185.0,2052.0,2126.0,2222.0,2172.0,1959.0,2007.0,2055.0,2240.0,2165.0,1957.0,1863.0,1799.0,1857.0,1791.0,1711.0,1650.0,1745.0,1971.0,2045.0,1951.0,2023.0,2248.0,2191.0,2214.0,2266.0,2364.0,2489.0,2467.0,2532.0,2545.0,2376.0,2255.0,2188.0,2190.0,2163.0,2154.0,2158.0,2150.0,2085.0,2069.0,2071.0,2107.0,2331.0,2328.0,2411.0,2380.0,2322.0,2316.0,2369.0,2293.0,2417.0


### **3.1.1 Data Cleaning**

#### **Checking and removing duplicates**

In [None]:
# Checking duplicates in dataset
print(f'Number of duplicated data : {food_supply.duplicated().sum()}')

Number of duplicated data : 0


#### **Checking and treating missing values**

In [None]:
food_supply_info= pd.DataFrame({'Dtype': food_supply.dtypes, 'Missing values(%)': round(food_supply.isnull().sum()/food_supply.shape[0]*100, 2)}).rename_axis('Columns', axis='rows')                       

food_supply_info

| Columns | Dtype | Missing values (%) |
|---|---|---|
| Country | object | 0.0 |
| 1961 | float64 | 14.61 |
| 1962 | float64 | 14.61 |
| 1963 | float64 | 14.61 |
| 1964 | float64 | 14.61 |
| 1965 | float64 | 14.61 |
| 1966 | float64 | 14.61 |
| 1967 | float64 | 14.61 |
| 1968 | float64 | 14.61 |
| 1969 | float64 | 14.61 |


In [None]:
# Count the rows containing at least one missing value
null_rows = food_supply.loc[food_supply.isnull().any(axis=1)]

print(f'Number of rows with missing values : {null_rows.shape[0]}')
print(f'Percentage of rows with missing values :  {(null_rows.shape[0]/food_supply.shape[0]) *100}%')

Number of rows with missing values : 33
Percentage of rows with missing values :  18.53932584269663%


The years from 1961 to 1991 have the highest percentage of missing values (14.61%). Rows containing at least one missing value make up **18.54%** of the data, which is too large a share to simply remove.
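The missing-value check above is repeated for every dataset in this notebook, so it could be wrapped in a small helper. The sketch below is an illustration (the function name `missing_summary` and the toy data are hypothetical): it returns the per-column missing percentage plus the number and share of rows with at least one NaN. For the real food supply data, 33 incomplete rows out of 178 give 33 / 178 × 100 ≈ 18.54%.

```python
import numpy as np
import pandas as pd


def missing_summary(df, id_cols=('Country',)):
    """Return per-column missing percentages, the number of incomplete rows, and their share (%)."""
    value_cols = [c for c in df.columns if c not in id_cols]
    missing = df[value_cols].isna()
    per_column = (missing.mean() * 100).round(2)
    rows_with_nan = missing.any(axis=1)
    n_rows = int(rows_with_nan.sum())
    share_rows = round(n_rows / len(df) * 100, 2)
    return per_column, n_rows, share_rows


# Hypothetical toy data: 4 countries, 3 years
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    '1961': [np.nan, 2000.0, np.nan, 2500.0],
    '1962': [np.nan, 2100.0, 2200.0, 2600.0],
    '2013': [3000.0, 2900.0, 2800.0, 3100.0],
})

per_column, n_rows, share_rows = missing_summary(toy)
print(per_column)
print(n_rows, share_rows)  # 2 50.0
```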

Instead of removing every row with at least one missing value, I will remove rows that contain **negative** values and rows that are **NaN** throughout the entire period, fill the remaining missing values with 0, and convert the values from float64 to int64.

In [None]:
# Select the year columns (1961 to 2013)
year_cols = food_supply.loc[:, '1961':].columns

# Remove all the rows containing at least one negative value
drop_rows = food_supply.loc[(food_supply[year_cols] < 0).any(axis=1)].index
food_supply.drop(drop_rows, inplace=True)

# Remove the rows that are NaN in every year from 1961 to 2013
# ('Country' is never missing, so the check is restricted to the year columns)
food_supply.dropna(how='all', subset=year_cols, inplace=True)

# Fill the remaining missing values with 0
food_supply.fillna(0, inplace=True)

# Convert the year columns from float64 to int64
food_supply[year_cols] = food_supply[year_cols].astype('int64')
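A toy example shows two pandas details that matter for this cleaning step. First, `dropna` returns a new dataframe, so it only has an effect when the result is assigned or `inplace=True` is passed; and `how='all'` over every column never fires here because `Country` is never missing, so the check needs `subset=` restricted to the year columns. Second, `.any(axis=1)` drops a row with at least one negative value, while `.all(axis=1)` would only drop rows that are negative in every year. The data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the same wide layout as food_supply
df = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    '1961': [2000.0, np.nan, -1.0],
    '1962': [2100.0, np.nan, 2300.0],
})
year_cols = ['1961', '1962']

# dropna returns a new frame; without assignment (or inplace=True) df keeps all 3 rows
df.dropna(how='all')
print(len(df))  # 3

# how='all' over every column never fires, because 'Country' is never missing
print(len(df.dropna(how='all')))  # 3

# Restricting the check to the year columns drops country B
cleaned = df.dropna(how='all', subset=year_cols)
print(cleaned['Country'].tolist())  # ['A', 'C']

# 'any' flags a row with at least one negative value; 'all' only flags rows negative in every year
any_negative = (cleaned[year_cols] < 0).any(axis=1)
all_negative = (cleaned[year_cols] < 0).all(axis=1)
print(any_negative.tolist(), all_negative.tolist())  # [False, True] [False, False]
```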

#### **Qualitative variables**

We will check for duplicated country names. If the same country appears under different names, we will standardize them.

In [None]:
# Check country names
food_supply.Country.unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bolivia',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei',
       'Bulgaria', 'Burkina Faso', 'Cape Verde', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'Hong Kong',
       'Macau', 'China', 'Taiwan', 'Colombia', 'Costa Rica',
       'Ivory Coast', 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic',
       'Czechoslovakia', 'North Korea', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Estonia',
       'Ethiopia', 'Fiji', 'Finland', 'France', 'French Polynesia',
       'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece',
       'Grenada', 'Guatemala', 'Guinea', 'Guinea Bissau', 'Guyana',
       'Haiti', 'Honduras', 'Hungary', 'Icel

I will not modify any country names in this dataset.
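Names only need to agree across datasets when they are combined, and the spellings printed in this notebook already differ: the food supply data uses 'Bahamas', 'Egypt', and 'Ivory Coast', while the population data uses 'Bahamas, The', 'Egypt, Arab Rep.', and "Cote d'Ivoire". A minimal sketch of a check and fix, using a few of those printed names; the mapping dictionary is a hypothetical illustration, not a complete list.

```python
import pandas as pd

# A handful of spellings taken from the arrays printed in this notebook
food_names = pd.Series(['Bahamas', 'Egypt', 'Ivory Coast', 'Brunei', 'Cape Verde', 'Chile'])
pop_names = pd.Series(['Bahamas, The', 'Egypt, Arab Rep.', "Cote d'Ivoire",
                       'Brunei Darussalam', 'Cabo Verde', 'Chile'])

# Names present in the population data but missing from the food supply data
print(sorted(set(pop_names) - set(food_names)))

# Hypothetical mapping from population-dataset spellings to food-supply spellings
to_food_names = {
    'Bahamas, The': 'Bahamas',
    'Egypt, Arab Rep.': 'Egypt',
    "Cote d'Ivoire": 'Ivory Coast',
    'Brunei Darussalam': 'Brunei',
    'Cabo Verde': 'Cape Verde',
}
harmonized = pop_names.replace(to_food_names)
print(set(harmonized) - set(food_names))  # set()
```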

#### **Dataframe Melt**

This dataset stores the food supply for each year in a separate column. I unpivot the dataframe from wide to long format to make data visualization easier.

In [None]:
food_supply = food_supply.melt(id_vars=['Country'], 
    value_vars=[str(n) for n in range(1961, 2013+1)], 
    var_name="Years", 
    value_name="Food Supply (cal/person/day)")
food_supply.head()

| | Country | Years | Food Supply (cal/person/day) |
|---|---|---|---|
| 0 | Afghanistan | 1961 | 2999 |
| 1 | Albania | 1961 | 2223 |
| 2 | Algeria | 1961 | 1619 |
| 3 | Angola | 1961 | 1798 |
| 4 | Antigua and Barbuda | 1961 | 2008 |
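A minimal sketch of what the melt does, using the first two years of the Afghanistan and Albania rows shown earlier: each (country, year) pair becomes its own row, and the former column names end up as values in `Years`.

```python
import pandas as pd

# Two countries and two years from the food supply head() output above
wide = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania'],
    '1961': [2999, 2223],
    '1962': [2917, 2242],
})

long = wide.melt(id_vars=['Country'], value_vars=['1961', '1962'],
                 var_name='Years', value_name='Food Supply (cal/person/day)')
print(long)
```

Note that `Years` holds strings such as '1961' (they come from column names), so later filters must compare with '2013' rather than the integer 2013.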


## **3.2 Food Production Index Dataset**

We apply the same steps to this dataset.

In [None]:
# Check number of rows and columns in dataset
print(f'Shape of dataframe : {food_prod.shape[0]} rows and {food_prod.shape[1]} columns' )

Shape of dataframe : 202 rows and 59 columns


In [None]:
# Check the first 5 rows of the dataset
food_prod.head(5)

Unnamed: 0,Country,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,41.39,41.71,41.53,45.03,47.14,48.88,53.55,55.92,56.63,51.84,48.61,51.87,55.75,57.43,59.06,63.74,57.22,61.53,60.79,61.14,60.99,60.43,58.86,56.45,52.66,48.21,49.94,49.36,49.13,52.9,60.72,58.85,62.12,65.68,68.6,72.2,77.29,81.52,80.59,68.54,62.28,75.57,77.65,75.6,84.49,78.25,85.68,80.55,93.36,92.79,88.45,96.99,94.08,98.2,97.49,104.32,98.69,96.52
1,Albania,21.35,21.94,20.72,23.37,22.24,25.48,27.3,28.02,28.35,29.1,30.18,30.69,32.87,33.83,34.43,38.1,39.81,41.54,42.28,41.45,43.09,44.48,48.45,44.96,44.34,44.06,45.59,44.32,49.41,48.68,41.35,45.6,51.52,57.28,61.6,62.09,58.81,61.38,60.76,64.01,65.89,66.38,69.58,72.89,71.65,74.42,75.45,77.74,80.52,85.65,89.91,95.07,95.65,96.85,100.03,103.11,104.13,106.13
2,Algeria,18.6,19.37,19.36,17.43,20.05,13.6,14.99,19.17,17.81,18.3,19.06,17.45,16.62,17.86,19.54,18.35,15.35,15.48,17.41,18.88,19.24,17.74,18.54,19.91,24.48,24.62,24.85,23.7,26.36,25.77,35.21,37.39,37.18,33.68,37.83,45.05,36.37,40.98,44.28,41.85,45.04,47.79,55.58,60.7,65.05,72.12,62.13,67.63,81.14,88.39,98.01,101.92,112.08,108.8,105.06,86.13,86.79,89.97
3,American Samoa,57.65,58.43,64.55,65.47,67.15,68.45,64.68,63.94,66.94,63.1,58.91,58.62,58.22,56.09,55.21,53.97,53.02,49.54,42.73,48.56,46.4,46.15,45.13,43.35,42.16,43.13,40.17,40.08,40.02,43.56,40.75,43.19,47.37,50.38,52.92,56.98,59.18,73.91,59.49,66.06,75.09,78.76,88.07,86.22,88.44,90.14,91.87,93.51,95.84,97.52,95.94,92.68,99.58,100.7,99.46,99.84,100.51,101.42
4,Angola,14.74,15.27,15.62,16.34,16.88,17.13,17.6,17.64,18.6,19.51,19.62,19.13,20.06,19.44,19.26,19.0,18.55,18.61,18.06,18.3,17.94,18.21,18.52,18.78,19.1,19.8,20.05,19.96,19.96,20.05,20.95,22.41,22.24,25.04,24.19,26.0,25.93,30.13,28.95,33.81,38.9,44.06,47.52,51.31,55.94,56.92,64.55,68.71,85.89,90.87,99.25,81.63,111.42,97.02,99.96,103.02,103.67,99.02


### **3.2.1 Data Cleaning**

#### **Removing unnecessary columns**

This dataset contains the net food production index from 1961 to 2018. Because the food supply data only runs until 2013, I'll remove columns '2014' to '2018'.
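Instead of listing the years to drop by hand, the columns to keep can be derived from a cut-off year. A minimal sketch, assuming every column other than `Country` is a year label; the toy frame reuses a few Afghanistan values from the head() output above.

```python
import pandas as pd

# Toy frame with the same column layout as food_prod (Afghanistan values from head())
food_prod = pd.DataFrame({'Country': ['Afghanistan'], '2012': [96.99], '2013': [94.08],
                          '2014': [98.2], '2018': [96.52]})

# Keep 'Country' plus every year column up to and including the last food supply year
last_year = 2013
keep = ['Country'] + [c for c in food_prod.columns if c != 'Country' and int(c) <= last_year]
trimmed = food_prod[keep]
print(trimmed.columns.tolist())  # ['Country', '2012', '2013']
```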

In [None]:
food_prod.drop(columns = ['2014', '2015', '2016', '2017', '2018'], inplace = True)

In [None]:
food_prod.head()

Unnamed: 0,Country,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013
0,Afghanistan,41.39,41.71,41.53,45.03,47.14,48.88,53.55,55.92,56.63,51.84,48.61,51.87,55.75,57.43,59.06,63.74,57.22,61.53,60.79,61.14,60.99,60.43,58.86,56.45,52.66,48.21,49.94,49.36,49.13,52.9,60.72,58.85,62.12,65.68,68.6,72.2,77.29,81.52,80.59,68.54,62.28,75.57,77.65,75.6,84.49,78.25,85.68,80.55,93.36,92.79,88.45,96.99,94.08
1,Albania,21.35,21.94,20.72,23.37,22.24,25.48,27.3,28.02,28.35,29.1,30.18,30.69,32.87,33.83,34.43,38.1,39.81,41.54,42.28,41.45,43.09,44.48,48.45,44.96,44.34,44.06,45.59,44.32,49.41,48.68,41.35,45.6,51.52,57.28,61.6,62.09,58.81,61.38,60.76,64.01,65.89,66.38,69.58,72.89,71.65,74.42,75.45,77.74,80.52,85.65,89.91,95.07,95.65
2,Algeria,18.6,19.37,19.36,17.43,20.05,13.6,14.99,19.17,17.81,18.3,19.06,17.45,16.62,17.86,19.54,18.35,15.35,15.48,17.41,18.88,19.24,17.74,18.54,19.91,24.48,24.62,24.85,23.7,26.36,25.77,35.21,37.39,37.18,33.68,37.83,45.05,36.37,40.98,44.28,41.85,45.04,47.79,55.58,60.7,65.05,72.12,62.13,67.63,81.14,88.39,98.01,101.92,112.08
3,American Samoa,57.65,58.43,64.55,65.47,67.15,68.45,64.68,63.94,66.94,63.1,58.91,58.62,58.22,56.09,55.21,53.97,53.02,49.54,42.73,48.56,46.4,46.15,45.13,43.35,42.16,43.13,40.17,40.08,40.02,43.56,40.75,43.19,47.37,50.38,52.92,56.98,59.18,73.91,59.49,66.06,75.09,78.76,88.07,86.22,88.44,90.14,91.87,93.51,95.84,97.52,95.94,92.68,99.58
4,Angola,14.74,15.27,15.62,16.34,16.88,17.13,17.6,17.64,18.6,19.51,19.62,19.13,20.06,19.44,19.26,19.0,18.55,18.61,18.06,18.3,17.94,18.21,18.52,18.78,19.1,19.8,20.05,19.96,19.96,20.05,20.95,22.41,22.24,25.04,24.19,26.0,25.93,30.13,28.95,33.81,38.9,44.06,47.52,51.31,55.94,56.92,64.55,68.71,85.89,90.87,99.25,81.63,111.42


#### **Checking and removing duplicates**

In [None]:
# Checking duplicates in dataset
print(f'Number of duplicated data : {food_prod.duplicated().sum()}')

Number of duplicated data : 0


#### **Checking and treating missing values**

In [None]:
food_info= pd.DataFrame({'Dtype': food_prod.dtypes, 'Missing values(%)': round(food_prod.isnull().sum()/food_prod.shape[0]*100, 2)}).rename_axis('Columns', axis='rows')                       

food_info

| Columns | Dtype | Missing values (%) |
|---|---|---|
| Country | object | 0.0 |
| 1961 | float64 | 15.84 |
| 1962 | float64 | 15.84 |
| 1963 | float64 | 15.84 |
| 1964 | float64 | 15.84 |
| 1965 | float64 | 15.84 |
| 1966 | float64 | 15.35 |
| 1967 | float64 | 15.35 |
| 1968 | float64 | 15.35 |
| 1969 | float64 | 15.35 |


As in the food supply dataset, the highest percentages of missing values are in the earliest years.

In [None]:
# Select the year columns (1961 to 2013)
year_cols = food_prod.loc[:, '1961':].columns

# Remove all the rows containing at least one negative value
drop_rows = food_prod.loc[(food_prod[year_cols] < 0).any(axis=1)].index
food_prod.drop(drop_rows, inplace=True)

# Remove the rows that are NaN in every year from 1961 to 2013
# ('Country' is never missing, so the check is restricted to the year columns)
food_prod.dropna(how='all', subset=year_cols, inplace=True)

# Fill the remaining missing values with 0
food_prod.fillna(0, inplace=True)

# Convert the year columns from float64 to int64 (decimals are truncated)
food_prod[year_cols] = food_prod[year_cols].astype('int64')

#### **Qualitative variables**

In [None]:
food_prod['Country'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh',
       'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda',
       'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana',
       'Brazil', 'British Virgin Islands', 'Brunei', 'Bulgaria',
       'Burkina Faso', 'Burundi', 'Cape Verde', 'Cambodia', 'Cameroon',
       'Canada', 'Cayman Islands', 'Central African Republic', 'Chad',
       'Chile', 'China', 'Colombia', 'Comoros', 'Congo (Kinshasa)',
       'Congo (Brazzaville)', 'Costa Rica', 'Ivory Coast', 'Croatia',
       'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Swaziland', 'Ethiopia', 'Faroe Islands', 'Fiji', 'Finland',
       'France', 'French Polynesia', 'Gabon', 'Gam

There are no duplicated or misspelled names, so 'Country' is left unchanged.

#### **Dataframe melt**

In [None]:
food_prod = food_prod.melt(id_vars=['Country'], 
    value_vars=[str(n) for n in range(1961, 2013+1)], 
    var_name='Years', 
    value_name='Food Production Index')
food_prod.head()

| | Country | Years | Food Production Index |
|---|---|---|---|
| 0 | Afghanistan | 1961 | 41 |
| 1 | Albania | 1961 | 21 |
| 2 | Algeria | 1961 | 18 |
| 3 | American Samoa | 1961 | 57 |
| 4 | Angola | 1961 | 14 |


## **3.3 World Population**

In [None]:
# Check number of rows and columns in dataset
print(f'Shape of dataframe : {population.shape[0]} rows and {population.shape[1]} columns' )

Shape of dataframe : 12595 rows and 3 columns


In [None]:
# Check the first 5 rows
population.head(5)

| | Country Name | Year | Count |
|---|---|---|---|
| 0 | Aruba | 1960 | 54211 |
| 1 | Afghanistan | 1960 | 8996973 |
| 2 | Angola | 1960 | 5454933 |
| 3 | Albania | 1960 | 1608800 |
| 4 | Andorra | 1960 | 13411 |


In [None]:
# Change 'Count' column name to 'Population'
population.rename(columns = {'Count' : 'Population'}, inplace = True)

### **3.3.1 Data Cleaning**

#### **Checking and removing duplicates**

In [None]:
# Checking duplicates in dataset
print(f'Number of duplicated data : {population.duplicated().sum()}')

Number of duplicated data : 0


#### **Checking and treating missing values**

In [None]:
pop_info= pd.DataFrame({'Dtype': population.dtypes, 'Missing values(%)': round(population.isnull().sum()/population.shape[0]*100, 2)}).rename_axis('Columns', axis='rows')                       

pop_info

| Columns | Dtype | Missing values (%) |
|---|---|---|
| Country Name | object | 0.0 |
| Year | int64 | 0.0 |
| Population | int64 | 0.0 |


This dataset has no missing values, so we move on to checking qualitative variables.

#### **Qualitative variables**

In [None]:
population['Country Name'].unique()

array(['Aruba', 'Afghanistan', 'Angola', 'Albania', 'Andorra',
       'United Arab Emirates', 'Argentina', 'Armenia', 'American Samoa',
       'Antigua and Barbuda', 'Australia', 'Austria', 'Azerbaijan',
       'Burundi', 'Belgium', 'Benin', 'Burkina Faso', 'Bangladesh',
       'Bulgaria', 'Bahrain', 'Bahamas, The', 'Bosnia and Herzegovina',
       'Belarus', 'Belize', 'Bermuda', 'Bolivia', 'Brazil', 'Barbados',
       'Brunei Darussalam', 'Bhutan', 'Botswana',
       'Central African Republic', 'Canada', 'Switzerland',
       'Channel Islands', 'Chile', 'China', "Cote d'Ivoire", 'Cameroon',
       'Congo, Dem. Rep.', 'Congo, Rep.', 'Colombia', 'Comoros',
       'Cabo Verde', 'Costa Rica', 'Caribbean small states', 'Cuba',
       'Curacao', 'Cayman Islands', 'Cyprus', 'Czech Republic', 'Germany',
       'Djibouti', 'Dominica', 'Denmark', 'Dominican Republic', 'Algeria',
       'Ecuador', 'Egypt, Arab Rep.', 'Eritrea', 'Spain', 'Estonia',
       'Ethiopia', 'Finland', 'Fiji', 'France', 

## **3.4 Global Hunger Index**

In [None]:
# Check the first 5 rows
ghi.head(5)

Unnamed: 0,Country,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
0,Afghanistan,49.3,49.6875,50.075,50.4625,50.85,51.2375,51.625,52.0125,52.4,50.75,49.1,47.45,45.8,44.15,42.5,40.85,39.2,38.65,38.1,37.55,37.0,36.45,35.9,35.35,34.8
1,Albania,20.4,20.4875,20.575,20.6625,20.75,20.8375,20.925,21.0125,21.1,20.575,20.05,19.525,19.0,18.475,17.95,17.425,16.9,16.275,15.65,15.025,14.4,13.775,13.15,12.525,11.9
2,Algeria,16.8,16.55,16.3,16.05,15.8,15.55,15.3,15.05,14.8,14.3,13.8,13.3,12.8,12.3,11.8,11.3,10.8,10.5375,10.275,10.0125,9.75,9.4875,9.225,8.9625,8.7
3,Angola,65.9,64.8875,63.875,62.8625,61.85,60.8375,59.825,58.8125,57.8,55.6375,53.475,51.3125,49.15,46.9875,44.825,42.6625,40.5,39.5375,38.575,37.6125,36.65,35.6875,34.725,33.7625,32.8
4,Argentina,5.8,5.7375,5.675,5.6125,5.55,5.4875,5.425,5.3625,5.3,5.2625,5.225,5.1875,5.15,5.1125,5.075,5.0375,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0


In [None]:
# Drop 2014 to 2016 so that the GHI data ends in 2013, like the food supply data
ghi = ghi.drop(columns = ['2014', '2015', '2016'])

#### **Checking and removing duplicates**

In [None]:
# Checking duplicates in dataset
print(f'Number of duplicated data : {ghi.duplicated().sum()}')

Number of duplicated data : 0


#### **Checking and treating missing values**

In [None]:
ghi_info= pd.DataFrame({'Dtype': ghi.dtypes, 'Missing values(%)': round(ghi.isnull().sum()/ghi.shape[0]*100, 2)}).rename_axis('Columns', axis='rows')                       

ghi_info

| Columns | Dtype | Missing values (%) |
|---|---|---|
| Country | object | 0.0 |
| 1992 | float64 | 18.1 |
| 1993 | float64 | 18.1 |
| 1994 | float64 | 18.1 |
| 1995 | float64 | 18.1 |
| 1996 | float64 | 18.1 |
| 1997 | float64 | 18.1 |
| 1998 | float64 | 18.1 |
| 1999 | float64 | 18.1 |
| 2000 | float64 | 2.59 |


In [None]:
# Select the GHI year columns (1992 to 2013)
year_cols = ghi.loc[:, '1992':].columns

# Remove all the rows containing at least one negative value
drop_rows = ghi.loc[(ghi[year_cols] < 0).any(axis=1)].index
ghi.drop(drop_rows, inplace=True)

# Remove the rows that are NaN in every year from 1992 to 2013
# ('Country' is never missing, so the check is restricted to the year columns)
ghi.dropna(how='all', subset=year_cols, inplace=True)

# Fill the remaining missing values with 0
# (a missing GHI is then treated as 0, i.e. no hunger, in later averages)
ghi.fillna(0, inplace=True)

#### **Dataframe melt**

In [None]:
ghi = ghi.melt(id_vars=['Country'], 
    value_vars=[str(n) for n in range(1992, 2013+1)], 
    var_name='Years', 
    value_name='GHI')
ghi.head()

| | Country | Years | GHI |
|---|---|---|---|
| 0 | Afghanistan | 1992 | 49.3 |
| 1 | Albania | 1992 | 20.4 |
| 2 | Algeria | 1992 | 16.8 |
| 3 | Angola | 1992 | 65.9 |
| 4 | Argentina | 1992 | 5.8 |


# **4. DATA EXPLORATION AND VISUALIZATION**

## **4.1 Food Production Index and Food Supply**

#### **Food Supply**

In [None]:
# Function to draw an animated choropleth map with px.choropleth (one frame per value of 'Years')
def make_map(df, location, color, vmin, vmax, hover, title):
  # A numeric color column uses a continuous scale, so the palette goes in
  # color_continuous_scale (color_discrete_sequence would be ignored)
  fig = px.choropleth(df, locations=location, locationmode='country names', color=color,
                      color_continuous_scale=px.colors.sequential.Plasma, range_color=(vmin, vmax),
                      hover_name=hover, animation_frame='Years')

  fig.update_layout(title_text=title, title_x=0.5, geo=dict(showframe=False, showcoastlines=True))

  fig.show()

In [None]:
make_map(food_supply, 'Country', 'Food Supply (cal/person/day)', 0, 4000, 'Country', 'Food Supply Change' )

#### **Food Production Index**

In [None]:
make_map(food_prod, 'Country', 'Food Production Index', 0,130, 'Country', 'Food Production Index')

Based on these maps:
1. Overall, food supply and the food production index increased over the 52-year period.
2. **Developed countries** tend to have a **higher food supply** than **developing countries**, which suggests that a country's economic condition influences the food supply available to its people.
3. Food supply in developed countries was around 2,500 to 3,000 cal/person/day in 1961 and rose above 3,000 cal/person/day by 2013. In comparison, food supply in developing countries was below 2,000 cal/person/day in 1961 and increased by around 500 to 1,000 cal/person/day by 2013.
4. There are no food production index or food supply data for Russia, Ukraine, Estonia, Latvia, Turkmenistan, Azerbaijan, Kazakhstan, Kyrgyzstan, Belarus, Armenia, Lithuania, and Uzbekistan from 1961 to 1991. This makes sense because these countries were part of the Soviet Union at the time.
5. Most African countries have a low food supply throughout the period.
6. At the start of the period, China had a very low food supply (1,415 cal/person/day), but it rose sharply to 3,108 cal/person/day by 2013. The low starting point is understandable because China was going through the **Great Famine in 1961**.
7. The food production index measures each year's food production relative to the 2014 to 2016 baseline (= 100). The index of most African countries improved over the years, and in many of them it even exceeded 100 in 2013, meaning that food production in 2013 was higher than the 2014 to 2016 average. This may be because many African countries made better use of their resources, which led to higher yields.
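The index in point 7 is a rebasing: a year's value is production divided by the average production over the base years, times 100. The official index aggregates many commodities with price weights, so the sketch below only illustrates the rebasing step on a single hypothetical production series; the function name `production_index` is an assumption.

```python
import pandas as pd


def production_index(production, base_years=(2014, 2015, 2016)):
    """Express production relative to the average of the base years (= 100)."""
    base = production.loc[list(base_years)].mean()
    return (production / base * 100).round(2)


# Hypothetical production volumes (arbitrary units), indexed by year
production = pd.Series({2013: 110.0, 2014: 95.0, 2015: 100.0, 2016: 105.0})
print(production_index(production))
```

Here the base-period average is 100 units, so 2013 gets an index of 110: a value above 100 means that year's output exceeded the 2014 to 2016 average, not that production is high in absolute terms.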

## **4.2 Global Hunger Index and World Population**

First, I want to know whether GHI increases as the world population grows over the years. To visualize this, I'll transform the data from long format to wide format.

In [None]:
# Transform long format to wide format (one column per country)
ghi_pivot = pd.pivot(ghi, index='Years', columns='Country', values='GHI')
pop_pivot = pd.pivot(population, index='Year', columns='Country Name', values='Population')

# Sum world population and average GHI every year
pop_pivot['Total'] = pop_pivot.sum(axis=1)
ghi_pivot['Mean'] = ghi_pivot.mean(axis=1)

# Put both yearly series in columns keyed by an integer 'Year'
total = pop_pivot['Total'].reset_index()
mean_ghi = ghi_pivot['Mean'].rename('Mean GHI').reset_index()
mean_ghi['Year'] = mean_ghi['Years'].astype(int)

# The inner merge keeps only the years covered by the GHI data (1992 to 2013)
world = total.merge(mean_ghi[['Year', 'Mean GHI']], how='inner', on='Year')
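One detail affects the yearly average above: the missing GHI values were replaced with 0 in section 3.4, and `mean(axis=1)` counts those zeros as "no hunger". Leaving them as NaN would let pandas skip them instead. In years with many gaps (18.1% of countries before 2000 versus 2.59% in 2000), this may pull the average down. A toy comparison with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical GHI values for one year across three countries, one of them missing
ghi_year = pd.Series({'A': 30.0, 'B': 20.0, 'C': np.nan})

print(ghi_year.mean())            # NaN is skipped: (30 + 20) / 2
print(ghi_year.fillna(0).mean())  # NaN counted as 0: (30 + 20 + 0) / 3
```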

In [None]:
world.describe()

| | Year | Total | Mean GHI |
|---|---|---|---|
| count | 22.0 | 22.0 | 22.0 |
| mean | 2002.5 | 6300085000.0 | 23.26564 |
| std | 6.493587 | 526317400.0 | 2.524635 |
| min | 1992.0 | 5440572000.0 | 18.806358 |
| 25% | 1997.25 | 5880137000.0 | 21.1132 |
| 50% | 2002.5 | 6299241000.0 | 24.008513 |
| 75% | 2007.75 | 6723245000.0 | 25.258675 |
| max | 2013.0 | 7153755000.0 | 26.787931 |


In [None]:
# Line: mean GHI per year (right axis); bars: world population (left axis)
trace1 = go.Scatter(mode='lines+markers', x=world['Year'], y=world['Mean GHI'],
                    name='Global Hunger Index', marker_color='crimson')
trace2 = go.Bar(x=world['Year'], y=world['Total'], name='World Population', yaxis='y2',
                marker_color='blue', marker_line_width=1.5, marker_line_color='rgb(8,48,107)', opacity=0.5)

data = [trace1, trace2]
layout = go.Layout(title_text='World Population and Global Hunger Index',
                   yaxis=dict(title=dict(text='Mean GHI'), range=[0, 30], side='right'),
                   # A y-axis is anchored to an x-axis ('x'), not to another y-axis
                   yaxis2=dict(title=dict(text='World Population'), overlaying='y', side='left', anchor='x'))
fig = go.Figure(data=data, layout=layout)
fig.show()

As the world population increased, the average Global Hunger Index decreased. One possible interpretation is that awareness of the need to feed everyone has gone hand in hand with higher food production, which helped reduce hunger. This is consistent with the maps of food production index and food supply, both of which tend to increase over the years. Next, let's look at the most populated countries and the countries with the highest GHI.
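A correlation coefficient puts a number on the trend seen in the chart. The sketch below uses hypothetical values in the same layout as `world`; keep in mind that two series that both trend over time will correlate strongly, so a negative coefficient only says they move in opposite directions, not that population growth reduces hunger.

```python
import pandas as pd

# Hypothetical values in the same layout as `world` (Year, Total, Mean GHI)
world = pd.DataFrame({
    'Year': [1992, 1993, 1994, 1995],
    'Total': [5.44e9, 5.52e9, 5.60e9, 5.68e9],
    'Mean GHI': [26.0, 25.5, 25.2, 24.6],
})

# Pearson correlation (pandas default) between world population and mean GHI
r = world['Total'].corr(world['Mean GHI'])
print(round(r, 3))
```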


In [None]:
# Top 10 countries with the highest population
raceplot = barplot(population,  item_column='Country Name', value_column='Population', time_column='Year')
raceplot.plot(item_label = 'Most Populated Countries', value_label = 'Population', frame_duration = 600)

China, India, and the United States of America are the three most populated countries throughout the period, and China and India are far ahead of all other countries. In 1961, when no other country exceeded 187 million people, China and India had 660.3 million and 459.6 million people, respectively.
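The bar chart race ranks countries separately in every year. To check figures like the 1961 values above in a table, the same per-year ranking can be computed with pandas. A minimal sketch; `top_n_per_year` is a hypothetical helper and the data is made up.

```python
import pandas as pd

# Hypothetical long-format data in the same layout as `population`
population = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Year': [1961, 1961, 1961, 1962, 1962, 1962],
    'Population': [300, 100, 200, 310, 220, 205],
})


def top_n_per_year(df, n, value='Population', time='Year'):
    """Return the n largest rows per time step, largest first."""
    ranked = df.sort_values([time, value], ascending=[True, False])
    return ranked.groupby(time).head(n).reset_index(drop=True)


top2 = top_n_per_year(population, 2)
print(top2)
```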

In [None]:
# Highest GHI 
raceplot2 = barplot(ghi,  item_column='Country', value_column='GHI', time_column='Years')
raceplot2.plot(item_label = 'Highest GHI', value_label = 'GHI', frame_duration = 600)

From the two charts above:
1.  **African countries** dominated the top 10 countries with the highest GHI throughout the period. Some countries, such as Afghanistan, Central African Republic, Timor Leste, and Haiti, were not in the top 10 at the beginning of the period but joined the list from the early 2000s. In contrast, Myanmar, Niger, and Djibouti managed to leave the list.
2. None of the 10 most populated countries is among the 10 countries with the highest GHI, so a larger population does not necessarily mean more hunger. This might be because those countries have sufficient resources (funding, natural resources, and human resources).





Combining these charts with the two previous maps gives some further insights:
1. Countries with the highest GHI have a lower food supply than other countries.
2. At the beginning of the period, Niger's food supply was very low, which is consistent with Niger being among the 10 countries with the highest GHI. Niger improved its food supply over the years and eventually left the top 10.
3. Most African countries have a high food production index, yet many of them still have a high Global Hunger Index (GHI). Note that the index measures production relative to each country's own 2014 to 2016 level, so a high value reflects growth rather than abundance. This suggests that higher food production alone does not guarantee that food demand is met: hunger depends not only on how much food is produced but also on access to food.
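The comparisons above are made visually across separate charts. A sketch of how GHI and food supply could be put side by side in one table: an inner merge on country and year keeps only pairs present in both datasets, so it assumes the country names are spelled the same way in both. The frames below are hypothetical and only mimic the column names used in this notebook.

```python
import pandas as pd

# Hypothetical long-format frames with the same columns as `ghi` and `food_supply`
ghi = pd.DataFrame({'Country': ['A', 'B', 'C'], 'Years': ['2013'] * 3,
                    'GHI': [35.0, 20.0, 8.0]})
food_supply = pd.DataFrame({'Country': ['A', 'B', 'C', 'D'], 'Years': ['2013'] * 4,
                            'Food Supply (cal/person/day)': [2100, 2600, 3300, 3000]})

# Inner join on country and year; country D has no GHI value and is dropped
merged = ghi.merge(food_supply, on=['Country', 'Years'], how='inner')
print(merged)

# A negative coefficient means higher food supply goes with lower GHI in this sample
print(merged['GHI'].corr(merged['Food Supply (cal/person/day)']))
```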

## **4.3 Global Hunger Index and Food Supply**

I would like to compare the 2013 Global Hunger Index (GHI) and food supply across continents.

In [None]:
# Check number of countries each year in dataset
ghi.Years.value_counts()

1992    116
1993    116
2012    116
2011    116
2010    116
2009    116
2008    116
2007    116
2006    116
2005    116
2004    116
2003    116
2002    116
2001    116
2000    116
1999    116
1998    116
1997    116
1996    116
1995    116
1994    116
2013    116
Name: Years, dtype: int64

In [None]:
# Focus on 2013 data ('Years' holds strings after the melt)
ghi_13 = ghi[ghi['Years'] == '2013'].reset_index(drop=True)

# Fix country names to match the names pycountry_convert expects
name_fixes = {
    'Congo (Brazzaville)': 'Congo',
    'Guinea Bissau': 'Guinea-Bissau',
    "Korea (Democratic People's Republic of)": 'North Korea',
    'Laos': "Lao People's Democratic Republic",
    'Moldova': 'Moldova, Republic of',
    'Burma (Myanmar)': 'Myanmar',
    'Macedonia (F.Y.R.O.M.)': 'North Macedonia',
}
ghi_13['Country'] = ghi_13['Country'].replace(name_fixes)

# Drop Timor Leste because pycountry_convert gives an error for it;
# matching on the name is safer than a hardcoded row label, and reset_index closes the gap
ghi_13 = ghi_13[~ghi_13['Country'].str.contains('Timor')].reset_index(drop=True)

In [None]:
# Make a list of country in GHI dataset
countries = ghi_13.loc[:, 'Country'].tolist()
continent = []

for country in countries :
  country_code = pc.country_name_to_country_alpha2(country)
  continent_code = pc.country_alpha2_to_continent_code(country_code)
  continent_name = pc.convert_continent_code_to_continent_name(continent_code)
  continent.append(continent_name)

ghi_13['Continent'] = pd.DataFrame(continent)

# Dataframe of Mean GHI by continent
ghi_continent = ghi_13.groupby('Continent')['GHI'].mean()
ghi_continent = pd.DataFrame(ghi_continent)
ghi_continent.reset_index(inplace = True)
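
The cell above maps each country to a continent and then averages GHI per continent. A minimal stdlib sketch of the same grouping step, assuming made-up country names and scores and a hypothetical lookup dictionary in place of pycountry_convert, that skips countries whose continent cannot be resolved instead of letting the lookup raise an error:

```python
from collections import defaultdict

# Made-up records: (country, GHI score). Not values from the dataset.
scores = [("Country A", 30.0), ("Country B", 20.0), ("Country C", 16.0), ("Country D", 25.0)]

# Hypothetical lookup standing in for pycountry_convert; Country D is missing on purpose.
CONTINENT = {"Country A": "Africa", "Country B": "Africa", "Country C": "Asia"}


def mean_by_continent(records, lookup):
    """Average the scores per continent and collect countries with no continent."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    skipped = []
    for country, score in records:
        continent = lookup.get(country)
        if continent is None:
            skipped.append(country)
            continue
        totals[continent] += score
        counts[continent] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    return means, skipped


means, skipped = mean_by_continent(scores, CONTINENT)
print(means)    # {'Africa': 25.0, 'Asia': 16.0}
print(skipped)  # ['Country D']
```

Collecting the skipped names, rather than dropping rows by hand beforehand, makes it visible which countries were left out of the averages.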


In [None]:
fig = px.bar(ghi_continent, x = 'Continent', y = 'GHI', color = 'Continent')
fig.update_layout(title_text = 'Global Hunger Index in 2013', title_x = 0.5)
fig.show()

Africa had the highest average GHI in 2013 (27.65), compared with 16.77 for Asia. As I expected, Europe had the lowest average GHI (8.13). Poverty is far more widespread in many African countries than in Europe, which suggests that hunger is also related to economic welfare. As mentioned earlier, hunger is affected by food accessibility, and one factor that limits access to food is people's ability to afford it.
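To put these averages in proportion, a quick calculation using only the three continent means quoted above shows that Africa's average GHI is about 1.65 times Asia's and about 3.4 times Europe's:

```python
# Average 2013 GHI by continent, as quoted in the text above.
ghi_2013 = {"Africa": 27.65, "Asia": 16.77, "Europe": 8.13}

africa_vs_asia = ghi_2013["Africa"] / ghi_2013["Asia"]
africa_vs_europe = ghi_2013["Africa"] / ghi_2013["Europe"]

print(f"{africa_vs_asia:.2f}")    # 1.65
print(f"{africa_vs_europe:.2f}")  # 3.40
```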

Now let's look at food supply by continent. Since I want to focus on 2013 data, I'll first check the number of countries in each year.





In [None]:
food_supply.Years.value_counts()

1961    178
1988    178
1990    178
1991    178
1992    178
1993    178
1994    178
1995    178
1996    178
1997    178
1998    178
1999    178
2000    178
2001    178
2002    178
2003    178
2004    178
2005    178
2006    178
2007    178
2008    178
2009    178
2010    178
2011    178
2012    178
1989    178
1987    178
1962    178
1986    178
1963    178
1964    178
1965    178
1966    178
1967    178
1968    178
1969    178
1970    178
1971    178
1972    178
1973    178
1974    178
1975    178
1976    178
1977    178
1978    178
1979    178
1980    178
1981    178
1982    178
1983    178
1984    178
1985    178
2013    178
Name: Years, dtype: int64

In [None]:
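# Focus on 2013 data (178 countries per year); filter on the year value
# so the selection does not depend on the row order
food_13 = food_supply[food_supply['Years'] == 2013].copy()

food_13.reset_index(drop = True, inplace = True)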

In [None]:
# Check country names
food_13.Country.unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bolivia',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei',
       'Bulgaria', 'Burkina Faso', 'Cape Verde', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'Hong Kong',
       'Macau', 'China', 'Taiwan', 'Colombia', 'Costa Rica',
       'Ivory Coast', 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic',
       'Czechoslovakia', 'North Korea', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Estonia',
       'Ethiopia', 'Fiji', 'Finland', 'France', 'French Polynesia',
       'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece',
       'Grenada', 'Guatemala', 'Guinea', 'Guinea Bissau', 'Guyana',
       'Haiti', 'Honduras', 'Hungary', 'Icel

In [None]:
# Rename countries to the names that pycountry_convert expects
name_fixes = {
  'Guinea Bissau' : 'Guinea-Bissau',
  'Laos' : "Lao People's Democratic Republic",
  'Moldova' : 'Moldova, Republic of',
  'Burma (Myanmar)' : 'Myanmar',
  'Macedonia (F.Y.R.O.M.)' : 'North Macedonia',
  'Saint Vincent and Grenadines' : 'Saint Vincent and the Grenadines',
  'Saint Thomas and Principe' : 'Sao Tome and Principe',
}
food_13['Country'] = food_13['Country'].replace(name_fixes)

# Drop Timor Leste, Caribbean Netherlands, and the dissolved states
# (Yugoslavia, Czechoslovakia, Soviet Union, Serbia and Montenegro)
# because pycountry_convert raises an error for them
to_drop = ['Timor Leste', 'Yugoslavia', 'Soviet Union',
           'Caribbean Netherlands', 'Czechoslovakia', 'Serbia and Montenegro']
food_13 = food_13[~food_13['Country'].isin(to_drop)].reset_index(drop = True)

In [None]:
# Map each country in the food supply dataset to its continent
countries = food_13.loc[:, 'Country'].tolist()
continent = []

for country in countries :
  country_code = pc.country_name_to_country_alpha2(country)
  continent_code = pc.country_alpha2_to_continent_code(country_code)
  continent_name = pc.convert_continent_code_to_continent_name(continent_code)
  continent.append(continent_name)

# Assign the list directly so values are matched by position, not by index label
food_13['Continent'] = continent

# DataFrame of mean food supply by continent
food_continent = food_13.groupby('Continent')['Food Supply (cal/person/day)'].mean()
food_continent = pd.DataFrame(food_continent)
food_continent.reset_index(inplace = True)
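
When the continent list is written back to the frame, it matters whether values are matched by position or by index label. After rows are dropped, the index has gaps; wrapping the list in `pd.Series(...)` or `pd.DataFrame(...)` gives it a fresh 0..n-1 index, and pandas then aligns on labels, which can put values on the wrong rows and leave NaNs at the end. Assigning the plain list matches values by position instead. A small sketch on a toy frame (hypothetical data, not the food supply dataset):

```python
import pandas as pd

# Toy frame; row 1 stands in for a country that had to be dropped.
df = pd.DataFrame({"Country": ["A", "B", "C", "D"]})
df = df.drop(1)  # index is now [0, 2, 3]

labels = ["x", "y", "z"]  # one label per remaining row, in row order

# Aligned by index label: 0 -> "x", 2 -> "z", 3 has no match and becomes NaN.
df["by_label"] = pd.Series(labels)

# Assigned by position: the list length must equal the number of rows.
df["by_position"] = labels

print(df)
```

Here row "C" receives "z" under label alignment but the intended "y" under positional assignment.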

In [None]:
fig = px.bar(food_continent, x = 'Continent', y = 'Food Supply (cal/person/day)', color = 'Continent')
fig.update_layout(title_text = 'Food Supply in 2013', title_x = 0.5)
fig.show()

Overall, the average food supply on every continent falls in the range of 2,400 to 2,900 cal/person/day. Looking closely at the chart, Africa's average food supply is similar to Europe's. So even though Africa has the highest GHI, its food supply is not low. One possible explanation is uneven food distribution within African countries: a national average can be high while some people receive much less food than others.
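To illustrate why a high average supply does not rule out hunger, here is a minimal stdlib sketch with two hypothetical populations that have the same mean intake but different distributions. The intakes and the `THRESHOLD` value are made up for illustration and are not from the dataset or any official requirement:

```python
from statistics import mean

# Hypothetical daily intakes (calories) for five people in two populations.
# Both have the same average; only the distribution differs.
even = [2600, 2600, 2600, 2600, 2600]
uneven = [3800, 3400, 2600, 1700, 1500]

THRESHOLD = 2100  # hypothetical minimum requirement, for illustration only

results = {}
for name, intakes in [("even", even), ("uneven", uneven)]:
    below = sum(1 for x in intakes if x < THRESHOLD)  # people under the threshold
    results[name] = (mean(intakes), below)

print(results)  # {'even': (2600, 0), 'uneven': (2600, 2)}
```

Both populations average 2,600, yet two of the five people in the uneven one fall below the threshold, which is the pattern that a continent-level mean cannot reveal.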

# **5. CONCLUSION**

In conclusion, the food production index and food supply have tended to increase over the years, even as the world population has grown. This has been accompanied by a decreasing Global Hunger Index (GHI) at the global level. However, more attention needs to be paid to African countries, most of which have a higher GHI than countries on other continents. Tackling hunger around the world, especially in African countries, requires not only increasing food production but also improving access to food.