# Economic Data Preprocessing

In this notebook, we preprocess economic data collected from the [World Bank](https://data.worldbank.org/). The data include Gross Domestic Product (GDP), GDP per capita, unemployment rates, consumer price inflation, trade, and other economic indicators. Our aim is to explore how these indicators influence the Food Consumption Score (FCS) and the Household Dietary Diversity Score (HDDS), and consequently to predict food security levels within a country.

For instance, we may find that countries with higher GDP per capita tend to have higher FCS and HDDS, indicating a positive correlation between economic prosperity and food security. Similarly, we may analyze how changes in unemployment rates or inflation affect access to diverse and nutritious diets, providing valuable insights to address food insecurity in a country.

See the [World Bank Data Licenses](https://datacatalog.worldbank.org/public-licenses#cc-by).

## Import Libraries

In [11]:
import pandas as pd
from pandas import read_csv
import numpy as np
from functools import reduce
import seaborn as sns
import matplotlib.pyplot as plt

## Helper Functions

In [12]:
def save_country_data(data_frame, country_paths):
    
    '''
    Saves specific country data from a provided DataFrame to designated file paths.

    Parameters:
        data_frame (pandas.DataFrame): The DataFrame containing the combined economic data.
        country_paths (list of tuples): A list where each tuple contains a country name and the corresponding 
        file path to save the data.
    
    '''
    for country, path in country_paths:
        # Extract the data for the specified country
        country_data = data_frame[data_frame['country'] == country]

        # Save the data to the specified file path
        country_data.to_csv(path, index=False)
        print(f"Data for {country} saved to {path}")
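
One practical caveat with the helper above: `DataFrame.to_csv` does not create missing folders, and a misspelled country name silently produces a file with only a header row. The sketch below shows one way to guard against both. The name `save_country_data_safe`, the toy rows, and the temporary folder layout are assumptions made for illustration, not part of the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_country_data_safe(data_frame, country_paths):
    """Hypothetical variant: create parent folders and skip countries with no rows."""
    saved = []
    for country, path in country_paths:
        country_data = data_frame[data_frame["country"] == country]
        if country_data.empty:
            # Nothing matched, so avoid writing a header-only file
            print(f"No rows found for {country}; nothing saved")
            continue
        path = Path(path)
        # Create the target folder if it does not exist yet
        path.parent.mkdir(parents=True, exist_ok=True)
        country_data.to_csv(path, index=False)
        saved.append(path)
    return saved


# Toy rows (illustrative values only, not World Bank figures)
toy = pd.DataFrame({
    "country": ["Rwanda", "Tanzania"],
    "year": [2000, 2000],
    "trade": [1.0, 2.0],
})

with tempfile.TemporaryDirectory() as tmp:
    targets = [
        ("Rwanda", Path(tmp) / "rwanda_data" / "rw.csv"),
        ("Kenya", Path(tmp) / "kenya_data" / "ke.csv"),  # not present in toy
    ]
    saved = save_country_data_safe(toy, targets)
    reloaded = pd.read_csv(saved[0])

print(reloaded)
```

Returning the list of written paths makes it easy to confirm afterwards which countries were actually saved.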

In [22]:
def interpolate_null_values(data, method='linear', **kwargs):
    """
    Interpolate null values in the dataframe for each country separately.
    
    Parameters:
    data (pd.DataFrame): The input dataframe with potential null values.
    method (str): The method of interpolation (default is 'linear').
    **kwargs: Additional keyword arguments to pass to the interpolation function.
    
    Returns:
    pd.DataFrame: The dataframe with interpolated values.
    """
    # Interpolate within each country so values are never carried across countries
    interpolated_data = data.groupby('country').apply(
        lambda group: group.interpolate(method=method, axis=0, limit_direction='both', **kwargs)
    )
    return interpolated_data.reset_index(drop=True)
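
To make the per-country behaviour concrete, here is a minimal sketch on a toy frame. The values are made up for illustration, and the sketch uses `groupby(...).transform` on a single column rather than the `apply` call above, so it illustrates the idea without being a drop-in replacement.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values only, not World Bank figures)
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdp_growth": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
})

# Interpolate each country's series independently
filled = toy.copy()
filled["gdp_growth"] = (
    toy.groupby("country")["gdp_growth"]
    .transform(lambda s: s.interpolate(method="linear", limit_direction="both"))
)

print(filled)
```

Country A's interior gap becomes 2.0, the midpoint of its neighbours. Country B's leading gap becomes 5.0: with `method='linear'` and `limit_direction='both'`, gaps at the edges of a series are filled with the nearest observed value rather than extrapolated, which is worth keeping in mind for indicators that only start reporting in later years.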

## Preprocess Economic Data
In this section we define a function, `process_economic_data`, that streamlines the preprocessing of economic data distributed across several files, each covering one indicator such as trade percentage, GDP growth rate, or unemployment rate. The function reshapes each file's yearly columns into a consistent long format, merges the results on the shared dimensions of country and year, and interpolates missing values for each country. The outcome is a single DataFrame, with values rounded to three decimal places, that supports comparative analysis across countries and years and is saved to a specified location for further use.

In [23]:
def process_economic_data(data_list, output_path, country_paths):
    """
    Processes multiple economic data files, merges them, interpolates missing values, and outputs a combined CSV.

    Parameters:
        data_list (list of tuples): A list where each tuple contains a DataFrame and the target column name.
        output_path (str): The file path where the combined CSV will be saved.
        country_paths (list of tuples): A list where each tuple contains a country name and the corresponding 
                                        file path to save the data.

    Returns:
        pandas.DataFrame: The combined DataFrame from all provided DataFrames.
    """
    # List to hold each DataFrame after processing
    combined_data = []

    # Process each DataFrame with the specified column name
    for df, column_name in data_list:
        # Melt the DataFrame to format 'Country Name', 'Year', and the value column
        year_columns = [col for col in df.columns if "YR" in col]
        df_melted = df.melt(id_vars=["Country Name"], value_vars=year_columns,
                            var_name="Year", value_name=column_name)

        # Extract the year from the 'Year' column and round values to three decimal places
        df_melted['Year'] = df_melted['Year'].apply(lambda x: x.split()[0])
        df_melted[column_name] = pd.to_numeric(df_melted[column_name], errors='coerce').round(3)

        # Append to the list of DataFrames
        combined_data.append(df_melted[['Country Name', 'Year', column_name]])

    # Merge all the dataframes on 'Country Name' and 'Year', then rename columns
    final_df = reduce(lambda left, right: pd.merge(left, right, on=["Country Name", "Year"], how='outer'), combined_data)
    final_df.rename(columns={"Country Name": "country", "Year": "year"}, inplace=True)
    
    # Interpolate null values in the final combined DataFrame
    final_df = interpolate_null_values(final_df)

    # Save each country's data separately
    save_country_data(final_df, country_paths)
    
    # Save the full combined data
    final_df.to_csv(output_path, index=False)

    return final_df
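
The reshape-and-merge step can be tried in isolation on two tiny wide-format frames. This is a sketch under stated assumptions: the `2000 [YR2000]` header style and the `..` placeholder for missing values are assumed to match the downloaded files, the numbers are invented, and `to_long` is a hypothetical helper name.

```python
from functools import reduce

import pandas as pd

# Toy wide-format frames (illustrative values only, not World Bank figures)
gdp = pd.DataFrame({
    "Country Name": ["A", "B"],
    "2000 [YR2000]": ["1.23456", ".."],
    "2001 [YR2001]": ["2.5", "3.5"],
})
trade = pd.DataFrame({
    "Country Name": ["A", "B"],
    "2000 [YR2000]": ["10", "20"],
    "2001 [YR2001]": ["11", "21"],
})


def to_long(df, value_name):
    """Hypothetical helper: melt year columns into rows and clean the values."""
    year_cols = [c for c in df.columns if "YR" in c]
    long_df = df.melt(id_vars=["Country Name"], value_vars=year_cols,
                      var_name="Year", value_name=value_name)
    # Keep only the leading year token, e.g. "2000 [YR2000]" -> "2000"
    long_df["Year"] = long_df["Year"].str.split().str[0]
    # Non-numeric placeholders such as ".." become NaN
    long_df[value_name] = pd.to_numeric(long_df[value_name], errors="coerce").round(3)
    return long_df


frames = [to_long(gdp, "gdp_growth"), to_long(trade, "trade")]
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["Country Name", "Year"], how="outer"),
    frames,
)

print(merged)
```

The outer join keeps every country-year pair that appears in any indicator file, so gaps show up as `NaN` in the merged frame and are left for the interpolation step to handle.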


## Define Data Directory

In [24]:
data_dir = 'economic_data/'
tz_dir = 'tanzania_data/economic_data/'
rw_dir = 'rwanda_data/economic_data/'

## Load Data Files

In [25]:
df_gdp_growth = read_csv(data_dir+ 'gdp_growth.csv', delimiter = ',')
df_gdp_per_capita = read_csv(data_dir+ 'gdp_per_capita.csv', delimiter = ',')
df_trade = read_csv(data_dir+ 'trade.csv', delimiter = ',')
df_growth_national_exp = read_csv(data_dir+ 'growth_national_expenditure.csv', delimiter = ',')
df_hh_final_consum = read_csv(data_dir+ 'hh_final_consumption.csv', delimiter = ',')
df_infl_consum_price = read_csv(data_dir+ 'inflation_consumer_price.csv', delimiter = ',')
df_population_growth = read_csv(data_dir+ 'population_growth.csv', delimiter = ',')
df_fdi_out = read_csv(data_dir+ 'fdi_net_outflows.csv', delimiter = ',')
df_fdi_in = read_csv(data_dir+ 'fdi_net_inflows.csv', delimiter = ',')
df_unemployment = read_csv(data_dir+ 'unemployment.csv', delimiter = ',')
df_military_expenditure = read_csv(data_dir+ 'military_expenditure.csv', delimiter = ',')
df_m_trade = read_csv(data_dir+ 'merchandise_trade.csv', delimiter = ',')

# Extra indicator that does not seem closely related to the economic data
#df_refugee_population = read_csv(data_dir+ 'refugee_population.csv', delimiter = ',')
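
As the number of indicator files grows, the repeated `read_csv` calls above could be driven by a mapping instead. The sketch below is one possible arrangement: `INDICATOR_FILES` and `load_indicators` are hypothetical names, only two indicators are listed, and throwaway files in a temporary folder stand in for the real data so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from output column name to file name (only two shown)
INDICATOR_FILES = {
    "gdp_growth": "gdp_growth.csv",
    "trade": "trade.csv",
}


def load_indicators(data_dir, indicator_files):
    """Hypothetical helper: return (DataFrame, column_name) tuples, one per file."""
    data_dir = Path(data_dir)
    return [
        (pd.read_csv(data_dir / file_name), column_name)
        for column_name, file_name in indicator_files.items()
    ]


# Demonstration with throwaway files (illustrative values only)
with tempfile.TemporaryDirectory() as tmp:
    for file_name in INDICATOR_FILES.values():
        pd.DataFrame(
            {"Country Name": ["A"], "2000 [YR2000]": [1.0]}
        ).to_csv(Path(tmp) / file_name, index=False)
    loaded = load_indicators(tmp, INDICATOR_FILES)

print([name for _, name in loaded])
```

The returned list has the same `(DataFrame, column_name)` shape as the `data_list` expected by `process_economic_data`, so adding an indicator would only require one new entry in the mapping.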

In [26]:
#df_trade.head()

## Define Parameters

In [27]:
# Define the economic data list
data_list = [
    (df_gdp_growth, 'gdp_growth'),  
    (df_gdp_per_capita, 'gdp_capita'),
    (df_trade, 'trade'),  
    (df_growth_national_exp, 'growth_national_exp'),
    (df_hh_final_consum, 'hh_final_consum'),  
    (df_infl_consum_price, 'infl_consum_price'),
    (df_population_growth, 'pop_growth'),  
    (df_unemployment, 'unemployment'),
    (df_fdi_out, 'fdi_outflows'),  
    (df_fdi_in, 'fdi_inflows'),
    (df_m_trade, 'merchandise_trade'),  
    (df_military_expenditure, 'military_exp'),
]

# Define the output path for the combined CSV
output_path = data_dir + 'combined_economic_data.csv'

# Define the list of country paths
country_paths = [
    ('Tanzania', tz_dir + 'tz_economic_data.csv'),
    ('Rwanda', rw_dir + 'rw_economic_data.csv'),
]

## Call the Function

In [29]:
# Call the function
combined_df = process_economic_data(data_list, output_path, country_paths)

Data for Tanzania saved to tanzania_data/economic_data/tz_economic_data.csv
Data for Rwanda saved to rwanda_data/economic_data/rw_economic_data.csv


In [30]:
combined_df.head(50)

Unnamed: 0,country,year,gdp_growth,gdp_capita,trade,growth_national_exp,hh_final_consum,infl_consum_price,pop_growth,unemployment,fdi_outflows,fdi_inflows,merchandise_trade,military_exp
0,Rwanda,2000,8.371,7.029,27.483,116.73,88.373,3.9,1.246,11.831,0.0,0.392,12.712,3.536
1,Rwanda,2001,8.485,6.981,29.197,114.66,85.539,3.343,1.395,11.851,0.0,0.941,18.661,3.396
2,Rwanda,2002,13.192,11.186,27.608,115.247,85.499,1.993,1.788,11.522,0.0,0.076,15.92,3.049
3,Rwanda,2003,2.202,-0.132,29.309,113.757,83.571,7.45,2.31,11.751,0.0,0.22,15.058,2.447
4,Rwanda,2004,7.448,4.712,33.46,112.674,81.767,12.251,2.579,11.784,0.0,0.324,16.073,1.973
5,Rwanda,2005,9.378,6.537,34.214,112.84,81.286,9.014,2.632,11.928,0.0,0.271,20.314,1.743
6,Rwanda,2006,9.227,6.355,33.22,111.454,80.348,8.883,2.665,11.65,0.0,0.923,22.229,1.644
7,Rwanda,2007,7.633,4.773,35.969,108.635,77.685,9.081,2.694,11.639,0.318,2.021,23.288,1.366
8,Rwanda,2008,11.161,8.22,37.602,115.058,81.434,15.438,2.682,11.567,0.0,1.975,27.837,1.306
9,Rwanda,2009,6.248,3.479,36.803,116.069,82.305,12.944,2.641,11.676,0.0,2.091,27.19,1.327


