# TIME SHIFTING

When studying the correlations between the indicators and the GDP, we should keep in mind that the effect of an indicator may not be immediate and may instead build up over several years. In this notebook we therefore explore the correlations after applying time shifting, that is, comparing the value of an indicator in year X with the GDP of year X + N.

Note that we could also do the opposite: study the effect of GDP on the indicators. In order to have a full view of these interactions, we can choose to show this on the last table of the notebook by selecting negative values in the range of shifts.
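As a minimal sketch of the idea, and not the notebook's own `shift_corr` helper (whose implementation is not shown here), the shifted correlation for a single indicator can be computed with pandas by moving the GDP series backwards by N years before correlating. The function name `shifted_corr` and the toy data are assumptions made for illustration only.

```python
import pandas as pd

def shifted_corr(indicator: pd.Series, gdp: pd.Series, n: int) -> float:
    """Pearson correlation between indicator[year X] and gdp[year X + n]."""
    # shift(-n) moves the GDP of year X + n onto the row of year X,
    # so both series line up on the indicator's year.
    # Rows left without a partner become NaN and are ignored by corr().
    return indicator.corr(gdp.shift(-n))

years = range(2000, 2008)
# Toy data (hypothetical): the GDP reproduces the indicator two years later.
indicator = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0], index=years)
gdp = indicator.shift(2)

lag2 = shifted_corr(indicator, gdp, 2)  # indicator in X vs GDP in X + 2
lag0 = shifted_corr(indicator, gdp, 0)  # no shift
print(lag2, lag0)
```

With this toy data the correlation is perfect at N = 2 and weaker at N = 0. A negative N compares the indicator in year X with the GDP of an earlier year, which is the "effect of GDP on the indicator" reading.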

## IMPORTS AND CONSTANTS

In [1]:
import os
import numpy as np
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

from Project.Utils.norm import norm
from Project.Utils.shift_corr import shift_corr
from Project.Utils.max_corr import max_corr

PVALUE_VAR = 0.05

read_path = os.getcwd() + '/Output/'
write_path = os.getcwd() + '/Output/'

col_country = 'Country'
col_region = 'Region'
col_year = 'Year'
col_gdp = 'GDP'
col_shift = 'Shift'

## LOADING DATAFRAME

We will load the ready-to-use GoldDataframe, which will be the base for our study in this notebook.

In [2]:
# Read and display the GoldDataframe
df = pd.read_csv(read_path + 'GoldDataframe.csv', index_col = [col_country, col_region, col_year])
df

Country,Region,Year,AgriShareGDP,CreditToAgriFishForest,EmploymentRural,GDP,% Soldiers,Employment in industry,Employment in services,Birth Rate,Cost business start-up,Death Rate,...,Researchers in R&D,R&D expenditure %GDP,% Rural Population,Tertiary School Gender Parity,% Vulnerable female employment,% Vulnerable male employment,Civil Liberties,Freedom of Expression,% Healthcare Investment,Population
Afghanistan,South Asia,2000,54.06300,,,3342.034168,7.887961,9.48,24.680000,48.021,72.0,11.718,...,,,77.922,,98.720002,91.879999,0.400,0.625,1.21,20779957.0
Afghanistan,South Asia,2001,54.06300,,,3598.470576,5.020511,8.98,24.719999,47.505,72.0,11.387,...,,,77.831,,98.760003,92.399998,0.400,0.625,1.21,21606992.0
Afghanistan,South Asia,2002,45.13440,,,4141.523943,2.153062,9.99,25.590000,46.901,72.0,11.048,...,,,77.739,,98.669999,91.460001,0.400,0.625,1.21,22600774.0
Afghanistan,South Asia,2003,41.90340,,,4729.042179,2.208290,10.35,25.950001,46.231,72.0,10.704,...,,,77.647,,98.599998,91.039999,0.403,0.687,5.46,23680871.0
Afghanistan,South Asia,2004,35.61280,,,5388.482107,0.435599,10.61,26.120001,45.507,72.0,10.356,...,,,77.500,,98.549998,90.960003,0.403,0.677,3.60,24726689.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zimbabwe,Sub-Saharan Africa,2016,7.87399,,,20548.678073,0.759750,7.05,26.070000,32.864,121.5,8.286,...,,,67.704,,75.970001,56.789999,0.430,0.389,,14030338.0
Zimbabwe,Sub-Saharan Africa,2017,8.34095,,,22040.902301,0.750720,6.90,26.629999,31.732,110.0,8.044,...,,,67.763,,76.579998,56.609999,0.488,0.431,,14236599.0
Zimbabwe,Sub-Saharan Africa,2018,8.30469,,,24311.560545,0.738210,6.75,27.230000,30.676,110.7,7.883,...,,,67.791,,77.170002,56.380000,0.447,0.471,,14438812.0
Zimbabwe,Sub-Saharan Africa,2019,8.17322,,,21935.075306,0.738210,6.57,27.240000,29.747,76.6,7.773,...,,,67.790,,79.299999,57.090001,0.403,0.434,,14645473.0


## NORMALIZING DATA

In order to discard units and work with a homogeneous scale, we will proceed to normalize the data.

In [3]:
norm_df = norm(df)

norm_df

Country,Region,Year,AgriShareGDP,CreditToAgriFishForest,EmploymentRural,GDP,% Soldiers,Employment in industry,Employment in services,Birth Rate,Cost business start-up,Death Rate,...,Researchers in R&D,R&D expenditure %GDP,% Rural Population,Tertiary School Gender Parity,% Vulnerable female employment,% Vulnerable male employment,Civil Liberties,Freedom of Expression,% Healthcare Investment,Population
Afghanistan,South Asia,2000,1.000000,,,0.000000,1.000000,0.052247,0.000000,1.000000,0.838875,1.000000,...,,,1.000000,,0.995694,0.966879,0.683333,0.000000,0.000000,0.000000
Afghanistan,South Asia,2001,1.000000,,,0.015588,0.616216,0.000000,0.002801,0.969410,0.838875,0.940478,...,,,0.976950,,1.000000,1.000000,0.683333,0.000000,0.000000,0.045571
Afghanistan,South Asia,2002,0.717647,,,0.048598,0.232432,0.105538,0.063725,0.933602,0.838875,0.879518,...,,,0.953647,,0.990312,0.940128,0.683333,0.000000,0.000000,0.100329
Afghanistan,South Asia,2003,0.615471,,,0.084310,0.239824,0.143156,0.088936,0.893882,0.838875,0.817659,...,,,0.930344,,0.982777,0.913376,0.733333,0.508197,1.000000,0.159844
Afghanistan,South Asia,2004,0.416541,,,0.124395,0.002564,0.170324,0.100840,0.850960,0.838875,0.755080,...,,,0.893110,,0.977395,0.908281,0.733333,0.426230,0.562353,0.217470
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zimbabwe,Sub-Saharan Africa,2016,0.041281,,,0.790525,0.055000,0.096000,0.483679,0.514903,0.074896,0.057883,...,,,0.963384,,0.575255,0.946578,0.436548,0.587940,,0.720743
Zimbabwe,Sub-Saharan Africa,2017,0.067705,,,0.873595,0.031943,0.066000,0.649851,0.364949,0.055713,0.034301,...,,,0.988215,,0.653061,0.931553,0.730964,0.798995,,0.789925
Zimbabwe,Sub-Saharan Africa,2018,0.065653,,,1.000000,0.000000,0.036000,0.827893,0.225063,0.056881,0.018612,...,,,1.000000,,0.728317,0.912354,0.522843,1.000000,,0.857749
Zimbabwe,Sub-Saharan Africa,2019,0.058214,,,0.867704,0.000000,0.000000,0.830861,0.102000,0.000000,0.007893,...,,,0.999579,,1.000000,0.971620,0.299492,0.814070,,0.927064
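The values above are consistent with a min-max scaling applied within each country (every country reaches 0 and 1 in its own series), although the `norm` helper itself is not shown. The following is a minimal sketch of that kind of normalization under this assumption; `minmax_by_group` and the toy frame are hypothetical names introduced for illustration.

```python
import pandas as pd

def minmax_by_group(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Rescale every column to [0, 1] within each group of the given index level."""
    grouped = df.groupby(level=level)
    lo = grouped.transform("min")  # per-group minimum, aligned to the original rows
    hi = grouped.transform("max")  # per-group maximum, aligned to the original rows
    # A column that is constant within a group has hi == lo,
    # so the division yields NaN for those rows.
    return (df - lo) / (hi - lo)

# Toy data (hypothetical): two countries with very different GDP scales.
idx = pd.MultiIndex.from_product(
    [["A", "B"], [2000, 2001, 2002]], names=["Country", "Year"]
)
toy = pd.DataFrame({"GDP": [10.0, 20.0, 30.0, 5.0, 5.5, 6.0]}, index=idx)

scaled = minmax_by_group(toy, "Country")
print(scaled)
```

After scaling, both countries span the same 0 to 1 range, so their trends can be compared regardless of the original units.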


## DEFINING DATAFRAME-RELATED PARAMETERS

Once the Dataframe has been successfully loaded and normalized, we will store some parameters related to it so that later operations become easier to read and understand.

In [4]:
# List of countries and list of regions.
country_list = list(np.sort(norm_df.index.get_level_values(col_country).unique()))
region_list = list(np.sort(norm_df.index.get_level_values(col_region).unique()))

# Range of years.
min_year = norm_df.index.get_level_values(col_year).min()
max_year = norm_df.index.get_level_values(col_year).max()

# List of all indicators, except for the GDP.
indicators_list = norm_df.columns.tolist()
indicators_list.remove(col_gdp)
indicators_list.sort()

## COMPUTING CORRELATIONS

To show the results quickly, we pre-compute them in the following cell. The tables for each view mode are stored in `show_dict`, so the display cell only has to change which table is shown and which conditions are used to filter it, instead of processing it again.

The range for the shifts that can be shown is defined here, as well.

In [13]:
# The show_dict will be used to store the full tables that can be seen for each view mode, using said mode as the key.
show_list = [col_country, col_region]
show_dict = {}
for method in show_list:
    shift_corr_df = shift_corr(norm_df, method, confidence = 1.0)
    show_dict[method] = shift_corr_df
    #shift_corr_df.to_csv(write_path + 'Shifted_Corr_' + method + '.csv')

# After computing all the modes, the shifts range is established by the minimum and maximum ranges for all the tables.
shifts_range = range(
    min(min(show_dict[how].index.get_level_values(col_shift)) for how in show_list),
    max(max(show_dict[how].index.get_level_values(col_shift)) for how in show_list)
)
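The real `shift_corr` lives in `Project.Utils` and is not reproduced in this notebook. As a rough sketch of the kind of table it appears to return (one row per group and shift, one column per indicator), the function below correlates every indicator with a shifted GDP column. The name `shift_corr_sketch`, its parameters, and the toy data are assumptions, and the significance handling suggested by the `confidence` argument is left out.

```python
import pandas as pd

def shift_corr_sketch(df, group_level, target, shifts):
    """Correlate each indicator at year X with `target` at year X + shift, per group.

    Assumes `df` is indexed by (group_level, year) and sorted by year within groups.
    """
    rows = {}
    for group, block in df.groupby(level=group_level):
        block = block.droplevel(group_level).sort_index()  # index is now the year
        indicators = block.drop(columns=target)
        for s in shifts:
            # Bring the target of year X + s onto the row of year X.
            shifted_target = block[target].shift(-s)
            rows[(group, s)] = indicators.corrwith(shifted_target)
    out = pd.DataFrame(rows).T  # one row per (group, shift)
    out.index.names = [group_level, "Shift"]
    return out

# Toy data (hypothetical): in A the GDP follows X one year later, in B without delay.
idx = pd.MultiIndex.from_product(
    [["A", "B"], range(2000, 2006)], names=["Country", "Year"]
)
toy = pd.DataFrame(
    {
        "X":   [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
        "GDP": [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    },
    index=idx,
)

table = shift_corr_sketch(toy, "Country", "GDP", range(0, 3))
print(table)
```

Storing the shift as an index level is what lets the display cells filter by shift range without recomputing anything.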

## DISPLAYING RESULTS

The following cells show the strongest correlation between each indicator and the GDP for the selected area. Hovering over a cell displays the shift that the correlation corresponds to.

By default, only positive shifts are shown, which corresponds to a shifted GDP, that is, the effect of the indicators on future GDP. Negative values can be selected too, which show the effect of the GDP on the indicators' future values.

In [14]:
# Define auxiliary variables for our widgets.
selector_dict = {}
selector_dict[col_country] = country_list
selector_dict[col_region] = region_list

### MAXIMUM CORRELATIONS TABLE

In the following table, the maximum correlation for each indicator and each country or area is displayed. The maximum is found by comparing the correlations obtained for every computed shift, and the shift it corresponds to can be viewed by hovering the mouse over the cell.

A dropdown box allows us to show the study for country or area.
Should we want to restrict our study to a given range of shifts, the slider will allow us to adjust it.
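To illustrate the selection step, here is a minimal sketch that picks, for each indicator, the strongest correlation across shifts together with the shift where it occurs. It is not the `max_corr` helper: whether that helper compares signed or absolute values is not shown, so this sketch assumes absolute values, and it skips the p-value filtering suggested by `PVALUE_VAR`. The name `strongest_shift` and the toy numbers are hypothetical.

```python
import pandas as pd

def strongest_shift(corr_by_shift: pd.DataFrame):
    """For each indicator column, return the strongest correlation and its shift."""
    # Compare magnitudes, so a strong negative correlation can win (assumption).
    best_shift = corr_by_shift.abs().idxmax()
    # Look the signed value back up at the winning shift.
    best_value = pd.Series(
        {col: corr_by_shift.loc[best_shift[col], col] for col in corr_by_shift.columns}
    )
    return best_value, best_shift

# Toy correlations (hypothetical) for one country, indexed by shift.
toy = pd.DataFrame(
    {"Birth Rate": [-0.2, -0.9, -0.5], "Population": [0.3, 0.4, 0.8]},
    index=pd.Index([0, 1, 2], name="Shift"),
)

values, shifts = strongest_shift(toy)
print(values)
print(shifts)
```

Keeping the winning shifts in a second object mirrors how the table cell below passes one frame to the styler for the values and another for the tooltips.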

In [15]:
# Widgets: dropdown for country or area and intrangeslider for shift range.
dropdown_show = widgets.Dropdown(
    options = show_list,
    description = 'Show: '
)

intslider_shift_range = widgets.IntRangeSlider(
    value = (max(0, shifts_range.start), shifts_range.stop),
    min = shifts_range.start,
    max = shifts_range.stop,
    step = shifts_range.step,
    description = 'Shifts Range: ')

# Show maximum correlations for country or area, in the given shift range.
def table_MaxCorr(selection: str, sh_range: tuple):    

    # Load the corresponding Dataframe and apply the user-introduced restrictions.
    df = show_dict[selection]
    min_sh = sh_range[0]
    max_sh = sh_range[1]
    df = df.loc[(min_sh <= df.index.get_level_values(col_shift)) & (df.index.get_level_values(col_shift) <= max_sh)]
    max_corr_df, max_corr_index_df = max_corr(df, selection, PVALUE_VAR, raw = False)

    # Apply style and display.
    df_s = max_corr_df.style
    df_s.set_tooltips(
        max_corr_index_df.applymap(
            lambda x: 'No data' if np.isnan(x) else 'Shift: ' + str(int(x))
        )
    )
    df_s.background_gradient(cmap='RdBu')

    display(df_s)

widgets.interact(table_MaxCorr, selection = dropdown_show, sh_range = intslider_shift_range)


### INDICATOR AND GDP CHART

In the following chart, we can experiment with different shift values and see how the degree of similarity between both lines changes. This provides a more visual way to study how the correlation may increase or decrease with different shift values.

REMINDER: a positive shift allows us to see the effect of the indicator on the future values of the GDP, while a negative shift shows the effect of the GDP on the future values of the selected indicator.
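The year windows that the chart cell selects for each series can be written as a small standalone function. This sketch only restates the arithmetic used in `show_chart`; the name `aligned_windows` is hypothetical.

```python
def aligned_windows(min_year: int, max_year: int, shift: int):
    """Return the (first, last) year used for the GDP and for the indicator."""
    # Positive shift: the GDP window starts later and the indicator window ends earlier.
    # Negative shift: the indicator window starts later and the GDP window ends earlier.
    gdp_window = (min_year + max(shift, 0), max_year + min(shift, 0))
    ind_window = (min_year - min(shift, 0), max_year - max(shift, 0))
    return gdp_window, ind_window

forward = aligned_windows(2000, 2019, 3)    # indicator leads the GDP by 3 years
backward = aligned_windows(2000, 2019, -2)  # GDP leads the indicator by 2 years
print(forward, backward)
```

Both windows always contain the same number of years, which is what allows the two series to be compared point by point.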

In [19]:
# Widgets: dropdowns for country or area, select a particular country or area and select an indicator; and an intslider to choose the shift to apply.

dropdown_select = widgets.Dropdown(
    options = show_list,
    description = 'Select: '
)

dropdown_show_e = widgets.Dropdown(
    options = selector_dict[dropdown_select.value],
    description = 'Show: '
)

dropdown_indicators = widgets.Dropdown(
    options = indicators_list,
    description = 'Indicator: '
)

intslider_shift = widgets.IntSlider(
    value = max(0, shifts_range.start),
    min = shifts_range.start,
    max = shifts_range.stop,
    step = shifts_range.step,
    description = 'Shifts: ')

def change_selection (selector):
    dropdown_show_e.options = selector_dict[selector]

widgets.interact(change_selection, selector = dropdown_select)

def show_chart(element, indicator, shift):
    data_s = norm_df.loc[norm_df.index.get_level_values(dropdown_select.value) == element, [col_gdp, indicator]].groupby(level = col_year).median()

    min_year_gdp = min_year + max(shift, 0)
    max_year_gdp = max_year + min(shift, 0)

    min_year_ind = min_year - min(shift, 0)
    max_year_ind = max_year - max(shift, 0)

    norm_gdp = norm(data_s.loc[min_year_gdp : max_year_gdp, [col_gdp]], None)
    norm_ind = norm(data_s.loc[min_year_ind : max_year_ind, [indicator]], None)

    plt.figure(figsize = (8,8))
    plt.plot(
                #data_s.loc[min_year_gdp:max_year_gdp, col_gdp].reset_index(drop = True),
                norm_gdp.index.get_level_values(col_year),
                norm_gdp.reset_index(drop = True),
                color = "red", label = col_gdp)
    plt.plot(
                #data_s.loc[min_year_ind:max_year_ind, indicator].reset_index(drop = True),
                norm_ind.index.get_level_values(col_year),
                norm_ind.reset_index(drop = True),
                color = "green", label = indicator)
    plt.legend(loc = "lower right")

widgets.interact(show_chart, element = dropdown_show_e, indicator = dropdown_indicators, shift = intslider_shift)


### CORRELATION EVOLUTION BASED ON SHIFTING

This last chart lets us visualize how the correlation value changes with every shift we apply. This way, we can explore the interaction between the GDP and the selected indicator as we did in the previous chart, but this time in a more global, quantitative way.

Once again, we may choose to see countries or regions, select one of either group and the indicator whose correlation we want to visualize.

In [18]:
# Widgets: dropdowns for country or region, for a particular country or region, and for an indicator.
dropdown_select_corr = widgets.Dropdown(
    options = show_list,
    description = 'Select: '
)

dropdown_show_corr = widgets.Dropdown(
    options = selector_dict[dropdown_select_corr.value],
    description = 'Show: '
)

dropdown_indicators_corr = widgets.Dropdown(
    options = indicators_list,
    description = 'Indicator: '
)

# Refresh the list of countries or regions whenever the view mode changes.
def change_selection_corr(selector):
    dropdown_show_corr.options = selector_dict[selector]

# Plot the correlation between the indicator and the GDP for every computed shift.
def show_evolution(element, indicator):

    df_m = show_dict[dropdown_select_corr.value]
    df_s = df_m.loc[df_m.index.get_level_values(dropdown_select_corr.value) == element, indicator]

    plt.figure(figsize = (8,8))
    plt.plot(
                df_s.index.get_level_values(col_shift),
                df_s.reset_index(drop = True),
                color = "green", label = indicator)
    plt.legend(loc = "lower right")

# Bind the selector of this cell, so that show_evolution reads the view mode being displayed.
widgets.interact(change_selection_corr, selector = dropdown_select_corr)

widgets.interact(show_evolution, element = dropdown_show_corr, indicator = dropdown_indicators_corr)
