## Multi time series prediction - Granger causality 
---

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another ([Wikipedia](https://en.wikipedia.org/wiki/Granger_causality)).
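
As background, the test can be written as a comparison of two nested autoregressions. This is the standard textbook formulation, not something taken from the notebook itself; the symbols $y_t$ (series being forecast), $x_t$ (candidate predictor), $p$ (number of lags), $T$ (usable observations after lagging) and the coefficients are notation assumed for this sketch.

$$
\begin{aligned}
\text{restricted:}\quad & y_t = a_0 + \sum_{i=1}^{p} a_i\, y_{t-i} + e_t \\
\text{unrestricted:}\quad & y_t = a_0 + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{i=1}^{p} b_i\, x_{t-i} + u_t \\
H_0:\quad & b_1 = \dots = b_p = 0 \\
F \;=\; & \frac{(\mathrm{SSR}_r - \mathrm{SSR}_u)/p}{\mathrm{SSR}_u/(T - 2p - 1)}
\end{aligned}
$$

A small p-value for $F$ rejects $H_0$, which means that past values of $x$ improve the forecast of $y$ beyond what past values of $y$ already provide. It says nothing about causation in the everyday sense.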

In our case, for example, if cluster 3 contains Italy and Switzerland, do we have statistical evidence that Switzerland's series has predictive power for Italy's?

We build and test two ways of applying the Granger causality test:
1. Extract the list of countries that have at least one lag with p < 0.05
2. Extract each country together with the lags that have p < 0.05

We will follow the app structure we built (partially) in the FB Prophet notebook.

**Data Preparation**

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

In [8]:
gdp_df_imf = pd.read_excel('/content/gdp_imf_clustered.xlsx')

In [9]:
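print('List of countries:\n', gdp_df_imf['country'].values.tolist())
# only strip whitespace: .title() would mangle names such as 'Antigua and Barbuda' or 'Hong Kong SAR'
target_country = input('\n\nWhich country do you want to project the GDP for? Write the name exactly as in the list above: ').strip()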

List of countries:
 ['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Costa Rica', 'Croatia', 'Cyprus', 'Czech Republic', "Côte d'Ivoire", 'Democratic Republic of the Congo', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hong Kong SAR', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iraq', 'Ireland', 

In [10]:
target_country

'Canada'

In [11]:
# get the cluster
target_cluster = gdp_df_imf[gdp_df_imf['country']==target_country]['Cluster'].values[0]
# get the target df
target_df = gdp_df_imf[gdp_df_imf['Cluster']==target_cluster]
# list of countries in the cluster
target_cluster_countries = target_df['country'].tolist()

In [12]:
# show which countries share the target country's cluster

print(f'{target_country} shares cluster {target_cluster} with:\n{target_cluster_countries}\n\n')

Canada shares cluster 3 with:
['Australia', 'Brazil', 'Canada', 'France', 'India', 'Indonesia', 'Italy', 'Korea', 'Mexico', 'Netherlands', 'Russia', 'Spain', 'Turkey', 'United Kingdom']




In [13]:
user_choice = input('Do you want to provide projections for one of the countries? answer with yes or no\n ').strip().lower()

# the target country is never one of its own predictors
countries_to_drop = [target_country]

if user_choice == 'no':
  print('No Country Provided, we will predict them all💪\n')
else:
  user_input_proj = input('Which country do you have a projection for? Write the name exactly as in the list above!\n ').strip()
  if user_input_proj not in target_cluster_countries:
    print(f'The country you provided ({user_input_proj}) is not in cluster {target_cluster}, or it is not written correctly')
  else:
    # a country with a user-supplied projection does not need to be predicted
    countries_to_drop.append(user_input_proj)

# every remaining country in the cluster still needs a prediction
countries_to_predict = [c for c in target_cluster_countries if c not in countries_to_drop]

Do you want to provide projections for one of the countries? answer with yes or no
 no
No Country Provided, we will predict them all💪



In [14]:
countries_to_predict, len(countries_to_predict)

(['Australia',
  'Brazil',
  'France',
  'India',
  'Indonesia',
  'Italy',
  'Korea',
  'Mexico',
  'Netherlands',
  'Russia',
  'Spain',
  'Turkey',
  'United Kingdom'],
 13)

In [86]:
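# preprocessing the dataframe before running the prediction

# transpose so that each row is a year and each column a country
processed_df = target_df.drop(columns='Cluster').T.reset_index()
# the first row holds the country names: promote it to the header, then drop it
processed_df.columns = processed_df.loc[0]
processed_df = processed_df.iloc[1:,:].copy()
# after the transpose, the column labelled 'country' actually contains the years
processed_df = processed_df.rename(columns={"country": "year"})
processed_df['year'] = pd.to_datetime(processed_df['year'], format='%Y')
processed_df.head()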

| | year | Australia | Brazil | Canada | France | India | Indonesia | Italy | Korea | Mexico | Netherlands | Russia | Spain | Turkey | United Kingdom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1980-01-01 | 162.628 | 145.819 | 276.035 | 702.243 | 189.438 | 99.296 | 482.019 | 65.368 | 228.606 | 193.758 | 0.0 | 230.759 | 96.596 | 603.983 |
| 2 | 1981-01-01 | 188.067 | 167.583 | 307.246 | 618.954 | 196.535 | 110.848 | 437.124 | 72.934 | 293.61 | 162.4 | 0.0 | 204.588 | 97.865 | 587.652 |
| 3 | 1982-01-01 | 186.709 | 179.166 | 314.639 | 588.015 | 203.537 | 113.799 | 432.001 | 78.349 | 213.077 | 157.338 | 0.0 | 197.643 | 88.918 | 558.72 |
| 4 | 1983-01-01 | 179.151 | 143.652 | 341.863 | 562.499 | 222.049 | 103.149 | 448.304 | 87.761 | 173.714 | 153.179 | 0.0 | 172.856 | 84.968 | 532.476 |
| 5 | 1984-01-01 | 196.777 | 142.957 | 356.728 | 532.339 | 215.556 | 107.218 | 442.925 | 97.511 | 204.86 | 142.578 | 0.0 | 172.381 | 82.642 | 504.571 |
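
The reshaping step can be tried on a toy frame first. The following is a minimal sketch, assuming the Excel file has one row per country, a `Cluster` column, and one column per year labelled with an integer; the countries `A` and `B` and their values are made up for illustration. It uses `set_index('country').T` as an alternative to promoting the first row to the header, and casts to `float` because a transpose of mixed columns otherwise tends to leave `object` dtype.

```python
import pandas as pd

# hypothetical miniature of the clustered GDP table (values are invented)
toy = pd.DataFrame({
    "country": ["A", "B"],
    "Cluster": [3, 3],
    1980: [1.0, 2.0],
    1981: [1.5, 2.5],
})

# countries become columns, years become the index
wide = toy.drop(columns="Cluster").set_index("country").T.astype(float)

# turn the integer year labels into timestamps (1 January of each year)
wide.index = pd.to_datetime(wide.index.astype(str), format="%Y")
wide.index.name = "year"
wide.columns.name = None

# move the year back into an ordinary column, as in processed_df
wide = wide.reset_index()
print(wide)
```

The result has the same layout as `processed_df`: a `year` column followed by one numeric column per country.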


**Granger Causality Test**

In [16]:
# grangercausalitytests checks whether the SECOND column (Canada) Granger-causes the FIRST (Italy)
test = processed_df[['Italy','Canada']]
test.head()


| | Italy | Canada |
|---|---|---|
| 1 | 482.019 | 276.035 |
| 2 | 437.124 | 307.246 |
| 3 | 432.001 | 314.639 |
| 4 | 448.304 | 341.863 |
| 5 | 442.925 | 356.728 |


In [17]:
gc_res = grangercausalitytests(test, 7)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=0.0010  , p=0.9754  , df_denom=38, df_num=1
ssr based chi2 test:   chi2=0.0010  , p=0.9743  , df=1
likelihood ratio test: chi2=0.0010  , p=0.9743  , df=1
parameter F test:         F=0.0010  , p=0.9754  , df_denom=38, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=0.6179  , p=0.5448  , df_denom=35, df_num=2
ssr based chi2 test:   chi2=1.4124  , p=0.4935  , df=2
likelihood ratio test: chi2=1.3880  , p=0.4996  , df=2
parameter F test:         F=0.6179  , p=0.5448  , df_denom=35, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=0.6214  , p=0.6063  , df_denom=32, df_num=3
ssr based chi2 test:   chi2=2.2721  , p=0.5179  , df=3
likelihood ratio test: chi2=2.2084  , p=0.5303  , df=3
parameter F test:         F=0.6214  , p=0.6063  , df_denom=32, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=0.8209  , p=0.5225  , df_d

In [61]:
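# test every other country in the cluster as a predictor of Canada
for i in processed_df.columns[1:]:
  if i == 'Canada':
    continue  # testing Canada against itself is meaningless
  print(f'Granger causality test of Canada by {i}\n')
  # Canada first, candidate predictor second
  testing_df = processed_df[['Canada',i]]
  gc_res = grangercausalitytests(testing_df, maxlag=12)
  print('='*100)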

Granger causality test of Canada by Australia


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=0.8175  , p=0.3716  , df_denom=38, df_num=1
ssr based chi2 test:   chi2=0.8820  , p=0.3476  , df=1
likelihood ratio test: chi2=0.8727  , p=0.3502  , df=1
parameter F test:         F=0.8175  , p=0.3716  , df_denom=38, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=3.4502  , p=0.0429  , df_denom=35, df_num=2
ssr based chi2 test:   chi2=7.8862  , p=0.0194  , df=2
likelihood ratio test: chi2=7.1979  , p=0.0274  , df=2
parameter F test:         F=3.4502  , p=0.0429  , df_denom=35, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=2.2870  , p=0.0974  , df_denom=32, df_num=3
ssr based chi2 test:   chi2=8.3620  , p=0.0391  , df=3
likelihood ratio test: chi2=7.5761  , p=0.0556  , df=3
parameter F test:         F=2.2870  , p=0.0974  , df_denom=32, df_num=3

Granger Causality
number of lags (no zero) 4
ssr bas

In [57]:
df_tr = processed_df.iloc[:,1:]
df_tr.head()

| | Australia | Brazil | Canada | France | India | Indonesia | Italy | Korea | Mexico | Netherlands | Russia | Spain | Turkey | United Kingdom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 162.628 | 145.819 | 276.035 | 702.243 | 189.438 | 99.296 | 482.019 | 65.368 | 228.606 | 193.758 | 0.0 | 230.759 | 96.596 | 603.983 |
| 2 | 188.067 | 167.583 | 307.246 | 618.954 | 196.535 | 110.848 | 437.124 | 72.934 | 293.61 | 162.4 | 0.0 | 204.588 | 97.865 | 587.652 |
| 3 | 186.709 | 179.166 | 314.639 | 588.015 | 203.537 | 113.799 | 432.001 | 78.349 | 213.077 | 157.338 | 0.0 | 197.643 | 88.918 | 558.72 |
| 4 | 179.151 | 143.652 | 341.863 | 562.499 | 222.049 | 103.149 | 448.304 | 87.761 | 173.714 | 153.179 | 0.0 | 172.856 | 84.968 | 532.476 |
| 5 | 196.777 | 142.957 | 356.728 | 532.339 | 215.556 | 107.218 | 442.925 | 97.511 | 204.86 | 142.578 | 0.0 | 172.381 | 82.642 | 504.571 |


In [87]:
maxlag = 12

def grangers_causation_matrix(data, variables, test='ssr_ftest', maxlag=maxlag, verbose=False):
    """Return a matrix holding the smallest p-value over lags 1..maxlag.

    Rows (suffix _y) are the series being predicted, columns (suffix _x) are
    the candidate predictors: each cell tests whether x Granger-causes y.
    """
    df = pd.DataFrame(np.zeros((len(variables), len(variables))), columns=variables, index=variables)
    for c in df.columns:
        for r in df.index:
            # response first, predictor second
            test_result = grangercausalitytests(data[[r, c]], maxlag=maxlag, verbose=False)
            p_values = [round(test_result[lag][0][test][1], 4) for lag in range(1, maxlag + 1)]
            if verbose: print(f'Y = {r}, X = {c}, P Values = {p_values}')
            df.loc[r, c] = np.min(p_values)
    df.columns = [var + '_x' for var in variables]
    df.index = [var + '_y' for var in variables]
    return df

grangers_causation_matrix(df_tr, variables = df_tr.columns)

| | Australia_x | Brazil_x | Canada_x | France_x | India_x | Indonesia_x | Italy_x | Korea_x | Mexico_x | Netherlands_x | Russia_x | Spain_x | Turkey_x | United Kingdom_x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Australia_y | 1.0 | 0.123 | 0.0027 | 0.0359 | 0.006 | 0.02 | 0.0469 | 0.0107 | 0.0003 | 0.0065 | 0.0106 | 0.0018 | 0.1879 | 0.0026 |
| Brazil_y | 0.0091 | 1.0 | 0.0002 | 0.0038 | 0.0006 | 0.0011 | 0.0036 | 0.0099 | 0.0024 | 0.0007 | 0.0035 | 0.0001 | 0.0259 | 0.0002 |
| Canada_y | 0.0429 | 0.0173 | 1.0 | 0.1757 | 0.1107 | 0.002 | 0.2339 | 0.0041 | 0.0029 | 0.0326 | 0.0036 | 0.0568 | 0.0894 | 0.0184 |
| France_y | 0.3221 | 0.0155 | 0.1004 | 1.0 | 0.0572 | 0.2064 | 0.2262 | 0.0383 | 0.0159 | 0.0838 | 0.0775 | 0.0408 | 0.3247 | 0.0 |
| India_y | 0.0525 | 0.0052 | 0.0016 | 0.0379 | 1.0 | 0.567 | 0.0498 | 0.0004 | 0.0002 | 0.0034 | 0.0073 | 0.008 | 0.0178 | 0.0 |
| Indonesia_y | 0.1089 | 0.0884 | 0.0194 | 0.0228 | 0.0248 | 1.0 | 0.0109 | 0.0022 | 0.0036 | 0.0098 | 0.1085 | 0.002 | 0.0087 | 0.0008 |
| Italy_y | 0.5121 | 0.1217 | 0.275 | 0.2183 | 0.0731 | 0.4046 | 1.0 | 0.1003 | 0.035 | 0.3569 | 0.0871 | 0.1047 | 0.1842 | 0.0037 |
| Korea_y | 0.5715 | 0.6745 | 0.2293 | 0.0977 | 0.639 | 0.3592 | 0.2697 | 1.0 | 0.0155 | 0.3333 | 0.2687 | 0.2432 | 0.2957 | 0.0416 |
| Mexico_y | 0.1858 | 0.2761 | 0.1152 | 0.0261 | 0.1959 | 0.0876 | 0.0125 | 0.0925 | 1.0 | 0.0216 | 0.3312 | 0.0289 | 0.3352 | 0.018 |
| Netherlands_y | 0.3142 | 0.0122 | 0.1038 | 0.1565 | 0.0258 | 0.1354 | 0.6267 | 0.0082 | 0.0035 | 1.0 | 0.066 | 0.1183 | 0.1606 | 0.0 |
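
Reading a single row of this matrix already gives the first approach from the introduction: the countries with at least one lag below the threshold. The sketch below copies the `Canada_y` row from the output above (leaving out the `Canada_x` diagonal entry) and filters it; `canada_row` and `candidates` are names made up for this example. Because each cell is the minimum p-value over 12 lags, this is a permissive screen rather than a strict test.

```python
import pandas as pd

# minimum p-values from the Canada_y row of the matrix above
canada_row = pd.Series({
    "Australia": 0.0429, "Brazil": 0.0173, "France": 0.1757,
    "India": 0.1107, "Indonesia": 0.002, "Italy": 0.2339,
    "Korea": 0.0041, "Mexico": 0.0029, "Netherlands": 0.0326,
    "Russia": 0.0036, "Spain": 0.0568, "Turkey": 0.0894,
    "United Kingdom": 0.0184,
})

threshold = 0.05

# keep the predictors below the threshold, strongest evidence first
candidates = canada_row[canada_row < threshold].sort_values()
print(candidates)
```

The eight countries that survive the filter are the same ones collected in `important_countries` further down.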


In [100]:
lagsNumber = 12
threshold = 0.05

target_country_name = 'Canada'

countries = []   # countries with at least one significant lag
c_dic = dict()   # country -> list of its significant lags

for country in processed_df.columns[1:]:
  if country == target_country_name:
    continue  # testing the target against itself is meaningless
  print(f'Granger causality test of {target_country_name} by {country}\n')
  testing_df = processed_df[[target_country_name,country]]
  stats_test = grangercausalitytests(testing_df,maxlag = lagsNumber,verbose=False)
  # start fresh lists for every country so results do not accumulate across countries
  list_ps = []
  list_lags = []
  # loop over the results for each lag of this country's test
  for lag in range(1, lagsNumber + 1):
    p_value = round(stats_test[lag][0]['ssr_ftest'][1],4)
    if p_value < threshold:
      list_ps.append(p_value)
      list_lags.append(lag)

  if list_lags:
    countries.append(country)
  c_dic[country] = list_lags
  print(f'Lags that are statistically significant: {list_lags}\nWith P_Values {list_ps} respectively')
  print('='*100)

Granger causality test of Canada by Australia

Lags that are statistically significant: [2]
With P_Values [0.0429] respectively
Granger causality test of Canada by Brazil

Lags that are statistically significant: [1, 3, 4, 5, 6]
With P_Values [0.0256, 0.0173, 0.0261, 0.04, 0.0333] respectively
Granger causality test of Canada by France

Lags that are statistically significant: []
With P_Values [] respectively
Granger causality test of Canada by India

Lags that are statistically significant: []
With P_Values [] respectively
Granger causality test of Canada by Indonesia

Lags that are statistically significant: [12]
With P_Values [0.0

In [101]:
important_countries = list(dict.fromkeys(countries))
important_countries

['Australia',
 'Brazil',
 'Indonesia',
 'Korea',
 'Mexico',
 'Netherlands',
 'Russia',
 'United Kingdom']

In [98]:
df_test = processed_df[['Canada','United Kingdom']]

lags = 13

testing = grangercausalitytests(df_test,maxlag=lags, verbose=False)

# result for lag 1; the tuple is (F statistic, p-value, df_denom, df_num)
testing[1][0]['ssr_ftest']

# for i in range(lags):
#   print(testing[i+1][0]['ssr_ftest'][3])

# lagsList = []
# p_list = []

# for x in range(13):
  
#   p_value1 = round(testing[x+1][0]['ssr_ftest'][1],4)
#   #p_list.append(p_value1)
#   lag1 = testing[x+1][0]['ssr_ftest'][3]
#   #lagsList.append(lag1)
#   print(p_value1, lag1)
#   print('---'*5)



# lagsList, p_list


(1.0305918795634827, 0.3164401425150084, 38.0, 1)
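
The four entries of this tuple are the F statistic, its p-value, the denominator degrees of freedom and the numerator degrees of freedom (which equals the lag). As a quick consistency check, the p-value can be recomputed from the other three with the F distribution's survival function. This is a sketch that assumes SciPy is available (it is not imported elsewhere in this notebook); the variable names are made up for illustration.

```python
from scipy import stats

# tuple copied from the cell output above
f_stat, p_value, df_denom, df_num = (1.0305918795634827, 0.3164401425150084, 38.0, 1)

# P(F > f_stat) under the null hypothesis of no Granger causality
recomputed_p = stats.f.sf(f_stat, df_num, df_denom)

print(round(recomputed_p, 4), round(p_value, 4))
```

With p around 0.32 at lag 1, this single lag gives no evidence that the United Kingdom series helps forecast Canada's.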

# End