## Vector Autoregression (VAR) Model for Economic and Financial Data Analysis
This Jupyter notebook shows how to perform a Vector Autoregression (VAR) analysis on economic and financial data for a single country.
The data comes from two CSV files, 'world_bank_data.csv' and 'stock_data.csv', both downloaded by the data.py script.

### Merging datasets
The 'world_bank_data.csv' file contains economic data from the World Bank, while 'stock_data.csv' contains financial data. The two datasets are merged based on the year and the country code.
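
To make the merge step concrete, here is a minimal sketch on toy data. The column names follow the notebook; the values are invented for illustration and are not taken from the real CSV files.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (made-up values, notebook column names).
economic = pd.DataFrame({
    "date": [2019, 2020, 2019],
    "countryiso3code": ["DEU", "DEU", "FRA"],
    "value": [1.1, -3.8, 1.8],
})
financial = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-09-02", "2020-03-02", "2019-03-01"]),
    "Country": ["DEU", "DEU", "DEU", "ITA"],
    "Close": [100.0, 110.0, 90.0, 50.0],
})

# Reduce the dates to calendar years so both frames share a merge key.
financial["Date"] = financial["Date"].dt.year

# pd.merge defaults to an inner join: only year/country pairs that occur
# in both frames survive.
merged = pd.merge(
    economic, financial,
    left_on=["date", "countryiso3code"],
    right_on=["Date", "Country"],
)
print(merged[["date", "Country", "value", "Close"]])
```

In this toy output the 2019 value for DEU appears twice, once per stock observation in that year, while FRA and ITA drop out because they have no match. If one row per year is wanted, the stock data could be aggregated before merging, for example with `groupby(['Country', 'Date'])['Close'].last()`; whether that suits the analysis is a modeling choice.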

### The 'var_country' function
The var_country function is the core of this notebook. It takes the merged data and a country code as input and performs the following steps:

- Data Preprocessing: The function first selects the 'value' and 'Close' columns from the data, which represent the economic and financial indicators, respectively. These columns are converted to numeric types and then standardized using the StandardScaler from sklearn.preprocessing. Note that the standardization is computed over the full merged dataset, before the data is filtered by country.

- Data Filtering: The function then filters the data for the specified country. If no data is available for the country, or if the data contains NaN or infinite values, the function prints a message and returns without fitting a model.

- Stationarity Check: Before applying the VAR model, the function checks whether each time series is stationary using the Augmented Dickey-Fuller test. If the p-value for either series is greater than 0.05, the null hypothesis of a unit root cannot be rejected, so the function suggests differencing the series and returns.

- VAR Model: If the series are stationary, the function creates a VAR model using the VAR class from statsmodels.tsa.vector_ar.var_model. The lag order is chosen by the Akaike information criterion (AIC) from candidate orders up to a maximum of 12, and the model is fitted with that order.

- Results: Finally, the function prints the summary of the VAR model, which includes the coefficients, standard errors, t-statistics, and p-values for each predictor, as well as some overall model fit statistics.
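
The preprocessing step can be sketched on a toy frame as follows. The data and the non-default index are assumptions made for illustration; the point is that `StandardScaler` returns a plain NumPy array, so the index has to be passed back explicitly if later boolean masks are to line up with the source rows.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy input with one non-numeric entry and a deliberately non-default index.
raw = pd.DataFrame(
    {"value": ["1.0", "2.0", "bad", "4.0"], "Close": [10, 20, 30, 40]},
    index=[10, 11, 12, 13],
)

# Entries that cannot be parsed (here "bad") become NaN instead of raising.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Standardize each column to mean 0 and variance 1, then restore the labels.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(numeric),
    index=numeric.index,
    columns=numeric.columns,
)
print(scaled)
```

The NaN produced by the coercion is carried through the scaler unchanged, which is why the function's later NaN check is still needed.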

### Usage
To use this notebook, replace 'DEU' with your desired country code in the final code cell and run all cells. The output is the VAR model summary for the specified country.



In [None]:
# Import pandas, a data manipulation and analysis library
import pandas as pd

# Import VAR model from statsmodels for time series analysis
from statsmodels.tsa.vector_ar.var_model import VAR

# Import StandardScaler from sklearn for standardizing data
from sklearn.preprocessing import StandardScaler

# Import numpy for numerical computing
import numpy as np

# Import adfuller (Augmented Dickey-Fuller) function from statsmodels.
from statsmodels.tsa.stattools import adfuller

In [None]:
# Load the World Bank data from a CSV file (downloaded from 'data.py' file) into a pandas DataFrame 
economic_data = pd.read_csv('world_bank_data.csv')

# Load the stock data from a CSV file (downloaded from 'data.py' file) into a pandas DataFrame
financial_data = pd.read_csv('stock_data.csv')

# Convert the 'date' column in the economic_data DataFrame from string to datetime format, 
# then extract the year and replace the original 'date' column with the year
economic_data['date'] = pd.to_datetime(economic_data['date'], format='%Y').dt.year

# Convert the 'Date' column in the financial_data DataFrame from string to datetime format, 
# then extract the year and replace the original 'Date' column with the year
financial_data['Date'] = pd.to_datetime(financial_data['Date']).dt.year

# Merge the economic_data and financial_data DataFrames based on the year and country code.
# The 'left_on' parameter specifies the columns to use from the left DataFrame (economic_data),
# and the 'right_on' parameter specifies the columns to use from the right DataFrame (financial_data).
merged_data = pd.merge(economic_data, financial_data, left_on=['date', 'countryiso3code'], right_on=['Date', 'Country'])

In [None]:
# Define a function to perform Vector Autoregression (VAR) on data for a specific country
def var_country(data, country):
    # Select the 'value' and 'Close' columns from the data, convert them to numeric types, 
    # and handle any errors during conversion by replacing the problematic values with NaN
    data_for_regression = data[['value', 'Close']].apply(pd.to_numeric, errors='coerce')

    # Initialize a StandardScaler object to standardize the features to have mean=0 and variance=1
    scaler = StandardScaler()
    # Fit the scaler to the data and transform the data.
    # Then convert the result back to a DataFrame, keeping the original index and column names
    # so that the boolean mask built from 'data' below aligns row by row
    data_for_regression = pd.DataFrame(
        scaler.fit_transform(data_for_regression),
        index=data_for_regression.index,
        columns=data_for_regression.columns,
    )

    # Print a message indicating the start of the analysis for the current country
    print(f"Analyzing data for {country}...")
    
    # Filter the standardized data to include only the rows for the current country
    data_country = data_for_regression[data['Country'] == country]

    # If the filtered data is empty (i.e., there's no data for the current country), print a message and exit the function
    if data_country.empty:
        print(f"No data available for {country}")
        return

    # If the filtered data contains any NaN or infinite values, print a message and exit the function
    if data_country.isna().any().any() or np.isinf(data_country).any().any():
        print(f"Data for {country} contains NaN or infinite values")
        return

    # For each column in the filtered data, perform the Augmented Dickey-Fuller test to check for stationarity
    # If the p-value is greater than 0.05, the series is likely non-stationary, so print a message and exit the function
    for column in data_country.columns:
        result = adfuller(data_country[column])
        if result[1] > 0.05:
            print(f"The {column} series is not stationary. Consider differencing.")
            return

    # Create a VAR model with the filtered data
    model = VAR(data_country)

    # Fit the model, letting the Akaike information criterion (AIC) choose the lag order
    # among candidates up to a maximum of 12
    results = model.fit(maxlags=12, ic='aic')

    # Print the summary of the model fit, which includes the coefficients, standard errors, t-statistics, and p-values
    print(results.summary())
    print("\n\n")  # Print two newline characters for better readability in the output

In [None]:
# Example usage of the var_country function

# Set the variable 'country' to the ISO code of the country you want to analyse.
# In this case, 'DEU' stands for Germany. Replace 'DEU' with the code of your desired country.
country = 'DEU'

# Call the function var_country with the merged data and the specified country.
# This will perform a Vector Autoregression (VAR) analysis on the data for the specified country.
var_country(merged_data, country)

### Comment

The Vector Autoregression (VAR) model was used to analyze the data for Germany (DEU). The lag order was chosen by the Akaike information criterion (AIC) from candidate orders up to a maximum of 12.

For the 'value' equation, the most significant predictor is the first lag of 'value' itself (L1.value), with a coefficient of 0.997370 and a p-value of 0.000, indicating strong evidence against the null hypothesis of the coefficient being zero. However, the coefficients for other lags of 'value' and 'Close' are not statistically significant (p-values are close to 1).

For the 'Close' equation, the constant term and the lags of 'Close' are statistically significant (p-values close to 0). The coefficients for the lags of 'value' are not statistically significant.

The correlation matrix of residuals shows a very low correlation between the residuals of 'value' and 'Close', suggesting that there is little contemporaneous linear relationship left between the two series once their lags are accounted for, which simplifies the interpretation of shocks to each series.

### Note

Please note that the interpretation of VAR model results can be complex and requires a good understanding of the subject matter and the data. The statistical significance of a variable does not necessarily imply its practical significance. It's also important to check other diagnostic measures and consider the model's assumptions and limitations.