Examining the results from the TimeseriesModelling file across various time windows showed that the hypothesis that our exogenous sentiment score contains more information than other publicly available sentiment scores, exemplified by the MCSI, can only be tested by including lagged values of the sentiment score. To assess the improvement in prediction, we fit ARMAX(1,1) models on a rolling basis, using the past 36 months to predict the following month's return and its sign. This lets us calculate average performance measures such as RMSE, MAPE, and accuracy (in predicting the correct return sign) for each of the models (Base, MCSI, Sentiment, and MCSI + Sentiment, named after the set of included exogenous variables; in the code and output below, the MCSI column and model are spelled msci and MSCI, and the combined model is labelled Both).
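
For reference, the largest of the four specifications (MCSI + Sentiment) can be written out as below. This is a sketch that assumes the statsmodels SARIMAX convention, in which exogenous variables enter as a regression with ARMA(1,1) errors and no constant is estimated unless a trend is requested. The symbols $\beta_1$, $\beta_2$, $\phi$, $\theta$, $u_t$, and $\varepsilon_t$ are notation introduced here for illustration; they do not come from the TimeseriesModelling file.

```latex
\begin{aligned}
\text{returns}_t &= \beta_1\,\text{msci}_{t-1} + \beta_2\,\text{sentiment}_{t-1} + u_t, \\
u_t &= \phi\,u_{t-1} + \varepsilon_t + \theta\,\varepsilon_{t-1},
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Under this notation, the Base model sets $\beta_1 = \beta_2 = 0$, the MCSI model sets $\beta_2 = 0$, and the Sentiment model sets $\beta_1 = 0$.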

In [16]:
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv("./tsdata/modeldata.csv", index_col='Date', parse_dates=True)
df.index.freq = "MS"
df['msci_lag1'] = df['msci'].shift(1)
df['sentiment_lag1'] = df['sentiment'].shift(1)
df = df.dropna()
print(df.head())

             returns      msci  sentiment  msci_lag1  sentiment_lag1
Date                                                                
2012-02-01  0.146806  0.011673   0.148969   0.198444       -0.008786
2012-03-01  0.083715  0.035019  -0.140396   0.011673        0.148969
2012-04-01  0.001341  0.007782  -0.019119   0.035019       -0.140396
2012-05-01 -0.047849  0.112840  -0.070178   0.007782       -0.019119
2012-06-01 -0.005572 -0.237354   0.241201   0.112840       -0.070178
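
The lagging step can be checked on a toy example. The sketch below uses invented values (not taken from modeldata.csv) and a hypothetical frame named toy to show that shift(1) places the previous month's value on each row and that dropna() then removes the first month, which has no predecessor.

```python
import pandas as pd

# Toy monthly series; the values are invented for illustration only.
idx = pd.date_range("2012-01-01", periods=4, freq="MS")
toy = pd.DataFrame({"sentiment": [0.10, -0.20, 0.30, 0.05]}, index=idx)

# shift(1) moves each value one row down, so row t holds the value from t-1.
toy["sentiment_lag1"] = toy["sentiment"].shift(1)

# The first row has no predecessor and becomes NaN; dropna() removes it.
toy = toy.dropna()
print(toy)
```

This matches the printed head above, where msci_lag1 on 2012-03-01 equals msci on 2012-02-01, so each forecast only uses exogenous information that was available one month earlier.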


In [17]:
def mape(actual, pred):
    actual, pred = np.array(actual), np.array(pred)
    return np.mean(np.abs((actual - pred) / actual)) * 100

def rmse(actual, pred):
    return np.sqrt(mean_squared_error(actual, pred))

results = pd.DataFrame(columns=['Date', 'Model', 'MAPE', 'RMSE', 'Correct Sign', 'AIC', 'BIC'])

window_size = 36

#loop over each time window, starting at index 36 up to the second last value of the df
for i in range(window_size, len(df) - 1):
    #select the training window: rows (i - window_size) to i - 1 (e.g. i = 36 covers indices 0 to 35)
    train = df.iloc[i - window_size:i]
    #select the test observation: the first row after the training window
    test = df.iloc[i:i+1]
    
    #build data sets
    y_train = train['returns']
    y_test = test['returns']

    X_msci_train = train[["msci_lag1"]]
    X_sent_train = train[["sentiment_lag1"]]
    X_both_train = train[["msci_lag1", "sentiment_lag1"]]

    X_msci_test = test[["msci_lag1"]]
    X_sent_test = test[["sentiment_lag1"]]
    X_both_test = test[["msci_lag1", "sentiment_lag1"]]

    #specify models
    models = {'Base': (y_train, None),
              'MSCI': (y_train, X_msci_train),
              'Sentiment': (y_train, X_sent_train),
              'Both': (y_train, X_both_train)}
    test_exog = {'Base': None, 'MSCI': X_msci_test, 'Sentiment': X_sent_test, 'Both': X_both_test}

    #for the current window, fit and evaluate each of the four models
    for model_name, (y, X) in models.items():
        #specify ARMA(1,1) model, with exogenous regressors where X is given (ARMAX)
        model = SARIMAX(y, X, order=(1, 0, 1), enforce_stationarity=True, enforce_invertibility=True)
        #fit model to the data
        fitted_model = model.fit(disp=False, maxiter=300)
        #predict
        prediction = fitted_model.get_forecast(steps=1, exog=test_exog[model_name]).predicted_mean
        #calculate measures: MAPE, RMSE, and a 0/1 indicator for a correctly predicted sign
        mape_score = mape(y_test.values, prediction.values)
        rmse_score = rmse(y_test.values, prediction.values)
        correct_sign = (np.sign(y_test.values) == np.sign(prediction.values)).astype(int)

        #append the results for this model and window to the performance data frame
        new_row = pd.DataFrame({
            'Date': [test.index[0]],
            'Model': [model_name],
            'MAPE': [mape_score],
            'RMSE': [rmse_score],
            'Correct Sign': [correct_sign[0]],
            'AIC': [fitted_model.aic],
            'BIC': [fitted_model.bic]
        })
        results = pd.concat([results, new_row], ignore_index=True)

#Calculate the average performance measures for each model
average_metrics = results.groupby('Model').agg({
    'MAPE': 'mean',
    'RMSE': 'mean',
    'AIC': 'mean',
    'BIC': 'mean',
    'Correct Sign': 'mean'
}).reset_index()

average_metrics.rename(columns={
    'MAPE': 'Average MAPE',
    'RMSE': 'Average RMSE',
    'Correct Sign': 'Correct Sign Percentage'
}, inplace=True)


average_metrics

       Model  Average MAPE  Average RMSE         AIC         BIC  Correct Sign Percentage
0       Base    138.083023      0.059705  -92.833047   -88.08249                  0.56383
1       Both    139.926247      0.059814  -90.445959  -82.528364                 0.595745
2       MSCI    132.120311      0.059778  -91.492446   -85.15837                 0.553191
3  Sentiment    142.347203      0.059677   -91.66231  -85.328234                  0.62766


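The results show that the ARMAX(1,1) model with our own sentiment score performs best on average in terms of RMSE (0.059677) and in accuracy of predicting the correct return sign (62.8%, against 56.4% for the Base model). The RMSE differences across models are small, however, and the other measures do not favour the Sentiment model: the MSCI model has the lowest average MAPE, and the Base model has the lowest average AIC and BIC.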
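
Two properties of these averages are worth keeping in mind. Each test set holds a single month, so the per-window RMSE is simply the absolute forecast error, and MAPE divides by a return that can be close to zero. The sketch below illustrates both effects with invented numbers; the arrays actual and pred are hypothetical and are not taken from modeldata.csv.

```python
import numpy as np

# Invented monthly returns and forecasts, for illustration only.
actual = np.array([0.080, -0.040, 0.001, 0.050])
pred = np.array([0.020, 0.010, 0.020, 0.030])

abs_err = np.abs(actual - pred)

# With one observation per window, the per-window RMSE equals the absolute
# error, so the "average RMSE" is in effect a mean absolute error.
per_window_rmse = np.sqrt((actual - pred) ** 2)
avg_of_rmse = per_window_rmse.mean()

# Pooling all forecasts first gives the conventional RMSE, which is never
# smaller than the average of the per-window values.
pooled_rmse = np.sqrt(np.mean((actual - pred) ** 2))

# Absolute percentage errors: the near-zero return (0.001) dominates the mean.
ape = abs_err / np.abs(actual) * 100

print(f"average of per-window RMSE: {avg_of_rmse:.4f}")
print(f"pooled RMSE:                {pooled_rmse:.4f}")
print("absolute percentage errors:", np.round(ape, 1))
```

A single near-zero return can push the average MAPE above 100%, which may explain the size of the Average MAPE column and suggests that RMSE and sign accuracy are the more stable measures for comparing these models.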