# Backtest Strategy

## Methodology:

To assess the predictive power of the sentiment factor calculated above, we implement a simple paper-portfolio backtest. There are several ways to construct such a portfolio. The alternatives we considered were:

- Act on the instantaneous sentiment signal calculated every day
    - Hold for 1 day
    - Hold for 5 days
- Act on the instantaneous sentiment signal, calculated by weighting each news item according to its impact on a given day
    - Act on the momentum of the sentiment signal
        - Transact S\&P 500
        - Transact VIX
    - Act on the persistent sentiment signal (rolling average of X days)
        - Transact S\&P 500
        - Transact VIX
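
The signal variants listed above can be sketched on synthetic data as follows. This is only an illustration: the data are random, the `window` and `anchor` values are hypothetical, and "momentum" is assumed here to mean the change in sentiment over the look-back window, which the text does not define.

```python
import numpy as np
import pandas as pd

# Synthetic daily "Positive" probabilities (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2012-01-02", periods=60)
positive = pd.Series(rng.uniform(0.2, 0.6, size=len(dates)), index=dates, name="Positive")

window = 5    # hypothetical look-back (the "X days" in the list above)
anchor = 0.4  # hypothetical neutral level for the sentiment probability

signals = pd.DataFrame({
    # Same-day sentiment relative to the anchor.
    "instantaneous": positive - anchor,
    # Assumed definition of momentum: change in sentiment over the window.
    "momentum": positive.diff(window),
    # Rolling average of sentiment relative to the anchor.
    "persistent": positive.rolling(window).mean() - anchor,
})

# Lag by one day so that each position uses only information available
# before the trading day, then map to +1 (buy) / -1 (sell).
positions = np.sign(signals.shift(1))
print(positions.tail())
```

Each column of `positions` can then be multiplied by the daily return of the traded instrument to obtain a paper-portfolio return series.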

In [1]:
import numpy as np
import pandas as pd
from pandas_datareader import data as pdr
from math import sqrt

In [2]:
path = r''
file_2 = 'data.csv'
file_1 = 'probs.csv'

In [3]:
data = pd.read_csv(path+file_2, header=[0]).iloc[:,1:]
probs = pd.read_csv(path+file_1, header=[0]).iloc[:,1:]
vix = pd.read_excel(path+'VIX.xlsx', sheet_name='Sheet2',header=[0], index_col=[0])
vix.index = pd.to_datetime(vix.index)
spx = pdr.get_data_yahoo("^GSPC", start="2006-01-01", end="2014-04-30")
spx['Returns'] = spx['Adj Close'].pct_change()
spx.loc[:, 'Vol'] = sqrt(252)*spx['Returns'].rolling(33).std()

In [4]:
merged_df = data.merge(probs, left_index=True, right_index=True)
merged_df['Date'] = pd.to_datetime(merged_df['Date'])
merged_df = merged_df.rename(columns={'0': "Negative", '1': "Positive"})
merged_df = merged_df.set_index('Date')#.loc['2012-1-1':]

merged_df = merged_df.merge(spx, left_index=True, right_index=True, how='left').dropna(subset=['Close'])

In [5]:
day_grp = merged_df.groupby(merged_df.index)['Positive'].mean().to_frame('Positive')
day_grp['Negative'] = merged_df.groupby(merged_df.index)['Negative'].mean()
day_grp['Positive_Max'] = merged_df.groupby(merged_df.index)['Positive'].max()
day_grp['Positive_75'] = merged_df.groupby(merged_df.index)['Positive'].quantile(.75)
day_grp['Positive_25'] = merged_df.groupby(merged_df.index)['Positive'].quantile(.25)
day_grp['Positive_10'] = merged_df.groupby(merged_df.index)['Positive'].quantile(.1)
day_grp['Negative_Max'] = merged_df.groupby(merged_df.index)['Negative'].max()
day_grp['Negative_75'] = merged_df.groupby(merged_df.index)['Negative'].quantile(.75)
day_grp['Negative_25'] = merged_df.groupby(merged_df.index)['Negative'].quantile(.25)
day_grp['Negative_10'] = merged_df.groupby(merged_df.index)['Negative'].quantile(.10)
day_grp = day_grp.merge(spx, left_index=True, right_index=True, how='left')
day_grp = day_grp.sort_index(axis=0)

In [6]:
day_grp['Rolling_Positive'] = day_grp['Positive'].rolling(20).mean()

The correlations among several of these factors are shown below:

In [7]:
day_grp[['Rolling_Positive','Positive', 'Returns', 'Vol', 'Close']].corr()

| | Rolling_Positive | Positive | Returns | Vol | Close |
|---|---|---|---|---|---|
| Rolling_Positive | 1.0 | 0.393633 | -0.017788 | 0.08447 | 0.292111 |
| Positive | 0.393633 | 1.0 | 0.025746 | 0.074213 | 0.119236 |
| Returns | -0.017788 | 0.025746 | 1.0 | 0.112009 | 0.030711 |
| Vol | 0.08447 | 0.074213 | 0.112009 | 1.0 | -0.459332 |
| Close | 0.292111 | 0.119236 | 0.030711 | -0.459332 | 1.0 |


We note that Rolling_Positive, the average positive sentiment over the past 20 days, has a noticeable correlation with Close (about 0.29), which is higher than that of the same-day Positive signal (about 0.12).

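Thus, we implemented the strategy that acts on the persistent sentiment signal (a 20-day rolling average of positive sentiment) and transacts the S\&P 500 on the following day.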


We run the above on the test dataset, which ranges from 15th Dec, 2011 to 20th Nov, 2013. With the 20-day averaging, we have signals from 17th Jan, 2012 to 20th Nov, 2013. This gives us approximately 2 years of signal data to test on. Because of the bias in the signal, we anchor on 0.4, with the factor signalling buy if p $\ge$ 0.4 and sell if p $<$ 0.4.
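
The gap between the start of the test data and the first usable signal follows from the 20-day window plus the one-day trading delay. The sketch below checks this on a placeholder series. The holiday list is an assumption about the trading calendar over this window (the source data may contain gaps that this does not model), and the sentiment values are dummies, since only the index matters here.

```python
import pandas as pd

# Market holidays inside the warm-up window (assumed calendar, not taken from the data).
holidays = ["2011-12-26", "2012-01-02", "2012-01-16"]
trading_days = pd.bdate_range("2011-12-15", "2012-01-31", freq="C", holidays=holidays)

# Placeholder sentiment values; only the dates matter for this check.
positive = pd.Series(0.5, index=trading_days)

# 20-day rolling mean, then a one-day delay so that we trade on the next day.
factor = positive.rolling(20).mean().shift(1)

first_signal_day = factor.first_valid_index()
print(first_signal_day.date())
```

Under this calendar, the first 20 trading days end on 13 January 2012, so the first tradable signal falls on the next trading day, 17 January 2012, in line with the dates quoted above.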

## Backtest: Signal and Performance

Implementing the above signal we get the following paper portfolio characteristics:



We calculate the different variables as:

   - Signal: For a given day, we compute the signal from the sentiment of the previous 20 days. We then subtract the bias anchor (0.4) and take the sign, buying if the result is $\ge$ 0 and selling if it is < 0
   - Information Coefficient (IC): the correlation coefficient between the signal on a given day and the returns on that day
   - Returns: the return of the paper portfolio on a given day, calculated as signal times returns
        - Summary statistics such as mean and volatility
   - Risk free rate (RFR): the prevailing risk-free rate of return during 2012, taken as 2\% p.a.
   - Sharpe Ratio (SR): the risk-adjusted excess return of the paper portfolio
   - Long Only: another simulated portfolio, where we buy and hold the S\&P 500 for the given period
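
The definitions above can be written compactly as follows. The notation is introduced here for illustration and is not from the original text: $p_t$ is the daily mean positive sentiment, $r_t$ the daily S\&P 500 return, $f_t$ the demeaned factor, $s_t$ the signal, and $r_f$ the annual risk-free rate. Annualisation assumes 252 trading days, as in the code.

$$
f_t = \frac{1}{20}\sum_{k=1}^{20} p_{t-k} - 0.4,
\qquad
s_t =
\begin{cases}
+1 & \text{if } f_t \ge 0 \\
-1 & \text{if } f_t < 0
\end{cases}
$$

$$
r^{P}_t = s_t \, r_t,
\qquad
\mathrm{IC} = \operatorname{corr}(s_t, r_t)
$$

$$
\mu = 252 \cdot \operatorname{mean}(r^{P}_t),
\qquad
\sigma = \sqrt{252} \cdot \operatorname{std}(r^{P}_t),
\qquad
\mathrm{SR} = \frac{\mu - r_f}{\sigma}
$$

The monthly figures reported in the results are obtained from the annualised ones as $\mu / 12$ and $\sigma / \sqrt{12}$. For the long-only portfolio, $s_t = 1$ on every day.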


The results suggest that the sentiment signal has some predictive power over short horizons such as next-day returns. One could object that the index itself performed well over this period, so that the strategy may have behaved much like a long-only portfolio. To check this, we also simulate a long-only portfolio and compare the same statistics against it. The sentiment factor shows a marginal improvement over long only (monthly mean return of 0.017 versus 0.015 and Sharpe ratio of 1.535 versus 1.337, at the same monthly volatility of 0.035), which is a sign that the sentiment factor adds value over simply holding the index.

In [8]:
day_grp['Factor'] = day_grp['Positive'].rolling(20).mean()
day_grp['Factor'] = (day_grp['Factor'] - 0.4)  # Demeaning Anchor
day_grp['Factor'] = day_grp['Factor'].shift(1)  # Applying delay to trade next day
day_grp['Signal'] = day_grp['Factor'].apply(lambda x: 1 if x >= 0 else -1) # Buy when factor >= 0, sell when factor < 0

In [9]:
day_grp = day_grp.dropna(subset=['Factor'])
sf_r = day_grp['Returns']*day_grp['Signal']
sp_r = day_grp['Returns']

back_test_strat_output = pd.DataFrame(index=['IC', 'Mean (Monthly)', 'Vol (Monthly)', 'SR'],
                                      columns=['Sentiment Factor', 'Long Only'])

In [10]:
mean_sf = sf_r.mean()*252
std_sf = sf_r.std()*sqrt(252)
rf = 2./100.
sr_sf = (mean_sf - rf)/std_sf

back_test_strat_output.loc['IC', 'Sentiment Factor'] = day_grp[['Returns', 'Signal']].corr().iloc[1,0]
back_test_strat_output.loc['Mean (Monthly)', 'Sentiment Factor'] = mean_sf/12.
back_test_strat_output.loc['Vol (Monthly)', 'Sentiment Factor'] = std_sf/sqrt(12)
back_test_strat_output.loc['SR', 'Sentiment Factor'] = sr_sf

In [11]:
mean_sp = sp_r.mean()*252
std_sp = sp_r.std()*sqrt(252)
sr_sp = (mean_sp - rf)/std_sp
back_test_strat_output.loc['IC', 'Long Only'] = np.nan
back_test_strat_output.loc['Mean (Monthly)', 'Long Only'] = mean_sp/12.
back_test_strat_output.loc['Vol (Monthly)', 'Long Only'] = std_sp/sqrt(12)
back_test_strat_output.loc['SR', 'Long Only'] = sr_sp

In [12]:
pd.options.display.float_format = '{:,.3f}'.format
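# The frame was created empty, so its columns are object dtype; cast to float before rounding
back_test_strat_output = back_test_strat_output.astype(float).round(3)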
back_test_strat_output

| | Sentiment Factor | Long Only |
|---|---|---|
| IC | 0.065 | |
| Mean (Monthly) | 0.017 | 0.015 |
| Vol (Monthly) | 0.035 | 0.035 |
| SR | 1.535 | 1.337 |
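
As a rough consistency check, the Sharpe ratios can be approximately recovered from the rounded monthly figures in the table, using the 2\% annual risk-free rate and the annualisation used in the code (monthly mean times 12, monthly volatility times $\sqrt{12}$):

$$
\mathrm{SR}_{\text{Sentiment}} \approx \frac{12 \times 0.017 - 0.02}{\sqrt{12} \times 0.035} = \frac{0.184}{0.1212} \approx 1.52,
\qquad
\mathrm{SR}_{\text{Long Only}} \approx \frac{12 \times 0.015 - 0.02}{\sqrt{12} \times 0.035} = \frac{0.160}{0.1212} \approx 1.32
$$

These are close to the reported values of 1.535 and 1.337. The small differences come from using inputs rounded to three decimal places.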
