## Quant Trading and Analysis with Python: Final Exam 2021 (Retake)

### Solutions

Many academic papers report a so-called low-risk anomaly, for example betting against beta.

Your task is to provide a simple test of this anomaly using a particular trading strategy:

* Using Pandas Datareader, download daily returns from 01.01.2000 for the 25 Fama-French portfolios
sorted by the ME and BTM characteristics, `25_Portfolios_5x5_Daily` (rename the portfolios as `ptf1` to `ptf25`).
Organize the returns into a data frame indexed by dates. Download daily returns for the 3-factor model (market, size,
value) from 01.01.2000.

* For each portfolio on each day, compute the past volatility and the average past correlation with the other portfolios
over the past 21 trading days (use window = 21). Check how the levels of past volatility and average correlation are
related for each portfolio.

* Using a simple unconditional cross-sectional test, check how average returns are related to the average 21-day volatility
and average 21-day correlation computed in the previous task.

* Instead of a regression, test the same claim using daily rebalanced zero-cost portfolios: on each day, go long
the Fama-French portfolios with the value of the characteristic in the top 20%, and short those with the value in the bottom 20%.
Use equal weighting for both the long and the short side. Use the average volatility and the average correlation computed on each day separately to rebalance the portfolios after the same day's close.
Compute the average return and the Sharpe ratio of both portfolios.

* Discuss the results. Reflect on your evidence about the low risk anomaly.

In [60]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import pandas_datareader as web
import datetime

In [61]:
startdt  = datetime.datetime(2000, 1, 1)
# sets = web.famafrench.get_available_datasets()

d1          = web.DataReader('25_Portfolios_5x5_Daily','famafrench',start = startdt)
ret         = d1[0]/100 # convert from percent to decimal returns
ret.columns = ['ptf' + str(z) for z in np.arange(1,ret.shape[1]+1)]

d1  = web.DataReader('F-F_Research_Data_Factors_daily','famafrench',start = startdt) # daily 3-factor data, as the task requires
ff  = pd.DataFrame(d1[0]/100)

In [62]:
vols = ret.rolling(window = 21).std().dropna()
corrs = ret.rolling(window = 21).corr().mean(axis = 1).unstack()-1/25 # simple approximate adjustment for the diagonal of 1
corrs = corrs.reindex_like(vols)

print(corrs.head())

clevel = [np.corrcoef(x = vols[z], y = corrs[z])[0,1] for z in vols.columns]
print(clevel)

# the correlations in levels are positive, but not extremely high

                ptf1      ptf2      ptf3      ptf4      ptf5      ptf6  \
Date                                                                     
2000-02-01  0.605859  0.647110  0.610917  0.743167  0.657495  0.707407   
2000-02-02  0.635544  0.638084  0.680411  0.735563  0.653285  0.701130   
2000-02-03  0.547596  0.559851  0.607489  0.692102  0.590851  0.637303   
2000-02-04  0.527869  0.554290  0.604135  0.686439  0.588710  0.629772   
2000-02-07  0.524813  0.549771  0.608247  0.690344  0.591095  0.629449   

                ptf7      ptf8      ptf9     ptf10  ...     ptf16     ptf17  \
Date                                                ...                       
2000-02-01  0.701203  0.716835  0.650754  0.726479  ...  0.678073  0.722287   
2000-02-02  0.695804  0.707227  0.638016  0.721645  ...  0.697756  0.720999   
2000-02-03  0.628213  0.656494  0.535786  0.660808  ...  0.650755  0.654750   
2000-02-04  0.628331  0.646950  0.535854  0.660855  ...  0.662062  0.652767   
2000-02
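
The `-1/25` shift in the cell above removes the diagonal entry of 1 from the row sum but still divides by 25 rather than 24, so it equals the exact average off-diagonal correlation scaled by 24/25. Because this is a uniform scaling, rankings and t-statistics are unaffected. A minimal sketch of the exact version on synthetic data follows; the names `toy_ret`, `avg_corr_exact`, and `avg_corr_approx` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for 5 hypothetical portfolios (illustration only)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2000-01-03", periods=60)
toy_ret = pd.DataFrame(rng.normal(0.0, 0.01, size=(60, 5)),
                       index=dates,
                       columns=[f"ptf{i}" for i in range(1, 6)])

n = toy_ret.shape[1]
window = 21

# Rolling pairwise correlations: rows are (date, portfolio), one column per portfolio
roll_corr = toy_ret.rolling(window=window).corr()

# The row mean includes the diagonal entry of 1
row_mean = roll_corr.mean(axis=1).unstack()

# Exact average correlation with the other n - 1 portfolios
avg_corr_exact = ((row_mean * n - 1) / (n - 1)).dropna()

# Shortcut of the same form as in the cell above (1/25 there, 1/n here)
avg_corr_approx = (row_mean - 1 / n).dropna()

print(avg_corr_exact.tail(3))
```

With 25 portfolios the two versions differ by a factor of 24/25, so the levels printed above would be about 4% higher under the exact adjustment.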

In [63]:
# annualize for simpler interpretation (251 trading days per year are used in this cell)
mret = ret.mean(axis = 0)*251
mret.name = 'Mean returns, p.a.'
mvols = vols.mean(axis = 0)*np.sqrt(251)
mvols.name = 'Average Volatility'
mcorrs = corrs.mean(axis = 0)
mcorrs.name = 'Average Correlation'

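# add an intercept to each regressor
mvols = sm.add_constant(mvols)
mcorrs = sm.add_constant(mcorrs)

# cross-sectional OLS: 25 mean returns on one characteristic at a time
resvols = sm.OLS(mret, mvols).fit()
rescorrs = sm.OLS(mret, mcorrs).fit()

# print both regression tables
print(resvols.summary())
print(rescorrs.summary())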


                            OLS Regression Results                            
Dep. Variable:     Mean returns, p.a.   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.043
Method:                 Least Squares   F-statistic:                  0.002021
Date:                Fri, 07 May 2021   Prob (F-statistic):              0.965
Time:                        15:44:59   Log-Likelihood:                 58.361
No. Observations:                  25   AIC:                            -112.7
Df Residuals:                      23   BIC:                            -110.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  0.1198      0

The results are as follows:

                          coef    std err          t      P>|t|      [0.025      0.975]
Average Volatility      0.0102      0.227      0.045      0.965      -0.459       0.479
Average Correlation     0.4938      0.107      4.628      0.000       0.273       0.714

So the average volatility does not appear to be related to average returns (t = 0.045, p = 0.965), whereas the average
correlation has a strong, statistically significant positive relation to returns (coefficient 0.49, t = 4.63).
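
Written out, the two cross-sectional regressions estimated above take the following form. The notation is introduced here for illustration only and does not appear in the original solution.

```latex
\begin{aligned}
\bar r_i &= a_\sigma + b_\sigma \,\bar\sigma_i + \varepsilon_i, \\
\bar r_i &= a_\rho + b_\rho \,\bar\rho_i + u_i, \qquad i = 1, \dots, 25 .
\end{aligned}
```

Here $\bar r_i$ is the annualized mean return of portfolio $i$, $\bar\sigma_i$ is the time-series average of its annualized 21-day volatility, and $\bar\rho_i$ is the time-series average of its 21-day average correlation with the other portfolios. The reported estimates correspond to $b_\sigma \approx 0.01$ and $b_\rho \approx 0.49$.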

In [64]:
# put the data into one panel
retl = ret.shift(-1).stack() # pair each day's characteristics with the next day's return
retl.name = 'ret_fw'

volsl = vols.stack()
volsl.name = 'vols'

corrsl = corrs.stack()
corrsl.name = 'corrs'

data = pd.concat([retl,volsl, corrsl], axis = 1).dropna()
data

                    ret_fw      vols     corrs
Date
2000-02-01 ptf1     0.0119  0.020788  0.605859
2000-02-01 ptf10    0.0166  0.013730  0.726479
2000-02-01 ptf11    0.0213  0.024786  0.716138
2000-02-01 ptf12    0.0103  0.018776  0.749728
2000-02-01 ptf13    0.0018  0.012780  0.640608
...        ...         ...       ...       ...
2021-03-30 ptf5     0.0073  0.032028  0.670461
2021-03-30 ptf6     0.0306  0.029019  0.665896
2021-03-30 ptf7     0.0141  0.022665  0.776958
2021-03-30 ptf8     0.0107  0.020406  0.783966


In [65]:
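def long_short_ptf(data, charact, perf_ret = 'ret_fw', perc = 20):
    """Return of an equal-weighted long-short portfolio for one day's cross-section.

    Long the assets with `charact` in the top `perc` percent, short those in the bottom `perc` percent.
    """
    col_name = perf_ret
    # percentile breakpoints of the characteristic on this day
    l,h = np.percentile(data[charact].values, [perc, 100-perc])
    longs_ = data[data[charact] >= h]
    shorts_ = data[data[charact] <= l]
    # equal weights: +1/N_long on each long position, -1/N_short on each short position
    res = np.append(longs_[col_name].values / longs_.index.size, - shorts_[col_name].values / shorts_.index.size)
    if len(res) == 0:
        return np.nan
    else:
        return res.sum()

# apply the rule day by day; ret_fw is the next day's return, so positions use only past information
str_vols = data.groupby(['Date']).apply(lambda x: long_short_ptf(data = x, charact = 'vols'))
str_corrs = data.groupby(['Date']).apply(lambda x: long_short_ptf(data = x, charact = 'corrs'))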
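
As a sanity check of the sorting rule, the sketch below applies the same top-minus-bottom logic to a toy cross-section where the answer can be worked out by hand. The function `long_short_return` and the data in `toy_day` and `panel` are hypothetical. The function takes the difference of the two group means, which matches the weighted sum in the cell above whenever both groups are non-empty.

```python
import numpy as np
import pandas as pd

def long_short_return(day, charact, perf_ret="ret_fw", perc=20):
    """Equal-weighted top-minus-bottom return for one day's cross-section."""
    lo, hi = np.percentile(day[charact].values, [perc, 100 - perc])
    longs = day.loc[day[charact] >= hi, perf_ret]
    shorts = day.loc[day[charact] <= lo, perf_ret]
    return longs.mean() - shorts.mean()

# Ten hypothetical assets with characteristic 1..10 and next-day return = characteristic / 100
toy_day = pd.DataFrame({"vols": np.arange(1, 11, dtype=float)},
                       index=[f"ptf{i}" for i in range(1, 11)])
toy_day["ret_fw"] = toy_day["vols"] / 100

# Breakpoints are 2.8 and 8.2, so the longs are {9, 10} and the shorts are {1, 2}
spread = long_short_return(toy_day, "vols")

# Two-day panel: the second day flips the sign of every return
panel = pd.concat(
    {pd.Timestamp("2000-02-01"): toy_day,
     pd.Timestamp("2000-02-02"): toy_day.assign(ret_fw=-toy_day["ret_fw"])},
    names=["Date", "ptf"],
)
daily = panel.groupby(level="Date").apply(long_short_return, charact="vols")
print(spread, daily.tolist())
```

On the first toy day the long side averages 0.095 and the short side 0.015, so the spread is 0.08. The second day gives the mirror image.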

In [66]:
mn_vols, vol_vols =  str_vols.mean()*252, str_vols.std()*np.sqrt(252)
mn_corrs, vol_corrs =  str_corrs.mean()*252, str_corrs.std()*np.sqrt(252)

print(f'Vola-based performance: Mean {mn_vols*100: .2f} % p.a., Volatility {vol_vols*100: .2f} % p.a., SR {mn_vols/vol_vols: .2f}')
print(f'Corrs-based performance: Mean {mn_corrs*100: .2f} % p.a., Volatility {vol_corrs*100: .2f} % p.a., SR {mn_corrs/vol_corrs: .2f}')

Vola-based performance: Mean -4.74 % p.a., Volatility  14.73 % p.a., SR -0.32
Corrs-based performance: Mean  0.76 % p.a., Volatility  9.07 % p.a., SR  0.08


### Q5

Both higher volatility and higher correlation with the other portfolios indicate higher risk and should therefore be compensated
by higher expected returns. In the unconditional cross-sectional tests, correlation is positively related
to returns, while volatility is insignificant. The conditional strategy with very simple rules
shows a negative link for volatility (mean of -4.74% p.a., Sharpe ratio of -0.32) and a very weak positive link for
correlation (mean of 0.76% p.a., Sharpe ratio of 0.08). For this particular
choice of parameters, a conditional strategy of betting against higher risk, here measured by volatility, might work, though more tests
are needed before drawing a conclusion. We would need to see the factor exposure of the resulting strategy, check different rebalancing and
weighting rules, and, above all, establish the statistical significance of the strategy performance.
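
Two of the follow-up checks mentioned above can be sketched on synthetic data. Under an i.i.d. assumption, the t-statistic of the mean daily return equals the annualized Sharpe ratio times the square root of the sample length in years. A Sharpe ratio of -0.32 over roughly 21 years would then give a t-statistic of about -1.5, short of conventional significance thresholds. The example below illustrates this identity and a plain OLS factor regression. The series `toy_strategy`, the simulated `factors`, and the exposures in `true_betas` are assumptions made for illustration, not results from the exam data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
T = 2520  # about ten years of synthetic trading days

# Synthetic factor returns; column names mirror the Fama-French 3-factor labels
factors = pd.DataFrame(rng.normal(0.0, 0.01, size=(T, 3)),
                       columns=["Mkt-RF", "SMB", "HML"])
true_betas = np.array([0.5, -0.2, 0.1])  # assumed exposures of the toy strategy
toy_strategy = factors @ true_betas + rng.normal(0.0, 0.001, size=T)

# 1) t-statistic of the mean daily return (i.i.d. assumption, no autocorrelation correction)
mean_d = toy_strategy.mean()
std_d = toy_strategy.std(ddof=1)
t_stat = mean_d / (std_d / np.sqrt(T))
sharpe_ann = mean_d / std_d * np.sqrt(252)

# 2) factor exposure: OLS of strategy returns on a constant and the three factors
X = np.column_stack([np.ones(T), factors.to_numpy()])
coef, *_ = np.linalg.lstsq(X, toy_strategy.to_numpy(), rcond=None)
alpha_daily, betas = coef[0], coef[1:]

print(f"t-stat {t_stat:.2f}, annualized SR {sharpe_ann:.2f}")
print("estimated betas:", np.round(betas, 3))
```

For the actual strategies, `toy_strategy` would be replaced by `str_vols` or `str_corrs` aligned on dates with the daily factor returns. Standard errors robust to autocorrelation and heteroskedasticity would be a natural refinement.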





