## Case 1: Innovating in Active ETFs

Download the file `data_case1.csv` from the section `Modules/Week 6`. Make sure that the file is in the same data folder that you typically use for the other code in this course.

The file contains data from May 2013 until July 2023. The first column contains the date, followed by the three Fama-French factors (the excess return on the market, `mktrf`; the size factor, `smb`; and the value factor, `hml`), the momentum factor (`umd`), and four ETFs: `chep` from Quantshares, `mom` from Quantshares, `vbr` from Vanguard, and `mtum` from iShares. The ETF returns are in excess of the 30-day T-bill rate.

The goal of the `chep` ETF is to provide exposure to the value factor, while the goal of `mom` is to provide exposure to the momentum factor. `vbr` is a small-cap value ETF, and `mtum`, like `mom`, aims to provide exposure to the momentum factor.

The ETFs from Quantshares were discontinued in 2020 and 2021, while those from iShares and Vanguard are still traded and very successful. We first analyze the performance of the different ETFs, and then explore several key strategic questions when introducing ETFs.

We start by initializing Python.

In [4]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

We then load the data and print the first couple of lines to understand the structure of the data.

In [6]:
# Load data
df = pd.read_csv("/Users/siyuanguo/PycharmProjects/QPM/case1/Data/data_case1.csv", index_col="date")

print(df.head())
print(df.tail())

          mktrf     smb     hml     umd      chep       mom       vbr  \
date                                                                    
5/31/13  0.0280  0.0170  0.0263 -0.0202  0.046602 -0.024174  0.031136   
6/28/13 -0.0120  0.0133  0.0003  0.0052  0.000000 -0.006564 -0.012739   
7/31/13  0.0565  0.0187  0.0057  0.0176 -0.007792  0.000853  0.068459   
8/30/13 -0.0271  0.0027 -0.0269  0.0002 -0.017577  0.002981 -0.041709   
9/30/13  0.0377  0.0288 -0.0122  0.0306 -0.016559  0.040340  0.052976   

             mtum  
date               
5/31/13  0.003086  
6/28/13 -0.008329  
7/31/13  0.059763  
8/30/13 -0.035704  
9/30/13  0.030237  
          mktrf     smb     hml     umd  chep  mom       vbr      mtum
date                                                                  
3/31/23  0.0251 -0.0551 -0.0885 -0.0249   NaN  NaN -0.060590  0.000519
4/30/23  0.0061 -0.0335 -0.0004  0.0163   NaN  NaN -0.013074  0.020450
5/31/23  0.0035  0.0161 -0.0772 -0.0063   NaN  NaN -0.037815 -0.
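
The printed index holds the dates as plain strings. As an optional step, the sketch below converts such strings into a proper date index so that rows can be selected by period. It uses a small hypothetical frame (`toy`) rather than the case data, and it assumes the dates follow the month/day/two-digit-year pattern visible in the output above.

```python
import pandas as pd

# Hypothetical miniature of the file's layout, using the date strings shown above
dates = pd.Index(["5/31/13", "6/28/13", "7/31/13"], name="date")
toy = pd.DataFrame({"mktrf": [0.0280, -0.0120, 0.0565]}, index=dates)

# Convert the string index to a DatetimeIndex (assumed format: month/day/2-digit year)
toy.index = pd.to_datetime(toy.index, format="%m/%d/%y")

# With a DatetimeIndex, rows can be sliced by date
first_half = toy.loc[:"2013-06-30"]
print(first_half)
```

The same conversion could be applied to `df.index` after loading, provided the whole column follows this format.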

We first report the summary statistics of the factors and the four ETFs. This code snippet is the same as the one we have used in the volatility-timing code.

**Question 1:** Discuss the summary statistics. Based on the summary statistics *alone*, can we conclude that the ETFs achieve their stated goals?

In [7]:
# Compute summary statistics
summary = df.describe().T[['mean', 'std']]

# Annualize the mean
summary['mean'] = summary['mean'] * 12

# Annualize the standard deviation
summary['std'] = summary['std'] * np.sqrt(12)

# Compute the Sharpe ratio
summary['sr'] = summary['mean'] / summary['std']

# Print the mean, standard deviation, and Sharpe ratio
print(summary.round(3))

        mean    std     sr
mktrf  0.123  0.153  0.800
smb   -0.002  0.091 -0.027
hml   -0.022  0.127 -0.175
umd    0.010  0.134  0.077
chep  -0.060  0.119 -0.504
mom    0.006  0.137  0.041
vbr    0.103  0.190  0.541
mtum   0.117  0.153  0.766
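
When reading this table, keep in mind that `describe()` ignores missing values, so the statistics for `chep` and `mom` are based on fewer months than those for `vbr` and `mtum`. The sketch below wraps the same annualization (mean times 12, standard deviation times the square root of 12) in a hypothetical helper, `annualized_stats`, that also reports the number of non-missing observations. The data frame `toy` is made up for illustration.

```python
import numpy as np
import pandas as pd

def annualized_stats(monthly_excess: pd.DataFrame) -> pd.DataFrame:
    """Annualized mean, volatility, Sharpe ratio, and observation count per column."""
    out = pd.DataFrame({
        "mean": monthly_excess.mean() * 12,         # monthly mean -> annual
        "std": monthly_excess.std() * np.sqrt(12),  # monthly vol -> annual
        "n_obs": monthly_excess.count(),            # non-missing months only
    })
    out["sr"] = out["mean"] / out["std"]            # returns are already in excess of the T-bill rate
    return out

# Made-up monthly excess returns; column "b" stops early, like a discontinued ETF
toy = pd.DataFrame({
    "a": [0.01, 0.03, -0.01, 0.02],
    "b": [0.02, 0.00, np.nan, np.nan],
})
stats = annualized_stats(toy)
print(stats.round(3))
```

Calling `annualized_stats(df)` on the case data should reproduce the table above, with an extra column showing how many months each series covers.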



**Question 2:** Next, we regress the excess returns of each of the ETFs on the excess return on the market, that is, the CAPM regression. Explain the difference in CAPM betas between the Quantshares ETFs (`chep`, `mom`) and those of either Vanguard (`vbr`) or iShares (`mtum`).

In [8]:
# ETF: CHEP
model_chep = smf.ols(formula='chep ~ mktrf', data=df)
results_chep = model_chep.fit()

# ETF: MOM
model_mom = smf.ols(formula='mom ~ mktrf', data=df)
results_mom = model_mom.fit()

# ETF: VBR
model_vbr = smf.ols(formula='vbr ~ mktrf', data=df)
results_vbr = model_vbr.fit()

# ETF: MTUM
model_mtum = smf.ols(formula='mtum ~ mktrf', data=df)
results_mtum = model_mtum.fit()

# Create the summary table
models = [results_chep, results_mom, results_vbr, results_mtum]
performance_table = summary_col(models, stars=True)
print(performance_table)


                  chep      mom        vbr       mtum  
-------------------------------------------------------
Intercept      -0.0084** 0.0053     -0.0029   0.0007   
               (0.0035)  (0.0038)   (0.0022)  (0.0019) 
mktrf          0.2975*** -0.4070*** 1.1255*** 0.8875***
               (0.0803)  (0.0889)   (0.0477)  (0.0413) 
R-squared      0.1337    0.1855     0.8218    0.7921   
R-squared Adj. 0.1240    0.1766     0.8203    0.7903   
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
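
Two details help when interpreting this table. The intercepts are monthly alphas, and the formula interface drops rows with missing values, so each regression may use a different number of months. The sketch below shows one way to collect the annualized alpha, the beta, and the sample size in a loop. It runs on simulated data (`toy`, `fund_a`, and `fund_b` are hypothetical names), not on the case data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated monthly excess returns (illustrative only)
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({"mktrf": rng.normal(0.01, 0.04, n)})
toy["fund_a"] = 0.001 + 1.1 * toy["mktrf"] + rng.normal(0, 0.01, n)
toy["fund_b"] = 0.3 * toy["mktrf"] + rng.normal(0, 0.03, n)
toy.loc[toy.index[90:], "fund_b"] = np.nan  # mimic a fund that was discontinued

rows = {}
for fund in ["fund_a", "fund_b"]:
    res = smf.ols(formula=f"{fund} ~ mktrf", data=toy).fit()
    rows[fund] = {
        "alpha_annual": res.params["Intercept"] * 12,  # monthly alpha times 12
        "beta": res.params["mktrf"],
        "r2": res.rsquared,
        "n_obs": int(res.nobs),                        # rows with NaN are dropped
    }

capm = pd.DataFrame(rows).T
print(capm.round(3))
```

Replacing the fund names with `['chep', 'mom', 'vbr', 'mtum']` and `toy` with `df` would give the same quantities for the four ETFs.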


We now want to explore what happens if we control for size (`smb`), value (`hml`), and momentum (`umd`) in addition to the market factor (`mktrf`). 
- In Python, if we want to regress $y$ on $x_1$ and $x_2$, $y = a + b_1x_1 + b_2x_2 + e$, then we write `model = smf.ols(formula='y ~ x1 + x2', data=df)`. 
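
As a quick illustration of the formula syntax, the sketch below simulates data from $y = 0.5 + 2x_1 - x_2 + e$ and checks that the regression recovers the coefficients. The values and the frame name `toy` are made up for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate y = 0.5 + 2*x1 - 1*x2 + e (illustrative values)
rng = np.random.default_rng(1)
n = 500
toy = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
toy["y"] = 0.5 + 2.0 * toy["x1"] - 1.0 * toy["x2"] + rng.normal(0, 0.1, n)

# Regress y on x1 and x2; the intercept is included automatically
model = smf.ols(formula='y ~ x1 + x2', data=toy)
res = model.fit()
print(res.params.round(3))
```

The estimates are stored in `res.params`, indexed by `Intercept`, `x1`, and `x2`.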


**Question 3a:** Complete the code below to regress the excess returns of each of the ETFs on the market factor, smb, hml, and the momentum factor. Report the regression table.

**Question 3b:** Based on the table, do you conclude that the ETFs are successful in achieving their stated goals? In answering the question, discuss the estimates of the alphas, the betas, and the R-squared. Discuss the benefits of market- and factor-neutral investing.

In [9]:
# For the ETF 'chep'
model_chep = smf.ols(formula='chep ~ mktrf + smb + hml + umd', data=df).fit()

# For the ETF 'mom'
model_mom = smf.ols(formula='mom ~ mktrf + smb + hml + umd', data=df).fit()

# For the ETF 'vbr'
model_vbr = smf.ols(formula='vbr ~ mktrf + smb + hml + umd', data=df).fit()

# For the ETF 'mtum'
model_mtum = smf.ols(formula='mtum ~ mktrf + smb + hml + umd', data=df).fit()

models = [model_chep, model_mom, model_vbr, model_mtum]

# Create a performance table using summary_col
performance_table = summary_col(models, stars=True)
print(performance_table)


                  chep       mom       vbr       mtum  
-------------------------------------------------------
Intercept      -0.0020    -0.0005   -0.0010   -0.0013  
               (0.0024)   (0.0022)  (0.0009)  (0.0013) 
mktrf          0.1179*    -0.0911   1.0257*** 1.0374***
               (0.0613)   (0.0561)  (0.0215)  (0.0322) 
smb            -0.0651    0.0064    0.5059*** -0.0628  
               (0.0989)   (0.0845)  (0.0339)  (0.0508) 
hml            0.6016***  -0.1493*  0.4483*** -0.0531  
               (0.0942)   (0.0830)  (0.0243)  (0.0365) 
umd            -0.3058*** 0.7935*** -0.0139   0.3867***
               (0.0800)   (0.0706)  (0.0256)  (0.0385) 
R-squared      0.6232     0.7586    0.9721    0.9026   
R-squared Adj. 0.6057     0.7477    0.9711    0.8993   
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
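
Besides the stars, confidence intervals and t-statistics can help judge how precisely each factor loading is estimated. The sketch below shows one way to collect them from a fitted four-factor model. It uses simulated data with made-up loadings (`toy` and `fund` are hypothetical names), so the numbers say nothing about the actual ETFs.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated factors and a fund with made-up loadings (illustrative only)
rng = np.random.default_rng(42)
n = 120
toy = pd.DataFrame(rng.normal(0.0, 0.03, size=(n, 4)),
                   columns=["mktrf", "smb", "hml", "umd"])
toy["fund"] = (0.1 * toy["mktrf"] + 0.6 * toy["hml"]
               - 0.3 * toy["umd"] + rng.normal(0, 0.02, n))

res = smf.ols("fund ~ mktrf + smb + hml + umd", data=toy).fit()

# 95% confidence intervals, with point estimates and t-statistics alongside
table = res.conf_int(alpha=0.05)
table.columns = ["lower", "upper"]
table.insert(0, "coef", res.params)
table["t"] = res.tvalues
print(table.round(3))
```

Applied to one of the fitted ETF models, for example `model_chep`, the same calls would show whether the interval for the targeted factor loading excludes zero.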


For the following questions, you can use the material in the case, the lecture notes, etc.

**Question 4:** What are some of the benefits of ETFs? And of active ETFs? How would you characterize the competitive landscape?

**Question 5:** Why would retail and institutional investors be interested in factor investing? How might each be expected to use QuantShares? What concerns might they have?

**Question 6:** How should Karunakaran stage the upcoming launch and future expansion of the QuantShares business in the current environment? How should he address the direct and indirect marketing opportunities? How might FFCM establish and maintain a direct competitive advantage in factor-based ETFs?