# Resit Take-Home Assignment - Data Analysis and Quantitative trading 

**This is an individual assignment. It has been solved by:** 

|     Name       | Student number    | Email           |
| :------------: | :---------------: | :-------------: | 
| [name] |       [student number]    |    [email]      |

Please fill your credentials in the table above by double clicking on the text and replacing the current text in brackets by your own name, student number and email. 

By submitting this assignment you consent to the course policy on cheating and UvA policy on plagiarism.

# Part 1. Theoretical questions

Objective: In this section, you must explain a few key concepts from the course in your own words.

To Complete:
- Submit answers to questions below. 

**Step 1(a)**    
Explain in general terms how quantiative trading firms can use finance theory to identify convergence arbitrage opportunities. Also explain why theory does not offer guidance to firms on how long the arbitrage trade must be kept open.

<div class="alert alert-block alert-warning">
Click on this box and type your answer. It is not necessary to fill out any tables.
</div>

**Step 1(b)**    
The CDS-Bond basis examples and pricing formulas presented in the course focus on the case where the bond trades at par. Briefly explain in words why the formula does not work exactly when the bond trades above par, i.e., the price is greater than the face value.

Highlight the problem by filling out the below table, which is based on Slide 24 of Lecture 2. In the example, assume a 5-year bond has face value of 1,000, price of 1,200, and a recovery rate of 70%. 

<div class="alert alert-block alert-warning">
Answer by double clicking this window and filling out the table below. You may not need to use all the rows in the table.
    <br><br>
    
| Position <br>      | Initial <br> Payment     | Settlement, <br> No Default in Year 5    | Settlement, <br> Default in Year 5|
|:-------------------|:---------------:|:------------:|:----------------:|
|Position 1          |[answer]         |[answer]      |[answer]          |
|Position 2          |[answer]         |[answer]      |[answer]          |
|Position 3          |[answer]         |[answer]      |[answer]          |
|Position 4          |[answer]         |[answer]      |[answer]          |
|Total               |[answer]         |[answer]      |[answer]          |
    
</div>

**Step 1(c)**    
Define the SMB factor, i.e., how is it measured and what does it represent. Explain why asset pricing researchers decided to add the factor to the CAPM model, and what is their argument for why the factor helps explain stock returns. 

<div class="alert alert-block alert-warning">
Click on this box and type your answer. It is not necessary to fill out any tables.
</div>

# Part 2. Determinants of the CDS-Bond Basis

**Objective:** In this section, you will examine how deviations from the CDS-Bond basis relate to various firm characteristics. The goal is to understand what type of firms experience deviations in the CDS-Bond basis---this can be useful to determine whether those deviations are due to market inefficiency or could be explained by other factors such as lack of liquidity. 

In the steps below, you need to combine data on the CDS-Bond basis with firm fundamentals from the Compustat database. You then need to estimate linear regressions relating CDS-Bond basis deviations to the firm fundamentals.

**Getting started**: The output file from Step #4 of Homework 2 is necessary to complete this part of the assignment. Throughout these instructions, I will refer to this file as “HW2_output.txt”. You also need the file "firm_fundamentals.txt" with data from Compustat. 

**To Complete:**
- Fill in all code boxes 
- Fill in all tables and answer boxes below.

**Note on writing code:** To solve the individual steps of this part of the assignment, you can use either the "input-output" method of reading in files line-by-line or the Pandas module (or both). Some steps may be significantly easier to complete using Pandas. 

**Step 2(a)**    
Read in the file HW2_output.txt and briefly describe the data that it contains. Explain at what level the observations are defined, and the time period that the sample covers. 

<div class="alert alert-block alert-warning">
Click on this box and type your answer. It is not necessary to fill out any tables.
</div>

In [1]:
import pandas as pd
df = pd.read_csv("HW2_output.txt", error_bad_lines=False,sep="\t")
df.head()

Unnamed: 0,company,gvkey,date,market_cds_spread,bond_yield,swap_fixed_rate,implied_CDS_spread,basis_deviation
0,3M COMPANY,7435,20040102,10.0,,3.79,,
1,3M COMPANY,7435,20040105,10.8,,3.79,,
2,3M COMPANY,7435,20040106,10.8,,3.67,,
3,3M COMPANY,7435,20040107,10.8,,3.62,,
4,3M COMPANY,7435,20040108,10.5,,3.62,,


In [2]:
df.shape

(1374147, 8)

In [3]:
df.describe()

Unnamed: 0,gvkey,date,market_cds_spread,bond_yield,swap_fixed_rate,implied_CDS_spread,basis_deviation
count,1374147.0,1374147.0,1096650.0,76914.0,1299383.0,63687.0,63687.0
mean,31551.64,20102770.0,277.1288,3.664372,2.694218,169.641668,-6.43403
std,49887.69,36993.96,1274.475,1.96105,1.47194,181.899404,203.391909
min,1045.0,20040100.0,1.0,0.024789,0.73,-500.7,-542.2
25%,5073.0,20070730.0,50.0,2.131615,1.49,41.99,-50.855
50%,9459.0,20101020.0,102.68,2.819478,2.18,93.7,-7.22
75%,25340.0,20131130.0,247.11,5.130947,4.08,248.955,21.685
max,260329.0,20161230.0,61117.11,12.27193,5.76,1029.19,5047.93


**Step 2(b)**     
In this step you will begin to prepare a dataset for estimating linear regressions. Read through HW2_output.txt and complete the following:
- Retain only observations with a non-missing value for the variable *basis_deviation*. (Missing values are expressed as "NA" in the file.)
- Linear regression estimates are very sensitive to the presence of outliers in the data. Therefore omit some observations that are potential outliers: Those with *market_cds_spread* of 2,000 or higher, *bond_yield* of 20 or higher, and *basis_deviation* of 2,000 or higher. **Note: Write a single logical expression that executes this.**
- Create a new variable called *negative_deviation* that equals 1 for observations with a negative value of *basis_deviation*, and 0 for observations with a non-negative value. 
- Create a new variable called *pre_crisis_period* that equals 1 for observations from January 1, 2004 though August 31, 2008, and 0 for all observations from April 1, 2009 though December 31, 2015. Omit other observations from the height of the financial crisis. 

After completing the above, save only the variables *gvkey*, *date*, *basis_deviation*, *pre_crisis_period*, and *negative_deviation* in a separate output file or Pandas data frame.

In [4]:
## ENTER YOUR CODE IN THIS BOX
df.dropna(subset = ["basis_deviation"], inplace=True)
df

Unnamed: 0,company,gvkey,date,market_cds_spread,bond_yield,swap_fixed_rate,implied_CDS_spread,basis_deviation
2016,3M COMPANY,7435,20110926,52.259998,1.551084,1.17,38.11,14.15
2017,3M COMPANY,7435,20110927,52.270000,1.504463,1.25,25.45,26.82
2018,3M COMPANY,7435,20110928,51.230000,1.451097,1.27,18.11,33.12
2019,3M COMPANY,7435,20110929,50.240002,1.437551,1.27,16.76,33.48
2020,3M COMPANY,7435,20110930,51.259998,1.392073,1.25,14.21,37.05
...,...,...,...,...,...,...,...,...
1374077,YUM! BRANDS,65417,20160926,156.089996,3.840031,1.16,268.00,-111.91
1374078,YUM! BRANDS,65417,20160927,156.580002,3.544972,1.15,239.50,-82.92
1374079,YUM! BRANDS,65417,20160928,153.710007,3.593457,1.14,245.35,-91.64
1374098,YUM! BRANDS,65417,20161025,143.690002,3.749937,1.29,245.99,-102.30


In [5]:
df = df.drop(df[df.market_cds_spread >= 2000].index)
df = df.drop(df[df.bond_yield >= 20].index)
df

Unnamed: 0,company,gvkey,date,market_cds_spread,bond_yield,swap_fixed_rate,implied_CDS_spread,basis_deviation
2016,3M COMPANY,7435,20110926,52.259998,1.551084,1.17,38.11,14.15
2017,3M COMPANY,7435,20110927,52.270000,1.504463,1.25,25.45,26.82
2018,3M COMPANY,7435,20110928,51.230000,1.451097,1.27,18.11,33.12
2019,3M COMPANY,7435,20110929,50.240002,1.437551,1.27,16.76,33.48
2020,3M COMPANY,7435,20110930,51.259998,1.392073,1.25,14.21,37.05
...,...,...,...,...,...,...,...,...
1374077,YUM! BRANDS,65417,20160926,156.089996,3.840031,1.16,268.00,-111.91
1374078,YUM! BRANDS,65417,20160927,156.580002,3.544972,1.15,239.50,-82.92
1374079,YUM! BRANDS,65417,20160928,153.710007,3.593457,1.14,245.35,-91.64
1374098,YUM! BRANDS,65417,20161025,143.690002,3.749937,1.29,245.99,-102.30


In [6]:
df["negative_deviation"] = ""
df['negative_deviation'][df['basis_deviation'] < 0] = 1
df['negative_deviation'][df['basis_deviation'] >= 0] = 0
df

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['negative_deviation'][df['basis_deviation'] < 0] = 1
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['negative_deviation'][df['basis_deviation'] >= 0] = 0


Unnamed: 0,company,gvkey,date,market_cds_spread,bond_yield,swap_fixed_rate,implied_CDS_spread,basis_deviation,negative_deviation
2016,3M COMPANY,7435,20110926,52.259998,1.551084,1.17,38.11,14.15,0
2017,3M COMPANY,7435,20110927,52.270000,1.504463,1.25,25.45,26.82,0
2018,3M COMPANY,7435,20110928,51.230000,1.451097,1.27,18.11,33.12,0
2019,3M COMPANY,7435,20110929,50.240002,1.437551,1.27,16.76,33.48,0
2020,3M COMPANY,7435,20110930,51.259998,1.392073,1.25,14.21,37.05,0
...,...,...,...,...,...,...,...,...,...
1374077,YUM! BRANDS,65417,20160926,156.089996,3.840031,1.16,268.00,-111.91,1
1374078,YUM! BRANDS,65417,20160927,156.580002,3.544972,1.15,239.50,-82.92,1
1374079,YUM! BRANDS,65417,20160928,153.710007,3.593457,1.14,245.35,-91.64,1
1374098,YUM! BRANDS,65417,20161025,143.690002,3.749937,1.29,245.99,-102.30,1


In [7]:
df["pre_crisis_period"] = ""
df['pre_crisis_period'][(df['date'] <= 20080831) & (df['date']>= 20040101)] = 1
df['pre_crisis_period'][(df['date'] <= 20151231) & (df['date'] >= 20090104)] = 0
df = df.drop(df[df.date > 20151231].index)
df

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['pre_crisis_period'][(df['date'] <= 20080831) & (df['date']>= 20040101)] = 1
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['pre_crisis_period'][(df['date'] <= 20151231) & (df['date'] >= 20090104)] = 0


Unnamed: 0,company,gvkey,date,market_cds_spread,bond_yield,swap_fixed_rate,implied_CDS_spread,basis_deviation,negative_deviation,pre_crisis_period
2016,3M COMPANY,7435,20110926,52.259998,1.551084,1.17,38.11,14.15,0,0
2017,3M COMPANY,7435,20110927,52.270000,1.504463,1.25,25.45,26.82,0,0
2018,3M COMPANY,7435,20110928,51.230000,1.451097,1.27,18.11,33.12,0,0
2019,3M COMPANY,7435,20110929,50.240002,1.437551,1.27,16.76,33.48,0,0
2020,3M COMPANY,7435,20110930,51.259998,1.392073,1.25,14.21,37.05,0,0
...,...,...,...,...,...,...,...,...,...,...
1373876,YUM! BRANDS,65417,20151218,206.979996,4.103321,1.67,243.33,-36.35,1,0
1373877,YUM! BRANDS,65417,20151221,207.910004,4.001055,1.67,233.11,-25.20,1,0
1373878,YUM! BRANDS,65417,20151222,207.899994,3.940443,1.69,225.04,-17.14,1,0
1373879,YUM! BRANDS,65417,20151223,206.929993,4.005939,1.71,229.59,-22.66,1,0


In [8]:
df2 = df[['gvkey', 'date', 'basis_deviation', 'pre_crisis_period', 'negative_deviation']].copy()
df2

Unnamed: 0,gvkey,date,basis_deviation,pre_crisis_period,negative_deviation
2016,7435,20110926,14.15,0,0
2017,7435,20110927,26.82,0,0
2018,7435,20110928,33.12,0,0
2019,7435,20110929,33.48,0,0
2020,7435,20110930,37.05,0,0
...,...,...,...,...,...
1373876,65417,20151218,-36.35,0,1
1373877,65417,20151221,-25.20,0,1
1373878,65417,20151222,-17.14,0,1
1373879,65417,20151223,-22.66,0,1


**Step 2(c)**    
Read in the file firm_fundamentals.txt and briefly describe the data that it contains. Explain at what level the observations are defined and the time period that the sample covers. Also briefly list three variables which represent firm characteristics that you think may affect the CDS-Bond basis.

<div class="alert alert-block alert-warning">
Click on this box and type your answer. It is not necessary to fill out any tables.
</div>

In [39]:
import pandas as pd
dffirm = pd.read_csv("firm_fundamentals.txt", error_bad_lines=False,sep="\t")
dffirm.head(10)

Unnamed: 0,gvkey,year,fiscal_year_end_date,company_name,sp1500_firm,country,total_assets,book_value_equity,market_value_equity,cash_assets,total_debt,debt_due_next_year,profitability,annual_stock_return,stock_volatility
0,1010,2000,20001231,ACF INDUSTRIES HOLDING CORP,0,USA,3794.5,985.2,,,2042.9,0.092819,0.180036,,
1,1010,2001,20011231,ACF INDUSTRIES HOLDING CORP,0,USA,3723.1,954.2,,,1905.5,0.062851,0.397544,,
2,1010,2002,20021231,ACF INDUSTRIES HOLDING CORP,0,USA,3702.5,892.6,,,1968.7,0.106415,0.254389,,
3,1010,2003,20031231,ACF INDUSTRIES HOLDING CORP,0,USA,4832.1,1655.0,,0.134641,1860.1,0.065148,0.150699,,
4,1019,2000,20001231,AFA PROTECTIVE SYSTEMS INC,0,USA,28.638,13.184,30.378,0.063482,1.786,0.045778,0.041455,,
5,1019,2001,20011231,AFA PROTECTIVE SYSTEMS INC,0,USA,30.836,12.026,39.721,0.057173,3.129,0.100337,0.028635,0.382514,
6,1034,2000,20001231,ALPHARMA INC -CL A,1,USA,1610.435,847.887,1764.389,0.045287,525.121,0.025677,0.061621,,0.1339
7,1034,2001,20011231,ALPHARMA INC -CL A,1,USA,2390.008,891.616,1172.211,0.006232,1060.592,0.023443,-0.038887,-0.397151,0.144
8,1034,2002,20021231,ALPHARMA INC -CL A,1,USA,2296.924,1001.843,612.71,0.010393,895.858,0.033603,-0.080975,-0.549716,0.1724
9,1034,2003,20031231,ALPHARMA INC -CL A,1,USA,2329.268,1131.989,1045.883,0.025168,817.156,0.025894,0.010663,0.687658,0.1723


In [40]:
dffirm.describe()

Unnamed: 0,gvkey,year,fiscal_year_end_date,sp1500_firm,total_assets,book_value_equity,market_value_equity,cash_assets,total_debt,debt_due_next_year,profitability,annual_stock_return,stock_volatility
count,125847.0,125847.0,125847.0,125847.0,104671.0,104364.0,108853.0,102534.0,104304.0,99003.0,98158.0,94755.0,44081.0
mean,84950.69304,2009.130325,20092530.0,0.126988,12389.11,1877.717501,3739.485,0.148393,3530.921,1.271861,-10.523709,2.346061,0.162944
std,71790.977679,5.766511,57665.11,0.33296,100527.2,9471.282221,18446.86,0.211071,42627.21,46.974933,277.867929,255.978049,0.107595
min,1010.0,2000.0,20001230.0,0.0,0.0,-139965.0,0.0,-0.825622,-110.0,-0.068244,-32545.0,-1.0,0.016
25%,19440.0,2004.0,20041230.0,0.0,56.359,12.94325,35.65,0.017294,1.304,0.0,-0.113518,-0.249708,0.0893
50%,63081.0,2009.0,20091230.0,0.0,445.453,115.3515,209.9847,0.057857,58.3895,0.025824,0.036737,0.011213,0.1352
75%,158587.0,2014.0,20141230.0,0.0,2494.132,689.302,1219.183,0.184653,668.243,0.104159,0.119577,0.268387,0.2064
max,330942.0,2019.0,20191230.0,1.0,3771200.0,348703.0,1819782.0,1.0,3391920.0,7516.0,5617.383,71014.12,2.0294


In [41]:
dffirm.columns

Index(['gvkey', 'year', 'fiscal_year_end_date', 'company_name', 'sp1500_firm',
       'country', 'total_assets', 'book_value_equity', 'market_value_equity',
       'cash_assets', 'total_debt', 'debt_due_next_year', 'profitability',
       'annual_stock_return', 'stock_volatility'],
      dtype='object')

**Step 2(d)**     
Read through firm_fundamentals.txt and retain only the observations that are relevant for the CDS-Bond basis. Specifically:
- Retain only firms that are in the S&P 1500 index.
- Retain only firms with non-missing values of *total_assets*, *total_debt*, and *book_value_equity*. Missing values are expressed as empty strings instead of "NA". **Note: Write a single logical expression that executes this.**
- Retain only firms whose country of operation is the USA and only fiscal years that end after January 1, 2004. **Note: Write a single logical expression that executes this.**

In [42]:
## ENTER YOUR CODE IN THIS BOX
import numpy as np
dffirm = dffirm.drop(dffirm[dffirm.sp1500_firm == 0].index)
dffirm['total_assets'].replace('', np.nan, inplace=True)
dffirm['total_debt'].replace('', np.nan, inplace=True)
dffirm['book_value_equity'].replace('', np.nan, inplace=True)
dffirm.dropna(subset=['total_assets'], inplace=True)
dffirm.dropna(subset=['total_debt'], inplace=True)
dffirm.dropna(subset=['book_value_equity'], inplace=True)
dffirm = dffirm.drop(dffirm[dffirm.country != 'USA'].index)
dffirm = dffirm.drop(dffirm[dffirm.fiscal_year_end_date >= 20040101].index)
dffirm.head(10)

Unnamed: 0,gvkey,year,fiscal_year_end_date,company_name,sp1500_firm,country,total_assets,book_value_equity,market_value_equity,cash_assets,total_debt,debt_due_next_year,profitability,annual_stock_return,stock_volatility
6,1034,2000,20001231,ALPHARMA INC -CL A,1,USA,1610.435,847.887,1764.389,0.045287,525.121,0.025677,0.061621,,0.1339
7,1034,2001,20011231,ALPHARMA INC -CL A,1,USA,2390.008,891.616,1172.211,0.006232,1060.592,0.023443,-0.038887,-0.397151,0.144
8,1034,2002,20021231,ALPHARMA INC -CL A,1,USA,2296.924,1001.843,612.71,0.010393,895.858,0.033603,-0.080975,-0.549716,0.1724
9,1034,2003,20031231,ALPHARMA INC -CL A,1,USA,2329.268,1131.989,1045.883,0.025168,817.156,0.025894,0.010663,0.687658,0.1723
58,1075,2000,20001231,PINNACLE WEST CAPITAL CORP,1,USA,7149.151,2382.714,4039.791,0.001449,2501.327,0.141235,0.081929,,0.078
59,1075,2001,20011231,PINNACLE WEST CAPITAL CORP,1,USA,7981.748,2499.323,3549.926,0.003586,3204.98,0.082443,0.068587,-0.12126,0.0821
60,1075,2002,20021231,PINNACLE WEST CAPITAL CORP,1,USA,8425.806,2686.153,3110.883,0.009222,3264.901,0.078833,0.056652,-0.185424,0.0909
61,1075,2003,20031231,PINNACLE WEST CAPITAL CORP,1,USA,9536.378,2829.779,3653.346,0.02399,3409.286,0.09826,0.085377,0.173951,0.0901
78,1076,2002,20021231,AARON'S INC,1,USA,483.648,280.545,475.1242,0.000199,73.265,0.001146,0.042829,0.342331,0.1067
79,1076,2003,20031231,AARON'S INC,1,USA,555.292,320.186,659.7607,0.000171,79.57,0.051368,0.047504,0.380028,0.1018


**Step 2(e)**     
In this step you will modify the firm fundamentals dataset to create variables that are useful for linear regression analysis.
- Create a variable *ME_BE* defined as the market value of equity defined by the book value of equity. Omit outliers with a *ME_BE* ratio above 20.
- Create a variable *leverage_ratio* defined as total debt divided by total assets. Omit outliers with a leverage ratio above 1.
- The distribution of firm size is highly skewed. This means that the market contains a few very large firms, and many medium or small firms. Skewed distributions can affect regression estimates. Researchers account for this by transforming skewed variables using natural logarithms. In Python, this can be done by importing the Numpy module and using the code "ln_X = np.log(X)", where X represents the variable that you want to transform. Use this procedure to create the variables *ln_total_assets*, *ln_total_debt*, and *ln_debt_due_next_year*.
- Omit observations with *profitability* values above 1 or below -1, or  *annual_stock_return* values above 2. **Note: Write a single logical expression that executes this.**

After completing the above, save the modified dataset in a separate output file or Pandas data frame. You can choose to combine steps 2(d) and 2(e) if you like.


In [47]:
## ENTER YOUR CODE IN THIS BOX
dffirm2 = pd.read_csv("firm_fundamentals.txt", error_bad_lines=False,sep="\t")
dffirm2['ME_BE'] = dffirm2.apply(lambda row: row.market_value_equity + row.book_value_equity, axis=1)
dffirm2 = dffirm2.drop(dffirm2[dffirm2.ME_BE > 20].index)
dffirm2['leverage_ratio'] = dffirm2.apply(lambda row: row.total_debt + row.total_assets, axis=1)
dffirm2 = dffirm2.drop(dffirm2[dffirm2.leverage_ratio > 1].index)
dffirm2['ln_total_assets'] = ""
dffirm2['ln_total_assets'] = np.log(dffirm2['total_assets'])
dffirm2['ln_total_debt'] = ""
dffirm2['ln_total_debt'] = np.log(dffirm2['total_debt'])
dffirm2['ln_debt_due_next_year'] = ""
dffirm2['ln_debt_due_next_year'] = np.log(dffirm2['debt_due_next_year'])
dffirm2 = dffirm2.drop(dffirm2[(dffirm2.profitability > 1) & (dffirm2.profitability < -1) & (dffirm2.annual_stock_return > 2)].index)
dffirm2

  result = getattr(ufunc, method)(*inputs, **kwargs)


Unnamed: 0,gvkey,year,fiscal_year_end_date,company_name,sp1500_firm,country,total_assets,book_value_equity,market_value_equity,cash_assets,total_debt,debt_due_next_year,profitability,annual_stock_return,stock_volatility,ME_BE,leverage_ratio,ln_total_assets,ln_total_debt,ln_debt_due_next_year
120,1084,2008,20081231,WORLDS INC,0,USA,0.174,-3.538,10.477600,0.0,0.773,4.442529,-8.537635,0.000000,,6.939600,0.947,-1.748700,-0.257476,1.491224
121,1084,2009,20091231,WORLDS INC,0,USA,0.004,-3.864,4.829761,0.0,0.953,238.250000,-7.198020,-0.550000,,0.965761,0.957,-5.521461,-0.048140,5.473321
198,1119,2000,20001231,ADAMS DIVERSIFIED EQUITY FD,0,USA,,,1662.906000,,,,,,,,,,,
199,1119,2001,20011231,ADAMS DIVERSIFIED EQUITY FD,0,USA,,,1160.665000,,,,,-0.322857,,,,,,
200,1119,2002,20021231,ADAMS DIVERSIFIED EQUITY FD,0,USA,,,875.069200,,,,,-0.256681,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125833,327451,2015,20151231,GRINDROD SHIPPING,0,SGP,,,,,,,,,,,,,,
125840,328795,2013,20131231,ARCOSA INC,0,USA,,,,,,,,,,,,,,
125841,328795,2014,20141231,ARCOSA INC,0,USA,,,,,,,,,,,,,,
125842,328795,2015,20151231,ARCOSA INC,0,USA,,,,,,,,,,,,,,


**Step 2(f)**    
Before estimating any regressions, you must combine the firm fundamental dataset (created in Step 2(e)) with the CDS-Bond basis dataset (created in Step 2(b)). For each firm *F*, you should merge all CDS-Bond basis deviations from year *T* with the firm's fundamentals from year *T-1*. Describe in words each step of this merging process. Make sure to explain what variables uniquely identify the observations in each dataset, and whether each observation in the firm fundamental dataset is matched to a single or multiple observations in the CDS-Bond basis dataset. 

<div class="alert alert-block alert-warning">
Click on this box and type your answer. It is not necessary to fill out any tables.
</div>



**Step 2(g)**    
Now executive the merge described in Step 2(f). Make sure to look over the merged dataset to ensure that observations are matched correctly. 

In [60]:
## ENTER YOUR CODE IN THIS BOX
df10=pd.concat([df2,dffirm2])
df10

Unnamed: 0,gvkey,date,basis_deviation,pre_crisis_period,negative_deviation,year,fiscal_year_end_date,company_name,sp1500_firm,country,...,total_debt,debt_due_next_year,profitability,annual_stock_return,stock_volatility,ME_BE,leverage_ratio,ln_total_assets,ln_total_debt,ln_debt_due_next_year
2016,7435,20110926.0,14.15,0,0,,,,,,...,,,,,,,,,,
2017,7435,20110927.0,26.82,0,0,,,,,,...,,,,,,,,,,
2018,7435,20110928.0,33.12,0,0,,,,,,...,,,,,,,,,,
2019,7435,20110929.0,33.48,0,0,,,,,,...,,,,,,,,,,
2020,7435,20110930.0,37.05,0,0,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125833,327451,,,,,2015.0,20151231.0,GRINDROD SHIPPING,0.0,SGP,...,,,,,,,,,,
125840,328795,,,,,2013.0,20131231.0,ARCOSA INC,0.0,USA,...,,,,,,,,,,
125841,328795,,,,,2014.0,20141231.0,ARCOSA INC,0.0,USA,...,,,,,,,,,,
125842,328795,,,,,2015.0,20151231.0,ARCOSA INC,0.0,USA,...,,,,,,,,,,


**Step 2(h)**    
Calculate summary statistics (number of observations, mean, median, and standard deviation) for the following variables, separately for the pre-crisis and post-crisis periods: *basis_deviation*, *ln_total_assets*, *ME_BE*, *leverage_ratio*, *profitability*, *annual_stock_return*, and *stock_volatility*. Report the statistics in the table below.

In [66]:
## ENTER YOUR CODE IN THIS BOX
df2.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
gvkey,57670.0,22156.77,39000.991809,1045.0,5046.0,8530.0,14477.0,176760.0
date,57670.0,20117580.0,26057.957569,20060623.0,20090900.0,20120703.0,20140902.0,20151231.0
basis_deviation,57670.0,-9.89159,108.908968,-542.2,-45.5675,-3.83,23.26,1591.9


In [68]:
dffirm2.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
gvkey,24149.0,114266.9,74470.828938,1084.0,27041.0,160597.0,178918.0,330942.0
year,24149.0,2011.989,5.427818,2000.0,2008.0,2013.0,2017.0,2019.0
fiscal_year_end_date,24149.0,20121130.0,54278.182739,20001230.0,20081230.0,20131230.0,20171230.0,20191230.0
sp1500_firm,24149.0,0.001780612,0.042161,0.0,0.0,0.0,0.0,1.0
total_assets,2973.0,72.17349,1217.987453,0.0,0.01,0.117,0.383,42557.05
book_value_equity,2964.0,4.464435,421.824366,-8836.697,-0.827,-0.1835,0.013,15703.0
market_value_equity,20035.0,604.8795,3282.454434,0.0,9.86864,61.77613,272.6449,137883.2
cash_assets,2559.0,0.3596731,0.374192,-0.1686747,0.0245781,0.1965318,0.6871538,1.0
total_debt,2906.0,0.1385375,0.203973,0.0,0.0,0.032,0.208,0.968
debt_due_next_year,2524.0,8.494671,49.508951,0.0,0.0,0.0835828,1.391787,966.9999


In [67]:
df10.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
gvkey,81819.0,49343.2,66888.128157,1045.0,6502.0,12142.0,62374.0,330942.0
date,57670.0,20117580.0,26057.957569,20060620.0,20090900.0,20120700.0,20140900.0,20151230.0
basis_deviation,57670.0,-9.89159,108.908968,-542.2,-45.5675,-3.83,23.26,1591.9
year,24149.0,2011.989,5.427818,2000.0,2008.0,2013.0,2017.0,2019.0
fiscal_year_end_date,24149.0,20121130.0,54278.182739,20001230.0,20081230.0,20131230.0,20171230.0,20191230.0
sp1500_firm,24149.0,0.001780612,0.042161,0.0,0.0,0.0,0.0,1.0
total_assets,2973.0,72.17349,1217.987453,0.0,0.01,0.117,0.383,42557.05
book_value_equity,2964.0,4.464435,421.824366,-8836.697,-0.827,-0.1835,0.013,15703.0
market_value_equity,20035.0,604.8795,3282.454434,0.0,9.86864,61.77613,272.6449,137883.2
cash_assets,2559.0,0.3596731,0.374192,-0.1686747,0.0245781,0.1965318,0.6871538,1.0


<div class="alert alert-block alert-warning">
Answer by double clicking this window and filling out the table below.
    <br><br>
    
**Pre-Crisis Summary Statistics**    
    
|                        | No. of <br> Observations     | Mean   | Median | Standard <br> Deviation| 
|:-----------------------|:----------------------------:|:------:|:------:|:----------------------:|
|*basis_deviation*       |[answer]                      |[answer]|[answer]|[answer]                |
|*negative_basis*        |[answer]                      |[answer]|[answer]|[answer]                |
|*ln_total_assets*       |[answer]                      |[answer]|[answer]|[answer]                |
|*ME_BE*                 |[answer]                      |[answer]|[answer]|[answer]                |
|*leverage_ratio*        |[answer]                      |[answer]|[answer]|[answer]                |
|*profitability*         |[answer]                      |[answer]|[answer]|[answer]                |
|*annual_stock_return*   |[answer]                      |[answer]|[answer]|[answer]                |
|*stock_volatility*      |[answer]                      |[answer]|[answer]|[answer]                |

**Post-Crisis Summary Statistics**
    
|                        | No. of <br> Observations     | Mean   | Median | Standard <br> Deviation| 
|:-----------------------|:----------------------------:|:------:|:------:|:----------------------:|
|*basis_deviation*       |[answer]                      |[answer]|[answer]|[answer]                |
|*negative_basis*        |[answer]                      |[answer]|[answer]|[answer]                |
|*ln_total_assets*       |[answer]                      |[answer]|[answer]|[answer]                |
|*ME_BE*                 |[answer]                      |[answer]|[answer]|[answer]                |
|*leverage_ratio*        |[answer]                      |[answer]|[answer]|[answer]                |
|*profitability*         |[answer]                      |[answer]|[answer]|[answer]                |
|*annual_stock_return*   |[answer]                      |[answer]|[answer]|[answer]                |
|*stock_volatility*      |[answer]                      |[answer]|[answer]|[answer]                |
    
</div>

**Step 2(i)**    
Estimate four regression models testing determinants of the CDS-Bond basis deviations. The dependent variable in each model should be *basis_deviation*. The explanatory variables should be:
- *ln_total_assets* and *leverage_ratio* in Model 1
- *ln_total_assets*, *leverage_ratio*, *annual_stock_return*, and *stock_volatility* in Model 2
- *ln_total_assets*, *leverage_ratio*, *ME_BE*, and *profitability* in Model 3 
- *ln_total_assets* and three other variables of your choice in Model 4 (at least one of the additional variables should not be in models 1 through 3)

Estimate each of the regression models separately for the pre- and post-crisis period (so eight regressions total). Report the regression coefficients and p-values in the table below. Also briefly interpret the results in one paragraph.

**Note:** If you have limited background with regressions, you should read the document "Introduction to Regression" listed under the Canvas module "Reference Materials".

<div class="alert alert-block alert-warning">
Answer by double clicking this window and filling out the tables below.
    <br><br>
    
**Table 1. Regression estimates from pre-crisis period**   
    
|                       | Model 1 <br> Coefficient | Model 1 <br> P-value | Model 2 <br> Coefficient | Model 2 <br> P-value | Model 3 <br> Coefficient | Model 3 <br> P-value | Model 4 <br> Coefficient | Model 4 <br> P-value |
|:----------------------|:------------------------:|:--------------------:|:------------------------:|:----------------------:|:------------------------:|:--------------------:|:------------------------:|:--------------------:| 
|*ln_total_assets*      |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|*leverage_ratio*       |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|*annual_stock_return*  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|*stock_volatility*     |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              |
|*ME_BE*                |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | |*profitability*        |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]                
|Additional Variable 1  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|Additional Variable 2  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|Additional Variable 3  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
<br> 
    
**Table 2. Regression estimates from post-crisis period**   
    
    
|                       | Model 1 <br> Coefficient | Model 1 <br> P-value | Model 2 <br> Coefficient | Model 2 <br> P-value | Model 3 <br> Coefficient | Model 3 <br> P-value | Model 4 <br> Coefficient | Model 4 <br> P-value |
|:----------------------|:------------------------:|:--------------------:|:------------------------:|:----------------------:|:------------------------:|:--------------------:|:------------------------:|:--------------------:| 
|*ln_total_assets*      |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|*leverage_ratio*       |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|*annual_stock_return*  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|*stock_volatility*     |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              |
|*ME_BE*                |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | |*profitability*        |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]                
|Additional Variable 1  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|Additional Variable 2  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
|Additional Variable 3  |[answer]                  |[answer]              |[answer]                  |[answer]                 |[answer]                  |[answer]              |[answer]                  |[answer]              | 
    
</div>

**Challenge Exercises**    
The below exercises are intended for students who wish to earn a grade above 8,0 on the resit assignment. They should be possible to complete with the computing knowledge and syntax taught in the course. Students who do not attempt the challenge exercises can receive a grade of up to 8,0 on their resit assignment. Students who make a serious attempt at the challenge exercises will receive a bonus of up to 2,0 points on the resit assignment. The bonus is awarded even to students who do not correctly complete all of the main steps of the resit assignment. However, no bonus is awarded to students who do not attempt at least half of the main steps.

For this exercise, you need the file "crsp_data_hw5.txt" with data from CRSP, that was also used in Homework 4 and the Final Assignment.  

The exercises are:
- Read in the file crsp_data_hw5.txt. For each firm *F* and year *T*, calculate the average of *trading_volume* (measured as the number of shares traded in each month, in millions on shares). Then for each firm *F*, match the average trading volume from year *T* with the value of shares_outstanding from year *T-1* (the latter variable is in firm_fundamentals.txt). Create the variable *scaled_trading_volume* which equals the ratio of average trading volume/shares outstanding.
- Merge *scaled_trading_volume* into the data frame created in Step 2(g). Then estimate a new regression Model 5 with *ln_total_assets*, *leverage_ratio*, *annual_stock_return*, *stock_volatility*, and *scaled_trading_volume*, separately before and after the financial crisis. Explain why you may expect *scaled_trading_volume* to be related to deviations to the CDS-Bond basis. Also interpret the coefficients on *scaled_trading_volume*.


In [16]:
## ENTER YOUR CODE FOR CHALLENGE EXERCISE 1 IN THIS BOX

In [17]:
## ENTER YOUR CODE FOR CHALLENGE EXERCISE 2 IN THIS BOX

<div class="alert alert-block alert-warning">
Answer by double clicking this window and filling out the tables below.
    <br><br>
    
**Table 1. Regression estimates from additional regression model**   
    
|                       | Model 5 Coefficient, <br> Pre-crisis Period |  Model 5 P-value, <br> Pre-crisis Period | Model 5 Coefficient, <br> Post-crisis Period |  Model 5 P-value, <br> Post-crisis Period |
|:----------------------|:-------------------------------------------:|:----------------------------------------:|:-------------------------------------------:|:-----------------------------------------:|
|*ln_total_assets*      |[answer]                                     |[answer]                                  | [answer]                                     |[answer]                                   |
|*annual_stock_return*  |[answer]                                     |[answer]                                  | [answer]                                     |[answer]                                   |
|*stock_volatility*     |[answer]                                     |[answer]                                  | [answer]                                     |[answer]                                   |
|*scaled_trading_volume*|[answer]                                     |[answer]                                  | [answer]                                     |[answer]                                   |    
     
</div>

# Part 3. Creating and Testing a CDS-Bond Basis Factor 

**Objective:** Some quantitative trading firms seek to make a profit by identifying firms with mispriced stocks. There are many different variables that could be related to stock mispricing, so quant firms may construct a wide variety of factors when looking for profitable trading strategies. In this section, you will attempt to replicate this procedure by constructing your own risk factor based on deviations from the CDS-Bond basis.

In the steps below, you need to combine data on the CDS-Bond basis with stock prices from the CRSP database. You then need to construct portfolios based on CDS-Bond basis deviations and calculate the stock returns for each portfolios.


**Getting Started**: The output file from Step #4 of Homework 2 is necessary to complete this part of the assignment. Throughout these instructions, I will refer to this file as “HW2_output.txt”. You also need the file "crsp_data_hw5.txt" with data from CRSP, that was also used in Homework 4 and the Final Assignment.  

**To Complete:**
- Fill in all code boxes
- Fill in all tables and answer boxes below.
- Submit the output file created in step 3(d). 


**Note on writing code:** To streamline the grading process, please use Pandas throughout this part of the assignment. 

**Step 3(a)**     
Start with the file HW2_output.txt that you used in Step 2(a). Read in the file using Pandas. Keep only observations from January 1, 2007 or later, and omit any observations with a missing value for *basis_deviation*. For each firm *F* and year *T* in the file, calculate the average value of *basis_deviation* and store it in a column called *avg_basis_deviation*. Create a new data frame with just the variables *gvkey*, *year*, *avg_basis_deviation*. Make sure observations are unique at the firm and year level.

In [51]:
# ENTER YOUR CODE IN THIS BOX
import pandas as pd
df3 = pd.read_csv("HW2_output.txt", error_bad_lines=False,sep="\t")
df3 = df3.drop(df3[df3.date < 20070101].index)
df3.dropna(subset = ["basis_deviation"], inplace=True)
df3['avg_basis_deviation']=""
df3['avg_basis_deviation']=df3['basis_deviation']
df4 = df3[['gvkey', 'date', 'avg_basis_deviation']].copy()
df4

Unnamed: 0,gvkey,date,avg_basis_deviation
2016,7435,20110926,14.15
2017,7435,20110927,26.82
2018,7435,20110928,33.12
2019,7435,20110929,33.48
2020,7435,20110930,37.05
...,...,...,...
1374077,65417,20160926,-111.91
1374078,65417,20160927,-82.92
1374079,65417,20160928,-91.64
1374098,65417,20161025,-102.30


**Step 3(b)**    
Using the data frame from Step 3(a), in each year form 5 portfolios based on *avg_basis_deviation*. Specifically, in each year calculate the quintiles of *avg_basis_deviation* across all individual firms in the data frame in that year. Then in each year, sort firms into portfolios numbered 1 (lowest value of *avg_basis_deviation*) through 5 (highest value of *avg_basis_deviation*). Your data frame should now have the variables *gvkey*, *year*, *avg_basis_deviation*, and a portfolio identifier with values between 1 and 5.

In [53]:
## ENTER YOUR CODE IN THIS BOX
df4['portfolio']= pd.qcut(df4['avg_basis_deviation'],  q = 5, labels = False) 
df4.sort_values(by =['avg_basis_deviation'], inplace = True)
df4

Unnamed: 0,gvkey,date,avg_basis_deviation,portfolio
1067438,8272,20081224,-542.20,0
1062820,9555,20130501,-511.46,0
1028457,9155,20150716,-508.81,0
1097051,15521,20090212,-493.67,0
1067413,8272,20081119,-474.95,0
...,...,...,...,...
36913,1045,20131217,5028.70,4
36914,1045,20131218,5029.76,4
36916,1045,20131220,5033.66,4
36925,1045,20140102,5046.54,4


**Step 3(c)**    
Now read in the file crsp_data_hw5.txt, which contains monthly stock returns for a large sample of U.S. firms. For each firm *F* and year *T*, merge in the portfolio identifier from year *T-1* that you created in Step 3(b). During the merge you should keep only firms that are in both the CDS-Bond basis dataset and the CRSP dataset.

In [82]:
## ENTER YOUR CODE IN THIS BOX
dfcrsp = pd.read_csv("crsp_data_hw5.txt", error_bad_lines=False,sep="\t")
pd.merge(dfcrsp, df4, on='gvkey', how='inner')

Unnamed: 0,date_x,sic_industry_code,comnam,stock_price,trading_volume,stock_return,bid_price,ask_price,gvkey,year,date_y,avg_basis_deviation,portfolio
0,19630131.0,4511.0,AMERICAN AIRLS INC,2.210165,7988.0,0.102041,,,1045,1963,20101020,488.17,4
1,19630131.0,4511.0,AMERICAN AIRLS INC,2.210165,7988.0,0.102041,,,1045,1963,20100922,546.49,4
2,19630131.0,4511.0,AMERICAN AIRLS INC,2.210165,7988.0,0.102041,,,1045,1963,20101014,554.44,4
3,19630131.0,4511.0,AMERICAN AIRLS INC,2.210165,7988.0,0.102041,,,1045,1963,20101013,562.65,4
4,19630131.0,4511.0,AMERICAN AIRLS INC,2.210165,7988.0,0.102041,,,1045,1963,20101015,570.88,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
29287944,20151231.0,2621.0,DOMTAR CORP,36.950000,110177.0,-0.091020,36.95,36.96,176760,2015,20090924,-91.95,0
29287945,20151231.0,2621.0,DOMTAR CORP,36.950000,110177.0,-0.091020,36.95,36.96,176760,2015,20090923,-88.98,0
29287946,20151231.0,2621.0,DOMTAR CORP,36.950000,110177.0,-0.091020,36.95,36.96,176760,2015,20090928,-86.36,0
29287947,20151231.0,2621.0,DOMTAR CORP,36.950000,110177.0,-0.091020,36.95,36.96,176760,2015,20091005,-75.29,0


**Step 3(d)**    
For each month in the data frame from Step 3(c), calculate the equal-weighted average stock return across firms in each of the 5 portfolios. Also, for each month calculate the difference between the return of portfolio 5 and portfolio 1, i.e., the monthly value of (equal-weighted average stock return across firms in Portfolio 5) - (equal-weighted average stock return across firms in Portfolio 1). Label this difference in portfolio returns as the *basis_factor*. Briefly explain what this variable represents.

At the end of this step, your data frame should contain a variable for the month, 5 variables for the average monthly returns of each portfolio, and the variable *basis_factor*. It should have unique observations for each month. Store this data frame in an output file.

In [21]:
## ENTER YOUR CODE IN THIS BOX

<div class="alert alert-block alert-warning">
Click on this box and type your answer about what basis_factor represents.
</div>



**Step 3(e)**    
Calculate the average value of *basis_factor* across all months before September 2008. Also calculate the average value of *basis_factor* across all months after March 2009. Report the averages below.

In [22]:
## ENTER YOUR CODE IN THIS BOX
df3d1=df3d['date'][df['date'] < 20080901].mean()
df3d2=df3d['date'][df['date'] < 20090331].mean()

<div class="alert alert-block alert-warning">
Click on this box and type your answer about the averages of basis_factor.
</div>



**Challenge Exercises**    
The below exercises are intended for students who wish to earn a grade above 8,0 on the resit assignment. They should be possible to complete with the computing knowledge and syntax taught in the course. Students who do not attempt the challenge exercises can receive a grade of up to 8,0 on their resit assignment. Students who make a serious attempt at the challenge exercises will receive a bonus of up to 2,0 points on the resit assignment. The bonus is awarded even to students who do not correctly complete all of the main steps of the resit assignment. However, no bonus is awarded to students who do not attempt at least half of the main steps.

For these exercises, you need the monthly factor data in the file "factor_data.txt".

The exercises are:
- Combine the data frame from Step 3(d) with monthly data on the SMB and HML factors, the monthly equity risk premium ERP, and the risk-free rate. Also, subtract the risk-free rate from each of the 5 monthly portfolio returns (i.e., the portfolios based on quintiles of the CDS-Bond basis).
- Estimate a 4-factor model separately for each of the 5 portfolios using an OLS regression. Use the monthly portfolio returns as the dependent variable. The explanatory variables should include the standard Fama-French factors along with *basis_factor*. For each portfolio estimate the regressions using all months in the dataset.  Record the factor loadings, alpha, associated p-values, and R<sup>2</sup> in a table. 
- Explain why a quant trading firm may want to test a stock return factor model that includes a risk factor based on deviations from the CDS-Bond basis. Also interpret the results from tables 1 and 2 statistically and discuss what they mean about the *basis_factor* in 1-2 paragraphs. 

In [23]:
## ENTER YOUR CODE FOR CHALLENGE EXERCISE 1 IN THIS BOX

In [24]:
## ENTER YOUR CODE FOR CHALLENGE EXERCISE 2 IN THIS BOX

<div class="alert alert-block alert-warning">
Answer by double clicking this window and filling out the tables below. Write your interpretation of the regression models under Table 2 in this box.
    <br><br>
    
**Table 1. Factor loadings and p-values**    
    
|             | Portfolio 1 <br> Loading | Portfolio 1 <br> P-value | Portfolio 2 <br> Loading | Portfolio 2 <br> P-value | Portfolio 3 <br> Loading | Portfolio 3 <br> P-value | Portfolio 4 <br> Loading | Portfolio 4 <br> P-value | Portfolio 5 <br> Loading | Portfolio 5 <br> P-value |
|:------------|:------------------------:|:------------------------:|:------------------------:|:------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:---------------------------:|:-------------------------:|:-------------------------:|
|SMB          |[answer]                  |[answer]                  |[answer]                  |[answer]                   |[answer]                   |[answer]                   |[answer]                   |[answer]                     |[answer]                   |[answer]                   |
|HML          |[answer]                  |[answer]                  |[answer]                  |[answer]                   |[answer]                   |[answer]                   |[answer]                   |[answer]                     |[answer]                   |[answer]                   |
|ERP          |[answer]                  |[answer]                  |[answer]                  |[answer]                   |[answer]                   |[answer]                   |[answer]                   |[answer]                     |[answer]                   |[answer]                   |
    
<br> 
    
**Table 2. Regression R<sup>2</sup> values**    

|                         | Portfolio 1 | Portfolio 2 | Portfolio 3  | Portfolio 4  | Portfolio 5  |
|:------------------------|:-----------:|:-----------:|:------------:|:------------:|:------------:|
|Regression R<sup>2</sup> |[answer]     |[answer]     |[answer]      |[answer]      |[answer]      |
    
</div>

<div class="alert alert-block alert-warning">
Click on this box and type your answer to Challenge Exercise 3. 
</div>

