# Problem 1: Ledoit-Wolf shrinkage

Use chap4.py to get dataset D, the 3-currency data from 1971 through 2021; let H ("Holdout") be the year 2021 from D, and let L ("Learning) be D with H removed. Form the covariance matrix of log-returns of L and find the minimum variance portfolio w1 using that covariance matrix. Use Ledoit-Wolf shrinkage (with s = 1/3) on this matrix, and find the minimum variance portfolio w_2 based on the shrunk matrix. In the holdout sample H, which of w_1 and w_2 had the smaller variance?

In [1]:
#chap4.py
import pandas as pd
import qrpm_funcs as qf
#Get 3 currencies until the end of
#previous year. Form sample covariance matrix
#and do simple efficient frontier calculations

lastday=qf.LastYearEnd()
#Swiss franc, pound sterling, Japanese Yen
seriesnames=['DEXSZUS','DEXUSUK','DEXJPUS']
cdates,ratematrix=qf.GetFREDMatrix(seriesnames,enddate=lastday)
multipliers=[-1,1,-1]

lgdates,difflgs=qf.levels_to_log_returns(cdates,ratematrix,multipliers)



In [2]:
import pandas as pd
import numpy as np
from sklearn.covariance import LedoitWolf

# Assuming qrpm_funcs has been properly defined and imported
import qrpm_funcs as qf

# Get the data up to the end of the previous year (assumed to be 2020)
lastday = qf.LastYearEnd()
seriesnames = ['DEXSZUS', 'DEXUSUK', 'DEXJPUS']
cdates, ratematrix = qf.GetFREDMatrix(seriesnames, enddate=lastday)
multipliers = [-1, 1, -1]

# Convert dates to pandas datetime and calculate log returns
lgdates, difflgs = qf.levels_to_log_returns(cdates, ratematrix, multipliers)
lgdates_dt = pd.to_datetime(lgdates)
df = pd.DataFrame(difflgs, index=lgdates_dt)

# Split the data into learning and holdout sets
learning_data = df[df.index.year < 2021]
holdout_data = df[df.index.year == 2021]

# Calculate the covariance matrix of the learning set
cov_matrix_L = learning_data.cov()

# Calculate the mean returns for the learning data, which might be useful for further analysis
mean_returns_L = learning_data.mean()

# Apply Ledoit-Wolf shrinkage to the learning data
lw = LedoitWolf()
lw.fit(learning_data)
shrunk_cov_matrix_L = lw.covariance_

# Calculate the minimum variance portfolio for the sample covariance matrix
inv_cov_matrix_L = np.linalg.inv(cov_matrix_L)
ones_L = np.ones(inv_cov_matrix_L.shape[0])
w1 = inv_cov_matrix_L @ ones_L / (ones_L @ inv_cov_matrix_L @ ones_L)

# Calculate the minimum variance portfolio for the shrunk covariance matrix
inv_shrunk_cov_matrix_L = np.linalg.inv(shrunk_cov_matrix_L)
w2 = inv_shrunk_cov_matrix_L @ ones_L / (ones_L @ inv_shrunk_cov_matrix_L @ ones_L)

# Calculate variances of w1 and w2 in the holdout set
var_w1_H = w1.T @ holdout_data.cov() @ w1
var_w2_H = w2.T @ holdout_data.cov() @ w2

# Output the results
print('Variance of w1 in holdout set H:', var_w1_H)
print('Variance of w2 in holdout set H:', var_w2_H)

# Determine which portfolio had the smaller variance in the holdout set
if var_w1_H < var_w2_H:
    print('Portfolio w1 had the smaller variance in the holdout set H.')
else:
    print('Portfolio w2 had the smaller variance in the holdout set H.')


Variance of w1 in holdout set H: 9.150359805358827e-06
Variance of w2 in holdout set H: 9.144993933491371e-06
Portfolio w2 had the smaller variance in the holdout set H.


### Conclusion: Portfolio w2 had the smaller variance in the holdout set H.

# Problem 2: equally weighted portfolio v.s. efficient portfolio

Let w_3 = u/3 be the "1/N rule" portfolio of CHF, GPB, JPY (i.e. 1/3 each). Let r_3 = m^Tw_3 be the rate of return of w_3, where m is the mean vector computed over the learning period L described in the previous problem. Find w_e, the efficient portfolio of the form (4.6) with return equal to r_3 using the same period. Compare the performance of w_3 and w_e during the holdout period H. Which has the lower standard deviation during HI? Which has higher return? (You can use log-returns as if they are actual returns.) Form 4.6 is provided in attachment.

In [3]:
import numpy as np

# Invert the covariance matrix
C_inv = np.linalg.inv(cov_matrix_L)
u = np.ones(len(mean_returns_L))

w_3 = np.array([1/3, 1/3, 1/3])

# Compute the return r_3 for the equally weighted portfolio w_3
r_3 = mean_returns_L.dot(w_3)

# Compute the efficient portfolio w_e
# Calculate λ1 for the targeted return r_3
lambda_1_num = r_3 - u.T @ C_inv @ mean_returns_L
lambda_1_den = mean_returns_L.T @ C_inv @ mean_returns_L - (u.T @ C_inv @ mean_returns_L)**2 / u.T @ C_inv @ u
lambda_1 = lambda_1_num / lambda_1_den

# Calculate w_e using the formula provided
w_e = lambda_1 * (np.eye(len(mean_returns_L)) - np.outer(C_inv @ u, u.T @ C_inv) / (u.T @ C_inv @ u)) @ C_inv @ mean_returns_L + (C_inv @ u) / (u.T @ C_inv @ u)

# Check if the weights sum up to 1, if not, normalize them
if not np.isclose(w_e.sum(), 1):
    w_e = w_e / w_e.sum()

# Calculate portfolio performances during the holdout period H
# Portfolio returns
returns_w3_H = holdout_data.dot(w_3)
returns_we_H = holdout_data.dot(w_e)

# Portfolio standard deviations
std_w3_H = returns_w3_H.std()
std_we_H = returns_we_H.std()

# Portfolio mean returns
mean_w3_H = returns_w3_H.mean()
mean_we_H = returns_we_H.mean()

# Output the results
print(f"w_3 Standard Deviation in H: {std_w3_H}")
print(f"w_3 Mean Return in H: {mean_w3_H}")
print(f"w_e Standard Deviation in H: {std_we_H}")
print(f"w_e Mean Return in H: {mean_we_H}")


# Determine which portfolio had the lower standard deviation and which had higher return in H
if std_w3_H < std_we_H:
    print('w_3 had the lower standard deviation during H.')
else:
    print('w_e had the lower standard deviation during H.')

if mean_w3_H > mean_we_H:
    print('w_3 had the higher return during H.')
else:
    print('w_e had the higher return during H.')

w_3 Standard Deviation in H: 0.003044349196415945
w_3 Mean Return in H: -0.00020841037781883028
w_e Standard Deviation in H: 0.0030249120803409932
w_e Mean Return in H: -0.00022264601546511756
w_e had the lower standard deviation during H.
w_3 had the higher return during H.


# Problem 3

You have invested of $10\%$ your wealth in a hedge fund; the other $90\%$ is in cash and there is no time value
of money. One year from now the hedge fund will cease operations; it will either fail and give you back only
half of your investment, or succeed and give you back $1.85$ times your investment. You know that $15\%$ of
hedge funds fail every year. (This is actually roughly true. The average life of a hedge fund is about $5$ years.)
You find a hedge fund evaluation model which is $95\%$ accurate in predicting funds that will fail and
$90\%$ accurate in predicting funds that will succeed. You run this model on your hedge fund and it says it will fail.
You can get your money out now for a $2\%$ exit fee. If you have a logarithmic utility function, should you exit
now or stay for the remaining year?

Let's identify the situation here.

True positive: predicts as fails and the fund fails.  

False positive: predicts as fails but the fund stays.

True negative: predicts succeed but the fund fails.  

False negative: predicts succeed and the fund stays. 

When it give prediction "fails", we have 2 situations TP and FP.

$$Pr(\text{True Positive})=0.95*0.15=0.1425$$

$$Pr(\text{False Positive})=0.10*0.85=0.085$$

$$E(U(\text{stay})|\text{predicts failure})= \frac{0.1425}{0.1425+0.085}*u(0.5*w)+ \frac{0.085}{0.1425+0.085}*u(1.85*w)=log(w)-0.20432$$

$$E(U(\text{exit})|\text{predicts failure})= u(0.98*w) = log(w)-0.0202$$

Since $E(U(\text{exit})|\text{predicts failure})>E(U(\text{stay})|\text{predicts failure})$, I should exit next year.

# Problem 4: Fama/French Factor Model

The code in 8factor.y downloads data for an 8-factor model from that site. The 8 factors include three marketwide factors: Mkt-RF (excess return of the market over riskfree rate); SMB (small minus big, the capitalization effect); and HML (high minus low, value vs. growth). There are also 5 broad industry groups. The code coverts factor returns to log-returns.

Regress (using OLS) the log-returns of each of the 11 companies in ratio_data.xlsx on the log-returns of the 8 factors. Report the R-squared of each of the 11 regressions, and which factor has the highest absolute value t-statistic. Does the 8-factor model do a good job - i.e. are the R-squareds reasonably high? (There are many ways to do OLS; one of them is in statsmodels.api.)

In [4]:
#8factor.py. Output is in dataframe df_8factor.
import pandas as pd
import numpy as np
import qrpm_funcs as qf

lastday=qf.LastYearEnd()
lastyear = int(lastday[:4])
#get number of periods back to 192607
periods = (lastyear-1926)*12+6

#Create dataframe containing 8-factor model's log-returns

#Input 3-factor data - leave out RF
fac_url='http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip'
df_fac = pd.read_csv(fac_url, skiprows=3, nrows=periods, usecols=[0, 1, 2, 3])
df_fac.rename(columns={df_fac.columns[0] : 'yearmon'}, inplace=True)

#Input 5-industry data
ind_url='http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/5_Industry_Portfolios_CSV.zip'
df_ind = pd.read_csv(ind_url, skiprows=11, nrows=periods)
df_ind.rename(columns={df_ind.columns[0] : 'yearmon'}, inplace=True)

#Merge 3-factor and 5-industry
df_8factor=pd.merge(df_fac,df_ind,on='yearmon')

#Convert to log-returns
df_8factor = pd.concat([df_8factor["yearmon"], \
            np.log(1+df_8factor[df_8factor.columns[1:]]/100.)],axis=1)

df_companies = pd.read_excel('ratio_data.xlsx')

In [5]:
import statsmodels.api as sm

# First, we convert the returns for each company into logarithmic returns.
log_returns_companies = np.log(df_companies.iloc[:, 1:12]).set_index(df_companies['Date'])

# Next, we prepare the factor data. We calculate logarithmic returns for the factors and combine these with the date column.
log_returns_factors = np.log(1 + df_8factor.iloc[:, 1:] / 100)
factors_data = df_8factor[['yearmon']].join(log_returns_factors)
factors_data['yearmon'] = pd.to_datetime(factors_data['yearmon'], format='%Y%m')
factors_data.set_index('yearmon', inplace=True)

# We then synchronize the indices of both datasets by converting them to the same monthly period.
log_returns_companies.index = log_returns_companies.index.to_period('M')
factors_data.index = factors_data.index.to_period('M')

# An inner join merges the company log returns with the factor model data on their common monthly period index.
merged_data = log_returns_companies.join(factors_data, how='inner')

# We will now run a regression analysis for each company using the factor model.
# We initialize a dictionary to store the results of these regressions.
regression_outcomes = {}

# We loop through each company's log returns to perform individual regressions against the factor model.
for company in log_returns_companies.columns:
    # Independent variables (factors) are defined here with the addition of a constant term for the regression intercept.
    independent_vars = sm.add_constant(merged_data.loc[:, 'Mkt-RF':'Other'])
    # The dependent variable is the log returns of the current company.
    dependent_var = merged_data[company]
    # We perform the regression using Ordinary Least Squares (OLS).
    regression_model = sm.OLS(dependent_var, independent_vars).fit()
    # The results, including R-squared and the highest t-statistic factor, are stored in the dictionary.
    regression_outcomes[company] = {
        'R-squared': regression_model.rsquared,
        'Highest t-stat factor': regression_model.tvalues.abs().idxmax(),
        'Highest t-stat value': regression_model.tvalues.abs().max()
    }

# Finally, we convert the dictionary of results into a DataFrame for a formatted view.
regression_results_dataframe = pd.DataFrame(regression_outcomes).T
regression_results_dataframe




Unnamed: 0,R-squared,Highest t-stat factor,Highest t-stat value
AAPL,0.666797,HiTec,4.457237
AMZN,0.511205,Cnsmr,1.922329
ED,0.391316,Manuf,4.066321
F,0.486277,SMB,2.399552
JNJ,0.578362,SMB,3.8963
JPM,0.827485,HML,3.801992
ORCL,0.462245,SMB,2.99915
TSLA,0.269068,Manuf,2.713464
V,0.579773,HML,4.200902
WMT,0.350059,Cnsmr,3.743906


**AAPL, JNJ, JPM, V, and XOM** have high to moderately high R-squared values, suggesting that the 8-factor model does a decent job of explaining their returns.

**AMZN, ED, F, and ORCL** show moderate explanatory power with the model.

**TSLA and WMT** have low R-squared values, indicating that the 8-factor model does not explain their returns well.

For **TSLA**, in particular, with an R-squared of 0.269068, the model explains less than 30% of the variance in Tesla's returns, which is low. The factors included in the model may not capture some of the unique aspects affecting Tesla's stock performance.

Overall, the 8-factor model appears to do a good job for some companies (like JPM and XOM), a moderate job for others (like AMZN and F), and a less adequate job for the rest (like TSLA).