#### Chapter 1. Exercise 2.

In [2]:
import pandas as pd

In [3]:
data = pd.read_excel('chapter2data.xlsx', index_col='DATE')

**_Choose one industry that you think is highly risky and another industry that you think is relatively "safe"._**

The food industry seems relatively safe and banking risky. Let's look at the CITCRP and GENMIL data.

**_Divide your sample into the first half (January 1978 - December 1982) and the second half (January 1983 - December 1987) and choose the half with which you will work._**

Let's consider the first half of data.

In [4]:
citcrp = data.CITCRP.loc[pd.Timestamp('1978-01-01'):pd.Timestamp('1982-12-01')]
genmil = data.GENMIL.loc[pd.Timestamp('1978-01-01'):pd.Timestamp('1982-12-01')]
rkfree = data.RKFREE.loc[pd.Timestamp('1978-01-01'):pd.Timestamp('1982-12-01')]
market = data.MARKET.loc[pd.Timestamp('1978-01-01'):pd.Timestamp('1982-12-01')]

Eq. 2.17

$$r_j - r_f = \alpha_j + \beta_j (r_m - r_f) + e_j$$

**_a) Using your computer regression software, the 60 observations you have chosen, and Eq. (2.17), estimate by ordinary least squares the parameters $\alpha$ and $\beta$ for one firm in each of these two industries. Do the estimates of $\beta$ correspond well with your prior intuition or belief? Why or why not?_**

Let's use a linear least-squares regression available within the stats module of Scipy (scipy.stats.linregress).

In [5]:
citcrp_risk_premium = citcrp - rkfree
genmil_risk_premium = genmil - rkfree
market_risk_premium = market - rkfree

In [6]:
from scipy import stats
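# NOTE: linregress(x, y) takes the regressor first. As written, these calls regress the
# market risk premium on the company risk premium, i.e. the reverse of Eq. 2.17; the
# statsmodels fit in part (c) uses the Eq. 2.17 ordering and gives different estimates.
citcrp_slope, citcrp_intercept, citcrp_r_value, citcrp_p_value, citcrp_std_err = stats.linregress(citcrp_risk_premium, market_risk_premium)
genmil_slope, genmil_intercept, genmil_r_value, genmil_p_value, genmil_std_err = stats.linregress(genmil_risk_premium, market_risk_premium)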

In [7]:
print('alpha for CITCRP:', citcrp_intercept)
print('beta for CITCRP:', citcrp_slope)
print('alpha for GENMIL:', genmil_intercept)
print('beta for GENMIL:', genmil_slope)

alpha for CITCRP: 0.008755978253758671
beta for CITCRP: 0.43624789298424993
alpha for GENMIL: 0.010334427642417241
beta for GENMIL: 0.16150188991543996


The estimate of $\beta$ for the food firm (~0.16) is about 63% lower than the one for the bank (~0.44), which suggests that the food firm is indeed less risky than the bank. This seems to make sense: food is a necessity, so the food market should not change much even after a market-wide shock, whereas banking is very sensitive to whatever is going on in the world and depends on insider information and news (as with Sberbank in December 2014, when it lost more than a trillion rubles in one week because of someone's spam attack).
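One detail worth checking when reproducing these numbers: `scipy.stats.linregress(x, y)` treats its first argument as the regressor, so for Eq. 2.17 the market risk premium would go first and the company risk premium second. The sketch below uses synthetic series (made up for illustration, not taken from the spreadsheet) to show that swapping the arguments changes the slope while leaving the correlation coefficient unchanged.

```python
import numpy as np
from scipy import stats

# Synthetic monthly excess returns; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
market_premium = rng.normal(0.0, 0.05, size=60)
firm_premium = 0.002 + 0.5 * market_premium + rng.normal(0.0, 0.08, size=60)

# Eq. 2.17 ordering: firm premium (y) regressed on market premium (x).
forward = stats.linregress(market_premium, firm_premium)
# Swapped ordering: market premium regressed on firm premium.
swapped = stats.linregress(firm_premium, market_premium)

# The forward slope equals cov(x, y) / var(x), the usual definition of beta.
beta_by_hand = np.cov(market_premium, firm_premium)[0, 1] / np.var(market_premium, ddof=1)

print('beta (firm on market):', forward.slope)
print('slope (market on firm):', swapped.slope)
print('r for both orderings:', forward.rvalue, swapped.rvalue)
```

The product of the two slopes equals $r^2$, which gives a quick consistency check between differently ordered fits: with the numbers recorded in this notebook, $0.436 \times 0.4466 \approx 0.195$, the $R^2$ that the CITCRP summary reports in part (c).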

But let's now check all the other firms to find the safest and the riskiest one.

In [8]:
for firm in ['MOBIL', 'TEXACO', 'IBM', 'DEC', 'DATGEN', 'CONED', 'PSNH', 'WEYER', 'BOISE', 'MOTOR', 'TANDY', 'PANAM',
            'DELTA', 'CONTIL', 'CITCRP', 'GERBER', 'GENMIL']:
    firm_data = data[firm].loc[pd.Timestamp('1978-01-01'):pd.Timestamp('1982-12-01')]
    firm_risk_premium = firm_data - rkfree
    slope, intercept, r_value, p_value, std_err = stats.linregress(firm_risk_premium, market_risk_premium)
    print(firm, slope, r_value)

MOBIL 0.5456005896315541 0.6090048338697057
TEXACO 0.5633210458483533 0.6019655310019463
IBM 0.5923942373685911 0.4481393516792449
DEC 0.5751301278668433 0.6375628742600715
DATGEN 0.33187394033108486 0.5777007058141425
CONED 0.299648601534844 0.20518346043722774
PSNH 0.5676915710965611 0.35180419533007595
WEYER 0.5472763199100223 0.6224226338412773
BOISE 0.46002849950546026 0.637935536422588
MOTOR 0.43055189391301785 0.4892532636646321
TANDY 0.2997049493276856 0.5558227460949497
PANAM 0.2625724172503772 0.44277284536656053
DELTA 0.24097437141453976 0.3073850517901633
CONTIL 0.2799178866521468 0.3297904357033843
CITCRP 0.43624789298424993 0.44140882103160134
GERBER 0.42087049341019667 0.44150974111942803
GENMIL 0.16150188991543996 0.12627879422349467


By these slope estimates GENMIL is indeed the safest of the given companies, while IBM and DEC, both in the computer industry, carry the highest risk.

**_b) For one of those companies, make a time plot of the historical company risk premium, the company risk premium predicted by the regression model and the associated residuals._**

Let's take GENMIL and use the fitted regression model, together with the second half of the market data, to predict the GENMIL risk premium over the second half of the sample.

In [9]:
genmil_future = data.GENMIL.loc[pd.Timestamp('1983-01-01'):]
market_future = data.MARKET.loc[pd.Timestamp('1983-01-01'):]
rkfree_future = data.RKFREE.loc[pd.Timestamp('1983-01-01'):]
market_risk_premium_future = market_future - rkfree_future
# Compare like with like: the model predicts the risk premium, not the raw return.
genmil_risk_premium_future = genmil_future - rkfree_future
genmil_risk_premium_forecast = genmil_intercept + genmil_slope * market_risk_premium_future
residuals = genmil_risk_premium_future - genmil_risk_premium_forecast
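The residual is the gap between the realized risk premium and the model's prediction, so both sides should be measured as excess returns over the risk-free rate. Here is a small sketch with made-up numbers (the arrays and the coefficients `alpha_hat` and `beta_hat` are assumptions for illustration only):

```python
import numpy as np

# Hypothetical out-of-sample data: raw returns and the risk-free rate.
firm_return = np.array([0.030, -0.010, 0.022, 0.005])
market_return = np.array([0.020, -0.015, 0.030, 0.001])
risk_free = np.array([0.006, 0.006, 0.007, 0.007])

# Hypothetical coefficients estimated on an earlier sample.
alpha_hat, beta_hat = 0.004, 0.10

# Convert raw returns to risk premia before applying the model.
firm_premium = firm_return - risk_free
market_premium = market_return - risk_free

predicted_premium = alpha_hat + beta_hat * market_premium
residuals = firm_premium - predicted_premium

# By construction: realized premium = prediction + residual.
print(residuals)
```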

In [13]:
import plotly
import plotly.graph_objs as go

# Historical GENMIL risk premium over both halves of the sample
history = pd.concat([genmil_risk_premium, genmil_future - rkfree_future])
x1 = history.index
y1 = history.values

x2 = genmil_risk_premium_forecast.index
y2 = genmil_risk_premium_forecast.values

x3 = residuals.index
y3 = residuals.values


trace0 = go.Scatter(
    x = x1,
    y = y1,
    mode = 'lines',
    name = 'history'
)
trace1 = go.Scatter(
    x = x2,
    y = y2,
    mode = 'lines',
    name = 'model prediction'
)
trace2 = go.Scatter(
    x = x3,
    y = y3,
    mode = 'lines',
    name = 'residuals'
)
# Use a separate name so the `data` DataFrame loaded above is not overwritten.
traces = [trace0, trace1, trace2]

plotly.offline.iplot(traces, filename='line-mode')

**_c) For each of the companies, test the null hypothesis that $\alpha = 0$ against the alternative hypothesis that $\alpha \neq 0$, using a significance level of 95%._**

For the statistical tests we will use _statsmodels_. Let's fit the linear model once more, this time with a statistics package.

In [48]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
df_genmil = pd.DataFrame({'x':market_risk_premium , 'y': genmil_risk_premium})
df_citcrp = pd.DataFrame({'x':market_risk_premium , 'y': citcrp_risk_premium})
model_genmil = ols("y ~ x", df_genmil).fit()
model_citcrp = ols("y ~ x", df_citcrp).fit()



In [49]:
model_genmil.summary()

|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.016 |
| Model: | OLS | Adj. R-squared: | -0.001 |
| Method: | Least Squares | F-statistic: | 0.9399 |
| Date: | Thu, 11 Oct 2018 | Prob (F-statistic): | 0.336 |
| Time: | 20:59:52 | Log-Likelihood: | 84.6 |
| No. Observations: | 60 | AIC: | -165.2 |
| Df Residuals: | 58 | BIC: | -161.0 |
| Df Model: | 1 |  |  |
| Covariance Type: | nonrobust |  |  |

|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0042 | 0.008 | 0.541 | 0.591 | -0.011 | 0.020 |
| x | 0.0987 | 0.102 | 0.969 | 0.336 | -0.105 | 0.303 |

|  |  |  |  |
|---|---|---|---|
| Omnibus: | 4.477 | Durbin-Watson: | 1.907 |
| Prob(Omnibus): | 0.107 | Jarque-Bera (JB): | 3.733 |
| Skew: | 0.6 | Prob(JB): | 0.155 |
| Kurtosis: | 3.231 | Cond. No. | 13.1 |


In [50]:
model_citcrp.summary()

|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.195 |
| Model: | OLS | Adj. R-squared: | 0.181 |
| Method: | Least Squares | F-statistic: | 14.04 |
| Date: | Thu, 11 Oct 2018 | Prob (F-statistic): | 0.000415 |
| Time: | 20:59:55 | Log-Likelihood: | 75.152 |
| No. Observations: | 60 | AIC: | -146.3 |
| Df Residuals: | 58 | BIC: | -142.1 |
| Df Model: | 1 |  |  |
| Covariance Type: | nonrobust |  |  |

|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0006 | 0.009 | 0.065 | 0.948 | -0.018 | 0.019 |
| x | 0.4466 | 0.119 | 3.746 | 0.000 | 0.208 | 0.685 |

|  |  |  |  |
|---|---|---|---|
| Omnibus: | 9.036 | Durbin-Watson: | 1.852 |
| Prob(Omnibus): | 0.011 | Jarque-Bera (JB): | 10.837 |
| Skew: | 0.588 | Prob(JB): | 0.00443 |
| Kurtosis: | 4.719 | Cond. No. | 13.1 |


For each of these companies the 95% confidence interval for $\alpha$ contains zero ($[-0.011, 0.020]$ for GENMIL and $[-0.018, 0.019]$ for CITCRP), so we cannot reject the null hypothesis $\alpha = 0$ for either of them.

If we rejected the hypothesis, the model would be regarded as inefficient (see Berndt, p. 36). An $\alpha$ significantly different from zero would mean that, for some reason, investors expect to earn an excess return on the stock even when the market as a whole earns none.
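The confidence-interval check and the p-value check are two views of the same t-test. As a minimal sketch, the intercept test can be reproduced by hand on synthetic data (the series below are simulated with a true $\alpha$ of zero, an assumption made only for this illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data, assumed for illustration: true alpha = 0, true beta = 0.5.
rng = np.random.default_rng(1)
n = 60
x = rng.normal(0.0, 0.05, size=n)            # stand-in for the market risk premium
y = 0.5 * x + rng.normal(0.0, 0.08, size=n)  # stand-in for the company risk premium

# OLS estimates of slope and intercept.
x_bar, y_bar = x.mean(), y.mean()
s_xx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residual standard error with n - 2 degrees of freedom.
resid = y - (alpha_hat + beta_hat * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Standard error of the intercept.
se_alpha = s * np.sqrt(1.0 / n + x_bar ** 2 / s_xx)

# Two-sided t-test of alpha = 0 and the matching 95% confidence interval.
t_stat = alpha_hat / se_alpha
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_low, ci_high = alpha_hat - t_crit * se_alpha, alpha_hat + t_crit * se_alpha
reject = p_value < 0.05

print('alpha_hat:', alpha_hat, 'p-value:', p_value)
print('95% CI:', (ci_low, ci_high), 'reject alpha = 0:', reject)
```

Zero lies inside the 95% interval exactly when the p-value exceeds 0.05, which is why reading the interval off the summary is enough.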

**_d) For each company, construct a 95% confidence interval for $\beta$. Then test the null hypothesis that the company's risk is the same as the average risk over the entire market, that is, test that $\beta = 1$ against the alternative hypothesis that $\beta \neq 1$._**

In [22]:
import math
import scipy.stats
import scipy.stats.distributions as distributions

# taken from scipy source code
def pvalue_beta(X, Y, beta):
    sampleSize = len(X)

    # Compute their average
    avgX = sum(X)/sampleSize
    avgY = sum(Y)/sampleSize

    # Partial steps to compute estimators of linear regression parameters.
    XDiff = [X_i - avgX for X_i in X]
    XDiffSquared = [i*i for i in XDiff]
    YDiff = [Y_i - avgY for Y_i in Y]

    # B1 is the estimator of slope.
    # B0 is the estimator of intercept.
    # r is the estimator of Y given X.
    B1 = sum(x * y for x, y in zip(XDiff, YDiff)) / sum(XDiffSquared)
    B0 = avgY - B1*avgX
    r = lambda x: B0 + B1*x

    # Partial steps to compute Wald Statistic.
    errs = [y - r(x) for x, y in zip(X, Y)]
    errStd = math.sqrt((1/(sampleSize-2))*(sum([err**2 for err in errs])))
    XStd = math.sqrt((1/(sampleSize))*sum([diff**2 for diff in XDiff]))
    stdB1 = errStd / (XStd * math.sqrt(sampleSize))

    # Wald Statistic.
    W = (B1 - beta)/stdB1

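    # Two-sided p-value of the Wald (normal approximation) test of B1 = beta.
    # Computed for comparison only; it is not returned.
    pvalueWald = 2*scipy.stats.norm.cdf(-abs(W))

    # Two-sided p-value of the t-test of B1 = beta with n - 2 degrees of freedom.
    pvalueT = 2*distributions.t.sf(abs(W), sampleSize - 2)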

    return pvalueT

In [38]:
# Raw strings keep '\b' in '\beta' from being read as a backspace escape.
# NOTE: the GENMIL call passes beta = -0.11, so it tests that value rather than beta = 1.
print(r'GENMIL P-value for the null hypothesis $\beta = -0.11$:', pvalue_beta(X = market_risk_premium.values,
                                                                             Y = genmil_risk_premium.values,
                                                                             beta = -0.11))
print(r'CITCRP P-value for the null hypothesis $\beta = 1$:', pvalue_beta(X = market_risk_premium.values,
                                                                         Y = citcrp_risk_premium.values,
                                                                         beta = 1))

GENMIL P-value for the null hypothesis $\beta = -0.11$: 0.04494243489978189
CITCRP P-value for the null hypothesis $\beta = 1$: 2.021992302619369e-05


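The CITCRP p-value is far below 0.05, so we reject the null hypothesis $\beta = 1$ for the bank. The GENMIL call above was run with `beta = -0.11`, so its p-value (about 0.045) does not test $\beta = 1$; for GENMIL we rely instead on the 95% confidence interval from the summary, which lies well below 1, so $\beta = 1$ is rejected there as well.

From the summaries we get that
- the 95% confidence interval for the GENMIL $\beta$ is $[-0.105, 0.303]$;
- the 95% confidence interval for the CITCRP $\beta$ is $[0.208, 0.685]$.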
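As a rough check by hand, the test of $\beta = 1$ can be worked out from the rounded coefficients and standard errors printed in the summaries (so the values below are approximate). With 58 residual degrees of freedom the two-sided 5% critical value is $t_{0.975,\,58} \approx 2.00$, and

$$t = \frac{\hat\beta - 1}{\operatorname{se}(\hat\beta)}, \qquad t_{\text{GENMIL}} \approx \frac{0.0987 - 1}{0.102} \approx -8.84, \qquad t_{\text{CITCRP}} \approx \frac{0.4466 - 1}{0.119} \approx -4.65$$

Both statistics are far larger than 2.00 in absolute value, so $\beta = 1$ is rejected for both companies.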

**_e) For each of the two companies, compute the proportion of total risk that is market risk, also called systematic and nondiversifiable. Does evidence from the two companies you have chosen correspond to Sharpe's typical stock? Why or why not?_**

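$R^2$ measures the proportion of total risk that is systematic (market) risk; the remainder, $1 - R^2$, is specific risk.

From the summaries we get that
- the systematic share of risk for GENMIL is $0.016$;
- the systematic share of risk for CITCRP is $0.195$;
- the specific share of risk for GENMIL is $1 - 0.016 = 0.984$;
- the specific share of risk for CITCRP is $1 - 0.195 = 0.805$.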

The numbers reflect the idea that banks are more sensitive to macro shocks: about 20% of CITCRP's total risk can be explained by the uncertainty of the overall market, against only 1.6% for GENMIL. After a global shock people lose confidence in the future, and they are more likely to withdraw cash from banks, sell assets and so on than to stop eating bread.
CITCRP's systematic share is fairly close to that of Sharpe's typical stock, while GENMIL's is not close at all.
Presumably Sharpe's figure describes an average firm, whereas we are looking at two specific firms.
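The split into systematic and specific risk is a variance decomposition: the total variance of the company risk premium equals $\hat\beta^2$ times the variance of the market risk premium plus the residual variance. Here is a minimal sketch on synthetic data (the simulated series and their parameters are assumptions for illustration):

```python
import numpy as np

# Synthetic risk premia, assumed for illustration only.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.05, size=60)                      # market risk premium
y = 0.001 + 0.45 * x + rng.normal(0.0, 0.08, size=60)   # company risk premium

# OLS fit of y on x.
beta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - (alpha_hat + beta_hat * x)

# Total variance splits into a market-driven part and a firm-specific part.
total_var = np.var(y, ddof=1)
systematic_var = beta_hat ** 2 * np.var(x, ddof=1)
specific_var = np.var(resid, ddof=1)

r_squared = systematic_var / total_var      # share of systematic risk
specific_share = specific_var / total_var   # share of specific risk

print('systematic share (R^2):', r_squared)
print('specific share (1 - R^2):', specific_share)
```

The two shares add up to one, and the systematic share coincides with the squared correlation between the two series.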

**_i) In your sample, do large estimates of $\beta$ correspond with higher $R^2$ values? Would you expect this always to be the case? Why or why not?_**

From summaries we get:

| company | $\beta$ | $R^2$ |
|--------|--------|:-----:|
| GENMIL | 0.0987 | 0.016 |
| CITCRP | 0.4466 | 0.195 |

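Here the larger estimate of $\beta$ **does** correspond to the higher $R^2$. Still, this is not always the case. $R^2$ measures how much of the variation in the dependent variable the regression explains, and it does not depend on the units of measurement. Suppose all values of the dependent variable were multiplied by 1000: $\hat\beta$ would then increase 1000-fold while $R^2$ would not change. Moreover, with $x$ the market risk premium and $y$ the company risk premium,

$$R^2 = \frac{\sum_i (\hat\alpha + \hat\beta x_i - \bar y)^2}{\sum_i (y_i - \bar y)^2} = \hat\beta^2 \, \frac{\sum_i (x_i - \bar x)^2}{\sum_i (y_i - \bar y)^2}$$

so $R^2$ also depends on the variance of the market risk premium relative to the variance of the company risk premium, and a larger $\beta$ does not by itself imply a larger $R^2$.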
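A quick simulation makes the point concrete. The two stocks below are hypothetical and their parameters are chosen only for illustration: stock A has the larger $\beta$ but a lot of firm-specific noise, stock B has a small $\beta$ and very little noise.

```python
import numpy as np

def beta_and_r2(x, y):
    """Return the OLS slope of y on x and the regression R^2."""
    beta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return beta, r2

rng = np.random.default_rng(3)
market = rng.normal(0.0, 0.05, size=500)

# Hypothetical stock A: high beta, swamped by firm-specific noise.
stock_a = 1.5 * market + rng.normal(0.0, 0.20, size=500)
# Hypothetical stock B: low beta, very little firm-specific noise.
stock_b = 0.5 * market + rng.normal(0.0, 0.01, size=500)

beta_a, r2_a = beta_and_r2(market, stock_a)
beta_b, r2_b = beta_and_r2(market, stock_b)

# Rescaling the dependent variable scales beta but leaves R^2 untouched.
beta_scaled, r2_scaled = beta_and_r2(market, 1000 * stock_a)

print('A: beta, R^2 =', beta_a, r2_a)
print('B: beta, R^2 =', beta_b, r2_b)
print('A scaled by 1000: beta, R^2 =', beta_scaled, r2_scaled)
```

In this setup the stock with the larger $\beta$ ends up with the smaller $R^2$, because $R^2$ depends on how $\beta^2$ times the market variance compares with the firm-specific variance, not on $\beta$ alone.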