# Linear regression with SciPy

In [1]:
from scipy import stats

stock_returns = [0.065, 0.0265, -0.0593, -0.001, 0.0346]
mkt_returns = [0.055, -0.09, -0.041, 0.045, 0.022]
beta, alpha, r_value, p_value, std_err = stats.linregress(
    stock_returns, mkt_returns)

In the Capital Asset Pricing Model (CAPM), the relationship between the risk and the rate of return of a security is described as follows:

$ R_{i} = R_{f} + \beta_{i}(E[R_{mkt}] - R_{f}) $

For a security $i$, its return is denoted $R_{i}$ and its beta $\beta_{i}$. The CAPM defines the return of the security as the sum of the risk-free rate, $R_{f}$, and the product of its beta and the risk premium. The risk premium can be thought of as the market portfolio's return in excess of the risk-free rate.

The scipy.stats.linregress function returns five values: the slope of the regression line, the intercept of the regression line, the correlation coefficient, the p-value for a hypothesis test with a null hypothesis of a zero slope, and the standard error of the estimated slope. We are interested in the slope and intercept of the line, which we print as beta and alpha, respectively:

In [2]:
print(beta, alpha)

0.5077431878770808 -0.008481900352462384


In [3]:
print('The beta of the stock is', beta, 'and the alpha is', alpha,'.')

The beta of the stock is 0.5077431878770808 and the alpha is -0.008481900352462384 .
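
One detail worth checking is argument order: `linregress(x, y)` treats its first argument as the independent variable, so the call above regresses the market returns on the stock returns. The sketch below is an illustration rather than part of the original notebook. It assumes the conventional CAPM setup, with the market as `x` and the stock as `y`, reads the result through its named attributes, and cross-checks the slope against the covariance-over-variance formula. The names `result`, `cov_matrix`, and `manual_slope` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

stock_returns = [0.065, 0.0265, -0.0593, -0.001, 0.0346]
mkt_returns = [0.055, -0.09, -0.041, 0.045, 0.022]

# linregress(x, y) fits y = intercept + slope * x, so the market
# returns go first when the stock is the dependent variable.
result = stats.linregress(mkt_returns, stock_returns)

# Cross-check: the OLS slope equals cov(x, y) / var(x).
cov_matrix = np.cov(mkt_returns, stock_returns)  # 2x2 sample covariance
manual_slope = cov_matrix[0, 1] / cov_matrix[0, 0]

print('slope:', result.slope, 'intercept:', result.intercept)
print('cov/var slope:', manual_slope)
```

With this ordering the slope differs from the value printed above, because the two regressions minimize errors along different axes.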


The equation that describes the SML can be written as follows:
$ E[R_{i}] = R_{f} + \beta_{i}(E[R_{M}] - R_{f}) $

The term $E[R_{M}] - R_{f}$ is the market risk premium, and $E[R_{M}]$ is the expected return on the market portfolio. $R_{f}$ is the risk-free rate, $E[R_{i}]$ is the expected return on asset $i$, and $\beta_{i}$ is the beta of the asset.

Suppose the risk-free rate is 5% and the market risk premium is 8.5%. What is the expected return of the stock? Based on the CAPM, an equity with a beta of 0.5077 would have a risk premium of 0.5077×8.5%, or 4.3%. The risk-free rate is 5%, so the expected return on the equity is 9.3%.
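
The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `capm_expected_return` is hypothetical, and the beta is simply the slope value printed earlier.

```python
def capm_expected_return(risk_free_rate, beta, market_risk_premium):
    """Expected return under the CAPM: R_f + beta * (E[R_M] - R_f)."""
    return risk_free_rate + beta * market_risk_premium


beta = 0.5077431878770808  # slope printed by the regression above
risk_premium = beta * 0.085  # the security's own risk premium
expected_return = capm_expected_return(0.05, beta, 0.085)

print(f'Risk premium: {risk_premium:.1%}')        # 4.3%
print(f'Expected return: {expected_return:.1%}')  # 9.3%
```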

### Multivariate linear regression of factor models

#### Least squares regression with statsmodels

In [18]:
import numpy as np
import statsmodels.api as sm

# Generate some sample data
num_periods = 9
all_values = np.random.random((num_periods, 8))  # 9 periods x 8 columns

# Filter the data
y_values = all_values[:, 0] # First column values as Y
x_values = all_values[:, 1:] # All other values as X
x_values = sm.add_constant(x_values) # Include the intercept
results = sm.OLS(y_values, x_values).fit() # Regress and fit the model

print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.182
Model:                            OLS   Adj. R-squared:                 -5.540
Method:                 Least Squares   F-statistic:                   0.03189
Date:                Mon, 09 Sep 2019   Prob (F-statistic):              0.999
Time:                        21:11:30   Log-Likelihood:                -1.3825
No. Observations:                   9   AIC:                             18.76
Df Residuals:                       1   BIC:                             20.34
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.8911      3.960     -0.225      0.8


In [19]:
print(results.params)

[-0.8910527   0.41577988  0.56202811  0.21260681  1.41366702 -0.1511463
 -0.73558972  0.90628935]


#### A simple linear optimization problem with 2 variables

In [20]:
import pulp

x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
problem = pulp.LpProblem(
    'A_simple_maximization_objective',  # no spaces: PuLP warns and converts them
    pulp.LpMaximize)
problem += 3*x + 2*y, 'The objective function'
problem += 2*x + y <= 100, '1st constraint'
problem += x + y <= 80, '2nd constraint'
problem += x <= 40, '3rd constraint'
problem.solve()

print("Maximization Results:")
for variable in problem.variables():
    print(variable.name, '=', variable.varValue)

Maximization Results:
x = 20.0
y = 60.0
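
As a sanity check, the same problem can be solved with a different library. The sketch below swaps in `scipy.optimize.linprog` for PuLP. It is an independent cross-check under the same constraints, not the method used above. Since `linprog` minimizes, the objective coefficients are negated, and the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Maximize 3x + 2y  <=>  minimize -3x - 2y
objective = [-3, -2]

# Left-hand sides and right-hand sides of the "<=" constraints
lhs = [[2, 1],   # 2x + y <= 100
       [1, 1],   #  x + y <= 80
       [1, 0]]   #  x     <= 40
rhs = [100, 80, 40]

# Both variables are bounded below by zero, as in the PuLP model
res = linprog(objective, A_ub=lhs, b_ub=rhs, bounds=[(0, None), (0, None)])

x_opt, y_opt = res.x
max_value = -res.fun  # undo the sign flip to recover the maximum

print('x =', x_opt, 'y =', y_opt)
print('Maximum objective value:', max_value)
```

The solver should land on the same corner of the feasible region as PuLP, x = 20 and y = 60, where the objective 3x + 2y equals 180.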
