# Theory of Finance

In this notebook, you will work with stock market data from the S&P500. You will learn some basics of portfolio management using Python, such as computing log-returns, computing and plotting the mean-variance frontier, and building the minimum variance portfolio.

In [None]:
# Library to access Yahoo!Finance data through Python
import pandas_datareader.data as pdata
# Numerical computation library
import numpy as np
# Plotting library
import matplotlib.pyplot as plt

## Obtaining financial data

We first gather stock prices using the `pandas-datareader` package loaded above. Many other sources of financial data exist, and depending on your requirements (which markets, which frequency, which time window, etc.) you may need to look into them. For simplicity's sake, this notebook ignores these complications, but you should consider them when writing your research.


We begin by building a list of 30 *random* tickers from stocks in the S&P500. We will grab the data from pandas-datareader for these stocks only.


In [None]:
# Create a list with the tickers of the stocks for which we want to obtain data
stocklist = ['AAPL', 'AFL', 'AMZN', 'APD', 'BAC', 'COF', 'COST', 
             'HBAN', 'HON', 'INTU', 'IPG', 'KEY', 'KLAC', 'LMT', 
             'LOW', 'MSFT', 'NEM', 'NKE', 'OMC', 'PAYX', 'RF', 'RHI', 
             'SNA', 'SWK', 'UPS', 'USB', 'VZ', 'WHR', 'WY', 'YUM']

Because we are querying an online API, obtaining the data can take a few seconds.

In [None]:
# Obtain the data from Yahoo!Finance between 1st Jan 2010 and 1st Jun 2021
prices = pdata.DataReader(stocklist, data_source = 'yahoo', 
                          start = '2010-1-1', end = '2021-6-1')
# Keep only the "Adjusted Close" prices
prices = prices["Adj Close"]
# Display the first 10 rows of the dataset
prices.head(10)
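
Since each run of the cell above queries the online API again, it can be convenient to keep a local copy of the downloaded prices. The following is a minimal sketch of such a cache, not part of the original notebook: the helper `load_or_download`, the stand-in `fake_download` and the file name `prices.csv` are all hypothetical, and the made-up data only serves to let the sketch run offline.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_or_download(path, download):
    """Return cached prices from `path`, calling `download()` only if needed."""
    if os.path.exists(path):
        # Restore the date index that was written by to_csv
        return pd.read_csv(path, index_col=0, parse_dates=True)
    data = download()
    data.to_csv(path)
    return data


# Stand-in for the real download so the sketch runs offline (hypothetical data)
def fake_download():
    dates = pd.date_range("2021-01-04", periods=5, freq="B")
    values = np.arange(10, dtype=float).reshape(5, 2) + 100.0
    return pd.DataFrame(values, index=dates, columns=["AAA", "BBB"])


cache_path = os.path.join(tempfile.mkdtemp(), "prices.csv")
first = load_or_download(cache_path, fake_download)    # downloads and caches
second = load_or_download(cache_path, fake_download)   # served from the cache
print(second.head())
```

In the notebook itself, the `download` argument could be a small function wrapping the `pdata.DataReader(...)` call and the `"Adj Close"` selection shown above.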

In [None]:
# Create a dataframe of percentage returns
returns = 100 * prices.pct_change()
# Display the first 10 rows
returns.head(10)

Now that we have obtained stock returns for our 30 stocks between 2010 and 2021, we split the data into a train set and a test set. In general, the train set is used to fit your model and the test set is used to validate it. This helps us detect overfitting: an overfitted model performs well on the train data (the data that was used to fit the model) but badly on the test set (data that the model has not "*seen*" yet).

In [None]:
# Separate the data into a training (pre-2020) and a testing set (post-2019)
train_index = returns.index < "2020-1-1"
test_index = returns.index >= "2020-1-1"
returns[train_index]

In [None]:
# Using the training set, compute the stock returns means,
mean_returns = returns[train_index].mean()
# ... standard deviations,
std_returns  = returns[train_index].std()
# ... and covariances
cov_returns = returns[train_index].cov()

## Computing the efficient frontier

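The mean-variance frontier (also called the portfolio frontier) is the set of portfolio allocations that offer the lowest variance for a given level of expected return. The efficient frontier is its upper branch, i.e. the part that lies above the minimum variance portfolio.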

Finding the efficient frontier is done by solving the optimization problem

$$
\begin{align}
\min_{w} \ &\mathbb{V}[R_p] \\ \text{s.t.} \ &\mathbb{E}[R_p] = \mu^\star
\end{align}
$$
where $w$ is the vector of portfolio weights, $\mathbb{V}[R_p]$ represents the variance of the portfolio return and $\mathbb{E}[R_p]$ its expected value. Additionally, the portfolio weights must sum to 1.

The frontier can be obtained through numerical minimization or through linear algebra. We follow the latter approach, as given in https://github.com/PaulSoderlind/FinancialTheoryMSc/blob/master/Ch03_MeanVariance.ipynb (the linked notebook implements the approach in Julia, whereas here it is done in Python).
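
For reference, the closed-form expressions behind the linear algebra approach can be recovered from the Lagrangian of the problem. The sketch below uses notation that is assumed here rather than taken from the notebook: $w$ for the weight vector, $\Sigma$ for the return covariance matrix, $\mu$ for the vector of mean returns and $\mathbf{1}$ for a vector of ones. The scalars $A$, $B$, $C$, $\lambda$ and $\delta$ correspond to the variables `A`, `B`, `C`, `lam` and `delta` in the code.

$$
\begin{align}
\mathcal{L} &= \tfrac{1}{2} w^\top \Sigma w - \lambda (w^\top \mu - \mu^\star) - \delta (w^\top \mathbf{1} - 1) \\
\frac{\partial \mathcal{L}}{\partial w} = 0 \ &\Rightarrow \ w = \Sigma^{-1} (\lambda \mu + \delta \mathbf{1}) \\
A &= \mu^\top \Sigma^{-1} \mu, \quad B = \mu^\top \Sigma^{-1} \mathbf{1}, \quad C = \mathbf{1}^\top \Sigma^{-1} \mathbf{1} \\
\lambda A + \delta B &= \mu^\star, \quad \lambda B + \delta C = 1 \\
\lambda &= \frac{C \mu^\star - B}{A C - B^2}, \quad \delta = \frac{A - B \mu^\star}{A C - B^2}
\end{align}
$$

The fourth line is obtained by substituting the optimal $w$ into the two constraints, and the last line solves that two-equation system for $\lambda$ and $\delta$.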

In [None]:
# Compute the efficient frontier
I = np.ones(mean_returns.shape)   # Vector of ones
S_1 = np.linalg.inv(cov_returns)  # Inverse of the return covariance matrix
# Compute scalars to help us with the computation of the frontier 
# (see course slides, chapter 3.1.4)
A = np.transpose(mean_returns) @ S_1 @ mean_returns
B = np.transpose(mean_returns) @ S_1 @ I
C = np.transpose(I) @ S_1 @ I
# Create a function that returns the weights that minimize the variance for a 
# given level of returns mu_star
def mv_frontier(mu_star):
  lam = (C * mu_star - B) / (A * C - B**2)
  delta = (A - B * mu_star) / (A * C - B**2)
  w = S_1 @ (mean_returns * lam + I * delta)
  return np.sqrt(np.transpose(w) @ cov_returns @ w), w
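
As a sanity check on the closed-form weights, the numerical minimization route can be sketched with `scipy.optimize.minimize`. This is only an illustration under stated assumptions: SciPy is not imported in the notebook, and the three-asset means `mu`, covariance matrix `sigma` and target mean `target` below are made-up numbers rather than S&P500 data.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs: three assets with made-up means and covariances
mu = np.array([0.05, 0.08, 0.12])
sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.15, 0.03],
                  [0.01, 0.03, 0.20]])
target = 0.09   # required portfolio mean (plays the role of mu_star)

# Closed-form solution, same algebra as mv_frontier
ones = np.ones(len(mu))
s_inv = np.linalg.inv(sigma)
a = mu @ s_inv @ mu
b = mu @ s_inv @ ones
c = ones @ s_inv @ ones
lam = (c * target - b) / (a * c - b**2)
delta = (a - b * target) / (a * c - b**2)
w_closed = s_inv @ (mu * lam + ones * delta)

# Numerical solution: minimize w' Sigma w subject to two equality constraints
constraints = [
    {"type": "eq", "fun": lambda w: w @ mu - target},   # target mean
    {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},   # weights sum to 1
]
result = minimize(lambda w: w @ sigma @ w, x0=ones / len(mu),
                  method="SLSQP", constraints=constraints, tol=1e-12)
w_numeric = result.x

print("closed form:", np.round(w_closed, 4))
print("numerical:  ", np.round(w_numeric, 4))
```

Both routes should agree up to the solver tolerance. The numerical route becomes the more practical one once constraints without a closed-form solution are added, for example a ban on short positions.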

In [None]:
# Create a mean-variance diagram for stock returns
fig, ax = plt.subplots(figsize = (14, 10))
# Plot a scatterplot of returns' mean/stdev points for each stock
ax.plot(std_returns, mean_returns, 'o')
# Plot the mean-variance frontier
mu_star = np.arange(0, 0.2, step = 0.001)
ax.plot(np.vectorize(lambda x: mv_frontier(x)[0])(mu_star), mu_star)
# Add ticker annotations
for tic in mean_returns.index:
  ax.annotate(tic, xy = (std_returns[tic], mean_returns[tic]), size = 8)
ax.set_xlabel("Standard deviation")        # Set x-axis label
ax.set_ylabel("Mean")                      # Set y-axis label
ax.set_title("Returns' mean-stdev plot")   # Set title of the plot
plt.show()                                 # Display the plot

## Comparing portfolios

We now compute the weights that produce the minimum variance portfolio on the train set. Using the equal weights portfolio as a benchmark, we compare the performance of both portfolios on the test set.

In [None]:
# Compute the min-variance portfolio weights
w_minvar = (S_1 @ I) / (np.transpose(I) @ S_1 @ I)
# Store uniform weights for a benchmark portfolio
w_uniform = np.ones(prices.shape[1]) / prices.shape[1]
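
The formula used for `w_minvar` can be checked on a small example. The sketch below uses a made-up three-asset covariance matrix (an assumption, not the estimated `cov_returns`) and verifies that the resulting weights sum to 1 and give a variance no larger than that of equal weights.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix (made-up numbers)
sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.15, 0.03],
                  [0.01, 0.03, 0.20]])
ones = np.ones(sigma.shape[0])

# Same formula as w_minvar; solve() avoids forming the inverse explicitly
x = np.linalg.solve(sigma, ones)
w_mv = x / (ones @ x)
w_eq = ones / len(ones)

var_mv = w_mv @ sigma @ w_mv
var_eq = w_eq @ sigma @ w_eq
print("weights sum:  ", w_mv.sum())
print("min-variance: ", var_mv)
print("equal weights:", var_eq)
```

Note that this guarantee only holds for the covariance matrix used to compute the weights. On the test set, where the covariances differ from the estimated ones, the minimum variance portfolio is not guaranteed to have the lowest variance.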

In [None]:
# Create a function to return an array of portfolio value over time
def portfolio_value(prices, weights):
  # Compute the logarithmic returns
  log_returns = np.log(prices) - np.log(prices.shift(1))
  # Multiply the log returns by the associated weights, take the cumulative sum
  # and exponentiate to obtain the value over time of 1 unit invested
  return np.exp(np.cumsum(np.sum(weights * log_returns, axis = 1)))
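
A weighted sum of log-returns is only an approximation of the portfolio's log-return, although usually a close one for daily data. As a point of comparison, the sketch below computes the value of a portfolio that is rebalanced to the target weights every period, using simple returns. The function `portfolio_value_simple` and the two-asset price path are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd


def portfolio_value_simple(prices, weights):
    """Value of 1 unit invested, rebalanced to `weights` every period."""
    simple_returns = prices.pct_change().fillna(0.0)   # first row has no return
    portfolio_returns = simple_returns.mul(weights, axis=1).sum(axis=1)
    return (1.0 + portfolio_returns).cumprod()


# Hypothetical two-asset price path (made-up numbers)
demo_prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 45.0, 54.0]},
    index=pd.date_range("2021-01-04", periods=3, freq="B"),
)
demo_weights = np.array([0.5, 0.5])

exact = portfolio_value_simple(demo_prices, demo_weights)
log_returns = np.log(demo_prices) - np.log(demo_prices.shift(1))
approx = np.exp((log_returns * demo_weights).sum(axis=1).cumsum())
print(pd.DataFrame({"exact": exact, "log approximation": approx}))
```

In this example the log-based value ends slightly below the rebalanced one. The gap grows with the size of the individual returns, so it is worth keeping in mind for volatile periods.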

In [None]:
# Compute the value over time of both portfolios for the train and test set
r_minvar_train = portfolio_value(prices[train_index], w_minvar)
r_uniform_train = portfolio_value(prices[train_index], w_uniform)
r_minvar_test = portfolio_value(prices[test_index], w_minvar)
r_uniform_test = portfolio_value(prices[test_index], w_uniform)

In [None]:
# Create a plot of portfolio value over time
fig, ax = plt.subplots(2, figsize = (14, 16))
# Create the plot for the train set
ax[0].plot(r_minvar_train, label='Min-variance')
ax[0].plot(r_uniform_train, label='Equal weights')
ax[0].set_xlabel("Date")            # Set x-axis label
ax[0].set_ylabel("Portfolio value") # Set y-axis label
ax[0].set_title("Train set")        # Set title of the plot
ax[0].grid()                        # Add a grid
ax[0].legend(loc='best')            # Add the legend
# Create the plot for the test set
ax[1].plot(r_minvar_test, label='Min-variance')
ax[1].plot(r_uniform_test, label='Equal weights')
ax[1].set_xlabel("Date")            # Set x-axis label
ax[1].set_ylabel("Portfolio value") # Set y-axis label
ax[1].set_title("Test set")         # Set title of the plot
ax[1].grid()                        # Add a grid
ax[1].legend(loc='best')            # Add the legend
plt.show()                          # Display the plot

We see that, for both the train and test data, the equally weighted portfolio outperforms the minimum-variance portfolio in terms of returns. However, its variance is also visibly much higher.

To get a better grasp of these differences, we compute the mean, standard deviation, and Sharpe ratio of both portfolios for both the train and test period.

Because the arithmetic mean of percentage returns can misstate compounded growth (percentage returns do not add up across time, whereas log-returns do), we will instead work with log-returns for the portfolio metrics.
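
The point about averaging percentage returns can be illustrated with a two-period example. The price path below is made up for illustration: the asset gains 50% and then loses 50%, so the average percentage return is zero even though a quarter of the initial value is lost.

```python
import numpy as np

# Hypothetical price path: up 50% and then down 50% (made-up numbers)
path = np.array([100.0, 150.0, 75.0])

simple = path[1:] / path[:-1] - 1.0     # percentage (simple) returns
logret = np.diff(np.log(path))          # log-returns

total_simple = path[-1] / path[0] - 1.0   # true return over both periods

print("mean simple return:  ", simple.mean())
print("true total return:   ", total_simple)
print("sum of log-returns:  ", logret.sum())
print("implied total return:", np.exp(logret.sum()) - 1.0)
```

The sum of the log-returns equals the log of the total gross return, so exponentiating it recovers the true total return of the path.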

In [None]:
# Create a function to output portfolio metrics
def portfolio_metrics(prices, weights):
  # Compute the logarithmic returns and drop the first row, which has no return
  log_returns = (np.log(prices) - np.log(prices.shift(1))).iloc[1:]
  # Compute the log-returns for the portfolio
  log_returns = np.sum(weights * log_returns, axis = 1)
  # Compute and display the portfolio metrics (daily values, not annualized)
  mu = np.mean(log_returns)
  sigma = np.std(log_returns)
  print("Mean:    ", round(mu, 5))
  print("St. dev.:", round(sigma, 5))
  # The Sharpe ratio below assumes a risk-free rate of zero
  print("Sharpe:  ", round(mu/sigma, 5))
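
The metrics above are computed at the daily frequency. Sharpe ratios are often quoted on an annual basis, and a possible conversion is sketched below. It rests on assumptions that are not part of the notebook: 252 trading days per year, returns that are independent across days, and a sample standard deviation (`ddof=1`). The function `annualized_sharpe` and the simulated returns are hypothetical.

```python
import numpy as np

TRADING_DAYS = 252   # assumption: a common convention, not taken from the notebook


def annualized_sharpe(daily_log_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio from daily log-returns (sketch)."""
    excess = np.asarray(daily_log_returns) - daily_risk_free
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)


# Hypothetical daily log-returns drawn from a seeded random generator
rng = np.random.default_rng(0)
demo = rng.normal(loc=0.0005, scale=0.01, size=1000)

daily = demo.mean() / demo.std(ddof=1)
print("daily Sharpe:     ", round(daily, 5))
print("annualized Sharpe:", round(annualized_sharpe(demo), 5))
```

Under these assumptions the mean scales with the number of days and the standard deviation with its square root, which is where the factor $\sqrt{252}$ comes from.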

In [None]:
print("Min. variance portfolio (train data)")
print("-----------------------------------------")
portfolio_metrics(prices[train_index], w_minvar)
print("") # Empty line
print("Equally weighted portfolio (train data)")
print("-----------------------------------------")
portfolio_metrics(prices[train_index], w_uniform)

In [None]:
print("Min. variance portfolio (test data)")
print("-----------------------------------------")
portfolio_metrics(prices[test_index], w_minvar)
print("") # Empty line
print("Equally weighted portfolio (test data)")
print("-----------------------------------------")
portfolio_metrics(prices[test_index], w_uniform)