In [None]:
#@title Install yfinance

# !pip install --upgrade pandas-datareader
! pip install yfinance

In [None]:
#@title Import Dependencies
 
# import dependencies 
import datetime
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import yfinance as yf
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import scipy.stats as stats

# set display options
pd.options.plotting.backend = "plotly"
pd.options.display.float_format = '{:,.2f}'.format

# A Brief Introduction to Modern Portfolio Theory

First proposed by economist Harry Markowitz in 1952, Modern Portfolio Theory holds that an investment's risk and return characteristics should not be viewed in isolation; rather, the investment should be judged by how it affects the overall portfolio's risk and return. By optimizing the mix of investments, we can do either of the following:

* Maximize the return associated with a given level of risk

or, equivalently:

* Minimize the level of risk associated with a given return

In this notebook we will develop the intuition behind this theory. For those of you who are interested, click [here](https://www.math.hkust.edu.hk/~maykwok/courses/ma362/07F/markowitz_JF.pdf) for Markowitz's original paper, "Portfolio Selection", from the Journal of Finance.


## Part 1: Gathering Data

To begin, we will need some data. Enter a ticker symbol and a date range below.

In [None]:
# Choose a company 
TICKER = 'TSLA' #@param {type:"string"}

# set the start and end dates
START = '2020-09-24' #@param {type:"date"}
END = '2022-09-24' #@param {type:"date"}

# set the data source
SOURCE = "google"

In [None]:
#@title Download Stock Data
df = yf.download(TICKER, start=START, end=END)
# df = pdr.DataReader(TICKER, SOURCE, start=START, end=END)
df

For our purposes, we are only interested in the **Adj Close** column. That is, the closing price adjusted for any stock splits and/or dividends paid.

## Part 2: Calculating the Return on a Stock


The return on a stock comes in two forms:
* The **capital gain** (the difference between the selling price and purchase price)
* The **dividends** paid by the company

We can calculate the return on a stock using the following formula:

<br><br>
$Return=\frac{Price_{t} - Price_{t-1} + Div} {Price_{t-1}}$
<br><br>
That is, the return over period $t$ (be it a day, a week, a month, or a quarter) is equal to the price at the end of the period minus the price at the beginning of the period (i.e. your **capital gain**), plus any dividends paid, all divided by the price at the beginning of the period. We can multiply this value by 100 to convert it from a decimal to a percentage.<br><br>
Let's use this formula to calculate **Daily Return** for our stock...  
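
Before applying the formula to the downloaded data, a hand-sized example may help. The sketch below uses made-up prices and a hypothetical helper, `simple_return`, which is not part of the notebook's own code. It also prints the log return of the same move for comparison.

```python
import math

def simple_return(price_start, price_end, dividend=0.0):
    """Return over one period: (capital gain + dividend) / starting price."""
    return (price_end - price_start + dividend) / price_start

# Illustrative numbers only: bought at 100, worth 104 at period end, 1 paid in dividends
r = simple_return(100.0, 104.0, dividend=1.0)
print(f"Simple return: {r:.2%}")

# The log return of the same move, log(1 + r), is slightly smaller than r
log_r = math.log(1 + r)
print(f"Log return:    {log_r:.2%}")
```

For small moves the two measures are close, which is why daily log returns are often used as a stand-in for daily simple returns.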

In [None]:
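#@title Adding the Daily Return
# "Adj Close" already reflects dividends and splits, so pct_change() gives the simple daily return;
# np.log(1 + r) converts it to a log return, which is nearly identical for small daily moves.
df["Daily Return"] = np.log(1 + df["Adj Close"].pct_change())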
df

## Part 3: Defining Risk

In the simplest terms, the risk of a stock is measured by how much its returns over a given period differ from the average return over that period. The more its returns differ from its average, the riskier the stock is.<br><br>
Mathematically we can measure this difference by calculating the related concepts of **variance** (denoted as $\sigma^2$) and **standard deviation** ($\sigma$).<br><br>
The **variance** can be calculated as follows:<br><br>
$\sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2} {n - 1}$<br><br>
And the **standard deviation** is just the square root of the **variance** :<br><br>
$\sigma = \sqrt{\frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2} {n - 1}}$<br><br>
In words, for each return in the period $x_i$, we subtract the average return for the period $\bar{x}$ and square the result. We then add up all these squared values and divide the sum by one less than the number of returns in the period ($n - 1$). This is the (sample) **variance** $\sigma^2$. Taking the square root gives the **standard deviation** $\sigma$.
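
The steps described in words can be followed one at a time in code. This is a minimal sketch on four made-up daily returns (the numbers are assumptions chosen only for illustration), checked against the standard library's `statistics.variance`, which also divides by $n - 1$.

```python
import math
import statistics

returns = [0.01, -0.02, 0.03, 0.00]  # made-up daily returns, for illustration only

# Step 1: the average return for the period
mean_return = sum(returns) / len(returns)

# Step 2: squared deviation of each return from the average
squared_deviations = [(x - mean_return) ** 2 for x in returns]

# Step 3: sum the squared deviations and divide by n - 1 (sample variance)
variance = sum(squared_deviations) / (len(returns) - 1)

# Step 4: the standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print(f"Variance: {variance:.6f}")
print(f"Standard deviation: {std_dev:.4f}")
print(f"Matches statistics.variance: {math.isclose(variance, statistics.variance(returns))}")
```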

### The Normal Distribution

For these measures of risk to make sense, we assume that stock returns are normally distributed. That is to say, we assume that returns are unlikely to stray too far from the mean and that, when they do stray, a return is just as likely to fall above the average as below it.
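
One rough way to put numbers on this assumption is to look at skewness (asymmetry around the mean) and excess kurtosis (how heavy the tails are); both are close to zero for normally distributed data. The sketch below uses simulated data rather than real stock returns, and the helper `skew_and_excess_kurtosis` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def skew_and_excess_kurtosis(returns):
    """Sample skewness and excess kurtosis (both are about 0 for normal data)."""
    x = np.asarray(returns, dtype=float)
    x = x[~np.isnan(x)]             # drop missing values, e.g. the first pct_change row
    z = (x - x.mean()) / x.std()    # standardise using the population std (ddof=0)
    return np.mean(z ** 3), np.mean(z ** 4) - 3.0

# Simulated data, for illustration only
rng = np.random.default_rng(0)
normal_sample = rng.normal(0.0, 0.02, size=100_000)
fat_tailed_sample = 0.02 * rng.standard_t(df=5, size=100_000)

normal_skew, normal_kurt = skew_and_excess_kurtosis(normal_sample)
fat_skew, fat_kurt = skew_and_excess_kurtosis(fat_tailed_sample)

print(f"Normal sample:     skew={normal_skew:.3f}, excess kurtosis={normal_kurt:.3f}")
print(f"Fat-tailed sample: skew={fat_skew:.3f}, excess kurtosis={fat_kurt:.3f}")
```

The same helper could be applied to a column of daily returns; a clearly positive excess kurtosis would suggest that extreme days occur more often than the normal curve predicts.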

In [None]:
#@title Plot the Distribution
#@markdown Of course, the stock is under no obligation to behave this way. Let’s see how well our stock conforms to this assumption.

fig = px.histogram(df, x="Daily Return", nbins=200)

# add the mean to the figure
fig.add_vline(x=df["Daily Return"].mean(), line_dash='dash', line_color='firebrick')

# add one standard deviation on either side of the mean
fig.add_vline(x=df["Daily Return"].mean() + df["Daily Return"].std(), line_dash='dash', line_color='firebrick')
fig.add_vline(x=df["Daily Return"].mean() - df["Daily Return"].std(), line_dash='dash', line_color='firebrick')

# add normal curve to the figure
pdf = stats.norm.pdf(df["Daily Return"].sort_values(), df["Daily Return"].mean(), df["Daily Return"].std())
fig.add_trace(go.Scatter(x=df["Daily Return"].sort_values(), y=pdf))

fig.update_layout(
    title_text=f"Distribution of {TICKER}'s Daily Returns",
    xaxis_title_text='Daily Returns', # xaxis label
    yaxis_title_text='Count', # yaxis label
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)


fig.show()

## Part 4: Constructing Portfolios

Just as we can measure the risk and return for a single stock, we can do the same for a portfolio of stocks. To calculate the return on a portfolio, simply take the weighted average of each stock’s return.<br><br>

$Return_{p} = w_{1}\cdot r_{1} + w_{2}\cdot r_{2} + \dotsc + w_{n}\cdot r_{n}$
<br><br>
Calculating the variance and standard deviation of a portfolio is somewhat more complicated. In addition to the weights, we also have to take into account the **covariance** between the stocks - that is, the degree to which the stocks in the portfolio move together. For a two stock portfolio, the variance and standard deviation are calculated as follows:<br><br>

$Var_{p} = w_{1}^2\cdot \sigma^2_{1} + w_{2}^2\cdot \sigma^2_{2} + 
2\cdot w_{1}\cdot w_{2}\cdot Cov_{1,2}$
<br><br>
$StDev_{p} = \sqrt{w_{1}^2\cdot \sigma^2_{1} + w_{2}^2\cdot \sigma^2_{2} + 
2\cdot w_{1}\cdot w_{2}\cdot Cov_{1,2}}$
<br><br>

(You won’t be responsible for calculating these values by hand.)
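
Still, a small numerical check can make the two-stock case concrete. The sketch below uses assumed weights, volatilities and correlation (none taken from real data) and computes the portfolio variance twice: once term by term, with squared weights on each stock's variance plus the covariance term, and once in matrix form, $w^{T}\Sigma w$, which is the calculation `get_portfolio_stdev` performs later in this notebook.

```python
import numpy as np

# Assumed inputs, for illustration only
w = np.array([0.6, 0.4])          # portfolio weights (sum to 1)
sigma = np.array([0.20, 0.30])    # standard deviation of each stock
corr = 0.25                       # correlation between the two stocks
cov_12 = corr * sigma[0] * sigma[1]

# Term by term: w1^2 * var1 + w2^2 * var2 + 2 * w1 * w2 * cov
var_formula = (w[0] ** 2 * sigma[0] ** 2
               + w[1] ** 2 * sigma[1] ** 2
               + 2 * w[0] * w[1] * cov_12)

# Matrix form: w^T . Sigma . w, which generalises to any number of stocks
cov_matrix = np.array([[sigma[0] ** 2, cov_12],
                       [cov_12, sigma[1] ** 2]])
var_matrix = w @ cov_matrix @ w

print(f"Variance (formula): {var_formula:.4f}")
print(f"Variance (matrix):  {var_matrix:.4f}")
print(f"Standard deviation: {np.sqrt(var_matrix):.4f}")
```

Note that the portfolio's standard deviation comes out below the weighted average of the two individual standard deviations (0.24 here), which is the diversification effect at work.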

### The Efficient Frontier

As noted above, Modern Portfolio Theory allows us to optimize the mix of stocks in our portfolio such that we can maximize the return for a given level of risk, or minimize the level of risk associated with a given return. Portfolios that satisfy this optimal mix of stocks are said to lie along the **efficient frontier**. 
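
For just two stocks the frontier can be traced directly by sweeping the weight on the first stock from 0 to 1. The following is a sketch under assumed expected returns, volatilities and correlation (illustrative numbers, not estimates from data). It locates the minimum-variance mix on a grid and compares it with the standard two-asset closed-form solution.

```python
import numpy as np

# Assumed annual figures, for illustration only
mu = np.array([0.08, 0.12])        # expected returns
sigma = np.array([0.20, 0.30])     # standard deviations
cov_12 = 0.25 * sigma[0] * sigma[1]

# Sweep the weight on stock 1; stock 2 takes the remainder
w1 = np.linspace(0.0, 1.0, 101)
w2 = 1.0 - w1
port_return = w1 * mu[0] + w2 * mu[1]
port_var = w1 ** 2 * sigma[0] ** 2 + w2 ** 2 * sigma[1] ** 2 + 2 * w1 * w2 * cov_12
port_stdev = np.sqrt(port_var)

# Minimum-variance portfolio: grid search vs. closed form
i_min = np.argmin(port_var)
w1_grid = w1[i_min]
w1_closed = (sigma[1] ** 2 - cov_12) / (sigma[0] ** 2 + sigma[1] ** 2 - 2 * cov_12)

# With two stocks, the efficient mixes are those returning at least as much
# as the minimum-variance portfolio
efficient = port_return >= port_return[i_min]

print(f"Minimum-variance weight on stock 1: grid={w1_grid:.2f}, closed form={w1_closed:.2f}")
print(f"Risk at the minimum: {port_stdev[i_min]:.4f}")
print(f"Efficient mixes on the grid: {efficient.sum()} of {len(w1)}")
```

Every mix with a lower return than the minimum-variance portfolio is dominated: some other mix offers a higher return for the same risk.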

To construct such portfolios, we first need to gather some data. Below we define a list of stocks. Feel free to change the ticker symbols or add your own.

In [None]:
# a list of ticker symbols
TICKERS = ["AAPL", "CAT", "BAC", "WMT", "JNJ", "MSFT", "GE",  "BA",   "KO",  "DIS"]

In [None]:
#@title Download Stock Data

# set the start and end dates
START = '2020-01-01' #@param {type:"date"}
END = '2022-01-01' #@param {type:"date"}

# set the data source
SOURCE = "yahoo"

df2 = yf.download(TICKERS, start=START, end=END)["Adj Close"]
df2 = df2[TICKERS]
# df2 = pdr.DataReader(TICKERS, SOURCE, start=START, end=END)["Adj Close"]
df2

In [None]:
#@title
#@markdown Next let's choose how many stocks to include in our portfolios,
stocks = "3" #@param [2, 3, 4, 5, 6]
stocks = int(stocks)

#@markdown as well as how many random portfolios to generate,
portfolios = "1500" #@param [500, 1000, 1500, 2000]
portfolios = int(portfolios)

def get_random_weights(df):
  weights = np.random.random(len(df.columns))
  weights /= weights.sum()
  return weights

def get_portfolio_return(df, weights):
  return np.dot(df.mean(), weights) * 252

def get_portfolio_stdev(df, weights):
  return np.sqrt(np.dot(np.dot(df.cov(), weights), weights)) * np.sqrt(252)

def make_portfolio(df):
    weights = get_random_weights(df)
    port_return = get_portfolio_return(df, weights)
    port_stdev = get_portfolio_stdev(df, weights)
    return weights, port_return, port_stdev

def make_portfolio_df(df, n_stocks, n_portfolios):

  # log returns for the first n_stocks columns
  returns_df = np.log(1+df.iloc[:, :n_stocks].pct_change())

  cols = [f"Weight: {col}" for col in returns_df]
  cols.append("Return")
  cols.append("Stdev")

  # collect one row per random portfolio (DataFrame.append was removed in pandas 2.0)
  rows = []
  for _ in range(n_portfolios):
    weights, port_return, port_stdev = make_portfolio(returns_df)
    rows.append([*weights, port_return, port_stdev])

  return pd.DataFrame(rows, columns=cols)

pdf = make_portfolio_df(df2, stocks, portfolios)

#@markdown and finally, plot the Efficient Frontier.
fig = px.scatter(pdf, 
                 x="Stdev", 
                 y="Return", 
                 hover_data={col: ":.2f" for col in pdf.columns},
                 title="Efficient Frontier",
                 labels={"Stdev": "Annualised Risk (Standard Deviation)",
                         "Return": "Annualised Return"})

fig.update_traces(marker=dict(color=pdf["Return"]/pdf["Stdev"], 
                              size=7,
                              line=dict(width=1),
                              colorscale="RdBu"))

fig.show()



## Part 5: The Benefits of Diversification

As we’ve seen, adding stocks to a portfolio can reduce the risk associated with a given return. However, the benefits of diversification are not unlimited. Beyond a certain point, adding more stocks ceases to have an appreciable effect on the portfolio’s risk.

Of course, diversification is not just about increasing the number of stocks; for the level of risk to fall, the stocks in the portfolio must be less than perfectly **correlated**, and the lower the correlation, the greater the reduction in risk. If instead they rise and fall in lockstep, diversification offers no advantage.
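
The role of correlation can be seen in a two-stock case. The sketch below assumes two stocks with the same volatility held in equal weights (illustrative numbers only) and uses a hypothetical helper, `two_stock_risk`, to show how portfolio risk changes as the correlation between the stocks varies.

```python
import math

def two_stock_risk(sigma_1, sigma_2, rho, w_1=0.5):
    """Standard deviation of a two-stock portfolio with correlation rho."""
    w_2 = 1.0 - w_1
    variance = ((w_1 * sigma_1) ** 2
                + (w_2 * sigma_2) ** 2
                + 2 * w_1 * w_2 * rho * sigma_1 * sigma_2)
    # guard against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))

# Two stocks, each with 20% volatility, held 50/50
for rho in (1.0, 0.5, 0.0, -1.0):
    risk = two_stock_risk(0.20, 0.20, rho)
    print(f"correlation {rho:+.1f} -> portfolio risk {risk:.4f}")
```

At a correlation of +1 the portfolio is exactly as risky as either stock alone; any lower correlation reduces the risk, and at -1 the two stocks offset each other completely in this stylised setup.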

In [None]:
#@title The Limits of Diversification

def get_uniform_weights(df):
  return np.array([1/len(df.columns)] * len(df.columns))

def make_dvrs_df(df):

  returns_df = np.log(1+df.pct_change())

  # order the stocks from most to least volatile
  sorted_cols = (returns_df.std() * np.sqrt(252)).sort_values(ascending=False).index

  # collect one row per portfolio size (DataFrame.append was removed in pandas 2.0)
  rows = []
  for i in range(1, len(df.columns) + 1):

      port_df = returns_df[sorted_cols[:i]].copy()
      weights = get_uniform_weights(port_df)
      rows.append({"Number of Stocks": i,
                   "Risk": get_portfolio_stdev(port_df, weights),
                   "Stocks": " ".join(port_df.columns)})

  return pd.DataFrame(rows)

# get more data
TICKERS = ["META", "HON", "AMC", "GILD",  "PG",  "PFE",  "IBM", "TSLA", "MMM", "GME"]

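# add the new tickers as extra columns (not extra rows), aligned on the date index
df3 = pd.concat([df2, yf.download(TICKERS, start=START, end=END)["Adj Close"]], axis=1)
# df3 = pd.concat([df2, pdr.DataReader(TICKERS, SOURCE, start=START, end=END)["Adj Close"]], axis=1)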
ddf = make_dvrs_df(df3)

# make the plot
fig = px.line(ddf, x="Number of Stocks", y="Risk", line_shape='spline')
fig.update_traces(line=dict(width=2, color="red"))
fig.update_layout(title="Diversification’s Diminishing Returns")
fig.show()

We can see that the effects of diversification are negligible beyond the 15th or so stock. We refer to the risk we have diversified away as **company** or **unique** risk; the risk that remains is **market risk**.
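
A stylised model shows why the curve flattens. Assume every stock has the same volatility and every pair of stocks has the same correlation (both values below are assumptions, not estimates). The variance of an equally weighted portfolio of $N$ stocks is then $\sigma^2/N + (1 - 1/N)\rho\sigma^2$: the first term shrinks toward zero as $N$ grows, while the second approaches $\rho\sigma^2$, a floor that no amount of diversification removes. The sketch builds the covariance matrix explicitly; `equal_weight_risk` is a hypothetical helper.

```python
import numpy as np

# Assumed values, for illustration only
sigma = 0.30   # volatility of every stock
rho = 0.30     # correlation between every pair of stocks

def equal_weight_risk(n):
    """Risk of an equally weighted portfolio of n identical, equally correlated stocks."""
    cov = np.full((n, n), rho * sigma ** 2)   # off-diagonal: covariances
    np.fill_diagonal(cov, sigma ** 2)         # diagonal: variances
    w = np.full(n, 1.0 / n)
    return np.sqrt(w @ cov @ w)

sizes = np.arange(1, 51)
risk = np.array([equal_weight_risk(n) for n in sizes])
market_floor = np.sqrt(rho) * sigma           # limit as n grows without bound

for n in (1, 5, 15, 50):
    print(f"{n:>2} stocks -> risk {risk[n - 1]:.4f}")
print(f"Floor (undiversifiable risk in this model): {market_floor:.4f}")
```

In this simplified model, the gap between a single stock's risk and the floor plays the role of unique risk, and the floor itself plays the role of market risk.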

## Part 6: Modern Portfolio Theory: A Cynical Take

Like any theory, MPT has its critics. Perhaps its biggest drawback is what it implies about our investing acumen. To accept MPT’s basic premise is to admit we don’t know what we’re doing; we don’t know how to distinguish a good investment from a bad investment, so instead we buy some of everything and let the laws of statistics provide for our returns.
 
While this approach does little to flatter our ego, the good news is that there’s no shame in not knowing how to pick stocks. Because as it happens, neither do the professionals. 
 
It is one of the most replicated findings in all of finance that professional money managers, over the long run, consistently fail to beat the market, especially after management fees are deducted from their returns. This explains the popularity of firms like BlackRock, State Street and Vanguard whose index funds don’t seek to beat, but only track, the market - the ultimate form of diversification, at least when it comes to stocks.  
