# Project Statement

**Use current stock data to create two potentially profitable investment
portfolios. One that is higher risk and one that is lower risk.**

**You are to explain your interpretation of a high risk profile and low
risk profile of a portfolio. You should provide some measurable
quantitative data in your explanation.**

# Portfolio Choice and Motivation

It is well documented that investors saving for long-term consumption
goals (i.e. to fund consumption \>10 years from the beginning of
investment) can maximize expected return and minimize risk, as defined
by volatility and the probability of (un)realized capital loss, by
holding broadly diversified, low cost index funds of risky assets with
expected returns in excess of 3 month US treasury bills. Furthermore, it
is financial dogma to hold differing proportions of the more "risky"
stocks and "safer" bonds according to how close the investor is to the
date their consumption begins.

We take the persona of a young investor, defined as an investor entering
a risky portfolio of stocks and bonds to fund consumption occurring at
least 25 years in the future. We accept the dogma of investing in low
cost, total market index funds to hold long term, and base *both* of our
portfolios on the following assumptions:

1.  Invest in total market index funds at a 90% stock/10% bond split
2.  Stocks should be diversified globally
3.  The bond portion will be zero coupon 25+ year Treasury STRIPS

We introduce a 10% bond allocation because it curtails psychological
risk and improves risk-adjusted returns. Moreover, our duration exposure
to the bonds should be in line with our consumption timeline. As such,
we adopt ultra-long dated Treasury STRIPS, as opposed to a total bond
market fund, whose average duration is intermediate, at ~6 years. It has
been demonstrated that STRIPS have lower correlation with stocks than
intermediate-dated bonds, and would therefore
provide a stronger diversification benefit, which is desirable in
our framework. We will give empirical evidence for this fact.
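
One way the correlation claim could be checked, once daily return series for stocks and both bond funds are available, is with a correlation matrix and a rolling correlation. The sketch below uses synthetic, independently generated returns purely as placeholders; the column names and every number in it are assumptions, so its output says nothing about real markets.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns, for illustration only (not market data).
rng = np.random.default_rng(0)
n_days = 2_000
returns = pd.DataFrame({
    "stocks": rng.normal(0.0004, 0.010, n_days),
    "intermediate_treasuries": rng.normal(0.0001, 0.003, n_days),
    "long_strips": rng.normal(0.0002, 0.012, n_days),
})

# Pearson correlation of each bond series' daily returns with stocks.
corr_with_stocks = returns.corr()["stocks"].drop("stocks")
print(corr_with_stocks)

# Rolling one-year (252 trading day) correlation shows how stable it is.
rolling_corr = returns["stocks"].rolling(252).corr(returns["long_strips"])
```

With real data, the bond series with the lower (or more negative) correlation to stocks is the one offering the stronger diversification benefit.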

Our "low risk" portfolio will consist of 90% global stocks at
**current** market cap weights and 10% 25+ year US Treasury STRIPS.

| Asset             | %   |
|-------------------|-----|
| Total US Stock    | 57  |
| Total ex-US Stock | 33  |
| Total US Treasury | 10  |

The choice of using current market cap weights for the US and ex-US
geographies is to make the portfolio comparable to the high risk
portfolio.

# Risk Factors as Sources of Increased Risk

In the Capital Asset Pricing Model (CAPM), the "low" risk portfolio would be
nearly the riskiest portfolio one can come up with, when only
compensated risks are accounted for. A compensated risk is a source of
risk that an investor can expect to be compensated for, while an
investor taking on uncompensated risk cannot expect to increase their
returns for doing so. Examples of uncompensated risks are concentration
risk, where assets are highly concentrated into one asset (e.g. a
particular stock), and idiosyncratic country risk, where one
concentrates their stock holdings in a particular country's stock
market.

The common thread of uncompensated risks is that they can be eliminated
with diversification, something that the "optimal" portfolio in the CAPM
supports. Our portfolios will also strive to take on only
compensated risks, but short of having a 100% stock allocation, our "low
risk" portfolio is already about as risky as we can get. One then
wonders, how can we take on more **compensated** risk to produce a
higher risk portfolio than the one we have already?

The work of Eugene Fama and Kenneth French provides a possible answer.
In short, Fama and French observed excess alpha (broadly-occurring
return in excess of the market portfolio) in certain stocks that the
CAPM could not explain. Statistically, the CAPM could only explain 70%
of the returns of a diversified portfolio of stocks. Fama and French
sought to modify the pricing model to explain more of the returns.

Summarizing their body of research, they proposed two additional *risk
factors*: the size risk factor and value risk factor. By measuring a
diversified stock portfolio's exposure to companies with smaller size
(known as "small cap" companies) and with lower price-to-book values
(known as "value" companies), Fama and French were able to increase the
explanatory power of the CAPM from 70% to \>90% with their additional
risk factors. Their model is called the Fama-French three-factor model.

The risk factors in this model are

1.  Market beta (exposure to diversified stock)
2.  Size
3.  Value
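
In regression form, the three-factor model is usually written as below. The symbols are the conventional ones from the literature, not notation defined elsewhere in this notebook.

$$
R_i - R_f = \alpha_i + \beta_i\,(R_m - R_f) + s_i\,\mathrm{SMB} + h_i\,\mathrm{HML} + \varepsilon_i
$$

Here $R_i$ is the return of portfolio $i$, $R_f$ the risk-free rate, $R_m$ the market return, SMB ("small minus big") the size factor, and HML ("high minus low" book-to-market) the value factor. The loadings $\beta_i$, $s_i$ and $h_i$ measure the portfolio's exposure to each factor.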

It should be noted that the model improves on the CAPM *empirically* in
how it explains returns, but past returns do not predict the future.
However, there are good reasons to believe these additional risk factors
should be compensated ones.

# The High Risk Portfolio

The three factor model predicts that a portfolio with higher exposure to
small cap companies and value companies can give returns higher than the
market portfolio. This is the basis for our high risk portfolio.

Sticking with the 90/10 stock/STRIPS allocation, we allocate 50% of each
geography's stock allocation to small cap value companies in that
geography.

For example, at current market capitalizations, US stocks make up
approximately 64% of the world's free-float market capitalization, with
ex-US contributing the remaining 36%. In our low risk portfolio, we
allocated 90% of our assets to this split, so our geography breakdown
would be $0.9\ast 0.64 \approx 57\%$ total US market, and
$0.9\ast 0.36 \approx 33\%$ total ex-US stock market.

In our high risk portfolio, these would be halved to $28\%$ and $17\%$
for US and ex-US total market, respectively, making way for $29\%$ and
$16\%$ US and ex-US small cap value, respectively.

The choice of $50\%$ dedicated to small cap value in each geography is
somewhat arbitrary, and would likely be considered a very aggressive
tilt to the size and value factors. Our high risk portfolio is then

| Asset                 | %   |
|-----------------------|-----|
| Total US Stock        | 28  |
| US Small Cap Value    | 29  |
| Total ex-US Stock     | 17  |
| ex-US Small Cap Value | 16  |
| 25+ STRIPS            | 10  |
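
The weight arithmetic can be sketched in a few lines. This is a minimal illustration that starts from the low risk table's stock weights and applies the 50% tilt. The helper name `tilt_to_small_cap_value` is hypothetical. The exact halves are 28.5% and 16.5%, which the text above rounds to whole percentages.

```python
BOND_WEIGHT = 10.0   # percent in 25+ year STRIPS (the 90/10 split)
SCV_TILT = 0.50      # half of each geography goes to small cap value

# Low risk stock weights in percent, taken from the low risk table.
low_risk_stocks = {"Total US Stock": 57.0, "Total ex-US Stock": 33.0}

def tilt_to_small_cap_value(stock_weights, tilt):
    """Split each geography's weight between total market and small cap value."""
    tilted = {}
    for name, weight in stock_weights.items():
        geography = name.removeprefix("Total ").removesuffix(" Stock")
        tilted[name] = weight * (1 - tilt)
        tilted[f"{geography} Small Cap Value"] = weight * tilt
    return tilted

high_risk = tilt_to_small_cap_value(low_risk_stocks, SCV_TILT)
high_risk["25+ STRIPS"] = BOND_WEIGHT

for asset, weight in high_risk.items():
    print(f"{asset:<24}{weight:5.1f}")
print("Total:", sum(high_risk.values()))
```

Checking that the weights sum to 100 is a cheap guard against leaving part of the portfolio unallocated.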

# Risk of Small Cap Value

Though we have used the three factor model to incorporate new sources of
risk into our high risk portfolio, it turns out that we have
incorporated assets that have higher risk even in traditional metrics of
risk such as volatility. Let us demonstrate this empirically.

We will use the Dimensional funds `DFSVX` and `DISVX` as our representative
tickers for US small cap value and ex-US small cap value, respectively.
We will compare the volatility of these funds against their respective
geography's total stock markets, represented by `VTI` and `VXUS` for US
and ex-US.

In order to capture the **total** return of these assets, we choose to
use data from [testfolio](https://testfol.io), a portfolio backtesting
tool that captures total returns of a portfolio, not just price action
as Yahoo Finance does. The data we load gives the hypothetical growth of
four portfolios, each invested 100% into one of the assets above.

In [None]:
import datetime as dt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Load the testfolio export; each column is the growth of one single-asset portfolio
us_exus = pd.read_csv("./us-exus.csv", parse_dates=["Date"], index_col="Date")
us_exus.plot(rot=70)


In [None]:
# Simple daily return: (P_t - P_{t-1}) / P_{t-1}
day_returns = us_exus / us_exus.shift(1) - 1
day_returns.describe()


The "SIM" suffix indicates that the time series is simulating returns of
the ticker in the prefix. For example, "VTISIM" represents the total
return of the total US equity market, as VTI has only been around since
2001.

With the data of daily returns in hand, we can compare volatilities over
the period from June 1996 to today.

In [None]:
# Annualized volatility: std of daily log returns scaled by sqrt(252) trading days
log_returns = np.log(1 + day_returns).dropna()
(np.sqrt(252) * log_returns).describe().loc["std"]


One can see that the volatility of US small cap value returns is higher
than that of the total stock market index, while it is slightly lower in
the ex-US geography. In this sense, adding these assets to the low risk
portfolio increases the volatility, making it higher risk in traditional
metrics. We can also visually inspect a histogram of daily returns and
see how they are distributed.

In [None]:
(100 * day_returns).plot(kind="hist", bins=120,
                         xlabel="Daily Return (%)",
                         xlim=(-3.5, 3.5),
                         ylabel="Observations",
                         subplots=True, layout=(2, 2),
                         title="Distributions of Daily Returns")


The US small cap value asset has fatter tails than its total market
counterpart in the same geography, consistent with the higher
volatility measured earlier. The additional concentration of the higher risk
portfolio in these assets would make it more risky.
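
Tail fatness can also be quantified rather than read off a histogram. A minimal sketch, assuming excess kurtosis and the share of moves beyond three standard deviations are acceptable tail measures; the two samples are synthetic stand-ins with hypothetical names, not the fund data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 5_000

# Synthetic daily returns in percent, for illustration only:
# a normal sample and a Student-t sample (heavier tails by construction).
samples = pd.DataFrame({
    "thin_tailed": rng.normal(0.0, 1.0, n_days),
    "fat_tailed": rng.standard_t(df=4, size=n_days),
})

summary = pd.DataFrame({
    "std": samples.std(),
    # pandas reports excess kurtosis: roughly 0 for a normal distribution.
    "excess_kurtosis": samples.kurt(),
    # Share of days with a move larger than 3 standard deviations.
    "share_beyond_3_std": samples.abs().gt(3 * samples.std(), axis="columns").mean(),
})
print(summary)
```

Applying the same three columns to `day_returns` would give a tail comparison that does not depend on the histogram's bin width or axis limits.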

# Bond Returns Time Series

In [None]:
def csv_to_series(csv_path):
    """Load a price CSV and append a daily simple-return column per ticker."""
    s = pd.read_csv(csv_path, parse_dates=["Date"], index_col="Date")
    # Return realized on each date: (P_t - P_{t-1}) / P_{t-1}
    returns = s / s.shift(1) - 1
    returns.columns = [ticker + " Returns" for ticker in s.columns]
    return pd.concat([s, returns], axis=1)


Let us use volatility to demonstrate why the zero coupon bonds in the
higher risk portfolio are the riskier choice.

In [None]:
govt = csv_to_series("./GOVT-returns.csv")
zroz = csv_to_series("./ZROZ-returns.csv")
# Annualized volatility of daily log returns, one column per bond fund
bond_returns = pd.concat([govt["GOVT Returns"], zroz["ZROZ Returns"]], axis=1)
(np.log(1 + bond_returns) * np.sqrt(252)).describe().loc["std"]


One sees that the daily returns of zero coupon bonds are significantly
more volatile than those of treasuries with lower duration, with data
going back to July 1969. However, this volatility is not taken on
arbitrarily. The stripping of interest payments from zero coupon bonds
increases their expected returns over interest paying ones. Since we are
interested in long term investing for both of our portfolios, we compare
the calendar year returns.

In [None]:
# Compound daily returns within each calendar year and express them in percent
yearly_zroz = ((1 + zroz["ZROZ Returns"]).groupby(pd.Grouper(freq="YE")).agg("prod") - 1) * 100
yearly_govt = ((1 + govt["GOVT Returns"]).groupby(pd.Grouper(freq="YE")).agg("prod") - 1) * 100
pd.concat([yearly_zroz, yearly_govt], axis=1).plot(
    y=["ZROZ Returns", "GOVT Returns"],
    title="Calendar year returns",
    ylabel="Return (%)",
    xlabel="Year")


Visually, the volatility difference is very stark. Historically, the
zero coupon bonds have acted as "bonds++", increasing the magnitude of
returns on intermediate-duration treasuries. The returns appear to agree
in their direction, but differ strongly in magnitude, which is
consistent with the interpretation of ultra-long duration bonds as a
leveraged bond allocation.
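
The "leveraged bond" reading follows from how a zero coupon bond is priced. A minimal sketch under stated assumptions: annual compounding, a hypothetical 4% starting yield, a one percentage point rise, and a 6-year zero standing in for an intermediate fund with ~6 years of duration. The function names are illustrative.

```python
def zero_coupon_price(yield_rate, years, face=100.0):
    """Price of a zero coupon bond with annual compounding."""
    return face / (1 + yield_rate) ** years

def price_change_pct(years, y0, y1):
    """Percent price change when the yield moves from y0 to y1."""
    p0 = zero_coupon_price(y0, years)
    p1 = zero_coupon_price(y1, years)
    return 100 * (p1 / p0 - 1)

# Hypothetical inputs: a 4% starting yield and a 1 percentage point rise.
for years in (6, 25):
    change = price_change_pct(years, 0.04, 0.05)
    print(f"{years:>2}-year zero: {change:.1f}%")
```

Under these assumptions the 25-year zero loses roughly four times as much as the 6-year one for the same yield move, which matches the same-direction, larger-magnitude pattern in the calendar year returns.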

# A Backtest

We can backtest our two portfolios to get an idea of how they would have
performed against each other, had our hypothetical investor invested in
them for the tested period.
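
The mechanics of such a backtest can be sketched on synthetic data. This is an illustration only: the column names, dates and return parameters are placeholders rather than the notebook's data. Note that a weighted sum of daily returns implicitly assumes the portfolio is rebalanced to its target weights every day.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns standing in for the real series (illustration only).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=500)
asset_returns = pd.DataFrame(
    rng.normal(0.0003, 0.01, size=(len(dates), 3)),
    index=dates,
    columns=["us_stock", "exus_stock", "treasuries"],  # hypothetical names
)

# Target weights; they should sum to 1 so that no cash is left uninvested.
weights = pd.Series({"us_stock": 0.57, "exus_stock": 0.33, "treasuries": 0.10})
assert np.isclose(weights.sum(), 1.0)

# A weighted sum of daily returns implicitly rebalances to target every day.
portfolio_returns = asset_returns.mul(weights, axis="columns").sum(axis=1)

# Growth of 1000 invested at the start of the sample.
growth = 1000 * (1 + portfolio_returns).cumprod()
```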

In [None]:
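# Assumes vti, vxus, dfsvx, disvx hold each ticker's daily returns (e.g. columns of day_returns)
p1 = 0.57*vti + 0.33*vxus + 0.10*govt["GOVT Returns"]
# Weights follow the 28/29/17/16/10 split derived in the text, so they sum to 1
p2 = 0.28*vti + 0.29*dfsvx + 0.17*vxus + 0.16*disvx + 0.10*zroz["ZROZ Returns"]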


In [None]:
both = pd.concat([p1,p2],axis = 1).set_axis(["Portfolio 1 Returns", "Portfolio 2 Returns"],axis=1)
both.describe()


As we predicted, the standard deviation of daily returns of the second
portfolio is higher, indicating higher volatility.

In [None]:
monthly_returns = ((1 + both).groupby(pd.Grouper(freq="ME")).agg("prod") - 1) * 100
monthly_returns.plot(kind="hist",
                     bins=50,
                     subplots=True,
                     title="Distribution of Monthly Returns (%)",
                     xlabel="Monthly Return (%)")


Finally, we backtest both portfolios with a starting value of \$1000
invested on 1994-12-28.

In [None]:
((1+both).cumprod()*1000).plot(title = "Growth of $1000")


The higher risk portfolio ended up with higher total returns for this
period. Obviously, past performance is not an indicator of future
results.
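
To put numbers on the comparison, a few summary statistics could be computed from the daily return columns. A minimal sketch, assuming 252 trading days per year and volatility measured on simple rather than log returns; the `summarize` helper and the synthetic `demo` frame are illustrative, not results from the backtest.

```python
import numpy as np
import pandas as pd

def summarize(daily_returns, periods_per_year=252):
    """Annualized return, annualized volatility and maximum drawdown."""
    growth = (1 + daily_returns).cumprod()
    years = len(daily_returns) / periods_per_year
    cagr = growth.iloc[-1] ** (1 / years) - 1
    volatility = daily_returns.std() * np.sqrt(periods_per_year)
    # Drawdown: percentage decline from the running peak of the growth curve.
    max_drawdown = (growth / growth.cummax() - 1).min()
    return pd.Series({"CAGR": cagr, "Volatility": volatility,
                      "Max drawdown": max_drawdown})

# Synthetic stand-ins for the two portfolios' daily returns (illustration only).
rng = np.random.default_rng(7)
demo = pd.DataFrame({
    "Portfolio 1 Returns": rng.normal(0.0003, 0.009, 2520),
    "Portfolio 2 Returns": rng.normal(0.0004, 0.011, 2520),
})
print(demo.apply(summarize))
```

Running `both.dropna().apply(summarize)` on the real backtest would give a side-by-side table of return, volatility and worst peak-to-trough loss for the two portfolios.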