# Portfolio Diversification
## Comparison of Allocation Methodologies

#### Scott Knapp
November 18, 2020

## Outline

* Executive Summary

* Background
    * Amount invested in traditional cap-weighted products
* // Obtain the data
    * // Fama-French - 49 industries from 1/1/2000 to today
        * // Obtain the files
        * // Import the data
* Alternative approaches
    * // Cap-weighted
    * Equal-weighted
        * Equal-weighted - adjusted for small sectors
    * Discuss problem with Maximum Sharpe Ratio Portfolio with a simple example
    * Global Minimum Variance
        * with sample Covariance
        * with Shrinking Covariance
        * with Min and Max constraints

* For each approach discuss:  
    * the intuitive background:
    * the math
    * display the summary stats
    * Effective number of constituents


## Executive Summary

## Background

## Setup and Imports

In [98]:
import pandas as pd
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

%matplotlib inline

## The Data

Dartmouth College, through its Fama-French Data Library, offers an extensive array of raw data and factor portfolios going back to 1926.  We will use the 49 Industry Portfolios dataset, analyzing monthly data from January 2000 to September 2020.  This period encompasses 3 full market cycles, which enables a more robust analysis of the various allocation methodologies.

The data can be downloaded here: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

We will begin by importing monthly return data for both the value-weighted and equal-weighted industry portfolios.  The portfolio type describes how the individual securities within each industry are weighted.  For example, if there are 44 firms in the `Cnstr` industry, the equal-weighted portfolio will assume an allocation of $\frac{1}{44}$, or more generally $\frac{1}{N}$, to each firm. 

Additionally, we will import the average firm size and the number of firms in each industry so that we can calculate a cap-weighted (value-weighted) index across industries.  
* (This is not to be confused with the weighting above. There are two separate choices: first, how the firms within each industry are weighted (value vs. equal); second, how we allocate across the industries (value vs. equal).)
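
To make the first level of weighting concrete, here is a minimal sketch of equal versus value weighting within a single industry. The three firms, their market caps, and their returns are made-up numbers for illustration only; they do not come from the Fama-French files.

```python
import numpy as np

# Hypothetical industry with three firms (market caps in $ millions)
firm_caps = np.array([600.0, 300.0, 100.0])
firm_rets = np.array([0.02, -0.01, 0.05])

# Equal weighting: every firm receives 1/N
ew_weights = np.full(firm_caps.size, 1 / firm_caps.size)
# Value weighting: each firm's weight is proportional to its market cap
vw_weights = firm_caps / firm_caps.sum()

ew_industry_ret = float(ew_weights @ firm_rets)
vw_industry_ret = float(vw_weights @ firm_rets)

print(f"equal-weighted industry return: {ew_industry_ret:.4f}")
print(f"value-weighted industry return: {vw_industry_ret:.4f}")
```

The same two options then apply a second time when we combine the 49 industry portfolios into one total portfolio.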

In [10]:
# importing the monthly value-weighted returns for 49 industries
m_vw_rets = pd.read_csv('data/ind49_m_vw_rets.csv', header=0, index_col=0, parse_dates=True) / 100
# convert the YYYYMM index to monthly periods, for time-series analysis
m_vw_rets.index = pd.to_datetime(m_vw_rets.index, format="%Y%m").to_period('M')
# eliminate white space in column names for easier indexing
m_vw_rets.columns = m_vw_rets.columns.str.strip()
m_vw_rets = m_vw_rets["2000":]
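
The cleaning steps above can be tried without the data files. The sketch below runs the same steps on a tiny in-memory CSV whose layout (a YYYYMM date column, padded column names, returns in percent) is an assumption modeled on the cell above; the values are invented.

```python
import io
import pandas as pd

# Hypothetical three-row extract: YYYYMM dates, padded industry names,
# and returns quoted in percent.
raw = io.StringIO(
    "Date,Agric ,Food  \n"
    "199912,1.50,-0.20\n"
    "200001,2.27,-3.91\n"
    "200002,-1.10,0.84\n"
)

# percent -> decimal
rets = pd.read_csv(raw, header=0, index_col=0) / 100
# YYYYMM integers -> monthly periods
rets.index = pd.to_datetime(rets.index.astype(str), format="%Y%m").to_period("M")
# strip the padding so columns can be selected by name
rets.columns = rets.columns.str.strip()
# keep January 2000 onward
rets = rets.loc["2000":]

print(rets)
```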

The Industries that this dataset uses are:

In [3]:
m_vw_rets.columns

Index(['Agric', 'Food', 'Soda', 'Beer', 'Smoke', 'Toys', 'Fun', 'Books',
       'Hshld', 'Clths', 'Hlth', 'MedEq', 'Drugs', 'Chems', 'Rubbr', 'Txtls',
       'BldMt', 'Cnstr', 'Steel', 'FabPr', 'Mach', 'ElcEq', 'Autos', 'Aero',
       'Ships', 'Guns', 'Gold', 'Mines', 'Coal', 'Oil', 'Util', 'Telcm',
       'PerSv', 'BusSv', 'Hardw', 'Softw', 'Chips', 'LabEq', 'Paper', 'Boxes',
       'Trans', 'Whlsl', 'Rtail', 'Meals', 'Banks', 'Insur', 'RlEst', 'Fin',
       'Other'],
      dtype='object')

In [11]:
# importing the monthly equal-weighted returns for 49 industries
m_ew_rets = pd.read_csv('data/ind49_m_ew_rets.csv', header=0, index_col=0, parse_dates=True) / 100
m_ew_rets.index = pd.to_datetime(m_ew_rets.index, format="%Y%m").to_period('M')
m_ew_rets.columns = m_ew_rets.columns.str.strip()
m_ew_rets = m_ew_rets["2000":]

In [18]:
# importing and formatting the monthly average firm size and number of firms datasets
ind_size = pd.read_csv('data/ind49_m_size.csv', header=0, index_col=0, parse_dates=True)
ind_size.index = pd.to_datetime(ind_size.index, format="%Y%m").to_period('M')
ind_size.columns = ind_size.columns.str.strip()
ind_size = ind_size["2000":]

ind_nfirms = pd.read_csv('data/ind49_m_nfirms.csv', header=0, index_col=0, parse_dates=True)
ind_nfirms.index = pd.to_datetime(ind_nfirms.index, format="%Y%m").to_period('M')
ind_nfirms.columns = ind_nfirms.columns.str.strip()
ind_nfirms = ind_nfirms["2000":]

Let's write a quick function to calculate the value-weighted portfolio's monthly returns for both the equal-weighted industries and the value-weighted industries. We then calculate both the value-weighted and equal-weighted total portfolios.

In [19]:
def value_weighted_returns(ind_returns, ind_size, ind_nfirms):
    # Calculate the market cap for each industry
    ind_mktcap = ind_size * ind_nfirms
    # Calculate the total market cap for all industries
    total_mktcap = ind_mktcap.sum(axis="columns")   
    # Calculate the weighting of each industry in the total market cap
    ind_cap_wgt = ind_mktcap.divide(total_mktcap, axis = "rows")
    # Calculate the total market return for each period
    total_market_return = (ind_cap_wgt * ind_returns).sum(axis="columns")
    
    return total_market_return
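
As a quick check of the logic, the sketch below restates the function compactly and applies it to two hypothetical industries, `A` and `B`, over two months. All of the toy numbers are assumptions chosen so the result can be verified by hand.

```python
import pandas as pd

def value_weighted_returns(ind_returns, ind_size, ind_nfirms):
    # Same steps as the function above, restated so this block runs on its own
    ind_mktcap = ind_size * ind_nfirms
    total_mktcap = ind_mktcap.sum(axis="columns")
    ind_cap_wgt = ind_mktcap.divide(total_mktcap, axis="rows")
    return (ind_cap_wgt * ind_returns).sum(axis="columns")

idx = pd.period_range("2000-01", periods=2, freq="M")
toy_size = pd.DataFrame({"A": [100.0, 100.0], "B": [50.0, 50.0]}, index=idx)
toy_nfirms = pd.DataFrame({"A": [3, 3], "B": [2, 4]}, index=idx)
toy_rets = pd.DataFrame({"A": [0.10, 0.00], "B": [-0.02, 0.05]}, index=idx)

toy_market = value_weighted_returns(toy_rets, toy_size, toy_nfirms)
print(toy_market)
```

In the first month the industry caps are 300 and 100, so the weights are 0.75 and 0.25 and the market return is 0.75 × 0.10 + 0.25 × (−0.02) = 0.07. In the second month `B` doubles its firm count, the weights become 0.6 and 0.4, and the market return is 0.02.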

In [50]:
# Calculate the value-weighted portfolio market returns for the value-weighted industries
m_vw_vw_rets = value_weighted_returns(m_vw_rets, ind_size, ind_nfirms)

# Calculate the value-weighted portfolio market returns for the equal-weighted industries
m_vw_ew_rets = value_weighted_returns(m_ew_rets, ind_size, ind_nfirms)

In [51]:
# Calculate the equal-weighted portfolio returns for the value-weighted industries
m_ew_vw_rets = m_vw_rets.mean(axis="columns")

# Calculate the equal-weighted portfolio returns for the equal-weighted industries
m_ew_ew_rets = m_ew_rets.mean(axis="columns")

In [53]:
returns = pd.DataFrame({
    "Value-Weighted - EW Port": m_vw_ew_rets,
    "Value-Weighted - VW Port": m_vw_vw_rets,
    "Equal-Weighted - EW Port": m_ew_ew_rets,
    "Equal-Weighted - VW Port": m_ew_vw_rets,
})

returns

| Month | Value-Weighted - EW Port | Value-Weighted - VW Port | Equal-Weighted - EW Port | Equal-Weighted - VW Port |
|---|---|---|---|---|
| 2000-01 | 0.070894 | -0.040130 | 0.044173 | -0.033673 |
| 2000-02 | 0.188816 | 0.019456 | 0.090876 | -0.021078 |
| 2000-03 | -0.005319 | 0.074055 | 0.022084 | 0.076998 |
| 2000-04 | -0.113548 | -0.045298 | -0.056082 | -0.003986 |
| 2000-05 | -0.070897 | -0.030830 | -0.040545 | -0.016059 |
| ... | ... | ... | ... | ... |
| 2020-05 | 0.085765 | 0.055291 | 0.069798 | 0.055661 |
| 2020-06 | 0.069212 | 0.023037 | 0.073351 | 0.018688 |
| 2020-07 | 0.057484 | 0.058037 | 0.060947 | 0.060033 |
| 2020-08 | 0.051738 | 0.077836 | 0.051535 | 0.065782 |


Next, we will create some summary statistics to compare these portfolios.
* Annualized Returns
* Annualized Standard Deviation
* Sharpe Ratio
* Drawdown
* Skewness
* Kurtosis
* Historic VaR (5%)
* Cornish-Fisher VaR
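
For reference, the statistics in this list can be written as follows. This is a sketch of the standard definitions rather than notation taken from the notebook: $r_t$ is the return in period $t$, $T$ the number of periods, $P$ the periods per year, $\mu$ and $\sigma$ the mean and standard deviation of returns, $r_f$ the per-period risk-free rate, and $z_\alpha$ the Gaussian quantile at level $\alpha$.

$$
R_{\text{ann}} = \Big(\prod_{t=1}^{T}(1+r_t)\Big)^{P/T} - 1,
\qquad
\sigma_{\text{ann}} = \sigma\sqrt{P},
\qquad
\text{Sharpe} = \frac{R_{\text{ann}}(r - r_f)}{\sigma_{\text{ann}}}
$$

$$
W_t = \prod_{s=1}^{t}(1+r_s),
\qquad
\text{MaxDD} = -\min_t \frac{W_t - \max_{s \le t} W_s}{\max_{s \le t} W_s}
$$

$$
S = \frac{E\big[(r-\mu)^3\big]}{\sigma^3},
\qquad
K = \frac{E\big[(r-\mu)^4\big]}{\sigma^4}
$$

$$
\tilde{z}_\alpha = z_\alpha + (z_\alpha^2 - 1)\frac{S}{6} + (z_\alpha^3 - 3z_\alpha)\frac{K-3}{24} - (2z_\alpha^3 - 5z_\alpha)\frac{S^2}{36},
\qquad
\text{VaR}_{\text{CF}} = -(\mu + \tilde{z}_\alpha\,\sigma)
$$

The historic VaR is simply the negative of the empirical $\alpha$-percentile of the returns. Note that $K$ is raw kurtosis, so a Gaussian series has $K = 3$.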

In [103]:
def annualize_rets(returns, periods_per_year=12):
    # compound each period's return at 1+r
    compounded_growth = (1+returns).prod()
    # calculate the number of periods in returns
    n_periods = returns.shape[0]
    
    return compounded_growth ** (periods_per_year / n_periods) - 1

def annualize_stdev(returns, periods_per_year=12):
    return returns.std() * np.sqrt(periods_per_year)

def sharpe_ratio(returns, risk_free_rate=0, periods_per_year=12):
    rf_per_period = (1+risk_free_rate) ** (1/periods_per_year) - 1
    excess_ret = returns - rf_per_period
    ann_ex_ret = annualize_rets(excess_ret, periods_per_year)
    ann_sd = annualize_stdev(returns, periods_per_year)
    
    return ann_ex_ret / ann_sd

def max_drawdown(returns):
    # calculate the accumulated growth at each period
    compounded_growth = (1+returns).cumprod()
    # calculate the previous peak value at each period
    previous_peaks = compounded_growth.cummax()
    # calculate the drawdowns at each period
    drawdowns = (compounded_growth - previous_peaks) / previous_peaks
    
    return -drawdowns.min()

def skewness(returns):
    # calculate each period's return difference from the average return
    demeaned_r = returns - returns.mean()
    # calculate the standard deviation of the portfolio
    sigma_r = returns.std(ddof=0)  # using ddof=0, to calculate population standard deviation
    # calculate the numerator in the equation
    exp = (demeaned_r**3).mean()
    
    return exp / sigma_r**3

def kurtosis(returns):
    # calculate each period's return difference from the average return
    demeaned_r = returns - returns.mean()
    # calculate the standard deviation of the portfolio
    sigma_r = returns.std(ddof=0)  # using ddof=0, to calculate population standard deviation
    # calculate the numerator in the equation
    exp = (demeaned_r**4).mean()
    
    return exp / sigma_r**4

def var_historic(returns, level=5):
    return -np.percentile(returns, level)

def var_cornish_fisher(returns, level=5):
    # z-score of the lower-tail quantile under a Gaussian assumption
    z = norm.ppf(level/100)
    
    s = skewness(returns)
    k = kurtosis(returns)
    # Cornish-Fisher adjustment of z for skewness and excess kurtosis
    z = (z +
         (z**2 - 1) * s/6 +
         (z**3 - 3*z) * (k-3)/24 -
         (2*z**3 - 5*z) * (s**2)/36
        )
    return -(returns.mean() + z * returns.std(ddof=0))

def summary_stats(returns, periods_per_year=12, risk_free_rate=0.02):
    summary_df = pd.DataFrame({
        "Annualized Return": returns.aggregate(annualize_rets, periods_per_year=periods_per_year),
        "Annualized Vol": returns.aggregate(annualize_stdev, periods_per_year=periods_per_year),
        "Sharpe Ratio": returns.aggregate(sharpe_ratio, risk_free_rate=risk_free_rate, 
                                              periods_per_year=periods_per_year),
        "Max Drawdown": returns.aggregate(max_drawdown),
        "Skewness": returns.aggregate(skewness),
        "Kurtosis": returns.aggregate(kurtosis),
        "Historic 5% VaR": returns.aggregate(var_historic),
        "CF 5% VaR": returns.aggregate(var_cornish_fisher)
    })
    return summary_df
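
Before applying these functions to the real portfolios, it can help to confirm them on series small enough to check by hand. The sketch below restates two of the functions so it runs on its own; the `flat` and `crash` series are hypothetical inputs, not data from this study.

```python
import pandas as pd

def annualize_rets(returns, periods_per_year=12):
    compounded_growth = (1 + returns).prod()
    n_periods = returns.shape[0]
    return compounded_growth ** (periods_per_year / n_periods) - 1

def max_drawdown(returns):
    compounded_growth = (1 + returns).cumprod()
    previous_peaks = compounded_growth.cummax()
    drawdowns = (compounded_growth - previous_peaks) / previous_peaks
    return -drawdowns.min()

# Twelve months of a steady 1% return: annualized return is 1.01**12 - 1
flat = pd.Series([0.01] * 12)
# Wealth path 1.10 -> 0.55 -> 0.66: the trough is 50% below the 1.10 peak
crash = pd.Series([0.10, -0.50, 0.20])

ann_flat = annualize_rets(flat)
mdd_crash = max_drawdown(crash)

print(f"annualized return of flat series: {ann_flat:.6f}")
print(f"max drawdown of crash series:     {mdd_crash:.2f}")
```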

In [104]:
summary_stats(returns)

| Portfolio | Annualized Return | Annualized Vol | Sharpe Ratio | Max Drawdown | Skewness | Kurtosis | Historic 5% VaR | CF 5% VaR |
|---|---|---|---|---|---|---|---|---|
| Value-Weighted - EW Port | 0.091434 | 0.216458 | 0.323901 | 0.58352 | -0.067811 | 4.919207 | 0.093561 | 0.092089 |
| Value-Weighted - VW Port | 0.066911 | 0.154981 | 0.297115 | 0.500088 | -0.568248 | 3.966089 | 0.078383 | 0.07309 |
| Equal-Weighted - EW Port | 0.10164 | 0.207453 | 0.386287 | 0.598352 | -0.271297 | 5.841772 | 0.087737 | 0.089501 |
| Equal-Weighted - VW Port | 0.088844 | 0.165339 | 0.408761 | 0.528351 | -0.642084 | 5.549245 | 0.079556 | 0.075951 |
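
All four portfolios show negative skewness and kurtosis above 3, which is why the Cornish-Fisher VaR differs from a plain Gaussian VaR. As a rough illustration, the sketch below isolates the quantile adjustment in a helper named `cornish_fisher_z` (a hypothetical name introduced here, not a function from the notebook) and feeds it the skewness and kurtosis reported for the "Value-Weighted - VW Port" row.

```python
from scipy.stats import norm

def cornish_fisher_z(z, s, k):
    # Adjust a Gaussian quantile z for skewness s and raw kurtosis k
    return (z
            + (z**2 - 1) * s / 6
            + (z**3 - 3*z) * (k - 3) / 24
            - (2*z**3 - 5*z) * (s**2) / 36)

z = norm.ppf(0.05)  # Gaussian 5% quantile

# With zero skew and kurtosis of 3 the adjustment vanishes
z_gauss = cornish_fisher_z(z, s=0.0, k=3.0)
# Skewness and kurtosis taken from the summary table above
z_vw_vw = cornish_fisher_z(z, s=-0.568248, k=3.966089)

print(f"Gaussian z:          {z:.4f}")
print(f"adjusted (Gaussian): {z_gauss:.4f}")
print(f"adjusted (VW - VW):  {z_vw_vw:.4f}")
```

The adjusted quantile for the skewed, fat-tailed series sits further into the left tail than the Gaussian one, so the resulting VaR estimate is larger than a Gaussian VaR computed from the same mean and volatility.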
