# Hierarchically clustered risk parity optimization

For large portfolios containing many correlated assets, standard optimization techniques fail both numerically and conceptually. Numerically, the covariance matrix of highly correlated assets is close to singular, so any step that relies on inverting it becomes unstable. Conceptually, these techniques implicitly treat every asset as a potential substitute for every other, which is simply incorrect for a general (large) portfolio. Clustering the portfolio correlations and optimizing in a way that respects the implied hierarchy overcomes this problem.
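
The numerical side of this can be illustrated with a toy example that is not part of the `rpp` package. The sketch below assumes a hypothetical portfolio of 50 assets that all share the same pairwise correlation `rho` (the helper `equicorrelation` is introduced here purely for illustration) and prints the condition number of the correlation matrix as `rho` approaches 1.

```python
import numpy as np

def equicorrelation(n, rho):
    """Correlation matrix with 1 on the diagonal and rho everywhere else."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

n = 50  # hypothetical portfolio size
rhos = (0.1, 0.5, 0.9, 0.99)

# Ratio of largest to smallest eigenvalue; large values mean that small
# estimation errors are strongly amplified when the matrix is inverted.
condition_numbers = {rho: np.linalg.cond(equicorrelation(n, rho)) for rho in rhos}

for rho, cond in condition_numbers.items():
    print(f"rho = {rho:4.2f}  condition number = {cond:10.1f}")
```

For this particular matrix the condition number has the closed form (1 + (n - 1) rho) / (1 - rho), so it grows without bound as the assets become more alike. Real correlation matrices are not this regular, but the same effect appears whenever a block of assets moves almost in lockstep.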

In [13]:
import numpy as np
import scipy.cluster.hierarchy as sch
from rpp.portfolio import Portfolio
from rpp.plot_utils import *
import pandas as pd
import plotly.io as pio

pio.renderers.default='jupyterlab'
pio.renderers

Renderers configuration
-----------------------
    Default renderer: 'jupyterlab'
    Available renderers:
        ['plotly_mimetype', 'jupyterlab', 'nteract', 'vscode',
         'notebook', 'notebook_connected', 'kaggle', 'azure', 'colab',
         'cocalc', 'databricks', 'json', 'png', 'jpeg', 'jpg', 'svg',
         'pdf', 'browser', 'firefox', 'chrome', 'chromium', 'iframe',
         'iframe_connected', 'sphinx_gallery', 'sphinx_gallery_png']

Let's read in a list of the largest companies by market cap, keep the US equities, and build a portfolio out of the top 50:

In [2]:
market_cap = pd.read_csv("market_cap_list.csv")
us_companies = market_cap.loc[market_cap.country=="United States"]
Pf = Portfolio(*us_companies.Symbol.iloc[:50], period="2y")
us_companies.head(15)

| Unnamed: 0 | Rank | Name | Symbol | marketcap | price (USD) | country |
|---|---|---|---|---|---|---|
| 0 | 1 | Apple | AAPL | 2652870000000.0 | 162.41 | United States |
| 1 | 2 | Microsoft | MSFT | 2222590000000.0 | 296.03 | United States |
| 3 | 4 | Alphabet (Google) | GOOG | 1725670000000.0 | 2601.84 | United States |
| 4 | 5 | Amazon | AMZN | 1446820000000.0 | 2852.86 | United States |
| 5 | 6 | Tesla | TSLA | 947921000000.0 | 943.9 | United States |
| 6 | 7 | Meta (Facebook) | FB | 843346000000.0 | 303.17 | United States |
| 7 | 8 | Berkshire Hathaway | BRK-A | 682187000000.0 | 458675.0 | United States |
| 9 | 10 | NVIDIA | NVDA | 582480000000.0 | 233.74 | United States |
| 11 | 12 | Visa | V | 447653000000.0 | 205.93 | United States |
| 12 | 13 | UnitedHealth | UNH | 434353000000.0 | 461.17 | United States |


In [3]:
# Save the 50 ticker symbols, one per line, for later reuse
symbols_text = us_companies.Symbol.iloc[:50].to_string(header=False, index=False)
with open('top50.txt', 'w') as f:
    f.write(symbols_text)

Let's visually inspect the correlation matrices with and without clustering. The tree plot of the hierarchical clusters is in line with our intuitive understanding of how these equities are grouped: by sector, value vs. growth, etc.

In [14]:
corr = Pf.correlation
f1 = heat_map(corr, cluster=False, title="Raw correlations")
f2 = heat_map(corr, cluster=True, title="Clustered correlations")
f3 = tree_plot(corr, width=1000, color_threshold=1.5)
# Show each figure in turn; a bare tuple of .show() calls would echo (None, None, None)
for fig in (f1, f2, f3):
    fig.show()
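
What "clustered correlations" means can be sketched independently of `heat_map`, whose internals are not shown here. The example below is an assumption-laden illustration, not the `rpp` implementation: it converts correlations to distances with sqrt((1 - corr) / 2), uses average linkage from SciPy, and reorders a toy correlation matrix so that assets in the same cluster end up next to each other. The function name `cluster_order` and the toy tickers are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform

def cluster_order(corr, method="average"):
    """Return the labels of `corr`, reordered so clustered assets are adjacent."""
    # Map correlations in [-1, 1] to distances in [0, 1] (assumed metric)
    dist = np.sqrt(0.5 * np.clip(1.0 - corr.to_numpy(), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangle) form of the distance matrix
    link = sch.linkage(squareform(dist, checks=False), method=method)
    return corr.index[sch.leaves_list(link)]

# Toy returns: two groups of three assets, each driven by its own factor,
# deliberately interleaved so the raw matrix shows no block structure.
rng = np.random.default_rng(0)
factors = rng.standard_normal((500, 2))
noise = 0.3 * rng.standard_normal((500, 6))
labels = ["A1", "B1", "A2", "B2", "A3", "B3"]
returns = pd.DataFrame(
    {name: factors[:, 0 if name.startswith("A") else 1] + noise[:, k]
     for k, name in enumerate(labels)}
)

toy_corr = returns.corr()
order = cluster_order(toy_corr)
clustered = toy_corr.loc[order, order]  # same values, block-diagonal layout
print(list(order))
```

After reordering, the three `A` assets and the three `B` assets sit in contiguous blocks, which is the pattern the clustered heat map makes visible for the real portfolio.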

In [15]:
# Run the optimization with hierarchical clustering enabled
Pf.rebalance = 15
Pf.cluster = True
Pf.gamma = 0.6
Pf.optimize()
Pf.plot_perf()

In [33]:
# Repeat without clustering, for comparison
Pf.cluster = False
Pf.optimize()
Pf.plot_perf()

In [37]:
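# NOTE: `linkage` is not imported explicitly above; it is assumed to come from
# the star import of rpp.plot_utils.
cluster_tree = sch.to_tree(linkage(Pf.correlation))
# Order the assets so that members of the same cluster sit next to each other
sorted_idx = Pf.correlation.iloc[cluster_tree.pre_order()].index

def recursive_optimize(cov, sorted_idx):
    """Recursively bisect the cluster-ordered assets and split the weight
    between the two halves in inverse proportion to their optimized variances."""
    w = pd.Series(1.0, index=sorted_idx)  # float dtype so in-place scaling works
    clusters = [sorted_idx]
    while len(clusters) > 0:
        # Halve every cluster that still contains more than one asset
        clusters = [i[j:k] for i in clusters
                    for j, k in ((0, len(i) // 2), (len(i) // 2, len(i)))
                    if len(i) > 1]
        for i in range(0, len(clusters), 2):
            leaf0, leaf1 = clusters[i], clusters[i+1]
            cov0, cov1 = cov.loc[leaf0, leaf0], cov.loc[leaf1, leaf1]
            clv = []  # variance of each half under its own optimized weights
            for cov_ in [cov0, cov1]:
                n = len(cov_)
                w_ = Pf.optimizer(Pf.metric, n,
                                  (np.zeros(n), cov_, np.full(n, 1. / n), Pf.gamma),
                                  (0, 1), Pf.constraint)[0]
                clv.append(w_ @ cov_ @ w_.T)
            a = 1 - clv[0] / (clv[1] + clv[0])  # share assigned to the first half
            w[leaf0] *= a
            w[leaf1] *= 1 - a
    return w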

w_optim = recursive_optimize(Pf.covariance, sorted_idx)
w_optim

TSLA     0.007763
AMZN     0.024625
NFLX     0.013990
ABBV     0.047004
MRK      0.032964
PFE      0.026045
LLY      0.025134
ABT      0.020381
TMO      0.020206
DHR      0.039127
WMT      0.032155
COST     0.033136
ORCL     0.026458
UPS      0.019515
AVGO     0.012387
TXN      0.022900
MSFT     0.013472
GOOG     0.015753
NVDA     0.011118
ADBE     0.010839
CRM      0.009770
FB       0.019658
PYPL     0.009370
INTC     0.009941
QCOM     0.016765
BRK-A    0.025478
MS       0.008522
JPM      0.014148
BAC      0.006321
WFC      0.005235
SCHW     0.014590
XOM      0.007881
CVX      0.006608
VZ       0.056806
KO       0.018657
T        0.016128
UNH      0.025988
PEP      0.027149
CSCO     0.019647
NEE      0.031130
JNJ      0.035666
PG       0.031548
V        0.025597
MA       0.012168
HD       0.015994
MCD      0.023021
NKE      0.016829
DIS      0.013872
CMCSA    0.020542
dtype: float64
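
The weight-splitting logic of `recursive_optimize` can be checked by hand on a small case. The sketch below is a simplified stand-in, not the notebook's method: it replaces the call to `Pf.optimizer` with plain inverse-variance weights inside each half (an assumption made so the example runs without `rpp`), and the names `inverse_variance_weights`, `cluster_variance` and `bisect_weights` are hypothetical.

```python
import numpy as np
import pandas as pd

def inverse_variance_weights(cov):
    """Weights proportional to 1/variance (assumed stand-in for Pf.optimizer)."""
    ivp = 1.0 / np.diag(cov.to_numpy())
    return ivp / ivp.sum()

def cluster_variance(cov, labels):
    """Variance of the inverse-variance portfolio restricted to `labels`."""
    sub = cov.loc[labels, labels]
    w = inverse_variance_weights(sub)
    return float(w @ sub.to_numpy() @ w)

def bisect_weights(cov, sorted_labels):
    """Halve the ordered labels repeatedly; each half receives a share that is
    inversely proportional to its cluster variance."""
    w = pd.Series(1.0, index=sorted_labels)
    clusters = [list(sorted_labels)]
    while clusters:
        # Keep splitting until every cluster holds a single asset
        clusters = [c[j:k] for c in clusters if len(c) > 1
                    for j, k in ((0, len(c) // 2), (len(c) // 2, len(c)))]
        for left, right in zip(clusters[::2], clusters[1::2]):
            var_l = cluster_variance(cov, left)
            var_r = cluster_variance(cov, right)
            alpha = 1.0 - var_l / (var_l + var_r)
            w.loc[left] *= alpha
            w.loc[right] *= 1.0 - alpha
    return w

# Toy covariance: two low-variance and two high-variance uncorrelated assets
names = ["a", "b", "c", "d"]
toy_cov = pd.DataFrame(np.diag([1.0, 1.0, 4.0, 4.0]), index=names, columns=names)
toy_weights = bisect_weights(toy_cov, names)
print(toy_weights)
```

In this toy case the first split gives the low-variance pair 80% of the budget (cluster variances 0.5 and 2), and the second split divides each pair evenly, so the weights are 0.4, 0.4, 0.1, 0.1 and sum to one. With a diagonal covariance this coincides with plain inverse-variance weighting; the hierarchy only starts to matter once correlations are present.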