# Playing with the min-variance portfolio

We will compute the min-variance portfolio with a gradient-descent algorithm and compare the result to the closed-form solution.

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf


First, we need some data. We will download cryptoasset prices using the public API of Cryptocompare.

In [2]:
import json
import requests
## cryptocompare endpoint for historical daily data
## Documentation: https://min-api.cryptocompare.com/documentation?key=Historical&cat=dataHistoday
url = "https://min-api.cryptocompare.com/data/histoday?fsym={}&tsym=USD&limit={:d}"

## coin names and number of days we want
coins = ["BTC", "BCH", "ETH", "LTC", "XRP"]
n_coins = len(coins)
n_day = 365*2

## Get all time series
dict_df = {}
for coin in coins:
    ## get the data for coin
    res = requests.get(url.format(coin, n_day))
    
    ## reformat and get a dataframe indexed by time
    df = pd.DataFrame(json.loads(res.content)['Data'])\
        .assign(time = lambda x: pd.to_datetime(x.time, unit='s'))\
        .assign(logret = lambda x: np.append(np.nan, np.diff(np.log(x.close))))\
        .set_index('time')
        
    ## you can query the dataframe by date as follows
    # df[df.index.year >= 2019] ## data from 2019
    # df[df.index.day == 15] ## 15-th of the month only
    
    ## save it in the dict
    dict_df[coin] = df

At the moment, there are no train and test sets, so we compute the covariance matrix once on the full sample and treat it as a constant.

In [3]:
log_ret = np.hstack([dict_df[coin].logret.values[1:].reshape(-1,1) for coin in coins])
cov_mat = np.cov(log_ret, rowvar=False)
#print('Covariance matrix:\n', np.round(cov_mat, 4))
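To make the two steps above concrete without calling the API, here is a minimal sketch on synthetic prices. The names `true_vol`, `shocks`, `prices`, `demo_log_ret`, and `demo_cov` are hypothetical, and the numbers are simulated rather than market data.

```python
import numpy as np

# Synthetic daily closing prices for 3 hypothetical assets (simulated, not market data).
rng = np.random.default_rng(0)
n_obs, n_assets = 500, 3
true_vol = np.array([0.02, 0.03, 0.05])  # assumed daily volatilities
shocks = rng.normal(0.0, true_vol, size=(n_obs, n_assets))
prices = 100.0 * np.exp(np.cumsum(shocks, axis=0))

# Log returns: r_t = log(P_t) - log(P_{t-1}); one row is lost to differencing.
demo_log_ret = np.diff(np.log(prices), axis=0)

# rowvar=False: columns are variables (assets), rows are observations (days).
demo_cov = np.cov(demo_log_ret, rowvar=False)

print(demo_log_ret.shape)  # (499, 3)
print(demo_cov.shape)      # (3, 3)
```

The notebook obtains the same alignment by putting a `NaN` in the first row of `logret` and dropping it with `[1:]`.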

## Closed-form min-variance portfolio

With $\Sigma$ the covariance matrix of the returns and ${\bf 1}$ a vector of ones, the variance of the min-variance portfolio is
$$\sigma^2_{\rm MV} = 1 / ({\bf 1}^\top \Sigma^{-1} {\bf 1})$$
and the allocation weights are
$$w_{\rm MV} = (\Sigma^{-1} {\bf 1}) \sigma^2_{\rm MV}$$
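
For reference, both formulas follow from a standard Lagrangian argument, sketched here under the assumption that $\Sigma$ is symmetric positive definite. The multiplier $\lambda$ is notation introduced only for this sketch. The problem is
$$\min_w \; w^\top \Sigma w \quad \text{subject to} \quad {\bf 1}^\top w = 1$$
with Lagrangian
$$\mathcal{L}(w, \lambda) = w^\top \Sigma w - \lambda ({\bf 1}^\top w - 1)$$
Setting the gradient with respect to $w$ to zero gives
$$2 \Sigma w = \lambda {\bf 1} \quad \Rightarrow \quad w = \tfrac{\lambda}{2} \Sigma^{-1} {\bf 1}$$
and the constraint ${\bf 1}^\top w = 1$ fixes the multiplier:
$$\tfrac{\lambda}{2} = 1 / ({\bf 1}^\top \Sigma^{-1} {\bf 1}) = \sigma^2_{\rm MV}$$
Substituting back yields $w_{\rm MV} = (\Sigma^{-1} {\bf 1}) \sigma^2_{\rm MV}$, and the portfolio variance is
$$w_{\rm MV}^\top \Sigma w_{\rm MV} = (\sigma^2_{\rm MV})^2 \, {\bf 1}^\top \Sigma^{-1} {\bf 1} = \sigma^2_{\rm MV}$$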

In [4]:
ones = np.ones((n_coins, 1))
inv_covmat = np.linalg.inv(cov_mat)
var_mv = (1 / (ones.T @ inv_covmat @ ones)).squeeze()
wgt_mv = (inv_covmat @ ones) * var_mv

print('Min-Var weights: ', np.round(100*(wgt_mv.T), 2))
print('Min-Var volatility: {:.2f}%'.format(100*var_mv**0.5))

Min-Var weights:  [[83.26 -4.44 19.5  -7.75  9.43]]
Min-Var volatility: 4.29%
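
As a sanity check of the closed-form formulas, the sketch below applies them to a small hypothetical covariance matrix and verifies three properties: the weights sum to one, the portfolio variance matches $\sigma^2_{\rm MV}$, and the variance does not exceed that of the equal-weighted portfolio. The matrix `demo_cov` is random and purely illustrative. `np.linalg.solve` is used here in place of an explicit inverse, which is a design choice of this sketch and not what the cell above does.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix: A A^T / 100 + eps * I is positive definite.
rng = np.random.default_rng(1)
a = rng.normal(size=(3, 3))
demo_cov = a @ a.T / 100.0 + 1e-4 * np.eye(3)

ones = np.ones(3)
# Solve Sigma x = 1 rather than forming the inverse explicitly.
x = np.linalg.solve(demo_cov, ones)
demo_var_mv = 1.0 / (ones @ x)   # sigma^2_MV = 1 / (1' Sigma^{-1} 1)
demo_wgt_mv = x * demo_var_mv    # w_MV = Sigma^{-1} 1 * sigma^2_MV

# Equal-weighted portfolio for comparison.
w_eq = np.full(3, 1.0 / 3.0)
var_eq = w_eq @ demo_cov @ w_eq

print("weights sum:", demo_wgt_mv.sum())
print("min-var variance:", demo_var_mv)
print("equal-weight variance:", var_eq)
```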


## Gradient-based approach

First set the hyper-parameters.

In [5]:
## Hyper-parameters
learning_rate = 1.0
n_steps = 5000
pen_wgt = 1.0

Now we build the model (i.e. the loss function), define the optimizer, initialize the variables, and train the model.

**Question:** what do you think will happen?

In [6]:
## build model
tf.reset_default_graph()
tsr_cov = tf.constant(cov_mat, name="cov_mat")
weights_init = np.full((n_coins,1), 1/n_coins)
tsr_wgt = tf.Variable(initial_value=weights_init, name="weights")
tsr_var = tf.matmul(tf.transpose(tsr_wgt), tf.matmul(tsr_cov, tsr_wgt), name="var")
tsr_pen_wgt = tf.identity(tf.abs(tf.reduce_sum(tsr_wgt) - 1.0), name="sum_err")
tsr_loss = tsr_var + pen_wgt * tsr_pen_wgt

## optimizer operation
onestep_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(tsr_loss)

## initialize variables
init = tf.global_variables_initializer()

## save some elements for investigation
di = {'epoch' : np.zeros(n_steps),
      'loss' : np.zeros(n_steps), 
      'vol' : np.zeros(n_steps), 
      'wgt_sum' : np.zeros(n_steps)}  

## run
with tf.Session() as sess:
    sess.run(init)
    for i in range(n_steps):
        sess.run(onestep_op)
        di['epoch'][i] = i
        di['loss'][i] = tsr_loss.eval(session=sess).squeeze()
        di['vol'][i] = (tsr_var.eval(session=sess)**0.5 * 100).squeeze()
        di['wgt_sum'][i] = np.sum(tsr_wgt.eval(session=sess))
        if i % 1000 == 0:
            print("[Step {:d}]".format(i))
            print("Weights: ", np.round(100*(tsr_wgt.eval(session=sess).T), 2))
            print("Loss: ", di['loss'][i])
            print("Volatility: {:.2f}%".format(di['vol'][i]))
            print("Weights sum: {:.4f}%".format(100*di['wgt_sum'][i]))
            print("-" * 5)

    ## save final weights
    wgt_gd = tsr_wgt.eval(session=sess)

    ## save as DF
    #print(pd.DataFrame(wgt_gd * 100, index = coins, columns = ["weight_pct"]))

[Step 0]
Weights:  [[19.65 19.39 19.52 19.48 19.47]]
Loss:  0.027279116765012348
Volatility: 4.86%
Weights sum: 97.5086%
-----
[Step 1000]
Weights:  [[-46.12 -51.31 -50.55 -52.78 -50.77]]
Loss:  3.5313886384929045
Volatility: 12.64%
Weights sum: -251.5422%
-----
[Step 2000]
Weights:  [[-49.49 -50.84 -50.67 -51.17 -50.65]]
Loss:  3.544162188079894
Volatility: 12.64%
Weights sum: -252.8187%
-----
[Step 3000]
Weights:  [[-50.23 -50.78 -50.64 -50.77 -50.66]]
Loss:  3.546778265530467
Volatility: 12.64%
Weights sum: -253.0801%
-----
[Step 4000]
Weights:  [[-50.39 -50.77 -50.63 -50.68 -50.67]]
Loss:  3.5473570855993177
Volatility: 12.64%
Weights sum: -253.1379%
-----
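
One possible reading of this output, offered as a sketch and not as the notebook's own explanation: the gradient of the absolute-value penalty is +1 or -1 for every weight, however far the sum is from 1. With `learning_rate = 1.0`, each step therefore moves every weight by about 1, and the sum overshoots the target in alternating directions. The NumPy snippet below replays the same update rule on a hypothetical covariance matrix. The names `vols`, `corr`, and `demo_cov` hold made-up values, not the CryptoCompare data, and `np.sign` stands in for the subgradient of the absolute value.

```python
import numpy as np

# Hypothetical covariance for 5 assets: assumed daily vols, 0.5 pairwise correlation.
vols = np.array([0.04, 0.06, 0.07, 0.08, 0.09])
corr = np.full((5, 5), 0.5) + 0.5 * np.eye(5)
demo_cov = np.outer(vols, vols) * corr

lr, pen, n_iter = 1.0, 1.0, 2000  # same learning rate and penalty weight as the notebook
w = np.full(5, 0.2)               # start from equal weights
sums = np.zeros(n_iter)

for t in range(n_iter):
    # Gradient of w' Sigma w + pen * |sum(w) - 1|; sign() is the subgradient of |.|
    grad = 2.0 * demo_cov @ w + pen * np.sign(w.sum() - 1.0)
    w = w - lr * grad
    sums[t] = w.sum()

print("last two weight sums:", sums[-2:])
```

In this sketch the weight sum keeps flipping sign from one step to the next and never settles at 1. That matches the pattern in the notebook output, where the weights printed at step 4000 and the final weights have opposite signs.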


Compare the weights and volatilities of the two approaches.

In [7]:
var_gd = (wgt_gd.T @ cov_mat @ wgt_gd).squeeze()

print('Min-Var weights:\t', np.round(100*(wgt_mv.T), 2))
print('Min-Var weights TF:\t', np.round(100*(wgt_gd.T), 2))
print('Min-Var volatility:\t{:.2f}%'.format(100*var_mv**0.5))
print('Min-Var volatility TF:\t{:.2f}%'.format(100*var_gd**0.5))

Min-Var weights:	 [[83.26 -4.44 19.5  -7.75  9.43]]
Min-Var weights TF:	 [[50.45 50.77 50.61 50.65 50.68]]
Min-Var volatility:	4.29%
Min-Var volatility TF:	12.64%
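
As one possible alternative to the penalty term (a swapped-in technique, not the approach used in this notebook), the sum-to-one constraint can be enforced by projecting the weights back onto the constraint after every gradient step. The sketch below does this in plain NumPy on a hypothetical covariance matrix and compares the result with the closed-form weights. The names `demo_cov`, `step`, and `w_closed` are illustrative, and the step size is an assumption chosen from the largest eigenvalue of the matrix.

```python
import numpy as np

# Hypothetical covariance for 5 assets: assumed daily vols, 0.5 pairwise correlation.
vols = np.array([0.04, 0.06, 0.07, 0.08, 0.09])
corr = np.full((5, 5), 0.5) + 0.5 * np.eye(5)
demo_cov = np.outer(vols, vols) * corr
n = len(vols)

# Closed-form reference: Sigma^{-1} 1, normalised to sum to one.
x = np.linalg.solve(demo_cov, np.ones(n))
w_closed = x / x.sum()

# Step size 1 / (2 * lambda_max), i.e. the inverse of the largest Hessian eigenvalue.
step = 0.5 / np.linalg.eigvalsh(demo_cov).max()

w = np.full(n, 1.0 / n)  # start from equal weights
for _ in range(5000):
    w = w - step * 2.0 * (demo_cov @ w)  # gradient step on w' Sigma w
    w = w - (w.sum() - 1.0) / n          # Euclidean projection onto {w : sum(w) = 1}

print("projected GD weights:", np.round(100 * w, 2))
print("closed-form weights: ", np.round(100 * w_closed, 2))
```

The projection subtracts the same amount from every weight, so it only corrects the sum and leaves the relative gradient information untouched.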


Plot the metrics during training.

In [8]:
from plotnine import ggplot, aes, geom_line

df = pd.DataFrame(di)

print(ggplot(df, aes(x='epoch', y='loss')) + geom_line())
print(ggplot(df, aes(x='epoch', y='vol')) + geom_line())
print(ggplot(df, aes(x='epoch', y='wgt_sum')) + geom_line())

ModuleNotFoundError: No module named 'plotnine'

## Exercises:

Evaluate:
* split the data into train-test sets.
* compare the in- and out-of-sample performance of the min-variance portfolio.
* compare with the min-variance portfolio from the test set (best ex-post).

Adjust portfolio:
* construct a long-only portfolio (non-negative weights).
* construct a sparse allocation portfolio (e.g. only 3 coins).
* construct a portfolio with a maximum absolute weight distance (e.g. 50%) with respect to the equiweighted portfolio.
* construct the mean-variance portfolio (e.g. given a target return level).

Improve the algorithm (open subject):
* compute and display the covariance matrix over time (e.g. rolling window).
* modify the methodology to (try to) reduce the out-of-sample variance.
