# Machine Learning used for Asset Allocation: Multi-task Lasso

**Hugh Donnelly, CFA**<br> 
*AlphaWave Data*

**September 2021**

## Introduction

In this article, we use [machine learning](https://hdonnelly6.medium.com/list/machine-learning-for-investing-7f2690bb1826) to make future returns predictions for equity and fixed income ETFs so that we can create optimized Equity Only, Fixed Income Only, and 60/40 Allocation portfolios versus respective benchmarks.  We show that we are able to outperform the benchmarks using the Multi-task Lasso model.

Jupyter Notebooks are available on [Google Colab](https://colab.research.google.com/drive/1HraWoI6I6dHc7YEZuAOgZHMzNgwN8OMZ?usp=sharing) and [Github](https://github.com/AlphaWaveData/Jupyter-Notebooks/blob/master/AlphaWave%20Data%20Machine%20Learning%20used%20for%20Asset%20Allocation%20example.ipynb).

For this project, we use several Python-based scientific computing technologies listed below.

In [1]:
import time
import requests
import numpy as np
import pandas as pd
from tqdm import tqdm
from itertools import product
import plotly.graph_objects as go
from IPython.display import display
from datetime import datetime, timedelta

from scipy import stats
from sklearn.linear_model import MultiTaskLasso
from sklearn.utils.testing import ignore_warnings
from sklearn.exceptions import ConvergenceWarning

## Asset Allocation

Let's start with a quick overview of asset allocation. Asset owners are concerned with accumulating and maintaining the wealth needed to meet their needs and aspirations. In that endeavor, investment portfolios—including individuals’ portfolios and institutional funds—play important roles. Asset allocation is a strategic—and often a first or early—decision in portfolio construction. Because it holds that position, it is widely accepted as important and meriting careful attention.

Generally, investment firms manage a group of portfolios and have particular outcomes or target dates assigned to each of these portfolios. To make sure these portfolios meet their assigned goals, there can be a strategic asset allocation associated with them. The strategic asset allocation decision determines return levels in which allocations are invested, irrespective of the degree of active management.

A common example of a portfolio with a strategic asset allocation is a portfolio with defined weightings for equity and fixed income asset classes.  The equity weighting may be allocated between U.S. equities and global equities while the fixed income weighting may be divided between Treasuries, corporate bonds, high-yield credit, and emerging market debt.  These weightings are usually quite fixed and are based on a long time horizon of historical returns and correlations. There may be other asset classes also considered in a portfolio with a strategic asset allocation, like commodities and derivatives. The portfolios are typically rebalanced periodically in order to maintain the same asset class exposure going forward in time.

However, we live in a world that changes quickly. As a result, some investment firms often employ a tactical asset allocation. This provides the portfolio with short-term tilts. There can be dynamic weightings, or put differently, migrations slightly away from the strategic asset allocation. The portfolio that uses a tactical asset allocation may focus on allocations within sub-asset classes. If you are working within the fixed income asset class, you may move your allocation away from Treasuries and short-term notes and towards a high-yield or emerging market credit allocation. The belief is that the portfolio is better able to outperform the market with these new asset allocations.

To employ a tactical asset allocation, there are a few different approaches that can be taken. One of the simplest approaches is a discretionary one whereby portfolio managers and chief investment officers overweight or underweight particular assets within the portfolio based on their view of the business cycle. They often move within certain risk boundaries when changing portfolio weightings.

Another tactical asset allocation technique used is a momentum, trend following approach. With this, you look to capitalize on an asset class outperforming its peers in the next couple of rebalancing periods.

The Black-Litterman model is yet another tactical asset allocation approach. The model came out of Goldman Sachs research published in 1992 by Fischer Black and Robert Litterman. This model allows you to look at the historical returns, risk, and correlation of the assets in your portfolio. The user is only required to state how her assumptions about expected returns differ from the markets and to state her degree of confidence in the alternative assumptions. From this, the Black–Litterman method computes the desired (mean-variance efficient) asset allocation.

Made famous more recently by the likes of AQR, risk parity has become a popular tactical asset allocation technique. The risk parity approach asserts that when asset allocations are adjusted (leveraged or deleveraged) to the same risk level, the risk parity portfolio can achieve a higher Sharpe ratio and can be more resistant to market downturns than the traditional portfolio. This is a slightly different approach in that you almost forget about forward forecasting returns.  Instead, you take the approach that you better understand where the risks lie in the portfolio and, as such, strive for an equal risk weighting among asset classes in the portfolio.  Essentially, you are focused on allocation of risk, usually defined as volatility, rather than allocation of capital. For example, if you had a 60% allocation to equities and a 40% allocation to fixed income in your portfolio, risk parity will likely force you to increase your allocation to fixed income as it is likely to have less risk than equities. Risk parity is vulnerable to significant shifts in correlation regimes in practice, such as observed in Q1 2020, which led to the significant underperformance of risk-parity funds in the Covid-19 sell-off.

---
## Examine 60/40 Base Case Scenario

Let's first get total returns for equity and fixed income ETFs that will serve as our benchmarks in this asset allocation analysis.

In [2]:
# fetch daily return data for benchmarks: SPY and AGG ETFs
global_eq = 'SPY'
global_fi = 'AGG'

stock_tickers = [global_eq, global_fi]

We can use the [10 Year Historical Monthly Prices](https://rapidapi.com/alphawave/api/stock-prices2) endpoint from the [AlphaWave Data Stock Prices API](https://rapidapi.com/alphawave/api/stock-prices2/endpoints) to pull in the ten year monthly historical prices so that we can calculate the returns.

To call this API with Python, you can choose one of the supported Python code snippets provided in the API console. The following is an example of how to invoke the API with Python Requests. You will need to insert your own <b>x-rapidapi-host</b> and <b>x-rapidapi-key</b> information in the code block below.

In [3]:
# fetch 10 year monthly return data

url = "https://stock-prices2.p.rapidapi.com/api/v1/resources/stock-prices/10y-1mo-interval"

headers = {
    'x-rapidapi-host': "YOUR_X-RAPIDAPI-HOST_WILL_COPY_DIRECTLY_FROM_RAPIDAPI_PYTHON_CODE_SNIPPETS",
    'x-rapidapi-key': "YOUR_X-RAPIDAPI-KEY_WILL_COPY_DIRECTLY_FROM_RAPIDAPI_PYTHON_CODE_SNIPPETS"
    }

stock_frames = []

# for ticker in stock_tickers:
for ticker in tqdm(stock_tickers, position=0, leave=True, desc = "Retrieving AlphaWave Data Benchmark Info"):
    
    querystring = {"ticker":ticker}
    stock_daily_price_response = requests.request("GET", url, headers=headers, params=querystring)

    # Create Stock Prices DataFrame
    stock_daily_price_df = pd.DataFrame.from_dict(stock_daily_price_response.json())
    stock_daily_price_df = stock_daily_price_df.transpose()
    stock_daily_price_df = stock_daily_price_df.rename(columns={'Close':ticker})
    stock_daily_price_df = stock_daily_price_df[{ticker}]
    stock_frames.append(stock_daily_price_df)
    
yf_combined_stock_price_df = pd.concat(stock_frames, axis=1, sort=True)
yf_combined_stock_price_df = yf_combined_stock_price_df.dropna(how='all')
yf_combined_stock_price_df = yf_combined_stock_price_df.fillna("")

periodic_returns = yf_combined_stock_price_df.pct_change().dropna()
periodic_returns

Retrieving AlphaWave Data Benchmark Info: 100%|██████████| 2/2 [00:01<00:00,  1.55it/s]


Unnamed: 0,SPY,AGG
2011-11-01,-0.004064,-0.003926
2011-12-01,0.004080,0.010590
2012-01-01,0.053011,0.012866
2012-02-01,0.043406,-0.002611
2012-03-01,0.027660,-0.005755
...,...,...
2021-06-01,0.019093,0.008333
2021-07-01,0.027764,0.011272
2021-08-01,0.029760,-0.002085
2021-09-01,-0.008150,0.002385


### Add a 60/40 Equity/Fixed Income Allocation

Let's also build a basic benchmark 60/40 Portfolio.

In [4]:
# create benchmark 60/40 Portfolio
periodic_returns['60/40 Portfolio'] = sum([periodic_returns[global_eq] * 0.6, periodic_returns[global_fi] * 0.4])
periodic_returns[["SPY",
                 "AGG",
                 "60/40 Portfolio"]] = periodic_returns[["SPY",
                                                        "AGG",
                                                        "60/40 Portfolio"]].apply(pd.to_numeric)
periodic_returns = periodic_returns.sort_index()
periodic_returns

Unnamed: 0,SPY,AGG,60/40 Portfolio
2011-11-01,-0.004064,-0.003926,-0.004009
2011-12-01,0.004080,0.010590,0.006684
2012-01-01,0.053011,0.012866,0.036953
2012-02-01,0.043406,-0.002611,0.024999
2012-03-01,0.027660,-0.005755,0.014294
...,...,...,...
2021-06-01,0.019093,0.008333,0.014789
2021-07-01,0.027764,0.011272,0.021167
2021-08-01,0.029760,-0.002085,0.017022
2021-09-01,-0.008150,0.002385,-0.003936


### Plot the Cumulative Returns

Next, we define `make_single_line_chart` and `make_all_line_charts` functions that will help us plot the benchmark returns.

In [5]:
# function to create a single line chart
def make_single_line_chart(column, alt_name=None):
    data = cumulative_returns[[column]]
    name = column
    if alt_name is not None:
        name = f'{alt_name} ({column})'

    return go.Scatter(x=data.index, y=data[column], name=name)

In [6]:
# function to create a multi line chart
def make_all_line_charts(emphasize=None):
    alt_names = {'SPY': '100% Equities', 'AGG': '100% Bonds'}
    data = []
    for column in cumulative_returns:
        alt_name = None
        if column in alt_names:
            alt_name = alt_names[column]
        chart = make_single_line_chart(column, alt_name)

        if emphasize is not None:
            if type(emphasize) != list:
                emphasize = [emphasize]
            if column not in emphasize:
                chart.line.width = 1
                chart.mode = 'lines'
            else:
                chart.line.width = 3
                chart.mode = 'lines+markers'
        data.append(chart)
    return data

In [7]:
# let's plot the cumulative returns
cumulative_rtns = (periodic_returns+1).cumprod() - 1
cumulative_returns = cumulative_rtns

chart_title = '60/40 Base Case'
emphasize = '60/40 Portfolio'

data = make_all_line_charts(emphasize)

layout = {'template': 'plotly_dark',
          'title': chart_title,
          'xaxis': {'title': {'text': 'Date'}},
          'yaxis': {'title': {'text': 'Cumulative Total Return'},
                    'tickformat': '.0%'}}

figure = go.Figure(data=data, layout=layout)
f2 = go.FigureWidget(figure)
f2

FigureWidget({
    'data': [{'line': {'width': 1},
              'mode': 'lines',
              'name': '100% …

### Plot Returns Chart - Logarithmic Scale

In [8]:
# let's plot the returns on a logarithmic scale
log_cumulative_rtns = (periodic_returns+1).cumprod() * 100
cumulative_returns = log_cumulative_rtns

chart_title = '60/40 Base Case' 
emphasize = '60/40 Portfolio'

data = make_all_line_charts(emphasize)

layout = ({'template': 'plotly_dark',
           'xaxis': {'title': {'text': 'Date'}},
           'yaxis': {'title': {'text': 'Cumulative Total Return'},
                     'type': 'log', 'tickformat': '$.3s'},
            'title': f'{chart_title} - Logarithmic Scale'})

figure = go.Figure(data=data, layout=layout)
f3 = go.FigureWidget(figure)
f3

FigureWidget({
    'data': [{'line': {'width': 1},
              'mode': 'lines',
              'name': '100% …

---
## Define Universe of Equity and Fixed Income ETFs

Our optimized portfolios will be created using ETFs selected from this universe based on predictions from the Multi-task Lasso model made at a later step.

### Equity ETFs

In [9]:
# fetch daily return data for equity ETFs
equity_etfs = ['QQQ','VUG','VTV','IWF','IJR','IWM','IJH','VIG','IWD','VO','VGT','VB','XLK','XLF']
len(equity_etfs)

14

### Fixed Income ETFs

In [10]:
# fetch daily return data for fixed income ETFs
fi_etfs = ['VCIT','LQD','VCSH','BSV','TIP','IGSB','MBB','MUB','EMB','HYG','SHY','TLT']
len(fi_etfs)

12

---
## Pull Historical Data

Now, let's get historical returns for our equity ETF universe.

We can use the [10 Year Historical Monthly Prices](https://rapidapi.com/alphawave/api/stock-prices2) endpoint from the [AlphaWave Data Stock Prices API](https://rapidapi.com/alphawave/api/stock-prices2/endpoints) to pull in the ten year monthly historical prices so that we can calculate the returns.

To call this API with Python, you can choose one of the supported Python code snippets provided in the API console. The following is an example of how to invoke the API with Python Requests. You will need to insert your own <b>x-rapidapi-host</b> and <b>x-rapidapi-key</b> information in the code block below.

In [11]:
# fetch 10 year monthly return data

url = "https://stock-prices2.p.rapidapi.com/api/v1/resources/stock-prices/10y-1mo-interval"

headers = {
    'x-rapidapi-host': "YOUR_X-RAPIDAPI-HOST_WILL_COPY_DIRECTLY_FROM_RAPIDAPI_PYTHON_CODE_SNIPPETS",
    'x-rapidapi-key': "YOUR_X-RAPIDAPI-KEY_WILL_COPY_DIRECTLY_FROM_RAPIDAPI_PYTHON_CODE_SNIPPETS"
    }

stock_frames = []

# for ticker in equity_etfs:
for ticker in tqdm(equity_etfs, position=0, leave=True, desc = "Retrieving AlphaWave Data Equity ETF Info"):
    
    querystring = {"ticker":ticker}
    stock_daily_price_response = requests.request("GET", url, headers=headers, params=querystring)

    # Create Stock Prices DataFrame
    stock_daily_price_df = pd.DataFrame.from_dict(stock_daily_price_response.json())
    stock_daily_price_df = stock_daily_price_df.transpose()
    stock_daily_price_df = stock_daily_price_df.rename(columns={'Close':ticker})
    stock_daily_price_df = stock_daily_price_df[{ticker}]
    stock_frames.append(stock_daily_price_df)
    
yf_combined_equity_etfs_df = pd.concat(stock_frames, axis=1, sort=True)
yf_combined_equity_etfs_df = yf_combined_equity_etfs_df.dropna(how='all')
yf_combined_equity_etfs_df = yf_combined_equity_etfs_df.fillna("")

equity_returns = yf_combined_equity_etfs_df.pct_change().dropna()
equity_returns

Retrieving AlphaWave Data Equity ETF Info: 100%|██████████| 14/14 [00:09<00:00,  1.55it/s]


Unnamed: 0,QQQ,VUG,VTV,IWF,IJR,IWM,IJH,VIG,IWD,VO,VGT,VB,XLK,XLF
2011-11-01,-0.026920,-0.002082,-0.005590,-0.001030,0.006243,-0.003783,-0.002368,0.018713,-0.006976,-0.006113,-0.019928,-0.004382,-0.014994,-0.051111
2011-12-01,-0.009931,-0.008668,0.017445,-0.007045,0.009012,0.000271,-0.009385,0.003857,0.013572,-0.016810,-0.017451,-0.010931,-0.006635,0.014832
2012-01-01,0.088333,0.066746,0.042726,0.064839,0.069688,0.076685,0.070519,0.032287,0.045638,0.081027,0.087697,0.083850,0.066729,0.088466
2012-02-01,0.064101,0.047836,0.038851,0.047161,0.021703,0.025690,0.045289,0.027644,0.036413,0.043263,0.069756,0.032747,0.071032,0.049787
2012-03-01,0.048750,0.031841,0.022333,0.029765,0.025948,0.021715,0.016286,0.016140,0.025765,0.018986,0.047001,0.023002,0.041796,0.070461
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-06-01,0.061390,0.058808,-0.017792,0.059889,0.001152,0.017162,-0.013907,-0.005845,-0.015578,0.014750,0.071156,0.012130,0.066908,-0.034220
2021-07-01,0.029803,0.033201,0.015624,0.035026,-0.021932,-0.034826,0.006369,0.036500,0.012184,0.016128,0.035255,-0.012336,0.040777,-0.000691
2021-08-01,0.042187,0.036730,0.020886,0.036485,0.019137,0.022031,0.019873,0.016717,0.019504,0.030232,0.035103,0.019644,0.035593,0.051479
2021-09-01,-0.005001,-0.003422,-0.015450,-0.005191,-0.022070,-0.017218,-0.017850,-0.013609,-0.012386,-0.009526,-0.006609,-0.010031,-0.006358,-0.014063


We next pull historical returns for our universe of fixed income ETFs.

We can use the [10 Year Historical Monthly Prices](https://rapidapi.com/alphawave/api/stock-prices2) endpoint from the [AlphaWave Data Stock Prices API](https://rapidapi.com/alphawave/api/stock-prices2/endpoints) to pull in the ten year monthly historical prices so that we can calculate the returns.

To call this API with Python, you can choose one of the supported Python code snippets provided in the API console. The following is an example of how to invoke the API with Python Requests. You will need to insert your own <b>x-rapidapi-host</b> and <b>x-rapidapi-key</b> information in the code block below.

In [12]:
# fetch 10 year monthly return data

url = "https://stock-prices2.p.rapidapi.com/api/v1/resources/stock-prices/10y-1mo-interval"

headers = {
    'x-rapidapi-host': "YOUR_X-RAPIDAPI-HOST_WILL_COPY_DIRECTLY_FROM_RAPIDAPI_PYTHON_CODE_SNIPPETS",
    'x-rapidapi-key': "YOUR_X-RAPIDAPI-KEY_WILL_COPY_DIRECTLY_FROM_RAPIDAPI_PYTHON_CODE_SNIPPETS"
    }

stock_frames = []

# for ticker in fi_etfs:
for ticker in tqdm(fi_etfs, position=0, leave=True, desc = "Retrieving AlphaWave Data FI ETF Info"):
    
    querystring = {"ticker":ticker}
    stock_daily_price_response = requests.request("GET", url, headers=headers, params=querystring)

    # Create Stock Prices DataFrame
    stock_daily_price_df = pd.DataFrame.from_dict(stock_daily_price_response.json())
    stock_daily_price_df = stock_daily_price_df.transpose()
    stock_daily_price_df = stock_daily_price_df.rename(columns={'Close':ticker})
    stock_daily_price_df = stock_daily_price_df[{ticker}]
    stock_frames.append(stock_daily_price_df)
    
yf_combined_fi_etfs_df = pd.concat(stock_frames, axis=1, sort=True)
yf_combined_fi_etfs_df = yf_combined_fi_etfs_df.dropna(how='all')
yf_combined_fi_etfs_df = yf_combined_fi_etfs_df.fillna("")

fi_returns = yf_combined_fi_etfs_df.pct_change().dropna()
fi_returns

Retrieving AlphaWave Data FI ETF Info: 100%|██████████| 12/12 [00:12<00:00,  1.04s/it]


Unnamed: 0,VCIT,LQD,VCSH,BSV,TIP,IGSB,MBB,MUB,EMB,HYG,SHY,TLT
2011-11-01,-0.016619,-0.031356,-0.008432,-0.001437,0.003665,-0.007220,0.001285,0.006404,-0.012718,-0.023676,0.000462,0.019843
2011-12-01,0.024755,0.031350,0.009584,-0.003480,0.002671,0.005391,0.005037,0.022422,0.015241,0.038799,0.000095,0.031250
2012-01-01,0.027170,0.028552,0.014017,0.013223,0.024997,0.008511,0.007834,0.040789,0.019364,0.027266,0.001823,0.002379
2012-02-01,0.006693,0.012909,0.008984,-0.001230,-0.005781,0.002862,-0.001660,-0.013206,0.020721,0.015318,-0.001773,-0.028299
2012-03-01,-0.007371,-0.014270,-0.000974,-0.001789,-0.008595,0.001204,0.000431,-0.008274,-0.001320,-0.009772,-0.000819,-0.042247
...,...,...,...,...,...,...,...,...,...,...,...,...
2021-06-01,0.011132,0.022107,-0.000518,-0.001894,0.005958,-0.000198,0.000178,0.003531,0.008522,0.013272,-0.001715,0.044219
2021-07-01,0.012610,0.014260,0.003266,0.003810,0.025543,0.003304,0.005438,0.004592,0.005619,0.001070,0.001648,0.037307
2021-08-01,-0.003526,-0.003343,-0.000807,-0.000629,-0.001081,-0.000380,-0.000024,-0.001887,0.009040,0.006183,-0.000313,-0.003317
2021-09-01,0.002495,0.005862,0.000700,-0.000099,0.003579,0.000327,0.001105,-0.000595,0.001876,0.003802,0.000023,0.012991


---
## Construct Time Series Model

The goal of this model is to predict the returns for each of these equity and fixed income ETFs and pick the best ETFs to place in the portfolio.  We will not be altering the 60/40 allocation split between equities and fixed income ETFs.  We will be rebalancing monthly to target the 60/40 allocation while also changing the composition of equity and fixed income ETFs in the portfolio.  This will give the model the ability to choose which equity and fixed income ETFs to invest in each month.

For the equity strategy and fixed income strategy, the model will be trained on past return data only.

### Autoregressive Time Series Forecasting

We will be using the Multi-task Lasso model in this analysis.  The model looks at return data for equities and fixed income ETFs.  It will train on five periods (monthly in this example) of returns in order to make a prediction of returns one month ahead.  The Multi-task Lasso model allows us to fit multiple regression problems jointly. This means the model will look at all features at the same time to predict all of the future returns.  For example, if we were analyzing stock A and stock B, the Multi-task Lasso model would not only look at the historical returns of stock A to predict the future returns for stock A.  Rather, the model looks at the historical returns for both Stock A and stock B in order to predict the future returns for stock A.

The reason we use Lasso is because it has a penalty term, called [regularization](https://medium.datadriveninvestor.com/introduction-to-machine-learning-an-overview-5ed43a37985d), for the betas that tries to shrink the coefficients down toward zero.   If a coefficient is not important, it drops out of the model completely.

<img src="img/autoregressive_gif.gif" >

In [13]:
@ignore_warnings(category=ConvergenceWarning)
def forecast_returns(return_time_series_data, non_return_data=None, window_size=5, num_test_dates=90):
    """
    Use a given dataset and the MultiTaskLasso object from sklearn to 
    generate a DataFrame of predicted returns
    
    Args:
    ================================
    return_time_series_data (pandas.DataFrame):
        pandas DataFrame of an actual return time series for a set of given indices.
        Must be in the following format:
        
         Period     |    
         Ending     |    Ticker_1    Ticker_2     ...    Ticker_N
       -----------  |   ----------  ----------   -----  ----------
       YYYY-MM-DD   |      0.01        0.03       ...     -0.05
                    |
       YYYY-MM-DD   |     -0.05       -0.01       ...      0.04
       
       
    non_return_data (pandas.DataFrame):
        pandas DataFrame of an actual time series of non-return data
        for a set of given indices. Must be in the same format, same
        ticker order, and have the same periodicity as the return_time_series_data above
        
        
    window_size (int):
        Number of periods used to predict the next value.
        Example: if window_size = 5, look 5 periods back to predict the next value
        Default = 5
        
    
    num_test_dates (int):
        Number of periods for which to generate forecasts
        Example: 120 = 10 years of monthly predictions, or 30 years of quarterly predicitons
        depending on the periodicity of the input data in return_time_series_data and non_return_data
        Default = 120
        
        
    Returns:
    ================================
    pandas.DataFrame
        Output is a DataFrame of expected returns in the same format as return_time_series_data
    
    """
    
    # descriptive variables for later use
    names = list(return_time_series_data.columns)
    dates = [f'{date.year}-{date.month}-{date.day}' for date in list(pd.to_datetime(return_time_series_data.index))]
    
    # transform pandas to numpy arrays
    X_returns = return_time_series_data.to_numpy()
    X_input = X_returns
    max_iter = 7500
    
    # concatenate non_return_data if it exists
    if non_return_data is not None:
        max_iter = 3000
        X_non_rtn = non_return_data.to_numpy()
        X_input =  np.concatenate((X_returns, X_non_rtn), axis=1)
    
    # number of time series (tickers) to model
    n_series = X_returns.shape[1]
    # number of features at each date; equal to n_series * number of features (return, oas_spread, etc.)
    n_features_per_time_point = X_input.shape[1]
    
    num_features = window_size * n_features_per_time_point
    num_training_points = X_returns.shape[0] - window_size
    X_train = np.zeros((num_training_points, num_features))
    Y_train = X_returns[window_size:,:]
    
    for i in range(num_training_points-1):
        X_train[i,:] = np.matrix.flatten(X_input[i : window_size + i,:])
    
    # establish empty arrays & variables for use in training each model
    mtl_list=[]
    alpha= 0.001
    Y_pred = np.zeros((num_test_dates, n_series))
    delta_Y = np.zeros((num_test_dates, n_series))
    dY_percent = np.zeros((num_test_dates, n_series))
    mse_pred = np.zeros(num_test_dates)
    predict_dates=[]    

    # loop through dates & predict returns
    for i in range(num_test_dates):
        X_i = X_train[:num_training_points - num_test_dates + (i-1)]
        Y_i = Y_train[:num_training_points - num_test_dates + (i-1)]
        print("X shape: ", X_i.shape, "Y shape: ", Y_i.shape)
        print("number of points in training data:", X_i.shape[0] )
        mtl = MultiTaskLasso(alpha=alpha, max_iter=max_iter, warm_start=True).fit(X_i, Y_i)
        mtl_list.append(mtl)
        
        print(f"using X from {dates[num_training_points - num_test_dates + (i-1) + window_size]}\
        to predict {dates[num_training_points - num_test_dates + (i-1) + 1 + window_size]}")
        
        predict_dates.append(dates[num_training_points - num_test_dates + (i-1) + window_size])
        
        X_i_plus_1 = X_train[num_training_points - num_test_dates + (i-1) + 1]
        
        Y_pred[i,:] = mtl.predict([X_i_plus_1])
        Y_act =  Y_train[num_training_points - num_test_dates + (i-1) + 1]
        delta_Y[i] = (Y_pred[i,:] - Y_act)
        mse_pred[i] = np.sqrt(np.sum((Y_pred[i,:] - Y_act)**2))/len(Y_act)
        print("mse", mse_pred[i])
    
    predictions = pd.DataFrame(Y_pred, index=predict_dates, columns=names)
    predictions.index = [pd.Timestamp(i).strftime('%Y-%m-%d') for i in predictions.index]
    
    return predictions

In [14]:
# run the model
eq_predictions = forecast_returns(equity_returns)
fi_predictions = forecast_returns(fi_returns)

X shape:  (24, 70) Y shape:  (24, 14)
number of points in training data: 24
using X from 2014-4-1        to predict 2014-5-1
mse 0.005187248493884684
X shape:  (25, 70) Y shape:  (25, 14)
number of points in training data: 25
using X from 2014-5-1        to predict 2014-6-1
mse 0.004776947369818568
X shape:  (26, 70) Y shape:  (26, 14)
number of points in training data: 26
using X from 2014-6-1        to predict 2014-7-1
mse 0.0118777887186723
X shape:  (27, 70) Y shape:  (27, 14)
number of points in training data: 27
using X from 2014-7-1        to predict 2014-8-1
mse 0.008980829661270187
X shape:  (28, 70) Y shape:  (28, 14)
number of points in training data: 28
using X from 2014-8-1        to predict 2014-9-1
mse 0.01185961070309144


In [15]:
# view predictions
eq_predictions.head()

Unnamed: 0,QQQ,VUG,VTV,IWF,IJR,IWM,IJH,VIG,IWD,VO,VGT,VB,XLK,XLF
2014-04-01,0.006806,0.007609,0.009984,0.00739,0.009871,0.009004,0.008548,0.007145,0.010242,0.008589,0.003818,0.010137,0.005048,0.008764
2014-05-01,0.013471,0.014519,0.017437,0.014345,0.017911,0.017049,0.017197,0.01491,0.017597,0.017284,0.010795,0.018168,0.010093,0.019068
2014-06-01,0.015082,0.015284,0.016809,0.014922,0.016982,0.015944,0.016231,0.014615,0.017228,0.016667,0.01243,0.017171,0.012186,0.017483
2014-07-01,0.011082,0.011083,0.012017,0.010521,0.01205,0.011673,0.011589,0.009522,0.012684,0.011689,0.007936,0.012953,0.00845,0.01129
2014-08-01,0.013345,0.012502,0.014117,0.012036,0.012942,0.011758,0.012786,0.011086,0.014274,0.0135,0.010568,0.013404,0.010882,0.014329


In [16]:
# view returns
equity_returns.head()

Unnamed: 0,QQQ,VUG,VTV,IWF,IJR,IWM,IJH,VIG,IWD,VO,VGT,VB,XLK,XLF
2011-11-01,-0.02692,-0.002082,-0.00559,-0.00103,0.006243,-0.003783,-0.002368,0.018713,-0.006976,-0.006113,-0.019928,-0.004382,-0.014994,-0.051111
2011-12-01,-0.009931,-0.008668,0.017445,-0.007045,0.009012,0.000271,-0.009385,0.003857,0.013572,-0.01681,-0.017451,-0.010931,-0.006635,0.014832
2012-01-01,0.088333,0.066746,0.042726,0.064839,0.069688,0.076685,0.070519,0.032287,0.045638,0.081027,0.087697,0.08385,0.066729,0.088466
2012-02-01,0.064101,0.047836,0.038851,0.047161,0.021703,0.02569,0.045289,0.027644,0.036413,0.043263,0.069756,0.032747,0.071032,0.049787
2012-03-01,0.04875,0.031841,0.022333,0.029765,0.025948,0.021715,0.016286,0.01614,0.025765,0.018986,0.047001,0.023002,0.041796,0.070461


In [17]:
# calculate the average equity prediction error
average_equity_return_error = eq_predictions.subtract(equity_returns).mean(axis=1).dropna()
equity_avg_error_plot_df = pd.DataFrame({'Avg Error': average_equity_return_error}, index=average_equity_return_error.index)
equity_avg_error_plot_df

Unnamed: 0,Avg Error
2014-04-01,0.013106
2014-05-01,-0.006043
2014-06-01,-0.012869
2014-07-01,0.031823
2014-08-01,-0.031489
...,...
2021-05-01,0.003358
2021-06-01,-0.004362
2021-07-01,-0.003876
2021-08-01,-0.016816


In [18]:
# calculate the average fixed income prediction error
average_fi_return_error = fi_predictions.subtract(fi_returns).mean(axis=1).dropna()
fi_avg_error_plot_df = pd.DataFrame({'Avg Error': average_fi_return_error}, index=average_fi_return_error.index)
fi_avg_error_plot_df

Unnamed: 0,Avg Error
2014-04-01,-0.006470
2014-05-01,-0.011109
2014-06-01,0.002353
2014-07-01,0.005313
2014-08-01,-0.010482
...,...
2021-05-01,-0.000485
2021-06-01,-0.005837
2021-07-01,-0.006941
2021-08-01,0.003000


Next, we check if the model introduces any systematic bias by calculating the average prediction error per month.  This means we take the average difference between the actual and estimated returns for each month.  If the error plotted in the below charts were all negative or all positive, then we would know our model has a systematic bias.  According  to the charts below, it appears we do not have a systematic bias which means we are clear to proceed.

In [19]:
# check if the model introduces any systematic bias for equity ETFs
def SetColor(y):
    if(y < 0):
        return "red"
    elif(y >= 0):
        return "green"

layout = ({'template': 'plotly_dark',
           'xaxis': {'title': {'text': 'Date'}},
           'yaxis': {'title': {'text': 'Avg Error %'}},
            'title': f'Average Equity Prediction Error'})

fig = go.Figure(layout=layout)

fig.add_trace(go.Bar(
    x=equity_avg_error_plot_df.index,
    y=equity_avg_error_plot_df.iloc[:,0],
    marker=dict(color = list(map(SetColor, equity_avg_error_plot_df.iloc[:,0])))
    ))

f4 = go.FigureWidget(fig)
f4

FigureWidget({
    'data': [{'marker': {'color': [green, red, red, green, red, green, red, red,
              …

In [20]:
# check if the model introduces any systematic bias for fixed income ETFs
def SetColor(y):
    if(y < 0):
        return "red"
    elif(y >= 0):
        return "green"

layout = ({'template': 'plotly_dark',
           'xaxis': {'title': {'text': 'Date'}},
           'yaxis': {'title': {'text': 'Avg Error %'}},
            'title': f'Average Fixed Income Prediction Error'})

fig = go.Figure(layout=layout)

fig.add_trace(go.Bar(
    x=fi_avg_error_plot_df.index,
    y=fi_avg_error_plot_df.iloc[:,0],
    marker=dict(color = list(map(SetColor, fi_avg_error_plot_df.iloc[:,0])))
    ))

f5 = go.FigureWidget(fig)
f5

FigureWidget({
    'data': [{'marker': {'color': [red, red, green, green, red, green, red, red,
              …

---
## Allocate Strategy Portfolio Based on Model Results

Create three strategies to measure model performance:
1. 60/40 Allocation Strategy
1. Equity Only Portfolio
1. Fixed Income Only Portfolio

Below we define `allocate_portfolio` and `get_historical_portfolio_holdings` functions to create these three strategies.  These functions help us identify the equity and fixed income ETFs with the largest expected returns as calculated by our model for each month, which we then place in our optimized portfolios to see if they beat the benchmarks.

In [21]:
def allocate_portfolio(expected_eq_returns, 
                       expected_fi_returns, 
                       actual_eq_returns,
                       actual_fi_returns,
                       for_period_ending,
                       total_equity_weight=0.6,
                       n_equity_funds=5,
                       n_bond_funds=5):
    
    """
    Allocate a portfolio by picking the top n_equity_funds & top n_bond_funds for the period
    ending on for_period_ending
    
    """
    
    fi_wgt = 1 - total_equity_weight
    eq_fund_wgt = total_equity_weight / n_equity_funds
    fi_fund_wgt = fi_wgt / n_bond_funds
    for_period_ending = pd.Timestamp(for_period_ending).strftime('%Y-%m-%d')
    
    eq_returns = pd.DataFrame(expected_eq_returns.loc[for_period_ending])
    eq_returns.columns = ['Expected Return']
    eq_returns['Type'] = ['Equity'] * len(eq_returns)
    eq_returns['Weight'] = [eq_fund_wgt] * len(eq_returns)
    eq_returns = eq_returns.sort_values(by='Expected Return', ascending=False).head(n_equity_funds)

    fi_returns = pd.DataFrame(expected_fi_returns.loc[for_period_ending])
    fi_returns.columns = ['Expected Return']
    fi_returns['Type'] = ['Fixed Income'] * len(fi_returns)
    fi_returns['Weight'] = [fi_fund_wgt] * len(fi_returns)
    fi_returns = fi_returns.sort_values(by='Expected Return', ascending=False).head(n_bond_funds)
    
    holdings_df = pd.concat([eq_returns, fi_returns], axis=0)
    holdings_df.index.name = 'Index'
    
    actual_returns = []
    for i in range(len(holdings_df)):
        index_type = holdings_df['Type'].iloc[i]
        index_name = holdings_df.index[i]
        if index_type == 'Equity':
            actual_returns.append(actual_eq_returns[index_name].loc[for_period_ending])
        elif index_type == 'Fixed Income':
            actual_returns.append(actual_fi_returns[index_name].loc[for_period_ending])
    holdings_df['Actual Return'] = actual_returns
    
    holdings_df.index = pd.MultiIndex.from_tuples([(for_period_ending, i) for i in holdings_df.index], names=['For Period Ending', 'Fund Ticker'])
    holdings_df = holdings_df[['Type', 'Weight', 'Expected Return', 'Actual Return']]
    
    return holdings_df

In [22]:
def get_historical_portfolio_holdings(expected_eq_returns, 
                                      expected_fi_returns, 
                                      actual_eq_returns, 
                                      actual_fi_returns, 
                                      total_equity_weight):
    """
    Loop over the time frame given in expected_fi_returns 
    and run allocate_portfolio at each date
    
    """

    holdings = []
    for date in expected_fi_returns.index:
        holdings_at_date = allocate_portfolio(expected_eq_returns=expected_eq_returns, 
                                              expected_fi_returns=expected_fi_returns, 
                                              actual_eq_returns=actual_eq_returns,
                                              actual_fi_returns=actual_fi_returns,
                                              for_period_ending=date, 
                                              total_equity_weight=total_equity_weight)
        holdings.append(holdings_at_date)
    return pd.concat(holdings)

Run the functions to create our historical ETF holdings.

In [23]:
params = {'expected_eq_returns': eq_predictions,
          'expected_fi_returns': fi_predictions,
          'actual_eq_returns': equity_returns,
          'actual_fi_returns': fi_returns}

portfolio_holdings = get_historical_portfolio_holdings(**params, total_equity_weight=0.6)
bond_only_holdings = get_historical_portfolio_holdings(**params, total_equity_weight=0)
equity_only_holdings = get_historical_portfolio_holdings(**params, total_equity_weight=1)

portfolio_holdings.tail(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,Type,Weight,Expected Return,Actual Return
For Period Ending,Fund Ticker,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2021-08-01,VGT,Equity,0.12,0.016345,0.035103
2021-08-01,QQQ,Equity,0.12,0.01623,0.042187
2021-08-01,XLK,Equity,0.12,0.015991,0.035593
2021-08-01,IWF,Equity,0.12,0.013964,0.036485
2021-08-01,VUG,Equity,0.12,0.013927,0.03673
2021-08-01,TLT,Fixed Income,0.08,0.005219,-0.003317
2021-08-01,HYG,Fixed Income,0.08,0.004405,0.006183
2021-08-01,LQD,Fixed Income,0.08,0.004375,-0.003343
2021-08-01,EMB,Fixed Income,0.08,0.004142,0.00904
2021-08-01,VCIT,Fixed Income,0.08,0.004097,-0.003526


---
## Calculate Benchmark & Strategy Portfolio Returns

Before we begin, let's review the assumptions we apply to this example:
* Assume 0% slippage and trading fees
* Risk measures not considered

Below we define `get_excess_return`, `get_excess_return_string`, and `get_portfolio_returns` functions that will help us calculate and compare the returns of the optimized portfolios to those of the benchmarks.

In [24]:
def get_excess_return(strategy, benchmark):
    investment_horizon_years = (datetime.strptime(periodic_returns.index[-1], '%Y-%m-%d') - datetime.strptime(periodic_returns.index[0], '%Y-%m-%d')).days / 365
    annualized_excess_return = (cumulative_returns[strategy][-1] / cumulative_returns[benchmark][-1]) ** (1/investment_horizon_years) - 1
    return annualized_excess_return

In [25]:
def get_excess_return_string(strategy, benchmark):
    start_date = periodic_returns.index[0]
    end_date = periodic_returns.index[-1]
    r = get_excess_return(strategy=strategy, benchmark=benchmark)
    qualifier = 'UNDERPERFORMED'
    if r > 0:
        qualifier = 'OUTPERFORMED'
    return f'{strategy} {qualifier} {benchmark} by an annualized rate of {r:.2%} per year for the period between {start_date} and {end_date}.'

In [26]:
def get_portfolio_returns(portfolio_holdings_df, port_name='Optimized Portfolio'):
    weighted_returns = portfolio_holdings_df['Actual Return'] * portfolio_holdings_df['Weight']
    returns_df = pd.DataFrame(weighted_returns.groupby(level=[0]).sum())
    returns_df.columns = [port_name]
    return returns_df

Let's take a look at a dataframe that includes the returns of all the optimized portfolios and benchmarks.

In [27]:
new_60_40_returns = get_portfolio_returns(portfolio_holdings, 'Optimized 60/40')
bond_strategy_rtns = get_portfolio_returns(bond_only_holdings, 'Optimized Bond Strategy')
equity_strategy_rtns = get_portfolio_returns(equity_only_holdings, 'Optimized Equity Strategy')

all_returns = pd.concat([periodic_returns, new_60_40_returns, bond_strategy_rtns, equity_strategy_rtns], axis=1).dropna()
all_returns.head()

Unnamed: 0,SPY,AGG,60/40 Portfolio,Optimized 60/40,Optimized Bond Strategy,Optimized Equity Strategy
2014-04-01,0.011395,0.008164,0.010103,-0.003208,0.008653,-0.011116
2014-05-01,0.023206,0.011868,0.018671,0.015349,0.021233,0.011427
2014-06-01,0.015778,-0.00064,0.009211,0.018875,0.001132,0.030704
2014-07-01,-0.00871,-0.002413,-0.006191,-0.020813,-0.005099,-0.031289
2014-08-01,0.039463,0.011451,0.028259,0.035377,0.023587,0.043237


### Fixed Income Only Strategy

Let's see if the Optimized Fixed Income Only Strategy beats its benchmark.

In [28]:
# calculate the returns
periodic_returns = all_returns[['AGG', 'Optimized Bond Strategy']]
log_cumulative_rtns = (periodic_returns+1).cumprod() * 100

cumulative_returns = log_cumulative_rtns
cumulative_returns

Unnamed: 0,AGG,Optimized Bond Strategy
2014-04-01,100.816357,100.865330
2014-05-01,102.012838,103.007026
2014-06-01,101.947577,103.123643
2014-07-01,101.701587,102.597816
2014-08-01,102.866189,105.017747
...,...,...
2021-05-01,126.653299,141.879606
2021-06-01,127.708688,144.695957
2021-07-01,129.148261,146.746748
2021-08-01,128.878999,146.894588


In [29]:
# let's plot the returns on a logarithmic scale
chart_title = 'Optimized FI Returns vs Bond Index' 
emphasize = 'Optimized Bond Strategy'

data = make_all_line_charts(emphasize)

layout = ({'template': 'plotly_dark',
           'xaxis': {'title': {'text': 'Date'}},
           'yaxis': {'title': {'text': 'Cumulative Total Return'},
                     'type': 'log', 'tickformat': '$.3s'},
            'title': f'{chart_title} - Logarithmic Scale'})

figure = go.Figure(data=data, layout=layout)

f6 = go.FigureWidget(figure)
f6

FigureWidget({
    'data': [{'line': {'width': 1},
              'mode': 'lines',
              'name': '100% …

In [30]:
print(get_excess_return_string(strategy='Optimized Bond Strategy', benchmark='AGG'))

Optimized Bond Strategy OUTPERFORMED AGG by an annualized rate of 1.82% per year for the period between 2014-04-01 and 2021-09-01.


### Equity Only Strategy

Let's see if the Optimized Equity Only Strategy beats its benchmark.

In [31]:
# calculate the returns
periodic_returns=all_returns[['SPY', 'Optimized Equity Strategy']]
log_cumulative_rtns = (periodic_returns+1).cumprod() * 100

cumulative_returns = log_cumulative_rtns
cumulative_returns

Unnamed: 0,SPY,Optimized Equity Strategy
2014-04-01,101.139544,98.888387
2014-05-01,103.486611,100.018346
2014-06-01,105.119398,103.089350
2014-07-01,104.203772,99.863770
2014-08-01,108.316015,104.181605
...,...,...
2021-05-01,258.264495,302.465158
2021-06-01,263.195640,321.711088
2021-07-01,270.502993,332.910582
2021-08-01,278.553119,345.301462


In [32]:
# let's plot the returns on a logarithmic scale
chart_title = 'Optimized Equity Returns vs Equity Index' 
emphasize = 'Optimized Equity Strategy'

data = make_all_line_charts(emphasize)

layout = ({'template': 'plotly_dark',
           'xaxis': {'title': {'text': 'Date'}},
           'yaxis': {'title': {'text': 'Cumulative Total Return'},
                     'type': 'log', 'tickformat': '$.3s'},
            'title': f'{chart_title} - Logarithmic Scale'})

figure = go.Figure(data=data, layout=layout)

f7 = go.FigureWidget(figure)
f7

FigureWidget({
    'data': [{'line': {'width': 1},
              'mode': 'lines',
              'name': '100% …

In [33]:
print(get_excess_return_string(strategy='Optimized Equity Strategy', benchmark='SPY'))

Optimized Equity Strategy OUTPERFORMED SPY by an annualized rate of 2.97% per year for the period between 2014-04-01 and 2021-09-01.


### 60/40 Allocation Strategy

Let's see if the Optimized 60/40 Allocation Strategy beats its benchmark.

In [34]:
# calculate the returns
periodic_returns = all_returns[['60/40 Portfolio', 'Optimized 60/40']]
log_cumulative_rtns = (periodic_returns+1).cumprod() * 100

cumulative_returns = log_cumulative_rtns
cumulative_returns

Unnamed: 0,60/40 Portfolio,Optimized 60/40
2014-04-01,101.010270,99.679164
2014-05-01,102.896223,101.209165
2014-06-01,103.843975,103.119538
2014-07-01,103.201039,100.973301
2014-08-01,106.117351,104.545432
...,...,...
2021-05-01,197.956225,227.764032
2021-06-01,200.883838,238.268112
2021-07-01,205.136007,244.595704
2021-08-01,208.627821,250.156560


In [35]:
# let's plot the returns on a logarithmic scale
chart_title = 'Optimized 60/40 Returns vs 60/40 Portfolio Index' 
emphasize = 'Optimized 60/40'

data = make_all_line_charts(emphasize)

layout = ({'template': 'plotly_dark',
           'xaxis': {'title': {'text': 'Date'}},
           'yaxis': {'title': {'text': 'Cumulative Total Return'},
                     'type': 'log', 'tickformat': '$.3s'},
            'title': f'{chart_title} - Logarithmic Scale'})

figure = go.Figure(data=data, layout=layout)

f8 = go.FigureWidget(figure)
f8

FigureWidget({
    'data': [{'line': {'width': 1},
              'mode': 'lines',
              'name': '60/40…

In [36]:
print(get_excess_return_string(strategy='Optimized 60/40', benchmark='60/40 Portfolio'))

Optimized 60/40 OUTPERFORMED 60/40 Portfolio by an annualized rate of 2.52% per year for the period between 2014-04-01 and 2021-09-01.


As a framework, we see that the model's optimized portfolios outperform their benchmarks.  We can make this model more complex by adding more data if we wish, but as a start it appears the model is able to make predictions that result in allocations outperforming their benchmarks.

---
## Additional Resources
[Machine Learning for Investing](https://hdonnelly6.medium.com/list/machine-learning-for-investing-7f2690bb1826)

*This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by AlphaWave Data, Inc. ("AlphaWave Data"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company.  In preparing the information contained herein, AlphaWave Data, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to AlphaWave Data, Inc. at the time of publication. AlphaWave Data makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.*